word2vec   1708


Rotated Word Vector Representations and their Interpretability
Sungjoon Park and JinYeong Bak and Alice Oh
Department of Computing, KAIST, Republic of Korea
6 days ago by hustwj
A Beginner’s Guide to word2vec AKA What’s the Opposite of Canada? | Distilled
You can think of this as:

[king] - [man] + [woman] ~= [queen] (another way of thinking about this is that [king] - [queen] is encoding just the gendered part of [monarch])

[walking] - [swimming] + [swam] ~= [walked] (or [swam] - [swimming] is encoding just the “past-tense-ness” of the verb)

[madrid] - [spain] + [france] ~= [paris] (or [madrid] - [spain] ~= [paris] - [france] which is presumably roughly “capital”)
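
The bracket arithmetic above can be tried on toy data. The following is a minimal sketch, assuming hand-picked 3-dimensional vectors (hypothetical values, not taken from any trained model) and a hypothetical helper named `analogy`; a real word2vec model would supply learned vectors with far more dimensions.

```python
import math

# Hypothetical toy vectors, chosen by hand for illustration only.
# Real word2vec vectors are learned from text and are much longer.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.5, 0.5, 0.5],
}


def cosine(a, b):
    """Cosine similarity between two equal-length lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


# [king] - [man] + [woman]
result = analogy("king", "man", "woman", vectors)
print(result)
```

The input words are excluded from the candidates because the nearest neighbour of the combined vector is often one of the inputs themselves.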
13 days ago by aleksi
gensim: models.word2vec – Deep learning with word2vec
You can perform various NLP word tasks with the model. Some of them are already built-in:

>>> model.wv.most_similar(positive=['woman', 'king'], negative=['man'])
[('queen', 0.50882536), ...]

>>> model.wv.most_similar_cosmul(positive=['woman', 'king'], negative=['man'])
[('queen', 0.71382287), ...]

>>> model.wv.doesnt_match("breakfast cereal dinner lunch".split())

>>> model.wv.similarity('woman', 'man')
nlp  gensim  word2vec 
13 days ago by aleksi
sdimi/average-word2vec: 🔤 calculate average word embeddings (word2vec) from documents
Quick Python script I wrote to process the 20 Newsgroups dataset with word embeddings. Best run in a Jupyter Notebook. Most pre-trained word2vec models give you numerical representations of individual words but not of entire documents. While more sophisticated methods like doc2vec exist, this script simply averages the vectors of the words in a document, so that the generated document vector is the centroid of those words in feature space.

gensim (for word2vec model load)

numpy (for averaging and array manipulation)
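
The averaging idea described above can be sketched without any dependencies. This is not the repository's code: plain Python lists stand in for the gensim model and numpy arrays, and the function name `document_vector` and the toy vectors are assumptions made for illustration.

```python
def document_vector(tokens, word_vectors):
    """Average the vectors of the tokens found in word_vectors.

    Tokens missing from the vocabulary are skipped. Returns None when
    no token is known, since a centroid of nothing is undefined.
    """
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Hypothetical 2-dimensional vectors standing in for a pre-trained model.
toy_vectors = {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "fish": [1.0, 1.0]}

# "unicorn" is out of vocabulary, so only "cat" and "dog" are averaged.
doc = document_vector(["cat", "dog", "unicorn"], toy_vectors)
print(doc)
```

Skipping out-of-vocabulary tokens is one possible design choice; substituting a zero vector is another, but it pulls the centroid toward the origin.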
14 days ago by aleksi
python - Sentence similarity prediction - Data Science Stack Exchange
I'm looking to solve the following problem: I have a set of sentences as my dataset, and I want to be able to type a new sentence and find the sentence in the dataset that is most similar to it. An example would look like:

New sentence: "I opened a new mailbox"

Prediction based on dataset:

Sentence | Similarity
A dog ate poop | 0%
A mailbox is good | 50%
A mailbox was opened by me | 80%
I've read that cosine similarity paired with tf-idf can be used to solve these kinds of problems (and that RNNs should not bring significant improvements over the basic methods), and that word2vec is also used for similar problems. Are those actually viable in this specific case, too? Are there any other techniques/algorithms to solve this (preferably with Python and SKLearn, but I'm open to learning about TensorFlow, too)?
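
As a rough illustration of the tf-idf plus cosine similarity approach the question mentions, here is a minimal standard-library sketch using the question's own sentences. The helper names (`tokenize`, `build_idf`, `tfidf`, `cosine`), the whitespace tokenizer, and the smoothed idf formula are assumptions chosen for illustration; scikit-learn's `TfidfVectorizer` would be the usual library route. The scores it produces will not match the illustrative percentages in the question.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_idf(corpus_tokens):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(corpus_tokens)
    df = Counter(term for tokens in corpus_tokens for term in set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(tokens, idf):
    """Sparse tf-idf vector as a dict; terms unseen in the corpus are dropped."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


dataset = ["A dog ate poop", "A mailbox is good", "A mailbox was opened by me"]
query = "I opened a new mailbox"

corpus_tokens = [tokenize(s) for s in dataset]
idf = build_idf(corpus_tokens)
query_vec = tfidf(tokenize(query), idf)

# Score every dataset sentence against the query and keep the best match.
scores = [(cosine(query_vec, tfidf(tokens, idf)), sentence)
          for tokens, sentence in zip(corpus_tokens, dataset)]
best_score, best_sentence = max(scores)
print(best_sentence)
```

This only rewards shared surface words; averaging word2vec vectors per sentence is one way to also credit related words that do not match exactly.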
14 days ago by aleksi
agnusmaximus/Word2Bits: Quantized word vectors that take 8x-16x less storage/memory than regular word vectors
machinelearning  datascience  word2vec  nlp  language  opensource  programming  ml 
21 days ago by e2b
Word2bits: representing dimensions as single bits, instead of 32bit floats, to save space without losing…
word2vec  from twitter_favs
4 weeks ago by briantrice
