word2vec   1741

Efficient Vector Representation for Documents through Corruption | OpenReview
We present an efficient document representation learning framework, Document Vector through Corruption (Doc2VecC). Doc2VecC represents each document as a simple average of word embeddings. It ensures a representation generated as such captures the semantic meanings of the document during learning. A corruption model is included, which introduces a data-dependent regularization that favors informative or rare words while forcing the embeddings of common and non-discriminative ones to be close to zero. Doc2VecC produces significantly better word embeddings than Word2Vec. We compare Doc2VecC with several state-of-the-art document representation learning algorithms. The simple model architecture introduced by Doc2VecC matches or out-performs the state-of-the-art in generating high-quality document representations for sentiment analysis, document classification as well as semantic relatedness tasks. The simplicity of the model enables training on billions of words per hour on a single machine. At the same time, the model is very efficient in generating representations of unseen documents at test time.
word2vec  embeddings 
2 days ago by foodbaby
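
The Doc2VecC abstract above describes two ideas: a document vector that is a plain average of word embeddings, and a corruption step applied during learning. The following is a minimal sketch of both, not the authors' implementation. The function names, the toy `embeddings` table, and the choice to model corruption as independent random word removal with probability `drop_prob` are all assumptions made for illustration; how the corrupted document enters the training objective is not covered here.

```python
import random


def corrupt(tokens, drop_prob, rng):
    """Drop each token independently with probability drop_prob.

    Assumption: corruption is modelled as simple random word removal.
    """
    return [t for t in tokens if rng.random() >= drop_prob]


def document_vector(tokens, embeddings, dim):
    """Average the embeddings of all known tokens.

    Returns a zero vector when no token has an embedding.
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Hypothetical 2-dimensional embeddings, for illustration only.
embeddings = {
    "good": [1.0, 0.0],
    "movie": [0.0, 1.0],
    "the": [0.0, 0.0],
}

rng = random.Random(0)
doc = ["the", "good", "movie"]
noisy_doc = corrupt(doc, drop_prob=0.5, rng=rng)
print(noisy_doc, document_vector(noisy_doc, embeddings, dim=2))
```

Because an unseen document only needs a lookup and an average, this also hints at why the abstract calls test-time inference efficient.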
Factors Influencing the Surprising Instability of Word Embeddings
Despite the recent popularity of word embedding methods, there is only a small body of work exploring the limitations of these representations. In this paper, we consider one aspect of embedding spaces, namely their stability. We show that even relatively high frequency words (100-200 occurrences) are often unstable. We provide empirical evidence for how various factors contribute to the stability of word embeddings, and we analyze the effects of stability on downstream tasks.
word  embeddings  word2vec  evaluation 
17 days ago by foodbaby
How to easily do Topic Modeling with LSA, PLSA, LDA & lda2Vec
This article is a comprehensive overview of Topic Modeling and its associated techniques.
word2vec  TopicModeling  LSA  PSLA  LDA  Ida2Vec  NLU  NLP 
21 days ago by areich
agnusmaximus/Word2Bits: Quantized word vectors that take 8x-16x less space than regular word vectors
Word vectors require significant amounts of memory and storage, posing issues to resource limited devices like mobile phones and GPUs. We show that high quality quantized word vectors using 1-2 bits per parameter can be learned by introducing a quantization function into Word2Vec. We furthermore show that training with the quantization function acts as a regularizer. We train word vectors on English Wikipedia (2017) and evaluate them on standard word similarity and analogy tasks and on question answering (SQuAD). Our quantized word vectors not only take 8-16x less space than full precision (32 bit) word vectors but also outperform them on word similarity tasks and question answering.
word2vec  papers  quantized 
4 weeks ago by foodbaby
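
The Word2Bits entry above mentions a quantization function that leaves 1-2 bits per parameter. As a rough sketch of the 1-bit case only, the code below maps each component to one of two values by sign and packs the signs eight to a byte. The function names and the scale value `1/3` are illustrative assumptions, not taken from the entry, and the sketch covers storage only, not the training procedure.

```python
def quantize_1bit(vector, scale=1 / 3):
    """Map each component to +scale or -scale depending on its sign.

    Assumption: scale is an illustrative constant.
    """
    return [scale if x >= 0 else -scale for x in vector]


def pack_signs(vector):
    """Store one bit per component (1 for x >= 0), 8 components per byte."""
    packed = bytearray((len(vector) + 7) // 8)
    for i, x in enumerate(vector):
        if x >= 0:
            packed[i // 8] |= 1 << (7 - (i % 8))
    return bytes(packed)


def unpack_signs(packed, length, scale=1 / 3):
    """Rebuild the quantized vector from its packed sign bits."""
    return [
        scale if (packed[i // 8] >> (7 - (i % 8))) & 1 else -scale
        for i in range(length)
    ]


vector = [0.2, -0.5, 0.0, -0.1, 0.9, 0.3, -0.7, 0.4, 0.1]
packed = pack_signs(vector)
print(len(packed), "bytes packed vs", 4 * len(vector), "bytes as 32-bit floats")
```

Packing and unpacking round-trips exactly to the quantized vector, so nothing beyond the sign bits needs to be stored per component.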
