papers   23197


word2vec Explained: Deriving Mikolov et al.'s negative-sampling word-embedding method
The word2vec software of Tomas Mikolov and colleagues (this https URL) has gained a lot of traction lately, and provides state-of-the-art word embeddings. The learning models behind the software are described in two research papers. We found the description of the models in these papers to be somewhat cryptic and hard to follow. While the motivations and presentation may be obvious to the neural-networks language-modeling crowd, we had to struggle quite a bit to figure out the rationale behind the equations.
embeddings  word2vec  nlproc  machine-learning  NLP  deeplearning  ml  word-embeddings  papers 
yesterday by jfrazee
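The objective this paper derives is the skip-gram negative-sampling loss: maximize log σ(v_c · v_w) for an observed (word, context) pair while pushing down σ(v_n · v_w) for k sampled negative contexts. A minimal sketch of that loss in NumPy, using hypothetical toy vectors (the vector names and dimensions are illustrative, not from the authors' implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def negative_sampling_loss(v_w, v_c, negatives):
    """Negated objective for one (word, context) pair:
    -[ log sigma(v_c . v_w) + sum_n log sigma(-v_n . v_w) ]."""
    pos = np.log(sigmoid(np.dot(v_c, v_w)))
    neg = sum(np.log(sigmoid(-np.dot(v_n, v_w))) for v_n in negatives)
    return -(pos + neg)

rng = np.random.default_rng(0)
v_w = rng.normal(size=8)                        # input ("word") vector
v_c = rng.normal(size=8)                        # output ("context") vector
negs = [rng.normal(size=8) for _ in range(5)]   # k = 5 sampled negatives
loss = negative_sampling_loss(v_w, v_c, negs)
```

Training would minimize this loss over all observed pairs by gradient descent on the word and context vectors; the sketch only evaluates it once.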
Distributed Representations of Words and Phrases and their Compositionality
The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling.
nlproc  word2vec  word-embeddings  deeplearning  NLP  embeddings  ML  machine-learning  papers 
yesterday by jfrazee
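The subsampling of frequent words mentioned in the abstract discards each occurrence of word w with probability 1 − sqrt(t / f(w)), where f(w) is w's corpus frequency and t is a small threshold (the paper suggests around 1e-5). A small sketch of the keep probability this implies; the function name and default are illustrative:

```python
import math

def keep_prob(freq, t=1e-5):
    """Probability of keeping one occurrence of a word with corpus
    frequency `freq`; the paper's discard prob is 1 - sqrt(t / f(w))."""
    return min(1.0, math.sqrt(t / freq))

# A very frequent word (5% of tokens) is kept rarely; a rare word always.
p_the  = keep_prob(0.05)   # heavily downsampled
p_rare = keep_prob(1e-6)   # kept with probability 1.0
```

This aggressively thins stopword-like tokens, which is where the reported speedup and the more regular representations for rare words come from.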


related tags

academia  acm  activation  adversarial-learning  ai  algorithms  amazon  architecture  artificial-intelligence  attention  bandit  bandits  bayesian  behaviour  berkeley  bike  book  bootstrap  branch_prediction  bytecode  clothing  codegen  collision  commute  compiler  compilers  compsci  computer-science  computer  computerscience  concurrency  crashes  crowd  cryptography  cs  cycling  data-stores  database  databases  dataset  datasets  datastructures  dating  deep-learning  deeplearning  depth  distcomp  distributedsystems  distribution  dl  dynamo  effective-altruism  embeddings  entity  ethics  evaluation  feedback  floating-point  functional  gan  garbage-collection  google  gpu  gradient-flows  graphics  guide  hardware  haskell  health  help  hi-viz  hints  history  hmm  implicit  interesting  interleaving  ir  job  journal  judgements  judgments  kafka  lab  largedeviations  machine-learning  mask  math  meta-learning  metrics  ml  ner  nlp  nlproc  nosql  numerical-methods  numerical  online  optimal-transportation  optimization  overview  paper  parallelism  pdf  pedestrian  philosophy  physics  plt  probabilistic-numerics  probability  programming  pseudo  publishing  readthis  recognition  recsys  recursion-schemes  reference  reinforcement-learning  relevance  rendering  research  reverseengineering  reviews  roguelike  rop  safety  science  scientists  search  security  shading  simulation  sourcing  statistics  storage  strangeloop  study  svm  switch  tails  termrewriting  thompson-sampling  to-read  toread  types  visibility  word-embeddings  word2vec  words  writing 
