natural-language-processing   360


What every software engineer should know about search
Ask a software engineer: “How would you add search functionality to your product?” or “How do I build a search engine?” You’ll probably immediately hear back something like: “Oh, we’d just launch an…
search  architecture  artificial-intelligence  natural-language-processing  programming  ai 
9 days ago by e2b
Swype right – Almost looks like work
In this post I’ll discuss optimising the layout of an English QWERTY keyboard in an effort to minimise the average distance a digit must travel to type a word. Let’s have a look.
mathematical-recreations  optimization  natural-language-processing  user-interface  amusing 
6 weeks ago by Vaguery
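The distance metric the post optimises can be sketched in a few lines. This is an assumed formulation, not the author's code: keys sit on a unit grid with guessed row staggers, and a single digit travels in straight lines between consecutive keys.

```python
import math

# Approximate QWERTY key centres on a unit grid; the row staggers are assumptions.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
OFFSETS = [0.0, 0.25, 0.75]  # horizontal stagger of each row, in key widths

KEY_POS = {
    ch: (col + OFFSETS[row], float(row))
    for row, keys in enumerate(ROWS)
    for col, ch in enumerate(keys)
}

def travel_distance(word):
    """Total Euclidean distance a single digit travels to type `word`."""
    positions = [KEY_POS[c] for c in word.lower() if c in KEY_POS]
    return sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))

def average_distance(words):
    """Mean per-word travel distance over a corpus of words; this is the
    quantity a candidate layout would be scored on."""
    return sum(travel_distance(w) for w in words) / len(words)
```

Optimising the layout then amounts to permuting which letter sits at which `KEY_POS` coordinate so that `average_distance` over a representative corpus is minimised.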
[1708.00214] Natural Language Processing with Small Feed-Forward Networks
We show that small and shallow feed-forward neural networks can achieve near state-of-the-art results on a range of unstructured and structured language processing tasks while being considerably cheaper in memory and computational requirements than deep recurrent models. Motivated by resource-constrained environments like mobile phones, we showcase simple techniques for obtaining such small neural network models, and investigate different tradeoffs when deciding how to allocate a small memory budget.
neural-networks  natural-language-processing  machine-learning  amusing  not-so-deep  to-write-about  metaheuristics 
7 weeks ago by Vaguery
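The memory-budget trade-off the abstract mentions comes down to parameter counting. A minimal sketch (the sizes below are illustrative assumptions, not the paper's configuration):

```python
def param_count(vocab, emb_dim, hidden, n_classes):
    """Parameter count for a minimal embed -> hidden -> softmax model,
    the kind of small feed-forward architecture the paper describes.
    All sizes are illustrative assumptions."""
    embedding = vocab * emb_dim
    hidden_layer = emb_dim * hidden + hidden      # weights + biases
    output = hidden * n_classes + n_classes       # weights + biases
    return embedding + hidden_layer + output

# A small budget: 64-d embeddings over a 10k vocab stay well under 1M parameters,
# dominated by the embedding table rather than the dense layers.
print(param_count(vocab=10_000, emb_dim=64, hidden=128, n_classes=40))
```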
Natural Language Processing in Artificial Intelligence | Sigmoidal
Back in the days when a Neural Network was that scary, hard-to-learn thing that was more a mathematical curiosity than a powerful Machine Learning or Artificial Intelligence tool, there were surprisingly many relatively successful applications of classical data mining algorithms in the Natural Language Processing (NLP) domain. It seemed that problems like spam filtering or Part-of-Speech Tagging could be solved using rather simple and understandable models.

But not every problem can be solved this way. Simple models fail to properly capture linguistic subtleties like irony (although humans often fail at that one too), idioms, or context. Algorithms based on overall summarization (e.g. bag-of-words) turned out not to be powerful enough to capture the sequential nature of text data, whereas n-grams struggled to model general context and suffered severely from the curse of dimensionality. Even HMM-based models had trouble overcoming these issues due to their Markovian nature (memorylessness). Of course, these methods were also used when tackling more complex NLP tasks, but not to great success.
natural-language-processing  neural-networks  representation  machine-learning  rather-interesting  to-write-about 
7 weeks ago by Vaguery
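The bag-of-words shortcoming described above is easy to demonstrate: two sentences with opposite meanings can produce identical representations, since the bag discards word order. A minimal sketch:

```python
from collections import Counter

def bag_of_words(text):
    """Order-free token counts: the 'overall summarization' the passage criticises."""
    return Counter(text.lower().split())

a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")
assert a == b  # identical bags, opposite meanings: word order is lost
```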
PoolParty Semantic Suite - Semantic Technology Platform
PoolParty is a world-class semantic technology suite that offers sharply focused solutions to your knowledge organization and content business.
tfidf  natural-language-processing  nlp  api  corpus 
june 2017 by nharbour
Earning My Turns: A (computational) linguistic farce in three acts
The empiricist invaders were in their way heirs to Shannon, Turing, Kullback, and I.J. Good, who had been plying an effective if secretive trade at IDA and later at IBM and Bell Labs, looking at speech recognition and translation as cryptanalysis problems (the history of the road from Bletchley Park to HMMs to IBM Model 2 is still buried in the murk of not-fully-declassified materials, but it would be awesome to write — I just found this about the early steps, which could be a lot of fun). They convinced funders, especially at DARPA, that the rationalist empire was hollow and that statistical metrics on (supposedly) realistic tasks were needed to drive computational language work to practical success, as had been happening in speech recognition (although by the light of today, that speech-recognition progress was less impressive than it seemed then). It did not hurt the campaign that many of the invaders were closer to the DoD in their backgrounds, careers, and outlooks than the egghead computational linguists (another story that could be expanded, but might make some uncomfortable). Anyway, I was there in meetings where the empiricist invaders, allied with funders, increasingly laid down the new rules of the game. As in the Norman invasion of England, a whole new vocabulary took over quickly with the new aristocracy.
natural-language-processing  history-of-science  system-of-professions  essay  have-read  the-objective-truth-oh-right  theory-and-practice-sitting-in-a-tree 
june 2017 by Vaguery
[1703.00565] Scattertext: a Browser-Based Tool for Visualizing how Corpora Differ
Scattertext is an open source tool for visualizing linguistic variation between document categories in a language-independent way. The tool presents a scatterplot, where each axis corresponds to the rank-frequency at which a term occurs in a category of documents. Through a tie-breaking strategy, the tool is able to display thousands of visible term-representing points and find space to legibly label hundreds of them. Scattertext also lends itself to a query-based visualization of how the use of terms with similar embeddings differs between document categories, as well as a visualization for comparing the importance scores of bag-of-words features to univariate metrics.
natural-language-processing  text-mining  feature-extraction  rather-interesting  programming  library  to-write-about 
june 2017 by Vaguery
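The scatterplot coordinates the abstract describes (one rank-frequency axis per document category) can be sketched as below. This is not Scattertext's API, just an assumed reconstruction of the metric; ties are broken alphabetically here rather than by the paper's strategy.

```python
from collections import Counter

def rank_frequency(docs):
    """Map each term to its frequency rank within one document category
    (rank 0 = most frequent). Ties are broken alphabetically here."""
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    ordered = sorted(counts, key=lambda t: (-counts[t], t))
    return {term: rank for rank, term in enumerate(ordered)}

def scatter_points(cat_a, cat_b):
    """Each shared term becomes an (x, y) point: its rank in category A
    versus its rank in category B; these are the plotted coordinates."""
    ra, rb = rank_frequency(cat_a), rank_frequency(cat_b)
    shared = ra.keys() & rb.keys()
    return {t: (ra[t], rb[t]) for t in shared}
```

Terms near the diagonal are used similarly in both categories; terms far off the diagonal are the category-distinguishing ones the tool highlights.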
E-commerce search engine that parses search queries with natural language processing.
ecommerce  ai  natural-language-processing  ia  search 
may 2017 by stumax
[1702.08359] Dynamic Word Embeddings via Skip-Gram Filtering
We present a probabilistic language model for time-stamped text data which tracks the semantic evolution of individual words over time. The model represents words and contexts by latent trajectories in an embedding space. At each moment in time, the embedding vectors are inferred from a probabilistic version of word2vec [Mikolov, 2013]. These embedding vectors are connected in time through a latent diffusion process. We describe two scalable variational inference algorithms---skip-gram smoothing and skip-gram filtering---that allow us to train the model jointly over all times; thus learning on all data while simultaneously allowing word and context vectors to drift. Experimental results on three different corpora demonstrate that our dynamic model infers word embedding trajectories that are more interpretable and lead to higher predictive likelihoods than competing methods that are based on static models trained separately on time slices.
natural-language-processing  machine-learning  representation  to-understand  skip-grams  consider:looking-to-see  to-write-about 
may 2017 by Vaguery
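The latent diffusion process connecting embedding vectors across time can be sketched as a Gaussian random walk. This is an assumed simplification of the paper's model: the drift scale is a made-up hyperparameter, and the walk stands in for the full variational inference machinery.

```python
import random

random.seed(0)

def diffuse_embedding(init, n_steps, drift_scale=0.05):
    """Sketch of the latent diffusion prior: one word's embedding vector
    drifts over time, each time slice equal to the previous slice plus
    small Gaussian noise. `drift_scale` is an assumed hyperparameter."""
    trajectory = [list(init)]
    for _ in range(n_steps - 1):
        prev = trajectory[-1]
        trajectory.append([x + random.gauss(0.0, drift_scale) for x in prev])
    return trajectory  # n_steps vectors tracing the word's drift over time
```

Under this prior, nearby time slices share most of their structure, which is what lets the model train jointly over all times instead of fitting each slice separately.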
[1703.08052] Dynamic Bernoulli Embeddings for Language Evolution
Word embeddings are a powerful approach for unsupervised analysis of language. Recently, Rudolph et al. (2016) developed exponential family embeddings, which cast word embeddings in a probabilistic framework. Here, we develop dynamic embeddings, building on exponential family embeddings to capture how the meanings of words change over time. We use dynamic embeddings to analyze three large collections of historical texts: the U.S. Senate speeches from 1858 to 2009, the history of computer science ACM abstracts from 1951 to 2014, and machine learning papers on the Arxiv from 2007 to 2015. We find dynamic embeddings provide better fits than classical embeddings and capture interesting patterns about how language changes.
natural-language-processing  representation  rather-interesting  to-write-about  nudge-targets  consider:representation 
may 2017 by Vaguery


