natural-language-processing 77
NaturalNode/natural
7 weeks ago by vailripper
Natural language processing for node.
natural-language-processing
node
javascript
thinkroth/Sentimental
7 weeks ago by vailripper
Sentiment analysis tool for node.js based on the AFINN-111 wordlist.
javascript
node
natural-language-processing
A Picture of Language - NYTimes.com
8 weeks ago by Vaguery
"The book was enormously popular, and Mr. Reed and Mr. Brainerd’s diagramming swept through American schools like a refreshing breeze. By the latter half of the 19th century, chalkboards had become increasingly common in classrooms; for students, the impact of watching a sentence take shape on that large surface as a comprehensible, often elegant, and sometimes downright ingenious drawing must have been significant. It’s hard to believe anyone but the most dedicated pedant could have actually enjoyed parsing, but plenty of students — including me — loved diagramming.
A century and a half later, diagramming sentences is even more out of date than writing lessons on a piece of slate. When the book I wrote about it was published in 2006, a couple of hundred people sent me e-mails. One writer accused me of succumbing to Stockholm syndrome because I wrote so benignly about the nun who brainwashed me into thinking diagramming was fun. Another asked me for a date. Two objected to my political attitudes, as they deduced them between the lines. A dozen or so either faulted some of the diagrams or challenged me with a particularly tricky sentence."
grammar
pedagogy
styles-of-thinking
sentence-diagrams
mathematical-recreations
natural-language-processing
it-was-fun
[1112.6045] Comparing intermittency and network measurements of words and their dependency on authorship
january 2012 by Vaguery
Many features from texts and languages can now be inferred from statistical analyses using concepts from complex networks and dynamical systems. In this paper we quantify how topological properties of word co-occurrence networks and intermittency (or burstiness) in word distribution depend on the style of authors. Our database contains 40 books from 8 authors who lived in the 19th and 20th centuries, for which the following network measurements were obtained: clustering coefficient, average shortest path lengths, and betweenness. We found that the two factors with stronger dependency on the authors were the skewness in the distribution of word intermittency and the average shortest paths. Other factors such as the betweenness and the Zipf's law exponent show only weak dependency on authorship. Also assessed was the contribution from each measurement to authorship recognition using three machine learning methods. The best performance was a ca. 65% accuracy upon combining complex network and intermittency features with the nearest neighbor algorithm. From a detailed analysis of the interdependence of the various metrics it is concluded that the methods used here are complementary for providing short- and long-scale perspectives of texts, which are useful for applications such as identification of topical words and information retrieval.
natural-language-processing
document-clustering
clustering
feature-selection
algorithms
nudge-targets
[1110.1391] A Comparison of Different Machine Transliteration Models
october 2011 by Vaguery
"Machine transliteration is a method for automatically converting words in one language into phonetically equivalent ones in another language. Machine transliteration plays an important role in natural language applications such as information retrieval and machine translation, especially for handling proper nouns and technical terms. Four machine transliteration models -- grapheme-based transliteration model, phoneme-based transliteration model, hybrid transliteration model, and correspondence-based transliteration model -- have been proposed by several researchers. To date, however, there has been little research on a framework in which multiple transliteration models can operate simultaneously. Furthermore, there has been no comparison of the four models within the same framework and using the same data. We addressed these problems by 1) modeling the four models within the same framework, 2) comparing them under the same conditions, and 3) developing a way to improve machine transliteration through this comparison. Our comparison showed that the hybrid and correspondence-based models were the most effective and that the four models can be used in a complementary manner to improve machine transliteration performance."
natural-language-processing
machine-learning
review
nudge-targets
[1106.5264] Acquiring Correct Knowledge for Natural Language Generation
october 2011 by Vaguery
"Natural language generation (NLG) systems are computer software systems that produce texts in English and other human languages, often from non-linguistic input data. NLG systems, like most AI systems, need substantial amounts of knowledge. However, our experience in two NLG projects suggests that it is difficult to acquire correct knowledge for NLG systems; indeed, every knowledge acquisition (KA) technique we tried had significant problems. In general terms, these problems were due to the complexity, novelty, and poorly understood nature of the tasks our systems attempted, and were worsened by the fact that people write so differently. This meant in particular that corpus-based KA approaches suffered because it was impossible to assemble a sizable corpus of high-quality consistent manually written texts in our domains; and structured expert-oriented KA techniques suffered because experts disagreed and because we could not get enough information about special and unusual cases to build robust systems. We believe that such problems are likely to affect many other NLG systems as well. In the long term, we hope that new KA techniques may emerge to help NLG system builders. In the shorter term, we believe that understanding how individual KA techniques can fail, and using a mixture of different KA techniques with different strengths and weaknesses, can help developers acquire NLG knowledge that is mostly correct."
natural-language-processing
artificial-intelligence
interesting-problems
high-hanging-fruit
machine-learning
nudge-targets
NLTK Home (Natural Language Toolkit)
algorithms api code datamining development education language library linguistics natural opensource programming python research software toolkit tools text ai machinelearning natural-language natural-language-processing natural_language naturallanguage nlp
september 2011 by rryyan
[1107.1322] Text Classification: A Sequential Reading Approach
august 2011 by Vaguery
"We propose to model the text classification process as a sequential decision process. In this process, an agent learns to classify documents into topics while reading the document sentences sequentially and learns to stop as soon as enough information was read for deciding. The proposed algorithm is based on a modelisation of Text Classification as a Markov Decision Process and learns by using Reinforcement Learning. Experiments on four different classical mono-label corpora show that the proposed approach performs comparably to classical SVM approaches for large training sets, and better for small training sets. In addition, the model automatically adapts its reading process to the quantity of training information provided."
text-classification
natural-language-processing
machine-learning
nudge-targets
Weka 3 - Data Mining with Open Source Machine Learning Software in Java
may 2011 by approximatelylinear
Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.
Weka is open source software issued under the GNU General Public License.
java
machine-learning
foss
data-mining
NLP
natural-language-processing
algorithms
ashleyw/phrasie - GitHub
may 2011 by Vaguery
Determines important terms within a given piece of content. It uses linguistic tools such as Parts-Of-Speech (POS) and some simple statistical analysis to determine the terms and their strength.
Ruby
library
tagging
natural-language-processing
NLP
statistics
text-mining