[1612.04118] Information Extraction with Character-level Neural Networks and Free Noisy Supervision
We present an architecture for information extraction from text that augments an existing parser with a character-level neural network. The network is trained using a measure of consistency of extracted data with existing databases as a form of noisy supervision. Our architecture combines the ability of constraint-based information extraction systems to easily incorporate domain knowledge and constraints with the ability of deep neural networks to leverage large amounts of data to learn complex features. Boosting the existing parser's precision, the system led to large improvements over a mature and highly tuned constraint-based production information extraction system used at Bloomberg for financial language text.
This is an implementation of the TextRank algorithm for keyword extraction from documents. It adapts the PageRank algorithm to documents and was originally published in this article.

Intuitively, it builds a graph of words linked by the number of times they appear in the same context (here, the same sentence). Then it finds the words that are most central in this graph, i.e. those that appear in context with as many other words as possible from separate parts of the graph. To refine this further, it performs part-of-speech tagging on all the debates and takes into account only nouns, as these are known to be most distinctive for summarization purposes. Then, a chunker identifies names like ‘Wall Street’ or ‘New York’ and collocations such as ‘ballistic missile’ or ‘coal miner’. Finally, it outputs lemmatized words in order to merge words with the same lemma, such as ‘republican’ and ‘republicans’.
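The graph-and-centrality idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation referred to here: the function name `textrank_keywords` and its parameters are hypothetical, tokenization is a naive regular expression, and the part-of-speech filtering, chunking and lemmatization steps are left out (an optional stopword set stands in for the noun filter).

```python
import re
from collections import defaultdict
from itertools import combinations


def textrank_keywords(text, top_n=5, damping=0.85, iterations=50,
                      stopwords=frozenset()):
    """Rank words by weighted PageRank over a sentence co-occurrence graph.

    Hypothetical helper for illustration only; no POS tagging, chunking
    or lemmatization is performed.
    """
    # Naive sentence split, then lowercase alphabetic tokens.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    # weights[a][b] = number of sentences in which a and b both appear.
    weights = defaultdict(lambda: defaultdict(float))
    for sentence in sentences:
        words = sorted({w for w in re.findall(r"[a-z]+", sentence.lower())
                        if w not in stopwords})
        for a, b in combinations(words, 2):
            weights[a][b] += 1.0
            weights[b][a] += 1.0

    nodes = list(weights)
    if not nodes:
        return []

    # Total edge weight per node, used to normalize outgoing votes.
    strength = {n: sum(weights[n].values()) for n in nodes}

    # Power iteration of the weighted PageRank update.
    scores = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_scores = {}
        for n in nodes:
            rank = sum(scores[m] * w / strength[m]
                       for m, w in weights[n].items())
            new_scores[n] = (1 - damping) / len(nodes) + damping * rank
        scores = new_scores

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(nodes, key=lambda n: (-scores[n], n))[:top_n]
```

Calling `textrank_keywords(text, top_n=10, stopwords={"the", "a", "of"})` on a document returns its ten most central words under these simplifying assumptions. A fuller version would restrict the graph to nouns and merge lemmas before ranking, as the description above outlines.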
TinyQueries™ - Home
TinyQueries is a framework for extracting data out of relational databases.
"7-Zip is a file archiver with a high compression ratio."
