jerid.francom + annotation   10

BOLT English Treebank - Discussion Forum - Linguistic Data Consortium
The source data is English discussion forum web text collected by LDC in 2011 and 2012. A subset of that collection -- 702 files representing 268,907 tokens -- was selected for the treebank and annotated for word-level tokenization, part-of-speech and syntactic structure.

Data is presented in a variety of UTF-8 encoded text formats: plain text, XML, and Penn Treebank. See the included documentation for more information about specific formats.
corpus  textbook  english  treebank  constituency  penn-annotation  annotation 
yesterday by jerid.francom
R package providing a set of fast tools for converting a textual corpus into a set of normalized tables. Users may make use of a Python backend with 'spaCy' or the Java backend 'CoreNLP'.
r  packages  nlp  annotation 
october 2017 by jerid.francom
Tool to produce annotations for the Open ANC
anc  corpus  annotation  tools  corpora  american  english  spoken  written 
october 2014 by jerid.francom
