jerid.francom + corpus   103

TED talks as Data
The files in this folder are the data files released as part of the paper "TED talks as Data," submitted to the Journal of Cultural Analytics. The first is the exported CSV (from a Google Sheet) of a list of TED talks maintained by anonymous authors.
corpus  ted  textbook  resources 
5 weeks ago by jerid.francom
PRESEEA is a project to create a corpus of spoken Spanish that is representative of the Spanish-speaking world in its geographic and social variety. The materials are collected with attention to the sociolinguistic diversity of Spanish-speaking speech communities.
corpus  spanish  sociolinguistics 
november 2018 by jerid.francom
Home | BSL Corpus Project
The British Sign Language (BSL) Corpus is a collection of video clips showing Deaf people using BSL, together with background information about the signers and written descriptions of the signing in ELAN
corpus  signlanguage  British  corpora 
november 2018 by jerid.francom
The aim of this repository is to promote research on the learning of French and Spanish as L2, by making parallel learner corpora for each language freely available to the research community.
corpus  learner  spanish  french  textbook  data  corpora 
october 2018 by jerid.francom
Linguist's Search Engine
Web interface to various English corpora
web  corpus  search  textbook 
december 2017 by jerid.francom
Tool to produce annotations for the Open ANC
anc  corpus  annotation  tools  corpora  american  english  spoken  written 
october 2014 by jerid.francom
Distant reading and the blurry edges of genre. | The Stone and the Shell
There are basically two different ways to build collections for distant reading. You can build up collections of specific genres, selecting volumes that you know belong to them. Or you can take an entire digital library as your base collection, and subdivide it by genre. Most people do it the first way, and having just…
linguistics  corpora  corpus  nlp  genre  380  literature 
october 2014 by jerid.francom
The Corpora
The Providence (English) Corpus
The Lyon (French) Corpus
The Demuth Sesotho Corpus
corpora  corpus  children  language  acquisition  brown-university 
october 2014 by jerid.francom
Russian National Corpus
A corpus of the modern Russian language incorporating over 300 million words.
corpus  russian  rnc  380  corpora  textbook  data 
december 2012 by jerid.francom
Text-mining as a Research Tool in the Humanities and Social Sciences
thanks for sharing Ryan #llcu606 MT @rybesh: "Text mining as a research tool" slides are up:
digitalhumanities  digital  humanities  text  computation  corpus  llcu606  corpora 
september 2012 by jerid.francom
Westbury Lab Web Site: Usenet Corpus Download
This corpus is a collection of public USENET postings. It was collected between Oct 2005 and Jan 2011 and covers 47,860 English-language, non-binary-file newsgroups (see the list of newsgroups included with the corpus for details).
English  corpora  corpus  data  dataset  nlp  usenet  textbook 
february 2012 by jerid.francom
Cornell Movie-Dialogs Corpus
A corpus containing a large metadata-rich collection of fictional conversations extracted from raw movie scripts.
ACTIV-ES  Subtitles  corpus  corpora  data  textbook 
february 2012 by jerid.francom
El Habla de la Ciudad de México - Norma Culta y Habla Popular de la ciudad de México
A new resource for sociolinguistic research from a colleague at the Universidad Nacional Autónoma de México
UNAM  linguistics  sociology  spanish  corpus  corpora 
november 2011 by jerid.francom

