corpus   2262

PRESEEA is a project to create a corpus of spoken Spanish that is representative of the Hispanic world in its geographic and social variety. The materials are collected with attention to the sociolinguistic diversity of Spanish-speaking speech communities.
corpus  spanish  sociolinguistics 
6 days ago by jerid.francom
Home | BSL Corpus Project
The British Sign Language (BSL) Corpus is a collection of video clips showing Deaf people using BSL, together with background information about the signers and written descriptions of the signing in ELAN.
corpus  signlanguage  British 
6 days ago by jerid.francom
Common Crawl
We build and maintain an open repository of web crawl data that can be accessed and analyzed by anyone.

Petabytes of crawled data starting from 2011... for free!

The Common Crawl Foundation is a California 501(c)(3) registered non-profit founded by Gil Elbaz with the goal of democratizing access to web information by producing and maintaining an open repository of web crawl data that is universally accessible and analyzable.
crawler  data  opendata  corpus  commoncrawl  project 
6 weeks ago by searchmeister
The aim of this repository is to promote research on the learning of French and Spanish as L2, by making parallel learner corpora for each language freely available to the research community.
corpus  learner  spanish  french  textbook 
6 weeks ago by jerid.francom
Shakespeare plays | Kaggle
"All of Shakespeare's plays, characters, lines, and acts in one CSV"
shakespeare  plays  machinelearning  data  corpus  linguistics  history  literature 
8 weeks ago by sometimesfood
1000 hours of read English Speech.
nlp  corpus  speech  dataset  ML 
10 weeks ago by mootPoint
