Cedel2 | Corpus Escrito del Español como L2
CEDEL2 stands for Corpus Escrito del Español como L2 ‘L2 Spanish Written Corpus’.
CEDEL2 (version 1.0) is a large database containing the language produced by learners of Spanish as a second/foreign language (L2). This database is called a linguistic ‘corpus’.
CEDEL2 contains the language produced by English-speaking natives who are learning Spanish. It also contains a subcorpus of Greek-speaking learners of Spanish. For future versions of the corpus, we are also collecting data from Japanese-speaking learners of Spanish.
CEDEL2 also contains a subcorpus of Spanish native speakers for comparative purposes.
corpus  spanish  language  acquisition  textbook 
4 days ago
BOLT English Treebank - Discussion Forum - Linguistic Data Consortium
The source data is English discussion forum web text collected by LDC in 2011 and 2012. A subset of that collection -- 702 files representing 268,907 tokens -- was selected for the treebank and annotated for word-level tokenization, part-of-speech and syntactic structure.

Data is presented in a variety of UTF-8 encoded text formats: plain text, XML, and Penn Treebank. See the included documentation for more information about specific formats.
corpus  textbook  english  treebank  constituency  penn-annotation  annotation 
5 days ago
Bootstrap Live Customizer (3.3.5)
This is a live customizer for Bootstrap (aka a Bootstrap Themeroller), very similar to Bootstrap's customizer (it works with the same variables), but here the results of the edits are visible live on this page. The theme.less file is also editable which can be used to overwrite any style rule (the editable less variables can be used in theme.less as well).
r  rstudio  website  bootstrap  css 
8 days ago
Project Environments • renv
The renv package helps you create reproducible environments for your R projects. Use renv to make your R projects more:

Isolated: Installing a new or updated package for one project won’t break your other projects, and vice versa. That’s because renv gives each project its own private package library.

Portable: Easily transport your projects from one computer to another, even across different platforms. renv makes it easy to install the packages your project depends on.

Reproducible: renv records the exact package versions you depend on, and ensures those exact versions are the ones that get installed wherever you go.
package  r  packrat  renv  reproducible  research 
27 days ago
TED talks as Data
The files in this folder are the data files released as part of the paper, "TED talks as Data," submitted to the Journal of Cultural Analytics. The first is the exported CSV (from a Google sheet) of a list of TED talks maintained by anonymous authors
corpus  ted  textbook  resources 
9 weeks ago
usethis workflow for package development
In this blogpost I’ll outline the basic workflow you can acquire using the tools in usethis. More specifically, I’ll outline a workflow for R package development.
r  packages  development  tutorials 
july 2019
Yadkin River Adventures
Kayaking service on the Yadkin River.
kayak  nc  winston 
july 2019
Enron Email Corpus
This dataset contains data from about 150 users, mostly senior management of Enron, organized into folders. The corpus contains a total of about 0.5M messages. This data was originally made public, and posted to the web, by the Federal Energy Regulatory Commission during its investigation.
data  cmc  textbook  email 
june 2019
Penn Discourse Treebank Version 3.0
Discourse Treebank for the Wall Street Journal section of the Penn Treebank (version 2)
corpora  LDC  english 
june 2019
RT : The most popular word in each state
from twitter_favs
may 2019
Small World of Words Home
Word associations provided for multiple languages based on human entries
datasets  language  nlp  textbook  data 
december 2018
PRESEEA is a project to create a corpus of spoken Spanish that is representative of the Hispanic world in its geographic and social variety. The materials are gathered with attention to the sociolinguistic diversity of Spanish-speaking speech communities.
corpus  spanish  sociolinguistics 
november 2018
Home | BSL Corpus Project
The British Sign Language (BSL) Corpus is a collection of video clips showing Deaf people using BSL, together with background information about the signers and written descriptions of the signing in ELAN
corpus  signlanguage  British  corpora 
november 2018
Coursicle | WFU
Explore classes and plan schedules for courses at WFU
wfu  classes  scheduling 
october 2018
CRAN - Package fakeR
R package to simulate datasets from various distributions
textbook  data  simulation  tidyverse 
october 2018
Simulating study data
R package to simulate datasets with various distributions
textbook  data  simulation  datasets  tidyverse 
october 2018
The aim of this repository is to promote research on the learning of French and Spanish as L2, by making parallel learner corpora for each language freely available to the research community.
corpus  learner  spanish  french  textbook  data  corpora 
october 2018
Radix for R Markdown
Web publishing with R Markdown
Rmarkdown  r  publishing  radix 
september 2018
Rachael Tatman | Kaggle
A great series of tutorials on various aspects of R and doing text analytics with R.
textbook  tutorials  r  nlp  textmining  transformation  modeling 
september 2018
Tutorials on Advanced Stats and Machine Learning With R
A good introduction to ggplot plotting and regression models for data science.
datascience  r  statistics  tutorial  textbook 
july 2018
