jerid.francom + data   185

Small World of Words Home
Word association and participant data for 100 primary, secondary, and tertiary responses to 12,292 cues. The data published in Behavior Research Methods were collected between 2011 and 2018. The preprocessed data normalize cues and responses by spell-checking them, correcting capitalization, and Americanizing their spelling. In addition to normalizing cues and responses, the preprocessed file contains data in which each cue is judged by exactly 100 participants (see the GitHub repository for details).
data  datasets  word-associations  psycholinguistics 
november 2019 by jerid.francom
Enron Email Corpus
This dataset contains data from about 150 users, mostly senior management of Enron, organized into folders. The corpus contains a total of about 0.5M messages. This data was originally made public, and posted to the web, by the Federal Energy Regulatory Commission during its investigation.
data  cmc  textbook  email 
june 2019 by jerid.francom
Small World of Words Home
Word associations for multiple languages, based on responses entered by human participants.
datasets  language  nlp  textbook  data 
december 2018 by jerid.francom
CRAN - Package fakeR
R package to simulate datasets from various distributions
textbook  data  simulation  tidyverse 
october 2018 by jerid.francom
Simulating study data
R package to simulate datasets with various distributions
textbook  data  simulation  datasets  tidyverse 
october 2018 by jerid.francom
The aim of this repository is to promote research on the learning of French and Spanish as second languages (L2) by making parallel learner corpora for each language freely available to the research community.
corpus  learner  spanish  french  textbook  data  corpora 
october 2018 by jerid.francom
OECD Statistics
Organization for Economic Co-operation and Development
data  datasets  world 
june 2018 by jerid.francom
The Switchboard Dialog Act Corpus
A corpus of 1155 5-minute conversations in American English, comprising 205,000 utterances and 1.4 million words, from the Switchboard corpus of telephone conversations.
textbook  data  corpora  switchboard  spoken 
november 2017 by jerid.francom
Sentiment lexicon for Portuguese
r  packages  sentiment  portuguese  data  textbook 
november 2017 by jerid.francom
Data Viz Project
A collection of data visualizations for getting inspired and finding the right chart type.
r  visualization  guide  data  textbook 
november 2017 by jerid.francom
Web application for sharing .csv files and making them searchable. Also a potential source of data (if the resource catches on!).
csv  collaboration  data  datascience 
november 2017 by jerid.francom
An R package that connects to the Glottolog database and provides additional functionality for linguistic mapping.
r  textbook  data  experimental 
october 2017 by jerid.francom
Learner corpora around the world
A listing of learner corpora around the world
textbook  data  listing 
october 2017 by jerid.francom
English Lexicon Project
Access to a large set of lexical characteristics, along with behavioral data from visual lexical decision and naming studies.
data  lexical  lexicon  textbook  psycholinguistics  experimental 
september 2017 by jerid.francom
An R package interface for querying the data-sharing platform FigShare.
r  textbook  figshare  reproducible  research  publishing  data  language  api 
august 2017 by jerid.francom
The Moby lexicon project
Language wordlists and resources from the Moby project.
380  textbook  data  experimental 
september 2016 by jerid.francom
GoodReads: Webscraping and Text Analysis with R (Part 1)
Inspired by this article about sentiment analysis and this guide to webscraping, I have decided to get my hands dirty by scraping and analyzing a sample of reviews on the
r  data  webcrawling  reviews  380 
september 2016 by jerid.francom
Enron Email Dataset
Enron email data from about 150 users, mostly senior management.
data  enron  corpora  dataset  language  380  textbook 
september 2016 by jerid.francom
The Life-Changing Magic of Tidying Text | R-bloggers
When I went to the rOpenSci unconference about a month ago, I started work with Dave Robinson on a package for text mining using tidy data principles. What is
corpora  r  380  data  tidytext  packages 
july 2016 by jerid.francom
ulrich-matter / pvsR
pvsR: Interact with the Project Vote Smart API for scientific research

An R package that facilitates data retrieval from Project Vote Smart's rich online database on US politics via the Project Vote Smart application programming interface (PVS API). The functions in this package cover most PVS API classes and methods and return the requested data as data frames.
data  380  politics 
july 2016 by jerid.francom
GREA: The RStudio Add-In to read ALL the data into R! | R-bloggers
Guest post by Stanislaus Stadlmann. Have you also been overburdened by the vast selection of R packages to read different filetypes into R? Do you sometimes
380  r  tools  software  data  reading 
july 2016 by jerid.francom
John Oliver Explains How The Media Distorts Study Results Like ‘A Game Of Telephone’
What shows up as headlines on news sites and TV isn't always an accurate depiction of what scientific studies really find.
380  data  statistics  science  public 
may 2016 by jerid.francom
The advantages of using count() to get N-way frequency tables as data frames in R
Introduction: I recently introduced how to use the count() function in the “plyr” package in R to produce 1-way frequency tables in R. Several
r  linguistics  380  tools  data  tables  xtabs  plyr 
april 2016 by jerid.francom
How To Find Simple and Interesting Multi-Gigabyte Data Sets - DZone Big Data
Wanna play with some real Big Data? This introduction to machine learning and processing uses Stack Overflow social data for over a terabyte of test data!
data  analysis  spark  big  bigdata 
october 2015 by jerid.francom
Competitions | Kaggle
Kaggle is a platform for data prediction competitions. Companies, organizations and researchers post their data and have it scrutinized by the world's best statisticians.
kaggle  datascience  data  analytics  tutorials 
may 2015 by jerid.francom
Government Releases Massive Trove of Data on Doctors’ Prescribing Patterns
The move follows a ProPublica investigation showing that Medicare did little to find dangerous prescribing by doctors to seniors and the disabled. It is also part of the government’s new push to bring transparency to taxpayer-supported medical care.
data  medical  government  datascience 
may 2015 by jerid.francom
Hortonworks. We Do Hadoop.
Hortonworks develops, distributes and supports a 100% open source distribution of Apache Hadoop for the enterprise, also training, support & services.
hadoop  data  bigdata  development  tutorials 
april 2015 by jerid.francom
LeaRning Path on R - Step by Step Guide to Learn Data Science on R
Learning path on R provides a step-by-step guide to becoming a data scientist using R. The path includes exercises, tutorials, and best practices.
r  learning  learnR  data  mining  visualization  performance  clustering  machinelearning 
march 2015 by jerid.francom

related tags

academic  ACTIV-ES  ads  aggregator  analysis  analytics  anc  annotation  api  apple  applications  arizona  big  bigdata  bigmemory  bnc  book  books  business  buzzdata  career  census  centering  children  cleaning  clustering  cmc  coca  coding  collaboration  community  comp-ling  computation  conferences  corpora  corpus  crawl  crisis  csv  culture  data  data-literacy  data.frame  database  databases  datacamp  datamining  datascience  dataset  datasets  dataviz  demographics  density  development  dh  dh@wake  dialect  dictionary  digitalhumanities  distributed  documentation  downloads  dplyr  dr.who  dumps  ec2  education  effectsize  email  english  enron  eula  europarle  experimental  facebook  ff  figshare  finance  formatting  free  freeware  french  frequency  geolocation  ggplot  ggplot2  gis  git  github  gmail  google  government  grants  guide  hadoop  hdfs  histogram  historic-places  history  humanities  imdb  influence  interactive  interface  international  java  javascript  jobs  journals  k-nn  kaggle  knitr  l2  labs  language  languages  law  learner  learning  learnR  legal  lessons  lexical  lexical-diversity  lexicon  linguistics  linguists  linkeddata  listing  literacy  literature  lme4  lmer  loans  long-wide  lyrics  machine  machinelearning  machine_learning  map  mapping  maps  markdown  marketing  media  medical  meetup  metadata  methodology  methods  mining  mit  mixed.effect.models  models  movies  multilingual  munging  music  named-entity  neh  network  news  newspapers  ngrams  nlp  non-profit  nytimes  ocr  OOP  open  openaccess  opendata  openNLP  opensource  oral-history  package  packages  pandas  pandoc  parallel  parser  pedagogy  performance  plot  plotting  plyr  police  politics  portuguese  privacy  programming  psycholinguistics  public  publishing  pyschology  python  r  RData  reading  reference  refine  repetition  repository  representations  reproducible  research  resources  respositories  reviews  
rmarkdown  rnc  Rstats  russian  science  scientist  scikit  scrape  scraping  search  sentiment  services  shakespeare  shapefile  sharing  shiny  shp  simulation  social  social-good  socialmedia  sociolinguistics  software  source  southern  spanish  spark  speeches  spoken  statistics  studentloans  Subtitles  SUBTLEXuk  summarization  swirl  switchboard  tables  teach  teaching  technology  text  TextAnalytics  textbook  textbooks  tidyr  tidytext  tidyverse  timeline  tm  tokenizer  tools  torrents  transcripts  transformation  treebank  treebanks  tuning  tutorials  twitter  unc  unstructured  usa  usenet  visualization  web  web2.0  webcrawling  wfu  wiki  wikipedia  word  word-associations  word-vectors  word2vec  wordclouds  wordlists  workflow  world  writing  xtabs  zsr 
