Small World of Words Home
Word association and participant data for 100 primary, secondary and tertiary responses to 12,292 cues. The data published in Behavior Research Methods were collected between 2011 and 2018. In the preprocessed data, cues and responses are normalized by spell-checking, correcting capitalization and Americanizing spelling. The preprocessed file also contains data in which each cue is judged by exactly 100 participants (see the GitHub repository for details).
november 2019 by jerid.francom 
Enron Email Corpus
This dataset contains data from about 150 users, mostly senior management of Enron, organized into folders. The corpus contains a total of about 0.5M messages. This data was originally made public, and posted to the web, by the Federal Energy Regulatory Commission during its investigation.
june 2019 by jerid.francom 
Word associations provided for multiple languages based on human entries
december 2018 by jerid.francom 
CRAN - Package fakeR
R package to simulate datasets from various distributions
october 2018 by jerid.francom 
Simulating study data
R package to simulate datasets with various distributions
october 2018 by jerid.francom 
The aim of this repository is to promote research on the learning of French and Spanish as L2, by making parallel learner corpora for each language freely available to the research community.
october 2018 by jerid.francom 
OECD Statistics
Organization for Economic Co-operation and Development
june 2018 by jerid.francom 
The Switchboard Dialog Act Corpus
A corpus of 1,155 five-minute conversations in American English, comprising 205,000 utterances and 1.4 million words, from the Switchboard corpus of telephone conversations.
november 2017 by jerid.francom 
Sentiment lexicon for Portuguese
november 2017 by jerid.francom 
Data Viz Project
Collection of data visualizations to get inspired and find the right type.
november 2017 by jerid.francom 
Web application for sharing .csv files and making them searchable. Also a potential source of data (if the resource catches on!).
november 2017 by jerid.francom 
R package interface that connects with the Glottolog database and provides additional functionality for linguistic mapping.
october 2017 by jerid.francom 
Learner corpora around the world
A listing of learner corpora around the world
october 2017 by jerid.francom 
English Lexicon Project
Access to a large set of lexical characteristics, along with behavioral data from visual lexical decision and naming studies.
september 2017 by jerid.francom 
R package interface to query the data sharing platform FigShare.
august 2017 by jerid.francom 
The Moby lexicon project
Language wordlists and resources from the Moby project.
september 2016 by jerid.francom 
GoodReads: Webscraping and Text Analysis with R (Part 1)
Inspired by this article about sentiment analysis and this guide to webscraping, I have decided to get my hands dirty by scraping and analyzing a sample of reviews on the
september 2016 by jerid.francom 
Enron Email Dataset
Enron email data from about 150 users, mostly senior management.
september 2016 by jerid.francom 
The Life-Changing Magic of Tidying Text | R-bloggers
When I went to the rOpenSci unconference about a month ago, I started work with Dave Robinson on a package for text mining using tidy data principles. What is
july 2016 by jerid.francom 
ulrich-matter / pvsR
pvsR: Interact with the Project Vote Smart API for scientific research

An R package that facilitates data retrieval from Project Vote Smart's rich online database on US politics via the Project Vote Smart application programming interface (PVS API). The functions in this package cover most PVS API classes and methods and return the requested data as data frames.
july 2016 by jerid.francom 
GREA: The RStudio Add-In to read ALL the data into R! | R-bloggers
Guest post by Stanislaus Stadlmann Have you also been overburdened by the vast selection of R packages to read different filetypes into R? Do you sometimes
july 2016 by jerid.francom 
John Oliver Explains How The Media Distorts Study Results Like ‘A Game Of Telephone’
What shows up as headlines on news sites and TV isn't always an accurate depiction of what scientific studies really find.
may 2016 by jerid.francom 
The advantages of using count() to get N-way frequency tables as data frames in R
Introduction I recently introduced how to use the count() function in the “plyr” package in R to produce 1-way frequency tables in R.  Several
april 2016 by jerid.francom 
How To Find Simple and Interesting Multi-Gigabyte Data Sets - DZone Big Data
Wanna play with some real Big Data? This introduction to machine learning and processing uses Stack Overflow social data for over a terabyte of test data!
october 2015 by jerid.francom 
Competitions | Kaggle
Kaggle is a platform for data prediction competitions. Companies, organizations and researchers post their data and have it scrutinized by the world's best statisticians.
may 2015 by jerid.francom 
Government Releases Massive Trove of Data on Doctors’ Prescribing Patterns
The move follows a ProPublica investigation showing that Medicare did little to find dangerous prescribing by doctors to seniors and the disabled. It is also part of the government’s new push to bring transparency to taxpayer-supported medical care.
may 2015 by jerid.francom 
Hortonworks. We Do Hadoop.
Hortonworks develops, distributes and supports a 100% open source distribution of Apache Hadoop for the enterprise, also training, support & services.
april 2015 by jerid.francom 
LeaRning Path on R - Step by Step Guide to Learn Data Science on R
Learning path on R provides a step by step guide to become a data scientist using R. The path includes exercises, tutorials & best practices
march 2015 by jerid.francom 
