xkcd: Online Communities 2
XKCD map of language use; spoken versus computer-mediated.
internet  maps  social-media  textbook  380 
june 2018 by jerid.francom
Gendered Language in Teaching Evaluations
Visualization of men and women instructors on RateMyProfessor
visualization  gender  teaching  course  380 
december 2017 by jerid.francom
Characterizing Twitter followers with tidytext | R-bloggers
Lately, I have been more and more taken with tidy principles of data analysis. They are elegant and make analyses clearer and easier to comprehend. Following
r  r-bloggers  tidytext  twitter  topicModel  sentiment  network-analysis  380  textbook 
june 2017 by jerid.francom
All the fake data that’s fit to print | R-bloggers
charlatan makes fake data. Excited to announce a new package called charlatan. While perusing packages from other programming languages, I saw a neat Python
r  packages  380  textbook 
june 2017 by jerid.francom
How to Contribute a Cheatsheet
Want to contribute a cheatsheet of your own?

We'd like to help you make and share high quality cheatsheets on R topics. The template below provides a useful starting place. It contains tips for designing a three or four column cheatsheet, as well as reusable elements to build your sheet with.
r  cheat-sheet  templates  rstudio  380  textbook 
june 2017 by jerid.francom
A Partial Remedy to the Reproducibility Problem
Several years ago, John Ioannidis jolted the scientific establishment with an article titled, "Why Most Published Research Findings Are False." He had concerns about inattention to statistical power, multiple inference issues and so on. Most people had already been aware of all this, of course, but that conversation opened the floodgates, and many more issues…
reproducible  packages  r  textbook  380 
june 2017 by jerid.francom
Text Mining with R: A Tidy Approach | R-bloggers
About the book This book applies tidy data principles to text analysis. The aim is to present tools to make many text mining tasks easier, more effective, and
textbook  380  tidyr  tidytext 
may 2017 by jerid.francom
R -
Practice having thoughtful conversations about code.
r  380  textbook  exercises  learning 
april 2017 by jerid.francom
Dissecting Trump’s Most Rabid Online Following
We dissected Trump’s most rabid reddit following. Here’s what we found.
380  textbook  lsa  semantic  analysis  politics  2016-elections  trump 
march 2017 by jerid.francom
Extracting Tables from PDFs in R using the Tabulizer Package
Recently I wanted to extract a table from a pdf file so that I could work with the table in R. Specifically, I wanted to get data on layoffs in California from
r  packages  pdf  tables  380 
december 2016 by jerid.francom
How To Become A Data Scientist In 2017
Yesterday LinkedIn released its list of top skills for 2017. Statistical analysis and data mining retained the second spot they held last year while data presentation entered the top 10 for the first time.
r  380  datascience  career  jobs  textbooks 
october 2016 by jerid.francom
Clustering Search Keywords Using K-Means Clustering
One of the key tenets to doing impactful digital analysis is understanding what your visitors are trying to accomplish. One of the easiest methods to do this is
r  380  clustering  machinelearning 
october 2016 by jerid.francom
awesome-machine-learning - A curated list of awesome Machine Learning frameworks, libraries and software.
machinelearning  380  r  tools  software  nlp 
october 2016 by jerid.francom
U.S. Presidential Debates Through the Eyes of a Computer | CrowdFlower
Of the presidential talks for each candidate from the last debate, which moments are most consistent with everything they’ve said up to then?
380  machinelearning  politics  prediction  author  detection 
october 2016 by jerid.francom
Programming languages in highest demand
The programming languages that are in highest demand, from Google's Go to Python to Apple Swift and PHP.
380  news  career  r  datascience 
october 2016 by jerid.francom
Analyzing the first Presidential Debate
A significant chunk of the data that we encounter on a daily basis is available in an unstructured, free text format. Hence, the ability to glean useful bits of
380  r  politics  webcrawling  rvest  stringr 
october 2016 by jerid.francom
The Moby lexicon project
Language wordlists and resources from the Moby project.
380  textbook  data  experimental 
september 2016 by jerid.francom
GoodReads: Webscraping and Text Analysis with R (Part 1)
Inspired by this article about sentiment analysis and this guide to webscraping, I have decided to get my hands dirty by scraping and analyzing a sample of reviews on the
r  data  webcrawling  reviews  380 
september 2016 by jerid.francom
Enron Email Dataset
Enron email data from about 150 users, mostly senior management.
data  enron  corpora  dataset  language  380  textbook 
september 2016 by jerid.francom
Humans may speak a 'universal' language
From nose to knee and red to round, the sounds humans use to construct basic words are similar around the world
150  380  datascience  language  variation 
september 2016 by jerid.francom
A MODERN DIVE into Data with R
Getting away from the traditional introductory statistics curriculum, more focused on reproducible research and modern data analysis techniques and tools
380  textbooks  r  bookdown  statistics  datascience 
september 2016 by jerid.francom
The Data Science Venn Diagram
On Monday I—humbly—joined a group of NYC's most sophisticated thinkers on
all things data for a half-day unconference to help O'Reilly organize their
upcoming Strata conference. The break out sessions were fantastic, and the
number of people in each allowed for outstanding, expert driven,
discussions. One of the best sessions I attended focused on issues related
to teaching data science, which inevitably led to a discussion on the
skills needed to be a fully competent data scientist.

As I have said before, I think the term "data science" is a bit of a
misnomer, but I was very hopeful after this discussion; mostly because of
the utter lack of agreement on what a curriculum on this subject would look
like. The difficulty in defining these skills is that the split between
substance and methodology is ambiguous, and as such it is unclear how to
distinguish among hackers, statisticians, subject matter experts, their
overlaps and where data science fits.

What is clear, however, is that one needs to lea
datascience  r  380  jobs  learning 
august 2016 by jerid.francom
15 Page Tutorial for R
For Beginners in R, here is a 15 page example based tutorial that covers the basics of R. Starting R – Trivial tutorial on how to start R for those just
r  tutorial  380 
august 2016 by jerid.francom
Chi-Squared Test
Before we build stats/machine learning models, it is a good practice to understand which predictors are significant and have an impact on the response variable.
380  r  statistics  chi-squared 
august 2016 by jerid.francom
Text analysis of Trump’s tweets confirms he writes only the (angrier) Android half
I don’t normally post about politics (I’m not particularly savvy about polling, which is where data science has had the largest impact on politics). But this
380  twitter  politics  trump  TextAnalytics 
august 2016 by jerid.francom
The reproducibility crisis in science and prospects for R | R-bloggers
Guest post by Gregorio Santori The results that emerged from a recent Nature’s survey confirm as, for many researchers, we
380  reproducible  research  rmarkdown 
july 2016 by jerid.francom
How Vector Space Mathematics Reveals the Hidden Sexism in Language
As neural networks tease apart the structure of language, they are finding a hidden gender bias that nobody knew was there.
vector-space-models  word2vec  sexism  language  380 
july 2016 by jerid.francom
Why Democrats and Republicans Speak Different Languages. Literally.
The Republican National Convention proved yet again that the GOP talks about America and U.S. policy with an entirely unique vocabulary. It hasn’t always been this way.
380  news  politics  stylistics  naive_bayes  language_model  prediction  machine_learning 
july 2016 by jerid.francom
Data Science at the Command Line
We data scientists love to create exciting data visualizations and insightful statistical models. However, before we get to that point, usually much effort goes into obtaining, scrubbing, and exploring the required data. The command line, although invented decades ago, is an amazing environment for performing such data science tasks. By combining small, yet powerful, command-line tools you can quickly explore your data and hack together prototypes. New tools such as parallel, jq, and csvkit allow you to use the command line for today's data challenges. Even if you're already comfortable processing data with, say, R or Python, being able to also leverage the power of the command line can make you a more productive and efficient data scientist.
cli  tools  unix  380  datascience 
july 2016 by jerid.francom
The Life-Changing Magic of Tidying Text | R-bloggers
When I went to the rOpenSci unconference about a month ago, I started work with Dave Robinson on a package for text mining using tidy data principles. What is
corpora  r  380  data  tidytext  packages 
july 2016 by jerid.francom
Does sentiment analysis work? A tidy analysis of Yelp reviews | R-bloggers
This year Julia Silge and I released the tidytext package for text mining using tidy tools such as dplyr, tidyr, ggplot2 and broom. One of the canonical
packages  tidytext  r  380  tasks  yelp  socialmedia 
july 2016 by jerid.francom
How Melania Trump’s Speech Veered Off Course and Caused an Uproar

“Ninety-three percent of the speech is completely different,” declared Gov. Chris Christie of New Jersey. Paul Manafort, Mr. Trump’s campaign chairman, pegged the number of suspicious words at 50. “And that includes ‘ands’ and ‘thes’ and things like that,” he said on Tuesday.
quotes  380 
july 2016 by jerid.francom
ulrich-matter / pvsR
pvsR: Interact with the Project Vote Smart API for scientific research

An R package that facilitates data retrieval from Project Vote Smart's rich online database on US politics via the Project Vote Smart application programming interface (PVS API). The functions in this package cover most PVS API classes and methods and return the requested data in data frames.
data  380  politics 
july 2016 by jerid.francom
GREA: The RStudio Add-In to read ALL the data into R! | R-bloggers
Guest post by Stanislaus Stadlmann Have you also been overburdened by the vast selection of R packages to read different filetypes into R? Do you sometimes
380  r  tools  software  data  reading 
july 2016 by jerid.francom
The Mathematics of Machine Learning | R-bloggers
This post was first published on my LinkedIn page and posted here as a contributed post. In the last few months, I have had several people contact me
machinelearning  statistics  380 
july 2016 by jerid.francom
