jerid.francom + 380   321

xkcd: Online Communities 2
XKCD map of language use; spoken versus computer-mediated.
internet  maps  social-media  textbook  380 
june 2018 by jerid.francom
Gendered Language in Teaching Evaluations
Visualization of men and women instructors on RateMyProfessor
visualization  gender  teaching  course  380 
december 2017 by jerid.francom
Characterizing Twitter followers with tidytext | R-bloggers
Lately, I have been more and more taken with tidy principles of data analysis. They are elegant and make analyses clearer and easier to comprehend. Following
r  r-bloggers  tidytext  twitter  topicModel  sentiment  network-analysis  380  textbook 
june 2017 by jerid.francom
All the fake data that’s fit to print | R-bloggers
charlatan makes fake data. Excited to announce a new package called charlatan. While perusing packages from other programming languages, I saw a neat Python
r  packages  380  textbook 
june 2017 by jerid.francom
How to Contribute a Cheatsheet
Want to contribute a cheatsheet of your own?

We'd like to help you make and share high quality cheatsheets on R topics. The template below provides a useful starting place. It contains tips for designing a three or four column cheatsheet, as well as reusable elements to build your sheet with.
r  cheat-sheet  templates  rstudio  380  textbook 
june 2017 by jerid.francom
A Partial Remedy to the Reproducibility Problem
Several years ago, John Ioannidis jolted the scientific establishment with an article titled, "Why Most Published Research Findings Are False." He had concerns about inattention to statistical power, multiple inference issues and so on. Most people had already been aware of all this, of course, but that conversation opened the floodgates, and many more issues…
reproducible  packages  r  textbook  380 
june 2017 by jerid.francom
Text Mining with R: A Tidy Approach | R-bloggers
About the book: This book applies tidy data principles to text analysis. The aim is to present tools to make many text mining tasks easier, more effective, and
textbook  380  tidyr  tidytext 
may 2017 by jerid.francom
R -
Practice having thoughtful conversations about code.
r  380  textbook  exercises  learning 
april 2017 by jerid.francom
Dissecting Trump’s Most Rabid Online Following
We dissected Trump’s most rabid reddit following. Here’s what we found.
380  textbook  lsa  semantic  analysis  politics  2016-elections  trump 
march 2017 by jerid.francom
Extracting Tables from PDFs in R using the Tabulizer Package
Recently I wanted to extract a table from a pdf file so that I could work with the table in R. Specifically, I wanted to get data on layoffs in California from
r  packages  pdf  tables  380 
december 2016 by jerid.francom
How To Become A Data Scientist In 2017
Yesterday LinkedIn released its list of top skills for 2017. Statistical analysis and data mining retained the second spot they held last year while data presentation entered the top 10 for the first time.
r  380  datascience  career  jobs  textbooks 
october 2016 by jerid.francom
Clustering Search Keywords Using K-Means Clustering
One of the key tenets to doing impactful digital analysis is understanding what your visitors are trying to accomplish. One of the easiest methods to do this is
r  380  clustering  machinelearning 
october 2016 by jerid.francom
awesome-machine-learning - A curated list of awesome Machine Learning frameworks, libraries and software.
machinelearning  380  r  tools  software  nlp 
october 2016 by jerid.francom
U.S. Presidential Debates Through the Eyes of a Computer | CrowdFlower
Of the presidential talks for each candidate from the last debate, which moments are most consistent with everything they’ve said up to then?
380  machinelearning  politics  prediction  author  detection 
october 2016 by jerid.francom
Programming languages in highest demand
The programming languages that are in highest demand, from Google's Go to Python to Apple Swift and PHP.
380  news  career  r  datascience 
october 2016 by jerid.francom
Analyzing the first Presidential Debate
A significant chunk of the data that we encounter on a daily basis is available in an unstructured, free text format. Hence, the ability to glean useful bits of
380  r  politics  webcrawling  rvest  stringr 
october 2016 by jerid.francom
The Moby lexicon project
Language wordlists and resources from the Moby project.
380  textbook  data  experimental 
september 2016 by jerid.francom
GoodReads: Webscraping and Text Analysis with R (Part 1)
Inspired by this article about sentiment analysis and this guide to webscraping, I have decided to get my hands dirty by scraping and analyzing a sample of reviews on the
r  data  webcrawling  reviews  380 
september 2016 by jerid.francom
Enron Email Dataset
Enron email data from about 150 users, mostly senior management.
data  enron  corpora  dataset  language  380  textbook 
september 2016 by jerid.francom
Humans may speak a 'universal' language
From nose to knee and red to round, the sounds humans use to construct basic words are similar around the world
150  380  datascience  language  variation 
september 2016 by jerid.francom
A MODERN DIVE into Data with R
Getting away from the traditional introductory statistics curriculum, more focused on reproducible research and modern data analysis techniques and tools
380  textbooks  r  bookdown  statistics  datascience 
september 2016 by jerid.francom
The Data Science Venn Diagram
On Monday I, humbly, joined a group of NYC's most sophisticated thinkers on all things data for a half-day unconference to help O'Reilly organize their upcoming Strata conference. The breakout sessions were fantastic, and the number of people in each allowed for outstanding, expert-driven discussions. One of the best sessions I attended focused on issues related to teaching data science, which inevitably led to a discussion on the skills needed to be a fully competent data scientist.

As I have said before, I think the term "data science" is a bit of a misnomer, but I was very hopeful after this discussion, mostly because of the utter lack of agreement on what a curriculum on this subject would look like. The difficulty in defining these skills is that the split between substance and methodology is ambiguous, and as such it is unclear how to distinguish among hackers, statisticians, subject matter experts, their overlaps, and where data science fits.

What is clear, however, is that one needs to lea
datascience  r  380  jobs  learning 
august 2016 by jerid.francom
15 Page Tutorial for R
For Beginners in R, here is a 15 page example based tutorial that covers the basics of R. Starting R – Trivial tutorial on how to start R for those just
r  tutorial  380 
august 2016 by jerid.francom
Chi-Squared Test
Before we build stats/machine learning models, it is good practice to understand which predictors are significant and have an impact on the response variable.
380  r  statistics  chi-squared 
august 2016 by jerid.francom
Text analysis of Trump’s tweets confirms he writes only the (angrier) Android half
I don’t normally post about politics (I’m not particularly savvy about polling, which is where data science has had the largest impact on politics). But this
380  twitter  politics  trump  TextAnalytics 
august 2016 by jerid.francom
The reproducibility crisis in science and prospects for R | R-bloggers
Guest post by Gregorio Santori. The results that emerged from a recent Nature survey confirm that, for many researchers, we
380  reproducible  research  rmarkdown 
july 2016 by jerid.francom
How Vector Space Mathematics Reveals the Hidden Sexism in Language
As neural networks tease apart the structure of language, they are finding a hidden gender bias that nobody knew was there.
vector-space-models  word2vec  sexism  language  380 
july 2016 by jerid.francom
Why Democrats and Republicans Speak Different Languages. Literally.
The Republican National Convention proved yet again that the GOP talks about America and U.S. policy with an entirely unique vocabulary. It hasn’t always been this way.
380  news  politics  stylistics  naive_bayes  language_model  prediction  machine_learning 
july 2016 by jerid.francom
Data Science at the Command Line
We data scientists love to create exciting data visualizations and insightful statistical models. However, before we get to that point, usually much effort goes into obtaining, scrubbing, and exploring the required data. The command line, although invented decades ago, is an amazing environment for performing such data science tasks. By combining small, yet powerful, command-line tools you can quickly explore your data and hack together prototypes. New tools such as parallel, jq, and csvkit allow you to use the command line for today's data challenges. Even if you're already comfortable processing data with, say, R or Python, being able to also leverage the power of the command line can make you a more productive and efficient data scientist.
cli  tools  unix  380  datascience 
july 2016 by jerid.francom
The Life-Changing Magic of Tidying Text | R-bloggers
When I went to the rOpenSci unconference about a month ago, I started work with Dave Robinson on a package for text mining using tidy data principles. What is
corpora  r  380  data  tidytext  packages 
july 2016 by jerid.francom
Does sentiment analysis work? A tidy analysis of Yelp reviews | R-bloggers
This year Julia Silge and I released the tidytext package for text mining using tidy tools such as dplyr, tidyr, ggplot2 and broom. One of the canonical
packages  tidytext  r  380  tasks  yelp  socialmedia 
july 2016 by jerid.francom
How Melania Trump’s Speech Veered Off Course and Caused an Uproar

“Ninety-three percent of the speech is completely different,” declared Gov. Chris Christie of New Jersey. Paul Manafort, Mr. Trump’s campaign chairman, pegged the number of suspicious words at 50. “And that includes ‘ands’ and ‘thes’ and things like that,” he said on Tuesday.
quotes  380 
july 2016 by jerid.francom
ulrich-matter / pvsR
pvsR: Interact with the Project Vote Smart API for scientific research

An R package that facilitates data retrieval from Project Vote Smart's rich online database on US politics via the Project Vote Smart application programming interface (PVS API). The functions in this package cover most PVS API classes and methods and return the requested data in data frames.
data  380  politics 
july 2016 by jerid.francom
GREA: The RStudio Add-In to read ALL the data into R! | R-bloggers
Guest post by Stanislaus Stadlmann Have you also been overburdened by the vast selection of R packages to read different filetypes into R? Do you sometimes
380  r  tools  software  data  reading 
july 2016 by jerid.francom
The Mathematics of Machine Learning | R-bloggers
This post was first published on my Linkedin page and posted here as a contributed post. In the last few months, I have had several people contact me
machinelearning  statistics  380 
july 2016 by jerid.francom

related tags

2016-elections  academia  academic  acquisition  adbusters  advanced  aggregator  AI  algorithm  amazon  american  analysis  analytics  anaphora  anc  annotated  annotation  apa  api  applications  apps  archives  artificial-intelligence  ASL  ausl  author  aws  base-graphics  basic  bibliography  bibtex  bigdata  bnc  bncweb  book  bookdown  books  bookscanning  bots  british  brown  bsl  buckeye  career  caret  cheat-sheet  chi-squared  childes  children  christine  chunking  citations  citr  classification  classroom  cli  cloud  clustering  code  coding  colleagues  command-line  comparable  computation  computational  computer-science  computers  computing  concordance  concordancer  concordancers  constitution  contingency  conversation  coreNLP  corpora  corpus  corpus-linguistics  correlation  course  courses  crowdsourcing  culture  culturomics  data  data-literacy  data.table  database  datacamp  datamining  datascience  dataset  death  delicious  demo  design  detection  dh  dialectology  dialog  dialogue  dictionaries  dictionary  digital  digitalhumanities  digital_humanities  distant  distribution  downloads  dplyr  dumps  ECHO  education  english  enron  environment  epic  epidemics  europarle  events  evolution  examples  excel  exercises  experimental  experiments  facebook  fee-based  flipped-classroom  food  forensic  forensics  formatting  free  freebase  french  frequency  gender  genre  geolocation  ggmap  ggplot2  gitbook  glossary  google  government  graphics  gui  historical  history  howto  html  humanities  humor  HunPos  identification  idioms  information  install  interactive  interdisciplinary  interface  internet  internetarchive  interpreting  ipa  ipython  java  jobs  keyboard  keyboard-shortcuts  language  languages  language_model  latex  law  ldc  learner  learning  learnR  legal  lemma  less-resourced  lessons  lexical  lexicology  lexicon  library  life  linguistic  linguistics  linguists  listing  literacy  literature  
logistic  lsa  lucy  lyrics  machinelearning  machine_learning  magrittr  manual  mapping  maps  marketing  McEnery  measures  mechanicalturk  media  memory  metadata  methodology  methods  metrics  mining  misconduct  mlk  modeling  monitoring  mooc  multilingual  munging  music  naive_bayes  named-entity  nbviewer  neh  network-analysis  new-york  news  ngram-viewer  ngrams  nlp  nltk  notebook  nwav41  online  openlibrary  openNLPmodels  opensource  overview  p-hacking  p-values  package  packages  papers  parallel  parliament  parser  parsing  part-of-speech  pc  pdf  pedagogy  Penn  performance  perl  pipeR  plots  plyr  politics  portuguese  pos  power-laws  practice  prediction  prescriptivist  processing  productivity  programming  project  pronouns  psychology  public  publication  publicdomain  publishing  python  qdap  quotes  r  r-bloggers  rae  rattle  readability  reading  reference  regex  regional  regression  reporting  repository  reproducible  research  researchers  resource  resources  respositories  retractions  retrieval  reviews  revolution  RGoogleDocs  rmarkdown  rnc  rproject  rstudio  russian  rvest  sas  science  scientist  scraping  search  selecting-text  semantic  semanticweb  sentiment  server  sexism  shakespeare  shiny  sign  signlanguage  singularity  slang  social  social-media  socialmedia  sockington  software  spanglish  spanish  speech  spoken  ssl  standards  stanford  statistics  stemmer  stopwords  storage  stringr  stylistics  subtitles  summarization  susanne  swirl  syllabus  syntax  tables  tagger  tagging  tagset  talkbank  tasks  teach  teaching  technology  tei  templates  terminology  text  TextAnalytics  textbook  textbooks  texting  textmining  thesaurus  tidyr  tidytext  tm  tokenzation  tool  tools  topicModel  tracking  transcription  transcripts  translation  treebank  treebanks  trending  trump  tutorial  tutorials  twitter  unix  unstructured  unsupervised  usa  users  valibel  variation  
vector-space-models  vectors  venn  via:zite  videos  viewer  vignettes  visualization  vsauce  wallstreet  web  web2.0  webcrawling  wedding  wfu  wikipedia  windows  word  word2vec  wordclouds  wordle  wordlists  wordnet  words  world  writing  xml  xtabs  yelp  zipf 
