SciKit-Learn Laboratory (SKLL) — SciKit-Learn Laboratory 1.0.0 documentation
SKLL (pronounced “skull”) provides a number of utilities to make it simpler to run common scikit-learn experiments with pre-generated features.

scikit python ipython ML teaching 
november 2014 by thadk
DataKind | DataKind Blog
For example, you could make a table where columns are records and rows are variables. Or you could make a table that includes a few rows that are statistics about the other rows. I get confused when I have tables like this.

There's actually a name for this sort of data table: it's called *untidy* data. The first thing that I do when analyzing such a table is convert it into the *tidy* format, where each row is an observation/trial/record and each column is a variable.
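A minimal sketch of that conversion, assuming the untidy table has one row per variable and one column per record. The table contents, the column names, and the `tidy` helper are all hypothetical and only illustrate the transposition.

```python
# Untidy layout (hypothetical data): each row is a variable,
# each column after the first is a record.
untidy = [
    ["variable", "rec1", "rec2", "rec3"],
    ["height_cm", 170, 182, 165],
    ["weight_kg", 65, 80, 58],
]


def tidy(table):
    """Transpose so each row is one record and each column one variable."""
    header, *rows = table
    record_ids = header[1:]
    variables = [row[0] for row in rows]
    # zip(*...) regroups the per-variable value lists into per-record tuples.
    values_by_record = zip(*(row[1:] for row in rows))
    return [
        {"record": record_id, **dict(zip(variables, values))}
        for record_id, values in zip(record_ids, values_by_record)
    ]


tidy_rows = tidy(untidy)
for row in tidy_rows:
    print(row)
```

Summary rows (statistics about the other rows) would need to be dropped or stored separately before this step, since they are not observations.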
tidy data teaching datakind tables observations science statistics clean screenscraping 
april 2014 by thadk
Portia is a tool for visually scraping web sites without any programming knowledge. Just annotate web pages with a point-and-click editor to indicate what data you want to extract, and Portia will learn how to scrape similar pages from the site.
scraping automation screenscraping python 
april 2014 by thadk
Finding structure in xkcd comics with Latent Dirichlet Allocation
An "optimal" number of topics is found using the Bayesian model selection approach (with uniform prior belief on the number of topics) suggested by Griffiths and Steyvers (2004). After an optimal number is decided, topic interpretations and trends over time are explored.
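With a uniform prior on the number of topics T, the posterior P(T | w) is proportional to the evidence P(w | T), so choosing the "optimal" T reduces to comparing log-evidence estimates across candidate values. The sketch below shows only that final comparison step. The function name is hypothetical, and the log-evidence numbers are placeholders, not results from the xkcd analysis; in practice each value would come from a fitted model.

```python
import math


def posterior_over_topic_counts(log_evidence):
    """Map {T: log P(w | T)} to {T: P(T | w)} under a uniform prior on T."""
    # Subtract the maximum before exponentiating to avoid underflow.
    peak = max(log_evidence.values())
    total = sum(math.exp(v - peak) for v in log_evidence.values())
    return {t: math.exp(v - peak) / total for t, v in log_evidence.items()}


# Placeholder log-evidence values for four candidate topic counts.
log_evidence = {10: -1050.0, 20: -1020.0, 40: -1000.0, 80: -1005.0}

posterior = posterior_over_topic_counts(log_evidence)
best_T = max(posterior, key=posterior.get)
print(best_T, posterior[best_T])
```

Because the prior is flat, the selected T is simply the one with the highest log evidence; the normalization only matters if the posterior mass over neighboring values is of interest.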
ggplot2 LDA 
december 2013 by thadk
Ken Shirriff's blog: How Hacker News ranking really works: scoring, controversy, and penalties
On average, about 20% of the articles on the front page have been penalized, while 38% of the articles on the second page have been penalized. (The front page rate is lower since penalized articles are less likely to be on the front page, kind of by definition.) There is a lot more penalization going on than you might expect.
hackernews acceleration reddit frontpage points scoring ranking criticism 
november 2013 by thadk

