SciKit-Learn Laboratory (SKLL) — SciKit-Learn Laboratory 1.0.0 documentation
SKLL (pronounced “skull”) provides a number of utilities to make it simpler to run common scikit-learn experiments with pre-generated features.

scikit  python  ipython  @seniorproject  ML  teaching 
november 2014 by thadk
DataKind | DataKind Blog
For example, you could make a table where columns are records and rows are variables. Or you could make a table that includes a few rows that are statistics about the other rows. I get confused when I have tables like this.

There's actually a name for this sort of data table; it's called *untidy* data, and the first thing that I do when analyzing such a data table is convert it into the *tidy* format, where each row is an observation/trial/record and each column is a variable.
tidy  via:tomlevine  data  teaching  datakind  tables  observations  science  statistics  clean  screenscraping  @seniorproject 
april 2014 by thadk
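The tidy conversion described in the DataKind excerpt above can be sketched in plain Python. This is only an illustration under stated assumptions, not the blog author's code: the sample table, the function names `transpose_to_tidy` and `drop_summary_rows`, and the summary labels are all hypothetical. It handles the two untidy cases the excerpt mentions: a table whose rows are variables and whose columns are records, and entries that are statistics about the other entries.

```python
def transpose_to_tidy(table):
    """Turn a rows-are-variables table into one dict per record.

    Assumed layout (hypothetical): table[0] is ["variable", id1, id2, ...]
    and every later row is [variable_name, value1, value2, ...].
    """
    header, *variable_rows = table
    record_ids = header[1:]
    records = []
    for position, record_id in enumerate(record_ids, start=1):
        record = {"record": record_id}
        for row in variable_rows:
            # row[0] is the variable name; row[position] is this record's value
            record[row[0]] = row[position]
        records.append(record)
    return records


def drop_summary_rows(records, summary_labels=("total", "mean")):
    """Remove entries that are statistics about the other entries."""
    return [
        r for r in records
        if str(r["record"]).lower() not in summary_labels
    ]


# Untidy input: variables run down the rows, records run across the
# columns, and the last column is a summary statistic.
untidy = [
    ["variable", "alice", "bob", "mean"],
    ["height_cm", 170, 180, 175],
    ["weight_kg", 60, 80, 70],
]

tidy_rows = drop_summary_rows(transpose_to_tidy(untidy))
for row in tidy_rows:
    print(row)
```

Each resulting dict is one observation with one key per variable, which is the shape most analysis tools expect. Summary statistics are dropped here rather than kept, on the assumption that they can be recomputed from the tidy rows when needed.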
Portia is a tool for visually scraping web sites without any programming knowledge. Just annotate web pages with a point-and-click editor to indicate what data you want to extract, and Portia will learn how to scrape similar pages from the site.
scraping  automation  screenscraping  @seniorproject  python 
april 2014 by thadk
Finding structure in xkcd comics with Latent Dirichlet Allocation
An "optimal" number of topics is found using the Bayesian model selection approach (with uniform prior belief on the number of topics) suggested by Griffiths and Steyvers (2004). After an optimal number is decided, topic interpretations and trends over time are explored.
ggplot2  @seniorproject  LDA  via:quidlabs 
december 2013 by thadk
Ken Shirriff's blog: How Hacker News ranking really works: scoring, controversy, and penalties
On average, about 20% of the articles on the front page have been penalized, while 38% of the articles on the second page have been penalized. (The front page rate is lower since penalized articles are less likely to be on the front page, kind of by definition.) There is a lot more penalization going on than you might expect.
hackernews  @seniorproject  acceleration  reddit  frontpage  points  scoring  ranking  criticism 
november 2013 by thadk

