@seniorproject   165


SciKit-Learn Laboratory (SKLL) — SciKit-Learn Laboratory 1.0.0 documentation
SKLL (pronounced “skull”) provides a number of utilities to make it simpler to run common scikit-learn experiments with pre-generated features.

refrr:https://t.co/qnTXhOhs83
scikit  python  ipython  @seniorproject  ML  teaching 
november 2014 by thadk
DataKind | DataKind Blog
refrr:http://t.co/5OrWnfXKXC
For example, you could make a table where columns are records and rows are variables. Or you could make a table that includes a few rows that are statistics about the other rows. I get confused when I have tables like this.

There's actually a name for this sort of data table: it's called *untidy* data. The first thing that I do when analyzing such a table is convert it into the *tidy* format, where each row is an observation/trial/record and each column is a variable.
tidy  via:tomlevine  data  teaching  datakind  tables  observations  science  statistics  clean  screenscraping  @seniorproject 
april 2014 by thadk
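
The untidy-to-tidy conversion described in the DataKind entry above can be sketched in plain Python. This is only an illustration under stated assumptions: the table contents, the `untidy` layout, and the `tidy` helper are all hypothetical and do not come from the linked post.

```python
# Hypothetical untidy table: each row is a variable, each column is a record,
# and the last row is a summary statistic about another row.
untidy = {
    "header": ["variable", "trial_1", "trial_2", "trial_3"],
    "rows": [
        ["temperature", 20.5, 21.0, 19.8],
        ["pressure", 101.2, 100.9, 101.5],
        ["mean_temperature", 20.43, None, None],
    ],
}


def tidy(table, summary_rows=("mean_temperature",)):
    """Return one dict per record, with one key per variable."""
    record_ids = table["header"][1:]
    records = [{"record": rid} for rid in record_ids]
    for name, *values in table["rows"]:
        if name in summary_rows:
            continue  # derived statistics are not observations, so drop them
        for record, value in zip(records, values):
            record[name] = value
    return records


tidy_rows = tidy(untidy)
for row in tidy_rows:
    print(row)
```

Each resulting dict is one observation, and each key is one variable, which is the shape most analysis tools expect. Summary rows are dropped here because they can be recomputed from the tidy records.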
scrapinghub/portia
refrr:https://github.com/scrapinghub/portia
Portia is a tool for visually scraping web sites without any programming knowledge. Just annotate web pages with a point and click editor to indicate what data you want to extract, and portia will learn how to scrape similar pages from the site.
scraping  automation  screenscraping  @seniorproject  python 
april 2014 by thadk
Finding structure in xkcd comics with Latent Dirichlet Allocation
An "optimal" number of topics is found using the Bayesian model selection approach (with uniform prior belief on the number of topics) suggested by Griffiths and Steyvers (2004). After an optimal number is decided, topic interpretations and trends over time are explored.
ggplot2  @seniorproject  LDA  via:quidlabs 
december 2013 by thadk
Ken Shirriff's blog: How Hacker News ranking really works: scoring, controversy, and penalties
On average, about 20% of the articles on the front page have been penalized, while 38% of the articles on the second page have been penalized. (The front page rate is lower since penalized articles are less likely to be on the front page, kind of by definition.) There is a lot more penalization going on than you might expect.
hackernews  @seniorproject  acceleration  reddit  frontpage  points  scoring  ranking  criticism 
november 2013 by thadk
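
The excerpt above mentions penalties but does not reproduce a scoring formula. As a rough sketch, assume the commonly cited form in which a story's score is `(votes - 1) ** 0.8 / (age_hours + 2) ** 1.8`, multiplied by a penalty factor at or below 1. The exponents, the function name `hn_score`, and the example penalty value are assumptions made for illustration, not figures taken from this page.

```python
def hn_score(votes, age_hours, penalty=1.0, vote_exp=0.8, gravity=1.8):
    """Sketch of a time-decayed ranking score (assumed form, see lead-in).

    votes      -- total upvotes, including the submitter's own
    age_hours  -- hours since submission
    penalty    -- multiplier in (0, 1]; 1.0 means the story is not penalized
    """
    base = max(votes - 1, 0) ** vote_exp
    return base / (age_hours + 2) ** gravity * penalty


# Two hypothetical stories with the same votes and age; one is penalized.
plain = hn_score(votes=101, age_hours=2)
penalized = hn_score(votes=101, age_hours=2, penalty=0.4)
print(plain > penalized)
```

Under this assumed form, a penalized story ranks below an otherwise identical unpenalized one, which is consistent with penalized articles being less likely to appear on the front page.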
