gnat + bigdata   4

A Programmer's Guide to Data Mining | The Ancient Art of the Numerati
A guide to practical data mining, collective intelligence, and building recommendation systems
book  data  machinelearning  python  programming  bigdata 
november 2012 by gnat
graphchi - Big Data - small machine - Google Project Hosting
GraphChi can run very large graph computations on just a single machine, by using a novel algorithm for processing the graph from disk (SSD or hard drive). Programs for GraphChi are written in the vertex-centric model, proposed by GraphLab and Google's Pregel. GraphChi runs vertex-centric programs asynchronously (i.e., changes written to edges are immediately visible to subsequent computation), and in parallel. GraphChi also supports streaming graph updates and removal of edges from the graph. Section 'Performance' contains some examples of applications implemented for GraphChi and their running times on GraphChi.
open  source  algorithms  bigdata  graph 
november 2012 by gnat
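Note on the GraphChi entry above: the vertex-centric, asynchronous model it describes can be illustrated with a small sketch. This is plain Python, not the GraphChi API; the function name `connected_components` and the in-memory `edge_values` list are hypothetical stand-ins, and the loop runs sequentially, whereas GraphChi processes the graph from disk and in parallel. Each vertex update reads the values on its incident edges and writes its new label back to them, so later updates in the same sweep see the change immediately.

```python
def connected_components(num_vertices, edges):
    """Label each vertex with the smallest vertex id in its component.

    Illustrative only: a sequential, in-memory imitation of an
    asynchronous vertex-centric program (not GraphChi's actual API).
    `edges` is a list of undirected (u, v) pairs.
    """
    # Map each vertex to the indices of its incident edges.
    incident = {v: [] for v in range(num_vertices)}
    for idx, (u, v) in enumerate(edges):
        incident[u].append(idx)
        incident[v].append(idx)

    labels = list(range(num_vertices))
    # Each edge carries the smallest label seen so far at its endpoints.
    edge_values = [min(u, v) for u, v in edges]

    changed = True
    while changed:
        changed = False
        for v in range(num_vertices):
            # Vertex update: read incident edge values ...
            new_label = min([labels[v]] + [edge_values[i] for i in incident[v]])
            if new_label != labels[v]:
                labels[v] = new_label
                changed = True
            # ... and write the result back to the edges in place, so
            # vertices updated later in this sweep already see it.
            for i in incident[v]:
                if edge_values[i] != new_label:
                    edge_values[i] = new_label
                    changed = True
    return labels


if __name__ == "__main__":
    print(connected_components(6, [(0, 1), (1, 2), (3, 4)]))
```

A synchronous variant would instead read from a snapshot of the previous sweep's edge values, which typically needs more sweeps to converge.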
Classifier Technology and the Illusion of Progress
A great many tools have been developed for supervised classification, ranging from early methods such as linear discriminant analysis through to modern developments such as neural networks and support vector machines. A large number of comparative studies have been conducted in attempts to establish the relative superiority of these methods. This paper argues that these comparisons often fail to take into account important aspects of real problems, so that the apparent superiority of more sophisticated methods may be something of an illusion. In particular, simple methods typically yield performance almost as good as more sophisticated methods, to the extent that the difference in performance may be swamped by other sources of uncertainty that generally are not considered in the classical supervised classification
ai  machinelearning  cs  bigdata 
may 2012 by gnat
