jm + graphs + big-data   2

Sorting out graph processing
Some nice real-world experimentation around large-scale data processing in differential dataflow:
If you wanted to do an iterative graph computation like PageRank, it would literally be faster to sort the edges from scratch each and every iteration, than to use unsorted edges. If you want to do graph computation, please sort your edges.

Actually, you know what: if you want to do any big data computation, please sort your records. Stop talking sass about how Hadoop sorts things it doesn't need to, read some papers, run some tests, and then sort your damned data. Or at least run faster than me when I sort your data for you.
algorithms  graphs  coding  data-processing  big-data  differential-dataflow  radix-sort  sorting  x-stream  counting-sort  pagerank 
august 2015 by jm
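Why sorting pays off: once edges are ordered by source, each vertex's out-edges sit next to each other, so a PageRank iteration becomes one sequential pass instead of random jumps through memory. Below is a minimal sketch of that layout in Python, assuming vertices are the integers 0..n-1. It is not the code from the linked post, and the function names are hypothetical. A counting sort (one of the tags above) puts the edges in source order in O(m + n) time and also yields per-vertex offsets, CSR-style. PageRank then scans each vertex's contiguous range.

```python
# Sketch only: hypothetical helpers, not the linked post's implementation.
# Assumes vertices are integers 0..n-1 and edges are (src, dst) tuples.

def counting_sort_by_source(edges, n):
    """Stable counting sort of (src, dst) pairs by src in O(len(edges) + n).

    Returns (sorted_edges, offsets): the out-edges of vertex v occupy
    sorted_edges[offsets[v]:offsets[v + 1]].
    """
    counts = [0] * (n + 1)
    for src, _ in edges:
        counts[src + 1] += 1
    # Prefix sums: counts[v] becomes the first output slot for source v.
    for v in range(n):
        counts[v + 1] += counts[v]
    offsets = counts[:]  # keep a copy; counts is reused as write cursors
    out = [None] * len(edges)
    for edge in edges:
        src = edge[0]
        out[counts[src]] = edge
        counts[src] += 1
    return out, offsets


def pagerank(edges, n, iterations=20, alpha=0.85):
    """Plain PageRank over source-sorted edges.

    Simplification: rank held by vertices with no out-edges is dropped,
    not redistributed.
    """
    sorted_edges, offsets = counting_sort_by_source(edges, n)
    ranks = [1.0 / n] * n
    for _ in range(iterations):
        new = [(1.0 - alpha) / n] * n
        for v in range(n):
            start, end = offsets[v], offsets[v + 1]
            degree = end - start
            if degree == 0:
                continue  # dangling vertex: its mass is lost in this sketch
            share = alpha * ranks[v] / degree
            # v's out-edges are contiguous, so this is one sequential scan.
            for i in range(start, end):
                new[sorted_edges[i][1]] += share
        ranks = new
    return ranks
```

In pure Python the cache effects that the post measures will not show up; the point here is the access pattern. Because the sort is linear-time, redoing it on every iteration would add only O(m + n) work per pass.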
GraphChi
"big data, small machine": perform computation on very large graphs on a single machine, using an algorithm they're calling Parallel Sliding Windows. Similar to Google's Pregel, apparently.
graphs  graphchi  big-data  algorithms  parallel 
july 2012 by jm
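Here is a toy, in-memory sketch of the shard layout behind Parallel Sliding Windows, as we understand it from the GraphChi paper. Vertices are split into P intervals. Shard p holds every edge whose destination falls in interval p, sorted by source. To process interval p, shard p is read in full to get its in-edges. Every shard also contributes the contiguous block of edges whose source lies in interval p, which gives the out-edges. Because shards are sorted by source and intervals are visited in order, those blocks only ever move forward, hence "sliding windows". Assumptions: the intervals are half-open, contiguous, and listed in order starting at vertex 0. Everything lives in memory, whereas GraphChi streams shards from disk and runs update functions. `build_shards` and `psw_pass` are hypothetical names.

```python
# Toy simulation of a GraphChi-style shard layout; hypothetical helper names.
# Assumes intervals are half-open (lo, hi) ranges, contiguous, in order from 0.

def build_shards(edges, intervals):
    """Shard p gets every (src, dst) edge with dst in interval p, sorted by src."""
    shards = [[] for _ in intervals]
    for src, dst in edges:
        for p, (lo, hi) in enumerate(intervals):
            if lo <= dst < hi:
                shards[p].append((src, dst))
                break
    for shard in shards:
        shard.sort()  # tuples sort by src first, then dst
    return shards


def psw_pass(shards, intervals):
    """Yield (p, in_edges, out_edges) for each interval, in order."""
    cursors = [0] * len(shards)  # one window start per shard; only moves forward
    for p, (lo, hi) in enumerate(intervals):
        out_edges = []
        for q, shard in enumerate(shards):
            start = cursors[q]
            end = start
            # Earlier windows already consumed src < lo, so advance while src < hi.
            while end < len(shard) and shard[end][0] < hi:
                end += 1
            out_edges.extend(shard[start:end])
            cursors[q] = end
        # Shard p itself holds all in-edges of interval p.
        yield p, shards[p], out_edges
```

Each shard is read front to back exactly once per pass, which is what makes the scheme disk-friendly on a small machine.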