jm + parallel   10

Amazon Aurora Parallel Query is Available for Preview
Looks very nifty (at least once it's GA)
Parallel Query improves the performance of large analytic queries by pushing processing down to the Aurora storage layer, spreading processing across hundreds of nodes.
With Parallel Query, you can run sophisticated analytic queries on Aurora tables with an order of magnitude performance improvement over serial query processing, in many cases. Parallel Query currently pushes down predicates used to filter tables and hash joins. 
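To make the "push processing down to the storage layer" idea concrete, here is a toy sketch. It is not Aurora's implementation; `STORAGE_NODES`, `scan_node` and `pushdown_query` are hypothetical names, and the in-memory lists only stand in for storage nodes. Each "node" applies the filter predicate locally and ships back just the matching rows, and the head node merges the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands in for the rows held by one storage node.
STORAGE_NODES = [
    [{"id": 1, "amount": 50}, {"id": 2, "amount": 500}],
    [{"id": 3, "amount": 900}, {"id": 4, "amount": 10}],
    [{"id": 5, "amount": 700}],
]


def scan_node(rows, predicate):
    """Runs 'on the storage node': filter locally, return only matching rows."""
    return [row for row in rows if predicate(row)]


def pushdown_query(nodes, predicate):
    """Head node: fan the predicate out to every node, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = pool.map(lambda rows: scan_node(rows, predicate), nodes)
        return [row for part in partials for row in part]


def serial_query(nodes, predicate):
    """Baseline: pull every row to the head node first, then filter there."""
    all_rows = [row for rows in nodes for row in rows]
    return [row for row in all_rows if predicate(row)]


def is_big(row):
    return row["amount"] > 400


result = pushdown_query(STORAGE_NODES, is_big)
```

Both functions return the same rows; the difference is where the filtering happens and how much data has to travel to the head node.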
parallel  aurora  amazon  mysql  sql  performance  joins  architecture  data-model 
19 days ago by jm
lbzip2: a free, multi-threaded compression utility with support for the bzip2 compressed file format. lbzip2 can process standard bz2 files in parallel. It uses the POSIX threading model (pthreads), which allows it to take full advantage of symmetric multiprocessing (SMP) systems. It has been proven to scale linearly, even to over one hundred processor cores.

lbzip2 is fully compatible with bzip2 – both at file format and command line level. Files created by lbzip2 can be decompressed by all versions of bzip2 and other software supporting bz2 format. lbzip2 can decompress any bz2 files in parallel. All bzip2 command-line options are also accepted by lbzip2. This makes lbzip2 a drop-in replacement for bzip2.
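A rough sketch of why bz2 lends itself to parallel compression, using only Python's standard `bz2` module. This is not how lbzip2 itself works internally (I haven't read its source); it just compresses independent chunks concurrently and concatenates the resulting streams, which a standard decompressor can still read. `parallel_bz2_compress`, the chunk size and the worker count are all made up for illustration, and threads are used for simplicity without any claim about the speedup they give.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor


def parallel_bz2_compress(data: bytes, chunk_size: int = 300_000, workers: int = 4) -> bytes:
    """Compress fixed-size chunks concurrently and concatenate the bz2 streams."""
    # Always keep at least one chunk so empty input still yields a valid stream.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so the streams stay in sequence.
        streams = list(pool.map(bz2.compress, chunks))
    return b"".join(streams)


payload = b"parallel compression example\n" * 40_000
packed = parallel_bz2_compress(payload)

# bz2.decompress accepts concatenated (multi-stream) input.
restored = bz2.decompress(packed)
```

The trade-off is a slightly worse compression ratio than a single stream, since each chunk is compressed without knowledge of the others.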
bzip2  gzip  compression  lbzip2  parallel  cli  tools 
march 2016 by jm
Spark Breaks Previous Large-Scale Sort Record – Databricks
Massive improvement over plain old Hadoop. This blog post goes into really solid techie reasons why, including:
First and foremost, in Spark 1.1 we introduced a new shuffle implementation called sort-based shuffle (SPARK-2045). The previous Spark shuffle implementation was hash-based that required maintaining P (the number of reduce partitions) concurrent buffers in memory. In sort-based shuffle, at any given point only a single buffer is required. This has led to substantial memory overhead reduction during shuffle and can support workloads with hundreds of thousands of tasks in a single stage (our PB sort used 250,000 tasks).

Also notable: the use of Timsort; an external shuffle service to offload work from the JVM; Netty; and EC2 SR-IOV.
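To illustrate the difference the quote describes, here is a toy sketch of the two shuffle styles. It is not Spark's code; `partition_of`, `hash_based_shuffle`, `sort_based_shuffle` and `read_partition` are hypothetical names, and Python lists stand in for on-disk buffers. The hash-based version keeps P buffers alive at once, while the sort-based version keeps a single buffer ordered by partition id plus a small index of where each partition starts.

```python
def partition_of(key: str, num_partitions: int) -> int:
    """Deterministic stand-in for a hash partitioner (hypothetical helper)."""
    return sum(key.encode()) % num_partitions


def hash_based_shuffle(records, num_partitions):
    """One buffer per reduce partition, all of them live at the same time."""
    buffers = [[] for _ in range(num_partitions)]
    for key, value in records:
        buffers[partition_of(key, num_partitions)].append((key, value))
    return buffers


def sort_based_shuffle(records, num_partitions):
    """A single buffer sorted by partition id, plus an offset index into it."""
    buffer = sorted(records, key=lambda kv: partition_of(kv[0], num_partitions))
    index = [0] * (num_partitions + 1)
    for key, _ in buffer:
        index[partition_of(key, num_partitions) + 1] += 1  # count per partition
    for p in range(num_partitions):
        index[p + 1] += index[p]  # prefix sums turn counts into offsets
    return buffer, index


def read_partition(buffer, index, p):
    """What a reducer would fetch: the contiguous slice for partition p."""
    return buffer[index[p]:index[p + 1]]


records = [("apple", 1), ("banana", 2), ("cherry", 3), ("apple", 4), ("date", 5)]
P = 3
buffers = hash_based_shuffle(records, P)
buffer, index = sort_based_shuffle(records, P)
```

Each reducer sees the same records either way; only the number of simultaneously open buffers on the map side changes.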
spark  hadoop  map-reduce  batch  parallel  sr-iov  benchmarks  performance  netty  shuffle  algorithms  sort-based-shuffle  timsort 
october 2014 by jm
#AltDevBlog » Parallel Implementations
John Carmack describes this code-evolution approach to adding new code:
The last two times I did this, I got the software rendering code running on the new platform first, so everything could be tested out at low frame rates, then implemented the hardware accelerated version in parallel, setting things up so you could instantly switch between the two at any time.  For a mobile OpenGL ES application being developed on a windows simulator, I opened a completely separate window for the accelerated view, letting me see it simultaneously with the original software implementation.  This was a very significant development win.

If the task you are working on can be expressed as a pure function that simply processes input parameters into a return structure, it is easy to switch it out for different implementations.  If it is a system that maintains internal state or has multiple entry points, you have to be a bit more careful about switching it in and out.  If it is a gnarly mess with lots of internal callouts to other systems to maintain parallel state changes, then you have some cleanup to do before trying a parallel implementation.

There are two general classes of parallel implementations I work with:  The reference implementation, which is much smaller and simpler, but will be maintained continuously, and the experimental implementation, where you expect one version to “win” and consign the other implementation to source control in a couple weeks after you have some confidence that it is both fully functional and a real improvement.

It is completely reasonable to violate some generally good coding rules while building an experimental implementation – copy, paste, and find-replace rename is actually a good way to start.  Code fearlessly on the copy, while the original remains fully functional and unmolested.  It is often tempting to shortcut this by passing in some kind of option flag to existing code, rather than enabling a full parallel implementation.  It is a grey area, but I have been tending to find the extra path complexity with the flag approach often leads to messing up both versions as you work, and you usually compromise both implementations to some degree.

(via Marc)
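A small sketch of the pure-function case described above, as I read it. Nothing here is Carmack's code: the blur functions, `IMPLEMENTATIONS`, `ACTIVE` and `compare` are invented for illustration. The reference implementation stays small and obviously correct, the experimental one is a full separate copy rather than a flag inside the original, and a switch outside both lets you flip between them or check one against the other.

```python
def blur_reference(pixels):
    """Reference implementation: small, simple, kept around permanently."""
    out = []
    for i in range(len(pixels)):
        window = pixels[max(0, i - 1):i + 2]  # the pixel and its neighbours
        out.append(sum(window) / len(window))
    return out


def blur_experimental(pixels):
    """Experimental implementation: same result via prefix sums."""
    prefix = [0]
    for p in pixels:
        prefix.append(prefix[-1] + p)
    n = len(pixels)
    out = []
    for i in range(n):
        lo, hi = max(0, i - 1), min(n, i + 2)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out


IMPLEMENTATIONS = {"reference": blur_reference, "experimental": blur_experimental}
ACTIVE = "reference"  # flip this to switch implementations


def blur(pixels, which=None):
    """Single entry point; callers never know which version they got."""
    return IMPLEMENTATIONS[which or ACTIVE](pixels)


def compare(pixels):
    """Run both versions side by side and report whether they agree."""
    return blur(pixels, "reference") == blur(pixels, "experimental")


sample = [10, 20, 30, 40]
smoothed = blur(sample)
```

Because both versions are pure functions of their input, `compare` can be run on any test data at any time, which is the "instantly switch between the two" property from the quote.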
via:marc  coding  john-carmack  parallel  development  evolution  lifecycle  project-management 
june 2014 by jm
GraphChi: "big data, small machine" -- performs computation on very large graphs using an algorithm they're calling Parallel Sliding Windows. Similar to Google's Pregel, apparently.
graphs  graphchi  big-data  algorithms  parallel 
july 2012 by jm
Ask For Forgiveness Programming - Or How We'll Program 1000 Cores
Nifty concept from IBM Research's David Ungar -- "race-and-repair". Simply put, allow lock-free lossy/inconsistent calculation, and backfill later, using concepts like "freshener" threads, to reconcile inconsistencies. This is a familiar concept in distributed computing nowadays thanks to CAP, but I hadn't heard it being applied to single-host multicore parallel programming before -- I can already think of an application in our codebase...
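A deterministic toy model of the idea as I understand it from the description above, not Ungar's actual design. `Ledger`, `racy_deposit` and `freshen` are hypothetical names, and the "race" is simulated by hand (two writers working from the same stale read) rather than produced by real threads, so the outcome is reproducible.

```python
class Ledger:
    def __init__(self, balances):
        self.balances = list(balances)          # source of truth
        self.cached_total = sum(self.balances)  # derived value, allowed to drift

    def racy_deposit(self, account, amount, stale_total):
        """Lock-free update that trusts a total another writer may have changed."""
        self.balances[account] += amount
        self.cached_total = stale_total + amount

    def freshen(self):
        """'Freshener' pass: recompute the derived value from the source data."""
        self.cached_total = sum(self.balances)


ledger = Ledger([100, 200])
seen = ledger.cached_total        # both writers read the same total (300)
ledger.racy_deposit(0, 10, seen)  # first writer stores 310
ledger.racy_deposit(1, 5, seen)   # second writer overwrites it: lost update
drifted = ledger.cached_total     # inconsistent with the balances
ledger.freshen()                  # repair after the fact
repaired = ledger.cached_total
```

Readers between the race and the repair see a slightly wrong total, which is the price paid for not taking a lock on every update.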
race-and-repair  concurrency  coding  ibm  parallelism  parallel  david-ungar  cap  multicore 
april 2012 by jm
Introduction to parallel & distributed algorithms
Really interesting parallel algorithm concepts. I'd seen parallel merge sort before from the map-reduce world, but some others are new to me and worth thinking about (via Hacker News).
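For reference, a minimal sketch of the parallel merge sort shape mentioned above: sort chunks independently, then merge the sorted runs. This is my own illustration rather than anything from the linked material; `parallel_merge_sort` and the worker count are assumptions. It uses threads to stay self-contained, so in CPython a process pool would be the likelier choice for real CPU parallelism.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def parallel_merge_sort(items, workers=4):
    """Sort chunks concurrently (the 'map' step), then k-way merge them (the 'reduce' step)."""
    if not items:
        return []
    size = -(-len(items) // workers)  # ceiling division: at most `workers` chunks
    chunks = [items[i:i + size] for i in range(0, len(items), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_chunks = list(pool.map(sorted, chunks))
    # heapq.merge lazily merges already-sorted inputs into one sorted stream.
    return list(heapq.merge(*sorted_chunks))


data = [42, 7, 19, 3, 88, 1, 56, 23, 7, 64, 12]
ordered = parallel_merge_sort(data)
```

The final merge here is serial; a fuller version would merge pairs of runs in parallel as well.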
via:hackernews  algorithms  distributed  parallel  map-reduce  merge-sort  sorting  from delicious
august 2010 by jm
pigz: 'A parallel implementation of gzip for modern multi-processor, multi-core machines', by Mark Adler, no less
adler  pigz  gzip  compression  performance  concurrency  shell  parallel  multicore  zip  software  from delicious
october 2009 by jm
filemap: 'File-based, rather than tuple-based processing'; based around the UNIX command-line toolset; good UNIXish UI; lots of caching of intermediate results; low setup overhead -- although it does require a shared POSIX filesystem, e.g. NFS, for synchronization
networking  python  opensource  grid  map-reduce  filemap  files  unix  command-line  parallel  distcomp 
july 2009 by jm
