jm + databricks   2

interesting: a synchronization daemon from Databricks, which they use to sync dev repos with a remote "devbox" in EC2 for heavyweight compilation
remote-compiles  compiling  devbox  databricks  coding  tools  dev  ec2 
21 days ago by jm
Spark 1.2 released
This is the version with the superfast petabyte-sort record:
Spark 1.2 includes several cross-cutting optimizations focused on performance for large scale workloads. Two new features Databricks developed for our world record petabyte sort with Spark are turned on by default in Spark 1.2. The first is a re-architected network transfer subsystem that exploits Netty 4’s zero-copy IO and off heap buffer management. The second is Spark’s sort based shuffle implementation, which we’ve now made the default after significant testing in Spark 1.1. Together, we’ve seen these features give as much as 5X performance improvement for workloads with very large shuffles.
spark  sorting  hadoop  map-reduce  batch  databricks  apache  netty 
december 2014 by jm
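The sort-based shuffle mentioned above has each map task write a single output file whose records are sorted by reduce-partition ID, plus an index of per-partition offsets. The older hash-based shuffle instead wrote one file per reducer per map task. Below is a toy, in-memory Python sketch of that idea. It is not Spark's implementation. The function names `map_side_write` and `reduce_side_read` are hypothetical, and a plain list stands in for the data file. Keys are assumed to be hashable, and `hash(key) % num_partitions` is assumed as a stand-in partitioner.

```python
def map_side_write(records, num_partitions):
    """Toy sort-based shuffle writer (hypothetical): one data 'file' plus an index."""
    # Tag each record with its reduce partition (assumed hash partitioner),
    # then sort by partition id so each reducer's records are contiguous.
    # sorted() is stable, so records keep their input order within a partition.
    tagged = sorted(
        ((hash(key) % num_partitions, key, value) for key, value in records),
        key=lambda t: t[0],
    )
    data = [(key, value) for _, key, value in tagged]
    # index[p] is where partition p starts; index[p + 1] is where it ends.
    index = [0] * (num_partitions + 1)
    for pid, _, _ in tagged:
        index[pid + 1] += 1
    for p in range(num_partitions):
        index[p + 1] += index[p]
    return data, index


def reduce_side_read(data, index, partition):
    """A reducer reads only its own contiguous segment, located via the index."""
    return data[index[partition]:index[partition + 1]]


records = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
data, index = map_side_write(records, 2)
print(index)                             # [0, 2, 4]
print(reduce_side_read(data, index, 1))  # [(1, 'a'), (3, 'c')]
```

Because each map task produces one file and one index instead of one file per reducer, the number of shuffle files grows with the number of map tasks rather than with maps × reducers.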