Organizing Functional Code for Parallel Execution; or, foldl and foldr Considered Slightly Harmful
'Divide and conquer'...

We need parallel strategies for problem decomposition, data structure design, and algorithmic organisation:
> The top down view:
Don't split a problem into 'the first' and 'the rest'.
Instead, *split a problem into roughly equal pieces;
recursively solve sub-problems and then combine sub-solutions.*
> The bottom up view:
Don't create a null solution, then successively update it.
Instead, *map inputs independently to singleton solutions,
then merge the sub-solutions treewise.*
> Combining sub-solutions is usually trickier than
incremental update of a single solution.
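
To make the contrast concrete, here is a minimal Python sketch of both views applied to sorting. It is only an illustration, not code from the talk: the names `fold_left`, `tree_reduce`, `insert_sorted` and `merge_lists` are hypothetical, and the tree version runs sequentially here. The point is that its two recursive calls are independent of each other, so they could be handed to separate workers, provided the combining function is associative and `empty` is its identity.

```python
from bisect import bisect_right
from functools import reduce
from heapq import merge as merge_sorted


def fold_left(items, update, empty):
    """'First and rest' style: start from an empty solution, update it per item."""
    return reduce(update, items, empty)


def tree_reduce(items, to_singleton, combine, empty):
    """Divide-and-conquer style: split in half, solve each half, combine.

    Assumes `combine` is associative and `empty` is its identity.
    """
    if not items:
        return empty
    if len(items) == 1:
        return to_singleton(items[0])
    mid = len(items) // 2
    # These two calls share no state, so they could run in parallel.
    left = tree_reduce(items[:mid], to_singleton, combine, empty)
    right = tree_reduce(items[mid:], to_singleton, combine, empty)
    return combine(left, right)


def insert_sorted(acc, x):
    """Incremental update: place one item into an already sorted list."""
    i = bisect_right(acc, x)
    return acc[:i] + [x] + acc[i:]


def merge_lists(a, b):
    """Combining step: merge two sorted lists into one sorted list."""
    return list(merge_sorted(a, b))


data = [5, 2, 9, 1, 7]
sequential = fold_left(data, insert_sorted, [])               # insertion sort
treewise = tree_reduce(data, lambda x: [x], merge_lists, [])  # merge sort
```

Both calls produce the same sorted list. The sequential version is an insertion sort whose every step depends on the previous one; the treewise version is a merge sort, and its combining step (merging two sorted lists) takes more thought than inserting a single element, which is the trade-off the last quoted point describes.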

Google MapReduce is a **big deal**!

Airpal: a Web UI for PrestoDB – Airbnb Engineering & Data Science – Medium
We currently hold about one and a half petabytes of data as Hive managed tables in HDFS, and the relatively small data size of our important “core_data” tables allows us to use Presto as the default query engine for analysis. When running ad hoc queries and iterating on the steps of an analysis, Presto is much snappier and more responsive than traditional MapReduce jobs. The biggest benefit to adding Presto to our infrastructure stack, though, is that we don’t have to add additional complexity to allow “interactive” querying. Because we are querying against our one, central Hive warehouse, we can keep a “single source of truth” with no large-scale copies to a separate storage/query layer. Additionally, the fact that we don’t need to change the data storage type from RC format to see the speed improvements makes Presto a great choice for our infrastructure.
Azkaban is a batch workflow job scheduler created at LinkedIn to run Hadoop jobs. Azkaban resolves the ordering through job dependencies and provides an easy-to-use web user interface to maintain and track your workflows.
