mapreduce   11486

Organizing Functional Code for Parallel Execution; or, foldl and foldr Considered Slightly Harmful
'Divide and conquer'...

We need parallel strategies for problem decomposition, data structure design, and algorithmic organisation:
> The top-down view:
Don't split a problem into 'the first' and 'the rest'.
Instead *split a problem into roughly equal pieces;
recursively solve sub-problems and then combine sub-solutions.*
> The bottom-up view:
Don't create a null solution, then successively update it.
Instead *map inputs independently to singleton solutions,
then merge the sub-solutions treewise.*
> Combining sub-solutions is usually trickier than
incrementally updating a single solution.

Google MapReduce is a **big deal**!

... get rid of cons!
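A rough sketch of the point in Java (my own illustration, not from the talk): a left fold chains every step through the previous accumulator, while mapping inputs to singleton results and merging them with an associative combiner lets the runtime split the work into roughly equal pieces and reduce treewise, e.g. with Java 8 parallel streams:

```java
import java.util.List;

// Minimal sketch: summing a list the foldl way versus the
// map-then-treewise-merge way. The tree shape is only legal
// because the combiner (+) is associative.
public class TreewiseCombine {
    public static void main(String[] args) {
        List<Integer> xs = List.of(1, 2, 3, 4, 5, 6, 7, 8);

        // foldl-style: every step depends on the previous accumulator value,
        // so the whole computation is forced into a sequential chain.
        int sequential = 0;
        for (int x : xs) sequential += x;

        // Map each input to a "singleton solution", then merge sub-solutions
        // with an associative combiner; a parallel stream splits the input
        // into roughly equal pieces and reduces the sub-results treewise.
        int treewise = xs.parallelStream()
                         .map(x -> x)
                         .reduce(0, Integer::sum);

        System.out.println(sequential + " " + treewise); // 36 36
    }
}
```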
functionalprogramming  mapreduce  videos  presentations  performance 
7 weeks ago by ianchanning
Airpal: a Web UI for PrestoDB – Airbnb Engineering & Data Science – Medium
We currently hold about one and a half petabytes of data as Hive managed tables in HDFS, and the relatively small data size of our important “core_data” tables allows us to use Presto as the default query engine for analysis. When running ad hoc queries and iterating on the steps of an analysis, Presto is much snappier and more responsive than traditional MapReduce jobs. The biggest benefit of adding Presto to our infrastructure stack, though, is that we don’t have to add additional complexity to allow “interactive” querying. Because we are querying against our one, central Hive warehouse, we can keep a “single source of truth” with no large-scale copies to a separate storage/query layer. Additionally, the fact that we don’t need to change the data storage type from RC format to see the speed improvements makes Presto a great choice for our infrastructure.
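For flavour, a minimal sketch of what that “interactive” querying against the central Hive warehouse looks like from client code, assuming the Presto JDBC driver is on the classpath; the coordinator host, schema and table names here are made up:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

// Sketch only: coordinator host, schema, table and column names are hypothetical.
public class PrestoQuerySketch {
    public static void main(String[] args) throws Exception {
        // URL format: jdbc:presto://<coordinator>:<port>/<catalog>/<schema>
        String url = "jdbc:presto://presto-coordinator.example.com:8080/hive/core_data";

        Properties props = new Properties();
        props.setProperty("user", "analyst");

        try (Connection conn = DriverManager.getConnection(url, props);
             Statement stmt = conn.createStatement();
             // Presto reads the Hive tables in place (RCFile and friends),
             // so no copy into a separate storage/query layer is needed.
             ResultSet rs = stmt.executeQuery(
                     "SELECT dim_market, count(*) AS bookings " +
                     "FROM bookings GROUP BY dim_market LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
        }
    }
}
```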
performance  analytics  mapreduce 
11 weeks ago by janpeuker
Azkaban
Azkaban is a batch workflow job scheduler created at LinkedIn to run Hadoop jobs. Azkaban resolves the execution order through job dependencies and provides an easy-to-use web user interface to maintain and track your workflows.
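A tiny sketch of how that dependency-driven ordering is expressed, using the classic .job property-file format (job names and commands here are hypothetical):

```properties
# hourly_import.job -- runs first
type=command
command=hadoop jar import.jar com.example.HourlyImport

# aggregate.job -- Azkaban schedules this only after hourly_import succeeds
type=command
command=hadoop jar aggregate.jar com.example.Aggregate
dependencies=hourly_import
```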
java  mapreduce  integration 
may 2018 by janpeuker
