mapreduce   11476


Map, reduce, and fold for the programmatically imperative | prose :: and :: conz
When I first experienced functional programming in Scala, I had some learning curves to get over thanks to years of imperative programming. One of those came in knowing what to do with a collection of items (more generally, this applies to some (all?) monadic types). However, I'm going to resist the urge to contribute to…
scala  map-reduce  mapreduce  functional  programming  functional-programming 
5 days ago by lenards
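The blurb above contrasts imperative collection handling with map/reduce/fold. A minimal sketch of that correspondence (illustrative Python, not code from the linked post):

```python
from functools import reduce

nums = [1, 2, 3, 4]

# Imperative: accumulate the sum of squares with a loop and mutation.
total = 0
for n in nums:
    total += n * n

# Functional: map transforms each element; reduce folds them into one value.
squares = list(map(lambda n: n * n, nums))           # [1, 4, 9, 16]
folded = reduce(lambda acc, n: acc + n, squares, 0)  # 30

assert total == folded == 30
```

The fold makes the accumulator explicit as a parameter instead of a mutated local, which is the mental shift the post describes.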
klbostee/dumbo @ GitHub
Dumbo is a project that allows you to easily write and run Hadoop programs in Python
python  mapreduce  hadoop 
6 weeks ago by brunns
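Dumbo programs are written as plain mapper/reducer generator functions. A word-count sketch in that style, run with a local stand-in driver instead of Hadoop (the `mapper`/`reducer` shapes follow Dumbo's convention; `run_local` is my own simplification of what `dumbo.run` does on a cluster):

```python
from itertools import groupby
from operator import itemgetter

def mapper(key, value):
    # key: line number, value: one line of text
    for word in value.split():
        yield word, 1

def reducer(key, values):
    yield key, sum(values)

def run_local(lines):
    """Stand-in driver: map, shuffle-sort by key, reduce, in one process."""
    pairs = [kv for i, line in enumerate(lines) for kv in mapper(i, line)]
    pairs.sort(key=itemgetter(0))
    return {k: v
            for key, group in groupby(pairs, key=itemgetter(0))
            for k, v in reducer(key, (v for _, v in group))}

counts = run_local(["to be or not", "to be"])
# counts == {"to": 2, "be": 2, "or": 1, "not": 1}
```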
Replacing Sawzall — a case study in domain-specific language migration
Since the logs proxy decouples our data access policy from the programming language used for analysis, individual teams now have more freedom to choose the language that best fits their needs. However, since analysis libraries can often get very complicated, and multiple teams often share common data sources, there is an economy of scale in choosing a common language for most analysis.

At Google, most Sawzall analysis has been replaced by Go. Go has the advantage of being a relatively small language which is easy to learn and integrates well with Google’s production infrastructure. Fast compile times and garbage collection make Go a natural fit for iterative development. To ease the process of migrating from Sawzall, we’ve developed a set of Go libraries that we call Lingo (for Logs in Go). Lingo includes a table aggregation library that brings the powerful features of Sawzall aggregation tables to Go, using reflection to support user-defined types for table keys and values. It also provides default behavior for setting up and running a MapReduce that reads data from the logs proxy. The result is that Lingo analysis code is often as concise and simple as (and sometimes simpler than) the Sawzall equivalent.
mapreduce  go  analytics 
10 weeks ago by janpeuker
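The aggregation tables described above let analysis code emit values under user-defined keys and have the summing happen automatically. A toy Python analogue of a Sawzall-style sum table (Lingo itself is internal to Google; this only illustrates the idea):

```python
from collections import defaultdict

class SumTable:
    """Toy analogue of a Sawzall 'table sum' aggregator: values emitted
    under the same user-defined key are summed as they arrive."""
    def __init__(self):
        self._cells = defaultdict(int)

    def emit(self, key, value):
        self._cells[key] += value

    def result(self):
        return dict(self._cells)

# Count requests per (country, status) from hypothetical log records.
logs = [("US", 200), ("US", 200), ("DE", 500), ("US", 404)]
table = SumTable()
for country, status in logs:
    table.emit((country, status), 1)
# table.result() == {("US", 200): 2, ("DE", 500): 1, ("US", 404): 1}
```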
[1512.01625] Coded MapReduce
Feels, at first glance, a lot like the 'sudoku' methods in genomics, and uses of codes in experimental designs
mapreduce  arxiv  computerscience  research-article  code  coding-theory 
november 2018 by arthegall
Reflow - Language & runtime for distributed, incremental data processing in the cloud
A system for incremental data processing in the cloud. Reflow enables scientists and engineers to compose existing tools (packaged in Docker images) using ordinary programming constructs. Reflow then evaluates these programs in a cloud environment, transparently parallelizing work and memoizing results. Reflow was created at GRAIL to manage our NGS (next generation sequencing) bioinformatics workloads on AWS, but has also been used for many other applications, including model training and ad-hoc data analyses.
streaming  big-data  mapreduce  opensource  DSL  distributed 
november 2018 by liqweed
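Reflow's incremental behavior comes from memoizing step results so unchanged work is never re-run. A generic content-addressed memoization sketch in Python (not Reflow's actual runtime, which is a DSL evaluated in the cloud):

```python
import hashlib
import json

_cache = {}  # digest of (step name, inputs) -> cached result

def memo_step(name, fn, *inputs):
    """Re-run a step only when its name or inputs change,
    in the spirit of Reflow's memoized evaluation."""
    digest = hashlib.sha256(json.dumps([name, inputs]).encode()).hexdigest()
    if digest not in _cache:
        _cache[digest] = fn(*inputs)
    return _cache[digest]

calls = []
def align(reads):          # hypothetical expensive step
    calls.append(reads)
    return sorted(reads)

a = memo_step("align", align, ["chr2", "chr1"])
b = memo_step("align", align, ["chr2", "chr1"])  # served from cache
# align ran exactly once; a == b == ["chr1", "chr2"]
```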
Apache ORC - High-Performance Columnar Storage for Hadoop
The smallest, fastest columnar storage for Hadoop workloads.
Includes support for ACID transactions and snapshot isolation. Built-in Indexes - Jump to the right row with indexes including minimum, maximum, and bloom filters for each column.

Complex Types - Supports all of Hive's types including the compound types: structs, lists, maps, and unions.
big-data  Hadoop  mapreduce  streaming  opensource  search  database  filesystem  persistence 
november 2018 by liqweed
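The "jump to the right row" claim rests on per-stripe min/max (and bloom filter) indexes: a reader skips any stripe whose value range cannot contain the predicate. A toy Python illustration of min/max pruning (not the ORC format itself):

```python
# Stripes of a sorted column, with min/max recorded at write time.
stripes = [
    {"rows": [3, 7, 9]},
    {"rows": [12, 15, 20]},
    {"rows": [21, 30, 42]},
]
for s in stripes:
    s["min"], s["max"] = min(s["rows"]), max(s["rows"])

def find(stripes, target):
    """Scan only stripes whose [min, max] range could hold target."""
    scanned, hits = 0, []
    for s in stripes:
        if s["min"] <= target <= s["max"]:
            scanned += 1
            hits += [r for r in s["rows"] if r == target]
        # else: stripe skipped entirely, no I/O
    return hits, scanned

hits, scanned = find(stripes, 15)
# Only the middle stripe is read: hits == [15], scanned == 1
```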
dg92/Performance-Analysis-JS: Map/Reduce/Filter/Find Vs For loop Vs For each Vs Lodash vs Ramda
Compares native JavaScript array methods (map, reduce, filter, find) against a for loop, forEach, and lodash/Ramda equivalents. The analysis uses basic operations and heavy data manipulation to measure the execution speed of each approach.
es6  performance  javascript  functional_programming  mapreduce 
october 2018 by archangel
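The linked repo benchmarks JavaScript; the same measurement idea, sketched in Python with `timeit` (an analogue, not the repo's harness, and making no claim about which wins):

```python
import timeit

data = list(range(10_000))

def with_loop():
    out = []
    for n in data:
        out.append(n * 2)
    return out

def with_map():
    return list(map(lambda n: n * 2, data))

# Time each approach over repeated runs; compare the totals.
loop_t = timeit.timeit(with_loop, number=200)
map_t = timeit.timeit(with_map, number=200)
```

Keeping the workload and repeat count identical per candidate is what makes such micro-benchmarks comparable at all.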


related tags

advocacy  algorithm  algorithms  analytics  apache  apache_beam  appengine  article  artificialintelligence  arxiv  asynchronous_programming  avro  avrò  awesome  aws-lambda  aws  bash  batch  batch_processing  big-data  big  big_data  bigdata  bigquery  blockchain  c#  calcite  captheorem  cascading  cassandra  cli  clojure  cloud  cluster  clustering  clusters  code  coding-theory  collaboration  collectors  compsci  computerscience  concurrency  consumer  containers  coreos  course  cpp  criticism  cryptography  data/science  data  data_engineering  database  databases  datascience  dbms  delicious-import  delicious  dev  development  distcp  distributed  distributedsystem  distsys  docker  drill  dsl  editor  engine  es6  etl  example  filesystem  filter  filtering  flume  forthecomments  framework  functional-programming  functional  functional_programming  functionalprogramming  git  github  go  golang  google  google_dataflow  graph  hadoop  hbase  hdfs  history  hive  howto  important  integration  java  java8  javascript  javascript_map  jobtracker  kmeans  kubernetes  lambda  lambda_architecture  learning  local  logfiles  machinelearning  map-reduce  map  map_function  memory  mesos  metadata  mr  mr1  mr2  nosql  nsa  olap  opensource  optimization  pairprogramming  parallel  parallelization  parkour  patterns  performance  persistence  pig  pipelines  presentation  presentations  processing  producer  programming  python  pywren  r  rdbms  realtime  recommendation  reduce  reference  relational  relationship  research-article  review  ruby  scala  scalability  scalding  schema  science  search  security  serverless  shell  shellscript  sidekiq  slice  slides  smartcontracts  software  spark  sql  sqoop  statistics  storage  stream  stream_processing  streaming  sysadmin  system  teaching  tech  technology  techtalk  tensorflow  tez  tpl  tuning  tutorial  typescript  udacity  udf  video  videos  web  webdev  yarn 
