jm + olap   5

RADStack - an open source Lambda Architecture built on Druid, Kafka and Samza
'In this paper we presented the RADStack, a collection of complementary technologies that can be used together to power interactive analytic applications. The key pieces of the stack are Kafka, Samza, Hadoop, and Druid. Druid is designed for exploratory analytics and is optimized for low latency data exploration, aggregation, and ingestion, and is well suited for OLAP workflows. Samza and Hadoop complement Druid and add data processing functionality, and Kafka enables high throughput event delivery.'
druid  samza  kafka  streaming  cep  lambda-architecture  architecture  hadoop  big-data  olap 
april 2015 by jm
Presto: Interacting with petabytes of data at Facebook
Presto has become a major interactive system for the company’s data warehouse. It is deployed in multiple geographical regions and we have successfully scaled a single cluster to 1,000 nodes. The system is actively used by over a thousand employees, who run more than 30,000 queries processing one petabyte daily.

Presto is 10x better than Hive/MapReduce in terms of CPU efficiency and latency for most queries at Facebook. It currently supports a large subset of ANSI SQL, including joins, left/right outer joins, subqueries, and most of the common aggregate and scalar functions, including approximate distinct counts (using HyperLogLog) and approximate percentiles (based on quantile digest). The main restrictions at this stage are a size limitation on the join tables and cardinality of unique keys/groups. The system also lacks the ability to write output data back to tables (currently query results are streamed to the client).
facebook  hadoop  hdfs  open-source  java  sql  hive  map-reduce  querying  olap 
november 2013 by jm
shades
A command-line utility in Ruby to perform (a) OLAP cubing and (b) histogramming, given whitespace-delimited line data
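As a rough illustration of what cubing and histogramming over whitespace-delimited lines can mean, here is a minimal sketch in Python. It is not shades' implementation or its command-line interface; the function names `cube` and `histogram`, the column-index arguments, and the sample data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cube(lines, dims, measure):
    """Sum the `measure` column over every subset of the `dims` columns."""
    totals = defaultdict(float)
    for line in lines:
        fields = line.split()  # whitespace-delimited columns
        value = float(fields[measure])
        # Roll the value up into every combination of dimensions,
        # from the grand total () to the full dimension set.
        for size in range(len(dims) + 1):
            for subset in combinations(dims, size):
                key = tuple((d, fields[d]) for d in subset)
                totals[key] += value
    return dict(totals)


def histogram(values, bucket_width):
    """Count values per fixed-width bucket, keyed by bucket lower bound."""
    counts = defaultdict(int)
    for v in values:
        counts[(v // bucket_width) * bucket_width] += 1
    return dict(counts)


# Hypothetical input: country, channel, count
sample = ["us web 3", "us app 5", "ie web 2"]
result = cube(sample, dims=[0, 1], measure=2)
buckets = histogram([1, 4, 7, 12], bucket_width=5)
```

Here `result[()]` holds the grand total and `result[((0, "us"),)]` the total for rows whose first column is `us`; `buckets` maps each bucket's lower bound to the number of values that fall in it.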
ruby  olap  number-crunching  data  histograms  cli 
june 2013 by jm
Cloudera Impala 1.0: It’s Here, It’s Real, It’s Already the Standard for SQL on Hadoop
we are proud to announce the first production drop of Impala, which reflects feedback from across the user community based on multiple types of real-world workloads. Just as a refresher, the main design principle behind Impala is complete integration with the Hadoop platform (jointly utilizing a single pool of storage, metadata model, security framework, and set of system resources). This integration allows Impala users to take advantage of the time-tested cost, flexibility, and scale advantages of Hadoop for interactive SQL queries, and makes SQL a first-class Hadoop citizen alongside MapReduce and other frameworks. The net result is that all your data becomes available for interactive analysis simultaneously with all other types of processing, with no ETL delays needed.


Along with some great benchmark numbers against Hive. Nifty stuff.
cloudera  impala  sql  querying  etl  olap  hadoop  analytics  business-intelligence  reports 
may 2013 by jm
Boundary Techtalk - Large-scale OLAP with Kobayashi
Boundary on their TSD-on-Riak store.
Dietrich Featherston, Engineer at Boundary, walks through the process of designing Kobayashi, the time-series analytics database behind our network metrics. He goes through the false starts and lessons learned in effectively using Riak as the storage layer for a large-scale OLAP database. The system is ultimately capable of answering complex, ad-hoc queries at interactive latencies.
video  boundary  tsd  riak  eventual-consistency  storage  kobayashi  olap  time-series 
april 2013 by jm
