good blog post on histogram-estimation stream processing algorithms

streams
streaming
stream-processing
histograms
percentiles
estimation
waves
statistics
algorithms

february 2013 by jm

After reviewing several dozen papers, a score or so in depth, I identified two data structures that appear to enable us to answer these recency and frequency queries: exponential histograms (from "Maintaining Stream Statistics Over Sliding Windows" by Datar et al.) and waves (from "Distributed Streams Algorithms for Sliding Windows" by Gibbons and Tirthapura). Both of these data structures are used to solve the so-called counting problem, the problem of determining, with a bound on the relative error, the number of 1s in the last N units of time. In other words, the data structures are able to answer the question: how many 1s appeared in the last n units of time within a factor of Error (e.g., 50%). The algorithms are neat, so I'll present them briefly.

february 2013 by jm

Distributed Streams Algorithms for Sliding Windows [PDF]

february 2013 by jm

'Massive data sets often arise as physically distributed, parallel data streams, and it is important to estimate various aggregates and statistics on the union of these streams. This paper presents algorithms for estimating aggregate

functions over a “sliding window” of the N most recent data items in one or more streams. [...] Our results are obtained using a novel family of synopsis data structures called waves.'

waves
papers
streaming
algorithms
percentiles
histogram
distcomp
distributed
aggregation
statistics
estimation
streams
functions over a “sliding window” of the N most recent data items in one or more streams. [...] Our results are obtained using a novel family of synopsis data structures called waves.'

february 2013 by jm

**related tags**

Copy this bookmark: