Probabilistic Data Structures for Web Analytics and Data Mining « Highly Scalable Blog
158 bookmarks. First posted by jmeagher may 2012.
A nice visual showing how much less memory it takes to estimate answers than to compute them exactly when analyzing stream-based data.
architecture
analytics
metrics
memory_footprint
algorithms
8 days ago by countfloortiles
"Probabilistic Data Structures for Web Analytics and Data Mining" - recommended via @Prismatic
from twitter
9 weeks ago by peschkaj
Probabilistic Data Structures for Web Analytics and Data Mining http://t.co/TkXTvNJW
from instapaper
12 weeks ago by apas
Probabilistic Data Structures for Web Analytics and Data Mining
from twitter_favs
december 2012 by michaeltri
Probabilistic Data Structures for Web Analytics and Data Mining
from twitter_favs
december 2012 by andrewbrown
Probabilistic Data Structures for Web Analytics and Data Mining
from twitter_favs
november 2012 by ngpestelos
Probabilistic Data Structures for Web Analytics and Data Mining
from twitter_favs
november 2012 by tdhopper
At the same time, the length of the estimator is a very slowly growing function of the capacity; 5-bit buckets are enough
analytics
november 2012 by zdwalter
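The quoted remark about 5-bit buckets can be illustrated with a minimal Python sketch of LogLog-style counting. Everything named here is an assumption for illustration and not taken from the article: the class name LogLogCounter, SHA-256 as the hash, 32 hash bits per item, 1024 buckets, and the asymptotic correction constant 0.39701 from the LogLog literature. Each bucket stores only the largest "rank" (position of the lowest set bit) it has seen, which is why a few bits per bucket suffice.

```python
import hashlib

ALPHA = 0.39701  # asymptotic LogLog correction constant (Durand and Flajolet)
HASH_BITS = 32   # assumption: 32 hash bits are used per item


class LogLogCounter:
    """Hypothetical LogLog-style cardinality estimator (illustrative only)."""

    def __init__(self, index_bits: int = 10):
        self.index_bits = index_bits
        self.rank_bits = HASH_BITS - index_bits
        self.registers = [0] * (1 << index_bits)  # one small integer per bucket

    def add(self, item: str) -> None:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:4], "big")
        bucket = h & (len(self.registers) - 1)  # low bits select the bucket
        rest = h >> self.index_bits             # remaining bits feed the rank
        # Rank: 1-based position of the lowest set bit; all zeros -> rank_bits + 1.
        rank = (rest & -rest).bit_length() if rest else self.rank_bits + 1
        if rank > self.registers[bucket]:
            self.registers[bucket] = rank

    def estimate(self) -> float:
        m = len(self.registers)
        mean_rank = sum(self.registers) / m
        return ALPHA * m * 2 ** mean_rank


counter = LogLogCounter()
for i in range(100_000):
    counter.add(f"visitor-{i}")
    counter.add(f"visitor-{i}")  # duplicates do not change the registers

print(round(counter.estimate()), max(counter.registers))
```

With these assumed parameters a register can never exceed 23, so each one fits in 5 bits, and 1024 buckets would occupy 640 bytes if packed.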
I should read this... I haven't.
probabilisticcomputing
todo
programming
algorithm
algorithms
bigdata
via:HackerNews
september 2012 by mcherm
Probabilistic Data Structures for Web Analytics and Data Mining
from twitter_favs
august 2012 by leecarrot
Mining big data on a budget: use probabilistic data structures for limited memory footprint in counting and queries
later
prob
ds
from delicious
august 2012 by chl
Computation of more advanced metrics like the number of unique visitors or the most frequent items is more challenging and requires a lot of resources if implemented straightforwardly. In this article, I provide an overview of probabilistic data structures that allow one to estimate these and many other metrics and to trade the precision of the estimates for memory consumption. These data structures can be used both as temporary data accumulators in query processing procedures and, perhaps more importantly, as a compact – sometimes astonishingly compact – replacement of raw data in stream-based computing.
algorithms
datamining
probability
machinelearning
august 2012 by charman
RT @ikatsov: Probabilistic Data Structures for Web Analytics and Data Mining
from twitter
june 2012 by kangaroo5383
@rgarver: Probabilistic Data Structures for Web Analytics and Data Mining « Highly Scalable Blog http://t.co/iQDG319c
ifttt
twitter
may 2012 by rgarver
Analysis of such large data sets often requires powerful distributed data stores like Hadoop and heavy data processing with techniques like MapReduce. This approach often leads to heavyweight high-latency analytical processes and poor applicability to realtime use cases.
bigdata
algorithm
algorithms
datamining
probability
compsci
via:fourshortlinks
may 2012 by thadk
Statistical analysis and mining of huge multi-terabyte data sets is a common task nowadays, especially in areas like web analytics and Internet advertising. Analysis of such large data sets often requires powerful distributed data stores like Hadoop and heavy data processing with techniques like MapReduce. This approach often leads to heavyweight high-latency analytical processes and poor applicability to realtime use cases. On the other hand, when one is interested only in simple additive metrics like total page views or average price of conversion, it is obvious that raw data can be efficiently summarized, for example, on a daily basis or using simple in-stream counters. Computation of more advanced metrics like the number of unique visitors or the most frequent items is more challenging and requires a lot of resources if implemented straightforwardly. In this article, I provide an overview of probabilistic data structures that allow one to estimate these and many other metrics and to trade the precision of the estimates for memory consumption. These data structures can be used both as temporary data accumulators in query processing procedures and, perhaps more importantly, as a compact – sometimes astonishingly compact – replacement of raw data in stream-based computing.
datastructures
probalistic
bigdata
may 2012 by sstrobel
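The trade of precision for memory described in the excerpt above can be made concrete for the unique-visitor case with a rough Python sketch of a linear counter. The class name LinearCounter, the SHA-256 hash, and the 16384-bit bitmap are assumptions chosen for illustration, not details from the article. Each item sets one bit, and the cardinality is estimated from the fraction of bits still zero as n ≈ -m ln(V).

```python
import hashlib
import math


class LinearCounter:
    """Hypothetical linear counter: a bitmap plus a log-based estimate."""

    def __init__(self, num_bits: int = 16384):
        self.num_bits = num_bits
        self.bitmap = bytearray((num_bits + 7) // 8)

    def add(self, item: str) -> None:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        position = int.from_bytes(digest[:8], "big") % self.num_bits
        self.bitmap[position >> 3] |= 1 << (position & 7)

    def estimate(self) -> float:
        set_bits = sum(bin(byte).count("1") for byte in self.bitmap)
        zero_bits = self.num_bits - set_bits
        if zero_bits == 0:
            # The bitmap is saturated; the estimator is undefined here.
            raise ValueError("bitmap is full; use a larger num_bits")
        # n is approximately -m * ln(V), where V is the fraction of zero bits.
        return -self.num_bits * math.log(zero_bits / self.num_bits)


visitors = LinearCounter()
for visit in range(3):            # every visitor shows up three times
    for i in range(5_000):
        visitors.add(f"user-{i}")

print(round(visitors.estimate()), len(visitors.bitmap), "bytes")
```

In this sketch the bitmap takes 2048 bytes regardless of how many times each visitor is seen, whereas an exact set would have to keep every distinct identifier.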
RT @newsycombinator: Probabilistic Data Structures for Web Analytics and Data Mining
from twitter
may 2012 by ma51ne64
We'll have blog posts on most of these at @Kiip soon, as we use most. Probabilistic Data Structures for Web Analytics http://t.co/LxS9zyfA
from instapaper
may 2012 by indirect
From Pinboard Network RSS Improver http://pipes.yahoo.com/pipes/pipe.info?_id=b22b9c9acee5906aab7e8a7645a247a9: Stream summary, count-min sketches, loglog counting, linear counters. Some nifty algorithms for probabilistic estimation of element frequencies and data-set cardinality (via proggit). Source: http://pinboard.in/
iftttGR
may 2012 by earth2marsh
Stream summary, count-min sketches, loglog counting, linear counters. Some nifty algorithms for probabilistic estimation of element frequencies and data-set cardinality (via proggit)
via:proggit
algorithms
probability
probabilistic
count-min
stream-summary
loglog-counting
linear-counting
estimation
big-data
may 2012 by jm
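Of the structures listed in the note above, the count-min sketch is the one aimed at element frequencies. A minimal Python sketch follows, under stated assumptions: the class name CountMinSketch, the width of 2048 and depth of 4, and the use of salted SHA-256 as the per-row hash are all illustrative choices, not taken from the article. Each item increments one counter per row, and the frequency estimate is the minimum of those counters, so it can overcount on collisions but never undercounts.

```python
import hashlib


class CountMinSketch:
    """Hypothetical count-min sketch for approximate item frequencies."""

    def __init__(self, width: int = 2048, depth: int = 4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _columns(self, item: str):
        # One column per row; the row number acts as a salt for the hash.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row, col in enumerate(self._columns(item)):
            self.rows[row][col] += count

    def estimate(self, item: str) -> int:
        # Collisions only ever inflate a counter, so the minimum is the best guess.
        return min(self.rows[row][col] for row, col in enumerate(self._columns(item)))


sketch = CountMinSketch()
for _ in range(100):
    sketch.add("/home")
sketch.add("/pricing", 5)
for i in range(1_000):
    sketch.add(f"/page/{i}")      # background traffic, one hit each

print(sketch.estimate("/home"), sketch.estimate("/pricing"))
```

Memory is fixed at width times depth counters no matter how many distinct items pass through, which is the property the bookmarked article's summaries emphasize.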
tags
(source : @data_mining @web_scraping algorithm algorithms algos analytics architecture bestof bestof2012 big-data bigdata bloom bloomflilter cardinality compsci computerscience count-min counting data-mining data-stream data-structure data datamining datastructure datastructures ds e estimation filter frequency from:instapaper hacking https://twitter.com/datajunkie/status/280197969427451906) ifttt iftttgr later linear-counting loglog-counting machine-learning machinelearning math memory_footprint metrics ml prob probabilistic probabilisticcomputing probabilistic_data_structures_analytics_machine-learning probability probalistic processing programming random referen research scalability scale sent-weekly software statistics stats stream-summary stream streaming structures summarization todo toread twitter twitter_analysis via:fourshortlinks via:hacker-news via:hackernews via:popular via:proggit website work