jm + graphs   28

ASAP: Automatic Smoothing for Attention Prioritization in Streaming Time Series Visualization
Peter Bailis strikes again.

'Time series visualization of streaming telemetry (i.e., charting of
key metrics such as server load over time) is increasingly prevalent
in recent application deployments. Existing systems simply plot the
raw data streams as they arrive, potentially obscuring large-scale
deviations due to local variance and noise. We propose an alternative:
to better prioritize attention in time series exploration and
monitoring visualizations, smooth the time series as much as possible
to remove noise while still retaining large-scale structure. We
develop a new technique for automatically smoothing streaming
time series that adaptively optimizes this trade-off between noise
reduction (i.e., variance) and outlier retention (i.e., kurtosis). We
introduce metrics to quantitatively assess the quality of the choice
of smoothing parameter and provide an efficient streaming analytics
operator, ASAP, that optimizes these metrics by combining techniques
from stream processing, user interface design, and signal
processing via a novel autocorrelation-based pruning strategy and
pixel-aware preaggregation. We demonstrate that ASAP is able to
improve users’ accuracy in identifying significant deviations in time
series by up to 38.4% while reducing response times by up to 44.3%.
Moreover, ASAP delivers these results several orders of magnitude
faster than alternative optimization strategies.'
dataviz  graphs  metrics  peter-bailis  asap  smoothing  aggregation  time-series  tsd 
8 days ago by jm
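A rough sketch of the trade-off the ASAP abstract above describes, assuming a plain moving average as the smoother and a brute-force search over window sizes. The function names, the roughness measure (standard deviation of first differences) and the `max_window` bound are my own assumptions for illustration. This is not the ASAP operator itself, which per the abstract prunes the search using autocorrelation and preaggregates to pixel resolution.

```python
import statistics

def moving_average(xs, w):
    """Simple moving average with window w; returns len(xs) - w + 1 points."""
    total = sum(xs[:w])
    out = [total / w]
    for i in range(w, len(xs)):
        total += xs[i] - xs[i - w]
        out.append(total / w)
    return out

def roughness(xs):
    """Standard deviation of first differences: lower means a smoother line."""
    diffs = [b - a for a, b in zip(xs, xs[1:])]
    return statistics.pstdev(diffs)

def kurtosis(xs):
    """Fourth standardized moment: higher means more pronounced outliers."""
    mean = statistics.fmean(xs)
    var = statistics.pvariance(xs, mu=mean)
    if var == 0:
        return 0.0
    return sum((x - mean) ** 4 for x in xs) / (len(xs) * var ** 2)

def pick_window(xs, max_window):
    """Pick the window that minimises roughness without lowering kurtosis."""
    if len(xs) < 3:
        return 1
    base_kurtosis = kurtosis(xs)
    best_w, best_rough = 1, roughness(xs)
    for w in range(2, max_window + 1):
        if len(xs) - w + 1 < 3:
            break  # too few points left to measure roughness
        smoothed = moving_average(xs, w)
        # Only accept windows that keep the large-scale deviations visible.
        if kurtosis(smoothed) >= base_kurtosis:
            r = roughness(smoothed)
            if r < best_rough:
                best_w, best_rough = w, r
    return best_w
```

Usage would be something like `moving_average(series, pick_window(series, 50))` before plotting. The exhaustive search is the slow part, which is presumably what the paper's pruning strategy is there to avoid.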
Online chart maker for CSV and Excel data; make charts and dashboards online. One really nice feature is that charts made this way get permalinks, and can be easily inlined as PNGs or HTML5 divs. (See for an example.)
data  javascript  python  tools  visualization  dataviz  charts  graphing  web  plotly  plots  graphs 
january 2016 by jm
Sorting out graph processing
Some nice real-world experimentation around large-scale data processing in differential dataflow:
If you wanted to do an iterative graph computation like PageRank, it would literally be faster to sort the edges from scratch each and every iteration, than to use unsorted edges. If you want to do graph computation, please sort your edges.

Actually, you know what: if you want to do any big data computation, please sort your records. Stop talking sass about how Hadoop sorts things it doesn't need to, read some papers, run some tests, and then sort your damned data. Or at least run faster than me when I sort your data for you.
algorithms  graphs  coding  data-processing  big-data  differential-dataflow  radix-sort  sorting  x-stream  counting-sort  pagerank 
august 2015 by jm
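To make the "sort your edges" advice in the graph processing entry above concrete, here is a minimal PageRank sketch that sorts the edge list once and then walks it sequentially on every iteration. The function name, parameters and damping factor are assumptions for illustration; the original experiments were done in differential dataflow, not Python, and this toy version says nothing about the performance gap described there.

```python
def pagerank(edges, num_nodes, iterations=20, damping=0.85):
    """Iterative PageRank over a list of (src, dst) pairs.

    Nodes are integers in range(num_nodes). Rank held by nodes with no
    outgoing edges is simply dropped, to keep the sketch short.
    """
    # Sort once by (src, dst) so each pass reads ranks[src] in order.
    edges = sorted(edges)

    out_degree = [0] * num_nodes
    for src, _ in edges:
        out_degree[src] += 1

    ranks = [1.0 / num_nodes] * num_nodes
    for _ in range(iterations):
        next_ranks = [(1.0 - damping) / num_nodes] * num_nodes
        for src, dst in edges:
            next_ranks[dst] += damping * ranks[src] / out_degree[src]
        ranks = next_ranks
    return ranks
```

In a compiled implementation the sorted order means all of a source node's edges are adjacent in memory, so the rank lookup for each source happens once per run of edges rather than at random.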
Really nice time series dashboarding app. Might consider replacing graphitus with this...
time-series  data  visualisation  graphs  ops  dashboards  facette 
january 2015 by jm
Charted is a tool for automatically visualizing data, created by the Product Science team at Medium. Give it the link to a data file and Charted returns a beautiful, shareable chart of the data.

Nice, but it's no graphite -- pretty basic.
charted  graphs  charts  ui  open-source  medium 
november 2014 by jm
Metrics-Driven Development
we believe MDD is equal parts engineering technique and cultural process. It separates the notion of monitoring from its traditional position of exclusivity as an operations thing and places it more appropriately next to its peers as an engineering process. Provided access to real-time production metrics relevant to them individually, both software engineers and operations engineers can validate hypotheses, assess problems, implement solutions, and improve future designs.

Broken down into the following principles: 'Instrumentation-as-Code', 'Single Source of Truth', 'Developers Curate Visualizations and Alerts', 'Alert on What You See', 'Show me the Graph', 'Don’t Measure Everything (YAGNI)'.

We do all of these at Swrve, naturally (a technique I happily stole from Amazon).
metrics  coding  graphite  mdd  instrumentation  yagni  alerting  monitoring  graphs 
july 2014 by jm
Spark Streaming
an extension of the core Spark API that enables high-throughput, fault-tolerant stream processing of live data streams. Data can be ingested from many sources like Kafka, Flume, Twitter, ZeroMQ or plain old TCP sockets and be processed using complex algorithms expressed with high-level functions like map, reduce, join and window. Finally, processed data can be pushed out to filesystems, databases, and live dashboards. In fact, you can apply Spark’s in-built machine learning algorithms, and graph processing algorithms on data streams.
spark  streams  stream-processing  cep  scalability  apache  machine-learning  graphs 
may 2014 by jm
Traffic Graph – Google Transparency Report
this is cool. Google are exposing an aggregated 'all services' hit count time-series graph, broken down by country, as part of their Transparency Report pages
transparency  filtering  web  google  http  graphs  monitoring  syria 
february 2014 by jm
How to Name a Baby
some good data (and graphs) on baby names (via Ruth)
via:ruth  babies  naming  graphs  dataviz  data  usa  names 
january 2014 by jm
a metric storage daemon, exposing both a carbon listener and a simple web service. Its aim is to become a simple, scalable and drop-in replacement for graphite's backend.

Pretty alpha for now, but definitely worth keeping an eye on to potentially replace our burgeoning Carbon fleet...
graphite  carbon  cassandra  storage  metrics  ops  graphs  service-metrics 
december 2013 by jm
There is NO spare capacity for Dublin's water supply
The problem in a nutshell is that for an uncomfortable amount of the year the demand outstrips what the system can comfortably supply. In the graph below you’ll see the red line (demand for water) matches and regularly exceeds the blue line (what’s produced).
drought  water  dublin  mismanagement  capacity  dcc  dublin-council  graphs 
november 2013 by jm
'Plugin to make highly interactive graphite graph objects (i.e. graphs where you can interactively toggle on/off individual series, inspect datapoints, zoom in realtime, etc). Supports Flot (canvas), Rickshaw (svg) and standard graphite png images (in case you're nostalgic and don't like interactivity).'
graphs  graphing  graphite  dataviz  flot  rickshaw  svg  canvas  javascript 
september 2013 by jm
gnuplot's dumb terminal
Turns out gnuplot has a pretty readable ASCII terminal rendering mode; combined with 'watch' it makes for a nifty graphing one-liner
gnuplot  plotting  charts  graphs  cli  command-line  unix  gnu  hacks  dataviz  visualization  ascii 
june 2013 by jm
Accuweather long-range forecast accuracy questionable
"questionable" is putting it mildly:

Now to the point: Are the 25-day forecasts any good? In a word, no. Specifically, after running this data, I would not trust a forecast high temperature more than a week out. I’d rather look at the normal (historical average) temperature for that day than the forecast. Similarly, I would not even look at a precipitation forecast more than 6 days in advance, and I wouldn’t start to trust it for anything important until about 3 days ahead of time.
accuweather  accuracy  fail  graphs  data  weather  forecasting  philadelphia 
june 2013 by jm
Paper: "Root Cause Detection in a Service-Oriented Architecture" [pdf]
LinkedIn have implemented an automated root-cause detection system:

This paper introduces MonitorRank, an algorithm that can reduce the time, domain knowledge, and human effort required to find the root causes of anomalies in such service-oriented architectures. In the event of an anomaly, MonitorRank provides a ranked order list of possible root causes for monitoring teams to investigate. MonitorRank uses the historical and current time-series metrics of each sensor as its input, along with the call graph generated between sensors to build an unsupervised model for ranking. Experiments on real production outage data from LinkedIn, one of the largest online social networks, show a 26% to 51% improvement in mean average precision in finding root causes compared to baseline and current state-of-the-art methods.

This is a topic close to my heart after working on something similar for 3 years in Amazon!

Looks interesting, although (a) I would have liked to see more case studies and examples of "real world" outages it helped with; and (b) it's very much a machine-learning paper rather than a systems one, and there is no discussion of fault tolerance in the design of the detection system, which would leave me worried that in the case of a large-scale outage event, the system itself will disappear when its help is most vital. (This was a major design influence on our team's work.)

Overall, particularly given those 2 issues, I suspect it's not in production yet. Ours certainly was ;)
linkedin  soa  root-cause  alarming  correlation  service-metrics  machine-learning  graphs  monitoring 
june 2013 by jm
Introducing Kale « Code as Craft
Etsy have implemented a tool to perform auto-correlation of service metrics, and detection of deviation from historic norms:
at Etsy, we really love to make graphs. We graph everything! Anywhere we can slap a StatsD call, we do. As a result, we’ve found ourselves with over a quarter million distinct metrics. That’s far too many graphs for a team of 150 engineers to watch all day long! And even if you group metrics into dashboards, that’s still an awful lot of dashboards if you want complete coverage. Of course, if a graph isn’t being watched, it might misbehave and no one would know about it. And even if someone caught it, lots of other graphs might be misbehaving in similar ways, and chances are low that folks would make the connection.

We’d like to introduce you to the Kale stack, which is our attempt to fix both of these problems. It consists of two parts: Skyline and Oculus. We first use Skyline to detect anomalous metrics. Then, we search for that metric in Oculus, to see if any other metrics look similar. At that point, we can make an informed diagnosis and hopefully fix the problem.

It'll be interesting to see if they can get this working well. I've found it can be tricky to get working with low false positives, without massive volume to "smooth out" spikes caused by normal activity. Amazon had one particularly successful version driving severity-1 order drop alarms, but it used massive event volumes and still had periodic false positives. Skyline looks like it will alarm on a single anomalous data point, and in the comments Abe notes "our algorithms err on the side of noise and so alerting would be very noisy."
etsy  monitoring  service-metrics  alarming  deviation  correlation  data  search  graphs  oculus  skyline  kale  false-positives 
june 2013 by jm
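The false-positive worry in the Kale entry above can be illustrated with a minimal deviation-from-history check. This is a plain three-sigma test, not Skyline's actual algorithm; the function names and the threshold are assumptions for illustration only.

```python
import statistics

def is_anomalous(history, latest, threshold=3.0):
    """True if `latest` is more than `threshold` standard deviations from
    the mean of `history` (a simple three-sigma rule)."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return latest != mean
    return abs(latest - mean) > threshold * stdev

def sustained_anomaly(history, recent, threshold=3.0):
    """Alarm only if every one of the most recent points is anomalous,
    trading detection latency for fewer false positives."""
    return bool(recent) and all(
        is_anomalous(history, x, threshold) for x in recent
    )
```

`is_anomalous` fires on a single spike, which is the noisy behaviour described in the note above. `sustained_anomaly` is one cheap way to damp that on low-volume metrics, at the cost of alarming later.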
21 graphs that show America’s health-care prices are ludicrous
Excellent data, this. I'd heard a few of these prices, but these graphs really hit home. $26k for a caesarean section at the 95th percentile!? Talk about out-of-control price gouging.
healthcare  costs  economics  us-politics  world  comparison  graphs  charts  data  via:hn  america 
march 2013 by jm
Monitoring Apache Hadoop, Cassandra and Zookeeper using Graphite and JMXTrans
nice enough, but a lot of moving parts. It would be nice to see a simpler ZK+Graphite setup using the 'mntr' verb
graphite  monitoring  ops  zookeeper  cassandra  hadoop  jmx  jmxtrans  graphs 
march 2013 by jm
Network graph viz of Irish politicians and organisations on Twitter
generated by the Clique Research Cluster at UCD and DERI. 'a visualization of the unified graph representation for the users in the data, produced using Gephi and sigma.js. Users are coloured according to their community (i.e. political affiliation). The size of each node is proportional to its in-degree (i.e. number of incoming links).' sigma.js provides a really user-friendly UI to the graphs, although -- as with most current graph visualisations -- it'd be particularly nice if it was possible to 'tease out' and focus on interesting nodes, and get a pasteable URL of the result, in context. Still, the most usable graph viz I've seen in a while...
graphs  dataviz  ucd  research  ireland  twitter  networks  community  sigma.js  javascript  canvas  gephi 
january 2013 by jm
"big data, small machine" -- perform computation on very large graphs using an algorithm they're calling Parallel Sliding Windows. similar to Google's Pregel, apparently
graphs  graphchi  big-data  algorithms  parallel 
july 2012 by jm
Floyd–Warshall algorithm - Wikipedia, the free encyclopedia
"a graph analysis algorithm for finding shortest paths in a weighted graph (with positive or negative edge weights)".
graphs  algorithms  k-shortest-paths 
june 2012 by jm
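For reference, a minimal sketch of the Floyd–Warshall algorithm from the entry above, computing all-pairs shortest distances. The function signature and the edge-list input format are assumptions for illustration, and the sketch assumes the graph has no negative cycles.

```python
import math

def floyd_warshall(num_nodes, edges):
    """All-pairs shortest path distances.

    `edges` is a list of (u, v, weight) tuples for a directed graph with
    nodes numbered 0..num_nodes-1. Negative weights are fine as long as
    there is no negative cycle. Unreachable pairs stay at math.inf.
    """
    dist = [[math.inf] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        dist[i][i] = 0
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)  # keep the cheapest parallel edge

    # Allow each node k in turn as an intermediate hop.
    for k in range(num_nodes):
        for i in range(num_nodes):
            for j in range(num_nodes):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist
```

The triple loop makes this O(n^3) in the number of vertices, so it suits small dense graphs rather than anything large.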
"K* : A Directed On-The-Fly Algorithm for Finding the k Shortest Paths", Husain Aljazzar and Stefan Leue, 2008
"We present a new algorithm, called K*, for finding the k shortest paths between a designated pair of vertices in a given directed weighted graph. Compared to Eppstein’s algorithm, which is the most prominent algorithm for solving this problem, K* has two advantages. First, K* performs on-the-fly, which means that
it does not require the graph to be explicitly available and stored in main memory. Portions of the graph will
be generated as needed. Second, K* is a directed algorithm which enables the use of heuristic functions
to guide the search. This leads to significant improvements in the memory and runtime demands for many
practical problem instances. We prove the correctness of K* and show that it maintains a worst-case runtime
complexity of O(m+k n log(k n)) and a space complexity of O(k n + m), where n is the number of vertices
and m is the number of edges of the graph. We provide experimental results which illustrate the scalability of
the algorithm."
graphs  k-shortest-paths  algorithms  papers 
june 2012 by jm
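To make the k-shortest-paths problem from the K* entry above concrete, here is a naive best-first search that returns the k cheapest routes between two vertices. It is neither K* nor Eppstein's algorithm: it uses no heuristic, stores every partial path in full, and assumes non-negative edge weights. As in Eppstein's formulation, returned paths may revisit vertices. The function name and the adjacency-dict input format are assumptions for illustration.

```python
import heapq

def k_shortest_paths(graph, source, target, k):
    """Return up to k (cost, path) pairs from source to target, cheapest first.

    `graph` maps each node to a list of (neighbour, weight) pairs with
    non-negative weights. Node labels must be mutually comparable so that
    cost ties in the heap can be broken by comparing paths.
    """
    found = []
    pops = {}  # how many times each node has been taken off the heap
    heap = [(0, [source])]
    while heap and len(found) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        pops[node] = pops.get(node, 0) + 1
        if pops[node] > k:
            continue  # k cheaper routes to this node were already expanded
        if node == target:
            found.append((cost, path))
        for neighbour, weight in graph.get(node, []):
            heapq.heappush(heap, (cost + weight, path + [neighbour]))
    return found
```

Each node is expanded at most k times, which bounds the work, but the heap of full paths is exactly the kind of memory cost the paper's on-the-fly approach is meant to reduce.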
Open Data Structures
A free-as-in-speech as well as -beer textbook of data structures, covering a great range, including some I hadn't heard of before. Here's the full list: ArrayStack, FastArrayStack, ArrayQueue, ArrayDeque, DualArrayDeque, RootishArrayStack, SLList, DLList, SEList, SkiplistSSet, SkiplistList, ChainedHashTable, LinearHashTable, BinaryTree, BinarySearchTree, Treap, ScapegoatTree, RedBlackTree, BinaryHeap, MeldableHeap, AdjacencyMatrix, AdjacencyLists, BinaryTrie, XFastTrie, and YFastTrie
algorithms  books  data-structures  computer-science  coding  tries  skiplists  arrays  queues  heap  trees  graphs  hashtables 
may 2012 by jm
sparklines in your terminal window. Simply give it a comma or space-separated list of data values, and it'll generate an ANSI-graphics sparkline chart. Brilliant! (via mjd)
via:mjdominus  sparklines  charts  graphs  bash  shell  terminal  cli  ansi 
december 2011 by jm
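The terminal sparkline idea in the entry above is easy to sketch: scale each value into one of eight Unicode block characters. This is an illustration of the technique, not the bookmarked tool's code; the function names are my own, and the choice of block characters is an assumption.

```python
BARS = "▁▂▃▄▅▆▇█"

def parse_values(text):
    """Parse a comma- or space-separated list of numbers."""
    return [float(token) for token in text.replace(",", " ").split()]

def sparkline(values):
    """Render numbers as a one-line chart of Unicode block characters."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return BARS[0] * len(values)  # flat series: all bars at minimum
    top = len(BARS) - 1
    return "".join(BARS[int((v - lo) / span * top)] for v in values)
```

For example, `print(sparkline(parse_values("1,5,22,13,53")))` prints a five-character chart, with the smallest value mapped to the lowest bar and the largest to the full block.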
dygraphs JavaScript Visualization Library
'an open source JavaScript library that produces interactive, zoomable charts of time series. It is designed to display dense data sets and enable users to explore and interpret them.' quite pretty
time-series  data  tsd  graphs  charts  javascript  via:reddit  dataviz  visualization  opensource  dygraphs  from delicious
december 2009 by jm

