jm + graphite   25

Raintank investing in Graphite
paying Jason Dixon to work on it, improving the backend, possibly replacing the creaky Whisper format. great news!
graphite  metrics  monitoring  ops  open-source  grafana  raintank 
july 2016 by jm
Nobody Loves Graphite Anymore - VividCortex
Graphite has a place in our current monitoring stack, and together with StatsD will always have a special place in the hearts of DevOps practitioners everywhere, but it’s not representative of state-of-the-art in the last few years. Graphite is where the puck was in 2010. If you’re skating there, you’re missing the benefits of modern monitoring infrastructure.

The future I foresee is one where time series capabilities (the raw power needed, which I described in my time series requirements blog post, for example) are within everyone’s reach. That will be considered table stakes, whereas now it’s pretty revolutionary.


Like I've been saying -- we need Time Series As A Service! This should be undifferentiated heavy lifting.
graphite  tsd  time-series  vividcortex  statsd  ops  monitoring  metrics 
november 2015 by jm
outbrain/gruffalo
an asynchronous Netty based graphite proxy. It protects Graphite from the herds of clients by minimizing context switches and interrupts; by batching and aggregating metrics. Gruffalo also allows you to replicate metrics between Graphite installations for DR scenarios, for example.

Gruffalo can easily handle a massive amount of traffic, and thus increase your metrics delivery system availability. At Outbrain, we currently handle over 1700 concurrent connections, and over 2M metrics per minute per instance.
graphite  backpressure  metrics  outbrain  netty  proxies  gruffalo  ops 
april 2015 by jm
Time Series Metrics with Cassandra
slides from Chris Maxwell of Ubiquiti Networks describing what he had to do to get cyanite on Cassandra handling 30k metrics per second; an experimental "Date-tiered compaction" mode from Spotify was essential from the sounds of it. Very complex :(
cassandra  spotify  date-tiered-compaction  metrics  graphite  cyanite  chris-maxwell  time-series-data 
april 2015 by jm
Heka
an open source stream processing software system developed by Mozilla. Heka is a “Swiss Army Knife” type tool for data processing, useful for a wide variety of different tasks, such as:

Loading and parsing log files from a file system.
Accepting statsd type metrics data for aggregation and forwarding to upstream time series data stores such as graphite or InfluxDB.
Launching external processes to gather operational data from the local system.
Performing real time analysis, graphing, and anomaly detection on any data flowing through the Heka pipeline.
Shipping data from one location to another via the use of an external transport (such as AMQP) or directly (via TCP).
Delivering processed data to one or more persistent data stores.


Via feylya on twitter. Looks potentially nifty
heka  mozilla  monitoring  metrics  via:feylya  ops  statsd  graphite  stream-processing 
march 2015 by jm
One year of InfluxDB and the road to 1.0
half of the [Monitorama] attendees were employees and entrepreneurs at monitoring, metrics, DevOps, and server analytics companies. Most of them had a story about how their metrics API was their key intellectual property that took them years to develop. The other half of the attendees were developers at larger organizations that were rolling their own DevOps stack from a collection of open source tools. Almost all of them were creating a “time series database” with a bunch of web services code on top of some other database or just using Graphite. When everyone is repeating the same work, it’s not key intellectual property or a differentiator, it’s a barrier to entry. Not only that, it’s something that is hindering innovation in this space since everyone has to spend their first year or two getting to the point where they can start building something real. It’s like building a web company in 1998. You have to spend millions of dollars and a year building infrastructure, racking servers, and getting everything ready before you could run the application. Monitoring and analytics applications should not be like this.
graphite  monitoring  metrics  tsd  time-series  analytics  influxdb  open-source 
february 2015 by jm
pcp2graphite
A gateway script, now included in PCP
pcp2graphite  pcp  graphite  ops  metrics  system 
february 2015 by jm
Backstage Blog - Prometheus: Monitoring at SoundCloud - SoundCloud Developers
whoa, this is pretty excellent. The major improvement over a graphite-based system would be the multi-dimensional tagging of metrics, which we currently have to do by simply expanding the graphite metric's name to encompass all those dimensions and use searching at query time, inefficiently.
monitoring  soundcloud  prometheus  metrics  service-metrics  graphite  alerting 
february 2015 by jm
carbon-c-relay
A much better carbon-relay, written in C rather than Python. Linking as we've been using it in production for quite a while with no problems.
The main reason to build a replacement is performance and configurability. Carbon is single threaded, and sending metrics to multiple consistent-hash clusters requires chaining of relays. This project provides a multithreaded relay which can address multiple targets and clusters for each and every metric based on pattern matches.
graphite  carbon  c  python  ops  metrics 
january 2015 by jm
PDX DevOps Graphite replacement
Replacing graphite with InfluxDB, Riemann and Grafana. Not quite there yet, looks like
influxdb  graphite  ops  metrics  riemann  grafana  slides 
december 2014 by jm
Carbon vs Megacarbon and Roadmap ? · Issue #235 · graphite-project/carbon
Carbon is a great idea, but fundamentally, twisted doesn't do what carbon-relay or carbon-aggregator were built to do when hit with sustained and heavy throughput. Much to my chagrin, concurrency isn't one of python's core competencies.


+1, sadly. We are patching around the edges with half-released third-party C rewrites in our graphite setup, as we exceed the scale Carbon can support.
carbon  graphite  metrics  ops  python  twisted  scalability 
october 2014 by jm
Metrics-Driven Development
we believe MDD is equal parts engineering technique and cultural process. It separates the notion of monitoring from its traditional position of exclusivity as an operations thing and places it more appropriately next to its peers as an engineering process. Provided access to real-time production metrics relevant to them individually, both software engineers and operations engineers can validate hypotheses, assess problems, implement solutions, and improve future designs.


Broken down into the following principles: 'Instrumentation-as-Code', 'Single Source of Truth', 'Developers Curate Visualizations and Alerts', 'Alert on What You See', 'Show me the Graph', 'Don’t Measure Everything (YAGNI)'.

We do all of these at Swrve, naturally (a technique I happily stole from Amazon).
metrics  coding  graphite  mdd  instrumentation  yagni  alerting  monitoring  graphs 
july 2014 by jm
Tessera
Urban Airship with a new open-source Graphite front-end UI; similar enough to Grafana at a glance, no releases yet, ASL2-licensed
graphite  metrics  ui  front-ends  open-source  ops 
july 2014 by jm
Extending graphite’s mileage
Ad company InMobi are using graphite heavily (albeit not as heavily as $work are), ran into the usual scaling issues, and chose to fix it in code by switching from a filesystem full of whisper files to a LevelDB per carbon-cache:
The carbon server is now able to run without breaking a sweat even when 500K metrics per minute is being pumped into it. This has been in production since late August 2013 in every datacenter that we operate from.


Very nice. I hope this gets merged/supported.
graphite  scalability  metrics  leveldb  storage  inmobi  whisper  carbon  open-source 
january 2014 by jm
Cyanite
a metric storage daemon, exposing both a carbon listener and a simple web service. Its aim is to become a simple, scalable and drop-in replacement for graphite's backend.


Pretty alpha for now, but definitely worth keeping an eye on to potentially replace our burgeoning Carbon fleet...
graphite  carbon  cassandra  storage  metrics  ops  graphs  service-metrics 
december 2013 by jm
Statsite
A C reimplementation of Etsy's statsd, with some interesting memory optimizations.
Statsite is designed to be both highly performant, and very flexible. To achieve this, it implements the stats collection and aggregation in pure C, using libev to be extremely fast. This allows it to handle hundreds of connections, and millions of metrics. After each flush interval expires, statsite performs a fork/exec to start a new stream handler invoking a specified application. Statsite then streams the aggregated metrics over stdin to the application, which is free to handle the metrics as it sees fit. This allows statsite to aggregate metrics and then ship metrics to any number of sinks (Graphite, SQL databases, etc). There is an included Python script that ships metrics to graphite.
statsd  graphite  statsite  performance  statistics  service-metrics  metrics  ops 
november 2013 by jm
vimeo/timeserieswidget
'Plugin to make highly interactive graphite graph objects ((i.e. graphs where you can interactively toggle on/off individual series, inspect datapoints, zoom in realtime, etc) Supports Flot (canvas), Rickshaw (svg) and standard graphite png images (in case you're nostalgic and don't like interactivity).'
graphs  graphing  graphite  dataviz  flot  rickshaw  svg  canvas  javascript 
september 2013 by jm
DevOps Eye for the Coding Guy: Metrics
a pretty good description of the process of adding service metrics to a Django webapp using graphite and statsd. Bookmarking mainly for the great real-time graphing hack at the end...
statsd  django  monitoring  metrics  python  graphite 
september 2013 by jm
graphite-metrics
metric collectors for various stuff not (or poorly) handled by other monitoring daemons

Core of the project is a simple daemon (harvestd), which collects metric values and sends them to graphite carbon daemon (and/or other configured destinations) once per interval. Includes separate data collection components ("collectors") for processing of:

/proc/slabinfo for useful-to-watch values, not everything (configurable).
/proc/vmstat and /proc/meminfo in a consistent way.
/proc/stat for irq, softirq, forks.
/proc/buddyinfo and /proc/pagetypeinfo (memory fragmentation).
/proc/interrupts and /proc/softirqs.
Cron log to produce start/finish events and duration for each job into a separate metrics, adapts jobs to metric names with regexes.
Per-system-service accounting using systemd and it's cgroups.
sysstat data from sadc logs (use something like sadc -F -L -S DISK -S XDISK -S POWER 60 to have more stuff logged there) via sadf binary and it's json export (sadf -j, supported since sysstat-10.0.something, iirc).
iptables rule "hits" packet and byte counters, taken from ip{,6}tables-save, mapped via separate "table chain_name rule_no metric_name" file, which should be generated along with firewall rules (I use this script to do that).


Pretty exhaustive list of system metrics -- could have some interesting ideas for Linux OS-level metrics to monitor in future.
graphite  monitoring  metrics  unix  linux  ops  vm  iptables  sysadmin 
june 2013 by jm
Care and Feeding of Large Scale Graphite Installations [slides]
good docs for large-scale graphite use: 'Tip and tricks of using and scaling graphite. First presented at DevOpsDays Austin Texas 2013-05-01'
graphite  devops  ops  metrics  dashboards  sysadmin 
june 2013 by jm
Monitoring Apache Hadoop, Cassandra and Zookeeper using Graphite and JMXTrans
nice enough, but a lot of moving parts. It would be nice to see a simpler ZK+Graphite setup using the 'mntr' verb
graphite  monitoring  ops  zookeeper  cassandra  hadoop  jmx  jmxtrans  graphs 
march 2013 by jm
Unhelpful Graphite Tips
10 particularly good -- actually helpful -- tips on using the Graphite metric graphing system
graphite  ops  metrics  service-metrics  graphing  ui  dataviz 
february 2013 by jm
check_graphite
'a Nagios plugin to poll Graphite'. Necessary, since service metrics are the true source of service health information
nagios  graphite  service-metrics  ops 
january 2013 by jm

Copy this bookmark:



description:


tags: