jm + load-testing   10

Gil Tene on benchmarking
'I would strongly encourage you to avoid repeating the mistakes of testing methodologies that focus entirely on max achievable throughput and then report some (usually bogus) latency stats at those max throughout modes. The techempower numbers are a classic example of this in play, and while they do provide some basis for comparing a small aspect of behavior (what I call the "how fast can this thing drive off a cliff" comparison, or "pedal to the metal" testing), those results are not very useful for comparing load carrying capacities for anything that actually needs to maintain some form of responsiveness SLA or latency spectrum requirements.'

Some excellent advice here on how to measure and represent stack performance.

Also: 'DON'T use or report standard deviation for latency. Ever. Except if you mean it as a joke.'
performance  benchmarking  testing  speed  gil-tene  latency  measurement  hdrhistogram  load-testing  load 
april 2016 by jm
Benchmarking GitHub Enterprise - GitHub Engineering
Walkthrough of debugging connection timeouts in a load test. Nice graphs (using matplotlib)
github  listen-backlog  tcp  debugging  timeouts  load-testing  benchmarking  testing  ops  linux 
july 2015 by jm
Lambda: Bees with Frickin' Laser Beams
a HTTP testing tool in AWS Lambda. nice enough, but still a toy...
lambda  aws  node  javascript  hacks  http  load-testing 
may 2015 by jm
'A constant throughput, correct latency-recording variant of wrk. This is a must-have when measuring network service latency -- corrects for Coordinated Omission error:
wrk's model, which is similar to the model found in many current load generators, computes the latency for a given request as the time from the sending of the first byte of the request to the time the complete response was received. While this model correctly measures the actual completion time of individual requests, it exhibits a strong Coordinated Omission effect, through which most of the high latency artifacts exhibited by the measured server will be ignored. Since each connection will only begin to send a request after receiving a response, high latency responses result in the load generator coordinating with the server to avoid measurement during high latency periods.
wrk  latency  measurement  tools  cli  http  load-testing  testing  load-generation  coordinated-omission  gil-tene 
november 2014 by jm
Load testing Apache Kafka on AWS
This is a very solid benchmarking post, examining Kafka in good detail. Nicely done. Bottom line:
I basically spend 2/3 of my work time torture testing and operationalizing distributed systems in production. There's some that I'm not so pleased with (posts pending in draft forever) and some that have attributes that I really love. Kafka is one of those systems that I pretty much enjoy every bit of, and the fact that it performs predictably well is only a symptom of the reason and not the reason itself: the authors really know what they're doing. Nothing about this software is an accident. Performance, everything in this post, is only a fraction of what's important to me and what matters when you run these systems for real. Kafka represents everything I think good distributed systems are about: that thorough and explicit design decisions win.
testing  aws  kafka  ec2  load-testing  benchmarks  performance 
october 2014 by jm
REST Commander: Scalable Web Server Management and Monitoring
We dynamically monitor and manage a large and rapidly growing number of web servers deployed on our infrastructure and systems. However, existing tools present major challenges when making REST/SOAP calls with server-specific requests to a large number of web servers, and then performing aggregated analysis on the responses. We therefore developed REST Commander, a parallel asynchronous HTTP client as a service to monitor and manage web servers. REST Commander on a single server can send requests to thousands of servers with response aggregation in a matter of seconds. And yes, it is open-sourced at

Feature highlights:

Click-to-run with zero installation;
Generic HTTP request template supporting variable-based replacement for sending server-specific requests;
Ability to send the same request to different servers, different requests to different servers, and different requests to the same server;
Maximum concurrency control (throttling) to accommodate server capacity;
Commander itself is also “as a service”: with its powerful REST API, you can define ad-hoc target servers, an HTTP request template, variable replacement, and a regular expression all in a single call. In addition, intuitive step-by-step wizards help you achieve the same functionality through a GUI.
rest  http  clients  load-testing  ebay  soap  async  testing  monitoring 
july 2014 by jm
Flood IO Offering Network Emulation
Performance-testing-as-a-service company Flood.IO now offering emulation of various crappy end-user networks: GSM, DSL, GPRS, 3G, 4G etc. Great idea.  performance  networking  internet  load-testing  testing  jmeter  gatling  tests  gsm  3g  mobile  simulation 
april 2014 by jm
a modern HTTP benchmarking tool capable of generating significant load when run on a single multi-core CPU. It combines a multithreaded design with scalable event notification systems such as epoll and kqueue.  An optional LuaJIT script can perform HTTP request generation, response processing, and custom reporting.

Written in C, ASL2 licensed.
wrk  benchmarking  http  performance  testing  lua  load-testing  load-generation 
december 2013 by jm
Coordinated Omission
Gil Tene raises an extremely good point about load testing, high-percentile response-time measurement, and behaviour when testing a system under load:

I've been harping for a while now about a common measurement technique problem I call "Coordinated Omission" for a while, which can often render percentile data useless. [...] I believe that this problem occurs extremely frequently in test results, but it's usually hard to deduce it's existence purely from the final data reported. But every once in a while, I see test results where the data provided is enough to demonstrate the huge percentile-misreporting effect of Coordinated Omission based purely on the summary report.

I ran into just such a case in Attila's cool posting about log4j2's truly amazing performance, so I decided to avoid polluting his thread with an elongated discussion of how to compute 99.9%'ile data, and started this topic here. That thread should really be about how cool log4j2 is, and I'm certain that it really is cool, even after you correct the measurements. [...] Basically, I think that the 99.99% observation computation is wrong, and demonstrably (using the data in the graph data posted) exhibits the classic "coordinated omission" measurement problem I've been preaching about. This test is not alone in exhibiting this, and there is nothing to be ashamed of when you find yourself making this mistake. I only figured it out after doing it myself many many times, and then I noticed that everyone else seems to also be doing it but most of them haven't yet figured it out. In fact, I run into this issue so often in percentile reporting and load testing that I'm starting to wonder if coordinated omission is there in 99.9% of latency tests ;-)
measurement  testing  latency  load-testing  gil-tene  coordinated-omission  validity  log4j  percentiles 
august 2013 by jm
Percona Playback's tcpdump plugin
Capture MySQL traffic via tcpdump, tee it over the network to replay against a second database. Even supports query execution times and pauses between queries to playback the same load level
tcpdump  production  load-testing  testing  staging  tee  networking  netcat  percona  replay  mysql 
march 2013 by jm

Copy this bookmark: