jm + coordinated-omission   5

Correcting YCSB's Coordinated Omission problem
excellent walkthrough of CO and how it affects Yahoo!'s Cloud Storage Benchmarking platform
coordinated-omission  co  yahoo  ycsb  benchmarks  performance  testing 
march 2015 by jm
'A constant throughput, correct latency-recording variant of wrk. This is a must-have when measuring network service latency -- corrects for Coordinated Omission error:
wrk's model, which is similar to the model found in many current load generators, computes the latency for a given request as the time from the sending of the first byte of the request to the time the complete response was received. While this model correctly measures the actual completion time of individual requests, it exhibits a strong Coordinated Omission effect, through which most of the high latency artifacts exhibited by the measured server will be ignored. Since each connection will only begin to send a request after receiving a response, high latency responses result in the load generator coordinating with the server to avoid measurement during high latency periods.
wrk  latency  measurement  tools  cli  http  load-testing  testing  load-generation  coordinated-omission  gil-tene 
november 2014 by jm
LatencyUtils by giltene
The LatencyUtils package includes useful utilities for tracking latencies. Especially in common in-process recording scenarios, which can exhibit significant coordinated omission sensitivity without proper handling.
gil-tene  metrics  java  measurement  coordinated-omission  latency  speed  service-metrics  open-source 
november 2013 by jm
Coordinated Omission
Gil Tene raises an extremely good point about load testing, high-percentile response-time measurement, and behaviour when testing a system under load:

I've been harping for a while now about a common measurement technique problem I call "Coordinated Omission" for a while, which can often render percentile data useless. [...] I believe that this problem occurs extremely frequently in test results, but it's usually hard to deduce it's existence purely from the final data reported. But every once in a while, I see test results where the data provided is enough to demonstrate the huge percentile-misreporting effect of Coordinated Omission based purely on the summary report.

I ran into just such a case in Attila's cool posting about log4j2's truly amazing performance, so I decided to avoid polluting his thread with an elongated discussion of how to compute 99.9%'ile data, and started this topic here. That thread should really be about how cool log4j2 is, and I'm certain that it really is cool, even after you correct the measurements. [...] Basically, I think that the 99.99% observation computation is wrong, and demonstrably (using the data in the graph data posted) exhibits the classic "coordinated omission" measurement problem I've been preaching about. This test is not alone in exhibiting this, and there is nothing to be ashamed of when you find yourself making this mistake. I only figured it out after doing it myself many many times, and then I noticed that everyone else seems to also be doing it but most of them haven't yet figured it out. In fact, I run into this issue so often in percentile reporting and load testing that I'm starting to wonder if coordinated omission is there in 99.9% of latency tests ;-)
measurement  testing  latency  load-testing  gil-tene  coordinated-omission  validity  log4j  percentiles 
august 2013 by jm

Copy this bookmark: