SQS performance and latency
Some decent benchmark data on SQS:
sqs
benchmarking
measurement
aws
latency
We were looking at four values in the tests:
total number of messages sent per second (by all nodes)
total number of messages received per second
95th percentile of message send latency (how fast a message send call completes)
95th percentile of message processing latency (how long it takes between sending and receiving a message)
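A minimal sketch of the send-latency half of such a test, assuming the AWS SDK for Java v1 and a pre-existing queue (the queue URL argument and message count are placeholders); the p95 here is taken naively from a sorted array, which is fine for a one-off test:

```java
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import java.util.Arrays;

public class SqsSendLatency {
    public static void main(String[] args) {
        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
        String queueUrl = args[0];                        // your queue URL
        int n = 1000;
        long[] micros = new long[n];
        for (int i = 0; i < n; i++) {
            long t0 = System.nanoTime();
            sqs.sendMessage(queueUrl, "payload-" + i);    // one timed send call
            micros[i] = (System.nanoTime() - t0) / 1000;
        }
        Arrays.sort(micros);
        // 95th percentile of message send latency, as measured in the benchmark
        System.out.printf("p95 send latency: %d us%n", micros[(int) (n * 0.95) - 1]);
    }
}
```

Measuring the fourth metric, processing latency, needs a timestamp embedded in the message body, compared against receive time on the consumer side.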
july 2017 by jm
AWS Inter-Region Latency Monitoring
only averages, though, no percentiles
latency
networking
aws
ops
inter-region
cross-region
ping
june 2017 by jm
Amazon DynamoDB Accelerator (DAX)
No latency percentile figures, unfortunately. Also still in preview.
amazon
dynamodb
aws
dax
performance
storage
databases
latency
low-latency
Amazon DynamoDB Accelerator (DAX) is a fully managed, highly available, in-memory cache for DynamoDB that delivers up to a 10x performance improvement – from milliseconds to microseconds – even at millions of requests per second. DAX does all the heavy lifting required to add in-memory acceleration to your DynamoDB tables, without requiring developers to manage cache invalidation, data population, or cluster management.
april 2017 by jm
AWS latency comparison: API Gateway vs Lambda vs Bare EC2
ugh, 213ms mean response overhead
aws
latency
lambda
api-gateway
architecture
http
october 2016 by jm
ztellman/dirigiste
'centrally-planned object and thread pools' for java.
'In the default JVM thread pools, once a thread is created it will only be retired when it hasn't performed a task in the last minute. In practice, this means that there are as many threads as the peak historical number of concurrent tasks handled by the pool, forever. These thread pools are also poorly instrumented, making it difficult to tune their latency or throughput. Dirigiste provides a fast, richly instrumented version of a java.util.concurrent.ExecutorService, and provides a means to feed that instrumentation into a control mechanism that can grow or shrink the pool as needed. Default implementations that optimize the pool size for thread utilization are provided. It also provides an object pool mechanism that uses a similar feedback mechanism to resize itself, and is significantly simpler than the Apache Commons object pool implementation.'
Great metric support, too.
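A sketch of the advertised usage -- the factory method and its (target utilization, max threads) signature are recalled from the project README, so treat them as an assumption:

```java
import io.aleph.dirigiste.Executors;
import java.util.concurrent.ExecutorService;

public class DirigisteDemo {
    public static void main(String[] args) {
        // pool size driven by a feedback controller targeting 90% thread
        // utilization, capped at 64 threads; it shrinks again when load
        // drops, unlike the default JVM pools described above
        ExecutorService pool = Executors.utilizationExecutor(0.9, 64);
        for (int i = 0; i < 1000; i++) {
            pool.submit(() -> doWork());
        }
        pool.shutdown();
    }
    static void doWork() { /* application task */ }
}
```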
async
jvm
dirigiste
java
threadpools
concurrency
utilization
capacity
executors
object-pools
object-pooling
latency
june 2016 by jm
Gil Tene on benchmarking
'I would strongly encourage you to avoid repeating the mistakes of testing methodologies that focus entirely on max achievable throughput and then report some (usually bogus) latency stats at those max throughout modes. The techempower numbers are a classic example of this in play, and while they do provide some basis for comparing a small aspect of behavior (what I call the "how fast can this thing drive off a cliff" comparison, or "pedal to the metal" testing), those results are not very useful for comparing load carrying capacities for anything that actually needs to maintain some form of responsiveness SLA or latency spectrum requirements.'
Some excellent advice here on how to measure and represent stack performance.
Also: 'DON'T use or report standard deviation for latency. Ever. Except if you mean it as a joke.'
performance
benchmarking
testing
speed
gil-tene
latency
measurement
hdrhistogram
load-testing
load
april 2016 by jm
The Nyquist theorem and limitations of sampling profilers today, with glimpses of tracing tools from the future
Awesome post from Dan Luu with data from Google:
debugging
performance
visualization
instrumentation
metrics
dan-luu
latency
google
dick-sites
linux
scheduler
throttling
kernel
hangs
The cause [of some mystery widespread 250ms hangs] was kernel throttling of the CPU for processes that went beyond their usage quota. To enforce the quota, the kernel puts all of the relevant threads to sleep until the next multiple of a quarter second. When the quarter-second hand of the clock rolls around, it wakes up all the threads, and if those threads are still using too much CPU, the threads get put back to sleep for another quarter second. The phase change out of this mode happens when, by happenstance, there aren’t too many requests in a quarter second interval and the kernel stops throttling the threads. After finding the cause, an engineer found that this was happening on 25% of disk servers at Google, for an average of half an hour a day, with periods of high latency as long as 23 hours. This had been happening for three years. Dick Sites says that fixing this bug paid for his salary for a decade. This is another bug where traditional sampling profilers would have had a hard time. The key insight was that the slowdowns were correlated and machine wide, which isn’t something you can see in a profile.
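The kernel does expose counters for exactly this behaviour (CFS bandwidth throttling) per cgroup; a sketch that reads them, with the cgroup-v1 path as an assumption since it varies by distro and cgroup version:

```java
import java.nio.file.Files;
import java.nio.file.Paths;

public class ThrottleCheck {
    public static void main(String[] args) throws Exception {
        // cgroup v1 CFS bandwidth stats; under cgroup v2 the equivalent
        // counters live in the cgroup's own cpu.stat file (nr_throttled,
        // throttled_usec)
        for (String line : Files.readAllLines(Paths.get("/sys/fs/cgroup/cpu/cpu.stat"))) {
            // nr_throttled: quota periods in which threads were put to sleep
            // throttled_time: total nanoseconds spent throttled
            if (line.startsWith("nr_throttled") || line.startsWith("throttled_time")) {
                System.out.println(line);
            }
        }
    }
}
```

A climbing nr_throttled on an interactive service is exactly the quarter-second sleep/wake cycle described above.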
february 2016 by jm
League of Legends win-rates vs latency analysed
(via Nelson)
games
league-of-legends
latency
ping
gaming
internet
via:nelson
It appears that more mechanically intensive champions are more affected by latency, while tankier champions or those with point-and-click abilities are less affected by latency.
december 2015 by jm
Low-latency journalling file write latency on Linux
great research from LMAX: xfs/ext4 are the best choices, and they explain why in detail, referring to the code
linux
xfs
ext3
ext4
filesystems
lmax
performance
latency
journalling
ops
december 2015 by jm
ELS: latency based load balancer, part 1
latency
spotify
proxies
load-balancing
els
algorithms
c3
round-robin
load-balancers
routing
ELS measures the following things:
Success latency and success rate of each machine;
Number of outstanding requests between the load balancer and each machine. These are the requests that have been sent out but we haven’t yet received a reply;
Fast failures are better than slow failures, so we also measure failure latency for each machine.
Since users care a lot about latency, we prefer machines that are expected to answer quicker. ELS therefore converts all the measured metrics into expected latency from the client’s perspective.[...]
In short, the formula ensures that slower machines get less traffic and failing machines get much less traffic. Slower and failing machines still get some traffic, because we need to be able to detect when they come back up again.
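The post doesn't give the exact formula, but the shape of it can be sketched: score each machine by expected client-observed latency (folding in failure rate and the cost of a retry), then pick with probability inversely proportional to that score. Everything below is an illustrative assumption, not Spotify's actual code:

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

class Machine {
    double successLatencyMs;  // moving average over successful replies
    double failureLatencyMs;  // failures answer too; fast failures beat slow ones
    double successRate;       // fraction of requests that succeed
    int outstanding;          // sent but not yet answered

    // Expected latency from the client's perspective: a failure costs its own
    // latency plus (roughly) one retry elsewhere; queued requests inflate the
    // estimate. Illustrative only.
    double expectedLatencyMs() {
        double retryCost = failureLatencyMs + successLatencyMs;
        double perRequest = successRate * successLatencyMs
                + (1 - successRate) * retryCost;
        return perRequest * (1 + outstanding);
    }
}

class ElsStyleBalancer {
    // Weighted random choice with weight 1/expectedLatency: slower and failing
    // machines still get a trickle of traffic, so recovery is detectable.
    static Machine pick(List<Machine> machines) {
        double total = 0;
        for (Machine m : machines) total += 1.0 / m.expectedLatencyMs();
        double r = ThreadLocalRandom.current().nextDouble(total);
        for (Machine m : machines) {
            r -= 1.0 / m.expectedLatencyMs();
            if (r <= 0) return m;
        }
        return machines.get(machines.size() - 1);
    }
}
```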
december 2015 by jm
Topics in High-Performance Messaging
'We have worked together in the field of high-performance messaging for many years, and in that time, have seen some messaging systems that worked well and some that didn't. Successful deployment of a messaging system requires background information that is not easily available; most of what we know, we had to learn in the school of hard knocks. To save others a knock or two, we have collected here the essential background information and commentary on some of the issues involved in successful deployments. This information is organized as a series of topics around which there seems to be confusion or uncertainty. Please contact us if you have questions or comments.'
messaging
scalability
scaling
performance
udp
tcp
protocols
multicast
latency
december 2015 by jm
Seastar
C++ high-performance app framework; 'currently focused on high-throughput, low-latency I/O intensive applications.'
Scylla (Cassandra-compatible NoSQL store) is written in this.
c++
opensource
performance
framework
scylla
seastar
latency
linux
shared-nothing
multicore
september 2015 by jm
toxy
proxies
proxy
http
mitm
node.js
soa
network
failures
latency
slowdown
jitter
bandwidth
tcp
toxy is a fully programmatic and hackable HTTP proxy to simulate server failure scenarios and unexpected network conditions. It was mainly designed for fuzzing/evil testing purposes, when toxy becomes particularly useful to cover fault tolerance and resiliency capabilities of a system, especially in service-oriented architectures, where toxy may act as intermediate proxy among services.
toxy allows you to plug in poisons, optionally filtered by rules, which essentially can intercept and alter the HTTP flow as you need, performing multiple evil actions in the middle of that process, such as limiting the bandwidth, delaying TCP packets, injecting network jitter latency or replying with a custom error or status code.
august 2015 by jm
Amazon EC2 2015 Benchmark: Testing Speeds Between AWS EC2 and S3 Regions
Here we are again, a year later, and still no bloody percentiles! Just amateurish averaging. This is not how you measure anything, ffs. Still, better than nothing I suppose
fail
latency
measurement
aws
ec2
percentiles
s3
august 2015 by jm
Performance Testing at LMAX
Good series of blog posts on the LMAX trading platform's performance testing strategy -- they capture live traffic off the wire, then build statistical models simulating its features. See also http://epickrram.blogspot.co.uk/2014/07/performance-testing-at-lmax-part-two.html and http://epickrram.blogspot.co.uk/2014/08/performance-testing-at-lmax-part-three.html .
performance
testing
tests
simulation
latency
lmax
trading
sniffing
packet-capture
june 2015 by jm
SolarCapture Packet Capture Software
Interesting product line -- I didn't know this existed, but it makes good sense as a "network flight recorder". Big in finance.
See also Corvil (based in Dublin!): 'I'm using a Corvil at the moment and it's awesome -- nanosecond precision latency measurements on the wire.'
(via mechanical sympathy list)
corvil
timing
metrics
measurement
latency
network
solarcapture
packet-capture
financial
performance
security
network-monitoring
SolarCapture is a powerful packet capture product family that can transform every server into a precision network monitoring device, increasing network visibility, network instrumentation, and performance analysis. SolarCapture products optimize network monitoring and security, while eliminating the need for specialized appliances, expensive adapters relying on exotic protocols, proprietary hardware, and dedicated networking equipment.
may 2015 by jm
"Trash Day: Coordinating Garbage Collection in Distributed Systems"
Another GC-coordination strategy, similar to Blade (qv), with some real-world examples using Cassandra
blade
via:adriancolyer
papers
gc
distsys
algorithms
distributed
java
jvm
latency
spark
cassandra
may 2015 by jm
_Blade: a Data Center Garbage Collector_
Essentially, add a central GC scheduler to improve tail latencies in a cluster, by taking instances out of the pool to perform slow GC activity instead of letting them impact live operations. I've been toying with this idea for a while; nice to see a solid paper about it.
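A minimal sketch of the idea; the pool and node interfaces are hypothetical stand-ins for your load balancer's control API, and the point is just the ordering of drain, collect, rejoin:

```java
import java.util.List;

/** Hypothetical central scheduler rotating instances out of a
 *  load-balancer pool so GC happens off the request path. */
class GcScheduler {
    interface Pool {                 // stand-in for the LB control API
        void remove(String node);
        void add(String node);
    }
    interface Node {                 // stand-in for a per-instance agent
        String name();
        void drainAndCollect();      // wait out in-flight work, then force GC
    }

    void rotate(Pool pool, List<Node> nodes) {
        for (Node n : nodes) {       // one at a time bounds the capacity loss
            pool.remove(n.name());   // stop sending it traffic
            n.drainAndCollect();     // the GC pause now hits nobody's request
            pool.add(n.name());      // rejoin with a freshly collected heap
        }
    }
}
```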
gc
latency
tail-latencies
papers
blade
go
java
scheduling
clustering
load-balancing
low-latency
performance
april 2015 by jm
Gil Tene's "usual suspects" to reduce system-level hiccups/latency jitters in a Linux system
performance
latency
hiccups
gil-tene
tuning
mechanical-sympathy
hyperthreading
linux
ops
Based on empirical evidence (across many tens of sites thus far) and note-comparing with others, I use a list of "usual suspects" that I blame whenever they are not set to my liking and system-level hiccups are detected. Getting these settings right from the start often saves a bunch of playing around (and no, there is no "priority" to this - you should set them all right before looking for more advice...).
april 2015 by jm
The Four Month Bug: JVM statistics cause garbage collection pauses (evanjones.ca)
Ugh, tying GC safepoints to disk I/O? bad idea:
bugs
gc
java
jvm
disk
mmap
latency
ops
jstat
The JVM by default exports statistics by mmap-ing a file in /tmp (hsperfdata). On Linux, modifying a mmap-ed file can block until disk I/O completes, which can be hundreds of milliseconds. Since the JVM modifies these statistics during garbage collection and safepoints, this causes pauses that are hundreds of milliseconds long. To reduce worst-case pause latencies, add the -XX:+PerfDisableSharedMem JVM flag to disable this feature. This will break tools that read this file, like jstat.
march 2015 by jm
JClarity's Illuminate
Performance-diagnosis-as-a-service. Cool.
illuminate
jclarity
java
jvm
scala
latency
gc
tuning
performance
Users download and install an Illuminate Daemon using a simple installer which starts up a small stand alone Java process. The Daemon sits quietly unless it is asked to start gathering SLA data and/or to trigger a diagnosis. Users can set SLA’s via the dashboard and can opt to collect latency measurements of their transactions manually (using our library) or by asking Illuminate to automatically instrument their code (Servlet and JDBC based transactions are currently supported).
SLA latency data for transactions is collected on a short cycle. When the moving average of latency measurements goes above the SLA value (e.g. 150ms), a diagnosis is triggered. The diagnosis is very quick, gathering key data from O/S, JVM(s), virtualisation and other areas of the system. The data is then run through the machine learned algorithm which will quickly narrow down the possible causes and gather a little extra data if needed.
Once Illuminate has determined the root cause of the performance problem, the diagnosis report is sent back to the dashboard and an alert is sent to the user. That alert contains a link to the result of the diagnosis which the user can share with colleagues. Illuminate has all sorts of backoff strategies to ensure that users don’t get too many alerts of the same type in rapid succession!
february 2015 by jm
Azul Zing on Ubuntu on AWS Marketplace
hmmm, very interesting -- the super-low-latency Zing JVM is available as a commercial EC2 instance type, priced at less than the cost of the EC2 instance itself
zing
azul
latency
performance
ec2
aws
february 2015 by jm
HdrHistogram: A better latency capture method
An excellent intro to HdrHistogram usage
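The core API is small enough to show whole; a minimal record-and-report sketch, with values in microseconds and three significant digits of precision:

```java
import org.HdrHistogram.Histogram;

public class HistogramDemo {
    public static void main(String[] args) {
        // track values from 1us up to 1 hour, at 3 significant digits
        Histogram h = new Histogram(3_600_000_000L, 3);
        for (int i = 0; i < 100_000; i++) {
            long t0 = System.nanoTime();
            doWork();
            h.recordValue((System.nanoTime() - t0) / 1000);
        }
        System.out.println("p50: " + h.getValueAtPercentile(50.0) + "us");
        System.out.println("p99: " + h.getValueAtPercentile(99.0) + "us");
        // full percentile distribution, suitable for plotting
        h.outputPercentileDistribution(System.out, 1.0);
    }
    static void doWork() { /* the operation being measured */ }
}
```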
hdrhistogram
hdr
histograms
statistics
latency
measurement
metrics
percentiles
quantiles
gil-tene
nitsan-wakart
february 2015 by jm
Visualizing AWS Storage with Real-Time Latency Spectrogram
ohhhh this is very nice indeed. Great viz!
dataviz
latency
io
ops
sysdig
charts
graphs
commandline
linux
january 2015 by jm
Comcast
Nice wrapper for 'tc' and 'netem', for network latency/packet loss emulation
networking
testing
linux
tc
netem
latency
packet-loss
iptables
january 2015 by jm
wrk2
'A constant throughput, correct latency recording variant of wrk.' A must-have when measuring network service latency -- it corrects for Coordinated Omission error:
wrk
latency
measurement
tools
cli
http
load-testing
testing
load-generation
coordinated-omission
gil-tene
wrk's model, which is similar to the model found in many current load generators, computes the latency for a given request as the time from the sending of the first byte of the request to the time the complete response was received. While this model correctly measures the actual completion time of individual requests, it exhibits a strong Coordinated Omission effect, through which most of the high latency artifacts exhibited by the measured server will be ignored. Since each connection will only begin to send a request after receiving a response, high latency responses result in the load generator coordinating with the server to avoid measurement during high latency periods.
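The correction wrk2 applies can be stated in a few lines: measure each request from the time it should have been sent under the constant-throughput plan, not from when the (possibly stalled) client actually sent it. A sketch, with call() standing in for the request under test:

```java
import org.HdrHistogram.Histogram;

public class ConstantRateLoadGen {
    public static void main(String[] args) {
        Histogram h = new Histogram(3_600_000_000L, 3);
        long intervalNs = 1_000_000_000L / 1000;     // plan: 1000 requests/sec
        long start = System.nanoTime();
        for (int i = 0; i < 10_000; i++) {
            long intended = start + i * intervalNs;  // when this request was due
            while (System.nanoTime() < intended) { } // spin until it is due
            call();
            // the key line: latency is measured from the intended start, so
            // time spent queued behind a slow response is counted, not omitted
            h.recordValue((System.nanoTime() - intended) / 1000);
        }
        System.out.println("p99: " + h.getValueAtPercentile(99.0) + "us");
    }
    static void call() { /* the request under test */ }
}
```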
november 2014 by jm
testing latency measurements using CTRL-Z
An excellent tip from Gil "HDRHistogram" Tene:
control-z
suspend
unix
testing
latencies
latency
measurement
percentiles
tips
Good example of why I always "calibrate" latency tools with ^Z tests. If ^Z results don't make sense, don't use [the] tool. ^Z test math examples: If you ^Z for half the time, Max is obvious. [90th percentile] should be 80% of the ^Z stall time.
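To check the math: suspend a 100-second run for 50 seconds and half of all requests land inside the stall, with observed latencies spread uniformly from 0 up to 50s. The worst 10% of the whole run is then the worst 20% of the stalled half, so the 90th percentile sits at 0.8 x 50s = 40s -- 80% of the stall time. A tool reporting anything much lower is silently dropping the coordinated-omission samples.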
november 2014 by jm
Linux kernel's Transparent Huge Pages feature causing 300ms-800ms pauses
bad news for low-latency apps. See also its impact on redis: http://antirez.com/news/84
redis
memory
defrag
huge-pages
linux
kernel
ops
latency
performance
transparent-huge-pages
november 2014 by jm
Testing fork time on AWS/Xen infrastructure
Redis uses forking to perform persistence flushes, which means that once every 30 minutes it performs like crap (and kills the 99th percentile latency). Given this, various Redis people have been benchmarking fork() times on various Xen platforms, since Xen has a crappy fork() implementation
fork
xen
redis
bugs
performance
latency
p99
october 2014 by jm
Most page loads will experience the 99th percentile response latency
latency
metrics
percentiles
p99
web
http
soa
MOST of the page view attempts will experience the 99%'lie server response time in modern web applications. You didn't read that wrong.
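The arithmetic behind the claim: a page that touches N server resources sees at least one 99th-percentile response with probability 1 - 0.99^N, so at N = 69 it's already more likely than not (roughly 50%), and at N = 100 it's about 63%.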
october 2014 by jm
FelixGV/tehuti
Felix says:
'Like I said, I'd like to move it to a more general / non-personal repo in the future, but haven't had the time yet. Anyway, you can still browse the code there for now. It is not a big code base so not that hard to wrap one's mind around it.
It is Apache licensed and both Kafka and Voldemort are using it so I would say it is pretty self-contained (although Kafka has not moved to Tehuti proper, it is essentially the same code they're using, minus a few small fixes missing that we added).
Tehuti is a bit lower level than CodaHale (i.e.: you need to choose exactly which stats you want to measure and the boundaries of your histograms), but this is the type of stuff you would build a wrapper for and then re-use within your code base. For example: the Voldemort RequestCounter class.'
asl2
apache
open-source
tehuti
metrics
percentiles
quantiles
statistics
measurement
latency
kafka
voldemort
linkedin
october 2014 by jm
"Quantiles on Streams" [paper, 2009]
'Chiranjeeb Buragohain and Subhash Suri: "Quantiles on Streams" in Encyclopedia of Database Systems, Springer, pp 2235–2240, 2009. ISBN: 978-0-387-35544-3', cited by Martin Kleppmann in http://mail-archives.apache.org/mod_mbox/kafka-dev/201402.mbox/%3C131A7649-ED57-45CB-B4D6-F34063267664@linkedin.com%3E as a good, short literature survey on estimating percentiles with a small memory footprint.
latency
percentiles
coding
quantiles
streams
papers
algorithms
october 2014 by jm
Tehuti
An embryonic metrics library for Java/Scala from Felix GV at LinkedIn, extracted from Kafka's metric implementation and in the new Voldemort release. It fixes the major known problems with the Meter/Timer implementations in Coda-Hale/Dropwizard/Yammer Metrics.
'Regarding Tehuti: it has been extracted from Kafka's metric implementation. The code was originally written by Jay Kreps, and then maintained and improved by some Kafka and Voldemort devs, so it definitely is not the work of just one person. It is in my repo at the moment but I'd like to put it in a more generally available (git and maven) repo in the future. I just haven't had the time yet...
As for comparing with CodaHale/Yammer, there were a few concerns with it, but the main one was that we didn't like the exponentially decaying histogram implementation. While that implementation is very appealing in terms of (low) memory usage, it has several misleading characteristics (a lack of incoming data points makes old measurements linger longer than they should, and there's also a fairly high possibility of losing interesting outlier data points). This makes the exp decaying implementation robust in high-throughput, fairly constant workloads, but unreliable in sparse or spiky workloads. The Tehuti implementation provides semantics that we find easier to reason with and with a small code footprint (which we consider a plus in terms of maintainability). Of course, it is still a fairly young project, so it could be improved further.'
More background at the kafka-dev thread: http://mail-archives.apache.org/mod_mbox/kafka-dev/201402.mbox/%3C131A7649-ED57-45CB-B4D6-F34063267664@linkedin.com%3E
kafka
metrics
dropwizard
java
scala
jvm
timers
ewma
statistics
measurement
latency
sampling
tehuti
voldemort
linkedin
jay-kreps
october 2014 by jm
"Left-Right: A Concurrency Control Technique with Wait-Free Population Oblivious Reads" [pdf]
'In this paper, we describe a generic concurrency control technique with Blocking write operations and Wait-Free Population Oblivious read operations, which we named the Left-Right technique. It is of particular interest for real-time applications with dedicated Reader threads, due to its wait-free property that gives strong latency guarantees and, in addition, there is no need for automatic Garbage Collection.
The Left-Right pattern can be applied to any data structure, allowing concurrent access to it similarly to a Reader-Writer lock, but in a non-blocking manner for reads. We present several variations of the Left-Right technique, with different versioning mechanisms and state machines. In addition, we constructed an optimistic approach that can reduce synchronization for reads.'
See also http://concurrencyfreaks.blogspot.ie/2013/12/left-right-concurrency-control.html for java implementation code.
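A compact (and deliberately simplified) Java rendering of the technique: readers arrive and depart on a versioned counter and never block; the single writer mutates the idle instance, toggles, waits out old readers, then repeats the mutation on the other copy:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.function.Function;

class LeftRight<T> {
    private final T[] instances;                    // two copies of the structure
    private volatile int leftRight = 0;             // which copy readers read
    private volatile int versionIndex = 0;          // which read-indicator to arrive on
    private final AtomicLong[] readers = { new AtomicLong(), new AtomicLong() };

    @SuppressWarnings("unchecked")
    LeftRight(T left, T right) { instances = (T[]) new Object[] { left, right }; }

    <R> R read(Function<T, R> f) {                  // never blocks on the writer
        int vi = versionIndex;
        readers[vi].incrementAndGet();              // arrive
        try {
            return f.apply(instances[leftRight]);
        } finally {
            readers[vi].decrementAndGet();          // depart
        }
    }

    synchronized void write(Consumer<T> mutation) { // writers serialize on the mutex
        mutation.accept(instances[1 - leftRight]);  // mutate the copy nobody reads
        leftRight = 1 - leftRight;                  // new readers see the new copy
        int prev = versionIndex;
        int next = 1 - prev;
        while (readers[next].get() != 0) Thread.onSpinWait(); // drain stragglers
        versionIndex = next;                        // new readers arrive on 'next'
        while (readers[prev].get() != 0) Thread.onSpinWait(); // drain old version
        mutation.accept(instances[1 - leftRight]);  // bring the old copy up to date
    }
}
```

Readers pay two atomic increments per read; the writer pays two drain-waits and applies each mutation twice. The paper's fancier read-indicator implementations exist to cut contention on those shared counters.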
left-right
concurrency
multithreading
wait-free
blocking
realtime
gc
latency
reader-writer
locking
synchronization
java
september 2014 by jm
"The Tail at Scale"
by Jeffrey Dean and Luiz Andre Barroso, Google. A selection of Google's architectural mechanisms used to defeat 99th-percentile latency spikes: hedged requests, tied requests, micro-partitioning, selective replication, latency-induced probation, canary requests.
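The simplest of these, hedged requests, fits in a few lines: send the request, and if no reply arrives within the observed 95th-percentile latency, send a second copy to another replica and take whichever answers first. A sketch with hypothetical primary/backup suppliers:

```java
import java.util.concurrent.*;
import java.util.function.Supplier;

public class Hedged {
    static final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
    static final ExecutorService pool = Executors.newCachedThreadPool();

    /** Ask the primary; after p95 ms with no reply, also ask a backup. */
    static <T> CompletableFuture<T> hedged(Supplier<T> primary, Supplier<T> backup,
                                           long p95Millis) {
        CompletableFuture<T> result = new CompletableFuture<>();
        pool.submit(() -> {
            try { result.complete(primary.get()); }
            catch (Exception e) { result.completeExceptionally(e); }
        });
        timer.schedule(() -> {
            if (!result.isDone()) {              // still waiting: hedge it
                pool.submit(() -> {
                    try { result.complete(backup.get()); }
                    catch (Exception ignored) { }
                });
            }
        }, p95Millis, TimeUnit.MILLISECONDS);
        return result;                           // completed by whichever wins
    }
}
```

Deferring the hedge until the p95 mark is the paper's trick for capping the extra load at a few percent of requests while still cutting the tail.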
google
architecture
distcomp
soa
http
partitioning
replication
latency
99th-percentile
canary-requests
hedged-requests
july 2014 by jm
Google's Pegasus
a power-management subsystem for warehouse-scale computing farms. "It adjusts the power-performance settings of servers so that the overall workload barely meets its latency constraints for user queries."
pegasus
power-management
power
via:fanf
google
latency
scaling
june 2014 by jm
Monitoring Reactive Applications with Kamon
"quality monitoring tools for apps built in Akka, Spray and Play!". Uses Gil Tene's HDRHistogram and dropwizard Metrics under the hood.
metrics
dropwizard
hdrhistogram
gil-tene
kamon
akka
spray
play
reactive
statistics
java
scala
percentiles
latency
may 2014 by jm
Uplink Latency of WiFi and 4G Networks
It's high. Wifi in particular shows high variability and long latency tails
wifi
3g
4g
mobile
networking
internet
latency
tcp
april 2014 by jm
Game servers: UDP vs TCP
this HN thread on the age-old UDP vs TCP question is way better than the original post -- lots of salient comments
udp
tcp
games
protocols
networking
latency
internet
gaming
hackernews
april 2014 by jm
Micro jitter, busy waiting and binding CPUs
pinning threads to CPUs to reduce jitter and latency. Lots of graphs and measurements from Peter Lawrey
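The measurement half is easy to reproduce: sample the gaps seen by a busy-spinning loop and watch the outliers, which come from the scheduler preempting or migrating the thread. A self-contained probe (the pinning itself needs something external -- taskset, isolcpus, or an affinity library -- none of which is shown here):

```java
public class JitterProbe {
    public static void main(String[] args) {
        long worst = 0;
        long prev = System.nanoTime();
        long end = prev + 10_000_000_000L;      // sample for 10 seconds
        while (prev < end) {
            long now = System.nanoTime();
            long gap = now - prev;              // a busy-spinning thread should
            if (gap > worst) {                  // see only tiny gaps; large ones
                worst = gap;                    // are scheduler-induced jitter
                System.out.printf("new worst gap: %d us%n", gap / 1000);
            }
            prev = now;
        }
    }
}
```

Run it bare, then under `taskset -c <cpu>` on a core kept clear of other work, and compare the worst-case gaps.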
pinning
threads
performance
latency
jitter
tuning
march 2014 by jm
'Bobtail: Avoiding Long Tails in the Cloud' [pdf]
'A system that proactively detects and avoids bad neighbouring VMs without significantly penalizing node instantiation [in EC2]. With Bobtail, common [datacenter] communication patterns benefit from reductions of up to 40% in 99.9th percentile response times.'
Excellent stuff -- another conclusion they come to is that it's not the network's fault, it's the Xen hosts themselves. The EC2 networking team will be happy about that ;)
networking
ec2
bobtail
latency
long-tail
xen
performance
february 2014 by jm
LatencyUtils by giltene
gil-tene
metrics
java
measurement
coordinated-omission
latency
speed
service-metrics
open-source
The LatencyUtils package includes useful utilities for tracking latencies. Especially in common in-process recording scenarios, which can exhibit significant coordinated omission sensitivity without proper handling.
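A usage sketch, with the method names recalled from the project README (treat them as assumptions): record raw latencies, and read back interval histograms with the pause detector's correction applied:

```java
import org.HdrHistogram.Histogram;
import org.LatencyUtils.LatencyStats;

public class LatencyStatsDemo {
    public static void main(String[] args) {
        LatencyStats stats = new LatencyStats();    // default pause detector
        for (int i = 0; i < 100_000; i++) {
            long t0 = System.nanoTime();
            doWork();
            stats.recordLatency(System.nanoTime() - t0);
        }
        // pause-corrected histogram, values in nanoseconds
        Histogram h = stats.getIntervalHistogram();
        System.out.println("p99: " + h.getValueAtPercentile(99.0) / 1000 + "us");
    }
    static void doWork() { /* operation under measurement */ }
}
```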
november 2013 by jm
"Effective Computation of Biased Quantiles over Data Streams" [paper]
statistics
quantiles
percentiles
stream-processing
skew
papers
histograms
latency
algorithms
Skew is prevalent in many data sources such as IP traffic streams. To continually summarize the distribution of such data, a high-biased set of quantiles (e.g., 50th, 90th and 99th percentiles) with finer error guarantees at higher ranks (e.g., errors of 5, 1 and 0.1 percent, respectively) is more useful than uniformly distributed quantiles (e.g., 25th, 50th and 75th percentiles) with uniform error guarantees. In this paper, we address the following two problems. First, can we compute quantiles with finer error guarantees for the higher ranks of the data distribution effectively, using less space and computation time than computing all quantiles uniformly at the finest error? Second, if specific quantiles and their error bounds are requested a priori, can the necessary space usage and computation time be reduced? We answer both questions in the affirmative by formalizing them as the “high-biased” quantiles and the “targeted” quantiles problems, respectively, and presenting algorithms with provable guarantees, that perform significantly better than previously known solutions for these problems. We implemented our algorithms in the Gigascope data stream management system, and evaluated alternate approaches for maintaining the relevant summary structures. Our experimental results on real and synthetic IP data streams complement our theoretical analyses, and highlight the importance of lightweight, non-blocking implementations when maintaining summary structures over high-speed data streams.
Implemented as a timer-histogram storage system in http://armon.github.io/statsite/ .
november 2013 by jm
Barbarians at the Gateways - ACM Queue
Insane stuff -- FPGAs embedded in the network switches to shave off nanoseconds of latency.
low-latency
hft
via:nelson
markets
stock-trading
latency
fpgas
networking
I am a former high-frequency trader. For a few wonderful years I led a group of brilliant engineers and mathematicians, and together we traded in the electronic marketplaces and pushed systems to the edge of their capability.
october 2013 by jm
"High Performance Browser Networking", by Ilya Grigorik, read online for free
Wow, this looks excellent. A must-read for people working on systems with high-volume, low-latency phone-to-server communications -- and free!
Via Eoin Brazil.
book
browser
networking
performance
phones
mobile
3g
4g
hsdpa
http
udp
tls
ssl
latency
webrtc
websockets
ebooks
via:eoin-brazil
google
http2
sse
xhr
ilya-grigorik
How prepared are you to build fast and efficient web applications? This eloquent book provides what every web developer should know about the network, from fundamental limitations that affect performance to major innovations for building even more powerful browser applications—including HTTP 2.0 and XHR improvements, Server-Sent Events (SSE), WebSocket, and WebRTC.
Author Ilya Grigorik, a web performance engineer at Google, demonstrates performance optimization best practices for TCP, UDP, and TLS protocols, and explains unique wireless and mobile network optimization requirements. You’ll then dive into performance characteristics of technologies such as HTTP 2.0, client-side network scripting with XHR, real-time streaming with SSE and WebSocket, and P2P communication with WebRTC.
Deliver optimal TCP, UDP, and TLS performance;
Optimize network performance over 3G/4G mobile networks;
Develop fast and energy-efficient mobile applications;
Address bottlenecks in HTTP 1.x and other browser protocols;
Plan for and deliver the best HTTP 2.0 performance;
Enable efficient real-time streaming in the browser;
Create efficient peer-to-peer videoconferencing and low-latency applications with real-time WebRTC transports
october 2013 by jm
Rapid read protection in Cassandra 2.0.2
Nifty new feature -- if a request takes longer than that server's 99th-percentile response time, it's speculatively retried against another replica. Unnecessary for Voldemort, of course, which queries all replicas anyway!
cassandra
nosql
replication
distcomp
latency
storage
october 2013 by jm
Attacking Tor: how the NSA targets users' online anonymity
whoa, I missed this before.
nsa
gchq
packet-injection
attacks
security
backbone
http
latency
As part of the Turmoil system, the NSA places secret servers, codenamed Quantum, at key places on the internet backbone. This placement ensures that they can react faster than other websites can. By exploiting that speed difference, these servers can impersonate a visited website to the target before the legitimate website can respond, thereby tricking the target's browser to visit a Foxacid server.
october 2013 by jm
Why Tellybug moved from Cassandra to Amazon DynamoDB
Summary: poor reliability, better latencies, and cheaper (!)
aws
dynamodb
cassandra
nosql
storage
tellybug
counters
scalability
reliability
latency
october 2013 by jm
Benchmarking Redis on AWS ElastiCache
good data points, but could do with latency percentiles
latency
redis
measurement
benchmarks
ec2
elasticache
aws
storage
tests
september 2013 by jm
[#CASSANDRA-5582] Replace CustomHsHaServer with better optimized solution based on LMAX Disruptor
Disruptor: decimating P99s since 2011
disruptor
cassandra
java
p99
latency
speed
performance
concurrency
via:kellabyte
september 2013 by jm
Coordinated Omission
Gil Tene raises an extremely good point about load testing, high-percentile response-time measurement, and behaviour when testing a system under load:
measurement
testing
latency
load-testing
gil-tene
coordinated-omission
validity
log4j
percentiles
I've been harping for a while now about a common measurement technique problem I call "Coordinated Omission", which can often render percentile data useless. [...] I believe that this problem occurs extremely frequently in test results, but it's usually hard to deduce its existence purely from the final data reported. But every once in a while, I see test results where the data provided is enough to demonstrate the huge percentile-misreporting effect of Coordinated Omission based purely on the summary report.
I ran into just such a case in Attila's cool posting about log4j2's truly amazing performance, so I decided to avoid polluting his thread with an elongated discussion of how to compute 99.9%'ile data, and started this topic here. That thread should really be about how cool log4j2 is, and I'm certain that it really is cool, even after you correct the measurements. [...] Basically, I think that the 99.99% observation computation is wrong, and demonstrably (using the data in the graph data posted) exhibits the classic "coordinated omission" measurement problem I've been preaching about. This test is not alone in exhibiting this, and there is nothing to be ashamed of when you find yourself making this mistake. I only figured it out after doing it myself many many times, and then I noticed that everyone else seems to also be doing it but most of them haven't yet figured it out. In fact, I run into this issue so often in percentile reporting and load testing that I'm starting to wonder if coordinated omission is there in 99.9% of latency tests ;-)
august 2013 by jm
Improved HTTPS Performance with Early SSL Termination
This is a neat hack. SSL/TLS connection establishment requires several consecutive round trips before the connection is ready; by terminating the connection closer to the user and reusing an existing region-to-region connection behind the scenes, overall latency is greatly improved. Works for HTTP as well
http
https
ssl
architecture
aws
ec2
performance
latency
internet
round-trip
nginx
tls
july 2013 by jm
SSL/TLS overhead
'The TLS handshake has multiple variations, but let’s pick the most common one – anonymous client and authenticated server (the connections browsers use most of the time).' Works out to 4 packets, in addition to the TCP handshake's 3, and about 6.5k bytes on average.
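In round-trip terms: one RTT for the TCP handshake plus two more for the full TLS handshake before any application data flows, so a 100ms-RTT path pays roughly 300ms of pure connection setup. TLS session resumption cuts the TLS portion to a single round trip.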
network
tls
ssl
performance
latency
speed
networking
internet
security
packets
tcp
handshake
june 2013 by jm
Communication costs in real-world networks
Peter Bailis has generated some good real-world data about network performance and latency, measured using EC2 instances -- between EC2 regions, between zones, and between hosts in a single AZ -- particularly welcome as I was looking for this data in a public source not too long ago.
Some of the high-percentile measurements are undoubtedly the impact of host and VM behaviour, but it is still good data for a typical service built in EC2.
networks
performance
measurements
benchmarks
ops
ec2
networking
internet
az
latency
I wasn’t aware of any datasets describing network behavior both within and across datacenters, so we launched m1.small Amazon EC2 instances in each of the eight geo-distributed “Regions,” across the three us-east “Availability Zones” (three co-located datacenters in Virginia), and within one datacenter (us-east-b). We measured RTTs between hosts for a week at a granularity of one ping per second.
may 2013 by jm
Storm and Hadoop: Convergence of Big-Data and Low-Latency Processing
Yahoo! are going big with Storm for their next-generation internal cloud platform:
'Yahoo! engineering teams are developing technologies to enable Storm applications and Hadoop applications to be hosted on a single cluster.
• We have enhanced Storm to support Hadoop style security mechanism (including Kerberos authentication), and thus enable Storm applications authorized to access Hadoop datasets on HDFS and HBase.
• Storm is being integrated into Hadoop YARN for resource management. Storm-on-YARN enables Storm applications to utilize the computation resources in our tens of thousands of Hadoop computation nodes. YARN is used to launch Storm application master (Nimbus) on demand, and enables Nimbus to request resources for Storm application slaves (Supervisors).'
yahoo
yarn
cloud-computing
private-clouds
big-data
latency
storm
hadoop
elastic-computing
hbase
february 2013 by jm
#AltDevBlogADay » Latency Mitigation Strategies
John Carmack on the low-latency coding techniques used to support head mounted display devices.
head-mounted-display
display
ui
latency
vision
coding
john-carmack
Virtual reality (VR) is one of the most demanding human-in-the-loop applications from a latency standpoint. The latency between the physical movement of a user’s head and updated photons from a head mounted display reaching their eyes is one of the most critical factors in providing a high quality experience.
Human sensory systems can detect very small relative delays in parts of the visual or, especially, audio fields, but when absolute delays are below approximately 20 milliseconds they are generally imperceptible. Interactive 3D systems today typically have latencies that are several times that figure, but alternate configurations of the same hardware components can allow that target to be reached.
A discussion of the sources of latency throughout a system follows, along with techniques for reducing the latency in the processing done on the host system.
february 2013 by jm
AnandTech - The Intel SSD DC S3700: Intel's 3rd Generation Controller Analyzed
Interesting trend; Intel moved from a btree to an array-based data structure for their logical-block address indirection map, in order to reduce worst-case latencies (via Martin Thompson)
latency
intel
via:martin-thompson
optimization
speed
p99
data-structures
arrays
btrees
ssd
hardware
november 2012 by jm
How does LMAX's disruptor pattern work? - Stack Overflow
LMAX's "Disruptor" concurrent-server pattern, claiming to be a higher-throughput, lower-latency, and lock-free alternative to the SEDA pattern using a massive ring buffer. Good discussion here at SO. (via Filippo)
via:filippo
servers
seda
queueing
concurrency
disruptor
patterns
latency
trading
performance
ring-buffers
november 2011 by jm
Overclocking SSL
techie details from Adam Langley on how Google's been improving TLS/SSL, with lots of good tips. they switched in January to HTTPS for all Gmail users by default, without any additional machines or hardware
certificates
encryption
google
https
latency
speed
ssl
tcp
tls
web
performance
from delicious
july 2010 by jm