jm's bookmarks tagged "latency" (62)

SQS performance and latency
Some decent benchmark data on SQS:
We were looking at four values in the tests:
total number of messages sent per second (by all nodes)
total number of messages received per second
95th percentile of message send latency (how fast a message send call completes)
95th percentile of message processing latency (how long it takes between sending and receiving a message)
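
A minimal sketch of how those last two numbers could be collected, assuming the AWS SDK for Java v1 and HdrHistogram; the queue URL, message counts, and body-embedded timestamp are placeholders, and cross-node clock skew (which the real benchmark has to deal with) is ignored here:

```java
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.Message;
import org.HdrHistogram.Histogram;

public class SqsLatencyProbe {
    public static void main(String[] args) {
        String queueUrl = args[0];  // hypothetical queue URL passed on the command line
        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
        Histogram sendLatency = new Histogram(3_600_000_000_000L, 3);       // ns, up to 1h
        Histogram processingLatency = new Histogram(3_600_000_000_000L, 3); // ns, up to 1h

        for (int i = 0; i < 10_000; i++) {
            long start = System.nanoTime();
            // embed the send timestamp in the body so the receiver can compute end-to-end latency
            sqs.sendMessage(queueUrl, Long.toString(System.currentTimeMillis()));
            sendLatency.recordValue(System.nanoTime() - start);
        }

        long deadline = System.currentTimeMillis() + 60_000;
        while (System.currentTimeMillis() < deadline) {
            for (Message m : sqs.receiveMessage(queueUrl).getMessages()) {
                long sentAt = Long.parseLong(m.getBody());
                processingLatency.recordValue((System.currentTimeMillis() - sentAt) * 1_000_000L);
                sqs.deleteMessage(queueUrl, m.getReceiptHandle());
            }
        }

        System.out.printf("send p95: %.2f ms%n", sendLatency.getValueAtPercentile(95.0) / 1e6);
        System.out.printf("processing p95: %.2f ms%n", processingLatency.getValueAtPercentile(95.0) / 1e6);
    }
}
```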
sqs  benchmarking  measurement  aws  latency 
july 2017 by jm
Amazon DynamoDB Accelerator (DAX)
Amazon DynamoDB Accelerator (DAX) is a fully managed, highly available, in-memory cache for DynamoDB that delivers up to a 10x performance improvement – from milliseconds to microseconds – even at millions of requests per second. DAX does all the heavy lifting required to add in-memory acceleration to your DynamoDB tables, without requiring developers to manage cache invalidation, data population, or cluster management.


No latency percentile figures, unfortunately. Also still in preview.
amazon  dynamodb  aws  dax  performance  storage  databases  latency  low-latency 
april 2017 by jm
ztellman/dirigiste
'centrally-planned object and thread pools' for java.

'In the default JVM thread pools, once a thread is created it will only be retired when it hasn't performed a task in the last minute. In practice, this means that there are as many threads as the peak historical number of concurrent tasks handled by the pool, forever. These thread pools are also poorly instrumented, making it difficult to tune their latency or throughput. Dirigiste provides a fast, richly instrumented version of a java.util.concurrent.ExecutorService, and provides a means to feed that instrumentation into a control mechanism that can grow or shrink the pool as needed. Default implementations that optimize the pool size for thread utilization are provided. It also provides an object pool mechanism that uses a similar feedback mechanism to resize itself, and is significantly simpler than the Apache Commons object pool implementation.'

Great metric support, too.
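
To make the control-loop idea concrete, here is a minimal sketch using only the plain JDK ThreadPoolExecutor -- this is not dirigiste's actual API, just the feedback mechanism it describes: sample how busy the pool is and resize it toward a target utilization.

```java
import java.util.concurrent.*;

public class UtilizationControlledPool {
    public static void main(String[] args) {
        double targetUtilization = 0.9;   // assumed target, in the spirit of dirigiste's utilization executors
        int maxThreads = 64;

        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, maxThreads, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

        ScheduledExecutorService controller = Executors.newSingleThreadScheduledExecutor();
        controller.scheduleAtFixedRate(() -> {
            int size = Math.max(1, pool.getPoolSize());
            // a single instantaneous sample is noisy; dirigiste samples at high frequency
            double utilization = (double) pool.getActiveCount() / size;
            // grow when busier than the target, shrink when idler
            int desired = (int) Math.ceil(size * utilization / targetUtilization);
            pool.setCorePoolSize(Math.min(maxThreads, Math.max(1, desired)));
        }, 100, 100, TimeUnit.MILLISECONDS);
    }
}
```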
async  jvm  dirigiste  java  threadpools  concurrency  utilization  capacity  executors  object-pools  object-pooling  latency 
june 2016 by jm
Gil Tene on benchmarking
'I would strongly encourage you to avoid repeating the mistakes of testing methodologies that focus entirely on max achievable throughput and then report some (usually bogus) latency stats at those max throughput modes. The techempower numbers are a classic example of this in play, and while they do provide some basis for comparing a small aspect of behavior (what I call the "how fast can this thing drive off a cliff" comparison, or "pedal to the metal" testing), those results are not very useful for comparing load carrying capacities for anything that actually needs to maintain some form of responsiveness SLA or latency spectrum requirements.'

Some excellent advice here on how to measure and represent stack performance.

Also: 'DON'T use or report standard deviation for latency. Ever. Except if you mean it as a joke.'
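
Following that advice, a minimal sketch of recording latencies in HdrHistogram and reporting the percentile spectrum rather than a mean and standard deviation; the recorded values here are synthetic stand-ins for real measurements.

```java
import org.HdrHistogram.Histogram;
import java.util.concurrent.ThreadLocalRandom;

public class PercentileReport {
    public static void main(String[] args) {
        // track values up to 1 hour (in nanoseconds) with 3 significant digits of precision
        Histogram histogram = new Histogram(3_600_000_000_000L, 3);

        for (int i = 0; i < 1_000_000; i++) {
            long latencyNanos = syntheticRequest();      // stand-in for a real measured call
            histogram.recordValue(latencyNanos);
        }

        // report the latency spectrum, not a mean and standard deviation
        System.out.printf("p50   = %.2f ms%n", histogram.getValueAtPercentile(50.0) / 1e6);
        System.out.printf("p99   = %.2f ms%n", histogram.getValueAtPercentile(99.0) / 1e6);
        System.out.printf("p99.9 = %.2f ms%n", histogram.getValueAtPercentile(99.9) / 1e6);
        System.out.printf("max   = %.2f ms%n", histogram.getMaxValue() / 1e6);
        // or dump the whole distribution, scaled to milliseconds:
        histogram.outputPercentileDistribution(System.out, 1_000_000.0);
    }

    static long syntheticRequest() {
        return ThreadLocalRandom.current().nextLong(1_000_000, 50_000_000); // 1-50ms
    }
}
```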
performance  benchmarking  testing  speed  gil-tene  latency  measurement  hdrhistogram  load-testing  load 
april 2016 by jm
The Nyquist theorem and limitations of sampling profilers today, with glimpses of tracing tools from the future
Awesome post from Dan Luu with data from Google:
The cause [of some mystery widespread 250ms hangs] was kernel throttling of the CPU for processes that went beyond their usage quota. To enforce the quota, the kernel puts all of the relevant threads to sleep until the next multiple of a quarter second. When the quarter-second hand of the clock rolls around, it wakes up all the threads, and if those threads are still using too much CPU, the threads get put back to sleep for another quarter second. The phase change out of this mode happens when, by happenstance, there aren’t too many requests in a quarter second interval and the kernel stops throttling the threads. After finding the cause, an engineer found that this was happening on 25% of disk servers at Google, for an average of half an hour a day, with periods of high latency as long as 23 hours. This had been happening for three years. Dick Sites says that fixing this bug paid for his salary for a decade. This is another bug where traditional sampling profilers would have had a hard time. The key insight was that the slowdowns were correlated and machine wide, which isn’t something you can see in a profile.
debugging  performance  visualization  instrumentation  metrics  dan-luu  latency  google  dick-sites  linux  scheduler  throttling  kernel  hangs 
february 2016 by jm
League of Legends win-rates vs latency analysed
It appears that more mechanically intensive champions are more affected by latency, while tankier champions or those with point-and-click abilities are less affected by latency.


(via Nelson)
games  league-of-legends  latency  ping  gaming  internet  via:nelson 
december 2015 by jm
Low-latency journalling file write latency on Linux
great research from LMAX: xfs/ext4 are the best choices, and they explain why in detail, referring to the code
linux  xfs  ext3  ext4  filesystems  lmax  performance  latency  journalling  ops 
december 2015 by jm
ELS: latency based load balancer, part 1
ELS measures the following things:

Success latency and success rate of each machine;
Number of outstanding requests between the load balancer and each machine. These are the requests that have been sent out but we haven’t yet received a reply;
Fast failures are better than slow failures, so we also measure failure latency for each machine.

Since users care a lot about latency, we prefer machines that are expected to answer quicker. ELS therefore converts all the measured metrics into expected latency from the client’s perspective.[...]

In short, the formula ensures that slower machines get less traffic and failing machines get much less traffic. Slower and failing machines still get some traffic, because we need to be able to detect when they come back up again.
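
The post doesn't give the exact formula, so the following is only a hedged sketch of the idea: fold success rate, success latency, failure latency, and outstanding requests into one "expected latency as seen by the client" score per machine, then weight traffic inversely to that score while keeping every machine above zero.

```java
// Hedged sketch of the idea described above -- NOT Spotify's actual ELS formula.
final class MachineStats {
    double successRate;        // fraction of requests that succeed, 0..1
    double successLatencyMs;   // typical latency of a successful reply
    double failureLatencyMs;   // typical latency of a failed reply
    int outstanding;           // requests in flight to this machine
}

final class ExpectedLatencyScorer {
    // A failed attempt still costs the client its failure latency plus a retry,
    // so expected latency grows quickly as the success rate drops. Outstanding
    // requests are treated as a queueing penalty (assumed linear here).
    static double expectedLatencyMs(MachineStats m) {
        double p = Math.max(m.successRate, 0.01);                 // avoid divide-by-zero
        double retryCost = (1 - p) * (m.failureLatencyMs + m.successLatencyMs);
        double queueing = m.outstanding * m.successLatencyMs;
        return (m.successLatencyMs + retryCost + queueing) / p;
    }

    // Convert scores to traffic weights: slower and failing machines get less traffic,
    // but never zero, so we can notice when they come back up.
    static double[] weights(MachineStats[] machines) {
        double[] w = new double[machines.length];
        double sum = 0;
        for (int i = 0; i < machines.length; i++) {
            w[i] = 1.0 / expectedLatencyMs(machines[i]);
            sum += w[i];
        }
        for (int i = 0; i < w.length; i++) w[i] /= sum;
        return w;
    }
}
```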
latency  spotify  proxies  load-balancing  els  algorithms  c3  round-robin  load-balancers  routing 
december 2015 by jm
Topics in High-Performance Messaging
'We have worked together in the field of high-performance messaging for many years, and in that time, have seen some messaging systems that worked well and some that didn't. Successful deployment of a messaging system requires background information that is not easily available; most of what we know, we had to learn in the school of hard knocks. To save others a knock or two, we have collected here the essential background information and commentary on some of the issues involved in successful deployments. This information is organized as a series of topics around which there seems to be confusion or uncertainty. Please contact us if you have questions or comments.'
messaging  scalability  scaling  performance  udp  tcp  protocols  multicast  latency 
december 2015 by jm
Seastar
C++ high-performance app framework; 'currently focused on high-throughput, low-latency I/O intensive applications.'

Scylla (Cassandra-compatible NoSQL store) is written in this.
c++  opensource  performance  framework  scylla  seastar  latency  linux  shared-nothing  multicore 
september 2015 by jm
toxy
toxy is a fully programmatic and hackable HTTP proxy to simulate server failure scenarios and unexpected network conditions. It was mainly designed for fuzzing/evil testing purposes, when toxy becomes particularly useful to cover fault tolerance and resiliency capabilities of a system, especially in service-oriented architectures, where toxy may act as intermediate proxy among services.

toxy allows you to plug in poisons, optionally filtered by rules, which essentially can intercept and alter the HTTP flow as you need, performing multiple evil actions in the middle of that process, such as limiting the bandwidth, delaying TCP packets, injecting network jitter latency or replying with a custom error or status code.
toxy  proxies  proxy  http  mitm  node.js  soa  network  failures  latency  slowdown  jitter  bandwidth  tcp 
august 2015 by jm
Amazon EC2 2015 Benchmark: Testing Speeds Between AWS EC2 and S3 Regions
Here we are again, a year later, and still no bloody percentiles! Just amateurish averaging. This is not how you measure anything, ffs. Still, better than nothing I suppose
fail  latency  measurement  aws  ec2  percentiles  s3 
august 2015 by jm
Performance Testing at LMAX
Good series of blog posts on the LMAX trading platform's performance testing strategy -- they capture live traffic off the wire, then build statistical models simulating its features. See also http://epickrram.blogspot.co.uk/2014/07/performance-testing-at-lmax-part-two.html and http://epickrram.blogspot.co.uk/2014/08/performance-testing-at-lmax-part-three.html .
performance  testing  tests  simulation  latency  lmax  trading  sniffing  packet-capture 
june 2015 by jm
SolarCapture Packet Capture Software
Interesting product line -- I didn't know this existed, but it makes good sense as a "network flight recorder". Big in finance.
SolarCapture is a powerful packet capture product family that can transform every server into a precision network monitoring device, increasing network visibility, network instrumentation, and performance analysis. SolarCapture products optimize network monitoring and security, while eliminating the need for specialized appliances, expensive adapters relying on exotic protocols, proprietary hardware, and dedicated networking equipment.


See also Corvil (based in Dublin!): 'I'm using a Corvil at the moment and it's awesome- nanosecond precision latency measurements on the wire.'

(via mechanical sympathy list)
corvil  timing  metrics  measurement  latency  network  solarcapture  packet-capture  financial  performance  security  network-monitoring 
may 2015 by jm
"Trash Day: Coordinating Garbage Collection in Distributed Systems"
Another GC-coordination strategy, similar to Blade (qv), with some real-world examples using Cassandra
blade  via:adriancolyer  papers  gc  distsys  algorithms  distributed  java  jvm  latency  spark  cassandra 
may 2015 by jm
_Blade: a Data Center Garbage Collector_
Essentially, add a central GC scheduler to improve tail latencies in a cluster, by taking instances out of the pool to perform slow GC activity instead of letting them impact live operations. I've been toying with this idea for a while, nice to see a solid paper about it
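
A hedged sketch of the core coordination idea only (the paper describes a real cluster-wide scheduler; the "load balancer" interface and GC trigger here are stand-ins): take the node out of the pool, run the expensive collection while no live traffic hits it, then rejoin.

```java
interface LoadBalancerRegistration {
    void deregister();   // stop sending this node live traffic
    void register();     // rejoin the pool
}

final class CoordinatedGc {
    private final LoadBalancerRegistration lb;

    CoordinatedGc(LoadBalancerRegistration lb) { this.lb = lb; }

    // Called when the central scheduler grants this node a GC slot.
    void collectOutOfBand() throws InterruptedException {
        lb.deregister();
        Thread.sleep(500);        // let in-flight requests drain (placeholder drain policy)
        System.gc();              // the slow collection now hurts no live requests
        lb.register();
    }
}
```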
gc  latency  tail-latencies  papers  blade  go  java  scheduling  clustering  load-balancing  low-latency  performance 
april 2015 by jm
Gil Tene's "usual suspects" to reduce system-level hiccups/latency jitters in a Linux system
Based on empirical evidence (across many tens of sites thus far) and note-comparing with others, I use a list of "usual suspects" that I blame whenever they are not set to my liking and system-level hiccups are detected. Getting these settings right from the start often saves a bunch of playing around (and no, there is no "priority" to this - you should set them all right before looking for more advice...).
performance  latency  hiccups  gil-tene  tuning  mechanical-sympathy  hyperthreading  linux  ops 
april 2015 by jm
The Four Month Bug: JVM statistics cause garbage collection pauses (evanjones.ca)
Ugh, tying GC safepoints to disk I/O? bad idea:
The JVM by default exports statistics by mmap-ing a file in /tmp (hsperfdata). On Linux, modifying a mmap-ed file can block until disk I/O completes, which can be hundreds of milliseconds. Since the JVM modifies these statistics during garbage collection and safepoints, this causes pauses that are hundreds of milliseconds long. To reduce worst-case pause latencies, add the -XX:+PerfDisableSharedMem JVM flag to disable this feature. This will break tools that read this file, like jstat.
bugs  gc  java  jvm  disk  mmap  latency  ops  jstat 
march 2015 by jm
JClarity's Illuminate
Performance-diagnosis-as-a-service. Cool.
Users download and install an Illuminate Daemon using a simple installer which starts up a small stand alone Java process. The Daemon sits quietly unless it is asked to start gathering SLA data and/or to trigger a diagnosis. Users can set SLA’s via the dashboard and can opt to collect latency measurements of their transactions manually (using our library) or by asking Illuminate to automatically instrument their code (Servlet and JDBC based transactions are currently supported).

SLA latency data for transactions is collected on a short cycle. When the moving average of latency measurements goes above the SLA value (e.g. 150ms), a diagnosis is triggered. The diagnosis is very quick, gathering key data from O/S, JVM(s), virtualisation and other areas of the system. The data is then run through the machine learned algorithm which will quickly narrow down the possible causes and gather a little extra data if needed.

Once Illuminate has determined the root cause of the performance problem, the diagnosis report is sent back to the dashboard and an alert is sent to the user. That alert contains a link to the result of the diagnosis which the user can share with colleagues. Illuminate has all sorts of backoff strategies to ensure that users don’t get too many alerts of the same type in rapid succession!
illuminate  jclarity  java  jvm  scala  latency  gc  tuning  performance 
february 2015 by jm
Azul Zing on Ubuntu on AWS Marketplace
hmmm, very interesting -- the super-low-latency Zing JVM is available as a commercial EC2 instance type, at a cost less than the EC2 instance price
zing  azul  latency  performance  ec2  aws 
february 2015 by jm
Comcast
Nice wrapper for 'tc' and 'netem', for network latency/packet loss emulation
networking  testing  linux  tc  netem  latency  packet-loss  iptables 
january 2015 by jm
wrk2
'A constant throughput, correct latency-recording variant of wrk.' This is a must-have when measuring network service latency -- it corrects for the Coordinated Omission error:
wrk's model, which is similar to the model found in many current load generators, computes the latency for a given request as the time from the sending of the first byte of the request to the time the complete response was received. While this model correctly measures the actual completion time of individual requests, it exhibits a strong Coordinated Omission effect, through which most of the high latency artifacts exhibited by the measured server will be ignored. Since each connection will only begin to send a request after receiving a response, high latency responses result in the load generator coordinating with the server to avoid measurement during high latency periods.
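
A minimal sketch of the correction wrk2 applies: issue requests on a fixed schedule and measure latency from the *intended* send time, so a stalled server (or client) still gets charged for the requests it delayed. Everything here is a stand-in except the measurement principle.

```java
import org.HdrHistogram.Histogram;

public class ConstantThroughputLoad {
    public static void main(String[] args) throws InterruptedException {
        long intervalNanos = 1_000_000;                   // 1,000 requests/sec intended rate
        Histogram latencies = new Histogram(3_600_000_000_000L, 3);
        long start = System.nanoTime();

        for (long i = 0; i < 60_000; i++) {
            long intendedStart = start + i * intervalNanos;
            long now = System.nanoTime();
            if (now < intendedStart) {
                Thread.sleep((intendedStart - now) / 1_000_000,
                             (int) ((intendedStart - now) % 1_000_000));
            }
            doRequest();                                   // stand-in for the real call
            // key point: latency is measured from the *scheduled* start, not from
            // whenever we finally got around to sending -- this is what corrects
            // for Coordinated Omission when the system under test stalls.
            latencies.recordValue(System.nanoTime() - intendedStart);
        }
        System.out.println("p99.9 (ms): " + latencies.getValueAtPercentile(99.9) / 1e6);
    }

    static void doRequest() { /* placeholder for an actual HTTP call */ }
}
```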
wrk  latency  measurement  tools  cli  http  load-testing  testing  load-generation  coordinated-omission  gil-tene 
november 2014 by jm
testing latency measurements using CTRL-Z
An excellent tip from Gil "HDRHistogram" Tene:
Good example of why I always "calibrate" latency tools with ^Z tests. If ^Z results don't make sense, don't use [the] tool. ^Z test math examples: If you ^Z for half the time, Max is obvious. [90th percentile] should be 80% of the ^Z stall time.
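
A small sketch that makes the arithmetic concrete: simulate a constant-rate run with a known stall, compute each request's latency from its intended start time, and check that max ≈ the stall time and the 90th percentile ≈ 80% of it. The run parameters are arbitrary.

```java
import org.HdrHistogram.Histogram;

// Simulated ^Z calibration: 20s run at 1,000 intended requests/sec, with the
// process suspended for the middle 10 seconds. Requests are instantaneous except
// that nothing completes until the stall ends.
public class CtrlZCalibration {
    public static void main(String[] args) {
        long totalMs = 20_000, stallStartMs = 5_000, stallEndMs = 15_000;
        Histogram h = new Histogram(3_600_000L, 3);   // values in ms

        for (long t = 0; t < totalMs; t++) {          // one intended request per ms
            long completion = (t >= stallStartMs && t < stallEndMs) ? stallEndMs : t;
            h.recordValue(Math.max(1, completion - t));
        }
        // Expect max ~= 10,000ms and p90 ~= 8,000ms (80% of the stall time),
        // matching the sanity-check arithmetic in the quote above.
        System.out.println("max (ms): " + h.getMaxValue());
        System.out.println("p90 (ms): " + h.getValueAtPercentile(90.0));
    }
}
```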
control-z  suspend  unix  testing  latencies  latency  measurement  percentiles  tips 
november 2014 by jm
Testing fork time on AWS/Xen infrastructure
Redis uses forking to perform persistence flushes, which means that once every 30 minutes it performs like crap (and kills the 99th percentile latency). Given this, various Redis people have been benchmarking fork() times on various Xen platforms, since Xen has a crappy fork() implementation
fork  xen  redis  bugs  performance  latency  p99 
october 2014 by jm
Most page loads will experience the 99th percentile response latency
MOST of the page view attempts will experience the 99%'lie server response time in modern web applications. You didn't read that wrong.
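
The arithmetic behind the claim, assuming independent backend requests: if a page load fans out into N server requests, the chance that at least one of them lands in the slowest 1% is 1 - 0.99^N.

```java
public class TailProbability {
    public static void main(String[] args) {
        // probability that a page load touching n backend requests sees at least
        // one response at or beyond the server's 99th-percentile latency
        for (int n : new int[]{1, 10, 50, 100, 500}) {
            double p = 1 - Math.pow(0.99, n);
            System.out.printf("n=%3d  P(sees p99) = %.1f%%%n", n, p * 100);
        }
        // n=100 gives ~63%, n=500 gives ~99% -- "most page loads" indeed
    }
}
```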
latency  metrics  percentiles  p99  web  http  soa 
october 2014 by jm
FelixGV/tehuti
Felix says:

'Like I said, I'd like to move it to a more general / non-personal repo in the future, but haven't had the time yet. Anyway, you can still browse the code there for now. It is not a big code base so not that hard to wrap one's mind around it.

It is Apache licensed and both Kafka and Voldemort are using it so I would say it is pretty self-contained (although Kafka has not moved to Tehuti proper, it is essentially the same code they're using, minus a few small fixes missing that we added).

Tehuti is a bit lower level than CodaHale (i.e.: you need to choose exactly which stats you want to measure and the boundaries of your histograms), but this is the type of stuff you would build a wrapper for and then re-use within your code base. For example: the Voldemort RequestCounter class.'
asl2  apache  open-source  tehuti  metrics  percentiles  quantiles  statistics  measurement  latency  kafka  voldemort  linkedin 
october 2014 by jm
"Quantiles on Streams" [paper, 2009]
'Chiranjeeb Buragohain and Subhash Suri: "Quantiles on Streams" in Encyclopedia of Database Systems, Springer, pp 2235–2240, 2009. ISBN: 978-0-387-35544-3', cited by Martin Kleppman in http://mail-archives.apache.org/mod_mbox/kafka-dev/201402.mbox/%3C131A7649-ED57-45CB-B4D6-F34063267664@linkedin.com%3E as a good, short literature survey re estimating percentiles with a small memory footprint.
latency  percentiles  coding  quantiles  streams  papers  algorithms 
october 2014 by jm
Tehuti
An embryonic metrics library for Java/Scala from Felix GV at LinkedIn, extracted from Kafka's metric implementation and in the new Voldemort release. It fixes the major known problems with the Meter/Timer implementations in Coda-Hale/Dropwizard/Yammer Metrics.

'Regarding Tehuti: it has been extracted from Kafka's metric implementation. The code was originally written by Jay Kreps, and then maintained and improved by some Kafka and Voldemort devs, so it definitely is not the work of just one person. It is in my repo at the moment but I'd like to put it in a more generally available (git and maven) repo in the future. I just haven't had the time yet...

As for comparing with CodaHale/Yammer, there were a few concerns with it, but the main one was that we didn't like the exponentially decaying histogram implementation. While that implementation is very appealing in terms of (low) memory usage, it has several misleading characteristics (a lack of incoming data points makes old measurements linger longer than they should, and there's also a fairly high possibility of losing interesting outlier data points). This makes the exp decaying implementation robust in high-throughput, fairly constant workloads, but unreliable in sparse or spiky workloads. The Tehuti implementation provides semantics that we find easier to reason with and with a small code footprint (which we consider a plus in terms of maintainability). Of course, it is still a fairly young project, so it could be improved further.'

More background at the kafka-dev thread: http://mail-archives.apache.org/mod_mbox/kafka-dev/201402.mbox/%3C131A7649-ED57-45CB-B4D6-F34063267664@linkedin.com%3E
kafka  metrics  dropwizard  java  scala  jvm  timers  ewma  statistics  measurement  latency  sampling  tehuti  voldemort  linkedin  jay-kreps 
october 2014 by jm
"Left-Right: A Concurrency Control Technique with Wait-Free Population Oblivious Reads" [pdf]
'In this paper, we describe a generic concurrency control technique with Blocking write operations and Wait-Free Population Oblivious read operations, which we named the Left-Right technique. It is of particular interest for real-time applications with dedicated Reader threads, due to its wait-free property that gives strong latency guarantees and, in addition, there is no need for automatic Garbage Collection.
The Left-Right pattern can be applied to any data structure, allowing concurrent access to it similarly to a Reader-Writer lock, but in a non-blocking manner for reads. We present several variations of the Left-Right technique, with different versioning mechanisms and state machines. In addition, we constructed an optimistic approach that can reduce synchronization for reads.'

See also http://concurrencyfreaks.blogspot.ie/2013/12/left-right-concurrency-control.html for java implementation code.
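
A minimal Java sketch of the classic variant described above (two copies of the structure, a leftRight indicator, and per-version reader counters): reads are wait-free, writes are serialized and block until readers of the old version drain. The paper's other variants and optimizations are omitted.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.function.Function;

public final class LeftRight<T> {
    private final T left, right;
    private final AtomicInteger leftRight = new AtomicInteger(0);     // which side readers use
    private final AtomicInteger versionIndex = new AtomicInteger(0);
    private final AtomicLong[] readers = { new AtomicLong(), new AtomicLong() };
    private final Object writeLock = new Object();

    public LeftRight(T left, T right) { this.left = left; this.right = right; }

    public <R> R read(Function<T, R> readOp) {
        int vi = versionIndex.get();
        readers[vi].incrementAndGet();            // arrive
        try {
            T inst = (leftRight.get() == 0) ? left : right;
            return readOp.apply(inst);
        } finally {
            readers[vi].decrementAndGet();        // depart
        }
    }

    public void write(Consumer<T> writeOp) {
        synchronized (writeLock) {
            int lr = leftRight.get();
            // 1. apply the mutation to the side readers are NOT using
            writeOp.accept(lr == 0 ? right : left);
            // 2. send new readers to the freshly written side
            leftRight.set(1 - lr);
            // 3. toggle versionIndex and wait for readers on the old version to drain
            int vi = versionIndex.get();
            int nextVi = 1 - vi;
            while (readers[nextVi].get() != 0) Thread.yield();
            versionIndex.set(nextVi);
            while (readers[vi].get() != 0) Thread.yield();
            // 4. apply the same mutation to the old side so both copies match
            writeOp.accept(lr == 0 ? left : right);
        }
    }
}
```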
left-right  concurrency  multithreading  wait-free  blocking  realtime  gc  latency  reader-writer  locking  synchronization  java 
september 2014 by jm
"The Tail at Scale"
by Jeffrey Dean and Luiz Andre Barroso, Google. A selection of Google's architectural mechanisms used to defeat 99th-percentile latency spikes: hedged requests, tied requests, micro-partitioning, selective replication, latency-induced probation, canary requests.
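
A minimal sketch of one of those mechanisms, hedged requests: fire the primary call, and if no response arrives within a hedge delay (e.g. the observed p95), fire a backup against another replica and take whichever answers first. The suppliers and delay are placeholders; error handling is deliberately omitted.

```java
import java.util.concurrent.*;
import java.util.function.Supplier;

public class HedgedRequests {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    <T> CompletableFuture<T> hedged(Supplier<CompletableFuture<T>> primary,
                                    Supplier<CompletableFuture<T>> backup,
                                    long hedgeDelayMillis) {
        CompletableFuture<T> result = new CompletableFuture<>();
        primary.get().thenAccept(result::complete);
        scheduler.schedule(() -> {
            if (!result.isDone()) {
                // only the small fraction of slow requests ever reach here,
                // so the extra load on the backing service stays bounded
                backup.get().thenAccept(result::complete);
            }
        }, hedgeDelayMillis, TimeUnit.MILLISECONDS);
        return result;
    }
}
```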
google  architecture  distcomp  soa  http  partitioning  replication  latency  99th-percentile  canary-requests  hedged-requests 
july 2014 by jm
Google's Pegasus
a power-management subsystem for warehouse-scale computing farms. "It adjusts the power-performance settings of servers so that the overall workload barely meets its latency constraints for user queries."
pegasus  power-management  power  via:fanf  google  latency  scaling 
june 2014 by jm
Monitoring Reactive Applications with Kamon
"quality monitoring tools for apps built in Akka, Spray and Play!". Uses Gil Tene's HDRHistogram and dropwizard Metrics under the hood.
metrics  dropwizard  hdrhistogram  gil-tene  kamon  akka  spray  play  reactive  statistics  java  scala  percentiles  latency 
may 2014 by jm
Uplink Latency of WiFi and 4G Networks
It's high. Wifi in particular shows high variability and long latency tails
wifi  3g  4g  mobile  networking  internet  latency  tcp 
april 2014 by jm
Game servers: UDP vs TCP
this HN thread on the age-old UDP vs TCP question is way better than the original post -- lots of salient comments
udp  tcp  games  protocols  networking  latency  internet  gaming  hackernews 
april 2014 by jm
Micro jitter, busy waiting and binding CPUs
pinning threads to CPUs to reduce jitter and latency. Lots of graphs and measurements from Peter Lawrey
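
Peter Lawrey's posts use his OpenHFT Java-Thread-Affinity library; assuming that library is on the classpath, pinning a latency-critical thread looks roughly like the sketch below. The `AffinityLock` calls are from memory, so treat the exact API as an assumption.

```java
import net.openhft.affinity.AffinityLock;

public class PinnedWorker implements Runnable {
    @Override
    public void run() {
        // Reserve a CPU for this thread (assumed OpenHFT Java-Thread-Affinity API);
        // busy-spinning on a pinned core avoids the scheduler-induced jitter measured
        // in the post.
        AffinityLock lock = AffinityLock.acquireLock();
        try {
            while (!Thread.currentThread().isInterrupted()) {
                pollForWork();   // placeholder busy-wait loop
            }
        } finally {
            lock.release();
        }
    }

    private void pollForWork() { /* placeholder */ }
}
```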
pinning  threads  performance  latency  jitter  tuning 
march 2014 by jm
'Bobtail: Avoiding Long Tails in the Cloud' [pdf]
'A system that proactively detects and avoids bad neighbouring VMs without significantly penalizing node instantiation [in EC2]. With Bobtail, common [datacenter] communication patterns benefit from reductions of up to 40% in 99.9th percentile response times.'

Excellent stuff -- another conclusion they come to is that it's not the network's fault, it's the Xen hosts themselves. The EC2 networking team will be happy about that ;)
networking  ec2  bobtail  latency  long-tail  xen  performance 
february 2014 by jm
LatencyUtils by giltene
The LatencyUtils package includes useful utilities for tracking latencies. Especially in common in-process recording scenarios, which can exhibit significant coordinated omission sensitivity without proper handling.
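
A hedged sketch of basic usage (class and method names from memory, so treat them as assumptions): record latencies through LatencyStats, then pull an interval histogram that includes the samples LatencyUtils synthesizes to compensate for detected pauses.

```java
import org.HdrHistogram.Histogram;
import org.LatencyUtils.LatencyStats;

public class LatencyStatsExample {
    public static void main(String[] args) {
        // LatencyStats tracks pauses in the recording process and synthesizes the
        // samples that coordinated omission would otherwise have dropped.
        LatencyStats stats = new LatencyStats();

        for (int i = 0; i < 100_000; i++) {
            long start = System.nanoTime();
            doWork();                                  // stand-in for the operation being timed
            stats.recordLatency(System.nanoTime() - start);
        }

        Histogram corrected = stats.getIntervalHistogram();
        System.out.println("p99 (ms): " + corrected.getValueAtPercentile(99.0) / 1e6);
    }

    static void doWork() { /* placeholder */ }
}
```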
gil-tene  metrics  java  measurement  coordinated-omission  latency  speed  service-metrics  open-source 
november 2013 by jm
"Effective Computation of Biased Quantiles over Data Streams" [paper]

Skew is prevalent in many data sources such as IP traffic streams. To continually summarize the distribution of such data, a high-biased set of quantiles (e.g., 50th, 90th and 99th percentiles) with finer error guarantees at higher ranks (e.g., errors of 5, 1 and 0.1 percent, respectively) is more useful than uniformly distributed quantiles (e.g., 25th, 50th and 75th percentiles) with uniform error guarantees. In this paper, we address the following two problems. First, can we compute quantiles with finer error guarantees for the higher ranks of the data distribution effectively, using less space and computation time than computing all quantiles uniformly at the finest error? Second, if specific quantiles and their error bounds are requested a priori, can the necessary space usage and computation time be reduced? We answer both questions in the affirmative by formalizing them as the “high-biased” quantiles and the “targeted” quantiles problems, respectively, and presenting algorithms with provable guarantees, that perform significantly better than previously known solutions for these problems. We implemented our algorithms in the Gigascope data stream management system, and evaluated alternate approaches for maintaining the relevant summary structures. Our experimental results on real and synthetic IP data streams complement our theoretical analyses, and highlight the importance of lightweight, non-blocking implementations when maintaining summary structures over high-speed data streams.


Implemented as a timer-histogram storage system in http://armon.github.io/statsite/ .
statistics  quantiles  percentiles  stream-processing  skew  papers  histograms  latency  algorithms 
november 2013 by jm
Barbarians at the Gateways - ACM Queue

I am a former high-frequency trader. For a few wonderful years I led a group of brilliant engineers and mathematicians, and together we traded in the electronic marketplaces and pushed systems to the edge of their capability.


Insane stuff -- FPGAs embedded in the network switches to shave off nanoseconds of latency.
low-latency  hft  via:nelson  markets  stock-trading  latency  fpgas  networking 
october 2013 by jm
"High Performance Browser Networking", by Ilya Grigorik, read online for free
Wow, this looks excellent. A must-read for people working on systems with high-volume, low-latency phone-to-server communications -- and free!
How prepared are you to build fast and efficient web applications? This eloquent book provides what every web developer should know about the network, from fundamental limitations that affect performance to major innovations for building even more powerful browser applications—including HTTP 2.0 and XHR improvements, Server-Sent Events (SSE), WebSocket, and WebRTC.

Author Ilya Grigorik, a web performance engineer at Google, demonstrates performance optimization best practices for TCP, UDP, and TLS protocols, and explains unique wireless and mobile network optimization requirements. You’ll then dive into performance characteristics of technologies such as HTTP 2.0, client-side network scripting with XHR, real-time streaming with SSE and WebSocket, and P2P communication with WebRTC.

Deliver optimal TCP, UDP, and TLS performance;
Optimize network performance over 3G/4G mobile networks;
Develop fast and energy-efficient mobile applications;
Address bottlenecks in HTTP 1.x and other browser protocols;
Plan for and deliver the best HTTP 2.0 performance;
Enable efficient real-time streaming in the browser;
Create efficient peer-to-peer videoconferencing and low-latency applications with real-time WebRTC transports


Via Eoin Brazil.
book  browser  networking  performance  phones  mobile  3g  4g  hsdpa  http  udp  tls  ssl  latency  webrtc  websockets  ebooks  via:eoin-brazil  google  http2  sse  xhr  ilya-grigorik 
october 2013 by jm
Rapid read protection in Cassandra 2.0.2
Nifty new feature -- if a request takes over the 99th percentile for requests to that server, it'll be repeated against another replica. Unnecessary for Voldemort, of course, which queries all replicas anyway!
cassandra  nosql  replication  distcomp  latency  storage 
october 2013 by jm
Attacking Tor: how the NSA targets users' online anonymity
As part of the Turmoil system, the NSA places secret servers, codenamed Quantum, at key places on the internet backbone. This placement ensures that they can react faster than other websites can. By exploiting that speed difference, these servers can impersonate a visited website to the target before the legitimate website can respond, thereby tricking the target's browser to visit a Foxacid server.


whoa, I missed this before.
nsa  gchq  packet-injection  attacks  security  backbone  http  latency 
october 2013 by jm
Benchmarking Redis on AWS ElastiCache
good data points, but could do with latency percentiles
latency  redis  measurement  benchmarks  ec2  elasticache  aws  storage  tests 
september 2013 by jm
Coordinated Omission
Gil Tene raises an extremely good point about load testing, high-percentile response-time measurement, and behaviour when testing a system under load:

I've been harping for a while now about a common measurement technique problem I call "Coordinated Omission", which can often render percentile data useless. [...] I believe that this problem occurs extremely frequently in test results, but it's usually hard to deduce its existence purely from the final data reported. But every once in a while, I see test results where the data provided is enough to demonstrate the huge percentile-misreporting effect of Coordinated Omission based purely on the summary report.

I ran into just such a case in Attila's cool posting about log4j2's truly amazing performance, so I decided to avoid polluting his thread with an elongated discussion of how to compute 99.9%'ile data, and started this topic here. That thread should really be about how cool log4j2 is, and I'm certain that it really is cool, even after you correct the measurements. [...] Basically, I think that the 99.99% observation computation is wrong, and demonstrably (using the data in the graph data posted) exhibits the classic "coordinated omission" measurement problem I've been preaching about. This test is not alone in exhibiting this, and there is nothing to be ashamed of when you find yourself making this mistake. I only figured it out after doing it myself many many times, and then I noticed that everyone else seems to also be doing it but most of them haven't yet figured it out. In fact, I run into this issue so often in percentile reporting and load testing that I'm starting to wonder if coordinated omission is there in 99.9% of latency tests ;-)
measurement  testing  latency  load-testing  gil-tene  coordinated-omission  validity  log4j  percentiles 
august 2013 by jm
Improved HTTPS Performance with Early SSL Termination
This is a neat hack. Since SSL/TLS connection establishment requires several consecutive round trips before the connection is ready, performing the handshake closer to the user and reusing an existing region-to-region connection behind the scenes greatly improves overall latency. Works for HTTP as well.
http  https  ssl  architecture  aws  ec2  performance  latency  internet  round-trip  nginx  tls 
july 2013 by jm
_Measuring Mobile Web Performance_ [slides]
Notable slide is #13, displaying a graph of HSDPA packet RTTs measured from a train. Max RTT gets up to 20,266ms. ouch
rtt  packets  latency  hsdpa  mobile  internet  trains  packet-loss 
june 2013 by jm
SSL/TLS overhead
'The TLS handshake has multiple variations, but let’s pick the most common one – anonymous client and authenticated server (the connections browsers use most of the time).' Works out to 4 packets, in addition to the TCP handshake's 3, and about 6.5k bytes on average.
network  tls  ssl  performance  latency  speed  networking  internet  security  packets  tcp  handshake 
june 2013 by jm
Communication costs in real-world networks
Peter Bailis has generated some good real-world data about network performance and latency, measured using EC2 instances, between ec2 regions, between zones, and between hosts in a single AZ. good data (particularly as I was looking for this data in a public source not too long ago).

I wasn’t aware of any datasets describing network behavior both within and across datacenters, so we launched m1.small Amazon EC2 instances in each of the eight geo-distributed “Regions,” across the three us-east “Availability Zones” (three co-located datacenters in Virginia), and within one datacenter (us-east-b). We measured RTTs between hosts for a week at a granularity of one ping per second.


Some of the high-percentile measurements are undoubtedly impact of host and VM behaviour, but that is still good data for a typical service built in EC2.
networks  performance  measurements  benchmarks  ops  ec2  networking  internet  az  latency 
may 2013 by jm
Storm and Hadoop: Convergence of Big-Data and Low-Latency Processing
Yahoo! are going big with Storm for their next-generation internal cloud platform:

'Yahoo! engineering teams are developing technologies to enable Storm applications and Hadoop applications to be hosted on a single cluster.

• We have enhanced Storm to support Hadoop style security mechanism (including Kerberos authentication), and thus enable Storm applications authorized to access Hadoop datasets on HDFS and HBase.
• Storm is being integrated into Hadoop YARN for resource management. Storm-on-YARN enables Storm applications to utilize the computation resources in our tens of thousands of Hadoop computation nodes. YARN is used to launch Storm application master (Nimbus) on demand, and enables Nimbus to request resources for Storm application slaves (Supervisors).'
yahoo  yarn  cloud-computing  private-clouds  big-data  latency  storm  hadoop  elastic-computing  hbase 
february 2013 by jm
#AltDevBlogADay » Latency Mitigation Strategies
John Carmack on the low-latency coding techniques used to support head mounted display devices.

Virtual reality (VR) is one of the most demanding human-in-the-loop applications from a latency standpoint. The latency between the physical movement of a user’s head and updated photons from a head mounted display reaching their eyes is one of the most critical factors in providing a high quality experience.

Human sensory systems can detect very small relative delays in parts of the visual or, especially, audio fields, but when absolute delays are below approximately 20 milliseconds they are generally imperceptible. Interactive 3D systems today typically have latencies that are several times that figure, but alternate configurations of the same hardware components can allow that target to be reached.

A discussion of the sources of latency throughout a system follows, along with techniques for reducing the latency in the processing done on the host system.
head-mounted-display  display  ui  latency  vision  coding  john-carmack 
february 2013 by jm
AnandTech - The Intel SSD DC S3700: Intel's 3rd Generation Controller Analyzed
Interesting trend; Intel moved from a btree to an array-based data structure for their logical-block address indirection map, in order to reduce worst-case latencies (via Martin Thompson)
latency  intel  via:martin-thompson  optimization  speed  p99  data-structures  arrays  btrees  ssd  hardware 
november 2012 by jm
How does LMAX's disruptor pattern work? - Stack Overflow
LMAX's "Disruptor" concurrent-server pattern, claiming to be a higher-throughput, lower-latency, and lock-free alternative to the SEDA pattern using a massive ring buffer. Good discussion here at SO. (via Filippo)
via:filippo  servers  seda  queueing  concurrency  disruptor  patterns  latency  trading  performance  ring-buffers 
november 2011 by jm
Overclocking SSL
techie details from Adam Langley on how Google's been improving TLS/SSL, with lots of good tips. they switched in January to HTTPS for all Gmail users by default, without any additional machines or hardware
certificates  encryption  google  https  latency  speed  ssl  tcp  tls  web  performance  from delicious
july 2010 by jm
