jm + mechanical-sympathy   8

Gil Tene's "usual suspects" to reduce system-level hiccups/latency jitters in a Linux system
Based on empirical evidence (across many tens of sites thus far) and note-comparing with others, I use a list of "usual suspects" that I blame whenever they are not set to my liking and system-level hiccups are detected. Getting these settings right from the start often saves a bunch of playing around (and no, there is no "priority" to this - you should set them all right before looking for more advice...).
performance  latency  hiccups  gil-tene  tuning  mechanical-sympathy  hyperthreading  linux  ops 
april 2015 by jm
Netty: Using as a generic library
Some cool stuff that comes along with Netty: an improved ByteBuffer, a thread-local object pool, a hashed-wheel Timer, and some nice mechanical-sympathy utils.
mechanical-sympathy  netty  java  bytebuffer  object-pools  data-structures  hashed-wheel-timer  algorithms  timers 
november 2014 by jm
'Mythbusting Modern Hardware to gain "Mechanical Sympathy"' [slides]
Martin Thompson's latest talk -- taking a few common concepts about modern hardware performance and debunking/confirming them, mythbusters-style
mythbusters  hardware  mechanical-sympathy  martin-thompson  java  performance  cpu  disks  ssd 
may 2013 by jm
Martin Thompson, Luke "Snabb Switch" Gorrie etc. review the C10M presentation from Schmoocon
on the mechanical-sympathy mailing list. Some really interesting discussion on handling insane quantities of TCP connections using low volumes of hardware:
This talk has some good points and I think the subject is really interesting.  I would take the suggested approach with serious caution.  For starters the Linux kernel is nowhere near as bad as it made out.  Last year I worked with a client and we scaled a single server to 1 million concurrent connections with async programming in Java and some sensible kernel tuning.  I've heard they have since taken this to over 5 million concurrent connections.

BTW Open Onload is an open source implementation.  Writing a network stack is a serious undertaking.  In a previous life I wrote a network probe and had to reassemble TCP streams and kept getting tripped up by edge cases.  It is a great exercise in data structures and lock-free programming.  If you need very high-end performance I'd talk to the Solarflare or Mellanox guys before writing my own.

There are some errors and omissions in this talk.  For example, his range of ephemeral ports is not quite right, and atomic operations are only 15 cycles on Sandy Bridge when hitting local cache.  A big issue for me is when he defined C10M he did not mention the TIME_WAIT issue with closing connections.  Creating and destroying 1 million connections per second is a major issue.  A protocol like HTTP is very broken in that the server closes the socket and therefore has to retain the TCB until the specified timeout occurs to ensure no older packet is delivered to a new socket connection.
mechanical-sympathy  hardware  scaling  c10m  tcp  http  scalability  snabb-switch  martin-thompson 
may 2013 by jm
The Bw-Tree: A B-tree for New Hardware - Microsoft Research
The emergence of new hardware and platforms has led to reconsideration of how data management systems are designed. However, certain basic functions such as key indexed access to records remain essential. While we exploit the common architectural layering of prior systems, we make radically new design decisions about each layer. Our new form of B tree, called the Bw-tree achieves its very high performance via a latch-free approach that effectively exploits the processor caches of modern multi-core chips. Our storage manager uses a unique form of log structuring that blurs the distinction between a page and a record store and works well with flash storage. This paper describes the architecture and algorithms for the Bw-tree, focusing on the main memory aspects. The paper includes results of our experiments that demonstrate that this fresh approach produces outstanding performance.
bw-trees  database  paper  toread  research  algorithms  microsoft  sql  sql-server  b-trees  data-structures  storage  cache-friendly  mechanical-sympathy 
april 2013 by jm
Jetty-9 goes fast with Mechanical Sympathy
This is very cool! Applying Mechanical Sympathy optimization techniques to Jetty, specifically: "False sharing" on the BlockingArrayQueue data structure resolved; a new ArrayTernaryTrie data structure to improve header field storage, making it faster to build. look up, efficient on RAM, cheap to GC, and more cache-friendly than a traditional trie; and a branchless hex-to-byte conversion statement. The results are a 30%-faster microbenchmark on amd64, with 50% less Young Gen garbage collections. Lovely to see low-level infrastructure libs like Jetty getting this kind of optimization.
jetty  java  mechanical-sympathy  optimization  coding  tries 
february 2013 by jm

Copy this bookmark:



description:


tags: