jm + ram   5

How To Measure the Working Set Size on Linux
A nifty metric:
The Working Set Size (WSS) is how much memory an application needs to keep working. Your app may have populated 100 Gbytes of main memory, but only uses 50 Mbytes each second to do its job. That's the working set size. It is used for capacity planning and scalability analysis.

You may never have seen WSS measured by any tool (I haven't either). OSes usually show you virtual memory and resident memory, shown as the "VIRT" and "RES" columns in top. Resident memory is real memory: main memory that has been allocated and page mapped. But we don't know how much of that is in heavy use, which is what WSS tells us.

In this post I'll introduce some new things I've developed for WSS estimation: two Linux tools, and WSS profile charts. The tools use either the referenced or the idle page flags to measure a page-based WSS, and were developed out of necessity for another performance problem.
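The referenced-flag approach described above can be sketched in a few lines of Python. This is a hedged, minimal sketch, not the author's actual tool: it clears the per-page "referenced" flags via `/proc/PID/clear_refs`, waits an interval, then sums the `Referenced:` fields from `/proc/PID/smaps` to estimate how many KB were touched. It assumes Linux, and writing another process's `clear_refs` requires appropriate permissions.

```python
import time

def referenced_kb(pid="self"):
    # Sum the "Referenced:" fields (in KB) across all mappings
    # in /proc/<pid>/smaps.
    total = 0
    with open(f"/proc/{pid}/smaps") as f:
        for line in f:
            if line.startswith("Referenced:"):
                total += int(line.split()[1])
    return total

def estimate_wss(pid="self", interval=1.0):
    # Clear the per-page referenced flags, wait, then count how
    # many KB were touched during the interval -- a rough WSS
    # estimate for that window.
    with open(f"/proc/{pid}/clear_refs", "w") as f:
        f.write("1")
    time.sleep(interval)
    return referenced_kb(pid)
```

Note that clearing referenced flags perturbs the kernel's own page-reclaim accounting, which is one reason tools like this are estimates rather than exact measurements.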

(via Amy Tobey)
via:amytobey  memory  linux  rss  wss  proc  ps  processes  metrics  working-set-size  ram 
4 weeks ago by jm
OpenJDK: jol
'JOL (Java Object Layout) is the tiny toolbox to analyze object layout schemes in JVMs. These tools are using Unsafe, JVMTI, and Serviceability Agent (SA) heavily to decode the actual object layout, footprint, and references. This makes JOL much more accurate than other tools relying on heap dumps, specification assumptions, etc.'

Recommended by Nitsan Wakart; looks pretty useful for JVM devs.
java  jvm  tools  scala  memory  estimation  ram  object-layout  debugging  via:nitsan 
february 2015 by jm
"Pitfalls of Object Oriented Programming", SCEE R&D
Good presentation discussing "data-oriented programming" -- the concept of optimizing memory access speed by laying out large data in a columnar format in RAM, rather than naively in the default layout that OOP design suggests
columnar  ram  memory  optimization  coding  c++  oop  data-oriented-programming  data  cache  performance 
july 2014 by jm
MICA: A Holistic Approach To Fast In-Memory Key-Value Storage [paper]
Very interesting new approach to building a scalable in-memory K/V store. As Rajiv Kurian notes on the mechanical-sympathy list:

'The basic idea is that each core is responsible for a portion of the key-space and requests are forwarded to the right core, avoiding multiple-writer scenarios. This is opposed to designs like memcache which uses locks and shared memory.

Some of the things I found interesting: The single writer design is taken to an extreme. Clients assist the partitioning of requests, by calculating hashes before submitting GET requests. It uses Intel DPDK instead of sockets to forward packets to the right core, without processing the packet on any core. Each core is paired with a dedicated RX/TX queue. The design for a lossy cache is simple but interesting. It does things like replacing a hash slot (instead of chaining) etc. to take advantage of the lossy nature of caches. There is a lossless design too. A bunch of tricks to optimize for memory performance. This includes pre-allocation, design of the hash indexes, prefetching tricks etc. There are some other concurrency tricks that were interesting. Handling dangling pointers was one of them.'
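The single-writer partitioning idea described above can be sketched as a toy (this is not MICA's implementation — the real system steers packets with DPDK and dedicated NIC queues — just the client-assisted hash-partitioning concept, with each "core" owning a store that only it writes):

```python
import hashlib

NUM_CORES = 4

# One store per "core". Each store has exactly one writer, so no
# locks or shared-memory contention are needed -- the MICA idea.
stores = [{} for _ in range(NUM_CORES)]

def core_for(key):
    # The client computes the hash before submitting the request,
    # so the request can be forwarded straight to the owning core.
    digest = hashlib.md5(key.encode()).digest()
    return digest[0] % NUM_CORES

def put(key, value):
    stores[core_for(key)][key] = value

def get(key):
    return stores[core_for(key)].get(key)
```

Because the key space is partitioned, GETs and PUTs for the same key always land on the same store, which is what makes the lock-free single-writer design safe.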

Source code here:
mica  in-memory  memory  ram  key-value-stores  storage  smp  dpdk  multicore  memcached  concurrency 
april 2014 by jm
Instagram: Making the Switch to Cassandra from Redis, a 75% 'Insta' Savings
shifting data out of RAM and onto SSDs -- unsurprisingly, big savings.
a 12 node cluster of EC2 hi1.4xlarge instances; we store around 1.2TB of data across this cluster. At peak, we're doing around 20,000 writes per second to that specific cluster and around 15,000 reads per second. We've been really impressed with how well Cassandra has been able to drop into that role.
ram  ssd  cassandra  databases  nosql  redis  instagram  storage  ec2 
june 2013 by jm
