jm + memory (48 bookmarks)

Native Memory Tracking
Java 8 HotSpot feature to monitor and diagnose native memory leaks
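
For reference, NMT is switched on with a JVM flag and queried with jcmd(1); flag and subcommands as per the JDK docs (summary mode; =detail adds allocation stack traces):

    java -XX:NativeMemoryTracking=summary -jar app.jar    # or =detail for allocation sites
    jcmd <pid> VM.native_memory summary                   # per-subsystem native usage
    jcmd <pid> VM.native_memory baseline                  # ...later, summary.diff shows growth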
java  jvm  memory  native-memory  malloc  debugging  coding  nmt  java-8  jcmd 
4 days ago by jm
Debugging Java Native Memory Leaks (evanjones.ca)
Using jemalloc to instrument the contents of the native heap and record the allocation stack trace for each chunk, so that the source of a leak can be quickly identified (GZIPInputStream in this case).

See also https://gdstechnology.blog.gov.uk/2015/12/11/using-jemalloc-to-get-to-the-bottom-of-a-memory-leak/ and https://github.com/jeffgriffith/native-jvm-leaks/blob/master/README.md .
debugging  memory  jvm  java  leaks  memory-leaks  leak-checking  jemalloc  malloc  native  heap  off-heap  gzipinputstream 
january 2017 by jm
MemC3: Compact and concurrent Memcache with dumber caching and smarter hashing
An improved hashing algorithm called optimistic cuckoo hashing, and a CLOCK-based eviction algorithm that works in tandem with it. They are evaluated in the context of Memcached, where combined they give up to a 30% memory usage reduction and up to a 3x improvement in queries per second as compared to the default Memcached implementation on read-heavy workloads with small objects (as is typified by Facebook workloads).
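
As a flavour of the underlying idea, here's a minimal two-choice cuckoo hash set in Java -- a plain sketch only (class name and hash-mixing constants are mine), not MemC3's optimistic concurrent variant:

    // Every key has exactly two candidate slots, so a lookup probes at most two
    // locations; inserts displace existing keys and relocate them until a free
    // slot turns up. No resizing here -- a real table rehashes/grows on failure.
    class CuckooSet {
        private final long[] slots = new long[16];   // power of two; 0 == empty (demo assumes keys != 0)

        private int h1(long k) { return (int) ((k * 0x9E3779B97F4A7C15L >>> 40) & (slots.length - 1)); }
        private int h2(long k) { return (int) ((k * 0xC2B2AE3D27D4EB4FL >>> 40) & (slots.length - 1)); }

        boolean contains(long k) {
            return slots[h1(k)] == k || slots[h2(k)] == k;
        }

        void insert(long k) {
            int i = h1(k);
            for (int kicks = 0; kicks < 64; kicks++) {
                if (slots[i] == 0) { slots[i] = k; return; }
                long evicted = slots[i];
                slots[i] = k;
                k = evicted;                              // displace the occupant...
                i = (i == h1(k)) ? h2(k) : h1(k);         // ...and move it to its other slot
            }
            throw new IllegalStateException("eviction cycle -- table needs a rehash");
        }
    }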
memcached  performance  key-value-stores  storage  databases  cuckoo-hashing  algorithms  concurrency  caching  cache-eviction  memory  throughput 
november 2016 by jm
Squeezing blood from a stone: small-memory JVM techniques for microservice sidecars
Reducing service memory usage from 500MB to 105MB:
We found two specific techniques to be the most beneficial: turning off one of the two JIT compilers enabled by default (the “C2” compiler), and using a 32-bit, rather than a 64-bit, JVM.
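
For reference, "turning off C2" maps to capping tiered compilation at the C1 level; an illustrative command line (heap/stack sizes here are made-up examples, not the article's values):

    java -XX:TieredStopAtLevel=1 -Xmx128m -Xss512k -jar service.jar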
32bit  jvm  java  ops  memory  tuning  jit  linkerd 
june 2016 by jm
qp tries: smaller and faster than crit-bit tries
interesting new data structure from Tony Finch. "Some simple benchmarks say qp tries have about 1/3 less memory overhead and are about 10% faster than crit-bit tries."
crit-bit  popcount  bits  bitmaps  tries  data-structures  via:fanf  qp-tries  crit-bit-tries  hacks  memory 
october 2015 by jm
GSMem: Data Exfiltration from Air-Gapped Computers over GSM Frequencies
Holy shit.
Air-gapped networks are isolated, separated both logically and physically from public networks. Although the feasibility of invading such systems has been demonstrated in recent years, exfiltration of data from air-gapped networks is still a challenging task. In this paper we present GSMem, a malware that can exfiltrate data through an air-gap over cellular frequencies. Rogue software on an infected target computer modulates and transmits electromagnetic signals at cellular frequencies by invoking specific memory-related instructions and utilizing the multichannel memory architecture to amplify the transmission. Furthermore, we show that the transmitted signals can be received and demodulated by a rootkit placed in the baseband firmware of a nearby cellular phone.
gsmem  gsm  exfiltration  air-gaps  memory  radio  mobile-phones  security  papers 
august 2015 by jm
Memory Layouts for Binary Search
Key takeaway:
Nearly universally, B-trees win when the data gets big enough.
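
One of the cache-friendly layouts the article compares is the Eytzinger (BFS) order, where the children of a[k] live at a[2k] and a[2k+1], so each search step's candidates share cache lines. A hedged Java sketch (class name mine) of the build plus the branch-free lower-bound search:

    class Eytzinger {
        final int[] a;   // 1-indexed; a[0] is unused
        final int n;

        Eytzinger(int[] sorted) {
            n = sorted.length;
            a = new int[n + 1];
            build(sorted, 0, 1);
        }

        // In-order walk of the implicit tree fills a[] from the sorted input.
        private int build(int[] sorted, int i, int k) {
            if (k <= n) {
                i = build(sorted, i, 2 * k);
                a[k] = sorted[i++];
                i = build(sorted, i, 2 * k + 1);
            }
            return i;
        }

        /** Index in a[] of the smallest element >= x, or 0 if every element is < x. */
        int lowerBound(int x) {
            int k = 1;
            while (k <= n) {
                k = 2 * k + (a[k] < x ? 1 : 0);   // branch-free descent
            }
            // Undo the final run of right-turns to land on the answer.
            return k >> (Integer.numberOfTrailingZeros(~k) + 1);
        }
    }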
caches  cpu  performance  optimization  memory  binary-search  b-trees  algorithms  search  memory-layout 
may 2015 by jm
Cassandra moving to using G1 as the default recommended GC implementation
This is a big indicator that G1 is ready for primetime. CMS has long been the go-to GC for production usage, but requires careful, complex hand-tuning -- if G1 is getting to a stage where it's just a case of giving it enough RAM, that'd be great.

Also, looks like it'll be the JDK9 default: https://twitter.com/shipilev/status/593175793255219200
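
The appeal is how little configuration G1 wants; something like the following (illustrative flags, not Cassandra's exact recommended settings):

    java -XX:+UseG1GC -Xms8g -Xmx8g -XX:MaxGCPauseMillis=500 ...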
cassandra  tuning  ops  g1gc  cms  gc  java  jvm  production  performance  memory 
april 2015 by jm
Transparent huge pages implicated in Redis OOM
A nasty real-world prod error scenario worsened by THPs:
jemalloc(3) extensively uses madvise(2) to notify the operating system that it's done with a range of memory which it had previously malloc'ed. The page size on this machine is 2MB because transparent huge pages are in use. As such, a lot of the memory which is being marked with madvise(..., MADV_DONTNEED) is within substantially smaller ranges than 2MB. This means that the operating system never was able to evict pages which had ranges marked as MADV_DONTNEED because the entire page has to be unneeded to allow a page to be reused. Despite initially looking like a leak, the operating system itself was unable to free memory because of madvise(2) and transparent huge pages. This led to sustained memory pressure on the machine and redis-server eventually getting OOM killed.
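
The usual mitigation is to disable THP (or restrict it to explicit madvise requests) before starting redis-server:

    echo never > /sys/kernel/mm/transparent_hugepage/enabled     # or 'madvise'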
oom-killer  oom  linux  ops  thp  jemalloc  huge-pages  madvise  redis  memory 
march 2015 by jm
OpenJDK: jol
'JOL (Java Object Layout) is the tiny toolbox to analyze object layout schemes in JVMs. These tools are using Unsafe, JVMTI, and Serviceability Agent (SA) heavily to decode the actual object layout, footprint, and references. This makes JOL much more accurate than other tools relying on heap dumps, specification assumptions, etc.'

Recommended by Nitsan Wakart; looks pretty useful for JVM devs.
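
Typical usage is a one-liner per question, with org.openjdk.jol:jol-core on the classpath:

    import org.openjdk.jol.info.ClassLayout;
    import org.openjdk.jol.info.GraphLayout;

    public class JolDemo {
        public static void main(String[] args) {
            // Field-by-field layout: header, offsets, alignment padding.
            System.out.println(ClassLayout.parseClass(java.util.HashMap.class).toPrintable());
            // Retained footprint of a live object graph.
            System.out.println(GraphLayout.parseInstance(new java.util.ArrayList<>()).toFootprint());
        }
    }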
java  jvm  tools  scala  memory  estimation  ram  object-layout  debugging  via:nitsan 
february 2015 by jm
Please grow your buffers exponentially
Although in some cases x1.5 is considered good practice; YMMV, I guess.
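
For what it's worth, x1.5 is exactly what java.util.ArrayList does on resize (newCapacity = oldCapacity + (oldCapacity >> 1)); a minimal sketch of the pattern:

    import java.util.Arrays;

    class GrowableBuffer {
        // Grow by ~x1.5 per resize so that n appends cost O(n) copies in total.
        static byte[] ensureCapacity(byte[] buf, int needed) {
            if (needed <= buf.length) return buf;
            int grown = buf.length + (buf.length >> 1) + 1;   // +1 copes with length 0
            return Arrays.copyOf(buf, Math.max(grown, needed));
        }
    }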
malloc  memory  coding  buffers  exponential  jemalloc  firefox  heap  allocation 
november 2014 by jm
Java tip: optimizing memory consumption
Good tips on how to tell if object allocation rate is a bottleneck in your JVM-based code
yourkit  memory  java  jvm  allocation  gc  bottlenecks  performance 
august 2014 by jm
mariusaeriksen/heapster
Heapster provides an agent library to do heap profiling for JVM processes with output compatible with Google perftools. The goal of Heapster is to be able to do meaningful (sampled) heap profiling in a production setting.

Used by Twitter in production, apparently.
heap  monitoring  memory  jvm  java  performance 
july 2014 by jm
"Pitfalls of Object Oriented Programming", SCEE R&D
Good presentation discussing "data-oriented programming" -- the concept of optimizing memory access speed by laying out large data in a columnar format in RAM, rather than naively in the default layout that OOP design suggests
columnar  ram  memory  optimization  coding  c++  oop  data-oriented-programming  data  cache  performance 
july 2014 by jm
Jump Consistent Hash: A Fast, Minimal Memory, Consistent Hash Algorithm
'a fast, minimal memory, consistent hash algorithm that can be expressed in about 5 lines of code. In comparison to the algorithm of Karger et al., jump consistent hash requires no storage, is faster, and does a better job of evenly dividing the key space among the buckets and of evenly dividing the workload when the number of buckets changes. Its main limitation is that the buckets must be numbered sequentially, which makes it more suitable for data storage applications than for distributed web caching.'

Implemented in Guava. This is also noteworthy:

'Google has not applied for patent protection for this algorithm, and, as of this writing, has no plans to. Rather, it wishes to contribute this algorithm to the community.'
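
The algorithm really is about five lines; here's a Java port of the paper's C++ (Guava exposes the same thing as Hashing.consistentHash(long, int)):

    // Returns a bucket in [0, numBuckets) for the given key; when numBuckets
    // grows by one, only ~1/numBuckets of keys move, and no state is stored.
    static int jumpConsistentHash(long key, int numBuckets) {
        long b = -1, j = 0;
        while (j < numBuckets) {
            b = j;
            key = key * 2862933555777941757L + 1;    // 64-bit LCG step
            j = (long) ((b + 1) * ((double) (1L << 31) / ((key >>> 33) + 1)));
        }
        return (int) b;
    }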
hashing  consistent-hashing  google  guava  memory  algorithms  sharding 
june 2014 by jm
Plumbr.eu's reference page for java.lang.OutOfMemoryError
With examples of each possible cause of a Java OOM, and suggested workarounds. Succinct.
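
For the most common case ("java.lang.OutOfMemoryError: Java heap space"), the classic reproduction, handy for testing monitoring and -XX:+HeapDumpOnOutOfMemoryError handling:

    import java.util.ArrayList;
    import java.util.List;

    public class HeapHog {
        public static void main(String[] args) {
            List<byte[]> hog = new ArrayList<>();        // strong refs defeat the GC
            while (true) hog.add(new byte[1024 * 1024]); // run with e.g. -Xmx64m
        }
    }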
reference  plumbr.eu  oom  java  memory  heap  ops 
june 2014 by jm
Friends don't let friends use mmap(2)
Rather horrific update from the trenches of Mozilla
mozilla  mmap  performance  linux  io  files  memory  unix  windows 
may 2014 by jm
Dynamic Tuple Performance On the JVM
More JVM off-heap storage from Boundary:
generates heterogeneous collections of primitive values and ensures as best it can that they will be laid out adjacently in memory. The individual values in the tuple can either be accessed from a statically bound interface, via an indexed accessor, or via reflective or other dynamic invocation techniques. FastTuple is designed to deal with a large number of tuples therefore it will also attempt to pool tuples such that they do not add significantly to the GC load of a system. FastTuple is also capable of allocating the tuple value storage entirely off-heap, using Java’s direct memory capabilities.
jvm  java  gc  off-heap  storage  boundary  memory 
may 2014 by jm
Structs
storage of structured data in a contiguous block of memory. The memory can be allocated on the heap using a byte[] array or can be allocated off the Java heap in native memory. [...] Use cases: store/cache huge amounts of data records without impact on GC duration; high-performance data transfer in a cluster or in between processes

Handy OSS from Ruediger Moeller.
structs  java  jvm  memory  off-heap  storage  reference 
may 2014 by jm
Rope-core memory
as used in the Apollo guidance computer systems -- hand-woven by "little old ladies". Amazing.
core-memory  memory  rope-core  guidance  apollo  space  nasa  history  1960s  via:hn 
april 2014 by jm
Sirius by Comcast
At Comcast, our applications need convenient, low-latency access to important reference datasets. For example, our XfinityTV websites and apps need to use entertainment-related data to serve almost every API or web request to our datacenters: information like what year Casablanca was released, or how many episodes were in Season 7 of Seinfeld, or when the next episode of the Voice will be airing (and on which channel!).

We traditionally managed this information with a combination of relational databases and RESTful web services but yearned for something simpler than the ORM, HTTP client, and cache management code our developers dealt with on a daily basis. As main memory sizes on commodity servers continued to grow, however, we asked ourselves: How can we keep this reference data entirely in RAM, while ensuring it gets updated as needed and is easily accessible to application developers?

The Sirius distributed system library is our answer to that question, and we're happy to announce that we've made it available as an open source project. Sirius is written in Scala and uses the Akka actor system under the covers, but is easily usable by any JVM-based language.

Also includes a Paxos implementation with "fast follower" read-only slave replication. ASL2-licensed open source.

The only thing I can spot to be worried about is speed of startup; they note that apps need to replay a log at startup to rebuild state, which can be slow if unoptimized in my experience.

Update: in a twitter conversation at https://twitter.com/jon_moore/status/459363751893139456 , Jon Moore indicated they haven't had problems with this even with 'datasets consuming 10-20GB of heap', and have 'benchmarked a 5-node Sirius ingest cluster up to 1k updates/sec write throughput.' That's pretty solid!
open-source  comcast  paxos  replication  read-only  datastores  storage  memory  memcached  redis  sirius  scala  akka  jvm  libraries 
april 2014 by jm
MICA: A Holistic Approach To Fast In-Memory Key-Value Storage [paper]
Very interesting new approach to building a scalable in-memory K/V store. As Rajiv Kurian notes on the mechanical-sympathy list:

'The basic idea is that each core is responsible for a portion of the key-space and requests are forwarded to the right core, avoiding multiple-writer scenarios. This is opposed to designs like memcache which uses locks and shared memory.

Some of the things I found interesting: The single writer design is taken to an extreme. Clients assist the partitioning of requests, by calculating hashes before submitting GET requests. It uses Intel DPDK instead of sockets to forward packets to the right core, without processing the packet on any core. Each core is paired with a dedicated RX/TX queue. The design for a lossy cache is simple but interesting. It does things like replacing a hash slot (instead of chaining) etc. to take advantage of the lossy nature of caches. There is a lossless design too. A bunch of tricks to optimize for memory performance. This includes pre-allocation, design of the hash indexes, prefetching tricks etc. There are some other concurrency tricks that were interesting. Handling dangling pointers was one of them.'

Source code here: https://github.com/efficient/mica
mica  in-memory  memory  ram  key-value-stores  storage  smp  dpdk  multicore  memcached  concurrency 
april 2014 by jm
Impact of large primitive arrays (BLOBS) on JVM Garbage Collection
Some nice graphs and data on CMS performance, with/without -XX:ParGCCardsPerStrideChunk.
cms  java  jvm  performance  optimization  tuning  off-heap-storage  memory 
march 2014 by jm
Huge Redis rant
I want to emphasize that if you use redis as intended (as a slightly-persistent, not-HA cache), it's great. Unfortunately, more and more shops seem to be thinking that Redis is a full-service database and, as someone who's had to spend an inordinate amount of time maintaining such a setup, it's not. If you're writing software and you're thinking "hey, it would be easy to just put a SET key value in this code and be done," please reconsider. There are lots of great products out there that are better for the overwhelming majority of use cases.

Ouch. (via Aphyr)
redis  storage  architecture  memory  caching  ha  databases 
february 2014 by jm
On Hacking MicroSD Cards
incredible stuff from Bunnie Huang:
Today at the Chaos Computer Congress (30C3), xobs and I disclosed a finding that some SD cards contain vulnerabilities that allow arbitrary code execution — on the memory card itself. On the dark side, code execution on the memory card enables a class of MITM (man-in-the-middle) attacks, where the card seems to be behaving one way, but in fact it does something else. On the light side, it also enables the possibility for hardware enthusiasts to gain access to a very cheap and ubiquitous source of microcontrollers.
security  memory  hacking  hardware  ccc  sd-cards  memory-cards 
december 2013 by jm
Asynchronous logging versus Memory Mapped Files
Interesting article on using mmap'd files from Java via RandomAccessFile.getChannel().map(), which allows them to be accessed directly as a ByteBuffer. Together with atomic-variable lazySet() operations, this gives excellent performance for low-latency writes to disk. See also: http://psy-lob-saw.blogspot.ie/2012/12/atomiclazyset-is-performance-win-for.html
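
A minimal sketch of the combination (class name and sizes are mine; single writer assumed): map a region as a MappedByteBuffer, write into it directly -- the write lands in the page cache with no syscall -- and publish the write position with lazySet(), a cheap ordered store rather than a full fence:

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.util.concurrent.atomic.AtomicLong;

    public class MmapLog {
        private final MappedByteBuffer buf;
        private final AtomicLong writePos = new AtomicLong(0);

        public MmapLog(String path, int size) throws Exception {
            try (RandomAccessFile raf = new RandomAccessFile(path, "rw")) {
                // The mapping stays valid even after the file/channel are closed.
                buf = raf.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, size);
            }
        }

        public void append(long value) {
            int pos = (int) writePos.get();
            buf.putLong(pos, value);          // absolute put: page cache only, no syscall
            writePos.lazySet(pos + 8);        // ordered publish; readers poll writePos
        }
    }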
atomic  lazyset  putordered  jmm  java  synchronization  randomaccessfile  bytebuffers  performance  optimization  memory  disk  queues 
november 2013 by jm
Response to "Optimizing Linux Memory Management..."
A follow-up to the LinkedIn VM-tuning blog post at http://engineering.linkedin.com/performance/optimizing-linux-memory-management-low-latency-high-throughput-databases --
Do not read into this article too much, especially for trying to understand how the Linux VM or the kernel works.  The authors misread the "global spinlock on the zone" source code, and the interpretation in the article is dead wrong.
linux  tuning  vm  kernel  linkedin  memory  numa 
october 2013 by jm
Sketch of the Day – Frugal Streaming
Ha, this is very clever! If you have enough volume, this is a nice estimation algorithm to compute stream quantiles in very little RAM.
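
The core of the Frugal-1U estimator is startlingly small -- one variable of state per quantile. A hedged Java sketch for an arbitrary quantile q (class name mine; see the post for the refinements):

    import java.util.Random;

    // On each sample: step the estimate up with probability q when the sample is
    // larger, down with probability (1 - q) when smaller. For the median (q = 0.5)
    // this degenerates to a fair-coin random walk that tracks the true median.
    class FrugalQuantile {
        private final double q;
        private final Random rnd = new Random();
        private long estimate = 0;

        FrugalQuantile(double q) { this.q = q; }

        void observe(long x) {
            if (x > estimate && rnd.nextDouble() < q) estimate++;
            else if (x < estimate && rnd.nextDouble() < 1 - q) estimate--;
        }

        long current() { return estimate; }
    }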
memory  streaming  stream-processing  clever  algorithms  hacks  streams 
september 2013 by jm
Java Garbage Collection Distilled
a great summary of the state of JVM garbage collection from Martin Thompson
jvm  java  gc  garbage-collection  tuning  memory  performance  martin-thompson 
july 2013 by jm
memcached turns 10 years old
Well, apparently tomorrow, but close enough. Happy birthday to bradfitz' greatest creation and its wonderful slab allocator!
birthdays  code  via:alex-popescu  open-source  history  malloc  memory  caching  memcached 
may 2013 by jm
Lectures in Advanced Data Structures (6.851)
Good lecture notes on the current state of the art in data structure research.
Data structures play a central role in modern computer science. You interact with data structures even more often than with algorithms (think Google, your mail server, and even your network routers). In addition, data structures are essential building blocks in obtaining efficient algorithms. This course covers major results and current directions of research in data structures:

TIME TRAVEL We can remember the past efficiently (a technique called persistence), but in general it's difficult to change the past and see the outcomes on the present (retroactivity). So alas, Back To The Future isn't really possible.
GEOMETRY When data has more than one dimension (e.g. maps, database tables).
DYNAMIC OPTIMALITY Is there one binary search tree that's as good as all others? We still don't know, but we're close.
MEMORY HIERARCHY Real computers have multiple levels of caches. We can optimize the number of cache misses, often without even knowing the size of the cache.
HASHING Hashing is the most used data structure in computer science. And it's still an active area of research.
INTEGERS Logarithmic time is too easy. By careful analysis of the information you're dealing with, you can often reduce the operation times substantially, sometimes even to constant. We will also cover lower bounds that illustrate when this is not possible.
DYNAMIC GRAPHS A network link went down, or you just added or deleted a friend in a social network. We can still maintain essential information about the connectivity as it changes.
STRINGS Searching for phrases in giant text (think Google or DNA).
SUCCINCT Most “linear size” data structures you know are much larger than they need to be, often by an order of magnitude. Some data structures require almost no space beyond the raw data but are still fast (think heaps, but much cooler).

(via Tim Freeman)
data-structures  lectures  mit  video  data  algorithms  coding  csail  strings  integers  hashing  sorting  bst  memory 
april 2013 by jm
Peek and poke in the age of Linux
Neat demo of using ptrace to inject into a running process, just like the good old days ;)
Some time ago I ran into a production issue where the init process (upstart) stopped behaving properly. Specifically, instead of spawning new processes, it deadlocked in a transitional state. [...] What’s worse, upstart doesn’t allow forcing a state transition and trying to manually create and send DBus events didn’t help either. That meant the sane options we were left with were:
- restart the host (not desirable at all in that scenario);
- start the process manually and hope auto-respawn will not be needed.
Of course there are also some insane options. Why not cheat like in the old times and just PEEK and POKE the process in the right places? The solution used at the time involved a very ugly script driving gdb which probably summoned satan in some edge cases. But edge cases were not hit and the majority of hosts recovered without issues.
debugging  memory  linux  upstart  peek  poke  ptrace  gdb  processes  hacks 
march 2013 by jm
Fatcache
from Twitter -- 'a cache for your big data. Even though memory is a thousand times faster than SSD, network-connected SSD-backed memory makes sense if we design the system in a way that network latencies dominate over the SSD latencies by a large factor. To understand why network-connected SSD makes sense, it is important to understand the role distributed memory plays in large-scale web architecture. In recent years, terabyte-scale, distributed, in-memory caches have become a fundamental building block of any web architecture. In-memory indexes, hash tables, key-value stores and caches are increasingly incorporated for scaling throughput and reducing latency of persistent storage systems. However, power consumption, operational complexity and single node DRAM cost make horizontally scaling this architecture challenging. The current cost of DRAM per server increases dramatically beyond approximately 150 GB, and power cost scales similarly as DRAM density increases. Fatcache extends a volatile, in-memory cache by incorporating SSD-backed storage.'
twitter  ssd  cache  caching  memcached  memcache  memory  network  storage 
february 2013 by jm
Special encoding of small aggregate data types in Redis
Nice performance trick in Redis on hash storage:

'In theory, in order to guarantee that we perform lookups in constant time (also known as O(1) in big O notation) there is the need to use a data structure with a constant time complexity in the average case, like a hash table. But many times hashes contain just a few fields. When hashes are small we can instead just encode them in an O(N) data structure, like a linear array with length-prefixed key-value pairs. Since we do this only when N is small, the amortized time for HGET and HSET commands is still O(1): the hash will be converted into a real hash table as soon as the number of elements it contains grows too large (you can configure the limit in redis.conf). This works well not just from the point of view of time complexity, but also from the point of view of constant factors, since a linear array of key-value pairs happens to play very well with the CPU cache (it has better cache locality than a hash table).'
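
The promotion thresholds live in redis.conf; directive names and defaults have shifted across Redis versions, but at the time they looked like this:

    hash-max-ziplist-entries 512    # convert to a real hash table above 512 fields
    hash-max-ziplist-value 64       # ...or when any value exceeds 64 bytes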
memory  redis  performance  big-o  hash-tables  storage  coding  cache  arrays 
november 2012 by jm
ElementCostInDataStructures
"The cost per element in major data structures offered by Java and Guava (r11)]." A very useful reference!

Ever wondered what's the cost of adding each entry to a HashMap? Or one new element in a TreeSet? Here are the answers: the per-entry cost for each well-known structure in Java and Guava. You can use this to estimate the footprint of a structure: if the per-entry cost is 32 bytes and your structure contains 1024 elements, its footprint will be around 32 kilobytes. Note that non-tree mutable structures are amortized (adding an element might trigger a resize and be expensive, otherwise it would be cheap), which makes the "average per-element cost" hard to measure exactly, but you can expect the real answers to be close to what is reported below.
java  coding  guava  reference  memory  cost  performance  data-structures 
october 2012 by jm
Avoiding Full GCs in HBase with MemStore-Local Allocation Buffers
Fascinating. Evading the Java GC by reimplementing a slab allocator, basically
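
The gist of the trick, as a hedged Java sketch (not HBase's actual code): bump-pointer-copy each cell into large fixed-size chunks, so the tenured generation fragments chunk-by-chunk instead of cell-by-cell and defragmenting full GCs become rare:

    // Assumes records smaller than a chunk and a single writer.
    public class Slab {
        private static final int CHUNK_SIZE = 2 * 1024 * 1024;
        private byte[] chunk = new byte[CHUNK_SIZE];
        private int pos = 0;

        public static final class Handle {
            public final byte[] chunk; public final int offset; public final int length;
            Handle(byte[] c, int o, int l) { chunk = c; offset = o; length = l; }
        }

        public Handle copy(byte[] src) {
            if (pos + src.length > CHUNK_SIZE) {   // chunk full: retire it, start a new one
                chunk = new byte[CHUNK_SIZE];
                pos = 0;
            }
            System.arraycopy(src, 0, chunk, pos, src.length);
            Handle h = new Handle(chunk, pos, src.length);
            pos += src.length;
            return h;
        }
    }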
memory  allocation  java  gc  jvm  hbase  memstore  via:dehora  slab-allocator 
october 2011 by jm
Golomb-coded sets
'a probabilistic data structure conceptually similar to a Bloom filter, but with a more compact in-memory representation, and a slower query time.' could come in handy
gcs  bloom-filters  probabilistic  data-structures  memory  algorithms 
september 2011 by jm
Taming the OOM killer [LWN.net]
hmm, I never knew about oom_adj, useful (via Peter Blair)
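
For the record, the knobs (the second is the modern replacement for the first):

    echo -17 > /proc/<pid>/oom_adj            # legacy: -17 means never OOM-kill this process
    echo -1000 > /proc/<pid>/oom_score_adj    # current interface, range -1000..1000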
via:petermblair  oom  linux  memory  oom-killer  sysadmin  lwn
january 2011 by jm
Blosc
A high-performance compressor optimized for binary data -- 'designed to transmit data to the processor cache faster than a traditional, non-compressed, direct memory fetch via memcpy()' (via Bill de hOra)
via:dehora  compression  memcpy  caching  l1  software  memory  optimization  performance  python  pytables
october 2010 by jm
The MySQL “swap insanity” problem and the effects of the NUMA architecture
Very interesting; modern multicore x86 architectures use a NUMA memory architecture, which can cause a dip into swap even when there appears to be plenty of free RAM available.
linux  memory  mysql  optimization  performance  swap  tuning  vm  numa  swap-insanity  swapping
september 2010 by jm
Why WeakHashMap Sucks
'SoftReferences are the cheap, crappy caching mechanism [...] perfect for when you'd like your cache to be cleared at random times and in random order.'
softreferences  weakreferences  weak  references  gc  java  jvm  caching  hash  memory  collections  vm  weakhashmap  via:spyced
september 2009 by jm
