jm + oom   6

a simple JVMTI agent that forcibly terminates the JVM when it is unable to allocate memory or create a thread. This is important for reliability purposes: an OutOfMemoryError will often leave the JVM in an inconsistent state. Terminating the JVM will allow it to be restarted by an external process manager.

This is apparently still useful despite the existence of '-XX:ExitOnOutOfMemoryError' as of java 8, since that may somehow still fail occasionally.
oom  java  reliability  uptime  memory  ops 
12 days ago by jm
The Discovery of Apache ZooKeeper's Poison Packet - PagerDuty
Excellent deep dive into a production issue. Root causes: crappy error handling code in Zookeeper; lack of bounds checking in ZK; and a nasty kernel bug.
zookeeper  bugs  error-handling  bounds-checking  oom  poison-packets  pagerduty  packets  tcpdump  xen  aes  linux  kernel 
may 2015 by jm
Transparent huge pages implicated in Redis OOM
A nasty real-world prod error scenario worsened by THPs:
jemalloc(3) extensively uses madvise(2) to notify the operating system that it's done with a range of memory which it had previously malloc'ed. The page size on this machine is 2MB because transparent huge pages are in use. As such, a lot of the memory which is being marked with madvise(..., MADV_DONTNEED) is within substantially smaller ranges than 2MB. This means that the operating system never was able to evict pages which had ranges marked as MADV_DONTNEED because the entire page has to be unneeded to allow a page to be reused. Despite initially looking like a leak, the operating system itself was unable to free memory because of madvise(2) and transparent huge pages. This led to sustained memory pressure on the machine and redis-server eventually getting OOM killed.
oom-killer  oom  linux  ops  thp  jemalloc  huge-pages  madvise  redis  memory 
march 2015 by jm's reference page for java.lang.OutOfMemoryError
With examples of each possible cause of a Java OOM, and suggested workarounds. succinct
reference  oom  java  memory  heap  ops 
june 2014 by jm
from the Percona toolkit. 'Conveniently summarizes the status and configuration of a server. It is not a tuning tool or diagnosis tool. It produces a report that is easy to diff and can be pasted into emails without losing the formatting. This tool works well on many types of Unix systems.' --- summarises OOM history, top, netstat connection table, interface stats, network config, RAID, LVM, disks, inodes, disk scheduling, mounts, memory, processors, and CPU.
percona  tools  cli  unix  ops  linux  diagnosis  raid  netstat  oom 
october 2013 by jm
Taming the OOM killer []
hmm, I never knew about oom_adj, useful (via Peter Blair)
via:petermblair  oom  linux  memory  oom-killer  sysadmin  lwn  from delicious
january 2011 by jm

Copy this bookmark: