
Locking, Little's Law, and the USL
Excellent explanatory mailing list post by Martin Thompson to the mechanical-sympathy group, discussing Little's Law vs the USL:
Little's law can be used to describe a system in steady state from a queuing perspective, i.e. arrival and leaving rates are balanced. In this case it is a crude way of modelling a system with a contention percentage of 100% under Amdahl's law, in that throughput is one over latency.

However this is an inaccurate way to model a system with locks. Amdahl's law does not account for coherence costs. For example, if you wrote a microbenchmark with a single thread to measure the lock cost then it is much lower than in a multi-threaded environment where cache coherence, other OS costs such as scheduling, and lock implementations need to be considered.

Universal Scalability Law (USL) accounts for both the contention and the coherence costs.

When modelling locks it is necessary to consider how contention and coherence costs vary given how they can be implemented. Consider in Java how we have biased locking, thin locks, fat locks, inflation, and revoking biases which can cause safe points that bring all threads in the JVM to a stop with a significant coherence component.
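The relationship Thompson describes can be sketched numerically. This is a generic rendering of the USL formula, not anything from the post itself; the coefficients below are made-up example values, not measurements:

```java
// Sketch of the Universal Scalability Law: relative capacity C(N) for N
// threads, with contention coefficient alpha (serialization) and coherence
// coefficient beta (crosstalk). With beta = 0 it reduces to Amdahl's law.
public class Usl {
    public static double capacity(int n, double alpha, double beta) {
        return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1));
    }

    public static void main(String[] args) {
        // Coherence cost makes throughput peak and then *decline* as N grows,
        // which pure Amdahl (beta = 0) never predicts.
        for (int n : new int[] {1, 8, 32, 128}) {
            System.out.printf("N=%3d  amdahl=%6.2f  usl=%6.2f%n",
                    n, capacity(n, 0.05, 0.0), capacity(n, 0.05, 0.001));
        }
    }
}
```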
usl  scaling  scalability  performance  locking  locks  java  jvm  amdahls-law  littles-law  system-dynamics  modelling  systems  caching  threads  schedulers  contention 
5 days ago by jm
Native Memory Tracking
Java 8 HotSpot feature to monitor and diagnose native memory leaks
java  jvm  memory  native-memory  malloc  debugging  coding  nmt  java-8  jcmd 
7 days ago by jm
Java Flame Graphs Introduction: Fire For Everyone!
lots of good detail on flame graph usage in Java, and the Honest Profiler (honest because it's safepoint-free)
profiling  java  safepoints  jvm  flame-graphs  perf  measurement  benchmarking  testing 
14 days ago by jm
Log analyser and visualiser for the HotSpot JIT compiler. Inspect inlining decisions, hot methods, bytecode, and assembly. View results in the JavaFX user interface.
analysis  java  jvm  performance  tools  debugging  optimization  jit 
4 weeks ago by jm
'A Java Virtual Machine written in 100% JavaScript.' Wrapping outbound TCP traffic in websockets, mad stuff
jvm  java  javascript  js  hacks  browser  emulation  websockets 
10 weeks ago by jm
Debugging Java Native Memory Leaks
Using jemalloc to instrument the contents of the native heap and record stack traces of each chunk's allocators, so that leakers can be quickly identified (GZIPInputStream in this case).

debugging  memory  jvm  java  leaks  memory-leaks  leak-checking  jemalloc  malloc  native  heap  off-heap  gzipinputstream 
january 2017 by jm
Squeezing blood from a stone: small-memory JVM techniques for microservice sidecars
Reducing service memory usage from 500MB to 105MB:
We found two specific techniques to be the most beneficial: turning off one of the two JIT compilers enabled by default (the “C2” compiler), and using a 32-bit, rather than a 64-bit, JVM.
32bit  jvm  java  ops  memory  tuning  jit  linkerd 
june 2016 by jm
'centrally-planned object and thread pools' for java.

'In the default JVM thread pools, once a thread is created it will only be retired when it hasn't performed a task in the last minute. In practice, this means that there are as many threads as the peak historical number of concurrent tasks handled by the pool, forever. These thread pools are also poorly instrumented, making it difficult to tune their latency or throughput. Dirigiste provides a fast, richly instrumented version of a java.util.concurrent.ExecutorService, and provides a means to feed that instrumentation into a control mechanism that can grow or shrink the pool as needed. Default implementations that optimize the pool size for thread utilization are provided. It also provides an object pool mechanism that uses a similar feedback mechanism to resize itself, and is significantly simpler than the Apache Commons object pool implementation.'

Great metric support, too.
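Not Dirigiste's API — but the raw signals a control loop like this feeds on are already exposed by the stdlib pool, as a plain `java.util.concurrent` sketch shows:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// A sketch of the instrumentation side only: the signals a feedback
// controller could use to grow or shrink a pool, instead of leaving it
// at its historical peak size forever.
public class PoolStats {
    public static String sample(ThreadPoolExecutor pool) {
        return String.format("active=%d poolSize=%d queued=%d completed=%d",
                pool.getActiveCount(), pool.getPoolSize(),
                pool.getQueue().size(), pool.getCompletedTaskCount());
    }

    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 8, 30L, TimeUnit.SECONDS, new LinkedBlockingQueue<>(100));
        for (int i = 0; i < 10; i++) pool.submit(() -> { });
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(sample(pool));  // e.g. active=0 ... completed=10
    }
}
```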
async  jvm  dirigiste  java  threadpools  concurrency  utilization  capacity  executors  object-pools  object-pooling  latency 
june 2016 by jm
Netty @Apple: Large Scale Deployment/Connectivity [video]
'Norman Maurer presents how Apple uses Netty for its Java based services and the challenges of doing so, including how they enhanced performance by participating in the Netty open source community. Maurer takes a deep dive into advanced topics like JNI, JVM internals, and others.'
apple  netty  norman-maurer  java  jvm  async  talks  presentations 
january 2016 by jm
The Importance of Tuning Your Thread Pools
Excellent blog post on thread pools, backpressure, Little's Law, and other Hystrix-related topics (PS: use Hystrix)
hystrix  threadpools  concurrency  java  jvm  backpressure  littles-law  capacity 
january 2016 by jm
a tool which simplifies tracing and testing of Java programs. Byteman allows you to insert extra Java code into your application, either as it is loaded during JVM startup or even after it has already started running. The injected code is allowed to access any of your data and call any application methods, including where they are private. You can inject code almost anywhere you want and there is no need to prepare the original source code in advance nor do you have to recompile, repackage or redeploy your application. In fact you can remove injected code and reinstall different code while the application continues to execute.

The simplest use of Byteman is to install code which traces what your application is doing. This can be used for monitoring or debugging live deployments as well as for instrumenting code under test so that you can be sure it has operated correctly. By injecting code at very specific locations you can avoid the overheads which often arise when you switch on debug or product trace. Also, you decide what to trace when you run your application rather than when you write it, so you don't need 100% hindsight to be able to obtain the information you need.
tracing  java  byteman  injection  jvm  ops  debugging  testing 
september 2015 by jm
Time on multi-core, multi-socket servers
Nice update on the state of System.currentTimeMillis() and System.nanoTime() in javaland. Bottom line: both are non-monotonic nowadays:
The conclusion I've reached is that except for the special case of using nanoTime() in micro benchmarks, you may as well stick to currentTimeMillis() —knowing that it may sporadically jump forwards or backwards. Because if you switched to nanoTime(), you don't get any monotonicity guarantees, it doesn't relate to human time any more —and may be more likely to lead you into writing code which assumes a fast call with consistent, monotonic results.
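For illustration (not from the post): the one discipline that survives all of this is to treat nanoTime() values as opaque and only ever subtract them, e.g.:

```java
// System.nanoTime() has no relation to wall-clock time; only the
// *difference* between two calls in the same JVM is meaningful.
// currentTimeMillis() tracks human time but can step when the system
// clock is adjusted (e.g. by NTP).
public class Elapsed {
    public static long elapsedNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;  // compare timestamps only by subtraction
    }

    public static void main(String[] args) {
        long ns = elapsedNanos(() -> {
            long s = 0;
            for (int i = 0; i < 1_000_000; i++) s += i;
        });
        System.out.println("elapsed ~" + ns + " ns");
    }
}
```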
java  time  monotonic  sequencing  nanotime  timers  jvm  multicore  distributed-computing 
september 2015 by jm
Preventing Dependency Chain Attacks in Maven
using a whitelist of allowed dependency JARs and their SHAs
security  whitelisting  dependencies  coding  jar  maven  java  jvm 
august 2015 by jm
a command line tool for JVM diagnostic troubleshooting and profiling.
java  jvm  monitoring  commandline  jmx  sjk  tools  ops 
june 2015 by jm
Orbit Async
Orbit Async implements async-await methods in the JVM. It allows programmers to write asynchronous code in a sequential fashion. It was developed by BioWare, a division of Electronic Arts.

Open source, BSD-licensed.
async  await  java  jvm  bioware  coding  threading 
june 2015 by jm
a Java based low latency, high throughput message bus, built on top of a memory mapped file; inspired by Java Chronicle with the main difference that it's designed to efficiently support multiple writers – enabling use cases where the order of messages produced by multiple processes are important. MappedBus can be also described as an efficient IPC mechanism which enable several Java programs to communicate by exchanging messages.
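Not MappedBus itself — a minimal sketch of the mechanism underneath it: two mappings of the same file share backing pages, so a write through one is visible through the other without copying through the kernel:

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.StandardOpenOption;

// Toy demonstration of memory-mapped-file message passing. A real bus
// adds framing, a commit flag per record, and ordered writes on top.
public class MmapDemo {
    public static MappedByteBuffer map(File f, int size) {
        try (FileChannel ch = FileChannel.open(f.toPath(),
                StandardOpenOption.CREATE, StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static File tempFile() {
        try {
            File f = File.createTempFile("bus", ".dat");
            f.deleteOnExit();
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File f = tempFile();
        MappedByteBuffer writer = map(f, 4096);
        MappedByteBuffer reader = map(f, 4096);  // second mapping, same pages
        writer.putLong(0, 42L);
        System.out.println(reader.getLong(0));   // sees the writer's value: 42
    }
}
```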
ipc  java  jvm  mappedbus  low-latency  mmap  message-bus  data-structures  queue  message-passing 
may 2015 by jm
Migration to, Expectations, and Advanced Tuning of G1GC
Bookmarking for future reference. recommended by one of the GC experts, I can't recall exactly who ;)
gc  g1gc  jvm  java  tuning  performance  ops  migration 
may 2015 by jm
"Trash Day: Coordinating Garbage Collection in Distributed Systems"
Another GC-coordination strategy, similar to Blade (qv), with some real-world examples using Cassandra
blade  via:adriancolyer  papers  gc  distsys  algorithms  distributed  java  jvm  latency  spark  cassandra 
may 2015 by jm
Cassandra moving to using G1 as the default recommended GC implementation
This is a big indicator that G1 is ready for primetime. CMS has long been the go-to GC for production usage, but requires careful, complex hand-tuning -- if G1 is getting to a stage where it's just a case of giving it enough RAM, that'd be great.

Also, looks like it'll be the JDK9 default:
cassandra  tuning  ops  g1gc  cms  gc  java  jvm  production  performance  memory 
april 2015 by jm
Optimizing Java CMS garbage collections, its difficulties, and using JTune as a solution | LinkedIn Engineering
I like the sound of this -- automated Java CMS GC tuning, kind of like a free version of JClarity's Censum (via Miguel Ángel Pastor)
java  jvm  tuning  gc  cms  linkedin  performance  ops 
april 2015 by jm
The Four Month Bug: JVM statistics cause garbage collection pauses (
Ugh, tying GC safepoints to disk I/O? bad idea:
The JVM by default exports statistics by mmap-ing a file in /tmp (hsperfdata). On Linux, modifying a mmap-ed file can block until disk I/O completes, which can be hundreds of milliseconds. Since the JVM modifies these statistics during garbage collection and safepoints, this causes pauses that are hundreds of milliseconds long. To reduce worst-case pause latencies, add the -XX:+PerfDisableSharedMem JVM flag to disable this feature. This will break tools that read this file, like jstat.
bugs  gc  java  jvm  disk  mmap  latency  ops  jstat 
march 2015 by jm
Biased Locking in HotSpot (David Dice's Weblog)
This is pretty nuts. If biased locking in the HotSpot JVM is causing performance issues, it can be turned off:
You can avoid biased locking on a per-object basis by calling System.identityHashCode(o). If the object is already biased, assigning an identity hashCode will result in revocation, otherwise, the assignment of a hashCode() will make the object ineligible for subsequent biased locking.
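A sketch of the trick in code — note the bias revocation itself is invisible from pure Java; all you can observe here is the call and that the identity hash, once computed, is fixed for the object's lifetime:

```java
// The identity hash lives in the same object header word that biased
// locking uses, which is why computing it forces bias revocation and
// makes the object ineligible for future biasing.
public class IdentityHash {
    public static int unbias(Object o) {
        return System.identityHashCode(o);  // forces the header hash, revoking any bias
    }

    public static void main(String[] args) {
        Object lock = new Object();
        int h1 = unbias(lock);              // from here on, 'lock' cannot be biased
        synchronized (lock) { /* thin/fat locking, never biased */ }
        System.out.println(h1 == System.identityHashCode(lock));  // true: hash is stable
    }
}
```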
hashcode  jvm  java  biased-locking  locking  mutex  synchronization  locks  performance 
march 2015 by jm
JClarity's Illuminate
Performance-diagnosis-as-a-service. Cool.
Users download and install an Illuminate Daemon using a simple installer which starts up a small stand alone Java process. The Daemon sits quietly unless it is asked to start gathering SLA data and/or to trigger a diagnosis. Users can set SLA’s via the dashboard and can opt to collect latency measurements of their transactions manually (using our library) or by asking Illuminate to automatically instrument their code (Servlet and JDBC based transactions are currently supported).

SLA latency data for transactions is collected on a short cycle. When the moving average of latency measurements goes above the SLA value (e.g. 150ms), a diagnosis is triggered. The diagnosis is very quick, gathering key data from O/S, JVM(s), virtualisation and other areas of the system. The data is then run through the machine learned algorithm which will quickly narrow down the possible causes and gather a little extra data if needed.

Once Illuminate has determined the root cause of the performance problem, the diagnosis report is sent back to the dashboard and an alert is sent to the user. That alert contains a link to the result of the diagnosis which the user can share with colleagues. Illuminate has all sorts of backoff strategies to ensure that users don’t get too many alerts of the same type in rapid succession!
illuminate  jclarity  java  jvm  scala  latency  gc  tuning  performance 
february 2015 by jm
Sysdig Cloud’s JMX Metrics
Sysdig Cloud users have the ability to view and analyze Java Management Extensions (JMX) metrics out of the box with no additional configuration or setup required.
sysdig  jmx  java  jvm 
february 2015 by jm
OpenJDK: jol
'JOL (Java Object Layout) is the tiny toolbox to analyze object layout schemes in JVMs. These tools are using Unsafe, JVMTI, and Serviceability Agent (SA) heavily to decode the actual object layout, footprint, and references. This makes JOL much more accurate than other tools relying on heap dumps, specification assumptions, etc.'

Recommended by Nitsan Wakart, looks pretty useful for JVM devs
java  jvm  tools  scala  memory  estimation  ram  object-layout  debugging  via:nitsan 
february 2015 by jm
TL;DR: Cassandra Java Huge Pages
Al Tobey does some trial runs of -XX:+AlwaysPreTouch and -XX:+UseHugePages
jvm  performance  tuning  huge-pages  vm  ops  cassandra  java 
february 2015 by jm
Maintaining performance in distributed systems [slides]
Great slide deck from Elasticsearch on JVM/dist-sys performance optimization
performance  elasticsearch  java  jvm  ops  tuning 
january 2015 by jm
Java Concurrency Tools for the JVM. This project aims to offer some concurrent data structures currently missing from the JDK:

Bounded lock free queues
SPSC/MPSC/SPMC/MPMC variations for concurrent queues
Alternative interfaces for queues (experimental)
Offheap concurrent ring buffer for ITC/IPC purposes (experimental)
Executor (planned)
concurrency  lock-free  data-structures  queues  jvm  java 
january 2015 by jm
Hack workaround to get JVM thread priorities working on Linux
As used in Cassandra!
if you just set the "ThreadPriorityPolicy" to something else than the legal values 0 or 1, [...] a slight logic bug in Sun's JVM code kicks in, and thus sets the policy to be as if running with root - thus you get exactly what one desires. The operating system, Linux, won't allow priorities to be heightened above "Normal" (negative nice value), and thus just ignores those requests (setting it to normal instead, nice value 0) - but it lets through the requests to set it lower (setting the nice value to some positive value).
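The Java side of this is just the standard API — the hack only changes what the JVM does with it underneath. Without -XX:ThreadPriorityPolicy set (or set via the trick above), setPriority() is effectively a no-op at the Linux scheduler level:

```java
// Thread.setPriority() maps to nice values on Linux only when the JVM is
// started with the appropriate -XX:ThreadPriorityPolicy setting; the Java
// API itself always records the requested priority regardless.
public class Priorities {
    public static Thread lowPriority(Runnable r) {
        Thread t = new Thread(r, "background-work");
        t.setPriority(Thread.MIN_PRIORITY);  // request a *lower* priority: Linux allows this
        return t;
    }

    public static void main(String[] args) throws Exception {
        Thread t = lowPriority(() -> System.out.println(
                "running at priority " + Thread.currentThread().getPriority()));
        t.start();
        t.join();
    }
}
```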
cassandra  thread-priorities  threads  java  jvm  linux  nice  hacks 
january 2015 by jm
ExecutorService - 10 tips and tricks
Excellent advice from Tomasz Nurkiewicz' blog for anyone using java.util.concurrent.ExecutorService regularly. The whole blog is full of great posts btw
concurrency  java  jvm  threading  threads  executors  coding 
november 2014 by jm
An embryonic metrics library for Java/Scala from Felix GV at LinkedIn, extracted from Kafka's metric implementation and used in the new Voldemort release. It fixes the major known problems with the Meter/Timer implementations in Coda-Hale/Dropwizard/Yammer Metrics.

'Regarding Tehuti: it has been extracted from Kafka's metric implementation. The code was originally written by Jay Kreps, and then maintained and improved by some Kafka and Voldemort devs, so it definitely is not the work of just one person. It is in my repo at the moment but I'd like to put it in a more generally available (git and maven) repo in the future. I just haven't had the time yet...

As for comparing with CodaHale/Yammer, there were a few concerns with it, but the main one was that we didn't like the exponentially decaying histogram implementation. While that implementation is very appealing in terms of (low) memory usage, it has several misleading characteristics (a lack of incoming data points makes old measurements linger longer than they should, and there's also a fairly high possibility of losing interesting outlier data points). This makes the exponentially decaying implementation robust in high-throughput, fairly constant workloads, but unreliable in sparse or spiky workloads. The Tehuti implementation provides semantics that we find easier to reason with and with a small code footprint (which we consider a plus in terms of maintainability). Of course, it is still a fairly young project, so it could be improved further.'

More background at the kafka-dev thread:
kafka  metrics  dropwizard  java  scala  jvm  timers  ewma  statistics  measurement  latency  sampling  tehuti  voldemort  linkedin  jay-kreps 
october 2014 by jm
Troubleshooting Production JVMs with jcmd
remotely trigger GCs, finalization, heap dumps etc. Handy
jvm  jcmd  debugging  ops  java  gc  heap  troubleshooting 
september 2014 by jm
Java tip: optimizing memory consumption
Good tips on how to tell if object allocation rate is a bottleneck in your JVM-based code
yourkit  memory  java  jvm  allocation  gc  bottlenecks  performance 
august 2014 by jm
OpenHFT/hftc · GitHub
This is a yet another Java collections library of primitive specializations. Java 6+. Apache 2.0 license. Currently only hash sets and hash maps are implemented.
openhft  performance  java  jvm  collections  asl  hashsets  hashmaps  data-structures 
august 2014 by jm
Heapster provides an agent library to do heap profiling for JVM processes with output compatible with Google perftools. The goal of Heapster is to be able to do meaningful (sampled) heap profiling in a production setting.

Used by Twitter in production, apparently.
heap  monitoring  memory  jvm  java  performance 
july 2014 by jm
Dynamic Tuple Performance On the JVM
More JVM off-heap storage from Boundary:
generates heterogeneous collections of primitive values and ensures as best it can that they will be laid out adjacently in memory. The individual values in the tuple can either be accessed from a statically bound interface, via an indexed accessor, or via reflective or other dynamic invocation techniques. FastTuple is designed to deal with a large number of tuples therefore it will also attempt to pool tuples such that they do not add significantly to the GC load of a system. FastTuple is also capable of allocating the tuple value storage entirely off-heap, using Java’s direct memory capabilities.
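Not FastTuple's API — a toy sketch of the underlying idea only: lay fixed-width tuples out adjacently in a direct (off-heap) buffer, so the values neither fragment the heap nor add to GC load, and indexed access is simple arithmetic:

```java
import java.nio.ByteBuffer;

// Each "tuple" is FIELDS adjacent longs in one direct buffer; tuple i,
// field j lives at byte offset i*TUPLE_BYTES + j*8. No per-tuple objects,
// so no per-tuple GC cost.
public class OffHeapTuples {
    static final int FIELDS = 3, TUPLE_BYTES = FIELDS * Long.BYTES;
    private final ByteBuffer buf;

    public OffHeapTuples(int capacity) {
        buf = ByteBuffer.allocateDirect(capacity * TUPLE_BYTES);  // off the Java heap
    }

    public void set(int tuple, int field, long v) {
        buf.putLong(tuple * TUPLE_BYTES + field * Long.BYTES, v);
    }

    public long get(int tuple, int field) {
        return buf.getLong(tuple * TUPLE_BYTES + field * Long.BYTES);
    }

    public static void main(String[] args) {
        OffHeapTuples t = new OffHeapTuples(1_000);
        t.set(42, 2, 123L);
        System.out.println(t.get(42, 2));  // 123
    }
}
```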
jvm  java  gc  off-heap  storage  boundary  memory 
may 2014 by jm
storage of structured data in a continuous block of memory. The memory can be allocated on the heap using a byte[] array or can be allocated off the java heap in native memory. [...] Use cases: store/cache huge amounts of data records without impact on GC duration; high performance data transfer in a cluster or in between processes

handy OSS from Ruediger Moeller
structs  java  jvm  memory  off-heap  storage  reference 
may 2014 by jm
Exceptional Performance
Good benchmark data on the performance of JVM exceptions
java  jvm  exceptions  benchmarking  performance  optimization  coding 
may 2014 by jm
Alexey Shipilev on Java's System.nanoTime()
System.nanoTime is as bad as String.intern now: you can use it, but use it wisely. The latency, granularity, and scalability effects introduced by timers may and will affect your measurements if done without proper rigor. This is one of the many reasons why System.nanoTime should be abstracted from the users by benchmarking frameworks, monitoring tools, profilers, and other tools written by people who have time to track if the underlying platform is capable of doing what we want it to do.

In some cases, there is no good solution to the problem at hand. Some things are not directly measurable. Some things are measurable with unpractical overheads. Internalize that fact, weep a little, and move on to building the indirect experiments. This is not the Wonderland, Alice. Understanding how the Universe works often needs side routes to explore.

In all seriousness, we should be happy our $1000 hardware can measure 30 nanosecond intervals pretty reliably. This is roughly the time needed for the Internet packets originating from my home router to leave my apartment. What else do you want, you spoiled brats?
benchmarking  jdk  java  measurement  nanoseconds  nsecs  nanotime  jvm  alexey-shipilev  jmh 
may 2014 by jm
Sirius by Comcast
At Comcast, our applications need convenient, low-latency access to important reference datasets. For example, our XfinityTV websites and apps need to use entertainment-related data to serve almost every API or web request to our datacenters: information like what year Casablanca was released, or how many episodes were in Season 7 of Seinfeld, or when the next episode of the Voice will be airing (and on which channel!).

We traditionally managed this information with a combination of relational databases and RESTful web services but yearned for something simpler than the ORM, HTTP client, and cache management code our developers dealt with on a daily basis. As main memory sizes on commodity servers continued to grow, however, we asked ourselves: How can we keep this reference data entirely in RAM, while ensuring it gets updated as needed and is easily accessible to application developers?

The Sirius distributed system library is our answer to that question, and we're happy to announce that we've made it available as an open source project. Sirius is written in Scala and uses the Akka actor system under the covers, but is easily usable by any JVM-based language.

Also includes a Paxos implementation with "fast follower" read-only slave replication. ASL2-licensed open source.

The only thing I can spot to be worried about is speed of startup; they note that apps need to replay a log at startup to rebuild state, which can be slow if unoptimized in my experience.

Update: in a Twitter conversation, Jon Moore indicated they haven't had problems with this even with 'datasets consuming 10-20GB of heap', and have 'benchmarked a 5-node Sirius ingest cluster up to 1k updates/sec write throughput.' That's pretty solid!
open-source  comcast  paxos  replication  read-only  datastores  storage  memory  memcached  redis  sirius  scala  akka  jvm  libraries 
april 2014 by jm
Garbage Collection Optimization for High-Throughput and Low-Latency Java Applications
LinkedIn talk about the GC opts they used to optimize the Feed. good detail
performance  optimization  linkedin  java  jvm  gc  tuning 
april 2014 by jm
Impact of large primitive arrays (BLOBS) on JVM Garbage Collection
some nice graphs and data on CMS performance, with/without -XX:ParGCCardsPerStrideChunk
cms  java  jvm  performance  optimization  tuning  off-heap-storage  memory 
march 2014 by jm
Little’s Law, Scalability and Fault Tolerance: The OS is your bottleneck. What you can do?
good blog post on Little's Law, plugging quasar, pulsar, and comsat, 3 new open-source libs offering Erlang-like lightweight threads on the JVM
jvm  java  quasar  pulsar  comsat  littles-law  scalability  async  erlang 
february 2014 by jm
Cassandra: tuning the JVM for read heavy workloads
The cluster we tuned is hosted on AWS and is comprised of 6 hi1.4xlarge EC2 instances, with 2 1TB SSDs raided together in a raid 0 configuration. The cluster’s dataset is growing steadily. At the time of this writing, our dataset is 341GB, up from less than 200GB a few months ago, and is growing by 2-3GB per day. The workload on this cluster is very read heavy, with quorum reads making up 99% of all operations.

Some careful GC tuning here. Probably not applicable to anyone else, but good approach in general.
java  performance  jvm  scaling  gc  tuning  cassandra  ops 
january 2014 by jm
Safe cross-thread publication of a non-final variable in the JVM
Scary, but potentially useful in future, so worth bookmarking. By carefully orchestrating memory accesses using volatile and non-volatile fields, one can ensure that a non-volatile, non-synchronized field's value is safely visible to all threads after that point due to JMM barrier semantics.

What you are looking to do is enforce a barrier between your initializing stores and your publishing store, without that publishing store being made to a volatile field. This can be done by using volatile access to other fields in the publication path, without using those variables in the later access paths to the published object.
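A sketch of the textbook form of the pattern (the post's variant is subtler, avoiding even the volatile publishing store; this shows the happens-before edge it relies on):

```java
// The volatile write to 'ready' after the plain writes, paired with a
// volatile read of 'ready' before the plain reads, creates the JMM
// happens-before edge that makes the non-volatile fields safely visible.
public class Publisher {
    static int a, b;                 // plain, non-volatile fields
    static volatile boolean ready;   // the barrier-carrying field

    static void publish() {
        a = 1;                       // plain stores...
        b = 2;
        ready = true;                // ...made visible by this volatile store
    }

    static boolean consume() {
        if (!ready) return false;    // volatile load first...
        return a == 1 && b == 2;     // ...then plain loads are guaranteed current
    }

    public static void main(String[] args) throws Exception {
        Thread t = new Thread(Publisher::publish);
        t.start();
        t.join();
        System.out.println(consume());  // true
    }
}
```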
volatile  atomic  java  jvm  gil-tene  synchronization  performance  threading  jmm  memory-barriers 
january 2014 by jm
examining the Hardware Performance Counters
using the overseer library and libpfm, it's possible for a JVM app to record metrics about L2/DRAM cache hit rates and latency
metrics  hpc  libpfm  java  jvm  via:normanmaurer  l2  dram  llc  cpu 
december 2013 by jm
Twitter tech talk video: "Profiling Java In Production"
In this talk Kaushik Srenevasan describes a new, low overhead, full-stack tool (based on the Linux perf profiler and infrastructure built into the Hotspot JVM) we've built at Twitter to solve the problem of dynamically profiling and tracing the behavior of applications (including managed runtimes) in production.

Looks very interesting. Haven't watched it yet though
twitter  tech-talks  video  presentations  java  jvm  profiling  testing  monitoring  service-metrics  performance  production  hotspot  perf 
december 2013 by jm
A JVM Implementation of the Raft Consensus Protocol
Looks very alpha, but one to watch.
via:sbtourist  raft  jvm  java  consensus  distributed-computing 
november 2013 by jm
Reactor hits GA
'It can't just be Big Data, it has to be Fast Data: Reactor 1.0 goes GA':

Reactor provides the necessary abstractions to build high-throughput, low-latency--what we now call "fast data"--applications that absolutely must work with thousands, tens of thousands, or even millions of concurrent requests per second. Modern JVM applications must be built on a solid foundation of asynchronous and reactive components that efficiently manage the execution of a very large number of tasks on a very small number of system threads. Reactor is specifically designed to help you build these kinds of applications without getting in your way or forcing you to work within an opinionated pattern.

Featuring the LMAX Disruptor ringbuffer, the JavaChronicle fast persistent message-passing queue, Groovy closures, and Netty 4.0. This looks very handy indeed....
disruptor  reactive-programming  reactor  async  libraries  java  jvm  frameworks  spring  netty  fast-data 
november 2013 by jm
interesting new data structure, pending addition in Java 8. Basically an array of arrays which presents the API of a single List.
An ordered collection of elements. Elements can be added, but not removed. Goes through a building phase, during which elements can be added, and a traversal phase, during which elements can be traversed in order but no further modifications are possible.
spinedbuffer  data-structures  algorithms  java  jdk  jvm  java-8  arrays  lists 
october 2013 by jm
What drives JVM full GC duration
Interesting empirical results using JDK 7u21:
Full GC duration depends on the number of objects allocated and the locality of their references. It does not depend that much on actual heap size.

Reference locality has a surprisingly high effect.
java  jvm  data  gc  tuning  performance  cms  g1 
october 2013 by jm
SPSC revisited part III - FastFlow + Sparse Data
holy moly. This is some heavily-optimized mechanical-sympathy Java code. By using a sparse data structure, cache-aligned fields, and wait-free low-level CAS concurrency primitives via sun.misc.Unsafe, a single-producer/single-consumer queue implementation goes pretty damn fast compared to the current state of the art
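For contrast, here's the plain, un-optimized version of the same structure — correct with exactly one producer thread and one consumer thread, but with none of the sparse layout, padding, or Unsafe tricks the post benchmarks:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Baseline SPSC ring buffer. offer() must only be called from the producer
// thread and poll() only from the consumer thread; the ordered (lazySet)
// index publication makes the element visible before the index moves.
public class SpscQueue<E> {
    private final AtomicReferenceArray<E> buffer;
    private final int mask;
    private final AtomicLong head = new AtomicLong();  // consumer index
    private final AtomicLong tail = new AtomicLong();  // producer index

    public SpscQueue(int capacityPow2) {               // capacity must be a power of two
        buffer = new AtomicReferenceArray<>(capacityPow2);
        mask = capacityPow2 - 1;
    }

    public boolean offer(E e) {
        long t = tail.get();
        if (t - head.get() == buffer.length()) return false;  // full
        buffer.set((int) (t & mask), e);
        tail.lazySet(t + 1);                            // publish after the store
        return true;
    }

    public E poll() {
        long h = head.get();
        if (h == tail.get()) return null;               // empty
        E e = buffer.get((int) (h & mask));
        buffer.set((int) (h & mask), null);             // allow GC of the slot
        head.lazySet(h + 1);
        return e;
    }

    public static void main(String[] args) {
        SpscQueue<String> q = new SpscQueue<>(8);
        q.offer("a");
        q.offer("b");
        System.out.println(q.poll() + q.poll());        // ab
    }
}
```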
nitsanw  optimization  concurrency  java  jvm  cas  spsc  queues  data-structures  algorithms 
october 2013 by jm
Down the Rabbit Hole
An adventure that takes you through several popular Java language features and shows how they compile to bytecode and eventually JIT to assembly code.
charles-nutter  java  jvm  compilation  reversing  talks  slides 
october 2013 by jm
Creating Flight Recordings
lots more detail on the new "Java Mission Control" feature in Hotspot 7u40 JVMs, and how to use it to start and stop profiling in a live, production JVM from a separate "jcmd" command-line client. If the overhead is small, this could be really neat -- turn on profiling for 1 minute every hour on a single instance, and collect realtime production profile data on an automated basis for post-facto analysis if required
instrumentation  logging  profiling  java  jvm  ops 
september 2013 by jm
Low Overhead Method Profiling with Java Mission Control now enabled in the most recent HotSpot JVM release
Built into the HotSpot JVM [in JDK version 7u40] is something called the Java Flight Recorder. It records a lot of information about/from the JVM runtime, and can be thought of as similar to the Data Flight Recorders you find in modern airplanes. You normally use the Flight Recorder to find out what was happening in your JVM when something went wrong, but it is also a pretty awesome tool for production time profiling. Since Mission Control (using the default templates) normally don’t cause more than a per cent overhead, you can use it on your production server.

I'm intrigued by the idea of always-on profiling in production. This could be cool.
performance  java  measurement  profiling  jvm  jdk  hotspot  mission-control  instrumentation  telemetry  metrics 
september 2013 by jm
modern JVM concurrency primitives are broken if the system clock steps backwards
'The implementation of the concurrency primitive LockSupport.parkNanos(), the function that controls *every* concurrency primitive on the JVM, is flawed, and any NTP sync, or system time change, can potentially break it with unexpected results across the board when running a 64bit JVM on Linux 64bit.'

Basically, LockSupport.parkNanos() calls pthread_cond_timedwait() using a CLOCK_REALTIME instead of CLOCK_MONOTONIC. 'tinker step 0' in ntp.conf may be a viable workaround.
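For illustration: parkNanos() is specified as a *relative* timed wait, so measuring it against the monotonic nanoTime() clock is the honest way to observe it — the bug above is precisely that HotSpot implemented the wait internally against the steppable wall clock:

```java
import java.util.concurrent.locks.LockSupport;

// Measure a parkNanos() call with the monotonic clock. Note parkNanos()
// may also return early ("spuriously", or via unpark/interrupt), so the
// elapsed time is not guaranteed to reach the requested duration.
public class ParkDemo {
    public static long parkFor(long nanos) {
        long start = System.nanoTime();
        LockSupport.parkNanos(nanos);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        System.out.println("parked ~" + parkFor(5_000_000L) + " ns");
    }
}
```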
clocks  timing  ntp  slew  sync  step  pthreads  java  jvm  timers  clock_realtime  clock_monotonic 
september 2013 by jm
Voldemort on Solid State Drives [paper]
'This paper and talk was given by the LinkedIn Voldemort Team at the Workshop on Big Data Benchmarking (WBDB May 2012).'

With SSD, we find that garbage collection will become a very significant bottleneck, especially for systems which have little control over the storage layer and rely on Java memory management. Big heapsizes make the cost of garbage collection expensive, especially the single threaded CMS Initial mark. We believe that data systems must revisit their caching strategies with SSDs. In this regard, SSD has provided an efficient solution for handling fragmentation and moving towards predictable multitenancy.
voldemort  storage  ssd  disk  linkedin  big-data  jvm  tuning  ops  gc 
september 2013 by jm
Interview with the Github Elasticsearch Team
good background on Github's Elasticsearch scaling efforts. Some rather horrific split-brain problems under load, and crashes due to OpenJDK bugs (sounds like OpenJDK *still* isn't ready for production). painful
elasticsearch  github  search  ops  scaling  split-brain  outages  openjdk  java  jdk  jvm 
september 2013 by jm
New Tweets per second record, and how | Twitter Blog
How Twitter scaled up massively in 3 years -- replacing Ruby with the JVM, adopting SOA and custom sharding. Good summary post, looking forward to more techie details soon
twitter  performance  scalability  jvm  ruby  soa  scaling 
august 2013 by jm
[JVM] GC is a difficult, specialised area that can be very frustrating for busy developers or devops folks to deal with. The JVM has a number of Garbage Collectors and a bewildering array of switches that can alter the behaviour of each collector. Censum does all of the parsing, number crunching and statistical analysis for you, so you don't have to go and get that PhD in Computer Science in order to solve your GC performance problem. Censum gives you straight answers as opposed to a ton of raw data; it can eat any GC log you care to throw at it, and is easy to install and use.

Commercial software, UKP 495 per license.
censum  gc  tuning  ops  java  jvm  commercial 
july 2013 by jm
Tuning and benchmarking Java 7's Garbage Collectors: Default, CMS and G1
Rudiger Moller runs through a typical GC-tuning session, in exhaustive detail
java  gc  tuning  jvm  cms  g1  ops 
july 2013 by jm
Java Garbage Collection Distilled
a great summary of the state of JVM garbage collection from Martin Thompson
jvm  java  gc  garbage-collection  tuning  memory  performance  martin-thompson 
july 2013 by jm
Java Concurrent Counters By Numbers
threadsafe counters in the JVM compared: AtomicLong, Doug Lea's LongAdder, a ThreadLocal counter, and a field-on-the-Thread-object counter (via Darach Ennis). Nitsan's posts on concurrency are fantastic
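A quick sketch of two of the approaches compared in the post — `AtomicLong` (all threads CAS on one cache line) versus Java 8's `LongAdder` (striped cells, summed on read) — illustrative only, not Nitsan's benchmark harness:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

public class Counters {
    public static void main(String[] args) throws InterruptedException {
        AtomicLong atomic = new AtomicLong();
        LongAdder adder = new LongAdder();

        // Under contention, AtomicLong CAS-loops on a single cache line;
        // LongAdder spreads increments across cells and sums them on read.
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                atomic.incrementAndGet();
                adder.increment();
            }
        };
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) threads[i] = new Thread(work);
        for (Thread t : threads) t.start();
        for (Thread t : threads) t.join();

        System.out.println(atomic.get());  // 400000
        System.out.println(adder.sum());   // 400000
    }
}
```

Both arrive at the same total; the difference the post measures is the cost of getting there under write contention.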
counters  concurrency  threads  java  jvm  atomic 
june 2013 by jm
Java Garbage Collection Distilled
Martin Thompson lays it out:
Serial, Parallel, Concurrent, CMS, G1, Young Gen, New Gen, Old Gen, Perm Gen, Eden, Tenured, Survivor Spaces, Safepoints, and the hundreds of JVM start-up flags. Does this all baffle you when trying to tune the garbage collector while trying to get the required throughput and latency from your Java application? If it does then don’t worry, you are not alone. Documentation describing garbage collection feels like man pages for an aircraft. Every knob and dial is detailed and explained but nowhere can you find a guide on how to fly. This article will attempt to explain the tradeoffs when choosing and tuning garbage collection algorithms for a particular workload.
gc  java  garbage-collection  coding  cms  g1  jvm  optimization 
june 2013 by jm
Functional Reactive Programming in the Netflix API with RxJava
Hmm, this seems nifty as a compositional building block for Java code to enable concurrency without thread-safety and sync problems.
Functional reactive programming offers efficient execution and composition by providing a collection of operators capable of filtering, selecting, transforming, combining and composing Observables.

The Observable data type can be thought of as a "push" equivalent to Iterable which is "pull". With an Iterable, the consumer pulls values from the producer and the thread blocks until those values arrive. By contrast with the Observable type, the producer pushes values to the consumer whenever values are available. This approach is more flexible, because values can arrive synchronously or asynchronously.
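The push/pull distinction can be sketched without the library — a toy "observable" alongside a plain Iterable (names here are illustrative, not the rx.Observable API, which adds operators, subscriptions, error/completion signals, and schedulers):

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class PushPull {
    // Pull: the consumer drives iteration, blocking until each value is ready.
    static long sumPull(Iterable<Integer> source) {
        long sum = 0;
        for (int v : source) sum += v;  // consumer pulls each value
        return sum;
    }

    // Push: the producer decides when to emit values to the callback.
    interface MiniObservable<T> {
        void subscribe(Consumer<T> onNext);
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3);

        // Pull model: blocks until values arrive
        System.out.println(sumPull(data));  // 6

        // Push model: values arrive via callback, synchronously or async
        MiniObservable<Integer> obs = onNext -> data.forEach(onNext);
        long[] sum = {0};
        obs.subscribe(v -> sum[0] += v);
        System.out.println(sum[0]);  // 6
    }
}
```

The push shape is what lets Rx swap a synchronous producer for an asynchronous one without changing consumer code.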
concurrency  java  jvm  threads  thread-safety  coding  rx  frp  fp  functional-programming  reactive  functional  async  observable 
april 2013 by jm
The useful JVM options
a good reference, with lots of sample output. Not clear if it takes 1.6/1.7 differences into account, though
jvm  reference  java  ops  hotspot  command-line 
april 2013 by jm
dumping a JVM heap using gdb
now this is a neat trick -- having been stuck flipping to spares and doing other antics while a long-running heap dump took place, this is a winner.
Dumping a JVM’s heap is an extremely useful tool for debugging problems with a J2EE application. Unfortunately, when a JVM explodes, using the standard jmap tool can take an inordinate amount of time to execute for lots of different reasons. This leads to extended downtime when a heap dump is attempted and even then, jmap regularly fails.
This blog post is intended to outline an alternate method using [gdb] to achieve a heap dump that only requires mere seconds of additional downtime allowing the slow jmap process to happen once the application is back in service.
heap-dump  gdb  heap  jvm  java  via:peakscale  gcore  core  core-dump  debugging 
march 2013 by jm
Single Producer/Consumer lock free Queue step by step
great dissection of Martin "Disruptor" Thompson's lock-free single-producer/single-consumer queue data structure, with benchmark results showing crazy speedups. This is particularly useful since it's a data structure that can be used to provide good lock-free speedups without adopting the entire Disruptor design pattern.
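The core trick — a Lamport-style ring buffer where only the producer writes `tail` and only the consumer writes `head` — can be sketched like this (illustrative; the post's actual variants add cache-line padding against false sharing and cache the counters to cut volatile traffic):

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal single-producer/single-consumer lock-free queue sketch.
public class SpscQueue<E> {
    private final E[] buffer;
    private final int mask;                          // capacity must be 2^n
    private final AtomicLong head = new AtomicLong(); // next slot to read
    private final AtomicLong tail = new AtomicLong(); // next slot to write

    @SuppressWarnings("unchecked")
    public SpscQueue(int capacityPow2) {
        buffer = (E[]) new Object[capacityPow2];
        mask = capacityPow2 - 1;
    }

    public boolean offer(E e) {            // called only by the producer
        long t = tail.get();
        if (t - head.get() == buffer.length) return false; // full
        buffer[(int) (t & mask)] = e;
        tail.lazySet(t + 1);               // ordered store publishes the slot
        return true;
    }

    public E poll() {                      // called only by the consumer
        long h = head.get();
        if (h == tail.get()) return null;  // empty
        int idx = (int) (h & mask);
        E e = buffer[idx];
        buffer[idx] = null;                // let the GC reclaim the entry
        head.lazySet(h + 1);
        return e;
    }

    public static void main(String[] args) {
        SpscQueue<Integer> q = new SpscQueue<>(1024);
        q.offer(42);
        System.out.println(q.poll()); // 42
    }
}
```

Because each index has exactly one writer, no CAS is needed — `lazySet` (an ordered store) is enough to publish each slot, which is where the benchmarked speedups come from.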
disruptor  coding  java  jvm  martin-thompson  lock-free  volatile  atomic  queue  data-structures 
march 2013 by jm
From a monolithic Ruby on Rails app to the JVM
How Soundcloud have ditched the monolithic Rails for nimbler, small-scale distributed polyglot services running on the JVM
soundcloud  rails  slides  jvm  scalability  ruby  scala  clojure  coding 
march 2013 by jm
Are volatile reads really free?
Marc Brooker with some good test data:
It appears as though reads to volatile variables are not free in Java on x86, or at least on the tested setup. It's true that the difference isn't so huge (especially for the read-only case) that it'll make a difference in any but the more performance sensitive case, but that's a different statement from free.
volatile  concurrency  jvm  performance  java  marc-brooker 
february 2013 by jm
Extreme Performance with Java - Charlie Hunt [slides, PDF]
presentation slides for Charlie Hunt's 2012 QCon presentation, where he discusses 'what you need to know about a modern JVM in order to be effective at writing a low latency Java application'. The talk video is at
low-latency  charlie-hunt  performance  java  jvm  presentations  qcon  slides  pdf 
january 2013 by jm
Cliff Click in "A JVM Does What?"
interesting YouTubed presentation from Azul's Cliff Click on some java/JVM innards
presentation  concurrency  jvm  video  java  youtube  cliff-click 
december 2012 by jm
'This project aims at creating a simple efficient building block for "Big Data" libraries, applications and frameworks; thing that can be used as an in-memory, bounded queue with opaque values (sequences of JDK primitive values): insertions at tail, removals from head, single-entry peeks; and that has minimal garbage collection overhead. Insertions and removals are of individual entries, which are sub-sequences of the full buffer.

GC overhead minimization is achieved by use of direct ByteBuffers (memory allocated outside of the GC-prone heap); and bounded nature by only supporting storage of simple primitive value (byte, long) sequences where size is explicitly known.

Conceptually memory buffers are just simple circular buffers (ring buffers) that hold a sequence of primitive values, a bit like arrays, but in a way that allows dynamic automatic resizing of the underlying storage. The library supports efficient reusing and sharing of underlying segments for sets of buffers, although for many use cases a single buffer suffices.'
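The GC-avoidance mechanism here is just direct ByteBuffers; a minimal illustration of the idea (not the project's API, which wraps this in resizable, segment-sharing circular buffers):

```java
import java.nio.ByteBuffer;

public class OffHeapRing {
    public static void main(String[] args) {
        // allocateDirect places the backing memory outside the GC-managed
        // heap, so the collector never scans or copies its contents.
        ByteBuffer buf = ByteBuffer.allocateDirect(64);

        // append a couple of long entries at the tail...
        buf.putLong(42L);
        buf.putLong(7L);

        // ...then flip to read them back from the head
        buf.flip();
        System.out.println(buf.getLong()); // 42
        System.out.println(buf.getLong()); // 7
    }
}
```

The data never becomes Java objects, which is exactly why storing opaque primitive sequences adds essentially no GC pressure.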
gc  java  jvm  bytebuffer 
december 2012 by jm