jm + java   137

Java tip: optimizing memory consumption
Good tips on how to tell if object allocation rate is a bottleneck in your JVM-based code
yourkit  memory  java  jvm  allocation  gc  bottlenecks  performance 
13 days ago by jm
OpenHFT/hftc · GitHub
This is a yet another Java collections library of primitive specializations. Java 6+. Apache 2.0 license. Currently only hash sets and hash maps are implemented.
openhft  performance  java  jvm  collections  asl  hashsets  hashmaps  data-structures 
28 days ago by jm
MinHash for dummies
A Java-oriented practical intro to the MinHash duplicate-detection shingling algo
shingling  algorithms  minhash  hashing  duplicates  duplicate-detection  fuzzy-matching  java 
28 days ago by jm
How to take over the computer of any JVM developer
To prove how easy [MITM attacking Mavencentral JARs] is to do, I wrote dilettante, a man-in-the-middle proxy that intercepts JARs from maven central and injects malicious code into them. Proxying HTTP traffic through dilettante will backdoor any JARs downloaded from maven central. The backdoored version will retain their functionality, but display a nice message to the user when they use the library.
jars  dependencies  java  build  clojure  security  mitm  http  proxies  backdoors  scala  maven  gradle 
5 weeks ago by jm
Collection Pipeline
a nice summarisation of the state of pipe/stream-oriented collection operations in various languages, from Martin Fowler
martin-fowler  patterns  coding  ruby  clojure  streams  pipelines  pipes  unix  lambda  fp  java  languages 
5 weeks ago by jm
mariusaeriksen/heapster
Heapster provides an agent library to do heap profiling for JVM processes with output compatible with Google perftools. The goal of Heapster is to be able to do meaningful (sampled) heap profiling in a production setting.


Used by Twitter in production, apparently.
heap  monitoring  memory  jvm  java  performance 
5 weeks ago by jm
Netflix/ribbon
a client side IPC library that is battle-tested in cloud. It provides the following features:

Load balancing;
Fault tolerance;
Multiple protocol (HTTP, TCP, UDP) support in an asynchronous and reactive model;
Caching and batching.

I like the integration of Eureka and Hystrix in particular, although I would really like to read more about Eureka's approach to availability during network partitions and CAP.

https://groups.google.com/d/msg/eureka_netflix/LXKWoD14RFY/-5nElGl1OQ0J has some interesting discussion on the topic. It actually sounds like the Eureka approach is more correct than using ZK: 'Eureka is available. ZooKeeper, while tolerant against single node failures, doesn't react well to long partitioning events. For us, it's vastly more important that we maintain an available registry than a necessary consistent registry. If us-east-1d sees 23 nodes, and us-east-1c sees 22 nodes for a little bit, that's OK with us.'

See also http://ispyker.blogspot.ie/2013/12/zookeeper-as-cloud-native-service.html which corroborates this:

I went into one of the instances and quickly did an iptables DROP on all packets coming from the other two instances. This would simulate an availability zone continuing to function, but that zone losing network connectivity to the other availability zones. What I saw was that the two other instances noticed that the first server “going away”, but they continued to function as they still saw a majority (66%). More interestingly the first instance noticed the other two servers “going away” dropping the ensemble availability to 33%. This caused the first server to stop serving requests to clients (not only writes, but also reads). [...]

To me this seems like a concern, as network partitions should be considered an event that should be survived. In this case (with this specific configuration of zookeeper) no new clients in that availability zone would be able to register themselves with consumers within the same availability zone. Adding more zookeeper instances to the ensemble wouldn’t help considering a balanced deployment as in this case the availability would always be majority (66%) and non-majority (33%).
netflix  ribbon  availability  libraries  java  hystrix  eureka  aws  ec2  load-balancing  networking  http  tcp  architecture  clients  ipc 
7 weeks ago by jm
ByteArrayOutputStream is really, really slow sometimes in JDK6
This leads us to the bug. The size of the array is determined by Math.max(buf.length << 1, newcount). Ordinarily, buf.length << 1 returns double buf.length, which would always be much larger than newcount for a 2 byte write. Why was it not? The problem is that for all integers larger than Integer.MAX_INTEGER / 2, shifting left by one place causes overflow, setting the sign bit. The result is a negative integer, which is always less than newcount. So for all byte arrays larger than 1073741824 bytes (i.e. one GB), any write will cause the array to resize, and only to exactly the size required.


Ouch.
bugs  java  jdk6  bytearrayoutputstream  impala  performance  overflow 
9 weeks ago by jm
AWS SDK for Java Client Configuration
turns out the AWS SDK has lots of tuning knobs: region selection, socket buffer sizes, and debug logging (including wire logging).
aws  sdk  java  logging  ec2  s3  dynamodb  sockets  tuning 
11 weeks ago by jm
FlatBuffers: Main Page
A new serialization format from Google's Android gaming team, supporting C++ and Java, open source under the ASL v2. Reasons to use it:
Access to serialized data without parsing/unpacking - What sets FlatBuffers apart is that it represents hierarchical data in a flat binary buffer in such a way that it can still be accessed directly without parsing/unpacking, while also still supporting data structure evolution (forwards/backwards compatibility).
Memory efficiency and speed - The only memory needed to access your data is that of the buffer. It requires 0 additional allocations. FlatBuffers is also very suitable for use with mmap (or streaming), requiring only part of the buffer to be in memory. Access is close to the speed of raw struct access with only one extra indirection (a kind of vtable) to allow for format evolution and optional fields. It is aimed at projects where spending time and space (many memory allocations) to be able to access or construct serialized data is undesirable, such as in games or any other performance sensitive applications. See the benchmarks for details.
Flexible - Optional fields means not only do you get great forwards and backwards compatibility (increasingly important for long-lived games: don't have to update all data with each new version!). It also means you have a lot of choice in what data you write and what data you don't, and how you design data structures.
Tiny code footprint - Small amounts of generated code, and just a single small header as the minimum dependency, which is very easy to integrate. Again, see the benchmark section for details.
Strongly typed - Errors happen at compile time rather than manually having to write repetitive and error prone run-time checks. Useful code can be generated for you.
Convenient to use - Generated C++ code allows for terse access & construction code. Then there's optional functionality for parsing schemas and JSON-like text representations at runtime efficiently if needed (faster and more memory efficient than other JSON parsers).


Looks nice, but it misses the language coverage of protobuf. Definitely more practical than capnproto.
c++  google  java  serialization  json  formats  protobuf  capnproto  storage  flatbuffers 
11 weeks ago by jm
Plumbr.eu's reference page for java.lang.OutOfMemoryError
With examples of each possible cause of a Java OOM, and suggested workarounds. succinct
reference  plumbr.eu  oom  java  memory  heap  ops 
11 weeks ago by jm
Monitoring Reactive Applications with Kamon
"quality monitoring tools for apps built in Akka, Spray and Play!". Uses Gil Tene's HDRHistogram and dropwizard Metrics under the hood.
metrics  dropwizard  hdrhistogram  gil-tene  kamon  akka  spray  play  reactive  statistics  java  scala  percentiles  latency 
may 2014 by jm
Dynamic Tuple Performance On the JVM
More JVM off-heap storage from Boundary:
generates heterogeneous collections of primitive values and ensures as best it can that they will be laid out adjacently in memory. The individual values in the tuple can either be accessed from a statically bound interface, via an indexed accessor, or via reflective or other dynamic invocation techniques. FastTuple is designed to deal with a large number of tuples therefore it will also attempt to pool tuples such that they do not add significantly to the GC load of a system. FastTuple is also capable of allocating the tuple value storage entirely off-heap, using Java’s direct memory capabilities.
jvm  java  gc  off-heap  storage  boundary  memory 
may 2014 by jm
Structs
storage of structured data in a continuous block of memory. The memory can be allocated on the heap using a byte[] array or can be allocated off the java heap in native memory. [...] Use cases: store/cache huge amounts of data records without impact on GC duration; high performance data transfer in a cluster or in between processes


handy OSS from Ruediger Moeller
structs  java  jvm  memory  off-heap  storage  reference 
may 2014 by jm
Exceptional Performance
Good benchmark data on the performance of JVM exceptions
java  jvm  exceptions  benchmarking  performance  optimization  coding 
may 2014 by jm
Spark - A small web framework for Java
A Sinatra-like minimal web framework built on Java 8 lambdas:

<code>public class HelloWorld {
public static void main(String[] args) {
get("/hello", (request, response) -> {
return "Hello World!";
});
}
}</code>
via:sampullara  web  java  sinatra  lambdas  closures  java8  spark 
may 2014 by jm
Alexey Shipilev on Java's System.nanoTime()
System.nanoTime is as bad as String.intern now: you can use it, but use it wisely. The latency, granularity, and scalability effects introduced by timers may and will affect your measurements if done without proper rigor. This is one of the many reasons why System.nanoTime should be abstracted from the users by benchmarking frameworks, monitoring tools, profilers, and other tools written by people who have time to track if the underlying platform is capable of doing what we want it to do.

In some cases, there is no good solution to the problem at hand. Some things are not directly measurable. Some things are measurable with unpractical overheads. Internalize that fact, weep a little, and move on to building the indirect experiments. This is not the Wonderland, Alice. Understanding how the Universe works often needs side routes to explore.

In all seriousness, we should be happy our $1000 hardware can measure 30 nanosecond intervals pretty reliably. This is roughly the time needed for the Internet packets originating from my home router to leave my apartment. What else do you want, you spoiled brats?
benchmarking  jdk  java  measurement  nanoseconds  nsecs  nanotime  jvm  alexey-shipilev  jmh 
may 2014 by jm
Google's Open Bidder stack moving from Jetty to Netty
Open Bidder traditionally used Jetty as an embedded webserver, for the critical tasks of accepting connections, processing HTTP requests, managing service threads, etc. Jetty is a robust, but traditional stack that carries the weight and tradeoffs of Servlet’s 15 years old design. For a maximum performance RTB agent that must combine very large request concurrency with very low latencies, and often benefit also from low-level control over the transport, memory management and other issue, a different webserver stack was required. Open Bidder now supports Netty, an asynchronous, event-driven, high-performance webserver stack.

For existing code, the most important impact is that Netty is not compatible with the Servlet API. Its own internal APIs are often too low-level, not to mention proprietary to Netty; so Open Bidder v0.5 introduces some new, stack-neutral APIs for things like HTTP requests and responses, cookies, request handlers, and even simple HTML templating based on Mustache. These APIs will work with both Netty and Jetty. This means you don’t need to change any code to switch between Jetty and Netty; on the other hand, it also means that existing code written for Open Bidder 0.4 may need some changes even if you plan to keep using Jetty.

[....] Netty's superior efficiency is very significant; it supports 50% more traffic in the same hardware, and it maintains a perfect latency distribution even at the peak of its supported load.


This doc is noteworthy on a couple of grounds:

1. the use of Netty in a public API/library, and the additional layer in place to add a friendlier API on top of that. I hope they might consider releasing that part as OSS at some point.

2. I also find it interesting that their API uses protobufs to marshal the message, and they plan in a future release to serialize those to JSON documents -- that makes a lot of sense.
apis  google  protobufs  json  documents  interoperability  netty  jetty  servlets  performance  java 
april 2014 by jm
Garbage Collection Optimization for High-Throughput and Low-Latency Java Applications
LinkedIn talk about the GC opts they used to optimize the Feed. good detail
performance  optimization  linkedin  java  jvm  gc  tuning 
april 2014 by jm
Transitioning to Scala
Advice from a developer who helped rebuild Walmart.ca with Scala and Play


This is really good advice.
walmart  scala  java  languages  coding  relearning  play  akka 
april 2014 by jm
fauxflake
an easily embeddable, decentralized, k-ordered unique ID generator. It can use the same encoded ID format as Twitter's Snowflake or Boundary's Flake implementations as well as any other customized encoding without too much effort. The fauxflake-core module has no external dependencies and is meant to be about as light as possible while still delivering useful functionality. Essentially, if you want to be able to generate a unique identifier across your infrastructure with reasonable assurances about collisions, then you might find this useful.


From the same guy as the excellent Guava Retrier library; java, ASL2-licensed open source.
open-source  java  asl2  fauxflake  tools  libraries  unique-ids  ids  unique  snowflake  distsys 
april 2014 by jm
Netty Best Practices
'a.k.a. Faster == Better'. Slides from a talk given at Facebook on 19th March 2014 by Norman Maurer
netty  java  performance  optimization  facebook  slides  presentations 
april 2014 by jm
[#1259] Add optimized queue for SCMP pattern and use it in NIO and nativ... · 6efac61 · netty/netty
Interesting -- Netty has imported an optimized ASL2-licensed MPSC queue implementation from Akka (presumably for performance raisins)
performance  optimization  open-source  mpsc  queues  data-structures  netty  akka  java 
march 2014 by jm
Issue 122 - android-query - HTTP 204 Response results in Network Error (-101)
an empty 204 response to a HTTP PUT will trigger this. See also https://code.google.com/p/android/issues/detail?id=24672, '"java.io.IOException: unexpected end of stream" on HttpURLConnection HEAD call'.
http  urlconnection  httpurlconnection  java  android  dalvik  bugs  204  head  get  exceptions 
march 2014 by jm
Impact of large primitive arrays (BLOBS) on JVM Garbage Collection
some nice graphs and data on CMS performance, with/without -XX:ParGCCardsPerStrideChunk
cms  java  jvm  performance  optimization  tuning  off-heap-storage  memory 
march 2014 by jm
What's New in Java 8
good explanation of all the new features -- I'm really looking forward to fixing up all the crappy over-verbose interface-as-lambdas we have scattered throughout our code
java  java8  lambdas  fp  functional-programming  currying  joda-time 
march 2014 by jm
IntelliJ IDEA 13.1 will support Chronon Debugger
This, IMO, would be a really good reason to upgrade to the payware version of IDEA - Chronon looks cool.
Chronon is a new revolutionary tool keeping track of running Java programs and recording their execution process for later analysis, which can be helpful when you need to thoroughly retrace your steps when dealing with complicated bugs.
chronon  debugging  java  intellij  idea  ides  coding  time-warp  time 
march 2014 by jm
A cautionary tale about building large-scale polyglot systems
'a fucking nightmare':
Cascading requires a compilation step, yet since you're writing Ruby code, you get get none of the benefits of static type checking. It was standard to discover a type issue only after kicking off a job on, oh, 10 EC2 machines, only to have it fail because of a type mismatch. And user code embedded in strings would regularly fail to compile – which you again wouldn't discover until after your job was running. Each of these were bad individually, together, they were a fucking nightmare. The interaction between the code in strings and the type system was the worst of all possible worlds. No type checking, yet incredibly brittle, finicky and incomprehensible type errors at run time. I will never forget when one of my friends at Etsy was learning Cascading.JRuby and he couldn't get a type cast to work. I happened to know what would work: a triple cast. You had to cast the value to the type you wanted, not once, not twice, but THREE times.
etsy  scalding  cascading  adtuitive  war-stories  languages  polyglot  ruby  java  strong-typing  jruby  types  hadoop 
march 2014 by jm
The Netflix Dynamic Scripting Platform
At the core of the redesign is a Dynamic Scripting Platform which provides us the ability to inject code into a running Java application at any time. This means we can alter the behavior of the application without a full scale deployment. As you can imagine, this powerful capability is useful in many scenarios. The API Server is one use case, and in this post, we describe how we use this platform to support a distributed development model at Netflix.


Holy crap.
scripting  dynamic-languages  groovy  java  server-side  architecture  netflix 
march 2014 by jm
java - Why not use Double or Float to represent currency?
A good canonical URL for this piece of coding guidance.
For example, suppose you have $1.03 and you spend 42c. How much money do you have left?

System.out.println(1.03 - .42); => prints out 0.6100000000000001.
coding  tips  floating-point  float  java  money  currency  bugs 
february 2014 by jm
Little’s Law, Scalability and Fault Tolerance: The OS is your bottleneck. What you can do?
good blog post on Little's Law, plugging quasar, pulsar, and comsat, 3 new open-source libs offering Erlang-like lightweight threads on the JVM
jvm  java  quasar  pulsar  comsat  littles-law  scalability  async  erlang 
february 2014 by jm
Home · linkedin/rest.li Wiki
Rest.li is a REST+JSON framework for building robust, scalable service architectures using dynamic discovery and simple asynchronous APIs. Rest.li fills a niche for building RESTful service architectures at scale, offering a developer workflow for defining data and REST APIs that promotes uniform interfaces, consistent data modeling, type-safety, and compatibility checked API evolution.


The new underlying comms layer for Voldemort, it seems.
voldemort  d2  rest.li  linkedin  json  rest  http  api  frameworks  java 
february 2014 by jm
Apache Curator
Netflix open-source library to make using ZooKeeper from Java less of a PITA. I really wish I'd used this now, having reimplemented some key parts of it after failures in prod ;)
zookeeper  netflix  apache  curator  java  libraries  open-source 
january 2014 by jm
Cassandra: tuning the JVM for read heavy workloads
The cluster we tuned is hosted on AWS and is comprised of 6 hi1.4xlarge EC2 instances, with 2 1TB SSDs raided together in a raid 0 configuration. The cluster’s dataset is growing steadily. At the time of this writing, our dataset is 341GB, up from less than 200GB a few months ago, and is growing by 2-3GB per day. The workload on this cluster is very read heavy, with quorum reads making up 99% of all operations.


Some careful GC tuning here. Probably not applicable to anyone else, but good approach in general.
java  performance  jvm  scaling  gc  tuning  cassandra  ops 
january 2014 by jm
Safe cross-thread publication of a non-final variable in the JVM
Scary, but potentially useful in future, so worth bookmarking. By carefully orchestrating memory accesses using volatile and non-volatile fields, one can ensure that a non-volatile, non-synchronized field's value is safely visible to all threads after that point due to JMM barrier semantics.

What you are looking to do is enforce a barrier between your initializing stores and your publishing store, without that publishing store being made to a volatile field. This can be done by using volatile access to other fields in the publication path, without using those variables in the later access paths to the published object.
volatile  atomic  java  jvm  gil-tene  synchronization  performance  threading  jmm  memory-barriers 
january 2014 by jm
examining the Hardware Performance Counters
using the overseer library and libpfm, it's possible for a JVM app to record metrics about L2/DRAM cache hit rates and latency
metrics  hpc  libpfm  java  jvm  via:normanmaurer  l2  dram  llc  cpu 
december 2013 by jm
Simple Binary Encoding
'SBE is an OSI layer 6 representation for encoding and decoding application messages in binary format for low-latency applications.'

Licensed under ASL2, C++ and Java supported.
sbe  encoding  codecs  persistence  binary  low-latency  open-source  java  c++  serialization 
december 2013 by jm
Twitter tech talk video: "Profiling Java In Production"
In this talk Kaushik Srenevasan describes a new, low overhead, full-stack tool (based on the Linux perf profiler and infrastructure built into the Hotspot JVM) we've built at Twitter to solve the problem of dynamically profiling and tracing the behavior of applications (including managed runtimes) in production.


Looks very interesting. Haven't watched it yet though
twitter  tech-talks  video  presentations  java  jvm  profiling  testing  monitoring  service-metrics  performance  production  hotspot  perf 
december 2013 by jm
[JavaSpecialists 215] - StampedLock Idioms
a demo of Doug Lea's latest concurrent data structure in Java 8
doug-lea  concurrency  coding  java-8  java  threads 
december 2013 by jm
Asynchronous logging versus Memory Mapped Files
Interesting article around using mmap'd files from Java using RandomAccessFile.getChannel().map(), which allows them to be accessed directly as a ByteBuffer. together with Atomic variable lazySet() operations, this provides pretty excellent performance results on low-latency writes to disk. See also: http://psy-lob-saw.blogspot.ie/2012/12/atomiclazyset-is-performance-win-for.html
atomic  lazyset  putordered  jmm  java  synchronization  randomaccessfile  bytebuffers  performance  optimization  memory  disk  queues 
november 2013 by jm
mgodave/barge
Looks very alpha, but one to watch.
A JVM Implementation of the Raft Consensus Protocol
via:sbtourist  raft  jvm  java  consensus  distributed-computing 
november 2013 by jm
LatencyUtils by giltene
The LatencyUtils package includes useful utilities for tracking latencies. Especially in common in-process recording scenarios, which can exhibit significant coordinated omission sensitivity without proper handling.
gil-tene  metrics  java  measurement  coordinated-omission  latency  speed  service-metrics  open-source 
november 2013 by jm
Reactor hits GA
'It can't just be Big Data, it has to be Fast Data: Reactor 1.0 goes GA':

Reactor provides the necessary abstractions to build high-throughput, low-latency--what we now call "fast data"--applications that absolutely must work with thousands, tens of thousands, or even millions of concurrent requests per second. Modern JVM applications must be built on a solid foundation of asynchronous and reactive components that efficiently manage the execution of a very large number of tasks on a very small number of system threads. Reactor is specifically designed to help you build these kinds of applications without getting in your way or forcing you to work within an opinionated pattern.


Featuring the LMAX Disruptor ringbuffer, the JavaChronicle fast persistent message-passing queue, Groovy closures, and Netty 4.0. This looks very handy indeed....
disruptor  reactive-programming  reactor  async  libraries  java  jvm  frameworks  spring  netty  fast-data 
november 2013 by jm
error-prone - Catch common Java mistakes as compile-time errors
It's common for even the best programmers to make simple mistakes. And commonly, a refactoring which seems safe can leave behind code which will never do what's intended. We're used to getting help from the compiler, but it doesn't do much beyond static type checking. Using error-prone to augment the compiler's static analysis, you can catch more mistakes before they cost you time, or end up as bugs in production. We use error-prone in Google's Java build system to eliminate classes of serious bugs from entering our code, and we've open-sourced it, so you can too!
analysis  java  static-analysis  code  errors  bugs 
november 2013 by jm
Presto: Interacting with petabytes of data at Facebook
Presto has become a major interactive system for the company’s data warehouse. It is deployed in multiple geographical regions and we have successfully scaled a single cluster to 1,000 nodes. The system is actively used by over a thousand employees,who run more than 30,000 queries processing one petabyte daily.

Presto is 10x better than Hive/MapReduce in terms of CPU efficiency and latency for most queries at Facebook. It currently supports a large subset of ANSI SQL, including joins, left/right outer joins, subqueries,and most of the common aggregate and scalar functions, including approximate distinct counts (using HyperLogLog) and approximate percentiles (based on quantile digest). The main restrictions at this stage are a size limitation on the join tables and cardinality of unique keys/groups. The system also lacks the ability to write output data back to tables (currently query results are streamed to the client).
facebook  hadoop  hdfs  open-source  java  sql  hive  map-reduce  querying  olap 
november 2013 by jm
NettoSphere demo
'This article will use NettoSphere, a framework build on top of the popular Netty Framework and Atmosphere with support of WebSockets, Server Side Events and Long-Polling. NettoSphere allows [async JVM framework] Atmosphere's applications to run on top of the Netty Framework.'
atmosphere  netty  async  java  scala  websockets  sse  long-polling  http  demos  games 
november 2013 by jm
java.util.stream.SpinedBuffer
interesting new data structure, pending addition in Java 8. Basically an array of arrays which presents the API of a single List.
An ordered collection of elements. Elements can be added, but not removed. Goes through a building phase, during which elements can be added, and a traversal phase, during which elements can be traversed in order but no further modifications are possible.
spinedbuffer  data-structures  algorithms  java  jdk  jvm  java-8  arrays  lists 
october 2013 by jm
What drives JVM full GC duration
Interesting empirical results using JDK 7u21:
Full GC duration depends on the number of objects allocated and the locality of their references. It does not depend that much on actual heap size.


Reference locality has a surprisingly high effect.
java  jvm  data  gc  tuning  performance  cms  g1 
october 2013 by jm
SPSC revisited part III - FastFlow + Sparse Data
holy moly. This is some heavily-optimized mechanical-sympathy Java code. By using a sparse data structure, cache-aligned fields, and wait-free low-level CAS concurrency primitives via sun.misc.Unsafe, a single-producer/single-consumer queue implementation goes pretty damn fast compared to the current state of the art
nitsanw  optimization  concurrency  java  jvm  cas  spsc  queues  data-structures  algorithms 
october 2013 by jm
Down the Rabbit Hole
An adventure that takes you through several popular Java language features and shows how they compile to bytecode and eventually JIT to assembly code.
charles-nutter  java  jvm  compilation  reversing  talks  slides 
october 2013 by jm
Creating Flight Recordings
lots more detail on the new "Java Mission Control" feature in Hotspot 7u40 JVMs, and how to use it to start and stop profiling in a live, production JVM from a separate "jcmd" command-line client. If the overhead is small, this could be really neat -- turn on profiling for 1 minute every hour on a single instance, and collect realtime production profile data on an automated basis for post-facto analysis if required
instrumentation  logging  profiling  java  jvm  ops 
september 2013 by jm
Low Overhead Method Profiling with Java Mission Control now enabled in the most recent HotSpot JVM release
Built into the HotSpot JVM [in JDK version 7u40] is something called the Java Flight Recorder. It records a lot of information about/from the JVM runtime, and can be thought of as similar to the Data Flight Recorders you find in modern airplanes. You normally use the Flight Recorder to find out what was happening in your JVM when something went wrong, but it is also a pretty awesome tool for production time profiling. Since Mission Control (using the default templates) normally don’t cause more than a per cent overhead, you can use it on your production server.


I'm intrigued by the idea of always-on profiling in production. This could be cool.
performance  java  measurement  profiling  jvm  jdk  hotspot  mission-control  instrumentation  telemetry  metrics 
september 2013 by jm
modern JVM concurrency primitives are broken if the system clock steps backwards
'The implementation of the concurrency primitive LockSupport.parkNanos(), the function that controls *every* concurrency primitive on the JVM, is flawed, and any NTP sync, or system time change, can potentially break it with unexpected results across the board when running a 64bit JVM on Linux 64bit.'

Basically, LockSupport.parkNanos() calls pthread_cond_timedwait() using a CLOCK_REALTIME instead of CLOCK_MONOTONIC. 'tinker step 0' in ntp.conf may be a viable workaround.
clocks  timing  ntp  slew  sync  step  pthreads  java  jvm  timers  clock_realtime  clock_monotonic 
september 2013 by jm
Blueflood by rackerlabs
Rackspace's large-scale TSD storage system, built on Cassandra, Java, ASL2
cassandra  tsd  storage  time-series  data  open-source  java  rackspace 
september 2013 by jm
Interview with the Github Elasticsearch Team
good background on Github's Elasticsearch scaling efforts. Some rather horrific split-brain problems under load, and crashes due to OpenJDK bugs (sounds like OpenJDK *still* isn't ready for production). painful
elasticsearch  github  search  ops  scaling  split-brain  outages  openjdk  java  jdk  jvm 
september 2013 by jm
Lock-Based vs Lock-Free Concurrent Algorithms
An excellent post from Martin Thompson showing a new JSR166 concurrency primitive, StampedLock, compared against a number of alternatives in a simple microbenchmark.
The most interesting thing for me is how much the lock-free, AtomicReference.compareAndSet()-based approach blows away all the lock-based approaches -- even in the 1-reader-1-writer case. Its code is extremely simple, too: https://github.com/mjpt777/rw-concurrency/blob/master/src/LockFreeSpaceship.java
concurrency  java  threads  lock-free  locking  compare-and-set  cas  atomic  jsr166  microbenchmarks  performance 
august 2013 by jm
Snowizard
'a Java port of Twitter's Snowflake thrift service presented as an HTTP-based Dropwizard service'.
an HTTP-based service for generating unique ID numbers at high scale with some simple guarantees. supports returning ID numbers as: JSON and JSONP; Google's Protocol Buffers; Plain text.

At GE, we were more interested in the uncoordinated aspects of Snowflake than its throughput requirements, so HTTP was fine for our needs. We also exposed the core of Snowflake as an embeddable module so it can be directly integrated into our applications. We don't have the guarantees that the Snowflake-Zookeeper integration was providing, but that was also acceptable to us. In places where we really needed high throughput, we leveraged the snowizard-core embeddable module directly.


Odd OSS license, though -- BSDish?
java  open-source  ids  soa  services  snowflake  http 
august 2013 by jm
How to configure ntpd so it will not move time backwards
The "-x" switch will expand the step/slew boundary from 128ms to 600 seconds, ensuring the time is slewed (drifted slowly towards the correct time at a max of 5ms per second) rather than "stepped" (a sudden jump, potentially backwards). Since slewing has a max of 5ms per second, time can never "jump backwards", which is important to avoid some major application bugs (particularly in Java timers).
ntpd  time  ntp  ops  sysadmin  slew  stepping  time-synchronization  linux  unix  java  bugs 
august 2013 by jm
Recordinality
a new, and interesting, sketching algorithm, with a Java implementation:
Recordinality is unique in that it provides cardinality estimation like HLL, but also offers "distinct value sampling." This means that Recordinality can allow us to fetch a random sample of distinct elements in a stream, invariant to cardinality. Put more succinctly, given a stream of elements containing 1,000,000 occurrences of 'A' and one occurrence each of 'B' - 'Z', the probability of any letter appearing in our sample is equal. Moreover, we can also efficiently store the number of times elements in our distinct sample have been observed. This can help us to understand the distribution of occurrences of elements in our stream. With it, we can answer questions like "do the elements we've sampled present in a power law-like pattern, or is the distribution of occurrences relatively even across the set?"
sketching  coding  algorithms  recordinality  cardinality  estimation  hll  hashing  murmurhash  java 
august 2013 by jm
Randomly Failed! The State of Randomness in Current Java Implementations
This would appear to be the paper which sparked off the drama around BitCoin thefts from wallets generated on Android devices:

The SecureRandom PRNG is the primary source of randomness for Java and is used e.g., by cryptographic operations. This underlines its importance regarding security. Some of fallback solutions of the investigated implementations [are] revealed to be weak and predictable or capable of being influenced. Very alarming are the defects found in Apache Harmony, since it is partly used by Android.


More on the BitCoin drama: https://bitcointalk.org/index.php?topic=271486.40 , http://bitcoin.org/en/alert/2013-08-11-android
android  java  prng  random  security  bugs  apache-harmony  apache  crypto  bitcoin  papers 
august 2013 by jm
 Censum
[JVM] GC is a difficult, specialised area that can be very frustrating for busy developers or devops folks to deal with. The JVM has a number of Garbage Collectors and a bewildering array of switches that can alter the behaviour of each collector. Censum does all of the parsing, number crunching and statistical analysis for you, so you don't have to go and get that PhD in Computer Science in order to solve your GC performance problem. Censum gives you straight answers as opposed to a ton of raw data. can eat any GC log you care to throw at it. is easy to install and use.


Commercial software, UKP 495 per license.
censum  gc  tuning  ops  java  jvm  commercial 
july 2013 by jm
Tuning and benchmarking Java 7's Garbage Collectors: Default, CMS and G1
Rudiger Moller runs through a typical GC-tuning session, in exhaustive detail
java  gc  tuning  jvm  cms  g1  ops 
july 2013 by jm
Rooting SIM cards
the details of Karsten Nohl's attack against SIM cards, allowing remote-root malware via SMS.
Cracking SIM update keys: [Over The Air] commands, such as software updates, are cryptographically-secured SMS messages, which are delivered directly to the SIM. While the option exists to use state-of-the-art AES or the somewhat outdated 3DES algorithm for OTA, many (if not most) SIM cards still rely on the 70s-era DES cipher. [...] To derive a DES OTA key, an attacker starts by sending a binary SMS to a target device. The SIM does not execute the improperly signed OTA command, but does in many cases respond to the attacker with an error code carrying a cryptographic signature, once again sent over binary SMS. A rainbow table resolves this plaintext-signature tuple to a 56-bit DES key within two minutes on a standard computer.


2 minutes. Sic transit gloria DES. The next step after that is to send a signed request to run a Java applet, then exploit a hole in the JVM sandbox, and the SIM card is rooted.

Looking forward to the full paper on July 31st...
des  3des  crypto  security  sms  sim-cards  smartcards  java  applets  ota  rainbow-tables  cracking  karsten-nohl 
july 2013 by jm
Chronicle
an ultra low latency, high throughput, persisted, messaging and event driven in memory database. The typical latency is as low as 80 nano-seconds and supports throughputs of 5-20 million messages/record updates per second.

This library also supports distributed, durable, observable collections (Map, List, Set) The performance depends on the data structures used, but simple data structures can achieve throughputs of 5 million elements or key/value pairs in batches (eg addAll or putAll) and 500K elements or key/values per second when added/updated/removed individually.

It uses almost no heap, trivial GC impact, can be much larger than your physical memory size (only limited by the size of your disk) and can be shared between processes with better than 1/10th latency of using Sockets over loopback. It can change the way you design your system because it allows you to have independent processes which can be running or not at the same time (as no messages are lost) This is useful for restarting services and testing your services from canned data. e.g. like sub-microsecond durable messaging. You can attach any number of readers, including tools to see the exact state of the data externally.
library  messaging  performance  java  chronicle  disk  mmap 
july 2013 by jm
Log4j 2: Performance close to insane
Nice writeup on Log4j 2's new AsyncAppender implementation, based on the LMAX Disruptor. sounds pretty excellent:
“One nice little detail I should mention is that both Async Loggers and Async Appenders fix something that has always bothered me in Log4j-1.x, which is that they will flush the buffer after logging the last event in the queue . With Log4j-1.x, if you used buffered I/O, you often could not see the last few log events, as they were still stuck in the memory buffer. Your only option was setting immediateFlush to true, which forces disk I/O on every single log event and has a performance impact.
With Async Loggers and Appenders in Log4j-2.0 your log statements are all flushed to disk, so they are always visible, but this happens in a very efficient manner.”
logging  java  performance  async  disruptor  low-latency 
july 2013 by jm
Java Garbage Collection Distilled
a great summary of the state of JVM garbage collection from Martin Thompson
jvm  java  gc  garbage-collection  tuning  memory  performance  martin-thompson 
july 2013 by jm
Java Concurrent Counters By Numbers
threadsafe counters in the JVM compared. AtomicLong, Doug Lea's LongAdder, a ThreadLocal counter, and a field-on-the-Thread-object counter int (via Darach Ennis). Nitsan's posts on concurrency are fantastic
counters  concurrency  threads  java  jvm  atomic 
june 2013 by jm
Java Garbage Collection Distilled
Martin Thompson lays it out:
Serial, Parallel, Concurrent, CMS, G1, Young Gen, New Gen, Old Gen, Perm Gen, Eden, Tenured, Survivor Spaces, Safepoints, and the hundreds of JVM start-up flags. Does this all baffle you when trying to tune the garbage collector while trying to get the required throughput and latency from your Java application? If it does then don’t worry, you are not alone. Documentation describing garbage collection feels like man pages for an aircraft. Every knob and dial is detailed and explained but nowhere can you find a guide on how to fly. This article will attempt to explain the tradeoffs when choosing and tuning garbage collection algorithms for a particular workload.
gc  java  garbage-collection  coding  cms  g1  jvm  optimization 
june 2013 by jm
Building a Modern Website for Scale (QCon NY 2013) [slides]
some great scalability ideas from LinkedIn. Particularly interesting are the best practices suggested for scaling web services:

1. store client-call timeouts and SLAs in Zookeeper for each REST endpoint;
2. isolate backend calls using async/threadpools;
3. cancel work on failures;
4. avoid sending requests to GC'ing hosts;
5. rate limits on the server.

#4 is particularly cool. They do this using a "GC scout" request before every "real" request; a cheap TCP request to a dedicated "scout" Netty port, which replies near-instantly. If it comes back with a 1-packet response within 1 millisecond, send the real request, else fail over immediately to the next host in the failover set.

There's still a potential race condition where the "GC scout" can be achieved quickly, then a GC starts just before the "real" request is issued. But the incidence of GC-blocking-request is probably massively reduced.

It also helps against packet loss on the rack or server host, since packet loss will cause the drop of one of the TCP packets, and the TCP retransmit timeout will certainly be higher than 1ms, causing the deadline to be missed. (UDP would probably work just as well, for this reason.) However, in the case of packet loss in the client's network vicinity, it will be vital to still attempt to send the request to the final host in the failover set regardless of a GC-scout failure, otherwise all requests may be skipped.

The GC-scout system also helps balance request load off heavily-loaded hosts, or hosts with poor performance for other reasons; they'll fail to achieve their 1 msec deadline and the request will be shunted off elsewhere.

For service APIs with real low-latency requirements, this is a great idea.
gc-scout  gc  java  scaling  scalability  linkedin  qcon  async  threadpools  rest  slas  timeouts  networking  distcomp  netty  tcp  udp  failover  fault-tolerance  packet-loss 
june 2013 by jm
Big Memory, Part 4
good microbenchmarking of a bunch of Java collections; Trove, fastutil, PCJ, mahout-collections, hppc
java  collections  benchmarks  performance  speed  coding  data-structures  optimization 
june 2013 by jm
fastutil
fastutil extends the Java™ Collections Framework by providing type-specific maps, sets, lists and queues with a small memory footprint and fast access and insertion; provides also big (64-bit) arrays, sets and lists, and fast, practical I/O classes for binary and text files. It is free software distributed under the Apache License 2.0. It requires Java 6 or newer.


used by Facebook (along with Apache Giraph, Netty, Unsafe) to speed up "weekend Hive jobs" to "coffee breaks". http://www.slideshare.net/nitayj/2013-0603-berlin-buzzwords
via:highscalability  facebook  giraph  optimization  java  speed  fastutil  collections  data-structures 
june 2013 by jm
'Mythbusting Modern Hardware to gain "Mechanical Sympathy"' [slides]
Martin Thompson's latest talk -- taking a few common concepts about modern hardware performance and debunking/confirming them, mythbusters-style
mythbusters  hardware  mechanical-sympathy  martin-thompson  java  performance  cpu  disks  ssd 
may 2013 by jm
Berkeley DB Java Edition Architecture [PDF]
background white paper on the BDB-JE innards and design, from 2006. Still pretty accurate and good info
bdb-je  java  berkeley-db  bdb  design  databases  pdf  white-papers  trees 
may 2013 by jm
Log4j2 Asynchronous Loggers for Low-Latency Logging - Apache Log4j 2
implemented using the LMAX Disruptor library -- very impressive performance figures. I presume in real-world usage, these latencies are dwarfed by hardware costs, though
disruptor  coding  java  log4j  logging  async  performance 
april 2013 by jm
Functional Reactive Programming in the Netflix API with RxJava
Hmm, this seems nifty as a compositional building block for Java code to enable concurrency without thread-safety and sync problems.
Functional reactive programming offers efficient execution and composition by providing a collection of operators capable of filtering, selecting, transforming, combining and composing Observable's.

The Observable data type can be thought of as a "push" equivalent to Iterable which is "pull". With an Iterable, the consumer pulls values from the producer and the thread blocks until those values arrive. By contrast with the Observable type, the producer pushes values to the consumer whenever values are available. This approach is more flexible, because values can arrive synchronously or asynchronously.
concurrency  java  jvm  threads  thread-safety  coding  rx  frp  fp  functional-programming  reactive  functional  async  observable 
april 2013 by jm
The useful JVM options
a good reference, with lots of sample output. Not clear if it takes 1.6/1.7 differences into account, though
jvm  reference  java  ops  hotspot  command-line 
april 2013 by jm
« earlier      
per page:    204080120160

related tags

3des  actors  adrian-cockroft  adtuitive  akka  alexey-shipilev  algorithms  allocation  analysis  android  annotations  apache  apache-harmony  api  apis  applets  architecture  arrays  asl  asl2  async  atmosphere  atomic  availability  avro  aws  azul  backdoors  bdb  bdb-je  benchmarking  benchmarks  berkeley-db  binary  bitcoin  bloom-filters  bottlenecks  boundary  bugs  build  bytearrayoutputstream  bytebuffer  bytebuffers  c++  c10k  caching  capnproto  cardinality  cas  cascading  cassandra  censum  charles-nutter  charlie-hunt  chmod  chronicle  chronon  cli  clients  cliff-click  clocks  clock_monotonic  clock_realtime  clojure  closures  cms  code  codecs  coding  coding-standards  collections  command-line  commercial  compare-and-set  compilation  compiler  compression  comsat  concurrency  concurrent  consensus  contracts  coordinated-omission  core  core-dump  cost  counters  cpu  cracking  crypto  cuda  curator  currency  currying  d2  dalvik  data  data-structures  databases  debugging  demos  dependencies  des  design  disk  disks  disruptor  distcomp  distributed  distributed-computing  distsys  dns  documents  doug-lea  dram  dropwizard  duplicate-detection  duplicates  dvr  dynamic-languages  dynamodb  dynect  ec2  eclipse  editors  eiffel  elasticsearch  encapsulation  encoding  erlang  errors  estimation  etsy  eureka  event-processing  event-streams  events  exceptions  executable  facebook  fail  failover  fast-data  fastutil  fault-tolerance  fauxflake  final  findbugs  flatbuffers  float  floating-point  fluent-interfaces  fomats  formats  fp  framework  frameworks  frp  functional  functional-programming  fuzzy-matching  g1  g1gc  games  garbage-collection  gc  gc-scout  gcore  gdb  get  gil-tene  giraph  git  github  google  gossip  gpu  gradle  groovy  guardian  guava  guidelines  hacking  hacks  hadoop  hardware  hash  hashing  hashmap  hashmaps  hashsets  hbase  hdfs  hdrhistogram  head  heap  heap-dump  hive  hll  hotspot  hpc  http  httpurlconnection  hystrix  idea  ides  ids  impala  infoq  instrumentation  intellij  interactive  interoperability  invariants  io  ipc  jackson  james-gosling  jars  java  java-8  java8  jdk  jdk6  jersey  jetty  jmh  jmm  joda-time  jpl  jruby  json  jsr-133  jsr166  jvm  kamon  karsten-nohl  l2  lambda  lambdas  language  languages  larry-ellison  latency  lazy  lazy-computation  lazy-loading  lazyset  libpfm  libraries  library  lift  linkedin  linux  lisp  lists  littles-law  llc  lmax  load-balancing  lock-free  locking  locks  log4j  logging  long-polling  low-latency  lucene  mailinator  map  map-reduce  mapreduce  maps  marc-brooker  martin-fowler  martin-thompson  maven  measurement  mechanical-sympathy  memory  memory-barriers  memstore  messaging  metrics  microbenchmarks  minhash  mission-control  mitm  mmap  money  monitoring  mpsc  multicore  murmurhash  mutexes  mvc  mythbusters  nanoseconds  nanotime  nasa  netflix  netty  networking  nio  nitsanw  non-blocking  nonblockinghashtable  nsecs  ntp  ntpd  numeric  object-model  observable  off-heap  off-heap-storage  olap  oo  oom  oop  open-source  openhft  openjdk  opensource  ops  optimization  oracle  oss  ota  outages  overflow  p99  packet-loss  papers  patterns  paul-tyma  pdf  percentiles  perf  performance  persistence  persistent  pipelines  pipes  play  playframework  plumbr.eu  polyglot  preconditions  presentation  presentations  prng  production  profiling  programming  protobuf  protobufs  proxies  pthreads  pulsar  putordered  qcon  quasar  querying  queue  queues  rackspace  raft  rails  rainbow-tables  random  randomaccessfile  reactive  reactive-programming  reactor  recordinality  refactoring  reference  references  relearning  reliability  repl  resiliency  rest  rest.li  retries  retrying  reversing  ribbon  route53  ruby  rx  s3  sbe  scala  scalability  scalding  scaling  scott-andreas  scraping  scripting  sdk  search  security  semaphores  serialization  server  server-side  service-metrics  services  servlets  set  sets  shell  shingling  signalling  sim-cards  sinatra  sketching  slab-allocator  slas  slew  slides  smartcards  sms  snaptree  snowflake  soa  sockets  softreferences  software  spark  speed  spinedbuffer  split-brain  spray  spring  spsc  sql  ssd  sse  ssh  startup  static-analysis  statistics  step  stepping  storage  streams  strong-typing  structs  style  sun  sync  synchronization  sysadmin  talks  tcp  tech-talks  telemetry  templates  templating  testing  thread-safety  threading  threadpools  threads  time  time-series  time-synchronization  time-warp  timeouts  timers  timing  tips  tools  trees  tries  tsd  tuning  twitter  types  udp  ultradns  undocumented  unique  unique-ids  unix  urban-airship  urlconnection  usenix  utf  utf8  via:b3n  via:caro  via:cscotta  via:dehora  via:highscalability  via:janl  via:kellabyte  via:netflix  via:normanmaurer  via:peakscale  via:sampullara  via:sbtourist  via:spyced  via:twitter  video  vim  vm  volatile  voldemort  walmart  war-stories  weak  weakhashmap  weakreferences  web  web-services  webapps  webdev  websockets  white-papers  yammer  yourkit  youtube  zip  zookeeper 

Copy this bookmark:



description:


tags: