jm + locking   14

Locking, Little's Law, and the USL
Excellent explanatory mailing list post by Martin Thompson to the mechanical-sympathy group, discussing Little's Law vs the USL:
Little's law can be used to describe a system in steady state from a queuing perspective, i.e. arrival and leaving rates are balanced. In this case it is a crude way of modelling a system with a contention percentage of 100% under Amdahl's law, in that throughput is one over latency.

However this is an inaccurate way to model a system with locks. Amdahl's law does not account for coherence costs. For example, if you wrote a microbenchmark with a single thread to measure the lock cost then it is much lower than in a multi-threaded environment where cache coherence, other OS costs such as scheduling, and lock implementations need to be considered.

Universal Scalability Law (USL) accounts for both the contention and the coherence costs.

When modelling locks it is necessary to consider how contention and coherence costs vary given how they can be implemented. Consider in Java how we have biased locking, thin locks, fat locks, inflation, and revoking biases which can cause safe points that bring all threads in the JVM to a stop with a significant coherence component.
usl  scaling  scalability  performance  locking  locks  java  jvm  amdahls-law  littles-law  system-dynamics  modelling  systems  caching  threads  schedulers  contention 
27 days ago by jm
Doorman is a solution for Global Distributed Client Side Rate Limiting. Clients that talk to a shared resource (such as a database, a gRPC service, a RESTful API, or whatever) can use Doorman to voluntarily limit their use (usually in requests per second) of the resource. Doorman is written in Go and uses gRPC as its communication protocol. For some high-availability features it needs a distributed lock manager. We currently support etcd, but it should be relatively simple to make it use Zookeeper instead.

From google -- very interesting to see they're releasing this as open source, and it doesn't rely on G-internal services
distributed  distcomp  locking  youtube  golang  doorman  rate-limiting  rate-limits  limits  grpc  etcd 
july 2016 by jm
How to do distributed locking
A critique of the "Redlock" locking algorithm from Redis by Martin Kleppman. antirez responds here: ; summary of followups:
distributed  locking  redis  algorithms  coding  distcomp  redlock  martin-kleppman  zookeeper 
february 2016 by jm
5 subtle ways you're using MySQL as a queue, and why it'll bite you
Excellent post from Percona. I particularly like that they don't just say "don't use MySQL" -- they give good advice on how it can be made work: 1) avoid polling; 2) avoid locking; and 3) avoid storing your queue in the same table as other data.
database  mysql  queueing  queue  messaging  percona  rds  locking  sql  architecture 
january 2016 by jm
Biased Locking in HotSpot (David Dice's Weblog)
This is pretty nuts. If biased locking in the HotSpot JVM is causing performance issues, it can be turned off:
You can avoid biased locking on a per-object basis by calling System.identityHashCode(o). If the object is already biased, assigning an identity hashCode will result in revocation, otherwise, the assignment of a hashCode() will make the object ineligible for subsequent biased locking.
hashcode  jvm  java  biased-locking  locking  mutex  synchronization  locks  performance 
march 2015 by jm
"Left-Right: A Concurrency Control Technique with Wait-Free Population Oblivious Reads" [pdf]
'In this paper, we describe a generic concurrency control technique with Blocking write operations and Wait-Free Population Oblivious read operations, which we named the Left-Right technique. It is of particular interest for real-time applications with dedicated Reader threads, due to its wait-free property that gives strong latency guarantees and, in addition, there is no need for automatic Garbage Collection.
The Left-Right pattern can be applied to any data structure, allowing concurrent access to it similarly to a Reader-Writer lock, but in a non-blocking manner for reads. We present several variations of the Left-Right technique, with different versioning mechanisms and state machines. In addition, we constructed an optimistic approach that can reduce synchronization for reads.'

See also for java implementation code.
left-right  concurrency  multithreading  wait-free  blocking  realtime  gc  latency  reader-writer  locking  synchronization  java 
september 2014 by jm
Google's purify/valgrind-like concurrency checking tool:

'As a bonus, ThreadSanitizer finds some other types of bugs: thread leaks, deadlocks, incorrect uses of mutexes, malloc calls in signal handlers, and more. It also natively understands atomic operations and thus can find bugs in lock-free algorithms. [...] The tool is supported by both Clang and GCC compilers (only on Linux/Intel64). Using it is very simple: you just need to add a -fsanitize=thread flag during compilation and linking. For Go programs, you simply need to add a -race flag to the go tool (supported on Linux, Mac and Windows).'
concurrency  bugs  valgrind  threadsanitizer  threading  deadlocks  mutexes  locking  synchronization  coding  testing 
june 2014 by jm
Flock for Cron jobs
good blog post writing up the 'flock -n -c' trick to ensure single-concurrent-process locking for cron jobs
cron  concurrency  unix  linux  flock  locking  ops 
december 2013 by jm
Lock-Based vs Lock-Free Concurrent Algorithms
An excellent post from Martin Thompson showing a new JSR166 concurrency primitive, StampedLock, compared against a number of alternatives in a simple microbenchmark.
The most interesting thing for me is how much the lock-free, AtomicReference.compareAndSet()-based approach blows away all the lock-based approaches -- even in the 1-reader-1-writer case. Its code is extremely simple, too:
concurrency  java  threads  lock-free  locking  compare-and-set  cas  atomic  jsr166  microbenchmarks  performance 
august 2013 by jm
Locks & Condition Variables - Latency Impact

Firstly, this is 3 orders of magnitude greater latency than what I illustrated in the previous article using just memory barriers to signal between threads. This cost comes about because the kernel needs to get involved to arbitrate between the threads for the lock, and then manage the scheduling for the threads to awaken when the condition is signalled. The one-way latency to signal a change is pretty much the same as what is considered current state of the art for network hops between nodes via a switch. It is possible to get ~1µs latency with InfiniBand and less than 5µs with 10GigE and user-space IP stacks.

Secondly, the impact is clear when letting the OS choose what CPUs the threads get scheduled on rather than pinning them manually. I've observed this same issue across many use cases whereby Linux, in default configuration for its scheduler, will greatly impact the performance of a low-latency system by scheduling threads on different cores resulting in cache pollution. Windows by default seems to make a better job of this.
locking  concurrency  java  jvm  signalling  locks  linux  threading 
september 2012 by jm
Martin "Disruptor" Thompson's Single Writer Principle
Contains these millisecond estimates for highly-contended inter-thread signalling when incrementing a 64-bit counter in java:
One Thread 300<br>
One Thread with Memory Barrier 4,700<br>
One Thread with CAS 5,700<br>
Two Threads with CAS 18,000<br>
One Thread with Lock 10,000<br>
Two Threads with Lock 118,000<br>

Undoubtedly not realistic for a lot of cases, but it's still useful for order-of-magnitude estimates of locking cost. Bottom line: don't lock if you can avoid it, even with 'volatile' or AtomicFoo types.
java  jvm  performance  coding  concurrency  threading  cas  locking 
september 2012 by jm
Striped (Guava: Google Core Libraries for Java 13.0.1 API)
Nice piece of Guava concurrency infrastructure in the latest release:
A striped Lock/Semaphore/ReadWriteLock. This offers the underlying lock striping similar to that of ConcurrentHashMap in a reusable form, and extends it for semaphores and read-write locks. Conceptually, lock striping is the technique of dividing a lock into many stripes, increasing the granularity of a single lock and allowing independent operations to lock different stripes and proceed concurrently, instead of creating contention for a single lock.<br>

The guarantee provided by this class is that equal keys lead to the same lock (or semaphore), i.e. if (key1.equals(key2)) then striped.get(key1) == striped.get(key2) (assuming Object.hashCode() is correctly implemented for the keys). Note that if key1 is not equal to key2, it is not guaranteed that striped.get(key1) != striped.get(key2); the elements might nevertheless be mapped to the same lock. The lower the number of stripes, the higher the probability of this happening.<br>

Prior to this class, one might be tempted to use Map<K, Lock>, where K represents the task. This maximizes concurrency by having each unique key mapped to a unique lock, but also maximizes memory footprint. On the other extreme, one could use a single lock for all tasks, which minimizes memory footprint but also minimizes concurrency. Instead of choosing either of these extremes, Striped allows the user to trade between required concurrency and memory footprint. For example, if a set of tasks are CPU-bound, one could easily create a very compact Striped<Lock> of availableProcessors() * 4 stripes, instead of possibly thousands of locks which could be created in a Map<K, Lock> structure.
locking  concurrency  java  guava  semaphores  coding  via:twitter 
september 2012 by jm
InfoQ: Lock-free Algorithms
Michael Barker and Martin Thompson's talk at the last QCon on the LMAX Disruptor, and other nifty lock-free techniques and patterns. 'Martin Thompson and Michael Barker explain how Intel x86_64 processors and their memory model work, along with low-level techniques that help creating lock-free software.'
lock-free  locking  mutexes  algorithms  lmax  disruptor  infoq  slides  presentations  qcon  java 
april 2012 by jm
Mindblowing Python GIL
'presentation about how the Python GIL actually works and why it's even worse than most people even imagine.' A good chunk btw could be rephrased as 'pthreads is worse than most people even imagine'. pretty awful data, though
python  gil  locking  synchronization  ouch  performance  tuning  coding  interpreters  threads  pthreads  from delicious
february 2010 by jm

Copy this bookmark: