mpm + non-blocking   50

A Scalable, Portable, and Memory-Efficient Lock-Free FIFO Queue
We present a new lock-free multiple-producer and multiple-consumer (MPMC) FIFO queue design which is scalable and, unlike existing high-performance queues, very memory efficient. Moreover, the design is ABA safe and does not require any external memory allocators or safe memory reclamation techniques, typically needed by other scalable designs. In fact, this queue itself can be leveraged for object allocation and reclamation, as in data pools. We use FAA (fetch-and-add), a specialized instruction that is more scalable than CAS (compare-and-swap), on the most contended hot spots of the algorithm. However, unlike prior attempts with FAA, our queue is both lock-free and linearizable.
We propose a general approach, SCQ, for bounded queues. This approach can easily be extended to support unbounded FIFO queues, which can store an arbitrary number of elements. SCQ is portable across virtually all existing architectures and flexible enough for a wide variety of uses. We measure the performance of our algorithm on the x86-64 and PowerPC architectures. Our evaluation validates that our queue has exceptional memory efficiency compared to other algorithms and that its performance is often comparable to, or exceeds that of, state-of-the-art scalable algorithms.
non-blocking  datastructure 
18 hours ago by mpm
The wait-free hierarchy h^r_m classifies asynchronous shared-memory object types T by consensus number, where a type T has consensus number n if with objects of type T and atomic registers (all initialized to appropriate values) it is possible to solve wait-free consensus (i.e., agreement, validity, wait-free termination) for n processes but not for n + 1 processes. The consensus number of any type is at least 1, since 1-process consensus requires no interaction, and may range up to ∞ for particularly powerful objects.
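As a hedged illustration (not from the linked notes): a single test-and-set bit plus atomic registers solves wait-free consensus for two processes, which is why test-and-set has consensus number 2. The names below are purely illustrative.

```cpp
#include <atomic>

// Illustrative sketch: 2-process consensus from one test-and-set bit.
// Each process publishes its proposal, then races on the bit; the winner's
// value is decided for both (agreement + validity, wait-free by inspection).
struct TasConsensus {
    std::atomic<int> proposal[2];       // proposal[i] = value of process i
    std::atomic<bool> won{false};       // the test-and-set bit

    TasConsensus() { proposal[0] = 0; proposal[1] = 0; }

    int decide(int id, int value) {     // id is 0 or 1
        proposal[id].store(value, std::memory_order_release);
        if (!won.exchange(true, std::memory_order_acq_rel))
            return value;               // won the race: decide my own value
        // Lost the race: adopt the winner's (already published) proposal.
        return proposal[1 - id].load(std::memory_order_acquire);
    }
};
```

The same construction fails for three processes, which is exactly what the consensus-number classification captures.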
concurrency  non-blocking 
26 days ago by mpm
OneFile is a Software Transactional Memory (STM) meant to make it easy to implement lock-free and wait-free data structures
concurrency  non-blocking 
5 weeks ago by mpm
The Adaptive Priority Queue with Elimination and Combining
Priority queues are fundamental abstract data structures, often used to manage limited resources in parallel programming. Several proposed parallel priority queue implementations are based on skiplists, harnessing the potential for parallelism of the add() operations. In addition, methods such as Flat Combining have been proposed to reduce contention by batching together multiple operations to be executed by a single thread. While this technique can decrease lock-switching overhead and the number of pointer changes required by the removeMin() operations in the priority queue, it can also create a sequential bottleneck and limit parallelism, especially for non-conflicting add() operations.
In this paper, we describe a novel priority queue design, harnessing the scalability of parallel insertions in conjunction with the efficiency of batched removals. Moreover, we present a new elimination algorithm suitable for a priority queue, which further increases concurrency on balanced workloads with similar numbers of add() and removeMin() operations. We implement and evaluate our design using a variety of techniques including locking, atomic operations, hardware transactional memory, as well as employing adaptive heuristics given the workload.
non-blocking  datastructure 
6 weeks ago by mpm
EventCount allows waiting for arbitrary predicates in non-blocking algorithms. Think of a condition variable, but the wait predicate does not need to be protected by a mutex.
non-blocking  c 
9 weeks ago by mpm
Preemption Is GC for Memory Reordering
Event counts let waiters detect when they would have been woken up (the event count’s version counter has changed), and thus patch up this window where waiters can miss wake-ups for data changes they have yet to observe. Crucially, waiters detect lost wake-ups, rather than preventing them by locking writers out. Event counts thus preserve lock-freedom (and even wait-freedom!).
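The detect-rather-than-prevent protocol can be sketched in C++; this is an illustrative shape (the names and API are assumptions, not the article's), with a mutex and condition variable used only on the slow sleeping path.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>

// Illustrative eventcount sketch: waiters snapshot a version counter,
// re-check their predicate, and sleep only if the version is unchanged.
// A wake-up that races with the snapshot shows up as a version bump,
// so it is detected rather than lost.
class EventCount {
    std::atomic<unsigned> version_{0};
    std::mutex m_;
    std::condition_variable cv_;
public:
    // Snapshot the version before re-checking the wait predicate.
    unsigned prepare_wait() { return version_.load(std::memory_order_acquire); }

    // Sleep until the version moves past the snapshot.
    void wait(unsigned snapshot) {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] {
            return version_.load(std::memory_order_acquire) != snapshot;
        });
    }

    // Signalers bump the version first, so a concurrent waiter either sees
    // the bump in prepare_wait()/wait() or is woken by notify_all().
    void signal() {
        version_.fetch_add(1, std::memory_order_release);
        { std::lock_guard<std::mutex> lk(m_); }   // order the bump vs. sleepers
        cv_.notify_all();
    }
};
```

Note that signalers never wait on sleepers' predicates, which is how the scheme preserves the data structure's own progress guarantee.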
may 2019 by mpm
Concurrent Robin Hood Hashing
In this paper we examine the issues involved in adding concurrency to the Robin Hood hash table algorithm. We present a non-blocking obstruction-free K-CAS Robin Hood algorithm which requires only a single word compare-and-swap primitive, thus making it highly portable. The implementation maintains the attractive properties of the original Robin Hood structure, such as a low expected probe length, capability to operate effectively under a high load factor and good cache locality, all of which are essential for high performance on modern computer architectures. We compare our data-structures to various other lock-free and concurrent algorithms, as well as a simple hardware transactional variant, and show that our implementation performs better across a number of contexts.
non-blocking  datastructure 
december 2018 by mpm
FreeBSD lockless algorithm - seq
There are multiple places in the kernel where we need to read some variables very often, but there is a small number of cases when we write to them. For such cases, the seq interface was created.
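The idea behind seq is the classic seqlock; here is an illustrative portable C++ sketch (not the FreeBSD seq(9) API, and the fencing is the simplified textbook pattern): the writer bumps a version counter around each update, and readers retry whenever they observe an odd or changed version.

```cpp
#include <atomic>
#include <cstdint>

// Illustrative seqlock sketch for read-mostly data. Single writer assumed;
// readers never block the writer, they just retry on a torn snapshot.
struct SeqLock {
    std::atomic<uint32_t> seq{0};
    std::atomic<uint64_t> a{0}, b{0};   // the protected data, relaxed access

    void write(uint64_t x, uint64_t y) {
        seq.fetch_add(1, std::memory_order_relaxed);   // odd: write in progress
        std::atomic_thread_fence(std::memory_order_release);
        a.store(x, std::memory_order_relaxed);
        b.store(y, std::memory_order_relaxed);
        seq.fetch_add(1, std::memory_order_release);   // even again: stable
    }

    // Returns false if a writer raced with us; caller retries.
    bool try_read(uint64_t& x, uint64_t& y) const {
        uint32_t s0 = seq.load(std::memory_order_acquire);
        if (s0 & 1) return false;                      // writer active
        x = a.load(std::memory_order_relaxed);
        y = b.load(std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_acquire);
        return seq.load(std::memory_order_relaxed) == s0;  // unchanged: consistent
    }
};
```

This is why the pattern fits "read very often, write rarely": reads cost two counter loads on the fast path and never take a lock.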
non-blocking  algorithm 
september 2018 by mpm
MySQL 8.0: New Lock free, scalable WAL design
The Write Ahead Log (WAL) is one of the most important components of a database. All the changes to data files are logged in the WAL (called the redo log in InnoDB). This allows postponing the moment when the modified pages are flushed to disk, while still protecting against data loss.
database  concurrency  storage  non-blocking 
july 2018 by mpm
Concurrent Tries with Efficient Non-blocking Snapshots
We describe a non-blocking concurrent hash trie based on shared-memory single-word compare-and-swap instructions. The hash trie supports standard mutable lock-free operations such as insertion, removal, lookup and their conditional variants. To ensure space-efficiency, removal operations compress the trie when necessary.
non-blocking  datastructure 
january 2018 by mpm
Reclaiming memory for lock-free data structures: there has to be a better way
Memory reclamation for lock-based data structures is typically easy. However, it is a significant challenge for lock-free data structures. Automatic techniques such as garbage collection are inefficient or use locks, and non-automatic techniques either have high overhead, or do not work for many data structures. For example, subtle problems can arise when hazard pointers, one of the most common non-automatic techniques, are applied to many lock-free data structures. Epoch-based reclamation (EBR), which is by far the most efficient non-automatic technique, allows the number of unreclaimed objects to grow without bound, because one crashed process can prevent all other processes from reclaiming memory. We develop a more efficient, distributed variant of EBR that solves this problem. It is based on signaling, which is provided by many operating systems, such as Linux and UNIX. Our new scheme takes O(1) amortized steps per high-level operation on the data structure and O(1) steps in the worst case each time an object is removed from the data structure. At any point, O(mn²) objects are waiting to be freed, where n is the number of processes and m is a small constant for most data structures. Experiments show that our scheme has very low overhead: on average 10%, and at worst 28%, for a balanced binary search tree over many thread counts, operation mixes and contention levels. Our scheme also outperforms a highly tuned implementation of hazard pointers by an average of 75%. Typically, memory reclamation is tightly woven into lock-free data structure code. To improve modularity and facilitate the comparison of different memory reclamation schemes, we also introduce a highly flexible abstraction. It allows a programmer to easily interchange schemes for reclamation, object pooling, allocation and deallocation with virtually no overhead, by changing a single line of code.
non-blocking  memory 
december 2017 by mpm
What every systems programmer should know about lockless concurrency
Seasoned programmers are familiar with concurrency building blocks like mutexes, semaphores, and condition variables. But what makes them work? How do we write concurrent code when we can’t use them, like when we’re working below the operating system in an embedded environment, or when we can’t block due to hard time constraints? And since your system transforms your code into things you didn’t write, running in orders you never asked for, how do multithreaded programs work at all? Concurrency—especially on modern hardware—is a complicated and unintuitive topic, but let’s try to cover some fundamentals.
concurrency  non-blocking 
november 2017 by mpm
A Practical Multi-Word Compare-and-Swap Operation
Work on non-blocking data structures has proposed extending processor designs with a compare-and-swap primitive, CAS2, which acts on two arbitrary memory locations. Experience suggested that current operations, typically single-word compare-and-swap (CAS1), are not expressive enough to be used alone in an efficient manner. In this paper we build CAS2 from CAS1 and, in fact, build an arbitrary multi-word compare-and-swap (CASN). Our design requires only the primitives available on contemporary systems, reserves a small and constant amount of space in each word updated (either 0 or 2 bits) and permits nonoverlapping updates to occur concurrently. This provides compelling evidence that current primitives are not only universal in the theoretical sense introduced by Herlihy, but are also universal in their use as foundations for practical algorithms. This provides a straightforward mechanism for deploying many of the interesting non-blocking data structures presented in the literature that have previously required CAS2.
concurrency  non-blocking 
october 2017 by mpm
BwTree is a general purpose, concurrent and lock-free B+-Tree index. It allows for multiple threads modifying and/or querying the tree concurrently without corrupting the tree or giving inconsistent results. However, BwTree only guarantees atomicity of operations and is not a concurrency control agent. If both atomicity and isolation are required, an extra concurrency control layer must be implemented
datastructure  non-blocking  concurrency 
march 2017 by mpm
Nonblocking Concurrent Data Structures with Condition Synchronization
We use the term dual data structure to describe a concurrent object implementation that may hold both data and reservations (registered requests). By reasoning separately about a request, its successful follow-up, and the period in-between, we obtain meaningful definitions of nonblocking dual data structures. As concrete examples, we present lock-free dualstacks and dualqueues, and experimentally compare their performance with that of lock-based and nonblocking alternatives.
concurrency  datastructure  non-blocking 
february 2017 by mpm
CRTurn Queue
The first MPMC memory-unbounded wait-free queue with memory reclamation
concurrency  c++  non-blocking 
november 2016 by mpm
Fast Non-intrusive Memory Reclamation for Highly-Concurrent Data Structures
This paper proposes three novel ways to alleviate the costs of the memory barriers associated with hazard pointers and related techniques. These new proposals are backward-compatible with existing code that uses hazard pointers. They move the cost of memory management from the principal code path to the infrequent memory reclamation procedure, significantly reducing or eliminating memory barriers executed on the principal code path.
memory  performance  non-blocking 
may 2016 by mpm
Split-Ordered Lists: Lock-Free Extensible Hash Tables
We present the first lock-free implementation of an extensible hash table running on current architectures. Our algorithm provides concurrent insert, delete, and find operations with an expected O(1) cost. It consists of very simple code, easily implementable using only load, store, and compare-and-swap operations. The new mathematical structure at the core of our algorithm is recursive split-ordering, a way of ordering elements in a linked list so that they can be repeatedly “split” using a single compare-and-swap operation. Metaphorically speaking, our algorithm differs from prior known algorithms in that extensibility is derived by “moving the buckets among the items” rather than “the items among the buckets.” Though lock-free algorithms are expected to work best in multiprogrammed environments, empirical tests we conducted on a large shared memory multiprocessor show that even in non-multiprogrammed environments, the new algorithm performs as well as the most efficient known lock-based resizable hash-table algorithm, and in high load cases it significantly outperforms it.
datastructure  c++  memory  non-blocking 
december 2015 by mpm
Replacing Competition with Cooperation to Achieve Scalable Lock-Free FIFO Queues
In order to guarantee that each method of a data structure updates the logical state exactly once, almost all non-blocking implementations employ Compare-And-Swap (CAS) based synchronization. For FIFO queue implementations this translates into concurrent enqueue or dequeue methods competing among themselves to update the same variable, the tail or the head, respectively, leading to high contention and poor scalability. Recent non-blocking queue implementations try to alleviate high contention by increasing the number of contention points, all the while using CAS-based synchronization. Furthermore, obtaining a wait-free implementation with competition is achieved by additional synchronization which leads to further degradation of performance. In this paper we formalize the notion of competitiveness of a synchronizing statement which can be used as a measure for the scalability of concurrent implementations. We present a new queue implementation, the Speculative Pairing (SP) queue, which, as we show, decreases competitiveness by using Fetch-And-Increment (FAI) instead of CAS. We prove that the SP queue is linearizable and lock-free. We also show that replacing CAS with FAI leads to wait-freedom for dequeue methods without an adverse effect on performance. In fact, our experiments suggest that the SP queue can perform and scale better than the state-of-the-art queue implementations.
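The competition-vs-cooperation distinction is easy to see in isolation. This illustrative C++ fragment (not the SP queue itself) claims slots from a shared tail counter both ways: with CAS, losers must retry; with fetch-and-add, every caller succeeds in a single instruction.

```cpp
#include <atomic>
#include <cstddef>

// Shared position counter that concurrent enqueuers contend on.
std::atomic<size_t> tail{0};

// "Competitive": under contention, only one CAS wins per round and the
// rest loop, so total work grows with the number of competitors.
size_t claim_slot_cas() {
    size_t t = tail.load(std::memory_order_relaxed);
    while (!tail.compare_exchange_weak(t, t + 1, std::memory_order_acq_rel))
        ;   // t is refreshed with the current tail on each failure
    return t;
}

// "Cooperative": fetch-and-add hands every caller a distinct slot in
// exactly one atomic step, regardless of contention.
size_t claim_slot_faa() {
    return tail.fetch_add(1, std::memory_order_acq_rel);
}
```

The paper's contribution is making a full queue linearizable and lock-free on top of the cooperative primitive; this fragment shows only why the primitive scales better.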
may 2015 by mpm
Comparative Performance of Memory Reclamation Strategies for Lock-free and Concurrently-readable Data Structures
We compare the costs of three memory reclamation strategies: quiescent-state-based reclamation, epoch-based reclamation, and safe memory reclamation. Our experiments show that changing the workload or execution environment can change which of these schemes is the most efficient.
memory  concurrency  non-blocking 
march 2015 by mpm
Fast Bounded-Concurrency Hash Tables
This article introduces a general technique for achieving single-writer many-reader lock-free and wait-free hash tables at low to negligible cost. The resulting hash table requires no barriers (fences) or locked instructions on architectures such as x86/x86-64. Read operations are lock-free and write operations are wait-free.
concurrency  non-blocking 
march 2015 by mpm
Java Concurrency Tools for the JVM. This project aims to offer some concurrent data structures currently missing from the JDK:
concurrency  java  data-structures  non-blocking 
january 2015 by mpm
A fast multiple-producer, multi-consumer lock-free concurrent queue for C++11
c++  concurrency  non-blocking 
november 2014 by mpm
Nah Lock: A Lock-Free Memory Allocator
The first allocator had poor scaling on par with libc, but I learned enough from it to write a second lock-free allocator that scales approximately linearly up to 30 cores. It scales sublinearly, but slightly better than tcmalloc, up to 64 cores.
memory  performance  non-blocking 
november 2014 by mpm
The Purpose of memory_order_consume in C++11
memory_order_consume is probably the least well-understood. It’s the most complicated ordering constraint, and it offers the least reward for using it correctly. Nonetheless, there it is, tempting the curious programmer to make sense of it – if only to unlock its dark, mysterious secrets. That’s exactly what this post aims to do.
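A minimal sketch of the pattern in question (illustrative names, and note that mainstream compilers currently promote consume to acquire): the consumer's ordering guarantee covers only loads that carry a data dependency on the consumed pointer.

```cpp
#include <atomic>

struct Payload { int value; };

std::atomic<Payload*> g_ptr{nullptr};

// Producer: initialize the payload, then publish the pointer with release,
// so its contents are visible to any thread that consumes the pointer.
void publish(Payload* p) {
    g_ptr.store(p, std::memory_order_release);
}

// Consumer: the dereference p->value is data-dependent on the consumed
// load, so it is ordered after it without paying for a full acquire
// barrier on weakly ordered hardware.
int consume_read() {
    Payload* p = g_ptr.load(std::memory_order_consume);
    return p ? p->value : -1;
}
```

Reading an unrelated variable after the consume load would get no such guarantee, which is the subtlety that makes consume hard to use correctly.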
c++  concurrency  non-blocking 
july 2014 by mpm
Are Lock-Free Concurrent Algorithms Practically Wait-Free?
Lock-free concurrent algorithms guarantee that some concurrent operation will always make progress in a finite number of steps. Yet programmers prefer to treat concurrent code as if it were wait-free, guaranteeing that all operations always make progress. Unfortunately, designing wait-free algorithms is generally a very complex task, and the resulting algorithms are not always efficient. While obtaining efficient wait-free algorithms has been a long-time goal for the theory community, most non-blocking commercial code is only lock-free. This paper suggests a simple solution to this problem. We show that, for a large class of lock-free algorithms, under scheduling conditions which approximate those found in commercial hardware architectures, lock-free algorithms behave as if they are wait-free. In other words, programmers can keep on designing simple lock-free algorithms instead of complex wait-free ones, and in practice, they will get wait-free progress. Our main contribution is a new way of analyzing a general class of lock-free algorithms under a stochastic scheduler. Our analysis relates the individual performance of processes with the global performance of the system using Markov chain lifting between a complex per-process chain and a simpler system progress chain. We show that lock-free algorithms are not only wait-free with probability 1, but that in fact a general subset of lock-free algorithms can be closely bounded in terms of the average number of steps required until an operation completes. To the best of our knowledge, this is the first attempt to analyze progress conditions, typically stated in relation to a worst case adversary, in a stochastic model capturing their expected asymptotic behavior.
concurrency  non-blocking 
november 2013 by mpm
Is Parallel Programming Hard, And, If So, What Can You Do About It?
The purpose of this book is to help you understand how to program shared-memory parallel machines without risking your sanity
concurrency  book  non-blocking 
september 2013 by mpm
Inside HyperLevelDB
While stock LevelDB is an excellent foundation, our experience with HyperDex identified several opportunities for further performance improvements. This article describes the changes we've made to LevelDB to meet HyperDex clients' demanding needs
database  io  concurrency  storage  non-blocking 
june 2013 by mpm
Nonblocking Algorithms and Scalable Multicore Programming
Nonblocking synchronization may be used in building predictable and resilient systems while avoiding the problems of lock-based synchronization. This class of synchronization is not a silver bullet, especially outside of the abstract models in which its performance properties were originally defined. Several of the performance bottlenecks and error conditions associated with lock-based synchronization remain (albeit in more obfuscated forms); therefore, ensuring correctness requires more complex verification methods, and in some cases nonblocking algorithms require the adoption of complex support systems. These complexities frequently make nonblocking synchronization a victim of hype, fear, uncertainty, and doubt. This article aims to equip the reader with the knowledge needed to identify situations that benefit from nonblocking synchronization.
concurrency  non-blocking 
june 2013 by mpm
Proving the Correctness of Nonblocking Data Structures
This article looks at what makes nonblocking data structure design and implementation tricky, and it surveys modeling techniques and verification tools that can provide valuable feedback on the correctness of those algorithms
concurrency  non-blocking 
june 2013 by mpm
Structured Deferral: Synchronization via Procrastination
Developers often take a proactive approach to software design, especially those from cultures valuing industriousness over procrastination. Lazy approaches, however, have proven their value, with examples including reference counting, garbage collection, and lazy evaluation. This structured deferral takes the form of synchronization via procrastination, specifically reference counting, hazard pointers, and RCU (read-copy-update).
concurrency  non-blocking 
may 2013 by mpm
Lightweight Contention Management for Efficient Compare-and-Swap Operations
Many concurrent data-structure implementations use the well-known compare-and-swap (CAS) operation, supported in hardware by most modern multiprocessor architectures for inter-thread synchronization. A key weakness of the CAS operation is the degradation in its performance in the presence of memory contention. In this work we study the following question: can software-based contention management improve the efficiency of hardware-provided CAS operations? Our performance evaluation establishes that lightweight contention management support can greatly improve performance under medium and high contention levels while typically incurring only small overhead when contention is low.
concurrency  non-blocking 
may 2013 by mpm
Mintomic (short for "minimal atomic") is an API for low-level lock-free programming in C and C++.
concurrency  c++  memory  non-blocking 
may 2013 by mpm
This package provides semi-portable access to hardware provided atomic memory operations
memory  non-blocking 
march 2013 by mpm
Using elimination to implement scalable and lock-free fifo queues
This paper shows for the first time that elimination, a scaling technique formerly applied only to counters and LIFO structures, can be applied to FIFO data structures, specifically, to linearizable FIFO queues. We show how to transform existing nonscalable FIFO queue implementations into scalable implementations using the elimination technique, while preserving lock-freedom and linearizability. We apply our transformation to the FIFO queue algorithm of Michael and Scott, which is included in the Java Concurrency Package. Empirical evaluation on a state-of-the-art CMT multiprocessor chip shows that by using elimination as a backoff technique for the Michael and Scott queue algorithm, we can achieve comparable performance at low loads, and improved scalability as load increases.
march 2013 by mpm
Non-blocking Patricia Tries with Replace Operations
This paper presents a non-blocking Patricia trie implementation for an asynchronous shared-memory system using Compare&Swap
datastructure  non-blocking 
march 2013 by mpm
Google Concurrency Library for C++
c++  concurrency  non-blocking 
march 2013 by mpm
Systems Programming: Coping with Parallelism
Creating operating system components and subsystems in today's large processors generally requires dealing with more than one CPU operating in parallel against a shared memory. While "applications" are typically shielded from the effects of parallelism, components and subsystems usually are designed in such a way that some level of understanding is required. This paper concentrates on the pitfalls awaiting the programmer in a parallel (or even a multiprogramming) environment when shared data structures (control blocks) are referenced and altered by multiple processes (tasks). The focus is on the IBM System/370 architecture because of its multiple-CPU architecture and the powerful "compare and swap" instruction. The paper reviews some architectural ground rules that a parallel programmer must understand, presents problems that must often be solved in a parallel environment, then describes solutions such as the use of compare and swap, locks, and single-process schemes. Kernels of code are used to illustrate problems and solutions
concurrency  datastructure  non-blocking 
february 2013 by mpm
A Lock-Free Algorithm for Concurrent Bags
A lock-free bag data structure supporting unordered buffering is presented in this paper. The algorithm supports multiple producers and multiple consumers, as well as dynamic collection sizes. To handle concurrency efficiently, the algorithm was designed to thrive for disjoint-access-parallelism for the supported semantics. Therefore, the algorithm exploits a distributed design combined with novel techniques for handling concurrent modifications of linked lists using double marks, detection of total emptiness, and efficient memory management with hazard pointer handover. Experiments on a 24-way multi-core platform show significantly better performance for the new algorithm compared to previous algorithms of relevance.
concurrency  datastructure  non-blocking 
february 2013 by mpm
A Scalable Lock-free Stack Algorithm
The literature describes two high performance concurrent stack algorithms based on combining funnels and elimination trees. Unfortunately, the funnels are linearizable but blocking, and the elimination trees are non-blocking but not linearizable. Neither is used in practice since they perform well only at exceptionally high loads. The literature also describes a simple lock-free linearizable stack algorithm that works at low loads but does not scale as the load increases. The question of designing a stack algorithm that is non-blocking, linearizable, and scales well throughout the concurrency range, has thus remained open. This paper presents such a concurrent stack algorithm. It is based on the following simple observation: that a single elimination array used as a backoff scheme for a simple lock-free stack is lock-free, linearizable, and scalable. As our empirical results show, the resulting elimination-backoff stack performs as well as the simple stack at low loads, and increasingly outperforms all other methods (lock-based and non-blocking) as concurrency increases. We believe its simplicity and scalability make it a viable practical alternative to existing constructions for implementing concurrent stacks.
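The "simple lock-free stack" serving as the base here is the Treiber stack. A hedged C++ sketch (memory reclamation and ABA handling omitted), with comments marking where an elimination array would hook in as the backoff scheme:

```cpp
#include <atomic>

// Intrusive node: callers own the nodes; no allocation inside the stack.
struct Node {
    int value;
    Node* next;
};

// Treiber stack: a single CAS on the head for both push and pop.
class TreiberStack {
    std::atomic<Node*> head_{nullptr};
public:
    void push(Node* n) {
        n->next = head_.load(std::memory_order_relaxed);
        while (!head_.compare_exchange_weak(n->next, n,
                                            std::memory_order_release,
                                            std::memory_order_relaxed))
            ;   // CAS failure means contention: this is where the paper
                // would divert to the elimination array instead of spinning
    }

    Node* pop() {
        Node* h = head_.load(std::memory_order_acquire);
        while (h && !head_.compare_exchange_weak(h, h->next,
                                                 std::memory_order_acq_rel,
                                                 std::memory_order_acquire))
            ;   // likewise, a failed pop could try to eliminate against a
                // concurrent push and exchange values without touching head
        return h;
    }
};
```

The observation the abstract makes is that a push/pop pair that meets in the elimination array can cancel out without ever touching the head, which is what restores scalability under load.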
concurrency  datastructure  non-blocking 
february 2013 by mpm
FastFlow Queue
FastFlow is a parallel programming framework for multicore platforms, which uses hybrid single-producer/single-consumer queues as a base underlying component for inter-thread communication
concurrency  datastructure  non-blocking 
february 2013 by mpm
Amino CBBs - Concurrent Building Blocks
Project provides a set of concurrent building blocks (Java & C/C++) that can be used to develop parallel/multi-threaded applications
concurrency  c++  non-blocking 
february 2013 by mpm
Lock-free multithreaded memory allocation
c++  memory  non-blocking 
january 2013 by mpm
Nonblocking memory management support for dynamic-sized data structures
Conventional dynamic memory management methods interact poorly with lock-free synchronization. In this article, we introduce novel techniques that allow lock-free data structures to allocate and free memory dynamically using any thread-safe memory management library
concurrency  memory  non-blocking 
july 2012 by mpm
Non-intrusive MPSC node-based queue
+ Wait-free and fast producers. One XCHG is the best one can do for a non-distributed multi-producer queue.
+ Extremely fast consumer. On the fast path it is atomic-free; one XCHG is executed per node batch to grab the 'last item'.
+ No need for node order reversal, so the pop operation is always O(1).
+ ABA-free.
+ No need for PDR. That is, one can use this algorithm out of the box: no thread registration/deregistration, periodic activity, deferred garbage, etc.
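The properties above can be approximated in portable C++. This is an illustrative Vyukov-style sketch (std::atomic exchange standing in for XCHG), not the original code: producers do one exchange on the tail, and the single consumer walks next pointers from a stub node.

```cpp
#include <atomic>

// Intrusive node: the queue links caller-owned nodes, no allocation.
struct MpscNode {
    std::atomic<MpscNode*> next{nullptr};
};

class MpscQueue {
    MpscNode stub_;                    // dummy node so the list is never empty
    std::atomic<MpscNode*> tail_;
    MpscNode* head_;                   // consumer-private, no atomics needed
public:
    MpscQueue() : tail_(&stub_), head_(&stub_) {}

    // Multi-producer, wait-free: one XCHG, then link the predecessor.
    void push(MpscNode* n) {
        n->next.store(nullptr, std::memory_order_relaxed);
        MpscNode* prev = tail_.exchange(n, std::memory_order_acq_rel);
        prev->next.store(n, std::memory_order_release);
    }

    // Single consumer only. Returns nullptr when empty or when a producer
    // is mid-push (its node becomes visible once prev->next is linked).
    MpscNode* pop() {
        MpscNode* head = head_;
        MpscNode* next = head->next.load(std::memory_order_acquire);
        if (head == &stub_) {          // skip over the stub
            if (!next) return nullptr;
            head_ = next;
            head = next;
            next = head->next.load(std::memory_order_acquire);
        }
        if (next) { head_ = next; return head; }
        if (head != tail_.load(std::memory_order_acquire))
            return nullptr;            // producer mid-push; retry later
        push(&stub_);                  // re-insert stub so head gets a next
        next = head->next.load(std::memory_order_acquire);
        if (next) { head_ = next; return head; }
        return nullptr;
    }
};
```

Note the consumer's fast path (the `if (next)` branch) is indeed free of atomic RMW operations, matching the second bullet above.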
datastructure  non-blocking 
march 2012 by mpm
Single-Producer/Single-Consumer Queue
Unbounded single-producer/single-consumer node-based queue
datastructure  non-blocking 
march 2012 by mpm
A concurrent hash-trie or Ctrie is a concurrent thread-safe lock-free implementation of a hash array mapped trie. It is used to implement the concurrent map abstraction. It has particularly scalable concurrent insert and remove operations and is memory-efficient. It is the first known concurrent data-structure that supports O(1), atomic, lock-free snapshots
concurrency  datastructure  non-blocking 
february 2012 by mpm
Lockless Data Structures
In this article, the second in the series, we're going to take a look at some lockless data structures. These structures allow concurrent access, without the need for explicit locking, and can be significantly faster.
concurrency  non-blocking 
august 2011 by mpm
Thinking about how modern CPUs work, something we like to call “mechanical sympathy”, using good design practices with a strong focus on teasing apart the concerns, we came up with a data structure and a pattern of use that we have called the Disruptor
concurrency  performance  java  non-blocking 
june 2011 by mpm