mpm + consensus   37

Designing an Efficient Replicated Log Store with Consensus Protocol
Highly available and high-performance message logging system is critical building block for various use cases that require global ordering, especially for deterministic distributed transactions. To achieve availability, we maintain multiple replicas that have the same payloads in exactly the same order. This introduces various challenging issues such as consistency between replicas after failure, while minimizing performance degradation. Replicated state machine-based consensus protocols are the most suitable candidates to fulfill those requirements, but double-write problem and different logging granularity make it hard to keep the system efficient. This paper suggests a novel way to build a replicated log store on top of Raft consensus protocol, aiming at providing the same level of consistency as well as fault-tolerance without sacrificing the throughput of the system.
consensus  consistency  replication  storage  database 
4 weeks ago by mpm
Same as the other ones but Heidi Howard so ¯\_(ツ)_/¯
8 weeks ago by mpm
A Generalised Solution to Distributed Consensus
Distributed consensus, the ability to reach agreement in the face of failures and asynchrony, is a fundamental primitive for constructing reliable distributed systems from unreliable components. The Paxos algorithm is synonymous with distributed consensus, yet it performs poorly in practice and is famously difficult to understand. In this paper, we re-examine the foundations of distributed consensus. We derive an abstract solution to consensus, which utilises immutable state for intuitive reasoning about safety. We prove that our abstract solution generalises over Paxos as well as the Fast Paxos and Flexible Paxos algorithms. The surprising result of this analysis is a substantial weakening to the quorum requirements of these widely studied algorithms.
8 weeks ago by mpm
Partitioned consensus and its impact on Spanner’s latency
I noticed that there appears to be a general assumption amongst my readers that unified consensus systems must have higher latency than partitioned consensus systems. In this post, I want to clear up some of the misconceptions and inaccurate assumptions around these latency tradeoffs, and present a deeper (and technical) analysis on how these different approaches to consensus have surprisingly broad consequences on transaction latency. We will analyze the latency tradeoff from three perspectives: (1) Latency for write transactions, (2) Latency for linearizable read-only transactions and (3) Latency for serializable snapshot transactions.
consensus  latency  consistency 
december 2018 by mpm
Liberating Distributed Consensus
Three example algorithms which achieve consensus in 1 round trip and tolerate any minority failure
paxos  consistency  consensus 
november 2018 by mpm
CASPaxos: Replicated State Machines without logs
CASPaxos is a replicated state machine (RSM) protocol, an extension of Synod. Unlike Raft and Multi-Paxos, it doesn't use leader election and log replication, thus avoiding associated complexity. Its symmetric peer-to-peer approach achieves optimal commit latency in wide-area networks and doesn't cause transient unavailability when any (N−1)/2 of N nodes crash. The lightweight nature of CASPaxos allows new combinations of RSMs in the designs of distributed systems. For example, a representation of a key-value storage as a hashtable with independent RSM per key increases fault tolerance and improves performance on multi-core systems compared with a hashtable behind a single RSM. This paper describes CASPaxos protocol, formally proves its safety properties, covers cluster membership change and evaluates the benefits of a CASPaxos-based key-value storage.
paxos  consensus 
february 2018 by mpm
Seamless Paxos Coordinators
The Paxos algorithm requires a single correct coordinator process to operate. After a failure, the replacement of the coordinator may lead to a temporary unavailability of the application implemented atop Paxos. So far, this unavailability has been addressed by reducing the coordinator replacement rate through the use of stable coordinator selection algorithms. We have observed that the cost of recovery of the newly elected coordinator's state is at the core of this unavailability problem. In this paper we present a new technique to manage coordinator replacement that allows the recovery to occur concurrently with new consensus rounds. Experimental results show that our seamless approach effectively solves the temporary unavailability problem, its adoption entails uninterrupted execution of the application. Our solution removes the restriction that the occurrence of coordinator replacements is something to be avoided, allowing the decoupling of the application execution from the accuracy of the mechanism used to choose a coordinator. This result increases the performance of the application even in the presence of failures, it is of special importance to the autonomous operation of replicated applications that have to adapt to varying network conditions and partial failures.
paxos  consensus  protocol 
december 2017 by mpm
Tendermint is like Apache Web Server for distributed ledgers. It handles things like p2p networking, consensus, transaction broadcasting, etc. It’s agnostic to any business logic process of transactions; to Tendermint they look like binary bytes. You plug in a program that processes the raw transaction bytes. Once a network of validators agree on a block and commit it, the transactions get pushed into the application
p2p  broadcast  consensus  transactions 
september 2017 by mpm
Revisiting the Relationship Between Non-blocking Atomic Commit and Consensus
This paper discusses the relationship between the Non-Blocking Atomic Commitment problem (NB-AC) and the Consensus problem in asynchronous systems with unreliable failure detectors. We first confirm that NB-AC is harder than Consensus. In contrast to Consensus, NB-AC is impossible to solve with unreliable failure detectors even with a single crash failure. We define a weaker problem than NB-AC, called Non-Blocking Weak Atomic Commitment (NB-WAC), which is sufficient to solve for most practical situations. A fundamental characteristic of NB-WAC is its reducibility to Consensus. The previous results on solving Consensus with unreliable failure detectors apply therefore to NB-WAC. An interesting intermediate result of this reducibility is that Uniform Consensus and Consensus are equivalent problems. We show actually that any algorithm that solves Consensus with unreliable failure detectors also solves Uniform Consensus
consensus  transactions 
may 2017 by mpm
In search of a simple consensus algorithm
In this post: (1) covered an availability limitation of the Raft protocol (2) demonstrated that modern implementations of Raft are subject to it (3) described an existing simpler approach to the problem of consensus (4) showed that its toy 500-lines implementation has performance similar to Etcd but doesn't suffer from Raft's performance penalty
consensus  paxos  availability  actors 
april 2017 by mpm
Spanner vs. Calvin
I found it very difficult to find cases where an ideal implementation of Spanner theoretically outperforms an ideal implementation of Calvin.
consistency  consensus  database  performance  scalability 
april 2017 by mpm
Multileader WAN Paxos: Ruling the Archipelago with Fast Consensus
We present WPaxos, a multileader wide area network (WAN) Paxos protocol, that achieves low-latency high-throughput consensus across WAN deployments. WPaxos dynamically partitions the global object-space across multiple concurrent leaders that are deployed strategically using flexible quorums. This partitioning and emphasis on local operations allow our protocol to significantly outperform leaderless approaches, such as EPaxos, while maintaining the same consistency guarantees. Unlike statically partitioned multiple Paxos deployments, WPaxos adapts dynamically to the changing access locality through adaptive object stealing. The ability to quickly react to changing access locality not only speeds up the protocol, but also enables support for mini-transactions
paxos  consensus  scalability 
april 2017 by mpm
Efficient and Modular Consensus-Free Reconfiguration for Fault-Tolerant Storage
Quorum systems are useful tools for implementing consistent and available storage in the presence of failures. These systems usually comprise a static set of servers that provide a fault-tolerant read/write register accessed by a set of clients. We consider a dynamic variant of these systems and propose FreeStore, a set of fault-tolerant protocols that emulates a register in dynamic asynchronous systems in which processes are able to join/leave the servers set during the execution. These protoco...
consensus  replication  storage  fault-tolerance 
july 2016 by mpm
DistributedLog (DL) is a high-performance, replicated log service, offering durability, replication and strong consistency as essentials for building reliable distributed systems
may 2016 by mpm
Estensible Distributed Coordination
Most services inside a data center are distributed systems requiring coordination and synchronization in the form of primitives like distributed locks and message queues. We argue that extensibility is a crucial feature of the coordination infrastructures used in these systems. Without the ability to extend the functionality of coordination services, applications might end up using sub-optimal coordination algorithms, possibly leading to low performance. Adding extensibility, however, requires mechanisms that constrain extensions to be able to make reasonable security and performance guarantees. We propose a scheme that enables extensions to be introduced and removed dynamically in a secure way. To avoid performance overheads due to poorly designed extensions, it constrains the access of extensions to resources. Evaluation results for extensible versions of ZooKeeper and DepSpace show that it is possible to increase the throughput of a distributed queue by more than an order of magnitude (17x for ZooKeeper, 24x for DepSpace) while keeping the underlying coordination kernel small
consensus  coordination 
may 2015 by mpm
An Efficient Read Dominant Data Replication Protocol under Serial Isolation using Quorum Consensus Approach
In distributed systems, data replication provides better availability, higher read capacity, improved access efficiency and lower bandwidth requirements in the system. In this paper, we propose a significantly efficient approach of the data replication for serial isolation by using newly proposed Circular quorum systems. This paper has three major contributions. First, we have proposed the Circular quorum systems that generalize the various existing quorum systems, such as Read-one-write-all (ROWA) quorum systems, Majority quorum systems, Grid quorum systems, Diamond quorum systems, D-Space quorum systems, Multi-dimensional-grid quorum systems and Generalized-grid quorum systems. Second, Circular quorum systems not only generalizes but also improves the performance over existing quorum systems of their category. Third, we proposed a highly available Circular quorum consensus protocol for data replication under serial isolation level that uses a suitable Circular quorum system for read dominant scenario
consensus  performance  replication 
july 2014 by mpm
Ark: A Real-World Consensus Implementation
Ark is an implementation of a consensus algorithm similar to Paxos and Raft, designed as an improvement over the existing consensus algorithm used by MongoDB and TokuMX. Ark was designed from first principles, improving on the election algorithm used by TokuMX, to fix deficiencies in MongoDB's consensus algorithms that can cause data loss. It ultimately has many similarities with Raft, but diverges in a few ways, mainly to support other features like chained replication and unacknowledged writes
july 2014 by mpm
ARC: Analysis of Raft Consensus
The Paxos algorithm, despite being synonymous with distributed consensus for a decade, is famously difficult to reason about and implement due to its non-intuitive approach and underspecification. In response, this project implemented and evaluated a framework for constructing fault-tolerant applications, utilising the recently proposed Raft algorithm for distributed consensus. Constructing a simulation framework for our implementation enabled us to evaluate the protocol on everything from understandability and efficiency to correctness and performance in diverse network environments. We propose a range of optimisations to the protocol and released to the community a testbed for developing further optimisations and investigating optimal protocol parameters for real-world deployments
paxos  consensus 
july 2014 by mpm
Optimistic Parallel State-Machine Replication
State-machine replication, a fundamental approach to fault tolerance, requires replicas to execute commands deterministically, which usually results in sequential execution of commands. Sequential execution limits performance and underuses servers, which are increasingly parallel (i.e., multicore). To narrow the gap between state-machine replication requirements and the characteristics of modern servers, researchers have recently come up with alternative execution models. This paper surveys existing approaches to parallel state-machine replication and proposes a novel optimistic protocol that inherits the scalable features of previous techniques. Using a replicated B+-tree service, we demonstrate in the paper that our protocol outperforms the most efficient techniques by a factor of 2.4 times.
replication  performance  consensus 
june 2014 by mpm
Viewstamped Replication Revisited
This paper presents an updated version of Viewstamped Replication, a replication technique that handles failures in which nodes crash. It describes how client requests are handled, how the group reorganizes when a replica fails, and how a failed replica is able to rejoin the group. The paper also describes a number of important optimizations and presents a protocol for handling reconfigurations that can change both the group membership and the number of failures the group is able to handle.
consensus  consistency  fault-tolerance  availability 
june 2013 by mpm
Generalized Lattice Agreement
Unlike consensus, which is impossible in the presence of even a single process failure, lattice agreement has been shown to be decidable in the presence of failures. In this paper, we consider lattice agreement problems in asynchronous, message passing systems. We present an algorithm for the lattice agreement problem that guarantees liveness as long as a majority of the processes are non-faulty
january 2013 by mpm
Dynamic Voting for Consistent Primary Components
The dynamic voting paradigm allows such systems to define quorums adaptively, accounting for the changes in the set of participants. Rrrthermore, dynamic voting was proven to be the most available paradigm for maintaining quorums in unreliable networks. However, the subtleties of implementing dynamic voting were not well understood; in fact, many of the suggested protocols may lead to inconsistencies in caseof failurs.
consensus  consistency  fault-tolerance 
december 2012 by mpm
SolidFrame is a C++ framework intending to offer everything one need to build powerful, distributed, highly-scalable, client-server applications. It is designed and implemented for speed and power, with great interest on the ease of use by the developers
c++  networking  paxos  consensus 
april 2012 by mpm
Boxwood: Abstractions as the Foundation for Storage Infrastructure
We have built a system called Boxwood to explore the feasibility and utility of providing high-level abstractions or data structures as the fundamental storage infrastructure
distributed  consensus  datastructure  storage  fault-tolerance 
march 2012 by mpm
Paxos Made Moderately Complex
This paper provides imperative pseudo-code for the full Paxos (or Multi-Paxos) protocol without shying away from discussing various implementation details.
paxos  consistency  consensus  distributed  fault-tolerance  leader-election 
march 2012 by mpm
The SMART way to migrate replicated stateful services
This paper describes SMART, a new technique for changing the set of machines where such a service runs, i.e., migrating the service
fault-tolerance  availability  consensus  consistency 
march 2012 by mpm
Lower Bounds for Asynchronous Consensus
Impossibility results and best-case lower bounds are proved for the number of message delays and the number of processes required to reach agreement in an asynchronous consensus algorithm that tolerates non-Byzantine failure
consensus  availability  consistency  distributed  fault-tolerance 
march 2012 by mpm
Doozer is a highly-available, completely consistent store for small amounts of extremely important data. When the data changes, it can notify connected clients immediately (no polling), making it ideal for infrequently-updated data for which clients want real-time updates. Doozer is good for name service, database master elections, and configuration data shared between several machines
distributed  coordination  consistency  consensus  availability  go 
april 2011 by mpm
riak_zab is an extension for riak_core that provides totally ordered atomic broadcast capabilities. This is accomplished through a pure Erlang implementation of Zab, the Zookeeper Atomic Broadcast protocol invented by Yahoo! Research
erlang  coordination  consistency  consensus  leader-election 
april 2011 by mpm
Optimistic Replication
surveys optimistic replication algorithms that allow replica contents to diverge in the short term, in order to support concurrent work practices and to tolerate failures in low-quality communication links
distributed  base  consensus 
february 2009 by mpm
high available and reliable coordination system
distributed  consensus  scalability 
july 2008 by mpm
Leader Election in Distributed Systems with Crash Failures
Garcia-Molina's Bully Algorithm is a classic solution to leader election in synchronous systems with crash failures. This paper shows that the Bully Algorithm can be easily adapted for use in asynchronous systems
distributed  consensus 
may 2008 by mpm
Paxos Made Live
our experience building a fault-tolerant data-base using the Paxos consensus algorithm
paxos  consensus 
october 2007 by mpm
The Byzantine Generals Problem
This article presents the algorithm that solves the Byzantine General’s Problem, as first described by Lamport, Pease, and Shostak in 1982
concurrency  consensus  byzantine 
august 2007 by mpm
A brief history of Consensus, 2PC and Transaction Commit
Reading the literature on consensus is difficult because the language changes (consensus was originally called agreement), the results come in an order that isn't logical, and the whole framework for describing distributed algorithms evolved in parallel with the work
consensus  transactions 
june 2007 by mpm

Copy this bookmark: