jm + distributed-systems   21

slides from "Distributed Log-Processing Design Workshop", SRECon Americas 2018
Fantastic presentation discussing the kinds of design criteria used when architecting a large-scale data processing and storage service. Interesting to see some Google terminology, e.g. "dimensioning" -- ballparking the expected scalability numbers, bandwidth, qps, and limits.
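For a rough sense of what that dimensioning arithmetic looks like, here's a tiny back-of-envelope sketch -- every input number below is a made-up assumption for illustration, not a figure from the talk:

    # Back-of-envelope "dimensioning": hypothetical inputs, real arithmetic.
    events_per_day = 10_000_000_000   # assumed ingest volume
    bytes_per_event = 500             # assumed average log record size
    peak_to_mean = 3                  # assumed peak-to-average traffic ratio
    retention_days = 30               # assumed retention period
    replication = 3                   # assumed storage replication factor

    mean_qps = events_per_day / 86_400
    peak_qps = mean_qps * peak_to_mean
    peak_ingest_mb_per_s = peak_qps * bytes_per_event / 1e6
    stored_tb = events_per_day * bytes_per_event * retention_days * replication / 1e12

    print(f"mean qps:    {mean_qps:,.0f}")                    # ~116,000
    print(f"peak qps:    {peak_qps:,.0f}")                    # ~347,000
    print(f"peak ingest: {peak_ingest_mb_per_s:,.0f} MB/s")   # ~174 MB/s
    print(f"storage:     {stored_tb:,.0f} TB (30d, 3x replicated)")  # 450 TB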
distributed-systems  coding  design  architecture  google  photon  logs  log-storage  slides  srecon 
4 weeks ago by jm
Elasticsearch and data loss
"@alexbfree @ThijsFeryn [ElasticSearch is] fine as long as data loss is acceptable. https://aphyr.com/posts/317-call-me-maybe-elasticsearch . We lose ~1% of all writes on average."
elasticsearch  data-loss  reliability  data  search  aphyr  jepsen  testing  distributed-systems  ops 
october 2015 by jm
Henry Robinson on testing and fault discovery in distributed systems

'Let's talk about finding bugs in distributed systems for a bit.
These chaos monkey-style fault testing systems are all well and good, but by being application independent they're a very blunt instrument.
Particularly they make it hard to search the fault space for bugs in a directed manner, because they don't 'know' what the system is doing.
Application-aware scripting of faults in dist. systems seems to be rarely used, but allows you to directly stress problem areas.
For example, if a bug manifests itself only when one RPC returns after some timeout, hard to narrow that down with iptables manipulation.
But allow a script to hook into RPC invocations (and other trace points, like DTrace's probes), and you can script very specific faults.
That way you can simulate cross-system integration failures, *and* write reproducible tests for the bugs they expose!
Anyhow, I've been doing this in Impala, and it's been very helpful. Haven't seen much evidence elsewhere.'
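A minimal sketch of what such an application-aware fault hook might look like -- the FaultInjector class and the ReportExecStatus RPC name below are invented for illustration, not Impala's actual mechanism:

    import time

    class FaultInjector:
        """Hypothetical trace-point hook: tests register faults keyed by RPC name."""
        def __init__(self):
            self.faults = {}   # rpc_name -> callable run before the real RPC

        def register(self, rpc_name, fault):
            self.faults[rpc_name] = fault

        def call(self, rpc_name, rpc_fn, *args, **kwargs):
            fault = self.faults.get(rpc_name)
            if fault:
                fault()        # e.g. sleep past a timeout, or raise an error
            return rpc_fn(*args, **kwargs)

    # Test script: delay one specific RPC until after the caller's timeout --
    # the kind of narrow, reproducible fault that's hard to hit with iptables.
    injector = FaultInjector()
    injector.register("ReportExecStatus", lambda: time.sleep(5.0))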
henry-robinson  testing  fault-discovery  rpc  dtrace  tracing  distributed-systems  timeouts  chaos-monkey  impala 
september 2015 by jm
Reliable Cron across the Planet - ACM Queue
How Google (hi Niall!) built their internal "distributed cron" service, using a Paxos-driven master election process at its core. I've been looking for a distributed cron for donkey's years; I wish someone would write a decent open-source one....
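The core shape of such a thing, reduced to a sketch: every replica runs the scheduling loop, but only the current leaseholder fires jobs. The LocalLease stand-in below is a toy -- in the real service the lease comes from the Paxos-based master election, which is the hard part the article is actually about.

    import time

    class LocalLease:
        """Toy stand-in for a lease won via Paxos/Chubby-style master election;
        this version always grants leadership, which is only safe on one node."""
        def acquire(self):
            return True
        def still_held(self):
            return True

    def run_cron_replica(lease, schedule, tick=1.0):
        # Every replica runs this loop; only the leaseholder launches jobs,
        # and jobs must be idempotent because leadership can change mid-run.
        while True:
            if lease.acquire():
                while lease.still_held():
                    now = time.time()
                    for job in schedule:
                        if job["next_run"] <= now:
                            print("launching", job["name"])
                            job["next_run"] = now + job["interval"]
                    time.sleep(tick)
            time.sleep(tick)

    # run_cron_replica(LocalLease(), [{"name": "rollup", "interval": 5, "next_run": 0}])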
distributed-systems  cron  acm  paxos  distributed-cron  master-election  distcomp  reliability 
march 2015 by jm
A Brief Tour of FLP Impossibility
One of the most important results in distributed systems theory was published in April 1985 by Fischer, Lynch, and Paterson. Their short paper ‘Impossibility of Distributed Consensus with One Faulty Process’, which eventually won the Dijkstra award given to the most influential papers in distributed computing, definitively placed an upper bound on what it is possible to achieve with distributed processes in an asynchronous environment.

This particular result, known as the ‘FLP result’, settled a dispute that had been ongoing in distributed systems for the previous five to ten years. The problem of consensus – that is, getting a distributed network of processors to agree on a common value – was known to be solvable in a synchronous setting, where processes could proceed in simultaneous steps. In particular, the synchronous solution was resilient to faults, where processors crash and take no further part in the computation. Informally, synchronous models allow failures to be detected by waiting one entire step length for a reply from a processor, and presuming that it has crashed if no reply is received.

This kind of failure detection is impossible in an asynchronous setting, where there are no bounds on the amount of time a processor might take to complete its work and then respond with a message. Therefore it’s not possible to say whether a processor has crashed or is simply taking a long time to respond. The FLP result shows that in an asynchronous setting, where only one processor might crash, there is no distributed algorithm that solves the consensus problem.
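Stated compactly (my paraphrase, not a quote from the paper):

    \textbf{Theorem (FLP, 1985).}
    In an asynchronous message-passing system in which at most one process
    may fail, and only by crashing, no deterministic protocol solves binary
    consensus: no algorithm can simultaneously guarantee agreement, validity,
    and termination in every execution.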
distributed-systems  flp  consensus-algorithms  algorithms  distcomp  papers  proofs 
november 2013 by jm
Model checking for highly concurrent code
Applying formal methods to test distributed systems -- specifically GlusterFS:

I'll use an example from my own recent experience. I'm developing a new kind of replication for GlusterFS. To make sure the protocol behaves correctly even across multiple failures, I developed a Murphi model for it. [...]

I added a third failure [to the simulated model]. I didn't expect a three-node system to continue working if more than one of those were concurrent (the model allows the failures to be any mix of sequential and concurrent), but I expected it to fail cleanly without reaching an invalid state. Surprise! It managed to produce a case where a reader can observe values that go back in time. This might not make much sense without knowing the protocol involved, but it might give some idea of the crazy conditions a model checker will find that you couldn't possibly have considered. [...]

So now I have a bug to fix, and that's a good thing. Clearly, it involves a very specific set of ill-timed reads, writes, and failures. Could I have found it by inspection or ad-hoc analysis? Hell, no. Could I have found it by testing on live systems? Maybe, eventually, but it probably would have taken months for this particular combination to occur on its own. Forcing it to occur would require a lot of extra code, plus an exerciser that would amount to a model checker running 100x slower across machines than Murphi does. With enough real deployments over enough time it would have happened, but the only feasible way to prevent that was with model checking. These are exactly the kinds of bugs that are hardest to fix in the field, and that make users distrust distributed systems, so those of us who build such systems should use every tool at our disposal to avoid them.
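For a feel of what a Murphi-style checker actually does, here's a toy explicit-state checker in Python: it breadth-first-explores every interleaving of a deliberately broken two-replica write protocol and prints the first trace that violates the invariant. The protocol and invariant are invented for illustration; the real GlusterFS model is far richer.

    from collections import deque

    # Toy model: a writer updates replica A then replica B (not atomically);
    # a reader may read from either replica at any time. Invariant: successive
    # reads never go backwards -- the "values that go back in time" anomaly.
    INIT = (0, 0, 0, None, False)   # (rep_a, rep_b, writes_done, last_read, violated)

    def successors(s):
        a, b, writes_done, last_read, violated = s
        out = []
        if writes_done == 0:
            out.append((1, b, 1, last_read, violated))   # write reaches A first
        elif writes_done == 1:
            out.append((a, 1, 2, last_read, violated))   # ...then B
        for v in (a, b):                                 # reader picks a replica
            bad = violated or (last_read is not None and v < last_read)
            out.append((a, b, writes_done, v, bad))
        return out

    def check(init, invariant, max_states=100_000):
        """Breadth-first exploration of every reachable state, a la Murphi."""
        seen, frontier = {init}, deque([(init, [init])])
        while frontier:
            state, trace = frontier.popleft()
            if not invariant(state):
                return trace                             # counterexample found
            for nxt in successors(state):
                if nxt not in seen and len(seen) < max_states:
                    seen.add(nxt)
                    frontier.append((nxt, trace + [nxt]))
        return None

    trace = check(INIT, invariant=lambda s: not s[-1])
    print("counterexample:" if trace else "invariant holds")
    for s in trace or []:
        print("  ", s)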
model-checking  formal-methods  modelling  murphi  distcomp  distributed-systems  glusterfs  testing  protocols 
september 2013 by jm
Call me maybe: Kafka
Aphyr takes a look at Kafka 0.8's replication with the Jepsen test suite. It doesn't go great. Jay Kreps responds here: http://blog.empathybox.com/post/62279088548/a-few-notes-on-kafka-and-jepsen
jay-kreps  kafka  replication  distributed-systems  distcomp  networking  reliability  fault-tolerance  jepsen 
september 2013 by jm
Getting Real About Distributed System Reliability
I have come around to the view that the real core difficulty of [distributed] systems is operations, not architecture or design. Both are important but good operations can often work around the limitations of bad (or incomplete) software, but good software cannot run reliably with bad operations. This is quite different from the view of unbreakable, self-healing, self-operating systems that I see being pitched by the more enthusiastic NoSQL hypesters. Worse yet, you can’t easily buy good operations in the same way you can buy good software—you might be able to hire good people (if you can find them) but this is more than just people; it is practices, monitoring systems, configuration management, etc.
reliability  nosql  distributed-systems  jay-kreps  ops 
september 2013 by jm
"Scalable Eventually Consistent Counters over Unreliable Networks" [paper, pdf]

Counters are an important abstraction in distributed computing, and play a central role in large scale geo-replicated systems, counting events such as web page impressions or social network "likes". Classic distributed counters, strongly consistent, cannot be made both available and partition-tolerant, due to the CAP Theorem, being unsuitable to large scale scenarios.

This paper defines Eventually Consistent Distributed Counters (ECDC) and presents an implementation of the concept, Handoff Counters, that is scalable and works over unreliable networks. By giving up the sequencer aspect of classic distributed counters, ECDC implementations can be made AP in the CAP design space, while retaining the essence of counting. Handoff Counters are the first CRDT (Conflict-free Replicated Data Type) based mechanism that overcomes the identity explosion problem in naive CRDTs, such as G-Counters (where state size is linear in the number of independent actors that ever incremented the counter), by managing identities towards avoiding global propagation, and garbage collecting temporary entries. The approach used in Handoff Counters is not restricted to counters, being more generally applicable to other data types with associative and commutative operations.
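For context, the "identity explosion" the abstract mentions: a naive G-Counter keeps one entry per actor that has ever incremented it, merged by pointwise max, so its state grows linearly with the number of actors. A minimal sketch of that baseline (standard G-Counter semantics, not the paper's Handoff Counters):

    class GCounter:
        """Grow-only counter CRDT: state is a map of actor_id -> local count."""
        def __init__(self):
            self.counts = {}

        def increment(self, actor_id, n=1):
            self.counts[actor_id] = self.counts.get(actor_id, 0) + n

        def value(self):
            return sum(self.counts.values())

        def merge(self, other):
            # Pointwise max is associative, commutative and idempotent,
            # so replicas converge regardless of message order or duplication.
            for actor, count in other.counts.items():
                self.counts[actor] = max(self.counts.get(actor, 0), count)

    # State size is O(distinct actors) -- untenable when millions of transient
    # clients each increment once; that's the problem Handoff Counters attack.
    a, b = GCounter(), GCounter()
    a.increment("client-1"); b.increment("client-2"); b.increment("client-2")
    a.merge(b)
    print(a.value())   # 3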
pdf  papers  eventual-consistency  counters  distributed-systems  distcomp  cap-theorem  ecdc  handoff-counters  crdts  data-structures  g-counters 
august 2013 by jm
Beating the CAP Theorem Checklist
'Your ( ) tweet ( ) blog post ( ) marketing material ( ) online comment
advocates a way to beat the CAP theorem. Your idea will not work. Here is why
it won't work:'

lovely stuff, via Bill de hÓra
via:dehora  funny  cap  cs  distributed-systems  distcomp  networking  partitions  state  checklists 
august 2013 by jm
An excellent writeup of the TCP bounded-buffer deadlock problem
on pages 146-149 of 'TCP/IP Sockets in C: Practical Guide for Programmers' by Michael J. Donahoo and Kenneth L. Calvert.
tcp  ip  bounded-buffer  deadlock  bugs  buffering  connections  distributed-systems 
july 2013 by jm
the TCP bounded buffer deadlock problem
I've wound up mentioning this twice in the past week, so it's worth digging up and bookmarking!
Under certain circumstances a TCP connection can end up in a "deadlock", where neither the client nor the server is able to write data out or read data in. This is caused by two factors. First, a client or server cannot perform two transactions at once; a read cannot be performed if a write transaction is in progress, and vice versa. Second, the buffers that exist at either end of the TCP connection are of limited size. The deadlock occurs when both the client and server are trying to send an amount of data that is larger than the combined input and output buffer size.
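It's easy to reproduce on one machine: have both ends write a payload bigger than the combined socket buffers before either reads, and both block in send() forever. A rough Python sketch (buffer sizes deliberately shrunk so the hang shows up quickly; exact thresholds vary by OS):

    import socket, threading

    PAYLOAD = b"x" * 8_000_000    # comfortably larger than the combined buffers

    def shrink_buffers(sock):
        # Shrink both buffers so the deadlock appears with a modest payload.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 4096)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4096)

    def serve(listener):
        conn, _ = listener.accept()
        conn.sendall(PAYLOAD)     # server writes its whole response first...
        conn.recv(65536)          # ...and would only read the request afterwards

    listener = socket.socket()
    shrink_buffers(listener)      # accepted sockets inherit the shrunken buffers
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    threading.Thread(target=serve, args=(listener,), daemon=True).start()

    client = socket.socket()
    shrink_buffers(client)
    client.connect(listener.getsockname())
    client.sendall(PAYLOAD)       # client also writes everything before reading:
    print("never reached")        # both ends are now blocked in send()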
tcp  ip  bounded-buffer  deadlock  bugs  buffering  connections  distributed-systems 
july 2013 by jm
Riak, CAP, and eventual consistency
Good (albeit draft) write-up of the implications of CAP, allow_mult, and last_write_wins conflict-resolution policies in Riak:
As Brewer's CAP theorem established, distributed systems have to make hard choices. Network partition is inevitable. Hardware failure is inevitable. When a partition occurs, a well-behaved system must choose its behavior from a spectrum of options ranging from "stop accepting any writes until the outage is resolved" (thus maintaining absolute consistency) to "allow any writes and worry about consistency later" (to maximize availability). Riak leans toward the availability end of the spectrum, but allows the operator and even the developer to tune read and write requests to better meet the business needs for any given set of data.
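One concrete way that developer-side tuning shows up: with allow_mult=true a partition can leave several sibling values for a key, and the application merges them on read; with last_write_wins Riak just keeps the sibling with the latest timestamp. A rough sketch of the two policies in plain Python (not the Riak client API), using the classic shopping-cart example:

    # Each sibling is (wall_clock_timestamp, value); here 'value' is a set of
    # cart items, where a semantic merge is clearly better than dropping data.
    siblings = [
        (1000, {"book"}),             # write accepted on one side of a partition
        (1001, {"phone", "case"}),    # concurrent write accepted on the other side
    ]

    def last_write_wins(siblings):
        # What a last_write_wins policy effectively does: keep the newest sibling.
        return max(siblings, key=lambda s: s[0])[1]

    def merge_siblings(siblings):
        # What the application does under allow_mult=true: resolve semantically.
        merged = set()
        for _, value in siblings:
            merged |= value
        return merged

    print(last_write_wins(siblings))   # {'phone', 'case'} -- the book is lost
    print(merge_siblings(siblings))    # {'book', 'phone', 'case'}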
riak  cap  eventual-consistency  distcomp  distributed-systems  partition  last-write-wins  voldemort  allow_mult 
april 2013 by jm
Eventual Consistency Today: Limitations, Extensions, and Beyond - ACM Queue
Good overview of the current state of eventually-consistent data store research, covering CALM and CRDTs, from Peter Bailis and Ali Ghodsi
eventual-consistency  data  storage  horizontal-scaling  research  distcomp  distributed-systems  via:martin-thompson  crdts  calm  acid  cap 
april 2013 by jm
Lone Sale of $4.1 Billion in Contracts Led to ‘Flash Crash’ in May
'as the computers of the high-frequency traders traded contracts back and forth, a “hot potato” effect was created, the report said, as contracts changed hands 27,000 times in 14 seconds, but with eventually only 200 actually being bought or sold.' upshot: horrifically complex distributed feedback loops now directly impact our economies -- great :(
distributed-systems  distcomp  flash-crash  stock-market  trading  automation  via:nelson  sec  nyse  high-frequency-trading  from delicious
october 2010 by jm

