jm + dynamo   4

A Decade of Dynamo: Powering the next wave of high-performance, internet-scale applications - All Things Distributed
A deep dive on how we were using our existing databases revealed that they were frequently not used for their relational capabilities. About 70 percent of operations were of the key-value kind, where only a primary key was used and a single row would be returned. About 20 percent would return a set of rows, but still operate on only a single table.

With these requirements in mind, and a willingness to question the status quo, a small group of distributed systems experts came together and designed a horizontally scalable distributed database that would scale out for both reads and writes to meet the long-term needs of our business. This was the genesis of the Amazon Dynamo database.

The success of our early results with the Dynamo database encouraged us to write Amazon's Dynamo whitepaper and share it at the 2007 ACM Symposium on Operating Systems Principles (SOSP conference), so that others in the industry could benefit. The Dynamo paper was well-received and served as a catalyst to create the category of distributed database technologies commonly known today as "NoSQL."

That's not an exaggeration. Nice one Werner et al!
dynamo  history  nosql  storage  databases  distcomp  amazon  papers  acm  data-stores 
8 days ago by jm
Great quote from Voldemort author Jay Kreps
"Reading papers: essential. Slavishly implementing ideas you read: not necessarily a good idea. Trust me, I wrote an Amazon Dynamo clone."

Later in the discussion, on complex conflict resolution logic (as used in Dynamo, Voldemort, and Riak):

"I reviewed 200 Voldemort stores, 190 used default lww conflict resolution. 10 had custom logic, all 10 of which had bugs." --

(although IMO I'd prefer complex resolution to non-availability, when AP is required)
voldemort  jay-kreps  dynamo  cap-theorem  ap  riak  papers  lww  conflict-resolution  distcomp 
november 2014 by jm
Alex Feinberg's response to Damien Katz' anti-Dynamoish/pro-Couchbase blog post
Insightful response, worth bookmarking. (the original post is at ).
while you are saving on read traffic (online reads only go to the master), you are now decreasing availability (contrary to your stated goal), and increasing system complexity.
You also do hurt performance by requiring all writes and reads to be serialized through a single node: unless you plan to have a leader election whenever the node fails to meet a read SLA (which is going to result a disaster -- I am speaking from personal experience), you will have to accept that you're bottlenecked by a single node. With a Dynamo-style quorum (for either reads or writes), a single straggler will not reduce whole-cluster latency.
The core point of Dynamo is low latency, availability and handling of all kinds of partitions: whether clean partitions (long term single node failures), transient failures (garbage collection pauses, slow disks, network blips, etc...), or even more complex dependent failures.
The reality, of course, is that availability is neither the sole, nor the principal concern of every system. It's perfect fine to trade off availability for other goals -- you just need to be aware of that trade off.
cap  distributed-databases  databases  quorum  availability  scalability  damien-katz  alex-feinberg  partitions  network  dynamo  riak  voldemort  couchbase 
may 2013 by jm

Copy this bookmark: