jm + data-stores   3

A Decade of Dynamo: Powering the next wave of high-performance, internet-scale applications - All Things Distributed
A deep dive on how we were using our existing databases revealed that they were frequently not used for their relational capabilities. About 70 percent of operations were of the key-value kind, where only a primary key was used and a single row would be returned. About 20 percent would return a set of rows, but still operate on only a single table.

With these requirements in mind, and a willingness to question the status quo, a small group of distributed systems experts came together and designed a horizontally scalable distributed database that would scale out for both reads and writes to meet the long-term needs of our business. This was the genesis of the Amazon Dynamo database.

The success of our early results with the Dynamo database encouraged us to write Amazon's Dynamo whitepaper and share it at the 2007 ACM Symposium on Operating Systems Principles (SOSP conference), so that others in the industry could benefit. The Dynamo paper was well-received and served as a catalyst to create the category of distributed database technologies commonly known today as "NoSQL."

That's not an exaggeration. Nice one Werner et al!
dynamo  history  nosql  storage  databases  distcomp  amazon  papers  acm  data-stores 
7 days ago by jm
"Why We Built Our Own Distributed Column Store" (video)
"Why We Built Our Own Distributed Column Store" by Sam Stokes of -- Retriever, inspired by Facebook's Scuba
scuba  retriever  storage  data-stores  columnar-storage  databases  via:charitymajors 
7 days ago by jm
Why I'm Walking Away From CouchDB
In practice there are two gotchas that are so painful I am  looking for a replacement with a different featureset than couchdb provides. The location tracking project uses couchdb to store 20,000 new records per day. It has more write traffic than read traffic and runs on modest hardware. Those two gotchas are:

1. View Index updates.

While I have a vague understanding of why view index updates are slow and bulky and important, in practice it is unworkable. Every write sets up a trap for the first reader to come along after the write. The more writes there are, the bigger the trap for the first reader which has to wait on the couchdb process that refreshes the view index on an as-needed basis. I believe this trade-off was made to keep writes fast. No need to update the view index until all writes are actually complete, right? Write traffic is heavier than read traffic and the time needed for that index refresh causes the webapp to crash because its not setup to handle timeouts from a database query. The workaround is as hackish as one can imagine -  cron jobs to hit every  map/reduce query to keep indexes fresh.

2. Append only database file

Append only is in theory a great way to ensure on-disk reliability. A system crash during an append should only affect that append. Its a crash during an update to existing parts of the file that risks the integrity of more than whats being updated. With so many layers of caching and optimizations in the kernel and the filesystem and now in the workings of SSD drives, I'm not sure append-only gives extra protection anymore.

What it does do is a create a huge operational headache. The on-disk file can never grow beyond half the available storage space. Record deletion uses new disk space and if the half-full mark approaches, vacuuming must be done. The entire database is rewritten to the filesystem, leaving out no longer needed records. If the data file should happen to grow beyond half the partition, the system has esentially crashed because there is no way to compact the file and soon the partition will be full. This is a likely scenario when there is a lot of record deletion activity.

The system in question does a lot of writes of temporary data that is followed up by deletes a few days later. There is also a lot of permanent storage that hardly gets used. Rewriting every byte of the records that are long-lived due to compaction is an enormous amount of wasted I/O - doubly so given SSD drives have a short write-cycle lifespan.
nosql  couchdb  consistency  checkpointing  databases  data-stores  indexing 
april 2013 by jm

Copy this bookmark: