jm + voldemort   18

If Eventual Consistency Seems Hard, Wait Till You Try MVCC
ex-Percona MySQL wizard Baron Schwartz, noting that MVCC as implemented in common SQL databases is not all that simple or reliable compared to big bad NoSQL Eventual Consistency:
Since I am not ready to assert that there’s a distributed system I know to be better and simpler than eventually consistent datastores, and since I certainly know that InnoDB’s MVCC implementation is full of complexities, for right now I am probably in the same position most of my readers are: the two viable choices seem to be single-node MVCC and multi-node eventual consistency. And I don’t think MVCC is the simpler paradigm of the two.
nosql  concurrency  databases  mysql  riak  voldemort  eventual-consistency  reliability  storage  baron-schwartz  mvcc  innodb  postgresql 
december 2014 by jm
Great quote from Voldemort author Jay Kreps
"Reading papers: essential. Slavishly implementing ideas you read: not necessarily a good idea. Trust me, I wrote an Amazon Dynamo clone."

Later in the discussion, on complex conflict resolution logic (as used in Dynamo, Voldemort, and Riak):

"I reviewed 200 Voldemort stores, 190 used default lww conflict resolution. 10 had custom logic, all 10 of which had bugs." --

(although IMO I'd prefer complex resolution to non-availability, when AP is required)
voldemort  jay-kreps  dynamo  cap-theorem  ap  riak  papers  lww  conflict-resolution  distcomp 
november 2014 by jm
Smart Clients, haproxy, and Riak
Good, thought-provoking post on good client library approaches for complex client-server systems, particularly distributed stores like Voldemort or Riak. I'm of the opinion that a smart client lib is unavoidable, and in fact essential, since the clients are part of the distributed system, personally.
clients  libraries  riak  voldemort  distsys  haproxy  client-server  storage 
october 2014 by jm
Felix says:

'Like I said, I'd like to move it to a more general / non-personal repo in the future, but haven't had the time yet. Anyway, you can still browse the code there for now. It is not a big code base so not that hard to wrap one's mind around it.

It is Apache licensed and both Kafka and Voldemort are using it so I would say it is pretty self-contained (although Kafka has not moved to Tehuti proper, it is essentially the same code they're using, minus a few small fixes missing that we added).

Tehuti is a bit lower level than CodaHale (i.e.: you need to choose exactly which stats you want to measure and the boundaries of your histograms), but this is the type of stuff you would build a wrapper for and then re-use within your code base. For example: the Voldemort RequestCounter class.'
asl2  apache  open-source  tehuti  metrics  percentiles  quantiles  statistics  measurement  latency  kafka  voldemort  linkedin 
october 2014 by jm
An embryonic metrics library for Java/Scala from Felix GV at LinkedIn, extracted from Kafka's metric implementation and in the new Voldemort release. It fixes the major known problems with the Meter/Timer implementations in Coda-Hale/Dropwizard/Yammer Metrics.

'Regarding Tehuti: it has been extracted from Kafka's metric implementation. The code was originally written by Jay Kreps, and then maintained improved by some Kafka and Voldemort devs, so it definitely is not the work of just one person. It is in my repo at the moment but I'd like to put it in a more generally available (git and maven) repo in the future. I just haven't had the time yet...

As for comparing with CodaHale/Yammer, there were a few concerns with it, but the main one was that we didn't like the exponentially decaying histogram implementation. While that implementation is very appealing in terms of (low) memory usage, it has several misleading characteristics (a lack of incoming data points makes old measurements linger longer than they should, and there's also a fairly high possiblity of losing interesting outlier data points). This makes the exp decaying implementation robust in high throughput fairly constant workloads, but unreliable in sparse or spiky workloads. The Tehuti implementation provides semantics that we find easier to reason with and with a small code footprint (which we consider a plus in terms of maintainability). Of course, it is still a fairly young project, so it could be improved further.'

More background at the kafka-dev thread:
kafka  metrics  dropwizard  java  scala  jvm  timers  ewma  statistics  measurement  latency  sampling  tehuti  voldemort  linkedin  jay-kreps 
october 2014 by jm
Home · linkedin/ Wiki is a REST+JSON framework for building robust, scalable service architectures using dynamic discovery and simple asynchronous APIs. fills a niche for building RESTful service architectures at scale, offering a developer workflow for defining data and REST APIs that promotes uniform interfaces, consistent data modeling, type-safety, and compatibility checked API evolution.

The new underlying comms layer for Voldemort, it seems.
voldemort  d2  linkedin  json  rest  http  api  frameworks  java 
february 2014 by jm
Gilt Tech
Gilt ran a stress-test of Riak to replace Voldemort (I think) in a shadow stack, with good results:
Riak’s strong performance suggests that, should we pursue implementation, it will withstand our unique traffic needs and prove reliable. As for the Gilt-Basho team’s strong performance: It was amazing that we were able to accomplish so much in just a week’s time! Thanks again to Seth and Steve for making this possible.
riak  testing  shadow-stack  voldemort  storage  gilt 
september 2013 by jm
Voldemort on Solid State Drives [paper]
'This paper and talk was given by the LinkedIn Voldemort Team at the Workshop on Big Data Benchmarking (WBDB May 2012).'

With SSD, we find that garbage collection will become a very significant bottleneck, especially for systems which have little control over the storage layer and rely on Java memory management. Big heapsizes make the cost of garbage collection expensive, especially the single threaded CMS Initial mark. We believe that data systems must revisit their caching strategies with SSDs. In this regard, SSD has provided an efficient solution for handling fragmentation and moving towards predictable multitenancy.
voldemort  storage  ssd  disk  linkedin  big-data  jvm  tuning  ops  gc 
september 2013 by jm
Project Voldemort: measuring BDB space consumption
HOWTO measure this using the BDB-JE command line tools. this is exposed through JMX as the CleanerBacklog metric, too, I think, but good to bookmark just in case
voldemort  cleaner  bdb  ops  space  storage  monitoring  debug 
june 2013 by jm
Making sense out of BDB-JE fast stats
good info on the system metrics recorded by BDB-JE's EnvironmentStats code, particularly where cache and cleaner activity are concerned. Particularly useful for Voldemort
voldemort  caching  bdb  bdb-je  storage  tuning  ops  metrics  reference 
may 2013 by jm
Alex Feinberg's response to Damien Katz' anti-Dynamoish/pro-Couchbase blog post
Insightful response, worth bookmarking. (the original post is at ).
while you are saving on read traffic (online reads only go to the master), you are now decreasing availability (contrary to your stated goal), and increasing system complexity.
You also do hurt performance by requiring all writes and reads to be serialized through a single node: unless you plan to have a leader election whenever the node fails to meet a read SLA (which is going to result a disaster -- I am speaking from personal experience), you will have to accept that you're bottlenecked by a single node. With a Dynamo-style quorum (for either reads or writes), a single straggler will not reduce whole-cluster latency.
The core point of Dynamo is low latency, availability and handling of all kinds of partitions: whether clean partitions (long term single node failures), transient failures (garbage collection pauses, slow disks, network blips, etc...), or even more complex dependent failures.
The reality, of course, is that availability is neither the sole, nor the principal concern of every system. It's perfect fine to trade off availability for other goals -- you just need to be aware of that trade off.
cap  distributed-databases  databases  quorum  availability  scalability  damien-katz  alex-feinberg  partitions  network  dynamo  riak  voldemort  couchbase 
may 2013 by jm
Riak, CAP, and eventual consistency
Good (albeit draft) write-up of the implications of CAP, allow_mult, and last_write_wins conflict-resolution policies in Riak:
As Brewer's CAP theorem established, distributed systems have to make hard choices. Network partition is inevitable. Hardware failure is inevitable. When a partition occurs, a well-behaved system must choose its behavior from a spectrum of options ranging from "stop accepting any writes until the outage is resolved" (thus maintaining absolute consistency) to "allow any writes and worry about consistency later" (to maximize availability). Riak leans toward the availability end of the spectrum, but allows the operator and even the developer to tune read and write requests to better meet the business needs for any given set of data.
riak  cap  eventual-consistency  distcomp  distributed-systems  partition  last-write-wins  voldemort  allow_mult 
april 2013 by jm
Project Voldemort at Gilt Groupe: When Failure Isn't an Option [slides]
Geir Magnusson explains how Gilt Groupe is using Project Voldemort to scale out their e-commerce transactional system. The initial SQL solution had to be replaced because it could not handle the transactional spikes the site is experiencing daily due to its particular way of selling their inventory: each day at noon. Magnusson explains why they chose Voldemort and talks about the architecture.

via Filippo
via:filippo  database  architecture  nosql  data  voldemort  gilt-groupe  ops  storage  presentations 
april 2013 by jm
Announcing the Voldemort 1.3 Open Source Release
new release from LinkedIn -- better p90/p99 PUT performance, improvements to the BDB-JE storage layer, massively-improved rebalance performance
voldemort  linkedin  open-source  bdb  nosql 
march 2013 by jm
High Scalability - Analyzing billions of credit card transactions and serving low-latency insights in the cloud
Hadoop, a batch-generated read-only Voldemort cluster, and an intriguing optimal-storage histogram bucketing algorithm:
The optimal histogram is computed using a random-restart hill climbing approximated algorithm.
The algorithm has been shown very fast and accurate: we achieved 99% accuracy compared to an exact dynamic algorithm, with a speed increase of one factor. [...] The amount of information to serve in Voldemort for one year of BBVA's credit card transactions on Spain is 270 GB. The whole processing flow would run in 11 hours on a cluster of 24 "m1.large" instances. The whole infrastructure, including the EC2 instances needed to serve the resulting data would cost approximately $3500/month.
scalability  scaling  voldemort  hadoop  batch  algorithms  histograms  statistics  bucketing  percentiles 
february 2013 by jm

related tags

alex-feinberg  algorithms  allow_mult  ap  apache  api  architecture  asl2  availability  baron-schwartz  batch  bdb  bdb-je  big-data  bucketing  caching  cap  cap-theorem  cassandra  cdt  cleaner  client-server  clients  concurrency  conflict-resolution  consistency  couchbase  crdt  d2  damien-katz  data  database  databases  debug  disk  distcomp  distributed  distributed-databases  distributed-systems  distsys  dropwizard  dynamo  eventual-consistency  ewma  frameworks  gc  gilt  gilt-groupe  hadoop  haproxy  histograms  http  innodb  java  jay-kreps  json  jvm  kafka  last-write-wins  latency  libraries  linkedin  lww  manhattan  measurement  metrics  monitoring  mvcc  mysql  network  nosql  open-source  ops  papers  partition  partitions  percentiles  postgresql  presentations  quantiles  quorum  reference  reliability  replication  rest  riak  sampling  scala  scalability  scaling  shadow-stack  space  ssd  statistics  storage  tehuti  testing  time-series  timers  tuning  twitter  via:filippo  voldemort 

Copy this bookmark: