jm + nosql   41

A Decade of Dynamo: Powering the next wave of high-performance, internet-scale applications - All Things Distributed
A deep dive on how we were using our existing databases revealed that they were frequently not used for their relational capabilities. About 70 percent of operations were of the key-value kind, where only a primary key was used and a single row would be returned. About 20 percent would return a set of rows, but still operate on only a single table.

With these requirements in mind, and a willingness to question the status quo, a small group of distributed systems experts came together and designed a horizontally scalable distributed database that would scale out for both reads and writes to meet the long-term needs of our business. This was the genesis of the Amazon Dynamo database.

The success of our early results with the Dynamo database encouraged us to write Amazon's Dynamo whitepaper and share it at the 2007 ACM Symposium on Operating Systems Principles (SOSP conference), so that others in the industry could benefit. The Dynamo paper was well-received and served as a catalyst to create the category of distributed database technologies commonly known today as "NoSQL."


That's not an exaggeration. Nice one Werner et al!
dynamo  history  nosql  storage  databases  distcomp  amazon  papers  acm  data-stores 
9 weeks ago by jm
Will the last person at Basho please turn out the lights? • The Register
Basho, once a rising star of the NoSQL database world, has faded away to almost nothing [...] According to sources, the company, which developed the Riak distributed database, has been shedding engineers for months, and is now operating as a shadow of its former self, as at least one buy-out has fallen through.
basho  riak  nosql  databases  storage  startups  funding 
july 2017 by jm
Cluster benchmark: Scylla vs Cassandra
ScyllaDB (the C* clone in C++) is now actually looking promising -- still need more reassurance about its consistency/reliability side though
scylla  databases  storage  cassandra  nosql 
october 2015 by jm
If Eventual Consistency Seems Hard, Wait Till You Try MVCC
ex-Percona MySQL wizard Baron Schwartz, noting that MVCC as implemented in common SQL databases is not all that simple or reliable compared to big bad NoSQL Eventual Consistency:
Since I am not ready to assert that there’s a distributed system I know to be better and simpler than eventually consistent datastores, and since I certainly know that InnoDB’s MVCC implementation is full of complexities, for right now I am probably in the same position most of my readers are: the two viable choices seem to be single-node MVCC and multi-node eventual consistency. And I don’t think MVCC is the simpler paradigm of the two.
nosql  concurrency  databases  mysql  riak  voldemort  eventual-consistency  reliability  storage  baron-schwartz  mvcc  innodb  postgresql 
december 2014 by jm
Hermitage: Testing the "I" in ACID
[Hermitage is] a test suite for databases which probes for a variety of concurrency issues, and thus allows a fair and accurate comparison of isolation levels. Each test case simulates a particular kind of race condition that can happen when two or more transactions concurrently access the same data. Each test can pass (if the database’s implementation of isolation prevents the race condition from occurring) or fail (if the race condition does occur).
acid  architecture  concurrency  databases  nosql 
november 2014 by jm
"Macaroons" for fine-grained secure database access
Macaroons are an excellent fit for NoSQL data storage for several reasons. First, they enable an application developer to enforce security policies at very fine granularity, per object. Gone are the clunky security policies based on the IP address of the client, or the per-table access controls of RDBMSs that force you to split up your data across many tables. Second, macaroons ensure that a client compromise does not lead to loss of the entire database. Third, macaroons are very flexible and expressive, able to incorporate information from external systems and third-party databases into authorization decisions. Finally, macaroons scale well and are incredibly efficient, because they avoid public-key cryptography and instead rely solely on fast hash functions.
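The "fast hash functions" point can be sketched as a chain of HMACs: each added caveat re-keys the signature with the previous signature, so a holder can narrow a token but cannot strip a restriction. This is a simplified illustration of first-party caveats only. The names mint, add_caveat and verify are hypothetical, and this is not the libmacaroons or HyperDex API.

```python
import hashlib
import hmac


def _chain(key: bytes, message: bytes) -> bytes:
    # One link in the chain: HMAC-SHA256 of the message under the current key.
    return hmac.new(key, message, hashlib.sha256).digest()


def mint(root_key: bytes, identifier: bytes) -> dict:
    # The service mints a token; only the service knows root_key.
    return {"id": identifier, "caveats": [], "sig": _chain(root_key, identifier)}


def add_caveat(macaroon: dict, caveat: bytes) -> dict:
    # Anyone holding the token can attenuate it; the old signature becomes the key.
    return {
        "id": macaroon["id"],
        "caveats": macaroon["caveats"] + [caveat],
        "sig": _chain(macaroon["sig"], caveat),
    }


def verify(root_key: bytes, macaroon: dict, is_satisfied) -> bool:
    # The service replays the chain from root_key and checks every caveat.
    sig = _chain(root_key, macaroon["id"])
    for caveat in macaroon["caveats"]:
        if not is_satisfied(caveat):
            return False
        sig = _chain(sig, caveat)
    return hmac.compare_digest(sig, macaroon["sig"])
```

Dropping a caveat from the list changes the replayed chain, so the signature check fails without any public-key operation being involved.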
security  macaroons  cookies  databases  nosql  case-studies  storage  authorization  hyperdex 
november 2014 by jm
The Myth of Schema-less [NoSQL]
We don't seem to gain much in terms of database flexibility. Is our application more flexible? I don't think so. Even without our schema explicitly defined in our database, it's there... somewhere. You simply have to search through hundreds of thousands of lines to find all the little bits of it. It has the potential to be in several places, making it harder to properly identify. The reality of these codebases is that they are error prone and often lack the necessary documentation. This problem is magnified when there are multiple codebases talking to the same database. This is not an uncommon practice for reporting or analytical purposes.

Finally, all this "flexibility" rears its head in the same way that PHP and Javascript's "neat" weak typing stabs you right in the face. There are some things you can be cavalier about, and some things you should be strict about. Your data model is one you absolutely need to be strict on. If a field should store an int, it should store nothing else. Not a string, not a picture of a horse, but an integer. It's nice to know that I have my database doing type checking for me and I can expect a field to be the same type across all records.

All this leads us to an undeniable fact: There is always a schema. Wearing "I don't do schema" as a badge of honor is a complete joke and encourages a terrible development practice.
nosql  databases  storage  schema  strong-typing 
july 2014 by jm
RocksDB
'A persistent key-value store for fast storage environments', i.e. a BerkeleyDB/LevelDB competitor, from Facebook.
RocksDB builds on LevelDB to be scalable to run on servers with many CPU cores, to efficiently use fast storage, to support IO-bound, in-memory and write-once workloads, and to be flexible to allow for innovation.

We benchmarked LevelDB and found that it was unsuitable for our server workloads. The benchmark results look awesome at first sight, but we quickly realized that those results were for a database whose size was smaller than the size of RAM on the test machine - where the entire database could fit in the OS page cache. When we performed the same benchmarks on a database that was at least 5 times larger than main memory, the performance results were dismal.

By contrast, we've published the RocksDB benchmark results for server side workloads on Flash. We also measured the performance of LevelDB on these server-workload benchmarks and found that RocksDB solidly outperforms LevelDB for these IO bound workloads. We found that LevelDB's single-threaded compaction process was insufficient to drive server workloads. We saw frequent write-stalls with LevelDB that caused 99-percentile latency to be tremendously large. We found that mmap-ing a file into the OS cache introduced performance bottlenecks for reads. We could not make LevelDB consume all the IOs offered by the underlying Flash storage.


Lots of good discussion at https://news.ycombinator.com/item?id=6736900 too.
flash  ssd  rocksdb  databases  storage  nosql  facebook  bdb  disk  key-value-stores  lsm  leveldb 
november 2013 by jm
Rapid read protection in Cassandra 2.0.2
Nifty new feature -- if a request takes over the 99th percentile for requests to that server, it'll be repeated against another replica. Unnecessary for Voldemort, of course, which queries all replicas anyway!
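A rough sketch of the idea, assuming each replica is just a callable and that recent latencies are kept in a list: wait on the first replica up to the 99th-percentile latency, then send the same read to a second replica and take whichever answers first. The names speculative_read, slow_replica and fast_replica are hypothetical, and this is a generic hedged-request pattern, not Cassandra's implementation.

```python
import concurrent.futures as cf
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of latencies (seconds)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


def speculative_read(replicas, key, latency_history, pct=99):
    """Read from replicas[0]; if it exceeds the pct latency, also try replicas[1]."""
    threshold = percentile(latency_history, pct)
    pool = cf.ThreadPoolExecutor(max_workers=2)
    try:
        first = pool.submit(replicas[0], key)
        try:
            return first.result(timeout=threshold)
        except cf.TimeoutError:
            # Too slow: repeat the request against another replica.
            backup = pool.submit(replicas[1], key)
            done, _ = cf.wait([first, backup], return_when=cf.FIRST_COMPLETED)
            return next(iter(done)).result()
    finally:
        # Do not block the caller on the straggler.
        pool.shutdown(wait=False)


# Hypothetical stand-ins for two replicas of the same data.
def slow_replica(key):
    time.sleep(0.3)
    return "slow:" + key


def fast_replica(key):
    return "fast:" + key
```

The trade-off is a little extra load on the second replica in exchange for cutting off the tail of the latency distribution.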
cassandra  nosql  replication  distcomp  latency  storage 
october 2013 by jm
Sergio Bossa's thoughts about Datomic
good comments from Sergio, particularly about the scalability of the single transactor in the Datomic architecture. I agree it's a worrying design flaw
clojure  nosql  datomic  sergio-bossa  transactor  spof  architecture  storage 
october 2013 by jm
Getting Real About Distributed System Reliability
I have come around to the view that the real core difficulty of [distributed] systems is operations, not architecture or design. Both are important but good operations can often work around the limitations of bad (or incomplete) software, but good software cannot run reliably with bad operations. This is quite different from the view of unbreakable, self-healing, self-operating systems that I see being pitched by the more enthusiastic NoSQL hypesters. Worse yet, you can’t easily buy good operations in the same way you can buy good software—you might be able to hire good people (if you can find them) but this is more than just people; it is practices, monitoring systems, configuration management, etc.
reliability  nosql  distributed-systems  jay-kreps  ops 
september 2013 by jm
Instagram: Making the Switch to Cassandra from Redis, a 75% 'Insta' Savings
shifting data out of RAM and onto SSDs -- unsurprisingly, big savings.
a 12 node cluster of EC2 hi1.4xlarge instances; we store around 1.2TB of data across this cluster. At peak, we're doing around 20,000 writes per second to that specific cluster and around 15,000 reads per second. We've been really impressed with how well Cassandra has been able to drop into that role.
ram  ssd  cassandra  databases  nosql  redis  instagram  storage  ec2 
june 2013 by jm
The CAP FAQ by henryr
No subject appears to be more controversial to distributed systems engineers than the oft-quoted, oft-misunderstood CAP theorem. The purpose of this FAQ is to explain what is known about CAP, so as to help those new to the theorem get up to speed quickly, and to settle some common misconceptions or points of disagreement.
database  distributed  nosql  cap  consistency  cap-theorem  faqs 
june 2013 by jm
Call me maybe: Carly Rae Jepsen and the perils of network partitions
Kyle "aphyr" Kingsbury expands on his slides demonstrating the real-world failure scenarios that arise during some kinds of partitions (specifically, the TCP-hang, no clear routing failure, network partition scenario). Great set of blog posts clarifying CAP
distributed  network  databases  cap  nosql  redis  mongodb  postgresql  riak  crdt  aphyr 
may 2013 by jm
Project Voldemort at Gilt Groupe: When Failure Isn't an Option [slides]
Geir Magnusson explains how Gilt Groupe is using Project Voldemort to scale out their e-commerce transactional system. The initial SQL solution had to be replaced because it could not handle the transactional spikes the site is experiencing daily due to its particular way of selling their inventory: each day at noon. Magnusson explains why they chose Voldemort and talks about the architecture.

via Filippo
via:filippo  database  architecture  nosql  data  voldemort  gilt-groupe  ops  storage  presentations 
april 2013 by jm
FastBit: An Efficient Compressed Bitmap Index Technology
an [LGPL] open-source data processing library following the spirit of NoSQL movement. It offers a set of searching functions supported by compressed bitmap indexes. It treats user data in the column-oriented manner similar to well-known database management systems such as Sybase IQ, MonetDB, and Vertica. It is designed to accelerate user's data selection tasks without imposing undue requirements. In particular, the user data is NOT required to be under the control of FastBit software, which allows the user to continue to use their existing data analysis tools.

The key technology underlying the FastBit software is a set of compressed bitmap indexes. In database systems, an index is a data structure to accelerate data accesses and reduce the query response time. Most of the commonly used indexes are variants of the B-tree, such as B+-tree and B*-tree. FastBit implements a set of alternative indexes called compressed bitmap indexes. Compared with B-tree variants, these indexes provide very efficient searching and retrieval operations, but are somewhat slower to update after a modification of an individual record.

A key innovation in FastBit is the Word-Aligned Hybrid compression (WAH) for the bitmaps.[...] Another innovation in FastBit is the multi-level bitmap encoding methods.
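To make the bitmap-index idea concrete, here is a minimal uncompressed sketch: one bitmap per distinct column value, with bit i set when row i holds that value, so multi-column selections become bitwise AND/OR. It uses plain Python integers as bitsets. The class and function names are hypothetical, this is not FastBit's API, and it does not implement WAH compression or multi-level encoding.

```python
from collections import defaultdict


class BitmapIndex:
    """Equality-encoded bitmap index over one column."""

    def __init__(self, values):
        self.bitmaps = defaultdict(int)
        for row, value in enumerate(values):
            # Set bit `row` in the bitmap belonging to this value.
            self.bitmaps[value] |= 1 << row

    def equals(self, value):
        # Bitmap of rows where column == value (0 if the value never occurs).
        return self.bitmaps.get(value, 0)


def rows(bitmap):
    """Expand a bitmap back into a sorted list of row numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


colour = BitmapIndex(["red", "blue", "red", "green"])
size = BitmapIndex(["S", "S", "L", "L"])

# WHERE colour = 'red' AND size = 'L'
red_and_large = rows(colour.equals("red") & size.equals("L"))
```

Updating a single record means touching the bitmap of the old value and the bitmap of the new one, which hints at why the quoted text calls updates somewhat slower than with B-tree variants.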
fastbit  nosql  algorithms  indexing  search  compressed-bitmaps  indexes  wah  bitmaps  compression 
april 2013 by jm
TouchDB's reverse-engineered write-up of the Couch replication protocol
There really isn’t a separate “protocol” per se for replication. Instead, replication uses CouchDB’s REST API and data model. It’s therefore a bit difficult to talk about replication independently of the rest of CouchDB. In this document I’ll focus on the algorithm used, and link to documentation of the APIs it invokes. The “protocol” is simply the set of those APIs operating over HTTP.
couchdb  protocols  touchdb  nosql  replication  sync  mvcc  revisions  rest 
april 2013 by jm
CouchDB: not drinking the kool-aid
Jonathan Ellis on some CouchDB negatives:
Here are some reasons you should think twice and do careful testing before using CouchDB in a non-toy project:
Writes are serialized.  Not serialized as in the isolation level, serialized as in there can only be one write active at a time.  Want to spread writes across multiple disks?  Sorry.
CouchDB uses a MVCC model, which means that updates and deletes need to be compacted for the space to be made available to new writes.  Just like PostgreSQL, only without the man-years of effort to make vacuum hurt less.
CouchDB is simple.  Gloriously simple.  Why is that a negative?  It's competing with systems (in the popular imagination, if not in its author's mind) that have been maturing for years.  The reason PostgreSQL et al have those features is because people want them.  And if you don't, you should at least ask a DBA with a few years of non-MySQL experience what you'll be missing.  The majority of CouchDB fans don't appear to really understand what a good relational database gives them, just as a lot of PHP programmers don't get what the big deal is with namespaces.
A special case of simplicity deserves mention: nontrivial queries must be created as a view with mapreduce.  MapReduce is a great approach to trivially parallelizing certain classes of problem.  The problem is, it's tedious and error-prone to write raw MapReduce code.  This is why Google and Yahoo have both created high-level languages on top of it (Sawzall and Pig, respectively).  Poor SQL; even with DSLs being the new hotness, people forget that SQL is one of the original domain-specific languages.  It's a little verbose, and you might be bored with it, but it's much better than writing low-level mapreduce code.
cassandra  couch  nosql  storage  distributed  databases  consistency 
april 2013 by jm
Why I'm Walking Away From CouchDB
In practice there are two gotchas that are so painful I am looking for a replacement with a different featureset than couchdb provides. The location tracking project icecondor.com uses couchdb to store 20,000 new records per day. It has more write traffic than read traffic and runs on modest hardware. Those two gotchas are:

1. View Index updates.

While I have a vague understanding of why view index updates are slow and bulky and important, in practice it is unworkable. Every write sets up a trap for the first reader to come along after the write. The more writes there are, the bigger the trap for the first reader which has to wait on the couchdb process that refreshes the view index on an as-needed basis. I believe this trade-off was made to keep writes fast. No need to update the view index until all writes are actually complete, right? Write traffic is heavier than read traffic and the time needed for that index refresh causes the webapp to crash because it's not set up to handle timeouts from a database query. The workaround is as hackish as one can imagine - cron jobs to hit every map/reduce query to keep indexes fresh.

2. Append only database file

Append only is in theory a great way to ensure on-disk reliability. A system crash during an append should only affect that append. It's a crash during an update to existing parts of the file that risks the integrity of more than what's being updated. With so many layers of caching and optimizations in the kernel and the filesystem and now in the workings of SSD drives, I'm not sure append-only gives extra protection anymore.

What it does do is create a huge operational headache. The on-disk file can never grow beyond half the available storage space. Record deletion uses new disk space and if the half-full mark approaches, vacuuming must be done. The entire database is rewritten to the filesystem, leaving out no longer needed records. If the data file should happen to grow beyond half the partition, the system has essentially crashed because there is no way to compact the file and soon the partition will be full. This is a likely scenario when there is a lot of record deletion activity.

The system in question does a lot of writes of temporary data that is followed up by deletes a few days later. There is also a lot of permanent storage that hardly gets used. Rewriting every byte of the records that are long-lived due to compaction is an enormous amount of wasted I/O - doubly so given SSD drives have a short write-cycle lifespan.
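The "half the available storage" rule is the worst case of a simple check: compaction writes a fresh copy of the live records next to the old file, so the free space must be at least as large as the live data. A small sketch of that arithmetic, assuming the database file is the only thing on the partition (the function name and the figures are illustrative, not from the post):

```python
GB = 1024 ** 3


def compaction_fits(file_bytes: int, live_bytes: int, partition_bytes: int) -> bool:
    """True if a compacted copy of the live records fits beside the old file."""
    free_bytes = partition_bytes - file_bytes
    return live_bytes <= free_bytes


# 100 GB partition, 60 GB file of which 30 GB is still live: compaction fits.
mostly_dead = compaction_fits(60 * GB, 30 * GB, 100 * GB)

# Same file size but every record is live: no room left to rewrite it.
all_live = compaction_fits(60 * GB, 60 * GB, 100 * GB)
```

When nearly everything in the file is long-lived, live_bytes approaches file_bytes and the limit becomes exactly half the partition, which is the situation the post describes.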
nosql  couchdb  consistency  checkpointing  databases  data-stores  indexing 
april 2013 by jm
Riak CS is now ASL2 open source
'Organizations and users can now access the source code on Github and download the latest packages from the downloads page. Also, today, we announced that Riak CS Enterprise is now available as commercial licensed software, featuring multi-datacenter replication technology and 24×7 Basho customer support.'
riak  riak-cs  nosql  storage  basho  open-source  github  apache  asl2 
march 2013 by jm
Announcing the Voldemort 1.3 Open Source Release
new release from LinkedIn -- better p90/p99 PUT performance, improvements to the BDB-JE storage layer, massively-improved rebalance performance
voldemort  linkedin  open-source  bdb  nosql 
march 2013 by jm
Metric Collection and Storage with Cassandra | DataStax
DataStax's documentation on how they store time-series data (TSD) in Cassandra. Pretty generic
datastax  nosql  metrics  analytics  cassandra  tsd  time-series  storage 
march 2013 by jm
Riakking Complex Data Types
interesting details about Riak's support for secondary indexes. Not quite SQL, but still more powerful than plain old K/V storage (via dehora)
via:dehora  riak  indexes  storage  nosql  key-value-stores  2i  range-queries 
march 2013 by jm
Big Data Analytics at Netflix. Interview with Christos Kalantzis and Jason Brown.
Good interview with the Cassandra guys at Netflix, and some top Mongo-bashing in the comments
cassandra  netflix  user-stories  testimonials  nosql  storage  ec2  mongodb 
february 2013 by jm
Cassandra, Hive, and Hadoop: How We Picked Our Analytics Stack
reasonably good whole-stack performance testing and analysis; HBase, Riak, MongoDB, and Cassandra compared. Riak did pretty badly :(
riak  mongodb  cassandra  hbase  performance  analytics  hadoop  hive  big-data  storage  databases  nosql 
february 2013 by jm
Basho | Alert Logic Relies on Riak to Support Rapid Growth
'The new [Riak-based] analytics infrastructure performs statistical and correlation processing on all data [...] approximately 5 TB/day. All of this data is processed in real-time as it streams in. [...] Alert Logic’s analytics infrastructure, powered by Riak, achieves performance results of up to 35k operations/second across each node in the cluster – performance that eclipses the existing MySQL deployment by a large margin on single node performance. In real business terms, the initial deployment of the combination of Riak and the analytic infrastructure has allowed Alert Logic to process in real-time 7,500 reports, which previously took 12 hours of dedicated processing every night.'

Twitter discussion here: https://twitter.com/fisherpk/status/294984960849367040 , which notes 'heavily cached SAN storage, 12 core blades and 90% get to put ops', and '3 riak nodes, 12-cores, 30k get heavy riak ops/sec. 8 nodes driving ops to that cluster'. Apparently the use of SAN storage on all nodes is historic, but certainly seems to have produced good iops numbers as an (expensive) side-effect...
iops  riak  basho  ops  systems  alert-logic  storage  nosql  databases 
january 2013 by jm
SSTable and Log Structured Storage: LevelDB
good writeup of LevelDB's native storage formats; the Sorted String Table (SSTable), Log Structured Merge Trees, and Snappy compression
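As a toy illustration of the two structures named above, assuming everything lives in memory: an SSTable is an immutable run of key/value pairs sorted by key and looked up by binary search, and a log-structured merge tree buffers writes in a memtable, flushes it to a new SSTable when full, and reads newest-first. The classes are hypothetical; this is not LevelDB's on-disk format, and it leaves out compaction, the write-ahead log, tombstones and Snappy.

```python
import bisect


class SSTable:
    """Immutable table of key/value pairs, sorted by key."""

    def __init__(self, items):
        pairs = sorted(items.items())
        self.keys = [k for k, _ in pairs]
        self.values = [v for _, v in pairs]

    def get(self, key):
        # Binary search works because the keys are sorted.
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None


class TinyLSM:
    """Memtable plus a list of SSTables, newest last."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.tables = []
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            # Flush: writes only ever create new sorted tables.
            self.tables.append(SSTable(self.memtable))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Newer tables shadow older ones.
        for table in reversed(self.tables):
            value = table.get(key)
            if value is not None:
                return value
        return None
```

Reads may have to consult several tables, which is the cost that background compaction pays down in a real LSM store.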
leveldb  nosql  data  storage  disk  persistence  google 
july 2012 by jm
Goodbye, CouchDB
'From most model-using code, using [Percona] MySQL looks exactly the same as using CouchDB did. Except it’s faster, and the DB basically never fails.'
couchdb  mysql  nosql  databases  storage  percona  via:peakscale 
may 2012 by jm
How we use Redis at Bump
via Simon Willison. some nice ideas here, particularly using a replication slave to handle the potentially latency-impacting disk writes in AOF mode
queueing  redis  nosql  databases  storage  via:simonw  replication  bump 
july 2011 by jm
The MongoDB NoSQL Database Blog - MongoDB live at Craigslist
'MongoDB is now live at Craigslist, where it is being used to archive [10TB] of [old posts]'. iiiinteresting
mongodb  nosql  craigslist  systems 
may 2011 by jm
Foursquare MongoDB outage post mortem
MongoDB was set up to write to RAM if possible, omitting immediate writes to disk -- but then the db size exceeded RAM size, the disk was hit, imposing a massive slowdown and creating a huge backlog immediately, bringing the site down (via Nelson)
via:nelson  mongodb  sharding  nosql  ouch  outage  foursquare  sysadmin  ops  from delicious
october 2010 by jm
We’re Back… so long MongoDB! · Blue74
MongoDB war story -- records going missing, eek
mongodb  mysql  nosql  rant  stability  beta  from delicious
june 2010 by jm
GitHub scheduled maintenance due to Redis upgrade
good comments on the processes useful for large-scale Redis upgrades
upgrades  redis  spof  nosql  databases  github  deployment  from delicious
may 2010 by jm
NoSQL at Twitter (NoSQL EU 2010) [PDF]
specifically, Hadoop and Pig for log/metrics analytics, Cassandra going forward; great preso, lots of detail and code examples. also, impressive number-crunching going on at Twitter
twitter  analytics  cassandra  databases  hadoop  pdf  logs  metrics  number-crunching  nosql  pig  presentation  slides  scribe  from delicious
april 2010 by jm
BlueRunner: Email in the Cloud with Cassandra [PDF]
interesting prez from some IBM researchers on using Cassandra as a mail store, via Jeremy
via:jzawodny  mail  cassandra  database  data  ibm  nosql  performance  presentation  pdf  from delicious
april 2010 by jm
Why I like Redis
Simon Willison plugs Redis as a good datastore for quick-hack scripts with requirements for lots of fast, local data storage -- the kind of thing I'd often use a DB_File for
python  storage  databases  schemaless  nosql  redis  simon-willison  data-store  from delicious
october 2009 by jm
