jm + compaction   2

Taming the Beast: How Scylla Leverages Control Theory to Keep Compactions Under Control - ScyllaDB
This is a really nice illustration of the use of control theory to set tunable thresholds automatically in a complex storage system. Nice work Scylla:

At any given moment, a database like ScyllaDB has to juggle the admission of foreground requests with background processes like compactions, making sure that the incoming workload is not severely disrupted by compactions, nor that the compaction backlog is so big that reads are later penalized.

In this article, we showed that isolation among incoming writes and compactions can be achieved by the Schedulers, yet the database is still left with the task of determining the amount of shares of the resources incoming writes and compactions will use.

Scylla steers away from user-defined tunables in this task, as they shift the burden of operation to the user, complicating operations and being fragile against changing workloads. By borrowing from the strong theoretical background of industrial controllers, we can provide an Autonomous Database that adapts to changing workloads without operator intervention.
scylladb  storage  settings  compaction  automation  thresholds  control-theory  ops  cassandra  feedback 
4 weeks ago by jm
Faster BAM Sorting with SAMtools and RocksDB
Now this is really really clever. Heap-merging a heavyweight genomics format, using RocksDB to speed it up.
There’s a problem with the single-pass merge described above when the number of intermediate files, N/R, is large. Merging the sorted intermediate files in limited memory requires constantly reading little bits from all those files, incurring a lot of disk seeks on rotating drives. In fact, at some point, samtools sort performance becomes effectively bound to disk seeking. [...] In this scenario, samtools rocksort can sort the same data in much less time, using no more memory, by invoking RocksDB’s background compaction capabilities. With a few extra lines of code we configure RocksDB so that, while we’re still in the process of loading the BAM data, it runs additional background threads to merge batches of existing sorted temporary files into fewer, larger, sorted files. Just like the final merge, each background compaction requires only a modest amount of working memory.

(via the RocksDB facebook group)
rocksdb  algorithms  sorting  leveldb  bam  samtools  merging  heaps  compaction 
may 2014 by jm

Copy this bookmark: