jm + cassandra   45

Taming the Beast: How Scylla Leverages Control Theory to Keep Compactions Under Control - ScyllaDB
This is a really nice illustration of the use of control theory to set tunable thresholds automatically in a complex storage system. Nice work Scylla:

At any given moment, a database like ScyllaDB has to juggle the admission of foreground requests with background processes like compactions, making sure that the incoming workload is not severely disrupted by compactions, nor that the compaction backlog is so big that reads are later penalized.

In this article, we showed that isolation among incoming writes and compactions can be achieved by the Schedulers, yet the database is still left with the task of determining the amount of shares of the resources incoming writes and compactions will use.

Scylla steers away from user-defined tunables in this task, as they shift the burden of operation to the user, complicating operations and being fragile against changing workloads. By borrowing from the strong theoretical background of industrial controllers, we can provide an Autonomous Database that adapts to changing workloads without operator intervention.
scylladb  storage  settings  compaction  automation  thresholds  control-theory  ops  cassandra  feedback 
5 weeks ago by jm
Cherami: Uber Engineering’s Durable and Scalable Task Queue in Go - Uber Engineering Blog

a competing-consumer messaging queue that is durable, fault-tolerant, highly available and scalable. We achieve durability and fault-tolerance by replicating messages across storage hosts, and high availability by leveraging the append-only property of messaging queues and choosing eventual consistency as our basic model. Cherami is also scalable, as the design does not have single bottleneck. [...]
Cherami is completely written in Go, a language that makes building highly performant and concurrent system software a lot of fun. Additionally, Cherami uses several libraries that Uber has already open sourced: TChannel for RPC and Ringpop for health checking and group membership. Cherami depends on several third-party open source technologies: Cassandra for metadata storage, RocksDB for message storage, and many other third-party Go packages that are available on GitHub. We plan to open source Cherami in the near future.
cherami  uber  queueing  tasks  queues  architecture  scalability  go  cassandra  rocksdb 
december 2016 by jm
Highly Available Counters Using Cassandra
solid discussion of building HA counters using CRDTs and similar eventually-consistent data structures
crdts  algorithms  data-structures  cassandra  ha  counters 
september 2016 by jm
Life360 testimonial for Prometheus
Now this is a BIG thumbs up:
'Prometheus has been known to us for a while, and we have been tracking it and reading about the active development, and at a point (a few months back) we decided to start evaluating it for production use. The PoC results were incredible. The monitoring coverage of MySQL was amazing, and we also loved the JMX monitoring for Cassandra, which had been sorely lacking in the past.'
metrics  monitoring  time-series  prometheus  testimonials  life360  cassandra  jmx  mysql 
march 2016 by jm
Spotify wrote their own metrics store on ElasticSearch and Cassandra. Sounds very similar to Prometheus
cassandra  elasticsearch  spotify  monitoring  metrics  heroic 
december 2015 by jm
Cluster benchmark: Scylla vs Cassandra
ScyllaDB (the C* clone in C++) is now actually looking promising -- still need more reassurance about its consistency/reliabilty side though
scylla  databases  storage  cassandra  nosql 
october 2015 by jm
"Trash Day: Coordinating Garbage Collection in Distributed Systems"
Another GC-coordination strategy, similar to Blade (qv), with some real-world examples using Cassandra
blade  via:adriancolyer  papers  gc  distsys  algorithms  distributed  java  jvm  latency  spark  cassandra 
may 2015 by jm
Cassandra moving to using G1 as the default recommended GC implementation
This is a big indicator that G1 is ready for primetime. CMS has long been the go-to GC for production usage, but requires careful, complex hand-tuning -- if G1 is getting to a stage where it's just a case of giving it enough RAM, that'd be great.

Also, looks like it'll be the JDK9 default:
cassandra  tuning  ops  g1gc  cms  gc  java  jvm  production  performance  memory 
april 2015 by jm
Time Series Metrics with Cassandra
slides from Chris Maxwell of Ubiquiti Networks describing what he had to do to get cyanite on Cassandra handling 30k metrics per second; an experimental "Date-tiered compaction" mode from Spotify was essential from the sounds of it. Very complex :(
cassandra  spotify  date-tiered-compaction  metrics  graphite  cyanite  chris-maxwell  time-series-data 
april 2015 by jm
Cassandra remote code execution hole (CVE-2015-0225)
Ah now lads.
Under its default configuration, Cassandra binds an unauthenticated
JMX/RMI interface to all network interfaces. As RMI is an API for the
transport and remote execution of serialized Java, anyone with access
to this interface can execute arbitrary code as the running user.
cassandra  jmx  rmi  java  ops  security 
april 2015 by jm
TL;DR: Cassandra Java Huge Pages
Al Tobey does some trial runs of -XX:+AlwaysPreTouch and -XX:+UseHugePages
jvm  performance  tuning  huge-pages  vm  ops  cassandra  java 
february 2015 by jm
Personalization at Spotify using Cassandra
Lots and lots of good detail into the Spotify C* setup (via Bill de hOra)
via:dehora  spotify  cassandra  replication  storage  ops 
january 2015 by jm
Hack workaround to get JVM thread priorities working on Linux
As used in Cassandra ( )!
if you just set the "ThreadPriorityPolicy" to something else than the legal values 0 or 1, [...] a slight logic bug in Sun's JVM code kicks in, and thus sets the policy to be as if running with root - thus you get exactly what one desire. The operating system, Linux, won't allow priorities to be heightened above "Normal" (negative nice value), and thus just ignores those requests (setting it to normal instead, nice value 0) - but it lets through the requests to set it lower (setting the nice value to some positive value).
cassandra  thread-priorities  threads  java  jvm  linux  nice  hacks 
january 2015 by jm
Why We Didn’t Use Kafka for a Very Kafka-Shaped Problem
A good story of when Kafka _didn't_ fit the use case:
We came up with a complicated process of app-level replication for our messages into two separate Kafka clusters. We would then do end-to-end checking of the two clusters, detecting dropped messages in each cluster based on messages that weren’t in both.

It was ugly. It was clearly going to be fragile and error-prone. It was going to be a lot of app-level replication and horrible heuristics to see when we were losing messages and at least alert us, even if we couldn’t fix every failure case.

Despite us building a Kafka prototype for our ETL — having an existing investment in it — it just wasn’t going to do what we wanted. And that meant we needed to leave it behind, rewriting the ETL prototype.
cassandra  java  kafka  scala  network-partitions  availability  multi-region  multi-az  aws  replication  onlive 
november 2014 by jm
get page cache statistics for files.
A common question when tuning databases and other IO-intensive applications is, "is Linux caching my data or not?" pcstat gets that information for you using the mincore(2) syscall. I wrote this is so that Apache Cassandra users can see if ssTables are being cached.
linux  page-cache  caching  go  performance  cassandra  ops  mincore  fincore 
september 2014 by jm
Manages migrations for your Cassandra data stores. Pillar grew from a desire to automatically manage Cassandra schema as code. Managing schema as code enables automated build and deployment, a foundational practice for an organization striving to achieve Continuous Delivery.

Pillar is to Cassandra what Rails ActiveRecord migrations or Play Evolutions are to relational databases with one key difference: Pillar is completely independent from any application development framework.
migrations  database  ops  pillar  cassandra  activerecord  scala  continuous-delivery  automation  build 
june 2014 by jm
AWS Case Study: Hailo
Ubuntu, C*, HAProxy, MySQL, RDS, multiple AWS regions.
hailo  cassandra  ubuntu  mysql  rds  aws  ec2  haproxy  architecture 
april 2014 by jm
Migrating from MongoDB to Cassandra
Interesting side-effect of using LUKS for full-disk encryption: 'For every disk read, we were pulling in 3MB of data (RA is sectors, SSZ is sector size, 6144*512=3145728 bytes) into cache. Oops. Not only were we doing tons of extra work, but we were trashing our page cache too. The default for the device-mapper used by LUKS under Ubuntu 12.04LTS is incredibly sub-optimal for database usage, especially our usage of Cassandra (more small random reads vs. large rows). We turned this down to 128 sectors — 64KB.'
cassandra  luks  raid  linux  tuning  ops  blockdev  disks  sdd 
february 2014 by jm
Cassandra: tuning the JVM for read heavy workloads
The cluster we tuned is hosted on AWS and is comprised of 6 hi1.4xlarge EC2 instances, with 2 1TB SSDs raided together in a raid 0 configuration. The cluster’s dataset is growing steadily. At the time of this writing, our dataset is 341GB, up from less than 200GB a few months ago, and is growing by 2-3GB per day. The workload on this cluster is very read heavy, with quorum reads making up 99% of all operations.

Some careful GC tuning here. Probably not applicable to anyone else, but good approach in general.
java  performance  jvm  scaling  gc  tuning  cassandra  ops 
january 2014 by jm
a metric storage daemon, exposing both a carbon listener and a simple web service. Its aim is to become a simple, scalable and drop-in replacement for graphite's backend.

Pretty alpha for now, but definitely worth keeping an eye on to potentially replace our burgeoning Carbon fleet...
graphite  carbon  cassandra  storage  metrics  ops  graphs  service-metrics 
december 2013 by jm
Introducing Chaos to C*
Autoremediation, ie. auto-replacement, of Cassandra nodes in production at Netflix
ops  autoremediation  outages  remediation  cassandra  storage  netflix  chaos-monkey 
october 2013 by jm
The trouble with timestamps
Timestamps, as implemented in Riak, Cassandra, et al, are fundamentally unsafe ordering constructs. In order to guarantee consistency you, the user, must ensure locally monotonic and, to some extent, globally monotonic clocks. This is a hard problem, and NTP does not solve it for you. When wall clocks are not properly coupled to the operations in the system, causal constraints can be violated. To ensure safety properties hold all the time, rather than probabilistically, you need logical clocks.
clocks  time  distributed  databases  distcomp  ntp  via:fanf  aphyr  vector-clocks  last-write-wins  lww  cassandra  riak 
october 2013 by jm
Rapid read protection in Cassandra 2.0.2
Nifty new feature -- if a request takes over the 99th percentile for requests to that server, it'll be repeated against another replica. Unnecessary for Voldemort, of course, which queries all replicas anyway!
cassandra  nosql  replication  distcomp  latency  storage 
october 2013 by jm
Observability at Twitter
Bit of detail into Twitter's TSD metric store.
There are separate online clusters for different data sets: application and operating system metrics, performance critical write-time aggregates, long term archives, and temporal indexes. A typical production instance of the time series database is based on four distinct Cassandra clusters, each responsible for a different dimension (real-time, historical, aggregate, index) due to different performance constraints. These clusters are amongst the largest Cassandra clusters deployed in production today and account for over 500 million individual metric writes per minute. Archival data is stored at a lower resolution for trending and long term analysis, whereas higher resolution data is periodically expired. Aggregation is generally performed at write-time to avoid extra storage operations for metrics that are expected to be immediately consumed. Indexing occurs along several dimensions–service, source, and metric names–to give users some flexibility in finding relevant data.
twitter  monitoring  metrics  service-metrics  tsd  time-series  storage  architecture  cassandra 
september 2013 by jm
Blueflood by rackerlabs
Rackspace's large-scale TSD storage system, built on Cassandra, Java, ASL2
cassandra  tsd  storage  time-series  data  open-source  java  rackspace 
september 2013 by jm
Instagram: Making the Switch to Cassandra from Redis, a 75% 'Insta' Savings
shifting data out of RAM and onto SSDs -- unsurprisingly, big savings.
a 12 node cluster of EC2 hi1.4xlarge instances; we store around 1.2TB of data across this cluster. At peak, we're doing around 20,000 writes per second to that specific cluster and around 15,000 reads per second. We've been really impressed with how well Cassandra has been able to drop into that role.
ram  ssd  cassandra  databases  nosql  redis  instagram  storage  ec2 
june 2013 by jm
CouchDB: not drinking the kool-aid
Jonathan Ellis on some CouchDB negatives:
Here are some reasons you should think twice and do careful testing before using CouchDB in a non-toy project:
Writes are serialized.  Not serialized as in the isolation level, serialized as in there can only be one write active at a time.  Want to spread writes across multiple disks?  Sorry.
CouchDB uses a MVCC model, which means that updates and deletes need to be compacted for the space to be made available to new writes.  Just like PostgreSQL, only without the man-years of effort to make vacuum hurt less.
CouchDB is simple.  Gloriously simple.  Why is that a negative?  It's competing with systems (in the popular imagination, if not in its author's mind) that have been maturing for years.  The reason PostgreSQL et al have those features is because people want them.  And if you don't, you should at least ask a DBA with a few years of non-MySQL experience what you'll be missing.  The majority of CouchDB fans don't appear to really understand what a good relational database gives them, just as a lot of PHP programmers don't get what the big deal is with namespaces.
A special case of simplicity deserves mention: nontrivial queries must be created as a view with mapreduce.  MapReduce is a great approach to trivially parallelizing certain classes of problem.  The problem is, it's tedious and error-prone to write raw MapReduce code.  This is why Google and Yahoo have both created high-level languages on top of it (Sawzall and Pig, respectively).  Poor SQL; even with DSLs being the new hotness, people forget that SQL is one of the original domain-specific languages.  It's a little verbose, and you might be bored with it, but it's much better than writing low-level mapreduce code.
cassandra  couch  nosql  storage  distributed  databases  consistency 
april 2013 by jm
Metric Collection and Storage with Cassandra | DataStax
DataStax' documentation on how they store TSD data in Cass. Pretty generic
datastax  nosql  metrics  analytics  cassandra  tsd  time-series  storage 
march 2013 by jm
Monitoring Apache Hadoop, Cassandra and Zookeeper using Graphite and JMXTrans
nice enough, but a lot of moving parts. It would be nice to see a simpler ZK+Graphite setup using the 'mntr' verb
graphite  monitoring  ops  zookeeper  cassandra  hadoop  jmx  jmxtrans  graphs 
march 2013 by jm
Netflix Queue: Data migration for a high volume web application
There will come a time in the life of most systems serving data, when there is a need to migrate data to [another] data store while maintaining or improving data consistency, latency and efficiency. This document explains the data migration technique we used at Netflix to migrate the user’s queue data between two different distributed NoSQL storage systems [SimpleDB to Cassandra].
cassandra  netflix  migrations  data  schema  simpledb  storage 
march 2013 by jm
Big Data Analytics at Netflix. Interview with Christos Kalantzis and Jason Brown.
Good interview with the Cassandra guys at Netflix, and some top Mongo-bashing in the comments
cassandra  netflix  user-stories  testimonials  nosql  storage  ec2  mongodb 
february 2013 by jm
Cassandra, Hive, and Hadoop: How We Picked Our Analytics Stack
reasonably good whole-stack performance testing and analysis; HBase, Riak, MongoDB, and Cassandra compared. Riak did pretty badly :(
riak  mongodb  cassandra  hbase  performance  analytics  hadoop  hive  big-data  storage  databases  nosql 
february 2013 by jm
Cloud Architecture Tutorial - Platform Component Architecture (2of3)
Amazing stuff from Adrian Cockroft at last week's QCon. Faceted object model, lots of Cassandra automation
cassandra  api  design  oo  object-model  java  adrian-cockroft  slides  qcon  scaling  aws  netflix 
march 2012 by jm
Benchmarking Cassandra Scalability on AWS - Over a million writes per second
NetFlix' benchmarks -- impressively detailed. '48, 96, 144 and 288 instances', across 3 EC2 AZs in us-east, successfully scaling linearly
ec2  aws  cassandra  scaling  benchmarks  netflix  performance 
november 2011 by jm
NoSQL at Twitter (NoSQL EU 2010) [PDF]
specifically, Hadoop and Pig for log/metrics analytics, Cassandra going forward; great preso, lots of detail and code examples. also, impressive number-crunching going on at Twitter
twitter  analytics  cassandra  databases  hadoop  pdf  logs  metrics  number-crunching  nosql  pig  presentation  slides  scribe  from delicious
april 2010 by jm
BlueRunner: Email in the Cloud with Cassandra [PDF]
interesting prez from some IBM researchers on using Cassandra as a mail store, via Jeremy
via:jzawodny  mail  cassandra  database  data  ibm  nosql  performance  presentation  pdf  from delicious
april 2010 by jm

related tags

activerecord  adrian-cockroft  algorithms  analytics  aphyr  api  architecture  automation  autoremediation  availability  aws  b-trees  batch  benchmarks  big-data  blade  blockdev  bloom-filters  bugs  build  caching  carbon  cardinality  cassandra  chaos-monkey  cherami  chris-maxwell  clocks  cms  compaction  concurrency  configuration  consistency  continuous-delivery  control-theory  couch  counters  crdts  cyanite  data  data-structures  database  databases  datastax  date-tiered-compaction  design  disks  disruptor  distcomp  distributed  distsys  dynamodb  ec2  elasticsearch  eventual-consistency  exception-handling  failure  fault-tolerance  feedback  fincore  g1gc  gc  go  graphite  graphs  ha  hacker-news  hacks  hadoop  hailo  haproxy  hbase  hdfs  heroic  hive  hll  huge-pages  hyperloglog  ibm  instagram  java  jmx  jmxtrans  jvm  kafka  last-write-wins  latency  life360  linux  logs  luks  lww  mail  manhattan  mapreduce  memory  metrics  migrations  mincore  mongodb  monitoring  multi-az  multi-region  mysql  netflix  network-partitions  nice  nosql  ntp  number-crunching  object-model  onlive  oo  open-source  ops  outages  p99  page-cache  papers  pdf  performance  pig  pillar  presentation  production  prometheus  qcon  queueing  queues  race-conditions  rackspace  raid  ram  rds  redis  reliability  remediation  replication  riak  rmi  rocksdb  scala  scalability  scale  scaling  schema  scribe  scylla  scylladb  sdd  security  service-metrics  settings  simpledb  slides  spark  spark-streaming  speed  spotify  ssd  sstables  startup  storage  tasks  tellybug  testimonials  thread-priorities  threads  thresholds  time  time-series  time-series-data  tsd  tuning  twitter  uber  ubuntu  user-stories  vector-clocks  via:adriancolyer  via:dehora  via:fanf  via:jzawodny  via:kellabyte  vm  voldemort  zookeeper 

Copy this bookmark: