jm + database   23

When Boring is Awesome: Building a scalable time-series database on PostgreSQL
Nice. we built something along these lines atop MySQL before -- partitioning by timestamp is the key. (via Nelson)
database  postgresql  postgres  timeseries  tsd  storage  state  via:nelson 
21 days ago by jm
Why Uber Engineering Switched from Postgres to MySQL
Uber bringing the smackdown for the HN postgres fanclub, with some juicy technical details of issues that caused them pain. FWIW, I was bitten by crappy postgres behaviour in the past (specifically around vacuuming and pgbouncer), so I've long been a MySQL fan ;)
database  mysql  postgres  postgresql  uber  architecture  storage  sql 
july 2016 by jm
ClickHouse — open-source distributed column-oriented DBMS
'ClickHouse manages extremely large volumes of data in a stable and sustainable manner. It currently powers Yandex.Metrica, world’s second largest web analytics platform, with over 13 trillion database records and over 20 billion events a day, generating customized reports on-the-fly, directly from non-aggregated data. This system was successfully implemented at CERN’s LHCb experiment to store and process metadata on 10bn events with over 1000 attributes per event registered in 2011.'

Yandex-tastic, but still looks really interesting
yandex  analytics  database  storage  sql  clickhouse 
june 2016 by jm
Visual Representation of SQL Joins
useful bookmark to have (via Nelson)
sql  joins  mysql  reference  database 
march 2016 by jm
5 subtle ways you're using MySQL as a queue, and why it'll bite you
Excellent post from Percona. I particularly like that they don't just say "don't use MySQL" -- they give good advice on how it can be made work: 1) avoid polling; 2) avoid locking; and 3) avoid storing your queue in the same table as other data.
database  mysql  queueing  queue  messaging  percona  rds  locking  sql  architecture 
january 2016 by jm
'Continuous Deployment: The Dirty Details'
Good slide deck from Etsy's Mike Brittain regarding their CD setup. Some interesting little-known details:

Slide 41: database schema changes are not CD'd -- they go out on "Schema change Thursdays".

Slide 44: only the webapp is CD'd -- PHP, Apache, memcache components (, support and back-office tools, developer API, gearman async worker queues). The external "services" are not -- databases, Solr/JVM search (rolling restarts), photo storage (filters, proxy cache, S3), payments (PCI-DSS, controlled access).

They avoid schema changes and breaking changes using an approach they call "non-breaking expansions" -- expose new version in a service interface; support multiple versions in the consumer. Example from slides 50-63, based around a database schema migration.

Slide 66: "dev flags" (rollout oriented) are promoted to "feature flags" (long lived degradation control).

Slide 71: some architectural philosophies: deploying is cheap; releasing is cheap; gathering data should be cheap too; treat first iterations as experiments.

Slide 102: "Canary pools". They have multiple pools of users for testing in production -- the staff pool, users who have opted in to see prototypes/beta stuff, 0-100% gradual phased rollout.
cd  deploy  etsy  slides  migrations  database  schema  ops  ci  version-control  feature-flags 
april 2015 by jm
The Large Hadron Migrator is a tool to perform live database migrations in a Rails app without locking.

The basic idea is to perform the migration online while the system is live, without locking the table. In contrast to OAK and the facebook tool, we only use a copy table and triggers. The Large Hadron is a test driven Ruby solution which can easily be dropped into an ActiveRecord or DataMapper migration. It presumes a single auto incremented numerical primary key called id as per the Rails convention. Unlike the twitter solution, it does not require the presence of an indexed updated_at column.
migrations  database  sql  ops  mysql  rails  ruby  lhm  soundcloud  activerecord 
march 2015 by jm
Manages migrations for your Cassandra data stores. Pillar grew from a desire to automatically manage Cassandra schema as code. Managing schema as code enables automated build and deployment, a foundational practice for an organization striving to achieve Continuous Delivery.

Pillar is to Cassandra what Rails ActiveRecord migrations or Play Evolutions are to relational databases with one key difference: Pillar is completely independent from any application development framework.
migrations  database  ops  pillar  cassandra  activerecord  scala  continuous-delivery  automation  build 
june 2014 by jm
Database Migrations Done Right
The rule is simple. You should never tie database migrations to application deploys or vice versa. By minimising dependencies you enable faster, easier and cleaner deployments.

A solid description of why this is a good idea, from an ex-Guardian dev.
migrations  database  sql  mysql  postgres  deployment  ops  dependencies  loose-coupling 
may 2014 by jm
Kelly "kellabyte" Sommers on Redis' "relaxed CP" approach to the CAP theorem

Similar to ACID properties, if you partially provide properties it means the user has to _still_ consider in their application that the property doesn't exist, because sometimes it doesn't. In you're fsync example, if fsync is relaxed and there are no replicas, you cannot consider the database durable, just like you can't consider Redis a CP system. It can't be counted on for guarantees to be delivered. This is why I say these systems are hard for users to reason about. Systems that partially offer guarantees require in-depth knowledge of the nuances to properly use the tool. Systems that explicitly make the trade-offs in the designs are easier to reason about because it is more obvious and _predictable_.
kellabyte  redis  cp  ap  cap-theorem  consistency  outages  reliability  ops  database  storage  distcomp 
december 2013 by jm
Non-blocking transactional atomicity
interesting new distributed atomic transaction algorithm from Peter Bailis
algorithms  database  distributed  scalability  storage  peter-bailis  distcomp 
october 2013 by jm
The CAP FAQ by henryr
No subject appears to be more controversial to distributed systems engineers than the oft-quoted, oft-misunderstood CAP theorem. The purpose of this FAQ is to explain what is known about CAP, so as to help those new to the theorem get up to speed quickly, and to settle some common misconceptions or points of disagreement.
database  distributed  nosql  cap  consistency  cap-theorem  faqs 
june 2013 by jm
Project Voldemort at Gilt Groupe: When Failure Isn't an Option [slides]
Geir Magnusson explains how Gilt Groupe is using Project Voldemort to scale out their e-commerce transactional system. The initial SQL solution had to be replaced because it could not handle the transactional spikes the site is experiencing daily due to its particular way of selling their inventory: each day at noon. Magnusson explains why they chose Voldemort and talks about the architecture.

via Filippo
via:filippo  database  architecture  nosql  data  voldemort  gilt-groupe  ops  storage  presentations 
april 2013 by jm
The Bw-Tree: A B-tree for New Hardware - Microsoft Research
The emergence of new hardware and platforms has led to reconsideration of how data management systems are designed. However, certain basic functions such as key indexed access to records remain essential. While we exploit the common architectural layering of prior systems, we make radically new design decisions about each layer. Our new form of B tree, called the Bw-tree achieves its very high performance via a latch-free approach that effectively exploits the processor caches of modern multi-core chips. Our storage manager uses a unique form of log structuring that blurs the distinction between a page and a record store and works well with flash storage. This paper describes the architecture and algorithms for the Bw-tree, focusing on the main memory aspects. The paper includes results of our experiments that demonstrate that this fresh approach produces outstanding performance.
bw-trees  database  paper  toread  research  algorithms  microsoft  sql  sql-server  b-trees  data-structures  storage  cache-friendly  mechanical-sympathy 
april 2013 by jm
Online Schema Change for MySQL
A tool written by Facebook to ease the pain of online MySQL schema-change migrations.
Some ALTER TABLE statements take too long form the perspective of some MySQL users. The fast index create feature for the InnoDB plugin in MySQL 5.1 makes this less of an issue but this can still take minutes to hours for a large table and for some MySQL deployments that is too long.
A workaround is to perform the change on a slave first and then promote the slave to be the new master. But this requires a slave located near the master. MySQL 5.0 added support for triggers and some replication systems have been built using triggers to capture row changes. Why not use triggers for this? The openarkkit toolkit did just that with oak-online-alter-table. We have published our version of an online schema change utility (OnlineSchemaChange.php aka OSC).
facebook  mysql  sql  schema  database  migrations  ops  alter-table 
march 2013 by jm
Two Sides For Salvation « Code as Craft
Etsy's MySQL master-master pair configuration, and how it allows no-downtime schema changes
database  etsy  mysql  replication  schema  availability  downtime 
december 2012 by jm
Spanner: Google's Globally-Distributed Database [PDF]

Abstract: Spanner is Google's scalable, multi-version, globally-distributed, and synchronously-replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. This paper describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty. This API and its implementation are critical to supporting external consistency and a variety of powerful features: non-blocking reads in the past, lock-free read-only transactions, and atomic schema changes, across all of Spanner.

To appear in:
OSDI'12: Tenth Symposium on Operating System Design and Implementation, Hollywood, CA, October, 2012.
database  distributed  google  papers  toread  pdf  scalability  distcomp  transactions  cap  consistency 
september 2012 by jm
BlueRunner: Email in the Cloud with Cassandra [PDF]
interesting prez from some IBM researchers on using Cassandra as a mail store, via Jeremy
via:jzawodny  mail  cassandra  database  data  ibm  nosql  performance  presentation  pdf  from delicious
april 2010 by jm
UK company selling "have you been phished" check using stolen data
according to this, a retired cop has set up a company called Lucid Intelligence with 'the records of four million Britons, and 40 million people worldwide, mostly Americans', and plans to 'charge members of the public for access to his database to check whether their data security has been breached.' How is this legal under Data Protection law? wtf
privacy  uk  law  hacking  phishing  fraud  crime  police  database  identity-theft  lucid-intelligence  data-protection  security  colin-holder 
july 2009 by jm

related tags

activerecord  algorithms  alter-table  analytics  ap  architecture  automation  availability  b-trees  build  bw-trees  cache-friendly  cap  cap-theorem  cassandra  cd  ci  clickhouse  colin-holder  consistency  continuous-delivery  cp  crime  data  data-protection  data-structures  database  dependencies  deploy  deployment  distcomp  distributed  downtime  etsy  eventual-consistency  facebook  faqs  feature-flags  fraud  gilt-groupe  gizzard  google  hacker-news  hacking  ibm  identity-theft  innodb  joins  kellabyte  law  lhm  locking  loose-coupling  lucid-intelligence  mail  manhattan  mechanical-sympathy  messaging  microsoft  migrations  mysql  nosql  ops  outages  paper  papers  pdf  percona  performance  peter-bailis  phishing  pillar  police  postgres  postgresql  presentation  presentations  privacy  queue  queueing  rails  rds  redis  reference  reliability  replication  research  riak  ruby  scala  scalability  schema  security  sharding  slides  soundcloud  sql  sql-server  state  storage  time-series  timeseries  toread  transactions  tsd  twitter  uber  uk  version-control  via:filippo  via:jzawodny  via:nelson  voldemort  yandex 

Copy this bookmark: