Outage postmortem (2015-10-08 UTC): Stripe: Help & Support
There was a breakdown in communication between the developer who requested the index migration and the database operator who deleted the old index. Instead of working on the migration together, they communicated in an implicit way through flawed tooling. The dashboard that surfaced the migration request was missing important context: the reason for the requested deletion, the dependency on another index’s creation, and the criticality of the index for API traffic. Indeed, the database operator didn’t have a way to check whether the index had recently been used for a query.

Good demo of how the Etsy-style chatops deployment approach would have helped avoid this risk.
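A minimal sketch of the kind of pre-deletion check the quoted passage says was missing. This is not Stripe's tooling: the `IndexDropRequest` fields, the `blockers` function, and the 7-day quiet period are all hypothetical, chosen only to mirror the three pieces of context the postmortem lists (reason, dependency on another index, criticality) plus the "recently used" check.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional, Set


@dataclass
class IndexDropRequest:
    """Hypothetical migration request; every field name is illustrative."""
    index_name: str
    reason: str                                           # why the index should go
    depends_on: List[str] = field(default_factory=list)   # indexes that must exist first
    serves_api_traffic: bool = False                      # criticality hint for the operator


def blockers(
    request: IndexDropRequest,
    existing_indexes: Set[str],
    last_used: Optional[datetime],
    now: datetime,
    quiet_period: timedelta = timedelta(days=7),          # assumed threshold
) -> List[str]:
    """Return reasons the drop should not proceed; an empty list means no objection."""
    problems = []
    if not request.reason.strip():
        problems.append("no reason given for the deletion")
    for dep in request.depends_on:
        if dep not in existing_indexes:
            problems.append(f"replacement index {dep!r} does not exist yet")
    if last_used is not None and now - last_used < quiet_period:
        problems.append(f"{request.index_name!r} served a query within the quiet period")
    if request.serves_api_traffic and problems:
        problems.append("index is critical for API traffic; escalate to the requester")
    return problems
```

How `last_used` would be obtained depends on the database in use; the sketch simply takes it as an input.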
stripe  postmortem  outages  databases  indexes  deployment  chatops  deploy  ops 
october 2015 by jm
Etsy's Release Management process
Good info on how Etsy use their Deployinator tool, end-to-end.

Slide 11: git SHA is visible for each env, allowing easy verification of what code is deployed.

Slide 14: Code is deployed to "princess" staging env while CI tests are running; no need to wait for unit/CI tests to complete.

Slide 23: smoke tests of pre-prod "princess" (complete after 8 mins elapsed).

Slide 31: dashboard link for deployed code is posted during deploy; post-release prod smoke tests are run by Jenkins. (short ones! they complete in 42 seconds)
deployment  etsy  deploy  deployinator  princess  staging  ops  testing  devops  smoke-tests  production  jenkins 
april 2015 by jm
'Continuous Deployment: The Dirty Details'
Good slide deck from Etsy's Mike Brittain regarding their CD setup. Some interesting little-known details:

Slide 41: database schema changes are not CD'd -- they go out on "Schema change Thursdays".

Slide 44: only the webapp is CD'd -- PHP, Apache, memcache components (support and back-office tools, developer API, gearman async worker queues). The external "services" are not -- databases, Solr/JVM search (rolling restarts), photo storage (filters, proxy cache, S3), payments (PCI-DSS, controlled access).

They avoid schema changes and breaking changes using an approach they call "non-breaking expansions" -- expose new version in a service interface; support multiple versions in the consumer. Example from slides 50-63, based around a database schema migration.
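A rough illustration of the "support multiple versions in the consumer" idea, not the example from the slides. The column names (`first_name`, `last_name`, `display_name`) and both functions are hypothetical; the point is only that readers accept old and new row shapes while writers start emitting the new shape behind a switch.

```python
def write_user(first_name: str, last_name: str, write_new: bool) -> dict:
    """Producer: always writes the old columns; adds the new column once enabled."""
    row = {"first_name": first_name, "last_name": last_name}
    if write_new:
        # Expansion step: the new column is added alongside the old ones,
        # so consumers that only know the old shape keep working.
        row["display_name"] = f"{first_name} {last_name}".strip()
    return row


def read_display_name(row: dict) -> str:
    """Consumer: tolerates both schema versions while the migration is in flight."""
    # Prefer the new column when it is present and populated.
    if row.get("display_name") is not None:
        return row["display_name"]
    # Fall back to the old columns for rows written before the expansion.
    return f"{row['first_name']} {row['last_name']}".strip()
```

Once every row carries the new column and no consumer reads the old ones, the old columns can be removed in a later, separate change.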

Slide 66: "dev flags" (rollout oriented) are promoted to "feature flags" (long lived degradation control).

Slide 71: some architectural philosophies: deploying is cheap; releasing is cheap; gathering data should be cheap too; treat first iterations as experiments.

Slide 102: "Canary pools". They have multiple pools of users for testing in production -- the staff pool, users who have opted in to see prototypes/beta stuff, 0-100% gradual phased rollout.
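The three pools could be modelled roughly as follows. This is a sketch under assumptions, not Etsy's implementation: the function names, the keyword arguments, and the use of a SHA-256 hash to assign stable buckets are all illustrative choices.

```python
import hashlib


def rollout_bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket in 0..99 for a given feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(
    user_id: str,
    feature: str,
    *,
    is_staff: bool = False,
    opted_in: bool = False,
    staff_on: bool = False,   # feature switched on for the staff pool
    beta_on: bool = False,    # feature switched on for opted-in users
    percent: int = 0,         # gradual rollout, 0-100
) -> bool:
    """Decide whether a user sees a feature, checking the pools in order."""
    if is_staff and staff_on:
        return True
    if opted_in and beta_on:
        return True
    # A user's bucket never changes, so raising `percent` only ever adds users.
    return rollout_bucket(user_id, feature) < percent
```

Hashing the feature name together with the user ID means different features get different early-access groups, rather than the same users always going first.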
cd  deploy  etsy  slides  migrations  database  schema  ops  ci  version-control  feature-flags 
april 2015 by jm
Rails migrations with no downtime
Ugh, Rails fail. Dropping a column from a Rails-managed table without downtime is impossible, even if nothing in the code accesses it (!!), unless you resort to ugly hacks that don't even seem to work on recent versions of ActiveRecord.
activerecord  deploy  migrations  rails  ruby  sql  fail  downtime 
november 2014 by jm
Test-Driven Infrastructure with Chef
Interesting idea.
The book introduces “Infrastructure as Code,” test-driven development, Chef, and cucumber-chef, and then proceeds to a simple example using Chef to provision a shared Linux server. The recipes for the server are developed test-first, demonstrating both the technique and the workflow.
tdd  chef  server  provisioning  build  deploy  linux  coding  ops  sysadmin 
march 2013 by jm
