jm + stripe   6

Reproducible research: Stripe’s approach to data science
This is intriguing -- using Jupyter notebooks to embody data analysis work and ensure it's reproducible, which brings rigour to analysis in much the same way that unit tests do to code. I must try this.
Reproducibility makes data science at Stripe feel like working on GitHub, where anyone can obtain and extend others’ work. Instead of islands of analysis, we share our research in a central repository of knowledge. This makes it dramatically easier for anyone on our team to work with our data science research, encouraging independent exploration.

We approach our analyses with the same rigor we apply to production code: our reports feel more like finished products, research is fleshed out and easy to understand, and there are clear programmatic steps from start to finish for every analysis.
stripe  coding  data-science  reproducability  science  jupyter  notebooks  analysis  data  experiments 
november 2016 by jm
Service discovery at Stripe
Writeup of their Consul-based service discovery system, which is a bit similar to smartstack. Also a good description of the production problems they saw with Consul, and they figured out that strong consistency isn't actually what you want in a service discovery system ;)

HN comments are good too:
consul  api  microservices  service-discovery  dns  load-balancing  l7  tcp  distcomp  smartstack  stripe  cap-theorem  scalability 
november 2016 by jm
Outage postmortem (2015-10-08 UTC) : Stripe: Help & Support
There was a breakdown in communication between the developer who requested the index migration and the database operator who deleted the old index. Instead of working on the migration together, they communicated in an implicit way through flawed tooling. The dashboard that surfaced the migration request was missing important context: the reason for the requested deletion, the dependency on another index’s creation, and the criticality of the index for API traffic. Indeed, the database operator didn’t have a way to check whether the index had recently been used for a query.

Good demo of how the Etsy-style chatops deployment approach would have helped avoid this risk.
stripe  postmortem  outages  databases  indexes  deployment  chatops  deploy  ops 
october 2015 by jm
Scaling email transparency
This is quite interesting/weird -- Stripe's protocol for mass-CCing email as they scale up the company, based around
communication  culture  email  management  stripe  cc  transparency  civil-inattention 
december 2014 by jm
Game Day Exercises at Stripe: Learning from `kill -9`
We’ve started running game day exercises at Stripe. During a recent game day, we tested failing over a Redis cluster by running kill -9 on its primary node, and ended up losing all data in the cluster. We were very surprised by this, but grateful to have found the problem in testing. This result and others from this exercise convinced us that game days like these are quite valuable, and we would highly recommend them for others.

Excellent post. Game days are a great idea. Also: massive Redis clustering fail
game-days  redis  testing  stripe  outages  ops  kill-9  failover 
october 2014 by jm
Big, Small, Hot or Cold - Your Data Needs a Robust Pipeline
'(Examples [of big-data B-I crunching pipelines] from Stripe, Tapad, Etsy & Square)'
stripe  tapad  etsy  square  big-data  analytics  kafka  impala  hadoop  hdfs  parquet  thrift 
february 2014 by jm
