slides from "Distributed Log-Processing Design Workshop", SRECon Americas 2018
Fantastic presentation on the design criteria used when architecting a large-scale data processing and storage service. Interesting to see some Google terminology, e.g. "dimensioning" -- ballparking the expected scalability numbers, bandwidth, QPS, and limits.
distributed-systems  coding  design  architecture  google  photon  logs  log-storage  slides  srecon 
9 weeks ago by jm
The Yelp Production Engineering Documentation Style Guide
This is great! Also they correctly use the term "runbook" instead of "playbook" :)
Documentation is something that many of us in software and site reliability engineering struggle with – even if we recognize its importance, it can still be a struggle to write it consistently and to write it well. While we in Yelp’s Production Engineering group are no different, over the last few quarters we’ve engaged in a concerted effort to do something about it.

One of the first steps towards changing this process was developing our documentation style guide, something that started out as a Hackathon project late last year. I spoke about it when I was giving my talk on documentation at SRECon EMEA in August, and afterwards, a number of people reached out to ask if they could have a copy.

While what we’re sharing today isn’t our exact style guide – we’ve trimmed out some of the specifics that aren’t really relevant, done a bit of rewording for a more general audience, and added some annotations – it’s essentially the one we’ve been using since the start of this year, with the caveat that it’s a living document and continues to be refined. While this may not be perfect for every team (both at Yelp and elsewhere), it’s helped us raise the bar on our own documentation and provides an example for others to follow.
yelp  pe  sre  ops  engineering  documentation  srecon  chastity-blackwell  processes 
october 2018 by jm

