jm + sre   8

Annotated tenets of SRE
A Google SRE annotates the Google SRE book with his own thoughts. The source material is great, but the commentary definitely improves it.

Particularly good for the error budget concept.

Also: when did "runbooks" become "playbooks"? Don't particularly care either way, but needless renaming is annoying.
runbooks  playbooks  ops  google  sre  error-budget 
18 days ago by jm
Google - Site Reliability Engineering
The Google SRE book is now online, for free
sre  google  ops  books  reading 
7 weeks ago by jm
'Jupiter rising: A decade of Clos topologies and centralized control in Google’s datacenter networks'
Love the 'decade of' dig at FB and Amazon -- 'we were doing it first' ;)

Great details on how Google have built out and improved their DC networking. Includes a hint that they now use DCTCP (datacenter-optimized TCP congestion control) on their internal hosts....
datacenter  google  presentation  networks  networking  via:irldexter  ops  sre  clos-networks  fabrics  switching  history  datacenters 
october 2016 by jm
My Philosophy on Alerting
'based on my observations while I was a Site Reliability Engineer at Google', courtesy of Rob Ewaschuk <rob@infinitepigeons.org>. Seems pretty reasonable.
monitoring  sysadmin  alerting  alerts  nagios  pager  ops  sre  rob-ewaschuk 
july 2016 by jm
Review: Site Reliability Engineering
John "lusis" Vincent reviews the SRE book, not 100% positively
sre  books  reading  reviews  lusis 
april 2016 by jm
Dan Luu reviews the Site Reliability Engineering book
voluminous! still looks great, looking forward to reading our copy (via Tony Finch)
via:fanf  books  reading  devops  ops  google  sre  dan-luu 
april 2016 by jm
Wired on the new O'Reilly SRE book
"Site Reliability Engineering: How Google Runs Production Systems", by Chris Jones, Betsy Beyer, Niall Richard Murphy, Jennifer Petoff. Go Niall!
google  sre  niall-murphy  ops  devops  oreilly  books  toread  reviews 
april 2016 by jm
interview with Google VP of SRE Ben Treynor
interviewed by Niall Murphy, no less ;). Some good info on what Google deems important from an ops/SRE perspective
sre  ops  devops  google  monitoring  interviews  ben-treynor 
may 2014 by jm
