jm + processes   5

The Yelp Production Engineering Documentation Style Guide
This is great! Also they correctly use the term "runbook" instead of "playbook" :)
Documentation is something that many of us in software and site reliability engineering struggle with – even if we recognize its importance, it can still be a struggle to write it consistently and to write it well. While we in Yelp’s Production Engineering group are no different, over the last few quarters we’ve engaged in a concerted effort to do something about it.

One of the first steps towards changing this process was developing our documentation style guide, something that started out as a Hackathon project late last year. I spoke about it when I was giving my talk on documentation at SRECon EMEA in August, and afterwards, a number of people reached out to ask if they could have a copy.

While what we’re sharing today isn’t our exact style guide – we’ve trimmed out some of the specifics that aren’t really relevant, done a bit of rewording for a more general audience, and added some annotations – it’s essentially the one we’ve been using since the start of this year, with the caveat that it’s a living document and continues to be refined. While this may not be perfect for every team (both at Yelp and elsewhere), it’s helped us raise the bar on our own documentation and provides an example for others to follow.
yelp  pe  sre  ops  engineering  documentation  srecon  chastity-blackwell  processes 
6 weeks ago by jm
How To Measure the Working Set Size on Linux
A nifty metric:
The Working Set Size (WSS) is how much memory an application needs to keep working. Your app may have populated 100 Gbytes of main memory, but only uses 50 Mbytes each second to do its job. That's the working set size. It is used for capacity planning and scalability analysis.

You may never have seen WSS measured by any tool (I haven't either). OSes usually show you virtual memory and resident memory, shown as the "VIRT" and "RES" columns in top. Resident memory is real memory: main memory that has been allocated and page mapped. But we don't know how much of that is in heavy use, which is what WSS tells us.

In this post I'll introduce some new things I've developed for WSS estimation: two Linux tools, and WSS profile charts. The tools use either the referenced or the idle page flags to measure a page-based WSS, and were developed out of necessity for another performance problem.
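A rough sketch of the referenced-page-flag approach mentioned in the excerpt, not the author's actual tools: clear the referenced bits through `/proc/<pid>/clear_refs`, wait for an interval, then sum the `Referenced:` fields in `/proc/<pid>/smaps`. The function names `referenced_kb` and `estimate_wss_kb` are made up for illustration. Writing to `clear_refs` normally requires owning the target process or being root, and resetting the bits can perturb the kernel's page reclaim decisions, so treat this as an experiment rather than something to run casually in production.

```python
import time


def referenced_kb(smaps_text):
    """Sum the 'Referenced:' fields (in kB) across all mappings in smaps text."""
    total = 0
    for line in smaps_text.splitlines():
        if line.startswith("Referenced:"):
            # Lines look like: "Referenced:          40 kB"
            total += int(line.split()[1])
    return total


def estimate_wss_kb(pid, interval=1.0):
    """Estimate the page-based WSS of `pid` over `interval` seconds (Linux only).

    Hypothetical helper: resets the referenced flags, sleeps, then reads back
    how much memory was touched in the meantime.
    """
    # Writing "1" clears the referenced/accessed bits on all of the
    # process's mapped pages.
    with open(f"/proc/{pid}/clear_refs", "w") as f:
        f.write("1")

    time.sleep(interval)

    # Pages touched since the reset are counted under Referenced: again.
    with open(f"/proc/{pid}/smaps") as f:
        return referenced_kb(f.read())
```

For example, `estimate_wss_kb(os.getpid(), 0.5)` (after `import os`) would report roughly how many kB the current process touched over half a second, assuming a Linux host that exposes these `/proc` files.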


(via Amy Tobey)
via:amytobey  memory  linux  rss  wss  proc  ps  processes  metrics  working-set-size  ram 
january 2018 by jm
How Completely Messed Up Practices Become Normal
On Normalization of Deviance, with a few anecdotes from Silicon Valley. “The gradual process through which unacceptable practice or standards become acceptable. As the deviant behavior is repeated without catastrophic results, it becomes the social norm for the organization.”
normalization-of-deviance  deviance  bugs  culture  ops  reliability  work  workplaces  processes  norms 
december 2015 by jm
Peek and poke in the age of Linux
Neat demo of using ptrace to inject into a running process, just like the good old days ;)
Some time ago I ran into a production issue where the init process (upstart) stopped behaving properly. Specifically, instead of spawning new processes, it deadlocked in a transitional state. [...] What’s worse, upstart doesn’t allow forcing a state transition and trying to manually create and send DBus events didn’t help either. That meant the sane options we were left with were:
- restart the host (not desirable at all in that scenario);
- start the process manually and hope auto-respawn will not be needed.
Of course there are also some insane options. Why not cheat like in the old times and just PEEK and POKE the process in the right places? The solution used at the time involved a very ugly script driving gdb which probably summoned satan in some edge cases. But the edge cases were not hit, and the majority of hosts recovered without issues.
debugging  memory  linux  upstart  peek  poke  ptrace  gdb  processes  hacks 
march 2013 by jm
