jm + complexity   7

'STELLA Report from the SNAFUcatchers Workshop on Coping With Complexity', March 14-16 2017
'A consortium workshop of high end techs reviewed postmortems to better understand how engineers cope with the complexity of anomalies (SNAFU and SNAFU catching episodes) and how to support them. These cases reveal common themes regarding factors that produce resilient performances. The themes that emerge also highlight opportunities to move forward.'

The 'Dark debt' concept is interesting here.
complexity  postmortems  dark-debt  technical-debt  resilience  reliability  systems  snafu  reports  toread  stella  john-allspaw 
4 days ago by jm
Taming Complexity with Reversibility
This is a great post from Kent Beck, putting a lot of recent deployment/rollout patterns in a clear context -- that of supporting "reversibility":
Development servers. Each engineer has their own copy of the entire site. Engineers can make a change, see the consequences, and reverse the change in seconds without affecting anyone else.
Code review. Engineers can propose a change, get feedback, and improve or abandon it in minutes or hours, all before affecting any people using Facebook.
Internal usage. Engineers can make a change, get feedback from thousands of employees using the change, and roll it back in an hour.
Staged rollout. We can begin deploying a change to a billion people and, if the metrics tank, take it back before problems affect most people using Facebook.
Dynamic configuration. If an engineer has planned for it in the code, we can turn off an offending feature in production in seconds. Alternatively, we can dial features up and down in tiny increments (i.e. only 0.1% of people see the feature) to discover and avoid non-linear effects.
Correlation. Our correlation tools let us easily see the unexpected consequences of features so we know to turn them off even when those consequences aren't obvious.
IRC. We can roll out features potentially affecting our ability to communicate internally via Facebook because we have uncorrelated communication channels like IRC and phones.
Right hand side units. We can add a little bit of functionality to the website and turn it on and off in seconds, all without interfering with people's primary interaction with NewsFeed.
Shadow production. We can experiment with new services under real load, from a tiny trickle to the whole flood, without affecting production.
Frequent pushes. Reversing some changes require a code change. On the website we never more than eight hours from the next schedule code push (minutes if a fix is urgent and you are willing to compensate Release Engineering). The time frame for code reversibility on the mobile applications is longer, but the downward trend is clear from six weeks to four to (currently) two.
Data-informed decisions. (Thanks to Dave Cleal) Data-informed decisions are inherently reversible (with the exceptions noted below). "We expect this feature to affect this metric. If it doesn't, it's gone."
Advance countries. We can roll a feature out to a whole country, generate accurate feedback, and roll it back without affecting most of the people using Facebook.
Soft launches. When we roll out a feature or application with a minimum of fanfare it can be pulled back with a minimum of public attention.
Double write/bulk migrate/double read. Even as fundamental a decision as storage format is reversible if we follow this format: start writing all new data to the new data store, migrate all the old data, then start reading from the new data store in parallel with the old.


We do a bunch of these in work, and the rest are on the to-do list. +1 to these!
software  deployment  complexity  systems  facebook  reversibility  dark-releases  releases  ops  cd  migration 
july 2015 by jm
Is Docker ready for production? Feedbacks of a 2 weeks hands on
I have to agree with this assessment -- there are a lot of loose ends still for production use of Docker in a SOA stack environment:
From my point of view, Docker is probably the best thing I’ve seen in ages to automate a build. It allows to pre build and reuse shared dependencies, ensuring they’re up to date and reducing your build time. It avoids you to either pollute your Jenkins environment or boot a costly and slow Virtualbox virtual machine using Vagrant. But I don’t feel like it’s production ready in a complex environment, because it adds too much complexity. And I’m not even sure that’s what it was designed for.
docker  complexity  devops  ops  production  deployment  soa  web-services  provisioning  networking  logging 
october 2014 by jm
The End of Linux
'Linux is becoming the thing that we adopted Linux to get away from.'

Great post on the horrible complexity of systemd. It reminds me of nothing more than mid-90s AIX, which I had the displeasure of opsing for a while -- the Linux distros have taken a very wrong turn here.
linux  unix  complexity  compatibility  ops  rant  systemd  bloat  aix 
september 2014 by jm
Excellent Rob Pike quote about algorithmic complexity
'Fancy algorithms are slow when n is small, and n is usually small.' -- Rob Pike


Been there, bought the t-shirt ;)
rob-pike  quotes  algorithms  big-o  complexity  coding 
september 2013 by jm
Scala 2.8 Collections API -- Performance Characteristics
wow. Every library vending a set of collection types should have a page like this
collections  scala  performance  reference  complexity  big-o  coding 
january 2013 by jm

Copy this bookmark:



description:


tags: