cleskowsky + goodops   82

We Re-Launched The New York Times Paywall and No One Noticed
nice writeup about how a large migration was done with a missed aspect that triggered a pause, rollback, fix, and then moving forward again ...
nytimes  goodops 
7 weeks ago by cleskowsky
Randy Shoup on Twitter: "Building an infrastructure team in a product-led organization like Slack: evangelize your value from day 1 @jewelia at #QConNYC" / Twitter
Observability for teams. If you aren't making it apparent to the rest of the business what your team does, don't be surprised when they eventually ask. (And it might be along the lines of "So what do they do again, anyways?")
goodops  teams  leadership  communication  comms  observability 
8 weeks ago by cleskowsky
Serverless beyond Functions: Building Modern Applications - Speaker Deck
Goes all the way from architecture and design through to team ownership. Great high-level overview!
goodops  aws  architecture  systems  functions  faas  lambda 
9 weeks ago by cleskowsky
The Reactive Manifesto
resilience, async, back pressure, messages
architecture  programming  goodops 
9 weeks ago by cleskowsky
Distributed Log-Processing Design Workshop
Non-abstract large system design
From the Google SRE workbook
Sizing, capacity planning
google  nalsd  architecture  design  sreworkbook  goodops  capacity  sizing 
10 weeks ago by cleskowsky
Daniel Vassallo on Twitter: "This is how I use the good parts of @awscloud, while filtering out all the distracting hype. 👇 1/25" / Twitter
Advice for when you're just getting started. A simple strategy for local dev and prod will take you far and keep you moving forward quickly.
goodops  aws  earlystage  startup  infra  simple 
12 weeks ago by cleskowsky
How It Works - Let's Encrypt - Free SSL/TLS Certificates
Nice description of the Let's Encrypt client/server flow that allows for creating TLS certs (scriptable)
letsencrypt  tls  goodops  ssl  security  ansible 
may 2019 by cleskowsky
An explanation and a metaphor around this topic. It's nuanced, and subjective for the most part. I guess you need a bit of experience behind you to be able to tell what and how much to take on. (So many different factors to consider...)
goodops  design  martinfowler 
may 2019 by cleskowsky
Metric and label naming | Prometheus
Rules of thumb to consider when naming a metric
monitoring  goodops  naming  prometheus 
may 2019 by cleskowsky
Apdex - Wikipedia
Relating user satisfaction to metrics like latency, errors, ...

Below or close to SLOs => satisfied, tolerating. 4x above SLO => angry (doesn't contribute to the satisfaction score)
goodops  monitoring  performance 
may 2019 by cleskowsky
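The Apdex note above can be made concrete with a rough sketch. It assumes the standard Apdex definition: samples at or under a target T count as satisfied, samples up to 4T count as tolerating at half weight, and anything slower only adds to the denominator. The function name `apdex` and the sample values are made up for illustration.

```python
def apdex(latencies_s, target_s):
    """Apdex score for a list of response times (seconds) and a target T."""
    if not latencies_s:
        raise ValueError("need at least one sample")
    satisfied = sum(1 for t in latencies_s if t <= target_s)
    tolerating = sum(1 for t in latencies_s if target_s < t <= 4 * target_s)
    # Frustrated samples (slower than 4T) add nothing to the numerator;
    # they only count in the total.
    return (satisfied + tolerating / 2) / len(latencies_s)


# Hypothetical samples: 3 satisfied, 2 tolerating, 1 frustrated at T = 0.5s
samples = [0.1, 0.2, 0.4, 0.9, 1.5, 3.0]
print(round(apdex(samples, target_s=0.5), 3))
```

A score near 1.0 means almost everyone is satisfied; the closer to 0, the more requests are landing past 4T.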
Introduction · John Lewis IT Software Engineering Principles
A list of design principles a team could follow to get more consistent quality. Not heavy-handed or too prescriptive. Describes qualities of the work product, including scaling, monitoring, availability, preferring simple to complex, etc.

Heavy on values, principles over specific tactics
goodops  johnlewis  british  howwework  prodready 
may 2019 by cleskowsky
Questions for a new technology. | Kellan Elliott-McCrea
System design style : "Small number of well known tools" <-I like this a lot!
leadership  architecture  complexity  kellan  goodops  newshiny 
may 2019 by cleskowsky
Weathering the Unexpected - ACM Queue
DiRT <-disaster recovery testing
How to do it and why.
goodops  google  gamedays  disasterrecovery 
april 2019 by cleskowsky
Segment | Customer Data Infrastructure (CDI)
Find your key value metrics (KPIs) and look at them. Rates can be more powerful than totals. Lots of good advice in here.
goodops  metrics  monitoring 
april 2019 by cleskowsky
Stigmergy - Wikipedia
This seems like an important idea
japan  leadership  change  goodops 
april 2019 by cleskowsky
The CASE Method: Better Monitoring For Humans
Much goodness here. Alerts should have context, be actionable, trigger on user facing symptoms, and face regular evaluation to determine usefulness.
design  monitoring  goodops  corywatson 
april 2019 by cleskowsky
Cindy Sridharan on Twitter: "OMG so much this!! What kinds of telemetry should you emit? Well, there are many - @xaprb at #velocityconf… "
Universally good metrics:
USE: utilization, saturation, errors
RED: rate, errors, duration
Golden signals: errors, rate, latency, saturation
Throughput, concurrency, arrival rate

A bit of overlap. Some are intended for h/w + OS, some for app level, some for the connective tissue between apps.
monitoring  goodops 
july 2018 by cleskowsky
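As a rough illustration of the RED triple from the note above, here is a minimal sketch that derives rate, error ratio, and duration from a window of request records. The `Request` type, the `red_summary` name, and the choice of median for duration are all assumptions for the example; real setups usually track percentiles in a metrics system instead.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Request:
    duration_s: float  # how long the request took, in seconds
    ok: bool           # False if the request ended in an error


def red_summary(requests, window_s):
    """Return (rate per second, error ratio, median duration) for one window."""
    if not requests or window_s <= 0:
        raise ValueError("need at least one request and a positive window")
    rate = len(requests) / window_s
    error_ratio = sum(1 for r in requests if not r.ok) / len(requests)
    duration = median(r.duration_s for r in requests)
    return rate, error_ratio, duration


# Hypothetical 2-second window with four requests, one of them failed
window = [Request(0.1, True), Request(0.3, True),
          Request(0.2, False), Request(0.4, True)]
print(red_summary(window, window_s=2.0))
```

Reporting errors as a ratio rather than a raw count keeps the number comparable across quiet and busy periods.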