post-mortem   248

The Travis CI Blog: Incident Post-Mortem and Security Advisory: Data Exposure After Outage
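An experienced developer at the company ran the test suite with the database URL environment variable pointing at the production database, which wiped the database. After the data was restored, users who had logged in during the outage could sign in to other users' accounts, a security risk. In the end, 15 minutes of data were lost.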
事故报告  Post-Mortem 
5 weeks ago by cdpath
Our Wedding Chatbot – Chatbots Magazine
I came to the realization that a bot was the perfect solution for the flood of wedding messages.
bots  post-mortem 
january 2018 by lorenzck
RESOLVED: Current account payments may fail - Major Outage (27/10/2017) - The Current Account - Monzo Community
A large scale failure in a distributed system can be very difficult to understand, and well-intentioned human action can sometimes compound issues, as happened here. When things like this do happen, we want to learn as much as possible from the event to ensure it can’t resurface.
kubernetes  linkerd  distributed-system  post-mortem 
november 2017 by sanjary

related tags

2016  accident  accidents  advice  air  airlines  allspaw  amazon  analysis  apps  arbeit  ars-technica  aws  bailey  best-practices  bgp  blacklocus  blameless  blog  bots  brain-dissection  brains  buffer  bug  business  cars  case  clock  clojure  cloud  cloudflare  cms  community  company  crashes  culture  database  datetime  db  debriefs  design  development  devops  disaster  distributed-system  dns  downtime  dyndns  energy  engineering  etsy  experience  fail  failosophy  failure  firebase  games  gc  gitlab  gladwell  go  golang  google  gossip  governance  guide  hack  history-computers  history-games  honeycomb  howto  hugops  ia  important  incident  info-sec  infosec  instapaper  internet  iphone  java  jeremy  jvm  kafka  king-crimson  korea  kubernetes  leap-second  leap  linkerd  management  memory  metadiscovery  mistake  mistakes  mobile  monitoring  movie-history  movies  mysql  nanomsg  netapp  ntp  nuclear  open-source  operations  ops  outage  outages  performance  permissions  peter-bright  polygon  post  postgres  postgresql  postgresql_tips  postmortem  priorities  project-management  rds  reddit  regex  reliability  report  rethinkdb  retrocomputing  retrogaming  retrospective  root-cause-analysis  s3  second  security  share  skyliner  smackdown  software  space  square  sre  stackoverflow  startup  startups  street-fighter  study  sync  sysadmin  tech  the-construkction-of-light  time  tony-levin  travis  twitter  uber  ui  usability  ux  video  webapps  why  windows-phone  writing  y!  zeromq  事故报告 
