jm + ops   488

15 Key Takeaways from the Serverless Talk at AWS Startup Day
Best current practices for AWS Lambda usage. (still pretty messy/hacky/Rube-Goldberg-y from the looks of it tbh)
aws  lambda  serverless  ops  hacks  amazon 
yesterday by jm
The problems with DynamoDB Auto Scaling and how it might be improved
'Based on these observations, we hypothesize that you can make two modifications to the system to improve its effectiveness:

trigger scaling up after 1 threshold breach instead of 5, which is in-line with the mantra of “scale up early, scale down slowly”;
trigger scaling activity based on actual request count instead of consumed capacity units, and calculate the new provisioned capacity units using actual request count as well.

As part of this experiment, we also prototyped these changes (by hijacking the CloudWatch alarms) to demonstrate their improvement.'
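A minimal sketch of the two proposed modifications, assuming one request consumes one capacity unit (real consumption depends on item size) and a 70% target utilization. The function names and parameters are hypothetical and are not taken from the linked article or from any AWS API:

```python
import math


def new_provisioned_capacity(request_count, period_seconds, target_utilization=0.7):
    """Capacity (units/sec) that puts the observed request rate at the target utilization."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    rate = request_count / period_seconds  # actual requests per second
    return max(1, math.ceil(rate / target_utilization))


def should_scale_up(request_counts, period_seconds, provisioned,
                    target_utilization=0.7, breaches_required=1):
    """True when the most recent `breaches_required` periods all breach the threshold."""
    if breaches_required < 1:
        raise ValueError("breaches_required must be at least 1")
    if len(request_counts) < breaches_required:
        return False
    threshold = provisioned * target_utilization
    recent = request_counts[-breaches_required:]
    return all(count / period_seconds > threshold for count in recent)


# Three one-minute periods; the last one spikes to 100 requests/sec.
counts = [3000, 3000, 6000]
print(should_scale_up(counts, 60, provisioned=100))                       # 1 breach needed
print(should_scale_up(counts, 60, provisioned=100, breaches_required=5))  # 5 breaches needed
print(new_provisioned_capacity(counts[-1], 60))
```

With `breaches_required=1` the spike triggers scaling immediately; with 5 it would have to persist for five periods first.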
dynamodb  autoscaling  ops  scalability  aws  scaling  capacity 
5 days ago by jm
What I’ve learned from nearly three years of enterprise Wi-Fi at home
I am happy to note that I've grown out of this kind of pain (I think)....
Do you just want better Wi-Fi in every room? Consider buying a Plume or Amplifi or other similar plug-n-go mesh system. On the other hand, are you a technically proficient network kind of person who wants to build an enterprise-lite configuration at home? Do you dream of VLANs and port profiles and lovingly tweaked firewall rules? Does the idea of crawling around in your attic to ceiling-mount some access points sound like a fun way to kill a weekend? Is your office just too quiet for your liking? Buy some Ubiquiti Unifi gear and enter network nerd nirvana.
networking  wifi  wireless  ubiquiti  sdn  vlans  home  ops 
11 days ago by jm
Wifi Design Tips
PDF with a few good tips on wifi layout, AP placement etc. Also recommended: https://www.youtube.com/watch?v=Adep0SeOjAE&feature=youtu.be&t=17m22s (via irldexter)
via:irldexter  wifi  802.11  wireless  ops  networking 
11 days ago by jm
Nginx tuning tips: TLS/SSL HTTPS – Improved TTFB/latency
Must do these soon on jmason.org / taint.org et al.
nginx  http  https  http2  ops  tls  security  linux 
12 days ago by jm
airlift/jvmkill
a simple JVMTI agent that forcibly terminates the JVM when it is unable to allocate memory or create a thread. This is important for reliability purposes: an OutOfMemoryError will often leave the JVM in an inconsistent state. Terminating the JVM will allow it to be restarted by an external process manager.


This is apparently still useful despite the existence of '-XX:+ExitOnOutOfMemoryError' as of Java 8, since that may somehow still fail occasionally.
oom  java  reliability  uptime  memory  ops 
12 days ago by jm
Save on your AWS bill with Kubernetes Ingress
Decent intro to Kubernetes Ingress and the Ambassador microservices API gateway built on Envoy Proxy
envoy  proxying  kubernetes  aws  elb  load-balancing  ingress  ambassador  ops 
28 days ago by jm
Taming the Beast: How Scylla Leverages Control Theory to Keep Compactions Under Control - ScyllaDB
This is a really nice illustration of the use of control theory to set tunable thresholds automatically in a complex storage system. Nice work Scylla:

At any given moment, a database like ScyllaDB has to juggle the admission of foreground requests with background processes like compactions, making sure that the incoming workload is not severely disrupted by compactions, nor that the compaction backlog is so big that reads are later penalized.

In this article, we showed that isolation among incoming writes and compactions can be achieved by the Schedulers, yet the database is still left with the task of determining the amount of shares of the resources incoming writes and compactions will use.

Scylla steers away from user-defined tunables in this task, as they shift the burden of operation to the user, complicating operations and being fragile against changing workloads. By borrowing from the strong theoretical background of industrial controllers, we can provide an Autonomous Database that adapts to changing workloads without operator intervention.
scylladb  storage  settings  compaction  automation  thresholds  control-theory  ops  cassandra  feedback 
4 weeks ago by jm
How to change JVM arguments at runtime to avoid application restart
This is a super nifty feature of the JVM: turn on and off heap class histogram dumps at runtime, for instance.
java -XX:+PrintFlagsFinal -version|grep manageable
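Once a flag shows up as manageable in that output, it can be flipped on a live JVM with the JDK's `jinfo -flag` option. A small sketch that builds the command line; the helper name is hypothetical, and whether a given flag is manageable depends on the JVM version, so check the output above first:

```python
def jinfo_flag_command(pid, flag, enable):
    """Build the argv for `jinfo -flag [+|-]<flag> <pid>` (boolean manageable flags only)."""
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    sign = "+" if enable else "-"
    return ["jinfo", "-flag", f"{sign}{flag}", str(pid)]


# Example: turn heap dumps on OOM on for the JVM with (hypothetical) pid 1234.
cmd = jinfo_flag_command(1234, "HeapDumpOnOutOfMemoryError", enable=True)
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```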
jvm  ops  switches  cli  java  heap-dumps  memory  debugging  memory-leaks 
5 weeks ago by jm
AWS Region Table
what products are available where
amazon  aws  regions  azs  services  architecture  ops 
5 weeks ago by jm
course-hero/slacktee
'a bash script that works like tee command. Instead of writing the standard input to files, slacktee posts it to Slack.'

(via Ardi)
via:ardi  shell  slack  ops  hacks  notification 
8 weeks ago by jm
schibsted/strongbox: A secret manager for AWS
Strongbox is a CLI/GUI and SDK to manage, store, and retrieve secrets (access tokens, encryption keys, private certificates, etc). Strongbox is a client-side convenience layer on top of AWS KMS, DynamoDB and IAM. It manages the AWS resources for you and configures them in a secure way. Strongbox has been used in production since mid-2016 and is now used extensively within Schibsted.
schibsted  strongbox  kms  aws  dynamodb  storage  secrets  credentials  passwords  ops 
8 weeks ago by jm
EC2 Instance Update – C5 Instances with Local NVMe Storage (C5d)
With a 25% to 50% improvement in price-performance over the C4 instances, the C5 instances are designed for applications like batch and log processing, distributed and/or real-time analytics, high-performance computing (HPC), ad serving, highly scalable multiplayer gaming, and video encoding. Some of these applications can benefit from access to high-speed, ultra-low latency local storage. For example, video encoding, image manipulation, and other forms of media processing often necessitate large amounts of I/O to temporary storage. While the input and output files are valuable assets and are typically stored as Amazon Simple Storage Service (S3) objects, the intermediate files are expendable. Similarly, batch and log processing runs in a race-to-idle model, flushing volatile data to disk as fast as possible in order to make full use of compute resources.


Very nice!
ec2  instance-types  ops  storage  hardware  aws 
8 weeks ago by jm
Docker is the dangerous gamble which we will regret : devops
The article this Reddit thread links to is garbage clickbait, but the responses are insightful and much better
reddit  ops  containerization  docker  contrarians  rkt 
8 weeks ago by jm
Attacks against GPG signed APT repositories - Packagecloud Blog

It is a common misconception that simply signing your packages and repository metadata with GPG is enough to create a secure APT repository. This is false. Many of the attacks outlined in the paper and this blog post are effective against GPG-signed APT repositories. GPG signing Debian packages themselves does nothing, as explained below. The easiest way to prevent the attacks covered below is to always serve your APT repository over TLS; no exceptions.


This is excellent research. My faith in GPG sigs on packages is well shaken.
apt  security  debian  packaging  gpg  pgp  packages  dpkg  apt-get  ops 
9 weeks ago by jm
Debugging Stuck Ruby Processes — What to do Before You Kill -9
good tips on using gdb to gather backtraces (via Louise)
debugging  gdb  ruby  linux  unix  threads  ops 
12 weeks ago by jm
"Tweeps! What’s the craziest infra incident you worked on at Twitter"
great thread of Twitter outages and production incidents. I would love to hear more details about these, I love hearing about other people's outages ;) Even reading "over a month of cleanup and some permanent data loss" has me sweating....
infrastructure  engineering  twitter  ops  outages  production 
april 2018 by jm
Another reason why your Docker containers may be slow
TL;DR: fadvise() is a bottleneck on Linux machines running many containers
linux  fadvise  filesystems  performance  docker  containers  ops 
april 2018 by jm
Generate Mozilla Security Recommended Web Server Configuration Files
this is quite cool -- generate web server configs to activate current best-practice TLS settings
web  openssl  nginx  lighttpd  apache  haproxy  hsts  security  ssl  tls  ops 
february 2018 by jm
Checkup
'Simple uptime monitoring: distributed, self-hosted health checks and status pages' -- stores in S3
go  ops  monitoring  uptime  health-checks  status-pages  status  golang  s3 
december 2017 by jm
auto53
'The missing link between AWS AutoScaling Groups and Route53 [...] solves the issue of keeping a route53 zone up to date with the changes that an autoscaling group might face.'
auto53  route-53  dns  aws  amazon  ops  hostnames  asg  autoscaling 
december 2017 by jm
AWS re:invent 2017: Container Networking Deep Dive with Amazon ECS (CON401) // Practical Applications
Another re:Invent highlight to watch -- ECS' new native container networking model explained
reinvent  aws  containers  docker  ecs  networking  sdn  ops 
december 2017 by jm
Introducing the Amazon Time Sync Service
Well overdue; includes Google-style leap smearing
time-sync  time  aws  services  ntp  ops 
november 2017 by jm
Introducing AWS Fargate – Run Containers without Managing Infrastructure
now that's a good announcement. Available right away running atop ECS; EKS in 2018
eks  ecs  fargate  aws  services  ops  containers  docker 
november 2017 by jm
Cronic
'A cure for Cron's chronic email problem'
cron  linux  unix  ops  sysadmin  mail 
october 2017 by jm
IBM broke its cloud by letting three domain names expire - The Register
“multiple domain names were mistakenly allowed to expire and were in hold status.”
outages  fail  ibm  the-register  ops  dns  domains  cloud 
october 2017 by jm
srcecde/aws-lambda-cheatsheet
'AWS Lambda cheatsheet' -- a quick ref card for Lambda users
aws  lambda  ops  serverless  reference  quick-references 
october 2017 by jm
How to operate reliable AWS Lambda applications in production
running a reliable Lambda application in production requires you to still follow operational best practices. In this article I am including some recommendations, based on my experience with operations in general as well as working with AWS Lambda.
aws  cloud  lambda  ops  amazon 
october 2017 by jm
S3 Point In Time Restore
Restore a versioned S3 bucket to the state it was in at a specific point in time
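The core idea can be sketched in a few lines: for each key, pick the newest version at or before the target time, and treat the key as absent if that version is a delete marker. This is an illustration over a hypothetical in-memory list of version records, not the linked tool's code and not a call into any S3 API:

```python
from datetime import datetime, timezone


def state_at(versions, when):
    """Map each key to the version id that was current at `when`.

    `versions` is an iterable of dicts with the (assumed) fields
    key, version_id, last_modified (aware datetime) and is_delete_marker.
    """
    latest = {}
    for v in versions:
        if v["last_modified"] > when:
            continue  # written after the restore point
        current = latest.get(v["key"])
        if current is None or v["last_modified"] > current["last_modified"]:
            latest[v["key"]] = v
    # Keys whose current version is a delete marker did not exist at `when`.
    return {k: v["version_id"] for k, v in latest.items() if not v["is_delete_marker"]}


def t(day):
    return datetime(2017, 10, day, tzinfo=timezone.utc)


versions = [
    {"key": "a.txt", "version_id": "a1", "last_modified": t(1), "is_delete_marker": False},
    {"key": "a.txt", "version_id": "a2", "last_modified": t(5), "is_delete_marker": False},
    {"key": "b.txt", "version_id": "b1", "last_modified": t(2), "is_delete_marker": False},
    {"key": "b.txt", "version_id": "b2", "last_modified": t(3), "is_delete_marker": True},
]
print(state_at(versions, t(4)))
```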
ops  s3  restore  backups  versioning  history  tools  scripts  unix 
october 2017 by jm
Share scripts that have dependencies with Nix
Nice approach to one-liner packaging invocations using nix-shell
nix  packaging  unix  linux  ops  shebang  #! 
october 2017 by jm
HN thread on the new Network Load Balancer AWS product
looks like @colmmacc works on it. Lots and lots of good details here
nlb  aws  load-balancing  ops  architecture  lbs  tcp  ip 
september 2017 by jm
Going Multi-Cloud with AWS and GCP: Lessons Learned at Scale
Metamarkets splits across AWS and GCP, going into heavy detail here
aws  gcp  google  ops  hosting  multi-cloud 
august 2017 by jm
Linux Load Averages: Solving the Mystery
Nice bit of OS archaeology by Brendan Gregg.
In 1993, a Linux engineer found a nonintuitive case with load averages, and with a three-line patch changed them forever from "CPU load averages" to what one might call "system load averages." His change included tasks in the uninterruptible state, so that load averages reflected demand for disk resources and not just CPUs. These system load averages count the number of threads working and waiting to work, and are summarized as a triplet of exponentially-damped moving sum averages that use 1, 5, and 15 minutes as constants in an equation. This triplet of numbers lets you see if load is increasing or decreasing, and their greatest value may be for relative comparisons with themselves.
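A rough floating-point sketch of that triplet, assuming the load is sampled every 5 seconds (the kernel's interval, which the quote does not mention; the kernel itself uses fixed-point arithmetic). `update_load` and its parameters are names made up for illustration:

```python
import math

SAMPLE_INTERVAL = 5.0            # seconds between samples (assumed)
WINDOWS = (60.0, 300.0, 900.0)   # the 1, 5 and 15 minute constants


def update_load(loads, active_tasks, interval=SAMPLE_INTERVAL):
    """Fold one sample of running + uninterruptible tasks into the three averages."""
    updated = []
    for load, window in zip(loads, WINDOWS):
        decay = math.exp(-interval / window)
        updated.append(load * decay + active_tasks * (1.0 - decay))
    return tuple(updated)


# One minute of a constant 2 active tasks, starting from an idle system.
loads = (0.0, 0.0, 0.0)
for _ in range(12):
    loads = update_load(loads, active_tasks=2)
print(loads)
```

After one minute the 1-minute figure has only covered about 63% of the distance to 2, and the 5- and 15-minute figures lag further behind, which is why comparing the three shows whether load is rising or falling.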
load  monitoring  linux  unix  performance  ops  brendan-gregg  history  cpu 
august 2017 by jm
Arq Backs Up To B2!
Arq backup for OSX now supports B2 (as well as S3) as a storage backend.
"it’s a super-cheap option ($.005/GB per month) for storing your backups." (that is less than half the price of $0.0125/GB for S3's Infrequent Access class)
s3  storage  b2  backblaze  backups  arq  macosx  ops 
august 2017 by jm
Working with multiple AWS accounts at Ticketea
AWS STS/multiple account best practice described
sts  aws  authz  ops  ticketea  dev 
august 2017 by jm
AWS Lambda Deployment using Terraform – Build ACL – Medium
Fairly persuasive that production usage of Lambda is much easier if you go full Terraform to manage and deploy.
A complete picture of what it takes to deploy your Lambda function to production with the same diligence you apply to any other codebase using Terraform. [...] There are many cases where frameworks such as SAM or Serverless are not enough. You need more than that for a highly integrated Lambda function. In such cases, it’s easier to simply use Terraform.
infrastructure  aws  lambda  serverless  ops  terraform  sam 
august 2017 by jm
Nextflow - A DSL for parallel and scalable computational pipelines
Data-driven computational pipelines

Nextflow enables scalable and reproducible scientific workflows using software containers. It allows the adaptation of pipelines written in the most common scripting languages.

Its fluent DSL simplifies the implementation and the deployment of complex parallel and reactive workflows on clouds and clusters.


GPLv3 licensed, open source
computation  workflows  pipelines  batch  docker  ops  open-source 
august 2017 by jm
EBS gp2 I/O BurstBalance exhaustion
when EBS volumes in EC2 exhaust their "burst" allocation, things go awry very quickly
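A back-of-envelope sketch of how fast that happens. The figures are assumptions taken from AWS's published gp2 numbers as I understand them (baseline of 3 IOPS per GiB with a floor of 100, bursting to 3,000 IOPS, and a bucket of 5.4 million I/O credits); they are not from the linked post and may have changed:

```python
GP2_BURST_IOPS = 3000        # assumed burst ceiling
GP2_CREDITS = 5_400_000      # assumed full credit bucket


def gp2_baseline_iops(size_gib):
    """Assumed gp2 baseline: 3 IOPS per GiB, never below 100."""
    return max(100, 3 * size_gib)


def seconds_until_burst_exhausted(size_gib, demand_iops, credits=GP2_CREDITS):
    """Seconds a full bucket lasts under sustained demand; None if it never drains."""
    baseline = gp2_baseline_iops(size_gib)
    demand = min(demand_iops, GP2_BURST_IOPS)
    if demand <= baseline:
        return None
    return credits / (demand - baseline)


# A 100 GiB volume driven flat out at the burst ceiling.
print(seconds_until_burst_exhausted(100, 3000))
```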
performance  aws  ebs  ec2  burst-balance  ops  debugging 
july 2017 by jm
Kubernetes Best Practices // Speaker Deck
A lot of these are general Docker/containerisation best practices, too.

(via Devops Weekly)
k8s  kubernetes  devops  ops  containers  docker  best-practices  tips  packaging 
july 2017 by jm
awslabs/aws-ec2rescue-linux
Amazon Web Services Elastic Compute Cloud (EC2) Rescue for Linux is a python-based tool that allows for the automatic diagnosis of common problems found on EC2 Linux instances.


Most of the modules appear to be log-greppers looking for common kernel issues.
ec2  aws  kernel  linux  ec2rl  ops 
july 2017 by jm
Wifi AP Placement [video]
'AP Placement - A Job For the Work Experience Kid? | Scott Stapleton | WLPC EU Budapest 2016'
ap  wifi  placement  layout  ops  wireless  home  presos 
july 2017 by jm
OVH suffer 24-hour outage (The Register)
Choice quotes:

‘At 6:48pm, Thursday, June 29, in Room 3 of the P19 datacenter, due to a crack on a soft plastic pipe in our water-cooling system, a coolant leak causes fluid to enter the system’;
‘This process had been tested in principle but not at a 50,000-website scale’
postmortems  ovh  outages  liquid-cooling  datacenters  dr  disaster-recovery  ops 
july 2017 by jm
Fastest syncing of S3 buckets
good tip for "aws s3 sync" performance
performance  aws  s3  copy  ops  tips 
july 2017 by jm
Scheduled Tasks (cron) - Amazon EC2 Container Service
ECS now does cron jobs. But where does AWS Batch fit in? confusing
aws  batch  ecs  cron  scheduling  recurrence  ops 
july 2017 by jm
Top 5 ways to improve your AWS EC2 performance
A couple of bits of excellent advice from Datadog (although this may be a slightly old post, from Oct 2016):

1. Unpredictable EBS disk I/O performance. Note that gp2 volumes do not appear to need as much warmup or priming as before.

2. EC2 Instance ECU Mismatch and Stolen CPU. advice: use bigger instances

The other 3 ways are a little obvious by comparison, but worth bookmarking for those two anyway.
ops  ec2  performance  datadog  aws  ebs  stolen-cpu  virtualization  metrics  tips 
july 2017 by jm
How Did I “Hack” AWS Lambda to Run Docker Containers?
Running Docker containers in Lambda using a usermode-docker hack -- hacky as hell but fun ;) Lambda should really support native Docker though
docker  lambda  aws  serverless  ops  hacks  udocker 
june 2017 by jm
Open Guide to Amazon Web Services
'A lot of information on AWS is already written. Most people learn AWS by reading a blog or a “getting started guide” and referring to the standard AWS references. Nonetheless, trustworthy and practical information and recommendations aren’t easy to come by. AWS’s own documentation is a great but sprawling resource few have time to read fully, and it doesn’t include anything but official facts, so omits experiences of engineers. The information in blogs or Stack Overflow is also not consistently up to date. This guide is by and for engineers who use AWS. It aims to be a useful, living reference that consolidates links, tips, gotchas, and best practices. It arose from discussion and editing over beers by several engineers who have used AWS extensively.'
amazon  aws  guides  documentation  ops  architecture 
june 2017 by jm
usl4j And You | codahale.com
Coda Hale wrote a handy java library implementing a USL solver
usl  scalability  java  performance  optimization  benchmarking  measurement  ops  coda-hale 
june 2017 by jm
Scaling Amazon Aurora at ticketea
Ticketing is a business in which extreme traffic spikes are the norm, rather than the exception. For Ticketea, this means that our traffic can increase by a factor of 60x in a matter of seconds. This usually happens when big events (which have a fixed, pre-announced 'sale start time') go on sale.
scaling  scalability  ops  aws  aurora  autoscaling  asg 
may 2017 by jm
Enough with the microservices
Good post!
Much has been written on the pros and cons of microservices, but unfortunately I’m still seeing them as something being pursued in a cargo cult fashion in the growth-stage startup world. At the risk of rewriting Martin Fowler’s Microservice Premium article, I thought it would be good to write up some thoughts so that I can send them to clients when the topic arises, and hopefully help people avoid some of the mistakes I’ve seen. The mistake of choosing a path towards a given architecture or technology on the basis of so-called best practices articles found online is a costly one, and if I can help a single company avoid it then writing this will have been worth it.
architecture  design  microservices  coding  devops  ops  monolith 
may 2017 by jm
Sorry
hosted status page / downtime banner service
banners  web  status  uptime  downtime  ops  reliability 
may 2017 by jm
Spotting a million dollars in your AWS account · Segment Blog
You can easily split your spend by AWS service per month and call it a day. Ten thousand dollars of EC2, one thousand to S3, five hundred dollars to network traffic, etc. But what’s still missing is a synthesis of which products and engineering teams are dominating your costs. 

Then, add in the fact that you may have hundreds of instances and millions of containers that come and go. Soon, what started as a simple analysis problem has quickly become unimaginably complex.

In this follow-up post, we’d like to share details on the toolkit we used. Our hope is to offer up a few ideas to help you analyze your AWS spend, no matter whether you’re running only a handful of instances, or tens of thousands.

segment  money  costs  billing  aws  ec2  ecs  ops 
may 2017 by jm
jantman/awslimitchecker

A script and python module to check your AWS service limits and usage, and warn when usage approaches limits.

Users building out scalable services in Amazon AWS often run into AWS' service limits - often at the least convenient time (i.e. mid-deploy or when autoscaling fails). Amazon's Trusted Advisor can help this, but even the version that comes with Business and Enterprise support only monitors a small subset of AWS limits and only alerts weekly. awslimitchecker provides a command line script and reusable package that queries your current usage of AWS resources and compares it to limits (hard-coded AWS defaults that you can override, API-based limits where available, or data from Trusted Advisor where available), notifying you when you are approaching or at your limits.


(via This Week in AWS)
aws  amazon  limits  scripts  ops 
may 2017 by jm
cristim/autospotting: Pay up to 10 times less on EC2 by automatically replacing on-demand AutoScaling group members with similar or larger identically configured spot instances.
A simple and easy to use tool designed to significantly lower your Amazon AWS costs by automating the use of the spot market.

Once enabled on an existing on-demand AutoScaling group, it launches an EC2 spot instance that is cheaper, at least as large and configured identically to your current on-demand instances. As soon as the new instance is ready, it is added to the group and an on-demand instance is detached from the group and terminated.

It continuously applies this process, gradually replacing any on-demand instances with spot instances until the group only consists of spot instances, but it can also be configured to keep some on-demand instances running.
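A toy simulation of the gradual replacement described above, assuming the group is just a list of `"on-demand"` / `"spot"` labels. It makes no AWS calls and is not autospotting's code; `replace_step`, `converge` and `min_on_demand` are names invented for the sketch:

```python
def replace_step(group, min_on_demand=0):
    """Swap one on-demand member for a spot one, unless the floor is already reached."""
    on_demand = [i for i, kind in enumerate(group) if kind == "on-demand"]
    if len(on_demand) <= min_on_demand:
        return list(group)
    updated = list(group)
    updated[on_demand[0]] = "spot"
    return updated


def converge(group, min_on_demand=0):
    """Apply replace_step until nothing changes; return the final group and step count."""
    current = list(group)
    steps = 0
    while True:
        following = replace_step(current, min_on_demand)
        if following == current:
            return current, steps
        current = following
        steps += 1


print(converge(["on-demand"] * 3, min_on_demand=1))
```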
aws  golang  ec2  autoscaling  asg  spot-instances  ops 
may 2017 by jm
acksin/seespot: AWS Spot instance health check with termination and clean up support
When a Spot Instance is about to terminate there is a 2 minute window before the termination actually happens. SeeSpot is a utility for AWS Spot instances that handles the health check. If used with an AWS ELB it also handles cleanup of the instance when a Spot Termination notice is sent.
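A sketch of the timing side of that window, assuming the notice arrives as a UTC timestamp such as `2017-05-01T12:02:00Z` from the instance metadata path `/latest/meta-data/spot/termination-time` (a 404 meaning no termination is scheduled). The HTTP fetch and any metadata token handling are left out, and this is not SeeSpot's implementation:

```python
from datetime import datetime, timezone


def seconds_until_termination(body, now):
    """Seconds left before termination, or None if no valid notice was returned.

    `body` is the text returned by the metadata endpoint, or None on a 404.
    """
    if not body:
        return None
    try:
        when = datetime.strptime(body.strip(), "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        return None  # unexpected payload; treat as no notice
    return (when.replace(tzinfo=timezone.utc) - now).total_seconds()


now = datetime(2017, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
print(seconds_until_termination("2017-05-01T12:02:00Z", now))
print(seconds_until_termination(None, now))
```

A health check could start failing as soon as this returns a number, so the load balancer drains the instance within the window.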
aws  elb  spot-instances  health-checks  golang  lifecycle  ops 
may 2017 by jm
NetSpot
'FREE WiFi Site Survey Software for MAC OS X & Windows'.
Sadly reviews from pals are that it is 'shite' :(
osx  wifi  network  survey  netspot  networking  ops  dataviz  wireless 
april 2017 by jm
Julia Evans on Twitter: "notes on this great "When the pager goes off" article"
'notes on this great "When the pager goes off" article from @incrementmag https://increment.com/on-call/when-the-pager-goes-off/ ' -- cartoon summarising a much longer article of common modern ops on-call response techniques. Still pretty consistent with the systems we used in Amazon
on-call  ops  incident-response  julia-evans  pager  increment-mag 
april 2017 by jm
Ubuntu on AWS gets serious performance boost with AWS-tuned kernel
interesting -- faster boots, CPU throttling resolved on t2.micros, other nice stuff
aws  ubuntu  ec2  kernel  linux  ops 
april 2017 by jm
Spotify’s Love/Hate Relationship with DNS
omg somebody at Spotify really really loves DNS. They even store a DHT hash ring in it. whyyyyyyyyyyy
spotify  networking  architecture  dht  insane  scary  dns  unbound  ops 
april 2017 by jm
Things I Learned Managing Site Reliability for Some of the World’s Busiest Gambling Sites
Solid article proselytising runbooks/playbooks (or in this article's parlance, "Incident Models") for dev/ops handover and operational knowledge
ops  process  sre  devops  runbooks  playbooks  incident-models 
april 2017 by jm