jm + ec2   32

Adrian Cockroft's Cloud Outage Reports Collection
The detailed summaries of outages from cloud vendors are comprehensive and the response to each highlights many lessons in how to build robust distributed systems. For outages that significantly affected Netflix, the Netflix techblog report gives insight into how to effectively build reliable services on top of AWS. [....] I plan to collect reports here over time, and welcome links to other write-ups of outages and how to survive them.
outages  post-mortems  documentation  ops  aws  ec2  amazon  google  dropbox  microsoft  azure  incident-response 
23 days ago by jm
'Bobtail: Avoiding Long Tails in the Cloud' [pdf]
'A system that proactively detects and avoids bad neighbouring VMs without significantly penalizing node instantiation [in EC2]. With Bobtail, common [datacenter] communication patterns benefit from reductions of up to 40% in 99.9th percentile response times.'

Excellent stuff -- another conclusion they come to is that it's not the network's fault; it's the Xen hosts themselves. The EC2 networking team will be happy about that ;)
networking  ec2  bobtail  latency  long-tail  xen  performance 
6 weeks ago by jm
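Here is why the 99.9th percentile matters for those datacenter communication patterns. When one request fans out to many backends and has to wait for all of them, rare slow responses stop being rare. This is a quick back-of-the-envelope sketch in Python. It assumes each backend independently lands in its own slowest 0.1% of responses, which is an idealization. It is a general tail-at-scale illustration, not anything taken from the Bobtail paper:

```python
def prob_hits_tail(fanout: int, tail_prob: float = 0.001) -> float:
    """Probability that at least one of `fanout` independent sub-requests
    lands in the slow tail, if each one does so with probability `tail_prob`
    (0.001 = 'slower than its own p99.9')."""
    return 1.0 - (1.0 - tail_prob) ** fanout


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(f"fan-out {n:4d}: {prob_hits_tail(n):.1%} of requests wait on a p99.9-slow backend")
```

At a fan-out of 100, roughly 9.5% of requests hit at least one p99.9-slow backend. That's why shaving the tail pays off far more than the raw percentile suggests.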
Video Processing at Dropbox
On-the-fly video transcoding during live streaming. They've done a great job of this!
At the beginning of the development of this feature, we entertained the idea to simply pre-transcode all the videos in Dropbox to all possible target devices. Soon enough we realized that this simple approach would be too expensive at our scale, so we decided to build a system that allows us to trigger a transcoding process only upon user request and cache the results for subsequent fetches. This on-demand approach: adapts to heterogeneous devices and network conditions, is relatively cheap (everything is relative at our scale), guarantees low latency startup time.
ffmpeg  dropbox  streaming  video  cdn  ec2  hls  http  mp4  nginx  haproxy  aws  h264 
8 weeks ago by jm
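The on-demand pattern described there boils down to a memoized lookup: transcode only on first request, then cache for later fetches. This is a minimal in-process sketch, not Dropbox's code. It assumes a hypothetical `transcode(video_id, profile)` backend (in practice something wrapping ffmpeg), and a plain dict stands in for the shared cache a real deployment would use:

```python
from typing import Callable, Dict, Tuple


class OnDemandTranscoder:
    """Transcode a video for a target profile only on first request,
    then serve the cached output on later fetches."""

    def __init__(self, transcode: Callable[[str, str], bytes]) -> None:
        # Hypothetical backend, e.g. something that shells out to ffmpeg.
        self._transcode = transcode
        self._cache: Dict[Tuple[str, str], bytes] = {}

    def fetch(self, video_id: str, profile: str) -> bytes:
        key = (video_id, profile)
        if key not in self._cache:
            # Cache miss: pay the transcoding cost once per (video, profile).
            self._cache[key] = self._transcode(video_id, profile)
        return self._cache[key]
```

Two concurrent first requests for the same key would both transcode here. A production version would want per-key locking or request coalescing.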
Chartbeat's Lessons learned tuning TCP and Nginx in EC2
a good write-up of basic sysctl tuning for an internet-facing HTTP proxy fleet running in EC2. Nothing groundbreaking here, but it's well written.
nginx  amazon  ec2  tcp  ip  tuning  sysctl  linux  c10k  ssl  http 
january 2014 by jm
10 Things You Should Know About AWS
Some decent tips in here, mainly EC2-focussed
amazon  ec2  aws  ops  rds 
november 2013 by jm
Scryer: Netflix’s Predictive Auto Scaling Engine
Scryer is a new system that allows us to provision the right number of AWS instances needed to handle the traffic of our customers. But Scryer is different from Amazon Auto Scaling (AAS), which reacts to real-time metrics and adjusts instance counts accordingly. Rather, Scryer predicts what the needs will be prior to the time of need and provisions the instances based on those predictions.
scaling  infrastructure  aws  ec2  netflix  scryer  auto-scaling  aas  metrics  prediction  spikes 
november 2013 by jm
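Here is the reactive vs. predictive distinction in a few lines of Python. The forecast is deliberately naive: the mean load of the same time slot on previous days, plus headroom. It is not Scryer's actual prediction algorithm. `rps_per_instance` and the 20% headroom are hypothetical parameters:

```python
import math
from statistics import mean
from typing import Sequence


def reactive_count(current_rps: float, rps_per_instance: float) -> int:
    """Reactive scaling: size the fleet from the load observed right now."""
    return max(1, math.ceil(current_rps / rps_per_instance))


def predictive_count(same_slot_history: Sequence[float],
                     rps_per_instance: float,
                     headroom: float = 1.2) -> int:
    """Predictive scaling: size the fleet *before* a time slot arrives,
    from the load seen in that slot on previous days plus headroom."""
    forecast = mean(same_slot_history)
    return max(1, math.ceil(forecast * headroom / rps_per_instance))
```

Take a hypothetical service at 500 req/s per instance. A reactive scaler seeing 4,000 req/s at 7:55pm sizes the fleet at 8 instances. The predictive one has seen 9,000 to 10,000 req/s in the 8pm slot on recent days, so it provisions 23 ahead of the spike.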
DynamoDB Local
'a client-side database that supports the complete DynamoDB API, but doesn't manipulate any tables or data in DynamoDB itself. You can write code while sitting in a tree, on the beach, or in the desert. When you are ready to deploy your application, you simply instruct it to connect to the actual DynamoDB endpoint. No other modifications will be needed.'

This is good -- an in-memory data store for integration testing is absolutely vital for production usage. (Voldemort does this well, for example.)
dynamodb  aws  ec2  testing  integration-testing  unit-tests 
september 2013 by jm
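Here is what "simply instruct it to connect to the actual DynamoDB endpoint" tends to look like in Python. boto3's `client()` accepts an `endpoint_url`, so test and production differ by a single keyword argument. This sketch assumes DynamoDB Local is running on its default port 8000. It uses a hypothetical `DYNAMODB_LOCAL` environment variable as the switch, and `us-east-1` is an arbitrary fallback region:

```python
import os
from typing import Dict, Mapping

# DynamoDB Local listens on port 8000 by default.
LOCAL_DYNAMODB_ENDPOINT = "http://localhost:8000"


def dynamodb_client_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Keyword arguments for boto3.client("dynamodb", **kwargs).

    DYNAMODB_LOCAL=1 (a hypothetical variable name) points the client at
    DynamoDB Local; otherwise boto3 uses the real regional endpoint.
    """
    kwargs = {"region_name": env.get("AWS_DEFAULT_REGION", "us-east-1")}
    if env.get("DYNAMODB_LOCAL") == "1":
        kwargs["endpoint_url"] = LOCAL_DYNAMODB_ENDPOINT
    return kwargs
```

Usage would be `boto3.client("dynamodb", **dynamodb_client_kwargs())`, with the rest of the application code untouched.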
Benchmarking Redis on AWS ElastiCache
good data points, but could do with latency percentiles
latency  redis  measurement  benchmarks  ec2  elasticache  aws  storage  tests 
september 2013 by jm
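For reference, here is a minimal way to get those percentiles out of raw per-request latency samples. It uses the nearest-rank definition, which is one of several common percentile definitions; interpolating ones give slightly different numbers:

```python
import math
from fractions import Fraction
from typing import Dict, Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p% of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Exact arithmetic, so that e.g. p99.9 of 1000 samples is rank 999
    # rather than 1000 because of floating-point rounding.
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)
    return ordered[rank - 1]


def latency_summary(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Median plus tail percentiles: the figures a benchmark write-up needs."""
    return {f"p{p:g}": percentile(samples_ms, p) for p in (50, 90, 99, 99.9)}
```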

The future of the AWS command line tools is awscli, a single, unified, consistent command line tool that works with almost all of the AWS services. Here is a quick list of the services that awscli currently supports: Auto Scaling, CloudFormation, CloudSearch, CloudWatch, Data Pipeline, Direct Connect, DynamoDB, EC2, ElastiCache, Elastic Beanstalk, Elastic Transcoder, ELB, EMR, Identity and Access Management, Import/Export, OpsWorks, RDS, Redshift, Route 53, S3, SES, SNS, SQS, Storage Gateway, Security Token Service, Support API, SWF, VPC. Support for the following appears to be planned: CloudFront, Glacier, SimpleDB.

The awscli software is being actively developed as an open source project on Github, with a lot of support from Amazon. You’ll note that the biggest contributors to awscli are Amazon employees with Mitch Garnaat leading. Mitch is also the author of boto, the amazing Python library for AWS.
aws  awscli  cli  tools  command-line  ec2  s3  amazon  api 
august 2013 by jm
Improved HTTPS Performance with Early SSL Termination
This is a neat hack. Establishing an SSL/TLS connection requires several consecutive round trips before the connection is ready. Terminating it closer to the user, and reusing an existing region-to-region connection behind the scenes, greatly reduces the overall latency. Works for plain HTTP as well.
http  https  ssl  architecture  aws  ec2  performance  latency  internet  round-trip  nginx  tls 
july 2013 by jm
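Here is the rough arithmetic for why that helps. This sketch assumes a full TLS 1.2 handshake with no session resumption: 1 RTT for TCP, 2 for TLS, and 1 for the request itself. The round-trip times are hypothetical. Real numbers depend on TLS version, resumption and TCP Fast Open:

```python
TCP_HANDSHAKE_RTTS = 1   # SYN / SYN-ACK before the client can send data
TLS12_HANDSHAKE_RTTS = 2  # full TLS 1.2 handshake, no session resumption
REQUEST_RTTS = 1          # the HTTP request/response itself


def direct_ms(client_origin_rtt: float) -> float:
    """Client sets up TCP+TLS all the way to the distant origin."""
    return (TCP_HANDSHAKE_RTTS + TLS12_HANDSHAKE_RTTS + REQUEST_RTTS) * client_origin_rtt


def early_termination_ms(client_edge_rtt: float, edge_origin_rtt: float) -> float:
    """Client sets up TCP+TLS to a nearby edge; the edge forwards the request
    over an already-open connection to the origin region."""
    handshake = (TCP_HANDSHAKE_RTTS + TLS12_HANDSHAKE_RTTS) * client_edge_rtt
    request = client_edge_rtt + edge_origin_rtt
    return handshake + request
```

With a 150 ms client-to-origin RTT, direct setup costs about 600 ms before the first response byte. Terminating at an edge 20 ms away that holds a warm 130 ms connection to the origin brings that down to about 210 ms. Server processing time is ignored in both cases.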
Instagram: Making the Switch to Cassandra from Redis, a 75% 'Insta' Savings
shifting data out of RAM and onto SSDs -- unsurprisingly, big savings.
a 12 node cluster of EC2 hi1.4xlarge instances; we store around 1.2TB of data across this cluster. At peak, we're doing around 20,000 writes per second to that specific cluster and around 15,000 reads per second. We've been really impressed with how well Cassandra has been able to drop into that role.
ram  ssd  cassandra  databases  nosql  redis  instagram  storage  ec2 
june 2013 by jm
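Here are those figures worked out per node. This quick sketch assumes load is spread evenly across the 12 nodes. The quote doesn't give a replication factor, so RF=3 below is a hypothetical value. It's also unclear whether the 1.2TB is measured before or after replication:

```python
NODES = 12
DATA_GB = 1_200             # "around 1.2TB"
PEAK_WRITES_PER_S = 20_000
PEAK_READS_PER_S = 15_000


def per_node(total: float, nodes: int = NODES, replication_factor: int = 1) -> float:
    """Average share of `total` per node, assuming an even spread; with
    replication, each write is applied on `replication_factor` nodes."""
    return total * replication_factor / nodes
```

That's about 100 GB of data and roughly 1,667 client writes/s per node. With a hypothetical RF=3 it comes to 5,000 replica writes/s per node, plus about 1,250 reads/s per node.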
'Easy Amazon EC2 Instance Comparison': a nice UI on the various EC2 instance types on offer, with their key attributes. Misses out the availability of EBS-optimized instances, though.
amazon  ec2  aws  comparison  pricing 
june 2013 by jm
Communication costs in real-world networks
Peter Bailis has generated some good real-world data about network performance and latency, measured using EC2 instances between EC2 regions, between zones, and between hosts in a single AZ. Very useful (particularly as I was looking for this data in a public source not too long ago).

I wasn’t aware of any datasets describing network behavior both within and across datacenters, so we launched m1.small Amazon EC2 instances in each of the eight geo-distributed “Regions,” across the three us-east “Availability Zones” (three co-located datacenters in Virginia), and within one datacenter (us-east-b). We measured RTTs between hosts for a week at a granularity of one ping per second.

Some of the high-percentile measurements undoubtedly reflect host and VM behaviour rather than the network itself, but that is still good data for a typical service built in EC2.
networks  performance  measurements  benchmarks  ops  ec2  networking  internet  az  latency 
may 2013 by jm
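At one ping per second for a week, that's 604,800 RTT samples per host pair. Below is a sketch of summarizing such samples by link class. The class labels and sample values are hypothetical, not Bailis's data. The median is used because it's robust to the host/VM outliers mentioned above:

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple

SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # pings per host pair at 1 ping/s


def median_rtt_by_link(samples: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Group (link_class, rtt_ms) samples, with hypothetical classes such as
    'intra-az', 'cross-az' and 'cross-region', and return each group's median."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for link, rtt_ms in samples:
        groups[link].append(rtt_ms)
    return {link: median(rtts) for link, rtts in groups.items()}
```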
This program creates an EBS snapshot for an Amazon EC2 EBS volume. To help ensure consistent data in the snapshot, it tries to flush and freeze the filesystem(s) first as well as flushing and locking the database, if applicable.

Filesystems can be frozen during the snapshot. Prior to Linux kernel 2.6.29, XFS must be used for freezing support. While frozen, a filesystem will be consistent on disk and all writes will block.

There are a number of timeouts to reduce the risk of interfering with the normal database operation while improving the chances of getting a consistent snapshot.

If you have multiple EBS volumes in a RAID configuration, you can specify all of the volume ids on the command line and it will create snapshots for each while the filesystem and database are locked. Note that it is your responsibility to keep track of the resulting snapshot ids and to figure out how to put these back together when you need to restore the RAID setup.

ubuntu  ec2  aws  linux  ebs  snapshots  ops  tools  alestic 
may 2013 by jm
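The ordering that matters in that tool is: freeze, start the snapshot(s), then always thaw. Here is a minimal Python sketch of just that sequence. It is not the actual tool, which also flushes and locks the database and applies timeouts. It shells out to util-linux's `fsfreeze`, and `create_snapshot` is a hypothetical callback that starts an EBS snapshot for one volume id and returns its snapshot id. An EBS snapshot captures the volume as of the moment it's requested, so the freeze only needs to last as long as those calls:

```python
import subprocess
from typing import Callable, Dict, List


def run_command(cmd: List[str]) -> None:
    """Run a command, raising CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)


def consistent_snapshots(mountpoint: str,
                         volume_ids: List[str],
                         create_snapshot: Callable[[str], str],
                         run: Callable[[List[str]], None] = run_command) -> Dict[str, str]:
    """Freeze the filesystem at `mountpoint`, start a snapshot of every EBS
    volume backing it (e.g. all members of a RAID set), then unfreeze.

    Returns {volume_id: snapshot_id}; keeping track of these for a RAID
    restore is still up to the caller.
    """
    run(["fsfreeze", "--freeze", mountpoint])
    try:
        # Writes to the filesystem block while it's frozen, so keep this short.
        return {vol: create_snapshot(vol) for vol in volume_ids}
    finally:
        # Always thaw, even if a snapshot call raised.
        run(["fsfreeze", "--unfreeze", mountpoint])
```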
Understanding Elastic Block Store Availability and Performance [slides]
fantastic in-depth presentation on EBS usage; lots of good advice here if you're using EBS volumes with/without PIOPS
piops  ebs  performance  aws  ec2  ops  storage  amazon  presentations 
may 2013 by jm
Under the Covers of DynamoDB
mostly a DynamoDB puff-piece from last week's Amazon Cloud Connect, but it contains some good real-world figures for a 20-billion-GUID deduping table use-case at the end ($4,150 per month, to cut to the chase)
dynamodb  aws  figures  costs  architecture  ec2  dedupe  cloud-connect  slides 
april 2013 by jm
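Here is that $4,150/month in per-item terms. The sketch treats the monthly figure as a single blended cost, without splitting provisioned throughput from storage:

```python
MONTHLY_COST_USD = 4_150
ITEMS = 20_000_000_000  # 20 billion GUIDs


def cost_per_million_items(monthly_cost_usd: float = MONTHLY_COST_USD,
                           items: int = ITEMS) -> float:
    """Blended monthly cost per million stored/deduped items."""
    return monthly_cost_usd / (items / 1_000_000)
```

That comes to about $0.21 per million GUIDs per month.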
Latency's Worst Nightmare: Performance Tuning Tips and Tricks [slides]
the basics of running a service stack (web, app servers, data stores) on AWS. some good benchmark figures in the final slides
benchmarks  aws  ec2  ebs  piops  services  scaling  scalability  presentations 
april 2013 by jm
TCP Tune
These notes are intended to help users and system administrators maximize TCP/IP performance on their computer systems. They summarize all of the end-system (computer system) network tuning issues including a tutorial on TCP tuning, easy configuration checks for non-experts, and a repository of operating system specific instructions for getting the best possible network performance on these platforms.

Some tips for maximizing HPC network performance for the intra-DC case; recommended by the LinkedIn Kafka operations page.
tuning  network  tcp  sysadmin  performance  ops  kafka  ec2 
april 2013 by jm
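The core calculation behind most of that TCP tuning advice is the bandwidth-delay product. To keep a path full, the TCP window must be at least bandwidth times RTT. On Linux, that means socket buffer limits such as `net.core.rmem_max` and the max value in `net.ipv4.tcp_rmem` must be at least that large too. A small sketch with hypothetical link figures:

```python
import math


def bdp_bytes(bandwidth_bits_per_s: float, rtt_s: float) -> int:
    """Bandwidth-delay product in bytes: how much unacknowledged data must be
    in flight to keep the path fully utilised."""
    return math.ceil(bandwidth_bits_per_s * rtt_s / 8)
```

For example, a 1 Gbit/s path with an 80 ms RTT needs about 10 MB in flight. At 10 Gbit/s with a 0.5 ms intra-DC RTT, it needs only about 625 KB.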
High Performance MongoDB Clusters with Amazon EBS Provisioned IOPS
yeah yeah, Mongo. bookmarking for the good data on EBS+PIOPS
ebs  piops  aws  performance  tips  ops  ec2  mongodb  presentations 
april 2013 by jm
By the numbers: How Google Compute Engine stacks up to Amazon EC2
Scalr's thoughts on Google's EC2 competitor.
with Google Compute Engine, AWS has a formidable new competitor in the public cloud space, and we’ll likely be moving some of Scalr’s production workloads from our hybrid aws-rackspace-softlayer setup to it when it leaves beta. There’s a strong technical case for migrating heavy workloads to GCE, and I’ll be grabbing popcorn to eagerly watch as the battle unfolds between the giants.
gce  cloud  ec2  amazon  aws  google  scalr 
march 2013 by jm
Big Data Analytics at Netflix. Interview with Christos Kalantzis and Jason Brown.
Good interview with the Cassandra guys at Netflix, and some top Mongo-bashing in the comments
cassandra  netflix  user-stories  testimonials  nosql  storage  ec2  mongodb 
february 2013 by jm
'an expressive toolset for constructing scalable, resilient [service] architectures. It works in the cloud, in the data center, and on your laptop, and it makes your system diagram visible and inevitable. Inevitable systems coordinate automatically to interconnect, removing the hassle of manual configuration of connection points (and the associated danger of human error).' Looks like a pretty neat cluster deployment tool; driven from a single configuration file, using Chef, integrating closely with AWS and providing many useful additional features
chef  deployment  clusters  knife  services  aws  ec2  ops  ironfan  demo 
january 2013 by jm
AWS Advent 2012
'an annual exploration of Amazon Web Services.' Some great hacks here
aws  amazon  advent  sysadmin  s3  ec2  chef  puppet  ops 
december 2012 by jm
How Team Obama’s tech efficiency left Romney IT in dust | Ars Technica
The web-app dev and ops best practices used by the Obama campaign's tech team. Some key tools: Puppet, EC2, Asgard, Cacti, Opsview, StatsD, Graphite, Seyren, Route53, Loggly, etc.
obama  campaigns  tools  ops  asgard  ec2  aws  route53 
november 2012 by jm
C500k in Action at Urban Airship
I missed this back in 2010; 500k active TCP connections to a single EC2 large instance using Java and NIO
c10k  java  linux  ec2  scaling  nio  netty  urban-airship 
july 2012 by jm
Cloudsmith Stack Hammer
something Chris Horn sent on -- using Puppet to build stacks and deploy to AWS using a simple point-and-click interface. looks cool
github  ec2  aws  puppet  stacks  cloudsmith  stack-hammer  via:chorn 
february 2012 by jm
Benchmarking Cassandra Scalability on AWS - Over a million writes per second
Netflix's benchmarks -- impressively detailed. '48, 96, 144 and 288 instances', across 3 EC2 AZs in us-east, successfully scaling linearly
ec2  aws  cassandra  scaling  benchmarks  netflix  performance 
november 2011 by jm
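"Scaling linearly" can be made precise by comparing per-instance throughput against the smallest cluster. In this sketch, the inputs in the test-style examples are hypothetical. The only figure from the post is the headline: over a million writes/s on 288 instances, which implies at least 1,000,000 / 288, or roughly 3,470, writes/s per instance at the largest size:

```python
def scaling_efficiency(base_nodes: int, base_throughput: float,
                       nodes: int, throughput: float) -> float:
    """Per-node throughput at `nodes` relative to per-node throughput at
    `base_nodes`. 1.0 means perfectly linear scaling; below 1.0 means each
    added node contributes less than the baseline nodes did."""
    return (throughput / nodes) / (base_throughput / base_nodes)


# Lower bound on per-instance writes/s implied by ">1M writes/s on 288 instances".
MIN_WRITES_PER_INSTANCE = 1_000_000 / 288
```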
Amazon EC2 outage: summary and lessons learned
RightScale's CTO on last week's outage; pretty detailed, and a good round-up of useful commentary from around the web, too
ebs  ec2  aws  cloud  availability  slas  rightscale  amazon 
april 2011 by jm
CloudSplit – Real Time Cloud Analytics
interesting idea from Joe -- track your cloud-hosting spend in real-time
cloudsplit  hosting  amazon  ec2  azure  joe-drumgoole  analytics  real-time  from delicious
september 2009 by jm
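Tracking cloud spend in real time amounts to folding a stream of usage records into a running total. This is a minimal sketch; the record shape and the hourly rates are hypothetical, not CloudSplit's design or actual AWS prices:

```python
from typing import Dict, Iterable, Iterator, Tuple


def running_spend(usage: Iterable[Tuple[str, float]],
                  hourly_rate_usd: Dict[str, float]) -> Iterator[float]:
    """Yield the cumulative spend after each (instance_type, hours) record,
    so a dashboard can update as usage arrives rather than at month end."""
    total = 0.0
    for instance_type, hours in usage:
        total += hourly_rate_usd[instance_type] * hours
        yield total
```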
