jm + ec2   47

The Network is Reliable - ACM Queue
Peter Bailis and Kyle Kingsbury accumulate a comprehensive, informal survey of real-world network failures observed in production. I remember that April 2011 EBS outage...
ec2  aws  networking  outages  partitions  jepsen  pbailis  aphyr  acm-queue  acm  survey  ops 
8 days ago by jm
Netflix/ribbon
a client side IPC library that is battle-tested in cloud. It provides the following features:

Load balancing;
Fault tolerance;
Multiple protocol (HTTP, TCP, UDP) support in an asynchronous and reactive model;
Caching and batching.

I like the integration of Eureka and Hystrix in particular, although I would really like to read more about Eureka's approach to availability during network partitions and CAP.

https://groups.google.com/d/msg/eureka_netflix/LXKWoD14RFY/-5nElGl1OQ0J has some interesting discussion on the topic. It actually sounds like the Eureka approach is more correct than using ZK: 'Eureka is available. ZooKeeper, while tolerant against single node failures, doesn't react well to long partitioning events. For us, it's vastly more important that we maintain an available registry than a necessary consistent registry. If us-east-1d sees 23 nodes, and us-east-1c sees 22 nodes for a little bit, that's OK with us.'

See also http://ispyker.blogspot.ie/2013/12/zookeeper-as-cloud-native-service.html which corroborates this:

I went into one of the instances and quickly did an iptables DROP on all packets coming from the other two instances. This would simulate an availability zone continuing to function, but that zone losing network connectivity to the other availability zones. What I saw was that the two other instances noticed that the first server “going away”, but they continued to function as they still saw a majority (66%). More interestingly the first instance noticed the other two servers “going away” dropping the ensemble availability to 33%. This caused the first server to stop serving requests to clients (not only writes, but also reads). [...]

To me this seems like a concern, as network partitions should be considered an event that should be survived. In this case (with this specific configuration of zookeeper) no new clients in that availability zone would be able to register themselves with consumers within the same availability zone. Adding more zookeeper instances to the ensemble wouldn’t help considering a balanced deployment as in this case the availability would always be majority (66%) and non-majority (33%).
netflix  ribbon  availability  libraries  java  hystrix  eureka  aws  ec2  load-balancing  networking  http  tcp  architecture  clients  ipc 
21 days ago by jm
New AWS Web Services region: eu-central-1 (soon)
Iiiinteresting. Sounds like new anti-NSA-snooping privacy laws will be driving a lot of new mini-regions in AWS. Hope Amazon have their new-region-standup process a little more streamlined by now than when I was there ;)
aws  germany  privacy  ec2  eu-central-1  nsa  snooping 
26 days ago by jm
New Low Cost EC2 Instances with Burstable Performance
Oh, very neat. New micro, small, and medium-class instances with burstable CPU scaling:
The T2 instances are built around a processing allocation model that provides you a generous, assured baseline amount of processing power coupled with the ability to automatically and transparently scale up to a full core when you need more compute power. Your ability to burst is based on the concept of "CPU Credits" that you accumulate during quiet periods and spend when things get busy. You can provision an instance of modest size and cost and still have more than adequate compute power in reserve to handle peak demands for compute power.
ec2  aws  hosting  cpu  scaling  burst  load  instances 
4 weeks ago by jm
Building a Smarter Application Stack - DevOps Ireland
This sounds like a very interesting Dublin meetup -- Engine Yard on thursday night:
This month, we'll have Tomas Doran from Yelp talking about Docker, service discovery, and deployments. 'There are many advantages to a container based, microservices architecture - however, as always, there is no silver bullet. Any serious deployment will involve multiple host machines, and will have a pressing need to migrate containers between hosts at some point. In such a dynamic world hard coding IP addresses, or even host names is not a viable solution. This talk will take a journey through how Yelp has solved the discovery problems using Airbnb’s SmartStack to dynamically discover service dependencies, and how this is helping unify our architecture, from traditional metal to EC2 ‘immutable’ SOA images, to Docker containers.'
meetups  talks  dublin  deployment  smartstack  ec2  docker  yelp  service-discovery 
4 weeks ago by jm
Amazon EC2 Service Limits Report Now Available
'designed to make it easier for you to view and manage your limits for Amazon EC2 by providing the latest information on service limits and links to quickly request limit increases. EC2 Service Limits Report displays all your service limit information in one place to help you avoid encountering limits on future EC2, EBS, Auto Scaling, and VPC usage.'
aws  ec2  vpc  ebs  autoscaling  limits  ops 
5 weeks ago by jm
Code Spaces data and backups deleted by hackers
Rather scary story of an extortionist wiping out a company's AWS-based infrastructure. Turns out S3 supports MFA-required deletion as a feature, though, which would help against that.
ops  security  extortion  aws  ec2  s3  code-spaces  delete  mfa  two-factor-authentication  authentication  infrastructure 
6 weeks ago by jm
Use of Formal Methods at Amazon Web Services
Chris Newcombe, Marc Brooker, et al. writing about their experience using formal specification and model-checking languages (TLA+) in production in AWS:

The success with DynamoDB gave us enough evidence to present TLA+ to the broader engineering community at Amazon. This raised a challenge; how to convey the purpose and benefits of formal methods to an audience of software engineers? Engineers think in terms of debugging rather than ‘verification’, so we called the presentation “Debugging Designs”.

Continuing that metaphor, we have found that software engineers more readily grasp the concept and practical value of TLA+ if we dub it 'Exhaustively-testable pseudo-code'.

We initially avoid the words ‘formal’, ‘verification’, and ‘proof’, due to the widespread view that formal methods are impractical. We also initially avoid mentioning what the acronym ‘TLA’ stands for, as doing so would give an incorrect impression of complexity.


More slides at http://tla2012.loria.fr/contributed/newcombe-slides.pdf ; proggit discussion at http://www.reddit.com/r/programming/comments/277fbh/use_of_formal_methods_at_amazon_web_services/
formal-methods  model-checking  tla  tla+  programming  distsys  distcomp  ebs  s3  dynamodb  aws  ec2  marc-brooker  chris-newcombe 
6 weeks ago by jm
AWS SDK for Java Client Configuration
turns out the AWS SDK has lots of tuning knobs: region selection, socket buffer sizes, and debug logging (including wire logging).
aws  sdk  java  logging  ec2  s3  dynamodb  sockets  tuning 
6 weeks ago by jm
moto
Mock Boto: 'a library that allows your python tests to easily mock out the boto library.' Supports S3, Autoscaling, EC2, DynamoDB, ELB, Route53, SES, SQS, and STS currently, and even supports a standalone server mode, to act as a mock service for non-Python clients. Excellent!

(via Conor McDermottroe)
python  aws  testing  mocks  mocking  system-tests  unit-tests  coding  ec2  s3 
12 weeks ago by jm
Docker Plugin for Jenkins
The aim of the docker plugin is to be able to use a docker host to dynamically provision a slave, run a single build, then tear-down that slave. Optionally, the container can be committed, so that (for example) manual QA could be performed by the container being imported into a local docker provider, and run from there.


The holy grail of Jenkins/Docker integration. How cool is that...
jenkins  docker  ops  testing  ec2  hosting  scaling  elastic-scaling  system-testing 
12 weeks ago by jm
AWS Case Study: Hailo
Ubuntu, C*, HAProxy, MySQL, RDS, multiple AWS regions.
hailo  cassandra  ubuntu  mysql  rds  aws  ec2  haproxy  architecture 
april 2014 by jm
AWS Elastic Beanstalk for Docker
This is pretty amazing. nice work, Beanstalk team. not sure how well it integrates with the rest of AWS though
aws  amazon  docker  ec2  beanstalk  ops  containers  linux 
april 2014 by jm
Using AWS in the context of Australian Privacy Considerations
interesting new white paper from Amazon regarding recent strengthening of the Aussie privacy laws, particularly w.r.t. geographic location of data and access by overseas law enforcement agencies...
amazon  aws  security  law  privacy  data-protection  ec2  s3  nsa  gchq  five-eyes 
april 2014 by jm
Adrian Cockroft's Cloud Outage Reports Collection
The detailed summaries of outages from cloud vendors are comprehensive and the response to each highlights many lessons in how to build robust distributed systems. For outages that significantly affected Netflix, the Netflix techblog report gives insight into how to effectively build reliable services on top of AWS. [....] I plan to collect reports here over time, and welcome links to other write-ups of outages and how to survive them.
outages  post-mortems  documentation  ops  aws  ec2  amazon  google  dropbox  microsoft  azure  incident-response 
march 2014 by jm
'Bobtail: Avoiding Long Tails in the Cloud' [pdf]
'A system that proactively detects and avoids bad neighbouring VMs without significantly penalizing node instantiation [in EC2]. With Bobtail, common [datacenter] communication patterns benefit from reductions of up to 40% in 99.9th percentile response times.'

Excellent stuff -- another conclusion they come to is that it's not the network's fault, it's the Xen hosts themselves. The EC2 networking team will be happy about that ;)
networking  ec2  bobtail  latency  long-tail  xen  performance 
february 2014 by jm
Video Processing at Dropbox
On-the-fly video transcoding during live streaming. They've done a great job of this!
At the beginning of the development of this feature, we entertained the idea to simply pre-transcode all the videos in Dropbox to all possible target devices. Soon enough we realized that this simple approach would be too expensive at our scale, so we decided to build a system that allows us to trigger a transcoding process only upon user request and cache the results for subsequent fetches. This on-demand approach: adapts to heterogeneous devices and network conditions, is relatively cheap (everything is relative at our scale), guarantees low latency startup time.
ffmpeg  dropbox  streaming  video  cdn  ec2  hls  http  mp4  nginx  haproxy  aws  h264 
february 2014 by jm
Chartbeat's Lessons learned tuning TCP and Nginx in EC2
a good writeup of basic sysctl tuning for an internet-facing HTTP proxy fleet running in EC2. Nothing groundbreaking here, but it's well-written
nginx  amazon  ec2  tcp  ip  tuning  sysctl  linux  c10k  ssl  http 
january 2014 by jm
10 Things You Should Know About AWS
Some decent tips in here, mainly EC2-focussed
amazon  ec2  aws  ops  rds 
november 2013 by jm
Scryer: Netflix’s Predictive Auto Scaling Engine
Scryer is a new system that allows us to provision the right number of AWS instances needed to handle the traffic of our customers. But Scryer is different from Amazon Auto Scaling (AAS), which reacts to real-time metrics and adjusts instance counts accordingly. Rather, Scryer predicts what the needs will be prior to the time of need and provisions the instances based on those predictions.
scaling  infrastructure  aws  ec2  netflix  scryer  auto-scaling  aas  metrics  prediction  spikes 
november 2013 by jm
DynamoDB Local
'a client-side database that supports the complete DynamoDB API, but doesn't manipulate any tables or data in DynamoDB itself. You can write code while sitting in a tree, on the beach, or in the desert. When you are ready to deploy your application, you simply instruct it to connect to the actual DynamoDB endpoint. No other modifications will be needed.'

This is good -- an in-memory data store for integration testing is absolutely vital for production usage. (Voldemort does this well, for example.)
dynamodb  aws  ec2  testing  integration-testing  unit-tests 
september 2013 by jm
Benchmarking Redis on AWS ElastiCache
good data points, but could do with latency percentiles
latency  redis  measurement  benchmarks  ec2  elasticache  aws  storage  tests 
september 2013 by jm
awscli

The future of the AWS command line tools is awscli, a single, unified, consistent command line tool that works with almost all of the AWS services. Here is a quick list of the services that awscli currently supports: Auto Scaling, CloudFormation, CloudSearch, CloudWatch, Data Pipeline, Direct Connect, DynamoDB, EC2, ElastiCache, Elastic Beanstalk, Elastic Transcoder, ELB, EMR, Identity and Access Management, Import/Export, OpsWorks, RDS, Redshift, Route 53, S3, SES, SNS, SQS, Storage Gateway, Security Token Service, Support API, SWF, VPC. Support for the following appears to be planned: CloudFront, Glacier, SimpleDB.

The awscli software is being actively developed as an open source project on Github, with a lot of support from Amazon. You’ll note that the biggest contributors to awscli are Amazon employees with Mitch Garnaat leading. Mitch is also the author of boto, the amazing Python library for AWS.
aws  awscli  cli  tools  command-line  ec2  s3  amazon  api 
august 2013 by jm
Improved HTTPS Performance with Early SSL Termination
This is a neat hack. Since SSL/TLS connection establishment requires lots of consecutive round trips before the connection is ready, by performing that closer to the user and reusing an existing region-to-region connection behind the scenes, the overall latency is greatly improved. Works for HTTP as well
http  https  ssl  architecture  aws  ec2  performance  latency  internet  round-trip  nginx  tls 
july 2013 by jm
Instagram: Making the Switch to Cassandra from Redis, a 75% 'Insta' Savings
shifting data out of RAM and onto SSDs -- unsurprisingly, big savings.
a 12 node cluster of EC2 hi1.4xlarge instances; we store around 1.2TB of data across this cluster. At peak, we're doing around 20,000 writes per second to that specific cluster and around 15,000 reads per second. We've been really impressed with how well Cassandra has been able to drop into that role.
ram  ssd  cassandra  databases  nosql  redis  instagram  storage  ec2 
june 2013 by jm
EC2Instances.info
'Easy Amazon EC2 Instance Comparison'. a nice UI on the various EC2 instance types on offer with their key attributes. Misses out availability of EBS-optimized instances though
amazon  ec2  aws  comparison  pricing 
june 2013 by jm
Communication costs in real-world networks
Peter Bailis has generated some good real-world data about network performance and latency, measured using EC2 instances, between ec2 regions, between zones, and between hosts in a single AZ. good data (particularly as I was looking for this data in a public source not too long ago).

I wasn’t aware of any datasets describing network behavior both within and across datacenters, so we launched m1.small Amazon EC2 instances in each of the eight geo-distributed “Regions,” across the three us-east “Availability Zones” (three co-located datacenters in Virginia), and within one datacenter (us-east-b). We measured RTTs between hosts for a week at a granularity of one ping per second.


Some of the high-percentile measurements are undoubtedly impact of host and VM behaviour, but that is still good data for a typical service built in EC2.
networks  performance  measurements  benchmarks  ops  ec2  networking  internet  az  latency 
may 2013 by jm
ec2-consistent-snapshot
This program creates an EBS snapshot for an Amazon EC2 EBS volume. To
help ensure consistent data in the snapshot, it tries to flush and
freeze the filesystem(s) first as well as flushing and locking the
database, if applicable.

Filesystems can be frozen during the snapshot. Prior to Linux kernel
2.6.29, XFS must be used for freezing support. While frozen, a
filesystem will be consistent on disk and all writes will block.

There are a number of timeouts to reduce the risk of interfering with
the normal database operation while improving the chances of getting a
consistent snapshot.

If you have multiple EBS volumes in a RAID configuration, you can
specify all of the volume ids on the command line and it will create
snapshots for each while the filesystem and database are locked. Note
that it is your responsibility to keep track of the resulting snapshot
ids and to figure out how to put these back together when you need to
restore the RAID setup.


Handy!
ubuntu  ec2  aws  linux  ebs  snapshots  ops  tools  alestic 
may 2013 by jm
Understanding Elastic Block Store Availability and Performance [slides]
fantastic in-depth presentation on EBS usage; lots of good advice here if you're using EBS volumes with/without PIOPS
piops  ebs  performance  aws  ec2  ops  storage  amazon  presentations 
may 2013 by jm
Under the Covers of DynamoDB
mostly a DynamoDB puff-piece from last week's Amazon Cloud Connect, but contains some good real-world figures for a 20-billion-GUID deduping table use-case at end. ($4,150 per month, to cut to the chase)
dynamodb  aws  figures  costs  architecture  ec2  dedupe  cloud-connect  slides 
april 2013 by jm
Latency's Worst Nightmare: Performance Tuning Tips and Tricks [slides]
the basics of running a service stack (web, app servers, data stores) on AWS. some good benchmark figures in the final slides
benchmarks  aws  ec2  ebs  piops  services  scaling  scalability  presentations 
april 2013 by jm
TCP Tune
These notes are intended to help users and system administrators maximize TCP/IP performance on their computer systems. They summarize all of the end-system (computer system) network tuning issues including a tutorial on TCP tuning, easy configuration checks for non-experts, and a repository of operating system specific instructions for getting the best possible network performance on these platforms.


Some tips for maximizing HPC network performance for the intra-DC case; recommended by the LinkedIn Kafka operations page.
tuning  network  tcp  sysadmin  performance  ops  kafka  ec2 
april 2013 by jm
High Performance MongoDB Clusters with Amazon EBS Provisioned IOPS
yeah yeah, Mongo. bookmarking for the good data on EBS+PIOPS
ebs  piops  aws  performance  tips  ops  ec2  mongodb  presentations 
april 2013 by jm
By the numbers: How Google Compute Engine stacks up to Amazon EC2
Scalr's thoughts on Google's EC2 competitor.
with Google Compute Engine, AWS has a formidable new competitor in the public cloud space, and we’ll likely be moving some of Scalr’s production workloads from our hybrid aws-rackspace-softlayer setup to it when it leaves beta. There’s a strong technical case for migrating heavy workloads to GCE, and I’ll be grabbing popcorn to eagerly watch as the battle unfolds between the giants.
gce  cloud  ec2  amazon  aws  google  scalr 
march 2013 by jm
Big Data Analytics at Netflix. Interview with Christos Kalantzis and Jason Brown.
Good interview with the Cassandra guys at Netflix, and some top Mongo-bashing in the comments
cassandra  netflix  user-stories  testimonials  nosql  storage  ec2  mongodb 
february 2013 by jm
Ironfan
'an expressive toolset for constructing scalable, resilient [service] architectures. It works in the cloud, in the data center, and on your laptop, and it makes your system diagram visible and inevitable. Inevitable systems coordinate automatically to interconnect, removing the hassle of manual configuration of connection points (and the associated danger of human error).' Looks like a pretty neat cluster deployment tool; driven from a single configuration file, using Chef, integrating closely with AWS and providing many useful additional features
chef  deployment  clusters  knife  services  aws  ec2  ops  ironfan  demo 
january 2013 by jm
AWS Advent 2012
'an annual exploration of Amazon Web Services.' Some great hacks here
aws  amazon  advent  sysadmin  s3  ec2  chef  puppet  ops 
december 2012 by jm
How Team Obama’s tech efficiency left Romney IT in dust | Ars Technica
The web-app dev and ops best practices used by the Obama campaign's tech team. Some key tools: Puppet, EC2, Asgard, Cacti, Opsview, StatsD, Graphite, Seyren, Route53, Loggly, etc.
obama  campaigns  tools  ops  asgard  ec2  aws  route53 
november 2012 by jm
C500k in Action at Urban Airship
I missed this back in 2010; 500k active TCP connections to a single EC2 large instance using Java and NIO
c10k  java  linux  ec2  scaling  nio  netty  urban-airship 
july 2012 by jm
Cloudsmith Stack Hammer
something Chris Horn sent on -- using Puppet to build stacks and deploy to AWS using a simple point-and-click interface. looks cool
github  ec2  aws  puppet  stacks  cloudsmith  stack-hammer  via:chorn 
february 2012 by jm
Benchmarking Cassandra Scalability on AWS - Over a million writes per second
NetFlix' benchmarks -- impressively detailed. '48, 96, 144 and 288 instances', across 3 EC2 AZs in us-east, successfully scaling linearly
ec2  aws  cassandra  scaling  benchmarks  netflix  performance 
november 2011 by jm
Amazon EC2 outage: summary and lessons learned
Rightscale CTO on last week's outage; pretty detailed, good round-up of useful commentary from around the web, too
ebs  ec2  aws  cloud  availability  slas  rightscale  amazon 
april 2011 by jm
CloudSplit – Real Time Cloud Analytics
interesting idea from Joe -- track your cloud-hosting spend in real-time
cloudsplit  hosting  amazon  ec2  azure  joe-drumgoole  analytics  real-time  from delicious
september 2009 by jm

related tags

aas  acm  acm-queue  advent  alestic  amazon  analytics  aphyr  api  architecture  asgard  authentication  auto-scaling  autoscaling  availability  aws  awscli  az  azure  beanstalk  benchmarks  bobtail  burst  c10k  campaigns  cassandra  cdn  chef  chris-newcombe  cli  clients  cloud  cloud-connect  cloudformation  cloudsmith  cloudsplit  clusters  code-spaces  coding  command-line  comparison  containers  costs  cpu  data-protection  databases  dedupe  delete  demo  deployment  distcomp  distsys  docker  docs  documentation  dropbox  dublin  dynamodb  ebs  ec2  elastic-scaling  elasticache  elb  eu-central-1  eureka  extortion  ffmpeg  figures  five-eyes  formal-methods  gce  gchq  germany  github  google  h264  hailo  haproxy  hls  hosting  http  https  hystrix  incident-response  infrastructure  instagram  instances  integration-testing  internet  ip  ipc  ironfan  java  jenkins  jepsen  joe-drumgoole  kafka  knife  latency  law  libraries  limits  linux  load  load-balancing  logging  long-tail  marc-brooker  measurement  measurements  meetups  memory  metrics  mfa  microsoft  mocking  mocks  model-checking  mongodb  mp4  mysql  netflix  netty  network  networking  networks  nginx  nio  nosql  nsa  obama  ops  outages  partitions  pbailis  perfect-forward-secrecy  performance  piops  post-mortems  prediction  presentations  pricing  privacy  programming  provisioning  proxying  puppet  python  r3  ram  rds  real-time  redis  ribbon  rightscale  round-trip  route53  s3  scalability  scaling  scalr  scryer  sdk  security  service-discovery  services  slas  slides  smartstack  snapshots  snooping  sockets  spikes  ssd  ssl  stack-hammer  stacks  storage  streaming  survey  sysadmin  sysctl  system-testing  system-tests  talks  tcp  testimonials  testing  tests  tips  tla  tla+  tls  tools  tuning  two-factor-authentication  ubuntu  unit-tests  urban-airship  user-stories  via:chorn  video  vpc  xen  yelp 

Copy this bookmark:



description:


tags: