jm + aws   68

Elastic MapReduce vs S3
Turns out there are a few bugs in EMR's S3 support, believe it or not.

1. 'Consider disabling Hadoop's speculative execution feature if your cluster is experiencing Amazon S3 concurrency issues. You do this through the mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution configuration settings. This is also useful when you are troubleshooting a slow cluster.'

2. Upgrade to AMI 3.1.0 or later, otherwise retries of S3 ops don't work.
s3  emr  hadoop  aws  bugs  speculative-execution  ops 
7 hours ago by jm
Load testing Apache Kafka on AWS
This is a very solid benchmarking post, examining Kafka in good detail. Nicely done. Bottom line:
I basically spend 2/3 of my work time torture testing and operationalizing distributed systems in production. There's some that I'm not so pleased with (posts pending in draft forever) and some that have attributes that I really love. Kafka is one of those systems that I pretty much enjoy every bit of, and the fact that it performs predictably well is only a symptom of the reason and not the reason itself: the authors really know what they're doing. Nothing about this software is an accident. Performance, everything in this post, is only a fraction of what's important to me and what matters when you run these systems for real. Kafka represents everything I think good distributed systems are about: that thorough and explicit design decisions win.
testing  aws  kafka  ec2  load-testing  benchmarks  performance 
11 days ago by jm
Zonify
'a set of command line tools for managing Route53 DNS for an AWS infrastructure. It intelligently uses tags and other metadata to automatically create the associated DNS records.'
zonify  aws  dns  ec2  route53  ops 
25 days ago by jm
Using spot instances
Excellent post on all of the ins and outs of EC2 spot instance usage
ec2  aws  spot-instances  pricing  cloud  auto-scaling  ops 
8 weeks ago by jm
All Data Are Belong to AWS: Streaming upload via Fluentd
Fluentd looks like a decent foundation for tailing/streaming event processing in Ruby, supporting batched output to S3 and a bunch of other AWS services, Kafka, and RabbitMQ for output. Claims to have ok performance, despite its Rubbitude. However, its high-availability story is shite, so not to be used where availability is important
ruby  rabbitmq  kafka  tail  event-streaming  cep  event-processing  s3  aws  sqs  fluentd 
11 weeks ago by jm
AWS Speed Test: What are the Fastest EC2 and S3 Regions?
My god, this test is awful -- this is how NOT to test networked infrastructure. (1) testing from a single EC2 instance in each region; (2) uploading to a single test bucket for each test; (3) results don't include min/max or percentiles, just an averaged measurement for each test. FAIL
fail  testing  networking  performance  ec2  aws  s3  internet 
12 weeks ago by jm
The Network is Reliable - ACM Queue
Peter Bailis and Kyle Kingsbury accumulate a comprehensive, informal survey of real-world network failures observed in production. I remember that April 2011 EBS outage...
ec2  aws  networking  outages  partitions  jepsen  pbailis  aphyr  acm-queue  acm  survey  ops 
july 2014 by jm
Latest EBS tuning tips
from yesterday's AWS Summit in NYC:

Cheat sheet of EBS-optimized instances. http://t.co/vmTlhUtpWk
Optimize your queue depth to achieve lower latency & highest IOPS. http://t.co/EO48oa0D6X
When configuring your RAID, use a stripe size of 128KB or 256KB. http://t.co/N0ldtFJ4t6
Use larger block size to speed up the pre-warming process. http://t.co/8UoIeWE2px
ebs  aws  amazon  iops  raid  ops  tuning 
july 2014 by jm
Netflix/ribbon
a client side IPC library that is battle-tested in cloud. It provides the following features:

Load balancing;
Fault tolerance;
Multiple protocol (HTTP, TCP, UDP) support in an asynchronous and reactive model;
Caching and batching.

I like the integration of Eureka and Hystrix in particular, although I would really like to read more about Eureka's approach to availability during network partitions and CAP.

https://groups.google.com/d/msg/eureka_netflix/LXKWoD14RFY/-5nElGl1OQ0J has some interesting discussion on the topic. It actually sounds like the Eureka approach is more correct than using ZK: 'Eureka is available. ZooKeeper, while tolerant against single node failures, doesn't react well to long partitioning events. For us, it's vastly more important that we maintain an available registry than a necessary consistent registry. If us-east-1d sees 23 nodes, and us-east-1c sees 22 nodes for a little bit, that's OK with us.'

See also http://ispyker.blogspot.ie/2013/12/zookeeper-as-cloud-native-service.html which corroborates this:

I went into one of the instances and quickly did an iptables DROP on all packets coming from the other two instances. This would simulate an availability zone continuing to function, but that zone losing network connectivity to the other availability zones. What I saw was that the two other instances noticed that the first server “going away”, but they continued to function as they still saw a majority (66%). More interestingly the first instance noticed the other two servers “going away” dropping the ensemble availability to 33%. This caused the first server to stop serving requests to clients (not only writes, but also reads). [...]

To me this seems like a concern, as network partitions should be considered an event that should be survived. In this case (with this specific configuration of zookeeper) no new clients in that availability zone would be able to register themselves with consumers within the same availability zone. Adding more zookeeper instances to the ensemble wouldn’t help considering a balanced deployment as in this case the availability would always be majority (66%) and non-majority (33%).
netflix  ribbon  availability  libraries  java  hystrix  eureka  aws  ec2  load-balancing  networking  http  tcp  architecture  clients  ipc 
july 2014 by jm
New AWS Web Services region: eu-central-1 (soon)
Iiiinteresting. Sounds like new anti-NSA-snooping privacy laws will be driving a lot of new mini-regions in AWS. Hope Amazon have their new-region-standup process a little more streamlined by now than when I was there ;)
aws  germany  privacy  ec2  eu-central-1  nsa  snooping 
july 2014 by jm
New Low Cost EC2 Instances with Burstable Performance
Oh, very neat. New micro, small, and medium-class instances with burstable CPU scaling:
The T2 instances are built around a processing allocation model that provides you a generous, assured baseline amount of processing power coupled with the ability to automatically and transparently scale up to a full core when you need more compute power. Your ability to burst is based on the concept of "CPU Credits" that you accumulate during quiet periods and spend when things get busy. You can provision an instance of modest size and cost and still have more than adequate compute power in reserve to handle peak demands for compute power.
ec2  aws  hosting  cpu  scaling  burst  load  instances 
july 2014 by jm
Delivery Notifications for Simple Email Service
Today we are enhancing SES with the addition of delivery notifications. You can now elect to receive an Amazon SNS notification each time SES successfully delivers a message to a recipient's email server. These notifications give you increased visibility into the mail delivery process. With today's release, you can now track deliveries, bounces, and complaints, all via notification to the SNS topic or topics of your choice.
delivery  email  smtp  ses  aws  sns  notifications  ops 
june 2014 by jm
Amazon EC2 Service Limits Report Now Available
'designed to make it easier for you to view and manage your limits for Amazon EC2 by providing the latest information on service limits and links to quickly request limit increases. EC2 Service Limits Report displays all your service limit information in one place to help you avoid encountering limits on future EC2, EBS, Auto Scaling, and VPC usage.'
aws  ec2  vpc  ebs  autoscaling  limits  ops 
june 2014 by jm
Code Spaces data and backups deleted by hackers
Rather scary story of an extortionist wiping out a company's AWS-based infrastructure. Turns out S3 supports MFA-required deletion as a feature, though, which would help against that.
ops  security  extortion  aws  ec2  s3  code-spaces  delete  mfa  two-factor-authentication  authentication  infrastructure 
june 2014 by jm
Use of Formal Methods at Amazon Web Services
Chris Newcombe, Marc Brooker, et al. writing about their experience using formal specification and model-checking languages (TLA+) in production in AWS:

The success with DynamoDB gave us enough evidence to present TLA+ to the broader engineering community at Amazon. This raised a challenge; how to convey the purpose and benefits of formal methods to an audience of software engineers? Engineers think in terms of debugging rather than ‘verification’, so we called the presentation “Debugging Designs”.

Continuing that metaphor, we have found that software engineers more readily grasp the concept and practical value of TLA+ if we dub it 'Exhaustively-testable pseudo-code'.

We initially avoid the words ‘formal’, ‘verification’, and ‘proof’, due to the widespread view that formal methods are impractical. We also initially avoid mentioning what the acronym ‘TLA’ stands for, as doing so would give an incorrect impression of complexity.


More slides at http://tla2012.loria.fr/contributed/newcombe-slides.pdf ; proggit discussion at http://www.reddit.com/r/programming/comments/277fbh/use_of_formal_methods_at_amazon_web_services/
formal-methods  model-checking  tla  tla+  programming  distsys  distcomp  ebs  s3  dynamodb  aws  ec2  marc-brooker  chris-newcombe 
june 2014 by jm
AWS SDK for Java Client Configuration
turns out the AWS SDK has lots of tuning knobs: region selection, socket buffer sizes, and debug logging (including wire logging).
aws  sdk  java  logging  ec2  s3  dynamodb  sockets  tuning 
june 2014 by jm
moto
Mock Boto: 'a library that allows your python tests to easily mock out the boto library.' Supports S3, Autoscaling, EC2, DynamoDB, ELB, Route53, SES, SQS, and STS currently, and even supports a standalone server mode, to act as a mock service for non-Python clients. Excellent!

(via Conor McDermottroe)
python  aws  testing  mocks  mocking  system-tests  unit-tests  coding  ec2  s3 
may 2014 by jm
AWS Case Study: Hailo
Ubuntu, C*, HAProxy, MySQL, RDS, multiple AWS regions.
hailo  cassandra  ubuntu  mysql  rds  aws  ec2  haproxy  architecture 
april 2014 by jm
AWS Elastic Beanstalk for Docker
This is pretty amazing. nice work, Beanstalk team. not sure how well it integrates with the rest of AWS though
aws  amazon  docker  ec2  beanstalk  ops  containers  linux 
april 2014 by jm
Using AWS in the context of Australian Privacy Considerations
interesting new white paper from Amazon regarding recent strengthening of the Aussie privacy laws, particularly w.r.t. geographic location of data and access by overseas law enforcement agencies...
amazon  aws  security  law  privacy  data-protection  ec2  s3  nsa  gchq  five-eyes 
april 2014 by jm
s3funnel
'a command line tool for Amazon's Simple Storage Service (S3). Written in Python, easy_install the package to install as an egg. Supports multithreaded operations for large volumes. Put, get, or delete many items concurrently, using a fixed-size pool of threads. Built on workerpool for multithreading and boto for access to the Amazon S3 API. Unix-friendly input and output. Pipe things in, out, and all around.'

MIT-licensed open source. (via Paul Dolan)
via:pdolan  s3  s3funnel  tools  ops  aws  python  mit  open-source 
april 2014 by jm
Adrian Cockroft's Cloud Outage Reports Collection
The detailed summaries of outages from cloud vendors are comprehensive and the response to each highlights many lessons in how to build robust distributed systems. For outages that significantly affected Netflix, the Netflix techblog report gives insight into how to effectively build reliable services on top of AWS. [....] I plan to collect reports here over time, and welcome links to other write-ups of outages and how to survive them.
outages  post-mortems  documentation  ops  aws  ec2  amazon  google  dropbox  microsoft  azure  incident-response 
march 2014 by jm
S3QL
a file system that stores all its data online using storage services like Google Storage, Amazon S3, or OpenStack. S3QL effectively provides a hard disk of dynamic, infinite capacity that can be accessed from any computer with internet access running Linux, FreeBSD or OS-X.

S3QL is a standard conforming, full featured UNIX file system that is conceptually indistinguishable from any local file system. Furthermore, S3QL has additional features like compression, encryption, data de-duplication, immutable trees and snapshotting which make it especially suitable for online backup and archival.
s3  s3ql  backup  aws  filesystems  linux  freebsd  osx  ops 
march 2014 by jm
Video Processing at Dropbox
On-the-fly video transcoding during live streaming. They've done a great job of this!
At the beginning of the development of this feature, we entertained the idea to simply pre-transcode all the videos in Dropbox to all possible target devices. Soon enough we realized that this simple approach would be too expensive at our scale, so we decided to build a system that allows us to trigger a transcoding process only upon user request and cache the results for subsequent fetches. This on-demand approach: adapts to heterogeneous devices and network conditions, is relatively cheap (everything is relative at our scale), guarantees low latency startup time.
ffmpeg  dropbox  streaming  video  cdn  ec2  hls  http  mp4  nginx  haproxy  aws  h264 
february 2014 by jm
Netflix: Your Linux AMI: optimization and performance [slides]
a fantastic bunch of low-level kernel tweaks and tunables which Netflix have found useful in production to maximise productivity of their fleet. Interesting use of SCHED_BATCH process scheduler class for batch processes, in particular. Also, great docs on their experience with perf and SystemTap. Perf really looks like a tool I need to get to grips with...
netflix  aws  tuning  ami  perf  systemtap  tunables  sched_batch  batch  hadoop  optimization  performance 
december 2013 by jm
10 Things You Should Know About AWS
Some decent tips in here, mainly EC2-focussed
amazon  ec2  aws  ops  rds 
november 2013 by jm
Scryer: Netflix’s Predictive Auto Scaling Engine
Scryer is a new system that allows us to provision the right number of AWS instances needed to handle the traffic of our customers. But Scryer is different from Amazon Auto Scaling (AAS), which reacts to real-time metrics and adjusts instance counts accordingly. Rather, Scryer predicts what the needs will be prior to the time of need and provisions the instances based on those predictions.
scaling  infrastructure  aws  ec2  netflix  scryer  auto-scaling  aas  metrics  prediction  spikes 
november 2013 by jm
'Experience of software engineers using TLA+, PlusCal and TLC' [slides] [pdf]
by Chris Newcombe, an AWS principal engineer. Several Amazonians sharing their results in simulating tricky distributed-systems problems using formal methods
tla+  pluscal  tlc  formal-methods  simulation  proving  aws  amazon  architecture  design 
october 2013 by jm
DynamoDB Local
'a client-side database that supports the complete DynamoDB API, but doesn't manipulate any tables or data in DynamoDB itself. You can write code while sitting in a tree, on the beach, or in the desert. When you are ready to deploy your application, you simply instruct it to connect to the actual DynamoDB endpoint. No other modifications will be needed.'

This is good -- an in-memory data store for integration testing is absolutely vital for production usage. (Voldemort does this well, for example.)
dynamodb  aws  ec2  testing  integration-testing  unit-tests 
september 2013 by jm
Benchmarking Redis on AWS ElastiCache
good data points, but could do with latency percentiles
latency  redis  measurement  benchmarks  ec2  elasticache  aws  storage  tests 
september 2013 by jm
awscli

The future of the AWS command line tools is awscli, a single, unified, consistent command line tool that works with almost all of the AWS services. Here is a quick list of the services that awscli currently supports: Auto Scaling, CloudFormation, CloudSearch, CloudWatch, Data Pipeline, Direct Connect, DynamoDB, EC2, ElastiCache, Elastic Beanstalk, Elastic Transcoder, ELB, EMR, Identity and Access Management, Import/Export, OpsWorks, RDS, Redshift, Route 53, S3, SES, SNS, SQS, Storage Gateway, Security Token Service, Support API, SWF, VPC. Support for the following appears to be planned: CloudFront, Glacier, SimpleDB.

The awscli software is being actively developed as an open source project on Github, with a lot of support from Amazon. You’ll note that the biggest contributors to awscli are Amazon employees with Mitch Garnaat leading. Mitch is also the author of boto, the amazing Python library for AWS.
aws  awscli  cli  tools  command-line  ec2  s3  amazon  api 
august 2013 by jm
Improved HTTPS Performance with Early SSL Termination
This is a neat hack. Since SSL/TLS connection establishment requires lots of consecutive round trips before the connection is ready, by performing that closer to the user and reusing an existing region-to-region connection behind the scenes, the overall latency is greatly improved. Works for HTTP as well
http  https  ssl  architecture  aws  ec2  performance  latency  internet  round-trip  nginx  tls 
july 2013 by jm
EC2Instances.info
'Easy Amazon EC2 Instance Comparison'. a nice UI on the various EC2 instance types on offer with their key attributes. Misses out availability of EBS-optimized instances though
amazon  ec2  aws  comparison  pricing 
june 2013 by jm
the infamous 2008 S3 single-bit-corruption outage
Neat, I didn't realise this was publicly visible. A single corrupted bit infected the S3 gossip network, taking down the whole S3 service in (iirc) one region:
We've now determined that message corruption was the cause of the server-to-server communication problems. More specifically, we found that there were a handful of messages on Sunday morning that had a single bit corrupted such that the message was still intelligible, but the system state information was incorrect. We use MD5 checksums throughout the system, for example, to prevent, detect, and recover from corruption that can occur during receipt, storage, and retrieval of customers' objects. However, we didn't have the same protection in place to detect whether [gossip state] had been corrupted. As a result, when the corruption occurred, we didn't detect it and it spread throughout the system causing the symptoms described above. We hadn't encountered server-to-server communication issues of this scale before and, as a result, it took some time during the event to diagnose and recover from it.

During our post-mortem analysis we've spent quite a bit of time evaluating what happened, how quickly we were able to respond and recover, and what we could do to prevent other unusual circumstances like this from having system-wide impacts. Here are the actions that we're taking: (a) we've deployed several changes to Amazon S3 that significantly reduce the amount of time required to completely restore system-wide state and restart customer request processing; (b) we've deployed a change to how Amazon S3 gossips about failed servers that reduces the amount of gossip and helps prevent the behavior we experienced on Sunday; (c) we've added additional monitoring and alarming of gossip rates and failures; and, (d) we're adding checksums to proactively detect corruption of system state messages so we can log any such messages and then reject them.


This is why you checksum all the things ;)
s3  aws  post-mortems  network  outages  failures  corruption  grey-failures  amazon  gossip 
june 2013 by jm
Monitoring the Status of Your EBS Volumes
Page in the AWS docs which describes their derived metrics and how they are computed -- these are visible in the AWS Management Console, and alarmable, but not viewable in the Cloudwatch UI. grr. (page-joshea!)
ebs  aws  monitoring  metrics  ops  documentation  cloudwatch 
may 2013 by jm
AWS forum post on interpreting iostat output for EBS
Great post from AndrewC@EBS on interpreting iostat output on EBS volumes -- from 2009, but still looks reasonable enough
iostat  ebs  disks  hardware  aws  ops 
may 2013 by jm
Measuring & Optimizing I/O Performance
Another good writeup on iostat and EBS, from Ilya Grigorik
io  optimization  sysadmin  performance  iostat  ebs  aws  ops 
may 2013 by jm
ec2-consistent-snapshot
This program creates an EBS snapshot for an Amazon EC2 EBS volume. To
help ensure consistent data in the snapshot, it tries to flush and
freeze the filesystem(s) first as well as flushing and locking the
database, if applicable.

Filesystems can be frozen during the snapshot. Prior to Linux kernel
2.6.29, XFS must be used for freezing support. While frozen, a
filesystem will be consistent on disk and all writes will block.

There are a number of timeouts to reduce the risk of interfering with
the normal database operation while improving the chances of getting a
consistent snapshot.

If you have multiple EBS volumes in a RAID configuration, you can
specify all of the volume ids on the command line and it will create
snapshots for each while the filesystem and database are locked. Note
that it is your responsibility to keep track of the resulting snapshot
ids and to figure out how to put these back together when you need to
restore the RAID setup.


Handy!
ubuntu  ec2  aws  linux  ebs  snapshots  ops  tools  alestic 
may 2013 by jm
Understanding Elastic Block Store Availability and Performance [slides]
fantastic in-depth presentation on EBS usage; lots of good advice here if you're using EBS volumes with/without PIOPS
piops  ebs  performance  aws  ec2  ops  storage  amazon  presentations 
may 2013 by jm
Under the Covers of DynamoDB
mostly a DynamoDB puff-piece from last week's Amazon Cloud Connect, but contains some good real-world figures for a 20-billion-GUID deduping table use-case at end. ($4,150 per month, to cut to the chase)
dynamodb  aws  figures  costs  architecture  ec2  dedupe  cloud-connect  slides 
april 2013 by jm
Latency's Worst Nightmare: Performance Tuning Tips and Tricks [slides]
the basics of running a service stack (web, app servers, data stores) on AWS. some good benchmark figures in the final slides
benchmarks  aws  ec2  ebs  piops  services  scaling  scalability  presentations 
april 2013 by jm
High Scalability - Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years
wow, Pinterest have a pretty hardcore architecture. Sharding to the max. This is scary stuff for me:
a [Cassandra-style] Cluster Management Algorithm is a SPOF. If there’s a bug it impacts every node. This took them down 4 times.


yeah, so, eek ;)
clustering  sharding  architecture  aws  scalability  scaling  pinterest  via:matt-sergeant  redis  mysql  memcached 
april 2013 by jm
High Performance MongoDB Clusters with Amazon EBS Provisioned IOPS
yeah yeah, Mongo. bookmarking for the good data on EBS+PIOPS
ebs  piops  aws  performance  tips  ops  ec2  mongodb  presentations 
april 2013 by jm
By the numbers: How Google Compute Engine stacks up to Amazon EC2
Scalr's thoughts on Google's EC2 competitor.
with Google Compute Engine, AWS has a formidable new competitor in the public cloud space, and we’ll likely be moving some of Scalr’s production workloads from our hybrid aws-rackspace-softlayer setup to it when it leaves beta. There’s a strong technical case for migrating heavy workloads to GCE, and I’ll be grabbing popcorn to eagerly watch as the battle unfolds between the giants.
gce  cloud  ec2  amazon  aws  google  scalr 
march 2013 by jm
Sift Science says it can sniff out cyber fraud — before it gets expensive
Great idea for a startup. This stuff is complex, right in the heart of every company's ordering pipeline, and I can see a lot of customers for this
sift-science  anti-fraud  fraud  b2b  b2c  ecommerce  startups  aws 
march 2013 by jm
Denominator: A Multi-Vendor Interface for DNS
the latest good stuff from Netflix.

Denominator is a portable Java library for manipulating DNS clouds. Denominator has pluggable back-ends, initially including AWS Route53, Neustar Ultra, DynECT, and a mock for testing. We also ship a command line version so it's easy for anyone to try it out.
The reason we built Denominator is that we are working on multi-region failover and traffic sharing patterns to provide higher availability for the streaming service during regional outages caused by our own bugs and AWS issues. To do this we need to directly control the DNS configuration that routes users to each region and each zone. When we looked at the features and vendors in this space we found that we were already using AWS Route53, which has a nice API but is missing some advanced features; Neustar UltraDNS, which has a SOAP based API; and DynECT, which has a REST API that uses a quite different pseudo-transactional model. We couldn’t find a Java based API that grouped together common set of capabilities that we are interested in, so we created one. The idea is that any feature that is supported by more than one vendor API is the highest common denominator, and that functionality can be switched between vendors as needed, or in the event of a DNS vendor outage.
dns  netflix  java  tools  ops  route53  aws  ultradns  dynect 
march 2013 by jm
Ironfan
'an expressive toolset for constructing scalable, resilient [service] architectures. It works in the cloud, in the data center, and on your laptop, and it makes your system diagram visible and inevitable. Inevitable systems coordinate automatically to interconnect, removing the hassle of manual configuration of connection points (and the associated danger of human error).' Looks like a pretty neat cluster deployment tool; driven from a single configuration file, using Chef, integrating closely with AWS and providing many useful additional features
chef  deployment  clusters  knife  services  aws  ec2  ops  ironfan  demo 
january 2013 by jm
AWS Advent 2012
'an annual exploration of Amazon Web Services.' Some great hacks here
aws  amazon  advent  sysadmin  s3  ec2  chef  puppet  ops 
december 2012 by jm
James Hamilton - Failures at Scale & How to Ride Through Them - AWS re:Invent 2012 - Cpn208
mostly an update of his classic USENIX paper, but pretty cool to come across a mention of a network monitoring system we've built on page 21 ;)
amazon  james-hamilton  reliabilty  slides  aws 
december 2012 by jm
How Team Obama’s tech efficiency left Romney IT in dust | Ars Technica
The web-app dev and ops best practices used by the Obama campaign's tech team. Some key tools: Puppet, EC2, Asgard, Cacti, Opsview, StatsD, Graphite, Seyren, Route53, Loggly, etc.
obama  campaigns  tools  ops  asgard  ec2  aws  route53 
november 2012 by jm
Amazon Web Services Blog: Amazon S3 Performance Tips & Tricks
Doug Grismore provides a very useful S3 performance tip; monotonically increasing keys will hurt performance, and describes a clean-enough way to avoid the problem
s3  performance  aws 
march 2012 by jm
Cloud Architecture Tutorial - Platform Component Architecture (2of3)
Amazing stuff from Adrian Cockroft at last week's QCon. Faceted object model, lots of Cassandra automation
cassandra  api  design  oo  object-model  java  adrian-cockroft  slides  qcon  scaling  aws  netflix 
march 2012 by jm
Cloudsmith Stack Hammer
something Chris Horn sent on -- using Puppet to build stacks and deploy to AWS using a simple point-and-click interface. looks cool
github  ec2  aws  puppet  stacks  cloudsmith  stack-hammer  via:chorn 
february 2012 by jm
Benchmarking Cassandra Scalability on AWS - Over a million writes per second
NetFlix' benchmarks -- impressively detailed. '48, 96, 144 and 288 instances', across 3 EC2 AZs in us-east, successfully scaling linearly
ec2  aws  cassandra  scaling  benchmarks  netflix  performance 
november 2011 by jm
Amazon hiring embedded OS developers
hey, I know a few of those! 'I need more help on a project I’m driving at Amazon where we continue to make big changes in our datacenter network to improve customer experience and drive down costs while, at the same time, deploying more gear into production each day than all of Amazon.com used back in 2000. It’s an exciting time and we have big changes happening in networking. If you enjoy and have experience in operating systems, networking protocol stacks, or embedded systems and you would like to work on one of the biggest networks in the world, [get in touch].' -- James Hamilton
james-hamilton  aws  jobs  amazon  networking  embedded 
october 2011 by jm
Building with Legos
Netflix tech blog on how they deploy their services. Notably, they avoid the Puppet/Chef approach, citing these reasons: 'One is that it eliminates a number of dependencies in the production environment: a master control server, package repository and client scripts on the servers, network permissions to talk to all of these. Another is that it guarantees that what we test in the test environment is the EXACT same thing that is deployed in production; there is very little chance of configuration or other creep/bit rot. Finally, it means that there is no way for people to change or install things in the production environment (this may seem like a really harsh restriction, but if you can build a new AMI fast enough it doesn't really make a difference).'
devops  cloud  aws  netflix  puppet  chef  deployment 
august 2011 by jm
Amazon EC2 outage: summary and lessons learned
Rightscale CTO on last week's outage; pretty detailed, good round-up of useful commentary from around the web, too
ebs  ec2  aws  cloud  availability  slas  rightscale  amazon 
april 2011 by jm
What Larry Page really needs to do to return Google to its startup roots
massively detailed critique of Google's corporate culture -- lots of internals exposed
google  management  culture  aws  corporate-culture  gossip  from delicious
march 2011 by jm
Quora’s Technology Examined
Python, Nginx, Tornado for COMET stuff, MySQL as a data store, memcached, Thrift, haproxy, AWS, Pylons.  fantastic, very detailed post (via Nelson)
quora  python  nginx  tornado  comet  mysql  memcached  thrift  haproxy  aws  pylons  via:nelson  from delicious
february 2011 by jm
Netflix: Dev and Ops internals
extensive details on the innards of Netflix' move to AWS, from the legendary Adrian Cockcroft
adrian-cockcroft  aws  netflix  ops  cloud  from delicious
november 2010 by jm

related tags

aas  acm  acm-queue  adrian-cockcroft  adrian-cockroft  advent  alestic  amazon  ami  analytics  anti-fraud  aphyr  api  architecture  asgard  authentication  auto-scaling  autoscaling  availability  aws  awscli  azure  b2b  b2c  backup  backups  batch  beanstalk  benchmarks  bugs  burst  campaigns  cassandra  cdn  cep  chef  chris-newcombe  cli  clients  cloud  cloud-connect  cloudformation  cloudsearch  cloudsmith  cloudwatch  clustering  clusters  code-spaces  coding  comet  command-line  comparison  containers  corporate-culture  corruption  costs  counters  cpu  culture  data-protection  dedupe  delete  delivery  demo  deployment  design  devops  disks  distcomp  distsys  dns  docker  documentation  dropbox  duplicity  duply  dynamodb  dynect  ebs  ec2  ecommerce  elasticache  elb  email  embedded  emr  eu-central-1  eureka  event-processing  event-streaming  extortion  fail  failures  ffmpeg  figures  filesystems  five-eyes  fluentd  formal-methods  fraud  freebsd  gce  gchq  germany  github  google  gossip  grey-failures  h264  hadoop  hailo  haproxy  hardware  hls  hosting  http  https  hystrix  incident-response  infrastructure  instances  integration-testing  internet  io  iops  iostat  ipc  ironfan  james-hamilton  java  jepsen  jobs  kafka  knife  latency  law  libraries  limits  linux  load  load-balancing  load-testing  logging  lucene  management  marc-brooker  measurement  memcached  memory  metrics  mfa  microsoft  mit  mocking  mocks  model-checking  mongodb  monitoring  mp4  mysql  netflix  network  networking  nginx  nosql  notifications  nsa  obama  object-model  oo  open-source  ops  optimization  osx  outages  partitions  pbailis  perf  perfect-forward-secrecy  performance  pinterest  piops  pluscal  post-mortems  prediction  presentations  pricing  privacy  programming  proving  provisioning  proxying  puppet  pylons  python  qcon  quora  r3  rabbitmq  raid  rdbms  rds  redis  redshift  reliability  reliabilty  ribbon  rightscale  round-trip  route53  ruby  s3  s3funnel  s3ql  scalability  scaling  scalr  sched_batch  scryer  sdk  search  security  servers  services  ses  sharding  sift-science  simulation  slas  slides  smtp  smugmug  snapshots  snooping  sns  sockets  solr  speculative-execution  spikes  spot-instances  sql  sqs  ssl  stack-hammer  stacks  startups  storage  streaming  survey  sysadmin  system-tests  systemtap  tail  tcp  tellybug  testing  tests  thrift  tips  tla  tla+  tlc  tls  tools  tornado  tunables  tuning  two-factor-authentication  ubuntu  ultradns  unit-tests  unix  via:chorn  via:matt-sergeant  via:nelson  via:pdolan  video  vpc  zonify 

Copy this bookmark:



description:


tags: