jm + aws   113

Lambda: Bees with Frickin' Laser Beams
a HTTP testing tool in AWS Lambda. nice enough, but still a toy...
lambda  aws  node  javascript  hacks  http  load-testing 
15 days ago by jm
awslabs/aws-lambda-redshift-loader
Load data into Redshift from S3 buckets using a pre-canned Lambda function. Looks like it may be a good example of production-quality Lambda
lambda  aws  ec2  redshift  s3  loaders  etl  pipeline 
17 days ago by jm
s3.amazonaws.com "certificate verification failed" errors due to crappy Verisign certs and overzealous curl policies
Seth Vargo is correct. Its not the bit length of the key which is at issue, its the signature algorithm. The entire keychain for the s3.awsamazon.com key is signed with SHA1withRSA:

https://www.ssllabs.com/ssltest/analyze.html?d=s3.amazonaws.com&s=54.231.244.0&hideResults=on

At issue is that the root verisign key has been marked as weak because of SHA1 and taken out of the curl bundle which is widely popular, and this issue will continue to cause more and more issues going forwards as that bundle makes it way into shipping o/s distributions and aws certification verification breaks.


'This is still happening and curl is now failing on my machine causing all sorts of fun issues (including breaking CocoaPods that are using S3 for storage).' -- @jmhodges

This may be a contributory factor to the issue @nelson saw: https://nelsonslog.wordpress.com/2015/04/28/cyberduck-is-responsible-for-my-bad-ssl-certificate/

Curl's ca-certs bundle is also used by Node: https://github.com/joyent/node/issues/8894 and doubtless many other apps and packages.

Here's a mailing list thread discussing the issue: http://curl.haxx.se/mail/archive-2014-10/0066.html -- looks like the curl team aren't too bothered about it.
curl  s3  amazon  aws  ssl  tls  certs  sha1  rsa  key-length  security  cacerts 
23 days ago by jm
Kappa
'a command line tool that (hopefully) makes it easier to deploy, update, and test functions for AWS Lambda.' much needed IMO -- Lambda is too closed
aws  lambda  mitch-garnaat  coding  testing  cli  kappa 
24 days ago by jm
Cluster-Based Architectures Using Docker and Amazon EC2 Container Service
In this post, we’re going to take a deeper dive into the architectural concepts underlying cluster computing using container management frameworks such as ECS. We will show how these frameworks effectively abstract the low-level resources such as CPU, memory, and storage, allowing for highly efficient usage of the nodes in a compute cluster. Building on some of the concepts detailed in the earlier posts, we will discover why containers are such a good fit for this type of abstraction, and how the Amazon EC2 Container Service fits into the larger ecosystem of cluster management frameworks.
docker  aws  ecs  ec2  ops  hosting  containers  mesos  clusters 
28 days ago by jm
Amazon EC2 Container Service team AmA
a few answers here. Mostly people pointing out shortcomings and the team asking them to start a thread on their forum though :(
ec2  ecs  docker  aws  ops  ama  reddit 
28 days ago by jm
credstash
'CredStash is a very simple, easy to use credential management and distribution system that uses AWS Key Management System (KMS) for key wrapping and master-key storage, and DynamoDB for credential storage and sharing.'
aws  credstash  python  security  keys  key-management  secrets  kms 
4 weeks ago by jm
Run your own high-end cloud gaming service on EC2
Using Steam streaming and EC2 g2.2xlarge spot instances -- 'comes out to around $0.52/hr'. That's pretty compelling IMO
aws  ec2  gaming  games  graphics  spot-instances  hacks  windows  steam 
4 weeks ago by jm
Microservices and elastic resource pools with Amazon EC2 Container Service
interesting approach to working around ECS' shortcomings -- bit specific to Hailo's microservices arch and IPC mechanism though.

aside: I like their version numbering scheme: ISO-8601, YYYYMMDDHHMMSS. keep it simple!
versioning  microservices  hailo  aws  ec2  ecs  docker  containers  scheduling  allocation  deployment  provisioning  qos 
5 weeks ago by jm
Subscribing AWS Lambda Function To SNS Topic With aws-cli
how to use the AWS command line tools to do this
aws  aws-cli  cli  lambda  sns  hacks 
5 weeks ago by jm
AWS Lambda Event-Driven Architecture With Amazon SNS
Any message posted to an SNS topic can trigger the execution of custom code you have written, but you don’t have to maintain any infrastructure to keep that code available to listen for those events and you don’t have to pay for any infrastructure when the code is not being run. This is, in my opinion, the first time that Amazon can truly say that AWS Lambda is event-driven, as we now have a central, independent, event management system (SNS) where any authorized entity can trigger the event (post a message to a topic) and any authorized AWS Lambda function can listen for the event, and neither has to know about the other.
aws  ec2  lambda  sns  events  cep  event-processing  coding  cloud  hacks  eric-hammond 
5 weeks ago by jm
Amazon Machine Learning
Upsides of this new AWS service:

* great UI and visualisations.

* solid choice of metric to evaluate the results. Maybe things moved on since I was working on it, but the use of AUC, false positives and false negatives was pretty new when I was working on it. (er, 10 years ago!)

Downsides:

* it could do with more support for unsupervised learning algorithms. Supervised learning means you need to provide training data, which in itself can be hard work. My experience with logistic regression in the past is that it requires very accurate training data, too -- its tolerance for misclassified training examples is poor.

* Also, in my experience, 80% of the hard work of using ML algorithms is writing good tokenisation and feature extraction algorithms. I don't see any help for that here unfortunately. (probably not that surprising as it requires really detailed knowledge of the input data to know what classes can be abbreviated into a single class, etc.)
amazon  aws  ml  machine-learning  auc  data-science 
5 weeks ago by jm
(SEC307) Building a DDoS-Resilient Architecture with AWS
good slides on a "web application firewall" proxy service, deployable as an auto-scaling EC2 unit
ec2  aws  ddos  security  resilience  slides  reinvent  firewalls  http  elb 
6 weeks ago by jm
S3's "s3-external-1.amazonaws.com" endpoint
public documentation of how to work around the legacy S3 multi-region replication behaviour in North America
aws  s3  eventual-consistency  consistency  us-east  replication  workarounds  legacy 
6 weeks ago by jm
When S3's eventual consistency is REALLY eventual
a consistency outage in S3 last year, resulting in about 40 objects failing read-after-write consistency for a duration of about 23 hours
s3  eventual-consistency  aws  consistency  read-after-writes  bugs  outages  stackdriver 
6 weeks ago by jm
Can Spark Streaming survive Chaos Monkey?
good empirical results on Spark's resilience to network/host outages in EC2
ec2  aws  emr  spark  resilience  ha  fault-tolerance  chaos-monkey  netflix 
10 weeks ago by jm
500 Mbps upload to S3
the following guidelines maximize bandwidth usage:
Optimizing the sizes of the file parts, whether they are part of a large file or an entire small file; Optimizing the number of parts transferred concurrently.
Tuning these two parameters achieves the best possible transfer speeds to [S3].
s3  uploads  dataman  aws  ec2  performance 
11 weeks ago by jm
Alibaba's cloud service launches in US, wants to rain all over Amazon
server-hosting only for now. Interesting!
Alibaba’s cloud platform already competes with the likes of AWS in China. Aliyun’s Chinese data centers are in Beijing, Hangzhou, Qingdao, Hong Kong, and Shenzhen. “For the time being, we are just testing the water,” Yu said today. That means Aliyun will focus first on Chinese companies doing business in the US. “We know well what Chinese clients need, and now it’s time for us to learn what US clients need,” he added.
alibaba  china  aws  aliyun  hosting 
11 weeks ago by jm
What Color Is Your Xen?
What a mess.
What's faster: PV, HVM, HVM with PV drivers, PVHVM, or PVH? Cloud computing providers using Xen can offer different virtualization "modes", based on paravirtualization (PV), hardware virtual machine (HVM), or a hybrid of them. As a customer, you may be required to choose one of these. So, which one?
ec2  linux  performance  aws  ops  pv  hvm  xen  virtualization 
12 weeks ago by jm
Azul Zing on Ubuntu on AWS Marketplace
hmmm, very interesting -- the super-low-latency Zing JVM is available as a commercial EC2 instance type, at costs less than the EC2 instance price
zing  azul  latency  performance  ec2  aws 
february 2015 by jm
0x74696d | Falling In And Out Of Love with DynamoDB, Part II
Good DynamoDB real-world experience post, via Mitch Garnaat. We should write up ours, although it's pretty scary-stuff-free by comparison
aws  dynamodb  storage  databases  architecture  ops 
february 2015 by jm
Comparing Message Queue Architectures on AWS
A good overview -- I like the summary table. tl;dr:
If you are light on DevOps and not latency sensitive use SQS for job management and Kinesis for event stream processing. If latency is an issue, use ELB or 2 RabbitMQs (or 2 beanstalkds) for job management and Redis for event stream processing.
amazon  architecture  aws  messaging  queueing  elb  rabbitmq  beanstalk  kinesis  sqs  redis  kafka 
february 2015 by jm
AWS Tips I Wish I'd Known Before I Started
Some good advice and guidelines (although some are just silly).
aws  ops  tips  advice  ec2  s3 
january 2015 by jm
AWS Key Management Service Cryptographic Details
"AWS Key Management Service (AWS KMS) provides cryptographic keys and operations scaled for the cloud. AWS KMS keys and functionality are used by other AWS cloud services, and you can use them to protect user data in your applications that use AWS. This white paper provides details on the cryptographic operations that are executed within AWS when you use AWS KMS."
white-papers  aws  amazon  kms  key-management  crypto  pdf 
december 2014 by jm
Aurora for MySQL is coming
'Anurag@AWS posts a quite interesting comment on Aurora failover: We asynchronously write to 6 copies and ack the write when we see four completions. So, traditional 4/6 quorums with synchrony as you surmised. Now, each log record can end up with a independent quorum from any other log record, which helps with jitter, but introduces some sophistication in recovery protocols. We peer to peer to fill in holes. We also will repair bad segments in the background, and downgrade to a 3/4 quorum if unable to place in an AZ for any extended period. You need a pretty bad failure to get a write outage.' (via High Scalability)
via:highscalability  mysql  aurora  failover  fault-tolerance  aws  replication  quorum 
december 2014 by jm
(SDD416) Amazon EBS Deep Dive | AWS re:Invent 2014
Excellent data on current EBS performance characteristics
ebs  ops  aws  reinvent  slides 
november 2014 by jm
AWS re:Invent 2014 Video & Slide Presentation Links
Nice work by Andrew Spyker -- this should be an official feature of the re:Invent website, really
reinvent  aws  conferences  talks  slides  ec2  s3  ops  presentations 
november 2014 by jm
AWS re:Invent 2014 | (SPOT302) Under the Covers of AWS: Its Core Distributed Systems - YouTube
This is a really solid talk -- not surprising, alv@ is one of the speakers!
"AWS and Amazon.com operate some of the world's largest distributed systems infrastructure and applications. In our past 18 years of operating this infrastructure, we have come to realize that building such large distributed systems to meet the durability, reliability, scalability, and performance needs of AWS requires us to build our services using a few common distributed systems primitives. Examples of these primitives include a reliable method to build consensus in a distributed system, reliable and scalable key-value store, infrastructure for a transactional logging system, scalable database query layers using both NoSQL and SQL APIs, and a system for scalable and elastic compute infrastructure.

In this session, we discuss some of the solutions that we employ in building these primitives and our lessons in operating these systems. We also cover the history of some of these primitives -- DHTs, transactional logging, materialized views and various other deep distributed systems concepts; how their design evolved over time; and how we continue to scale them to AWS. "


Slides: http://www.slideshare.net/AmazonWebServices/spot302-under-the-covers-of-aws-core-distributed-systems-primitives-that-power-our-platform-aws-reinvent-2014
scale  scaling  aws  amazon  dht  logging  data-structures  distcomp  via:marc-brooker  dynamodb  s3 
november 2014 by jm
JMESPath
an [XPath-style] query language for JSON. You can extract and transform elements from a JSON document.


Supported by the "aws" CLI tool, and in boto.
aws  boto  jmespath  json  xpath  querying  languages  documents 
november 2014 by jm
DynamoDB Streams
This is pretty awesome. All changes to a DynamoDB table can be streamed to a Kinesis stream, MySQL-replication-style.

The nice bit is that it has a solid way to ensure readers won't get overwhelmed by the stream volume (since ddb tables are IOPS-rate-limited), and Kinesis has a solid way to read missed updates (since it's a Kafka-style windowed persistent stream). With this you have a pretty reliable way to ensure you're not going to suffer data loss.
iops  dynamodb  aws  kinesis  reliability  replication  multi-az  multi-region  failover  streaming  kafka 
november 2014 by jm
Doing Constant Work to Avoid Failures
A good example of a design pattern -- by performing a relatively constant amount of work regardless of the input, we can predict scalability and reduce the risk of overload when something unexpected changes in that input
scalability  scaling  architecture  aws  route53  via:brianscanlan  overload  constant-load  loading 
november 2014 by jm
Why We Didn’t Use Kafka for a Very Kafka-Shaped Problem
A good story of when Kafka _didn't_ fit the use case:
We came up with a complicated process of app-level replication for our messages into two separate Kafka clusters. We would then do end-to-end checking of the two clusters, detecting dropped messages in each cluster based on messages that weren’t in both.

It was ugly. It was clearly going to be fragile and error-prone. It was going to be a lot of app-level replication and horrible heuristics to see when we were losing messages and at least alert us, even if we couldn’t fix every failure case.

Despite us building a Kafka prototype for our ETL — having an existing investment in it — it just wasn’t going to do what we wanted. And that meant we needed to leave it behind, rewriting the ETL prototype.
cassandra  java  kafka  scala  network-partitions  availability  multi-region  multi-az  aws  replication  onlive 
november 2014 by jm
Zookeeper: not so great as a highly-available service registry
Turns out ZK isn't a good choice as a service discovery system, if you want to be able to use that service discovery system while partitioned from the rest of the ZK cluster:
I went into one of the instances and quickly did an iptables DROP on all packets coming from the other two instances.  This would simulate an availability zone continuing to function, but that zone losing network connectivity to the other availability zones.  What I saw was that the two other instances noticed the first server “going away”, but they continued to function as they still saw a majority (66%).  More interestingly the first instance noticed the other two servers “going away”, dropping the ensemble availability to 33%.  This caused the first server to stop serving requests to clients (not only writes, but also reads).


So: within that offline AZ, service discovery *reads* (as well as writes) stopped working due to a lack of ZK quorum. This is quite a feasible outage scenario for EC2, by the way, since (at least when I was working there) the network links between AZs, and the links with the external internet, were not 100% overlapping.

In other words, if you want a highly-available service discovery system in the fact of network partitions, you want an AP service discovery system, rather than a CP one -- and ZK is a CP system.

Another risk, noted on the Netflix Eureka mailing list at https://groups.google.com/d/msg/eureka_netflix/LXKWoD14RFY/tA9UnerrBHUJ :

ZooKeeper, while tolerant against single node failures, doesn't react well to long partitioning events. For us, it's vastly more important that we maintain an available registry than a necessarily consistent registry. If us-east-1d sees 23 nodes, and us-east-1c sees 22 nodes for a little bit, that's OK with us.


I guess this means that a long partition can trigger SESSION_EXPIRED state, resulting in ZK client libraries requiring a restart/reconnect to fix. I'm not entirely clear what happens to the ZK cluster itself in this scenario though.

Finally, Pinterest ran into other issues relying on ZK for service discovery and registration, described at http://engineering.pinterest.com/post/77933733851/zookeeper-resilience-at-pinterest ; sounds like this was mainly around load and the "thundering herd" overload problem. Their workaround was to decouple ZK availability from their services' availability, by building a Smartstack-style sidecar daemon on each host which tracked/cached ZK data.
zookeeper  service-discovery  ops  ha  cap  ap  cp  service-registry  availability  ec2  aws  network  partitions  eureka  smartstack  pinterest 
november 2014 by jm
Elastic MapReduce vs S3
Turns out there are a few bugs in EMR's S3 support, believe it or not.

1. 'Consider disabling Hadoop's speculative execution feature if your cluster is experiencing Amazon S3 concurrency issues. You do this through the mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution configuration settings. This is also useful when you are troubleshooting a slow cluster.'

2. Upgrade to AMI 3.1.0 or later, otherwise retries of S3 ops don't work.
s3  emr  hadoop  aws  bugs  speculative-execution  ops 
october 2014 by jm
Load testing Apache Kafka on AWS
This is a very solid benchmarking post, examining Kafka in good detail. Nicely done. Bottom line:
I basically spend 2/3 of my work time torture testing and operationalizing distributed systems in production. There's some that I'm not so pleased with (posts pending in draft forever) and some that have attributes that I really love. Kafka is one of those systems that I pretty much enjoy every bit of, and the fact that it performs predictably well is only a symptom of the reason and not the reason itself: the authors really know what they're doing. Nothing about this software is an accident. Performance, everything in this post, is only a fraction of what's important to me and what matters when you run these systems for real. Kafka represents everything I think good distributed systems are about: that thorough and explicit design decisions win.
testing  aws  kafka  ec2  load-testing  benchmarks  performance 
october 2014 by jm
Zonify
'a set of command line tools for managing Route53 DNS for an AWS infrastructure. It intelligently uses tags and other metadata to automatically create the associated DNS records.'
zonify  aws  dns  ec2  route53  ops 
october 2014 by jm
Using spot instances
Excellent post on all of the ins and outs of EC2 spot instance usage
ec2  aws  spot-instances  pricing  cloud  auto-scaling  ops 
september 2014 by jm
All Data Are Belong to AWS: Streaming upload via Fluentd
Fluentd looks like a decent foundation for tailing/streaming event processing in Ruby, supporting batched output to S3 and a bunch of other AWS services, Kafka, and RabbitMQ for output. Claims to have ok performance, despite its Rubbitude. However, its high-availability story is shite, so not to be used where availability is important
ruby  rabbitmq  kafka  tail  event-streaming  cep  event-processing  s3  aws  sqs  fluentd 
august 2014 by jm
AWS Speed Test: What are the Fastest EC2 and S3 Regions?
My god, this test is awful -- this is how NOT to test networked infrastructure. (1) testing from a single EC2 instance in each region; (2) uploading to a single test bucket for each test; (3) results don't include min/max or percentiles, just an averaged measurement for each test. FAIL
fail  testing  networking  performance  ec2  aws  s3  internet 
august 2014 by jm
The Network is Reliable - ACM Queue
Peter Bailis and Kyle Kingsbury accumulate a comprehensive, informal survey of real-world network failures observed in production. I remember that April 2011 EBS outage...
ec2  aws  networking  outages  partitions  jepsen  pbailis  aphyr  acm-queue  acm  survey  ops 
july 2014 by jm
Latest EBS tuning tips
from yesterday's AWS Summit in NYC:

Cheat sheet of EBS-optimized instances. http://t.co/vmTlhUtpWk
Optimize your queue depth to achieve lower latency & highest IOPS. http://t.co/EO48oa0D6X
When configuring your RAID, use a stripe size of 128KB or 256KB. http://t.co/N0ldtFJ4t6
Use larger block size to speed up the pre-warming process. http://t.co/8UoIeWE2px
ebs  aws  amazon  iops  raid  ops  tuning 
july 2014 by jm
Netflix/ribbon
a client side IPC library that is battle-tested in cloud. It provides the following features:

Load balancing;
Fault tolerance;
Multiple protocol (HTTP, TCP, UDP) support in an asynchronous and reactive model;
Caching and batching.

I like the integration of Eureka and Hystrix in particular, although I would really like to read more about Eureka's approach to availability during network partitions and CAP.

https://groups.google.com/d/msg/eureka_netflix/LXKWoD14RFY/-5nElGl1OQ0J has some interesting discussion on the topic. It actually sounds like the Eureka approach is more correct than using ZK: 'Eureka is available. ZooKeeper, while tolerant against single node failures, doesn't react well to long partitioning events. For us, it's vastly more important that we maintain an available registry than a necessary consistent registry. If us-east-1d sees 23 nodes, and us-east-1c sees 22 nodes for a little bit, that's OK with us.'

See also http://ispyker.blogspot.ie/2013/12/zookeeper-as-cloud-native-service.html which corroborates this:

I went into one of the instances and quickly did an iptables DROP on all packets coming from the other two instances. This would simulate an availability zone continuing to function, but that zone losing network connectivity to the other availability zones. What I saw was that the two other instances noticed that the first server “going away”, but they continued to function as they still saw a majority (66%). More interestingly the first instance noticed the other two servers “going away” dropping the ensemble availability to 33%. This caused the first server to stop serving requests to clients (not only writes, but also reads). [...]

To me this seems like a concern, as network partitions should be considered an event that should be survived. In this case (with this specific configuration of zookeeper) no new clients in that availability zone would be able to register themselves with consumers within the same availability zone. Adding more zookeeper instances to the ensemble wouldn’t help considering a balanced deployment as in this case the availability would always be majority (66%) and non-majority (33%).
netflix  ribbon  availability  libraries  java  hystrix  eureka  aws  ec2  load-balancing  networking  http  tcp  architecture  clients  ipc 
july 2014 by jm
New AWS Web Services region: eu-central-1 (soon)
Iiiinteresting. Sounds like new anti-NSA-snooping privacy laws will be driving a lot of new mini-regions in AWS. Hope Amazon have their new-region-standup process a little more streamlined by now than when I was there ;)
aws  germany  privacy  ec2  eu-central-1  nsa  snooping 
july 2014 by jm
New Low Cost EC2 Instances with Burstable Performance
Oh, very neat. New micro, small, and medium-class instances with burstable CPU scaling:
The T2 instances are built around a processing allocation model that provides you a generous, assured baseline amount of processing power coupled with the ability to automatically and transparently scale up to a full core when you need more compute power. Your ability to burst is based on the concept of "CPU Credits" that you accumulate during quiet periods and spend when things get busy. You can provision an instance of modest size and cost and still have more than adequate compute power in reserve to handle peak demands for compute power.
ec2  aws  hosting  cpu  scaling  burst  load  instances 
july 2014 by jm
Delivery Notifications for Simple Email Service
Today we are enhancing SES with the addition of delivery notifications. You can now elect to receive an Amazon SNS notification each time SES successfully delivers a message to a recipient's email server. These notifications give you increased visibility into the mail delivery process. With today's release, you can now track deliveries, bounces, and complaints, all via notification to the SNS topic or topics of your choice.
delivery  email  smtp  ses  aws  sns  notifications  ops 
june 2014 by jm
Amazon EC2 Service Limits Report Now Available
'designed to make it easier for you to view and manage your limits for Amazon EC2 by providing the latest information on service limits and links to quickly request limit increases. EC2 Service Limits Report displays all your service limit information in one place to help you avoid encountering limits on future EC2, EBS, Auto Scaling, and VPC usage.'
aws  ec2  vpc  ebs  autoscaling  limits  ops 
june 2014 by jm
Code Spaces data and backups deleted by hackers
Rather scary story of an extortionist wiping out a company's AWS-based infrastructure. Turns out S3 supports MFA-required deletion as a feature, though, which would help against that.
ops  security  extortion  aws  ec2  s3  code-spaces  delete  mfa  two-factor-authentication  authentication  infrastructure 
june 2014 by jm
Use of Formal Methods at Amazon Web Services
Chris Newcombe, Marc Brooker, et al. writing about their experience using formal specification and model-checking languages (TLA+) in production in AWS:

The success with DynamoDB gave us enough evidence to present TLA+ to the broader engineering community at Amazon. This raised a challenge; how to convey the purpose and benefits of formal methods to an audience of software engineers? Engineers think in terms of debugging rather than ‘verification’, so we called the presentation “Debugging Designs”.

Continuing that metaphor, we have found that software engineers more readily grasp the concept and practical value of TLA+ if we dub it 'Exhaustively-testable pseudo-code'.

We initially avoid the words ‘formal’, ‘verification’, and ‘proof’, due to the widespread view that formal methods are impractical. We also initially avoid mentioning what the acronym ‘TLA’ stands for, as doing so would give an incorrect impression of complexity.


More slides at http://tla2012.loria.fr/contributed/newcombe-slides.pdf ; proggit discussion at http://www.reddit.com/r/programming/comments/277fbh/use_of_formal_methods_at_amazon_web_services/
formal-methods  model-checking  tla  tla+  programming  distsys  distcomp  ebs  s3  dynamodb  aws  ec2  marc-brooker  chris-newcombe 
june 2014 by jm
AWS SDK for Java Client Configuration
turns out the AWS SDK has lots of tuning knobs: region selection, socket buffer sizes, and debug logging (including wire logging).
aws  sdk  java  logging  ec2  s3  dynamodb  sockets  tuning 
june 2014 by jm
moto
Mock Boto: 'a library that allows your python tests to easily mock out the boto library.' Supports S3, Autoscaling, EC2, DynamoDB, ELB, Route53, SES, SQS, and STS currently, and even supports a standalone server mode, to act as a mock service for non-Python clients. Excellent!

(via Conor McDermottroe)
python  aws  testing  mocks  mocking  system-tests  unit-tests  coding  ec2  s3 
may 2014 by jm
AWS Case Study: Hailo
Ubuntu, C*, HAProxy, MySQL, RDS, multiple AWS regions.
hailo  cassandra  ubuntu  mysql  rds  aws  ec2  haproxy  architecture 
april 2014 by jm
AWS Elastic Beanstalk for Docker
This is pretty amazing. nice work, Beanstalk team. not sure how well it integrates with the rest of AWS though
aws  amazon  docker  ec2  beanstalk  ops  containers  linux 
april 2014 by jm
Using AWS in the context of Australian Privacy Considerations
interesting new white paper from Amazon regarding recent strengthening of the Aussie privacy laws, particularly w.r.t. geographic location of data and access by overseas law enforcement agencies...
amazon  aws  security  law  privacy  data-protection  ec2  s3  nsa  gchq  five-eyes 
april 2014 by jm
s3funnel
'a command line tool for Amazon's Simple Storage Service (S3). Written in Python, easy_install the package to install as an egg. Supports multithreaded operations for large volumes. Put, get, or delete many items concurrently, using a fixed-size pool of threads. Built on workerpool for multithreading and boto for access to the Amazon S3 API. Unix-friendly input and output. Pipe things in, out, and all around.'

MIT-licensed open source. (via Paul Dolan)
via:pdolan  s3  s3funnel  tools  ops  aws  python  mit  open-source 
april 2014 by jm
Adrian Cockroft's Cloud Outage Reports Collection
The detailed summaries of outages from cloud vendors are comprehensive and the response to each highlights many lessons in how to build robust distributed systems. For outages that significantly affected Netflix, the Netflix techblog report gives insight into how to effectively build reliable services on top of AWS. [....] I plan to collect reports here over time, and welcome links to other write-ups of outages and how to survive them.
outages  post-mortems  documentation  ops  aws  ec2  amazon  google  dropbox  microsoft  azure  incident-response 
march 2014 by jm
S3QL
a file system that stores all its data online using storage services like Google Storage, Amazon S3, or OpenStack. S3QL effectively provides a hard disk of dynamic, infinite capacity that can be accessed from any computer with internet access running Linux, FreeBSD or OS-X.

S3QL is a standard conforming, full featured UNIX file system that is conceptually indistinguishable from any local file system. Furthermore, S3QL has additional features like compression, encryption, data de-duplication, immutable trees and snapshotting which make it especially suitable for online backup and archival.
s3  s3ql  backup  aws  filesystems  linux  freebsd  osx  ops 
march 2014 by jm
Video Processing at Dropbox
On-the-fly video transcoding during live streaming. They've done a great job of this!
At the beginning of the development of this feature, we entertained the idea to simply pre-transcode all the videos in Dropbox to all possible target devices. Soon enough we realized that this simple approach would be too expensive at our scale, so we decided to build a system that allows us to trigger a transcoding process only upon user request and cache the results for subsequent fetches. This on-demand approach: adapts to heterogeneous devices and network conditions, is relatively cheap (everything is relative at our scale), guarantees low latency startup time.
ffmpeg  dropbox  streaming  video  cdn  ec2  hls  http  mp4  nginx  haproxy  aws  h264 
february 2014 by jm
Netflix: Your Linux AMI: optimization and performance [slides]
a fantastic bunch of low-level kernel tweaks and tunables which Netflix have found useful in production to maximise productivity of their fleet. Interesting use of SCHED_BATCH process scheduler class for batch processes, in particular. Also, great docs on their experience with perf and SystemTap. Perf really looks like a tool I need to get to grips with...
netflix  aws  tuning  ami  perf  systemtap  tunables  sched_batch  batch  hadoop  optimization  performance 
december 2013 by jm
10 Things You Should Know About AWS
Some decent tips in here, mainly EC2-focussed
amazon  ec2  aws  ops  rds 
november 2013 by jm
Scryer: Netflix’s Predictive Auto Scaling Engine
Scryer is a new system that allows us to provision the right number of AWS instances needed to handle the traffic of our customers. But Scryer is different from Amazon Auto Scaling (AAS), which reacts to real-time metrics and adjusts instance counts accordingly. Rather, Scryer predicts what the needs will be prior to the time of need and provisions the instances based on those predictions.
scaling  infrastructure  aws  ec2  netflix  scryer  auto-scaling  aas  metrics  prediction  spikes 
november 2013 by jm
'Experience of software engineers using TLA+, PlusCal and TLC' [slides] [pdf]
by Chris Newcombe, an AWS principal engineer. Several Amazonians sharing their results in simulating tricky distributed-systems problems using formal methods
tla+  pluscal  tlc  formal-methods  simulation  proving  aws  amazon  architecture  design 
october 2013 by jm
« earlier      
per page:    204080120160

related tags

aas  acm  acm-queue  adrian-cockcroft  adrian-cockroft  advent  advice  alestic  alibaba  aliyun  allocation  ama  amazon  ami  analytics  anti-fraud  ap  aphyr  api  architecture  asg  asgard  auc  aurora  authentication  auto-scaling  autoscaling  availability  aws  aws-cli  awscli  azul  azure  b2b  b2c  backup  backups  batch  beanstalk  benchmarks  blue-green-deployments  boto  bugs  burst  cacerts  campaigns  cap  cassandra  cdn  cep  certs  chaos-monkey  chef  china  chris-newcombe  cli  clients  cloud  cloud-connect  cloudformation  cloudnative  cloudsearch  cloudsmith  cloudwatch  clustering  clusters  code-spaces  codedeploy  coding  comet  command-line  comparison  conferences  consistency  constant-load  containers  corporate-culture  corruption  costs  counters  cp  cpu  credstash  cross-region  crypto  culture  curl  data-protection  data-science  data-structures  databases  dataman  ddos  dedupe  delete  delivery  delta  demo  deploy  deployment  design  devops  dht  disks  distcomp  distsys  dns  docker  documentation  documents  dropbox  duplicity  duply  dynamodb  dynect  ebs  ec2  ecommerce  ecs  elasticache  elb  email  embedded  emr  eric-hammond  etl  eu-central-1  eureka  event-processing  event-streaming  events  eventual-consistency  examples  extortion  fail  failover  failures  fault-tolerance  ffmpeg  figures  filesystems  firewalls  five-eyes  fluentd  formal-methods  fraud  freebsd  games  gaming  gce  gchq  germany  gilt  github  google  gossip  graphics  grey-failures  h264  ha  hacks  hadoop  hailo  haproxy  hardware  hls  hosting  http  https  hvm  hystrix  iam  incident-response  infrastructure  instances  integration-testing  inter-region  internet  io  iops  iostat  ipc  ironfan  james-hamilton  java  javascript  jepsen  jmespath  jobs  json  kafka  kappa  key-length  key-management  key-rotation  keys  kinesis  kms  knife  lambda  languages  latency  law  legacy  libraries  limits  linux  load  load-balancing  load-testing  loaders  loading  logging  lucene  machine-learning  management  marc-brooker  measurement  memcached  memory  mesos  messaging  metrics  mfa  microservices  microsoft  mit  mitch-garnaat  ml  mocking  mocks  model-checking  mongodb  monitoring  mp4  multi-az  multi-region  mysql  netflix  network  network-partitions  networking  nginx  node  node.js  nosql  notifications  nsa  obama  object-model  onlive  oo  open-source  ops  optimization  osx  outages  overload  partitions  pbailis  pdf  perf  perfect-forward-secrecy  performance  pinterest  piops  pipeline  pluscal  post-mortems  prediction  presentations  pricing  privacy  programming  proving  provisioning  proxying  puppet  pv  pylons  python  qcon  qos  querying  queueing  quora  quorum  r3  rabbitmq  raid  rdbms  rds  read-after-writes  reddit  redis  redshift  reinvent  reliability  reliabilty  replication  resilience  ribbon  rightscale  round-trip  route53  rsa  ruby  s3  s3funnel  s3ql  scala  scalability  scale  scaling  scalr  scheduling  sched_batch  scryer  sdk  search  secrets  security  servers  service-discovery  service-registry  services  ses  sha1  sharding  sift-science  simulation  slas  slides  smartstack  smtp  smugmug  snapshots  snooping  sns  sockets  solr  spark  speculative-execution  spikes  spot-instances  sql  sqs  ssl  stack-hammer  stackdriver  stacks  startups  steam  storage  streaming  survey  sysadmin  system-tests  systemtap  tail  talks  tcp  tellybug  testing  tests  thrift  tips  tla  tla+  tlc  tls  tools  tornado  transactions  tunables  tuning  two-factor-authentication  ubuntu  ultradns  unit-tests  unix  uploads  us-east  versioning  via:brianscanlan  via:chorn  via:highscalability  via:marc-brooker  via:matt-sergeant  via:nelson  via:pdolan  video  virtualization  vpc  white-papers  windows  workarounds  xen  xpath  zing  zonify  zookeeper 

Copy this bookmark:



description:


tags: