jm + aws   201

Spotting a million dollars in your AWS account · Segment Blog
You can easily split your spend by AWS service per month and call it a day: ten thousand dollars to EC2, one thousand to S3, five hundred to network traffic, etc. But what’s still missing is a synthesis of which products and engineering teams are dominating your costs.

Then, add in the fact that you may have hundreds of instances and millions of containers that come and go. Soon, what started as a simple analysis problem has become unimaginably complex.

In this follow-up post, we’d like to share details on the toolkit we used. Our hope is to offer up a few ideas to help you analyze your AWS spend, whether you’re running only a handful of instances or tens of thousands.
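For a first cut at this kind of analysis, the detailed billing report plus cost-allocation tags gets you surprisingly far. A hedged pandas sketch (the CSV filename and column names such as 'ProductName', 'UnBlendedCost' and 'user:team' are assumptions that depend on your report type and tag keys):

```python
import pandas as pd

# hypothetical export of the AWS detailed billing report
df = pd.read_csv("detailed-billing.csv")

# the easy cut: spend per AWS service
print(df.groupby("ProductName")["UnBlendedCost"].sum()
        .sort_values(ascending=False))

# the harder question: spend per owning team, via a cost-allocation
# tag column (the column name depends on the tag key you allocate by)
print(df.groupby("user:team")["UnBlendedCost"].sum()
        .sort_values(ascending=False))
```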

segment  money  costs  billing  aws  ec2  ecs  ops 
6 days ago by jm
jantman/awslimitchecker

A script and python module to check your AWS service limits and usage, and warn when usage approaches limits.

Users building out scalable services in Amazon AWS often run into AWS' service limits - often at the least convenient time (i.e. mid-deploy or when autoscaling fails). Amazon's Trusted Advisor can help with this, but even the version that comes with Business and Enterprise support only monitors a small subset of AWS limits and only alerts weekly. awslimitchecker provides a command line script and reusable package that queries your current usage of AWS resources and compares it to limits (hard-coded AWS defaults that you can override, API-based limits where available, or data from Trusted Advisor where available), notifying you when you are approaching or at your limits.


(via This Week in AWS)
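A sketch of the Python API as shown in the project docs (treat the exact signatures as assumptions; thresholds are percentages of each limit):

```python
from awslimitchecker.checker import AwsLimitChecker

checker = AwsLimitChecker(warning_threshold=80, critical_threshold=99)
problems = checker.check_thresholds()  # {service: {limit_name: AwsLimit}}
for service, limits in problems.items():
    for name, limit in limits.items():
        print("%s / %s: %s" % (service, name, limit.get_current_usage_str()))
```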
aws  amazon  limits  scripts  ops 
9 days ago by jm
_Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases_
'Amazon Aurora is a relational database service for OLTP workloads offered as part of Amazon Web Services (AWS). In this paper, we describe the architecture of Aurora and the design considerations leading to that architecture. We believe the central constraint in high throughput data processing has moved from compute and storage to the network. Aurora brings a novel architecture to the relational database to address this constraint, most notably by pushing redo processing to a multi-tenant scale-out storage service, purpose-built for Aurora. We describe how doing so not only reduces network traffic, but also allows for fast crash recovery, failovers to replicas without loss of data, and fault-tolerant, self-healing storage. We then describe how Aurora achieves consensus on durable state across numerous storage nodes using an efficient asynchronous scheme, avoiding expensive and chatty recovery protocols. Finally, having operated Aurora as a production service for over 18 months, we share the lessons we have learnt from our customers on what modern cloud applications expect from databases.'
via:rbranson  aurora  aws  amazon  databases  storage  papers  architecture 
10 days ago by jm
Backdooring an AWS account
eek. Things to look out for on your AWS setup:
So you’ve pwned an AWS account — congratulations — now what? You’re eager to get to the data theft, amirite? Not so fast whipper snapper, have you disrupted logging? Do you know what you have? Sweet! Time to get settled in. Maintaining persistence in AWS is only limited by your imagination but there are few obvious and oft used techniques everyone should know and watch for.
aws  security  hacks  iam  sts 
16 days ago by jm
cristim/autospotting: Pay up to 10 times less on EC2 by automatically replacing on-demand AutoScaling group members with similar or larger identically configured spot instances.
A simple and easy to use tool designed to significantly lower your Amazon AWS costs by automating the use of the spot market.

Once enabled on an existing on-demand AutoScaling group, it launches an EC2 spot instance that is cheaper, at least as large and configured identically to your current on-demand instances. As soon as the new instance is ready, it is added to the group and an on-demand instance is detached from the group and terminated.

It continuously applies this process, gradually replacing any on-demand instances with spot instances until the group only consists of spot instances, but it can also be configured to keep some on-demand instances running.
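The attach/detach dance it automates boils down to a couple of AutoScaling API calls; a hedged boto3 sketch (group name and instance IDs are placeholders):

```python
import boto3

asg = boto3.client("autoscaling")
ec2 = boto3.client("ec2")

GROUP = "my-asg"             # hypothetical ASG name
SPOT_ID = "i-0aaaa1111"      # the already-running, identically configured spot instance
ONDEMAND_ID = "i-0bbbb2222"  # the on-demand instance being replaced

# attaching bumps the group's desired capacity by one...
asg.attach_instances(AutoScalingGroupName=GROUP, InstanceIds=[SPOT_ID])
# ...so decrement it again when detaching the on-demand instance
asg.detach_instances(
    AutoScalingGroupName=GROUP,
    InstanceIds=[ONDEMAND_ID],
    ShouldDecrementDesiredCapacity=True,
)
ec2.terminate_instances(InstanceIds=[ONDEMAND_ID])
```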
aws  golang  ec2  autoscaling  asg  spot-instances  ops 
23 days ago by jm
acksin/seespot: AWS Spot instance health check with termination and clean up support
When a Spot Instance is about to terminate there is a 2 minute window before the termination actually happens. SeeSpot is a utility for AWS Spot instances that handles the health check. If used with an AWS ELB it also handles cleanup of the instance when a Spot Termination notice is sent.
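SeeSpot itself is Go, but the core loop is just polling the documented spot termination-notice metadata endpoint; a minimal Python sketch (the ELB name is a placeholder):

```python
import time
import boto3
import requests

META = "http://169.254.169.254/latest/meta-data"
elb = boto3.client("elb")

while True:
    r = requests.get(META + "/spot/termination-time", timeout=1)
    if r.status_code == 200:
        # the 2-minute warning has been issued: drain before we're killed
        instance_id = requests.get(META + "/instance-id", timeout=1).text
        elb.deregister_instances_from_load_balancer(
            LoadBalancerName="my-elb",  # hypothetical
            Instances=[{"InstanceId": instance_id}],
        )
        break
    time.sleep(5)
```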
aws  elb  spot-instances  health-checks  golang  lifecycle  ops 
23 days ago by jm
AWS Greengrass
AWS Greengrass is software that lets you run local compute, messaging & data caching for connected devices in a secure way. With AWS Greengrass, connected devices can run AWS Lambda functions, keep device data in sync, and communicate with other devices securely – even when not connected to the Internet. Using AWS Lambda, Greengrass ensures your IoT devices can respond quickly to local events, operate with intermittent connections, and minimize the cost of transmitting IoT data to the cloud.

AWS Greengrass seamlessly extends AWS to devices so they can act locally on the data they generate, while still using the cloud for management, analytics, and durable storage. With Greengrass, you can use familiar languages and programming models to create and test your device software in the cloud, and then deploy it to your devices. AWS Greengrass can be programmed to filter device data and only transmit necessary information back to the cloud. AWS Greengrass authenticates and encrypts device data at all points of connection using AWS IoT’s security and access management capabilities. This way data is never exchanged between devices when they communicate with each other and the cloud without proven identity.
aws  cloud  iot  lambda  devices  offline  synchronization  architecture 
4 weeks ago by jm
Amazon DynamoDB Accelerator (DAX)
Amazon DynamoDB Accelerator (DAX) is a fully managed, highly available, in-memory cache for DynamoDB that delivers up to a 10x performance improvement – from milliseconds to microseconds – even at millions of requests per second. DAX does all the heavy lifting required to add in-memory acceleration to your DynamoDB tables, without requiring developers to manage cache invalidation, data population, or cluster management.


No latency percentile figures, unfortunately. Also still in preview.
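Since DAX speaks the DynamoDB API, reads go through the same calls; a hedged sketch using the amazondax Python client (constructor details and the cluster endpoint are assumptions from the docs):

```python
import botocore.session
from amazondax import AmazonDaxClient

session = botocore.session.get_session()
dax = AmazonDaxClient(
    session,
    region_name="us-east-1",
    endpoints=["mydax-cluster.example.com:8111"],  # placeholder endpoint
)
resp = dax.get_item(
    TableName="my-table",            # hypothetical
    Key={"id": {"S": "user-123"}},
)
print(resp.get("Item"))
```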
amazon  dynamodb  aws  dax  performance  storage  databases  latency  low-latency 
5 weeks ago by jm
Ubuntu on AWS gets serious performance boost with AWS-tuned kernel
interesting -- faster boots, CPU throttling resolved on t2.micros, other nice stuff
aws  ubuntu  ec2  kernel  linux  ops 
6 weeks ago by jm
Lessons Learned in Lambda
In case you were thinking Lambda was potentially usable yet
lambda  aws  shitshow  architecture  serverless 
6 weeks ago by jm
Deep Dive on Amazon EBS Elastic Volumes
'March 2017 AWS Online Tech Talks' -- lots about the new volume types
aws  ebs  storage  architecture  ops  slides 
7 weeks ago by jm
atlassian/localstack: A fully functional local AWS cloud stack. Develop and test your cloud apps offline!
LocalStack provides an easy-to-use test/mocking framework for developing Cloud applications. Currently, the focus is primarily on supporting the AWS cloud stack.

LocalStack spins up the following core Cloud APIs on your local machine:

API Gateway at http://localhost:4567;
Kinesis at http://localhost:4568;
DynamoDB at http://localhost:4569;
DynamoDB Streams at http://localhost:4570;
Elasticsearch at http://localhost:4571;
S3 at http://localhost:4572;
Firehose at http://localhost:4573;
Lambda at http://localhost:4574;
SNS at http://localhost:4575;
SQS at http://localhost:4576

Additionally, LocalStack provides a powerful set of tools to interact with the cloud services, including a fully featured KCL Kinesis client with Python binding, simple setup/teardown integration for nosetests, as well as an Environment abstraction that allows to easily switch between local and remote Cloud execution.
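Using it from boto3 is just a matter of overriding the endpoint; for example, against the S3 port listed above (credentials are dummies, since LocalStack doesn't verify them):

```python
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:4572",
    aws_access_key_id="test",
    aws_secret_access_key="test",
    region_name="us-east-1",
)
s3.create_bucket(Bucket="local-test-bucket")
print(s3.list_buckets()["Buckets"])
```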
aws  emulation  mocking  services  testing  dynamodb  s3 
9 weeks ago by jm
Segment.com on cost savings using DynamoDB, autoscaling and ECS
great post.

1. DynamoDB hot shards were a big problem -- and it is terrible that diagnosing this requires a ticket to AWS support! This heat map should be a built-in feature.

2. ECS auto-scaling gets a solid thumbs-up.

3. Switching from ELB to ALB lets them set ports dynamically for individual ECS Docker containers, and then pack as many containers as will fit on a giant EC2 instance.

4. Terraform modules to automate setup and maintenance of ECS, autoscaling groups, and ALBs.
terraform  segment  architecture  aws  dynamodb  alb  elb  asg  ecs  docker 
9 weeks ago by jm
The Occasional Chaos of AWS Lambda Runtime Performance
If our code has modest resource requirements, and can tolerate large changes in performance, then it makes sense to start with the least amount of memory necessary. On the other hand, if consistency is important, the best way to achieve that is by cranking the memory setting all the way up to 1536MB.
It’s also worth noting here that CPU-bound Lambdas may be cheaper to run over time with a higher memory setting, as Jim Conning describes in his article, “AWS Lambda: Faster is Cheaper”. In our tests, we haven’t seen conclusive evidence of that behavior, but much more data is required to draw any strong conclusions.
The other lesson learned is that Lambda benchmarks should be gathered over the course of days, not hours or minutes, in order to provide actionable information. Otherwise, it’s possible to see very impressive performance from a Lambda that might later dramatically change for the worse, and any decisions made based on that information will be rendered useless.
aws  lambda  amazon  performance  architecture  ops  benchmarks 
11 weeks ago by jm
S3 2017-02-28 outage post-mortem
The Amazon Simple Storage Service (S3) team was debugging an issue causing the S3 billing system to progress more slowly than expected. At 9:37AM PST, an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process. Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended. The servers that were inadvertently removed supported two other S3 subsystems.  One of these subsystems, the index subsystem, manages the metadata and location information of all S3 objects in the region. This subsystem is necessary to serve all GET, LIST, PUT, and DELETE requests. The second subsystem, the placement subsystem, manages allocation of new storage and requires the index subsystem to be functioning properly to correctly operate. The placement subsystem is used during PUT requests to allocate storage for new objects. Removing a significant portion of the capacity caused each of these systems to require a full restart. While these subsystems were being restarted, S3 was unable to service requests. Other AWS services in the US-EAST-1 Region that rely on S3 for storage, including the S3 console, Amazon Elastic Compute Cloud (EC2) new instance launches, Amazon Elastic Block Store (EBS) volumes (when data was needed from a S3 snapshot), and AWS Lambda were also impacted while the S3 APIs were unavailable.  
s3  postmortem  aws  post-mortem  outages  cms  ops 
11 weeks ago by jm
Manage DynamoDB Items Using Time to Live (TTL)
good call.
Many DynamoDB users store data that has a limited useful life or is accessed less frequently over time. Some of them track recent logins, trial subscriptions, or application metrics. Others store data that is subject to regulatory or contractual limitations on how long it can be stored. Until now, these customers implemented their own time-based data management. At scale, this sometimes meant that they ran a couple of Amazon Elastic Compute Cloud (EC2) instances that did nothing more than scan DynamoDB items, check date attributes, and issue delete requests for items that were no longer needed. This added cost and complexity to their application. In order to streamline this popular and important use case, we are launching a new Time to Live (TTL) feature today. You can enable this feature on a table-by-table basis, specifying an item attribute that contains the expiration time for the item.
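Enabling it and writing an expiring item is a two-call job in boto3; a minimal sketch with hypothetical table and attribute names (the TTL attribute must hold an epoch-seconds timestamp):

```python
import time
import boto3

ddb = boto3.client("dynamodb")

ddb.update_time_to_live(
    TableName="sessions",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)

ddb.put_item(
    TableName="sessions",
    Item={
        "session_id": {"S": "abc123"},
        "expires_at": {"N": str(int(time.time()) + 86400)},  # expire in 24h
    },
)
```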
dynamodb  ttl  storage  aws  architecture  expiry 
12 weeks ago by jm
Fault Domains and the Vegas Rule | Expedia Engineering Blog
I like this concept -- analogous to AWS' AZs -- limit blast radius of an outage by explicitly defining dependency scopes
aws  az  fault-domains  vegas-rule  blast-radius  outages  reliability  architecture 
february 2017 by jm
Instapaper Outage Cause & Recovery
Hard to see this as anything other than a pretty awful documentation fail by the AWS RDS service:
Without knowledge of the pre-April 2014 file size limit, it was difficult to foresee and prevent this issue. As far as we can tell, there’s no information in the RDS console in the form of monitoring, alerts or logging that would have let us know we were approaching the 2TB file size limit, or that we were subject to it in the first place. Even now, there’s nothing to indicate that our hosted database has a critical issue.
limits  aws  rds  databases  mysql  filesystems  ops  instapaper  risks 
february 2017 by jm
Amazon EC2 Container Service Plugin - Jenkins - Jenkins Wiki
neat, relatively new plugin to use ECS as a autoscaling node fleet in Jenkins
ec2  ecs  aws  jenkins  docker  plugins 
february 2017 by jm
How and why the leap second affected Cloudflare DNS
The root cause of the bug that affected our DNS service was the belief that time cannot go backwards. In our case, some code assumed that the difference between two times would always be, at worst, zero. RRDNS is written in Go and uses Go’s time.Now() function to get the time. Unfortunately, this function does not guarantee monotonicity. Go currently doesn’t offer a monotonic time source.


So the clock went "backwards", s1 - s2 returned < 0, and the code couldn’t handle it (because it’s a little-known and infrequent failure case).

Part of the root cause here is cultural -- Google has solved the leap-second problem internally through leap smearing, and Go seems to be fundamentally a Google product at heart.

The easiest fix in general in the "outside world" is to use "ntpd -x" to do a form of smearing. It looks like AWS are leap smearing internally (https://aws.amazon.com/blogs/aws/look-before-you-leap-the-coming-leap-second-and-aws/), but it is a shame they aren't making this a standard part of services running on top of AWS and a feature of the AWS NTP fleet.
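The general application-level fix, in any language that exposes one, is to take durations from a monotonic clock instead of subtracting wall-clock readings; a minimal Python illustration:

```python
import time

start = time.monotonic()
time.sleep(0.1)                      # stand-in for the work being timed
elapsed = time.monotonic() - start   # always >= 0; time.time() deltas can go negative
print(elapsed)
```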
ntp  time  leap-seconds  fail  cloudflare  rrdns  go  golang  dns  leap-smearing  ntpd  aws 
january 2017 by jm
AWS re:Invent 2016: Lessons Learned from a Year of Using Spot Fleet (CMP205) - YouTube
Yelp talk about their Spot Fleet price optimization autoscaler app, FleetMiser
yelp  scaling  aws  spot-fleet  ops  spot-instances  money 
december 2016 by jm
Auto scaling Pinterest
notes on a second-system take on autoscaling -- Pinterest tried it once, it didn't take, and this is the rerun. I like the tandem ASG approach (spots and nonspots)
spot-instances  scaling  aws  scalability  ops  architecture  pinterest  via:highscalability 
december 2016 by jm
AWS Step Functions – Build Distributed Applications Using Visual Workflows
Looks like one of the few new AWS features that's actually usable and useful -- replacement for SWF
swf  lambda  workflows  aws  architecture 
december 2016 by jm
Simple Queue Service FIFO Queues with Exactly-Once Processing & Deduplication
great, I've looked for this so many times. Only tricky limit I can spot is the 300 tps limit, and it's US-East/US-West only for now
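A minimal boto3 sketch (queue and group names are hypothetical; FIFO queue names must end in ".fifo"):

```python
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")  # US-East/US-West only for now

q = sqs.create_queue(
    QueueName="orders.fifo",
    Attributes={"FifoQueue": "true", "ContentBasedDeduplication": "true"},
)

sqs.send_message(
    QueueUrl=q["QueueUrl"],
    MessageBody='{"order_id": 42}',
    MessageGroupId="orders",  # ordering is guaranteed per message group
    # with content-based dedupe enabled, no MessageDeduplicationId is needed
)
```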
fifo  queues  queueing  architecture  aws  sqs 
november 2016 by jm
Julia Evans reverse engineers Skyliner.io
simple usage of Docker, blue/green deploys, and AWS ALBs
docker  alb  aws  ec2  blue-green-deploys  deployment  ops  tools  skyliner  via:jgilbert 
november 2016 by jm
Skyliner
Coda Hale's new gig on how they're using Docker, AWS, etc. I like this: "Use containers. Not too much. Mostly for packaging."
docker  aws  packaging  ops  devops  containers  skyliner 
september 2016 by jm
AWS Case Study: mytaxi
ECS, Docker, ELB, SQS, SNS, RDS, VPC, and spot instances. Pretty canonical setup these days...
The mytaxi app is also now able to predict daily and weekly spikes. In addition, it has gained the elasticity required to meet demand during special events. Herzberg describes a typical situation on New Year's Eve: “Shortly before midnight everyone needs a taxi to get to parties, and after midnight people want to go home. In past years we couldn't keep up with the demand this generated, which was around three and a half times as high as normal. In November 2015 we moved our Docker container architecture to Amazon ECS, and for the first time ever in December we were able to celebrate a new year in which our system could handle the huge number of requests without any crashes or interruptions—an accomplishment that we were extremely proud of. We had faced the biggest night on the calendar without any downtime.”
mytaxi  aws  ecs  docker  elb  sqs  sns  rds  vpc  spot-instances  ops  architecture 
august 2016 by jm
Cross-Region Read Replicas for Amazon Aurora
Creating a read replica in another region also creates an Aurora cluster in the region. This cluster can contain up to 15 more read replicas, with very low replication lag (typically less than 20 ms) within the region (between regions, latency will vary based on the distance between the source and target). You can use this model to duplicate your cluster and read replica setup across regions for disaster recovery. In the event of a regional disruption, you can promote the cross-region replica to be the master. This will allow you to minimize downtime for your cross-region application. This feature applies to unencrypted Aurora clusters.
aws  mysql  databases  storage  replication  cross-region  failover  reliability  aurora 
june 2016 by jm
Green/Blue Deployments with AWS Lambda and CloudFormation - done right
Basically, use a Lambda to put all instances from an ASG into the ELB, then remove the old ASG
asg  elb  aws  lambda  deployment  ops  blue-green-deploys 
may 2016 by jm
3 Reasons AWS Lambda Is Not Ready for Prime Time
This totally matches my own preconceptions ;)
When we at Datawire tried to actually use Lambda for a real-world HTTP-based microservice [...], we found some uncool things that make Lambda not yet ready for the world we live in:

Lambda is a building block, not a tool;
Lambda is not well documented;
Lambda is terrible at error handling

Lung skips these uncool things, which makes sense because they’d make the tutorial collapse under its own weight, but you can’t skip them if you want to work in the real world. (Note that if you’re using Lambda for event handling within the AWS world, your life will be easier. But the really interesting case in the microservice world is Lambda and HTTP.)
aws  lambda  microservices  datawire  http  api-gateway  apis  https  python  ops 
may 2016 by jm
Key Metrics for Amazon Aurora | AWS Partner Network (APN) Blog
Very DataDog-oriented, but some decent tips on monitorable metrics here
datadog  metrics  aurora  aws  rds  monitoring  ops 
may 2016 by jm
Amazon S3 Transfer Acceleration
The AWS edge network has points of presence in more than 50 locations. Today, it is used to distribute content via Amazon CloudFront and to provide rapid responses to DNS queries made to Amazon Route 53. With today’s announcement, the edge network also helps to accelerate data transfers in to and out of Amazon S3. It will be of particular benefit to you if you are transferring data across or between continents, have a fast Internet connection, use large objects, or have a lot of content to upload.

You can think of the edge network as a bridge between your upload point (your desktop or your on-premises data center) and the target bucket. After you enable this feature for a bucket (by checking a checkbox in the AWS Management Console), you simply change the bucket’s endpoint to the form BUCKET_NAME.s3-accelerate.amazonaws.com. No other configuration changes are necessary! After you do this, your TCP connections will be routed to the best AWS edge location based on latency.  Transfer Acceleration will then send your uploads back to S3 over the AWS-managed backbone network using optimized network protocols, persistent connections from edge to origin, fully-open send and receive windows, and so forth.
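A minimal boto3 sketch of both steps, with a hypothetical bucket name:

```python
import boto3
from botocore.config import Config

s3 = boto3.client("s3")
s3.put_bucket_accelerate_configuration(
    Bucket="my-bucket",
    AccelerateConfiguration={"Status": "Enabled"},
)

# route transfers through the s3-accelerate endpoint
fast_s3 = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
fast_s3.upload_file("big-file.bin", "my-bucket", "big-file.bin")
```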
aws  s3  networking  infrastructure  ops  internet  cdn 
april 2016 by jm
AWSume
'AWS Assume Made Awesome' -- 'Here are Trek10, we work with many clients, and thus work with multiple AWS accounts on a regular (daily) basis. We needed a way to make managing all our different accounts easier. We create a standard Trek10 administrator role in our clients’ accounts that we can assume. For security we require that the role assumer have multifactor authentication enabled.'
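The underlying flow is plain STS assume-role with an MFA token; a hedged boto3 sketch (ARNs and the token code are placeholders):

```python
import boto3

sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/ClientAdministrator",  # hypothetical
    RoleSessionName="awsume-session",
    SerialNumber="arn:aws:iam::999999999999:mfa/your-user",        # your MFA device
    TokenCode="123456",                                            # current MFA code
)["Credentials"]

ec2 = boto3.client(
    "ec2",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
```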
mfa  aws  awsume  credentials  accounts  ops 
april 2016 by jm
Running Docker on AWS from the ground up
The advantages/disadvantages section right at the bottom is good.
ECS, believe it or not, is one of the simplest Schedulers out there. Most of the other alternatives I’ve tried offer all sorts of fancy bells & whistles, but they are either significantly more complicated to understand (lots of new concepts), take too much effort to set up (lots of new technologies to install and run), are too magical (and therefore impossible to debug), or some combination of all three. That said, ECS also leaves a lot to be desired.
aws  docker  ecs  ec2  schedulers 
april 2016 by jm
s3git
git for Cloud Storage. Create distributed, decentralized and versioned repositories that scale infinitely to 100s of millions of files and PBs of storage. Huge repos can be cloned on your local SSD for making changes, committing and pushing back. Oh yeah, and it dedupes too due to BLAKE2 Tree hashing. http://s3git.org
git  ops  storage  cloud  s3  disk  aws  version-control  blake2 
april 2016 by jm
Charity Majors - AWS networking, VPC, environments and you
'VPC is the future and it is awesome, and unless you have some VERY SPECIFIC AND CONVINCING reasons to do otherwise, you should be spinning up a VPC per environment with orchestration and prob doing it from CI on every code commit, almost like it’s just like, you know, code.'
networking  ops  vpc  aws  environments  stacks  terraform 
march 2016 by jm
Yeobot - CloudNative
'The shared SQL command line for AWS'. it's #chatopsy!
chatops  yeobot  bots  cloudnative  ec2  aws  slack 
march 2016 by jm
AWS Certificate Manager – Deploy SSL/TLS-Based Apps on AWS
Very nifty -- autodeploys free wildcard certs to ELBs and Cloudfront. HN discussion thread is pretty good: https://news.ycombinator.com/item?id=10947186
ssl  tls  certificates  ops  aws  cloudfront  elb 
january 2016 by jm
VPC NAT gateways : transactional uniqueness at scale
colmmacc introducing the VPC NAT gateway product from AWS, in a guest post on James Hamilton's blog no less!:
you can think of it as a new “even bigger” [NAT] box, but under the hood NAT gateways are different. The connections are managed by a fault-tolerant co-operation of devices in the VPC network fabric. Each new connection is assigned a port in a robust and transactional way, while also being replicated across an extensible set of multiple devices. In other words: the NAT gateway is internally horizontally scalable and resilient.
amazon  ec2  nat  networking  aws  colmmacc 
january 2016 by jm
2016 Wish List for AWS?
good thread on AWS' shortcomings -- so many services still don't handle VPC, for instance
vpc  aws  ec2  ops  wishlist 
december 2015 by jm
Amazon EC2 Container Registry
hooray, Docker registry here at last
ecs  docker  registry  ops  containers  aws 
december 2015 by jm
How to host Hugo static website generator on AWS Lambda
seriously, AWS: editing JSON files in a browser text box is an awful, awful user experience
aws  lambda  hosting  hugo  blogs  website-generators 
december 2015 by jm
Why We Chose Kubernetes Over ECS
3 months ago when we, at nanit.com, came to evaluate which Docker orchestration framework to use, we gave ECS the first priority. We were already familiar with AWS services, and since we already had our whole infrastructure there, it was the default choice. After testing the service for a while we had the feeling it was not mature enough and missing some key features we needed (more on that later), so we went to test another orchestration framework: Kubernetes. We were glad to discover that Kubernetes is far more comprehensive and had almost all the features we required. For us, Kubernetes won ECS on ECS’s home court, which is AWS.
kubernetes  ecs  docker  containers  aws  ec2  ops 
december 2015 by jm
AWS Api Gateway for Fun and Profit
good worked-through example of an API Gateway rewriting system
api-gateway  aws  api  http  services  ops  alerting  alarming  opsgenie  signalfx 
december 2015 by jm
A Gulp Workflow for Amazon Lambda
'any nontrivial development of Lambda functions will require a simple, automated build/deploy process that also fills a couple of Lambda’s gaps such as the use of node modules and environment variables.'

See also https://medium.com/@AdamRNeary/developing-and-testing-amazon-lambda-functions-e590fac85df4#.mz0a4qk3j : 'I am psyched about Amazon’s new Lambda service for asynchronous task processing, but the ideal development and testing cycle is really left to the engineer. While Amazon provides a web-based console, I prefer an approach that uses Mocha. Below you will find the gritty details using Kinesis events as a sample input.'
lambda  aws  services  testing  deployment  ops  mocha  gulp  javascript 
december 2015 by jm
Global Continuous Delivery with Spinnaker
Netflix's CD platform, post-Atlas. Looks interesting
continuous-delivery  aws  netflix  cd  devops  ops  atlas  spinnaker 
november 2015 by jm
Dynalite
Awesome new mock DynamoDB implementation:
An implementation of Amazon's DynamoDB, focussed on correctness and performance, and built on LevelDB (well, @rvagg's awesome LevelUP to be precise). This project aims to match the live DynamoDB instances as closely as possible (and is tested against them in various regions), including all limits and error messages.

Why not Amazon's DynamoDB Local? Because it's too buggy! And it differs too much from the live instances in a number of key areas.


We use DynamoDBLocal in our tests -- the availability of that tool is one of the key reasons we have adopted Dynamo so heavily, since we can safely test our code properly with it. This looks even better.
dynamodb  testing  unit-tests  integration-testing  tests  ops  dynalite  aws  leveldb 
november 2015 by jm
Valid MFA token does not work during first 1am hour before daylight savings ends and second 1am hour starts · Issue #1611 · aws/aws-cli
Add another one to the "yay for DST" pile. (also yay for AWS using PST/PDT as default internal timezone instead of UTC...)
utc  timezones  fail  bugs  aws  aws-cli  dst  daylight-savings  time 
november 2015 by jm
It's an Emulator, Not a Petting Zoo: Emu and Lambda
a Lambda emulator in Python, suitable for unit testing lambdas
lambda  aws  coding  unit-tests  dev 
october 2015 by jm
Amazon ECS CLI Tutorial - Amazon EC2 Container Service
super-basic ECS tutorial, using a docker-compose.yml to create a new ECS-managed service fleet
ecs  cli  linux  aws  ec2  hosting  docker  tutorials 
october 2015 by jm
Hologram
Hologram exposes an imitation of the EC2 instance metadata service on developer workstations that supports the [IAM Roles] temporary credentials workflow. It is accessible via the same HTTP endpoint to calling SDKs, so your code can use the same process in both development and production. The keys that Hologram provisions are temporary, so EC2 access can be centrally controlled without direct administrative access to developer workstations.
iam  roles  ec2  authorization  aws  adroll  open-source  cli  osx  coding  dev 
october 2015 by jm
(ARC308) The Serverless Company: Using AWS Lambda
Describing PlayOn! Sports' Lambda setup. Sounds pretty productionizable
ops  lambda  aws  reinvent  slides  architecture 
october 2015 by jm
AWS re:Invent 2015 Video & Slide Presentation Links with Easy Index
Andrew Spyker's roundup:
my quick index of all re:Invent sessions.  Please wait for a few days and I'll keep running the tool to fill in the index.  It usually takes Amazon a few weeks to fully upload all the videos and slideshares.


Pretty definitive, full text descriptions of all sessions (and there are an awful lot of 'em).
aws  reinvent  andrew-spyker  scraping  slides  presentations  ec2  video 
october 2015 by jm
AWS re:Invent 2015 | (CMP406) Amazon ECS at Coursera - YouTube
Coursera are running user-submitted code in ECS! interesting stuff about how they use Docker security/resource-limiting features, forking the ecs-agent code, to run user-submitted code. :O
coursera  user-submitted-code  sandboxing  docker  security  ecs  aws  resource-limits  ops 
october 2015 by jm
The Totally Managed Analytics Pipeline: Segment, Lambda, and Dynamo
notable mainly for the details of Terraform support for Lambda: that's a significant improvement to Lambda's production-readiness
aws  pipelines  data  streaming  lambda  dynamodb  analytics  terraform  ops 
october 2015 by jm
Rebuilding Our Infrastructure with Docker, ECS, and Terraform
Good writeup of current best practices for a production AWS architecture
aws  ops  docker  ecs  ec2  prod  terraform  segment  via:marc 
october 2015 by jm
ECJ ruling on Irish privacy case has huge significance
The only current way to comply with EU law, the judgment indicates, is to keep EU data within the EU. Whether those data can be safely managed within facilities run by US companies will not be determined until the US rules on an ongoing Microsoft case.
Microsoft stands in contempt of court right now for refusing to hand over to US authorities emails held in its Irish data centre. This case will surely go to the Supreme Court and will be an extremely important determination for the cloud business, and any company or individual using data centre storage. If Microsoft loses, US multinationals will be left scrambling to somehow legally firewall off their EU-based data centres from US government reach.


(cough, Amazon)
aws  hosting  eu  privacy  surveillance  gchq  nsa  microsoft  ireland 
october 2015 by jm
EC2 Spot Blocks for Defined-Duration Workloads
you can now launch Spot instances that will run continuously for a finite duration (1 to 6 hours). Pricing is based on the requested duration and the available capacity, and is typically 30% to 45% less than On-Demand.
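A hedged boto3 sketch of a defined-duration request (AMI, bid and instance type are placeholders; the duration must be a multiple of 60 minutes, up to 360):

```python
import boto3

ec2 = boto3.client("ec2")
ec2.request_spot_instances(
    SpotPrice="0.10",           # hypothetical bid
    InstanceCount=1,
    BlockDurationMinutes=120,   # run for a fixed 2 hours
    LaunchSpecification={
        "ImageId": "ami-12345678",   # hypothetical
        "InstanceType": "c4.large",
    },
)
```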
ec2  aws  spot-instances  spot  pricing  time 
october 2015 by jm
Chaos Engineering Upgraded
some details on Netflix's Chaos Monkey, Chaos Kong and other aspects of their availability/failover testing
architecture  aws  netflix  ops  chaos-monkey  chaos-kong  testing  availability  failover  ha 
september 2015 by jm
Summary of the Amazon DynamoDB Service Disruption and Related Impacts in the US-East Region
Painful to read, but: tl;dr: monitoring oversight, followed by a transient network glitch triggering IPC timeouts, which increased load due to lack of circuit breakers, creating a cascading failure
aws  postmortem  outages  dynamodb  ec2  post-mortems  circuit-breakers  monitoring 
september 2015 by jm
How We Use AWS Lambda for Rapidly Intensifying Workloads · CloudSploit
impressive -- pretty much the entire workload is run from Lambda here
lambda  aws  ec2  autoscaling  cloudsploit 
september 2015 by jm
Evolution of Babbel’s data pipeline on AWS: from SQS to Kinesis
Good "here's how we found it" blog post:

Our new data pipeline with Kinesis in place allows us to plug in new consumers without causing any damage to the current system, so it’s possible to rewrite all Queue Workers one by one and replace them with Kinesis Workers. In general, the transition to Kinesis was smooth and there were no especially tricky parts.
Another outcome was significantly reduced costs – handling almost the same amount of data as SQS, Kinesis appeared to be many times cheaper than SQS.
aws  kinesis  kafka  streaming  data-pipelines  streams  sqs  queues  architecture  kcl 
september 2015 by jm
Spot Bid Advisor
analyzes Spot price history to help you determine a bid price that suits your needs.
ec2  aws  spot  spot-instances  history 
september 2015 by jm
S3QL
a file system that stores all its data online using storage services like Google Storage, Amazon S3, or OpenStack. S3QL effectively provides a hard disk of dynamic, infinite capacity that can be accessed from any computer with internet access running Linux, FreeBSD or OS-X.
S3QL is a standard conforming, full featured UNIX file system that is conceptually indistinguishable from any local file system. Furthermore, S3QL has additional features like compression, encryption, data de-duplication, immutable trees and snapshotting which make it especially suitable for online backup and archival.
S3QL is designed to favor simplicity and elegance over performance and feature-creep. Care has been taken to make the source code as readable and serviceable as possible. Solid error detection and error handling have been included from the very first line, and S3QL comes with extensive automated test cases for all its components.
filesystems  aws  s3  storage  unix  google-storage  openstack 
september 2015 by jm
Amazon EC2 2015 Benchmark: Testing Speeds Between AWS EC2 and S3 Regions
Here we are again, a year later, and still no bloody percentiles! Just amateurish averaging. This is not how you measure anything, ffs. Still, better than nothing I suppose
fail  latency  measurement  aws  ec2  percentiles  s3 
august 2015 by jm
Implementing Efficient and Reliable Producers with the Amazon Kinesis Producer Library - AWS Big Data Blog
Good advice on production-quality, decent-scale usage of Kinesis in Java with the official library: batching, retries, partial failures, backoff, and monitoring. (Also, jaysus, the AWS Cloudwatch API is awful, looking at this!)
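The KPL itself is a Java/C++ library, but the batching-with-partial-retry pattern the post describes looks roughly like this in boto3 (the stream name is hypothetical):

```python
import time
import boto3

kinesis = boto3.client("kinesis")

def put_batch(records, stream="events", max_attempts=5):
    """records: list of {'Data': bytes, 'PartitionKey': str}"""
    for attempt in range(max_attempts):
        resp = kinesis.put_records(StreamName=stream, Records=records)
        if resp["FailedRecordCount"] == 0:
            return
        # retry only the records that failed, with exponential backoff
        records = [rec for rec, res in zip(records, resp["Records"])
                   if "ErrorCode" in res]
        time.sleep(0.1 * 2 ** attempt)
    raise RuntimeError("%d records still failing" % len(records))
```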
kpl  aws  kinesis  tips  java  batching  streaming  production  cloudwatch  monitoring  coding 
august 2015 by jm
Amazon S3 Introduces New Usability Enhancements
bucket limit increase, and read-after-write consistency in US Standard. About time too! ;)
aws  s3  storage  consistency 
august 2015 by jm
danilop/runjop · GitHub
RunJOP (Run Just Once Please) is a distributed execution framework to run a command (i.e. a job) only once in a group of servers [built using AWS DynamoDB and S3].


nifty! Distributed cron is pretty easy when you've got Dynamo doing the heavy lifting.
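The trick is a conditional write: whichever server's put wins gets to run the job. A minimal boto3 sketch (table and attribute names are hypothetical):

```python
import boto3
from botocore.exceptions import ClientError

ddb = boto3.client("dynamodb")

def run_once(job_id, fn):
    try:
        ddb.put_item(
            TableName="jobs",
            Item={"job_id": {"S": job_id}},
            ConditionExpression="attribute_not_exists(job_id)",
        )
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return  # another server already claimed this job
        raise
    fn()  # we won the race: run the job
```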
dynamodb  cron  distributed-cron  scheduling  runjop  danilop  hacks  aws  ops 
july 2015 by jm