dtjm + scalability   9

How HipChat Stores and Indexes Billions of Messages Using ElasticSearch and Redis - High Scalability -
Amazon’s flakiness makes you develop better. Don’t put all your eggs in one basket, if a node goes down you have deal with it or traffic will be lost for some users.
elasticsearch  redis  scalability  search  distributedsystems 
december 2014 by dtjm
Continuous Deployment at Quora - Engineering at Quora - Quora
Between 12:00 AM and 11:59 PM on April 25, 2013, Quora released new versions of the site 46 times. This was a normal day for us. We run a very fast continuous-deployment cycle, where changes are pushed live as soon as they are committed.
scalability  devops 
may 2013 by dtjm
High Scalability - High Scalability - Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years
Lessons Learned

It will fail. Keep it simple.

Keep it fun. There’s a lot of new people joining the team. If you just give them a huge complex system it won’t be fun. Keeping the architecture simple has been a big win. New engineers have been contributing code from week one.

When you push something to the limit all these technologies fail in their own special way.

Architecture is doing the right thing when growth can be handled by adding more of the same stuff. You want to be able to scale by throwing money at the problem by throwing more boxes at the problem as you need them. If your architecture can do that, then you’re golden.

Cluster Management Algorithm is a SPOF. If there’s a bug it impacts every node. This took them down 4 times.

To handle rapid growth you need to spread data out evenly to handle the ever increasing load.

The least data you move across your nodes the more stable your architecture. This is why they went with sharding over clusters.

A service oriented architecture rules. It isolates functionality, helps reduce connections, organize teams, organize support, and improves security.

Asked yourself what your really want to be. Drop technologies that match that vision, even if you have to rearchitecture everything.

Don’t freak out about losing a little data. They keep user data in memory and write it out periodically. A loss means only a few hours of data are lost, but the resulting system is much simpler and more robust, and that’s what matters.

When you push something to the limit all technologies fail in their own special way. This lead them to evaluate tool choices with a preference for tools that are: mature; really good and simple; well known and liked; well supported; consistently good performers; failure free as possible; free. Using these criteria they selected: MySQL, Solr, Memcache, and Redis. Cassandra and Mongo were dropped.
architecture  scalability  distributedsystems 
april 2013 by dtjm

Copy this bookmark: