jm + memcached   16

Evolution of Application Data Caching: From RAM to SSD
Memcached provides an external storage shim called extstore, which supports storing data on SSD (I2) and NVMe (I3). extstore is efficient in terms of cost and storage-device utilization without compromising speed or throughput. All the metadata (the key and other item metadata) is stored in RAM, whereas the actual data is stored on flash.
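
The split is simple to sketch. A toy Python illustration (made-up names, not extstore's actual code, which is C inside memcached) of the design described above: the key index lives in an in-RAM hash table, while values go to an append-only file on flash and are fetched back by offset.

  import os

  # Toy sketch of the extstore idea: metadata in RAM, values on flash.
  class FlashBackedCache:
      def __init__(self, path):
          self.index = {}                  # RAM: key -> (offset, length)
          self.log = open(path, "ab+")     # flash: append-only value log

      def set(self, key, value):
          self.log.seek(0, os.SEEK_END)
          offset = self.log.tell()
          self.log.write(value)
          self.index[key] = (offset, len(value))   # only metadata stays in RAM

      def get(self, key):
          entry = self.index.get(key)
          if entry is None:
              return None                  # miss: no metadata in RAM
          offset, length = entry
          self.log.seek(offset)
          return self.log.read(length)     # one flash read per hit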
memcached  netflix  services  storage  memory  ssd  nvme  extstore  caching 
july 2018 by jm
MemC3: Compact and concurrent Memcache with dumber caching and smarter hashing
An improved hashing algorithm called optimistic cuckoo hashing, plus a CLOCK-based eviction algorithm that works in tandem with it. Evaluated in the context of Memcached, the combination gives up to a 30% reduction in memory usage and up to a 3x improvement in queries per second compared to the default Memcached implementation on read-heavy workloads with small objects (as typified by Facebook workloads).
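
The CLOCK side is easy to get a feel for. A toy Python version of textbook CLOCK eviction (not MemC3's actual implementation, which pairs it with the cuckoo table): every slot carries a reference bit that is set on access, and the clock hand sweeps, clearing bits, until it finds an unreferenced victim.

  # Toy CLOCK eviction -- the classic algorithm, illustrative only.
  class ClockCache:
      def __init__(self, capacity):
          self.capacity = capacity
          self.slots = []            # list of [key, value, ref_bit]
          self.index = {}            # key -> slot position
          self.hand = 0

      def get(self, key):
          pos = self.index.get(key)
          if pos is None:
              return None
          self.slots[pos][2] = 1     # mark recently used
          return self.slots[pos][1]

      def set(self, key, value):
          if key in self.index:
              self.slots[self.index[key]][1:] = [value, 1]
              return
          if len(self.slots) < self.capacity:
              self.index[key] = len(self.slots)
              self.slots.append([key, value, 1])
              return
          # sweep: clear ref bits until an unreferenced victim is found
          while self.slots[self.hand][2]:
              self.slots[self.hand][2] = 0
              self.hand = (self.hand + 1) % self.capacity
          del self.index[self.slots[self.hand][0]]
          self.index[key] = self.hand
          self.slots[self.hand] = [key, value, 1]
          self.hand = (self.hand + 1) % self.capacity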
memcached  performance  key-value-stores  storage  databases  cuckoo-hashing  algorithms  concurrency  caching  cache-eviction  memory  throughput 
november 2016 by jm
How both TCP and Ethernet checksums fail
At Twitter, a team had an unusual failure where corrupt data ended up in memcache. The root cause appears to have been a switch that was corrupting packets. Most packets were being dropped and the throughput was much lower than normal, but some were still making it through. The hypothesis is that occasionally the corrupt packets had valid TCP and Ethernet checksums. One "lucky" packet stored corrupt data in memcache. Even after the switch was replaced, the errors continued until the cache was cleared.


YA occurrence of this bug. When it happens, it tends to _really_ screw things up, because it's so rare -- we had monitoring for this in Amazon, and when it occurred it was overwhelmingly due to host-level kernel/libc/RAM issues rather than stuff in the network. Amazon design principles were to add app-level checksumming throughout, which of course catches the lot.
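
The fix is cheap to sketch. A minimal Python illustration of the app-level checksumming idea (hypothetical helpers, nobody's production code): prepend a CRC32 on write, verify on read, and treat a mismatch as a cache miss.

  import struct, zlib

  # Sketch: wrap cached values with an application-level checksum so
  # corruption that slips past TCP/Ethernet checksums is caught on read.
  def wrap(value: bytes) -> bytes:
      return struct.pack("!I", zlib.crc32(value)) + value

  def unwrap(blob: bytes) -> bytes:
      crc, value = struct.unpack("!I", blob[:4])[0], blob[4:]
      if zlib.crc32(value) != crc:
          raise ValueError("cache value failed checksum; treat as a miss")
      return value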
networking  tcp  ip  twitter  ethernet  checksums  packets  memcached 
october 2015 by jm
How Twitter Uses Redis to Scale
'105TB RAM, 39MM QPS, 10,000+ instances.' Notes from a talk given by Yao Yu of Twitter's Cache team, where she's worked for 4 years. Lots of interesting insights into large-scale Redis caching usage -- as in, large enough to max out the cluster hosts' network bandwidth.
twitter  redis  caching  memcached  yao-yu  scaling 
september 2014 by jm
Sirius by Comcast
At Comcast, our applications need convenient, low-latency access to important reference datasets. For example, our XfinityTV websites and apps need to use entertainment-related data to serve almost every API or web request to our datacenters: information like what year Casablanca was released, or how many episodes were in Season 7 of Seinfeld, or when the next episode of the Voice will be airing (and on which channel!).

We traditionally managed this information with a combination of relational databases and RESTful web services but yearned for something simpler than the ORM, HTTP client, and cache management code our developers dealt with on a daily basis. As main memory sizes on commodity servers continued to grow, however, we asked ourselves: How can we keep this reference data entirely in RAM, while ensuring it gets updated as needed and is easily accessible to application developers?

The Sirius distributed system library is our answer to that question, and we're happy to announce that we've made it available as an open source project. Sirius is written in Scala and uses the Akka actor system under the covers, but is easily usable by any JVM-based language.

Also includes a Paxos implementation with "fast follower" read-only slave replication. ASL2-licensed open source.

The only thing I can spot to be worried about is speed of startup; they note that apps need to replay a log at startup to rebuild state, which in my experience can be slow if unoptimized.

Update: in a twitter conversation at https://twitter.com/jon_moore/status/459363751893139456 , Jon Moore indicated they haven't had problems with this even with 'datasets consuming 10-20GB of heap', and have 'benchmarked a 5-node Sirius ingest cluster up to 1k updates/sec write throughput.' That's pretty solid!
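
For a feel of the startup cost in question, here's a toy sketch of the replay pattern (illustrative only; not Sirius's actual API or log format): state is an in-memory map rebuilt by re-applying every persisted update in order, so startup time grows with log length unless you snapshot or compact.

  import json

  # Toy log-replay startup: rebuild in-memory state from an update log.
  # The JSON-lines log format here is invented for illustration.
  def replay(log_path):
      state = {}
      with open(log_path) as log:
          for line in log:
              entry = json.loads(line)     # e.g. {"op": "put", "key": ..., "value": ...}
              if entry["op"] == "put":
                  state[entry["key"]] = entry["value"]
              elif entry["op"] == "delete":
                  state.pop(entry["key"], None)
      return state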
open-source  comcast  paxos  replication  read-only  datastores  storage  memory  memcached  redis  sirius  scala  akka  jvm  libraries 
april 2014 by jm
MICA: A Holistic Approach To Fast In-Memory Key-Value Storage [paper]
Very interesting new approach to building a scalable in-memory K/V store. As Rajiv Kurian notes on the mechanical-sympathy list:

'The basic idea is that each core is responsible for a portion of the key-space and requests are forwarded to the right core, avoiding multiple-writer scenarios. This is opposed to designs like memcache, which use locks and shared memory.

Some of the things I found interesting: the single-writer design is taken to an extreme. Clients assist the partitioning of requests by calculating hashes before submitting GET requests. It uses Intel DPDK instead of sockets to forward packets to the right core, without processing the packet on any core. Each core is paired with a dedicated RX/TX queue. The design for a lossy cache is simple but interesting: it does things like replacing a hash slot (instead of chaining) to take advantage of the lossy nature of caches. There is a lossless design too. A bunch of tricks to optimize for memory performance: this includes pre-allocation, design of the hash indexes, prefetching tricks, etc. There are some other concurrency tricks that were interesting; handling dangling pointers was one of them.'

Source code here: https://github.com/efficient/mica
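
The client-assisted partitioning is easy to sketch (toy Python, nothing like MICA's DPDK-based implementation): the client hashes the key itself, so each request lands on the single core that owns that slice of the key space, and no partition ever has two writers.

  import zlib

  # Toy sketch of client-assisted, single-writer partitioning.
  NUM_CORES = 8
  partitions = [dict() for _ in range(NUM_CORES)]   # one store per core

  def owner(key: bytes) -> int:
      return zlib.crc32(key) % NUM_CORES            # client computes the hash

  def put(key: bytes, value):
      partitions[owner(key)][key] = value           # only the owning core writes

  def get(key: bytes):
      return partitions[owner(key)].get(key)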
mica  in-memory  memory  ram  key-value-stores  storage  smp  dpdk  multicore  memcached  concurrency 
april 2014 by jm
hlld
a high-performance C server which is used to expose HyperLogLog sets and operations over them to networked clients. It uses a simple ASCII protocol which is human readable, and similar to memcached.

HyperLogLogs are a relatively new sketching data structure. They are used to estimate cardinality, i.e. the unique number of items in a set. They are based on the observation that any bit in a "good" hash function is independent of any other bit, and that the probability of getting a string of N bits all set to the same value is 1/(2^N). There is a lot more in the math, but that is the basic intuition. What is even more incredible is that the storage required to do the counting is log(log(N)). So with a 6-bit register, we can count well into the trillions. For more information, it's best to read the papers referenced at the end. TL;DR: HyperLogLogs enable you to have a set with about 1.6% variance, using 3280 bytes, and estimate sizes in the trillions.
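
A toy Python HyperLogLog makes the intuition concrete (illustrative only -- hlld itself is C, and this omits the small/large-range corrections): hash each item, use the first B bits to pick a register, keep the maximum run of leading zeros seen in the rest, and take a harmonic mean across registers to estimate cardinality. B = 12 gives 4096 registers, roughly the 1.6% error quoted above.

  import hashlib

  B = 12                             # 2^12 = 4096 registers -> ~1.6% std error
  M = 1 << B
  registers = [0] * M
  ALPHA = 0.7213 / (1 + 1.079 / M)   # bias-correction constant for large M

  def add(item: str):
      h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
      j = h >> (64 - B)                         # first B bits choose a register
      rest = h & ((1 << (64 - B)) - 1)          # remaining 52 bits
      rank = (64 - B) - rest.bit_length() + 1   # leading zeros + 1
      registers[j] = max(registers[j], rank)

  def estimate() -> float:
      return ALPHA * M * M / sum(2.0 ** -r for r in registers)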


(via:cscotta)
hyper-log-log  hlld  hll  data-structures  memcached  daemons  sketching  estimation  big-data  cardinality  algorithms  via:cscotta 
june 2013 by jm
memcached turns 10 years old
Well, apparently tomorrow, but close enough. Happy birthday to bradfitz' greatest creation and its wonderful slab allocator!
birthdays  code  via:alex-popescu  open-source  history  malloc  memory  caching  memcached 
may 2013 by jm
High Scalability - Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years
wow, Pinterest have a pretty hardcore architecture. Sharding to the max. This is scary stuff for me:
a [Cassandra-style] Cluster Management Algorithm is a SPOF. If there’s a bug it impacts every node. This took them down 4 times.


yeah, so, eek ;)
clustering  sharding  architecture  aws  scalability  scaling  pinterest  via:matt-sergeant  redis  mysql  memcached 
april 2013 by jm
bloomd
a high-performance C server which is used to expose bloom filters and operations over them to networked clients. It uses a simple ASCII protocol which is human readable, and similar to memcached.
(via Tony Finch)
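
For the unfamiliar, a toy Python Bloom filter shows the core trick (illustrative only; bloomd itself is C): K hash functions each set one bit on add, and a membership test requires all K bits, so you can get false positives but never false negatives.

  import hashlib

  # Toy Bloom filter: K hashes set/test bits in an M-bit array.
  M = 1 << 20                 # bits in the filter
  K = 7                       # number of hash functions
  bits = bytearray(M // 8)

  def _positions(item: str):
      for i in range(K):
          h = hashlib.sha1(f"{i}:{item}".encode()).digest()
          yield int.from_bytes(h[:8], "big") % M

  def add(item: str):
      for p in _positions(item):
          bits[p // 8] |= 1 << (p % 8)

  def contains(item: str) -> bool:   # false positives possible, never false negatives
      return all(bits[p // 8] & (1 << (p % 8)) for p in _positions(item))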
via:fanf  memcached  bloomd  open-source  bloom-filters 
march 2013 by jm
Fatcache
from Twitter -- 'a cache for your big data. Even though memory is a thousand times faster than SSD, network-connected SSD-backed memory makes sense, if we design the system in a way that network latencies dominate over the SSD latencies by a large factor. To understand why network-connected SSD makes sense, it is important to understand the role distributed memory plays in large-scale web architecture. In recent years, terabyte-scale, distributed, in-memory caches have become a fundamental building block of any web architecture. In-memory indexes, hash tables, key-value stores and caches are increasingly incorporated for scaling throughput and reducing latency of persistent storage systems. However, power consumption, operational complexity and single-node DRAM cost make horizontally scaling this architecture challenging. The current cost of DRAM per server increases dramatically beyond approximately 150 GB, and power cost scales similarly as DRAM density increases. Fatcache extends a volatile, in-memory cache by incorporating SSD-backed storage.'
twitter  ssd  cache  caching  memcached  memcache  memory  network  storage 
february 2013 by jm
Facebook monitoring cache with Claspin
reasonably nice heatmap viz for large-scale instance monitoring. I like the "snake" pattern for racks
facebook  monitoring  dataviz  heatmaps  claspin  cache  memcached  ui 
september 2012 by jm
Scaling: It's Not What It Used To Be
skamille's top 5 scaling apps. "1. Redis. I was at a NoSQL meetup last night when someone asked "if you could put a million dollars behind one of the solutions presented here tonight, which one would you choose?" And the answer that one of the participants gave was "None of the above. I would choose Redis. Everyone uses one of these products and Redis."
2. Nginx. Your ops team probably already loves it. It's simple, it scales fabulously, and you don't have to be a programmer to understand how to run it.
3. HAProxy. Because if you're going to have hundreds or thousands of servers, you'd better have good load balancing.
4. Memcached. Redis can act as a cache but using a real caching product for such a purpose is probably a better call.
And finally:
5. Cloud hardware. Imagine trying to grow out to millions of users if you had to buy, install, and admin every piece of hardware you would need to do such a thing."
scaling  nginx  memcached  haproxy  redis 
april 2012 by jm
good taxonomy of memcached use cases
via Jeff Barr's announcement of the ElastiCache launch. From 2008, but a better taxonomy than I've seen elsewhere
memcached  caching  mysql  performance  scalability  via:jeffbarr 
august 2011 by jm
Quora’s Technology Examined
Python, Nginx, Tornado for Comet stuff, MySQL as a data store, memcached, Thrift, haproxy, AWS, Pylons. Fantastic, very detailed post (via Nelson)
quora  python  nginx  tornado  comet  mysql  memcached  thrift  haproxy  aws  pylons  via:nelson  from delicious
february 2011 by jm
Cache on Delivery
Mind-boggling presentation; a load of sites are exposing memcacheds to the public internet, with no auth, and full of juicy data (samples included). iptables is hard
memcached  security  hacks  exploits  from delicious
august 2010 by jm
