jm + caching   27

Locking, Little's Law, and the USL
Excellent explanatory mailing list post by Martin Thompson to the mechanical-sympathy group, discussing Little's Law vs the USL:
Little's law can be used to describe a system in steady state from a queuing perspective, i.e. arrival and leaving rates are balanced. In this case it is a crude way of modelling a system with a contention percentage of 100% under Amdahl's law, in that throughput is one over latency.

However this is an inaccurate way to model a system with locks. Amdahl's law does not account for coherence costs. For example, if you wrote a microbenchmark with a single thread to measure the lock cost then it is much lower than in a multi-threaded environment where cache coherence, other OS costs such as scheduling, and lock implementations need to be considered.

Universal Scalability Law (USL) accounts for both the contention and the coherence costs.

When modelling locks it is necessary to consider how contention and coherence costs vary given how they can be implemented. Consider in Java how we have biased locking, thin locks, fat locks, inflation, and revoking biases which can cause safe points that bring all threads in the JVM to a stop with a significant coherence component.
usl  scaling  scalability  performance  locking  locks  java  jvm  amdahls-law  littles-law  system-dynamics  modelling  systems  caching  threads  schedulers  contention 
6 days ago by jm
_Optimal Probabilistic Cache Stampede Prevention_ [pdf]
'When a frequently-accessed cache item expires, multiple requests
to that item can trigger a cache miss and start regenerating
that same item at the same time. This phenomenon,
known as cache stampede, severely limits the performance
of databases and web servers. A natural countermeasure to
this issue is to let the processes that perform such requests
to randomly ask for a regeneration before the expiration
time of the item. In this paper we give optimal algorithms
for performing such probabilistic early expirations. Our algorithms
are theoretically optimal and have much better
performances than other solutions used in real-world applications.'

(via Marc Brooker)
via:marcbrooker  caching  caches  algorithm  probabilistic  expiration  vldb  papers  expiry  cache-miss  stampedes 
may 2017 by jm
MemC3: Compact and concurrent Memcache with dumber caching and smarter hashing
An improved hashing algorithm called optimistic cuckoo hashing, and a CLOCK-based eviction algorithm that works in tandem with it. They are evaluated in the context of Memcached, where combined they give up to a 30% memory usage reduction and up to a 3x improvement in queries per second as compared to the default Memcached implementation on read-heavy workloads with small objects (as is typified by Facebook workloads).
memcached  performance  key-value-stores  storage  databases  cuckoo-hashing  algorithms  concurrency  caching  cache-eviction  memory  throughput 
november 2016 by jm
Caffeine cache adopts Window TinyLfu eviction policy
'Caffeine is a Java 8 rewrite of Guava's cache. In this version we focused on improving the hit rate by evaluating alternatives to the classic least-recenty-used (LRU) eviction policy. In collaboration with researchers at Israel's Technion, we developed a new algorithm that matches or exceeds the hit rate of the best alternatives (ARC, LIRS). A paper of our work is being prepared for publication.'

W-TinyLfu uses a small admission LRU that evicts to a large Segmented LRU if accepted by the TinyLfu admission policy. TinyLfu relies on a frequency sketch to probabilistically estimate the historic usage of an entry. The window allows the policy to have a high hit rate when entries exhibit a high temporal / low frequency access pattern which would otherwise be rejected. The configuration enables the cache to estimate the frequency and recency of an entry with low overhead. This implementation uses a 4-bit CountMinSketch, growing at 8 bytes per cache entry to be accurate. Unlike ARC and LIRS, this policy does not retain non-resident keys.
tinylfu  caches  caching  cache-eviction  java8  guava  caffeine  lru  count-min  sketching  algorithms 
november 2015 by jm
You're probably wrong about caching
Excellent cut-out-and-keep guide to why you should add a caching layer. I've been following this practice for the past few years, after I realised that #6 (recovering from a failed cache is hard) is a killer -- I've seen a few large-scale outages where a production system had gained enough scale that it required a cache to operate, and once that cache was damaged, bringing the system back online required a painful rewarming protocol. Better to design for the non-cached case if possible.
architecture  caching  coding  design  caches  ops  production  scalability 
september 2015 by jm
How to change Gradle cache location
$GRADLE_USER_HOME, basically -- it may also be possible to set from the Gradle script itself too
gradle  build  caching  environment  unix  cache 
may 2015 by jm
'Caffeine is a Java 8 based concurrency library that provides specialized data structures, such as a high performance cache.'
cache  java8  java  guava  caching  concurrency  data-structures  coding 
march 2015 by jm
RIPQ: Advanced photo caching on flash for Facebook
Interesting priority-queue algorithm optimised for caching data on SSD
priority-queue  algorithms  facebook  ssd  flash  caching  ripq  papers 
february 2015 by jm
get page cache statistics for files.
A common question when tuning databases and other IO-intensive applications is, "is Linux caching my data or not?" pcstat gets that information for you using the mincore(2) syscall. I wrote this is so that Apache Cassandra users can see if ssTables are being cached.
linux  page-cache  caching  go  performance  cassandra  ops  mincore  fincore 
september 2014 by jm
Inside Apple’s Live Event Stream Failure, And Why It Happened: It Wasn’t A Capacity Issue
The bottom line with this event is that the encoding, translation, JavaScript code, the video player, the call to S3 single storage location and the millisecond refreshes all didn’t work properly together and was the root cause of Apple’s failed attempt to make the live stream work without any problems. So while it would be easy to say it was a CDN capacity issue, which was my initial thought considering how many events are taking place today and this week, it does not appear that a lack of capacity played any part in the event not working properly. Apple simply didn’t provision and plan for the event properly.
cdn  streaming  apple  fail  scaling  s3  akamai  caching 
september 2014 by jm
How Twitter Uses Redis to Scale
'105TB RAM, 39MM QPS, 10,000+ instances.' Notes from a talk given by Yao Yu of Twitter's Cache team, where she's worked for 4 years. Lots of interesting insights into large-scale Redis caching usage -- as in, large enough to max out the cluster hosts' network bandwidth.
twitter  redis  caching  memcached  yao-yu  scaling 
september 2014 by jm
An analysis of Facebook photo caching
excellent analysis of caching behaviour at scale, from the FB engineering blog (via Tony Finch)
via:fanf  caching  facebook  architecture  photos  images  cache  fifo  lru  scalability 
may 2014 by jm
Basho LevelDB supports tiered storage
Tiered storage is turning out to be a pretty practical trick to take advantage of SSDs:
The justification for two types/speeds of storage arrays is simple. leveldb is extremely write intensive in its lower levels. The write intensity drops off as the level number increases. Similarly, current and frequently updated data tends to be in lower levels while archival data tends to be in higher levels. These leveldb characteristics create a desire to have faster, more expensive storage arrays for the high intensity lower levels. This branch allows the high intensity lower levels to be on expensive storage arrays while slower, less expensive storage arrays to hold the higher level data to reduce costs.
caching  tiered-storage  storage  ssds  ebs  leveldb  basho  patches  riak  iops 
april 2014 by jm
Huge Redis rant
I want to emphasize that if you use redis as intended (as a slightly-persistent, not-HA cache), it's great. Unfortunately, more and more shops seem to be thinking that Redis is a full-service database and, as someone who's had to spend an inordinate amount of time maintaining such a setup, it's not. If you're writing software and you're thinking "hey, it would be easy to just put a SET key value in this code and be done," please reconsider. There are lots of great products out there that are better for the overwhelming majority of use cases.

Ouch. (via Aphyr)
redis  storage  architecture  memory  caching  ha  databases 
february 2014 by jm
How to avoid crappy ISP caches when viewing YouTube video
Must give this a try when I get home -- I frequently have latency problems watching YT on my UPC connection, and I bet they have a crappily-managed, overloaded cache box on their network.
streaming  youtube  caching  isps  caches  firewalls  iptables  hacks  video  networking 
august 2013 by jm
Twilio Billing Incident Post-Mortem
At 1:35 AM PDT on July 18, a loss of network connectivity caused all billing redis-slaves to simultaneously disconnect from the master. This caused all redis-slaves to reconnect and request full synchronization with the master at the same time. Receiving full sync requests from each redis-slave caused the master to suffer extreme load, resulting in performance degradation of the master and timeouts from redis-slaves to redis-master.
By 2:39 AM PDT the host’s load became so extreme, services relying on redis-master began to fail. At 2:42 AM PDT, our monitoring system alerted our on-call engineering team of a failure in the Redis cluster. Observing extreme load on the host, the redis process on redis-master was misdiagnosed as requiring a restart to recover. This caused redis-master to read an incorrect configuration file, which in turn caused Redis to attempt to recover from a non-existent AOF file, instead of the binary snapshot. As a result of that failed recovery, redis-master dropped all balance data. In addition to forcing recovery from a non-existent AOF, an incorrect configuration also caused redis-master to boot as a slave of itself, putting it in read-only mode and preventing the billing system from updating account balances.

See also for antirez' response.

Here's the takeaways I'm getting from it:

1. network partitions happen in production, and cause cascading failures. this is a great demo of that.

2. don't store critical data in Redis. this was the case for Twilio -- as far as I can tell they were using Redis as a front-line cache for billing data -- but it's worth saying anyway. ;)

3. Twilio were just using Redis as a cache, but a bug in their code meant that the writes to the backing SQL store were not being *read*, resulting in repeated billing and customer impact. In other words, it turned a (fragile) cache into the authoritative store.

4. they should probably have designed their code so that write failures would not result in repeated billing for customers -- that's a bad failure path.

Good post-mortem anyway, and I'd say their customers are a good deal happier to see this published, even if it contains details of the mistakes they made along the way.
redis  caching  storage  networking  network-partitions  twilio  postmortems  ops  billing  replication 
july 2013 by jm
Reducing MongoDB traffic by 78% with Redis | Crashlytics Blog
One for @roflscaletips. Crashlytics reduce MongoDB load by hacking in some hand-coded caching into their Rails app, instead of just using a front-line HTTP cache to reduce Rails *and* db load. duh. (via Oisin)
crashlytics  fail  roflscale  rails  caching  redis  ruby  via:oisin 
may 2013 by jm
memcached turns 10 years old
Well, apparently tomorrow, but close enough. Happy birthday to bradfitz' greatest creation and its wonderful slab allocator!
birthdays  code  via:alex-popescu  open-source  history  malloc  memory  caching  memcached 
may 2013 by jm
Making sense out of BDB-JE fast stats
good info on the system metrics recorded by BDB-JE's EnvironmentStats code, particularly where cache and cleaner activity are concerned. Particularly useful for Voldemort
voldemort  caching  bdb  bdb-je  storage  tuning  ops  metrics  reference 
may 2013 by jm
from Twitter -- 'a cache for your big data. Even though memory is thousand times faster than SSD, network connected SSD-backed memory makes sense, if we design the system in a way that network latencies dominate over the SSD latencies by a large factor. To understand why network connected SSD makes sense, it is important to understand the role distributed memory plays in large-scale web architecture. In recent years, terabyte-scale, distributed, in-memory caches have become a fundamental building block of any web architecture. In-memory indexes, hash tables, key-value stores and caches are increasingly incorporated for scaling throughput and reducing latency of persistent storage systems. However, power consumption, operational complexity and single node DRAM cost make horizontally scaling this architecture challenging. The current cost of DRAM per server increases dramatically beyond approximately 150 GB, and power cost scales similarly as DRAM density increases. Fatcache extends a volatile, in-memory cache by incorporating SSD-backed storage.'
twitter  ssd  cache  caching  memcached  memcache  memory  network  storage 
february 2013 by jm
good taxonomy of memcached use cases
via Jeff Barr's announcement of the Elasticache launch. from 2008, but a better taxonomy than I've seen elsewhere
memcached  caching  mysql  performance  scalability  via:jeffbarr 
august 2011 by jm
Improving Linux performance by preserving Buffer Cache State
handy -- a patch to rsync(1) which will not disturb the buffer cache, so that large file transfers and backups will not interfere with what's been cached previously
performance  linux  caching  buffer-cache  rsync  io  cache  patches  backups  from delicious
march 2011 by jm
A high-performance compressor optimized for binary data -- 'designed to transmit data to the processor cache faster than a traditional, non-compressed, direct memory fetch via memcpy()' (via Bill de hOra)
via:dehora  compression  memcpy  caching  l1  software  memory  optimization  performance  python  pytables  from delicious
october 2010 by jm
Why WeakHashMap Sucks
'SoftReferences are the cheap, crappy caching mechanism [...] perfect for when you'd like your cache to be cleared at random times and in random order.'
softreferences  weakreferences  weak  references  gc  java  jvm  caching  hash  memory  collections  vm  weakhashmap  via:spyced  from delicious
september 2009 by jm

related tags

akamai  algorithm  algorithms  amdahls-law  apple  architecture  backups  basho  bdb  bdb-je  billing  birthdays  buffer-cache  buffer-overflows  build  c  cache  cache-eviction  cache-miss  caches  caching  caffeine  cassandra  cdn  cloudflare  code  coding  collections  compression  concurrency  contention  count-min  crashlytics  cuckoo-hashing  data-leak  data-structures  databases  design  ebs  email  environment  erlang  expiration  expiry  facebook  fail  fifo  fincore  firewalls  flash  gc  go  gradle  guava  ha  hacks  hash  history  images  internet  io  iops  iptables  isps  java  java8  jvm  key-value-stores  l1  lcs  leaks  leveldb  linux  littles-law  locking  locks  lru  mailinator  malloc  memcache  memcached  memcpy  memory  metrics  mincore  modelling  multifeed  mysql  network  network-partitions  networking  open-source  ops  optimization  page-cache  papers  patches  performance  photos  postmortems  priority-queue  probabilistic  production  pytables  python  rails  redis  reference  references  replication  riak  ripq  roflscale  rsync  ruby  s3  scalability  scaling  schedulers  security  sketching  softreferences  software  ssd  ssds  stampedes  storage  streaming  system-dynamics  systems  threads  throughput  tiered-storage  tinylfu  tuning  twilio  twitter  unix  usl  via:alex-popescu  via:dehora  via:fanf  via:jeffbarr  via:marcbrooker  via:oisin  via:spyced  video  vldb  vm  voldemort  weak  weakhashmap  weakreferences  yao-yu  youtube 

Copy this bookmark: