**jm + via:marcbrooker**
2

_Optimal Probabilistic Cache Stampede Prevention_ [pdf]

may 2017 by jm

'When a frequently-accessed cache item expires, multiple requests

to that item can trigger a cache miss and start regenerating

that same item at the same time. This phenomenon,

known as cache stampede, severely limits the performance

of databases and web servers. A natural countermeasure to

this issue is to let the processes that perform such requests

to randomly ask for a regeneration before the expiration

time of the item. In this paper we give optimal algorithms

for performing such probabilistic early expirations. Our algorithms

are theoretically optimal and have much better

performances than other solutions used in real-world applications.'

(via Marc Brooker)

via:marcbrooker
caching
caches
algorithm
probabilistic
expiration
vldb
papers
expiry
cache-miss
stampedes
to that item can trigger a cache miss and start regenerating

that same item at the same time. This phenomenon,

known as cache stampede, severely limits the performance

of databases and web servers. A natural countermeasure to

this issue is to let the processes that perform such requests

to randomly ask for a regeneration before the expiration

time of the item. In this paper we give optimal algorithms

for performing such probabilistic early expirations. Our algorithms

are theoretically optimal and have much better

performances than other solutions used in real-world applications.'

(via Marc Brooker)

may 2017 by jm

Darts, Dice, and Coins

(via Marc Brooker)
via:marcbrooker
algorithms
probability
algorithm
coding
data-structures
alias
dice
random

april 2016 by jm

Earlier this year, I asked a question on Stack Overflow about a data structure for loaded dice. Specifically, I was interested in answering this question: "You are given an n-sided die where side i has probability pi of being rolled. What is the most efficient data structure for simulating rolls of the die?"

This data structure could be used for many purposes. For starters, you could use it to simulate rolls of a fair, six-sided die by assigning probability 1616 to each of the sides of the die, or a to simulate a fair coin by simulating a two-sided die where each side has probability 1212 of coming up. You could also use this data structure to directly simulate the total of two fair six-sided dice being thrown by having an 11-sided die (whose faces were 2, 3, 4, ..., 12), where each side was appropriately weighted with the probability that this total would show if you used two fair dice. However, you could also use this data structure to simulate loaded dice. For example, if you were playing craps with dice that you knew weren't perfectly fair, you might use the data structure to simulate many rolls of the dice to see what the optimal strategy would be. You could also consider simulating an imperfect roulette wheel in the same way.

Outside the domain of game-playing, you could also use this data structure in robotics simulations where sensors have known failure rates. For example, if a range sensor has a 95% chance of giving the right value back, a 4% chance of giving back a value that's too small, and a 1% chance of handing back a value that's too large, you could use this data structure to simulate readings from the sensor by generating a random outcome and simulating the sensor reading in that case.

The answer I received on Stack Overflow impressed me for two reasons. First, the solution pointed me at a powerful technique called the alias method that, under certain reasonable assumptions about the machine model, is capable of simulating rolls of the die in O(1)O(1) time after a simple preprocessing step. Second, and perhaps more surprisingly, this algorithm has been known for decades, but I had not once encountered it! Considering how much processing time is dedicated to simulation, I would have expected this technique to be better- known. A few quick Google searches turned up a wealth of information on the technique, but I couldn't find a single site that compiled together the intuition and explanation behind the technique.

(via Marc Brooker)

april 2016 by jm

Copy this bookmark: