jm + via:marcbrooker   2

_Optimal Probabilistic Cache Stampede Prevention_ [pdf]
'When a frequently-accessed cache item expires, multiple requests
to that item can trigger a cache miss and start regenerating
that same item at the same time. This phenomenon,
known as cache stampede, severely limits the performance
of databases and web servers. A natural countermeasure to
this issue is to let the processes that perform such requests
to randomly ask for a regeneration before the expiration
time of the item. In this paper we give optimal algorithms
for performing such probabilistic early expirations. Our algorithms
are theoretically optimal and have much better
performances than other solutions used in real-world applications.'

(via Marc Brooker)
via:marcbrooker  caching  caches  algorithm  probabilistic  expiration  vldb  papers  expiry  cache-miss  stampedes 
12 days ago by jm
Darts, Dice, and Coins
Earlier this year, I asked a question on Stack Overflow about a data structure for loaded dice. Specifically, I was interested in answering this question: "You are given an n-sided die where side i has probability pi of being rolled. What is the most efficient data structure for simulating rolls of the die?"

This data structure could be used for many purposes. For starters, you could use it to simulate rolls of a fair, six-sided die by assigning probability 1616 to each of the sides of the die, or a to simulate a fair coin by simulating a two-sided die where each side has probability 1212 of coming up. You could also use this data structure to directly simulate the total of two fair six-sided dice being thrown by having an 11-sided die (whose faces were 2, 3, 4, ..., 12), where each side was appropriately weighted with the probability that this total would show if you used two fair dice. However, you could also use this data structure to simulate loaded dice. For example, if you were playing craps with dice that you knew weren't perfectly fair, you might use the data structure to simulate many rolls of the dice to see what the optimal strategy would be. You could also consider simulating an imperfect roulette wheel in the same way.

Outside the domain of game-playing, you could also use this data structure in robotics simulations where sensors have known failure rates. For example, if a range sensor has a 95% chance of giving the right value back, a 4% chance of giving back a value that's too small, and a 1% chance of handing back a value that's too large, you could use this data structure to simulate readings from the sensor by generating a random outcome and simulating the sensor reading in that case.

The answer I received on Stack Overflow impressed me for two reasons. First, the solution pointed me at a powerful technique called the alias method that, under certain reasonable assumptions about the machine model, is capable of simulating rolls of the die in O(1)O(1) time after a simple preprocessing step. Second, and perhaps more surprisingly, this algorithm has been known for decades, but I had not once encountered it! Considering how much processing time is dedicated to simulation, I would have expected this technique to be better- known. A few quick Google searches turned up a wealth of information on the technique, but I couldn't find a single site that compiled together the intuition and explanation behind the technique.

(via Marc Brooker)
via:marcbrooker  algorithms  probability  algorithm  coding  data-structures  alias  dice  random 
april 2016 by jm

Copy this bookmark: