jm + probability   11

The Impenetrable Program Transforming How Courts Treat DNA Evidence | WIRED
'So the lab turned to TrueAllele, a program sold by Cybergenetics, a small company dedicated to helping law enforcement analyze DNA where regular lab tests fail. They do it with something called probabilistic genotyping, which uses complex mathematical formulas to examine the statistical likelihood that a certain genotype comes from one individual over another. It’s a type of DNA testing that’s becoming increasingly popular in courtrooms. '

[...] 'But now legal experts, along with Johnson’s advocates, are joining forces to argue to a California court that TrueAllele—the seemingly magic software that helped law enforcement analyze the evidence that tied Johnson to the crimes—should be forced to reveal the code that sent Johnson to prison. This code, they say, is necessary in order to properly evaluate the technology. In fact, they say, justice from an unknown algorithm is no justice at all.'
law  justice  trueallele  software  dna  evidence  statistics  probability  code-review  auditing 
10 days ago by jm
Darts, Dice, and Coins
Earlier this year, I asked a question on Stack Overflow about a data structure for loaded dice. Specifically, I was interested in answering this question: "You are given an n-sided die where side i has probability pi of being rolled. What is the most efficient data structure for simulating rolls of the die?"

This data structure could be used for many purposes. For starters, you could use it to simulate rolls of a fair, six-sided die by assigning probability 1616 to each of the sides of the die, or a to simulate a fair coin by simulating a two-sided die where each side has probability 1212 of coming up. You could also use this data structure to directly simulate the total of two fair six-sided dice being thrown by having an 11-sided die (whose faces were 2, 3, 4, ..., 12), where each side was appropriately weighted with the probability that this total would show if you used two fair dice. However, you could also use this data structure to simulate loaded dice. For example, if you were playing craps with dice that you knew weren't perfectly fair, you might use the data structure to simulate many rolls of the dice to see what the optimal strategy would be. You could also consider simulating an imperfect roulette wheel in the same way.

Outside the domain of game-playing, you could also use this data structure in robotics simulations where sensors have known failure rates. For example, if a range sensor has a 95% chance of giving the right value back, a 4% chance of giving back a value that's too small, and a 1% chance of handing back a value that's too large, you could use this data structure to simulate readings from the sensor by generating a random outcome and simulating the sensor reading in that case.

The answer I received on Stack Overflow impressed me for two reasons. First, the solution pointed me at a powerful technique called the alias method that, under certain reasonable assumptions about the machine model, is capable of simulating rolls of the die in O(1)O(1) time after a simple preprocessing step. Second, and perhaps more surprisingly, this algorithm has been known for decades, but I had not once encountered it! Considering how much processing time is dedicated to simulation, I would have expected this technique to be better- known. A few quick Google searches turned up a wealth of information on the technique, but I couldn't find a single site that compiled together the intuition and explanation behind the technique.

(via Marc Brooker)
via:marcbrooker  algorithms  probability  algorithm  coding  data-structures  alias  dice  random 
april 2016 by jm
The general birthday problem
Good explanation and scipy code for the birthday paradox and hash collisions
hashing  hashes  collisions  birthday-problem  birthday-paradox  coding  probability  statistics 
february 2016 by jm
Birthday problem calculator
I keep having to google this, so here's a good one which works -- unlike Wolfram Alpha!
birthday  birthday-paradox  birthday-problem  hashes  hash-collision  attacks  security  collisions  calculators  probability  statistcs 
december 2015 by jm
'Uncertain<T>: A First-Order Type for Uncertain Data' [paper, PDF]
'Emerging applications increasingly use estimates such as sensor
data (GPS), probabilistic models, machine learning, big
data, and human data. Unfortunately, representing this uncertain
data with discrete types (floats, integers, and booleans)
encourages developers to pretend it is not probabilistic, which
causes three types of uncertainty bugs. (1) Using estimates
as facts ignores random error in estimates. (2) Computation
compounds that error. (3) Boolean questions on probabilistic
data induce false positives and negatives.
This paper introduces Uncertain<T>, a new programming
language abstraction for uncertain data. We implement a
Bayesian network semantics for computation and conditionals
that improves program correctness. The runtime uses sampling
and hypothesis tests to evaluate computation and conditionals
lazily and efficiently. We illustrate with sensor and
machine learning applications that Uncertain<T> improves
expressiveness and accuracy.'

(via Tony Finch)
via:fanf  uncertainty  estimation  types  strong-typing  coding  probability  statistics  machine-learning  sampling 
december 2014 by jm
How the search for flight AF447 used Bayesian inference
Via jgc, the search for the downed Air France flight was optimized using this technique:

'Metron’s approach to this search planning problem is rooted in classical Bayesian inference,
which allows organization of available data with associated uncertainties and computation of the
Probability Distribution Function (PDF) for target location given these data. In following this
approach, the first step was to gather the available information about the location of the impact site
of the aircraft. This information was sometimes contradictory and filled with ambiguities and
uncertainties. Using a Bayesian approach we organized this material into consistent scenarios,
quantified the uncertainties with probability distributions, weighted the relative likelihood of each
scenario, and performed a simulation to produce a prior PDF for the location of the wreck.'
metron  bayes  bayesian-inference  machine-learning  statistics  via:jgc  air-france  disasters  probability  inference  searching 
march 2014 by jm
Probabilistic Data Structures for Web Analytics and Data Mining « Highly Scalable Blog
Stream summary, count-min sketches, loglog counting, linear counters. Some nifty algorithms for probabilistic estimation of element frequencies and data-set cardinality (via proggit)
via:proggit  algorithms  probability  probabilistic  count-min  stream-summary  loglog-counting  linear-counting  estimation  big-data 
may 2012 by jm
Wired: how a Toronto statistician cracked the state lottery
'The tic-tac-toe lottery was seriously flawed. It took a few hours of studying his tickets and some statistical sleuthing, but he discovered a defect in the game: The visible numbers turned out to reveal essential information about the digits hidden under the latex coating. Nothing needed to be scratched off—the ticket could be cracked if you knew the secret code.'
toronto  hacks  money  statistics  probability  wired  tic-tac-toe  singleton  from delicious
february 2011 by jm
Colm argues against the 'sleep rand % 3600' hack
it's not sufficiently evenly-distributed, apparently. Also: got linked from Hack The Planet!
scheduling  probability  sleep  unix  updating  cron  random  from delicious
september 2009 by jm

Copy this bookmark: