NYC generates hash-anonymised data dump, which gets reversed
There are about 1000*26**3 = 21952000 or 22M possible medallion numbers. So, by calculating the md5 hashes of all these numbers (only 24M!), one can completely deanonymise the entire data. Modern computers are fast: so fast that computing the 24M hashes took less than 2 minutes.

(via Bruce Schneier)
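To make the arithmetic concrete: as written, 1000 × 26³ evaluates to 17,576,000 (the 21,952,000 figure equals 1000 × 28³). Either way the keyspace is only tens of millions of values, which is trivially enumerable. Below is a minimal Python sketch of the brute-force inversion. The ID format it enumerates (three uppercase letters then three digits, chosen only because it matches the 1000 × 26³ formula) is a hypothetical stand-in, not the real medallion format, and `candidate_ids` and `crack` are illustrative names. It streams candidates against a set of target hashes instead of building a full lookup table, which keeps memory small.

```python
import hashlib
import itertools
import string

# HYPOTHETICAL ID format for illustration: three uppercase letters followed by
# three digits (e.g. "ABC123"). This is NOT the real medallion format; it just
# matches the 1000 * 26**3 formula in the quote.
def candidate_ids(letters: str = string.ascii_uppercase):
    """Yield every ID of the assumed form LLLDDD over the given letters."""
    for prefix in itertools.product(letters, repeat=3):
        p = "".join(prefix)
        for n in range(1000):
            yield f"{p}{n:03d}"


def crack(target_hashes, letters: str = string.ascii_uppercase) -> dict:
    """Invert unsalted MD5 'anonymisation' by hashing every candidate ID.

    Returns {md5_hexdigest: plaintext_id} for each target hash that was found.
    """
    targets = set(target_hashes)
    if not targets:
        return {}
    found = {}
    for cid in candidate_ids(letters):
        digest = hashlib.md5(cid.encode("ascii")).hexdigest()
        if digest in targets:
            found[digest] = cid
            if len(found) == len(targets):
                break  # every target recovered; stop early
    return found


if __name__ == "__main__":
    print(26**3 * 1000)  # 17576000 candidates with the full alphabet
    # Restrict the alphabet so the demo runs instantly (3**3 * 1000 = 27,000 IDs).
    leaked = [hashlib.md5(b"CAB042").hexdigest()]
    print(list(crack(leaked, "ABC").values()))  # ['CAB042']
```

With the full alphabet the work is bounded by about 17.6M MD5 calls, the same order of effort the quote describes; no secret is involved, which is exactly why plain hashing of a small keyspace is not anonymisation.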

The better fix is an HMAC (see ), or simply to assign opaque IDs instead of hashing.
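A minimal Python sketch of both fixes, using only the standard library's `hmac`, `hashlib` and `secrets` modules. The helper names `pseudonymise_hmac` and `OpaqueIdAssigner` are hypothetical, and the sketch assumes the key (or the ID mapping) never leaves the publisher. A salt that is published with the data, or shared by every record, gives no comparable protection: an attacker simply includes it when hashing each candidate.

```python
import hashlib
import hmac
import secrets

# ASSUMPTION: the publisher generates this key once, keeps it private, and
# never releases it alongside the data.
SECRET_KEY = secrets.token_bytes(32)


def pseudonymise_hmac(raw_id: str, key: bytes = SECRET_KEY) -> str:
    """Keyed hash (HMAC-SHA256) of an ID.

    Stable for a given key, so records for the same ID stay linkable, but an
    attacker without the key cannot enumerate candidate IDs and compare hashes.
    """
    return hmac.new(key, raw_id.encode("ascii"), hashlib.sha256).hexdigest()


class OpaqueIdAssigner:
    """Alternative: replace each ID with a random token, keeping the mapping private."""

    def __init__(self) -> None:
        self._mapping: dict[str, str] = {}
        self._issued: set[str] = set()

    def id_for(self, raw_id: str) -> str:
        if raw_id not in self._mapping:
            token = secrets.token_hex(8)  # 64 random bits, 16 hex chars
            while token in self._issued:  # astronomically rare, but stay unique
                token = secrets.token_hex(8)
            self._issued.add(token)
            self._mapping[raw_id] = token
        return self._mapping[raw_id]


if __name__ == "__main__":
    print(pseudonymise_hmac("ABC123"))  # 64 hex chars; changes with the key
    ids = OpaqueIdAssigner()
    print(ids.id_for("ABC123") == ids.id_for("ABC123"))  # True
```

The trade-off: HMAC output is deterministic per key, so the same vehicle stays linkable across records (and if the key leaks, the enumeration attack works again). Random opaque IDs reveal nothing about the original value, but keeping them consistent across releases depends on retaining the private mapping.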
hashing sha1 md5 bruce-schneier anonymization deanonymization security new-york nyc taxis data big-data hmac keyed-hashing salting
june 2014 by jm