jm + anonymization   6

'Yet another Crypto-PAn implementation for Python':
This package provides a function to anonymize IP addresses keeping their prefix consistency. This program is based on the paper "Prefix-Preserving IP Address Anonymization: Measurement-based Security Evaluation and a New Cryptography-based Scheme" written by Jun Xu, Jinliang Fan, Mostafa H. Ammar, and Sue B. Moon. The detailed explanation can be found in [Xu2002]. This package supports both IPv4 and IPv6 anonymization.

(via Alexandre Dulaunoy)
via:adulau  anonymization  ip-addresses  internet  ipv4  ipv6  security  crypto  python  crypto-pan 
4 weeks ago by jm
Differential Privacy
Apple have announced they plan to use it; Google use a DP algorithm called RAPPOR in Chrome usage statistics. In summary: "novel privacy technology that allows inferring statistics about populations while preserving the privacy of individual users".
apple  privacy  anonymization  google  rappor  algorithms  sampling  populations  statistics  differential-privacy 
june 2016 by jm and access to UK health records: patient privacy and public trust
'In 2013, the United Kingdom launched, an NHS England initiative to combine patient records, stored in the machines of general practitioners (GPs), with information from social services and hospitals to make one centralized data archive. One aim of the initiative is to gain a picture of the care being delivered between different parts of the healthcare system and thus identify what is working in health care delivery, and what areas need greater attention and resources. This case study analyzes the complications around the launch of It explains the historical context of the program and the controversies that emerged in the course of the rollout. It explores problems in management and communications around the centralization effort, competing views on the safety of “anonymous” and “pseudonymous” health data, and the conflicting legal duties imposed on GPs with the introduction of the 2012 Health and Social Care Act. This paper also explores the power struggles in the battle over and outlines the tensions among various stakeholders, including patients, GPs, the Health and Social Care Information Centre (HSCIC), the government, privacy experts and data purchasers. The predominant public policy question that emerges from this review centers on how best to utilize technological advances and simultaneously strike a balance between the many competing interests around health and personal privacy.'  privacy  healthcare  uk  nhs  trust  anonymity  anonymization  gps  medicine 
august 2015 by jm
Ask the Decoder: Did I sign up for a global sleep study?
How meaningful is this corporate data science, anyway? Given the tech-savvy people in the Bay Area, Jawbone likely had a very dense sample of Jawbone wearers to draw from for its Napa earthquake analysis. That allowed it to look at proximity to the epicenter of the earthquake from location information.

Jawbone boasts its sample population of roughly “1 million Up wearers who track their sleep using Up by Jawbone.” But when looking into patterns county by county in the U.S., Jawbone states, it takes certain statistical liberties to show granularity while accounting for places where there may not be many Jawbone users.

So while Jawbone data can show us interesting things about sleep patterns across a very large population, we have to remember how selective that population is. Jawbone wearers are people who can afford a $129 wearable fitness gadget and the smartphone or computer to interact with the output from the device.

Jawbone is sharing what it learns with the public, but think of all the public health interests or other third parties that might be interested in other research questions from a large scale data set. Yet this data is not collected with scientific processes and controls and is not treated with the rigor and scrutiny that a scientific study requires.

Jawbone and other fitness trackers don’t give us the option to use their devices while opting out of contributing to the anonymous data sets they publish. Maybe that ought to change.
jawbone  privacy  data-protection  anonymization  aggregation  data  medicine  health  earthquakes  statistics  iot  wearables 
march 2015 by jm
'Robust De-anonymization of Large Sparse Datasets' [pdf]
paper by Arvind Narayanan and Vitaly Shmatikov, 2008.

'We present a new class of statistical de- anonymization attacks against high-dimensional micro-data, such as individual preferences, recommendations, transaction records and so on. Our techniques are robust to perturbation in the data and tolerate some mistakes in the adversary's background knowledge. We apply our de-anonymization methodology to the Netflix Prize dataset, which contains anonymous movie ratings of 500,000 subscribers of Netflix, the world's largest online movie rental service. We demonstrate that an adversary who knows only a little bit about an individual subscriber can easily identify this subscriber's record in the dataset. Using the Internet Movie Database as the source of background knowledge, we successfully identified the Netflix records of known users, uncovering their apparent political preferences and other potentially sensitive information.'
anonymisation  anonymization  sanitisation  databases  data-dumps  privacy  security  papers 
june 2014 by jm
NYC generates hash-anonymised data dump, which gets reversed
There are about 1000*26**3 = 21952000 or 22M possible medallion numbers. So, by calculating the md5 hashes of all these numbers (only 24M!), one can completely deanonymise the entire data. Modern computers are fast: so fast that computing the 24M hashes took less than 2 minutes.

(via Bruce Schneier)

The better fix is a HMAC (see ), or just to assign opaque IDs instead of hashing.
hashing  sha1  md5  bruce-schneier  anonymization  deanonymization  security  new-york  nyc  taxis  data  big-data  hmac  keyed-hashing  salting 
june 2014 by jm

Copy this bookmark: