jm + anonymisation   4

Estimating the success of re-identifications in incomplete datasets using generative models | Nature Communications
Using our model, we find that 99.98% of Americans would be correctly re-identified in any dataset using 15 demographic attributes. Our results suggest that even heavily sampled anonymized datasets are unlikely to satisfy the modern standards for anonymization set forth by GDPR and seriously challenge the technical and legal adequacy of the de-identification release-and-forget model.
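The paper's headline number comes from modelling how likely an attribute combination is to be unique in the population. A minimal sketch of the underlying idea, using invented toy records rather than the paper's copula-based generative model: count how many records are unique on their demographic attributes.

```python
from collections import Counter

# Hypothetical toy records: tuples of demographic attributes
# (zip code, birth year, gender). Real datasets would carry up to
# 15 such attributes, as in the paper.
records = [
    ("60629", 1985, "F"),
    ("60629", 1985, "F"),
    ("60614", 1972, "M"),
    ("60607", 1990, "F"),
]

def uniqueness(records):
    """Fraction of records whose attribute combination is unique --
    a crude empirical proxy for re-identification risk."""
    counts = Counter(records)
    return sum(1 for r in records if counts[r] == 1) / len(records)

print(uniqueness(records))  # 0.5: two of the four records are unique
```

The paper's contribution is estimating this uniqueness for the full population from a heavily sampled release, which is what makes "we only published a sample" an inadequate defence.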


ouch.
deanonymization  deidentification  anonymization  anonymisation  gdpr  privacy  data-privacy  papers 
27 days ago by jm
Excellent example of failed "anonymisation" of a dataset
Fred Logue notes how this failed Mayo TD Michelle Mulherin:
From recent reports it now appears that the Department of Education is discussing anonymisation of the Primary Online Database with the Data Protection Commissioner. Well someone should ask Mayo TD Michelle Mulherin how anonymisation is working for her.

The Sunday Times reports that Ms Mulherin was the only TD in the Irish parliament on the dates when expensive phone calls were made to a mobile number in Kenya. The details of the calls were released under the Freedom of Information Act in an “anonymised” database. While it must be said the fact that Ms Mulherin was the only TD present on those occasions does not prove she made the calls – the reporting in the press is now raising the possibility that it was her.

From a data protection point of view this is a perfect example of the difficulty with anonymisation. Data protection rules apply to personal data, which is defined as data relating to a living individual who is or can be identified from the data, or from the data in conjunction with other information. Anonymisation is often cited as a means of processing data outside the scope of data protection law, but as Ms Mulherin has discovered, individuals can be identified using supposedly anonymised data when it is analysed in conjunction with other data.

In the case of the mysterious calls to Kenya even though the released information was “anonymised” to protect the privacy of public representatives, the phone log used in combination with the attendance record of public representatives and information on social media was sufficient to identify individuals and at least raise evidence of association between individuals and certain phone calls. While this may be well and good in terms of accounting for abuses of the phone service it also has worrying implications for the ability of public representatives to conduct their business in private.

The bottom line is that anonymisation is very difficult, if not impossible, as Ms Mulherin has learned to her cost. It is certainly a lot more complex than simply removing names and other identifying features from a single dataset. The more data there is, and the more diverse its sources, the greater the risk that individuals can be identified from supposedly anonymised datasets.
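The linkage attack described above is trivially simple to execute. A minimal sketch with invented data (the real records are not public): join an "anonymised" phone log against a public attendance record on the dates the calls were made.

```python
# "Anonymised" phone log: caller identity removed, dates kept.
phone_log = [
    ("2013-04-02", "Kenya"),
    ("2013-05-10", "Kenya"),
]

# Public attendance record: date -> set of TDs present that day.
# These names and dates are invented for illustration.
attendance = {
    "2013-04-02": {"TD A"},
    "2013-05-10": {"TD A", "TD B"},
}

# Intersect the attendees across all call dates: anyone present on
# every date is a candidate caller.
candidates = set.intersection(*(attendance[date] for date, _ in phone_log))
print(candidates)  # {'TD A'}: only one person fits every call
```

As the excerpt notes, this identifies a candidate rather than proving who made the calls, but the privacy damage is done either way.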
data  anonymisation  fred-logue  ireland  michelle-mulherin  tds  kenya  data-protection  privacy 
january 2015 by jm
HSE data releases may be de-anonymisable
Although the data has been kept anonymous, the increasing sophistication of computer-driven data-mining techniques has led to fears patients could be identified.
A HSE spokesman confirmed yesterday that the office responded to requests for data from a variety of sources, including researchers, the universities, GPs, the media, health insurers and pharmaceutical companies. An average of about two requests a week was received. [...]
The information provided by the HPO has significant patient identifiers removed, such as name and date of birth. According to the HSE spokesman, individual patient information is not provided and, where information is sought for a small group of patients, this is not provided where the number involved is under five. “In such circumstances, it is highly unlikely that anyone could be identified. Nevertheless, we will have another look at data releases from the office,” he said.


I'd say this could be readily reversible, from the sounds of it.
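The HSE's "under five" rule quoted above is small-cell suppression, a standard but weak disclosure control. A minimal sketch with hypothetical group counts:

```python
def suppress_small_cells(table, threshold=5):
    """Sketch of the HSE-style rule: drop any group whose count
    falls below the threshold before releasing the table."""
    return {group: n for group, n in table.items() if n >= threshold}

# Invented example counts, not real HSE data.
counts = {"procedure_x": 12, "procedure_y": 3, "procedure_z": 7}
print(suppress_small_cells(counts))  # procedure_y (count 3) suppressed
```

Suppression of small cells in a single release does nothing against an attacker who can difference multiple overlapping releases, which is why the "readily reversible" worry is well founded.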
anonymisation  sanitisation  data-dumps  hse  health  privacy  via:tjmcintyre 
june 2014 by jm
'Robust De-anonymization of Large Sparse Datasets' [pdf]
paper by Arvind Narayanan and Vitaly Shmatikov, 2008.

'We present a new class of statistical de-anonymization attacks against high-dimensional micro-data, such as individual preferences, recommendations, transaction records and so on. Our techniques are robust to perturbation in the data and tolerate some mistakes in the adversary's background knowledge. We apply our de-anonymization methodology to the Netflix Prize dataset, which contains anonymous movie ratings of 500,000 subscribers of Netflix, the world's largest online movie rental service. We demonstrate that an adversary who knows only a little bit about an individual subscriber can easily identify this subscriber's record in the dataset. Using the Internet Movie Database as the source of background knowledge, we successfully identified the Netflix records of known users, uncovering their apparent political preferences and other potentially sensitive information.'
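The core of the attack is scoring each candidate record by how well it agrees with the adversary's partial background knowledge. A heavily simplified sketch of that idea with invented data (the paper's actual scoring function also weights rare items more and handles approximate dates and ratings):

```python
def match_score(aux, record):
    """Fraction of the adversary's background knowledge (aux) that a
    candidate record agrees with; tolerates some mistakes in aux."""
    hits = sum(1 for item, rating in aux.items()
               if record.get(item) == rating)
    return hits / len(aux)

# Invented toy dataset: anonymised movie ratings keyed by record id.
dataset = {
    "user_1": {"Movie A": 5, "Movie B": 2, "Movie C": 4},
    "user_2": {"Movie A": 3, "Movie D": 5},
}

# The adversary knows a few of the target's ratings (e.g. scraped
# from IMDb reviews); one of them is wrong.
aux = {"Movie A": 5, "Movie C": 4, "Movie D": 1}

best = max(dataset, key=lambda rid: match_score(aux, dataset[rid]))
print(best)  # user_1: matches 2 of the 3 known ratings
```

Because the data is sparse and high-dimensional, even a handful of approximately known ratings is usually enough to make one record stand out, which is exactly the paper's result.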
anonymisation  anonymization  sanitisation  databases  data-dumps  privacy  security  papers 
june 2014 by jm