rvenkat + privacy   50

Sexual Privacy by Danielle Keats Citron :: SSRN
Those who wish to control, expose, and damage the identities of individuals routinely do so by invading their privacy. People are secretly recorded in bedrooms and public bathrooms, and “up their skirts.” They are coerced into sharing nude photographs and filming sex acts under the threat of public disclosure of their nude images. People’s nude images are posted online without permission. Machine-learning technology is used to create digitally manipulated “deep fake” sex videos that swap people’s faces into pornography.

At the heart of these abuses is an invasion of sexual privacy—the behaviors and expectations that manage access to, and information about, the human body; intimate activities; and personal choices about the body and intimate information. More often, women, nonwhites, sexual minorities, and minors shoulder the abuse.

Sexual privacy is a distinct privacy interest that warrants recognition and protection. It serves as a cornerstone for sexual autonomy and consent. It is foundational to intimacy. Its denial results in the subordination of marginalized communities. Traditional privacy law’s efficacy, however, is eroding just as digital technologies magnify the scale and scope of the harm. This Article suggests an approach to sexual privacy that focuses on law and markets. Law should provide federal and state penalties for privacy invaders, remove the statutory immunity from liability for certain content platforms, and work in tandem with hate crime laws. Market efforts should be pursued if they enhance the overall privacy interests of all involved.
privacy  law  civil_rights  networked_public_sphere  via:zeynep 
12 days ago by rvenkat
[1810.08130] Private Machine Learning in TensorFlow using Secure Computation
We present a framework for experimenting with secure multi-party computation directly in TensorFlow. By doing so we benefit from several properties valuable to both researchers and practitioners, including tight integration with ordinary machine learning processes, existing optimizations for distributed computation in TensorFlow, high-level abstractions for expressing complex algorithms and protocols, and an expanded set of familiar tooling. We give an open source implementation of a state-of-the-art protocol and report on concrete benchmarks using typical models from private machine learning.
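
-- A minimal Python sketch of additive secret sharing, one common building block of secure multi-party computation. It is not the framework's API, and the abstract does not say which protocol the paper implements; the modulus, the function names, and the three-party setup are all assumptions made for illustration.

```python
import secrets

# Hypothetical modulus for illustration; real protocols fix their own ring or field.
MODULUS = 2**61 - 1

def share(value, n_parties=3):
    """Split an integer into n additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]

def reconstruct(shares):
    """Recombine all shares; any strict subset is uniformly random on its own."""
    return sum(shares) % MODULUS

def add_shared(a_shares, b_shares):
    """Each party adds its own two shares locally; no communication is needed."""
    return [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]

x_shares = share(25)
y_shares = share(17)
total = reconstruct(add_shared(x_shares, y_shares))
```

Addition works share by share; multiplication of shared values requires interaction between parties, which is where real protocols spend most of their effort.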
privacy  machine_learning  software  python  for_friends 
25 days ago by rvenkat
[1808.00023] The Measure and Mismeasure of Fairness: A Critical Review of Fair Machine Learning
The nascent field of fair machine learning aims to ensure that decisions guided by algorithms are equitable. Over the last several years, three formal definitions of fairness have gained prominence: (1) anti-classification, meaning that protected attributes (like race, gender, and their proxies) are not explicitly used to make decisions; (2) classification parity, meaning that common measures of predictive performance (e.g., false positive and false negative rates) are equal across groups defined by the protected attributes; and (3) calibration, meaning that conditional on risk estimates, outcomes are independent of protected attributes. Here we show that all three of these fairness definitions suffer from significant statistical limitations. Requiring anti-classification or classification parity can, perversely, harm the very groups they were designed to protect; and calibration, though generally desirable, provides little guarantee that decisions are equitable. In contrast to these formal fairness criteria, we argue that it is often preferable to treat similarly risky people similarly, based on the most statistically accurate estimates of risk that one can produce. Such a strategy, while not universally applicable, often aligns well with policy objectives; notably, this strategy will typically violate both anti-classification and classification parity. In practice, it requires significant effort to construct suitable risk estimates. One must carefully define and measure the targets of prediction to avoid retrenching biases in the data. But, importantly, one cannot generally address these difficulties by requiring that algorithms satisfy popular mathematical formalizations of fairness. By highlighting these challenges in the foundation of fair machine learning, we hope to help researchers and practitioners productively advance the area.
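
-- A small Python sketch of how two of the three definitions can be checked on data: classification parity (here, false positive rates per group) and calibration (observed outcome rate per risk score and group). The records, group labels, and function names are hypothetical and not taken from the paper.

```python
from collections import defaultdict

# Hypothetical records: (group, risk_score, decision, outcome)
records = [
    ("a", 0.8, 1, 1), ("a", 0.8, 1, 0), ("a", 0.2, 0, 0), ("a", 0.2, 0, 0),
    ("b", 0.8, 1, 1), ("b", 0.8, 1, 1), ("b", 0.2, 1, 0), ("b", 0.2, 0, 0),
]

def false_positive_rate(rows):
    """Share of true negatives (outcome 0) that received a positive decision."""
    negatives = [r for r in rows if r[3] == 0]
    return sum(r[2] for r in negatives) / len(negatives)

def fpr_by_group(rows):
    """Classification parity check: compare false positive rates across groups."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[0]].append(r)
    return {g: false_positive_rate(rs) for g, rs in groups.items()}

def outcome_rate_by_score(rows):
    """Calibration check: observed outcome rate for each (risk score, group)."""
    cells = defaultdict(list)
    for group, score, _, outcome in rows:
        cells[(score, group)].append(outcome)
    return {k: sum(v) / len(v) for k, v in cells.items()}

fpr = fpr_by_group(records)
calibration = outcome_rate_by_score(records)
```

In this toy data the false positive rates differ between groups, and the same score of 0.8 corresponds to different outcome rates in each group, so it satisfies neither criterion. Anti-classification is a property of the model's inputs rather than its outputs, so it is not something this kind of audit can measure.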
machine_learning  algorithms  bias  ethics  privacy  review  for_friends 
august 2018 by rvenkat
[1803.09007] Quantifying Surveillance in the Networked Age: Node-based Intrusions and Group Privacy
From the "right to be left alone" to the "right to selective disclosure", privacy has long been thought of as the control individuals have over the information they share and reveal about themselves. However, in a world that is more connected than ever, the choices of the people we interact with increasingly affect our privacy. This forces us to rethink our definition of privacy. We here formalize and study, as local and global node- and edge-observability, Bloustein's concept of group privacy. We prove edge-observability to be independent of the graph structure, while node-observability depends only on the degree distribution of the graph. We show on synthetic datasets that, for attacks spanning several hops such as those implemented by social networks and current US laws, the presence of hubs increases node-observability while a high clustering coefficient decreases it, at fixed density. We then study the edge-observability of a large real-world mobile phone dataset over a month and show that, even under the restricted two-hops rule, compromising as little as 1% of the nodes leads to observing up to 46% of all communications in the network. More worrisome, we also show that on average 36% of each person's communications would be locally edge-observable under the same rule. Finally, we use real sensing data to show how people living in cities are vulnerable to distributed node-observability attacks. Using a smartphone app to compromise 1% of the population, an attacker could monitor the location of more than half of London's population. Taken together, our results show that the current individual-centric approach to privacy and data protection does not encompass the realities of modern life. This makes us, as a society, vulnerable to large-scale surveillance attacks which we need to develop protections against.
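
-- A rough Python sketch of the idea behind edge-observability: compromise a few nodes and count how many edges become visible. The definition used here is a simplification and an assumption, not the paper's formal one: an edge counts as observed if at least one endpoint lies within `reach` hops of a compromised node. The toy graph and function names are hypothetical.

```python
from collections import deque

def nodes_within(adj, sources, reach):
    """Breadth-first search: nodes at most `reach` hops from any source."""
    seen = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        if seen[node] == reach:
            continue  # do not expand past the hop limit
        for nb in adj.get(node, ()):
            if nb not in seen:
                seen[nb] = seen[node] + 1
                queue.append(nb)
    return set(seen)

def observed_edge_fraction(edges, compromised, reach):
    """Fraction of edges with at least one endpoint the attacker can watch."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    watched = nodes_within(adj, list(compromised), reach)
    observed = [e for e in edges if e[0] in watched or e[1] in watched]
    return len(observed) / len(edges)

# Hypothetical toy graph: a path 0-1-2-3-4-5
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
direct = observed_edge_fraction(edges, {0}, reach=0)
one_hop = observed_edge_fraction(edges, {0}, reach=1)
```

Even on this path graph, each extra hop of reach exposes another edge from a single compromised node; graphs with hubs would expose far more.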
networks  privacy  networked_life  social_networks 
march 2018 by rvenkat
Monophily in social networks introduces similarity among friends-of-friends | Nature Human Behaviour
The observation that individuals tend to be friends with people who are similar to themselves, commonly known as homophily, is a prominent feature of social networks. While homophily describes a bias in attribute preferences for similar others, it gives limited attention to variability. Here, we observe that attribute preferences can exhibit variation beyond what can be explained by homophily. We call this excess variation monophily to describe the presence of individuals with extreme preferences for a particular attribute possibly unrelated to their own attribute. We observe that monophily can induce a similarity among friends-of-friends without requiring any similarity among friends. To simulate homophily and monophily in synthetic networks, we propose an overdispersed extension of the classical stochastic block model. We use this model to demonstrate how homophily-based methods for predicting attributes on social networks based on friends (that is, 'the company you keep') are fundamentally different from monophily-based methods based on friends-of-friends (that is, 'the company you’re kept in'). We place particular focus on predicting gender, where homophily can be weak or non-existent in practice. These findings offer an alternative perspective on network structure and prediction, complicating the already difficult task of protecting privacy on social networks.

social_networks  privacy  network_data_analysis  latent_variable  block_model  via:clauset  networks  teaching  ? 
march 2018 by rvenkat
[1803.02887] A first look at browser-based Cryptojacking
In this paper, we examine the recent trend towards in-browser mining of cryptocurrencies; in particular, the mining of Monero through Coinhive and similar codebases. In this model, a user visiting a website will download a JavaScript code that executes client-side in her browser, mines a cryptocurrency, typically without her consent or knowledge, and pays out the seigniorage to the website. Websites may consciously employ this as an alternative or to supplement advertisement revenue, may offer premium content in exchange for mining, or may be unwittingly serving the code as a result of a breach (in which case the seigniorage is collected by the attacker). The cryptocurrency Monero is preferred seemingly for its unfriendliness to large-scale ASIC mining that would drive browser-based efforts out of the market, as well as for its purported privacy features. In this paper, we survey this landscape, conduct some measurements to establish its prevalence and profitability, outline an ethical framework for considering whether it should be classified as an attack or business opportunity, and make suggestions for the detection, mitigation and/or prevention of browser-based mining for non-consenting users.
via:randw  cryptocurrency  cybersecurity  privacy  consumer_protection  regulation 
march 2018 by rvenkat
Federal Statistics, Multiple Data Sources, and Privacy Protection: Next Steps | The National Academies Press
The environment for obtaining information and providing statistical data for policy makers and the public has changed significantly in the past decade, raising questions about the fundamental survey paradigm that underlies federal statistics. New data sources provide opportunities to develop a new paradigm that can improve timeliness, geographic or subpopulation detail, and statistical efficiency. It also has the potential to reduce the costs of producing federal statistics.

The panel's first report described federal statistical agencies’ current paradigm, which relies heavily on sample surveys for producing national statistics, and challenges agencies are facing; the legal frameworks and mechanisms for protecting the privacy and confidentiality of statistical data and for providing researchers access to data, and challenges to those frameworks and mechanisms; and statistical agencies’ access to alternative sources of data. The panel recommended a new approach for federal statistical programs that would combine diverse data sources from government and private sector sources and the creation of a new entity that would provide the foundational elements needed for this new approach, including legal authority to access data and protect privacy.

This second of the panel's two reports builds on the analysis, conclusions, and recommendations in the first one. This report assesses alternative methods for implementing a new approach that would combine diverse data sources from government and private sector sources, including describing statistical models for combining data from multiple sources; examining statistical and computer science approaches that foster privacy protections; evaluating frameworks for assessing the quality and utility of alternative data sources; and various models for implementing the recommended new entity. Together, the two reports offer ideas and recommendations to help federal statistical agencies examine and evaluate data from alternative sources and then combine them as appropriate to provide the country with more timely, actionable, and useful information for policy makers, businesses, and individuals.
nap  report  data  privacy  governance  regulation  data_fusion 
december 2017 by rvenkat
I never signed up for this! Privacy implications of email tracking
We show that the simple act of viewing emails contains privacy pitfalls for the unwary. We assembled a corpus of commercial mailing-list emails, and find a network of hundreds of third parties that track email recipients via methods such as embedded pixels. About 30% of emails leak the recipient’s email address to one or more of these third parties when they are viewed. In the majority of cases, these leaks are intentional on the part of email senders, and further leaks occur if the recipient clicks links in emails. Mail servers and clients may employ a variety of defenses, but we analyze 16 servers and clients and find that they are far from comprehensive. We propose, prototype, and evaluate a new defense, namely stripping tracking tags from emails based on enhanced versions of existing web tracking protection lists.
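
-- A minimal Python sketch of the kind of defense the abstract describes: dropping image tags whose source host appears on a tracking blocklist. This is not the authors' prototype; the blocklist entries, the regular expressions, and the function names are assumptions, and regex-based HTML handling is only adequate for illustration.

```python
import re
from urllib.parse import urlparse

# Hypothetical blocklist; the paper builds on existing web tracking protection lists.
TRACKER_HOSTS = {"tracker.example", "pixel.example"}

IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
SRC_ATTR = re.compile(r"""\bsrc\s*=\s*["']([^"']+)["']""", re.IGNORECASE)

def is_tracker(url):
    """True if the URL's host is a listed tracker or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return any(host == t or host.endswith("." + t) for t in TRACKER_HOSTS)

def strip_tracking_images(html):
    """Remove <img> tags that load from a listed tracker; keep all others."""
    def replace(match):
        tag = match.group(0)
        src = SRC_ATTR.search(tag)
        if src and is_tracker(src.group(1)):
            return ""
        return tag
    return IMG_TAG.sub(replace, html)

email_html = (
    '<p>Hi</p>'
    '<img src="https://a.tracker.example/o.gif?e=me@x.org" width="1">'
    '<img src="https://cdn.shop.example/logo.png">'
)
cleaned = strip_tracking_images(email_html)
```

The tracking pixel, which carries the recipient's address in its query string, is removed while the ordinary logo image survives.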
data  privacy  ethics  surveillance  civil_rights 
december 2017 by rvenkat
Surveillance Intermediaries by Alan Z. Rozenshtein :: SSRN
Apple’s 2016 fight against a court order commanding it to help the FBI unlock the iPhone of one of the San Bernardino terrorists exemplifies how central the question of regulating government surveillance has become in American politics and law. But scholarly attempts to answer this question have suffered from a serious omission: scholars have ignored how government surveillance is checked by “surveillance intermediaries,” the companies like Apple, Google, and Facebook that dominate digital communications and data storage, and on whose cooperation government surveillance relies. This Article fills this gap in the scholarly literature, providing the first comprehensive analysis of how surveillance intermediaries constrain the surveillance executive. In so doing, it enhances our conceptual understanding of, and thus our ability to improve, the institutional design of government surveillance.

Surveillance intermediaries have the financial and ideological incentives to resist government requests for user data. Their techniques of resistance are: proceduralism and litigiousness that reject voluntary cooperation in favor of minimal compliance and aggressive litigation; technological unilateralism that designs products and services to make surveillance harder; and policy mobilization that rallies legislative and public opinion to limit surveillance. Surveillance intermediaries also enhance the “surveillance separation of powers”; they make the surveillance executive more subject to inter-branch constraints from Congress and the courts, and to intra-branch constraints from foreign-relations and economics agencies as well as the surveillance executive’s own surveillance-limiting components.

The normative implications of this descriptive account are important and cross-cutting. Surveillance intermediaries can both improve and worsen the “surveillance frontier”: the set of tradeoffs — between public safety, privacy, and economic growth — from which we choose surveillance policy. And while intermediaries enhance surveillance self-government when they mobilize public opinion and strengthen the surveillance separation of powers, they undermine it when their unilateral technological changes prevent the government from exercising its lawful surveillance authorities.
surveillance  big_data  privacy  algorithms  ethics  law  civil_rights  GAFA 
october 2017 by rvenkat
UW ADINT: Advertising as Surveillance
Targeted advertising is at the heart of the largest technology companies today, and is becoming increasingly precise. Simultaneously, users generate more and more personal data that is shared with advertisers as more and more of daily life becomes intertwined with networked technology. There are many studies about how users are tracked and what kinds of data are gathered. The sheer scale and precision of individual data that is collected can be concerning. However, in the broader public debate about these practices this concern is often tempered by the understanding that all this potentially sensitive data is only accessed by large corporations; these corporations are profit-motivated and could be held to account for misusing the personal data they have collected. In this work we examine the capability of a different actor -- an individual with a modest budget -- to access the data collected by the advertising ecosystem. Specifically, we find that an individual can use the targeted advertising system to conduct physical and digital surveillance on targets that use smartphone apps with ads.

-- overdramatized version here
computaional_advertising  surveillance  data  privacy  technology  GAFA 
october 2017 by rvenkat
Free Speech in the Algorithmic Society: Big Data, Private Governance, and New School Speech Regulation by Jack M. Balkin :: SSRN
We have now moved from the early days of the Internet to the Algorithmic Society. The Algorithmic Society features the use of algorithms, artificial intelligence agents, and Big Data to govern populations. It also features digital infrastructure companies, large multi-national social media platforms, and search engines that sit between traditional nation states and ordinary individuals, and serve as special-purpose governors of speech.

The Algorithmic Society presents two central problems for freedom of expression. First, Big Data allows new forms of manipulation and control, which private companies will attempt to legitimate and insulate from regulation by invoking free speech principles. Here First Amendment arguments will likely be employed to forestall digital privacy guarantees and prevent consumer protection regulation. Second, privately owned digital infrastructure companies and online platforms govern speech much as nation states once did. Here the First Amendment, as normally construed, is simply inadequate to protect the practical ability to speak.

The first part of the essay describes how to regulate online businesses that employ Big Data and algorithmic decision making consistent with free speech principles. Some of these businesses are "information fiduciaries" toward their end-users; they must exercise duties of good faith and non-manipulation. Other businesses who are not information fiduciaries have a duty not to engage in "algorithmic nuisance": they may not externalize the costs of their analysis and use of Big Data onto innocent third parties.

The second part of the essay turns to the emerging pluralist model of online speech regulation. This pluralist model contrasts with the traditional dyadic model in which nation states regulated the speech of their citizens.

In the pluralist model, territorial governments continue to regulate the speech directly. But they also attempt to coerce or co-opt owners of digital infrastructure to regulate the speech of others. This is "new school" speech regulation. Digital infrastructure owners, and especially social media companies, now act as private governors of speech communities, creating and enforcing various rules and norms of the communities they govern. Finally, end users, civil society organizations, hackers, and other private actors repeatedly put pressure on digital infrastructure companies to regulate speech in certain ways and not to regulate it in others. This triangular tug of war -- rather than the traditional dyadic model of states regulating the speech of private parties -- characterizes the practical ability to speak in the algorithmic society.

The essay uses the examples of the right to be forgotten and the problem of fake news to illustrate the emerging pluralist model -- and new school speech regulation -- in action.

As private governance becomes central to freedom of speech, both end-users and nation states put pressure on private governance. Nation states attempt to co-opt private companies into becoming bureaucracies for the enforcement of hate speech regulation and new doctrines like the right to be forgotten. Conversely, end users increasingly demand procedural guarantees, due process, transparency, and equal protection from private online companies.

The more that end-users view businesses as governors, or as special-purpose sovereigns, the more end-users will expect -- and demand -- that these companies should conform to the basic obligations of governors towards those they govern. These obligations include procedural fairness in handling complaints and applying sanctions, notice, transparency, reasoned explanations, consistency, and conformity to rule of law values -- the “law” in this case being the publicly stated norms and policies of the company. Digital infrastructure companies, in turn, will find that they must take on new social obligations to meet these growing threats and expectations from nation states and end-users alike.
freedom_of_speech  internet  regulation  governance  administrative_state  big_data  algorithms  privacy  data  artificial_intelligence  machine_learning  ethics  philosophy_of_technology  new_media  social_media  networked_public_sphere  public_sphere  GAFA 
september 2017 by rvenkat
Artificial Intelligence's Fair Use Crisis by Benjamin L. W. Sobel :: SSRN
As automation supplants more forms of labor, creative expression still seems like a distinctly human enterprise. This may someday change: by ingesting works of authorship as “training data,” computer programs can teach themselves to write natural prose, compose music, and generate movies. Machine learning is an artificial intelligence (AI) technology with immense potential and a commensurate appetite for copyrighted works. In the United States, the copyright law mechanism most likely to facilitate machine learning’s uses of protected data is the fair use doctrine. However, current fair use doctrine threatens either to derail the progress of machine learning or to disenfranchise the human creators whose work makes it possible.

This Article addresses the problem in three parts: using popular machine learning datasets and research as case studies, Part I describes how programs “learn” from corpora of copyrighted works and catalogs the legal risks of this practice. It concludes that fair use may not protect expressive machine learning applications, including the burgeoning field of natural language generation. Part II explains that applying today’s fair use doctrine to expressive machine learning will yield one of two undesirable outcomes: if US courts reject the fair use defense for machine learning, valuable innovation may move to another jurisdiction or halt entirely; alternatively, if courts find the technology to be fair use, sophisticated software may divert rightful earnings from the authors of input data. This dilemma shows that fair use may no longer serve its historical purpose. Traditionally, fair use is understood to benefit the public by fostering expressive activity. Today, the doctrine increasingly serves the economic interests of powerful firms at the expense of disempowered individual rightsholders. Finally, in Part III, this Article contemplates changes in doctrine and policy that could address these problems. It concludes that the United States’ interest in avoiding both prongs of AI’s fair use dilemma offers a novel justification for redistributive measures that could promote social equity alongside technological progress.
artificial_intelligence  machine_learning  big_data  ethics  privacy  intellectual_property  law  united_states_of_america 
september 2017 by rvenkat
Predicting Financial Crime: Augmenting the Predictive Policing Arsenal
Financial crime is a rampant but hidden threat. In spite of this, predictive policing systems disproportionately target “street crime” rather than white collar crime. This paper presents the White Collar Crime Early Warning System (WCCEWS), a white collar crime predictive model that uses random forest classifiers to identify high risk zones for incidents of financial crime.

--looks similar to Berk's recidivism prediction methods.
big_data  machine_learning  ethics  privacy  prediction  policing  i_remain_skeptical 
april 2017 by rvenkat
Exposed! A Survey of Attacks on Private Data - Annual Review of Statistics and Its Application, 4(1):
Privacy-preserving statistical data analysis addresses the general question of protecting privacy when publicly releasing information about a sensitive dataset. A privacy attack takes seemingly innocuous released information and uses it to discern the private details of individuals, thus demonstrating that such information compromises privacy. For example, re-identification attacks have shown that it is easy to link supposedly de-identified records to the identity of the individual concerned. This survey focuses on attacking aggregate data, such as statistics about how many individuals have a certain disease, genetic trait, or combination thereof. We consider two types of attacks: reconstruction attacks, which approximately determine a sensitive feature of all the individuals covered by the dataset, and tracing attacks, which determine whether or not a target individual’s data is included in the dataset. We also discuss techniques from the differential privacy literature for releasing approximate aggregate statistics while provably thwarting any privacy attack.
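
-- A short Python sketch of the Laplace mechanism, one standard technique from the differential privacy literature for releasing an approximate count. It is not taken from the survey itself; the function names and the sample data are hypothetical, and this floating-point implementation is not hardened for production use.

```python
import random

def laplace_noise(scale, rng=random):
    """Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng=random):
    """Counting query (sensitivity 1) with Laplace noise of scale 1/epsilon."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical dataset: True means the individual has the sensitive trait.
has_trait = [True, False, True, True, False]
rng = random.Random(0)
noisy = private_count(has_trait, lambda r: r, epsilon=1.0, rng=rng)
```

Adding or removing one person changes the true count by at most 1, so noise of scale 1/epsilon makes the released value epsilon-differentially private. A smaller epsilon means more noise and stronger protection.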
review  data  privacy  statistics  machine_learning 
december 2016 by rvenkat
[1609.05807] Inherent Trade-Offs in the Fair Determination of Risk Scores
Recent discussion in the public sphere about algorithmic classification has involved tension between competing notions of what it means for a probabilistic classification to be fair to different groups. We formalize three fairness conditions that lie at the heart of these debates, and we prove that except in highly constrained special cases, there is no method that can satisfy these three conditions simultaneously. Moreover, even satisfying all three conditions approximately requires that the data lie in an approximate version of one of the constrained special cases identified by our theorem. These results suggest some of the ways in which key notions of fairness are incompatible with each other, and hence provide a framework for thinking about the trade-offs between them.

-- with Jon Kleinberg
data  ethics  algorithms  big_data  privacy  sendhil.mullainathan 
december 2016 by rvenkat
Christo Wilson
-- works on fairness of algorithms, especially personalization systems.
people  data  privacy  ethics 
august 2016 by rvenkat
The Secretive World of Selling Data About You
Third, and most disturbing, there’s nothing consumers can do about any of this. They don’t know what data is being collected, or by whom. They don’t know what’s being done with it. They don’t know where it is going. They probably imagine specific lists being sent around, not calculated scores that may seem unrelated to the original data. And if they are concerned, there’s no way to see or correct the information about themselves being passed around.
via:henryfarrell  data  privacy  big_data  ethics  for_friends 
june 2016 by rvenkat
Evaluating the privacy properties of telephone metadata
Privacy protections against government surveillance are often scoped to communications content and exclude communications metadata. In the United States, the National Security Agency operated a particularly controversial program, collecting bulk telephone metadata nationwide. We investigate the privacy properties of telephone metadata to assess the impact of policies that distinguish between content and metadata. We find that telephone metadata is densely interconnected, can trivially be reidentified, enables automated location and relationship inferences, and can be used to determine highly sensitive traits.
teaching  data  ethics  big_data  privacy  policy  surveillance 
may 2016 by rvenkat
The Un-Territoriality of Data by Jennifer C. Daskal :: SSRN
Territoriality looms large in our jurisprudence, particularly as it relates to the government’s authority to search and seize. Fourth Amendment rights turn on whether the search or seizure takes place territorially or extraterritorially; the government’s surveillance authorities depend on whether the target is located within the United States or without; and courts’ warrant jurisdiction extends, with limited exceptions, only to the border’s edge. Yet the rise of electronic data challenges territoriality at its core. Territoriality, after all, depends on the ability to define the relevant “here” and “there,” and it presumes that the “here” and “there” have normative significance. The ease and speed with which data travels across borders, the seemingly arbitrary paths it takes, and the physical disconnect between where data is stored and where it is accessed, critically test these foundational premises. Why should either privacy rights or government access to sought-after evidence depend on where a document is stored at any given moment? Conversely, why should State A be permitted to unilaterally access data located in State B, simply because technology allows it to do so, without regard to State B’s rules governing law enforcement access to data held within its borders?

This article tackles these challenges. It explores the unique features of data, and highlights the ways in which data undermines long-standing assumptions about the link between data location and the rights and obligations that ought to apply. Specifically, it argues that a territorial-based Fourth Amendment fails to adequately protect “the people” it is intended to cover. On the flip side, the article warns against the kind of unilateral, extraterritorial law enforcement that electronic data encourages — in which nations compel the production of data located anywhere around the globe, without regard to the sovereign interests of other nation-states.
data  privacy  via:henryfarrell 
august 2015 by rvenkat
David Auerbach
He writes for Slate on all things data- and technology-related. He has written extensively on the recent Facebook studies.
data_journalism  privacy  technology 
may 2015 by rvenkat
UM Carey Law | Frank Pasquale
Has written a book on internet privacy, big data and all that (The Black Box Society).
data  privacy  law 
may 2015 by rvenkat
The Government’s Consumer Data Watchdog - NYTimes.com
NYTimes article. Mentions Ashkan Soltani, the new chief technologist at the FTC. The article also mentions Edward Felten, a computer security professor at Princeton, and Latanya Sweeney, a professor of government at Harvard, among others. Soltani heads the Office of Technology Research and Investigation.

Should follow them more; they seem balanced in their opinions. They express concerns but don't work up paranoia.
privacy  tracking  ftc  consumer_protection 
may 2015 by rvenkat

