Adversarial Robustness - Theory and Practice
"This web page contains materials to accompany the NeurIPS 2018 tutorial, “Adversarial Robustness: Theory and Practice”, by Zico Kolter and Aleksander Madry. The notes are in very early draft form, and we will be updating them (organizing material more, writing them in a more consistent form with the relevant citations, etc) for an official release in early 2019. Until then, however, we hope they are still a useful reference that can be used to explore some of the key ideas and methodology behind adversarial robustness, from standpoints of both generating adversarial attacks on classifiers and training classifiers that are inherently robust."
4 days ago
Which US cities have good and bad public transportation - Vox
"Christof Spieler, a structural engineer and urban planner from Houston, has lots of opinions about public transit in America and elsewhere. In his new book, Trains, Buses, People: An Opinionated Atlas of US Transit, he maps out 47 metro areas that have rail transit or bus rapid transit, ranks the best and worst systems, and offers advice on how to build better networks."
cities  transportation  books 
4 days ago
Compact Representation of Uncertainty in Clustering
For many classic structured prediction problems, probability distributions over the dependent variables can be efficiently computed using widely-known algorithms and data structures (such as forward-backward, and its corresponding trellis for exact probability distributions in Markov models). However, we know of no previous work studying efficient representations of exact distributions over clusterings. This paper presents definitions and proofs for a dynamic-programming inference procedure that computes the partition function, the marginal probability of a cluster, and the MAP clustering---all exactly. Rather than the Nth Bell number, these exact solutions take time and space proportional to the substantially smaller powerset of N. Indeed, we improve upon the time complexity of the algorithm introduced by Kohonen and Corander (2016) for this problem by a factor of N. While still large, this previously unknown result is intellectually interesting in its own right, makes feasible exact inference for important real-world small data applications (such as medicine), and provides a natural stepping stone towards sparse-trellis approximations that enable further scalability (which we also explore). In experiments, we demonstrate the superiority of our approach over approximate methods in analyzing real-world gene expression data used in cancer treatment.
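Note: a rough sense of the gap the abstract describes, comparing the number of clusterings of N items (the Nth Bell number) with the size of the powerset (2^N). This is only a counting sketch, not the paper's dynamic program; the function name `bell_number` is my own.

```python
def bell_number(n):
    """Return the nth Bell number using the Bell triangle."""
    row = [1]
    for _ in range(n):
        # Each new row starts with the last entry of the previous row.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]


def compare_sizes(n):
    """Return (number of clusterings, number of subsets) for n items."""
    return bell_number(n), 2 ** n


if __name__ == "__main__":
    for n in (5, 10, 15, 20):
        clusterings, subsets = compare_sizes(n)
        print(n, clusterings, subsets)
```

The Bell number overtakes 2^N from N = 5 onward and the ratio keeps growing, which is why a powerset-sized trellis is "substantially smaller" even though it is still exponential.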
clustering  uncertainty 
4 days ago
[1805.07820] Targeted Adversarial Examples for Black Box Audio Systems
The application of deep recurrent networks to audio transcription has led to impressive gains in automatic speech recognition (ASR) systems. Many have demonstrated that small adversarial perturbations can fool deep neural networks into incorrectly predicting a specified target with high confidence. Current work on fooling ASR systems has focused on white-box attacks, in which the model architecture and parameters are known. In this paper, we adopt a black-box approach to adversarial generation, combining the approaches of both genetic algorithms and gradient estimation to solve the task. We achieve an 89.25% targeted attack similarity after 3000 generations while maintaining 94.6% audio file similarity.
adversarial-examples  audio  black-box 
5 days ago
[1803.01814] Norm matters: efficient and accurate normalization schemes in deep networks
Over the past few years batch-normalization has been commonly used in deep networks, allowing faster training and high performance for a wide variety of applications. However, the reasons behind its merits remained unanswered, with several shortcomings that hindered its use for certain tasks. In this work we present a novel view on the purpose and function of normalization methods and weight-decay, as tools to decouple weights' norm from the underlying optimized objective. We also improve the use of weight-normalization and show the connection between practices such as normalization, weight decay and learning-rate adjustments. Finally, we suggest several alternatives to the widely used L2 batch-norm, using normalization in L1 and L∞ spaces that can substantially improve numerical stability in low-precision implementations as well as provide computational and memory benefits. We demonstrate that such methods enable the first batch-norm alternative to work for half-precision implementations.
neural-net  normalization 
5 days ago
[1806.10909] ResNet with one-neuron hidden layers is a Universal Approximator
We demonstrate that a very deep ResNet with stacked modules with one neuron per hidden layer and ReLU activation functions can uniformly approximate any Lebesgue integrable function in d dimensions, i.e. ℓ1(ℝ^d). Because of the identity mapping inherent to ResNets, our network has alternating layers of dimension one and d. This stands in sharp contrast to fully connected networks, which are not universal approximators if their width is the input dimension d [Lu et al, 2017; Hanin and Sellke, 2017]. Hence, our result implies an increase in representational power for narrow deep networks by the ResNet architecture.
resnet  neural-net  universal-approximator 
5 days ago
Modern Neural Networks Generalize on Small Data Sets
In this paper, we use a linear program to empirically decompose fitted neural networks into ensembles of low-bias sub-networks. We show that these sub-networks are relatively uncorrelated which leads to an internal regularization process, very much like a random forest, which can explain why a neural network is surprisingly resistant to overfitting. We then demonstrate this in practice by applying large neural networks, with hundreds of parameters per training observation, to a collection of 116 real-world data sets from the UCI Machine Learning Repository. This collection of data sets contains a much smaller number of training examples than the types of image classification tasks generally studied in the deep learning literature, as well as non-trivial label noise. We show that even in this setting deep neural nets are capable of achieving superior classification accuracy without overfitting.
neural-net  generalization  small-data  richard-berk 
5 days ago
[1808.01204] Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data
Neural networks have many successful applications, while much less theoretical understanding has been gained. Towards bridging this gap, we study the problem of learning a two-layer overparameterized ReLU neural network for multi-class classification via stochastic gradient descent (SGD) from random initialization. In the overparameterized setting, when the data comes from mixtures of well-separated distributions, we prove that SGD learns a network with a small generalization error, albeit the network has enough capacity to fit arbitrary labels. Furthermore, the analysis provides interesting insights into several aspects of learning neural networks and can be verified based on empirical studies on synthetic data and on the MNIST dataset.
neural-net  sgd  generalization 
5 days ago
[1811.00164] Deep Counterfactual Regret Minimization
Counterfactual Regret Minimization (CFR) is the leading algorithm for solving large imperfect-information games. It iteratively traverses the game tree in order to converge to a Nash equilibrium. In order to deal with extremely large games, CFR typically uses domain-specific heuristics to simplify the target game in a process known as abstraction. This simplified game is solved with tabular CFR, and its solution is mapped back to the full game. This paper introduces Deep Counterfactual Regret Minimization (Deep CFR), a form of CFR that obviates the need for abstraction by instead using deep neural networks to approximate the behavior of CFR in the full game. We show that Deep CFR is principled and achieves strong performance in large poker games. This is the first non-tabular variant of CFR to be successful in large games.
game-theory  deep-learning  counterfactual  regret-minimiation  cfr  algorithms 
11 days ago
If Only We’d Fucking Listen to Helen DeWitt - The Millions
"DeWitt can ignore that inkling in part because she’s made a public persona out of questioning its merits, especially when it comes to what she calls “normative publishing.” In her interviews, wielding the sort of rationalism more commonly associated with economists, tech entrepreneurs and utilitarians, she picks apart the industry’s conventions, like the puritanical separation of authors and type-setters, the unwillingness to experiment with new revenue models, especially ones perfected by the art world, and even the norm of meetings editors in cafés, where table space is too limited for spreading out one’s papers. My favorite of her frustrations is how when she met with prospective agents she couldn’t get them to agree to squeeze as much marketing juice as possible from any future suicides she attempted. “If I could have sold off a suicide attempt,” she said in a 2008 interview, “I would have had more time for reading Spinoza.” Duh."
publishing  helen-dewitt 
14 days ago
Language Log » A better way to calculate pitch range
"You might think that the many differences between the perceptual variable of pitch and the physical variable of fundamental frequency ("f0") arise because perception is complicated and physics is simple. But if so, you'd be mostly wrong. The biggest problem is that physical f0 is a complex and often fundamentally incoherent concept. And even in the areas where f0 is well defined, f0 estimation (usually called "pitch tracking") is prone to errors."
f0  pitch  estimation  speech  vocal-fry 
16 days ago
Ibis: Python Data Analysis Productivity Framework — Ibis v0.14.0+29.gc382cba documentation
Ibis is a toolbox to bridge the gap between local Python environments (like pandas and scikit-learn) and remote storage and execution systems like Hadoop components (like HDFS, Impala, Hive, Spark) and SQL databases (Postgres, etc.). Its goal is to simplify analytical workflows and make you more productive.
python  libs  data  sql 
16 days ago
Structural Causal Bandits: Where to Intervene?
We study the problem of identifying the best action in a sequential decision-making setting when the reward distributions of the arms exhibit a non-trivial dependence structure, which is governed by the underlying causal model of the domain where the agent is deployed. In this setting, playing an arm corresponds to intervening on a set of variables and setting them to specific values. In this paper, we show that whenever the underlying causal model is not taken into account during the decision-making process, the standard strategies of simultaneously intervening on all variables or on all the subsets of the variables may, in general, lead to suboptimal policies, regardless of the number of interventions performed by the agent in the environment. We formally acknowledge this phenomenon and investigate structural properties implied by the underlying causal model, which lead to a complete characterization of the relationships between the arms' distributions. We leverage this characterization to build a new algorithm that takes as input a causal structure and finds a minimal, sound, and complete set of qualified arms that an agent should play to maximize its expected reward. We empirically demonstrate that the new strategy learns an optimal policy and leads to orders of magnitude faster convergence rates when compared with its causal-insensitive counterparts.
bandits  causality  intervention 
16 days ago
Equality of Opportunity in Classification: A Causal Approach
The Equalized Odds (for short, EO) is one of the most popular measures of discrimination used in the supervised learning setting. It ascertains fairness through the balance of the misclassification rates (false positive and negative) across the protected groups -- e.g., in the context of law enforcement, an African-American defendant who would not commit a future crime will have an equal opportunity of being released, compared to a non-recidivating Caucasian defendant. Despite this noble goal, it has been acknowledged in the literature that statistical tests based on the EO are oblivious to the underlying causal mechanisms that generated the disparity in the first place (Hardt et al. 2016). This leads to a critical disconnect between statistical measures readable from the data and the meaning of discrimination in the legal system, where compelling evidence that the observed disparity is tied to a specific causal process deemed unfair by society is required to characterize discrimination. The goal of this paper is to develop a principled approach to connect the statistical disparities characterized by the EO and the underlying, elusive, and frequently unobserved, causal mechanisms that generated such inequality. We start by introducing a new family of counterfactual measures that allows one to explain the misclassification disparities in terms of the underlying mechanisms in an arbitrary, non-parametric structural causal model. This will, in turn, allow legal and data analysts to interpret currently deployed classifiers through causal lens, linking the statistical disparities found in the data to the corresponding causal processes. Leveraging the new family of counterfactual measures, we develop a learning procedure to construct a classifier that is statistically efficient, interpretable, and compatible with the basic human intuition of fairness. We demonstrate our results through experiments in both real (COMPAS) and synthetic datasets.
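Note: the statistical side of Equalized Odds (before any causal analysis) boils down to comparing false positive and false negative rates across groups. A minimal sketch on made-up toy labels; the function names and the data are illustrative assumptions, not the paper's code or the COMPAS data.

```python
def error_rates(y_true, y_pred):
    """Return (false positive rate, false negative rate) for binary labels."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    fpr = fp / negatives if negatives else float("nan")
    fnr = fn / positives if positives else float("nan")
    return fpr, fnr


def equalized_odds_gaps(y_true, y_pred, groups):
    """Return per-group error rates plus the largest FPR and FNR gaps."""
    rates = {}
    for g in sorted(set(groups)):
        idx = [i for i, v in enumerate(groups) if v == g]
        rates[g] = error_rates([y_true[i] for i in idx],
                               [y_pred[i] for i in idx])
    fprs = [r[0] for r in rates.values()]
    fnrs = [r[1] for r in rates.values()]
    return rates, max(fprs) - min(fprs), max(fnrs) - min(fnrs)


if __name__ == "__main__":
    # Toy data: two protected groups, "a" and "b" (hypothetical).
    y_true = [0, 0, 1, 1, 0, 0, 1, 1]
    y_pred = [0, 1, 1, 1, 0, 0, 1, 0]
    groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
    print(equalized_odds_gaps(y_true, y_pred, groups))
```

A classifier satisfies EO in this purely statistical sense when both gaps are zero; the abstract's point is that these gaps alone say nothing about which causal mechanism produced them.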
machine-learning  fairness  classification  evaluation-measures  causality 
16 days ago
[1811.07867] Prediction-Based Decisions and Fairness: A Catalogue of Choices, Assumptions, and Definitions
A recent flurry of research activity has attempted to quantitatively define "fairness" for decisions based on statistical and machine learning (ML) predictions. The rapid growth of this new field has led to wildly inconsistent terminology and notation, presenting a serious challenge for cataloguing and comparing definitions. This paper attempts to bring much-needed order.
First, we explicate the various choices and assumptions made---often implicitly---to justify the use of prediction-based decisions. Next, we show how such choices and assumptions can raise concerns about fairness and we present a notationally consistent catalogue of fairness definitions from the ML literature. In doing so, we offer a concise reference for thinking through the choices, assumptions, and fairness considerations of prediction-based decision systems.
statistics  machine-learning  fairness  causal-inference 
22 days ago
Polarization in Poland: A Warning From Europe - The Atlantic
"The emotional appeal of a conspiracy theory is in its simplicity. It explains away complex phenomena, accounts for chance and accidents, offers the believer the satisfying sense of having special, privileged access to the truth. But—once again—separating the appeal of conspiracy from the ways it affects the careers of those who promote it is very difficult. For those who become the one-party state’s gatekeepers, for those who repeat and promote the official conspiracy theories, acceptance of these simple explanations also brings another reward: power."
poland  hungary  politics  authoritarianism  conspiracy  smolensk 
25 days ago
Robustness checks are a joke - Statistical Modeling, Causal Inference, and Social Science
The problem as I see it is that robustness checks are supposed to be for exploration but are typically used for confirmation.

Maybe another way to put it is: As long as we recognize that robustness checks are typically used for confirmation, we can interpret them in that way. Thus, instead of taking a robustness check as evidence that a claimed finding is robust, we should take a robustness check as providing evidence on particular directions the model can be perturbed without changing the main conclusions.
hypothesis-testing  robustness  sensitivity 
28 days ago
[1810.12281] Three Mechanisms of Weight Decay Regularization
Weight decay is one of the standard tricks in the neural network toolbox, but the reasons for its regularization effect are poorly understood, and recent results have cast doubt on the traditional interpretation in terms of L2 regularization. Literal weight decay has been shown to outperform L2 regularization for optimizers for which they differ. We empirically investigate weight decay for three optimization algorithms (SGD, Adam, and K-FAC) and a variety of network architectures. We identify three distinct mechanisms by which weight decay exerts a regularization effect, depending on the particular optimization algorithm and architecture: (1) increasing the effective learning rate, (2) approximately regularizing the input-output Jacobian norm, and (3) reducing the effective damping coefficient for second-order optimization. Our results provide insight into how to improve the regularization of neural networks.
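Note: a scalar sketch of why "literal weight decay" and L2 regularization coincide for plain SGD but differ for an adaptive optimizer. The Adam step below is only the first bias-corrected step (where the update reduces to g / (|g| + eps)), and all function names and numbers are my own illustration rather than anything from the paper.

```python
import math


def sgd_l2(w, g, lr, lam):
    """SGD on the loss plus an L2 penalty: the penalty gradient lam*w joins g."""
    return w - lr * (g + lam * w)


def sgd_decay(w, g, lr, lam):
    """SGD with literal weight decay: shrink w separately from the gradient step."""
    return w - lr * g - lr * lam * w


def adam_first_step(g, eps=1e-8):
    """First bias-corrected Adam update: m_hat = g and v_hat = g**2."""
    return g / (math.sqrt(g * g) + eps)


def adam_l2(w, g, lr, lam):
    """Adam with L2: the penalty gradient is rescaled along with g."""
    return w - lr * adam_first_step(g + lam * w)


def adam_decay(w, g, lr, lam):
    """Adam with literal (decoupled) weight decay applied outside the rescaling."""
    return w - lr * adam_first_step(g) - lr * lam * w


if __name__ == "__main__":
    w, g, lr, lam = 1.0, 0.5, 0.1, 0.1
    print("SGD: ", sgd_l2(w, g, lr, lam), sgd_decay(w, g, lr, lam))
    print("Adam:", adam_l2(w, g, lr, lam), adam_decay(w, g, lr, lam))
```

For SGD the two updates are algebraically identical. For the Adam-style step the L2 term is absorbed into the normalized gradient, so the shrinkage of w effectively disappears, while the decoupled version still shrinks w by lr * lam * w.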
neural-net  optimization  regularization  weight-decay  l2  roger-grosse 
28 days ago
[1801.02384] Attacking Speaker Recognition With Deep Generative Models
In this paper we investigate the ability of generative adversarial networks (GANs) to synthesize spoofing attacks on modern speaker recognition systems. We first show that samples generated with SampleRNN and WaveNet are unable to fool a CNN-based speaker recognition system. We propose a modification of the Wasserstein GAN objective function to make use of data that is real but not from the class being learned. Our semi-supervised learning method is able to perform both targeted and untargeted attacks, raising questions related to security in speaker authentication systems.
gan  adversarial-examples  speaker-recognition 
28 days ago
Perception-Based Personalization of Hearing Aids Using Gaussian Processes and Active Learning - IEEE Journals & Magazine
Personalization of multi-parameter hearing aids involves an initial fitting followed by a manual knowledge-based trial-and-error fine-tuning from ambiguous verbal user feedback. The result is an often suboptimal HA setting whereby the full potential of modern hearing aids is not utilized. This article proposes an interactive hearing-aid personalization system that obtains an optimal individual setting of the hearing aids from direct perceptual user feedback. Results obtained with ten hearing-impaired subjects show that ten to twenty pairwise user assessments between different settings (equivalent to 5-10 min) are sufficient for personalization of up to four hearing-aid parameters. A setting obtained by the system was significantly preferred by the subject over the initial fitting, and the obtained setting could be reproduced with reasonable precision. The system may have potential for clinical usage to assist both the hearing-care professional and the user.
active-learning  hearing-aid  personalization  gaussian-processes  speech-enhancement 
28 days ago
[1609.08442] Collaborative Learning for Language and Speaker Recognition
This paper presents a unified model to perform language and speaker recognition simultaneously and altogether. The model is based on a multi-task recurrent neural network where the output of one task is fed as the input of the other, leading to a collaborative learning framework that can improve both language and speaker recognition by borrowing information from each other. Our experiments demonstrated that the multi-task model outperforms the task-specific models on both tasks.
language-recognition  speaker-recognition  multitask-learning 
28 days ago
TensorSpace.js – Present tensor in space, neural network 3D visualization framework_Github - jishuwen(技术文)
TensorSpace is a neural network 3D visualization framework built with TensorFlow.js, Three.js and Tween.js. TensorSpace provides Keras-like APIs to build deep learning layers, load pre-trained models, and generate a 3D visualization in the browser.
neural-net  tensorflow  visualization 
4 weeks ago
[1811.04017] A generic framework for privacy preserving deep learning
We detail a new framework for privacy preserving deep learning and discuss its assets. The framework puts a premium on ownership and secure processing of data and introduces a valuable representation based on chains of commands and tensors. This abstraction allows one to implement complex privacy preserving constructs such as Federated Learning, Secure Multiparty Computation, and Differential Privacy while still exposing a familiar deep learning API to the end-user. We report early results on the Boston Housing and Pima Indian Diabetes datasets. While the privacy features apart from Differential Privacy do not impact the prediction accuracy, the current implementation of the framework introduces a significant overhead in performance, which will be addressed at a later stage of the development. We believe this work is an important milestone introducing the first reliable, general framework for privacy preserving deep learning.
differential-privacy  deep-learning 
4 weeks ago
[1711.05408] Recurrent Neural Networks as Weighted Language Recognizers
We investigate the computational complexity of various problems for simple recurrent neural networks (RNNs) as formal models for recognizing weighted languages. We focus on the single-layer, ReLU-activation, rational-weight RNNs with softmax, which are commonly used in natural language processing applications. We show that most problems for such RNNs are undecidable, including consistency, equivalence, minimization, and the determination of the highest-weighted string. However, for consistent RNNs the last problem becomes decidable, although the solution length can surpass all computable bounds. If additionally the string is limited to polynomial length, the problem becomes NP-complete and APX-hard. In summary, this shows that approximations and heuristic algorithms are necessary in practical applications of those RNNs.
rnn  automata 
4 weeks ago
[1805.04908] On the Practical Computational Power of Finite Precision RNNs for Language Recognition
While Recurrent Neural Networks (RNNs) are famously known to be Turing complete, this relies on infinite precision in the states and unbounded computation time. We consider the case of RNNs with finite precision whose computation time is linear in the input length. Under these limitations, we show that different RNN variants have different computational power. In particular, we show that the LSTM and the Elman-RNN with ReLU activation are strictly stronger than the RNN with a squashing activation and the GRU. This is achieved because LSTMs and ReLU-RNNs can easily implement counting behavior. We show empirically that the LSTM does indeed learn to effectively use the counting mechanism.
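Note: a hand-wired, single-unit illustration of the "counting behavior" the abstract mentions. The weights are fixed by hand (input +1 for "a", -1 for "b", recurrent weight 1); this is my own toy, not a trained network and not a full recognizer for any language. It only shows that a ReLU state can grow without bound while a squashed (tanh) state stays inside (-1, 1).

```python
import math


def relu(x):
    """Rectified linear unit."""
    return max(0.0, x)


def encode(ch):
    """Map 'a' to +1 and any other symbol to -1 (hypothetical input encoding)."""
    return 1.0 if ch == "a" else -1.0


def relu_state(string):
    """Final hidden state of h_t = ReLU(h_{t-1} + x_t), starting from 0."""
    h = 0.0
    for ch in string:
        h = relu(h + encode(ch))
    return h


def tanh_state(string):
    """Same recurrence with a squashing activation: h_t = tanh(h_{t-1} + x_t)."""
    h = 0.0
    for ch in string:
        h = math.tanh(h + encode(ch))
    return h


if __name__ == "__main__":
    for s in ("aaabbb", "aaabb", "a" * 50):
        print(len(s), relu_state(s), tanh_state(s))
```

With the ReLU unit the state after "aaabb" is exactly the surplus of a's over b's, whereas the tanh unit saturates after a few steps and can no longer distinguish long runs, which is the intuition behind the separation under finite precision.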
nlp  rnn  complexity 
4 weeks ago
[1811.03666] On the Statistical and Information-theoretic Characteristics of Deep Network Representations
It has been common to argue or imply that a regularizer can be used to alter a statistical property of a hidden layer's representation and thus improve generalization or performance of deep networks. For instance, dropout has been known to improve performance by reducing co-adaptation, and representational sparsity has been argued as a good characteristic because many data-generation processes have a small number of factors that are independent. In this work, we analytically and empirically investigate the popular characteristics of learned representations, including correlation, sparsity, dead unit, rank, and mutual information, and disprove much of the conventional wisdom. We first show that infinitely many Identical Output Networks (IONs) can be constructed for any deep network with a linear layer, where any invertible affine transformation can be applied to alter the layer's representation characteristics. The existence of ION proves that the correlation characteristics of representation are irrelevant to the performance. Extensions to ReLU layers are provided, too. Then, we consider sparsity, dead unit, and rank to show that only loose relationships exist among the three characteristics. It is shown that a higher sparsity or additional dead units do not imply a better or worse performance when the rank of representation is fixed. We also develop a rank regularizer and show that neither representation sparsity nor lower rank is helpful for improving performance even when the data-generation process has a small number of independent factors. Mutual information I(z_l; x) and I(z_l; y) are investigated, and we show that regularizers can affect I(z_l; x) and thus indirectly influence the performance. Finally, we explain how a rich set of regularizers can be used as a powerful tool for performance tuning.
deep-learning  neural-net  anlaysis  information-theory  optimization 
4 weeks ago
[1811.03804] Gradient Descent Finds Global Minima of Deep Neural Networks
Gradient descent finds a global minimum in training deep neural networks despite the objective function being non-convex. The current paper proves gradient descent achieves zero training loss in polynomial time for a deep over-parameterized neural network with residual connections (ResNet). Our analysis relies on the particular structure of the Gram matrix induced by the neural network architecture. This structure allows us to show the Gram matrix is stable throughout the training process and this stability implies the global optimality of the gradient descent algorithm. Our bounds also shed light on the advantage of using ResNet over the fully connected feedforward architecture; our bound requires the number of neurons per layer scaling exponentially with depth for feedforward networks whereas for ResNet the bound only requires the number of neurons per layer scaling polynomially with depth. We further extend our analysis to deep residual convolutional neural networks and obtain a similar convergence result.
dnn  neural-net  analysis  gradient-descent  optimization 
4 weeks ago
Specification gaming examples in AI - master list
"specification gaming examples in AI: unintended solutions to the specified objective that don't satisfy the designer's intent. This includes reward hacking behaviors in reinforcement learning"
ai  machine-learning  specification-gaming 
4 weeks ago
rr: lightweight recording & deterministic debugging
rr aspires to be your primary C/C++ debugging tool for Linux, replacing — well, enhancing — gdb. You record a failure once, then debug the recording, deterministically, as many times as you want. The same execution is replayed every time.
rr also provides efficient reverse execution under gdb. Set breakpoints and data watchpoints and quickly reverse-execute to where they were hit.
c++  debugger 
5 weeks ago
[1811.01753] How deep is deep enough? - Optimizing deep neural network architecture
Deep neural networks use stacked layers of feature detectors to repeatedly transform the input data, so that structurally different classes of input become well separated in the final layer. While the method has turned out extremely powerful in many applications, its success depends critically on the correct choice of hyperparameters, in particular the number of network layers. Here, we introduce a new measure, called the generalized discrimination value (GDV), which quantifies how well different object classes separate in each layer. Due to its definition, the GDV is invariant to translation and scaling of the input data, independent of the number of features, as well as independent of the number and permutation of the neurons within a layer. We compute the GDV in each layer of a Deep Belief Network that was trained unsupervised on the MNIST data set. Strikingly, we find that the GDV first improves with each successive network layer, but then gets worse again beyond layer 30, thus indicating the optimal network depth for this data classification task. Our further investigations suggest that the GDV can serve as a universal tool to determine the optimal number of layers in deep neural networks for any type of input data.
deep-learning  neural-net  analysis  depth 
5 weeks ago
CRAN - Package longmemo
Datasets and Functionality from 'Jan Beran' (1994). Statistics for Long-Memory Processes
R  libs  time-series  long-memory  arfima 
5 weeks ago
[1409.2329] Recurrent Neural Network Regularization
We present a simple regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units. Dropout, the most successful technique for regularizing neural networks, does not work well with RNNs and LSTMs. In this paper, we show how to correctly apply dropout to LSTMs, and show that it substantially reduces overfitting on a variety of tasks. These tasks include language modeling, speech recognition, image caption generation, and machine translation.
rnn  regularization 
5 weeks ago
In North Dakota, Native Americans Try to Turn an ID Law to Their Advantage - The New York Times
"Voters’ experiences have varied greatly based on which county they live in. In Rolette County, where the Turtle Mountain Reservation is, they have been able to get addresses from the county and IDs from the tribe without much red tape. But at Standing Rock, in Sioux County, the 911 coordinator is the sheriff, Frank Landeis. That’s a deterrent to people who are afraid to interact with law enforcement, much less tell the sheriff where they live, and Sheriff Landeis is not easy to reach.

When Ms. Finn called him on Oct. 12, three days after the Supreme Court ruling, he was out. On Oct. 15, he said he was transporting prisoners and could not assign addresses that day. He was also unavailable when The New York Times called on Friday.

And in an episode recounted independently by Ms. Finn, Mr. Semans and Ms. Young, a tribal elder, Terry Yellow Fat, got through to Sheriff Landeis only to be assigned the address of a bar near his house. Mr. Semans worried that, in addition to playing into stereotypes about Native Americans and alcohol, this could expose Mr. Yellow Fat to fraud charges if he voted under an address he knew was incorrect.

So, with help from Four Directions and others, some tribes are creating addresses themselves — and preparing to do so until the polls close."
voter-suppression  voter-id  north-dakota  native-americans 
6 weeks ago
[1702.08159] McKernel: A Library for Approximate Kernel Expansions in Log-linear Time
Kernel Methods Next Generation (KMNG) introduces a framework to use kernel approximates in the mini-batch setting with SGD Optimizer as an alternative to Deep Learning. McKernel is a C++ library for KMNG ML Large-scale. It contains a CPU optimized implementation of the Fastfood algorithm that allows the computation of approximated kernel expansions in log-linear time. The algorithm requires computing the product of Walsh Hadamard Transform (WHT) matrices. A cache friendly SIMD Fast Walsh Hadamard Transform (FWHT) that achieves compelling speed and outperforms current state-of-the-art methods has been developed. McKernel enables non-linear classification by combining Fastfood and a linear classifier.
kernel-methods  sgd  minibatch 
6 weeks ago
Eddie Murphy and the Dangers of Counterfactual Causal Thinking About Detecting Racial Discrimination by Issa Kohler-Hausmann :: SSRN
"The model of discrimination animating some of the most common approaches to detecting discrimination in both law and social science—the counterfactual causal model—is wrong. In that model, racial discrimination is detected by measuring the “treatment effect of race,” where the treatment is conceptualized as manipulating the raced status of otherwise identical units (e.g., a person, a neighborhood, a school). Most objections to talking about race as a cause in the counterfactual model have been raised in terms of manipulability. If we cannot manipulate a person’s race at the moment of a police stop, traffic encounter, or prosecutorial charging decision, then it is impossible to detect if the person’s race was the sole cause of an unfavorable outcome. But this debate has proceeded on the wrong terms. The counterfactual causal model of discrimination is not wrong because we can’t work around the practical limits of manipulation, as evidenced by both Eddie Murphy’s comic genius in the SNL skit “White Like Me” and the entire genre of audit and correspondence studies. It is wrong because to fit the rigor of the counterfactual model of a clearly defined treatment on otherwise identical units, we must reduce race to only the signs of the category, meaning we must think race is skin color, or phenotype, or other ways we identify group status. And that is a concept mistake if one subscribes to a constructivist, as opposed to biological or genetic, conception of race. I argue that the counterfactual causal model of discrimination is based on a flawed theory of (1) what the category of race references and how it produces effects in the world and (2) what is meant when we say it is wrong to make decisions of import because of race. We cannot detect actions as discriminatory by identifying a relation of counterfactual causality; we can only do so by reasoning about its distinctive wrongfulness by referencing what constitutes the very categories that are the objects of concern."
discrimination  race  law  reasoning  causality  counterfactual 
6 weeks ago
[1810.10703] K For The Price Of 1: Parameter Efficient Multi-task And Transfer Learning
We introduce a novel method that enables parameter-efficient transfer and multitask learning. The basic approach is to allow a model patch - a small set of parameters - to specialize to each task, instead of fine-tuning the last layer or the entire network. For instance, we show that learning a set of scales and biases allows a network to learn a completely different embedding that could be used for different tasks (such as converting an SSD detection model into a 1000-class classification model while reusing 98% of parameters of the feature extractor). Similarly, we show that re-learning the existing low-parameter layers (such as depth-wise convolutions) also improves accuracy significantly. Our approach allows both simultaneous (multi-task) learning as well as sequential transfer learning wherein we adapt pretrained networks to solve new problems. For multi-task learning, despite using much fewer parameters than traditional logits-only fine-tuning, we match single-task-based performance.
7 weeks ago