Adversarial Robustness - Theory and Practice

4 days ago

"This web page contains materials to accompany the NeurIPS 2018 tutorial, “Adversarial Robustness: Theory and Practice”, by Zico Kolter and Aleksander Madry. The notes are in very early draft form, and we will be updating them (organizing material more, writing them in a more consistent form with the relevant citations, etc) for an official release in early 2019. Until then, however, we hope they are still a useful reference that can be used to explore some of the key ideas and methodology behind adversarial robustness, from standpoints of both generating adversarial attacks on classifiers and training classifiers that are inherently robust."

adversarial-examples
4 days ago

Which US cities have good and bad public transportation - Vox

4 days ago

"Christof Spieler, a structural engineer and urban planner from Houston, has lots of opinions about public transit in America and elsewhere. In his new book, Trains, Buses, People: An Opinionated Atlas of US Transit, he maps out 47 metro areas that have rail transit or bus rapid transit, ranks the best and worst systems, and offers advice on how to build better networks."

cities
transportation
books
4 days ago

Compact Representation of Uncertainty in Clustering

4 days ago

For many classic structured prediction problems, probability distributions over the dependent variables can be efficiently computed using widely-known algorithms and data structures (such as forward-backward, and its corresponding trellis for exact probability distributions in Markov models). However, we know of no previous work studying efficient representations of exact distributions over clusterings. This paper presents definitions and proofs for a dynamic-programming inference procedure that computes the partition function, the marginal probability of a cluster, and the MAP clustering---all exactly. These exact solutions take time and space proportional to the size of the powerset of N, which is substantially smaller than the Nth Bell number. Indeed, we improve upon the time complexity of the algorithm introduced by Kohonen and Corander (2016) for this problem by a factor of N. While still large, this previously unknown result is intellectually interesting in its own right, makes feasible exact inference for important real-world small data applications (such as medicine), and provides a natural stepping stone towards sparse-trellis approximations that enable further scalability (which we also explore). In experiments, we demonstrate the superiority of our approach over approximate methods in analyzing real-world gene expression data used in cancer treatment.
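
A minimal sketch of the subset dynamic program the abstract describes (with assumed uniform notation; this is an illustration, not the paper's optimized trellis): memoize the partition function over subsets of elements, fixing one element per recursion so the cluster containing it is enumerated exactly once. With every cluster energy set to 1, the result is the Bell number, confirming each clustering is counted once. This naive version takes O(3^N) time and O(2^N) space.

```python
from functools import lru_cache

def subsets(s):
    """Yield every subset of s as a frozenset."""
    s = list(s)
    for mask in range(1 << len(s)):
        yield frozenset(x for i, x in enumerate(s) if mask >> i & 1)

def partition_function(elements, energy):
    """Sum over all clusterings of `elements` of the product of cluster
    energies, memoized over the 2^N subsets rather than enumerating the
    Bell-number-many clusterings directly."""
    elements = frozenset(elements)

    @lru_cache(maxsize=None)
    def Z(subset):
        if not subset:
            return 1.0
        first = min(subset)          # fix one element to avoid double-counting
        rest = subset - {first}
        total = 0.0
        # The cluster containing `first` is {first} | T for each T subset of rest.
        for T in subsets(rest):
            cluster = frozenset({first}) | T
            total += energy(cluster) * Z(subset - cluster)
        return total

    return Z(elements)
```

With `energy` identically 1 this recovers the Bell numbers (e.g. 15 for N=4, 52 for N=5); swapping in a real energy function gives the exact partition function over clusterings.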

clustering
uncertainty
4 days ago

[1805.07820] Targeted Adversarial Examples for Black Box Audio Systems

5 days ago

The application of deep recurrent networks to audio transcription has led to impressive gains in automatic speech recognition (ASR) systems. Many have demonstrated that small adversarial perturbations can fool deep neural networks into incorrectly predicting a specified target with high confidence. Current work on fooling ASR systems has focused on white-box attacks, in which the model architecture and parameters are known. In this paper, we adopt a black-box approach to adversarial generation, combining the approaches of both genetic algorithms and gradient estimation to solve the task. We achieve an 89.25% targeted attack similarity after 3000 generations while maintaining 94.6% audio file similarity.
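
The genetic-algorithm half of such a black-box attack can be sketched generically. Here `fitness` is a hypothetical stand-in for the black-box score (e.g. similarity to a target transcription), and all parameters are illustrative defaults, not the paper's:

```python
import random

def genetic_attack(fitness, dim, pop_size=20, generations=200,
                   mutation_scale=0.05, seed=0):
    """Minimal genetic-algorithm loop of the kind used for black-box
    adversarial search: no gradients, only queries to the black-box
    `fitness` function. Returns the best candidate perturbation found."""
    rng = random.Random(seed)
    # Initial population: small random perturbation vectors.
    pop = [[rng.gauss(0.0, mutation_scale) for _ in range(dim)]
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[:pop_size // 2]       # selection: keep the fittest half
        pop = list(parents)
        while len(pop) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [(ai if rng.random() < 0.5 else bi)   # uniform crossover
                     + rng.gauss(0.0, mutation_scale)     # Gaussian mutation
                     for ai, bi in zip(a, b)]
            pop.append(child)
    return max(pop, key=fitness)
```

The paper additionally mixes in gradient estimation; this sketch shows only the selection/crossover/mutation loop that drives the search.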

adversarial-examples
audio
black-box
5 days ago

[1803.01814] Norm matters: efficient and accurate normalization schemes in deep networks

5 days ago

Over the past few years batch-normalization has been commonly used in deep networks, allowing faster training and high performance for a wide variety of applications. However, the reasons behind its merits remained unanswered, with several shortcomings that hindered its use for certain tasks. In this work we present a novel view on the purpose and function of normalization methods and weight-decay, as tools to decouple weights' norm from the underlying optimized objective. We also improve the use of weight-normalization and show the connection between practices such as normalization, weight decay and learning-rate adjustments. Finally, we suggest several alternatives to the widely used L2 batch-norm, using normalization in L1 and L∞ spaces that can substantially improve numerical stability in low-precision implementations as well as provide computational and memory benefits. We demonstrate that such methods enable the first batch-norm alternative to work for half-precision implementations.
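
A minimal sketch of the L1 alternative, under the assumed standard form: replace the standard deviation with the mean absolute deviation, scaled by sqrt(pi/2) so it estimates the std under a Gaussian assumption. No squares or square roots of sums of squares are needed, which is what makes it friendlier to half precision:

```python
import math

def batch_norm_l1(x, eps=1e-5):
    """Normalize a batch (list of floats) using an L1 statistic instead of
    the usual standard deviation."""
    n = len(x)
    mu = sum(x) / n
    mad = sum(abs(v - mu) for v in x) / n          # mean absolute deviation
    scale = mad * math.sqrt(math.pi / 2) + eps     # no variance/sqrt involved
    return [(v - mu) / scale for v in x]
```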

neural-net
normalization
5 days ago

[1806.10909] ResNet with one-neuron hidden layers is a Universal Approximator

5 days ago

We demonstrate that a very deep ResNet with stacked modules with one neuron per hidden layer and ReLU activation functions can uniformly approximate any Lebesgue integrable function in d dimensions, i.e. ℓ1(ℝ^d). Because of the identity mapping inherent to ResNets, our network has alternating layers of dimension one and d. This stands in sharp contrast to fully connected networks, which are not universal approximators if their width is the input dimension d [Lu et al, 2017; Hanin and Sellke, 2017]. Hence, our result implies an increase in representational power for narrow deep networks by the ResNet architecture.

resnet
neural-net
universal-approximator
5 days ago

Modern Neural Networks Generalize on Small Data Sets

5 days ago

In this paper, we use a linear program to empirically decompose fitted neural networks into ensembles of low-bias sub-networks. We show that these sub-networks are relatively uncorrelated which leads to an internal regularization process, very much like a random forest, which can explain why a neural network is surprisingly resistant to overfitting. We then demonstrate this in practice by applying large neural networks, with hundreds of parameters per training observation, to a collection of 116 real-world data sets from the UCI Machine Learning Repository. This collection of data sets contains a much smaller number of training examples than the types of image classification tasks generally studied in the deep learning literature, as well as non-trivial label noise. We show that even in this setting deep neural nets are capable of achieving superior classification accuracy without overfitting.

neural-net
generalization
small-data
richard-berk
5 days ago

[1808.01204] Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data

5 days ago

Neural networks have many successful applications, while much less theoretical understanding has been gained. Towards bridging this gap, we study the problem of learning a two-layer overparameterized ReLU neural network for multi-class classification via stochastic gradient descent (SGD) from random initialization. In the overparameterized setting, when the data comes from mixtures of well-separated distributions, we prove that SGD learns a network with a small generalization error, even though the network has enough capacity to fit arbitrary labels. Furthermore, the analysis provides interesting insights into several aspects of learning neural networks and can be verified based on empirical studies on synthetic data and on the MNIST dataset.

neural-net
sgd
generalization
5 days ago

[1811.00164] Deep Counterfactual Regret Minimization

11 days ago

Counterfactual Regret Minimization (CFR) is the leading algorithm for solving large imperfect-information games. It iteratively traverses the game tree in order to converge to a Nash equilibrium. In order to deal with extremely large games, CFR typically uses domain-specific heuristics to simplify the target game in a process known as abstraction. This simplified game is solved with tabular CFR, and its solution is mapped back to the full game. This paper introduces Deep Counterfactual Regret Minimization (Deep CFR), a form of CFR that obviates the need for abstraction by instead using deep neural networks to approximate the behavior of CFR in the full game. We show that Deep CFR is principled and achieves strong performance in large poker games. This is the first non-tabular variant of CFR to be successful in large games.
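
The core of tabular CFR is the regret-matching update that Deep CFR approximates with networks. A self-contained sketch in rock-paper-scissors self-play (an illustration, not the paper's method) shows the time-averaged strategies converging to the uniform Nash equilibrium, as they must in a two-player zero-sum game:

```python
def regret_matching(cum_regret):
    """CFR's core policy update: play each action with probability
    proportional to its positive cumulative regret."""
    positives = [max(r, 0.0) for r in cum_regret]
    total = sum(positives)
    n = len(cum_regret)
    return [p / total for p in positives] if total > 0 else [1.0 / n] * n

def selfplay_rps(iterations=50000):
    """Tabular regret minimization in rock-paper-scissors self-play.
    Returns each player's time-averaged strategy."""
    payoff = [[0, -1, 1],    # rock     vs rock/paper/scissors
              [1, 0, -1],    # paper
              [-1, 1, 0]]    # scissors
    # Asymmetric starting regrets avoid the trivial all-uniform fixed point.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sum = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strat = [regret_matching(regrets[p]) for p in range(2)]
        for p in range(2):
            opp = strat[1 - p]
            # Expected utility of each pure action against the opponent's mix
            # (RPS is symmetric, so one payoff table serves both seats).
            util = [sum(payoff[a][b] * opp[b] for b in range(3)) for a in range(3)]
            ev = sum(strat[p][a] * util[a] for a in range(3))
            for a in range(3):
                regrets[p][a] += util[a] - ev
                strategy_sum[p][a] += strat[p][a]
    return [[x / iterations for x in strategy_sum[p]] for p in range(2)]
```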

game-theory
deep-learning
counterfactual
regret-minimization
cfr
algorithms
11 days ago

If Only We’d Fucking Listen to Helen DeWitt - The Millions

14 days ago

"DeWitt can ignore that inkling in part because she’s made a public persona out of questioning its merits, especially when it comes to what she calls “normative publishing.” In her interviews, wielding the sort of rationalism more commonly associated with economists, tech entrepreneurs and utilitarians, she picks apart the industry’s conventions, like the puritanical separation of authors and type-setters, the unwillingness to experiment with new revenue models, especially ones perfected by the art world, and even the norm of meeting editors in cafés, where table space is too limited for spreading out one’s papers. My favorite of her frustrations is how when she met with prospective agents she couldn’t get them to agree to squeeze as much marketing juice as possible from any future suicides she attempted. “If I could have sold off a suicide attempt,” she said in a 2008 interview, “I would have had more time for reading Spinoza.” Duh."

publishing
helen-dewitt
14 days ago

Language Log » A better way to calculate pitch range

16 days ago

"You might think that the many differences between the perceptual variable of pitch and the physical variable of fundamental frequency ("f0") arise because perception is complicated and physics is simple. But if so, you'd be mostly wrong. The biggest problem is that physical f0 is a complex and often fundamentally incoherent concept. And even in the areas where f0 is well defined, f0 estimation (usually called "pitch tracking") is prone to errors."

f0
pitch
estimation
speech
vocal-fry
16 days ago

Ibis: Python Data Analysis Productivity Framework — Ibis v0.14.0+29.gc382cba documentation

16 days ago

Ibis is a toolbox to bridge the gap between local Python environments (like pandas and scikit-learn) and remote storage and execution systems, including Hadoop components (HDFS, Impala, Hive, Spark) and SQL databases (Postgres, etc.). Its goal is to simplify analytical workflows and make you more productive.

python
libs
data
sql
16 days ago

Structural Causal Bandits: Where to Intervene?

16 days ago

We study the problem of identifying the best action in a sequential decision-making setting when the reward distributions of the arms exhibit a non-trivial dependence structure, which is governed by the underlying causal model of the domain where the agent is deployed. In this setting, playing an arm corresponds to intervening on a set of variables and setting them to specific values. In this paper, we show that whenever the underlying causal model is not taken into account during the decision-making process, the standard strategies of simultaneously intervening on all variables or on all the subsets of the variables may, in general, lead to suboptimal policies, regardless of the number of interventions performed by the agent in the environment. We formally acknowledge this phenomenon and investigate structural properties implied by the underlying causal model, which lead to a complete characterization of the relationships between the arms' distributions. We leverage this characterization to build a new algorithm that takes as input a causal structure and finds a minimal, sound, and complete set of qualified arms that an agent should play to maximize its expected reward. We empirically demonstrate that the new strategy learns an optimal policy and leads to orders of magnitude faster convergence rates when compared with its causal-insensitive counterparts.

bandits
causality
intervention
16 days ago

Equality of Opportunity in Classification: A Causal Approach

16 days ago

The Equalized Odds (for short, EO) is one of the most popular measures of discrimination used in the supervised learning setting. It ascertains fairness through the balance of the misclassification rates (false positive and negative) across the protected groups -- e.g., in the context of law enforcement, an African-American defendant who would not commit a future crime will have an equal opportunity of being released, compared to a non-recidivating Caucasian defendant. Despite this noble goal, it has been acknowledged in the literature that statistical tests based on the EO are oblivious to the underlying causal mechanisms that generated the disparity in the first place (Hardt et al. 2016). This leads to a critical disconnect between statistical measures readable from the data and the meaning of discrimination in the legal system, where compelling evidence that the observed disparity is tied to a specific causal process deemed unfair by society is required to characterize discrimination. The goal of this paper is to develop a principled approach to connect the statistical disparities characterized by the EO and the underlying, elusive, and frequently unobserved, causal mechanisms that generated such inequality. We start by introducing a new family of counterfactual measures that allows one to explain the misclassification disparities in terms of the underlying mechanisms in an arbitrary, non-parametric structural causal model. This will, in turn, allow legal and data analysts to interpret currently deployed classifiers through a causal lens, linking the statistical disparities found in the data to the corresponding causal processes. Leveraging the new family of counterfactual measures, we develop a learning procedure to construct a classifier that is statistically efficient, interpretable, and compatible with the basic human intuition of fairness. We demonstrate our results through experiments in both real (COMPAS) and synthetic datasets.

machine-learning
fairness
classification
evaluation-measures
causality
16 days ago

[1811.07867] Prediction-Based Decisions and Fairness: A Catalogue of Choices, Assumptions, and Definitions

22 days ago

A recent flurry of research activity has attempted to quantitatively define "fairness" for decisions based on statistical and machine learning (ML) predictions. The rapid growth of this new field has led to wildly inconsistent terminology and notation, presenting a serious challenge for cataloguing and comparing definitions. This paper attempts to bring much-needed order.

First, we explicate the various choices and assumptions made---often implicitly---to justify the use of prediction-based decisions. Next, we show how such choices and assumptions can raise concerns about fairness and we present a notationally consistent catalogue of fairness definitions from the ML literature. In doing so, we offer a concise reference for thinking through the choices, assumptions, and fairness considerations of prediction-based decision systems.

statistics
machine-learning
fairness
causal-inference
22 days ago

Polarization in Poland: A Warning From Europe - The Atlantic

25 days ago

"The emotional appeal of a conspiracy theory is in its simplicity. It explains away complex phenomena, accounts for chance and accidents, offers the believer the satisfying sense of having special, privileged access to the truth. But—once again—separating the appeal of conspiracy from the ways it affects the careers of those who promote it is very difficult. For those who become the one-party state’s gatekeepers, for those who repeat and promote the official conspiracy theories, acceptance of these simple explanations also brings another reward: power."

poland
hungary
politics
authoritarianism
conspiracy
smolensk
25 days ago

Robustness checks are a joke - Statistical Modeling, Causal Inference, and Social Science

28 days ago

The problem as I see it is that robustness checks are supposed to be for exploration but are typically used for confirmation.

Maybe another way to put it is: As long as we recognize that robustness checks are typically used for confirmation, we can interpret them in that way. Thus, instead of taking a robustness check as evidence that a claimed finding is robust, we should take a robustness check as providing evidence on particular directions the model can be perturbed without changing the main conclusions.

hypothesis-testing
robustness
sensitivity
28 days ago

[1810.12281] Three Mechanisms of Weight Decay Regularization

28 days ago

Weight decay is one of the standard tricks in the neural network toolbox, but the reasons for its regularization effect are poorly understood, and recent results have cast doubt on the traditional interpretation in terms of L2 regularization. Literal weight decay has been shown to outperform L2 regularization for optimizers for which they differ. We empirically investigate weight decay for three optimization algorithms (SGD, Adam, and K-FAC) and a variety of network architectures. We identify three distinct mechanisms by which weight decay exerts a regularization effect, depending on the particular optimization algorithm and architecture: (1) increasing the effective learning rate, (2) approximately regularizing the input-output Jacobian norm, and (3) reducing the effective damping coefficient for second-order optimization. Our results provide insight into how to improve the regularization of neural networks.
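
The distinction between literal weight decay and L2 regularization is easy to see in scalar update rules (a sketch; `precond` stands in for an adaptive preconditioner such as Adam's, and is not notation from the paper):

```python
def sgd_l2_step(w, grad, lr, wd):
    """L2 regularization: fold wd*w into the gradient before the update."""
    return w - lr * (grad + wd * w)

def sgd_decoupled_step(w, grad, lr, wd):
    """Literal (decoupled) weight decay: shrink w directly."""
    return w - lr * grad - lr * wd * w

def precond_l2_step(w, grad, lr, wd, precond):
    """With a preconditioner (as in Adam or K-FAC), L2 regularization
    rescales the decay term along with the gradient..."""
    return w - lr * precond * (grad + wd * w)

def precond_decoupled_step(w, grad, lr, wd, precond):
    """...while decoupled decay leaves the shrinkage untouched, so the
    two variants no longer coincide."""
    return w - lr * precond * grad - lr * wd * w
```

For plain SGD the two updates are identical; the moment a non-trivial preconditioner enters, they diverge, which is why "weight decay vs. L2" only matters for optimizers like Adam and K-FAC.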

neural-net
optimization
regularization
weight-decay
l2
roger-grosse
28 days ago

[1801.02384] Attacking Speaker Recognition With Deep Generative Models

28 days ago

In this paper we investigate the ability of generative adversarial networks (GANs) to synthesize spoofing attacks on modern speaker recognition systems. We first show that samples generated with SampleRNN and WaveNet are unable to fool a CNN-based speaker recognition system. We propose a modification of the Wasserstein GAN objective function to make use of data that is real but not from the class being learned. Our semi-supervised learning method is able to perform both targeted and untargeted attacks, raising questions related to security in speaker authentication systems.

gan
adversarial-examples
speaker-recognition
28 days ago

Perception-Based Personalization of Hearing Aids Using Gaussian Processes and Active Learning - IEEE Journals & Magazine

28 days ago

Personalization of multi-parameter hearing aids involves an initial fitting followed by a manual knowledge-based trial-and-error fine-tuning from ambiguous verbal user feedback. The result is an often suboptimal hearing-aid setting whereby the full potential of modern hearing aids is not utilized. This article proposes an interactive hearing-aid personalization system that obtains an optimal individual setting of the hearing aids from direct perceptual user feedback. Results obtained with ten hearing-impaired subjects show that ten to twenty pairwise user assessments between different settings (equivalent to 5-10 minutes) are sufficient for personalization of up to four hearing-aid parameters. A setting obtained by the system was significantly preferred by the subject over the initial fitting, and the obtained setting could be reproduced with reasonable precision. The system may have potential for clinical usage to assist both the hearing-care professional and the user.

active-learning
hearing-aid
personalization
gaussian-processes
speech-enhancement
28 days ago

[1609.08442] Collaborative Learning for Language and Speaker Recognition

28 days ago

This paper presents a unified model to perform language and speaker recognition simultaneously and altogether. The model is based on a multi-task recurrent neural network where the output of one task is fed as the input of the other, leading to a collaborative learning framework that can improve both language and speaker recognition by borrowing information from each other. Our experiments demonstrated that the multi-task model outperforms the task-specific models on both tasks.

language-recognition
speaker-recognition
multitask-learning
28 days ago

TensorSpace.js – Present tensor in space, neural network 3D visualization framework

4 weeks ago

TensorSpace is a neural network 3D visualization framework built with TensorFlow.js, Three.js and Tween.js. TensorSpace provides Keras-like APIs to build deep learning layers, load pre-trained models, and generate a 3D visualization in the browser.

neural-net
tensorflow
visualization
4 weeks ago

[1811.04017] A generic framework for privacy preserving deep learning

4 weeks ago

We detail a new framework for privacy preserving deep learning and discuss its assets. The framework puts a premium on ownership and secure processing of data and introduces a valuable representation based on chains of commands and tensors. This abstraction allows one to implement complex privacy preserving constructs such as Federated Learning, Secure Multiparty Computation, and Differential Privacy while still exposing a familiar deep learning API to the end-user. We report early results on the Boston Housing and Pima Indian Diabetes datasets. While the privacy features apart from Differential Privacy do not impact the prediction accuracy, the current implementation of the framework introduces a significant overhead in performance, which will be addressed at a later stage of the development. We believe this work is an important milestone introducing the first reliable, general framework for privacy preserving deep learning.
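
For context, the textbook primitive behind the Differential Privacy component is the Laplace mechanism. This is a generic sketch of that mechanism, not the framework's actual API:

```python
import math
import random

def laplace_mechanism(value, sensitivity, epsilon, rng=None):
    """Release value + Laplace(0, sensitivity/epsilon) noise, the classic
    building block for epsilon-differentially-private numeric queries."""
    if rng is None:
        rng = random.Random()
    scale = sensitivity / epsilon
    u = rng.random() - 0.5                 # uniform on [-0.5, 0.5)
    # Inverse-CDF sampling of the Laplace distribution.
    return value - scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)
```

Smaller epsilon means a larger noise scale and stronger privacy; the noisy answers remain unbiased estimates of the true value.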

differential-privacy
deep-learning
4 weeks ago

[1711.05408] Recurrent Neural Networks as Weighted Language Recognizers

4 weeks ago

We investigate the computational complexity of various problems for simple recurrent neural networks (RNNs) as formal models for recognizing weighted languages. We focus on the single-layer, ReLU-activation, rational-weight RNNs with softmax, which are commonly used in natural language processing applications. We show that most problems for such RNNs are undecidable, including consistency, equivalence, minimization, and the determination of the highest-weighted string. However, for consistent RNNs the last problem becomes decidable, although the solution length can surpass all computable bounds. If additionally the string is limited to polynomial length, the problem becomes NP-complete and APX-hard. In summary, this shows that approximations and heuristic algorithms are necessary in practical applications of those RNNs.

rnn
automata
4 weeks ago

[1805.04908] On the Practical Computational Power of Finite Precision RNNs for Language Recognition

4 weeks ago

While Recurrent Neural Networks (RNNs) are famously known to be Turing complete, this relies on infinite precision in the states and unbounded computation time. We consider the case of RNNs with finite precision whose computation time is linear in the input length. Under these limitations, we show that different RNN variants have different computational power. In particular, we show that the LSTM and the Elman-RNN with ReLU activation are strictly stronger than the RNN with a squashing activation and the GRU. This is achieved because LSTMs and ReLU-RNNs can easily implement counting behavior. We show empirically that the LSTM does indeed learn to effectively use the counting mechanism.
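
The counting argument can be made concrete with a hand-wired counter for a^n b^n (an illustration of the mechanism, not a trained network): one unbounded ReLU unit increments on 'a' and decrements on 'b', which a squashing activation or a GRU's bounded state cannot sustain for arbitrary n.

```python
def relu(x):
    return max(0.0, x)

def recognize_anbn(s):
    """Recognize strings of the form a^n b^n with a single ReLU counter.
    (The order check below would be a second gating unit in a real RNN.)"""
    h = 0.0
    seen_b = False
    for ch in s:
        if ch == 'a':
            if seen_b:
                return False       # 'a' after 'b': wrong shape
            h = relu(h + 1.0)      # count up
        elif ch == 'b':
            seen_b = True
            if h == 0.0:
                return False       # more b's than a's
            h = relu(h - 1.0)      # count down
        else:
            return False
    return h == 0.0                # counter back to zero: matched
```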

nlp
rnn
complexity
4 weeks ago

[1811.03666] On the Statistical and Information-theoretic Characteristics of Deep Network Representations

4 weeks ago

It has been common to argue or imply that a regularizer can be used to alter a statistical property of a hidden layer's representation and thus improve generalization or performance of deep networks. For instance, dropout has been known to improve performance by reducing co-adaptation, and representational sparsity has been argued as a good characteristic because many data-generation processes have a small number of factors that are independent. In this work, we analytically and empirically investigate the popular characteristics of learned representations, including correlation, sparsity, dead units, rank, and mutual information, and disprove much of the conventional wisdom. We first show that infinitely many Identical Output Networks (IONs) can be constructed for any deep network with a linear layer, where any invertible affine transformation can be applied to alter the layer's representation characteristics. The existence of IONs proves that the correlation characteristics of a representation are irrelevant to the performance. Extensions to ReLU layers are provided, too. Then, we consider sparsity, dead units, and rank to show that only loose relationships exist among the three characteristics. It is shown that a higher sparsity or additional dead units do not imply a better or worse performance when the rank of the representation is fixed. We also develop a rank regularizer and show that neither representation sparsity nor lower rank is helpful for improving performance even when the data-generation process has a small number of independent factors. Mutual information I(z_l; x) and I(z_l; y) are investigated, and we show that regularizers can affect I(z_l; x) and thus indirectly influence the performance. Finally, we explain how a rich set of regularizers can be used as a powerful tool for performance tuning.

deep-learning
neural-net
analysis
information-theory
optimization
4 weeks ago

[1811.03804] Gradient Descent Finds Global Minima of Deep Neural Networks

4 weeks ago

Gradient descent finds a global minimum in training deep neural networks despite the objective function being non-convex. The current paper proves gradient descent achieves zero training loss in polynomial time for a deep over-parameterized neural network with residual connections (ResNet). Our analysis relies on the particular structure of the Gram matrix induced by the neural network architecture. This structure allows us to show the Gram matrix is stable throughout the training process and this stability implies the global optimality of the gradient descent algorithm. Our bounds also shed light on the advantage of using ResNet over the fully connected feedforward architecture; our bound requires the number of neurons per layer scaling exponentially with depth for feedforward networks whereas for ResNet the bound only requires the number of neurons per layer scaling polynomially with depth. We further extend our analysis to deep residual convolutional neural networks and obtain a similar convergence result.

dnn
neural-net
analysis
gradient-descent
optimization
4 weeks ago

Specification gaming examples in AI - master list

4 weeks ago

"specification gaming examples in AI: unintended solutions to the specified objective that don't satisfy the designer's intent. This includes reward hacking behaviors in reinforcement learning"

ai
machine-learning
specification-gaming
4 weeks ago

rr: lightweight recording & deterministic debugging

5 weeks ago

rr aspires to be your primary C/C++ debugging tool for Linux, replacing — well, enhancing — gdb. You record a failure once, then debug the recording, deterministically, as many times as you want. The same execution is replayed every time.

rr also provides efficient reverse execution under gdb. Set breakpoints and data watchpoints and quickly reverse-execute to where they were hit.

c++
debugger
5 weeks ago

[1811.01753] How deep is deep enough? - Optimizing deep neural network architecture

5 weeks ago

Deep neural networks use stacked layers of feature detectors to repeatedly transform the input data, so that structurally different classes of input become well separated in the final layer. While the method has turned out extremely powerful in many applications, its success depends critically on the correct choice of hyperparameters, in particular the number of network layers. Here, we introduce a new measure, called the generalized discrimination value (GDV), which quantifies how well different object classes separate in each layer. Due to its definition, the GDV is invariant to translation and scaling of the input data, independent of the number of features, as well as independent of the number and permutation of the neurons within a layer. We compute the GDV in each layer of a Deep Belief Network that was trained unsupervised on the MNIST data set. Strikingly, we find that the GDV first improves with each successive network layer, but then gets worse again beyond layer 30, thus indicating the optimal network depth for this data classification task. Our further investigations suggest that the GDV can serve as a universal tool to determine the optimal number of layers in deep neural networks for any type of input data.

deep-learning
neural-net
analysis
depth
5 weeks ago

CRAN - Package longmemo

5 weeks ago

Datasets and Functionality from 'Jan Beran' (1994). Statistics for Long-Memory Processes

R
libs
time-series
long-memory
arfima
5 weeks ago

[1409.2329] Recurrent Neural Network Regularization

5 weeks ago

We present a simple regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units. Dropout, the most successful technique for regularizing neural networks, does not work well with RNNs and LSTMs. In this paper, we show how to correctly apply dropout to LSTMs, and show that it substantially reduces overfitting on a variety of tasks. These tasks include language modeling, speech recognition, image caption generation, and machine translation.
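
The paper's recipe (dropout on the non-recurrent, layer-to-layer connections only, never on the hidden state carried across time steps) can be sketched abstractly. The `cell` interface here is a hypothetical simplification, not the paper's LSTM:

```python
import random

def dropout(vec, p, training=True):
    """Inverted dropout on a list of floats."""
    if not training or p == 0.0:
        return list(vec)
    return [0.0 if random.random() < p else x / (1.0 - p) for x in vec]

def stacked_rnn_step(cells, hiddens, x, p_drop, training=True):
    """One time step of a stacked RNN. `cells` holds functions
    cell(input_vec, hidden_vec) -> new_hidden. Dropout is applied only to
    the vertical input into each layer; the recurrent state is untouched."""
    inp = x
    new_hiddens = []
    for cell, h in zip(cells, hiddens):
        h = cell(dropout(inp, p_drop, training), h)  # vertical connection: dropped
        new_hiddens.append(h)
        inp = h   # the hidden state flows onward in time without dropout
    return new_hiddens
```

Dropping the recurrent connection instead would corrupt the state at every time step, compounding over long sequences; that is the failure mode this recipe avoids.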

rnn
regularization
5 weeks ago

In North Dakota, Native Americans Try to Turn an ID Law to Their Advantage - The New York Times

6 weeks ago

"Voters’ experiences have varied greatly based on which county they live in. In Rolette County, where the Turtle Mountain Reservation is, they have been able to get addresses from the county and IDs from the tribe without much red tape. But at Standing Rock, in Sioux County, the 911 coordinator is the sheriff, Frank Landeis. That’s a deterrent to people who are afraid to interact with law enforcement, much less tell the sheriff where they live, and Sheriff Landeis is not easy to reach.

When Ms. Finn called him on Oct. 12, three days after the Supreme Court ruling, he was out. On Oct. 15, he said he was transporting prisoners and could not assign addresses that day. He was also unavailable when The New York Times called on Friday.

And in an episode recounted independently by Ms. Finn, Mr. Semans and Ms. Young, a tribal elder, Terry Yellow Fat, got through to Sheriff Landeis only to be assigned the address of a bar near his house. Mr. Semans worried that, in addition to playing into stereotypes about Native Americans and alcohol, this could expose Mr. Yellow Fat to fraud charges if he voted under an address he knew was incorrect.

So, with help from Four Directions and others, some tribes are creating addresses themselves — and preparing to do so until the polls close."

voter-suppression
voter-id
north-dakota
native-americans
6 weeks ago

[1702.08159] McKernel: A Library for Approximate Kernel Expansions in Log-linear Time

6 weeks ago

Kernel Methods Next Generation (KMNG) introduces a framework for using kernel approximations in the mini-batch setting with an SGD optimizer, as an alternative to deep learning. McKernel is a C++ library for large-scale KMNG machine learning. It contains a CPU-optimized implementation of the Fastfood algorithm, which allows the computation of approximated kernel expansions in log-linear time. The algorithm requires computing the product of Walsh-Hadamard Transform (WHT) matrices. A cache-friendly SIMD Fast Walsh-Hadamard Transform (FWHT) has been developed that achieves compelling speed and outperforms current state-of-the-art methods. McKernel enables non-linear classification by combining Fastfood with a linear classifier.
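The primitive behind Fastfood's log-linear cost is the fast Walsh-Hadamard transform, which applies an n×n Hadamard matrix in O(n log n) instead of O(n²). A minimal pure-Python sketch (an illustration only, not McKernel's cache-friendly SIMD implementation):

```python
import numpy as np

def fwht(a):
    """Iterative fast Walsh-Hadamard transform, O(n log n).

    Length of `a` must be a power of two. Uses the unnormalized
    Hadamard matrix, so applying fwht twice returns n * a.
    """
    a = np.asarray(a, dtype=float).copy()
    n = len(a)
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y  # butterfly step
        h *= 2
    return a

print(fwht([1, 0, 1, 0]))  # [2. 2. 0. 0.]
```

Fastfood then replaces a dense Gaussian random projection with products of diagonal random matrices, a permutation, and this transform, which is where the speedup over explicit random features comes from.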

kernel-methods
sgd
minibatch
6 weeks ago

Easily print GitHub markdown as beautiful PDFs

6 weeks ago

"Simply replace github.com with gitprint.com"
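The substitution is mechanical enough to script; a small bash sketch (the example URL is hypothetical):

```shell
# Rewrite a GitHub markdown URL to its gitprint.com equivalent using
# bash parameter substitution (bash-only, not POSIX sh).
url="https://github.com/user/repo/blob/master/README.md"
print_url="${url/github.com/gitprint.com}"
echo "$print_url"
```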

webapps
converter
markdown
github
printing
pdf
6 weeks ago

Eddie Murphy and the Dangers of Counterfactual Causal Thinking About Detecting Racial Discrimination by Issa Kohler-Hausmann :: SSRN

6 weeks ago

"The model of discrimination animating some of the most common approaches to detecting discrimination in both law and social science—the counterfactual causal model—is wrong. In that model, racial discrimination is detected by measuring the “treatment effect of race,” where the treatment is conceptualized as manipulating the raced status of otherwise identical units (e.g., a person, a neighborhood, a school). Most objections to talking about race as a cause in the counterfactual model have been raised in terms of manipulability. If we cannot manipulate a person’s race at the moment of a police stop, traffic encounter, or prosecutorial charging decision, then it is impossible to detect if the person’s race was the sole cause of an unfavorable outcome. But this debate has proceeded on the wrong terms. The counterfactual causal model of discrimination is not wrong because we can’t work around the practical limits of manipulation, as evidenced by both Eddie Murphy’s comic genius in the SNL skit “White Like Me” and the entire genre of audit and correspondence studies. It is wrong because to fit the rigor of the counterfactual model of a clearly defined treatment on otherwise identical units, we must reduce race to only the signs of the category, meaning we must think race is skin color, or phenotype, or other ways we identify group status. And that is a concept mistake if one subscribes to a constructivist, as opposed to biological or genetic, conception of race. I argue that the counterfactual causal model of discrimination is based on a flawed theory of (1) what the category of race references and how it produces effects in the world and (2) what is meant when we say it is wrong to make decisions of import because of race. We cannot detect actions as discriminatory by identifying a relation of counterfactual causality; we can only do so by reasoning about its distinctive wrongfulness by referencing what constitutes the very categories that are the objects of concern."

discrimination
race
law
reasoning
causality
counterfactual
6 weeks ago

[1810.10703] K For The Price Of 1: Parameter Efficient Multi-task And Transfer Learning

7 weeks ago

We introduce a novel method that enables parameter-efficient transfer and multitask learning. The basic approach is to allow a model patch - a small set of parameters - to specialize to each task, instead of fine-tuning the last layer or the entire network. For instance, we show that learning a set of scales and biases allows a network to learn a completely different embedding that could be used for different tasks (such as converting an SSD detection model into a 1000-class classification model while reusing 98% of the parameters of the feature extractor). Similarly, we show that re-learning the existing low-parameter layers (such as depth-wise convolutions) also improves accuracy significantly. Our approach allows both simultaneous (multi-task) learning and sequential transfer learning, wherein we adapt pretrained networks to solve new problems. For multi-task learning, despite using far fewer parameters than traditional logits-only fine-tuning, we match single-task performance.
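The "model patch" idea can be sketched in a few lines: a feature extractor is frozen and shared, and each task trains only a per-feature scale and bias vector. A minimal numpy illustration (the linear-plus-ReLU extractor and the task names are assumptions, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_features(x, W):
    """Shared, frozen feature extractor (here just a linear map + ReLU)."""
    return np.maximum(x @ W, 0.0)

W = rng.normal(size=(4, 8))  # frozen, shared across all tasks

# The per-task "model patch": only these scales and biases are trainable.
patches = {
    "task_a": {"scale": np.ones(8), "bias": np.zeros(8)},
    "task_b": {"scale": np.full(8, 0.5), "bias": np.full(8, 0.1)},
}

def task_features(x, task):
    """Apply the shared extractor, then the task-specific patch."""
    p = patches[task]
    return frozen_features(x, W) * p["scale"] + p["bias"]

x = rng.normal(size=(2, 4))
print(task_features(x, "task_a").shape)  # (2, 8)
```

The parameter count per task here is 2×8 versus 4×8 for the shared extractor; at realistic network sizes that gap is what makes the approach parameter-efficient.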

transfer-learning
7 weeks ago
