**classifiers** (226)

[1709.05862] Recognizing Objects In-the-wild: Where Do We Stand?

9 weeks ago by cshalizi

"The ability to recognize objects is an essential skill for a robotic system acting in human-populated environments. Despite decades of effort from the robotic and vision research communities, robots are still missing good visual perceptual systems, preventing the use of autonomous agents for real-world applications. The progress is slowed down by the lack of a testbed able to accurately represent the world perceived by the robot in-the-wild. In order to fill this gap, we introduce a large-scale, multi-view object dataset collected with an RGB-D camera mounted on a mobile robot. The dataset embeds the challenges faced by a robot in a real-life application and provides a useful tool for validating object recognition algorithms. Besides describing the characteristics of the dataset, the paper evaluates the performance of a collection of well-established deep convolutional networks on the new dataset and analyzes the transferability of deep representations from Web images to robotic data. Despite the promising results obtained with such representations, the experiments demonstrate that object classification with real-life robotic data is far from being solved. Finally, we provide a comparative study to analyze and highlight the open challenges in robot vision, explaining the discrepancies in the performance."

to:NB
machine_learning
neural_networks
your_favorite_deep_neural_network_sucks
classifiers
to_read
via:melanie_mitchell

Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning

september 2018 by cshalizi

"Randomized neural networks are immortalized in this AI Koan: In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6. 'What are you doing?' asked Minsky. 'I am training a randomly wired neural net to play tic-tac-toe,' Sussman replied. 'Why is the net wired randomly?' asked Minsky. Sussman replied, 'I do not want it to have any preconceptions of how to play.' Minsky then shut his eyes. 'Why do you close your eyes?' Sussman asked his teacher. 'So that the room will be empty,' replied Minsky. At that moment, Sussman was enlightened. We analyze shallow random networks with the help of concentration of measure inequalities. Specifically, we consider architectures that compute a weighted sum of their inputs after passing them through a bank of arbitrary randomized nonlinearities. We identify conditions under which these networks exhibit good classification performance, and bound their test error in terms of the size of the dataset and the number of random nonlinearities."
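A minimal sketch of the architecture the abstract describes: inputs pass through a fixed bank of randomly drawn nonlinearities, and only the weights of the final sum are fitted. The abstract says only "arbitrary randomized nonlinearities"; the cosine features with Gaussian frequencies, the `bandwidth` parameter, the toy XOR-style data, and the plain gradient-descent fit are all assumptions made for illustration, not the paper's procedure.

```python
import math
import random

def make_random_features(dim, n_features, bandwidth, rng):
    """Draw a fixed bank of random nonlinearities x -> cos(w.x + b).

    The cosine form and Gaussian frequencies are an assumed choice.
    """
    ws = [[rng.gauss(0.0, 1.0 / bandwidth) for _ in range(dim)]
          for _ in range(n_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def featurize(x):
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(ws, bs)]
    return featurize

def half_mse(Z, y, alpha):
    """Half the mean squared error of the weighted sum Z @ alpha."""
    n = len(Z)
    return sum((sum(a * z for a, z in zip(alpha, row)) - t) ** 2
               for row, t in zip(Z, y)) / (2.0 * n)

def fit_weights(Z, y, steps=200, lr=0.5):
    """Fit only the output weights by full-batch gradient descent.

    The random features themselves are never trained.
    """
    n, d = len(Z), len(Z[0])
    alpha = [0.0] * d
    for _ in range(steps):
        residuals = [sum(a * z for a, z in zip(alpha, row)) - t
                     for row, t in zip(Z, y)]
        for j in range(d):
            grad_j = sum(r * row[j] for r, row in zip(residuals, Z)) / n
            alpha[j] -= lr * grad_j
    return alpha

# Toy problem that no linear separator on the raw inputs can solve:
# the label is the sign of x1 * x2.
rng = random.Random(0)
points = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(40)]
labels = [1.0 if p[0] * p[1] > 0 else -1.0 for p in points]

featurize = make_random_features(dim=2, n_features=50, bandwidth=0.5, rng=rng)
Z = [featurize(p) for p in points]

loss_before = half_mse(Z, labels, [0.0] * 50)
alpha = fit_weights(Z, labels)
loss_after = half_mse(Z, labels, alpha)
print(loss_before, loss_after)
```

Because the feature bank is frozen, fitting reduces to a convex least-squares problem in `alpha`, which is the sense in which randomization replaces minimization over the hidden layer.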

--- Have I never bookmarked this before?

in_NB
approximation
kernel_methods
random_projections
statistics
prediction
classifiers
rahimi.ali
recht.benjamin
machine_learning
have_read

cultural cognition project - Cultural Cognition Blog - Humans using statistical models are embarrassingly bad at predicting Supreme Court decisions....

september 2018 by cshalizi

Ouch. This would make a great teaching example, if replication data is available.

track_down_references
classifiers
law
evisceration
to_teach
kahan.dan

[1808.07593] Pathologies in information bottleneck for deterministic supervised learning

august 2018 by cshalizi

"Information bottleneck (IB) is a method for extracting information from one random variable X that is relevant for predicting another random variable Y. To do so, IB identifies an intermediate "bottleneck" variable T that has low mutual information I(X;T) and high mutual information I(Y;T). The "IB curve" characterizes the set of bottleneck variables that achieve maximal I(Y;T) for a given I(X;T), and is typically explored by optimizing the "IB Lagrangian", I(Y;T)−βI(X;T). Recently, there has been interest in applying IB to supervised learning, particularly for classification problems that use neural networks. In most classification problems, the output class Y is a deterministic function of the input X, which we refer to as "deterministic supervised learning". We demonstrate three pathologies that arise when IB is used in any scenario where Y is a deterministic function of X: (1) the IB curve cannot be recovered by optimizing the IB Lagrangian for different values of β; (2) there are "uninteresting" solutions at all points of the IB curve; and (3) for classifiers that achieve low error rates, the activity of different hidden layers will not exhibit a strict trade-off between compression and prediction, contrary to a recent proposal. To address problem (1), we propose a functional that, unlike the IB Lagrangian, can recover the IB curve in all cases. We finish by demonstrating these issues on the MNIST dataset."

to:NB
to_read
information_theory
classifiers
information_bottleneck
via:ded-maxim

Classifiers · PyPI

august 2018 by phatblat

The Python Package Index (PyPI) is a repository of software for the Python programming language.

pypi
python
classifiers
package
tags
index

[1805.10204] Adversarial examples from computational constraints

may 2018 by cshalizi

"Why are classifiers in high dimension vulnerable to "adversarial" perturbations? We show that it is likely not due to information theoretic limitations, but rather it could be due to computational constraints.

"First we prove that, for a broad set of classification tasks, the mere existence of a robust classifier implies that it can be found by a possibly exponential-time algorithm with relatively few training examples. Then we give a particular classification task where learning a robust classifier is computationally intractable. More precisely we construct a binary classification task in high dimensional space which is (i) information theoretically easy to learn robustly for large perturbations, (ii) efficiently learnable (non-robustly) by a simple linear separator, (iii) yet is not efficiently robustly learnable, even for small perturbations, by any algorithm in the statistical query (SQ) model. This example gives an exponential separation between classical learning and robust learning in the statistical query model. It suggests that adversarial examples may be an unavoidable byproduct of computational limitations of learning algorithms."

in_NB
adversarial_examples
computational_complexity
machine_learning
classifiers
have_read
bubeck.sebastien

[1802.01396] To understand deep learning we need to understand kernel learning

march 2018 by cshalizi

"Generalization performance of classifiers in deep learning has recently become a subject of intense study. Heavily over-parametrized deep models tend to fit training data exactly. Despite overfitting, they perform well on test data, a phenomenon not yet fully understood.

"The first point of our paper is that strong performance of overfitted classifiers is not a unique feature of deep learning. Using real-world and synthetic datasets, we establish that kernel classifiers trained to have zero classification error (overfitting) or even zero regression error (interpolation) perform very well on test data.

"We proceed to prove lower bounds on the norm of overfitted solutions for smooth kernels, showing that they increase nearly exponentially with the data size. Since most generalization bounds depend polynomially on the norm of the solution, this result implies that they diverge as data increases. Furthermore, the existing bounds do not apply to interpolated classifiers.

"We also show experimentally that (non-smooth) Laplacian kernels easily fit random labels using a version of SGD, a finding that parallels results reported for ReLU neural networks. In contrast, fitting noisy data requires many more epochs for smooth Gaussian kernels. The observation that the performance of overfitted Laplacian and Gaussian classifiers on the test is quite similar, suggests that generalization is tied to the properties of the kernel function rather than the optimization process.

"We see that some key phenomena of deep learning are manifested similarly in kernel methods in the overfitted regime. We argue that progress on understanding deep learning will be difficult, until more analytically tractable "shallow" kernel methods are better understood. The experimental and theoretical results presented in this paper indicate a need for new theoretical ideas for understanding classical kernel methods."

--- Of course, this also makes me wonder whether there really are practical advantages to deep networks, over and above what we'd get by throwing resources at kernels...

to:NB
neural_networks
kernel_methods
classifiers
regression
statistics
computational_statistics
learning_theory
belkin.mikhail

Model Selection via the VC Dimension

march 2018 by cshalizi

"We develop an objective function that can be readily optimized to give an estimator of the Vapnik-Chervonenkis dimension for regression problems. We verify our estimator is consistent and performs well in simulations. We use our estimator on two datasets both acknowledged to be difficult and see that it gives results that are comparable in quality if not better than established techniques such as the Bayes information criterion, two forms of empirical risk minimization, and two sparsity methods."

--- If they really fixed the issues with the Vapnik et al. (1994) results, this would be very nice.

to:NB
learning_theory
model_selection
statistics
regression
classifiers
vc-dimension
to_read

Letting neural networks be weird • Do neural nets dream of electric sheep?

march 2018 by cshalizi

This delightful example replaces the (sadly, apparently apocryphal) one about tanks.

neural_networks
classifiers
to_teach:data-mining
via:henry_farrell
machine_learning
sheep
your_favorite_deep_neural_network_sucks

[1702.04690] Simple rules for complex decisions

january 2018 by cshalizi

"From doctors diagnosing patients to judges setting bail, experts often base their decisions on experience and intuition rather than on statistical models. While understandable, relying on intuition over models has often been found to result in inferior outcomes. Here we present a new method, select-regress-and-round, for constructing simple rules that perform well for complex decisions. These rules take the form of a weighted checklist, can be applied mentally, and nonetheless rival the performance of modern machine learning algorithms. Our method for creating these rules is itself simple, and can be carried out by practitioners with basic statistics knowledge. We demonstrate this technique with a detailed case study of judicial decisions to release or detain defendants while they await trial. In this application, as in many policy settings, the effects of proposed decision rules cannot be directly observed from historical data: if a rule recommends releasing a defendant that the judge in reality detained, we do not observe what would have happened under the proposed action. We address this key counterfactual estimation problem by drawing on tools from causal inference. We find that simple rules significantly outperform judges and are on par with decisions derived from random forests trained on all available features. Generalizing to 22 varied decision-making domains, we find this basic result replicates. We conclude with an analytical framework that helps explain why these simple decision rules perform as well as they do."
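A rough sketch of what a select-regress-and-round pipeline could look like, going only by the three step names in the abstract. The details are assumptions: selection here ranks features by absolute correlation with the outcome, the regression is an unregularized logistic fit by gradient descent, and `k`, `M`, and the simulated data are hypothetical values chosen for illustration. The paper's own choices for each step may differ.

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def abs_correlation(col, y):
    # Absolute Pearson correlation; 0.0 if either side is constant.
    n = len(y)
    mc, my = sum(col) / n, sum(y) / n
    cov = sum((c - mc) * (t - my) for c, t in zip(col, y))
    var_c = sum((c - mc) ** 2 for c in col)
    var_y = sum((t - my) ** 2 for t in y)
    if var_c == 0 or var_y == 0:
        return 0.0
    return abs(cov) / math.sqrt(var_c * var_y)

def select_regress_round(X, y, k=2, M=3, steps=300, lr=0.5):
    n, p = len(X), len(X[0])

    # Select: keep the k features most correlated with the outcome
    # (an assumed stand-in for whatever selection rule the paper uses).
    scores = [abs_correlation([row[j] for row in X], y) for j in range(p)]
    chosen = sorted(range(p), key=lambda j: -scores[j])[:k]

    # Regress: logistic regression on the chosen features only.
    coef, intercept = [0.0] * k, 0.0
    for _ in range(steps):
        errs = [sigmoid(intercept + sum(c * row[j] for c, j in zip(coef, chosen))) - t
                for row, t in zip(X, y)]
        intercept -= lr * sum(errs) / n
        for i, j in enumerate(chosen):
            coef[i] -= lr * sum(e * row[j] for e, row in zip(errs, X)) / n

    # Round: rescale so the largest coefficient has magnitude M,
    # then round every coefficient to the nearest integer.
    biggest = max(abs(c) for c in coef)
    weights = [0] * p
    if biggest > 0:
        for c, j in zip(coef, chosen):
            weights[j] = round(c * M / biggest)
    return weights

def checklist_score(weights, x):
    # Applying the rule is a small integer sum, doable by hand.
    return sum(w * xi for w, xi in zip(weights, x))

# Simulated binary features; only the first two drive the outcome.
rng = random.Random(1)
X, y = [], []
for _ in range(200):
    x = [rng.randint(0, 1) for _ in range(4)]
    prob = sigmoid(-1.0 + 2.0 * x[0] + 1.0 * x[1])
    X.append(x)
    y.append(1 if rng.random() < prob else 0)

weights = select_regress_round(X, y, k=2, M=3)
print(weights, checklist_score(weights, [1, 1, 0, 0]))
```

The result is a weighted checklist in the abstract's sense: a handful of small integer weights, with every unselected feature given weight zero.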

to:NB
to_read
decision-making
classifiers
fast-and-frugal_heuristics
heuristics
clinical-vs-actuarial_prediction
prediction
crime
bail
via:vaguery
