classifiers   226

[1709.05862] Recognizing Objects In-the-wild: Where Do We Stand?
"The ability to recognize objects is an essential skill for a robotic system acting in human-populated environments. Despite decades of effort from the robotic and vision research communities, robots are still missing good visual perceptual systems, preventing the use of autonomous agents for real-world applications. The progress is slowed down by the lack of a testbed able to accurately represent the world perceived by the robot in-the-wild. In order to fill this gap, we introduce a large-scale, multi-view object dataset collected with an RGB-D camera mounted on a mobile robot. The dataset embeds the challenges faced by a robot in a real-life application and provides a useful tool for validating object recognition algorithms. Besides describing the characteristics of the dataset, the paper evaluates the performance of a collection of well-established deep convolutional networks on the new dataset and analyzes the transferability of deep representations from Web images to robotic data. Despite the promising results obtained with such representations, the experiments demonstrate that object classification with real-life robotic data is far from being solved. Finally, we provide a comparative study to analyze and highlight the open challenges in robot vision, explaining the discrepancies in the performance."
to:NB  machine_learning  neural_networks  your_favorite_deep_neural_network_sucks  classifiers  to_read  via:melanie_mitchell 
10 days ago by cshalizi
Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning
"Randomized neural networks are immortalized in this AI Koan: In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6. 'What are you doing?' asked Minsky. 'I am training a randomly wired neural net to play tic-tac-toe,' Sussman replied. 'Why is the net wired randomly?' asked Minsky. Sussman replied, 'I do not want it to have any preconceptions of how to play.' Minsky then shut his eyes. 'Why do you close your eyes?' Sussman asked his teacher. 'So that the room will be empty,' replied Minsky. At that moment, Sussman was enlightened. We analyze shallow random networks with the help of concentration of measure inequalities. Specifically, we consider architectures that compute a weighted sum of their inputs after passing them through a bank of arbitrary randomized nonlinearities. We identify conditions under which these networks exhibit good classification performance, and bound their test error in terms of the size of the dataset and the number of random nonlinearities."

--- Have I never bookmarked this before?
in_NB  approximation  kernel_methods  random_projections  statistics  prediction  classifiers  rahimi.ali  recht.benjamin  machine_learning  have_read 
5 weeks ago by cshalizi
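A rough sketch of the architecture the Random Kitchen Sinks abstract describes: draw a bank of random nonlinearities once, leave them fixed, and fit only the weights of the final sum. The cosine features, the Gaussian draws, and the normalized least-mean-squares fit are all assumptions made for illustration; the abstract specifies none of them, and every function name here is hypothetical.

```python
import math
import random


def make_random_features(dim, n_features, scale=1.0, seed=0):
    """Draw a fixed bank of random nonlinearities x -> cos(w . x + b).

    Cosine features with Gaussian w and uniform b are an assumed choice.
    """
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, scale) for _ in range(dim)]
               for _ in range(n_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]

    def featurize(x):
        return [math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return featurize


def fit_output_weights(features, labels, epochs=50, lr=0.5):
    """Fit only the output weights; the random bank is never trained.

    Uses normalized least-mean-squares on squared error (an assumed
    stand-in for whatever risk minimization one prefers).
    """
    alpha = [0.0] * len(features[0])
    for _ in range(epochs):
        for phi, y in zip(features, labels):
            err = sum(a * p for a, p in zip(alpha, phi)) - y
            norm_sq = sum(p * p for p in phi) + 1e-12
            alpha = [a - lr * err * p / norm_sq for a, p in zip(alpha, phi)]
    return alpha


def predict(alpha, phi):
    """Classify by the sign of the weighted sum of random features."""
    return 1 if sum(a * p for a, p in zip(alpha, phi)) >= 0 else -1
```

Only `fit_output_weights` does any optimization, and it is a linear problem in `alpha`; that is the sense in which randomization replaces minimization over the hidden layer.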
[1808.07593] Pathologies in information bottleneck for deterministic supervised learning
"Information bottleneck (IB) is a method for extracting information from one random variable X that is relevant for predicting another random variable Y. To do so, IB identifies an intermediate "bottleneck" variable T that has low mutual information I(X;T) and high mutual information I(Y;T). The "IB curve" characterizes the set of bottleneck variables that achieve maximal I(Y;T) for a given I(X;T), and is typically explored by optimizing the "IB Lagrangian", I(Y;T)−βI(X;T). Recently, there has been interest in applying IB to supervised learning, particularly for classification problems that use neural networks. In most classification problems, the output class Y is a deterministic function of the input X, which we refer to as "deterministic supervised learning". We demonstrate three pathologies that arise when IB is used in any scenario where Y is a deterministic function of X: (1) the IB curve cannot be recovered by optimizing the IB Lagrangian for different values of β; (2) there are "uninteresting" solutions at all points of the IB curve; and (3) for classifiers that achieve low error rates, the activity of different hidden layers will not exhibit a strict trade-off between compression and prediction, contrary to a recent proposal. To address problem (1), we propose a functional that, unlike the IB Lagrangian, can recover the IB curve in all cases. We finish by demonstrating these issues on the MNIST dataset."
to:NB  to_read  information_theory  classifiers  information_bottleneck  via:ded-maxim 
8 weeks ago by cshalizi
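To make the deterministic setting in the information-bottleneck abstract concrete, here is a small sketch that evaluates the IB Lagrangian I(Y;T) − βI(X;T) on a finite toy problem where Y = f(X). It assumes the bottleneck T is itself a deterministic function of X, which is a simplification (IB in general allows stochastic encoders), and the function names are hypothetical.

```python
import math
from collections import defaultdict


def mutual_information(joint):
    """I(A;B) in bits for a joint distribution given as {(a, b): probability}."""
    p_a = defaultdict(float)
    p_b = defaultdict(float)
    for (a, b), p in joint.items():
        p_a[a] += p
        p_b[b] += p
    return sum(p * math.log2(p / (p_a[a] * p_b[b]))
               for (a, b), p in joint.items() if p > 0)


def ib_lagrangian(p_x, f, encoder, beta):
    """I(Y;T) - beta * I(X;T) with Y = f(X) and T = encoder(X).

    Both maps are deterministic here; that is an assumption of this sketch.
    """
    joint_xt = defaultdict(float)
    joint_yt = defaultdict(float)
    for x, p in p_x.items():
        t = encoder(x)
        joint_xt[(x, t)] += p
        joint_yt[(f(x), t)] += p
    return mutual_information(joint_yt) - beta * mutual_information(joint_xt)


# Toy problem: X uniform on {0, 1, 2, 3}, label Y = X mod 2.
p_x = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
label = lambda x: x % 2
keep_everything = ib_lagrangian(p_x, label, lambda x: x, beta=0.25)
keep_label_only = ib_lagrangian(p_x, label, label, beta=0.25)
```

On this toy problem, keeping all of X costs 2 bits of I(X;T) for 1 bit of I(Y;T), while T = Y gets the same 1 bit of I(Y;T) for only 1 bit of I(X;T), so the second encoder scores higher for any β > 0.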
Classifiers · PyPI
The Python Package Index (PyPI) is a repository of software for the Python programming language.
pypi  python  classifiers  package  tags  index 
10 weeks ago by phatblat
[1805.10204] Adversarial examples from computational constraints
"Why are classifiers in high dimension vulnerable to "adversarial" perturbations? We show that it is likely not due to information theoretic limitations, but rather it could be due to computational constraints.
"First we prove that, for a broad set of classification tasks, the mere existence of a robust classifier implies that it can be found by a possibly exponential-time algorithm with relatively few training examples. Then we give a particular classification task where learning a robust classifier is computationally intractable. More precisely we construct a binary classification task in high dimensional space which is (i) information theoretically easy to learn robustly for large perturbations, (ii) efficiently learnable (non-robustly) by a simple linear separator, (iii) yet is not efficiently robustly learnable, even for small perturbations, by any algorithm in the statistical query (SQ) model. This example gives an exponential separation between classical learning and robust learning in the statistical query model. It suggests that adversarial examples may be an unavoidable byproduct of computational limitations of learning algorithms."
in_NB  adversarial_examples  computational_complexity  machine_learning  classifiers  have_read  bubeck.sebastien 
may 2018 by cshalizi
[1802.01396] To understand deep learning we need to understand kernel learning
"Generalization performance of classifiers in deep learning has recently become a subject of intense study. Heavily over-parametrized deep models tend to fit training data exactly. Despite overfitting, they perform well on test data, a phenomenon not yet fully understood.
"The first point of our paper is that strong performance of overfitted classifiers is not a unique feature of deep learning. Using real-world and synthetic datasets, we establish that kernel classifiers trained to have zero classification error (overfitting) or even zero regression error (interpolation) perform very well on test data.
"We proceed to prove lower bounds on the norm of overfitted solutions for smooth kernels, showing that they increase nearly exponentially with the data size. Since most generalization bounds depend polynomially on the norm of the solution, this result implies that they diverge as data increases. Furthermore, the existing bounds do not apply to interpolated classifiers.
"We also show experimentally that (non-smooth) Laplacian kernels easily fit random labels using a version of SGD, a finding that parallels results reported for ReLU neural networks. In contrast, fitting noisy data requires many more epochs for smooth Gaussian kernels. The observation that the performance of overfitted Laplacian and Gaussian classifiers on the test is quite similar, suggests that generalization is tied to the properties of the kernel function rather than the optimization process.
"We see that some key phenomena of deep learning are manifested similarly in kernel methods in the overfitted regime. We argue that progress on understanding deep learning will be difficult, until more analytically tractable "shallow" kernel methods are better understood. The experimental and theoretical results presented in this paper indicate a need for new theoretical ideas for understanding classical kernel methods."

--- Of course, this also makes me wonder whether there really are practical advantages to deep networks, over and above what we'd get by throwing resources at kernels...
to:NB  neural_networks  kernel_methods  classifiers  regression  statistics  computational_statistics  learning_theory  belkin.mikhail 
march 2018 by cshalizi
Model Selection via the VC Dimension
"We develop an objective function that can be readily optimized to give an estimator of the Vapnik-Chervonenkis dimension for regression problems. We verify our estimator is consistent and performs well in simulations. We use our estimator on two datasets both acknowledged to be difficult and see that it gives results that are comparable in quality if not better than established techniques such as the Bayes information criterion, two forms of empirical risk minimization, and two sparsity methods."

--- If they really fixed the issues with the Vapnik et al. (1994) results, this would be very nice.
to:NB  learning_theory  model_selection  statistics  regression  classifiers  vc-dimension  to_read 
march 2018 by cshalizi
[1702.04690] Simple rules for complex decisions
"From doctors diagnosing patients to judges setting bail, experts often base their decisions on experience and intuition rather than on statistical models. While understandable, relying on intuition over models has often been found to result in inferior outcomes. Here we present a new method, select-regress-and-round, for constructing simple rules that perform well for complex decisions. These rules take the form of a weighted checklist, can be applied mentally, and nonetheless rival the performance of modern machine learning algorithms. Our method for creating these rules is itself simple, and can be carried out by practitioners with basic statistics knowledge. We demonstrate this technique with a detailed case study of judicial decisions to release or detain defendants while they await trial. In this application, as in many policy settings, the effects of proposed decision rules cannot be directly observed from historical data: if a rule recommends releasing a defendant that the judge in reality detained, we do not observe what would have happened under the proposed action. We address this key counterfactual estimation problem by drawing on tools from causal inference. We find that simple rules significantly outperform judges and are on par with decisions derived from random forests trained on all available features. Generalizing to 22 varied decision-making domains, we find this basic result replicates. We conclude with an analytical framework that helps explain why these simple decision rules perform as well as they do."
to:NB  to_read  decision-making  classifiers  fast-and-frugal_heuristics  heuristics  clinical-vs-actuarial_prediction  prediction  crime  bail  via:vaguery 
january 2018 by cshalizi
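A sketch of what a "weighted checklist" could look like in code, for the select-regress-and-round abstract. It covers only a rounding step and the scoring of a case. The rescale-then-round rule, the `max_weight` parameter, and the coefficients are assumptions for illustration and are not taken from the abstract; the select and regress steps are omitted entirely.

```python
def round_to_checklist(coefficients, max_weight=3):
    """Rescale so the largest |coefficient| equals max_weight, then round.

    This rule is an assumed reading of the "round" step, not a quoted one.
    """
    largest = max(abs(c) for c in coefficients.values())
    if largest == 0:
        return {name: 0 for name in coefficients}
    return {name: round(max_weight * c / largest)
            for name, c in coefficients.items()}


def checklist_score(weights, case):
    """Add up the integer weights of the features present in a case."""
    return sum(w * case.get(name, 0) for name, w in weights.items())


# Hypothetical regression coefficients, made up for illustration only.
coefs = {"a": 0.9, "b": -0.3, "c": 0.5}
weights = round_to_checklist(coefs)
score = checklist_score(weights, {"a": 1, "c": 1})
```

With small integer weights, scoring a case is mental arithmetic: add the weights of whichever items apply and compare the total to a threshold.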
