**generalization**298

The Disproportional Power of Anecdotes

8 days ago by dandv

increase your sample size when making decisions. You need enough information to be able to plot the range of possibilities, identify the outliers, and define the average.

"The Canadian Encyclopedia states, “If the nearly 29 million (figure unadjusted) in gold that was recovered during the heady years of 1897 to 1899 [in the Klondike] was divided equally among all those who participated in the gold rush, the amount would fall far short of the total they had invested in time and money.”

How did this happen? Because those miners took anecdotes as being representative of a broader reality. Quite literally, they learned mining from rumor, and didn’t develop any real knowledge. Most people fought for claims along the creeks, where easy gold had been discovered, while rejecting the bench claims on the hillsides above, which often had just as much gold.

You may be thinking that these men must have been desperate if they packed themselves up, heading into unknown territory, facing multiple dangers along the way, to chase a dream of easy money. But most of us aren’t that different. How many times have you invested in a “hot stock” on a tip from one person, only to have the company go under within a year? Ultimately, the smaller the sample size, the greater role the factors of chance play in determining an outcome.”

many scientific psychology studies use college students who are predominantly Western, Educated, Industrialized, Rich, and Democratic (WEIRD), and then draw conclusions about the entire human race from these outliers. They reviewed scientific literature from domains such as “visual perception, fairness, cooperation, spatial reasoning, categorization and inferential induction, moral reasoning, and the heritability of IQ. The findings suggest that members of WEIRD societies, including young children, are among the least representative populations one could find for generalizing about humans.”

this particular subpopulation is highly unrepresentative of the species.”

decision
making
against
generalization
bias
gold
"The Canadian Encyclopedia states, “If the nearly 29 million (figure unadjusted) in gold that was recovered during the heady years of 1897 to 1899 [in the Klondike] was divided equally among all those who participated in the gold rush, the amount would fall far short of the total they had invested in time and money.”

How did this happen? Because those miners took anecdotes as being representative of a broader reality. Quite literally, they learned mining from rumor, and didn’t develop any real knowledge. Most people fought for claims along the creeks, where easy gold had been discovered, while rejecting the bench claims on the hillsides above, which often had just as much gold.

You may be thinking that these men must have been desperate if they packed themselves up, heading into unknown territory, facing multiple dangers along the way, to chase a dream of easy money. But most of us aren’t that different. How many times have you invested in a “hot stock” on a tip from one person, only to have the company go under within a year? Ultimately, the smaller the sample size, the greater role the factors of chance play in determining an outcome.”

many scientific psychology studies use college students who are predominantly Western, Educated, Industrialized, Rich, and Democratic (WEIRD), and then draw conclusions about the entire human race from these outliers. They reviewed scientific literature from domains such as “visual perception, fairness, cooperation, spatial reasoning, categorization and inferential induction, moral reasoning, and the heritability of IQ. The findings suggest that members of WEIRD societies, including young children, are among the least representative populations one could find for generalizing about humans.”

this particular subpopulation is highly unrepresentative of the species.”

8 days ago by dandv

[1902.10286] On Multi-Cause Causal Inference with Unobserved Confounding: Counterexamples, Impossibility, and Alternatives

5 weeks ago by Vaguery

Unobserved confounding is a central barrier to drawing causal inferences from observational data. Several authors have recently proposed that this barrier can be overcome in the case where one attempts to infer the effects of several variables simultaneously. In this paper, we present two simple, analytical counterexamples that challenge the general claims that are central to these approaches. In addition, we show that nonparametric identification is impossible in this setting. We discuss practical implications, and suggest alternatives to the methods that have been proposed so far in this line of work: using proxy variables and shifting focus to sensitivity analysis.

via:cshalizi
machine-learning
whereof-one-cannot-speak
generalization
dreaming
algorithms
induction?
counterexamples
5 weeks ago by Vaguery

[1706.06083] Towards Deep Learning Models Resistant to Adversarial Attacks

7 weeks ago by Vaguery

Recent work has demonstrated that neural networks are vulnerable to adversarial examples, i.e., inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of deep learning models. To address this problem, we study the adversarial robustness of neural networks through the lens of robust optimization. This approach provides us with a broad and unifying view on much of the prior work on this topic. Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal. In particular, they specify a concrete security guarantee that would protect against any adversary. These methods let us train networks with significantly improved resistance to a wide range of adversarial attacks. They also suggest the notion of security against a first-order adversary as a natural and broad security guarantee. We believe that robustness against such well-defined classes of adversaries is an important stepping stone towards fully resistant deep learning models.

machine-learning
adversarial-learning
coevolution
generalization
optical-illusions
feature-extraction
feature-construction
neural-networks
rather-interesting
to-write-about
7 weeks ago by Vaguery

OSF | Near and Far Transfer in Cognitive Training: A Second-Order Meta- Analysis

8 weeks ago by nhaliday

In Models 1 (k = 99) and 2 (k = 119), we investigated the impact of working-memory training on near-transfer (i.e., memory) and far-transfer (e.g., reasoning, speed, and language) measures, respectively, and whether it is mediated by the type of population. Model 3 (k = 233) extended Model 2 by adding six meta-analyses assessing the far-transfer effects of other cognitive-training programs (video-games, music, chess, and exergames). Model 1 showed that working-memory training does induce near transfer, and that the size of this effect is moderated by the type of population. By contrast, Models 2 and 3 highlighted that far-transfer effects are small or null.

study
preprint
psychology
cog-psych
intelligence
generalization
dimensionality
psych-architecture
intervention
enhancement
practice
8 weeks ago by nhaliday

[1901.10861] A Simple Explanation for the Existence of Adversarial Examples with Small Hamming Distance

9 weeks ago by Vaguery

The existence of adversarial examples in which an imperceptible change in the input can fool well trained neural networks was experimentally discovered by Szegedy et al in 2013, who called them "Intriguing properties of neural networks". Since then, this topic had become one of the hottest research areas within machine learning, but the ease with which we can switch between any two decisions in targeted attacks is still far from being understood, and in particular it is not clear which parameters determine the number of input coordinates we have to change in order to mislead the network. In this paper we develop a simple mathematical framework which enables us to think about this baffling phenomenon from a fresh perspective, turning it into a natural consequence of the geometry of ℝn with the L0 (Hamming) metric, which can be quantitatively analyzed. In particular, we explain why we should expect to find targeted adversarial examples with Hamming distance of roughly m in arbitrarily deep neural networks which are designed to distinguish between m input classes.

machine-learning
robustness
adversarial-examples
rather-interesting
good-explanations
to-write-about
computational-vs-systems
generalization
feature-construction
saliency
9 weeks ago by Vaguery

Modern Neural Networks Generalize on Small Data Sets

december 2018 by arsyed

In this paper, we use a linear program to empirically decompose fitted neural networks into ensembles of low-bias sub-networks. We show that these sub-networks are relatively uncorrelated which leads to an internal regularization process, very much like a random forest, which can explain why a neural network is surprisingly resistant to overfitting. We then demonstrate this in practice by applying large neural networks, with hundreds of parameters per training observation, to a collection of 116 real-world data sets from the UCI Machine Learning Repository. This collection of data sets contains a much smaller number of training examples than the types of image classification tasks generally studied in the deep learning literature, as well as non-trivial label noise. We show that even in this setting deep neural nets are capable of achieving superior classification accuracy without overfitting.

neural-net
generalization
small-data
richard-berk
december 2018 by arsyed

[1808.01204] Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data

december 2018 by arsyed

Neural networks have many successful applications, while much less theoretical understanding has been gained. Towards bridging this gap, we study the problem of learning a two-layer overparameterized ReLU neural network for multi-class classification via stochastic gradient descent (SGD) from random initialization. In the overparameterized setting, when the data comes from mixtures of well-separated distributions, we prove that SGD learns a network with a small generalization error, albeit the network has enough capacity to fit arbitrary labels. Furthermore, the analysis provides interesting insights into several aspects of learning neural networks and can be verified based on empirical studies on synthetic data and on the MNIST dataset.

neural-net
sgd
generalization
december 2018 by arsyed

[1810.12282] Assessing Generalization in Deep Reinforcement Learning

november 2018 by arsyed

Deep reinforcement learning (RL) has achieved breakthrough results on many tasks, but has been shown to be sensitive to system changes at test time. As a result, building deep RL agents that generalize has become an active research area. Our aim is to catalyze and streamline community-wide progress on this problem by providing the first benchmark and a common experimental protocol for investigating generalization in RL. Our benchmark contains a diverse set of environments and our evaluation methodology covers both in-distribution and out-of-distribution generalization. To provide a set of baselines for future research, we conduct a systematic evaluation of deep RL algorithms, including those that specifically tackle the problem of generalization.

reinforcement-learning
generalization
november 2018 by arsyed

Generalization guides human exploration in vast decision spaces | bioRxiv

november 2018 by arsyed

From foraging for food to learning complex games, many aspects of human behaviour can be framed as a search problem with a vast space of possible actions. Under finite search horizons, optimal solutions are generally unobtainable. Yet how do humans navigate vast problem spaces, which require intelligent exploration of unobserved actions? Using a variety of bandit tasks with up to 121 arms, we study how humans search for rewards under limited search horizons, where the spatial correlation of rewards (in both generated and natural environments) provides traction for generalization. Across a variety of different probabilistic and heuristic models, we find evidence that Gaussian Process function learning--combined with an optimistic Upper Confidence Bound sampling strategy--provides a robust account of how people use generalization to guide search. Our modelling results and parameter estimates are recoverable, and can be used to simulate human-like performance, providing novel insights about human behaviour in complex environments.

decision
human
exploration
generalization
november 2018 by arsyed

[1811.03270] An Optimal Transport View on Generalization

november 2018 by arsyed

We derive upper bounds on the generalization error of learning algorithms based on their \emph{algorithmic transport cost}: the expected Wasserstein distance between the output hypothesis and the output hypothesis conditioned on an input example. The bounds provide a novel approach to study the generalization of learning algorithms from an optimal transport view and impose less constraints on the loss function, such as sub-gaussian or bounded. We further provide several upper bounds on the algorithmic transport cost in terms of total variation distance, relative entropy (or KL-divergence), and VC dimension, thus further bridging optimal transport theory and information theory with statistical learning theory. Moreover, we also study different conditions for loss functions under which the generalization error of a learning algorithm can be upper bounded by different probability metrics between distributions relating to the output hypothesis and/or the input data. Finally, under our established framework, we analyze the generalization in deep learning and conclude that the generalization error in deep neural networks (DNNs) decreases exponentially to zero as the number of layers increases. Our analyses of generalization error in deep learning mainly exploit the hierarchical structure in DNNs and the contraction property of f-divergence, which may be of independent interest in analyzing other learning models with hierarchical structure.

optimal-transport
generalization
machine-learning
november 2018 by arsyed

**related tags**

Copy this bookmark: