[1902.06720] Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
A longstanding goal in deep learning research has been to precisely characterize training and generalization. However, the often complex loss landscapes of neural networks have made a theory of learning dynamics elusive. In this work, we show that for wide neural networks the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters. Furthermore, mirroring the correspondence between wide Bayesian neural networks and Gaussian processes, gradient-based training of wide neural networks with a squared loss produces test set predictions drawn from a Gaussian process with a particular compositional kernel. While these theoretical results are only exact in the infinite width limit, we nevertheless find excellent empirical agreement between the predictions of the original network and those of the linearized version even for finite practically-sized networks. This agreement is robust across different architectures, optimization methods, and loss functions.
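The first-order Taylor picture can be checked numerically. The sketch below is my own illustration (not the authors' code): it takes one gradient step on a toy wide network with NTK-style scaling and compares the new prediction against the linearized model f_lin(θ) = f(θ₀) + ∇f(θ₀)·(θ − θ₀).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy wide network with NTK-style scaling: f(x) = v @ tanh(W x) / sqrt(width)
width = 2048
W = rng.normal(size=(width, 1))
v = rng.normal(size=(width,))

def f(x, W, v):
    return v @ np.tanh(W @ x) / np.sqrt(width)

x = np.array([0.7])
y = 1.0  # regression target

# Analytic parameter gradients of f at initialization
h = np.tanh(W @ x)                                   # hidden activations
df_dv = h / np.sqrt(width)
df_dW = (v * (1 - h ** 2))[:, None] * x[None, :] / np.sqrt(width)

f0 = f(x, W, v)

# One gradient-descent step on the squared loss 0.5 * (f - y)^2
lr, g = 0.1, f0 - y
W1, v1 = W - lr * g * df_dW, v - lr * g * df_dv

# Linearized prediction: f0 + <grad f, theta_1 - theta_0>
f_lin = f0 + np.sum(df_dW * (W1 - W)) + df_dv @ (v1 - v)
f_true = f(x, W1, v1)
print(abs(f_true - f_lin))  # near zero for wide networks
```

The gap between `f_true` and `f_lin` shrinks as `width` grows, which is the paper's claim in miniature.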
neural-net  linear-models  gradient-descent 
5 hours ago
[1902.04023] Computing Extremely Accurate Quantiles Using t-Digests
"We present on-line algorithms for computing approximations of rank-based statistics that give high accuracy, particularly near the tails of a distribution, with very small sketches. Notably, the method allows a quantile q to be computed with an accuracy relative to max(q,1−q) rather than absolute accuracy as with most other methods. This new algorithm is robust with respect to skewed distributions or ordered datasets and allows separately computed summaries to be combined with no loss in accuracy.
An open-source Java implementation of this algorithm is available from the author. Independent implementations in Go and Python are also available."
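The key idea, accuracy relative to max(q, 1−q), can be sketched as a greedy compression pass that caps each cluster's weight at roughly 4nδq(1−q), so clusters near the tails stay tiny. This is a heavily simplified illustration of the t-digest size bound, not the actual algorithm.

```python
import numpy as np

def tdigest_compress(xs, delta=0.05):
    """Greedily merge sorted values into centroids whose weight stays
    below 4*n*delta*q*(1-q): tail clusters (q near 0 or 1) are forced
    to be small, which is what keeps tail quantiles accurate."""
    xs = np.sort(xs)
    n = len(xs)
    centroids = []  # (mean, weight)
    mean, weight, seen = xs[0], 1, 0
    for x in xs[1:]:
        q = (seen + weight / 2) / n  # approximate quantile of this cluster
        limit = max(1, 4 * n * delta * q * (1 - q))
        if weight + 1 <= limit:
            mean += (x - mean) / (weight + 1)  # running mean
            weight += 1
        else:
            centroids.append((mean, weight))
            seen += weight
            mean, weight = x, 1
    centroids.append((mean, weight))
    return centroids

data = np.random.default_rng(1).normal(size=10000)
cents = tdigest_compress(data)
print(len(cents), "centroids for", len(data), "points")
```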
algorithms  online  approximation  quantile 
Performance Evaluation in Machine Learning: The Good, The Bad, The Ugly and The Way Forward
"This paper gives an overview of some ways in which our understanding of performance evaluation measures for machine-learned classifiers has improved over the last twenty years. I also highlight a range of areas where this understanding is still lacking, leading to ill-advised practices in classifier evaluation. This suggests that in order to make further progress we need to develop a proper measurement theory of machine learning. I then demonstrate by example what such a measurement theory might look like and what kinds of new results it would entail. Finally, I argue that key properties such as classification ability and data set difficulty are unlikely to be directly observable, suggesting the need for latent-variable models and causal inference."
machine-learning  evaluation  measurement 
7 days ago
How Not to Count the Poor by Thomas Pogge, Sanjay G. Reddy :: SSRN
The World Bank's approach to estimating the extent, distribution and trend of global income poverty is neither meaningful nor reliable. The Bank uses an arbitrary international poverty line that is not adequately anchored in any specification of the real requirements of human beings. Moreover, it employs a concept of purchasing power equivalence that is neither well defined nor appropriate for poverty assessment. These difficulties are inherent in the Bank's "money-metric" approach and cannot be credibly overcome without dispensing with this approach altogether. In addition, the Bank extrapolates incorrectly from limited data and thereby creates an appearance of precision that masks the high probable error of its estimates. It is difficult to judge the nature and extent of the errors in global poverty estimates that these three flaws produce. However, there is reason to believe that the Bank's approach may have led it to understate the extent of global income poverty and to infer without adequate justification that global income poverty has steeply declined in the recent period. A new methodology of global poverty assessment, focused directly on what is needed to achieve elementary human requirements, is feasible and necessary. A practical approach to implementing an alternative is described.
economics  development  poverty  thomas-pogge  sanjay-reddy 
7 days ago
[1901.11373] Learning and Evaluating General Linguistic Intelligence
We define general linguistic intelligence as the ability to reuse previously acquired knowledge about a language's lexicon, syntax, semantics, and pragmatic conventions to adapt to new tasks quickly. Using this definition, we analyze state-of-the-art natural language understanding models and conduct an extensive empirical investigation to evaluate them against these criteria through a series of experiments that assess the task-independence of the knowledge being acquired by the learning process. In addition to task performance, we propose a new evaluation metric based on an online encoding of the test data that quantifies how quickly an existing agent (model) learns a new task. Our results show that while the field has made impressive progress in terms of model architectures that generalize to many tasks, these models still require a lot of in-domain training examples (e.g., for fine tuning, training task-specific modules), and are prone to catastrophic forgetting. Moreover, we find that far from solving general tasks (e.g., document question answering), our models are overfitting to the quirks of particular datasets (e.g., SQuAD). We discuss missing components and conjecture on how to make progress toward general linguistic intelligence.
evaluation  nlp  nlu 
8 days ago
[1811.03188] Solving Jigsaw Puzzles By the Graph Connection Laplacian
We propose a novel mathematical framework to address the problem of automatically solving large jigsaw puzzles. This problem assumes a large image which is cut into equal square pieces that are arbitrarily rotated and shifted and asks to recover the original image given the transformed pieces. The main contribution of this work is a theoretically-guaranteed method for recovering the unknown orientations of the puzzle pieces by using the graph connection Laplacian associated with the puzzle. Iterative application of this method and other methods for recovering the unknown shifts result in a solution for the large jigsaw puzzle problem. This solution is not greedy, unlike many other solutions. Numerical experiments demonstrate the competitive performance of the proposed method.
jigsaw  graph  laplacian 
9 days ago
Homepage — Essentia 2.1-beta5-dev documentation
"Essentia is an open-source C++ library for audio analysis and audio-based music information retrieval. It contains an extensive collection of algorithms including audio input/output functionality, standard digital signal processing blocks, statistical characterization of data, and a large set of spectral, temporal, tonal and high-level music descriptors. [...] The library is also wrapped in Python and includes a number of command-line tools and third-party extensions, which facilitate its use for fast prototyping and allow setting up research experiments very rapidly."
python  libs  audio  dsp  music  mir 
9 days ago
Understanding the bin, sbin, usr/bin , usr/sbin split
"You know how Ken Thompson and Dennis Ritchie created Unix on a PDP-7 in 1969? Well around 1971 they upgraded to a PDP-11 with a pair of RK05 disk packs (1.5 megabytes each) for storage. When the operating system grew too big to fit on the first RK05 disk pack (their root filesystem) they let it leak into the second one, which is where all the user home directories lived (which is why the mount was called /usr). They replicated all the OS directories under there (/bin, /sbin, /lib, /tmp...) and wrote files to those new directories because their original disk was out of space. When they got a third disk, they mounted it on /home and relocated all the user directories to there so the OS could consume all the space on both disks and grow to THREE WHOLE MEGABYTES (ooooh!)."
unix  filesystem  history  via:jm 
11 days ago
Convert video to audio, Catch up on your video backlog — Listen Later
"Listen Later is a free service for converting videos into an audio podcast, which makes it easier to catch up on your video backlog during chores, errands and commutes. Listen Later works with services supported by youtube-dl."
podcasts  video  audio 
12 days ago
Exploring random encoders for sentence classification - Facebook Code
"We set out to determine what was gained, if anything, by using current state-of-the-art methods rather than random methods that combine nothing but pretrained word embeddings. The power of random features has long been known in the machine learning community, so we applied it to this NLP task. We explored three methods: bag of random embedding projections, random LSTMs, and echo state networks. Our findings indicated that much of the lifting power in sentence embeddings comes from word representations. We found that random parameterizations over pretrained word embeddings constituted a very strong baseline and sometimes even matched the performance of well-known sentence encoders such as SkipThought and InferSent. These findings impose a strong baseline for research in representation learning for sentences going forward. We also made important observations about proper experimental protocol for sentence classification evaluation, together with recommendations for future research."
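The "bag of random embedding projections" baseline is easy to sketch. Everything below is a made-up stand-in (the vocabulary, the random "pretrained" embeddings, the dimensions); in the paper the embeddings would be real pretrained vectors and only the projection would be random.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "pretrained" word embeddings (the paper would use e.g. GloVe)
vocab = {"the": 0, "movie": 1, "was": 2, "great": 3, "terrible": 4}
emb = rng.normal(size=(len(vocab), 50))         # 50-d word vectors

# A fixed random projection matrix: never trained, only sampled once
proj = rng.normal(size=(50, 512)) / np.sqrt(50)

def encode(sentence):
    ids = [vocab[w] for w in sentence.split()]
    pooled = emb[ids].mean(axis=0)              # bag-of-words pooling
    return np.maximum(pooled @ proj, 0)         # random projection + ReLU

s = encode("the movie was great")
print(s.shape)  # (512,)
```

A linear classifier trained on top of `encode` is the whole baseline; only the classifier's weights are learned.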
nlp  embedding  sentence  random-features  via:hustwj 
14 days ago
The rise of the swear nerds | The Outline
"“Fuckbonnet” is a swear-pyrrhic compound. The double-n in the middle and stop consonant at the end make it fun to say, but — and this is crucial — the insult itself does not say anything. What is a fuckbonnet, exactly? Is it something you wear when you get…? Is it a hat that has fallen out of fashion and is now only good for…? There’s no discernible meaning behind the word; it only expresses contempt and the author’s vain originality. I submit that this aspect of the new swears is a feature, not a bug. The reason this formula has become so popular in our time is that it conveys the author’s outrage without running the risk of actually insulting anybody.

The guide to the formula embedded above points to this aspect of the new swears, describing them as “non-gendered insults” that are better than problematic old standbys like “bitch.” Coming up with insults that do not invoke gender or race or disability is good. The point of an insult is to hurt the person so insulted, not to deride an entire class. For this reason, though, the insult must describe or otherwise connect to its target. The signature feature of the new swears is that they do not carry any target-specific content."
language  words  swear  insult 
15 days ago
Backreaction: Particle physicists surprised to find I am not their cheer-leader
"You see, the issue they have isn’t that I say particle physics has a problem. Because that’s obvious to everyone who ever had anything to do with the field. The issue is that I publicly say it."
physics  culture  criticism 
16 days ago
Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet | OpenReview
"Deep Neural Networks (DNNs) excel on many complex perceptual tasks but it has proven notoriously difficult to understand how they reach their decisions. We here introduce a high-performance DNN architecture on ImageNet whose decisions are considerably easier to explain. Our model, a simple variant of the ResNet-50 architecture called BagNet, classifies an image based on the occurrences of small local image features without taking into account their spatial ordering. This strategy is closely related to the bag-of-feature (BoF) models popular before the onset of deep learning and reaches a surprisingly high accuracy on ImageNet (87.6% top-5 for 32 x 32 px features and AlexNet performance for 16 x 16 px features). The constraint on local features makes it straightforward to analyse how exactly each part of the image influences the classification. Furthermore, the BagNets behave similarly to state-of-the-art deep neural networks such as VGG-16, ResNet-152 or DenseNet-169 in terms of feature sensitivity, error distribution and interactions between image parts. This suggests that the improvements of DNNs over previous bag-of-feature classifiers in the last few years are mostly achieved by better fine-tuning rather than by qualitatively different decision strategies."
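The bag-of-local-features idea fits in a few lines: score every patch independently with a local classifier (here a random, untrained stand-in) and aggregate the per-patch class evidence with no spatial information at all.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "image" and a random linear patch classifier (stand-in for BagNet's
# local ResNet features; values here are illustrative only)
image = rng.normal(size=(32, 32))
patch_size, n_classes = 8, 10
W = rng.normal(size=(patch_size * patch_size, n_classes))

logits = np.zeros(n_classes)
count = 0
for i in range(0, 32 - patch_size + 1, 4):
    for j in range(0, 32 - patch_size + 1, 4):
        patch = image[i:i + patch_size, j:j + patch_size].ravel()
        logits += patch @ W      # per-patch class evidence
        count += 1
logits /= count                  # order-free aggregation: the "bag"
print(logits.argmax())
```

Because only the sum of patch scores matters, each patch's contribution to the decision can be read off directly, which is the interpretability the abstract describes.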
deep-learning  convnet  computer-vision  bagnet  imagenet 
16 days ago
Understanding Convolutional Neural Networks for Text Classification
We present an analysis into the inner workings of Convolutional Neural Networks (CNNs) for processing text. CNNs used for computer vision can be interpreted by projecting filters into image space, but for discrete sequence inputs CNNs remain a mystery. We aim to understand the method by which the networks process and classify text. We examine a common hypothesis about this problem: that filters, accompanied by global max-pooling, serve as ngram detectors. We show that filters may capture several different semantic classes of ngrams by using different activation patterns, and that global max-pooling induces behavior which separates important ngrams from the rest. Finally, we show practical use cases derived from our findings in the form of model interpretability (explaining a trained model by deriving a concrete identity for each filter, bridging the gap between visualization tools in vision tasks and NLP) and prediction interpretability (explaining predictions).
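The filter-as-ngram-detector picture is easy to illustrate: a width-3 convolution filter scores every trigram window of the sequence, and global max-pooling keeps only the best-matching position. This is a toy sketch with random values, not the paper's models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Token embeddings for a 9-token sentence: (seq_len, emb_dim)
seq = rng.normal(size=(9, 16))

# One conv filter of width 3 acts as a trigram detector
filt = rng.normal(size=(3, 16))

# Slide the filter over every trigram window and score it
scores = np.array([np.sum(seq[i:i + 3] * filt) for i in range(9 - 3 + 1)])

pooled = scores.max()          # global max pooling
best_ngram = scores.argmax()   # which trigram "fired" the filter
print(pooled, best_ngram)
```

Only `pooled` reaches the classifier, so each filter's contribution traces back to a single ngram, which is what makes the filter-identity analysis in the paper possible.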
nlp  convnet 
19 days ago
On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis - W18-5406
Text preprocessing is often the first step in the pipeline of a Natural Language Processing (NLP) system, with potential impact in its final performance. Despite its importance, text preprocessing has not received much attention in the deep learning literature. In this paper we investigate the impact of simple text preprocessing decisions (particularly tokenizing, lemmatizing, lowercasing and multiword grouping) on the performance of a standard neural text classifier. We perform an extensive evaluation on standard benchmarks from text categorization and sentiment analysis. While our experiments show that a simple tokenization of input text is generally adequate, they also highlight significant degrees of variability across preprocessing techniques. This reveals the importance of paying attention to this usually-overlooked step in the pipeline, particularly when comparing different models. Finally, our evaluation provides insights into the best preprocessing practices for training word embeddings.
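The decisions under study are the mundane kind. Here is a minimal sketch of two of them, tokenization and lowercasing (lemmatization and multiword grouping would need an external NLP tool); the function name and regex are my own illustration.

```python
import re

def preprocess(text, lowercase=True, tokenize=True):
    """Toggle the two cheapest preprocessing decisions compared in
    the paper: lowercasing and (regex-based) tokenization."""
    if lowercase:
        text = text.lower()
    if tokenize:
        # split punctuation off from word characters
        return re.findall(r"\w+|[^\w\s]", text)
    return text.split()

print(preprocess("Don't compare Apples to oranges!"))
# -> ['don', "'", 't', 'compare', 'apples', 'to', 'oranges', '!']
```

Even this tiny example shows why the choice matters: with `tokenize=False` the classifier would see `"don't"` and `"oranges!"` as opaque single types.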
neural-net  nlp  text  preprocessing 
19 days ago
Gender Shades
"How well do IBM, Microsoft, and Face++ AI services guess the gender of a face?"
machine-learning  computer-vision  facial-recognition  gender 
24 days ago
[1901.08162] Causal Reasoning from Meta-reinforcement Learning
Discovering and exploiting the causal structure in the environment is a crucial challenge for intelligent agents. Here we explore whether causal reasoning can emerge via meta-reinforcement learning. We train a recurrent network with model-free reinforcement learning to solve a range of problems that each contain causal structure. We find that the trained agent can perform causal reasoning in novel situations in order to obtain rewards. The agent can select informative interventions, draw causal inferences from observational data, and make counterfactual predictions. Although established formal causal reasoning algorithms also exist, in this paper we show that such reasoning can arise from model-free reinforcement learning, and suggest that causal reasoning in complex settings may benefit from the more end-to-end learning-based approaches presented here. This work also offers new strategies for structured exploration in reinforcement learning, by providing agents with the ability to perform -- and interpret -- experiments.
causality  meta-learning  reinforcement-learning 
25 days ago
Enhancing human learning via spaced repetition optimization | PNAS
"Understanding human memory has been a long-standing problem in various scientific disciplines. Early works focused on characterizing human memory using small-scale controlled experiments and these empirical studies later motivated the design of spaced repetition algorithms for efficient memorization. However, current spaced repetition algorithms are rule-based heuristics with hard-coded parameters, which do not leverage the automated fine-grained monitoring and greater degree of control offered by modern online learning platforms. In this work, we develop a computational framework to derive optimal spaced repetition algorithms, specially designed to adapt to the learners’ performance. A large-scale natural experiment using data from a popular language-learning online platform provides empirical evidence that the spaced repetition algorithms derived using our framework are significantly superior to alternatives."
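The flavor of the framework can be sketched with an exponential forgetting curve and a stochastic review intensity that grows as recall probability drops, so reviews cluster just as an item is about to be forgotten. This is a loose illustration of the idea, not the paper's actual derivation or parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def p_recall(t_since_review, n):
    """Exponential forgetting curve with per-item forgetting rate n."""
    return np.exp(-n * t_since_review)

def next_review_time(n, q=1.0, dt=0.01, horizon=100.0):
    """Sample the next review from an intensity proportional to
    (1 - recall probability): cheap thinning-style simulation."""
    t = 0.0
    while t < horizon:
        intensity = (1.0 / np.sqrt(q)) * (1 - p_recall(t, n))
        if rng.random() < intensity * dt:
            return t
        t += dt
    return horizon

print(next_review_time(n=0.1))
```

Items with a high forgetting rate `n` get reviewed sooner, and lowering `q` (the cost of reviewing) raises the overall review intensity.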
point-processes  control  spaced-repetition  duolingo  memory 
27 days ago
JRMeyer/multi-task-kaldi: An example directory for running Multi-Task Learning training on Kaldi neural networks. In Kaldi-speak, this is an egs dir for nnet3 training.
The collection of scripts in this repository represent a template for training neural networks via Multi-Task Learning in Kaldi. This repo is heavily based on the existing Kaldi multilingual Babel example directory.
asr  kaldi  transfer-learning  multi-task 
29 days ago
[1811.06031] A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks
Much effort has been devoted to evaluate whether multi-task learning can be leveraged to learn rich representations that can be used in various Natural Language Processing (NLP) down-stream applications. However, there is still a lack of understanding of the settings in which multi-task learning has a significant effect. In this work, we introduce a hierarchical model trained in a multi-task learning setup on a set of carefully selected semantic tasks. The model is trained in a hierarchical fashion to introduce an inductive bias by supervising a set of low level tasks at the bottom layers of the model and more complex tasks at the top layers of the model. This model achieves state-of-the-art results on a number of tasks, namely Named Entity Recognition, Entity Mention Detection and Relation Extraction without hand-engineered features or external NLP tools like syntactic parsers. The hierarchical training supervision induces a set of shared semantic representations at lower layers of the model. We show that as we move from the bottom to the top layers of the model, the hidden states of the layers tend to represent more complex semantic information.
neural-net  nlp  multi-task 
29 days ago
[1705.08142] Latent Multi-task Architecture Learning
Multi-task learning (MTL) allows deep neural networks to learn from related tasks by sharing parameters with other networks. In practice, however, MTL involves searching an enormous space of possible parameter sharing architectures to find (a) the layers or subspaces that benefit from sharing, (b) the appropriate amount of sharing, and (c) the appropriate relative weights of the different task losses. Recent work has addressed each of the above problems in isolation. In this work we present an approach that learns a latent multi-task architecture that jointly addresses (a)--(c). We present experiments on synthetic data and data from OntoNotes 5.0, including four different tasks and seven different domains. Our extension consistently outperforms previous approaches to learning latent architectures for multi-task problems and achieves up to 15% average error reductions over common approaches to MTL.
neural-net  nlp  multi-task 
29 days ago
[1711.02257] GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks
Deep multitask networks, in which one neural network produces multiple predictive outputs, can offer better speed and performance than their single-task counterparts but are challenging to train properly. We present a gradient normalization (GradNorm) algorithm that automatically balances training in deep multitask models by dynamically tuning gradient magnitudes. We show that for various network architectures, for both regression and classification tasks, and on both synthetic and real datasets, GradNorm improves accuracy and reduces overfitting across multiple tasks when compared to single-task networks, static baselines, and other adaptive multitask loss balancing techniques. GradNorm also matches or surpasses the performance of exhaustive grid search methods, despite only involving a single asymmetry hyperparameter α. Thus, what was once a tedious search process that incurred exponentially more compute for each task added can now be accomplished within a few training runs, irrespective of the number of tasks. Ultimately, we will demonstrate that gradient manipulation affords us great control over the training dynamics of multitask networks and may be one of the keys to unlocking the potential of multitask learning.
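A rough sketch of the reweighting logic: push each task's gradient norm toward the mean norm, scaled by that task's relative inverse training rate raised to α. The paper actually learns the weights by gradient descent on a separate GradNorm loss; this closed-form version is my simplification for illustration.

```python
import numpy as np

def gradnorm_weights(grad_norms, loss_ratios, alpha=1.5):
    """grad_norms: per-task gradient norm at the shared layer.
    loss_ratios: L_i(t) / L_i(0), a proxy for each task's training rate."""
    grad_norms = np.asarray(grad_norms, float)
    loss_ratios = np.asarray(loss_ratios, float)
    inv_rate = loss_ratios / loss_ratios.mean()     # r_i: relative inverse training rate
    target = grad_norms.mean() * inv_rate ** alpha  # desired per-task gradient norm
    w = target / grad_norms                         # weight that moves norms toward target
    return w * len(w) / w.sum()                     # renormalize to sum to #tasks

# Task 0 has huge gradients and is training fast -> it gets down-weighted
print(gradnorm_weights([10.0, 1.0], [0.2, 0.9]))
```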
neural-net  normalization  gradient  multi-task 
4 weeks ago
Should Companies Be Allowed to Issue Stock with Unequal Voting Rights?
"While media companies, such as The New York Times Co., Comcast, DISH Network, AMC holdings, Liberty Media, News Corporation, and Viacom have traditionally had dual-class shares — arguably to maintain news independence — a more important recent development is the widespread adoption of dual-class structure by technology companies. Almost 50% of recent technology listings have a dual-class status. We explored reasons for the growing use of the dual-class structure in an HBS case study among technology companies. Our nickel summary is that their growing popularity is due to the increasing importance of intangible investments, the rise of activist investors, and the decline of other protection mechanisms available to existing management such as staggered boards and poison pills. A dual-class structure, offering immunity against proxy contests initiated by short-term investors, could be optimal if it enables founder-managers to ignore pressures from the capital markets and avoid myopic actions such as cutting research and development and delaying corporate restructuring."
corporation  stock  dual-class 
4 weeks ago
NEMISIG 2019 | Brought to you by Brooklyn College
"NEMISIG (North East Music Information Special Interest Group) is a yearly informal meeting for Music Information Retrieval researchers who work at the intersection of computer science, mathematics, and music."
workshops  nemisig  music  ir 
4 weeks ago
You don't know JAX
"JAX is a Python library which augments numpy and Python code with function transformations which make it trivial to perform operations common in machine learning programs. Concretely, this makes it simple to write standard Python/numpy code and immediately be able to

Compute the derivative of a function via a successor to autograd
Just-in-time compile a function to run efficiently on an accelerator via XLA
Automagically vectorize a function, so that e.g. you can process a “batch” of data in parallel"
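The three transformations compose directly. A minimal example using the real `jax` API (`jax.grad`, `jax.jit`, `jax.vmap`):

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sum(x ** 2)

grad_f = jax.grad(f)                             # derivative via autodiff
fast_f = jax.jit(f)                              # XLA-compiled version
batched = jax.vmap(jax.grad(lambda x: x ** 2))   # per-example grads over a batch

x = jnp.arange(3.0)
print(grad_f(x))   # [0. 2. 4.]
print(fast_f(x))   # 5.0
print(batched(x))  # [0. 2. 4.]
```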
4 weeks ago
Martians Build Two Immense Canals In Two Years - The New York Times
"Vast Engineering Works Accomplished in an Incredibly Short Time by Our Planetary Neighbors - Wonders of the September Sky."
nyt  mars  history 
5 weeks ago
Compiler Explorer
"Compiler Explorer is an interactive online compiler which shows the assembly output of compiled C++, Rust, Go (and many more) code."
webapps  compiler  c  c++  go  rust 
5 weeks ago
enkimute/ganja.js: Geometric Algebra for Javascript (with operator overloading and algebraic literals)
"Ganja.js is a Geometric Algebra code generator for javascript. It generates Clifford algebras and sub-algebras of any signature and implements operator overloading and algebraic constants.

(Mathematically, an algebra generated by ganja.js is a graded exterior (Grassmann) algebra (or one of its subalgebras) with a non-metric outer product, extended (Clifford) with geometric and contraction inner products, a Poincare duality operator and the main involutions and morphisms.)

(Technically, ganja.js is a code generator producing classes that reificate algebraic literals and expressions by using reflection, a built-in tokenizer and a simple AST translator to rewrite functions containing algebraic constructs to their procedural counterparts.)

(Practically, ganja.js enables real math syntax inside javascript, with element, vector and matrix operations over reals, complex numbers, dual numbers, hyperbolic numbers, vectors, spacetime events, quaternions, dual quaternions, biquaternions or any other Clifford Algebra.)"
javascript  libs  geometric-algebra 
5 weeks ago
[1502.05767] Automatic differentiation in machine learning: a survey
Derivatives, mostly in the form of gradients and Hessians, are ubiquitous in machine learning. Automatic differentiation (AD), also called algorithmic differentiation or simply "autodiff", is a family of techniques similar to but more general than backpropagation for efficiently and accurately evaluating derivatives of numeric functions expressed as computer programs. AD is a small but established field with applications in areas including computational fluid dynamics, atmospheric sciences, and engineering design optimization. Until very recently, the fields of machine learning and AD have largely been unaware of each other and, in some cases, have independently discovered each other's results. Despite its relevance, general-purpose AD has been missing from the machine learning toolbox, a situation slowly changing with its ongoing adoption under the names "dynamic computational graphs" and "differentiable programming". We survey the intersection of AD and machine learning, cover applications where AD has direct relevance, and address the main implementation techniques. By precisely defining the main differentiation techniques and their interrelationships, we aim to bring clarity to the usage of the terms "autodiff", "automatic differentiation", and "symbolic differentiation" as these are encountered more and more in machine learning settings.
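Forward-mode AD, one of the techniques the survey defines precisely, fits in a few lines using dual numbers: carry a (value, derivative) pair through every operation, and f(Dual(x, 1)).d comes out as exactly f'(x). A minimal sketch, not a general-purpose implementation.

```python
import math

class Dual:
    """Dual number (v, d): value and derivative propagated together."""
    def __init__(self, v, d=0.0):
        self.v, self.d = v, d
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.v + o.v, self.d + o.d)
    __radd__ = __add__
    def __mul__(self, o):  # product rule
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.v * o.v, self.d * o.v + self.v * o.d)
    __rmul__ = __mul__

def sin(x):  # chain rule for a primitive
    return Dual(math.sin(x.v), math.cos(x.v) * x.d)

# f(x) = x*sin(x) + 2x  ->  f'(x) = sin(x) + x*cos(x) + 2
x = Dual(1.5, 1.0)   # seed derivative 1 to differentiate w.r.t. x
y = x * sin(x) + 2 * x
print(y.v, y.d)
```

The derivative is exact (to machine precision), unlike numerical differencing, which is the distinction from symbolic and numeric differentiation the abstract mentions.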
surveys  automatic-differentiation  machine-learning 
5 weeks ago
[1901.03403] Mean Estimation from One-Bit Measurements
We consider the problem of estimating the mean of a symmetric log-concave distribution under the following constraint: only a single bit per sample from this distribution is available to the estimator. We study the mean squared error (MSE) risk in this estimation as a function of the number of samples, and hence the number of bits, from this distribution. Under an adaptive setting in which each bit is a function of the current sample and the previously observed bits, we show that the optimal relative efficiency compared to the sample mean is the efficiency of the median. For example, in estimating the mean of a normal distribution, a constraint of one bit per sample incurs a penalty of π/2 in sample size compared to the unconstrained case. We also consider a distributed setting where each one-bit message is only a function of a single sample. We derive lower bounds on the MSE in this setting, and show that the optimal efficiency can only be attained at a finite number of points in the parameter space. Finally, we analyze a distributed setting where the bits are obtained by comparing each sample against a prescribed threshold. Consequently, we consider the threshold density that minimizes the maximal MSE. Our results indicate that estimating the mean from one-bit measurements is equivalent to estimating the sample median from these measurements. In the adaptive case, this estimate can be done with vanishing error for any point in the parameter space. In the distributed case, this estimate can be done with vanishing error only for a finite number of possible values for the unknown mean.
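The distributed threshold setting is easy to simulate: each sample contributes only the bit sign(x − τ), and for a unit-variance normal the mean can be recovered by inverting the CDF on the observed bit frequency. This is a sketch of the setting, not the paper's estimators or bounds.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)

# Each "sensor" reports one bit: is its sample above the threshold tau?
mu_true, tau, n = 0.3, 0.0, 200_000
bits = (rng.normal(mu_true, 1.0, size=n) > tau).astype(int)

# For N(mu, 1): P(x > tau) = 1 - Phi(tau - mu), so invert Phi
p_hat = bits.mean()
mu_hat = tau - NormalDist().inv_cdf(1 - p_hat)
print(mu_hat)  # close to 0.3
```

The π/2 penalty the abstract mentions shows up here as the estimator's variance relative to the full-sample mean's variance when τ sits at the true mean.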
statistics  estimation 
5 weeks ago
[1802.07810] Manipulating and Measuring Model Interpretability
Despite a growing literature on creating interpretable machine learning methods, there have been few experimental studies of their effects on end users. We present a series of large-scale, randomized, pre-registered experiments in which participants were shown functionally identical models that varied only in two factors thought to influence interpretability: the number of input features and the model transparency (clear or black-box). Participants who were shown a clear model with a small number of features were better able to simulate the model's predictions. However, contrary to what one might expect when manipulating interpretability, we found no significant difference in multiple measures of trust across conditions. Even more surprisingly, increased transparency hampered people's ability to detect when a model has made a sizeable mistake. These findings emphasize the importance of studying how models are presented to people and empirically verifying that interpretable models achieve their intended effects on end users.
machine-learning  interpretation 
5 weeks ago
Designing neural networks through neuroevolution | Nature Machine Intelligence
Much of recent machine learning has focused on deep learning, in which neural network weights are trained through variants of stochastic gradient descent. An alternative approach comes from the field of neuroevolution, which harnesses evolutionary algorithms to optimize neural networks, inspired by the fact that natural brains themselves are the products of an evolutionary process. Neuroevolution enables important capabilities that are typically unavailable to gradient-based approaches, including learning neural network building blocks (for example activation functions), hyperparameters, architectures and even the algorithms for learning themselves. Neuroevolution also differs from deep learning (and deep reinforcement learning) by maintaining a population of solutions during search, enabling extreme exploration and massive parallelization. Finally, because neuroevolution research has (until recently) developed largely in isolation from gradient-based neural network research, it has developed many unique and effective techniques that should be effective in other machine learning areas too. This Review looks at several key aspects of modern neuroevolution, including large-scale computing, the benefits of novelty and diversity, the power of indirect encoding, and the field’s contributions to meta-learning and architecture search. Our hope is to inspire renewed interest in the field as it meets the potential of the increasing computation available today, to highlight how many of its ideas can provide an exciting resource for inspiration and hybridization to the deep learning, deep reinforcement learning and machine learning communities, and to explain how neuroevolution could prove to be a critical tool in the long-term pursuit of artificial general intelligence.
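A minimal neuroevolution loop, gradient-free throughout: evolve the weights of a tiny network to fit y = sin(x) with a simple mutate-and-select strategy. This is a toy (μ, λ)-style sketch of the general idea, not any specific method from the review.

```python
import numpy as np

rng = np.random.default_rng(0)

X = np.linspace(-3, 3, 64)
Y = np.sin(X)

def forward(theta, x):
    """1-hidden-layer net with 16 tanh units, weights flattened into theta."""
    W1, b1, W2 = theta[:16], theta[16:32], theta[32:48]
    h = np.tanh(np.outer(x, W1) + b1)
    return h @ W2

def fitness(theta):
    return -np.mean((forward(theta, X) - Y) ** 2)  # negative MSE

pop = rng.normal(size=(50, 48))
for gen in range(200):
    scores = np.array([fitness(t) for t in pop])
    elite = pop[np.argsort(scores)[-10:]]              # keep the 10 best
    parents = elite[rng.integers(0, 10, size=50)]      # resample parents
    pop = parents + 0.05 * rng.normal(size=pop.shape)  # mutate

best = max(pop, key=fitness)
print(-fitness(best))  # final mean-squared error
```

No gradient is ever computed, and the population structure is what allows the parallel, exploratory search the review emphasizes.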
surveys  neural-net  meta-learning  architecture-search  evolutionary-algorithms 
5 weeks ago
Size-Independent Sample Complexity of Neural Networks
We study the sample complexity of learning neural networks, by providing new bounds on their Rademacher complexity assuming norm constraints on the parameter matrix of each layer. Compared to previous work, these complexity bounds have improved dependence on the network depth, and under some additional assumptions, are fully independent of the network size (both depth and width). These results are derived using some novel techniques, which may be of independent interest.
neural-net  complexity 
5 weeks ago
[1812.08951] Analysis Methods in Neural Language Processing: A Survey
"The field of natural language processing has seen impressive progress in recent years, with neural network models replacing many of the traditional systems. A plethora of new models have been proposed, many of which are thought to be opaque compared to their feature-rich counterparts. This has led researchers to analyze, interpret, and evaluate neural networks in novel and more fine-grained ways. In this survey paper, we review analysis methods in neural language processing, categorize them according to prominent research trends, highlight existing limitations, and point to potential directions for future work."
surveys  nlp  analysis 
5 weeks ago
Classifying Prediction Errors
"Understanding prediction errors and determining how to fix them is critical to building effective predictive systems. In this paper, we delineate four types of prediction errors (mislabeling, representation, learner and boundary errors) and demonstrate that these four types characterize all prediction errors. In addition, we describe potential remedies and tools that can be used to reduce the uncertainty when trying to determine the source of a prediction error and when trying to take action to remove a prediction error."
machine-learning  error  interactive-learning 
5 weeks ago
[1804.02476v2] Associative Compression Networks for Representation Learning
This paper introduces Associative Compression Networks (ACNs), a new framework for variational autoencoding with neural networks. The system differs from existing variational autoencoders (VAEs) in that the prior distribution used to model each code is conditioned on a similar code from the dataset. In compression terms this equates to sequentially transmitting the dataset using an ordering determined by proximity in latent space. Since the prior need only account for local, rather than global variations in the latent space, the coding cost is greatly reduced, leading to rich, informative codes. Crucially, the codes remain informative when powerful, autoregressive decoders are used, which we argue is fundamentally difficult with normal VAEs. Experimental results on MNIST, CIFAR-10, ImageNet and CelebA show that ACNs discover high-level latent features such as object class, writing style, pose and facial expression, which can be used to cluster and classify the data, as well as to generate diverse and convincing samples. We conclude that ACNs are a promising new direction for representation learning: one that steps away from IID modelling, and towards learning a structured description of the dataset as a whole.
representation-learning  acn  vae  data-ordering 
6 weeks ago
The Great Suspender - Chrome Web Store
"A lightweight extension to reduce chrome's memory footprint. Perfect if you have a lot of tabs open at the same time. Tabs that have not been viewed after a configurable length of time will be automagically suspended in the background, freeing up the memory and CPU being consumed by that tab."
chrome  extensions  tab  memory 
6 weeks ago
Sure thing. A few years ago, everyone switched their deep nets to "residual net... | Hacker News
resnet: "The idea is that it's easier to model a small change to an almost-correct answer than to output the whole improved answer at once.

In the last couple of years a few different groups noticed that this looks like a primitive ODE solver (Euler's method) [...]

We just took this idea to its logical extreme: What if we _define_ a deep net as a continuously evolving system? So instead of updating the hidden units layer by layer, we define their derivative with respect to depth instead. We call this an ODE net.

Now, we can use off-the-shelf adaptive ODE solvers to compute the final state of these dynamics, and call that the output of the neural network. This has drawbacks (it's slower to train) but lots of advantages too: We can loosen the numerical tolerance of the solver to make our nets faster at test time. We can also handle continuous-time models a lot more naturally. It turns out that there is also a simpler version of the change of variables formula (for density modeling) when you move to continuous time. "
neural-net  ode  resnet 
9 weeks ago
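The quoted comment's central identity (a residual update is one Euler step of an ODE in "depth") can be checked numerically. A minimal sketch, assuming an illustrative shared-weight tanh dynamics function `f` (the names and the toy dynamics are mine, not from the ODE-net paper):

```python
import numpy as np

def f(h, t, w):
    # Hypothetical dynamics function: one tanh layer whose weights w
    # are shared across "depth" t (purely illustrative).
    return np.tanh(w @ h)

def resnet_forward(h, w, n_layers=10):
    # Residual net view: each layer adds a small change to an
    # almost-correct answer, as the comment describes.
    for _ in range(n_layers):
        h = h + f(h, None, w) / n_layers
    return h

def odenet_forward(h, w, n_steps=10, t0=0.0, t1=1.0):
    # Same computation read as Euler's method for dh/dt = f(h, t, w);
    # the step size dt plays the role of 1/depth.
    dt = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        h = h + dt * f(h, t, w)
        t += dt
    return h

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)) * 0.1
h0 = rng.normal(size=4)
# With matching step counts and t-independent dynamics, the two
# views coincide exactly.
assert np.allclose(resnet_forward(h0, w), odenet_forward(h0, w))
```

The "adaptive solver" payoff in the comment comes from replacing the fixed Euler loop with an off-the-shelf integrator that chooses its own step sizes.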
[1706.04902] A Survey Of Cross-lingual Word Embedding Models
Cross-lingual representations of words enable us to reason about word meaning in multilingual contexts and are a key facilitator of cross-lingual transfer when developing natural language processing models for low-resource languages. In this survey, we provide a comprehensive typology of cross-lingual word embedding models. We compare their data requirements and objective functions. The recurring theme of the survey is that many of the models presented in the literature optimize for the same objectives, and that seemingly different models are often equivalent modulo optimization strategies, hyper-parameters, and such. We also discuss the different ways cross-lingual word embeddings are evaluated, as well as future challenges and research horizons.
surveys  nlp  word-embedding  cross-embedding  via:hustwj 
9 weeks ago
[1812.03253] Counterfactuals uncover the modular structure of deep generative models
Deep generative models such as Generative Adversarial Networks (GANs) and Variational Auto-Encoders (VAEs) are important tools to capture and investigate the properties of complex empirical data. However, the complexity of their inner elements makes their functioning challenging to assess and modify. In this respect, these architectures behave as black box models. In order to better understand the function of such networks, we analyze their modularity based on the counterfactual manipulation of their internal variables. Experiments with face images support that modularity between groups of channels is achieved to some degree within convolutional layers of vanilla VAE and GAN generators. This helps understand the functional organization of these systems and allows designing meaningful transformations of the generated images without further training.
neural-net  gan  vae  analysis  interpretation  counterfactual 
9 weeks ago
Adversarial Robustness - Theory and Practice
"This web page contains materials to accompany the NeurIPS 2018 tutorial, “Adversarial Robustness: Theory and Practice”, by Zico Kolter and Aleksander Madry. The notes are in very early draft form, and we will be updating them (organizing material more, writing them in a more consistent form with the relevant citations, etc) for an official release in early 2019. Until then, however, we hope they are still a useful reference that can be used to explore some of the key ideas and methodology behind adversarial robustness, from standpoints of both generating adversarial attacks on classifiers and training classifiers that are inherently robust."
10 weeks ago
Which US cities have good and bad public transportation - Vox
"Christof Spieler, a structural engineer and urban planner from Houston, has lots of opinions about public transit in America and elsewhere. In his new book, Trains, Buses, People: An Opinionated Atlas of US Transit, he maps out 47 metro areas that have rail transit or bus rapid transit, ranks the best and worst systems, and offers advice on how to build better networks."
cities  transportation  books 
10 weeks ago
Compact Representation of Uncertainty in Clustering
For many classic structured prediction problems, probability distributions over the dependent variables can be efficiently computed using widely-known algorithms and data structures (such as forward-backward, and its corresponding trellis for exact probability distributions in Markov models). However, we know of no previous work studying efficient representations of exact distributions over clusterings. This paper presents definitions and proofs for a dynamic-programming inference procedure that computes the partition function, the marginal probability of a cluster, and the MAP clustering---all exactly. Rather than the Nth Bell number, these exact solutions take time and space proportional to the substantially smaller powerset of N. Indeed, we improve upon the time complexity of the algorithm introduced by Kohonen and Corander (2016) for this problem by a factor of N. While still large, this previously unknown result is intellectually interesting in its own right, makes feasible exact inference for important real-world small data applications (such as medicine), and provides a natural stepping stone towards sparse-trellis approximations that enable further scalability (which we also explore). In experiments, we demonstrate the superiority of our approach over approximate methods in analyzing real-world gene expression data used in cancer treatment.
clustering  uncertainty 
10 weeks ago
[1805.07820] Targeted Adversarial Examples for Black Box Audio Systems
The application of deep recurrent networks to audio transcription has led to impressive gains in automatic speech recognition (ASR) systems. Many have demonstrated that small adversarial perturbations can fool deep neural networks into incorrectly predicting a specified target with high confidence. Current work on fooling ASR systems has focused on white-box attacks, in which the model architecture and parameters are known. In this paper, we adopt a black-box approach to adversarial generation, combining the approaches of both genetic algorithms and gradient estimation to solve the task. We achieve an 89.25% targeted attack similarity after 3000 generations while maintaining 94.6% audio file similarity.
adversarial-examples  audio  black-box 
10 weeks ago
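The "gradient estimation" ingredient above means estimating gradients from loss-oracle queries alone, with no access to model internals. A generic finite-difference sketch of that idea (not the paper's exact scheme; `estimate_gradient` and the toy oracle are hypothetical):

```python
import numpy as np

def estimate_gradient(loss, x, n_coords=8, delta=1e-3, rng=None):
    # Black-box gradient estimate: query the loss oracle at small
    # perturbations of randomly chosen coordinates of x.
    rng = rng or np.random.default_rng(0)
    grad = np.zeros_like(x)
    coords = rng.choice(x.size, size=min(n_coords, x.size), replace=False)
    base = loss(x)
    for i in coords:
        e = np.zeros_like(x)
        e[i] = delta
        grad[i] = (loss(x + e) - base) / delta
    return grad

# Toy oracle standing in for an ASR model's loss: squared distance
# to a fixed target vector (purely illustrative).
target = np.array([0.2, -0.1, 0.4, 0.0])
loss = lambda x: float(np.sum((x - target) ** 2))

x = np.zeros(4)
for _ in range(200):
    x -= 0.1 * estimate_gradient(loss, x, n_coords=4)
# loss(x) should now be near zero.
```

Each estimate costs one oracle query per probed coordinate, which is why such attacks are combined with cheaper population-based search (the genetic-algorithm half of the paper's approach).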
[1803.01814] Norm matters: efficient and accurate normalization schemes in deep networks
Over the past few years batch-normalization has been commonly used in deep networks, allowing faster training and high performance for a wide variety of applications. However, the reasons behind its merits remained unanswered, with several shortcomings that hindered its use for certain tasks. In this work we present a novel view on the purpose and function of normalization methods and weight-decay, as tools to decouple weights' norm from the underlying optimized objective. We also improve the use of weight-normalization and show the connection between practices such as normalization, weight decay and learning-rate adjustments. Finally, we suggest several alternatives to the widely used L2 batch-norm, using normalization in L1 and L∞ spaces that can substantially improve numerical stability in low-precision implementations as well as provide computational and memory benefits. We demonstrate that such methods enable the first batch-norm alternative to work for half-precision implementations.
neural-net  normalization 
10 weeks ago
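The L1 alternative to batch norm mentioned above replaces the variance statistic with a mean absolute deviation, avoiding squares and square roots that are troublesome in low precision. A minimal sketch, assuming Gaussian-like activations so that the identity E|z| = σ·√(2/π) gives the rescaling constant (function name and constants here are my reading, not code from the paper):

```python
import numpy as np

def l1_batch_norm(x, eps=1e-5):
    # Normalize each feature by its L1 statistic (mean absolute
    # deviation) instead of the usual L2 standard deviation.
    mu = x.mean(axis=0)
    mad = np.abs(x - mu).mean(axis=0)
    # sqrt(pi/2) rescales MAD to match the std of Gaussian
    # activations, so the output scale mimics standard batch norm.
    return (x - mu) / (mad * np.sqrt(np.pi / 2) + eps)

x = np.random.default_rng(0).normal(loc=3.0, scale=2.0, size=(4096, 8))
y = l1_batch_norm(x)
# For Gaussian inputs y is approximately zero-mean with unit std.
```

The learnable scale/shift parameters of full batch norm are omitted here; only the normalization statistic changes.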
[1806.10909] ResNet with one-neuron hidden layers is a Universal Approximator
We demonstrate that a very deep ResNet with stacked modules with one neuron per hidden layer and ReLU activation functions can uniformly approximate any Lebesgue integrable function in d dimensions, i.e. ℓ¹(ℝᵈ). Because of the identity mapping inherent to ResNets, our network has alternating layers of dimension one and d. This stands in sharp contrast to fully connected networks, which are not universal approximators if their width is the input dimension d [Lu et al., 2017; Hanin and Sellke, 2017]. Hence, our result implies an increase in representational power for narrow deep networks by the ResNet architecture.
resnet  neural-net  universal-approximator 
10 weeks ago
Modern Neural Networks Generalize on Small Data Sets
In this paper, we use a linear program to empirically decompose fitted neural networks into ensembles of low-bias sub-networks. We show that these sub-networks are relatively uncorrelated which leads to an internal regularization process, very much like a random forest, which can explain why a neural network is surprisingly resistant to overfitting. We then demonstrate this in practice by applying large neural networks, with hundreds of parameters per training observation, to a collection of 116 real-world data sets from the UCI Machine Learning Repository. This collection of data sets contains a much smaller number of training examples than the types of image classification tasks generally studied in the deep learning literature, as well as non-trivial label noise. We show that even in this setting deep neural nets are capable of achieving superior classification accuracy without overfitting.
neural-net  generalization  small-data  richard-berk 
10 weeks ago
[1808.01204] Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data
Neural networks have many successful applications, while much less theoretical understanding has been gained. Towards bridging this gap, we study the problem of learning a two-layer overparameterized ReLU neural network for multi-class classification via stochastic gradient descent (SGD) from random initialization. In the overparameterized setting, when the data comes from mixtures of well-separated distributions, we prove that SGD learns a network with a small generalization error, albeit the network has enough capacity to fit arbitrary labels. Furthermore, the analysis provides interesting insights into several aspects of learning neural networks and can be verified based on empirical studies on synthetic data and on the MNIST dataset.
neural-net  sgd  generalization 
10 weeks ago