💥 Training Neural Nets on Larger Batches: Practical Tips on 1-GPU, Multi-GPU & Distributed setups
"How can you train your model on large batches when your GPU can’t hold more than a few samples?

There are several tools, tips and tricks you can use to do that and I thought it would be nice to gather all the things I use and learned in a post."
pytorch  scaling  gpu  parallel  neural-net  batch-size  gradient-accumulation 
8 hours ago
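One of the tricks the post's tags point to is gradient accumulation. The sketch below is a pure-Python illustration on a toy model, not code from the post: the model `y = w * x`, the helper names, and the data are all hypothetical. It only shows that averaging the gradients of equally sized micro-batches gives the same gradient as one large batch.

```python
# Gradient accumulation on a toy 1-D linear model y = w * x with squared loss.
# Hypothetical helper names; this is an illustration, not the post's code.

def grad_mse(w, batch):
    """Gradient of the mean squared error of y = w * x with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def accumulated_grad(w, data, micro_batch_size):
    """Average the gradients of equally sized micro-batches."""
    chunks = [data[i:i + micro_batch_size]
              for i in range(0, len(data), micro_batch_size)]
    total = 0.0
    for chunk in chunks:
        # Dividing by the number of chunks turns the running sum into a mean
        # over the whole (large) batch.
        total += grad_mse(w, chunk) / len(chunks)
    return total

data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
full = grad_mse(0.5, data)                               # one large batch
accum = accumulated_grad(0.5, data, micro_batch_size=2)  # two micro-batches
```

The equality holds only when the micro-batches have the same size; with a ragged last chunk, each chunk's gradient would need to be weighted by its length.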
On the robustness and discriminative power of information retrieval metrics for top-N recommendation
"The evaluation of Recommender Systems is still an open issue in the field. Despite its limitations, offline evaluation usually constitutes the first step in assessing recommendation methods due to its reduced costs and high reproducibility. Selecting the appropriate metric is critical, and ranking accuracy usually attracts the most attention nowadays. In this paper, we aim to shed light on the advantages of different ranking metrics which were previously used in Information Retrieval and are now used for assessing top-N recommenders. We propose methodologies for comparing the robustness and the discriminative power of different metrics. On the one hand, we study cut-offs and we find that deeper cut-offs offer greater robustness and discriminative power. On the other hand, we find that precision offers high robustness and Normalised Discounted Cumulative Gain provides the best discriminative power."
recsys  ir  evaluation  metrics 
10 days ago
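For reference, the two metrics the abstract singles out can be sketched as follows. This is a minimal illustration with hypothetical function names, not the paper's evaluation code. It assumes binary or graded relevance labels listed in ranked order, and it computes the ideal ordering from the ranked list itself, which is an assumption that only holds when the list contains every relevant item.

```python
import math

def precision_at_k(relevance, k):
    """Fraction of the top-k ranked items that are relevant (label > 0)."""
    return sum(1 for r in relevance[:k] if r > 0) / k

def dcg_at_k(relevance, k):
    """Discounted cumulative gain: gains discounted by log2 of the rank."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(relevance[:k]))

def ndcg_at_k(relevance, k):
    """DCG normalised by the DCG of the best possible ordering of the list."""
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0

ranked = [1, 0, 1, 0, 0]  # relevance of the recommended items, in ranked order
p5 = precision_at_k(ranked, 5)
n5 = ndcg_at_k(ranked, 5)
```

The cut-off `k` is the parameter the paper varies when it compares shallow and deep cut-offs.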
interactiveaudiolab/nussl: A simple audio source separation library built in python
At its core, nussl contains implementations of the following source separation algorithms:
Spatialization algorithms:
Degenerate Unmixing Estimation Technique (DUET)
Repetition algorithms:
REpeating Pattern Extraction Technique (REPET)
REPET using the cosine similarity matrix (REPET-SIM)
Separation via 2DFT
General matrix decomposition/Component Analysis:
Non-negative Matrix Factorization with MFCC clustering (NMF)
Robust Principal Component Analysis (RPCA)
Independent Component Analysis (ICA)
Ideal Mask
High/Low Pass Filtering
Composite Methods
Overlap Add
Algorithm Picker (multicue separation)
Other Foreground/Background Decompositions
Harmonic/Percussive Source Separation (HPSS)
Melody Tracking separation (Melodia)
Deep Learning
Deep Clustering
python  libs  audio  source-separation  deep-clustering 
11 days ago
Classification accuracy is not enough | SpringerLink
We argue that an evaluation of system behavior at the level of the music is required to usefully address the fundamental problems of music genre recognition (MGR), and indeed other tasks of music information retrieval, such as autotagging. A recent review of works in MGR since 1995 shows that most (82 %) measure the capacity of a system to recognize genre by its classification accuracy. After reviewing evaluation in MGR, we show that neither classification accuracy, nor recall and precision, nor confusion tables, necessarily reflect the capacity of a system to recognize genre in musical signals. Hence, such figures of merit cannot be used to reliably rank, promote or discount the genre recognition performance of MGR systems if genre recognition (rather than identification by irrelevant confounding factors) is the objective. This motivates the development of a richer experimental toolbox for evaluating any system designed to intelligently extract information from music signals.
music  genre  machine-learning  evaluation  metrics 
11 days ago
Saturday Morning Breakfast Cereal - Rise of the Machines
"thanks to machine-learning algorithms, the robot apocalypse was short-lived"
ai  machine-learning  funny 
12 days ago
Facebook Is Giving Advertisers Access to Your Shadow Contact Information
"Facebook is not content to use the contact information you willingly put into your Facebook profile for advertising. It is also using contact information you handed over for security purposes and contact information you didn’t hand over at all, but that was collected from other people’s contact books, a hidden layer of details Facebook has about you that I’ve come to call “shadow contact information.” I managed to place an ad in front of Alan Mislove by targeting his shadow profile. This means that the junk email address that you hand over for discounts or for shady online shopping is likely associated with your account and being used to target you with ads.

They found that when a user gives Facebook a phone number for two-factor authentication or in order to receive alerts about new log-ins to a user’s account, that phone number became targetable by an advertiser within a couple of weeks. So users who want their accounts to be more secure are forced to make a privacy trade-off and allow advertisers to more easily find them on the social network. When asked about this, a Facebook spokesperson said that “we use the information people provide to offer a more personalized experience, including showing more relevant ads.”"
facebook  advertising  privacy 
12 days ago
Padasip — Padasip 1.1.1 documentation
"This library is designed to simplify adaptive signal processing tasks within python (filtering, prediction, reconstruction, classification)."
python  libs  dsp  adaptive 
13 days ago
Vigilante engineer stops Waymo from patenting key lidar technology | Ars Technica
The USPTO was not impressed. In March, an examiner noted that a re-drawn diagram of Waymo's lidar firing circuit showed current passing along a wire between the circuit and the ground in two directions—something generally deemed impossible. "Patent owner's expert testimony is not convincing to show that the path even goes to ground in view of the magic ground wire, which shows current moving in two directions along a single wire," noted the examiners dryly.
patents  ip  lidar  waymo 
13 days ago
Common Voice
"We are building an open and publicly available dataset of voices that everyone can use to train speech-enabled applications."
datasets  speech  mozilla 
13 days ago
CheatSheet - Know your short cuts
Just hold the ⌘-Key a bit longer to get a list of all active short cuts of the current application. It's as simple as that.
software  mac  keyboard  shortcuts 
16 days ago
Surprisingly Easy Hard-Attention for Sequence to Sequence Learning
In this paper we show that a simple beam approximation of the joint distribution between attention and output is an easy, accurate, and efficient attention mechanism for sequence to sequence learning. The method combines the advantage of sharp focus in hard attention and the implementation ease of soft attention. On five translation and two morphological inflection tasks we show effortless and consistent gains in BLEU compared to existing attention mechanisms.
rnn  sequential-modeling  attention  sunita-sarawagi 
19 days ago
[1606.03402] Length bias in Encoder Decoder Models and a Case for Global Conditioning
Encoder-decoder networks are popular for modeling sequences probabilistically in many applications. These models use the power of the Long Short-Term Memory (LSTM) architecture to capture the full dependence among variables, unlike earlier models like CRFs that typically assumed conditional independence among non-adjacent variables. However in practice encoder-decoder models exhibit a bias towards short sequences that surprisingly gets worse with increasing beam size.
In this paper we show that this phenomenon is due to a discrepancy between the full sequence margin and the per-element margin enforced by the locally conditioned training objective of an encoder-decoder model. The discrepancy more adversely impacts long sequences, explaining the bias towards predicting short sequences.
For the case where the predicted sequences come from a closed set, we show that a globally conditioned model alleviates the above problems of encoder-decoder models. From a practical point of view, our proposed model also eliminates the need for a beam-search during inference, which reduces to an efficient dot-product based search in a vector-space.
rnn  sequential-modeling  encoder-decoder  sunita-sarawagi 
19 days ago
A fast, distributed, high performance gradient boosting (GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks. It is under the umbrella of the DMTK (http://github.com/microsoft/dmtk) project of Microsoft.
boosting  libs  machine-learning 
25 days ago
[1809.04356] Deep learning for time series classification: a review
Time Series Classification (TSC) is an important and challenging problem in data mining. With the increase of time series data availability, hundreds of TSC algorithms have been proposed. Among these methods, only a few have considered Deep Neural Networks (DNNs) to perform this task. This is surprising as deep learning has seen very successful applications in the last years. DNNs have indeed revolutionized the field of computer vision especially with the advent of novel deeper architectures such as Residual and Convolutional Neural Networks. Apart from images, sequential data such as text and audio can also be processed with DNNs to reach state of the art performance for document classification and speech recognition. In this article, we study the current state of the art performance of deep learning algorithms for TSC by presenting an empirical study of the most recent DNN architectures for TSC. We give an overview of the most successful deep learning applications in various time series domains under a unified taxonomy of DNNs for TSC. We also provide an open source deep learning framework to the TSC community where we implemented each of the compared approaches and evaluated them on a univariate TSC benchmark (the UCR archive) and 12 multivariate time series datasets. By training 8,730 deep learning models on 97 time series datasets, we propose the most exhaustive study of DNNs for TSC to date.
surveys  time-series  deep-learning  classification 
26 days ago
[1603.08507] Generating Visual Explanations
Clearly explaining a rationale for a classification decision to an end-user can be as important as the decision itself. Existing approaches for deep visual recognition are generally opaque and do not output any justification text; contemporary vision-language models can describe image content but fail to take into account class-discriminative image aspects which justify visual predictions. We propose a new model that focuses on the discriminating properties of the visible object, jointly predicts a class label, and explains why the predicted label is appropriate for the image. We propose a novel loss function based on sampling and reinforcement learning that learns to generate sentences that realize a global sentence property, such as class specificity. Our results on a fine-grained bird species classification dataset show that our model is able to generate explanations which are not only consistent with an image but also more discriminative than descriptions produced by existing captioning methods.
neural-net  explanation  computer-vision 
29 days ago
[1803.05984] Deep Co-Training for Semi-Supervised Image Recognition
In this paper, we study the problem of semi-supervised image recognition, which is to learn classifiers using both labeled and unlabeled images. We present Deep Co-Training, a deep learning based method inspired by the Co-Training framework. The original Co-Training learns two classifiers on two views which are data from different sources that describe the same instances. To extend this concept to deep learning, Deep Co-Training trains multiple deep neural networks to be the different views and exploits adversarial examples to encourage view difference, in order to prevent the networks from collapsing into each other. As a result, the co-trained networks provide different and complementary information about the data, which is necessary for the Co-Training framework to achieve good results. We test our method on SVHN, CIFAR-10/100 and ImageNet datasets, and our method outperforms the previous state-of-the-art methods by a large margin.
neural-net  co-training 
29 days ago
[1610.02242] Temporal Ensembling for Semi-Supervised Learning
In this paper, we present a simple and efficient method for training deep neural networks in a semi-supervised setting where only a small portion of training data is labeled. We introduce self-ensembling, where we form a consensus prediction of the unknown labels using the outputs of the network-in-training on different epochs, and most importantly, under different regularization and input augmentation conditions. This ensemble prediction can be expected to be a better predictor for the unknown labels than the output of the network at the most recent training epoch, and can thus be used as a target for training. Using our method, we set new records for two standard semi-supervised learning benchmarks, reducing the (non-augmented) classification error rate from 18.44% to 7.05% in SVHN with 500 labels and from 18.63% to 16.55% in CIFAR-10 with 4000 labels, and further to 5.12% and 12.16% by enabling the standard augmentations. We additionally obtain a clear improvement in CIFAR-100 classification accuracy by using random images from the Tiny Images dataset as unlabeled extra inputs during training. Finally, we demonstrate good tolerance to incorrect labels.
semisupervised  neural-net  ensemble 
29 days ago
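The consensus prediction described in the abstract can be illustrated with an exponential moving average of per-epoch outputs. The sketch below is an assumption-laden illustration, not the authors' implementation: the decay `alpha`, the bias-correction step, and the function name are choices made here for the example.

```python
def update_ensemble(ensemble, current, alpha, epoch):
    """Blend this epoch's outputs into the running ensemble.

    `epoch` is 1-indexed. Returns the new running average and a
    bias-corrected target (the correction compensates for starting at zero).
    """
    new = [alpha * e + (1.0 - alpha) * c for e, c in zip(ensemble, current)]
    target = [v / (1.0 - alpha ** epoch) for v in new]
    return new, target

# Hypothetical class probabilities for one unlabeled sample over two epochs.
ensemble = [0.0, 0.0]
ensemble, target1 = update_ensemble(ensemble, [0.2, 0.8], alpha=0.6, epoch=1)
ensemble, target2 = update_ensemble(ensemble, [0.4, 0.6], alpha=0.6, epoch=2)
```

After the first epoch the corrected target equals that epoch's output; later targets are weighted averages of all outputs seen so far, which is what makes them steadier than any single epoch's prediction.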
Hi! I’m Mike Bostock, creator of D3.js and a former graphics editor for The New York Times. I do data visualization, design and open source. AMA! : dataisbeautiful
I’m unavoidably biased towards people I’ve worked with, so my favorites are stuff by Jason Davies (World Airport Voronoi, Animated World Zoom) and my former colleagues at the New York Times (Gregor Aisch did amazing work on mapping migration, Shan Carter’s explainer on the number 342,200).
visualization  ama  reddit  d3  mike-bostock 
29 days ago
Effectiveness of Animation in Trend Visualization - IEEE Journals & Magazine
Animation has been used to show trends in multi-dimensional data. This technique has recently gained new prominence for presentations, most notably with Gapminder Trendalyzer. In Trendalyzer, animation together with interesting data and an engaging presenter helps the audience understand the results of an analysis of the data. It is less clear whether trend animation is effective for analysis. This paper proposes two alternative trend visualizations that use static depictions of trends: one which shows traces of all trends overlaid simultaneously in one display and a second that uses a small multiples display to show the trend traces side-by-side. The paper evaluates the three visualizations for both analysis and presentation. Results indicate that trend animation can be challenging to use even for presentations; while it is the fastest technique for presentation and participants find it enjoyable and exciting, it does lead to many participant errors. Animation is the least effective form for analysis; both static depictions of trends are significantly faster than animation, and the small multiples display is more accurate.
visualization  animation  gapminder  hans-rosling  perception 
29 days ago
Unfolding the Earth: Myriahedral Projections
"Mapping the earth is a classic problem. For thousands of years cartographers, mathematicians, and inventors have come up with methods to map the curved surface of the earth to a flat plane. The main problem is that you cannot do this perfectly, such that both the shape and size of the surface are depicted properly everywhere. This has intrigued me for a long time. Why not just take a map of a small part of the earth, which is almost perfect, glue neighboring maps to it, and repeat this until the whole earth is shown? Of course you get interrupts, but does this matter? What does such a map look like? To check this out, we developed myriahedral projections."
cartography  projection  myriahedral 
29 days ago
[1201.3011] Spring Embedders and Force Directed Graph Drawing Algorithms
Force-directed algorithms are among the most flexible methods for calculating layouts of simple undirected graphs. Also known as spring embedders, such algorithms calculate the layout of a graph using only information contained within the structure of the graph itself, rather than relying on domain-specific knowledge. Graphs drawn with these algorithms tend to be aesthetically pleasing, exhibit symmetries, and tend to produce crossing-free layouts for planar graphs. In this survey we consider several classical algorithms, starting from Tutte's 1963 barycentric method, and including recent scalable multiscale methods for large and dynamic graphs.
surveys  graph  layout  stephen-kobourov 
29 days ago
Science Isn’t Broken | FiveThirtyEight
As a society, our stories about how science works are also prone to error. The standard way of thinking about the scientific method is: ask a question, do a study, get an answer. But this notion is vastly oversimplified. A more common path to truth looks like this: ask a question, do a study, get a partial or ambiguous answer, then do another study, and then do another to keep testing potential hypotheses and homing in on a more complete answer. Human fallibilities send the scientific process hurtling in fits, starts and misdirections instead of in a straight line from question to truth.

Media accounts of science tend to gloss over the nuance, and it’s easy to understand why. For one thing, reporters and editors who cover science don’t always have training on how to interpret studies. And headlines that read “weak, unreplicated study finds tenuous link between certain vegetables and cancer risk” don’t fly off the newsstands or bring in the clicks as fast as ones that scream “foods that fight cancer!”
science  statistics 
29 days ago
STAT 509: Clinical Trial
"This course is a survey of statistical methods and study design issues related to the testing of medical treatments. There are 19 lessons in this graduate level course"
statistics  biostatistics  clinical-trial  experiment-design 
29 days ago
Optimized Active Learning Strategy for Audiovisual Speaker Recognition | SpringerLink
The purpose of this work is to investigate the improved recognition accuracy gained from exploiting optimization stages for tuning parameters of an Active Learning (AL) classifier. Since plenty of data could be available during Speaker Recognition (SR) tasks, the AL concept, which incorporates human entities inside its learning kernel for exploring hidden insights into unlabeled data, seems extremely suitable, without demanding much expertise on behalf of the human factor. Six datasets containing 8 and 16 speakers’ utterances under different recording setups are described by audiovisual features and evaluated through the time-efficient Uncertainty Sampling query strategy (UncS). Both Support Vector Machines (SVMs) and Random Forest (RF) algorithms were selected to be tuned over a small subset of the initial training data and then applied iteratively for mining the most suitable instances from a corresponding pool of unlabeled instances. Useful conclusions are drawn concerning the values of the selected parameters, allowing future optimization attempts to be confined to more restricted regions, while remarkable improvement rates were obtained using an ideal annotator.
active-learning  asr  audio-visual 
29 days ago
[1809.02882] Cost-Sensitive Active Learning for Intracranial Hemorrhage Detection
Deep learning for clinical applications is subject to stringent performance requirements, which raises a need for large labeled datasets. However, the enormous cost of labeling medical data makes this challenging. In this paper, we build a cost-sensitive active learning system for the problem of intracranial hemorrhage detection and segmentation on head computed tomography (CT). We show that our ensemble method compares favorably with the state-of-the-art, while running faster and using less memory. Moreover, our experiments are done using a substantially larger dataset than earlier papers on this topic. Since the labeling time could vary tremendously across examples, we model the labeling time and optimize the return on investment. We validate this idea by core-set selection on our large labeled dataset and by growing it with data from the wild.
active-learning  cost-sensitive 
29 days ago
Metric Learning Tutorial - Abstract - Europe PMC
Most popular machine learning algorithms, such as k-nearest neighbour, k-means, and SVM, use a metric to measure the distance (or similarity) between data instances. The performance of these algorithms clearly depends heavily on the metric being used. In the absence of prior knowledge about the data, we can only use general-purpose metrics such as Euclidean distance, cosine similarity, or Manhattan distance, but these metrics often fail to capture the correct behaviour of the data, which directly affects the performance of the learning algorithm. The solution to this problem is to tune the metric according to the data and the problem. Manually deriving the metric for high-dimensional data, which is often difficult even to visualize, is not only tedious but extremely difficult. This motivates metric learning, which fits the metric to the data geometry. The goal of a metric learning algorithm is to learn a metric that assigns small distances to similar points and relatively large distances to dissimilar points.
tutorials  metric-learning 
29 days ago
Three recent trends in Paralinguistics on the way to omniscient machine intelligence | SpringerLink
A 2-year-old has heard approximately 1000 h of speech; by the age of ten, around ten thousand. Similarly, automatic speech recognisers are often trained on data in these dimensions. In stark contrast, however, only a few databases to train a speaker analysis system contain more than 10 h of speech and hardly ever more than 100 h. Yet, these systems are ideally expected to recognise the states and traits of speakers independent of the person, spoken content, language, cultural background, and acoustic disturbances, at best at human parity or even superhuman levels. While this has not yet been reached for many tasks such as speaker emotion recognition, deep learning (often described to lead to significant improvements), in combination with sufficient learning data, holds the promise to reach this goal. Luckily, every second, more than 5 h of video are uploaded to the web and several hundreds of hours of audio and video communication in most languages of the world take place. A major effort could thus be invested in efficient labelling and sharing of these. In this contribution, first, benchmarks are given from the nine research challenges co-organised by the authors over the years at the annual Interspeech conference since 2009. Then, approaches to utmost efficient exploitation of the ‘big’ (unlabelled) data available are presented. Small-world modelling in combination with unsupervised learning helps to rapidly identify potential target data of interest. Further, gamified crowdsourcing combined with human-machine cooperative learning turns the annotation process into an entertaining experience, while reducing the manual labelling effort to a minimum. Moreover, increasingly autonomous deep holistic end-to-end learning solutions are presented for the tasks at hand. The concluding discussion will contain some crystal ball gazing alongside practical hints not missing out on ethical aspects.
speech  paralinguistics 
29 days ago
The Brier score does not evaluate the clinical utility of diagnostic tests or prediction models | Diagnostic and Prognostic Research | Full Text
A variety of statistics have been proposed as tools to help investigators assess the value of diagnostic tests or prediction models. The Brier score has been recommended on the grounds that it is a proper scoring rule that is affected by both discrimination and calibration. However, the Brier score is prevalence dependent in such a way that the rank ordering of tests or models may inappropriately vary by prevalence. We explored four common clinical scenarios: comparison of a highly accurate binary test with a continuous prediction model of moderate predictiveness; comparison of two binary tests where the importance of sensitivity versus specificity is inversely associated with prevalence; comparison of models and tests to default strategies of assuming that all or no patients are positive; and comparison of two models with miscalibration in opposite directions. In each case, we found that the Brier score gave an inappropriate rank ordering of the tests and models. Conversely, net benefit, a decision-analytic measure, gave results that always favored the preferable test or model. Brier score does not evaluate clinical value of diagnostic tests or prediction models. We advocate, as an alternative, the use of decision-analytic measures such as net benefit.
statistics  scoring  brier 
4 weeks ago
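To make the two measures in the abstract concrete, here is a minimal sketch with hypothetical function names and made-up data, not the paper's analysis code. The Brier score is the mean squared difference between predicted probability and outcome. Net benefit at a threshold probability is taken here in its usual decision-curve form: true positives per patient minus false positives per patient weighted by the odds of the threshold.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def net_benefit(probs, outcomes, threshold):
    """Net benefit of treating every patient whose risk is >= threshold."""
    n = len(probs)
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    # False positives are weighted by the odds implied by the threshold.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)

# Made-up predictions for four patients, one of whom has the disease.
probs = [0.9, 0.2, 0.7, 0.1]
outcomes = [1, 0, 0, 0]
bs = brier_score(probs, outcomes)
nb = net_benefit(probs, outcomes, threshold=0.5)
```

Unlike the Brier score, net benefit depends on the chosen threshold, which encodes how the harm of a false positive is traded against the benefit of a true positive.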
Principles of Traditional Animation Applied to Computer Animation
Many of the principles of traditional animation were developed in the 1930s at the Walt Disney studios. These principles were developed to make animation, especially character animation, more realistic and entertaining. These principles can and should be applied to 3D computer animation.
animation  patterns  john-lasseter 
4 weeks ago
The Concept of External Validity on JSTOR
Many researchers feel that external validity must be emphasized even in theoretical research. The argument for both a sophisticated and a common sense version of this contention is refuted in this paper. It is concluded that the very nature of progress in theoretical research argues against attempting to maximize external validity in the context of any single study.
statistics  measurement  validity  external 
4 weeks ago