**csantos + statistics**
667

Entropy | Free Full-Text | On Accuracy of PDF Divergence Estimators and Their Applicability to Representative Data Sampling

11 weeks ago by csantos

This paper investigates the possibility of using various probability density function divergence measures for the purpose of representative data sampling. As it turned out, the first difficulty one needs to deal with is estimation of the divergence itself. In contrast to other publications on this subject, the experimental results provided in this study show that in many cases it is not possible unless samples consisting of thousands of instances are used. Exhaustive experiments on the divergence guided representative data sampling have been performed using 26 publicly available benchmark datasets and 70 PDF divergence estimators, and their results have been analysed and discussed.

kullback-leibler
InformationTheory
divergences
estimation
statistics
density-estimation

[0803.4101] Measuring and testing dependence by correlation of distances

11 weeks ago by csantos

Distance correlation is a new measure of dependence between random vectors. Distance covariance and distance correlation are analogous to product-moment covariance and correlation, but unlike the classical definition of correlation, distance correlation is zero only if the random vectors are independent. The empirical distance dependence measures are based on certain Euclidean distances between sample elements rather than sample moments, yet have a compact representation analogous to the classical covariance and correlation. Asymptotic properties and applications in testing independence are discussed. Implementation of the test and Monte Carlo results are also presented.
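
As a rough illustration of the "Euclidean distances between sample elements" construction described above, here is a minimal pure-Python sketch of the sample distance correlation. It follows the standard double-centering recipe (pairwise distance matrices, centered by row, column, and grand means). The function names are hypothetical, it is not the authors' implementation, and it runs in O(n^2) time and memory, so it only suits small samples.

```python
import math


def _centered_distances(points):
    """Pairwise Euclidean distance matrix, double-centered."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(xs, ys):
    """Sample distance correlation between two paired samples.

    xs and ys are equal-length sequences of points; each point is a
    sequence of floats (use 1-tuples for scalar observations).
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same number of samples")
    n = len(xs)
    a = _centered_distances(xs)
    b = _centered_distances(ys)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_x = sum(v * v for row in a for v in row) / n**2
    dvar_y = sum(v * v for row in b for v in row) / n**2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0  # at least one sample is constant
    return math.sqrt(max(dcov2, 0.0) / denom)


# Example: y = x^2 on a symmetric grid has zero Pearson correlation,
# yet the distance correlation is clearly positive.
xs = [(float(i),) for i in range(-5, 6)]
ys = [(x[0] ** 2,) for x in xs]
print(distance_correlation(xs, ys))
```

The quadratic example is the kind of nonlinear dependence that product-moment correlation misses, which is the motivation given in the abstract.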

Statistics
correlation
dependence_measures

Fanning the Flames of Hate: Social Media and Hate Crime by Karsten Müller, Carlo Schwarz :: SSRN

12 weeks ago by csantos

This paper investigates the link between social media and hate crime using Facebook data. We study the case of Germany, where the recently emerged right-wing party Alternative für Deutschland (AfD) has developed a major social media presence. We show that right-wing anti-refugee sentiment on Facebook predicts violent crimes against refugees in otherwise similar municipalities with higher social media usage. To further establish causality, we exploit exogenous variation in major internet and Facebook outages, which fully undo the correlation between social media and hate crime. We further find that the effect decreases with distracting news events; increases with user network interactions; and does not hold for posts unrelated to refugees. Our results suggest that social media can act as a propagation mechanism between online hate speech and real-life violent crime.

facebook
doom
SocialSciences
social-media
statistics
causation

Archive ouverte HAL - Assessing and tuning brain decoders: cross-validation, caveats, and guidelines

july 2018 by csantos

Decoding, i.e. prediction from brain images or signals, calls for empirical evaluation of its predictive power. Such evaluation is achieved via cross-validation, a method also used to tune decoders' hyper-parameters. This paper is a review on cross-validation procedures for decoding in neuroimaging. It includes a didactic overview of the relevant theoretical considerations. Practical aspects are highlighted with an extensive empirical study of the common decoders in within- and across-subject predictions, on multiple datasets (anatomical and functional MRI and MEG) and simulations. Theory and experiments outline that the popular "leave-one-out" strategy leads to unstable and biased estimates, and a repeated random splits method should be preferred. Experiments outline the large error bars of cross-validation in neuroimaging settings: typical confidence intervals of 10%. Nested cross-validation can tune decoders' parameters while avoiding circularity bias. However, we find that it can be more favorable to use sane defaults, in particular for non-sparse decoders.
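
To make the contrast between the two splitting strategies concrete, here is a minimal pure-Python sketch of index generators for leave-one-out and for repeated random splits. The function names, the 20% test fraction, and the 50 repetitions are illustrative assumptions, not values taken from the paper's code. Across-subject evaluation would also need to keep each subject's samples on one side of the split, which this sketch does not handle.

```python
import random


def leave_one_out(n_samples):
    """Yield (train, test) index lists; each test set is a single sample."""
    for i in range(n_samples):
        train = [j for j in range(n_samples) if j != i]
        yield train, [i]


def repeated_random_splits(n_samples, n_splits=50, test_fraction=0.2, seed=0):
    """Yield (train, test) index lists from independent random shuffles.

    n_splits, test_fraction and seed are hypothetical defaults chosen
    for illustration only.
    """
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    indices = list(range(n_samples))
    for _ in range(n_splits):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # The first n_test shuffled indices form the held-out set.
        yield sorted(shuffled[n_test:]), sorted(shuffled[:n_test])


# Example: with 100 samples, leave-one-out tests on 1 sample per split,
# while each random split holds out 20 samples.
for train, test in repeated_random_splits(100, n_splits=3):
    print(len(train), len(test))
```

Each random split scores the decoder on a larger held-out set than leave-one-out does, and the splits are drawn independently rather than covering every sample exactly once.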

Neuroscience
medical
imaging
crossvalidation
bias
statistics

Police killings and their spillover effects on the mental health of black Americans: a population-based, quasi-experimental study - The Lancet

june 2018 by csantos

Police kill more than 300 black Americans—at least a quarter of them unarmed—each year in the USA. These events might have spillover effects on the mental health of people not directly affected.

mentalHealth
health
healthcare
statistics
racism
inequality
violence

Predictive modeling of U.S. health care spending in late life | Science

june 2018 by csantos

Most deaths are unpredictable; hence, focusing on end-of-life spending does not necessarily identify “wasteful” spending.

health
healthcare
medical
informatics
Statistics

RLN keras tutorial

june 2018 by csantos

This is a quick tutorial of the use of the Keras RLN implementation.

First, let's import and create the train and test set. In this tutorial, we're using the Boston housing price regression dataset, with additional noise features.

Statistics
MachineLearning
NeuralNetworks
DeepLearning
keras
regularization

[1609.06840] Exact Sampling from Determinantal Point Processes

april 2018 by csantos

Determinantal point processes (DPPs) are an important concept in random matrix theory and combinatorics. They have also recently attracted interest in the study of numerical methods for machine learning, as they offer an elegant "missing link" between independent Monte Carlo sampling and deterministic evaluation on regular grids, applicable to a general set of spaces. This is helpful whenever an algorithm explores to reduce uncertainty, such as in active learning, Bayesian optimization, reinforcement learning, and marginalization in graphical models. To draw samples from a DPP in practice, existing literature focuses on approximate schemes of low cost, or comparably inefficient exact algorithms like rejection sampling. We point out that, for many settings of relevance to machine learning, it is also possible to draw exact samples from DPPs on continuous domains. We start from an intuitive example on the real line, which is then generalized to multivariate real vector spaces. We also compare to previously studied approximations, showing that exact sampling, despite higher cost, can be preferable where precision is needed.

sampling
Statistics
MachineLearning
Probability

[1707.04345] Gaussian Graphical Models: An Algebraic and Geometric Perspective

april 2018 by csantos

Gaussian graphical models are used throughout the natural sciences, social sciences, and economics to model the statistical relationships between variables of interest in the form of a graph. We here provide a pedagogic introduction to Gaussian graphical models and review recent results on maximum likelihood estimation for such models. Throughout, we highlight the rich algebraic and geometric properties of Gaussian graphical models and explain how these properties relate to convex optimization and ultimately result in insights on the existence of the maximum likelihood estimator (MLE) and algorithms for computing the MLE.

via:arthegall
GraphicalModels
statistics
AlgebraicGeometry

Deep Learning, Structure and Innate Priors | Abigail See

february 2018 by csantos

Video. Yann LeCun and Christopher Manning discuss the role of priors/structure in machine learning.

yannlecun
ChrisManning
watchlist
papers
prior
statistics
NeuralNetworks
MachineLearning

[1711.11561] Measuring the tendency of CNNs to Learn Surface Statistical Regularities

january 2018 by csantos

Our main finding is that CNNs exhibit a tendency to latch onto the Fourier image statistics of the training dataset, sometimes exhibiting up to a 28% generalization gap across the various test sets. Moreover, we observe that significantly increasing the depth of a network has a very marginal impact on closing the aforementioned generalization gap. Thus we provide quantitative evidence supporting the hypothesis that deep CNNs tend to learn surface statistical regularities in the dataset rather than higher-level abstract concepts.

machinelearning
deeplearning
deep-learning
machine-learning
by:YoshuaBengio
NeuralNetworks
generalization
statistics

Prophet | Prophet is a forecasting procedure implemented in R and Python. It is fast and provides completely automated forecasts that can be tuned by hand by data scientists and analysts.

facebook machinelearning statistics forecasting TimeSeries python rstats stan probabilisticProgramming

january 2018 by csantos


[1706.09141] Causal Structure Learning

november 2017 by csantos

Graphical models can represent a multivariate distribution in a convenient and accessible form as a graph. Causal models can be viewed as a special class of graphical models that not only represent the distribution of the observed system but also the distributions under external interventions. They hence enable predictions under hypothetical interventions, which is important for decision making. The challenging task of learning causal models from data always relies on some underlying assumptions. We discuss several recently proposed structure learning algorithms and their assumptions, and compare their empirical performance under various scenarios.

papers
surveys
graphicalmodels
Causality
Statistics
MachineLearning
via:arsyed

Statistics IB

september 2017 by csantos

Spiegelhalter's course at Cambridge.

Statistics
by:DavidSpiegelhalter

What physics can tell us about inference? [video]

december 2016 by csantos

There is a deep analogy between statistical inference and statistical physics; I will give a friendly introduction to both of these fields. I will then discuss phase transitions in two problems of interest to a broad range of data sciences: community detection in social and biological networks, and clustering of sparse high-dimensional data. In both cases, if our data becomes too sparse or too noisy, it suddenly becomes impossible to find the underlying pattern, or even tell if there is one. Physics helps us both locate these phase transitions and design optimal algorithms that succeed all the way up to this point. Along the way, I will visit ideas from computational complexity, random graphs, random matrices, and spin glass theory.

Statistics
physics
MachineLearning
watchlist
by:ChristopherMoore

The Great Minds Journal Club discusses Westfall & Yarkoni (2016) – [citation needed]

june 2016 by csantos

“The basic problem the authors highlight is pretty simple,” said Samantha. “It’s easy to illustrate with an example. Say you want to know if eating more bacon is associated with a higher incidence of colorectal cancer–like that paper that came out a while ago suggested. In theory, you could just ask people how often they eat bacon and how often they get cancer, and then correlate the two. But suppose you find a positive correlation–what can you conclude?”

epidemiology
measurement
dialogue
statistics
causality

libFM

june 2016 by csantos

Factorization machines (FM) are a generic approach that makes it possible to mimic most factorization models through feature engineering. This way, factorization machines combine the generality of feature engineering with the superiority of factorization models in estimating interactions between categorical variables of large domain. libFM is a software implementation for factorization machines that features stochastic gradient descent (SGD) and alternating least squares (ALS) optimization as well as Bayesian inference using Markov Chain Monte Carlo (MCMC).
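
For readers unfamiliar with the model, here is a minimal sketch of the standard second-order factorization machine prediction: a bias, a linear term, and pairwise interactions whose weights are dot products of per-feature latent vectors. It is plain Python written for illustration, not libFM's code or interface. The function and parameter names (`w0`, `w`, `V`) are assumptions. The second function uses the well-known algebraic identity that evaluates the pairwise term in time linear in the number of features.

```python
def fm_predict_naive(x, w0, w, V):
    """Second-order FM prediction, summing over all feature pairs.

    x: feature values, w0: bias, w: linear weights,
    V: one latent vector (list of k floats) per feature.
    """
    n = len(x)
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            # The interaction weight is the dot product of latent vectors.
            dot = sum(vi * vj for vi, vj in zip(V[i], V[j]))
            y += dot * x[i] * x[j]
    return y


def fm_predict(x, w0, w, V):
    """Same prediction, using the O(k * n) reformulation of the pairwise term."""
    n = len(x)
    k = len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        # 0.5 * ((sum of terms)^2 - sum of squared terms) equals the pair sum.
        y += 0.5 * (s * s - s_sq)
    return y


# Example with three features and two latent dimensions (made-up numbers).
x = [1.0, 0.0, 2.0]
w0, w = 0.5, [0.1, 0.2, 0.3]
V = [[1.0, 2.0], [0.5, 0.5], [3.0, -1.0]]
print(fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V))
```

Because interaction weights are factorized rather than stored per pair, interactions between sparse categorical features that rarely co-occur can still be estimated, which is the property the description above refers to.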

Source code

factorization
MachineLearning
statistics
interactions

Mere renovation is not enough:

january 2016 by csantos

Discussion about Statistics Curriculum

Statistics
teaching
curriculum

Collaborative Statistics

june 2014 by csantos

Collaborative Statistics was written by Barbara Illowsky and Susan Dean, faculty members at De Anza College in Cupertino, California. The textbook was developed over several years and has been used in regular and honors-level classroom settings and in distance learning classes. This textbook is intended for introductory statistics courses being taken by students at two- and four-year colleges who are majoring in fields other than math or engineering. Intermediate algebra is the only prerequisite. The book focuses on applications of statistical knowledge rather than the theory behind it.

Statistics
book
to:ipe
