cshalizi + density_estimation   139

 « earlier
Devroye , Reddad : Discrete minimax estimation with trees
"We propose a simple recursive data-based partitioning scheme which produces piecewise-constant or piecewise-linear density estimates on intervals, and show how this scheme can determine the optimal L1L1 minimax rate for some discrete nonparametric classes."
to:NB  density_estimation  devroye.luc  minimax  nonparametrics  statistics
4 days ago by cshalizi
[1708.05254] Adaptive Clustering Using Kernel Density Estimators
"We derive and analyze a generic, recursive algorithm for estimating all splits in a finite cluster tree as well as the corresponding clusters. We further investigate statistical properties of this generic clustering algorithm when it receives level set estimates from a kernel density estimator. In particular, we derive finite sample guarantees, consistency, rates of convergence, and an adaptive data-driven strategy for choosing the kernel bandwidth. For these results we do not need continuity assumptions on the density such as Hölder continuity, but only require intuitive geometric assumptions of non-parametric nature."
to:NB  density_estimation  kernel_estimators  clustering  statistics
5 days ago by cshalizi
[1908.06081] Analyzing the Fine Structure of Distributions
"One aim of data mining is the identification of interesting structures in data. Basic properties of the empirical distribution, such as skewness and an eventual clipping, i.e., hard limits in value ranges, need to be assessed. Of particular interest is the question, whether the data originates from one process, or contains subsets related to different states of the data producing process. Data visualization tools should deliver a sensitive picture of the univariate probability density distribution (PDF) for each feature. Visualization tools for PDFs are typically kernel density estimates and range from the classical histogram to modern tools like bean or violin plots. Conventional methods have difficulties in visualizing the pdf in case of uniform, multimodal, skewed and clipped data if density estimation parameters remain in a default setting. As a consequence, a new visualization tool called Mirrored Density plot (MD plot) is proposed which is particularly designed to discover interesting structures in continuous features. The MD plot does not require any adjustments of parameters of density estimation which makes the usage compelling for non-experts. The visualization tools are evaluated in comparison to statistical tests for the typical challenges of explorative distribution analysis. The results are presented on bimodal Gaussian and skewed distributions as well as several features with published pdfs. In exploratory data analysis of 12 features describing the quarterly financial statements, when statistical testing becomes a demanding task, only the MD plots can identify the structure of their pdfs. Overall, the MD plot can outperform the methods mentioned above."
to:NB  visual_display_of_quantitative_information  density_estimation  statistics
5 days ago by cshalizi
[1907.13630] Kernel Density Estimation for Undirected Dyadic Data
"We study nonparametric estimation of density functions for undirected dyadic random variables (i.e., random variables defined for all n\overset{def}{\equiv}\tbinom{N}{2} unordered pairs of agents/nodes in a weighted network of order N). These random variables satisfy a local dependence property: any random variables in the network that share one or two indices may be dependent, while those sharing no indices in common are independent. In this setting, we show that density functions may be estimated by an application of the kernel estimation method of Rosenblatt (1956) and Parzen (1962). We suggest an estimate of their asymptotic variances inspired by a combination of (i) Newey's (1994) method of variance estimation for kernel estimators in the "monadic" setting and (ii) a variance estimator for the (estimated) density of a simple network first suggested by Holland and Leinhardt (1976). More unusual are the rates of convergence and asymptotic (normal) distributions of our dyadic density estimates. Specifically, we show that they converge at the same rate as the (unconditional) dyadic sample mean: the square root of the number, N, of nodes. This differs from the results for nonparametric estimation of densities and regression functions for monadic data, which generally have a slower rate of convergence than their corresponding sample mean."
to:NB  density_estimation  network_data_analysis  nonparametrics  kernel_methods  re:smoothing_adjacency_matrices
23 days ago by cshalizi
Nonparametric Estimation of Multivariate Mixtures: Journal of the American Statistical Association: Vol 0, No 0
"A multivariate mixture model is determined by three elements: the number of components, the mixing proportions, and the component distributions. Assuming that the number of components is given and that each mixture component has independent marginal distributions, we propose a nonparametric method to estimate the component distributions. The basic idea is to convert the estimation of component density functions to a problem of estimating the coordinates of the component density functions with respect to a good set of basis functions. Specifically, we construct a set of basis functions by using conditional density functions and try to recover the coordinates of component density functions with respect to this set of basis functions. Furthermore, we show that our estimator for the component density functions is consistent. Numerical studies are used to compare our algorithm with other existing nonparametric methods of estimating component distributions under the assumption of conditionally independent marginals."
to:NB  mixture_models  density_estimation  statistics
4 weeks ago by cshalizi
Cheng , Chen : Nonparametric inference via bootstrapping the debiased estimator
"In this paper, we propose to construct confidence bands by bootstrapping the debiased kernel density estimator (for density estimation) and the debiased local polynomial regression estimator (for regression analysis). The idea of using a debiased estimator was recently employed by Calonico et al. (2018b) to construct a confidence interval of the density function (and regression function) at a given point by explicitly estimating stochastic variations. We extend their ideas of using the debiased estimator and further propose a bootstrap approach for constructing simultaneous confidence bands. This modified method has an advantage that we can easily choose the smoothing bandwidth from conventional bandwidth selectors and the confidence band will be asymptotically valid. We prove the validity of the bootstrap confidence band and generalize it to density level sets and inverse regression problems. Simulation studies confirm the validity of the proposed confidence bands/sets. We apply our approach to an Astronomy dataset to show its applicability."
to:NB  to_read  statistics  bootstrap  confidence_sets  regression  density_estimation  re:ADAfaEPoV
7 weeks ago by cshalizi
[1906.07177] (f)RFCDE: Random Forests for Conditional Density Estimation and Functional Data
"Random forests is a common non-parametric regression technique which performs well for mixed-type unordered data and irrelevant features, while being robust to monotonic variable transformations. Standard random forests, however, do not efficiently handle functional data and runs into a curse-of dimensionality when presented with high-resolution curves and surfaces. Furthermore, in settings with heteroskedasticity or multimodality, a regression point estimate with standard errors do not fully capture the uncertainty in our predictions. A more informative quantity is the conditional density p(y | x) which describes the full extent of the uncertainty in the response y given covariates x. In this paper we show how random forests can be efficiently leveraged for conditional density estimation, functional covariates, and multiple responses without increasing computational complexity. We provide open-source software for all procedures with R and Python versions that call a common C++ library."
to:NB  ensemble_methods  regression  density_estimation  statistics  kith_and_kin  decision_trees  lee.ann_b.
9 weeks ago by cshalizi
[1903.00954] Conditional Density Estimation with Neural Networks: Best Practices and Benchmarks
"Given a set of empirical observations, conditional density estimation aims to capture the statistical relationship between a conditional variable x and a dependent variable y by modeling their conditional probability p(y|x). The paper develops best practices for conditional density estimation for finance applications with neural networks, grounded on mathematical insights and empirical evaluations. In particular, we introduce a noise regularization and data normalization scheme, alleviating problems with over-fitting, initialization and hyper-parameter sensitivity of such estimators. We compare our proposed methodology with popular semi- and non-parametric density estimators, underpin its effectiveness in various benchmarks on simulated and Euro Stoxx 50 data and show its superior performance. Our methodology allows to obtain high-quality estimators for statistical expectations of higher moments, quantiles and non-linear return transformations, with very little assumptions about the return dynamic."
to:NB  neural_networks  density_estimation  statistics  nonparametrics
april 2019 by cshalizi
Multivariate Density Estimation | Wiley Series in Probability and Statistics
"Featuring a thoroughly revised presentation, Multivariate Density Estimation: Theory, Practice, and Visualization, Second Edition maintains an intuitive approach to the underlying methodology and supporting theory of density estimation. Including new material and updated research in each chapter, the Second Edition presents additional clarification of theoretical opportunities, new algorithms, and up-to-date coverage of the unique challenges presented in the field of data analysis.
"The new edition focuses on the various density estimation techniques and methods that can be used in the field of big data. Defining optimal nonparametric estimators, the Second Edition demonstrates the density estimation tools to use when dealing with various multivariate structures in univariate, bivariate, trivariate, and quadrivariate data analysis. Continuing to illustrate the major concepts in the context of the classical histogram, Multivariate Density Estimation: Theory, Practice, and Visualization, Second Edition also features:
Over 150 updated figures to clarify theoretical results and to show analyses of real data sets
An updated presentation of graphic visualization using computer software such as R
A clear discussion of selections of important research during the past decade, including mixture estimation, robust parametric modeling algorithms, and clustering
More than 130 problems to help readers reinforce the main concepts and ideas presented
Boxed theorems and results allowing easy identification of crucial ideas
Figures in color in the digital versions of the book
A website with related data sets
"Multivariate Density Estimation: Theory, Practice, and Visualization, Second Edition is an ideal reference for theoretical and applied statisticians, practicing engineers, as well as readers interested in the theoretical aspects of nonparametric estimation and the application of these methods to multivariate data. The Second Edition is also useful as a textbook for introductory courses in kernel statistics, smoothing, advanced computational statistics, and general forms of statistical distributions."
to:NB  books:noted  downloaded  smoothing  density_estimation  statistics  nonparametrics
january 2019 by cshalizi
Parzen : On Estimation of a Probability Density Function and Mode
In which Parzen introduces kernel density estimation, three years after Rosenblatt introduced it _in the same journal_.
september 2018 by cshalizi
Rosenblatt : Remarks on Some Nonparametric Estimates of a Density Function (1956)
"This note discusses some aspects of the estimation of the density function of a univariate probability distribution. All estimates of the density function satisfying relatively mild conditions are shown to be biased. The asymptotic mean square error of a particular class of estimates is evaluated."

--- In which Rosenblatt introduces kernel density estimation.
september 2018 by cshalizi
Density Estimation in Infinite Dimensional Exponential Families
"In this paper, we consider an infinite dimensional exponential family P of probability densities, which are parametrized by functions in a reproducing kernel Hilbert space H, and show it to be quite rich in the sense that a broad class of densities on ℝdRd can be approximated arbitrarily well in Kullback-Leibler (KL) divergence by elements in P. Motivated by this approximation property, the paper addresses the question of estimating an unknown density p0p0 through an element in P. Standard techniques like maximum likelihood estimation (MLE) or pseudo MLE (based on the method of sieves), which are based on minimizing the KL divergence between p0p0 and P, do not yield practically useful estimators because of their inability to efficiently handle the log-partition function. We propose an estimator p̂ np^n based on minimizing the Fisher divergence, J(p0‖p)J(p0‖p) between p0p0 and p∈p∈P, which involves solving a simple finite-dimensional linear system. When p0∈p0∈P, we show that the proposed estimator is consistent, and provide a convergence rate of n−min{23,2β+12β+2}n−min{23,2β+12β+2} in Fisher divergence under the smoothness assumption that logp0∈(Cβ)log⁡p0∈R(Cβ) for some β≥0β≥0, where CC is a certain Hilbert-Schmidt operator on H and (Cβ)R(Cβ) denotes the image of CβCβ. We also investigate the misspecified case of p0∉p0∉P and show that J(p0‖p̂ n)→infp∈J(p0‖p)J(p0‖p^n)→infp∈PJ(p0‖p) as n→∞n→∞, and provide a rate for this convergence under a similar smoothness condition as above. Through numerical simulations we demonstrate that the proposed estimator outperforms the non- parametric kernel density estimator, and that the advantage of the proposed estimator grows as dd increases."
in_NB  density_estimation  exponential_families  statistics  via:?
july 2018 by cshalizi
Boosting conditional probability estimators | SpringerLink
"In the standard agnostic multiclass model, <instance, label > pairs are sampled independently from some underlying distribution. This distribution induces a conditional probability over the labels given an instance, and our goal in this paper is to learn this conditional distribution. Since even unconditional densities are quite challenging to learn, we give our learner access to <instance, conditional distribution > pairs. Assuming a base learner oracle in this model, we might seek a boosting algorithm for constructing a strong learner. Unfortunately, without further assumptions, this is provably impossible. However, we give a new boosting algorithm that succeeds in the following sense: given a base learner guaranteed to achieve some average accuracy (i.e., risk), we efficiently construct a learner that achieves the same level of accuracy with arbitrarily high probability. We give generalization guarantees of several different kinds, including distribution-free accuracy and risk bounds. None of our estimates depend on the number of boosting rounds and some of them admit dimension-free formulations."
to:NB  boosting  density_estimation  kith_and_kin  kontorovich.aryeh  statistics
may 2018 by cshalizi
Sufficient Dimension Reduction via Direct Estimation of the Gradients of Logarithmic Conditional Densities | Neural Computation | MIT Press Journals
"Sufficient dimension reduction (SDR) is aimed at obtaining the low-rank projection matrix in the input space such that information about output data is maximally preserved. Among various approaches to SDR, a promising method is based on the eigendecomposition of the outer product of the gradient of the conditional density of output given input. In this letter, we propose a novel estimator of the gradient of the logarithmic conditional density that directly fits a linear-in-parameter model to the true gradient under the squared loss. Thanks to this simple least-squares formulation, its solution can be computed efficiently in a closed form. Then we develop a new SDR method based on the proposed gradient estimator. We theoretically prove that the proposed gradient estimator, as well as the SDR solution obtained from it, achieves the optimal parametric convergence rate. Finally, we experimentally demonstrate that our SDR method compares favorably with existing approaches in both accuracy and computational efficiency on a variety of artificial and benchmark data sets."
to:NB  dimension_reduction  sufficiency  density_estimation  linear_regression  statistics
january 2018 by cshalizi
Robust Conditional Probabilities
"Conditional probabilities are a core concept in machine learning. For example, optimal prediction of a label Y given an input X corresponds to maximizing the conditional probability of Y given X. A common approach to inference tasks is learning a model of conditional probabilities. However, these models are often based on strong assumptions (e.g., log-linear models), and hence their estimate of conditional probabilities is not robust and is highly dependent on the validity of their assumptions. Here we propose a framework for reasoning about conditional probabilities without assuming anything about the underlying distributions, except knowledge of their second order marginals, which can be estimated from data. We show how this setting leads to guaranteed bounds on conditional probabilities, which can be calculated efficiently in a variety of settings, including structured-prediction. Finally, we apply them to semi-supervised deep learning, obtaining results competitive with variational autoencoders."
to:NB  statistics  density_estimation  probability
november 2017 by cshalizi
The power of absolute discounting: all-dimensional distribution estimation
"Categorical models are the natural fit for many problems. When learning the distribution of categories from samples, high-dimensionality may dilute the data. Minimax optimality is too pessimistic to remedy this issue. A serendipitously discovered estimator, absolute discounting, corrects empirical frequencies by subtracting a constant from observed categories, which it then redistributes among the unobserved. It outperforms classical estimators empirically, and has been used extensively in natural language modeling. In this paper, we rigorously explain the prowess of this estimator using less pessimistic notions. We show (1) that absolute discounting recovers classical minimax KL-risk rates, (2) that it is \emph{adaptive} to an effective dimension rather than the true dimension, (3) that it is strongly related to the Good-Turing estimator and inherits its \emph{competitive} properties. We use power-law distributions as the corner stone of these results. We validate the theory via synthetic data and an application to the Global Terrorism Database."
to:NB  to_read  density_estimation  statistics  heavy_tails
november 2017 by cshalizi
Lectures on the Nearest Neighbor Method | SpringerLink
"This text presents a wide-ranging and rigorous overview of nearest neighbor methods, one of the most important paradigms in machine learning. Now in one self-contained volume, this book systematically covers key statistical, probabilistic, combinatorial and geometric ideas for understanding, analyzing and developing nearest neighbor methods."
in_NB  books:noted  nearest-neighbors  density_estimation  regression  classifiers  statistics  devroye.luc  biau.gerard  nonparametrics  re:ADAfaEPoV  entropy_estimation
august 2017 by cshalizi
Online Reinforcement Learning Using a Probability Density Estimation | Neural Computation | MIT Press Journals
"Function approximation in online, incremental, reinforcement learning needs to deal with two fundamental problems: biased sampling and nonstationarity. In this kind of task, biased sampling occurs because samples are obtained from specific trajectories dictated by the dynamics of the environment and are usually concentrated in particular convergence regions, which in the long term tend to dominate the approximation in the less sampled regions. The nonstationarity comes from the recursive nature of the estimations typical of temporal difference methods. This nonstationarity has a local profile, varying not only along the learning process but also along different regions of the state space. We propose to deal with these problems using an estimation of the probability density of samples represented with a gaussian mixture model. To deal with the nonstationarity problem, we use the common approach of introducing a forgetting factor in the updating formula. However, instead of using the same forgetting factor for the whole domain, we make it dependent on the local density of samples, which we use to estimate the nonstationarity of the function at any given input point. To address the biased sampling problem, the forgetting factor applied to each mixture component is modulated according to the new information provided in the updating, rather than forgetting depending only on time, thus avoiding undesired distortions of the approximation in less sampled regions."
to:NB  online_learning  density_estimation  statistics  reinforcement_learning
january 2017 by cshalizi
[1602.00531] Adaptive non-parametric estimation in the presence of dependence
"We consider non-parametric estimation problems in the presence of dependent data, notably non-parametric regression with random design and non-parametric density estimation. The proposed estimation procedure is based on a dimension reduction. The minimax optimal rate of convergence of the estimator is derived assuming a sufficiently weak dependence characterized by fast decreasing mixing coefficients. We illustrate these results by considering classical smoothness assumptions. However, the proposed estimator requires an optimal choice of a dimension parameter depending on certain characteristics of the function of interest, which are not known in practice. The main issue addressed in our work is an adaptive choice of this dimension parameter combining model selection and Lepski's method. It is inspired by the recent work of Goldenshluger and Lepski (2011). We show that this data-driven estimator can attain the lower risk bound up to a constant provided a fast decay of the mixing coefficients."
to:NB  statistics  regression  nonparametrics  learning_under_dependence  density_estimation  dimension_reduction
february 2016 by cshalizi
[1506.04513] Convex Risk Minimization and Conditional Probability Estimation
"This paper proves, in very general settings, that convex risk minimization is a procedure to select a unique conditional probability model determined by the classification problem. Unlike most previous work, we give results that are general enough to include cases in which no minimum exists, as occurs typically, for instance, with standard boosting algorithms. Concretely, we first show that any sequence of predictors minimizing convex risk over the source distribution will converge to this unique model when the class of predictors is linear (but potentially of infinite dimension). Secondly, we show the same result holds for \emph{empirical} risk minimization whenever this class of predictors is finite dimensional, where the essential technical contribution is a norm-free generalization bound."
to:NB  learning_theory  statistics  density_estimation  re:AoS_project
july 2015 by cshalizi
Chernozhukov , Chetverikov , Kato : Anti-concentration and honest, adaptive confidence bands
"Modern construction of uniform confidence bands for nonparametric densities (and other functions) often relies on the classical Smirnov–Bickel–Rosenblatt (SBR) condition; see, for example, Giné and Nickl [Probab. Theory Related Fields 143 (2009) 569–596]. This condition requires the existence of a limit distribution of an extreme value type for the supremum of a studentized empirical process (equivalently, for the supremum of a Gaussian process with the same covariance function as that of the studentized empirical process). The principal contribution of this paper is to remove the need for this classical condition. We show that a considerably weaker sufficient condition is derived from an anti-concentration property of the supremum of the approximating Gaussian process, and we derive an inequality leading to such a property for separable Gaussian processes. We refer to the new condition as a generalized SBR condition. Our new result shows that the supremum does not concentrate too fast around any value.
"We then apply this result to derive a Gaussian multiplier bootstrap procedure for constructing honest confidence bands for nonparametric density estimators (this result can be applied in other nonparametric problems as well). An essential advantage of our approach is that it applies generically even in those cases where the limit distribution of the supremum of the studentized empirical process does not exist (or is unknown). This is of particular importance in problems where resolution levels or other tuning parameters have been chosen in a data-driven fashion, which is needed for adaptive constructions of the confidence bands. Finally, of independent interest is our introduction of a new, practical version of Lepski’s method, which computes the optimal, nonconservative resolution levels via a Gaussian multiplier bootstrap method."

--- Ungated version: http://arxiv.org/abs/1303.7152
in_NB  confidence_sets  bootstrap  density_estimation  nonparametrics  statistics  regression  to_read  re:ADAfaEPoV
february 2015 by cshalizi
Silverman : Spline Smoothing: The Equivalent Variable Kernel Method
"The spline smoothing approach to nonparametric regression and curve estimation is considered. It is shown that, in a certain sense, spline smoothing corresponds approximately to smoothing by a kernel method with bandwidth depending on the local density of design points. Some exact calculations demonstrate that the approximation is extremely close in practice. Consideration of kernel smoothing methods demonstrates that the way in which the effective local bandwidth behaves in spline smoothing has desirable properties. Finally, the main result of the paper is applied to the related topic of penalized maximum likelihood probability density estimates; a heuristic discussion shows that these estimates should adapt well in the tails of the distribution."
have_read  splines  kernel_estimators  nonparametrics  regression  density_estimation  statistics  silverman.bernard  in_NB
february 2015 by cshalizi
Crisan , Míguez : Particle-kernel estimation of the filter density in state-space models
"Sequential Monte Carlo (SMC) methods, also known as particle filters, are simulation-based recursive algorithms for the approximation of the a posteriori probability measures generated by state-space dynamical models. At any given time t, a SMC method produces a set of samples over the state space of the system of interest (often termed “particles”) that is used to build a discrete and random approximation of the posterior probability distribution of the state variables, conditional on a sequence of available observations. One potential application of the methodology is the estimation of the densities associated to the sequence of a posteriori distributions. While practitioners have rather freely applied such density approximations in the past, the issue has received less attention from a theoretical perspective. In this paper, we address the problem of constructing kernel-based estimates of the posterior probability density function and its derivatives, and obtain asymptotic convergence results for the estimation errors. In particular, we find convergence rates for the approximation errors that hold uniformly on the state space and guarantee that the error vanishes almost surely as the number of particles in the filter grows. Based on this uniform convergence result, we first show how to build continuous measures that converge almost surely (with known rate) toward the posterior measure and then address a few applications. The latter include maximum a posteriori estimation of the system state using the approximate derivatives of the posterior density and the approximation of functionals of it, for example, Shannon’s entropy."
in_NB  particle_filters  kernel_estimators  density_estimation  filtering  state_estimation  state-space_models  statistics  computational_statistics
january 2015 by cshalizi
[1412.1716] Nonparametric Modal Regression
"Modal regression estimates the local modes of the distribution of Y given X = x, instead of the mean, as in the usual regression sense, and can hence reveal important structure missed by usual regression methods. We study a simple nonparametric method for modal regression, based on a kernel density estimate (KDE) of the joint distribution of Y and X. We derive asymptotic error bounds for this method, and propose techniques for constructing confidence sets and prediction sets. The latter is used to select the smoothing bandwidth of the underlying KDE. The idea behind modal regression is connected to many others, such as mixture regression and density ridge estimation, and we discuss these ties as well."
to:NB  regression  density_estimation  nonparametrics  statistics  kith_and_kin  genovese.chris  wasserman.larry  tibshirani.ryan
january 2015 by cshalizi
[1411.4040] Kernel Density Estimation on Symmetric Spaces
"We investigate a natural variant of kernel density estimation on a large class of symmetric spaces and prove a minimax rate of convergence as fast as the minimax rate on Euclidean space. We make neither compactness assumptions on the space nor Holder-class assumptions on the densities. A main tool used in proving the convergence rate is the Helgason-Fourier transform, a generalization of the Fourier transform for semisimple Lie groups modulo maximal compact subgroups. This paper obtains a simplified formula in the special case when the symmetric space is the 2-dimensional hyperboloid."
in_NB  density_estimation  statistics  nonparametrics  kith_and_kin  asta.dena  statistics_on_manifolds
november 2014 by cshalizi
[1307.7760] Geometric Inference on Kernel Density Estimates
"We show that geometric inference of a point cloud can be calculated by examining its kernel density estimate. This intermediate step results in the inference being statically robust to noise and allows for large computational gains and scalability (e.g. on 100 million points). In particular, by first creating a coreset for the kernel density estimate, the data representing the final geometric and topological structure has size depending only on the error tolerance, not on the size of the original point set or the complexity of the structure. To achieve this result, we study how to replace distance to a measure, as studied by Chazal, Cohen-Steiner, and Merigot, with the kernel distance. The kernel distance is monotonic with the kernel density estimate (sublevel sets of the kernel distance are superlevel sets of the kernel density estimate), thus allowing us to examine the kernel density estimate in this manner. We show it has several computational and stability advantages. Moreover, we provide an algorithm to estimate its topology using weighted Vietoris-Rips complexes."
to:NB  geometry  kernel_estimators  density_estimation  statistics  computational_statistics  have_read
october 2014 by cshalizi
Quality and efficiency for kernel density estimates in large data
"Kernel density estimates are important for a broad variety of applications. Their construction has been well-studied, but existing techniques are expensive on massive datasets and/or only provide heuristic approximations without theoretical guarantees. We propose randomized and deterministic algorithms with quality guarantees which are orders of magnitude more efficient than previous algorithms. Our algorithms do not require knowledge of the kernel or its bandwidth parameter and are easily parallelizable. We demonstrate how to implement our ideas in a centralized setting and in MapReduce, although our algorithms are applicable to any large-scale data processing framework. Extensive experiments on large real datasets demonstrate the quality, efficiency, and scalability of our techniques."

--- Ungated version: http://www.cs.utah.edu/~lifeifei/papers/kernelsigmod13.pdf
october 2014 by cshalizi
[0907.0199] High-Dimensional Density Estimation via SCA: An Example in the Modelling of Hurricane Tracks
"We present nonparametric techniques for constructing and verifying density estimates from high-dimensional data whose irregular dependence structure cannot be modelled by parametric multivariate distributions. A low-dimensional representation of the data is critical in such situations because of the curse of dimensionality. Our proposed methodology consists of three main parts: (1) data reparameterization via dimensionality reduction, wherein the data are mapped into a space where standard techniques can be used for density estimation and simulation; (2) inverse mapping, in which simulated points are mapped back to the high-dimensional input space; and (3) verification, in which the quality of the estimate is assessed by comparing simulated samples with the observed data. These approaches are illustrated via an exploration of the spatial variability of tropical cyclones in the North Atlantic; each datum in this case is an entire hurricane trajectory. We conclude the paper with a discussion of extending the methods to model the relationship between TC variability and climatic variables."
to_read  heard_the_talk  kith_and_kin  dimension_reduction  lee.ann_b.  buchman.susan  high-dimensional_statistics  density_estimation  meteorology  hurricanes  diffusion_maps  entableted  in_NB
april 2014 by cshalizi
High-Dimensional Density Ratio Estimation with Extensions to Approximate Likelihood Computation | AISTATS 2014 | JMLR W&CP
"The ratio between two probability density functions is an important component of various tasks, including selection bias correction, novelty detection and classification. Recently, several estimators of this ratio have been proposed. Most of these methods fail if the sample space is high-dimensional, and hence require a dimension reduction step, the result of which can be a significant loss of information. Here we propose a simple-to-implement, fully nonparametric density ratio estimator that expands the ratio in terms of the eigenfunctions of a kernel-based operator; these functions reflect the underlying geometry of the data (e.g., submanifold structure), often leading to better estimates without an explicit dimension reduction step. We show how our general framework can be extended to address another important problem, the estimation of a likelihood function in situations where that function cannot be well-approximated by an analytical form. One is often faced with this situation when performing statistical inference with data from the sciences, due the complexity of the data and of the processes that generated those data. We emphasize applications where using existing likelihood-free methods of inference would be challenging due to the high dimensionality of the sample space, but where our spectral series method yields a reasonable estimate of the likelihood function. We provide theoretical guarantees and illustrate the effectiveness of our proposed method with numerical experiments."
density_estimation  density_ratio_estimation  likelihood  spectral_methods  high-dimensional_statistics  kith_and_kin  lee.ann_b.  izbicki.rafael  read_the_thesis  in_NB
april 2014 by cshalizi
[1402.3466] Kernel density estimates in particle filter
"The paper deals with kernel estimates of densities of filtering distributions in the particle filter. The convergence of the estimates is investigated by means of Fourier analysis. It is shown that the estimates converge to theoretical filtering densities in the mean integrated squared error under a certain assumption on the Sobolev character of the filtering densities. A sufficient condition is presented for the persistence of this Sobolev character over time."
to:NB  density_estimation  kernel_estimators  particle_filters  filtering  statistics
march 2014 by cshalizi
[1402.2966] Nonparametric Estimation of Renyi Divergence and Friends
"We consider nonparametric estimation of L_2, Renyi-α and Tsallis-α divergences of continuous distributions. Our approach is to construct estimators for particular integral functionals of two densities and translate them into divergence estimators. For the integral functionals, our estimators are based on corrections of a preliminary plug-in estimator. We analyze the rates of convergence for our estimators and show that the parametric rate of n−1/2 is achievable when the densities' smoothness s are both at least d/4 where d is the dimension. We also derive minimax lower bounds for this problem which confirm that s>d/4 is necessary to achieve the n−1/2 rate of convergence. We confirm our theoretical guarantees with a number of simulations."
in_NB  information_theory  density_estimation  divergence_estimation  kith_and_kin  entropy_estimation  wasserman.larry  poczos.barnabas
february 2014 by cshalizi
[1312.1099] Multiscale Dictionary Learning for Estimating Conditional Distributions
"Nonparametric estimation of the conditional distribution of a response given high-dimensional features is a challenging problem. It is important to allow not only the mean but also the variance and shape of the response density to change flexibly with features, which are massive-dimensional. We propose a multiscale dictionary learning model, which expresses the conditional response density as a convex combination of dictionary densities, with the densities used and their weights dependent on the path through a tree decomposition of the feature space. A fast graph partitioning algorithm is applied to obtain the tree decomposition, with Bayesian methods then used to adaptively prune and average over different sub-trees in a soft probabilistic manner. The algorithm scales efficiently to approximately one million features. State of the art predictive performance is demonstrated for toy examples and two neuroscience applications including up to a million features."
to:NB  density_estimation  conditional_density_estimation  nonparametrics  statistics
january 2014 by cshalizi
[1311.4780] Asymptotically Exact, Embarrassingly Parallel MCMC
"Communication costs, resulting from synchronization requirements during learning, can greatly slow down many parallel machine learning algorithms. In this paper, we present a parallel Markov chain Monte Carlo (MCMC) algorithm in which subsets of data are processed independently, with very little communication. First, we arbitrarily partition data onto multiple machines. Then, on each machine, any classical MCMC method (e.g., Gibbs sampling) may be used to draw samples from a posterior distribution given the data subset. Finally, the samples from each machine are combined to form samples from the full posterior. This embarrassingly parallel algorithm effectively allows each machine to act independently on a subset of the data (without communication) until the final combination stage. We prove that our algorithm generates asymptotically exact samples and empirically demonstrate its effectiveness on several large-scale synthetic and real datasets."
in_NB  monte_carlo  density_estimation  distributed_systems  computational_statistics  xing.eric  have_read
december 2013 by cshalizi
[1312.3516] Density Estimation in Infinite Dimensional Exponential Families
"In this paper, we consider the problem of estimating densities in an infinite dimensional exponential family indexed by functions in a reproducing kernel Hilbert space. Since standard techniques like maximum likelihood estimation (MLE) or pseudo MLE (based on the method of sieves) do not yield practically useful estimators because of their inability to handle the log-partition function efficiently, we propose an estimator based on the score matching method introduced by Hyv\"{a}rinen, which involves solving a simple linear system. We show that the proposed estimator is consistent, and provide convergence rates under smoothness assumptions. We also empirically demonstrate that the proposed method outperforms the standard non-parametric kernel density estimator."

--- The theory is interesting, but I think the (brief) empirical comparison to kernel density estimation is a bit of a cheat, because they seem to be using the objective function _their_ method optimizes! Something like out-of-sample log-likelihood would have been much fairer.
to:NB  density_estimation  hilbert_space  kernel_methods  statistics  nonparametrics  have_read
december 2013 by cshalizi
Preadjusted non-parametric estimation of a conditional distribution function - Veraverbeke - 2013 - Journal of the Royal Statistical Society: Series B (Statistical Methodology) - Wiley Online Library
"The paper deals with non-parametric estimation of a conditional distribution function. We suggest a method of preadjusting the original observations non-parametrically through location and scale, to reduce the bias of the estimator. We derive the asymptotic properties of the estimator proposed. A simulation study investigating the finite sample performances of the estimators discussed is provided and reveals the gain that can be achieved. It is also shown how the idea of the preadjusting opens the path to improved estimators in other settings such as conditional quantile and density estimation, and conditional survival function estimation in the case of censored data."
to:NB  density_estimation  nonparametrics  statistics
november 2013 by cshalizi
Bott , Devroye , Kohler : Estimation of a distribution from data with small measurement errors
"In this paper we study the problem of estimation of a distribution from data that contain small measurement errors. The only assumption on these errors is that the average absolute measurement error converges to zero for sample size tending to infinity with probability one. In particular we do not assume that the measurement errors are independent with expectation zero. Throughout the paper we assume that the distribution, which has to be estimated, has a density with respect to the Lebesgue-Borel measure.
"We show that the empirical measure based on the data with measurement error leads to an uniform consistent estimate of the distribution function. Furthermore, we show that in general no estimate is consistent in the total variation sense for all distributions under the above assumptions. However, in case that the average measurement error converges to zero faster than a properly chosen sequence of bandwidths, the total variation error of the distribution estimate corresponding to a kernel density estimate converges to zero for all distributions. In case of a general additive error model we show that this result even holds if only the average measurement error converges to zero. The results are applied in the context of estimation of the density of residuals in a random design regression model, where the residual error is not independent from the predictor."
to:NB  density_estimation  error-in-variables  nonparametrics  statistics  kernel_estimators
october 2013 by cshalizi
[1309.6906] Hellinger Distance and Bayesian Non-Parametrics: Hierarchical Models for Robust and Efficient Bayesian Inference
"This paper introduces a hierarchical framework to incorporate Hellinger distance methods into Bayesian analysis. We propose to modify a prior over non-parametric densities with the exponential of twice the Hellinger distance between a candidate and a parametric density. By incorporating a prior over the parameters of the second density, we arrive at a hierarchical model in which a non-parametric model is placed between parameters and the data. The parameters of the family can then be estimated as hyperparameters in the model. In frequentist estimation, minimizing the Hellinger distance between a kernel density estimate and a parametric family has been shown to produce estimators that are both robust to outliers and statistically efficient when the parametric model is correct. In this paper, we demonstrate that the same results are applicable when a non-parametric Bayes density estimate replaces the kernel density estimate. We then demonstrate that robustness and efficiency also hold for the proposed hierarchical model. The finite-sample behavior of the resulting estimates is investigated by simulation and on real world data."
to:NB  density_estimation  nonparametrics  hooker.giles  statistics
september 2013 by cshalizi
Hall , Horowitz : A simple bootstrap method for constructing nonparametric confidence bands for functions
"Standard approaches to constructing nonparametric confidence bands for functions are frustrated by the impact of bias, which generally is not estimated consistently when using the bootstrap and conventionally smoothed function estimators. To overcome this problem it is common practice to either undersmooth, so as to reduce the impact of bias, or oversmooth, and thereby introduce an explicit or implicit bias estimator. However, these approaches, and others based on nonstandard smoothing methods, complicate the process of inference, for example, by requiring the choice of new, unconventional smoothing parameters and, in the case of undersmoothing, producing relatively wide bands. In this paper we suggest a new approach, which exploits to our advantage one of the difficulties that, in the past, has prevented an attractive solution to the problem—the fact that the standard bootstrap bias estimator suffers from relatively high-frequency stochastic error. The high frequency, together with a technique based on quantiles, can be exploited to dampen down the stochastic error term, leading to relatively narrow, simple-to-construct confidence bands."

--- Huh. Need to see how hard this would really be to implement, and also what it really establishes.
september 2013 by cshalizi
Uniform-in-Bandwidth Functional Limit Laws - Springer
"We provide uniform-in-bandwidth functional limit laws for the increments of the empirical and quantile processes. Our theorems, established in the framework of convergence in probability, imply new sharp uniform-in-bandwidth limit laws for functional estimators. In particular, they yield the explicit value of the asymptotic limiting constant for the uniform-in-bandwidth sup-norm of the random error of kernel density estimators. We allow the bandwidth to vary within the complete range for which the estimators are consistent."
in_NB  empirical_processes  density_estimation  statistics  convergence_of_stochastic_processes
august 2013 by cshalizi
[1109.2694] Kernel density estimation for stationary random fields
"In this paper, under natural and easily verifiable conditions, we prove the $\mathbb{L}^1$-convergence and the asymptotic normality of the Parzen-Rosenblatt density estimator for stationary random fields of the form $X_k = g(\varepsilon_{k-s}, s \in \Z^d)$, $k\in\Z^d$, where $(\varepsilon_i)_{i\in\Z^d}$ are i.i.d real random variables and $g$ is a measurable function defined on $\R^{\Z^d}$. Such kind of processes provides a general framework for stationary ergodic random fields. A Berry-Esseen's type central limit theorem is also given for the considered estimator."
to:NB  density_estimation  statistics  random_fields  mixing
july 2013 by cshalizi
Adaptive confidence sets in L^2 - Springer
"The problem of constructing confidence sets that are adaptive in L2 -loss over a continuous scale of Sobolev classes of probability densities is considered. Adaptation holds, where possible, with respect to both the radius of the Sobolev ball and its smoothness degree, and over maximal parameter spaces for which adaptation is possible. Two key regimes of parameter constellations are identified: one where full adaptation is possible, and one where adaptation requires critical regions be removed. Techniques used to derive these results include a general nonparametric minimax test for infinite-dimensional null- and alternative hypotheses, and new lower bounds for L2 -adaptive confidence sets."
in_NB  nonparametrics  density_estimation  confidence_sets  statistics  nickl.richard
july 2013 by cshalizi
Taylor & Francis Online :: Copula-Based Regression Estimation and Inference - Journal of the American Statistical Association - Volume 108, Issue 502
"We investigate a new approach to estimating a regression function based on copulas. The main idea behind this approach is to write the regression function in terms of a copula and marginal distributions. Once the copula and the marginal distributions are estimated, we use the plug-in method to construct our new estimator. Because various methods are available in the literature for estimating both a copula and a distribution, this idea provides a rich and flexible family of regression estimators. We provide some asymptotic results related to this copula-based regression modeling when the copula is estimated via profile likelihood and the marginals are estimated nonparametrically. We also study the finite sample performance of the estimator and illustrate its usefulness by analyzing data from air pollution studies."
to:NB  regression  copulas  density_estimation  nonparametrics  statistics
july 2013 by cshalizi
Non-parametric identification and estimation of the number of components in multivariate mixtures - Kasahara - 2013 - Journal of the Royal Statistical Society: Series B (Statistical Methodology) - Wiley Online Library
"We analyse the identifiability of the number of components in k-variate, M-component finite mixture models in which each component distribution has independent marginals, including models in latent class analysis. Without making parametric assumptions on the component distributions, we investigate how one can identify the number of components from the distribution function of the observed data. When k≥2, a lower bound on the number of components (M) is non-parametrically identifiable from the rank of a matrix constructed from the distribution function of the observed variables. Building on this identification condition, we develop a procedure to estimate a lower bound on the number of components consistently."
in_NB  mixture_models  nonparametrics  density_estimation  inference_to_latent_objects  re:g_paper  to_teach:undergrad-ADA
july 2013 by cshalizi
[1306.3185] Robust regression modeling and predictive recursion maximum likelihood
"In a regression context, robustness to specification of the error distribution is important. In this paper, we propose a general nonparametric scale mixture model for the error distribution. For such mixtures, the predictive recursion method has been shown to be a simple and computationally efficient alternative to existing methods. Here, a predictive recursion-based likelihood function is constructed, and estimation of the regression parameters proceeds by maximizing this function. A hybrid predictive recursion--EM algorithm is proposed for this purpose, and simulations and real data analyses compare its performance to a variety of existing methods. A useful by-product of the hybrid algorithm is a sequence of scores which can be used to identify outliers.'
to:NB  statistics  regression  nonparametrics  density_estimation  prediction  em_algorithm
june 2013 by cshalizi
[1306.2671] Posterior consistency of nonparametric location-scale mixtures for multivariate density estimation
"Density estimation represents one of the most successful applications of Bayesian Nonparametrics. In particular, Dirichlet process mixtures of normals are the golden standard for density estimation and their asymptotic properties have been studied extensively, especially in the univariate case. However a gap between practitioners and the current theoretical literature is present. So far, posterior asymptotic results in the multivariate case are available only for location mixtures of Gaussian kernels with independent prior on the common covariance matrix, while in practice as well as from a conceptual point of view a location-scale mixture is often preferable. In this paper we address posterior consistency for such general mixture models by adapting a convergence rate result which combines the usual low-entropy, high-mass sieve approach with a suitable summability condition. Specifically, we establish consistency for Dirichlet process mixtures of Gaussian kernels with various prior specifications on the covariance matrix including priors that parsimoniously model the covariance via a sparse factor model."
in_NB  density_estimation  bayesian_consistency  nonparametrics
june 2013 by cshalizi
[1306.2194] Adaptive Noisy Clustering
"The problem of adaptive noisy clustering is investigated. Given a set of noisy observations $Z_i=X_i+\epsilon_i$, $i=1,...,n$, the goal is to design clusters associated with the law of $X_i$'s, with unknown density $f$ with respect to the Lebesgue measure. Since we observe a corrupted sample, a direct approach as the popular {\it $k$-means} is not suitable in this case. In this paper, we propose a noisy $k$-means minimization, which is based on the $k$-means loss function and a deconvolution estimator of the density $f$. In particular, this approach suffers from the dependence on a bandwidth involved in the deconvolution kernel. Fast rates of convergence for the excess risk are proposed for a particular choice of the bandwidth, which depends on the smoothness of the density $f$.
"Then, we turn out into the main issue of the paper: the data-driven choice of the bandwidth. We state an adaptive upper bound for a new selection rule, called ERC (Empirical Risk Comparison). This selection rule is based on the Lepski's principle, where empirical risks associated with different bandwidths are compared. Finally, we illustrate that this adaptive rule can be used in many statistical problems of $M$-estimation where the empirical risk depends on a nuisance parameter."
in_NB  clustering  density_estimation  statistics
june 2013 by cshalizi
Flexible Copula Density Estimation with Penalized Hierarchical B-splines - Kauermann - 2013 - Scandinavian Journal of Statistics - Wiley Online Library
"The paper introduces a new method for flexible spline fitting for copula density estimation. Spline coefficients are penalized to achieve a smooth fit. To weaken the curse of dimensionality, instead of a full tensor spline basis, a reduced tensor product based on so called sparse grids (Notes Numer. Fluid Mech. Multidiscip. Des., 31, 1991, 241-251) is used. To achieve uniform margins of the copula density, linear constraints are placed on the spline coefficients, and quadratic programming is used to fit the model. Simulations and practical examples accompany the presentation."
to:NB  density_estimation  copulas  splines  statistics
june 2013 by cshalizi
[1306.0186] NADE: The real-valued neural autoregressive density-estimator
"We introduce RNADE, a new model for joint density estimation of real-valued vectors. Our model calculates the density of a datapoint as the product of one-dimensional conditionals modeled using mixture density networks with shared parameters. RNADE learns a distributed representation of the data, while having a tractable expression for the calculation of densities. A tractable likelihood allows direct comparison with other methods and training by standard gradient-based optimizers. We compare the performance of RNADE on several datasets of heterogeneous and perceptual data, finding it outperforms mixture models in all but one case."
to:NB  density_estimation  statistics
june 2013 by cshalizi
Lepski : Multivariate density estimation under sup-norm loss: Oracle approach, adaptation and independence structure
"This paper deals with the density estimation on ℝd under sup-norm loss. We provide a fully data-driven estimation procedure and establish for it a so-called sup-norm oracle inequality. The proposed estimator allows us to take into account not only approximation properties of the underlying density, but eventual independence structure as well. Our results contain, as a particular case, the complete solution of the bandwidth selection problem in the multivariate density model. Usefulness of the developed approach is illustrated by application to adaptive estimation over anisotropic Nikolskii classes."
may 2013 by cshalizi
[1305.5882] Limit theorems for kernel density estimators under dependent samples
"In this paper, we construct a moment inequality for mixing dependent random variables, it is of independent interest. As applications, the consistency of the kernel density estimation is investigated. Several limit theorems are established: First, the central limit theorems for the kernel density estimator $f_{n,K}(x)$ and its distribution function are constructed. Also, the convergence rates of $\|f_{n,K}(x)-Ef_{n,K}(x)\|_{p}$ in sup-norm loss and integral $L^{p}$-norm loss are proved. Moreover, the a.s. convergence rates of the supremum of $|f_{n,K}(x)-Ef_{n,K}(x)|$ over a compact set and the whole real line are obtained. It is showed, under suitable conditions on the mixing rates, the kernel function and the bandwidths, that the optimal rates for i.i.d. random variables are also optimal for dependent ones."

--- The "to_teach" is really "to_mention"
in_NB  kernel_estimators  density_estimation  statistical_inference_for_stochastic_processes  statistics  time_series  to_teach:undergrad-ADA  re:your_favorite_dsge_sucks
may 2013 by cshalizi
[1305.3207] Efficient Density Estimation via Piecewise Polynomial Approximation
"We give a highly efficient "semi-agnostic" algorithm for learning univariate probability distributions that are well approximated by piecewise polynomial density functions. Let $p$ be an arbitrary distribution over an interval $I$ which is $\tau$-close (in total variation distance) to an unknown probability distribution $q$ that is defined by an unknown partition of $I$ into $t$ intervals and $t$ unknown degree-$d$ polynomials specifying $q$ over each of the intervals. We give an algorithm that draws $\tilde{O}(t\new{(d+1)}/\eps^2)$ samples from $p$, runs in time $\poly(t,d,1/\eps)$, and with high probability outputs a piecewise polynomial hypothesis distribution $h$ that is $(O(\tau)+\eps)$-close (in total variation distance) to $p$. This sample complexity is essentially optimal; we show that even for $\tau=0$, any algorithm that learns an unknown $t$-piecewise degree-$d$ probability distribution over $I$ to accuracy $\eps$ must use $\Omega({\frac {t(d+1)} {\poly(1 + \log(d+1))}} \cdot {\frac 1 {\eps^2}})$ samples from the distribution, regardless of its running time. Our algorithm combines tools from approximation theory, uniform convergence, linear programming, and dynamic programming.
"We apply this general algorithm to obtain a wide range of results for many natural problems in density estimation over both continuous and discrete domains. These include state-of-the-art results for learning mixtures of log-concave distributions; mixtures of $t$-modal distributions; mixtures of Monotone Hazard Rate distributions; mixtures of Poisson Binomial Distributions; mixtures of Gaussians; and mixtures of $k$-monotone densities. Our general technique yields computationally efficient algorithms for all these problems, in many cases with provably optimal sample complexities (up to logarithmic factors) in all parameters."
to:NB  density_estimation  statistics  computational_statistics
may 2013 by cshalizi
[1304.5575] Inverse Density as an Inverse Problem: The Fredholm Equation Approach
"In this paper we address the problem of estimating the ratio $\frac{q}{p}$ where $p$ is a density function and $q$ is another density, or, more generally an arbitrary function. Knowing or approximating this ratio is needed in various problems of inference and integration, in particular, when one needs to average a function with respect to one probability distribution, given a sample from another. It is often referred as {\it importance sampling} in statistical inference and is also closely related to the problem of {\it covariate shift} in transfer learning as well as to various MCMC methods. It may also be useful for separating the underlying geometry of a space, say a manifold, from the density function defined on it.
"Our approach is based on reformulating the problem of estimating $\frac{q}{p}$ as an inverse problem in terms of an integral operator corresponding to a kernel, and thus reducing it to an integral equation, known as the Fredholm problem of the first kind. This formulation, combined with the techniques of regularization and kernel methods, leads to a principled kernel-based framework for constructing algorithms and for analyzing them theoretically.
"The resulting family of algorithms (FIRE, for Fredholm Inverse Regularized Estimator) is flexible, simple and easy to implement.
"We provide detailed theoretical analysis including concentration bounds and convergence rates for the Gaussian kernel in the case of densities defined on $\R^d$, compact domains in $\R^d$ and smooth $d$-dimensional sub-manifolds of the Euclidean space.
We also show experimental results including applications to classification and semi-supervised learning within the covariate shift framework and demonstrate some encouraging experimental comparisons. We also show how the parameters of our algorithms can be chosen in a completely unsupervised manner."
to:NB  inverse_problems  non-stationarity  density_estimation  statistics  kernel_estimators  belkin.mikhail
april 2013 by cshalizi
[1304.1761] On Bayesian supremum norm contraction rates
"Building on ideas from Castillo and Nickl, arXiv:1208.3862v2, a method is provided to study nonparametric Bayesian posterior convergence rates when `strong' measures of distances, such as the sup-norm, are considered. In particular, we show that likelihood methods can achieve optimal minimax sup-norm rates in density estimation on the unit interval. The introduced methodology is used to prove that commonly used families of prior distributions on densities, namely log-density priors and dyadic random density histograms, can indeed achieve optimal sup-norm rates of convergence. New results are also derived in the Gaussian white noise model as a further illustration of the presented techniques."
in_NB  bayesian_consistency  statistics  density_estimation
april 2013 by cshalizi
Relative Density-Ratio Estimation for Robust Distribution Comparison
"Divergence estimators based on direct approximation of density ratios without going through separate approximation of numerator and denominator densities have been successfully applied to machine learning tasks that involve distribution comparison such as outlier detection, transfer learning, and two-sample homogeneity test. However, since density-ratio functions often possess high fluctuation, divergence estimation is a challenging task in practice. In this letter, we use relative divergences for distribution comparison, which involves approximation of relative density ratios. Since relative density ratios are always smoother than corresponding ordinary density ratios, our proposed method is favorable in terms of nonparametric convergence speed. Furthermore, we show that the proposed divergence estimator has asymptotic variance independent of the model complexity under a parametric setup, implying that the proposed estimator hardly overfits even with complex models. Through experiments, we demonstrate the usefulness of the proposedapproach."

--- This is _not_ the relative density between p and q in the Handcock-Morris sense, just the ratio between p and ap+(1-a)q, for adjustable a. (This is to keep the density ratio from going to infinite anywhere.) The whole thing seems like a bit of a hack...
in_NB  density_estimation  statistics  two-sample_tests  goodness-of-fit  have_read
april 2013 by cshalizi
Raginsky , Silva , Lazebnik , Willett : A recursive procedure for density estimation on the binary hypercube
"This paper describes a recursive estimation procedure for multivariate binary densities (probability distributions of vectors of Bernoulli random variables) using orthogonal expansions. For d covariates, there are 2d basis coefficients to estimate, which renders conventional approaches computationally prohibitive when d is large. However, for a wide class of densities that satisfy a certain sparsity condition, our estimator runs in probabilistic polynomial time and adapts to the unknown sparsity of the underlying density in two key ways: (1) it attains near-minimax mean-squared error for moderate sample sizes, and (2) the computational complexity is lower for sparser densities. Our method also allows for flexible control of the trade-off between mean-squared error and computational complexity."
to:NB  heard_the_talk  density_estimation  statistics  nonparametrics  raginsky.maxim
march 2013 by cshalizi
Conditional transformation models - Hothorn - 2013 - Journal of the Royal Statistical Society: Series B (Statistical Methodology) - Wiley Online Library
"The ultimate goal of regression analysis is to obtain information about the conditional distribution of a response given a set of explanatory variables. This goal is, however, seldom achieved because most established regression models estimate only the conditional mean as a function of the explanatory variables and assume that higher moments are not affected by the regressors. The underlying reason for such a restriction is the assumption of additivity of signal and noise. We propose to relax this common assumption in the framework of transformation models. The novel class of semiparametric regression models proposed herein allows transformation functions to depend on explanatory variables. These transformation functions are estimated by regularized optimization of scoring rules for probabilistic forecasts, e.g. the continuous ranked probability score. The corresponding estimated conditional distribution functions are consistent. Conditional transformation models are potentially useful for describing possible heteroscedasticity, comparing spatially varying distributions, identifying extreme events, deriving prediction intervals and selecting variables beyond mean regression effects. An empirical investigation based on a heteroscedastic varying-coefficient simulation model demonstrates that semiparametric estimation of conditional distribution functions can be more beneficial than kernel-based non-parametric approaches or parametric generalized additive models for location, scale and shape."
march 2013 by cshalizi
[1303.1435] Nonparametric functionals as generalized functions
"The paper considers probability distribution, density, conditional distribution and density and conditional moments as well as their kernel estimators in spaces of generalized functions. This approach does not require restrictions on classes of distributions common in nonparametric estimation. Density in usual function spaces is not well-posed; this paper establishes existence and well-posedness of the generalized density function. It also demonstrates root-n convergence of the kernel density estimator in the space of generalized functions. It is shown that the usual kernel estimator of the conditional distribution converges at a parametric rate as a random process in the space of generalized functions to a limit Gaussian process regardless of pointwise existence of the conditional distribution. Conditional moments such as conditional mean are also be characterized via generalized functions. Convergence of the kernel estimators to the limit Gaussian process is shown to hold as long as the appropriate moments exist."
to:NB  density_estimation  statistics  nonparametrics  have_read
march 2013 by cshalizi
[1302.6422] Nonparametric Estimation of Phylogenetic Tree Distributions
"While the majority of gene histories found in a clade of organisms are expected to be generated by a common process (e.g. the coalescent process), it is well-known that numerous other coexisting processes (e.g. horizontal gene transfers, gene duplication and subsequent neofunctionalization) will cause some genes to exhibit a history quite distinct from the histories of the majority of genes. Such "outlying" gene trees are considered to be biologically interesting and identifying these genes has become an important problem in phylogenetics. In this paper we propose and implement a nonparametric method of estimating distributions of phylogenetic trees, with the goal of identifying trees which are significantly different from the rest of the trees in the sample.
"Our approach mimics the common statistical technique of kernel density estimation, using tree distances to define kernels. In contrast to parametric models, such as coalescent, nonparametric approaches avoid the problem of model mis-specification, which leads to potentially unreliable results. Our method demonstrated superior accuracy in outlier detection on simulated data, when compared to a previously published method. We also applied our method to a dataset of Apicomplexa genes, identifying a set of putative outliers. Our method for estimating tree distributions is implemented as the R package, kdetrees, and is available for download from CRAN."
to:NB  kernel_methods  density_estimation  phylogenetics  statistics  re:network_differences  to_teach:undergrad-ADA
march 2013 by cshalizi
Comparing distributions by using dependent normalized random-measure mixtures - Griffin - 2013 - Journal of the Royal Statistical Society: Series B (Statistical Methodology) - Wiley Online Library
"A methodology for the simultaneous Bayesian non-parametric modelling of several distributions is developed. Our approach uses normalized random measures with independent increments and builds dependence through the superposition of shared processes. The properties of the prior are described and the modelling possibilities of this framework are explored in detail. Efficient slice sampling methods are developed for inference. Various posterior summaries are introduced which allow better understanding of the differences between distributions. The methods are illustrated on simulated data and examples from survival analysis and stochastic frontier analysis."
to:NB  density_estimation  comparing_distributions  statistics
march 2013 by cshalizi
[1302.5125] High-Dimensional Probability Estimation with Deep Density Models
"One of the fundamental problems in machine learning is the estimation of a probability distribution from data. Many techniques have been proposed to study the structure of data, most often building around the assumption that observations lie on a lower-dimensional manifold of high probability. It has been more difficult, however, to exploit this insight to build explicit, tractable density models for high-dimensional data. In this paper, we introduce the deep density model (DDM), a new approach to density estimation. We exploit insights from deep learning to construct a bijective map to a representation space, under which the transformation of the distribution of the data is approximately factorized and has identical and known marginal densities. The simplicity of the latent distribution under the model allows us to feasibly explore it, and the invertibility of the map to characterize contraction of measure across it. This enables us to compute normalized densities for out-of-sample data. This combination of tractability and flexibility allows us to tackle a variety of probabilistic tasks on high-dimensional datasets, including: rapid computation of normalized densities at test-time without evaluating a partition function; generation of samples without MCMC; and characterization of the joint entropy of the data."
to:NB  density_estimation  statistics  machine_learning  inference_to_latent_objects
february 2013 by cshalizi
[1102.2450] Concentration Inequalities and Confidence Bands for Needlet Density Estimators on Compact Homogeneous Manifolds
"Let $X_1,...,X_n$ be a random sample from some unknown probability density $f$ defined on a compact homogeneous manifold $mathbf M$ of dimension $d ge 1$. Consider a 'needlet frame' ${phi_{j eta}}$ describing a localised projection onto the space of eigenfunctions of the Laplace operator on $mathbf M$ with corresponding eigenvalues less than $2^{2j}$, as constructed in cite{GP10}. We prove non-asymptotic concentration inequalities for the uniform deviations of the linear needlet density estimator $f_n(j)$ obtained from an empirical estimate of the needlet projection $sum_eta phi_{j eta} int f phi_{j eta}$ of $f$. We apply these results to construct risk-adaptive estimators and nonasymptotic confidence bands for the unknown density $f$. The confidence bands are adaptive over classes of differentiable and H"{older}-continuous functions on $mathbf M$ that attain their H"{o}lder exponents."
to:NB  density_estimation  statistics  nickl.richard  statistics_on_manifolds
february 2013 by cshalizi
IEEE Xplore - Estimation of a Density Using Real and Artificial Data
"Let $X, X_{1}, X_{2}, dots$ be independent and identically distributed ${BBR }^{d}$-valued random variables and let $m: {BBR }^{d} rightarrow {BBR }$ be a measurable function such that a density $f$ of $Y=m(X)$ exists. Given a sample of the distribution of $(X,Y)$ and additional independent observations of $X$ , we are interested in estimating $f$. We apply a regression estimate to the sample of $(X,Y)$ and use this estimate to generate additional artificial observations of $Y$ . Using these artificial observations together with the real observations of $Y$, we construct a density estimate of $f$ by using a convex combination of two kernel density estimates. It is shown that if the bandwidths satisfy the usual conditions and if in addition the supremum norm error of the regression estimate converges almost surely faster toward zero than the bandwidth of the kernel density estimate applied to the artificial data, then the convex combination of the two density estimates is $L_{1}$-consistent. The performance of the estimate for finite sample size is illustrated by simulated data, and the usefulness of the proced- re is demonstrated by applying it to a density estimation problem in a simulation model."
to:NB  regression  density_estimation  statistics  kernel_estimators  simulation  devroye.luc  semi-supervised_learning
february 2013 by cshalizi
[1010.4202] M"{o}bius deconvolution on the hyperbolic plane with application to impedance density estimation
"In this paper we consider a novel statistical inverse problem on the Poincar'{e}, or Lobachevsky, upper (complex) half plane. Here the Riemannian structure is hyperbolic and a transitive group action comes from the space of $2times2$ real matrices of determinant one via M"{o}bius transformations. Our approach is based on a deconvolution technique which relies on the Helgason--Fourier calculus adapted to this hyperbolic space. This gives a minimax nonparametric density estimator of a hyperbolic density that is corrupted by a random M"{o}bius transform. A motivation for this work comes from the reconstruction of impedances of capacitors where the above scenario on the Poincar'{e} plane exactly describes the physical system that is of statistical interest."
have_read  density_estimation  deconvolution  statistics  hyperbolic_geometry  re:smoothing_adjacency_matrices  fourier_analysis  via:dena  in_NB  re:hyperbolic_networks  re:network_differences
february 2013 by cshalizi
Smoothing Spline ANOVA Models
"Nonparametric function estimation with stochastic data, otherwise known as smoothing, has been studied by several generations of statisticians. Assisted by the ample computing power in today's servers, desktops, and laptops, smoothing methods have been finding their ways into everyday data analysis by practitioners. While scores of methods have proved successful for univariate smoothing, ones practical in multivariate settings number far less. Smoothing spline ANOVA models are a versatile family of smoothing methods derived through roughness penalties, that are suitable for both univariate and multivariate problems.
"In this book, the author presents a treatise on penalty smoothing under a unified framework. Methods are developed for (i) regression with Gaussian and non-Gaussian responses as well as with censored lifetime data; (ii) density and conditional density estimation under a variety of sampling schemes; and (iii) hazard rate estimation with censored life time data and covariates. The unifying themes are the general penalized likelihood method and the construction of multivariate models with built-in ANOVA decompositions. Extensive discussions are devoted to model construction, smoothing parameter selection, computation, and asymptotic convergence."
in_NB  books:noted  smoothing  splines  regression  density_estimation  statistics  nonparametrics  additive_models
february 2013 by cshalizi
Samworth , Yuan : Independent component analysis via nonparametric maximum likelihood estimation
"Independent Component Analysis (ICA) models are very popular semiparametric models in which we observe independent copies of a random vector X=AS, where A is a non-singular matrix and S has independent components. We propose a new way of estimating the unmixing matrix W=A−1 and the marginal distributions of the components of S using nonparametric maximum likelihood. Specifically, we study the projection of the empirical distribution onto the subset of ICA distributions having log-concave marginals. We show that, from the point of view of estimating the unmixing matrix, it makes no difference whether or not the log-concavity is correctly specified. The approach is further justified by both theoretical results and a simulation study."

- By "heard the talk", I mean Richard was kind enough to explain this to me.
to:NB  independent_component_analysis  nonparametrics  statistics  density_estimation  samworth.richard
february 2013 by cshalizi
[1301.1898] Concentration rate and consistency of the posterior under monotonicity constraints
"In this paper, we consider the well known problem of estimating a density function under qualitative assumptions. More precisely, we estimate monotone non increasing densities in a Bayesian setting and derive convergence rate for the posterior distribution for a Dirichlet process and finite mixture prior. We prove that the posterior distribution based on both priors concentrates at the rate $(n/log(n))^{-1/3}$, which is the minimax rate of estimation up to a $log(n)$ factor. We also study the behaviour of the posterior for the point-wise loss at any fixed point of the support the density and for the sup norm. We prove that the posterior is consistent and achieve the optimal concentration rate for both losses."
in_NB  statistics  density_estimation  bayesian_consistency
january 2013 by cshalizi
[1301.0554] Tree-dependent Component Analysis
"We present a generalization of independent component analysis (ICA), where instead of looking for a linear transform that makes the data components independent, we look for a transform that makes the data components well fit by a tree-structured graphical model. Treating the problem as a semiparametric statistical problem, we show that the optimal transform is found by minimizing a contrast function based on mutual information, a function that directly extends the contrast function used for classical ICA. We provide two approximations of this contrast function, one using kernel density estimation, and another using kernel generalized variance. This tree-dependent component analysis framework leads naturally to an efficient general multivariate density estimation technique where only bivariate density estimation needs to be performed."

--- How did I miss this?!?
to:NB  dimension_reduction  principal_components  graphical_models  jordan.michael_i.  bach.francis_r.  density_estimation  chow-liu_trees  machine_learning  statistics
january 2013 by cshalizi
Kernel spatial density estimation in infinite dimension space - Springer
"In this paper, we propose a nonparametric method to estimate the spatial density of a functional stationary random field. This latter is with values in some infinite dimensional normed space and admitted a density with respect to some reference measure. We study both the weak and strong consistencies of the considered estimator and also give some rates of convergence. Special attention is paid to the links between the probabilities of small balls and the rates of convergence of the estimator. The practical use and the behavior of the estimator are illustrated through some simulations and a real data application."
to:NB  density_estimation  statistics  random_fields
january 2013 by cshalizi
[1212.5156] Nonparametric Ridge Estimation
"We study the problem of estimating the ridges of a density function. Ridge estimation is an extension of mode finding and is useful for understanding the structure of a density. It can also be used to find hidden structure in point cloud data. We show that, under mild regularity conditions, the ridges of the kernel density estimator consistently estimate the ridges of the true density. When the data are noisy measurements of a manifold, we show that the ridges are close and topologically similar to the hidden manifold. To find the estimated ridges in practice, we adapt the modified mean-shift algorithm proposed by Ozertem and Erdogmus (2011). Some numerical experiments verify that the algorithm is fast and accurate."
to:NB  density_estimation  nonparametrics  kith_and_kin  statistics
december 2012 by cshalizi
[1212.2812] A Review of Kernel Density Estimation with Applications to Econometrics
"Nonparametric density estimation is of great importance when econometricians want to model the probabilistic or stochastic structure of a data set. This comprehensive review summarizes the most important theoretical aspects of kernel density estimation and provides an extensive description of classical and modern data analytic methods to compute the smoothing parameter. Throughout the text, several references can be found to the most up-to-date and cut point research approaches in this area, while econometric data sets are analyzed as examples. Lastly, we present SIZer, a new approach introduced by Chaudhuri and Marron (2000), whose objective is to analyze the visible features representing important underlying structures for different bandwidths."
in_NB  kernel_estimators  density_estimation  statistics
december 2012 by cshalizi
Favaro , Lijoi , Prünster : Asymptotics for a Bayesian nonparametric estimator of species variety
"In Bayesian nonparametric inference, random discrete probability measures are commonly used as priors within hierarchical mixture models for density estimation and for inference on the clustering of the data. Recently, it has been shown that they can also be exploited in species sampling problems: indeed they are natural tools for modeling the random proportions of species within a population thus allowing for inference on various quantities of statistical interest. For applications that involve large samples, the exact evaluation of the corresponding estimators becomes impracticable and, therefore, asymptotic approximations are sought. In the present paper, we study the limiting behaviour of the number of new species to be observed from further sampling, conditional on observed data, assuming the observations are exchangeable and directed by a normalized generalized gamma process prior. Such an asymptotic study highlights a connection between the normalized generalized gamma process and the two-parameter Poisson–Dirichlet process that was previously known only in the unconditional case."
in_NB  statistics  nonparametrics  density_estimation
november 2012 by cshalizi
[1210.5830] $V$-fold cross-validation and $V$-fold penalization in least-squares density estimation
"This paper studies $V$-fold cross-validation for model selection in least-squares density estimation. The goal is to provide theoretical grounds for choosing $V$ in order to minimize the least-squares risk of the selected estimator. % We first prove a non asymptotic oracle inequality for $V$-fold cross-validation and its bias-corrected version ($V$-fold penalization), with an upper bound decreasing as a function of $V$. In particular, this result implies $V$-fold penalization is asymptotically optimal. % Then, we compute the variance of $V$-fold cross-validation and related criteria, as well as the variance of key quantities for model selection performances. We show these variances depend on $V$ like $1+1/(V-1)$ (at least in some particular cases), suggesting the performances increase much from V=2 to V=5 or 10, and then is almost constant. % Overall, this explains the common advice to take $V=10$---at least in our setting and when the computational power is limited---, as confirmed by some simulation experiments."
in_NB  cross-validation  statistics  density_estimation  to_read  arlot.sylvain
october 2012 by cshalizi
[1210.1715] On adaptive minimax density estimation on $R^d$
"We address the problem of adaptive minimax density estimation on $bR^d$ with $bL_p$--loss on the anisotropic Nikol'skii classes. We fully characterize behavior of the minimax risk for different relationships between regularity parameters and norm indexes in definitions of the functional class and of the risk. In particular, we show that there are four different regimes with respect to the behavior of the minimax risk. We develop a single estimator which is (nearly) optimal in orderover the complete scale of the anisotropic Nikol'skii classes. Our estimation procedure is based on a data-driven selection of an estimator from a fixed family of kernel estimators."
to:NB  statistics  density_estimation
october 2012 by cshalizi
Bull : Honest adaptive confidence bands and self-similar functions
"Confidence bands are confidence sets for an unknown function f, containing all functions within some sup-norm distance of an estimator. In the density estimation, regression, and white noise models, we consider the problem of constructing adaptive confidence bands, whose width contracts at an optimal rate over a range of Hölder classes.
"While adaptive estimators exist, in general adaptive confidence bands do not, and to proceed we must place further conditions on f. We discuss previous approaches to this issue, and show it is necessary to restrict f to fundamentally smaller classes of functions.
"We then consider the self-similar functions, whose Hölder norm is similar at large and small scales. We show that such functions may be considered typical functions of a given Hölder class, and that the assumption of self-similarity is both necessary and sufficient for the construction of adaptive bands."
in_NB  confidence_sets  regression  density_estimation  nonparametrics  statistics
august 2012 by cshalizi
Fluctuation Bounds for Chaos Plus Noise in Dynamical Systems - Journal of Statistical Physics, Volume 148, Number 3 - SpringerLink
"We are interested in time series of the form y n =x n +ξ n where {x n } is generated by a chaotic dynamical system and where ξ n models observational noise. Using concentration inequalities, we derive fluctuation bounds for the auto-covariance function, the empirical measure, the kernel density estimator and the correlation dimension evaluated along y 0,…,y n , for all n. The chaotic systems we consider include for instance the Hénon attractor for Benedicks-Carleson parameters."
in_NB  to_read  dynamical_systems  time_series  statistics  concentration_of_measure  density_estimation
august 2012 by cshalizi
Taylor & Francis Online :: Semiparametric Hidden Markov Models - Journal of Computational and Graphical Statistics - Volume 21, Issue 3
"Hidden Markov models (HMMs) are widely used for dependent data modeling. Classically, the state-dependent distributions, that is, the distribution of the observation given the hidden state, belong to some parametric family. This article proposes a semiparametric HMM for continuous observations with at least one state-dependent distribution modeled with modern nonparametric techniques, for example, nonparametric maximum likelihood estimation under shape constraints like log-concavity. The resulting model is much more flexible and avoids a model misspecification bias. Concentrating on the special case of two states, the article discusses identifiability of the model. Further, an estimation procedure for semiparametric HMMs is proposed based on the expectation–maximization algorithm, which gives also rise to model validation techniques, as demonstrated in a simulation study and an empirical illustration."
to:NB  markov_models  density_estimation  nonparametrics  time_series  statistics
august 2012 by cshalizi
[1206.5036] Estimating Densities with Non-Parametric Exponential Families
"We propose a novel approach for density estimation with exponential families for the case when the true density may not fall within the chosen family. Our approach augments the sufficient statistics with features designed to accumulate probability mass in the neighborhood of the observed points, resulting in a non-parametric model similar to kernel density estimators. We show that under mild conditions, the resulting model uses only the sufficient statistics if the density is within the chosen exponential family, and asymptotically, it approximates densities outside of the chosen exponential family. Using the proposed approach, we modify the exponential random graph model, commonly used for modeling small-size graph distributions, to address the well-known issue of model degeneracy."
in_NB  density_estimation  exponential_families  exponential_family_random_graphs  kirshner.sergey  sufficiency  statistics  nonparametrics  have_read
june 2012 by cshalizi
[1206.5278] Fast Nonparametric Conditional Density Estimation
"Conditional density estimation generalizes regression by modeling a full density f(yjx) rather than only the expected value E(yjx). This is important for many tasks, including handling multi-modality and generating prediction intervals. Though fundamental and widely applicable, nonparametric conditional density estimators have received relatively little attention from statisticians and little or none from the machine learning community. None of that work has been applied to greater than bivariate data, presumably due to the computational difficulty of data-driven bandwidth selection. We describe the double kernel conditional density estimator and derive fast dual-tree-based algorithms for bandwidth selection using a maximum likelihood criterion. These techniques give speedups of up to 3.8 million in our experiments, and enable the first applications to previously intractable large multivariate datasets, including a redshift prediction problem from the Sloan Digital Sky Survey."

The bit about nobody working on more than bivariate problems is wrong, but the algorithm sounds promising.
june 2012 by cshalizi
[1205.5314] A general spline representation for nonparametric and semiparametric density estimates using diffeomorphisms
"A theorem of McCann shows that for any two absolutely continuous probability measures on R^d there exists a monotone transformation sending one probability measure to the other. A consequence of this theorem, relevant to statistics, is that density estimation can be recast in terms of transformations. In particular, one can fix any absolutely continuous probability measure, call it P, and then reparameterize the whole class of absolutely continuous probability measures as monotone transformations from P. In this paper we utilize this reparameterization of densities, as monotone transformations from some P, to construct semiparametric and nonparametric density estimates. We focus our attention on classes of transformations, developed in the image processing and computational anatomy literature, which are smooth, invertible and which have attractive computational properties. The techniques developed for this class of transformations allow us to show that a penalized maximum likelihood estimate (PMLE) of a smooth transformation from P exists and has a finite dimensional characterization, similar to those results found in the spline literature. These results are derived utilizing an Euler-Lagrange characterization of the PMLE which also establishes a surprising connection to a generalization of Stein's lemma for characterizing the normal distribution."
in_NB  density_estimation  statistics  splines
june 2012 by cshalizi
Concentration inequalities and confidence bands for needlet density estimators on compact homogeneous manifolds
"Let X 1, . . . , X n be a random sample from some unknown probability density f defined on a compact homogeneous manifold M of dimension d ≥ 1. Consider a ‘needlet frame’ j describing a localised projection onto the space of eigenfunctions of the Laplace operator on M with corresponding eigenvalues less than 2^2j , as constructed in Geller and Pesenson (J Geom Anal 2011). We prove non-asymptotic concentration inequalities for the uniform deviations of the linear needlet density estimator f n (j) obtained from an empirical estimate of the needlet projection jfj  of f. We apply these results to construct risk-adaptive estimators and nonasymptotic confidence bands for the unknown density f. The confidence bands are adaptive over classes of differentiable and Hölder-continuous functions on M that attain their Hölder exponents."
to:NB  density_estimation  statistics
june 2012 by cshalizi
per page:    204080120160

Copy this bookmark:

description:

tags: