cshalizi + kernel_estimators   63

[1906.00232] Kernel Instrumental Variable Regression
"Instrumental variable regression is a strategy for learning causal relationships in observational data. If measurements of input X and output Y are confounded, the causal relationship can nonetheless be identified if an instrumental variable Z is available that influences X directly, but is conditionally independent of Y given X and the unmeasured confounder. The classic two-stage least squares algorithm (2SLS) simplifies the estimation problem by modeling all relationships as linear functions. We propose kernel instrumental variable regression (KIV), a nonparametric generalization of 2SLS, modeling relations among X, Y, and Z as nonlinear functions in reproducing kernel Hilbert spaces (RKHSs). We prove the consistency of KIV under mild assumptions, and derive conditions under which the convergence rate achieves the minimax optimal rate for unconfounded, one-stage RKHS regression. In doing so, we obtain an efficient ratio between training sample sizes used in the algorithm's first and second stages. In experiments, KIV outperforms state of the art alternatives for nonparametric instrumental variable regression. Of independent interest, we provide a more general theory of conditional mean embedding regression in which the RKHS has infinite dimension."
5 weeks ago by cshalizi
[1708.05254] Adaptive Clustering Using Kernel Density Estimators
"We derive and analyze a generic, recursive algorithm for estimating all splits in a finite cluster tree as well as the corresponding clusters. We further investigate statistical properties of this generic clustering algorithm when it receives level set estimates from a kernel density estimator. In particular, we derive finite sample guarantees, consistency, rates of convergence, and an adaptive data-driven strategy for choosing the kernel bandwidth. For these results we do not need continuity assumptions on the density such as Hölder continuity, but only require intuitive geometric assumptions of non-parametric nature."
to:NB  density_estimation  kernel_estimators  clustering  statistics
9 weeks ago by cshalizi
[1712.01572] Eigendecompositions of Transfer Operators in Reproducing Kernel Hilbert Spaces
"Transfer operators such as the Perron--Frobenius or Koopman operator play an important role in the global analysis of complex dynamical systems. The eigenfunctions of these operators can be used to detect metastable sets, to project the dynamics onto the dominant slow processes, or to separate superimposed signals. We extend transfer operator theory to reproducing kernel Hilbert spaces and show that these operators are related to Hilbert space representations of conditional distributions, known as conditional mean embeddings in the machine learning community. Moreover, numerical methods to compute empirical estimates of these embeddings are akin to data-driven methods for the approximation of transfer operators such as extended dynamic mode decomposition and its variants. One main benefit of the presented kernel-based approaches is that these methods can be applied to any domain where a similarity measure given by a kernel is available. We illustrate the results with the aid of guiding examples and highlight potential applications in molecular dynamics as well as video and text data analysis."
to:NB  kernel_estimators  dynamical_systems  statistical_inference_for_stochastic_processes
9 weeks ago by cshalizi
[1908.04748] Optimal Estimation of Generalized Average Treatment Effects using Kernel Optimal Matching
"In causal inference, a variety of causal effect estimands have been studied, including the sample, uncensored, target, conditional, optimal subpopulation, and optimal weighted average treatment effects. Ad-hoc methods have been developed for each estimand based on inverse probability weighting (IPW) and on outcome regression modeling, but these may be sensitive to model misspecification, practical violations of positivity, or both. The contribution of this paper is twofold. First, we formulate the generalized average treatment effect (GATE) to unify these causal estimands as well as their IPW estimates. Second, we develop a method based on Kernel Optimal Matching (KOM) to optimally estimate GATE and to find the GATE most easily estimable by KOM, which we term the Kernel Optimal Weighted Average Treatment Effect. KOM provides uniform control on the conditional mean squared error of a weighted estimator over a class of models while simultaneously controlling for precision. We study its theoretical properties and evaluate its comparative performance in a simulation study. We illustrate the use of KOM for GATE estimation in two case studies: comparing spine surgical interventions and studying the effect of peer support on people living with HIV."
to:NB  causal_inference  estimation  matching  statistics  kernel_estimators
9 weeks ago by cshalizi
[1908.02399] Estimation of Conditional Average Treatment Effects with High-Dimensional Data
"Given the unconfoundedness assumption, we propose new nonparametric estimators for the reduced dimensional conditional average treatment effect (CATE) function. In the first stage, the nuisance functions necessary for identifying CATE are estimated by machine learning methods, allowing the number of covariates to be comparable to or larger than the sample size. This is a key feature since identification is generally more credible if the full vector of conditioning variables, including possible transformations, is high-dimensional. The second stage consists of a low-dimensional kernel regression, reducing CATE to a function of the covariate(s) of interest. We consider two variants of the estimator depending on whether the nuisance functions are estimated over the full sample or over a hold-out sample. Building on Belloni at al. (2017) and Chernozhukov et al. (2018), we derive functional limit theory for the estimators and provide an easy-to-implement procedure for uniform inference based on the multiplier bootstrap."
to:NB  causal_inference  regression  statistics  high-dimensional_statistics  nonparametrics  kernel_estimators
10 weeks ago by cshalizi
An oracle property of the Nadaraya–Watson kernel estimator for high‐dimensional nonparametric regression - Conn - - Scandinavian Journal of Statistics - Wiley Online Library
"The Nadaraya–Watson estimator is among the most studied nonparametric regression methods. A classical result is that its convergence rate depends on the number of covariates and deteriorates quickly as the dimension grows. This underscores the “curse of dimensionality” and has limited its use in high‐dimensional settings. In this paper, however, we show that the Nadaraya–Watson estimator has an oracle property such that when the true regression function is single‐ or multi‐index, it discovers the low‐rank dependence structure between the response and the covariates, mitigating the curse of dimensionality. Specifically, we prove that, using K‐fold cross‐validation and a positive‐semidefinite bandwidth matrix, the Nadaraya–Watson estimator has a convergence rate that depends on the number of indices rather than on the number of covariates. This result follows by allowing the bandwidths to diverge to infinity rather than restricting them all to converge to zero at certain rates, as in previous theoretical studies."
to:NB  regression  kernel_estimators  smoothing  high-dimensional_statistics
june 2019 by cshalizi
Kernel Smoothing | Wiley Online Books
"Comprehensive theoretical overview of kernel smoothing methods with motivating examples
"Kernel smoothing is a flexible nonparametric curve estimation method that is applicable when parametric descriptions of the data are not sufficiently adequate. This book explores theory and methods of kernel smoothing in a variety of contexts, considering independent and correlated data e.g. with short-memory and long-memory correlations, as well as non-Gaussian data that are transformations of latent Gaussian processes. These types of data occur in many fields of research, e.g. the natural and the environmental sciences, and others. Nonparametric density estimation, nonparametric and semiparametric regression, trend and surface estimation in particular for time series and spatial data and other topics such as rapid change points, robustness etc. are introduced alongside a study of their theoretical properties and optimality issues, such as consistency and bandwidth selection.
"Addressing a variety of topics, Kernel Smoothing: Principles, Methods and Applications offers a user-friendly presentation of the mathematical content so that the reader can directly implement the formulas using any appropriate software. The overall aim of the book is to describe the methods and their theoretical backgrounds, while maintaining an analytically simple approach and including motivating examples—making it extremely useful in many sciences such as geophysics, climate research, forestry, ecology, and other natural and life sciences, as well as in finance, sociology, and engineering."
january 2019 by cshalizi
Prediction Interval for Autoregressive Time Series via Oracally Efficient Estimation of Multi‐Step‐Ahead Innovation Distribution Function - Kong - 2018 - Journal of Time Series Analysis - Wiley Online Library
"A kernel distribution estimator (KDE) is proposed for multi‐step‐ahead prediction error distribution of autoregressive time series, based on prediction residuals. Under general assumptions, the KDE is proved to be oracally efficient as the infeasible KDE and the empirical cumulative distribution function (cdf) based on unobserved prediction errors. Quantile estimator is obtained from the oracally efficient KDE, and prediction interval for multi‐step‐ahead future observation is constructed using the estimated quantiles and shown to achieve asymptotically the nominal confidence levels. Simulation examples corroborate the asymptotic theory."
in_NB  prediction  time_series  statistics  kernel_estimators
october 2018 by cshalizi
[1505.00869] On the Feasibility of Distributed Kernel Regression for Big Data
"In modern scientific research, massive datasets with huge numbers of observations are frequently encountered. To facilitate the computational process, a divide-and-conquer scheme is often used for the analysis of big data. In such a strategy, a full dataset is first split into several manageable segments; the final output is then averaged from the individual outputs of the segments. Despite its popularity in practice, it remains largely unknown that whether such a distributive strategy provides valid theoretical inferences to the original data. In this paper, we address this fundamental issue for the distributed kernel regression (DKR), where the algorithmic feasibility is measured by the generalization performance of the resulting estimator. To justify DKR, a uniform convergence rate is needed for bounding the generalization error over the individual outputs, which brings new and challenging issues in the big data setup. Under mild conditions, we show that, with a proper number of segments, DKR leads to an estimator that is generalization consistent to the unknown regression function. The obtained results justify the method of DKR and shed light on the feasibility of using other distributed algorithms for processing big data. The promising preference of the method is supported by both simulation and real data examples."
to:NB  computational_statistics  kernel_estimators  regression  statistics
july 2015 by cshalizi
Silverman : Spline Smoothing: The Equivalent Variable Kernel Method
"The spline smoothing approach to nonparametric regression and curve estimation is considered. It is shown that, in a certain sense, spline smoothing corresponds approximately to smoothing by a kernel method with bandwidth depending on the local density of design points. Some exact calculations demonstrate that the approximation is extremely close in practice. Consideration of kernel smoothing methods demonstrates that the way in which the effective local bandwidth behaves in spline smoothing has desirable properties. Finally, the main result of the paper is applied to the related topic of penalized maximum likelihood probability density estimates; a heuristic discussion shows that these estimates should adapt well in the tails of the distribution."
have_read  splines  kernel_estimators  nonparametrics  regression  density_estimation  statistics  silverman.bernard  in_NB
february 2015 by cshalizi
Crisan , Míguez : Particle-kernel estimation of the filter density in state-space models
"Sequential Monte Carlo (SMC) methods, also known as particle filters, are simulation-based recursive algorithms for the approximation of the a posteriori probability measures generated by state-space dynamical models. At any given time t, a SMC method produces a set of samples over the state space of the system of interest (often termed “particles”) that is used to build a discrete and random approximation of the posterior probability distribution of the state variables, conditional on a sequence of available observations. One potential application of the methodology is the estimation of the densities associated to the sequence of a posteriori distributions. While practitioners have rather freely applied such density approximations in the past, the issue has received less attention from a theoretical perspective. In this paper, we address the problem of constructing kernel-based estimates of the posterior probability density function and its derivatives, and obtain asymptotic convergence results for the estimation errors. In particular, we find convergence rates for the approximation errors that hold uniformly on the state space and guarantee that the error vanishes almost surely as the number of particles in the filter grows. Based on this uniform convergence result, we first show how to build continuous measures that converge almost surely (with known rate) toward the posterior measure and then address a few applications. The latter include maximum a posteriori estimation of the system state using the approximate derivatives of the posterior density and the approximation of functionals of it, for example, Shannon’s entropy."
in_NB  particle_filters  kernel_estimators  density_estimation  filtering  state_estimation  state-space_models  statistics  computational_statistics
january 2015 by cshalizi
[1001.0591] Comparing Distributions and Shapes using the Kernel Distance
"Starting with a similarity function between objects, it is possible to define a distance metric on pairs of objects, and more generally on probability distributions over them. These distance metrics have a deep basis in functional analysis, measure theory and geometric measure theory, and have a rich structure that includes an isometric embedding into a (possibly infinite dimensional) Hilbert space. They have recently been applied to numerous problems in machine learning and shape analysis.
"In this paper, we provide the first algorithmic analysis of these distance metrics. Our main contributions are as follows: (i) We present fast approximation algorithms for computing the kernel distance between two point sets P and Q that runs in near-linear time in the size of (P cup Q) (note that an explicit calculation would take quadratic time). (ii) We present polynomial-time algorithms for approximately minimizing the kernel distance under rigid transformation; they run in time O(n + poly(1/epsilon, log n)). (iii) We provide several general techniques for reducing complex objects to convenient sparse representations (specifically to point sets or sets of points sets) which approximately preserve the kernel distance. In particular, this allows us to reduce problems of computing the kernel distance between various types of objects such as curves, surfaces, and distributions to computing the kernel distance between point sets. These take advantage of the reproducing kernel Hilbert space and a new relation linking binary range spaces to continuous range spaces with bounded fat-shattering dimension."
to:NB  have_read  kernel_estimators  two-sample_tests  statistics  probability  re:network_differences
october 2014 by cshalizi
[1307.7760] Geometric Inference on Kernel Density Estimates
"We show that geometric inference of a point cloud can be calculated by examining its kernel density estimate. This intermediate step results in the inference being statically robust to noise and allows for large computational gains and scalability (e.g. on 100 million points). In particular, by first creating a coreset for the kernel density estimate, the data representing the final geometric and topological structure has size depending only on the error tolerance, not on the size of the original point set or the complexity of the structure. To achieve this result, we study how to replace distance to a measure, as studied by Chazal, Cohen-Steiner, and Merigot, with the kernel distance. The kernel distance is monotonic with the kernel density estimate (sublevel sets of the kernel distance are superlevel sets of the kernel density estimate), thus allowing us to examine the kernel density estimate in this manner. We show it has several computational and stability advantages. Moreover, we provide an algorithm to estimate its topology using weighted Vietoris-Rips complexes."
to:NB  geometry  kernel_estimators  density_estimation  statistics  computational_statistics  have_read
october 2014 by cshalizi
Quality and efficiency for kernel density estimates in large data
"Kernel density estimates are important for a broad variety of applications. Their construction has been well-studied, but existing techniques are expensive on massive datasets and/or only provide heuristic approximations without theoretical guarantees. We propose randomized and deterministic algorithms with quality guarantees which are orders of magnitude more efficient than previous algorithms. Our algorithms do not require knowledge of the kernel or its bandwidth parameter and are easily parallelizable. We demonstrate how to implement our ideas in a centralized setting and in MapReduce, although our algorithms are applicable to any large-scale data processing framework. Extensive experiments on large real datasets demonstrate the quality, efficiency, and scalability of our techniques."

--- Ungated version: http://www.cs.utah.edu/~lifeifei/papers/kernelsigmod13.pdf
october 2014 by cshalizi
[1406.2083] Kernel MMD, the Median Heuristic and Distance Correlation in High Dimensions
"This paper is about two related methods for two sample testing and independence testing which have emerged over the last decade: Maximum Mean Discrepancy (MMD) for the former problem and Distance Correlation (dCor) for the latter. Both these methods have been suggested for high-dimensional problems, and sometimes claimed to be unaffected by increasing dimensionality of the samples. We will show theoretically and practically that the power of both methods (for different reasons) does actually decrease polynomially with dimension. We also analyze the median heuristic, which is a method for choosing tuning parameters of translation invariant kernels. We show that different bandwidth choices could result in the MMD decaying polynomially or even exponentially in dimension."
to:NB  hypothesis_testing  two-sample_tests  kernel_estimators  dependence_measures  kith_and_kin  wasserman.larry  singh.aarti  ramdas.aaditya  high-dimensional_statistics
july 2014 by cshalizi
[1406.0118] Improved graph Laplacian via geometric self-consistency
"We address the problem of setting the kernel bandwidth used by Manifold Learning algorithms to construct the graph Laplacian. Exploiting the connection between manifold geometry, represented by the Riemannian metric, and the Laplace-Beltrami operator, we set the bandwidth by optimizing the Laplacian's ability to preserve the geometry of the data. Experiments show that this principled approach is effective and robust."
to:NB  manifold_learning  kernel_estimators  statistics
july 2014 by cshalizi
[1402.3466] Kernel density estimates in particle filter
"The paper deals with kernel estimates of densities of filtering distributions in the particle filter. The convergence of the estimates is investigated by means of Fourier analysis. It is shown that the estimates converge to theoretical filtering densities in the mean integrated squared error under a certain assumption on the Sobolev character of the filtering densities. A sufficient condition is presented for the persistence of this Sobolev character over time."
to:NB  density_estimation  kernel_estimators  particle_filters  filtering  statistics
march 2014 by cshalizi
Joint Estimation of Multiple Graphical Models from High Dimensional Time Series
"In this manuscript the problem of jointly estimating multiple graphical models in high dimensions is considered. It is assumed that the data are collected from n subjects, each of which consists of m non-independent observations. The graphical models of subjects vary, but are assumed to change smoothly corresponding to a measure of the closeness between subjects. A kernel based method for jointly estimating all graphical models is proposed. Theoretically, under a double asymptotic framework, where both (m,n) and the dimension d can increase, the explicit rate of convergence in parameter estimation is provided, thus characterizing the strength one can borrow across different individuals and impact of data dependence on parameter estimation. Empirically, experiments on both synthetic and real resting state functional magnetic resonance imaging (rs-fMRI) data illustrate the effectiveness of the proposed method."
to:NB  to_read  graphical_models  time_series  high-dimensional_statistics  kernel_estimators  liu.han  re:your_favorite_dsge_sucks  fmri
january 2014 by cshalizi
NON-PARAMETRIC ESTIMATION UNDER STRONG DEPENDENCE - Zhao - 2013 - Journal of Time Series Analysis - Wiley Online Library
"We study non-parametric regression function estimation for models with strong dependence. Compared with short-range dependent models, long-range dependent models often result in slower convergence rates. We propose a simple differencing-sequence based non-parametric estimator that achieves the same convergence rate as if the data were independent. Simulation studies show that the proposed method has good finite sample performance."

- The trick here is to only estimate the independence on an observed _and_ IID covariate, i.e., the model is Y(t) = m(X(t)) + g(t) + \epsilon_t, where X(t) is the IID covariate, g(t) is an (unknown) time-trend, and \epsilon_t is the long-range-dependent innovation sequence. Differencing, Y(t) - Y(t-1) = m(X(t)) - m(X(t-1)) + stuff which averages rapidly, so one can learn the m() function up to an over-all constant rapidly. Worth mentioning in the ADA notes, in the sense of "don't solve hard problems you don't have to", but not a fundamental advance.
december 2013 by cshalizi
[1105.2135] Confidence bands for Horvitz-Thompson estimators using sampled noisy functional data
"When collections of functional data are too large to be exhaustively observed, survey sampling techniques provide an effective way to estimate global quantities such as the population mean function. Assuming functional data are collected from a finite population according to a probabilistic sampling scheme, with the measurements being discrete in time and noisy, we propose to first smooth the sampled trajectories with local polynomials and then estimate the mean function with a Horvitz-Thompson estimator. Under mild conditions on the population size, observation times, regularity of the trajectories, sampling scheme, and smoothing bandwidth, we prove a Central Limit theorem in the space of continuous functions. We also establish the uniform consistency of a covariance function estimator and apply the former results to build confidence bands for the mean function. The bands attain nominal coverage and are obtained through Gaussian process simulations conditional on the estimated covariance function. To select the bandwidth, we propose a cross-validation method that accounts for the sampling weights. A simulation study assesses the performance of our approach and highlights the influence of the sampling scheme and bandwidth choice."
in_NB  functional_data_analysis  surveys  statistics  confidence_sets  kernel_estimators
december 2013 by cshalizi
Bott , Devroye , Kohler : Estimation of a distribution from data with small measurement errors
"In this paper we study the problem of estimation of a distribution from data that contain small measurement errors. The only assumption on these errors is that the average absolute measurement error converges to zero for sample size tending to infinity with probability one. In particular we do not assume that the measurement errors are independent with expectation zero. Throughout the paper we assume that the distribution, which has to be estimated, has a density with respect to the Lebesgue-Borel measure.
"We show that the empirical measure based on the data with measurement error leads to an uniform consistent estimate of the distribution function. Furthermore, we show that in general no estimate is consistent in the total variation sense for all distributions under the above assumptions. However, in case that the average measurement error converges to zero faster than a properly chosen sequence of bandwidths, the total variation error of the distribution estimate corresponding to a kernel density estimate converges to zero for all distributions. In case of a general additive error model we show that this result even holds if only the average measurement error converges to zero. The results are applied in the context of estimation of the density of residuals in a random design regression model, where the residual error is not independent from the predictor."
to:NB  density_estimation  error-in-variables  nonparametrics  statistics  kernel_estimators
october 2013 by cshalizi
[1306.3574] Early stopping and non-parametric regression: An optimal data-dependent stopping rule
"The strategy of early stopping is a regularization technique based on choosing a stopping time for an iterative algorithm. Focusing on non-parametric regression in a reproducing kernel Hilbert space, we analyze the early stopping strategy for a form of gradient-descent applied to the least-squares loss function. We propose a data-dependent stopping rule that does not involve hold-out or cross-validation data, and we prove upper bounds on the squared error of the resulting function estimate, measured in either the $L^2(P)$ and $L^2(P_n)$ norm. These upper bounds lead to minimax-optimal rates for various kernel classes, including Sobolev smoothness classes and other forms of reproducing kernel Hilbert spaces. We show through simulation that our stopping rule compares favorably to two other stopping rules, one based on hold-out data and the other based on Stein's unbiased risk estimate. We also establish a tight connection between our early stopping strategy and the solution path of a kernel ridge regression estimator."
in_NB  optimization  kernel_estimators  hilbert_space  nonparametrics  regression  minimax  yu.bin  wainwright.martin_j.  to_teach:statcomp  have_read
june 2013 by cshalizi
[1306.1868] Smoothing splines with varying smoothing parameter
"This paper considers the development of spatially adaptive smoothing splines for the estimation of a regression function with non-homogeneous smoothness across the domain. Two challenging issues that arise in this context are the evaluation of the equivalent kernel and the determination of a local penalty. The roughness penalty is a function of the design points in order to accommodate local behavior of the regression function. It is shown that the spatially adaptive smoothing spline estimator is approximately a kernel estimator. The resulting equivalent kernel is spatially dependent. The equivalent kernels for traditional smoothing splines are a special case of this general solution. With the aid of the Green's function for a two-point boundary value problem, the explicit forms of the asymptotic mean and variance are obtained for any interior point. Thus, the optimal roughness penalty function is obtained by approximately minimizing the asymptotic integrated mean square error. Simulation results and an application illustrate the performance of the proposed estimator."
to:NB  splines  kernel_estimators  nonparametrics  regression  statistics
june 2013 by cshalizi
[1305.5882] Limit theorems for kernel density estimators under dependent samples
"In this paper, we construct a moment inequality for mixing dependent random variables, it is of independent interest. As applications, the consistency of the kernel density estimation is investigated. Several limit theorems are established: First, the central limit theorems for the kernel density estimator $f_{n,K}(x)$ and its distribution function are constructed. Also, the convergence rates of $\|f_{n,K}(x)-Ef_{n,K}(x)\|_{p}$ in sup-norm loss and integral $L^{p}$-norm loss are proved. Moreover, the a.s. convergence rates of the supremum of $|f_{n,K}(x)-Ef_{n,K}(x)|$ over a compact set and the whole real line are obtained. It is showed, under suitable conditions on the mixing rates, the kernel function and the bandwidths, that the optimal rates for i.i.d. random variables are also optimal for dependent ones."

--- The "to_teach" is really "to_mention"
may 2013 by cshalizi
[1305.5029] Divide and Conquer Kernel Ridge Regression: A Distributed Algorithm with Minimax Optimal Rates
"We establish optimal convergence rates for a decomposition-based scalable approach to kernel ridge regression. The method is simple to describe: it randomly partitions a dataset of size N into m subsets of equal size, computes an independent kernel ridge regression estimator for each subset, then averages the local solutions into a global predictor. This partitioning leads to a substantial reduction in computation time versus the standard approach of performing kernel ridge regression on all N samples. Our two main theorems establish that despite the computational speed-up, statistical optimality is retained: as long as m is not too large, the partition-based estimator achieves the statistical minimax rate over all estimators using the set of N samples. As concrete examples, our theory guarantees that the number of processors m may grow nearly linearly for finite-rank kernels and Gaussian kernels and polynomially in N for Sobolev spaces, which in turn allows for substantial reductions in computational cost. We conclude with experiments on both simulated data and a music-prediction task that complement our theoretical results, exhibiting the computational and statistical benefits of our approach."
to:NB  kernel_estimators  statistics  regression  distributed_systems  wainwright.martin_j.  duchi.john_c.
may 2013 by cshalizi
[1304.5575] Inverse Density as an Inverse Problem: The Fredholm Equation Approach
"In this paper we address the problem of estimating the ratio $\frac{q}{p}$ where $p$ is a density function and $q$ is another density, or, more generally an arbitrary function. Knowing or approximating this ratio is needed in various problems of inference and integration, in particular, when one needs to average a function with respect to one probability distribution, given a sample from another. It is often referred as {\it importance sampling} in statistical inference and is also closely related to the problem of {\it covariate shift} in transfer learning as well as to various MCMC methods. It may also be useful for separating the underlying geometry of a space, say a manifold, from the density function defined on it.
"Our approach is based on reformulating the problem of estimating $\frac{q}{p}$ as an inverse problem in terms of an integral operator corresponding to a kernel, and thus reducing it to an integral equation, known as the Fredholm problem of the first kind. This formulation, combined with the techniques of regularization and kernel methods, leads to a principled kernel-based framework for constructing algorithms and for analyzing them theoretically.
"The resulting family of algorithms (FIRE, for Fredholm Inverse Regularized Estimator) is flexible, simple and easy to implement.
"We provide detailed theoretical analysis including concentration bounds and convergence rates for the Gaussian kernel in the case of densities defined on $\R^d$, compact domains in $\R^d$ and smooth $d$-dimensional sub-manifolds of the Euclidean space.
We also show experimental results including applications to classification and semi-supervised learning within the covariate shift framework and demonstrate some encouraging experimental comparisons. We also show how the parameters of our algorithms can be chosen in a completely unsupervised manner."
to:NB  inverse_problems  non-stationarity  density_estimation  statistics  kernel_estimators  belkin.mikhail
april 2013 by cshalizi
Vilar , Vilar : Time series clustering based on nonparametric multidimensional forecast densities
"A new time series clustering method based on comparing forecast densities for a sequence of k>1 consecutive horizons is proposed. The unknown k-dimensional forecast densities can be non-parametrically approximated by using bootstrap procedures that mimic the generating processes without parametric restrictions. However, the difficulty of constructing accurate kernel estimators of multivariate densities is well known. To circumvent the high dimensionality problem, the bootstrap prediction vectors are projected onto a lower-dimensional space using principal components analysis, and then the densities are estimated in this new space. Proper distances between pairs of estimated densities are computed and used to generate an initial dissimilarity matrix, and hence a standard hierarchical clustering is performed. The clustering procedure is examined via simulation and is applied to a real dataset involving electricity prices series."
time_series  kernel_estimators  prediction  clustering  low-dimensional_summaries  statistics  bootstrap  nonparametrics  in_NB
april 2013 by cshalizi
Unified inference for sparse and dense longitudinal models
"In longitudinal data analysis, statistical inference for sparse data and dense data could be substantially different. For kernel smoothing, the estimate of the mean function, the convergence rates and the limiting variance functions are different in the two scenarios. This phenomenon poses challenges for statistical inference, as a subjective choice between the sparse and dense cases may lead to wrong conclusions. We develop methods based on self-normalization that can adapt to the sparse and dense cases in a unified framework. Simulations show that the proposed methods outperform some existing methods."
to:NB  kernel_estimators  smoothing  longitudinal_data
february 2013 by cshalizi
[1302.4922] Structure Discovery in Nonparametric Regression through Compositional Kernel Search
"Despite its importance, choosing the structural form of the kernel in nonparametric regression remains a black art. We define a space of kernel structures which are built compositionally by adding and multiplying a small number of base kernels. We present a method for searching over this space of structures which mirrors the scientific discovery process. The learned structures can often decompose functions into interpretable components and enable long-range extrapolation on time-series datasets. Our structure search method outperforms many widely used kernels and kernel combination methods on a variety of prediction tasks."
to:NB  kernel_estimators  regression  nonparametrics  gharamani.zoubin  model_discovery  to_read
february 2013 by cshalizi
IEEE Xplore - Estimation of a Density Using Real and Artificial Data
"Let $X, X_{1}, X_{2}, dots$ be independent and identically distributed ${BBR }^{d}$-valued random variables and let $m: {BBR }^{d} rightarrow {BBR }$ be a measurable function such that a density $f$ of $Y=m(X)$ exists. Given a sample of the distribution of $(X,Y)$ and additional independent observations of $X$ , we are interested in estimating $f$. We apply a regression estimate to the sample of $(X,Y)$ and use this estimate to generate additional artificial observations of $Y$ . Using these artificial observations together with the real observations of $Y$, we construct a density estimate of $f$ by using a convex combination of two kernel density estimates. It is shown that if the bandwidths satisfy the usual conditions and if in addition the supremum norm error of the regression estimate converges almost surely faster toward zero than the bandwidth of the kernel density estimate applied to the artificial data, then the convex combination of the two density estimates is $L_{1}$-consistent. The performance of the estimate for finite sample size is illustrated by simulated data, and the usefulness of the proced- re is demonstrated by applying it to a density estimation problem in a simulation model."
to:NB  regression  density_estimation  statistics  kernel_estimators  simulation  devroye.luc  semi-supervised_learning
february 2013 by cshalizi
[1212.2812] A Review of Kernel Density Estimation with Applications to Econometrics
"Nonparametric density estimation is of great importance when econometricians want to model the probabilistic or stochastic structure of a data set. This comprehensive review summarizes the most important theoretical aspects of kernel density estimation and provides an extensive description of classical and modern data analytic methods to compute the smoothing parameter. Throughout the text, several references can be found to the most up-to-date and cut point research approaches in this area, while econometric data sets are analyzed as examples. Lastly, we present SIZer, a new approach introduced by Chaudhuri and Marron (2000), whose objective is to analyze the visible features representing important underlying structures for different bandwidths."
in_NB  kernel_estimators  density_estimation  statistics
december 2012 by cshalizi
Kernel Regularized Least Squares: Moving Beyond Linearity and Additivity Without Sacrificing Interpretability by Jens Hainmueller, Chad Hazlett :: SSRN
"We propose the use of Kernel Regularized Least Squares (KRLS) for social science modeling and inference problems. KRLS borrows from machine learning methods designed to solve regression and classification problems without relying on linearity or additivity assumptions. The method constructs a flexible hypothesis space that uses kernels as radial basis functions and finds the best fitting surface in this space by minimizing a complexity-penalized least squares problem. We provide an accessible explanation of the method and argue that it is well suited for social science inquiry because it avoids strong parametric assumptions and still allows for simple interpretation in ways analogous to OLS or other members of the GLM family. We also extend the method in several directions to make it more effective for social inquiry. In particular, we (1) derive new estimators for the pointwise marginal effects and their variances, (2) establish unbiasedness, consistency, and asymptotic normality of the KRLS estimator under fairly general conditions, (3) develop an automated approach to chose smoothing parameters, and (4) provide companion software. We illustrate the use of the methods through several simulations and a real-data example."

Seems pretty straightforward...
october 2012 by cshalizi
[1205.6843] Significance Testing and Group Variable Selection
"Let X; Z be r and s-dimensional covariates, respectively, used to model the response variable Y as Y = m(X;Z) + sigma(X;Z)epsilon. We develop an ANOVA-type test for the null hypothesis that Z has no influence on the regression function, based on residuals obtained from local polynomial ?fitting of the null model. Using p-values from this test, a group variable selection method based on multiple testing ideas is proposed. Simulations studies suggest that the proposed test procedure outperforms the generalized likelihood ratio test when the alternative is non-additive or there is heteroscedasticity. Additional simulation studies, with data generated from linear, non-linear and logistic regression, reveal that the proposed group variable selection procedure performs competitively against Group Lasso, and outperforms it in selecting groups having nonlinear effects. The proposed group variable selection procedure is illustrated on a real data set."
june 2012 by cshalizi
Local polynomial regression for symmetric positive definite matrices - Yuan - 2012 - Journal of the Royal Statistical Society: Series B (Statistical Methodology) - Wiley Online Library
"Local polynomial regression has received extensive attention for the non-parametric estimation of regression functions when both the response and the covariate are in Euclidean space. However, little has been done when the response is in a Riemannian manifold. We develop an intrinsic local polynomial regression estimate for the analysis of symmetric positive definite matrices as responses that lie in a Riemannian manifold with covariate in Euclidean space. The primary motivation and application of the methodology proposed is in computer vision and medical imaging. We examine two commonly used metrics, including the trace metric and the log-Euclidean metric on the space of symmetric positive definite matrices. For each metric, we develop a cross-validation bandwidth selection method, derive the asymptotic bias, variance and normality of the intrinsic local constant and local linear estimators, and compare their asymptotic mean-square errors. Simulation studies are further used to compare the estimators under the two metrics and to examine their finite sample performance. We use our method to detect diagnostic differences between diffusion tensors along fibre tracts in a study of human immunodeficiency virus."
to:NB  variance_estimation  statistics  regression  nonparametrics  kernel_estimators
april 2012 by cshalizi
[0805.2490] Using statistical smoothing to date medieval manuscripts
"We discuss the use of multivariate kernel smoothing methods to date manuscripts dating from the 11th to the 15th centuries, in the English county of Essex. The dataset consists of some 3300 dated and 5000 undated manuscripts, and the former are used as a training sample for imputing dates for the latter. It is assumed that two manuscripts that are close'', in a sense that may be defined by a vector of measures of distance for documents, will have close dates. Using this approach, statistical ideas are used to assess similarity'', by smoothing among distance measures, and thus to estimate dates for the 5000 undated manuscripts by reference to the dated ones."

Can we get data?
march 2012 by cshalizi
[1202.3775] Kernel-based Conditional Independence Test and Application in Causal Discovery
"Conditional independence testing is an important problem, especially in Bayesian network learning and causal discovery. Due to the curse of dimensionality, testing for conditional independence of continuous variables is particularly challenging. We propose a Kernel-based Conditional Independence test (KCI-test), by constructing an appropriate test statistic and deriving its asymptotic distribution under the null hypothesis of conditional independence. The proposed method is computationally efficient and easy to implement. Experimental results show that it outperforms other methods, especially when the conditioning set is large or the sample size is not very large, in which case other methods encounter difficulties."
february 2012 by cshalizi
Model Selection in Kernel Based Regression using the Influence Function
"Recent results about the robustness of kernel methods involve the analysis of influence functions. By definition the influence function is closely related to leave-one-out criteria. In statistical learning, the latter is often used to assess the generalization of a method. In statistics, the influence function is used in a similar way to analyze the statistical efficiency of a method. Links between both worlds are explored. The influence function is related to the first term of a Taylor expansion. Higher order influence functions are calculated. A recursive relation between these terms is found characterizing the full Taylor expansion. It is shown how to evaluate influence functions at a specific sample distribution to obtain an approximation of the leave-one-out error. A specific implementation is proposed using a L1 loss in the selection of the hyperparameters and a Huber loss in the estimation procedure. The parameter in the Huber loss controlling the degree of robustness is optimized as well. The resulting procedure gives good results, even when outliers are present in the data."
in_NB  statistics  regression  kernel_estimators  model_selection  robustness  nonparametrics  cross-validation
february 2012 by cshalizi
[1112.1838] Non-parametric kernel estimation for symmetric Hawkes processes. Application to high frequency financial data
"We define a numerical method that provides a non-parametric estimation of the kernel shape in symmetric multivariate Hawkes processes. This method relies on second order statistical properties of Hawkes processes that relate the covariance matrix of the process to the kernel matrix. The square root of the correlation function is computed using a minimal phase recovering method. We illustrate our method on some examples and provide an empirical study of the estimation errors. Within this framework, we analyze high frequency financial price data modeled as 1D or 2D Hawkes processes. We find slowly decaying (power-law) kernel shapes suggesting a long memory nature of self-excitation phenomena at the microstructure level of price dynamics."
in_NB  kernel_estimators  time_series  point_processes  nonparametrics  statistics  re:LoB_project
december 2011 by cshalizi
[1111.6230] Convergence of Nonparametric Functional Regression Estimates with Functional Responses
"We consider nonparametric functional regression when both predictors and responses are functions. More specifically, we let $(X_1,Y_1),...,(X_n,Y_n)$ be random elements in $mathcal{F}timesmathcal{H}$ where $mathcal{F}$ is a semi-metric space and $mathcal{H}$ is a separable Hilbert space. Based on a recently introduced notion of weak dependence for functional data, we showed the almost sure convergence rates of both the Nadaraya-Watson estimator and the nearest neighbor estimator, in a unified manner. Several factors, including functional nature of the responses, the assumptions on the functional variables using the Orlicz norm and the desired generality on weakly dependent data, make the theoretical investigations more challenging and interesting."
in_NB  statistics  regression  nonparametrics  functional_data  kernel_estimators
december 2011 by cshalizi
CAKE: Convex Adaptive Kernel Density Estimation
"In this paper we present a generalization of kernel density estimation called Convex Adaptive Kernel Density Estimation (CAKE) that replaces single bandwidth se- lection by a convex aggregation of kernels at all scales, where the convex aggregation is allowed to vary from one training point to another, treating the fundamental problem of heterogeneous smoothness in a novel way. Learning the CAKE estimator given a training set reduces to solving a single con- vex quadratic programming problem. We derive rates of convergence of CAKE like estimator to the true underlying density under smoothness assumptions on the class and show that given a sufficiently large sample the mean squared error of such estimators is optimal in a minimax sense. We also give a risk bound of the CAKE estimator in terms of its empirical risk. We empirically compare CAKE to other density estimators proposed in the statistics literature for handling heterogeneous smoothness on different synthetic and natural distributions. "
to:NB  have_read  density_estimation  ensemble_methods  kernel_estimators  statistics
november 2011 by cshalizi
Maximum Kernel Likelihood Estimation - Journal of Computational and Graphical Statistics - 17(4):976
"We introduce an estimator for the population mean based on maximizing likelihoods formed by parameterizing a kernel density estimate. Due to these origins, we have dubbed the estimator the maximum kernel likelihood estimate (MKLE). A speedy computational method to compute the MKLE based on binning is implemented in a simulation study which shows that the MKLE at an optimal bandwidth is decidedly superior in terms of efficiency to the sample mean and other measures of location for heavy tailed symmetric distributions. An empirical rule and a computational method to estimate this optimal bandwidth are developed and used to construct bootstrap confidence intervals for the population mean. We show that the intervals have approximately nominal coverage and have significantly smaller average width than the standard t and z intervals. Finally, we develop some mathematical properties for a very close approximation to the MKLE called the kernel mean. In particular, we demonstrate that the kernel mean is indeed unbiased for the population mean for symmetric distributions."
statistics  estimation  kernel_estimators  in_NB  heavy_tails
october 2011 by cshalizi
"Smooth Regression Analysis" (G. S. Watson, 1964) JSTOR: Sankhyā: The Indian Journal of Statistics, Series A, Vol. 26, No. 4 (Dec., 1964), pp. 359-372
The abstract is great: "Few would deny that the most powerful statistical tool is graph paper. When however there are many observations (and/or many variables) graphical procedures become tedious. It seems to the author that the most characteristic problem for statisticians at the moment is the development of methods for analyzing the data poured out by electronic observing systems. The present paper gives a simple computer method for obtaining a "graph" from a large number of observations."
june 2011 by cshalizi
Botev, Grotowski, Kroese: Kernel density estimation via diffusion
"We present a new adaptive kernel density estimator based on linear diffusion processes. The proposed estimator builds on existing ideas for adaptive smoothing by incorporating information from a pilot density estimate. In addition, we propose a new plug-in bandwidth selection method that is free from the arbitrary normal reference rules used by existing methods. We present simulation examples in which the proposed approach outperforms existing methods in terms of accuracy and reliability."
density_estimation  kernel_estimators  stochastic_processes  statistics
august 2010 by cshalizi
Liu, Wu: Simultaneous nonparametric inference of time series
" kernel estimation of marginal densities and regression functions of stationary processes. It is shown that for a wide class of time series, with proper centering and scaling, the maximum deviations of kernel density and regression estimates are asymptotically Gumbel. Our results substantially generalize earlier ones which were obtained under independence or beta mixing assumptions. The asymptotic results can be applied to assess patterns of marginal densities or regression functions via the construction of simultaneous confidence bands for which one can perform goodness-of-fit tests"
time_series  statistical_inference_for_stochastic_processes  kernel_estimators  confidence_sets
july 2010 by cshalizi
"We propose an adaptive varying-coefficient spatiotemporal model for data that are observed irregularly over space and regularly in time. The model is capable of catching possible non-linearity (both in space and in time) and non-stationarity (in space) by allowing the auto-regressive coefficients to vary with both spatial location and an unknown index variable. We suggest a two-step procedure to estimate both the coefficient functions and the index variable, which is readily implemented and can be computed even for large spatiotemporal data sets. Our theoretical results indicate that, in the presence of the so-called nugget effect, the errors in the estimation may be reduced via the spatial smoothing—the second step in the estimation procedure proposed. The simulation results reinforce this finding. As an illustration, we apply the methodology to a data set of sea level pressure in the North Sea."
spatial_statistics  statistics  time_series  smoothing  kernel_estimators
august 2009 by cshalizi
Uniform Convergence Rates for Kernel Estimation with Dependent Data
"This paper presents a set of rate of uniform consistency results for kernel estimators of density functions and regressions functions. We generalize the existing literature by allowing for stationary strong mixing multivariate data with infinite support, and kernels with unbounded support, and general bandwidth sequences. These results are useful for semiparametric estimation based on a first stage nonparametric estimator."
kernel_estimators  mixing  statistical_inference_for_stochastic_processes  statistics  density_estimation  regression  hansen.bruce
june 2009 by cshalizi
[0812.3973] Revisiting R\'ev\'esz's stochastic approximation method for the estimation of a regression function
A recursive/on-line kernel regression estimator, with proofs that it's about as efficient as the off-line/batch-mode Nadaraya-Watson estimator. Sounds cool...
nonparametrics  regression  kernel_estimators  stochastic_approximation  online_learning  to_read  to_teach:data-mining
january 2009 by cshalizi
Cross-Validation and the Estimation of Conditional Probability Densities
Nice. Definitely needs to be included next time I teach data-mining. (The method is implemented in the "np" package on CRAN.) In particular worth comparing to logistic regression and logistic GAMs for binary conditional probability estimation/classification.