**cshalizi + estimation**
310

[1911.01483] Statistical Inference for Model Parameters in Stochastic Gradient Descent via Batch Means

4 weeks ago by cshalizi

"Statistical inference of true model parameters based on stochastic gradient descent (SGD) has started receiving attention in recent years. In this paper, we study a simple algorithm to construct asymptotically valid confidence regions for model parameters using the batch means method. The main idea is to cancel out the covariance matrix which is hard/costly to estimate. In the process of developing the algorithm, we establish process-level function central limit theorem for Polyak-Ruppert averaging based SGD estimators. We also extend the batch means method to accommodate more general batch size specifications."

to:NB
optimization
estimation
statistics
re:HEAS
4 weeks ago by cshalizi
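
The batch means idea named in the abstract above can be illustrated on a much simpler problem than SGD. What follows is only a sketch of the classical scalar batch means estimator, not the paper's algorithm (which, per the abstract, works by cancelling the covariance matrix rather than estimating it). The AR(1) series, the function names, and the batch size are all assumptions chosen for illustration.

```python
import random
import statistics

def ar1_series(n, phi=0.5, seed=0):
    """Simulate x_t = phi * x_{t-1} + e_t with standard normal e_t (illustrative data)."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def batch_means_long_run_variance(xs, batch_size):
    """Classical batch means: batch_size times the sample variance of the batch means."""
    n_batches = len(xs) // batch_size
    means = [
        statistics.fmean(xs[k * batch_size:(k + 1) * batch_size])
        for k in range(n_batches)
    ]
    return batch_size * statistics.variance(means)

xs = ar1_series(100_000)
long_run_var = batch_means_long_run_variance(xs, batch_size=1_000)
naive_var = statistics.variance(xs)  # ignores the serial correlation

# For this AR(1) the long-run variance is 1 / (1 - phi)^2 = 4, while the
# marginal variance is 1 / (1 - phi^2) = 4/3, so the naive figure is too small.
print(long_run_var, naive_var)
```

The standard error of the overall mean would then be roughly the square root of `long_run_var / len(xs)`; using `naive_var` instead would understate it for positively correlated data.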

[1910.11540] Descriptive Dimensionality and Its Characterization of MDL-based Learning and Change Detection

6 weeks ago by cshalizi

"This paper introduces a new notion of dimensionality of probabilistic models from an information-theoretic viewpoint. We call it the "descriptive dimension" (Ddim). We show that Ddim coincides with the number of independent parameters for the parametric class, and can further be extended to real-valued dimensionality when a number of models are mixed. The paper then derives the rate of convergence of the MDL (Minimum Description Length) learning algorithm which outputs a normalized maximum likelihood (NML) distribution with model of the shortest NML codelength. The paper proves that the rate is governed by Ddim. The paper also derives error probabilities of the MDL-based test for multiple model change detection. It proves that they are also governed by Ddim. Through the analysis, we demonstrate that Ddim is an intrinsic quantity which characterizes the performance of the MDL-based learning and change detection."

to:NB
minimum_description_length
statistics
estimation
prediction
change-point_problem
6 weeks ago by cshalizi

[1602.02201] The Rate-Distortion Risk in Estimation from Compressed Data

6 weeks ago by cshalizi

"We consider the problem of estimating a latent signal from a lossy compressed version of the data. We assume that the data is generated by an underlying signal and compressed using a lossy compression scheme that is agnostic to this signal. In reconstruction, the underlying signal is estimated so as to minimize a prescribed loss measure. For the above setting and an arbitrary distortion measure between the data and its compressed version, we define the rate-distortion (RD) risk of an estimator as its risk with respect to the distribution achieving Shannon's RD function with respect to this distortion. We derive conditions under which the RD risk describes the risk in estimating from the compressed data. The main theoretical tools to obtain these conditions are transportation-cost inequalities in conjunction with properties of source codes achieving Shannon's RD function. We show that these conditions hold in various settings, including settings where the alphabet of the underlying signal is finite or when the RD achieving distribution is multivariate normal. We evaluate the RD risk in special cases under these settings. This risk provides an achievable loss in compress-and-estimate settings, i.e., when the data is first compressed, communicated or stored using a procedure that is agnostic to the underlying signal, which is later estimated from the compressed version of the data. Our results imply the following general procedure for designing estimators from datasets undergoing lossy compression without specifying the actual compression technique; train the estimator based on a perturbation of the data according to the RD achieving distribution. Under general conditions, this estimator achieves the RD risk when applied to the lossy compressed version of the data."

to:NB
estimation
information_theory
compression
statistics
6 weeks ago by cshalizi

How to Use Economic Theory to Improve Estimators: Shrinking Toward Theoretical Restrictions | The Review of Economics and Statistics | MIT Press Journals

6 weeks ago by cshalizi

"We propose to use economic theories to construct shrinkage estimators that perform well when the theories' empirical implications are approximately correct but perform no worse than unrestricted estimators when the theories' implications do not hold. We implement this construction in various settings, including labor demand and wage inequality, and estimation of consumer demand. We provide asymptotic and finite sample characterizations of the behavior of the proposed estimators. Our approach is an alternative to the use of theory as something to be tested or to be imposed on estimates. Our approach complements uses of theory for identification and extrapolation."

to:NB
economics
econometrics
statistics
estimation
shrinkage
6 weeks ago by cshalizi

[1910.08390] Finite sample deviation and variance bounds for first order autoregressive processes

7 weeks ago by cshalizi

"In this paper, we study finite-sample properties of the least squares estimator in first order autoregressive processes. By leveraging a result from decoupling theory, we derive upper bounds on the probability that the estimate deviates by at least a positive ε from its true value. Our results consider both stable and unstable processes. Afterwards, we obtain problem-dependent non-asymptotic bounds on the variance of this estimator, valid for sample sizes greater than or equal to seven. Via simulations we analyze the conservatism of our bounds, and show that they reliably capture the true behavior of the quantities of interest."

to:NB
statistics
deviation_inequalities
time_series
estimation
to_teach:data_over_space_and_time
7 weeks ago by cshalizi

[1910.09457] Aleatoric and Epistemic Uncertainty in Machine Learning: A Tutorial Introduction

7 weeks ago by cshalizi

"The notion of uncertainty is of major importance in machine learning and constitutes a key element of machine learning methodology. In line with the statistical tradition, uncertainty has long been perceived as almost synonymous with standard probability and probabilistic predictions. Yet, due to the steadily increasing relevance of machine learning for practical applications and related issues such as safety requirements, new problems and challenges have recently been identified by machine learning scholars, and these problems may call for new methodological developments. In particular, this includes the importance of distinguishing between (at least) two different types of uncertainty, often referred to as aleatoric and epistemic. In this paper, we provide an introduction to the topic of uncertainty in machine learning as well as an overview of hitherto attempts at handling uncertainty in general and formalizing this distinction in particular."

to:NB
prediction
estimation
uncertainty_for_neural_networks
statistics
7 weeks ago by cshalizi

[1910.07185] Identifying relationships between cognitive processes across tasks, contexts, and time

7 weeks ago by cshalizi

"It is commonly assumed that a specific testing occasion (task, design, procedure, etc.) provides insight into psychological phenomena that generalise to other, related testing occasions. However, this assumption is rarely tested in data. When it is tested, the two existing methods of comparison have one of the following two shortcomings: they either correlate summary statistics like mean response time or accuracy, which does not provide insight into relationships between latent psychological processes, or they first assume independence in cognitive processes across tasks and then, in a second step, test whether there is in fact a relationship. Our article develops a statistically principled method to directly estimate the correlation between latent components of cognitive processing across tasks, contexts, and time. Our method simultaneously estimates individual participant parameters of a cognitive model at each testing occasion, group-level parameters representing across-participant parameter averages and variances, and across-task covariances, i.e., correlations. The approach provides a natural way to "borrow" data across testing occasions, which increases the precision of parameter estimates across all testing occasions provided there is a non-zero relationship between some of the latent processes of the model. We illustrate the method in two applications in decision making contexts. The first assesses the effect of the neural scanning environment on model parameters, and the second assesses relationships between latent processes underlying performance of three different tasks. We conclude by highlighting the potential of the parameter-correlation method to provide an "assumption-light" tool for estimating the relatedness of cognitive processes across tasks, contexts, and time."

to:NB
cognitive_science
psychometrics
statistics
estimation
7 weeks ago by cshalizi

[1703.09965] Estimable group effects for strongly correlated variables in linear models

7 weeks ago by cshalizi

"It is well known that parameters for strongly correlated predictor variables in a linear model cannot be accurately estimated. We look for linear combinations of these parameters that can be. Under a uniform model, we find such linear combinations in a neighborhood of a simple variability weighted average of these parameters. Surprisingly, this variability weighted average is more accurately estimated when the variables are more strongly correlated, and it is the only linear combination with this property. It can be easily computed for strongly correlated predictor variables in all linear models and has applications in inference and estimation concerning parameters of such variables."

to:NB
linear_regression
estimation
statistics
to_teach:linear_models
7 weeks ago by cshalizi

[1909.10828] Double-estimation-friendly inference for high-dimensional misspecified models

10 weeks ago by cshalizi

"All models may be wrong---but that is not necessarily a problem for inference. Consider the standard t-test for the significance of a variable X for predicting response Y whilst controlling for p other covariates Z in a random design linear model. This yields correct asymptotic type I error control for the null hypothesis that X is conditionally independent of Y given Z under an arbitrary regression model of Y on (X,Z), provided that a linear regression model for X on Z holds. An analogous robustness to misspecification, which we term the "double-estimation-friendly" (DEF) property, also holds for Wald tests in generalised linear models, with some small modifications.

"In this expository paper we explore this phenomenon, and propose methodology for high-dimensional regression settings that respects the DEF property. We advocate specifying (sparse) generalised linear regression models for both Y and the covariate of interest X; our framework gives valid inference for the conditional independence null if either of these hold. In the special case where both specifications are linear, our proposal amounts to a small modification of the popular debiased Lasso test. We also investigate constructing confidence intervals for the regression coefficient of X via inverting our tests; these have coverage guarantees even in partially linear models where the contribution of Z to Y can be arbitrary. Numerical experiments demonstrate the effectiveness of the methodology."

to:NB
misspecification
estimation
high-dimensional_statistics
buhlmann.peter
statistics
10 weeks ago by cshalizi

[1909.09336] Applications of Generalized Maximum Likelihood Estimators to stratified sampling and post-stratification with many unobserved strata

11 weeks ago by cshalizi

"Consider the problem of estimating a weighted average of the means of n strata, based on a random sample with realized Ki observations from stratum i,i=1,...,n.

"This task is non-trivial in cases where for a significant portion of the strata the corresponding Ki=0. Such a situation may happen in post-stratification, when it is desired to have very fine stratification. A fine stratification could be desired in order that assumptions, or, approximations, like Missing At Random conditional on strata, will be appealing. Fine stratification could also be desired in observational studies, when it is desired to estimate average treatment effect, by averaging the effects in small and homogenous strata.

"Our approach is based on applying Generalized Maximum Likelihood Estimators (GMLE), and ideas that are related to Non-Parametric Empirical Bayes, in order to estimate the means of strata i with corresponding Ki=0. There are no assumptions about a relation between the means of the unobserved strata (i.e., with Ki=0) and those of the observed strata.

"The performance of our approach is demonstrated both in simulations and on a real data set. Some consistency and asymptotic results are also provided."

to:NB
missing_data
statistics
estimation
surveys
11 weeks ago by cshalizi

[1909.05582] A taxonomy of estimator consistency on discrete estimation problems

12 weeks ago by cshalizi

"We describe a four-level hierarchy mapping both all discrete estimation problems and all estimators on these problems, such that the hierarchy describes each estimator's consistency guarantees on each problem class. We show that no estimator is consistent for all estimation problems, but that some estimators, such as Maximum A Posteriori, are consistent for the widest possible class of discrete estimation problems. For Maximum Likelihood and Approximate Maximum Likelihood estimators we show that they do not provide consistency on as wide a class, but define a sub-class of problems characterised by their consistency. Lastly, we show that some popular estimators, specifically Strict Minimum Message Length, do not provide consistency guarantees even within the sub-class."

to:NB
statistics
estimation
model_selection
12 weeks ago by cshalizi

[1702.08109] Variational Analysis of Constrained M-Estimators

12 weeks ago by cshalizi

"We propose a unified framework for establishing existence of nonparametric M-estimators, computing the corresponding estimates, and proving their strong consistency when the class of functions is exceptionally rich. In particular, the framework addresses situations where the class of functions is complex involving information and assumptions about shape, pointwise bounds, location of modes, height at modes, location of level-sets, values of moments, size of subgradients, continuity, distance to a "prior" function, multivariate total positivity, and any combination of the above. The class might be engineered to perform well in a specific setting even in the presence of little data. The framework views the class of functions as a subset of a particular metric space of upper semicontinuous functions under the Attouch-Wets distance. In addition to allowing a systematic treatment of numerous M-estimators, the framework yields consistency of plug-in estimators of modes of densities, maximizers of regression functions, level-sets of classifiers, and related quantities, and also enables computation by means of approximating parametric classes. We establish consistency through a one-sided law of large numbers, here extended to sieves, that relaxes assumptions of uniform laws, while ensuring global approximations even under model misspecification."

to:NB
estimation
empirical_processes
statistics
12 weeks ago by cshalizi

[1909.00579] Asymptotic linear expansion of regularized M-estimators

september 2019 by cshalizi

"Parametric high-dimensional regression analysis requires the usage of regularization terms to get interpretable models. The respective estimators can be regarded as regularized M-functionals which are naturally highly nonlinear. We study under which conditions these M-functionals are compactly differentiable, so that the corresponding estimators admit an asymptotically linear expansion. In a one-step construction, for a suitably consistent starting estimator, this linearization replaces solving optimization problems by evaluating the corresponding influence curves at the given data points. We show under which conditions the asymptotic linear expansion is valid and provide concrete examples of machine learning algorithms that fit into this framework."

to:NB
estimation
statistics
re:hea
september 2019 by cshalizi

[1607.06163] Indirect Inference With(Out) Constraints

august 2019 by cshalizi

"Indirect Inference (I-I) estimation of structural parameters θ requires matching observed and simulated statistics, which are most often generated using an auxiliary model that depends on instrumental parameters β. The estimators of the instrumental parameters will encapsulate the statistical information used for inference about the structural parameters. As such, artificially constraining these parameters may restrict the ability of the auxiliary model to accurately replicate features in the structural data, which may lead to a range of issues, such as, a loss of identification. However, in certain situations the parameters β naturally come with a set of q restrictions. Examples include settings where β must be estimated subject to q possibly strict inequality constraints g(β)>0, such as, when I-I is based on GARCH auxiliary models. In these settings we propose a novel I-I approach that uses appropriately modified unconstrained auxiliary statistics, which are simple to compute and always exist. We state the relevant asymptotic theory for this I-I approach without constraints and show that it can be reinterpreted as a standard implementation of I-I through a properly modified binding function. Several examples that have featured in the literature illustrate our approach."

to:NB
indirect_inference
estimation
statistics
august 2019 by cshalizi

[1302.0890] Local Log-linear Models for Capture-Recapture

august 2019 by cshalizi

"Log-linear models are often used to estimate the size of a closed population using capture-recapture data. When capture probabilities are related to auxiliary covariates, one may select a separate model based on each of several post-strata. We extend post-stratification to its logical extreme by selecting a local log-linear model for each observed unit, while smoothing to achieve stability. Our local models serve a dual purpose: In addition to estimating the size of the population, we estimate the rate of missingness as a function of covariates. A simulation demonstrates the superiority of our method when the generating model varies over the covariate space. Data from the Breeding Bird Survey is used to illustrate the method."

--- When did the title change from "Smooth Poststratification"?

to:NB
have_read
surveys
smoothing
statistics
estimation
kurtz.zachary
kith_and_kin
august 2019 by cshalizi

[1901.00555] An Introductory Guide to Fano's Inequality with Applications in Statistical Estimation

august 2019 by cshalizi

"Information theory plays an indispensable role in the development of algorithm-independent impossibility results, both for communication problems and for seemingly distinct areas such as statistics and machine learning. While numerous information-theoretic tools have been proposed for this purpose, the oldest one remains arguably the most versatile and widespread: Fano's inequality. In this chapter, we provide a survey of Fano's inequality and its variants in the context of statistical estimation, adopting a versatile framework that covers a wide range of specific problems. We present a variety of key tools and techniques used for establishing impossibility results via this approach, and provide representative examples covering group testing, graphical model selection, sparse linear regression, density estimation, and convex optimization."

in_NB
information_theory
minimax
statistics
estimation
have_read
re:HEAS
august 2019 by cshalizi

[1908.04748] Optimal Estimation of Generalized Average Treatment Effects using Kernel Optimal Matching

august 2019 by cshalizi

"In causal inference, a variety of causal effect estimands have been studied, including the sample, uncensored, target, conditional, optimal subpopulation, and optimal weighted average treatment effects. Ad-hoc methods have been developed for each estimand based on inverse probability weighting (IPW) and on outcome regression modeling, but these may be sensitive to model misspecification, practical violations of positivity, or both. The contribution of this paper is twofold. First, we formulate the generalized average treatment effect (GATE) to unify these causal estimands as well as their IPW estimates. Second, we develop a method based on Kernel Optimal Matching (KOM) to optimally estimate GATE and to find the GATE most easily estimable by KOM, which we term the Kernel Optimal Weighted Average Treatment Effect. KOM provides uniform control on the conditional mean squared error of a weighted estimator over a class of models while simultaneously controlling for precision. We study its theoretical properties and evaluate its comparative performance in a simulation study. We illustrate the use of KOM for GATE estimation in two case studies: comparing spine surgical interventions and studying the effect of peer support on people living with HIV."

to:NB
causal_inference
estimation
matching
statistics
kernel_estimators
august 2019 by cshalizi

Non-Standard Parametric Statistical Inference - Russell Cheng - Oxford University Press

august 2019 by cshalizi

"This book discusses the fitting of parametric statistical models to data samples. Emphasis is placed on: (i) how to recognize situations where the problem is non-standard when parameter estimates behave unusually, and (ii) the use of parametric bootstrap resampling methods in analyzing such problems.

"A frequentist likelihood-based viewpoint is adopted, for which there is a well-established and very practical theory. The standard situation is where certain widely applicable regularity conditions hold. However, there are many apparently innocuous situations where standard theory breaks down, sometimes spectacularly. Most of the departures from regularity are described geometrically, with only sufficient mathematical detail to clarify the non-standard nature of a problem and to allow formulation of practical solutions.

"The book is intended for anyone with a basic knowledge of statistical methods, as is typically covered in a university statistical inference course, wishing to understand or study how standard methodology might fail. Easy to understand statistical methods are presented which overcome these difficulties, and demonstrated by detailed examples drawn from real applications. Simple and practical model-building is an underlying theme.

"Parametric bootstrap resampling is used throughout for analyzing the properties of fitted models, illustrating its ease of implementation even in non-standard situations. Distributional properties are obtained numerically for estimators or statistics not previously considered in the literature because their theoretical distributional properties are too hard to obtain theoretically. Bootstrap results are presented mainly graphically in the book, providing an accessible demonstration of the sampling behaviour of estimators."

to:NB
bootstrap
estimation
hypothesis_testing
statistics
august 2019 by cshalizi

[1908.00598] Sampling-free Epistemic Uncertainty Estimation Using Approximated Variance Propagation

august 2019 by cshalizi

"We present a sampling-free approach for computing the epistemic uncertainty of a neural network. Epistemic uncertainty is an important quantity for the deployment of deep neural networks in safety-critical applications, since it represents how much one can trust predictions on new data. Recently promising works were proposed using noise injection combined with Monte-Carlo sampling at inference time to estimate this quantity (e.g. Monte-Carlo dropout). Our main contribution is an approximation of the epistemic uncertainty estimated by these methods that does not require sampling, thus notably reducing the computational overhead. We apply our approach to large-scale visual tasks (i.e., semantic segmentation and depth regression) to demonstrate the advantages of our method compared to sampling-based approaches in terms of quality of the uncertainty estimates as well as of computational overhead."

--- Prediction before scanning the paper: It's the delta method (which I learned as "propagation of error" in physics lab).

--- And on p. 2, start of sec. 3, we read "At its core our method uses error propagation [25], commonly used in physics, where the error is equivalent to the variance". (Well, at least they know that!)

to:NB
neural_networks
estimation
statistics
propagation_of_error
everything_old_is_new_again
uncertainty_for_neural_networks
august 2019 by cshalizi
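
The delta method ("propagation of error") mentioned in the notes above can be sketched in one dimension: if X has mean mu and small variance sigma^2, then Var(f(X)) is approximately f'(mu)^2 * sigma^2. This is a toy scalar illustration only, not the paper's variance propagation through network layers; the function names, the choice f = exp, and the Monte Carlo check are assumptions made for the example.

```python
import math
import random
import statistics

def delta_method_variance(f, mu, sigma2, h=1e-5):
    """First-order approximation Var(f(X)) ~ f'(mu)^2 * sigma2 (central-difference derivative)."""
    deriv = (f(mu + h) - f(mu - h)) / (2 * h)
    return deriv ** 2 * sigma2

def monte_carlo_variance(f, mu, sigma2, n=100_000, seed=0):
    """Sampling-based check: variance of f(X) for X ~ Normal(mu, sigma2)."""
    rng = random.Random(seed)
    sd = math.sqrt(sigma2)
    draws = [f(rng.gauss(mu, sd)) for _ in range(n)]
    return statistics.variance(draws)

# f = exp at mu = 0 has derivative 1, so the approximation is just sigma2.
approx = delta_method_variance(math.exp, 0.0, 0.01)
sampled = monte_carlo_variance(math.exp, 0.0, 0.01)
print(approx, sampled)
```

The two numbers should agree to within a few percent here because sigma is small relative to the curvature of f; the approximation degrades as the input variance grows.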

[1908.00310] Maximum likelihood estimation of power-law degree distributions using friendship paradox based sampling

august 2019 by cshalizi

"This paper considers the problem of estimating a power-law degree distribution of an undirected network. Even though power-law degree distributions are ubiquitous in nature, the widely used parametric methods for estimating them (e.g. linear regression on double-logarithmic axes, maximum likelihood estimation with uniformly sampled nodes) suffer from the large variance introduced by the lack of data-points from the tail portion of the power-law degree distribution. As a solution, we present a novel maximum likelihood estimation approach that exploits the friendship paradox to sample more efficiently from the tail of the degree distribution. We analytically show that the proposed method results in a smaller bias, variance and a Cramer-Rao lower bound compared to the maximum-likelihood estimate obtained with uniformly sampled nodes (which is the most commonly used method in literature). Detailed simulation results are presented to illustrate the performance of the proposed method under different conditions and how it compares with alternative methods."

to:NB
network_data_analysis
heavy_tails
to_be_shot_after_a_fair_trial
estimation
statistics
august 2019 by cshalizi
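
The sampling bias that the abstract above exploits can be shown with a few lines of arithmetic. Picking a uniformly random end of a uniformly random edge selects a node with probability proportional to its degree, so high-degree nodes are over-represented relative to uniform node sampling. The sketch below only illustrates that bias on a toy star graph; it is not the paper's estimator, and the function names and the graph are assumptions.

```python
from collections import Counter

def degrees(edges):
    """Degree of each node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def mean_degree_uniform_node(edges):
    """Expected degree of a node drawn uniformly at random."""
    deg = degrees(edges)
    return sum(deg.values()) / len(deg)

def mean_degree_random_friend(edges):
    """Expected degree of a random end of a random edge: sum(d^2) / sum(d)."""
    deg = degrees(edges)
    total = sum(deg.values())
    return sum(d * d for d in deg.values()) / total

# Toy star graph: one hub joined to four leaves.
star = [("hub", leaf) for leaf in ("a", "b", "c", "d")]
print(mean_degree_uniform_node(star))   # 1.6
print(mean_degree_random_friend(star))  # 2.5
```

The gap between the two means is the friendship paradox; whether oversampling the tail this way actually improves power-law estimation is the paper's claim, not something this sketch checks.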

[1510.00551] Investigation of Parameter Uncertainty in Clustering Using a Gaussian Mixture Model Via Jackknife, Bootstrap and Weighted Likelihood Bootstrap

july 2019 by cshalizi

"Mixture models are a popular tool in model-based clustering. Such a model is often fitted by a procedure that maximizes the likelihood, such as the EM algorithm. At convergence, the maximum likelihood parameter estimates are typically reported, but in most cases little emphasis is placed on the variability associated with these estimates. In part this may be due to the fact that standard errors are not directly calculated in the model-fitting algorithm, either because they are not required to fit the model, or because they are difficult to compute. The examination of standard errors in model-based clustering is therefore typically neglected. The widely used R package mclust has recently introduced bootstrap and weighted likelihood bootstrap methods to facilitate standard error estimation. This paper provides an empirical comparison of these methods (along with the jackknife method) for producing standard errors and confidence intervals for mixture parameters. These methods are illustrated and contrasted in both a simulation study and in the traditional Old Faithful data set and Thyroid data set."

to:NB
statistics
estimation
mixture_models
confidence_sets
july 2019 by cshalizi

[1906.08283] Minimum Stein Discrepancy Estimators

june 2019 by cshalizi

"When maximum likelihood estimation is infeasible, one often turns to score matching, contrastive divergence, or minimum probability flow learning to obtain tractable parameter estimates. We provide a unifying perspective of these techniques as minimum Stein discrepancy estimators and use this lens to design new diffusion kernel Stein discrepancy (DKSD) and diffusion score matching (DSM) estimators with complementary strengths. We establish the consistency, asymptotic normality, and robustness of DKSD and DSM estimators, derive stochastic Riemannian gradient descent algorithms for their efficient optimization, and demonstrate their advantages over score matching in models with non-smooth densities or heavy tailed distributions."

to:NB
statistics
estimation
june 2019 by cshalizi

[1712.07248] Towards a General Large Sample Theory for Regularized Estimators

june 2019 by cshalizi

"We present a general framework for studying regularized estimators; such estimators are pervasive in estimation problems wherein "plug-in" type estimators are either ill-defined or ill-behaved. Within this framework, we derive, under primitive conditions, consistency and a generalization of the asymptotic linearity property. We also provide data-driven methods for choosing tuning parameters that, under some conditions, achieve the aforementioned properties. We illustrate the scope of our approach by studying a wide range of applications, revisiting known results and deriving new ones."

to:NB
statistics
estimation
optimization
to_read
june 2019 by cshalizi

[1906.05944] Statistical Inference for Generative Models with Maximum Mean Discrepancy

june 2019 by cshalizi

"While likelihood-based inference and its variants provide a statistically efficient and widely applicable approach to parametric inference, their application to models involving intractable likelihoods poses challenges. In this work, we study a class of minimum distance estimators for intractable generative models, that is, statistical models for which the likelihood is intractable, but simulation is cheap. The distance considered, maximum mean discrepancy (MMD), is defined through the embedding of probability measures into a reproducing kernel Hilbert space. We study the theoretical properties of these estimators, showing that they are consistent, asymptotically normal and robust to model misspecification. A main advantage of these estimators is the flexibility offered by the choice of kernel, which can be used to trade-off statistical efficiency and robustness. On the algorithmic side, we study the geometry induced by MMD on the parameter space and use this to introduce a novel natural gradient descent-like algorithm for efficient implementation of these estimators. We illustrate the relevance of our theoretical results on several classes of models including a discrete-time latent Markov process and two multivariate stochastic differential equation models."

to:NB
simulation
indirect_inference
statistics
estimation
hilbert_space
june 2019 by cshalizi
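
A rough sketch of minimum-MMD estimation for a simulator-only model, assuming a Gaussian kernel, a one-parameter location model, and a plain grid search in place of the paper's natural-gradient algorithm. The names `gaussian_kernel`, `mmd2`, the bandwidth and the grid are all illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    # a has shape (n, d), b has shape (m, d); returns the (n, m) kernel matrix.
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * bandwidth ** 2))

def mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    kxx = gaussian_kernel(x, x, bandwidth).mean()
    kyy = gaussian_kernel(y, y, bandwidth).mean()
    kxy = gaussian_kernel(x, y, bandwidth).mean()
    return kxx + kyy - 2.0 * kxy

rng = np.random.default_rng(0)
observed = rng.normal(loc=1.5, scale=1.0, size=(300, 1))

# Toy simulator: theta + standard normal noise. Reusing the same noise draws
# for every theta (common random numbers) keeps the loss smooth in theta.
noise = rng.normal(size=(300, 1))
grid = np.linspace(-1.0, 4.0, 51)
losses = [mmd2(observed, theta + noise) for theta in grid]
theta_hat = grid[int(np.argmin(losses))]
```

The kernel bandwidth is the knob that the abstract describes as trading efficiency against robustness; here it is simply fixed at 1.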

[1805.07454] Fisher Efficient Inference of Intractable Models

may 2019 by cshalizi

"Maximum Likelihood Estimators (MLE) has many good properties. For example, the asymptotic variance of MLE solution attains equality of the asymptotic Cramér-Rao lower bound (efficiency bound), which is the minimum possible variance for an unbiased estimator. However, obtaining such MLE solution requires calculating the likelihood function which may not be tractable due to the normalization term of the density model. In this paper, we derive a Discriminative Likelihood Estimator (DLE) from the Kullback-Leibler divergence minimization criterion implemented via density ratio estimation procedure and Stein operator. We study the problem of model inference using DLE. We prove its consistency and show the asymptotic variance of its solution can also attain the equality of the efficiency bound under mild regularity conditions. We also propose a dual formulation of DLE which can be easily optimized. Numerical studies validate our asymptotic theorems and we give an example where DLE successfully estimates an intractable model constructed using a pre-trained deep neural network."

to:NB
likelihood
estimation
statistics
may 2019 by cshalizi

[1903.06576] A nonasymptotic law of iterated logarithm for general M-estimators

may 2019 by cshalizi

"M-estimators are ubiquitous in machine learning and statistical learning theory. They are used both for defining prediction strategies and for evaluating their precision. In this paper, we propose the first non-asymptotic "any-time" deviation bounds for general M-estimators, where "any-time" means that the bound holds with a prescribed probability for every sample size. These bounds are nonasymptotic versions of the law of iterated logarithm. They are established under general assumptions such as Lipschitz continuity of the loss function and (local) curvature of the population risk. These conditions are satisfied for most examples used in machine learning, including those ensuring robustness to outliers and to heavy tailed distributions. As an example of application, we consider the problem of best arm identification in a parametric stochastic multi-arm bandit setting. We show that the established bound can be converted into a new algorithm, with provably optimal theoretical guarantees. Numerical experiments illustrating the validity of the algorithm are reported."

to:NB
estimation
deviation_inequalities
statistics
may 2019 by cshalizi

[1904.02826] What can be estimated? Identifiability, estimability, causal inference and ill-posed inverse problems

april 2019 by cshalizi

"Here we consider, in the context of causal inference, the general question: 'what can be estimated from data?'. We call this the question of estimability. We consider the usual definition adopted in the causal inference literature -- identifiability -- in a general mathematical setting and show why it is an inadequate formal translation of the concept of estimability. Despite showing that identifiability implies the existence of a Fisher-consistent estimator, we show that this estimator may be discontinuous, hence unstable, in general. The source of the difficulty is that the general form of the causal inference problem is an ill-posed inverse problem. Inverse problems have three conditions which must be satisfied in order to be considered well-posed: existence, uniqueness, and stability of solutions. We illustrate how identifiability corresponds to the question of uniqueness; in contrast, we take estimability to mean satisfaction of all three conditions, i.e. well-posedness. Well-known results from the inverse problems literature imply that mere identifiability does not guarantee well-posedness of the causal inference procedure, i.e. estimability, and apparent solutions to causal inference problems can be essentially useless with even the smallest amount of imperfection. Analogous issues with attempts to apply standard statistical procedures to very general settings were raised in the statistical literature as far back as the 60s and 70s. Thus, in addition to giving general characterisations of identifiability and estimability, we demonstrate how the issues raised in both the theory of inverse problems and in the theory of statistical inference lead to concerns over the stability of general nonparametric approaches to causal inference. These apply, in particular, to those that focus on identifiability while excluding the additional stability requirements required for estimability."

to:NB
causal_inference
statistics
estimation
to_be_shot_after_a_fair_trial
identifiability
inverse_problems
april 2019 by cshalizi

Object-oriented Computation of Sandwich Estimators | Zeileis | Journal of Statistical Software

october 2018 by cshalizi

"Sandwich covariance matrix estimators are a popular tool in applied regression modeling for performing inference that is robust to certain types of model misspecification. Suitable implementations are available in the R system for statistical computing for certain model fitting functions only (in particular lm()), but not for other standard regression functions, such as glm(), nls(), or survreg(). Therefore, conceptual tools and their translation to computational tools in the package sandwich are discussed, enabling the computation of sandwich estimators in general parametric models. Object orientation can be achieved by providing a few extractor functions (most importantly for the empirical estimating functions) from which various types of sandwich estimators can be computed."

to:NB
computational_statistics
R
estimation
regression
statistics
to_teach
october 2018 by cshalizi
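
A small sketch of the "bread-meat-bread" construction for ordinary least squares, assuming the simplest (HC0) form with no finite-sample correction. This is a NumPy illustration, not the R sandwich package's code; `ols_sandwich` is a hypothetical name.

```python
import numpy as np

def ols_sandwich(X, y):
    """OLS coefficients with classical and sandwich (HC0) covariance estimates."""
    n, p = X.shape
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    bread = np.linalg.inv(X.T @ X)
    # Empirical estimating functions: one row x_i * e_i per observation.
    scores = X * resid[:, None]
    # "Meat": sum of outer products of the estimating functions.
    meat = scores.T @ scores
    cov_sandwich = bread @ meat @ bread
    # Classical covariance, valid only under constant error variance.
    cov_classical = bread * (resid @ resid) / (n - p)
    return beta, cov_classical, cov_sandwich

# Heteroskedastic example: the noise scale grows with |x|.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + np.abs(x) * rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta, cov_classical, cov_sandwich = ols_sandwich(X, y)
se_classical = np.sqrt(np.diag(cov_classical))
se_sandwich = np.sqrt(np.diag(cov_sandwich))
```

In this design the classical standard error for the slope is too small, and the sandwich version should come out noticeably larger.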

5601 Notes: The Sandwich Estimator

october 2018 by cshalizi

I believe the subscript n inside the sums defining V_n and J_n should be i. Otherwise, this is terrific (unsurprisingly).

to:NB
to_teach
have_read
statistics
estimation
fisher_information
misspecification
geyer.charles
re:HEAS
october 2018 by cshalizi

[1608.00060] Double/Debiased Machine Learning for Treatment and Causal Parameters

july 2018 by cshalizi

"Most modern supervised statistical/machine learning (ML) methods are explicitly designed to solve prediction problems very well. Achieving this goal does not imply that these methods automatically deliver good estimators of causal parameters. Examples of such parameters include individual regression coefficients, average treatment effects, average lifts, and demand or supply elasticities. In fact, estimates of such causal parameters obtained via naively plugging ML estimators into estimating equations for such parameters can behave very poorly due to the regularization bias. Fortunately, this regularization bias can be removed by solving auxiliary prediction problems via ML tools. Specifically, we can form an orthogonal score for the target low-dimensional parameter by combining auxiliary and main ML predictions. The score is then used to build a de-biased estimator of the target parameter which typically will converge at the fastest possible 1/root(n) rate and be approximately unbiased and normal, and from which valid confidence intervals for these parameters of interest may be constructed. The resulting method thus could be called a "double ML" method because it relies on estimating primary and auxiliary predictive models. In order to avoid overfitting, our construction also makes use of the K-fold sample splitting, which we call cross-fitting. This allows us to use a very broad set of ML predictive methods in solving the auxiliary and main prediction problems, such as random forest, lasso, ridge, deep neural nets, boosted trees, as well as various hybrids and aggregators of these methods."

to:NB
statistics
estimation
high-dimensional_statistics
robins.james
via:jbdelong
july 2018 by cshalizi
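
A hedged sketch of cross-fitting in the partially linear model y = theta*d + g(x) + noise, using the partialling-out (residual-on-residual) form of the orthogonal score. A polynomial least-squares fit stands in for the ML learners, and `fit_predict`, `double_ml_plr` and the simulated design are assumptions made for illustration.

```python
import numpy as np

def fit_predict(x_train, t_train, x_test, degree=5):
    # Stand-in "ML" learner: polynomial least squares in a scalar covariate.
    coefs = np.polyfit(x_train, t_train, degree)
    return np.polyval(coefs, x_test)

def double_ml_plr(y, d, x, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of theta and its standard error."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    y_res = np.empty(n)
    d_res = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Nuisance functions are fitted only on data outside the current fold.
        y_res[test_idx] = y[test_idx] - fit_predict(x[train], y[train], x[test_idx])
        d_res[test_idx] = d[test_idx] - fit_predict(x[train], d[train], x[test_idx])
    # Regress outcome residuals on treatment residuals.
    theta = (d_res @ y_res) / (d_res @ d_res)
    # Plug-in standard error from the orthogonal score.
    psi = d_res * (y_res - theta * d_res)
    se = np.sqrt(np.mean(psi ** 2) / n) / np.mean(d_res ** 2)
    return theta, se

rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(-2.0, 2.0, size=n)
d = np.sin(x) + rng.normal(size=n)
y = 0.5 * d + np.cos(x) + rng.normal(size=n)
theta_hat, se_hat = double_ml_plr(y, d, x)
```

The true coefficient in this simulation is 0.5; swapping `fit_predict` for a random forest or lasso would not change the rest of the code.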

A review of asymptotic theory of estimating functions | SpringerLink

july 2018 by cshalizi

"Asymptotic statistical theory for estimating functions is reviewed in a generality suitable for stochastic processes. Conditions concerning existence of a consistent estimator, uniqueness, rate of convergence, and the asymptotic distribution are treated separately. Our conditions are not minimal, but can be verified for many interesting stochastic process models. Several examples illustrate the wide applicability of the theory and why the generality is needed."

to:NB
statistics
estimation
asymptotics
time_series
statistical_inference_for_stochastic_processes
july 2018 by cshalizi

Indirect inference through prediction

july 2018 by cshalizi

"By recasting indirect inference estimation as a prediction rather than a minimization and by using regularized regressions, we can bypass the three major problems of estimation: selecting the summary statistics, defining the distance function and minimizing it numerically. By substituting regression with classification we can extend this approach to model selection as well. We present three examples: a statistical fit, the parametrization of a simple RBC model and heuristics selection in a fishery agent-based model."

agent-based_models
prediction
statistics
estimation
indirect_inference
simulation
have_read
in_NB
july 2018 by cshalizi
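
A toy sketch of the idea in the abstract above: simulate at many parameter values, regress the parameter on summary statistics with a regularized regression, then predict the parameter at the observed summaries. The simulator, the choice of summaries and the ridge penalty are all assumptions for illustration, not the paper's examples.

```python
import numpy as np

def summaries(sample):
    # Deliberately redundant summaries; the regression decides how to weight them.
    return np.array([sample.mean(), np.median(sample), sample.std(ddof=1)])

def simulate(mu, rng, n=200):
    # Hypothetical simulator standing in for the model of interest.
    return rng.normal(loc=mu, scale=1.0, size=n)

rng = np.random.default_rng(0)

# Training set: parameters drawn from a plausible range, one simulation each.
train_params = rng.uniform(-3.0, 3.0, size=500)
S = np.array([summaries(simulate(mu, rng)) for mu in train_params])

# Ridge regression of the parameter on the summaries (intercept unpenalized).
A = np.column_stack([np.ones(len(S)), S])
penalty = 1e-3 * np.eye(A.shape[1])
penalty[0, 0] = 0.0
coef = np.linalg.solve(A.T @ A + penalty, A.T @ train_params)

# "Observed" data generated at mu = 1.2; the estimate is a prediction.
observed = simulate(1.2, rng)
mu_hat = np.concatenate([[1.0], summaries(observed)]) @ coef
```

No distance function is defined and nothing is minimized numerically over the parameter, which is the point the abstract makes.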

[1804.10611] On the Estimation of Latent Distances Using Graph Distances

may 2018 by cshalizi

"We are given the adjacency matrix of a geometric graph and the task of recovering the latent positions. We study one of the most popular approaches which consists in using the graph distances and derive error bounds under various assumptions on the link function. In the simplest case where the link function is an indicator function, the bound is (nearly) optimal as it (nearly) matches an information lower bound."

to:NB
network_data_analysis
statistics
re:hyperbolic_networks
via:ale
estimation
inference_to_latent_objects
to_read
may 2018 by cshalizi

Wald : Estimation of a Parameter When the Number of Unknown Parameters Increases Indefinitely with the Number of Observations (1948)

january 2018 by cshalizi

"Necessary and sufficient conditions are given for the existence of a uniformly consistent estimate of an unknown parameter θ when the successive observations are not necessarily independent and the number of unknown parameters involved in the joint distribution of the observations increases indefinitely with the number of observations. In analogy with R. A. Fisher's information function, the amount of information contained in the first n observations regarding θ is defined. A sufficient condition for the non-existence of a uniformly consistent estimate of θ is given in section 3 in terms of the information function. Section 4 gives a simplified expression for the amount of information when the successive observations are independent."

in_NB
have_read
fisher_information
estimation
information_theory
nonparametrics
statistics
wald.abraham
january 2018 by cshalizi

[1711.07137] Nonparametric Double Robustness

january 2018 by cshalizi

"Use of nonparametric techniques (e.g., machine learning, kernel smoothing, stacking) are increasingly appealing because they do not require precise knowledge of the true underlying models that generated the data under study. Indeed, numerous authors have advocated for their use with standard methods (e.g., regression, inverse probability weighting) in epidemiology. However, when used in the context of such singly robust approaches, nonparametric methods can lead to suboptimal statistical properties, including inefficiency and no valid confidence intervals. Using extensive Monte Carlo simulations, we show how doubly robust methods offer improvements over singly robust approaches when implemented via nonparametric methods. We use 10,000 simulated samples and 50, 100, 200, 600, and 1200 observations to investigate the bias and mean squared error of singly robust (g Computation, inverse probability weighting) and doubly robust (augmented inverse probability weighting, targeted maximum likelihood estimation) estimators under four scenarios: correct and incorrect model specification; and parametric and nonparametric estimation. As expected, results show best performance with g computation under correctly specified parametric models. However, even when based on complex transformed covariates, double robust estimation performs better than singly robust estimators when nonparametric methods are used. Our results suggest that nonparametric methods should be used with doubly instead of singly robust estimation techniques."

to:NB
statistics
causal_inference
estimation
nonparametrics
to_teach:undergrad-ADA
kith_and_kin
january 2018 by cshalizi

[1406.0423] Targeted Maximum Likelihood Estimation using Exponential Families

october 2017 by cshalizi

"Targeted maximum likelihood estimation (TMLE) is a general method for estimating parameters in semiparametric and nonparametric models. Each iteration of TMLE involves fitting a parametric submodel that targets the parameter of interest. We investigate the use of exponential families to define the parametric submodel. This implementation of TMLE gives a general approach for estimating any smooth parameter in the nonparametric model. A computational advantage of this approach is that each iteration of TMLE involves estimation of a parameter in an exponential family, which is a convex optimization problem for which software implementing reliable and computationally efficient methods exists. We illustrate the method in three estimation problems, involving the mean of an outcome missing at random, the parameter of a median regression model, and the causal effect of a continuous exposure, respectively. We conduct a simulation study comparing different choices for the parametric submodel, focusing on the first of these problems. To the best of our knowledge, this is the first study investigating robustness of TMLE to different specifications of the parametric submodel. We find that the choice of submodel can have an important impact on the behavior of the estimator in finite samples."

to:NB
statistics
estimation
nonparametrics
causal_inference
exponential_families
october 2017 by cshalizi

Oracle M-Estimation for Time Series Models - Giurcanu - 2016 - Journal of Time Series Analysis - Wiley Online Library

april 2017 by cshalizi

"We propose a thresholding M-estimator for multivariate time series. Our proposed estimator has the oracle property that its large-sample properties are the same as of the classical M-estimator obtained under the a priori information that the zero parameters were known. We study the consistency of the standard block bootstrap, the centred block bootstrap and the empirical likelihood block bootstrap distributions of the proposed M-estimator. We develop automatic selection procedures for the thresholding parameter and for the block length of the bootstrap methods. We present the results of a simulation study of the proposed methods for a sparse vector autoregressive VAR(2) time series model. The analysis of two real-world data sets illustrate applications of the methods in practice."

bootstrap
time_series
statistics
estimation
in_NB
sparsity
variable_selection
high-dimensional_statistics
april 2017 by cshalizi

[1311.5768] An RKHS Approach to Estimation with Sparsity Constraints

december 2016 by cshalizi

"The investigation of the effects of sparsity or sparsity constraints in signal processing problems has received considerable attention recently. Sparsity constraints refer to the a priori information that the object or signal of interest can be represented by using only few elements of a predefined dictionary. Within this thesis, sparsity refers to the fact that a vector to be estimated has only few nonzero entries. One specific field concerned with sparsity constraints has become popular under the name Compressed Sensing (CS). Within CS, the sparsity is exploited in order to perform (nearly) lossless compression. Moreover, this compression is carried out jointly or simultaneously with the process of sensing a physical quantity. In contrast to CS, one can alternatively use sparsity to enhance signal processing methods. Obviously, sparsity constraints can only improve the obtainable estimation performance since the constraints can be interpreted as an additional prior information about the unknown parameter vector which is to be estimated. Our main focus will be on this aspect of sparsity, i.e., we analyze how much we can gain in estimation performance due to the sparsity constraints."

to:NB
sparsity
compressed_sensing
hilbert_space
estimation
statistics
december 2016 by cshalizi

[1210.6516] The RKHS Approach to Minimum Variance Estimation Revisited: Variance Bounds, Sufficient Statistics, and Exponential Families

december 2016 by cshalizi

"The mathematical theory of reproducing kernel Hilbert spaces (RKHS) provides powerful tools for minimum variance estimation (MVE) problems. Here, we extend the classical RKHS based analysis of MVE in several directions. We develop a geometric formulation of five known lower bounds on the estimator variance (Barankin bound, Cramer-Rao bound, constrained Cramer-Rao bound, Bhattacharyya bound, and Hammersley-Chapman-Robbins bound) in terms of orthogonal projections onto a subspace of the RKHS associated with a given MVE problem. We show that, under mild conditions, the Barankin bound (the tightest possible lower bound on the estimator variance) is a lower semicontinuous function of the parameter vector. We also show that the RKHS associated with an MVE problem remains unchanged if the observation is replaced by a sufficient statistic. Finally, for MVE problems conforming to an exponential family of distributions, we derive novel closed-form lower bound on the estimator variance and show that a reduction of the parameter set leaves the minimum achievable variance unchanged."

in_NB
hilbert_space
cramer-rao
estimation
statistics
sufficiency
exponential_families
december 2016 by cshalizi

[1204.2477] A Simple Explanation of A Spectral Algorithm for Learning Hidden Markov Models

november 2016 by cshalizi

"A simple linear algebraic explanation of the algorithm in "A Spectral Algorithm for Learning Hidden Markov Models" (COLT 2009). Most of the content is in Figure 2; the text just makes everything precise in four nearly-trivial claims."

to:NB
spectral_methods
re:AoS_project
markov_models
state-space_models
statistics
time_series
estimation
november 2016 by cshalizi

On The So-Called “Huber Sandwich Estimator” and “Robust Standard Errors”

may 2016 by cshalizi

"The “Huber Sandwich Estimator” can be used to estimate the variance of the MLE when the underlying model is incorrect. If the model is nearly correct, so are the usual standard errors, and robustification is unlikely to help much. On the other hand, if the model is seriously in error, the sandwich may help on the variance side, but the parameters being estimated by the MLE are likely to be meaningless—except perhaps as descriptive statistics."

to:NB
have_read
statistics
estimation
misspecification
freedman.david
to_teach:linear_models
may 2016 by cshalizi

[1601.00815] Semi-parametric efficiency bounds and efficient estimation for high-dimensional models

february 2016 by cshalizi

"Asymptotic lower bounds for estimation play a fundamental role in assessing the quality of statistical procedures. In this paper we consider the possibility of establishing semi-parametric efficiency bounds for high-dimensional models and construction of estimators reaching these bounds. We propose a local uniform asymptotic unbiasedness assumption for high-dimensional models and derive explicit lower bounds on the variance of any asymptotically unbiased estimator. We show that an estimator obtained by de-sparsifying (or de-biasing) an ℓ1-penalized M-estimator is asymptotically unbiased and achieves the lower bound on the variance: thus it is asymptotically efficient. In particular, we consider the linear regression model, Gaussian graphical models and Gaussian sequence models under mild conditions. Furthermore, motivated by the results of Le Cam on local asymptotic normality, we show that the de-sparsified estimator converges to the limiting normal distribution with zero mean and the smallest possible variance not only pointwise, but locally uniformly in the underlying parameter. This is achieved by deriving an extension of Le Cam's Lemma to the high-dimensional setting."

to:NB
statistics
high-dimensional_statistics
estimation
van_de_geer.sara
february 2016 by cshalizi

[1601.01413] Local average causal effects and superefficiency

february 2016 by cshalizi

"Recent approaches in causal inference have proposed estimating average causal effects that are local to some subpopulation, often for reasons of efficiency. These inferential targets are sometimes data-adaptive, in that they are dependent on the empirical distribution of the data. In this short note, we show that if researchers are willing to adapt the inferential target on the basis of efficiency, then extraordinary gains in precision can be obtained. Specifically, when causal effects are heterogeneous, any asymptotically normal and root-n consistent estimator of the population average causal effect is superefficient for a data-adaptive local average causal effect. Our result illustrates the fundamental gain in statistical certainty afforded by indifference about the inferential target."

to:NB
causal_inference
estimation
statistics
aronow.peter
february 2016 by cshalizi

[1601.04736] A Consistent Direct Method for Estimating Parameters in Ordinary Differential Equations Models

february 2016 by cshalizi

"Ordinary differential equations provide an attractive framework for modeling temporal dynamics in a variety of scientific settings. We show how consistent estimation for parameters in ODE models can be obtained by modifying a direct (non-iterative) least squares method similar to the direct methods originally developed by Himmelblau, Jones and Bischoff. Our method is called the bias-corrected least squares (BCLS) method since it is a modification of least squares methods known to be biased. Consistency of the BCLS method is established and simulations are used to compare the BCLS method to other methods for parameter estimation in ODE models."

to:NB
statistics
dynamical_systems
estimation
re:stacs
february 2016 by cshalizi

[1602.00359] Confidence intervals for means under constrained dependence

february 2016 by cshalizi

"We develop a general framework for conducting inference on the mean of dependent random variables given constraints on their dependency graph. We establish the consistency of an oracle variance estimator of the mean when the dependency graph is known, along with an associated central limit theorem. We derive an integer linear program for finding an upper bound for the estimated variance when the graph is unknown, but topological and degree-based constraints are available. We develop alternative bounds, including a closed-form bound, under an additional homoskedasticity assumption. We establish a basis for Wald-type confidence intervals for the mean that are guaranteed to have asymptotically conservative coverage. We apply the approach to inference from a social network link-tracing study and provide statistical software implementing the approach."

to:NB
network_data_analysis
graphical_models
estimation
statistics
confidence_sets
february 2016 by cshalizi

Large Sample Properties of Matching Estimators for Average Treatment Effects - Abadie - 2005 - Econometrica - Wiley Online Library

february 2016 by cshalizi

"Matching estimators for average treatment effects are widely used in evaluation research despite the fact that their large sample properties have not been established in many cases. The absence of formal results in this area may be partly due to the fact that standard asymptotic expansions do not apply to matching estimators with a fixed number of matches because such estimators are highly nonsmooth functionals of the data. In this article we develop new methods for analyzing the large sample properties of matching estimators and establish a number of new results. We focus on matching with replacement with a fixed number of matches. First, we show that matching estimators are not N^{1/2}-consistent in general and describe conditions under which matching estimators do attain N^{1/2}-consistency. Second, we show that even in settings where matching estimators are N^{1/2}-consistent, simple matching estimators with a fixed number of matches do not attain the semiparametric efficiency bound. Third, we provide a consistent estimator for the large sample variance that does not require consistent nonparametric estimation of unknown functions. Software for implementing these methods is available in Matlab, Stata, and R."

--- An unkind version of this would be "matching is what happens when you do nearest-neighbor regression, and you forget that the bias-variance tradeoff is a _tradeoff_."

(Ungated version: http://www.ksg.harvard.edu/fs/aabadie/smep.pdf)

(ADA note: reference in the causal-estimation chapter, re connection between matching and nearest neighbors)

to:NB
statistics
estimation
causal_inference
regression
to_teach:undergrad-ADA
have_read
matching
february 2016 by cshalizi
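
To make the matching/nearest-neighbor connection concrete, a simple sketch of matching with replacement and a fixed number of matches, assuming a single scalar covariate and no bias correction. `matching_ate` and the simulated data are hypothetical and are not the authors' software.

```python
import numpy as np

def matching_ate(y, w, x, n_matches=1):
    """Matching estimate of the average treatment effect (scalar covariate)."""
    y, w, x = map(np.asarray, (y, w, x))
    treated = np.flatnonzero(w == 1)
    control = np.flatnonzero(w == 0)
    y1 = y.astype(float)
    y0 = y.astype(float)
    for i in range(len(y)):
        # Impute the missing potential outcome from the nearest units in the
        # opposite arm; a unit may be reused as a match (with replacement).
        pool = control if w[i] == 1 else treated
        nearest = pool[np.argsort(np.abs(x[pool] - x[i]))[:n_matches]]
        imputed = y[nearest].mean()
        if w[i] == 1:
            y0[i] = imputed
        else:
            y1[i] = imputed
    return float(np.mean(y1 - y0))

# Confounded example: treatment is more likely at high x, and x raises y.
rng = np.random.default_rng(2)
n = 1000
x = rng.uniform(0.0, 1.0, size=n)
w = (rng.uniform(size=n) < 0.3 + 0.4 * x).astype(int)
y = 2.0 * w + x + 0.5 * rng.normal(size=n)
ate_hat = matching_ate(y, w, x)
naive = y[w == 1].mean() - y[w == 0].mean()
```

The imputation step is k-nearest-neighbor regression with k fixed at `n_matches`, so the bias does not shrink by averaging over more neighbors as n grows. The true effect in this simulation is 2.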

Mathematical Foundations of Infinite-Dimensional Statistical Models | Statistical Theory and Methods | Cambridge University Press

january 2016 by cshalizi

"In nonparametric and high-dimensional statistical models, the classical Gauss-Fisher-Le Cam theory of the optimality of maximum likelihood estimators and Bayesian posterior inference does not apply, and new foundations and ideas have been developed in the past several decades. This book gives a coherent account of the statistical theory in infinite-dimensional parameter spaces. The mathematical foundations include self-contained 'mini-courses' on the theory of Gaussian and empirical processes, on approximation and wavelet theory, and on the basic theory of function spaces. The theory of statistical inference in such models – hypothesis testing, estimation and confidence sets – is then presented within the minimax paradigm of decision theory. This includes the basic theory of convolution kernel and projection estimation, but also Bayesian nonparametrics and nonparametric maximum likelihood estimation. In a final chapter the theory of adaptive inference in nonparametric models is developed, including Lepski's method, wavelet thresholding, and adaptive inference for self-similar functions."

in_NB
books:recommended
statistics
estimation
nonparametrics
books:owned
january 2016 by cshalizi

Herbst, E.P. and Schorfheide, F.: Bayesian Estimation of DSGE Models (eBook and Hardcover).

january 2016 by cshalizi

"Dynamic stochastic general equilibrium (DSGE) models have become one of the workhorses of modern macroeconomics and are extensively used for academic research as well as forecasting and policy analysis at central banks. This book introduces readers to state-of-the-art computational techniques used in the Bayesian analysis of DSGE models. The book covers Markov chain Monte Carlo techniques for linearized DSGE models, novel sequential Monte Carlo methods that can be used for parameter inference, and the estimation of nonlinear DSGE models based on particle filter approximations of the likelihood function. The theoretical foundations of the algorithms are discussed in depth, and detailed empirical applications and numerical illustrations are provided. The book also gives invaluable advice on how to tailor these algorithms to specific applications and assess the accuracy and reliability of the computations."

to:NB
books:noted
econometrics
macroeconomics
time_series
estimation
statistics
re:your_favorite_dsge_sucks
january 2016 by cshalizi

Empirical Bernstein Inequalities for U-Statistics

december 2015 by cshalizi

"We present original empirical Bernstein inequalities for U-statistics with bounded symmetric kernels q. They are expressed with respect to empirical estimates of either the variance of q or the conditional variance that appears in the Bernstein-type inequality for U-statistics derived by Arcones [2]. Our result subsumes other existing empirical Bernstein inequalities, as it reduces to them when U-statistics of order 1 are considered. In addition, it is based on a rather direct argument using two applications of the same (non-empirical) Bernstein inequality for U-statistics. We discuss potential applications of our new inequalities, especially in the realm of learning ranking/scoring functions. In the process, we exhibit an efficient procedure to compute the variance estimates for the special case of bipartite ranking that rests on a sorting argument. We also argue that our results may provide test set bounds and particularly interesting empirical racing algorithms for the problem of online learning of scoring functions."

in_NB
u-statistics
statistics
estimation
deviation_inequalities
re:smoothing_adjacency_matrices
december 2015 by cshalizi

[1504.04580] Robust estimation of U-statistics

december 2015 by cshalizi

"An important part of the legacy of Evarist Giné is his fundamental contributions to our understanding of U-statistics and U-processes. In this paper we discuss the estimation of the mean of multivariate functions in case of possibly heavy-tailed distributions. In such situations, reliable estimates of the mean cannot be obtained by usual U-statistics. We introduce a new estimator, based on the so-called median-of-means technique. We develop performance bounds for this new estimator that generalizes an estimate of Arcones and Giné (1993), showing that the new estimator performs, under minimal moment conditions, as well as classical U-statistics for bounded random variables. We discuss an application of this estimator to clustering."

in_NB
heavy_tails
statistics
estimation
deviation_inequalities
re:smoothing_adjacency_matrices
u-statistics
december 2015 by cshalizi
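
A rough sketch of the median-of-means idea mentioned in the abstract above, for an order-2 kernel. This is a simplified variant (disjoint blocks, one U-statistic per block, then the median), not necessarily the paper's exact construction; the function names and the toy sample are assumptions made for illustration.

```python
import statistics
from itertools import combinations


def u_statistic(block, kernel):
    """Average of a symmetric order-2 kernel over all unordered pairs in the block."""
    pairs = list(combinations(block, 2))
    return sum(kernel(a, b) for a, b in pairs) / len(pairs)


def median_of_block_u_statistics(sample, kernel, n_blocks):
    """Split the sample into disjoint blocks and return the median of the per-block U-statistics.

    Points left over when len(sample) is not a multiple of n_blocks are dropped.
    """
    size = len(sample) // n_blocks
    if size < 2:
        raise ValueError("each block needs at least two points")
    blocks = [sample[i * size:(i + 1) * size] for i in range(n_blocks)]
    return statistics.median(u_statistic(b, kernel) for b in blocks)


def half_squared_difference(a, b):
    # Kernel whose U-statistic is the unbiased sample variance.
    return 0.5 * (a - b) ** 2


# Toy sample with one wild outlier: the plain U-statistic is dominated by it,
# while the block-median version ignores the single contaminated block.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 1000]
plain = u_statistic(sample, half_squared_difference)
robust = median_of_block_u_statistics(sample, half_squared_difference, n_blocks=3)
print(plain, robust)
```

The number of blocks trades robustness against efficiency: more blocks tolerate more outliers but leave fewer pairs per block.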

AEAweb: JEL (53,3) p. 631 - Communicating Uncertainty in Official Economic Statistics: An Appraisal Fifty Years after Morgenstern

september 2015 by cshalizi

"Federal statistical agencies in the United States and analogous agencies elsewhere commonly report official economic statistics as point estimates, without accompanying measures of error. Users of the statistics may incorrectly view them as error free or may incorrectly conjecture error magnitudes. This paper discusses strategies to mitigate misinterpretation of official statistics by communicating uncertainty to the public. Sampling error can be measured using established statistical principles. The challenge is to satisfactorily measure the various forms of nonsampling error. I find it useful to distinguish transitory statistical uncertainty, permanent statistical uncertainty, and conceptual uncertainty. I illustrate how each arises as the Bureau of Economic Analysis periodically revises GDP estimates, the Census Bureau generates household income statistics from surveys with nonresponse, and the Bureau of Labor Statistics seasonally adjusts employment statistics. I anchor my discussion of communication of uncertainty in the contribution of Oskar Morgenstern (1963a), who argued forcefully for agency publication of error estimates for official economic statistics."

to:NB
to_read
statistics
economics
on_the_accuracy_of_economic_observations
manski.charles
estimation
september 2015 by cshalizi

[1507.00964] Non-parametric estimation of Fisher information from real data

august 2015 by cshalizi

"The Fisher Information matrix is a widely used measure for applications ranging from statistical inference, information geometry, experiment design, to the study of criticality in biological systems. Yet there is no commonly accepted non-parametric algorithm to estimate it from real data. In this rapid communication we show how to accurately estimate the Fisher information in a nonparametric way. We also develop a numerical procedure to minimize the errors by choosing the interval of the finite difference scheme necessary to compute the derivatives in the definition of the Fisher information. Our method uses the recently published "Density Estimation using Field Theory" algorithm to compute the probability density functions for continuous densities. We use the Fisher information of the normal distribution to validate our method and as an example we compute the temperature component of the Fisher Information Matrix in the two dimensional Ising model and show that it obeys the expected relation to the heat capacity and therefore peaks at the phase transition at the correct critical temperature."

--- The idea of a non-parametric estimate of Fisher information, which presumes a parametric model, is a little boggling. I should read it, I guess, but perhaps they have some sort of semi-parametric setting in mind?

to:NB
fisher_information
estimation
nonparametrics
statistics
to_be_shot_after_a_fair_trial
august 2015 by cshalizi

[1507.02061] Honest confidence regions and optimality in high-dimensional precision matrix estimation

august 2015 by cshalizi

"We propose methodology for estimation of sparse precision matrices and statistical inference for their low-dimensional parameters in a high-dimensional setting where the number of parameters p can be much larger than the sample size. We show that the novel estimator achieves minimax rates in supremum norm and the low-dimensional components of the estimator have a Gaussian limiting distribution. These results hold uniformly over the class of precision matrices with row sparsity of small order n‾‾√/logp and spectrum uniformly bounded, under sub-Gaussian tail assumption on the margins of the true underlying distribution. Consequently, our results lead to uniformly valid confidence regions for low-dimensional parameters of the precision matrix. Thresholding the estimator leads to variable selection without imposing irrepresentability conditions. The performance of the method is demonstrated in a simulation study."

to:NB
confidence_sets
estimation
high-dimensional_statistics
statistics
van_de_geer.sara
august 2015 by cshalizi

[1507.04118] Oracle inequalities for network models and sparse graphon estimation

august 2015 by cshalizi

"Inhomogeneous random graph models encompass many network models such as stochastic block models and latent position models. In this paper, we study two estimators -- the ordinary block constant least squares estimator, and its restricted version. We show that they satisfy oracle inequalities with respect to the block constant oracle. As a consequence, we derive optimal rates of estimation of the probability matrix. Our results cover the important setting of sparse networks. Nonparametric rates for graphon estimation in the L2 norm are also derived when the probability matrix is sampled according to a graphon model. The results shed light on the differences between estimation under the empirical loss (the probability matrix estimation) and under the integrated loss (the graphon estimation)."

to:NB
network_data_analysis
graph_limits
statistics
minimax
estimation
re:smoothing_adjacency_matrices
august 2015 by cshalizi

[1507.04553] Approximate Maximum Likelihood Estimation

august 2015 by cshalizi

"In recent years, methods of approximate parameter estimation have attracted considerable interest in complex problems where exact likelihoods are hard to obtain. In their most basic form, Bayesian methods such as Approximate Bayesian Computation (ABC) involve sampling from the parameter space and keeping those parameters that produce data that fit sufficiently well to the actually observed data. Exploring the whole parameter space, however, makes this approach inefficient in high dimensional problems. This led to the proposal of more sophisticated iterative methods of inference such as particle filters.

"Here, we propose an alternative approach that is based on stochastic gradient methods and applicable both in a frequentist and a Bayesian setting. By moving along a simulated gradient, the algorithm produces a sequence of estimates that will eventually converge either to the maximum likelihood estimate or to the maximum of the posterior distribution, in each case under a set of observed summary statistics. To avoid reaching only a local maximum, we propose to run the algorithm from a set of random starting values.

"As good tuning of the algorithm is important, we explored several tuning strategies, and propose a set of guidelines that worked best in our simulations. We investigate the performance of our approach in simulation studies, and also apply the algorithm to two models with intractable likelihood functions. First, we present an application to inference in the context of queuing systems. We also re-analyze population genetic data and estimate parameters describing the demographic history of Sumatran and Bornean orang-utan populations."

in_NB
statistics
computational_statistics
stochastic_approximation
likelihood
estimation
primates
"Here, we propose an alternative approach that is based on stochastic gradient methods and applicable both in a frequentist and a Bayesian setting. By moving along a simulated gradient, the algorithm produces a sequence of estimates that will eventually converge either to the maximum likelihood estimate or to the maximum of the posterior distribution, in each case under a set of observed summary statistics. To avoid reaching only a local maximum, we propose to run the algorithm from a set of random starting values.

"As good tuning of the algorithm is important, we explored several tuning strategies, and propose a set of guidelines that worked best in our simulations. We investigate the performance of our approach in simulation studies, and also apply the algorithm to two models with intractable likelihood functions. First, we present an application to inference in the context of queuing systems. We also re-analyze population genetic data and estimate parameters describing the demographic history of Sumatran and Bornean orang-utan populations."

august 2015 by cshalizi
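
For reference, a minimal sketch of the basic ABC rejection scheme the abstract above describes in its first paragraph (draw parameters, simulate, keep the draws whose simulated summary is close to the observed one). This is the baseline the paper argues against, not the paper's stochastic-gradient method. The toy model (Gaussian mean, unit variance, uniform prior on [-5, 5]), the tolerance, and all names are assumptions for illustration.

```python
import random
import statistics


def abc_rejection(observed, sample_prior, simulate, summary, tolerance, n_draws, rng):
    """Keep the prior draws whose simulated summary lands within `tolerance` of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)
        fake = simulate(theta, len(observed), rng)
        if abs(summary(fake) - s_obs) <= tolerance:
            accepted.append(theta)
    return accepted


def sample_prior(rng):
    # Assumed prior: uniform on [-5, 5].
    return rng.uniform(-5.0, 5.0)


def simulate(theta, n, rng):
    # Assumed model: n independent Gaussian draws with mean theta and unit variance.
    return [rng.gauss(theta, 1.0) for _ in range(n)]


rng = random.Random(0)
observed = simulate(2.0, 50, rng)
accepted = abc_rejection(observed, sample_prior, simulate, statistics.mean,
                         tolerance=0.1, n_draws=5000, rng=rng)
print(len(accepted), statistics.mean(accepted))
```

Only a small fraction of the draws survive even in one dimension, which is the inefficiency the abstract points to for high-dimensional problems.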

[1507.08612] Likelihood-free inference in high-dimensional models

august 2015 by cshalizi

"Methods that bypass analytical evaluations of the likelihood function have become an indispensable tool for statistical inference in many fields of science. These so-called likelihood-free methods rely on accepting and rejecting simulations based on summary statistics, which limits them to low dimensional models for which the absolute likelihood is large enough to result in manageable acceptance rates. To get around these issues, we introduce a novel, likelihood-free Markov-Chain Monte Carlo (MCMC) method combining two key innovations: updating only one parameter per iteration and accepting or rejecting this update based on subsets of statistics sufficient for this parameter. This increases acceptance rates dramatically, rendering this approach suitable even for models of very high dimensionality. We further derive that for linear models, a one dimensional combination of statistics per parameter is sufficient and can be found empirically with simulations. Finally, we demonstrate that our method readily scales to models of very high dimensionality using both toy models as well as by jointly inferring the effective population size, the distribution of fitness effects of new mutations (DFE) and selection coefficients for each locus from data of a recent experiment on the evolution of drug-resistance in Influenza."

to:NB
simulation
approximate_bayesian_computation
statistics
estimation
august 2015 by cshalizi

Stable Spectral Learning Based on Schur Decomposition

july 2015 by cshalizi

"Spectral methods are a powerful tool for inferring the parameters of certain classes of probability distributions by means of standard eigenvalue- eigenvector decompositions. Spectral algorithms can be orders of magnitude faster than log- likelihood based and related iterative methods, and, thanks to the uniqueness of the spectral de- composition, they enjoy global optimality guar- antees. In practice, however, the applicability of spectral methods is limited due to their sensitiv- ity to model misspecification, which can cause instability issues in the case of non-exact models. We present a new spectral approach that is based on the Schur triangularization of an observable matrix, and we carry out the corresponding theo- retical analysis. Our main result is a bound on the estimation error that is shown to depend linearly on the condition number of the ground-truth con- ditional probability matrix and inversely on the eigengap of an observable matrix. Numerical ex- periments show that the proposed method is more stable, and performs better in general, than the classical spectral approach using direct matrix di- agonalization."

to:NB
to_read
spectral_methods
mixture_models
statistics
computational_statistics
estimation
misspecification
july 2015 by cshalizi

Model Search by Bootstrap "Bumping" on JSTOR

july 2015 by cshalizi

"We propose a bootstrap-based method for enhancing a search through a space of models. The technique is well suited to complex, adaptively fitted models--it provides a convenient method for finding better local minima and for resistant fitting. Applications to regression, classification, and density estimation are described. We also provide results on the asymptotic behavior of bumping estimates."

to:NB
estimation
bootstrap
statistics
tibshirani.robert
july 2015 by cshalizi

[1506.01831] Handy sufficient conditions for the convergence of the maximum likelihood estimator in observation-driven models

july 2015 by cshalizi

"This paper generalizes asymptotic properties obtained in the observation-driven times series models considered by \cite{dou:kou:mou:2013} in the sense that the conditional law of each observation is also permitted to depend on the parameter. The existence of ergodic solutions and the consistency of the Maximum Likelihood Estimator (MLE) are derived under easy-to-check conditions. The obtained conditions appear to apply for a wide class of models. We illustrate our results with specific observation-driven times series, including the recently introduced NBIN-GARCH and NM-GARCH models, demonstrating the consistency of the MLE for these two models."

in_NB
statistics
likelihood
estimation
statistical_inference_for_stochastic_processes
douc.randal
chains_with_complete_connections
july 2015 by cshalizi

Information-theoretic optimality of observation-driven time series models for continuous responses

june 2015 by cshalizi

"We investigate information-theoretic optimality properties of the score function of the predictive likelihood as a device for updating a real-valued time-varying parameter in a univariate observation-driven model with continuous responses. We restrict our attention to models with updates of one lag order. The results provide theoretical justification for a class of score-driven models which includes the generalized autoregressive conditional heteroskedasticity model as a special case. Our main contribution is to show that only parameter updates based on the score will always reduce the local Kullback–Leibler divergence between the true conditional density and the model-implied conditional density. This result holds irrespective of the severity of model misspecification. We also show that use of the score leads to a considerably smaller global Kullback–Leibler divergence in empirically relevant settings. We illustrate the theory with an application to time-varying volatility models. We show that the reduction in Kullback–Leibler divergence across a range of different settings can be substantial compared to updates based on, for example, squared lagged observations."

in_NB
statistics
information_theory
estimation
likelihood
prediction
time_series
chains_with_complete_connections
june 2015 by cshalizi

Biased and Inefficient - Superefficiency

may 2015 by cshalizi

I wonder if this is related to the results about how superefficiency is only possible on a computable set of parameter values?

statistics
estimation
may 2015 by cshalizi

Neyman: "Two Aspects of the Representative Method" (1934)

march 2015 by cshalizi

This is a pretty amazing paper, not just for introducing confidence intervals, but also for setting the pattern for a huge area of statistics pretty much down to the present.

in_NB
have_read
statistics
estimation
surveys
sampling
confidence_sets
neyman.jerzy
march 2015 by cshalizi

Fu : Large Sample Point Estimation: A Large Deviation Theory Approach

february 2015 by cshalizi

"In this paper the exponential rates of decrease and bounds on tail probabilities for consistent estimators are studied using large deviation methods. The asymptotic expansions of Bahadur bounds and exponential rates in the case of the maximum likelihood estimator are obtained. Based on these results we have obtained a result parallel to the Fisher-Rao-Efron result concerning second-order efficiency (see Efron, 1975). Our results also substantiate the geometric observation given by Efron (1975) that if the statistical curvature of the underlying distribution is small, then the maximum likelihood estimator is nearly optimal.'

in_NB
large_deviations
statistics
estimation
have_read
february 2015 by cshalizi

[1404.1578] Models as Approximations: How Random Predictors and Model Violations Invalidate Classical Inference in Regression

february 2015 by cshalizi

"We review and interpret the early insights of Halbert White who over thirty years ago inaugurated a form of statistical inference for regression models that is asymptotically correct even under "model misspecification," that is, under the assumption that models are approximations rather than generative truths. This form of inference, which is pervasive in econometrics, relies on the "sandwich estimator" of standard error. Whereas linear models theory in statistics assumes models to be true and predictors to be fixed, White's theory permits models to be approximate and predictors to be random. Careful reading of his work shows that the deepest consequences for statistical inference arise from a synergy --- a "conspiracy" --- of nonlinearity and randomness of the predictors which invalidates the ancillarity argument that justifies conditioning on the predictors when they are random. Unlike the standard error of linear models theory, the sandwich estimator provides asymptotically correct inference in the presence of both nonlinearity and heteroskedasticity. An asymptotic comparison of the two types of standard error shows that discrepancies between them can be of arbitrary magnitude. If there exist discrepancies, standard errors from linear models theory are usually too liberal even though occasionally they can be too conservative as well. A valid alternative to the sandwich estimator is provided by the "pairs bootstrap"; in fact, the sandwich estimator can be shown to be a limiting case of the pairs bootstrap. We conclude by giving meaning to regression slopes when the linear model is an approximation rather than a truth. --- In this review we limit ourselves to linear least squares regression, but many qualitative insights hold for most forms of regression."

-- Very close to what I teach in my class, though I haven't really talked about sandwich variances.

in_NB
have_read
statistics
regression
linear_regression
bootstrap
misspecification
estimation
approximation
february 2015 by cshalizi
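
A small worked sketch of the two standard errors the abstract above compares, for the slope in a one-predictor least-squares fit with an intercept. The "sandwich" version here is the basic White (HC0) form with no finite-sample correction; the function name and the three-point data set are assumptions for illustration, not anything taken from the paper.

```python
import math


def slope_standard_errors(x, y):
    """Return (slope, classical SE, HC0 sandwich SE) for simple least-squares regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Classical linear-models formula: one common noise variance, estimated with n - 2 df.
    classical_var = sum(e * e for e in resid) / (n - 2) / sxx
    # Sandwich (HC0): each squared residual is weighted by its own squared centered predictor.
    sandwich_var = sum((d * e) ** 2 for d, e in zip(dx, resid)) / sxx ** 2
    return slope, math.sqrt(classical_var), math.sqrt(sandwich_var)


# A curved relationship fitted with a straight line, so the linear model is only an approximation.
slope, se_classical, se_sandwich = slope_standard_errors([0, 1, 2], [0, 1, 4])
print(slope, se_classical, se_sandwich)
```

With three points this is only an arithmetic check; the abstract's claims about which standard error is too liberal are asymptotic and are not demonstrated by this example.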

[1409.7458] Beyond Maximum Likelihood: from Theory to Practice

january 2015 by cshalizi

"Maximum likelihood is the most widely used statistical estimation technique. Recent work by the authors introduced a general methodology for the construction of estimators for functionals in parametric models, and demonstrated improvements - both in theory and in practice - over the maximum likelihood estimator (MLE), particularly in high dimensional scenarios involving parameter dimension comparable to or larger than the number of samples. This approach to estimation, building on results from approximation theory, is shown to yield minimax rate-optimal estimators for a wide class of functionals, implementable with modest computational requirements. In a nutshell, a message of this recent work is that, for a wide class of functionals, the performance of these essentially optimal estimators with n samples is comparable to that of the MLE with nlnn samples.

"In the present paper, we highlight the applicability of the aforementioned methodology to statistical problems beyond functional estimation, and show that it can yield substantial gains. For example, we demonstrate that for learning tree-structured graphical models, our approach achieves a significant reduction of the required data size compared with the classical Chow--Liu algorithm, which is an implementation of the MLE, to achieve the same accuracy. The key step in improving the Chow--Liu algorithm is to replace the empirical mutual information with the estimator for mutual information proposed by the authors. Further, applying the same replacement approach to classical Bayesian network classification, the resulting classifiers uniformly outperform the previous classifiers on 26 widely used datasets."

to:NB
estimation
likelihood
statistics
"In the present paper, we highlight the applicability of the aforementioned methodology to statistical problems beyond functional estimation, and show that it can yield substantial gains. For example, we demonstrate that for learning tree-structured graphical models, our approach achieves a significant reduction of the required data size compared with the classical Chow--Liu algorithm, which is an implementation of the MLE, to achieve the same accuracy. The key step in improving the Chow--Liu algorithm is to replace the empirical mutual information with the estimator for mutual information proposed by the authors. Further, applying the same replacement approach to classical Bayesian network classification, the resulting classifiers uniformly outperform the previous classifiers on 26 widely used datasets."

january 2015 by cshalizi

[1411.2045] Multivariate f-Divergence Estimation With Confidence

january 2015 by cshalizi

"The problem of f-divergence estimation is important in the fields of machine learning, information theory, and statistics. While several nonparametric divergence estimators exist, relatively few have known convergence properties. In particular, even for those estimators whose MSE convergence rates are known, the asymptotic distributions are unknown. We establish the asymptotic normality of a recently proposed ensemble estimator of f-divergence between two distributions from a finite number of samples. This estimator has MSE convergence rate of O(1/T), is simple to implement, and performs well in high dimensions. This theory enables us to perform divergence-based inference tasks such as testing equality of pairs of distributions based on empirical samples. We experimentally validate our theoretical results and, as an illustration, use them to empirically bound the best achievable classification error."

estimation
entropy_estimation
information_theory
statistics
two-sample_tests
in_NB
hero.alfred_o._iii
january 2015 by cshalizi

[1411.4723] A Frequentist Approach to Computer Model Calibration

january 2015 by cshalizi

"This paper considers the computer model calibration problem and provides a general frequentist solution. Under the proposed framework, the data model is semi-parametric with a nonparametric discrepancy function which accounts for any discrepancy between the physical reality and the computer model. In an attempt to solve a fundamentally important (but often ignored) identifiability issue between the computer model parameters and the discrepancy function, this paper proposes a new and identifiable parametrization of the calibration problem. It also develops a two-step procedure for estimating all the relevant quantities under the new parameterization. This estimation procedure is shown to enjoy excellent rates of convergence and can be straightforwardly implemented with existing software. For uncertainty quantification, bootstrapping is adopted to construct confidence regions for the quantities of interest. The practical performance of the proposed methodology is illustrated through simulation examples and an application to a computational fluid dynamics model."

- i.e., pick the parameter value where a nonparametric regression of the residuals is as small as possible on average.

to:NB
simulation
statistics
estimation
re:stacs
january 2015 by cshalizi

[1412.8695] On Particle Methods for Parameter Estimation in State-Space Models

january 2015 by cshalizi

"Nonlinear non-Gaussian state-space models are ubiquitous in statistics, econometrics, information engineering and signal processing. Particle methods, also known as Sequential Monte Carlo (SMC) methods, provide reliable numerical approximations to the associated state inference problems. However, in most applications, the state-space model of interest also depends on unknown static parameters that need to be estimated from the data. In this context, standard particle methods fail and it is necessary to rely on more sophisticated algorithms. The aim of this paper is to present a comprehensive review of particle methods that have been proposed to perform static parameter estimation in state-space models. We discuss the advantages and limitations of these methods and illustrate their performance on simple models."

to:NB
particle_filters
time_series
statistics
estimation
state-space_models
january 2015 by cshalizi

Maximum Likelihood Estimation of Misspecified Models

september 2014 by cshalizi

"This paper examines the consequences and detection of model misspecification when using maximum likelihood techniques for estimation and inference. The quasi-maximum likelihood estimator (OMLE) converges to a well defined limit, and may or may not be consistent for particular parameters of interest. Standard tests (Wald, Lagrange Multiplier, or Likelihood Ratio) are invalid in the presence of misspecification, but more general statistics are given which allow inferences to be drawn robustly. The properties of the QMLE and the information matrix are exploited to yield several useful tests for model misspecification."

to:NB
likelihood
estimation
misspecification
statistics
white.halbert
september 2014 by cshalizi

Interference Between Units in Randomized Experiments - Journal of the American Statistical Association - Volume 102, Issue 477

august 2014 by cshalizi

"In a randomized experiment comparing two treatments, there is interference between units if applying the treatment to one unit may affect other units. Interference implies that treatment effects are not comparisons of two potential responses that a unit may exhibit, one under treatment and the other under control, but instead are inherently more complex. Interference is common in social settings where people communicate, compete, or spread disease; in studies that treat one part of an organism using a symmetrical part as control; in studies that apply different treatments to the same organism at different times; and in many other situations. Available statistical tools are limited. For instance, Fisher's sharp null hypothesis of no treatment effect implicitly entails no interference, and so his randomization test may be used to test no effect, but conventional ways of inverting the test to obtain confidence intervals, say for an additive effect, are not applicable with interference. Another commonly used approach assumes that interference is of a simple parametric form confined to units that are near one another in time or space; this is useful when applicable but is of little use when interference may be widespread and of uncertain form. Exact, nonparametric methods are developed for inverting randomization tests to obtain confidence intervals for magnitudes of effect assuming nothing at all about the structure of the interference between units. The limitations of these methods are discussed. To illustrate the general approach, two simple methods and two simple empirical examples are discussed. Extension to randomization based covariance adjustment is briefly described."

causal_inference
experimental_design
statistics
estimation
network_data_analysis
social_influence
rosenbaum.paul
to_read
in_NB
august 2014 by cshalizi

[1408.1554] A complete data frame work for fitting power law distributions

august 2014 by cshalizi

"Over the last few decades power law distributions have been suggested as forming generative mechanisms in a variety of disparate fields, such as, astrophysics, criminology and database curation. However, fitting these heavy tailed distributions requires care, especially since the power law behaviour may only be present in the distributional tail. Current state of the art methods for fitting these models rely on estimating the cut-off parameter xmin. This results in the majority of collected data being discarded. This paper provides an alternative, principled approached for fitting heavy tailed distributions. By directly modelling the deviation from the power law distribution, we can fit and compare a variety of competing models in a single unified framework."

to:NB
heavy_tails
statistics
estimation
to_read
august 2014 by cshalizi

[1408.1182] Empirical non-parametric estimation of the Fisher Information

august 2014 by cshalizi

"The Fisher information matrix (FIM) is a foundational concept in statistical signal processing. The FIM depends on the probability distribution, assumed to belong to a smooth parametric family. Traditional approaches to estimating the FIM require estimating the probability distribution, or its parameters, along with its gradient or Hessian. However, in many practical situations the probability distribution of the data is not known. Here we propose a method of estimating the Fisher information directly from the data that does not require knowledge of the underlying probability distribution. The method is based on non-parametric estimation of an f-divergence over a local neighborhood of the parameter space and a relation between curvature of the f-divergence and the FIM. Thus we obtain an empirical estimator of the FIM that does not require density estimation and is asymptotically consistent. We empirically evaluate the validity of our approach using two experiments.'

in_NB
statistics
fisher_information
estimation
hero.alfred_o._iii
nonparametrics
entropy_estimation
august 2014 by cshalizi

Rydén : EM versus Markov chain Monte Carlo for estimation of hidden Markov models: a computational perspective

july 2014 by cshalizi

"Hidden Markov models (HMMs) and related models have become standard in statistics during the last 15--20 years, with applications in diverse areas like speech and other statistical signal processing, hydrology, financial statistics and econometrics, bioinformatics etc. Inference in HMMs is traditionally often carried out using the EM algorithm, but examples of Bayesian estimation, in general implemented through Markov chain Monte Carlo (MCMC) sampling are also frequent in the HMM literature. The purpose of this paper is to compare the EM and MCMC approaches in three cases of different complexity; the examples include model order selection, continuous-time HMMs and variants of HMMs in which the observed data depends on many hidden variables in an overlapping fashion. All these examples in some way or another originate from real-data applications. Neither EM nor MCMC analysis of HMMs is a black-box methodology without need for user-interaction, and we will illustrate some of the problems, like poor mixing and long computation times, one may expect to encounter."

to:NB
em_algorithm
monte_carlo
markov_models
state-space_models
estimation
statistics
computational_statistics
ryden.tobias
july 2014 by cshalizi

[1406.0063] Causal network inference using biochemical kinetics

july 2014 by cshalizi

"Network models are widely used as structural summaries of biochemical systems. Statistical estimation of networks is usually based on linear or discrete models. However, the dynamics of these systems are generally nonlinear, suggesting that suitable nonlinear formulations may offer gains with respect to network inference and associated prediction problems. We present a general framework for both network inference and dynamical prediction that is rooted in nonlinear biochemical kinetics. This is done by considering a dynamical system based on a chemical reaction graph and associated kinetics parameters. Inference regarding both parameters and the reaction graph itself is carried out within a fully Bayesian framework. Prediction of dynamical behavior is achieved by averaging over both parameters and reaction graphs, allowing prediction even when the underlying reactions themselves are unknown or uncertain. Results, based on (i) data simulated from a mechanistic model of mitogen-activated protein kinase signaling and (ii) phosphoproteomic data from cancer cell lines, demonstrate that nonlinear formulations can yield gains in network inference and permit dynamical prediction in the challenging setting where the reaction graph is unknown."

to:NB
biochemical_networks
graphical_models
statistics
estimation
july 2014 by cshalizi

[1406.6956] Order-Optimal Estimation of Functionals of Discrete Distributions

july 2014 by cshalizi

"We propose a general framework for the construction and analysis of estimators for a wide class of functionals of discrete distributions, where the alphabet size S is unknown and may be scaling with the number of observations n. We treat the respective regions where the functional is "nonsmooth" and "smooth" separately. In the "nonsmooth" regime, we apply an unbiased estimator for the best polynomial approximation of the functional whereas, in the "smooth" regime, we apply a bias-corrected version of the Maximum Likelihood Estimator (MLE).

"We illustrate the merit of this approach by thoroughly analyzing the performance of the resulting schemes for estimating two important information measures: the entropy and the R\'enyi entropy of order α. We obtain the best known upper bounds for the maximum mean squared error incurred in estimating these functionals. In particular, we demonstrate that our estimator achieves the optimal sample complexity n=Θ(S/lnS) for entropy estimation. We also demonstrate that it suffices to have n=ω(S1/α/lnS) for estimating the R\'enyi entropy of order α, 0<α<1. Conversely, we establish a minimax lower bound that establishes optimality of this sample complexity to within a lnS‾‾‾‾√ factor.

"We highlight the practical advantages of our schemes for the estimation of entropy and mutual information. We compare our performance with the popular MLE and with the order-optimal entropy estimator of Valiant and Valiant. As we illustrate with a few experiments, our approach results in shorter running time and higher accuracy."

in_NB
entropy_estimation
statistics
to_read
estimation
information_theory
"We illustrate the merit of this approach by thoroughly analyzing the performance of the resulting schemes for estimating two important information measures: the entropy and the R\'enyi entropy of order α. We obtain the best known upper bounds for the maximum mean squared error incurred in estimating these functionals. In particular, we demonstrate that our estimator achieves the optimal sample complexity n=Θ(S/lnS) for entropy estimation. We also demonstrate that it suffices to have n=ω(S1/α/lnS) for estimating the R\'enyi entropy of order α, 0<α<1. Conversely, we establish a minimax lower bound that establishes optimality of this sample complexity to within a lnS‾‾‾‾√ factor.

"We highlight the practical advantages of our schemes for the estimation of entropy and mutual information. We compare our performance with the popular MLE and with the order-optimal entropy estimator of Valiant and Valiant. As we illustrate with a few experiments, our approach results in shorter running time and higher accuracy."

july 2014 by cshalizi

[1406.6959] Maximum Likelihood Estimation of Functionals of Discrete Distributions

july 2014 by cshalizi

"The Maximum Likelihood Estimator (MLE) is widely used in estimating functionals of discrete probability distributions, and involves "plugging-in" the empirical distribution of the data. In this work we propose a general framework and procedure to analyze the performance of the MLE in estimating functionals of discrete distributions, under the worst-case mean squared error criterion. In particular, we use approximation theory to bound the bias incurred by the MLE, and concentration inequalities to bound the variance. We highlight our techniques by considering two important information measures: the entropy, and the R\'enyi entropy of order α. For entropy estimation, we show that it is necessary and sufficient to have n=ω(S) observations for the MLE to be consistent, where S represents the alphabet size. In addition, we obtain that it is necessary and sufficient to consider n=ω(S1/α) samples for the MLE to consistently estimate ∑Si=1pαi,0<α<1. For both these problems, the MLE achieves the best possible sample complexity up to logarithmic factors. When α>1, we show that n=ω(max(S2/α−1,1)) samples suffice."

in_NB
to_read
statistics
entropy_estimation
estimation
information_theory
july 2014 by cshalizi

[1405.5103] Estimation in high dimensions: a geometric perspective

june 2014 by cshalizi

"This tutorial paper provides an exposition of a flexible geometric framework for high dimensional estimation problems with constraints. The paper develops geometric intuition about high dimensional sets, justifies it with some results of asymptotic convex geometry, and demonstrates connections between geometric results and estimation problems. The theory is illustrated with applications to sparse recovery, matrix completion, quantization, linear and logistic regression and generalized linear models."

to:NB
statistics
high-dimensional_statistics
estimation
sparsity
convexity
geometry
compressed_sensing
to_read
june 2014 by cshalizi

Non-Asymptotic Analysis of Relational Learning with One Network | AISTATS 2014 | JMLR W&CP

april 2014 by cshalizi

"This theoretical paper is concerned with a rigorous non-asymptotic analysis of relational learning applied to a single network. Under suitable and intuitive conditions on features and clique dependencies over the network, we present the first probably approximately correct (PAC) bound for maximum likelihood estimation (MLE). To our best knowledge, this is the first sample complexity result of this problem. We propose a novel combinational approach to analyze complex dependencies of relational data, which is crucial to our non-asymptotic analysis. The consistency of MLE under our conditions is also proved as the consequence of our sample complexity bound. Finally, our combinational method for analyzing dependent data can be easily generalized to treat other generalized maximum likelihood estimators for relational learning."

in_NB
estimation
relational_learning
learning_theory
statistics
network_data_analysis
to_read
re:XV_for_networks
entableted
april 2014 by cshalizi
