**cshalizi + likelihood**

[1908.08741] A relation between log-likelihood and cross-validation log-scores

6 weeks ago by cshalizi

"It is shown that the log-likelihood of a hypothesis or model given some data is equivalent to an average of all leave-one-out cross-validation log-scores that can be calculated from all subsets of the data. This relation can be generalized to any k-fold cross-validation log-scores."

--- This sounds funny, because leave-one-out is (asymptotically) equivalent to the robustified AIC (= Takeuchi information criterion).

--- ETA after reading: The algebra looks legit, but kinda pointless.

statistics
likelihood
cross-validation
have_read
shot_after_a_fair_trial
not_worth_putting_in_notebooks
6 weeks ago by cshalizi
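--- The fixed-hypothesis case is easy to check numerically, and shows why the algebra can feel pointless: if the model is not refit on each fold, the leave-one-out log-score of a held-out point is just its log-density, so the total log-likelihood is exactly n times the mean LOO score. A sketch (plain Python; the Gaussian density is an arbitrary illustrative choice, not the paper's setup):

```python
import math
import random

# For a *fixed* hypothesis f (no refitting on each fold), the LOO-CV
# log-score of a held-out point is just log f(x_i), so the total
# log-likelihood equals n times the average LOO log-score.
def log_density(x, mu=0.0, sigma=1.0):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) \
           - (x - mu) ** 2 / (2 * sigma ** 2)

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(20)]

log_lik = sum(log_density(x) for x in data)
loo_scores = [log_density(x) for x in data]  # model unchanged by holding x out
avg_loo = sum(loo_scores) / len(loo_scores)
assert abs(log_lik - len(data) * avg_loo) < 1e-9
```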

A Likelihood Ratio Approach to Sequential Change Point Detection for a General Class of Parameters: Journal of the American Statistical Association: Vol 0, No 0

10 weeks ago by cshalizi

"In this article, we propose a new approach for sequential monitoring of a general class of parameters of a d-dimensional time series, which can be estimated by approximately linear functionals of the empirical distribution function. We consider a closed-end method, which is motivated by the likelihood ratio test principle and compare the new method with two alternative procedures. We also incorporate self-normalization such that estimation of the long-run variance is not necessary. We prove that for a large class of testing problems the new detection scheme has asymptotic level α and is consistent. The asymptotic theory is illustrated for the important cases of monitoring a change in the mean, variance, and correlation. By means of a simulation study it is demonstrated that the new test performs better than the currently available procedures for these problems. Finally, the methodology is illustrated by a small data example investigating index prices from the dot-com bubble."

to:NB
change-point_problem
likelihood
statistics
10 weeks ago by cshalizi
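--- Not the paper's self-normalized likelihood-ratio scheme, just the classic one-sided CUSUM monitor for a mean shift, to fix ideas about what sequential detection looks like (all parameter values here are arbitrary):

```python
import random

# One-sided CUSUM for an upward mean shift: s_t = max(0, s_{t-1} + (x_t - mu0)
# - drift), alarm when s_t crosses the threshold.
def cusum_detect(xs, mu0=0.0, drift=1.0, threshold=8.0):
    """First index at which the CUSUM exceeds threshold, else None."""
    s = 0.0
    for t, x in enumerate(xs):
        s = max(0.0, s + (x - mu0) - drift)
        if s > threshold:
            return t
    return None

random.seed(1)
pre = [random.gauss(0.0, 1.0) for _ in range(100)]
post = [random.gauss(3.0, 1.0) for _ in range(100)]

assert cusum_detect(pre) is None                  # no alarm before the change
alarm = cusum_detect(pre + post)
assert alarm is not None and 100 <= alarm < 150   # prompt alarm after it
```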

[1907.09611] Asymptotic normality, concentration, and coverage of generalized posteriors

11 weeks ago by cshalizi

"Generalized likelihoods are commonly used to obtain consistent estimators with attractive computational and robustness properties. Formally, any generalized likelihood can be used to define a generalized posterior distribution, but an arbitrarily defined "posterior" cannot be expected to appropriately quantify uncertainty in any meaningful sense. In this article, we provide sufficient conditions under which generalized posteriors exhibit concentration, asymptotic normality (Bernstein-von Mises), an asymptotically correct Laplace approximation, and asymptotically correct frequentist coverage. We apply our results in detail to generalized posteriors for a wide array of generalized likelihoods, including pseudolikelihoods in general, the Ising model pseudolikelihood, the Gaussian Markov random field pseudolikelihood, the fully observed Boltzmann machine pseudolikelihood, the Cox proportional hazards partial likelihood, and a median-based likelihood for robust inference of location. Further, we show how our results can be used to easily establish the asymptotics of standard posteriors for exponential families and generalized linear models. We make no assumption of model correctness so that our results apply with or without misspecification."

to:NB
bayesian_consistency
statistics
to_read
likelihood
misspecification
11 weeks ago by cshalizi
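--- A toy generalized posterior of the median-based kind the abstract mentions: flat prior times exp(−Σ|x_i − θ|), evaluated on a grid. The loss, grid, and scaling here are my illustrative choices, not the paper's construction:

```python
import math
import random

# Generalized posterior on a grid: flat prior times exp(-sum_i |x_i - theta|).
# The maximizer of this Laplace-type pseudo-likelihood is the sample median.
random.seed(2)
data = sorted(random.gauss(1.0, 1.0) for _ in range(201))
median = data[100]

grid = [i / 100.0 for i in range(-300, 500)]          # theta in [-3, 5)
log_post = [-sum(abs(x - th) for x in data) for th in grid]
m = max(log_post)
post_mode = grid[log_post.index(m)]

weights = [math.exp(lp - m) for lp in log_post]       # unnormalized posterior
post_mean = sum(g * w for g, w in zip(grid, weights)) / sum(weights)

assert abs(post_mode - median) <= 0.011   # mode = median, up to the grid
assert abs(post_mean - median) < 0.05     # posterior concentrates there too
```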

Mitchell, Allman, Rhodes: Hypothesis testing near singularities and boundaries

12 weeks ago by cshalizi

"The likelihood ratio statistic, with its asymptotic χ2χ2 distribution at regular model points, is often used for hypothesis testing. However, the asymptotic distribution can differ at model singularities and boundaries, suggesting the use of a χ2χ2 might be problematic nearby. Indeed, its poor behavior for testing near singularities and boundaries is apparent in simulations, and can lead to conservative or anti-conservative tests. Here we develop a new distribution designed for use in hypothesis testing near singularities and boundaries, which asymptotically agrees with that of the likelihood ratio statistic. For two example trinomial models, arising in the context of inference of evolutionary trees, we show the new distributions outperform a χ2χ2."

to:NB
hypothesis_testing
likelihood
statistics
12 weeks ago by cshalizi
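--- The standard one-dimensional boundary example (not the paper's trinomial models): testing μ = 0 against μ ≥ 0 from a single N(μ, 1) draw gives the LRT statistic max(Z, 0)², whose null law is the mixture ½χ²₀ + ½χ²₁ rather than χ²₁:

```python
import random

# Testing mu = 0 against mu >= 0 from one N(mu,1) draw: the LRT statistic is
# max(Z,0)^2 under H0, i.e. the mixture 0.5*chi2_0 + 0.5*chi2_1.  About half
# the simulated statistics are exactly zero, and the 95% critical value is
# about 2.71, not the chi2_1 value 3.84.
random.seed(3)
n = 100_000
lrt = sorted(max(random.gauss(0.0, 1.0), 0.0) ** 2 for _ in range(n))
frac_zero = sum(1 for t in lrt if t == 0.0) / n
q95 = lrt[int(0.95 * n)]

assert abs(frac_zero - 0.5) < 0.01
assert abs(q95 - 2.71) < 0.15
```

Using the χ²₁ critical value 3.84 here would give a conservative test, which is the kind of failure the abstract warns about.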

Likelihood Ratio Tests for a Large Directed Acyclic Graph: Journal of the American Statistical Association: Vol 0, No 0

12 weeks ago by cshalizi

"Inference of directional pairwise relations between interacting units in a directed acyclic graph (DAG), such as a regulatory gene network, is common in practice, imposing challenges because of lack of inferential tools. For example, inferring a specific gene pathway of a regulatory gene network is biologically important. Yet, frequentist inference of directionality of connections remains largely unexplored for regulatory models. In this article, we propose constrained likelihood ratio tests for inference of the connectivity as well as directionality subject to nonconvex acyclicity constraints in a Gaussian directed graphical model. Particularly, we derive the asymptotic distributions of the constrained likelihood ratios in a high-dimensional situation. For testing of connectivity, the asymptotic distribution is either chi-squared or normal depending on if the number of testable links in a DAG model is small. For testing of directionality, the asymptotic distribution is the minimum of d independent chi-squared variables with one-degree of freedom or a generalized Gamma distribution depending on if d is small, where d is number of breakpoints in a hypothesized pathway. Moreover, we develop a computational method to perform the proposed tests, which integrates an alternating direction method of multipliers and difference convex programming. Finally, the power analysis and simulations suggest that the tests achieve the desired objectives of inference. An analysis of an Alzheimer’s disease gene expression dataset illustrates the utility of the proposed method to infer a directed pathway in a gene network."

to:NB
graphical_models
likelihood
hypothesis_testing
statistics
12 weeks ago by cshalizi

[1703.07963] A Donsker-type Theorem for Log-likelihood Processes

june 2019 by cshalizi

"Let (Ω,,()t≥0,P) be a complete stochastic basis, X a semimartingale with predictable compensator (B,C,ν). Consider a family of probability measures P=(Pn,ψ,ψ∈Ψ,n≥1), where Ψ is an index set, Pn,ψ≪locP, and denote the likelihood ratio process by Zn,ψt=dPn,ψ|tdP|t. Under some regularity conditions in terms of logarithm entropy and Hellinger processes, we prove that logZnt converges weakly to a Gaussian process in ℓ∞(Ψ) as n→∞ for each fixed t>0."

to:NB
statistics
likelihood
convergence_of_stochastic_processes
june 2019 by cshalizi

[1905.11505] Validation of Approximate Likelihood and Emulator Models for Computationally Intensive Simulations

may 2019 by cshalizi

"Complex phenomena are often modeled with computationally intensive feed-forward simulations for which a tractable analytic likelihood does not exist. In these cases, it is sometimes necessary to use an approximate likelihood or faster emulator model for efficient statistical inference. We describe a new two-sample testing framework for quantifying the quality of the fit to simulations at fixed parameter values. This framework can leverage any regression method to handle complex high-dimensional data and attain higher power in settings where well-known distance-based tests would not. We also introduce a statistically rigorous test for assessing global goodness-of-fit across simulation parameters. In cases where the fit is inadequate, our method provides valuable diagnostics by allowing one to identify regions in both feature and parameter space which the model fails to reproduce well. We provide both theoretical results and examples which illustrate the effectiveness of our approach."

to:NB
statistics
simulation
likelihood
kith_and_kin
lee.ann_b.
izbicki.rafael
may 2019 by cshalizi
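--- The flavor of a regression/classifier two-sample test, in a sketch that is mine and not the authors' implementation: fit a classifier to distinguish the two samples; accuracy above chance flags a difference that a mean-difference statistic would miss. (Permutation calibration of the accuracy is omitted for brevity; the quadratic feature is an illustrative choice.)

```python
import math
import random

# Same means, different variances: a mean-difference statistic is blind, but
# logistic regression on a quadratic feature separates the samples.
random.seed(4)
n = 500
xs = [random.gauss(0.0, 1.0) for _ in range(n)] + \
     [random.gauss(0.0, 2.0) for _ in range(n)]
labels = [0] * n + [1] * n

# single standardized feature x^2, plus an intercept
sq = [x * x for x in xs]
m = sum(sq) / len(sq)
s = (sum((v - m) ** 2 for v in sq) / len(sq)) ** 0.5
feats = [[1.0, (v - m) / s] for v in sq]

w = [0.0, 0.0]
for _ in range(1000):                      # batch gradient descent
    grad = [0.0, 0.0]
    for xi, yi in zip(feats, labels):
        p = 1.0 / (1.0 + math.exp(-(w[0] * xi[0] + w[1] * xi[1])))
        grad[0] += (p - yi) * xi[0]
        grad[1] += (p - yi) * xi[1]
    w = [wj - 0.5 * g / len(feats) for wj, g in zip(w, grad)]

correct = 0
for xi, yi in zip(feats, labels):
    p = 1.0 / (1.0 + math.exp(-(w[0] * xi[0] + w[1] * xi[1])))
    correct += int((p > 0.5) == (yi == 1))
acc = correct / len(feats)
assert acc > 0.6   # well above the 0.5 chance level
```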

[1805.07454] Fisher Efficient Inference of Intractable Models

may 2019 by cshalizi

"Maximum Likelihood Estimators (MLE) has many good properties. For example, the asymptotic variance of MLE solution attains equality of the asymptotic Cram{é}r-Rao lower bound (efficiency bound), which is the minimum possible variance for an unbiased estimator. However, obtaining such MLE solution requires calculating the likelihood function which may not be tractable due to the normalization term of the density model. In this paper, we derive a Discriminative Likelihood Estimator (DLE) from the Kullback-Leibler divergence minimization criterion implemented via density ratio estimation procedure and Stein operator. We study the problem of model inference using DLE. We prove its consistency and show the asymptotic variance of its solution can also attain the equality of the efficiency bound under mild regularity conditions. We also propose a dual formulation of DLE which can be easily optimized. Numerical studies validate our asymptotic theorems and we give an example where DLE successfully estimates an intractable model constructed using a pre-trained deep neural network."

to:NB
likelihood
estimation
statistics
may 2019 by cshalizi

[1905.09715] An illustration of the risk of borrowing information via a shared likelihood

may 2019 by cshalizi

"A concrete, stylized example illustrates that inferences may be degraded, rather than improved, by incorporating supplementary data via a joint likelihood. In the example, the likelihood is assumed to be correctly specified, as is the prior over the parameter of interest; all that is necessary for the joint modeling approach to suffer is misspecification of the prior over a nuisance parameter."

to:NB
misspecification
likelihood
statistics
hahn.p._richard
may 2019 by cshalizi

Bretó: Modeling and Inference for Infectious Disease Dynamics: A Likelihood-Based Approach

may 2019 by cshalizi

"Likelihood-based statistical inference has been considered in most scientific fields involving stochastic modeling. This includes infectious disease dynamics, where scientific understanding can help capture biological processes in so-called mechanistic models and their likelihood functions. However, when the likelihood of such mechanistic models lacks a closed-form expression, computational burdens are substantial. In this context, algorithmic advances have facilitated likelihood maximization, promoting the study of novel data-motivated mechanistic models over the last decade. Reviewing these models is the focus of this paper. In particular, we highlight statistical aspects of these models like overdispersion, which is key in the interface between nonlinear infectious disease modeling and data analysis. We also point out potential directions for further model exploration."

to:NB
epidemic_models
likelihood
statistics
may 2019 by cshalizi

A Composite Likelihood Framework for Analyzing Singular DSGE Models | The Review of Economics and Statistics | MIT Press Journals

january 2019 by cshalizi

"This paper builds on the composite likelihood concept of Lindsay (1988) to develop a framework for parameter identification, estimation, inference, and forecasting in dynamic stochastic general equilibrium (DSGE) models allowing for stochastic singularity. The framework consists of four components. First, it provides a necessary and sufficient condition for parameter identification, where the identifying information is provided by the first- and second-order properties of nonsingular submodels. Second, it provides a procedure based on Markov Chain Monte Carlo for parameter estimation. Third, it delivers confidence sets for structural parameters and impulse responses that allow for model misspecification. Fourth, it generates forecasts for all the observed endogenous variables, irrespective of the number of shocks in the model. The framework encompasses the conventional likelihood analysis as a special case when the model is nonsingular. It enables the researcher to start with a basic model and then gradually incorporate more shocks and other features, meanwhile confronting all the models with the data to assess their implications. The methodology is illustrated using both small- and medium-scale DSGE models. These models have numbers of shocks ranging between 1 and 7."

to:NB
state-space_models
economics
time_series
macroeconomics
statistics
likelihood
re:your_favorite_dsge_sucks
january 2019 by cshalizi
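--- Lindsay-style composite likelihood in miniature, with nothing DSGE about it: replace the joint likelihood of an equicorrelated Gaussian triple by the sum of its three pairwise bivariate-normal log-likelihoods, and maximize over the common correlation:

```python
import math
import random

# Pairwise composite likelihood: sum of bivariate-normal log-likelihoods over
# all pairs; maximizing over rho still recovers the true correlation.
random.seed(11)
rho_true = 0.6

def draw_triple(rho):
    f = random.gauss(0.0, 1.0)             # shared factor => equicorrelation
    return [math.sqrt(rho) * f + math.sqrt(1.0 - rho) * random.gauss(0.0, 1.0)
            for _ in range(3)]

data = [draw_triple(rho_true) for _ in range(3000)]

def pairwise_cl(rho):
    cl = 0.0
    for x in data:
        for i, j in ((0, 1), (0, 2), (1, 2)):
            a, b = x[i], x[j]
            # bivariate normal log-density (unit variances), constants dropped
            q = (a * a - 2.0 * rho * a * b + b * b) / (1.0 - rho * rho)
            cl += -0.5 * (math.log(1.0 - rho * rho) + q)
    return cl

rho_hat = max((r / 100.0 for r in range(1, 100)), key=pairwise_cl)
assert abs(rho_hat - rho_true) < 0.06
```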

[1507.04553] Approximate Maximum Likelihood Estimation

august 2015 by cshalizi

"In recent years, methods of approximate parameter estimation have attracted considerable interest in complex problems where exact likelihoods are hard to obtain. In their most basic form, Bayesian methods such as Approximate Bayesian Computation (ABC) involve sampling from the parameter space and keeping those parameters that produce data that fit sufficiently well to the actually observed data. Exploring the whole parameter space, however, makes this approach inefficient in high dimensional problems. This led to the proposal of more sophisticated iterative methods of inference such as particle filters.

"Here, we propose an alternative approach that is based on stochastic gradient methods and applicable both in a frequentist and a Bayesian setting. By moving along a simulated gradient, the algorithm produces a sequence of estimates that will eventually converge either to the maximum likelihood estimate or to the maximum of the posterior distribution, in each case under a set of observed summary statistics. To avoid reaching only a local maximum, we propose to run the algorithm from a set of random starting values.

"As good tuning of the algorithm is important, we explored several tuning strategies, and propose a set of guidelines that worked best in our simulations. We investigate the performance of our approach in simulation studies, and also apply the algorithm to two models with intractable likelihood functions. First, we present an application to inference in the context of queuing systems. We also re-analyze population genetic data and estimate parameters describing the demographic history of Sumatran and Bornean orang-utan populations."

in_NB
statistics
computational_statistics
stochastic_approximation
likelihood
estimation
primates
"Here, we propose an alternative approach that is based on stochastic gradient methods and applicable both in a frequentist and a Bayesian setting. By moving along a simulated gradient, the algorithm produces a sequence of estimates that will eventually converge either to the maximum likelihood estimate or to the maximum of the posterior distribution, in each case under a set of observed summary statistics. To avoid reaching only a local maximum, we propose to run the algorithm from a set of random starting values.

"As good tuning of the algorithm is important, we explored several tuning strategies, and propose a set of guidelines that worked best in our simulations. We investigate the performance of our approach in simulation studies, and also apply the algorithm to two models with intractable likelihood functions. First, we present an application to inference in the context of queuing systems. We also re-analyze population genetic data and estimate parameters describing the demographic history of Sumatran and Bornean orang-utan populations."

august 2015 by cshalizi
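--- The basic move, stripped of the paper's tuning machinery: when the likelihood is only available through simulation, estimate the gradient of a summary-statistic discrepancy by finite differences on fresh simulations, and run stochastic gradient descent. A toy Gaussian-mean version:

```python
import random

# Simulated-gradient estimation, bare bones: the model is available only as a
# simulator, so the gradient of the summary-statistic discrepancy is estimated
# by finite differences on fresh simulations.
def simulate_mean(theta, n=200):
    return sum(random.gauss(theta, 1.0) for _ in range(n)) / n

random.seed(5)
observed = simulate_mean(2.0)        # stand-in for the observed summary

def loss(theta):
    return (simulate_mean(theta) - observed) ** 2

theta, eps, lr = -1.0, 0.5, 0.4
for _ in range(300):
    g = (loss(theta + eps) - loss(theta - eps)) / (2 * eps)
    theta -= lr * g

assert abs(theta - 2.0) < 0.3   # settles near the truth despite noisy gradients
```

The multiple-random-restarts advice in the abstract is exactly what one would bolt on here to avoid a bad local maximum.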

[1506.01831] Handy sufficient conditions for the convergence of the maximum likelihood estimator in observation-driven models

july 2015 by cshalizi

"This paper generalizes asymptotic properties obtained in the observation-driven times series models considered by \cite{dou:kou:mou:2013} in the sense that the conditional law of each observation is also permitted to depend on the parameter. The existence of ergodic solutions and the consistency of the Maximum Likelihood Estimator (MLE) are derived under easy-to-check conditions. The obtained conditions appear to apply for a wide class of models. We illustrate our results with specific observation-driven times series, including the recently introduced NBIN-GARCH and NM-GARCH models, demonstrating the consistency of the MLE for these two models."

in_NB
statistics
likelihood
estimation
statistical_inference_for_stochastic_processes
douc.randal
chains_with_complete_connections
july 2015 by cshalizi

Information-theoretic optimality of observation-driven time series models for continuous responses

june 2015 by cshalizi

"We investigate information-theoretic optimality properties of the score function of the predictive likelihood as a device for updating a real-valued time-varying parameter in a univariate observation-driven model with continuous responses. We restrict our attention to models with updates of one lag order. The results provide theoretical justification for a class of score-driven models which includes the generalized autoregressive conditional heteroskedasticity model as a special case. Our main contribution is to show that only parameter updates based on the score will always reduce the local Kullback–Leibler divergence between the true conditional density and the model-implied conditional density. This result holds irrespective of the severity of model misspecification. We also show that use of the score leads to a considerably smaller global Kullback–Leibler divergence in empirically relevant settings. We illustrate the theory with an application to time-varying volatility models. We show that the reduction in Kullback–Leibler divergence across a range of different settings can be substantial compared to updates based on, for example, squared lagged observations."

in_NB
statistics
information_theory
estimation
likelihood
prediction
time_series
chains_with_complete_connections
june 2015 by cshalizi
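--- A score-driven update in its simplest form, of which GARCH(1,1) is the special case the abstract mentions: for a Gaussian predictive density the score with respect to σ² is proportional to x² − σ², so the update pulls the filtered variance toward the squared observation. Parameter values below are arbitrary:

```python
import random

# Score-driven variance filter: sigma^2_{t+1} = omega + beta*sigma^2_t
# + alpha*x_t^2, i.e. GARCH(1,1); the alpha*x_t^2 term is the (scaled) score.
random.seed(10)
omega, alpha, beta = 0.05, 0.1, 0.85

xs = [random.gauss(0.0, 1.0) for _ in range(2000)] + \
     [random.gauss(0.0, 3.0) for _ in range(2000)]   # variance regime shift

var = 1.0
filtered = []
for x in xs:
    filtered.append(var)
    var = omega + beta * var + alpha * x * x

early = sum(filtered[1000:2000]) / 1000
late = sum(filtered[3000:4000]) / 1000
assert early < 2.0 and late > 5.0   # the filter tracks the shift
```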

[1409.7458] Beyond Maximum Likelihood: from Theory to Practice

january 2015 by cshalizi

"Maximum likelihood is the most widely used statistical estimation technique. Recent work by the authors introduced a general methodology for the construction of estimators for functionals in parametric models, and demonstrated improvements - both in theory and in practice - over the maximum likelihood estimator (MLE), particularly in high dimensional scenarios involving parameter dimension comparable to or larger than the number of samples. This approach to estimation, building on results from approximation theory, is shown to yield minimax rate-optimal estimators for a wide class of functionals, implementable with modest computational requirements. In a nutshell, a message of this recent work is that, for a wide class of functionals, the performance of these essentially optimal estimators with n samples is comparable to that of the MLE with nlnn samples.

"In the present paper, we highlight the applicability of the aforementioned methodology to statistical problems beyond functional estimation, and show that it can yield substantial gains. For example, we demonstrate that for learning tree-structured graphical models, our approach achieves a significant reduction of the required data size compared with the classical Chow--Liu algorithm, which is an implementation of the MLE, to achieve the same accuracy. The key step in improving the Chow--Liu algorithm is to replace the empirical mutual information with the estimator for mutual information proposed by the authors. Further, applying the same replacement approach to classical Bayesian network classification, the resulting classifiers uniformly outperform the previous classifiers on 26 widely used datasets."

to:NB
estimation
likelihood
statistics
"In the present paper, we highlight the applicability of the aforementioned methodology to statistical problems beyond functional estimation, and show that it can yield substantial gains. For example, we demonstrate that for learning tree-structured graphical models, our approach achieves a significant reduction of the required data size compared with the classical Chow--Liu algorithm, which is an implementation of the MLE, to achieve the same accuracy. The key step in improving the Chow--Liu algorithm is to replace the empirical mutual information with the estimator for mutual information proposed by the authors. Further, applying the same replacement approach to classical Bayesian network classification, the resulting classifiers uniformly outperform the previous classifiers on 26 widely used datasets."

january 2015 by cshalizi

[1410.2568] Rising Above Chaotic Likelihoods

january 2015 by cshalizi

"Berliner (Likelihood and Bayesian prediction for chaotic systems, J. Am. Stat. Assoc. 1991) identified a number of difficulties in using the likelihood function within the Bayesian paradigm for state estimation and parameter estimation of chaotic systems. Even when the equations of the system are given, he demonstrated "chaotic likelihood functions" of initial conditions and parameter values in the 1-D Logistic Map. Chaotic likelihood functions, while ultimately smooth, have such complicated small scale structure as to cast doubt on the possibility of identifying high likelihood estimates in practice. In this paper, the challenge of chaotic likelihoods is overcome by embedding the observations in a higher dimensional sequence-space, which is shown to allow good state estimation with finite computational power. An Importance Sampling approach is introduced, where Pseudo-orbit Data Assimilation is employed in the sequence-space in order first to identify relevant pseudo-orbits and then relevant trajectories. Estimates are identified with likelihoods orders of magnitude higher than those previously identified in the examples given by Berliner. Importance Sampling uses the information from both system dynamics and observations. Using the relevant prior will, of course, eventually yield an accountable sample, but given the same computational resource this traditional approach would provide no high likelihood points at all. Berliner's central conclusion is supported. "chaotic likelihood functions" for parameter estimation still pose challenge; this fact is used to clarify why physical scientists tend to maintain a strong distinction between the initial condition uncertainty and parameter uncertainty."

to:NB
statistics
dynamical_systems
chaos
time_series
likelihood
smith.leonard
january 2015 by cshalizi
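--- Berliner's difficulty is easy to reproduce: for the logistic map at r = 4, the log-likelihood as a function of the initial condition is riddled with local maxima at scales finer than any reasonable grid. A crude count (illustration only, not the paper's pseudo-orbit method):

```python
# Log-likelihood of an observed logistic-map orbit as a function of the
# initial condition x0: chaotic stretching makes it wildly multimodal.
def orbit(x0, steps, r=4.0):
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

obs = orbit(0.3, 15)          # noise-free observations keep this deterministic
sigma = 0.05                  # assumed Gaussian observation noise scale

def log_lik(x0):
    return -sum((o - x) ** 2
                for o, x in zip(obs, orbit(x0, 15))) / (2 * sigma ** 2)

grid = [i / 5000.0 for i in range(1, 5000)]
ll = [log_lik(x) for x in grid]
local_maxima = sum(1 for i in range(1, len(ll) - 1)
                   if ll[i] > ll[i - 1] and ll[i] > ll[i + 1])
assert local_maxima > 10      # the surface is highly multimodal
```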

Maximum Likelihood Estimation of Misspecified Models

september 2014 by cshalizi

"This paper examines the consequences and detection of model misspecification when using maximum likelihood techniques for estimation and inference. The quasi-maximum likelihood estimator (OMLE) converges to a well defined limit, and may or may not be consistent for particular parameters of interest. Standard tests (Wald, Lagrange Multiplier, or Likelihood Ratio) are invalid in the presence of misspecification, but more general statistics are given which allow inferences to be drawn robustly. The properties of the QMLE and the information matrix are exploited to yield several useful tests for model misspecification."

to:NB
likelihood
estimation
misspecification
statistics
white.halbert
september 2014 by cshalizi
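--- White's point in one toy example: fit a Poisson model to overdispersed counts. The QMLE λ̂ = x̄ still estimates the true mean, but the naive inverse-information variance is too small, while the sandwich A⁻¹BA⁻¹ is not. (Negative-binomial data are my choice of overdispersed truth.)

```python
import math
import random

# Poisson QMLE on negative-binomial counts: the information matrix equality
# fails, so naive and sandwich variance estimates disagree.
random.seed(6)

def neg_binomial(mean=5.0, size=2.0):
    # Gamma-Poisson mixture: variance = mean + mean^2/size > mean
    lam = random.gammavariate(size, mean / size)
    u, k = random.random(), 0
    p = s = math.exp(-lam)                 # inverse-CDF Poisson draw
    while u > s:
        k += 1
        p *= lam / k
        s += p
    return k

data = [neg_binomial() for _ in range(2000)]
n = len(data)
lam_hat = sum(data) / n

# Poisson log-lik l_i = x_i log(lam) - lam (+ const); score = x_i/lam - 1
A = sum(x / lam_hat ** 2 for x in data) / n          # -mean Hessian = 1/lam_hat
B = sum((x / lam_hat - 1.0) ** 2 for x in data) / n  # mean squared score
naive_var = 1.0 / (n * A)                            # = lam_hat / n
sandwich_var = B / (n * A * A)                       # = sample Var(x) / n

assert abs(lam_hat - 5.0) < 0.5
assert sandwich_var > 1.5 * naive_var   # overdispersion inflates the true SE
```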

High-Dimensional Density Ratio Estimation with Extensions to Approximate Likelihood Computation | AISTATS 2014 | JMLR W&CP

april 2014 by cshalizi

"The ratio between two probability density functions is an important component of various tasks, including selection bias correction, novelty detection and classification. Recently, several estimators of this ratio have been proposed. Most of these methods fail if the sample space is high-dimensional, and hence require a dimension reduction step, the result of which can be a significant loss of information. Here we propose a simple-to-implement, fully nonparametric density ratio estimator that expands the ratio in terms of the eigenfunctions of a kernel-based operator; these functions reflect the underlying geometry of the data (e.g., submanifold structure), often leading to better estimates without an explicit dimension reduction step. We show how our general framework can be extended to address another important problem, the estimation of a likelihood function in situations where that function cannot be well-approximated by an analytical form. One is often faced with this situation when performing statistical inference with data from the sciences, due the complexity of the data and of the processes that generated those data. We emphasize applications where using existing likelihood-free methods of inference would be challenging due to the high dimensionality of the sample space, but where our spectral series method yields a reasonable estimate of the likelihood function. We provide theoretical guarantees and illustrate the effectiveness of our proposed method with numerical experiments."

density_estimation
density_ratio_estimation
likelihood
spectral_methods
high-dimensional_statistics
kith_and_kin
lee.ann_b.
izbicki.rafael
read_the_thesis
in_NB
april 2014 by cshalizi
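--- For calibration, the crudest possible density-ratio estimator (a histogram ratio, nothing like the spectral-series method), just to fix what the estimand is:

```python
import random

# Histogram density-ratio estimate of p1/p0 from two samples.  At x = 0.5 the
# true ratio N(1,1)/N(0,1) is exp(x - 1/2) = 1.
random.seed(9)
n = 50_000
sample0 = [random.gauss(0.0, 1.0) for _ in range(n)]   # denominator sample
sample1 = [random.gauss(1.0, 1.0) for _ in range(n)]   # numerator sample

lo, hi, nbins = -4.0, 5.0, 45
width = (hi - lo) / nbins

def hist(xs):
    h = [0] * nbins
    for x in xs:
        if lo <= x < hi:
            h[int((x - lo) / width)] += 1
    return h

h0, h1 = hist(sample0), hist(sample1)
b = int((0.5 - lo) / width)       # the bin covering [0.4, 0.6]
ratio_est = h1[b] / h0[b]
assert abs(ratio_est - 1.0) < 0.15
```

Binning is exactly what breaks down in high dimensions, which is the gap the spectral-series estimator is meant to fill.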

Parameter Estimation for Hidden Markov Models with Intractable Likelihoods - Dean - 2014 - Scandinavian Journal of Statistics - Wiley Online Library

march 2014 by cshalizi

"Approximate Bayesian computation (ABC) is a popular technique for analysing data for complex models where the likelihood function is intractable. It involves using simulation from the model to approximate the likelihood, with this approximate likelihood then being used to construct an approximate posterior. In this paper, we consider methods that estimate the parameters by maximizing the approximate likelihood used in ABC. We give a theoretical analysis of the asymptotic properties of the resulting estimator. In particular, we derive results analogous to those of consistency and asymptotic normality for standard maximum likelihood estimation. We also discuss how sequential Monte Carlo methods provide a natural method for implementing our likelihood-based ABC procedures."

to:NB
estimation
state-space_models
approximate_bayesian_computation
likelihood
statistics
time_series
singh.sumeetpal
march 2014 by cshalizi
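--- For contrast with the likelihood-based procedure studied in the paper, plain rejection ABC in a few lines (Gaussian toy; prior, summary, and tolerance are my arbitrary choices):

```python
import random

# Rejection ABC: draw theta from the prior, simulate, keep theta when the
# simulated summary lands within tolerance of the observed one.
random.seed(7)

def summary(theta, n=100):
    return sum(random.gauss(theta, 1.0) for _ in range(n)) / n

obs = summary(1.5)
accepted = [theta for theta in
            (random.uniform(-5.0, 5.0) for _ in range(20000))
            if abs(summary(theta) - obs) < 0.1]

post_mean = sum(accepted) / len(accepted)
assert len(accepted) > 50
assert abs(post_mean - 1.5) < 0.3
```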

Likelihood Ratio Tests for Dependent Data with Applications to Longitudinal and Functional Data Analysis - Staicu - 2014 - Scandinavian Journal of Statistics - Wiley Online Library

march 2014 by cshalizi

"This paper introduces a general framework for testing hypotheses about the structure of the mean function of complex functional processes. Important particular cases of the proposed framework are as follows: (1) testing the null hypothesis that the mean of a functional process is parametric against a general alternative modelled by penalized splines; and (2) testing the null hypothesis that the means of two possibly correlated functional processes are equal or differ by only a simple parametric function. A global pseudo-likelihood ratio test is proposed, and its asymptotic distribution is derived. The size and power properties of the test are confirmed in realistic simulation scenarios. Finite-sample power results indicate that the proposed test is much more powerful than competing alternatives. Methods are applied to testing the equality between the means of normalized δ-power of sleep electroencephalograms of subjects with sleep-disordered breathing and matched controls."

to:NB
likelihood
hypothesis_testing
splines
nonparametrics
misspecification
statistics
to_teach:undergrad-ADA
march 2014 by cshalizi

Likelihood Methods for Point Processes with Refractoriness

march 2014 by cshalizi

"Likelihood-based encoding models founded on point processes have received significant attention in the literature because of their ability to reveal the information encoded by spiking neural populations. We propose an approximation to the likelihood of a point-process model of neurons that holds under assumptions about the continuous time process that are physiologically reasonable for neural spike trains: the presence of a refractory period, the predictability of the conditional intensity function, and its integrability. These are properties that apply to a large class of point processes arising in applications other than neuroscience. The proposed approach has several advantages over conventional ones. In particular, one can use standard fitting procedures for generalized linear models based on iteratively reweighted least squares while improving the accuracy of the approximation to the likelihood and reducing bias in the estimation of the parameters of the underlying continuous-time model. As a result, the proposed approach can use a larger bin size to achieve the same accuracy as conventional approaches would with a smaller bin size. This is particularly important when analyzing neural data with high mean and instantaneous firing rates. We demonstrate these claims on simulated and real neural spiking activity. By allowing a substantive increase in the required bin size, our algorithm has the potential to lower the barrier to the use of point-process methods in an increasing number of applications."

in_NB
neural_data_analysis
point_processes
likelihood
statistics
brown.emery
march 2014 by cshalizi
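--- The conventional discrete-time approximation the paper improves on: bin the spike train and use the Poisson/GLM log-likelihood Σ_k [y_k log(λ_k Δ) − λ_k Δ]. A sketch with a known intensity shape and an unknown overall scale (intensity and parameters are illustrative choices):

```python
import math
import random

# Conventional binned point-process likelihood; a pure scale parameter has the
# closed-form MLE (total spikes) / (integrated rate).
random.seed(8)
dt, T = 0.001, 10.0
rate = lambda t: 20.0 * (1.0 + math.sin(2.0 * math.pi * t))  # spikes/sec

# Bernoulli-per-bin simulation is adequate because rate*dt << 1.
bins = int(T / dt)
spikes = [1 if random.random() < rate(k * dt) * dt else 0 for k in range(bins)]

def log_lik(scale):
    ll = 0.0
    for k, y in enumerate(spikes):
        lam = scale * rate(k * dt) * dt
        if lam > 0.0:
            ll += y * math.log(lam) - lam
    return ll

s_hat = sum(spikes) / sum(rate(k * dt) * dt for k in range(bins))
assert 0.75 < s_hat < 1.25          # close to the true scale of 1
assert log_lik(s_hat) >= log_lik(0.5)
assert log_lik(s_hat) >= log_lik(1.5)
```

The paper's point is that a cruder (larger) bin size biases this approximation, especially at high firing rates.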

[1401.1026] A nonstandard empirical likelihood for time series

march 2014 by cshalizi

"Standard blockwise empirical likelihood (BEL) for stationary, weakly dependent time series requires specifying a fixed block length as a tuning parameter for setting confidence regions. This aspect can be difficult and impacts coverage accuracy. As an alternative, this paper proposes a new version of BEL based on a simple, though nonstandard, data-blocking rule which uses a data block of every possible length. Consequently, the method does not involve the usual block selection issues and is also anticipated to exhibit better coverage performance. Its nonstandard blocking scheme, however, induces nonstandard asymptotics and requires a significantly different development compared to standard BEL. We establish the large-sample distribution of log-ratio statistics from the new BEL method for calibrating confidence regions for mean or smooth function parameters of time series. This limit law is not the usual chi-square one, but is distribution-free and can be reproduced through straightforward simulations. Numerical studies indicate that the proposed method generally exhibits better coverage accuracy than standard BEL."

to:NB
likelihood
time_series
statistics
march 2014 by cshalizi

[1402.6409] Rate of convergence in the maximum likelihood estimation for partial discrete parameter, with applications to the cluster analysis and philology

march 2014 by cshalizi

"The problem of estimation of the distribution parameters on the sample when the part of these parameters are discrete (e.g. integer) is considered. We prove that the rate of convergence of MLE estimates under the natural conditions on the distribution density is exponentially fast."

to:NB
estimation
likelihood
statistics
march 2014 by cshalizi

[1401.6714] Information Theoretic Validity of Penalized Likelihood

february 2014 by cshalizi

"Building upon past work, which developed information theoretic notions of when a penalized likelihood procedure can be interpreted as codelengths arising from a two stage code and when the statistical risk of the procedure has a redundancy risk bound, we present new results and risk bounds showing that the l1 penalty in Gaussian Graphical Models fits the above story. We also show how twice the traditional l0 penalty times plus lower order terms which stay bounded on the whole parameter space has a conditional two stage description length interpretation and present risk bounds for this penalized likelihood procedure."

to:NB
information_theory
likelihood
information_criteria
barron.andrew_w.
statistics
february 2014 by cshalizi

[1212.3647] Parametric inference in the large data limit using maximally informative models

january 2014 by cshalizi

"Motivated by data-rich experiments in transcriptional regulation and sensory neuroscience, we consider the following general problem in statistical inference. When exposed to a high-dimensional signal S, a system of interest computes a representation R of that signal which is then observed through a noisy measurement M. From a large number of signals and measurements, we wish to infer the "filter" that maps S to R. However, the standard method for solving such problems, likelihood-based inference, requires perfect a priori knowledge of the "noise function" mapping R to M. In practice such noise functions are usually known only approximately, if at all, and using an incorrect noise function will typically bias the inferred filter. Here we show that, in the large data limit, this need for a pre-characterized noise function can be circumvented by searching for filters that instead maximize the mutual information I[M;R] between observed measurements and predicted representations. Moreover, if the correct filter lies within the space of filters being explored, maximizing mutual information becomes equivalent to simultaneously maximizing every dependence measure that satisfies the Data Processing Inequality. It is important to note that maximizing mutual information will typically leave a small number of directions in parameter space unconstrained. We term these directions "diffeomorphic modes" and present an equation that allows these modes to be derived systematically. The presence of diffeomorphic modes reflects a fundamental and nontrivial substructure within parameter space, one that is obscured by standard likelihood-based inference."

--- Published version: http://dx.doi.org/10.1162/NECO_a_00568

to:NB
likelihood
estimation
information_theory
statistics
to_be_shot_after_a_fair_trial

january 2014 by cshalizi

A Progressive Block Empirical Likelihood Method for Time Series - Journal of the American Statistical Association - Volume 108, Issue 504

december 2013 by cshalizi

"This article develops a new blockwise empirical likelihood (BEL) method for stationary, weakly dependent time processes, called the progressive block empirical likelihood (PBEL). In contrast to the standard version of BEL, which uses data blocks of constant length for a given sample size and whose performance can depend crucially on the block length selection, this new approach involves a data-blocking scheme where blocks increase in length by an arithmetic progression. Consequently, no block length selections are required for the PBEL method, which implies a certain type of robustness for this version of BEL. For inference of smooth functions of the process mean, theoretical results establish the chi-squared limit of the log-likelihood ratio based on PBEL, which can be used to calibrate confidence regions. Using the same progressive block scheme, distributional extensions are also provided for other nonparametric likelihoods with time series in the family of Cressie–Read discrepancies. Simulation evidence indicates that the PBEL method can perform comparably to the standard BEL in coverage accuracy (when the latter uses a “good” block choice) and can exhibit more stability, without the need to select a usual block length. Supplementary materials for this article are available online."

to:NB
likelihood
statistics
statistical_inference_for_stochastic_processes
december 2013 by cshalizi

Estimability and Likelihood Inference for Generalized Linear Mixed Models Using Data Cloning - Journal of the American Statistical Association - Volume 105, Issue 492

december 2013 by cshalizi

"Maximum likelihood estimation for Generalized Linear Mixed Models (GLMM), an important class of statistical models with substantial applications in epidemiology, medical statistics, and many other fields, poses significant computational difficulties. In this article, we use data cloning, a simple computational method that exploits advances in Bayesian computation, in particular the Markov Chain Monte Carlo method, to obtain maximum likelihood estimators of the parameters in these models. This method also leads to a simple estimator of the asymptotic variance of the maximum likelihood estimators. Determining estimability of the parameters in a mixed model is, in general, a very difficult problem. Data cloning provides a simple graphical test to not only check if the full set of parameters is estimable but also, and perhaps more importantly, if a specified function of the parameters is estimable. One of the goals of mixed models is to predict random effects. We suggest a frequentist method to obtain prediction intervals for random effects. We illustrate data cloning in the GLMM context by analyzing the Logistic–Normal model for over-dispersed binary data, and the Poisson–Normal model for repeated and spatial counts data. We consider Normal–Normal and Binary–Normal mixture models to show how data cloning can be used to study estimability of various parameters. We contend that whenever hierarchical models are used, estimability of the parameters should be checked before drawing scientific inferences or making management decisions. Data cloning facilitates such a check on hierarchical models."

in_NB
to_read
partial_identification
monte_carlo
likelihood
december 2013 by cshalizi

Data cloning: easy maximum likelihood estimation for complex ecological models using Bayesian Markov chain Monte Carlo methods - Lele - 2007 - Ecology Letters - Wiley Online Library

december 2013 by cshalizi

"We introduce a new statistical computing method, called data cloning, to calculate maximum likelihood estimates and their standard errors for complex ecological models. Although the method uses the Bayesian framework and exploits the computational simplicity of the Markov chain Monte Carlo (MCMC) algorithms, it provides valid frequentist inferences such as the maximum likelihood estimates and their standard errors. The inferences are completely invariant to the choice of the prior distributions and therefore avoid the inherent subjectivity of the Bayesian approach. The data cloning method is easily implemented using standard MCMC software. Data cloning is particularly useful for analysing ecological situations in which hierarchical statistical models, such as state-space models and mixed effects models, are appropriate. We illustrate the method by fitting two nonlinear population dynamics models to data in the presence of process and observation noise."

- There is nothing specifically ecological about this. The trick is to raise the likelihood to some large power k, as though one had observed k completely independent replicas of the data which all happened to be exactly the same, and then do ordinary Metropolis-Hastings. For sufficiently large k, the likelihood function dominates the prior, Bernstein-von Mises takes over, and the distribution of the posterior concentrates around the MLE, with inverse-Fisher-information variance. It's very clever, and there seem to be some extensions (can't recall if this first paper mentions them) about separating identified from unidentified parameters by seeing which posterior variances don't shrink as k is cranked up.

Ungated author copy: http://mysite.science.uottawa.ca/flutsche/PUBLICATIONS/LeleDennisLutscher2007.pdf
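--- A minimal sketch of the trick, for estimating a Normal mean with a flat prior (my toy code, nothing here is from the paper): multiply the log-likelihood by k inside a random-walk Metropolis sampler, and the posterior mean settles on the MLE while the posterior variance shrinks like 1/k.

```python
import math, random

random.seed(0)
data = [random.gauss(2.0, 1.0) for _ in range(50)]
mle = sum(data) / len(data)  # MLE of a Normal(mu, 1) mean is the sample mean

def log_lik(mu):
    return -0.5 * sum((x - mu) ** 2 for x in data)

def cloned_posterior(k, n_iter=20000):
    # Random-walk Metropolis targeting prior(mu) * likelihood(mu)^k,
    # i.e. k "clones" of the data; flat prior, so the log target is k * log_lik
    mu, step = 0.0, 1.0 / math.sqrt(k)
    draws = []
    for _ in range(n_iter):
        prop = mu + random.gauss(0.0, step)
        if math.log(random.random()) < k * (log_lik(prop) - log_lik(mu)):
            mu = prop
        draws.append(mu)
    return draws[n_iter // 2:]  # discard the first half as burn-in

results = {}
for k in (1, 10, 100):
    d = cloned_posterior(k)
    m = sum(d) / len(d)
    v = sum((x - m) ** 2 for x in d) / len(d)
    results[k] = (m, v)
    print(k, round(m, 3), round(v, 5))
```

With k = 1, 10, 100 the sampled variances come out near 1/(50 k), the inverse Fisher information for k clones of the 50 observations, which is the Lele et al. recipe for reading off the MLE's standard errors.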

in_NB
monte_carlo
likelihood
re:stacs
have_read
statistics
estimation
state-space_models
hierarchical_statistical_models

december 2013 by cshalizi

[1311.7286] Approximate Bayesian Computation with composite score functions

december 2013 by cshalizi

"Both Approximate Bayesian Computation (ABC) and composite likelihood methods are useful for Bayesian and frequentist inference when the likelihood function is intractable. We show that composite likelihood score functions can be fruitfully used as automatic informative summary statistics in ABC in order to obtain accurate approximations to the posterior distribution of the parameter of interest. This is formally motivated by the use of the score function of the full likelihood, and extended to general unbiased estimating functions in complex models. Examples illustrate that the proposed ABC procedure can significantly improve upon usual ABC methods based on ordinary data summaries."

to:NB
approximation
likelihood
statistics
estimation
december 2013 by cshalizi

[1207.0865] Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels

november 2013 by cshalizi

"Variational methods for parameter estimation are an active research area, potentially offering computationally tractable heuristics with theoretical performance bounds. We build on recent work that applies such methods to network data, and establish asymptotic normality rates for parameter estimates of stochastic blockmodel data, by either maximum likelihood or variational estimation. The result also applies to various sub-models of the stochastic blockmodel found in the literature."

in_NB
likelihood
estimation
community_discovery
network_data_analysis
statistics
choi.david_s.
kith_and_kin
variational_inference
bickel.peter_j.
november 2013 by cshalizi

Reid : Aspects of likelihood inference

september 2013 by cshalizi

"I review the classical theory of likelihood based inference and consider how it is being extended and developed for use in complex models and sampling schemes."

to:NB
statistics
likelihood
estimation
september 2013 by cshalizi

[1308.0049] A composite likelihood approach to computer model calibration using high-dimensional spatial data

august 2013 by cshalizi

"Computer models are used to model complex processes in various disciplines. Often, a key source of uncertainty in the behavior of complex computer models is uncertainty due to unknown model input parameters. Statistical computer model calibration is the process of inferring model parameter values, along with associated uncertainties, from observations of the physical process and from model outputs at various parameter settings. Observations and model outputs are often in the form of high-dimensional spatial fields, especially in the environmental sciences. Sound statistical inference may be computationally challenging in such situations. Here we introduce a composite likelihood-based approach to perform computer model calibration with high-dimensional spatial data. While composite likelihood has been studied extensively in the context of spatial statistics, computer model calibration using composite likelihood poses several new challenges. We propose a computationally efficient approach for Bayesian computer model calibration using composite likelihood. We also develop a methodology based on asymptotic theory for adjusting the composite likelihood posterior distribution so that it accurately represents posterior uncertainties. We study the application of our new approach in the context of calibration for a climate model."

to:NB
simulation
statistics
likelihood
computational_statistics
estimation
august 2013 by cshalizi

[1307.5381] A convex pseudo-likelihood framework for high dimensional partial correlation estimation with convergence guarantees

july 2013 by cshalizi

"Sparse high dimensional graphical model selection is a topic of much interest in modern day statistics. A popular approach is to apply l1-penalties to either (1) parametric likelihoods, or, (2) regularized regression/pseudo-likelihoods, with the latter having the distinct advantage that they do not explicitly assume Gaussianity. As none of the popular methods proposed for solving pseudo-likelihood based objective functions have provable convergence guarantees, it is not clear if corresponding estimators exist or are even computable, or if they actually yield correct partial correlation graphs. This paper proposes a new pseudo-likelihood based graphical model selection method that aims to overcome some of the shortcomings of current methods, but at the same time retain all their respective strengths. In particular, we introduce a novel framework that leads to a convex formulation of the partial covariance regression graph problem, resulting in an objective function comprised of quadratic forms. The objective is then optimized via a coordinate-wise approach. The specific functional form of the objective function facilitates rigorous convergence analysis leading to convergence guarantees; an important property that cannot be established using standard results, when the dimension is larger than the sample size, as is often the case in high dimensional applications. These convergence guarantees ensure that estimators are well-defined under very general conditions, and are always computable. In addition, the approach yields estimators that have good large sample properties and also respect symmetry. Furthermore, application to simulated/real data, timing comparisons and numerical convergence is demonstrated. We also present a novel unifying framework that places all graphical pseudo-likelihood methods as special cases of a more general formulation, leading to important insights."

to:NB
convexity
graphical_models
sparsity
optimization
lasso
likelihood
statistics
july 2013 by cshalizi

Li : Maximum-likelihood estimation for diffusion processes via closed-form density expansions

july 2013 by cshalizi

"This paper proposes a widely applicable method of approximate maximum-likelihood estimation for multivariate diffusion processes from discretely sampled data. A closed-form asymptotic expansion for transition density is proposed and accompanied by an algorithm containing only basic and explicit calculations for delivering any arbitrary order of the expansion. The likelihood function is thus approximated explicitly and employed in statistical estimation. The performance of our method is demonstrated by Monte Carlo simulations from implementing several examples, which represent a wide range of commonly used diffusion models. The convergence related to the expansion and the estimation method are theoretically justified using the theory of Watanabe [Ann. Probab. 15 (1987) 1–39] and Yoshida [J. Japan Statist. Soc. 22 (1992) 139–159] on analysis of the generalized random variables under some standard sufficient conditions."

in_NB
stochastic_differential_equations
statistical_inference_for_stochastic_processes
estimation
likelihood
statistics
july 2013 by cshalizi

[1306.5603] Consistency of maximum likelihood estimation for some dynamical systems

june 2013 by cshalizi

"We consider the asymptotic consistency of maximum likelihood parameter estimation for dynamical systems observed with noise. Under suitable conditions on the dynamical systems and the observations, we show that maximum likelihood parameter estimation is consistent. Our proof involves ideas from both information theory and dynamical systems. Furthermore, we show how some well-studied properties of dynamical systems imply the general statistical properties related to maximum likelihood estimation. Finally, we exhibit classical families of dynamical systems for which maximum likelihood estimation is consistent. Examples include shifts of finite type with Gibbs measures and Axiom A attractors with SRB measures."

in_NB
to_read
dynamical_systems
time_series
information_theory
statistics
chaos
likelihood
estimation
statistical_inference_for_stochastic_processes
pillai.natesh
nobel.andrew
june 2013 by cshalizi

Peng , Schick : Empirical likelihood approach to goodness of fit testing

june 2013 by cshalizi

"Motivated by applications to goodness of fit testing, the empirical likelihood approach is generalized to allow for the number of constraints to grow with the sample size and for the constraints to use estimated criteria functions. The latter is needed to deal with nuisance parameters. The proposed empirical likelihood based goodness of fit tests are asymptotically distribution free. For univariate observations, tests for a specified distribution, for a distribution of parametric form, and for a symmetric distribution are presented. For bivariate observations, tests for independence are developed."

in_NB
goodness-of-fit
likelihood
regression
statistics
june 2013 by cshalizi

[1306.4032] Playing Russian Roulette with Intractable Likelihoods

june 2013 by cshalizi

"A general scheme to exploit Exact-Approximate MCMC methodology for intractable likelihoods is suggested. By representing the intractable likelihood as an infinite Maclaurin or Geometric series expansion, unbiased estimates of the likelihood can be obtained by finite time stochastic truncations of the series via Russian Roulette sampling. Whilst the estimates of the intractable likelihood are unbiased, for unbounded unnormalised densities they induce a signed measure in the Exact-Approximate Markov chain Monte Carlo procedure which will introduce bias in the invariant distribution of the chain. By exploiting results from the Quantum Chromodynamics literature the signed measures can be employed in an Exact-Approximate sampling scheme in such a way that expectations with respect to the desired target distribution are preserved. This provides a general methodology to construct Exact-Approximate sampling schemes for a wide range of models and the methodology is demonstrated on well known examples such as posterior inference of coupling parameters in Ising models and defining the posterior for Fisher-Bingham distributions defined on the $d$-Sphere. A large scale example is provided for a Gaussian Markov Random Field model, with fine scale mesh refinement, describing the Ozone Column data. To our knowledge this is the first time that fully Bayesian inference over a model of this size has been feasible without the need to resort to any approximations. Finally a critical assessment of the strengths and weaknesses of the methodology is provided with pointers to ongoing research."
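--- The Russian Roulette device itself is easy to demonstrate in isolation (a toy sketch of the truncation step only, not the paper's MCMC scheme): randomly truncate an infinite series and reweight the surviving terms by their inverse survival probabilities, so the finite-time estimate stays unbiased.

```python
import random

random.seed(1)

def roulette(a, p):
    # Unbiased estimate of the infinite series sum_{n>=0} a(n):
    # term n is included only if the chain "survives" n coin flips
    # (probability p**n), so dividing it by p**n keeps the expectation exact.
    total, survive_prob, n = 0.0, 1.0, 0
    while True:
        total += a(n) / survive_prob
        if random.random() > p:  # the roulette "fires": stop with prob 1 - p
            return total
        survive_prob *= p
        n += 1

# Toy check on a geometric series whose sum we know: sum_n 0.5^n = 2
draws = [roulette(lambda n: 0.5 ** n, 0.7) for _ in range(100000)]
mean = sum(draws) / len(draws)
print(mean)  # close to 2.0
```

The continuation probability p trades off cost (longer partial sums) against variance; the paper's real difficulty, handled via the signed-measure trick, is that such estimates of a likelihood can come out negative.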

in_NB
monte_carlo
approximate_bayesian_computation
simulation
likelihood
estimation
statistics
to_read
re:stacs
june 2013 by cshalizi

[1306.1493] Extended empirical likelihood for general estimating equations

june 2013 by cshalizi

"We derive an extended empirical likelihood for parameters defined by estimating equations which generalizes the original empirical likelihood for such parameters to the full parameter space. Under mild conditions, the extended empirical likelihood has all asymptotic properties of the original empirical likelihood. Its contours retain the data-driven shape of the latter. It can also attain the second order accuracy. The first order extended empirical likelihood is easy-to-use yet it is substantially more accurate than other empirical likelihoods, including second order ones. We recommend it for practical applications of the empirical likelihood method."

to:NB
likelihood
estimation
statistics
june 2013 by cshalizi

[1305.5712] Fast inference in generalized linear models via expected log-likelihoods

may 2013 by cshalizi

"Generalized linear models play an essential role in a wide variety of statistical applications. This paper discusses an approximation of the likelihood in these models that can greatly facilitate computation. The basic idea is to replace a sum that appears in the exact log-likelihood by an expectation over the model covariates; the resulting "expected log-likelihood" can in many cases be computed significantly faster than the exact log-likelihood. In many neuroscience experiments the distribution over model covariates is controlled by the experimenter and the expected log-likelihood approximation becomes particularly useful; for example, estimators based on maximizing this expected log-likelihood (or a penalized version thereof) can often be obtained with orders of magnitude computational savings compared to the exact maximum likelihood estimators. A risk analysis establishes that these maximum EL estimators often come with little cost in accuracy (and in some cases even improved accuracy) compared to standard maximum likelihood estimates. Finally, we find that these methods can significantly decrease the computation time of marginal likelihood calculations for model selection and of Markov chain Monte Carlo methods for sampling from the posterior parameter distribution. We illustrate our results by applying these methods to a computationally-challenging dataset of neural spike trains obtained via large-scale multi-electrode recordings in the primate retina."
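--- A toy version of the idea for a canonical Poisson GLM with Gaussian covariates, where the expectation has a closed form (my sketch, not the authors' code; `b_true`, `Sigma`, and the gradient-descent step size are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 3
Sigma = np.eye(d)                      # covariate covariance, assumed known
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
b_true = np.array([0.5, -0.3, 0.2])
y = rng.poisson(np.exp(X @ b_true))    # canonical Poisson GLM

# The exact neg. log-likelihood needs sum_i exp(x_i' b) at every iteration,
# an O(n d) pass over the data. The "expected log-likelihood" replaces that
# sum by n * E[exp(x' b)], which for Gaussian covariates is
# n * exp(b' Sigma b / 2) -- no pass over the data at all.
s = X.T @ y                            # sufficient statistic, computed once

b = np.zeros(d)
for _ in range(500):                   # plain gradient descent on expected NLL
    grad = n * np.exp(b @ Sigma @ b / 2.0) * (Sigma @ b) - s
    b -= 1e-4 * grad
print(b)  # close to b_true
```

After the one-time computation of `s`, each iteration costs O(d^2) instead of O(n d), which is the sort of order-of-magnitude saving the abstract claims for large neural data sets.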

to:NB
estimation
likelihood
computational_statistics
paninski.liam
to_teach:statcomp
to_teach:undergrad-ADA
generalized_linear_models
may 2013 by cshalizi

[1305.1056] Relative Performance of Expected and Observed Fisher Information in Covariance Estimation for Maximum Likelihood Estimates

may 2013 by cshalizi

"Maximum likelihood estimation is a popular method in statistical inference. As a way of assessing the accuracy of the maximum likelihood estimate (MLE), the calculation of the covariance matrix of the MLE is of great interest in practice. Standard statistical theory shows that the normalized MLE is asymptotically normally distributed with covariance matrix being the inverse of the Fisher information matrix (FIM) at the unknown parameter. Two commonly used estimates for the covariance of the MLE are the inverse of the observed FIM (the same as the inverse Hessian of the negative log-likelihood) and the inverse of the expected FIM (the same as the inverse FIM). Both of the observed and expected FIM are evaluated at the MLE from the sample data. In this dissertation, we demonstrate that, under reasonable conditions similar to standard MLE conditions, the inverse expected FIM outperforms the inverse observed FIM under a mean squared error criterion. Specifically, in an asymptotic sense, the inverse expected FIM (evaluated at the MLE) has no greater mean squared error with respect to the true covariance matrix than the inverse observed FIM (evaluated at the MLE) at the element level. This result is different from widely accepted results showing preference for the observed FIM. In this dissertation, we present theoretical derivations that lead to the conclusion above. We also present numerical studies on three distinct problems to support the theoretical result. This dissertation also includes two appendices on topics of relevance to stochastic systems. The first appendix discusses optimal perturbation distributions for the simultaneous perturbation stochastic approximation (SPSA) algorithm. The second appendix considers Monte Carlo methods for computing FIMs when closed forms are not attainable."

--- Huh, I guess I'd been relying on folklore.

to:NB
fisher_information
estimation
statistics
likelihood

may 2013 by cshalizi

[1304.0503] Non-parametric likelihood based estimation of linear filters for point processes

april 2013 by cshalizi

"We consider models for multivariate point processes where the intensity is given non-parametrically in terms of functions in a reproducing kernel Hilbert space. The likelihood function involves a time integral and is consequently not given in terms of a finite number of kernel evaluations. We derive a representation of the gradient of the log-likelihood and provide two methods for practically computing approximations to the gradient by time discretization. We illustrate the methods by an application to neuron network modeling, and we investigate how the computational costs of the methods depend on the resolution of the time discretization. The methods are implemented and available in the R-package ppstat."

in_NB
hilbert_space
likelihood
point_processes
statistics
estimation
nonparametrics
april 2013 by cshalizi

[1303.6794] A likelihood based framework for assessing network evolution models tested on real network data

march 2013 by cshalizi

"This paper presents a statistically sound method for using likelihood to assess potential models of network evolution. The method is tested on data from five real networks. Data from the internet autonomous system network, from two photo sharing sites and from a co-authorship network are tested using this framework."

to:NB
statistics
likelihood
network_data_analysis
march 2013 by cshalizi

Computing Likelihood Functions for High-Energy Physics Experiments when Distributions are Defined by Simulators with Nuisance Parameters

march 2013 by cshalizi

"When searching for new phenomena in high-energy physics, statistical analysis is complicated by the presence of nuisance parameters, representing uncertainty in the physics of interactions or in detector properties. Another complication, even with no nuisance parameters, is that the probability distributions of the models are specified only by simulation programs, with no way of evaluating their probability density functions. I advocate expressing the result of an experiment by means of the likelihood function, rather than by frequentist confidence intervals or p-values. A likelihood function for this problem is difficult to obtain, however, for both of the reasons given above. I discuss ways of circumventing these problems by reducing dimensionality using a classifier and employing simulations with multiple values for the nuisance parameters."

to:NB
statistics
likelihood
computational_statistics
simulation
neal.radford
march 2013 by cshalizi

[1302.5468] What does the proof of Birnbaum's theorem prove?

february 2013 by cshalizi

"Birnbaum's theorem, that the sufficiency and conditionality principles entail the likelihood principle, has engendered a great deal of controversy and discussion since the publication of the result in 1962. In particular, many have raised doubts as to the validity of this result. Typically these doubts are concerned with the validity of the principles of sufficiency and conditionality as expressed by Birnbaum. Technically it would seem, however, that the proof itself is sound. In this paper we use set theory to formalize the context in which the result is proved and show that in fact Birnbaum's theorem is incorrectly stated as a key hypothesis is left out of the statement. When this hypothesis is added, we see that sufficiency is irrelevant, and that the result is dependent on a well-known flaw in conditionality that renders the result almost vacuous."

in_NB
statistics
sufficiency
likelihood
foundations_of_statistics
have_read
february 2013 by cshalizi

[1302.3071] A penalized empirical likelihood method in high dimensions

february 2013 by cshalizi

"This paper formulates a penalized empirical likelihood (PEL) method for inference on the population mean when the dimension of the observations may grow faster than the sample size. Asymptotic distributions of the PEL ratio statistic are derived under different component-wise dependence structures of the observations, namely, (i) non-Ergodic, (ii) long-range dependence and (iii) short-range dependence. It follows that the limit distribution of the proposed PEL ratio statistic can vary widely depending on the correlation structure, and it is typically different from the usual chi-squared limit of the empirical likelihood ratio statistic in the fixed and finite dimensional case. A unified subsampling based calibration is proposed, and its validity is established in all three cases, (i)-(iii). Finite sample properties of the method are investigated through a simulation study."

to:NB
likelihood
statistics
lahiri.s.n.
to_read
high-dimensional_statistics
february 2013 by cshalizi

[1302.3302] Asymptotic power of likelihood ratio tests for high dimensional data

february 2013 by cshalizi

"This paper considers the asymptotic power of likelihood ratio test (LRT) for the identity test when the dimension p is large compared to the sample size n. The asymptotic distribution of LRT under alternatives is given and an explicit expression of the power is derived. A simulation study is carried out to compare LRT with other tests. All these studies show that LRT is a powerful test to detect eigenvalues around zero. "

to:NB
likelihood
statistics
model_selection
hypothesis_testing
february 2013 by cshalizi

[1302.3567] Efficient Approximations for the Marginal Likelihood of Incomplete Data Given a Bayesian Network

february 2013 by cshalizi

"We discuss Bayesian methods for learning Bayesian networks when data sets are incomplete. In particular, we examine asymptotic approximations for the marginal likelihood of incomplete data given a Bayesian network. We consider the Laplace approximation and the less accurate but more efficient BIC/MDL approximation. We also consider approximations proposed by Draper (1993) and Cheeseman and Stutz (1995). These approximations are as efficient as BIC/MDL, but their accuracy has not been studied in any depth. We compare the accuracy of these approximations under the assumption that the Laplace approximation is the most accurate. In experiments using synthetic data generated from discrete naive-Bayes models having a hidden root node, we find that the CS measure is the most accurate."

to:NB
likelihood
statistics
graphical_models
laplace_approximation
february 2013 by cshalizi

Hinkley : Predictive Likelihood

january 2013 by cshalizi

"The likelihood function is the common basis of all parametric inference. However, with the exception of an ad hoc definition by Fisher, there has been no such unifying basis for prediction of future events, given past observations. This article proposes a definition of predictive likelihood which can help to remove some nonuniqueness problems in sampling-theory predictive inference, and which can produce a simple prediction analog of the Bayesian parametric result, posterior ∝ prior × likelihood, in many situations."

- Parallel to Lauritzen's work, according to p. 1.

in_NB
to_read
prediction
likelihood
statistics
sufficiency

january 2013 by cshalizi

[1301.0463] A Simple Approach to Maximum Intractable Likelihood Estimation

january 2013 by cshalizi

"Approximate Bayesian Computation (ABC) can be viewed as an analytic approximation of an intractable likelihood coupled with an elementary simulation step. Such a view, combined with a suitable instrumental prior distribution permits maximum-likelihood (or maximum-a-posteriori) inference to be conducted, approximately, using essentially the same techniques. An elementary approach to this problem which simply obtains a nonparametric approximation of the likelihood surface which is then used as a smooth proxy for the likelihood in a subsequent maximisation step is developed here and the convergence of this class of algorithms is characterised theoretically. The use of non-sufficient summary statistics in this context is considered. Applying the proposed method to four problems demonstrates good performance. The proposed approach provides an alternative for approximating the maximum likelihood estimator (MLE) in complex scenarios."

Journal version (in open-access _Electronic Journal of Statistics_): http://dx.doi.org/10.1214/13-EJS819

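- The core recipe in one toy run (my construction: a Gaussian model standing in for an intractable one, so the exact MLE, the sample mean, is known; all names and constants are mine):

```python
import random
from math import exp

# Approximate the likelihood of the observed summary statistic by a kernel
# density estimate built from simulations, then maximize that smooth proxy.

random.seed(0)
theta_true = 2.0
data = [random.gauss(theta_true, 1.0) for _ in range(200)]
s_obs = sum(data) / len(data)   # observed summary: the sample mean

def simulate_summary(theta, n=200):
    """One pseudo-dataset from the model, reduced to its summary statistic."""
    return sum(random.gauss(theta, 1.0) for _ in range(n)) / n

def kernel_likelihood(theta, m=200, h=0.05):
    """Nonparametric (Gaussian-kernel) proxy for the likelihood at theta."""
    sims = [simulate_summary(theta) for _ in range(m)]
    return sum(exp(-0.5 * ((s - s_obs) / h) ** 2) for s in sims) / (m * h)

# Maximization step: here just a grid search over candidate parameters.
grid = [1.0 + 0.1 * k for k in range(21)]   # theta in [1.0, 3.0]
theta_hat = max(grid, key=kernel_likelihood)
assert abs(theta_hat - s_obs) < 0.3   # near the exact MLE (the sample mean)
```
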
in_NB
to_read
statistics
estimation
likelihood
approximate_bayesian_computation
indirect_inference
re:stacs
simulation
to_teach:complexity-and-inference
january 2013 by cshalizi

Fast Inference for the Latent Space Network Model Using a Case-Control Approximate Likelihood - Journal of Computational and Graphical Statistics - Volume 21, Issue 4

december 2012 by cshalizi

"Network models are widely used in social sciences and genome sciences. The latent space model proposed by Hoff et al. (2002), and extended by Handcock et al. (2007) to incorporate clustering, provides a visually interpretable model-based spatial representation of relational data and takes account of several intrinsic network properties. Due to the structure of the likelihood function of the latent space model, the computational cost is of order O(N^2), where N is the number of nodes. This makes it infeasible for large networks. In this article, we propose an approximation of the log-likelihood function. We adapt the case-control idea from epidemiology and construct a case-control log-likelihood, which is an unbiased estimator of the log-full likelihood. Replacing the full likelihood by the case-control likelihood in the Markov chain Monte Carlo estimation of the latent space model reduces the computational time from O(N^2) to O(N), making it feasible for large networks. We evaluate its performance using simulated and real data. We fit the model to a large protein–protein interaction data using the case-control likelihood and use the model fitted link probabilities to identify false positive links. Supplemental materials are available online."

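- The case-control trick in miniature (my own toy one-dimensional latent space model; the paper works with the full Hoff et al. model inside MCMC):

```python
import random
from math import log, exp

# Toy model (names and sizes mine): each node gets a latent position,
# links are Bernoulli with a logistic distance decay.
random.seed(1)
n = 30
z = [random.gauss(0.0, 1.0) for _ in range(n)]

def link_prob(i, j, alpha=0.0):
    return 1.0 / (1.0 + exp(-(alpha - abs(z[i] - z[j]))))

pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
edges = {e: (random.random() < link_prob(*e)) for e in pairs}

def term(e):
    p = link_prob(*e)
    return log(p) if edges[e] else log(1.0 - p)

def full_loglik():
    return sum(term(e) for e in pairs)   # the O(N^2) sum being avoided

def case_control_loglik(frac=0.2):
    """Keep all links (cases); subsample non-links (controls), reweighted."""
    cases = [e for e in pairs if edges[e]]
    noncases = [e for e in pairs if not edges[e]]
    controls = random.sample(noncases, max(1, int(frac * len(noncases))))
    weight = len(noncases) / len(controls)   # inverse sampling fraction
    return sum(term(e) for e in cases) + weight * sum(term(e) for e in controls)

# Unbiasedness: averaging many subsampled estimates recovers the full value.
est = sum(case_control_loglik() for _ in range(400)) / 400
assert abs(est - full_loglik()) < 0.02 * abs(full_loglik())
```
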
to:NB
statistics
network_data_analysis
approximation
likelihood
inference_to_latent_objects
hoff.peter
raftery.adrian
december 2012 by cshalizi

FLAT PRIORS IN FLATLAND: STONE’S PARADOX « Normal Deviate

december 2012 by cshalizi

"Another consequence of Stone’s example is that, in my opinion, it shows that the Likelihood Principle is bogus. According to the likelihood principle, the observed likelihood function contains all the useful information in the data. In this example, the likelihood does not distinguish the four possible parameter values.

"But the direction of the string from the current position — which does not affect the likelihood — clearly has lots of information."

statistics
bayesianism
likelihood
december 2012 by cshalizi

[1206.2245] Pippi - painless parsing, post-processing and plotting of posterior and likelihood samples

june 2012 by cshalizi

"Interpreting samples from likelihood or posterior probability density functions is rarely as straightforward as it seems it should be. Producing publication-quality graphics of these distributions is often similarly painful. In this short note I describe pippi, a simple, publicly-available package for parsing and post-processing such samples, as well as generating high-quality PDF graphics of the results. Pippi is easily and extensively configurable and customisable, both in its options for parsing and post-processing samples, and in the visual aspects of the figures it produces. I illustrate some of these using an existing supersymmetric global fit, performed in the context of a gamma-ray search for dark matter. Pippi can be downloaded and followed at this http URL"

in_NB
statistics
likelihood
monte_carlo
estimation
visual_display_of_quantitative_information
june 2012 by cshalizi

[0804.2996] The Epic Story of Maximum Likelihood

june 2012 by cshalizi

"At a superficial level, the idea of maximum likelihood must be prehistoric: early hunters and gatherers may not have used the words ``method of maximum likelihood'' to describe their choice of where and how to hunt and gather, but it is hard to believe they would have been surprised if their method had been described in those terms. It seems a simple, even unassailable idea: Who would rise to argue in favor of a method of minimum likelihood, or even mediocre likelihood? And yet the mathematical history of the topic shows this ``simple idea'' is really anything but simple. Joseph Louis Lagrange, Daniel Bernoulli, Leonard Euler, Pierre Simon Laplace and Carl Friedrich Gauss are only some of those who explored the topic, not always in ways we would sanction today. In this article, that history is reviewed from back well before Fisher to the time of Lucien Le Cam's dissertation. In the process Fisher's unpublished 1930 characterization of conditions for the consistency and efficiency of maximum likelihood estimates is presented, and the mathematical basis of his three proofs discussed. In particular, Fisher's derivation of the information inequality is seen to be derived from his work on the analysis of variance, and his later approach via estimating functions was derived from Euler's Relation for homogeneous functions. The reaction to Fisher's work is reviewed, and some lessons drawn."

Gated version: http://projecteuclid.org/euclid.ss/1207580174

in_NB
likelihood
statistics
estimation
history_of_statistics
stigler.stephen
fisher.r.a.
pearson.karl
neyman.jerzy
hotelling.harold
cramer-rao_inequality
information_geometry
have_read
wald.abraham
june 2012 by cshalizi

[1206.0867] Testing linear hypotheses in high-dimensional regressions

june 2012 by cshalizi

"For a multivariate linear model, Wilk's likelihood ratio test (LRT) constitutes one of the cornerstone tools. However, the computation of its quantiles under the null or the alternative requires complex analytic approximations and more importantly, these distributional approximations are feasible only for moderate dimension of the dependent variable, say $p \le 20$. On the other hand, assuming that the data dimension $p$ as well as the number $q$ of regression variables are fixed while the sample size $n$ grows, several asymptotic approximations are proposed in the literature for Wilk's $\Lambda$ including the widely used chi-square approximation. In this paper, we consider necessary modifications to Wilk's test in a high-dimensional context, specifically assuming a high data dimension $p$ and a large sample size $n$. Based on recent random matrix theory, the correction we propose to Wilk's test is asymptotically Gaussian under the null and simulations demonstrate that the corrected LRT has very satisfactory size and power, surely in the large $p$ and large $n$ context, but also for moderately large data dimensions like $p=30$ or $p=50$. As a byproduct, we give a reason explaining why the standard chi-square approximation fails for high-dimensional data. We also introduce a new procedure for the classical multiple sample significance test in MANOVA which is valid for high-dimensional data."

to:NB
to_read
statistics
model_selection
likelihood
re:model_selection_for_networks
high-dimensional_statistics
june 2012 by cshalizi

[0808.4042] Statistical models, likelihood, penalized likelihood and hierarchical likelihood

february 2012 by cshalizi

"We give an overview of statistical models and likelihood, together with two of its variants: penalized and hierarchical likelihood. The Kullback-Leibler divergence is referred to repeatedly, for defining the misspecification risk of a model, for grounding the likelihood and the likelihood cross-validation which can be used for choosing weights in penalized likelihood. Families of penalized likelihood and sieves estimators are shown to be equivalent. The similarity of these likelihoods with a posteriori distributions in a Bayesian approach is considered."

in_NB
statistics
likelihood
bayesianism
information_theory
february 2012 by cshalizi

Stochastic Composite Likelihood

november 2010 by cshalizi

"Maximum likelihood estimators are often of limited practical use due to the intensive computation they require. We propose a family of alternative estimators that maximize a stochastic variation of the composite likelihood function. Each of the estimators resolve the computation-accuracy tradeoff differently, and taken together they span a continuous spectrum of computation-accuracy tradeoff resolutions. We prove the consistency of the estimators, provide formulas for their asymptotic variance, statistical robustness, and computational complexity. We discuss experimental results in the context of Boltzmann machines and conditional random fields. The theoretical and experimental studies demonstrate the effectiveness of the estimators when the computational resources are insufficient. They also demonstrate that in some cases reduced computational complexity is associated with robustness thereby increasing statistical accuracy."

likelihood
estimation
statistics
lebanon.guy
to_read
november 2010 by cshalizi

"A note on the asymptotic behaviour of empirical likelihood statistics" - Statistical Methods & Applications, Volume 19, Number 4

october 2010 by cshalizi

"This paper develops some theoretical results about the asymptotic behaviour of the empirical likelihood and the empirical profile likelihood statistics, which originate from fairly general estimating functions. The results accommodate, within a unified framework, various situations potentially occurring in a wide range of applications. For this reason, they are potentially useful in several contexts, such as, for example, in inference for dependent data. We provide examples showing that known findings in literature about the asymptotic behaviour of some empirical likelihood statistics in time series models can be derived as particular cases of our results."

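- The basic object whose asymptotics the paper studies, computed for the simplest case, a univariate mean (my implementation of Owen's Lagrange-multiplier form, solved by bisection):

```python
import random
from math import log

def el_log_ratio(xs, mu):
    """log of the empirical likelihood ratio R(mu); 0 at mu = sample mean.
    Uses the standard profile form w_i = 1 / (n * (1 + lam * (x_i - mu)))."""
    n = len(xs)
    d = [x - mu for x in xs]
    if max(d) <= 0 or min(d) >= 0:
        raise ValueError("mu outside the convex hull of the data")
    # Solve sum(d_i / (1 + lam*d_i)) = 0 for lam by bisection; the valid
    # lam range keeps every weight positive: 1 + lam*d_i > 0 for all i.
    lo = -1.0 / max(d) + 1e-10
    hi = -1.0 / min(d) - 1e-10
    g = lambda lam: sum(di / (1.0 + lam * di) for di in d)
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if g(mid) > 0:   # g is strictly decreasing in lam
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2.0
    return -sum(log(1.0 + lam * di) for di in d)   # sum of log(n * w_i)

random.seed(2)
xs = [random.gauss(0.0, 1.0) for _ in range(50)]
xbar = sum(xs) / len(xs)
assert abs(el_log_ratio(xs, xbar)) < 1e-6           # maximized at the mean
assert el_log_ratio(xs, xbar + 0.5) < el_log_ratio(xs, xbar)
```

(Wilks-type asymptotics say -2 * el_log_ratio at the true mean is approximately chi-squared with 1 degree of freedom; the paper extends this kind of result to general estimating functions and dependent data.)
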
empirical_likelihood
asymptotics
statistics
estimation
likelihood
statistical_inference_for_stochastic_processes
october 2010 by cshalizi

Default priors for Bayesian and frequentist inference - Fraser et al. - 2010 - Journal of the Royal Statistical Society: Series B (Statistical Methodology) - Wiley Online Library

october 2010 by cshalizi

"We investigate the choice of default priors for use with likelihood for Bayesian and frequentist inference. Such a prior is a density or relative density that weights an observed likelihood function, leading to the elimination of parameters that are not of interest and then a density-type assessment for a parameter of interest. For independent responses from a continuous model, we develop a prior for the full parameter that is closely linked to the original Bayes approach and provides an extension of the right invariant measure to general contexts. We then develop a modified prior that is targeted on a component parameter of interest and by targeting avoids the marginalization paradoxes of Dawid and co-workers. This modifies Jeffreys's prior and provides extensions to the development of Welch and Peers. ... combined to explore priors for a vector parameter of interest in the presence of a vector nuisance parameter. Examples ... illustrate the computation of the priors."

likelihood
estimation
default_priors
bayesianism
statistics
nuisance_parameters
october 2010 by cshalizi

"Predictive Likelihood Inference, with Applications" - Butler, Journal of the Royal Statistical Society. Series B (Methodological), Vol. 48, No. 1 (1986), pp. 1-38

july 2010 by cshalizi

"in the predictive setting, all parameters are nuisance parameters": yes!

prediction
likelihood
estimation
statistics
july 2010 by cshalizi

[1003.0691] Statistical and Computational Tradeoffs in Stochastic Composite Likelihood

march 2010 by cshalizi

"Maximum likelihood estimators are often of limited practical use due to the intensive computation they require. We propose a family of alternative estimators that maximize a stochastic variation of the composite likelihood function. Each of the estimators resolve the computation-accuracy tradeoff differently, and taken together they span a continuous spectrum of computation-accuracy tradeoff resolutions. We prove the consistency of the estimators, provide formulas for their asymptotic variance, statistical robustness, and computational complexity. We discuss experimental results in the context of Boltzmann machines and conditional random fields. The theoretical and experimental studies demonstrate the effectiveness of the estimators when the computational resources are insufficient. They also demonstrate that in some cases reduced computational complexity is associated with robustness thereby increasing statistical accuracy."

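- The "keep every component" endpoint of their spectrum, plain pseudolikelihood, on the smallest possible Boltzmann machine (two spins; my toy construction, chosen so the composite estimator has a closed form — the paper's stochastic variants randomly select which conditional terms to include):

```python
import random
from math import exp, atanh

# Two-spin Boltzmann machine: p(x1, x2) proportional to exp(theta * x1 * x2),
# with x_i in {-1, +1}.

random.seed(3)
theta_true = 0.8

def sample_pair():
    """Exact draw by enumerating the model's four states."""
    w_same = exp(theta_true)    # weight of (+,+) and (-,-)
    w_diff = exp(-theta_true)   # weight of (+,-) and (-,+)
    u = random.random() * (2 * w_same + 2 * w_diff)
    if u < 2 * w_same:
        s = 1 if u < w_same else -1
        return (s, s)
    u -= 2 * w_same
    s = 1 if u < w_diff else -1
    return (s, -s)

data = [sample_pair() for _ in range(5000)]

# Maximizing sum_i [log p(x1|x2) + log p(x2|x1)] reduces, for this model,
# to matching tanh(theta) to the empirical correlation of the two spins.
m = sum(x1 * x2 for x1, x2 in data) / len(data)
theta_hat = atanh(m)
assert abs(theta_hat - theta_true) < 0.1
```
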
statistics
estimation
likelihood
computational_statistics
lebanon.guy
march 2010 by cshalizi

[math/0611376] Efficient likelihood estimation in state space models

march 2010 by cshalizi

"Motivated by studying asymptotic properties of the maximum likelihood estimator (MLE) in stochastic volatility (SV) models, in this paper we investigate likelihood estimation in state space models. We first prove, under some regularity conditions, there is a consistent sequence of roots of the likelihood equation that is asymptotically normal with the inverse of the Fisher information as its variance. With an extra assumption that the likelihood equation has a unique root for each $n$, then there is a consistent sequence of estimators of the unknown parameters. If, in addition, the supremum of the log likelihood function is integrable, the MLE exists and is strongly consistent. Edgeworth expansion of the approximate solution of likelihood equation is also established. Several examples, including Markov switching models, ARMA models, (G)ARCH models and stochastic volatility (SV) models, are given for illustration."

estimation
time_series
state-space_models
markov_models
likelihood
statistics
march 2010 by cshalizi

Likelihood for statistically equivalent models. John Copas. 2010; JRSS B

january 2010 by cshalizi

"In likelihood inference we usually assume that the model is fixed and then base inference on the corresponding likelihood function. Often, however, the choice of model is rather arbitrary, and there may be other models which fit the data equally well. We study robustness of likelihood inference over such 'statistically equivalent' models and suggest a simple 'envelope likelihood' to capture this aspect of model uncertainty. Robustness depends critically on how we specify the parameter of interest. Some asymptotic theory is presented, illustrated by three examples."

statistics
estimation
likelihood
model_uncertainty
misspecification
re:phil-of-bayes_paper
to_read
january 2010 by cshalizi

Commenges: Statistical models: Conventional, penalized and hierarchical likelihood

december 2009 by cshalizi

"We give an overview of statistical models and likelihood, together with two of its variants: penalized and hierarchical likelihood. The Kullback-Leibler divergence is referred to repeatedly in the literature, for defining the misspecification risk of a model and for grounding the likelihood and the likelihood cross-validation, which can be used for choosing weights in penalized likelihood. Families of penalized likelihood and particular sieves estimators are shown to be equivalent. The similarity of these likelihoods with a posteriori distributions in a Bayesian approach is considered."

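- The "similarity with a posteriori distributions" in its sharpest form (my toy check, exact for Gaussian data with a quadratic penalty):

```python
import random

# For x_i ~ N(mu, 1), the ridge-penalized MLE equals the Bayesian posterior
# mode under a zero-mean normal prior whose precision is twice the penalty
# weight. Both maximizers are available in closed form, so we can compare.

random.seed(5)
xs = [random.gauss(1.5, 1.0) for _ in range(40)]
n, s = len(xs), sum(xs)
lam = 3.0

# argmax of -0.5 * sum((x - mu)^2) - lam * mu^2:
penalized_argmax = s / (n + 2 * lam)
# posterior mode with prior mu ~ N(0, tau2), tau2 = 1 / (2 * lam):
tau2 = 1.0 / (2 * lam)
map_estimate = s / (n + 1.0 / tau2)
assert abs(penalized_argmax - map_estimate) < 1e-12
```
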
statistics
likelihood
cross-validation
re:phil-of-bayes_paper
to_read
december 2009 by cshalizi

Accurate Parametric Inference for Small Samples

june 2009 by cshalizi

Looks like a teaser for the book by Brazzale, Davison and Reid.

statistics
estimation
likelihood
asymptotics
have_skimmed
june 2009 by cshalizi

[0708.2184] Monte Carlo likelihood inference for missing data models

november 2007 by cshalizi

"We describe a Monte Carlo method to approximate the maximum likelihood estimate (MLE), when there are missing data and the observed data likelihood is not available in closed form. This method uses simulated missing data that are independent and identically distributed and independent of the observed data. Our Monte Carlo approximation to the MLE is a consistent and asymptotically normal estimate of the minimizer $\theta^*$ of the Kullback--Leibler information, as both Monte Carlo and observed data sample sizes go to infinity simultaneously. Plug-in estimates of the asymptotic variance are provided for constructing confidence regions for $\theta^*$. We give Logit--Normal generalized linear mixed model examples, calculated using an R package."

- "Have read" in the sense of skipping all the proofs, but wanting to go back to them.

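- The scheme in miniature (my toy missing-data model, chosen so the exact MLE is the sample mean of the observed data; the paper handles Logit-Normal GLMMs):

```python
import random
from math import exp, log, pi, sqrt

# Toy model (mine): missing z_i ~ N(theta, 1), observed y_i = z_i + N(0, 1)
# noise, so y_i ~ N(theta, 2) exactly. The observed-data likelihood is
# approximated by importance sampling over simulated missing data drawn
# once, iid and independent of the observations, as in the paper's scheme.

random.seed(4)
theta_true = 1.0
n, m = 100, 400
data = [random.gauss(theta_true, 1.0) + random.gauss(0.0, 1.0) for _ in range(n)]

def npdf(x, mean, sd):
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2.0 * pi))

# Simulated missing data: iid from a fixed importance density h = N(0, 2),
# reused for every candidate theta.
zs = [random.gauss(0.0, 2.0) for _ in range(m)]

def mc_loglik(theta):
    total = 0.0
    for y in data:
        # L_hat(theta; y) = (1/m) * sum_k f(y | z_k) * f(z_k; theta) / h(z_k)
        lhat = sum(npdf(y, z, 1.0) * npdf(z, theta, 1.0) / npdf(z, 0.0, 2.0)
                   for z in zs) / m
        total += log(lhat)
    return total

grid = [0.5 + 0.1 * k for k in range(11)]   # candidate thetas in [0.5, 1.5]
theta_hat = max(grid, key=mc_loglik)
exact_mle = sum(data) / n
assert abs(theta_hat - exact_mle) < 0.25
```
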
statistics
monte_carlo
missing_data
in_NB
geyer.charles_j.
have_read
to_teach:undergrad-ADA
likelihood
estimation
november 2007 by cshalizi
