cshalizi + likelihood   69

[1908.08741] A relation between log-likelihood and cross-validation log-scores
"It is shown that the log-likelihood of a hypothesis or model given some data is equivalent to an average of all leave-one-out cross-validation log-scores that can be calculated from all subsets of the data. This relation can be generalized to any k-fold cross-validation log-scores."

--- This sounds funny, because leave-one-out is (asymptotically) equivalent to the robustified AIC (= Takeuchi information criterion).

--- ETA after reading: The algebra looks legit, but kinda pointless.
statistics  likelihood  cross-validation  have_read  shot_after_a_fair_trial  not_worth_putting_in_notebooks 
6 weeks ago by cshalizi
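A quick numerical check of the 1908.08741 identity, as a minimal Python sketch under assumptions of my own. First, "log-likelihood" is read as the log marginal likelihood log p(y_1, ..., y_n) of a Bayesian model. Second, the model is a conjugate Beta(a, b)-Bernoulli one, so every posterior predictive has a closed form. The chain rule log p(y_1..y_n) = Σ_i log p(y_σ(i) | y_σ(1..i-1)) holds for every ordering σ. Averaging it over all orderings turns its k-th term into the mean leave-one-out log-score over all size-k subsets, and summing over k recovers the log marginal likelihood.

```python
import itertools
import math


def log_beta(x, y):
    """log B(x, y) via log-gamma."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def log_marginal(ys, a=1.0, b=1.0):
    """Exact log p(y_1, ..., y_n) under a Beta(a, b)-Bernoulli model."""
    s, n = sum(ys), len(ys)
    return log_beta(a + s, b + n - s) - log_beta(a, b)


def log_predictive(y, cond, a=1.0, b=1.0):
    """log p(y | cond): posterior predictive of one Bernoulli draw given data `cond`."""
    p1 = (a + sum(cond)) / (a + b + len(cond))
    return math.log(p1 if y == 1 else 1.0 - p1)


def mean_loo_score(ys, k, a=1.0, b=1.0):
    """Average, over all size-k subsets S, of the mean LOO log-score within S."""
    total, count = 0.0, 0
    for subset in itertools.combinations(range(len(ys)), k):
        score = 0.0
        for j in subset:
            rest = [ys[i] for i in subset if i != j]  # hold out point j
            score += log_predictive(ys[j], rest, a, b)
        total += score / k
        count += 1
    return total / count


def summed_cv_scores(ys, a=1.0, b=1.0):
    """Sum over k = 1..n of the averaged size-k LOO log-scores."""
    return sum(mean_loo_score(ys, k, a, b) for k in range(1, len(ys) + 1))


ys = [1, 0, 1, 1, 0, 1]
print(log_marginal(ys), summed_cv_scores(ys))  # the two numbers agree
```

For these six observations both numbers equal log(1/105). The brute-force sum enumerates all 2^n - 1 non-empty subsets, so it is only practical for small n.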
A Likelihood Ratio Approach to Sequential Change Point Detection for a General Class of Parameters: Journal of the American Statistical Association: Vol 0, No 0
"In this article, we propose a new approach for sequential monitoring of a general class of parameters of a d-dimensional time series, which can be estimated by approximately linear functionals of the empirical distribution function. We consider a closed-end method, which is motivated by the likelihood ratio test principle and compare the new method with two alternative procedures. We also incorporate self-normalization such that estimation of the long-run variance is not necessary. We prove that for a large class of testing problems the new detection scheme has asymptotic level α and is consistent. The asymptotic theory is illustrated for the important cases of monitoring a change in the mean, variance, and correlation. By means of a simulation study it is demonstrated that the new test performs better than the currently available procedures for these problems. Finally, the methodology is illustrated by a small data example investigating index prices from the dot-com bubble."
to:NB  change-point_problem  likelihood  statistics 
10 weeks ago by cshalizi
[1907.09611] Asymptotic normality, concentration, and coverage of generalized posteriors
"Generalized likelihoods are commonly used to obtain consistent estimators with attractive computational and robustness properties. Formally, any generalized likelihood can be used to define a generalized posterior distribution, but an arbitrarily defined "posterior" cannot be expected to appropriately quantify uncertainty in any meaningful sense. In this article, we provide sufficient conditions under which generalized posteriors exhibit concentration, asymptotic normality (Bernstein-von Mises), an asymptotically correct Laplace approximation, and asymptotically correct frequentist coverage. We apply our results in detail to generalized posteriors for a wide array of generalized likelihoods, including pseudolikelihoods in general, the Ising model pseudolikelihood, the Gaussian Markov random field pseudolikelihood, the fully observed Boltzmann machine pseudolikelihood, the Cox proportional hazards partial likelihood, and a median-based likelihood for robust inference of location. Further, we show how our results can be used to easily establish the asymptotics of standard posteriors for exponential families and generalized linear models. We make no assumption of model correctness so that our results apply with or without misspecification."
to:NB  bayesian_consistency  statistics  to_read  likelihood  misspecification 
11 weeks ago by cshalizi
Mitchell , Allman , Rhodes : Hypothesis testing near singularities and boundaries
"The likelihood ratio statistic, with its asymptotic χ² distribution at regular model points, is often used for hypothesis testing. However, the asymptotic distribution can differ at model singularities and boundaries, suggesting the use of a χ² might be problematic nearby. Indeed, its poor behavior for testing near singularities and boundaries is apparent in simulations, and can lead to conservative or anti-conservative tests. Here we develop a new distribution designed for use in hypothesis testing near singularities and boundaries, which asymptotically agrees with that of the likelihood ratio statistic. For two example trinomial models, arising in the context of inference of evolutionary trees, we show the new distributions outperform a χ²."
to:NB  hypothesis_testing  likelihood  statistics 
12 weeks ago by cshalizi
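The failure of the χ² reference distribution at a boundary already shows up in the simplest textbook case, sketched below in Python. The test is H0: μ = 0 against μ ≥ 0 for N(μ, 1) data with known variance. This is a standard illustration I chose, not one of the paper's trinomial models. Because the constrained MLE is max(x̄, 0), the statistic is exactly zero whenever x̄ ≤ 0. So under H0 its limit is an equal mixture of a point mass at 0 and χ²₁, rather than χ²₁ itself.

```python
import random


def lr_statistic(xs):
    """-2 log LR for H0: mu = 0 vs H1: mu >= 0 with N(mu, 1) data.

    The constrained MLE is max(xbar, 0), so the statistic is n * max(xbar, 0)**2.
    """
    n = len(xs)
    xbar = sum(xs) / n
    return n * max(xbar, 0.0) ** 2


def simulate(reps=4000, n=50, seed=1):
    """Draw the statistic repeatedly under H0 (mu = 0)."""
    rng = random.Random(seed)
    return [lr_statistic([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)]


stats = simulate()
frac_zero = sum(s == 0.0 for s in stats) / len(stats)
CHI2_1_95 = 3.841  # 95% quantile of chi-squared with 1 degree of freedom
naive_size = sum(s > CHI2_1_95 for s in stats) / len(stats)
print(frac_zero, naive_size)  # roughly 0.5 and 0.025
```

About half the simulated statistics are zero. The nominal 5% χ²₁ test rejects only about 2.5% of the time, so it is conservative. The mixture's 95% point is the 90% point of χ²₁, about 2.706.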
Likelihood Ratio Tests for a Large Directed Acyclic Graph: Journal of the American Statistical Association: Vol 0, No 0
"Inference of directional pairwise relations between interacting units in a directed acyclic graph (DAG), such as a regulatory gene network, is common in practice, imposing challenges because of lack of inferential tools. For example, inferring a specific gene pathway of a regulatory gene network is biologically important. Yet, frequentist inference of directionality of connections remains largely unexplored for regulatory models. In this article, we propose constrained likelihood ratio tests for inference of the connectivity as well as directionality subject to nonconvex acyclicity constraints in a Gaussian directed graphical model. Particularly, we derive the asymptotic distributions of the constrained likelihood ratios in a high-dimensional situation. For testing of connectivity, the asymptotic distribution is either chi-squared or normal depending on if the number of testable links in a DAG model is small. For testing of directionality, the asymptotic distribution is the minimum of d independent chi-squared variables with one-degree of freedom or a generalized Gamma distribution depending on if d is small, where d is number of breakpoints in a hypothesized pathway. Moreover, we develop a computational method to perform the proposed tests, which integrates an alternating direction method of multipliers and difference convex programming. Finally, the power analysis and simulations suggest that the tests achieve the desired objectives of inference. An analysis of an Alzheimer’s disease gene expression dataset illustrates the utility of the proposed method to infer a directed pathway in a gene network."
to:NB  graphical_models  likelihood  hypothesis_testing  statistics 
12 weeks ago by cshalizi
[1703.07963] A Donsker-type Theorem for Log-likelihood Processes
"Let (Ω,,()t≥0,P) be a complete stochastic basis, X a semimartingale with predictable compensator (B,C,ν). Consider a family of probability measures P=(Pn,ψ,ψ∈Ψ,n≥1), where Ψ is an index set, Pn,ψ≪locP, and denote the likelihood ratio process by Zn,ψt=dPn,ψ|tdP|t. Under some regularity conditions in terms of logarithm entropy and Hellinger processes, we prove that logZnt converges weakly to a Gaussian process in ℓ∞(Ψ) as n→∞ for each fixed t>0."
to:NB  statistics  likelihood  convergence_of_stochastic_processes 
june 2019 by cshalizi
[1905.11505] Validation of Approximate Likelihood and Emulator Models for Computationally Intensive Simulations
"Complex phenomena are often modeled with computationally intensive feed-forward simulations for which a tractable analytic likelihood does not exist. In these cases, it is sometimes necessary to use an approximate likelihood or faster emulator model for efficient statistical inference. We describe a new two-sample testing framework for quantifying the quality of the fit to simulations at fixed parameter values. This framework can leverage any regression method to handle complex high-dimensional data and attain higher power in settings where well-known distance-based tests would not. We also introduce a statistically rigorous test for assessing global goodness-of-fit across simulation parameters. In cases where the fit is inadequate, our method provides valuable diagnostics by allowing one to identify regions in both feature and parameter space which the model fails to reproduce well. We provide both theoretical results and examples which illustrate the effectiveness of our approach."
to:NB  statistics  simulation  likelihood  kith_and_kin  lee.ann_b.  izbicki.rafael 
may 2019 by cshalizi
[1805.07454] Fisher Efficient Inference of Intractable Models
"Maximum Likelihood Estimators (MLE) has many good properties. For example, the asymptotic variance of MLE solution attains equality of the asymptotic Cramér-Rao lower bound (efficiency bound), which is the minimum possible variance for an unbiased estimator. However, obtaining such MLE solution requires calculating the likelihood function which may not be tractable due to the normalization term of the density model. In this paper, we derive a Discriminative Likelihood Estimator (DLE) from the Kullback-Leibler divergence minimization criterion implemented via density ratio estimation procedure and Stein operator. We study the problem of model inference using DLE. We prove its consistency and show the asymptotic variance of its solution can also attain the equality of the efficiency bound under mild regularity conditions. We also propose a dual formulation of DLE which can be easily optimized. Numerical studies validate our asymptotic theorems and we give an example where DLE successfully estimates an intractable model constructed using a pre-trained deep neural network."
to:NB  likelihood  estimation  statistics 
may 2019 by cshalizi
[1905.09715] An illustration of the risk of borrowing information via a shared likelihood
"A concrete, stylized example illustrates that inferences may be degraded, rather than improved, by incorporating supplementary data via a joint likelihood. In the example, the likelihood is assumed to be correctly specified, as is the prior over the parameter of interest; all that is necessary for the joint modeling approach to suffer is misspecification of the prior over a nuisance parameter."
to:NB  misspecification  likelihood  statistics  hahn.p._richard 
may 2019 by cshalizi
Bretó : Modeling and Inference for Infectious Disease Dynamics: A Likelihood-Based Approach
"Likelihood-based statistical inference has been considered in most scientific fields involving stochastic modeling. This includes infectious disease dynamics, where scientific understanding can help capture biological processes in so-called mechanistic models and their likelihood functions. However, when the likelihood of such mechanistic models lacks a closed-form expression, computational burdens are substantial. In this context, algorithmic advances have facilitated likelihood maximization, promoting the study of novel data-motivated mechanistic models over the last decade. Reviewing these models is the focus of this paper. In particular, we highlight statistical aspects of these models like overdispersion, which is key in the interface between nonlinear infectious disease modeling and data analysis. We also point out potential directions for further model exploration."
to:NB  epidemic_models  likelihood  statistics 
may 2019 by cshalizi
A Composite Likelihood Framework for Analyzing Singular DSGE Models | The Review of Economics and Statistics | MIT Press Journals
"This paper builds on the composite likelihood concept of Lindsay (1988) to develop a framework for parameter identification, estimation, inference, and forecasting in dynamic stochastic general equilibrium (DSGE) models allowing for stochastic singularity. The framework consists of four components. First, it provides a necessary and sufficient condition for parameter identification, where the identifying information is provided by the first- and second-order properties of nonsingular submodels. Second, it provides a procedure based on Markov Chain Monte Carlo for parameter estimation. Third, it delivers confidence sets for structural parameters and impulse responses that allow for model misspecification. Fourth, it generates forecasts for all the observed endogenous variables, irrespective of the number of shocks in the model. The framework encompasses the conventional likelihood analysis as a special case when the model is nonsingular. It enables the researcher to start with a basic model and then gradually incorporate more shocks and other features, meanwhile confronting all the models with the data to assess their implications. The methodology is illustrated using both small- and medium-scale DSGE models. These models have numbers of shocks ranging between 1 and 7."
to:NB  state-space_models  economics  time_series  macroeconomics  statistics  likelihood  re:your_favorite_dsge_sucks 
january 2019 by cshalizi
[1507.04553] Approximate Maximum Likelihood Estimation
"In recent years, methods of approximate parameter estimation have attracted considerable interest in complex problems where exact likelihoods are hard to obtain. In their most basic form, Bayesian methods such as Approximate Bayesian Computation (ABC) involve sampling from the parameter space and keeping those parameters that produce data that fit sufficiently well to the actually observed data. Exploring the whole parameter space, however, makes this approach inefficient in high dimensional problems. This led to the proposal of more sophisticated iterative methods of inference such as particle filters.
"Here, we propose an alternative approach that is based on stochastic gradient methods and applicable both in a frequentist and a Bayesian setting. By moving along a simulated gradient, the algorithm produces a sequence of estimates that will eventually converge either to the maximum likelihood estimate or to the maximum of the posterior distribution, in each case under a set of observed summary statistics. To avoid reaching only a local maximum, we propose to run the algorithm from a set of random starting values.
"As good tuning of the algorithm is important, we explored several tuning strategies, and propose a set of guidelines that worked best in our simulations. We investigate the performance of our approach in simulation studies, and also apply the algorithm to two models with intractable likelihood functions. First, we present an application to inference in the context of queuing systems. We also re-analyze population genetic data and estimate parameters describing the demographic history of Sumatran and Bornean orang-utan populations."
in_NB  statistics  computational_statistics  stochastic_approximation  likelihood  estimation  primates 
august 2015 by cshalizi
[1506.01831] Handy sufficient conditions for the convergence of the maximum likelihood estimator in observation-driven models
"This paper generalizes asymptotic properties obtained in the observation-driven time series models considered by \cite{dou:kou:mou:2013} in the sense that the conditional law of each observation is also permitted to depend on the parameter. The existence of ergodic solutions and the consistency of the Maximum Likelihood Estimator (MLE) are derived under easy-to-check conditions. The obtained conditions appear to apply for a wide class of models. We illustrate our results with specific observation-driven time series, including the recently introduced NBIN-GARCH and NM-GARCH models, demonstrating the consistency of the MLE for these two models."
in_NB  statistics  likelihood  estimation  statistical_inference_for_stochastic_processes  douc.randal  chains_with_complete_connections 
july 2015 by cshalizi
Information-theoretic optimality of observation-driven time series models for continuous responses
"We investigate information-theoretic optimality properties of the score function of the predictive likelihood as a device for updating a real-valued time-varying parameter in a univariate observation-driven model with continuous responses. We restrict our attention to models with updates of one lag order. The results provide theoretical justification for a class of score-driven models which includes the generalized autoregressive conditional heteroskedasticity model as a special case. Our main contribution is to show that only parameter updates based on the score will always reduce the local Kullback–Leibler divergence between the true conditional density and the model-implied conditional density. This result holds irrespective of the severity of model misspecification. We also show that use of the score leads to a considerably smaller global Kullback–Leibler divergence in empirically relevant settings. We illustrate the theory with an application to time-varying volatility models. We show that the reduction in Kullback–Leibler divergence across a range of different settings can be substantial compared to updates based on, for example, squared lagged observations."
in_NB  statistics  information_theory  estimation  likelihood  prediction  time_series  chains_with_complete_connections 
june 2015 by cshalizi
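Here is a minimal Python sketch of a score-driven update for a time-varying variance f_t with y_t ~ N(0, f_t). It assumes (my choice) that the score is scaled by the inverse Fisher information. The score of log N(y; 0, f) with respect to f is (y² - f)/(2f²), and the information is 1/(2f²), so the scaled score is y² - f. The recursion f_{t+1} = ω + α(y_t² - f_t) + βf_t is therefore a GARCH(1,1) with ARCH coefficient α and GARCH coefficient β - α. This is the special case the abstract mentions.

```python
def score_driven_variance(ys, omega, alpha, beta, f0):
    """Score-driven variance path for y_t ~ N(0, f_t).

    Score w.r.t. f: (y**2 - f) / (2 f**2); Fisher information: 1 / (2 f**2).
    Scaling the score by the inverse information gives y**2 - f.
    """
    fs = [f0]
    for y in ys:
        f = fs[-1]
        scaled_score = y * y - f
        fs.append(omega + alpha * scaled_score + beta * f)
    return fs


def garch11_variance(ys, omega, a, b, f0):
    """Plain GARCH(1,1) recursion: f_{t+1} = omega + a * y_t**2 + b * f_t."""
    fs = [f0]
    for y in ys:
        fs.append(omega + a * y * y + b * fs[-1])
    return fs


ys = [0.5, -1.2, 0.3, 2.0, -0.7]
print(score_driven_variance(ys, omega=0.1, alpha=0.2, beta=0.9, f0=1.0))
print(garch11_variance(ys, omega=0.1, a=0.2, b=0.7, f0=1.0))  # same path, up to rounding
```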
[1409.7458] Beyond Maximum Likelihood: from Theory to Practice
"Maximum likelihood is the most widely used statistical estimation technique. Recent work by the authors introduced a general methodology for the construction of estimators for functionals in parametric models, and demonstrated improvements - both in theory and in practice - over the maximum likelihood estimator (MLE), particularly in high dimensional scenarios involving parameter dimension comparable to or larger than the number of samples. This approach to estimation, building on results from approximation theory, is shown to yield minimax rate-optimal estimators for a wide class of functionals, implementable with modest computational requirements. In a nutshell, a message of this recent work is that, for a wide class of functionals, the performance of these essentially optimal estimators with $n$ samples is comparable to that of the MLE with $n \ln n$ samples.
"In the present paper, we highlight the applicability of the aforementioned methodology to statistical problems beyond functional estimation, and show that it can yield substantial gains. For example, we demonstrate that for learning tree-structured graphical models, our approach achieves a significant reduction of the required data size compared with the classical Chow–Liu algorithm, which is an implementation of the MLE, to achieve the same accuracy. The key step in improving the Chow–Liu algorithm is to replace the empirical mutual information with the estimator for mutual information proposed by the authors. Further, applying the same replacement approach to classical Bayesian network classification, the resulting classifiers uniformly outperform the previous classifiers on 26 widely used datasets."
to:NB  estimation  likelihood  statistics 
january 2015 by cshalizi
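The Chow–Liu step the abstract refers to works in two stages. First, estimate the mutual information of every pair of variables. Then keep a maximum-weight spanning tree over those weights. Below is a minimal Python sketch using the ordinary plug-in (empirical) mutual information. The `mi` argument is a hypothetical hook where a better estimator, such as the authors' one (not implemented here), could be swapped in. The three-variable Markov chain used as test data is my own toy example.

```python
import math
import random
from collections import Counter


def plugin_mutual_information(xs, ys):
    """Empirical (plug-in) mutual information, in nats, of two discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(c / n * math.log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def chow_liu_tree(columns, mi=plugin_mutual_information):
    """Maximum-weight spanning tree on pairwise MI (Kruskal with union-find)."""
    d = len(columns)
    parent = list(range(d))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    weighted = sorted(
        ((mi(columns[i], columns[j]), i, j) for i in range(d) for j in range(i + 1, d)),
        reverse=True,
    )
    edges = []
    for _, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) creates no cycle
            parent[ri] = rj
            edges.append((i, j))
    return edges


rng = random.Random(0)


def noisy_copy(bits, flip=0.1):
    """Copy a bit sequence, flipping each bit with probability `flip`."""
    return [b ^ (rng.random() < flip) for b in bits]


x0 = [rng.randint(0, 1) for _ in range(2000)]
x1 = noisy_copy(x0)
x2 = noisy_copy(x1)
edges = chow_liu_tree([x0, x1, x2])
print(edges)
```

Here X0 → X1 → X2 is a chain of noisy copies, so the recovered edges should be (0, 1) and (1, 2).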
[1410.2568] Rising Above Chaotic Likelihoods
"Berliner (Likelihood and Bayesian prediction for chaotic systems, J. Am. Stat. Assoc. 1991) identified a number of difficulties in using the likelihood function within the Bayesian paradigm for state estimation and parameter estimation of chaotic systems. Even when the equations of the system are given, he demonstrated "chaotic likelihood functions" of initial conditions and parameter values in the 1-D Logistic Map. Chaotic likelihood functions, while ultimately smooth, have such complicated small scale structure as to cast doubt on the possibility of identifying high likelihood estimates in practice. In this paper, the challenge of chaotic likelihoods is overcome by embedding the observations in a higher dimensional sequence-space, which is shown to allow good state estimation with finite computational power. An Importance Sampling approach is introduced, where Pseudo-orbit Data Assimilation is employed in the sequence-space in order first to identify relevant pseudo-orbits and then relevant trajectories. Estimates are identified with likelihoods orders of magnitude higher than those previously identified in the examples given by Berliner. Importance Sampling uses the information from both system dynamics and observations. Using the relevant prior will, of course, eventually yield an accountable sample, but given the same computational resource this traditional approach would provide no high likelihood points at all. Berliner's central conclusion is supported. "chaotic likelihood functions" for parameter estimation still pose challenge; this fact is used to clarify why physical scientists tend to maintain a strong distinction between the initial condition uncertainty and parameter uncertainty."
to:NB  statistics  dynamical_systems  chaos  time_series  likelihood  smith.leonard 
january 2015 by cshalizi
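Berliner's difficulty is easy to reproduce with the Python sketch below, which illustrates the problem only and is not the paper's pseudo-orbit importance sampler. It takes noisy observations of the logistic map x_{t+1} = 4x_t(1 - x_t), with the parameter known. It then counts local maxima of the log-likelihood of the initial condition on a fine grid as the observation window grows. The noise level, true initial condition and grid are my own choices.

```python
import random


def logistic_orbit(x0, steps, r=4.0):
    """First `steps` points of the logistic map orbit starting at x0."""
    xs = [x0]
    for _ in range(steps - 1):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def log_likelihood_x0(x0, obs, sigma):
    """Gaussian-noise log-likelihood of an initial condition (up to a constant)."""
    orbit = logistic_orbit(x0, len(obs))
    return -sum((y - x) ** 2 for y, x in zip(obs, orbit)) / (2 * sigma**2)


def count_local_maxima(values):
    """Number of interior grid points strictly higher than both neighbours."""
    return sum(values[i - 1] < values[i] > values[i + 1] for i in range(1, len(values) - 1))


rng = random.Random(3)
sigma, true_x0 = 0.05, 0.3
obs = [x + rng.gauss(0.0, sigma) for x in logistic_orbit(true_x0, 20)]
grid = [i / 2000 for i in range(2001)]
for t in (1, 5, 20):
    ll = [log_likelihood_x0(x0, obs[:t], sigma) for x0 in grid]
    print(t, count_local_maxima(ll))
```

With one observation there is a single peak. With twenty, nearby initial conditions have unrelated orbits after a few steps, and the grid log-likelihood has a great many local maxima.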
Maximum Likelihood Estimation of Misspecified Models
"This paper examines the consequences and detection of model misspecification when using maximum likelihood techniques for estimation and inference. The quasi-maximum likelihood estimator (QMLE) converges to a well defined limit, and may or may not be consistent for particular parameters of interest. Standard tests (Wald, Lagrange Multiplier, or Likelihood Ratio) are invalid in the presence of misspecification, but more general statistics are given which allow inferences to be drawn robustly. The properties of the QMLE and the information matrix are exploited to yield several useful tests for model misspecification."
to:NB  likelihood  estimation  misspecification  statistics  white.halbert 
september 2014 by cshalizi
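The "more general statistics" of the abstract rest on the sandwich covariance A⁻¹BA⁻¹. Here A is the expected negative Hessian of the log-likelihood and B is the expected outer product of the score. The two coincide when the model is correct, which is what the information-matrix test exploits. Below is a minimal Python sketch for a toy case of my choosing: a Poisson model fitted to overdispersed counts, where the QMLE is the sample mean.

```python
def poisson_qmle_variances(xs):
    """Poisson QMLE of the mean, with model-based and sandwich variances.

    Per-observation log-likelihood: x*log(lam) - lam - log(x!).
    Score: x/lam - 1;  Hessian: -x/lam**2.
    """
    n = len(xs)
    lam = sum(xs) / n                              # QMLE = sample mean
    a = sum(x / lam**2 for x in xs) / n            # A: mean negative Hessian
    b = sum((x / lam - 1.0) ** 2 for x in xs) / n  # B: mean squared score
    model_var = 1.0 / (n * a)                      # valid only if the Poisson model is right
    sandwich_var = b / (n * a * a)                 # A^{-1} B A^{-1} / n
    return lam, model_var, sandwich_var


xs = [0, 0, 1, 5, 9, 0, 2, 7]
print(poisson_qmle_variances(xs))
```

For these counts the QMLE is 3 and the Poisson-based variance is 3/8 = 0.375. The sandwich variance is the sample variance over n, 11/8 = 1.375. The model-based standard error is too small because the data are overdispersed.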
High-Dimensional Density Ratio Estimation with Extensions to Approximate Likelihood Computation | AISTATS 2014 | JMLR W&CP
"The ratio between two probability density functions is an important component of various tasks, including selection bias correction, novelty detection and classification. Recently, several estimators of this ratio have been proposed. Most of these methods fail if the sample space is high-dimensional, and hence require a dimension reduction step, the result of which can be a significant loss of information. Here we propose a simple-to-implement, fully nonparametric density ratio estimator that expands the ratio in terms of the eigenfunctions of a kernel-based operator; these functions reflect the underlying geometry of the data (e.g., submanifold structure), often leading to better estimates without an explicit dimension reduction step. We show how our general framework can be extended to address another important problem, the estimation of a likelihood function in situations where that function cannot be well-approximated by an analytical form. One is often faced with this situation when performing statistical inference with data from the sciences, due to the complexity of the data and of the processes that generated those data. We emphasize applications where using existing likelihood-free methods of inference would be challenging due to the high dimensionality of the sample space, but where our spectral series method yields a reasonable estimate of the likelihood function. We provide theoretical guarantees and illustrate the effectiveness of our proposed method with numerical experiments."
density_estimation  density_ratio_estimation  likelihood  spectral_methods  high-dimensional_statistics  kith_and_kin  lee.ann_b.  izbicki.rafael  read_the_thesis  in_NB 
april 2014 by cshalizi
Parameter Estimation for Hidden Markov Models with Intractable Likelihoods - Dean - 2014 - Scandinavian Journal of Statistics - Wiley Online Library
"Approximate Bayesian computation (ABC) is a popular technique for analysing data for complex models where the likelihood function is intractable. It involves using simulation from the model to approximate the likelihood, with this approximate likelihood then being used to construct an approximate posterior. In this paper, we consider methods that estimate the parameters by maximizing the approximate likelihood used in ABC. We give a theoretical analysis of the asymptotic properties of the resulting estimator. In particular, we derive results analogous to those of consistency and asymptotic normality for standard maximum likelihood estimation. We also discuss how sequential Monte Carlo methods provide a natural method for implementing our likelihood-based ABC procedures."
to:NB  estimation  state-space_models  approximate_bayesian_computation  likelihood  statistics  time_series  singh.sumeetpal 
march 2014 by cshalizi
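The basic move here is to maximize the simulation-based ABC likelihood instead of sampling from the ABC posterior, and it can be sketched in a few lines of Python. The sketch is a toy i.i.d. Gaussian-mean example with a uniform kernel on the sample mean and a grid search. All of these are my assumptions; the paper itself works with hidden Markov models and sequential Monte Carlo. The ABC likelihood of θ is estimated by the fraction of simulated data sets whose summary lands within eps of the observed one.

```python
import random


def abc_mle(observed_mean, grid, n=20, sims=1000, eps=0.1, seed=0):
    """Grid-search maximizer of a simulated ABC likelihood for a N(theta, 1) mean."""
    rng = random.Random(seed)

    def abc_likelihood(theta):
        hits = 0
        for _ in range(sims):
            sim_mean = sum(rng.gauss(theta, 1.0) for _ in range(n)) / n
            hits += abs(sim_mean - observed_mean) < eps  # uniform kernel on the summary
        return hits / sims  # proportional to the ABC likelihood of theta

    return max(grid, key=abc_likelihood)


grid = [i / 10 for i in range(21)]  # theta = 0.0, 0.1, ..., 2.0
theta_hat = abc_mle(1.0, grid)
print(theta_hat)
```

The grid point with the most hits should sit close to the observed mean of 1.0. Shrinking eps reduces the ABC bias but needs more simulations per θ.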
Likelihood Ratio Tests for Dependent Data with Applications to Longitudinal and Functional Data Analysis - Staicu - 2014 - Scandinavian Journal of Statistics - Wiley Online Library
"This paper introduces a general framework for testing hypotheses about the structure of the mean function of complex functional processes. Important particular cases of the proposed framework are as follows: (1) testing the null hypothesis that the mean of a functional process is parametric against a general alternative modelled by penalized splines; and (2) testing the null hypothesis that the means of two possibly correlated functional processes are equal or differ by only a simple parametric function. A global pseudo-likelihood ratio test is proposed, and its asymptotic distribution is derived. The size and power properties of the test are confirmed in realistic simulation scenarios. Finite-sample power results indicate that the proposed test is much more powerful than competing alternatives. Methods are applied to testing the equality between the means of normalized δ-power of sleep electroencephalograms of subjects with sleep-disordered breathing and matched controls."
to:NB  likelihood  hypothesis_testing  splines  nonparametrics  misspecification  statistics  to_teach:undergrad-ADA 
march 2014 by cshalizi
Likelihood Methods for Point Processes with Refractoriness
"Likelihood-based encoding models founded on point processes have received significant attention in the literature because of their ability to reveal the information encoded by spiking neural populations. We propose an approximation to the likelihood of a point-process model of neurons that holds under assumptions about the continuous time process that are physiologically reasonable for neural spike trains: the presence of a refractory period, the predictability of the conditional intensity function, and its integrability. These are properties that apply to a large class of point processes arising in applications other than neuroscience. The proposed approach has several advantages over conventional ones. In particular, one can use standard fitting procedures for generalized linear models based on iteratively reweighted least squares while improving the accuracy of the approximation to the likelihood and reducing bias in the estimation of the parameters of the underlying continuous-time model. As a result, the proposed approach can use a larger bin size to achieve the same accuracy as conventional approaches would with a smaller bin size. This is particularly important when analyzing neural data with high mean and instantaneous firing rates. We demonstrate these claims on simulated and real neural spiking activity. By allowing a substantive increase in the required bin size, our algorithm has the potential to lower the barrier to the use of point-process methods in an increasing number of applications."
in_NB  neural_data_analysis  point_processes  likelihood  statistics  brown.emery 
march 2014 by cshalizi
[1401.1026] A nonstandard empirical likelihood for time series
"Standard blockwise empirical likelihood (BEL) for stationary, weakly dependent time series requires specifying a fixed block length as a tuning parameter for setting confidence regions. This aspect can be difficult and impacts coverage accuracy. As an alternative, this paper proposes a new version of BEL based on a simple, though nonstandard, data-blocking rule which uses a data block of every possible length. Consequently, the method does not involve the usual block selection issues and is also anticipated to exhibit better coverage performance. Its nonstandard blocking scheme, however, induces nonstandard asymptotics and requires a significantly different development compared to standard BEL. We establish the large-sample distribution of log-ratio statistics from the new BEL method for calibrating confidence regions for mean or smooth function parameters of time series. This limit law is not the usual chi-square one, but is distribution-free and can be reproduced through straightforward simulations. Numerical studies indicate that the proposed method generally exhibits better coverage accuracy than standard BEL."
to:NB  likelihood  time_series  statistics 
march 2014 by cshalizi
[1402.6409] Rate of convergence in the maximum likelihood estimation for partial discrete parameter, with applications to the cluster analysis and philology
"The problem of estimation of the distribution parameters on the sample when the part of these parameters are discrete (e.g. integer) is considered. We prove that the rate of convergence of MLE estimates under the natural conditions on the distribution density is exponentially fast."
to:NB  estimation  likelihood  statistics 
march 2014 by cshalizi
[1401.6714] Information Theoretic Validity of Penalized Likelihood
"Building upon past work, which developed information theoretic notions of when a penalized likelihood procedure can be interpreted as codelengths arising from a two stage code and when the statistical risk of the procedure has a redundancy risk bound, we present new results and risk bounds showing that the $\ell_1$ penalty in Gaussian Graphical Models fits the above story. We also show how twice the traditional $\ell_0$ penalty times plus lower order terms which stay bounded on the whole parameter space has a conditional two stage description length interpretation and present risk bounds for this penalized likelihood procedure."
to:NB  information_theory  likelihood  information_criteria  barron.andrew_w.  statistics 
february 2014 by cshalizi
[1212.3647] Parametric inference in the large data limit using maximally informative models
"Motivated by data-rich experiments in transcriptional regulation and sensory neuroscience, we consider the following general problem in statistical inference. When exposed to a high-dimensional signal S, a system of interest computes a representation R of that signal which is then observed through a noisy measurement M. From a large number of signals and measurements, we wish to infer the "filter" that maps S to R. However, the standard method for solving such problems, likelihood-based inference, requires perfect a priori knowledge of the "noise function" mapping R to M. In practice such noise functions are usually known only approximately, if at all, and using an incorrect noise function will typically bias the inferred filter. Here we show that, in the large data limit, this need for a pre-characterized noise function can be circumvented by searching for filters that instead maximize the mutual information I[M;R] between observed measurements and predicted representations. Moreover, if the correct filter lies within the space of filters being explored, maximizing mutual information becomes equivalent to simultaneously maximizing every dependence measure that satisfies the Data Processing Inequality. It is important to note that maximizing mutual information will typically leave a small number of directions in parameter space unconstrained. We term these directions "diffeomorphic modes" and present an equation that allows these modes to be derived systematically. The presence of diffeomorphic modes reflects a fundamental and nontrivial substructure within parameter space, one that is obscured by standard likelihood-based inference."

--- Published version: http://dx.doi.org/10.1162/NECO_a_00568
to:NB  likelihood  estimation  information_theory  statistics  to_be_shot_after_a_fair_trial 
january 2014 by cshalizi
Taylor & Francis Online :: A Progressive Block Empirical Likelihood Method for Time Series - Journal of the American Statistical Association - Volume 108, Issue 504
"This article develops a new blockwise empirical likelihood (BEL) method for stationary, weakly dependent time processes, called the progressive block empirical likelihood (PBEL). In contrast to the standard version of BEL, which uses data blocks of constant length for a given sample size and whose performance can depend crucially on the block length selection, this new approach involves a data-blocking scheme where blocks increase in length by an arithmetic progression. Consequently, no block length selections are required for the PBEL method, which implies a certain type of robustness for this version of BEL. For inference of smooth functions of the process mean, theoretical results establish the chi-squared limit of the log-likelihood ratio based on PBEL, which can be used to calibrate confidence regions. Using the same progressive block scheme, distributional extensions are also provided for other nonparametric likelihoods with time series in the family of Cressie–Read discrepancies. Simulation evidence indicates that the PBEL method can perform comparably to the standard BEL in coverage accuracy (when the latter uses a “good” block choice) and can exhibit more stability, without the need to select a usual block length. Supplementary materials for this article are available online."
to:NB  likelihood  statistics  statistical_inference_for_stochastic_processes 
december 2013 by cshalizi
American Statistical Association Portal :: Estimability and Likelihood Inference for Generalized Linear Mixed Models Using Data Cloning - Journal of the American Statistical Association - Volume 105, Issue 492
"Maximum likelihood estimation for Generalized Linear Mixed Models (GLMM), an important class of statistical models with substantial applications in epidemiology, medical statistics, and many other fields, poses significant computational difficulties. In this article, we use data cloning, a simple computational method that exploits advances in Bayesian computation, in particular the Markov Chain Monte Carlo method, to obtain maximum likelihood estimators of the parameters in these models. This method also leads to a simple estimator of the asymptotic variance of the maximum likelihood estimators. Determining estimability of the parameters in a mixed model is, in general, a very difficult problem. Data cloning provides a simple graphical test to not only check if the full set of parameters is estimable but also, and perhaps more importantly, if a specified function of the parameters is estimable. One of the goals of mixed models is to predict random effects. We suggest a frequentist method to obtain prediction intervals for random effects. We illustrate data cloning in the GLMM context by analyzing the Logistic–Normal model for over-dispersed binary data, and the Poisson–Normal model for repeated and spatial counts data. We consider Normal–Normal and Binary–Normal mixture models to show how data cloning can be used to study estimability of various parameters. We contend that whenever hierarchical models are used, estimability of the parameters should be checked before drawing scientific inferences or making management decisions. Data cloning facilitates such a check on hierarchical models."
in_NB  to_read  partial_identification  monte_carlo  likelihood 
december 2013 by cshalizi
Data cloning: easy maximum likelihood estimation for complex ecological models using Bayesian Markov chain Monte Carlo methods - Lele - 2007 - Ecology Letters - Wiley Online Library
"We introduce a new statistical computing method, called data cloning, to calculate maximum likelihood estimates and their standard errors for complex ecological models. Although the method uses the Bayesian framework and exploits the computational simplicity of the Markov chain Monte Carlo (MCMC) algorithms, it provides valid frequentist inferences such as the maximum likelihood estimates and their standard errors. The inferences are completely invariant to the choice of the prior distributions and therefore avoid the inherent subjectivity of the Bayesian approach. The data cloning method is easily implemented using standard MCMC software. Data cloning is particularly useful for analysing ecological situations in which hierarchical statistical models, such as state-space models and mixed effects models, are appropriate. We illustrate the method by fitting two nonlinear population dynamics models to data in the presence of process and observation noise."

- There is nothing specifically ecological about this. The trick is to raise the likelihood to some large power k, as though one had observed k completely independent replicas of the data which all happened to be exactly the same, and then do ordinary Metropolis-Hastings. For sufficiently large k, the likelihood function dominates the prior and Bernstein-von Mises takes over. The posterior then concentrates around the MLE with variance approximately equal to the inverse Fisher information divided by k, so k times the posterior variance estimates the MLE's sampling variance. It's very clever, and there seem to be some extensions (can't recall if this first paper mentions them) about separating identified from unidentified parameters by seeing which posterior variances don't shrink as k is cranked up.

Ungated author copy: http://mysite.science.uottawa.ca/flutsche/PUBLICATIONS/LeleDennisLutscher2007.pdf
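
Below is a minimal sketch of the cloning trick described in the note above. It assumes a toy model: i.i.d. Poisson counts parameterized by θ = log rate, a vague N(0, 10²) prior, and a plain random-walk Metropolis sampler. The data, function names and tuning constants are illustrative choices, not taken from Lele et al. If the note is right, two things should hold under the cloned target prior(θ)·L(θ)^k. First, the posterior mean should sit near the MLE log(x̄). Second, k times the posterior variance should sit near the inverse Fisher information, which for this model is 1/Σx_i on the θ scale.

```python
import math
import random


def poisson_loglik(theta, data):
    """Poisson log-likelihood in theta = log(rate), dropping the log(x!) constant."""
    return sum(data) * theta - len(data) * math.exp(theta)


def log_prior(theta, sd=10.0):
    """Vague normal prior on theta (an arbitrary choice; it should wash out for large k)."""
    return -0.5 * (theta / sd) ** 2


def data_cloning_mh(data, k, n_iter=20000, burn_in=2000, step=0.05, seed=1):
    """Random-walk Metropolis targeting prior(theta) * L(theta)**k.

    Returns the posterior mean (approximately the MLE) and the posterior
    variance (approximately the MLE's variance divided by k).
    """
    rng = random.Random(seed)
    theta = 0.0
    current = log_prior(theta) + k * poisson_loglik(theta, data)
    draws = []
    for i in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_prior(proposal) + k * poisson_loglik(proposal, data)
        # 1 - random() lies in (0, 1], so the log is always defined
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
        if i >= burn_in:
            draws.append(theta)
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
    return mean, var


if __name__ == "__main__":
    x = [3, 5, 2, 4, 6, 3, 4, 5, 2, 4]
    k = 50
    mean, var = data_cloning_mh(x, k)
    print(mean, math.log(sum(x) / len(x)))  # posterior mean vs. log of the MLE
    print(k * var, 1.0 / sum(x))  # k * variance vs. inverse Fisher information
```

Re-running with a smaller k (say 5, with a wider step) should give a visibly larger posterior variance. Watching how such variances shrink as k grows is what the identifiability check mentioned in the note relies on.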
in_NB  monte_carlo  likelihood  re:stacs  have_read  statistics  estimation  state-space_models  hierarchical_statistical_models 
december 2013 by cshalizi
[1311.7286] Approximate Bayesian Computation with composite score functions
"Both Approximate Bayesian Computation (ABC) and composite likelihood methods are useful for Bayesian and frequentist inference when the likelihood function is intractable. We show that composite likelihoods score functions can be fruitfully used as automatic informative summary statistics in ABC in order to obtain accurate approximations to the posterior distribution of the parameter of interest. This is formally motivated by the use of the score function of the full likelihood, and extended to general unbiased estimating functions in complex models. Examples illustrate that the proposed ABC procedure can significantly improve upon usual ABC methods based on ordinary data summaries."
to:NB  approximation  likelihood  statistics  estimation 
december 2013 by cshalizi
[1207.0865] Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels
"Variational methods for parameter estimation are an active research area, potentially offering computationally tractable heuristics with theoretical performance bounds. We build on recent work that applies such methods to network data, and establish asymptotic normality rates for parameter estimates of stochastic blockmodel data, by either maximum likelihood or variational estimation. The result also applies to various sub-models of the stochastic blockmodel found in the literature."
in_NB  likelihood  estimation  community_discovery  network_data_analysis  statistics  choi.david_s.  kith_and_kin  variational_inference  bickel.peter_j. 
november 2013 by cshalizi
Reid : Aspects of likelihood inference
"I review the classical theory of likelihood based inference and consider how it is being extended and developed for use in complex models and sampling schemes."
to:NB  statistics  likelihood  estimation 
september 2013 by cshalizi
[1308.0049] A composite likelihood approach to computer model calibration using high-dimensional spatial data
"Computer models are used to model complex processes in various disciplines. Often, a key source of uncertainty in the behavior of complex computer models is uncertainty due to unknown model input parameters. Statistical computer model calibration is the process of inferring model parameter values, along with associated uncertainties, from observations of the physical process and from model outputs at various parameter settings. Observations and model outputs are often in the form of high-dimensional spatial fields, especially in the environmental sciences. Sound statistical inference may be computationally challenging in such situations. Here we introduce a composite likelihood-based approach to perform computer model calibration with high-dimensional spatial data. While composite likelihood has been studied extensively in the context of spatial statistics, computer model calibration using composite likelihood poses several new challenges. We propose a computationally efficient approach for Bayesian computer model calibration using composite likelihood. We also develop a methodology based on asymptotic theory for adjusting the composite likelihood posterior distribution so that it accurately represents posterior uncertainties. We study the application of our new approach in the context of calibration for a climate model."
to:NB  simulation  statistics  likelihood  computational_statistics  estimation 
august 2013 by cshalizi
[1307.5381] A convex pseudo-likelihood framework for high dimensional partial correlation estimation with convergence guarantees
"Sparse high dimensional graphical model selection is a topic of much interest in modern day statistics. A popular approach is to apply l1-penalties to either (1) parametric likelihoods, or, (2) regularized regression/pseudo-likelihoods, with the latter having the distinct advantage that they do not explicitly assume Gaussianity. As none of the popular methods proposed for solving pseudo-likelihood based objective functions have provable convergence guarantees, it is not clear if corresponding estimators exist or are even computable, or if they actually yield correct partial correlation graphs. This paper proposes a new pseudo-likelihood based graphical model selection method that aims to overcome some of the shortcomings of current methods, but at the same time retain all their respective strengths. In particular, we introduce a novel framework that leads to a convex formulation of the partial covariance regression graph problem, resulting in an objective function comprised of quadratic forms. The objective is then optimized via a coordinate-wise approach. The specific functional form of the objective function facilitates rigorous convergence analysis leading to convergence guarantees; an important property that cannot be established using standard results, when the dimension is larger than the sample size, as is often the case in high dimensional applications. These convergence guarantees ensure that estimators are well-defined under very general conditions, and are always computable. In addition, the approach yields estimators that have good large sample properties and also respect symmetry. Furthermore, application to simulated/real data, timing comparisons and numerical convergence is demonstrated. We also present a novel unifying framework that places all graphical pseudo-likelihood methods as special cases of a more general formulation, leading to important insights."
to:NB  convexity  graphical_models  sparsity  optimization  lasso  likelihood  statistics 
july 2013 by cshalizi
Li : Maximum-likelihood estimation for diffusion processes via closed-form density expansions
"This paper proposes a widely applicable method of approximate maximum-likelihood estimation for multivariate diffusion process from discretely sampled data. A closed-form asymptotic expansion for transition density is proposed and accompanied by an algorithm containing only basic and explicit calculations for delivering any arbitrary order of the expansion. The likelihood function is thus approximated explicitly and employed in statistical estimation. The performance of our method is demonstrated by Monte Carlo simulations from implementing several examples, which represent a wide range of commonly used diffusion models. The convergence related to the expansion and the estimation method are theoretically justified using the theory of Watanabe [Ann. Probab. 15 (1987) 1–39] and Yoshida [J. Japan Statist. Soc. 22 (1992) 139–159] on analysis of the generalized random variables under some standard sufficient conditions."
in_NB  stochastic_differential_equations  statistical_inference_for_stochastic_processes  estimation  likelihood  statistics 
july 2013 by cshalizi
[1306.5603] Consistency of maximum likelihood estimation for some dynamical systems
"We consider the asymptotic consistency of maximum likelihood parameter estimation for dynamical systems observed with noise. Under suitable conditions on the dynamical systems and the observations, we show that maximum likelihood parameter estimation is consistent. Our proof involves ideas from both information theory and dynamical systems. Furthermore, we show how some well-studied properties of dynamical systems imply the general statistical properties related to maximum likelihood estimation. Finally, we exhibit classical families of dynamical systems for which maximum likelihood estimation is consistent. Examples include shifts of finite type with Gibbs measures and Axiom A attractors with SRB measures."
in_NB  to_read  dynamical_systems  time_series  information_theory  statistics  chaos  likelihood  estimation  statistical_inference_for_stochastic_processes  pillai.natesh  nobel.andrew 
june 2013 by cshalizi
Peng , Schick : Empirical likelihood approach to goodness of fit testing
"Motivated by applications to goodness of fit testing, the empirical likelihood approach is generalized to allow for the number of constraints to grow with the sample size and for the constraints to use estimated criteria functions. The latter is needed to deal with nuisance parameters. The proposed empirical likelihood based goodness of fit tests are asymptotically distribution free. For univariate observations, tests for a specified distribution, for a distribution of parametric form, and for a symmetric distribution are presented. For bivariate observations, tests for independence are developed."
in_NB  goodness-of-fit  likelihood  regression  statistics 
june 2013 by cshalizi
[1306.4032] Playing Russian Roulette with Intractable Likelihoods
"A general scheme to exploit Exact-Approximate MCMC methodology for intractable likelihoods is suggested. By representing the intractable likelihood as an infinite Maclaurin or Geometric series expansion, unbiased estimates of the likelihood can be obtained by finite time stochastic truncations of the series via Russian Roulette sampling. Whilst the estimates of the intractable likelihood are unbiased, for unbounded unnormalised densities they induce a signed measure in the Exact-Approximate Markov chain Monte Carlo procedure which will introduce bias in the invariant distribution of the chain. By exploiting results from the Quantum Chromodynamics literature the signed measures can be employed in an Exact-Approximate sampling scheme in such a way that expectations with respect to the desired target distribution are preserved. This provides a general methodology to construct Exact-Approximate sampling schemes for a wide range of models and the methodology is demonstrated on well known examples such as posterior inference of coupling parameters in Ising models and defining the posterior for Fisher-Bingham distributions defined on the $d$-Sphere. A large scale example is provided for a Gaussian Markov Random Field model, with fine scale mesh refinement, describing the Ozone Column data. To our knowledge this is the first time that fully Bayesian inference over a model of this size has been feasible without the need to resort to any approximations. Finally a critical assessment of the strengths and weaknesses of the methodology is provided with pointers to ongoing research."
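
The basic Russian-roulette device can be sketched on a toy series with no sign problems. Here it estimates exp(x) from its Maclaurin series by stopping at a random, geometrically distributed term and reweighting each included term by the inverse of the probability of reaching it. This covers only the unbiased-truncation step, not the authors' full exact-approximate MCMC scheme. In particular, it ignores the signed-measure correction the abstract mentions. The function name and the continuation probability q are illustrative assumptions.

```python
import math
import random


def russian_roulette(term, q, rng):
    """Unbiased random-truncation estimate of sum_{j >= 0} term(j).

    After using term j we continue to term j + 1 with probability q, so term j
    is reached with probability q**j and is reweighted by 1 / q**j.
    """
    total = 0.0
    j = 0
    reach_prob = 1.0  # probability that term j is reached, i.e. q**j
    while True:
        total += term(j) / reach_prob
        if rng.random() >= q:  # stop with probability 1 - q
            return total
        j += 1
        reach_prob *= q


if __name__ == "__main__":
    x = 1.5
    rng = random.Random(0)

    def taylor_term(j):
        return x ** j / math.factorial(j)  # exp(x) = sum_j x^j / j!

    draws = [russian_roulette(taylor_term, 0.6, rng) for _ in range(100000)]
    print(sum(draws) / len(draws), math.exp(x))  # average of estimates vs. exact value
```

Any q in (0, 1) keeps this estimate unbiased, because the terms are absolutely summable. The choice of q only trades the expected number of terms, 1/(1 - q), against the variance of a single estimate.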
in_NB  monte_carlo  approximate_bayesian_computation  simulation  likelihood  estimation  statistics  to_read  re:stacs 
june 2013 by cshalizi
[1306.1493] Extended empirical likelihood for general estimating equations
"We derive an extended empirical likelihood for parameters defined by estimating equations which generalizes the original empirical likelihood for such parameters to the full parameter space. Under mild conditions, the extended empirical likelihood has all asymptotic properties of the original empirical likelihood. Its contours retain the data-driven shape of the latter. It can also attain the second order accuracy. The first order extended empirical likelihood is easy-to-use yet it is substantially more accurate than other empirical likelihoods, including second order ones. We recommend it for practical applications of the empirical likelihood method."
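
For orientation, here is a sketch of Owen's original empirical likelihood for a scalar mean, the construction the extended version generalizes. It computes -2 log R(μ) by solving for the Lagrange multiplier with bisection. It returns infinity when μ lies outside the convex hull of the data, which is the limitation that extending "to the full parameter space" addresses. The function name, the simulated data and the numerical constants are assumptions for illustration; the extended EL itself is not implemented here.

```python
import math
import random


def el_stat(xs, mu):
    """-2 log empirical likelihood ratio for the mean mu (Owen's original EL, 1-D).

    The weights maximizing prod(n * w_i) subject to sum(w_i) = 1 and
    sum(w_i * (x_i - mu)) = 0 are w_i = 1 / (n * (1 + lam * d_i)), d_i = x_i - mu,
    where lam solves sum(d_i / (1 + lam * d_i)) = 0.
    """
    d = [x - mu for x in xs]
    d_min, d_max = min(d), max(d)
    if not (d_min < 0.0 < d_max):
        return math.inf  # mu outside the convex hull of the data: R(mu) = 0
    # lam must keep every 1 + lam * d_i positive; the estimating function is
    # strictly decreasing on that interval, so bisection finds the unique root.
    lo = (-1.0 / d_max) * (1.0 - 1e-12)
    hi = (-1.0 / d_min) * (1.0 - 1e-12)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if sum(di / (1.0 + mid * di) for di in d) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * di) for di in d)


if __name__ == "__main__":
    rng = random.Random(7)
    xs = [rng.gauss(2.0, 1.0) for _ in range(100)]
    # Wilks-type calibration: reject mu at the 5% level if the statistic exceeds
    # 3.841, the 0.95 quantile of chi-squared with 1 degree of freedom.
    for mu in (1.8, 2.0, 2.2):
        print(mu, el_stat(xs, mu), el_stat(xs, mu) > 3.841)
```

Under the usual conditions the statistic is asymptotically χ²₁ at the true mean. Collecting the values of μ where it stays below 3.841 therefore gives an approximate 95% confidence interval.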
to:NB  likelihood  estimation  statistics 
june 2013 by cshalizi
[1305.5712] Fast inference in generalized linear models via expected log-likelihoods
"Generalized linear models play an essential role in a wide variety of statistical applications. This paper discusses an approximation of the likelihood in these models that can greatly facilitate computation. The basic idea is to replace a sum that appears in the exact log-likelihood by an expectation over the model covariates; the resulting "expected log-likelihood" can in many cases be computed significantly faster than the exact log-likelihood. In many neuroscience experiments the distribution over model covariates is controlled by the experimenter and the expected log-likelihood approximation becomes particularly useful; for example, estimators based on maximizing this expected log-likelihood (or a penalized version thereof) can often be obtained with orders of magnitude computational savings compared to the exact maximum likelihood estimators. A risk analysis establishes that these maximum EL estimators often come with little cost in accuracy (and in some cases even improved accuracy) compared to standard maximum likelihood estimates. Finally, we find that these methods can significantly decrease the computation time of marginal likelihood calculations for model selection and of Markov chain Monte Carlo methods for sampling from the posterior parameter distribution. We illustrate our results by applying these methods to a computationally-challenging dataset of neural spike trains obtained via large-scale multi-electrode recordings in the primate retina."
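
Here is a toy version of the expected log-likelihood idea for a Poisson GLM with log link. It assumes the experimenter draws covariates as x ~ N(0, I), the kind of controlled-stimulus setting the abstract has in mind. Under that design, n·E[exp(x·θ)] = n·exp(|θ|²/2) replaces the sum Σ exp(x_i·θ). After one pass to compute s = Σ y_i x_i, each evaluation therefore costs O(d) rather than O(nd), and maximization reduces to a one-dimensional root-finding problem. The names, the Gaussian design and the simulated data are illustrative assumptions, not the authors' code.

```python
import math
import random


def poisson_draw(lam, rng):
    """Poisson sample by Knuth's multiplication method (adequate for small rates)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate(theta, n, rng):
    """Covariates x ~ N(0, I), responses y ~ Poisson(exp(x . theta))."""
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in theta]
        X.append(x)
        y.append(poisson_draw(math.exp(sum(t * v for t, v in zip(theta, x))), rng))
    return X, y


def exact_loglik(theta, X, y):
    """Exact log-likelihood (log y! dropped): O(n * d) work per evaluation."""
    total = 0.0
    for x, yi in zip(X, y):
        eta = sum(t * v for t, v in zip(theta, x))
        total += yi * eta - math.exp(eta)
    return total


def expected_loglik(theta, s, n):
    """Expected log-likelihood: sum_i exp(x_i . theta) is replaced by
    n * E[exp(x . theta)] = n * exp(|theta|^2 / 2), valid for x ~ N(0, I).
    With s = sum_i y_i x_i precomputed once, each evaluation is O(d)."""
    return sum(a * t for a, t in zip(s, theta)) - n * math.exp(0.5 * sum(t * t for t in theta))


def max_expected_loglik(s, n):
    """Maximizer of expected_loglik: the gradient s - n * theta * exp(|theta|^2 / 2)
    vanishes at theta = c * s / |s| with c * exp(c^2 / 2) = |s| / n (bisection on c)."""
    norm = math.sqrt(sum(a * a for a in s))
    target = norm / n
    lo, hi = 0.0, 1.0
    while hi * math.exp(0.5 * hi * hi) < target:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mid * math.exp(0.5 * mid * mid) < target:
            lo = mid
        else:
            hi = mid
    c = 0.5 * (lo + hi)
    return [c * a / norm for a in s]


if __name__ == "__main__":
    rng = random.Random(0)
    theta_true = [0.3, -0.2, 0.1]
    X, y = simulate(theta_true, 20000, rng)
    s = [sum(yi * x[j] for x, yi in zip(X, y)) for j in range(len(theta_true))]
    theta_hat = max_expected_loglik(s, len(y))
    print(theta_hat)
    print(exact_loglik(theta_hat, X, y), expected_loglik(theta_hat, s, len(y)))
```

The expected log-likelihood is concave here, since exp(|θ|²/2) is convex, so the stationary point found by bisection is its unique maximizer.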
to:NB  estimation  likelihood  computational_statistics  paninski.liam  to_teach:statcomp  to_teach:undergrad-ADA  generalized_linear_models 
may 2013 by cshalizi
[1305.1056] Relative Performance of Expected and Observed Fisher Information in Covariance Estimation for Maximum Likelihood Estimates
"Maximum likelihood estimation is a popular method in statistical inference. As a way of assessing the accuracy of the maximum likelihood estimate (MLE), the calculation of the covariance matrix of the MLE is of great interest in practice. Standard statistical theory shows that the normalized MLE is asymptotically normally distributed with covariance matrix being the inverse of the Fisher information matrix (FIM) at the unknown parameter. Two commonly used estimates for the covariance of the MLE are the inverse of the observed FIM (the same as the inverse Hessian of the negative log-likelihood) and the inverse of the expected FIM (the same as the inverse FIM). Both of the observed and expected FIM are evaluated at the MLE from the sample data. In this dissertation, we demonstrate that, under reasonable conditions similar to standard MLE conditions, the inverse expected FIM outperforms the inverse observed FIM under a mean squared error criterion. Specifically, in an asymptotic sense, the inverse expected FIM (evaluated at the MLE) has no greater mean squared error with respect to the true covariance matrix than the inverse observed FIM (evaluated at the MLE) at the element level. This result is different from widely accepted results showing preference for the observed FIM. In this dissertation, we present theoretical derivations that lead to the conclusion above. We also present numerical studies on three distinct problems to support the theoretical result. This dissertation also includes two appendices on topics of relevance to stochastic systems. The first appendix discusses optimal perturbation distributions for the simultaneous perturbation stochastic approximation (SPSA) algorithm. The second appendix considers Monte Carlo methods for computing FIMs when closed forms are not attainable."

--- Huh, I guess I'd been relying on folklore.
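
To make the two competing estimates concrete, here is a sketch that computes both at the MLE for a Cauchy location model. The Cauchy model is an assumption, chosen because its observed and expected information differ. In a canonical exponential family they coincide at every θ, so there would be nothing to compare. The sketch does not reproduce the dissertation's mean-squared-error comparison, which would require averaging over many simulated samples. The names and the grid-search MLE are illustrative.

```python
import math
import random
import statistics


def cauchy_loglik(theta, xs):
    """Log-likelihood of a Cauchy(theta, 1) location model."""
    return -sum(math.log1p((x - theta) ** 2) for x in xs) - len(xs) * math.log(math.pi)


def observed_information(theta, xs):
    """Negative second derivative of the log-likelihood:
    sum_i 2 (1 - u_i^2) / (1 + u_i^2)^2 with u_i = x_i - theta."""
    total = 0.0
    for x in xs:
        u2 = (x - theta) ** 2
        total += 2.0 * (1.0 - u2) / (1.0 + u2) ** 2
    return total


def expected_information(n):
    """Fisher information of n Cauchy(theta, 1) draws: n / 2 for every theta."""
    return n / 2.0


def cauchy_mle(xs, half_width=2.0, step=0.002):
    """Crude grid search around the sample median (the likelihood can be multimodal)."""
    med = statistics.median(xs)
    m = int(round(half_width / step))
    return max((med + step * i for i in range(-m, m + 1)),
               key=lambda t: cauchy_loglik(t, xs))


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [1.0 + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(200)]
    theta_hat = cauchy_mle(xs)
    print("MLE:", theta_hat)
    print("variance from inverse observed FIM:", 1.0 / observed_information(theta_hat, xs))
    print("variance from inverse expected FIM:", 1.0 / expected_information(len(xs)))
```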
to:NB  fisher_information  estimation  statistics  likelihood 
may 2013 by cshalizi
[1304.0503] Non-parametric likelihood based estimation of linear filters for point processes
"We consider models for multivariate point processes where the intensity is given non-parametrically in terms of functions in a reproducing kernel Hilbert space. The likelihood function involves a time integral and is consequently not given in terms of a finite number of kernel evaluations. We derive a representation of the gradient of the log-likelihood and provide two methods for practically computing approximations to the gradient by time discretization. We illustrate the methods by an application to neuron network modeling, and we investigate how the computational costs of the methods depend on the resolution of the time discretization. The methods are implemented and available in the R-package ppstat."
in_NB  hilbert_space  likelihood  point_processes  statistics  estimation  nonparametrics 
april 2013 by cshalizi
[1303.6794] A likelihood based framework for assessing network evolution models tested on real network data
"This paper presents a statistically sound method for using likelihood to assess potential models of network evolution. The method is tested on data from five real networks. Data from the internet autonomous system network, from two photo sharing sites and from a co-authorship network are tested using this framework."
to:NB  statistics  likelihood  network_data_analysis 
march 2013 by cshalizi
Computing Likelihood Functions for High-Energy Physics Experiments when Distributions are Defined by Simulators with Nuisance Parameters
"When searching for new phenomena in high-energy physics, statistical analysis is complicated by the presence of nuisance parameters, representing uncertainty in the physics of interactions or in detector properties. Another complication, even with no nuisance parameters, is that the probability distributions of the models are specified only by simulation programs, with no way of evaluating their probability density functions. I advocate expressing the result of an experiment by means of the likelihood function, rather than by frequentist confidence intervals or p-values. A likelihood function for this problem is difficult to obtain, however, for both of the reasons given above. I discuss ways of circumventing these problems by reducing dimensionality using a classifier and employing simulations with multiple values for the nuisance parameters."
to:NB  statistics  likelihood  computational_statistics  simulation  neal.radford 
march 2013 by cshalizi
[1302.5468] What does the proof of Birnbaum's theorem prove?
"Birnbaum's theorem, that the sufficiency and conditionality principles entail the likelihood principle, has engendered a great deal of controversy and discussion since the publication of the result in 1962. In particular, many have raised doubts as to the validity of this result. Typically these doubts are concerned with the validity of the principles of sufficiency and conditionality as expressed by Birnbaum. Technically it would seem, however, that the proof itself is sound. In this paper we use set theory to formalize the context in which the result is proved and show that in fact Birnbaum's theorem is incorrectly stated as a key hypothesis is left out of the statement. When this hypothesis is added, we see that sufficiency is irrelevant, and that the result is dependent on a well-known flaw in conditionality that renders the result almost vacuous."
in_NB  statistics  sufficiency  likelihood  foundations_of_statistics  have_read 
february 2013 by cshalizi
[1302.3071] A penalized empirical likelihood method in high dimensions
"This paper formulates a penalized empirical likelihood (PEL) method for inference on the population mean when the dimension of the observations may grow faster than the sample size. Asymptotic distributions of the PEL ratio statistic is derived under different component-wise dependence structures of the observations, namely, (i) non-Ergodic, (ii) long-range dependence and (iii) short-range dependence. It follows that the limit distribution of the proposed PEL ratio statistic can vary widely depending on the correlation structure, and it is typically different from the usual chi-squared limit of the empirical likelihood ratio statistic in the fixed and finite dimensional case. A unified subsampling based calibration is proposed, and its validity is established in all three cases, (i)-(iii). Finite sample properties of the method are investigated through a simulation study."
to:NB  likelihood  statistics  lahiri.s.n.  to_read  high-dimensional_statistics 
february 2013 by cshalizi
[1302.3302] Asymptotic power of likelihood ratio tests for high dimensional data
"This paper considers the asymptotic power of likelihood ratio test (LRT) for the identity test when the dimension p is large compared to the sample size n. The asymptotic distribution of LRT under alternatives is given and an explicit expression of the power is derived. A simulation study is carried out to compare LRT with other tests. All these studies show that LRT is a powerful test to detect eigenvalues around zero. "
to:NB  likelihood  statistics  model_selection  hypothesis_testing 
february 2013 by cshalizi
[1302.3567] Efficient Approximations for the Marginal Likelihood of Incomplete Data Given a Bayesian Network
"We discuss Bayesian methods for learning Bayesian networks when data sets are incomplete. In particular, we examine asymptotic approximations for the marginal likelihood of incomplete data given a Bayesian network. We consider the Laplace approximation and the less accurate but more efficient BIC/MDL approximation. We also consider approximations proposed by Draper (1993) and Cheeseman and Stutz (1995). These approximations are as efficient as BIC/MDL, but their accuracy has not been studied in any depth. We compare the accuracy of these approximations under the assumption that the Laplace approximation is the most accurate. In experiments using synthetic data generated from discrete naive-Bayes models having a hidden root node, we find that the CS measure is the most accurate."
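
The gap between the Laplace and BIC/MDL approximations already shows up in a one-parameter toy problem where the marginal likelihood has a closed form: binomial data with a Beta(a, b) prior. This illustrates the two approximations only. It has no hidden variables, no Bayesian network and no Cheeseman-Stutz measure, and the function names are illustrative. It assumes 0 < k < n so the BIC term is finite, and k + a > 1 and n - k + b > 1 so the posterior mode is interior.

```python
import math


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_choose(n, k):
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def exact_log_marginal(n, k, a, b):
    """log p(k | n) with theta ~ Beta(a, b): closed form via the beta function."""
    return log_choose(n, k) + log_beta(k + a, n - k + b) - log_beta(a, b)


def laplace_log_marginal(n, k, a, b):
    """Laplace approximation: Gaussian fit to likelihood * prior at the posterior mode."""
    alpha, beta = k + a - 1, n - k + b - 1
    mode = alpha / (alpha + beta)
    log_joint = (log_choose(n, k) + alpha * math.log(mode)
                 + beta * math.log(1 - mode) - log_beta(a, b))
    neg_hessian = alpha / mode ** 2 + beta / (1 - mode) ** 2
    return log_joint + 0.5 * math.log(2 * math.pi) - 0.5 * math.log(neg_hessian)


def bic_log_marginal(n, k):
    """BIC/MDL: maximized log-likelihood minus (d/2) log n, with d = 1 parameter."""
    p = k / n
    return log_choose(n, k) + k * math.log(p) + (n - k) * math.log(1 - p) - 0.5 * math.log(n)


if __name__ == "__main__":
    n, k, a, b = 50, 18, 2.0, 2.0
    print("exact  ", exact_log_marginal(n, k, a, b))
    print("Laplace", laplace_log_marginal(n, k, a, b))
    print("BIC    ", bic_log_marginal(n, k))
```

Laplace's error in this example shrinks like 1/n. BIC drops the O(1) terms (the log prior density at the MLE and the Hessian constant), so its error does not vanish as n grows.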
to:NB  likelihood  statistics  graphical_models  laplace_approximation 
february 2013 by cshalizi
Hinkley : Predictive Likelihood
"The likelihood function is the common basis of all parametric inference. However, with the exception of an ad hoc definition by Fisher, there has been no such unifying basis for prediction of future events, given past observations. This article proposes a definition of predictive likelihood which can help to remove some nonuniqueness problems in sampling-theory predictive inference, and which can produce a simple prediction analog of the Bayesian parametric result, posterior ∝ prior × likelihood, in many situations."

- Parallel to Lauritzen's work, according to p. 1.
in_NB  to_read  prediction  likelihood  statistics  sufficiency 
january 2013 by cshalizi
[1301.0463] A Simple Approach to Maximum Intractable Likelihood Estimation
"Approximate Bayesian Computation (ABC) can be viewed as an analytic approximation of an intractable likelihood coupled with an elementary simulation step. Such a view, combined with a suitable instrumental prior distribution permits maximum-likelihood (or maximum-a-posteriori) inference to be conducted, approximately, using essentially the same techniques. An elementary approach to this problem which simply obtains a nonparametric approximation of the likelihood surface which is then used as a smooth proxy for the likelihood in a subsequent maximisation step is developed here and the convergence of this class of algorithms is characterised theoretically. The use of non-sufficient summary statistics in this context is considered. Applying the proposed method to four problems demonstrates good performance. The proposed approach provides an alternative for approximating the maximum likelihood estimator (MLE) in complex scenarios."

Journal version (in open-access _Electronic Journal of Statistics_): http://dx.doi.org/10.1214/13-EJS819
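
Here is a toy sketch in the spirit of the "elementary approach" the abstract describes. It proceeds in four steps:
1. Draw parameters from a flat instrumental prior.
2. Keep those whose simulated summary lands within ε of the observed one (plain ABC rejection).
3. Smooth the accepted draws with a kernel density estimate.
4. Take the maximizer of that density.

With a flat prior the density is proportional to the ABC likelihood, so its mode stands in for the MLE. The Gaussian "simulator", the prior range, ε, the bandwidth and the grid are all assumptions, chosen so the answer can be checked: here the exact MLE is the sample mean. The paper's algorithm and tuning may differ.

```python
import math
import random


def simulate_summary(theta, n_obs, rng):
    """Stand-in 'intractable' simulator: mean of n_obs draws from N(theta, 1)."""
    return sum(rng.gauss(theta, 1.0) for _ in range(n_obs)) / n_obs


def abc_mle(s_obs, n_obs, n_draws=50000, eps=0.1, lo=-3.0, hi=3.0, bandwidth=0.1, seed=0):
    """ABC rejection under a uniform instrumental prior on [lo, hi], then the mode
    of a Gaussian kernel density estimate of the accepted draws (on a grid)."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(lo, hi)
        if abs(simulate_summary(theta, n_obs, rng) - s_obs) < eps:
            accepted.append(theta)
    grid = [lo + (hi - lo) * i / 600 for i in range(601)]

    def kde(t):  # unnormalized: only the argmax matters
        return sum(math.exp(-0.5 * ((t - a) / bandwidth) ** 2) for a in accepted)

    return max(grid, key=kde), len(accepted)


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gauss(1.0, 1.0) for _ in range(20)]
    s_obs = sum(data) / len(data)  # exact MLE for this toy model
    estimate, n_accepted = abc_mle(s_obs, len(data))
    print(estimate, s_obs, n_accepted)
```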
in_NB  to_read  statistics  estimation  likelihood  approximate_bayesian_computation  indirect_inference  re:stacs  simulation  to_teach:complexity-and-inference 
january 2013 by cshalizi
Taylor & Francis Online :: Fast Inference for the Latent Space Network Model Using a Case-Control Approximate Likelihood - Journal of Computational and Graphical Statistics - Volume 21, Issue 4
"Network models are widely used in social sciences and genome sciences. The latent space model proposed by Hoff et al. (2002), and extended by Handcock et al. (2007) to incorporate clustering, provides a visually interpretable model-based spatial representation of relational data and takes account of several intrinsic network properties. Due to the structure of the likelihood function of the latent space model, the computational cost is of order O(N^2), where N is the number of nodes. This makes it infeasible for large networks. In this article, we propose an approximation of the log-likelihood function. We adapt the case-control idea from epidemiology and construct a case-control log-likelihood, which is an unbiased estimator of the log-full likelihood. Replacing the full likelihood by the case-control likelihood in the Markov chain Monte Carlo estimation of the latent space model reduces the computational time from O(N^2) to O(N), making it feasible for large networks. We evaluate its performance using simulated and real data. We fit the model to a large protein–protein interaction data using the case-control likelihood and use the model fitted link probabilities to identify false positive links. Supplemental materials are available online."
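
Here is a sketch of one way to read the case-control construction. Keep every tie of node i (the cases) and sample m of its non-ties uniformly without replacement (the controls). Then weight the controls by (number of non-ties)/m, which makes the sum an unbiased estimate of the full log-likelihood over ordered pairs. The toy distance model η_ij = α - |z_i - z_j| is an assumption, with latent positions held fixed as they would be within one MCMC step. The sampling scheme and all names are also assumptions; the paper's exact estimator may differ in detail.

```python
import math
import random


def pair_loglik(eta, y):
    """Bernoulli log-likelihood of a tie indicator y with logit eta."""
    return y * eta - math.log1p(math.exp(eta))


def logit(z, i, j, alpha):
    """Toy latent distance model: eta_ij = alpha - |z_i - z_j|."""
    return alpha - math.dist(z[i], z[j])


def simulate_network(n, alpha, rng):
    z = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    nbrs = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = 1.0 / (1.0 + math.exp(-logit(z, i, j, alpha)))
            if rng.random() < p:
                nbrs[i].add(j)
                nbrs[j].add(i)
    return z, nbrs


def full_loglik(z, nbrs, alpha):
    """Exact log-likelihood summed over ordered pairs i != j: O(N^2) terms."""
    n = len(z)
    return sum(pair_loglik(logit(z, i, j, alpha), int(j in nbrs[i]))
               for i in range(n) for j in range(n) if j != i)


def case_control_loglik(z, nbrs, alpha, m, rng):
    """Unbiased estimate: all ties (cases) plus m sampled non-ties (controls) per
    node, with the controls reweighted by (number of non-ties of i) / m."""
    n = len(z)
    total = 0.0
    for i in range(n):
        total += sum(pair_loglik(logit(z, i, j, alpha), 1) for j in nbrs[i])
        n_zero = n - 1 - len(nbrs[i])
        m_i = min(m, n_zero)
        controls = set()
        while len(controls) < m_i:  # uniform sample of non-ties, without replacement
            j = rng.randrange(n)
            if j != i and j not in nbrs[i]:
                controls.add(j)
        if m_i > 0:
            total += (n_zero / m_i) * sum(pair_loglik(logit(z, i, j, alpha), 0) for j in controls)
    return total


if __name__ == "__main__":
    rng = random.Random(3)
    z, nbrs = simulate_network(60, 0.0, rng)
    exact = full_loglik(z, nbrs, 0.0)
    estimates = [case_control_loglik(z, nbrs, 0.0, 10, rng) for _ in range(200)]
    print(exact, sum(estimates) / len(estimates))
```

With bounded degrees and a fixed m, rejection sampling of controls keeps the per-node cost at about O(degree + m), which is consistent with the O(N) scaling claimed in the abstract. In a dense graph the rejection loop would slow down, and full_loglik is only there as a check.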
to:NB  statistics  network_data_analysis  approximation  likelihood  inference_to_latent_objects  hoff.peter  raftery.adrian 
december 2012 by cshalizi
"Another consequence of Stone’s example is that, in my opinion, it shows that the Likelihood Principle is bogus. According to the likelihood principle, the observed likelihood function contains all the useful information in the data. In this example, the likelihood does not distinguish the four possible parameter values.
"But the direction of the string from the current position — which does not affect the likelihood — clearly has lots of information."
statistics  bayesianism  likelihood 
december 2012 by cshalizi
[1206.2245] Pippi - painless parsing, post-processing and plotting of posterior and likelihood samples
"Interpreting samples from likelihood or posterior probability density functions is rarely as straightforward as it seems it should be. Producing publication-quality graphics of these distributions is often similarly painful. In this short note I describe pippi, a simple, publicly-available package for parsing and post-processing such samples, as well as generating high-quality PDF graphics of the results. Pippi is easily and extensively configurable and customisable, both in its options for parsing and post-processing samples, and in the visual aspects of the figures it produces. I illustrate some of these using an existing supersymmetric global fit, performed in the context of a gamma-ray search for dark matter. Pippi can be downloaded and followed at this http URL"
in_NB  statistics  likelihood  monte_carlo  estimation  visual_display_of_quantitative_information 
june 2012 by cshalizi
[0804.2996] The Epic Story of Maximum Likelihood
"At a superficial level, the idea of maximum likelihood must be prehistoric: early hunters and gatherers may not have used the words ``method of maximum likelihood'' to describe their choice of where and how to hunt and gather, but it is hard to believe they would have been surprised if their method had been described in those terms. It seems a simple, even unassailable idea: Who would rise to argue in favor of a method of minimum likelihood, or even mediocre likelihood? And yet the mathematical history of the topic shows this ``simple idea'' is really anything but simple. Joseph Louis Lagrange, Daniel Bernoulli, Leonard Euler, Pierre Simon Laplace and Carl Friedrich Gauss are only some of those who explored the topic, not always in ways we would sanction today. In this article, that history is reviewed from back well before Fisher to the time of Lucien Le Cam's dissertation. In the process Fisher's unpublished 1930 characterization of conditions for the consistency and efficiency of maximum likelihood estimates is presented, and the mathematical basis of his three proofs discussed. In particular, Fisher's derivation of the information inequality is seen to be derived from his work on the analysis of variance, and his later approach via estimating functions was derived from Euler's Relation for homogeneous functions. The reaction to Fisher's work is reviewed, and some lessons drawn."

Gated version: http://projecteuclid.org/euclid.ss/1207580174
in_NB  likelihood  statistics  estimation  history_of_statistics  stigler.stephen  fisher.r.a.  pearson.karl  neyman.jerzy  hotelling.harold  cramer-rao_inequality  information_geometry  have_read  wald.abraham 
june 2012 by cshalizi
[1206.0867] Testing linear hypotheses in high-dimensional regressions
"For a multivariate linear model, Wilks' likelihood ratio test (LRT) constitutes one of the cornerstone tools. However, the computation of its quantiles under the null or the alternative requires complex analytic approximations and more importantly, these distributional approximations are feasible only for moderate dimension of the dependent variable, say $p \le 20$. On the other hand, assuming that the data dimension $p$ as well as the number $q$ of regression variables are fixed while the sample size $n$ grows, several asymptotic approximations are proposed in the literature for Wilks' $\Lambda$ including the widely used chi-square approximation. In this paper, we consider necessary modifications to Wilks' test in a high-dimensional context, specifically assuming a high data dimension $p$ and a large sample size $n$. Based on recent random matrix theory, the correction we propose to Wilks' test is asymptotically Gaussian under the null and simulations demonstrate that the corrected LRT has very satisfactory size and power, surely in the large $p$ and large $n$ context, but also for moderately large data dimensions like $p=30$ or $p=50$. As a byproduct, we give a reason explaining why the standard chi-square approximation fails for high-dimensional data. We also introduce a new procedure for the classical multiple sample significance test in MANOVA which is valid for high-dimensional data."
to:NB  to_read  statistics  model_selection  likelihood  re:model_selection_for_networks  high-dimensional_statistics 
june 2012 by cshalizi
[0808.4042] Statistical models, likelihood, penalized likelihood and hierarchical likelihood
"We give an overview of statistical models and likelihood, together with two of its variants: penalized and hierarchical likelihood. The Kullback-Leibler divergence is referred to repeatedly, for defining the misspecification risk of a model, for grounding the likelihood and the likelihood cross-validation, which can be used for choosing weights in penalized likelihood. Families of penalized likelihood and sieves estimators are shown to be equivalent. The similarity of these likelihoods with a posteriori distributions in a Bayesian approach is considered."
in_NB  statistics  likelihood  bayesianism  information_theory 
february 2012 by cshalizi
Stochastic Composite Likelihood
"Maximum likelihood estimators are often of limited practical use due to the intensive computation they require. We propose a family of alternative estimators that maximize a stochastic variation of the composite likelihood function. Each of the estimators resolves the computation-accuracy tradeoff differently, and taken together they span a continuous spectrum of computation-accuracy tradeoff resolutions. We prove the consistency of the estimators, provide formulas for their asymptotic variance, statistical robustness, and computational complexity. We discuss experimental results in the context of Boltzmann machines and conditional random fields. The theoretical and experimental studies demonstrate the effectiveness of the estimators when the computational resources are insufficient. They also demonstrate that in some cases reduced computational complexity is associated with robustness thereby increasing statistical accuracy."
likelihood  estimation  statistics  lebanon.guy  to_read 
november 2010 by cshalizi
"A note on the asymptotic behaviour of empirical likelihood statistics" - Statistical Methods & Applications, Volume 19, Number 4
"This paper develops some theoretical results about the asymptotic behaviour of the empirical likelihood and the empirical profile likelihood statistics, which originate from fairly general estimating functions. The results accommodate, within a unified framework, various situations potentially occurring in a wide range of applications. For this reason, they are potentially useful in several contexts, such as, for example, in inference for dependent data. We provide examples showing that known findings in the literature about the asymptotic behaviour of some empirical likelihood statistics in time series models can be derived as particular cases of our results."
empirical_likelihood  asymptotics  statistics  estimation  likelihood  statistical_inference_for_stochastic_processes 
october 2010 by cshalizi
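For the simplest estimating function, g(x, mu) = x - mu (a population mean), the empirical likelihood ratio statistic reduces to a one-dimensional root-finding problem for a Lagrange multiplier. The sketch below uses Owen's standard construction for a scalar mean with plain bisection, assuming i.i.d. data; it illustrates the kind of statistic the paper studies, not the authors' general framework (dependent data would need extra machinery such as blocking), and the function name `el_statistic` is hypothetical. Under the usual i.i.d. conditions the statistic is asymptotically chi-square with 1 degree of freedom, so values below about 3.84 give an approximate 95% confidence set for mu.

```python
import math


def el_statistic(xs, mu, iters=200):
    """Return -2 log R(mu), Owen's empirical likelihood ratio statistic for a mean.

    The profile weights are w_i = 1 / (n * (1 + lam * d_i)) with d_i = x_i - mu,
    where lam solves sum(d_i / (1 + lam * d_i)) = 0.
    """
    d = [x - mu for x in xs]
    if min(d) >= 0.0 or max(d) <= 0.0:
        # mu lies outside the convex hull of the data, so R(mu) = 0.
        return math.inf
    # Every weight must stay positive (1 + lam * d_i > 0); the root lies inside this interval.
    lo, hi = -1.0 / max(d), -1.0 / min(d)
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        score = sum(di / (1.0 + lam * di) for di in d)
        # score is strictly decreasing in lam, so bisection keeps the root bracketed.
        if score > 0.0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * di) for di in d)


data = [1.0, 2.0, 4.0, 7.0]
print(el_statistic(data, 3.5))  # at the sample mean the statistic is approximately 0
print(el_statistic(data, 5.0))
```

Bisection is used instead of Newton's method because the admissible interval for the multiplier is known exactly and the score is monotone, so convergence is guaranteed without step-size safeguards.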
Default priors for Bayesian and frequentist inference - Fraser et al. - 2010 - Journal of the Royal Statistical Society: Series B (Statistical Methodology) - Wiley Online Library
"We investigate the choice of default priors for use with likelihood for Bayesian and frequentist inference. Such a prior is a density or relative density that weights an observed likelihood function, leading to the elimination of parameters that are not of interest and then a density-type assessment for a parameter of interest. For independent responses from a continuous model, we develop a prior for the full parameter that is closely linked to the original Bayes approach and provides an extension of the right invariant measure to general contexts. We then develop a modified prior that is targeted on a component parameter of interest and by targeting avoids the marginalization paradoxes of Dawid and co-workers. This modifies Jeffreys's prior and provides extensions to the development of Welch and Peers. ... combined to explore priors for a vector parameter of interest in the presence of a vector nuisance parameter. Examples ... illustrate the computation of the priors."
likelihood  estimation  default_priors  bayesianism  statistics  nuisance_parameters 
october 2010 by cshalizi
[1003.0691] Statistical and Computational Tradeoffs in Stochastic Composite Likelihood
"Maximum likelihood estimators are often of limited practical use due to the intensive computation they require. We propose a family of alternative estimators that maximize a stochastic variation of the composite likelihood function. Each of the estimators resolves the computation-accuracy tradeoff differently, and taken together they span a continuous spectrum of computation-accuracy tradeoff resolutions. We prove the consistency of the estimators, provide formulas for their asymptotic variance, statistical robustness, and computational complexity. We discuss experimental results in the context of Boltzmann machines and conditional random fields. The theoretical and experimental studies demonstrate the effectiveness of the estimators when the computational resources are insufficient. They also demonstrate that in some cases reduced computational complexity is associated with robustness thereby increasing statistical accuracy."
statistics  estimation  likelihood  computational_statistics  lebanon.guy 
march 2010 by cshalizi
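To make the objective concrete, here is a toy sketch in the spirit of the stochastic composite likelihood above, assuming a formulation in which each training example contributes a randomly selected subset of component log-likelihoods, each kept with its own probability and weighted by its own coefficient. The model is a hypothetical four-spin fully connected Ising model (a tiny Boltzmann machine) with a single coupling parameter theta, and the two components are the full likelihood (which needs the partition function) and the pseudolikelihood (a product of single-site conditionals). All names, selection probabilities, and the grid-search optimiser are illustrative assumptions, not the paper's code.

```python
import itertools
import math
import random

N_SPINS = 4
STATES = list(itertools.product((-1, 1), repeat=N_SPINS))


def pair_sum(x):
    """Sufficient statistic: sum of x_i * x_j over all pairs i < j."""
    return sum(x[i] * x[j] for i, j in itertools.combinations(range(N_SPINS), 2))


def log_partition(theta):
    # Exact log Z(theta) by enumerating all 2^n states; infeasible for large models.
    return math.log(sum(math.exp(theta * pair_sum(s)) for s in STATES))


def full_loglik(x, theta, log_z):
    return theta * pair_sum(x) - log_z


def pseudo_loglik(x, theta):
    """Sum of log p(x_i | x_-i); each conditional needs only a two-term normaliser."""
    total = 0.0
    for i in range(N_SPINS):
        field = sum(x) - x[i]
        total += theta * x[i] * field - math.log(2.0 * math.cosh(theta * field))
    return total


def scl_objective(theta, data, masks, beta_full=1.0, beta_pseudo=1.0):
    """Each example contributes only the components its random mask selects."""
    log_z = log_partition(theta) if any(m[0] for m in masks) else 0.0
    total = 0.0
    for x, (use_full, use_pseudo) in zip(data, masks):
        if use_full:
            total += beta_full * full_loglik(x, theta, log_z)
        if use_pseudo:
            total += beta_pseudo * pseudo_loglik(x, theta)
    return total


rng = random.Random(0)
theta_true = 0.3
weights = [math.exp(theta_true * pair_sum(s)) for s in STATES]
data = rng.choices(STATES, weights=weights, k=2000)

# Selection indicators are drawn once, independently of the data, then held fixed.
lam_full, lam_pseudo = 0.2, 1.0
masks = [(rng.random() < lam_full, rng.random() < lam_pseudo) for _ in data]

grid = [k / 100 for k in range(0, 61)]
theta_hat = max(grid, key=lambda t: scl_objective(t, data, masks))
print(theta_hat)
```

Lowering `lam_full` trades statistical efficiency for fewer expensive full-likelihood terms; in this four-spin toy the partition function is cheap, so the sketch shows the form of the objective rather than real computational savings.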
[math/0611376] Efficient likelihood estimation in state space models
"Motivated by studying asymptotic properties of the maximum likelihood estimator (MLE) in stochastic volatility (SV) models, in this paper we investigate likelihood estimation in state space models. We first prove, under some regularity conditions, that there is a consistent sequence of roots of the likelihood equation that is asymptotically normal with the inverse of the Fisher information as its variance. With the extra assumption that the likelihood equation has a unique root for each $n$, there is a consistent sequence of estimators of the unknown parameters. If, in addition, the supremum of the log likelihood function is integrable, the MLE exists and is strongly consistent. An Edgeworth expansion of the approximate solution of the likelihood equation is also established. Several examples, including Markov switching models, ARMA models, (G)ARCH models and stochastic volatility (SV) models, are given for illustration."
estimation  time_series  state-space_models  markov_models  likelihood  statistics 
march 2010 by cshalizi
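In the linear Gaussian special case, the likelihood of a state space model can be evaluated exactly by the Kalman filter's prediction-error decomposition; for nonlinear or non-Gaussian models, such as the stochastic volatility models in the abstract, the filter is no longer exact and other methods are needed. The sketch below assumes a hypothetical AR(1)-plus-noise model, x_t = phi * x_{t-1} + w_t and y_t = x_t + v_t, with known noise variances q and r, and maximises the log-likelihood over phi on a grid. It illustrates likelihood evaluation only, not the paper's asymptotic arguments.

```python
import math
import random


def ar1_plus_noise_loglik(ys, phi, q, r):
    """Exact Gaussian log-likelihood via the Kalman filter's prediction-error decomposition."""
    if not abs(phi) < 1.0:
        raise ValueError("need |phi| < 1 for the stationary initialisation")
    a = 0.0                      # predicted mean of x_t given y_1..y_{t-1}
    p = q / (1.0 - phi * phi)    # predicted variance (stationary variance at t = 1)
    ll = 0.0
    for y in ys:
        f = p + r                # innovation variance
        e = y - a                # innovation (one-step prediction error)
        ll -= 0.5 * (math.log(2.0 * math.pi * f) + e * e / f)
        k = p / f                # Kalman gain
        a_filt, p_filt = a + k * e, p - k * p
        a, p = phi * a_filt, phi * phi * p_filt + q   # predict x_{t+1}
    return ll


def simulate(n, phi, q, r, rng):
    x = rng.gauss(0.0, math.sqrt(q / (1.0 - phi * phi)))
    ys = []
    for _ in range(n):
        ys.append(x + rng.gauss(0.0, math.sqrt(r)))
        x = phi * x + rng.gauss(0.0, math.sqrt(q))
    return ys


rng = random.Random(0)
ys = simulate(2000, 0.8, 1.0, 0.5, rng)
grid = [k / 100 for k in range(-95, 96)]
phi_hat = max(grid, key=lambda ph: ar1_plus_noise_loglik(ys, ph, 1.0, 0.5))
print(phi_hat)
```

The grid search stands in for a numerical optimiser; in practice q and r would be estimated jointly with phi.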
Likelihood for statistically equivalent models. John Copas. 2010; JRSS B
"In likelihood inference we usually assume that the model is fixed and then base inference on the corresponding likelihood function. Often, however, the choice of model is rather arbitrary, and there may be other models which fit the data equally well. We study robustness of likelihood inference over such 'statistically equivalent' models and suggest a simple 'envelope likelihood' to capture this aspect of model uncertainty. Robustness depends critically on how we specify the parameter of interest. Some asymptotic theory is presented, illustrated by three examples."
statistics  estimation  likelihood  model_uncertainty  misspecification  re:phil-of-bayes_paper  to_read 
january 2010 by cshalizi
Commenges: Statistical models: Conventional, penalized and hierarchical likelihood
"We give an overview of statistical models and likelihood, together with two of its variants: penalized and hierarchical likelihood. The Kullback-Leibler divergence is referred to repeatedly in the literature, for defining the misspecification risk of a model and for grounding the likelihood and the likelihood cross-validation, which can be used for choosing weights in penalized likelihood. Families of penalized likelihood and particular sieves estimators are shown to be equivalent. The similarity of these likelihoods with a posteriori distributions in a Bayesian approach is considered."
statistics  likelihood  cross-validation  re:phil-of-bayes_paper  to_read 
december 2009 by cshalizi
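Likelihood cross-validation scores a smoothing or penalty weight by the log-likelihood each held-out observation receives from a fit to the remaining data: LCV(h) = sum_i log f_{h,-i}(x_i). The sketch below swaps in the textbook case of choosing a Gaussian kernel density bandwidth h by leave-one-out likelihood cross-validation, in place of the penalized-likelihood weights the paper discusses; the function name, the simulated sample, and the bandwidth grid are assumptions made for illustration.

```python
import math
import random


def loo_log_likelihood(xs, h):
    """Leave-one-out log-likelihood of a Gaussian kernel density estimate with bandwidth h."""
    n = len(xs)
    norm = 1.0 / ((n - 1) * h * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i, xi in enumerate(xs):
        # Density at x_i estimated from every point except x_i itself.
        s = sum(math.exp(-0.5 * ((xi - xj) / h) ** 2)
                for j, xj in enumerate(xs) if j != i)
        if s == 0.0:
            return -math.inf  # a held-out point gets zero density: h is far too small
        total += math.log(norm * s)
    return total


rng = random.Random(42)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
bandwidths = [0.05 * k for k in range(1, 41)]
h_lcv = max(bandwidths, key=lambda h: loo_log_likelihood(sample, h))
print(h_lcv)
```

Leaving each point out matters: with the point included, its own kernel term makes the score increase without bound as h shrinks to zero.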
Accurate Parametric Inference for Small Samples
Looks like a teaser for the book by Brazzale, Davison and Reid.
statistics  estimation  likelihood  asymptotics  have_skimmed 
june 2009 by cshalizi
[0708.2184] Monte Carlo likelihood inference for missing data models
"We describe a Monte Carlo method to approximate the maximum likelihood estimate (MLE), when there are missing data and the observed data likelihood is not available in closed form. This method uses simulated missing data that are independent and identically distributed and independent of the observed data. Our Monte Carlo approximation to the MLE is a consistent and asymptotically normal estimate of the minimizer $\theta^*$ of the Kullback-Leibler information, as both Monte Carlo and observed data sample sizes go to infinity simultaneously. Plug-in estimates of the asymptotic variance are provided for constructing confidence regions for $\theta^*$. We give Logit-Normal generalized linear mixed model examples, calculated using an R package."

- "Have read" in the sense of skipping all the proofs, but wanting to go back to them.
statistics  monte_carlo  missing_data  in_NB  geyer.charles_j.  have_read  to_teach:undergrad-ADA  likelihood  estimation 
november 2007 by cshalizi
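The core idea is an importance-sampling approximation to the observed-data likelihood, built from simulated missing data z_1, ..., z_m drawn once from a fixed distribution h that does not depend on the observed data: l_m(theta) = sum_i log[(1/m) sum_j f_theta(y_i, z_j) / h(z_j)]. The sketch below assumes a hypothetical toy random-effects model, z_i ~ N(theta, 1) and y_i | z_i ~ N(z_i, 1), chosen because its exact MLE (the sample mean of y) is known, so the Monte Carlo answer can be checked. It reuses one simulated sample across all observations and maximises over a grid; both are simplifications for illustration, not the paper's R package or its exact scheme.

```python
import math
import random

LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)


def log_normal_pdf(x, mean, sd):
    return -LOG_SQRT_2PI - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def log_mean_exp(values):
    top = max(values)  # subtract the max for numerical stability
    return top + math.log(sum(math.exp(v - top) for v in values) / len(values))


def mc_loglik(theta, ys, zs, log_h):
    """Monte Carlo observed-data log-likelihood via importance sampling of the missing z."""
    # log f_theta(z_j) - log h(z_j) does not depend on y, so compute it once per theta.
    base = [log_normal_pdf(z, theta, 1.0) - lh for z, lh in zip(zs, log_h)]
    return sum(log_mean_exp([log_normal_pdf(y, z, 1.0) + b for z, b in zip(zs, base)])
               for y in ys)


rng = random.Random(1)
theta_true = 1.5
ys = [rng.gauss(rng.gauss(theta_true, 1.0), 1.0) for _ in range(50)]

# Importance distribution h = N(0, 3^2), drawn once and independent of the observed ys.
h_mean, h_sd, m = 0.0, 3.0, 500
zs = [rng.gauss(h_mean, h_sd) for _ in range(m)]
log_h = [log_normal_pdf(z, h_mean, h_sd) for z in zs]

grid = [0.5 + k / 20 for k in range(41)]   # theta in [0.5, 2.5]
theta_mc = max(grid, key=lambda t: mc_loglik(t, ys, zs, log_h))
theta_exact = sum(ys) / len(ys)             # exact MLE: marginally y_i ~ N(theta, 2)
print(theta_mc, theta_exact)
```

Because the same simulated zs are used at every theta, the approximate log-likelihood is a smooth function of theta, which is what makes maximising it well behaved.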

related tags

approximate_bayesian_computation  approximation  asymptotics  barron.andrew_w.  bayesianism  bayesian_consistency  bickel.peter_j.  brown.emery  chains_with_complete_connections  change-point_problem  chaos  choi.david_s.  community_discovery  computational_statistics  convergence_of_stochastic_processes  convexity  cramer-rao_inequality  cross-validation  default_priors  density_estimation  density_ratio_estimation  douc.randal  dynamical_systems  economics  empirical_likelihood  epidemic_models  estimation  fisher.r.a.  fisher_information  foundations_of_statistics  frequency_domain  generalized_linear_models  geyer.charles_j.  goodness-of-fit  graphical_models  hahn.p._richard  have_read  have_skimmed  hierarchical_statistical_models  high-dimensional_statistics  hilbert_space  history_of_statistics  hoff.peter  hotelling.harold  hypothesis_testing  indirect_inference  inference_to_latent_objects  information_criteria  information_geometry  information_theory  in_NB  izbicki.rafael  kith_and_kin  lahiri.s.n.  laplace_approximation  lasso  lebanon.guy  lee.ann_b.  
likelihood  macroeconomics  markov_models  meta-analysis  missing_data  misspecification  model_selection  model_uncertainty  monte_carlo  neal.radford  network_data_analysis  neural_data_analysis  neyman.jerzy  nobel.andrew  nonparametrics  not_worth_putting_in_notebooks  nuisance_parameters  optimization  paninski.liam  partial_identification  pearson.karl  pillai.natesh  point_processes  prediction  primates  raftery.adrian  re:model_selection_for_networks  re:neutral_model_of_inquiry  re:phil-of-bayes_paper  re:stacs  re:your_favorite_dsge_sucks  read_the_thesis  regression  robins.james  shot_after_a_fair_trial  simulation  singh.sumeetpal  smith.leonard  sparsity  spectral_methods  splines  state-space_models  statistical_inference_for_stochastic_processes  statistics  stigler.stephen  stochastic_approximation  stochastic_differential_equations  stochastic_processes  sufficiency  time_series  to:blog  to:NB  to_be_shot_after_a_fair_trial  to_read  to_teach:complexity-and-inference  to_teach:statcomp  to_teach:undergrad-ADA  variational_inference  visual_display_of_quantitative_information  wald.abraham  wasserman.larry  white.halbert 
