cshalizi + simulation   114

[1802.05688] Simulation assisted machine learning
"Motivation: In a predictive modeling setting, if sufficient details of the system behavior are known, one can build and use a simulation for making predictions. When sufficient system details are not known, one typically turns to machine learning, which builds a black-box model of the system using a large dataset of input sample features and outputs. We consider a setting which is between these two extremes: some details of the system mechanics are known but not enough for creating simulations that can be used to make high quality predictions. In this context we propose using approximate simulations to build a kernel for use in kernelized machine learning methods, such as support vector machines. The results of multiple simulations (under various uncertainty scenarios) are used to compute similarity measures between every pair of samples: sample pairs are given a high similarity score if they behave similarly under a wide range of simulation parameters. These similarity values, rather than the original high dimensional feature data, are used to build the kernel.
"Results: We demonstrate and explore the simulation based kernel (SimKern) concept using four synthetic complex systems--three biologically inspired models and one network flow optimization model. We show that, when the number of training samples is small compared to the number of features, the SimKern approach dominates over no-prior-knowledge methods. This approach should be applicable in all disciplines where predictive models are sought and informative yet approximate simulations are available. "
to:NB  simulation  indirect_inference  kernel_methods  statistics 
8 days ago by cshalizi
[1106.4929] Simulating rare events in dynamical processes
"Atypical, rare trajectories of dynamical systems are important: they are often the paths for chemical reactions, the haven of (relative) stability of planetary systems, the rogue waves that are detected in oil platforms, the structures that are responsible for intermittency in a turbulent liquid, the active regions that allow a supercooled liquid to flow... Simulating them in an efficient, accelerated way, is in fact quite simple.
"In this paper we review a computational technique to study such rare events in both stochastic and Hamiltonian systems. The method is based on the evolution of a family of copies of the system which are replicated or killed in such a way as to favor the realization of the atypical trajectories. We illustrate this with various examples."
to:NB  stochastic_processes  simulation  large_deviations  to_teach:data_over_space_and_time  to_teach:statcomp  re:fitness_sampling  re:do-institutions-evolve 
25 days ago by cshalizi
[1607.08804] Finite-Time and -Size Scalings in the Evaluation of Large Deviation Functions: Numerical Approach in Continuous Time
"Rare trajectories of stochastic systems are important to understand -- because of their potential impact. However, their properties are by definition difficult to sample directly. Population dynamics provides a numerical tool allowing their study, by means of simulating a large number of copies of the system, which are subjected to selection rules that favor the rare trajectories of interest. Such algorithms are plagued by finite simulation time- and finite population size- effects that can render their use delicate. In this paper, we present a numerical approach which uses the finite-time and finite-size scalings of estimators of the large deviation functions associated to the distribution of rare trajectories. The method we propose allows one to extract the infinite-time and infinite-size limit of these estimators which -- as shown on the contact process -- provides a significant improvement of the large deviation functions estimators compared to the the standard one."
to:NB  large_deviations  stochastic_processes  simulation  re:fitness_sampling 
25 days ago by cshalizi
[1906.05944] Statistical Inference for Generative Models with Maximum Mean Discrepancy
"While likelihood-based inference and its variants provide a statistically efficient and widely applicable approach to parametric inference, their application to models involving intractable likelihoods poses challenges. In this work, we study a class of minimum distance estimators for intractable generative models, that is, statistical models for which the likelihood is intractable, but simulation is cheap. The distance considered, maximum mean discrepancy (MMD), is defined through the embedding of probability measures into a reproducing kernel Hilbert space. We study the theoretical properties of these estimators, showing that they are consistent, asymptotically normal and robust to model misspecification. A main advantage of these estimators is the flexibility offered by the choice of kernel, which can be used to trade-off statistical efficiency and robustness. On the algorithmic side, we study the geometry induced by MMD on the parameter space and use this to introduce a novel natural gradient descent-like algorithm for efficient implementation of these estimators. We illustrate the relevance of our theoretical results on several classes of models including a discrete-time latent Markov process and two multivariate stochastic differential equation models."
to:NB  simulation  indirect_inference  statistics  estimation  hilbert_space 
9 weeks ago by cshalizi
Huang, Reich, Fuentes, Sankarasubramanian: Complete spatial model calibration
"Computer simulation models are central to environmental science. These mathematical models are used to understand complex weather and climate patterns and to predict the climate’s response to different forcings. Climate models are of course not perfect reflections of reality, and so comparison with observed data is needed to quantify and to correct for biases and other deficiencies. We propose a new method to calibrate model output using observed data. Our approach not only matches the marginal distributions of the model output and gridded observed data, but it simultaneously postprocesses the model output to have the same spatial correlation as the observed data. This comprehensive calibration method permits realistic spatial simulations for regional impact studies. We apply the proposed method to global climate model output in North America and show that it successfully calibrates the model output for temperature and precipitation."
to:NB  spatial_statistics  simulation  model_checking  statistics  to_teach:data_over_space_and_time  to_read 
9 weeks ago by cshalizi
Computer model calibration with confidence and consistency
"The paper proposes and examines a calibration method for inexact models. The method produces a confidence set on the parameters that includes the best parameter with a desired probability under any sample size. Additionally, this confidence set is shown to be consistent in that it excludes suboptimal parameters in large sample environments. The method works and the results hold with few assumptions; the ideas are maintained even with discrete input spaces or parameter spaces. Computation of the confidence sets and approximate confidence sets is discussed. The performance is illustrated in a simulation example as well as two real data examples."
to:NB  simulation  statistics  confidence_sets  misspecification 
9 weeks ago by cshalizi
[1905.11505] Validation of Approximate Likelihood and Emulator Models for Computationally Intensive Simulations
"Complex phenomena are often modeled with computationally intensive feed-forward simulations for which a tractable analytic likelihood does not exist. In these cases, it is sometimes necessary to use an approximate likelihood or faster emulator model for efficient statistical inference. We describe a new two-sample testing framework for quantifying the quality of the fit to simulations at fixed parameter values. This framework can leverage any regression method to handle complex high-dimensional data and attain higher power in settings where well-known distance-based tests would not. We also introduce a statistically rigorous test for assessing global goodness-of-fit across simulation parameters. In cases where the fit is inadequate, our method provides valuable diagnostics by allowing one to identify regions in both feature and parameter space which the model fails to reproduce well. We provide both theoretical results and examples which illustrate the effectiveness of our approach."
to:NB  statistics  simulation  likelihood  kith_and_kin  lee.ann_b.  izbicki.rafael 
12 weeks ago by cshalizi
McKinley, Vernon, Andrianakis, McCreesh, Oakley, Nsubuga, Goldstein, White: Approximate Bayesian Computation and Simulation-Based Inference for Complex Stochastic Epidemic Models
"Approximate Bayesian Computation (ABC) and other simulation-based inference methods are becoming increasingly used for inference in complex systems, due to their relative ease-of-implementation. We briefly review some of the more popular variants of ABC and their application in epidemiology, before using a real-world model of HIV transmission to illustrate some of challenges when applying ABC methods to high-dimensional, computationally intensive models. We then discuss an alternative approach—history matching—that aims to address some of these issues, and conclude with a comparison between these different methodologies."
to:NB  epidemic_models  simulation  approximate_bayesian_computation  statistics 
may 2019 by cshalizi
Peeking Inside the Black Box: A New Kind of Scientific Visualization | SpringerLink
"Computational systems biologists create and manipulate computational models of biological systems, but they do not always have straightforward epistemic access to the content and behavioural profile of such models because of their length, coding idiosyncrasies, and formal complexity. This creates difficulties both for modellers in their research groups and for their bioscience collaborators who rely on these models. In this paper we introduce a new kind of visualization (observed in a qualitative study of a systems biology laboratory) that was developed to address just this sort of epistemic opacity. The visualization is unusual in that it depicts the dynamics and structure of a computer model instead of that model’s target system, and because it is generated algorithmically. Using considerations from epistemology and aesthetics, we explore how this new kind of visualization increases scientific understanding of the content and function of computer models in systems biology to reduce epistemic opacity."
to:NB  visual_display_of_quantitative_information  modeling  simulation  sociology_of_science 
may 2019 by cshalizi
Empirical Model Building | Wiley Series in Probability and Statistics
"Successful empirical model building is founded on the relationship between data and approximate representations of the real systems that generated that data. As a result, it is essential for researchers who construct these models to possess the special skills and techniques for producing results that are insightful, reliable, and useful. Empirical Model Building: Data, Models, and Reality, Second Edition presents a hands-on approach to the basic principles of empirical model building through a shrewd mixture of differential equations, computer-intensive methods, and data. The book outlines both classical and new approaches and incorporates numerous real-world statistical problems that illustrate modeling approaches that are applicable to a broad range of audiences, including applied statisticians and practicing engineers and scientists.
"The book continues to review models of growth and decay, systems where competition and interaction add to the complextiy of the model while discussing both classical and non-classical data analysis methods. This Second Edition now features further coverage of momentum based investing practices and resampling techniques, showcasing their importance and expediency in the real world. The author provides applications of empirical modeling, such as computer modeling of the AIDS epidemic to explain why North America has most of the AIDS cases in the First World and data-based strategies that allow individual investors to build their own investment portfolios. Throughout the book, computer-based analysis is emphasized and newly added and updated exercises allow readers to test their comprehension of the presented material."
to:NB  books:noted  modeling  simulation  time_series  statistics  downloaded  to_be_shot_after_a_fair_trial 
january 2019 by cshalizi
Sensitivity Analysis in Practice | Wiley Online Books
"Sensitivity analysis should be considered a pre-requisite for statistical model building in any scientific discipline where modelling takes place. For a non-expert, choosing the method of analysis for their model is complex, and depends on a number of factors. This  book guides the non-expert through their problem in order to enable them to choose and apply the most appropriate method. It offers a review of the state-of-the-art in sensitivity analysis, and is suitable for a wide range of practitioners. It is focussed on the use of SIMLAB – a widely distributed freely-available sensitivity analysis software package developed by the authors – for solving problems in sensitivity analysis of statistical models."
to:NB  books:noted  downloaded  sensitivity_analysis  modeling  simulation 
january 2019 by cshalizi
Political Attitudes | Wiley Online Books
"Political Science has traditionally employed empirical research and analytical resources to understand, explain and predict political phenomena. One of the long-standing criticisms against empirical modeling targets the static perspective provided by the model-invariant paradigm. In political science research, this issue has a particular relevance since political phenomena prove sophisticated degrees of context-dependency whose complexity could be hardly captured by traditional approaches. To cope with the complexity challenge, a new modeling paradigm was needed. This book is concerned with this challenge. Moreover, the book aims to reveal the power of computational modeling of political attitudes to reinforce the political methodology in facing two fundamental challenges: political culture modeling and polity modeling. The book argues that an artificial polity model as a powerful research instrument could hardly be effective without the political attitude and, by extension, the political culture computational and simulation modeling theory, experiments and practice."
to:NB  books:noted  downloaded  voter_model  social_influence  public_opinion  agent-based_models  political_science  simulation  interacting_particle_systems 
january 2019 by cshalizi
[1812.00681] Numerical computation of rare events via large deviation theory
"An overview of rare events algorithms based on large deviation theory (LDT) is presented. It covers a range of numerical schemes to compute the large deviation minimizer in various setups, and discusses best practices, common pitfalls, and implementation trade-offs. Generalizations, extensions, and improvements of the minimum action methods are proposed. These algorithms are tested on example problems which illustrate several common difficulties which arise e.g. when the forcing is degenerate or multiplicative, or the systems are infinite-dimensional. Generalizations to processes driven by non-Gaussian noises or random initial data and parameters are also discussed, along with the connection between the LDT-based approach reviewed here and other methods, such as stochastic field theory and optimal control. Finally, the integration of this approach in importance sampling methods using e.g. genealogical algorithms is explored."
to:NB  large_deviations  rare-event_simulation  simulation  computational_statistics  probability  via:rvenkat 
december 2018 by cshalizi
Demographic Models for Projecting Population and Migration: Methods for African Historical Analysis | Manning | Journal of World-Historical Information
"This study presents methods for projecting population and migration over time in cases were empirical data are missing or undependable. The methods are useful for cases in which the researcher has details of population size and structure for a limited period of time (most obviously, the end point), with scattered evidence on other times. It enables estimation of population size, including its structure in age, sex, and status, either forward or backward in time. The program keeps track of all the details. The calculated data can be reported or sampled and compared to empirical findings at various times and places to expected values based on other procedures of estimation.
"The application of these general methods that is developed here is the projection of African populations backwards in time from 1950, since 1950 is the first date for which consistently strong demographic estimates are available for national-level populations all over the African continent. The models give particular attention to migration through enslavement, which was highly important in Africa from 1650 to 1900. Details include a sensitivity analysis showing relative significance of input variables and techniques for calibrating various dimensions of the projection with each other. These same methods may be applicable to quite different historical situations, as long as the data conform in structure to those considered here."

--- The final for the Kids.
to:NB  have_read  demography  history  africa  imperialism  slavery  great_transformation  to_teach:data_over_space_and_time  simulation  manning.patrick 
december 2018 by cshalizi
[1808.04739] Simulating Markov random fields with a conclique-based Gibbs sampler
"For spatial and network data, we consider models formed from a Markov random field (MRF) structure and the specification of a conditional distribution for each observation. At issue, fast simulation from such MRF models is often an important consideration, particularly when repeated generation of large numbers of data sets is required (e.g., for approximating sampling distributions). However, a standard Gibbs strategy for simulating from MRF models involves single-updates, performed with the conditional distribution of each observation in a sequential manner, whereby a Gibbs iteration may become computationally involved even for relatively small samples. As an alternative, we describe a general way to simulate from MRF models using Gibbs sampling with "concliques" (i.e., groups of non-neighboring observations). Compared to standard Gibbs sampling, this simulation scheme can be much faster by reducing Gibbs steps and by independently updating all observations per conclique at once. We detail the simulation method, establish its validity, and assess its computational performance through numerical studies, where speed advantages are shown for several spatial and network examples."

--- Slides: http://andeekaplan.com/phd-thesis/slides/public.pdf
--- There's an R package on Github but I couldn't get it to compile...
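--- The idea in its simplest instance: on a 4-nearest-neighbour lattice the two checkerboard colours are concliques, so an Ising-type MRF can be Gibbs-updated one whole colour at a time instead of site by site. A toy Python version of mine (not the authors' package):

```python
import numpy as np

rng = np.random.default_rng(5)
L, beta = 32, 0.3
spins = rng.choice([-1, 1], size=(L, L))

ii, jj = np.indices((L, L))
concliques = [(ii + jj) % 2 == c for c in (0, 1)]  # checkerboard colours

def nbr_sum(s):  # periodic 4-neighbour sum
    return sum(np.roll(s, sh, ax) for sh in (1, -1) for ax in (0, 1))

for _ in range(200):          # each sweep = 2 whole-conclique updates
    for mask in concliques:
        p_up = 1 / (1 + np.exp(-2 * beta * nbr_sum(spins)))
        flips = np.where(rng.random((L, L)) < p_up, 1, -1)
        spins = np.where(mask, flips, spins)  # same-colour sites are
                                              # conditionally independent

mean_mag = spins.mean()
```

The speed-up over single-site Gibbs comes from the two vectorized conclique updates per sweep replacing L*L sequential ones.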
in_NB  random_fields  simulation  stochastic_processes  spatial_statistics  network_data_analysis  markov_models  statistics  computational_statistics  to_teach:data_over_space_and_time  have_read 
december 2018 by cshalizi
Uncertainty Quantification of Stochastic Simulation for Black-box Computer Experiments | SpringerLink
"Stochastic simulations applied to black-box computer experiments are becoming more widely used to evaluate the reliability of systems. Yet, the reliability evaluation or computer experiments involving many replications of simulations can take significant computational resources as simulators become more realistic. To speed up, importance sampling coupled with near-optimal sampling allocation for these experiments is recently proposed to efficiently estimate the probability associated with the stochastic system output. In this study, we establish the central limit theorem for the probability estimator from such procedure and construct an asymptotically valid confidence interval to quantify estimation uncertainty. We apply the proposed approach to a numerical example and present a case study for evaluating the structural reliability of a wind turbine."
to:NB  simulation  monte_carlo 
november 2018 by cshalizi
[0802.0021] Time series analysis via mechanistic models
"The purpose of time series analysis via mechanistic models is to reconcile the known or hypothesized structure of a dynamical system with observations collected over time. We develop a framework for constructing nonlinear mechanistic models and carrying out inference. Our framework permits the consideration of implicit dynamic models, meaning statistical models for stochastic dynamical systems which are specified by a simulation algorithm to generate sample paths. Inference procedures that operate on implicit models are said to have the plug-and-play property. Our work builds on recently developed plug-and-play inference methodology for partially observed Markov models. We introduce a class of implicitly specified Markov chains with stochastic transition rates, and we demonstrate its applicability to open problems in statistical inference for biological systems. As one example, these models are shown to give a fresh perspective on measles transmission dynamics. As a second example, we present a mechanistic analysis of cholera incidence data, involving interaction between two competing strains of the pathogen Vibrio cholerae."
in_NB  statistics  time_series  simulation  to_teach:data_over_space_and_time 
september 2018 by cshalizi
Spatial Simulation | Wiley Online Books
"Across broad areas of the environmental and social sciences, simulation models are  an important way to study systems inaccessible to scientific experimental and observational methods, and also an essential complement to those more conventional approaches.  The contemporary research literature is teeming with abstract simulation models whose presentation is mathematically demanding and requires a high level of knowledge of quantitative and computational methods and approaches.  Furthermore, simulation models designed to represent specific systems and phenomena are often complicated, and, as a result, difficult to reconstruct from their descriptions in the literature.  This book aims to provide a practical and accessible account of dynamic spatial modelling, while also equipping readers with a sound conceptual foundation in the subject, and a useful introduction to the wide-ranging literature.
"Spatial Simulation: Exploring Pattern and Process is organised around the idea that a small number of spatial processes underlie the wide variety of dynamic spatial models. Its central focus on three ‘building-blocks’ of dynamic spatial models – forces of attraction and segregation, individual mobile entities, and processes of spread – guides the reader to an understanding of the basis of many of the complicated models found in the research literature. The three building block models are presented in their simplest form and are progressively elaborated and related to real world process that can be represented using them.  Introductory chapters cover essential background topics, particularly the relationships between pattern, process and spatiotemporal scale.  Additional chapters consider how time and space can be represented in more complicated models, and methods for the analysis and evaluation of models. Finally, the three building block models are woven together in a more elaborate example to show how a complicated model can be assembled from relatively simple components.
"To aid understanding, more than 50 specific models described in the book are available online at patternandprocess.org for exploration in the freely available Netlogo platform.  This book encourages readers to develop intuition for the abstract types of model that are likely to be appropriate for application in any specific context.  Spatial Simulation: Exploring Pattern and Process will be of interest to undergraduate and graduate students taking courses in environmental, social, ecological and geographical disciplines.  Researchers and professionals who require a non-specialist introduction will also find this book an invaluable guide to dynamic spatial simulation."

--- This looks cool, but it'd kind of blow the kids' minds, so the last tag is really more "to mine for examples" than "to teach".
in_NB  books:noted  simulation  modeling  cellular_automata  to_teach:data_over_space_and_time 
august 2018 by cshalizi
Indirect inference through prediction
"By recasting indirect inference estimation as a prediction rather than a minimization and by using regularized regressions, we can bypass the three major problems of estimation: selecting the summary statistics, defining the distance function and minimizing it numerically. By substituting regression with classification we can extend this approach to model selection as well. We present three examples: a statistical fit, the parametrization of a simple RBC model and heuristics selection in a fishery agent-based model."
agent-based_models  prediction  statistics  estimation  indirect_inference  simulation  have_read  in_NB 
july 2018 by cshalizi
A Micro-Level Data-Calibrated Agent-Based Model: The Synergy between Microsimulation and Agent-Based Modeling | Artificial Life | MIT Press Journals
"Artificial life (ALife) examines systems related to natural life, its processes, and its evolution, using simulations with computer models, robotics, and biochemistry. In this article, we focus on the computer modeling, or “soft,” aspects of ALife and prepare a framework for scientists and modelers to be able to support such experiments. The framework is designed and built to be a parallel as well as distributed agent-based modeling environment, and does not require end users to have expertise in parallel or distributed computing. Furthermore, we use this framework to implement a hybrid model using microsimulation and agent-based modeling techniques to generate an artificial society. We leverage this artificial society to simulate and analyze population dynamics using Korean population census data. The agents in this model derive their decisional behaviors from real data (microsimulation feature) and interact among themselves (agent-based modeling feature) to proceed in the simulation. The behaviors, interactions, and social scenarios of the agents are varied to perform an analysis of population dynamics. We also estimate the future cost of pension policies based on the future population structure of the artificial society. The proposed framework and model demonstrates how ALife techniques can be used by researchers in relation to social issues and policies."
to:NB  agent-based_models  statistics  simulation 
may 2018 by cshalizi
Explaining with Simulations: Why Visual Representations Matter | Perspectives on Science | MIT Press Journals
"Computer simulations are often expected to provide explanations about target phenomena. However there is a gap between the simulation outputs and the underlying model, which prevents users finding the relevant explanatory components within the model. I contend that visual representations which adequately display the simulation outputs can nevertheless be used to get explanations. In order to do so, I elaborate on the way graphs and pictures can help one to explain the behavior of a flow past a cylinder. I then specify the reasons that make more generally visual representations particularly suitable for explanatory tasks in a computer-assisted context."
to:NB  simulation  modeling  explanation  philosophy_of_science  visual_display_of_quantitative_information 
april 2018 by cshalizi
[1506.04956] The Scope and Limits of Simulation in Cognitive Models
"It has been proposed that human physical reasoning consists largely of running "physics engines in the head" in which the future trajectory of the physical system under consideration is computed precisely using accurate scientific theories. In such models, uncertainty and incomplete knowledge is dealt with by sampling probabilistically over the space of possible trajectories ("Monte Carlo simulation"). We argue that such simulation-based models are too weak, in that there are many important aspects of human physical reasoning that cannot be carried out this way, or can only be carried out very inefficiently; and too strong, in that humans make large systematic errors that the models cannot account for. We conclude that simulation-based reasoning makes up at most a small part of a larger system that encompasses a wide range of additional cognitive processes."
to:NB  simulation  mental_models  cognitive_science  marcus.gary 
january 2018 by cshalizi
[1511.01844] A note on the evaluation of generative models
"Probabilistic generative models can be used for compression, denoising, inpainting, texture synthesis, semi-supervised learning, unsupervised feature learning, and other tasks. Given this wide range of applications, it is not surprising that a lot of heterogeneity exists in the way these models are formulated, trained, and evaluated. As a consequence, direct comparison between models is often difficult. This article reviews mostly known but often underappreciated properties relating to the evaluation and interpretation of generative models with a focus on image models. In particular, we show that three of the currently most commonly used criteria---average log-likelihood, Parzen window estimates, and visual fidelity of samples---are largely independent of each other when the data is high-dimensional. Good performance with respect to one criterion therefore need not imply good performance with respect to the other criteria. Our results show that extrapolation from one criterion to another is not warranted and generative models need to be evaluated directly with respect to the application(s) they were intended for. In addition, we provide examples demonstrating that Parzen window estimates should generally be avoided."
to:NB  simulation  stochastic_models  model_checking  statistics  via:vaguery  to_read  re:ADAfaEPoV  re:phil-of-bayes_paper 
december 2016 by cshalizi
The Systemic Image: A New Theory of Interactive Real-Time Simulations | The MIT Press
"Computer simulations conceive objects and situations dynamically, in their changes and progressions. In The Systemic Image, Inge Hinterwaldner considers not only the technical components of dynamic computer simulations but also the sensory aspects of the realization. Examining the optic, the acoustic, the tactile, and the sensorimotor impressions that interactive real-time simulations provide, she finds that iconicity plays a dominant yet unexpected role. Based on this, and close readings of a series of example works, Hinterwaldner offers a new conceptualization of the relationship between systemic configuration and the iconic aspects in these calculated complexes.
"Hinterwaldner discusses specifications of sensorialization, necessary to make the simulation dynamic perceivable. Interweaving iconicity with simulation, she explores the expressive possibilities that can be achieved under the condition of continuously calculated explicit changes. She distinguishes among four levels of forming: the systems perspective, as a process and schema that establishes the most general framework of simulations; the mathematical model, which marks off the boundaries of the simulation’s actualization; the iconization and its orientation toward the user; and interaction design, necessary for the full unfolding of the simulation. The user makes manifest what is initially latent. Viewing the simulation as an interface, Hinterwaldner argues that not only does the sensorially designed aspect of the simulation seduce the user but the user also makes an impact on the simulation—on the dynamic and perhaps on the iconization, although not on the perspectivation. The influence is reciprocal."

--- This sounds like it's talking about interesting things, but if the text is anything like this description, I rather doubt I'll be up to the effort of mental translation / editing.
to:NB  books:noted  simulation 
june 2016 by cshalizi
Agent-Based Models and Microsimulation - Annual Review of Statistics and Its Application, 2(1):259
"Agent-based models (ABMs) are computational models used to simulate the actions and interactions of agents within a system. Usually, each agent has a relatively simple set of rules for how he or she responds to his or her environment and to other agents. These models are used to gain insight into the emergent behavior of complex systems with many agents, in which the emergent behavior depends upon the micro-level behavior of the individuals. ABMs are widely used in many fields, and this article reviews some of those applications. However, as relatively little work has been done on statistical inference for such models, this article also points out some of those gaps and recent strategies to address them."
to:NB  agent-based_models  statistics  simulation  banks.david 
march 2016 by cshalizi
[1507.08612] Likelihood-free inference in high-dimensional models
"Methods that bypass analytical evaluations of the likelihood function have become an indispensable tool for statistical inference in many fields of science. These so-called likelihood-free methods rely on accepting and rejecting simulations based on summary statistics, which limits them to low dimensional models for which the absolute likelihood is large enough to result in manageable acceptance rates. To get around these issues, we introduce a novel, likelihood-free Markov-Chain Monte Carlo (MCMC) method combining two key innovations: updating only one parameter per iteration and accepting or rejecting this update based on subsets of statistics sufficient for this parameter. This increases acceptance rates dramatically, rendering this approach suitable even for models of very high dimensionality. We further derive that for linear models, a one dimensional combination of statistics per parameter is sufficient and can be found empirically with simulations. Finally, we demonstrate that our method readily scales to models of very high dimensionality using both toy models as well as by jointly inferring the effective population size, the distribution of fitness effects of new mutations (DFE) and selection coefficients for each locus from data of a recent experiment on the evolution of drug-resistance in Influenza."
to:NB  simulation  approximate_bayesian_computation  statistics  estimation 
august 2015 by cshalizi
Training generative neural networks via maximum mean discrepancy optimization
"We consider training a deep neural network to generate samples from an unknown distribu- tion given i.i.d. data. We frame learning as an optimization minimizing a two-sample test statistic—informally speaking, a good genera- tor network produces samples that cause a two- sample test to fail to reject the null hypothesis. As our two-sample test statistic, we use an un- biased estimate of the maximum mean discrep- ancy, which is the centerpiece of the nonpara- metric kernel two-sample test proposed by Gret- ton et al. [2]. We compare to the adversar- ial nets framework introduced by Goodfellow et al. [1], in which learning is a two-player game between a generator network and an adversarial discriminator network, both trained to outwit the other. From this perspective, the MMD statistic plays the role of the discriminator. In addition to empirical comparisons, we prove bounds on the generalization error incurred by optimizing the empirical MMD."

--- On first glance, there's no obvious limitation to neural networks, and indeed it's rather suggestive of indirect inference (to me)
to:NB  simulation  stochastic_models  neural_networks  machine_learning  two-sample_tests  hypothesis_testing  nonparametrics  kernel_methods  statistics  computational_statistics  ghahramani.zoubin 
july 2015 by cshalizi
[1411.4723] A Frequentist Approach to Computer Model Calibration
"This paper considers the computer model calibration problem and provides a general frequentist solution. Under the proposed framework, the data model is semi-parametric with a nonparametric discrepancy function which accounts for any discrepancy between the physical reality and the computer model. In an attempt to solve a fundamentally important (but often ignored) identifiability issue between the computer model parameters and the discrepancy function, this paper proposes a new and identifiable parametrization of the calibration problem. It also develops a two-step procedure for estimating all the relevant quantities under the new parameterization. This estimation procedure is shown to enjoy excellent rates of convergence and can be straightforwardly implemented with existing software. For uncertainty quantification, bootstrapping is adopted to construct confidence regions for the quantities of interest. The practical performance of the proposed methodology is illustrated through simulation examples and an application to a computational fluid dynamics model."

- i.e., pick the parameter value where a nonparametric regression of the residuals is as small as possible on average.
to:NB  simulation  statistics  estimation  re:stacs 
january 2015 by cshalizi
[1501.01265] The ABC of Simulation Estimation with Auxiliary Statistics
"This paper provides a synthesis of the simulation based minimum distance estimators used in economics with the method of ABC (Approximate Bayesian Computation) used in other disciplines. While the two strands of work are seemingly related, the relation between them is not well understood beyond the fact that they all replace the likelihood by auxiliary statistics that are informative about the data. We connect the two methods by using a reverse sampler to engineer the ABC posterior distribution from a sequence of simulated minimum distance estimates. Focusing on the exactly identified case, we find that Bayesian and frequentist estimators have different bias properties for an arbitrary choice of prior. The difference can be traced to whether we match the sample auxiliary statistics to each or to the average of the simulated ones. In principle, the likelihood-free Bayesian estimators can completely eliminate the order 1T bias with suitable choice of the prior. But under a specific relation between the structural parameters and auxiliary statistics, the frequentist simulated distance estimators are automatically second order unbiased. These differences are illustrated using an analytical example and a simulation study of the dynamic panel model."
in_NB  indirect_inference  approximate_bayesian_computation  statistics  computational_statistics  simulation  re:stacs  to_read 
january 2015 by cshalizi
Agent-Based Simulation and Sociological Understanding
"The article discusses agent-based simulation as a tool of sociological understanding. Based on an inferential account of understanding, it argues that computer simulations increase our explanatory understanding both by expanding our ability to make what-if inferences about social processes and by making these inferences more reliable. However, our ability to understand simulations limits our ability to understand real world phenomena through them. Thomas Schelling's checkerboard model of ethnic segregation is used to demonstrate the important role played by abstract how-possibly models in the process of building a mechanistic understanding of social phenomena."
simulation  philosophy_of_science  agent-based_models  schelling_model  sociology  social_science_methodology  in_NB 
september 2014 by cshalizi
[1404.0667] ATLAS: A geometric approach to learning high-dimensional stochastic systems near manifolds
"When simulating multiscale stochastic differential equations (SDEs) in high-dimensions, separation of timescales, stochastic noise and high-dimensionality can make simulations prohibitively expensive. The computational cost is dictated by microscale properties and interactions of many variables, while interesting behavior often occurs at the macroscale level and at large time scales, often characterized by few important, but unknown, degrees of freedom. For many problems bridging the gap between the microscale and macroscale by direct simulation is computationally infeasible. In this work we propose a novel approach to automatically learn a reduced model with an associated fast macroscale simulator. Our unsupervised learning algorithm uses short parallelizable microscale simulations to learn provably accurate macroscale SDE models. The learning algorithm takes as input: the microscale simulator, a local distance function, and a homogenization spatial or temporal scale, which is the smallest time scale of interest in the reduced system. The learned macroscale model can then be used for fast computation and storage of long simulations. We discuss various examples, both low- and high-dimensional, as well as results about the accuracy of the fast simulators we construct, and its dependency on the number of short paths requested from the microscale simulator."
in_NB  stochastic_differential_equations  macro_from_micro  simulation  stochastic_processes 
april 2014 by cshalizi
The Practice of Agent-Based Model Visualization
"We discuss approaches to agent-based model visualization. Agent-based modeling has its own requirements for visualization, some shared with other forms of simulation software, and some unique to this approach. In particular, agent-based models are typified by complexity, dynamism, nonequilibrium and transient behavior, heterogeneity, and a researcher's interest in both individual- and aggregate-level behavior. These are all traits requiring careful consideration in the design, experimentation, and communication of results. In the case of all but final communication for dissemination, researchers may not make their visualizations public. Hence, the knowledge of how to visualize during these earlier stages is unavailable to the research community in a readily accessible form. Here we explore means by which all phases of agent-based modeling can benefit from visualization, and we provide examples from the available literature and online sources to illustrate key stages and techniques."
in_NB  visual_display_of_quantitative_information  simulation  agent-based_models  to_teach:complexity-and-inference 
march 2014 by cshalizi
[1403.1011] Model versions and fast algorithms for network epidemiology
"Network epidemiology has become a core framework for investigating the role of human contact patterns in the spreading of infectious diseases. In network epidemiology represents the contact structure as a network of nodes (individuals) connected by links (sometimes as a temporal network where the links are not continuously active) and the disease as a compartmental model (where individuals are assigned states with respect to the disease and follow certain transition rules between the states). In this paper, we discuss fast algorithms for such simulations and also compare two commonly used versions - one where there is a constant recovery rate (the number of individuals that stop being infectious per time is proportional to the number of such people), the other where the duration of the disease is constant. We find that, for most practical purposes, these versions are qualitatively the same."
in_NB  networks  simulation  epidemic_models  holme.petter 
march 2014 by cshalizi
Building Simulations from the Ground Up: Modeling and Theory in Systems Biology
"In this article, we provide a case study examining how integrative systems biologists build simulation models in the absence of a theoretical base. Lacking theoretical starting points, integrative systems biology researchers rely cognitively on the model-building process to disentangle and understand complex biochemical systems. They build simulations from the ground up in a nest-like fashion, by pulling together information and techniques from a variety of possible sources and experimenting with different structures in order to discover a stable, robust result. Finally, we analyze the alternative role and meaning theory has in systems biology expressed as canonical template theories like Biochemical Systems Theory."
to:NB  molecular_biology  simulation  philosophy_of_science 
february 2014 by cshalizi
Worthwhile Canadian Initiative: Microfoundations we like vs microfoundations we can solve
I fail to see why conclusion #4 is not altogether correct, and if it's outside the current comfort zone and skill-set of economists, well, that's just sad.
economics  macroeconomics  social_science_methodology  agent-based_models  simulation  via:jbdelong  to:blog 
december 2013 by cshalizi
Simulation as an engine of physical scene understanding
"In a glance, we can perceive whether a stack of dishes will topple, a branch will support a child’s weight, a grocery bag is poorly packed and liable to tear or crush its contents, or a tool is firmly attached to a table or free to be lifted. Such rapid physical inferences are central to how people interact with the world and with each other, yet their computational underpinnings are poorly understood. We propose a model based on an “intuitive physics engine,” a cognitive mechanism similar to computer engines that simulate rich physics in video games and graphics, but that uses approximate, probabilistic simulations to make robust and fast inferences in complex natural scenes where crucial information is unobserved. This single model fits data from five distinct psychophysical tasks, captures several illusions and biases, and explains core aspects of human mental models and common-sense reasoning that are instrumental to how humans understand their everyday world."
to:NB  perception  cognitive_science  simulation 
november 2013 by cshalizi
Building Simulations from the Ground Up: Modeling and Theory in Systems Biology [JSTOR: Philosophy of Science, Vol. 80, No. 4 (October 2013), pp. 533-556]
"In this article, we provide a case study examining how integrative systems biologists build simulation models in the absence of a theoretical base. Lacking theoretical starting points, integrative systems biology researchers rely cognitively on the model-building process to disentangle and understand complex biochemical systems. They build simulations from the ground up in a nest-like fashion, by pulling together information and techniques from a variety of possible sources and experimenting with different structures in order to discover a stable, robust result. Finally, we analyze the alternative role and meaning theory has in systems biology expressed as canonical template theories like Biochemical Systems Theory."
to:NB  simulation  biochemical_networks  biology  philosophy_of_science 
october 2013 by cshalizi
Hardin , Garcia , Golan : A method for generating realistic correlation matrices
"Simulating sample correlation matrices is important in many areas of statistics. Approaches such as generating Gaussian data and finding their sample correlation matrix or generating random uniform [−1,1] deviates as pairwise correlations both have drawbacks. We develop an algorithm for adding noise, in a highly controlled manner, to general correlation matrices. In many instances, our method yields results which are superior to those obtained by simply simulating Gaussian data. Moreover, we demonstrate how our general algorithm can be tailored to a number of different correlation models. Using our results with a few different applications, we show that simulating correlation matrices can help assess statistical methodology."
to:NB  statistics  simulation  re:g_paper 
october 2013 by cshalizi
Backward Simulation Methods for Monte Carlo Statistical Inference
"Backward Simulation Methods for Monte Carlo Statistical Inference presents and discusses various backward simulation methods for Monte Carlo statistical inference. The focus is on SMC-based backward simulators, which are useful for inference in analytically intractable models, such as nonlinear and/or non-Gaussian SSMs, but also in more general latent variable models."
to:NB  monte_carlo  particle_filters  simulation  statistics 
september 2013 by cshalizi
[1308.0049] A composite likelihood approach to computer model calibration using high-dimensional spatial data
"Computer models are used to model complex processes in various disciplines. Often, a key source of uncertainty in the behavior of complex computer models is uncertainty due to unknown model input parameters. Statistical computer model calibration is the process of inferring model parameter values, along with associated uncertainties, from observations of the physical process and from model outputs at various parameter settings. Observations and model outputs are often in the form of high-dimensional spatial fields, especially in the environmental sciences. Sound statistical inference may be computationally challenging in such situations. Here we introduce a composite likelihood-based approach to perform computer model calibration with high-dimensional spatial data. While composite likelihood has been studied extensively in the context of spatial statistics, computer model calibration using composite likelihood poses several new challenges. We propose a computationally efficient approach for Bayesian computer model calibration using composite likelihood. We also develop a methodology based on asymptotic theory for adjusting the composite likelihood posterior distribution so that it accurately represents posterior uncertainties. We study the application of our new approach in the context of calibration for a climate model."
to:NB  simulation  statistics  likelihood  computational_statistics  estimation 
august 2013 by cshalizi
"Microsimulation Methods in Spatial Analysis and Planning" --- JSTOR: Geografiska Annaler. Series B, Human Geography, Vol. 69, No. 2 (1987), pp. 145-164
"Despite the significant potential afforded to model based geography by microsimulation methods they are still largely ignored by spatial analysts. In the worse case there is the possibility of the method being reinvented by different researchers independently. In this paper we attempt to describe the main features of the method and to briefly review some of the more important applications. We also speculate on some future developments of the methodology in the area of social and economic geography."
in_NB  agent-based_models  simulation  economics  geography 
july 2013 by cshalizi
[1307.1223] Fast inverse transform sampling in one and two dimensions
"We develop a computationally efficient and robust algorithm for generating pseudo-random samples from a broad class of smooth probability distributions in one and two dimensions. The algorithm is based on inverse transform sampling with a polynomial approximation scheme using Chebyshev polynomials, Chebyshev grids, and low rank function approximation. Numerical experiments demonstrate that our algorithm outperforms existing approaches."
to:NB  computational_statistics  simulation  monte_carlo 
july 2013 by cshalizi
[1306.4032] Playing Russian Roulette with Intractable Likelihoods
"A general scheme to exploit Exact-Approximate MCMC methodology for intractable likelihoods is suggested. By representing the intractable likelihood as an infinite Maclaurin or Geometric series expansion, unbiased estimates of the likelihood can be obtained by finite time stochastic truncations of the series via Russian Roulette sampling. Whilst the estimates of the intractable likelihood are unbiased, for unbounded unnormalised densities they induce a signed measure in the Exact-Approximate Markov chain Monte Carlo procedure which will introduce bias in the invariant distribution of the chain. By exploiting results from the Quantum Chromodynamics literature the signed measures can be employed in an Exact-Approximate sampling scheme in such a way that expectations with respect to the desired target distribution are preserved. This provides a general methodology to construct Exact-Approximate sampling schemes for a wide range of models and the methodology is demonstrated on well known examples such as posterior inference of coupling parameters in Ising models and defining the posterior for Fisher-Bingham distributions defined on the $d$-Sphere. A large scale example is provided for a Gaussian Markov Random Field model, with fine scale mesh refinement, describing the Ozone Column data. To our knowledge this is the first time that fully Bayesian inference over a model of this size has been feasible without the need to resort to any approximations. Finally a critical assessment of the strengths and weaknesses of the methodology is provided with pointers to ongoing research."
in_NB  monte_carlo  approximate_bayesian_computation  simulation  likelihood  estimation  statistics  to_read  re:stacs 
june 2013 by cshalizi
2013: Knowledge Extraction via Comparison of Complex Computational Models to Massive Data Sets: July 29-31, 2013 | Statistical and Applied Mathematical Sciences Institute (SAMSI)
"Advances in computation have significantly improved our abilities to model complex processes in science, engineering and the social sciences. In parallel, experimental observations have grown in size and complexity as well. Gaining knowledge and insight from these efforts requires rigorous comparison of models and data. The ever increasing sophistication of the models along with the size and detail of the heterogeneous data sets demands commensurate advances in the processes and practices of data analysis.
"This workshop, co-sponsored by SAMSI and the NSF funded MADAI collaboration (Models and Data Analysis Initiative, http://madai.us) is devoted to applying and developing new techniques for the statistical analysis of massively complex models and the application of cutting edge visualization tools to drive data exploration. Currently, MADAI's analysis infrastructure and work- flows are being designed to address scientific challenges in Heavy-Ion Physics, Cosmology and Climate Sciences. Once fully developed these should be broadly extensible to other domains.
"The purpose of the workshop is to introduce a broader base of domain scientists in the aforementioned communities to statistical and visualization tools that facilitate knowledge extraction via complex model to data comparisons. The workshop will also provide opportunities for the Statistical Science community to learn about recent developments in complex modeling and computer experiments as well as engage in new collaborative ventures. Two half-day hands-on tutorials will showcase a modular visualization platform (based on Paraview) that allows for advanced visualization of complex model dynamics as well as statistical analysis tools. The statistical tools are based on Gaussian process surrogate models for rapid exploration of a model's parameter space."

- Can't attend, but looks interesting.
conferences  computational_statistics  data_analysis  simulation  nonparametrics  visual_display_of_quantitative_information 
june 2013 by cshalizi
[1304.5768] Derivative-Free Estimation of the Score Vector and Observed Information Matrix with Application to State-Space Models
"Ionides, King et al. (see e.g. Inference for nonlinear dynamical systems, PNAS 103) have recently introduced an original approach to perform maximum likelihood parameter estimation in state-space models which only requires being able to simulate the latent Markov model according its prior distribution. Their methodology relies on an approximation of the score vector for general statistical models based upon an artificial posterior distribution and bypasses the calculation of any derivative. Building upon this insightful work, we provide here a simple "derivative-free" estimator of the observed information matrix based upon this very artificial posterior distribution. However for state-space models where sequential Monte Carlo computation is required, these estimators have too high a variance and need to be modified. In this specific context, we derive new derivative-free estimators of the score vector and observed information matrix which are computed using sequential Monte Carlo approximations of smoothed additive functionals associated with a modified version of the original state-space model."

Auto-vulgarization: http://statisfaction.wordpress.com/2013/04/23/derivative-free-estimate-of-derivatives/
to:NB  estimation  statistics  fisher_information  state-space_models  simulation  particle_filters 
april 2013 by cshalizi
Burn-In is Unnecessary
Hmmm. Shouldn't one be able to address this as, given that the initial state X_0 comes from a distribution \pi which is not the invariant distribution \rho of the Markov operator, for what b does the empirical distribution of X_{b:n} come closest, on average and in some reasonable metric, to \rho? The answer presumably depends on how far \pi is from \rho and how rapidly T mixes.
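
--- The question is easy to poke at numerically for a toy chain; a sketch where the marginal laws are propagated exactly rather than simulated, so the answer is the average-case one (the chain, the initial distribution, and the metric are all my choices):

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])     # two-state chain, second eigenvalue 0.7
pi0 = np.array([1.0, 0.0])     # initial distribution, far from equilibrium
rho = np.array([2/3, 1/3])     # invariant distribution: rho @ P == rho

def avg_tv(b, n):
    """Total-variation distance between rho and the average of the
    marginal laws of X_b, ..., X_n -- what the empirical distribution
    of the post-burn-in segment estimates on average."""
    dist, avg = pi0.copy(), np.zeros(2)
    for t in range(n + 1):
        if t >= b:
            avg += dist
        dist = dist @ P        # law of X_{t+1} given law of X_t
    avg /= (n - b + 1)
    return 0.5 * np.abs(avg - rho).sum()
```

As expected, the discrepancy falls geometrically in b at the mixing rate (here 0.7^b), so past a modest burn-in there is essentially nothing left to remove — consistent with the view that long runs make burn-in unnecessary.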
monte_carlo  to_teach:statcomp  ergodic_theory  markov_models  geyer.charles_j.  have_read  simulation  computational_statistics 
april 2013 by cshalizi
Simulation and Learning - A Model-Centered Approach
"This book conveys the incredible instructional potential of simulation as a modality of education and provides guidelines for the design of effective simulation-based learning environments.  The framework of the book consists of  model-centered learning---learning that requires a restructuring of individual mental models utilized by both students and teachers."

Simulation models extend our biological capacity to carry out simulative reasoning. Recent approaches to mental modeling, such as embodied cognition and the extended mind hypothesis are also considered in the book, which relies heavily on recent advances in cognitive science.

A conceptual model called the “epistemic simulation cycle” is proposed as a blueprint for the comprehension of the cognitive activities involved in simulation-based learning and for instructional design.
in_NB  books:noted  simulation  education  modeling  cognitive_science 
april 2013 by cshalizi
Model-free forecasting outperforms the correct mechanistic model for simulated and experimental data
"Accurate predictions of species abundance remain one of the most vexing challenges in ecology. This observation is perhaps unsurprising, because population dynamics are often strongly forced and highly nonlinear. Recently, however, numerous statistical techniques have been proposed for fitting highly parameterized mechanistic models to complex time series, potentially providing the machinery necessary for generating useful predictions. Alternatively, there is a wide variety of comparatively simple model-free forecasting methods that could be used to predict abundance. Here we pose a rather conservative challenge and ask whether a correctly specified mechanistic model, fit with commonly used statistical techniques, can provide better forecasts than simple model-free methods for ecological systems with noisy nonlinear dynamics. Using four different control models and seven experimental time series of flour beetles, we found that Markov chain Monte Carlo procedures for fitting mechanistic models often converged on best-fit parameterizations far different from the known parameters. As a result, the correctly specified models provided inaccurate forecasts and incorrect inferences. In contrast, a model-free method based on state-space reconstruction gave the most accurate short-term forecasts, even while using only a single time series from the multivariate system. Considering the recent push for ecosystem-based management and the increasing call for ecological predictions, our results suggest that a flexible model-free approach may be the most promising way forward."
to:NB  time_series  nonparametrics  simulation  ecology  statistics  prediction  state-space_reconstruction  sugihara.george  to_read 
march 2013 by cshalizi
Computing Likelihood Functions for High-Energy Physics Experiments when Distributions are Defined by Simulators with Nuisance Parameters
"When searching for new phenomena in high-energy physics, statistical analysis is complicated by the presence of nuisance parameters, representing uncertainty in the physics of interactions or in detector properties. Another complication, even with no nuisance parameters, is that the probability distributions of the models are specified only by simulation programs, with no way of evaluating their probability density functions. I advocate expressing the result of an experiment by means of the likelihood function, rather than by frequentist confidence intervals or p-values. A likelihood function for this problem is difficult to obtain, however, for both of the reasons given above. I discuss ways of circumventing these problems by reducing dimensionality using a classifier and employing simulations with multiple values for the nuisance parameters."
to:NB  statistics  likelihood  computational_statistics  simulation  neal.radford 
march 2013 by cshalizi
Perfect Simulation of Autoregressive Models with Infinite Memory - Springer
"In this paper we consider the problem of determining the law of binary stochastic processes from transition kernels depending on the whole past. These kernels are linear in the past values of the process. They are allowed to assume values close to both 0 and 1, preventing the application of usual results on uniqueness. We give sufficient conditions for uniqueness and non-uniqueness; in the former case a perfect simulation algorithm is also given."
to:NB  stochastic_processes  markov_models  simulation 
march 2013 by cshalizi
[1303.3079] Uncertainty Quantification for Emulators
"Consider approximating a function $f$ by an emulator $\hat{f}$ based on $n$ observations of $f$. This problem is a common when observing $f$ requires a computationally demanding simulation or an actual experiment. Let $w$ be a point in the domain of $f$. The potential error of $\hat{f}$ at $w$ is the largest value of $|\hat{f}(w) - g(w)|$ among functions $g$ that satisfy all constraints $f$ is known to satisfy. The supremum over $w$ of the potential error is the maximum potential error of $\hat{f}$. Suppose that $f$ is in a known class of regular functions. The observations provide a lower bound for the (global) regularity of $f$ as an element of that class. Consider the set $\mathcal F$ of all functions in the regularity class that agree with the $n$ observations and are globally no less regular than $f$ has been observed to be. Among all emulators that produce functions in $\mathcal F$, we find a lower bound on the potential error for $f \in \mathcal F$; its maximum over $w$ is a lower bound on the maximum potential error of any $\hat{f}$. If this lower bound is large, every emulator based on these observations is potentially substantially incorrect. To guarantee higher accuracy would require stronger assumptions about the regularity of $f$. We find a lower bound on the number of observations required to ensure that some emulator based on those observations approximates all $f \in \mathcal F$ to within $\epsilon$. For the Community Atmosphere Model, the maximum potential error of any emulator trained on a particular set of 1154 observations of $f$ is no smaller than the potential error based on a single observation of $f$ at the centroid of the 21-dimensional parameter space."
to:NB  simulation  regression  nonparametrics  stark.philip_b.  statistics 
march 2013 by cshalizi
[1303.0326] Robust Sensitivity Analysis for Stochastic Systems
"Sensitivity analysis for stochastic systems is typically carried out via derivative estimation, which critically requires parametric model assumptions. In many situations, however, we want to evaluate model misspecification effect beyond certain parametric family of models, or in some cases, there plainly are no parametric models to begin with. Motivated by such deficiency, we propose a sensitivity analysis framework that is parameter-free, by using the Kullback-Leibler divergence as a measure of model discrepancy, and obtain well-defined derivative estimators. These estimators are robust in that they automatically choose the worst (and best)-case directions to move along in the (typically infinite-dimensional) model space. They require little knowledge to implement; the distributions of the underlying random variables can be known up to, for example, black-box simulation. Methodologically, we identify these worst-case directions of the model as changes of measure that are the fixed points of a class of functional contraction maps. These fixed points can be asymptotically expanded, resulting in derivative estimators that are expressed in closed-form formula in terms of the moments of certain "symmetrizations" of the system, and hence are readily computable."
in_NB  sensitivity_analysis  simulation  stochastic_processes  information_theory  misspecification 
march 2013 by cshalizi
[1302.6427] Hypothesis Testing for Validation and Certification
"We develop a hypothesis testing framework for the formulation of the problems of 1) the validation of a simulation model and 2) using modeling to certify the performance of a physical system. These results are used to solve the extrapolative validation and certification problems, namely problems where the regime of interest is different than the regime for which we have experimental data. We use concentration of measure theory to develop the tests and analyze their errors. This work was stimulated by the work of Lucas, Owhadi, and Ortiz where a rigorous method of validation and certification is described and tested. In a remark we describe the connection between the two approaches. Moreover, as mentioned in that work these results have important implications in the Quantification of Margins and Uncertainties (QMU) framework. In particular, in a remark we describe how it provides a rigorous interpretation of the notion of confidence and new notions of margins and uncertainties which allow this interpretation. Since certain concentration parameters used in the above tests may be unkown, we furthermore show, in the last half of the paper, how to derive equally powerful tests which estimate them from sample data, thus replacing the assumption of the values of the concentration parameters with weaker assumptions. This paper is an essentially exact copy of one dated April 10, 2010."
to:NB  hypothesis_testing  statistics  steinwart.ingo  to_teach:undergrad-ADA  simulation 
march 2013 by cshalizi
[1302.0583] Efficient Importance Sampling for Rare Event Simulation with Applications
"Importance sampling has been known as a powerful tool to reduce the variance of Monte Carlo estimator for rare event simulation. Based on the criterion of minimizing the variance of Monte Carlo estimator within a parametric family, we propose a general account for finding the optimal tilting measure. To this end, when the moment generating function of the underlying distribution exists, we obtain a simple and explicit expression of the optimal alternative distribution. The proposed algorithm is quite general to cover many interesting examples, such as normal distribution, noncentral $\chi^2$ distribution, and compound Poisson processes. To illustrate the broad applicability of our method, we study value-at-risk (VaR) computation in financial risk management and bootstrap confidence regions in statistical inferences."
in_NB  simulation  rare-event_simulation  monte_carlo 
march 2013 by cshalizi
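The mean-shift tilt for a Gaussian tail probability is the standard textbook instance of the optimal exponential tilting the abstract refers to: to estimate $P(X > a)$ for $X \sim N(0,1)$, sample from $N(a,1)$ and reweight. A minimal sketch (the choice of $N(a,1)$ as the tilted law is the classical one, not lifted from this paper):

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def tail_prob_is(a, n=100_000):
    """Estimate P(X > a) for X ~ N(0,1) by importance sampling from the
    exponentially tilted law N(a,1); the likelihood ratio is
    phi(y)/phi(y-a) = exp(-a*y + a^2/2)."""
    y = rng.normal(loc=a, scale=1.0, size=n)   # draws from the tilted law
    lr = np.exp(-a * y + a * a / 2.0)          # likelihood ratio back to N(0,1)
    return float(np.mean((y > a) * lr))

a = 4.0
est = tail_prob_is(a)
exact = 0.5 * math.erfc(a / math.sqrt(2.0))    # true tail probability
```

With $a = 4$ the naive Monte Carlo estimator would need on the order of $10^7$ samples to see even a handful of hits; the tilted estimator achieves roughly 1% relative error with $10^5$ samples.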
[1302.1500] Exact test for Markov order
"We describe an exact test of the null hypothesis that a Markov chain is nth order versus the alternate hypothesis that it is $(n+1)$-th order. The procedure does not rely on asymptotic properties, but instead builds up the test statistic distribution via surrogate data and is valid for any sample size. Surrogate data are generated using a novel algorithm that guarantees, per shot, a uniform sampling from the set of sequences that exactly match the nth order properties of the observed data."
to:NB  markov_models  model_selection  simulation  statistics 
march 2013 by cshalizi
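The exact uniform surrogate sampler is the paper's contribution; as a rough stand-in, the same testing logic can be sketched with parametric-bootstrap surrogates simulated from the fitted first-order chain. All names here are illustrative, and this version lacks the paper's exactness guarantee (the surrogates match the first-order counts only in expectation, not exactly):

```python
import numpy as np

rng = np.random.default_rng(2)

def g2_stat(seq, k):
    """G^2 statistic comparing a 2nd-order Markov fit of the k-state
    sequence seq against a 1st-order fit."""
    c2 = np.zeros((k, k, k))
    for a, b, c in zip(seq[:-2], seq[1:-1], seq[2:]):
        c2[a, b, c] += 1
    n_ab = c2.sum(axis=2)      # counts of each (a,b) context
    n_bc = c2.sum(axis=0)      # counts of each (b,c) pair
    n_b = n_bc.sum(axis=1)
    g2 = 0.0
    for a in range(k):
        for b in range(k):
            for c in range(k):
                if c2[a, b, c] > 0:
                    p2 = c2[a, b, c] / n_ab[a, b]   # P(c | a, b)
                    p1 = n_bc[b, c] / n_b[b]        # P(c | b)
                    g2 += 2.0 * c2[a, b, c] * np.log(p2 / p1)
    return g2

def surrogate_test(seq, k, n_surr=99):
    """P-value for 1st- vs 2nd-order, with surrogates simulated from the
    fitted 1st-order chain (assumes every state occurs as a predecessor)."""
    c1 = np.zeros((k, k))
    for a, b in zip(seq[:-1], seq[1:]):
        c1[a, b] += 1
    P = c1 / c1.sum(axis=1, keepdims=True)
    obs = g2_stat(seq, k)
    hits = 0
    for _ in range(n_surr):
        s = [seq[0]]
        for _ in range(len(seq) - 1):
            s.append(rng.choice(k, p=P[s[-1]]))
        hits += g2_stat(np.array(s), k) >= obs
    return (1 + hits) / (1 + n_surr)
```

A genuinely second-order sequence (e.g. one where each symbol is a deterministic function of the previous two) is rejected decisively, while a first-order sequence is not.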
IEEE Xplore - Estimation of a Density Using Real and Artificial Data
"Let $X, X_{1}, X_{2}, dots $ be independent and identically distributed $ {BBR }^{d}$-valued random variables and let $m: {BBR }^{d} rightarrow {BBR } $ be a measurable function such that a density $f$ of $Y=m(X)$ exists. Given a sample of the distribution of $(X,Y)$ and additional independent observations of $X$ , we are interested in estimating $f$. We apply a regression estimate to the sample of $(X,Y)$ and use this estimate to generate additional artificial observations of $Y$ . Using these artificial observations together with the real observations of $Y$, we construct a density estimate of $f$ by using a convex combination of two kernel density estimates. It is shown that if the bandwidths satisfy the usual conditions and if in addition the supremum norm error of the regression estimate converges almost surely faster toward zero than the bandwidth of the kernel density estimate applied to the artificial data, then the convex combination of the two density estimates is $L_{1}$-consistent. The performance of the estimate for finite sample size is illustrated by simulated data, and the usefulness of the proced- re is demonstrated by applying it to a density estimation problem in a simulation model."
to:NB  regression  density_estimation  statistics  kernel_estimators  simulation  devroye.luc  semi-supervised_learning 
february 2013 by cshalizi
[1302.3564] Tail Sensitivity Analysis in Bayesian Networks
"The paper presents an efficient method for simulating the tails of a target variable Z=h(X) which depends on a set of basic variables X=(X_1, ..., X_n). To this aim, variables X_i, i=1, ..., n are sequentially simulated in such a manner that Z=h(x_1, ..., x_i-1, X_i, ..., X_n) is guaranteed to be in the tail of Z. When this method is difficult to apply, an alternative method is proposed, which leads to a low rejection proportion of sample values, when compared with the Monte Carlo method. Both methods are shown to be very useful to perform a sensitivity analysis of Bayesian networks, when very large confidence intervals for the marginal/conditional probabilities are required, as in reliability or risk analysis. The methods are shown to behave best when all scores coincide. The required modifications for this to occur are discussed. The methods are illustrated with several examples and one example of application to a real case is used to illustrate the whole process.'
to:NB  graphical_models  sensitivity_analysis  simulation  rare-event_simulation  statistics 
february 2013 by cshalizi
[1301.0463] A Simple Approach to Maximum Intractable Likelihood Estimation
"Approximate Bayesian Computation (ABC) can be viewed as an analytic approximation of an intractable likelihood coupled with an elementary simulation step. Such a view, combined with a suitable instrumental prior distribution permits maximum-likelihood (or maximum-a-posteriori) inference to be conducted, approximately, using essentially the same techniques. An elementary approach to this problem which simply obtains a nonparametric approximation of the likelihood surface which is then used as a smooth proxy for the likelihood in a subsequent maximisation step is developed here and the convergence of this class of algorithms is characterised theoretically. The use of non-sufficient summary statistics in this context is considered. Applying the proposed method to four problems demonstrates good performance. The proposed approach provides an alternative for approximating the maximum likelihood estimator (MLE) in complex scenarios."

Journal version (in open-access _Electronic Journal of Statistics_): http://dx.doi.org/10.1214/13-EJS819
in_NB  to_read  statistics  estimation  likelihood  approximate_bayesian_computation  indirect_inference  re:stacs  simulation  to_teach:complexity-and-inference 
january 2013 by cshalizi
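The "nonparametric approximation of the likelihood surface" can be sketched in a toy setting: for each parameter value on a grid, simulate many summary statistics, estimate their density at the observed summary by KDE, and maximise over the grid. Everything here — the Gaussian simulator, the grid, the bandwidth — is an illustrative assumption, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_summary(theta, n=50):
    """Hypothetical simulator: the mean of n draws from N(theta, 1)."""
    return rng.normal(theta, 1.0, size=n).mean()

def kde_at(x0, xs, h):
    """Gaussian kernel density estimate at x0 from the sample xs."""
    u = (x0 - xs) / h
    return float(np.mean(np.exp(-0.5 * u * u)) / (h * np.sqrt(2.0 * np.pi)))

def abc_mle(s_obs, grid, m=400, h=0.05):
    """Grid maximiser of a KDE approximation to the likelihood of the
    observed summary s_obs: simulate m summaries per grid point, score
    each point by the estimated density at s_obs, take the argmax."""
    lhat = [kde_at(s_obs, np.array([simulate_summary(t) for _ in range(m)]), h)
            for t in grid]
    return float(grid[int(np.argmax(lhat))])
```

With the observed summary at 1.0, the grid maximiser lands near the true parameter; in higher dimensions the grid would of course be replaced by a proper optimiser.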
Statistical Approach to Quantum Field Theory
"Over the past few decades the powerful methods of statistical physics and Euclidean quantum field theory have moved closer together, with common tools based on the use of path integrals. The interpretation of Euclidean field theories as particular systems of statistical physics has opened up new avenues for understanding strongly coupled quantum systems or quantum field theories at zero or finite temperatures.
"Accordingly, the first chapters of this book contain a self-contained introduction to path integrals in Euclidean quantum mechanics and statistical mechanics. The resulting high-dimensional integrals can be estimated with the help of Monte Carlo simulations based on Markov processes. The most commonly used algorithms are presented in detail so as to prepare the reader for the use of high-performance computers as an “experimental” tool for this burgeoning field of theoretical physics."
path_integrals  field_theory  quantum_mechanics  monte_carlo  stochastic_models  simulation  physics  statistical_mechanics  in_NB  books:noted 
november 2012 by cshalizi
[1210.6187] Kriging-based sequential design strategies using fast cross-validation techniques with extensions to multi-fidelity computer codes
"Kriging-based surrogate models have become very popular during the last decades to approximate a computer code output from few simulations. In practical applications, it is very common to sequentially add new simulations to obtain more accurate approximations. We propose in this paper a method of kriging-based sequential design which combines both the error evaluation providing by the kriging model and the observed errors of a Leave-One-Out cross-validation procedure. This method is proposed in two versions, the first one selects points one at-a-time. The second one allows us to parallelize the simulations and to add several design points at-a-time. Then, we extend these strategies to multi-fidelity co-kriging models which allow us to surrogate a complex code using fast approximations of it. The main advantage of these extensions is that it not only provides the new locations where to perform simulations but also which versions of code have to be simulated (between the complex one or one of its fast approximations). A real multi-fidelity application is used to illustrate the efficiency of the proposed approaches. In this example, the accurate code is a two-dimensional finite element model and the less accurate one is a one-dimensional approximation of the system."
in_NB  smoothing  statistics  simulation  cross-validation  spatial_statistics 
october 2012 by cshalizi
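A stripped-down version of the one-at-a-time strategy: refit a kriging (Gaussian process) emulator after each run and place the next simulation where the predictive variance is largest. This uses only the variance criterion, not the paper's combined variance-plus-LOO score, and all functions and settings are illustrative:

```python
import numpy as np

def rbf(a, b, ls=0.3):
    """Squared-exponential kernel for 1-D inputs."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gp_posterior(X, y, Xs, ls=0.3, jitter=1e-8):
    """Posterior mean and variance of a zero-mean GP (simple kriging)."""
    K = rbf(X, X, ls) + jitter * np.eye(len(X))
    Ks = rbf(Xs, X, ls)
    mu = Ks @ np.linalg.solve(K, y)
    v = np.linalg.solve(K, Ks.T)
    var = 1.0 - np.sum(Ks * v.T, axis=1)   # prior variance is rbf(x, x) = 1
    return mu, np.maximum(var, 0.0)

def sequential_design(f, X0, cand, steps=5):
    """Greedy one-at-a-time design: run the simulator f at the candidate
    point where the kriging predictive variance is currently largest."""
    X = list(np.asarray(X0, dtype=float))
    y = [f(x) for x in X]
    for _ in range(steps):
        _, var = gp_posterior(np.array(X), np.array(y), cand)
        xn = float(cand[int(np.argmax(var))])
        X.append(xn)
        y.append(f(xn))
    return np.array(X), np.array(y)
```

On a toy "simulator" such as sin(6x) on [0, 1], six greedy additions to a two-point initial design cut the emulator's error substantially; the parallel batch version of the paper would instead pick several high-variance points per round.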
Assessing the Reliability of Complex Models: Mathematical and Statistical Foundations of Verification, Validation, and Uncertainty Quantification
"Advances in computing hardware and algorithms have dramatically improved the ability to simulate complex processes computationally. Today's simulation capabilities offer the prospect of addressing questions that in the past could be addressed only by resource-intensive experimentation, if at all. Assessing the Reliability of Complex Models recognizes the ubiquity of uncertainty in computational estimates of reality and the necessity for its quantification.
"As computational science and engineering have matured, the process of quantifying or bounding uncertainties in a computational estimate of a physical quality of interest has evolved into a small set of interdependent tasks: verification, validation, and uncertainty of quantification (VVUQ). In recognition of the increasing importance of computational simulation and the increasing need to assess uncertainties in computational results, the National Research Council was asked to study the mathematical foundations of VVUQ and to recommend steps that will ultimately lead to improved processes.
"Assessing the Reliability of Complex Models discusses changes in education of professionals and dissemination of information that should enhance the ability of future VVUQ practitioners to improve and properly apply VVUQ methodologies to difficult problems, enhance the ability of VVUQ customers to understand VVUQ results and use them to make informed decisions, and enhance the ability of all VVUQ stakeholders to communicate with each other. This report is an essential resource for all decision and policy makers in the field, students, stakeholders, UQ experts, and VVUQ educators and practitioners."
to:NB  books:noted  statistics  modeling  simulation 
july 2012 by cshalizi
[1206.6452] Smoothness and Structure Learning by Proxy
"As data sets grow in size, the ability of learning methods to find structure in them is increasingly hampered by the time needed to search the large spaces of possibilities and generate a score for each that takes all of the observed data into account. For instance, Bayesian networks, the model chosen in this paper, have a super-exponentially large search space for a fixed number of variables. One possible method to alleviate this problem is to use a proxy, such as a Gaussian Process regressor, in place of the true scoring function, training it on a selection of sampled networks. We prove here that the use of such a proxy is well-founded, as we can bound the smoothness of a commonly-used scoring function for Bayesian network structure learning. We show here that, compared to an identical search strategy using the network?s exact scores, our proxy-based search is able to get equivalent or better scores on a number of data sets in a fraction of the time."
to:NB  graphical_models  simulation  machine_learning 
july 2012 by cshalizi
[0802.0021] Time series analysis via mechanistic models
"The purpose of time series analysis via mechanistic models is to reconcile the known or hypothesized structure of a dynamical system with observations collected over time. We develop a framework for constructing nonlinear mechanistic models and carrying out inference. Our framework permits the consideration of implicit dynamic models, meaning statistical models for stochastic dynamical systems which are specified by a simulation algorithm to generate sample paths. Inference procedures that operate on implicit models are said to have the plug-and-play property. Our work builds on recently developed plug-and-play inference methodology for partially observed Markov models. We introduce a class of implicitly specified Markov chains with stochastic transition rates, and we demonstrate its applicability to open problems in statistical inference for biological systems. As one example, these models are shown to give a fresh perspective on measles transmission dynamics. As a second example, we present a mechanistic analysis of cholera incidence data, involving interaction between two competing strains of the pathogen Vibrio cholera."
in_NB  to_read  time_series  statistics  statistical_inference_for_stochastic_processes  indirect_inference  simulation 
june 2012 by cshalizi
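The "plug-and-play" property — inference that needs only forward simulation of the latent dynamics, never transition densities — is exactly what a bootstrap particle filter provides (cf. the pomp R package from the same group). A minimal sketch on a linear-Gaussian toy model; the model, noise scales, and function names are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)

def bootstrap_filter(y, n_part, sim_step, obs_logpdf, init):
    """Plug-and-play bootstrap particle filter: the state model enters
    only through the simulator sim_step, never through its densities.
    Returns filtered posterior means and the log-likelihood estimate."""
    x = init(n_part)
    ll = 0.0
    means = []
    for yt in y:
        x = sim_step(x)                      # propagate by simulation
        logw = obs_logpdf(yt, x)             # weight by measurement density
        m = logw.max()
        w = np.exp(logw - m)
        ll += m + np.log(w.mean())           # log-likelihood increment
        w /= w.sum()
        means.append(float(np.sum(w * x)))
        x = x[rng.choice(n_part, size=n_part, p=w)]   # multinomial resampling
    return np.array(means), ll
```

On an AR(1) state with noisy observations, the filtered means track the latent path closely; the log-likelihood estimate is what plug-and-play methods such as iterated filtering then optimise over parameters.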
Introduction to Computable General Equilibrium Models - Academic and Professional Books - Cambridge University Press
"Computable general equilibrium (CGE) models are widely used by governmental organizations and academic institutions to analyze the economy-wide effects of events such as climate change, tax policies, and immigration. This book provides a practical, how-to guide to CGE models suitable for use at the undergraduate college level. Its introductory level distinguishes it from other available books and articles on CGE models. The book provides intuitive and graphical explanations of the economic theory that underlies a CGE model and includes many examples and hands-on modeling exercises. It may be used in courses on economics principles, microeconomics, macroeconomics, public finance, environmental economics, and international trade and finance, because it shows students the role of theory in a realistic model of an economy. The book is also suitable for courses on general equilibrium models and research methods, and for professionals interested in learning how to use CGE models."

- The mathematical and conceptual level here is shockingly low.
economics  simulation  re:your_favorite_dsge_sucks  re:computational_lens  have_read 
june 2012 by cshalizi
Haaland , Qian : Accurate emulators for large-scale computer experiments
"Large-scale computer experiments are becoming increasingly important in science. A multi-step procedure is introduced to statisticians for modeling such experiments, which builds an accurate interpolator in multiple steps. In practice, the procedure shows substantial improvements in overall accuracy, but its theoretical properties are not well established. We introduce the terms nominal and numeric error and decompose the overall error of an interpolator into nominal and numeric portions. Bounds on the numeric and nominal error are developed to show theoretically that substantial gains in overall accuracy can be attained with the multi-step approach."
to:NB  statistics  simulation 
january 2012 by cshalizi
[1111.6233] Additive Covariance Kernels for High-Dimensional Gaussian Process Modeling
"Gaussian process models -also called Kriging models- are often used as mathematical approximations of expensive experiments. However, the number of observation required for building an emulator becomes unrealistic when using classical covariance kernels when the dimension of input increases. In oder to get round the curse of dimensionality, a popular approach is to consider simplified models such as additive models. The ambition of the present work is to give an insight into covariance kernels that are well suited for building additive Kriging models and to describe some properties of the resulting models."
to:NB  statistics  regression  simulation 
december 2011 by cshalizi
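The additive construction itself is one line: a sum of one-dimensional kernels, one per input coordinate. A minimal sketch of kriging with such a kernel on a toy additive function — the dimensions, length-scale, nugget, and test function are illustrative choices, not the paper's experiments:

```python
import numpy as np

rng = np.random.default_rng(4)

def rbf_1d(a, b, ls=1.0):
    """One-dimensional squared-exponential kernel."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def additive_kernel(X, Z, ls=1.0):
    """Additive covariance: k(x, z) = sum_d k_d(x_d, z_d)."""
    return sum(rbf_1d(X[:, d], Z[:, d], ls) for d in range(X.shape[1]))

def f(X):
    """Toy additive target in 8 inputs (only 3 of them active)."""
    return np.sin(X[:, 0]) + X[:, 1] ** 2 + np.cos(X[:, 2])

d, n, m = 8, 60, 200
Xtr = rng.uniform(-2.0, 2.0, size=(n, d))
Xte = rng.uniform(-2.0, 2.0, size=(m, d))
ytr = f(Xtr)

ybar = ytr.mean()
K = additive_kernel(Xtr, Xtr) + 1e-3 * np.eye(n)   # small nugget for stability
mu = ybar + additive_kernel(Xte, Xtr) @ np.linalg.solve(K, ytr - ybar)
rmse = np.sqrt(np.mean((mu - f(Xte)) ** 2))
```

With only 60 design points in 8 dimensions, the additive emulator predicts far better than the constant baseline; an isotropic 8-dimensional kernel would typically need many more runs here, which is the paper's motivation.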
[1110.3860] Contending Parties: A Logistic Choice Analysis of Inter- and Intra-group Blog Citation Dynamics in the 2004 US Presidential Election
"The 2004 US Presidential Election cycle marked the debut of Internet-based media such as blogs and social networking websites as institutionally recognized features of the American political landscape. Using a longitudinal sample of all DNC/RNC-designated blog-citation networks we are able to test the influence of various strategic, institutional, and balance-theoretic mechanisms and exogenous factors such as seasonality and political events on the propensity of blogs to cite one another over time. Capitalizing on the temporal resolution of our data, we utilize an autoregressive network regression framework to carry out inference for a logistic choice process. Using a combination of deviance-based model selection criteria and simulation-based model adequacy tests, we identify the combination of processes that best characterizes the choice behavior of the contending blogs."
to:NB  network_data_analysis  blogs  us_politics  model_selection  simulation 
october 2011 by cshalizi
[1107.1680] Developments in perfect simulation for Gibbs measures
"This paper deals with the problem of perfect sampling from a Gibbs measure with infinite range interactions. We present some sufficient conditions for the extinction of processes which are like supermartingales when large values are taken. This result has profound consequences on perfect simulation, showing that local modifications on the interactions of a model do not affect the simulability. We also pose the question to optimize over a sequence of sets and we completely solve the question in the case of finite range interactions."
simulation  gibbs_distributions  random_fields  stochastic_processes  to:NB 
july 2011 by cshalizi
[1101.0833] Dynamical systems, simulation, abstract computation
"We survey an area of recent development, relating dynamics to theoretical computer science. We discuss the theoretical limits of simulation and computation of interesting quantities in dynamical systems. We will focus on central objects of the theory of dynamics, as invariant measures and invariant sets, showing that even if they can be computed with arbitrary precision in many interesting cases, there exists some cases in which they can not. We also explain how it is possible to compute the speed of convergence of ergodic averages (when the system is known exactly) and how this entails the computation of arbitrarily good approximations of points of the space having typical statistical behaviour (a sort of constructive version of the pointwise ergodic theorem)."
dynamical_systems  theoretical_computer_science  computability  algorithmic_information_theory  ergodic_theory  simulation  to_read  re:almost_none  in_NB 
january 2011 by cshalizi
Philosophy and Simulation: The Emergence of Synthetic Reason - Continuum
I have liked DeLanda's recent books, though I still find _War in the Age of Intelligent Machines_ bad, and don't get what he sees in Deleuze.  I look forward to this one with interest.
books:noted  simulation  complexity  cellular_automata  agent-based_models  philosophy_of_science  delanda.manuel  post-structuralism  books:owned 
january 2011 by cshalizi