**cshalizi + prediction**
326

Evaluating Probabilistic Forecasts with scoringRules | Jordan | Journal of Statistical Software

3 days ago by cshalizi

"Probabilistic forecasts in the form of probability distributions over future events have become popular in several fields including meteorology, hydrology, economics, and demography. In typical applications, many alternative statistical models and data sources can be used to produce probabilistic forecasts. Hence, evaluating and selecting among competing methods is an important task. The scoringRules package for R provides functionality for comparative evaluation of probabilistic models based on proper scoring rules, covering a wide range of situations in applied work. This paper discusses implementation and usage details, presents case studies from meteorology and economics, and points to the relevant background literature."

to:NB
prediction
statistics
to_teach:undergrad-ADA
to_teach:data-mining
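The core quantity here, a proper scoring rule evaluated against a sample from the forecast distribution, fits in a few lines. A pure-Python sketch of the sample-based CRPS estimator (scoringRules itself is an R package; the forecast draws below are made up for illustration):

```python
import random

def crps_sample(draws, obs):
    """Sample-based estimator of the Continuous Ranked Probability Score:
    CRPS(F, y) ~ (1/m) sum_i |x_i - y| - (1/2m^2) sum_ij |x_i - x_j|,
    where the x's are draws from the forecast distribution F.
    Lower is better; the CRPS is a proper scoring rule."""
    m = len(draws)
    term1 = sum(abs(x - obs) for x in draws) / m
    term2 = sum(abs(a - b) for a in draws for b in draws) / (m * m)
    return term1 - 0.5 * term2

# Toy comparison: a forecast centred on the truth should score better
# than a badly biased one.
random.seed(1)
obs = 0.0
good = [random.gauss(0.0, 1.0) for _ in range(400)]  # centred on the truth
bad = [random.gauss(3.0, 1.0) for _ in range(400)]   # biased by +3
print(crps_sample(good, obs) < crps_sample(bad, obs))  # True
```

Comparative evaluation of competing probabilistic models then amounts to comparing average scores over many forecast cases.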

[1908.07204] Forecasting observables with particle filters: Any filter will do!

3 days ago by cshalizi

"We investigate the impact of filter choice on forecast accuracy in state space models. The filters are used both to estimate the posterior distribution of the parameters, via a particle marginal Metropolis-Hastings (PMMH) algorithm, and to produce draws from the filtered distribution of the final state. Multiple filters are entertained, including two new data-driven methods. Simulation exercises are used to document the performance of each PMMH algorithm, in terms of computation time and the efficiency of the chain. We then produce the forecast distributions for the one-step-ahead value of the observed variable, using a fixed number of particles and Markov chain draws. Despite distinct differences in efficiency, the filters yield virtually identical forecasting accuracy, with this result holding under both correct and incorrect specification of the model. This invariance of forecast performance to the specification of the filter also characterizes an empirical analysis of S&P500 daily returns."

to:NB
time_series
particle_filters
prediction
state_estimation
state-space_models
to_teach:data_over_space_and_time
statistics
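The plainest of the filters such a study entertains is the bootstrap particle filter. A minimal sketch for a toy local-level state space model, with the parameters held fixed rather than estimated by PMMH as in the paper (model, parameter values, and data are invented for the example):

```python
import math
import random

def bootstrap_filter_forecasts(ys, n_particles=1000, q=0.1, r=0.5, seed=0):
    """One-step-ahead forecast means from a bootstrap particle filter for
    a local-level model: x_t = x_{t-1} + N(0, q^2), y_t = x_t + N(0, r^2).
    Since E[y_{t+1} | y_1:t] = E[x_t | y_1:t] here, the filtered state
    mean is also the forecast of the next observation."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    out = []
    for y in ys:
        particles = [x + rng.gauss(0.0, q) for x in particles]        # propagate
        w = [math.exp(-0.5 * ((y - x) / r) ** 2) for x in particles]  # weight
        particles = rng.choices(particles, weights=w, k=n_particles)  # resample
        out.append(sum(particles) / n_particles)                      # forecast
    return out

obs = [0.2, 0.4, 0.3, 0.8, 1.0, 0.9]
fc = bootstrap_filter_forecasts(obs)
print(len(fc))  # 6 -- one forecast per observation
```

The paper's finding is that swapping this filter for more efficient ones changes computation time and chain efficiency, but leaves forecasts like `fc` essentially unchanged.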

The Incompatible Incentives of Private Sector AI by Tom Slee :: SSRN

4 days ago by cshalizi

"Algorithms that sort people into categories are plagued by incompatible incentives. While more accurate algorithms may address problems of statistical bias and unfairness, they cannot solve the ethical challenges that arise from incompatible incentives.

"Subjects of algorithmic decisions seek to optimize their outcomes, but such efforts may degrade the accuracy of the algorithm. To maintain their accuracy, algorithms must be accompanied by supplementary rules: “guardrails” that dictate the limits of acceptable behaviour by subjects. Algorithm owners are drawn into taking on the tasks of governance, managing and validating the behaviour of those who interact with their systems.

"The governance role offers temptations to indulge in regulatory arbitrage. If governance is left to algorithm owners, it may lead to arbitrary and restrictive controls on individual behaviour. The goal of algorithmic governance by automated decision systems, social media recommender systems, and rating systems is a mirage, retreating into the distance whenever we seem to approach it."

to:NB
mechanism_design
prediction
data_mining
slee.tom
to_read
to_teach:data-mining

[1811.06407] Neural Predictive Belief Representations

4 days ago by cshalizi

"Unsupervised representation learning has succeeded with excellent results in many applications. It is an especially powerful tool to learn a good representation of environments with partial or noisy observations. In partially observable domains it is important for the representation to encode a belief state, a sufficient statistic of the observations seen so far. In this paper, we investigate whether it is possible to learn such a belief representation using modern neural architectures. Specifically, we focus on one-step frame prediction and two variants of contrastive predictive coding (CPC) as the objective functions to learn the representations. To evaluate these learned representations, we test how well they can predict various pieces of information about the underlying state of the environment, e.g., position of the agent in a 3D maze. We show that all three methods are able to learn belief representations of the environment; they encode not only the state information but also its uncertainty, a crucial aspect of belief states. We also find that for CPC multi-step predictions and action-conditioning are critical for accurate belief representations in visually complex environments. The ability of neural representations to capture the belief information has the potential to spur new advances for learning and planning in partially observable domains, where leveraging uncertainty is essential for optimal decision making."

to:NB
prediction
predictive_representations
inference_to_latent_objects
neural_networks
to_read

[1908.06729] Autoregressive-Model-Based Methods for Online Time Series Prediction with Missing Values: an Experimental Evaluation

4 days ago by cshalizi

"Time series prediction with missing values is an important problem of time series analysis, since complete data are usually hard to obtain in many real-world applications. To model the generation of time series, the autoregressive (AR) model is a basic and widely used one, which assumes that each observation in the time series is a noisy linear combination of some previous observations along with a constant shift. To tackle the problem of prediction with missing values, a number of methods have been proposed based on various data models. How these methods perform over different types of time series, with different levels of data missing, remains to be investigated. In this paper, we focus on online methods for AR-model-based time series prediction with missing values. We adapt five mainstream methods to fit this scenario, and discuss each of them in detail: their core ideas for estimating the AR coefficients, and their different strategies for dealing with missing values. We also present algorithmic implementations for better understanding. To comprehensively evaluate and compare these methods, we conduct experiments with various configurations of the relevant parameters, over both synthetic and real data. From the experimental results we derive several noteworthy conclusions, and show that imputation is a simple but reliable strategy for handling missing values in online prediction tasks."

to:NB
time_series
prediction
missing_data
statistics
to_teach:data_over_space_and_time
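The paper's bottom line, that simple imputation is a reliable way to handle missing values in online AR prediction, can be illustrated with a minimal sketch: an AR(1) fit by least squares, with missing observations filled in by the running mean (the series, coefficient, and missingness rate below are made up for the example):

```python
import random

def fit_ar1_with_imputation(ys):
    """Least-squares fit of y_t = a * y_{t-1} + c, imputing missing
    values (None) with the running mean of the values observed so far."""
    filled, seen = [], []
    for y in ys:
        if y is None:
            y = sum(seen) / len(seen) if seen else 0.0  # impute
        else:
            seen.append(y)
        filled.append(y)
    x, z = filled[:-1], filled[1:]
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    a = (sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
         / sum((xi - mx) ** 2 for xi in x))
    return a, mz - a * mx

# Toy data: AR(1) with coefficient 0.8, roughly 10% of values missing.
random.seed(3)
ys, y = [], 0.0
for _ in range(400):
    y = 0.8 * y + random.gauss(0.0, 1.0)
    ys.append(None if random.random() < 0.1 else y)
a, c = fit_ar1_with_imputation(ys)
print(f"estimated AR coefficient: {a:.2f}")
```

Imputation attenuates the estimated coefficient somewhat (the imputed points carry no serial dependence), which is the kind of trade-off the paper's experiments quantify across methods.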

[1908.06437] Block Nearest Neighbor Gaussian processes for large datasets

4 days ago by cshalizi

"This work develops a valid spatial block-Nearest Neighbor Gaussian process (block-NNGP) for estimation and prediction of location-referenced large spatial datasets. The key idea behind our approach is to subdivide the spatial domain into several blocks which are dependent under some constraints. The cross-blocks capture the large-scale spatial variation, while each block captures the small-scale dependence. The block-NNGP is embedded as a sparsity-inducing prior within a hierarchical modeling framework. Markov chain Monte Carlo (MCMC) algorithms are executed without storing or decomposing large matrices, while the sparse block precision matrix is efficiently computed through parallel computing. We also consider alternate MCMC algorithms through composite sampling for faster computing time, and more reproducible Bayesian inference. The performance of the block-NNGP is illustrated using simulation studies and applications with massive real data, for locations in the order of 10^4."

to:NB
spatial_statistics
prediction
computational_statistics
statistics
to_teach:data_over_space_and_time

[1908.06936] ExaGeoStatR: A Package for Large-Scale Geostatistics in R

4 days ago by cshalizi

"Parallel computing in Gaussian process calculation becomes a necessity for avoiding computational and memory restrictions associated with Geostatistics applications. The evaluation of the Gaussian log-likelihood function requires O(n^2) storage and O(n^3) operations where n is the number of geographical locations. In this paper, we present ExaGeoStatR, a package for large-scale Geostatistics in R that supports parallel computation of the maximum likelihood function on shared memory, GPU, and distributed systems. The parallelization depends on breaking down the numerical linear algebra operations into a set of tasks and rendering them for a task-based programming model. ExaGeoStatR supports several maximum likelihood computation variants such as exact, Diagonal Super Tile (DST), and Tile Low-Rank (TLR) approximation, besides providing a tool to generate large-scale synthetic datasets which can be used to test and compare different approximation methods. The package can be used directly through the R environment without any C, CUDA, or MPI knowledge. Here, we demonstrate the ExaGeoStatR package by illustrating its implementation details, analyzing its performance on various parallel architectures, and assessing its accuracy using both synthetic datasets and a sea surface temperature dataset. The performance evaluation involves spatial datasets with up to 250K observations."

to:NB
spatial_statistics
prediction
computational_statistics
R
statistics
to_teach:data_over_space_and_time

[1611.04460] Predictive, finite-sample model choice for time series under stationarity and non-stationarity

8 days ago by cshalizi

"In statistical research there usually exists a choice between structurally simpler or more complex models. We argue that, even if a more complex, locally stationary time series model were true, then a simple, stationary time series model may be advantageous to work with under parameter uncertainty. We present a new model choice methodology, where one of two competing approaches is chosen based on its empirical, finite-sample performance with respect to prediction, in a manner that ensures interpretability. A rigorous, theoretical analysis of the procedure is provided. As an important side result we prove, for possibly diverging model order, that the localised Yule-Walker estimator is strongly, uniformly consistent under local stationarity. An R package, forecastSNSTS, is provided and used to apply the methodology to financial and meteorological data in empirical examples. We further provide an extensive simulation study and discuss when it is preferable to base forecasts on the more volatile time-varying estimates and when it is advantageous to forecast as if the data were from a stationary process, even though they might not be."

to:NB
time_series
prediction
model_selection
statistics
non-stationarity

[1908.02718] A Characterization of Mean Squared Error for Estimator with Bagging

16 days ago by cshalizi

"Bagging can significantly improve the generalization performance of unstable machine learning algorithms such as trees or neural networks. Though bagging is now widely used in practice and many empirical studies have explored its behavior, we still know little about the theoretical properties of bagged predictions. In this paper, we theoretically investigate how the bagging method can reduce the Mean Squared Error (MSE) when applied on a statistical estimator. First, we prove that for any estimator, increasing the number of bagged estimators N in the average can only reduce the MSE. This intuitive result, observed empirically and discussed in the literature, has not yet been rigorously proved. Second, we focus on the standard estimator of variance called unbiased sample variance and we develop an exact analytical expression of the MSE for this estimator with bagging.

"This allows us to rigorously discuss the number of iterations N and the batch size m of the bagging method. From this expression, we state that only if the kurtosis of the distribution is greater than 3/2, the MSE of the variance estimator can be reduced with bagging. This result is important because it demonstrates that for distributions with low kurtosis, bagging can only deteriorate the performance of a statistical prediction. Finally, we propose a novel general-purpose algorithm to estimate with high precision the variance of a sample."

to:NB
ensemble_methods
prediction
regression
statistics
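The paper's first result, that averaging more bagged estimators can only reduce MSE, is easy to check by Monte Carlo for the sample-variance example they study. A toy simulation (not the paper's analytical expression; sample size, bag counts, and rep counts are my choices):

```python
import random
import statistics

def bagged_variance(sample, n_bags, rng):
    """Average the unbiased sample variance over n_bags bootstrap
    resamples of `sample` (batch size m = len(sample))."""
    n = len(sample)
    total = 0.0
    for _ in range(n_bags):
        boot = [rng.choice(sample) for _ in range(n)]
        total += statistics.variance(boot)
    return total / n_bags

def mc_mse(n_bags, reps=1000, n=20, seed=0):
    """Monte Carlo MSE of the bagged variance estimator on N(0,1) samples.
    The true variance is 1, and the normal kurtosis is 3 > 3/2, so by the
    paper's criterion bagging should help here."""
    rng = random.Random(seed)
    return sum((bagged_variance([rng.gauss(0.0, 1.0) for _ in range(n)],
                                n_bags, rng) - 1.0) ** 2
               for _ in range(reps)) / reps

print(mc_mse(1) > mc_mse(25))  # True: more bagged estimators, lower MSE
```

Averaging over more resamples removes the extra Monte Carlo variance that bootstrap resampling injects, which is exactly the monotone-in-N effect the paper proves.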

[1908.02614] The power of dynamic social networks to predict individuals' mental health

16 days ago by cshalizi

"Precision medicine has received attention both in and outside the clinic. We focus on the latter, by exploiting the relationship between individuals' social interactions and their mental health to develop a predictive model of one's likelihood to be depressed or anxious from rich dynamic social network data. To our knowledge, we are the first to do this. Existing studies differ from our work in at least one aspect: they do not model social interaction data as a network; they do so but analyze static network data; they examine "correlation" between social networks and health but without developing a predictive model; or they study other individual traits but not mental health. In a systematic and comprehensive evaluation, we show that our predictive model that uses dynamic social network data is superior to its static network as well as non-network equivalents when run on the same data."

to:NB
social_networks
psychiatry
sociology
prediction
network_data_analysis
lizardo.omar
to_read

Dimension reduction for the conditional mean and variance functions in time series - Park - Scandinavian Journal of Statistics - Wiley Online Library

17 days ago by cshalizi

"This paper deals with the nonparametric estimation of the mean and variance functions of univariate time series data. We propose a nonparametric dimension reduction technique for both mean and variance functions of time series. This method does not require any model specification and instead we seek directions in both the mean and variance functions such that the conditional distribution of the current observation given the vector of past observations is the same as that of the current observation given a few linear combinations of the past observations without loss of inferential information. The directions of the mean and variance functions are estimated by maximizing the Kullback‐Leibler distance function. The consistency of the proposed estimators is established. A computational procedure is introduced to detect lags of the conditional mean and variance functions in practice. Numerical examples and simulation studies are performed to illustrate and evaluate the performance of the proposed estimators."

to:NB
prediction
time_series
dimension_reduction
statistics
information_theory

[1810.02909] On the Art and Science of Machine Learning Explanations

19 days ago by cshalizi

"This text discusses several popular explanatory methods that go beyond the error measurements and plots traditionally used to assess machine learning models. Some of the explanatory methods are accepted tools of the trade while others are rigorously derived and backed by long-standing theory. The methods, decision tree surrogate models, individual conditional expectation (ICE) plots, local interpretable model-agnostic explanations (LIME), partial dependence plots, and Shapley explanations, vary in terms of scope, fidelity, and suitable application domain. Along with descriptions of these methods, this text presents real-world usage recommendations supported by a use case and public, in-depth software examples for reproducibility."

to:NB
data_mining
prediction
explanation
to_teach:data-mining

[1907.08742] Estimating the Algorithmic Variance of Randomized Ensembles via the Bootstrap

4 weeks ago by cshalizi

"Although the methods of bagging and random forests are some of the most widely used prediction methods, relatively little is known about their algorithmic convergence. In particular, there are not many theoretical guarantees for deciding when an ensemble is "large enough" --- so that its accuracy is close to that of an ideal infinite ensemble. Due to the fact that bagging and random forests are randomized algorithms, the choice of ensemble size is closely related to the notion of "algorithmic variance" (i.e. the variance of prediction error due only to the training algorithm). In the present work, we propose a bootstrap method to estimate this variance for bagging, random forests, and related methods in the context of classification. To be specific, suppose the training dataset is fixed, and let the random variable Err_t denote the prediction error of a randomized ensemble of size t. Working under a "first-order model" for randomized ensembles, we prove that the centered law of Err_t can be consistently approximated via the proposed method as t→∞. Meanwhile, the computational cost of the method is quite modest, by virtue of an extrapolation technique. As a consequence, the method offers a practical guideline for deciding when the algorithmic fluctuations of Err_t are negligible."

to:NB
ensemble_methods
computational_statistics
statistics
prediction
to_teach:data-mining

[1907.09013] Conscientious Classification: A Data Scientist's Guide to Discrimination-Aware Classification

4 weeks ago by cshalizi

"Recent research has helped to cultivate growing awareness that machine learning systems fueled by big data can create or exacerbate troubling disparities in society. Much of this research comes from outside of the practicing data science community, leaving its members with little concrete guidance to proactively address these concerns. This article introduces issues of discrimination to the data science community on its own terms. In it, we tour the familiar data mining process while providing a taxonomy of common practices that have the potential to produce unintended discrimination. We also survey how discrimination is commonly measured, and suggest how familiar development processes can be augmented to mitigate systems' discriminatory potential. We advocate that data scientists should be intentional about modeling and reducing discriminatory outcomes. Without doing so, their efforts will result in perpetuating any systemic discrimination that may exist, but under a misleading veil of data-driven objectivity."

to:NB
classifiers
algorithmic_fairness
prediction
to_teach:data-mining
o'neil.cathy

[1907.08679] Recommender Systems with Heterogeneous Side Information

4 weeks ago by cshalizi

"In modern recommender systems, both users and items are associated with rich side information, which can help understand users and items. Such information is typically heterogeneous and can be roughly categorized into flat and hierarchical side information. While side information has been proved to be valuable, the majority of existing systems have exploited either only flat side information or only hierarchical side information due to the challenges brought by the heterogeneity. In this paper, we investigate the problem of exploiting heterogeneous side information for recommendations. Specifically, we propose a novel framework that jointly captures flat and hierarchical side information with mathematical coherence. We demonstrate the effectiveness of the proposed framework via extensive experiments on various real-world datasets. Empirical results show that our approach achieves a significant performance gain over the state-of-the-art methods."

to:NB
recommender_systems
prediction
to_teach:data-mining

[1907.01552] Forecasting high-dimensional dynamics exploiting suboptimal embeddings

5 weeks ago by cshalizi

"Delay embedding---a method for reconstructing dynamical systems by delay coordinates---is widely used to forecast nonlinear time series as a model-free approach. When multivariate time series are observed, several existing frameworks can be applied to yield a single forecast combining multiple forecasts derived from various embeddings. However, the performance of these frameworks is not always satisfactory because they randomly select embeddings or use brute force and do not consider the diversity of the embeddings to combine. Herein, we develop a forecasting framework that overcomes these existing problems. The framework exploits various "suboptimal embeddings" obtained by minimizing the in-sample error via combinatorial optimization. The framework achieves the best results among existing frameworks for sample toy datasets and a real-world flood dataset. We show that the framework is applicable to a wide range of data lengths and dimensions. Therefore, the framework can be applied to various fields such as neuroscience, ecology, finance, fluid dynamics, weather, and disaster prevention."

to:NB
dynamical_systems
time_series
prediction
geometry_from_a_time_series
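The model-free forecasting step underneath all of this, nearest neighbours in a single delay embedding, is simple to sketch. (The sine-wave example and the embedding parameters below are my own, not the paper's; the paper is about choosing and combining many such embeddings.)

```python
import math

def delay_embed(series, dim, lag):
    """Time indices paired with delay-coordinate vectors
    (x_t, x_{t-lag}, ..., x_{t-(dim-1)*lag})."""
    start = (dim - 1) * lag
    return [(t, tuple(series[t - k * lag] for k in range(dim)))
            for t in range(start, len(series))]

def nn_forecast(series, dim=3, lag=1, k=3):
    """One-step forecast: average the successors of the k past delay
    vectors nearest (in Euclidean distance) to the current one."""
    points = delay_embed(series, dim, lag)
    _, now = points[-1]
    past = [(t, v) for t, v in points[:-1] if t + 1 < len(series)]
    past.sort(key=lambda tv: math.dist(tv[1], now))
    return sum(series[t + 1] for t, _ in past[:k]) / k

# Noise-free sine wave: the geometric forecast lands near the true next value.
xs = [math.sin(0.3 * t) for t in range(500)]
pred = nn_forecast(xs)
print(abs(pred - math.sin(0.3 * 500)) < 0.1)  # True
```

The framework in the paper then searches over "suboptimal" choices of `dim` and `lag` (by in-sample error) and combines the resulting forecasts, rather than trusting any single embedding.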

[1906.08832] A Flexible Pipeline for Prediction of Tropical Cyclone Paths

8 weeks ago by cshalizi

"Hurricanes and, more generally, tropical cyclones (TCs) are rare, complex natural phenomena of both scientific and public interest. The importance of understanding TCs in a changing climate has increased as recent TCs have had devastating impacts on human lives and communities. Moreover, good prediction and understanding about the complex nature of TCs can mitigate some of these human and property losses. Though TCs have been studied from many different angles, more work is needed from a statistical approach of providing prediction regions. The current state-of-the-art in TC prediction bands comes from the National Hurricane Center of the National Oceanic and Atmospheric Administration (NOAA), whose proprietary model provides "cones of uncertainty" for TCs through an analysis of historical forecast errors.

"The contribution of this paper is twofold. We introduce a new pipeline that encourages transparent and adaptable prediction band development by streamlining cyclone track simulation and prediction band generation. We also provide updates to existing models and novel statistical methodologies in both areas of the pipeline, respectively."

to:NB
cyclones
prediction
statistics
spatio-temporal_statistics
schafer.chad
dunn.robin
kith_and_kin
to_teach:data_over_space_and_time

[1906.05473] Selective prediction-set models with coverage guarantees

10 weeks ago by cshalizi

"Though black-box predictors are state-of-the-art for many complex tasks, they often fail to properly quantify predictive uncertainty and may provide inappropriate predictions for unfamiliar data. Instead, we can learn more reliable models by letting them either output a prediction set or abstain when the uncertainty is high. We propose training these selective prediction-set models using an uncertainty-aware loss minimization framework, which unifies ideas from decision theory and robust maximum likelihood. Moreover, since black-box methods are not guaranteed to output well-calibrated prediction sets, we show how to calculate point estimates and confidence intervals for the true coverage of any selective prediction-set model, as well as a uniform mixture of K set models obtained from K-fold sample-splitting. When applied to predicting in-hospital mortality and length-of-stay for ICU patients, our model outperforms existing approaches on both in-sample and out-of-sample age groups, and our recalibration method provides accurate inference for prediction set coverage."

to:NB
prediction
statistics

[1906.04711] ProPublica's COMPAS Data Revisited

10 weeks ago by cshalizi

"In this paper I re-examine the COMPAS recidivism score and criminal history data collected by ProPublica in 2016, which has fueled intense debate and research in the nascent field of `algorithmic fairness' or `fair machine learning' over the past three years. ProPublica's COMPAS data is used in an ever-increasing number of studies to test various definitions and methodologies of algorithmic fairness. This paper takes a closer look at the actual datasets put together by ProPublica. In particular, I examine the distribution of defendants across COMPAS screening dates and find that ProPublica made an important data processing mistake when it created some of the key datasets most often used by other researchers, namely the datasets built to study the likelihood of recidivism within two years of the original COMPAS screening date. As I show in this paper, ProPublica made a mistake implementing the two-year sample cutoff rule for recidivists in such datasets (whereas it implemented an appropriate two-year sample cutoff rule for non-recidivists). As a result, ProPublica incorrectly kept a disproportionate share of recidivists. This data processing mistake leads to biased two-year recidivism datasets, with artificially high recidivism rates. This also affects the positive and negative predictive values. On the other hand, this data processing mistake does not impact some of the key statistical measures highlighted by ProPublica and other researchers, such as the false positive and false negative rates, nor the overall accuracy."

to:NB
data_sets
crime
prediction
to_teach:data-mining

[1905.12262] Flexible Mining of Prefix Sequences from Time-Series Traces

12 weeks ago by cshalizi

"Mining temporal assertions from time-series data using information theory to filter real properties from incidental ones is a practically significant challenge. The problem is complex for continuous or hybrid systems because the degrees of influence on a consequent from a timed-sequence of predicates (called its prefix sequence), varies continuously over dense time intervals. We propose a parameterized method that uses interval arithmetic for flexibly learning prefix sequences having influence on a defined consequent over various time scales and predicates over system variables."

to:NB
prediction
time_series
variable-length_markov_chains
re:AoS_project

[1905.11744] Evaluating time series forecasting models: An empirical study on performance estimation methods

12 weeks ago by cshalizi

"Performance estimation aims at estimating the loss that a predictive model will incur on unseen data. These procedures are part of the pipeline in every machine learning project and are used for assessing the overall generalisation ability of predictive models. In this paper we address the application of these methods to time series forecasting tasks. For independent and identically distributed data the most common approach is cross-validation. However, the dependency among observations in time series raises some caveats about the most appropriate way to estimate performance in this type of data and currently there is no settled way to do so. We compare different variants of cross-validation and of out-of-sample approaches using two case studies: One with 62 real-world time series and another with three synthetic time series. Results show noticeable differences in the performance estimation methods in the two scenarios. In particular, empirical experiments suggest that cross-validation approaches can be applied to stationary time series. However, in real-world scenarios, when different sources of non-stationary variation are at play, the most accurate estimates are produced by out-of-sample methods that preserve the temporal order of observations."

to:NB
time_series
prediction
cross-validation
model_selection
re:XV_for_mixing
statistics
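The out-of-sample scheme the paper favours for real-world, possibly non-stationary data — fit on an expanding window of the past, always forecast forward — looks like this (the two toy forecasters and the random-walk data are my own devising, just to make the contrast with k-fold CV concrete):

```python
import random

def expanding_window_mse(series, forecaster, min_train=20):
    """Performance estimation that preserves temporal order: at each
    step, fit on series[:t] and score the one-step forecast of series[t].
    (Ordinary k-fold CV would let the training folds see the future.)"""
    errs = [(forecaster(series[:t]) - series[t]) ** 2
            for t in range(min_train, len(series))]
    return sum(errs) / len(errs)

naive = lambda hist: hist[-1]                   # last observed value
global_mean = lambda hist: sum(hist) / len(hist)  # mean of the window

# Non-stationary toy data: a Gaussian random walk.
random.seed(7)
walk = [0.0]
for _ in range(150):
    walk.append(walk[-1] + random.gauss(0.0, 1.0))

# On a random walk, the naive forecaster should beat the global mean.
print(expanding_window_mse(walk, naive)
      < expanding_window_mse(walk, global_mean))  # True
```

Any forecaster taking a history and returning a one-step prediction can be slotted in, so the same harness compares model classes as well as evaluation schemes.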

[1905.10634] Adaptive, Distribution-Free Prediction Intervals for Deep Neural Networks

12 weeks ago by cshalizi

"This paper addresses the problem of assessing the variability of predictions from deep neural networks. There is a growing literature on using and improving the predictive accuracy of deep networks, but a concomitant improvement in the quantification of their uncertainty is lacking. We provide a prediction interval network (PI-Network) which is a transparent, tractable modification of the standard predictive loss used to train deep networks. The PI-Network outputs three values instead of a single point estimate and optimizes a loss function inspired by quantile regression. We go beyond merely motivating the construction of these networks and provide two prediction interval methods with provable, finite sample coverage guarantees without any assumptions on the underlying distribution from which our data is drawn. We only require that the observations are independent and identically distributed. Furthermore, our intervals adapt to heteroskedasticity and asymmetry in the conditional distribution of the response given the covariates. The first method leverages the conformal inference framework and provides average coverage. The second method provides a new, stronger guarantee by conditioning on the observed data. Lastly, our loss function does not compromise the predictive accuracy of the network like other prediction interval methods. We demonstrate the ease of use of the PI-Network as well as its improvements over other methods on both simulated and real data. As the PI-Network can be used with a host of deep learning methods with only minor modifications, its use should become standard practice, much like reporting standard errors along with mean estimates."

to:NB
prediction
confidence_sets
neural_networks
regression
leeb.hannes
statistics
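The split-conformal construction that the paper's first method builds on can be sketched independently of any neural network; a least-squares line stands in for the deep net here, and the synthetic data and parameter choices are mine:

```python
import math
import random

def split_conformal(train, calib, x_new, alpha=0.1):
    """Split-conformal prediction interval around an arbitrary point
    predictor (here: a least-squares line fit on the training half).
    The half-width is the ceil((m+1)(1-alpha))-th smallest absolute
    residual on the calibration half; marginal coverage >= 1 - alpha
    holds for exchangeable data, with no distributional assumptions."""
    xs, ys = zip(*train)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in train)
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    scores = sorted(abs(y - (a + b * x)) for x, y in calib)
    m = len(scores)
    q = scores[min(m - 1, math.ceil((m + 1) * (1 - alpha)) - 1)]
    yhat = a + b * x_new
    return yhat - q, yhat + q

# Synthetic data: y = 2x + 1 + N(0,1), split into training and calibration halves.
random.seed(0)
data = [(x, 2.0 * x + 1.0 + random.gauss(0.0, 1.0))
        for x in (random.uniform(0, 10) for _ in range(200))]
lo, hi = split_conformal(data[:100], data[100:], x_new=5.0)
print(lo < 11.0 < hi)  # True: interval covers the true mean 2*5 + 1
```

Because the predictor is a black box to the calibration step, the same recipe wraps a quantile-style network output; the paper's contribution is adapting the interval to heteroskedasticity and strengthening the guarantee.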

Keeping Score: Predictive Analytics in Policing | Annual Review of Criminology

may 2019 by cshalizi

"Predictive analytics in policing is a data-driven approach to (a) characterizing crime patterns across time and space and (b) leveraging this knowledge for the prevention of crime and disorder. This article outlines the current state of the field, providing a review of forecasting tools that have been successfully applied by police to the task of crime prediction. We then discuss options for structured design and evaluation of a predictive policing program so that the benefits of proactive intervention efforts are maximized given fixed resource constraints. We highlight examples of predictive policing programs that have been implemented and evaluated by police agencies in the field. Finally, we discuss ethical issues related to predictive analytics in policing and suggest approaches for minimizing potential harm to vulnerable communities while providing an equitable distribution of the benefits of crime prevention across populations within police jurisdiction."

to:NB
police
crime
prediction
data_mining
to_teach:data-mining
may 2019 by cshalizi

Mapping Sea-Level Change in Time, Space, and Probability | Annual Review of Environment and Resources

may 2019 by cshalizi

"Future sea-level rise generates hazards for coastal populations, economies, infrastructure, and ecosystems around the world. The projection of future sea-level rise relies on an accurate understanding of the mechanisms driving its complex spatio-temporal evolution, which must be founded on an understanding of its history. We review the current methodologies and data sources used to reconstruct the history of sea-level change over geological (Pliocene, Last Interglacial, and Holocene) and instrumental (tide-gauge and satellite alimetry) eras, and the tools used to project the future spatial and temporal evolution of sea level. We summarize the understanding of the future evolution of sea level over the near (through 2050), medium (2100), and long (post-2100) terms. Using case studies from Singapore and New Jersey, we illustrate the ways in which current methodologies and data sources can constrain future projections, and how accurate projections can motivate the development of new sea-level research questions across relevant timescales."

(Last tag unusually tentative)

to:NB
climate_change
prediction
oceanography
to_teach:data_over_space_and_time
may 2019 by cshalizi

On the Statistical Formalism of Uncertainty Quantification | Annual Review of Statistics and Its Application

may 2019 by cshalizi

"The use of models to try to better understand reality is ubiquitous. Models have proven useful in testing our current understanding of reality; for instance, climate models of the 1980s were built for science discovery, to achieve a better understanding of the general dynamics of climate systems. Scientific insights often take the form of general qualitative predictions (i.e., “under these conditions, the Earth's poles will warm more than the rest of the planet”); such use of models differs from making quantitative forecasts of specific events (i.e. “high winds at noon tomorrow at London's Heathrow Airport”). It is sometimes hoped that, after sufficient model development, any model can be used to make quantitative forecasts for any target system. Even if that were the case, there would always be some uncertainty in the prediction. Uncertainty quantification aims to provide a framework within which that uncertainty can be discussed and, ideally, quantified, in a manner relevant to practitioners using the forecast system. A statistical formalism has developed that claims to be able to accurately assess the uncertainty in prediction. This article is a discussion of if and when this formalism can do so. The article arose from an ongoing discussion between the authors concerning this issue, the second author generally being considerably more skeptical concerning the utility of the formalism in providing quantitative decision-relevant information."

to:NB
to_read
statistics
prediction
risk_vs_uncertainty
smith.leonard
berger.james
foundations_of_statistics
may 2019 by cshalizi

On Prediction Properties of Kriging: Uniform Error Bounds and Robustness: Journal of the American Statistical Association: Vol 0, No 0

may 2019 by cshalizi

"Kriging based on Gaussian random fields is widely used in reconstructing unknown functions. The kriging method has pointwise predictive distributions which are computationally simple. However, in many applications one would like to predict for a range of untried points simultaneously. In this work, we obtain some error bounds for the simple and universal kriging predictor under the uniform metric. It works for a scattered set of input points in an arbitrary dimension, and also covers the case where the covariance function of the Gaussian process is misspecified. These results lead to a better understanding of the rate of convergence of kriging under the Gaussian or the Matérn correlation functions, the relationship between space-filling designs and kriging models, and the robustness of the Matérn correlation functions. Supplementary materials for this article are available online."

(The last tag is really more "look at this before I teach that course next time, to see if any of it is worth giving a pointer to for the more advanced students")

in_NB
kriging
spatial_statistics
prediction
smoothing
statistics
to_teach:data_over_space_and_time
may 2019 by cshalizi

[1904.04765] Generic Variance Bounds on Estimation and Prediction Errors in Time Series Analysis: An Entropy Perspective

may 2019 by cshalizi

"In this paper, we obtain generic bounds on the variances of estimation and prediction errors in time series analysis via an information-theoretic approach. It is seen in general that the error bounds are determined by the conditional entropy of the data point to be estimated or predicted given the side information or past observations. Additionally, we discover that in order to achieve the prediction error bounds asymptotically, the necessary and sufficient condition is that the "innovation" is asymptotically white Gaussian. When restricted to Gaussian processes and 1-step prediction, our bounds are shown to reduce to the Kolmogorov-Szegö formula and Wiener-Masani formula known from linear prediction theory."

to:NB
information_theory
prediction
time_series
statistics
to_teach:data_over_space_and_time
may 2019 by cshalizi

[1904.06019] Conformal Prediction Under Covariate Shift

may 2019 by cshalizi

"We extend conformal prediction methodology beyond the case of exchangeable data. In particular, we show that a weighted version of conformal prediction can be used to compute distribution-free prediction intervals for problems in which the test and training covariate distributions differ, but the likelihood ratio between these two distributions is known---or, in practice, can be estimated accurately with access to a large set of unlabeled data (test covariate points). Our weighted extension of conformal prediction also applies more generally, to settings in which the data satisfies a certain weighted notion of exchangeability. We discuss other potential applications of our new conformal methodology, including latent variable and missing data problems."

to:NB
to_read
statistics
prediction
conformal_prediction
ramdas.aaditya
tibshirani.ryan
kith_and_kin
covariate_shift
may 2019 by cshalizi

[1903.01048] Early Detection of Influenza outbreaks in the United States

april 2019 by cshalizi

"Public health surveillance systems often fail to detect emerging infectious diseases, particularly in resource limited settings. By integrating relevant clinical and internet-source data, we can close critical gaps in coverage and accelerate outbreak detection. Here, we present a multivariate algorithm that uses freely available online data to provide early warning of emerging influenza epidemics in the US. We evaluated 240 candidate predictors and found that the most predictive combination does \textit{not} include surveillance or electronic health records data, but instead consists of eight Google search and Wikipedia pageview time series reflecting changing levels of interest in influenza-related topics. In cross validation on 2010-2016 data, this algorithm sounds alarms an average of 16.4 weeks prior to influenza activity reaching the Center for Disease Control and Prevention (CDC) threshold for declaring the start of the season. In an out-of-sample test on data from the rapidly-emerging fall wave of the 2009 H1N1 pandemic, it recognized the threat five weeks in advance of this surveillance threshold. Simpler algorithms, including fixed week-of-the-year triggers, lag the optimized alarms by only a few weeks when detecting seasonal influenza, but fail to provide early warning in the 2009 pandemic scenario. This demonstrates a robust method for designing next generation outbreak detection algorithms. By combining scan statistics with machine learning, it identifies tractable combinations of data sources (from among thousands of candidates) that can provide early warning of emerging infectious disease threats worldwide."

to:NB
statistics
prediction
epidemiology
meyers.lauren_ancel
april 2019 by cshalizi

[1903.02131] A Prediction Tournament Paradox

april 2019 by cshalizi

"In a prediction tournament, contestants "forecast" by asserting a numerical probability for each of (say) 100 future real-world events. The scoring system is designed so that (regardless of the unknown true probabilities) more accurate forecasters will likely score better. This is true for one-on-one comparisons between contestants. But consider a realistic-size tournament with many contestants, with a range of accuracies. It may seem self-evident that the winner will likely be one of the most accurate forecasters. But, in the setting where the range extends to very accurate forecasters, simulations show this is mathematically false, within a somewhat plausible model. Even outside that setting the winner is less likely than intuition suggests to be one of the handful of best forecasters. Though implicit in recent technical papers, this paradox has apparently not been explicitly pointed out before, though is easily explained. It perhaps has implications for the ongoing IARPA-sponsored research programs involving forecasting."

to:NB
prediction
aldous.david
april 2019 by cshalizi

[1903.08125] Predictive Clustering

april 2019 by cshalizi

"We show how to convert any clustering into a prediction set. This has the effect of converting the clustering into a (possibly overlapping) union of spheres or ellipsoids. The tuning parameters can be chosen to minimize the size of the prediction set. When applied to k-means clustering, this method solves several problems: the method tells us how to choose k, how to merge clusters and how to replace the Voronoi partition with more natural shapes. We show that the same reasoning can be applied to other clustering methods."

to:NB
statistics
kith_and_kin
prediction
clustering
rinaldo.alessandro
wasserman.larry
april 2019 by cshalizi

Polar Vortex 2019: Why Forecasts Are So Accurate Now - The Atlantic

february 2019 by cshalizi

Actually teaching this would mean learning a lot about the history & current state of weather forecasting...

prediction
meteorology
to_teach
february 2019 by cshalizi

Kriging and Splines: Theoretical Approach to Linking Spatial Prediction Methods | SpringerLink

january 2019 by cshalizi

To rip off shamelessly, if/when I re-teach the course.

in_NB
to_read
splines
prediction
spatial_statistics
to_teach:data_over_space_and_time
january 2019 by cshalizi

[1710.05013] A Case Study Competition Among Methods for Analyzing Large Spatial Data

october 2018 by cshalizi

"The Gaussian process is an indispensable tool for spatial data analysts. The onset of the "big data" era, however, has lead to the traditional Gaussian process being computationally infeasible for modern spatial data. As such, various alternatives to the full Gaussian process that are more amenable to handling big spatial data have been proposed. These modern methods often exploit low rank structures and/or multi-core and multi-threaded computing environments to facilitate computation. This study provides, first, an introductory overview of several methods for analyzing large spatial data. Second, this study describes the results of a predictive competition among the described methods as implemented by different groups with strong expertise in the methodology. Specifically, each research group was provided with two training datasets (one simulated and one observed) along with a set of prediction locations. Each group then wrote their own implementation of their method to produce predictions at the given location and each which was subsequently run on a common computing environment. The methods were then compared in terms of various predictive diagnostics. Supplementary materials regarding implementation details of the methods and code are available for this article online."

in_NB
spatial_statistics
prediction
computational_statistics
statistics
to_teach:data_over_space_and_time
october 2018 by cshalizi

Prediction Interval for Autoregressive Time Series via Oracally Efficient Estimation of Multi‐Step‐Ahead Innovation Distribution Function - Kong - 2018 - Journal of Time Series Analysis - Wiley Online Library

october 2018 by cshalizi

"A kernel distribution estimator (KDE) is proposed for multi‐step‐ahead prediction error distribution of autoregressive time series, based on prediction residuals. Under general assumptions, the KDE is proved to be oracally efficient as the infeasible KDE and the empirical cumulative distribution function (cdf) based on unobserved prediction errors. Quantile estimator is obtained from the oracally efficient KDE, and prediction interval for multi‐step‐ahead future observation is constructed using the estimated quantiles and shown to achieve asymptotically the nominal confidence levels. Simulation examples corroborate the asymptotic theory."

in_NB
prediction
time_series
statistics
kernel_estimators
october 2018 by cshalizi

On the Sensitivity of Granger Causality to Errors‐In‐Variables, Linear Transformations and Subsampling - Anderson - - Journal of Time Series Analysis - Wiley Online Library

september 2018 by cshalizi

"This article studies the sensitivity of Granger causality to the addition of noise, the introduction of subsampling, and the application of causal invertible filters to weakly stationary processes. Using canonical spectral factors and Wold decompositions, we give general conditions under which additive noise or filtering distorts Granger‐causal properties by inducing (spurious) Granger causality, as well as conditions under which it does not. For the errors‐in‐variables case, we give a continuity result, which implies that: a ‘small’ noise‐to‐signal ratio entails ‘small’ distortions in Granger causality. On filtering, we give general necessary and sufficient conditions under which ‘spurious’ causal relations between (vector) time series are not induced by linear transformations of the variables involved. This also yields transformations (or filters) which can eliminate Granger causality from one vector to another one. In a number of cases, we clarify results in the existing literature, with a number of calculations streamlining some existing approaches."

to:NB
time_series
prediction
granger_causality
measurement
september 2018 by cshalizi

Lognormal-de Wijsian Geostatistics for Ore Evaluation

september 2018 by cshalizi

Krige on kriging. I have to admit I hadn't fully realized that the historical context was "keep South Africa going"...

in_NB
have_read
spatial_statistics
prediction
statistics
geology
to_teach:data_over_space_and_time
september 2018 by cshalizi

Hello World | W. W. Norton & Company

september 2018 by cshalizi

"If you were accused of a crime, who would you rather decide your sentence—a mathematically consistent algorithm incapable of empathy or a compassionate human judge prone to bias and error? What if you want to buy a driverless car and must choose between one programmed to save as many lives as possible and another that prioritizes the lives of its own passengers? And would you agree to share your family’s full medical history if you were told that it would help researchers find a cure for cancer?

"These are just some of the dilemmas that we are beginning to face as we approach the age of the algorithm, when it feels as if the machines reign supreme. Already, these lines of code are telling us what to watch, where to go, whom to date, and even whom to send to jail. But as we rely on algorithms to automate big, important decisions—in crime, justice, healthcare, transportation, and money—they raise questions about what we want our world to look like. What matters most: Helping doctors with diagnosis or preserving privacy? Protecting victims of crime or preventing innocent people being falsely accused?

"Hello World takes us on a tour through the good, the bad, and the downright ugly of the algorithms that surround us on a daily basis. Mathematician Hannah Fry reveals their inner workings, showing us how algorithms are written and implemented, and demonstrates the ways in which human bias can literally be written into the code. By weaving in relatable, real world stories with accessible explanations of the underlying mathematics that power algorithms, Hello World helps us to determine their power, expose their limitations, and examine whether they really are improvement on the human systems they replace."

to:NB
books:noted
data_mining
machine_learning
prediction
"These are just some of the dilemmas that we are beginning to face as we approach the age of the algorithm, when it feels as if the machines reign supreme. Already, these lines of code are telling us what to watch, where to go, whom to date, and even whom to send to jail. But as we rely on algorithms to automate big, important decisions—in crime, justice, healthcare, transportation, and money—they raise questions about what we want our world to look like. What matters most: Helping doctors with diagnosis or preserving privacy? Protecting victims of crime or preventing innocent people being falsely accused?

"Hello World takes us on a tour through the good, the bad, and the downright ugly of the algorithms that surround us on a daily basis. Mathematician Hannah Fry reveals their inner workings, showing us how algorithms are written and implemented, and demonstrates the ways in which human bias can literally be written into the code. By weaving in relatable, real world stories with accessible explanations of the underlying mathematics that power algorithms, Hello World helps us to determine their power, expose their limitations, and examine whether they really are improvement on the human systems they replace."

september 2018 by cshalizi

Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning

september 2018 by cshalizi

"Randomized neural networks are immortalized in this AI Koan: In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6. What are you doing?'' asked Minsky. I am training a randomly wired neural net to play tic-tac-toe,'' Sussman replied. Why is the net wired randomly?'' asked Minsky. Sussman replied, I do not want it to have any preconceptions of how to play.'' Minsky then shut his eyes. Why do you close your eyes?'' Sussman asked his teacher. So that the room will be empty,'' replied Minsky. At that moment, Sussman was enlightened. We analyze shallow random networks with the help of concentration of measure inequalities. Specifically, we consider architectures that compute a weighted sum of their inputs after passing them through a bank of arbitrary randomized nonlinearities. We identify conditions under which these networks exhibit good classification performance, and bound their test error in terms of the size of the dataset and the number of random nonlinearities."

--- Have I never bookmarked this before?

in_NB
approximation
kernel_methods
random_projections
statistics
prediction
classifiers
rahimi.ali
recht.benjamin
machine_learning
have_read
september 2018 by cshalizi

[1205.4591] Forecastable Component Analysis (ForeCA)

september 2018 by cshalizi

" introduce Forecastable Component Analysis (ForeCA), a novel dimension reduction technique for temporally dependent signals. Based on a new forecastability measure, ForeCA finds an optimal transformation to separate a multivariate time series into a forecastable and an orthogonal white noise space. I present a converging algorithm with a fast eigenvector solution. Applications to financial and macro-economic time series show that ForeCA can successfully discover informative structure, which can be used for forecasting as well as classification. The R package ForeCA (this http URL) accompanies this work and is publicly available on CRAN."

to:NB
have_read
time_series
kith_and_kin
goerg.georg
prediction
statistics
to_teach:data_over_space_and_time
september 2018 by cshalizi

[1808.00023] The Measure and Mismeasure of Fairness: A Critical Review of Fair Machine Learning

august 2018 by cshalizi

"The nascent field of fair machine learning aims to ensure that decisions guided by algorithms are equitable. Over the last several years, three formal definitions of fairness have gained prominence: (1) anti-classification, meaning that protected attributes---like race, gender, and their proxies---are not explicitly used to make decisions; (2) classification parity, meaning that common measures of predictive performance (e.g., false positive and false negative rates) are equal across groups defined by the protected attributes; and (3) calibration, meaning that conditional on risk estimates, outcomes are independent of protected attributes. Here we show that all three of these fairness definitions suffer from significant statistical limitations. Requiring anti-classification or classification parity can, perversely, harm the very groups they were designed to protect; and calibration, though generally desirable, provides little guarantee that decisions are equitable. In contrast to these formal fairness criteria, we argue that it is often preferable to treat similarly risky people similarly, based on the most statistically accurate estimates of risk that one can produce. Such a strategy, while not universally applicable, often aligns well with policy objectives; notably, this strategy will typically violate both anti-classification and classification parity. In practice, it requires significant effort to construct suitable risk estimates. One must carefully define and measure the targets of prediction to avoid retrenching biases in the data. But, importantly, one cannot generally address these difficulties by requiring that algorithms satisfy popular mathematical formalizations of fairness. By highlighting these challenges in the foundation of fair machine learning, we hope to help researchers and practitioners productively advance the area."

--- ETA: This is a really good and convincing paper.

in_NB
prediction
algorithmic_fairness
goel.sharad
via:rvenkat
have_read
heard_the_talk
august 2018 by cshalizi

Local causal states and discrete coherent structures (Rupe and Crutchfield, 2018)

august 2018 by cshalizi

"Coherent structures form spontaneously in nonlinear spatiotemporal systems and are found at all spatial scales in natural phenomena from laboratory hydrodynamic flows and chemical reactions to ocean, atmosphere, and planetary climate dynamics. Phenomenologically, they appear as key components that organize the macroscopic behaviors in such systems. Despite a century of effort, they have eluded rigorous analysis and empirical prediction, with progress being made only recently. As a step in this, we present a formal theory of coherent structures in fully discrete dynamical field theories. It builds on the notion of structure introduced by computational mechanics, generalizing it to a local spatiotemporal setting. The analysis’ main tool employs the local causal states, which are used to uncover a system’s hidden spatiotemporal symmetries and which identify coherent structures as spatially localized deviations from those symmetries. The approach is behavior-driven in the sense that it does not rely on directly analyzing spatiotemporal equations of motion, rather it considers only the spatiotemporal fields a system generates. As such, it offers an unsupervised approach to discover and describe coherent structures. We illustrate the approach by analyzing coherent structures generated by elementary cellular automata, comparing the results with an earlier, dynamic-invariant-set approach that decomposes fields into domains, particles, and particle interactions."

--- *ahem* *cough* https://arxiv.org/abs/nlin/0508001 *ahem*

to:NB
have_read
pattern_formation
complexity
prediction
stochastic_processes
spatio-temporal_statistics
cellular_automata
crutchfield.james_p.
modesty_forbids_further_comment
august 2018 by cshalizi

[1705.08105] FRK: An R Package for Spatial and Spatio-Temporal Prediction with Large Datasets

august 2018 by cshalizi

"FRK is an R software package for spatial/spatio-temporal modelling and prediction with large datasets. It facilitates optimal spatial prediction (kriging) on the most commonly used manifolds (in Euclidean space and on the surface of the sphere), for both spatial and spatio-temporal fields. It differs from many of the packages for spatial modelling and prediction by avoiding stationary and isotropic covariance and variogram models, instead constructing a spatial random effects (SRE) model on a fine-resolution discretised spatial domain. The discrete element is known as a basic areal unit (BAU), whose introduction in the software leads to several practical advantages. The software can be used to (i) integrate multiple observations with different supports with relative ease; (ii) obtain exact predictions at millions of prediction locations (without conditional simulation); and (iii) distinguish between measurement error and fine-scale variation at the resolution of the BAU, thereby allowing for reliable uncertainty quantification. The temporal component is included by adding another dimension. A key component of the SRE model is the specification of spatial or spatio-temporal basis functions; in the package, they can be generated automatically or by the user. The package also offers automatic BAU construction, an expectation-maximisation (EM) algorithm for parameter estimation, and functionality for prediction over any user-specified polygons or BAUs. Use of the package is illustrated on several spatial and spatio-temporal datasets, and its predictions and the model it implements are extensively compared to others commonly used for spatial prediction and modelling."

in_NB
to_read
R
heard_the_talk
prediction
spatial_statistics
spatio-temporal_statistics
to_teach:data_over_space_and_time
august 2018 by cshalizi

Indirect inference through prediction

july 2018 by cshalizi

"By recasting indirect inference estimation as a prediction rather than a minimization and by using regularized regressions, we can bypass the three major problems of estimation: selecting the summary statistics, defining the distance function and minimizing it numerically. By substituting regression with classification we can extend this approach to model selection as well. We present three examples: a statistical fit, the parametrization of a simple RBC model and heuristics selection in a fishery agent-based model."

agent-based_models
prediction
statistics
estimation
indirect_inference
simulation
have_read
in_NB
july 2018 by cshalizi

Material Signals: A Historical Sociology of High-Frequency Trading | American Journal of Sociology: Vol 123, No 6

june 2018 by cshalizi

"Drawing on interviews with 194 market participants (including 54 practitioners of high-frequency trading or HFT), this article first identifies the main classes of “signals” (patterns of data) that influence how HFT algorithms buy and sell shares and interact with each other. Second, it investigates historically the processes that have led to three of the most important categories of these signals, finding that they arise from three features of U.S. share trading that are the result of episodes of meso-level conflict. Third, the article demonstrates the contingency of these features by briefly comparing HFT in share trading to HFT in futures, Treasurys, and foreign exchange. The article thus argues that how HFT algorithms act and interact is a specific, contingent product not just of the current but also of the past interaction of people, organizations, algorithms, and machines."

to:NB
finance
sociology
prediction
june 2018 by cshalizi

Bootstrap bias corrections for ensemble methods | SpringerLink

may 2018 by cshalizi

"This paper examines the use of a residual bootstrap for bias correction in machine learning regression methods. Accounting for bias is an important obstacle in recent efforts to develop statistical inference for machine learning. We demonstrate empirically that the proposed bootstrap bias correction can lead to substantial improvements in both bias and predictive accuracy. In the context of ensembles of trees, we show that this correction can be approximated at only double the cost of training the original ensemble. Our method is shown to improve test set accuracy over random forests by up to 70% on example problems from the UCI repository."

to:NB
ensemble_methods
prediction
bootstrap
hooker.giles
statistics
may 2018 by cshalizi

[1706.08576] Invariant Causal Prediction for Nonlinear Models

may 2018 by cshalizi

"An important problem in many domains is to predict how a system will respond to interventions. This task is inherently linked to estimating the system's underlying causal structure. To this end, 'invariant causal prediction' (ICP) (Peters et al., 2016) has been proposed which learns a causal model exploiting the invariance of causal relations using data from different environments. When considering linear models, the implementation of ICP is relatively straight-forward. However, the nonlinear case is more challenging due to the difficulty of performing nonparametric tests for conditional independence. In this work, we present and evaluate an array of methods for nonlinear and nonparametric versions of ICP for learning the causal parents of given target variables. We find that an approach which first fits a nonlinear model with data pooled over all environments and then tests for differences between the residual distributions across environments is quite robust across a large variety of simulation settings. We call this procedure "Invariant residual distribution test". In general, we observe that the performance of all approaches is critically dependent on the true (unknown) causal structure and it becomes challenging to achieve high power if the parental set includes more than two variables. As a real-world example, we consider fertility rate modelling which is central to world population projections. We explore predicting the effect of hypothetical interventions using the accepted models from nonlinear ICP. The results reaffirm the previously observed central causal role of child mortality rates."

to:NB
causal_inference
causal_discovery
statistics
regression
prediction
peters.jonas
meinshausen.nicolai
to_read
heard_the_talk
to_teach:undergrad-ADA
re:ADAfaEPoV
may 2018 by cshalizi

[1501.01332] Causal inference using invariant prediction: identification and confidence intervals

may 2018 by cshalizi

"What is the difference of a prediction that is made with a causal model and a non-causal model? Suppose we intervene on the predictor variables or change the whole environment. The predictions from a causal model will in general work as well under interventions as for observational data. In contrast, predictions from a non-causal model can potentially be very wrong if we actively intervene on variables. Here, we propose to exploit this invariance of a prediction under a causal model for causal inference: given different experimental settings (for example various interventions) we collect all models that do show invariance in their predictive accuracy across settings and interventions. The causal model will be a member of this set of models with high probability. This approach yields valid confidence intervals for the causal relationships in quite general scenarios. We examine the example of structural equation models in more detail and provide sufficient assumptions under which the set of causal predictors becomes identifiable. We further investigate robustness properties of our approach under model misspecification and discuss possible extensions. The empirical properties are studied for various data sets, including large-scale gene perturbation experiments."

to:NB
to_read
causal_inference
causal_discovery
statistics
prediction
regression
buhlmann.peter
meinshausen.nicolai
peters.jonas
heard_the_talk
re:ADAfaEPoV
to_teach:undergrad-ADA
may 2018 by cshalizi

[1708.03579] Self-exciting point processes with spatial covariates: modeling the dynamics of crime

may 2018 by cshalizi

"Crime has both varying patterns in space, related to features of the environment, economy, and policing, and patterns in time arising from criminal behavior, such as retaliation. Serious crimes may also be presaged by minor crimes of disorder. We demonstrate that these spatial and temporal patterns are generally confounded, requiring analyses to take both into account, and propose a spatio-temporal self-exciting point process model which incorporates spatial features, near-repeat and retaliation effects, and triggering. We develop inference methods and diagnostic tools, such as residual maps, for this model, and through extensive simulation and crime data obtained from Pittsburgh, Pennsylvania, demonstrate its properties and usefulness."

in_NB
spatio-temporal_statistics
point_processes
prediction
statistics
crime
kith_and_kin
reinhart.alex
greenhouse.joel
on_the_thesis_committee
to_teach:data_over_space_and_time
may 2018 by cshalizi

Forecasting the spatial transmission of influenza in the United States | PNAS

may 2018 by cshalizi

"Recurrent outbreaks of seasonal and pandemic influenza create a need for forecasts of the geographic spread of this pathogen. Although it is well established that the spatial progression of infection is largely attributable to human mobility, difficulty obtaining real-time information on human movement has limited its incorporation into existing infectious disease forecasting techniques. In this study, we develop and validate an ensemble forecast system for predicting the spatiotemporal spread of influenza that uses readily accessible human mobility data and a metapopulation model. In retrospective state-level forecasts for 35 US states, the system accurately predicts local influenza outbreak onset,—i.e., spatial spread, defined as the week that local incidence increases above a baseline threshold—up to 6 wk in advance of this event. In addition, the metapopulation prediction system forecasts influenza outbreak onset, peak timing, and peak intensity more accurately than isolated location-specific forecasts. The proposed framework could be applied to emergent respiratory viruses and, with appropriate modifications, other infectious diseases."

to:NB
epidemic_models
influenza
contagion
prediction
statistics
may 2018 by cshalizi

Slowness as a Proxy for Temporal Predictability: An Empirical Comparison | Neural Computation | MIT Press Journals

may 2018 by cshalizi

"The computational principles of slowness and predictability have been proposed to describe aspects of information processing in the visual system. From the perspective of slowness being a limited special case of predictability we investigate the relationship between these two principles empirically. On a collection of real-world data sets we compare the features extracted by slow feature analysis (SFA) to the features of three recently proposed methods for predictable feature extraction: forecastable component analysis, predictable feature analysis, and graph-based predictable feature analysis. Our experiments show that the predictability of the learned features is highly correlated, and, thus, SFA appears to effectively implement a method for extracting predictable features according to different measures of predictability."

to:NB
time_series
prediction
statistics
may 2018 by cshalizi

Predictive Processing and the Representation Wars | SpringerLink

march 2018 by cshalizi

"Clark has recently suggested that predictive processing advances a theory of neural function with the resources to put an ecumenical end to the “representation wars” of recent cognitive science. In this paper I defend and develop this suggestion. First, I broaden the representation wars to include three foundational challenges to representational cognitive science. Second, I articulate three features of predictive processing’s account of internal representation that distinguish it from more orthodox representationalist frameworks. Specifically, I argue that it posits a resemblance-based representational architecture with organism-relative contents that functions in the service of pragmatic success, not veridical representation. Finally, I argue that internal representation so understood is either impervious to the three anti-representationalist challenges I outline or can actively embrace them."

to:NB
philosophy_of_mind
cognitive_science
representation
prediction
march 2018 by cshalizi

[1803.04383] Delayed Impact of Fair Machine Learning

march 2018 by cshalizi

"Fairness in machine learning has predominantly been studied in static classification settings without concern for how decisions change the underlying population over time. Conventional wisdom suggests that fairness criteria promote the long-term well-being of those groups they aim to protect.

"We study how static fairness criteria interact with temporal indicators of well-being, such as long-term improvement, stagnation, and decline in a variable of interest. We demonstrate that even in a one-step feedback model, common fairness criteria in general do not promote improvement over time, and may in fact cause harm in cases where an unconstrained objective would not.

"We completely characterize the delayed impact of three standard criteria, contrasting the regimes in which these exhibit qualitatively different behavior. In addition, we find that a natural form of measurement error broadens the regime in which fairness criteria perform favorably.

"Our results highlight the importance of measurement and temporal modeling in the evaluation of fairness criteria, suggesting a range of new challenges and trade-offs."

--- A _lot_ is going to hinge here on how they model the feedback process.

--- My evil spirit is making me wonder how hard it would be to write a "Rhetoric of Reaction"-esque attack on algorithmic fairness.

to:NB
algorithmic_fairness
prediction
credit_ratings
data_mining
via:whimsley
"We study how static fairness criteria interact with temporal indicators of well-being, such as long-term improvement, stagnation, and decline in a variable of interest. We demonstrate that even in a one-step feedback model, common fairness criteria in general do not promote improvement over time, and may in fact cause harm in cases where an unconstrained objective would not.

"We completely characterize the delayed impact of three standard criteria, contrasting the regimes in which these exhibit qualitatively different behavior. In addition, we find that a natural form of measurement error broadens the regime in which fairness criteria perform favorably.

"Our results highlight the importance of measurement and temporal modeling in the evaluation of fairness criteria, suggesting a range of new challenges and trade-offs."

--- A _lot_ is going to hinge here on how they model the feedback process.

--- My evil spirit is making me wonder how hard it would be to write a "Rhetoric of Reaction"-esque attack on algorithmic fairness.

march 2018 by cshalizi

[1802.07814] Learning to Explain: An Information-Theoretic Perspective on Model Interpretation

march 2018 by cshalizi

"We introduce instancewise feature selection as a methodology for model interpretation. Our method is based on learning a function to extract a subset of features that are most informative for each given example. This feature selector is trained to maximize the mutual information between selected features and the response variable, where the conditional distribution of the response variable given the input is the model to be explained. We develop an efficient variational approximation to the mutual information, and show that the resulting method compares favorably to other model explanation methods on a variety of synthetic and real data sets using both quantitative metrics and human evaluation."

to:NB
information_theory
explanation
prediction
jordan.michael_i.
wainwright.martin_j.
statistics
march 2018 by cshalizi

[1702.04690] Simple rules for complex decisions

january 2018 by cshalizi

"From doctors diagnosing patients to judges setting bail, experts often base their decisions on experience and intuition rather than on statistical models. While understandable, relying on intuition over models has often been found to result in inferior outcomes. Here we present a new method, select-regress-and-round, for constructing simple rules that perform well for complex decisions. These rules take the form of a weighted checklist, can be applied mentally, and nonetheless rival the performance of modern machine learning algorithms. Our method for creating these rules is itself simple, and can be carried out by practitioners with basic statistics knowledge. We demonstrate this technique with a detailed case study of judicial decisions to release or detain defendants while they await trial. In this application, as in many policy settings, the effects of proposed decision rules cannot be directly observed from historical data: if a rule recommends releasing a defendant that the judge in reality detained, we do not observe what would have happened under the proposed action. We address this key counterfactual estimation problem by drawing on tools from causal inference. We find that simple rules significantly outperform judges and are on par with decisions derived from random forests trained on all available features. Generalizing to 22 varied decision-making domains, we find this basic result replicates. We conclude with an analytical framework that helps explain why these simple decision rules perform as well as they do."

to:NB
to_read
decision-making
classifiers
fast-and-frugal_heuristics
heuristics
clinical-vs-actuarial_prediction
prediction
crime
bail
via:vaguery
january 2018 by cshalizi

[1801.02858] Scalable high-resolution forecasting of sparse spatiotemporal events with kernel methods: a winning solution to the NIJ "Real-Time Crime Forecasting Challenge"

january 2018 by cshalizi

"This article describes Team Kernel Glitches' solution to the National Institute of Justice's (NIJ) Real-Time Crime Forecasting Challenge. The goal of the NIJ Real-Time Crime Forecasting Competition was to maximize two different crime hotspot scoring metrics for calls-for-service to the Portland Police Bureau (PPB) in Portland, Oregon during the period from March 1, 2017 to May 31, 2017. Our solution to the challenge is a spatiotemporal forecasting model combining scalable randomized Reproducing Kernel Hilbert Space (RKHS) methods for approximating Gaussian processes with autoregressive smoothing kernels in a regularized supervised learning framework. Our model can be understood as an approximation to the popular log-Gaussian Cox Process model: we discretize the spatiotemporal point pattern and learn a log intensity function using the Poisson likelihood and highly efficient gradient-based optimization methods. Model hyperparameters including quality of RKHS approximation, spatial and temporal kernel lengthscales, number of autoregressive lags, bandwidths for smoothing kernels, as well as cell shape, size, and rotation, were learned using crossvalidation. Resulting predictions exceeded baseline KDE estimates by 0.157. Performance improvement over baseline predictions were particularly large for sparse crimes over short forecasting horizons."

--- There seem to be some substantial improvements here over Seth's Ph.D. thesis...

in_NB
to_read
spatio-temporal_statistics
point_processes
statistics
prediction
crime
flaxman.seth
january 2018 by cshalizi

Looking Forward: Prediction and Uncertainty in Modern America, Pietruska

january 2018 by cshalizi

"In the decades after the Civil War, the world experienced monumental changes in industry, trade, and governance. As Americans faced this uncertain future, public debate sprang up over the accuracy and value of predictions, asking whether it was possible to look into the future with any degree of certainty. In Looking Forward, Jamie L. Pietruska uncovers a culture of prediction in the modern era, where forecasts became commonplace as crop forecasters, “weather prophets,” business forecasters, utopian novelists, and fortune-tellers produced and sold their visions of the future. Private and government forecasters competed for authority—as well as for an audience—and a single prediction could make or break a forecaster’s reputation.

"Pietruska argues that this late nineteenth-century quest for future certainty had an especially ironic consequence: it led Americans to accept uncertainty as an inescapable part of both forecasting and twentieth-century economic and cultural life. Drawing together histories of science, technology, capitalism, environment, and culture, Looking Forward explores how forecasts functioned as new forms of knowledge and risk management tools that sometimes mitigated, but at other times exacerbated, the very uncertainties they were designed to conquer. Ultimately Pietruska shows how Americans came to understand the future itself as predictable, yet still uncertain."

to:NB
books:noted
prediction
history_of_ideas
history_of_science
19th_century_history
american_history
"Pietruska argues that this late nineteenth-century quest for future certainty had an especially ironic consequence: it led Americans to accept uncertainty as an inescapable part of both forecasting and twentieth-century economic and cultural life. Drawing together histories of science, technology, capitalism, environment, and culture, Looking Forward explores how forecasts functioned as new forms of knowledge and risk management tools that sometimes mitigated, but at other times exacerbated, the very uncertainties they were designed to conquer. Ultimately Pietruska shows how Americans came to understand the future itself as predictable, yet still uncertain."

january 2018 by cshalizi

[1706.02744] Avoiding Discrimination through Causal Reasoning

november 2017 by cshalizi

"Recent work on fairness in machine learning has focused on various statistical discrimination criteria and how they trade off. Most of these criteria are observational: They depend only on the joint distribution of predictor, protected attribute, features, and outcome. While convenient to work with, observational criteria have severe inherent limitations that prevent them from resolving matters of fairness conclusively.

"Going beyond observational criteria, we frame the problem of discrimination based on protected attributes in the language of causal reasoning. This viewpoint shifts attention from "What is the right fairness criterion?" to "What do we want to assume about the causal data generating process?" Through the lens of causality, we make several contributions. First, we crisply articulate why and when observational criteria fail, thus formalizing what was before a matter of opinion. Second, our approach exposes previously ignored subtleties and why they are fundamental to the problem. Finally, we put forward natural causal non-discrimination criteria and develop algorithms that satisfy them."

to:NB
to_read
causality
algorithmic_fairness
prediction
machine_learning
janzing.dominik
re:ADAfaEPoV
via:arsyed
"Going beyond observational criteria, we frame the problem of discrimination based on protected attributes in the language of causal reasoning. This viewpoint shifts attention from "What is the right fairness criterion?" to "What do we want to assume about the causal data generating process?" Through the lens of causality, we make several contributions. First, we crisply articulate why and when observational criteria fail, thus formalizing what was before a matter of opinion. Second, our approach exposes previously ignored subtleties and why they are fundamental to the problem. Finally, we put forward natural causal non-discrimination criteria and develop algorithms that satisfy them."

november 2017 by cshalizi

Tetlock, P.E.: Expert Political Judgment: How Good Is It? How Can We Know?. (New Edition) (eBook, Paperback and Hardcover)

september 2017 by cshalizi

"Tetlock first discusses arguments about whether the world is too complex for people to find the tools to understand political phenomena, let alone predict the future. He evaluates predictions from experts in different fields, comparing them to predictions by well-informed laity or those based on simple extrapolation from current trends. He goes on to analyze which styles of thinking are more successful in forecasting. Classifying thinking styles using Isaiah Berlin's prototypes of the fox and the hedgehog, Tetlock contends that the fox--the thinker who knows many little things, draws from an eclectic array of traditions, and is better able to improvise in response to changing events--is more successful in predicting the future than the hedgehog, who knows one big thing, toils devotedly within one tradition, and imposes formulaic solutions on ill-defined problems. He notes a perversely inverse relationship between the best scientific indicators of good judgement and the qualities that the media most prizes in pundits--the single-minded determination required to prevail in ideological combat.

"Clearly written and impeccably researched, the book fills a huge void in the literature on evaluating expert opinion. It will appeal across many academic disciplines as well as to corporations seeking to develop standards for judging expert decision-making. Now with a new preface in which Tetlock discusses the latest research in the field, the book explores what constitutes good judgment in predicting future events and looks at why experts are often wrong in their forecasts."

in_NB
books:noted
prediction
expertise
cognitive_science
"Clearly written and impeccably researched, the book fills a huge void in the literature on evaluating expert opinion. It will appeal across many academic disciplines as well as to corporations seeking to develop standards for judging expert decision-making. Now with a new preface in which Tetlock discusses the latest research in the field, the book explores what constitutes good judgment in predicting future events and looks at why experts are often wrong in their forecasts."

september 2017 by cshalizi

[1709.02012v1] On Fairness and Calibration

september 2017 by cshalizi

"The machine learning community has become increasingly concerned with the potential for bias and discrimination in predictive models, and this has motivated a growing line of work on what it means for a classification procedure to be "fair." In particular, we investigate the tension between minimizing error disparity across different population groups while maintaining calibrated probability estimates. We show that calibration is compatible only with a single error constraint (i.e. equal false-negatives rates across groups), and show that any algorithm that satisfies this relaxation is no better than randomizing a percentage of predictions for an existing classifier. These unsettling findings, which extend and generalize existing results, are empirically confirmed on several datasets."

to:NB
to_read
calibration
prediction
classifiers
kleinberg.jon
via:arsyed
september 2017 by cshalizi

Minding the Weather | The MIT Press

september 2017 by cshalizi

"This book argues that the human cognition system is the least understood, yet probably most important, component of forecasting accuracy. Minding the Weather investigates how people acquire massive and highly organized knowledge and develop the reasoning skills and strategies that enable them to achieve the highest levels of performance.

"The authors consider such topics as the forecasting workplace; atmospheric scientists’ descriptions of their reasoning strategies; the nature of expertise; forecaster knowledge, perceptual skills, and reasoning; and expert systems designed to imitate forecaster reasoning. Drawing on research in cognitive science, meteorology, and computer science, the authors argue that forecasting involves an interdependence of humans and technologies. Human expertise will always be necessary."

to:NB
prediction
meteorology
cognitive_science
books:noted
"The authors consider such topics as the forecasting workplace; atmospheric scientists’ descriptions of their reasoning strategies; the nature of expertise; forecaster knowledge, perceptual skills, and reasoning; and expert systems designed to imitate forecaster reasoning. Drawing on research in cognitive science, meteorology, and computer science, the authors argue that forecasting involves an interdependence of humans and technologies. Human expertise will always be necessary."

september 2017 by cshalizi

Empirical prediction intervals improve energy forecasting

august 2017 by cshalizi

"Hundreds of organizations and analysts use energy projections, such as those contained in the US Energy Information Administration (EIA)’s Annual Energy Outlook (AEO), for investment and policy decisions. Retrospective analyses of past AEO projections have shown that observed values can differ from the projection by several hundred percent, and thus a thorough treatment of uncertainty is essential. We evaluate the out-of-sample forecasting performance of several empirical density forecasting methods, using the continuous ranked probability score (CRPS). The analysis confirms that a Gaussian density, estimated on past forecasting errors, gives comparatively accurate uncertainty estimates over a variety of energy quantities in the AEO, in particular outperforming scenario projections provided in the AEO. We report probabilistic uncertainties for 18 core quantities of the AEO 2016 projections. Our work frames how to produce, evaluate, and rank probabilistic forecasts in this setting. We propose a log transformation of forecast errors for price projections and a modified nonparametric empirical density forecasting method. Our findings give guidance on how to evaluate and communicate uncertainty in future energy outlooks."

--- It's probably presumptuous of me, but I am a bit proud, because the first author learned a lot of these methods from my class...

to:NB
to_read
heard_the_talk
energy
prediction
statistics
to_teach:undergrad-ADA
august 2017 by cshalizi

What does it mean to ask for an “explainable” algorithm?

july 2017 by cshalizi

"The second type of explainability problem is complexity. Here everything about the algorithm is known, but somebody feels that the algorithm is so complex that they cannot understand it. It will always be possible to answer what-if questions, such as how the algorithm’s result would have been different had the person been one year older, or had an extra $1000 of annual income, or had one fewer prior misdemeanor conviction, or whatever. So complexity can only be a barrier to big-picture understanding, not to understanding which factors might have changed a particular person’s outcome."

--- I am not at all sure about this, because of interactions. If the function changes sufficiently rapidly, with enough interactions between the inputs, knowing these sorts of local perturbations may tell us very little.

explanation
prediction
have_read
july 2017 by cshalizi

Phys. Rev. E 95, 042140 (2017) - Thermodynamics of complexity and pattern manipulation

june 2017 by cshalizi

"Many organisms capitalize on their ability to predict the environment to maximize available free energy and reinvest this energy to create new complex structures. This functionality relies on the manipulation of patterns—temporally ordered sequences of data. Here, we propose a framework to describe pattern manipulators—devices that convert thermodynamic work to patterns or vice versa—and use them to build a “pattern engine” that facilitates a thermodynamic cycle of pattern creation and consumption. We show that the least heat dissipation is achieved by the provably simplest devices, the ones that exhibit desired operational behavior while maintaining the least internal memory. We derive the ultimate limits of this heat dissipation and show that it is generally nonzero and connected with the pattern's intrinsic crypticity—a complexity theoretic quantity that captures the puzzling difference between the amount of information the pattern's past behavior reveals about its future and the amount one needs to communicate about this past to optimally predict the future."

to:NB
to_read
complexity
complexity_measures
prediction
thermodynamics
maxwells_demon
june 2017 by cshalizi

Dietze, M.C.: Ecological Forecasting (eBook and Hardcover).

june 2017 by cshalizi

"An authoritative and accessible introduction to the concepts and tools needed to make ecology a more predictive science

"Ecologists are being asked to respond to unprecedented environmental challenges. How can they provide the best available scientific information about what will happen in the future? Ecological Forecasting is the first book to bring together the concepts and tools needed to make ecology a more predictive science.

"Ecological Forecasting presents a new way of doing ecology. A closer connection between data and models can help us to project our current understanding of ecological processes into new places and times. This accessible and comprehensive book covers a wealth of topics, including Bayesian calibration and the complexities of real-world data; uncertainty quantification, partitioning, propagation, and analysis; feedbacks from models to measurements; state-space models and data fusion; iterative forecasting and the forecast cycle; and decision support."

to:NB
books:noted
ecology
prediction
statistics
"Ecologists are being asked to respond to unprecedented environmental challenges. How can they provide the best available scientific information about what will happen in the future? Ecological Forecasting is the first book to bring together the concepts and tools needed to make ecology a more predictive science.

"Ecological Forecasting presents a new way of doing ecology. A closer connection between data and models can help us to project our current understanding of ecological processes into new places and times. This accessible and comprehensive book covers a wealth of topics, including Bayesian calibration and the complexities of real-world data; uncertainty quantification, partitioning, propagation, and analysis; feedbacks from models to measurements; state-space models and data fusion; iterative forecasting and the forecast cycle; and decision support."

june 2017 by cshalizi

[1604.04173] Distribution-Free Predictive Inference For Regression

february 2017 by cshalizi

"We develop a general framework for distribution-free predictive inference in regression, using conformal inference. The proposed methodology allows construction of prediction bands for the response variable using any estimator of the regression function. The resulting prediction band preserves the consistency properties of the original estimator under standard assumptions, while guaranteeing finite sample marginal coverage even when the assumptions do not hold. We analyze and compare, both empirically and theoretically, two major variants of our conformal procedure: the full conformal inference and split conformal inference, along with a related jackknife method. These methods offer different tradeoffs between statistical accuracy (length of resulting prediction intervals) and computational efficiency. As extensions, we develop a method for constructing valid in-sample prediction intervals called rank-one-out conformal inference, which has essentially the same computational efficiency as split conformal inference. We also describe an extension of our procedures for producing prediction bands with varying local width, in order to adapt to heteroskedascity in the data distribution. Lastly, we propose a model-free notion of variable importance, called leave-one-covariate-out or LOCO inference. Accompanying our paper is an R package conformalInference that implements all of the proposals we have introduced. In the spirit of reproducibility, all empirical results in this paper can be easily (re)generated using this package."

to:NB
to_read
kith_and_kin
regression
prediction
wasserman.larry
tibshirani.ryan
g'sell.max
lei.jing
rinaldo.alessandro
february 2017 by cshalizi

A material political economy: Automated Trading Desk and price prediction in high-frequency trading - Dec 06, 2016

december 2016 by cshalizi

"This article contains the first detailed historical study of one of the new high-frequency trading (HFT) firms that have transformed many of the world’s financial markets. The study, of Automated Trading Desk (ATD), one of the earliest and most important such firms, focuses on how ATD’s algorithms predicted share price changes. The article argues that political-economic struggles are integral to the existence of some of the ‘pockets’ of predictable structure in the otherwise random movements of prices, to the availability of the data that allow algorithms to identify these pockets, and to the capacity of algorithms to use these predictions to trade profitably. The article also examines the role of HFT algorithms such as ATD’s in the epochal, fiercely contested shift in US share trading from ‘fixed-role’ markets towards ‘all-to-all’ markets."

to:NB
finance
prediction
sociology
mackenzie.donald
december 2016 by cshalizi

‘Moneyball’ for Professors?

december 2016 by cshalizi

The key paragraph:

"Using a hand-curated data set of 54 scholars who obtained doctorates after 1995 and held assistant professorships at top-10 operations research programs in 2003 or earlier, these statistical models made different decisions than the tenure committees for 16 (30%) of the candidates. Specifically, these new criteria yielded a set of scholars who, in the future, produced more papers published in the top journals and research that was cited more often than the scholars who were actually selected by tenure committees"

--- In other words, "success" here is defined entirely through the worst sort of abuse of citation metrics, i.e., through doing the things which everyone who has seriously studied citation metrics says you should _not_ use them for. (Cf. https://arxiv.org/abs/0910.3529 .) If the objective was to make academic hiring decisions _even less_ sensitive to actual intellectual quality, one could hardly do better.

I am sure that this idea will, however, be widely adopted and go from strength to strength.

bad_data_analysis
academia
bibliometry
social_networks
network_data_analysis
prediction
utter_stupidity
have_read
via:jbdelong
to:blog
"Using a hand-curated data set of 54 scholars who obtained doctorates after 1995 and held assistant professorships at top-10 operations research programs in 2003 or earlier, these statistical models made different decisions than the tenure committees for 16 (30%) of the candidates. Specifically, these new criteria yielded a set of scholars who, in the future, produced more papers published in the top journals and research that was cited more often than the scholars who were actually selected by tenure committees"

--- In other words, "success" here is defined entirely through the worst sort of abuse of citation metrics, i.e., through doing the things which everyone who has seriously studied citation metrics says you should _not_ use them for. (Cf. https://arxiv.org/abs/0910.3529 .) If the objective was to making academic hiring decisions _even less_ sensitive to actually intellectual quality, one could hardly do better.

I am sure that this idea will, however, be widely adopted and go from strength to strength.

december 2016 by cshalizi

[1311.4500] Time series prediction via aggregation: an oracle bound including numerical cost

december 2016 by cshalizi

"We address the problem of forecasting a time series meeting the Causal Bernoulli Shift model, using a parametric set of predictors. The aggregation technique provides a predictor with well established and quite satisfying theoretical properties expressed by an oracle inequality for the prediction risk. The numerical computation of the aggregated predictor usually relies on a Markov chain Monte Carlo method whose convergence should be evaluated. In particular, it is crucial to bound the number of simulations needed to achieve a numerical precision of the same order as the prediction risk. In this direction we present a fairly general result which can be seen as an oracle inequality including the numerical cost of the predictor computation. The numerical cost appears by letting the oracle inequality depend on the number of simulations required in the Monte Carlo approximation. Some numerical experiments are then carried out to support our findings."

to:NB
prediction
time_series
statistics
computational_statistics
monte_carlo
ensemble_methods
december 2016 by cshalizi

Sequential Learning, Predictability, and Optimal Portfolio Returns - JOHANNES - 2014 - The Journal of Finance - Wiley Online Library

december 2016 by cshalizi

"This paper finds statistically and economically significant out-of-sample portfolio benefits for an investor who uses models of return predictability when forming optimal portfolios. Investors must account for estimation risk, and incorporate an ensemble of important features, including time-varying volatility, and time-varying expected returns driven by payout yield measures that include share repurchase and issuance. Prior research documents a lack of benefits to return predictability, and our results suggest that this is largely due to omitting time-varying volatility and estimation risk. We also document the sequential process of investors learning about parameters, state variables, and models as new data arrive."

to:NB
finance
prediction
time_series
online_learning
december 2016 by cshalizi

[1311.5828] The Splice Bootstrap

december 2016 by cshalizi

"This paper proposes a new bootstrap method to compute predictive intervals for nonlinear autoregressive time series model forecast. This method we call the splice boobstrap as it involves splicing the last p values of a given series to a suitably simulated series. This ensures that each simulated series will have the same set of p time series values in common, a necessary requirement for computing conditional predictive intervals. Using simulation studies we show the methods gives 90% intervals intervals that are similar to those expected from theory for simple linear and SETAR model driven by normal and non-normal noise. Furthermore, we apply the method to some economic data and demonstrate the intervals compare favourably with cross-validation based intervals."

to:NB
bootstrap
time_series
statistics
prediction
to_teach:undergrad-ADA
re:ADAfaEPoV
to_read
december 2016 by cshalizi

Links Between Multiplicity Automata, Observable Operator Models and Predictive State Representations -- a Unified Learning Framework

november 2016 by cshalizi

"Stochastic multiplicity automata (SMA) are weighted finite automata that generalize probabilistic automata. They have been used in the context of probabilistic grammatical inference. Observable operator models (OOMs) are a generalization of hidden Markov models, which in turn are models for discrete-valued stochastic processes and are used ubiquitously in the context of speech recognition and bio-sequence modeling. Predictive state representations (PSRs) extend OOMs to stochastic input-output systems and are employed in the context of agent modeling and planning.

"We present SMA, OOMs, and PSRs under the common framework of sequential systems, which are an algebraic characterization of multiplicity automata, and examine the precise relationships between them. Furthermore, we establish a unified approach to learning such models from data. Many of the learning algorithms that have been proposed can be understood as variations of this basic learning scheme, and several turn out to be closely related to each other, or even equivalent."

to:NB
re:AoS_project
stochastic_processes
statistics
prediction
state-space_models
automata_theory
"We present SMA, OOMs, and PSRs under the common framework of sequential systems, which are an algebraic characterization of multiplicity automata, and examine the precise relationships between them. Furthermore, we establish a unified approach to learning such models from data. Many of the learning algorithms that have been proposed can be understood as variations of this basic learning scheme, and several turn out to be closely related to each other, or even equivalent."

november 2016 by cshalizi

[1210.0103] On convergence rates of Bayesian predictive densities and posterior distributions

november 2016 by cshalizi

"Frequentist-style large-sample properties of Bayesian posterior distributions, such as consistency and convergence rates, are important considerations in nonparametric problems. In this paper we give an analysis of Bayesian asymptotics based primarily on predictive densities. Our analysis is unified in the sense that essentially the same approach can be taken to develop convergence rate results in iid, mis-specified iid, independent non-iid, and dependent data cases."

to:NB
bayesian_consistency
prediction
statistics
nonparametrics
re:bayes_as_evol
november 2016 by cshalizi

Optimal prediction of the number of unseen species

november 2016 by cshalizi

"Estimating the number of unseen species is an important problem in many scientific endeavors. Its most popular formulation, introduced by Fisher et al. [Fisher RA, Corbet AS, Williams CB (1943) J Animal Ecol 12(1):42−58], uses n samples to predict the number U of hitherto unseen species that would be observed if t⋅n new samples were collected. Of considerable interest is the largest ratio t between the number of new and existing samples for which U can be accurately predicted. In seminal works, Good and Toulmin [Good I, Toulmin G (1956) Biometrika 43(102):45−63] constructed an intriguing estimator that predicts U for all t≤1. Subsequently, Efron and Thisted [Efron B, Thisted R (1976) Biometrika 63(3):435−447] proposed a modification that empirically predicts U even for some t>1, but without provable guarantees. We derive a class of estimators that provably predict U all of the way up to t∝logn. We also show that this range is the best possible and that the estimator’s mean-square error is near optimal for any t. Our approach yields a provable guarantee for the Efron−Thisted estimator and, in addition, a variant with stronger theoretical and experimental performance than existing methodologies on a variety of synthetic and real datasets. The estimators are simple, linear, computationally efficient, and scalable to massive datasets. Their performance guarantees hold uniformly for all distributions, and apply to all four standard sampling models commonly used across various scientific disciplines: multinomial, Poisson, hypergeometric, and Bernoulli product."

to:NB
sampling
prediction
statistics
november 2016 by cshalizi

Modeling the Heavens: Sphairopoiia and Ptolemy’s Planetary Hypotheses

july 2016 by cshalizi

"This article investigates sphairopoiia, the art of making instruments that display the heavens, in Claudius Ptolemy’s Planetary Hypotheses. It takes up two questions: what kind of instrument does Ptolemy describe? And, could such an instrument have been constructed? I argue that Ptolemy did not propose one specific type of instrument, but instead he offered a range of possible designs, with the details to be worked out by the craftsman. Moreover, in addition to exhibiting his astronomical models and having the ability to estimate predictions, the instrument he proposed would have also shown the physical workings of the heavens. What emerges is both a clearer idea of what Ptolemy wanted the technician to build, and the purpose of such instruments."

to:NB
history_of_science
astronomy
ptolemy
modeling
prediction
history_of_technology
july 2016 by cshalizi

[1606.08813] EU regulations on algorithmic decision-making and a "right to explanation"

july 2016 by cshalizi

"We summarize the potential impact that the European Union's new General Data Protection Regulation will have on the routine use of machine learning algorithms. Slated to take effect as law across the EU in 2018, it will restrict automated individual decision-making (that is, algorithms that make decisions based on user-level predictors) which "significantly affect" users. The law will also create a "right to explanation," whereby a user can ask for an explanation of an algorithmic decision that was made about them. We argue that while this law will pose large challenges for industry, it highlights opportunities for machine learning researchers to take the lead in designing algorithms and evaluation frameworks which avoid discrimination."

to:NB
explanation
statistics
prediction
decision-making
flaxman.seth
july 2016 by cshalizi
