cshalizi + statistics   3592

The Identification Zoo: Meanings of Identification in Econometrics
"Over two dozen different terms for identification appear in the econometrics literature, including set identification, causal identification, local identification, generic identification, weak identification, identification at infinity, and many more. This survey (i) gives a new framework unifying existing definitions of point identification; (ii) summarizes and compares the zooful of different terms associated with identification that appear in the literature; and (iii) discusses concepts closely related to identification, such as normalizations and the differences in identification between structural models and causal, reduced form models."
to:NB  identifiability  statistics  econometrics 
4 days ago by cshalizi
Lei : Convergence and concentration of empirical measures under Wasserstein distance in unbounded functional spaces
"We provide upper bounds of the expected Wasserstein distance between a probability measure and its empirical version, generalizing recent results for finite dimensional Euclidean spaces and bounded functional spaces. Such a generalization can cover Euclidean spaces with large dimensionality, with the optimal dependence on the dimensionality. Our method also covers the important case of Gaussian processes in separable Hilbert spaces, with rate-optimal upper bounds for functional data distributions whose coordinates decay geometrically or polynomially. Moreover, our bounds of the expected value can be combined with mean-concentration results to yield improved exponential tail probability bounds for the Wasserstein error of empirical measures under Bernstein-type or log Sobolev-type conditions."
to:NB  concentration_of_measure  empirical_processes  statistics 
7 days ago by cshalizi
Fréchet analysis of variance for random objects | Biometrika | Oxford Academic
"Fréchet mean and variance provide a way of obtaining a mean and variance for metric space-valued random variables, and can be used for statistical analysis of data objects that lie in abstract spaces devoid of algebraic structure and operations. Examples of such data objects include covariance matrices, graph Laplacians of networks and univariate probability distribution functions. We derive a central limit theorem for the Fréchet variance under mild regularity conditions, using empirical process theory, and also provide a consistent estimator of the asymptotic variance. These results lead to a test for comparing kk populations of metric space-valued data objects in terms of Fréchet means and variances. We examine the finite-sample performance of this novel inference procedure through simulation studies on several special cases that include probability distributions and graph Laplacians, leading to a test for comparing populations of networks. The proposed approach has good finite-sample performance in simulations for different kinds of random objects. We illustrate the proposed methods by analysing data on mortality profiles of various countries and resting-state functional magnetic resonance imaging data."
to:NB  statistics  statistics_on_manifolds 
7 days ago by cshalizi
Network dependence testing via diffusion maps and distance-based correlations | Biometrika | Oxford Academic
"Deciphering the associations between network connectivity and nodal attributes is one of the core problems in network science. The dependency structure and high dimensionality of networks pose unique challenges to traditional dependency tests in terms of theoretical guarantees and empirical performance. We propose an approach to test network dependence via diffusion maps and distance-based correlations. We prove that the new method yields a consistent test statistic under mild distributional assumptions on the graph structure, and demonstrate that it is able to efficiently identify the most informative graph embedding with respect to the diffusion time. The methodology is illustrated on both simulated and real data."
to:NB  network_data_analysis  statistics  to_read  to_teach:baby-nets 
7 days ago by cshalizi
Causal inference with confounders missing not at random | Biometrika | Oxford Academic
"It is important to draw causal inference from observational studies, but this becomes challenging if the confounders have missing values. Generally, causal effects are not identifiable if the confounders are missing not at random. In this article we propose a novel framework for nonparametric identification of causal effects with confounders subject to an outcome-independent missingness, which means that the missing data mechanism is independent of the outcome, given the treatment and possibly missing confounders. We then propose a nonparametric two-stage least squares estimator and a parametric estimator for causal effects."
to:NB  causal_inference  statistics  missing_data 
7 days ago by cshalizi
On causal discovery with an equal-variance assumption | Biometrika | Oxford Academic
"Prior work has shown that causal structure can be uniquely identified from observational data when these follow a structural equation model whose error terms have equal variance. We show that this fact is implied by an ordering among conditional variances. We demonstrate that ordering estimates of these variances yields a simple yet state-of-the-art method for causal structure learning that is readily extendable to high-dimensional problems."
to:NB  causal_discovery  statistics  drton.mathias 
7 days ago by cshalizi
Bootstrapping spectral statistics in high dimensions | Biometrika | Oxford Academic
"Statistics derived from the eigenvalues of sample covariance matrices are called spectral statistics, and they play a central role in multivariate testing. Although bootstrap methods are an established approach to approximating the laws of spectral statistics in low-dimensional problems, such methods are relatively unexplored in the high-dimensional setting. The aim of this article is to focus on linear spectral statistics as a class of prototypes for developing a new bootstrap in high dimensions, a method we refer to as the spectral bootstrap. In essence, the proposed method originates from the parametric bootstrap and is motivated by the fact that in high dimensions it is difficult to obtain a nonparametric approximation to the full data-generating distribution. From a practical standpoint, the method is easy to use and allows the user to circumvent the difficulties of complex asymptotic formulas for linear spectral statistics. In addition to proving the consistency of the proposed method, we present encouraging empirical results in a variety of settings. Lastly, and perhaps most interestingly, we show through simulations that the method can be applied successfully to statistics outside the class of linear spectral statistics, such as the largest sample eigenvalue and others."
to:NB  spectral_methods  bootstrap  high-dimensional_statistics  statistics 
8 days ago by cshalizi
Kleijn , Zhao : Criteria for posterior consistency and convergence at a rate
"Frequentist conditions for asymptotic consistency of Bayesian procedures with i.i.d. data focus on lower bounds for prior mass in Kullback-Leibler neighbourhoods of the data distribution. The goal of this paper is to investigate the flexibility in these criteria. We derive a versatile new posterior consistency theorem, which is used to consider Kullback-Leibler consistency and indicate when it is sufficient to have a prior that charges metric balls instead of KL-neighbourhoods. We generalize our proposal to sieved models with Barron’s negligible prior mass condition and to separable models with variations on Walker’s condition. Results are also applied in semi-parametric consistency: support boundary estimation is considered explicitly and consistency is proved in a model for which Kullback-Leibler priors do not exist. As a further demonstration of applicability, we consider metric consistency at a rate: under a mild integrability condition, the second-order Ghosal-Ghosh-van der Vaart prior mass condition can be relaxed to a lower bound for ordinary KL-neighbourhoods. The posterior rate is derived in a parametric model for heavy-tailed distributions in which the Ghosal-Ghosh-van der Vaart condition cannot be satisfied by any prior."
to:NB  statistics  bayesian_consistency  nonparametrics 
9 days ago by cshalizi
[1909.01464] Rates of Convergence for Large-scale Nearest Neighbor Classification
"Nearest neighbor is a popular class of classification methods with many desirable properties. For a large data set which cannot be loaded into the memory of a single machine due to computation, communication, privacy, or ownership limitations, we consider the divide and conquer scheme: the entire data set is divided into small subsamples, on which nearest neighbor predictions are made, and then a final decision is reached by aggregating the predictions on subsamples by majority voting. We name this method the big Nearest Neighbor (bigNN) classifier, and provide its rates of convergence under minimal assumptions, in terms of both the excess risk and the classification instability, which are proven to be the same rates as the oracle nearest neighbor classifier and cannot be improved. To significantly reduce the prediction time that is required for achieving the optimal rate, we also consider the pre-training acceleration technique applied to the bigNN method, with proven convergence rate. We find that in the distributed setting, the optimal choice of the neighbor k should scale with both the total sample size and the number of partitions, and there is a theoretical upper limit for the latter. Numerical studies have verified the theoretical findings."
to:NB  nearest_neighbors  prediction  learning_theory  statistics  distributed_systems 
11 days ago by cshalizi
The accuracy, fairness, and limits of predicting recidivism | Science Advances
"Algorithms for predicting recidivism are commonly used to assess a criminal defendant’s likelihood of committing a crime. These predictions are used in pretrial, parole, and sentencing decisions. Proponents of these systems argue that big data and advanced machine learning make these analyses more accurate and less biased than humans. We show, however, that the widely used commercial risk assessment software COMPAS is no more accurate or fair than predictions made by people with little or no criminal justice expertise. In addition, despite COMPAS’s collection of 137 features, the same accuracy can be achieved with a simple linear classifier with only two features."

--- Pretty sure that The Kids matched the last result using decision trees (with the same two features).
to:NB  have_skimmed  crime  prediction  algorithmic_fairness  statistics  classifiers  to_teach:data-mining 
12 days ago by cshalizi
Unsupervised identification of the internal states that shape natural behavior | Nature Neuroscience
"Internal states shape stimulus responses and decision-making, but we lack methods to identify them. To address this gap, we developed an unsupervised method to identify internal states from behavioral data and applied it to a dynamic social interaction. During courtship, Drosophila melanogaster males pattern their songs using feedback cues from their partner. Our model uncovers three latent states underlying this behavior and is able to predict moment-to-moment variation in song-patterning decisions. These states correspond to different sensorimotor strategies, each of which is characterized by different mappings from feedback cues to song modes. We show that a pair of neurons previously thought to be command neurons for song production are sufficient to drive switching between states. Our results reveal how animals compose behavior from previously unidentified internal states, which is a necessary step for quantitative descriptions of animal behavior that link environmental cues, internal needs, neuronal activity and motor outputs."

--- Interesting to see how they select an HMM architecture. (I have Opinions about the Right Way To Do It, naturally.)

to:NB  neuroscience  state-space_models  time_series  psychology  inference_to_latent_objects  statistics  to_read 
14 days ago by cshalizi
The relationship between external variables and common factors | SpringerLink
"A theorem is presented which gives the range of possible correlations between a common factor and an external variable (i.e., a variable not included in the test battery factor analyzed). Analogous expressions for component (and regression component) theory are also derived. Some situations involving external correlations are then discussed which dramatize the theoretical differences between components and common factors."
in_NB  have_read  factor_analysis  inference_to_latent_objects  psychometrics  statistics  re:g_paper 
14 days ago by cshalizi
Some new results on factor indeterminacy | SpringerLink
"Some relations between maximum likelihood factor analysis and factor indeterminacy are discussed. Bounds are derived for the minimum average correlation between equivalent sets of correlated factors which depend on the latent roots of the factor intercorrelation matrix ψ. Empirical examples are presented to illustrate some of the theory and indicate the extent to which it can be expected to be relevant in practice."
in_NB  have_read  a_long_time_ago  factor_analysis  low-rank_approximation  statistics  re:g_paper 
14 days ago by cshalizi
Teacher Effects on Student Achievement and Height: A Cautionary Tale
"Estimates of teacher “value-added” suggest teachers vary substantially in their ability to promote student learning. Prompted by this finding, many states and school districts have adopted value-added measures as indicators of teacher job performance. In this paper, we conduct a new test of the validity of value-added models. Using administrative student data from New York City, we apply commonly estimated value-added models to an outcome teachers cannot plausibly affect: student height. We find the standard deviation of teacher effects on height is nearly as large as that for math and reading achievement, raising obvious questions about validity. Subsequent analysis finds these “effects” are largely spurious variation (noise), rather than bias resulting from sorting on unobserved factors related to achievement. Given the difficulty of differentiating signal from noise in real-world teacher effect estimates, this paper serves as a cautionary tale for their use in practice."
to:NB  value-added_measures  statistics  education  social_measurement  bad_data_analysis 
15 days ago by cshalizi
Random Forests | SpringerLink
"Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ***, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression."
to:NB  have_read  breiman.leo  ensemble_methods  decision_trees  random_forests  to_teach:data-mining  machine_learning  statistics  prediction 
15 days ago by cshalizi
Chen , Wang : Distributional properties and estimation in spatial image clustering
"Clusters of different objects are of great interest in many fields, such as agriculture and ecology. One kind of clustering methods is very different from the traditional statistical clustering analysis, which is based on discrete data points. This method of clustering defines clusters as the connected areas where a well-defined spatial random field is above certain threshold. The statistical properties, especially the distributional properties, of the defined clusters are vital for the studies of related fields. However, the available statistical techniques for analyzing clustering models are not applicable to these problems. We study the distribution properties of the clusters by defining a distribution function of the clusters rigorously and providing methods to estimate the spatial distribution function. Our results are illustrated by numerical experiments and an application to a real world problem."
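--- "Connected areas where a spatial random field is above a certain threshold" are easy to extract with scipy.ndimage.label. A toy sketch, with a simulated smooth Gaussian field and my choice of threshold:

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
# a smooth random field: white noise convolved with a Gaussian kernel
field = ndimage.gaussian_filter(rng.standard_normal((100, 100)), sigma=3)
# excursion set above one standard deviation; label connected components
labels, n_clusters = ndimage.label(field > field.std())
sizes = np.bincount(labels.ravel())[1:]   # label 0 is the background
print(n_clusters, sizes.mean())
```

The empirical distribution of `sizes` is one simple version of a "distribution function of the clusters".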
spatial_statistics  random_fields  to_teach:data_over_space_and_time  statistics 
15 days ago by cshalizi
Alquier , Marie : Matrix factorization for multivariate time series analysis
"Matrix factorization is a powerful data analysis tool. It has been used in multivariate time series analysis, leading to the decomposition of the series in a small set of latent factors. However, little is known on the statistical performances of matrix factorization for time series. In this paper, we extend the results known for matrix estimation in the i.i.d setting to time series. Moreover, we prove that when the series exhibit some additional structure like periodicity or smoothness, it is possible to improve on the classical rates of convergence."
in_NB  low-rank_approximation  time_series  factor_analysis  statistics  to_read  to_teach:data_over_space_and_time 
15 days ago by cshalizi
[1911.00535] Think-aloud interviews: A tool for exploring student statistical reasoning
"As statistics educators revise introductory courses to cover new topics and reach students from more diverse academic backgrounds, they need assessments to test if new teaching strategies and new curricula are meeting their goals. But assessing student understanding of statistics concepts can be difficult: conceptual questions are difficult to write clearly, and students often interpret questions in unexpected ways and give answers for unexpected reasons. Assessment results alone also do not clearly indicate the reasons students pick specific answers.
"We describe think-aloud interviews with students as a powerful tool to ensure that draft questions fulfill their intended purpose, uncover unexpected misconceptions or surprising readings of questions, and suggest new questions or further pedagogical research. We have conducted more than 40 hour-long think-aloud interviews to develop over 50 assessment questions, and have collected pre- and post-test assessment data from hundreds of introductory statistics students at two institutions.
"Think-alouds and assessment data have helped us refine draft questions and explore student misunderstandings. Our findings include previously under-reported statistical misconceptions about sampling distributions and causation. These results suggest directions for future statistics education research and show how think-aloud interviews can be effectively used to develop assessments and improve our understanding of student learning."
to:NB  to_read  heard_the_talk  kith_and_kin  statistics  cognitive_science  education  protocol_analysis  expertise 
22 days ago by cshalizi
[1910.12820] Empirical Differential Privacy
"We show how to achieve differential privacy with no or reduced added noise, based on the empirical noise in the data itself. Unlike previous works on noiseless privacy, the empirical viewpoint avoids making any explicit assumptions about the random process generating the data."
to:NB  differential_privacy  statistics 
29 days ago by cshalizi
[1910.14238] Learning Disentangled Representations for Recommendation
"User behavior data in recommender systems are driven by the complex interactions of many latent factors behind the users' decision making processes. The factors are highly entangled, and may range from high-level ones that govern user intentions, to low-level ones that characterize a user's preference when executing an intention. Learning representations that uncover and disentangle these latent factors can bring enhanced robustness, interpretability, and controllability. However, learning such disentangled representations from user behavior is challenging, and remains largely neglected by the existing literature. In this paper, we present the MACRo-mIcro Disentangled Variational Auto-Encoder (MacridVAE) for learning disentangled representations from user behavior. Our approach achieves macro disentanglement by inferring the high-level concepts associated with user intentions (e.g., to buy a shirt or a cellphone), while capturing the preference of a user regarding the different concepts separately. A micro-disentanglement regularizer, stemming from an information-theoretic interpretation of VAEs, then forces each dimension of the representations to independently reflect an isolated low-level factor (e.g., the size or the color of a shirt). Empirical results show that our approach can achieve substantial improvement over the state-of-the-art baselines. We further demonstrate that the learned representations are interpretable and controllable, which can potentially lead to a new paradigm for recommendation where users are given fine-grained control over targeted aspects of the recommendation lists."
to:NB  information_bottleneck  inference_to_latent_objects  recommender_systems  statistics 
29 days ago by cshalizi
[1910.14212] Sobolev Independence Criterion
"We propose the Sobolev Independence Criterion (SIC), an interpretable dependency measure between a high dimensional random variable X and a response variable Y . SIC decomposes to the sum of feature importance scores and hence can be used for nonlinear feature selection. SIC can be seen as a gradient regularized Integral Probability Metric (IPM) between the joint distribution of the two random variables and the product of their marginals. We use sparsity inducing gradient penalties to promote input sparsity of the critic of the IPM. In the kernel version we show that SIC can be cast as a convex optimization problem by introducing auxiliary variables that play an important role in feature selection as they are normalized feature importance scores. We then present a neural version of SIC where the critic is parameterized as a homogeneous neural network, improving its representation power as well as its interpretability. We conduct experiments validating SIC for feature selection in synthetic and real-world experiments. We show that SIC enables reliable and interpretable discoveries, when used in conjunction with the holdout randomization test and knockoffs to control the False Discovery Rate. Code is available at this http URL."
to:NB  dependence_measures  statistics  variable_selection 
29 days ago by cshalizi
[1910.02505] Boosting Local Causal Discovery in High-Dimensional Expression Data
"We study the performance of Local Causal Discovery (LCD), a simple and efficient constraint-based method for causal discovery, in predicting causal effects in large-scale gene expression data. We construct practical estimators specific to the high-dimensional regime. Inspired by the ICP algorithm, we use an optional preselection method and two different statistical tests. Empirically, the resulting LCD estimator is seen to closely approach the accuracy of ICP, the state-of-the-art method, while it is algorithmically simpler and computationally more efficient."
to:NB  causal_discovery  statistics 
29 days ago by cshalizi
[1905.00175] Boosting: Why You Can Use the HP Filter
"The Hodrick-Prescott (HP) filter is one of the most widely used econometric methods in applied macroeconomic research. Like all nonparametric methods, the HP filter depends critically on a tuning parameter that controls the degree of smoothing. Yet in contrast to modern nonparametric methods and applied work with these procedures, empirical practice with the HP filter almost universally relies on standard settings for the tuning parameter that have been suggested largely by experimentation with macroeconomic data and heuristic reasoning. As recent research (Phillips and Jin, 2015) has shown, standard settings may not be adequate in removing trends, particularly stochastic trends, in economic data.
"This paper proposes an easy-to-implement practical procedure of iterating the HP smoother that is intended to make the filter a smarter smoothing device for trend estimation and trend elimination. We call this iterated HP technique the boosted HP filter in view of its connection to L2-boosting in machine learning. The paper develops limit theory to show that the boosted HP (bHP) filter asymptotically recovers trend mechanisms that involve unit root processes, deterministic polynomial drifts, and polynomial drifts with structural breaks. A stopping criterion is used to automate the iterative HP algorithm, making it a data-determined method that is ready for modern data-rich environments in economic research. The methodology is illustrated using three real data examples that highlight the differences between simple HP filtering, the data-determined boosted filter, and an alternative autoregressive approach. These examples show that the bHP filter is helpful in analyzing a large collection of heterogeneous macroeconomic time series that manifest various degrees of persistence, trend behavior, and volatility."
to:NB  boosting  time_series  splines  statistics 
29 days ago by cshalizi
[1911.00190] Randomization as Regularization: A Degrees of Freedom Explanation for Random Forest Success
"Random forests remain among the most popular off-the-shelf supervised machine learning tools with a well-established track record of predictive accuracy in both regression and classification settings. Despite their empirical success as well as a bevy of recent work investigating their statistical properties, a full and satisfying explanation for their success has yet to be put forth. Here we aim to take a step forward in this direction by demonstrating that the additional randomness injected into individual trees serves as a form of implicit regularization, making random forests an ideal model in low signal-to-noise ratio (SNR) settings. Specifically, from a model-complexity perspective, we show that the mtry parameter in random forests serves much the same purpose as the shrinkage penalty in explicitly regularized regression procedures like lasso and ridge regression. To highlight this point, we design a randomized linear-model-based forward selection procedure intended as an analogue to tree-based random forests and demonstrate its surprisingly strong empirical performance. Numerous demonstrations on both real and synthetic data are provided."
to:NB  random_forests  statistics  decision_trees  ensemble_methods 
4 weeks ago by cshalizi
[1605.00252] Fast Rates for General Unbounded Loss Functions: from ERM to Generalized Bayes
"We present new excess risk bounds for general unbounded loss functions including log loss and squared loss, where the distribution of the losses may be heavy-tailed. The bounds hold for general estimators, but they are optimized when applied to η-generalized Bayesian, MDL, and empirical risk minimization estimators. In the case of log loss, the bounds imply convergence rates for generalized Bayesian inference under misspecification in terms of a generalization of the Hellinger metric as long as the learning rate η is set correctly. For general loss functions, our bounds rely on two separate conditions: the v-GRIP (generalized reversed information projection) conditions, which control the lower tail of the excess loss; and the newly introduced witness condition, which controls the upper tail. The parameter v in the v-GRIP conditions determines the achievable rate and is akin to the exponent in the Tsybakov margin condition and the Bernstein condition for bounded losses, which the v-GRIP conditions generalize; favorable v in combination with small model complexity leads to Õ (1/n) rates. The witness condition allows us to connect the excess risk to an "annealed" version thereof, by which we generalize several previous results connecting Hellinger and Rényi divergence to KL divergence."
to:NB  learning_theory  grunwald.peter  statistics 
4 weeks ago by cshalizi
[1807.05748] Learning Stochastic Differential Equations With Gaussian Processes Without Gradient Matching
"We introduce a novel paradigm for learning non-parametric drift and diffusion functions for stochastic differential equation (SDE). The proposed model learns to simulate path distributions that match observations with non-uniform time increments and arbitrary sparseness, which is in contrast with gradient matching that does not optimize simulated responses. We formulate sensitivity equations for learning and demonstrate that our general stochastic distribution optimisation leads to robust and efficient learning of SDE systems."
to:NB  stochastic_differential_equations  statistical_inference_for_stochastic_processes  gaussian_processes  statistics  nonparametrics 
4 weeks ago by cshalizi
Phylogenetic Comparative Methods and the Evolution of Multivariate Phenotypes | Annual Review of Ecology, Evolution, and Systematics
"Evolutionary biology is multivariate, and advances in phylogenetic comparative methods for multivariate phenotypes have surged to accommodate this fact. Evolutionary trends in multivariate phenotypes are derived from distances and directions between species in a multivariate phenotype space. For these patterns to be interpretable, phenotypes should be characterized by traits in commensurate units and scale. Visualizing such trends, as is achieved with phylomorphospaces, should continue to play a prominent role in macroevolutionary analyses. Evaluating phylogenetic generalized least squares (PGLS) models (e.g., phylogenetic analysis of variance and regression) is valuable, but using parametric procedures is limited to only a few phenotypic variables. In contrast, nonparametric, permutation-based PGLS methods provide a flexible alternative and are thus preferred for high-dimensional multivariate phenotypes. Permutation-based methods for evaluating covariation within multivariate phenotypes are also well established and can test evolutionary trends in phenotypic integration. However, comparing evolutionary rates and modes in multivariate phenotypes remains an important area of future development."
to:NB  phylogenetics  statistics 
4 weeks ago by cshalizi
[1911.01483] Statistical Inference for Model Parameters in Stochastic Gradient Descent via Batch Means
"Statistical inference of true model parameters based on stochastic gradient descent (SGD) has started receiving attention in recent years. In this paper, we study a simple algorithm to construct asymptotically valid confidence regions for model parameters using the batch means method. The main idea is to cancel out the covariance matrix which is hard/costly to estimate. In the process of developing the algorithm, we establish process-level function central limit theorem for Polyak-Ruppert averaging based SGD estimators. We also extend the batch means method to accommodate more general batch size specifications."
to:NB  optimization  estimation  statistics  re:HEAS 
4 weeks ago by cshalizi
[1911.01429] The frontier of simulation-based inference
"Many domains of science have developed complex simulations to describe phenomena of interest. While these simulations provide high-fidelity models, they are poorly suited for inference and lead to challenging inverse problems. We review the rapidly developing field of simulation-based inference and identify the forces giving new momentum to the field. Finally, we describe how the frontier is expanding so that a broad audience can appreciate the profound change these developments may have on science."
to:NB  statistics  simulation  to_read 
4 weeks ago by cshalizi
Modeling Uncertainty in Latent Class Membership: A Case Study in Criminology on JSTOR
"Social scientists are commonly interested in relating a latent trait (e.g., criminal tendency) to measurable individual covariates (e.g., poor parenting) to understand what defines or perhaps causes the latent trait. In this article we develop an efficient and convenient method for answering such questions. The basic model presumes that two types of variables have been measured: response variables (possibly longitudinal) that partially determine the latent class membership, and covariates or risk factors that we wish to relate to these latent class variables. The model assumes that these observable variables are conditionally independent, given the latent class variable. We use a mixture model for the joint distribution of the observables. We apply this model to a longitudinal dataset assembled as part of the Cambridge Study of Delinquent Development to test a fundamental theory of criminal development. This theory holds that crime is committed by two distinct groups within the population: adolescent-limited offenders and life-course-persistent offenders. As these labels suggest, the two groups are distinguished by the longevity of their offending careers. The theory also predicts that life-course-persistent offenders are disproportionately comprised of individuals born with neurological deficits and reared by caregivers without the skills and resources to effectively socialize a difficult child."
to:NB  crime  time_series  mixture_models  kith_and_kin  roeder.kathryn  statistics  inference_to_latent_objects  have_skimmed 
4 weeks ago by cshalizi
[1705.04867] Nearest Neighbors for Matrix Estimation Interpreted as Blind Regression for Latent Variable Model
"We consider the setup of nonparametric "blind regression" for estimating the entries of a large m×n matrix, when provided with a small, random fraction of noisy measurements. We assume that all rows u∈[m] and columns i∈[n] of the matrix are associated to latent features xrow(u) and xcol(i) respectively, and the (u,i)-th entry of the matrix, A(u,i), is equal to f(xrow(u),xcol(i)) for a latent function f. Given noisy observations of a small, random subset of the matrix entries, our goal is to estimate the unobserved entries of the matrix as well as to "de-noise" the observed entries. As the main result of this work, we introduce a nearest-neighbor-based estimation algorithm, and establish its consistency when the underlying latent function f is Lipschitz, the underlying latent space is a bounded diameter Polish space, and the random fraction of observed entries in the matrix is at least max(m^(−1+δ), n^(−1/2+δ)), for any δ>0. As an important byproduct, our analysis sheds light on the performance of the classical collaborative filtering algorithm for matrix completion, which has been widely utilized in practice. Experiments with the MovieLens and Netflix datasets suggest that our algorithm provides a principled improvement over basic collaborative filtering and is competitive with matrix factorization methods. Our algorithm has a natural extension to the setting of tensor completion via flattening the tensor to a matrix. When applied to the setting of image in-painting, which involves an order-3 tensor, we find that our approach is competitive with state-of-the-art tensor completion algorithms across benchmark images."
to:NB  matrix_factorization  collaborative_filtering  nearest_neighbors  statistics 
5 weeks ago by cshalizi
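--- A minimal sketch of the nearest-neighbor idea, under my own simplifications (dictionary-of-dicts storage for the partially observed matrix, and a hard distance threshold instead of the paper's tuned choices):

```python
def nn_estimate(M, u, i, radius=1.0):
    """Estimate missing entry M[u][i] by averaging the i-th entry of rows
    whose overlapping observed entries are close to row u's (toy sketch).
    M is a dict mapping row index -> dict of observed column -> value."""
    vals = []
    for v, row in M.items():
        if v == u or i not in row:
            continue
        overlap = [j for j in M[u] if j in row and j != i]
        if not overlap:
            continue
        # mean squared disagreement on the commonly observed columns
        dist2 = sum((M[u][j] - row[j]) ** 2 for j in overlap) / len(overlap)
        if dist2 <= radius ** 2:
            vals.append(row[i])
    return sum(vals) / len(vals) if vals else None
```

Rows whose overlapping observed entries nearly agree with row u are presumed to have similar latent features, so their i-th entries are averaged.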
Sadhanala , Tibshirani : Additive models with trend filtering
"We study additive models built with trend filtering, that is, additive models whose components are each regularized by the (discrete) total variation of their kth (discrete) derivative, for a chosen integer k≥0. This results in kth degree piecewise polynomial components (e.g., k=0 gives piecewise constant components, k=1 gives piecewise linear, k=2 gives piecewise quadratic, etc.). Analogous to its advantages in the univariate case, additive trend filtering has favorable theoretical and computational properties, thanks in large part to the localized nature of the (discrete) total variation regularizer that it uses. On the theory side, we derive fast error rates for additive trend filtering estimates, and show these rates are minimax optimal when the underlying function is additive and has component functions whose derivatives are of bounded variation. We also show that these rates are unattainable by additive smoothing splines (and by additive models built from linear smoothers, in general). On the computational side, we use backfitting, to leverage fast univariate trend filtering solvers; we also describe a new backfitting algorithm whose iterations can be run in parallel, which (as far as we can tell) is the first of its kind. Lastly, we present a number of experiments to examine the empirical performance of trend filtering."
to:NB  regression  additive_models  statistics  kith_and_kin  tibshirani.ryan 
5 weeks ago by cshalizi
Veitch , Roy : Sampling and estimation for (sparse) exchangeable graphs
"Sparse exchangeable graphs on ℝ+, and the associated graphex framework for sparse graphs, generalize exchangeable graphs on ℕ, and the associated graphon framework for dense graphs. We develop the graphex framework as a tool for statistical network analysis by identifying the sampling scheme that is naturally associated with the models of the framework, formalizing two natural notions of consistent estimation of the parameter (the graphex) underlying these models, and identifying general consistent estimators in each case. The sampling scheme is a modification of independent vertex sampling that throws away vertices that are isolated in the sampled subgraph. The estimators are variants of the empirical graphon estimator, which is known to be a consistent estimator for the distribution of dense exchangeable graphs; both can be understood as graph analogues to the empirical distribution in the i.i.d. sequence setting. Our results may be viewed as a generalization of consistent estimation via the empirical graphon from the dense graph regime to also include sparse graphs."
to:NB  graph_limits  network_data_analysis  graphons  nonparametrics  statistics  veitch.victor  to_teach:graphons 
5 weeks ago by cshalizi
Rinaldo , Wasserman , G’Sell : Bootstrapping and sample splitting for high-dimensional, assumption-lean inference
"Several new methods have been recently proposed for performing valid inference after model selection. An older method is sample splitting: use part of the data for model selection and the rest for inference. In this paper, we revisit sample splitting combined with the bootstrap (or the Normal approximation). We show that this leads to a simple, assumption-lean approach to inference and we establish results on the accuracy of the method. In fact, we find new bounds on the accuracy of the bootstrap and the Normal approximation for general nonlinear parameters with increasing dimension which we then use to assess the accuracy of regression inference. We define new parameters that measure variable importance and that can be inferred with greater accuracy than the usual regression coefficients. Finally, we elucidate an inference-prediction trade-off: splitting increases the accuracy and robustness of inference but can decrease the accuracy of the predictions."
to:NB  linear_regression  sample_splitting  statistics  kith_and_kin  rinaldo.alessandro  wasserman.larry  g'sell.max 
5 weeks ago by cshalizi
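--- The basic recipe (split, select, then do textbook inference on the held-out half) can be sketched in a few lines. Everything here is my own toy instantiation, not the paper's procedure: selection is "pick the most correlated predictor," and inference is a simple-regression slope with a normal-approximation interval:

```python
import math
import random
import statistics

def split_and_infer(X, y, z=1.96):
    """Sample splitting: select the most correlated predictor on one half,
    then form a normal-approximation CI for its simple-regression slope on
    the held-out half (toy sketch)."""
    n = len(y)
    idx = list(range(n))
    random.shuffle(idx)
    sel, inf = idx[:n // 2], idx[n // 2:]

    def corr(j, rows):
        xs = [X[i][j] for i in rows]
        ys = [y[i] for i in rows]
        mx, my = statistics.mean(xs), statistics.mean(ys)
        sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
        sy = math.sqrt(sum((b - my) ** 2 for b in ys))
        return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (sx * sy)

    j = max(range(len(X[0])), key=lambda k: abs(corr(k, sel)))  # model selection
    xs = [X[i][j] for i in inf]                                 # held-out inference
    ys = [y[i] for i in inf]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    beta = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx
    resid = [b - my - beta * (a - mx) for a, b in zip(xs, ys)]
    se = math.sqrt(sum(r * r for r in resid) / (len(xs) - 2) / sxx)
    return j, beta - z * se, beta + z * se
```

Because selection and inference never see the same data points, the interval on the second half is valid conditional on the selected model.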
[1910.13289] Optimal nonparametric multivariate change point detection and localization
"We study the multivariate nonparametric change point detection problem, where the data are a collection of independent p-dimensional random vectors, whose distributions are assumed to have piecewise-constant and uniformly Lipschitz densities, changing at unknown times, called change points. We quantify the magnitude of the distributional change at any change point with the supremum norm of the difference between the corresponding densities. We are concerned with the localization task of estimating the positions of the change points. In our analysis, we allow for the model parameters to vary with the total number of time points, including the minimal spacing between consecutive change points and the magnitude of the smallest distributional change. We provide information-theoretic lower bounds on both the localization rate and the minimal signal-to-noise ratio required to guarantee consistent localization. We formulate a novel algorithm based on kernel density estimation that nearly achieves the minimax lower bound, up to logarithmic factors. We provide extensive numerical evidence to support our theoretical findings."
to:NB  change-point_problem  statistics  kith_and_kin  rinaldo.alessandro 
5 weeks ago by cshalizi
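--- A one-dimensional toy version of the localization task, with my own choice of two-sample statistic (a weighted Kolmogorov-Smirnov-type distance rather than the paper's kernel density estimates):

```python
import math

def localize_change_point(xs):
    """Scan candidate split times and return the one maximizing a weighted
    two-sample KS-type statistic (1-D toy; the paper works with kernel
    density estimates of p-dimensional data)."""
    n = len(xs)
    grid = sorted(xs)
    best_t, best_stat = None, -1.0
    for t in range(5, n - 5):
        left, right = xs[:t], xs[t:]
        # sup-norm distance between the two empirical CDFs
        stat = max(abs(sum(a <= g for a in left) / t
                       - sum(b <= g for b in right) / (n - t)) for g in grid)
        stat *= math.sqrt(t * (n - t) / n)   # weight against extreme splits
        if stat > best_stat:
            best_t, best_stat = t, stat
    return best_t
```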
[1905.11255] Kernel Conditional Density Operators
"We introduce a novel conditional density estimation model termed the conditional density operator (CDO). It naturally captures multivariate, multimodal output densities and shows performance that is competitive with recent neural conditional density models and Gaussian processes. The proposed model is based on a novel approach to the reconstruction of probability densities from their kernel mean embeddings by drawing connections to estimation of Radon-Nikodym derivatives in the reproducing kernel Hilbert space (RKHS). We prove finite sample bounds for the estimation error in a standard density reconstruction scenario, independent of problem dimensionality. Interestingly, when a kernel is used that is also a probability density, the CDO allows us to both evaluate and sample the output density efficiently. We demonstrate the versatility and performance of the proposed model on both synthetic and real-world data."
to:NB  kernel_estimators  hilbert_space  density_estimation  statistics  to_read 
5 weeks ago by cshalizi
[1907.03783] Comparing EM with GD in Mixture Models of Two Components
"The expectation-maximization (EM) algorithm has been widely used in minimizing the negative log likelihood (also known as cross entropy) of mixture models. However, little is understood about the goodness of the fixed points it converges to. In this paper, we study the regions where one component is missing in two-component mixture models, which we call one-cluster regions. We analyze the propensity of such regions to trap EM and gradient descent (GD) for mixtures of two Gaussians and mixtures of two Bernoullis. In the case of Gaussian mixtures, EM escapes one-cluster regions exponentially fast, while GD escapes them linearly fast. In the case of mixtures of Bernoullis, we find that there exist one-cluster regions that are stable for GD and therefore trap GD, but those regions are unstable for EM, allowing EM to escape. Those regions are local minima that appear universally in experiments and can be arbitrarily bad. This work implies that EM is less likely than GD to converge to certain bad local optima in mixture models."
to:NB  mixture_models  em_algorithm  optimization  computational_statistics  statistics 
5 weeks ago by cshalizi
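--- For concreteness, here is a bare-bones EM for the two-component setting the abstract studies (unit variances, equal weights; the initialization at the data extremes is my choice, made to keep this sketch away from a one-cluster region):

```python
import math

def em_two_gaussians(xs, n_iter=100):
    """EM for a mixture of two unit-variance, equal-weight Gaussians
    (toy sketch)."""
    mu1, mu2 = min(xs), max(xs)
    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each point
        r = []
        for x in xs:
            a = math.exp(-0.5 * (x - mu1) ** 2)
            b = math.exp(-0.5 * (x - mu2) ** 2)
            r.append(a / (a + b))
        # M-step: responsibility-weighted means
        s = sum(r)
        mu1 = sum(ri * x for ri, x in zip(r, xs)) / s
        mu2 = sum((1 - ri) * x for ri, x in zip(r, xs)) / (len(xs) - s)
    return mu1, mu2
```

A "one-cluster region" in the paper's sense is a configuration where one component's responsibilities are all nearly zero; initializing both means at the same point would be a way to see the trapping behavior they analyze.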
[1910.12993] Characterizing Distribution Equivalence for Cyclic and Acyclic Directed Graphs
"The main way for defining equivalence among acyclic directed graphs is based on the conditional independencies of the distributions that they can generate. However, it is known that when cycles are allowed in the structure, conditional independence is not a suitable notion for equivalence of two structures, as it does not reflect all the information in the distribution that can be used for identification of the underlying structure. In this paper, we present a general, unified notion of equivalence for linear Gaussian directed graphs. Our proposed definition for equivalence is based on the set of distributions that the structure is able to generate. We take a first step towards devising methods for characterizing the equivalence of two structures, which may be cyclic or acyclic. Additionally, we propose a score-based method for learning the structure from observational data."
to:NB  graphical_models  causal_discovery  zhang.kun  statistics 
5 weeks ago by cshalizi
[1809.05014] Statistical Estimation of Ergodic Markov Chain Kernel over Discrete State Space
"We investigate the statistical complexity of estimating the parameters of a discrete-state Markov chain kernel from a single long sequence of state observations. In the finite case, we characterize (modulo logarithmic factors) the minimax sample complexity of estimation with respect to the operator infinity norm, while in the countably infinite case, we analyze the problem with respect to a natural entry-wise norm derived from total variation. We show that in both cases, the sample complexity is governed by the mixing properties of the unknown chain, for which, in the finite-state case, there are known finite-sample estimators with fully empirical confidence intervals."
to:NB  markov_models  statistical_inference_for_stochastic_processes  statistics  time_series  kith_and_kin  kontorovich.aryeh  to_teach:data_over_space_and_time 
5 weeks ago by cshalizi
[1910.13427] Distribution Density, Tails, and Outliers in Machine Learning: Metrics and Applications
"We develop techniques to quantify the degree to which a given (training or testing) example is an outlier in the underlying distribution. We evaluate five methods to score examples in a dataset by how well-represented the examples are, for different plausible definitions of "well-represented", and apply these to four common datasets: MNIST, Fashion-MNIST, CIFAR-10, and ImageNet. Despite being independent approaches, we find all five are highly correlated, suggesting that the notion of being well-represented can be quantified. Among other uses, we find these methods can be combined to identify (a) prototypical examples (that match human expectations); (b) memorized training examples; and, (c) uncommon submodes of the dataset. Further, we show how we can utilize our metrics to determine an improved ordering for curriculum learning, and impact adversarial robustness. We release all metric values on training and test sets we studied."

--- Interesting to see if they look at earlier work on outliers at all.
in_NB  outliers  adversarial_examples  statistics 
5 weeks ago by cshalizi
[1909.05442] Nonstationary Nonparametric Online Learning: Balancing Dynamic Regret and Model Parsimony
"An open challenge in supervised learning is "conceptual drift": a data point begins as classified according to one label, but over time the notion of that label changes. Beyond linear autoregressive models, transfer and meta learning address drift, but require data that is representative of disparate domains at the outset of training. To relax this requirement, we propose a memory-efficient "online" universal function approximator based on compressed kernel methods. Our approach hinges upon viewing non-stationary learning as online convex optimization with dynamic comparators, for which performance is quantified by dynamic regret.
"Prior works control dynamic regret growth only for linear models. In contrast, we hypothesize actions belong to reproducing kernel Hilbert spaces (RKHS). We propose a functional variant of online gradient descent (OGD) operating in tandem with greedy subspace projections. Projections are necessary to surmount the fact that RKHS functions have complexity proportional to time.
"For this scheme, we establish sublinear dynamic regret growth in terms of both loss variation and functional path length, and that the memory of the function sequence remains moderate. Experiments demonstrate the usefulness of the proposed technique for online nonlinear regression and classification problems with non-stationary data."
to:NB  non-stationarity  time_series  online_learning  statistics  re:growing_ensemble_project 
6 weeks ago by cshalizi
[1810.01509] Hierarchical community detection by recursive partitioning
"The problem of community detection in networks is usually formulated as finding a single partition of the network into some "correct" number of communities. We argue that it is more interpretable and in some regimes more accurate to construct a hierarchical tree of communities instead. This can be done with a simple top-down recursive partitioning algorithm, starting with a single community and separating the nodes into two communities by spectral clustering repeatedly, until a stopping rule suggests there are no further communities. This class of algorithms is model-free, computationally efficient, and requires no tuning other than selecting a stopping rule. We show that there are regimes where this approach outperforms K-way spectral clustering, and propose a natural framework for analyzing the algorithm's theoretical performance, the binary tree stochastic block model. Under this model, we prove that the algorithm correctly recovers the entire community tree under relatively mild assumptions. We also apply the algorithm to a dataset of statistics papers to construct a hierarchical tree of statistical research communities."
to:NB  community_discovery  statistics  network_data_analysis  levina.elizaveta  sarkar.purna  bickel.peter_j. 
6 weeks ago by cshalizi
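--- One recursive step of this kind of algorithm is easy to write down. The following toy version computes the Fiedler vector of the graph Laplacian by power iteration (the spectral shift and the 500-iteration budget are my own implementation choices, not the paper's):

```python
import math
import random

def spectral_bisect(adj):
    """One step of top-down partitioning: split nodes by the sign of the
    Fiedler vector of the graph Laplacian, found by power iteration with
    the constant vector projected out (toy sketch). adj is a dense 0/1
    adjacency matrix as a list of lists."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    shift = 2 * max(deg) + 1            # makes shift*I - L positive definite
    v = [random.random() for _ in range(n)]
    for _ in range(500):
        mean = sum(v) / n
        v = [x - mean for x in v]       # stay orthogonal to the constant vector
        # multiply by shift*I - L, where (L v)_i = deg_i v_i - sum_j A_ij v_j
        w = [shift * v[i] - deg[i] * v[i] + sum(adj[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return [1 if x > 0 else 0 for x in v]
```

Recursing on each side, with a stopping rule deciding when a piece has no further community structure, gives the hierarchical tree the paper describes.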
[1905.03806] Think Globally, Act Locally: A Deep Neural Network Approach to High-Dimensional Time Series Forecasting
"Forecasting high-dimensional time series plays a crucial role in many applications such as demand forecasting and financial predictions. Modern datasets can have millions of correlated time-series that evolve together, i.e., they are extremely high dimensional (one dimension for each individual time-series). There is a need for exploiting global patterns and coupling them with local calibration for better prediction. However, most recent deep learning approaches in the literature are one-dimensional, i.e., even though they are trained on the whole dataset, during prediction, the future forecast for a single dimension mainly depends on past values from the same dimension. In this paper, we seek to correct this deficiency and propose DeepGLO, a deep forecasting model which thinks globally and acts locally. In particular, DeepGLO is a hybrid model that combines a global matrix factorization model regularized by a temporal convolution network, along with another temporal network that can capture local properties of each time-series and associated covariates. Our model can be trained effectively on high-dimensional but diverse time series, where different time series can have vastly different scales, without a priori normalization or rescaling. Empirical results demonstrate that DeepGLO can outperform state-of-the-art approaches; for example, we see more than 25% improvement in WAPE over other methods on a public dataset that contains more than 100K-dimensional time series."
to:NB  time_series  statistics  prediction 
6 weeks ago by cshalizi
[1910.12207] An Active Approach for Model Interpretation
"Model interpretation, or explanation of a machine learning classifier, aims to extract generalizable knowledge from a trained classifier into a human-understandable format, for various purposes such as model assessment, debugging and trust. From a computational viewpoint, it is formulated as approximating the target classifier using a simpler interpretable model, such as rule models like a decision set/list/tree. Often, this approximation is handled as standard supervised learning and the only difference is that the labels are provided by the target classifier instead of ground truth. This paradigm is particularly popular because there exists a variety of well-studied supervised algorithms for learning an interpretable classifier. However, we argue that this paradigm is suboptimal for it does not utilize the unique property of the model interpretation problem, that is, the ability to generate synthetic instances and query the target classifier for their labels. We call this the active-query property, suggesting that we should consider model interpretation from an active learning perspective. Following this insight, we argue that the active-query property should be employed when designing a model interpretation algorithm, and that the generation of synthetic instances should be integrated seamlessly with the algorithm that learns the model interpretation. In this paper, we demonstrate that by doing so, it is possible to achieve more faithful interpretation with simpler model complexity. As a technical contribution, we present an active algorithm Active Decision Set Induction (ADS) to learn a decision set, a set of if-else rules, for model interpretation. ADS performs a local search over the space of all decision sets. In every iteration, ADS computes confidence intervals for the value of the objective function of all local actions and utilizes active-query to determine the best one."
to:NB  experimental_design  statistics  data_mining 
6 weeks ago by cshalizi
[1905.13195] Modeling Uncertainty by Learning a Hierarchy of Deep Neural Connections
"Modeling uncertainty in deep neural networks, despite recent important advances, is still an open problem. Bayesian neural networks are a powerful solution, where the prior over network weights is a design choice, often a normal distribution or other distribution encouraging sparsity. However, this prior is agnostic to the generative process of the input data, which might lead to unwarranted generalization for out-of-distribution tested data. We suggest the presence of a confounder for the relation between the input data and the discriminative function given the target label. We propose an approach for modeling this confounder by sharing neural connectivity patterns between the generative and discriminative networks. This approach leads to a new deep architecture, where networks are sampled from the posterior of local causal structures, and coupled into a compact hierarchy. We demonstrate that sampling networks from this hierarchy, proportionally to their posterior, is efficient and enables estimating various types of uncertainties. Empirical evaluations of our method demonstrate significant improvement compared to state-of-the-art calibration and out-of-distribution detection methods."

--- This seems totally ad-hoc.
to:NB  uncertainty_for_neural_networks  statistics  to_be_shot_after_a_fair_trial 
6 weeks ago by cshalizi
[1902.02979] Fair Decisions Despite Imperfect Predictions
"Consequential decisions are increasingly informed by sophisticated data-driven predictive models. However, consistently learning accurate predictive models requires access to ground truth labels. Unfortunately, in practice, labels may only exist conditional on certain decisions---if a loan is denied, there is not even an option for the individual to pay back the loan. In this paper, we show that, in this selective labels setting, learning to predict is suboptimal in terms of both fairness and utility. To avoid this undesirable behavior, we propose to directly learn stochastic decision policies that maximize utility under fairness constraints. In the context of fair machine learning, our results suggest the need for a paradigm shift from "learning to predict" to "learning to decide". Experiments on synthetic and real-world data illustrate the favorable properties of learning to decide, in terms of both utility and fairness."
to:NB  missing_data  classifiers  algorithmic_fairness  statistics  to_teach:data-mining 
6 weeks ago by cshalizi
[1810.11953] Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift
"We might hope that when faced with unexpected inputs, well-designed software systems would fire off warnings. Machine learning (ML) systems, however, which depend strongly on properties of their inputs (e.g. the i.i.d. assumption), tend to fail silently. This paper explores the problem of building ML systems that fail loudly, investigating methods for detecting dataset shift, identifying exemplars that most typify the shift, and quantifying shift malignancy. We focus on several datasets and various perturbations to both covariates and label distributions with varying magnitudes and fractions of data affected. Interestingly, we show that across the dataset shifts that we explore, a two-sample-testing-based approach, using pre-trained classifiers for dimensionality reduction, performs best. Moreover, we demonstrate that domain-discriminating approaches tend to be helpful for characterizing shifts qualitatively and determining if they are harmful."
to:NB  dataset_shift  machine_learning  model_checking  lipton.zachary  two-sample_tests  statistics 
6 weeks ago by cshalizi
[1910.12618] Textual Data for Time Series Forecasting
"While ubiquitous, textual sources of information such as company reports, social media posts, etc. are hardly included in prediction algorithms for time series, despite the relevant information they may contain. In this work, openly accessible daily weather reports from France and the United Kingdom are leveraged to predict time series of national electricity consumption, average temperature and wind-speed with a single pipeline. Two methods of numerical representation of text are considered, namely traditional Term Frequency - Inverse Document Frequency (TF-IDF) as well as our own neural word embedding. Using exclusively text, we are able to predict the aforementioned time series with sufficient accuracy to be used to replace missing data. Furthermore the proposed word embeddings display geometric properties relating to the behavior of the time series and context similarity between words."
to:NB  text_mining  natural_language_processing  prediction  time_series  statistics 
6 weeks ago by cshalizi
[1910.12478] Tensor Programs I: Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes
"Wide neural networks with random weights and biases are Gaussian processes, as observed by Neal (1995) for shallow networks, and more recently by Lee et al. (2018) and Matthews et al. (2018) for deep fully-connected networks, as well as by Novak et al. (2019) and Garriga-Alonso et al. (2019) for deep convolutional networks. We show that this Neural Network-Gaussian Process correspondence surprisingly extends to all modern feedforward or recurrent neural networks composed of multilayer perceptron, RNNs (e.g. LSTMs, GRUs), (nD or graph) convolution, pooling, skip connection, attention, batch normalization, and/or layer normalization. More generally, we introduce a language for expressing neural network computations, and our result encompasses all such expressible neural networks. This work serves as a tutorial on the *tensor programs* technique formulated in Yang (2019) and elucidates the Gaussian Process results obtained there. We provide open-source implementations of the Gaussian Process kernels of simple RNN, GRU, transformer, and batchnorm+ReLU network at this http URL."
to:NB  gaussian_processes  neural_networks  statistics 
6 weeks ago by cshalizi
[1910.12414] Locality-Sensitive Hashing for f-Divergences: Mutual Information Loss and Beyond
"Computing approximate nearest neighbors in high dimensional spaces is a central problem in large-scale data mining with a wide range of applications in machine learning and data science. A popular and effective technique in computing nearest neighbors approximately is the locality-sensitive hashing (LSH) scheme. In this paper, we aim to develop LSH schemes for distance functions that measure the distance between two probability distributions, particularly for f-divergences as well as a generalization to capture mutual information loss. First, we provide a general framework to design LSH schemes for f-divergence distance functions and develop LSH schemes for the generalized Jensen-Shannon divergence and triangular discrimination in this framework. We show a two-sided approximation result for approximation of the generalized Jensen-Shannon divergence by the Hellinger distance, which may be of independent interest. Next, we show a general method of reducing the problem of designing an LSH scheme for a Krein kernel (which can be expressed as the difference of two positive definite kernels) to the problem of maximum inner product search. We exemplify this method by applying it to the mutual information loss, due to its several important applications such as model compression."
to:NB  entropy_estimation  locality-senstive_hashing  computational_statistics  statistics 
6 weeks ago by cshalizi
[1907.02306] Consistent Regression using Data-Dependent Coverings
"In this paper, we introduce a novel method to generate interpretable regression function estimators. The idea is based on what we call data-dependent coverings. The aim is to extract from the data a covering of the feature space instead of a partition. The estimator predicts the empirical conditional expectation over the cells of the partitions generated from the coverings. Thus, such an estimator has the same form as those issued from data-dependent partitioning algorithms. We give sufficient conditions to ensure consistency, avoiding the sufficient condition of shrinkage of the cells that appears in the earlier literature. Doing so, we reduce the number of covering elements. We show that such coverings are interpretable and each element of the covering is tagged as significant or insignificant. The proof of consistency is based on a control of the error of the empirical estimation of conditional expectations, which is interesting in its own right."
to:NB  statistics  regression  nonparametrics  to_read 
6 weeks ago by cshalizi
[1910.11397] The covariate-adjusted residual estimator and its use in both randomized trials and observational settings
"We often seek to estimate the causal effect of an exposure on a particular outcome in both randomized and observational settings. One such estimation method is the covariate-adjusted residuals estimator, which was designed for individually or cluster randomized trials. In this manuscript, we study the properties of this estimator and develop a new estimator that utilizes both covariate adjustment and inverse probability weighting. We support our theoretical results with a simulation study and an application in an infectious disease setting. The covariate-adjusted residuals estimator is an efficient and unbiased estimator of the average treatment effect in randomized trials; however, it is not guaranteed to be unbiased in observational studies. Our novel estimator, the covariate-adjusted residuals estimator with inverse probability weighting, is unbiased in randomized and observational settings, under a reasonable set of assumptions. Furthermore, when these assumptions hold, it provides efficiency gains over inverse probability weighting in observational studies. The covariate-adjusted residuals estimator is valid for use in randomized trials, but should not be used in observational studies. The covariate-adjusted residuals estimator with inverse probability weighting provides an efficient alternative for use in randomized and observational settings."

--- I'll be interested to see how this differs from the usual "doubly-robust" estimator.
to:NB  causal_inference  statistics 
6 weeks ago by cshalizi
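--- For reference against the paper's estimators, here is the plain IPW estimator of the average treatment effect, with propensities estimated as empirical treatment rates within discrete covariate strata (my own toy setup; the paper's contribution is combining this kind of weighting with covariate-adjusted residuals):

```python
from collections import defaultdict

def ipw_ate(rows):
    """Inverse-probability-weighted average treatment effect. Each row is
    (x, a, y): discrete covariate, 0/1 treatment, outcome (toy sketch)."""
    counts = defaultdict(lambda: [0, 0])      # stratum -> [n treated, n total]
    for x, a, y in rows:
        counts[x][0] += a
        counts[x][1] += 1
    t1 = t0 = 0.0
    for x, a, y in rows:
        e = counts[x][0] / counts[x][1]       # estimated P(A = 1 | X = x)
        if a == 1:
            t1 += y / e
        else:
            t0 += y / (1 - e)
    return (t1 - t0) / len(rows)
```

On confounded data, the weighting recovers the true effect where the naive difference of group means does not.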
[1910.11410] Almost Politically Acceptable Criminal Justice Risk Assessment
"In criminal justice risk forecasting, one can prove that it is impossible to optimize accuracy and fairness at the same time. One can also prove that it is impossible to optimize at once all of the usual group definitions of fairness. In the policy arena, one is left with tradeoffs about which many stakeholders will adamantly disagree. In this paper, we offer a different approach. We do not seek perfectly accurate and fair risk assessments. We seek politically acceptable risk assessments. We describe and apply to data on 300,000 offenders a machine learning approach that responds to many of the most visible charges of "racial bias." Regardless of whether such claims are true, we adjust our procedures to compensate. We begin by training the algorithm on White offenders only and computing risk with test data separately for White offenders and Black offenders. Thus, the fitted algorithm structure is exactly the same for both groups; the algorithm treats all offenders as if they are White. But because White and Black offenders can bring different predictor distributions to the White-trained algorithm, we provide additional adjustments if needed. Insofar as conventional machine learning procedures do not produce the accuracy and fairness that some stakeholders require, it is possible to alter conventional practice to respond explicitly to many salient stakeholder claims, even if they are unsupported by the facts. The results can be politically acceptable risk assessment tools."
to:NB  prediction  crime  algorithmic_fairness  statistics  data_mining  to_teach:data-mining  berk.richard  not_often_you_see_an_abstract_framed_as_"fine_have_it_your_way" 
6 weeks ago by cshalizi
[1910.11743] Boosting heritability: estimating the genetic component of phenotypic variation with multiple sample splitting
"Heritability is a central measure in genetics quantifying how much of the variability observed in a trait is attributable to genetic differences. Existing methods for estimating heritability are most often based on random-effect models, typically for computational reasons. The alternative of using a fixed-effect model has received much more limited attention in the literature. In this paper, we propose a generic strategy for heritability inference, termed "boosting heritability", which combines several advantageous features of different recent methods to produce an estimate of the heritability within a high-dimensional linear model. Boosting heritability uses in particular a multiple sample splitting strategy, which leads to a more stable estimate. We use antibiotic resistance data from a major human pathogen, Streptococcus pneumoniae, to demonstrate the applicability of our inference strategy."
to:NB  heritability  statistics  genetics  ensemble_methods 
6 weeks ago by cshalizi
[1910.11445] Finite Mixtures of ERGMs for Ensembles of Networks
"Ensembles of networks arise in many scientific fields, but currently there are few statistical models aimed at understanding their generative processes. To fill in this gap, we propose characterizing network ensembles via finite mixtures of exponential family random graph models, employing a Metropolis-within-Gibbs algorithm to conduct Bayesian inference. Simulation studies show that the proposed procedure can recover the true cluster assignments and cluster-specific parameters. We demonstrate the utility of the proposed approach using an ensemble of political co-voting networks among U.S. Senators."
to:NB  exponential_family_random_graphs  ensemble_methods  network_data_analysis  statistics  butts.carter  to_teach:baby-nets 
6 weeks ago by cshalizi
[1910.11537] Unified model selection approach based on minimum description length principle in Granger causality analysis
"Granger causality analysis (GCA) provides a powerful tool for uncovering the patterns of brain connectivity mechanisms using neuroimaging techniques. Conventional GCA applies two different mathematical theories in a two-stage scheme: (1) the Bayesian information criterion (BIC) or Akaike information criterion (AIC) for the regression model orders associated with endogenous and exogenous information; (2) F-statistics for determining the causal effects of exogenous variables. While specifying endogenous and exogenous effects is essentially the same model selection problem, this scheme can produce different benchmarks in the two stages and therefore degrade the performance of GCA. In this paper, we present a unified model selection approach based on the minimum description length (MDL) principle for GCA in the context of the general regression model paradigm. Compared with conventional methods, our approach emphasizes that a single mathematical theory should be held throughout the GCA process. Under this framework, all candidate models within the model space can be compared freely in terms of code length, without the need for an intermediate model. We illustrate its advantages over the conventional two-stage GCA approach in synthetic experiments on a 3-node network and a 5-node network. The unified model selection approach is capable of identifying the actual connectivity while avoiding the false influences of noise. More importantly, the proposed approach obtains more consistent results on a challenging fMRI dataset for causality investigation, the mental calculation network under visual and auditory stimuli. The proposed approach has the potential to accommodate other Granger causality representations in other function spaces, and the comparison between different GC representations in different function spaces can also be naturally dealt with in this framework."
to:NB  time_series  granger_causality  minimum_description_length  statistics 
6 weeks ago by cshalizi
[1910.11540] Descriptive Dimensionality and Its Characterization of MDL-based Learning and Change Detection
"This paper introduces a new notion of dimensionality of probabilistic models from an information-theoretic viewpoint. We call it the "descriptive dimension" (Ddim). We show that Ddim coincides with the number of independent parameters for a parametric class, and can further be extended to real-valued dimensionality when a number of models are mixed. The paper then derives the rate of convergence of the MDL (Minimum Description Length) learning algorithm, which outputs a normalized maximum likelihood (NML) distribution with the model of the shortest NML codelength. The paper proves that the rate is governed by Ddim. The paper also derives error probabilities of the MDL-based test for multiple model change detection, and proves that they are also governed by Ddim. Through this analysis, we demonstrate that Ddim is an intrinsic quantity which characterizes the performance of MDL-based learning and change detection."
to:NB  minimum_description_length  statistics  estimation  prediction  change-point_problem 
6 weeks ago by cshalizi
[1910.11492] Causal inference for climate change events from satellite image time series using computer vision and deep learning
"We propose a method for causal inference using satellite image time series, in order to determine the treatment effects of interventions which impact climate change, such as deforestation. Simply put, the aim is to quantify the 'before versus after' effect of climate-related human-driven interventions, such as urbanization, as well as natural disasters, such as hurricanes and forest fires. As a concrete example, we focus on quantifying forest tree cover change/deforestation due to human-led causes. The proposed method involves the following steps. First, we use computer vision and machine learning/deep learning techniques to detect and quantify forest tree coverage levels at every time epoch. We then examine this time series to identify changepoints. Next, we estimate the expected (forest tree cover) values using a Bayesian structural causal model, projecting/forecasting the counterfactual. This is compared to the values actually observed post intervention, and the difference between the two gives us the effect of the intervention (as compared to the non-intervention scenario, i.e. what would have happened without the intervention). As a specific use case, we analyze deforestation levels before and after the hyperinflation event (intervention) in Brazil (which ended in 1993-94), for the Amazon rainforest region around Rondonia, Brazil. For this deforestation use case, our causal inference framework can help causally attribute changes/reductions in forest tree cover and increasing deforestation rates to human activities at various points in time."
to:NB  causal_inference  climate_change  statistics  spatio-temporal_statistics  to_teach:data_over_space_and_time 
6 weeks ago by cshalizi
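The "forecast the counterfactual, then difference" step can be illustrated with a bare-bones stand-in: a linear pre-intervention trend instead of the paper's Bayesian structural model, with the series and the effect size invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

# toy monthly 'forest cover' series: linear trend plus noise, intervention at t0
T, t0 = 120, 80
t = np.arange(T)
cover = 100 - 0.1 * t + rng.normal(0, 0.5, T)
cover[t0:] -= 5.0                      # intervention effect (assumed, for illustration)

# fit the pre-intervention trend and project the counterfactual forward
coef = np.polyfit(t[:t0], cover[:t0], 1)
counterfactual = np.polyval(coef, t[t0:])

# estimated effect = observed minus counterfactual, averaged post-intervention
effect = (cover[t0:] - counterfactual).mean()
```

Replacing the trend fit with a full Bayesian structural time-series model (as the paper does) mainly changes how the counterfactual and its uncertainty are computed; the observed-minus-forecast logic is the same.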
[1910.12343] Disentangling synchrony from serial dependency in paired event time series
"Quantifying synchronization phenomena based on the timing of events has recently attracted a great deal of interest in various disciplines such as neuroscience or climatology. A multitude of similarity measures has been proposed for this purpose, including Event Synchronization (ES) and Event Coincidence Analysis (ECA) as two widely applicable examples. While ES defines synchrony in a data-adaptive local way that does not distinguish between different time scales, ECA requires selecting a specific scale for analysis. In this paper, we use slightly modified versions of both ES and ECA that address previous issues with respect to proper normalization and boundary treatment, which are particularly relevant for short time series with low temporal resolution. By numerically studying threshold-crossing events in coupled autoregressive processes, we identify a practical limitation of ES when attempting to study synchrony between serially dependent event sequences exhibiting event clustering in time. Practical implications of this observation are demonstrated for the case of functional network representations of climate extremes based on both ES and ECA, while no marked differences between the two measures are observed for the case of epileptic electroencephalogram (EEG) data. Our findings suggest that careful event detection along with diligent preprocessing is recommended when applying ES, while being less crucial for ECA. Despite the lack of a general \emph{modus operandi} for both event and synchronization detection, we suggest ECA as a widely robust method, especially for time-resolved synchronization analyses of event time series from various disciplines."
to:NB  synchronization  time_series  statistics  to_teach:data_over_space_and_time 
6 weeks ago by cshalizi
[1910.04851] Addressing Failure Prediction by Learning Model Confidence
"Assessing reliably the confidence of a deep neural network and predicting its failures is of primary importance for the practical deployment of these models. In this paper, we propose a new target criterion for model confidence, corresponding to the True Class Probability (TCP). We show that TCP is better suited to this task than the classic Maximum Class Probability (MCP). We also provide theoretical guarantees for TCP in the context of failure prediction. Since the true class is in essence unknown at test time, we propose to learn the TCP criterion on the training set, introducing a specific learning scheme adapted to this context. Extensive experiments are conducted to validate the relevance of the proposed approach. We study various network architectures and small- and large-scale datasets for image classification and semantic segmentation. We show that our approach consistently outperforms several strong methods, from MCP to Bayesian uncertainty, as well as recent approaches specifically designed for failure prediction."
to:NB  prediction  uncertainty_for_neural_networks  statistics 
6 weeks ago by cshalizi
[1810.11571] Analysis of KNN Information Estimators for Smooth Distributions
"The KSG mutual information estimator, which is based on the distances of each sample to its k-th nearest neighbor, is widely used to estimate mutual information between two continuous random variables. Existing work has analyzed the convergence rate of this estimator for random variables whose densities are bounded away from zero on their support. In practice, however, the KSG estimator also performs well for a much broader class of distributions, including not only those with bounded support and densities bounded away from zero, but also those with bounded support but densities approaching zero, and those with unbounded support. In this paper, we analyze the convergence rate of the error of the KSG estimator for smooth distributions, whose densities can have either bounded or unbounded support. As the KSG mutual information estimator can be viewed as an adaptive recombination of KL entropy estimators, our analysis also provides a convergence analysis of the KL entropy estimator for a broad class of distributions."
to:NB  entropy_estimation  statistics 
6 weeks ago by cshalizi
Audits as Evidence
"We develop tools for utilizing correspondence experiments to detect illegal discrimination by individual employers. Employers violate US employment law if their propensity to contact applicants depends on protected characteristics such as race or sex. We establish identification of higher moments of the causal effects of protected characteristics on callback rates as a function of the number of fictitious applications sent to each job ad. These moments are used to bound the fraction of jobs that illegally discriminate. Applying our results to three experimental datasets, we find evidence of significant employer heterogeneity in discriminatory behavior, with the standard deviation of gaps in job-specific callback probabilities across protected groups averaging roughly twice the mean gap. In a recent experiment manipulating racially distinctive names, we estimate that at least 85% of jobs that contact both of two white applications and neither of two black applications are engaged in illegal discrimination. To assess more carefully the tradeoff between type I and II errors presented by these behavioral patterns, we consider the performance of a series of decision rules for investigating suspicious callback behavior under a simple two-type model that rationalizes the experimental data. Though, in our preferred specification, only 17% of employers are estimated to discriminate on the basis of race, we find that an experiment sending 10 applications to each job would enable accurate detection of 7-10% of discriminators while falsely accusing fewer than 0.2% of non-discriminators. A minimax decision rule acknowledging partial identification of the joint distribution of callback rates yields higher error rates but more investigations than our baseline two-type model. Our results suggest illegal labor market discrimination can be reliably monitored with relatively small modifications to existing audit designs."
to:NB  racism  sociology  statistics  audit_studies  via:jbdelong 
6 weeks ago by cshalizi
[1910.12327] A simple measure of conditional dependence
"We propose a coefficient of conditional dependence between two random variables Y and Z given a set of other variables X1,…,Xp, based on an i.i.d. sample. The coefficient has a long list of desirable properties, the most important of which is that under absolutely no distributional assumptions, it converges to a limit in [0,1], where the limit is 0 if and only if Y and Z are conditionally independent given X1,…,Xp, and is 1 if and only if Y is equal to a measurable function of Z given X1,…,Xp. Using this statistic, we devise a new variable selection algorithm, called Feature Ordering by Conditional Independence (FOCI), which is model-free, has no tuning parameters, and is provably consistent under sparsity assumptions. A number of applications to synthetic and real datasets are worked out."
to:NB  dependence_measures  statistics  variable_selection 
6 weeks ago by cshalizi
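The conditional coefficient here builds on a simpler unconditional rank coefficient in the Chatterjee style; a sketch of that simpler cousin (not the conditional statistic itself, which additionally uses nearest neighbours in the conditioning variables), assuming no ties for brevity:

```python
import numpy as np

def xi_coefficient(x, y):
    # Chatterjee-style rank coefficient: sort the ranks of y by increasing x,
    # then measure how much neighbouring ranks jump.  Near 0 under independence,
    # near 1 when y is a measurable function of x.  (Assumes no ties.)
    n = len(x)
    ranks = np.argsort(np.argsort(y[np.argsort(x)])) + 1
    return 1 - 3 * np.abs(np.diff(ranks)).sum() / (n ** 2 - 1)

rng = np.random.default_rng(6)
x = rng.normal(size=2000)
noise = rng.normal(size=2000)
```

Note the asymmetry in x and y, matching the asymmetric "Y is a function of Z" limit described in the abstract; y = x**2 scores near 1 even though the Pearson correlation would be near 0.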
[1711.02141] Optimal rates of entropy estimation over Lipschitz balls
"We consider the problem of minimax estimation of the entropy of a density over Lipschitz balls. Dropping the usual assumption that the density is bounded away from zero, we obtain the minimax rate (n ln n)^(−s/(s+d)) + n^(−1/2) for 0 < s ≤ 2 for densities supported on [0,1]^d, where s is the smoothness parameter and n is the number of independent samples. We generalize the results to densities with unbounded support: given an Orlicz function Ψ of rapid growth (such as the sub-exponential and sub-Gaussian classes), the minimax rate for densities with bounded Ψ-Orlicz norm increases to (n ln n)^(−s/(s+d)) (Ψ^(−1)(n))^(d(1−d/p(s+d))) + n^(−1/2), where p is the norm parameter of the Lipschitz ball. We also show that integral-form plug-in estimators with kernel density estimates fail to achieve the minimax rates, and characterize their worst-case performance over the Lipschitz ball.
"One of the key steps in analyzing the bias relies on a novel application of the Hardy-Littlewood maximal inequality, which also leads to a new inequality on the Fisher information that may be of independent interest."
to:NB  entropy_estimation  minimax  statistics 
6 weeks ago by cshalizi
[1910.12128] Joint Analysis of Social and Item Response Networks with Latent Space Models
"The adjustment of students to a school environment is fundamentally linked to the friendship networks they form with their peers. Consequently, the complete picture of a student's adjustment can only be obtained by taking into account both their friendship network and their reported perceptions of the school environment. However, there is a lack of flexible statistical models and methods that can jointly analyze a social network with an item-response data matrix. In this paper, we propose a latent space model for heterogeneous (multimodal) networks (LSMH) and its extension LSMH-Item, which combine the framework of latent space modeling in network analysis with item response theory in psychometrics. Using LSMH, we summarize the information from the social network and the item responses in a person-item joint latent space. We use a Variational Bayesian Expectation-Maximization estimation algorithm to estimate the item and person locations in the joint latent space. This methodology allows for effective integration, informative visualization, and prediction of social networks and item responses. We apply the proposed methodology to data collected from 16 third-grade classrooms, comprising 451 third-grade students' self-reported friendships and school liking, collected as part of the Early Learning Ohio project. Through the person-item joint latent space, we are able to identify students with potential adjustment difficulties, and find consistent connections between students' friendship networks and their well-being. We believe that using LSMH, researchers will be able to easily identify students in need of intervention and revolutionize the understanding of social behaviors."
to:NB  social_networks  network_data_analysis  psychometrics  statistics 
6 weeks ago by cshalizi
[1602.02201] The Rate-Distortion Risk in Estimation from Compressed Data
"We consider the problem of estimating a latent signal from a lossy compressed version of the data. We assume that the data is generated by an underlying signal and compressed using a lossy compression scheme that is agnostic to this signal. In reconstruction, the underlying signal is estimated so as to minimize a prescribed loss measure. For the above setting and an arbitrary distortion measure between the data and its compressed version, we define the rate-distortion (RD) risk of an estimator as its risk with respect to the distribution achieving Shannon's RD function with respect to this distortion. We derive conditions under which the RD risk describes the risk in estimating from the compressed data. The main theoretical tools to obtain these conditions are transportation-cost inequalities in conjunction with properties of source codes achieving Shannon's RD function. We show that these conditions hold in various settings, including settings where the alphabet of the underlying signal is finite or when the RD achieving distribution is multivariate normal. We evaluate the RD risk in special cases under these settings. This risk provides an achievable loss in compress-and-estimate settings, i.e., when the data is first compressed, communicated or stored using a procedure that is agnostic to the underlying signal, which is later estimated from the compressed version of the data. Our results imply the following general procedure for designing estimators from datasets undergoing lossy compression without specifying the actual compression technique; train the estimator based on a perturbation of the data according to the RD achieving distribution. Under general conditions, this estimator achieves the RD risk when applied to the lossy compressed version of the data."
to:NB  estimation  information_theory  compression  statistics 
6 weeks ago by cshalizi
[1910.11299] Fraud Detection in Networks: State-of-the-art
"Financial fraud detection represents the challenge of finding anomalies in networks of financial transactions. In general, anomaly detection (AD) is the problem of distinguishing between normal data samples, with well-defined patterns or signatures, and those that do not conform to the expected profiles. Fraudulent behaviour in money laundering may manifest itself through unusual patterns in financial transaction networks. In such networks, nodes represent customers and the edges are transactions: a directed edge between two nodes indicates a money transfer in the respective direction, where the weight on the edge is the transferred amount. In this paper we present a survey of the fundamental anomaly detection techniques and then briefly review the relevant literature in the context of fraud detection."
to:NB  fraud  classifiers  network_data_analysis  relational_learning  statistics 
6 weeks ago by cshalizi
[1910.10871] Preventing Adversarial Use of Datasets through Fair Core-Set Construction
We propose improving the privacy properties of a dataset by publishing only a strategically chosen "core-set" of the data containing a subset of the instances. The core-set allows strong performance on primary tasks, but forces poor performance on unwanted tasks. We give methods for both linear models and neural networks and demonstrate their efficacy on data.
data_mining  computational_statistics  statistics  prediction  compression 
6 weeks ago by cshalizi
[1909.10831] Entropy from Machine Learning
"We translate the problem of calculating the entropy of a set of binary configurations/signals into a sequence of supervised classification tasks. Subsequently, one can use virtually any machine learning classification algorithm for computing entropy. This procedure can be used to compute entropy, and consequently the free energy directly from a set of Monte Carlo configurations at a given temperature. As a test of the proposed method, using an off-the-shelf machine learning classifier we reproduce the entropy and free energy of the 2D Ising model from Monte Carlo configurations at various temperatures throughout its phase diagram. Other potential applications include computing the entropy of spiking neurons or any other multidimensional binary signals."
to:NB  classifiers  entropy_estimation  statistics  to_be_shot_after_a_fair_trial 
6 weeks ago by cshalizi
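One standard way to turn entropy estimation into a sequence of supervised prediction problems is the chain rule H(X) = sum_i H(X_i | X_1, ..., X_{i-1}), with each conditional term estimated by a classifier. Whether or not this is exactly the paper's reduction, the chain-rule version is easy to sketch with a lookup-table "classifier" (real applications would substitute a generalizing model):

```python
import numpy as np

def chain_rule_entropy(samples):
    # Estimate H (in bits) of binary vectors by summing conditional entropies
    # H(x_i | x_{<i}), each read off a lookup-table predictor of bit i from the
    # preceding bits.  A minimal sketch: only feasible for short bit strings.
    n, d = samples.shape
    H = 0.0
    for i in range(d):
        # group rows by prefix pattern, estimate P(x_i = 1 | prefix) by counting
        prefixes = {}
        for row in samples:
            key = tuple(row[:i])
            ones, total = prefixes.get(key, (0, 0))
            prefixes[key] = (ones + row[i], total + 1)
        for ones, total in prefixes.values():
            p = ones / total
            if 0 < p < 1:
                H -= (total / n) * (p * np.log2(p) + (1 - p) * np.log2(1 - p))
    return H

rng = np.random.default_rng(4)
# i.i.d. fair coin bits: true entropy is d bits
d = 4
samples = rng.integers(0, 2, size=(5000, d))
```

Swapping the lookup table for any probabilistic classifier (as the abstract suggests) is what makes the approach scale to high-dimensional configurations, since the classifier generalizes across prefixes instead of enumerating them.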
[1910.10382] How well can we learn large factor models without assuming strong factors?
"In this paper, we consider the problem of learning models with a latent factor structure. The focus is to find what is possible and what is impossible if the usual strong factor condition is not imposed. We study the minimax rate and adaptivity issues in two problems: pure factor models and panel regression with interactive fixed effects. For pure factor models, if the number of factors is known, we develop adaptive estimation and inference procedures that attain the minimax rate. However, when the number of factors is not specified a priori, we show that there is a tradeoff between validity and efficiency: any confidence interval that has uniform validity for arbitrary factor strength has to be conservative; in particular its width is bounded away from zero even when the factors are strong. Conversely, any data-driven confidence interval that does not require as an input the exact number of factors (including weak ones) and has shrinking width under strong factors does not have uniform coverage and the worst-case coverage probability is at most 1/2. For panel regressions with interactive fixed effects, the tradeoff is much better. We find that the minimax rate for learning the regression coefficient does not depend on the factor strength and propose a simple estimator that achieves this rate."
to:NB  factor_analysis  high-dimensional_statistics  confidence_sets  statistics 
6 weeks ago by cshalizi
[1906.06583] Linear regression with stationary errors: the R package slm
"This paper introduces the R package slm, which stands for Stationary Linear Models. The package contains a set of statistical procedures for linear regression in the general context where the error process is strictly stationary with short memory. We work in the setting of Hannan (1973), who proved the asymptotic normality of the (normalized) least squares estimators (LSE) under very mild conditions on the error process. We propose different ways to estimate the asymptotic covariance matrix of the LSE, and then to correct the type I error rates of the usual tests on the parameters (as well as the confidence intervals). The procedures are evaluated through different sets of simulations, and two examples of real datasets are studied."
to:NB  linear_regression  time_series  statistics  to_teach:data_over_space_and_time  to_teach:linear_models 
6 weeks ago by cshalizi
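The slm package itself is in R; a generic Python sketch of the same idea is OLS point estimates plus a Bartlett-kernel (Newey-West-type) estimate of the LSE covariance. This is one plausible choice among the several covariance estimators slm offers, not a port of the package's code, and the AR(1) errors and bandwidth rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)

# strictly stationary, short-memory errors: AR(1) with coefficient 0.7
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = 0.7 * eps[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + eps

# ordinary least squares
X = np.column_stack([np.ones(n), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)
u = y - X @ beta

# Bartlett-kernel (Newey-West) long-run covariance of the score X_t * u_t
L = int(4 * (n / 100) ** (2 / 9))          # a common bandwidth rule of thumb
S = (X * u[:, None]).T @ (X * u[:, None]) / n
for l in range(1, L + 1):
    w = 1 - l / (L + 1)
    G = (X[l:] * u[l:, None]).T @ (X[:-l] * u[:-l, None]) / n
    S += w * (G + G.T)

XtX_inv = np.linalg.inv(X.T @ X / n)
cov_beta = XtX_inv @ S @ XtX_inv / n        # sandwich covariance of the LSE
se_hac = np.sqrt(np.diag(cov_beta))

# naive i.i.d.-error standard errors, for comparison
se_naive = np.sqrt(np.diag(np.linalg.inv(X.T @ X)) * (u @ u) / (n - 2))
```

With positively autocorrelated errors the naive intercept standard error is too small, which is exactly the type I error inflation the corrected tests address.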
[1910.02301] Change Detection in Noisy Dynamic Networks: A Spectral Embedding Approach
"Change detection in dynamic networks is an important problem in many areas, such as fraud detection, cyber intrusion detection and health care monitoring. It is a challenging problem because it involves a time sequence of graphs, each of which is usually very large and sparse with heterogeneous vertex degrees, resulting in a complex, high dimensional mathematical object. Spectral embedding methods provide an effective way to transform a graph to a lower dimensional latent Euclidean space that preserves the underlying structure of the network. Although change detection methods that use spectral embedding are available, they do not address sparsity and degree heterogeneity that usually occur in noisy real-world graphs and a majority of these methods focus on changes in the behaviour of the overall network.
"In this paper, we adapt previously developed techniques in spectral graph theory and propose a novel concept of applying Procrustes techniques to embedded points for vertices in a graph to detect changes in entity behaviour. Our spectral embedding approach not only addresses sparsity and degree heterogeneity issues, but also obtains an estimate of the appropriate embedding dimension. We call this method CDP (change detection using Procrustes analysis). We demonstrate the performance of CDP through extensive simulation experiments and a real-world application. CDP successfully detects various types of vertex-based changes including (i) changes in vertex degree, (ii) changes in community membership of vertices, and (iii) unusual increase or decrease in edge weight between vertices. The change detection performance of CDP is compared with two other baseline methods that employ alternative spectral embedding approaches. In both cases, CDP generally shows superior performance."
to:NB  network_data_analysis  spectral_methods  re:network_differences  change-point_problem  statistics 
6 weeks ago by cshalizi
How to Use Economic Theory to Improve Estimators: Shrinking Toward Theoretical Restrictions | The Review of Economics and Statistics | MIT Press Journals
"We propose to use economic theories to construct shrinkage estimators that perform well when the theories' empirical implications are approximately correct but perform no worse than unrestricted estimators when the theories' implications do not hold. We implement this construction in various settings, including labor demand and wage inequality, and estimation of consumer demand. We provide asymptotic and finite sample characterizations of the behavior of the proposed estimators. Our approach is an alternative to the use of theory as something to be tested or to be imposed on estimates. Our approach complements uses of theory for identification and extrapolation."
to:NB  economics  econometrics  statistics  estimation  shrinkage 
6 weeks ago by cshalizi
[1910.10174] Leveraging directed causal discovery to detect latent common causes
"The discovery of causal relationships is a fundamental problem in science and medicine. In recent years, many elegant approaches to discovering causal relationships between two variables from uncontrolled data have been proposed. However, most of these deal only with purely directed causal relationships and cannot detect latent common causes. Here, we devise a general method which takes a purely directed causal discovery algorithm and modifies it so that it can also detect latent common causes. The identifiability of the modified algorithm depends on the identifiability of the original, as well as an assumption that the strength of noise be relatively small. We apply our method to two directed causal discovery algorithms, the Information Geometric Causal Inference of (Daniusis et al., 2010) and the Kernel Conditional Deviance for Causal Inference of (Mitrovic, Sejdinovic, and Teh, 2018), and extensively test on synthetic data---detecting latent common causes in additive, multiplicative and complex noise regimes---and on real data, where we are able to detect known common causes. In addition to detecting latent common causes, our experiments demonstrate that both modified algorithms preserve the performance of the original directed algorithm in distinguishing directed causal relations."
to:NB  causal_discovery  inference_to_latent_objects  statistics 
6 weeks ago by cshalizi
Anonymous Heterogeneous Distributed Detection: Optimal Decision Rules, Error Exponents, and the Price of Anonymity - IEEE Journals & Magazine
"We explore the fundamental limits of heterogeneous distributed detection in an anonymous sensor network with n sensors and a single fusion center. The fusion center collects the single observation from each of the n sensors to detect a binary parameter. The sensors are clustered into multiple groups, and different groups follow different distributions under a given hypothesis. The key challenge for the fusion center is the anonymity of the sensors: although it knows the exact number of sensors and the distribution of observations in each group, it does not know which group each sensor belongs to. It is hence natural to consider this as a composite hypothesis testing problem. First, we propose an optimal test, called the mixture likelihood ratio test, which is a randomized threshold test based on the ratio of the uniform mixture of all the possible distributions under one hypothesis to that under the other hypothesis. Optimality is shown by first arguing that there exists an optimal test that is symmetric, that is, one that does not depend on the order of observations across the sensors, and then proving that the mixture likelihood ratio test is optimal among all symmetric tests. Second, we focus on the Neyman-Pearson setting and characterize the error exponent of the worst-case type-II error probability as n tends to infinity, assuming the number of sensors in each group is proportional to n. Finally, we generalize our result to find the collection of all achievable type-I and type-II error exponents, showing that the boundary of the region can be obtained by solving an optimization problem. Our results elucidate the price of anonymity in heterogeneous distributed detection, and can be extended to M-ary hypothesis testing with heterogeneous observations generated according to hidden latent variables."
to:NB  hypothesis_testing  distributed_systems  statistics  information_theory 
6 weeks ago by cshalizi
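A toy version of the mixture likelihood ratio test: two equal-size Gaussian groups with made-up means, and a fixed (non-randomized) threshold. The paper's setting allows general group sizes, distributions, and randomized thresholds; the point here is only that anonymity is handled by scoring each observation against the uniform mixture over groups.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def mixture_loglik(obs, means):
    # uniform mixture over the two groups, since group labels are anonymous
    dens = 0.5 * normal_pdf(obs, means[0], 1.0) + 0.5 * normal_pdf(obs, means[1], 1.0)
    return np.log(dens).sum()

def mlrt(obs, means0=(0.0, 0.0), means1=(1.0, 2.0), threshold=0.0):
    # decide H1 when the mixture log-likelihood ratio exceeds the threshold
    llr = mixture_loglik(obs, means1) - mixture_loglik(obs, means0)
    return int(llr > threshold)

rng = np.random.default_rng(3)
n = 100
# data from H1: half the sensors have mean 1, half mean 2, order unknown to the test
obs_h1 = rng.permutation(np.concatenate([rng.normal(1, 1, n // 2),
                                         rng.normal(2, 1, n // 2)]))
obs_h0 = rng.normal(0, 1, n)
```

Shuffling obs_h1 has no effect on the decision, illustrating the symmetry (permutation invariance) that the paper's optimality argument exploits.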
[1910.04832] Learning interaction kernels in heterogeneous systems of agents from multiple trajectories
"Systems of interacting particles or agents have wide applications in many disciplines such as Physics, Chemistry, Biology and Economics. These systems are governed by interaction laws, which are often unknown: estimating them from observation data is a fundamental task that can provide meaningful insights and accurate predictions of the behaviour of the agents. In this paper, we consider the inverse problem of learning interaction laws given data from multiple trajectories, in a nonparametric fashion, when the interaction kernels depend on pairwise distances. We establish a condition for learnability of interaction kernels, and construct estimators that are guaranteed to converge in a suitable L2 space, at the optimal min-max rate for 1-dimensional nonparametric regression. We propose an efficient learning algorithm based on least squares, which can be implemented in parallel for multiple trajectories and is therefore well-suited for the high-dimensional, big-data regime. Numerical simulations on a variety of examples, including opinion dynamics, predator-swarm dynamics and heterogeneous particle dynamics, suggest that the learnability condition is satisfied in models used in practice, and that the rate of convergence of our estimator is consistent with the theory. These simulations also suggest that our estimators are robust to noise in the observations, and produce accurate predictions of the dynamics over relatively large time intervals, even when they are learned from data collected over short time intervals."
to:NB  interacting_particle_systems  statistical_inference_for_stochastic_processes  statistics 
6 weeks ago by cshalizi
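The least-squares estimator the abstract describes has a simple structure in the distance-dependent case: the observed velocities are linear in the unknown kernel, so expanding the kernel in a basis turns estimation into ordinary least squares. A minimal sketch, assuming a hypothetical first-order 1-D system with kernel $\phi(r) = e^{-r}$, a piecewise-constant basis, and finite-difference velocities; the setup and parameters are illustrative, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
phi_true = lambda r: np.exp(-r)  # assumed true interaction kernel (illustrative)

# simulate one trajectory of dx_i/dt = (1/N) sum_j phi(|x_j - x_i|)(x_j - x_i)
N, T, dt = 8, 200, 0.01
x = rng.uniform(0.0, 5.0, N)
traj = [x.copy()]
for _ in range(T):
    diffs = x[None, :] - x[:, None]                 # diffs[i, j] = x_j - x_i
    v = (phi_true(np.abs(diffs)) * diffs).mean(axis=1)
    x = x + dt * v                                  # Euler step
    traj.append(x.copy())
traj = np.array(traj)

# least-squares fit of phi on K piecewise-constant bins over pairwise distances:
# v_i(t) = sum_k c_k * (1/N) sum_j 1{r_ij in bin k} (x_j - x_i), linear in c
K = 20
edges = np.linspace(0.0, 5.0, K + 1)
rows, targets = [], []
for t in range(T):
    xt = traj[t]
    v_obs = (traj[t + 1] - xt) / dt                 # finite-difference velocities
    diffs = xt[None, :] - xt[:, None]
    bins = np.clip(np.digitize(np.abs(diffs), edges) - 1, 0, K - 1)
    for i in range(N):
        row = np.zeros(K)
        for j in range(N):
            if j != i:
                row[bins[i, j]] += diffs[i, j] / N
        rows.append(row)
        targets.append(v_obs[i])
coef, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
# coef[k] now estimates phi on the k-th distance bin (where pairs were observed)
```

Each trajectory contributes its own block of rows to the design matrix, which is why the method parallelizes across trajectories; empty bins simply get zero columns and are unidentified, reflecting the paper's point that learnability depends on which distances the dynamics actually visit.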

related tags

20th_century_history  abstract_algebra  academia  accuracy_vs_precision  acree.brice  active_learning  adamic.lada  adams.terrence  addiction  additive_models  adversarial_examples  advertising  affectionate_parody  affirmative_action  afghanistan  age-period-cohort  agent-based_models  aggregation  ai  airoldi.edo  alcoholism  algebra  algebraic_statistics  algorithmic_fairness  algorithmic_information_theory  algorithms  allen.genevera_i.  allometry  alquier.pierre  amari.shun-ichi  america  american_history  american_south  analysis  analysis_of_variance  ancient_history  ancient_trade  anderson.chris  anderson.norm  anomaly_detection  antidepressants  approximate_bayesian_computation  approximation  archaeology  arlot.sylvain  arms_trade  aronow.peter  artificial_intelligence  art_history  asta.dena  astronomy  asymptotics  athey.susan  attention  attractor_reconstruction  audit_studies  autism  automata_theory  automated_diagnosis  ay.nihat  azizyan.martin  a_long_time_ago  baby_steps  bacanu.silviu-alin  bach.francis_r.  bad_data_analysis  bad_science  bad_science_journalism  bakshy.eytan  bandit_problems  banks.david  barely-comprehensible_mathematics  barron.andrew_w.  bartlett.m.s.  bartlett.peter_l.  base_rates  bayesianism  bayesian_consistency  bayesian_nonparametrics  bayes_rule  beaumont.mark  because_everything_is_clearer_with_category_theory  beirl.wolfgang  belkin.mikhail  beran.jan  berger.james  bergstrom.carl_t.  berk.richard  berk.robert_h  bernstein-von_mises  bialek.william  biau.gerard  bibliometry  bickel.david_r.  bickel.peter_j.  big_data  biochemical_networks  bioinformatics  biology  biophysics  birds  birge.lucien  bitcoin  blanchard.gilles  blattman.chris  blei.david  blitzstein.joseph  blogged  blogging  blogs  bollt.erik_m.  
books:can't_really_recommend  books:contributed-to  books:noted  books:owned  books:recommended  books:reviewed  book_reviews  boosting  bootstrap  borgs.christian  borsboom.denny  boucheron.stephane  bounded_rationality  bousquet.olivier  bradley.elizabeth  branching_processes  breiman.leo  brillinger.david  brown.emery  brown.lawrence  buhlmann.peter  buntine.wray  bureaucracy  burke.timothy  busy_busy_busy  butts.carter  but_why_not_just_regularize?_why_for_the_love_of_god?  cai.t._tony  calibration  cambridge_analytica  cart  cartoons  caruana.rich  carvalho.carlos  categorical_data  category_theory  catoni.olivier  cats  causality  causal_discovery  causal_inference  cavalli-sforza.l.luca  celisse.alain  central_limit_theorem  cesa-bianchi.nicolo  chains_with_complete_connections  chandrasekhar.arun  change-point_problem  change_of_representation  chaos  chatterjee.sourav  chayes.jennifer  chemistry  chen.yen-chi  choi.david_s.  cholera  chouldechova.alexandra  chow-liu_trees  citation_networks  cities  clarke.kevin  classifiers  class_struggles_in_america  clauset.aaron  clermont.gilles  cleveland.william_s.  
climate_change  climatology  clinical_trials  clinical_vs_actuarial_judgment  clustering  cmu  coarse-graining  coates.ta-nehisi  coddington.mark  cognitive_biases  cognitive_development  cognitive_science  cohen.william  cold_war  coleman.todd  collaborative_filtering  collective_cognition  collinearity  communism  community_discovery  comparative_history  comparative_methods  comparing_distributions  complete_correct  complexity  complexity_measures  compressed_sensing  compression  computability  computational_complexity  computational_statistics  concentration_of_measure  conditional_density_estimation  conditional_independence  conditional_independence_tests  conferences  confidence_sets  confirmation_bias  conformal_prediction  confounding  congress  consistency  contagion  content_analysis  context-free_grammars  contingency_tables  control_theory_and_control_engineering  convergence_of_stochastic_processes  convexity  convex_sets  cool_if_true  copulas  corporations  correlation  corruption  cosmology  covariance  covariate_shift  cover.thomas  coveted  cox.d.r  cox.d.r.  cox.david_r.  cramer-rao  cramer-rao_inequality  crawford.forrest  crime  cross-validation  crutchfield.james_p.  cultural_evolution  cumulants  curse_of_dimensionality  curve-estimation  curve_fitting  cyclones  damouras.sotirios  danks.david  darmon.david  das.kaustav  dasgupta.anirban  databases  dataset_shift  data_analysis  data_cleaning  data_collection  data_mining  data_science  data_sets  davison.anthony  dawid.a.p.  
dawid.a._philip  dawid.philip  debunking  deceiving_us_has_become_an_industrial_process  decision-making  decision_theory  decision_trees  deconvolution  dedeo.simon  default_priors  degrees_of_freedom  delong.brad  del_moral.pierre  dembo.amir  demography  density_estimation  density_ratio_estimation  dependence_measures  design_for_a_brain  development_economics  development_policy  deviation_inequalities  devlin.bernie  devroye.luc  dewitt.helen  diaconis.persi  didelez.vanessa  did_this_really_need_saying?  differential_equations  differential_geometry  differential_privacy  diffusion_maps  diffusion_of_innovations  diggle.peter  dimension_estimation  dimension_reduction  directed_information  discretization  discrimination  distance_covariance  distributed_systems  distributions  diversity  dobra.adrian  domingos.pedro  donoho.david  douc.randal  downloaded  drton.mathias  drugs  dsges  dsm  dsquared  duchi.john_c.  dudoit.sandrine  dunn.robin  dynamical_systems  dynamics_in_cognition  dynamic_networks  eagle.nathan  early_modern_european_history  earthquakes  eberhardt.frederick  eckles.dean  ecology  econometrics  economics  economics_of_superstars  economic_growth  economic_history  eddy.william_f.  
education  EEG  efromovich.sam  efron.brad  efron.bradley  eichler.michael  ellenberg.jordan  elwert.felix  empirical_likelihood  empirical_processes  em_algorithm  encompassing  energy  ensemble_methods  entableted  entropy  entropy_estimation  entropy_rate  epidemics_on_networks  epidemic_models  epidemiology  epidemiology_of_representations  epilepsy  epistemology  equations_of_motion_from_a_time_series  equations_of_motion_from_time_series  ergodic_decomposition  ergodic_theory  error-in-variables  errors-in-variables  error_statistics  estimation  estimation_of_dynamical_systems  ethnography  events  everything_old_is_new_again  evidence  evidence_based  evisceration  evolution  evolutionary_biology  evolutionary_optimization  evolutionary_psychology  evolution_of_cooperation  exchangeability  exchangeable_arrays  exchangeable_sequences  expectation-maximization  experimental_design  experimental_economics  experimental_psychology  experimental_sociology  experiments  expertise  explainable_predictors  explanation  explanation_by_mechanisms  exploratory_data_analysis  exponential_families  exponential_family_random_graphs  extreme_values  facebook  factor_analysis  fact_checking  fairness  fairness_in_machine_learning  fan.jianqing  faraway.j.j.  fast-and-frugal_heuristics  feedback  fienberg.stephen  fienberg.stephen_e.  filtering  finance  financial_markets  financial_speculation  fingerprints  fink.daniel  fisher.franklin_m.  fisher.r.a.  fisher_information  flaxman.seth  fleuret.francois  flocks_and_swarms  fluctuation-response  fluid_mechanics  fmri  folding_problem  food  food_webs  forensics  fosdick.bailey  foster.dean_p.  foundations_of_statistics  fourier_analysis  fox.emily  fox.emily_b.  fractals  france  franklin.charles  fraser.d.a.s.  fraud  freedman.david  freedman.david_a.  
freeman.peter  functional_connectivity  functional_data  functional_data_analysis  funny:academic  funny:because_its_true  funny:geeky  funny:malicious  funny:unintentionally  g'sell.max  galstyan.aram  gambling  game_theory  gaussian_processes  geisser.seymour  gelman.andrew  geman.donald  generalized_linear_models  genetics  genetic_algorithms  gene_expression_data_analysis  gene_regulation  genomics  genomic_control  genovese.chris  genovese.christopher  geography  geology  geometry  geometry_from_a_time_series  geometry_of_statistical_inference  george.ed  gershman.sam  getoor.lise  geyer.charles  geyer.charles_j.  ghahramani.zoubin  gibbs_distributions  gibrats_law  gigerenzer.gerd  gigs  gile.krista  gile.krista_j.  gini_coefficient  gini_index  girvan.michelle  gives_economists_a_bad_name  globalization  glymour.clark  gneiting.tilmann  goel.sharad  goerg.georg  goerg.georg_m.  good-turing_estimation  good.i.j.  goodness-of-fit  good_causes  gourieroux.christian  grade_inflation  grading  graduate_school  gramacy.robert  grammar_induction  granger_causality  grants  graphical  graphical_models  graphons  graph_discovery  graph_embedding  graph_learning  graph_limits  graph_theory  graves.timothy  great_depression  great_risk_shift  green.peter_j.  greenhouse.joel  gretton.arthur  gretton.arthurt  grunwald.peter  gustafson.paul  guttorp.peter  guyon.isabelle  haavelmo.trygve  hacking.ian  hahn.p._richard  halbert_white_died_for_your_sins  hall.peter  handcock.mark  hansen.ben  hansen.bruce  hansen.christian  happiness  hardle.wolfgang  harrison.matthew_t.  
hart.jeffrey  haslinger.rob  hastie.trevor  have_forgotten  have_read  have_read_too_many_times  have_skimmed  have_taught  have_written  hazan.elad  healy.kieran  heard_the_talk  heavy_tails  heckerman.david  hendry.david  heritability  hero.alfred  hero.alfred_o.iii  hero.alfred_o._iii  heteroskedasticity  heuristics  hierarchical_statistical_models  hierarchical_structure  high-dimensional_probability  high-dimensional_statistics  hilbert_space  hill.jennifer  historical_genetics  historical_linguistics  historical_materialism  history  history_of_economics  history_of_ideas  history_of_mathematics  history_of_science  history_of_statistics  hjort.nils_lid  hodrick-prescott_filter  hoeffdings_inequality  hoff.peter  holmes.susan  homophily  homrighausen.darren  hooi.bryan  hooker.giles  hopcroft.john  horowitz.joel  hotelling.harold  hoyer.patrik_o.  hsu.daniel  huber.peter  huff.darrell  human_genetics  hunter.david_r.  hutter.marcus  hydrodynamics  hyperbolic_geometry  hypothesis_testing  ia!_ia!_raginsky_fhtagn!  iacus.stefano  identifiability  imbens.guido  imbens.guido_w.  imperialism  importance_sampling  imputation  independence_testing  independence_tests  independent_component_analysis  india  indirect_inference  individual_sequence_prediction  induction  industrial_organization  inequalities  inequality  inference_to_latent_objects  influence  influenza  information_bottleneck  information_cascades  information_criteria  information_geometry  information_retrieval  information_theory  innovation  insects  institutions  instrumental_variables  insurance  interacting_particle_systems  interpolation  interview  intro_stats  inverse_problems  in_library  in_NB  in_the_acknowledgments  in_wishlist  ionides.edward  iran  ising_model  izbicki.rafael  izibicki  i_knew_him_when  i_see_what_you_did_there  jackknife  jackson.matthew_o.  jakulin.aleks  janzing.dominik  javidi.tara  jensen.david  jensen.shane  jensen.shane_t.  
jiang.wenxin  jin.jiashun  jordan.michael_i.  jost.jurgen  journalism  junker.brian  k-means  kadane.jay  kafadar.karen  kakade.sham  kalisch.markus  kallenberg.olav  kalman_filter  karhunen-loeve_decomposition  karl_pearson_died_for_your_sins  karrer.brian  kass.robert  kelly.kevin_t.  kelly_gambling  kempthorne.oscar  kennedy.edward  kennedy.edward_h.  kernel_estimators  kernel_methods  king.gary  kirshner.sergey  kith_and_kin  klein.ezra  kleinberg.jon  klemens.ben  knockoffs  kolaczyk.eric  kolar.mladen  koller.daphne  kolmogorov-smirnov-test  kontorovich.aryeh  kontoyiannis.ioannis  kriging  krijnen.wim  krivitsky.pavel  kronecker_graphs  kullback.solomon  kurths.jurgen  kurtz.zachary  lafferty.john  lagrange_multipliers  lahiri.s.n.  landauers_principle  lane.david  lang.kevin  langford.john  laplace_approximation  large_deviations  lasso  latent_dirichlet_allocation  latent_semantic_analysis  latent_variables  lauritzen.steffen  law  law_of_the_iterated_logarithm  lazer.david  le.quoc  lead  leamer.ed  learning_in_games  learning_theory  learning_under_dependence  lebanon.guy  lebaron.blake  lecue.guillaume  lee.ann_b.  leeb.hannes  leek.jeffrey_t.  lehmann.erich  lei.jing  leskovec.jure  levina.elizaveta  levina.liza  levitt.steven  levy_processes  le_cam.lucien  liberman.mark  lie_detection  likelihood  likelihood_ratio_tests  linear_algebra  linear_programming  linear_regression  linguistics  link_prediction  lipson.hod  lipton.zachary  literary_criticism  liu.han  liu.richard  lives_of_the_scientists  living_in_sin  locality-senstive_hashing  logic  logical_positivism  logistic_regression  loh.po-ling  lolcats  long-memory_processes  long-range_dependence  low-dimensional_summaries  low-rank_approximation  low-regret_learning  luby.amanda  luca.diana  lugosi.gabor  lumley.thomas  luxburg.ulrike_von  maathuis.marloes  maathuis.marloes_h.  machine_learning  machine_translation  macroeconomics  macro_from_micro  madigan.david  mahoney.michael_w.  
malawi  management  mandelbrot.benoit  manifold_learning  mankad.shawn  mann.michael  manski.charles  manski.charles_f.  maps  marcus.gary  market_failures_in_everything  markov_models  martin.john_levi  martingales  massart.pascal  matching  mathematical_logic  mathematics  matloff.norm  matloff.norman  matrix_completion  matrix_factorization  mauro.jacqueline  maximum_entropy  mayo-wilson.conor  mayo.deborah  mccloskey.deirdre  mccormick.tyler  mccormick.tyler_h.  mccullagh.peter  mcdonald.daniel  mcfowland.edward_iii  mdl  mean-field_theory  measurement  measure_theory  medicine  medieval_european_history  meehl.paul  meier.lukas  meila.marina  meinshausen.nicolai  memory  memo_to_self_real_scholars_write  mendelson.sahar  mendelson.shahar  mental_testing  merhav.neri  meta-analysis  meteorology  methodological_advice  methodology  method_of_moments  method_of_sieves  metric_learning  meyers.lauren_ancel  meyn.sean_p.  minimax  minimum_description_length  mining_the_bibliome  mis-specification_testing  missing_data  misspecification  mixing  mixture_models  mizon.grayham  modeling  model_averaging  model_checking  model_discovery  model_search  model_selection  model_uncertainty  modularity  molecular_biology  monfort.alain  montanari.andrea  monteleoni.claire  monte_carlo  moocs  moore.cristopher  moral_depravity  morley.james  morvai.gusztav  mossel.elchanan  moulines.eric  mucha.peter  mucha.peter_j.  
mukherjee.sayan  muller.jerry  multidimensional_scaling  multiple_comparisons  multiple_imputation  multiple_testing  multivariate_distributions  murphy.susan  murray-watters.alexander  murray.charles  mushrooms  nadler.boaz  nagin.dan  names  narcissism  nardi.yuval  natural_history_of_truthiness  natural_language_processing  neal.radford  nearest-neighbors  nearest-neighbor_methods  nearest-neighors  nearest_neighbors  nemenman.ilya  networked_life  networks  network_comparison  network_data_analysis  network_differences  network_experiments  network_formation  network_sampling  network_visualization  neural_coding_and_decoding  neural_data_analysis  neural_modeling  neural_networks  neuropsychology  neuroscience  neutral_models  neville.jennifer  newman.mark  neyman-pearson_lemma  neyman.jerzy  neyman_smooth_tests  ng.serena  nichols.thomas_e.  nickl.richard  nickl.robert  nielsen.michael  niezink.nynke  nigeria  nilsson_jacobi.martin  niyogi.partha  nobel.andrew  noel.hans  nolan.deborah  nominate  non-equilibrium  non-stationarity  nonparametrics  norvig.peter  not_often_you_see_an_abstract_framed_as_"fine_have_it_your_way"  not_worth_putting_in_notebooks  nowak.robert  no_really_utter_stupidity  no_such_thing_as_model_free_prediction  nugent.rebecca  nuisance_parameters  null_models  numerical_analysis  nyhan.brendan  o'neil.cathy  obesity  obituaries  objectivity  obviously_correct  obvious_to_one_skilled_in_the_art  occams_razor  occupy_wall_street  oceanography  oconnor.brendan  official_statistics  of_a_sort  ogburn.elizabeth  ok_not_quite_scooped  ok_surprisingly_fragile_data_analysis  online_learning  on_the_accuracy_of_economic_observations  on_the_thesis_committee  operator_learning  operator_theory  optimization  oracle_inequalities  orbanz.peter  order_statistics  ordinal_data  organizations  our_decrepit_institutions  outliers  owen.art  p-values  pac-bayesian  paper_writing  parallelism  partial_identification  partial_pooling  particle_detectors  
particle_filters  particle_physics  parzen.emanuel  path_integrals  pattern_discovery  pattern_recognition  pattison.philippa  pearl.judea  pearson.karl  pedagogy  penn.mark  peres.yuval  periodicity  perl  perry.patrick_o.  personality  perturbation_theory  peters.jonas  phase_transitions  philadelphia  philosophy_of_mind  philosophy_of_science  phylogenetics  physics  physics_of_information  pillai.natesh  pillow.jonathan  pitman.jim  pittsburgh  please_dont_let_me_be_scooped  please_give_me_strength  poczos.barnabas  point_processes  poldrack.russell  police  political_economy  political_science  pollard.david  polling  popular_social_science  porter.mason_a.  post-selection_inference  potscher.benedict_m.  poverty  practices_relating_to_the_transmission_of_genetic_information  pre-validation  predator-prey_models  prediction  prediction_trees  predictive_state_representations  prequentialism  pretty_pictures  priebe.carey_e.  primates  principal_components  privacy  probability  programming  projection  proof_theory  propagation_of_error  propensity_scores  proteins  protocol_analysis  psychiatry  psychoceramics  psychology  psychometrics  publication_bias  public_health  public_opinion  public_policy  quality_control  r  race  racine.jeffrey  racine.jeffrey_s.  
racism  racist_idiocy  rafael  raftery.adrian  raginsky.maxim  rahimi.ali  rakhlin.alexander  rakhlin.sasha  ramdas.aaditya  rand.william  randomization  randomized_algorithms  random_fields  random_forests  random_matrices  random_projections  random_time_changes  rare-event_simulation  rare_events  rasmussen.carl_edward  rational_expectations  rauchway.eric  ravikumar.pradeep  re:6dfb  re:actually-dr-internet-is-the-name-of-the-monsters-creator  re:ADAfaEPoV  re:almost_none  re:AoS_project  re:bayes_as_evol  re:community_control  re:critique_of_diffusion  re:data_science_whitepaper  re:democratic_cognition  re:do-institutions-evolve  re:do_not_adjust_your_receiver  re:fitness_sampling  re:fractal_network_asymptotics  re:functional_communities  re:growing_ensemble_project  re:g_paper  re:hea  re:HEAS  re:homophily_and_confounding  re:hyperbolic_networks  re:kinematic-decoding  re:knightian_uncertainty  re:LICORS  re:LoB_project  re:model_selection_for_networks  re:naive-semi-supervised  re:network_differences  re:neutral_cultural_networks  re:neutral_model_of_inquiry  re:phil-of-bayes_paper  re:pli-R  re:small-area_estimation_by_smoothing  re:smoothing_adjacency_matrices  re:social_networks_as_sensor_networks  re:stacs  re:TALR  re:urban_scaling_what_urban_scaling  re:what_is_a_macrostate  re:what_is_the_right_null_model_for_linear_regression  re:XV_for_mixing  re:XV_for_networks  re:your_favorite_dsge_sucks  re:your_favorite_ergm_sucks  reaction-diffusion  read_the_draft  read_the_thesis  recht.benjamin  recommender_systems  record_linkage  recurrence_times  red_state_blue_state  regression  regulation  reinforcement_learning  reinhart.alex  relational_learning  religion  renormalization  renyi_entropy  replicator_dynamics  reproducibility  resampling  research_and_development  respondent-driven_sampling  review_papers  rhetoric  richard.jean-francois  richards.joey  richardson.thomas  ridge_regression  riedewald.mirek  rigollet.philippe  rinaldo.alessandro  
ripley.brian  risk_assessment  risk_vs_uncertainty  rivers  robert.christian  robin.stephane  robins.james  robins.james_m.  robustness  robust_statistics  rodu.jordan  rodu_jordan  roeder.kathryn  rogeberg.ole  rohe.karl  rosenbaum.paul  rosenblatt.murray  rosenfeld.roni  rosling.hans  rosset.saharon  rossman.gabriel  rosvall.martin  rousseau.judith  rubin.donald  rubin.jonathan  rudin.cynthia  runge.jakob  running_dogs_of_reaction  ryabko.daniil  ryden.tobias  sadeghi.kayvan  salakhtudinov.ruslan  salakhutdinov.ruslan  sales.adam  salganik.matthew_j.  salmon  samii.cyrus  sample_splitting  sampling  samworth.richard  samworth.richard_j.  samworth.robert  sandler.mark  sardinia  sarkar.purna  sarkar.purnamrita  sarwate.anand  satire  savage.leonard_j.  savage.van  schafer.chad  scheines.richard  schofield.lynne  scholarships  schweinberger.michael  science  science_as_a_social_process  science_journalism  science_policy  scientific_computing  scooped  scooped?  scoring_rules  scotland  scott.clayton  seals  search_engines  segregation  selection_bias  self-centered  self-fulfilling_prophecies  self-promotion  self-similarity  semantics  semi-supervised_learning  send_a_reprint  sensitive_dependence_on_initial_conditions  sensitivity_analysis  sequential_monte_carlo  sequential_optimization  series_of_footnotes  sethna.james  sexist_idiocy  shafer.glenn  shanteau.james  sharpee.tatyana_o.  sharpnack.james  sheu.chyong-hwa  shore.jesse_c.  shot_after_a_fair_trial  shpitser.ilya  shrinkage  signal_processing  signal_transduction  silver.nate  silverman.bernard  simonoff.jeffrey  sims.christopher  simulation  simulation-based_inference  singh.aarti  singh.satinder_baveja  singh.sumeetpal  singh.sumeetpal_s.  singular_value_decomposition  sleep  smale.stephen  small-area_estimation  small.dylan  smith.leonard  smoking  smola.alex  smoothing  snijders.t.a.b.  snijders.tom  sobel.michael_e.  
sober.elliott  social_history  social_influence  social_learning  social_life_of_the_mind  social_measurement  social_media  social_movements  social_networks  social_neuroscience  social_science_methodology  sociolinguistics  sociology  sociology_of_science  software  solo.victor  sornette.didier  sorokina.daria  spanos.aris  sparsity  spatial_statistics  spatio-temporal_statistics  spectral_clustering  spectral_estimation  spectral_methods  speed.terry  spirtes.peter  splines  sports  sra.suvrit  stability_of_learning  standardized_testing  stanley.h._eugene  stark.philip_b.  state-space_models  state-space_reconstruction  state_estimation  state_reconstruction  stationarity  stationary_features  statistical_computing  statistical_inference_for_stochastic_processes  statistical_interaction  statistical_mechanics  statistics  statistics_on_manifolds  stein.charles  steins_method  steinwart.ingo  stepping_stone_model  stigler.stephen  stochastic_approximation  stochastic_block_models  stochastic_differential_equations  stochastic_models  stochastic_processes  stochastic_volatility  stodden.victoria  stovel.katherine  structural_equations  structural_risk_minimization  structured_data  stuart.elizabeth  studentization  stumpf.michael.p.h.  
sufficiency  sugihara.george  sugiyama.masashi  summer_schools  superefficiency  supervenience  support-vector_machines  surveillance  surveys  survey_sampling  survival_analysis  symmetry  synchronization  synthetic_controls  systems_identification  taskar.ben  teaching  teaching_evaluations  television  telgarsky.matus  tenenbaum.joshua  tensors  terrorism  tetrad  text_mining  theoretical_computer_science  theory_of_the_novel  thermodynamics  the_american_dilemma  the_bias_variance_tradeoff_rules_everything_around_me  the_civilizing_process  the_continuing_crises  the_objective_function_which_can_be_admitted_to_is_not_the_true_objective_function  the_present_before_it_was_widely_distributed  the_problem_is_not_the_p-values  this_needed_saying?  thomas.a.c.  thomas.andrew_c.  tibshirani.robert  tibshirani.ryan  time_rescaling  time_series  time_series_connections  to:blog  to:NB  to;NB  todorova.sonia  topic_models  topology  to_be_shot_after_a_fair_trial  to_read  to_teach  to_teach:advanced-stochastic-processes  to_teach:baby-nets  to_teach:complexity-and-inference  to_teach:data-mining  to_teach:data_over_space_and_time  to_teach:graphons  to_teach:linear_models  to_teach:statcomp  to_teach:undergrad-ADA  to_teach:undergrad-research  tracked_down_references  track_down_references  trends_and_detrending  tukey.john_w.  tumley.thomas  turbulence  tutorials  two-sample_tests  u-statistics  ugander.johan  uhler.caroline  uncertainty_for_neural_networks  under_precisely_controlled_experimental_conditions_the_organism_does_what_it_damn_well_pleases  unemployment  unions  universal_prediction  urban_economics  user_interfaces  us_politics  utter_stupidity  value-added_measurement_in_education  value-added_measures  value_of_information  vanderweele.tyler  van_der_laan.mark  van_der_maas.h.l.j.  van_der_vaart.aad  van_de_geer.sara  van_handel.ramon  van_roy.benjamin  vapnik.v.n.  
variable_selection  variance_components  variance_estimation  variational_inference  vats.divyanshu  vc-dimension  veitch.victor  ventura.sam  ventura.valerie  verdinelli.isa  verzani.john  ver_steeg.greg  via:?  via:aaronsw  via:aaron_clauset  via:abbas-raza  via:abumuqawama  via:ale  via:ariddell  via:arinaldo  via:arsyed  via:arthegall  via:a_budding_sociologists_commonplace_book  via:betsy_ogburn  via:brendano  via:chl  via:civilstat  via:crooked_timber  via:d-squared  via:deaneckles  via:ded-maxim  via:dena  via:djm1107  via:dpfeldman  via:DRMacIver  via:dynamic_ecology  via:erindanielson  via:flaxman  via:flowing_data  via:gelman  via:genin.konstantin  via:gordon_w.  via:guslacerda  via:harry_wynn  via:henry_farrell  via:idiolect  via:io9  via:jbdelong  via:jhofman  via:john-burke  via:judea_pearl  via:justin  via:kass  via:kathryn  via:kjhealy  via:klk  via:larry_wasserman  via:logista  via:martens  via:matthew_berryman  via:mcfowland  via:mejn  via:moritz-heene  via:mreid  via:nielsen  via:nikete  via:nyhan  via:ogburn  via:orgtheory  via:phnk  via:rocha  via:rvenkat  via:ryantibs  via:samii  via:scotte  via:seth  via:shivak  via:simon_d.  via:simply_statistics  via:slaniel  via:spirtes  via:stodden  via:students  via:the_author  via:tslumley  via:unfogged  via:vaguery  via:vqv  via:wesolowski  via:wiggins  via:wikipedia  via:xmarquez  vidyasagar.mathukumalli  violence  visual_display_of_quantitative_information  von_mises.richard  vovk.vladimir_g.  vu.vincent  wade.nicholas  wager.stefan  wahba.grace  wainwright.martin_j.  
wald.abraham  war  warmuth.nanny  war_crimes  wasserman.larry  wasserstein_metrics  was_on_the_committee  watkins.nicholas  wavelets  weak_dependence  weaver.rhiannon  wehbe.leila  weiss.benjamin  wemuth.nanny  wermuth.nanny  we_can_look  whats_gone_wrong_with_america  what_the_experiment_died_of  wheels:reinvention_of  when_you_walk_through_the_garden_you_gotta_watch_your_back  white.halbert  white.john_myles  why_oh_why_cant_we_have_a_better_intelligentsia  why_oh_why_cant_we_have_a_better_press_corps  wiener-khinchin  wiener.norbert  wiggins.christopher  wikipedia  wilks.s._s.  willett.rebecca_m.  williamson.robert_c.  winship.christopher  wolfe.patrick_j.  wolfowitz.j.  wood.simon_n.  word2vec  wu.wei_biao  WWII  xing.eric  yarkoni.tal  your_favorite_deep_neural_network_sucks  you_should_pardon_the_expression_in_this_context  yu.bin  zenker.sven  zero-inflation  zhang.kun  zhang.tong  zheng.alice  zhu.jerry  ziliak.stephen  zilsel.edgar 
