cshalizi + to_read   1313

The Incompatible Incentives of Private Sector AI by Tom Slee :: SSRN
"Algorithms that sort people into categories are plagued by incompatible incentives. While more accurate algorithms may address problems of statistical bias and unfairness, they cannot solve the ethical challenges that arise from incompatible incentives.
"Subjects of algorithmic decisions seek to optimize their outcomes, but such efforts may degrade the accuracy of the algorithm. To maintain their accuracy, algorithms must be accompanied by supplementary rules: “guardrails” that dictate the limits of acceptable behaviour by subjects. Algorithm owners are drawn into taking on the tasks of governance, managing and validating the behaviour of those who interact with their systems.
"The governance role offers temptations to indulge in regulatory arbitrage. If governance is left to algorithm owners, it may lead to arbitrary and restrictive controls on individual behaviour. The goal of algorithmic governance by automated decision systems, social media recommender systems, and rating systems is a mirage, retreating into the distance whenever we seem to approach it."
to:NB  mechanism_design  prediction  data_mining  slee.tom  to_read  to_teach:data-mining 
20 hours ago by cshalizi
Phys. Rev. E 100, 022124 (2019) - Stochastic basins of attraction and generalized committor functions
"We study two generalizations of the basin of attraction of a stable state, to the case of stochastic dynamics, arbitrary regions, and finite-time horizons. This is done by introducing generalized committor functions and studying sojourn times. We show that the volume of the generalized basin, the basin stability, can be efficiently estimated using Monte Carlo–like techniques, making this concept amenable to the study of high-dimensional stochastic systems. Finally, we illustrate in a set of examples that stochastic basins efficiently capture the realm of attraction of metastable sets, which parts of phase space go into long transients in deterministic systems, that they allow us to deal with numerical noise, and can detect the collapse of metastability in high-dimensional systems. We discuss two far-reaching generalizations of the basin of attraction of an attractor. The basin of attraction of an attractor is the set of states that will eventually reach the attractor. In a generic stochastic system, all regions will be left again; no attraction is permanent. To obtain the equivalent of the basin of attraction of a region we need to generalize the notion to cover finite-time horizons and finite regions. We do so by considering sojourn times, the fraction of time that a trajectory spends in a set, and by generalizing committor functions which arise in the study of hitting probabilities. In a simplified setting we show that these two notions reduce to the normal notions of the basin of attraction in the appropriate limits. We also show that the volume of these stochastic basins can be efficiently estimated for high-dimensional systems at computational cost comparable to that for deterministic systems. To fully illustrate the properties captured by the stochastic basins, we show a set of examples ranging from simple conceptual models to high-dimensional inhomogeneous oscillator chains.
These show that stochastic basins efficiently capture metastable attraction, the presence of long transients, that they allow us to deal with numerical and approximation noise, and can detect the collapse of metastability with increasing noise in high-dimensional systems."
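
--- The Monte Carlo estimate of basin stability is simple enough to sketch. Here is a toy version (my own illustration, not the paper's code) for the double-well SDE dx = (x - x^3) dt + sigma dW: integrate with Euler-Maruyama, record the fraction of time spent in a region B around the attractor at x = 1 (the sojourn fraction), and count the initial conditions whose sojourn fraction clears a threshold:

```python
import numpy as np

def sojourn_fraction(x0, T=50.0, dt=0.01, sigma=0.1, rng=None):
    """Fraction of time an Euler-Maruyama trajectory of the bistable SDE
    dx = (x - x^3) dt + sigma dW spends in the set B = [0.5, 1.5]
    (a region around the attractor at x = 1)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    n_steps = int(T / dt)
    x, time_in_B = x0, 0
    for _ in range(n_steps):
        x += (x - x**3) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        time_in_B += (0.5 <= x <= 1.5)
    return time_in_B / n_steps

def basin_stability(n_samples=200, theta=0.5, seed=1):
    """Monte Carlo volume of the generalized basin: the fraction of
    initial conditions, drawn uniformly on [-2, 2], whose sojourn
    fraction in B exceeds the threshold theta."""
    rng = np.random.default_rng(seed)
    x0s = rng.uniform(-2.0, 2.0, n_samples)
    return np.mean([sojourn_fraction(x0, rng=rng) > theta for x0 in x0s])
```

With sigma = 0.1 the barrier at x = 0 is rarely crossed over this horizon, so the estimate lands near the deterministic answer of 1/2 for initial conditions drawn uniformly from [-2, 2].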
to:NB  metastability  non-equilibrium  stochastic_processes  dynamical_systems  to_read 
yesterday by cshalizi
[1908.06339] Indetermination of networks structure from the dynamics perspective
"Networks are universally considered as complex structures of interactions of large multi-component systems. In order to determine the role that each node has inside a complex network, several centrality measures have been developed. Such topological features are also important for their role in the dynamical processes occurring in networked systems. In this paper, we argue that the dynamical activity of the nodes may strongly reshape their relevance inside the network, making centrality measures in many cases misleading. We show that when the dynamics taking place at the local level of the node is slower than the global one between the nodes, then the system may lose track of the structural features. On the contrary, when that ratio is reversed only global properties such as the shortest distances can be recovered. From the perspective of network inference, this constitutes an uncertainty principle, in the sense that it limits the extraction of multi-resolution information about the structure, particularly in the presence of noise. For illustration purposes, we show that for networks with different time-scale structures such as strong modularity, the existence of fast global dynamics can imply that precise inference of the community structure is impossible."

--- This tends to reinforce my long-standing gut skepticism about the universal value of centrality measures.
to:NB  network_data_analysis  dynamical_systems  to_teach:baby-nets  to_read 
yesterday by cshalizi
[1811.06407] Neural Predictive Belief Representations
"Unsupervised representation learning has succeeded with excellent results in many applications. It is an especially powerful tool to learn a good representation of environments with partial or noisy observations. In partially observable domains it is important for the representation to encode a belief state, a sufficient statistic of the observations seen so far. In this paper, we investigate whether it is possible to learn such a belief representation using modern neural architectures. Specifically, we focus on one-step frame prediction and two variants of contrastive predictive coding (CPC) as the objective functions to learn the representations. To evaluate these learned representations, we test how well they can predict various pieces of information about the underlying state of the environment, e.g., position of the agent in a 3D maze. We show that all three methods are able to learn belief representations of the environment: they encode not only the state information, but also its uncertainty, a crucial aspect of belief states. We also find that for CPC multi-step predictions and action-conditioning are critical for accurate belief representations in visually complex environments. The ability of neural representations to capture the belief information has the potential to spur new advances for learning and planning in partially observable domains, where leveraging uncertainty is essential for optimal decision making."
to:NB  prediction  predictive_representations  inference_to_latent_objects  neural_networks  to_read 
yesterday by cshalizi
[1811.02549] Language GANs Falling Short
"Generating high-quality text with sufficient diversity is essential for a wide range of Natural Language Generation (NLG) tasks. Maximum-Likelihood (MLE) models trained with teacher forcing have consistently been reported as weak baselines, where poor performance is attributed to exposure bias (Bengio et al., 2015; Ranzato et al., 2015); at inference time, the model is fed its own prediction instead of a ground-truth token, which can lead to accumulating errors and poor samples. This line of reasoning has led to an outbreak of adversarial based approaches for NLG, on the account that GANs do not suffer from exposure bias. In this work, we make several surprising observations which contradict common beliefs. First, we revisit the canonical evaluation framework for NLG, and point out fundamental flaws with quality-only evaluation: we show that one can outperform such metrics using a simple, well-known temperature parameter to artificially reduce the entropy of the model's conditional distributions. Second, we leverage the control over the quality / diversity trade-off given by this parameter to evaluate models over the whole quality-diversity spectrum and find MLE models constantly outperform the proposed GAN variants over the whole quality-diversity space. Our results have several implications: 1) The impact of exposure bias on sample quality is less severe than previously thought, 2) temperature tuning provides a better quality / diversity trade-off than adversarial training while being easier to train, easier to cross-validate, and less computationally expensive. Code to reproduce the experiments is available at this http URL"
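
--- The temperature knob they use to game quality-only metrics is a one-liner: divide the logits by a temperature below 1 before the softmax and the entropy of the conditional distribution drops, trading diversity for sample quality. A minimal numpy sketch (generic, not the authors' code):

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits rescaled by 1/temperature: temperature < 1
    sharpens the distribution (lower entropy, higher quality, less
    diversity); temperature > 1 flattens it."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()              # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def entropy(p):
    """Shannon entropy in nats of a probability vector."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))
```

Sweeping the temperature traces out the whole quality-diversity curve against which the paper compares MLE and GAN models.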
to:NB  natural_language_processing  model_checking  your_favorite_deep_neural_network_sucks  to_read 
yesterday by cshalizi
[1908.06456] Harmonic Analysis of Symmetric Random Graphs
"Following Ressel (1985, 2008) this note attempts to understand graph limits (Lovasz and Szegedy 2006) in terms of harmonic analysis on semigroups (Berg et al. 1984), thereby providing an alternative derivation of de Finetti's theorem for random exchangeable graphs."

--- SL has been hinting about this for years (it's the natural combination of his 70s--80s work on "extremal point" models, sufficiency, and semi-groups with his recent interest in graph limits and graphons), so I'm very excited to read this.
to:NB  to_read  graph_limits  analysis  probability  lauritzen.steffen 
yesterday by cshalizi
[1901.00555] An Introductory Guide to Fano's Inequality with Applications in Statistical Estimation
"Information theory plays an indispensable role in the development of algorithm-independent impossibility results, both for communication problems and for seemingly distinct areas such as statistics and machine learning. While numerous information-theoretic tools have been proposed for this purpose, the oldest one remains arguably the most versatile and widespread: Fano's inequality. In this chapter, we provide a survey of Fano's inequality and its variants in the context of statistical estimation, adopting a versatile framework that covers a wide range of specific problems. We present a variety of key tools and techniques used for establishing impossibility results via this approach, and provide representative examples covering group testing, graphical model selection, sparse linear regression, density estimation, and convex optimization."
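
--- The basic form of the inequality is compact enough to state in code. If a parameter is uniform over M >= 2 well-separated hypotheses and the data X carry I(X;V) nats of information about the hypothesis index V, no estimator can identify it with error probability below 1 - (I(X;V) + log 2)/log M. A sketch:

```python
import numpy as np

def fano_error_lower_bound(n_hypotheses, mutual_info):
    """Fano's inequality: with V uniform on M >= 2 hypotheses and data X,
    any estimator's error probability satisfies
        P(error) >= 1 - (I(X; V) + log 2) / log M,
    with information measured in nats."""
    assert n_hypotheses >= 2
    bound = 1.0 - (mutual_info + np.log(2)) / np.log(n_hypotheses)
    return max(0.0, bound)   # the bound is vacuous below zero
```

For instance, M = 16 hypotheses and one nat of information still force an error probability of about 0.39; reducing an estimation problem to such a multiple-hypothesis test is how the minimax lower bounds in the survey are built.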
to:NB  information_theory  minimax  statistics  estimation  to_read 
2 days ago by cshalizi
Franco Moretti & Oleg Sobchuk, Hidden In Plain Sight, NLR 118, July–August 2019
"If there is one feature that immediately distinguishes the digital humanities (dh) from the ‘other’ humanities, data visualization has to be it. Histograms, scatterplots, time series, diagrams, networks . . . ten, fifteen years ago, studies of film, music, literature or art didn’t use any of these. Now they do, and here we examine some premises (unspoken, and often probably unconscious) of this field-defining practice. Field-defining, because visualization is never just visualization: it involves the formation of corpora, the definition of data, their elaboration, and often some sort of preliminary interpretation as well. Whence the idea of this article: to gather sixty-odd studies that have had a significant impact on dh, and analyse how they visually present their data. What interests us is visualization as a practice, in the conviction that practices—what we learn to do by doing, by professional habit, without being fully aware of what we are doing—often have larger theoretical implications than theoretical statements themselves. Whether this has indeed been the case for dh, is for readers to decide."
to:NB  to_read  visual_display_of_quantitative_information  humanities  moretti.franco 
6 days ago by cshalizi
[1908.02375] Limit Theorems for Data with Network Structure
"This paper develops new limit theory for data that are generated by networks or more generally display cross-sectional dependence structures that are governed by observable and unobservable characteristics. Strategic network formation models are an example. Whether two data points are highly correlated or not depends on draws from underlying characteristics distributions. The paper defines a measure of closeness that depends on primitive conditions on the distribution of observable characteristics as well as the functional form of the underlying model. A summability condition over the probability distribution of observable characteristics is shown to be a critical ingredient in establishing limit results. The paper establishes weak and strong laws of large numbers as well as a stable central limit theorem for a class of statistics that include as special cases network statistics such as average node degrees or average peer characteristics. Some worked examples illustrating the theory are provided."
to:NB  stochastic_processes  networks  network_data_analysis  ergodic_theory  to_read 
13 days ago by cshalizi
[1908.02723] Advocacy Learning: Learning through Competition and Class-Conditional Representations
"We introduce advocacy learning, a novel supervised training scheme for attention-based classification problems. Advocacy learning relies on a framework consisting of two connected networks: 1) N Advocates (one for each class), each of which outputs an argument in the form of an attention map over the input, and 2) a Judge, which predicts the class label based on these arguments. Each Advocate produces a class-conditional representation with the goal of convincing the Judge that the input example belongs to their class, even when the input belongs to a different class. Applied to several different classification tasks, we show that advocacy learning can lead to small improvements in classification accuracy over an identical supervised baseline. Through a series of follow-up experiments, we analyze when and how such class-conditional representations improve discriminative performance. Though somewhat counter-intuitive, a framework in which subnetworks are trained to competitively provide evidence in support of their class shows promise, in many cases performing on par with standard learning approaches. This provides a foundation for further exploration into competition and class-conditional representations in supervised learning."

--- Drs. Mercier and Sperber, please call your office. (Also Drs. Jordan and Jacobs...)
to:NB  machine_learning  collective_cognition  ensemble_methods  to_read 
13 days ago by cshalizi
[1908.02614] The power of dynamic social networks to predict individuals' mental health
"Precision medicine has received attention both in and outside the clinic. We focus on the latter, by exploiting the relationship between individuals' social interactions and their mental health to develop a predictive model of one's likelihood to be depressed or anxious from rich dynamic social network data. To our knowledge, we are the first to do this. Existing studies differ from our work in at least one aspect: they do not model social interaction data as a network; they do so but analyze static network data; they examine "correlation" between social networks and health but without developing a predictive model; or they study other individual traits but not mental health. In a systematic and comprehensive evaluation, we show that our predictive model that uses dynamic social network data is superior to its static network as well as non-network equivalents when run on the same data."
to:NB  social_networks  psychiatry  sociology  prediction  network_data_analysis  lizardo.omar  to_read 
13 days ago by cshalizi
[1908.01823] Change-point detection in dynamic networks via graphon estimation
"We propose a general approach for change-point detection in dynamic networks. The proposed method is model-free and covers a wide range of dynamic networks. The key idea behind our approach is to effectively utilize the network structure in designing change-point detection algorithms. This is done via an initial step of graphon estimation, where we propose a modified neighborhood smoothing (MNBS) algorithm for estimating the link probability matrices of a dynamic network. Based on the initial graphon estimation, we then develop a screening and thresholding algorithm for multiple change-point detection in dynamic networks. The convergence rate and consistency for the change-point detection procedure are derived as well as those for MNBS. When the number of nodes is large (e.g., exceeds the number of temporal points), our approach yields a faster convergence rate in detecting change-points compared with an algorithm that simply employs averaged information of the dynamic network across time. Numerical experiments demonstrate robust performance of the proposed algorithm for change-point detection under various types of dynamic networks, and superior performance over existing methods is observed. A real data example is provided to illustrate the effectiveness and practical impact of the procedure."
to:NB  network_data_analysis  change-point_problem  graph_limits  statistics  to_read 
14 days ago by cshalizi
Fake news on Twitter during the 2016 U.S. presidential election | Science
"The spread of fake news on social media became a public concern in the United States after the 2016 presidential election. We examined exposure to and sharing of fake news by registered voters on Twitter and found that engagement with fake news sources was extremely concentrated. Only 1% of individuals accounted for 80% of fake news source exposures, and 0.1% accounted for nearly 80% of fake news sources shared. Individuals most likely to engage with fake news sources were conservative leaning, older, and highly engaged with political news. A cluster of fake news sources shared overlapping audiences on the extreme right, but for people across the political spectrum, most political news exposure still came from mainstream media outlets."

--- To Bruce Sterling's (obviously correct) dictum that "The future is about old people, in big cities, afraid of the sky", perhaps we should add "outraged at nonsense".
to:NB  to_read  social_media  deceiving_us_has_become_an_industrial_process  natural_history_of_truthiness  lazer.david  us_politics  networked_life  re:actually-dr-internet-is-the-name-of-the-monsters-creator 
15 days ago by cshalizi
[1908.00882] Population Predictive Checks
"Bayesian modeling has become a staple for researchers analyzing data. Thanks to recent developments in approximate posterior inference, modern researchers can easily build, use, and revise complicated Bayesian models for large and rich data. These new abilities, however, bring into focus the problem of model assessment. Researchers need tools to diagnose the fitness of their models, to understand where a model falls short, and to guide its revision. In this paper we develop a new method for Bayesian model checking, the population predictive check (Pop-PC). Pop-PCs are built on posterior predictive checks (PPC), a seminal method that checks a model by assessing the posterior predictive distribution on the observed data. Though powerful, PPCs use the data twice---both to calculate the posterior predictive and to evaluate it---which can lead to overconfident assessments. Pop-PCs, in contrast, compare the posterior predictive distribution to the population distribution of the data. This strategy blends Bayesian modeling with frequentist assessment, leading to a robust check that validates the model on its generalization. Of course the population distribution is not usually available; thus we use tools like the bootstrap and cross validation to estimate the Pop-PC. Further, we extend Pop-PCs to hierarchical models. We study Pop-PCs on classical regression and a hierarchical model of text. We show that Pop-PCs are robust to overfitting and can be easily deployed on a broad family of models."
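
--- The core move is easy to prototype: score the posterior predictive against data the posterior never touched, instead of the training data. A toy single-split version for a conjugate normal model with known unit variance (the paper uses the bootstrap and cross-validation; this is only the flavor, and the model is my own choice):

```python
import numpy as np

def pop_pc_pvalue(data, n_rep=1000, seed=0):
    """Sketch of a population predictive check via data splitting: fit a
    conjugate normal model (unknown mean, known unit variance, flat prior)
    on half the data, then compare a test statistic on posterior-predictive
    replicates against the held-out half, rather than the training half."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    half = len(data) // 2
    train, held = data[:half], data[half:]
    # posterior for the mean under a flat prior: N(mean(train), 1/len(train))
    post_mean, post_sd = train.mean(), 1.0 / np.sqrt(len(train))
    t_held = held.mean()                  # test statistic on held-out data
    t_rep = np.empty(n_rep)
    for r in range(n_rep):
        mu = rng.normal(post_mean, post_sd)               # posterior draw
        t_rep[r] = rng.normal(mu, 1.0, len(held)).mean()  # predictive replicate
    return np.mean(t_rep >= t_held)       # one-sided predictive p-value
```

Because the held-out half never enters the posterior, this check avoids the double use of data that makes ordinary PPCs overconfident.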
to:NB  model_checking  bayesianism  statistics  blei.david  re:phil-of-bayes_paper  to_read  cross-validation 
16 days ago by cshalizi
Kontorovich, Pinelis: Exact lower bounds for the agnostic probably-approximately-correct (PAC) machine learning model
"We provide an exact nonasymptotic lower bound on the minimax expected excess risk (EER) in the agnostic probably-approximately-correct (PAC) machine learning classification model and identify minimax learning algorithms as certain maximally symmetric and minimally randomized “voting” procedures. Based on this result, an exact asymptotic lower bound on the minimax EER is provided. This bound is of the simple form c∞/√ν as ν→∞, where c∞=0.16997… is a universal constant, ν=m/d, m is the size of the training sample and d is the Vapnik–Chervonenkis dimension of the hypothesis class. It is shown that the differences between these asymptotic and nonasymptotic bounds, as well as the differences between these two bounds and the maximum EER of any learning algorithms that minimize the empirical risk, are asymptotically negligible, and all these differences are due to ties in the mentioned “voting” procedures. A few easy to compute nonasymptotic lower bounds on the minimax EER are also obtained, which are shown to be close to the exact asymptotic lower bound c∞/√ν even for rather small values of the ratio ν=m/d. As an application of these results, we substantially improve existing lower bounds on the tail probability of the excess risk. Among the tools used are Bayes estimation and apparently new identities and inequalities for binomial distributions."
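
--- For reference, the asymptotic bound is a one-liner, with nu = m/d and the constant truncated to the digits given in the abstract:

```python
import numpy as np

C_INF = 0.16997  # universal constant, truncated to the digits in the abstract

def asymptotic_eer_lower_bound(m, d):
    """Asymptotic minimax lower bound on the expected excess risk in the
    agnostic PAC model: c_inf / sqrt(nu), where nu = m / d is the ratio of
    training-sample size m to VC dimension d."""
    nu = m / d
    return C_INF / np.sqrt(nu)
```

With m = 1000 training points and VC dimension d = 10, the excess risk cannot be driven below about 0.017.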
to:NB  learning_theory  statistics  minimax  kontorovich.aryeh  kith_and_kin  to_read 
17 days ago by cshalizi
[1907.13323] Multi-cause causal inference with unmeasured confounding and binary outcome
"Unobserved confounding presents a major threat to causal inference in observational studies. Recently, several authors suggest that this problem may be overcome in a shared confounding setting where multiple treatments are independent given a common latent confounder. It has been shown that if additional data such as negative controls are available, then the causal effects are indeed identifiable. In this paper, we show that these additional data are not necessary for causal identification, provided that the treatments and outcome follow Gaussian and logistic structural equation models, respectively. Our novel identification strategy is based on the symmetry and tail properties of the observed data distribution. We further develop two-step likelihood-based estimation procedures. We illustrate our method through simulations and a real data application studying the causal relationship between the volume of various brain regions and cognitive scores."
to:NB  causal_inference  statistics  to_read 
20 days ago by cshalizi
[1907.12581] Improved mutual information measure for classification and community detection
"The information theoretic quantity known as mutual information finds wide use in classification and community detection analyses to compare two classifications of the same set of objects into groups. In the context of classification algorithms, for instance, it is often used to compare discovered classes to known ground truth and hence to quantify algorithm performance. Here we argue that the standard mutual information, as commonly defined, omits a crucial term which can become large under real-world conditions, producing results that can be substantially in error. We demonstrate how to correct this error and define a mutual information that works in all cases. We discuss practical implementation of the new measure and give some example applications."

--- From a quick scan, this is essentially saying that one should do two-part (old Rissanen) style minimum description length coding, so you need to give both the information content of the correspondence, _and_ specify which correspondence you're using.
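
--- For concreteness, the uncorrected quantity is the plain contingency-table mutual information between the two labelings; the corrected measure in the paper additionally charges for transmitting the table itself, which this sketch omits:

```python
import numpy as np
from collections import Counter

def mutual_information(labels_a, labels_b):
    """Standard mutual information (in nats) between two labelings of the
    same n objects, computed from their contingency table. The paper's
    corrected measure also charges for the cost of specifying the
    contingency table (the correspondence); that term is omitted here."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = 0.0
    for (a, b), n_ab in joint.items():
        mi += (n_ab / n) * np.log(n * n_ab / (count_a[a] * count_b[b]))
    return mi
```

Identical labelings give log K for K equal-sized groups; independent labelings give zero, and the correction term matters precisely when the number of groups is large relative to n.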
to:NB  to_read  information_theory  have_skimmed  community_discovery  kith_and_kin  newman.mark 
20 days ago by cshalizi
[1903.06936] Bayesian and Spline based Approaches for (EM based) Graphon Estimation
"The paper proposes the estimation of a graphon function for network data using principles of the EM algorithm. The approach considers both variability with respect to ordering the nodes of a network and estimation of the unique representation of a graphon. To do so (linear) B-splines are used, which allow one to easily accommodate constraints in the estimation routine so that the estimated graphon fulfills the canonical representation, meaning its univariate margin is monotonic. The graphon estimate itself allows one to apply Bayesian ideas to explore both the degree distribution and the ordering of the nodes with respect to their degree. Variability and uncertainty are taken into account using MCMC techniques. Combining both steps gives an EM based approach for graphon estimation."

GODDAMIT.
to:NB  to_read  graph_limits  network_data_analysis  nonparametrics  statistics  re:smoothing_adjacency_matrices  scooped? 
21 days ago by cshalizi
[1507.08140] Degree-based goodness-of-fit tests for heterogeneous random graph models : independent and exchangeable cases
"The degrees are a classical and relevant way to study the topology of a network. They can be used to assess the goodness-of-fit for a given random graph model. In this paper we introduce goodness-of-fit tests for two classes of models. First, we consider the case of independent graph models such as the heterogeneous Erdös-Rényi model in which the edges have different connection probabilities. Second, we consider a generic model for exchangeable random graphs called the W-graph. The stochastic block model and the expected degree distribution model fall within this framework. We prove the asymptotic normality of the degree mean square under these independent and exchangeable models and derive formal tests. We study the power of the proposed tests and we prove the asymptotic normality under specific sparsity regimes. The tests are illustrated on real networks from social sciences and ecology, and their performances are assessed via a simulation study."
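
--- Even without the asymptotic theory, the degree mean square makes a usable test statistic via parametric bootstrap. A sketch for the homogeneous Erdős-Rényi null (the paper derives an asymptotic normal null instead; the simulation stand-in is mine):

```python
import numpy as np

def degree_mean_square(adj):
    """Mean squared degree of an undirected graph given as a 0/1
    symmetric adjacency matrix."""
    degrees = adj.sum(axis=1)
    return np.mean(degrees ** 2)

def er_gof_pvalue(adj, n_sim=500, seed=0):
    """Parametric-bootstrap goodness-of-fit check for a homogeneous
    Erdos-Renyi model: simulate graphs at the fitted edge density and
    compare the observed degree mean square to the simulated null."""
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    p_hat = adj[np.triu_indices(n, 1)].mean()   # fitted edge density
    t_obs = degree_mean_square(adj)
    t_null = np.empty(n_sim)
    for s in range(n_sim):
        u = rng.random((n, n)) < p_hat
        a = np.triu(u, 1)
        a = (a + a.T).astype(int)               # symmetrize, zero diagonal
        t_null[s] = degree_mean_square(a)
    # two-sided Monte Carlo p-value
    return min(1.0, 2 * min(np.mean(t_null >= t_obs), np.mean(t_null <= t_obs)))
```

Degree heterogeneity (e.g., strong community structure) inflates the mean squared degree well beyond the homogeneous null, which is what makes the statistic useful.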

Journal version: https://doi.org/10.1111/sjos.12410
to:NB  goodness-of-fit  network_data_analysis  graph_limits  to_teach:graphons  to_read 
22 days ago by cshalizi
Free trade and opioid overdose death in the United States - ScienceDirect
"Opioid overdose deaths in the U.S. rose dramatically after 1999, but also exhibited substantial geographic variation. This has largely been explained by differential availability of prescription and non-prescription opioids, including heroin and fentanyl. Recent studies explore the underlying role of socioeconomic factors, but overlook the influence of job loss due to international trade, an economic phenomenon that disproportionately harms the same regions and demographic groups at the heart of the opioid epidemic. We used OLS regression and county-year level data from the Centers for Disease Control and the Department of Labor to test the association between trade-related job loss and opioid-related overdose death between 1999 and 2015. We find that the loss of 1000 trade-related jobs was associated with a 2.7 percent increase in opioid-related deaths. When fentanyl was present in the heroin supply, the same number of job losses was associated with a 11.3 percent increase in opioid-related deaths."

--- I'm very skeptical about OLS here. Something like nearest neighbors would be better, but I'm not sure how to handle spatial correlation.
to:NB  to_read  drugs  whats_gone_wrong_with_america  class_struggles_in_america  econometrics  statistics  globalization  to_teach:data_over_space_and_time  to_teach:undergrad-ADA  causal_inference 
22 days ago by cshalizi
Peeling the Onion of Brain Representations | Annual Review of Neuroscience
"The brain's function is to enable adaptive behavior in the world. To this end, the brain processes information about the world. The concept of representation links the information processed by the brain back to the world and enables us to understand what the brain does at a functional level. The appeal of making the connection between brain activity and what it represents has been irresistible to neuroscience, despite the fact that representational interpretations pose several challenges: We must define which aspects of brain activity matter, how the code works, and how it supports computations that contribute to adaptive behavior. It has been suggested that we might drop representational language altogether and seek to understand the brain, more simply, as a dynamical system. In this review, we argue that the concept of representation provides a useful link between dynamics and computational function and ask which aspects of brain activity should be analyzed to achieve a representational understanding. We peel the onion of brain representations in search of the layers (the aspects of brain activity) that matter to computation. The article provides an introduction to the motivation and mathematics of representational models, a critical discussion of their assumptions and limitations, and a preview of future directions in this area."
to:NB  neuroscience  cognitive_science  representation  kriegeskorte.nikolaus  computation  to_read 
22 days ago by cshalizi
[1907.09611] Asymptotic normality, concentration, and coverage of generalized posteriors
"Generalized likelihoods are commonly used to obtain consistent estimators with attractive computational and robustness properties. Formally, any generalized likelihood can be used to define a generalized posterior distribution, but an arbitrarily defined "posterior" cannot be expected to appropriately quantify uncertainty in any meaningful sense. In this article, we provide sufficient conditions under which generalized posteriors exhibit concentration, asymptotic normality (Bernstein-von Mises), an asymptotically correct Laplace approximation, and asymptotically correct frequentist coverage. We apply our results in detail to generalized posteriors for a wide array of generalized likelihoods, including pseudolikelihoods in general, the Ising model pseudolikelihood, the Gaussian Markov random field pseudolikelihood, the fully observed Boltzmann machine pseudolikelihood, the Cox proportional hazards partial likelihood, and a median-based likelihood for robust inference of location. Further, we show how our results can be used to easily establish the asymptotics of standard posteriors for exponential families and generalized linear models. We make no assumption of model correctness so that our results apply with or without misspecification."
to:NB  bayesian_consistency  statistics  to_read  likelihood  misspecification 
28 days ago by cshalizi
[1806.07016] Evaluating Ex Ante Counterfactual Predictions Using Ex Post Causal Inference
"We derive a formal, decision-based method for comparing the performance of counterfactual treatment regime predictions using the results of experiments that give relevant information on the distribution of treated outcomes. Our approach allows us to quantify and assess the statistical significance of differential performance for optimal treatment regimes estimated from structural models, extrapolated treatment effects, expert opinion, and other methods. We apply our method to evaluate optimal treatment regimes for conditional cash transfer programs across countries where predictions are generated using data from experimental evaluations in other countries and pre-program data in the country of interest."
to:NB  causal_inference  model_checking  statistics  to_read  samii.cyrus 
28 days ago by cshalizi
[1707.00833] Two-sample Hypothesis Testing for Inhomogeneous Random Graphs
"The study of networks leads to a wide range of high dimensional inference problems. In many practical applications, one needs to draw inference from one or few large sparse networks. The present paper studies hypothesis testing of graphs in this high-dimensional regime, where the goal is to test between two populations of inhomogeneous random graphs defined on the same set of n vertices. The size of each population m is much smaller than n, and can even be a constant as small as 1. The critical question in this context is whether the problem is solvable for small m.
"We answer this question from a minimax testing perspective. Let P,Q be the population adjacencies of two sparse inhomogeneous random graph models, and d be a suitably defined distance function. Given a population of m graphs from each model, we derive minimax separation rates for the problem of testing P=Q against d(P,Q)>ρ. We observe that if m is small, then the minimax separation is too large for some popular choices of d, including total variation distance between corresponding distributions. This implies that some models that are widely separated in d cannot be distinguished for small m, and hence, the testing problem is generally not solvable in these cases.
"We also show that if m>1, then the minimax separation is relatively small if d is the Frobenius norm or operator norm distance between P and Q. For m=1, only the latter distance provides small minimax separation. Thus, for these distances, the problem is solvable for small m. We also present near-optimal two-sample tests in both cases, where tests are adaptive with respect to sparsity level of the graphs."
to:NB  hypothesis_testing  network_data_analysis  statistics  re:network_differences  to_read 
4 weeks ago by cshalizi
[1907.01605] Limits of Sparse Configuration Models and Beyond: Graphexes and Multi-Graphexes
"We investigate structural properties of large, sparse random graphs through the lens of "sampling convergence" (Borgs et al. (2017)). Sampling convergence generalizes left convergence to sparse graphs, and describes the limit in terms of a "graphex". We introduce a notion of sampling convergence for sequences of multigraphs, and establish the graphex limit for the configuration model, a preferential attachment model, the generalized random graph, and a bipartite variant of the configuration model. The results for the configuration model, preferential attachment model and bipartite configuration model provide necessary and sufficient conditions for these random graph models to converge. The limit for the configuration model and the preferential attachment model is an augmented version of an exchangeable random graph model introduced by Caron and Fox (2017)."
in_NB  graph_limits  to_read  probability  chayes.jennifer  borgs.christian 
4 weeks ago by cshalizi
The Standard Errors of Persistence
"A large literature on persistence finds that many modern outcomes strongly reflect characteristics of the same places in the distant past. However, alongside unusually high t statistics, these regressions display severe spatial autocorrelation in residuals, and the purpose of this paper is to examine whether these two properties might be connected. We start by running artificial regressions where both variables are spatial noise and find that, even for modest ranges of spatial correlation between points, t statistics become severely inflated leading to significance levels that are in error by several orders of magnitude. We analyse 27 persistence studies in leading journals and find that in most cases if we replace the main explanatory variable with spatial noise the fit of the regression commonly improves; and if we replace the dependent variable with spatial noise, the persistence variable can still explain it at high significance levels. We can predict in advance which persistence results might be the outcome of fitting spatial noise from the degree of spatial autocorrelation in their residuals measured by a standard Moran statistic. Our findings suggest that the results of persistence studies, and of spatial regressions more generally, might be treated with some caution in the absence of reported Moran statistics and noise simulations."
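
--- The paper's opening exercise is easy to reproduce: regress one spatially smoothed noise series on another and watch the nominal t statistics inflate. A minimal 1-D sketch, where moving-average smoothing stands in for spatial correlation (parameters are illustrative, not the paper's):

```python
import math, random, statistics

def smooth(z, w):
    # A moving average induces "spatial" correlation over a window of +/- w points.
    out = []
    for i in range(len(z)):
        window = z[max(0, i - w): i + w + 1]
        out.append(sum(window) / len(window))
    return out

def ols_t(x, y):
    # t statistic for the slope in y = a + b*x, computed as if errors were iid.
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / (n - 2)
    return b / math.sqrt(s2 / sxx)

random.seed(0)
n, w, sims = 200, 10, 200
rejections = 0
for _ in range(sims):
    x = smooth([random.gauss(0, 1) for _ in range(n)], w)
    y = smooth([random.gauss(0, 1) for _ in range(n)], w)
    if abs(ols_t(x, y)) > 1.96:   # nominal 5% test
        rejections += 1
print(rejections / sims)  # well above the nominal 0.05: pure noise "persists"
```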
to:NB  to_read  econometrics  regression  spatial_statistics  to_teach:data_over_space_and_time  via:jbdelong 
5 weeks ago by cshalizi
Cheng, Chen: Nonparametric inference via bootstrapping the debiased estimator
"In this paper, we propose to construct confidence bands by bootstrapping the debiased kernel density estimator (for density estimation) and the debiased local polynomial regression estimator (for regression analysis). The idea of using a debiased estimator was recently employed by Calonico et al. (2018b) to construct a confidence interval of the density function (and regression function) at a given point by explicitly estimating stochastic variations. We extend their ideas of using the debiased estimator and further propose a bootstrap approach for constructing simultaneous confidence bands. This modified method has an advantage that we can easily choose the smoothing bandwidth from conventional bandwidth selectors and the confidence band will be asymptotically valid. We prove the validity of the bootstrap confidence band and generalize it to density level sets and inverse regression problems. Simulation studies confirm the validity of the proposed confidence bands/sets. We apply our approach to an Astronomy dataset to show its applicability."
to:NB  to_read  statistics  bootstrap  confidence_sets  regression  density_estimation  re:ADAfaEPoV 
6 weeks ago by cshalizi
[1811.08525] Consensus and Polarisation in Competing Complex Contagion Processes
"The rate of adoption of new information depends on reinforcement from multiple sources in a way that often cannot be described by simple contagion processes. In such cases, contagion is said to be complex. Complex contagion happens in the diffusion of human behaviours, innovations, and knowledge. Based on that evidence, we propose a model that considers multiple, potentially asymmetric, and competing contagion processes and analyse its respective population-wide dynamics, bringing together ideas from complex contagion, opinion dynamics, evolutionary game theory, and language competition by shifting the focus from individuals to the properties of the diffusing processes. We show that our model spans a dynamical space in which the population exhibits patterns of consensus, dominance, and, importantly, different types of polarisation, a more diverse dynamical environment that contrasts with single simple contagion processes. We show how these patterns emerge and how different population structures modify them through a natural development of spatial correlations: structured interactions increase the range of the dominance regime by reducing that of dynamic polarisation, tight modular structures can generate structural polarisation, depending on the interplay between fundamental properties of the processes and the modularity of the interaction network."
to:NB  epidemic_models  diffusion_of_innovations  epidemiology_of_ideas  social_networks  levin.simon  to_read  re:do-institutions-evolve 
8 weeks ago by cshalizi
[1712.07248] Towards a General Large Sample Theory for Regularized Estimators
"We present a general framework for studying regularized estimators; such estimators are pervasive in estimation problems wherein "plug-in" type estimators are either ill-defined or ill-behaved. Within this framework, we derive, under primitive conditions, consistency and a generalization of the asymptotic linearity property. We also provide data-driven methods for choosing tuning parameters that, under some conditions, achieve the aforementioned properties. We illustrate the scope of our approach by studying a wide range of applications, revisiting known results and deriving new ones."
to:NB  statistics  estimation  optimization  to_read 
8 weeks ago by cshalizi
[1704.04118] From Data to Decisions: Distributionally Robust Optimization is Optimal
"We study stochastic programs where the decision-maker cannot observe the distribution of the exogenous uncertainties but has access to a finite set of independent samples from this distribution. In this setting, the goal is to find a procedure that transforms the data to an estimate of the expected cost function under the unknown data-generating distribution, i.e., a predictor, and an optimizer of the estimated cost function that serves as a near-optimal candidate decision, i.e., a prescriptor. As functions of the data, predictors and prescriptors constitute statistical estimators. We propose a meta-optimization problem to find the least conservative predictors and prescriptors subject to constraints on their out-of-sample disappointment. The out-of-sample disappointment quantifies the probability that the actual expected cost of the candidate decision under the unknown true distribution exceeds its predicted cost. Leveraging tools from large deviations theory, we prove that this meta-optimization problem admits a unique solution: The best predictor-prescriptor pair is obtained by solving a distributionally robust optimization problem over all distributions within a given relative entropy distance from the empirical distribution of the data."

--- Physicists re-inventing learning theory for generalization error bounds?
to:NB  to_read  learning_theory  large_deviations  decision_theory  statistics 
9 weeks ago by cshalizi
[1902.02580] The few-get-richer: a surprising consequence of popularity-based rankings
"Ranking algorithms play a crucial role in online platforms ranging from search engines to recommender systems. In this paper, we identify a surprising consequence of popularity-based rankings: the fewer the items reporting a given signal, the higher the share of the overall traffic they collectively attract. This few-get-richer effect emerges in settings where there are few distinct classes of items (e.g., left-leaning news sources versus right-leaning news sources), and items are ranked based on their popularity. We demonstrate analytically that the few-get-richer effect emerges when people tend to click on top-ranked items and have heterogeneous preferences for the classes of items. Using simulations, we analyze how the strength of the effect changes with assumptions about the setting and human behavior. We also test our predictions experimentally in an online experiment with human participants. Our findings have important implications to understand the spread of misinformation."
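
--- A stylized version of the mechanism (my own toy parameterization, not the paper's model): preference-driven users spread their clicks over the items of their class, position-biased users follow the popularity ranking with geometrically decaying attention, and the minority class pools its clicks over fewer items, lifting them to the top ranks:

```python
import random

random.seed(2)
n_left, n_right = 8, 2              # "R" is the minority class
classes = ["L"] * n_left + ["R"] * n_right
clicks = [0.0] * (n_left + n_right)
p_agnostic, rho = 0.3, 0.5          # share of position-biased users; attention decay

for _ in range(20000):
    ranked = sorted(range(len(clicks)), key=lambda i: -clicks[i])
    if random.random() < p_agnostic:
        # preference-agnostic user: attention decays geometrically down the ranking
        weights = [rho ** (r + 1) for r in range(len(ranked))]
        chosen = random.choices(ranked, weights)[0]
    else:
        # preference-driven user: uniform over the items of the preferred class
        pref = random.choice(["L", "R"])
        chosen = random.choice([i for i in range(len(classes)) if classes[i] == pref])
    clicks[chosen] += 1

right_share = sum(c for c, k in zip(clicks, classes) if k == "R") / sum(clicks)
print(right_share)  # the minority class collectively attracts more than half the traffic
```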
to:NB  information_retrieval  networked_life  why_oh_why_cant_we_have_a_better_press_corps  to_read 
9 weeks ago by cshalizi
Huang, Reich, Fuentes, Sankarasubramanian: Complete spatial model calibration
"Computer simulation models are central to environmental science. These mathematical models are used to understand complex weather and climate patterns and to predict the climate’s response to different forcings. Climate models are of course not perfect reflections of reality, and so comparison with observed data is needed to quantify and to correct for biases and other deficiencies. We propose a new method to calibrate model output using observed data. Our approach not only matches the marginal distributions of the model output and gridded observed data, but it simultaneously postprocesses the model output to have the same spatial correlation as the observed data. This comprehensive calibration method permits realistic spatial simulations for regional impact studies. We apply the proposed method to global climate model output in North America and show that it successfully calibrates the model output for temperature and precipitation."
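
--- Matching the marginal distributions alone is classical empirical quantile mapping; a minimal single-site sketch (the paper's contribution is doing this while also matching the spatial correlation, which this toy omits):

```python
import random

def quantile_map(model, observed):
    # Empirical quantile mapping: each model value is replaced by the observed
    # value of the same rank (assumes equal-length samples).
    order = sorted(range(len(model)), key=lambda i: model[i])
    obs_sorted = sorted(observed)
    calibrated = [0.0] * len(model)
    for rank, i in enumerate(order):
        calibrated[i] = obs_sorted[rank]
    return calibrated

random.seed(3)
observed  = [random.gauss(15.0, 3.0) for _ in range(1000)]  # e.g. temperature, deg C
model_out = [random.gauss(13.0, 5.0) for _ in range(1000)]  # biased, over-dispersed
calibrated = quantile_map(model_out, observed)

# The calibrated marginal distribution now matches the observations exactly.
assert sorted(calibrated) == sorted(observed)
```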
to:NB  spatial_statistics  simulation  model_checking  statistics  to_teach:data_over_space_and_time  to_read 
9 weeks ago by cshalizi
Panel Data Analysis via Mechanistic Models: Journal of the American Statistical Association
"Panel data, also known as longitudinal data, consist of a collection of time series. Each time series, which could itself be multivariate, comprises a sequence of measurements taken on a distinct unit. Mechanistic modeling involves writing down scientifically motivated equations describing the collection of dynamic systems giving rise to the observations on each unit. A defining characteristic of panel systems is that the dynamic interaction between units should be negligible. Panel models therefore consist of a collection of independent stochastic processes, generally linked through shared parameters while also having unit-specific parameters. To give the scientist flexibility in model specification, we are motivated to develop a framework for inference on panel data permitting the consideration of arbitrary nonlinear, partially observed panel models. We build on iterated filtering techniques that provide likelihood-based inference on nonlinear partially observed Markov process models for time series data. Our methodology depends on the latent Markov process only through simulation; this plug-and-play property ensures applicability to a large class of models. We demonstrate our methodology on a toy example and two epidemiological case studies. We address inferential and computational issues arising due to the combination of model complexity and dataset size."
to:NB  statistics  statistical_inference_for_stochastic_processes  time_series  ionides.edward  to_read  particle_filters 
9 weeks ago by cshalizi
[1904.02610] Diverse communities behave like typical random ecosystems
"With a brief letter to Nature in 1972, Robert May triggered a worldwide research program in theoretical ecology and complex systems that continues to this day. Building on powerful mathematical results about large random matrices, he argued that systems with sufficiently large numbers of interacting components are generically unstable. In the ecological context, May's thesis directly contradicted the longstanding ecological intuition that diversity promotes stability. In economics and finance, May's work helped to consolidate growing concerns about the fragility of an increasingly interconnected global marketplace. In this Letter, we draw on recent theoretical progress in random matrix theory and statistical physics to fundamentally extend and reinterpret May's theorem. We confirm that a wide range of ecological models become unstable at the point predicted by May, even when the models do not strictly follow his assumptions. Surprisingly, increasing the interaction strength or diversity beyond the May threshold results in a reorganization of the ecosystem -- through extinction of a fixed fraction of species -- into a new stable state whose properties are well described by purely random interactions. This self-organized state remains stable for arbitrarily large ecosystems and suggests a new interpretation of May's original conclusions: when interacting complex systems with many components become sufficiently large, they will generically undergo a transition to a "typical" self-organized, stable state."
to:NB  ecology  random_matrices  self-organization  to_read 
9 weeks ago by cshalizi
[1906.05433] Tackling Climate Change with Machine Learning
"Climate change is one of the greatest challenges facing humanity, and we, as machine learning experts, may wonder how we can help. Here we describe how machine learning can be a powerful tool in reducing greenhouse gas emissions and helping society adapt to a changing climate. From smart grids to disaster management, we identify high impact problems where existing gaps can be filled by machine learning, in collaboration with other fields. Our recommendations encompass exciting research questions as well as promising business opportunities. We call on the machine learning community to join the global effort against climate change."

--- My gut reaction is that this is well-intentioned but point-missing, but note the final tags.
to:NB  climate_change  machine_learning  to_read  to_be_shot_after_a_fair_trial 
9 weeks ago by cshalizi
The Sad Truth about Happiness Scales | Journal of Political Economy: Ahead of Print
"Happiness is reported in ordered intervals (e.g., very, pretty, not too happy). We review and apply standard statistical results to determine when such data permit identification of two groups’ relative average happiness. The necessary conditions for nonparametric identification are strong and unlikely to ever be satisfied. Standard parametric approaches cannot identify this ranking unless the variances are exactly equal. If not, ordered probit findings can be reversed by lognormal transformations. For nine prominent happiness research areas, conditions for nonparametric identification are rejected and standard parametric results are reversed using plausible transformations. Tests for a common reporting function consistently reject."
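
--- The reversal is just the fact that monotone transformations need not preserve an ordering of means when variances differ: if latent happiness is N(μ, σ²), then E[e^X] = exp(μ + σ²/2), so the group with the higher mean but lower variance can come out behind after a lognormal transformation. The numbers below are made up:

```python
import math

# Latent happiness: group A has the higher mean, group B the higher variance.
mu_a, var_a = 1.0, 0.1
mu_b, var_b = 0.8, 1.0

mean_a, mean_b = mu_a, mu_b                  # means on the latent scale
lmean_a = math.exp(mu_a + var_a / 2)         # means after the transform x -> e^x
lmean_b = math.exp(mu_b + var_b / 2)

print(mean_a > mean_b)    # True: A is happier on the latent scale
print(lmean_a > lmean_b)  # False: the ranking reverses under the transformation
```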
to:NB  to_read  social_measurement  psychometrics  statistics  via:phnk 
10 weeks ago by cshalizi
Life after Lead: Effects of Early Interventions for Children Exposed to Lead
"Lead pollution is consistently linked to cognitive and behavioral impairments, yet little is known about the benefits of public health interventions for children exposed to lead. This paper estimates the long-term impacts of early-life interventions (e.g. lead remediation, nutritional assessment, medical evaluation, developmental surveillance, and public assistance referrals) recommended for lead-poisoned children. Using linked administrative data from Charlotte, NC, we compare outcomes for children who are similar across observable characteristics but differ in eligibility for intervention due to blood lead test results. We find that the negative outcomes previously associated with early-life exposure can largely be reversed by intervention."

--- The last tag, as usual, is conditional on liking the paper after reading it, and on replication data being available.
to:NB  to_read  lead  cognitive_development  sociology  causal_inference  to_teach:undergrad-ADA 
10 weeks ago by cshalizi
Phys. Rev. E 99, 062301 (2019) - Social clustering in epidemic spread on coevolving networks
"Even though transitivity is a central structural feature of social networks, its influence on epidemic spread on coevolving networks has remained relatively unexplored. Here we introduce and study an adaptive susceptible-infected-susceptible (SIS) epidemic model wherein the infection and network coevolve with nontrivial probability to close triangles during edge rewiring, leading to substantial reinforcement of network transitivity. This model provides an opportunity to study the role of transitivity in altering the SIS dynamics on a coevolving network. Using numerical simulations and approximate master equations (AMEs), we identify and examine a rich set of dynamical features in the model. In many cases, AMEs including transitivity reinforcement provide accurate predictions of stationary-state disease prevalence and network degree distributions. Furthermore, for some parameter settings, the AMEs accurately trace the temporal evolution of the system. We show that higher transitivity reinforcement in the model leads to lower levels of infective individuals in the population, when closing a triangle is the dominant rewiring mechanism. These methods and results may be useful in developing ideas and modeling strategies for controlling SIS-type epidemics."
in_NB  epidemic_models  mucha.peter_j.  networks  re:do-institutions-evolve  to_read 
10 weeks ago by cshalizi
[1905.12580] Model Similarity Mitigates Test Set Overuse
"Excessive reuse of test data has become commonplace in today's machine learning workflows. Popular benchmarks, competitions, industrial scale tuning, among other applications, all involve test data reuse beyond guidance by statistical confidence bounds. Nonetheless, recent replication studies give evidence that popular benchmarks continue to support progress despite years of extensive reuse. We proffer a new explanation for the apparent longevity of test data: Many proposed models are similar in their predictions and we prove that this similarity mitigates overfitting. Specifically, we show empirically that models proposed for the ImageNet ILSVRC benchmark agree in their predictions well beyond what we can conclude from their accuracy levels alone. Likewise, models created by large scale hyperparameter search enjoy high levels of similarity. Motivated by these empirical observations, we give a non-asymptotic generalization bound that takes similarity into account, leading to meaningful confidence bounds in practical settings."
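
--- The quantity at stake: two classifiers with accuracies a1, a2 on k classes whose errors were independent would agree at rate roughly a1*a2 + (1 - a1)*(1 - a2)/(k - 1); observed agreement far above that baseline is the similarity the authors exploit. A toy check with made-up predictions:

```python
def agreement(p, q):
    return sum(x == y for x, y in zip(p, q)) / len(p)

# Made-up labels and two models that share all of their mistakes.
truth   = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0] * 10
model_1 = [y if i % 10 else (y + 1) % 3 for i, y in enumerate(truth)]
model_2 = [y if i % 10 else (y + 1) % 3 for i, y in enumerate(truth)]

a1 = agreement(model_1, truth)
a2 = agreement(model_2, truth)
k = 3
# Expected agreement if the two models erred independently.
independent = a1 * a2 + (1 - a1) * (1 - a2) / (k - 1)
observed = agreement(model_1, model_2)
print(a1, observed, independent)  # identical mistakes: observed = 1.0 > independent
```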

--- So, the only reason what we're doing works is that we're not really changing very much?
to:NB  learning_theory  cross-validation  to_read  recht.benjamin 
11 weeks ago by cshalizi
[1905.12202] Empirically Measuring Concentration: Fundamental Limits on Intrinsic Robustness
"Many recent works have shown that adversarial examples that fool classifiers can be found by minimally perturbing a normal input. Recent theoretical results, starting with Gilmer et al. (2018), show that if the inputs are drawn from a concentrated metric probability space, then adversarial examples with small perturbation are inevitable. A concentrated space has the property that any subset with Ω(1) (e.g., 1/100) measure, according to the imposed distribution, has small distance to almost all (e.g., 99/100) of the points in the space. It is not clear, however, whether these theoretical results apply to actual distributions such as images. This paper presents a method for empirically measuring and bounding the concentration of a concrete dataset which is proven to converge to the actual concentration. We use it to empirically estimate the intrinsic robustness to ℓ∞ and ℓ2 perturbations of several image classification benchmarks."
in_NB  to_read  adversarial_examples  concentration_of_measure  probability  statistics  learning_theory 
11 weeks ago by cshalizi
[1701.00505] Statistical inference for network samples using subgraph counts
"We consider that a network is an observation, and a collection of observed networks forms a sample. In this setting, we provide methods to test whether all observations in a network sample are drawn from a specified model. We achieve this by deriving, under the null of the graphon model, the joint asymptotic properties of average subgraph counts as the number of observed networks increases but the number of nodes in each network remains finite. In doing so, we do not require that each observed network contains the same number of nodes, or is drawn from the same distribution. Our results yield joint confidence regions for subgraph counts, and therefore methods for testing whether the observations in a network sample are drawn from: a specified distribution, a specified model, or from the same model as another network sample. We present simulation experiments and an illustrative example on a sample of brain networks where we find that highly creative individuals' brains present significantly more short cycles."
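
--- The ingredients are just averages of subgraph densities over the sample; a crude z statistic against an Erdős–Rényi null shows the bookkeeping (the paper derives the proper joint asymptotics and confidence regions):

```python
import math, random, statistics
from itertools import combinations

def edge_density(adj):
    pairs = list(combinations(range(len(adj)), 2))
    return sum(adj[i][j] for i, j in pairs) / len(pairs)

def triangle_density(adj):
    triples = list(combinations(range(len(adj)), 3))
    return sum(adj[i][j] * adj[j][k] * adj[i][k] for i, j, k in triples) / len(triples)

def er_graph(n, p):
    # Erdős–Rényi graph as a symmetric 0/1 adjacency matrix.
    adj = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if random.random() < p:
            adj[i][j] = adj[j][i] = 1
    return adj

random.seed(4)
p0 = 0.3
sample = [er_graph(20, p0) for _ in range(50)]   # m = 50 networks on n = 20 nodes
tri = [triangle_density(g) for g in sample]
# Crude z statistic for H0: expected triangle density equals p0**3 (the ER null).
z = (statistics.fmean(tri) - p0 ** 3) / (statistics.stdev(tri) / math.sqrt(len(tri)))
print(statistics.fmean(tri), z)   # compare z to N(0,1) critical values
```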
to:NB  to_read  network_data_analysis  graphical_models  re:network_differences 
11 weeks ago by cshalizi
[1905.11381] Trust but Verify: An Information-Theoretic Explanation for the Adversarial Fragility of Machine Learning Systems, and a General Defense against Adversarial Attacks
"Deep-learning based classification algorithms have been shown to be susceptible to adversarial attacks: minor changes to the input of classifiers can dramatically change their outputs, while being imperceptible to humans. In this paper, we present a simple hypothesis about a feature compression property of artificial intelligence (AI) classifiers and present theoretical arguments to show that this hypothesis successfully accounts for the observed fragility of AI classifiers to small adversarial perturbations. Drawing on ideas from information and coding theory, we propose a general class of defenses for detecting classifier errors caused by abnormally small input perturbations. We further show theoretical guarantees for the performance of this detection method. We present experimental results with (a) a voice recognition system, and (b) a digit recognition system using the MNIST database, to demonstrate the effectiveness of the proposed defense methods. The ideas in this paper are motivated by a simple analogy between AI classifiers and the standard Shannon model of a communication system."
in_NB  information_theory  adversarial_examples  to_read  to_be_shot_after_a_fair_trial 
11 weeks ago by cshalizi
[1905.11382] State-Reification Networks: Improving Generalization by Modeling the Distribution of Hidden Representations
"Machine learning promises methods that generalize well from finite labeled data. However, the brittleness of existing neural net approaches is revealed by notable failures, such as the existence of adversarial examples that are misclassified despite being nearly identical to a training example, or the inability of recurrent sequence-processing nets to stay on track without teacher forcing. We introduce a method, which we refer to as "state reification", that involves modeling the distribution of hidden states over the training data and then projecting hidden states observed during testing toward this distribution. Our intuition is that if the network can remain in a familiar manifold of hidden space, subsequent layers of the net should be well trained to respond appropriately. We show that this state-reification method helps neural nets to generalize better, especially when labeled data are sparse, and also helps overcome the challenge of achieving robust generalization with adversarial training."

--- My suspicion, admittedly based only on the abstract, is that this will, at best, be yet another re-invention of predictive states (http://bactra.org/notebooks/prediction-process.html). That would not, actually, be a bad thing.
to:NB  to_read  neural_networks  learning_theory  your_favorite_deep_neural_network_sucks  adversarial_examples 
11 weeks ago by cshalizi
[1901.08082] Cooperative Online Learning: Keeping your Neighbors Updated
"We study an asynchronous online learning setting with a network of agents. At each time step, some of the agents are activated, requested to make a prediction, and pay the corresponding loss. The loss function is then revealed to these agents and also to their neighbors in the network. When activations are stochastic, we show that the regret achieved by N agents running the standard Online Mirror Descent is O(√(αT)), where T is the horizon and α≤N is the independence number of the network. This is in contrast to the regret Ω(√(NT)) which N agents incur in the same setting when feedback is not shared. We also show a matching lower bound of order √(αT) that holds for any given network. When the pattern of agent activations is arbitrary, the problem changes significantly: we prove a Ω(T) lower bound on the regret that holds for any online algorithm oblivious to the feedback source."
to:NB  social_learning  online_learning  low-regret_learning  learning_theory  cesa-bianchi.nicolo  monteleoni.claire  to_read  re:democratic_cognition 
11 weeks ago by cshalizi
[1905.10854] All Neural Networks are Created Equal
"One of the unresolved questions in the context of deep learning is the triumph of GD based optimization, which is guaranteed to converge to one of many local minima. To shed light on the nature of the solutions that are thus being discovered, we investigate the ensemble of solutions reached by the same network architecture, with different random initialization of weights and random mini-batches. Surprisingly, we observe that these solutions are in fact very similar - more often than not, each train and test example is either classified correctly by all the networks, or by none at all. Moreover, all the networks seem to share the same learning dynamics, whereby initially the same train and test examples are incorporated into the learnt model, followed by other examples which are learnt in roughly the same order. When different neural network architectures are compared, the same learning dynamics is observed even when one architecture is significantly stronger than the other and achieves higher accuracy. Finally, when investigating other methods that involve the gradual refinement of a solution, such as boosting, once again we see the same learning pattern. In all cases, it appears as if all the classifiers start by learning to classify correctly the same train and test examples, while the more powerful classifiers continue to learn to classify correctly additional examples. These results are incredibly robust, observed for a large variety of architectures, hyperparameters and different datasets of images. Thus we observe that different classification solutions may be discovered by different means, but typically they evolve in roughly the same manner and demonstrate a similar success and failure behavior. For a given dataset, such behavior seems to be strongly correlated with effective generalization, while the induced ranking of examples may reflect inherent structure in the data."

!!!
to:NB  to_read  optimization  machine_learning  neural_networks  your_favorite_deep_neural_network_sucks 
12 weeks ago by cshalizi
[1905.10857] Causal Discovery and Forecasting in Nonstationary Environments with State-Space Models
"In many scientific fields, such as economics and neuroscience, we are often faced with nonstationary time series, and concerned with both finding causal relations and forecasting the values of variables of interest, both of which are particularly challenging in such nonstationary environments. In this paper, we study causal discovery and forecasting for nonstationary time series. By exploiting a particular type of state-space model to represent the processes, we show that nonstationarity helps to identify causal structure and that forecasting naturally benefits from learned causal knowledge. Specifically, we allow changes in both causal strengths and noise variances in the nonlinear state-space models, which, interestingly, renders both the causal structure and model parameters identifiable. Given the causal model, we treat forecasting as a problem in Bayesian inference in the causal model, which exploits the time-varying property of the data and adapts to new observations in a principled manner. Experimental results on synthetic and real-world data sets demonstrate the efficacy of the proposed methods."
to:NB  causal_inference  causal_discovery  state-space_models  time_series  non-stationarity  statistics  kith_and_kin  glymour.clark  to_read  zhang.kun 
12 weeks ago by cshalizi
[1808.06581] The Deconfounded Recommender: A Causal Inference Approach to Recommendation
"The goal of recommendation is to show users items that they will like. Though usually framed as a prediction, the spirit of recommendation is to answer an interventional question---for each user and movie, what would the rating be if we "forced" the user to watch the movie? To this end, we develop a causal approach to recommendation, one where watching a movie is a "treatment" and a user's rating is an "outcome." The problem is there may be unobserved confounders, variables that affect both which movies the users watch and how they rate them; unobserved confounders impede causal predictions with observational data. To solve this problem, we develop the deconfounded recommender, a way to use classical recommendation models for causal recommendation. Following Wang & Blei [23], the deconfounded recommender involves two probabilistic models. The first models which movies the users watch; it provides a substitute for the unobserved confounders. The second one models how each user rates each movie; it employs the substitute to help account for confounders. This two-stage approach removes bias due to confounding. It improves recommendation and enjoys stable performance against interventions on test sets."
in_NB  causal_inference  collaborative_filtering  blei.david  to_teach:data-mining  to_read 
12 weeks ago by cshalizi
Pensky : Dynamic network models and graphon estimation
"In the present paper, we consider a dynamic stochastic network model. The objective is estimation of the tensor of connection probabilities Λ when it is generated by a Dynamic Stochastic Block Model (DSBM) or a dynamic graphon. In particular, in the context of the DSBM, we derive a penalized least squares estimator Λ̂ of Λ and show that Λ̂ satisfies an oracle inequality and also attains minimax lower bounds for the risk. We extend those results to estimation of Λ when it is generated by a dynamic graphon function. The estimators constructed in the paper are adaptive to the unknown number of blocks in the context of the DSBM or to the smoothness of the graphon function. The technique relies on the vectorization of the model and leads to much simpler mathematical arguments than the ones used previously in the stationary set up. In addition, all results in the paper are nonasymptotic and allow a variety of extensions."
to:NB  to_read  graph_limits  nonparametrics  network_data_analysis  re:smoothing_adjacency_matrices 
12 weeks ago by cshalizi
From Stochastic Thermodynamics to Thermodynamic Inference | Annual Review of Condensed Matter Physics
"For a large class of nonequilibrium systems, thermodynamic notions like work, heat, and, in particular, entropy production can be identified on the level of fluctuating dynamical trajectories. Within stochastic thermodynamics various fluctuation theorems relating these quantities have been proven. Their application to experimental systems requires that all relevant mesostates are accessible. Recent advances address the typical situation that only partial, or coarse-grained, information about a system is available. Thermodynamic inference as a general strategy uses consistency constraints derived from stochastic thermodynamics to infer otherwise hidden properties of nonequilibrium systems. An important class in this respect are active particles, for which we resolve the conflicting strategies that have been proposed to identify entropy production. As a paradigm for thermodynamic inference, the thermodynamic uncertainty relation provides a lower bound on the entropy production through measurements of the dispersion of any current in the system. Likewise, it quantifies the cost of precision for biomolecular processes. Generalizations and ramifications allow the inference of, inter alia, model-free upper bounds on the efficiency of molecular motors and of the minimal number of intermediate states in enzymatic networks."
to:NB  statistical_mechanics  thermodynamics  fluctuation-response  non-equilibrium  to_read 
12 weeks ago by cshalizi
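--- The thermodynamic uncertainty relation the abstract leans on is, in its standard steady-state form (stated here from memory of the Barato/Seifert literature, not quoted from this review):

```latex
% Thermodynamic uncertainty relation (steady state): any current J
% accumulated up to time t, with mean \langle J \rangle and variance
% \mathrm{Var}(J), bounds the total entropy production \Delta\sigma:
\frac{\mathrm{Var}(J)}{\langle J \rangle^{2}} \;\ge\; \frac{2 k_{B}}{\Delta\sigma}
\qquad\Longleftrightarrow\qquad
\Delta\sigma \;\ge\; \frac{2 k_{B}\,\langle J \rangle^{2}}{\mathrm{Var}(J)} .
```

Measuring the mean and dispersion of any single current thus gives a model-free lower bound on dissipation, which is the "inference" move the abstract describes.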
Turbulence Modeling in the Age of Data | Annual Review of Fluid Mechanics
"Data from experiments and direct simulations of turbulence have historically been used to calibrate simple engineering models such as those based on the Reynolds-averaged Navier–Stokes (RANS) equations. In the past few years, with the availability of large and diverse data sets, researchers have begun to explore methods to systematically inform turbulence models with data, with the goal of quantifying and reducing model uncertainties. This review surveys recent developments in bounding uncertainties in RANS models via physical constraints, in adopting statistical inference to characterize model coefficients and estimate discrepancy, and in using machine learning to improve turbulence models. Key principles, achievements, and challenges are discussed. A central perspective advocated in this review is that by exploiting foundational knowledge in turbulence modeling and physical constraints, researchers can use data-driven approaches to yield useful predictive models."
to:NB  turbulence  statistics  to_read 
12 weeks ago by cshalizi
The Fokker–Planck Approach to Complex Spatiotemporal Disordered Systems | Annual Review of Condensed Matter Physics
"When the complete understanding of a complex system is not available, as, e.g., for systems considered in the real world, we need a top-down approach to complexity. In this approach, one may desire to understand general multipoint statistics. Here, such a general approach is presented and discussed based on examples from turbulence and sea waves. Our main idea is based on the cascade picture of turbulence, entangling fluctuations from large to small scales. Inspired by this cascade picture, we express the general multipoint statistics by the statistics of scale-dependent fluctuations of variables and relate it to a scale-dependent process, which finally is a stochastic cascade process. We show how to extract from empirical data a Fokker–Planck equation for this cascade process, which allows the generation of surrogate data to forecast extreme events as well as to develop a nonequilibrium thermodynamics for the complex systems. For each cascade event, an entropy production can be determined. These entropies accurately fulfill a rigorous law, namely the integral fluctuation theorem."
to:NB  stochastic_processes  random_fields  physics  statistical_mechanics  markov_models  macro_from_micro  to_be_shot_after_a_fair_trial  non-equilibrium  to_read 
12 weeks ago by cshalizi
Identification and Extrapolation of Causal Effects with Instrumental Variables | Annual Review of Economics
"Instrumental variables (IV) are widely used in economics to address selection on unobservables. Standard IV methods produce estimates of causal effects that are specific to individuals whose behavior can be manipulated by the instrument at hand. In many cases, these individuals are not the same as those who would be induced to treatment by an intervention or policy of interest to the researcher. The average causal effect for the two groups can differ significantly if the effect of the treatment varies systematically with unobserved factors that are correlated with treatment choice. We review the implications of this type of unobserved heterogeneity for the interpretation of standard IV methods and for their relevance to policy evaluation. We argue that making inferences about policy-relevant parameters typically requires extrapolating from the individuals affected by the instrument to the individuals who would be induced to treatment by the policy under consideration. We discuss a variety of alternatives to standard IV methods that can be used to rigorously perform this extrapolation. We show that many of these approaches can be nested as special cases of a general framework that embraces the possibility of partial identification."

--- Memo to self: Read this before revising the IV sections of ADAfaEPoV.
to:NB  causal_inference  instrumental_variables  partial_identification  statistics  re:ADAfaEPoV  to_read 
12 weeks ago by cshalizi
On the Statistical Formalism of Uncertainty Quantification | Annual Review of Statistics and Its Application
"The use of models to try to better understand reality is ubiquitous. Models have proven useful in testing our current understanding of reality; for instance, climate models of the 1980s were built for science discovery, to achieve a better understanding of the general dynamics of climate systems. Scientific insights often take the form of general qualitative predictions (i.e., “under these conditions, the Earth's poles will warm more than the rest of the planet”); such use of models differs from making quantitative forecasts of specific events (i.e. “high winds at noon tomorrow at London's Heathrow Airport”). It is sometimes hoped that, after sufficient model development, any model can be used to make quantitative forecasts for any target system. Even if that were the case, there would always be some uncertainty in the prediction. Uncertainty quantification aims to provide a framework within which that uncertainty can be discussed and, ideally, quantified, in a manner relevant to practitioners using the forecast system. A statistical formalism has developed that claims to be able to accurately assess the uncertainty in prediction. This article is a discussion of if and when this formalism can do so. The article arose from an ongoing discussion between the authors concerning this issue, the second author generally being considerably more skeptical concerning the utility of the formalism in providing quantitative decision-relevant information."
to:NB  to_read  statistics  prediction  risk_vs_uncertainty  smith.leonard  berger.james  foundations_of_statistics 
12 weeks ago by cshalizi
[1901.10113] Self-organization of action hierarchy and compositionality by reinforcement learning with recurrent networks
"Recurrent neural networks (RNNs) for reinforcement learning (RL) have shown distinct advantages, e.g., solving memory-dependent tasks and meta-learning. However, little effort has been spent on improving RNN architectures and on understanding the underlying neural mechanisms for performance gain. In this paper, we propose a novel, multiple-timescale, stochastic RNN for RL. Empirical results show that the network can autonomously learn to abstract sub-goals and can self-develop an action hierarchy using internal dynamics in a challenging continuous control task. Furthermore, we show that the self-developed compositionality of the network enhances faster re-learning when adapting to a new task that is a re-composition of previously learned sub-goals, than when starting from scratch. We also found that improved performance can be achieved when neural activities are subject to stochastic rather than deterministic dynamics."

--- We're going to re-invent Miller, Galanter and Pribram (1960) _Plans and the Structure of Behavior_, aren't we?
to:NB  to_read  reinforcement_learning  neural_networks  self-organization 
12 weeks ago by cshalizi
Multiresolution Network Models: Journal of Computational and Graphical Statistics: Vol 28, No 1
"Many existing statistical and machine learning tools for social network analysis focus on a single level of analysis. Methods designed for clustering optimize a global partition of the graph, whereas projection-based approaches (e.g., the latent space model in the statistics literature) represent in rich detail the roles of individuals. Many pertinent questions in sociology and economics, however, span multiple scales of analysis. Further, many questions involve comparisons across disconnected graphs that will, inevitably, be of different sizes, either due to missing data or the inherent heterogeneity in real-world networks. We propose a class of network models that represent network structure on multiple scales and facilitate comparison across graphs with different numbers of individuals. These models differentially invest modeling effort within subgraphs of high density, often termed communities, while maintaining a parsimonious structure between said subgraphs. We show that our model class is projective, highlighting an ongoing discussion in the social network modeling literature on the dependence of inference paradigms on the size of the observed graph. We illustrate the utility of our method using data on household relations from Karnataka, India. Supplementary material for this article is available online."
to:NB  to_read  network_data_analysis  statistics  mccormick.tyler  fosdick.bailey  to_teach:baby-nets  re:fractal_network_asymptotics 
12 weeks ago by cshalizi
Lee , Song : Stable limit theorems for empirical processes under conditional neighborhood dependence
"This paper introduces a new concept of stochastic dependence among many random variables which we call conditional neighborhood dependence (CND). Suppose that there are a set of random variables and a set of sigma algebras where both sets are indexed by the same set endowed with a neighborhood system. When the set of random variables satisfies CND, any two non-adjacent sets of random variables are conditionally independent given sigma algebras having indices in one of the two sets’ neighborhood. Random variables with CND include those with conditional dependency graphs and a class of Markov random fields with a global Markov property. The CND property is useful for modeling cross-sectional dependence governed by a complex, large network. This paper provides two main results. The first result is a stable central limit theorem for a sum of random variables with CND. The second result is a Donsker-type result of stable convergence of empirical processes indexed by a class of functions satisfying a certain bracketing entropy condition when the random variables satisfy CND."
to:NB  to_read  empirical_processes  random_fields  stochastic_processes  central_limit_theorem 
12 weeks ago by cshalizi
McGoff , Mukherjee , Pillai : Statistical inference for dynamical systems: A review
"The topic of statistical inference for dynamical systems has been studied widely across several fields. In this survey we focus on methods related to parameter estimation for nonlinear dynamical systems. Our objective is to place results across distinct disciplines in a common setting and highlight opportunities for further research."
to:NB  to_read  dynamical_systems  statistical_inference_for_stochastic_processes  stochastic_processes  statistics  re:almost_none  re:stacs 
12 weeks ago by cshalizi
On the Interpretation of do(x) : Journal of Causal Inference
"This paper provides empirical interpretation of the do(x) operator when applied to non-manipulable variables such as race, obesity, or cholesterol level. We view do(x) as an ideal intervention that provides valuable information on the effects of manipulable variables and is thus empirically testable. We draw parallels between this interpretation and ways of enabling machines to learn effects of untried actions from those tried. We end with the conclusion that researchers need not distinguish manipulable from non-manipulable variables; both types are equally eligible to receive the do(x) operator and to produce useful information for decision makers."
to:NB  causality  pearl.judea  re:ADAfaEPoV  to_read 
12 weeks ago by cshalizi
JPAE at 25: Looking back and moving forward on teaching evaluations: Journal of Public Affairs Education: Vol 25, No 1
"In many if not most colleges and universities in the United States, raw scores from Student Evaluations of Teaching (SETs) are the primary tool of teaching assessment, and teaching evaluations often have real consequences for promotion and tenure. In 2005, JPAE published an article on teaching evaluations, and this article added to what was at that time a somewhat thin literature indicating that SETs are systematically biased against female faculty, and probably against older and minority faculty. Since that time, this literature has swelled and grown and now the evidence that SETs are invalid and systematically biased is too strong to ignore. Over its first 25 years, JPAE has been a force for good in public affairs education. As JPAE moves into its next 25 years, it should take a principled and evidence-based stand against the use of raw SETs as an important indicator of teaching quality, and should encourage high-quality articles studying other methods of assessing teaching so that we can learn what approaches are better."
to:NB  to_read  teaching  social_measurement  i_want_to_believe  to_be_shot_after_a_fair_trial 
may 2019 by cshalizi
[1904.06019] Conformal Prediction Under Covariate Shift
"We extend conformal prediction methodology beyond the case of exchangeable data. In particular, we show that a weighted version of conformal prediction can be used to compute distribution-free prediction intervals for problems in which the test and training covariate distributions differ, but the likelihood ratio between these two distributions is known---or, in practice, can be estimated accurately with access to a large set of unlabeled data (test covariate points). Our weighted extension of conformal prediction also applies more generally, to settings in which the data satisfies a certain weighted notion of exchangeability. We discuss other potential applications of our new conformal methodology, including latent variable and missing data problems."
to:NB  to_read  statistics  prediction  conformal_prediction  ramdas.aaditya  tibshirani.ryan  kith_and_kin  covariate_shift 
may 2019 by cshalizi
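--- A minimal sketch of the weighted split-conformal idea (function name, tie-breaking, and quantile convention are my guesses at one reasonable implementation, not the authors' code): reweight the calibration scores by the likelihood ratio between test and training covariate distributions, give the test point its own weight mass, and take the weighted (1 - alpha) quantile.

```python
import numpy as np

def weighted_conformal_threshold(scores_cal, w_cal, w_test, alpha=0.1):
    """Weighted (1 - alpha) quantile of calibration nonconformity scores,
    with likelihood-ratio weights w = dP_test/dP_train; the prediction
    interval at x_test is then mu(x_test) +/- threshold."""
    order = np.argsort(scores_cal)
    s = np.asarray(scores_cal, dtype=float)[order]
    w = np.asarray(w_cal, dtype=float)[order]
    p = np.concatenate([w, [w_test]])
    p = p / p.sum()                # normalized weights; last entry = test point
    cdf = np.cumsum(p[:-1])        # weighted CDF over calibration scores
    idx = np.searchsorted(cdf, 1 - alpha)
    return np.inf if idx >= len(s) else s[idx]

# With uniform weights this collapses to ordinary split conformal:
q = weighted_conformal_threshold(np.arange(1, 100), np.ones(99), 1.0, alpha=0.095)
```

Heavily weighting the test point (i.e., strong covariate shift toward regions the training data rarely visits) widens the interval, in the extreme to the trivial infinite one.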
[1705.08527] Causal inference for social network data
"We extend recent work by van der Laan (2014) on causal inference for causally connected units to more general social network settings. Our asymptotic results allow for dependence of each observation on a growing number of other units as sample size increases. We are not aware of any previous methods for inference about network members in observational settings that allow the number of ties per node to increase as the network grows. While previous methods have generally implicitly focused on one of two possible sources of dependence among social network observations, we allow for both dependence due to contagion, or transmission of information across network ties, and for dependence due to latent similarities among nodes sharing ties. We describe estimation and inference for causal effects that are specifically of interest in social network settings."
to:NB  to_read  heard_the_talk  causal_inference  network_data_analysis  kith_and_kin  ogburn.elizabeth  van_der_laan.mark  re:homophily_and_confounding  to_teach:baby-nets 
april 2019 by cshalizi
A Spline Theory of Deep Learning
"We build a rigorous bridge between deep networks (DNs) and approximation theory via spline functions and operators. Our key result is that a large class of DNs can be written as a composition of max-affine spline operators (MASOs), which provide a powerful portal through which to view and analyze their inner workings. For instance, conditioned on the input signal, the output of a MASO DN can be written as a simple affine transformation of the input. This implies that a DN constructs a set of signal-dependent, class-specific templates against which the signal is compared via a simple inner product; we explore the links to the classical theory of optimal classification via matched filters and the effects of data memorization. Going further, we propose a simple penalty term that can be added to the cost function of any DN learning algorithm to force the templates to be orthogonal with each other; this leads to significantly improved classification performance and reduced overfitting with no change to the DN architecture. The spline partition of the input signal space opens up a new geometric avenue to study how DNs organize signals in a hierarchical fashion. As an application, we develop and validate a new distance metric for signals that quantifies the difference between their partition encodings."
to:NB  to_read  approximation  splines  neural_networks  machine_learning  your_favorite_deep_neural_network_sucks  via:csantos 
april 2019 by cshalizi
[1903.08560] Surprises in High-Dimensional Ridgeless Least Squares Interpolation
"Interpolators---estimators that achieve zero training error---have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type. In this paper, we study minimum ℓ2-norm ('ridgeless') interpolation in high-dimensional least squares regression. We consider two different models for the feature distribution: a linear model, where the feature vectors x_i ∈ ℝ^p are obtained by applying a linear transform to a vector of i.i.d. entries, x_i = Σ^{1/2} z_i (with z_i ∈ ℝ^p); and a nonlinear model, where the feature vectors are obtained by passing the input through a random one-layer neural network, x_i = φ(W z_i) (with z_i ∈ ℝ^d, W ∈ ℝ^{p×d} a matrix of i.i.d. entries, and φ an activation function acting componentwise on W z_i). We recover---in a precise quantitative way---several phenomena that have been observed in large-scale neural networks and kernel machines, including the 'double descent' behavior of the prediction risk, and the potential benefits of overparametrization."

--- "Heard the talk" = "Ryan came into my office to explain it all because he was so enthused".
to:NB  to_read  regression  high-dimensional_statistics  interpolation  kith_and_kin  tibshirani.ryan  rosset.saharon  montanari.andrea  hastie.trevor  statistics  neural_networks  heard_the_talk 
april 2019 by cshalizi
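--- The "ridgeless" estimator is just the minimum-ℓ2-norm least-squares solution, computable with a pseudoinverse. A tiny sketch (toy data and setup my own) of the two properties the abstract leans on: zero training error in the overparametrized regime, and minimality of the norm among all interpolators.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 80                       # overparametrized regime: p > n
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Minimum-l2-norm least squares ("ridgeless"): beta_hat = pinv(X) @ y.
beta_hat = np.linalg.pinv(X) @ y
train_mse = np.mean((X @ beta_hat - y) ** 2)   # ~0: it interpolates

# Any other interpolator beta_hat + v, with v in the null space of X,
# fits the training data equally well but has strictly larger norm,
# since beta_hat lies in the row space of X and v is orthogonal to it.
_, _, Vt = np.linalg.svd(X)         # Vt is p x p; rows n..p-1 span null(X)
v = Vt[-1]
alt = beta_hat + v
alt_mse = np.mean((X @ alt - y) ** 2)
```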
[1903.08687] On approximate validation of models: A Kolmogorov-Smirnov based approach
"Classical tests of fit typically reject a model for large enough real data samples. In contrast, often in statistical practice a model offers a good description of the data even though it is not the "true" random generator. We consider a more flexible approach based on contamination neighbourhoods around a model. Using trimming methods and the Kolmogorov metric we introduce a functional statistic measuring departures from a contaminated model and the associated estimator corresponding to its sample version. We show how this estimator allows testing of fit for the (slightly) contaminated model vs sensible deviations from it, with uniformly exponentially small type I and type II error probabilities. We also address the asymptotic behavior of the estimator showing that, under suitable regularity conditions, it asymptotically behaves as the supremum of a Gaussian process. As an application we explore methods of comparison between descriptive models based on the paradigm of model falseness. We also include some connections of our approach with the False-Discovery-Rate setting, showing competitive behavior when estimating the contamination level, although applicable in a wider framework."

--- This would be very cool if it does what they say they want it to.
to:NB  to_read  goodness-of-fit  statistics  stochastic_models 
april 2019 by cshalizi
[1903.08766] A Method for Measuring Network Effects of One-to-One Communication Features in Online A/B Tests
"A/B testing is an important decision-making tool in product development because it can provide an accurate estimate of the average treatment effect of new features, allowing developers to understand the business impact of changes to products or algorithms. However, an important assumption of A/B testing, the Stable Unit Treatment Value Assumption (SUTVA), is not always valid, especially for products that facilitate interactions between individuals. In contexts like one-to-one messaging we should expect network interference; if an experimental manipulation is effective, members of the treatment group are likely to influence members of the control group by sending them messages, violating this assumption. In this paper, we propose a novel method that can be used to account for network effects when A/B testing changes to one-to-one interactions. Our method is an edge-based analysis that can be applied to standard Bernoulli randomized experiments to retrieve an average treatment effect that is not influenced by network interference. We develop a theoretical model, and methods for computing point estimates and variances of effects of interest via network-consistent permutation testing. We then apply our technique to real data from experiments conducted on the messaging product at LinkedIn. We find empirical support for our model, and evidence that the standard method of analysis for A/B tests underestimates the impact of new features in one-to-one messaging contexts."
to:NB  to_read  experimental_design  network_data_analysis  statistics  re:do_not_adjust_your_receiver 
april 2019 by cshalizi
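--- A cartoon of the edge-based idea (my own reduction to the simplest possible case; the paper's estimator and variance calculation are more careful than this): classify each edge by the treatment status of its endpoints under Bernoulli node randomization, compare edge-level outcomes across those classes, and get a p-value by re-permuting the node-level assignment, which preserves the edge dependence structure under the null.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
# Toy graph: sparse random edges among n users.
edges = np.array([(i, j) for i in range(n) for j in range(i + 1, n)
                  if rng.random() < 0.05])
treat = rng.random(n) < 0.5          # Bernoulli node-level randomization

# Toy edge outcome (e.g. messages on the edge), boosted when either
# endpoint is treated -- this is exactly the interference that makes a
# naive node-level comparison misleading.
y = rng.poisson(2 + 1.0 * (treat[edges[:, 0]] | treat[edges[:, 1]]))

def tt_minus_cc(assign):
    """Mean outcome on treated-treated edges minus control-control edges."""
    a, b = assign[edges[:, 0]], assign[edges[:, 1]]
    return y[a & b].mean() - y[~a & ~b].mean()

obs = tt_minus_cc(treat)
# Permutation inference: re-randomize at the *node* level.
perms = np.array([tt_minus_cc(rng.permutation(treat)) for _ in range(500)])
p_value = np.mean(perms >= obs)
```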
[1903.02541] Relational Pooling for Graph Representations
"This work generalizes graph neural networks (GNNs) beyond those based on the Weisfeiler-Lehman (WL) algorithm, graph Laplacians, and graph diffusion kernels. Our approach, denoted Relational Pooling (RP), draws from the theory of finite partial exchangeability to provide a framework with maximal representation power for graphs. RP can work with existing graph representation models, and somewhat counterintuitively, can make them even more powerful than the original WL isomorphism test. Additionally, RP is the first theoretically sound framework to use architectures like Recurrent Neural Networks and Convolutional Neural Networks for graph classification. RP also has graph kernels as a special case. We demonstrate improved performance of novel RP-based graph representations over current state-of-the-art methods on a number of tasks."
to:NB  to_read  network_data_analysis  graph_limits  neural_networks 
april 2019 by cshalizi
[1903.01672] Causal Discovery from Heterogeneous/Nonstationary Data
"It is commonplace to encounter heterogeneous or nonstationary data, of which the underlying generating process changes across domains or over time. Such a distribution shift feature presents both challenges and opportunities for causal discovery. In this paper, we develop a framework for causal discovery from such data, called Constraint-based causal Discovery from heterogeneous/NOnstationary Data (CD-NOD), to find causal skeleton and directions and estimate the properties of mechanism changes. First, we propose an enhanced constraint-based procedure to detect variables whose local mechanisms change and recover the skeleton of the causal structure over observed variables. Second, we present a method to determine causal orientations by making use of independent changes in the data distribution implied by the underlying causal model, benefiting from information carried by changing distributions. After learning the causal structure, next, we investigate how to efficiently estimate the `driving force' of the nonstationarity of a causal mechanism. That is, we aim to extract from data a low-dimensional representation of changes. The proposed methods are nonparametric, with no hard restrictions on data distributions and causal mechanisms, and do not rely on window segmentation. Furthermore, we find that data heterogeneity benefits causal structure identification even with particular types of confounders. Finally, we show the connection between heterogeneity/nonstationarity and soft intervention in causal discovery. Experimental results on various synthetic and real-world data sets (task-fMRI and stock market data) are presented to demonstrate the efficacy of the proposed methods."
to:NB  statistics  causal_inference  causal_discovery  non-stationarity  kith_and_kin  glymour.clark  to_read 
april 2019 by cshalizi
[1902.03515] Multi-Domain Translation by Learning Uncoupled Autoencoders
"Multi-domain translation seeks to learn a probabilistic coupling between marginal distributions that reflects the correspondence between different domains. We assume that data from different domains are generated from a shared latent representation based on a structural equation model. Under this assumption, we show that the problem of computing a probabilistic coupling between marginals is equivalent to learning multiple uncoupled autoencoders that embed to a given shared latent distribution. In addition, we propose a new framework and algorithm for multi-domain translation based on learning the shared latent distribution and training autoencoders under distributional constraints. A key practical advantage of our framework is that new autoencoders (i.e., new domains) can be added sequentially to the model without retraining on the other domains, which we demonstrate experimentally on image as well as genomics datasets."

--- Last tag is tentative.
to:NB  machine_learning  to_read  inference_to_latent_objects  uhler.caroline  factor_analysis 
april 2019 by cshalizi
The Lasserre Hierarchy in Approximation Algorithms
"The Lasserre hierarchy is a systematic procedure to strengthen a relaxation for an optimization problem by adding additional variables and SDP constraints. In recent years this hierarchy moved into the focus of researchers in approximation algorithms, as the obtained relaxations have provably nice properties. In particular, on the t-th level, the relaxation can be solved in time n^O(t), and every constraint that one could derive from looking at just t variables is automatically satisfied. Additionally, it provides a vector embedding of events so that probabilities are expressible as inner products.
"The goal of these lecture notes is to give short but rigorous proofs of all key properties of the Lasserre hierarchy. In the second part we will demonstrate how the Lasserre SDP can be applied to (mostly NP-hard) optimization problems such as KNAPSACK, MATCHING, MAXCUT (in general and in dense graphs), 3-COLORING and SETCOVER."

--- I remember Cris trying to explain this to me once...
to:NB  to_read  approximation  optimization  re:in_soviet_union_optimization_problem_solves_you 
april 2019 by cshalizi
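--- For orientation, the standard shape of the level-t relaxation of a 0/1 program (my paraphrase of the usual definition; conventions for the index-set sizes vary across sources):

```latex
% Level-t Lasserre relaxation of \{x \in \{0,1\}^n : Ax \ge b\}:
% introduce y_S for all S \subseteq [n] with |S| \le 2t, y_\emptyset = 1,
% interpreted as the "probability" that all variables in S equal 1,
% and require the moment matrix to be positive semidefinite:
M_t(y) \;=\; \bigl( y_{S \cup T} \bigr)_{|S|,\,|T| \le t} \;\succeq\; 0,
% together with an analogous PSD ("localizing") matrix condition for
% each constraint row of Ax \ge b.
```

A Cholesky factorization of M_t(y) yields vectors v_S with ⟨v_S, v_T⟩ = y_{S∪T}, which is the "vector embedding of events" the quote mentions.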
[1807.00564] Inference, Learning, and Population Size: Projectivity for SRL Models
"A subtle difference between propositional and relational data is that in many relational models, marginal probabilities depend on the population or domain size. This paper connects the dependence on population size to the classic notion of projectivity from statistical theory: Projectivity implies that relational predictions are robust with respect to changes in domain size. We discuss projectivity for a number of common SRL systems, and identify syntactic fragments that are guaranteed to yield projective models. The syntactic conditions are restrictive, which suggests that projectivity is difficult to achieve in SRL, and care must be taken when working with different domain sizes."
to:NB  to_read  relational_learning  re:your_favorite_ergm_sucks  probability 
march 2019 by cshalizi
[1609.08816] Identifying Causal Effects With Proxy Variables of an Unmeasured Confounder
"We consider a causal effect that is confounded by an unobserved variable, but with observed proxy variables of the confounder. We show that, with at least two independent proxy variables satisfying a certain rank condition, the causal effect is nonparametrically identified, even if the measurement error mechanism, i.e., the conditional distribution of the proxies given the confounder, may not be identified. Our result generalizes the identification strategy of Kuroki & Pearl (2014) that rests on identification of the measurement error mechanism. When only one proxy for the confounder is available, or the required rank condition is not met, we develop a strategy to test the null hypothesis of no causal effect."
to:NB  to_read  causal_inference  statistics 
march 2019 by cshalizi
[1902.10286] On Multi-Cause Causal Inference with Unobserved Confounding: Counterexamples, Impossibility, and Alternatives
"Unobserved confounding is a central barrier to drawing causal inferences from observational data. Several authors have recently proposed that this barrier can be overcome in the case where one attempts to infer the effects of several variables simultaneously. In this paper, we present two simple, analytical counterexamples that challenge the general claims that are central to these approaches. In addition, we show that nonparametric identification is impossible in this setting. We discuss practical implications, and suggest alternatives to the methods that have been proposed so far in this line of work: using proxy variables and shifting focus to sensitivity analysis."

--- Counter-examples to the "deconfounder" approach.
in_NB  causal_inference  statistics  to_read  factor_analysis  via:betsy_ogburn 
march 2019 by cshalizi
[1902.04114] Using Embeddings to Correct for Unobserved Confounding
"We consider causal inference in the presence of unobserved confounding. In particular, we study the case where a proxy is available for the confounder but the proxy has non-iid structure. As one example, the link structure of a social network carries information about its members. As another, the text of a document collection carries information about their meanings. In both these settings, we show how to effectively use the proxy to do causal inference. The main idea is to reduce the causal estimation problem to a semi-supervised prediction of both the treatments and outcomes. Networks and text both admit high-quality embedding models that can be used for this semi-supervised prediction. Our method yields valid inferences under suitable (weak) conditions on the quality of the predictive model. We validate the method with experiments on a semi-synthetic social network dataset. We demonstrate the method by estimating the causal effect of properties of computer science submissions on whether they are accepted at a conference."
to:NB  causal_inference  statistics  blei.david  re:homophily_and_confounding  to_read 
february 2019 by cshalizi
The Bias Is Built In: How Administrative Records Mask Racially Biased Policing by Dean Knox, Will Lowe, Jonathan Mummolo :: SSRN
"Researchers often lack the necessary data to credibly estimate racial bias in policing. In particular, police administrative records lack information on civilians that police observe but do not investigate. In this paper, we show that if police racially discriminate when choosing whom to investigate, using administrative records to estimate racial bias in police behavior amounts to post-treatment conditioning, and renders many quantities of interest unidentified---even among investigated individuals---absent strong and untestable assumptions. In most cases, no set of controls can eliminate this statistical bias, the exact form of which we derive through principal stratification in a causal mediation framework. We develop a bias-correction procedure and nonparametric sharp bounds for race effects, replicate published findings, and show traditional estimation techniques can severely underestimate levels of racially biased policing or even mask discrimination entirely. We conclude by outlining a general and feasible design for future studies that is robust to this inferential snare."
to:NB  to_read  causal_inference  police  discrimination  statistics  to_teach:undergrad-ADA  via:henry_farrell 
february 2019 by cshalizi
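The post-treatment-conditioning problem is easy to reproduce in a toy simulation (this is an illustration of the inferential snare, not the paper's model): if stop rates are discriminatory but force rates conditional on a stop are equal, records-only comparisons see no bias while per-capita force rates differ sharply.

```python
import random

random.seed(0)
N = 200_000
pop = {0: 0, 1: 0}      # population counts by group
stops = {0: 0, 1: 0}    # stops appearing in "administrative records"
force = {0: 0, 1: 0}    # force incidents among the stopped

for _ in range(N):
    d = int(random.random() < 0.5)                 # group indicator
    pop[d] += 1
    if random.random() < 0.1 + 0.2 * d:            # discriminatory stop rate
        stops[d] += 1
        if random.random() < 0.2:                  # equal force rate *given* a stop
            force[d] += 1

# Naive records-only comparison conditions on being stopped (post-treatment):
naive_gap = force[1] / stops[1] - force[0] / stops[0]
# Per-capita comparison uses the population denominator:
percap = {d: force[d] / pop[d] for d in (0, 1)}
print(naive_gap, percap[1] / percap[0])
```

With these (made-up) parameters the naive gap is near zero while group 1's per-capita force rate is roughly three times group 0's: the records mask the discrimination entirely.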
Social Space Diffusion: Applications of a Latent Space Model to Diffusion with Uncertain Ties - Jacob C. Fisher, 2019
"Social networks represent two different facets of social life: (1) stable paths for diffusion, or the spread of something through a connected population, and (2) random draws from an underlying social space, which indicate the relative positions of the people in the network to one another. The dual nature of networks creates a challenge: if the observed network ties are a single random draw, is it realistic to expect that diffusion only follows the observed network ties? This study takes a first step toward integrating these two perspectives by introducing a social space diffusion model. In the model, network ties indicate positions in social space, and diffusion occurs proportionally to distance in social space. Practically, the simulation occurs in two parts. First, positions are estimated using a statistical model (in this example, a latent space model). Then, second, the predicted probabilities of a tie from that model—representing the distances in social space—or a series of networks drawn from those probabilities—representing routine churn in the network—are used as weights in a weighted averaging framework. Using longitudinal data from high school friendship networks, the author explores the properties of the model. The author shows that the model produces smoothed diffusion results, which predict attitudes in future waves 10 percent better than a diffusion model using the observed network and up to 5 percent better than diffusion models using alternative, non-model-based smoothing approaches."
to:NB  to_read  social_influence  social_networks  network_data_analysis  re:homophily_and_confounding  to_teach:baby-nets  via:gabriel_rossman 
february 2019 by cshalizi