The Incompatible Incentives of Private Sector AI by Tom Slee :: SSRN

20 hours ago by cshalizi

"Algorithms that sort people into categories are plagued by incompatible incentives. While more accurate algorithms may address problems of statistical bias and unfairness, they cannot solve the ethical challenges that arise from incompatible incentives.

"Subjects of algorithmic decisions seek to optimize their outcomes, but such efforts may degrade the accuracy of the algorithm. To maintain their accuracy, algorithms must be accompanied by supplementary rules: “guardrails” that dictate the limits of acceptable behaviour by subjects. Algorithm owners are drawn into taking on the tasks of governance, managing and validating the behaviour of those who interact with their systems.

"The governance role offers temptations to indulge in regulatory arbitrage. If governance is left to algorithm owners, it may lead to arbitrary and restrictive controls on individual behaviour. The goal of algorithmic governance by automated decision systems, social media recommender systems, and rating systems is a mirage, retreating into the distance whenever we seem to approach it."

to:NB
mechanism_design
prediction
data_mining
slee.tom
to_read
to_teach:data-mining

Phys. Rev. E 100, 022124 (2019) - Stochastic basins of attraction and generalized committor functions

yesterday by cshalizi

"We study two generalizations of the basin of attraction of a stable state, to the case of stochastic dynamics, arbitrary regions, and finite-time horizons. This is done by introducing generalized committor functions and studying sojourn times. We show that the volume of the generalized basin, the basin stability, can be efficiently estimated using Monte Carlo–like techniques, making this concept amenable to the study of high-dimensional stochastic systems. Finally, we illustrate in a set of examples that stochastic basins efficiently capture the realm of attraction of metastable sets, which parts of phase space go into long transients in deterministic systems, that they allow us to deal with numerical noise, and can detect the collapse of metastability in high-dimensional systems.

"We discuss two far-reaching generalizations of the basin of attraction of an attractor. The basin of attraction of an attractor is the set of states that eventually will get to the attractor. In a generic stochastic system, all regions will be left again; no attraction is permanent. To obtain the equivalent of the basin of attraction of a region we need to generalize the notion to cover finite-time horizons and finite regions. We do so by considering sojourn times, the fraction of time that a trajectory spends in a set, and by generalizing committor functions, which arise in the study of hitting probabilities. In a simplified setting we show that these two notions reduce to the normal notions of the basin of attraction in the appropriate limits. We also show that the volume of these stochastic basins can be efficiently estimated for high-dimensional systems at computational cost comparable to that for deterministic systems. To fully illustrate the properties captured by the stochastic basins, we show a set of examples ranging from simple conceptual models to high-dimensional inhomogeneous oscillator chains. These show that stochastic basins efficiently capture metastable attraction, the presence of long transients, that they allow us to deal with numerical and approximation noise, and can detect the collapse of metastability with increasing noise in high-dimensional systems."
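--- The "Monte Carlo–like" estimate of basin stability described above is easy to sketch. Here is a toy illustration of my own (not the authors' code) for the double-well SDE dx = (x - x^3) dt + sigma dW: integrate with Euler-Maruyama, measure the sojourn fraction of each trajectory near one well, and average over random initial conditions.

```python
import math
import random

def sojourn_fraction(x0, T=50.0, dt=0.01, sigma=0.2, target=1.0, radius=0.5,
                     rng=random):
    """Fraction of a finite-horizon Euler-Maruyama trajectory of the
    double-well SDE dx = (x - x**3) dt + sigma dW spent within `radius`
    of `target`: a finite-time, finite-region stand-in for membership
    in the basin of attraction."""
    x, steps, inside = x0, int(T / dt), 0
    for _ in range(steps):
        x += (x - x**3) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        inside += abs(x - target) < radius
    return inside / steps

def basin_stability(n_samples=200, **kwargs):
    """Monte Carlo estimate of the volume of the stochastic basin:
    average sojourn fraction over uniform random initial conditions."""
    return sum(sojourn_fraction(random.uniform(-2.0, 2.0), **kwargs)
               for _ in range(n_samples)) / n_samples
```

The cost is one trajectory per sample, independent of how finely one would have to grid the phase space, which is why the approach scales to high dimensions.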

to:NB
metastability
non-equilibrium
stochastic_processes
dynamical_systems
to_read

[1908.06339] Indetermination of networks structure from the dynamics perspective

yesterday by cshalizi

"Networks are universally considered as complex structures of interactions of large multi-component systems. In order to determine the role that each node has inside a complex network, several centrality measures have been developed. Such topological features are also important for their role in the dynamical processes occurring in networked systems. In this paper, we argue that the dynamical activity of the nodes may strongly reshape their relevance inside the network making centrality measures in many cases misleading. We show that when the dynamics taking place at the local level of the node is slower than the global one between the nodes, then the system may lose track of the structural features. On the contrary, when that ratio is reversed only global properties such as the shortest distances can be recovered. From the perspective of network inference, this constitutes an uncertainty principle, in the sense that it limits the extraction of multi-resolution information about the structure, particularly in the presence of noise. For illustration purposes, we show that for networks with different time-scale structures such as strong modularity, the existence of fast global dynamics can imply that precise inference of the community structure is impossible."

--- This tends to reinforce my long-standing gut skepticism about the universal value of centrality measures.

to:NB
network_data_analysis
dynamical_systems
to_teach:baby-nets
to_read

[1811.06407] Neural Predictive Belief Representations

yesterday by cshalizi

"Unsupervised representation learning has succeeded with excellent results in many applications. It is an especially powerful tool to learn a good representation of environments with partial or noisy observations. In partially observable domains it is important for the representation to encode a belief state, a sufficient statistic of the observations seen so far. In this paper, we investigate whether it is possible to learn such a belief representation using modern neural architectures. Specifically, we focus on one-step frame prediction and two variants of contrastive predictive coding (CPC) as the objective functions to learn the representations. To evaluate these learned representations, we test how well they can predict various pieces of information about the underlying state of the environment, e.g., position of the agent in a 3D maze. We show that all three methods are able to learn belief representations of the environment, they encode not only the state information, but also its uncertainty, a crucial aspect of belief states. We also find that for CPC multi-step predictions and action-conditioning are critical for accurate belief representations in visually complex environments. The ability of neural representations to capture the belief information has the potential to spur new advances for learning and planning in partially observable domains, where leveraging uncertainty is essential for optimal decision making."

to:NB
prediction
predictive_representations
inference_to_latent_objects
neural_networks
to_read

[1811.02549] Language GANs Falling Short

yesterday by cshalizi

"Generating high-quality text with sufficient diversity is essential for a wide range of Natural Language Generation (NLG) tasks. Maximum-Likelihood (MLE) models trained with teacher forcing have consistently been reported as weak baselines, where poor performance is attributed to exposure bias (Bengio et al., 2015; Ranzato et al., 2015); at inference time, the model is fed its own prediction instead of a ground-truth token, which can lead to accumulating errors and poor samples. This line of reasoning has led to an outbreak of adversarial based approaches for NLG, on the account that GANs do not suffer from exposure bias. In this work, we make several surprising observations which contradict common beliefs. First, we revisit the canonical evaluation framework for NLG, and point out fundamental flaws with quality-only evaluation: we show that one can outperform such metrics using a simple, well-known temperature parameter to artificially reduce the entropy of the model's conditional distributions. Second, we leverage the control over the quality / diversity trade-off given by this parameter to evaluate models over the whole quality-diversity spectrum and find MLE models constantly outperform the proposed GAN variants over the whole quality-diversity space. Our results have several implications: 1) The impact of exposure bias on sample quality is less severe than previously thought, 2) temperature tuning provides a better quality / diversity trade-off than adversarial training while being easier to train, easier to cross-validate, and less computationally expensive. Code to reproduce the experiments is available at this http URL"
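--- The "well-known temperature parameter" is just a rescaling of the logits before the softmax; a minimal sketch (mine, not the paper's code):

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with a temperature knob: dividing the logits by T < 1
    sharpens the distribution (higher 'quality', lower diversity),
    while T > 1 flattens it (more diversity, lower quality)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats, a simple proxy for sample diversity."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)
```

Sweeping T and recording quality and entropy at each setting is what lets one trace out the whole quality/diversity curve for an MLE model, which is the comparison the authors say GAN variants lose.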

to:NB
natural_language_processing
model_checking
your_favorite_deep_neural_network_sucks
to_read

[1908.06456] Harmonic Analysis of Symmetric Random Graphs

yesterday by cshalizi

"Following Ressel (1985,2008) this note attempts to understand graph limits (Lovasz and Szegedy 2006) in terms of harmonic analysis on semigroups (Berg et al. 1984), thereby providing an alternative derivation of de Finetti's theorem for random exchangeable graphs."

--- SL has been hinting about this for years (it's the natural combination of his 70s--80s work on "extremal point" models, sufficiency, and semi-groups with his recent interest in graph limits and graphons), so I'm very excited to read this.

to:NB
to_read
graph_limits
analysis
probability
lauritzen.steffen

[1901.00555] An Introductory Guide to Fano's Inequality with Applications in Statistical Estimation

2 days ago by cshalizi

"Information theory plays an indispensable role in the development of algorithm-independent impossibility results, both for communication problems and for seemingly distinct areas such as statistics and machine learning. While numerous information-theoretic tools have been proposed for this purpose, the oldest one remains arguably the most versatile and widespread: Fano's inequality. In this chapter, we provide a survey of Fano's inequality and its variants in the context of statistical estimation, adopting a versatile framework that covers a wide range of specific problems. We present a variety of key tools and techniques used for establishing impossibility results via this approach, and provide representative examples covering group testing, graphical model selection, sparse linear regression, density estimation, and convex optimization."
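--- For reference, the basic form of the bound the survey builds on: if the parameter is uniform over M hypotheses and the data carry I(X;Y) nats of information about it, then no estimator can have error probability below 1 - (I(X;Y) + log 2)/log M. A one-line sketch (my illustration, not from the chapter):

```python
import math

def fano_lower_bound(mutual_info_nats, num_hypotheses):
    """Fano lower bound on the error probability of any estimator of a
    parameter uniform over `num_hypotheses` values, given
    `mutual_info_nats` nats of information:
        P_err >= 1 - (I(X;Y) + log 2) / log M."""
    return max(0.0, 1.0 - (mutual_info_nats + math.log(2))
                          / math.log(num_hypotheses))
```

For example, with M = 1024 hypotheses and 2 nats of information, every estimator errs with probability at least about 0.61; minimax lower bounds for estimation come from choosing a hard finite sub-family of parameters and bounding the information the sample can carry.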

to:NB
information_theory
minimax
statistics
estimation
to_read

Franco Moretti & Oleg Sobchuk, Hidden In Plain Sight, NLR 118, July–August 2019

6 days ago by cshalizi

"If there is one feature that immediately distinguishes the digital humanities (dh) from the ‘other’ humanities, data visualization has to be it. Histograms, scatterplots, time series, diagrams, networks . . . ten, fifteen years ago, studies of film, music, literature or art didn’t use any of these. Now they do, and here we examine some premises (unspoken, and often probably unconscious) of this field-defining practice. Field-defining, because visualization is never just visualization: it involves the formation of corpora, the definition of data, their elaboration, and often some sort of preliminary interpretation as well. Whence the idea of this article: to gather sixty-odd studies that have had a significant impact on dh, and analyse how they visually present their data. What interests us is visualization as a practice, in the conviction that practices—what we learn to do by doing, by professional habit, without being fully aware of what we are doing—often have larger theoretical implications than theoretical statements themselves. Whether this has indeed been the case for dh, is for readers to decide."

to:NB
to_read
visual_display_of_quantitative_information
humanities
moretti.franco

Improving Teaching Effectiveness: Final Report: The Intensive Partnerships for Effective Teaching Through 2015–2016 | RAND

13 days ago by cshalizi

"The Intensive Partnerships for Effective Teaching Through 2015–2016"

to:NB
to_read
pedagogy
education
by_people_i_know

[1908.02375] Limit Theorems for Data with Network Structure

13 days ago by cshalizi

"This paper develops new limit theory for data that are generated by networks or more generally display cross-sectional dependence structures that are governed by observable and unobservable characteristics. Strategic network formation models are an example. Whether two data points are highly correlated or not depends on draws from underlying characteristics distributions. The paper defines a measure of closeness that depends on primitive conditions on the distribution of observable characteristics as well as functional form of the underlying model. A summability condition over the probability distribution of observable characteristics is shown to be a critical ingredient in establishing limit results. The paper establishes weak and strong laws of large numbers as well as a stable central limit theorem for a class of statistics that include as special cases network statistics such as average node degrees or average peer characteristics. Some worked examples illustrating the theory are provided."

to:NB
stochastic_processes
networks
network_data_analysis
ergodic_theory
to_read

[1908.02723] Advocacy Learning: Learning through Competition and Class-Conditional Representations

13 days ago by cshalizi

"We introduce advocacy learning, a novel supervised training scheme for attention-based classification problems. Advocacy learning relies on a framework consisting of two connected networks: 1) N Advocates (one for each class), each of which outputs an argument in the form of an attention map over the input, and 2) a Judge, which predicts the class label based on these arguments. Each Advocate produces a class-conditional representation with the goal of convincing the Judge that the input example belongs to their class, even when the input belongs to a different class. Applied to several different classification tasks, we show that advocacy learning can lead to small improvements in classification accuracy over an identical supervised baseline. Through a series of follow-up experiments, we analyze when and how such class-conditional representations improve discriminative performance. Though somewhat counter-intuitive, a framework in which subnetworks are trained to competitively provide evidence in support of their class shows promise, in many cases performing on par with standard learning approaches. This provides a foundation for further exploration into competition and class-conditional representations in supervised learning."

--- Drs. Mercier and Sperber, please call your office. (Also Drs. Jordan and Jacobs...)

to:NB
machine_learning
collective_cognition
ensemble_methods
to_read

[1908.02614] The power of dynamic social networks to predict individuals' mental health

13 days ago by cshalizi

"Precision medicine has received attention both in and outside the clinic. We focus on the latter, by exploiting the relationship between individuals' social interactions and their mental health to develop a predictive model of one's likelihood to be depressed or anxious from rich dynamic social network data. To our knowledge, we are the first to do this. Existing studies differ from our work in at least one aspect: they do not model social interaction data as a network; they do so but analyze static network data; they examine "correlation" between social networks and health but without developing a predictive model; or they study other individual traits but not mental health. In a systematic and comprehensive evaluation, we show that our predictive model that uses dynamic social network data is superior to its static network as well as non-network equivalents when run on the same data."

to:NB
social_networks
psychiatry
sociology
prediction
network_data_analysis
lizardo.omar
to_read

[1908.01823] Change-point detection in dynamic networks via graphon estimation

14 days ago by cshalizi

"We propose a general approach for change-point detection in dynamic networks. The proposed method is model-free and covers a wide range of dynamic networks. The key idea behind our approach is to effectively utilize the network structure in designing change-point detection algorithms. This is done via an initial step of graphon estimation, where we propose a modified neighborhood smoothing~(MNBS) algorithm for estimating the link probability matrices of a dynamic network. Based on the initial graphon estimation, we then develop a screening and thresholding algorithm for multiple change-point detection in dynamic networks. The convergence rate and consistency for the change-point detection procedure are derived as well as those for MNBS. When the number of nodes is large~(e.g., exceeds the number of temporal points), our approach yields a faster convergence rate in detecting change-points comparing with an algorithm that simply employs averaged information of the dynamic network across time. Numerical experiments demonstrate robust performance of the proposed algorithm for change-point detection under various types of dynamic networks, and superior performance over existing methods is observed. A real data example is provided to illustrate the effectiveness and practical impact of the procedure."

to:NB
network_data_analysis
change-point_problem
graph_limits
statistics
to_read

Fake news on Twitter during the 2016 U.S. presidential election | Science

15 days ago by cshalizi

"The spread of fake news on social media became a public concern in the United States after the 2016 presidential election. We examined exposure to and sharing of fake news by registered voters on Twitter and found that engagement with fake news sources was extremely concentrated. Only 1% of individuals accounted for 80% of fake news source exposures, and 0.1% accounted for nearly 80% of fake news sources shared. Individuals most likely to engage with fake news sources were conservative leaning, older, and highly engaged with political news. A cluster of fake news sources shared overlapping audiences on the extreme right, but for people across the political spectrum, most political news exposure still came from mainstream media outlets."

--- To Bruce Sterling's (obviously correct) dictum that "The future is about old people, in big cities, afraid of the sky", perhaps we should add "outraged at nonsense".

to:NB
to_read
social_media
deceiving_us_has_become_an_industrial_process
natural_history_of_truthiness
lazer.david
us_politics
networked_life
re:actually-dr-internet-is-the-name-of-the-monsters-creator

[1908.00882] Population Predictive Checks

16 days ago by cshalizi

"Bayesian modeling has become a staple for researchers analyzing data. Thanks to recent developments in approximate posterior inference, modern researchers can easily build, use, and revise complicated Bayesian models for large and rich data. These new abilities, however, bring into focus the problem of model assessment. Researchers need tools to diagnose the fitness of their models, to understand where a model falls short, and to guide its revision. In this paper we develop a new method for Bayesian model checking, the population predictive check (Pop-PC). Pop-PCs are built on posterior predictive checks (PPC), a seminal method that checks a model by assessing the posterior predictive distribution on the observed data. Though powerful, PPCs use the data twice---both to calculate the posterior predictive and to evaluate it---which can lead to overconfident assessments. Pop-PCs, in contrast, compare the posterior predictive distribution to the population distribution of the data. This strategy blends Bayesian modeling with frequentist assessment, leading to a robust check that validates the model on its generalization. Of course the population distribution is not usually available; thus we use tools like the bootstrap and cross validation to estimate the Pop-PC. Further, we extend Pop-PCs to hierarchical models. We study Pop-PCs on classical regression and a hierarchical model of text. We show that Pop-PCs are robust to overfitting and can be easily deployed on a broad family of models."
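--- The "data used twice" problem is easy to see in a toy version: fit a normal model, then run the same tail-area check on the fitting data and on held-out data (the held-out split being a crude stand-in for the population distribution; this is my sketch, not the authors' Pop-PC procedure):

```python
import random
import statistics

def tail_prob(t_obs, mu, sd, n, n_rep=2000, rng=random):
    """Monte Carlo tail probability of the statistic T = max(data) under
    the plug-in predictive distribution N(mu, sd) for n observations."""
    hits = sum(max(rng.gauss(mu, sd) for _ in range(n)) >= t_obs
               for _ in range(n_rep))
    return hits / n_rep

def double_use_demo(train, held_out, **kwargs):
    """Fit N(mu, sd) to `train`, then run the same check twice: once on
    the fitting data (PPC-style, data used twice) and once on held-out
    data (a crude stand-in for the population distribution)."""
    mu, sd = statistics.mean(train), statistics.stdev(train)
    return (tail_prob(max(train), mu, sd, len(train), **kwargs),
            tail_prob(max(held_out), mu, sd, len(held_out), **kwargs))
```

Because the in-sample check evaluates the predictive on the very data that shaped it, its p-values crowd toward the middle even when the model is wrong; the held-out check does not have that bias, which is the intuition the Pop-PC formalizes.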

to:NB
model_checking
bayesianism
statistics
blei.david
re:phil-of-bayes_paper
to_read
cross-validation

Kontorovich, Pinelis: Exact lower bounds for the agnostic probably-approximately-correct (PAC) machine learning model

17 days ago by cshalizi

"We provide an exact nonasymptotic lower bound on the minimax expected excess risk (EER) in the agnostic probably-approximately-correct (PAC) machine learning classification model and identify minimax learning algorithms as certain maximally symmetric and minimally randomized “voting” procedures. Based on this result, an exact asymptotic lower bound on the minimax EER is provided. This bound is of the simple form c∞/√ν as ν → ∞, where c∞ = 0.16997… is a universal constant, ν = m/d, m is the size of the training sample, and d is the Vapnik–Chervonenkis dimension of the hypothesis class. It is shown that the differences between these asymptotic and nonasymptotic bounds, as well as the differences between these two bounds and the maximum EER of any learning algorithms that minimize the empirical risk, are asymptotically negligible, and all these differences are due to ties in the mentioned “voting” procedures. A few easy to compute nonasymptotic lower bounds on the minimax EER are also obtained, which are shown to be close to the exact asymptotic lower bound c∞/√ν even for rather small values of the ratio ν = m/d. As an application of these results, we substantially improve existing lower bounds on the tail probability of the excess risk. Among the tools used are Bayes estimation and apparently new identities and inequalities for binomial distributions."

to:NB
learning_theory
statistics
minimax
kontorovich.aryeh
kith_and_kin
to_read

[1907.13323] Multi-cause causal inference with unmeasured confounding and binary outcome

20 days ago by cshalizi

"Unobserved confounding presents a major threat to causal inference in observational studies. Recently, several authors suggest that this problem may be overcome in a shared confounding setting where multiple treatments are independent given a common latent confounder. It has been shown that if additional data such as negative controls are available, then the causal effects are indeed identifiable. In this paper, we show that these additional data are not necessary for causal identification, provided that the treatments and outcome follow Gaussian and logistic structural equation models, respectively. Our novel identification strategy is based on the symmetry and tail properties of the observed data distribution. We further develop two-step likelihood-based estimation procedures. We illustrate our method through simulations and a real data application studying the causal relationship between the volume of various brain regions and cognitive scores."

to:NB
causal_inference
statistics
to_read

[1907.12581] Improved mutual information measure for classification and community detection

20 days ago by cshalizi

"The information theoretic quantity known as mutual information finds wide use in classification and community detection analyses to compare two classifications of the same set of objects into groups. In the context of classification algorithms, for instance, it is often used to compare discovered classes to known ground truth and hence to quantify algorithm performance. Here we argue that the standard mutual information, as commonly defined, omits a crucial term which can become large under real-world conditions, producing results that can be substantially in error. We demonstrate how to correct this error and define a mutual information that works in all cases. We discuss practical implementation of the new measure and give some example applications."

--- From a quick scan, this is essentially saying that one should do two-part (old Rissanen) style minimum description length coding, so you need to give both the information content of the correspondence, _and_ specify which correspondence you're using.
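--- For concreteness, the quantity being corrected is the standard plug-in mutual information between two labelings; a minimal version of that baseline (not the paper's corrected measure, which additionally charges for transmitting the contingency table):

```python
import math
from collections import Counter

def plug_in_mutual_info(labels_a, labels_b):
    """Standard plug-in mutual information (in nats) between two
    labelings of the same n objects, computed from the empirical
    contingency table. Note this is exactly the quantity the paper
    argues omits the description cost of the table itself."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # sum over cells of p(a,b) * log[ p(a,b) / (p(a) p(b)) ]
    return sum((c / n) * math.log(c * n / (count_a[a] * count_b[b]))
               for (a, b), c in joint.items())
```

In the two-part MDL reading, this is only the second part of the code length; the first part, specifying which correspondence between the two sets of labels is in force, is what the corrected measure adds.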

to:NB
to_read
information_theory
have_skimmed
community_discovery
kith_and_kin
newman.mark

[1903.06936] Bayesian and Spline based Approaches for (EM based) Graphon Estimation

21 days ago by cshalizi

"The paper proposes the estimation of a graphon function for network data using principles of the EM algorithm. The approach considers both variability with respect to ordering the nodes of a network and estimation of the unique representation of a graphon. To do so, (linear) B-splines are used, which allows one to easily accommodate constraints in the estimation routine so that the estimated graphon fulfills the canonical representation, meaning its univariate margin is monotonic. The graphon estimate itself allows Bayesian ideas to be applied to explore both the degree distribution and the ordering of the nodes with respect to their degree. Variability and uncertainty are taken into account using MCMC techniques. Combining both steps gives an EM-based approach for graphon estimation."

GODDAMIT.

to:NB
to_read
graph_limits
network_data_analysis
nonparametrics
statistics
re:smoothing_adjacency_matrices
scooped?

[1507.08140] Degree-based goodness-of-fit tests for heterogeneous random graph models: independent and exchangeable cases

22 days ago by cshalizi

"The degrees are a classical and relevant way to study the topology of a network. They can be used to assess the goodness-of-fit for a given random graph model. In this paper we introduce goodness-of-fit tests for two classes of models. First, we consider the case of independent graph models such as the heterogeneous Erdös-Rényi model in which the edges have different connection probabilities. Second, we consider a generic model for exchangeable random graphs called the W-graph. The stochastic block model and the expected degree distribution model fall within this framework. We prove the asymptotic normality of the degree mean square under these independent and exchangeable models and derive formal tests. We study the power of the proposed tests and we prove the asymptotic normality under specific sparsity regimes. The tests are illustrated on real networks from social sciences and ecology, and their performances are assessed via a simulation study."

Journal version: https://doi.org/10.1111/sjos.12410
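--- The degree mean square statistic is straightforward to compute; here is a toy version of the test for the homogeneous Erdős–Rényi null, with a parametric bootstrap standing in for the paper's asymptotic-normality result (my sketch, not the authors' code):

```python
import random

def er_graph(n, p, rng=random):
    """Erdős–Rényi G(n, p) as an adjacency list of neighbor sets."""
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def degree_mean_square(adj):
    """The test statistic: mean squared degree of the graph."""
    degrees = [len(nbrs) for nbrs in adj]
    return sum(d * d for d in degrees) / len(degrees)

def er_gof_pvalue(adj, n_sim=200, rng=random):
    """Two-sided Monte Carlo p-value for H0: homogeneous G(n, p_hat);
    degree heterogeneity (e.g. hubs) inflates the mean squared degree
    relative to the null and drives the p-value down."""
    n = len(adj)
    n_edges = sum(len(nbrs) for nbrs in adj) / 2
    p_hat = n_edges / (n * (n - 1) / 2)
    t_obs = degree_mean_square(adj)
    sims = [degree_mean_square(er_graph(n, p_hat, rng)) for _ in range(n_sim)]
    upper = sum(t >= t_obs for t in sims) / n_sim
    lower = sum(t <= t_obs for t in sims) / n_sim
    return min(1.0, 2.0 * min(upper, lower))
```

The paper replaces the simulation step with the limiting normal distribution of the statistic, and extends the null from independent-edge models to exchangeable W-graphs.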

to:NB
goodness-of-fit
network_data_analysis
graph_limits
to_teach:graphons
to_read

Free trade and opioid overdose death in the United States - ScienceDirect

22 days ago by cshalizi

"Opioid overdose deaths in the U.S. rose dramatically after 1999, but also exhibited substantial geographic variation. This has largely been explained by differential availability of prescription and non-prescription opioids, including heroin and fentanyl. Recent studies explore the underlying role of socioeconomic factors, but overlook the influence of job loss due to international trade, an economic phenomenon that disproportionately harms the same regions and demographic groups at the heart of the opioid epidemic. We used OLS regression and county-year level data from the Centers for Disease Controls and the Department of Labor to test the association between trade-related job loss and opioid-related overdose death between 1999 and 2015. We find that the loss of 1000 trade-related jobs was associated with a 2.7 percent increase in opioid-related deaths. When fentanyl was present in the heroin supply, the same number of job losses was associated with a 11.3 percent increase in opioid-related deaths."

--- I'm very skeptical about OLS here. Something like nearest neighbors would be better, but I'm not sure how to handle spatial correlation.

to:NB
to_read
drugs
whats_gone_wrong_with_america
class_struggles_in_america
econometrics
statistics
globalization
to_teach:data_over_space_and_time
to_teach:undergrad-ADA
causal_inference

Peeling the Onion of Brain Representations | Annual Review of Neuroscience

22 days ago by cshalizi

"The brain's function is to enable adaptive behavior in the world. To this end, the brain processes information about the world. The concept of representation links the information processed by the brain back to the world and enables us to understand what the brain does at a functional level. The appeal of making the connection between brain activity and what it represents has been irresistible to neuroscience, despite the fact that representational interpretations pose several challenges: We must define which aspects of brain activity matter, how the code works, and how it supports computations that contribute to adaptive behavior. It has been suggested that we might drop representational language altogether and seek to understand the brain, more simply, as a dynamical system. In this review, we argue that the concept of representation provides a useful link between dynamics and computational function and ask which aspects of brain activity should be analyzed to achieve a representational understanding. We peel the onion of brain representations in search of the layers (the aspects of brain activity) that matter to computation. The article provides an introduction to the motivation and mathematics of representational models, a critical discussion of their assumptions and limitations, and a preview of future directions in this area."

to:NB
neuroscience
cognitive_science
representation
kriegeskorte.nikolaus
computation
to_read

[1907.09611] Asymptotic normality, concentration, and coverage of generalized posteriors

28 days ago by cshalizi

"Generalized likelihoods are commonly used to obtain consistent estimators with attractive computational and robustness properties. Formally, any generalized likelihood can be used to define a generalized posterior distribution, but an arbitrarily defined "posterior" cannot be expected to appropriately quantify uncertainty in any meaningful sense. In this article, we provide sufficient conditions under which generalized posteriors exhibit concentration, asymptotic normality (Bernstein-von Mises), an asymptotically correct Laplace approximation, and asymptotically correct frequentist coverage. We apply our results in detail to generalized posteriors for a wide array of generalized likelihoods, including pseudolikelihoods in general, the Ising model pseudolikelihood, the Gaussian Markov random field pseudolikelihood, the fully observed Boltzmann machine pseudolikelihood, the Cox proportional hazards partial likelihood, and a median-based likelihood for robust inference of location. Further, we show how our results can be used to easily establish the asymptotics of standard posteriors for exponential families and generalized linear models. We make no assumption of model correctness so that our results apply with or without misspecification."

to:NB
bayesian_consistency
statistics
to_read
likelihood
misspecification
28 days ago by cshalizi

[1806.07016] Evaluating Ex Ante Counterfactual Predictions Using Ex Post Causal Inference

28 days ago by cshalizi

"We derive a formal, decision-based method for comparing the performance of counterfactual treatment regime predictions using the results of experiments that give relevant information on the distribution of treated outcomes. Our approach allows us to quantify and assess the statistical significance of differential performance for optimal treatment regimes estimated from structural models, extrapolated treatment effects, expert opinion, and other methods. We apply our method to evaluate optimal treatment regimes for conditional cash transfer programs across countries where predictions are generated using data from experimental evaluations in other countries and pre-program data in the country of interest."

to:NB
causal_inference
model_checking
statistics
to_read
samii.cyrus
28 days ago by cshalizi

[1707.00833] Two-sample Hypothesis Testing for Inhomogeneous Random Graphs

4 weeks ago by cshalizi

"The study of networks leads to a wide range of high dimensional inference problems. In many practical applications, one needs to draw inference from one or few large sparse networks. The present paper studies hypothesis testing of graphs in this high-dimensional regime, where the goal is to test between two populations of inhomogeneous random graphs defined on the same set of n vertices. The size of each population m is much smaller than n, and can even be a constant as small as 1. The critical question in this context is whether the problem is solvable for small m.

"We answer this question from a minimax testing perspective. Let P,Q be the population adjacencies of two sparse inhomogeneous random graph models, and d be a suitably defined distance function. Given a population of m graphs from each model, we derive minimax separation rates for the problem of testing P=Q against d(P,Q)>ρ. We observe that if m is small, then the minimax separation is too large for some popular choices of d, including total variation distance between corresponding distributions. This implies that some models that are widely separated in d cannot be distinguished for small m, and hence, the testing problem is generally not solvable in these cases.

"We also show that if m>1, then the minimax separation is relatively small if d is the Frobenius norm or operator norm distance between P and Q. For m=1, only the latter distance provides small minimax separation. Thus, for these distances, the problem is solvable for small m. We also present near-optimal two-sample tests in both cases, where tests are adaptive with respect to sparsity level of the graphs."

to:NB
hypothesis_testing
network_data_analysis
statistics
re:network_differences
to_read

4 weeks ago by cshalizi

[1907.01605] Limits of Sparse Configuration Models and Beyond: Graphexes and Multi-Graphexes

4 weeks ago by cshalizi

"We investigate structural properties of large, sparse random graphs through the lens of "sampling convergence" (Borgs et. al. (2017)). Sampling convergence generalizes left convergence to sparse graphs, and describes the limit in terms of a "graphex". We introduce a notion of sampling convergence for sequences of multigraphs, and establish the graphex limit for the configuration model, a preferential attachment model, the generalized random graph, and a bipartite variant of the configuration model. The results for the configuration model, preferential attachment model and bipartite configuration model provide necessary and sufficient conditions for these random graph models to converge. The limit for the configuration model and the preferential attachment model is an augmented version of an exchangeable random graph model introduced by Caron and Fox (2017)."

in_NB
graph_limits
to_read
probability
chayes.jennifer
borgs.christian
4 weeks ago by cshalizi

The Standard Errors of Persistence

5 weeks ago by cshalizi

"A large literature on persistence finds that many modern outcomes strongly reflect characteristics of the same places in the distant past. However, alongside unusually high t statistics, these regressions display severe spatial autocorrelation in residuals, and the purpose of this paper is to examine whether these two properties might be connected. We start by running artificial regressions where both variables are spatial noise and find that, even for modest ranges of spatial correlation between points, t statistics become severely inflated leading to significance levels that are in error by several orders of magnitude. We analyse 27 persistence studies in leading journals and find that in most cases if we replace the main explanatory variable with spatial noise the fit of the regression commonly improves; and if we replace the dependent variable with spatial noise, the persistence variable can still explain it at high significance levels. We can predict in advance which persistence results might be the outcome of fitting spatial noise from the degree of spatial autocorrelation in their residuals measured by a standard Moran statistic. Our findings suggest that the results of persistence studies, and of spatial regressions more generally, might be treated with some caution in the absence of reported Moran statistics and noise simulations."
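
--- A toy version of the paper's opening exercise (my own construction, not the authors' code): regress one spatial noise field on an independent one and watch naive OLS reject the null far more often than the nominal 5%. The grid size, kernel scale, and sample size below are all illustrative choices of mine.

```python
import numpy as np

rng = np.random.default_rng(0)

def spatial_noise_pair(n, grid=30.0, scale=10.0, rng=rng):
    """Two *independent* Gaussian noise fields observed at the same random
    locations, with squared-exponential spatial covariance."""
    pts = rng.uniform(0, grid, size=(n, 2))
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    cov = np.exp(-(d / scale) ** 2) + 1e-6 * np.eye(n)  # jitter for Cholesky
    L = np.linalg.cholesky(cov)
    return L @ rng.standard_normal(n), L @ rng.standard_normal(n)

def slope_t(x, y):
    """Naive OLS t-statistic for the slope of y on x, with no correction
    for spatial correlation."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - 2)
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se

# x and y are independent by construction, so every "effect" is spurious,
# yet the naive rejection rate at |t| > 1.96 is far above 5%.
reject = np.mean([abs(slope_t(*spatial_noise_pair(100))) > 1.96
                  for _ in range(200)])
print(reject)
```

The inflation comes entirely from the reduced effective sample size of smooth fields; the t-statistic still pretends it has 100 independent observations.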

to:NB
to_read
econometrics
regression
spatial_statistics
to_teach:data_over_space_and_time
via:jbdelong
5 weeks ago by cshalizi

Cheng , Chen : Nonparametric inference via bootstrapping the debiased estimator

6 weeks ago by cshalizi

"In this paper, we propose to construct confidence bands by bootstrapping the debiased kernel density estimator (for density estimation) and the debiased local polynomial regression estimator (for regression analysis). The idea of using a debiased estimator was recently employed by Calonico et al. (2018b) to construct a confidence interval of the density function (and regression function) at a given point by explicitly estimating stochastic variations. We extend their ideas of using the debiased estimator and further propose a bootstrap approach for constructing simultaneous confidence bands. This modified method has an advantage that we can easily choose the smoothing bandwidth from conventional bandwidth selectors and the confidence band will be asymptotically valid. We prove the validity of the bootstrap confidence band and generalize it to density level sets and inverse regression problems. Simulation studies confirm the validity of the proposed confidence bands/sets. We apply our approach to an Astronomy dataset to show its applicability."
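
--- A minimal sketch of the bootstrap band mechanics. The paper bootstraps a *debiased* kernel estimator; for brevity this uses the ordinary KDE, so it only illustrates the sup-norm calibration step, and the bandwidth and grid are my own choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def kde(x_grid, data, h):
    """Gaussian kernel density estimate evaluated on x_grid."""
    z = (x_grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

data = rng.normal(size=500)
grid = np.linspace(-3, 3, 61)
h = 0.3  # with the debiased estimator this choice becomes uncritical
f_hat = kde(grid, data, h)

# Bootstrap the sup-norm deviation to calibrate a *simultaneous* band.
sups = []
for _ in range(300):
    boot = rng.choice(data, size=len(data))  # resample with replacement
    sups.append(np.max(np.abs(kde(grid, boot, h) - f_hat)))
c = np.quantile(sups, 0.95)
band_lower, band_upper = f_hat - c, f_hat + c
print(c)
```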

to:NB
to_read
statistics
bootstrap
confidence_sets
regression
density_estimation
re:ADAfaEPoV
6 weeks ago by cshalizi

[1811.08525] Consensus and Polarisation in Competing Complex Contagion Processes

8 weeks ago by cshalizi

"The rate of adoption of new information depends on reinforcement from multiple sources in a way that often cannot be described by simple contagion processes. In such cases, contagion is said to be complex. Complex contagion happens in the diffusion of human behaviours, innovations, and knowledge. Based on that evidence, we propose a model that considers multiple, potentially asymmetric, and competing contagion processes and analyse its respective population-wide dynamics, bringing together ideas from complex contagion, opinion dynamics, evolutionary game theory, and language competition by shifting the focus from individuals to the properties of the diffusing processes. We show that our model spans a dynamical space in which the population exhibits patterns of consensus, dominance, and, importantly, different types of polarisation, a more diverse dynamical environment that contrasts with single simple contagion processes. We show how these patterns emerge and how different population structures modify them through a natural development of spatial correlations: structured interactions increase the range of the dominance regime by reducing that of dynamic polarisation, tight modular structures can generate structural polarisation, depending on the interplay between fundamental properties of the processes and the modularity of the interaction network."

to:NB
epidemic_models
diffusion_of_innovations
epidemiology_of_ideas
social_networks
levin.simon
to_read
re:do-institutions-evolve
8 weeks ago by cshalizi

[1712.07248] Towards a General Large Sample Theory for Regularized Estimators

8 weeks ago by cshalizi

"We present a general framework for studying regularized estimators; such estimators are pervasive in estimation problems wherein "plug-in" type estimators are either ill-defined or ill-behaved. Within this framework, we derive, under primitive conditions, consistency and a generalization of the asymptotic linearity property. We also provide data-driven methods for choosing tuning parameters that, under some conditions, achieve the aforementioned properties. We illustrate the scope of our approach by studying a wide range of applications, revisiting known results and deriving new ones."

to:NB
statistics
estimation
optimization
to_read
8 weeks ago by cshalizi

[1704.04118] From Data to Decisions: Distributionally Robust Optimization is Optimal

9 weeks ago by cshalizi

"We study stochastic programs where the decision-maker cannot observe the distribution of the exogenous uncertainties but has access to a finite set of independent samples from this distribution. In this setting, the goal is to find a procedure that transforms the data to an estimate of the expected cost function under the unknown data-generating distribution, i.e., a predictor, and an optimizer of the estimated cost function that serves as a near-optimal candidate decision, i.e., a prescriptor. As functions of the data, predictors and prescriptors constitute statistical estimators. We propose a meta-optimization problem to find the least conservative predictors and prescriptors subject to constraints on their out-of-sample disappointment. The out-of-sample disappointment quantifies the probability that the actual expected cost of the candidate decision under the unknown true distribution exceeds its predicted cost. Leveraging tools from large deviations theory, we prove that this meta-optimization problem admits a unique solution: The best predictor-prescriptor pair is obtained by solving a distributionally robust optimization problem over all distributions within a given relative entropy distance from the empirical distribution of the data."

--- Physicists re-inventing learning theory for generalization error bounds?

to:NB
to_read
learning_theory
large_deviations
decision_theory
statistics
9 weeks ago by cshalizi

[1902.02580] The few-get-richer: a surprising consequence of popularity-based rankings

9 weeks ago by cshalizi

"Ranking algorithms play a crucial role in online platforms ranging from search engines to recommender systems. In this paper, we identify a surprising consequence of popularity-based rankings: the fewer the items reporting a given signal, the higher the share of the overall traffic they collectively attract. This few-get-richer effect emerges in settings where there are few distinct classes of items (e.g., left-leaning news sources versus right-leaning news sources), and items are ranked based on their popularity. We demonstrate analytically that the few-get-richer effect emerges when people tend to click on top-ranked items and have heterogeneous preferences for the classes of items. Using simulations, we analyze how the strength of the effect changes with assumptions about the setting and human behavior. We also test our predictions experimentally in an online experiment with human participants. Our findings have important implications to understand the spread of misinformation."
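
--- A deliberately crude simulation of the effect described above (my own toy, not the authors' model): users click the top-popularity item of their preferred class, and the few "left" items end up with a share of traffic far exceeding their share of the catalogue. The counts and class sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n_left, n_right = 3, 17                 # few left items, many right items
classes = np.array([0] * n_left + [1] * n_right)
clicks = np.ones(len(classes))          # popularity counts

for _ in range(20000):
    pref = rng.integers(2)              # heterogeneous class preferences
    for item in np.argsort(-clicks):    # scan items in popularity order
        if classes[item] == pref:       # click top-ranked item of that class
            clicks[item] += 1
            break

left_share = clicks[classes == 0].sum() / clicks.sum()
print(left_share)  # the 3 left items are 15% of the catalogue but draw ~half the clicks
```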

to:NB
information_retrieval
networked_life
why_oh_why_cant_we_have_a_better_press_corps
to_read
9 weeks ago by cshalizi

Huang , Reich , Fuentes , Sankarasubramanian : Complete spatial model calibration

9 weeks ago by cshalizi

"Computer simulation models are central to environmental science. These mathematical models are used to understand complex weather and climate patterns and to predict the climate’s response to different forcings. Climate models are of course not perfect reflections of reality, and so comparison with observed data is needed to quantify and to correct for biases and other deficiencies. We propose a new method to calibrate model output using observed data. Our approach not only matches the marginal distributions of the model output and gridded observed data, but it simultaneously postprocesses the model output to have the same spatial correlation as the observed data. This comprehensive calibration method permits realistic spatial simulations for regional impact studies. We apply the proposed method to global climate model output in North America and show that it successfully calibrates the model output for temperature and precipitation."

to:NB
spatial_statistics
simulation
model_checking
statistics
to_teach:data_over_space_and_time
to_read
9 weeks ago by cshalizi

Panel Data Analysis via Mechanistic Models: Journal of the American Statistical Association: Vol 0, No 0

9 weeks ago by cshalizi

"Panel data, also known as longitudinal data, consist of a collection of time series. Each time series, which could itself be multivariate, comprises a sequence of measurements taken on a distinct unit. Mechanistic modeling involves writing down scientifically motivated equations describing the collection of dynamic systems giving rise to the observations on each unit. A defining characteristic of panel systems is that the dynamic interaction between units should be negligible. Panel models therefore consist of a collection of independent stochastic processes, generally linked through shared parameters while also having unit-specific parameters. To give the scientist flexibility in model specification, we are motivated to develop a framework for inference on panel data permitting the consideration of arbitrary nonlinear, partially observed panel models. We build on iterated filtering techniques that provide likelihood-based inference on nonlinear partially observed Markov process models for time series data. Our methodology depends on the latent Markov process only through simulation; this plug-and-play property ensures applicability to a large class of models. We demonstrate our methodology on a toy example and two epidemiological case studies. We address inferential and computational issues arising due to the combination of model complexity and dataset size."

to:NB
statistics
statistical_inference_for_stochastic_processes
time_series
ionides.edward
to_read
particle_filters
9 weeks ago by cshalizi

[1904.02610] Diverse communities behave like typical random ecosystems

9 weeks ago by cshalizi

"With a brief letter to Nature in 1972, Robert May triggered a worldwide research program in theoretical ecology and complex systems that continues to this day. Building on powerful mathematical results about large random matrices, he argued that systems with sufficiently large numbers of interacting components are generically unstable. In the ecological context, May's thesis directly contradicted the longstanding ecological intuition that diversity promotes stability. In economics and finance, May's work helped to consolidate growing concerns about the fragility of an increasingly interconnected global marketplace. In this Letter, we draw on recent theoretical progress in random matrix theory and statistical physics to fundamentally extend and reinterpret May's theorem. We confirm that a wide range of ecological models become unstable at the point predicted by May, even when the models do not strictly follow his assumptions. Surprisingly, increasing the interaction strength or diversity beyond the May threshold results in a reorganization of the ecosystem -- through extinction of a fixed fraction of species -- into a new stable state whose properties are well described by purely random interactions. This self-organized state remains stable for arbitrarily large ecosystem and suggests a new interpretation of May's original conclusions: when interacting complex systems with many components become sufficiently large, they will generically undergo a transition to a "typical" self-organized, stable state."

to:NB
ecology
random_matrices
self-organization
to_read
9 weeks ago by cshalizi

[1906.05433] Tackling Climate Change with Machine Learning

9 weeks ago by cshalizi

"Climate change is one of the greatest challenges facing humanity, and we, as machine learning experts, may wonder how we can help. Here we describe how machine learning can be a powerful tool in reducing greenhouse gas emissions and helping society adapt to a changing climate. From smart grids to disaster management, we identify high impact problems where existing gaps can be filled by machine learning, in collaboration with other fields. Our recommendations encompass exciting research questions as well as promising business opportunities. We call on the machine learning community to join the global effort against climate change."

--- My gut reaction is that this is well-intentioned but point-missing, but note the final tags.

to:NB
climate_change
machine_learning
to_read
to_be_shot_after_a_fair_trial
9 weeks ago by cshalizi

The Sad Truth about Happiness Scales | Journal of Political Economy: Ahead of Print

10 weeks ago by cshalizi

"Happiness is reported in ordered intervals (e.g., very, pretty, not too happy). We review and apply standard statistical results to determine when such data permit identification of two groups’ relative average happiness. The necessary conditions for nonparametric identification are strong and unlikely to ever be satisfied. Standard parametric approaches cannot identify this ranking unless the variances are exactly equal. If not, ordered probit findings can be reversed by lognormal transformations. For nine prominent happiness research areas, conditions for nonparametric identification are rejected and standard parametric results are reversed using plausible transformations. Tests for a common reporting function consistently reject."
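
--- The reversal result is easy to reproduce numerically (my own illustration, not the authors' data): if the two groups' latent distributions are normal with unequal variances, a lognormal transformation can flip the mean ranking, since E[exp(N(μ, σ²))] = exp(μ + σ²/2) rewards variance as well as location.

```python
import numpy as np

rng = np.random.default_rng(5)
# Latent "happiness": group B has the higher mean, group A the higher variance.
a = rng.normal(0.0, 1.5, size=100_000)
b = rng.normal(0.3, 0.5, size=100_000)

raw_ranking = a.mean() < b.mean()                  # B looks happier on the raw scale
exp_ranking = np.exp(a).mean() > np.exp(b).mean()  # A looks happier after exp()
print(raw_ranking, exp_ranking)
```

Both comparisons hold by construction: the exponentiated means are approximately exp(1.125) ≈ 3.1 for A versus exp(0.425) ≈ 1.5 for B.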

to:NB
to_read
social_measurement
psychometrics
statistics
via:phnk
10 weeks ago by cshalizi

Life after Lead: Effects of Early Interventions for Children Exposed to Lead

10 weeks ago by cshalizi

"Lead pollution is consistently linked to cognitive and behavioral impairments, yet little is known about the benefits of public health interventions for children exposed to lead. This paper estimates the long-term impacts of early-life interventions (e.g. lead remediation, nutritional assessment, medical evaluation, developmental surveillance, and public assistance referrals) recommended for lead-poisoned children. Using linked administrative data from Charlotte, NC, we compare outcomes for children who are similar across observable characteristics but differ in eligibility for intervention due to blood lead test results. We find that the negative outcomes previously associated with early-life exposure can largely be reversed by intervention."

--- The last tag, as usual, is conditional on liking the paper after reading it, and on replication data being available.

to:NB
to_read
lead
cognitive_development
sociology
causal_inference
to_teach:undergrad-ADA
10 weeks ago by cshalizi

Phys. Rev. E 99, 062301 (2019) - Social clustering in epidemic spread on coevolving networks

10 weeks ago by cshalizi

"Even though transitivity is a central structural feature of social networks, its influence on epidemic spread on coevolving networks has remained relatively unexplored. Here we introduce and study an adaptive susceptible-infected-susceptible (SIS) epidemic model wherein the infection and network coevolve with nontrivial probability to close triangles during edge rewiring, leading to substantial reinforcement of network transitivity. This model provides an opportunity to study the role of transitivity in altering the SIS dynamics on a coevolving network. Using numerical simulations and approximate master equations (AMEs), we identify and examine a rich set of dynamical features in the model. In many cases, AMEs including transitivity reinforcement provide accurate predictions of stationary-state disease prevalence and network degree distributions. Furthermore, for some parameter settings, the AMEs accurately trace the temporal evolution of the system. We show that higher transitivity reinforcement in the model leads to lower levels of infective individuals in the population, when closing a triangle is the dominant rewiring mechanism. These methods and results may be useful in developing ideas and modeling strategies for controlling SIS-type epidemics."

in_NB
epidemic_models
mucha.peter_j.
networks
re:do-institutions-evolve
to_read
10 weeks ago by cshalizi

Recent Advances in Spatio‐Temporal Methodology - Wikle - 2019 - Journal of Time Series Analysis - Wiley Online Library

11 weeks ago by cshalizi

"This special issue consists of a collection of articles that describe innovations in spatio‐temporal methodology..."

to:NB
to_read
spatio-temporal_statistics
statistics
to_teach:data_over_space_and_time
11 weeks ago by cshalizi

[1905.12580] Model Similarity Mitigates Test Set Overuse

11 weeks ago by cshalizi

"Excessive reuse of test data has become commonplace in today's machine learning workflows. Popular benchmarks, competitions, industrial scale tuning, among other applications, all involve test data reuse beyond guidance by statistical confidence bounds. Nonetheless, recent replication studies give evidence that popular benchmarks continue to support progress despite years of extensive reuse. We proffer a new explanation for the apparent longevity of test data: Many proposed models are similar in their predictions and we prove that this similarity mitigates overfitting. Specifically, we show empirically that models proposed for the ImageNet ILSVRC benchmark agree in their predictions well beyond what we can conclude from their accuracy levels alone. Likewise, models created by large scale hyperparameter search enjoy high levels of similarity. Motivated by these empirical observations, we give a non-asymptotic generalization bound that takes similarity into account, leading to meaningful confidence bounds in practical settings."
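
--- A toy numerical version of the key observation (mine, not the paper's ImageNet experiment), simplifying by comparing correctness indicators and treating "both wrong" as agreement: models with shared errors agree far more than independently erring models of the same accuracy.

```python
import numpy as np

rng = np.random.default_rng(3)
n, acc = 10_000, 0.8

# Baseline: two 80%-accurate models with *independent* errors would agree
# (in this simplified sense) on roughly 0.8^2 + 0.2^2 = 68% of examples.
a = rng.random(n) < acc
b = rng.random(n) < acc
indep_agree = np.mean(a == b)

# Similar models: a shared correctness pattern plus a small independent flip,
# mimicking the excess agreement observed on real benchmarks.
shared = rng.random(n) < acc
d = shared ^ (rng.random(n) < 0.05)
similar_agree = np.mean(shared == d)
print(indep_agree, similar_agree)
```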

--- So, the only reason what we're doing works is that we're not really changing very much?

to:NB
learning_theory
cross-validation
to_read
recht.benjamin
11 weeks ago by cshalizi

[1905.12202] Empirically Measuring Concentration: Fundamental Limits on Intrinsic Robustness

11 weeks ago by cshalizi

"Many recent works have shown that adversarial examples that fool classifiers can be found by minimally perturbing a normal input. Recent theoretical results, starting with Gilmer et al. (2018), show that if the inputs are drawn from a concentrated metric probability space, then adversarial examples with small perturbation are inevitable. A concentrated space has the property that any subset with Ω(1) (e.g., 1/100) measure, according to the imposed distribution, has small distance to almost all (e.g., 99/100) of the points in the space. It is not clear, however, whether these theoretical results apply to actual distributions such as images. This paper presents a method for empirically measuring and bounding the concentration of a concrete dataset which is proven to converge to the actual concentration. We use it to empirically estimate the intrinsic robustness to ℓ∞ and ℓ2 perturbations of several image classification benchmarks."

in_NB
to_read
adversarial_examples
concentration_of_measure
probability
statistics
learning_theory
11 weeks ago by cshalizi

[1701.00505] Statistical inference for network samples using subgraph counts

11 weeks ago by cshalizi

"We consider that a network is an observation, and a collection of observed networks forms a sample. In this setting, we provide methods to test whether all observations in a network sample are drawn from a specified model. We achieve this by deriving, under the null of the graphon model, the joint asymptotic properties of average subgraph counts as the number of observed networks increases but the number of nodes in each network remains finite. In doing so, we do not require that each observed network contains the same number of nodes, or is drawn from the same distribution. Our results yield joint confidence regions for subgraph counts, and therefore methods for testing whether the observations in a network sample are drawn from: a specified distribution, a specified model, or from the same model as another network sample. We present simulation experiments and an illustrative example on a sample of brain networks where we find that highly creative individuals' brains present significantly more short cycles."
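
--- A minimal version of the statistics involved (my own sketch, not the authors' code): average normalized subgraph counts over a sample of networks whose node counts need not match. Here the sample is drawn from an Erdos-Renyi null, where edge and triangle densities should be near p and p³.

```python
import numpy as np

rng = np.random.default_rng(4)

def er_graph(n, p, rng):
    """Adjacency matrix of an Erdos-Renyi graph G(n, p)."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(float)

def edge_density(A):
    n = len(A)
    return A.sum() / (n * (n - 1))

def triangle_density(A):
    """Triangle count (trace(A^3)/6) over the number of vertex triples."""
    n = len(A)
    return (np.trace(A @ A @ A) / 6.0) / (n * (n - 1) * (n - 2) / 6.0)

# A sample of 50 networks of varying size, as the paper allows.
sample = [er_graph(rng.integers(20, 40), 0.2, rng) for _ in range(50)]
avg_edge = np.mean([edge_density(A) for A in sample])
avg_tri = np.mean([triangle_density(A) for A in sample])
print(avg_edge, avg_tri)
```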

to:NB
to_read
network_data_analysis
graphical_models
re:network_differences
11 weeks ago by cshalizi

[1905.11381] Trust but Verify: An Information-Theoretic Explanation for the Adversarial Fragility of Machine Learning Systems, and a General Defense against Adversarial Attacks

11 weeks ago by cshalizi

"Deep-learning based classification algorithms have been shown to be susceptible to adversarial attacks: minor changes to the input of classifiers can dramatically change their outputs, while being imperceptible to humans. In this paper, we present a simple hypothesis about a feature compression property of artificial intelligence (AI) classifiers and present theoretical arguments to show that this hypothesis successfully accounts for the observed fragility of AI classifiers to small adversarial perturbations. Drawing on ideas from information and coding theory, we propose a general class of defenses for detecting classifier errors caused by abnormally small input perturbations. We further show theoretical guarantees for the performance of this detection method. We present experimental results with (a) a voice recognition system, and (b) a digit recognition system using the MNIST database, to demonstrate the effectiveness of the proposed defense methods. The ideas in this paper are motivated by a simple analogy between AI classifiers and the standard Shannon model of a communication system."

in_NB
information_theory
adversarial_examples
to_read
to_be_shot_after_a_fair_trial
11 weeks ago by cshalizi

[1905.11382] State-Reification Networks: Improving Generalization by Modeling the Distribution of Hidden Representations

11 weeks ago by cshalizi

"Machine learning promises methods that generalize well from finite labeled data. However, the brittleness of existing neural net approaches is revealed by notable failures, such as the existence of adversarial examples that are misclassified despite being nearly identical to a training example, or the inability of recurrent sequence-processing nets to stay on track without teacher forcing. We introduce a method, which we refer to as "state reification", that involves modeling the distribution of hidden states over the training data and then projecting hidden states observed during testing toward this distribution. Our intuition is that if the network can remain in a familiar manifold of hidden space, subsequent layers of the net should be well trained to respond appropriately. We show that this state-reification method helps neural nets to generalize better, especially when labeled data are sparse, and also helps overcome the challenge of achieving robust generalization with adversarial training."

--- My suspicion, admittedly based only on the abstract, is that this will, at best, be yet another re-invention of predictive states (http://bactra.org/notebooks/prediction-process.html). That would not, actually, be a bad thing.

to:NB
to_read
neural_networks
learning_theory
your_favorite_deep_neural_network_sucks
adversarial_examples
11 weeks ago by cshalizi

[1901.08082] Cooperative Online Learning: Keeping your Neighbors Updated

11 weeks ago by cshalizi

"We study an asynchronous online learning setting with a network of agents. At each time step, some of the agents are activated, requested to make a prediction, and pay the corresponding loss. The loss function is then revealed to these agents and also to their neighbors in the network. When activations are stochastic, we show that the regret achieved by N agents running the standard Online Mirror Descent is O(√(αT)), where T is the horizon and α ≤ N is the independence number of the network. This is in contrast to the regret Ω(√(NT)) which N agents incur in the same setting when feedback is not shared. We also show a matching lower bound of order √(αT) that holds for any given network. When the pattern of agent activations is arbitrary, the problem changes significantly: we prove an Ω(T) lower bound on the regret that holds for any online algorithm oblivious to the feedback source."

to:NB
social_learning
online_learning
low-regret_learning
learning_theory
cesa-bianchi.nicolo
monteleoni.claire
to_read
re:democratic_cognition
11 weeks ago by cshalizi

[1905.10854] All Neural Networks are Created Equal

12 weeks ago by cshalizi

"One of the unresolved questions in the context of deep learning is the triumph of GD based optimization, which is guaranteed to converge to one of many local minima. To shed light on the nature of the solutions that are thus being discovered, we investigate the ensemble of solutions reached by the same network architecture, with different random initialization of weights and random mini-batches. Surprisingly, we observe that these solutions are in fact very similar - more often than not, each train and test example is either classified correctly by all the networks, or by none at all. Moreover, all the networks seem to share the same learning dynamics, whereby initially the same train and test examples are incorporated into the learnt model, followed by other examples which are learnt in roughly the same order. When different neural network architectures are compared, the same learning dynamics is observed even when one architecture is significantly stronger than the other and achieves higher accuracy. Finally, when investigating other methods that involve the gradual refinement of a solution, such as boosting, once again we see the same learning pattern. In all cases, it appears as if all the classifiers start by learning to classify correctly the same train and test examples, while the more powerful classifiers continue to learn to classify correctly additional examples. These results are incredibly robust, observed for a large variety of architectures, hyperparameters and different datasets of images. Thus we observe that different classification solutions may be discovered by different means, but typically they evolve in roughly the same manner and demonstrate a similar success and failure behavior. For a given dataset, such behavior seems to be strongly correlated with effective generalization, while the induced ranking of examples may reflect inherent structure in the data."

!!!
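
--- The paper's diagnostic is easy to reproduce in miniature (a hypothetical toy of my own, not the authors' setup): train several classifiers from different random initializations and count the examples on which they are unanimous, whether all-correct or all-wrong.

```python
import numpy as np

# Toy illustration: several linear classifiers trained by SGD from
# different random initializations; measure the fraction of examples
# on which all models make the same prediction.
rng = np.random.default_rng(0)
n, d, n_models = 400, 5, 8
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.5 * rng.normal(size=n) > 0).astype(int)

preds = []
for m in range(n_models):
    w = rng.normal(size=d)                       # fresh random init
    for _ in range(200):                         # plain SGD on logistic loss
        i = rng.integers(n)
        p = 1.0 / (1.0 + np.exp(-X[i] @ w))
        w -= 0.1 * (p - y[i]) * X[i]
    preds.append((X @ w > 0).astype(int))

preds = np.array(preds)                          # shape (n_models, n)
agree_all = np.mean(preds.min(axis=0) == preds.max(axis=0))  # unanimity rate
```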

to:NB
to_read
optimization
machine_learning
neural_networks
your_favorite_deep_neural_network_sucks
12 weeks ago by cshalizi

[1905.10857] Causal Discovery and Forecasting in Nonstationary Environments with State-Space Models

12 weeks ago by cshalizi

"In many scientific fields, such as economics and neuroscience, we are often faced with nonstationary time series, and concerned with both finding causal relations and forecasting the values of variables of interest, both of which are particularly challenging in such nonstationary environments. In this paper, we study causal discovery and forecasting for nonstationary time series. By exploiting a particular type of state-space model to represent the processes, we show that nonstationarity helps to identify causal structure and that forecasting naturally benefits from learned causal knowledge. Specifically, we allow changes in both causal strengths and noise variances in the nonlinear state-space models, which, interestingly, renders both the causal structure and model parameters identifiable. Given the causal model, we treat forecasting as a problem in Bayesian inference in the causal model, which exploits the time-varying property of the data and adapts to new observations in a principled manner. Experimental results on synthetic and real-world data sets demonstrate the efficacy of the proposed methods."

to:NB
causal_inference
causal_discovery
state-space_models
time_series
non-stationarity
statistics
kith_and_kin
glymour.clark
to_read
zhang.kun
12 weeks ago by cshalizi

[1808.06581] The Deconfounded Recommender: A Causal Inference Approach to Recommendation

12 weeks ago by cshalizi

"The goal of recommendation is to show users items that they will like. Though usually framed as a prediction, the spirit of recommendation is to answer an interventional question---for each user and movie, what would the rating be if we "forced" the user to watch the movie? To this end, we develop a causal approach to recommendation, one where watching a movie is a "treatment" and a user's rating is an "outcome." The problem is there may be unobserved confounders, variables that affect both which movies the users watch and how they rate them; unobserved confounders impede causal predictions with observational data. To solve this problem, we develop the deconfounded recommender, a way to use classical recommendation models for causal recommendation. Following Wang & Blei [23], the deconfounded recommender involves two probabilistic models. The first models which movies the users watch; it provides a substitute for the unobserved confounders. The second one models how each user rates each movie; it employs the substitute to help account for confounders. This two-stage approach removes bias due to confounding. It improves recommendation and enjoys stable performance against interventions on test sets."

in_NB
causal_inference
collaborative_filtering
blei.david
to_teach:data-mining
to_read
12 weeks ago by cshalizi

Pensky : Dynamic network models and graphon estimation

12 weeks ago by cshalizi

"In the present paper, we consider a dynamic stochastic network model. The objective is estimation of the tensor of connection probabilities Λ when it is generated by a Dynamic Stochastic Block Model (DSBM) or a dynamic graphon. In particular, in the context of the DSBM, we derive a penalized least squares estimator Λ̂ of Λ and show that Λ̂ satisfies an oracle inequality and also attains minimax lower bounds for the risk. We extend those results to estimation of Λ when it is generated by a dynamic graphon function. The estimators constructed in the paper are adaptive to the unknown number of blocks in the context of the DSBM or to the smoothness of the graphon function. The technique relies on the vectorization of the model and leads to much simpler mathematical arguments than the ones used previously in the stationary set up. In addition, all results in the paper are nonasymptotic and allow a variety of extensions."

to:NB
to_read
graph_limits
nonparametrics
network_data_analysis
re:smoothing_adjacency_matrices
12 weeks ago by cshalizi

From Stochastic Thermodynamics to Thermodynamic Inference | Annual Review of Condensed Matter Physics

12 weeks ago by cshalizi

"For a large class of nonequilibrium systems, thermodynamic notions like work, heat, and, in particular, entropy production can be identified on the level of fluctuating dynamical trajectories. Within stochastic thermodynamics various fluctuation theorems relating these quantities have been proven. Their application to experimental systems requires that all relevant mesostates are accessible. Recent advances address the typical situation that only partial, or coarse-grained, information about a system is available. Thermodynamic inference as a general strategy uses consistency constraints derived from stochastic thermodynamics to infer otherwise hidden properties of nonequilibrium systems. An important class in this respect are active particles, for which we resolve the conflicting strategies that have been proposed to identify entropy production. As a paradigm for thermodynamic inference, the thermodynamic uncertainty relation provides a lower bound on the entropy production through measurements of the dispersion of any current in the system. Likewise, it quantifies the cost of precision for biomolecular processes. Generalizations and ramifications allow the inference of, inter alia, model-free upper bounds on the efficiency of molecular motors and of the minimal number of intermediate states in enzymatic networks."

to:NB
statistical_mechanics
thermodynamics
fluctuation-response
non-equilibrium
to_read
12 weeks ago by cshalizi

Turbulence Modeling in the Age of Data | Annual Review of Fluid Mechanics

12 weeks ago by cshalizi

"Data from experiments and direct simulations of turbulence have historically been used to calibrate simple engineering models such as those based on the Reynolds-averaged Navier–Stokes (RANS) equations. In the past few years, with the availability of large and diverse data sets, researchers have begun to explore methods to systematically inform turbulence models with data, with the goal of quantifying and reducing model uncertainties. This review surveys recent developments in bounding uncertainties in RANS models via physical constraints, in adopting statistical inference to characterize model coefficients and estimate discrepancy, and in using machine learning to improve turbulence models. Key principles, achievements, and challenges are discussed. A central perspective advocated in this review is that by exploiting foundational knowledge in turbulence modeling and physical constraints, researchers can use data-driven approaches to yield useful predictive models."

to:NB
turbulence
statistics
to_read
12 weeks ago by cshalizi

The Fokker–Planck Approach to Complex Spatiotemporal Disordered Systems | Annual Review of Condensed Matter Physics

12 weeks ago by cshalizi

"When the complete understanding of a complex system is not available, as, e.g., for systems considered in the real world, we need a top-down approach to complexity. In this approach, one may desire to understand general multipoint statistics. Here, such a general approach is presented and discussed based on examples from turbulence and sea waves. Our main idea is based on the cascade picture of turbulence, entangling fluctuations from large to small scales. Inspired by this cascade picture, we express the general multipoint statistics by the statistics of scale-dependent fluctuations of variables and relate it to a scale-dependent process, which finally is a stochastic cascade process. We show how to extract from empirical data a Fokker–Planck equation for this cascade process, which allows the generation of surrogate data to forecast extreme events as well as to develop a nonequilibrium thermodynamics for the complex systems. For each cascade event, an entropy production can be determined. These entropies accurately fulfill a rigorous law, namely the integral fluctuations theorem."

to:NB
stochastic_processes
random_fields
physics
statistical_mechanics
markov_models
macro_from_micro
to_be_shot_after_a_fair_trial
non-equilibrium
to_read
12 weeks ago by cshalizi

Identification and Extrapolation of Causal Effects with Instrumental Variables | Annual Review of Economics

12 weeks ago by cshalizi

"Instrumental variables (IV) are widely used in economics to address selection on unobservables. Standard IV methods produce estimates of causal effects that are specific to individuals whose behavior can be manipulated by the instrument at hand. In many cases, these individuals are not the same as those who would be induced to treatment by an intervention or policy of interest to the researcher. The average causal effect for the two groups can differ significantly if the effect of the treatment varies systematically with unobserved factors that are correlated with treatment choice. We review the implications of this type of unobserved heterogeneity for the interpretation of standard IV methods and for their relevance to policy evaluation. We argue that making inferences about policy-relevant parameters typically requires extrapolating from the individuals affected by the instrument to the individuals who would be induced to treatment by the policy under consideration. We discuss a variety of alternatives to standard IV methods that can be used to rigorously perform this extrapolation. We show that many of these approaches can be nested as special cases of a general framework that embraces the possibility of partial identification."

--- Memo to self: Read this before revising the IV sections of ADAfaEPoV.

to:NB
causal_inference
instrumental_variables
partial_identification
statistics
re:ADAfaEPoV
to_read
12 weeks ago by cshalizi

On the Statistical Formalism of Uncertainty Quantification | Annual Review of Statistics and Its Application

12 weeks ago by cshalizi

"The use of models to try to better understand reality is ubiquitous. Models have proven useful in testing our current understanding of reality; for instance, climate models of the 1980s were built for science discovery, to achieve a better understanding of the general dynamics of climate systems. Scientific insights often take the form of general qualitative predictions (i.e., “under these conditions, the Earth's poles will warm more than the rest of the planet”); such use of models differs from making quantitative forecasts of specific events (i.e. “high winds at noon tomorrow at London's Heathrow Airport”). It is sometimes hoped that, after sufficient model development, any model can be used to make quantitative forecasts for any target system. Even if that were the case, there would always be some uncertainty in the prediction. Uncertainty quantification aims to provide a framework within which that uncertainty can be discussed and, ideally, quantified, in a manner relevant to practitioners using the forecast system. A statistical formalism has developed that claims to be able to accurately assess the uncertainty in prediction. This article is a discussion of if and when this formalism can do so. The article arose from an ongoing discussion between the authors concerning this issue, the second author generally being considerably more skeptical concerning the utility of the formalism in providing quantitative decision-relevant information."

to:NB
to_read
statistics
prediction
risk_vs_uncertainty
smith.leonard
berger.james
foundations_of_statistics
12 weeks ago by cshalizi

[1901.10113] Self-organization of action hierarchy and compositionality by reinforcement learning with recurrent networks

12 weeks ago by cshalizi

"Recurrent neural networks (RNNs) for reinforcement learning (RL) have shown distinct advantages, e.g., solving memory-dependent tasks and meta-learning. However, little effort has been spent on improving RNN architectures and on understanding the underlying neural mechanisms for performance gain. In this paper, we propose a novel, multiple-timescale, stochastic RNN for RL. Empirical results show that the network can autonomously learn to abstract sub-goals and can self-develop an action hierarchy using internal dynamics in a challenging continuous control task. Furthermore, we show that the self-developed compositionality of the network enhances faster re-learning when adapting to a new task that is a re-composition of previously learned sub-goals, than when starting from scratch. We also found that improved performance can be achieved when neural activities are subject to stochastic rather than deterministic dynamics."

--- We're going to re-invent Miller, Galanter and Pribram (1960) _Plans and the Structure of Behavior_, aren't we?

to:NB
to_read
reinforcement_learning
neural_networks
self-organization
12 weeks ago by cshalizi

Multiresolution Network Models: Journal of Computational and Graphical Statistics: Vol 28, No 1

12 weeks ago by cshalizi

"Many existing statistical and machine learning tools for social network analysis focus on a single level of analysis. Methods designed for clustering optimize a global partition of the graph, whereas projection-based approaches (e.g., the latent space model in the statistics literature) represent in rich detail the roles of individuals. Many pertinent questions in sociology and economics, however, span multiple scales of analysis. Further, many questions involve comparisons across disconnected graphs that will, inevitably, be of different sizes, either due to missing data or the inherent heterogeneity in real-world networks. We propose a class of network models that represent network structure on multiple scales and facilitate comparison across graphs with different numbers of individuals. These models differentially invest modeling effort within subgraphs of high density, often termed communities, while maintaining a parsimonious structure between said subgraphs. We show that our model class is projective, highlighting an ongoing discussion in the social network modeling literature on the dependence of inference paradigms on the size of the observed graph. We illustrate the utility of our method using data on household relations from Karnataka, India. Supplementary material for this article is available online."

to:NB
to_read
network_data_analysis
statistics
mccormick.tyler
fosdick.bailey
to_teach:baby-nets
re:fractal_network_asymptotics
12 weeks ago by cshalizi

Lee , Song : Stable limit theorems for empirical processes under conditional neighborhood dependence

12 weeks ago by cshalizi

"This paper introduces a new concept of stochastic dependence among many random variables which we call conditional neighborhood dependence (CND). Suppose that there are a set of random variables and a set of sigma algebras where both sets are indexed by the same set endowed with a neighborhood system. When the set of random variables satisfies CND, any two non-adjacent sets of random variables are conditionally independent given sigma algebras having indices in one of the two sets’ neighborhood. Random variables with CND include those with conditional dependency graphs and a class of Markov random fields with a global Markov property. The CND property is useful for modeling cross-sectional dependence governed by a complex, large network. This paper provides two main results. The first result is a stable central limit theorem for a sum of random variables with CND. The second result is a Donsker-type result of stable convergence of empirical processes indexed by a class of functions satisfying a certain bracketing entropy condition when the random variables satisfy CND."

to:NB
to_read
empirical_processes
random_fields
stochastic_processes
central_limit_theorem
12 weeks ago by cshalizi

McGoff , Mukherjee , Pillai : Statistical inference for dynamical systems: A review

12 weeks ago by cshalizi

"The topic of statistical inference for dynamical systems has been studied widely across several fields. In this survey we focus on methods related to parameter estimation for nonlinear dynamical systems. Our objective is to place results across distinct disciplines in a common setting and highlight opportunities for further research."

to:NB
to_read
dynamical_systems
statistical_inference_for_stochastic_processes
stochastic_processes
statistics
re:almost_none
re:stacs
12 weeks ago by cshalizi

On the Interpretation of do(x) : Journal of Causal Inference

12 weeks ago by cshalizi

"This paper provides empirical interpretation of the do(x) operator when applied to non-manipulable variables such as race, obesity, or cholesterol level. We view do(x) as an ideal intervention that provides valuable information on the effects of manipulable variables and is thus empirically testable. We draw parallels between this interpretation and ways of enabling machines to learn effects of untried actions from those tried. We end with the conclusion that researchers need not distinguish manipulable from non-manipulable variables; both types are equally eligible to receive the do(x) operator and to produce useful information for decision makers."

to:NB
causality
pearl.judea
re:ADAfaEPoV
to_read
12 weeks ago by cshalizi

JPAE at 25: Looking back and moving forward on teaching evaluations: Journal of Public Affairs Education: Vol 25, No 1

may 2019 by cshalizi

"In many if not most colleges and universities in the United States, raw scores from Student Evaluations of Teaching (SETs) are the primary tool of teaching assessment, and teaching evaluations often have real consequences for promotion and tenure. In 2005, JPAE published an article on teaching evaluations, and this article added to what was at that time a somewhat thin literature indicating that SETs are systematically biased against female faculty, and probably against older and minority faculty. Since that time, this literature has swelled and grown and now the evidence that SETs are invalid and systematically biased is too strong to ignore. Over its first 25 years, JPAE has been a force for good in public affairs education. As JPAE moves into its next 25 years, it should take a principled and evidence-based stand against the use of raw SETs as an important indicator of teaching quality, and should encourage high-quality articles studying other methods of assessing teaching so that we can learn what approaches are better."

to:NB
to_read
teaching
social_measurement
i_want_to_believe
to_be_shot_after_a_fair_trial
may 2019 by cshalizi

[1904.06019] Conformal Prediction Under Covariate Shift

may 2019 by cshalizi

"We extend conformal prediction methodology beyond the case of exchangeable data. In particular, we show that a weighted version of conformal prediction can be used to compute distribution-free prediction intervals for problems in which the test and training covariate distributions differ, but the likelihood ratio between these two distributions is known---or, in practice, can be estimated accurately with access to a large set of unlabeled data (test covariate points). Our weighted extension of conformal prediction also applies more generally, to settings in which the data satisfies a certain weighted notion of exchangeability. We discuss other potential applications of our new conformal methodology, including latent variable and missing data problems."
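
--- A minimal sketch of the weighted split-conformal step, reconstructed from the abstract rather than taken from the paper: the interval endpoint becomes a quantile of the calibration nonconformity scores reweighted by the (assumed known) likelihood ratio w(x) between test and training covariate distributions.

```python
import numpy as np

def weighted_conformal_quantile(scores, weights, test_weight, alpha=0.1):
    """Weighted (1 - alpha) quantile of the calibration nonconformity
    scores: calibration point i gets mass proportional to its likelihood
    ratio w(x_i), the test point gets mass proportional to w(x_test),
    and the test point's unknown score is treated as +inf."""
    w = np.append(weights, test_weight)
    p = w / w.sum()
    s = np.append(scores, np.inf)
    order = np.argsort(s)
    cum = np.cumsum(p[order])
    idx = np.searchsorted(cum, 1.0 - alpha)
    return s[order][idx]

# With uniform weights this recovers ordinary split conformal: the
# 90% quantile of |N(0,1)| residuals is about 1.64.
rng = np.random.default_rng(0)
scores = np.abs(rng.normal(size=1000))
q = weighted_conformal_quantile(scores, np.ones(1000), 1.0, alpha=0.1)
# prediction interval at a new x: yhat(x) +/- q
```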

to:NB
to_read
statistics
prediction
conformal_prediction
ramdas.aaditya
tibshirani.ryan
kith_and_kin
covariate_shift
may 2019 by cshalizi

[1705.08527] Causal inference for social network data

april 2019 by cshalizi

"We extend recent work by van der Laan (2014) on causal inference for causally connected units to more general social network settings. Our asymptotic results allow for dependence of each observation on a growing number of other units as sample size increases. We are not aware of any previous methods for inference about network members in observational settings that allow the number of ties per node to increase as the network grows. While previous methods have generally implicitly focused on one of two possible sources of dependence among social network observations, we allow for both dependence due to contagion, or transmission of information across network ties, and for dependence due to latent similarities among nodes sharing ties. We describe estimation and inference for causal effects that are specifically of interest in social network settings."

to:NB
to_read
heard_the_talk
causal_inference
network_data_analysis
kith_and_kin
ogburn.elizabeth
van_der_laan.mark
re:homophily_and_confounding
to_teach:baby-nets
april 2019 by cshalizi

A Spline Theory of Deep Learning

april 2019 by cshalizi

"We build a rigorous bridge between deep networks (DNs) and approximation theory via spline functions and operators. Our key result is that a large class of DNs can be written as a composition of max-affine spline operators (MASOs), which provide a powerful portal through which to view and analyze their inner workings. For instance, conditioned on the input signal, the output of a MASO DN can be written as a simple affine transformation of the input. This implies that a DN constructs a set of signal-dependent, class-specific templates against which the signal is compared via a simple inner product; we explore the links to the classical theory of optimal classification via matched filters and the effects of data memorization. Going further, we propose a simple penalty term that can be added to the cost function of any DN learning algorithm to force the templates to be orthogonal with each other; this leads to significantly improved classification performance and reduced overfitting with no change to the DN architecture. The spline partition of the input signal space opens up a new geometric avenue to study how DNs organize signals in a hierarchical fashion. As an application, we develop and validate a new distance metric for signals that quantifies the difference between their partition encodings."
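
--- The MASO claim is concrete enough to check numerically (a sketch of my own, for a one-hidden-layer ReLU net): conditioned on the input's activation pattern, the network output is exactly an affine function of the input.

```python
import numpy as np

# One-hidden-layer ReLU net, read as a max-affine spline operator:
# fixing the activation pattern at x makes the output affine in x.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 4)), rng.normal(size=16)
W2, b2 = rng.normal(size=(3, 16)), rng.normal(size=3)

def f(x):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

x = rng.normal(size=4)
D = np.diag((W1 @ x + b1 > 0).astype(float))   # which units are active at x
A = W2 @ D @ W1                                # the local affine "template"
b = W2 @ D @ b1 + b2
# f(x) == A @ x + b, and the equality persists while the pattern D holds
```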

to:NB
to_read
approximation
splines
neural_networks
machine_learning
your_favorite_deep_neural_network_sucks
via:csantos
april 2019 by cshalizi

[1903.08560] Surprises in High-Dimensional Ridgeless Least Squares Interpolation

april 2019 by cshalizi

"Interpolators---estimators that achieve zero training error---have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type. In this paper, we study minimum ℓ2-norm ("ridgeless") interpolation in high-dimensional least squares regression. We consider two different models for the feature distribution: a linear model, where the feature vectors x_i ∈ ℝ^p are obtained by applying a linear transform to a vector of i.i.d. entries, x_i = Σ^{1/2} z_i (with z_i ∈ ℝ^p); and a nonlinear model, where the feature vectors are obtained by passing the input through a random one-layer neural network, x_i = φ(W z_i) (with z_i ∈ ℝ^d, W ∈ ℝ^{p×d} a matrix of i.i.d. entries, and φ an activation function acting componentwise on W z_i). We recover---in a precise quantitative way---several phenomena that have been observed in large-scale neural networks and kernel machines, including the "double descent" behavior of the prediction risk, and the potential benefits of overparametrization."

--- "Heard the talk" = "Ryan came into my office to explain it all because he was so enthused".
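
--- The basic object is easy to write down (a toy sketch, not the paper's analysis): the minimum-ℓ2-norm interpolator is the pseudoinverse solution, the limit of ridge regression as the penalty goes to zero, and it fits the training data exactly once p > n.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 200            # overparametrized: more features than samples
beta = rng.normal(size=p) / np.sqrt(p)
X = rng.normal(size=(n, p))
y = X @ beta + 0.1 * rng.normal(size=n)

# Minimum-l2-norm ("ridgeless") interpolator: beta_hat = pinv(X) y,
# the limit of ridge regression as the penalty tends to 0+.
beta_hat = np.linalg.pinv(X) @ y

train_err = np.max(np.abs(X @ beta_hat - y))   # ~ 0: exact interpolation
```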

to:NB
to_read
regression
high-dimensional_statistics
interpolation
kith_and_kin
tibshirani.ryan
rosset.saharon
montanari.andrea
hastie.trevor
statistics
neural_networks
heard_the_talk
april 2019 by cshalizi

[1903.08687] On approximate validation of models: A Kolmogorov-Smirnov based approach

april 2019 by cshalizi

"Classical tests of fit typically reject a model for large enough real data samples. In contrast, often in statistical practice a model offers a good description of the data even though it is not the "true" random generator. We consider a more flexible approach based on contamination neighbourhoods around a model. Using trimming methods and the Kolmogorov metric we introduce a functional statistic measuring departures from a contaminated model and the associated estimator corresponding to its sample version. We show how this estimator allows testing of fit for the (slightly) contaminated model vs sensible deviations from it, with uniformly exponentially small type I and type II error probabilities. We also address the asymptotic behavior of the estimator showing that, under suitable regularity conditions, it asymptotically behaves as the supremum of a Gaussian process. As an application we explore methods of comparison between descriptive models based on the paradigm of model falseness. We also include some connections of our approach with the False-Discovery-Rate setting, showing competitive behavior when estimating the contamination level, although applicable in a wider framework."

--- This would be very cool if it does what they say they want it to.
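
--- A crude sketch of the idea (my own back-of-the-envelope, not the authors' statistic): mixing an α-fraction contamination into a model F moves its CDF by at most α pointwise, so the plain Kolmogorov statistic minus α lower-bounds the distance to the whole contamination neighbourhood.

```python
import math
import random

def ks_to_contaminated(sample, F, alpha):
    """Crude lower bound on the Kolmogorov distance from the empirical
    CDF to the contamination neighbourhood {(1 - alpha) F + alpha Q}:
    any mixture component Q shifts the model CDF by at most alpha
    pointwise, so the plain KS statistic minus alpha is a lower bound
    (floored at zero)."""
    xs = sorted(sample)
    n = len(xs)
    ks = max(max(abs((i + 1) / n - F(x)), abs(i / n - F(x)))
             for i, x in enumerate(xs))
    return max(ks - alpha, 0.0)

# Example: a slightly shifted sample against the standard normal CDF.
random.seed(0)
Phi = lambda t: 0.5 * (1 + math.erf(t / math.sqrt(2)))
sample = [random.gauss(0.15, 1) for _ in range(2000)]
d = ks_to_contaminated(sample, Phi, alpha=0.05)
# d stays small: the shift is nearly absorbed by the neighbourhood
```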

to:NB
to_read
goodness-of-fit
statistics
stochastic_models
april 2019 by cshalizi

[1903.08766] A Method for Measuring Network Effects of One-to-One Communication Features in Online A/B Tests

april 2019 by cshalizi

"A/B testing is an important decision-making tool in product development because it can provide an accurate estimate of the average treatment effect of a new feature, which allows developers to understand the business impact of new changes to products or algorithms. However, an important assumption of A/B testing, the Stable Unit Treatment Value Assumption (SUTVA), is not always valid, especially for products that facilitate interactions between individuals. In contexts like one-to-one messaging we should expect network interference; if an experimental manipulation is effective, the behavior of the treatment group is likely to influence members in the control group by sending them messages, violating this assumption. In this paper, we propose a novel method that can be used to account for network effects when A/B testing changes to one-to-one interactions. Our method is an edge-based analysis that can be applied to standard Bernoulli randomized experiments to retrieve an average treatment effect that is not influenced by network interference. We develop a theoretical model, and methods for computing point estimates and variances of effects of interest via network-consistent permutation testing. We then apply our technique to real data from experiments conducted on the messaging product at LinkedIn. We find empirical support for our model, and evidence that the standard method of analysis for A/B tests underestimates the impact of new features in one-to-one messaging contexts."
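
--- A rough sketch of an edge-based permutation analysis (my reconstruction of the idea; the graph, outcomes, and effect size below are all made up): classify each edge by the treatment status of its endpoints and permute node-level labels, which preserves the network structure.

```python
import random

# Hypothetical data: a random messaging graph, Bernoulli-randomized node
# treatments z, and an edge outcome y boosted when both endpoints are
# treated.
random.seed(0)
n = 200
edges = [(random.randrange(n), random.randrange(n)) for _ in range(600)]
z = [random.random() < 0.5 for _ in range(n)]
y = {e: random.random() + 0.3 * (z[e[0]] and z[e[1]]) for e in edges}

def stat(labels):
    """Mean outcome on treated-treated edges minus control-control edges."""
    tt = [y[e] for e in edges if labels[e[0]] and labels[e[1]]]
    cc = [y[e] for e in edges if not labels[e[0]] and not labels[e[1]]]
    return sum(tt) / len(tt) - sum(cc) / len(cc)

obs = stat(z)
null = []
for _ in range(500):
    zp = z[:]
    random.shuffle(zp)      # permute node labels; the graph is untouched
    null.append(stat(zp))
p_value = sum(s >= obs for s in null) / len(null)
```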

to:NB
to_read
experimental_design
network_data_analysis
statistics
re:do_not_adjust_your_receiver
april 2019 by cshalizi

[1903.02541] Relational Pooling for Graph Representations

april 2019 by cshalizi

"This work generalizes graph neural networks (GNNs) beyond those based on the Weisfeiler-Lehman (WL) algorithm, graph Laplacians, and graph diffusion kernels. Our approach, denoted Relational Pooling (RP), draws from the theory of finite partial exchangeability to provide a framework with maximal representation power for graphs. RP can work with existing graph representation models, and somewhat counterintuitively, can make them even more powerful than the original WL isomorphism test. Additionally, RP is the first theoretically sound framework to use architectures like Recurrent Neural Networks and Convolutional Neural Networks for graph classification. RP also has graph kernels as a special case. We demonstrate improved performance of novel RP-based graph representations over current state-of-the-art methods on a number of tasks."

to:NB
to_read
network_data_analysis
graph_limits
neural_networks
april 2019 by cshalizi

[1903.01672] Causal Discovery from Heterogeneous/Nonstationary Data

april 2019 by cshalizi

"It is commonplace to encounter heterogeneous or nonstationary data, of which the underlying generating process changes across domains or over time. Such a distribution shift feature presents both challenges and opportunities for causal discovery. In this paper, we develop a framework for causal discovery from such data, called Constraint-based causal Discovery from heterogeneous/NOnstationary Data (CD-NOD), to find causal skeleton and directions and estimate the properties of mechanism changes. First, we propose an enhanced constraint-based procedure to detect variables whose local mechanisms change and recover the skeleton of the causal structure over observed variables. Second, we present a method to determine causal orientations by making use of independent changes in the data distribution implied by the underlying causal model, benefiting from information carried by changing distributions. After learning the causal structure, next, we investigate how to efficiently estimate the `driving force' of the nonstationarity of a causal mechanism. That is, we aim to extract from data a low-dimensional representation of changes. The proposed methods are nonparametric, with no hard restrictions on data distributions and causal mechanisms, and do not rely on window segmentation. Furthermore, we find that data heterogeneity benefits causal structure identification even with particular types of confounders. Finally, we show the connection between heterogeneity/nonstationarity and soft intervention in causal discovery. Experimental results on various synthetic and real-world data sets (task-fMRI and stock market data) are presented to demonstrate the efficacy of the proposed methods."

to:NB
statistics
causal_inference
causal_discovery
non-stationarity
kith_and_kin
glymour.clark
to_read
april 2019 by cshalizi

[1902.03515] Multi-Domain Translation by Learning Uncoupled Autoencoders

april 2019 by cshalizi

"Multi-domain translation seeks to learn a probabilistic coupling between marginal distributions that reflects the correspondence between different domains. We assume that data from different domains are generated from a shared latent representation based on a structural equation model. Under this assumption, we show that the problem of computing a probabilistic coupling between marginals is equivalent to learning multiple uncoupled autoencoders that embed to a given shared latent distribution. In addition, we propose a new framework and algorithm for multi-domain translation based on learning the shared latent distribution and training autoencoders under distributional constraints. A key practical advantage of our framework is that new autoencoders (i.e., new domains) can be added sequentially to the model without retraining on the other domains, which we demonstrate experimentally on image as well as genomics datasets."

--- Last tag is tentative.

to:NB
machine_learning
to_read
inference_to_latent_objects
uhler.caroline
factor_analysis

april 2019 by cshalizi

The Lasserre Hierarchy in Approximation Algorithms

april 2019 by cshalizi

"The Lasserre hierarchy is a systematic procedure to strengthen a relaxation for

an optimization problem by adding additional variables and SDP constraints. In

the last years this hierarchy moved into the focus of researchers in approximation algorithms as the obtain relaxations have provably nice properties. In particular on the t -th level, the relaxation can be solved in time n^O(t) and every constraint that one could derive from looking just at t variables is automatically satisfied. Additionally, it provides a vector embedding of events so that probabilities are expressable as inner products.

"The goal of these lecture notes is to give short but rigorous proofs of all key properties of the Lasserre hierarchy. In the second part we will demonstrate how the Lasserre SDP can be applied to (mostly NP-hard) optimization problems such as KNAPSACK, MATCHING, MAXCUT (in general and in dense graphs), 3-COLORING and SETCOVER."

--- I remember Cris trying to explain this to me once...

to:NB
to_read
approximation
optimization
re:in_soviet_union_optimization_problem_solves_you
april 2019 by cshalizi

[1807.00564] Inference, Learning, and Population Size: Projectivity for SRL Models

march 2019 by cshalizi

"A subtle difference between propositional and relational data is that in many relational models, marginal probabilities depend on the population or domain size. This paper connects the dependence on population size to the classic notion of projectivity from statistical theory: Projectivity implies that relational predictions are robust with respect to changes in domain size. We discuss projectivity for a number of common SRL systems, and identify syntactic fragments that are guaranteed to yield projective models. The syntactic conditions are restrictive, which suggests that projectivity is difficult to achieve in SRL, and care must be taken when working with different domain sizes."

to:NB
to_read
relational_learning
re:your_favorite_ergm_sucks
probability
march 2019 by cshalizi

[1609.08816] Identifying Causal Effects With Proxy Variables of an Unmeasured Confounder

march 2019 by cshalizi

"We consider a causal effect that is confounded by an unobserved variable, but with observed proxy variables of the confounder. We show that, with at least two independent proxy variables satisfying a certain rank condition, the causal effect is nonparametrically identified, even if the measurement error mechanism, i.e., the conditional distribution of the proxies given the con- founder, may not be identified. Our result generalizes the identification strategy of Kuroki & Pearl (2014) that rests on identification of the measurement error mechanism. When only one proxy for the confounder is available, or the required rank condition is not met, we develop a strategy to test the null hypothesis of no causal effect."

to:NB
to_read
causal_inference
statistics
march 2019 by cshalizi

[1902.10286] On Multi-Cause Causal Inference with Unobserved Confounding: Counterexamples, Impossibility, and Alternatives

march 2019 by cshalizi

"Unobserved confounding is a central barrier to drawing causal inferences from observational data. Several authors have recently proposed that this barrier can be overcome in the case where one attempts to infer the effects of several variables simultaneously. In this paper, we present two simple, analytical counterexamples that challenge the general claims that are central to these approaches. In addition, we show that nonparametric identification is impossible in this setting. We discuss practical implications, and suggest alternatives to the methods that have been proposed so far in this line of work: using proxy variables and shifting focus to sensitivity analysis."

--- Counter-examples to the "deconfounder" approach.

in_NB
causal_inference
statistics
to_read
factor_analysis
via:betsy_ogburn
march 2019 by cshalizi

[1902.04114] Using Embeddings to Correct for Unobserved Confounding

february 2019 by cshalizi

"We consider causal inference in the presence of unobserved confounding. In particular, we study the case where a proxy is available for the confounder but the proxy has non-iid structure. As one example, the link structure of a social network carries information about its members. As another, the text of a document collection carries information about their meanings. In both these settings, we show how to effectively use the proxy to do causal inference. The main idea is to reduce the causal estimation problem to a semi-supervised prediction of both the treatments and outcomes. Networks and text both admit high-quality embedding models that can be used for this semi-supervised prediction. Our method yields valid inferences under suitable (weak) conditions on the quality of the predictive model. We validate the method with experiments on a semi-synthetic social network dataset. We demonstrate the method by estimating the causal effect of properties of computer science submissions on whether they are accepted at a conference."

to:NB
causal_inference
statistics
blei.david
re:homophily_and_confounding
to_read
february 2019 by cshalizi

The Bias Is Built In: How Administrative Records Mask Racially Biased Policing by Dean Knox, Will Lowe, Jonathan Mummolo :: SSRN

february 2019 by cshalizi

"Researchers often lack the necessary data to credibly estimate racial bias in policing. In particular, police administrative records lack information on civilians that police observe but do not investigate. In this paper, we show that if police racially discriminate when choosing whom to investigate, using administrative records to estimate racial bias in police behavior amounts to post-treatment conditioning, and renders many quantities of interest unidentified---even among investigated individuals---absent strong and untestable assumptions. In most cases, no set of controls can eliminate this statistical bias, the exact form of which we derive through principal stratification in a causal mediation framework. We develop a bias-correction procedure and nonparametric sharp bounds for race effects, replicate published findings, and show traditional estimation techniques can severely underestimate levels of racially biased policing or even mask discrimination entirely. We conclude by outlining a general and feasible design for future studies that is robust to this inferential snare."

to:NB
to_read
causal_inference
police
discrimination
statistics
to_teach:undergrad-ADA
via:henry_farrell
february 2019 by cshalizi

Social Space Diffusion: Applications of a Latent Space Model to Diffusion with Uncertain Ties - Jacob C. Fisher, 2019

february 2019 by cshalizi

"Social networks represent two different facets of social life: (1) stable paths for diffusion, or the spread of something through a connected population, and (2) random draws from an underlying social space, which indicate the relative positions of the people in the network to one another. The dual nature of networks creates a challenge: if the observed network ties are a single random draw, is it realistic to expect that diffusion only follows the observed network ties? This study takes a first step toward integrating these two perspectives by introducing a social space diffusion model. In the model, network ties indicate positions in social space, and diffusion occurs proportionally to distance in social space. Practically, the simulation occurs in two parts. First, positions are estimated using a statistical model (in this example, a latent space model). Then, second, the predicted probabilities of a tie from that model—representing the distances in social space—or a series of networks drawn from those probabilities—representing routine churn in the network—are used as weights in a weighted averaging framework. Using longitudinal data from high school friendship networks, the author explores the properties of the model. The author shows that the model produces smoothed diffusion results, which predict attitudes in future waves 10 percent better than a diffusion model using the observed network and up to 5 percent better than diffusion models using alternative, non-model-based smoothing approaches."

to:NB
to_read
social_influence
social_networks
network_data_analysis
re:homophily_and_confounding
to_teach:baby-nets
via:gabriel_rossman
february 2019 by cshalizi
