[1707.04800] Exponential-Family Models of Random Graphs: Inference in Finite-, Super-, and Infinite Population Scenarios

6 hours ago by cshalizi

"Exponential-family Random Graph Models (ERGMs) constitute a large statistical framework for modeling sparse and dense random graphs, short- and long-tailed degree distributions, covariates, and a wide range of complex dependencies. Special cases of ERGMs are generalized linear models (GLMs), Bernoulli random graphs, β-models, p1-models, and models related to Markov random fields in spatial statistics and other areas of statistics. While widely used in practice, questions have been raised about the theoretical properties of ERGMs. These include concerns that some ERGMs are near-degenerate and that many ERGMs are non-projective. To address them, careful attention must be paid to model specifications and their underlying assumptions, and to the inferential settings in which models are employed. As we discuss, near-degeneracy can affect simplistic ERGMs lacking structure, but well-posed ERGMs with additional structure can be well-behaved. Likewise, lack of projectivity can affect non-likelihood-based inference, but likelihood-based inference does not require projectivity. Here, we review well-posed ERGMs along with likelihood-based inference. We first clarify the core statistical notions of "sample" and "population" in the ERGM framework, and separate the process that generates the population graph from the observation process. We then review likelihood-based inference in finite-, super-, and infinite-population scenarios. We conclude with consistency results, and an application to human brain networks."

--- Needless (?) to say, I am unconvinced by the arguments that projectivity doesn't matter.
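
--- As a concrete version of the simplest special case named in the abstract: a toy sketch (mine, not the paper's) of the edge-count ERGM, P(G) ∝ exp(θ·#edges), which reduces to a Bernoulli random graph with edge probability e^θ/(1+e^θ), together with its likelihood-based estimate of θ.

```python
import math
import random

def sample_bernoulli_ergm(n, theta, seed=0):
    """Sample from the edge-count ERGM P(G) ∝ exp(theta * #edges(G)).
    Edges are then i.i.d. Bernoulli with p = e^theta / (1 + e^theta)."""
    rng = random.Random(seed)
    p = math.exp(theta) / (1 + math.exp(theta))
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]

def mle_theta(edges, n):
    """Likelihood-based estimate of theta: logit of the observed edge density."""
    m = n * (n - 1) // 2
    phat = len(edges) / m
    return math.log(phat / (1 - phat))

g = sample_bernoulli_ergm(200, -1.0, seed=1)
theta_hat = mle_theta(g, 200)  # should land near the true theta = -1.0
```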

to:NB
exponential_family_random_graphs
network_data_analysis
statistics
to_teach:baby-nets
re:your_favorite_ergm_sucks
6 hours ago by cshalizi

[1908.01596v3] A Unifying Analysis of Shift Operators on a Graph

6 hours ago by cshalizi

"The maximum entropy principle is employed to introduce a general class of shift operators (GSO) for random signals on a graph. By virtue of the assumed probabilistic framework, the proposed GSO is shown to be both bounded and to exhibit the desired property of asymptotic power preservation over graph shifts. For rigour, the sensitivity of the GSO to a graph topology misspecification is also addressed. The advantages of the proposed operator are demonstrated in a real-world multi-sensor signal average setting."

network_data_analysis
to:NB
6 hours ago by cshalizi

[1909.05802] Ecological communities from random Lotka-Volterra dynamics with nonlinear functional response

6 hours ago by cshalizi

"We investigate the outcome of Lotka-Volterra dynamics of ecological communities with random interaction coefficients and non-linear functional response. We show in simulations that the saturation of Holling type-II response stabilises the dynamics. This is confirmed in an analytical generating-functional approach to Lotka-Volterra equations with piecewise linear saturating response. For such systems we are able to derive self-consistent relations governing the stable fixed-point phase, and to carry out a linear stability analysis to predict the onset of unstable behaviour. We investigate in detail the combined effects of the mean and variance of the random interaction coefficients, the cut-off parameter of the non-linear response, and a symmetry parameter. We find that stability and diversity increase with the introduction of functional response, where decreasing the functional response parameter has a similar effect to decreasing the symmetry parameter. We also find biomass and diversity to be less dependent on the symmetry of interactions with functional response, and co-operation to no longer have a detrimental effect on stability."
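
--- A toy Euler-integration sketch of random Lotka-Volterra dynamics with a saturating Holling type-II response g(x) = x/(1+bx); the growth/interaction parameterization here is my own illustration, not the paper's model.

```python
import numpy as np

def simulate_lv_holling2(n=50, mu=-0.5, sigma=0.2, b=1.0, T=200.0, dt=0.01, seed=0):
    """Euler-integrate dx_i/dt = x_i (1 - x_i + sum_j a_ij g(x_j)) with a
    Holling type-II response g(x) = x / (1 + b*x) and Gaussian random
    interactions a_ij with mean mu/n and s.d. sigma/sqrt(n)."""
    rng = np.random.default_rng(seed)
    a = mu / n + (sigma / np.sqrt(n)) * rng.standard_normal((n, n))
    np.fill_diagonal(a, 0.0)
    x = rng.uniform(0.5, 1.5, size=n)
    for _ in range(int(T / dt)):
        g = x / (1.0 + b * x)
        x = x + dt * x * (1.0 - x + a @ g)
        x = np.clip(x, 0.0, None)  # abundances cannot go negative
    return x

x_final = simulate_lv_holling2()
diversity = np.mean(x_final > 1e-6)  # fraction of surviving species
```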

ecology
to:NB
6 hours ago by cshalizi

[1908.11140] Deep Learning and MARS: A Connection

13 hours ago by cshalizi

"We consider least squares regression estimates using deep neural networks. We show that these estimates satisfy an oracle inequality, which implies that (up to a logarithmic factor) the error of these estimates is at least as small as the optimal possible error bound which one would expect for MARS in case that this procedure would work in the optimal way. As a result we show that our neural networks are able to achieve a dimensionality reduction in case that the regression function locally has low dimensionality. This assumption seems to be realistic in real-world applications, since selected high-dimensional data are often confined to locally-low-dimensional distributions. In our simulation study we provide numerical experiments to support our theoretical results and to compare our estimate with other conventional nonparametric regression estimates, especially with MARS. The use of our estimates is illustrated through a real data analysis."

to:NB
regression
nonparametrics
neural_networks
statistics
13 hours ago by cshalizi

[1909.05229] Goodness-of-fit tests on manifolds

13 hours ago by cshalizi

"We develop a general theory for the goodness-of-fit test to non-linear models. In particular, we assume that the observations are noisy samples of a sub-manifold defined by a non-linear map of some intrinsic structures. The observation noise is additive Gaussian. Our main result shows that the "residual" of the model fit, by solving a non-linear least-square problem, follows a (possibly non-central) χ2 distribution. The parameters of the χ2 distribution are related to the model order and dimension of the problem. The main result is established by making a novel connection between statistical test and differential geometry. We further present a method to select the model orders sequentially. We demonstrate the broad application of the general theory in a range of applications in machine learning and signal processing, including determining the rank of low-rank (possibly complex-valued) matrices and tensors, from noisy, partial, or indirect observations, determining the number of sources in signal demixing, and potential applications in determining the number of hidden nodes in neural networks."
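
--- The familiar linear special case of the residual result, as a quick simulation (my own, not the paper's setup): with Gaussian noise of variance σ², the squared residual of a least-squares fit is σ²·χ² with n − d degrees of freedom, so the χ² parameters track the problem size and model order.

```python
import numpy as np

# Simulate least-squares fits and check that the normalized residual
# sum of squares averages to n - d, the chi^2 degrees of freedom.
rng = np.random.default_rng(0)
n, d, sigma = 200, 5, 0.3
rss = []
for _ in range(500):
    A = rng.standard_normal((n, d))
    y = A @ rng.standard_normal(d) + sigma * rng.standard_normal(n)
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss.append(np.sum((y - A @ w) ** 2) / sigma**2)
mean_rss = np.mean(rss)  # should be close to n - d = 195
```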

to:NB
statistics_on_manifolds
goodness-of-fit
statistics
inference_to_latent_objects
manifold_learning
13 hours ago by cshalizi

[1612.01159] On the instability and degeneracy of deep learning models

13 hours ago by cshalizi

"A probability model exhibits instability if small changes in a data outcome result in large, and often unanticipated, changes in probability. This instability is a property of the probability model, given by a distributional form and a given configuration of parameters. For correlated data structures found in several application areas, there is increasing interest in identifying such sensitivity in model probability structure. We consider the problem of quantifying instability for general probability models defined on sequences of observations, where each sequence of length N has a finite number of possible values that can be taken at each point. A sequence of probability models results, indexed by N, and an associated parameter sequence, that accommodates data of expanding dimension. Model instability is formally shown to occur when a certain log-probability ratio under such models grows faster than N. In this case, a one component change in the data sequence can shift probability by orders of magnitude. Also, as instability becomes more extreme, the resulting probability models are shown to tend to degeneracy, placing all their probability on potentially small portions of the sample space. These results on instability apply to large classes of models commonly used in random graphs, network analysis, and machine learning contexts."

to:NB
statistics
exponential_family_random_graphs
re:your_favorite_ergm_sucks
to_read
13 hours ago by cshalizi

[1909.04843] Clusters and the entropy in opinion dynamics on complex networks

13 hours ago by cshalizi

"In this work, we investigate a heterogeneous population in the modified Hegselmann-Krause opinion model on complex networks. We introduce the Shannon information entropy about all relative opinion clusters to characterize the cluster profile in the final configuration. Independent of network structures, there exists the optimal stubbornness of one subpopulation for the largest number of clusters and the highest entropy. Besides, there is the optimal bounded confidence (or subpopulation ratio) of one subpopulation for the smallest number of clusters and the lowest entropy. However, network structures affect cluster profiles indeed. A large average degree favors consensus for making different networks more similar with complete graphs. The network size has limited impact on cluster profiles of heterogeneous populations on scale-free networks but has significant effects upon those on small-world networks."
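
--- A minimal bounded-confidence sketch (my own variant, on a complete graph rather than the paper's complex networks): Hegselmann-Krause averaging with a stubbornness term anchoring agents to their initial opinions, plus the cluster-size entropy used to characterize the final configuration.

```python
import numpy as np

def hk_clusters(n=100, eps=0.15, alpha=0.0, steps=200, seed=0):
    """Modified Hegselmann-Krause on a complete graph: each agent averages
    neighbours within confidence bound eps; stubbornness alpha anchors
    agents to their initial opinions. Returns final cluster sizes
    (clusters split at opinion gaps larger than eps)."""
    rng = np.random.default_rng(seed)
    x0 = rng.uniform(0, 1, n)
    x = x0.copy()
    for _ in range(steps):
        close = np.abs(x[:, None] - x[None, :]) <= eps
        mean_close = (close * x[None, :]).sum(1) / close.sum(1)
        x = alpha * x0 + (1 - alpha) * mean_close
    xs = np.sort(x)
    breaks = np.where(np.diff(xs) > eps)[0]
    return np.diff(np.concatenate(([0], breaks + 1, [n])))

sizes = hk_clusters()
p = sizes / sizes.sum()
entropy = -(p * np.log(p)).sum()  # Shannon entropy of the cluster profile
```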

to:NB
networks
voter_model
13 hours ago by cshalizi

[1909.04791] Techniques All Classifiers Can Learn from Deep Networks: Models, Optimizations, and Regularization

13 hours ago by cshalizi

"Deep neural networks have introduced novel and useful tools to the machine learning community, and other types of classifiers can make use of these tools to improve their performance and generality. This paper reviews the current state of the art for deep learning classifier technologies that are being used outside of deep neural networks. Many components of existing deep neural network architectures can be employed by non-network classifiers. In this paper, we review the feature learning, optimization, and regularization methods that form a core of deep network technologies. We then survey non-neural network learning algorithms that make innovative use of these methods to improve classification. We conclude by discussing directions that can be pursued to expand the area of deep learning for a variety of classification algorithms."

to:NB
classifiers
machine_learning
optimization
computational_statistics
statistics
neural_networks
13 hours ago by cshalizi

[1909.05097] Spectral Non-Convex Optimization for Dimension Reduction with Hilbert-Schmidt Independence Criterion

13 hours ago by cshalizi

"The Hilbert Schmidt Independence Criterion (HSIC) is a kernel dependence measure that has applications in various aspects of machine learning. Conveniently, the objectives of different dimensionality reduction applications using HSIC often reduce to the same optimization problem. However, the nonconvexity of the objective function arising from non-linear kernels poses a serious challenge to optimization efficiency and limits the potential of HSIC-based formulations. As a result, only linear kernels have been computationally tractable in practice. This paper proposes a spectral-based optimization algorithm that extends beyond the linear kernel. The algorithm identifies a family of suitable kernels and provides the first and second-order local guarantees when a fixed point is reached. Furthermore, we propose a principled initialization strategy, thereby removing the need to repeat the algorithm at random initialization points. Compared to state-of-the-art optimization algorithms, our empirical results on real data show a run-time improvement by as much as a factor of 10^5 while consistently achieving lower cost and classification/clustering errors. The implementation source code is publicly available on this https URL."
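
--- For reference, the standard biased empirical HSIC with Gaussian kernels, tr(KHLH)/n² (a generic sketch of the dependence measure, not the paper's optimization algorithm):

```python
import numpy as np

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC with Gaussian kernels:
    HSIC = trace(K H L H) / n^2, where H = I - (1/n) 11^T centers."""
    n = len(x)
    def gram(z):
        d2 = (z[:, None] - z[None, :]) ** 2
        return np.exp(-d2 / (2 * sigma**2))
    K, L = gram(x), gram(y)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / n**2

rng = np.random.default_rng(0)
x = rng.standard_normal(300)
dep = hsic(x, np.sin(x))                  # strongly dependent pair
ind = hsic(x, rng.standard_normal(300))   # independent pair
```

The independent pair's HSIC is near zero (it vanishes at rate O(1/n)), while the dependent pair's stays well away from it.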

to:NB
kernel_methods
hilbert_space
probability
dimension_reduction
statistics
computational_statistics
spectral_methods
13 hours ago by cshalizi

[1909.05099] How to detect novelty in textual data streams? A comparative study of existing methods

13 hours ago by cshalizi

"Since datasets with annotation for novelty at the document and/or word level are not easily available, we present a simulation framework that allows us to create different textual datasets in which we control the way novelty occurs. We also present a benchmark of existing methods for novelty detection in textual data streams. We define a few tasks to solve and compare several state-of-the-art methods. The simulation framework allows us to evaluate their performances according to a set of limited scenarios and test their sensitivity to some parameters. Finally, we experiment with the same methods on different kinds of novelty in the New York Times Annotated Dataset."

to:NB
text_mining
natural_language_processing
13 hours ago by cshalizi

[1909.05207] Introduction to Online Convex Optimization

13 hours ago by cshalizi

"This manuscript portrays optimization as a process. In many practical applications the environment is so complex that it is infeasible to lay out a comprehensive theoretical model and use classical algorithmic theory and mathematical optimization. It is necessary as well as beneficial to take a robust approach, by applying an optimization method that learns as one goes along, learning from experience as more aspects of the problem are observed. This view of optimization as a process has become prominent in varied fields and has led to some spectacular success in modeling and systems that are now part of our daily lives."
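
--- The opening move of the subject in one toy example (mine, not from the manuscript): online gradient descent on a stream of convex losses with step size D/(G√t), whose regret against the best fixed point in hindsight grows only as O(√T).

```python
import math
import random

# Online gradient descent on losses f_t(x) = (x - z_t)^2 over x in [-1, 1].
random.seed(0)
T = 2000
zs = [random.uniform(-1, 1) for _ in range(T)]
x, D, G = 0.0, 2.0, 4.0  # D = diameter of the domain, G = gradient bound
loss_alg = 0.0
for t, z in enumerate(zs, start=1):
    loss_alg += (x - z) ** 2
    grad = 2 * (x - z)
    x -= (D / (G * math.sqrt(t))) * grad
    x = max(-1.0, min(1.0, x))  # project back onto [-1, 1]
best = sum(zs) / T  # best fixed comparator in hindsight
loss_best = sum((best - z) ** 2 for z in zs)
regret = loss_alg - loss_best  # bounded by O(D * G * sqrt(T))
```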

to:NB
optimization
convexity
online_learning
low-regret_learning
hazan.elad
13 hours ago by cshalizi

[1909.04019] Forecaster: A Graph Transformer for Forecasting Spatial and Time-Dependent Data

13 hours ago by cshalizi

"Spatial and time-dependent data is of interest in many applications. Forecasting such data is difficult due to its complex spatial dependency, long-range temporal dependency, data non-stationarity, and data heterogeneity. To address these challenges, we propose Forecaster, a graph Transformer architecture. Specifically, we start by learning the structure of the graph that parsimoniously represents the spatial dependency between the data at different locations. Based on the topology of the graph, we sparsify the Transformer to account for the strength of spatial dependency, long-range temporal dependency, data non-stationarity, and data heterogeneity. We evaluate Forecaster in the problem of forecasting taxi ride-hailing demand and show that our proposed architecture significantly outperforms the state-of-the-art baselines."

to:NB
spatio-temporal_statistics
prediction
statistics
13 hours ago by cshalizi

[1905.10546] Protecting the Protected Group: Circumventing Harmful Fairness

13 hours ago by cshalizi

"Machine Learning (ML) algorithms shape our lives. Banks use them to determine if we are good borrowers; IT companies delegate them recruitment decisions; police apply ML for crime-prediction, and judges base their verdicts on ML. However, real-world examples show that such automated decisions tend to discriminate against protected groups. This potential discrimination generated a huge hype both in media and in the research community. Quite a few formal notions of fairness were proposed, which take a form of constraints a "fair" algorithm must satisfy. We focus on scenarios where fairness is imposed on a self-interested party (e.g., a bank that maximizes its revenue). We find that the disadvantaged protected group can be worse off after imposing a fairness constraint. We introduce a family of \textit{Welfare-Equalizing} fairness constraints that equalize per-capita welfare of protected groups, and include \textit{Demographic Parity} and \textit{Equal Opportunity} as particular cases. In this family, we characterize conditions under which the fairness constraint helps the disadvantaged group. We also characterize the structure of the optimal \textit{Welfare-Equalizing} classifier for the self-interested party, and provide an algorithm to compute it. Overall, our \textit{Welfare-Equalizing} fairness approach provides a unified framework for discussing fairness in classification in the presence of a self-interested party."

--- From the abstract, this sounds pretty similar to Sharad Goel's paper...

to:NB
algorithmic_fairness
13 hours ago by cshalizi

[1909.04883] Learning Vector-valued Functions with Local Rademacher Complexity

14 hours ago by cshalizi

"We consider a general family of problems of which the output space admits vector-valued structure, covering a broad family of important domains, e.g. multi-label learning and multi-class classification. By using local Rademacher complexity and unlabeled data, we derived novel data-dependent excess risk bounds for vector-valued functions in both linear space and kernel space. The proposed bounds are much sharper than existing bounds and can be applied into specific vector-valued tasks in terms of different hypotheses sets and loss functions. Theoretical analysis motivates us to devise a unified learning framework for vector-valued functions which is solved by proximal gradient descent on the primal, achieving a much better tradeoff between accuracy and efficiency. Empirical results on several benchmark datasets show that the proposed algorithm outperforms compared methods significantly, which coincides with our theoretical analysis."

to:NB
learning_theory
to_read
14 hours ago by cshalizi

[1909.00835] Evolutionary reinforcement learning of dynamical large deviations

14 hours ago by cshalizi

"We show how to calculate dynamical large deviations using evolutionary reinforcement learning. An agent, a stochastic model, propagates a continuous-time Monte Carlo trajectory, and receives a reward conditioned upon the values of certain path-extensive quantities. Evolution produces progressively fitter agents, allowing the calculation of a piece of a large-deviation rate function for a particular model and path-extensive quantity. For models with small state spaces the evolutionary process acts directly on rates, and for models with large state spaces the process acts on the weights of a neural network that parameterizes the model's rates. The present approach shows how path-extensive physics problems can be considered within a framework widely used in machine learning."

to:NB
reinforcement_learning
large_deviations
computational_statistics
probability
re:fitness_sampling
14 hours ago by cshalizi

[1904.04052] Practical tests for significance in Markov Chains

16 hours ago by cshalizi

"We give qualitative and quantitative improvements to theorems which enable significance testing in Markov Chains, with a particular eye toward the goal of enabling strong, interpretable, and statistically rigorous claims of political gerrymandering. Our results can be used to demonstrate at a desired significance level that a given Markov Chain state (e.g., a districting) is extremely unusual (rather than just atypical) with respect to the fragility of its characteristics in the chain. We also provide theorems specialized to leverage quantitative improvements when there is a product structure in the underlying probability space, as can occur due to geographical constraints on districtings."

to:NB
markov_models
hypothesis_testing
anomaly_detection
to_be_shot_after_a_fair_trial
16 hours ago by cshalizi

[1809.08686] On the quenched CLT for stationary random fields under projective criteria

16 hours ago by cshalizi

"Motivated by random evolutions which do not start from equilibrium, in a recent work, Peligrad and Volný (2018) showed that the quenched CLT (central limit theorem) holds for ortho-martingale random fields. In this paper, we study the quenched CLT for a class of random fields larger than the ortho-martingales. To get the results, we impose sufficient conditions in terms of projective criteria under which the partial sums of a stationary random field admit an ortho-martingale approximation. More precisely, the sufficient conditions are of Hannan's projective type. As applications, we establish quenched CLTs for linear and nonlinear random fields with independent innovations."

to:NB
stochastic_processes
random_fields
martingales
central_limit_theorem
16 hours ago by cshalizi

[1909.04706] Regression to the Mean's Impact on the Synthetic Control Method: Bias and Sensitivity Analysis

16 hours ago by cshalizi

"To make informed policy recommendations from observational data, we must be able to discern true treatment effects from random noise and effects due to confounding. Difference-in-Difference techniques which match treated units to control units based on pre-treatment outcomes, such as the synthetic control approach, have been presented as principled methods to account for confounding. However, we show that use of synthetic controls or other matching procedures can introduce regression to the mean (RTM) bias into estimates of the average treatment effect on the treated. Through simulations, we show RTM bias can lead to inflated type I error rates as well as decreased power in typical policy evaluation settings. Further, we provide a novel correction for RTM bias which can reduce bias and attain appropriate type I error rates. This correction can be used to perform a sensitivity analysis which determines how results may be affected by RTM. We use our proposed correction and sensitivity analysis to reanalyze data concerning the effects of California's Proposition 99, a large-scale tobacco control program, on statewide smoking rates."
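
--- The RTM mechanism in miniature (a toy of my own, not the paper's synthetic-control setup): if "treated" units are those with unusually high pre-period outcomes, their post-minus-pre difference is negative even with zero true effect, because the extreme pre-period draws were partly noise.

```python
import numpy as np

# Units share stable means; selecting on extreme pre-period outcomes
# guarantees regression to the mean in the post period.
rng = np.random.default_rng(0)
n = 100000
theta = rng.standard_normal(n)          # stable unit-level mean
pre = theta + rng.standard_normal(n)    # pre-period outcome (mean + noise)
post = theta + rng.standard_normal(n)   # post-period outcome, NO treatment effect
treated = pre > np.quantile(pre, 0.9)   # "treat" the top decile of pre outcomes
naive_effect = (post[treated] - pre[treated]).mean()  # spuriously negative
```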

to:NB
causal_inference
statistics
economics
small.dylan
16 hours ago by cshalizi

[1909.04890] Aggregated Hold-Out

16 hours ago by cshalizi

"Aggregated hold-out (Agghoo) is a method which averages learning rules selected by hold-out (that is, cross-validation with a single split). We provide the first theoretical guarantees on Agghoo, ensuring that it can be used safely: Agghoo performs at worst like the hold-out when the risk is convex. The same holds true in classification with the 0-1 risk, with an additional constant factor. For the hold-out, oracle inequalities are known for bounded losses, as in binary classification. We show that similar results can be proved, under appropriate assumptions, for other risk-minimization problems. In particular, we obtain an oracle inequality for regularized kernel regression with a Lipschitz loss, without requiring that the Y variable or the regressors be bounded. Numerical experiments show that aggregation brings a significant improvement over the hold-out and that Agghoo is competitive with cross-validation."
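
--- A sketch of the scheme for ridge regression (my own toy instance; the paper's analysis covers general learning rules): for each of V random splits, select the regularization parameter by hold-out, then average the V selected estimators.

```python
import numpy as np

def agghoo_ridge(X, y, lambdas, V=5, seed=0):
    """Aggregated hold-out: for each of V random half/half splits, fit ridge
    for every lambda on the training half, pick the lambda with smallest
    hold-out risk, and average the V selected coefficient vectors."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    coefs = []
    for _ in range(V):
        idx = rng.permutation(n)
        tr, ho = idx[: n // 2], idx[n // 2 :]
        best, best_risk = None, np.inf
        for lam in lambdas:
            w = np.linalg.solve(X[tr].T @ X[tr] + lam * np.eye(d), X[tr].T @ y[tr])
            risk = np.mean((X[ho] @ w - y[ho]) ** 2)
            if risk < best_risk:
                best, best_risk = w, risk
        coefs.append(best)
    return np.mean(coefs, axis=0)

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
w_true = np.array([1.0, -2.0, 0.0, 0.5, 3.0])
y = X @ w_true + 0.1 * rng.standard_normal(200)
w_hat = agghoo_ridge(X, y, lambdas=[0.01, 0.1, 1.0, 10.0])
```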

to:NB
cross-validation
model_selection
statistics
arlot.sylvain
to_read
16 hours ago by cshalizi

[1908.04358] Graph hierarchy and spread of infections

16 hours ago by cshalizi

"As trophic levels and hence trophic coherence can be defined only on networks with well defined sources, trophic analysis of networks had been restricted to the ecological domain until now. Trophic coherence, a measure of a network's hierarchical organisation, has been shown to be linked to a network's structural and dynamical aspects. In this paper we introduce hierarchical levels, which is a generalisation of trophic levels, that can be defined on any simple graph and we interpret it as a network influence metric. We discuss how our generalisation relates to the previous definition and what new insights our generalisation sheds on the topological and dynamical aspects of networks. We also show that the mean of hierarchical differences correlates strongly with the topology of the graph. Finally, we model epidemiological dynamics and show how the statistical properties of hierarchical differences relate to the incidence rate and how it affects the spreading process in a SIS model."

to:NB
epidemics_on_networks
re:do-institutions-evolve
to_read
16 hours ago by cshalizi

[1907.09551] Cell differentiation: what have we learned in 50 years?

16 hours ago by cshalizi

"I revisit two theories of cell differentiation in multicellular organisms published a half-century ago, Stuart Kauffman's global gene regulatory dynamics (GGRD) model and Roy Britten's and Eric Davidson's modular gene regulatory network (MGRN) model, in light of newer knowledge of mechanisms of gene regulation in the metazoans (animals). The two models continue to inform hypotheses and computational studies of differentiation of lineage-adjacent cell types. However, their shared notion (based on bacterial regulatory systems) of gene switches and networks built from them, have constrained progress in understanding the dynamics and evolution of differentiation. Recent work has described unique write-read-rewrite chromatin-based expression encoding in eukaryotes, as well metazoan-specific processes of gene activation and silencing in condensed-phase, enhancer-recruiting regulatory hubs, employing disordered proteins, including transcription factors, with context-dependent identities. These findings suggest an evolutionary scenario in which the origination of differentiation in animals, rather than depending exclusively on adaptive natural selection, emerged as a consequence of a type of multicellularity in which the novel metazoan gene regulatory apparatus was readily mobilized to amplify and exaggerate inherent cell functions of unicellular ancestors. The plausibility of this hypothesis is illustrated by the evolution of the developmental role of Grainyhead-like in the formation of epithelium."

to:NB
developmental_biology
evolutionary_biology
gene_regulation
biochemical_networks
16 hours ago by cshalizi

[1909.03166] Equalizing Recourse across Groups

16 hours ago by cshalizi

"The rise in machine learning-assisted decision-making has led to concerns about the fairness of the decisions and techniques to mitigate problems of discrimination. If a negative decision is made about an individual (denying a loan, rejecting an application for housing, and so on) justice dictates that we be able to ask how we might change circumstances to get a favorable decision the next time. Moreover, the ability to change circumstances (a better education, improved credentials) should not be limited to only those with access to expensive resources. In other words, \emph{recourse} for negative decisions should be considered a desirable value that can be equalized across (demographically defined) groups. This paper describes how to build models that make accurate predictions while still ensuring that the penalties for a negative outcome do not disadvantage different groups disproportionately. We measure recourse as the distance of an individual from the decision boundary of a classifier. We then introduce a regularized objective to minimize the difference in recourse across groups. We explore linear settings and further extend recourse to non-linear settings as well as model-agnostic settings where the exact distance from boundary cannot be calculated. Our results show that we can successfully decrease the unfairness in recourse while maintaining classifier performance."

--- This seems like a very odd definition of "recourse", though Suresh is generally sensible so the last tag applies charitably.
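
--- For what it's worth, the definition in question is easy to state in code: recourse as distance to the decision boundary, averaged within each (demographically defined) group. A toy linear version of my own, to make the odd definition concrete:

```python
import numpy as np

def recourse_gap(X, groups, w, b):
    """Recourse as distance to a linear decision boundary,
    dist(x) = |w.x + b| / ||w||, for negatively classified points only.
    Returns the mean recourse per group."""
    scores = X @ w + b
    dist = np.abs(scores) / np.linalg.norm(w)
    neg = scores < 0
    return {int(g): dist[neg & (groups == g)].mean() for g in np.unique(groups)}

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 2))
groups = rng.integers(0, 2, 1000)
X[groups == 1] -= 0.5  # group 1 sits farther below the boundary on average
gaps = recourse_gap(X, groups, w=np.array([1.0, 1.0]), b=0.0)
# gaps[1] > gaps[0]: rejected members of group 1 need larger changes
```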

to:NB
to_read
algorithmic_fairness
venkatasubramanian.suresh
decision-making
to_be_shot_after_a_fair_trial
16 hours ago by cshalizi

[1908.01297] A Restricted Black-box Adversarial Framework Towards Attacking Graph Embedding Models

16 hours ago by cshalizi

"With the great success of graph embedding model on both academic and industry area, the robustness of graph embedding against adversarial attack inevitably becomes a central problem in graph learning domain. Regardless of the fruitful progress, most of the current works perform the attack in a white-box fashion: they need to access the model predictions and labels to construct their adversarial loss. However, the inaccessibility of model predictions in real systems makes the white-box attack impractical to real graph learning system. This paper promotes current frameworks in a more general and flexible sense -- we demand to attack various kinds of graph embedding model with black-box driven. To this end, we begin by investigating the theoretical connections between graph signal processing and graph embedding models in a principled way and formulate the graph embedding model as a general graph signal process with corresponding graph filter. As such, a generalized adversarial attacker: GF-Attack is constructed by the graph filter and feature matrix. Instead of accessing any knowledge of the target classifiers used in graph embedding, GF-Attack performs the attack only on the graph filter in a black-box attack fashion. To validate the generalization of GF-Attack, we construct the attacker on four popular graph embedding models. Extensive experimental results validate the effectiveness of our attacker on several benchmark datasets. Particularly by using our attack, even small graph perturbations like one-edge flip is able to consistently make a strong attack in performance to different graph embedding models."

to:NB
network_data_analysis
adversarial_examples
16 hours ago by cshalizi

[1904.08554] Gotta Catch 'Em All: Using Concealed Trapdoors to Detect Adversarial Attacks on Neural Networks

16 hours ago by cshalizi

"Deep neural networks are vulnerable to adversarial attacks. Numerous efforts have focused on defenses that either try to patch `holes' in trained models or try to make it difficult or costly to compute adversarial examples exploiting these holes. In our work, we explore a counter-intuitive approach of constructing "adversarial trapdoors." Unlike prior works that try to patch or disguise vulnerable points in the manifold, we intentionally inject `trapdoors,' artificial weaknesses in the manifold that attract optimized perturbation into certain pre-embedded local optima. As a result, the adversarial generation functions naturally gravitate towards our trapdoors, producing adversarial examples that the model owner can recognize through a known neuron activation signature. In this paper, we introduce trapdoors and describe an implementation of trapdoors using similar strategies to backdoor/Trojan attacks. We show that by proactively injecting trapdoors into the models (and extracting their neuron activation signature), we can detect adversarial examples generated by the state of the art attacks (Projected Gradient Descent, Optimization based CW, and Elastic Net) with high detection success rate and negligible impact on normal inputs. These results also generalize across multiple classification domains (image recognition, face recognition and traffic sign recognition). We explore different properties of trapdoors, and discuss potential countermeasures (adaptive attacks) and mitigations."

to:NB
adversarial_examples
16 hours ago by cshalizi

[1907.11932] Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment

16 hours ago by cshalizi

"Machine learning algorithms are often vulnerable to adversarial examples that have imperceptible alterations from the original counterparts but can fool the state-of-the-art models. It is helpful to evaluate or even improve the robustness of these models by exposing the maliciously crafted adversarial examples. In this paper, we present TextFooler, a simple but strong baseline to generate natural adversarial text. By applying it to two fundamental natural language tasks, text classification and textual entailment, we successfully attacked three target models, including the powerful pre-trained BERT, and the widely used convolutional and recurrent neural networks. We demonstrate the advantages of this framework in three ways: (1) effective---it outperforms state-of-the-art attacks in terms of success rate and perturbation rate, (2) utility-preserving---it preserves semantic content and grammaticality, and remains correctly classified by humans, and (3) efficient---it generates adversarial text with computational complexity linear to the text length."

to:NB
adversarial_examples
text_mining
16 hours ago by cshalizi

[1905.07360] Contrastive Fairness in Machine Learning

16 hours ago by cshalizi

"Was it fair that Harry was hired but not Barry? Was it fair that Pam was fired instead of Sam? How can one ensure fairness when an intelligent algorithm takes these decisions instead of a human? How can one ensure that the decisions were taken based on merit and not on protected attributes like race or sex? These are the questions that must be answered now that many decisions in real life can be made through machine learning. However research in fairness of algorithms has focused on the counterfactual questions "what if?" or "why?", whereas in real life most subjective questions of consequence are contrastive: "why this but not that?". We introduce concepts and mathematical tools using causal inference to address contrastive fairness in algorithmic decision-making with illustrative examples."

to:NB
causal_inference
algorithmic_fairness
statistics
16 hours ago by cshalizi

[1906.00232] Kernel Instrumental Variable Regression

16 hours ago by cshalizi

"Instrumental variable regression is a strategy for learning causal relationships in observational data. If measurements of input X and output Y are confounded, the causal relationship can nonetheless be identified if an instrumental variable Z is available that influences X directly, but is conditionally independent of Y given X and the unmeasured confounder. The classic two-stage least squares algorithm (2SLS) simplifies the estimation problem by modeling all relationships as linear functions. We propose kernel instrumental variable regression (KIV), a nonparametric generalization of 2SLS, modeling relations among X, Y, and Z as nonlinear functions in reproducing kernel Hilbert spaces (RKHSs). We prove the consistency of KIV under mild assumptions, and derive conditions under which the convergence rate achieves the minimax optimal rate for unconfounded, one-stage RKHS regression. In doing so, we obtain an efficient ratio between training sample sizes used in the algorithm's first and second stages. In experiments, KIV outperforms state of the art alternatives for nonparametric instrumental variable regression. Of independent interest, we provide a more general theory of conditional mean embedding regression in which the RKHS has infinite dimension."
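--- The 2SLS baseline that KIV generalizes fits in a few lines of numpy. A toy illustration (my construction, not the paper's code): a single confounder U drives both X and Y, while the instrument Z shifts X only.

```python
import numpy as np

def two_stage_least_squares(X, Y, Z):
    """Classic 2SLS with 1-D X, Y, Z: regress X on the instrument Z,
    then regress Y on the fitted values; both stages include intercepts."""
    n = len(Y)
    Z1 = np.column_stack([np.ones(n), Z])
    X_hat = Z1 @ np.linalg.lstsq(Z1, X, rcond=None)[0]   # stage 1: fitted X
    X1 = np.column_stack([np.ones(n), X_hat])
    beta = np.linalg.lstsq(X1, Y, rcond=None)[0]         # stage 2: Y on fitted X
    return beta[1]

# Toy confounded data: U confounds X and Y; Z is a valid instrument
rng = np.random.default_rng(0)
n = 20000
U = rng.normal(size=n)
Z = rng.normal(size=n)
X = Z + U + rng.normal(size=n)
Y = 2.0 * X + 3.0 * U + rng.normal(size=n)   # true causal coefficient: 2

naive = np.polyfit(X, Y, 1)[0]               # OLS slope, biased upward by U
iv = two_stage_least_squares(X, Y, Z)
```

The OLS slope absorbs the confounding through U and overshoots the true coefficient of 2; the two-stage estimate does not. KIV replaces both linear stages with RKHS regressions.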

to:NB
instrumental_variables
kernel_estimators
regression
nonparametrics
causal_inference
statistics
re:ADAfaEPoV
to_read
16 hours ago by cshalizi

[1909.03217] Community detection in inhomogeneous random graphs

16 hours ago by cshalizi

"We study the problem of detecting whether an inhomogeneous random graph contains a planted community. Specifically, we observe a single realization of a graph. Under the null hypothesis, this graph is a sample from an inhomogeneous random graph, whereas under the alternative, there exists a small subgraph where the edge probabilities are increased by a multiplicative scaling factor. We present a scan test that is able to detect the presence of such a planted community, even when this community is very small and the underlying graph is inhomogeneous. We also derive an information theoretic lower bound for this problem which shows that in some regimes the scan test is almost asymptotically optimal. We illustrate our results through examples and numerical experiments."

to:NB
community_discovery
network_data_analysis
statistics
16 hours ago by cshalizi

[1909.03302] On the Optimality of Gaussian Kernel Based Nonparametric Tests against Smooth Alternatives

16 hours ago by cshalizi

"Nonparametric tests via kernel embedding of distributions have witnessed a great deal of practical successes in recent years. However, statistical properties of these tests are largely unknown beyond consistency against a fixed alternative. To fill in this void, we study here the asymptotic properties of goodness-of-fit, homogeneity and independence tests using Gaussian kernels, arguably the most popular and successful among such tests. Our results provide theoretical justifications for this common practice by showing that tests using Gaussian kernel with an appropriately chosen scaling parameter are minimax optimal against smooth alternatives in all three settings. In addition, our analysis also pinpoints the importance of choosing a diverging scaling parameter when using Gaussian kernels and suggests a data-driven choice of the scaling parameter that yields tests optimal, up to an iterated logarithmic factor, over a wide range of smooth alternatives. Numerical experiments are also presented to further demonstrate the practical merits of the methodology."

to:NB
kernel_methods
hilbert_space
goodness-of-fit
hypothesis_testing
statistics
16 hours ago by cshalizi

[1902.02408] Weak consistency of the 1-nearest neighbor measure with applications to missing data

16 hours ago by cshalizi

"When data is partially missing at random, imputation and importance weighting are often used to estimate moments of the unobserved population. In this paper, we study 1-nearest neighbor (1NN) importance weighting, which estimates moments by replacing missing data with the complete data that is the nearest neighbor in the non-missing covariate space. We define an empirical measure, the 1NN measure, and show that it is weakly consistent for the measure of the missing data. The main idea behind this result is that the 1NN measure is performing inverse probability weighting in the limit. We study applications to missing data and mitigating the impact of covariate shift in prediction tasks."
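--- The 1NN importance-weighting idea is easy to sketch. A toy version (my construction, not the paper's code) with a scalar covariate and Y missing at random given X:

```python
import numpy as np

def one_nn_mean(X, Y, observed):
    """Estimate E[Y] under MAR-given-X missingness: fill each missing Y
    with the Y of the nearest complete case in covariate space, then average."""
    Xo, Yo = X[observed], Y[observed]
    filled = Y.copy()
    for i in np.where(~observed)[0]:
        j = np.argmin(np.abs(Xo - X[i]))   # nearest complete case (1-D covariate)
        filled[i] = Yo[j]
    return filled.mean()

rng = np.random.default_rng(1)
n = 5000
X = rng.uniform(0, 1, n)
Y = X + rng.normal(scale=0.1, size=n)              # population mean E[Y] = 0.5
observed = rng.uniform(size=n) < (0.2 + 0.6 * X)   # missingness depends on X only

naive = Y[observed].mean()         # complete-case mean, biased upward
est = one_nn_mean(X, Y, observed)  # close to 0.5
```

The complete-case mean is biased because the observation probability increases with X; filling each missing Y with its nearest complete-case neighbor recovers the population mean, which is the inverse-probability-weighting-in-the-limit behaviour the paper proves.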

to:NB
missing_data
density_estimation
statistics
sharpnack.james
re:ADAfaEPoV
16 hours ago by cshalizi

[1509.02556] Identification, Doubly Robust Estimation, and Semiparametric Efficiency Theory of Nonignorable Missing Data With a Shadow Variable

16 hours ago by cshalizi

"We consider identification and estimation with an outcome missing not at random (MNAR). We study an identification strategy based on a so-called shadow variable. A shadow variable is assumed to be correlated with the outcome, but independent of the missingness process conditional on the outcome and fully observed covariates. We describe a general condition for nonparametric identification of the full data law under MNAR using a valid shadow variable. Our condition is satisfied by many commonly-used models; moreover, it is imposed on the complete cases, and therefore has testable implications with observed data only. We describe semiparametric estimation methods and evaluate their performance on both simulation data and a real data example. We characterize the semiparametric efficiency bound for the class of regular and asymptotically linear estimators, and derive a closed form for the efficient influence function."

to:NB
statistics
missing_data
16 hours ago by cshalizi

On Whorfian Socioeconomics by Thomas B. Pepinsky :: SSRN

16 hours ago by cshalizi

"Whorfian socioeconomics is an emerging interdisciplinary field of study that holds that linguistic structures explain differences in beliefs, values, and opinions across communities. Its core empirical strategy is to document a correlation between the presence or absence of a linguistic feature in a survey respondent’s language, and her/his responses to survey questions. This essay demonstrates — using the universe of linguistic features from the World Atlas of Language Structures and a wide array of responses from the World Values Survey — that such an approach produces highly statistically significant correlations in a majority of analyses, irrespective of the theoretical plausibility linking linguistic features to respondent beliefs. These results raise the possibility that correlations between linguistic features and survey responses are actually spurious. The essay concludes by showing how two simple and well-understood statistical fixes can more accurately reflect uncertainty in these analyses, reducing the temptation for analysts to create implausible Whorfian theories to explain spurious linguistic correlations."

to:NB
to_read
linguistics
economics
social_science_methodology
pepinsky.thomas_b.
debunking
16 hours ago by cshalizi

[1909.03968] Tree-based Control Methods: Consequences of Moving the US Embassy

16 hours ago by cshalizi

"We recast the synthetic controls for evaluating policies as a counterfactual prediction problem and replace its linear regression with a non-parametric model inspired by machine learning. The proposed method enables us to achieve more accurate counterfactual predictions. We apply our method to a highly-debated policy: the movement of the US embassy to Jerusalem. In Israel and Palestine, we find that the average number of weekly conflicts has increased by roughly 103 % over 48 weeks since the movement was announced on December 6, 2017. Using conformal inference tests, we justify our model and find the increase to be statistically significant."

--- I am very skeptical of the application, but interested in the methodology.

to:NB
causal_inference
statistics
economics
nonparametrics
regression
re:ADAfaEPoV
16 hours ago by cshalizi

[1909.03238] Linear response and moderate deviations: hierarchical approach. V

16 hours ago by cshalizi

"The Moderate Deviations Principle (MDP) is well-understood for sums of independent random variables, worse understood for stationary random sequences, and scantily understood for random fields. Here it is established for some planar random fields of the form Xt=ψ(Gt) obtained from a Gaussian random field Gt via a function ψ, and consequently, for zeroes of the Gaussian Entire Function."

to:NB
central_limit_theorem
random_fields
stochastic_processes
16 hours ago by cshalizi

[1909.03320] Matrix Calculations for Moments of Markov Processes

16 hours ago by cshalizi

"Matryoshka dolls, the traditional Russian nesting figurines, are known world-wide for each doll's encapsulation of a sequence of smaller dolls. In this paper, we identify a large class of Markov processes whose moments are easy to compute by exploiting the structure of a new sequence of nested matrices we call Matryoshkhan matrices. We characterize the salient properties of Matryoshkhan matrices that allow us to compute these moments in closed form at a specific time without computing the entire path of the process. This speeds up the computation of the Markov process moments significantly in comparison to traditional differential equation methods, which we demonstrate through numerical experiments. Through our method, we derive explicit expressions for both transient and steady-state moments of this class of Markov processes. We demonstrate the applicability of this method through explicit examples such as shot-noise processes, growth-collapse processes, linear birth-death-immigration processes, and affine stochastic differential equations from the finance literature. We also show that we can derive explicit expressions for the self-exciting Hawkes process, for which finding closed form moment expressions has been an open problem since its introduction."

to:NB
markov_models
stochastic_processes
computational_statistics
16 hours ago by cshalizi

[1909.03479] Maximum principle for stochastic recursive optimal control problem under model uncertainty

16 hours ago by cshalizi

"In this paper, we consider a stochastic recursive optimal control problem under model uncertainty. In this framework, the cost function is described by solutions of a family of backward stochastic differential equations. With the help of the linearization techniques and weak convergence methods, we derive the corresponding stochastic maximum principle. Moreover, a linear quadratic robust control problem is also studied."

to:NB
control_theory_and_control_engineering
stochastic_processes
16 hours ago by cshalizi

[1909.03805] Large Time Behaviour and the Second Eigenvalue Problem for Finite State Mean-Field Interacting Particle Systems

16 hours ago by cshalizi

"This article examines large time behaviour and the second eigenvalue problem for Markovian mean-field interacting particle systems with jumps. Our first main result is on the time required for convergence of the empirical measure process of the particle system to its invariant measure; we show that there is a constant Λ≥0 such that, when there are N particles in the system and when time is of the order exp{N(Λ+O(1))}, the process has mixed well and is very close to its invariant measure. We then obtain large-N asymptotics of the second eigenvalue of the generator associated with the empirical measure process when it is reversible with respect to its invariant measure. We show that its absolute value scales like exp{−NΛ}. The main tools used in establishing our results are the large deviation properties of the empirical measure process from its large-N limit. As an application of the study of large time behaviour, we also show convergence of the empirical measure of the system of particles to a global minimum of a certain entropy function when particles are added over time in a controlled fashion. The controlled addition of particles is analogous to the cooling schedule associated with the search for a global minimum of a function using the simulated annealing algorithm."

to:NB
interacting_particle_systems
markov_models
stochastic_processes
large_deviations
re:do-institutions-evolve
16 hours ago by cshalizi

[1811.06448] From weakly interacting particles to a regularised Dean--Kawasaki model

16 hours ago by cshalizi

"The evolution of finitely many particles obeying Langevin dynamics is described by Dean-Kawasaki equations, a class of stochastic equations featuring a non-Lipschitz multiplicative noise in divergence form. We derive a regularised Dean-Kawasaki model based on second order Langevin dynamics by analysing a system of particles interacting via a pairwise potential. Key tools of our analysis are the propagation of chaos and Simon's compactness criterion. The model we obtain is a small-noise stochastic perturbation of the undamped McKean-Vlasov equation. We also provide a high-probability result for existence and uniqueness for our model."

to:NB
interacting_particle_systems
stochastic_processes
16 hours ago by cshalizi

[1909.03550] Lecture Notes: Optimization for Machine Learning

16 hours ago by cshalizi

"Lecture notes on optimization for machine learning, derived from a course at Princeton University and tutorials given in MLSS, Buenos Aires, as well as Simons Foundation, Berkeley."

to:NB
optimization
learning_theory
hazan.elad
16 hours ago by cshalizi

[1909.03586] Curve Fitting from Probabilistic Emissions and Applications to Dynamic Item Response Theory

16 hours ago by cshalizi

"Item response theory (IRT) models are widely used in psychometrics and educational measurement, being deployed in many high stakes tests such as the GRE aptitude test. IRT has largely focused on estimation of a single latent trait (e.g. ability) that remains static through the collection of item responses. However, in contemporary settings where item responses are being continuously collected, such as Massive Open Online Courses (MOOCs), interest will naturally be on the dynamics of ability, thus complicating usage of traditional IRT models. We propose DynAEsti, an augmentation of the traditional IRT Expectation Maximization algorithm that allows ability to be a continuously varying curve over time. In the process, we develop CurvFiFE, a novel non-parametric continuous-time technique that handles the curve-fitting/regression problem extended to address more general probabilistic emissions (as opposed to simply noisy data points). Furthermore, to accomplish this, we develop a novel technique called grafting, which can successfully approximate distributions represented by graphical models when other popular techniques like Loopy Belief Propagation (LBP) and Variational Inference (VI) fail. The performance of DynAEsti is evaluated through simulation, where we achieve results comparable to the optimal of what is observed in the static ability scenario. Finally, DynAEsti is applied to a longitudinal performance dataset (80-years of competitive golf at the 18-hole Masters Tournament) to demonstrate its ability to recover key properties of human performance and the heterogeneous characteristics of the different holes. Python code for CurvFiFE and DynAEsti is publicly available at this http URL. This is the full version of our ICDM 2019 paper."

to:NB
inference_to_latent_objects
graphical_models
statistics
psychometrics
non-stationarity
16 hours ago by cshalizi

[1909.03681] Outlier Detection in High Dimensional Data

16 hours ago by cshalizi

"High-dimensional data poses unique challenges in the outlier detection process. Most of the existing algorithms fail to properly address the issues stemming from a large number of features. In particular, outlier detection algorithms perform poorly on data sets of small size with a large number of features. In this paper, we propose a novel outlier detection algorithm based on principal component analysis and kernel density estimation. The proposed method is designed to address the challenges of dealing with high-dimensional data by projecting the original data onto a smaller space and using the innate structure of the data to calculate anomaly scores for each data point. Numerical experiments on synthetic and real-life data show that our method performs well on high-dimensional data. In particular, the proposed method outperforms the benchmark methods as measured by the F1-score. Our method also produces better-than-average execution times compared to the benchmark methods."

--- Seems OK but ad hoc. Might make a decent extension to the eigendresses assignment for data mining.
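--- A minimal sketch of the PCA-plus-KDE idea (my construction; the paper's actual scoring rule may differ): project onto a few principal components, then flag points of low kernel density in the projected space.

```python
import numpy as np

def pca_kde_scores(X, n_components=2, bandwidth=0.5):
    """Anomaly score = negative log of a leave-one-out Gaussian kernel
    density, computed on the projection onto the top principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # principal directions
    P = Xc @ Vt[:n_components].T                       # projected data
    d2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-d2 / (2.0 * bandwidth ** 2))
    np.fill_diagonal(K, 0.0)                           # leave self out
    return -np.log(K.sum(axis=1) + 1e-12)              # low density -> high score

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 20))   # 200 points in 20 dimensions
X[0] += 10.0                     # plant one gross outlier
scores = pca_kde_scores(X)       # the planted point gets the top score
```

The planted point dominates the leading principal direction, lands far from everyone else in the projection, and so gets near-zero density there.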

to:NB
anomaly_detection
density_estimation
principal_components
high-dimensional_statistics
statistics
to_teach:data-mining
16 hours ago by cshalizi

[0804.4451] Dependence Structure Estimation via Copula

16 hours ago by cshalizi

"Dependence structure estimation is one of the important problems in the machine learning domain and has many applications in different scientific areas. In this paper, a theoretical framework for such estimation based on copula and copula entropy -- the probabilistic theory of representation and measurement of statistical dependence -- is proposed. Graphical models are considered as a special case of the copula framework. A method of the framework for estimating the maximum spanning copula is proposed. Due to copula, the method is irrelevant to the properties of individual variables, insensitive to outliers and able to deal with non-Gaussianity. Experiments on both simulated data and a real dataset demonstrated the effectiveness of the proposed method."

--- How does this differ from the "nonparanormal" models we know and love around here?

to:NB
causal_discovery
copulas
statistics
graphical_models
to_be_shot_after_a_fair_trial
16 hours ago by cshalizi

[1908.09946] An empirical comparison between stochastic and deterministic centroid initialisation for K-Means variations

16 hours ago by cshalizi

"K-Means is one of the most used algorithms for data clustering and the usual clustering method for benchmarking. Despite its wide application it is well-known that it suffers from a series of disadvantages, such as the positions of the initial clustering centres (centroids), which can greatly affect the clustering solution. Over the years many K-Means variations and initialisation techniques have been proposed with different degrees of complexity. In this study we focus on common K-Means variations and deterministic initialisation techniques and we first show that more sophisticated initialisation methods reduce or alleviate the need for complex K-Means clustering, and secondly, that deterministic methods can achieve equivalent or better performance than stochastic methods. These conclusions are obtained through extensive benchmarking using different model data sets from various studies as well as clustering data sets."

to:NB
clustering
k-means
data_mining
to_teach:data-mining
16 hours ago by cshalizi

[1804.09220] A Useful Version of the Central Limit Theorem for a General Class of Markov Chains

2 days ago by cshalizi

"In the paper we propose certain conditions, relatively easy to verify, which ensure the central limit theorem for some general class of Markov chains. To justify the usefulness of our criterion, we further verify it for a particular discrete-time Markov dynamical system. From the application point of view, the examined system provides a useful tool in analysing the stochastic dynamics of gene expression in prokaryotes."

to:NB
central_limit_theorem
markov_models
stochastic_processes
2 days ago by cshalizi

[1404.2035] Strongly continuous and locally equi-continuous semigroups on locally convex spaces

2 days ago by cshalizi

"We consider locally equi-continuous strongly continuous semigroups on locally convex spaces (X,tau). First, we show that if (X,tau) has the property that weak* compact sets of the dual are equi-continuous, then strong continuity of the semigroup is equivalent to weak continuity and local equi-continuity.

"Second, we consider locally convex spaces (X,tau) that are also equipped with a `suitable' auxiliary norm. We introduce the set N of tau continuous semi-norms that are bounded by the norm. If (X,tau) has the property that N is closed under countable convex combinations, then a number of Banach space results can be generalised in a straightforward way. Importantly, we extend the Hille-Yosida theorem.

"We apply the results to the study of transition semigroups of Markov processes on complete separable metric spaces."

to:NB
algebra
markov_models
stochastic_processes
re:almost_none
2 days ago by cshalizi

[1412.8674] Infinite-dimensional stochastic differential equations and tail $ σ$-fields

2 days ago by cshalizi

"We present general theorems solving the long-standing problem of the existence and pathwise uniqueness of strong solutions of infinite-dimensional stochastic differential equations (ISDEs) called interacting Brownian motions. These ISDEs describe the dynamics of infinite-many Brownian particles moving in ℝd with free potential Φ and mutual interaction potential Ψ.

"We apply the theorems to essentially all interaction potentials of Ruelle's class such as the Lennard-Jones 6-12 potential and Riesz potentials, and to logarithmic potentials appearing in random matrix theory. We solve ISDEs of the Ginibre interacting Brownian motion and the sineβ interacting Brownian motion with β=1,2,4. We also use the theorems in separate papers for the Airy and Bessel interacting Brownian motions. One of the critical points for proving the general theorems is to establish a new formulation of solutions of ISDEs in terms of tail σ-fields of labeled path spaces consisting of trajectories of infinitely many particles. These formulations are equivalent to the original notions of solutions of ISDEs, and more feasible to treat in infinite dimensions."

to:NB
stochastic_processes
stochastic_differential_equations
interacting_particle_systems
re:almost_none
2 days ago by cshalizi

[1909.05794] Stationary distributions of continuous-time Markov chains: a review of theory and truncation-based approximations

2 days ago by cshalizi

"Computing the stationary distributions of a continuous-time Markov chain involves solving a set of linear equations. In most cases of interest, the number of equations is infinite or too large, and cannot be solved analytically or numerically. Several approximation schemes overcome this issue by truncating the state space to a manageable size. In this review, we first give a comprehensive theoretical account of the stationary distributions and their relation to the long-term behaviour of the Markov chain, which is readily accessible to non-experts and free of irreducibility assumptions made in standard texts. We then review truncation-based approximation schemes paying particular attention to their convergence and to the errors they introduce, and we illustrate their performance with an example of a stochastic reaction network of relevance in biology and chemistry. We conclude by elaborating on computational trade-offs associated with error control and some open questions."
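--- The simplest truncation scheme is easy to illustrate: chop the state space at N, solve πQ = 0 with the normalisation constraint, and compare to a case with a known answer. A toy example with an M/M/1 queue (my construction, not from the review; the truncation here just drops transitions out of {0,…,N}):

```python
import numpy as np

def truncated_stationary(lam, mu, N):
    """Stationary law of an M/M/1 queue with its generator truncated to
    states 0..N: solve pi Q = 0 together with sum(pi) = 1."""
    Q = np.zeros((N + 1, N + 1))
    for i in range(N + 1):
        if i < N:
            Q[i, i + 1] = lam          # arrival
        if i > 0:
            Q[i, i - 1] = mu           # service completion
        Q[i, i] = -Q[i].sum()          # generator rows sum to zero
    A = np.vstack([Q.T, np.ones(N + 1)])   # pi Q = 0 stacked with normalisation
    b = np.zeros(N + 2)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

lam, mu = 1.0, 2.0
pi = truncated_stationary(lam, mu, 50)
exact = (1 - lam / mu) * (lam / mu) ** np.arange(51)   # untruncated M/M/1 law
```

With lam/mu = 1/2 the mass beyond state 50 is of order 2^-51, so the truncated solution matches the geometric stationary law essentially to machine precision; the review's subject is what happens when the error control is less obvious than this.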

to:NB
markov_models
stochastic_processes
re:almost_none
approximation
2 days ago by cshalizi

[1909.05418] The Global Markov Property for a Mixture of DAGs

2 days ago by cshalizi

"Real causal processes may contain feedback loops and change over time. In this paper, we model cycles and non-stationary distributions using a mixture of directed acyclic graphs (DAGs). We then study the conditional independence (CI) relations induced by a density that factorizes according to a mixture of DAGs in two steps. First, we generalize d-separation for a single DAG to mixture d-separation for a mixture of DAGs. We then utilize the mixture d-separation criterion to derive a global Markov property that allows us to read off the CI relations induced by a mixture of DAGs using a particular summary graph. This result has potentially far reaching applications in algorithm design for causal discovery."

to:NB
causal_discovery
graphical_models
statistics
to_be_shot_after_a_fair_trial
2 days ago by cshalizi

[1909.05495] Optimal choice of $k$ for $k$-nearest neighbor regression

2 days ago by cshalizi

"The k-nearest neighbor algorithm (k-NN) is a widely used non-parametric method for classification and regression. We study the mean squared error of the k-NN estimator when k is chosen by leave-one-out cross-validation (LOOCV). Although it was known that this choice of k is asymptotically consistent, it was not previously known that it is optimal. We show, with high probability, the mean squared error of this estimator is close to the minimum mean squared error using the k-NN estimate, where the minimum is over all choices of k."

--- Looks legit on first pass (and we know that LOOCV is generally _predictively_ good).
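--- The procedure being analysed is easy to sketch (a toy 1-D version, my construction): compute the leave-one-out squared error of the k-NN regressor for each k and take the minimiser.

```python
import numpy as np

def knn_predict(x0, X, Y, k):
    """k-NN regression: average the Y's of the k nearest training points."""
    idx = np.argsort(np.abs(X - x0))[:k]
    return Y[idx].mean()

def loocv_mse(X, Y, k):
    """Leave-one-out mean squared error of the k-NN regressor."""
    n = len(X)
    errs = [(Y[i] - knn_predict(X[i], np.delete(X, i), np.delete(Y, i), k)) ** 2
            for i in range(n)]
    return float(np.mean(errs))

rng = np.random.default_rng(3)
n = 200
X = rng.uniform(0, 1, n)
Y = np.sin(2 * np.pi * X) + rng.normal(scale=0.3, size=n)

# LOOCV-chosen k; the paper's result says its MSE is near the best over all k
k_star = min(range(1, 30), key=lambda k: loocv_mse(X, Y, k))
```

With this much noise, k = 1 is dominated (its LOOCV error is roughly twice the noise variance), so the selected k is larger; the paper's contribution is showing this selection is near-optimal with high probability, not just consistent.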

to:NB
regression
nearest_neighbors
statistics
cross-validation
to_teach:data-mining
have_skimmed
2 days ago by cshalizi

[1909.05570] Sharp Large Deviations for empirical correlation coefficients

2 days ago by cshalizi

"We study Sharp Large Deviations for Pearson's empirical correlation coefficients in the Spherical and Gaussian cases."

--- Perhaps a problem set for _Almost None_?

to:NB
large_deviations
statistics
re:almost_none
2 days ago by cshalizi

[1909.05582] A taxonomy of estimator consistency on discrete estimation problems

2 days ago by cshalizi

"We describe a four-level hierarchy mapping both all discrete estimation problems and all estimators on these problems, such that the hierarchy describes each estimator's consistency guarantees on each problem class. We show that no estimator is consistent for all estimation problems, but that some estimators, such as Maximum A Posteriori, are consistent for the widest possible class of discrete estimation problems. For Maximum Likelihood and Approximate Maximum Likelihood estimators we show that they do not provide consistency on as wide a class, but define a sub-class of problems characterised by their consistency. Lastly, we show that some popular estimators, specifically Strict Minimum Message Length, do not provide consistency guarantees even within the sub-class."

to:NB
statistics
estimation
model_selection
2 days ago by cshalizi

Dolera , Regazzini : Uniform rates of the Glivenko–Cantelli convergence and their use in approximating Bayesian inferences

2 days ago by cshalizi

"This paper deals with suitable quantifications in approximating a probability measure by an "empirical" random probability measure p̂_n, depending on the first n terms of a sequence {ξ̃_i}_{i≥1} of random elements. Section 2 studies the range of oscillation near zero of the Wasserstein distance d^(p)_[S](p_0, p̂_n) between p_0 and p̂_n, assuming the ξ̃_i's i.i.d. from p_0. In Theorem 2.1, p_0 can be fixed in the space of all probability measures on (ℝ^d, ℬ(ℝ^d)) and p̂_n coincides with the empirical measure ẽ_n := (1/n) ∑_{i=1}^n δ_{ξ̃_i}. In Theorem 2.2 (Theorem 2.3, respectively), p_0 is a d-dimensional Gaussian distribution (an element of a distinguished statistical exponential family, respectively) and p̂_n is another d-dimensional Gaussian distribution with estimated mean and covariance matrix (another element of the same family with an estimated parameter, respectively). These new results improve on allied recent works by also providing uniform bounds with respect to n, meaning the finiteness of the p-moment of sup_{n≥1} b_n d^(p)_[S](p_0, p̂_n) is proved for some diverging sequence b_n of positive numbers. In Section 3, assuming the ξ̃_i's exchangeable, one studies the range of oscillation near zero of the Wasserstein distance between the conditional distribution (also called posterior) of the directing measure of the sequence, given ξ̃_1, …, ξ̃_n, and the point mass at p̂_n. Similarly, a bound for the approximation of predictive distributions is given. Finally, Theorems 3.3 to 3.5 reconsider Theorems 2.1 to 2.3, respectively, according to a Bayesian perspective."

to:NB
empirical_processes
statistics
re:fitness_sampling
2 days ago by cshalizi

[1909.05299] Counterfactual Cross-Validation: Effective Causal Model Selection from Observational Data

2 days ago by cshalizi

"What is the most effective way to select the best causal model among potential candidates? In this paper, we propose a method to effectively select the best individual-level treatment effect (ITE) predictors from a set of candidates using only an observational validation set. In model selection or hyperparameter tuning, we are interested in choosing the best model or the value of hyperparameter from potential candidates. Thus, we focus on accurately preserving the rank order of the ITE prediction performance of candidate causal models. The proposed evaluation metric is theoretically proved to preserve the true ranking of the model performance in expectation and to minimize the upper bound of the finite sample uncertainty in model selection. Consistent with the theoretical result, empirical experiments demonstrate that our proposed method is more likely to select the best model and set of hyperparameter in both model selection and hyperparameter tuning."

--- Their goal is a good one, but the fundamental issue is that we don't _have_ observations of individual-level causal effects to cross-validate against. So they proxy that by a very standard doubly-robust estimator of said effects; at which point, why not just use that estimator? In any case, I want to see comparisons to the Naive Statistician's approach of just cross-validating for outcomes (rather than differences in potential outcomes).
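--- For concreteness, the generic doubly-robust pseudo-outcome recipe for scoring an ITE predictor looks something like this (my sketch of the standard AIPW-style construction, not necessarily the paper's exact metric; the nuisance estimates mu0, mu1, e are assumed given):

```python
# Score ITE predictors on an observational validation set by MSE against
# a doubly-robust (AIPW-style) pseudo-outcome, whose conditional mean is
# the true ITE when the nuisances are right.
import numpy as np

def dr_pseudo_outcome(y, t, mu0, mu1, e):
    # transformed outcome: E[psi | x] = mu1(x) - mu0(x) = ITE(x)
    return (mu1 - mu0
            + t * (y - mu1) / e
            - (1 - t) * (y - mu0) / (1 - e))

def dr_score(tau_hat, y, t, mu0, mu1, e):
    # lower is better
    return np.mean((dr_pseudo_outcome(y, t, mu0, mu1, e) - tau_hat) ** 2)

# toy check with known nuisances: true ITE is tau(x) = x
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 5_000)
e = np.full_like(x, 0.5)
t = rng.binomial(1, e)
y = x * t + rng.standard_normal(5_000)              # mu0 = 0, mu1 = x
good = dr_score(x,  y, t, np.zeros_like(x), x, e)   # predictor tau = x
bad  = dr_score(-x, y, t, np.zeros_like(x), x, e)   # predictor tau = -x
```

The ranking (good predictor beats bad predictor) is what the metric is supposed to preserve; with the nuisances themselves estimated, the DR correction is doing the real work.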

to:NB
cross-validation
causal_inference
statistics
model_selection
have_skimmed
shot_after_a_fair_trial

2 days ago by cshalizi

Capitalism, Alone — Branko Milanovic | Harvard University Press

3 days ago by cshalizi

"We are all capitalists now. For the first time in human history, the globe is dominated by one economic system. In Capitalism, Alone, leading economist Branko Milanovic explains the reasons for this decisive historical shift since the days of feudalism and, later, communism. Surveying the varieties of capitalism, he asks: What are the prospects for a fairer world now that capitalism is the only game in town? His conclusions are sobering, but not fatalistic. Capitalism gets much wrong, but also much right—and it is not going anywhere. Our task is to improve it.

"Milanovic argues that capitalism has triumphed because it works. It delivers prosperity and gratifies human desires for autonomy. But it comes with a moral price, pushing us to treat material success as the ultimate goal. And it offers no guarantee of stability. In the West, liberal capitalism creaks under the strains of inequality and capitalist excess. That model now fights for hearts and minds with political capitalism, exemplified by China, which many claim is more efficient, but which is more vulnerable to corruption and, when growth is slow, social unrest. As for the economic problems of the Global South, Milanovic offers a creative, if controversial, plan for large-scale migration. Looking to the future, he dismisses prophets who proclaim some single outcome to be inevitable, whether worldwide prosperity or robot-driven mass unemployment. Capitalism is a risky system. But it is a human system. Our choices, and how clearly we see them, will determine how it serves us."

to:NB
books:noted
milanovic.branko
economics
political_economy
"Milanovic argues that capitalism has triumphed because it works. It delivers prosperity and gratifies human desires for autonomy. But it comes with a moral price, pushing us to treat material success as the ultimate goal. And it offers no guarantee of stability. In the West, liberal capitalism creaks under the strains of inequality and capitalist excess. That model now fights for hearts and minds with political capitalism, exemplified by China, which many claim is more efficient, but which is more vulnerable to corruption and, when growth is slow, social unrest. As for the economic problems of the Global South, Milanovic offers a creative, if controversial, plan for large-scale migration. Looking to the future, he dismisses prophets who proclaim some single outcome to be inevitable, whether worldwide prosperity or robot-driven mass unemployment. Capitalism is a risky system. But it is a human system. Our choices, and how clearly we see them, will determine how it serves us."

3 days ago by cshalizi

The Great Reversal — Thomas Philippon | Harvard University Press

3 days ago by cshalizi

"In this much-anticipated book, a leading economist argues that many key problems of the American economy are due not to the flaws of capitalism or the inevitabilities of globalization but to the concentration of corporate power. By lobbying against competition, the biggest firms drive profits higher while depressing wages and limiting opportunities for investment, innovation, and growth.

"Why are cell-phone plans so much more expensive in the United States than in Europe? It seems a simple question. But the search for an answer took Thomas Philippon on an unexpected journey through some of the most complex and hotly debated issues in modern economics. Ultimately he reached his surprising conclusion: American markets, once a model for the world, are giving up on healthy competition. Sector after economic sector is more concentrated than it was twenty years ago, dominated by fewer and bigger players who lobby politicians aggressively to protect and expand their profit margins. Across the country, this drives up prices while driving down investment, productivity, growth, and wages, resulting in more inequality. Meanwhile, Europe—long dismissed for competitive sclerosis and weak antitrust—is beating America at its own game.

"Philippon, one of the world’s leading economists, did not expect these conclusions in the age of Silicon Valley start-ups and millennial millionaires. But the data from his cutting-edge research proved undeniable. In this compelling tale of economic detective work, we follow him as he works out the basic facts and consequences of industry concentration in the U.S. and Europe, shows how lobbying and campaign contributions have defanged antitrust regulators, and considers what all this means for free trade, technology, and innovation. For the sake of ordinary Americans, he concludes, government needs to return to what it once did best: keeping the playing field level for competition. It’s time to make American markets great—and free—again."

to:NB
books:noted
economics
political_economy
economic_policy
market_failures_in_everything
class_struggles_in_america
"Why are cell-phone plans so much more expensive in the United States than in Europe? It seems a simple question. But the search for an answer took Thomas Philippon on an unexpected journey through some of the most complex and hotly debated issues in modern economics. Ultimately he reached his surprising conclusion: American markets, once a model for the world, are giving up on healthy competition. Sector after economic sector is more concentrated than it was twenty years ago, dominated by fewer and bigger players who lobby politicians aggressively to protect and expand their profit margins. Across the country, this drives up prices while driving down investment, productivity, growth, and wages, resulting in more inequality. Meanwhile, Europe—long dismissed for competitive sclerosis and weak antitrust—is beating America at its own game.

"Philippon, one of the world’s leading economists, did not expect these conclusions in the age of Silicon Valley start-ups and millennial millionaires. But the data from his cutting-edge research proved undeniable. In this compelling tale of economic detective work, we follow him as he works out the basic facts and consequences of industry concentration in the U.S. and Europe, shows how lobbying and campaign contributions have defanged antitrust regulators, and considers what all this means for free trade, technology, and innovation. For the sake of ordinary Americans, he concludes, government needs to return to what it once did best: keeping the playing field level for competition. It’s time to make American markets great—and free—again."

3 days ago by cshalizi

Unbound — Heather Boushey | Harvard University Press

3 days ago by cshalizi

"Do we have to choose between equality and prosperity? Many think that reducing economic inequality would require such heavy-handed interference with market forces that it would stifle economic growth. Heather Boushey, one of Washington’s most influential economic voices, insists nothing could be further from the truth. Presenting cutting-edge economics with journalistic verve, she shows how rising inequality has become a drag on growth and an impediment to a competitive United States marketplace for employers and employees alike.

"Boushey argues that inequality undermines growth in three ways. It obstructs the supply of talent, ideas, and capital as wealthy families monopolize the best educational, social, and economic opportunities. It also subverts private competition and public investment. Powerful corporations muscle competitors out of business, in the process costing consumers, suppressing wages, and hobbling innovation, while governments underfund key public goods that make the American Dream possible, from schools to transportation infrastructure to information and communication technology networks. Finally, it distorts consumer demand as stagnant wages and meager workplace benefits rob ordinary people of buying power and pushes the economy toward financial instability.

"Boushey makes this case with a clear, accessible tour of the best of contemporary economic research, while also injecting a passion for her subject gained through years of research into the economics of work–life conflict and policy work in the trenches of federal government. Unbound exposes deep problems in the U.S. economy, but its conclusion is optimistic. We can preserve the best of our nation’s economic and political traditions, and improve on them, by pursuing policies that reduce inequality—and by doing so, boost broadly shared economic growth."

to:NB
economics
inequality
political_economy
class_struggles_in_america
"Boushey argues that inequality undermines growth in three ways. It obstructs the supply of talent, ideas, and capital as wealthy families monopolize the best educational, social, and economic opportunities. It also subverts private competition and public investment. Powerful corporations muscle competitors out of business, in the process costing consumers, suppressing wages, and hobbling innovation, while governments underfund key public goods that make the American Dream possible, from schools to transportation infrastructure to information and communication technology networks. Finally, it distorts consumer demand as stagnant wages and meager workplace benefits rob ordinary people of buying power and pushes the economy toward financial instability.

"Boushey makes this case with a clear, accessible tour of the best of contemporary economic research, while also injecting a passion for her subject gained through years of research into the economics of work–life conflict and policy work in the trenches of federal government. Unbound exposes deep problems in the U.S. economy, but its conclusion is optimistic. We can preserve the best of our nation’s economic and political traditions, and improve on them, by pursuing policies that reduce inequality—and by doing so, boost broadly shared economic growth."

3 days ago by cshalizi

Search Advertising and Information Discovery: Are Consumers Averse to Sponsored Messages? by Navdeep S. Sahni, Charles Zhang :: SSRN

3 days ago by cshalizi

"We analyze a large-scale field experiment conducted on a US search engine in which 3.3 million users were randomized into seeing more, or less advertising. Our data rejects that users are, overall, averse to search advertising targeted to them. At the margin, users prefer the search engine with higher level of advertising. The usage of the search engine (in terms of number of searches, and number of search sessions) is higher among users who see higher levels of advertising, relative to the control group. This difference in usage persists even after the experimental treatment ends. The increase in usage is higher for users on the margin who, in the past, typed a competing search engine's name in the search query and navigated away from our focal search engine. On the supply side, higher level of advertising increases traffic to newer websites. Consumer response to search advertising is also more positive when more businesses located in the consumer's state create new websites. Quantitatively, the experimental treatment of a higher level of advertising increases ad clicks which leads to between 4.3% to 14.6% increase in search engine revenue.

"Overall, patterns in our data are consistent with an equilibrium in which advertising conveys relevant “local” information, which is unknown to the search engine, and therefore missed by the organic listings algorithm. Hence, search advertising makes consumers better off on average. On the margin, the search engine does not face a trade-off between advertising revenue and search engine usage."

to:NB
advertising
search_engines
networked_life
huh
"Overall, patterns in our data are consistent with an equilibrium in which advertising conveys relevant “local” information, which is unknown to the search engine, and therefore missed by the organic listings algorithm. Hence, search advertising makes consumers better off on average. On the margin, the search engine does not face a trade-off between advertising revenue and search engine usage."

3 days ago by cshalizi

[1702.08109] Variational Analysis of Constrained M-Estimators

3 days ago by cshalizi

"We propose a unified framework for establishing existence of nonparametric M-estimators, computing the corresponding estimates, and proving their strong consistency when the class of functions is exceptionally rich. In particular, the framework addresses situations where the class of functions is complex involving information and assumptions about shape, pointwise bounds, location of modes, height at modes, location of level-sets, values of moments, size of subgradients, continuity, distance to a "prior" function, multivariate total positivity, and any combination of the above. The class might be engineered to perform well in a specific setting even in the presence of little data. The framework views the class of functions as a subset of a particular metric space of upper semicontinuous functions under the Attouch-Wets distance. In addition to allowing a systematic treatment of numerous M-estimators, the framework yields consistency of plug-in estimators of modes of densities, maximizers of regression functions, level-sets of classifiers, and related quantities, and also enables computation by means of approximating parametric classes. We establish consistency through a one-sided law of large numbers, here extended to sieves, that relaxes assumptions of uniform laws, while ensuring global approximations even under model misspecification."

to:NB
estimation
empirical_processes
statistics
3 days ago by cshalizi

[1908.08702] Economically rational sample-size choice and irreproducibility

3 days ago by cshalizi

"Several systematic studies have suggested that a large fraction of published research is not reproducible. One probable reason for low reproducibility is insufficient sample size, resulting in low power and low positive predictive value. It has been suggested that insufficient sample-size choice is driven by a combination of scientific competition and 'positive publication bias'. Here we formalize this intuition in a simple model, in which scientists choose economically rational sample sizes, balancing the cost of experimentation with income from publication. Specifically, assuming that a scientist's income derives only from 'positive' findings (positive publication bias) and that individual samples cost a fixed amount, allows to leverage basic statistical formulas into an economic optimality prediction. We find that if effects have i) low base probability, ii) small effect size or iii) low grant income per publication, then the rational (economically optimal) sample size is small. Furthermore, for plausible distributions of these parameters we find a robust emergence of a bimodal distribution of obtained statistical power and low overall reproducibility rates, matching empirical findings. Overall, the model describes a simple mechanism explaining both the prevalence and the persistence of small sample sizes. It suggests economic rationality, or economic pressures, as a principal driver of irreproducibility."

--- To be clear, my skepticism here isn't about the basic idea, which has been articulated about a zillion times (back to Meehl at least...), but rather whether mathing it up with dubious simplifying assumptions adds anything of value.
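--- The mathing-up in question fits in a few lines (my own toy parameterization of their setup, not the paper's numbers: income only from "positive" results, a fixed cost per sample, power from a one-sided z-test):

```python
# Economically "rational" sample size: pick n maximizing expected income
# from positive findings minus the cost of the samples.
from math import erf, sqrt

def phi(z):                      # standard normal CDF
    return 0.5 * (1 + erf(z / sqrt(2)))

def expected_profit(n, income, cost, base_rate=0.1, effect=0.3, z_alpha=1.645):
    power = phi(sqrt(n) * effect - z_alpha)          # one-sided z-test
    # positives = true effects detected + false positives at alpha = 0.05
    p_positive = base_rate * power + (1 - base_rate) * 0.05
    return income * p_positive - cost * n

def optimal_n(income, cost, n_max=2000):
    return max(range(1, n_max), key=lambda n: expected_profit(n, income, cost))

# lower grant income per publication -> smaller rational sample size
n_rich = optimal_n(income=10_000, cost=1.0)
n_poor = optimal_n(income=1_000, cost=1.0)
```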

to:NB
to_be_shot_after_a_fair_trial
bad_data_analysis
statistics
sociology_of_science
economics

3 days ago by cshalizi

[1909.04436] The Prevalence of Errors in Machine Learning Experiments

3 days ago by cshalizi

"Context: Conducting experiments is central to research machine learning research to benchmark, evaluate and compare learning algorithms. Consequently it is important we conduct reliable, trustworthy experiments. Objective: We investigate the incidence of errors in a sample of machine learning experiments in the domain of software defect prediction. Our focus is simple arithmetical and statistical errors. Method: We analyse 49 papers describing 2456 individual experimental results from a previously undertaken systematic review comparing supervised and unsupervised defect prediction classifiers. We extract the confusion matrices and test for relevant constraints, e.g., the marginal probabilities must sum to one. We also check for multiple statistical significance testing errors. Results: We find that a total of 22 out of 49 papers contain demonstrable errors. Of these 7 were statistical and 16 related to confusion matrix inconsistency (one paper contained both classes of error). Conclusions: Whilst some errors may be of a relatively trivial nature, e.g., transcription errors their presence does not engender confidence. We strongly urge researchers to follow open science principles so errors can be more easily be detected and corrected, thus as a community reduce this worryingly high error rate with our computational experiments."

to:NB
bad_data_analysis
machine_learning
to_teach:data-mining
3 days ago by cshalizi

[1809.06522] Concentration Inequalities for the Empirical Distribution

4 days ago by cshalizi

"We study concentration inequalities for the Kullback--Leibler (KL) divergence between the empirical distribution and the true distribution. Applying a recursion technique, we improve over the method of types bound uniformly in all regimes of sample size n and alphabet size k, and the improvement becomes more significant when k is large. We discuss the applications of our results in obtaining tighter concentration inequalities for L1 deviations of the empirical distribution from the true distribution, and the difference between concentration around the expectation or zero. We also obtain asymptotically tight bounds on the variance of the KL divergence between the empirical and true distribution, and demonstrate their quantitatively different behaviors between small and large sample sizes compared to the alphabet size."

to:NB
concentration_of_measure
deviation_inequalities
empirical_processes
statistics
probability
4 days ago by cshalizi

[1909.04475] Variable Length Markov Chains, Persistent Random Walks: a close encounter

4 days ago by cshalizi

"This is the story of the encounter between two worlds: the world of random walks and the world of Variable Length Markov Chains (VLMC). The meeting point turns around the semi-Markov property of underlying processes."

to:NB
markov_models
stochastic_processes
to_read
4 days ago by cshalizi

[1909.03634] Krylov Subspace Method for Nonlinear Dynamical Systems with Random Noise

4 days ago by cshalizi

"Operator-theoretic analysis of nonlinear dynamical systems has attracted much attention in a variety of engineering and scientific fields, endowed with practical estimation methods using data such as dynamic mode decomposition. In this paper, we address a lifted representation of nonlinear dynamical systems with random noise based on transfer operators, and develop a novel Krylov subspace method for estimating it using finite data, with consideration of the unboundedness of operators. For this purpose, we first consider Perron-Frobenius operators with kernel-mean embeddings for such systems. Then, we extend the Arnoldi method, which is the most classical type of Kryov subspace methods, so that it can be applied to the current case. Meanwhile, the Arnoldi method requires the assumption that the operator is bounded, which is not necessarily satisfied for transfer operators on nonlinear systems. We accordingly develop the shift-invert Arnoldi method for the Perron-Frobenius operators to avoid this problem. Also, we describe a way of evaluating the predictive accuracy by estimated operators on the basis of the maximum mean discrepancy, which is applicable, for example, to anomaly detection in complex systems. The empirical performance of our methods is investigated using synthetic and real-world healthcare data."

to:NB
dynamical_systems
stochastic_processes
hilbert_space
kernel_methods
4 days ago by cshalizi

[1909.04495] Natural Adversarial Sentence Generation with Gradient-based Perturbation

4 days ago by cshalizi

"This work proposes a novel algorithm to generate natural language adversarial input for text classification models, in order to investigate the robustness of these models. It involves applying gradient-based perturbation on the sentence embeddings that are used as the features for the classifier, and learning a decoder for generation. We employ this method to a sentiment analysis model and verify its effectiveness in inducing incorrect predictions by the model. We also conduct quantitative and qualitative analysis on these examples and demonstrate that our approach can generate more natural adversaries. In addition, it can be used to successfully perform black-box attacks, which involves attacking other existing models whose parameters are not known. On a public sentiment analysis API, the proposed method introduces a 20% relative decrease in average accuracy and 74% relative increase in absolute error."

to:NB
adversarial_examples
4 days ago by cshalizi

[1909.03543] An Experimental Study of Structural Diversity in Social Networks

4 days ago by cshalizi

"Several recent studies of online social networking platforms have found that adoption rates and engagement levels are positively correlated with structural diversity, the degree of heterogeneity among an individual's contacts as measured by network ties. One common theory for this observation is that structural diversity increases utility, in part because there is value to interacting with people from different network components on the same platform. While compelling, evidence for this causal theory comes from observational studies, making it difficult to rule out non-causal explanations. We investigate the role of structural diversity on retention by conducting a large-scale randomized controlled study on the Twitter platform. We first show that structural diversity correlates with user retention on Twitter, corroborating results from past observational studies. We then exogenously vary structural diversity by altering the set of network recommendations new users see when joining the platform; we confirm that this design induces the desired changes to network topology. We find, however, that low, medium, and high structural diversity treatment groups in our experiment have comparable retention rates. Thus, at least in this case, the observed correlation between structural diversity and retention does not appear to result from a causal relationship, challenging theories based on past observational studies."

to:NB
social_networks
experimental_sociology
re:democratic_cognition
ugander.johan
goel.sharad
4 days ago by cshalizi

[1909.04189] Follow the Leader: Documents on the Leading Edge of Semantic Change Get More Citations

4 days ago by cshalizi

"Diachronic word embeddings offer remarkable insights into the evolution of language and provide a tool for quantifying socio-cultural change. However, while this method identifies words that have semantically shifted, it studies them in isolation; it does not facilitate the discovery of documents that lead or lag with respect to specific semantic innovations. In this paper, we propose a method to quantify the degree of semantic progressiveness in each usage. These usages can be aggregated to obtain scores for each document. We analyze two large collections of documents, representing legal opinions and scientific articles. Documents that are predicted to be semantically progressive receive a larger number of citations, indicating that they are especially influential. Our work thus provides a new technique for identifying lexical semantic leaders and demonstrates a new link between early adoption and influence in a citation network."

to:NB
text_mining
bibliometry
sociology_of_science
lerman.kristina
law
4 days ago by cshalizi

Louder and Faster by Deborah Wong - Paperback - University of California Press

4 days ago by cshalizi

"A free open access ebook is available upon publication. Learn more at www.luminosoa.org.

"Louder and Faster is a cultural study of the phenomenon of Asian American taiko, the thundering, athletic drumming tradition that originated in Japan. Immersed in the taiko scene for twenty years, Deborah Wong has witnessed cultural and demographic changes and the exponential growth and expansion of taiko particularly in Southern California. Through her participatory ethnographic work, she reveals a complicated story embedded in memories of Japanese American internment and legacies of imperialism, Asian American identity and politics, a desire to be seen and heard, and the intersection of culture and global capitalism. Exploring the materialities of the drums, costumes, and bodies that make sound, analyzing the relationship of these to capitalist multiculturalism, and investigating the gender politics of taiko, Louder and Faster considers both the promises and pitfalls of music and performance as an antiracist practice. The result is a vivid glimpse of an Asian American presence that is both loud and fragile."

--- Presumably Wong looks at participation by Asian-but-not-Japanese Americans...

to:NB
books:noted
music
ethnography
identity_group_formation
california
"Louder and Faster is a cultural study of the phenomenon of Asian American taiko, the thundering, athletic drumming tradition that originated in Japan. Immersed in the taiko scene for twenty years, Deborah Wong has witnessed cultural and demographic changes and the exponential growth and expansion of taiko particularly in Southern California. Through her participatory ethnographic work, she reveals a complicated story embedded in memories of Japanese American internment and legacies of imperialism, Asian American identity and politics, a desire to be seen and heard, and the intersection of culture and global capitalism. Exploring the materialities of the drums, costumes, and bodies that make sound, analyzing the relationship of these to capitalist multiculturalism, and investigating the gender politics of taiko, Louder and Faster considers both the promises and pitfalls of music and performance as an antiracist practice. The result is a vivid glimpse of an Asian American presence that is both loud and fragile."

--- Presumably Wong looks at participation by Asian-but-not-Japanese Americans...

4 days ago by cshalizi

[1909.03726] On the Fluctuation-Dissipation Relation in non-equilibrium and non-Hamiltonian systems

4 days ago by cshalizi

"We review generalized Fluctuation-Dissipation Relations which are valid under general conditions even in ``non-standard systems'', e.g. out of equilibrium and/or without a Hamiltonian structure. The response functions can be expressed in terms of suitable correlation functions computed in the unperperturbed dynamics. In these relations, typically one has nontrivial contributions due to the form of the stationary probability distribution; such terms take into account the interaction among the relevant degrees of freedom in the system. We illustrate the general formalism with some examples in non-standard cases, including driven granular media, systems with a multiscale structure, active matter and systems showing anomalous diffusion."

to:NB
fluctuation-response
non-equilibrium
statistical_mechanics
stochastic_processes
vulpiani.angelo
4 days ago by cshalizi

[1909.03760] Fluctuation-Dissipation Relation and Hawking Effect

4 days ago by cshalizi

"We show a direct connection between Kubo's fluctuation-dissipation relation and Hawking effect that is valid in any dimensions. The relevant correlators, computed from the known expressions for the stress tensor, are shown to satisfy the Kubo relation, from which the temperature of a black hole as seen by an observer at an arbitrary distance is abstracted. This reproduces the Tolman temperature and hence the Hawking temperature as that measured by an observer at infinity."

to:NB
fluctuation-response
gravitation
physics
statistical_mechanics
4 days ago by cshalizi

[1908.11287] Using the fluctuation-dissipation theorem for non-conservative forces

4 days ago by cshalizi

"An equilibrium system which is perturbed by an external potential relaxes to a new equilibrium state, a process obeying the fluctuation-dissipation theorem. In contrast, perturbing by non-conservative forces yields a nonequilibrium steady state, and the fluctuation-dissipation theorem can in general not be applied. Here we exploit a freedom inherent to linear response theory: Force fields which perform work that does not couple statistically to the considered observable can be added without changing the response. Using this freedom, we demonstrate that the fluctuation-dissipation theorem can be applied for certain non-conservative forces. We discuss the case of a non-conservative force field linear in particle coordinates, where the mentioned freedom can be formulated in terms of symmetries. In particular, for the case of shear, this yields a new response formula, which we find advantageous over the known Green-Kubo relation in terms of statistical accuracy."

to:NB
fluctuation-response
non-equilibrium
statistical_mechanics
4 days ago by cshalizi

[1909.03093] Solving Interpretable Kernel Dimension Reduction

4 days ago by cshalizi

"Kernel dimensionality reduction (KDR) algorithms find a low dimensional representation of the original data by optimizing kernel dependency measures that are capable of capturing nonlinear relationships. The standard strategy is to first map the data into a high dimensional feature space using kernels prior to a projection onto a low dimensional space. While KDR methods can be easily solved by keeping the most dominant eigenvectors of the kernel matrix, its features are no longer easy to interpret. Alternatively, Interpretable KDR (IKDR) is different in that it projects onto a subspace \textit{before} the kernel feature mapping, therefore, the projection matrix can indicate how the original features linearly combine to form the new features. Unfortunately, the IKDR objective requires a non-convex manifold optimization that is difficult to solve and can no longer be solved by eigendecomposition. Recently, an efficient iterative spectral (eigendecomposition) method (ISM) has been proposed for this objective in the context of alternative clustering. However, ISM only provides theoretical guarantees for the Gaussian kernel. This greatly constrains ISM's usage since any kernel method using ISM is now limited to a single kernel. This work extends the theoretical guarantees of ISM to an entire family of kernels, thereby empowering ISM to solve any kernel method of the same objective. In identifying this family, we prove that each kernel within the family has a surrogate Φ matrix and the optimal projection is formed by its most dominant eigenvectors. With this extension, we establish how a wide range of IKDR applications across different learning paradigms can be solved by ISM. To support reproducible results, the source code is made publicly available on \url{this https URL}."

--- Last tag is dreamily aspirational.

to:NB
kernel_methods
dimension_reduction
principal_components
to_teach:data-mining

4 days ago by cshalizi

A problem in theory | Nature Human Behaviour

4 days ago by cshalizi

"The replication crisis facing the psychological sciences is widely regarded as rooted in methodological or statistical shortcomings. We argue that a large part of the problem is the lack of a cumulative theoretical framework or frameworks. Without an overarching theoretical framework that generates hypotheses across diverse domains, empirical programs spawn and grow from personal intuitions and culturally biased folk theories. By providing ways to develop clear predictions, including through the use of formal modelling, theoretical frameworks set expectations that determine whether a new finding is confirmatory, nicely integrating with existing lines of research, or surprising, and therefore requiring further replication and scrutiny. Such frameworks also prioritize certain research foci, motivate the use of diverse empirical approaches and, often, provide a natural means to integrate across the sciences. Thus, overarching theoretical frameworks pave the way toward a more general theory of human behaviour. We illustrate one such theoretical framework: dual inheritance theory."

--- Well, yes. Coming from physics, my reaction to a lot of social science has long been that it doesn't have any _theories_, just _ideas_ [http://bactra.org/weblog/347.html]. Even in economics, theories rarely predict more than the sign of partial derivatives. Whether this can really be changed is another question...

social_science_methodology
to_read
via:rvenkat
to:NB

4 days ago by cshalizi

[1909.02436] Are Adversarial Robustness and Common Perturbation Robustness Independant Attributes ?

5 days ago by cshalizi

"Neural networks have been shown to be sensitive to common perturbations such as blur, Gaussian noise, rotations, etc. They are also vulnerable to artificial malicious corruptions called adversarial examples. The study of adversarial examples has recently become very popular, sometimes even to the point of conflating the term "robustness" with "adversarial robustness". Yet, we do not know to what extent adversarial robustness is related to global robustness. Similarly, we do not know whether robustness to various common perturbations, such as translations or contrast losses, could help with adversarial corruptions. We intend to study the links between the robustness of neural networks to both kinds of perturbation. With our experiments, we provide one of the first benchmarks designed to estimate the robustness of neural networks to common perturbations. We show that increasing the robustness to carefully selected common perturbations can make neural networks more robust to unseen common perturbations. We also show that adversarial robustness and robustness to common perturbations are independent. Our results make us believe that neural network robustness should be addressed in a broader sense."

to:NB
adversarial_examples
5 days ago by cshalizi

[1706.06083] Towards Deep Learning Models Resistant to Adversarial Attacks

5 days ago by cshalizi

"Recent work has demonstrated that deep neural networks are vulnerable to adversarial examples---inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of deep learning models. To address this problem, we study the adversarial robustness of neural networks through the lens of robust optimization. This approach provides us with a broad and unifying view on much of the prior work on this topic. Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal. In particular, they specify a concrete security guarantee that would protect against any adversary. These methods let us train networks with significantly improved resistance to a wide range of adversarial attacks. They also suggest the notion of security against a first-order adversary as a natural and broad security guarantee. We believe that robustness against such well-defined classes of adversaries is an important stepping stone towards fully resistant deep learning models. Code and pre-trained models are available at this https URL and this https URL."
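
The robust-optimization view in this abstract is usually instantiated with projected-gradient-descent (PGD) attacks inside an ℓ∞ ball. A minimal sketch on a plain logistic model, where the analytic gradient stands in for backprop; the function name and parameter values are illustrative, not the paper's code:

```python
import numpy as np

def pgd_attack(w, b, x, y, eps=0.3, alpha=0.05, steps=10):
    """PGD attack on a logistic model p(y=1|x) = sigmoid(w.x + b):
    repeatedly take a signed-gradient ascent step on the logistic
    loss and project back into the L-infinity ball of radius eps
    around the clean input x."""
    x_adv = x.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x_adv @ w + b)))
        grad = (p - y) * w                        # d(loss)/dx
        x_adv = x_adv + alpha * np.sign(grad)     # ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project to the ball
    return x_adv
```

Adversarial training in the Madry et al. sense then means solving the min-max problem: at each training step, generate such worst-case inputs and minimize the loss on them instead of (or alongside) the clean inputs.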

to:NB
adversarial_examples
5 days ago by cshalizi

[1909.02330] McDiarmid-Type Inequalities for Graph-Dependent Variables and Stability Bounds

5 days ago by cshalizi

"A crucial assumption in most statistical learning theory is that samples are independently and identically distributed (i.i.d.). However, for many real applications, the i.i.d. assumption does not hold. We consider learning problems in which examples are dependent and their dependency relation is characterized by a graph. To establish algorithm-dependent generalization theory for learning with non-i.i.d. data, we first prove novel McDiarmid-type concentration inequalities for Lipschitz functions of graph-dependent random variables. We show that concentration relies on the forest complexity of the graph, which characterizes the strength of the dependency. We demonstrate that for many types of dependent data, the forest complexity is small and thus implies good concentration. Based on our new inequalities we are able to build stability bounds for learning from graph-dependent data."
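
For reference, the classical independent-case McDiarmid inequality that the paper generalizes:

```latex
% McDiarmid's bounded-differences inequality (independent case).
% f has bounded differences: changing the i-th coordinate alone
% changes f(x_1, ..., x_n) by at most c_i.
\Pr\bigl[f(X_1,\dots,X_n) - \mathbb{E}\,f(X_1,\dots,X_n) \ge t\bigr]
    \;\le\; \exp\!\left( \frac{-2\,t^2}{\sum_{i=1}^{n} c_i^2} \right).
```

Per the abstract, the paper's graph-dependent variant replaces the denominator with a larger quantity controlled by the forest complexity of the dependency graph, so weaker dependence (smaller forest complexity) recovers concentration close to the i.i.d. rate.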

to:NB
learning_theory
dependence_measures
to_read
5 days ago by cshalizi

[1909.02187] More Adaptive Algorithms for Tracking the Best Expert

5 days ago by cshalizi

"In this paper, we consider the problem of prediction with expert advice in dynamic environments. We choose tracking regret as the performance metric and derive novel data-dependent bounds by developing two adaptive algorithms. The first algorithm achieves a second-order tracking regret bound, which improves existing first-order bounds. The second algorithm enjoys a path-length bound, which is generally incomparable to the second-order bound but offers advantages in slowly moving environments. Both algorithms are developed under the online mirror descent framework and draw inspiration from existing algorithms that attain data-dependent bounds of static regret. The key idea is to use a clipped simplex in the updating step of online mirror descent. Finally, we extend our algorithms and analysis to the problem of online matrix prediction and provide the first data-dependent tracking regret bound for this problem."
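
The "clipped simplex" idea, never letting any expert's weight reach zero so the forecaster can switch to a new leader quickly, goes back to the fixed-share forecaster of Herbster and Warmuth, which clips implicitly by mixing with the uniform distribution every round. A minimal sketch of that classical baseline (not the paper's algorithm; `eta` and `alpha` are illustrative values):

```python
import numpy as np

def fixed_share(losses, eta=0.5, alpha=0.05):
    """Fixed-share forecaster for tracking the best expert.
    `losses` is a (T, N) array of expert losses in [0, 1]; returns
    the forecaster's cumulative loss.  The share step keeps every
    weight >= alpha / N, i.e. the weight vector stays inside a
    clipped simplex, which is what makes tracking possible."""
    T, N = losses.shape
    w = np.full(N, 1.0 / N)
    total = 0.0
    for t in range(T):
        total += w @ losses[t]                # forecaster's expected loss
        v = w * np.exp(-eta * losses[t])      # exponential-weights update
        v /= v.sum()
        w = (1.0 - alpha) * v + alpha / N     # share step: mix with uniform
    return total
```

On a sequence where the best expert changes halfway through, plain exponential weights gets stuck on the first leader, while fixed-share recovers within a handful of rounds; tracking regret compares the forecaster against the best *sequence* of experts rather than the best single one.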

to:NB
online_learning
low-regret_learning
re:growing_ensemble_project
to_read
5 days ago by cshalizi

[1809.06848] On the Learning Dynamics of Deep Neural Networks

5 days ago by cshalizi

"While a lot of progress has been made in recent years, the dynamics of learning in deep nonlinear neural networks remain to this day largely misunderstood. In this work, we study the case of binary classification and prove various properties of learning in such networks under strong assumptions such as linear separability of the data. Extending existing results from the linear case, we confirm empirical observations by proving that the classification error also follows a sigmoidal shape in nonlinear architectures. We show that given proper initialization, learning expounds parallel independent modes and that certain regions of parameter space might lead to failed training. We also demonstrate that input norm and features' frequency in the dataset lead to distinct convergence speeds which might shed some light on the generalization capabilities of deep neural networks. We provide a comparison between the dynamics of learning with cross-entropy and hinge losses, which could prove useful to understand recent progress in the training of generative adversarial networks. Finally, we identify a phenomenon that we baptize \textit{gradient starvation} where the most frequent features in a dataset prevent the learning of other less frequent but equally informative features."

to:NB
optimization
neural_networks
5 days ago by cshalizi

What happened to cognitive science? | Nature Human Behaviour

5 days ago by cshalizi

"More than a half-century ago, the ‘cognitive revolution’, with the influential tenet ‘cognition is computation’, launched the investigation of the mind through a multidisciplinary endeavour called cognitive science. Despite significant diversity of views regarding its definition and intended scope, this new science, explicitly named in the singular, was meant to have a cohesive subject matter, complementary methods and integrated theories. Multiple signs, however, suggest that over time the prospect of an integrated cohesive science has not materialized. Here we investigate the status of the field in a data-informed manner, focusing on four indicators, two bibliometric and two socio-institutional. These indicators consistently show that the devised multi-disciplinary program failed to transition to a mature inter-disciplinary coherent field. Bibliometrically, the field has been largely subsumed by (cognitive) psychology, and educationally, it exhibits a striking lack of curricular consensus, raising questions about the future of the cognitive science enterprise."

to:NB
bibliometry
cognitive_science
sociology_of_science
via:?
5 days ago by cshalizi

[1909.01506] Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control

5 days ago by cshalizi

"Many real-world sequential decision-making problems can be formulated as optimal control with high-dimensional observations and unknown dynamics. A promising approach is to embed the high-dimensional observations into a lower-dimensional latent representation space, estimate the latent dynamics model, then utilize this model for control in the latent space. An important open question is how to learn a representation that is amenable to existing control algorithms? In this paper, we focus on learning representations for locally-linear control algorithms, such as iterative LQR (iLQR). By formulating and analyzing the representation learning problem from an optimal control perspective, we establish three underlying principles that the learned representation should comprise: 1) accurate prediction in the observation space, 2) consistency between latent and observation space dynamics, and 3) low curvature in the latent space transitions. These principles naturally correspond to a loss function that consists of three terms: prediction, consistency, and curvature (PCC). Crucially, to make PCC tractable, we derive an amortized variational bound for the PCC loss function. Extensive experiments on benchmark domains demonstrate that the new variational-PCC learning algorithm benefits from significantly more stable and reproducible training, and leads to superior control performance. Further ablation studies give support to the importance of all three PCC components for learning a good latent space for control."
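
Schematically, the three principles translate into a weighted three-term objective (the weights λ below are my notation for exposition, not necessarily the paper's):

```latex
\mathcal{L}_{\mathrm{PCC}}
  \;=\; \lambda_{p}\,\mathcal{L}_{\mathrm{prediction}}
  \;+\; \lambda_{c}\,\mathcal{L}_{\mathrm{consistency}}
  \;+\; \lambda_{\kappa}\,\mathcal{L}_{\mathrm{curvature}}
```

where the first term scores prediction in the observation space, the second penalizes disagreement between latent-space and observation-space dynamics, and the third penalizes curvature of the latent transitions so that iLQR's local linearizations are accurate; per the abstract, what is actually optimized is an amortized variational bound on this loss.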

to:NB
control_theory_and_control_engineering
differential_geometry
stochastic_processes
state-space_models
5 days ago by cshalizi

[1903.11482] A Sober Look at Neural Network Initializations

5 days ago by cshalizi

"Initializing the weights and the biases is a key part of the training process of a neural network. Unlike the subsequent optimization phase, however, the initialization phase has gained only limited attention in the literature. In this paper we discuss some consequences of commonly used initialization strategies for vanilla DNNs with ReLU activations. Based on these insights we then develop an alternative initialization strategy. Finally, we present some large scale experiments assessing the quality of the new initialization strategy."
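
One of the "commonly used initialization strategies" for ReLU networks is He (Kaiming) initialization, which scales Gaussian weights so the forward activation variance is roughly preserved from layer to layer. A minimal numpy sketch, purely illustrative and not the paper's alternative strategy:

```python
import numpy as np

def he_init(fan_in, fan_out, rng=None):
    """He (Kaiming) initialization for a ReLU layer: zero-mean
    Gaussian weights with variance 2 / fan_in.  The factor of 2
    compensates for ReLU zeroing out half of the pre-activations."""
    if rng is None:
        rng = np.random.default_rng(0)
    return rng.normal(0.0, np.sqrt(2.0 / fan_in),
                      size=(fan_in, fan_out))

# Quick check: push unit-variance inputs through one He-initialized
# ReLU layer; the second moment of the activations stays near 1.
rng = np.random.default_rng(1)
x = rng.normal(size=(64, 256))
h = np.maximum(0.0, x @ he_init(256, 256, rng))
```

Biases are typically initialized to zero under this scheme; the variance argument concerns only the weight matrices.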

--- Cf. _Numerical Recipes_: "Carefully crafted initial estimates reward you not only with reduced computational effort, but also with understanding and increased self-esteem."

to:NB
neural_networks
optimization
steinwart.ingo
minsky_shut_his_eyes

5 days ago by cshalizi
