cshalizi + to:nb   5193

[1707.04800] Exponential-Family Models of Random Graphs: Inference in Finite-, Super-, and Infinite Population Scenarios
"Exponential-family Random Graph Models (ERGMs) constitute a large statistical framework for modeling sparse and dense random graphs, short- and long-tailed degree distributions, covariates, and a wide range of complex dependencies. Special cases of ERGMs are generalized linear models (GLMs), Bernoulli random graphs, β-models, p1-models, and models related to Markov random fields in spatial statistics and other areas of statistics. While widely used in practice, questions have been raised about the theoretical properties of ERGMs. These include concerns that some ERGMs are near-degenerate and that many ERGMs are non-projective. To address them, careful attention must be paid to model specifications and their underlying assumptions, and in which inferential settings models are employed. As we discuss, near-degeneracy can affect simplistic ERGMs lacking structure, but well-posed ERGMs with additional structure can be well-behaved. Likewise, lack of projectivity can affect non-likelihood-based inference, but likelihood-based inference does not require projectivity. Here, we review well-posed ERGMs along with likelihood-based inference. We first clarify the core statistical notions of "sample" and "population" in the ERGM framework, and separate the process that generates the population graph from the observation process. We then review likelihood-based inference in finite-, super-, and infinite-population scenarios. We conclude with consistency results, and an application to human brain networks"
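The Bernoulli random graphs listed as a special case make the likelihood structure concrete. Below is a minimal, hypothetical sketch (not the paper's method) of the likelihood of a toy ERGM with edge and triangle statistics, with the normalizing constant computed by brute-force enumeration; real ERGM inference replaces the enumeration with MCMC. The graph, the statistics, and the parameter values are all illustrative assumptions.

```python
import numpy as np
from itertools import combinations, product

def ergm_loglik(theta, A_obs, n):
    # Toy ERGM with edge and triangle statistics; the normalizing constant
    # is computed by enumerating all graphs on n nodes, feasible only for
    # tiny n (real ERGM inference uses MCMC instead).
    pairs = list(combinations(range(n), 2))
    def stats(edges):
        A = np.zeros((n, n), int)
        for (i, j), e in zip(pairs, edges):
            A[i, j] = A[j, i] = e
        tri = sum(A[i, j] * A[j, k] * A[i, k]
                  for i, j, k in combinations(range(n), 3))
        return np.array([sum(edges), tri])
    logZ = np.logaddexp.reduce([theta @ stats(e)
                                for e in product([0, 1], repeat=len(pairs))])
    obs = [A_obs[i, j] for i, j in pairs]
    return theta @ stats(obs) - logZ

n = 5
A = np.zeros((n, n), int)
A[0, 1] = A[1, 0] = A[1, 2] = A[2, 1] = 1      # a two-edge path
ll = ergm_loglik(np.array([-1.0, 0.5]), A, n)  # edge and triangle weights
print(ll)
```

With the triangle coefficient set to zero this is exactly a Bernoulli (Erdős-Rényi) graph, the simplest special case the abstract lists.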

--- Needless (?) to say, I am unconvinced by the arguments that projectivity doesn't matter.
to:NB  exponential_family_random_graphs  network_data_analysis  statistics  to_teach:baby-nets  re:your_favorite_ergm_sucks 
6 hours ago by cshalizi
[1908.01596v3] A Unifying Analysis of Shift Operators on a Graph
The maximum entropy principle is employed to introduce a general class of shift operators (GSO) for random signals on a graph. By virtue of the assumed probabilistic framework, the proposed GSO is shown to be both bounded and to exhibit the desired property of asymptotic power preservation over graph shifts. For rigour, the sensitivity of the GSO to a graph topology misspecification is also addressed. The advantages of the proposed operator are demonstrated in a real-world multi-sensor signal average setting.
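For intuition on why a graph shift operator is wanted at all: on a directed cycle, the adjacency matrix reproduces the classical delay and preserves signal power exactly, which is the property the proposed GSO aims to generalize (asymptotically) to arbitrary graphs. A minimal sketch; the cycle example is a standard motivating case, not the paper's construction.

```python
import numpy as np

def cycle_adjacency(n):
    # Directed cycle graph: its adjacency matrix acts on signals as the
    # classical delay (circular shift), the motivating special case for
    # graph shift operators.
    A = np.zeros((n, n))
    A[np.arange(n), (np.arange(n) - 1) % n] = 1
    return A

n = 6
A = cycle_adjacency(n)
x = np.arange(n, dtype=float)
shifted = A @ x
print(shifted)                                       # circular shift of x
print(np.linalg.norm(shifted), np.linalg.norm(x))    # power preserved here
```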
network_data_analysis  to:NB 
6 hours ago by cshalizi
[1909.05802] Ecological communities from random Lotka-Volterra dynamics with nonlinear functional response
We investigate the outcome of Lotka-Volterra dynamics of ecological communities with random interaction coefficients and non-linear functional response. We show in simulations that the saturation of Holling type-II response stabilises the dynamics. This is confirmed in an analytical generating-functional approach to Lotka-Volterra equations with piecewise linear saturating response. For such systems we are able to derive self-consistent relations governing the stable fixed-point phase, and to carry out a linear stability analysis to predict the onset of unstable behaviour. We investigate in detail the combined effects of the mean and variance of the random interaction coefficients, the cut-off parameter of the non-linear response, and a symmetry parameter. We find that stability and diversity increase with the introduction of functional response, where decreasing the functional response parameter has a similar effect to decreasing the symmetry parameter. We also find biomass and diversity to be less dependent on the symmetry of interactions with functional response, and co-operation to no longer have a detrimental effect on stability.
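A minimal, assumption-laden sketch of the kind of system studied: discretised generalised Lotka-Volterra dynamics with random interactions and a Holling type-II (saturating) response. The parameter values, the Gaussian interaction ensemble, and the Euler discretisation are illustrative choices, not the paper's piecewise-linear setup or its generating-functional analysis.

```python
import numpy as np

def lv_holling2_step(x, r, A, h, dt=0.01):
    # Generalised Lotka-Volterra with a Holling type-II (saturating)
    # response: dx_i/dt = x_i * (r_i + sum_j A_ij x_j / (1 + h x_j)).
    interaction = A @ (x / (1 + h * x))
    return np.clip(x + dt * x * (r + interaction), 0, None)

rng = np.random.default_rng(2)
N = 30
r = np.ones(N)                                    # intrinsic growth rates
A = rng.normal(-0.5, 0.3, (N, N)) / np.sqrt(N)    # random interactions
np.fill_diagonal(A, -1.0)                         # self-limitation
x = rng.uniform(0.1, 1.0, N)
for _ in range(5000):
    x = lv_holling2_step(x, r, A, h=0.5)
survivors = int(np.sum(x > 1e-3))                 # surviving diversity
print(survivors)
```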
ecology  to:NB 
6 hours ago by cshalizi
[1908.11140] Deep Learning and MARS: A Connection
"We consider least squares regression estimates using deep neural networks. We show that these estimates satisfy an oracle inequality, which implies that (up to a logarithmic factor) the error of these estimates is at least as small as the optimal possible error bound which one would expect for MARS in case that this procedure would work in the optimal way. As a result we show that our neural networks are able to achieve a dimensionality reduction in case that the regression function locally has low dimensionality. This assumption seems to be realistic in real-world applications, since selected high-dimensional data are often confined to locally-low-dimensional distributions. In our simulation study we provide numerical experiments to support our theoretical results and to compare our estimate with other conventional nonparametric regression estimates, especially with MARS. The use of our estimates is illustrated through a real data analysis."
to:NB  regression  nonparametrics  neural_networks  statistics 
13 hours ago by cshalizi
[1909.05229] Goodness-of-fit tests on manifolds
"We develop a general theory for the goodness-of-fit test to non-linear models. In particular, we assume that the observations are noisy samples of a sub-manifold defined by a non-linear map of some intrinsic structures. The observation noise is additive Gaussian. Our main result shows that the "residual" of the model fit, by solving a non-linear least-square problem, follows a (possibly non-central) χ2 distribution. The parameters of the χ2 distribution are related to the model order and dimension of the problem. The main result is established by making a novel connection between statistical test and differential geometry. We further present a method to select the model orders sequentially. We demonstrate the broad application of the general theory in a range of applications in machine learning and signal processing, including determining the rank of low-rank (possibly complex-valued) matrices and tensors, from noisy, partial, or indirect observations, determining the number of sources in signal demixing, and potential applications in determining the number of hidden nodes in neural networks."
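The flavour of the main result can be checked in the simplest special case, linear models with Gaussian noise, where the scaled residual sum of squares after least-squares fitting follows a central χ² distribution whose degrees of freedom reflect the model order and dimension. A toy simulation under assumed dimensions (n = 2000, p = 3):

```python
import numpy as np

# In the linear-Gaussian case, RSS / sigma^2 after least squares follows
# a chi^2 distribution with n - p degrees of freedom (mean n - p).
rng = np.random.default_rng(8)
n, p, sigma = 2000, 3, 0.5
X = rng.normal(size=(n, p))
beta = np.array([1.0, -1.0, 2.0])
trials = []
for _ in range(300):
    y = X @ beta + sigma * rng.normal(size=n)
    bhat = np.linalg.lstsq(X, y, rcond=None)[0]
    trials.append(np.sum((y - X @ bhat) ** 2) / sigma**2)
print(np.mean(trials))   # close to n - p = 1997
```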
to:NB  statistics_on_manifolds  goodness-of-fit  statistics  inference_to_latent_objects  manifold_learning 
13 hours ago by cshalizi
[1612.01159] On the instability and degeneracy of deep learning models
"A probability model exhibits instability if small changes in a data outcome result in large, and often unanticipated, changes in probability. This instability is a property of the probability model, given by a distributional form and a given configuration of parameters. For correlated data structures found in several application areas, there is increasing interest in identifying such sensitivity in model probability structure. We consider the problem of quantifying instability for general probability models defined on sequences of observations, where each sequence of length N has a finite number of possible values that can be taken at each point. A sequence of probability models results, indexed by N, and an associated parameter sequence, that accommodates data of expanding dimension. Model instability is formally shown to occur when a certain log-probability ratio under such models grows faster than N. In this case, a one component change in the data sequence can shift probability by orders of magnitude. Also, as instability becomes more extreme, the resulting probability models are shown to tend to degeneracy, placing all their probability on potentially small portions of the sample space. These results on instability apply to large classes of models commonly used in random graphs, network analysis, and machine learning contexts."
to:NB  statistics  exponential_family_random_graphs  re:your_favorite_ergm_sucks  to_read 
13 hours ago by cshalizi
[1909.04843] Clusters and the entropy in opinion dynamics on complex networks
"In this work, we investigate a heterogeneous population in the modified Hegselmann-Krause opinion model on complex networks. We introduce the Shannon information entropy over all relative opinion clusters to characterize the cluster profile in the final configuration. Independent of network structure, there exists an optimal stubbornness of one subpopulation that yields the largest number of clusters and the highest entropy. Likewise, there is an optimal bounded confidence (or subpopulation ratio) of one subpopulation that yields the smallest number of clusters and the lowest entropy. Network structures do, however, affect cluster profiles: a large average degree favors consensus by making different networks more similar to complete graphs. The network size has limited impact on the cluster profiles of heterogeneous populations on scale-free networks but has significant effects on those on small-world networks."
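A minimal sketch of the underlying model, assuming the homogeneous bounded-confidence case on a complete graph rather than the paper's heterogeneous populations on complex networks: each agent repeatedly averages the opinions within its confidence bound, and the final cluster profile is summarised by the Shannon entropy of relative cluster sizes.

```python
import numpy as np

def hk_step(x, eps):
    # Bounded-confidence update: each agent moves to the mean opinion
    # of all agents (including itself) within distance eps.
    return np.array([x[np.abs(x - xi) <= eps].mean() for xi in x])

def cluster_entropy(x, tol=1e-4):
    # Shannon entropy of the relative sizes of opinion clusters,
    # where a gap larger than tol separates clusters.
    xs = np.sort(x)
    breaks = np.where(np.diff(xs) > tol)[0]
    sizes = np.diff(np.concatenate(([0], breaks + 1, [len(xs)])))
    p = sizes / sizes.sum()
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, size=100)   # initial opinions
for _ in range(50):
    x = hk_step(x, eps=0.25)      # assumed confidence bound
ent = cluster_entropy(x)
print(ent)
```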
to:NB  networks  voter_model 
13 hours ago by cshalizi
[1909.04791] Techniques All Classifiers Can Learn from Deep Networks: Models, Optimizations, and Regularization
"Deep neural networks have introduced novel and useful tools to the machine learning community, and other types of classifiers can make use of these tools to improve their performance and generality. This paper reviews the current state of the art for deep learning classifier technologies that are being used outside of deep neural networks. Many components of existing deep neural network architectures can be employed by non-network classifiers. In this paper, we review the feature learning, optimization, and regularization methods that form a core of deep network technologies. We then survey non-neural network learning algorithms that make innovative use of these methods to improve classification. We conclude by discussing directions that can be pursued to expand the area of deep learning for a variety of classification algorithms."
to:NB  classifiers  machine_learning  optimization  computational_statistics  statistics  neural_networks 
13 hours ago by cshalizi
[1909.05097] Spectral Non-Convex Optimization for Dimension Reduction with Hilbert-Schmidt Independence Criterion
"The Hilbert Schmidt Independence Criterion (HSIC) is a kernel dependence measure that has applications in various aspects of machine learning. Conveniently, the objectives of different dimensionality reduction applications using HSIC often reduce to the same optimization problem. However, the nonconvexity of the objective function arising from non-linear kernels poses a serious challenge to optimization efficiency and limits the potential of HSIC-based formulations. As a result, only linear kernels have been computationally tractable in practice. This paper proposes a spectral-based optimization algorithm that extends beyond the linear kernel. The algorithm identifies a family of suitable kernels and provides first- and second-order local guarantees when a fixed point is reached. Furthermore, we propose a principled initialization strategy, thereby removing the need to repeat the algorithm at random initialization points. Compared to state-of-the-art optimization algorithms, our empirical results on real data show a run-time improvement by as much as a factor of 10^5 while consistently achieving lower cost and classification/clustering errors. The implementation source code is publicly available on this https URL."
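For reference, the (biased) empirical HSIC with Gaussian kernels reduces to a trace of centred kernel matrices. The standard estimator, not the paper's spectral optimization algorithm, is sketched below with an assumed bandwidth σ = 1.

```python
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    # Pairwise squared distances -> Gaussian (RBF) kernel matrix.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-d2 / (2 * sigma**2))

def hsic(X, Y, sigma=1.0):
    # Biased empirical HSIC: trace(K H L H) / (n - 1)^2, where
    # H = I - (1/n) 11^T centres the kernel matrices.
    n = X.shape[0]
    K = gaussian_kernel(X, sigma)
    L = gaussian_kernel(Y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
Y_dep = X + 0.1 * rng.normal(size=(200, 1))   # strongly dependent
Y_ind = rng.normal(size=(200, 1))             # independent of X
h_dep, h_ind = hsic(X, Y_dep), hsic(X, Y_ind)
print(h_dep, h_ind)
```

Dependence shows up as a markedly larger HSIC value for the dependent pair.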
to:NB  kernel_methods  hilbert_space  probability  dimension_reduction  statistics  computational_statistics  spectral_methods 
13 hours ago by cshalizi
[1909.05099] How to detect novelty in textual data streams? A comparative study of existing methods
"Since datasets with annotation for novelty at the document and/or word level are not easily available, we present a simulation framework that allows us to create different textual datasets in which we control the way novelty occurs. We also present a benchmark of existing methods for novelty detection in textual data streams. We define a few tasks to solve and compare several state-of-the-art methods. The simulation framework allows us to evaluate their performances according to a set of limited scenarios and test their sensitivity to some parameters. Finally, we experiment with the same methods on different kinds of novelty in the New York Times Annotated Dataset."
to:NB  text_mining  natural_language_processing 
13 hours ago by cshalizi
[1909.05207] Introduction to Online Convex Optimization
"This manuscript portrays optimization as a process. In many practical applications the environment is so complex that it is infeasible to lay out a comprehensive theoretical model and use classical algorithmic theory and mathematical optimization. It is necessary as well as beneficial to take a robust approach, by applying an optimization method that learns as one goes along, learning from experience as more aspects of the problem are observed. This view of optimization as a process has become prominent in varied fields and has led to some spectacular success in modeling and systems that are now part of our daily lives."
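The canonical algorithm of this "optimization as a process" view is online (projected) gradient descent: play a point, observe a loss gradient, step, project. A minimal sketch on an assumed toy stream of quadratic losses with slowly drifting targets:

```python
import numpy as np

def ogd(grads, x0, eta=0.1, radius=1.0):
    # Online projected gradient descent with a 1/sqrt(t) step size,
    # projecting back onto the Euclidean ball of the given radius.
    x, plays = x0.copy(), []
    for t, g in enumerate(grads, start=1):
        plays.append(x.copy())
        x = x - (eta / np.sqrt(t)) * g(x)
        norm = np.linalg.norm(x)
        if norm > radius:
            x *= radius / norm      # projection step
    return plays

# Toy stream: losses f_t(x) = ||x - z_t||^2 for drifting targets z_t.
rng = np.random.default_rng(3)
targets = [np.array([0.5, 0.0]) + 0.05 * rng.normal(size=2) for _ in range(200)]
grads = [(lambda x, z=z: 2 * (x - z)) for z in targets]
plays = ogd(grads, x0=np.zeros(2))
avg_loss = np.mean([np.sum((p - z) ** 2) for p, z in zip(plays, targets)])
print(avg_loss)
```

The learner's average loss ends up well below that of the naive fixed play at the origin, the "learning from experience" the abstract describes.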
to:NB  optimization  convexity  online_learning  low-regret_learning  hazan.elad 
13 hours ago by cshalizi
[1909.04019] Forecaster: A Graph Transformer for Forecasting Spatial and Time-Dependent Data
"Spatial and time-dependent data is of interest in many applications. This task is difficult due to its complex spatial dependency, long-range temporal dependency, data non-stationarity, and data heterogeneity. To address these challenges, we propose Forecaster, a graph Transformer architecture. Specifically, we start by learning the structure of the graph that parsimoniously represents the spatial dependency between the data at different locations. Based on the topology of the graph, we sparsify the Transformer to account for the strength of spatial dependency, long-range temporal dependency, data non-stationarity, and data heterogeneity. We evaluate Forecaster in the problem of forecasting taxi ride-hailing demand and show that our proposed architecture significantly outperforms the state-of-the-art baselines."
to:NB  spatio-temporal_statistics  prediction  statistics 
13 hours ago by cshalizi
[1905.10546] Protecting the Protected Group: Circumventing Harmful Fairness
"Machine Learning (ML) algorithms shape our lives. Banks use them to determine if we are good borrowers; IT companies delegate them recruitment decisions; police apply ML for crime-prediction, and judges base their verdicts on ML. However, real-world examples show that such automated decisions tend to discriminate against protected groups. This potential discrimination generated a huge hype both in media and in the research community. Quite a few formal notions of fairness were proposed, which take a form of constraints a "fair" algorithm must satisfy. We focus on scenarios where fairness is imposed on a self-interested party (e.g., a bank that maximizes its revenue). We find that the disadvantaged protected group can be worse off after imposing a fairness constraint. We introduce a family of "Welfare-Equalizing" fairness constraints that equalize per-capita welfare of protected groups, and include Demographic Parity and Equal Opportunity as particular cases. In this family, we characterize conditions under which the fairness constraint helps the disadvantaged group. We also characterize the structure of the optimal Welfare-Equalizing classifier for the self-interested party, and provide an algorithm to compute it. Overall, our Welfare-Equalizing fairness approach provides a unified framework for discussing fairness in classification in the presence of a self-interested party."
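Demographic Parity, one of the special cases in the Welfare-Equalizing family, is easy to state in code: it compares positive-decision rates across protected groups. A minimal sketch with made-up decisions and group labels:

```python
import numpy as np

def demographic_parity_gap(yhat, group):
    # Demographic parity gap: largest difference in positive-decision
    # rates between protected groups (0 means exact parity).
    rates = [yhat[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)

yhat = np.array([1, 0, 1, 1, 0, 0, 1, 0])    # hypothetical decisions
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # hypothetical group labels
gap = demographic_parity_gap(yhat, group)
print(gap)   # 0.75 vs 0.25 -> gap of 0.5
```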

--- From the abstract, this sounds pretty similar to Sharad Goel's paper...
to:NB  algorithmic_fairness 
13 hours ago by cshalizi
[1909.04883] Learning Vector-valued Functions with Local Rademacher Complexity
"We consider a general family of problems in which the output space admits vector-valued structure, covering a broad family of important domains, e.g. multi-label learning and multi-class classification. Using local Rademacher complexity and unlabeled data, we derive novel data-dependent excess risk bounds for vector-valued functions in both linear and kernel spaces. The proposed bounds are much sharper than existing bounds and can be applied to specific vector-valued tasks with different hypothesis sets and loss functions. The theoretical analysis motivates us to devise a unified learning framework for vector-valued functions, which is solved by proximal gradient descent on the primal and achieves a much better tradeoff between accuracy and efficiency. Empirical results on several benchmark datasets show that the proposed algorithm significantly outperforms the compared methods, which coincides with our theoretical analysis."
to:NB  learning_theory  to_read 
14 hours ago by cshalizi
[1909.00835] Evolutionary reinforcement learning of dynamical large deviations
"We show how to calculate dynamical large deviations using evolutionary reinforcement learning. An agent, a stochastic model, propagates a continuous-time Monte Carlo trajectory, and receives a reward conditioned upon the values of certain path-extensive quantities. Evolution produces progressively fitter agents, allowing the calculation of a piece of a large-deviation rate function for a particular model and path-extensive quantity. For models with small state spaces the evolutionary process acts directly on rates, and for models with large state spaces the process acts on the weights of a neural network that parameterizes the model's rates. The present approach shows how path-extensive physics problems can be considered within a framework widely used in machine learning."
to:NB  reinforcement_learning  large_deviations  computational_statistics  probability  re:fitness_sampling 
14 hours ago by cshalizi
[1904.04052] Practical tests for significance in Markov Chains
"We give qualitative and quantitative improvements to theorems which enable significance testing in Markov Chains, with a particular eye toward the goal of enabling strong, interpretable, and statistically rigorous claims of political gerrymandering. Our results can be used to demonstrate at a desired significance level that a given Markov Chain state (e.g., a districting) is extremely unusual (rather than just atypical) with respect to the fragility of its characteristics in the chain. We also provide theorems specialized to leverage quantitative improvements when there is a product structure in the underlying probability space, as can occur due to geographical constraints on districtings."
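The underlying "outlier" test (due to Chikina, Frieze, and Pegden, whose theorems this paper improves) runs a reversible chain from the observed state and checks whether that state's statistic is an ε-outlier among the trajectory, which certifies significance at level √(2ε). A minimal sketch on an assumed toy chain, a lazy reflected random walk with the state itself as the statistic:

```python
import numpy as np

def rank_test_pvalue(stat0, traj_stats):
    # If the observed start of a reversible chain is typical of the
    # stationary distribution, its statistic's rank in the trajectory
    # is roughly uniform; being an eps-outlier certifies significance
    # at level sqrt(2 * eps) (Chikina-Frieze-Pegden).
    eps = np.mean(np.array(traj_stats) <= stat0)
    return np.sqrt(2 * eps)

# Toy reversible chain: lazy reflected walk on {0, ..., 99};
# the observed state 0 is extreme for this walk.
rng = np.random.default_rng(6)
state, traj = 0, []
for _ in range(2000):
    state = min(99, max(0, state + rng.choice([-1, 0, 1])))
    traj.append(state)
p = rank_test_pvalue(0, traj)
print(p)
```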
to:NB  markov_models  hypothesis_testing  anomaly_detection  to_be_shot_after_a_fair_trial 
16 hours ago by cshalizi
[1809.08686] On the quenched CLT for stationary random fields under projective criteria
"Motivated by random evolutions which do not start from equilibrium, in a recent work, Peligrad and Volný (2018) showed that the quenched CLT (central limit theorem) holds for ortho-martingale random fields. In this paper, we study the quenched CLT for a class of random fields larger than the ortho-martingales. To get the results, we impose sufficient conditions in terms of projective criteria under which the partial sums of a stationary random field admit an ortho-martingale approximation. More precisely, the sufficient conditions are of Hannan's projective type. As applications, we establish quenched CLTs for linear and nonlinear random fields with independent innovations."
to:NB  stochastic_processes  random_fields  martingales  central_limit_theorem 
16 hours ago by cshalizi
[1909.04706] Regression to the Mean's Impact on the Synthetic Control Method: Bias and Sensitivity Analysis
"To make informed policy recommendations from observational data, we must be able to discern true treatment effects from random noise and effects due to confounding. Difference-in-Differences techniques that match treated units to control units based on pre-treatment outcomes, such as the synthetic control approach, have been presented as principled methods to account for confounding. However, we show that use of synthetic controls or other matching procedures can introduce regression to the mean (RTM) bias into estimates of the average treatment effect on the treated. Through simulations, we show that RTM bias can lead to inflated type I error rates as well as decreased power in typical policy evaluation settings. Further, we provide a novel correction for RTM bias which can reduce bias and attain appropriate type I error rates. This correction can be used to perform a sensitivity analysis which determines how results may be affected by RTM. We use our proposed correction and sensitivity analysis to reanalyze data concerning the effects of California's Proposition 99, a large-scale tobacco control program, on statewide smoking rates."
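Regression to the mean is easy to reproduce in a few lines: select units on an extreme noisy pre-period outcome, and their post-period mean drifts back toward the population mean even with no treatment at all. A minimal simulation under assumed unit effects and noise scales (not the paper's synthetic-control setup):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 100000
mu = rng.normal(size=n)            # each unit's stable mean outcome
pre = mu + rng.normal(size=n)      # noisy pre-treatment outcome
post = mu + rng.normal(size=n)     # noisy post-treatment outcome; no effect
low = pre < np.quantile(pre, 0.1)  # "match" on an extreme pre-period value
# The selected units' post-period mean regresses toward the population
# mean, mimicking a treatment effect where none exists.
print(pre[low].mean(), post[low].mean())
```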
to:NB  causal_inference  statistics  economics  small.dylan 
16 hours ago by cshalizi
[1909.04890] Aggregated Hold-Out
"Aggregated hold-out (Agghoo) is a method which averages learning rules selected by hold-out (that is, cross-validation with a single split). We provide the first theoretical guarantees on Agghoo, ensuring that it can be used safely: Agghoo performs at worst like the hold-out when the risk is convex. The same holds true in classification with the 0-1 risk, with an additional constant factor. For the hold-out, oracle inequalities are known for bounded losses, as in binary classification. We show that similar results can be proved, under appropriate assumptions, for other risk-minimization problems. In particular, we obtain an oracle inequality for regularized kernel regression with a Lipschitz loss, without requiring that the Y variable or the regressors be bounded. Numerical experiments show that aggregation brings a significant improvement over the hold-out and that Agghoo is competitive with cross-validation."
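A minimal sketch of the Agghoo idea, here instantiated with ridge regression as the family of learning rules (my choice of example, with an assumed penalty grid and V = 5 splits): each split selects a penalty by hold-out, and the V selected predictors are averaged.

```python
import numpy as np

def agghoo_predict(X, y, X_test, lambdas, V=5, seed=0):
    # Aggregated hold-out for ridge regression: on each of V random
    # splits, pick the penalty with the smallest hold-out error, fit on
    # the training half, then average the V selected predictors.
    rng = np.random.default_rng(seed)
    n, preds = len(y), []
    for _ in range(V):
        idx = rng.permutation(n)
        tr, ho = idx[: n // 2], idx[n // 2 :]
        best = None
        for lam in lambdas:
            w = np.linalg.solve(X[tr].T @ X[tr] + lam * np.eye(X.shape[1]),
                                X[tr].T @ y[tr])
            err = np.mean((X[ho] @ w - y[ho]) ** 2)
            if best is None or err < best[0]:
                best = (err, w)
        preds.append(X_test @ best[1])
    return np.mean(preds, axis=0)

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + 0.1 * rng.normal(size=200)
X_test = rng.normal(size=(50, 5))
yhat = agghoo_predict(X, y, X_test, lambdas=[0.01, 0.1, 1.0, 10.0])
mse = np.mean((yhat - X_test @ w_true) ** 2)
print(mse)
```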
to:NB  cross-validation  model_selection  statistics  arlot.sylvain  to_read 
16 hours ago by cshalizi
[1908.04358] Graph hierarchy and spread of infections
"Because trophic levels, and hence trophic coherence, can be defined only on networks with well-defined sources, trophic analysis of networks had until now been restricted to the ecological domain. Trophic coherence, a measure of a network's hierarchical organisation, has been shown to be linked to a network's structural and dynamical aspects. In this paper we introduce hierarchical levels, a generalisation of trophic levels that can be defined on any simple graph, and we interpret them as a network influence metric. We discuss how our generalisation relates to the previous definition and what new insights it sheds on the topological and dynamical aspects of networks. We also show that the mean of hierarchical differences correlates strongly with the topology of the graph. Finally, we model epidemiological dynamics and show how the statistical properties of hierarchical differences relate to the incidence rate and how they affect the spreading process in an SIS model."
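For reference, the classical trophic levels that hierarchical levels generalise solve a small linear system: each node's level is one plus the mean level of its prey, with basal (source) nodes at level 1. A sketch on an assumed three-node food chain:

```python
import numpy as np

def trophic_levels(A):
    # Classical trophic levels: s_i = 1 + average level of i's prey,
    # with basal nodes (no incoming edges) at level 1. Here A[i, j] = 1
    # means i consumes j.
    n = A.shape[0]
    indeg = A.sum(axis=1)
    M = np.eye(n) - A / np.maximum(indeg, 1)[:, None]
    return np.linalg.solve(M, np.ones(n))

# Simple chain: node 0 (basal) is eaten by 1, which is eaten by 2.
A = np.zeros((3, 3))
A[1, 0] = 1
A[2, 1] = 1
levels = trophic_levels(A)
print(levels)   # [1. 2. 3.]
```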
to:NB  epidemics_on_networks  re:do-institutions-evolve  to_read 
16 hours ago by cshalizi
[1907.09551] Cell differentiation: what have we learned in 50 years?
"I revisit two theories of cell differentiation in multicellular organisms published a half-century ago, Stuart Kauffman's global gene regulatory dynamics (GGRD) model and Roy Britten's and Eric Davidson's modular gene regulatory network (MGRN) model, in light of newer knowledge of mechanisms of gene regulation in the metazoans (animals). The two models continue to inform hypotheses and computational studies of differentiation of lineage-adjacent cell types. However, their shared notion (based on bacterial regulatory systems) of gene switches and networks built from them, have constrained progress in understanding the dynamics and evolution of differentiation. Recent work has described unique write-read-rewrite chromatin-based expression encoding in eukaryotes, as well metazoan-specific processes of gene activation and silencing in condensed-phase, enhancer-recruiting regulatory hubs, employing disordered proteins, including transcription factors, with context-dependent identities. These findings suggest an evolutionary scenario in which the origination of differentiation in animals, rather than depending exclusively on adaptive natural selection, emerged as a consequence of a type of multicellularity in which the novel metazoan gene regulatory apparatus was readily mobilized to amplify and exaggerate inherent cell functions of unicellular ancestors. The plausibility of this hypothesis is illustrated by the evolution of the developmental role of Grainyhead-like in the formation of epithelium."
to:NB  developmental_biology  evolutionary_biology  gene_regulation  biochemical_networks 
16 hours ago by cshalizi
[1909.03166] Equalizing Recourse across Groups
"The rise in machine learning-assisted decision-making has led to concerns about the fairness of the decisions and techniques to mitigate problems of discrimination. If a negative decision is made about an individual (denying a loan, rejecting an application for housing, and so on) justice dictates that we be able to ask how we might change circumstances to get a favorable decision the next time. Moreover, the ability to change circumstances (a better education, improved credentials) should not be limited to only those with access to expensive resources. In other words, "recourse" for negative decisions should be considered a desirable value that can be equalized across (demographically defined) groups. This paper describes how to build models that make accurate predictions while still ensuring that the penalties for a negative outcome do not disadvantage different groups disproportionately. We measure recourse as the distance of an individual from the decision boundary of a classifier. We then introduce a regularized objective to minimize the difference in recourse across groups. We explore linear settings and further extend recourse to non-linear settings as well as model-agnostic settings where the exact distance from boundary cannot be calculated. We measure recourse as the distance of an individual from the decision boundary of a classifier. Our results show that we can successfully decrease the unfairness in recourse while maintaining classifier performance."
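The paper's measure of recourse, distance from the decision boundary, has a closed form for a linear classifier: |w·x + b| / ||w||. A minimal sketch with assumed weights and points:

```python
import numpy as np

def recourse_distance(w, b, X):
    # Euclidean distance from each point to the decision boundary
    # {x : w.x + b = 0} of a linear classifier.
    return np.abs(X @ w + b) / np.linalg.norm(w)

w, b = np.array([3.0, 4.0]), -5.0                 # hypothetical classifier
X = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 0.5]])
d = recourse_distance(w, b, X)
print(d)   # distances 1.0, 4.0, 0.0
```

Equalizing the group-wise means of such distances is what the proposed regularizer targets.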

--- This seems like a very odd definition of "recourse", though Suresh is generally sensible so the last tag applies charitably.
to:NB  to_read  algorithmic_fairness  venkatasubramanian.suresh  decision-making  to_be_shot_after_a_fair_trial 
16 hours ago by cshalizi
[1908.01297] A Restricted Black-box Adversarial Framework Towards Attacking Graph Embedding Models
"With the great success of graph embedding models in both academia and industry, the robustness of graph embeddings against adversarial attack inevitably becomes a central problem in the graph learning domain. Despite fruitful progress, most current works perform their attacks in a white-box fashion: they need access to the model's predictions and labels to construct their adversarial loss. However, the inaccessibility of model predictions in real systems makes white-box attacks impractical for real graph learning systems. This paper promotes current frameworks in a more general and flexible sense: we aim to attack various kinds of graph embedding models in a black-box fashion. To this end, we begin by investigating the theoretical connections between graph signal processing and graph embedding models in a principled way, and formulate graph embedding models as general graph signal processes with corresponding graph filters. On this basis, we construct a generalized adversarial attacker, GF-Attack, from the graph filter and feature matrix. Instead of accessing any knowledge of the target classifiers used with the graph embeddings, GF-Attack performs the attack only on the graph filter, in a black-box fashion. To validate the generality of GF-Attack, we construct the attacker for four popular graph embedding models. Extensive experimental results validate the effectiveness of our attacker on several benchmark datasets. In particular, even small graph perturbations, such as a one-edge flip, consistently produce strong attacks on different graph embedding models."
to:NB  network_data_analysis  adversarial_examples 
16 hours ago by cshalizi
[1904.08554] Gotta Catch 'Em All: Using Concealed Trapdoors to Detect Adversarial Attacks on Neural Networks
"Deep neural networks are vulnerable to adversarial attacks. Numerous efforts have focused on defenses that either try to patch 'holes' in trained models or try to make it difficult or costly to compute adversarial examples exploiting these holes. In our work, we explore a counter-intuitive approach of constructing 'adversarial trapdoors.' Unlike prior works that try to patch or disguise vulnerable points in the manifold, we intentionally inject 'trapdoors,' artificial weaknesses in the manifold that attract optimized perturbation into certain pre-embedded local optima. As a result, the adversarial generation functions naturally gravitate towards our trapdoors, producing adversarial examples that the model owner can recognize through a known neuron activation signature. In this paper, we introduce trapdoors and describe an implementation of trapdoors using similar strategies to backdoor/Trojan attacks. We show that by proactively injecting trapdoors into the models (and extracting their neuron activation signature), we can detect adversarial examples generated by the state of the art attacks (Projected Gradient Descent, Optimization based CW, and Elastic Net) with high detection success rate and negligible impact on normal inputs. These results also generalize across multiple classification domains (image recognition, face recognition and traffic sign recognition). We explore different properties of trapdoors, and discuss potential countermeasures (adaptive attacks) and mitigations."
to:NB  adversarial_examples 
16 hours ago by cshalizi
[1907.11932] Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment
"Machine learning algorithms are often vulnerable to adversarial examples that have imperceptible alterations from the original counterparts but can fool the state-of-the-art models. It is helpful to evaluate or even improve the robustness of these models by exposing the maliciously crafted adversarial examples. In this paper, we present TextFooler, a simple but strong baseline to generate natural adversarial text. By applying it to two fundamental natural language tasks, text classification and textual entailment, we successfully attacked three target models, including the powerful pre-trained BERT, and the widely used convolutional and recurrent neural networks. We demonstrate the advantages of this framework in three ways: (1) effective---it outperforms state-of-the-art attacks in terms of success rate and perturbation rate, (2) utility-preserving---it preserves semantic content and grammaticality, and remains correctly classified by humans, and (3) efficient---it generates adversarial text with computational complexity linear to the text length."
to:NB  adversarial_examples  text_mining 
16 hours ago by cshalizi
[1905.07360] Contrastive Fairness in Machine Learning
"Was it fair that Harry was hired but not Barry? Was it fair that Pam was fired instead of Sam? How can one ensure fairness when an intelligent algorithm takes these decisions instead of a human? How can one ensure that the decisions were taken based on merit and not on protected attributes like race or sex? These are the questions that must be answered now that many decisions in real life can be made through machine learning. However research in fairness of algorithms has focused on the counterfactual questions "what if?" or "why?", whereas in real life most subjective questions of consequence are contrastive: "why this but not that?". We introduce concepts and mathematical tools using causal inference to address contrastive fairness in algorithmic decision-making with illustrative examples."
to:NB  causal_inference  algorithmic_fairness  statistics 
16 hours ago by cshalizi
[1906.00232] Kernel Instrumental Variable Regression
"Instrumental variable regression is a strategy for learning causal relationships in observational data. If measurements of input X and output Y are confounded, the causal relationship can nonetheless be identified if an instrumental variable Z is available that influences X directly, but is conditionally independent of Y given X and the unmeasured confounder. The classic two-stage least squares algorithm (2SLS) simplifies the estimation problem by modeling all relationships as linear functions. We propose kernel instrumental variable regression (KIV), a nonparametric generalization of 2SLS, modeling relations among X, Y, and Z as nonlinear functions in reproducing kernel Hilbert spaces (RKHSs). We prove the consistency of KIV under mild assumptions, and derive conditions under which the convergence rate achieves the minimax optimal rate for unconfounded, one-stage RKHS regression. In doing so, we obtain an efficient ratio between training sample sizes used in the algorithm's first and second stages. In experiments, KIV outperforms state of the art alternatives for nonparametric instrumental variable regression. Of independent interest, we provide a more general theory of conditional mean embedding regression in which the RKHS has infinite dimension."
to:NB  instrumental_variables  kernel_estimators  regression  nonparametrics  causal_inference  statistics  re:ADAfaEPoV  to_read 
16 hours ago by cshalizi
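--- The classic 2SLS baseline that KIV generalizes fits in a dozen lines; a minimal sketch on simulated data (this is the plain linear version, not the kernelized one, and the data-generating numbers are invented for illustration):

```python
import numpy as np

def two_stage_least_squares(z, x, y):
    """Classic 2SLS: regress x on instrument z, then y on fitted x.

    z, x, y are 1-d arrays; intercepts are included in both stages.
    """
    Z = np.column_stack([np.ones_like(z), z])
    # Stage 1: project the confounded regressor onto the instrument.
    beta1, *_ = np.linalg.lstsq(Z, x, rcond=None)
    x_hat = Z @ beta1
    # Stage 2: regress the outcome on the fitted regressor.
    X_hat = np.column_stack([np.ones_like(x_hat), x_hat])
    beta2, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta2  # [intercept, causal slope]

rng = np.random.default_rng(0)
n = 20000
u = rng.normal(size=n)                 # unmeasured confounder
z = rng.normal(size=n)                 # instrument: affects x, not y directly
x = 1.0 * z + u + rng.normal(size=n)
y = 2.0 * x + 3.0 * u + rng.normal(size=n)   # true causal slope is 2
slope_2sls = two_stage_least_squares(z, x, y)[1]
# Naive OLS of y on x is badly biased here because u drives both.
slope_ols = np.linalg.lstsq(np.column_stack([np.ones_like(x), x]), y,
                            rcond=None)[0][1]
```

2SLS recovers a slope near 2 while OLS lands near 3; KIV's move is to replace both linear stages with RKHS regressions.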
[1909.03217] Community detection in inhomogeneous random graphs
"We study the problem of detecting whether an inhomogeneous random graph contains a planted community. Specifically, we observe a single realization of a graph. Under the null hypothesis, this graph is a sample from an inhomogeneous random graph, whereas under the alternative, there exists a small subgraph where the edge probabilities are increased by a multiplicative scaling factor. We present a scan test that is able to detect the presence of such a planted community, even when this community is very small and the underlying graph is inhomogeneous. We also derive an information theoretic lower bound for this problem which shows that in some regimes the scan test is almost asymptotically optimal. We illustrate our results through examples and numerical experiments."
to:NB  community_discovery  network_data_analysis  statistics 
16 hours ago by cshalizi
[1909.03302] On the Optimality of Gaussian Kernel Based Nonparametric Tests against Smooth Alternatives
"Nonparametric tests via kernel embedding of distributions have witnessed a great deal of practical successes in recent years. However, statistical properties of these tests are largely unknown beyond consistency against a fixed alternative. To fill in this void, we study here the asymptotic properties of goodness-of-fit, homogeneity and independence tests using Gaussian kernels, arguably the most popular and successful among such tests. Our results provide theoretical justifications for this common practice by showing that tests using Gaussian kernel with an appropriately chosen scaling parameter are minimax optimal against smooth alternatives in all three settings. In addition, our analysis also pinpoints the importance of choosing a diverging scaling parameter when using Gaussian kernels and suggests a data-driven choice of the scaling parameter that yields tests optimal, up to an iterated logarithmic factor, over a wide range of smooth alternatives. Numerical experiments are also presented to further demonstrate the practical merits of the methodology."
to:NB  kernel_methods  hilbert_space  goodness-of-fit  hypothesis_testing  statistics 
16 hours ago by cshalizi
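--- For intuition, the core statistic behind such kernel tests, the Gaussian-kernel MMD, can be sketched in a few lines (a biased V-statistic with a fixed bandwidth; the paper's point is precisely that the bandwidth should be chosen more carefully):

```python
import numpy as np

def gaussian_mmd2(x, y, bandwidth):
    """Squared MMD between 1-d samples x and y under a Gaussian kernel
    (biased V-statistic: diagonal terms included)."""
    def k(a, b):
        d2 = (a[:, None] - b[None, :]) ** 2
        return np.exp(-d2 / (2 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(1)
# Same distribution: MMD^2 should be near zero.
same = gaussian_mmd2(rng.normal(size=500), rng.normal(size=500), bandwidth=1.0)
# Shifted distribution: MMD^2 should be clearly positive.
diff = gaussian_mmd2(rng.normal(size=500), rng.normal(loc=1.0, size=500),
                     bandwidth=1.0)
```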
[1902.02408] Weak consistency of the 1-nearest neighbor measure with applications to missing data
"When data is partially missing at random, imputation and importance weighting are often used to estimate moments of the unobserved population. In this paper, we study 1-nearest neighbor (1NN) importance weighting, which estimates moments by replacing missing data with the complete data that is the nearest neighbor in the non-missing covariate space. We define an empirical measure, the 1NN measure, and show that it is weakly consistent for the measure of the missing data. The main idea behind this result is that the 1NN measure is performing inverse probability weighting in the limit. We study applications to missing data and mitigating the impact of covariate shift in prediction tasks."
to:NB  missing_data  density_estimation  statistics  sharpnack.james  re:ADAfaEPoV 
16 hours ago by cshalizi
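--- The 1NN idea in miniature (a 1-d missing-at-random simulation of my own devising, not the paper's setting): replace each missing outcome with the outcome of the nearest fully observed point in covariate space, then take the mean.

```python
import numpy as np

def one_nn_impute_mean(x, y, observed):
    """Estimate E[Y] when some y are missing: fill each missing y with the
    y of the nearest observed neighbour in the covariate x."""
    x_obs, y_obs = x[observed], y[observed]
    filled = y.copy()
    for i in np.flatnonzero(~observed):
        j = np.argmin(np.abs(x_obs - x[i]))   # 1-NN in covariate space
        filled[i] = y_obs[j]
    return filled.mean()

rng = np.random.default_rng(2)
n = 5000
x = rng.normal(size=n)
y = 2 * x + rng.normal(scale=0.1, size=n)    # y strongly predicted by x
p_obs = 1 / (1 + np.exp(-x))                 # missing at random given x
observed = rng.random(n) < p_obs
naive = y[observed].mean()                   # biased: over-represents large x
nn_est = one_nn_impute_mean(x, y, observed)  # close to the true E[Y] = 0
```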
[1509.02556] Identification, Doubly Robust Estimation, and Semiparametric Efficiency Theory of Nonignorable Missing Data With a Shadow Variable
"We consider identification and estimation with an outcome missing not at random (MNAR). We study an identification strategy based on a so-called shadow variable. A shadow variable is assumed to be correlated with the outcome, but independent of the missingness process conditional on the outcome and fully observed covariates. We describe a general condition for nonparametric identification of the full data law under MNAR using a valid shadow variable. Our condition is satisfied by many commonly-used models; moreover, it is imposed on the complete cases, and therefore has testable implications with observed data only. We describe semiparametric estimation methods and evaluate their performance on both simulation data and a real data example. We characterize the semiparametric efficiency bound for the class of regular and asymptotically linear estimators, and derive a closed form for the efficient influence function."
to:NB  statistics  missing_data 
16 hours ago by cshalizi
On Whorfian Socioeconomics by Thomas B. Pepinsky :: SSRN
"Whorfian socioeconomics is an emerging interdisciplinary field of study that holds that linguistic structures explain differences in beliefs, values, and opinions across communities. Its core empirical strategy is to document a correlation between the presence or absence of a linguistic feature in a survey respondent’s language, and her/his responses to survey questions. This essay demonstrates — using the universe of linguistic features from the World Atlas of Language Structures and a wide array of responses from the World Values Survey — that such an approach produces highly statistically significant correlations in a majority of analyses, irrespective of the theoretical plausibility linking linguistic features to respondent beliefs. These results raise the possibility that correlations between linguistic features and survey responses are actually spurious. The essay concludes by showing how two simple and well-understood statistical fixes can more accurately reflect uncertainty in these analyses, reducing the temptation for analysts to create implausible Whorfian theories to explain spurious linguistic correlations."
to:NB  to_read  linguistics  economics  social_science_methodology  pepinsky.thomas_b.  debunking 
16 hours ago by cshalizi
[1909.03968] Tree-based Control Methods: Consequences of Moving the US Embassy
"We recast the synthetic controls for evaluating policies as a counterfactual prediction problem and replace its linear regression with a non-parametric model inspired by machine learning. The proposed method enables us to achieve more accurate counterfactual predictions. We apply our method to a highly-debated policy: the movement of the US embassy to Jerusalem. In Israel and Palestine, we find that the average number of weekly conflicts has increased by roughly 103 % over 48 weeks since the movement was announced on December 6, 2017. Using conformal inference tests, we justify our model and find the increase to be statistically significant."

--- I am very skeptical of the application, but interested in the methodology.
to:NB  causal_inference  statistics  economics  nonparametrics  regression  re:ADAfaEPoV 
16 hours ago by cshalizi
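--- The linear synthetic-control step the paper replaces can be sketched directly (simplex weights on donor units fitted by least squares with a crude projection; the series and effect size below are invented and have nothing to do with the embassy data):

```python
import numpy as np

def synth_weights(donors_pre, treated_pre):
    """Donor weights by least squares, crudely projected onto the simplex
    (clip negatives, renormalise). Real synthetic control solves the
    constrained problem exactly; this keeps the sketch dependency-free."""
    w, *_ = np.linalg.lstsq(donors_pre, treated_pre, rcond=None)
    w = np.clip(w, 0.0, None)
    return w / w.sum()

rng = np.random.default_rng(6)
T, T0 = 60, 40                          # 60 periods, intervention at t = 40
t = np.arange(T)
# Four control-unit trajectories with distinct shapes plus noise.
shapes = [np.sin(t / 5), np.cos(t / 9), np.sin(t / 3), 0.02 * t]
donors = np.column_stack([s + a + rng.normal(scale=0.02, size=T)
                          for s, a in zip(shapes, [1.0, 2.0, 1.5, 0.5])])
treated = donors @ np.array([0.6, 0.4, 0.0, 0.0]) + rng.normal(scale=0.02, size=T)
treated[T0:] += 2.0                     # true treatment effect of +2
w = synth_weights(donors[:T0], treated[:T0])
counterfactual = donors[T0:] @ w        # predicted untreated path
effect = (treated[T0:] - counterfactual).mean()
```

The paper's proposal is to swap the linear fit above for a tree-based predictor of the counterfactual path.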
[1909.03238] Linear response and moderate deviations: hierarchical approach. V
"The Moderate Deviations Principle (MDP) is well-understood for sums of independent random variables, worse understood for stationary random sequences, and scantily understood for random fields. Here it is established for some planar random fields of the form Xt=ψ(Gt) obtained from a Gaussian random field Gt via a function ψ, and consequently, for zeroes of the Gaussian Entire Function."
to:NB  central_limit_theorem  random_fields  stochastic_processes 
16 hours ago by cshalizi
[1909.03320] Matrix Calculations for Moments of Markov Processes
"Matryoshka dolls, the traditional Russian nesting figurines, are known world-wide for each doll's encapsulation of a sequence of smaller dolls. In this paper, we identify a large class of Markov processes whose moments are easy to compute by exploiting the structure of a new sequence of nested matrices we call Matryoshkhan matrices. We characterize the salient properties of Matryoshkhan matrices that allow us to compute these moments in closed form at a specific time without computing the entire path of the process. This speeds up the computation of the Markov process moments significantly in comparison to traditional differential equation methods, which we demonstrate through numerical experiments. Through our method, we derive explicit expressions for both transient and steady-state moments of this class of Markov processes. We demonstrate the applicability of this method through explicit examples such as shot-noise processes, growth-collapse processes, linear birth-death-immigration processes, and affine stochastic differential equations from the finance literature. We also show that we can derive explicit expressions for the self-exciting Hawkes process, for which finding closed form moment expressions has been an open problem since its introduction."
to:NB  markov_models  stochastic_processes  computational_statistics 
16 hours ago by cshalizi
[1909.03479] Maximum principle for stochastic recursive optimal control problem under model uncertainty
"In this paper, we consider a stochastic recursive optimal control problem under model uncertainty. In this framework, the cost function is described by solutions of a family of backward stochastic differential equations. With the help of the linearization techniques and weak convergence methods, we derive the corresponding stochastic maximum principle. Moreover, a linear quadratic robust control problem is also studied."
to:NB  control_theory_and_control_engineering  stochastic_processes 
16 hours ago by cshalizi
[1909.03805] Large Time Behaviour and the Second Eigenvalue Problem for Finite State Mean-Field Interacting Particle Systems
"This article examines large time behaviour and the second eigenvalue problem for Markovian mean-field interacting particle systems with jumps. Our first main result is on the time required for convergence of the empirical measure process of the particle system to its invariant measure; we show that there is a constant Λ≥0 such that, when there are N particles in the system and when time is of the order exp{N(Λ+O(1))}, the process has mixed well and is very close to its invariant measure. We then obtain large-N asymptotics of the second eigenvalue of the generator associated with the empirical measure process when it is reversible with respect to its invariant measure. We show that its absolute value scales like exp{−NΛ}. The main tools used in establishing our results are the large deviation properties of the empirical measure process from its large-N limit. As an application of the study of large time behaviour, we also show convergence of the empirical measure of the system of particles to a global minimum of a certain entropy function when particles are added over time in a controlled fashion. The controlled addition of particles is analogous to the cooling schedule associated with the search for a global minimum of a function using the simulated annealing algorithm."
to:NB  interacting_particle_systems  markov_models  stochastic_processes  large_deviations  re:do-institutions-evolve 
16 hours ago by cshalizi
[1811.06448] From weakly interacting particles to a regularised Dean--Kawasaki model
"The evolution of finitely many particles obeying Langevin dynamics is described by Dean-Kawasaki equations, a class of stochastic equations featuring a non-Lipschitz multiplicative noise in divergence form. We derive a regularised Dean-Kawasaki model based on second order Langevin dynamics by analysing a system of particles interacting via a pairwise potential. Key tools of our analysis are the propagation of chaos and Simon's compactness criterion. The model we obtain is a small-noise stochastic perturbation of the undamped McKean-Vlasov equation. We also provide a high-probability result for existence and uniqueness for our model."
to:NB  interacting_particle_systems  stochastic_processes 
16 hours ago by cshalizi
[1909.03550] Lecture Notes: Optimization for Machine Learning
"Lecture notes on optimization for machine learning, derived from a course at Princeton University and tutorials given in MLSS, Buenos Aires, as well as Simons Foundation, Berkeley."
to:NB  optimization  learning_theory  hazan.elad 
16 hours ago by cshalizi
[1909.03586] Curve Fitting from Probabilistic Emissions and Applications to Dynamic Item Response Theory
"Item response theory (IRT) models are widely used in psychometrics and educational measurement, being deployed in many high stakes tests such as the GRE aptitude test. IRT has largely focused on estimation of a single latent trait (e.g. ability) that remains static through the collection of item responses. However, in contemporary settings where item responses are being continuously collected, such as Massive Open Online Courses (MOOCs), interest will naturally be on the dynamics of ability, thus complicating usage of traditional IRT models. We propose DynAEsti, an augmentation of the traditional IRT Expectation Maximization algorithm that allows ability to be a continuously varying curve over time. In the process, we develop CurvFiFE, a novel non-parametric continuous-time technique that handles the curve-fitting/regression problem extended to address more general probabilistic emissions (as opposed to simply noisy data points). Furthermore, to accomplish this, we develop a novel technique called grafting, which can successfully approximate distributions represented by graphical models when other popular techniques like Loopy Belief Propagation (LBP) and Variational Inference (VI) fail. The performance of DynAEsti is evaluated through simulation, where we achieve results comparable to the optimal of what is observed in the static ability scenario. Finally, DynAEsti is applied to a longitudinal performance dataset (80-years of competitive golf at the 18-hole Masters Tournament) to demonstrate its ability to recover key properties of human performance and the heterogeneous characteristics of the different holes. Python code for CurvFiFE and DynAEsti is publicly available at this http URL. This is the full version of our ICDM 2019 paper."
to:NB  inference_to_latent_objects  graphical_models  statistics  psychometrics  non-stationarity 
16 hours ago by cshalizi
[1909.03681] Outlier Detection in High Dimensional Data
"High-dimensional data poses unique challenges in outlier detection process. Most of the existing algorithms fail to properly address the issues stemming from a large number of features. In particular, outlier detection algorithms perform poorly on data sets of small size with a large number of features. In this paper, we propose a novel outlier detection algorithm based on principal component analysis and kernel density estimation. The proposed method is designed to address the challenges of dealing with high-dimensional data by projecting the original data onto a smaller space and using the innate structure of the data to calculate anomaly scores for each data point. Numerical experiments on synthetic and real-life data show that our method performs well on high-dimensional data. In particular, the proposed method outperforms the benchmark methods as measured by the F1-score. Our method also produces better-than-average execution times compared to the benchmark methods."

--- Seems OK but ad hoc. Might make a decent extension to the eigendresses assignment for data mining.
to:NB  anomaly_detection  density_estimation  principal_components  high-dimensional_statistics  statistics  to_teach:data-mining 
16 hours ago by cshalizi
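--- A bare-bones version of the PCA-plus-KDE recipe as described in the abstract (my own reading, not the authors' code; the component count and bandwidth are arbitrary choices):

```python
import numpy as np

def pca_kde_scores(X, n_components=2, bandwidth=0.5):
    """Anomaly scores: project onto the leading principal components,
    then score each point by its (negative) Gaussian kernel density there."""
    Xc = X - X.mean(axis=0)
    # PCA via SVD; rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:n_components].T
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    dens = np.exp(-d2 / (2 * bandwidth ** 2)).mean(axis=1)
    return -dens   # higher score = more anomalous

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 50))   # 200 points, 50 features: "small n, large p"
X[0] += 8.0                      # plant one gross outlier
scores = pca_kde_scores(X)
top = int(np.argmax(scores))     # index of the most anomalous point
```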
[0804.4451] Dependence Structure Estimation via Copula
"Dependence structure estimation is one of the important problems in machine learning domain and has many applications in different scientific areas. In this paper, a theoretical framework for such estimation based on copula and copula entropy -- the probabilistic theory of representation and measurement of statistical dependence, is proposed. Graphical models are considered as a special case of the copula framework. A method of the framework for estimating maximum spanning copula is proposed. Due to copula, the method is irrelevant to the properties of individual variables, insensitive to outliers and able to deal with non-Gaussianity. Experiments on both simulated data and real dataset demonstrated the effectiveness of the proposed method."

--- How does this differ from the "nonparanormal" models we know and love around here?
to:NB  causal_discovery  copulas  statistics  graphical_models  to_be_shot_after_a_fair_trial 
16 hours ago by cshalizi
[1908.09946] An empirical comparison between stochastic and deterministic centroid initialisation for K-Means variations
"K-Means is one of the most used algorithms for data clustering and the usual clustering method for benchmarking. Despite its wide application it is well-known that it suffers from a series of disadvantages, such as the positions of the initial clustering centres (centroids), which can greatly affect the clustering solution. Over the years many K-Means variations and initialisation techniques have been proposed with different degrees of complexity. In this study we focus on common K-Means variations and deterministic initialisation techniques and we first show that more sophisticated initialisation methods reduce or alleviate the need for complex K-Means clustering, and secondly, that deterministic methods can achieve equivalent or better performance than stochastic methods. These conclusions are obtained through extensive benchmarking using different model data sets from various studies as well as clustering data sets."
to:NB  clustering  k-means  data_mining  to_teach:data-mining 
16 hours ago by cshalizi
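--- The deterministic-vs-stochastic comparison is easy to reproduce in miniature (a maximin initialiser and plain Lloyd iterations, not the specific variants benchmarked in the paper):

```python
import numpy as np

def lloyd(X, centroids, n_iter=50):
    """Plain Lloyd iterations from the given initial centroids;
    returns the final within-cluster sum of squares (inertia)."""
    for _ in range(n_iter):
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        for j in range(len(centroids)):
            pts = X[labels == j]
            if len(pts):
                centroids[j] = pts.mean(axis=0)
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return d.min(axis=1).sum()

def maximin_init(X, k):
    """Deterministic maximin initialisation: start from the point farthest
    from the data mean, then greedily add the point farthest from all
    centres chosen so far."""
    centres = [X[np.argmax(((X - X.mean(0)) ** 2).sum(1))]]
    while len(centres) < k:
        d = np.min([((X - c) ** 2).sum(1) for c in centres], axis=0)
        centres.append(X[np.argmax(d)])
    return np.array(centres, dtype=float)

rng = np.random.default_rng(4)
# Three well-separated Gaussian blobs.
X = np.concatenate([rng.normal(loc=m, scale=0.3, size=(100, 2))
                    for m in [(0, 0), (5, 0), (0, 5)]])
inertia_det = lloyd(X, maximin_init(X, 3))
inertia_rand = lloyd(X, X[rng.choice(len(X), 3, replace=False)].copy())
```

On well-separated blobs the maximin start lands one centre per cluster, so Lloyd converges to the good solution; a random start only does so some of the time.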
[1804.09220] A Useful Version of the Central Limit Theorem for a General Class of Markov Chains
"In the paper we propose certain conditions, relatively easy to verify, which ensure the central limit theorem for some general class of Markov chains. To justify the usefulness of our criterion, we further verify it for a particular discrete-time Markov dynamical system. From the application point of view, the examined system provides a useful tool in analysing the stochastic dynamics of gene expression in prokaryotes."
to:NB  central_limit_theorem  markov_models  stochastic_processes 
2 days ago by cshalizi
[1404.2035] Strongly continuous and locally equi-continuous semigroups on locally convex spaces
"We consider locally equi-continuous strongly continuous semigroups on locally convex spaces (X,tau). First, we show that if (X,tau) has the property that weak* compact sets of the dual are equi-continuous, then strong continuity of the semigroup is equivalent to weak continuity and local equi-continuity.
"Second, we consider locally convex spaces (X,tau) that are also equipped with a `suitable' auxiliary norm. We introduce the set N of tau continuous semi-norms that are bounded by the norm. If (X,tau) has the property that N is closed under countable convex combinations, then a number of Banach space results can be generalised in a straightforward way. Importantly, we extend the Hille-Yosida theorem.
"We apply the results to the study of transition semigroups of Markov processes on complete separable metric spaces."
to:NB  algebra  markov_models  stochastic_processes  re:almost_none 
2 days ago by cshalizi
[1412.8674] Infinite-dimensional stochastic differential equations and tail $σ$-fields
"We present general theorems solving the long-standing problem of the existence and pathwise uniqueness of strong solutions of infinite-dimensional stochastic differential equations (ISDEs) called interacting Brownian motions. These ISDEs describe the dynamics of infinitely many Brownian particles moving in ℝd with free potential Φ and mutual interaction potential Ψ.
"We apply the theorems to essentially all interaction potentials of Ruelle's class such as the Lennard-Jones 6-12 potential and Riesz potentials, and to logarithmic potentials appearing in random matrix theory. We solve ISDEs of the Ginibre interacting Brownian motion and the sineβ interacting Brownian motion with β=1,2,4. We also use the theorems in separate papers for the Airy and Bessel interacting Brownian motions. One of the critical points for proving the general theorems is to establish a new formulation of solutions of ISDEs in terms of tail σ-fields of labeled path spaces consisting of trajectories of infinitely many particles. These formulations are equivalent to the original notions of solutions of ISDEs, and more feasible to treat in infinite dimensions."
to:NB  stochastic_processes  stochastic_differential_equations  interacting_particle_systems  re:almost_none 
2 days ago by cshalizi
[1909.05794] Stationary distributions of continuous-time Markov chains: a review of theory and truncation-based approximations
"Computing the stationary distributions of a continuous-time Markov chain involves solving a set of linear equations. In most cases of interest, the number of equations is infinite or too large, and cannot be solved analytically or numerically. Several approximation schemes overcome this issue by truncating the state space to a manageable size. In this review, we first give a comprehensive theoretical account of the stationary distributions and their relation to the long-term behaviour of the Markov chain, which is readily accessible to non-experts and free of irreducibility assumptions made in standard texts. We then review truncation-based approximation schemes paying particular attention to their convergence and to the errors they introduce, and we illustrate their performance with an example of a stochastic reaction network of relevance in biology and chemistry. We conclude by elaborating on computational trade-offs associated with error control and some open questions."
to:NB  markov_models  stochastic_processes  re:almost_none  approximation 
2 days ago by cshalizi
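--- The truncation scheme in miniature, on a birth-death chain whose exact stationary law is known so the truncation error is checkable (my own toy, not one of the paper's reaction-network examples):

```python
import numpy as np

def stationary_truncated(rate_up, rate_down, size):
    """Approximate the stationary law of a birth-death CTMC on {0,1,2,...}
    by truncating the generator to {0,...,size-1} and solving pi Q = 0."""
    Q = np.zeros((size, size))
    for i in range(size):
        if i + 1 < size:
            Q[i, i + 1] = rate_up(i)
        if i > 0:
            Q[i, i - 1] = rate_down(i)
        Q[i, i] = -Q[i].sum()       # rows of a generator sum to zero
    # Solve pi Q = 0 with sum(pi) = 1 by replacing one redundant equation.
    A = np.vstack([Q.T[:-1], np.ones(size)])
    b = np.zeros(size)
    b[-1] = 1.0
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    return pi

# M/M/infinity-style chain: arrivals at rate 2, state-i departures at rate i.
# Its exact stationary law is Poisson(2), so truncating at 30 loses almost
# no mass and pi should match the Poisson probabilities closely.
pi = stationary_truncated(lambda i: 2.0, lambda i: float(i), size=30)
```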
[1909.05418] The Global Markov Property for a Mixture of DAGs
"Real causal processes may contain feedback loops and change over time. In this paper, we model cycles and non-stationary distributions using a mixture of directed acyclic graphs (DAGs). We then study the conditional independence (CI) relations induced by a density that factorizes according to a mixture of DAGs in two steps. First, we generalize d-separation for a single DAG to mixture d-separation for a mixture of DAGs. We then utilize the mixture d-separation criterion to derive a global Markov property that allows us to read off the CI relations induced by a mixture of DAGs using a particular summary graph. This result has potentially far reaching applications in algorithm design for causal discovery."
to:NB  causal_discovery  graphical_models  statistics  to_be_shot_after_a_fair_trial 
2 days ago by cshalizi
[1909.05495] Optimal choice of $k$ for $k$-nearest neighbor regression
"The k-nearest neighbor algorithm (k-NN) is a widely used non-parametric method for classification and regression. We study the mean squared error of the k-NN estimator when k is chosen by leave-one-out cross-validation (LOOCV). Although it was known that this choice of k is asymptotically consistent, it was not known previously that it is an optimal k. We show, with high probability, the mean squared error of this estimator is close to the minimum mean squared error using the k-NN estimate, where the minimum is over all choices of k."

--- Looks legit on first pass (and we know that LOOCV is generally _predictively_ good).
to:NB  regression  nearest_neighbors  statistics  cross-validation  to_teach:data-mining  have_skimmed 
2 days ago by cshalizi
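--- The setting is easy to simulate (a 1-d toy regression of my own, not the paper's analysis): pick k by leave-one-out CV and check it beats k = 1.

```python
import numpy as np

def knn_predict(x_train, y_train, x0, k):
    """1-d k-NN regression: average the y's of the k nearest x's."""
    idx = np.argsort(np.abs(x_train - x0))[:k]
    return y_train[idx].mean()

def loocv_mse(x, y, k):
    """Leave-one-out CV error of k-NN on the training set."""
    errs = []
    for i in range(len(x)):
        mask = np.arange(len(x)) != i
        errs.append((y[i] - knn_predict(x[mask], y[mask], x[i], k)) ** 2)
    return np.mean(errs)

rng = np.random.default_rng(5)
n = 200
x = rng.uniform(-3, 3, size=n)
y = np.sin(x) + rng.normal(scale=0.3, size=n)
ks = range(1, 40)
k_best = min(ks, key=lambda k: loocv_mse(x, y, k))   # LOOCV-chosen k
```

With noise level 0.3, k = 1 pays roughly twice the noise variance in MSE, so LOOCV reliably picks an intermediate k.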
[1909.05570] Sharp Large Deviations for empirical correlation coefficients
"We study Sharp Large Deviations for Pearson's empirical correlation coefficients in the Spherical and Gaussian cases."

--- Perhaps a problem set for _Almost None_?
to:NB  large_deviations  statistics  re:almost_none 
2 days ago by cshalizi
[1909.05582] A taxonomy of estimator consistency on discrete estimation problems
"We describe a four-level hierarchy mapping both all discrete estimation problems and all estimators on these problems, such that the hierarchy describes each estimator's consistency guarantees on each problem class. We show that no estimator is consistent for all estimation problems, but that some estimators, such as Maximum A Posteriori, are consistent for the widest possible class of discrete estimation problems. For Maximum Likelihood and Approximate Maximum Likelihood estimators we show that they do not provide consistency on as wide a class, but define a sub-class of problems characterised by their consistency. Lastly, we show that some popular estimators, specifically Strict Minimum Message Length, do not provide consistency guarantees even within the sub-class."
to:NB  statistics  estimation  model_selection 
2 days ago by cshalizi
Dolera, Regazzini: Uniform rates of the Glivenko–Cantelli convergence and their use in approximating Bayesian inferences
"This paper deals with suitable quantifications in approximating a probability measure by an "empirical" random probability measure p̂_n, depending on the first n terms of a sequence {ξ̃_i}_{i≥1} of random elements. Section 2 studies the range of oscillation near zero of the Wasserstein distance d^(p)_[S] between p_0 and p̂_n, assuming the ξ̃_i's i.i.d. from p_0. In Theorem 2.1, p_0 can be fixed in the space of all probability measures on (R^d, B(R^d)) and p̂_n coincides with the empirical measure ê_n := (1/n) ∑_{i=1}^n δ_{ξ̃_i}. In Theorem 2.2 (Theorem 2.3, respectively), p_0 is a d-dimensional Gaussian distribution (an element of a distinguished statistical exponential family, respectively) and p̂_n is another d-dimensional Gaussian distribution with estimated mean and covariance matrix (another element of the same family with an estimated parameter, respectively). These new results improve on allied recent works by providing also uniform bounds with respect to n, meaning the finiteness of the p-moment of sup_{n≥1} b_n d^(p)_[S](p_0, p̂_n) is proved for some diverging sequence b_n of positive numbers. In Section 3, assuming the ξ̃_i's exchangeable, one studies the range of oscillation near zero of the Wasserstein distance between the conditional distribution – also called posterior – of the directing measure of the sequence, given ξ̃_1, …, ξ̃_n, and the point mass at p̂_n. Similarly, a bound for the approximation of predictive distributions is given. Finally, Theorems from 3.3 to 3.5 reconsider Theorems from 2.1 to 2.3, respectively, according to a Bayesian perspective."
to:NB  empirical_processes  statistics  re:fitness_sampling 
2 days ago by cshalizi
[1909.05299] Counterfactual Cross-Validation: Effective Causal Model Selection from Observational Data
"What is the most effective way to select the best causal model among potential candidates? In this paper, we propose a method to effectively select the best individual-level treatment effect (ITE) predictors from a set of candidates using only an observational validation set. In model selection or hyperparameter tuning, we are interested in choosing the best model or the value of hyperparameter from potential candidates. Thus, we focus on accurately preserving the rank order of the ITE prediction performance of candidate causal models. The proposed evaluation metric is theoretically proved to preserve the true ranking of the model performance in expectation and to minimize the upper bound of the finite sample uncertainty in model selection. Consistent with the theoretical result, empirical experiments demonstrate that our proposed method is more likely to select the best model and set of hyperparameter in both model selection and hyperparameter tuning."

--- Their goal is a good one, but the fundamental issue is that we don't _have_ observations of individual-level causal effects to cross-validate against. So they proxy that by a very standard doubly-robust estimator of said effects; at which point, why not just use that estimator? In any case, I want to see comparisons to the Naive Statistician's approach of just cross-validating for outcomes (rather than differences in potential outcomes).
to:NB  cross-validation  causal_inference  statistics  model_selection  have_skimmed  shot_after_a_fair_trial 
2 days ago by cshalizi
Capitalism, Alone — Branko Milanovic | Harvard University Press
"We are all capitalists now. For the first time in human history, the globe is dominated by one economic system. In Capitalism, Alone, leading economist Branko Milanovic explains the reasons for this decisive historical shift since the days of feudalism and, later, communism. Surveying the varieties of capitalism, he asks: What are the prospects for a fairer world now that capitalism is the only game in town? His conclusions are sobering, but not fatalistic. Capitalism gets much wrong, but also much right—and it is not going anywhere. Our task is to improve it.
"Milanovic argues that capitalism has triumphed because it works. It delivers prosperity and gratifies human desires for autonomy. But it comes with a moral price, pushing us to treat material success as the ultimate goal. And it offers no guarantee of stability. In the West, liberal capitalism creaks under the strains of inequality and capitalist excess. That model now fights for hearts and minds with political capitalism, exemplified by China, which many claim is more efficient, but which is more vulnerable to corruption and, when growth is slow, social unrest. As for the economic problems of the Global South, Milanovic offers a creative, if controversial, plan for large-scale migration. Looking to the future, he dismisses prophets who proclaim some single outcome to be inevitable, whether worldwide prosperity or robot-driven mass unemployment. Capitalism is a risky system. But it is a human system. Our choices, and how clearly we see them, will determine how it serves us."
to:NB  books:noted  milanovic.branko  economics  political_economy 
3 days ago by cshalizi
The Great Reversal — Thomas Philippon | Harvard University Press
"In this much-anticipated book, a leading economist argues that many key problems of the American economy are due not to the flaws of capitalism or the inevitabilities of globalization but to the concentration of corporate power. By lobbying against competition, the biggest firms drive profits higher while depressing wages and limiting opportunities for investment, innovation, and growth.
"Why are cell-phone plans so much more expensive in the United States than in Europe? It seems a simple question. But the search for an answer took Thomas Philippon on an unexpected journey through some of the most complex and hotly debated issues in modern economics. Ultimately he reached his surprising conclusion: American markets, once a model for the world, are giving up on healthy competition. Sector after economic sector is more concentrated than it was twenty years ago, dominated by fewer and bigger players who lobby politicians aggressively to protect and expand their profit margins. Across the country, this drives up prices while driving down investment, productivity, growth, and wages, resulting in more inequality. Meanwhile, Europe—long dismissed for competitive sclerosis and weak antitrust—is beating America at its own game.
"Philippon, one of the world’s leading economists, did not expect these conclusions in the age of Silicon Valley start-ups and millennial millionaires. But the data from his cutting-edge research proved undeniable. In this compelling tale of economic detective work, we follow him as he works out the basic facts and consequences of industry concentration in the U.S. and Europe, shows how lobbying and campaign contributions have defanged antitrust regulators, and considers what all this means for free trade, technology, and innovation. For the sake of ordinary Americans, he concludes, government needs to return to what it once did best: keeping the playing field level for competition. It’s time to make American markets great—and free—again."
to:NB  books:noted  economics  political_economy  economic_policy  market_failures_in_everything  class_struggles_in_america 
3 days ago by cshalizi
Unbound — Heather Boushey | Harvard University Press
"Do we have to choose between equality and prosperity? Many think that reducing economic inequality would require such heavy-handed interference with market forces that it would stifle economic growth. Heather Boushey, one of Washington’s most influential economic voices, insists nothing could be further from the truth. Presenting cutting-edge economics with journalistic verve, she shows how rising inequality has become a drag on growth and an impediment to a competitive United States marketplace for employers and employees alike.
"Boushey argues that inequality undermines growth in three ways. It obstructs the supply of talent, ideas, and capital as wealthy families monopolize the best educational, social, and economic opportunities. It also subverts private competition and public investment. Powerful corporations muscle competitors out of business, in the process costing consumers, suppressing wages, and hobbling innovation, while governments underfund key public goods that make the American Dream possible, from schools to transportation infrastructure to information and communication technology networks. Finally, it distorts consumer demand as stagnant wages and meager workplace benefits rob ordinary people of buying power and push the economy toward financial instability.
"Boushey makes this case with a clear, accessible tour of the best of contemporary economic research, while also injecting a passion for her subject gained through years of research into the economics of work–life conflict and policy work in the trenches of federal government. Unbound exposes deep problems in the U.S. economy, but its conclusion is optimistic. We can preserve the best of our nation’s economic and political traditions, and improve on them, by pursuing policies that reduce inequality—and by doing so, boost broadly shared economic growth."
to:NB  economics  inequality  political_economy  class_struggles_in_america 
3 days ago by cshalizi
Search Advertising and Information Discovery: Are Consumers Averse to Sponsored Messages? by Navdeep S. Sahni, Charles Zhang :: SSRN
"We analyze a large-scale field experiment conducted on a US search engine in which 3.3 million users were randomized into seeing more, or less, advertising. Our data rejects that users are, overall, averse to search advertising targeted to them. At the margin, users prefer the search engine with a higher level of advertising. The usage of the search engine (in terms of number of searches, and number of search sessions) is higher among users who see higher levels of advertising, relative to the control group. This difference in usage persists even after the experimental treatment ends. The increase in usage is higher for users on the margin who, in the past, typed a competing search engine's name in the search query and navigated away from our focal search engine. On the supply side, a higher level of advertising increases traffic to newer websites. Consumer response to search advertising is also more positive when more businesses located in the consumer's state create new websites. Quantitatively, the experimental treatment of a higher level of advertising increases ad clicks, which leads to a 4.3% to 14.6% increase in search engine revenue.
"Overall, patterns in our data are consistent with an equilibrium in which advertising conveys relevant “local” information, which is unknown to the search engine, and therefore missed by the organic listings algorithm. Hence, search advertising makes consumers better off on average. On the margin, the search engine does not face a trade-off between advertising revenue and search engine usage."
to:NB  advertising  search_engines  networked_life  huh 
3 days ago by cshalizi
[1702.08109] Variational Analysis of Constrained M-Estimators
"We propose a unified framework for establishing existence of nonparametric M-estimators, computing the corresponding estimates, and proving their strong consistency when the class of functions is exceptionally rich. In particular, the framework addresses situations where the class of functions is complex, involving information and assumptions about shape, pointwise bounds, location of modes, height at modes, location of level-sets, values of moments, size of subgradients, continuity, distance to a "prior" function, multivariate total positivity, and any combination of the above. The class might be engineered to perform well in a specific setting even in the presence of little data. The framework views the class of functions as a subset of a particular metric space of upper semicontinuous functions under the Attouch-Wets distance. In addition to allowing a systematic treatment of numerous M-estimators, the framework yields consistency of plug-in estimators of modes of densities, maximizers of regression functions, level-sets of classifiers, and related quantities, and also enables computation by means of approximating parametric classes. We establish consistency through a one-sided law of large numbers, here extended to sieves, that relaxes assumptions of uniform laws, while ensuring global approximations even under model misspecification."
to:NB  estimation  empirical_processes  statistics 
3 days ago by cshalizi
[1908.08702] Economically rational sample-size choice and irreproducibility
"Several systematic studies have suggested that a large fraction of published research is not reproducible. One probable reason for low reproducibility is insufficient sample size, resulting in low power and low positive predictive value. It has been suggested that insufficient sample-size choice is driven by a combination of scientific competition and 'positive publication bias'. Here we formalize this intuition in a simple model, in which scientists choose economically rational sample sizes, balancing the cost of experimentation with income from publication. Specifically, assuming that a scientist's income derives only from 'positive' findings (positive publication bias) and that individual samples cost a fixed amount, allows us to leverage basic statistical formulas into an economic optimality prediction. We find that if effects have i) low base probability, ii) small effect size or iii) low grant income per publication, then the rational (economically optimal) sample size is small. Furthermore, for plausible distributions of these parameters we find a robust emergence of a bimodal distribution of obtained statistical power and low overall reproducibility rates, matching empirical findings. Overall, the model describes a simple mechanism explaining both the prevalence and the persistence of small sample sizes. It suggests economic rationality, or economic pressures, as a principal driver of irreproducibility."

--- To be clear, my skepticism here isn't about the basic idea, which has been articulated about a zillion times (back to Meehl at least...), but rather whether mathing it up with dubious simplifying assumptions adds anything of value.
to:NB  to_be_shot_after_a_fair_trial  bad_data_analysis  statistics  sociology_of_science  economics 
3 days ago by cshalizi
[1909.04436] The Prevalence of Errors in Machine Learning Experiments
"Context: Conducting experiments is central to machine learning research, in order to benchmark, evaluate and compare learning algorithms. Consequently it is important that we conduct reliable, trustworthy experiments. Objective: We investigate the incidence of errors in a sample of machine learning experiments in the domain of software defect prediction. Our focus is simple arithmetical and statistical errors. Method: We analyse 49 papers describing 2456 individual experimental results from a previously undertaken systematic review comparing supervised and unsupervised defect prediction classifiers. We extract the confusion matrices and test for relevant constraints, e.g., the marginal probabilities must sum to one. We also check for multiple statistical significance testing errors. Results: We find that a total of 22 out of 49 papers contain demonstrable errors. Of these 7 were statistical and 16 related to confusion matrix inconsistency (one paper contained both classes of error). Conclusions: Whilst some errors may be of a relatively trivial nature, e.g., transcription errors, their presence does not engender confidence. We strongly urge researchers to follow open science principles so errors can be more easily detected and corrected, and thus, as a community, we can reduce this worryingly high error rate in our computational experiments."
to:NB  bad_data_analysis  machine_learning  to_teach:data-mining 
3 days ago by cshalizi
[1809.06522] Concentration Inequalities for the Empirical Distribution
"We study concentration inequalities for the Kullback--Leibler (KL) divergence between the empirical distribution and the true distribution. Applying a recursion technique, we improve over the method of types bound uniformly in all regimes of sample size n and alphabet size k, and the improvement becomes more significant when k is large. We discuss the applications of our results in obtaining tighter concentration inequalities for L1 deviations of the empirical distribution from the true distribution, and the difference between concentration around the expectation or zero. We also obtain asymptotically tight bounds on the variance of the KL divergence between the empirical and true distribution, and demonstrate their quantitatively different behaviors between small and large sample sizes compared to the alphabet size."
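
--- A back-of-the-envelope illustration (mine, not the paper's bounds; all helper names invented here): draw n samples from a k-letter distribution, compute the KL divergence of the empirical distribution from the truth, and watch both the mean and the spread shrink as n grows, roughly in line with the familiar E[KL] ≈ (k-1)/(2n) heuristic.

```python
# Toy simulation of how KL(empirical || true) concentrates as the sample
# size n grows, for a fixed alphabet size k. Pure stdlib; not from the paper.
import math
import random

def empirical_kl(p, n, rng):
    """KL divergence (in nats) of the empirical distribution of n draws from p."""
    counts = [0] * len(p)
    for _ in range(n):
        u, acc = rng.random(), 0.0
        for i, pi in enumerate(p):
            acc += pi
            if u <= acc:
                counts[i] += 1
                break
    return sum((c / n) * math.log((c / n) / p[i])
               for i, c in enumerate(counts) if c > 0)

rng = random.Random(0)
p = [0.5, 0.25, 0.125, 0.125]  # true distribution, alphabet size k = 4
for n in (100, 10000):
    draws = [empirical_kl(p, n, rng) for _ in range(200)]
    print(n, sum(draws) / len(draws), max(draws) - min(draws))
```

Both printed columns (mean and range over 200 replications) should drop by roughly two orders of magnitude going from n = 100 to n = 10000.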
to:NB  concentration_of_measure  deviation_inequalities  empirical_processes  statistics  probability 
4 days ago by cshalizi
[1909.04475] Variable Length Markov Chains, Persistent Random Walks: a close encounter
"This is the story of the encounter between two worlds: the world of random walks and the world of Variable Length Markov Chains (VLMC). The meeting point turns around the semi-Markov property of underlying processes."
to:NB  markov_models  stochastic_processes  to_read 
4 days ago by cshalizi
[1909.03634] Krylov Subspace Method for Nonlinear Dynamical Systems with Random Noise
"Operator-theoretic analysis of nonlinear dynamical systems has attracted much attention in a variety of engineering and scientific fields, endowed with practical estimation methods using data such as dynamic mode decomposition. In this paper, we address a lifted representation of nonlinear dynamical systems with random noise based on transfer operators, and develop a novel Krylov subspace method for estimating it using finite data, with consideration of the unboundedness of operators. For this purpose, we first consider Perron-Frobenius operators with kernel-mean embeddings for such systems. Then, we extend the Arnoldi method, which is the most classical type of Krylov subspace methods, so that it can be applied to the current case. Meanwhile, the Arnoldi method requires the assumption that the operator is bounded, which is not necessarily satisfied for transfer operators on nonlinear systems. We accordingly develop the shift-invert Arnoldi method for the Perron-Frobenius operators to avoid this problem. Also, we describe a way of evaluating the predictive accuracy by estimated operators on the basis of the maximum mean discrepancy, which is applicable, for example, to anomaly detection in complex systems. The empirical performance of our methods is investigated using synthetic and real-world healthcare data."
to:NB  dynamical_systems  stochastic_processes  hilbert_space  kernel_methods 
4 days ago by cshalizi
[1909.04495] Natural Adversarial Sentence Generation with Gradient-based Perturbation
"This work proposes a novel algorithm to generate natural language adversarial input for text classification models, in order to investigate the robustness of these models. It involves applying gradient-based perturbation on the sentence embeddings that are used as the features for the classifier, and learning a decoder for generation. We apply this method to a sentiment analysis model and verify its effectiveness in inducing incorrect predictions by the model. We also conduct quantitative and qualitative analysis on these examples and demonstrate that our approach can generate more natural adversaries. In addition, it can be used to successfully perform black-box attacks, which involves attacking other existing models whose parameters are not known. On a public sentiment analysis API, the proposed method introduces a 20% relative decrease in average accuracy and a 74% relative increase in absolute error."
to:NB  adversarial_examples 
4 days ago by cshalizi
[1909.03543] An Experimental Study of Structural Diversity in Social Networks
"Several recent studies of online social networking platforms have found that adoption rates and engagement levels are positively correlated with structural diversity, the degree of heterogeneity among an individual's contacts as measured by network ties. One common theory for this observation is that structural diversity increases utility, in part because there is value to interacting with people from different network components on the same platform. While compelling, evidence for this causal theory comes from observational studies, making it difficult to rule out non-causal explanations. We investigate the role of structural diversity on retention by conducting a large-scale randomized controlled study on the Twitter platform. We first show that structural diversity correlates with user retention on Twitter, corroborating results from past observational studies. We then exogenously vary structural diversity by altering the set of network recommendations new users see when joining the platform; we confirm that this design induces the desired changes to network topology. We find, however, that low, medium, and high structural diversity treatment groups in our experiment have comparable retention rates. Thus, at least in this case, the observed correlation between structural diversity and retention does not appear to result from a causal relationship, challenging theories based on past observational studies."
to:NB  social_networks  experimental_sociology  re:democratic_cognition  ugander.johan  goel.sharad 
4 days ago by cshalizi
[1909.04189] Follow the Leader: Documents on the Leading Edge of Semantic Change Get More Citations
"Diachronic word embeddings offer remarkable insights into the evolution of language and provide a tool for quantifying socio-cultural change. However, while this method identifies words that have semantically shifted, it studies them in isolation; it does not facilitate the discovery of documents that lead or lag with respect to specific semantic innovations. In this paper, we propose a method to quantify the degree of semantic progressiveness in each usage. These usages can be aggregated to obtain scores for each document. We analyze two large collections of documents, representing legal opinions and scientific articles. Documents that are predicted to be semantically progressive receive a larger number of citations, indicating that they are especially influential. Our work thus provides a new technique for identifying lexical semantic leaders and demonstrates a new link between early adoption and influence in a citation network."
to:NB  text_mining  bibliometry  sociology_of_science  lerman.kristina  law 
4 days ago by cshalizi
Louder and Faster by Deborah Wong - Paperback - University of California Press
"A free open access ebook is available upon publication. Learn more at www.luminosoa.org.
"Louder and Faster is a cultural study of the phenomenon of Asian American taiko, the thundering, athletic drumming tradition that originated in Japan. Immersed in the taiko scene for twenty years, Deborah Wong has witnessed cultural and demographic changes and the exponential growth and expansion of taiko, particularly in Southern California. Through her participatory ethnographic work, she reveals a complicated story embedded in memories of Japanese American internment and legacies of imperialism, Asian American identity and politics, a desire to be seen and heard, and the intersection of culture and global capitalism. Exploring the materialities of the drums, costumes, and bodies that make sound, analyzing the relationship of these to capitalist multiculturalism, and investigating the gender politics of taiko, Louder and Faster considers both the promises and pitfalls of music and performance as an antiracist practice. The result is a vivid glimpse of an Asian American presence that is both loud and fragile."

--- Presumably Wong looks at participation by Asian-but-not-Japanese Americans...
to:NB  books:noted  music  ethnography  identity_group_formation  california 
4 days ago by cshalizi
[1909.03726] On the Fluctuation-Dissipation Relation in non-equilibrium and non-Hamiltonian systems
"We review generalized Fluctuation-Dissipation Relations which are valid under general conditions even in "non-standard systems", e.g. out of equilibrium and/or without a Hamiltonian structure. The response functions can be expressed in terms of suitable correlation functions computed in the unperturbed dynamics. In these relations, typically one has nontrivial contributions due to the form of the stationary probability distribution; such terms take into account the interaction among the relevant degrees of freedom in the system. We illustrate the general formalism with some examples in non-standard cases, including driven granular media, systems with a multiscale structure, active matter and systems showing anomalous diffusion."
to:NB  fluctuation-response  non-equilibrium  statistical_mechanics  stochastic_processes  vulpiani.angelo 
4 days ago by cshalizi
[1909.03760] Fluctuation-Dissipation Relation and Hawking Effect
"We show a direct connection between Kubo's fluctuation-dissipation relation and Hawking effect that is valid in any dimensions. The relevant correlators, computed from the known expressions for the stress tensor, are shown to satisfy the Kubo relation, from which the temperature of a black hole as seen by an observer at an arbitrary distance is abstracted. This reproduces the Tolman temperature and hence the Hawking temperature as that measured by an observer at infinity."
to:NB  fluctuation-response  gravitation  physics  statistical_mechanics 
4 days ago by cshalizi
[1908.11287] Using the fluctuation-dissipation theorem for non-conservative forces
"An equilibrium system which is perturbed by an external potential relaxes to a new equilibrium state, a process obeying the fluctuation-dissipation theorem. In contrast, perturbing by non-conservative forces yields a nonequilibrium steady state, and the fluctuation-dissipation theorem can in general not be applied. Here we exploit a freedom inherent to linear response theory: Force fields which perform work that does not couple statistically to the considered observable can be added without changing the response. Using this freedom, we demonstrate that the fluctuation-dissipation theorem can be applied for certain non-conservative forces. We discuss the case of a non-conservative force field linear in particle coordinates, where the mentioned freedom can be formulated in terms of symmetries. In particular, for the case of shear, this yields a new response formula, which we find advantageous over the known Green-Kubo relation in terms of statistical accuracy."
to:NB  fluctuation-response  non-equilibrium  statistical_mechanics 
4 days ago by cshalizi
[1909.03093] Solving Interpretable Kernel Dimension Reduction
"Kernel dimensionality reduction (KDR) algorithms find a low dimensional representation of the original data by optimizing kernel dependency measures that are capable of capturing nonlinear relationships. The standard strategy is to first map the data into a high dimensional feature space using kernels prior to a projection onto a low dimensional space. While KDR methods can be easily solved by keeping the most dominant eigenvectors of the kernel matrix, its features are no longer easy to interpret. Alternatively, Interpretable KDR (IKDR) is different in that it projects onto a subspace before the kernel feature mapping, therefore, the projection matrix can indicate how the original features linearly combine to form the new features. Unfortunately, the IKDR objective requires a non-convex manifold optimization that is difficult to solve and can no longer be solved by eigendecomposition. Recently, an efficient iterative spectral (eigendecomposition) method (ISM) has been proposed for this objective in the context of alternative clustering. However, ISM only provides theoretical guarantees for the Gaussian kernel. This greatly constrains ISM's usage since any kernel method using ISM is now limited to a single kernel. This work extends the theoretical guarantees of ISM to an entire family of kernels, thereby empowering ISM to solve any kernel method of the same objective. In identifying this family, we prove that each kernel within the family has a surrogate Φ matrix and the optimal projection is formed by its most dominant eigenvectors. With this extension, we establish how a wide range of IKDR applications across different learning paradigms can be solved by ISM. To support reproducible results, the source code is made publicly available on this https URL."

--- Last tag is dreamily aspirational.
to:NB  kernel_methods  dimension_reduction  principal_components  to_teach:data-mining 
4 days ago by cshalizi
A problem in theory | Nature Human Behaviour
"The replication crisis facing the psychological sciences is widely regarded as rooted in methodological or statistical shortcomings. We argue that a large part of the problem is the lack of a cumulative theoretical framework or frameworks. Without an overarching theoretical framework that generates hypotheses across diverse domains, empirical programs spawn and grow from personal intuitions and culturally biased folk theories. By providing ways to develop clear predictions, including through the use of formal modelling, theoretical frameworks set expectations that determine whether a new finding is confirmatory, nicely integrating with existing lines of research, or surprising, and therefore requiring further replication and scrutiny. Such frameworks also prioritize certain research foci, motivate the use of diverse empirical approaches and, often, provide a natural means to integrate across the sciences. Thus, overarching theoretical frameworks pave the way toward a more general theory of human behaviour. We illustrate one such theoretical framework: dual inheritance theory."

--- Well, yes. Coming from physics, my reaction to a lot of social science has long been that it doesn't have any _theories_, just _ideas_ [http://bactra.org/weblog/347.html]. Even in economics, theories rarely predict more than the sign of partial derivatives. Whether this can really be changed is another question...
social_science_methodology  to_read  via:rvenkat  to:NB 
4 days ago by cshalizi
[1909.02436] Are Adversarial Robustness and Common Perturbation Robustness Independant Attributes ?
"Neural Networks have been shown to be sensitive to common perturbations such as blur, Gaussian noise, rotations, etc. They are also vulnerable to some artificial malicious corruptions called adversarial examples. The adversarial examples study has recently become very popular and it sometimes even reduces the term "adversarial robustness" to the term "robustness". Yet, we do not know to what extent the adversarial robustness is related to the global robustness. Similarly, we do not know if a robustness to various common perturbations such as translations or contrast losses for instance, could help with adversarial corruptions. We intend to study the links between the robustnesses of neural networks to both perturbations. With our experiments, we provide one of the first benchmark designed to estimate the robustness of neural networks to common perturbations. We show that increasing the robustness to carefully selected common perturbations, can make neural networks more robust to unseen common perturbations. We also prove that adversarial robustness and robustness to common perturbations are independent. Our results make us believe that neural network robustness should be addressed in a broader sense."
to:NB  adversarial_examples 
5 days ago by cshalizi
[1706.06083] Towards Deep Learning Models Resistant to Adversarial Attacks
"Recent work has demonstrated that deep neural networks are vulnerable to adversarial examples---inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of deep learning models. To address this problem, we study the adversarial robustness of neural networks through the lens of robust optimization. This approach provides us with a broad and unifying view on much of the prior work on this topic. Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal. In particular, they specify a concrete security guarantee that would protect against any adversary. These methods let us train networks with significantly improved resistance to a wide range of adversarial attacks. They also suggest the notion of security against a first-order adversary as a natural and broad security guarantee. We believe that robustness against such well-defined classes of adversaries is an important stepping stone towards fully resistant deep learning models. Code and pre-trained models are available at this https URL and this https URL."
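
--- The "first-order adversary" here is usually instantiated with gradient-sign steps. A minimal hypothetical sketch of one such step (mine, not the paper's PGD training loop): a hand-rolled logistic model in plain Python, chosen so the input gradient is analytic and no deep-learning library is needed.

```python
# One FGSM-style gradient-sign perturbation on a tiny logistic model.
# Illustrates the kind of first-order attack the abstract discusses;
# all names and numbers below are invented for the example.
import math

def predict(x, w, b):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

def fgsm_perturb(x, y, w, b, eps):
    """x' = x + eps * sign(d loss / d x) for logistic loss log(1 + exp(-y(w.x + b)))."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    s = 1.0 / (1.0 + math.exp(y * z))      # = sigma(-y z), always positive
    grad = [-y * s * wi for wi in w]       # gradient of the loss w.r.t. x
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

w, b = [2.0, -1.0], 0.0
x, y = [0.3, 0.1], 1                       # correctly classified point
x_adv = fgsm_perturb(x, y, w, b, eps=0.5)
print(predict(x, w, b), predict(x_adv, w, b))  # the perturbation flips the label
```

Robust optimization, in the paper's sense, means training against the worst such perturbation in an eps-ball rather than the clean x.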
to:NB  adversarial_examples 
5 days ago by cshalizi
[1909.02330] McDiarmid-Type Inequalities for Graph-Dependent Variables and Stability Bounds
"A crucial assumption in most statistical learning theory is that samples are independently and identically distributed (i.i.d.). However, for many real applications, the i.i.d. assumption does not hold. We consider learning problems in which examples are dependent and their dependency relation is characterized by a graph. To establish algorithm-dependent generalization theory for learning with non-i.i.d. data, we first prove novel McDiarmid-type concentration inequalities for Lipschitz functions of graph-dependent random variables. We show that concentration relies on the forest complexity of the graph, which characterizes the strength of the dependency. We demonstrate that for many types of dependent data, the forest complexity is small and thus implies good concentration. Based on our new inequalities we are able to build stability bounds for learning from graph-dependent data."
to:NB  learning_theory  dependence_measures  to_read 
5 days ago by cshalizi
[1909.02187] More Adaptive Algorithms for Tracking the Best Expert
"In this paper, we consider the problem of prediction with expert advice in dynamic environments. We choose tracking regret as the performance metric and derive novel data-dependent bounds by developing two adaptive algorithms. The first algorithm achieves a second-order tracking regret bound, which improves existing first-order bounds. The second algorithm enjoys a path-length bound, which is generally incomparable to the second-order bound but offers advantages in slowly moving environments. Both algorithms are developed under the online mirror descent framework and draw inspiration from existing algorithms that attain data-dependent bounds of static regret. The key idea is to use a clipped simplex in the updating step of online mirror descent. Finally, we extend our algorithms and analysis to the problem of online matrix prediction and provide the first data-dependent tracking regret bound for this problem."
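
--- For orientation (this is the classical baseline such papers improve on, not the paper's new algorithms): fixed-share exponential weights mixes in the uniform distribution each round, keeping every expert's weight at least alpha/k — one concrete way to realize a "clipped simplex" — so the learner can re-find the best expert after it switches. A hedged sketch, with all names and constants my own:

```python
# Fixed-share exponential weights (Herbster & Warmuth's classical baseline
# for tracking the best expert); an illustration, not the paper's method.
import math

def fixed_share(losses, eta=0.5, alpha=0.05):
    """losses: one list of per-expert losses in [0, 1] per round.
    Returns the learner's cumulative expected loss."""
    k = len(losses[0])
    w = [1.0 / k] * k
    total = 0.0
    for loss in losses:
        total += sum(wi * li for wi, li in zip(w, loss))
        w = [wi * math.exp(-eta * li) for wi, li in zip(w, loss)]
        z = sum(w)
        w = [(1 - alpha) * wi / z + alpha / k for wi in w]  # share step: w_i >= alpha/k
    return total

# The best expert switches halfway; any single static expert pays 50,
# while fixed-share recovers within a few rounds of the switch.
rounds = [[0.0, 1.0]] * 50 + [[1.0, 0.0]] * 50
print(round(fixed_share(rounds), 2))
```

Without the share step (alpha = 0) the weight on the second expert decays toward zero during the first phase, making recovery after the switch far slower; that is the failure mode the clipping guards against.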
to:NB  online_learning  low-regret_learning  re:growing_ensemble_project  to_read 
5 days ago by cshalizi
[1809.06848] On the Learning Dynamics of Deep Neural Networks
"While a lot of progress has been made in recent years, the dynamics of learning in deep nonlinear neural networks remain to this day largely misunderstood. In this work, we study the case of binary classification and prove various properties of learning in such networks under strong assumptions such as linear separability of the data. Extending existing results from the linear case, we confirm empirical observations by proving that the classification error also follows a sigmoidal shape in nonlinear architectures. We show that given proper initialization, learning expounds parallel independent modes and that certain regions of parameter space might lead to failed training. We also demonstrate that input norm and features' frequency in the dataset lead to distinct convergence speeds which might shed some light on the generalization capabilities of deep neural networks. We provide a comparison between the dynamics of learning with cross-entropy and hinge losses, which could prove useful to understand recent progress in the training of generative adversarial networks. Finally, we identify a phenomenon that we baptize "gradient starvation" where the most frequent features in a dataset prevent the learning of other less frequent but equally informative features."
to:NB  optimization  neural_networks 
5 days ago by cshalizi
What happened to cognitive science? | Nature Human Behaviour
"More than a half-century ago, the ‘cognitive revolution’, with the influential tenet ‘cognition is computation’, launched the investigation of the mind through a multidisciplinary endeavour called cognitive science. Despite significant diversity of views regarding its definition and intended scope, this new science, explicitly named in the singular, was meant to have a cohesive subject matter, complementary methods and integrated theories. Multiple signs, however, suggest that over time the prospect of an integrated cohesive science has not materialized. Here we investigate the status of the field in a data-informed manner, focusing on four indicators, two bibliometric and two socio-institutional. These indicators consistently show that the devised multi-disciplinary program failed to transition to a mature inter-disciplinary coherent field. Bibliometrically, the field has been largely subsumed by (cognitive) psychology, and educationally, it exhibits a striking lack of curricular consensus, raising questions about the future of the cognitive science enterprise."
to:NB  bibliometry  cognitive_science  sociology_of_science  via:? 
5 days ago by cshalizi
[1909.01506] Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control
"Many real-world sequential decision-making problems can be formulated as optimal control with high-dimensional observations and unknown dynamics. A promising approach is to embed the high-dimensional observations into a lower-dimensional latent representation space, estimate the latent dynamics model, then utilize this model for control in the latent space. An important open question is how to learn a representation that is amenable to existing control algorithms. In this paper, we focus on learning representations for locally-linear control algorithms, such as iterative LQR (iLQR). By formulating and analyzing the representation learning problem from an optimal control perspective, we establish three underlying principles that the learned representation should comprise: 1) accurate prediction in the observation space, 2) consistency between latent and observation space dynamics, and 3) low curvature in the latent space transitions. These principles naturally correspond to a loss function that consists of three terms: prediction, consistency, and curvature (PCC). Crucially, to make PCC tractable, we derive an amortized variational bound for the PCC loss function. Extensive experiments on benchmark domains demonstrate that the new variational-PCC learning algorithm benefits from significantly more stable and reproducible training, and leads to superior control performance. Further ablation studies give support to the importance of all three PCC components for learning a good latent space for control."
to:NB  control_theory_and_control_engineering  differential_geometry  stochastic_processes  state-space_models 
5 days ago by cshalizi
[1903.11482] A Sober Look at Neural Network Initializations
"Initializing the weights and the biases is a key part of the training process of a neural network. Unlike the subsequent optimization phase, however, the initialization phase has gained only limited attention in the literature. In this paper we discuss some consequences of commonly used initialization strategies for vanilla DNNs with ReLU activations. Based on these insights we then develop an alternative initialization strategy. Finally, we present some large scale experiments assessing the quality of the new initialization strategy."

--- Cf. _Numerical Recipes_: "Carefully crafted initial estimates reward you not only with reduced computational effort, but also with understanding and increased self-esteem."
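--- And a quick numerical reminder of why the initialization scale matters for deep ReLU nets: the He et al. draw, N(0, 2/fan_in), roughly preserves activation magnitude across many layers, while a 1/fan_in (Xavier-style) draw lets it collapse. Illustrative background only, not the paper's proposed alternative scheme:

```python
import numpy as np

rng = np.random.default_rng(1)

def output_std(scale, depth=20, width=500, batch=200):
    """Push a standard-normal batch through `depth` random ReLU layers
    with weights drawn N(0, scale/fan_in), biases zero, and report the
    std of the final activations.  scale=2.0 is the He-et-al. choice."""
    x = rng.normal(size=(batch, width))
    for _ in range(depth):
        W = rng.normal(size=(width, width)) * np.sqrt(scale / width)
        x = np.maximum(x @ W, 0.0)
    return x.std()

he, xavier = output_std(2.0), output_std(1.0)
```

Each ReLU halves the second moment of its input, so scale 2 restores it while scale 1 shrinks it by a factor of two per layer: after 20 layers `he` is still order one and `xavier` has all but vanished.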
to:NB  neural_networks  optimization  steinwart.ingo  minsky_shut_his_eyes 
5 days ago by cshalizi
synchronization  synchronizing_words  systems  tacit_knowledge  tagging  tamil_nadu  tao.terence  taskar.ben  taxes  taxis  teaching  technological_change  technological_unemployment  technology  teeth  temporal_networks  tenenbaum.joshua  tensors  termites  terrorism  terrorism_fears  tetlock.philip  tewari.ambuj  text_mining  theology  theoretical_biology  theoretical_computer_science  theory_of_mind  thermodynamics  thermodynamic_formalism  the_american_dilemma  the_civilizing_process  the_continuing_crises  the_dream_of_the_90s_is_alive  the_earth_as_transformed_by_human_action  the_enlightenment  the_good_man_is_happy_on_the_rack  the_great_transformation  the_nightmare_from_which_we_are_trying_to_awake  the_organism_as_an_adaptive_control_system  the_present_before_it_was_widely_distributed  the_printing_press_as_an_agent_of_change  the_running-dogs_of_reaction  the_silk_road  the_singularity_has_happened  the_spreadsheet_menace  the_wired_ideology  the_work_of_art_in_the_age_of_mechanical_reproduction  think_tanks  this_needed_saying?  tibet  tibshirani.robert  tibshirani.ryan  time  time-series  time_series  time_series_connections  titan  to:blog  to:NB  todorova.sonia  tomasello.michael  topic_models  topological_defects  topology  total_factor_productivity  touchette.hugo  tourism  to_besh  to_be_shot_after_a_fair_trial  to_download  to_read  to_teach  to_teach:baby-nets  to_teach:complexity-and-inference  to_teach:data-mining  to_teach:data_over_space_and_time  to_teach:graphons  to_teach:linear_models  to_teach:statcomp  to_teach:undergrad-ADA  to_teach:undergrad-research  track_down_references  trade_in_antiquity  tradition  transaction_networks  transit  translation  transmission_of_inequality  travel  travelers'_tales  trends_and_detrending  triadic_closure  trotsky.leon  trust  tsallis_statistics  tuberculosis  tufekci.zeynep  tukey.john_w.  
tumley.thomas  turbulence  turing_mechanism  turkey  turkheimer.eric  tutorials  twitter  two-sample_tests  typography  u-statistics  ugander.johan  uhler.caroline  ulam.stanislaw  uncanny_valley  under_precisely_controlled_experimental_conditions_the_organism_does_what_it_damn_well_pleases  unions  universal_basic_income  universities  university_industrial_complex  urban_economics  us-afghan_war  us-iraq_war  uses_of_the_past  ussr  us_culture_wars  us_military  us_politics  utter_stupidity  vaccination  vagueness  valiant.leslie  value_of_information  vampires  vanderweele.tyler  van_alystne.marshall  van_der_laan.mark  van_der_vaart.aad  van_de_geer.sara  van_handel.ramon  variable-length_markov_chains  variable_selection  variance_estimation  variational_inference  variety  vast_right-wing_conspiracy  vats.divyanshu  vc-dimension  venkatasubramanian.suresh  ventura.valerie  ver_steeg.greg  via:?  via:???  via:aaronsw  via:absfac  via:aelkus  via:afinetheorem  via:aks  via:ale  via:aleks  via:anand  via:anoopsarkar  via:arinaldo  via:arsyed  via:arthegall  via:asarwate  via:a_budding_sociologists_commonplace_book  via:a_fine_theorem  via:BenjaminLind  via:bruces  via:charlie_stross  via:chl  via:coates.ta-nehisi  via:coleman  via:cris_moore  via:csantos  via:d-squared  via:ded-maxim  via:djm1107  via:dpfeldman  via:dsquared  via:everyone  via:flint_riemen  via:gabriel_rossman  via:gelman  via:harry_wynn  via:henry_farrell  via:hugo_mercier  via:ICCI  via:io9  via:jbdelong  via:john_lawler  via:joncgoodwin  via:justin  via:kjhealy  via:klk  via:krugman  via:larry_wasserman  via:lieberman  via:many  via:mcfowland  via:mejn  via:melanie_mitchell  via:merriam  via:mindhacks  via:monkeycage  via:mraginsky  via:mreid  via:neuroanthropology  via:nybooks  via:nyhan  via:ogburn  via:orgtheory  via:orzelc  via:phnk  via:pinboard  via:rortybomb  via:rvenkat  via:ryantibs  via:samii  via:scotte  via:seth  via:shivak  via:sifu_tweety  via:simon_d.  
via:simply_statistics  via:suresh  via:teddy_s.  via:themonkeycage  via:tom_levenson  via:tozier  via:tslumley  via:tsuomela  via:unfogged  via:vaguery  via:warrenellis  via:watkins  via:whimsley  via:wiggins  via:xmarquez  via:yorksranter  video_games  vidyasagar.mathukumalli  vienna_circle  vietnam_war  violence  viruses  vishwanathan.s.v.n.  vision  visual_display_of_quantitative_information  vitalism  volcano  von_hippel.eric  von_neumann.john  voter_model  voting  vu.vincent  vul.edward  vulpiani.angelo  vygotsky.lev  wager.stefan  wagner.andreas  wainwright.martin_j.  wallace.david  war  warfare_of_science_and_theology_in_christendom  warmuth.manfred  warmuth.nanny  war_crimes  wasserman.larry  wasserstein_metrics  water  watkins.nicholas  watts.duncan  wavelets  web  wehbe.leila  welfare_state  wellman.michael  well_when_you_put_it_that_way_how_could_it_possibly_go_epically_tragically_wrong  wemuth.nanny  wermuth.nanny  wesolowski.amy  we_are_as_gods_and_we_might_as_well_get_good_at_it  we_would_have_to_invent_him  whales  whats_gone_wrong_with_america  wheels:reinvention_of  wheel_reinvention  when_you_meet_a_contradiction_draw_a_distinction  where_by_"heard_the_talk"_i_mean_"heard_it_explained_over_drinks"  where_by_"heard_the_talk"_I_mean_had_it_explained_over_coffee  where_the_axioms_come_from  white.halbert  why_oh_why_cant_we_have_a_better_academic_publishing_system  why_oh_why_cant_we_have_a_better_intelligentsia  why_oh_why_cant_we_have_a_better_press_corps  wiener.norbert  wiggins.chrisropher  wiggins.christopher  wikipedia  will  willett.rebecca_m.  williamson.robert_c.  winship.christopher  wisconsin  wisdom_of_the_east  wiuf.carsten  wolfe.patrick_j.  wood.frank  wood.simon_n.  
wootton.david  word2vec  workers_cooperatives  world_bank  world_history  wright.erik_olin  writing  writing_advice  wu.wei-biao  wu.wei_biao  WWI  xing.eric  xinjiang  yan.xiaoran  yarkoni.tal  yeast  yet_another_piece_of_my_childhood_becomes_part_of_the_historical_record  yoga  young.h._peyton  young.j.z.  your_body_is_an_ecosystem  your_favorite_deep_neural_network_sucks