It's Latin to Me :: Reviews :: James Nicoll Reviews
"My characteristic understatement may have concealed how much I hated Lester. I don’t hate the book about him, because it’s never on his side. However, the book might have ended more satisfactorily if Lester had been torn apart by crazed undergraduate maenads in the last chapter."
book_reviews  nicoll.james 
Change of Time and Change of Measure (World Scientific)
"Change of Time and Change of Measure provides a comprehensive account of two topics that are of particular significance in both theoretical and applied stochastics: random change of time and change of probability law.
"Random change of time is key to understanding the nature of various stochastic processes, and gives rise to interesting mathematical results and insights of importance for the modeling and interpretation of empirically observed dynamic processes. Change of probability law is a technique for solving central questions in mathematical finance, and also has a considerable role in insurance mathematics, large deviation theory, and other fields.
"The book comprehensively collects and integrates results from a number of scattered sources in the literature and discusses the importance of the results relative to the existing literature, particularly with regard to mathematical finance.
"In this Second Edition a Chapter 13 entitled 'A Wider View' has been added. This outlines some of the developments that have taken place in the area of Change of Time and Change of Measure since the publication of the First Edition. Most of these developments have their root in the study of the Statistical Theory of Turbulence rather than in Financial Mathematics and Econometrics, and they form part of the new research area termed 'Ambit Stochastics'."
to:NB  books:noted  stochastic_processes 
Twelve Landmarks of Twentieth-Century Analysis | Real and Complex Analysis | Cambridge University Press
"The striking theorems showcased in this book are among the most profound results of twentieth-century analysis. The authors' original approach combines rigorous mathematical proofs with commentary on the underlying ideas to provide a rich insight into these landmarks in mathematics. Results ranging from the proof of Littlewood's conjecture to the Banach–Tarski paradox have been selected for their mathematical beauty as well as educative value and historical role. Placing each theorem in historical perspective, the authors paint a coherent picture of modern analysis and its development, whilst maintaining mathematical rigour with the provision of complete proofs, alternative proofs, worked examples, and more than 150 exercises and solution hints. This edition extends the original French edition of 2009 with a new chapter on partitions, including the Hardy–Ramanujan theorem, and a significant expansion of the existing chapter on the Corona problem."
to:NB  books:noted  mathematics  history_of_mathematics  analysis 
6 days ago
Enhancing the Effectiveness of Team Science | The National Academies Press
"The past half-century has witnessed a dramatic increase in the scale and complexity of scientific research. The growing scale of science has been accompanied by a shift toward collaborative research, referred to as "team science." Scientific research is increasingly conducted by small teams and larger groups rather than individual investigators, but the challenges of collaboration can slow these teams' progress in achieving their scientific goals. How does a team-based approach work, and how can universities and research institutions support teams?
"Enhancing the Effectiveness of Team Science synthesizes and integrates the available research to provide guidance on assembling the science team; leadership, education and professional development for science teams and groups. It also examines institutional and organizational structures and policies to support science teams and identifies areas where further research is needed to help science teams and groups achieve their scientific and translational goals. This report offers major public policy recommendations for science research agencies and policymakers, as well as recommendations for individual scientists, disciplinary associations, and research universities. Enhancing the Effectiveness of Team Science will be of interest to university research administrators, team science leaders, science faculty, and graduate and postdoctoral students."
to:NB  collective_cognition  science_as_a_social_process  social_life_of_the_mind  re:democratic_cognition 
9 days ago
Journal of the American Philosophical Association - What Is Democracy (and What Is Its Raison D’Etre)? - Cambridge Journals Online
"This article aims to say what democracy is or what the predicate ‘democratic’ means, as opposed to saying what is good, right, or desirable about it. The basic idea—by no means a novel one—is that a democratic system is one that features substantial equality of political power. More distinctively it is argued that ‘democratic’ is a relative gradable adjective, the use of which permits different, contextually determined thresholds of democraticness. Thus, a system can be correctly called ‘democratic’ even if it does not feature perfect equality of power. The article's central undertaking is to give greater precision to the operative notion(s) of power. No complete or fully unified measure of power is offered, but several conceptual tools are introduced that help give suitable content to power measurement. These tools include distinctions between conditional versus unconditional power and direct versus indirect power. Using such tools, a variety of prima facie problems for the power equality approach are addressed and defused. Finally, the theory is compared to epistemic and deliberative approaches to democracy; and reasons are offered for the attractiveness of democracy that flows from the power equality theme."
to:NB  democracy  political_philosophy  re:democratic_cognition 
9 days ago
Journal of the American Philosophical Association - Williams, Smith, and the Peculiarity of Piacularity - Cambridge Journals Online
"This article reflects on some of the complexities in Williams' discussion of moral luck. It compares this discussion with previous work, especially by Adam Smith, and argues that Williams' fear that the phenomenon of moral luck threatens the coherence of our moral concepts is unfounded."

- Tangential, but I quite liked this bit: "the words ‘rational’ or ‘irrational’ are so often placeholders, doing little more than pointing us, peremptorily but obscurely, in the direction of some supposed dimension of virtue or fault that our conduct may have exhibited without actually telling us what that dimension might be"
have_read  moral_psychology  moral_philosophy  blackburn.simon  via:unfogged 
9 days ago
Web Design - The First 100 Years
"Designers! I am a San Francisco computer programmer, but I come in peace!"
have_read  to:blog  ceglowski.maciej  networked_life  internet  computation  web  progressive_forces  cultural_criticism 
10 days ago
A Statistical Framework to Infer Delay and Direction of Information Flow from Measurements of Complex Systems
"In neuroscience, data are typically generated from neural network activity. The resulting time series represent measurements from spatially distributed subsystems with complex interactions, weakly coupled to a high-dimensional global system. We present a statistical framework to estimate the direction of information flow and its delay in measurements from systems of this type. Informed by differential topology, gaussian process regression is employed to reconstruct measurements of putative driving systems from measurements of the driven systems. These reconstructions serve to estimate the delay of the interaction by means of an analytical criterion developed for this purpose. The model accounts for a range of possible sources of uncertainty, including temporally evolving intrinsic noise, while assuming complex nonlinear dependencies. Furthermore, we show that if information flow is delayed, this approach also allows for inference in strong coupling scenarios of systems exhibiting synchronization phenomena. The validity of the method is demonstrated with a variety of delay-coupled chaotic oscillators. In addition, we show that these results seamlessly transfer to local field potentials in cat visual cortex."
to:NB  information_theory  time_series  statistics  functional_connectivity 
11 days ago
The end of capitalism has begun | Books | The Guardian
I'll charitably presume he explains how to deal with actual material goods (e.g., computers) in the full book.
progressive_forces  economics  peer_production  capitalism  to_be_shot_after_a_fair_trial 
13 days ago
Recovery from selection bias using marginal structure in discrete models
"This paper considers the problem of inferring a discrete joint distribution from a sample subject to selection. Abstractly, we want to identify a distribution p(x, w) from its condi- tional p(x | w). We introduce new assump- tions on the marginal model for p(x), un- der which generic identification is possible. These assumptions are quite general and can easily be tested; they do not require pre- cise background knowledge of p(x) or p(w), such as proportions estimated from previous studies. We particularly consider conditional independence constraints, which often arise from graphical and causal models, although other constraints can also be used. We show that generic identifiability of causal effects is possible in a much wider class of causal mod- els than had previously been known."
to:NB  graphical_models  identifiability  statistics  categorical_data  partial_identification  algebra  heard_the_talk  didelez.vanessa 
15 days ago
Causal Reasoning for Events in Continuous Time: A Decision-Theoretic Approach
"The dynamics of events occurring in continu- ous time can be modelled using marked point processes, or multi-state processes. Here, we review and extend the work of Røysland et al. (2015) on causal reasoning with local inde- pendence graphs for marked point processes in the context of survival analysis. We relate the results to the decision-theoretic approach of Dawid & Didelez (2010) using influence diagrams, and present additional identifying conditions."

--- VD suggests, orally, that the key bit here is the Doob-Meyer decomposition, and so the concepts may extend to, e.g., solutions of stochastic differential equations.
to:NB  time_series  point_processes  causality  causal_inference  graphical_models  statistics  didelez.vanessa  stochastic_processes  heard_the_talk 
15 days ago
Network Hypothesis Testing Using Mixed Kronecker Product Graph Models
"The recent interest in networks—social, physical, communication, information, etc.—has fueled a great deal of research on the analysis and modeling of graphs. However, many of the analyses have focused on a single large network (e.g., a subnetwork sampled from Facebook). Although several studies have compared networks from different domains or samples, they largely focus on empirical exploration of network similarities rather than explicit tests of hypotheses. This is in part due to a lack of statistical methods to determine whether two large networks are likely to have been drawn from the same underlying graph distribution. Research on across-network hypothesis testing methods has been limited by (i) difficulties associated with obtaining a set of networks to reason about the underlying graph distribution, and (ii) limitations of current statistical models of graphs that make it difficult to represent variations across networks. In this paper, we exploit the recent development of mixed-Kronecker Product Graph Models, which accurately capture the natural variation in real world graphs, to develop a model- based approach for hypothesis testing in networks."

--- I really should have known about this paper, and Jenn was remarkably graciously in telling me about it.
to:NB  to_read  network_data_analysis  hypothesis_testing  statistics  neville.jennifer  re:network_differences 
15 days ago
Start-up Nation? Slave Wealth and Entrepreneurship in Civil War Maryland
"The American slave economy was not simply a coerced agricultural labor force. A recent historical literature has emphasized that slaves were also relatively liquid financial assets, and an important source of collateral for a wide variety of economic activities. Drawing on this idea, we use new data to estimate the effect of Maryland’s 1864 uncompensated eman- cipation on business entry and exit. Using data on credit reports from Dun and Bradstreet linked to the 1860 census and slave schedules, we find that slaveowners were more likely to start businesses prior to emancipation, even conditional on wealth and human capital, and this advantage disappears with emancipation. We argue that this is due primarily to start-up credit rather than any advantage in production. Slave rental markets were active until emancipation, muting any difference between owning slaves and using slave labor in production, and we show that wages did not appreciably change after abolition. In our data, abolition of slavery is not associated with any differential exit of slaveholder owned businesses. In addition, we find that slaveowners differentially start-up businesses even in mercantile, non-agricultural, and urban sectors with little slave production. Our results suggest that the collateral dimension of slave property rights makes the institution an even larger contributor to American economic development than is normally supposed."
to:NB  to_read  american_history  economics  economic_history  slavery  the_american_dilemma  naidu.suresh 
16 days ago
Mesochronal structure learning
"Network statistics such as node degree distribu- tions, average path lengths, diameters, or clus- tering coefficients are widely used to character- ize networks. One statistic that received consid- erable attention is the distance distribution — the number of pairs of nodes for each shortest-path distance — in undirected networks. It captures important properties of the network, reflecting on the dynamics of network spreading processes, and incorporates parameters such as node cen- trality and (effective) diameter. So far, however, no parameterization of the distance distribution is known that applies to a large class of networks. Here we develop such a closed-form distribu- tion by applying maximum entropy arguments to derive a general, physically plausible model of path length histograms. Based on the model, we then establish the generalized Gamma as a three- parameter distribution for shortest-path distance in strongly-connected, undirected networks. Ex- tensive experiments corroborate our theoretical results, which thus provide new approaches to network analysis."
to:NB  to_read  graphical  time_series  causal_discovery  statistics  danks.david 
16 days ago
Parameterizing the distance distribution of undirected networks
"Network statistics such as node degree distribu- tions, average path lengths, diameters, or clus- tering coefficients are widely used to character- ize networks. One statistic that received consid- erable attention is the distance distribution — the number of pairs of nodes for each shortest-path distance — in undirected networks. It captures important properties of the network, reflecting on the dynamics of network spreading processes, and incorporates parameters such as node cen- trality and (effective) diameter. So far, however, no parameterization of the distance distribution is known that applies to a large class of networks. Here we develop such a closed-form distribu- tion by applying maximum entropy arguments to derive a general, physically plausible model of path length histograms. Based on the model, we then establish the generalized Gamma as a three- parameter distribution for shortest-path distance in strongly-connected, undirected networks. Ex- tensive experiments corroborate our theoretical results, which thus provide new approaches to network analysis."
to:NB  statistics  network_data_analysis 
16 days ago
Robust reconstruction of causal graphical models based on 2-point and 3-point conditional information
"We report a novel network reconstruction method, which combines constraint-based and Bayesian frameworks to reliably reconstruct graphical models despite inherent sampling noise in finite observational datasets. The approach is based on an information theory result trac- ing back the existence of colliders in graphi- cal models to negative conditional 3-point in- formation between observed variables. In turn, this provides a confident assessment of structural independencies in causal graphs, based on the ranking of their most likely contributing nodes with (significantly) positive conditional 3-point information. Starting from a complete undi- rected graph, dispensible edges are progressively pruned by iteratively “taking off” the most likely positive conditional 3-point information from the 2-point (mutual) information between each pair of nodes. The resulting network skeleton is then partially directed by orienting and propa- gating edge directions, based on the sign and magnitude of the conditional 3-point informa- tion of unshielded triples. This “3off2” net- work reconstruction approach is shown to out- perform constraint-based, search-and-score and earlier hybrid methods on a range of benchmark networks."
to:NB  to_read  causal_discovery  graphical_models  information_theory  statistics  heard_the_talk 
16 days ago
Learning from Pairwise Marginal Independencies
"We consider graphs that represent pairwise marginal independencies amongst a set of vari- ables (for instance, the zero entries of a covari- ance matrix for normal data). We characterize the directed acyclic graphs (DAGs) that faithfully ex- plain a given set of independencies, and derive al- gorithms to efficiently enumerate such structures. Our results map out the space of faithful causal models for a given set of pairwise marginal inde- pendence relations. This allows us to show the extent to which causal inference is possible with- out using conditional independence tests."
to:NB  graphical_models  causal_discovery  statistics  re:ADAfaEPoV 
16 days ago
A complete generalized adjustment criterion
"Covariate adjustment is a widely used approach to estimate total causal effects from observational data. Several graphical criteria have been de- veloped in recent years to identify valid covari- ates for adjustment from graphical causal mod- els. These criteria can handle multiple causes, latent confounding, or partial knowledge of the causal structure; however, their diversity is con- fusing and some of them are only sufficient, but not necessary. In this paper, we present a cri- terion that is necessary and sufficient for four different classes of graphical causal models: di- rected acyclic graphs (DAGs), maximum ances- tral graphs (MAGs), completed partially directed acyclic graphs (CPDAGs), and partial ancestral graphs (PAGs). Our criterion subsumes the ex- isting ones and in this way unifies adjustment set construction for a large set of graph classes."
to:NB  to_read  causal_inference  statistics  graphical_models  maathuis.marloes  kalisch.markus  re:ADAfaEPoV 
16 days ago
Learning the Structure of Causal Models with Relational and Temporal Dependence
"Many real-world domains are inherently rela- tional and temporal—they consist of heteroge- neous entities that interact with each other over time. Effective reasoning about causality in such domains requires representations that explicitly model relational and temporal dependence. In this work, we provide a formalization of tem- poral relational models. We define temporal ex- tensions to abstract ground graphs—a lifted rep- resentation that abstracts paths of dependence over all possible ground graphs. Temporal ab- stract ground graphs enable a sound and com- plete method for answering d-separation queries on temporal relational models. These methods provide the foundation for a constraint-based al- gorithm, TRCD, that learns causal models from temporal relational data. We provide experimen- tal evidence that demonstrates the need to explic- itly represent time when inferring causal depen- dence. We also demonstrate the expressive gain of TRCD compared to earlier algorithms that do not explicitly represent time."
to:NB  causal_discovery  relational_learning  machine_learning  statistics  jensen.david  heard_the_talk 
16 days ago
Estimating Mutual Information by Local Gaussian Approximation
"Estimating mutual information (MI) from sam- ples is a fundamental problem in statistics, ma- chine learning, and data analysis. Recently it was shown that a popular class of non-parametric MI estimators perform very poorly for strongly de- pendent variables and have sample complexity that scales exponentially with the true MI. This undesired behavior was attributed to the reliance of those estimators on local uniformity of the un- derlying (and unknown) probability density func- tion. Here we present a novel semi-parametric estimator of mutual information, where at each sample point, densities are locally approximated by a Gaussians distribution. We demonstrate that the estimator is asymptotically unbiased. We also show that the proposed estimator has a supe- rior performance compared to several baselines, and is able to accurately measure relationship strengths over many orders of magnitude."
to:NB  entropy_estimation  information_theory  statistics  kith_and_kin  galstyan.aram  ver_steeg.greg 
16 days ago
Do-calculus when the true graph is unknown
"One of the basic tasks of causal discovery is to estimate the causal effect of some set of variables on another given a statistical data set. In this article we bridge the gap between causal struc- ture discovery and the do-calculus by proposing a method for the identification of causal effects on the basis of arbitrary (equivalence) classes of semi-Markovian causal models. The approach uses a general logical representation of the equiv- alence class of graphs obtained from a causal structure discovery algorithm, the properties of which can then be queried by procedures im- plementing the do-calculus inference for causal effects. We show that the method is more ef- ficient than determining causal effects using a naive enumeration of graphs in the equivalence class. Moreover, the method is complete with respect to the identifiability of causal effects for settings, in which extant methods that do not re- quire knowledge of the true graph, offer only in- complete results. The method is entirely modular and easily adapted for different background set- tings."

(Last tag is just a to-mention.)
to:NB  heard_the_talk  to_read  causal_inference  causal_discovery  graphical_models  statistics  eberhardt.frederick  kith_and_kin  re:ADAfaEPoV 
16 days ago
Missing Data as a Causal and Probabilistic Problem
"Causal inference is often phrased as a missing data problem – for every unit, only the response to observed treatment assignment is known, the response to other treatment assignments is not. In this paper, we extend the converse approach of [7] of representing missing data problems to causal models where only interventions on miss- ingness indicators are allowed. We further use this representation to leverage techniques devel- oped for the problem of identification of causal effects to give a general criterion for cases where a joint distribution containing missing variables can be recovered from data actually observed, given assumptions on missingness mechanisms. This criterion is significantly more general than the commonly used “missing at random” (MAR) criterion, and generalizes past work which also exploits a graphical representation of missing- ness. In fact, the relationship of our criterion to MAR is not unlike the relationship between the ID algorithm for identification of causal effects [22, 18], and conditional ignorability [13]."
to:NB  statistics  graphical_models  causal_inference  missing_data  pearl.judea  shiptser.ilya  to_read  re:ADAfaEPoV  heard_the_talk 
16 days ago
Importance Sampling Over Sets: A New Probabilistic Inference Scheme
"Computing expectations in high-dimensional spaces is a key challenge in probabilistic infer- ence and machine learning. Monte Carlo sam- pling, and importance sampling in particular, is one of the leading approaches. We propose a generalized importance sampling scheme based on randomly selecting (exponentially large) sub- sets of states rather than individual ones. By col- lecting a small number of extreme states in the sampled sets, we obtain estimates of statistics of interest, such as the partition function of an undirected graphical model. We incorporate this idea into a novel maximum likelihood learning algorithm based on cutting planes. We demon- strate empirically that our scheme provides accu- rate answers and scales to problems with up to a million variables."

--- Applicable to the fitness-breeding scheme?
to:NB  computational_statistics  monte_carlo  statistics  approximation  re:fitness_sampling 
16 days ago
Stable Spectral Learning Based on Schur Decomposition
"Spectral methods are a powerful tool for inferring the parameters of certain classes of probability distributions by means of standard eigenvalue- eigenvector decompositions. Spectral algorithms can be orders of magnitude faster than log- likelihood based and related iterative methods, and, thanks to the uniqueness of the spectral de- composition, they enjoy global optimality guar- antees. In practice, however, the applicability of spectral methods is limited due to their sensitiv- ity to model misspecification, which can cause instability issues in the case of non-exact models. We present a new spectral approach that is based on the Schur triangularization of an observable matrix, and we carry out the corresponding theo- retical analysis. Our main result is a bound on the estimation error that is shown to depend linearly on the condition number of the ground-truth con- ditional probability matrix and inversely on the eigengap of an observable matrix. Numerical ex- periments show that the proposed method is more stable, and performs better in general, than the classical spectral approach using direct matrix di- agonalization."
to:NB  to_read  spectral_methods  mixture_models  statistics  computational_statistics  estimation  misspecification 
16 days ago
Novel Bernstein-like concentration inequalities for the missing mass
"We are concerned with obtaining novel concen- tration inequalities for the missing mass, i.e. the total probability mass of the outcomes not ob- served in the sample. We not only derive - for the first time - distribution-free Bernstein-like deviation bounds with sublinear exponents in de- viation size for missing mass, but also improve the results of McAllester and Ortiz (2003) and Berend and Kontorovich (2013, 2012) for small deviations which is the most interesting case in learning theory. It is known that the majority of standard inequalities cannot be directly used to analyze heterogeneous sums i.e. sums whose terms have large difference in magnitude. Our generic and intuitive approach shows that the heterogeneity issue introduced in McAllester and Ortiz (2003) is resolvable at least in the case of missing mass via regulating the terms using our novel thresholding technique."
to:NB  probability  deviation_inequalities 
16 days ago
Random Features for Large-Scale Kernel Machines
"To accelerate the training of kernel machines, we propose to map the input data to a randomized low-dimensional feature space and then apply existing fast linear methods. The features are designed so that the inner products of the transformed data are approximately equal to those in the feature space of a user specified shift- invariant kernel. We explore two sets of random features, provide convergence bounds on their ability to approximate various radial basis kernels, and show that in large-scale classification and regression tasks linear machine learning al- gorithms applied to these features outperform state-of-the-art large-scale kernel machines."
to:NB  to_read  machine_learning  computational_statistics  approximation  random_projections  fourier_analysis  kernel_methods  statistics  re:hyperbolic_networks 
16 days ago
Training generative neural networks via maximum mean discrepancy optimization
"We consider training a deep neural network to generate samples from an unknown distribu- tion given i.i.d. data. We frame learning as an optimization minimizing a two-sample test statistic—informally speaking, a good genera- tor network produces samples that cause a two- sample test to fail to reject the null hypothesis. As our two-sample test statistic, we use an un- biased estimate of the maximum mean discrep- ancy, which is the centerpiece of the nonpara- metric kernel two-sample test proposed by Gret- ton et al. [2]. We compare to the adversar- ial nets framework introduced by Goodfellow et al. [1], in which learning is a two-player game between a generator network and an adversarial discriminator network, both trained to outwit the other. From this perspective, the MMD statistic plays the role of the discriminator. In addition to empirical comparisons, we prove bounds on the generalization error incurred by optimizing the empirical MMD."

--- On first glance, there's no obvious limitation to neural networks, and indeed it's rather suggestive of indirect inference (to me)
to:NB  simulation  stochastic_models  neural_networks  machine_learning  two-sample_tests  hypothesis_testing  nonparametrics  kernel_methods  statistics  computational_statistics  ghahramani.zoubin 
16 days ago
Bayesian Structure Learning for Stationary Time Series
"Importance sampling (IS) and its variant, an- nealed IS (AIS) have been widely used for es- timating the partition function in graphical mod- els, such as Markov random fields and deep gen- erative models. However, IS tends to underesti- mate the partition function and is subject to high variance when the proposal distribution is more peaked than the target distribution. On the other hand, “reverse” versions of IS and AIS tend to overestimate the partition function, and degener- ate when the target distribution is more peaked than the proposal distribution. In this work, we present a simple, general method that gives much more reliable and robust estimates than either IS (AIS) or reverse IS (AIS). Our method works by converting the estimation problem into a simple classification problem that discriminates between the samples drawn from the target and the pro- posal. We give extensive theoretical and empir- ical justification; in particular, we show that an annealed version of our method significantly out- performs both AIS and reverse AIS as proposed by Burda et al. (2015), which has been the state- of-the-art for likelihood evaluation in deep gen- erative models."
to:NB  to_read  time_series  graphical_models  statistics  fox.emily_b. 
16 days ago
Estimating the Partition Function by Discriminance Sampling
"Importance sampling (IS) and its variant, an- nealed IS (AIS) have been widely used for es- timating the partition function in graphical mod- els, such as Markov random fields and deep gen- erative models. However, IS tends to underesti- mate the partition function and is subject to high variance when the proposal distribution is more peaked than the target distribution. On the other hand, “reverse” versions of IS and AIS tend to overestimate the partition function, and degener- ate when the target distribution is more peaked than the proposal distribution. In this work, we present a simple, general method that gives much more reliable and robust estimates than either IS (AIS) or reverse IS (AIS). Our method works by converting the estimation problem into a simple classification problem that discriminates between the samples drawn from the target and the pro- posal. We give extensive theoretical and empir- ical justification; in particular, we show that an annealed version of our method significantly out- performs both AIS and reverse AIS as proposed by Burda et al. (2015), which has been the state- of-the-art for likelihood evaluation in deep gen- erative models."
to:NB  heard_the_talk  computational_statistics  exponential_families  statistics  monte_carlo  classifiers 
16 days ago
The Effect of Schooling on Cognitive Skills
"To identify the causal effect of schooling on cognitive skills, we exploit conditionally random variation in the date Swedish males take a battery of cognitive tests in preparation for military service. We find an extra ten days of school instruction raises scores on crystallized intelligence tests (synonyms and technical comprehension tests) by approximately 1% of a standard deviation, whereas extra nonschool days have almost no effect. In contrast, test scores on fluid intelligence tests (spatial and logic tests) do not increase with additional days of schooling but do increase modestly with age."
to:NB  to_read  mental_testing  iq  education  psychology  re:g_paper 
16 days ago
[1506.04322] Fast Parallel Graphlet Counting for Large Networks
"From social science to biology, numerous applications often rely on motifs for intuitive and meaningful characterization of networks at both the global macro-level as well as the local micro-level. While motifs have witnessed a tremendous success and impact in a variety of domains, there has yet to be a fast and efficient approach for computing the frequencies of these subgraph patterns. However, existing methods are not scalable to large networks with millions of nodes and edges, which impedes the application of motifs to new problems that require large-scale network analysis. To address these problems, we propose a fast, efficient, and parallel algorithm for counting motifs of size k={3,4}-nodes that take only a fraction of the time to compute when compared with the current methods used. The proposed motif counting algorithms leverages a number of proven combinatorial arguments for different motifs. For each edge, we count a few motifs, and with these counts along with the combinatorial arguments, we obtain the exact counts of others in constant time. On a large collection of 300+ networks from a variety of domains, our motif counting strategies are on average 460x faster than current methods. This brings new opportunities to investigate the use of motifs on much larger networks and newer applications as we show in our experiments. To the best of our knowledge, this paper provides the largest motif computations to date as well as the largest systematic investigation on over 300+ networks from a variety of domains."
to:NB  network_data_analysis  neville.jennifer 
17 days ago
[1506.00728] Network Assisted Analysis to Reveal the Genetic Basis of Autism
"While studies show that autism is highly heritable, the nature of the genetic basis of this disorder remains illusive. Based on the idea that highly correlated genes are functionally interrelated and more likely to affect risk, we develop a novel statistical tool to find more potentially autism risk genes by combining the genetic association scores with gene co-expression in specific brain regions and periods of development. The gene dependence network is estimated using a novel partial neighborhood selection (PNS) algorithm, where node specific properties are incorporated into network estimation for improved statistical and computational efficiency. Then we adopt a hidden Markov random field (HMRF) model to combine the estimated network and the genetic association scores in a systematic manner. The proposed modeling framework can be naturally extended to incorporate additional structural information concerning the dependence between genes. Using currently available genetic association data from whole exome sequencing studies and brain gene expression levels, the proposed algorithm successfully identified 333 genes that plausibly affect autism risk."
to:NB  graphical_models  biochemical_networks  genetics  gene_expression_data_analysis  autism  kith_and_kin  roeder.kathryn  lei.jing  liu.li 
17 days ago
Why Propensity Scores Should Not Be Used for Matching | Gary King
"Researchers use propensity score matching (PSM) as a data preprocessing step to selectively prune units prior to applying a model to estimate a causal effect. The goal of PSM is to reduce imbalance in the chosen pre-treatment covariates between the treated and control groups, thereby reducing the degree of model dependence and potential for bias. We show here that PSM often accomplishes the opposite of what is intended -- increasing imbalance, inefficiency, model dependence, and bias. The weakness of PSM is that it attempts to approximate a completely randomized experiment, rather than, as with other matching methods, a more powerful fully blocked randomized experiment. PSM, unlike other matching methods, is thus blind to the often large portion of imbalance that could have been eliminated by approximating full blocking. Moreover, in data balanced enough to approximate complete randomization, either to begin with or after pruning some observations, PSM approximates random matching which turns out to increase imbalance. For other matching methods, the point where additional pruning increases imbalance occurs much later in the pruning process, when full blocking is approximated and there is no reason to prune, and so the danger is considerably less. We show that these problems with PSM occur even in data designed for PSM, with as few as two covariates, and in many real applications. Although these results suggest that researchers replace PSM with one of the other available methods when performing matching, propensity scores have many other productive uses."

--- The point about the simple randomized experiment vs. blocking is a good one.
--- An alternative to matching I've never seen explored (which probably means it's obviously flawed) would be to just do a two-variable non-parametric regression with the treatment variable and the propensity score...
to:NB  have_read  causal_inference  matching  statistics  king.gary  experimental_design 
17 days ago
Model Search by Bootstrap "Bumping" on JSTOR
"We propose a bootstrap-based method for enhancing a search through a space of models. The technique is well suited to complex, adaptively fitted models--it provides a convenient method for finding better local minima and for resistant fitting. Applications to regression, classification, and density estimation are described. We also provide results on the asymptotic behavior of bumping estimates."
to:NB  estimation  bootstrap  statistics  tibshirani.robert 
17 days ago
[1506.00251] Kinetics of Social Contagion
"Diffusion of information, behavioural patterns or innovations follows diverse pathways depending on a number of conditions, including the structure of the underlying social network, the sensitivity to peer pressure and the influence of media. Here we study analytically and by simulations a general model that incorporates threshold mechanism capturing sensitivity to peer pressure, the effect of `immune' nodes who never adopt, and a perpetual flow of external information. While any constant, non-zero rate of dynamically-introduced innovators leads to global spreading, the kinetics by which the asymptotic state is approached show rich behaviour. In particular we find that, as a function of the density of immune nodes, there is a transition from fast to slow spreading governed by entirely different mechanisms. This transition happens below the percolation threshold of fragmentation of the network, and has its origin in the competition between cascading behaviour induced by innovators and blocking of adoption due to immune nodes. This change is accompanied by a percolation transition of the induced clusters."
to:NB  networks  epidemic_models  re:do-institutions-evolve 
17 days ago
[1506.00627] Spatio-Temporal Complex Networks: Reachability, Centrality, and Robustness
"While recent advances in spatial and temporal networks have enabled researchers to more-accurately describe many real-world systems, existing models do not capture the combined constraint that space and time impose on the relationships and interactions present in a spatio-temporal complex network. This has important consequences, often resulting in an over-simplification of the resilience of a system and obscuring the network's true structure. In this paper, we study the response of spatio-temporal complex networks to random error and systematic attack. Firstly, we propose a model of spatio-temporal paths in time-varying spatially embedded networks. This model captures the property that, in many real-world systems, interaction between nodes is non-instantaneous and governed by the space in which they are embedded. Secondly, using numerical experiments on four empirical examples of such systems, we study the effect of node failure on a network's topological, temporal, and spatial structure. We find that networks exhibit divergent behaviour with respect to random error and systematic attack. Finally, to identify weaknesses specific to the behaviour of a spatio-temporal system, we introduce centrality measures that evaluate the importance of a node as a structural bridge and its role in supporting temporally efficient flow through the network. We explore the disruption to each system caused by attack strategies based on each of these centrality measures. This exposes the complex nature of fragility in a spatio-temporal system, showing that there is a variety of failure modes when a network is subject to systematic attack."
to:NB  networks  temporal_networks  re:do-institutions-evolve 
17 days ago
[1506.05490] Structural inference for uncertain networks
"In the study of networked systems such as biological, technological, and social networks the available data are often uncertain. Rather than knowing the structure of a network exactly, we know the connections between nodes only with a certain probability. In this paper we develop methods for the analysis of such uncertain data, focusing particularly on the problem of community detection. We give a principled maximum-likelihood method for inferring community structure and demonstrate how the results can be used to make improved estimates of the true structure of the network. Using computer-generated benchmark networks we demonstrate that our methods are able to reconstruct known communities more accurately than previous approaches based on data thresholding. We also give an example application to the detection of communities in a protein-protein interaction network."
to:NB  network_data_analysis  community_discovery  inference_to_latent_objects  kith_and_kin  newman.mark  statistics 
17 days ago
[1506.03022] The Majority Illusion in Social Networks
"Social behaviors are often contagious, spreading through a population as individuals imitate the decisions and choices of others. A variety of global phenomena, from innovation adoption to the emergence of social norms and political movements, arise as a result of people following a simple local rule, such as copy what others are doing. However, individuals often lack global knowledge of the behaviors of others and must estimate them from the observations of their friends' behaviors. In some cases, the structure of the underlying social network can dramatically skew an individual's local observations, making a behavior appear far more common locally than it is globally. We trace the origins of this phenomenon, which we call "the majority illusion," to the friendship paradox in social networks. As a result of this paradox, a behavior that is globally rare may be systematically overrepresented in the local neighborhoods of many people, i.e., among their friends. Thus, the "majority illusion" may facilitate the spread of social contagions in networks and also explain why systematic biases in social perceptions, for example, of risky behavior, arise. Using synthetic and real-world networks, we explore how the "majority illusion" depends on network structure and develop a statistical model to calculate its magnitude in a network."
to:NB  epidemic_models  social_networks  lerman.kristina  yan.xiaoran  re:do-institutions-evolve  kith_and_kin 
17 days ago
[1506.04506] Multiple phases in modularity-based community detection
"Detecting communities in a network, based only on the adjacency matrix, is a problem of interest to several scientific disciplines. Recently, Zhang and Moore have introduced an algorithm in [P. Zhang and C. Moore, Proceedings of the National Academy of Sciences 111, 18144 (2014)], called mod-bp, that avoids overfitting the data by optimizing a weighted average of modularity (a popular goodness-of-fit measure in community detection) and entropy (i.e. number of configurations with a given modularity). The adjustment of the relative weight, the "temperature" of the model, is crucial for getting a correct result from mod-bp. In this work we study the many phase transitions that mod-bp may undergo by changing the two parameters of the algorithm: the temperature T and the maximum number of groups q. We introduce a new set of order parameters that allow to determine the actual number of groups q^, and we observe on both synthetic and real networks the existence of phases with any q^∈{1,q}, which were unknown before. We discuss how to interpret the results of mod-bp and how to make the optimal choice for the problem of detecting significant communities."
to:NB  community_discovery  statistical_mechanics  statistics 
17 days ago
[1506.00986] The Impact of Heterogeneous Thresholds on Social Contagion with Multiple Initiators
"The threshold model is a simple but classic model of contagion spreading in complex social systems. To capture the complex nature of social influencing we investigate numerically and analytically the transition in the behavior of threshold-limited cascades in the presence of multiple initiators as the distribution of thresholds is varied between the two extreme cases of identical thresholds and a uniform distribution. We accomplish this by employing a truncated normal distribution of the nodes' thresholds and observe a non-monotonic change in the cascade size as we vary the standard deviation. Further, for a sufficiently large spread in the threshold distribution, the tipping-point behavior of the social influencing process disappears and is replaced by a smooth crossover governed by the size of initiator set. We demonstrate that for a given size of the initiator set, there is a specific variance of the threshold distribution for which an opinion spreads optimally. Furthermore, in the case of synthetic graphs we show that the spread asymptotically becomes independent of the system size, and that global cascades can arise just by the addition of a single node to the initiator set."
to:NB  epidemic_models  networks  re:do-institutions-evolve  stochastic_processes 
17 days ago
[1505.07478] Generalized communities in networks
"A substantial volume of research has been devoted to studies of community structure in networks, but communities are not the only possible form of large-scale network structure. Here we describe a broad extension of community structure that encompasses traditional communities but includes a wide range of generalized structural patterns as well. We describe a principled method for detecting this generalized structure in empirical network data and demonstrate with real-world examples how it can be used to learn new things about the shape and meaning of networks."
to:NB  to_read  network_data_analysis  community_discovery  statistics  scooped?  kith_and_kin  newman.mark  re:smoothing_adjacency_matrices 
17 days ago
[1506.01831] Handy sufficient conditions for the convergence of the maximum likelihood estimator in observation-driven models
"This paper generalizes asymptotic properties obtained in the observation-driven times series models considered by \cite{dou:kou:mou:2013} in the sense that the conditional law of each observation is also permitted to depend on the parameter. The existence of ergodic solutions and the consistency of the Maximum Likelihood Estimator (MLE) are derived under easy-to-check conditions. The obtained conditions appear to apply for a wide class of models. We illustrate our results with specific observation-driven times series, including the recently introduced NBIN-GARCH and NM-GARCH models, demonstrating the consistency of the MLE for these two models."
to:NB  statistics  likelihood  estimation  statistical_inference_for_stochastic_processes  douc.randal 
17 days ago
[1506.02084] Exact P-values for Network Interference
"We study the calculation of exact p-values for a large class of non-sharp null hypotheses about treatment effects in a setting with data from experiments involving members of a single connected network. The class includes null hypotheses that limit the effect of one unit's treatment status on another according to the distance between units; for example, the hypothesis might specify that the treatment status of immediate neighbors has no effect, or that units more than two edges away have no effect. We also consider hypotheses concerning the validity of sparsification of a network (for example based on the strength of ties) and hypotheses restricting heterogeneity in peer effects (so that, for example, only the number or fraction treated among neighboring units matters). Our general approach is to define an artificial experiment, such that the null hypothesis that was not sharp for the original experiment is sharp for the artificial experiment, and such that the randomization analysis for the artificial experiment is validated by the design of the original experiment."
to:NB  causal_inference  network_data_analysis  statistics  hypothesis_testing  eckles.dean 
17 days ago
[1506.02326] Estimation of the variance of partial sums of dependent processes
"We study subsampling estimators for the limit variance
of partial sums of a stationary stochastic process (Xk)k≥1. We establish L2-consistency of a non-overlapping block resampling method. Our results apply to processes that can be represented as functionals of strongly mixing processes. Motivated by recent applications to rank tests, we also study estimators for the series Var(F(X1))+2∑∞k=2Cov(F(X1),F(Xk)), where F is the distribution function of X1. Simulations illustrate the usefulness of the proposed estimators and of a mean squared error optimal rule for the choice of the block length."
to:NB  statistics  time_series  variance_estimation  bootstrap 
17 days ago
[1506.02494] backShift: Learning causal cyclic graphs from unknown shift interventions
"We propose a simple method to learn linear causal cyclic models in the presence of latent variables. The method relies on equilibrium data of the model recorded under shift interventions. The location and strength of these interventions do not have to be known and can be estimated from the data. Our method, called backShift, only uses second moments of the data and performs simple joint matrix diagonalization, applied to differences between covariance matrices. We give a sufficient and necessary condition for identifiability of the system, which is fulfilled almost surely under some quite general assumptions if and only if there are at least three distinct experimental settings, one of which can be pure observational data. We demonstrate the performance on some simulated data and applications in flow cytometry and financial time series. The code is made available as R-package backShift."
to:NB  causal_inference  causal_discovery  statistics 
17 days ago
[1506.02686] The LICORS Cabinet: Nonparametric Algorithms for Spatio-temporal Prediction
"For the task of unsupervised spatio-temporal forecasting (e.g., learning to predict video data without labels), we propose two new nonparametric predictive state algorithms, Moonshine and One Hundred Proof. The algorithms are conceptually simple and make few assumptions on the underlying spatio-temporal process yet have strong predictive performance and provide predictive distributions over spatio-temporal data. The latter property allows for likelihood estimation under the models, for classification and other probabilistic inference."
self-promotion  statistics  prediction  spatio-temporal_statistics  in_NB 
17 days ago
[1506.03486] Sequential Nonparametric Testing with the Law of the Iterated Logarithm
"Consider the problem of nonparametric two-sample mean testing, where we have access to i.i.d. samples from two multivariate distributions and wish to test whether they have the same mean. We propose a \textit{sequential} test for this problem suitable for data-rich, memory-constrained situations. It is novel in several ways: it takes linear time and constant space to compute on the fly, and has robust high-dimensional statistical performance, including basically the same power guarantee (for a given false positive rate) as a batch/offline version of the test with the same computational constraints. Most notably, it has a distinct computational advantage over the batch test, because it accesses only as many samples as are required -- its stopping time is adaptive to the unknown difficulty of the problem.
"We analyze the test and prove these properties in a rigorously finite-sample fashion, using a novel uniform empirical Bernstein version of the law of the iterated logarithm (LIL), which may be of independent interest and allows analysis of sequential tests in a general framework. We demonstrate how to extend our ideas to nonparametric homogeneity and independence testing, and make a case for their even broader applicability."
to:NB  hypothesis_testing  statistics  nonparametrics  kith_and_kin  ramdas.aaditya 
17 days ago
[1506.03850] Generalized Additive Model Selection
"We introduce GAMSEL (Generalized Additive Model Selection), a penalized likelihood approach for fitting sparse generalized additive models in high dimension. Our method interpolates between null, linear and additive models by allowing the effect of each variable to be estimated as being either zero, linear, or a low-complexity curve, as determined by the data. We present a blockwise coordinate descent procedure for efficiently optimizing the penalized likelihood objective over a dense grid of the tuning parameter, producing a regularization path of additive models. We demonstrate the performance of our method on both real and simulated data examples, and compare it with existing techniques for additive model selection."
to:NB  statistics  sparsity  additive_models  regression  kith_and_kin  chouldechova.alexandra  hastie.trevor 
17 days ago
[1506.05779] Simultaneous likelihood-based bootstrap confidence sets for a large number of models
"The paper studies a problem of constructing simultaneous likelihood-based confidence sets. We consider a simultaneous multiplier bootstrap procedure for estimating the quantiles of the joint distribution of the likelihood ratio statistics, and for adjusting the confidence level for multiplicity. Theoretical results state the bootstrap validity in the following setting: the sample size \(n\) is fixed, the maximal parameter dimension \(p_{\textrm{max}}\) and the number of considered parametric models \(K\) are s.t. \((\log K)^{12}p_{\max}^{3}/n\) is small. We also consider the situation when the parametric models are misspecified. If the models' misspecification is significant, then the bootstrap critical values exceed the true ones and the simultaneous bootstrap confidence set becomes conservative. Numerical experiments for local constant and local quadratic regressions illustrate the theoretical results."
to:NB  bootstrap  confidence_sets  statistics 
17 days ago
[1506.05822] Optimal model-free prediction from multivariate time series
"Forecasting a time series from multivariate predictors constitutes a challenging problem, especially using model-free approaches. Most techniques, such as nearest-neighbor prediction, quickly suffer from the curse of dimensionality and overfitting for more than a few predictors which has limited their application mostly to the univariate case. Therefore, selection strategies are needed that harness the available information as efficiently as possible. Since often the right combination of predictors matters, ideally all subsets of possible predictors should be tested for their predictive power, but the exponentially growing number of combinations makes such an approach computationally prohibitive. Here a prediction scheme that overcomes this strong limitation is introduced utilizing a causal pre-selection step which drastically reduces the number of possible predictors to the most predictive set of causal drivers making a globally optimal search scheme tractable. The information-theoretic optimality is derived and practical selection criteria are discussed. As demonstrated for multivariate nonlinear stochastic delay processes, the optimal scheme can even be less computationally expensive than commonly used sub-optimal schemes like forward selection. The method suggests a general framework to apply the optimal model-free approach to select variables and subsequently fit a model to further improve a prediction or learn statistical dependencies. The performance of this framework is illustrated on a climatological index of El Ni\~no Southern Oscillation."
to:NB  prediction  time_series  statistics  kurths.jurgen 
17 days ago
[1506.05900] Representation Learning for Clustering: A Statistical Framework
"We address the problem of communicating domain knowledge from a user to the designer of a clustering algorithm. We propose a protocol in which the user provides a clustering of a relatively small random sample of a data set. The algorithm designer then uses that sample to come up with a data representation under which k-means clustering results in a clustering (of the full data set) that is aligned with the user's clustering. We provide a formal statistical model for analyzing the sample complexity of learning a clustering representation with this paradigm. We then introduce a notion of capacity of a class of possible representations, in the spirit of the VC-dimension, showing that classes of representations that have finite such dimension can be successfully learned with sample size error bounds, and end our discussion with an analysis of that dimension for classes of representations induced by linear embeddings."
to:NB  machine_learning  representation  learning_theory  clustering  vc-dimension 
17 days ago
[1506.06162] Private Graphon Estimation for Sparse Graphs
"We design algorithms for fitting a high-dimensional statistical model to a large, sparse network without revealing sensitive informataion of individual members. Given a sparse input graph G, our algorithms output a node-differentially-private nonparametric block model approximation. By node-differentially-private, we mean that our output hides the insertion or removal of a vertex and all its adjacent edges. If G is an instance of the network obtained from a generative nonparametric model defined in terms of a graphon W, our model guarantees consistency, in the sense that as the number of vertices tends to infinity, the output of our algorithm converges to W in an appropriate version of the L2 norm. In particular, this means we can estimate the sizes of all multi-way cuts in G.
"Our results hold as long as W is bounded, the average degree of G grows at least like the log of the number of vertices, and the number of blocks goes to infinity at an appropriate rate. We give explicit error bounds in terms of the parameters of the model; in several settings, our bounds improve on or match known nonprivate results."
to:NB  graph_limits  nonparametrics  differential_privacy  stochastic_block_models  statistics  re:smoothing_adjacency_matrices  borgs.christian  chayes.jennifer 
17 days ago
[1506.06266] Uniform Asymptotic Inference and the Bootstrap After Model Selection
"Recently, Taylor et al. (2014) developed a method for making inferences on parameters after model selection, in a regression setting with normally distributed errors. In this work, we study the large sample properties of this method, without assuming normality. We prove that the test statistic of Taylor et al. (2014) is asymptotically pivotal, as the number of samples n grows and the dimension d of the regression problem stays fixed; our asymptotic result is uniformly valid over a wide class of nonnormal error distributions. We also propose an efficient bootstrap version of this test that is provably (asymptotically) conservative, and in practice, often delivers shorter confidence intervals that the original normality-based approach. Finally, we prove that the test statistic of Taylor et al. (2014) does not converge uniformly in a high-dimensional setting, when the dimension d is allowed grow"
to:NB  to_read  model_selection  confidence_sets  regression  high-dimensional_statistics  kith_and_kin  bootstrap  wasserman.larry  rinaldo.alessandro  tibshirani.ryan  tibshirani.robert 
17 days ago
[1506.07669] A review of some recent advances in causal inference
"We give a selective review of some recent developments in causal inference, intended for researchers who are not familiar with graphical models and causality, and with a focus on methods that are applicable to large data sets. We mainly address the problem of estimating causal effects from observational data. For example, one can think of estimating the effect of single or multiple gene knockouts from wild-type gene expression data, that is, from gene expression measurements that were obtained without doing any gene knockout experiments.
"We assume that the observational data are generated from a causal structure that can be represented by a directed acyclic graph (DAG). First, we discuss estimation of causal effects when the underlying causal DAG is known. In large-scale networks, however, the causal DAG is often unknown. Next, we therefore discuss causal structure learning, that is, learning information about the causal structure from observational data. We then combine these two parts and discuss methods to estimate (bounds on) causal effects from observational data when the causal structure is unknown. We also illustrate this method on a yeast gene expression data set. We close by mentioning several extensions of the discussed work.'
to:NB  causal_inference  graphical_models  partial_identification  statistics  maathuis.marloes 
17 days ago
[1506.07806] Properties of Latent Variable Network Models
"We derive properties of Latent Variable Models for networks, a broad class of models that includes the widely-used Latent Position Models. These include the average degree distribution, clustering coefficient, average path length and degree correlations. We introduce the Gaussian Latent Position Model, and derive analytic expressions and asymptotic approximations for its network properties. We pay particular attention to one special case, the Gaussian Latent Position Models with Random Effects, and show that it can represent the heavy-tailed degree distributions, positive asymptotic clustering coefficients and small-world behaviours that are often observed in social networks. Several real and simulated examples illustrate the ability of the models to capture important features of observed networks."
to:NB  statistics  network_data_analysis  inference_to_latent_objects  networks  to_read 
17 days ago
[1506.08826] Statistical Inference using the Morse-Smale Complex
"The Morse-Smale complex decomposes the sample space into cells where a given function f is increasing or decreasing. When applied to nonparametric density estimation and regression, it provides a way to represent, visualize and compare functions, even in high dimensions. In this paper, we study the estimation of the Morse-Smale complex and we use our results for a variety of statistical problems including: nonparametric two-sample testing, density estimation, nonparametric regression and mode clustering."
to:NB  statistics  nonparametrics  re:network_differences  wasserman.larry  genovese.christopher  chen.yen-chi  kith_and_kin  topology 
17 days ago
[1506.08910] Learning Single Index Models in High Dimensions
"Single Index Models (SIMs) are simple yet flexible semi-parametric models for classification and regression. Response variables are modeled as a nonlinear, monotonic function of a linear combination of features. Estimation in this context requires learning both the feature weights, and the nonlinear function. While methods have been described to learn SIMs in the low dimensional regime, a method that can efficiently learn SIMs in high dimensions has not been forthcoming. We propose three variants of a computationally and statistically efficient algorithm for SIM inference in high dimensions. We establish excess risk bounds for the proposed algorithms and experimentally validate the advantages that our SIM learning methods provide relative to Generalized Linear Model (GLM) and low dimensional SIM based learning methods."
to:NB  regression  nonparametrics  high-dimensional_statistics  willett.rebecca  statistics 
17 days ago
[1506.02903] Mixing Time Estimation in Reversible Markov Chains from a Single Sample Path
"This article provides the first procedure for computing a fully data-dependent interval that traps the mixing time tmix of a finite reversible ergodic Markov chain at a prescribed confidence level. The interval is computed from a single finite-length sample path from the Markov chain, and does not require the knowledge of any parameters of the chain. This stands in contrast to previous approaches, which either only provide point estimates, or require a reset mechanism, or additional prior knowledge. The width of the interval converges to zero at a n‾‾√ rate, where n is the length of the sample path. Upper and lower bounds are given on the number of samples required to achieve constant-factor multiplicative accuracy. The lower bounds indicate that, unless further restrictions are placed on the chain, no procedure can achieve this accuracy level before seeing each state at least Ω(tmix) times on the average. Finally, future directions of research are identified."
to:NB  mixing  markov_models  statistical_inference_for_stochastic_processes  monte_carlo  kontorovich.aryeh  kith_and_kin 
17 days ago
[1506.04513] Convex Risk Minimization and Conditional Probability Estimation
"This paper proves, in very general settings, that convex risk minimization is a procedure to select a unique conditional probability model determined by the classification problem. Unlike most previous work, we give results that are general enough to include cases in which no minimum exists, as occurs typically, for instance, with standard boosting algorithms. Concretely, we first show that any sequence of predictors minimizing convex risk over the source distribution will converge to this unique model when the class of predictors is linear (but potentially of infinite dimension). Secondly, we show the same result holds for \emph{empirical} risk minimization whenever this class of predictors is finite dimensional, where the essential technical contribution is a norm-free generalization bound."
to:NB  learning_theory  statistics  density_estimation  re:AoS_project 
17 days ago
[1505.00869] On the Feasibility of Distributed Kernel Regression for Big Data
"In modern scientific research, massive datasets with huge numbers of observations are frequently encountered. To facilitate the computational process, a divide-and-conquer scheme is often used for the analysis of big data. In such a strategy, a full dataset is first split into several manageable segments; the final output is then averaged from the individual outputs of the segments. Despite its popularity in practice, it remains largely unknown that whether such a distributive strategy provides valid theoretical inferences to the original data. In this paper, we address this fundamental issue for the distributed kernel regression (DKR), where the algorithmic feasibility is measured by the generalization performance of the resulting estimator. To justify DKR, a uniform convergence rate is needed for bounding the generalization error over the individual outputs, which brings new and challenging issues in the big data setup. Under mild conditions, we show that, with a proper number of segments, DKR leads to an estimator that is generalization consistent to the unknown regression function. The obtained results justify the method of DKR and shed light on the feasibility of using other distributed algorithms for processing big data. The promising preference of the method is supported by both simulation and real data examples."
to:NB  computational_statistics  kernel_estimators  regression  statistics 
17 days ago
The geometry of graphs and some of its algorithmic applications (Linial, London and Rabinovich, 1995)
"In this paper we explore some implications of viewing graphs asgeometric objects. This approach offers a new perspective on a number of graph-theoretic and algorithmic problems. There are several ways to model graphs geometrically and our main concern here is with geometric representations that respect themetric of the (possibly weighted) graph. Given a graphG we map its vertices to a normed space in an attempt to (i) keep down the dimension of the host space, and (ii) guarantee a smalldistortion, i.e., make sure that distances between vertices inG closely match the distances between their geometric images.
"In this paper we develop efficient algorithms for embedding graphs low-dimensionally with a small distortion. Further algorithmic applications include:
"•A simple, unified approach to a number of problems on multicommodity flows, including the Leighton-Rao Theorem [37] and some of its extensions. We solve an open question in this area, showing that the max-flow vs. min-cut gap in thek-commodities problem isO(logk). Our new deterministic polynomial-time algorithm finds a (nearly tight) cut meeting this bound.
"•For graphs embeddable in low-dimensional spaces with a small distortion, we can find low-diameter decompositions (in the sense of [7] and [43]). The parameters of the decomposition depend only on the dimension and the distortion and not on the size of the graph.
"•In graphs embedded this way, small balancedseparators can be found efficiently.
"Given faithful low-dimensional representations of statistical data, it is possible to obtain meaningful and efficientclustering. This is one of the most basic tasks in pattern-recognition. For the (mostly heuristic) methods used in the practice of pattern-recognition, see [20], especially chapter 6.
"Our studies of multicommodity flows also imply that every embedding of (the metric of) ann-vertex, constant-degree expander into a Euclidean space (of any dimension) has distortion Ω(logn). This result is tight, and closes a gap left open by Bourgain [12]."

--- So why don't we think of communities in terms of low-diameter decompositions?
in_NB  graph_theory  dimension_reduction  mathematics  random_projections  clustering  have_read 
17 days ago
[1507.02803] Logarithmic Sobolev inequalities in discrete product spaces: a proof by a transportation cost distance
"The aim of this paper is to prove an inequality between relative entropy and the sum of average conditional relative entropies of the following form: For a fixed probability measure qn on n, ( is a finite set), and any probability measure pn=(Yn) on n, we have
where pi(⋅|y1,…,yi−1,yi+1,…,yn) and qi(⋅|x1,…,xi−1,xi+1,…,xn) denote the local specifications for pn resp. qn. The constant shall depend on the properties of the local specifications of qn"

(The abstract goes on)
to:NB  probability  information_theory  concentration_of_measure  via:ded-maxim  marton.katalin 
17 days ago
Carrots with Sausage and Rosemary Recipe -Marcia Kiesel | Food & Wine
Replaced the parsley with the adding in the green tops of the carrots, cooking for an extra minute with added vinegar. The rosemary is excessive if using dried rather than fresh.
food  recipes  have_made 
22 days ago
The DeLorean Sisters
"The DeLorean Sisters is a band that reimagines 1980’s pop, synth, and hair-metal hits as alt-country and Americana ditties... "

Observations: 1.This shouldn't work nearly as well as it does. 2. I really need to read _Retromania_.
have_listened  music  nostalgia  affectionate_parody  via:? 
22 days ago
Posterior predictive checks to quantify lack-of-fit in admixture models of latent population structure
"Admixture models are a ubiquitous approach to capture latent population structure in genetic samples. Despite the widespread application of admixture models, little thought has been devoted to the quality of the model fit or the accuracy of the estimates of parameters of interest for a particular study. Here we develop methods for validating admixture models based on posterior predictive checks (PPCs), a Bayesian method for assessing the quality of fit of a statistical model to a specific dataset. We develop PPCs for five population-level statistics of interest: within-population genetic variation, background linkage disequilibrium, number of ancestral populations, between-population genetic variation, and the downstream use of admixture parameters to correct for population structure in association studies. Using PPCs, we evaluate the quality of the admixture model fit to four qualitatively different population genetic datasets: the population reference sample (POPRES) European individuals, the HapMap phase 3 individuals, continental Indians, and African American individuals. We found that the same model fitted to different genomic studies resulted in highly study-specific results when evaluated using PPCs, illustrating the utility of PPCs for model-based analyses in large genomic studies."
to:NB  to_read  human_genetics  historical_genetics  model_checking  mixture_models  blei.david  statistics  kith_and_kin 
22 days ago
Empirical prediction intervals revisited
"Empirical prediction intervals are constructed based on the distribution of previous out-of-sample forecast errors. Given historical data, a sample of such forecast errors is generated by successively applying a chosen point forecasting model to a sequence of fixed windows of past observations and recording the associated deviations of the model predictions from the actual observations out-of-sample. The suitable quantiles of the distribution of these forecast errors are then used along with the point forecast made by the selected model to construct an empirical prediction interval. This paper re-examines the properties of the empirical prediction interval. Specifically, we provide conditions for its asymptotic validity, evaluate its small sample performance and discuss its limitations."

--- From examining the proofs, the only real use of stationary ergodicity of the original processes is to get stationary ergodicity of finite-window prediction errors. Hence this would work for a non-stationary process which admitted stationary-ergodic forecasting errors.
in_NB  time_series  have_read  prediction  confidence_sets  statistics  statistical_inference_for_stochastic_processes 
23 days ago
R interface to the Gastner-Newman cartogram code. I haven't tried it out yet.
R  visual_display_of_quantitative_information  cartograms 
25 days ago
Report: Careers outside of academia are richly rewarding for Ph.D. physicists | EurekAlert! Science News
Is it just me, or does this sound a bit like someone at a SLAC extolling the virtues of a liberal arts degree?
academia  physics  via:cw 
25 days ago
[1301.1273] Bayes and Frequentism: a Particle Physicist's perspective
Decent, but unspectacular. More Bayes-friendly than I'd be (of course).
to:NB  statistics  particle_physics 
29 days ago
Smith, A.T: The Political Machine: Assembling Sovereignty in the Bronze Age Caucasus. (eBook and Hardcover)
"The Political Machine investigates the essential role that material culture plays in the practices and maintenance of political sovereignty. Through an archaeological exploration of the Bronze Age Caucasus, Adam Smith demonstrates that beyond assemblies of people, polities are just as importantly assemblages of things—from ballots and bullets to crowns, regalia, and licenses. Smith looks at the ways that these assemblages help to forge cohesive publics, separate sovereigns from a wider social mass, and formalize governance—and he considers how these developments continue to shape politics today.
"Smith shows that the formation of polities is as much about the process of manufacturing assemblages as it is about disciplining subjects, and that these material objects or “machines” sustain communities, orders, and institutions. The sensibilities, senses, and sentiments connecting people to things enabled political authority during the Bronze Age and fortify political power even in the contemporary world. Smith provides a detailed account of the transformation of communities in the Caucasus, from small-scale early Bronze Age villages committed to egalitarianism, to Late Bronze Age polities predicated on radical inequality, organized violence, and a centralized apparatus of rule."

--- I guess Deleuzian archaeology is a thing?
to:NB  books:noted  archaeology  state-building 
29 days ago
Henrich, J.: The Secret of Our Success: How Culture Is Driving Human Evolution, Domesticating Our Species, and Making Us Smarter. (eBook and Hardcover)
"Humans are a puzzling species. On the one hand, we struggle to survive on our own in the wild, often unable to solve basic problems, like obtaining food, building shelters, or avoiding predators. On the other hand, human groups have produced innovative technologies, sophisticated languages, and complex institutions that have permitted us to successfully expand into environments across the globe. What has enabled us to dominate such a vast range of environments, more than any other species? As this book shows, the secret of our success lies not in our innate intelligence, but in our collective brains—in the ability of human groups to socially interconnect and learn from one another.
"Drawing insights from lost European explorers, clever chimpanzees, hunter-gatherers, neuroscientists, ancient bones, and the human genome, Joseph Henrich demonstrates how our collective brains have propelled our species’ genetic evolution and shaped our biology. Our early capacities for learning from others produced many innovations, such as fire, cooking, water containers, plant knowledge, and projectile weapons, which in turn drove the expansion of our brains and altered our physiology, anatomy, and psychology in crucial ways. Further on, some collective brains generated and recombined powerful concepts, such as the lever, wheel, screw, and writing. Henrich shows how our genetics and biology are inextricably interwoven with cultural evolution, and that this particular culture-gene interaction has propelled our species on an extraordinary evolutionary trajectory."
to:NB  books:noted  human_evolution  collective_cognition  social_life_of_the_mind  cultural_transmission_of_cognitive_tools  popular_social_science 
29 days ago
Plant Sensing and Communication, Karban
"In Plant Sensing and Communication, Richard Karban provides the first comprehensive overview of what is known about how plants perceive their environments, communicate those perceptions, and learn. Facing many of the same challenges as animals, plants have developed many similar capabilities: they sense light, chemicals, mechanical stimulation, temperature, electricity, and sound. Moreover, prior experiences have lasting impacts on sensitivity and response to cues; plants, in essence, have memory. Nor are their senses limited to the processes of an individual plant: plants eavesdrop on the cues and behaviors of neighbors and—for example, through flowers and fruits—exchange information with other types of organisms. Far from inanimate organisms limited by their stationary existence, plants, this book makes unquestionably clear, are in constant and lively discourse."
to:NB  books:noted  cognitive_science  biology  botany 
29 days ago
Margetts, H., John, P., Hale, S., Yasseri, T.: Political Turbulence: How Social Media Shape Collective Action. (eBook and Hardcover)
"As people spend increasing proportions of their daily lives using social media, such as Twitter and Facebook, they are being invited to support myriad political causes by sharing, liking, endorsing, or downloading. Chain reactions caused by these tiny acts of participation form a growing part of collective action today, from neighborhood campaigns to global political movements. Political Turbulence reveals that, in fact, most attempts at collective action online don’t succeed, but some give rise to huge mobilizations—even revolutions.
"Drawing on large-scale data generated from the Internet and real-world events, this book shows how mobilizations that succeed are unpredictable, unstable, and often unsustainable. To better understand this unruly new force in the political world, the authors use experiments that test how social media influence citizens deciding whether or not to participate. They show how different personality types react to these social influences and identify which types of people are willing to participate at an early stage in a mobilization when there are few supporters or signals of viability. The authors argue that pluralism is the model of democracy that is emerging in the social media age—not the ordered, organized vision of early pluralists, but a chaotic, turbulent form of politics."
to:NB  books:noted  social_media  networked_life  political_science  to_be_shot_after_a_fair_trial  collective_action 
4 weeks ago
Patel, K.: The New Deal: A Global History. (eBook and Hardcover)
"The New Deal: A Global History provides a radically new interpretation of a pivotal period in U.S. history. The first comprehensive study of the New Deal in a global context, the book compares American responses to the international crisis of capitalism and democracy during the 1930s to responses by other countries around the globe—not just in Europe but also in Latin America, Asia, and other parts of the world. Work creation, agricultural intervention, state planning, immigration policy, the role of mass media, forms of political leadership, and new ways of ruling America’s colonies—all had parallels elsewhere and unfolded against a backdrop of intense global debates.
"By avoiding the distortions of American exceptionalism, Kiran Klaus Patel shows how America’s reaction to the Great Depression connected it to the wider world. Among much else, the book explains why the New Deal had enormous repercussions on China; why Franklin D. Roosevelt studied the welfare schemes of Nazi Germany; and why the New Dealers were fascinated by cooperatives in Sweden—but ignored similar schemes in Japan."
to:NB  books:noted  american_history  comparative_history 
4 weeks ago
Andrade, T.: The Gunpowder Age: China, Military Innovation, and the Rise of the West in World History. (Hardcover)
"The Chinese invented gunpowder and began exploring its military uses as early as the 900s, four centuries before the technology passed to the West. But by the early 1800s, China had fallen so far behind the West in gunpowder warfare that it was easily defeated by Britain in the Opium War of 1839–42. What happened? In The Gunpowder Age, Tonio Andrade offers a compelling new answer, opening a fresh perspective on a key question of world history: why did the countries of western Europe surge to global importance starting in the 1500s while China slipped behind?
"Historians have long argued that gunpowder weapons helped Europeans establish global hegemony. Yet the inhabitants of what is today China not only invented guns and bombs but also, Andrade shows, continued to innovate in gunpowder technology through the early 1700s—much longer than previously thought. Why, then, did China become so vulnerable? Andrade argues that one significant reason is that it was out of practice fighting wars, having enjoyed nearly a century of relative peace, since 1760. Indeed, he demonstrates that China—like Europe—was a powerful military innovator, particularly during times of great warfare, such as the violent century starting after the Opium War, when the Chinese once again quickly modernized their forces. Today, China is simply returning to its old position as one of the world’s great military powers."
to:NB  books:noted  china  early_modern_world_history  gunpowder  mother_courage_raises_the_west  imperialism 
4 weeks ago
« earlier      
academia afghanistan agent-based_models american_history ancient_history archaeology art bad_data_analysis bad_science_journalism bayesian_consistency bayesianism biochemical_networks blogged book_reviews books:noted books:owned books:recommended bootstrap cartoons causal_inference causality central_asia central_limit_theorem class_struggles_in_america classifiers climate_change clustering cognitive_science collective_cognition comics community_discovery complexity computational_statistics concentration_of_measure confidence_sets corruption coveted crime cthulhiana cultural_criticism cultural_exchange data_analysis data_mining debunking decision-making decision_theory delong.brad democracy density_estimation dimension_reduction distributed_systems dynamical_systems econometrics economic_history economic_policy economics education empirical_processes ensemble_methods entropy_estimation epidemic_models epistemology ergodic_theory estimation evisceration evolution_of_cooperation evolutionary_biology experimental_psychology finance financial_crisis_of_2007-- financial_markets financial_speculation fmri food fraud functional_connectivity funny funny:academic funny:geeky funny:laughing_instead_of_screaming funny:malicious funny:pointed genetics graph_theory graphical_models have_read heard_the_talk heavy_tails high-dimensional_statistics hilbert_space history_of_ideas history_of_science human_genetics hypothesis_testing ideology imperialism in_nb inequality inference_to_latent_objects information_theory institutions kernel_estimators kernel_methods kith_and_kin krugman.paul large_deviations lasso learning_theory liberman.mark likelihood linguistics literary_criticism machine_learning macro_from_micro macroeconomics manifold_learning market_failures_in_everything markov_models mathematics mixing mixture_models model_selection modeling modern_ruins monte_carlo moral_psychology moral_responsibility mortgage_crisis natural_history_of_truthiness network_data_analysis networked_life networks neural_data_analysis neuroscience non-equilibrium nonparametrics optimization our_decrepit_institutions philosophy philosophy_of_science photos physics pittsburgh political_economy political_science practices_relating_to_the_transmission_of_genetic_information prediction pretty_pictures principal_components probability programming progressive_forces psychology r racism random_fields re:almost_none re:aos_project re:democratic_cognition re:do-institutions-evolve re:g_paper re:homophily_and_confounding re:network_differences re:smoothing_adjacency_matrices re:social_networks_as_sensor_networks re:stacs re:your_favorite_dsge_sucks recipes regression running_dogs_of_reaction science_as_a_social_process science_fiction simulation social_influence social_life_of_the_mind social_media social_networks social_science_methodology sociology something_about_america sparsity spatial_statistics state-space_models statistical_inference_for_stochastic_processes statistical_mechanics statistics stochastic_processes text_mining the_american_dilemma the_continuing_crises time_series to:blog to:nb to_be_shot_after_a_fair_trial to_read to_teach:complexity-and-inference to_teach:data-mining to_teach:statcomp to_teach:undergrad-ada track_down_references us-iraq_war us_politics utter_stupidity vast_right-wing_conspiracy via:? via:henry_farrell via:jbdelong via:klk visual_display_of_quantitative_information whats_gone_wrong_with_america why_oh_why_cant_we_have_a_better_academic_publishing_system why_oh_why_cant_we_have_a_better_press_corps

Copy this bookmark: