The Sharing Economy Isn’t About Sharing at All - HBR
Well, yes, obviously. (I have used Zipcar regularly for years, but it would never have occurred to me that it was some form of _sharing_; it's a car _rental_ company which is a lot more convenient for me than the older ones.) Something like Uber or Airbnb makes its money by being the centralized intermediary between consumers and asset owners/service workers. (The goal would be to become the only effective marketplace for that sort of good or service --- would that be the monagorist? --- and so collect rents.) I guess I'd supposed/hoped that people actually in the industry realized this, and the "sharing" rhetoric was conscious camouflage, but this article makes it sound like they believe their own press.
corporations  networked_life  marketing  market_making  economics  have_read  via:wh
18 hours ago
The cultural evolution of mind reading
"It is not just a manner of speaking: “Mind reading,” or working out what others are thinking and feeling, is markedly similar to print reading. Both of these distinctly human skills recover meaning from signs, depend on dedicated cortical areas, are subject to genetically heritable disorders, show cultural variation around a universal core, and regulate how people behave. But when it comes to development, the evidence is conflicting. Some studies show that, like learning to read print, learning to read minds is a long, hard process that depends on tuition. Others indicate that even very young, nonliterate infants are already capable of mind reading. Here, we propose a resolution to this conflict. We suggest that infants are equipped with neurocognitive mechanisms that yield accurate expectations about behavior (“automatic” or “implicit” mind reading), whereas “explicit” mind reading, like literacy, is a culturally inherited skill; it is passed from one generation to the next by verbal instruction."

--- ETA after reading: interesting and not crazy, though not completely convincing; I'd need to think carefully, and look at their references, to decide how much of this is about mind-reading, the activity, vs. talking about mind-reading. (It's also interesting to imagine the psychological theories we might have if literacy were a cultural universal, which it could well be in a century or two.)
to:NB  have_read  cognitive_science  cognitive_development  cultural_transmission_of_cognitive_tools  theory_of_mind
yesterday
[1411.6179] Spatiotemporal Detection of Unusual Human Population Behavior Using Mobile Phone Data
"With the aim to contribute to humanitarian response to disasters and violent events, scientists have proposed the development of analytical tools that could identify emergency events in real-time, using mobile phone data. The assumption is that dramatic and discrete changes in behavior, measured with mobile phone data, will indicate extreme events. In this study, we propose an efficient system for spatiotemporal detection of behavioral anomalies from mobile phone data and compare sites with behavioral anomalies to an extensive database of emergency and non-emergency events in Rwanda. Our methodology successfully captures anomalous behavioral patterns associated with a broad range of events, from religious and official holidays to earthquakes, floods, violence against civilians and protests. Our results suggest that human behavioral responses to extreme events are complex and multi-dimensional, including extreme increases and decreases in both calling and movement behaviors. We also find significant temporal and spatial variance in responses to extreme events. Our behavioral anomaly detection system and extensive discussion of results are a significant contribution to the long-term project of creating an effective real-time event detection system with mobile phone data and we discuss the implications of our findings for future research to this end. "
to:NB  spatio-temporal_statistics  re:social_networks_as_sensor_networks  data_mining  statistics  dobra.adrian  eagle.nathan  anomaly_detection
yesterday
Discovery: Fish Live beneath Antarctica - Scientific American
All that's missing is commentary from Drs. Lake and Danforth, and maybe the adjective "Stygian".
antarctica  biology  cthulhiana
yesterday
[1411.2664] Preserving Statistical Validity in Adaptive Data Analysis
"A great deal of effort has been devoted to reducing the risk of spurious scientific discoveries, from the use of sophisticated validation techniques, to deep statistical methods for controlling the false discovery rate in multiple hypothesis testing. However, there is a fundamental disconnect between the theoretical results and the practice of data analysis: the theory of statistical inference assumes a fixed collection of hypotheses to be tested, or learning algorithms to be applied, selected non-adaptively before the data are gathered, whereas in practice data is shared and reused with hypotheses and new analyses being generated on the basis of data exploration and the outcomes of previous analyses.
"In this work we initiate a principled study of how to guarantee the validity of statistical inference in adaptive data analysis. As an instance of this problem, we propose and investigate the question of estimating the expectations of m adaptively chosen functions on an unknown distribution given n random samples.
"We show that, surprisingly, there is a way to estimate an \emph{exponential} in n number of expectations accurately even if the functions are chosen adaptively. This gives an exponential improvement over standard empirical estimators that are limited to a linear number of estimates. Our result follows from a general technique that counter-intuitively involves actively perturbing and coordinating the estimates, using techniques developed for privacy preservation. We give additional applications of this technique to our question."
to:NB  to_read  statistics  learning_theory  via:arthegall  concentration_of_measure  stability_of_learning
yesterday
Sociolinguistic Typology: Social Determinants of Linguistic Complexity
"Peter Trudgill looks at why human societies at different times and places produce different kinds of language. He considers how far social factors influence language structure and compares languages and dialects spoken across the globe, from Vietnam to Nigeria, Polynesia to Scandinavia, and from Canada to Amazonia.
"Modesty prevents Pennsylvanian Dutch Mennonites using the verb wotte ('want'); stratified society lies behind complicated Japanese honorifics; and a mountainous homeland suggests why speakers of Tibetan-Burmese Lahu have words for up there and down there. But culture and environment don't explain why Amazonian Jarawara needs three past tenses, nor why Nigerian Igbo can make do with eight adjectives, nor why most languages spoken in high altitudes do not exhibit an array of spatial demonstratives. Nor do they account for some languages changing faster than others or why some get more complex while others get simpler. The author looks at these and many other puzzles, exploring the social, linguistic, and other factors that might explain them and in the context of a huge range of languages and societies."
to:NB  books:noted  cultural_evolution  linguistics  complexity
yesterday
The C&O Canal Companion
"A comprehensive guide to one of America's unique national parks, The C&O Canal Companion takes readers on a mile-by-mile, lock-by-lock tour of the 184-mile Potomac River waterway and towpath that stretches from Washington, D.C., to Cumberland, Maryland, and the Allegheny Mountains. Making extensive use of records at the National Archives and the C&O Canal Park Headquarters, Mike High demonstrates how events and places along the canal relate to the history of the nation, from Civil War battles and river crossings to the frontier forts guarding the route to the West. Using attractive photographs and drawings, he introduces park visitors to the hidden history along the canal and provides practical advice on cycling, paddling, and hiking—all the information needed to fully enjoy the park's varied delights.
"Thoroughly overhauled and expanded, the second edition of this popular, fact-packed book features updated maps and photographs, as well as the latest information on lodgings and other facilities for hikers, bikers, and campers on weekend excursions or extended outdoor vacations. It also delves deeper into the history of the upland region, relaying new narratives about Native American settlements, the European explorers and traders who were among the first settlers, and the lives of slaves and free blacks who lived along or escaped slavery via the canal.
"Visitors to the C&O Canal who are interested in exploring natural wonders while tracing the routes of pioneers and engineers—not to mention the path of George Washington, who explored the Potomac route to the West as a young man and later laid out the first canals to make the river navigable—will find this guide indispensable."
books:noted  maryland  appalachia  travel  american_history  re:GAP_trail_trip
yesterday
Abandoned Footnotes: The Saudi Monarchy as a Family Firm
"Indeed, in some respects the Saudi system has more in common with systems of single party rule than with medieval European kingship. The Al Saud are an odd party, to be sure; only women can join voluntarily (by marrying into the family) but without gaining any formal power (though they may have influence through their sons). But, with its internal dispute resolution mechanisms, its intelligence networks, its “service” requirements, the family basically mimics the institutions of an effective (if small) party on the Leninist model. And thus the incentives that keep it in power are not dissimilar from the incentives that kept the PRI in Mexico or the Chinese Communist Party in power: they are basically reasons for insiders to stick together and not seek outsider support, and thus to prefer corporate control of the state to going alone."
3 days ago
Pop Sonnets
Ogged: "That Website Than Which No Greater Can Be Conceived".
funny  poetry  popular_culture  affectionate_parody  via:unfogged
3 days ago
Of Course You Hear What I Hear — Christmas Music Season Is Totally Data-Driven | FiveThirtyEight
Observations: (i) The domination of our popular culture by the childhoods of baby boomers --- my students' grandparents! --- is truly a force to behold. (ii) And of course that helps shape what all subsequent generations hear as "Christmas music". (iii) I am unreasonably charmed by the idea of using something like pagerank (or is it Kleinberg's HITS?) to identify Christmas-ness. (iv) Wait, _that's_ what happened to the author of _The War Against Silence_?
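(For my own reference, since I keep forgetting which is which: PageRank is just power iteration on a damped transition matrix. A generic sketch --- nothing to do with whatever FiveThirtyEight actually computed; the toy graph and damping factor are made up:)

```python
import numpy as np

def pagerank(A, d=0.85, tol=1e-10):
    """Power iteration for PageRank scores on an adjacency matrix A,
    where A[i, j] = 1 means node i links to node j."""
    n = A.shape[0]
    out = A.sum(axis=1, keepdims=True)
    out[out == 0] = 1.0   # avoid 0/0 on dangling nodes (their mass leaks; fine for a sketch)
    P = A / out           # row-stochastic transition matrix
    r = np.full(n, 1.0 / n)
    while True:
        r_new = (1 - d) / n + d * (r @ P)
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new

# toy graph: both 0 and 1 link to 2, and 2 links back to 0
A = np.array([[0, 0, 1],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)
scores = pagerank(A)      # node 2, linked to by both others, comes out on top
```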
data_mining  music  christmas  popular_culture  towards_an_algorithmic_criticism  to_teach:data-mining  path_dependence  pagerank  to:blog  have_read
6 days ago
Numenera
Because what the Dying Earth/Viriconium/Urth etc. needed was a role-playing game. (Actually, it looks pretty good.)
role-playing_games  via:???
6 days ago
r - knitr - How to align code and plot side by side - Stack Overflow
Could this be modified to put figure on top, then code, then caption?
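One thing to try (an untested sketch on my part, using knitr's `ref.label` chunk-reuse option): evaluate the chunk without echoing it, so the figure appears first; then echo its source in a second chunk without re-evaluating; then write the caption as ordinary text below.

````markdown
```{r fig-first, echo=FALSE}
plot(pressure)
```

```{r fig-first-src, ref.label="fig-first", eval=FALSE}
```

Figure 1: the caption, written as plain text under the echoed code.
````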
6 days ago
A practical introduction to functional programming at Mary Rose Cook
Uses python, but these ideas are exactly the ones I try to teach in that part of my R course, only better expressed.
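(The sort of thing I mean, in a sketch of my own rather than one of Cook's examples: replacing an index-driven loop and its mutable accumulator with composed pure functions.)

```python
from functools import reduce

# Imperative style: accumulate state in an explicit loop
def mean_of_squares_loop(xs):
    total = 0
    for x in xs:
        total += x ** 2
    return total / len(xs)

# Functional style: compose pure functions, no mutable state
def mean_of_squares_fp(xs):
    return reduce(lambda a, b: a + b, map(lambda x: x ** 2, xs)) / len(xs)
```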
7 days ago
How To Tell If You Are In A High Fantasy Novel
Funny, but really this is a criticism of extruded epic fantasy product, rather than high fantasy proper; it's about mistaking Poughkeepsie for Elfland.
funny:geeky  funny:malicious  literary_criticism  fantasy
7 days ago
Dehling, Durieu, Tusche: Approximating class approach for empirical processes of dependent sequences indexed by functions
"We study weak convergence of empirical processes of dependent data (Xi)i≥0, indexed by classes of functions. Our results are especially suitable for data arising from dynamical systems and Markov chains, where the central limit theorem for partial sums of observables is commonly derived via the spectral gap technique. We are specifically interested in situations where the index class  is different from the class of functions f for which we have good properties of the observables (f(Xi))i≥0. We introduce a new bracketing number to measure the size of the index class  which fits this setting. Our results apply to the empirical process of data (Xi)i≥0 satisfying a multiple mixing condition. This includes dynamical systems and Markov chains, if the Perron–Frobenius operator or the Markov operator has a spectral gap, but also extends beyond this class, for example, to ergodic torus automorphisms."
to:NB  empirical_processes  approximation  stochastic_processes  markov_models  dynamical_systems  ergodic_theory  mixing
7 days ago
Lederer, van de Geer: New concentration inequalities for suprema of empirical processes
"While effective concentration inequalities for suprema of empirical processes exist under boundedness or strict tail assumptions, no comparable results have been available under considerably weaker assumptions. In this paper, we derive concentration inequalities assuming only low moments for an envelope of the empirical process. These concentration inequalities are beneficial even when the envelope is much larger than the single functions under consideration."
to:NB  empirical_processes  concentration_of_measure  deviation_inequalities  van_de_geer.sara  stochastic_processes  to_read
7 days ago
Crisan, Míguez: Particle-kernel estimation of the filter density in state-space models
"Sequential Monte Carlo (SMC) methods, also known as particle filters, are simulation-based recursive algorithms for the approximation of the a posteriori probability measures generated by state-space dynamical models. At any given time t, a SMC method produces a set of samples over the state space of the system of interest (often termed “particles”) that is used to build a discrete and random approximation of the posterior probability distribution of the state variables, conditional on a sequence of available observations. One potential application of the methodology is the estimation of the densities associated to the sequence of a posteriori distributions. While practitioners have rather freely applied such density approximations in the past, the issue has received less attention from a theoretical perspective. In this paper, we address the problem of constructing kernel-based estimates of the posterior probability density function and its derivatives, and obtain asymptotic convergence results for the estimation errors. In particular, we find convergence rates for the approximation errors that hold uniformly on the state space and guarantee that the error vanishes almost surely as the number of particles in the filter grows. Based on this uniform convergence result, we first show how to build continuous measures that converge almost surely (with known rate) toward the posterior measure and then address a few applications. The latter include maximum a posteriori estimation of the system state using the approximate derivatives of the posterior density and the approximation of functionals of it, for example, Shannon’s entropy."
to:NB  particle_filters  kernel_estimators  density_estimation  filtering  state_estimation  state-space_models  statistics  computational_statistics
7 days ago
Trashorras, Wintenberger: Large deviations for bootstrapped empirical measures
"We investigate the Large Deviations (LD) properties of bootstrapped empirical measures with exchangeable weights. Our main results show in great generality how the resulting rate functions combine the LD properties of both the sample weights and the observations. As an application, we obtain new LD results and discuss both conditional and unconditional LD-efficiency for many classical choices of entries such as Efron’s, leave-p-out, i.i.d. weighted, k-blocks bootstraps, etc."
to:NB  bootstrap  empirical_processes  large_deviations  stochastic_processes  statistics  re:almost_none
7 days ago
Fischer: On the form of the large deviation rate function for the empirical measures of weakly interacting systems
"A basic result of large deviations theory is Sanov’s theorem, which states that the sequence of empirical measures of independent and identically distributed samples satisfies the large deviation principle with rate function given by relative entropy with respect to the common distribution. Large deviation principles for the empirical measures are also known to hold for broad classes of weakly interacting systems. When the interaction through the empirical measure corresponds to an absolutely continuous change of measure, the rate function can be expressed as relative entropy of a distribution with respect to the law of the McKean–Vlasov limit with measure-variable frozen at that distribution. We discuss situations, beyond that of tilted distributions, in which a large deviation principle holds with rate function in relative entropy form."
to:NB  interacting_particle_systems  large_deviations  information_theory  stochastic_processes  to_read  re:almost_none
7 days ago
Blanchard, Delattre, Roquain: Testing over a continuum of null hypotheses with False Discovery Rate control
"We consider statistical hypothesis testing simultaneously over a fairly general, possibly uncountably infinite, set of null hypotheses, under the assumption that a suitable single test (and corresponding p-value) is known for each individual hypothesis. We extend to this setting the notion of false discovery rate (FDR) as a measure of type I error. Our main result studies specific procedures based on the observation of the p-value process. Control of the FDR at a nominal level is ensured either under arbitrary dependence of p-values, or under the assumption that the finite dimensional distributions of the p-value process have positive correlations of a specific type (weak PRDS). Both cases generalize existing results established in the finite setting. Its interest is demonstrated in several non-parametric examples: testing the mean/signal in a Gaussian white noise model, testing the intensity of a Poisson process and testing the c.d.f. of i.i.d. random variables."
to:NB  multiple_testing  hypothesis_testing  statistics
7 days ago
Lions, Nisio: A uniqueness result for the semigroup associated with the Hamilton-Jacobi-Bellman operator
b/c the Feng and Kurtz book on large deviations for stochastic processes (e.g., evolutionary models) seems to presume the reader knows what a "Nisio semigroup" is, and I don't.

ETA: I should have known better than to expect reading pure mathematicians would be clarifying.
8 days ago
[1312.7851] Effective Degrees of Freedom: A Flawed Metaphor
"To most applied statisticians, a fitting procedure's degrees of freedom is synonymous with its model complexity, or its capacity for overfitting to data. In particular, it is often used to parameterize the bias-variance tradeoff in model selection. We argue that, contrary to folk intuition, model complexity and degrees of freedom are not synonymous and may correspond very poorly. We exhibit and theoretically explore various examples of fitting procedures for which degrees of freedom is not monotonic in the model complexity parameter, and can exceed the total dimension of the response space. Even in very simple settings, the degrees of freedom can exceed the dimension of the ambient space by an arbitrarily large amount. We show the degrees of freedom for any non-convex projection method can be unbounded."

--- I have never really liked "degrees of freedom"...

--- ETA after reading: to be clear, no one is arguing about "effective degrees of freedom", in the sense of Efron (1986), telling us about over-fitting. The demonstrations here are that the geometric metaphor behind "degrees of freedom", while holding for linear models (without model selection), becomes very misleading in other contexts. Now, since I prefer to think of model selection in terms of capacity to over-fit, rather than the number of adjustable knobs...
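(Since I prefer Efron's covariance definition, a quick Monte Carlo sketch of my own, checking that for a linear smoother it recovers the trace of the hat matrix; all sizes and seeds made up:)

```python
import numpy as np

rng = np.random.default_rng(1)

# Efron (1986): df = sum_i Cov(yhat_i, y_i) / sigma^2.
# For OLS, a linear smoother, this equals trace of the hat matrix, i.e. p.
n, p, sigma = 50, 5, 1.0
X = rng.normal(size=(n, p))
mu = X @ rng.normal(size=p)                    # fixed true regression function
H = X @ np.linalg.inv(X.T @ X) @ X.T           # OLS hat matrix

reps = 4000
ys = mu + sigma * rng.normal(size=(reps, n))   # many replicate data sets
yhats = ys @ H.T                               # fitted values for each replicate
cov = ((ys - ys.mean(0)) * (yhats - yhats.mean(0))).sum(0) / (reps - 1)
df_hat = cov.sum() / sigma**2                  # should be close to trace(H) = 5
```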
8 days ago
Fully Exponential Laplace Approximations to Expectations and Variances of Nonpositive Functions (Tierney, Kass and Kadane, 1989)
The un-numbered equation defining $A_K$, between (2.3) and (2.4), is wrong --- the O(1/n) terms are off by various powers of $\sigma^2$. However, both (2.3) and (2.4) are right...
in_NB  have_read  laplace_approximation  statistics  approximation  kith_and_kin  kass.robert
8 days ago
[1410.1184] Graphical LASSO Based Model Selection for Time Series
"We propose a novel graphical model selection (GMS) scheme for high-dimensional stationary time series or discrete time process. The method is based on a natural generalization of the graphical LASSO (gLASSO), introduced originally for GMS based on i.i.d. samples, and estimates the conditional independence graph (CIG) of a time series from a finite length observation. The gLASSO for time series is defined as the solution of an l1-regularized maximum (approximate) likelihood problem. We solve this optimization problem using the alternating direction method of multipliers (ADMM). Our approach is nonparametric as we do not assume a finite dimensional (e.g., an autoregressive) parametric model for the observed process. Instead, we require the process to be sufficiently smooth in the spectral domain. For Gaussian processes, we characterize the performance of our method theoretically by deriving an upper bound on the probability that our algorithm fails to correctly identify the CIG. Numerical experiments demonstrate the ability of our method to recover the correct CIG from a limited amount of samples."
to:NB  graphical_models  time_series  model_selection  statistics
8 days ago
[1411.6512] Graphical Modeling of Spatial Health Data
"The literature on Gaussian graphical models (GGMs) contains two equally rich and equally significant domains of research efforts and interests. The first research domain relates to the problem of graph determination. That is, the underlying graph is unknown and needs to be inferred from the data. The second research domain dominates the applications in spatial epidemiology. In this context GGMs are typically referred to as Gaussian Markov random fields (GMRFs). Here the underlying graph is assumed to be known: the vertices correspond to geographical areas, while the edges are associated with areas that are considered to be neighbors of each other (e.g., if they share a border). We introduce multi-way Gaussian graphical models that unify the statistical approaches to inference for spatiotemporal epidemiology with the literature on general GGMs. The novelty of the proposed work consists of the addition of the G-Wishart distribution to the substantial collection of statistical tools used to model multivariate areal data. As opposed to fixed graphs that describe geography, there is an inherent uncertainty related to graph determination across the other dimensions of the data. Our new class of methods for spatial epidemiology allow the simultaneous use of GGMs to represent known spatial dependencies and to determine unknown dependencies in the other dimensions of the data."
to:NB  spatial_statistics  spatio-temporal_statistics  graphical_models  statistics  dobra.adrian
8 days ago
Beyond Models: Forecasting Complex Network Processes Directly from Data
"Complex network phenomena – such as information cascades in online social networks – are hard to fully observe, model, and forecast. In forecasting, a recent trend has been to forgo the use of parsimonious models in favor of models with in- creasingly large degrees of freedom that are trained to learn the behavior of a process from historical data. Extrapolat- ing this trend into the future, eventually we would like to renounce models all together. But is it possible to forecast the evolution of a complex stochastic process directly from the data without a model? In this work we show that the answer is yes. We present SED, an algorithm that forecasts process statistics based on relationships of statistical equiv- alence using two general axioms and historical data. To the best of our knowledge, SED is the first method that can perform axiomatic, model-free forecasts of complex stochas- tic processes. Our simulations using simple and complex evolving processes and tests performed on a large real-world dataset show promising results."

--- The last tag applies with extreme vehemence.
to:NB  network_data_analysis  information_cascades  time_series  bootstrap  statistics  prediction  to_be_shot_after_a_fair_trial
8 days ago
[1405.0058] Underestimating extreme events in power-law behavior due to machine-dependent cutoffs
"Power-law distributions are typical macroscopic features occurring in almost all complex systems observable in nature. As a result, researchers in quantitative analyses must often generate random synthetic variates obeying power-law distributions. The task is usually performed through standard methods that map uniform random variates into the desired probability space. Whereas all these algorithms are theoretically solid, in this paper we show that they are subject to severe machine-dependent limitations. As a result, two dramatic consequences arise: (i) the sampling in the tail of the distribution is not random but deterministic; (ii) the moments of the sample distribution, which are theoretically expected to diverge as functions of the sample sizes, converge instead to finite values. We provide quantitative indications for the range of distribution parameters that can be safely handled by standard libraries used in computational analyses. Whereas our findings indicate possible reinterpretations of numerical results obtained through flawed sampling methodologies, they also pave the way for the search for a concrete solution to this central issue shared by all quantitative sciences dealing with complexity."
to:NB  to_read  heavy_tails  approximation  computational_statistics  have_skimmed
8 days ago
[1411.3984] Qualitative Robustness in Bayesian Inference
"We develop a framework for quantifying the sensitivity of the distribution of posterior distributions with respect to perturbations of the prior and data generating distributions in the limit when the number of data points grows towards infinity. In this generalization of Hampel and Cuevas' notion of qualitative robustness to Bayesian inference, posterior distributions are analyzed as measure-valued random variables (measures randomized through the data) and their robustness is quantified using the total variation, Prokhorov, and Ky Fan metrics. Our results show that (1) the assumption that the prior has Kullback-Leibler support at the parameter value generating the data, classically used to prove consistency, can also be used to prove the non-robustness of posterior distributions with respect to infinitesimal perturbations (in total variation metric) of the class of priors satisfying that assumption, (2) for a prior which has global Kullback-Leibler support on a space which is not totally bounded, we can establish non qualitative robustness and (3) consistency and robustness are, to some degree, antagonistic requirements and a careful selection of the prior is important if both properties (or their approximations) are to be achieved.
"The mechanisms supporting our results are different and complementary to those discovered by Hampel and developed by Cuevas, and also indicate that misspecification generates non qualitative robustness."
to:NB  bayesianism  bayesian_consistency  misspecification  statistics
8 days ago
[1411.2755] Estimating causal structure using conditional DAG models
"This paper considers inference of causal structure in a class of graphical models called "conditional DAGs". These are directed acyclic graph (DAG) models with two kinds of variables, primary and secondary. The secondary variables are used to aid in estimation of causal relationships between the primary variables. We give causal semantics for this model class and prove that, under certain assumptions, the direction of causal influence is identifiable from the joint observational distribution of the primary and secondary variables. A score-based approach is developed for estimation of causal structure using these models and consistency results are established. Empirical results demonstrate gains compared with formulations that treat all variables on an equal footing, or that ignore secondary variables. The methodology is motivated by applications in molecular biology and is illustrated here using simulated data and in an analysis of proteomic data from the Cancer Genome Atlas."
to:NB  graphical_models  causal_inference  causal_discovery  statistics  to_read
8 days ago
[1406.5986] A Statistical Perspective on Randomized Sketching for Ordinary Least-Squares
"We consider statistical aspects of solving large-scale least-squares (LS) problems using randomized sketching algorithms. Prior work has typically adopted an \emph{algorithmic perspective}, in that it has made no statistical assumptions on the input X and Y, and instead it has assumed that the data (X,Y) are fixed and worst-case. In this paper, we adopt a \emph{statistical perspective}, and we consider the mean-squared error performance of randomized sketching algorithms, when data (X,Y) are generated according to a statistical linear model Y=Xβ+ϵ, where ϵ is a noise process. To do this, we first develop a framework for assessing, in a unified manner, algorithmic and statistical aspects of randomized sketching methods. We then consider the statistical predicition efficiency (SPE) and the statistical residual efficiency (SRE) of the sketched LS estimator; and we use our framework to provide results for several types of random projection and random sampling sketching algorithms. Among other results, we show that the SRE can be bounded when p≲r≪n but that the SPE typically requires the sample size r to be substantially larger. Our theoretical results reveal that, depending on the specifics of the situation, leverage-based sampling methods can perform as well as or better than projection methods. Our empirical results reveal that when r is only slightly greater than p and much less than n, projection-based methods out-perform sampling-based methods, but as r grows, sampling methods start to out-perform projection methods."
to:NB  computational_statistics  regression  linear_regression  random_projections  statistics
8 days ago
[1407.4578] Maximal Autocorrelation Functions in Functional Data Analysis
"This paper proposes a new factor rotation for the context of functional principal components analysis. This rotation seeks to re-represent a functional subspace in terms of directions of decreasing smoothness as represented by a generalized smoothing metric. The rotation can be implemented simply and we show on two examples that this rotation can improve the interpretability of the leading components."
8 days ago
The Strange Inevitability of Evolution - Issue 20: Creativity - Nautilus
Nice popularization by Philip Ball about neutral networks in evolution, and how they contribute to both robustness and finding innovations. It's obviously very strongly based on talking with Andreas (so, e.g., no mention of Gerhart and Kirschner!), but not crazily so.
evolutionary_biology  biochemical_networks  popular_science  wagner.andreas  schuster.peter  have_read  via:henry_farrell  to:blog
8 days ago
[1408.5810] Kernel-based Information Criterion
"This paper introduces Kernel-based Information Criterion (KIC) for model selection in regression analysis. The novel kernel-based complexity measure in KIC efficiently computes the interdependency between parameters of the model using a variable-wise variance and yields selection of better, more robust regressors. Experimental results show superior performance on both simulated and real data sets compared to Leave-One-Out Cross-Validation (LOOCV), kernel-based Information Complexity (ICOMP), and maximum log of marginal likelihood in Gaussian Process Regression (GPR)."
to:NB  information_criteria  model_selection  statistics  kernel_methods  nonparametrics  regression
9 days ago
[1409.3886] On a Nonparametric Notion of Residual and its Applications
"Let (X, Z) be a continuous random vector. In this paper, we define the notion of a nonparametric residual of X on Z that is always independent of the predictor Z. We study its properties and show that the proposed notion of residual matches with the usual residual (error) in a multivariate normal regression model. Given a random vector (X, Y, Z), we use this notion of residual to show that the conditional independence between X and Y, given Z, is equivalent to the mutual independence of the residuals (of X on Z and Y on Z) and Z. This result is used to develop a test for conditional independence."
to:NB  dependence_measures  regression  prediction  statistics  nonparametrics
9 days ago
[1409.0031] Tracking Dynamic Point Processes on Networks
"Cascading chains of events are a salient feature of many real-world social, biological, and financial networks. In social networks, social reciprocity accounts for retaliations in gang interactions, proxy wars in nation-state conflicts, or Internet memes shared via social media. Neuron spikes stimulate or inhibit spike activity in other neurons. Stock market shocks can trigger a contagion of volatility throughout a financial network. In these and other examples, only individual events associated with network nodes are observed, usually without knowledge of the underlying dynamic relationships between nodes. This paper addresses the challenge of tracking how events within such networks stimulate or influence future events. The proposed approach is an online learning framework well-suited to streaming data, using a multivariate Hawkes point process model to encapsulate autoregressive features of observed events within the social network. Recent work on online learning in dynamic environments is leveraged not only to exploit the dynamics within the underlying network, but also to track that network structure as it evolves. Regret bounds and experimental results demonstrate that the proposed method performs nearly as well as an oracle or batch algorithm."
to:NB  network_data_analysis  point_processes  time_series  online_learning  statistics  willett.rebecca_m.
9 days ago
[1412.3432] Detecting Overlapping Communities in Networks with Spectral Methods
"Community detection is a fundamental problem in network analysis which is made more challenging by overlaps between communities which often occur in practice. Here we propose a general, flexible, and interpretable generative model for overlapping communities, which can be thought of as a generalization of the degree-corrected stochastic block model. We develop an efficient spectral algorithm for estimating the community memberships, which deals with the overlaps by employing the K-medians algorithm rather than the usual K-means for clustering in the spectral domain. We show that the algorithm is asymptotically consistent when networks are not too sparse and the overlaps between communities not too large. Numerical experiments on both simulated networks and many real social networks demonstrate that our method performs very well compared to a number of benchmark methods for overlapping community detection."
to:NB  spectral_clustering  network_data_analysis  community_discovery  statistics  levina.liza
9 days ago
[1410.7404] Maximally Informative Hierarchical Representations of High-Dimensional Data
"We consider a set of probabilistic functions of some input variables as a representation of the inputs. We present bounds on how informative a representation is about input data. We extend these bounds to hierarchical representations so that we can quantify the contribution of each layer towards capturing the information in the original data. The special form of these bounds leads to a simple, bottom-up optimization procedure to construct hierarchical representations that are also maximally informative about the data. This optimization has linear computational complexity and constant sample complexity in the number of variables. These results establish a new approach to unsupervised learning of deep representations that is both principled and practical. We demonstrate the usefulness of the approach on both synthetic and real-world data."
to:NB  have_read  information_theory  inference_to_latent_objects  graphical_models  ver_steeg.greg  galstyan.aram  probability
9 days ago
[1410.3533] Specification tests for nonlinear dynamic models
"We propose a new adequacy test and a graphical evaluation tool for nonlinear dynamic models. The proposed techniques can be applied in any setup where parametric conditional distribution of the data is specified, in particular to models involving conditional volatility, conditional higher moments, conditional quantiles, asymmetry, Value at Risk models, duration models, diffusion models, etc. Compared to other tests, the new test properly controls the nonlinear dynamic behavior in conditional distribution and does not rely on smoothing techniques which require a choice of several tuning parameters. The test is based on a new kind of multivariate empirical process of contemporaneous and lagged probability integral transforms. We establish weak convergence of the process under parameter uncertainty and local alternatives. We justify a parametric bootstrap approximation that accounts for parameter estimation effects often ignored in practice. Monte Carlo experiments show that the test has good finite-sample size and power properties. Using the new test and graphical tools we check the adequacy of various popular heteroscedastic models for stock exchange index data."
to:NB  model_checking  misspecification  time_series  dynamical_systems  statistics
9 days ago
[1410.2597] Optimal Inference After Model Selection
"To perform inference after model selection, we propose controlling the selective type I error; i.e., the error rate of a test given that it was performed. By doing so, we recover long-run frequency properties among selected hypotheses analogous to those that apply in the classical (non-adaptive) context. Our proposal is closely related to data splitting and has a similar intuitive justification, but is more powerful. Exploiting the classical theory of Lehmann and Scheffe (1955), we derive most powerful unbiased selective tests and confidence intervals for inference in exponential family models after arbitrary selection procedures. For linear regression, we derive new selective z-tests that generalize recent proposals for inference after model selection and improve on their power, and new selective t-tests that do not require knowledge of the error variance sigma^2."
to:NB  model_selection  hypothesis_testing  confidence_sets  statistics
9 days ago
[1409.7458] Beyond Maximum Likelihood: from Theory to Practice
"Maximum likelihood is the most widely used statistical estimation technique. Recent work by the authors introduced a general methodology for the construction of estimators for functionals in parametric models, and demonstrated improvements - both in theory and in practice - over the maximum likelihood estimator (MLE), particularly in high dimensional scenarios involving parameter dimension comparable to or larger than the number of samples. This approach to estimation, building on results from approximation theory, is shown to yield minimax rate-optimal estimators for a wide class of functionals, implementable with modest computational requirements. In a nutshell, a message of this recent work is that, for a wide class of functionals, the performance of these essentially optimal estimators with n samples is comparable to that of the MLE with nlnn samples.
"In the present paper, we highlight the applicability of the aforementioned methodology to statistical problems beyond functional estimation, and show that it can yield substantial gains. For example, we demonstrate that for learning tree-structured graphical models, our approach achieves a significant reduction of the required data size compared with the classical Chow--Liu algorithm, which is an implementation of the MLE, to achieve the same accuracy. The key step in improving the Chow--Liu algorithm is to replace the empirical mutual information with the estimator for mutual information proposed by the authors. Further, applying the same replacement approach to classical Bayesian network classification, the resulting classifiers uniformly outperform the previous classifiers on 26 widely used datasets."
to:NB  estimation  likelihood  statistics
9 days ago
[1407.0381] Minimax rates of entropy estimation on large alphabets via best polynomial approximation
"Consider the problem of estimating the Shannon entropy of a distribution on k elements from n independent samples. We show that the minimax mean-square error is within universal multiplicative constant factors of
(knlogk)2+log2kn
as long as n grows no faster than a polynomial of k. This implies the recent result of Valiant-Valiant \cite{VV11} that the minimal sample size for consistent entropy estimation scales according to Θ(klogk). The apparatus of best polynomial approximation plays a key role in both the minimax lower bound and the construction of optimal estimators."
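
--- For scale, a quick numerical reminder (mine) of why the naive plug-in estimator needs n ≫ k: with n only twice k, the empirical entropy is badly biased downward, and even the standard Miller-Madow correction only claws back part of the bias:

```python
import numpy as np

def plugin_entropy(counts):
    """Empirical (plug-in / MLE) entropy in nats."""
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())

def miller_madow(counts):
    """Plug-in entropy plus the first-order Miller-Madow bias
    correction, (observed support size - 1) / (2n)."""
    n = counts.sum()
    return plugin_entropy(counts) + (np.count_nonzero(counts) - 1) / (2.0 * n)

rng = np.random.default_rng(0)
k, n = 1000, 2000
true_H = np.log(k)                 # entropy of the uniform distribution on k symbols
counts = np.bincount(rng.integers(0, k, size=n), minlength=k)
h_plugin, h_mm = plugin_entropy(counts), miller_madow(counts)
# h_plugin underestimates true_H by roughly (k-1)/(2n) plus higher-order terms
```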
to:NB  entropy_estimation  information_theory  minimax  statistics
9 days ago
[1406.5647] On semidefinite relaxations for the block model
"The stochastic block model (SBM) is a popular tool for community detection in networks, but fitting it by maximum likelihood (MLE) involves an infeasible optimization problem. We propose a new semi-definite programming (SDP) solution to the problem of fitting the SBM, derived as a relaxation of the MLE. Our relaxation, which we call SDP-1, is tighter than other recently proposed SDP relaxations, namely what we call SDP-2 and SDP-3, and thus previously established theoretical guarantees carry over. However, we show that SDP-1 is, in fact, strongly consistent (i.e., exactly recovers true communities) over a wider class of SBMs than what current results suggest. In particular, one can relax the assumption of strong assortativity, implicit in consistency conditions of current SDPs, to that of (weak) assortativity for SDP-1, thus, significantly broadening the class of applicable models. Our approach in deriving strong consistency results is based on a primal-dual witness construction, and as a by-product we recover current results for SDP-2. Our approach also suggests that strong assortativity is necessary for the success of SDP-2 and SDP-3 and is not an artifact of the current proofs. We provide empirical evidence of this conjecture, in addition to other numerical results comparing these SDPs, and adjacency-based spectral clustering, on real and synthetic data. Another feature of our relaxation is the tendency to produce more balanced (i.e., equal-sized) communities which, as we show with a real-data example, makes it the ideal tool for fitting network histograms, a concept gaining popularity in the graphon estimation literature. A general theme throughout will be to view all these SDPs within a unified framework, specifically, as relaxations of the MLE over various sub-classes of the SBM. This also leads to a connection with the well-known problem of sparse PCA."
to:NB  network_data_analysis  community_discovery  optimization  statistics  levina.liza
9 days ago
[1405.0352] Asymptotic Theory for Random Forests
"Random forests have proven themselves to be reliable predictive algorithms in many application areas. Not much is known, however, about the statistical properties of random forests. Several authors have established conditions under which their predictions are consistent, but these results do not provide practical estimates of the scale of random forest errors. In this paper, we analyze a random forest model based subsampling, and show that random forest predictions are asymptotically normal provided that the subsample size s scales as s(n)/n = o(log(n)^{-d}), where n is the number of training examples and d is the number of features. Moreover, we show that the asymptotic variance can consistently be estimated using an infinitesimal jackknife for bagged ensembles recently proposed by Efron (2013). In other words, our results let us both characterize and estimate the error-distribution of random forest predictions. Thus, random forests need not only be treated as black-box predictive algorithms, and can also be used for statistical inference."
to:NB  decision_theory  ensemble_methods  statistics
9 days ago
[1405.2722] Model Selection in Overlapping Stochastic Block Models
"Networks are a commonly used mathematical model to describe the rich set of interactions between objects of interest. Many clustering methods have been developed in order to partition such structures, among which several rely on underlying probabilistic models, typically mixture models. The relevant hidden structure may however show overlapping groups in several applications. The Overlapping Stochastic Block Model (2011) has been developed to take this phenomenon into account. Nevertheless, the problem of the choice of the number of classes in the inference step is still open. To tackle this issue, we consider the proposed model in a Bayesian framework and develop a new criterion based on a non asymptotic approximation of the marginal log-likelihood. We describe how the criterion can be computed through a variational Bayes EM algorithm, and demonstrate its efficiency by running it on both simulated and real data."
to:NB  community_discovery  network_data_analysis  statistics  model_selection
9 days ago
[1405.5505] Kernel Mean Shrinkage Estimators
"A mean function in a reproducing kernel Hilbert space (RKHS), or a kernel mean, is an important part of many algorithms ranging from kernel principal component analysis to Hilbert-space embedding of distributions. Given a finite sample, an empirical average has been used consistently as a standard estimator for the true kernel mean. Despite a common belief on the optimality of this estimator, we show that it can be improved thanks to a well-known phenomenon in statistics called Stein phenomenon. Our theoretical analysis reveals the existence of a wide class of estimators that are "better" than the standard one. Focusing on a subset of this class, we propose computationally efficient kernel mean shrinkage estimators. The proposed estimators are supplemented by both thorough theoretical justifications and empirical evaluations on several applications, which clearly demonstrate that the proposed estimators outperform the standard one. This improvement sheds light on high-dimensional kernel-based statistical inference and machine learning."
to:NB  statistics  kernel_methods  hilbert_space  hypothesis_testing  shrinkage  nonparametrics
9 days ago
[1406.1922] Stein Shrinkage for Cross-Covariance Operators and Kernel Independence Testing
"Cross-covariance operators arise naturally in many applications using Reproducing Kernel Hilbert Spaces (RKHSs) and are typically estimated using an empirical plugin estimator, which we demonstrate are poor estimators of operator (eigen)spectra at low sample sizes. This paper studies the phenomenon of Stein shrinkage for infinite dimensional cross-covariance operators in RKHSs, as briefly initiated by Muandet et al (2014) who recently suggested two shrinkage estimators. We develop a third family of shrinkage estimators and undertake a study of how shrinkage improves estimation of operator spectra. We demonstrate an important and surprising application, that shrunk test statistics yield higher power for kernel independence tests and we provide insights into why they improve performance."
to:NB  hypothesis_testing  shrinkage  statistics  nonparametrics  hilbert_space  kith_and_kin  ramdas.aaditya  wehbe.leila
9 days ago
[1411.2045] Multivariate f-Divergence Estimation With Confidence
"The problem of f-divergence estimation is important in the fields of machine learning, information theory, and statistics. While several nonparametric divergence estimators exist, relatively few have known convergence properties. In particular, even for those estimators whose MSE convergence rates are known, the asymptotic distributions are unknown. We establish the asymptotic normality of a recently proposed ensemble estimator of f-divergence between two distributions from a finite number of samples. This estimator has MSE convergence rate of O(1/T), is simple to implement, and performs well in high dimensions. This theory enables us to perform divergence-based inference tasks such as testing equality of pairs of distributions based on empirical samples. We experimentally validate our theoretical results and, as an illustration, use them to empirically bound the best achievable classification error."
to:NB  estimation  entropy_estimation  information_theory  statistics  two-sample_tests
10 days ago
[1405.1533] A consistent deterministic regression tree for non-parametric prediction of time series
"We study online prediction of bounded stationary ergodic processes. To do so, we consider the setting of prediction of individual sequences and build a deterministic regression tree that performs asymptotically as well as the best L-Lipschitz constant predictors. Then, we show why the obtained regret bound entails the asymptotical optimality with respect to the class of bounded stationary ergodic processes."
to:NB  to_read  time_series  nonparametrics  decision_trees  regression  learning_theory  statistics
10 days ago
[1411.1557] Proof Supplement - Learning Sparse Causal Models is not NP-hard (UAI2013)
"This article contains detailed proofs and additional examples related to the UAI-2013 submission `Learning Sparse Causal Models is not NP-hard'. It describes the FCI+ algorithm: a method for sound and complete causal model discovery in the presence of latent confounders and/or selection bias, that has worst case polynomial complexity of order N2(k+1) in the number of independence tests, for sparse graphs over N nodes, bounded by node degree k. The algorithm is an adaptation of the well-known FCI algorithm by (Spirtes et al., 2000) that is also sound and complete, but has worst case complexity exponential in N."
10 days ago
[1411.1469] A Generic Sample Splitting Approach for Refined Community Recovery in Stochastic Block Models
"We propose and analyze a generic method for community recovery in stochastic block models and degree corrected block models. This approach can exactly recover the hidden communities with high probability when the expected node degrees are of order logn or higher. Starting from a roughly correct community partition given by some conventional community recovery algorithm, this method refines the partition in a cross clustering step. Our results simplify and extend some of the previous work on exact community recovery, discovering the key role played by sample splitting. The proposed method is simple and can be implemented with many practical community recovery algorithms."
to:NB  community_discovery  cross-validation  network_data_analysis  statistics  lei.jing
10 days ago
[1412.3756] Certifying and removing disparate impact
"What does it mean for an algorithm to be biased?
"In U.S. law, the notion of bias is typically encoded through the idea of \emph{disparate impact}: namely, that a process (hiring, selection, etc) that on the surface seems completely neutral might still have widely different impacts on different groups. This legal determination expects an explicit understanding of the selection process.
"If the process is an algorithm though (as is common these days), the process of determining disparate impact (and hence bias) becomes trickier. First, it might not be possible to disclose the process. Second, even if the process is open, it might be too complex to ascertain how the algorithm is making its decisions. In effect, since we don't have access to the algorithm, we must make inferences based on the \emph{data} it uses.
"We make three contributions to this problem. First, we link the legal notion of disparate impact to a measure of classification accuracy that while known, has not received as much attention as more traditional notions of accuracy. Second, we propose a test for the possibility of disparate impact based on analyzing the information leakage of protected information from the data. Finally, we describe methods by which data might be made "unbiased" in order to test an algorithm. Interestingly, our approach bears some resemblance to actual practices that have recently received legal scrutiny."
to:NB  data_mining  law  to_teach:data-mining
10 days ago
[1411.5634] Earthquake Forecasting Using Hidden Markov Models
"This paper develops a novel method, based on hidden Markov models, to forecast earthquakes and applies the method to mainshock seismic activity in southern California and western Nevada. The forecasts are of the probability of a mainshock within one, five, and ten days in the entire study region or in specific subregions and are based on the observations available at the forecast time, namely the inter event times and locations of the previous mainshocks and the elapsed time since the most recent one. Hidden Markov models have been applied to many problems, including earthquake classification; this is the first application to earthquake forecasting."
to:NB  earthquakes  geology  time_series  prediction  state-space_models  markov_models  statistics
10 days ago
A high resolution 7-Tesla resting-state fMRI test-retest dataset with cognitive and physiological measures : Scientific Data
"Here we present a test-retest dataset of functional magnetic resonance imaging (fMRI) data acquired at rest. 22 participants were scanned during two sessions spaced one week apart. Each session includes two 1.5 mm isotropic whole-brain scans and one 0.75 mm isotropic scan of the prefrontal cortex, giving a total of six time-points. Additionally, the dataset includes measures of mood, sustained attention, blood pressure, respiration, pulse, and the content of self-generated thoughts (mind wandering). This data enables the investigation of sources of both intra- and inter-session variability not only limited to physiological changes, but also including alterations in cognitive and affective states, at high spatial resolution. The dataset is accompanied by a detailed experimental protocol and source code of all stimuli used."
to:NB  data_sets  fmri  re:functional_communities
10 days ago
Democracy beyond Athens Popular Government in the Greek Classical Age | Ancient history | Cambridge University Press
"What was ancient democracy like? Why did it spread in ancient Greece? An astonishing number of volumes has been devoted to the well-attested Athenian case, while non-Athenian democracy – for which evidence is harder to come by – has received only fleeting attention. Nevertheless, there exists a scattered body of ancient material regarding democracy beyond Athens, from ancient literary authors and epigraphic documents to archaeological evidence, out of which one can build an understanding of the phenomenon. This book presents a detailed study of ancient Greek democracy in the Classical period (480 – 323 BC), focusing on examples outside Athens. It has three main goals: to identify where and when democratic governments established themselves in ancient Greek city-states; to explain why democracy spread to many parts of Greece in this period; and to further our understanding of the nature of ancient democracy by studying its practices beyond Athens."
to:NB  books:noted  ancient_history  greece  democracy
10 days ago
The 2003 Dividend Tax Cut Did Nothing to Help the Real Economy | Next New Deal
The plots are persuasive, but I guess I'd worry about selection into different corporate forms. Propensity-score matching?
(Last tag is tentative.)
11 days ago
Presumably there is a linguistic-pragmatics explanation of this --- people are interpreting the question so it makes sense as something asked for by an intelligent person, quite possibly more knowledgeable than they are.
11 days ago
Identifying the Culprit: Assessing Eyewitness Identification | The National Academies Press
"Eyewitnesses play an important role in criminal cases when they can identify culprits. Estimates suggest that tens of thousands of eyewitnesses make identifications in criminal investigations each year. Research on factors that affect the accuracy of eyewitness identification procedures has given us an increasingly clear picture of how identifications are made, and more importantly, an improved understanding of the principled limits on vision and memory that can lead to failure of identification. Factors such as viewing conditions, duress, elevated emotions, and biases influence the visual perception experience. Perceptual experiences are stored by a system of memory that is highly malleable and continuously evolving, neither retaining nor divulging content in an informational vacuum. As such, the fidelity of our memories to actual events may be compromised by many factors at all stages of processing, from encoding to storage and retrieval. Unknown to the individual, memories are forgotten, reconstructed, updated, and distorted. Complicating the process further, policies governing law enforcement procedures for conducting and recording identifications are not standard, and policies and practices to address the issue of misidentification vary widely. These limitations can produce mistaken identifications with significant consequences. What can we do to make certain that eyewitness identification convicts the guilty and exonerates the innocent?
"Identifying the Culprit makes the case that better data collection and research on eyewitness identification, new law enforcement training protocols, standardized procedures for administering line-ups, and improvements in the handling of eyewitness identification in court can increase the chances that accurate identifications are made. This report explains the science that has emerged during the past 30 years on eyewitness identifications and identifies best practices in eyewitness procedures for the law enforcement community and in the presentation of eyewitness evidence in the courtroom. In order to continue the advancement of eyewitness identification research, the report recommends a focused research agenda."
to:NB  books:noted  psychology  law
11 days ago
[1409.2090] On the asymptotics of random forests
"The last decade has witnessed a growing interest in random forest models which are recognized to exhibit good practical performance, especially in high-dimensional settings. On the theoretical side, however, their predictive power remains largely unexplained, thereby creating a gap between theory and practice. The aim of this paper is twofold. Firstly, we provide theoretical guarantees to link finite forests used in practice (with a finite number M of trees) to their asymptotic counterparts. Using empirical process theory, we prove a uniform central limit theorem for a large class of random forest estimates, which holds in particular for Breiman's original forests. Secondly, we show that infinite forest consistency implies finite forest consistency and thus, we state the consistency of several infinite forests. In particular, we prove that q quantile forests---close in spirit to Breiman's forests but easier to study---are able to combine inconsistent trees to obtain a final consistent prediction, thus highlighting the benefits of random forests compared to single trees."
to:NB  decision_trees  ensemble_methods  empirical_processes  statistics
11 days ago
[1405.2881] Consistency of Random Forests
"Random forests are a learning algorithm proposed by Breiman (2001) which combines several randomized decision trees and aggregates their predictions by averaging. Despite its wide usage and outstanding practical performance, little is known about the mathematical properties of the procedure. This disparity between theory and practice originates in the difficulty to simultaneously analyze both the randomization process and the highly data-dependent tree structure. In the present paper, we take a step forward in forest exploration by proving a consistency result for Breiman's (2001) original algorithm in the context of additive regression models. Our analysis also sheds an interesting light on how random forests can nicely adapt to sparsity in high-dimensional settings."
to:NB  ensemble_methods  decision_trees  statistics
11 days ago
[1412.4857] A Goodness-of-fit Test for Stochastic Block Models
"The stochastic block model is a popular tool for studying community structures in network data. We develop a goodness-of-fit test for the stochastic block model. The test statistic is based on the largest singular value of a residual matrix obtained by subtracting the estimated block mean effect from the adjacency matrix. Asymptotic null distribution is obtained using recent advances in random matrix theory. The test is proved to have full power against alternative stochastic block models with finer structures. These results naturally lead to a consistent sequential testing estimate of the number of communities."

--- I remember discussing a similar idea with AC way back in 2006, but not being clever enough to see how to actually get a null distribution.
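
--- The statistic itself is easy to compute. A sketch for the simplest case (one block, i.e. testing Erdos-Renyi), where the rescaled top eigenvalue of the residual matrix sits near the Tracy-Widom location 2 under the null and blows up under a two-block alternative; the toy calibration below is mine, not their full sequential procedure:

```python
import numpy as np

def gof_stat(A):
    """Rescaled largest eigenvalue of the residual matrix after fitting
    a one-block (Erdos-Renyi) model: n^{2/3} * (lambda_1 - 2)."""
    n = A.shape[0]
    iu = np.triu_indices(n, 1)
    p_hat = A[iu].mean()                     # MLE of the single edge probability
    R = (A - p_hat) / np.sqrt((n - 1) * p_hat * (1 - p_hat))
    np.fill_diagonal(R, 0.0)
    lam1 = np.linalg.eigvalsh(R)[-1]         # largest eigenvalue
    return float(n ** (2.0 / 3.0) * (lam1 - 2.0))

def sym_bernoulli(P, rng):
    """Sample a symmetric 0/1 adjacency matrix with edge probabilities P."""
    U = rng.uniform(size=P.shape)
    A = (np.triu(U, 1) < np.triu(P, 1)).astype(float)
    return A + A.T

rng = np.random.default_rng(0)
n = 300
# null: Erdos-Renyi(0.3); alternative: two blocks, 0.5 within / 0.1 between
P0 = np.full((n, n), 0.3)
P1 = np.full((n, n), 0.1)
P1[:n // 2, :n // 2] = 0.5
P1[n // 2:, n // 2:] = 0.5
t_null = gof_stat(sym_bernoulli(P0, rng))    # O(1) fluctuation around Tracy-Widom
t_alt = gof_stat(sym_bernoulli(P1, rng))     # diverges with the block signal
```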
to:NB  community_discovery  network_data_analysis  random_matrices  statistics  to_read  kith_and_kin  lei.jing
11 days ago
[1405.3904] A Markov-switching model for heat waves
"Heat waves merit careful study because they inflict severe economic and societal damage. We use an intuitive, informal working definition of a heat wave---a persistent event in the tail of the temperature distribution---to motivate an interpretable latent state extreme value model. A latent variable with dependence in time indicates membership in the heat wave state. The strength of the temporal dependence of the latent variable controls the frequency and persistence of heat waves. Within each heat wave, temperatures are modeled using extreme value distributions, with extremal dependence across time accomplished through an extreme value Markov model. One important virtue of interpretability is that model parameters directly translate into quantities of interest for risk management, so that questions like whether heat waves are becoming longer, more severe, or more frequent, are easily answered by querying an appropriate fitted model. We demonstrate the latent state model on two recent, calamitous, examples: the European heat wave of 2003 and the Russian heat wave of 2010."
to:NB  meteorology  markov_models  time_series  statistics
11 days ago
[1411.2127] Causal Inference with a Graphical Hierarchy of Interventions
"Identifying causal parameters from observational data is fraught with subtleties due to the issues of selection bias and confounding. In addition, more complex questions of interest, such as effects of treatment on the treated and mediated effects may not always be identified even in data where treatment assignment is known and under investigator control, or may be identified under one causal model but not another.
"Increasingly complex effects of interest, coupled with a diversity of causal models in use resulted in a fragmented view of identification. This fragmentation makes it unnecessarily difficult to determine if a given parameter is identified (and in what model), and what assumptions must hold for this to be the case. This, in turn, complicates the development of estimation theory and sensitivity analysis procedures.
"In this paper, we give a unifying view of a large class of causal effects of interest in terms of a hierarchy of interventions, and show that identification theory for this large class reduces to an identification theory of random variables under interventions from this hierarchy. Moreover, we show that one type of intervention in the hierarchy is naturally associated with queries identified under the Finest Fully Randomized Causally Interpretable Structure Tree Graph (FFRCISTG) model of Robins (via the extended g-formula), and another is naturally associated with queries identified under the Non-Parametric Structural Equation Model with Independent Errors (NPSEM-IE) of Pearl, via a more general functional we call the edge g-formula.
"Our results motivate the study of estimation theory for the edge g-formula, since we show it arises both in mediation analysis, and in settings where treatment assignment has unobserved causes, such as models associated with Pearl's front-door criterion."
to:NB  causal_inference  graphical_models  statistics  identifiability
11 days ago
[1409.2344] A nonparametric two-sample hypothesis testing problem for random dot product graphs
"We consider the problem of testing whether two finite-dimensional random dot product graphs have generating latent positions that are independently drawn from the same distribution, or distributions that are related via scaling or projection. We propose a test statistic that is a kernel-based function of the adjacency spectral embedding for each graph. We obtain a limiting distribution for our test statistic under the null and we show that our test procedure is consistent across a broad range of alternatives."
to:NB  network_data_analysis  hypothesis_testing  two-sample_tests  re:network_differences  statistics  to_read
11 days ago
[1412.3442] Posterior predictive p-values and the convex order
"Posterior predictive p-values are a common approach to Bayesian model-checking. This article analyses their frequency behaviour, that is, their distribution when the parameters and the data are drawn from the prior and the model respectively. We show that the family of possible distributions is exactly described as the distributions that are less variable than uniform on [0,1], in the convex order. In general, p-values with such a property are not conservative, and we illustrate how the theoretical worst-case error rate for false rejection can occur in practice. We describe how to correct the p-values to recover conservatism in several common scenarios, for example, when interpreting a single p-value or when combining multiple p-values into an overall score of significance. We also handle the case where the p-value is estimated from posterior samples obtained from techniques such as Markov Chain or Sequential Monte Carlo. Our results place posterior predictive p-values in a much clearer theoretical framework, allowing them to be used with more assurance."
to:NB  to_read  model_checking  bayesianism  re:phil-of-bayes_paper  hypothesis_testing  p-values  statistics
11 days ago
[1411.5172] Learning nonparametric differential equations with operator-valued kernels and gradient matching
"Modeling dynamical systems with ordinary differential equations implies a mechanistic view of the process underlying the dynamics. However in many cases, this knowledge is not available. To overcome this issue, we introduce a general framework for nonparametric ODE models using penalized regression in Reproducing Kernel Hilbert Spaces (RKHS) based on operator-valued kernels. Moreover, we extend the scope of gradient matching approaches to nonparametric ODE. A smooth estimate of the solution ODE is built to provide an approximation of the derivative of the ODE solution which is in turn used to learn the nonparametric ODE model. This approach benefits from the flexibility of penalized regression in RKHS allowing for ridge or (structured) sparse regression as well. Very good results are shown on 3 different ODE systems."
to:NB  dynamical_systems  time_series  hilbert_space  statistics  smoothing
11 days ago
[1405.5978] Blockmodeling of multilevel networks
"The article presents several approaches to the blockmodeling of multilevel network data. Multilevel network data consist of networks that are measured on at least two levels (e.g. between organizations and people) and information on ties between those levels (e.g. information on which people are members of which organizations). Several approaches will be considered: a separate analysis of the levels; transforming all networks to one level and blockmodeling on this level using information from all levels; and a truly multilevel approach where all levels and ties among them are modeled at the same time. Advantages and disadvantages of these approaches will be discussed."
to:NB  social_networks  network_data_analysis  hierarchical_structure  community_discovery  statistics
11 days ago
[1405.1868] On the role of additive regression for (high-dimensional) causal inference
"We consider the problem of inferring the (total) causal effect of a single variable intervention on a (response) variable of interest. We prove that for a very general class of structural equation models with known order of the variables, it is sufficient to use additive regression, even for cases where the structural equation model has non-additive functional form: we call the procedure ord-additive regression. As such, our result implies a major robustness property with respect to model misspecification. Furthermore, when the order of the variables is not known, we can estimate (the equivalence class of) the order of the variables, or (the equivalence class of) the directed acyclic graph corresponding to the structural equation model, and then proceed by using these estimates as a substitute for the true quantities. We empirically compare the ord-additive regression method with more classical approaches and argue that the former is indeed more robust, reliable and much simpler."

--- Surprising but very positive if true.

--- ETA: between when I opened that tab a few days ago and just now when I bookmarked it, the abstract has totally changed, apparently because of an error in v1:

"We consider the problem of inferring the total causal effect of a single variable intervention on a (response) variable of interest. We propose a certain marginal integration regression technique for a very general class of potentially nonlinear structural equation models (SEMs) with known structure, or at least known superset of adjustment variables: we call the procedure S-mint regression. We prove that it achieves the convergence rate as for nonparametric regression: for example, single variable intervention effects can be estimated with convergence rate n−2/5 assuming smoothness with twice differentiable functions. Our result can also be seen as a major robustness property with respect to model misspecification which goes much beyond the notion of double robustness. Furthermore, when the structure of the SEM is not known, we can estimate (the equivalence class of) the directed acyclic graph corresponding to the SEM, and then proceed by using S-mint based on these estimates. We empirically compare the S-mint regression method with more classical approaches and argue that the former is indeed more robust, more reliable and substantially simpler."
11 days ago
[1407.2483] Counting Markov Blanket Structures
"Learning Markov blanket (MB) structures has proven useful in performing feature selection, learning Bayesian networks (BNs), and discovering causal relationships. We present a formula for efficiently determining the number of MB structures given a target variable and a set of other variables. As expected, the number of MB structures grows exponentially. However, we show quantitatively that there are many fewer MB structures that contain the target variable than there are BN structures that contain it. In particular, the ratio of BN structures to MB structures appears to increase exponentially in the number of variables."
to:NB  graphical_models  combinatorics
12 days ago
[1412.3773] Distinguishing cause from effect using observational data: methods and benchmarks
"The discovery of causal relationships from purely observational data is a fundamental problem in science. The most elementary form of such a causal discovery problem is to decide whether X causes Y or, alternatively, Y causes X, given joint observations of two variables X, Y . This was often considered to be impossible. Nevertheless, several approaches for addressing this bivariate causal discovery problem were proposed recently. In this paper, we present the benchmark data set CauseEffectPairs that consists of 88 different "cause-effect pairs" selected from 31 datasets from various domains. We evaluated the performance of several bivariate causal discovery methods on these real-world benchmark data and on artificially simulated data. Our empirical results provide evidence that additive-noise methods are indeed able to distinguish cause from effect using only purely observational data. In addition, we prove consistency of the additive-noise method proposed by Hoyer et al. (2009)."
to:NB  causal_inference  statistics  janzing.dominik
12 days ago
[1411.6144] False discovery rate smoothing
"We present false discovery rate smoothing, an empirical-Bayes method for exploiting spatial structure in large multiple-testing problems. FDR smoothing automatically finds spatially localized regions of significant test statistics. It then relaxes the threshold of statistical significance within these regions, and tightens it elsewhere, in a manner that controls the overall false-discovery rate at a given level. This results in increased power and cleaner spatial separation of signals from noise. The approach requires solving a non-standard high-dimensional optimization problem, for which an efficient augmented-Lagrangian algorithm is presented. We demonstrate that FDR smoothing exhibits state-of-the-art performance on simulated examples. We also apply the method to a data set from an fMRI experiment on spatial working memory, where it detects patterns that are much more biologically plausible than those detected by existing FDR-controlling methods. All code for FDR smoothing is publicly available in Python and R."
to:NB  multiple_testing  smoothing  fmri  spatial_statistics  poldrack.russell  statistics  to_read
12 days ago
[1411.5977] On the Impossibility of Convex Inference in Human Computation
"Human computation or crowdsourcing involves joint inference of the ground-truth-answers and the worker-abilities by optimizing an objective function, for instance, by maximizing the data likelihood based on an assumed underlying model. A variety of methods have been proposed in the literature to address this inference problem. As far as we know, none of the objective functions in existing methods is convex. In machine learning and applied statistics, a convex function such as the objective function of support vector machines (SVMs) is generally preferred, since it can leverage the high-performance algorithms and rigorous guarantees established in the extensive literature on convex optimization. One may thus wonder if there exists a meaningful convex objective function for the inference problem in human computation. In this paper, we investigate this convexity issue for human computation. We take an axiomatic approach by formulating a set of axioms that impose two mild and natural assumptions on the objective function for the inference. Under these axioms, we show that it is unfortunately impossible to ensure convexity of the inference problem. On the other hand, we show that interestingly, in the absence of a requirement to model "spammers", one can construct reasonable objective functions for crowdsourcing that guarantee convex inference."

- Seems like a very odd approach.
to:NB  distributed_systems  collective_cognition  social_life_of_the_mind  convexity  re:democratic_cognition  to_be_shot_after_a_fair_trial
12 days ago
[1407.4916] Extensions of Stability Selection using subsamples of observations and covariates
"We introduce extensions of Stability Selection, a method to stabilize variable selection methods introduced by Meinshausen and B\"uhlmann (2010). We propose to apply a base selection method repeatedly to subsamples of the the observations and to subsets of the covariates under scrutiny, and to select covariates based on their selection frequency. We analyse the effects and benefits of these extensions. Our analysis generalizes the theoretical results of Meinshausen and B\"uhlmann (2010) from the case of half-samples to subsamples of arbitrary size. We study, in a theoretical manner, the effect of taking random covariate subsets using a simplified score model. Finally we validate these extensions on numerical experiments on both synthetic and real datasets, and compare the obtained results in detail to the original stability selection method."
to:NB  model_selection  statistics
12 days ago
[1409.4317] Bootstrap-based testing for functional data
"We propose a novel bootstrap-based methodology for testing hypotheses about equality of certain characteristics of the distributions between different populations in the context of functional data. The suggested testing methodology is simple and easy to implement. It resamples the original dataset in such a way that the null hypothesis of interest is satisfied and it can be potentially applied to a wide range of testing problems and test statistics of interest. Furthermore, it can be utilized to the case where more than two populations of functional data are considered. To illustrate it, we consider the important problems of testing the equality of mean functions or the equality of covariance functions (resp. covariance operators) between two populations. In this context, theoretical results that justify the validity of the suggested bootstrap-based procedure applied to some test statistics recently proposed in the literature, are established. Furthermore, simulation results demonstrate very good size and power performances in finite sample situations, including the case of testing problems and/or sample sizes where asymptotic considerations do not lead to satisfactory approximations. A real-life dataset analyzed in the literature is also examined."
to:NB  functional_data_analysis  hypothesis_testing  bootstrap  statistics  re:network_differences
12 days ago
[1411.4723] A Frequentist Approach to Computer Model Calibration
"This paper considers the computer model calibration problem and provides a general frequentist solution. Under the proposed framework, the data model is semi-parametric with a nonparametric discrepancy function which accounts for any discrepancy between the physical reality and the computer model. In an attempt to solve a fundamentally important (but often ignored) identifiability issue between the computer model parameters and the discrepancy function, this paper proposes a new and identifiable parametrization of the calibration problem. It also develops a two-step procedure for estimating all the relevant quantities under the new parameterization. This estimation procedure is shown to enjoy excellent rates of convergence and can be straightforwardly implemented with existing software. For uncertainty quantification, bootstrapping is adopted to construct confidence regions for the quantities of interest. The practical performance of the proposed methodology is illustrated through simulation examples and an application to a computational fluid dynamics model."

- i.e., pick the parameter value where a nonparametric regression of the residuals is as small as possible on average.
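--- A toy version of that recipe: for each candidate parameter value, smooth the residuals between the physical data and the simulator output, and pick the value whose smoothed discrepancy curve is smallest in mean square. The simulator, smoother, and grid below are all hypothetical stand-ins:

```python
import random
import statistics

def computer_model(x, theta):
    # Hypothetical toy "simulator"; stands in for an expensive code.
    return theta * x

def smooth(xs, rs, bandwidth=0.2):
    """Local average of residuals rs at the points xs (triangular kernel)."""
    out = []
    for x0 in xs:
        w = [(max(0.0, 1 - abs(x0 - xi) / bandwidth), ri)
             for xi, ri in zip(xs, rs)]
        sw = sum(wi for wi, _ in w)
        out.append(sum(wi * ri for wi, ri in w) / sw if sw else 0.0)
    return out

def calibrate(xs, ys, theta_grid):
    """Pick theta whose smoothed residual curve is smallest in mean square."""
    best, best_score = None, float('inf')
    for theta in theta_grid:
        resid = [y - computer_model(x, theta) for x, y in zip(xs, ys)]
        d = smooth(xs, resid)
        score = statistics.fmean(di * di for di in d)
        if score < best_score:
            best, best_score = theta, score
    return best
```

Smoothing first means that pure observation noise is averaged away, so only a systematic discrepancy between model and reality penalizes a candidate theta.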
to:NB  simulation  statistics  estimation  re:stacs
12 days ago
[1405.3224] On the Complexity of A/B Testing
"A/B testing refers to the task of determining the best option among two alternatives that yield random outcomes. We provide distribution-dependent lower bounds for the performance of A/B testing that improve over the results currently available both in the fixed-confidence (or delta-PAC) and fixed-budget settings. When the distribution of the outcomes are Gaussian, we prove that the complexity of the fixed-confidence and fixed-budget settings are equivalent, and that uniform sampling of both alternatives is optimal only in the case of equal variances. In the common variance case, we also provide a stopping rule that terminates faster than existing fixed-confidence algorithms. In the case of Bernoulli distributions, we show that the complexity of fixed-budget setting is smaller than that of fixed-confidence setting and that uniform sampling of both alternatives - though not optimal - is advisable in practice when combined with an appropriate stopping criterion."

--- Surely these must be ancient results in the experimental design literature, especially in the Gaussian and binomial cases?
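--- The fixed-confidence setting, schematically: sample both arms uniformly and stop once an anytime confidence interval separates the running means. A sketch with a crude union-bound radius (my heuristic, not the paper's optimal rule):

```python
import math
import random
import statistics

def ab_test(stream_a, stream_b, delta=0.05, max_n=100000):
    """Uniformly sample both arms; stop when an anytime confidence radius
    separates the running means. Returns (winner, samples drawn per arm)."""
    a, b = [], []
    for n in range(1, max_n + 1):
        a.append(next(stream_a))
        b.append(next(stream_b))
        if n < 2:
            continue
        # Heuristic anytime radius: log-factor inflation so the union bound
        # over all stopping times costs at most delta.
        rad = math.sqrt(2 * math.log(4 * n * n / delta) / n)
        gap = statistics.fmean(a) - statistics.fmean(b)
        if abs(gap) > 2 * rad:
            return ('A' if gap > 0 else 'B'), n
    return None, max_n
```

The larger the true gap between arms, the sooner the radius falls below half the empirical gap and the test stops.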
to:NB  learning_theory  experimental_design  statistics
12 days ago
[1405.3133] Graph Matching: Relax at Your Own Risk
"Graph matching---aligning a pair of graphs to minimize their edge disagreements---has received wide-spread attention from both theoretical and applied communities over the past several decades, including combinatorics, computer vision, and connectomics. Its attention can be partially attributed to its computational difficulty. Although many heuristics have previously been proposed in the literature to approximately solve graph matching, very few have any theoretical support for their performance. A common technique is to relax the discrete problem to a continuous problem, therefore enabling practitioners to bring gradient-descent-type algorithms to bear. We prove that an indefinite relaxation (when solved exactly) almost always discovers the optimal permutation, while a common convex relaxation almost always fails to discover the optimal permutation. These theoretical results suggest that initializing the indefinite algorithm with the convex optimum might yield improved practical performance. Indeed, experimental results illuminate and corroborate these theoretical findings, demonstrating that excellent results are achieved in both benchmark and real data problems by amalgamating the two approaches."
to:NB  graph_theory  network_data_analysis  optimization  re:network_differences
12 days ago
[1410.4307] Social Learning and Distributed Hypothesis Testing
"This paper considers a problem of distributed hypothesis testing and social learning. Individual nodes in a network receive noisy local (private) observations whose distribution is parameterized by a discrete parameter (hypotheses). The conditional distributions are known locally at the nodes, but the true parameter/hypothesis is not known. An update rule is analyzed in which nodes first perform a Bayesian update of their belief (distribution estimate) of the parameter based on their local observation, communicate these updates to their neighbors, and then perform a "non-Bayesian" linear consensus using the log-beliefs of their neighbors. In this paper we show that under mild assumptions, the belief of any node in any incorrect hypothesis converges to zero exponentially fast, and we characterize the exponential rate of learning which is given in terms of the network structure and the divergences between the observations' distributions. Our main result is the large deviation property established on the rate of convergence with an explicit characterization of the probability of convergence."
to:NB  social_life_of_the_mind  distributed_systems  hypothesis_testing  statistics  javidi.tara  sarwate.anand  large_deviations  re:democratic_cognition  re:social_networks_as_sensor_networks
12 days ago
