[1502.07576] Comparison Issues in Large Graphs: State of the Art and Future Directions

2 days ago

"Graph comparison is fundamentally important for many applications such as the analysis of social networks and biological data and has been a significant research area in the pattern recognition and pattern analysis domains. Nowadays, the graphs are large, they may have billions of nodes and edges. Comparison issues in such huge graphs are a challenging research problem.

"In this paper, we survey the research advances of comparison problems in large graphs. We review graph comparison and pattern matching approaches that focus on large graphs. We categorize the existing approaches into three classes: partition-based approaches, search space based approaches and summary based approaches. All the existing algorithms in these approaches are described in detail and analyzed according to multiple metrics such as time complexity, type of graphs or comparison concept. Finally, we identify directions for future research."

to:NB
network_data_analysis
network_differences
re:network_differences
to_read
via:vaguery

The atoms of neural computation

2 days ago

"Does the brain depend on a set of elementary, reusable computations?"

--- There is a FAQ on arxiv, http://arxiv.org/abs/1410.8826, but not the actual paper.

to:NB
have_read
neural_computation
neuroscience
design_for_a_brain
marcus.gary

[1502.05934] Achieving All with No Parameters: Adaptive NormalHedge

3 days ago

"We study the classic online learning problem of predicting with expert advice, and propose a truly parameter-free and adaptive algorithm that achieves several objectives simultaneously without using any prior information. The main component of this work is an improved version of the NormalHedge.DT algorithm (Luo and Schapire, 2014), called AdaNormalHedge. On one hand, this new algorithm ensures small regret when the competitor has small loss and almost constant regret when the losses are stochastic. On the other hand, the algorithm is able to compete with any convex combination of the experts simultaneously, with a regret in terms of the relative entropy of the prior and the competitor. This resolves an open problem proposed by Chaudhuri et al. (2009) and Chernov and Vovk (2010). Moreover, we extend the results to the sleeping expert setting and provide two applications to illustrate the power of AdaNormalHedge: 1) competing with time-varying unknown competitors and 2) predicting almost as well as the best pruning tree. Our results on these applications significantly improve previous work from different aspects, and a special case of the first application resolves another open problem proposed by Warmuth and Koolen (2014) on whether one can simultaneously achieve optimal shifting regret for both adversarial and stochastic losses."

to:NB
to_read
learning_theory
low-regret_learning
re:growing_ensemble_project

Learned Patriots: Debating Science, State, and Society in the Nineteenth-Century Ottoman Empire, Yalçinkaya

3 days ago

"In Learned Patriots, M. Alper Yalçinkaya examines what it meant for nineteenth-century Ottoman elites themselves to have a debate about science. Yalçinkaya finds that for anxious nineteenth-century Ottoman politicians, intellectuals, and litterateurs, the chief question was not about the meaning, merits, or dangers of science. Rather, what mattered were the qualities of the new “men of science.” Would young, ambitious men with scientific education be loyal to the state? Were they “proper” members of the community? Science, Yalçinkaya shows, became a topic that could hardly be discussed without reference to identity and morality."

to:NB
books:noted
ottoman_empire
history_of_science
19th_century_history
history_of_ideas

Dead and Alive: Beliefs in Contradictory Conspiracy Theories

4 days ago

"Conspiracy theories can form a monological belief system: A self-sustaining worldview comprised of a network of mutually supportive beliefs. The present research shows that even mutually incompatible conspiracy theories are positively correlated in endorsement. In Study 1 (n = 137), the more participants believed that Princess Diana faked her own death, the more they believed that she was murdered. In Study 2 (n = 102), the more participants believed that Osama Bin Laden was already dead when U.S. special forces raided his compound in Pakistan, the more they believed he is still alive. Hierarchical regression models showed that mutually incompatible conspiracy theories are positively associated because both are associated with the view that the authorities are engaged in a cover-up (Study 2). The monological nature of conspiracy belief appears to be driven not by conspiracy theories directly supporting one another but by broader beliefs supporting conspiracy theories in general."

--- I'd want to look very carefully at the numerical data to make sure this isn't being driven by a few people who are crazy (even once you allow for their being into conspiracy theories). In fact, this sounds like a situation where you'd really want to look carefully at protocols collected from the interviewees... Last tag conditional on the authors responding positively to my query about access to the data.

to:NB
have_skimmed
surveys
hierarchical_statistical_models
conspiracy_theories
sociology
to_teach:undergrad-ADA
psychology
natural_history_of_truthiness

How Robust Are Probabilistic Models of Higher-Level Cognition?

5 days ago

"An increasingly popular theory holds that the mind should be viewed as a near-optimal or rational engine of probabilistic inference, in domains as diverse as word learning, pragmatics, naive physics, and predictions of the future. We argue that this view, often identified with Bayesian models of inference, is markedly less promising than widely believed, and is undermined by post hoc practices that merit wholesale reevaluation. We also show that the common equation between probabilistic and rational or optimal is not justified."

in_NB
psychology
cognitive_science
bayesianism
marcus.gary_f.
have_read

What can individual differences tell us about the specialization of function?

5 days ago

"Can the study of individual differences inform debates about modularity and the specialization of function? In this article, we consider the implications of a highly replicated, robust finding known as positive manifold: Individual differences in different cognitive domains tend to be positively inter-correlated. Prima facie, this fact, which has generally been interpreted as reflecting the influence of a domain-general cognitive factor, might be seen as posing a serious challenge to a strong view of modularity. Drawing on a mixture of meta-analysis and computer simulation, we show that positive manifold derives instead largely from between-task neural overlap, suggesting a potential way of reconciling individual differences with some form of modularity."

--- Journal version: http://dx.doi.org/10.1080/02643294.2011.609813

--- The model simulated from is, I think, just another version of Thomson's ability sampling model.

in_NB
have_read
iq
factor_analysis
marcus.gary_f.
neuropsychology
re:g_paper

Data Science at the Command Line - O'Reilly Media

5 days ago

"This hands-on guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You’ll learn how to combine small, yet powerful, command-line tools to quickly obtain, scrub, explore, and model your data."

books:noted
unix
to_teach:statcomp

Over at Project Syndicate: Making Do with More (Brad DeLong's Grasping Reality...)

7 days ago

Brad channeling Keynes, and indeed _The German Ideology_. But notice: he's talking about how prosperity is moving us into areas where we know markets generically fail badly, and are very artificial creatures of state power at best...

economics
market_failures_in_everything
delong.brad

Overdispersion Diagnostics for Generalized Linear Models on JSTOR

8 days ago

"Generalized linear models (GLM's) are simple, convenient models for count data, but they assume that the variance is a specified function of the mean. Although overdispersed GLM's allow more flexible mean-variance relationships, they are often not as simple to interpret nor as easy to fit as standard GLM's. This article introduces a convexity plot, or C plot for short, that detects overdispersion and relative variance curves and relative variance tests that help to understand the nature of the overdispersion. Convexity plots sometimes detect overdispersion better than score tests, and relative variance curves and tests sometimes distinguish the source of the overdispersion better than score tests."
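The convexity plot itself is not reproduced here. As a much cruder point of comparison, the following is a minimal sketch of the usual Pearson dispersion statistic for an intercept-only Poisson model. It assumes NumPy, and the function name `poisson_dispersion` is hypothetical. The statistic is Pearson chi-square divided by residual degrees of freedom, and it should sit near 1 when the Poisson assumption Var(Y) = E[Y] holds. Values well above 1 suggest overdispersion.

```python
import numpy as np


def poisson_dispersion(y):
    """Pearson chi-square / residual df for an intercept-only Poisson GLM.

    With only an intercept, the Poisson MLE of the mean is the sample mean,
    so every observation has the same fitted value.
    """
    y = np.asarray(y, dtype=float)
    mu = y.mean()
    pearson_chi2 = np.sum((y - mu) ** 2 / mu)
    return pearson_chi2 / (len(y) - 1)  # one fitted parameter


rng = np.random.default_rng(0)
y_pois = rng.poisson(lam=5.0, size=20_000)
# Negative binomial with mean 5 and variance 5 + 5**2 / 2 = 17.5 (overdispersed)
y_nb = rng.negative_binomial(n=2, p=1 / 3.5, size=20_000)
print(poisson_dispersion(y_pois))  # should be close to 1
print(poisson_dispersion(y_nb))    # should be well above 1 (roughly 3.5)
```

A single summary number like this says nothing about *how* the variance departs from the mean. That is the gap the relative variance curves are meant to fill.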

in_NB
statistics
regression
model_checking
kith_and_kin
roeder.kathryn
to_teach:undergrad-ADA
have_read

Delay Differential Embedding of Time Series

9 days ago

"Nonlinear dynamical system analysis based on embedding theory has been used for modeling and prediction, but it also has applications to signal detection and classification of time series. An embedding creates a multidimensional geometrical object from a single time series. Traditionally either delay or derivative embeddings have been used. The delay embedding is composed of delayed versions of the signal, and the derivative embedding is composed of successive derivatives of the signal. The delay embedding has been extended to nonuniform embeddings to take multiple timescales into account. Both embeddings provide information on the underlying dynamical system without having direct access to all the system variables. Delay differential analysis is based on functional embeddings, a combination of the derivative embedding with nonuniform delay embeddings. Small delay differential equation (DDE) models that best represent relevant dynamic features of time series data are selected from a pool of candidate models for detection or classification. We show that the properties of DDEs support spectral analysis in the time domain where nonlinear correlation functions are used to detect frequencies, frequency and phase couplings, and bispectra. These can be efficiently computed with short time windows and are robust to noise. For frequency analysis, this framework is a multivariate extension of discrete Fourier transform (DFT), and for higher-order spectra, it is a linear and multivariate alternative to multidimensional fast Fourier transform of multidimensional correlations. This method can be applied to short or sparse time series and can be extended to cross-trial and cross-channel spectra if multiple short data segments of the same experiment are available.

"Together, this time-domain toolbox provides higher temporal resolution, increased frequency and phase coupling information, and it allows an easy and straightforward implementation of higher-order spectra across time compared with frequency-based methods such as the DFT and cross-spectral analysis."
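The two classical embeddings described in that abstract can be sketched in a few lines. This assumes NumPy and illustrates only the embedding step, not the delay differential (DDE) model selection the paper builds on top of it. Both function names are hypothetical. A uniform delay embedding is the special case `delays = [0, tau, 2*tau, ...]`, and any other list of lags gives a nonuniform one.

```python
import numpy as np


def delay_embed(x, delays):
    """Delay embedding with arbitrary (possibly nonuniform) lags.

    Row i is (x[t - d] for d in delays), with t = max(delays) + i, so every
    row uses only observed values.
    """
    x = np.asarray(x, dtype=float)
    m = max(delays)
    if m >= len(x):
        raise ValueError("series too short for the requested delays")
    return np.column_stack([x[m - d : len(x) - d] for d in delays])


def derivative_embed(x, order, dt=1.0):
    """Derivative embedding: columns are x, x', x'', ... up to `order`,
    each estimated by finite differences of the previous column (np.gradient)."""
    cols = [np.asarray(x, dtype=float)]
    for _ in range(order):
        cols.append(np.gradient(cols[-1], dt))
    return np.column_stack(cols)


t = np.linspace(0, 20, 2001)
signal = np.sin(t)
uniform = delay_embed(signal, [0, 10, 20])     # equally spaced lags
nonuniform = delay_embed(signal, [0, 7, 30])   # lags mixing two timescales
derivs = derivative_embed(signal, order=2, dt=t[1] - t[0])
```

For this sine wave, each embedded coordinate is a linear combination of sin t and cos t. The rows of `uniform` therefore lie on a closed curve (an ellipse), as one would expect from a periodic signal.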

to:NB
time_series
dynamical_systems
geometry_from_a_time_series

Why I Just Asked My Students To Put Their Laptops Away… — Medium

9 days ago

I think Clay actually misses a trick here. A computer is a really useful note-taking device, and writing notes helps you remember (even if you don't consult the notes). What is really needed, for instructional purposes, is a switch we can throw which jams all wifi signals. (I actually tried to get the computing lab to block Internet access during my class hour, but they said it was technically infeasible.)

teaching
pedagogy
education
attention
networked_life
shirky.clay

Chernozhukov, Chetverikov, Kato: Gaussian approximation of suprema of empirical processes

10 days ago

"This paper develops a new direct approach to approximating suprema of general empirical processes by a sequence of suprema of Gaussian processes, without taking the route of approximating whole empirical processes in the sup-norm. We prove an abstract approximation theorem applicable to a wide variety of statistical problems, such as construction of uniform confidence bands for functions. Notably, the bound in the main approximation theorem is nonasymptotic and the theorem allows for functions that index the empirical process to be unbounded and have entropy divergent with the sample size. The proof of the approximation theorem builds on a new coupling inequality for maxima of sums of random vectors, the proof of which depends on an effective use of Stein’s method for normal approximation, and some new empirical process techniques. We study applications of this approximation theorem to local and series empirical processes arising in nonparametric estimation via kernel and series methods, where the classes of functions change with the sample size and are non-Donsker. Importantly, our new technique is able to prove the Gaussian approximation for the supremum type statistics under weak regularity conditions, especially concerning the bandwidth and the number of series functions, in those examples."

Ungated: http://arxiv.org/abs/1212.6885

in_NB
empirical_processes
to_read

Evans, Richardson: Markovian acyclic directed mixed graphs for discrete data

10 days ago

"Acyclic directed mixed graphs (ADMGs) are graphs that contain directed (→) and bidirected (↔) edges, subject to the constraint that there are no cycles of directed edges. Such graphs may be used to represent the conditional independence structure induced by a DAG model containing hidden variables on its observed margin. The Markovian model associated with an ADMG is simply the set of distributions obeying the global Markov property, given via a simple path criterion (m-separation). We first present a factorization criterion characterizing the Markovian model that generalizes the well-known recursive factorization for DAGs. For the case of finite discrete random variables, we also provide a parameterization of the model in terms of simple conditional probabilities, and characterize its variation dependence. We show that the induced models are smooth. Consequently, Markovian ADMG models for discrete variables are curved exponential families of distributions."

in_NB
graphical_models
probability
richardson.thomas
exponential_families

Feng, He: Statistical inference based on robust low-rank data matrix approximation

10 days ago

"The singular value decomposition is widely used to approximate data matrices with lower rank matrices. Feng and He [Ann. Appl. Stat. 3 (2009) 1634–1654] developed tests on dimensionality of the mean structure of a data matrix based on the singular value decomposition. However, the first singular values and vectors can be driven by a small number of outlying measurements. In this paper, we consider a robust alternative that moderates the effect of outliers in low-rank approximations. Under the assumption of random row effects, we provide the asymptotic representations of the robust low-rank approximation. These representations may be used in testing the adequacy of a low-rank approximation. We use oligonucleotide gene microarray data to demonstrate how robust singular value decomposition compares with its traditional counterparts. Examples show that the robust methods often lead to a more meaningful assessment of the dimensionality of gene intensity data matrices."
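The robust SVD itself is not sketched here. The motivating claim is that leading singular vectors can be captured by a few outlying measurements. That claim is easy to check with an ordinary (non-robust) SVD, assuming NumPy. The rank-one test matrix and the outlier size are arbitrary illustrative choices, and `top_right_singular_vector` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 20

# Rank-one mean structure plus small noise
u = rng.standard_normal(n)
v = np.ones(p) / np.sqrt(p)            # true right singular vector
clean = 3.0 * np.outer(u, v) + 0.1 * rng.standard_normal((n, p))

# The same matrix with a single grossly outlying entry
dirty = clean.copy()
dirty[0, 0] += 1000.0


def top_right_singular_vector(M):
    """Leading right singular vector of M (its sign is arbitrary)."""
    _, _, vt = np.linalg.svd(M, full_matrices=False)
    return vt[0]


# |cosine| between the estimated and the true direction
align_clean = abs(top_right_singular_vector(clean) @ v)
align_dirty = abs(top_right_singular_vector(dirty) @ v)
```

With the outlier present, the leading direction points almost entirely at column 0 rather than along `v`. This is the failure mode that motivates a robust low-rank approximation.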

to:NB
dimension_reduction
low-rank_approximation
statistics
re:g_paper

What Do Data on Millions of U.S. Workers Reveal about Life-Cycle Earnings Risk?

15 days ago

"We study the evolution of individual labor earnings over the life cycle, using a large panel data set of earnings histories drawn from U.S. administrative records. Using fully nonparametric methods, our analysis reaches two broad conclusions. First, earnings shocks display substantial deviations from lognormality—the standard assumption in the literature on incomplete markets. In particular, earnings shocks display strong negative skewness and extremely high kurtosis—as high as 30 compared with 3 for a Gaussian distribution. The high kurtosis implies that, in a given year, most individuals experience very small earnings shocks, and a small but non-negligible number experience very large shocks. Second, these statistical properties vary significantly both over the life cycle and with the earnings level of individuals. We also estimate impulse response functions of earnings shocks and find important asymmetries: Positive shocks to high-income individuals are quite transitory, whereas negative shocks are very persistent; the opposite is true for low-income individuals. Finally, we use these rich sets of moments to estimate econometric processes with increasing generality to capture these salient features of earnings dynamics."
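For reference, the kurtosis quoted in this abstract is the ordinary (non-excess) fourth standardized moment, which equals 3 for any Gaussian. The sketch below assumes NumPy and uses made-up simulated shocks, not the paper's data. A hypothetical mixture in which 95% of draws are tiny and 5% are large already produces kurtosis far above 3; its population value works out to roughly 42. That is the same qualitative pattern the abstract describes.

```python
import numpy as np


def kurtosis(x):
    """Non-excess kurtosis E[(X - mean)^4] / Var(X)^2 (equals 3 for a Gaussian)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.mean(d**4) / np.mean(d**2) ** 2


rng = np.random.default_rng(0)
n = 200_000

gaussian_shocks = rng.standard_normal(n)

# Hypothetical mixture: most people get very small shocks, a few get large ones
scale = np.where(rng.random(n) < 0.95, 0.1, 1.0)
mixture_shocks = scale * rng.standard_normal(n)

print(kurtosis(gaussian_shocks))  # close to 3
print(kurtosis(mixture_shocks))   # far above 3
```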

--- Last tag conditional on what exactly is in the "data appendix" at https://fguvenendotcom.files.wordpress.com/2014/04/moments_for_publication.xls

to:NB
to_read
economics
inequality
heavy_tails
to_teach:undergrad-ADA
statistics
great_risk_shift

Make - GNU Project - Free Software Foundation

15 days ago

My book needs a makefile. Which means I need to figure out how to really write one, with the source files spread across a gazillion sub-directories...

programming
re:ADAfaEPoV

Philip Kitcher: The Lure of the Peak | The New Republic

15 days ago

I must say this (politely) makes Parfit's book sound exquisitely boring and pointless.

book_reviews
ethics
philosophy_of_science
kitcher.philip

Why does financial sector growth crowd out real economic growth?

15 days ago

"In this paper we examine the negative relationship between the rate of growth of the financial sector and the rate of growth of total factor productivity. We begin by showing that by disproportionately benefiting high collateral/low productivity projects, an exogenous increase in finance reduces total factor productivity growth. Then, in a model with skilled workers and endogenous financial sector growth, we establish the possibility of multiple equilibria. In the equilibrium where skilled labour works in finance, the financial sector grows more quickly at the expense of the real economy. We go on to show that consistent with this theory, financial growth disproportionately harms financially dependent and R&D-intensive industries."

to:NB
to_read
economics
financialization
productivity
via:jbdelong

My email is a monster - The Oatmeal

15 days ago

Why I have not written an adequate reply to your gracious note. (And yet I much prefer e-mail to just about every other online or printed medium; I have Issues.)

email
networked_life
moral_psychology
cartoons
funny:because_its_true

On the Interpretation of Instrumental Variables in the Presence of Specification Errors

16 days ago

"The method of instrumental variables (IV) and the generalized method of moments (GMM), and their applications to the estimation of errors-in-variables and simultaneous equations models in econometrics, require data on a sufficient number of instrumental variables that are both exogenous and relevant. We argue that, in general, such instruments (weak or strong) cannot exist."

--- I think they are too quick to dismiss non-parametric IV; if what one wants is consistent estimates of the partial derivatives at a given point, you _can_ get that by (e.g.) splines or locally linear regression. Need to think through this in terms of Pearl's graphical definition of IVs.

in_NB
instrumental_variables
misspecification
regression
linear_regression
causal_inference
statistics
econometrics
via:jbdelong
have_read
to_teach:undergrad-ADA
re:ADAfaEPoV

The Promises and Pitfalls of Genoeconomics

17 days ago

"This article reviews existing research at the intersection of genetics and economics, presents some new findings that illustrate the state of genoeconomics research, and surveys the prospects of this emerging field. Twin studies suggest that economic outcomes and preferences, once corrected for measurement error, appear to be about as heritable as many medical conditions and personality traits. Consistent with this pattern, we present new evidence on the heritability of permanent income and wealth. Turning to genetic association studies, we survey the main ways that the direct measurement of genetic variation across individuals is likely to contribute to economics, and we outline the challenges that have slowed progress in making these contributions. The most urgent problem facing researchers in this field is that most existing efforts to find associations between genetic variation and economic behavior are based on samples that are too small to ensure adequate statistical power. This has led to many false positives in the literature. We suggest a number of possible strategies to improve and remedy this problem: (a) pooling data sets, (b) using statistical techniques that exploit the greater information content of many genes considered jointly, and (c) focusing on economically relevant traits that are most proximate to known biological mechanisms."

--- Not bad, of its kind, but notice that when they get impossible-according-to-the-model results (like negative environmental variance components, or non-identical twins being more similar on some traits than identical twins), the response is always an ad-hoc modification or data exclusion, rather than re-thinking the model. Also, I really think they need to give more attention to population structure than they do, because we _know_ PCA doesn't control it away (http://dx.doi.org/10.1016/j.ajhg.2011.05.025). Full points for honesty, however, in their examples of sheer failure to replicate.

to:NB
have_read
genomics
human_genetics
economics
statistics

Characterizing Autopoiesis in the Game of Life

18 days ago

"Maturana and Varela's concept of autopoiesis defines the essential organization of living systems and serves as a foundation for their biology of cognition and the enactive approach to cognitive science. As an initial step toward a more formal analysis of autopoiesis, this article investigates its application to the compact, recurrent spatiotemporal patterns that arise in Conway's Game-of-Life cellular automaton. In particular, we demonstrate how such entities can be formulated as self-constructing networks of interdependent processes that maintain their own boundaries. We then characterize the specific organizations of several such entities, suggest a way to simplify the descriptions of these organizations, and briefly consider the transformation of such organizations over time."
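For readers who have not met the automaton, the best-known compact, recurrent pattern is the glider. It returns to its own shape every four generations, shifted one cell diagonally. The following is a minimal sketch of the standard update rule: a dead cell is born with exactly 3 live neighbors, and a live cell survives with 2 or 3. It assumes NumPy and a wrap-around grid, and `life_step` is a hypothetical name. This only illustrates the dynamics, not the paper's autopoietic analysis.

```python
import numpy as np


def life_step(grid):
    """One Game-of-Life update on a torus (edges wrap around)."""
    g = np.asarray(grid, dtype=np.int8)
    # Count the eight neighbors of every cell by summing shifted copies
    neighbors = sum(
        np.roll(np.roll(g, dr, axis=0), dc, axis=1)
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    born = (g == 0) & (neighbors == 3)
    survives = (g == 1) & ((neighbors == 2) | (neighbors == 3))
    return (born | survives).astype(np.int8)


# A glider on a 10 x 10 torus:
#   . # .
#   . . #
#   # # #
glider = np.zeros((10, 10), dtype=np.int8)
for r, c in [(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)]:
    glider[r, c] = 1

after_four = glider
for _ in range(4):
    after_four = life_step(after_four)
# after_four is the original glider moved down one row and right one column
```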

to:NB
self-organization
cellular_automata
emergence
artificial_life
beer.randall

Korostelev: A minimaxity criterion in nonparametric regression based on large-deviations probabilities

18 days ago

"A large-deviations criterion is proposed for optimality of nonparametric regression estimators. The criterion is one of minimaxity of the large-deviations probabilities. We study the case where the underlying class of regression functions is either Lipschitz or Hölder, and when the loss function involves estimation at a point or in supremum norm. Exact minimax asymptotics are found in the Gaussian case."

in_NB
large_deviations
regression
nonparametrics
statistics
have_read

Fu: Large Sample Point Estimation: A Large Deviation Theory Approach

18 days ago

"In this paper the exponential rates of decrease and bounds on tail probabilities for consistent estimators are studied using large deviation methods. The asymptotic expansions of Bahadur bounds and exponential rates in the case of the maximum likelihood estimator are obtained. Based on these results we have obtained a result parallel to the Fisher-Rao-Efron result concerning second-order efficiency (see Efron, 1975). Our results also substantiate the geometric observation given by Efron (1975) that if the statistical curvature of the underlying distribution is small, then the maximum likelihood estimator is nearly optimal."

in_NB
large_deviations
statistics
estimation
have_read

Symmetry and Collective Fluctuations in Evolutionary Games - Books - IOPscience

19 days ago

"In this monograph we bring together a conceptual treatment of evolutionary dynamics and a path-ensemble approach to non-equilibrium stochastic processes. Our framework is evolutionary game theory, in which the map from individual types and their interactions to the fitness that determines their evolutionary success is modeled as a game played among agents in the population. Our approach, however, is not anchored either in analogy to play or in motivations to interpret particular interactions as games. Rather, we argue that games are a flexible and reasonably generic framework to capture, classify and analyze the processes in development and some forms of inter-agent interaction that lie behind arbitrary frequency-dependent fitness models."

to:NB
books:noted
evolutionary_game_theory
stochastic_processes
large_deviations
smith.eric
kith_and_kin
re:do-institutions-evolve

Finance vs. Wal-Mart: Why are Financial Services so Expensive?

19 days ago

"Despite its fast computers and credit derivatives, the current financial system does not seem better at transferring funds from savers to borrowers than the financial system of 1910."

--- I really wish papers like this would give more details about their calculations, and not use graphs which are so freaking painful to the eye.

--- ETA: Also, he never does get around to answering the question in his subtitle!

in_NB
finance
financialization
economics
whats_gone_wrong_with_america
via:jbdelong
have_read

Federal Reserve Bank of San Francisco | The Recent Rise and Fall of Rapid Productivity Growth

19 days ago

"Information technology fueled a surge in U.S. productivity growth in the late 1990s and early 2000s. However, this rapid pace proved to be temporary, as productivity growth slowed before the Great Recession. Furthermore, looking through the effects of the economic downturn on productivity, the reduced pace of productivity gains has continued and suggests that average future output growth will likely be relatively slow."

--- But mightn't these also be sectors where measuring value-added is particularly difficult? (Especially when a lot of the valuable product is supposed to be given away.)

to:NB
have_read
economics
productivity
innovation
via:jbdelong

Wickham, C.: Sleepwalking into a New World: The Emergence of Italian City Communes in the Twelfth Century

19 days ago

"Amid the disintegration of the Kingdom of Italy in the eleventh and twelfth centuries, a new form of collective government—the commune—arose in the cities of northern and central Italy. Sleepwalking into a New World takes a bold new look at how these autonomous city-states came about, and fundamentally alters our understanding of one of the most important political and cultural innovations of the medieval world.

"Chris Wickham provides richly textured portraits of three cities—Milan, Pisa, and Rome—and sets them against a vibrant backcloth of other towns. He argues that, in all but a few cases, the elites of these cities and towns developed one of the first nonmonarchical forms of government in medieval Europe, unaware that they were creating something altogether new. Wickham makes clear that the Italian city commune was by no means a democracy in the modern sense, but that it was so novel that outsiders did not know what to make of it. He describes how, as the old order unraveled, the communes emerged, governed by consular elites “chosen by the people,” and subject to neither emperor nor king. They regularly fought each other, yet they grew organized and confident enough to ally together to defeat Frederick Barbarossa, the German emperor, at the Battle of Legnano in 1176."

to:NB
italy
medieval_european_history
political_science
cities

Chernozhukov, Chetverikov, Kato: Anti-concentration and honest, adaptive confidence bands

20 days ago

"Modern construction of uniform confidence bands for nonparametric densities (and other functions) often relies on the classical Smirnov–Bickel–Rosenblatt (SBR) condition; see, for example, Giné and Nickl [Probab. Theory Related Fields 143 (2009) 569–596]. This condition requires the existence of a limit distribution of an extreme value type for the supremum of a studentized empirical process (equivalently, for the supremum of a Gaussian process with the same covariance function as that of the studentized empirical process). The principal contribution of this paper is to remove the need for this classical condition. We show that a considerably weaker sufficient condition is derived from an anti-concentration property of the supremum of the approximating Gaussian process, and we derive an inequality leading to such a property for separable Gaussian processes. We refer to the new condition as a generalized SBR condition. Our new result shows that the supremum does not concentrate too fast around any value.

"We then apply this result to derive a Gaussian multiplier bootstrap procedure for constructing honest confidence bands for nonparametric density estimators (this result can be applied in other nonparametric problems as well). An essential advantage of our approach is that it applies generically even in those cases where the limit distribution of the supremum of the studentized empirical process does not exist (or is unknown). This is of particular importance in problems where resolution levels or other tuning parameters have been chosen in a data-driven fashion, which is needed for adaptive constructions of the confidence bands. Finally, of independent interest is our introduction of a new, practical version of Lepski’s method, which computes the optimal, nonconservative resolution levels via a Gaussian multiplier bootstrap method."

--- Ungated version: http://arxiv.org/abs/1303.7152
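Here is a rough sketch of the Gaussian multiplier bootstrap in Python/NumPy, for the simplest case: a Gaussian-kernel density estimate on a finite grid. Several choices are my own simplifications and not the paper's procedure: the function name, the fixed bandwidth `h`, and ignoring smoothing bias. There is also no Lepski-type choice of resolution level, so this is not the honest, adaptive band of the title.

```python
import numpy as np


def multiplier_bootstrap_band(data, grid, h, alpha=0.05, n_boot=1000, seed=None):
    """Sup-t band for a Gaussian-kernel density estimate via a Gaussian multiplier bootstrap.

    Sketch only: fixed bandwidth h, smoothing bias ignored, sup taken over a finite grid.
    Returns (f_hat, lower, upper, critical_value).
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = data.size
    # Z[i, j] = K_h(grid[j] - data[i]): each observation's contribution at each grid point.
    u = (grid[None, :] - data[:, None]) / h
    Z = np.exp(-0.5 * u**2) / (np.sqrt(2.0 * np.pi) * h)
    f_hat = Z.mean(axis=0)
    sigma = Z.std(axis=0, ddof=1)
    studentized = (Z - f_hat) / sigma
    # Multiply every observation by an independent N(0, 1) weight, then take the sup over the grid.
    xi = rng.standard_normal((n_boot, n))
    sup_stats = np.abs(xi @ studentized).max(axis=1) / np.sqrt(n)
    c = np.quantile(sup_stats, 1.0 - alpha)
    half_width = c * sigma / np.sqrt(n)
    return f_hat, f_hat - half_width, f_hat + half_width, c


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.standard_normal(500)
    grid = np.linspace(-2.5, 2.5, 101)
    f_hat, lower, upper, c = multiplier_bootstrap_band(sample, grid, h=0.3, seed=1)
    print(f"sup critical value {c:.2f}; a pointwise 95% interval would use 1.96")
```

The bootstrap critical value for the supremum should come out well above the pointwise 1.96. That gap is the difference between a simultaneous band and a string of separate intervals.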

in_NB
confidence_sets
bootstrap
density_estimation
nonparametrics
statistics
regression
to_read
re:ADAfaEPoV
20 days ago

Silverman : Spline Smoothing: The Equivalent Variable Kernel Method

20 days ago

"The spline smoothing approach to nonparametric regression and curve estimation is considered. It is shown that, in a certain sense, spline smoothing corresponds approximately to smoothing by a kernel method with bandwidth depending on the local density of design points. Some exact calculations demonstrate that the approximation is extremely close in practice. Consideration of kernel smoothing methods demonstrates that the way in which the effective local bandwidth behaves in spline smoothing has desirable properties. Finally, the main result of the paper is applied to the related topic of penalized maximum likelihood probability density estimates; a heuristic discussion shows that these estimates should adapt well in the tails of the distribution."

in_NB
have_read
splines
kernel_estimators
nonparametrics
regression
density_estimation
statistics
silverman.b.w.
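Here is a concrete version of the equivalence, as I remember the paper's main result. It is worth checking against the original, including how λ is scaled in the penalized criterion. The equivalent kernel is κ(u) = ½ exp(-|u|/√2) sin(|u|/√2 + π/4). The local bandwidth is h(t) = λ^{1/4} n^{-1/4} f(t)^{-1/4}, where f is the density of design points. The small NumPy sketch below uses function names of my own. It evaluates both quantities and checks numerically that κ integrates to one and has zero second moment, so it acts like a fourth-order kernel.

```python
import numpy as np


def silverman_kernel(u):
    """Silverman's equivalent kernel for cubic smoothing splines."""
    a = np.abs(np.asarray(u, dtype=float)) / np.sqrt(2.0)
    return 0.5 * np.exp(-a) * np.sin(a + np.pi / 4.0)


def local_bandwidth(lam, n, design_density):
    """Effective local bandwidth h(t) = lam**(1/4) * n**(-1/4) * f(t)**(-1/4)."""
    f = np.asarray(design_density, dtype=float)
    return lam**0.25 * n**-0.25 * f**-0.25


if __name__ == "__main__":
    # Riemann sums on a wide, fine grid (the kernel decays like exp(-|u| / sqrt(2))).
    u = np.linspace(-60.0, 60.0, 600_001)
    du = u[1] - u[0]
    k = silverman_kernel(u)
    print("integral:     ", np.sum(k) * du)
    print("second moment:", np.sum(u**2 * k) * du)
    # Where design points are sparse (small f), the effective bandwidth widens.
    print(local_bandwidth(1e-3, 100, [0.5, 1.0, 2.0]))
```

Where design points are sparse (small f), the spline smooths over a wider window. The bandwidth grows only like f^{-1/4}, though.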
20 days ago

Levine, A.: American Insecurity: Why Our Economic Fears Lead to Political Inaction. (eBook and Hardcover)

20 days ago

"Americans today face no shortage of threats to their financial well-being, such as job and retirement insecurity, health care costs, and spiraling college tuition. While one might expect that these concerns would motivate people to become more politically engaged on the issues, this often doesn’t happen, and the resulting inaction carries consequences for political debates and public policy. Moving beyond previously studied barriers to political organization, American Insecurity sheds light on the public’s inaction over economic insecurities by showing that the rhetoric surrounding these issues is actually self-undermining. By their nature, the very arguments intended to mobilize individuals—asking them to devote money or time to politics—remind citizens of their economic fears and personal constraints, leading to undermobilization and nonparticipation.

"Adam Seth Levine explains why the set of people who become politically active on financial insecurity issues is therefore quite narrow. When money is needed, only those who care about the issues but are not personally affected become involved. When time is needed, participation is limited to those not personally affected or those who are personally affected but outside of the labor force with time to spare. The latter explains why it is relatively easy to mobilize retirees on topics that reflect personal financial concerns, such as Social Security and Medicare. In general, however, when political representation requires a large group to make their case, economic insecurity threats are uniquely disadvantaged."

--- If only we could conceive of institutions that would organize ordinary people for political action around economic concerns! (I.e., I wonder what his results would've looked like back when we had a labor movement.)

to:NB
books:noted
inequality
great_risk_shift
political_science
us_politics
whats_gone_wrong_with_america
20 days ago

Le Grand, J. and New, B.: Government Paternalism: Nanny State or Helpful Friend? (eBook and Hardcover)

20 days ago

"Should governments save people from themselves? Do governments have the right to influence citizens’ behavior related to smoking tobacco, eating too much, not saving enough, drinking alcohol, or taking marijuana—or does this create a nanny state, leading to infantilization, demotivation, and breaches in individual autonomy? Looking at examples from both sides of the Atlantic and around the world, Government Paternalism examines the justifications for, and the prevalence of, government involvement and considers when intervention might or might not be acceptable. Building on developments in philosophy, behavioral economics, and psychology, Julian Le Grand and Bill New explore the roles, boundaries, and responsibilities of the government and its citizens.

"Le Grand and New investigate specific policy areas, including smoking, saving for pensions, and assisted suicide. They discuss legal restrictions on risky behavior, taxation of harmful activities, and subsidies for beneficial activities. And they pay particular attention to “nudge” or libertarian paternalist proposals that try to change the context in which individuals make decisions so that they make the right ones. Le Grand and New argue that individuals often display “reasoning failure”: an inability to achieve the ends that they set themselves. Such instances are ideal for paternalistic interventions—for though such interventions might impinge on autonomy, the impact can be outweighed by an improvement in well-being."

--- Unfairly, the affiliations and the endorsements make me more skeptical.

to:NB
books:noted
political_philosophy
re:anti-nudge
20 days ago

[1502.02398] Towards a Learning Theory of Causation

20 days ago

"We pose causal inference as the problem of learning to classify probability distributions. In particular, we assume access to a collection $\{(S_i, l_i)\}_{i=1}^{n}$, where each $S_i$ is a sample drawn from the probability distribution of $X_i \times Y_i$, and $l_i$ is a binary label indicating whether "$X_i \to Y_i$" or "$X_i \leftarrow Y_i$". Given these data, we build a causal inference rule in two steps. First, we featurize each $S_i$ using the kernel mean embedding associated with some characteristic kernel. Second, we train a binary classifier on such embeddings to distinguish between causal directions. We present generalization bounds showing the statistical consistency and learning rates of the proposed approach, and provide a simple implementation that achieves state-of-the-art cause-effect inference. Furthermore, we extend our ideas to infer causal relationships between more than two variables."

--- Finally, I am sympathetic to complaints about ML-ish methods giving us no understanding even when they work predictively.

to:NB
to_read
causal_inference
hilbert_space
statistics
via:vaguery
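Here is a sketch of the first step (featurization) in Python/NumPy. The abstract only says "kernel mean embedding associated with some characteristic kernel". I assume a Gaussian kernel and approximate its mean embedding with random Fourier features, so that each sample S_i becomes a fixed-length vector. The function names are hypothetical, only the joint (x, y) sample is embedded, and this is not the authors' implementation. The second step would then be any off-the-shelf binary classifier fit to the rows of `featurize(...)` and the labels l_i.

```python
import numpy as np


def random_fourier_features(dim, n_features, bandwidth, seed=None):
    """Frequencies W and phases b such that z(x) . z(y) approximates
    the Gaussian kernel exp(-||x - y||^2 / (2 * bandwidth^2))."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((dim, n_features)) / bandwidth
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return W, b


def mean_embedding(sample, W, b):
    """Approximate kernel mean embedding: the average feature map over one sample."""
    feats = np.sqrt(2.0 / W.shape[1]) * np.cos(np.asarray(sample, dtype=float) @ W + b)
    return feats.mean(axis=0)


def featurize(samples, W, b):
    """Stack the embeddings of samples S_1..S_n (each an (m_i, 2) array of (x, y) draws)."""
    return np.vstack([mean_embedding(s, W, b) for s in samples])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W, b = random_fourier_features(dim=2, n_features=500, bandwidth=1.0, seed=1)
    # Two toy samples: y = x^3 + noise, and the same pairs with the coordinates swapped.
    x = rng.standard_normal(300)
    forward = np.column_stack([x, x**3 + 0.1 * rng.standard_normal(300)])
    backward = forward[:, ::-1]
    features = featurize([forward, backward], W, b)
    labels = np.array([1, 0])  # 1 for "X -> Y", 0 for "X <- Y"
    print(features.shape)      # (2, 500); rows plus labels go to any binary classifier
```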
20 days ago

Uniform random generation of large acyclic digraphs - Springer

23 days ago

"Directed acyclic graphs are the basic representation of the structure underlying Bayesian networks, which represent multivariate probability distributions. In many practical applications, such as the reverse engineering of gene regulatory networks, not only the estimation of model parameters but the reconstruction of the structure itself is of great interest. As well as for the assessment of different structure learning algorithms in simulation studies, a uniform sample from the space of directed acyclic graphs is required to evaluate the prevalence of certain structural features. Here we analyse how to sample acyclic digraphs uniformly at random through recursive enumeration, an approach previously thought too computationally involved. Based on complexity considerations, we discuss in particular how the enumeration directly provides an exact method, which avoids the convergence issues of the alternative Markov chain methods and is actually computationally much faster. The limiting behaviour of the distribution of acyclic digraphs then allows us to sample arbitrarily large graphs. Building on the ideas of recursive enumeration based sampling we also introduce a novel hybrid Markov chain with much faster convergence than current alternatives while still being easy to adapt to various restrictions. Finally we discuss how to include such restrictions in the combinatorial enumeration and the new hybrid Markov chain method for efficient uniform sampling of the corresponding graphs."

to:NB
graphical_models
monte_carlo
graph_sampling
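Here is a Python sketch of the recursive-enumeration idea. Labelled DAGs can be counted by their number of "outpoints" (nodes with no incoming edges). Removing the outpoints leaves a smaller DAG, which gives a recursion. The same counts then serve as exact sampling probabilities for the layer sizes. After that, edges are drawn layer by layer and the labels are shuffled. This is my reconstruction of that general scheme, and the names are hypothetical. It does not handle the paper's restrictions or its limiting-distribution shortcut for very large graphs. Exact integer arithmetic is used throughout, so the probabilities are never rounded.

```python
import random
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_outpoints(n, k):
    """Number of labelled DAGs on n nodes with exactly k outpoints (no incoming edges)."""
    if k == n:
        return 1  # the empty graph: every node is an outpoint
    m = n - k     # nodes left once the outpoints are removed
    # Each of the s outpoints of the remainder needs >= 1 edge from the k outpoints;
    # the other m - s nodes may receive any subset of edges from them.
    total = sum((2**k - 1) ** s * 2 ** (k * (m - s)) * count_outpoints(m, s)
                for s in range(1, m + 1))
    return comb(n, k) * total


def count_dags(n):
    """Total number of labelled DAGs on n nodes."""
    return 1 if n == 0 else sum(count_outpoints(n, k) for k in range(1, n + 1))


def _pick(weights, rng):
    """Return i with probability weights[i] / sum(weights), in exact integer arithmetic."""
    r = rng.randrange(sum(weights))
    for i, w in enumerate(weights):
        if r < w:
            return i
        r -= w
    raise AssertionError("unreachable")


def sample_dag(n, rng=None):
    """Uniformly random labelled DAG on nodes 0..n-1, as a set of (parent, child) edges."""
    if rng is None:
        rng = random.Random()
    if n == 0:
        return set()
    # 1. Layer sizes: outpoints first, then the outpoints of what remains, and so on.
    k = 1 + _pick([count_outpoints(n, j) for j in range(1, n + 1)], rng)
    sizes, m = [k], n - k
    while m > 0:
        weights = [(2**k - 1) ** s * 2 ** (k * (m - s)) * count_outpoints(m, s)
                   for s in range(1, m + 1)]
        k = 1 + _pick(weights, rng)
        sizes.append(k)
        m -= k
    # 2. Consecutive ids per layer; draw each node's parents in every earlier layer.
    layers, start = [], 0
    for size in sizes:
        layers.append(range(start, start + size))
        start += size
    edges = set()
    for i, layer in enumerate(layers[:-1]):
        for j, below in enumerate(layers[i + 1:]):
            for child in below:
                bits = rng.getrandbits(len(layer))
                while j == 0 and bits == 0:  # the next layer needs at least one parent here
                    bits = rng.getrandbits(len(layer))
                edges.update((p, child) for t, p in enumerate(layer) if (bits >> t) & 1)
    # 3. Relabel the nodes uniformly at random.
    perm = list(range(n))
    rng.shuffle(perm)
    return {(perm[p], perm[c]) for p, c in edges}


if __name__ == "__main__":
    print([count_dags(n) for n in range(6)])
    print(sorted(sample_dag(5, random.Random(42))))
```

As a check, the counts for n = 0, ..., 5 should be 1, 1, 3, 25, 543, 29281, the known numbers of labelled DAGs.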
23 days ago

On parallel implementation of sequential Monte Carlo methods: the island particle model - Springer

23 days ago

"The approximation of the Feynman-Kac semigroups by systems of interacting particles is a very active research field, with applications in many different areas. In this paper, we study the parallelization of such approximations. The total population of particles is divided into sub-populations, referred to as islands. The particles within each island follow the usual selection/mutation dynamics. We show that the evolution of each island is also driven by a Feynman-Kac semigroup, whose transition and potential can be explicitly related to ones of the original problem. Therefore, the same genetic type approximation of the Feynman-Kac semi-group may be used at the island level; each island might undergo selection/mutation algorithm. We investigate the impact of the population size within each island and the number of islands, and study different type of interactions. We find conditions under which introducing interactions between islands is beneficial. The theoretical results are supported by some Monte Carlo experiments."

to:NB
particle_filters
monte_carlo
computational_statistics
stochastic_processes
interacting_particle_systems
re:amplification_sampling
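Here is a minimal Python/NumPy sketch of the island idea. The particles are split into islands, and each island does the usual selection/mutation. Optionally, whole islands are then resampled according to their mean potential, which is one simple form of interaction between islands. The function and argument names are hypothetical, and resampling is plain multinomial at both levels. The paper studies more than one interaction scheme, so treat this as an illustration of the structure rather than their algorithm. The toy model has a known answer, log Z = -(T/2) log 2.

```python
import numpy as np


def island_particle_filter(n_islands, n_per_island, n_steps, init, mutate, potential,
                           island_resampling=True, seed=None):
    """Estimate log Z = log E[G(X_0) ... G(X_{T-1})] with particles split into islands.

    Each island runs its own selection (multinomial resampling on the potential G) and
    mutation; optionally whole islands are also resampled by their mean potential.
    """
    rng = np.random.default_rng(seed)
    x = init(rng, (n_islands, n_per_island))
    log_z = 0.0
    for _ in range(n_steps):
        w = potential(x)                       # shape (n_islands, n_per_island)
        island_means = w.mean(axis=1)          # each island's estimate of this step's normaliser
        log_z += np.log(island_means.mean())
        # Selection within each island.
        for i in range(n_islands):
            idx = rng.choice(n_per_island, size=n_per_island, p=w[i] / w[i].sum())
            x[i] = x[i, idx]
        # Interaction between islands: resample whole islands by their mean weight.
        if island_resampling:
            keep = rng.choice(n_islands, size=n_islands, p=island_means / island_means.sum())
            x = x[keep]
        x = mutate(rng, x)
    return log_z


def toy_init(rng, shape):
    """X_0 ~ N(0, 1) for every particle."""
    return rng.standard_normal(shape)


def toy_mutate(rng, x):
    """Fresh N(0, 1) draws, independent of the past (keeps the answer tractable)."""
    return rng.standard_normal(x.shape)


def toy_potential(x):
    """G(x) = exp(-x^2 / 2), so E[G(X)] = 1 / sqrt(2) when X ~ N(0, 1)."""
    return np.exp(-0.5 * x**2)


if __name__ == "__main__":
    truth = -2.5 * np.log(2.0)  # log Z for 5 steps
    for interact in (True, False):
        est = island_particle_filter(8, 1000, 5, toy_init, toy_mutate, toy_potential,
                                     island_resampling=interact, seed=0)
        print(f"island resampling={interact}: {est:.4f} (exact {truth:.4f})")
```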
23 days ago

Critique of The History Manifesto | Deborah Cohen

26 days ago

If the claims made (with quotations) about what's said in the book are accurate, and so are Figures 1--3, then it's really incredibly damning.

book_reviews
evisceration
historiography
have_read
26 days ago

Benjamin, A., Chartrand, G., and Zhang, P.: The Fascinating World of Graph Theory. (eBook and Hardcover)

27 days ago

"The fascinating world of graph theory goes back several centuries and revolves around the study of graphs—mathematical structures showing relations between objects. With applications in biology, computer science, transportation science, and other areas, graph theory encompasses some of the most beautiful formulas in mathematics—and some of its most famous problems. For example, what is the shortest route for a traveling salesman seeking to visit a number of cities in one trip? What is the least number of colors needed to fill in any map so that neighboring regions are always colored differently? Requiring readers to have a math background only up to high school algebra, this book explores the questions and puzzles that have been studied, and often solved, through graph theory. In doing so, the book looks at graph theory’s development and the vibrant individuals responsible for the field’s growth.

"Introducing graph theory’s fundamental concepts, the authors explore a diverse plethora of classic problems such as the Lights Out Puzzle, the Minimum Spanning Tree Problem, the Königsberg Bridge Problem, the Chinese Postman Problem, a Knight’s Tour, and the Road Coloring Problem. They present every type of graph imaginable, such as bipartite graphs, Eulerian graphs, the Petersen graph, and trees. Each chapter contains math exercises and problems for readers to savor."

--- For a freshman seminar?

to:NB
books:noted
mathematics
graph_theory
27 days ago

Harris, M.: Mathematics without Apologies: Portrait of a Problematic Vocation. (eBook and Hardcover)

27 days ago

"What do pure mathematicians do, and why do they do it? Looking beyond the conventional answers—for the sake of truth, beauty, and practical applications—this book offers an eclectic panorama of the lives and values and hopes and fears of mathematicians in the twenty-first century, assembling material from a startlingly diverse assortment of scholarly, journalistic, and pop culture sources.

"Drawing on his personal experiences and obsessions as well as the thoughts and opinions of mathematicians from Archimedes and Omar Khayyám to such contemporary giants as Alexander Grothendieck and Robert Langlands, Michael Harris reveals the charisma and romance of mathematics as well as its darker side. In this portrait of mathematics as a community united around a set of common intellectual, ethical, and existential challenges, he touches on a wide variety of questions, such as: Are mathematicians to blame for the 2008 financial crisis? How can we talk about the ideas we were born too soon to understand? And how should you react if you are asked to explain number theory at a dinner party?

"Disarmingly candid, relentlessly intelligent, and richly entertaining, Mathematics without Apologies takes readers on an unapologetic guided tour of the mathematical life, from the philosophy and sociology of mathematics to its reflections in film and popular music, with detours through the mathematical and mystical traditions of Russia, India, medieval Islam, the Bronx, and beyond."

--- From looking at the online teaser material, I'll say that its heart is in the right place, but the authorial voice makes me recoil.

books:noted
mathematics
popular_science
27 days ago

AEAweb: JEP (29,1) p. 67 - Putting Distribution Back at the Center of Economics: Reflections on Capital in the Twenty-First Century

27 days ago

"When a lengthy book is widely discussed in academic circles and the popular media, it is probably inevitable that the arguments of the book will be simplified in the telling and retelling. In the case of my book Capital in the Twenty-First Century (2014), a common simplification of the main theme is that because the rate of return on capital r exceeds the growth rate of the economy g, the inequality of wealth is destined to increase indefinitely over time. In my view, the magnitude of the gap between r and g is indeed one of the important forces that can explain historical magnitudes and variations in wealth inequality. However, I do not view r > g as the only or even the primary tool for considering changes in income and wealth in the 20th century, or for forecasting the path of income and wealth inequality in the 21st century. In this essay, I will take up several themes from my book that have perhaps become attenuated or garbled in the ongoing discussions of the book, and will seek to re-explain and re-frame these themes. First, I stress the key role played in my book by the interaction between beliefs systems, institutions, and the dynamics of inequality. Second, I briefly describe my multidimensional approach to the history of capital and inequality. Third, I review the relationship and differing causes between wealth inequality and income inequality. Fourth, I turn to the specific role of r > g in the dynamics of wealth inequality: specifically, a larger r - g gap will amplify the steady-state inequality of a wealth distribution that arises out of a given mixture of shocks. Fifth, I consider some of the scenarios that affect how r - g might evolve in the 21st century, including rising international tax competition, a growth slowdown, and differential access by the wealthy to higher returns on capital. Finally, I seek to clarify what is distinctive in my historical and political economy approach to institutions and inequality dynamics, and the complementarity with other approaches."

--- A reply to critics, including being very polite to the utterly bizarre critique of Acemoglu and Robinson.

piketty.thomas
economics
inequality
have_read
27 days ago

Corbeill, A.: Sexing the World: Grammatical Gender and Biological Sex in Ancient Rome. (eBook and Hardcover)

27 days ago

"From the moment a child in ancient Rome began to speak Latin, the surrounding world became populated with objects possessing grammatical gender—masculine eyes (oculi), feminine trees (arbores), neuter bodies (corpora). Sexing the World surveys the many ways in which grammatical gender enabled Latin speakers to organize aspects of their society into sexual categories, and how this identification of grammatical gender with biological sex affected Roman perceptions of Latin poetry, divine power, and the human hermaphrodite.

"Beginning with the ancient grammarians, Anthony Corbeill examines how these scholars used the gender of nouns to identify the sex of the object being signified, regardless of whether that object was animate or inanimate. This informed the Roman poets who, for a time, changed at whim the grammatical gender for words as seemingly lifeless as “dust” (pulvis) or “tree bark” (cortex). Corbeill then applies the idea of fluid grammatical gender to the basic tenets of Roman religion and state politics. He looks at how the ancients tended to construct Rome’s earliest divinities as related male and female pairs, a tendency that waned in later periods. An analogous change characterized the dual-sexed hermaphrodite, whose sacred and political significance declined as the republican government became an autocracy. Throughout, Corbeill shows that the fluid boundaries of sex and gender became increasingly fixed into opposing and exclusive categories."

to:NB
books:noted
linguistics
ancient_history
latin
history_of_ideas
sex_vs_gender
27 days ago

r - How to add elements to a plot using a knitr chunk without original markdown output? - Stack Overflow

29 days ago

Need to check whether this works when knitting a LaTeX document as well. (Presumably.)

latex
R
knitr
to_teach:statcomp
re:ADAfaEPoV
29 days ago

The Limits of Matter: Chemistry, Mining, and Enlightenment, Fors

29 days ago

"During the seventeenth and eighteenth centuries, Europeans raised a number of questions about the nature of reality and found their answers to be different from those that had satisfied their forebears. They discounted tales of witches, trolls, magic, and miraculous transformations and instead began looking elsewhere to explain the world around them. In The Limits of Matter, Hjalmar Fors investigates how conceptions of matter changed during the Enlightenment and pins this important change in European culture to the formation of the modern discipline of chemistry."

to:NB
books:noted
chemistry
history_of_science
enlightenment
early_modern_european_history
29 days ago

Loving Literature: A Cultural History, Lynch

29 days ago

"Of the many charges laid against contemporary literary scholars, one of the most common—and perhaps the most wounding—is that they simply don't love books. And while the most obvious response is that, no, actually the profession of literary studies does acknowledge and address personal attachments to literature, that answer risks obscuring a more fundamental question: Why should they?

"That question led Deidre Shauna Lynch into the historical and cultural investigation of Loving Literature. How did it come to be that professional literary scholars are expected not just to study, but to love literature, and to inculcate that love in generations of students?"

to:NB
books:noted
academia
literary_criticism
literary_history
criticism_of_criticism_of_criticism
history_of_ideas
history_of_tastes
29 days ago

What Eudoxus and Aristotle Thought

4 weeks ago

_Not_ the Ptolemaic system, because all the spheres have the same center. Still pretty funky by the time you get to some of the planets.

history_of_science
astronomy
4 weeks ago

[1501.00960] Characterizing the Google Books corpus: Strong limits to inferences of socio-cultural and linguistic evolution

4 weeks ago

"It is tempting to treat frequency trends from Google Books data sets as indicators for the true popularity of various words and phrases. Doing so allows us to draw novel conclusions about the evolution of public perception of a given topic, such as time and gender. However, sampling published works by availability and ease of digitization leads to several important effects. One of these is the surprising ability of a single prolific author to noticeably insert new phrases into a language. A greater effect arises from scientific texts, which have become increasingly prolific in the last several decades and are heavily sampled in the corpus. The result is a surge of phrases typical to academic articles but less common in general, such as references to time in the form of citations. Here, we highlight these dynamics by examining and comparing major contributions to the statistical divergence of English data sets between decades in the period 1800--2000. We find that only the English Fiction data set from the second version of the corpus is not heavily affected by professional texts, in clear contrast to the first version of the fiction data set and both unfiltered English data sets. Our findings emphasize the need to fully characterize the dynamics of the Google Books corpus before using these data sets to draw broad conclusions about cultural and linguistic evolution."

to:NB
selection_bias
linguistics
history_of_ideas
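The abstract only says "statistical divergence". If we assume it is the Jensen-Shannon divergence (my assumption here), the per-word "contributions" decompose neatly: each word's term is non-negative, and the terms sum to the divergence. Here is a small Python sketch with hypothetical names and made-up toy word lists.

```python
from collections import Counter
from math import log2


def _xlog2x(p):
    """p * log2(p), with the convention 0 * log2(0) = 0."""
    return p * log2(p) if p > 0 else 0.0


def jsd_contributions(counts_a, counts_b):
    """Per-word contributions to the Jensen-Shannon divergence (base 2) between two corpora.

    Each contribution is non-negative and they sum to the divergence, so sorting them
    shows which words drive the difference between, e.g., two decades.
    """
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    contrib = {}
    for word in set(counts_a) | set(counts_b):
        p = counts_a.get(word, 0) / total_a
        q = counts_b.get(word, 0) / total_b
        m = 0.5 * (p + q)
        contrib[word] = 0.5 * (_xlog2x(p) + _xlog2x(q)) - _xlog2x(m)
    return contrib


if __name__ == "__main__":
    decade_1 = Counter("the war of the nations the war".split())
    decade_2 = Counter("the data of the figure et al the data".split())
    c = jsd_contributions(decade_1, decade_2)
    print("JSD =", round(sum(c.values()), 4))
    for word, value in sorted(c.items(), key=lambda kv: -kv[1])[:3]:
        print(word, round(value, 4))
```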
4 weeks ago

[1501.01571] An Introduction to Matrix Concentration Inequalities

4 weeks ago

"In recent years, random matrices have come to play a major role in computational mathematics, but most of the classical areas of random matrix theory remain the province of experts. Over the last decade, with the advent of matrix concentration inequalities, research has advanced to the point where we can conquer many (formerly) challenging problems with a page or two of arithmetic. The aim of this monograph is to describe the most successful methods from this area along with some interesting examples that these techniques can illuminate."

in_NB
probability
random_matrices
concentration_of_measure
deviation_inequalities
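For a taste of the "page or two of arithmetic", one representative tool from this area is the matrix Bernstein inequality. It is stated here from memory in its Hermitian form, so check the monograph for the exact hypotheses and constants:

```latex
% X_1, ..., X_n: independent random d x d Hermitian matrices,
% each with E[X_k] = 0 and lambda_max(X_k) <= L almost surely.
Y = \sum_{k=1}^{n} X_k, \qquad
v(Y) = \bigl\|\mathbb{E}\,Y^{2}\bigr\| = \Bigl\|\sum_{k=1}^{n} \mathbb{E}\,X_k^{2}\Bigr\|,
\qquad
\mathbb{P}\bigl\{\lambda_{\max}(Y) \ge t\bigr\}
  \le d \,\exp\!\left(\frac{-t^{2}/2}{v(Y) + L t/3}\right)
  \quad \text{for all } t \ge 0.
```

Compared with the scalar Bernstein inequality, the only price for going to matrices is the dimensional factor d in front.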
4 weeks ago

[1501.06794] Computing Functions of Random Variables via Reproducing Kernel Hilbert Space Representations

4 weeks ago

"We describe a method to perform functional operations on probability distributions of random variables. The method uses reproducing kernel Hilbert space representations of probability distributions, and it is applicable to all operations which can be applied to points drawn from the respective distributions. We refer to our approach as _kernel probabilistic programming_. We illustrate it on synthetic data, and show how it can be used for nonparametric structural equation models, with an application to causal inference."

to:NB
kernel_methods
hilbert_space
computational_statistics
causal_inference
statistics
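As I read the abstract, the basic move is this. Represent X and Y each by a weighted set of expansion points (a kernel mean embedding). Then represent f(X, Y) by applying f to every pair of points, with product weights. Below is a minimal NumPy sketch that assumes X and Y are independent and uses a Gaussian kernel to evaluate the embedding. The function names are hypothetical, and whatever refinements the paper adds are left out.

```python
import numpy as np


def push_forward(xs, wx, ys, wy, f):
    """Expansion points and weights for Z = f(X, Y), assuming X and Y are independent.

    xs, ys: expansion points; wx, wy: their weights (each set summing to 1).
    """
    zs = f(xs[:, None], ys[None, :]).ravel()    # f applied to every pair (x_i, y_j)
    wz = (wx[:, None] * wy[None, :]).ravel()    # product weights alpha_i * beta_j
    return zs, wz


def mean_embedding_at(points, weights, t, bandwidth=1.0):
    """Evaluate the embedding sum_i w_i k(points_i, t) for a Gaussian kernel k."""
    d = np.subtract.outer(np.asarray(t, dtype=float), points)
    return np.exp(-0.5 * (d / bandwidth) ** 2) @ weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xs, ys = rng.normal(1.0, 1.0, 200), rng.normal(-2.0, 0.5, 300)
    wx, wy = np.full(200, 1 / 200), np.full(300, 1 / 300)
    zs, wz = push_forward(xs, wx, ys, wy, lambda x, y: x * y)
    print("E[XY] estimate:", wz @ zs)  # the true value is 1.0 * (-2.0) = -2.0
```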
4 weeks ago

[1501.02663] Extremes on river networks

4 weeks ago

"Max-stable processes are the natural extension of the classical extreme-value distributions to the functional setting, and they are increasingly widely used to estimate probabilities of complex extreme events. In this paper we broaden them from the usual setting in which dependence varies according to functions of Euclidean distance to the situation in which extreme river discharges at two locations on a river network may be dependent because the locations are flow-connected or because of common meteorological events. In the former case dependence depends on river distance, and in the second it depends on the hydrological distance between the locations, either of which may be very different from their Euclidean distance. Inference for the model parameters is performed using a multivariate threshold likelihood, which is shown by simulation to work well. The ideas are illustrated with data from the upper Danube basin."

to:NB
spatio-temporal_statistics
extreme_values
statistics
rivers
4 weeks ago

[1404.1578] Models as Approximations: How Random Predictors and Model Violations Invalidate Classical Inference in Regression

4 weeks ago

"We review and interpret the early insights of Halbert White who over thirty years ago inaugurated a form of statistical inference for regression models that is asymptotically correct even under "model misspecification," that is, under the assumption that models are approximations rather than generative truths. This form of inference, which is pervasive in econometrics, relies on the "sandwich estimator" of standard error. Whereas linear models theory in statistics assumes models to be true and predictors to be fixed, White's theory permits models to be approximate and predictors to be random. Careful reading of his work shows that the deepest consequences for statistical inference arise from a synergy --- a "conspiracy" --- of nonlinearity and randomness of the predictors which invalidates the ancillarity argument that justifies conditioning on the predictors when they are random. Unlike the standard error of linear models theory, the sandwich estimator provides asymptotically correct inference in the presence of both nonlinearity and heteroskedasticity. An asymptotic comparison of the two types of standard error shows that discrepancies between them can be of arbitrary magnitude. If there exist discrepancies, standard errors from linear models theory are usually too liberal even though occasionally they can be too conservative as well. A valid alternative to the sandwich estimator is provided by the "pairs bootstrap"; in fact, the sandwich estimator can be shown to be a limiting case of the pairs bootstrap. We conclude by giving meaning to regression slopes when the linear model is an approximation rather than a truth. --- In this review we limit ourselves to linear least squares regression, but many qualitative insights hold for most forms of regression."

--- Very close to what I teach in my class, though I haven't really talked about sandwich variances.

in_NB
have_read
statistics
regression
linear_regression
bootstrap
misspecification
estimation
approximation
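The abstract offers two alternatives to classical standard errors, and both are easy to compute side by side. Below is a minimal NumPy sketch; the function name and simulated data are mine. The sandwich is the basic HC0 form, and the "pairs bootstrap" resamples whole (x, y) rows and refits. The simulated truth is deliberately nonlinear and heteroskedastic, which is the situation where the abstract says the classical formula goes wrong.

```python
import numpy as np


def ols_standard_errors(X, y, n_boot=2000, seed=None):
    """OLS coefficients with classical, sandwich (HC0) and pairs-bootstrap standard errors."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    bread = np.linalg.inv(X.T @ X)
    # Classical: trusts a correct linear model with constant noise variance.
    sigma2 = resid @ resid / (n - p)
    se_classical = np.sqrt(np.diag(sigma2 * bread))
    # Sandwich (HC0): (X'X)^{-1} [sum_i e_i^2 x_i x_i'] (X'X)^{-1}.
    meat = (X * resid[:, None] ** 2).T @ X
    se_sandwich = np.sqrt(np.diag(bread @ meat @ bread))
    # Pairs bootstrap: resample whole (x_i, y_i) rows, refit, take the spread.
    boots = np.empty((n_boot, p))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        boots[b] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
    se_pairs = boots.std(axis=0, ddof=1)
    return beta, se_classical, se_sandwich, se_pairs


if __name__ == "__main__":
    # Misspecified on purpose: the truth is quadratic with x-dependent noise, we fit a line.
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 2.0, 500)
    y = x**2 + (0.2 + x) * rng.standard_normal(500)
    X = np.column_stack([np.ones_like(x), x])
    beta, se_c, se_s, se_p = ols_standard_errors(X, y, seed=1)
    print("slope", beta[1], "classical", se_c[1], "sandwich", se_s[1], "pairs", se_p[1])
```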
4 weeks ago

The Sharing Economy Isn’t About Sharing at All - HBR

4 weeks ago

Well, yes, obviously. (I have used Zipcar regularly for years, but it would never have occurred to me that it was some form of _sharing_; it's a car _rental_ company which is a lot more convenient for me than the older ones.) Something like Uber or Airbnb makes its money by being the centralized intermediary between consumers and asset owners/service workers. (The goal would be to become the only effective marketplace for that sort of good or service --- would that be the monagorist? --- and so collect rents.) I guess I'd supposed/hoped that people actually in the industry realized this, and the "sharing" rhetoric was conscious camouflage, but this article makes it sound like they believe their own press.

corporations
networked_life
marketing
market_making
economics
have_read
via:wh
4 weeks ago

The cultural evolution of mind reading

4 weeks ago

"It is not just a manner of speaking: “Mind reading,” or working out what others are thinking and feeling, is markedly similar to print reading. Both of these distinctly human skills recover meaning from signs, depend on dedicated cortical areas, are subject to genetically heritable disorders, show cultural variation around a universal core, and regulate how people behave. But when it comes to development, the evidence is conflicting. Some studies show that, like learning to read print, learning to read minds is a long, hard process that depends on tuition. Others indicate that even very young, nonliterate infants are already capable of mind reading. Here, we propose a resolution to this conflict. We suggest that infants are equipped with neurocognitive mechanisms that yield accurate expectations about behavior (“automatic” or “implicit” mind reading), whereas “explicit” mind reading, like literacy, is a culturally inherited skill; it is passed from one generation to the next by verbal instruction."

--- ETA after reading: interesting and not crazy, though not completely convincing; I'd need to think carefully, and look at their references, to decide how much of this is about mind-reading, the activity, vs. talking about mind-reading. (It's also interesting to imagine the psychological theories we might have if literacy were a cultural universal, which it could well be in a century or two.)

to:NB
have_read
cognitive_science
cognitive_development
cultural_transmission_of_cognitive_tools
theory_of_mind

4 weeks ago

[1411.6179] Spatiotemporal Detection of Unusual Human Population Behavior Using Mobile Phone Data

4 weeks ago

"With the aim to contribute to humanitarian response to disasters and violent events, scientists have proposed the development of analytical tools that could identify emergency events in real-time, using mobile phone data. The assumption is that dramatic and discrete changes in behavior, measured with mobile phone data, will indicate extreme events. In this study, we propose an efficient system for spatiotemporal detection of behavioral anomalies from mobile phone data and compare sites with behavioral anomalies to an extensive database of emergency and non-emergency events in Rwanda. Our methodology successfully captures anomalous behavioral patterns associated with a broad range of events, from religious and official holidays to earthquakes, floods, violence against civilians and protests. Our results suggest that human behavioral responses to extreme events are complex and multi-dimensional, including extreme increases and decreases in both calling and movement behaviors. We also find significant temporal and spatial variance in responses to extreme events. Our behavioral anomaly detection system and extensive discussion of results are a significant contribution to the long-term project of creating an effective real-time event detection system with mobile phone data and we discuss the implications of our findings for future research to this end. "
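
The abstract does not say how the detector works. Purely as a generic illustration, and not the authors' system, the sketch below is a minimal per-site rule that flags a day whose count departs from that site's median by more than k robust standard deviations. The rule is two-sided because the abstract reports both increases and decreases. The function name, the threshold k = 5, and the toy counts are hypothetical.

```python
from statistics import median


def flag_anomalies(daily_counts, k=5.0):
    """Return indices of days whose count is far from the site's own baseline.

    daily_counts: counts (calls, or movements) for one site, one per day.
    Median and MAD are used instead of mean and SD so that the anomalous
    days do not inflate the baseline they are judged against.
    """
    m = median(daily_counts)
    mad = median(abs(c - m) for c in daily_counts)
    if mad == 0:
        # Degenerate baseline: anything different from the median stands out.
        return [i for i, c in enumerate(daily_counts) if c != m]
    scale = 1.4826 * mad  # MAD rescaled to estimate the SD under Gaussian noise
    return [i for i, c in enumerate(daily_counts) if abs(c - m) / scale > k]


if __name__ == "__main__":
    counts = [100, 98, 103, 101, 99, 102, 100, 250, 97, 101, 20, 100]
    print(flag_anomalies(counts))
```

Running the rule separately for each site (cell tower, say) gives a crude spatiotemporal map of unusual days. It ignores weekly cycles, trends, and spatial correlation, all of which a real system would need to handle.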

to:NB
spatio-temporal_statistics
re:social_networks_as_sensor_networks
data_mining
statistics
dobra.adrian
eagle.nathan
anomaly_detection
4 weeks ago

Discovery: Fish Live beneath Antarctica - Scientific American

4 weeks ago

All that's missing is commentary from Drs. Lake and Danforth, and maybe the adjective "Stygian".

antarctica
biology
cthulhiana
4 weeks ago

[1411.2664] Preserving Statistical Validity in Adaptive Data Analysis

4 weeks ago

"A great deal of effort has been devoted to reducing the risk of spurious scientific discoveries, from the use of sophisticated validation techniques, to deep statistical methods for controlling the false discovery rate in multiple hypothesis testing. However, there is a fundamental disconnect between the theoretical results and the practice of data analysis: the theory of statistical inference assumes a fixed collection of hypotheses to be tested, or learning algorithms to be applied, selected non-adaptively before the data are gathered, whereas in practice data is shared and reused with hypotheses and new analyses being generated on the basis of data exploration and the outcomes of previous analyses.

"In this work we initiate a principled study of how to guarantee the validity of statistical inference in adaptive data analysis. As an instance of this problem, we propose and investigate the question of estimating the expectations of m adaptively chosen functions on an unknown distribution given n random samples.

"We show that, surprisingly, there is a way to estimate an \emph{exponential} in n number of expectations accurately even if the functions are chosen adaptively. This gives an exponential improvement over standard empirical estimators that are limited to a linear number of estimates. Our result follows from a general technique that counter-intuitively involves actively perturbing and coordinating the estimates, using techniques developed for privacy preservation. We give additional applications of this technique to our question."
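
The abstract only says the technique "involves actively perturbing and coordinating the estimates, using techniques developed for privacy preservation." The sketch below shows only the simplest version of that idea: answer each adaptively chosen query with its sample mean plus Laplace noise. The paper's actual mechanism and guarantees are more involved. The class name, the noise scale, and the example queries are hypothetical.

```python
import random


class NoisyMeanOracle:
    """Answer queries E[f(X)] from one fixed sample, adding Laplace noise to each answer.

    Queries f must map a data point into [0, 1]; answers are clipped to that range.
    """

    def __init__(self, sample, noise_scale, seed=0):
        if noise_scale <= 0:
            raise ValueError("noise_scale must be positive")
        self.sample = list(sample)
        self.noise_scale = noise_scale
        self.rng = random.Random(seed)

    def _laplace(self):
        # The difference of two i.i.d. exponentials with mean b is Laplace(0, b).
        rate = 1.0 / self.noise_scale
        return self.rng.expovariate(rate) - self.rng.expovariate(rate)

    def query(self, f):
        empirical = sum(f(x) for x in self.sample) / len(self.sample)
        return min(1.0, max(0.0, empirical + self._laplace()))


if __name__ == "__main__":
    data_rng = random.Random(1)
    sample = [data_rng.random() for _ in range(1000)]
    oracle = NoisyMeanOracle(sample, noise_scale=0.01)
    # Adaptive use: the second query is chosen on the basis of the first answer.
    first = oracle.query(lambda x: 1.0 if x > 0.5 else 0.0)
    second = oracle.query(lambda x: 1.0 if x > first else 0.0)
    print(first, second)
```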

to:NB
to_read
statistics
learning_theory
via:arthegall
concentration_of_measure
stability_of_learning

4 weeks ago

Sociolinguistic Typology: Social Determinants of Linguistic Complexity

4 weeks ago

"Peter Trudgill looks at why human societies at different times and places produce different kinds of language. He considers how far social factors influence language structure and compares languages and dialects spoken across the globe, from Vietnam to Nigeria, Polynesia to Scandinavia, and from Canada to Amazonia.

"Modesty prevents Pennsylvanian Dutch Mennonites using the verb wotte ('want'); stratified society lies behind complicated Japanese honorifics; and a mountainous homeland suggests why speakers of Tibetan-Burmese Lahu have words for up there and down there. But culture and environment don't explain why Amazonian Jarawara needs three past tenses, nor why Nigerian Igbo can make do with eight adjectives, nor why most languages spoken in high altitudes do not exhibit an array of spatial demonstratives. Nor do they account for some languages changing faster than others or why some get more complex while others get simpler. The author looks at these and many other puzzles, exploring the social, linguistic, and other factors that might explain them and in the context of a huge range of languages and societies."

to:NB
books:noted
cultural_evolution
linguistics
complexity

4 weeks ago

The C&O Canal Companion

4 weeks ago

"A comprehensive guide to one of America's unique national parks, The C&O Canal Companion takes readers on a mile-by-mile, lock-by-lock tour of the 184-mile Potomac River waterway and towpath that stretches from Washington, D.C., to Cumberland, Maryland, and the Allegheny Mountains. Making extensive use of records at the National Archives and the C&O Canal Park Headquarters, Mike High demonstrates how events and places along the canal relate to the history of the nation, from Civil War battles and river crossings to the frontier forts guarding the route to the West. Using attractive photographs and drawings, he introduces park visitors to the hidden history along the canal and provides practical advice on cycling, paddling, and hiking—all the information needed to fully enjoy the park's varied delights.

"Thoroughly overhauled and expanded, the second edition of this popular, fact-packed book features updated maps and photographs, as well as the latest information on lodgings and other facilities for hikers, bikers, and campers on weekend excursions or extended outdoor vacations. It also delves deeper into the history of the upland region, relaying new narratives about Native American settlements, the European explorers and traders who were among the first settlers, and the lives of slaves and free blacks who lived along or escaped slavery via the canal.

"Visitors to the C&O Canal who are interested in exploring natural wonders while tracing the routes of pioneers and engineers—not to mention the path of George Washington, who explored the Potomac route to the West as a young man and later laid out the first canals to make the river navigable—will find this guide indispensable."

books:noted
maryland
appalachia
travel
american_history
re:GAP_trail_trip

4 weeks ago

Abandoned Footnotes: The Saudi Monarchy as a Family Firm

5 weeks ago

"Indeed, in some respects the Saudi system has more in common with systems of single party rule than with medieval European kingship. The Al Saud are an odd party, to be sure; only women can join voluntarily (by marrying into the family) but without gaining any formal power (though they may have influence through their sons). But, with its internal dispute resolution mechanisms, its intelligence networks, its “service” requirements, the family basically mimics the institutions of an effective (if small) party on the Leninist model. And thus the incentives that keep it in power are not dissimilar from the incentives that kept the PRI in Mexico or the Chinese Communist Party in power: they are basically reasons for insiders to stick together and not seek outsider support, and thus to prefer corporate control of the state to going alone."

saudi_arabia
political_science
have_read
5 weeks ago

Pop Sonnets

5 weeks ago

Ogged: "That Website Than Which No Greater Can Be Conceived".

funny
poetry
popular_culture
affectionate_parody
via:unfogged
5 weeks ago

Of Course You Hear What I Hear — Christmas Music Season Is Totally Data-Driven | FiveThirtyEight

5 weeks ago

Observations: (i) The domination of our popular culture by the childhoods of baby boomers --- my students' grandparents! --- is truly a force to behold. (ii) And of course that helps shape what all subsequent generations hear as "Christmas music". (iii) I am unreasonably charmed by the idea of using something like pagerank (or is it Kleinberg's HITS?) to identify Christmas-ness. (iv) Wait, _that's_ what happened to the author of _The War Against Silence_?
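
As a reminder of what "something like pagerank" computes, here is a minimal power-iteration sketch of generic PageRank. It is not a claim about what the article actually calculated; whether that measure is closer to PageRank or to HITS is the open question in the note above. The graph format, the damping factor of 0.85, and the stopping rule are assumptions.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank.

    links: dict mapping each node to the list of nodes it links to.
    Nodes with no out-links spread their rank uniformly over all nodes.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        # Teleportation plus redistributed mass from dangling nodes.
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v, targets in links.items():
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
        diff = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if diff < tol:
            break
    return rank


if __name__ == "__main__":
    print(pagerank({"a": ["hub"], "b": ["hub"], "c": ["hub", "a"], "hub": ["a"]}))
```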

data_mining
music
christmas
popular_culture
towards_an_algorithmic_criticism
to_teach:data-mining
path_dependence
pagerank
to:blog
have_read
5 weeks ago

Numenera

5 weeks ago

Because what the Dying Earth/Viriconium/Urth etc. needed was a role-playing game. (Actually, it looks pretty good.)

role-playing_games
via:???
5 weeks ago

r - knitr - How to align code and plot side by side - Stack Overflow

5 weeks ago

Could this be modified to put figure on top, then code, then caption?

R
knitr
re:ADAfaEPoV
5 weeks ago

A practical introduction to functional programming at Mary Rose Cook

5 weeks ago

Uses Python, but these ideas are exactly the ones I try to teach in that part of my R course, only better expressed.
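
Here is a small made-up illustration of the core idea: replace a loop that mutates an accumulator with a composition of pure functions, using Python's `filter`, `map`, and `functools.reduce`.

```python
from functools import reduce


def sum_of_even_squares_loop(xs):
    # Imperative style: mutate an accumulator inside a loop.
    total = 0
    for x in xs:
        if x % 2 == 0:
            total += x * x
    return total


def sum_of_even_squares_functional(xs):
    # Functional style: no mutation, just composed pure functions.
    evens = filter(lambda x: x % 2 == 0, xs)
    squares = map(lambda x: x * x, evens)
    return reduce(lambda acc, y: acc + y, squares, 0)


if __name__ == "__main__":
    print(sum_of_even_squares_loop(range(10)), sum_of_even_squares_functional(range(10)))
```

Base R has `Filter`, `Map`, and `Reduce` with the same roles, so the same pipeline carries over almost word for word.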

programming
functional_programming
have_read
via:tealtan
to_teach:statcomp
5 weeks ago

How To Tell If You Are In A High Fantasy Novel

5 weeks ago

Funny, but really this is a criticism of extruded epic fantasy product, rather than high fantasy proper; it's about mistaking Poughkeepsie for Elfland.

funny:geeky
funny:malicious
literary_criticism
fantasy
5 weeks ago

Dehling, Durieu, Tusche: Approximating class approach for empirical processes of dependent sequences indexed by functions

5 weeks ago

"We study weak convergence of empirical processes of dependent data $(X_i)_{i \geq 0}$, indexed by classes of functions. Our results are especially suitable for data arising from dynamical systems and Markov chains, where the central limit theorem for partial sums of observables is commonly derived via the spectral gap technique. We are specifically interested in situations where the index class is different from the class of functions $f$ for which we have good properties of the observables $(f(X_i))_{i \geq 0}$. We introduce a new bracketing number to measure the size of the index class which fits this setting. Our results apply to the empirical process of data $(X_i)_{i \geq 0}$ satisfying a multiple mixing condition. This includes dynamical systems and Markov chains, if the Perron–Frobenius operator or the Markov operator has a spectral gap, but also extends beyond this class, for example, to ergodic torus automorphisms."
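
To make the object concrete, the sketch below computes the empirical process for the simplest index class, indicators 1{x <= t}, along a stationary Gaussian AR(1) chain. For |rho| < 1 that chain is a Markov chain with a spectral gap. The chain, its parameters, and the function names are illustrative assumptions, not taken from the paper, whose interest is in richer index classes.

```python
import math
import random


def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def ar1_chain(n, rho, seed=0):
    """Stationary Gaussian AR(1): X_{i+1} = rho X_i + sqrt(1 - rho^2) eps_i, X_0 ~ N(0, 1)."""
    rng = random.Random(seed)
    innov_sd = math.sqrt(1.0 - rho * rho)
    x = rng.gauss(0.0, 1.0)
    out = []
    for _ in range(n):
        out.append(x)
        x = rho * x + innov_sd * rng.gauss(0.0, 1.0)
    return out


def sup_empirical_process(xs, cdf):
    """sqrt(n) * sup_t |F_n(t) - F(t)| over the class of indicators 1{x <= t}."""
    n = len(xs)
    worst = 0.0
    for i, y in enumerate(sorted(xs)):
        f = cdf(y)
        # For continuous F the supremum is attained just before or at a jump of F_n.
        worst = max(worst, abs((i + 1) / n - f), abs(i / n - f))
    return math.sqrt(n) * worst


if __name__ == "__main__":
    for rho in (0.0, 0.9):
        print(rho, sup_empirical_process(ar1_chain(5000, rho), std_normal_cdf))
```

With strong positive dependence (rho = 0.9), the indicator observables are positively autocorrelated. Typical values of the supremum therefore tend to be larger than in the i.i.d. case (rho = 0), even though the stationary marginal is the same N(0, 1).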

to:NB
empirical_processes
approximation
stochastic_processes
markov_models
dynamical_systems
ergodic_theory
mixing
5 weeks ago

Lederer, van de Geer: New concentration inequalities for suprema of empirical processes

5 weeks ago

"While effective concentration inequalities for suprema of empirical processes exist under boundedness or strict tail assumptions, no comparable results have been available under considerably weaker assumptions. In this paper, we derive concentration inequalities assuming only low moments for an envelope of the empirical process. These concentration inequalities are beneficial even when the envelope is much larger than the single functions under consideration."

to:NB
empirical_processes
concentration_of_measure
deviation_inequalities
van_de_geer.sara
stochastic_processes
to_read
5 weeks ago

Crisan, Míguez: Particle-kernel estimation of the filter density in state-space models

5 weeks ago

"Sequential Monte Carlo (SMC) methods, also known as particle filters, are simulation-based recursive algorithms for the approximation of the a posteriori probability measures generated by state-space dynamical models. At any given time t, a SMC method produces a set of samples over the state space of the system of interest (often termed “particles”) that is used to build a discrete and random approximation of the posterior probability distribution of the state variables, conditional on a sequence of available observations. One potential application of the methodology is the estimation of the densities associated to the sequence of a posteriori distributions. While practitioners have rather freely applied such density approximations in the past, the issue has received less attention from a theoretical perspective. In this paper, we address the problem of constructing kernel-based estimates of the posterior probability density function and its derivatives, and obtain asymptotic convergence results for the estimation errors. In particular, we find convergence rates for the approximation errors that hold uniformly on the state space and guarantee that the error vanishes almost surely as the number of particles in the filter grows. Based on this uniform convergence result, we first show how to build continuous measures that converge almost surely (with known rate) toward the posterior measure and then address a few applications. The latter include maximum a posteriori estimation of the system state using the approximate derivatives of the posterior density and the approximation of functionals of it, for example, Shannon’s entropy."
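
The basic idea can be sketched as follows: run a bootstrap particle filter, then smooth the weighted particles with a Gaussian kernel to get a density estimate. This is not the estimator Crisan and Míguez analyze. The linear-Gaussian model, its parameters, the N(0, 1) initial distribution, and the Silverman-style bandwidth are all assumptions chosen for illustration.

```python
import math
import random


def particle_filter_kde(ys, n_particles=1000, phi=0.9, sig_v=1.0, sig_w=1.0,
                        bandwidth=None, seed=0):
    """Bootstrap particle filter for the linear-Gaussian model
        X_t = phi * X_{t-1} + sig_v * V_t,   Y_t = X_t + sig_w * W_t,
    with X_0 ~ N(0, 1), followed by a Gaussian-kernel estimate of the
    final filter density p(x_T | y_1, ..., y_T) from the weighted particles.
    """
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    weights = [1.0 / n_particles] * n_particles
    for y in ys:
        # Resample by the current weights, then propagate through the state equation.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        particles = [phi * x + sig_v * rng.gauss(0.0, 1.0) for x in particles]
        # Weight by the observation likelihood (log scale for numerical stability).
        logw = [-0.5 * ((y - x) / sig_w) ** 2 for x in particles]
        top = max(logw)
        raw = [math.exp(v - top) for v in logw]
        total = sum(raw)
        weights = [v / total for v in raw]
    # Rule-of-thumb bandwidth (an assumption; the paper's conditions may differ).
    mean = sum(wi * x for wi, x in zip(weights, particles))
    var = sum(wi * (x - mean) ** 2 for wi, x in zip(weights, particles))
    h = bandwidth if bandwidth is not None else 1.06 * math.sqrt(var) * n_particles ** -0.2
    if h <= 0:
        raise ValueError("bandwidth must be positive")
    norm = 1.0 / (h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(wi * math.exp(-0.5 * ((x - p) / h) ** 2)
                          for wi, p in zip(weights, particles))

    return density


if __name__ == "__main__":
    dens = particle_filter_kde([0.3, 1.1, 0.7])
    print([round(dens(x), 3) for x in (-1.0, 0.0, 0.5, 1.0, 2.0)])
```

Because this model is linear-Gaussian, the exact filter density is Gaussian and available from the Kalman filter. That makes it a convenient test case for checking such an estimate.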

to:NB
particle_filters
kernel_estimators
density_estimation
filtering
state_estimation
state-space_models
statistics
computational_statistics
5 weeks ago

Trashorras, Wintenberger: Large deviations for bootstrapped empirical measures

5 weeks ago

"We investigate the Large Deviations (LD) properties of bootstrapped empirical measures with exchangeable weights. Our main results show in great generality how the resulting rate functions combine the LD properties of both the sample weights and the observations. As an application, we obtain new LD results and discuss both conditional and unconditional LD-efficiency for many classical choices of entries such as Efron’s, leave-p-out, i.i.d. weighted, k-blocks bootstraps, etc."
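
For concreteness, here is what "exchangeable weights" look like for three of the schemes named in the abstract: Efron's, i.i.d. weighted, and leave-p-out. In each case, the bootstrapped empirical measure puts mass w_i on observation i. The exponential choice for the i.i.d. weights is one common option, assumed here, and the function names are hypothetical. The k-blocks scheme is omitted.

```python
import random


def efron_weights(n, rng):
    """Multinomial(n; 1/n, ..., 1/n) counts divided by n: the usual nonparametric bootstrap."""
    counts = [0] * n
    for _ in range(n):
        counts[rng.randrange(n)] += 1
    return [c / n for c in counts]


def iid_weights(n, rng):
    """Normalized i.i.d. Exp(1) weights (one common i.i.d.-weighted scheme)."""
    w = [rng.expovariate(1.0) for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]


def leave_p_out_weights(n, p, rng):
    """Uniform mass on a random subset of n - p observations, zero on the other p."""
    dropped = set(rng.sample(range(n), p))
    return [0.0 if i in dropped else 1.0 / (n - p) for i in range(n)]


def weighted_empirical_measure(data, weights):
    """Bootstrapped empirical measure as a dict from data value to total mass."""
    measure = {}
    for x, w in zip(data, weights):
        measure[x] = measure.get(x, 0.0) + w
    return measure


if __name__ == "__main__":
    rng = random.Random(0)
    data = [1.2, 0.7, 3.1, 2.2, 0.9]
    for w in (efron_weights(5, rng), iid_weights(5, rng), leave_p_out_weights(5, 2, rng)):
        print(sum(x * wi for x, wi in zip(data, w)))  # bootstrapped mean
```

All three weight vectors are exchangeable: permuting the indices leaves their joint distribution unchanged. The large-deviation question is how fast the probability that such a measure lands far from the truth decays, combining the randomness of the weights with that of the data.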

to:NB
bootstrap
empirical_processes
large_deviations
stochastic_processes
statistics
re:almost_none
5 weeks ago

Fischer: On the form of the large deviation rate function for the empirical measures of weakly interacting systems

5 weeks ago

"A basic result of large deviations theory is Sanov’s theorem, which states that the sequence of empirical measures of independent and identically distributed samples satisfies the large deviation principle with rate function given by relative entropy with respect to the common distribution. Large deviation principles for the empirical measures are also known to hold for broad classes of weakly interacting systems. When the interaction through the empirical measure corresponds to an absolutely continuous change of measure, the rate function can be expressed as relative entropy of a distribution with respect to the law of the McKean–Vlasov limit with measure-variable frozen at that distribution. We discuss situations, beyond that of tilted distributions, in which a large deviation principle holds with rate function in relative entropy form."
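
In the simplest i.i.d. case, coin flips, the relative-entropy rate in Sanov's theorem can be checked directly against exact binomial tail probabilities. For a > p, -(1/n) log P(fraction of heads >= a) tends to D(a || p). This illustrates only the independent case, not the weakly interacting systems the paper is about. The function names are hypothetical.

```python
import math


def kl_bernoulli(a, p):
    """Relative entropy D(Bernoulli(a) || Bernoulli(p)) in nats."""
    def term(q, r):
        return 0.0 if q == 0 else q * math.log(q / r)
    return term(a, p) + term(1 - a, 1 - p)


def log_binomial_upper_tail(n, p, k_min):
    """log P(Binomial(n, p) >= k_min), via log-sum-exp for numerical stability."""
    logs = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p)
            for k in range(k_min, n + 1)]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


def empirical_rate(n, p, a):
    """-(1/n) log P(fraction of heads >= a); Sanov predicts this tends to kl_bernoulli(a, p)."""
    return -log_binomial_upper_tail(n, p, math.ceil(a * n)) / n


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, empirical_rate(n, 0.5, 0.7), kl_bernoulli(0.7, 0.5))
```

The gap between the two columns shrinks roughly like (log n)/n, from the polynomial prefactor that the exponential-scale statement ignores.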

to:NB
interacting_particle_systems
large_deviations
information_theory
stochastic_processes
to_read
re:almost_none
5 weeks ago

academia
afghanistan
agent-based_models
american_history
archaeology
art
bad_data_analysis
bad_science_journalism
bayesian_consistency
bayesianism
biochemical_networks
book_reviews
books:noted
books:owned
books:recommended
bootstrap
cartoons
cats
causal_inference
causality
central_asia
central_limit_theorem
class_struggles_in_america
classifiers
climate_change
clustering
cognitive_science
collective_cognition
comics
community_discovery
complexity
computational_statistics
confidence_sets
corruption
coveted
crime
cthulhiana
cultural_criticism
data_analysis
data_mining
debunking
decision-making
decision_theory
delong.brad
democracy
density_estimation
dimension_reduction
distributed_systems
dynamical_systems
econometrics
economic_history
economic_policy
economics
education
empirical_processes
ensemble_methods
entropy_estimation
epidemic_models
epistemology
ergodic_theory
estimation
evisceration
evolution_of_cooperation
evolutionary_biology
experimental_psychology
finance
financial_crisis_of_2007--
financial_markets
financial_speculation
fmri
food
fraud
funny
funny:academic
funny:geeky
funny:laughing_instead_of_screaming
funny:malicious
funny:pointed
graph_theory
graphical_models
have_read
heard_the_talk
heavy_tails
high-dimensional_statistics
hilbert_space
history_of_ideas
history_of_science
human_genetics
hypothesis_testing
ideology
imperialism
in_nb
inequality
inference_to_latent_objects
information_theory
institutions
kernel_estimators
kernel_methods
kith_and_kin
krugman.paul
large_deviations
lasso
learning_theory
liberman.mark
likelihood
linguistics
literary_criticism
machine_learning
macro_from_micro
macroeconomics
manifold_learning
market_failures_in_everything
markov_models
mathematics
mixing
model_selection
modeling
modern_ruins
monte_carlo
moral_psychology
moral_responsibility
mortgage_crisis
natural_history_of_truthiness
network_data_analysis
networked_life
networks
neural_data_analysis
neuroscience
non-equilibrium
nonparametrics
optimization
our_decrepit_institutions
philosophy
philosophy_of_science
photos
physics
pittsburgh
political_economy
political_science
practices_relating_to_the_transmission_of_genetic_information
prediction
pretty_pictures
principal_components
probability
programming
progressive_forces
psychology
r
racism
random_fields
re:almost_none
re:aos_project
re:democratic_cognition
re:do-institutions-evolve
re:g_paper
re:homophily_and_confounding
re:network_differences
re:smoothing_adjacency_matrices
re:social_networks_as_sensor_networks
re:stacs
re:your_favorite_dsge_sucks
recipes
regression
regulation
running_dogs_of_reaction
science_as_a_social_process
science_fiction
simulation
social_influence
social_life_of_the_mind
social_media
social_networks
social_science_methodology
sociology
something_about_america
sparsity
spatial_statistics
state-space_models
statistical_inference_for_stochastic_processes
statistical_mechanics
statistics
stochastic_processes
text_mining
the_american_dilemma
the_continuing_crises
time_series
to:blog
to:nb
to_be_shot_after_a_fair_trial
to_read
to_teach:complexity-and-inference
to_teach:data-mining
to_teach:statcomp
to_teach:undergrad-ada
track_down_references
us-iraq_war
us_politics
utter_stupidity
vast_right-wing_conspiracy
via:?
via:henry_farrell
via:jbdelong
via:klk
visual_display_of_quantitative_information
whats_gone_wrong_with_america
why_oh_why_cant_we_have_a_better_academic_publishing_system
why_oh_why_cant_we_have_a_better_press_corps