concentration_of_measure   114
[1602.00721] Concentration of measure without independence: a unified approach via the martingale method
"The concentration of measure phenomenon may be summarized as follows: a function of many weakly dependent random variables that is not too sensitive to any of its individual arguments will tend to take values very close to its expectation. This phenomenon is most completely understood when the arguments are mutually independent random variables, and there exist several powerful complementary methods for proving concentration inequalities, such as the martingale method, the entropy method, and the method of transportation inequalities. The setting of dependent arguments is much less well understood. This chapter focuses on the martingale method for deriving concentration inequalities without independence assumptions. In particular, we use the machinery of so-called Wasserstein matrices to show that the Azuma-Hoeffding concentration inequality for martingales with almost surely bounded differences, when applied in a sufficiently abstract setting, is powerful enough to recover and sharpen several known concentration results for nonproduct measures. Wasserstein matrices provide a natural formalism for capturing the interplay between the metric and the probabilistic structures, which is fundamental to the concentration phenomenon."
to:NB  concentration_of_measure  stochastic_processes  kith_and_kin  kontorovich.aryeh  raginsky.maxim
6 days ago by cshalizi
[1907.12686] Concentration of measure, classification of submeasures, and dynamics of $L_{0}$
"Exhibiting a new type of measure concentration, we prove uniform concentration bounds for measurable Lipschitz functions on product spaces, where Lipschitz is taken with respect to the metric induced by a weighted covering of the index set of the product. Our proof combines the Herbst argument with an entropic version of the weighted Loomis--Whitney inequality. We give a quantitative "geometric" classification of diffused submeasures into elliptic, parabolic, and hyperbolic. We prove that any non-elliptic submeasure (for example, any measure, or any pathological submeasure) has a property that we call covering concentration. Our results have strong consequences for the dynamics of the corresponding topological L0-groups."
to:NB  probability  concentration_of_measure
13 days ago by cshalizi
[1905.12202] Empirically Measuring Concentration: Fundamental Limits on Intrinsic Robustness
"Many recent works have shown that adversarial examples that fool classifiers can be found by minimally perturbing a normal input. Recent theoretical results, starting with Gilmer et al. (2018), show that if the inputs are drawn from a concentrated metric probability space, then adversarial examples with small perturbation are inevitable. A concentrated space has the property that any subset with Ω(1) (e.g., 1/100) measure, according to the imposed distribution, has small distance to almost all (e.g., 99/100) of the points in the space. It is not clear, however, whether these theoretical results apply to actual distributions such as images. This paper presents a method for empirically measuring and bounding the concentration of a concrete dataset which is proven to converge to the actual concentration. We use it to empirically estimate the intrinsic robustness to ℓ∞ and ℓ2 perturbations of several image classification benchmarks."
11 weeks ago by cshalizi
[1705.00302] Measure concentration and the weak Pinsker property
Let (X,μ) be a standard probability space. An automorphism T of (X,μ) has the weak Pinsker property if for every ε>0 it has a splitting into a direct product of a Bernoulli shift and an automorphism of entropy less than ε. This property was introduced by Thouvenot, who asked whether it holds for all ergodic automorphisms.
This paper proves that it does. The proof actually gives a more general result. Firstly, it gives a relative version: any factor map from one ergodic automorphism to another can be enlarged by arbitrarily little entropy to become relatively Bernoulli. Secondly, using some facts about relative orbit equivalence, the analogous result holds for all free and ergodic measure-preserving actions of a countable amenable group.
The key to this work is a new result about measure concentration. Suppose now that μ is a probability measure on a finite product space An, and endow this space with its Hamming metric. We prove that μ may be represented as a mixture of other measures in which (i) most of the weight in the mixture is on measures that exhibit a strong kind of concentration, and (ii) the number of summands is bounded in terms of the difference between the Shannon entropy of μ and the combined Shannon entropies of its marginals.

-- Seems like one (the other one being Hairer's proof) of those stare-at-awe-and-never-understand-how-it-was-done kind of proofs. Interesting to philosophize about and think of constructive ways to split the automorphism.
tim.austin  dynamical_system  ergodic_theory  information_theory  concentration_of_measure
march 2019 by rvenkat
High dimensional statistics non asymptotic viewpoint | Statistical theory and methods | Cambridge University Press
Recent years have witnessed an explosion in the volume and variety of data collected in all scientific disciplines and industrial settings. Such massive data sets present a number of challenges to researchers in statistics and machine learning. This book provides a self-contained introduction to the area of high-dimensional statistics, aimed at the first-year graduate level. It includes chapters that are focused on core methodology and theory - including tail bounds, concentration inequalities, uniform laws and empirical process, and random matrices - as well as chapters devoted to in-depth exploration of particular model classes - including sparse linear models, matrix models with rank constraints, graphical models, and various types of non-parametric models. With hundreds of worked examples and exercises, this text is intended both for courses and for self-study by graduate students and researchers in statistics, machine learning, and related fields who must understand, apply, and adapt modern statistical methods suited to large-scale data.
book  high-dimensional_inference  statistics  machine_learning  concentration_of_measure  martin.wainwright
february 2019 by rvenkat
[1512.00677] On concentration for (regularized) empirical risk minimization
Rates of convergence for empirical risk minimizers have been well studied in the literature. In this paper, we aim to provide a complementary set of results, in particular by showing that after normalization, the risk of the empirical minimizer concentrates on a single point. Such results have been established by~\cite{chatterjee2014new} for constrained estimators in the normal sequence model. We first generalize and sharpen this result to regularized least squares with convex penalties, making use of a "direct" argument based on Borell's theorem. We then study generalizations to other loss functions, including the negative log-likelihood for exponential families combined with a strictly convex regularization penalty. The results in this general setting are based on more "indirect" arguments as well as on concentration inequalities for maxima of empirical processes.
statistics  machine_learning  probability  concentration_of_measure
july 2016 by rvenkat
[1602.00721] Concentration of measure without independence: a unified approach via the martingale method
"The concentration of measure phenomenon may be summarized as follows: a function of many weakly dependent random variables that is not too sensitive to any of its individual arguments will tend to take values very close to its expectation. This phenomenon is most completely understood when the arguments are mutually independent random variables, and there exist several powerful complementary methods for proving concentration inequalities, such as the martingale method, the entropy method, and the method of transportation inequalities. The setting of dependent arguments is much less well understood. This chapter focuses on the martingale method for deriving concentration inequalities without independence assumptions. In particular, we use the machinery of so-called Wasserstein matrices to show that the Azuma-Hoeffding concentration inequality for martingales with almost surely bounded differences, when applied in a sufficiently abstract setting, is powerful enough to recover and sharpen several known concentration results for nonproduct measures. Wasserstein matrices provide a natural formalism for capturing the interplay between the metric and the probabilistic structures, which is fundamental to the concentration phenomenon."
to:NB  concentration_of_measure  martingales  stochastic_processes  kontorovich.aryeh  raginsky.maxim  kith_and_kin
february 2016 by cshalizi
High Dimensional Statistics (Rinaldo)
"In big-data problems, it is often the case that the statistical complexity of a single datum is increasing in the sample size. In these instances, the classical statistical modeling approach, which is predicated on the assumption that the number of model parameters remains fixed as the sample size grows unbounded, is no longer applicable. In contrast, high-dimensional statistical models allow for the number of model parameters to change with the sample size, so that, as more data are gathered, models of increasing complexity and predictive power can be used. In order to formalize statistical tasks and evaluate the theoretical performance of statistical procedures in high-dimensional settings, it is necessary to use a different suite of techniques and theories than in traditional statistics.
"The learning objectives of this course are two-fold. The first goal is present various concentration inequalities techniques to derive finite sample upper bounds on the performance of high-dimensional algorithms. The second goal is to exemplify the use of such techniques on various problems borrowed from the current literature, including high-dimensional regression and compressed sensing, matrix completion, high-dimensional graphical modeling, community detection, network analysis, etc. "
--- First of two mini-courses, second is at http://www.stat.cmu.edu/~arinaldo/36789/
statistics  concentration_of_measure  high-dimensional_statistics  kith_and_kin  rinaldo.alessandro
november 2015 by cshalizi
[1507.02803] Logarithmic Sobolev inequalities in discrete product spaces: a proof by a transportation cost distance
"The aim of this paper is to prove an inequality between relative entropy and the sum of average conditional relative entropies of the following form: For a fixed probability measure qn on n, ( is a finite set), and any probability measure pn=(Yn) on n, we have
D(pn||qn)≤Const.∑i=1n𝔼pnD(pi(⋅|Y1,…,Yi−1,Yi+1,…,Yn)||qi(⋅|Y1,…,Yi−1,Yi+1,…,Yn)),
where pi(⋅|y1,…,yi−1,yi+1,…,yn) and qi(⋅|x1,…,xi−1,xi+1,…,xn) denote the local specifications for pn resp. qn. The constant shall depend on the properties of the local specifications of qn"

(The abstract goes on)
to:NB  probability  information_theory  concentration_of_measure  via:ded-maxim  marton.katalin
july 2015 by cshalizi
[1502.03049] Sparse random graphs: regularization and concentration of the Laplacian
We study random graphs with possibly different edge probabilities in the challenging sparse regime of bounded expected degrees. Unlike in the dense case, neither the graph adjacency matrix nor its Laplacian concentrate around their expectations due to the highly irregular distribution of node degrees. It has been empirically observed that simply adding a constant of order 1/n to each entry of the adjacency matrix substantially improves the behavior of Laplacian. Here we prove that this regularization indeed forces Laplacian to concentrate even in sparse graphs. As an immediate consequence in network analysis, we establish the validity of one of the simplest and fastest approaches to community detection -- regularized spectral clustering, under the stochastic block model. Our proof of concentration of regularized Laplacian is based on Grothendieck's inequality and factorization, combined with paving arguments
graph_limit  concentration_of_measure  networks
june 2015 by rvenkat
[1506.00669] Concentration and regularization of random graphs
This paper studies how close random graphs are typically to their expectations. We interpret this question through the concentration of the adjacency and Laplacian matrices in the spectral norm. We study inhomogeneous Erd\"os-R\'enyi random graphs on n vertices, where edges form independently and possibly with different probabilities pij. Sparse random graphs whose expected degrees are o(logn) fail to concentrate. The obstruction is caused by vertices with abnormally high and low degrees. We show that concentration can be restored if we regularize the degrees of such vertices, and one can do this is various ways. As an example, let us reweight or remove enough edges to make all degrees bounded above by O(d) where d=maxpnij. Then we show that the resulting adjacency matrix A′ concentrates with the optimal rate: ∥A′−EA∥=O(d√). Similarly, if we make all degrees bounded below by d by adding weight d/n to all edges, then the resulting Laplacian concentrates with the optimal rate: ∥L(A′)−L(EA′)∥=O(1/d√). Our approach is based on Grothendieck-Pietsch factorization, using which we construct a new decomposition of random graphs. These results improve and simplify the recent work of L. Levina and the authors. We illustrate the concentration results with an application to the community detection problem in the analysis of networks.
concentration_of_measure  networks
june 2015 by rvenkat
[1506.00669] Concentration and regularization of random graphs
"This paper studies how close random graphs are typically to their expectations. We interpret this question through the concentration of the adjacency and Laplacian matrices in the spectral norm. We study inhomogeneous Erd\"os-R\'enyi random graphs on n vertices, where edges form independently and possibly with different probabilities pij. Sparse random graphs whose expected degrees are o(logn) fail to concentrate. The obstruction is caused by vertices with abnormally high and low degrees. We show that concentration can be restored if we regularize the degrees of such vertices, and one can do this is various ways. As an example, let us reweight or remove enough edges to make all degrees bounded above by O(d) where d=maxpnij. Then we show that the resulting adjacency matrix A′ concentrates with the optimal rate: ∥A′−𝔼A∥=O(d‾‾√). Similarly, if we make all degrees bounded below by d by adding weight d/n to all edges, then the resulting Laplacian concentrates with the optimal rate: ∥L(A′)−L(𝔼A′)∥=O(1/d‾‾√). Our approach is based on Grothendieck-Pietsch factorization, using which we construct a new decomposition of random graphs. These results improve and simplify the recent work of L. Levina and the authors. We illustrate the concentration results with an application to the community detection problem in the analysis of networks."
june 2015 by cshalizi
[1503.05077] Tail index estimation, concentration and adaptivity
"This paper presents an adaptive version of the Hill estimator based on Lespki's model selection method. This simple data-driven index selection method is shown to satisfy an oracle inequality and is checked to achieve the lower bound recently derived by Carpentier and Kim. In order to establish the oracle inequality, we derive non-asymptotic variance bounds and concentration inequalities for Hill estimators. These concentration inequalities are derived from Talagrand's concentration inequality for smooth functions of independent exponentially distributed random variables combined with three tools of Extreme Value Theory: the quantile transform, Karamata's representation of slowly varying functions, and R\'enyi's characterisation of the order statistics of exponential samples. The performance of this computationally and conceptually simple method is illustrated using Monte-Carlo simulations."
may 2015 by cshalizi
[1501.01571] An Introduction to Matrix Concentration Inequalities
"In recent years, random matrices have come to play a major role in computational mathematics, but most of the classical areas of random matrix theory remain the province of experts. Over the last decade, with the advent of matrix concentration inequalities, research has advanced to the point where we can conquer many (formerly) challenging problems with a page or two of arithmetic. The aim of this monograph is to describe the most successful methods from this area along with some interesting examples that these techniques can illuminate."
in_NB  probability  random_matrices  concentration_of_measure  deviation_inequalities
february 2015 by cshalizi
[1411.2664] Preserving Statistical Validity in Adaptive Data Analysis
"A great deal of effort has been devoted to reducing the risk of spurious scientific discoveries, from the use of sophisticated validation techniques, to deep statistical methods for controlling the false discovery rate in multiple hypothesis testing. However, there is a fundamental disconnect between the theoretical results and the practice of data analysis: the theory of statistical inference assumes a fixed collection of hypotheses to be tested, or learning algorithms to be applied, selected non-adaptively before the data are gathered, whereas in practice data is shared and reused with hypotheses and new analyses being generated on the basis of data exploration and the outcomes of previous analyses.
"In this work we initiate a principled study of how to guarantee the validity of statistical inference in adaptive data analysis. As an instance of this problem, we propose and investigate the question of estimating the expectations of m adaptively chosen functions on an unknown distribution given n random samples.
"We show that, surprisingly, there is a way to estimate an \emph{exponential} in n number of expectations accurately even if the functions are chosen adaptively. This gives an exponential improvement over standard empirical estimators that are limited to a linear number of estimates. Our result follows from a general technique that counter-intuitively involves actively perturbing and coordinating the estimates, using techniques developed for privacy preservation. We give additional applications of this technique to our question."
to:NB  to_read  statistics  learning_theory  via:arthegall  concentration_of_measure  stability_of_learning
january 2015 by cshalizi
Lederer , van de Geer : New concentration inequalities for suprema of empirical processes
"While effective concentration inequalities for suprema of empirical processes exist under boundedness or strict tail assumptions, no comparable results have been available under considerably weaker assumptions. In this paper, we derive concentration inequalities assuming only low moments for an envelope of the empirical process. These concentration inequalities are beneficial even when the envelope is much larger than the single functions under consideration."
in_NB  empirical_processes  concentration_of_measure  deviation_inequalities  van_de_geer.sara  stochastic_processes  to_read
january 2015 by cshalizi
[1409.8557] Statistical Theory for High-Dimensional Models
"These lecture notes consist of three chapters. In the first chapter we present oracle inequalities for the prediction error of the Lasso and square-root Lasso and briefly describe the scaled Lasso. In the second chapter we establish asymptotic linearity of a de-sparsified Lasso. This implies asymptotic normality under certain conditions and therefore can be used to construct confidence intervals for parameters of interest. A similar line of reasoning can be invoked to derive bounds in sup-norm for the Lasso and asymptotic linearity of de-sparsified estimators of a precision matrix. In the third chapter we consider chaining and the more general generic chaining method developed by Talagrand. This allows one to bound suprema of random processes. Concentration inequalities are refined probability inequalities, mostly again for suprema of random processes. We combine the two. We prove a deviation inequality directly using (generic) chaining."
to:NB  statistics  high-dimensional_statistics  concentration_of_measure  empirical_processes  regression  lasso  van_de_geer.sara
january 2015 by cshalizi
[1409.6276] Concentration Inequalities from Likelihood Ratio Method
"We explore the applications of our previously established likelihood-ratio method for deriving concentration inequalities for a wide variety of univariate and multivariate distributions. New concentration inequalities for various distributions are developed without the idea of minimizing moment generating functions."
in_NB  deviation_inequalities  probability  concentration_of_measure
january 2015 by cshalizi
Concentration Inequalities and Estimation of Conditional Probabilities
"We prove concentration inequalities inspired from [DP] to obtain estimators of conditional probabilities for weak dependant se- quences. This generalize results from Csisza ́r ([Cs]). For Gibbs mea- sures and dynamical systems, these results lead to construct estimators of the potential function and also to test the nullity of the asymptotic variance of the system."
in_NB  concentration_of_measure  stochastic_processes  re:AoS_project  have_read  mixing
june 2014 by cshalizi

Copy this bookmark:

description:

tags: