**concentration_of_measure** (114 bookmarks)

[1602.00721] Concentration of measure without independence: a unified approach via the martingale method

6 days ago by cshalizi

"The concentration of measure phenomenon may be summarized as follows: a function of many weakly dependent random variables that is not too sensitive to any of its individual arguments will tend to take values very close to its expectation. This phenomenon is most completely understood when the arguments are mutually independent random variables, and there exist several powerful complementary methods for proving concentration inequalities, such as the martingale method, the entropy method, and the method of transportation inequalities. The setting of dependent arguments is much less well understood. This chapter focuses on the martingale method for deriving concentration inequalities without independence assumptions. In particular, we use the machinery of so-called Wasserstein matrices to show that the Azuma-Hoeffding concentration inequality for martingales with almost surely bounded differences, when applied in a sufficiently abstract setting, is powerful enough to recover and sharpen several known concentration results for nonproduct measures. Wasserstein matrices provide a natural formalism for capturing the interplay between the metric and the probabilistic structures, which is fundamental to the concentration phenomenon."
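--- The Azuma-Hoeffding inequality the chapter builds on is easy to check numerically. A minimal sketch (my own illustration, not from the paper): compare the empirical tail of a ±1 random walk, the simplest martingale with bounded differences, against the bound 2·exp(−t²/(2·Σₖcₖ²)).

```python
import numpy as np

rng = np.random.default_rng(0)

# The simplest bounded-difference martingale: S_n = sum of n i.i.d. +/-1 steps,
# so each increment satisfies |S_k - S_{k-1}| <= c_k with c_k = 1.
n, trials, t = 200, 20_000, 30.0
steps = rng.choice([-1.0, 1.0], size=(trials, n))
S_n = steps.sum(axis=1)

empirical_tail = np.mean(np.abs(S_n) >= t)

# Azuma-Hoeffding: P(|S_n - S_0| >= t) <= 2 exp(-t^2 / (2 sum_k c_k^2)).
azuma_bound = 2.0 * np.exp(-t ** 2 / (2.0 * n))

print(f"empirical P(|S_n| >= {t}): {empirical_tail:.4f}")
print(f"Azuma-Hoeffding bound:     {azuma_bound:.4f}")
```

The bound is loose (here roughly 0.21 versus an empirical tail near 0.03), which is typical for exponential concentration inequalities.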

to:NB
concentration_of_measure
stochastic_processes
kith_and_kin
kontorovich.aryeh
raginsky.maxim

[1907.12686] Concentration of measure, classification of submeasures, and dynamics of $L_{0}$

13 days ago by cshalizi

"Exhibiting a new type of measure concentration, we prove uniform concentration bounds for measurable Lipschitz functions on product spaces, where Lipschitz is taken with respect to the metric induced by a weighted covering of the index set of the product. Our proof combines the Herbst argument with an entropic version of the weighted Loomis–Whitney inequality. We give a quantitative "geometric" classification of diffused submeasures into elliptic, parabolic, and hyperbolic. We prove that any non-elliptic submeasure (for example, any measure, or any pathological submeasure) has a property that we call covering concentration. Our results have strong consequences for the dynamics of the corresponding topological L0-groups."

to:NB
probability
concentration_of_measure

[1905.12202] Empirically Measuring Concentration: Fundamental Limits on Intrinsic Robustness

11 weeks ago by cshalizi

"Many recent works have shown that adversarial examples that fool classifiers can be found by minimally perturbing a normal input. Recent theoretical results, starting with Gilmer et al. (2018), show that if the inputs are drawn from a concentrated metric probability space, then adversarial examples with small perturbation are inevitable. A concentrated space has the property that any subset with Ω(1) (e.g., 1/100) measure, according to the imposed distribution, has small distance to almost all (e.g., 99/100) of the points in the space. It is not clear, however, whether these theoretical results apply to actual distributions such as images. This paper presents a method for empirically measuring and bounding the concentration of a concrete dataset which is proven to converge to the actual concentration. We use it to empirically estimate the intrinsic robustness to ℓ∞ and ℓ2 perturbations of several image classification benchmarks."
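--- The quantity being estimated can be illustrated with a crude plug-in computation (my sketch, not the paper's estimator): sample from a distribution, take a subset E of measure about α, and measure how much mass lies within ℓ∞ distance ε of E.

```python
import numpy as np

rng = np.random.default_rng(1)

# Plug-in illustration (not the paper's convergent estimator): sample from a
# Gaussian in d dimensions, take E = a set of measure ~alpha, and estimate the
# measure of the eps-expansion of E in the l_inf metric.
d, n, alpha, eps = 10, 2000, 0.01, 0.5
X = rng.standard_normal((n, d))

# Choose E as the alpha-fraction of points with the smallest first coordinate.
k = int(alpha * n)
E = X[np.argsort(X[:, 0])[:k]]

# Fraction of all sample points within l_inf distance eps of some point of E.
dists = np.abs(X[:, None, :] - E[None, :, :]).max(axis=2).min(axis=1)
expanded_mass = np.mean(dists <= eps)

print(f"measure of E ~ {k / n:.3f}, measure of eps-expansion ~ {expanded_mass:.3f}")
```

In a strongly concentrated space the expansion would already swallow most of the mass; the gap between the two printed numbers is (a naive proxy for) the intrinsic-robustness quantity the paper bounds.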

in_NB
to_read
adversarial_examples
concentration_of_measure
probability
statistics
learning_theory

[1705.00302] Measure concentration and the weak Pinsker property

march 2019 by rvenkat

Let (X,μ) be a standard probability space. An automorphism T of (X,μ) has the weak Pinsker property if for every ε>0 it has a splitting into a direct product of a Bernoulli shift and an automorphism of entropy less than ε. This property was introduced by Thouvenot, who asked whether it holds for all ergodic automorphisms.

This paper proves that it does. The proof actually gives a more general result. Firstly, it gives a relative version: any factor map from one ergodic automorphism to another can be enlarged by arbitrarily little entropy to become relatively Bernoulli. Secondly, using some facts about relative orbit equivalence, the analogous result holds for all free and ergodic measure-preserving actions of a countable amenable group.

The key to this work is a new result about measure concentration. Suppose now that μ is a probability measure on a finite product space A^n, and endow this space with its Hamming metric. We prove that μ may be represented as a mixture of other measures in which (i) most of the weight in the mixture is on measures that exhibit a strong kind of concentration, and (ii) the number of summands is bounded in terms of the difference between the Shannon entropy of μ and the combined Shannon entropies of its marginals.

-- Seems like one of those stare-in-awe-and-never-understand-how-it-was-done kinds of proofs (the other one being Hairer's proof). Interesting to philosophize about and to think of constructive ways to split the automorphism.

tim.austin
dynamical_system
ergodic_theory
information_theory
concentration_of_measure

High-Dimensional Statistics: A Non-Asymptotic Viewpoint | Statistical theory and methods | Cambridge University Press

february 2019 by rvenkat

Recent years have witnessed an explosion in the volume and variety of data collected in all scientific disciplines and industrial settings. Such massive data sets present a number of challenges to researchers in statistics and machine learning. This book provides a self-contained introduction to the area of high-dimensional statistics, aimed at the first-year graduate level. It includes chapters that are focused on core methodology and theory - including tail bounds, concentration inequalities, uniform laws and empirical processes, and random matrices - as well as chapters devoted to in-depth exploration of particular model classes - including sparse linear models, matrix models with rank constraints, graphical models, and various types of non-parametric models. With hundreds of worked examples and exercises, this text is intended both for courses and for self-study by graduate students and researchers in statistics, machine learning, and related fields who must understand, apply, and adapt modern statistical methods suited to large-scale data.

book
high-dimensional_inference
statistics
machine_learning
concentration_of_measure
martin.wainwright

[1512.00677] On concentration for (regularized) empirical risk minimization

july 2016 by rvenkat

Rates of convergence for empirical risk minimizers have been well studied in the literature. In this paper, we aim to provide a complementary set of results, in particular by showing that after normalization, the risk of the empirical minimizer concentrates on a single point. Such results have been established by Chatterjee (2014) for constrained estimators in the normal sequence model. We first generalize and sharpen this result to regularized least squares with convex penalties, making use of a "direct" argument based on Borell's theorem. We then study generalizations to other loss functions, including the negative log-likelihood for exponential families combined with a strictly convex regularization penalty. The results in this general setting are based on more "indirect" arguments as well as on concentration inequalities for maxima of empirical processes.
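--- The phenomenon, concentration of the normalized risk of a regularized least-squares estimator, is easy to see in simulation (my own illustration in the normal sequence model; not the paper's theorem verbatim).

```python
import numpy as np

rng = np.random.default_rng(6)

# Normal sequence model y = theta + eps with a ridge-type estimator
# (identity design): the per-coordinate risk concentrates across replications.
n, lam, reps = 5000, 1.0, 200
theta = rng.standard_normal(n)

risks = []
for _ in range(reps):
    y = theta + rng.standard_normal(n)
    theta_hat = y / (1.0 + lam)          # ridge shrinkage, identity design
    risks.append(np.mean((theta_hat - theta) ** 2))
risks = np.array(risks)

# The standard deviation across replications is tiny relative to the mean.
print(f"normalized risk: mean {risks.mean():.4f}, sd {risks.std():.5f}")
```

The spread shrinks like O(1/√n), which is the "concentration on a single point" being formalized.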

statistics
machine_learning
probability
concentration_of_measure

[1602.00721] Concentration of measure without independence: a unified approach via the martingale method

february 2016 by cshalizi

"The concentration of measure phenomenon may be summarized as follows: a function of many weakly dependent random variables that is not too sensitive to any of its individual arguments will tend to take values very close to its expectation. This phenomenon is most completely understood when the arguments are mutually independent random variables, and there exist several powerful complementary methods for proving concentration inequalities, such as the martingale method, the entropy method, and the method of transportation inequalities. The setting of dependent arguments is much less well understood. This chapter focuses on the martingale method for deriving concentration inequalities without independence assumptions. In particular, we use the machinery of so-called Wasserstein matrices to show that the Azuma-Hoeffding concentration inequality for martingales with almost surely bounded differences, when applied in a sufficiently abstract setting, is powerful enough to recover and sharpen several known concentration results for nonproduct measures. Wasserstein matrices provide a natural formalism for capturing the interplay between the metric and the probabilistic structures, which is fundamental to the concentration phenomenon."

to:NB
concentration_of_measure
martingales
stochastic_processes
kontorovich.aryeh
raginsky.maxim
kith_and_kin

High Dimensional Statistics (Rinaldo)

november 2015 by cshalizi

"In big-data problems, it is often the case that the statistical complexity of a single datum is increasing in the sample size. In these instances, the classical statistical modeling approach, which is predicated on the assumption that the number of model parameters remains fixed as the sample size grows unbounded, is no longer applicable. In contrast, high-dimensional statistical models allow for the number of model parameters to change with the sample size, so that, as more data are gathered, models of increasing complexity and predictive power can be used. In order to formalize statistical tasks and evaluate the theoretical performance of statistical procedures in high-dimensional settings, it is necessary to use a different suite of techniques and theories than in traditional statistics.

"The learning objectives of this course are two-fold. The first goal is to present various concentration inequality techniques to derive finite sample upper bounds on the performance of high-dimensional algorithms. The second goal is to exemplify the use of such techniques on various problems borrowed from the current literature, including high-dimensional regression and compressed sensing, matrix completion, high-dimensional graphical modeling, community detection, network analysis, etc."

--- First of two mini-courses, second is at http://www.stat.cmu.edu/~arinaldo/36789/

statistics
concentration_of_measure
high-dimensional_statistics
kith_and_kin
rinaldo.alessandro

[1507.02803] Logarithmic Sobolev inequalities in discrete product spaces: a proof by a transportation cost distance

july 2015 by cshalizi

"The aim of this paper is to prove an inequality between relative entropy and the sum of average conditional relative entropies of the following form: For a fixed probability measure q^n on Σ^n (Σ is a finite set), and any probability measure p^n = L(Y^n) on Σ^n, we have

D(p^n ‖ q^n) ≤ Const. ∑_{i=1}^n 𝔼_{p^n} D( p_i(⋅|Y_1,…,Y_{i−1},Y_{i+1},…,Y_n) ‖ q_i(⋅|Y_1,…,Y_{i−1},Y_{i+1},…,Y_n) ),

where p_i(⋅|y_1,…,y_{i−1},y_{i+1},…,y_n) and q_i(⋅|x_1,…,x_{i−1},x_{i+1},…,x_n) denote the local specifications for p^n resp. q^n. The constant shall depend on the properties of the local specifications of q^n"

(The abstract goes on)

to:NB
probability
information_theory
concentration_of_measure
via:ded-maxim
marton.katalin

[1502.03049] Sparse random graphs: regularization and concentration of the Laplacian

june 2015 by rvenkat

We study random graphs with possibly different edge probabilities in the challenging sparse regime of bounded expected degrees. Unlike in the dense case, neither the graph adjacency matrix nor its Laplacian concentrates around its expectation, due to the highly irregular distribution of node degrees. It has been empirically observed that simply adding a constant of order 1/n to each entry of the adjacency matrix substantially improves the behavior of the Laplacian. Here we prove that this regularization indeed forces the Laplacian to concentrate even in sparse graphs. As an immediate consequence in network analysis, we establish the validity of one of the simplest and fastest approaches to community detection, regularized spectral clustering, under the stochastic block model. Our proof of concentration of the regularized Laplacian is based on Grothendieck's inequality and factorization, combined with paving arguments.
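--- The regularization described (adding a constant of order 1/n to every entry of the adjacency matrix before forming the Laplacian) is simple to sketch. An illustration under assumed parameters, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(2)

# Sparse 2-block stochastic block model with bounded expected degrees.
n, p_in, p_out = 400, 10 / 400, 2 / 400
z = np.repeat([0, 1], n // 2)             # true community labels
P = np.where(z[:, None] == z[None, :], p_in, p_out)
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1); A = A + A.T            # symmetric, no self-loops

tau = A.sum() / n                         # regularizer ~ average degree
A_reg = A + tau / n                       # the "constant of order 1/n" fix

def normalized_laplacian(M):
    d = M.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.eye(len(M)) - D_inv_sqrt @ M @ D_inv_sqrt

L_reg = normalized_laplacian(A_reg)       # well-defined: all degrees >= tau > 0

# Regularized spectral clustering: sign of the eigenvector for the
# second-smallest eigenvalue (eigh returns eigenvalues in ascending order).
vals, vecs = np.linalg.eigh(L_reg)
labels = (vecs[:, 1] > 0).astype(int)
agreement = max(np.mean(labels == z), np.mean(labels != z))
print(f"community recovery agreement: {agreement:.2f}")
```

Without the tau/n term, isolated vertices make the normalized Laplacian undefined and high-degree vertices dominate the spectrum; the regularized version is what the paper proves concentrates.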

graph_limit
concentration_of_measure
networks

[1506.00669] Concentration and regularization of random graphs

june 2015 by rvenkat

This paper studies how close random graphs are typically to their expectations. We interpret this question through the concentration of the adjacency and Laplacian matrices in the spectral norm. We study inhomogeneous Erdős–Rényi random graphs on n vertices, where edges form independently and possibly with different probabilities p_ij. Sparse random graphs whose expected degrees are o(log n) fail to concentrate. The obstruction is caused by vertices with abnormally high and low degrees. We show that concentration can be restored if we regularize the degrees of such vertices, and one can do this in various ways. As an example, let us reweight or remove enough edges to make all degrees bounded above by O(d) where d = max np_ij. Then we show that the resulting adjacency matrix A′ concentrates with the optimal rate: ‖A′ − 𝔼A‖ = O(√d). Similarly, if we make all degrees bounded below by d by adding weight d/n to all edges, then the resulting Laplacian concentrates with the optimal rate: ‖L(A′) − L(𝔼A′)‖ = O(1/√d). Our approach is based on Grothendieck-Pietsch factorization, using which we construct a new decomposition of random graphs. These results improve and simplify the recent work of L. Levina and the authors. We illustrate the concentration results with an application to the community detection problem in the analysis of networks.

concentration_of_measure
networks

[1506.00669] Concentration and regularization of random graphs

june 2015 by cshalizi

"This paper studies how close random graphs are typically to their expectations. We interpret this question through the concentration of the adjacency and Laplacian matrices in the spectral norm. We study inhomogeneous Erdős–Rényi random graphs on n vertices, where edges form independently and possibly with different probabilities p_ij. Sparse random graphs whose expected degrees are o(log n) fail to concentrate. The obstruction is caused by vertices with abnormally high and low degrees. We show that concentration can be restored if we regularize the degrees of such vertices, and one can do this in various ways. As an example, let us reweight or remove enough edges to make all degrees bounded above by O(d) where d = max np_ij. Then we show that the resulting adjacency matrix A′ concentrates with the optimal rate: ‖A′ − 𝔼A‖ = O(√d). Similarly, if we make all degrees bounded below by d by adding weight d/n to all edges, then the resulting Laplacian concentrates with the optimal rate: ‖L(A′) − L(𝔼A′)‖ = O(1/√d). Our approach is based on Grothendieck-Pietsch factorization, using which we construct a new decomposition of random graphs. These results improve and simplify the recent work of L. Levina and the authors. We illustrate the concentration results with an application to the community detection problem in the analysis of networks."

to:NB
to_read
concentration_of_measure
graph_theory
graph_limits
re:smoothing_adjacency_matrices
re:network_differences
via:ded-maxim

[1503.05077] Tail index estimation, concentration and adaptivity

may 2015 by cshalizi

"This paper presents an adaptive version of the Hill estimator based on Lepski's model selection method. This simple data-driven index selection method is shown to satisfy an oracle inequality and is checked to achieve the lower bound recently derived by Carpentier and Kim. In order to establish the oracle inequality, we derive non-asymptotic variance bounds and concentration inequalities for Hill estimators. These concentration inequalities are derived from Talagrand's concentration inequality for smooth functions of independent exponentially distributed random variables combined with three tools of Extreme Value Theory: the quantile transform, Karamata's representation of slowly varying functions, and Rényi's characterisation of the order statistics of exponential samples. The performance of this computationally and conceptually simple method is illustrated using Monte-Carlo simulations."
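--- For reference, the plain (non-adaptive) Hill estimator that the paper's Lepski-type selection rule builds on can be sketched as follows; here k is fixed by hand, whereas the paper's contribution is choosing it adaptively from the data.

```python
import numpy as np

rng = np.random.default_rng(3)

def hill_estimator(x, k):
    """Hill estimator of the tail index gamma = 1/alpha from the k largest order statistics."""
    xs = np.sort(x)[::-1]                       # descending order statistics
    return np.mean(np.log(xs[:k]) - np.log(xs[k]))

# Standard Pareto(alpha) sample on [1, inf): true tail index gamma = 1/alpha.
alpha, n = 2.0, 100_000
x = rng.pareto(alpha, size=n) + 1.0             # numpy's pareto is Lomax; shift to Pareto

gamma_hat = hill_estimator(x, k=2000)
print(f"true gamma = {1 / alpha:.3f}, Hill estimate = {gamma_hat:.3f}")
```

For heavier contamination or slowly varying deviations from a pure power law, the estimate depends strongly on k, which is exactly the bias-variance trade-off the adaptive method addresses.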

to:NB
heavy_tails
statistics
to_read
concentration_of_measure

[1501.01571] An Introduction to Matrix Concentration Inequalities

february 2015 by cshalizi

"In recent years, random matrices have come to play a major role in computational mathematics, but most of the classical areas of random matrix theory remain the province of experts. Over the last decade, with the advent of matrix concentration inequalities, research has advanced to the point where we can conquer many (formerly) challenging problems with a page or two of arithmetic. The aim of this monograph is to describe the most successful methods from this area along with some interesting examples that these techniques can illuminate."
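--- As a hedged illustration of the "page or two of arithmetic" the monograph refers to, one can compare a simulated spectral-norm tail against the matrix Bernstein bound P(‖Σᵢ Xᵢ‖ ≥ t) ≤ 2d·exp(−(t²/2)/(v + Lt/3)). The ensemble below is my own toy example, not one from the monograph.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy ensemble (an assumption for illustration): X_i = eps_i (E_ab + E_ba)
# with eps_i a random sign and (a, b) a uniform random pair of distinct
# indices, so ||X_i|| <= L = 1 and v = ||sum_i E X_i^2|| = 2 n / d.
d, n, trials, t = 20, 500, 1000, 40.0

def sample_norm():
    eps = rng.choice([-1.0, 1.0], size=n)
    a = rng.integers(d, size=n)
    b = (a + 1 + rng.integers(d - 1, size=n)) % d   # b != a, uniform over pairs
    S = np.zeros((d, d))
    np.add.at(S, (a, b), eps)   # unbuffered add handles repeated index pairs
    np.add.at(S, (b, a), eps)
    return np.linalg.norm(S, 2)                     # spectral norm of the sum

norms = np.array([sample_norm() for _ in range(trials)])
empirical_tail = np.mean(norms >= t)

# Matrix Bernstein: P(||sum X_i|| >= t) <= 2 d exp(-(t^2/2) / (v + L t / 3)).
v = 2 * n / d
bernstein_bound = 2 * d * np.exp(-(t ** 2 / 2) / (v + t / 3))

print(f"empirical P(||S|| >= {t}): {empirical_tail:.4f}")
print(f"matrix Bernstein bound:    {bernstein_bound:.4f}")
```

The dimensional factor 2d in front is the price of moving from scalar to matrix concentration; the typical norm here sits well below t, consistent with the bound.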

in_NB
probability
random_matrices
concentration_of_measure
deviation_inequalities

[1411.2664] Preserving Statistical Validity in Adaptive Data Analysis

january 2015 by cshalizi

"A great deal of effort has been devoted to reducing the risk of spurious scientific discoveries, from the use of sophisticated validation techniques, to deep statistical methods for controlling the false discovery rate in multiple hypothesis testing. However, there is a fundamental disconnect between the theoretical results and the practice of data analysis: the theory of statistical inference assumes a fixed collection of hypotheses to be tested, or learning algorithms to be applied, selected non-adaptively before the data are gathered, whereas in practice data is shared and reused with hypotheses and new analyses being generated on the basis of data exploration and the outcomes of previous analyses.

"In this work we initiate a principled study of how to guarantee the validity of statistical inference in adaptive data analysis. As an instance of this problem, we propose and investigate the question of estimating the expectations of m adaptively chosen functions on an unknown distribution given n random samples.

"We show that, surprisingly, there is a way to estimate an exponential in n number of expectations accurately even if the functions are chosen adaptively. This gives an exponential improvement over standard empirical estimators that are limited to a linear number of estimates. Our result follows from a general technique that counter-intuitively involves actively perturbing and coordinating the estimates, using techniques developed for privacy preservation. We give additional applications of this technique to our question."
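--- The "actively perturbing the estimates" idea borrowed from privacy preservation can be caricatured with a Laplace-noise mechanism. A simplified sketch, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(5)

# Answer each adaptively chosen query phi with the empirical mean on the
# sample plus calibrated Laplace noise, so later queries cannot overfit to
# the exact sample values revealed by earlier answers.
n, sigma = 10_000, 0.01
sample = rng.standard_normal(n)

def answer(phi):
    """Noisy empirical expectation of a [0,1]-valued query phi."""
    return np.mean(phi(sample)) + rng.laplace(scale=sigma)

# An "analyst" asks queries that depend on earlier answers (adaptivity).
a1 = answer(lambda x: (x > 0).astype(float))
threshold = 1.0 if a1 > 0.5 else -1.0      # chosen adaptively from a1
a2 = answer(lambda x: (x > threshold).astype(float))
print(f"answers: {a1:.3f}, {a2:.3f}")
```

The paper's point is quantitative: with noise calibrated this way, exponentially many adaptive queries can be answered accurately, versus only linearly many for the plain empirical means.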

to:NB
to_read
statistics
learning_theory
via:arthegall
concentration_of_measure
stability_of_learning

Lederer, van de Geer: New concentration inequalities for suprema of empirical processes

january 2015 by cshalizi

"While effective concentration inequalities for suprema of empirical processes exist under boundedness or strict tail assumptions, no comparable results have been available under considerably weaker assumptions. In this paper, we derive concentration inequalities assuming only low moments for an envelope of the empirical process. These concentration inequalities are beneficial even when the envelope is much larger than the single functions under consideration."

in_NB
empirical_processes
concentration_of_measure
deviation_inequalities
van_de_geer.sara
stochastic_processes
to_read

[1409.8557] Statistical Theory for High-Dimensional Models

january 2015 by cshalizi

"These lecture notes consist of three chapters. In the first chapter we present oracle inequalities for the prediction error of the Lasso and square-root Lasso and briefly describe the scaled Lasso. In the second chapter we establish asymptotic linearity of a de-sparsified Lasso. This implies asymptotic normality under certain conditions and therefore can be used to construct confidence intervals for parameters of interest. A similar line of reasoning can be invoked to derive bounds in sup-norm for the Lasso and asymptotic linearity of de-sparsified estimators of a precision matrix. In the third chapter we consider chaining and the more general generic chaining method developed by Talagrand. This allows one to bound suprema of random processes. Concentration inequalities are refined probability inequalities, mostly again for suprema of random processes. We combine the two. We prove a deviation inequality directly using (generic) chaining."

to:NB
statistics
high-dimensional_statistics
concentration_of_measure
empirical_processes
regression
lasso
van_de_geer.sara

[1409.6276] Concentration Inequalities from Likelihood Ratio Method

january 2015 by cshalizi

"We explore the applications of our previously established likelihood-ratio method for deriving concentration inequalities for a wide variety of univariate and multivariate distributions. New concentration inequalities for various distributions are developed without the idea of minimizing moment generating functions."

in_NB
deviation_inequalities
probability
concentration_of_measure

Probability in High Dimension

july 2014 by cshalizi

2014 lecture notes for van Handel's class. Looks great.

in_NB
concentration_of_measure
empirical_processes
probability
high-dimensional_probability
learning_theory
vc-dimension
van_handel.ramon
via:arsyed
re:almost_none

Concentration Inequalities and Estimation of Conditional Probabilities

june 2014 by cshalizi

"We prove concentration inequalities inspired from [DP] to obtain estimators of conditional probabilities for weakly dependent sequences. This generalizes results from Csiszár ([Cs]). For Gibbs measures and dynamical systems, these results lead to constructing estimators of the potential function and also to testing the nullity of the asymptotic variance of the system."

in_NB
concentration_of_measure
stochastic_processes
re:AoS_project
have_read
mixing
