cshalizi + vc-dimension   26

Model Selection via the VC Dimension
"We derive an objective function that can be optimized to give an estimator for the Vapnik-Chervonenkis dimension for use in model selection in regression problems. We verify our estimator is consistent. Then, we verify it performs well compared to seven other model selection techniques. We do this for a variety of types of data sets."
to:NB  learning_theory  vc-dimension  statistics  model_selection 
june 2019 by cshalizi
Model Selection via the VC Dimension
"We develop an objective function that can be readily optimized to give an estimator of the Vapnik-Chervonenkis dimension for regression problems. We verify our estimator is consistent and performs well in simulations. We use our estimator on two datasets both acknowledged to be difficult and see that it gives results that are comparable in quality if not better than established techniques such as the Bayes information criterion, two forms of empirical risk minimization, and two sparsity methods."

--- If they really fixed the issues with the Vapnik et al. (1994) results, this would be very nice.
to:NB  learning_theory  model_selection  statistics  regression  classifiers  vc-dimension  to_read 
march 2018 by cshalizi
Learning Distributions by Their Density Levels: A Paradigm for Learning without a Teacher - ScienceDirect
"We propose a mathematical model for learning the high-density areas of an unknown distribution from (unlabeled) random points drawn according to this distribution. While this type of a learning task has not been previously addressed in the computational learnability literature, we believe that this is a rather basic problem that appears in many practical learning scenarios. From a statistical theory standpoint, our model may be viewed as a restricted instance of the fundamental issue of inferring information about a probability distribution from the random samples it generates. From a computational learning angle, what we propose is a new framework of unsupervised concept learning. The examples provided to the learner in our model are not labeled (and are not necessarily all positive or all negative). The only information about their membership is indirectly disclosed to the student through the sampling distribution. We investigate the basic features of the proposed model and provide lower and upper bounds on the sample complexity of such learning tasks. We prove that classes whose VC-dimension is finite are learnable in a very strong sense, while on the other hand, ε-covering numbers of a concept class impose lower bounds on the sample size needed for learning in our model. One direction of the proof involves a reduction of the density-level learnability to PAC learning with respect to fixed distributions (as well as some fundamental statistical lower bounds), while the sufficiency condition is proved through the introduction of a generic learning algorithm."
to:NB  vc-dimension 
april 2017 by cshalizi
Bounding the Vapnik-Chervonenkis dimension of concept classes parameterized by real numbers | SpringerLink
"The Vapnik-Chervonenkis (V-C) dimension is an important combinatorial tool in the analysis of learning problems in the PAC framework. For polynomial learnability, we seek upper bounds on the V-C dimension that are polynomial in the syntactic complexity of concepts. Such upper bounds are automatic for discrete concept classes, but hitherto little has been known about what general conditions guarantee polynomial bounds on V-C dimension for classes in which concepts and examples are represented by tuples of real numbers. In this paper, we show that for two general kinds of concept class the V-C dimension is polynomially bounded in the number of real numbers used to define a problem instance. One is classes where the criterion for membership of an instance in a concept can be expressed as a formula (in the first-order theory of the reals) with fixed quantification depth and exponentially-bounded length, whose atomic predicates are polynomial inequalities of exponentially-bounded degree. The other is classes where containment of an instance in a concept is testable in polynomial time, assuming we may compute standard arithmetic operations on reals exactly in constant time.
"Our results show that in the continuous case, as in the discrete, the real barrier to efficient learning in the Occam sense is complexity-theoretic and not information-theoretic. We present examples to show how these results apply to concept classes defined by geometrical figures and neural nets, and derive polynomial bounds on the V-C dimension for these classes."
to:NB  to_read  re:hyperbolic_networks  vc-dimension  computational_complexity 
april 2017 by cshalizi
Arcones, Gine: Limit Theorems for $U$-Processes
"Necessary and sufficient conditions for the law of large numbers and sufficient conditions for the central limit theorem for U-processes are given. These conditions are in terms of random metric entropies. The CLT and LLN for VC subgraph classes of functions as well as for classes satisfying bracketing conditions follow as consequences of the general results. In particular, Liu's simplicial depth process satisfies both the LLN and the CLT. Among the techniques used, randomization, decoupling inequalities, integrability of Gaussian and Rademacher chaos and exponential inequalities for U-statistics should be mentioned."
in_NB  u-statistics  empirical_processes  deviation_inequalities  vc-dimension  central_limit_theorem  re:smoothing_adjacency_matrices 
december 2015 by cshalizi
Measures of Complexity - Festschrift for Alexey Chervonenkis | Vladimir Vovk | Springer
"This book brings together historical notes, reviews of research developments, fresh ideas on how to make VC (Vapnik–Chervonenkis) guarantees tighter, and new technical contributions in the areas of machine learning, statistical inference, classification, algorithmic statistics, and pattern recognition."
to:NB  learning_theory  empirical_processes  vc-dimension  books:noted 
october 2015 by cshalizi
From Uniform Laws of Large Numbers to Uniform Ergodic Theorems
"The purpose of these lectures is to present three different approaches with their own methods for establishing uniform laws of large numbers and uniform ergodic theorems for dynamical systems. The presentation follows the principle according to which the i.i.d. case is considered first in great detail, and then attempts are made to extend these results to the case of more general dependence structures. The lectures begin (Chapter 1) with a review and description of classic laws of large numbers and ergodic theorems, their connection and interplay, and their infinite dimensional extensions towards uniform theorems with applications to dynamical systems. The first approach (Chapter 2) is of metric entropy with bracketing which relies upon the Blum-DeHardt law of large numbers and Hoffmann-Jørgensen’s extension of it. The result extends to general dynamical systems using the uniform ergodic lemma (or Kingman’s subadditive ergodic theorem). In this context metric entropy and majorizing measure type conditions are also considered. The second approach (Chapter 3) is of Vapnik and Chervonenkis. It relies upon Rademacher randomization (subgaussian inequality) and Gaussian randomization (Sudakov’s minoration) and offers conditions in terms of random entropy numbers. Absolutely regular dynamical systems are shown to support the VC theory using a blocking technique and Eberlein’s lemma. The third approach (Chapter 4) is aimed to cover the wide sense stationary case which is not accessible by the previous two methods. This approach relies upon the spectral representation theorem and offers conditions in terms of the orthogonal stochastic measures which are associated with the underlying dynamical system by means of this theorem. The case of bounded variation is covered, while the case of unbounded variation is left as an open question.
"The lectures finish with a supplement in which the role of uniform convergence of reversed martingales towards consistency of statistical models is explained via the concept of Hardy’s regular convergence."

--- I got a glance at this once a decade ago in a library, and have been looking for a copy for years.
in_NB  ergodic_theory  vc-dimension  learning_theory  stochastic_processes  to_read  empirical_processes 
august 2015 by cshalizi
[1506.05900] Representation Learning for Clustering: A Statistical Framework
"We address the problem of communicating domain knowledge from a user to the designer of a clustering algorithm. We propose a protocol in which the user provides a clustering of a relatively small random sample of a data set. The algorithm designer then uses that sample to come up with a data representation under which k-means clustering results in a clustering (of the full data set) that is aligned with the user's clustering. We provide a formal statistical model for analyzing the sample complexity of learning a clustering representation with this paradigm. We then introduce a notion of capacity of a class of possible representations, in the spirit of the VC-dimension, showing that classes of representations that have finite such dimension can be successfully learned with sample size error bounds, and end our discussion with an analysis of that dimension for classes of representations induced by linear embeddings."
to:NB  machine_learning  representation  learning_theory  clustering  vc-dimension 
july 2015 by cshalizi
[1412.6612] The Vapnik-Chervonenkis Dimension of Norms on $\mathbb{R}^d$
"The Vapnik-Chervonenkis dimension of a collection of subsets of a set is an important combinatorial parameter in machine learning. In this paper we show that the VC dimension of the family of d-dimensional cubes in ℝ^d (that is, the closed balls according to the ℓ∞ norm) is ⌊(3d+1)/2⌋. We also prove that the VC dimension of certain families of convex sets in ℝ^2 (including the balls of all norms) is at most 3, and that there is a norm in ℝ^3 the collection of whose balls has infinite VC dimension."
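--- A brute-force sanity check of the d = 1 case (my own sketch, not from the paper): ℓ∞ balls on the real line are just closed intervals, and the formula ⌊(3d+1)/2⌋ gives 2.

```python
from itertools import combinations

def shatters_intervals(points):
    """Closed intervals [a, b] shatter a finite set of reals iff every
    subset is contiguous in the sorted order, since an interval can
    only pick out a run of consecutive points."""
    n = len(sorted(points))
    for size in range(2, n + 1):
        for idx in combinations(range(n), size):
            if any(b - a != 1 for a, b in zip(idx, idx[1:])):
                return False  # non-contiguous subset: no interval realizes it
    return True

def vc_dim_intervals(candidates):
    """Largest n such that some n of the candidate points are shattered."""
    best = 0
    for n in range(1, len(candidates) + 1):
        if any(shatters_intervals(s) for s in combinations(candidates, n)):
            best = n
    return best

print(vc_dim_intervals([0.0, 1.0, 2.0, 3.0]))  # -> 2, matching floor((3*1+1)/2)
```

Any two points are shattered (both singletons and the pair are runs), but no three are, since the outer pair minus the middle point is never a run.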
to:NB  learning_theory  vc-dimension 
january 2015 by cshalizi
[1401.7388] Bounding Embeddings of VC Classes into Maximum Classes
"One of the earliest conjectures in computational learning theory, the Sample Compression conjecture, asserts that concept classes (equivalently set systems) admit compression schemes of size linear in their VC dimension. To date this statement is known to be true for maximum classes---those that possess maximum cardinality for their VC dimension. The most promising approach to positively resolving the conjecture is by embedding general VC classes into maximum classes without super-linear increase to their VC dimensions, as such embeddings would extend the known compression schemes to all VC classes. We show that maximum classes can be characterised by a local-connectivity property of the graph obtained by viewing the class as a cubical complex. This geometric characterisation of maximum VC classes is applied to prove a negative embedding result which demonstrates VC-d classes that cannot be embedded in any maximum class of VC dimension lower than 2d. On the other hand, we show that every VC-d class C embeds in a VC-(d+D) maximum class where D is the deficiency of C, i.e., the difference between the cardinalities of a maximum VC-d class and of C. For VC-2 classes in binary n-cubes for 4 <= n <= 6, we give best possible results on embedding into maximum classes. For some special classes of Boolean functions, relationships with maximum classes are investigated. Finally we give a general recursive procedure for embedding VC-d classes into VC-(d+k) maximum classes for smallest k."
in_NB  learning_theory  vc-dimension 
february 2014 by cshalizi
[1310.5796] Relative Deviation Learning Bounds and Generalization with Unbounded Loss Functions
"We present an extensive analysis of relative deviation bounds, including detailed proofs of two-sided inequalities and their implications. We also give detailed proofs of two-sided generalization bounds that hold in the general case of unbounded loss functions, under the assumption that a moment of the loss is bounded. These bounds are useful in the analysis of importance weighting and other learning tasks such as unbounded regression."
in_NB  learning_theory  vc-dimension  empirical_processes  have_read 
october 2013 by cshalizi
[1309.2626] Some new maximum VC classes
"Set systems of finite VC dimension are frequently used in applications relating to machine learning theory and statistics. Two simple types of VC classes which have been widely studied are the maximum classes (those which are extremal with respect to Sauer's lemma) and so-called Dudley classes, which arise as sets of positivity for linearly parameterized functions. These two types of VC class were related by Floyd, who gave sufficient conditions for when a Dudley class is maximum. It is widely known that Floyd's condition applies to positive Euclidean half-spaces and certain other classes, such as sets of positivity for univariate polynomials.
"In this paper we show that Floyd's lemma applies to a wider class of linearly parameterized functions than has been formally recognized to date. In particular we show that, modulo some minor technicalities, the sets of positivity for any linear combination of real analytic functions is maximum on points in general position. This includes sets of positivity for multivariate polynomials as a special case."
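--- For reference, Sauer's lemma says a class of VC dimension d over n points contains at most sum_{i&lt;=d} C(n,i) sets, and the maximum classes are exactly those attaining the bound. A quick brute-force illustration (my own sketch, not from the paper): the class of all subsets of size at most d is maximum.

```python
from itertools import combinations
from math import comb

def vc_dimension(concepts, domain):
    """Brute-force VC dimension of a finite class of sets over a finite domain."""
    dim = 0
    for k in range(1, len(domain) + 1):
        shattered_some = False
        for pts in combinations(domain, k):
            pts = frozenset(pts)
            traces = {frozenset(c & pts) for c in concepts}
            if len(traces) == 2 ** k:  # all 2^k labelings of pts realized
                shattered_some = True
                break
        if not shattered_some:
            break  # shattering is monotone: no larger set is shattered either
        dim = k
    return dim

n, d = 6, 2
domain = list(range(n))
# All subsets of an n-point set with at most d elements: a standard maximum class.
cls = [frozenset(s) for k in range(d + 1) for s in combinations(domain, k)]

print(vc_dimension(cls, domain))                          # -> 2
print(len(cls) == sum(comb(n, i) for i in range(d + 1)))  # -> True (Sauer bound met)
```

Every d-point set is shattered (all its subsets have size at most d and so lie in the class), no (d+1)-point set is, and the cardinality is exactly the Sauer bound.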
to:NB  learning_theory  vc-dimension 
september 2013 by cshalizi
Structural Risk Minimization over Data-Dependent Hierarchies
"The paper introduces some generalizations of Vapnik’s method of structural risk minimisation (SRM). As well as making explicit some of the details on SRM, it provides a result that allows one to trade off errors on the training sample against improved generalization performance. It then considers the more general case when the hierarchy of classes is chosen in response to the data. A result is presented on the generalization performance of classifiers with a “large margin”. This theoretically explains the impressive generalization performance of the maximal margin hyperplane algorithm of Vapnik and co-workers (which is the basis for their support vector machines). The paper concludes with a more general result in terms of “luckiness” functions, which provides a quite general way for exploiting serendipitous simplicity in observed data to obtain better prediction accuracy from small training sets. Four examples are given of such functions, including the VC dimension measured on the sample."
in_NB  learning_theory  structural_risk_minimization  classifiers  vc-dimension  re:your_favorite_dsge_sucks  have_read  to:blog 
april 2013 by cshalizi
[1303.5976] On Learnability, Complexity and Stability
"We consider the fundamental question of learnability of a hypotheses class in the supervised learning setting and in the general learning setting introduced by Vladimir Vapnik. We survey classic results characterizing learnability in term of suitable notions of complexity, as well as more recent results that establish the connection between learnability and stability of a learning algorithm."
in_NB  learning_theory  machine_learning  stability_of_learning  vc-dimension 
march 2013 by cshalizi
On the Stability of Empirical Risk Minimization in the Presence of Multiple Risk Minimizers
"Recently, Kutin and Niyogi investigated several notions of algorithmic stability—a property of a learning map conceptually similar to continuity—showing that training stability is sufficient for consistency of empirical risk minimization (ERM) while distribution-free CV-stability is necessary and sufficient for having finite VC-dimension. This paper concerns a phase transition in the training stability of ERM, conjectured by the same authors. Kutin and Niyogi proved that ERM on finite hypothesis spaces containing a unique risk minimizer has training stability that scales exponentially with sample size, and conjectured that the existence of multiple risk minimizers prevents even super-quadratic convergence. We prove this result for the strictly weaker notion of CV-stability, positively resolving the conjecture."
in_NB  learning_theory  stability_of_learning  cross-validation  vc-dimension 
june 2012 by cshalizi
[1203.0193] Vapnik-Chervonenkis Dimension of Axis-Parallel Cuts
Huh: "The Vapnik-Chervonenkis dimension (VC dimension) of the set of half-spaces of R^d with frontiers parallel to the axes is computed exactly. It is shown that this VC dimension is smaller than the intuitive value of d. An additional approximation using Stirling's formula is given. This result may be used to evaluate the performance of classifiers or regressors based on dyadic partitioning of R^d for instance. Algorithms using axis-parallel cuts to partition R^d are often used to reduce the computational time of such estimators when d is large."
in_NB  learning_theory  vc-dimension  classifiers 
march 2012 by cshalizi
[1111.3404] Estimated VC dimension for risk bounds
"Vapnik-Chervonenkis (VC) dimension is a fundamental measure of the generalization capacity of learning algorithms. However, apart from a few special cases, it is hard or impossible to calculate analytically. Vapnik et al. [10] proposed a technique for estimating the VC dimension empirically. While their approach behaves well in simulations, it could not be used to bound the generalization risk of classifiers, because there were no bounds for the estimation error of the VC dimension itself. We rectify this omission, providing high probability concentration results for the proposed estimator and deriving corresponding generalization bounds."
self-promotion  learning_theory  vc-dimension  machine_learning  re:your_favorite_dsge_sucks 
november 2011 by cshalizi
Adams, Nobel: Uniform convergence of Vapnik–Chervonenkis classes under ergodic sampling
Oooh: "We show that if X is a complete separable metric space and C is a countable family of Borel subsets of X with finite VC dimension, then, for every stationary ergodic process with values in X, the relative frequencies of sets c \in C converge uniformly to their limiting probabilities. Beyond ergodicity, no assumptions are imposed on the sampling process, and no regularity conditions are imposed on the elements of C. The result extends existing work of Vapnik and Chervonenkis, among others, who have studied uniform convergence for i.i.d. and strongly mixing processes. Our method of proof is new and direct: it does not rely on symmetrization techniques, probability inequalities or mixing conditions. The uniform convergence of relative frequencies for VC-major and VC-graph classes of functions under ergodic sampling is established as a corollary of the basic result for sets."  No rates, but very nice.
ergodic_theory  learning_theory  stochastic_processes  vc-dimension  have_read  re:XV_for_mixing  re:your_favorite_dsge_sucks 
july 2010 by cshalizi
[1006.5090] PAC learnability of a concept class under non-atomic measures: a problem by Vidyasagar
Huh. When you put it that way, it totally makes sense. After all, the Adversary makes VC dimension large by exhibiting big sets which can still be shattered. But if all His counter-examples have measure zero under all non-atomic distributions, I can ignore them...
learning_theory  vc-dimension  measure_theory 
june 2010 by cshalizi
A Note on the Richness of the Convex Hulls of VC Classes
"We prove the existence of a class A of subsets of R^d of VC dimension 1 such that the symmetric convex hull F of the class of characteristic functions of sets in A is rich in the following sense. For any absolutely continuous probability measure μ on R^d, measurable set B and ε > 0, there exists a function f in F such that the measure of the symmetric difference of B and the set where f is positive is less than ε. The question was motivated by the investigation of the theoretical properties of certain algorithms in machine learning." --- I see it, but I don't believe it! (The proof would seem to extend to arbitrary complete separable metric spaces, not just R^d.)
learning_theory  density_estimation  statistics  have_read  vc-dimension  analysis  measure_theory 
march 2009 by cshalizi
