nhaliday + concentration-of-measure   76

multivariate analysis - Is it possible to have a pair of Gaussian random variables for which the joint distribution is not Gaussian? - Cross Validated
The bivariate normal distribution is the exception, not the rule!

It is important to recognize that "almost all" joint distributions with normal marginals are not the bivariate normal distribution. That is, the common viewpoint that joint distributions with normal marginals that are not the bivariate normal are somehow "pathological", is a bit misguided.

Certainly, the multivariate normal is extremely important due to its stability under linear transformations, and so receives the bulk of attention in applications.

note: there is a multivariate central limit theorem, so those such applications have no problem
nibble  q-n-a  overflow  stats  math  acm  probability  distribution  gotchas  intricacy  characterization  structure  composition-decomposition  counterexample  limits  concentration-of-measure 
october 2017 by nhaliday
Hoeffding’s Inequality
basic idea of standard pf: bound e^{tX} by line segment (convexity) then use Taylor expansion (in p = b/(b-a), the fraction of range to right of 0) of logarithm
pdf  lecture-notes  exposition  nibble  concentration-of-measure  estimate  proofs  ground-up  acm  probability  series  s:null 
february 2017 by nhaliday
st.statistics - Lower bound for sum of binomial coefficients? - MathOverflow
- basically approximate w/ geometric sum (which scales as final term) and you can get it up to O(1) factor
- not good enough for many applications (want 1+o(1) approx.)
- Stirling can also give bound to constant factor precision w/ more calculation I believe
- tighter bound at Section 7.3 here: http://webbuild.knu.ac.kr/~trj/Combin/matousek-vondrak-prob-ln.pdf
q-n-a  overflow  nibble  math  math.CO  estimate  tidbits  magnitude  concentration-of-measure  stirling  binomial  metabuch  tricki  multi  tightness  pdf  lecture-notes  exposition  probability  probabilistic-method  yoga 
february 2017 by nhaliday
Prékopa–Leindler inequality | Academically Interesting
Consider the following statements:
1. The shape with the largest volume enclosed by a given surface area is the n-dimensional sphere.
2. A marginal or sum of log-concave distributions is log-concave.
3. Any Lipschitz function of a standard n-dimensional Gaussian distribution concentrates around its mean.
What do these all have in common? Despite being fairly non-trivial and deep results, they all can be proved in less than half of a page using the Prékopa–Leindler inequality.

ie, Brunn-Minkowski
acmtariat  clever-rats  ratty  math  acm  geometry  measure  math.MG  estimate  distribution  concentration-of-measure  smoothness  regularity  org:bleg  nibble  brunn-minkowski  curvature  convexity-curvature 
february 2017 by nhaliday
Simultaneous confidence intervals for multinomial parameters, for small samples, many classes? - Cross Validated
- "Bonferroni approach" is just union bound
- so Pr(|hat p_i - p_i| > ε for any i) <= 2k e^{-ε^2 n} = δ
- ε = sqrt(ln(2k/δ)/n)
- Bonferroni approach should work for case of any dependent Bernoulli r.v.s
q-n-a  overflow  stats  moments  distribution  acm  hypothesis-testing  nibble  confidence  concentration-of-measure  bonferroni  parametric  synchrony 
february 2017 by nhaliday
probability - Variance of maximum of Gaussian random variables - Cross Validated
In full generality it is rather hard to find the right order of magnitude of the variance of a Gaussien supremum since the tools from concentration theory are always suboptimal for the maximum function.

order ~ 1/log n
q-n-a  overflow  stats  probability  acm  orders  tails  bias-variance  moments  concentration-of-measure  magnitude  tidbits  distribution  yoga  structure  extrema  nibble 
february 2017 by nhaliday
bounds - What is the variance of the maximum of a sample? - Cross Validated
- sum of variances is always a bound
- can't do better even for iid Bernoulli
- looks like nice argument from well-known probabilist (using E[(X-Y)^2] = 2Var X), but not clear to me how he gets to sum_i instead of sum_{i,j} in the union bound?
edit: argument is that, for j = argmax_k Y_k, we have r < X_i - Y_j <= X_i - Y_i for all i, including i = argmax_k X_k
- different proof here (later pages): http://www.ism.ac.jp/editsec/aism/pdf/047_1_0185.pdf
Var(X_n:n) <= sum Var(X_k:n) + 2 sum_{i < j} Cov(X_i:n, X_j:n) = Var(sum X_k:n) = Var(sum X_k) = nσ^2
why are the covariances nonnegative? (are they?). intuitively seems true.
- for that, see https://pinboard.in/u:nhaliday/b:ed4466204bb1
- note that this proof shows more generally that sum Var(X_k:n) <= sum Var(X_k)
- apparently that holds for dependent X_k too? http://mathoverflow.net/a/96943/20644
q-n-a  overflow  stats  acm  distribution  tails  bias-variance  moments  estimate  magnitude  probability  iidness  tidbits  concentration-of-measure  multi  orders  levers  extrema  nibble  bonferroni  coarse-fine  expert  symmetry  s:*  expert-experience  proofs 
february 2017 by nhaliday
254A, Supplement 4: Probabilistic models and heuristics for the primes (optional) | What's new
among others, the Cramér model for the primes (basically kinda looks like primality is independently distributed w/ Pr[n is prime] = 1/log n)
gowers  mathtariat  lecture-notes  exposition  math  math.NT  probability  heuristic  models  cartoons  nibble  org:bleg  pseudorandomness  borel-cantelli  concentration-of-measure  multiplicative 
february 2017 by nhaliday
6.896: Essential Coding Theory
- probabilistic method and Chernoff bound for Shannon coding
- probabilistic method for asymptotically good Hamming codes (Gilbert coding)
- sparsity used for LDPC codes
mit  course  yoga  tcs  complexity  coding-theory  math.AG  fields  polynomials  pigeonhole-markov  linear-algebra  probabilistic-method  lecture-notes  bits  sparsity  concentration-of-measure  linear-programming  linearity  expanders  hamming  pseudorandomness  crypto  rigorous-crypto  communication-complexity  no-go  madhu-sudan  shannon  unit  p:**  quixotic  advanced 
february 2017 by nhaliday
Dvoretzky's theorem - Wikipedia
In mathematics, Dvoretzky's theorem is an important structural theorem about normed vector spaces proved by Aryeh Dvoretzky in the early 1960s, answering a question of Alexander Grothendieck. In essence, it says that every sufficiently high-dimensional normed vector space will have low-dimensional subspaces that are approximately Euclidean. Equivalently, every high-dimensional bounded symmetric convex set has low-dimensional sections that are approximately ellipsoids.

math  math.FA  inner-product  levers  characterization  geometry  math.MG  concentration-of-measure  multi  q-n-a  overflow  intuition  examples  proofs  dimensionality  gowers  mathtariat  tcstariat  quantum  quantum-info  norms  nibble  high-dimension  wiki  reference  curvature  convexity-curvature  tcs 
january 2017 by nhaliday
reference request - The coupon collector's earworm - MathOverflow
I have a playlist with, say, N pieces of music. While using the shuffle option (each such piece is played randomly at each step), I realized that, generally speaking, I have to hear quite a lot of times the same piece before the last one appears. It makes me think of the following question:

At the moment the last non already heard piece is played, what is the max, in average, of number of times the same piece has already been played?

A: e log N + o(log N)
q-n-a  overflow  math  math.CO  tidbits  puzzles  probability  magnitude  oly  nibble  concentration-of-measure  binomial 
january 2017 by nhaliday
The infinitesimal model | bioRxiv
Our focus here is on the infinitesimal model. In this model, one or several quantitative traits are described as the sum of a genetic and a non-genetic component, the first being distributed as a normal random variable centred at the average of the parental genetic components, and with a variance independent of the parental traits. We first review the long history of the infinitesimal model in quantitative genetics. Then we provide a definition of the model at the phenotypic level in terms of individual trait values and relationships between individuals, but including different evolutionary processes: genetic drift, recombination, selection, mutation, population structure, ... We give a range of examples of its application to evolutionary questions related to stabilising selection, assortative mating, effective population size and response to selection, habitat preference and speciation. We provide a mathematical justification of the model as the limit as the number M of underlying loci tends to infinity of a model with Mendelian inheritance, mutation and environmental noise, when the genetic component of the trait is purely additive. We also show how the model generalises to include epistatic effects. In each case, by conditioning on the pedigree relating individuals in the population, we incorporate arbitrary selection and population structure. We suppose that we can observe the pedigree up to the present generation, together with all the ancestral traits, and we show, in particular, that the genetic components of the individual trait values in the current generation are indeed normally distributed with a variance independent of ancestral traits, up to an error of order M^{-1/2}. Simulations suggest that in particular cases the convergence may be as fast as 1/M.

published version:
The infinitesimal model: Definition, derivation, and implications: https://sci-hub.tw/10.1016/j.tpb.2017.06.001

Commentary: Fisher’s infinitesimal model: A story for the ages: http://www.sciencedirect.com/science/article/pii/S0040580917301508?via%3Dihub
This commentary distinguishes three nested approximations, referred to as “infinitesimal genetics,” “Gaussian descendants” and “Gaussian population,” each plausibly called “the infinitesimal model.” The first and most basic is Fisher’s “infinitesimal” approximation of the underlying genetics – namely, many loci, each making a small contribution to the total variance. As Barton et al. (2017) show, in the limit as the number of loci increases (with enough additivity), the distribution of genotypic values for descendants approaches a multivariate Gaussian, whose variance–covariance structure depends only on the relatedness, not the phenotypes, of the parents (or whether their population experiences selection or other processes such as mutation and migration). Barton et al. (2017) call this rigorously defensible “Gaussian descendants” approximation “the infinitesimal model.” However, it is widely assumed that Fisher’s genetic assumptions yield another Gaussian approximation, in which the distribution of breeding values in a population follows a Gaussian — even if the population is subject to non-Gaussian selection. This third “Gaussian population” approximation, is also described as the “infinitesimal model.” Unlike the “Gaussian descendants” approximation, this third approximation cannot be rigorously justified, except in a weak-selection limit, even for a purely additive model. Nevertheless, it underlies the two most widely used descriptions of selection-induced changes in trait means and genetic variances, the “breeder’s equation” and the “Bulmer effect.” Future generations may understand why the “infinitesimal model” provides such useful approximations in the face of epistasis, linkage, linkage disequilibrium and strong selection.
study  exposition  bio  evolution  population-genetics  genetics  methodology  QTL  preprint  models  unit  len:long  nibble  linearity  nonlinearity  concentration-of-measure  limits  applications  🌞  biodet  oscillation  fisher  perturbation  stylized-facts  chart  ideas  article  pop-structure  multi  pdf  piracy  intricacy  map-territory  kinship  distribution  simulation  ground-up  linear-models  applicability-prereqs  bioinformatics 
january 2017 by nhaliday
gt.geometric topology - Intuitive crutches for higher dimensional thinking - MathOverflow
Terry Tao:
I can't help you much with high-dimensional topology - it's not my field, and I've not picked up the various tricks topologists use to get a grip on the subject - but when dealing with the geometry of high-dimensional (or infinite-dimensional) vector spaces such as R^n, there are plenty of ways to conceptualise these spaces that do not require visualising more than three dimensions directly.

For instance, one can view a high-dimensional vector space as a state space for a system with many degrees of freedom. A megapixel image, for instance, is a point in a million-dimensional vector space; by varying the image, one can explore the space, and various subsets of this space correspond to various classes of images.

One can similarly interpret sound waves, a box of gases, an ecosystem, a voting population, a stream of digital data, trials of random variables, the results of a statistical survey, a probabilistic strategy in a two-player game, and many other concrete objects as states in a high-dimensional vector space, and various basic concepts such as convexity, distance, linearity, change of variables, orthogonality, or inner product can have very natural meanings in some of these models (though not in all).

It can take a bit of both theory and practice to merge one's intuition for these things with one's spatial intuition for vectors and vector spaces, but it can be done eventually (much as after one has enough exposure to measure theory, one can start merging one's intuition regarding cardinality, mass, length, volume, probability, cost, charge, and any number of other "real-life" measures).

For instance, the fact that most of the mass of a unit ball in high dimensions lurks near the boundary of the ball can be interpreted as a manifestation of the law of large numbers, using the interpretation of a high-dimensional vector space as the state space for a large number of trials of a random variable.

More generally, many facts about low-dimensional projections or slices of high-dimensional objects can be viewed from a probabilistic, statistical, or signal processing perspective.

Scott Aaronson:
Here are some of the crutches I've relied on. (Admittedly, my crutches are probably much more useful for theoretical computer science, combinatorics, and probability than they are for geometry, topology, or physics. On a related note, I personally have a much easier time thinking about R^n than about, say, R^4 or R^5!)

1. If you're trying to visualize some 4D phenomenon P, first think of a related 3D phenomenon P', and then imagine yourself as a 2D being who's trying to visualize P'. The advantage is that, unlike with the 4D vs. 3D case, you yourself can easily switch between the 3D and 2D perspectives, and can therefore get a sense of exactly what information is being lost when you drop a dimension. (You could call this the "Flatland trick," after the most famous literary work to rely on it.)
2. As someone else mentioned, discretize! Instead of thinking about R^n, think about the Boolean hypercube {0,1}^n, which is finite and usually easier to get intuition about. (When working on problems, I often find myself drawing {0,1}^4 on a sheet of paper by drawing two copies of {0,1}^3 and then connecting the corresponding vertices.)
3. Instead of thinking about a subset S⊆R^n, think about its characteristic function f:R^n→{0,1}. I don't know why that trivial perspective switch makes such a big difference, but it does ... maybe because it shifts your attention to the process of computing f, and makes you forget about the hopeless task of visualizing S!
4. One of the central facts about R^n is that, while it has "room" for only n orthogonal vectors, it has room for exp⁡(n) almost-orthogonal vectors. Internalize that one fact, and so many other properties of R^n (for example, that the n-sphere resembles a "ball with spikes sticking out," as someone mentioned before) will suddenly seem non-mysterious. In turn, one way to internalize the fact that R^n has so many almost-orthogonal vectors is to internalize Shannon's theorem that there exist good error-correcting codes.
5. To get a feel for some high-dimensional object, ask questions about the behavior of a process that takes place on that object. For example: if I drop a ball here, which local minimum will it settle into? How long does this random walk on {0,1}^n take to mix?

Gil Kalai:
This is a slightly different point, but Vitali Milman, who works in high-dimensional convexity, likes to draw high-dimensional convex bodies in a non-convex way. This is to convey the point that if you take the convex hull of a few points on the unit sphere of R^n, then for large n very little of the measure of the convex body is anywhere near the corners, so in a certain sense the body is a bit like a small sphere with long thin "spikes".
q-n-a  intuition  math  visual-understanding  list  discussion  thurston  tidbits  aaronson  tcs  geometry  problem-solving  yoga  👳  big-list  metabuch  tcstariat  gowers  mathtariat  acm  overflow  soft-question  levers  dimensionality  hi-order-bits  insight  synthesis  thinking  models  cartoons  coding-theory  information-theory  probability  concentration-of-measure  magnitude  linear-algebra  boolean-analysis  analogy  arrows  lifts-projections  measure  markov  sampling  shannon  conceptual-vocab  nibble  degrees-of-freedom  worrydream  neurons  retrofit  oscillation  paradox  novelty  tricki  concrete  high-dimension  s:***  manifolds  direction  curvature  convexity-curvature  elegance 
december 2016 by nhaliday
Bounds on the Expectation of the Maximum of Samples from a Gaussian
σ/sqrt(pi log 2) sqrt(log n) <= E[Y] <= σ sqrt(2) sqrt(log n)

upper bound pf: Jensen's inequality+mgf+union bound+choose optimal t (Chernoff bound basically)
lower bound pf: more ad-hoc (and difficult)
pdf  tidbits  math  probability  concentration-of-measure  estimate  acm  tails  distribution  calculation  iidness  orders  magnitude  extrema  tightness  outliers  expectancy  proofs  elegance 
october 2016 by nhaliday
Talagrand’s concentration inequality | What's new
Proposition 1 follows easily from the following statement, that asserts that if a convex set {A \subset {\bf R}^n} occupies a non-trivial fraction of the cube {\{-1,+1\}^n}, then the neighbourhood {A_t := \{ x \in {\bf R}^n: \hbox{dist}(x,A) \leq t \}} will occupy almost all of the cube for {t \gg 1}:
exposition  math.CA  math  gowers  concentration-of-measure  mathtariat  random-matrices  levers  estimate  probability  math.MG  geometry  boolean-analysis  nibble  org:bleg  high-dimension  p:whenever  dimensionality  curvature  convexity-curvature 
may 2016 by nhaliday

bundles : abstractacademeacmpatternstcs

related tags

aaronson  academia  accretion  acm  acmtariat  additive  additive-combo  advanced  adversarial  alg-combo  algebraic-complexity  algorithms  alignment  allodium  analogy  analysis  aphorism  applicability-prereqs  applications  approximation  arrows  article  axelrod  backup  behavioral-gen  benchmarks  best-practices  better-explained  bias-variance  big-list  big-picture  binomial  bio  biodet  bioinformatics  bits  boltzmann  bonferroni  books  boolean-analysis  borel-cantelli  bounded-cognition  brunn-minkowski  calculation  canon  cartoons  chaining  characterization  chart  cheatsheet  christianity  classic  clever-rats  cmu  coarse-fine  coding-theory  commentary  communication-complexity  comparison  competition  complex-systems  complexity  composition-decomposition  compressed-sensing  concentration-of-measure  concept  conceptual-vocab  concrete  conference  confidence  conquest-empire  contrarianism  convergence  convexity-curvature  cooperate-defect  coordination  counterexample  counting  course  cracker-econ  crypto  curiosity  curvature  cycles  data  data-science  dataviz  debate  debt  decision-theory  definition  degrees-of-freedom  density  dependence-independence  dimensionality  direction  discrete  discussion  distribution  draft  duality  ecology  economics  econotariat  education  egalitarianism-hierarchy  EGT  elegance  embeddings  encyclopedic  engineering  enhancement  entropy-like  equilibrium  error  essay  estimate  events  evolution  examples  existence  expanders  expansionism  expectancy  expert  expert-experience  explanation  exposition  extrema  faq  farmers-and-foragers  features  fedja  fermi  fields  finance  fisher  flux-stasis  fourier  frontier  futurism  game-theory  garett-jones  gaussian-processes  genetics  genomics  geometry  giants  gotchas  gowers  gradient-descent  graph-theory  graphs  gray-econ  ground-up  growth-econ  GT-101  GWAS  gwern  hamming  hanson  hashing  heuristic  hi-order-bits  high-dimension  history  hmm  homogeneity  housing  hsu  huge-data-the-biggest  hypothesis-testing  icml  ideas  identity  IEEE  iidness  info-dynamics  info-econ  information-theory  init  inner-product  innovation  insight  integral  interdisciplinary  intersection  intersection-connectedness  intricacy  intuition  iq  is-ought  ising  isotropy  iteration-recursion  jargon  journos-pundits  kinship  knowledge  learning  learning-theory  lecture-notes  left-wing  len:long  lens  levers  lifts-projections  limits  linear-algebra  linear-models  linear-programming  linearity  liner-notes  links  list  literature  local-global  long-short-run  machine-learning  macro  madhu-sudan  magnitude  malthus  manifolds  map-territory  marginal-rev  market-failure  markets  markov  martingale  math  math.AG  math.AT  math.CA  math.CO  math.CV  math.FA  math.MG  math.NT  mathtariat  matrix-factorization  measure  mental-math  metabuch  metameta  methodology  metric-space  metrics  migration  mihai  mit  models  moloch  moments  monte-carlo  motivation  multi  multiplicative  neurons  nibble  nitty-gritty  no-go  nonlinearity  nonparametric  norms  novelty  objektbuch  oly  online-learning  optimization  orders  ORFE  org:bleg  org:com  org:edu  org:mat  orourke  oscillation  outcome-risk  outliers  overflow  p:*  p:**  p:***  p:whenever  PAC  papers  paradox  parametric  pdf  percolation  performance  perturbation  phase-transition  physics  pic  pigeonhole-markov  piracy  poetry  polynomials  pop-structure  population-genetics  postmortem  power-law  prediction  preprint  presentation  princeton  probabilistic-method  probability  problem-solving  project  proofs  property-rights  pseudorandomness  publishing  puzzles  q-n-a  qra  QTL  quantifiers-sums  quantum  quantum-info  quixotic  quora  quotes  rand-approx  random  random-matrices  ratty  recruiting  reference  reflection  regularity  regularization  regularizer  regulation  relaxation  religion  retention  retrofit  rhetoric  rigorous-crypto  risk  robust  rounding  s:*  s:**  s:***  s:null  sampling  sanjeev-arora  scaling-up  science  scitariat  SDP  sebastien-bubeck  selection  separation  series  shannon  signal-noise  simulation  singularity  skeleton  slides  smoothness  social  soft-question  space  sparsity  spatial  spearhead  spectral  speculation  speed  spreading  stanford  stat-mech  stats  status  stirling  stochastic-processes  stories  strategy  stream  street-fighting  structure  study  stylized-facts  sublinear  submodular  summary  survey  symmetry  synchrony  synthesis  tails  talks  tcs  tcstariat  technology  techtariat  tensors  the-basilisk  the-prices  the-trenches  theos  thinking  thurston  tidbits  tightness  tim-roughgarden  time  time-complexity  toolkit  top-n  topics  topology  tricki  tricks  truth  twitter  uncertainty  uniqueness  unit  usa  valiant  values  video  visual-understanding  vitality  volo-avolo  von-neumann  wiki  wire-guided  wisdom  wormholes  worrydream  yoga  zooming  🌞  👳  🔬 

Copy this bookmark: