nhaliday + random-matrices   6

Information Processing: Search results for compressed sensing
https://www.unz.com/jthompson/the-hsu-boundary/
http://infoproc.blogspot.com/2017/09/phase-transitions-and-genomic.html
Donoho-Student says:
September 14, 2017 at 8:27 pm GMT • 100 Words

The Donoho-Tanner transition describes the noise-free (h2=1) case, which has a direct analog in the geometry of polytopes.

The n = 30s result from Hsu et al. (specifically the value of the coefficient, 30, when p is the appropriate number of SNPs on an array and h2 = 0.5) is obtained via simulation using actual genome matrices, and is original to them. (There is no simple formula that gives this number.) The D-T transition had only been established in the past for certain classes of matrices, like random matrices with specific distributions. Those results cannot be immediately applied to genomes.

The estimate that s is (order of magnitude) 10k is also a key input.

I think Hsu refers to n = 1 million instead of 30 * 10k = 300k because the effective SNP heritability of IQ might be less than h2 = 0.5: there is noise in the phenotype measurement, etc.
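The arithmetic in the comment above can be sketched in a few lines. This is only an illustration of the scaling n = 30 * s as quoted for h2 = 0.5, with s read as the number of nonzero effects; the function names are hypothetical and are not taken from the Hsu et al. paper.

```python
def samples_needed(s, coefficient=30):
    """Sample size n = coefficient * s (coefficient 30 is the value quoted for h2 = 0.5)."""
    return coefficient * s


def implied_coefficient(n, s):
    """Coefficient implied by a given sample size n and sparsity s."""
    return n / s


s = 10_000  # order-of-magnitude estimate of s from the comment
print(samples_needed(s))                  # 30 * 10k
print(implied_coefficient(1_000_000, s))  # ratio n / s if n = 1 million
```

On these numbers, n = 1 million corresponds to a coefficient of 100 rather than 30, which fits the commenter's guess that the effective heritability is below 0.5.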

Donoho-Student says:
September 15, 2017 at 11:27 am GMT • 200 Words

Lasso is a common statistical method but most people who use it are not familiar with the mathematical theorems from compressed sensing. These results give performance guarantees and describe phase transition behavior, but because they are rigorous theorems they only apply to specific classes of sensor matrices, such as simple random matrices. Genomes have correlation structure, so the theorems do not directly apply to the real world case of interest, as is often true.

What the Hsu paper shows is that the exact D-T phase transition appears in the noiseless (h2 = 1) problem using genome matrices, and a smoothed version appears in the problem with realistic h2. These are new results, as is the prediction for how much data is required to cross the boundary. I don’t think most GWAS people are familiar with these results. If they did understand the results, they would fund and design adequately powered studies capable of solving lots of complex phenotypes with significant h2, medical conditions as well as IQ.

Most people who use lasso, as opposed to people who prove theorems, are not even aware of the D-T transition. Even most people who prove theorems have followed the Candes-Tao line of attack (restricted isometry property) and don’t think much about D-T. Although Donoho eventually proved some things about the phase transition using high dimensional geometry, it was initially discovered via simulation using simple random matrices.
hsu  list  stream  genomics  genetics  concept  stats  methodology  scaling-up  scitariat  sparsity  regression  biodet  bioinformatics  norms  nibble  compressed-sensing  applications  search  ideas  multi  albion  behavioral-gen  iq  state-of-art  commentary  explanation  phase-transition  measurement  volo-avolo  regularization  levers  novelty  the-trenches  liner-notes  clarity  random-matrices  innovation  high-dimension  linear-models
november 2016 by nhaliday
Talagrand’s concentration inequality | What's new
Proposition 1 follows easily from the following statement, that asserts that if a convex set $A \subset \mathbf{R}^n$ occupies a non-trivial fraction of the cube $\{-1,+1\}^n$, then the neighbourhood $A_t := \{ x \in \mathbf{R}^n : \operatorname{dist}(x,A) \leq t \}$ will occupy almost all of the cube for $t \gg 1$:
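A quick numerical check of this phenomenon, as a minimal sketch for one particular convex set rather than anything from the post itself: take A to be the half-space of points whose coordinates sum to at most 0, for which the distance from x to A is max(0, sum(x)) / sqrt(n). The choice of set, the function names, and the parameters n = 100, t = 3 are all assumptions made for illustration.

```python
import math
import random


def dist_to_halfspace(x):
    """Euclidean distance from x to A = {y in R^n : sum(y) <= 0}."""
    return max(0.0, sum(x)) / math.sqrt(len(x))


def cube_fractions(n, t, trials, seed=0):
    """Monte Carlo estimate of the fraction of {-1,+1}^n lying in A and in A_t."""
    rng = random.Random(seed)
    in_a = 0
    in_a_t = 0
    for _ in range(trials):
        x = [rng.choice((-1, 1)) for _ in range(n)]
        d = dist_to_halfspace(x)
        in_a += d == 0.0   # x lies in A itself
        in_a_t += d <= t   # x lies within distance t of A
    return in_a / trials, in_a_t / trials


frac_a, frac_a_t = cube_fractions(n=100, t=3.0, trials=2000)
print(frac_a, frac_a_t)
```

Here A holds roughly half of the cube, while enlarging it by a distance t = 3 that does not grow with n already captures nearly all of the sampled vertices.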
exposition  math.CA  math  gowers  concentration-of-measure  mathtariat  random-matrices  levers  estimate  probability  math.MG  geometry  boolean-analysis  nibble  org:bleg  high-dimension  p:whenever  dimensionality  curvature  convexity-curvature
may 2016 by nhaliday