nhaliday + random-matrices   6

Information Processing: Search results for compressed sensing
https://www.unz.com/jthompson/the-hsu-boundary/
http://infoproc.blogspot.com/2017/09/phase-transitions-and-genomic.html
Added: Here are comments from "Donoho-Student":
Donoho-Student says:
September 14, 2017 at 8:27 pm GMT • 100 Words

The Donoho-Tanner transition describes the noise-free (h2=1) case, which has a direct analog in the geometry of polytopes.

The n = 30s result from Hsu et al. (specifically the value of the coefficient, 30, when p is the appropriate number of SNPs on an array and h2 = 0.5) is obtained via simulation using actual genome matrices, and is original to them. (There is no simple formula that gives this number.) The D-T transition had only been established in the past for certain classes of matrices, like random matrices with specific distributions. Those results cannot be immediately applied to genomes.

The estimate that s is (order of magnitude) 10k is also a key input.

I think Hsu refers to n = 1 million instead of 30 * 10k = 300k because the effective SNP heritability of IQ might be less than h2 = 0.5 — there is noise in the phenotype measurement, etc.
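
As a back-of-envelope check on those numbers (a sketch only: the comment gives no formula for how the required n scales with h2, so the 1/h2 scaling below is an assumption, and h2_effective = 0.15 is a made-up value):

```python
# Rough sample-size arithmetic for the figures quoted above (illustrative only).
s = 10_000            # order-of-magnitude estimate of the number of causal SNPs
coeff = 30            # coefficient from Hsu et al.'s simulations at h2 = 0.5
n_at_half = coeff * s
print(f"n at h2 = 0.5: {n_at_half:,}")          # 300,000

# Lower effective heritability (noisy phenotype measurement, etc.) raises the
# requirement; treating n as scaling roughly like 1/h2 is an assumption here.
h2_effective = 0.15   # hypothetical value, not from the source
n_rough = n_at_half * 0.5 / h2_effective
print(f"rough n at h2 = {h2_effective}: {n_rough:,.0f}")   # ~1,000,000
```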

Donoho-Student says:
September 15, 2017 at 11:27 am GMT • 200 Words

Lasso is a common statistical method but most people who use it are not familiar with the mathematical theorems from compressed sensing. These results give performance guarantees and describe phase transition behavior, but because they are rigorous theorems they only apply to specific classes of sensing matrices, such as simple random matrices. Genomes have correlation structure, so the theorems do not directly apply to the real-world case of interest, as is often true.

What the Hsu paper shows is that the exact D-T phase transition appears in the noiseless (h2 = 1) problem using genome matrices, and a smoothed version appears in the problem with realistic h2. These are new results, as is the prediction for how much data is required to cross the boundary. I don’t think most GWAS people are familiar with these results. If they understood them, they would fund and design adequately powered studies capable of solving many complex phenotypes with significant h2, medical conditions as well as IQ.

Most people who use lasso, as opposed to people who prove theorems, are not even aware of the D-T transition. Even most people who prove theorems have followed the Candès-Tao line of attack (restricted isometry property) and don’t think much about D-T. Although Donoho eventually proved some things about the phase transition using high-dimensional geometry, it was initially discovered via simulation using simple random matrices.
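
To make that last point concrete, here is a minimal sketch of the kind of simulation described (not from the Hsu paper or the comment: it uses a plain Gaussian sensing matrix and scikit-learn's Lasso, and p, s, alpha, and the recovery threshold are arbitrary illustrative choices):

```python
# Donoho-Tanner-style recovery experiment with a simple random matrix (sketch).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
p = 200                               # number of predictors ("SNPs" in the analogy)

def recovery_rate(n, s, trials=20):
    """Fraction of trials in which lasso recovers the support of an s-sparse
    signal from n noiseless measurements y = A x (the h2 = 1 case)."""
    hits = 0
    for _ in range(trials):
        A = rng.standard_normal((n, p)) / np.sqrt(n)   # simple Gaussian sensing matrix
        x = np.zeros(p)
        support = rng.choice(p, size=s, replace=False)
        x[support] = rng.standard_normal(s)
        y = A @ x                                      # no noise
        beta = Lasso(alpha=1e-3, fit_intercept=False, max_iter=50_000).fit(A, y).coef_
        if np.array_equal(np.flatnonzero(np.abs(beta) > 1e-2), np.sort(support)):
            hits += 1
    return hits / trials

# Sweeping n at fixed sparsity shows the jump from near-certain failure to
# near-certain success; adding noise to y smooths the transition out.
for n in (30, 50, 70, 90, 120):
    print(n, recovery_rate(n, s=10))
```

Genome matrices have correlation structure that Gaussian matrices lack, which is why the comment stresses that the Hsu result had to be established by simulation on actual genotype data.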
hsu  list  stream  genomics  genetics  concept  stats  methodology  scaling-up  scitariat  sparsity  regression  biodet  bioinformatics  norms  nibble  compressed-sensing  applications  search  ideas  multi  albion  behavioral-gen  iq  state-of-art  commentary  explanation  phase-transition  measurement  volo-avolo  regularization  levers  novelty  the-trenches  liner-notes  clarity  random-matrices  innovation  high-dimension  linear-models 
november 2016 by nhaliday
Talagrand’s concentration inequality | What's new
Proposition 1 follows easily from the following statement, that asserts that if a convex set $A \subset \mathbf{R}^n$ occupies a non-trivial fraction of the cube $\{-1,+1\}^n$, then the neighbourhood $A_t := \{ x \in \mathbf{R}^n : \operatorname{dist}(x,A) \leq t \}$ will occupy almost all of the cube for $t \gg 1$.
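
The quantitative form behind that qualitative statement (paraphrased here from memory, with unspecified absolute constants $C, c > 0$; the exact constants in Tao's post may differ) is of the shape

$$\mu(A)\,\bigl(1 - \mu(A_t)\bigr) \le C e^{-c t^2}, \qquad A_t := \{\, x \in \mathbf{R}^n : \operatorname{dist}(x,A) \le t \,\},$$

where $\mu$ is the uniform measure on $\{-1,+1\}^n$. So if $\mu(A)$ is bounded away from zero, $1 - \mu(A_t)$ decays like $e^{-c t^2}$; in particular $A_t$ covers almost all of the cube once $t \gg 1$.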
exposition  math.CA  math  gowers  concentration-of-measure  mathtariat  random-matrices  levers  estimate  probability  math.MG  geometry  boolean-analysis  nibble  org:bleg  high-dimension  p:whenever  dimensionality  curvature  convexity-curvature 
may 2016 by nhaliday

