nhaliday + sparsity   27

Accurate Genomic Prediction Of Human Height | bioRxiv
Stephen Hsu's compressed sensing application paper

We construct genomic predictors for heritable and extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high dimensional statistics (i.e., machine learning). Replication tests show that these predictors capture, respectively, ~40, 20, and 9 percent of total variance for the three traits. For example, predicted heights correlate ~0.65 with actual height; actual heights of most individuals in validation samples are within a few cm of the prediction.


I'm in Mountain View to give a talk at 23andMe. Their latest funding round was $250M on a (reported) valuation of $1.5B. If I just add up the Crunchbase numbers it looks like almost half a billion invested at this point...

Slides: Genomic Prediction of Complex Traits

Here's how people + robots handle your spit sample to produce a SNP genotype:

study  bio  preprint  GWAS  state-of-art  embodied  genetics  genomics  compressed-sensing  high-dimension  machine-learning  missing-heritability  hsu  scitariat  education  🌞  frontier  britain  regression  data  visualization  correlation  phase-transition  multi  commentary  summary  pdf  slides  brands  skunkworks  hard-tech  presentation  talks  methodology  intricacy  bioinformatics  scaling-up  stat-power  sparsity  norms  nibble  speedometer  stats  linear-models  2017  biodet 
september 2017 by nhaliday
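The abstract's numbers can be cross-checked with one line of arithmetic: for a linear predictor, the fraction of variance captured is the squared correlation with the measured trait. A minimal sanity check (the 0.65 figure is the one quoted above):

```python
# Variance captured by a linear predictor = squared correlation with the trait.
r_height = 0.65                      # reported predictor/actual correlation
variance_captured = r_height ** 2
print(round(variance_captured, 2))   # ~0.42, consistent with "~40 percent"
```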
6.896: Essential Coding Theory
- probabilistic method and Chernoff bound for Shannon coding
- probabilistic method for asymptotically good codes in Hamming distance (the Gilbert-Varshamov bound)
- sparsity used for LDPC codes
mit  course  yoga  tcs  complexity  coding-theory  math.AG  fields  polynomials  pigeonhole-markov  linear-algebra  probabilistic-method  lecture-notes  bits  sparsity  concentration-of-measure  linear-programming  linearity  expanders  hamming  pseudorandomness  crypto  rigorous-crypto  communication-complexity  no-go  madhu-sudan  shannon  unit  p:**  quixotic 
february 2017 by nhaliday
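The coding-theory bullets above can be made concrete with the smallest classical example; a minimal sketch of the [7,4] Hamming code, which corrects any single bit flip (this particular generator/parity-check pair is one standard textbook choice, not taken from the course notes):

```python
# [7,4] Hamming code over GF(2): 4 data bits + 3 parity bits, corrects 1 error.
import numpy as np

# Systematic form: G = [I | P], H = [P^T | I], so H @ G^T = 0 (mod 2).
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])

def encode(data4):
    # Multiply data bits by the generator matrix over GF(2).
    return (np.array(data4) @ G) % 2

def decode(word7):
    # The syndrome of a single-bit error equals the corresponding column of H.
    word7 = np.array(word7)
    syndrome = (H @ word7) % 2
    if syndrome.any():
        err = next(i for i in range(7) if np.array_equal(H[:, i], syndrome))
        word7 = word7.copy()
        word7[err] ^= 1                      # flip the corrupted bit back
    return word7[:4]                         # data bits sit in the first 4 slots
```

All seven columns of H are distinct and nonzero, so every single-bit error has a unique syndrome; that is the whole decoding argument.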
CS 731 Advanced Artificial Intelligence - Spring 2011
- statistical machine learning
- sparsity in regression
- graphical models
- exponential families
- variational methods
- dimensionality reduction, eg, PCA
- Bayesian nonparametrics
- compressive sensing, matrix completion, and Johnson-Lindenstrauss
course  lecture-notes  yoga  acm  stats  machine-learning  graphical-models  graphs  model-class  bayesian  learning-theory  sparsity  embeddings  markov  monte-carlo  norms  unit  nonparametric  compressed-sensing  matrix-factorization  features 
january 2017 by nhaliday
Information Processing: Search results for compressed sensing
Added: Here are comments from "Donoho-Student":
Donoho-Student says:
September 14, 2017 at 8:27 pm GMT • 100 Words

The Donoho-Tanner transition describes the noise-free (h2=1) case, which has a direct analog in the geometry of polytopes.

The n = 30s result from Hsu et al. (specifically the value of the coefficient, 30, when p is the appropriate number of SNPs on an array and h2 = 0.5) is obtained via simulation using actual genome matrices, and is original to them. (There is no simple formula that gives this number.) The D-T transition had only been established in the past for certain classes of matrices, like random matrices with specific distributions. Those results cannot be immediately applied to genomes.

The estimate that s is (order of magnitude) 10k is also a key input.

I think Hsu refers to n = 1 million instead of 30 * 10k = 300k because the effective SNP heritability of IQ might be less than h2 = 0.5 — there is noise in the phenotype measurement, etc.

Donoho-Student says:
September 15, 2017 at 11:27 am GMT • 200 Words

Lasso is a common statistical method, but most people who use it are not familiar with the mathematical theorems from compressed sensing. These results give performance guarantees and describe phase-transition behavior, but because they are rigorous theorems they apply only to specific classes of sensor matrices, such as simple random matrices. Genomes have correlation structure, so the theorems do not directly apply to the real-world case of interest, as is often the case.

What the Hsu paper shows is that the exact D-T phase transition appears in the noiseless (h2 = 1) problem using genome matrices, and a smoothed version appears in the problem with realistic h2. These are new results, as is the prediction for how much data is required to cross the boundary. I don’t think most GWAS people are familiar with these results. If they understood them, they would fund/design adequately powered studies capable of solving many complex phenotypes with significant h2, medical conditions as well as IQ.

Most people who use lasso, as opposed to people who prove theorems, are not even aware of the D-T transition. Even most people who prove theorems have followed the Candes-Tao line of attack (restricted isometry property) and don’t think much about D-T. Although Donoho eventually proved some things about the phase transition using high-dimensional geometry, it was initially discovered via simulation using simple random matrices.
hsu  list  stream  genomics  genetics  concept  stats  methodology  scaling-up  scitariat  sparsity  regression  biodet  bioinformatics  norms  nibble  compressed-sensing  applications  search  ideas  multi  albion  behavioral-gen  iq  state-of-art  commentary  explanation  phase-transition  measurement  volo-avolo  regularization  levers  novelty  the-trenches  liner-notes  clarity  random-matrices  innovation  high-dimension  linear-models 
november 2016 by nhaliday
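The phase-transition discussion above can be reproduced in miniature; a toy noiseless compressed-sensing run, assuming a simple Gaussian sensing matrix (the easy case the comments say the theorems cover, not a genome matrix) and iterative soft-thresholding (ISTA) as a stand-in lasso solver:

```python
# Noiseless sparse recovery from n < p random measurements via L1 (ISTA),
# followed by a least-squares refit on the recovered support to remove lasso bias.
import numpy as np

rng = np.random.default_rng(0)
p, n, s = 200, 100, 10                        # dimension, measurements, sparsity
x_true = np.zeros(p)
idx = rng.choice(p, s, replace=False)
x_true[idx] = rng.choice([-1.0, 1.0], s)      # well-separated spikes

A = rng.standard_normal((n, p)) / np.sqrt(n)  # simple random sensing matrix
y = A @ x_true                                # noiseless measurements (h2 = 1 analog)

# ISTA: gradient step on ||y - Ax||^2 / 2, then soft-threshold (prox of lam*||x||_1).
step = 1.0 / np.linalg.norm(A, 2) ** 2        # 1 / Lipschitz constant of the gradient
lam = 0.1
x = np.zeros(p)
for _ in range(5000):
    x = x + step * A.T @ (y - A @ x)
    x = np.sign(x) * np.maximum(np.abs(x) - step * lam, 0.0)

# Debias: ordinary least squares restricted to the estimated support.
support = np.abs(x) > 0.3
x_hat = np.zeros(p)
x_hat[support] = np.linalg.lstsq(A[:, support], y, rcond=None)[0]
print(np.linalg.norm(x_hat - x_true))         # near zero when recovery succeeds
```

With n well above the D-T boundary for this (p, s), recovery succeeds; shrinking n toward the boundary and watching recovery fail is exactly the phase-transition experiment the comments describe, here only for the idealized random-matrix case.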
Xavier Amatriain's answer to What is the difference between L1 and L2 regularization? - Quora
So, as opposed to what Andrew Ng explains in his "Feature selection, l1 vs l2 regularization, and rotational invariance" (Page on stanford.edu), I would say that as a rule-of-thumb, you should always go for L2 in practice.
best-practices  q-n-a  machine-learning  acm  optimization  tidbits  advice  qra  regularization  model-class  regression  sparsity  features  comparison  model-selection  norms  nibble 
november 2016 by nhaliday
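The L1-vs-L2 contrast behind that answer has a clean closed form in the orthonormal-design case, where each coefficient decouples; a minimal sketch (the closed forms are standard, the numbers are illustrative):

```python
# With an orthonormal design, penalized regression acts coefficient-by-coefficient:
# L1 soft-thresholds (exact zeros), L2 uniformly shrinks (never exactly zero).
import numpy as np

def l1_solution(beta_ols, lam):
    # Lasso: soft-thresholding kills coefficients smaller than lam.
    return np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam, 0.0)

def l2_solution(beta_ols, lam):
    # Ridge: uniform multiplicative shrinkage, all coefficients stay nonzero.
    return beta_ols / (1.0 + lam)

beta_ols = np.array([3.0, 0.4, -0.05, 2.0])
print(l1_solution(beta_ols, 0.5))   # small coefficients zeroed out -> sparsity
print(l2_solution(beta_ols, 0.5))   # every coefficient merely scaled down
```

This is why L1 is the sparsity-inducing penalty even though, as the answer argues, L2 is often the safer practical default when sparsity itself is not the goal.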
Bottoming Out – arg min blog
Now, I’ve been hammering the point in my previous posts that saddle points are not what makes non-convex optimization difficult. Here, when specializing to deep learning, even local minima are not getting in my way. Deep neural nets are just very easy to minimize.
machine-learning  deep-learning  optimization  rhetoric  speculation  research  hmm  research-program  acmtariat  generalization  metabuch  local-global  off-convex  ben-recht  extrema  org:bleg  nibble  sparsity  curvature  ideas  aphorism  convexity-curvature  explanans  volo-avolo  hardness 
june 2016 by nhaliday

