**nhaliday + high-dimension**
30

8 PCA – A Powerful Method for Analyze Ecological Niches

august 2018 by nhaliday

Influences of ecology and biogeography on shaping the distributions of cryptic species: three bat tales in Iberia: https://academic.oup.com/biolinnean/article/112/1/150/2415750

Combining Historical Biogeography with Niche Modeling in the Caprifolium Clade of Lonicera (Caprifoliaceae, Dipsacales): https://watermark.silverchair.com/syq011.pdf?token=AQECAHi208BE49Ooan9kkhW_Ercy7Dm3ZL_9Cf3qfKAc485ysgAAAagwggGkBgkqhkiG9w0BBwagggGVMIIBkQIBADCCAYoGCSqGSIb3DQEHATAeBglghkgBZQMEAS4wEQQMnQcew1QnnjkjJSlVAgEQgIIBW-Nu-4L3xpOdRIb27NdbMbhPjaeByMM3g6H1bpeMMK4OJ9gBOH7V5WfuKGlHlsgsStQQLC_s2YGVu5KDOtwhudWOPFqrXmYlAXjhFNi5hFNpCxjNT-4tTJlRJHU5plgPE2BWZht5okuM2sngjX3t5dDScmz0oTBvu7xnUXo3sbGkad6gw-za6Rpyl5_3-nnnbOpz6WeqfxcR7NDGwPd741QVJKjjp-FHPf8JdWN3mcsLMVJ6p11FoeMeQdA7gsyXhKDPfE8sJ2Xamjxk5uSaGkfi1bi71OB1Ag0UvV2xlON1UwWD9V8tE7e3JJQanv_aKgKyppuXQikoMhH05x_nCFsiVif-_-26Yyx0CMIHv4so81sOpwN5YM_BISyUp_RoT2yfjiEhZpcJlyWX4z6ZeKAUEICloT8evsOX8Ll4FUocBHARhnqZgRlc8w33b_J3wslXv-PVBvvXNs0h

pdf
article
study
methodology
bio
ecology
data
analysis
stats
exploratory
matrix-factorization
geography
environment
time
crosstab
history
letters
correlation
evolution
distribution
examples
high-dimension
multi
chart
howto
objektbuch
metabuch
nibble
data-science
things

Accurate Genomic Prediction Of Human Height | bioRxiv

september 2017 by nhaliday

Stephen Hsu's compressed sensing application paper

We construct genomic predictors for heritable and extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high dimensional statistics (i.e., machine learning). Replication tests show that these predictors capture, respectively, ~40, 20, and 9 percent of total variance for the three traits. For example, predicted heights correlate ~0.65 with actual height; actual heights of most individuals in validation samples are within a few cm of the prediction.
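
A quick arithmetic check on the figures quoted above, as a sketch only: assuming the ~0.65 figure is a Pearson correlation between predicted and actual height, the share of variance captured is its square, which lands close to the quoted ~40 percent. The function name `variance_explained` is illustrative and not from the paper.

```python
def variance_explained(correlation: float) -> float:
    """Fraction of variance captured by a predictor with this Pearson correlation."""
    return correlation ** 2


height_r = 0.65  # correlation quoted in the abstract above
height_r2 = variance_explained(height_r)
print(f"r = {height_r:.2f} -> r^2 = {height_r2:.2f}")  # prints "r = 0.65 -> r^2 = 0.42"
```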

https://infoproc.blogspot.com/2017/09/accurate-genomic-prediction-of-human.html

http://infoproc.blogspot.com/2017/11/23andme.html

I'm in Mountain View to give a talk at 23andMe. Their latest funding round was $250M on a (reported) valuation of $1.5B. If I just add up the Crunchbase numbers it looks like almost half a billion invested at this point...

Slides: Genomic Prediction of Complex Traits

Here's how people + robots handle your spit sample to produce a SNP genotype:

https://drive.google.com/file/d/1e_zuIPJr1hgQupYAxkcbgEVxmrDHAYRj/view

study
bio
preprint
GWAS
state-of-art
embodied
genetics
genomics
compressed-sensing
high-dimension
machine-learning
missing-heritability
hsu
scitariat
education
🌞
frontier
britain
regression
data
visualization
correlation
phase-transition
multi
commentary
summary
pdf
slides
brands
skunkworks
hard-tech
presentation
talks
methodology
intricacy
bioinformatics
scaling-up
stat-power
sparsity
norms
nibble
speedometer
stats
linear-models
2017
biodet

Overcoming Bias : High Dimensional Societes?

july 2017 by nhaliday

I’ve seen many “spatial” models in social science. Such as models where voters and politicians sit at points in a space of policies. Or where customers and firms sit at points in a space of products. But I’ve never seen a discussion of how one should expect such models to change in high dimensions, such as when there are more dimensions than points.

In small dimensional spaces, the distances between points vary greatly; neighboring points are much closer to each other than are distant points. However, in high dimensional spaces, distances between points vary much less; all points are about the same distance from all other points. When points are distributed randomly, however, these distances do vary somewhat, allowing us to define the few points closest to each point as that point’s “neighbors”. “Hubs” are closest neighbors to many more points than average, while “anti-hubs” are closest neighbors to many fewer points than average. It turns out that in higher dimensions a larger fraction of points are hubs and anti-hubs (Zimek et al. 2012).

If we think of people or organizations as such points, is being a hub or anti-hub associated with any distinct social behavior? Does it contribute substantially to being popular or unpopular? Or does the fact that real people and organizations are in fact distributed in real space overwhelm such things, which only happen in a truly high dimensional social world?
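
The claim that distances "vary much less" in high dimensions can be checked numerically. The sketch below is a minimal illustration, assuming points drawn uniformly from the unit cube (the post does not specify a distribution); it reports the standard deviation of all pairwise distances divided by their mean. The function name `relative_spread` and the sample sizes are hypothetical choices, and the sketch does not count hubs or anti-hubs.

```python
import math
import random


def relative_spread(n_points: int, dim: int, seed: int = 0) -> float:
    """Std/mean of all pairwise Euclidean distances between uniform points in [0, 1]^dim."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    dists = [
        math.dist(points[i], points[j])
        for i in range(n_points)
        for j in range(i + 1, n_points)
    ]
    mean = sum(dists) / len(dists)
    variance = sum((d - mean) ** 2 for d in dists) / len(dists)
    return math.sqrt(variance) / mean


# The ratio shrinks as the dimension grows: distances become nearly equal.
for dim in (2, 20, 200, 2000):
    print(dim, round(relative_spread(50, dim), 3))
```

A small ratio means every point sits at roughly the same distance from every other point, which is the regime the post asks about.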

ratty
hanson
speculation
ideas
thinking
spatial
dimensionality
high-dimension
homo-hetero
analogy
models
network-structure
degrees-of-freedom

Genomic analysis of family data reveals additional genetic effects on intelligence and personality | bioRxiv

june 2017 by nhaliday

methodology:

Using Extended Genealogy to Estimate Components of Heritability for 23 Quantitative and Dichotomous Traits: http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1003520

Pedigree- and SNP-Associated Genetics and Recent Environment are the Major Contributors to Anthropometric and Cardiometabolic Trait Variation: http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1005804

Missing Heritability – found?: https://westhunt.wordpress.com/2017/02/09/missing-heritability-found/

There is an interesting new paper out on genetics and IQ. The claim is that they have found the missing heritability – in rare variants, generally different in each family.

Some of the variants, the ones we find with GWAS, are fairly common and fitness-neutral: the variant that slightly increases IQ confers the same fitness (or very close to the same) as the one that slightly decreases IQ – presumably because of other effects it has. If this weren’t the case, it would be impossible for both of the variants to remain common.

The rare variants that affect IQ will generally decrease IQ – and since pleiotropy is the norm, usually they’ll be deleterious in other ways as well. Genetic load.

Happy families are all alike; every unhappy family is unhappy in its own way.: https://westhunt.wordpress.com/2017/06/06/happy-families-are-all-alike-every-unhappy-family-is-unhappy-in-its-own-way/

It now looks as if the majority of the genetic variance in IQ is the product of mutational load, and the same may be true for many psychological traits. To the extent this is the case, a lot of human psychological variation must be non-adaptive. Maybe some personality variation fulfills an evolutionary function, but a lot does not. Being a dumb asshole may be a bug, rather than a feature. More generally, this kind of analysis could show us whether particular low-fitness syndromes, like autism, were ever strategies – I suspect not.

It’s bad news for medicine and psychiatry, though. It would suggest that what we call a given type of mental illness, like schizophrenia, is really a grab-bag of many different syndromes. The ultimate causes are extremely varied: at best, there may be shared intermediate causal factors. Not good news for drug development: individualized medicine is a threat, not a promise.

see also comment at: https://pinboard.in/u:nhaliday/b:a6ab4034b0d0

https://www.reddit.com/r/slatestarcodex/comments/5sldfa/genomic_analysis_of_family_data_reveals/

So the big implication here is that it's better than I had dared hope - like Yang/Visscher/Hsu have argued, the old GCTA estimate of ~0.3 is indeed a rather loose lower bound on additive genetic variants, and the rest of the missing heritability is just the relatively uncommon additive variants (ie <1% frequency), and so, like Yang demonstrated with height, using much more comprehensive imputation of SNP scores or using whole-genomes will be able to explain almost all of the genetic contribution. In other words, with better imputation panels, we can go back and squeeze out better polygenic scores from old GWASes, new GWASes will be able to reach and break the 0.3 upper bound, and eventually we can feasibly predict 0.5-0.8. Between the expanding sample sizes from biobanks, the still-falling price of whole genomes, the gradual development of better regression methods (informative priors, biological annotation information, networks, genetic correlations), and better imputation, the future of GWAS polygenic scores is bright. Which obviously will be extremely helpful for embryo selection/genome synthesis.

The argument that this supports mutation-selection balance is weaker but plausible. I hope that it's true, because if that's why there is so much genetic variation in intelligence, then that strongly encourages genetic engineering - there is no good reason or Chesterton fence for intelligence variants being non-fixed, it's just that evolution is too slow to purge the constantly-accumulating bad variants. And we can do better.

https://rubenarslan.github.io/generation_scotland_pedigree_gcta/

The surprising implications of familial association in disease risk: https://arxiv.org/abs/1707.00014

https://spottedtoad.wordpress.com/2017/06/09/personalized-medicine-wont-work-but-race-based-medicine-probably-will/

As Greg Cochran has pointed out, this probably isn’t going to work. There are a few genes like BRCA1 (which makes you more likely to get breast and ovarian cancer) that we can detect and might affect treatment, but an awful lot of disease turns out to be just the result of random chance and deleterious mutation. This means that you can’t easily tailor disease treatment to people’s genes, because everybody is fucked up in their own special way. If Johnny is schizophrenic because of 100 random errors in the genes that code for his neurons, and Jack is schizophrenic because of 100 other random errors, there’s very little way to test a drug to work for either of them: each is, most likely, the only one in the world with that specific pattern of errors. This is presumably why the incidence of schizophrenia and autism rises in populations when dads get older: more random errors in sperm formation mean more random errors in the baby’s genes, and more things that go wrong down the line.

The looming crisis in human genetics: http://www.economist.com/node/14742737

Some awkward news ahead

- Geoffrey Miller

Human geneticists have reached a private crisis of conscience, and it will become public knowledge in 2010. The crisis has depressing health implications and alarming political ones. In a nutshell: the new genetics will reveal much less than hoped about how to cure disease, and much more than feared about human evolution and inequality, including genetic differences between classes, ethnicities and races.

2009!

study
preprint
bio
biodet
behavioral-gen
GWAS
missing-heritability
QTL
🌞
scaling-up
replication
iq
education
spearhead
sib-study
multi
west-hunter
scitariat
genetic-load
mutation
medicine
meta:medicine
stylized-facts
ratty
unaffiliated
commentary
rhetoric
wonkish
genetics
genomics
race
pop-structure
poast
population-genetics
psychiatry
aphorism
homo-hetero
generalization
scale
state-of-art
ssc
reddit
social
summary
gwern
methodology
personality
britain
anglo
enhancement
roots
s:*
2017
data
visualization
database
let-me-see
bioinformatics
news
org:rec
org:anglo
org:biz
track-record
prediction
identity-politics
pop-diff
recent-selection
westminster
inequality
egalitarianism-hierarchy
high-dimension
applications
dimensionality
ideas
no-go
volo-avolo
magnitude
variance-components
GCTA
tradeoffs
counter-revolution
org:mat
dysgenics
paternal-age
distribution
chart
abortion-contraception-embryo

A cube, a starfish, a thin shell, and the central limit theorem – Libres pensées d'un mathématicien ordinaire

mathtariat org:bleg nibble math acm probability concentration-of-measure high-dimension cartoons limits dimensionality measure yoga hi-order-bits synthesis exposition spatial geometry math.MG curvature convexity-curvature

february 2017 by nhaliday

ho.history overview - History of the high-dimensional volume paradox - MathOverflow

q-n-a overflow math math.MG geometry spatial dimensionality limits measure concentration-of-measure history stories giants cartoons soft-question nibble paradox novelty high-dimension examples gotchas

january 2017 by nhaliday

pr.probability - "Entropy" proof of Brunn-Minkowski Inequality? - MathOverflow

q-n-a overflow math information-theory wormholes proofs geometry math.MG estimate gowers mathtariat dimensionality limits intuition insight stat-mech concentration-of-measure 👳 cartoons math.FA additive-combo measure entropy-like nibble tensors coarse-fine brunn-minkowski boltzmann high-dimension curvature convexity-curvature

january 2017 by nhaliday

Dvoretzky's theorem - Wikipedia

january 2017 by nhaliday

In mathematics, Dvoretzky's theorem is an important structural theorem about normed vector spaces proved by Aryeh Dvoretzky in the early 1960s, answering a question of Alexander Grothendieck. In essence, it says that every sufficiently high-dimensional normed vector space will have low-dimensional subspaces that are approximately Euclidean. Equivalently, every high-dimensional bounded symmetric convex set has low-dimensional sections that are approximately ellipsoids.

http://mathoverflow.net/questions/143527/intuitive-explanation-of-dvoretzkys-theorem

http://mathoverflow.net/questions/46278/unexpected-applications-of-dvoretzkys-theorem

math
math.FA
inner-product
levers
characterization
geometry
math.MG
concentration-of-measure
multi
q-n-a
overflow
intuition
examples
proofs
dimensionality
gowers
mathtariat
tcstariat
quantum
quantum-info
norms
nibble
high-dimension
wiki
reference
curvature
convexity-curvature
tcs

mg.metric geometry - How to explain the concentration-of-measure phenomenon intuitively? - MathOverflow

q-n-a overflow soft-question math geometry probability intuition tcstariat orourke concentration-of-measure dimensionality tcs math.MG random pigeonhole-markov nibble paradox novelty high-dimension s:** spatial

january 2017 by nhaliday

reference request - Why are two "random" vectors in $\mathbb R^n$ approximately orthogonal for large $n$? - MathOverflow

q-n-a overflow math probability tidbits intuition cartoons math.MG spatial geometry linear-algebra mathtariat dimensionality magnitude concentration-of-measure probabilistic-method random separation inner-product nibble relaxation paradox novelty high-dimension direction

january 2017 by nhaliday

A cheap version of the Kabatjanskii-Levenstein bound for almost orthogonal vectors | What's new

gowers mathtariat exposition tidbits math geometry spatial math.CO magnitude probabilistic-method cartoons linear-algebra math.MG dimensionality random separation inner-product nibble org:bleg relaxation high-dimension direction

january 2017 by nhaliday

fa.functional analysis - Almost orthogonal vectors - MathOverflow

january 2017 by nhaliday

- you can pick exp(Θ(nε^2)) ε-almost orthogonal unit vectors in R^n w/ probabilistic method

- can also use Johnson-Lindenstrauss
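
The probabilistic-method claim above can be illustrated empirically. This is a minimal sketch, assuming unit vectors obtained by normalizing independent Gaussian samples (one standard way to sample uniformly from the sphere); it is not the argument from the MathOverflow thread. The helper names are hypothetical. It reports the largest absolute inner product among all pairs, which plays the role of ε.

```python
import math
import random


def random_unit_vector(dim: int, rng: random.Random) -> list:
    """Sample a point on the unit sphere by normalizing a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def max_abs_inner_product(n_vectors: int, dim: int, seed: int = 0) -> float:
    """Largest |<u, v>| over all pairs of n_vectors random unit vectors in R^dim."""
    rng = random.Random(seed)
    vectors = [random_unit_vector(dim, rng) for _ in range(n_vectors)]
    return max(
        abs(sum(a * b for a, b in zip(vectors[i], vectors[j])))
        for i in range(n_vectors)
        for j in range(i + 1, n_vectors)
    )


# The worst-case pair gets closer to orthogonal as the dimension grows.
for dim in (10, 100, 1000):
    print(dim, round(max_abs_inner_product(60, dim), 3))
```

A typical inner product between two random unit vectors in R^n has size about 1/sqrt(n), which is why a fixed tolerance ε leaves room for exponentially many such vectors.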

q-n-a
overflow
math
tidbits
intuition
geometry
spatial
cartoons
dimensionality
linear-algebra
magnitude
gowers
mathtariat
tcstariat
math.CO
probabilistic-method
embeddings
math.MG
random
separation
inner-product
nibble
relaxation
paradox
novelty
high-dimension
direction
shift

ca.analysis and odes - What's a nice argument that shows the volume of the unit ball in $\mathbb R^n$ approaches 0? - MathOverflow

q-n-a overflow intuition math geometry spatial dimensionality limits tidbits math.MG measure magnitude visual-understanding oly concentration-of-measure pigeonhole-markov nibble fedja coarse-fine novelty high-dimension

january 2017 by nhaliday
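
For a numerical feel for the question in this bookmark's title, the sketch below evaluates the standard closed form for the volume of the unit ball, V_n = π^(n/2) / Γ(n/2 + 1). It is only a tabulation, not one of the arguments given in the thread; the function name `unit_ball_volume` is illustrative.

```python
import math


def unit_ball_volume(n: int) -> float:
    """Volume of the unit ball in R^n via pi^(n/2) / Gamma(n/2 + 1)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


# The volume rises up to n = 5 and then decays toward zero.
for n in (1, 2, 3, 5, 10, 20, 50):
    print(n, unit_ball_volume(n))
```

The decay comes from the gamma function in the denominator, which eventually grows faster than any fixed exponential in n.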

Why Tool AIs Want to Be Agent AIs - Gwern.net

gwern rhetoric prediction speculation ai-control ai ratty direct-indirect telos-atelos volo-avolo risk threat-modeling intelligence reinforcement moloch machine-learning economics coordination cooperate-defect competition equilibrium complement-substitute analysis dimensionality high-dimension comparison incentives deep-learning model-class atoms gradient-descent survey links iteration-recursion retention attention turing performance slides expert-experience google deepgoog systems interdisciplinary applications heuristic explore-exploit q-n-a optimization saas unsupervised decision-theory average-case adversarial intricacy reduction the-self

january 2017 by nhaliday

Science Policy | West Hunter

december 2016 by nhaliday

If my 23andme profile revealed that I was the last of the Plantagenets (as some suspect), and therefore rightfully King of the United States and Defender of Mexico, and I asked you for a general view of the right approach to science and technology – where the most promise is, what should be done, etc – what would you say?

genetically personalized medicine: https://westhunt.wordpress.com/2016/12/08/science-policy/#comment-85698

I have no idea how personalized medicine is supposed to work. Suppose that we sequence your entire genome, and then we intend to tailor a therapeutic approach to your genome.

How do we test it? By trying it on a bunch of genetically similar people? The more genetic details we take into account, the smaller that class is. It could easily become so small that it would be difficult to recruit enough people for a reasonable statistical trial. Second, the more details we take into account, the smaller the class that benefits from the whole testing process – which as far as I can see, is just as expensive as conventional Phase I/II etc trials.

What am I missing?

Now if you are a forethoughtful trillionaire, sure: you manufacture lots of clones just to test therapies you might someday need, and cost is no object.

I think I can see ways you could make it work tho [edit: what did I mean by this?...damnit]
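
The shrinking-class argument above can be put in rough numbers. This is a toy sketch under a strong assumption that is not in the comment: each genetic detail is treated as an independent binary trait that splits the population evenly. The function name and the population figure are illustrative only.

```python
def expected_class_size(population: int, n_binary_details: int) -> float:
    """Expected number of people sharing one exact profile of independent 50/50 traits."""
    return population / 2 ** n_binary_details


population = 7_500_000_000  # rough world population, an illustrative assumption
for k in (10, 20, 30, 40):
    print(k, expected_class_size(population, k))
```

Under these assumptions the expected class drops below one person once about 33 details are taken into account, since 2^33 exceeds the assumed population. Real variants are neither independent nor evenly split, so this only shows the direction of the effect.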

west-hunter
discussion
politics
government
policy
science
technology
the-world-is-just-atoms
🔬
scitariat
meta:science
proposal
genetics
genomics
medicine
meta:medicine
multi
ideas
counter-revolution
poast
homo-hetero
generalization
scale
antidemos
alt-inst
applications
dimensionality
high-dimension
bioinformatics
no-go
volo-avolo
magnitude
trump
2016-election
questions

gt.geometric topology - Intuitive crutches for higher dimensional thinking - MathOverflow

december 2016 by nhaliday

Terry Tao:

I can't help you much with high-dimensional topology - it's not my field, and I've not picked up the various tricks topologists use to get a grip on the subject - but when dealing with the geometry of high-dimensional (or infinite-dimensional) vector spaces such as R^n, there are plenty of ways to conceptualise these spaces that do not require visualising more than three dimensions directly.

For instance, one can view a high-dimensional vector space as a state space for a system with many degrees of freedom. A megapixel image, for instance, is a point in a million-dimensional vector space; by varying the image, one can explore the space, and various subsets of this space correspond to various classes of images.

One can similarly interpret sound waves, a box of gases, an ecosystem, a voting population, a stream of digital data, trials of random variables, the results of a statistical survey, a probabilistic strategy in a two-player game, and many other concrete objects as states in a high-dimensional vector space, and various basic concepts such as convexity, distance, linearity, change of variables, orthogonality, or inner product can have very natural meanings in some of these models (though not in all).

It can take a bit of both theory and practice to merge one's intuition for these things with one's spatial intuition for vectors and vector spaces, but it can be done eventually (much as after one has enough exposure to measure theory, one can start merging one's intuition regarding cardinality, mass, length, volume, probability, cost, charge, and any number of other "real-life" measures).

For instance, the fact that most of the mass of a unit ball in high dimensions lurks near the boundary of the ball can be interpreted as a manifestation of the law of large numbers, using the interpretation of a high-dimensional vector space as the state space for a large number of trials of a random variable.

More generally, many facts about low-dimensional projections or slices of high-dimensional objects can be viewed from a probabilistic, statistical, or signal processing perspective.
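
The remark above about the mass of a unit ball sitting near its boundary can be checked with one line of arithmetic. The following is a minimal sketch, not part of the original answer: it relies only on the fact that the volume of a ball of radius r in n dimensions scales as r^n, and the function name `shell_fraction` and the shell width 0.1 are illustrative choices.

```python
def shell_fraction(n: int, eps: float) -> float:
    """Fraction of the unit n-ball's volume lying within eps of its boundary."""
    # Volume scales as r**n, so the inner ball of radius (1 - eps)
    # holds (1 - eps)**n of the total volume; the rest is in the outer shell.
    return 1.0 - (1.0 - eps) ** n


if __name__ == "__main__":
    # The outer 10% shell holds a modest share in low dimensions
    # and nearly everything once n is large.
    for n in (2, 3, 10, 100, 1000):
        print(n, shell_fraction(n, 0.1))
```

Since (1 - eps)^n decays geometrically in n for any fixed eps, the share of volume in the shell tends to 1 as the dimension grows.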

Scott Aaronson:

Here are some of the crutches I've relied on. (Admittedly, my crutches are probably much more useful for theoretical computer science, combinatorics, and probability than they are for geometry, topology, or physics. On a related note, I personally have a much easier time thinking about R^n than about, say, R^4 or R^5!)

1. If you're trying to visualize some 4D phenomenon P, first think of a related 3D phenomenon P', and then imagine yourself as a 2D being who's trying to visualize P'. The advantage is that, unlike with the 4D vs. 3D case, you yourself can easily switch between the 3D and 2D perspectives, and can therefore get a sense of exactly what information is being lost when you drop a dimension. (You could call this the "Flatland trick," after the most famous literary work to rely on it.)

2. As someone else mentioned, discretize! Instead of thinking about R^n, think about the Boolean hypercube {0,1}^n, which is finite and usually easier to get intuition about. (When working on problems, I often find myself drawing {0,1}^4 on a sheet of paper by drawing two copies of {0,1}^3 and then connecting the corresponding vertices.)

3. Instead of thinking about a subset S⊆R^n, think about its characteristic function f:R^n→{0,1}. I don't know why that trivial perspective switch makes such a big difference, but it does ... maybe because it shifts your attention to the process of computing f, and makes you forget about the hopeless task of visualizing S!

4. One of the central facts about R^n is that, while it has "room" for only n orthogonal vectors, it has room for exp(n) almost-orthogonal vectors. Internalize that one fact, and so many other properties of R^n (for example, that the n-sphere resembles a "ball with spikes sticking out," as someone mentioned before) will suddenly seem non-mysterious. In turn, one way to internalize the fact that R^n has so many almost-orthogonal vectors is to internalize Shannon's theorem that there exist good error-correcting codes.

5. To get a feel for some high-dimensional object, ask questions about the behavior of a process that takes place on that object. For example: if I drop a ball here, which local minimum will it settle into? How long does this random walk on {0,1}^n take to mix?
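
Items 2 and 4 in the list above can be tried together in a small experiment: draw random points of the hypercube, read them as ±1 vectors, and look at the largest pairwise |cosine|. This is a rough sketch under stated assumptions rather than anything from the original answer; the helper name `max_abs_cosine`, the bit-packing trick, and the sizes (600 vectors in 512 dimensions) are all illustrative.

```python
import random


def max_abs_cosine(n: int, m: int, seed: int = 0) -> float:
    """Largest |cosine| between any two of m random vectors in {-1,+1}^n."""
    rng = random.Random(seed)
    # Each n-bit integer encodes one vector: bit 1 -> +1, bit 0 -> -1.
    vecs = [rng.getrandbits(n) for _ in range(m)]
    worst = 0
    for i in range(m):
        for j in range(i + 1, m):
            # Coordinates where the vectors differ contribute -1 to the
            # dot product, coordinates where they agree contribute +1.
            differing = bin(vecs[i] ^ vecs[j]).count("1")
            dot = n - 2 * differing
            worst = max(worst, abs(dot))
    # Every vector has norm sqrt(n), so cosine = dot / n.
    return worst / n


if __name__ == "__main__":
    # More vectors than dimensions, yet every pair stays far from parallel.
    print(max_abs_cosine(n=512, m=600))
```

Exactly orthogonal families in R^512 stop at 512 vectors. The random family here is already larger, and by a Hoeffding bound its pairwise cosines concentrate around 0 at scale 1/sqrt(n).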

Gil Kalai:

This is a slightly different point, but Vitali Milman, who works in high-dimensional convexity, likes to draw high-dimensional convex bodies in a non-convex way. This is to convey the point that if you take the convex hull of a few points on the unit sphere of R^n, then for large n very little of the measure of the convex body is anywhere near the corners, so in a certain sense the body is a bit like a small sphere with long thin "spikes".

q-n-a
intuition
math
visual-understanding
list
discussion
thurston
tidbits
aaronson
tcs
geometry
problem-solving
yoga
👳
big-list
metabuch
tcstariat
gowers
mathtariat
acm
overflow
soft-question
levers
dimensionality
hi-order-bits
insight
synthesis
thinking
models
cartoons
coding-theory
information-theory
probability
concentration-of-measure
magnitude
linear-algebra
boolean-analysis
analogy
arrows
lifts-projections
measure
markov
sampling
shannon
conceptual-vocab
nibble
degrees-of-freedom
worrydream
neurons
retrofit
oscillation
paradox
novelty
tricki
concrete
high-dimension
s:***
manifolds
direction
curvature
convexity-curvature

december 2016 by nhaliday

Information Processing: Search results for compressed sensing

november 2016 by nhaliday

https://www.unz.com/jthompson/the-hsu-boundary/

http://infoproc.blogspot.com/2017/09/phase-transitions-and-genomic.html

Added: Here are comments from "Donoho-Student":

Donoho-Student says:

September 14, 2017 at 8:27 pm GMT • 100 Words

The Donoho-Tanner transition describes the noise-free (h2=1) case, which has a direct analog in the geometry of polytopes.

The n = 30s result from Hsu et al. (specifically the value of the coefficient, 30, when p is the appropriate number of SNPs on an array and h2 = 0.5) is obtained via simulation using actual genome matrices, and is original to them. (There is no simple formula that gives this number.) The D-T transition had only been established in the past for certain classes of matrices, like random matrices with specific distributions. Those results cannot be immediately applied to genomes.

The estimate that s is (order of magnitude) 10k is also a key input.

I think Hsu refers to n = 1 million instead of 30 * 10k = 300k because the effective SNP heritability of IQ might be less than h2 = 0.5 — there is noise in the phenotype measurement, etc.
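
The arithmetic in the comment above can be written out explicitly. This is only an illustration of the quoted rule of thumb n = 30s: the coefficient 30 is the simulation-derived value the commenter reports for h2 = 0.5, not a general formula, and the function name `samples_needed` is hypothetical.

```python
def samples_needed(sparsity: int, coefficient: float = 30.0) -> int:
    """Sample size implied by the rule of thumb n = coefficient * s."""
    # coefficient = 30 is the value quoted for h2 = 0.5; per the comment it
    # comes from simulation on genome matrices and should differ for other h2.
    return round(coefficient * sparsity)


if __name__ == "__main__":
    s = 10_000  # order-of-magnitude sparsity estimate quoted above
    print(samples_needed(s))  # 30 * 10k, the 300k figure in the comment
```

The gap between this figure and the 1 million quoted would then correspond to a larger effective coefficient at lower heritability, which is the commenter's conjecture rather than a computed result.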

Donoho-Student says:

September 15, 2017 at 11:27 am GMT • 200 Words

Lasso is a common statistical method but most people who use it are not familiar with the mathematical theorems from compressed sensing. These results give performance guarantees and describe phase transition behavior, but because they are rigorous theorems they only apply to specific classes of sensor matrices, such as simple random matrices. Genomes have correlation structure, so the theorems do not directly apply to the real world case of interest, as is often true.

What the Hsu paper shows is that the exact D-T phase transition appears in the noiseless (h2 = 1) problem using genome matrices, and a smoothed version appears in the problem with realistic h2. These are new results, as is the prediction for how much data is required to cross the boundary. I don’t think most gwas people are familiar with these results. If they did understand the results they would fund/design adequately powered studies capable of solving lots of complex phenotypes, medical conditions as well as IQ, that have significant h2.

Most people who use lasso, as opposed to people who prove theorems, are not even aware of the D-T transition. Even most people who prove theorems have followed the Candes-Tao line of attack (restricted isometry property) and don’t think much about D-T. Although D eventually proved some things about the phase transition using high dimensional geometry, it was initially discovered via simulation using simple random matrices.

hsu
list
stream
genomics
genetics
concept
stats
methodology
scaling-up
scitariat
sparsity
regression
biodet
bioinformatics
norms
nibble
compressed-sensing
applications
search
ideas
multi
albion
behavioral-gen
iq
state-of-art
commentary
explanation
phase-transition
measurement
volo-avolo
regularization
levers
novelty
the-trenches
liner-notes
clarity
random-matrices
innovation
high-dimension
linear-models

november 2016 by nhaliday

Foundations of Data Science

pdf data-science books draft acm learning-theory machine-learning synthesis encyclopedic cmu 👳 org:edu markov accretion monte-carlo ground-up big-picture unit dimensionality concentration-of-measure high-dimension p:*** matrix-factorization org:mat quixotic

november 2016 by nhaliday

Princeton University CS Dept COS521: Advanced Algorithm Design Fall 2015

october 2016 by nhaliday

good exposition of curse of dimensionality

princeton
course
algorithms
yoga
👳
tcs
metabuch
lecture-notes
toolkit
dimensionality
sanjeev-arora
unit
concentration-of-measure
hashing
linearity
linear-programming
online-learning
gradient-descent
markov
SDP
approximation
duality
coding-theory
crypto
rigorous-crypto
huge-data-the-biggest
heuristic
counting
sampling
game-theory
decision-theory
high-dimension
p:***
matrix-factorization
quixotic
october 2016 by nhaliday

machine learning - Euclidean distance is usually not good for sparse data? - Cross Validated

machine-learning acm intuition synthesis thinking q-n-a sparsity overflow soft-question dimensionality curiosity separation concentration-of-measure norms nibble novelty high-dimension direction metric-space yoga measure inner-product best-practices

september 2016 by nhaliday

machine learning - Why is Euclidean distance not a good metric in high dimensions? - Cross Validated

thinking machine-learning math acm synthesis intuition q-n-a overflow soft-question dimensionality hi-order-bits curiosity cartoons concentration-of-measure norms nibble novelty high-dimension direction metric-space yoga measure best-practices

september 2016 by nhaliday

The Modern Algorithmic Toolbox (CS168), Spring 2015-2016

course tcs yoga stanford algorithms synthesis 👳 mihai lecture-notes tim-roughgarden valiant unit hashing sublinear dimensionality embeddings norms gradient-descent toolkit metabuch regularization linear-algebra spectral sampling concentration-of-measure markov monte-carlo fourier sparsity linear-programming optimization expanders compressed-sensing high-dimension p:*** curvature matrix-factorization convexity-curvature quixotic

june 2016 by nhaliday

Talagrand’s concentration inequality | What's new

may 2016 by nhaliday

Proposition 1 follows easily from the following statement, that asserts that if a convex set $A \subset \mathbf{R}^n$ occupies a non-trivial fraction of the cube $\{-1,+1\}^n$, then the neighbourhood $A_t := \{ x \in \mathbf{R}^n : \mathrm{dist}(x,A) \leq t \}$ will occupy almost all of the cube for $t \gg 1$:

exposition
math.CA
math
gowers
concentration-of-measure
mathtariat
random-matrices
levers
estimate
probability
math.MG
geometry
boolean-analysis
nibble
org:bleg
high-dimension
p:whenever
dimensionality
curvature
convexity-curvature
may 2016 by nhaliday
