Is there a common method for detecting the convergence of the Gibbs sampler and the expectation-maximization algorithm? - Quora

october 2019 by nhaliday

In practice and theory it is much easier to diagnose convergence in EM (vanilla or variational) than in any MCMC algorithm (including Gibbs sampling).
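
A minimal sketch (added here as an illustration, not taken from the Quora thread) of the EM half of that claim: each EM iteration provably increases the observed-data log-likelihood, so convergence can be declared with a simple stopping rule on that objective. The two-component 1-D Gaussian mixture and all constants below are assumptions made for the example.

```python
import numpy as np

# Minimal EM for a two-component 1-D Gaussian mixture, stopping when the
# monotone log-likelihood stops improving. All settings are illustrative.

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 1, 500)])

def log_lik(x, w, mu, sd):
    dens = w * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    return np.log(dens.sum(axis=1)).sum()

w, mu, sd = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
prev = -np.inf
for it in range(200):
    # E-step: posterior responsibility of each component for each point
    dens = w * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: closed-form updates of weights, means, standard deviations
    Nk = r.sum(axis=0)
    w, mu = Nk / len(x), (r * x[:, None]).sum(axis=0) / Nk
    sd = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk)
    cur = log_lik(x, w, mu, sd)
    if cur - prev < 1e-8:   # monotone objective gives a clean convergence check
        break
    prev = cur

print(f"converged after {it} iterations: mu={mu}, sd={sd}, w={w}")
```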

https://www.quora.com/How-can-you-determine-if-your-Gibbs-sampler-has-converged

There is a special case when you can actually obtain the stationary distribution, and be sure that you did! If your Markov chain has a discrete state space, then take the first time that a state repeats in your chain: if you sample an element uniformly at random between the two occurrences of the repeating state (including only one of the endpoints), you will have a sample from your true distribution.

One can achieve this 'exact MCMC sampling' more generally by using the coupling from the past algorithm (Coupling from the past).
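
For what coupling from the past does operationally, here is a hedged sketch (an added illustration, not from the answer; the small lazy random walk on {0, ..., 5} and its update rule are assumptions): run every starting state forward from an ever-deeper point in the past using shared randomness, and the common state they all coalesce to at time 0 is an exact draw from the stationary distribution.

```python
import random

# Minimal coupling-from-the-past (CFTP) sketch for a small discrete chain.
# The chain is an illustrative lazy random walk on {0, ..., N-1}.

N = 6

def step(x, u):
    """Deterministic update: the same uniform u drives every copy of the chain."""
    if u < 0.25 and x > 0:
        return x - 1
    if u > 0.75 and x < N - 1:
        return x + 1
    return x  # lazy: stay put otherwise

def cftp(seed=0):
    rng = random.Random(seed)
    us = []            # shared randomness for times -1, -2, -3, ...
    T = 1
    while True:
        while len(us) < T:
            us.append(rng.random())
        # run all starting states from time -T up to time 0,
        # reusing the SAME random numbers for the more recent steps
        states = list(range(N))
        for t in range(T, 0, -1):      # time -t uses us[t-1]
            u = us[t - 1]
            states = [step(x, u) for x in states]
        if len(set(states)) == 1:      # coalescence => exact stationary sample
            return states[0]
        T *= 2                         # look further into the past

print([cftp(seed) for seed in range(10)])
```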

Otherwise, there is no rigorous statistical test for convergence. It may be possible to obtain a theoretical bound on the convergence rate, but such bounds are quite difficult to obtain and quite often too large to be of practical use. For example, even for the simple case of using the Metropolis algorithm to sample from a two-dimensional uniform distribution, the best convergence-rate upper bound achieved, by Persi Diaconis, had an astronomical constant factor on the order of 10^300.

In fact, it is fair to say that for most high dimensional problems, we have really no idea whether Gibbs sampling ever comes close to converging, but the best we can do is use some simple diagnostics to detect the most obvious failures.
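
The "simple diagnostics" in question are things like trace plots and the Gelman-Rubin statistic computed across several independent chains. A minimal sketch of the latter follows (an added illustration; the split-chain refinement is omitted and the toy data are assumed): R-hat well above 1 flags an obvious failure to mix, while R-hat near 1 does not certify convergence.

```python
import numpy as np

# Classic Gelman-Rubin R-hat across m independent chains of n draws each.

def r_hat(chains):
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)          # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # within-chain variance
    var_plus = (n - 1) / n * W + B / n       # pooled variance estimate
    return np.sqrt(var_plus / W)

# toy usage: two chains targeting the same distribution vs. two stuck chains
rng = np.random.default_rng(0)
good = rng.normal(0.0, 1.0, size=(2, 1000))
bad = np.stack([rng.normal(0.0, 1.0, 1000), rng.normal(5.0, 1.0, 1000)])
print(r_hat(good), r_hat(bad))   # ~1.0 vs. much larger than 1
```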

nibble
q-n-a
qra
acm
stats
probability
limits
convergence
distribution
sampling
markov
monte-carlo
ML-MAP-E
checking
equilibrium
stylized-facts
gelman
levers
mixing
empirical
plots
manifolds
multi
fixed-point
iteration-recursion
heuristic
expert-experience
theory-practice
project

october 2019 by nhaliday

Shuffling - Wikipedia

august 2019 by nhaliday

The Gilbert–Shannon–Reeds model provides a mathematical model of the random outcomes of riffling, that has been shown experimentally to be a good fit to human shuffling[2] and that forms the basis for a recommendation that card decks be riffled seven times in order to randomize them thoroughly.[3] Later, mathematicians Lloyd M. Trefethen and Lloyd N. Trefethen authored a paper using a tweaked version of the Gilbert-Shannon-Reeds model showing that the minimum number of riffles for total randomization could also be 5, if the method of defining randomness is changed.[4][5]
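
A short sketch of one GSR riffle, as the model is usually described (a paraphrase added here, not text from the article): cut the deck at a Binomial(n, 1/2) position, then repeatedly drop the next card from either packet with probability proportional to that packet's remaining size.

```python
import random

# One Gilbert-Shannon-Reeds riffle; repeat ~7 times to randomize a 52-card deck.

def gsr_riffle(deck, rng=random):
    n = len(deck)
    cut = sum(rng.random() < 0.5 for _ in range(n))   # Binomial(n, 1/2) cut
    left, right = list(deck[:cut]), list(deck[cut:])
    out = []
    while left or right:
        # drop from a packet with probability proportional to its current size
        if rng.random() < len(left) / (len(left) + len(right)):
            out.append(left.pop(0))
        else:
            out.append(right.pop(0))
    return out

deck = list(range(52))
for _ in range(7):          # the "seven riffles" recommendation from the article
    deck = gsr_riffle(deck)
print(deck[:10])
```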

nibble
tidbits
trivia
cocktail
wiki
reference
games
howto
random
models
math
applications
probability
math.CO
mixing
markov
sampling
best-practices
acm
august 2019 by nhaliday

Stat 260/CS 294: Bayesian Modeling and Inference

july 2017 by nhaliday

Topics

- Priors (conjugate, noninformative, reference)

- Hierarchical models, spatial models, longitudinal models, dynamic models, survival models

- Testing

- Model choice

- Inference (importance sampling, MCMC, sequential Monte Carlo)

- Nonparametric models (Dirichlet processes, Gaussian processes, neutral-to-the-right processes, completely random measures)

- Decision theory and frequentist perspectives (complete class theorems, consistency, empirical Bayes)

- Experimental design

unit
course
berkeley
expert
michael-jordan
machine-learning
acm
bayesian
probability
stats
lecture-notes
priors-posteriors
markov
monte-carlo
frequentist
latent-variables
decision-theory
expert-experience
confidence
sampling

july 2017 by nhaliday

Unsupervised learning, one notion or many? – Off the convex path

june 2017 by nhaliday

(Task A) Learning a distribution from samples. (Examples: gaussian mixtures, topic models, variational autoencoders,..)

(Task B) Understanding latent structure in the data. This is not the same as (a); for example principal component analysis, clustering, manifold learning etc. identify latent structure but don’t learn a distribution per se.

(Task C) Feature Learning. Learn a mapping from datapoint → feature vector such that classification tasks are easier to carry out on feature vectors rather than datapoints. For example, unsupervised feature learning could help lower the amount of labeled samples needed for learning a classifier, or be useful for domain adaptation.

Task B is often a subcase of Task C, as the intended users of “structure found in data” are humans (scientists) who pore over the representation of data to gain some intuition about its properties, and these “properties” can often be phrased as a classification task.

This post explains the relationship between Tasks A and C, and why they get mixed up in students’ minds. We hope there is also some food for thought here for experts, namely our discussion about the fragility of the usual “perplexity” definition of unsupervised learning. It explains why Task A doesn’t in practice lead to a good enough solution for Task C. For example, it has been believed for many years that for deep learning, unsupervised pretraining should help supervised training, but this has been hard to show in practice.

acmtariat
org:bleg
nibble
machine-learning
acm
thinking
clarity
unsupervised
conceptual-vocab
concept
explanation
features
bayesian
off-convex
deep-learning
latent-variables
generative
intricacy
distribution
sampling
grokkability-clarity
org:popup

june 2017 by nhaliday

Econometric Modeling as Junk Science

june 2017 by nhaliday

The Credibility Revolution in Empirical Economics: How Better Research Design Is Taking the Con out of Econometrics: https://www.aeaweb.org/articles?id=10.1257/jep.24.2.3

On data, experiments, incentives and highly unconvincing research – papers and hot beverages: https://papersandhotbeverages.wordpress.com/2015/10/31/on-data-experiments-incentives-and-highly-unconvincing-research/

In my view, it just has to do with the fact that academia is a peer-monitored organization. In the case of (bad) data collection papers, issues related to measurement are typically boring. They are relegated to appendices, and no one really has an incentive to monitor them seriously. The problem is similar in formal theory: no one really goes through the algebra in detail, but it is in principle feasible to do it, and, actually, sometimes these errors are detected. If discussing the algebra of a proof is almost unthinkable in a seminar, going into the details of data collection, measurement and aggregation is not only hard to imagine, but probably intrinsically infeasible.

Something different happens for the experimentalist people. As I was saying, I feel we have come to a point in which many papers are evaluated based on the cleverness and originality of the research design (“Using the World Cup qualifiers as an instrument for patriotism!? Woaw! how cool/crazy is that! I wish I had had that idea”). The sexiness of the identification strategy has too often become a goal in itself. When your peers monitor you paying more attention to the originality of the identification strategy than to the research question, you probably have an incentive to mine reality for ever crazier discontinuities. It is true that methodologists have been criticized in the past for analogous reasons, such as being guided by the desire to increase mathematical complexity without a clear benefit. But, if you work with pure formal theory or statistical theory, your work is not meant to immediately answer questions about the real world, but instead to serve other researchers in their quest. This is something that cannot, in general, be said of applied CI work.

https://twitter.com/pseudoerasmus/status/662007951415238656

This post should have been entitled “Zombies who only think of their next cool IV fix”

https://twitter.com/pseudoerasmus/status/662692917069422592

massive lust for quasi-natural experiments, regression discontinuities

barely matters if the effects are not all that big

I suppose even the best of things must reach their decadent phase; methodological innov. to manias……

https://twitter.com/cblatts/status/920988530788130816

Following this "collapse of small-N social psych results" business, where do I predict econ will collapse? I see two main contenders.

One is lab studies. I dallied with these a few years ago in a Kenya lab. We ran several pilots of N=200 to figure out the best way to treat

and to measure the outcome. Every pilot gave us a different stat sig result. I could have written six papers concluding different things.

I gave up more skeptical of these lab studies than ever before. The second contender is the long run impacts literature in economic history

We should be very suspicious since we never see a paper showing that a historical event had no effect on modern day institutions or dvpt.

On the one hand I find these studies fun, fascinating, and probably true in a broad sense. They usually reinforce a widely believed history

argument with interesting data and a cute empirical strategy. But I don't think anyone believes the standard errors. There's probably a HUGE

problem of nonsignificant results staying in the file drawer. Also, there are probably data problems that don't get revealed, as we see with

the recent Piketty paper (http://marginalrevolution.com/marginalrevolution/2017/10/pikettys-data-reliable.html). So I take that literature with a vat of salt, even if I enjoy and admire the works

I used to think field experiments would show little consistency in results across place. That external validity concerns would be fatal.

In fact the results across different samples and places have proven surprisingly similar across places, and added a lot to general theory

Last, I've come to believe there is no such thing as a useful instrumental variable. The ones that actually meet the exclusion restriction

are so weird & particular that the local treatment effect is likely far different from the average treatment effect in non-transparent ways.

Most of the other IVs don't plausibly meet the exclusion restriction. I mean, we should be concerned when the IV estimate is always 10x

larger than the OLS coefficient. Thus I find myself much more persuaded by simple natural experiments that use OLS, diff in diff, or

discontinuities, alongside randomized trials.

What do others think are the cliffs in economics?

PS All of these apply to political science too. Though I have a special extra target in poli sci: survey experiments! A few are good. I like

Dan Corstange's work. But it feels like 60% of dissertations these days are experiments buried in a survey instrument that measure small

changes in response. These at least have large N. But these are just uncontrolled labs, with negligible external validity in my mind.

The good ones are good. This method has its uses. But it's being way over-applied. More people have to make big and risky investments in big

natural and field experiments. Time to raise expectations and ambitions. This expectation bar, not technical ability, is the big advantage

economists have over political scientists when they compete in the same space.

(Ok. So are there any friends and colleagues I haven't insulted this morning? Let me know and I'll try my best to fix it with a screed)

HOW MUCH SHOULD WE TRUST DIFFERENCES-IN-DIFFERENCES ESTIMATES?: https://economics.mit.edu/files/750

Most papers that employ Differences-in-Differences estimation (DD) use many years of data and focus on serially correlated outcomes but ignore that the resulting standard errors are inconsistent. To illustrate the severity of this issue, we randomly generate placebo laws in state-level data on female wages from the Current Population Survey. For each law, we use OLS to compute the DD estimate of its “effect” as well as the standard error of this estimate. These conventional DD standard errors severely understate the standard deviation of the estimators: we find an “effect” significant at the 5 percent level for up to 45 percent of the placebo interventions. We use Monte Carlo simulations to investigate how well existing methods help solve this problem. Econometric corrections that place a specific parametric form on the time-series process do not perform well. Bootstrap (taking into account the auto-correlation of the data) works well when the number of states is large enough. Two corrections based on asymptotic approximation of the variance-covariance matrix work well for moderate numbers of states and one correction that collapses the time series information into a “pre” and “post” period and explicitly takes into account the effective sample size works well even for small numbers of states.
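
A hedged Monte Carlo sketch of that placebo exercise (a toy version added here, not the paper's code; the 50 states, 20 years, AR(1) errors with rho = 0.8, and conventional non-clustered standard errors are all illustrative choices) shows the same mechanism: with serially correlated outcomes, the nominal 5% test on a placebo law rejects far too often.

```python
import numpy as np

rng = np.random.default_rng(0)
S, T, rho, n_sims = 50, 20, 0.8, 500

def one_placebo_t_stat():
    # AR(1) errors within each state
    y = np.zeros((S, T))
    eps = rng.normal(size=(S, T))
    y[:, 0] = eps[:, 0]
    for t in range(1, T):
        y[:, t] = rho * y[:, t - 1] + eps[:, t]
    # placebo law: a random half of states "treated" from a middle year onward
    treated = rng.permutation(S) < S // 2
    D = np.outer(treated, np.arange(T) >= T // 2).astype(float)
    # two-way (state and year) within transformation, then OLS of y on D
    def demean(a):
        return a - a.mean(1, keepdims=True) - a.mean(0, keepdims=True) + a.mean()
    yd, Dd = demean(y).ravel(), demean(D).ravel()
    beta = yd @ Dd / (Dd @ Dd)
    resid = yd - beta * Dd
    dof = S * T - S - T                               # approximate residual dof
    se = np.sqrt(resid @ resid / dof / (Dd @ Dd))     # conventional, non-clustered SE
    return beta / se

rej = np.mean([abs(one_placebo_t_stat()) > 1.96 for _ in range(n_sims)])
print(f"placebo rejection rate at nominal 5%: {rej:.2f}")
```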

‘METRICS MONDAY: 2SLS–CHRONICLE OF A DEATH FORETOLD: http://marcfbellemare.com/wordpress/12733

As it turns out, Young finds that

1. Conventional tests tend to overreject the null hypothesis that the 2SLS coefficient is equal to zero.

2. 2SLS estimates are falsely declared significant one third to one half of the time, depending on the method used for bootstrapping.

3. The 99-percent confidence intervals (CIs) of those 2SLS estimates include the OLS point estimate over 90 percent of the time. They include the full OLS 99-percent CI over 75 percent of the time.

4. 2SLS estimates are extremely sensitive to outliers. Removing simply one outlying cluster or observation, almost half of 2SLS results become insignificant. Things get worse when removing two outlying clusters or observations, as over 60 percent of 2SLS results then become insignificant.

5. Using a Durbin-Wu-Hausman test, less than 15 percent of regressions can reject the null that OLS estimates are unbiased at the 1-percent level.

6. 2SLS has considerably higher mean squared error than OLS.

7. In one third to one half of published results, the null that the IVs are totally irrelevant cannot be rejected, and so the correlation between the endogenous variable(s) and the IVs is due to finite sample correlation between them.

8. Finally, fewer than 10 percent of 2SLS estimates reject instrument irrelevance and the absence of OLS bias at the 1-percent level using a Durbin-Wu-Hausman test. It gets much worse–fewer than 5 percent–if you add in the requirement that the 2SLS CI exclude the OLS estimate.

Methods Matter: P-Hacking and Causal Inference in Economics: http://ftp.iza.org/dp11796.pdf

Applying multiple methods to 13,440 hypothesis tests reported in 25 top economics journals in 2015, we show that selective publication and p-hacking is a substantial problem in research employing DID and (in particular) IV. RCT and RDD are much less problematic. Almost 25% of claims of marginally significant results in IV papers are misleading.

https://twitter.com/NoamJStein/status/1040887307568664577

Ever since I learned social science is completely fake, I've had a lot more time to do stuff that matters, like deadlifting and reading about Mediterranean haplogroups

--

Wait, so, from fakest to realest IV>DD>RCT>RDD? That totally matches my impression.

https://twitter.com/wwwojtekk/status/1190731344336293889

https://archive.is/EZu0h

Great (not completely new but still good to have it in one place) discussion of RCTs and inference in economics by Deaton, my favorite sentences (more general than just about RCT) below

Randomization in the tropics revisited: a theme and eleven variations: https://scholar.princeton.edu/sites/default/files/deaton/files/deaton_randomization_revisited_v3_2019.pdf

org:junk
org:edu
economics
econometrics
methodology
realness
truth
science
social-science
accuracy
generalization
essay
article
hmm
multi
study
🎩
empirical
causation
error
critique
sociology
criminology
hypothesis-testing
econotariat
broad-econ
cliometrics
endo-exo
replication
incentives
academia
measurement
wire-guided
intricacy
twitter
social
discussion
pseudoE
effect-size
reflection
field-study
stat-power
piketty
marginal-rev
commentary
data-science
expert-experience
regression
gotchas
rant
map-territory
pdf
simulation
moments
confidence
bias-variance
stats
endogenous-exogenous
control
meta:science
meta-analysis
outliers
summary
sampling
ensembles
monte-carlo
theory-practice
applicability-prereqs
chart
comparison
shift
ratty
unaffiliated
garett-jones

june 2017 by nhaliday

Equivalence between counting and sampling

february 2017 by nhaliday

also: every counting problem either has FPTRAS or no approx. w/i polynomial factor

pdf
exposition
lecture-notes
berkeley
nibble
tcs
counting
sampling
characterization
complexity
approximation
rand-approx
proofs
february 2017 by nhaliday

cc.complexity theory - The complexity of sampling (approximately) the Fourier transform of a Boolean function - Theoretical Computer Science Stack Exchange

q-n-a overflow tcs complexity quantum quantum-info sampling fourier boolean-analysis research aaronson tcstariat mathtariat hierarchy rand-complexity open-problems counting nibble questions

february 2017 by nhaliday

bayesian - How would you explain Markov Chain Monte Carlo (MCMC) to a layperson? - Cross Validated

q-n-a overflow stats acm markov monte-carlo explanation concept big-picture sampling nibble summary init synthesis limits convergence equilibrium mixing motivation applications

january 2017 by nhaliday

gt.geometric topology - Intuitive crutches for higher dimensional thinking - MathOverflow

december 2016 by nhaliday

Terry Tao:

I can't help you much with high-dimensional topology - it's not my field, and I've not picked up the various tricks topologists use to get a grip on the subject - but when dealing with the geometry of high-dimensional (or infinite-dimensional) vector spaces such as R^n, there are plenty of ways to conceptualise these spaces that do not require visualising more than three dimensions directly.

For instance, one can view a high-dimensional vector space as a state space for a system with many degrees of freedom. A megapixel image, for instance, is a point in a million-dimensional vector space; by varying the image, one can explore the space, and various subsets of this space correspond to various classes of images.

One can similarly interpret sound waves, a box of gases, an ecosystem, a voting population, a stream of digital data, trials of random variables, the results of a statistical survey, a probabilistic strategy in a two-player game, and many other concrete objects as states in a high-dimensional vector space, and various basic concepts such as convexity, distance, linearity, change of variables, orthogonality, or inner product can have very natural meanings in some of these models (though not in all).

It can take a bit of both theory and practice to merge one's intuition for these things with one's spatial intuition for vectors and vector spaces, but it can be done eventually (much as after one has enough exposure to measure theory, one can start merging one's intuition regarding cardinality, mass, length, volume, probability, cost, charge, and any number of other "real-life" measures).

For instance, the fact that most of the mass of a unit ball in high dimensions lurks near the boundary of the ball can be interpreted as a manifestation of the law of large numbers, using the interpretation of a high-dimensional vector space as the state space for a large number of trials of a random variable.

More generally, many facts about low-dimensional projections or slices of high-dimensional objects can be viewed from a probabilistic, statistical, or signal processing perspective.
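
A quick numerical check of the unit-ball fact above (an added illustration, not part of the answer; sample sizes and the 0.9 cutoff are arbitrary choices): draw uniform points from the n-ball and see how much of the mass lies within 10% of the boundary.

```python
import numpy as np

# Fraction of a uniform sample from the n-dimensional unit ball with radius
# above 0.9; the exact value is 1 - 0.9**n, which rushes toward 1 as n grows.

rng = np.random.default_rng(0)
for n in (2, 10, 100):
    x = rng.normal(size=(100_000, n))
    x /= np.linalg.norm(x, axis=1, keepdims=True)            # uniform direction
    pts = x * rng.uniform(size=(100_000, 1)) ** (1.0 / n)    # uniform in the ball
    frac = np.mean(np.linalg.norm(pts, axis=1) > 0.9)
    print(f"n = {n:3d}: share of mass within 10% of the boundary ≈ {frac:.3f}")
```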

Scott Aaronson:

Here are some of the crutches I've relied on. (Admittedly, my crutches are probably much more useful for theoretical computer science, combinatorics, and probability than they are for geometry, topology, or physics. On a related note, I personally have a much easier time thinking about R^n than about, say, R^4 or R^5!)

1. If you're trying to visualize some 4D phenomenon P, first think of a related 3D phenomenon P', and then imagine yourself as a 2D being who's trying to visualize P'. The advantage is that, unlike with the 4D vs. 3D case, you yourself can easily switch between the 3D and 2D perspectives, and can therefore get a sense of exactly what information is being lost when you drop a dimension. (You could call this the "Flatland trick," after the most famous literary work to rely on it.)

2. As someone else mentioned, discretize! Instead of thinking about R^n, think about the Boolean hypercube {0,1}^n, which is finite and usually easier to get intuition about. (When working on problems, I often find myself drawing {0,1}^4 on a sheet of paper by drawing two copies of {0,1}^3 and then connecting the corresponding vertices.)

3. Instead of thinking about a subset S⊆R^n, think about its characteristic function f:R^n→{0,1}. I don't know why that trivial perspective switch makes such a big difference, but it does ... maybe because it shifts your attention to the process of computing f, and makes you forget about the hopeless task of visualizing S!

4. One of the central facts about R^n is that, while it has "room" for only n orthogonal vectors, it has room for exp(n) almost-orthogonal vectors. Internalize that one fact, and so many other properties of R^n (for example, that the n-sphere resembles a "ball with spikes sticking out," as someone mentioned before) will suddenly seem non-mysterious. In turn, one way to internalize the fact that R^n has so many almost-orthogonal vectors is to internalize Shannon's theorem that there exist good error-correcting codes.

5. To get a feel for some high-dimensional object, ask questions about the behavior of a process that takes place on that object. For example: if I drop a ball here, which local minimum will it settle into? How long does this random walk on {0,1}^n take to mix?

Gil Kalai:

This is a slightly different point, but Vitali Milman, who works in high-dimensional convexity, likes to draw high-dimensional convex bodies in a non-convex way. This is to convey the point that if you take the convex hull of a few points on the unit sphere of R^n, then for large n very little of the measure of the convex body is anywhere near the corners, so in a certain sense the body is a bit like a small sphere with long thin "spikes".
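
A small check of Aaronson's point 4 (an added illustration, not from the thread; the dimension 200 and vector count 2000 are arbitrary): independent random unit vectors in R^n have pairwise inner products of size about 1/sqrt(n), so thousands of them are all nearly orthogonal, which is also why the convex hull in Kalai's picture has so little mass near its spiky corners.

```python
import numpy as np

# Pairwise inner products of many random unit vectors in a high dimension.

rng = np.random.default_rng(0)
n, k = 200, 2000
V = rng.normal(size=(k, n))
V /= np.linalg.norm(V, axis=1, keepdims=True)      # k random unit vectors
G = V @ V.T                                        # Gram matrix of inner products
off_diag = np.abs(G[~np.eye(k, dtype=bool)])
print("max |<v_i, v_j>|:", off_diag.max())         # stays well below 1
print("typical size ~ 1/sqrt(n) =", 1 / np.sqrt(n))
```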

q-n-a
intuition
math
visual-understanding
list
discussion
thurston
tidbits
aaronson
tcs
geometry
problem-solving
yoga
👳
big-list
metabuch
tcstariat
gowers
mathtariat
acm
overflow
soft-question
levers
dimensionality
hi-order-bits
insight
synthesis
thinking
models
cartoons
coding-theory
information-theory
probability
concentration-of-measure
magnitude
linear-algebra
boolean-analysis
analogy
arrows
lifts-projections
measure
markov
sampling
shannon
conceptual-vocab
nibble
degrees-of-freedom
worrydream
neurons
retrofit
oscillation
paradox
novelty
tricki
concrete
high-dimension
s:***
manifolds
direction
curvature
convexity-curvature
elegance
guessing

december 2016 by nhaliday

Shtetl-Optimized » Blog Archive » Avi Wigderson’s “Permanent” Impact on Me

algebraic-complexity complexity aaronson talks tcs reflection synthesis tcstariat big-picture wigderson valiant sampling quantum quantum-info hierarchy counting nibble org:bleg big-surf essay p:whenever

october 2016 by nhaliday

Princeton University CS Dept COS521: Advanced Algorithm Design Fall 2015

october 2016 by nhaliday

good exposition of curse of dimensionality
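
One standard illustration of the curse (a toy demo added here, not from the course notes): as the dimension grows, the nearest and farthest neighbors of a random query point become nearly the same distance away, which breaks low-dimensional geometric intuition and many indexing tricks.

```python
import numpy as np

# Distance concentration for uniform random points in growing dimensions.

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    X = rng.uniform(size=(500, d))
    q = rng.uniform(size=d)
    dists = np.linalg.norm(X - q, axis=1)
    print(f"d={d:4d}: nearest/farthest distance ratio = {dists.min() / dists.max():.3f}")
```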

princeton
course
algorithms
yoga
👳
tcs
metabuch
lecture-notes
toolkit
dimensionality
sanjeev-arora
unit
concentration-of-measure
hashing
linearity
linear-programming
online-learning
gradient-descent
markov
SDP
approximation
duality
coding-theory
crypto
rigorous-crypto
huge-data-the-biggest
heuristic
counting
sampling
game-theory
decision-theory
high-dimension
p:***
matrix-factorization
quixotic
advanced
october 2016 by nhaliday

The Power of Noise - Less Wrong

rationality complexity essay reflection synthesis philosophy lesswrong insight len:long spock 🤖 ratty 2014 clever-rats acmtariat rand-approx big-picture rand-complexity markov monte-carlo tcs rhetoric random clarity sampling unit nibble hi-order-bits p:whenever s:** grokkability-clarity

august 2016 by nhaliday

rationality complexity essay reflection synthesis philosophy lesswrong insight len:long spock 🤖 ratty 2014 clever-rats acmtariat rand-approx big-picture rand-complexity markov monte-carlo tcs rhetoric random clarity sampling unit nibble hi-order-bits p:whenever s:** grokkability-clarity

august 2016 by nhaliday

CS294 MARKOV CHAIN MONTE CARLO: FOUNDATIONS & APPLICATIONS, FALL 2009

course berkeley tcs expert yoga 👳 lecture-notes topics markov monte-carlo sampling ergodic unit mixing counting approximation math.FA phase-transition stat-mech spectral graphs graph-theory random ising p:someday expert-experience quixotic advanced

august 2016 by nhaliday

course berkeley tcs expert yoga 👳 lecture-notes topics markov monte-carlo sampling ergodic unit mixing counting approximation math.FA phase-transition stat-mech spectral graphs graph-theory random ising p:someday expert-experience quixotic advanced

august 2016 by nhaliday

The Modern Algorithmic Toolbox (CS168), Spring 2015-2016

course tcs yoga stanford algorithms synthesis 👳 mihai lecture-notes tim-roughgarden valiant unit hashing sublinear dimensionality embeddings norms gradient-descent toolkit metabuch regularization linear-algebra spectral sampling concentration-of-measure markov monte-carlo fourier sparsity linear-programming optimization expanders compressed-sensing high-dimension p:*** curvature matrix-factorization convexity-curvature quixotic elegance advanced generalization exploratory differential-privacy

june 2016 by nhaliday

course tcs yoga stanford algorithms synthesis 👳 mihai lecture-notes tim-roughgarden valiant unit hashing sublinear dimensionality embeddings norms gradient-descent toolkit metabuch regularization linear-algebra spectral sampling concentration-of-measure markov monte-carlo fourier sparsity linear-programming optimization expanders compressed-sensing high-dimension p:*** curvature matrix-factorization convexity-curvature quixotic elegance advanced generalization exploratory differential-privacy

june 2016 by nhaliday
