nhaliday + acm + probability   80

Prisoner's dilemma - Wikipedia
caveat to result below:
An extension of the IPD is an evolutionary stochastic IPD, in which the relative abundance of particular strategies is allowed to change, with more successful strategies relatively increasing. This process may be accomplished by having less successful players imitate the more successful strategies, or by eliminating less successful players from the game, while multiplying the more successful ones. It has been shown that unfair ZD strategies are not evolutionarily stable. The key intuition is that an evolutionarily stable strategy must not only be able to invade another population (which extortionary ZD strategies can do) but must also perform well against other players of the same type (which extortionary ZD players do poorly, because they reduce each other's surplus).[14]

Theory and simulations confirm that beyond a critical population size, ZD extortion loses out in evolutionary competition against more cooperative strategies, and as a result, the average payoff in the population increases when the population is bigger. In addition, there are some cases in which extortioners may even catalyze cooperation by helping to break out of a face-off between uniform defectors and win–stay, lose–switch agents.[8]
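
to make "extortionate ZD" concrete: with the conventional payoffs (T, R, P, S) = (5, 3, 1, 0), taking χ = 3 and φ = 1/26 in the Press and Dyson parametrization p~ = φ[(S_X - P·1) - χ(S_Y - P·1)] gives the memory-one strategy p = (11/13, 1/2, 7/26, 0), which pins the payoffs to s_X - P = 3(s_Y - P) against any opponent. a small numpy sketch (the opponent strategies below are arbitrary illustrative choices) that checks the relation by computing long-run payoffs from the stationary distribution of the induced Markov chain:

```python
import numpy as np

# States ordered (CC, CD, DC, DD) from player X's perspective (first letter = X's
# previous move). Conventional payoffs T, R, P, S = 5, 3, 1, 0.
T, R, P, S = 5.0, 3.0, 1.0, 0.0
payoff_X = np.array([R, S, T, P])
payoff_Y = np.array([R, T, S, P])

def long_run_payoffs(p, q):
    """Average payoffs when X plays memory-one strategy p and Y plays q
    (each given from that player's own perspective)."""
    qx = np.array([q[0], q[2], q[1], q[3]])   # re-index q into X's state ordering
    M = np.zeros((4, 4))
    for i in range(4):
        M[i] = [p[i] * qx[i], p[i] * (1 - qx[i]),
                (1 - p[i]) * qx[i], (1 - p[i]) * (1 - qx[i])]
    w, v = np.linalg.eig(M.T)                 # stationary distribution = left eigenvector
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    pi /= pi.sum()
    return pi @ payoff_X, pi @ payoff_Y

# Extortionate ZD strategy with chi = 3, phi = 1/26:
p_extort = [11/13, 1/2, 7/26, 0.0]

for q in ([1, 1, 1, 1], [0.7, 0.2, 0.9, 0.4], [0.9, 0.1, 0.8, 0.3]):
    sx, sy = long_run_payoffs(p_extort, q)
    print(f"opponent {q}: s_X - P = {sx - P:.4f}, 3*(s_Y - P) = {3 * (sy - P):.4f}")
```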

https://alfanl.com/2018/04/12/defection/
Nature boils down to a few simple concepts.

Haters will point out that I oversimplify. The haters are wrong. I am good at saying a lot with few words. Nature indeed boils down to a few simple concepts.

In life, you can either cooperate or defect.

Used to be that defection was the dominant strategy, say in the time when the Roman empire started to crumble. Everybody complained about everybody and in the end nothing got done. Then came Jesus, who told people to be loving and cooperative, and boom: 1800 years later we get the industrial revolution.

Because of Jesus we now find ourselves in a situation where cooperation is the dominant strategy. A normie engages in a ton of cooperation: with the tax collector who wants more and more of his money, with schools who want more and more of his kid’s time, with media who wants him to repeat more and more party lines, with the Zeitgeist of the Collective Spirit of the People’s Progress Towards a New Utopia. Essentially, our normie is cooperating himself into a crumbling Western empire.

Turns out that if everyone blindly cooperates, parasites sprout up like weeds until defection once again becomes the standard.

The point of a post-Christian religion is to once again create conditions for the kind of cooperation that led to the industrial revolution. This necessitates throwing out undead Christianity: you do not blindly cooperate. You cooperate with people that cooperate with you, you defect on people that defect on you. Christianity mixed with Darwinism. God and Gnon meet.

This also means we re-establish spiritual hierarchy, which, like regular hierarchy, is a prerequisite for cooperation. It is this hierarchical cooperation that turns a household into a force to be reckoned with, that allows a group of men to unite as a front against their enemies, that allows a tribe to conquer the world. Remember: Scientology bullied the Cathedral’s tax department into submission.

With a functioning hierarchy, men still gossip, lie and scheme, but they will do so in whispers behind closed doors. In your face they cooperate and contribute to the group's wellbeing, because incentives are such that contributing to group wellbeing heightens status.

Without a functioning hierarchy, men gossip, lie and scheme, but they do so in your face, and they tell you that you are positively deluded for accusing them of gossiping, lying and scheming. Seeds will not sprout in such ground.

Spiritual dominance is established in the same way any sort of dominance is established: fought for, taken. But the fight is ritualistic. You can’t force spiritual dominance if no one listens, or if you are silenced the ritual is not allowed to happen.

If one of our priests is forbidden from establishing spiritual dominance, that is a sure sign an enemy priest is in better control and has a vested interest in preventing you from establishing spiritual dominance.

They defect on you, you defect on them. Let them suffer the consequences of enemy priesthood, among others characterized by the annoying tendency that very little is said with very many words.

https://contingentnotarbitrary.com/2018/04/14/rederiving-christianity/
To recap, we started with a secular definition of Logos and noted that its telos is existence. Given human nature, game theory and the power of cooperation, the highest expression of that telos is freely chosen universal love, tempered by constant vigilance against defection while maintaining compassion for the defectors and forgiving those who repent. In addition, we must know the telos in order to fulfill it.

In Christian terms, looks like we got over half of the Ten Commandments (know Logos for the First, don’t defect or tempt yourself to defect for the rest), the importance of free will, the indestructibility of evil (group cooperation vs individual defection), loving the sinner and hating the sin (with defection as the sin), forgiveness (with conditions), and love and compassion toward all, assuming only secular knowledge and that it’s good to exist.

Iterated Prisoner's Dilemma is an Ultimatum Game: http://infoproc.blogspot.com/2012/07/iterated-prisoners-dilemma-is-ultimatum.html
The history of IPD shows that bounded cognition prevented the dominant strategies from being discovered for over 60 years, despite significant attention from game theorists, computer scientists, economists, evolutionary biologists, etc. Press and Dyson have shown that IPD is effectively an ultimatum game, which is very different from the Tit for Tat stories told by generations of people who worked on IPD (Axelrod, Dawkins, etc., etc.).

...

For evolutionary biologists: Dyson clearly thinks this result has implications for multilevel (group vs individual selection):
... Cooperation loses and defection wins. The ZD strategies confirm this conclusion and make it sharper. ... The system evolved to give cooperative tribes an advantage over non-cooperative tribes, using punishment to give cooperation an evolutionary advantage within the tribe. This double selection of tribes and individuals goes way beyond the Prisoners' Dilemma model.

implications for fractionalized Europe vis-a-vis unified China?

and more broadly does this just imply we're doomed in the long run RE: cooperation, morality, the "good society", so on...? war and group-selection is the only way to get a non-crab bucket civilization?

Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent:
http://www.pnas.org/content/109/26/10409.full
http://www.pnas.org/content/109/26/10409.full.pdf
https://www.edge.org/conversation/william_h_press-freeman_dyson-on-iterated-prisoners-dilemma-contains-strategies-that

https://en.wikipedia.org/wiki/Ultimatum_game

analogy for ultimatum game: the state gives the demos a take-it-or-leave-it bargain, and... if the demos refuses... violence?

The nature of human altruism: http://sci-hub.tw/https://www.nature.com/articles/nature02043
- Ernst Fehr & Urs Fischbacher

Some of the most fundamental questions concerning our evolutionary origins, our social relations, and the organization of society are centred around issues of altruism and selfishness. Experimental evidence indicates that human altruism is a powerful force and is unique in the animal world. However, there is much individual heterogeneity and the interaction between altruists and selfish individuals is vital to human cooperation. Depending on the environment, a minority of altruists can force a majority of selfish individuals to cooperate or, conversely, a few egoists can induce a large number of altruists to defect. Current gene-based evolutionary theories cannot explain important patterns of human altruism, pointing towards the importance of both theories of cultural evolution as well as gene–culture co-evolution.

...

Why are humans so unusual among animals in this respect? We propose that quantitatively, and probably even qualitatively, unique patterns of human altruism provide the answer to this question. Human altruism goes far beyond that which has been observed in the animal world. Among animals, fitness-reducing acts that confer fitness benefits on other individuals are largely restricted to kin groups; despite several decades of research, evidence for reciprocal altruism in pair-wise repeated encounters4,5 remains scarce6–8. Likewise, there is little evidence so far that individual reputation building affects cooperation in animals, which contrasts strongly with what we find in humans. If we randomly pick two human strangers from a modern society and give them the chance to engage in repeated anonymous exchanges in a laboratory experiment, there is a high probability that reciprocally altruistic behaviour will emerge spontaneously9,10.

However, human altruism extends far beyond reciprocal altruism and reputation-based cooperation, taking the form of strong reciprocity11,12. Strong reciprocity is a combination of altruistic rewarding, which is a predisposition to reward others for cooperative, norm-abiding behaviours, and altruistic punishment, which is a propensity to impose sanctions on others for norm violations. Strong reciprocators bear the cost of rewarding or punishing even if they gain no individual economic benefit whatsoever from their acts. In contrast, reciprocal altruists, as they have been defined in the biological literature4,5, reward and punish only if this is in their long-term self-interest. Strong reciprocity thus constitutes a powerful incentive for cooperation even in non-repeated interactions and when reputation gains are absent, because strong reciprocators will reward those who cooperate and punish those who defect.

...

We will show that the interaction between selfish and strongly reciprocal … [more]
concept  conceptual-vocab  wiki  reference  article  models  GT-101  game-theory  anthropology  cultural-dynamics  trust  cooperate-defect  coordination  iteration-recursion  sequential  axelrod  discrete  smoothness  evolution  evopsych  EGT  economics  behavioral-econ  sociology  new-religion  deep-materialism  volo-avolo  characterization  hsu  scitariat  altruism  justice  group-selection  decision-making  tribalism  organizing  hari-seldon  theory-practice  applicability-prereqs  bio  finiteness  multi  history  science  social-science  decision-theory  commentary  study  summary  giants  the-trenches  zero-positive-sum  🔬  bounded-cognition  info-dynamics  org:edge  explanation  exposition  org:nat  eden  retention  long-short-run  darwinian  markov  equilibrium  linear-algebra  nitty-gritty  competition  war  explanans  n-factor  europe  the-great-west-whale  occident  china  asia  sinosphere  orient  decentralized  markets  market-failure  cohesion  metabuch  stylized-facts  interdisciplinary  physics  pdf  pessimism  time  insight  the-basilisk  noblesse-oblige  the-watchers  ideas  l 
march 2018 by nhaliday
Stein's example - Wikipedia
Stein's example (or phenomenon or paradox), in decision theory and estimation theory, is the phenomenon that when three or more parameters are estimated simultaneously, there exist combined estimators more accurate on average (that is, having lower expected mean squared error) than any method that handles the parameters separately. It is named after Charles Stein of Stanford University, who discovered the phenomenon in 1955.[1]

An intuitive explanation is that optimizing for the mean-squared error of a combined estimator is not the same as optimizing for the errors of separate estimators of the individual parameters. In practical terms, if the combined error is in fact of interest, then a combined estimator should be used, even if the underlying parameters are independent; this occurs in channel estimation in telecommunications, for instance (different factors affect overall channel performance). On the other hand, if one is instead interested in estimating an individual parameter, then using a combined estimator does not help and is in fact worse.

...

Many simple, practical estimators achieve better performance than the ordinary estimator. The best-known example is the James–Stein estimator, which works by starting at X and moving towards a particular point (such as the origin) by an amount inversely proportional to the distance of X from that point.
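
a quick Monte Carlo illustration (sketch, assuming numpy; dimension and true mean vector are arbitrary): for X ~ N(θ, I_p) with p >= 3, the James-Stein estimator (1 - (p-2)/||X||^2) X beats the ordinary estimator X in total mean squared error, whatever θ is:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n_trials = 10, 100_000
theta = rng.normal(size=p)                       # arbitrary true mean vector

X = theta + rng.normal(size=(n_trials, p))       # one observation per trial, unit variance
norm2 = np.sum(X**2, axis=1, keepdims=True)
js = (1 - (p - 2) / norm2) * X                   # James-Stein shrinkage toward the origin

mse_mle = np.mean(np.sum((X - theta)**2, axis=1))    # ordinary estimator: X itself
mse_js  = np.mean(np.sum((js - theta)**2, axis=1))
print(f"total MSE, ordinary:    {mse_mle:.3f}  (theory: {p})")
print(f"total MSE, James-Stein: {mse_js:.3f}")       # strictly smaller for p >= 3
```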
nibble  concept  levers  wiki  reference  acm  stats  probability  decision-theory  estimate  distribution  atoms 
february 2018 by nhaliday
multivariate analysis - Is it possible to have a pair of Gaussian random variables for which the joint distribution is not Gaussian? - Cross Validated
The bivariate normal distribution is the exception, not the rule!

It is important to recognize that "almost all" joint distributions with normal marginals are not the bivariate normal distribution. That is, the common viewpoint that joint distributions with normal marginals that are not the bivariate normal are somehow "pathological", is a bit misguided.

Certainly, the multivariate normal is extremely important due to its stability under linear transformations, and so receives the bulk of attention in applications.

note: there is a multivariate central limit theorem, so such applications have no problem
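
the standard counterexample behind the quoted point is easy to simulate (sketch, assuming numpy): flip the sign of a standard Gaussian with an independent fair coin. both marginals are exactly N(0,1), but the pair cannot be bivariate normal, since X + Y has an atom at 0:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
w = rng.choice([-1.0, 1.0], size=n)    # independent random sign
y = w * x                              # y is also exactly N(0,1) by symmetry

# Both marginals are standard normal...
print("std of x, y:", x.std().round(3), y.std().round(3))
# ...but the pair is not bivariate normal: x + y would then have to be normal,
# yet it equals 0 whenever w = -1, i.e. with probability 1/2.
print("P(x + y == 0) ~", np.mean(x + y == 0.0))
```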
nibble  q-n-a  overflow  stats  math  acm  probability  distribution  gotchas  intricacy  characterization  structure  composition-decomposition  counterexample  limits  concentration-of-measure 
october 2017 by nhaliday
Section 10 Chi-squared goodness-of-fit test.
- pf that chi-squared statistic for Pearson's test (multinomial goodness-of-fit) actually has chi-squared distribution asymptotically
- the gotcha: terms Z_j in sum aren't independent
- solution:
- compute the covariance matrix of the terms to be E[Z_iZ_j] = -sqrt(p_ip_j)
- note that an equivalent way of sampling the Z_j is to take a random standard Gaussian and project onto the plane orthogonal to (sqrt(p_1), sqrt(p_2), ..., sqrt(p_r))
- that is equivalent to just sampling a Gaussian w/ 1 less dimension (hence df=r-1)
QED
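
quick simulation of the statement (sketch, assuming numpy/scipy; the cell probabilities are illustrative): the Pearson statistic over r = 4 cells matches the chi-squared distribution with r - 1 = 3 degrees of freedom:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
p = np.array([0.2, 0.3, 0.1, 0.4])      # multinomial cell probabilities, r = 4
n, trials = 500, 20_000

counts = rng.multinomial(n, p, size=trials)
chi2 = np.sum((counts - n * p)**2 / (n * p), axis=1)   # Pearson statistic

# Compare to chi-squared with r - 1 = 3 degrees of freedom
qs = [0.5, 0.9, 0.95, 0.99]
print("empirical quantiles:", np.quantile(chi2, qs).round(3))
print("chi2(3) quantiles:  ", stats.chi2(df=3).ppf(qs).round(3))
```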
pdf  nibble  lecture-notes  mit  stats  hypothesis-testing  acm  probability  methodology  proofs  iidness  distribution  limits  identity  direction  lifts-projections 
october 2017 by nhaliday
Lecture 14: When's that meteor arriving
- Meteors as a random process
- Limiting approximations
- Derivation of the Exponential distribution
- Derivation of the Poisson distribution
- A "Poisson process"
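
quick simulation tying the bullets together (sketch, assuming numpy/scipy; rate and horizon are illustrative): generate arrivals with iid exponential gaps and check that counts in unit-length windows are Poisson(rate):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
rate, horizon = 3.0, 10_000.0            # events per unit time, total observation time

gaps = rng.exponential(1.0 / rate, size=int(rate * horizon * 2))
arrivals = np.cumsum(gaps)
arrivals = arrivals[arrivals < horizon]

# Count events in each unit-length window: should be ~ Poisson(rate)
counts = np.bincount(arrivals.astype(int), minlength=int(horizon))
ks = np.arange(0, 10)
print("empirical P(N=k):", (np.bincount(counts, minlength=10)[:10] / horizon).round(4))
print("Poisson(3) pmf:   ", stats.poisson(rate).pmf(ks).round(4))
```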
nibble  org:junk  org:edu  exposition  lecture-notes  physics  mechanics  space  earth  probability  stats  distribution  stochastic-processes  closure  additive  limits  approximation  tidbits  acm  binomial  multiplicative 
september 2017 by nhaliday
Kelly criterion - Wikipedia
In probability theory and intertemporal portfolio choice, the Kelly criterion, Kelly strategy, Kelly formula, or Kelly bet, is a formula used to determine the optimal size of a series of bets. In most gambling scenarios, and some investing scenarios under some simplifying assumptions, the Kelly strategy will do better than any essentially different strategy in the long run (that is, over a span of time in which the observed fraction of bets that are successful equals the probability that any given bet will be successful). It was described by J. L. Kelly, Jr, a researcher at Bell Labs, in 1956.[1] The practical use of the formula has been demonstrated.[2][3][4]

The Kelly Criterion is to bet a predetermined fraction of assets and can be counterintuitive. In one study,[5][6] each participant was given $25 and asked to bet on a coin that would land heads 60% of the time. Participants had 30 minutes to play, so could place about 300 bets, and the prizes were capped at $250. Behavior was far from optimal. "Remarkably, 28% of the participants went bust, and the average payout was just $91. Only 21% of the participants reached the maximum. 18 of the 61 participants bet everything on one toss, while two-thirds gambled on tails at some stage in the experiment." Using the Kelly criterion and based on the odds in the experiment, the right approach would be to bet 20% of the pot on each throw (see first example in Statement below). If losing, the size of the bet gets cut; if winning, the stake increases.
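
a small sketch of the experiment described above (assuming numpy; the $250 cap is crudely handled by clipping wealth each round, which only approximates the real payout cap): the even-money Kelly fraction is f* = p - q = 0.2, matching the 20% figure, and betting much less or much more does visibly worse:

```python
import numpy as np

# Even-money bet (b = 1) that wins with probability p: Kelly fraction f* = (b*p - q)/b
p, b = 0.60, 1.0
f_star = (b * p - (1 - p)) / b
print("Kelly fraction:", f_star)              # 0.2, i.e. "20% of the pot"

rng = np.random.default_rng(0)

def final_wealth(f, n_bets=300, start=25.0, cap=250.0, trials=20_000):
    """Wealth after n_bets, betting fraction f each round (cap applied as a simple clip)."""
    w = np.full(trials, start)
    for _ in range(n_bets):
        win = rng.random(trials) < p
        w = np.minimum(cap, w * np.where(win, 1 + f * b, 1 - f))
    return w

for f in (0.05, 0.2, 0.5, 1.0):
    w = final_wealth(f)
    print(f"f = {f:.2f}: median final wealth {np.median(w):8.2f}, "
          f"P(hit cap) = {np.mean(w >= 250.0):.2f}")
```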
nibble  betting  investing  ORFE  acm  checklists  levers  probability  algorithms  wiki  reference  atoms  extrema  parsimony  tidbits  decision-theory  decision-making  street-fighting  mental-math  calculation 
august 2017 by nhaliday
Stat 260/CS 294: Bayesian Modeling and Inference
Topics
- Priors (conjugate, noninformative, reference)
- Hierarchical models, spatial models, longitudinal models, dynamic models, survival models
- Testing
- Model choice
- Inference (importance sampling, MCMC, sequential Monte Carlo)
- Nonparametric models (Dirichlet processes, Gaussian processes, neutral-to-the-right processes, completely random measures)
- Decision theory and frequentist perspectives (complete class theorems, consistency, empirical Bayes)
- Experimental design
unit  course  berkeley  expert  michael-jordan  machine-learning  acm  bayesian  probability  stats  lecture-notes  priors-posteriors  markov  monte-carlo  frequentist  latent-variables  decision-theory  expert-experience  confidence  sampling 
july 2017 by nhaliday
More on Multivariate Gaussians
Fact #1: mean and covariance uniquely determine distribution
Fact #3: closure under sum, marginalizing, and conditioning
covariance of conditional distribution is given by a Schur complement (independent of x_B. is that obvious?)
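
on the parenthetical: one way to see that the conditional covariance cannot depend on the value of x_B is that it equals the inverse of the corresponding block of the precision matrix Λ = Σ^{-1}, into which the conditioning value never enters. quick numpy check of that identity (sketch; random positive-definite Σ):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 2                       # total dimension, size of block A
M = rng.normal(size=(n, n))
Sigma = M @ M.T + n * np.eye(n)   # a random positive-definite covariance

A, B = slice(0, k), slice(k, n)
# Schur complement: covariance of x_A given x_B
schur = Sigma[A, A] - Sigma[A, B] @ np.linalg.solve(Sigma[B, B], Sigma[B, A])

# Same matrix via the precision: conditional covariance of x_A given x_B is the
# inverse of the A-block of Sigma^{-1}; the value of x_B never appears.
Lambda = np.linalg.inv(Sigma)
print(np.allclose(schur, np.linalg.inv(Lambda[A, A])))   # True
```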
pdf  exposition  lecture-notes  stanford  nibble  distribution  acm  machine-learning  probability  levers  calculation  ground-up  characterization  rigidity  closure  nitty-gritty  linear-algebra  properties 
february 2017 by nhaliday
Hoeffding’s Inequality
basic idea of standard pf: bound e^{tX} by line segment (convexity) then use Taylor expansion (in p = b/(b-a), the fraction of range to right of 0) of logarithm
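
for reference, the statement the pf is building toward: if E[X] = 0 and a <= X <= b a.s., then E[e^{tX}] <= e^{t^2 (b-a)^2 / 8} (Hoeffding's lemma); hence for independent X_i in [a_i, b_i] and S = sum_i X_i, P(S - E[S] >= s) <= exp(-2 s^2 / sum_i (b_i - a_i)^2)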
pdf  lecture-notes  exposition  nibble  concentration-of-measure  estimate  proofs  ground-up  acm  probability  series  s:null 
february 2017 by nhaliday
Mixing (mathematics) - Wikipedia
One way to describe this is that strong mixing implies that for any two possible states of the system (realizations of the random variable), when given a sufficient amount of time between the two states, the occurrence of the states is independent.

Mixing coefficient is
α(n) = sup{|P(A∩B) - P(A)P(B)| : A in σ(X_0, ..., X_{t-1}), B in σ(X_{t+n}, ...), t >= 0}
for σ(...) the sigma algebra generated by those r.v.s.

So it's a notion of total variational distance between the true distribution and the product distribution.
concept  math  acm  physics  probability  stochastic-processes  definition  mixing  iidness  wiki  reference  nibble  limits  ergodic  math.DS  measure  dependence-independence 
february 2017 by nhaliday
probability - Variance of maximum of Gaussian random variables - Cross Validated
In full generality it is rather hard to find the right order of magnitude of the variance of a Gaussian supremum, since the tools from concentration theory are always suboptimal for the maximum function.

order ~ 1/log n
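
quick numerical check of that order of magnitude (sketch, assuming numpy; the claim is about the order, not the constant):

```python
import numpy as np

rng = np.random.default_rng(0)
trials = 5_000
for n in (10, 100, 1_000, 10_000):
    # variance of the maximum of n iid standard Gaussians
    maxes = np.array([rng.normal(size=n).max() for _ in range(trials)])
    print(f"n = {n:6d}: Var(max) = {maxes.var():.4f},  1/log n = {1/np.log(n):.4f}")
```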
q-n-a  overflow  stats  probability  acm  orders  tails  bias-variance  moments  concentration-of-measure  magnitude  tidbits  distribution  yoga  structure  extrema  nibble 
february 2017 by nhaliday
bounds - What is the variance of the maximum of a sample? - Cross Validated
- sum of variances is always a bound
- can't do better even for iid Bernoulli
- looks like nice argument from well-known probabilist (using E[(X-Y)^2] = 2Var X), but not clear to me how he gets to sum_i instead of sum_{i,j} in the union bound?
edit: argument is that, for j = argmax_k Y_k, we have r < X_i - Y_j <= X_i - Y_i for all i, including i = argmax_k X_k
- different proof here (later pages): http://www.ism.ac.jp/editsec/aism/pdf/047_1_0185.pdf
Var(X_n:n) <= sum Var(X_k:n) + 2 sum_{i < j} Cov(X_i:n, X_j:n) = Var(sum X_k:n) = Var(sum X_k) = nσ^2
why are the covariances nonnegative? (are they?). intuitively seems true.
- for that, see https://pinboard.in/u:nhaliday/b:ed4466204bb1
- note that this proof shows more generally that sum Var(X_k:n) <= sum Var(X_k)
- apparently that holds for dependent X_k too? http://mathoverflow.net/a/96943/20644
q-n-a  overflow  stats  acm  distribution  tails  bias-variance  moments  estimate  magnitude  probability  iidness  tidbits  concentration-of-measure  multi  orders  levers  extrema  nibble  bonferroni  coarse-fine  expert  symmetry  s:*  expert-experience  proofs 
february 2017 by nhaliday
Predicting with confidence: the best machine learning idea you never heard of | Locklin on science
The advantages of conformal prediction are many fold. These ideas assume very little about the thing you are trying to forecast, the tool you’re using to forecast or how the world works, and they still produce a pretty good confidence interval. Even if you’re an unrepentant Bayesian, using some of the machinery of conformal prediction, you can tell when things have gone wrong with your prior. The learners work online, and with some modifications and considerations, with batch learning. One of the nice things about calculating confidence intervals as a part of your learning process is they can actually lower error rates or use in semi-supervised learning as well. Honestly, I think this is the best bag of tricks since boosting; everyone should know about and use these ideas.

The essential idea is that a “conformity function” exists. Effectively you are constructing a sort of multivariate cumulative distribution function for your machine learning gizmo using the conformity function. Such CDFs exist for classical stuff like ARIMA and linear regression under the correct circumstances; CP brings the idea to machine learning in general, and to models like ARIMA when the standard parametric confidence intervals won’t work. Within the framework, the conformity function, whatever may be, when used correctly can be guaranteed to give confidence intervals to within a probabilistic tolerance. The original proofs and treatments of conformal prediction, defined for sequences, is extremely computationally inefficient. The conditions can be relaxed in many cases, and the conformity function is in principle arbitrary, though good ones will produce narrower confidence regions. Somewhat confusingly, these good conformity functions are referred to as “efficient” -though they may not be computationally efficient.
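
a minimal split-conformal sketch for regression (assuming numpy; the underlying regressor is just a polynomial fit, chosen arbitrarily): calibrate absolute residuals on held-out data, take the right empirical quantile, and the resulting intervals have guaranteed marginal coverage no matter how bad the model is:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = sin(x) + noise
n = 2000
x = rng.uniform(-3, 3, size=n)
y = np.sin(x) + 0.3 * rng.normal(size=n)

# Split: fit a simple model on one half, calibrate on the other
fit, cal = slice(0, n // 2), slice(n // 2, n)
coefs = np.polyfit(x[fit], y[fit], deg=5)           # any black-box regressor works here

def predict(t):
    return np.polyval(coefs, t)

# Conformity scores on the calibration set = absolute residuals
scores = np.abs(y[cal] - predict(x[cal]))
alpha = 0.1
k = int(np.ceil((1 - alpha) * (len(scores) + 1)))   # conformal quantile index
q = np.sort(scores)[k - 1]

# Prediction interval for a new point: [prediction - q, prediction + q],
# with guaranteed >= 1 - alpha marginal coverage under exchangeability.
x_test = rng.uniform(-3, 3, size=5000)
y_test = np.sin(x_test) + 0.3 * rng.normal(size=5000)
covered = np.abs(y_test - predict(x_test)) <= q
print(f"target coverage {1 - alpha:.2f}, empirical coverage {covered.mean():.3f}")
```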
techtariat  acmtariat  acm  machine-learning  bayesian  stats  exposition  research  online-learning  probability  decision-theory  frontier  unsupervised  confidence 
february 2017 by nhaliday
What is the relationship between information theory and Coding theory? - Quora
basically:
- finite vs. asymptotic
- combinatorial vs. probabilistic (lotsa overlap there)
- worst-case (Hamming) vs. distributional (Shannon)

Information and coding theory most often appear together in the subject of error correction over noisy channels. Historically, they were born at almost exactly the same time - both Richard Hamming and Claude Shannon were working at Bell Labs when this happened. Information theory tends to heavily use tools from probability theory (together with an "asymptotic" way of thinking about the world), while traditional "algebraic" coding theory tends to employ mathematics that are much more finite sequence length/combinatorial in nature, including linear algebra over Galois Fields. The emergence in the late 90s and first decade of 2000 of codes over graphs blurred this distinction though, as code classes such as low density parity check codes employ both asymptotic analysis and random code selection techniques which have counterparts in information theory.

They do not subsume each other. Information theory touches on many other aspects that coding theory does not, and vice-versa. Information theory also touches on compression (lossy & lossless), statistics (e.g. large deviations), modeling (e.g. Minimum Description Length). Coding theory pays a lot of attention to sphere packing and coverings for finite length sequences - information theory addresses these problems (channel & lossy source coding) only in an asymptotic/approximate sense.
q-n-a  qra  math  acm  tcs  information-theory  coding-theory  big-picture  comparison  confusion  explanation  linear-algebra  polynomials  limits  finiteness  math.CO  hi-order-bits  synthesis  probability  bits  hamming  shannon  intricacy  nibble  s:null  signal-noise 
february 2017 by nhaliday
Existence of the moment generating function and variance - Cross Validated
This question provides a nice opportunity to collect some facts on moment-generating functions (mgf).

In the answer below, we do the following:
1. Show that if the mgf is finite for at least one (strictly) positive value and one negative value, then all positive moments of X are finite (including nonintegral moments).
2. Prove that the condition in the first item above is equivalent to the distribution of X having exponentially bounded tails. In other words, the tails of X fall off at least as fast as those of an exponential random variable Z (up to a constant).
3. Provide a quick note on the characterization of the distribution by its mgf provided it satisfies the condition in item 1.
4. Explore some examples and counterexamples to aid our intuition and, particularly, to show that we should not read undue importance into the lack of finiteness of the mgf.
q-n-a  overflow  math  stats  acm  probability  characterization  concept  moments  distribution  examples  counterexample  tails  rigidity  nibble  existence  s:null  convergence  series 
january 2017 by nhaliday
pr.probability - When are probability distributions completely determined by their moments? - MathOverflow
Roughly speaking, if the sequence of moments doesn't grow too quickly, then the distribution is determined by its moments. One sufficient condition is that if the moment generating function of a random variable has positive radius of convergence, then that random variable is determined by its moments.
q-n-a  overflow  math  acm  probability  characterization  tidbits  moments  rigidity  nibble  existence  convergence  series 
january 2017 by nhaliday
Breeding the breeder's equation - Gene Expression
- interesting fact about normal distribution: when thresholding a Gaussian r.v. X ~ N(0, σ^2) at a threshold t (i.e., conditioning on X > t), the new mean μ_s satisfies μ_s = pdf(X,t)/(1-cdf(X,t)) σ^2 (quick numerical check below)
- follows from direct calculation (any deeper reason?)
- note (using Taylor/asymptotic expansion of complementary error function) that this is Θ(t) as t -> 0 or ∞ (w/ different constants)
- for X ~ N(0, 1), can calculate 0 = cdf(X, t)μ_<t + (1-cdf(X, t))μ_>t => μ_<t = -pdf(X, t)/cdf(X, t)
- this declines quickly w/ t (like e^{-t^2/2}). as t -> 0, it goes like -sqrt(2/pi) + higher-order terms ~ -0.8.

Average of a tail of a normal distribution: https://stats.stackexchange.com/questions/26805/average-of-a-tail-of-a-normal-distribution

Truncated normal distribution: https://en.wikipedia.org/wiki/Truncated_normal_distribution
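
quick numerical check of the thresholding identity in the first bullet (sketch, assuming numpy/scipy; σ and t are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sigma, t = 2.0, 0.7
x = rng.normal(0, sigma, size=2_000_000)

emp = x[x > t].mean()                                  # mean of the selected (truncated) tail
f, F = stats.norm(0, sigma).pdf(t), stats.norm(0, sigma).cdf(t)
print("empirical:", round(emp, 4), " sigma^2 * pdf/(1-cdf):", round(sigma**2 * f / (1 - F), 4))
```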
gnxp  explanation  concept  bio  genetics  population-genetics  agri-mindset  analysis  scitariat  org:sci  nibble  methodology  distribution  tidbits  probability  stats  acm  AMT  limits  magnitude  identity  integral  street-fighting  symmetry  s:*  tails  multi  q-n-a  overflow  wiki  reference  objektbuch  proofs 
december 2016 by nhaliday
gt.geometric topology - Intuitive crutches for higher dimensional thinking - MathOverflow
Terry Tao:
I can't help you much with high-dimensional topology - it's not my field, and I've not picked up the various tricks topologists use to get a grip on the subject - but when dealing with the geometry of high-dimensional (or infinite-dimensional) vector spaces such as R^n, there are plenty of ways to conceptualise these spaces that do not require visualising more than three dimensions directly.

For instance, one can view a high-dimensional vector space as a state space for a system with many degrees of freedom. A megapixel image, for instance, is a point in a million-dimensional vector space; by varying the image, one can explore the space, and various subsets of this space correspond to various classes of images.

One can similarly interpret sound waves, a box of gases, an ecosystem, a voting population, a stream of digital data, trials of random variables, the results of a statistical survey, a probabilistic strategy in a two-player game, and many other concrete objects as states in a high-dimensional vector space, and various basic concepts such as convexity, distance, linearity, change of variables, orthogonality, or inner product can have very natural meanings in some of these models (though not in all).

It can take a bit of both theory and practice to merge one's intuition for these things with one's spatial intuition for vectors and vector spaces, but it can be done eventually (much as after one has enough exposure to measure theory, one can start merging one's intuition regarding cardinality, mass, length, volume, probability, cost, charge, and any number of other "real-life" measures).

For instance, the fact that most of the mass of a unit ball in high dimensions lurks near the boundary of the ball can be interpreted as a manifestation of the law of large numbers, using the interpretation of a high-dimensional vector space as the state space for a large number of trials of a random variable.

More generally, many facts about low-dimensional projections or slices of high-dimensional objects can be viewed from a probabilistic, statistical, or signal processing perspective.

Scott Aaronson:
Here are some of the crutches I've relied on. (Admittedly, my crutches are probably much more useful for theoretical computer science, combinatorics, and probability than they are for geometry, topology, or physics. On a related note, I personally have a much easier time thinking about R^n than about, say, R^4 or R^5!)

1. If you're trying to visualize some 4D phenomenon P, first think of a related 3D phenomenon P', and then imagine yourself as a 2D being who's trying to visualize P'. The advantage is that, unlike with the 4D vs. 3D case, you yourself can easily switch between the 3D and 2D perspectives, and can therefore get a sense of exactly what information is being lost when you drop a dimension. (You could call this the "Flatland trick," after the most famous literary work to rely on it.)
2. As someone else mentioned, discretize! Instead of thinking about R^n, think about the Boolean hypercube {0,1}^n, which is finite and usually easier to get intuition about. (When working on problems, I often find myself drawing {0,1}^4 on a sheet of paper by drawing two copies of {0,1}^3 and then connecting the corresponding vertices.)
3. Instead of thinking about a subset S⊆R^n, think about its characteristic function f:R^n→{0,1}. I don't know why that trivial perspective switch makes such a big difference, but it does ... maybe because it shifts your attention to the process of computing f, and makes you forget about the hopeless task of visualizing S!
4. One of the central facts about R^n is that, while it has "room" for only n orthogonal vectors, it has room for exp⁡(n) almost-orthogonal vectors. Internalize that one fact, and so many other properties of R^n (for example, that the n-sphere resembles a "ball with spikes sticking out," as someone mentioned before) will suddenly seem non-mysterious. In turn, one way to internalize the fact that R^n has so many almost-orthogonal vectors is to internalize Shannon's theorem that there exist good error-correcting codes.
5. To get a feel for some high-dimensional object, ask questions about the behavior of a process that takes place on that object. For example: if I drop a ball here, which local minimum will it settle into? How long does this random walk on {0,1}^n take to mix?

Gil Kalai:
This is a slightly different point, but Vitali Milman, who works in high-dimensional convexity, likes to draw high-dimensional convex bodies in a non-convex way. This is to convey the point that if you take the convex hull of a few points on the unit sphere of R^n, then for large n very little of the measure of the convex body is anywhere near the corners, so in a certain sense the body is a bit like a small sphere with long thin "spikes".
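
two of the quantitative claims above (mass near the boundary, near-orthogonality of random directions) are easy to put numbers on (sketch, assuming numpy; the 5% shell is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (3, 30, 300, 3000):
    # volume scales like r^d, so the fraction of the unit ball within 5% of the surface is 1 - 0.95^d
    shell = 1 - 0.95**d
    # typical |cos angle| between independent random directions shrinks like 1/sqrt(d)
    U = rng.normal(size=(100, d))
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    cos = np.abs(U[0] @ U[1:].T).mean()
    print(f"d = {d:5d}: outer-5%-shell mass {shell:.3f},  mean |cos| {cos:.3f},  1/sqrt(d) = {d**-0.5:.3f}")
```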
q-n-a  intuition  math  visual-understanding  list  discussion  thurston  tidbits  aaronson  tcs  geometry  problem-solving  yoga  👳  big-list  metabuch  tcstariat  gowers  mathtariat  acm  overflow  soft-question  levers  dimensionality  hi-order-bits  insight  synthesis  thinking  models  cartoons  coding-theory  information-theory  probability  concentration-of-measure  magnitude  linear-algebra  boolean-analysis  analogy  arrows  lifts-projections  measure  markov  sampling  shannon  conceptual-vocab  nibble  degrees-of-freedom  worrydream  neurons  retrofit  oscillation  paradox  novelty  tricki  concrete  high-dimension  s:***  manifolds  direction  curvature  convexity-curvature  elegance  guessing 
december 2016 by nhaliday
Borel–Cantelli lemma - Wikipedia
- sum of probabilities finite => a.s. only finitely many occur
- "<=" w/ some assumptions (pairwise independence)
- classic result from CS 150 (problem set 1)
wiki  reference  estimate  probability  math  acm  concept  levers  probabilistic-method  limits  nibble  borel-cantelli 
november 2016 by nhaliday
Bounds on the Expectation of the Maximum of Samples from a Gaussian
σ/sqrt(pi log 2) sqrt(log n) <= E[Y] <= σ sqrt(2) sqrt(log n)

upper bound pf: Jensen's inequality+mgf+union bound+choose optimal t (Chernoff bound basically)
lower bound pf: more ad-hoc (and difficult)
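
upper bound, spelled out (Y = max of n iid N(0, σ^2)): for any t > 0, e^{t E[Y]} <= E[e^{tY}] (Jensen) <= sum_i E[e^{t X_i}] = n e^{t^2 σ^2 / 2}, so E[Y] <= log(n)/t + t σ^2/2; minimizing over t (take t = sqrt(2 log n)/σ) gives E[Y] <= σ sqrt(2 log n)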
pdf  tidbits  math  probability  concentration-of-measure  estimate  acm  tails  distribution  calculation  iidness  orders  magnitude  extrema  tightness  outliers  expectancy  proofs  elegance 
october 2016 by nhaliday
Dirichlet process - Wikipedia, the free encyclopedia
In probability theory, Dirichlet processes (after Peter Gustav Lejeune Dirichlet) are a family of stochastic processes whose realizations are probability distributions. In other words, a Dirichlet process is a probability distribution whose range is itself a set of probability distributions. It is often used in Bayesian inference to describe the prior knowledge about the distribution of random variables—how likely it is that the random variables are distributed according to one or another particular distribution.
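
one concrete way to see "a distribution over distributions" is the stick-breaking construction; a truncated sketch (assuming numpy; α and the base measure are arbitrary choices), where each call returns the atoms and weights of one random discrete distribution drawn from the DP:

```python
import numpy as np

rng = np.random.default_rng(0)

def dirichlet_process_sample(alpha, base_sampler, k=1000):
    """Truncated stick-breaking construction: atoms and weights of one random
    distribution drawn from DP(alpha, base)."""
    betas = rng.beta(1.0, alpha, size=k)
    weights = betas * np.concatenate([[1.0], np.cumprod(1 - betas)[:-1]])
    atoms = base_sampler(k)
    return atoms, weights

atoms, w = dirichlet_process_sample(alpha=5.0, base_sampler=lambda k: rng.normal(size=k))
print("weights sum to ~1:", w.sum().round(4))
print("a few (atom, weight) pairs:", list(zip(atoms[:3].round(2), w[:3].round(3))))
```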
probability  stats  wiki  reference  concept  acm  distribution  bayesian  simplex  nibble 
june 2016 by nhaliday
Kullback–Leibler divergence - Wikipedia, the free encyclopedia
see https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence#Motivation especially

https://twitter.com/SimonDeDeo/status/993881889143447552
https://archive.is/hZcVb
Kullback-Leibler divergence has an enormous number of interpretations and uses: psychological, epistemic, thermodynamic, statistical, computational, geometrical... I am pretty sure I could teach an entire graduate seminar on it.
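
for reference, the definition all the interpretations hang off of: D_KL(P||Q) = sum_x p(x) log(p(x)/q(x)), the expected extra description length (in bits, for log base 2) from coding samples of P with a code optimized for Q; note it is not symmetric. tiny numeric sketch (assuming numpy; the two distributions are arbitrary):

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.25, 0.25, 0.5])

def kl(a, b):
    return np.sum(a * np.log2(a / b))   # in bits

print("D(P||Q) =", round(kl(p, q), 4), " D(Q||P) =", round(kl(q, p), 4))  # not equal
```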
information-theory  math  wiki  probability  concept  reference  acm  hmm  atoms  operational  characterization  metrics  bits  entropy-like  nibble  properties  multi  twitter  social  discussion  backup 
may 2016 by nhaliday
