**nhaliday + acm + hi-order-bits**
15

Theory of Self-Reproducing Automata - John von Neumann

april 2018 by nhaliday

Fourth Lecture: THE ROLE OF HIGH AND OF EXTREMELY HIGH COMPLICATION

Comparisons between computing machines and the nervous systems. Estimates of size for computing machines, present and near future.

Estimates for size for the human central nervous system. Excursus about the “mixed” character of living organisms. Analog and digital elements. Observations about the “mixed” character of all componentry, artificial as well as natural. Interpretation of the position to be taken with respect to these.

Evaluation of the discrepancy in size between artificial and natural automata. Interpretation of this discrepancy in terms of physical factors. Nature of the materials used.

The probability of the presence of other intellectual factors. The role of complication and the theoretical penetration that it requires.

Questions of reliability and errors reconsidered. Probability of individual errors and length of procedure. Typical lengths of procedure for computing machines and for living organisms--that is, for artificial and for natural automata. Upper limits on acceptable probability of error in individual operations. Compensation by checking and self-correcting features.

Differences of principle in the way in which errors are dealt with in artificial and in natural automata. The “single error” principle in artificial automata. Crudeness of our approach in this case, due to the lack of adequate theory. More sophisticated treatment of this problem in natural automata: The role of the autonomy of parts. Connections between this autonomy and evolution.

- 10^10 neurons in brain, 10^4 vacuum tubes in largest computer at time

- machines faster: 5 ms from neuron potential to neuron potential, 10^-3 ms for vacuum tubes

https://en.wikipedia.org/wiki/John_von_Neumann#Computing

pdf
article
papers
essay
nibble
math
cs
computation
bio
neuro
neuro-nitgrit
scale
magnitude
comparison
acm
von-neumann
giants
thermo
phys-energy
speed
performance
time
density
frequency
hardware
ems
efficiency
dirty-hands
street-fighting
fermi
estimate
retention
physics
interdisciplinary
multi
wiki
links
people
🔬
atoms
duplication
iteration-recursion
turing
complexity
measure
nature
technology
complex-systems
bits
information-theory
circuits
robust
structure
composition-decomposition
evolution
mutation
axioms
analogy
thinking
input-output
hi-order-bits
coding-theory
flexibility
rigidity
automata-languages
Comparisons between computing machines and the nervous systems. Estimates of size for computing machines, present and near future.

Estimates for size for the human central nervous system. Excursus about the “mixed” character of living organisms. Analog and digital elements. Observations about the “mixed” character of all componentry, artificial as well as natural. Interpretation of the position to be taken with respect to these.

Evaluation of the discrepancy in size between artificial and natural automata. Interpretation of this discrepancy in terms of physical factors. Nature of the materials used.

The probability of the presence of other intellectual factors. The role of complication and the theoretical penetration that it requires.

Questions of reliability and errors reconsidered. Probability of individual errors and length of procedure. Typical lengths of procedure for computing machines and for living organisms--that is, for artificial and for natural automata. Upper limits on acceptable probability of error in individual operations. Compensation by checking and self-correcting features.

Differences of principle in the way in which errors are dealt with in artificial and in natural automata. The “single error” principle in artificial automata. Crudeness of our approach in this case, due to the lack of adequate theory. More sophisticated treatment of this problem in natural automata: The role of the autonomy of parts. Connections between this autonomy and evolution.

- 10^10 neurons in brain, 10^4 vacuum tubes in largest computer at time

- machines faster: 5 ms from neuron potential to neuron potential, 10^-3 ms for vacuum tubes

https://en.wikipedia.org/wiki/John_von_Neumann#Computing

april 2018 by nhaliday

An Outsider's Tour of Reinforcement Learning – arg min blog

acmtariat ben-recht org:bleg nibble exposition explanation expert-experience tutorial guide yoga reinforcement optimization linear-algebra model-class atoms concept signal-noise iteration-recursion volo-avolo benchmarks deep-learning unsupervised thinking descriptive values gradient-descent acm decision-theory decision-making math.DS sequential random search realness hi-order-bits synthesis coarse-fine bare-hands openai replication linearity nonlinearity research

april 2018 by nhaliday

acmtariat ben-recht org:bleg nibble exposition explanation expert-experience tutorial guide yoga reinforcement optimization linear-algebra model-class atoms concept signal-noise iteration-recursion volo-avolo benchmarks deep-learning unsupervised thinking descriptive values gradient-descent acm decision-theory decision-making math.DS sequential random search realness hi-order-bits synthesis coarse-fine bare-hands openai replication linearity nonlinearity research

april 2018 by nhaliday

New Theory Cracks Open the Black Box of Deep Learning | Quanta Magazine

september 2017 by nhaliday

A new idea called the “information bottleneck” is helping to explain the puzzling success of today’s artificial-intelligence algorithms — and might also explain how human brains learn.

sounds like he's just talking about autoencoders?

news
org:mag
org:sci
popsci
announcement
research
deep-learning
machine-learning
acm
information-theory
bits
neuro
model-class
big-surf
frontier
nibble
hmm
signal-noise
deepgoog
expert
ideas
wild-ideas
summary
talks
video
israel
roots
physics
interdisciplinary
ai
intelligence
shannon
giants
arrows
preimage
lifts-projections
composition-decomposition
characterization
markov
gradient-descent
papers
liner-notes
experiment
hi-order-bits
generalization
expert-experience
explanans
org:inst
speedometer
sounds like he's just talking about autoencoders?

september 2017 by nhaliday

A cube, a starfish, a thin shell, and the central limit theorem – Libres pensées d'un mathématicien ordinaire

mathtariat org:bleg nibble math acm probability concentration-of-measure high-dimension cartoons limits dimensionality measure yoga hi-order-bits synthesis exposition spatial geometry math.MG curvature convexity-curvature

february 2017 by nhaliday

mathtariat org:bleg nibble math acm probability concentration-of-measure high-dimension cartoons limits dimensionality measure yoga hi-order-bits synthesis exposition spatial geometry math.MG curvature convexity-curvature

february 2017 by nhaliday

What is the relationship between information theory and Coding theory? - Quora

february 2017 by nhaliday

basically:

- finite vs. asymptotic

- combinatorial vs. probabilistic (lotsa overlap their)

- worst-case (Hamming) vs. distributional (Shannon)

Information and coding theory most often appear together in the subject of error correction over noisy channels. Historically, they were born at almost exactly the same time - both Richard Hamming and Claude Shannon were working at Bell Labs when this happened. Information theory tends to heavily use tools from probability theory (together with an "asymptotic" way of thinking about the world), while traditional "algebraic" coding theory tends to employ mathematics that are much more finite sequence length/combinatorial in nature, including linear algebra over Galois Fields. The emergence in the late 90s and first decade of 2000 of codes over graphs blurred this distinction though, as code classes such as low density parity check codes employ both asymptotic analysis and random code selection techniques which have counterparts in information theory.

They do not subsume each other. Information theory touches on many other aspects that coding theory does not, and vice-versa. Information theory also touches on compression (lossy & lossless), statistics (e.g. large deviations), modeling (e.g. Minimum Description Length). Coding theory pays a lot of attention to sphere packing and coverings for finite length sequences - information theory addresses these problems (channel & lossy source coding) only in an asymptotic/approximate sense.

q-n-a
qra
math
acm
tcs
information-theory
coding-theory
big-picture
comparison
confusion
explanation
linear-algebra
polynomials
limits
finiteness
math.CO
hi-order-bits
synthesis
probability
bits
hamming
shannon
intricacy
nibble
s:null
signal-noise
- finite vs. asymptotic

- combinatorial vs. probabilistic (lotsa overlap their)

- worst-case (Hamming) vs. distributional (Shannon)

Information and coding theory most often appear together in the subject of error correction over noisy channels. Historically, they were born at almost exactly the same time - both Richard Hamming and Claude Shannon were working at Bell Labs when this happened. Information theory tends to heavily use tools from probability theory (together with an "asymptotic" way of thinking about the world), while traditional "algebraic" coding theory tends to employ mathematics that are much more finite sequence length/combinatorial in nature, including linear algebra over Galois Fields. The emergence in the late 90s and first decade of 2000 of codes over graphs blurred this distinction though, as code classes such as low density parity check codes employ both asymptotic analysis and random code selection techniques which have counterparts in information theory.

They do not subsume each other. Information theory touches on many other aspects that coding theory does not, and vice-versa. Information theory also touches on compression (lossy & lossless), statistics (e.g. large deviations), modeling (e.g. Minimum Description Length). Coding theory pays a lot of attention to sphere packing and coverings for finite length sequences - information theory addresses these problems (channel & lossy source coding) only in an asymptotic/approximate sense.

february 2017 by nhaliday

gt.geometric topology - Intuitive crutches for higher dimensional thinking - MathOverflow

december 2016 by nhaliday

Terry Tao:

I can't help you much with high-dimensional topology - it's not my field, and I've not picked up the various tricks topologists use to get a grip on the subject - but when dealing with the geometry of high-dimensional (or infinite-dimensional) vector spaces such as R^n, there are plenty of ways to conceptualise these spaces that do not require visualising more than three dimensions directly.

For instance, one can view a high-dimensional vector space as a state space for a system with many degrees of freedom. A megapixel image, for instance, is a point in a million-dimensional vector space; by varying the image, one can explore the space, and various subsets of this space correspond to various classes of images.

One can similarly interpret sound waves, a box of gases, an ecosystem, a voting population, a stream of digital data, trials of random variables, the results of a statistical survey, a probabilistic strategy in a two-player game, and many other concrete objects as states in a high-dimensional vector space, and various basic concepts such as convexity, distance, linearity, change of variables, orthogonality, or inner product can have very natural meanings in some of these models (though not in all).

It can take a bit of both theory and practice to merge one's intuition for these things with one's spatial intuition for vectors and vector spaces, but it can be done eventually (much as after one has enough exposure to measure theory, one can start merging one's intuition regarding cardinality, mass, length, volume, probability, cost, charge, and any number of other "real-life" measures).

For instance, the fact that most of the mass of a unit ball in high dimensions lurks near the boundary of the ball can be interpreted as a manifestation of the law of large numbers, using the interpretation of a high-dimensional vector space as the state space for a large number of trials of a random variable.

More generally, many facts about low-dimensional projections or slices of high-dimensional objects can be viewed from a probabilistic, statistical, or signal processing perspective.

Scott Aaronson:

Here are some of the crutches I've relied on. (Admittedly, my crutches are probably much more useful for theoretical computer science, combinatorics, and probability than they are for geometry, topology, or physics. On a related note, I personally have a much easier time thinking about R^n than about, say, R^4 or R^5!)

1. If you're trying to visualize some 4D phenomenon P, first think of a related 3D phenomenon P', and then imagine yourself as a 2D being who's trying to visualize P'. The advantage is that, unlike with the 4D vs. 3D case, you yourself can easily switch between the 3D and 2D perspectives, and can therefore get a sense of exactly what information is being lost when you drop a dimension. (You could call this the "Flatland trick," after the most famous literary work to rely on it.)

2. As someone else mentioned, discretize! Instead of thinking about R^n, think about the Boolean hypercube {0,1}^n, which is finite and usually easier to get intuition about. (When working on problems, I often find myself drawing {0,1}^4 on a sheet of paper by drawing two copies of {0,1}^3 and then connecting the corresponding vertices.)

3. Instead of thinking about a subset S⊆R^n, think about its characteristic function f:R^n→{0,1}. I don't know why that trivial perspective switch makes such a big difference, but it does ... maybe because it shifts your attention to the process of computing f, and makes you forget about the hopeless task of visualizing S!

4. One of the central facts about R^n is that, while it has "room" for only n orthogonal vectors, it has room for exp(n) almost-orthogonal vectors. Internalize that one fact, and so many other properties of R^n (for example, that the n-sphere resembles a "ball with spikes sticking out," as someone mentioned before) will suddenly seem non-mysterious. In turn, one way to internalize the fact that R^n has so many almost-orthogonal vectors is to internalize Shannon's theorem that there exist good error-correcting codes.

5. To get a feel for some high-dimensional object, ask questions about the behavior of a process that takes place on that object. For example: if I drop a ball here, which local minimum will it settle into? How long does this random walk on {0,1}^n take to mix?

Gil Kalai:

This is a slightly different point, but Vitali Milman, who works in high-dimensional convexity, likes to draw high-dimensional convex bodies in a non-convex way. This is to convey the point that if you take the convex hull of a few points on the unit sphere of R^n, then for large n very little of the measure of the convex body is anywhere near the corners, so in a certain sense the body is a bit like a small sphere with long thin "spikes".

q-n-a
intuition
math
visual-understanding
list
discussion
thurston
tidbits
aaronson
tcs
geometry
problem-solving
yoga
👳
big-list
metabuch
tcstariat
gowers
mathtariat
acm
overflow
soft-question
levers
dimensionality
hi-order-bits
insight
synthesis
thinking
models
cartoons
coding-theory
information-theory
probability
concentration-of-measure
magnitude
linear-algebra
boolean-analysis
analogy
arrows
lifts-projections
measure
markov
sampling
shannon
conceptual-vocab
nibble
degrees-of-freedom
worrydream
neurons
retrofit
oscillation
paradox
novelty
tricki
concrete
high-dimension
s:***
manifolds
direction
curvature
convexity-curvature
elegance
guessing
I can't help you much with high-dimensional topology - it's not my field, and I've not picked up the various tricks topologists use to get a grip on the subject - but when dealing with the geometry of high-dimensional (or infinite-dimensional) vector spaces such as R^n, there are plenty of ways to conceptualise these spaces that do not require visualising more than three dimensions directly.

For instance, one can view a high-dimensional vector space as a state space for a system with many degrees of freedom. A megapixel image, for instance, is a point in a million-dimensional vector space; by varying the image, one can explore the space, and various subsets of this space correspond to various classes of images.

One can similarly interpret sound waves, a box of gases, an ecosystem, a voting population, a stream of digital data, trials of random variables, the results of a statistical survey, a probabilistic strategy in a two-player game, and many other concrete objects as states in a high-dimensional vector space, and various basic concepts such as convexity, distance, linearity, change of variables, orthogonality, or inner product can have very natural meanings in some of these models (though not in all).

It can take a bit of both theory and practice to merge one's intuition for these things with one's spatial intuition for vectors and vector spaces, but it can be done eventually (much as after one has enough exposure to measure theory, one can start merging one's intuition regarding cardinality, mass, length, volume, probability, cost, charge, and any number of other "real-life" measures).

For instance, the fact that most of the mass of a unit ball in high dimensions lurks near the boundary of the ball can be interpreted as a manifestation of the law of large numbers, using the interpretation of a high-dimensional vector space as the state space for a large number of trials of a random variable.

More generally, many facts about low-dimensional projections or slices of high-dimensional objects can be viewed from a probabilistic, statistical, or signal processing perspective.

Scott Aaronson:

Here are some of the crutches I've relied on. (Admittedly, my crutches are probably much more useful for theoretical computer science, combinatorics, and probability than they are for geometry, topology, or physics. On a related note, I personally have a much easier time thinking about R^n than about, say, R^4 or R^5!)

1. If you're trying to visualize some 4D phenomenon P, first think of a related 3D phenomenon P', and then imagine yourself as a 2D being who's trying to visualize P'. The advantage is that, unlike with the 4D vs. 3D case, you yourself can easily switch between the 3D and 2D perspectives, and can therefore get a sense of exactly what information is being lost when you drop a dimension. (You could call this the "Flatland trick," after the most famous literary work to rely on it.)

2. As someone else mentioned, discretize! Instead of thinking about R^n, think about the Boolean hypercube {0,1}^n, which is finite and usually easier to get intuition about. (When working on problems, I often find myself drawing {0,1}^4 on a sheet of paper by drawing two copies of {0,1}^3 and then connecting the corresponding vertices.)

3. Instead of thinking about a subset S⊆R^n, think about its characteristic function f:R^n→{0,1}. I don't know why that trivial perspective switch makes such a big difference, but it does ... maybe because it shifts your attention to the process of computing f, and makes you forget about the hopeless task of visualizing S!

4. One of the central facts about R^n is that, while it has "room" for only n orthogonal vectors, it has room for exp(n) almost-orthogonal vectors. Internalize that one fact, and so many other properties of R^n (for example, that the n-sphere resembles a "ball with spikes sticking out," as someone mentioned before) will suddenly seem non-mysterious. In turn, one way to internalize the fact that R^n has so many almost-orthogonal vectors is to internalize Shannon's theorem that there exist good error-correcting codes.

5. To get a feel for some high-dimensional object, ask questions about the behavior of a process that takes place on that object. For example: if I drop a ball here, which local minimum will it settle into? How long does this random walk on {0,1}^n take to mix?

Gil Kalai:

This is a slightly different point, but Vitali Milman, who works in high-dimensional convexity, likes to draw high-dimensional convex bodies in a non-convex way. This is to convey the point that if you take the convex hull of a few points on the unit sphere of R^n, then for large n very little of the measure of the convex body is anywhere near the corners, so in a certain sense the body is a bit like a small sphere with long thin "spikes".

december 2016 by nhaliday

A Fervent Defense of Frequentist Statistics - Less Wrong

september 2016 by nhaliday

Short summary. This essay makes many points, each of which I think is worth reading, but if you are only going to understand one point I think it should be “Myth 5″ below, which describes the online learning framework as a response to the claim that frequentist methods need to make strong modeling assumptions. Among other things, online learning allows me to perform the following remarkable feat: if I’m betting on horses, and I get to place bets after watching other people bet but before seeing which horse wins the race, then I can guarantee that after a relatively small number of races, I will do almost as well overall as the best other person, even if the number of other people is very large (say, 1 billion), and their performance is correlated in complicated ways.

If you’re only going to understand two points, then also read about the frequentist version of Solomonoff induction, which is described in “Myth 6″.

...

If you are like me from, say, two years ago, you are firmly convinced that Bayesian methods are superior and that you have knockdown arguments in favor of this. If this is the case, then I hope this essay will give you an experience that I myself found life-altering: the experience of having a way of thinking that seemed unquestionably true slowly dissolve into just one of many imperfect models of reality. This experience helped me gain more explicit appreciation for the skill of viewing the world from many different angles, and of distinguishing between a very successful paradigm and reality.

If you are not like me, then you may have had the experience of bringing up one of many reasonable objections to normative Bayesian epistemology, and having it shot down by one of many “standard” arguments that seem wrong but not for easy-to-articulate reasons. I hope to lend some reprieve to those of you in this camp, by providing a collection of “standard” replies to these standard arguments.

bayesian
philosophy
stats
rhetoric
advice
debate
critique
expert
lesswrong
commentary
discussion
regularizer
essay
exposition
🤖
aphorism
spock
synthesis
clever-rats
ratty
hi-order-bits
top-n
2014
acmtariat
big-picture
acm
iidness
online-learning
lens
clarity
unit
nibble
frequentist
s:**
expert-experience
subjective-objective
If you’re only going to understand two points, then also read about the frequentist version of Solomonoff induction, which is described in “Myth 6″.

...

If you are like me from, say, two years ago, you are firmly convinced that Bayesian methods are superior and that you have knockdown arguments in favor of this. If this is the case, then I hope this essay will give you an experience that I myself found life-altering: the experience of having a way of thinking that seemed unquestionably true slowly dissolve into just one of many imperfect models of reality. This experience helped me gain more explicit appreciation for the skill of viewing the world from many different angles, and of distinguishing between a very successful paradigm and reality.

If you are not like me, then you may have had the experience of bringing up one of many reasonable objections to normative Bayesian epistemology, and having it shot down by one of many “standard” arguments that seem wrong but not for easy-to-articulate reasons. I hope to lend some reprieve to those of you in this camp, by providing a collection of “standard” replies to these standard arguments.

september 2016 by nhaliday

machine learning - Why is Euclidean distance not a good metric in high dimensions? - Cross Validated

thinking machine-learning math acm synthesis intuition q-n-a overflow soft-question dimensionality hi-order-bits curiosity cartoons concentration-of-measure norms nibble novelty high-dimension direction metric-space yoga measure best-practices

september 2016 by nhaliday

thinking machine-learning math acm synthesis intuition q-n-a overflow soft-question dimensionality hi-order-bits curiosity cartoons concentration-of-measure norms nibble novelty high-dimension direction metric-space yoga measure best-practices

september 2016 by nhaliday

Probably Overthinking It: There is still only one test

june 2016 by nhaliday

all hypothesis tests are based on the same framework

stats
explanation
init
rhetoric
synthesis
acm
insight
concept
methodology
hi-order-bits
big-picture
hypothesis-testing
🔬
june 2016 by nhaliday

Useful Math | Academically Interesting

math academia list roadmap machine-learning tcs yoga acm synthesis metabuch clever-rats ratty scholar-pack top-n hi-order-bits levers 🎓 👳 pre-2013 acmtariat big-picture org:bleg nibble metameta impact meta:math skeleton s:*** p:*** applications chart knowledge studying prioritizing ideas track-record checklists tricki problem-solving optimization differential linear-algebra probability stochastic-processes martingale estimate math.CA series approximation deep-learning graphs graph-theory graphical-models model-class pigeonhole-markov linearity atoms distribution entropy-like dimensionality homogeneity spectral fourier arrows finiteness math.GN topology smoothness measure manifolds curvature concept conceptual-vocab convexity-curvature confluence toolkit apollonian-dionysian pragmatic telos-atelos ends-means quixotic

february 2016 by nhaliday

math academia list roadmap machine-learning tcs yoga acm synthesis metabuch clever-rats ratty scholar-pack top-n hi-order-bits levers 🎓 👳 pre-2013 acmtariat big-picture org:bleg nibble metameta impact meta:math skeleton s:*** p:*** applications chart knowledge studying prioritizing ideas track-record checklists tricki problem-solving optimization differential linear-algebra probability stochastic-processes martingale estimate math.CA series approximation deep-learning graphs graph-theory graphical-models model-class pigeonhole-markov linearity atoms distribution entropy-like dimensionality homogeneity spectral fourier arrows finiteness math.GN topology smoothness measure manifolds curvature concept conceptual-vocab convexity-curvature confluence toolkit apollonian-dionysian pragmatic telos-atelos ends-means quixotic

february 2016 by nhaliday

**related tags**

Copy this bookmark: