nhaliday + ground-up   140

Rational Sines of Rational Multiples of p
For which rational multiples of p is the sine rational? We have the three trivial cases
[0, pi/2, pi/6]
and we wish to show that these are essentially the only distinct rational sines of rational multiples of p.

The assertion about rational sines of rational multiples of p follows from two fundamental lemmas. The first is

Lemma 1: For any rational number q the value of sin(qp) is a root of a monic polynomial with integer coefficients.

[Pf uses some ideas unfamiliar to me: similarity parameter of Moebius (linear fraction) transformations, and finding a polynomial for a desired root by constructing a Moebius transformation with a finite period.]


Lemma 2: Any root of a monic polynomial f(x) with integer coefficients must either be an integer or irrational.

[Gauss's Lemma, cf Dummit-Foote.]

nibble  tidbits  org:junk  analysis  trivia  math  algebra  polynomials  fields  characterization  direction  math.CA  math.CV  ground-up 
july 2019 by nhaliday
Bareiss algorithm - Wikipedia
During the execution of Bareiss algorithm, every integer that is computed is the determinant of a submatrix of the input matrix. This allows, using the Hadamard inequality, to bound the size of these integers. Otherwise, the Bareiss algorithm may be viewed as a variant of Gaussian elimination and needs roughly the same number of arithmetic operations.
nibble  ground-up  cs  tcs  algorithms  complexity  linear-algebra  numerics  sci-comp  fields  calculation  nitty-gritty 
june 2019 by nhaliday
What every computer scientist should know about floating-point arithmetic
Floating-point arithmetic is considered as esoteric subject by many people. This is rather surprising, because floating-point is ubiquitous in computer systems: Almost every language has a floating-point datatype; computers from PCs to supercomputers have floating-point accelerators; most compilers will be called upon to compile floating-point algorithms from time to time; and virtually every operating system must respond to floating-point exceptions such as overflow. This paper presents a tutorial on the aspects of floating-point that have a direct impact on designers of computer systems. It begins with background on floating-point representation and rounding error, continues with a discussion of the IEEE floating point standard, and concludes with examples of how computer system builders can better support floating point.

"you must use an epsilon when dealing with floats" is a knee-jerk reaction of programmers with a superficial understanding of floating-point computations, for comparisons in general (not only to zero).

This is usually unhelpful because it doesn't tell you how to minimize the propagation of rounding errors, it doesn't tell you how to avoid cancellation or absorption problems, and even when your problem is indeed related to the comparison of two floats, it doesn't tell you what value of epsilon is right for what you are doing.


Regarding the propagation of rounding errors, there exists specialized analyzers that can help you estimate it, because it is a tedious thing to do by hand.


This was part of HW1 of CS24:
In particular, simply summing n numbers in sequence has a worst-case error that grows proportional to n, and a root mean square error that grows as {\displaystyle {\sqrt {n}}} {\sqrt {n}} for random inputs (the roundoff errors form a random walk).[2] With compensated summation, the worst-case error bound is independent of n, so a large number of values can be summed with an error that only depends on the floating-point precision.[2]

In numerical analysis, pairwise summation, also called cascade summation, is a technique to sum a sequence of finite-precision floating-point numbers that substantially reduces the accumulated round-off error compared to naively accumulating the sum in sequence.[1] Although there are other techniques such as Kahan summation that typically have even smaller round-off errors, pairwise summation is nearly as good (differing only by a logarithmic factor) while having much lower computational cost—it can be implemented so as to have nearly the same cost (and exactly the same number of arithmetic operations) as naive summation.

In particular, pairwise summation of a sequence of n numbers xn works by recursively breaking the sequence into two halves, summing each half, and adding the two sums: a divide and conquer algorithm. Its worst-case roundoff errors grow asymptotically as at most O(ε log n), where ε is the machine precision (assuming a fixed condition number, as discussed below).[1] In comparison, the naive technique of accumulating the sum in sequence (adding each xi one at a time for i = 1, ..., n) has roundoff errors that grow at worst as O(εn).[1] Kahan summation has a worst-case error of roughly O(ε), independent of n, but requires several times more arithmetic operations.[1] If the roundoff errors are random, and in particular have random signs, then they form a random walk and the error growth is reduced to an average of {\displaystyle O(\varepsilon {\sqrt {\log n}})} O(\varepsilon {\sqrt {\log n}}) for pairwise summation.[2]

A very similar recursive structure of summation is found in many fast Fourier transform (FFT) algorithms, and is responsible for the same slow roundoff accumulation of those FFTs.[2][3]

However, these encouraging error-growth rates only apply if the trigonometric “twiddle” factors in the FFT algorithm are computed very accurately. Many FFT implementations, including FFTW and common manufacturer-optimized libraries, therefore use precomputed tables of twiddle factors calculated by means of standard library functions (which compute trigonometric constants to roughly machine precision). The other common method to compute twiddle factors is to use a trigonometric recurrence formula—this saves memory (and cache), but almost all recurrences have errors that grow as O(n‾√) , O(n) or even O(n2) which lead to corresponding errors in the FFT.


There are, in fact, trigonometric recurrences with the same logarithmic error growth as the FFT, but these seem more difficult to implement efficiently; they require that a table of Θ(logn) values be stored and updated as the recurrence progresses. Instead, in order to gain at least some of the benefits of a trigonometric recurrence (reduced memory pressure at the expense of more arithmetic), FFTW includes several ways to compute a much smaller twiddle table, from which the desired entries can be computed accurately on the fly using a bounded number (usually <3) of complex multiplications. For example, instead of a twiddle table with n entries ωkn , FFTW can use two tables with Θ(n‾√) entries each, so that ωkn is computed by multiplying an entry in one table (indexed with the low-order bits of k ) by an entry in the other table (indexed with the high-order bits of k ).

[ed.: Nicholas Higham's "Accuracy and Stability of Numerical Algorithms" seems like a good reference for this kind of analysis.]
nibble  pdf  papers  programming  systems  numerics  nitty-gritty  intricacy  approximation  accuracy  types  sci-comp  multi  q-n-a  stackex  hmm  oly-programming  accretion  formal-methods  yak-shaving  wiki  reference  algorithms  yoga  ground-up  divide-and-conquer  fourier  books  tidbits  chart  caltech  nostalgia 
may 2019 by nhaliday
Teach debugging
A friend of mine and I couldn't understand why some people were having so much trouble; the material seemed like common sense. The Feynman Method was the only tool we needed.

1. Write down the problem
2. Think real hard
3. Write down the solution

The Feynman Method failed us on the last project: the design of a divider, a real-world-scale project an order of magnitude more complex than anything we'd been asked to tackle before. On the day he assigned the project, the professor exhorted us to begin early. Over the next few weeks, we heard rumors that some of our classmates worked day and night without making progress.


And then, just after midnight, a number of our newfound buddies from dinner reported successes. Half of those who started from scratch had working designs. Others were despondent, because their design was still broken in some subtle, non-obvious way. As I talked with one of those students, I began poring over his design. And after a few minutes, I realized that the Feynman method wasn't the only way forward: it should be possible to systematically apply a mechanical technique repeatedly to find the source of our problems. Beneath all the abstractions, our projects consisted purely of NAND gates (woe to those who dug around our toolbox enough to uncover dynamic logic), which outputs a 0 only when both inputs are 1. If the correct output is 0, both inputs should be 1. The input that isn't is in error, an error that is, itself, the output of a NAND gate where at least one input is 0 when it should be 1. We applied this method recursively, finding the source of all the problems in both our designs in under half an hour.

How To Debug Any Program: https://www.blinddata.com/blog/how-to-debug-any-program-9
May 8th 2019 by Saketh Are

Start by Questioning Everything


When a program is behaving unexpectedly, our attention tends to be drawn first to the most complex portions of the code. However, mistakes can come in all forms. I've personally been guilty of rushing to debug sophisticated portions of my code when the real bug was that I forgot to read in the input file. In the following section, we'll discuss how to reliably focus our attention on the portions of the program that need correction.

Then Question as Little as Possible

Suppose that we have a program and some input on which its behavior doesn’t match our expectations. The goal of debugging is to narrow our focus to as small a section of the program as possible. Once our area of interest is small enough, the value of the incorrect output that is being produced will typically tell us exactly what the bug is.

In order to catch the point at which our program diverges from expected behavior, we must inspect the intermediate state of the program. Suppose that we select some point during execution of the program and print out all values in memory. We can inspect the results manually and decide whether they match our expectations. If they don't, we know for a fact that we can focus on the first half of the program. It either contains a bug, or our expectations of what it should produce were misguided. If the intermediate state does match our expectations, we can focus on the second half of the program. It either contains a bug, or our understanding of what input it expects was incorrect.

Question Things Efficiently

For practical purposes, inspecting intermediate state usually doesn't involve a complete memory dump. We'll typically print a small number of variables and check whether they have the properties we expect of them. Verifying the behavior of a section of code involves:

1. Before it runs, inspecting all values in memory that may influence its behavior.
2. Reasoning about the expected behavior of the code.
3. After it runs, inspecting all values in memory that may be modified by the code.

Reasoning about expected behavior is typically the easiest step to perform even in the case of highly complex programs. Practically speaking, it's time-consuming and mentally strenuous to write debug output into your program and to read and decipher the resulting values. It is therefore advantageous to structure your code into functions and sections that pass a relatively small amount of information between themselves, minimizing the number of values you need to inspect.


Finding the Right Question to Ask

We’ve assumed so far that we have available a test case on which our program behaves unexpectedly. Sometimes, getting to that point can be half the battle. There are a few different approaches to finding a test case on which our program fails. It is reasonable to attempt them in the following order:

1. Verify correctness on the sample inputs.
2. Test additional small cases generated by hand.
3. Adversarially construct corner cases by hand.
4. Re-read the problem to verify understanding of input constraints.
5. Design large cases by hand and write a program to construct them.
6. Write a generator to construct large random cases and a brute force oracle to verify outputs.
techtariat  dan-luu  engineering  programming  debugging  IEEE  reflection  stories  education  higher-ed  checklists  iteration-recursion  divide-and-conquer  thinking  ground-up  nitty-gritty  giants  feynman  error  input-output  structure  composition-decomposition  abstraction  systematic-ad-hoc  reduction  teaching  state  correctness  multi  oly  oly-programming  metabuch  neurons  problem-solving  wire-guided  marginal  strategy  tactics  methodology  simplification-normalization 
may 2019 by nhaliday
Hyperbolic angle - Wikipedia
A unit circle {\displaystyle x^{2}+y^{2}=1} x^2 + y^2 = 1 has a circular sector with an area half of the circular angle in radians. Analogously, a unit hyperbola {\displaystyle x^{2}-y^{2}=1} {\displaystyle x^{2}-y^{2}=1} has a hyperbolic sector with an area half of the hyperbolic angle.
nibble  math  trivia  wiki  reference  physics  relativity  concept  atoms  geometry  ground-up  characterization  measure  definition  plots  calculation  nitty-gritty  direction  metrics  manifolds 
november 2017 by nhaliday
What is the connection between special and general relativity? - Physics Stack Exchange
Special relativity is the "special case" of general relativity where spacetime is flat. The speed of light is essential to both.
nibble  q-n-a  overflow  physics  relativity  explanation  synthesis  hi-order-bits  ground-up  gravity  summary  aphorism  differential  geometry 
november 2017 by nhaliday
What is the difference between general and special relativity? - Quora
General Relativity is, quite simply, needed to explain gravity.

Special Relativity is the special case of GR, when the metric is flat — which means no gravity.

You need General Relativity when the metric gets all curvy, and when things start to experience gravitation.
nibble  q-n-a  qra  explanation  physics  relativity  synthesis  hi-order-bits  ground-up  gravity  summary  aphorism  differential  geometry 
november 2017 by nhaliday
Any particular gene has a specific location (its "locus") on a particular chromosome. For any two genes (or loci) alpha and beta, we can ask "What is the recombination frequency between them?" If the genes are on different chromosomes, the answer is 50% (independent assortment). If the two genes are on the same chromosome, the recombination frequency will be somewhere in the range from 0 to 50%. The "map unit" (1 cM) is the genetic map distance that corresponds to a recombination frequency of 1%. In large chromosomes, the cumulative map distance may be much greater than 50cM, but the maximum recombination frequency is 50%. Why? In large chromosomes, there is enough length to allow for multiple cross-overs, so we have to ask what result we expect for random multiple cross-overs.

1. How is it that random multiple cross-overs give the same result as independent assortment?

Figure 5.12 shows how the various double cross-over possibilities add up, resulting in gamete genotype percentages that are indistinguisable from independent assortment (50% parental type, 50% non-parental type). This is a very important figure. It provides the explanation for why genes that are far apart on a very large chromosome sort out in crosses just as if they were on separate chromosomes.

2. Is there a way to measure how close together two crossovers can occur involving the same two chromatids? That is, how could we measure whether there is spacial "interference"?

Figure 5.13 shows how a measurement of the gamete frequencies resulting from a "three point cross" can answer this question. If we would get a "lower than expected" occurrence of recombinant genotypes aCb and AcB, it would suggest that there is some hindrance to the two cross-overs occurring this close together. Crosses of this type in Drosophila have shown that, in this organism, double cross-overs do not occur at distances of less than about 10 cM between the two cross-over sites. ( Textbook, page 196. )

3. How does all of this lead to the "mapping function", the mathematical (graphical) relation between the observed recombination frequency (percent non-parental gametes) and the cumulative genetic distance in map units?

Figure 5.14 shows the result for the two extremes of "complete interference" and "no interference". The situation for real chromosomes in real organisms is somewhere between these extremes, such as the curve labelled "interference decreasing with distance".
org:junk  org:edu  explanation  faq  nibble  genetics  genomics  bio  ground-up  magnitude  data  flux-stasis  homo-hetero  measure  orders  metric-space  limits  measurement 
october 2017 by nhaliday
Best Topology Olympiad ***EVER*** - Affine Mess - Quora
Most people take courses in topology, algebraic topology, knot theory, differential topology and what have you without once doing anything with a finite topological space. There may have been some quirky questions about such spaces early on in a point-set topology course, but most of us come out of these courses thinking that finite topological spaces are either discrete or only useful as an exotic counterexample to some standard separation property. The mere idea of calculating the fundamental group for a 4-point space seems ludicrous.

Only it’s not. This is a genuine question, not a joke, and I find it both hilarious and super educational. DO IT!!
nibble  qra  announcement  math  geometry  topology  puzzles  rec-math  oly  links  math.AT  ground-up  finiteness  math.GN 
october 2017 by nhaliday
Resonance in a Pendulum - YouTube
The vibration of any given washer is able to transmit its energy only to another washer with exactly the same frequency. Since the length of a pendulum determines its frequency of vibration, each pendulum can only set another pendulum vibrating if it has the same length.
nibble  video  social  physics  mechanics  waves  oscillation  synchrony  flux-stasis  increase-decrease  concrete  ground-up  dirty-hands  phys-energy  frequency  spreading 
september 2017 by nhaliday
Power of a point - Wikipedia
The power of point P (see in Figure 1) can be defined equivalently as the product of distances from the point P to the two intersection points of any ray emanating from P.
nibble  math  geometry  spatial  ground-up  concept  metrics  invariance  identity  atoms  wiki  reference  measure  yoga  calculation 
september 2017 by nhaliday
Centers of gravity in non-uniform fields - Wikipedia
In physics, a center of gravity of a material body is a point that may be used for a summary description of gravitational interactions. In a uniform gravitational field, the center of mass serves as the center of gravity. This is a very good approximation for smaller bodies near the surface of Earth, so there is no practical need to distinguish "center of gravity" from "center of mass" in most applications, such as engineering and medicine.

In a non-uniform field, gravitational effects such as potential energy, force, and torque can no longer be calculated using the center of mass alone. In particular, a non-uniform gravitational field can produce a torque on an object, even about an axis through the center of mass. The center of gravity seeks to explain this effect. Formally, a center of gravity is an application point of the resultant gravitational force on the body. Such a point may not exist, and if it exists, it is not unique. One can further define a unique center of gravity by approximating the field as either parallel or spherically symmetric.

The concept of a center of gravity as distinct from the center of mass is rarely used in applications, even in celestial mechanics, where non-uniform fields are important. Since the center of gravity depends on the external field, its motion is harder to determine than the motion of the center of mass. The common method to deal with gravitational torques is a field theory.
nibble  wiki  reference  physics  mechanics  intricacy  atoms  expectancy  spatial  direction  ground-up  concept  existence  uniqueness  homo-hetero  gravity  gotchas 
september 2017 by nhaliday
Drude model - Wikipedia
The Drude model of electrical conduction was proposed in 1900[1][2] by Paul Drude to explain the transport properties of electrons in materials (especially metals). The model, which is an application of kinetic theory, assumes that the microscopic behavior of electrons in a solid may be treated classically and looks much like _a pinball machine_, with a sea of constantly jittering electrons bouncing and re-bouncing off heavier, relatively immobile positive ions.

The two most significant results of the Drude model are an electronic equation of motion,

d<p(t)>/dt = q(E + 1/m <p(t)> x B) - <p(t)>/τ

and a linear relationship between current density J and electric field E,

J = (nq^2τ/m) E

latter is Ohm's law
nibble  physics  electromag  models  local-global  stat-mech  identity  atoms  wiki  reference  ground-up  cartoons 
september 2017 by nhaliday
Introduction to Scaling Laws

Galileo’s Discovery of Scaling Laws: https://www.mtholyoke.edu/~mpeterso/classes/galileo/scaling8.pdf
Days 1 and 2 of Two New Sciences

An example of such an insight is “the surface of a small solid is comparatively greater than that of a large one” because the surface goes like the square of a linear dimension, but the volume goes like the cube.5 Thus as one scales down macroscopic objects, forces on their surfaces like viscous drag become relatively more important, and bulk forces like weight become relatively less important. Galileo uses this idea on the First Day in the context of resistance in free fall, as an explanation for why similar objects of different size do not fall exactly together, but the smaller one lags behind.
nibble  org:junk  exposition  lecture-notes  physics  mechanics  street-fighting  problem-solving  scale  magnitude  estimate  fermi  mental-math  calculation  nitty-gritty  multi  scitariat  org:bleg  lens  tutorial  guide  ground-up  tricki  skeleton  list  cheatsheet  identity  levers  hi-order-bits  yoga  metabuch  pdf  article  essay  history  early-modern  europe  the-great-west-whale  science  the-trenches  discovery  fluid  architecture  oceans  giants  tidbits  elegance 
august 2017 by nhaliday
Inscribed angle - Wikipedia
- for triangle w/ one side = a diameter, draw isosceles triangle and use supplementary angle identities
- otherwise draw second triangle w/ side = a diameter, and use above result twice
nibble  math  geometry  spatial  ground-up  wiki  reference  proofs  identity  levers  yoga 
august 2017 by nhaliday
Analysis of variance - Wikipedia
Analysis of variance (ANOVA) is a collection of statistical models used to analyze the differences among group means and their associated procedures (such as "variation" among and between groups), developed by statistician and evolutionary biologist Ronald Fisher. In the ANOVA setting, the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether or not the means of several groups are equal, and therefore generalizes the t-test to more than two groups. ANOVAs are useful for comparing (testing) three or more means (groups or variables) for statistical significance. It is conceptually similar to multiple two-sample t-tests, but is more conservative (results in less type I error) and is therefore suited to a wide range of practical problems.

good pic: https://en.wikipedia.org/wiki/Analysis_of_variance#Motivating_example

tutorial by Gelman: http://www.stat.columbia.edu/~gelman/research/published/econanova3.pdf

so one way to think of partitioning the variance:
y_ij = alpha_i + beta_j + eps_ij
Var(y_ij) = Var(alpha_i) + Var(beta_j) + Cov(alpha_i, beta_j) + Var(eps_ij)
and alpha_i, beta_j are independent, so Cov(alpha_i, beta_j) = 0

can you make this work w/ interaction effects?
data-science  stats  methodology  hypothesis-testing  variance-components  concept  conceptual-vocab  thinking  wiki  reference  nibble  multi  visualization  visual-understanding  pic  pdf  exposition  lecture-notes  gelman  scitariat  tutorial  acm  ground-up  yoga 
july 2017 by nhaliday
co.combinatorics - Classification of Platonic solids - MathOverflow
My question is very basic: where can I find a complete (and hopefully self-contained) proof of the classification of Platonic solids? In all the references that I found, they use Euler's formula v−e+f=2v−e+f=2 to show that there are exactly five possible triples (v,e,f)(v,e,f). But of course this is not a complete proof because it does not rule out the possibility of different configurations or deformations. Has anyone ever written up a complete proof of this statement?!


This is a classical question. Here is my reading of it: Why is there a unique polytope with given combinatorics of faces, which are all regular polygons? Of course, for simple polytopes (tetrahedron, cube, dodecahedron) this is clear, but for the octahedron and icosahedron this is less clear.

The answer lies in the Cauchy's theorem. It was Legendre, while writing his Elements of Geometry and Trigonometry, noticed that Euclid's proof is incomplete in the Elements. Curiously, Euclid finds both radii of inscribed and circumscribed spheres (correctly) without ever explaining why they exist. Cauchy worked out a proof while still a student in 1813, more or less specifically for this purpose. The proof also had a technical gap which was found and patched up by Steinitz in 1920s.

The complete (corrected) proof can be found in the celebrated Proofs from the Book, or in Marcel Berger's Geometry. My book gives a bit more of historical context and some soft arguments (ch. 19). It's worth comparing this proof with (an erroneous) pre-Steinitz exposition, say in Hadamard's Leçons de Géométrie Elémentaire II, or with an early post-Steinitz correct but tedious proof given in (otherwise, excellent) Alexandrov's monograph (see also ch.26 in my book which compares all the approaches).

P.S. Note that Coxeter in Regular Polytopes can completely avoid this issue but taking a different (modern) definition of the regular polytopes (which are symmetric under group actions). For a modern exposition and the state of art of this approach, see McMullen and Schulte's Abstract Regular Polytopes.

q-n-a  overflow  math  topology  geometry  math.CO  history  iron-age  mediterranean  the-classics  multi  curiosity  clarity  proofs  nibble  wiki  reference  characterization  uniqueness  list  ground-up  grokkability-clarity 
july 2017 by nhaliday
Harmonic mean - Wikipedia
The harmonic mean is a Schur-concave function, and dominated by the minimum of its arguments, in the sense that for any positive set of arguments, {\displaystyle \min(x_{1}\ldots x_{n})\leq H(x_{1}\ldots x_{n})\leq n\min(x_{1}\ldots x_{n})} . Thus, the harmonic mean cannot be made arbitrarily large by changing some values to bigger ones (while having at least one value unchanged).

more generally, for the weighted mean w/ Pr(x_i)=t_i, H(x1,...,xn) <= x_i/t_i
nibble  math  properties  estimate  concept  definition  wiki  reference  extrema  magnitude  expectancy  metrics  ground-up 
july 2017 by nhaliday
Lecture 6: Nash Equilibrum Existence
- For mixed strategy profile p ∈ Δ(A), let g_ij(p) = gain for player i to switch to pure strategy j.
- Define y: Δ(A) -> Δ(A) by y_ij(p) ∝ p_ij + g_ij(p) (normalizing constant = 1 + ∑_k g_ik(p)).
- Look at fixed point of y.
pdf  nibble  lecture-notes  exposition  acm  game-theory  proofs  math  topology  existence  fixed-point  simplex  equilibrium  ground-up 
june 2017 by nhaliday
Fundamental Theorems of Evolution: The American Naturalist: Vol 0, No 0
I suggest that the most fundamental theorem of evolution is the Price equation, both because of its simplicity and broad scope and because it can be used to derive four other familiar results that are similarly fundamental: Fisher’s average-excess equation, Robertson’s secondary theorem of natural selection, the breeder’s equation, and Fisher’s fundamental theorem. These derivations clarify both the relationships behind these results and their assumptions. Slightly less fundamental results include those for multivariate evolution and social selection. A key feature of fundamental theorems is that they have great simplicity and scope, which are often achieved by sacrificing perfect accuracy. Quantitative genetics has been more productive of fundamental theorems than population genetics, probably because its empirical focus on unknown genotypes freed it from the tyranny of detail and allowed it to focus on general issues.
study  essay  bio  evolution  population-genetics  fisher  selection  EGT  dynamical  exposition  methodology  🌞  big-picture  levers  list  nibble  article  chart  explanation  clarity  trees  ground-up  ideas  grokkability-clarity 
march 2017 by nhaliday
More on Multivariate Gaussians
Fact #1: mean and covariance uniquely determine distribution
Fact #3: closure under sum, marginalizing, and conditioning
covariance of conditional distribution is given by a Schur complement (independent of x_B. is that obvious?)
pdf  exposition  lecture-notes  stanford  nibble  distribution  acm  machine-learning  probability  levers  calculation  ground-up  characterization  rigidity  closure  nitty-gritty  linear-algebra  properties 
february 2017 by nhaliday
Hoeffding’s Inequality
basic idea of standard pf: bound e^{tX} by line segment (convexity) then use Taylor expansion (in p = b/(b-a), the fraction of range to right of 0) of logarithm
pdf  lecture-notes  exposition  nibble  concentration-of-measure  estimate  proofs  ground-up  acm  probability  series  s:null 
february 2017 by nhaliday
Einstein's Most Famous Thought Experiment
When Einstein abandoned an emission theory of light, he had also to abandon the hope that electrodynamics could be made to conform to the principle of relativity by the normal sorts of modifications to electrodynamic theory that occupied the theorists of the second half of the 19th century. Instead Einstein knew he must resort to extraordinary measures. He was willing to seek realization of his goal in a re-examination of our basic notions of space and time. Einstein concluded his report on his youthful thought experiment:

"One sees that in this paradox the germ of the special relativity theory is already contained. Today everyone knows, of course, that all attempts to clarify this paradox satisfactorily were condemned to failure as long as the axiom of the absolute character of time, or of simultaneity, was rooted unrecognized in the unconscious. To recognize clearly this axiom and its arbitrary character already implies the essentials of the solution of the problem."
einstein  giants  physics  history  stories  gedanken  exposition  org:edu  electromag  relativity  nibble  innovation  novelty  the-trenches  synchrony  discovery  🔬  org:junk  science  absolute-relative  visuo  explanation  ground-up  clarity  state  causation  intuition  ideas  mostly-modern  pre-ww2  marginal  grokkability-clarity 
february 2017 by nhaliday
A brief philosophical discussion:
Measure theory, as much as any branch of mathematics, is an area where it is important to be acquainted with the basic notions and statements, but not desperately important to be acquainted with the detailed proofs, which are often rather unilluminating. One should always have in a mind a place where one could go and look if one ever did need to understand a proof: for me, that place is Rudin’s Real and Complex Analysis (Rudin’s “red book”).
gowers  pdf  math  math.CA  math.FA  philosophy  measure  exposition  synthesis  big-picture  hi-order-bits  ergodic  ground-up  summary  roadmap  mathtariat  proofs  nibble  unit  integral  zooming  p:whenever 
february 2017 by nhaliday
Cauchy-Schwarz inequality and Hölder's inequality - Mathematics Stack Exchange
- Cauchy-Schwarz (special case of Holder's inequality where p=q=1/2) implies Holder's inequality
- pith: define potential F(t) = int f^{pt} g^{q(1-t)}, show log F is midpoint-convex hence convex, then apply convexity between F(0) and F(1) for F(1/p) = ||fg||_1
q-n-a  overflow  math  estimate  proofs  ground-up  math.FA  inner-product  tidbits  norms  duality  nibble  integral 
january 2017 by nhaliday
interpretation - How to understand degrees of freedom? - Cross Validated
From Wikipedia, there are three interpretations of the degrees of freedom of a statistic:

In statistics, the number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary.

Estimates of statistical parameters can be based upon different amounts of information or data. The number of independent pieces of information that go into the estimate of a parameter is called the degrees of freedom (df). In general, the degrees of freedom of an estimate of a parameter is equal to the number of independent scores that go into the estimate minus the number of parameters used as intermediate steps in the estimation of the parameter itself (which, in sample variance, is one, since the sample mean is the only intermediate step).

Mathematically, degrees of freedom is the dimension of the domain of a random vector, or essentially the number of 'free' components: how many components need to be known before the vector is fully determined.


This is a subtle question. It takes a thoughtful person not to understand those quotations! Although they are suggestive, it turns out that none of them is exactly or generally correct. I haven't the time (and there isn't the space here) to give a full exposition, but I would like to share one approach and an insight that it suggests.

Where does the concept of degrees of freedom (DF) arise? The contexts in which it's found in elementary treatments are:

- The Student t-test and its variants such as the Welch or Satterthwaite solutions to the Behrens-Fisher problem (where two populations have different variances).
- The Chi-squared distribution (defined as a sum of squares of independent standard Normals), which is implicated in the sampling distribution of the variance.
- The F-test (of ratios of estimated variances).
- The Chi-squared test, comprising its uses in (a) testing for independence in contingency tables and (b) testing for goodness of fit of distributional estimates.

In spirit, these tests run a gamut from being exact (the Student t-test and F-test for Normal variates) to being good approximations (the Student t-test and the Welch/Satterthwaite tests for not-too-badly-skewed data) to being based on asymptotic approximations (the Chi-squared test). An interesting aspect of some of these is the appearance of non-integral "degrees of freedom" (the Welch/Satterthwaite tests and, as we will see, the Chi-squared test). This is of especial interest because it is the first hint that DF is not any of the things claimed of it.


Having been alerted by these potential ambiguities, let's hold up the Chi-squared goodness of fit test for examination, because (a) it's simple, (b) it's one of the common situations where people really do need to know about DF to get the p-value right and (c) it's often used incorrectly. Here's a brief synopsis of the least controversial application of this test:


This, many authorities tell us, should have (to a very close approximation) a Chi-squared distribution. But there's a whole family of such distributions. They are differentiated by a parameter νν often referred to as the "degrees of freedom." The standard reasoning about how to determine νν goes like this

I have kk counts. That's kk pieces of data. But there are (functional) relationships among them. To start with, I know in advance that the sum of the counts must equal nn. That's one relationship. I estimated two (or pp, generally) parameters from the data. That's two (or pp) additional relationships, giving p+1p+1 total relationships. Presuming they (the parameters) are all (functionally) independent, that leaves only k−p−1k−p−1 (functionally) independent "degrees of freedom": that's the value to use for νν.

The problem with this reasoning (which is the sort of calculation the quotations in the question are hinting at) is that it's wrong except when some special additional conditions hold. Moreover, those conditions have nothing to do with independence (functional or statistical), with numbers of "components" of the data, with the numbers of parameters, nor with anything else referred to in the original question.


Things went wrong because I violated two requirements of the Chi-squared test:

1. You must use the Maximum Likelihood estimate of the parameters. (This requirement can, in practice, be slightly violated.)
2. You must base that estimate on the counts, not on the actual data! (This is crucial.)


The point of this comparison--which I hope you have seen coming--is that the correct DF to use for computing the p-values depends on many things other than dimensions of manifolds, counts of functional relationships, or the geometry of Normal variates. There is a subtle, delicate interaction between certain functional dependencies, as found in mathematical relationships among quantities, and distributions of the data, their statistics, and the estimators formed from them. Accordingly, it cannot be the case that DF is adequately explainable in terms of the geometry of multivariate normal distributions, or in terms of functional independence, or as counts of parameters, or anything else of this nature.

We are led to see, then, that "degrees of freedom" is merely a heuristic that suggests what the sampling distribution of a (t, Chi-squared, or F) statistic ought to be, but it is not dispositive. Belief that it is dispositive leads to egregious errors. (For instance, the top hit on Google when searching "chi squared goodness of fit" is a Web page from an Ivy League university that gets most of this completely wrong! In particular, a simulation based on its instructions shows that the chi-squared value it recommends as having 7 DF actually has 9 DF.)
q-n-a  overflow  stats  data-science  concept  jargon  explanation  methodology  things  nibble  degrees-of-freedom  clarity  curiosity  manifolds  dimensionality  ground-up  intricacy  hypothesis-testing  examples  list  ML-MAP-E  gotchas  grokkability-clarity 
january 2017 by nhaliday
teaching - Intuitive explanation for dividing by $n-1$ when calculating standard deviation? - Cross Validated
The standard deviation calculated with a divisor of n-1 is a standard deviation calculated from the sample as an estimate of the standard deviation of the population from which the sample was drawn. Because the observed values fall, on average, closer to the sample mean than to the population mean, the standard deviation which is calculated using deviations from the sample mean underestimates the desired standard deviation of the population. Using n-1 instead of n as the divisor corrects for that by making the result a little bit bigger.

Note that the correction has a larger proportional effect when n is small than when it is large, which is what we want because when n is larger the sample mean is likely to be a good estimator of the population mean.


A common one is that the definition of variance (of a distribution) is the second moment recentered around a known, definite mean, whereas the estimator uses an estimated mean. This loss of a degree of freedom (given the mean, you can reconstitute the dataset with knowledge of just n−1 of the data values) requires the use of n−1 rather than nn to "adjust" the result.
q-n-a  overflow  stats  acm  intuition  explanation  bias-variance  methodology  moments  nibble  degrees-of-freedom  sampling-bias  generalization  dimensionality  ground-up  intricacy 
january 2017 by nhaliday
« earlier      
per page:    204080120160

bundles : initmathproblem-solvingvague

related tags

aaronson  absolute-relative  abstraction  academia  accretion  accuracy  acm  acmtariat  advanced  adversarial  advice  age-generation  aging  ai  alg-combo  algebra  algorithmic-econ  algorithms  amortization-potential  AMT  analogy  analysis  analytical-holistic  announcement  aphorism  apollonian-dionysian  applicability-prereqs  applications  approximation  architecture  arrows  art  article  assembly  atmosphere  atoms  audio  automata-languages  bandits  bayesian  ben-recht  best-practices  better-explained  bias-variance  big-list  big-peeps  big-picture  bio  biodet  bioinformatics  bits  boltzmann  books  boolean-analysis  business  c(pp)  caching  calculation  caltech  canon  career  cartoons  causation  characterization  chart  cheatsheet  checking  checklists  chemistry  circuits  clarity  classic  clever-rats  closure  cmu  cocktail  code-dive  code-organizing  coding-theory  commentary  common-case  communication  comparison  competition  complexity  composition-decomposition  computation  concentration-of-measure  concept  conceptual-vocab  concrete  concurrency  conference  confluence  confusion  contest  contradiction  contrarianism  convexity-curvature  cool  core-rats  correctness  cost-benefit  coupling-cohesion  courage  course  critique  crypto  cs  culture  curiosity  curvature  dan-luu  dark-arts  data  data-science  data-structures  database  dbs  debugging  decision-making  deep-learning  definition  degrees-of-freedom  density  devops  differential  dimensionality  direction  dirty-hands  discovery  discrete  discussion  distributed  distribution  divide-and-conquer  draft  duality  dumb-ML  dynamic  dynamical  dysgenics  early-modern  econometrics  education  EGT  einstein  electromag  elegance  elite  embedded  encyclopedic  engineering  ensembles  entropy-like  equilibrium  ergodic  erik-demaine  error  essay  estimate  europe  evolution  examples  existence  expectancy  expert  expert-experience  explanans  explanation  exploratory  explore-exploit  exposition  extratricky  extrema  faq  fedja  fermi  feynman  fields  finance  finiteness  fire  fisher  fixed-point  fluid  flux-stasis  focs  foreign-lang  formal-methods  fourier  frequency  frontend  frontier  functional  game-theory  games  gedanken  gelman  generalization  genetics  genomics  geometry  giants  gotchas  gowers  grad-school  gradient-descent  graph-theory  graphical-models  graphs  gravity  grokkability-clarity  ground-up  growth  GT-101  guessing  guide  hamming  hardware  harvard  heavyweights  heuristic  hi-order-bits  high-dimension  high-variance  higher-ed  history  hmm  hn  homo-hetero  homogeneity  human-capital  humility  hypothesis-testing  ideas  identification-equivalence  identity  idk  IEEE  increase-decrease  induction  inference  info-dynamics  info-foraging  infographic  information-theory  init  inner-product  innovation  input-output  insight  instinct  integral  interdisciplinary  intricacy  intuition  invariance  iron-age  isotropy  iteration-recursion  iterative-methods  jargon  javascript  jelani-nelson  jobs  judgement  jvm  kernels  kinship  knowledge  large-factor  learning  learning-theory  lecture-notes  lectures  legacy  len:long  lens  lesswrong  let-me-see  levers  lifts-projections  limits  linear-algebra  linear-models  linear-programming  linearity  links  linux  list  local-global  logic  long-term  lower-bounds  luca-trevisan  machine-learning  magnitude  management  manifolds  map-territory  marginal  marketing  markov  matching  math  math.AC  math.AT  math.CA  math.CO  math.CV  math.DS  math.FA  math.GN  math.GR  math.NT  math.RT  mathtariat  matrix-factorization  measure  measurement  mechanics  mechanism-design  mediterranean  mental-math  meta:math  meta:research  meta:science  metabuch  metal-to-virtual  metameta  methodology  metric-space  metrics  michael-nielsen  microsoft  minimum-viable  mit  ML-MAP-E  model-class  models  moments  monte-carlo  mostly-modern  motivation  mrtz  multi  multiplicative  mutation  narrative  nature  networking  neurons  news  nibble  nitty-gritty  nlp  no-go  nonlinearity  nonparametric  norms  nostalgia  notetaking  novelty  numerics  objektbuch  occam  oceans  ocw  off-convex  old-anglo  oly  oly-programming  online-learning  operational  optics  optimate  optimization  orders  ORFE  org:biz  org:bleg  org:edu  org:inst  org:junk  org:lite  org:mat  org:med  org:nat  org:popup  organization  orourke  os  oscillation  overflow  p:*  p:**  p:***  p:null  p:someday  p:whenever  PAC  papers  paradox  parametric  parsimony  pdf  performance  perturbation  philosophy  phys-energy  physics  pic  pigeonhole-markov  piracy  plots  polynomials  pop-structure  population-genetics  positivity  practice  pragmatic  pre-ww2  preimage  preprint  presentation  princeton  prioritizing  probabilistic-method  probability  problem-solving  productivity  programming  proof-systems  proofs  properties  pseudorandomness  psychometrics  puzzles  q-n-a  qra  QTL  quantifiers-sums  quantum  quantum-info  quantum-money  quixotic  quiz  quotes  rand-approx  rand-complexity  random  ranking  reading  reason  rec-math  recommendations  recruiting  reduction  reference  reflection  regression  regression-to-mean  regularizer  reinforcement  relativity  relativization  relaxation  research  retention  retrofit  rigidity  rigorous-crypto  roadmap  robust  rounding  ryan-odonnell  s-factor  s:*  s:**  s:***  s:null  sample-complexity  sampling  sampling-bias  sanjeev-arora  scale  scholar  scholar-pack  sci-comp  science  scitariat  SDP  sebastien-bubeck  security  selection  sensitivity  series  shannon  short-circuit  signal-noise  signum  simplex  simplification-normalization  simulation  skeleton  sky  slides  smoothness  social  social-science  soft-question  space  space-complexity  spatial  spreading  stackex  stanford  stat-mech  state  stats  stoc  stock-flow  store  stories  strategy  street-fighting  structure  study  studying  stylized-facts  submodular  success  summary  sv  symmetry  synchrony  synthesis  system-design  systematic-ad-hoc  systems  tactics  talks  tcs  tcstariat  teaching  tech  technical-writing  techtariat  temperature  tensors  the-classics  the-great-west-whale  the-trenches  the-world-is-just-atoms  thermo  things  thinking  thurston  tidbits  tim-roughgarden  time-complexity  toolkit  top-n  topology  toys  tradeoffs  trees  tricki  tricks  trivia  turing  tutorial  twitter  types  UGC  uniqueness  unit  unix  unsupervised  usa  usaco-ioi  vague  variance-components  vc-dimension  video  virtu  visual-understanding  visualization  visuo  war  water  waves  web  wigderson  wiki  winter-2017  wire-guided  wisdom  wordlessness  working-stiff  wormholes  worrydream  writing  yak-shaving  yoga  zooming  🌞  🎓  👳  🔬  🖥  🦉 

Copy this bookmark: