ml-map-e   21

Fitting a Structural Equation Model
seems rather unrigorous: nonlinear optimization, possibility of nonconvergence, doesn't even mention local vs. global optimality...
pdf  slides  lectures  acm  stats  hypothesis-testing  graphs  graphical-models  latent-variables  model-class  optimization  nonlinearity  gotchas  nibble  ML-MAP-E  iteration-recursion  convergence 
november 2017 by nhaliday
Atrocity statistics from the Roman Era
Christian Martyrs [make link]
Gibbon, Decline & Fall v.2 ch.XVI: < 2,000 k. under Roman persecution.
Ludwig Hertling ("Die Zahl der Märtyrer bis 313", 1944) estimated 100,000 Christians killed between 30 and 313 CE. (cited -- unfavorably -- by David Henige, Numbers From Nowhere, 1998)
Catholic Encyclopedia, "Martyr": number of Christian martyrs under the Romans unknown, unknowable. Origen says not many. Eusebius says thousands.


General population decline during The Fall of Rome: 7,000,000 [make link]
- Colin McEvedy, The New Penguin Atlas of Medieval History (1992)
- From 2nd Century CE to 4th Century CE: Empire's population declined from 45M to 36M [i.e. 9M]
- From 400 CE to 600 CE: Empire's population declined by 20% [i.e. 7.2M]
- Paul Bairoch, Cities and economic development: from the dawn of history to the present, p.111
- "The population of Europe except Russia, then, having apparently reached a high point of some 40-55 million people by the start of the third century [ca.200 C.E.], seems to have fallen by the year 500 to about 30-40 million, bottoming out at about 20-35 million around 600." [i.e. ca.20M]
- Francois Crouzet, A History of the European Economy, 1000-2000 (University Press of Virginia: 2001) p.1.
- "The population of Europe (west of the Urals) in c. AD 200 has been estimated at 36 million; by 600, it had fallen to 26 million; another estimate (excluding ‘Russia’) gives a more drastic fall, from 44 to 22 million." [i.e. 10M or 22M]
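
The bracketed "[i.e. ...]" figures in the list above are plain differences of the quoted estimates. A minimal sketch of that arithmetic follows; the variable names are made up for illustration, and it assumes the 20% decline is taken from the 36M figure, which is what the 7.2M implies.

```python
# Reproduce the bracketed "[i.e. ...]" figures from the quoted estimates.
# All names here are illustrative, not from the source.

# McEvedy: 45M -> 36M between the 2nd and 4th centuries CE
decline_2nd_to_4th = 45_000_000 - 36_000_000

# McEvedy: 20% decline from 400 to 600 CE (assumption: applied to the 36M figure)
decline_400_to_600 = 36_000_000 * 20 // 100

# Bairoch: 40-55M around 200 CE down to 20-35M around 600 CE
bairoch_low = 40_000_000 - 20_000_000
bairoch_high = 55_000_000 - 35_000_000

# Crouzet: 36M -> 26M, or 44M -> 22M (excluding 'Russia')
crouzet_a = 36_000_000 - 26_000_000
crouzet_b = 44_000_000 - 22_000_000

print(decline_2nd_to_4th, decline_400_to_600)
print(bairoch_low, bairoch_high)
print(crouzet_a, crouzet_b)
```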

The geometric mean of these two extremes would come to 4½ per day, which is a credible daily rate for the really bad years.

why geometric mean? can you get it as the MLE given min{X1, ..., Xn} and max{X1, ..., Xn} for {X_i} iid Poissons? some kinda limit? think it might just be a rule of thumb.

yeah, it's a rule of thumb. found it in his book (epub).
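
for reference, a minimal sketch of what the rule of thumb computes: the geometric mean of a low and a high estimate is the midpoint on a log scale. the two extremes behind the 4½ figure aren't in the excerpt, so the inputs below are hypothetical placeholders, and `geometric_mean` is just an illustrative helper name.

```python
import math

def geometric_mean(low, high):
    """Geometric mean of two positive estimates: sqrt(low * high)."""
    if low <= 0 or high <= 0:
        raise ValueError("both estimates must be positive")
    return math.sqrt(low * high)

# Hypothetical extremes spanning two orders of magnitude (not from the source).
low, high = 1.0, 100.0
gm = geometric_mean(low, high)

# Same number obtained as the midpoint of the logs, then exponentiated.
log_midpoint = math.exp((math.log(low) + math.log(high)) / 2)

# The arithmetic mean is pulled toward the high extreme; the geometric mean is not.
am = (low + high) / 2
print(gm, log_midpoint, am)
```

this is why it suits estimates that differ by orders of magnitude: it splits the difference in ratio terms rather than in absolute terms.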
org:junk  data  let-me-see  scale  history  iron-age  mediterranean  the-classics  death  nihil  conquest-empire  war  peace-violence  gibbon  trivia  multi  todo  AMT  expectancy  heuristic  stats  ML-MAP-E  data-science  estimate  magnitude  population  demographics  database  list  religion  christianity  leviathan 
september 2017 by nhaliday
interpretation - How to understand degrees of freedom? - Cross Validated
From Wikipedia, there are three interpretations of the degrees of freedom of a statistic:

In statistics, the number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary.

Estimates of statistical parameters can be based upon different amounts of information or data. The number of independent pieces of information that go into the estimate of a parameter is called the degrees of freedom (df). In general, the degrees of freedom of an estimate of a parameter is equal to the number of independent scores that go into the estimate minus the number of parameters used as intermediate steps in the estimation of the parameter itself (which, in sample variance, is one, since the sample mean is the only intermediate step).

Mathematically, degrees of freedom is the dimension of the domain of a random vector, or essentially the number of 'free' components: how many components need to be known before the vector is fully determined.


This is a subtle question. It takes a thoughtful person not to understand those quotations! Although they are suggestive, it turns out that none of them is exactly or generally correct. I haven't the time (and there isn't the space here) to give a full exposition, but I would like to share one approach and an insight that it suggests.

Where does the concept of degrees of freedom (DF) arise? The contexts in which it's found in elementary treatments are:

- The Student t-test and its variants such as the Welch or Satterthwaite solutions to the Behrens-Fisher problem (where two populations have different variances).
- The Chi-squared distribution (defined as a sum of squares of independent standard Normals), which is implicated in the sampling distribution of the variance.
- The F-test (of ratios of estimated variances).
- The Chi-squared test, comprising its uses in (a) testing for independence in contingency tables and (b) testing for goodness of fit of distributional estimates.

In spirit, these tests run a gamut from being exact (the Student t-test and F-test for Normal variates) to being good approximations (the Student t-test and the Welch/Satterthwaite tests for not-too-badly-skewed data) to being based on asymptotic approximations (the Chi-squared test). An interesting aspect of some of these is the appearance of non-integral "degrees of freedom" (the Welch/Satterthwaite tests and, as we will see, the Chi-squared test). This is of especial interest because it is the first hint that DF is not any of the things claimed of it.


Having been alerted by these potential ambiguities, let's hold up the Chi-squared goodness of fit test for examination, because (a) it's simple, (b) it's one of the common situations where people really do need to know about DF to get the p-value right and (c) it's often used incorrectly. Here's a brief synopsis of the least controversial application of this test:


This, many authorities tell us, should have (to a very close approximation) a Chi-squared distribution. But there's a whole family of such distributions. They are differentiated by a parameter ν often referred to as the "degrees of freedom." The standard reasoning about how to determine ν goes like this:

I have k counts. That's k pieces of data. But there are (functional) relationships among them. To start with, I know in advance that the sum of the counts must equal n. That's one relationship. I estimated two (or p, generally) parameters from the data. That's two (or p) additional relationships, giving p+1 total relationships. Presuming they (the parameters) are all (functionally) independent, that leaves only k−p−1 (functionally) independent "degrees of freedom": that's the value to use for ν.
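
The counting rule above can be sketched in a few lines. This is an illustrative sketch, not the answer's own code: it assumes the statistic in question is the usual Pearson sum of (observed − expected)² / expected, the counts and fitted expectations are invented, and both function names are hypothetical.

```python
def pearson_chi_squared(observed, expected):
    """Pearson statistic: sum over bins of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def nominal_df(k, p):
    """The textbook count: k bins, minus p fitted parameters, minus 1 for the fixed total."""
    return k - p - 1

# Invented example: k = 4 bins, n = 100 observations, p = 2 fitted parameters.
observed = [18, 30, 32, 20]
expected = [20.0, 30.0, 30.0, 20.0]

stat = pearson_chi_squared(observed, expected)
nu = nominal_df(k=len(observed), p=2)
print(stat, nu)
```

The value of `nu` here is only the nominal figure produced by the counting argument; whether the statistic actually follows a Chi-squared distribution with that many degrees of freedom is exactly what is in question.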

The problem with this reasoning (which is the sort of calculation the quotations in the question are hinting at) is that it's wrong except when some special additional conditions hold. Moreover, those conditions have nothing to do with independence (functional or statistical), with numbers of "components" of the data, with the numbers of parameters, nor with anything else referred to in the original question.


Things went wrong because I violated two requirements of the Chi-squared test:

1. You must use the Maximum Likelihood estimate of the parameters. (This requirement can, in practice, be slightly violated.)
2. You must base that estimate on the counts, not on the actual data! (This is crucial.)
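
To make the second requirement concrete, here is a rough sketch contrasting the two kinds of estimate for a Normal model. It is an illustration under stated assumptions, not the answer's simulation: the data are simulated, the bin edges and grid are arbitrary, the grouped fit is a crude grid search over the multinomial likelihood of the bin counts, and all function names are hypothetical.

```python
import math
import random

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def bin_probs(edges, mu, sigma):
    """Probabilities of the bins (-inf, e0], (e0, e1], ..., (e_last, +inf)."""
    cdf = [0.0] + [normal_cdf(e, mu, sigma) for e in edges] + [1.0]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]

def bin_counts(data, edges):
    counts = [0] * (len(edges) + 1)
    for x in data:
        counts[sum(x > e for e in edges)] += 1  # index = number of edges below x
    return counts

def grouped_loglik(counts, edges, mu, sigma):
    """Multinomial log-likelihood of the counts (constant term dropped)."""
    probs = bin_probs(edges, mu, sigma)
    return sum(c * math.log(max(p, 1e-300)) for c, p in zip(counts, probs) if c > 0)

# Simulated data and arbitrary bin edges (assumptions for illustration only).
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]
edges = [-1.0, 0.0, 1.0]
counts = bin_counts(data, edges)

# (a) MLE from the raw data: sample mean and the divide-by-n standard deviation.
raw_mu = sum(data) / len(data)
raw_sigma = math.sqrt(sum((x - raw_mu) ** 2 for x in data) / len(data))

# (b) MLE from the counts alone: maximize the grouped likelihood over a coarse grid.
mus = [-0.5 + 0.01 * i for i in range(101)]
sigmas = [0.5 + 0.01 * i for i in range(151)]
grp_mu, grp_sigma = max(
    ((m, s) for m in mus for s in sigmas),
    key=lambda t: grouped_loglik(counts, edges, t[0], t[1]),
)

print("raw-data MLE:", raw_mu, raw_sigma)
print("grouped MLE: ", grp_mu, grp_sigma)
```

The two estimates are usually close but not identical, and only the second uses just the information the Chi-squared statistic itself sees. A real analysis would use a proper optimizer rather than a grid.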


The point of this comparison--which I hope you have seen coming--is that the correct DF to use for computing the p-values depends on many things other than dimensions of manifolds, counts of functional relationships, or the geometry of Normal variates. There is a subtle, delicate interaction between certain functional dependencies, as found in mathematical relationships among quantities, and distributions of the data, their statistics, and the estimators formed from them. Accordingly, it cannot be the case that DF is adequately explainable in terms of the geometry of multivariate normal distributions, or in terms of functional independence, or as counts of parameters, or anything else of this nature.

We are led to see, then, that "degrees of freedom" is merely a heuristic that suggests what the sampling distribution of a (t, Chi-squared, or F) statistic ought to be, but it is not dispositive. Belief that it is dispositive leads to egregious errors. (For instance, the top hit on Google when searching "chi squared goodness of fit" is a Web page from an Ivy League university that gets most of this completely wrong! In particular, a simulation based on its instructions shows that the chi-squared value it recommends as having 7 DF actually has 9 DF.)
q-n-a  overflow  stats  data-science  concept  jargon  explanation  methodology  things  nibble  degrees-of-freedom  clarity  curiosity  manifolds  dimensionality  ground-up  intricacy  hypothesis-testing  examples  list  ML-MAP-E  gotchas 
january 2017 by nhaliday
