Pearson's paper of 1900 introduced what subsequently became known as the chi-squared test of goodness of fit. The terminology and allusions of 80 years ago create a barrier for the modern reader, who finds that the interpretation of Pearson's test procedure and the assessment of what he achieved are less than straightforward, notwithstanding the technical advances made since then. An attempt is made here to surmount these difficulties by exploring Pearson's relevant activities during the first decade of his statistical career, and by describing the work by his contemporaries and predecessors which seem to have influenced his approach to the problem. Not all the questions are answered, and others remain for further study.

original paper: http://www.economics.soton.ac.uk/staff/aldrich/1900.pdf

How did Karl Pearson come up with the chi-squared statistic?: https://stats.stackexchange.com/questions/97604/how-did-karl-pearson-come-up-with-the-chi-squared-statistic

He proceeds by working with the multivariate normal, and the chi-square arises as a sum of squared standardized normal variates.

You can see from the discussion on p160-161 he's clearly discussing applying the test to multinomial distributed data (I don't think he uses that term anywhere). He apparently understands the approximate multivariate normality of the multinomial (certainly he knows the margins are approximately normal - that's a very old result - and knows the means, variances and covariances, since they're stated in the paper); my guess is that most of that stuff is already old hat by 1900. (Note that the chi-squared distribution itself dates back to work by Helmert in the mid-1870s.)

Then by the bottom of p163 he derives a chi-square statistic as "a measure of goodness of fit" (the statistic itself appears in the exponent of the multivariate normal approximation).

He then goes on to discuss how to evaluate the p-value*, and then he correctly gives the upper tail area of a χ212χ122 beyond 43.87 as 0.000016. [You should keep in mind, however, that he didn't correctly understand how to adjust degrees of freedom for parameter estimation at that stage, so some of the examples in his papers use too high a d.f.]

nibble
papers
acm
stats
hypothesis-testing
methodology
history
mostly-modern
pre-ww2
old-anglo
giants
science
the-trenches
stories
multi
q-n-a
overflow
explanation
summary
innovation
discovery
distribution
degrees-of-freedom
limits
original paper: http://www.economics.soton.ac.uk/staff/aldrich/1900.pdf

How did Karl Pearson come up with the chi-squared statistic?: https://stats.stackexchange.com/questions/97604/how-did-karl-pearson-come-up-with-the-chi-squared-statistic

He proceeds by working with the multivariate normal, and the chi-square arises as a sum of squared standardized normal variates.

You can see from the discussion on p160-161 he's clearly discussing applying the test to multinomial distributed data (I don't think he uses that term anywhere). He apparently understands the approximate multivariate normality of the multinomial (certainly he knows the margins are approximately normal - that's a very old result - and knows the means, variances and covariances, since they're stated in the paper); my guess is that most of that stuff is already old hat by 1900. (Note that the chi-squared distribution itself dates back to work by Helmert in the mid-1870s.)

Then by the bottom of p163 he derives a chi-square statistic as "a measure of goodness of fit" (the statistic itself appears in the exponent of the multivariate normal approximation).

He then goes on to discuss how to evaluate the p-value*, and then he correctly gives the upper tail area of a χ212χ122 beyond 43.87 as 0.000016. [You should keep in mind, however, that he didn't correctly understand how to adjust degrees of freedom for parameter estimation at that stage, so some of the examples in his papers use too high a d.f.]

october 2017 by nhaliday

**related tags**

Copy this bookmark: