**nhaliday + bias-variance**

trees are harlequins, words are harlequins — bayes: a kinda-sorta masterpost

august 2017 by nhaliday

lol, gwern: https://www.reddit.com/r/slatestarcodex/comments/6ghsxf/biweekly_rational_feed/diqr0rq/

> What sort of person thinks “oh yeah, my beliefs about these coefficients correspond to a Gaussian with variance 2.5″? And what if I do cross-validation, like I always do, and find that variance 200 works better for the problem? Was the other person wrong? But how could they have known?

> ...Even ignoring the mode vs. mean issue, I have never met anyone who could tell whether their beliefs were normally distributed vs. Laplace distributed. Have you?

I must have spent too much time in Bayesland because both those strike me as very easy and I often think them! My beliefs usually are Laplace distributed when it comes to things like genetics (it makes me very sad to see GWASes with flat priors), and my Gaussian coefficients are actually a variance of 0.70 (assuming standardized variables w.l.o.g.) as is consistent with field-wide meta-analyses indicating that d>1 is pretty rare.
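To make the prior-variance talk concrete: with standardized variables, a Gaussian N(0, τ²) prior on coefficients is MAP-equivalent to ridge regression with penalty λ = σ²/τ², and a Laplace prior to the lasso. A minimal sketch with synthetic data (numpy only; the data-generating numbers are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))                 # standardized predictors (w.l.o.g.)
beta_true = np.array([1.0, 0.5] + [0.0] * 8)    # sparse truth, the kind a Laplace prior favors
y = X @ beta_true + rng.standard_normal(n)      # noise variance sigma^2 = 1

def ridge_map(X, y, prior_var, noise_var=1.0):
    """MAP estimate under beta_j ~ N(0, prior_var), i.e. ridge with lam = noise_var / prior_var."""
    lam = noise_var / prior_var
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

b_tight = ridge_map(X, y, prior_var=0.70)   # an informative prior like the one described above
b_loose = ridge_map(X, y, prior_var=200.0)  # the "variance 200" prior: nearly flat
print(np.linalg.norm(b_tight), np.linalg.norm(b_loose))
```

The tighter prior shrinks the coefficient vector more, which is the whole content of the cross-validation disagreement: the prior variance is just the inverse regularization strength.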

ratty
ssc
core-rats
tumblr
social
explanation
init
philosophy
bayesian
thinking
probability
stats
frequentist
big-yud
lesswrong
synchrony
similarity
critique
intricacy
shalizi
scitariat
selection
mutation
evolution
priors-posteriors
regularization
bias-variance
gwern
reddit
commentary
GWAS
genetics
regression
spock
nitty-gritty
generalization
epistemic
🤖
rationality
poast
multi
best-practices
methodology
data-science

Econometric Modeling as Junk Science

june 2017 by nhaliday

The Credibility Revolution in Empirical Economics: How Better Research Design Is Taking the Con out of Econometrics: https://www.aeaweb.org/articles?id=10.1257/jep.24.2.3

On data, experiments, incentives and highly unconvincing research – papers and hot beverages: https://papersandhotbeverages.wordpress.com/2015/10/31/on-data-experiments-incentives-and-highly-unconvincing-research/

In my view, it has just to do with the fact that academia is a peer monitored organization. In the case of (bad) data collection papers, issues related to measurement are typically boring. They are relegated to appendices, no one really has an incentive to monitor it seriously. The problem is similar in formal theory: no one really goes through the algebra in detail, but it is in principle feasible to do it, and, actually, sometimes these errors are detected. If discussing the algebra of a proof is almost unthinkable in a seminar, going into the details of data collection, measurement and aggregation is not only hard to imagine, but probably intrinsically infeasible.

Something different happens for the experimentalist people. As I was saying, I feel we have come to a point in which many papers are evaluated based on the cleverness and originality of the research design (“Using the World Cup qualifiers as an instrument for patriotism!? Woaw! how cool/crazy is that! I wish I had had that idea”). The sexiness of the identification strategy has too often become a goal in itself. When your peers monitor you paying more attention to the originality of the identification strategy than to the research question, you probably have an incentive to mine reality for ever crazier discontinuities. It is true methodologists have been criticized in the past for analogous reasons, such as being guided by the desire to increase mathematical complexity without a clear benefit. But, if you work with pure formal theory or statistical theory, your work is not meant to immediately answer questions about the real world, but instead to serve other researchers in their quest. This is something that can, in general, not be said of applied CI work.

https://twitter.com/pseudoerasmus/status/662007951415238656

This post should have been entitled “Zombies who only think of their next cool IV fix”

https://twitter.com/pseudoerasmus/status/662692917069422592

massive lust for quasi-natural experiments, regression discontinuities

barely matters if the effects are not all that big

I suppose even the best of things must reach their decadent phase; methodological innov. to manias……

https://twitter.com/cblatts/status/920988530788130816

Following this "collapse of small-N social psych results" business, where do I predict econ will collapse? I see two main contenders.

One is lab studies. I dallied with these a few years ago in a Kenya lab. We ran several pilots of N=200 to figure out the best way to treat and to measure the outcome. Every pilot gave us a different stat sig result. I could have written six papers concluding different things. I gave up more skeptical of these lab studies than ever before.

The second contender is the long run impacts literature in economic history. We should be very suspicious since we never see a paper showing that a historical event had no effect on modern day institutions or dvpt.

On the one hand I find these studies fun, fascinating, and probably true in a broad sense. They usually reinforce a widely believed history argument with interesting data and a cute empirical strategy. But I don't think anyone believes the standard errors. There's probably a HUGE problem of nonsignificant results staying in the file drawer. Also, there are probably data problems that don't get revealed, as we see with the recent Piketty paper (http://marginalrevolution.com/marginalrevolution/2017/10/pikettys-data-reliable.html). So I take that literature with a vat of salt, even if I enjoy and admire the works.

I used to think field experiments would show little consistency in results across place. That external validity concerns would be fatal. In fact the results across different samples and places have proven surprisingly similar, and added a lot to general theory.

Last, I've come to believe there is no such thing as a useful instrumental variable. The ones that actually meet the exclusion restriction are so weird & particular that the local treatment effect is likely far different from the average treatment effect in non-transparent ways. Most of the other IVs don't plausibly meet the exclusion restriction. I mean, we should be concerned when the IV estimate is always 10x larger than the OLS coefficient. Thus I find myself much more persuaded by simple natural experiments that use OLS, diff in diff, or discontinuities, alongside randomized trials.

What do others think are the cliffs in economics?

PS All of these apply to political science too. Though I have a special extra target in poli sci: survey experiments! A few are good. I like Dan Corstange's work. But it feels like 60% of dissertations these days are experiments buried in a survey instrument that measure small changes in response. These at least have large N. But they are just uncontrolled labs, with negligible external validity in my mind.

The good ones are good. This method has its uses. But it's being way over-applied. More people have to make big and risky investments in big natural and field experiments. Time to raise expectations and ambitions. This expectation bar, not technical ability, is the big advantage economists have over political scientists when they compete in the same space.

(Ok. So are there any friends and colleagues I haven't insulted this morning? Let me know and I'll try my best to fix it with a screed)

HOW MUCH SHOULD WE TRUST DIFFERENCES-IN-DIFFERENCES ESTIMATES?: https://economics.mit.edu/files/750

Most papers that employ Differences-in-Differences estimation (DD) use many years of data and focus on serially correlated outcomes but ignore that the resulting standard errors are inconsistent. To illustrate the severity of this issue, we randomly generate placebo laws in state-level data on female wages from the Current Population Survey. For each law, we use OLS to compute the DD estimate of its “effect” as well as the standard error of this estimate. These conventional DD standard errors severely understate the standard deviation of the estimators: we find an “effect” significant at the 5 percent level for up to 45 percent of the placebo interventions. We use Monte Carlo simulations to investigate how well existing methods help solve this problem. Econometric corrections that place a specific parametric form on the time-series process do not perform well. Bootstrap (taking into account the auto-correlation of the data) works well when the number of states is large enough. Two corrections based on asymptotic approximation of the variance-covariance matrix work well for moderate numbers of states and one correction that collapses the time series information into a “pre” and “post” period and explicitly takes into account the effective sample size works well even for small numbers of states.
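The abstract's mechanism is easy to reproduce in miniature: simulate serially correlated state-level outcomes with no true effect, assign placebo "laws," and compute DD t-statistics whose standard errors treat state-year observations as independent. A rough sketch (all parameter choices are invented for illustration, numpy only):

```python
import numpy as np

rng = np.random.default_rng(1)

def placebo_dd_rejection_rate(n_states=50, years=20, rho=0.8, n_sims=500):
    """Share of placebo 'laws' declared significant at 5% when DD SEs pretend obs are iid."""
    rejections = 0
    for _ in range(n_sims):
        # AR(1) errors within each state induce serial correlation across years
        y = np.zeros((n_states, years))
        y[:, 0] = rng.standard_normal(n_states)
        for t in range(1, years):
            y[:, t] = rho * y[:, t - 1] + rng.standard_normal(n_states)
        treated = rng.permutation(n_states) < n_states // 2  # placebo law: no true effect
        pre, post = y[:, :years // 2], y[:, years // 2:]
        cells = [post[treated], pre[treated], post[~treated], pre[~treated]]
        dd = cells[0].mean() - cells[1].mean() - (cells[2].mean() - cells[3].mean())
        se = np.sqrt(sum(c.var(ddof=1) / c.size for c in cells))  # naive: ignores clustering
        rejections += abs(dd / se) > 1.96
    return rejections / n_sims

rate = placebo_dd_rejection_rate()
print(rate)
```

With strong serial correlation, this nominal 5% test should reject well above 5% of the time, qualitatively matching the paper's up-to-45% placebo finding; clustering standard errors by state is the usual repair.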

‘METRICS MONDAY: 2SLS–CHRONICLE OF A DEATH FORETOLD: http://marcfbellemare.com/wordpress/12733

As it turns out, Young finds that

1. Conventional tests tend to overreject the null hypothesis that the 2SLS coefficient is equal to zero.

2. 2SLS estimates are falsely declared significant one third to one half of the time, depending on the method used for bootstrapping.

3. The 99-percent confidence intervals (CIs) of those 2SLS estimates include the OLS point estimate over 90 percent of the time. They include the full OLS 99-percent CI over 75 percent of the time.

4. 2SLS estimates are extremely sensitive to outliers. Removing just one outlying cluster or observation, almost half of 2SLS results become insignificant. Things get worse when removing two outlying clusters or observations, as over 60 percent of 2SLS results then become insignificant.

5. Using a Durbin-Wu-Hausman test, less than 15 percent of regressions can reject the null that OLS estimates are unbiased at the 1-percent level.

6. 2SLS has considerably higher mean squared error than OLS.

7. In one third to one half of published results, the null that the IVs are totally irrelevant cannot be rejected, and so the correlation between the endogenous variable(s) and the IVs is due to finite sample correlation between them.

8. Finally, fewer than 10 percent of 2SLS estimates reject instrument irrelevance and the absence of OLS bias at the 1-percent level using a Durbin-Wu-Hausman test. It gets much worse–fewer than 5 percent–if you add in the requirement that the 2SLS CI excludes the OLS estimate.
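Points 4 and 6 reflect a standard weak-instrument pathology that a few lines of simulation can illustrate: 2SLS trades OLS's endogeneity bias for sampling variance that, when the instrument is weak, can dwarf OLS's mean squared error. A sketch under invented parameters (not Young's actual design):

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_once(n=200, beta=1.0, pi=0.1, endo=0.5):
    """One draw: x is endogenous (corr of errors = endo), z is a weak instrument (strength pi)."""
    z = rng.standard_normal(n)
    v = rng.standard_normal(n)
    e = endo * v + np.sqrt(1 - endo**2) * rng.standard_normal(n)
    x = pi * z + v
    y = beta * x + e
    b_ols = (x @ y) / (x @ x)     # biased toward endo by construction
    b_2sls = (z @ y) / (z @ x)    # just-identified IV estimator: heavy-tailed when pi is small
    return b_ols, b_2sls

draws = np.array([simulate_once() for _ in range(2000)])
mse = ((draws - 1.0) ** 2).mean(axis=0)  # [OLS MSE, 2SLS MSE] around the true beta = 1
print(mse)
```

Here OLS carries a stable bias of about `endo`, while 2SLS's near-zero first stage produces occasional wild estimates, so its MSE comes out much larger, consistent with finding 6.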

Methods Matter: P-Hacking and Causal Inference in Economics: http://ftp.iza.org/dp11796.pdf

Applying multiple methods to 13,440 hypothesis tests reported in 25 top economics journals in 2015, we show that selective publication and p-hacking is a substantial problem in research employing DID and (in particular) IV. RCT and RDD are much less problematic. Almost 25% of claims of marginally significant results in IV papers are misleading.

https://twitter.com/NoamJStein/status/1040887307568664577

Ever since I learned social science is completely fake, I've had a lot more time to do stuff that matters, like deadlifting and reading about Mediterranean haplogroups

--

Wait, so, from fakest to realest IV>DD>RCT>RDD? That totally matches my impression.

org:junk
org:edu
economics
econometrics
methodology
realness
truth
science
social-science
accuracy
generalization
essay
article
hmm
multi
study
🎩
empirical
causation
error
critique
sociology
criminology
hypothesis-testing
econotariat
broad-econ
cliometrics
endo-exo
replication
incentives
academia
measurement
wire-guided
intricacy
twitter
social
discussion
pseudoE
effect-size
reflection
field-study
stat-power
piketty
marginal-rev
commentary
data-science
expert-experience
regression
gotchas
rant
map-territory
pdf
simulation
moments
confidence
bias-variance
stats
endogenous-exogenous
control
meta:science
meta-analysis
outliers
summary
sampling
ensembles
monte-carlo
theory-practice
applicability-prereqs
chart
comparison
shift
ratty
unaffiliated

POPULATION STRUCTURE AND QUANTITATIVE CHARACTERS

may 2017 by nhaliday

The variance of among-group variance is substantial and does not depend on the number of loci contributing to variance in the character. It is just as large for polygenic characters as for single loci with the same additive variance. This implies that one polygenic character contains exactly as much information about population relationships as one single-locus marker.

same is true of expectation apparently (so drift has same impact on polygenic and single-locus traits)

pdf
study
west-hunter
scitariat
bio
genetics
genomics
sapiens
QTL
correlation
null-result
magnitude
nibble
🌞
models
population-genetics
methodology
regularizer
moments
bias-variance
pop-diff
pop-structure
gene-drift

Law of total variance - Wikipedia

march 2017 by nhaliday

Var(Y) = E[Var(Y|X)] + Var(E[Y|X])
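A tiny discrete example (numbers chosen for exact arithmetic) verifying the identity:

```python
import numpy as np

# X is a fair coin; Y | X=0 is uniform on {0, 2}, Y | X=1 is uniform on {10, 12}
ys = np.array([0.0, 2.0, 10.0, 12.0])    # the four equally likely outcomes of Y
var_y = ys.var()                          # Var(Y) computed directly: 62 - 36 = 26
e_var = 0.5 * np.var([0.0, 2.0]) + 0.5 * np.var([10.0, 12.0])   # E[Var(Y|X)] = 1
var_e = np.var([1.0, 11.0])                                     # Var(E[Y|X]) = 25
print(var_y, e_var + var_e)  # both 26.0
```

The within-group spread (1) is tiny next to the between-group spread (25), which is the usual "explained vs unexplained variance" reading of the identity.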

math
acm
stats
probability
identity
levers
wiki
reference
marginal
moments
bias-variance
nibble

probability - Variance of maximum of Gaussian random variables - Cross Validated

february 2017 by nhaliday

In full generality it is rather hard to find the right order of magnitude of the variance of a Gaussian supremum since the tools from concentration theory are always suboptimal for the maximum function.

order ~ 1/log n
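A quick Monte Carlo (sample sizes and repetition count arbitrary) showing the variance of the maximum shrinking on roughly that 1/log n order:

```python
import numpy as np

rng = np.random.default_rng(3)

def var_of_max(n, reps=10000):
    """Monte Carlo estimate of Var(max of n iid standard Gaussians)."""
    return rng.standard_normal((reps, n)).max(axis=1).var()

v = {n: var_of_max(n) for n in (10, 100, 1000)}
print(v)  # decreasing in n, roughly like 1/log n
```

The classical asymptotic is Var(max) ~ (π²/6)/(2 log n), so the decay is very slow: a 100x bigger sample only roughly halves the variance of the maximum here.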

q-n-a
overflow
stats
probability
acm
orders
tails
bias-variance
moments
concentration-of-measure
magnitude
tidbits
distribution
yoga
structure
extrema
nibble

bounds - What is the variance of the maximum of a sample? - Cross Validated

february 2017 by nhaliday

- sum of variances is always a bound

- can't do better even for iid Bernoulli

- looks like nice argument from well-known probabilist (using E[(X-Y)^2] = 2Var X), but not clear to me how he gets to sum_i instead of sum_{i,j} in the union bound?

edit: argument is that, for j = argmax_k Y_k, we have r < X_i - Y_j <= X_i - Y_i for all i, including i = argmax_k X_k

- different proof here (later pages): http://www.ism.ac.jp/editsec/aism/pdf/047_1_0185.pdf

Var(X_n:n) <= sum Var(X_k:n) + 2 sum_{i < j} Cov(X_i:n, X_j:n) = Var(sum X_k:n) = Var(sum X_k) = nσ^2

why are the covariances nonnegative? (are they?). intuitively seems true.

- for that, see https://pinboard.in/u:nhaliday/b:ed4466204bb1

- note that this proof shows more generally that sum Var(X_k:n) <= sum Var(X_k)

- apparently that holds for dependent X_k too? http://mathoverflow.net/a/96943/20644
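The first bullet, and the stronger claim sum Var(X_k:n) <= sum Var(X_k), can be sanity-checked numerically (iid standard normals, so the right-hand side is n·σ² = n):

```python
import numpy as np

rng = np.random.default_rng(4)

n, reps = 5, 50000
ordered = np.sort(rng.standard_normal((reps, n)), axis=1)  # column k holds draws of X_{k:n}
order_vars = ordered.var(axis=0)                           # Var(X_{k:n}) for k = 1..n
print(order_vars.sum(), n)  # sum of order-statistic variances vs n * sigma^2
```

The sum comes out well below n because sorting transfers much of the raw variance into the (nonnegative-looking) covariances between order statistics, exactly the term the proof sketch above drops.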

q-n-a
overflow
stats
acm
distribution
tails
bias-variance
moments
estimate
magnitude
probability
iidness
tidbits
concentration-of-measure
multi
orders
levers
extrema
nibble
bonferroni
coarse-fine
expert
symmetry
s:*
expert-experience
proofs

Count–min sketch - Wikipedia

february 2017 by nhaliday

- estimates frequency vector (f_i)

- idea:

d = O(log 1/δ) hash functions h_j: [n] -> [w] (w = O(1/ε))

d*w counters a[r, c]

for each event i, increment counters a[1, h_1(i)], a[2, h_2(i)], ..., a[d, h_d(i)]

estimate for f_i is min_j a[j, h_j(i)]

- never underestimates but upward-biased

- pf: Markov to get constant probability of success, then exponential decrease with repetition

lecture notes: http://theory.stanford.edu/~tim/s15/l/l2.pdf

- note this can work w/ negative updates. just use median instead of min. pf still uses markov on the absolute value of error.
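A compact sketch of the scheme described above (the tuple-`hash` construction stands in for the pairwise-independent hash families the analysis assumes; `w` and `d` are illustrative choices):

```python
import random
from collections import Counter

class CountMinSketch:
    """d rows of w counters; each item is hashed to one counter per row."""
    def __init__(self, w, d, seed=0):
        rnd = random.Random(seed)
        self.seeds = [rnd.getrandbits(64) for _ in range(d)]  # one salt per hash function
        self.table = [[0] * w for _ in range(d)]
        self.w = w

    def _cols(self, item):
        return [hash((s, item)) % self.w for s in self.seeds]

    def add(self, item):
        for row, col in zip(self.table, self._cols(item)):
            row[col] += 1

    def estimate(self, item):
        # min over rows: collisions only ever inflate counters, never deflate them
        return min(row[col] for row, col in zip(self.table, self._cols(item)))

cms = CountMinSketch(w=200, d=5)
stream = [f"item{i % 50}" for i in range(10000)]
for x in stream:
    cms.add(x)
true = Counter(stream)
print(cms.estimate("item0"), true["item0"])
```

This exhibits the one-sided error: every estimate is at least the true frequency. Per the note above, supporting negative updates would mean returning the median of the d counters instead of the min.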

algorithms
data-structures
sublinear
hashing
wiki
reference
bias-variance
approximation
random
tcs
multi
stanford
lecture-notes
pdf
tim-roughgarden
nibble
pigeonhole-markov
PAC

teaching - Intuitive explanation for dividing by $n-1$ when calculating standard deviation? - Cross Validated

january 2017 by nhaliday

The standard deviation calculated with a divisor of n-1 is a standard deviation calculated from the sample as an estimate of the standard deviation of the population from which the sample was drawn. Because the observed values fall, on average, closer to the sample mean than to the population mean, the standard deviation which is calculated using deviations from the sample mean underestimates the desired standard deviation of the population. Using n-1 instead of n as the divisor corrects for that by making the result a little bit bigger.

Note that the correction has a larger proportional effect when n is small than when it is large, which is what we want because when n is larger the sample mean is likely to be a good estimator of the population mean.

...

A common one is that the definition of variance (of a distribution) is the second moment recentered around a known, definite mean, whereas the estimator uses an estimated mean. This loss of a degree of freedom (given the mean, you can reconstitute the dataset with knowledge of just n−1 of the data values) requires the use of n−1 rather than n to "adjust" the result.
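The claim is easy to check numerically: dividing by n gives an average estimate near ((n−1)/n)·σ², while dividing by n−1 recovers σ². A sketch with small samples from a unit-variance population:

```python
import numpy as np

rng = np.random.default_rng(5)

n, reps = 5, 200000
samples = rng.standard_normal((reps, n))        # true population variance = 1
biased = samples.var(axis=1, ddof=0).mean()     # divisor n
unbiased = samples.var(axis=1, ddof=1).mean()   # divisor n - 1 (Bessel's correction)
print(biased, unbiased)  # roughly (n-1)/n = 0.8 vs roughly 1.0
```

With n = 5 the proportional effect of the correction is large (a factor of 5/4), illustrating the point that it matters most for small samples.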

q-n-a
overflow
stats
acm
intuition
explanation
bias-variance
methodology
moments
nibble
degrees-of-freedom
sampling-bias
generalization
dimensionality
ground-up
intricacy

Choosing prediction over explanation in psychology: Lessons from machine learning

study psychology social-psych social-science best-practices methodology interdisciplinary lens essay rhetoric heterodox big-picture machine-learning meta:prediction meta:science len:long info-dynamics bias-variance huge-data-the-biggest generalization regularization interpretability

january 2017 by nhaliday


definition - Why square the difference instead of taking the absolute value in standard deviation? - Cross Validated

stats acm motivation synthesis q-n-a discussion probability tidbits overflow soft-question bias-variance curiosity moments robust comparison nibble s:* characterization limits concentration-of-measure

december 2016 by nhaliday


pr.probability - Google question: In a country in which people only want boys - MathOverflow

december 2016 by nhaliday

- limits to 1/2 w/ number of families -> ∞

- proportion of girls in one family is biased estimator of proportion in general population (larger families w/ more girls count more)

- interesting comment on Douglas Zare's answer (whether process has stopped or not)
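Both bullets can be checked by simulation: stop each family at its first boy, then compare the population-level share of girls with the average of per-family shares (illustrative sketch, fair coin births):

```python
import numpy as np

rng = np.random.default_rng(6)

n_families = 200000
girls = rng.geometric(0.5, size=n_families) - 1   # girls born before the first boy
boys = np.ones(n_families)                        # every family stops at exactly one boy

population_share = girls.sum() / (girls.sum() + boys.sum())  # pooled: tends to 1/2
family_avg_share = (girls / (girls + boys)).mean()           # E[G/(G+1)] = 1 - ln 2 ~ 0.307
print(population_share, family_avg_share)
```

The pooled share is near 1/2, while the average of per-family proportions sits near 1 − ln 2 ≈ 0.31: averaging ratios weights small (boy-heavy) families equally with large girl-heavy ones, which is the bias the second bullet describes.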

puzzles
math
google
thinking
probability
q-n-a
gotchas
tidbits
math.CO
overflow
nibble
paradox
gender
bias-variance
stochastic-processes

A computer program used for bail and sentencing decisions was labeled biased against blacks. It’s actually not that clear. - The Washington Post

november 2016 by nhaliday

How We Analyzed the COMPAS Recidivism Algorithm: https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm

Automatic Justice: http://www.theamericanconservative.com/articles/automatic-justice/

A.I. ‘BIAS’ DOESN’T MEAN WHAT JOURNALISTS SAY IT MEANS: https://jacobitemag.com/2017/08/29/a-i-bias-doesnt-mean-what-journalists-want-you-to-think-it-means/

When a journalist discusses bias, they typically do not mean it in the same manner that statisticians do. As described in the examples above, a journalist typically uses the term “bias” when an algorithm’s output fails to live up to the journalist’s ideal reality.

Machine Learning and Human Bias: https://www.youtube.com/watch?v=59bMh59JQDo
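The definitional dispute has a precise core (Chouldechova's impossibility result, which underlies the ProPublica/Northpointe exchange): when base rates differ across groups, a score can be calibrated within each group or equalize false-positive rates across groups, but not both. A toy numeric sketch with made-up base rates (illustration only, not COMPAS data):

```python
def fpr_of_calibrated_score(base_rate, lo=0.2, hi=0.8):
    """Two-valued score, perfectly calibrated within the group:
    P(reoffend | score) == score. The weight on the high score is
    chosen so the mean score equals the group's base rate."""
    w_hi = (base_rate - lo) / (hi - lo)        # P(score == hi)
    # classify "high risk" iff score == hi; FPR = P(hi | no reoffense)
    return w_hi * (1 - hi) / (1 - base_rate)

# hypothetical base rates of 0.5 and 0.3 for two groups
fpr_a = fpr_of_calibrated_score(0.5)   # 0.5 * 0.2 / 0.5  = 0.20
fpr_b = fpr_of_calibrated_score(0.3)   # (1/6) * 0.2 / 0.7 ~= 0.048
```

Same calibrated score, over 4x the false-positive rate in one group: which number you call "bias" is exactly the journalist-vs-statistician gap the Jacobite piece describes.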

analysis
ethical-algorithms
crime
critique
news
debate
org:rec
technocracy
multi
media
data
org:mag
investigative-journo
right-wing
douthatish
policy
law
criminal-justice
criminology
madisonian
org:popup
gnon
techtariat
rhetoric
biases
meta:rhetoric
propaganda
bias-variance
data-science
video
presentation
google
censorship
sv
tech

Tetlock and Gardner’s Superforecasting: The Art and Science of Prediction | EVOLVING ECONOMICS

september 2016 by nhaliday

not as good as Expert Political Judgement apparently

Tetlock’s formula for a successful team is fairly simple. Get lots of forecasts, calculate the average of the forecast, and give extra weight to the top forecasters – a version of wisdom of the crowds. Then extremize the forecast. If the forecast is a 70% probability, bump up to 85%. If 30%, cut it to 15%.

The idea behind extremising is quite clever. No one in the group has access to all the dispersed information. If everyone had all the available information, this would tend to raise their confidence, which would result in a more extreme forecast. Since we can’t give everyone all the information, extremising is an attempt to simulate what would happen if you did. To get the benefits of this extremising, however, requires diversity. If everyone holds the same information there is no sharing of information to be simulated.
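One common way to implement the extremizing step (the functional form is my assumption; the post only gives the 70% → 85% mapping) is to scale the aggregate forecast in odds space by an exponent a > 1. The quoted example corresponds to a ≈ 2.05, and by symmetry the same exponent sends 30% to 15%:

```python
import math

def extremize(p, a):
    """Push a probability away from 0.5 by raising its odds to the power a."""
    return p**a / (p**a + (1 - p)**a)

# exponent that maps 0.70 -> 0.85 (and, by symmetry, 0.30 -> 0.15)
a = math.log(0.85 / 0.15) / math.log(0.70 / 0.30)

extremize(0.70, a)  # 0.85
extremize(0.30, a)  # 0.15
extremize(0.50, a)  # 0.50 -- a maximally uncertain forecast stays put
```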

tetlock
books
review
summary
econotariat
meta:prediction
complex-systems
ensembles
biases
rationality
bounded-cognition
bias-variance
extrema
diversity
broad-econ
info-dynamics

Democracy does not cause growth | Brookings Institution

september 2016 by nhaliday

64-page paper

Democracy & Growth: http://www.nber.org/papers/w4909

The favorable effects on growth include maintenance of the rule of law, free markets, small government consumption, and high human capital. Once these kinds of variables and the initial level of real per-capita GDP are held constant, the overall effect of democracy on growth is weakly negative. There is a suggestion of a nonlinear relationship in which democracy enhances growth at low levels of political freedom but depresses growth when a moderate level of freedom has already been attained.

The growth effect of democracy: Is it heterogenous and how can it be estimated∗: http://perseus.iies.su.se/~tpers/papers/cifar_paper_may16_07.pdf

In particular, we find an average negative effect on growth of leaving democracy on the order of −2 percentage points implying effects on income per capita as large as 45 percent over the 1960-2000 panel. Heterogenous characteristics of reforming and non-reforming countries appear to play an important role in driving these results.

Does democracy cause innovation? An empirical test of the popper hypothesis: http://www.sciencedirect.com.sci-hub.cc/science/article/pii/S0048733317300975

The results from the difference-in-differences method show that democracy itself has no direct positive effect on innovation measured with patent counts, patent citations and patent originality.

Benevolent Autocrats: https://williameasterly.files.wordpress.com/2011/09/benevolent-autocrats-easterly-draft.pdf

A large literature attributes this to the higher variance of growth rates under autocracy than under democracy. The literature offers alternative explanations for this stylized fact: (1) leaders don’t matter under democracy, but good and bad leaders under autocracy cause high and low growth, (2) leaders don’t matter under autocracy either, but good and bad autocratic systems cause greater extremes of high and low growth, or (3) democracy does better than autocracy at reducing variance from shocks from outside the political system. This paper details further the stylized facts to test these distinctions. Inconsistent with (1), the variance of growth within the terms of leaders swamps the variance across leaders, and more so under autocracy than under democracy. Country effects under autocracy are also overwhelmed by within-country variance, inconsistent with (2). Explanation (3) fits the stylized facts the best of the three alternatives.

Political Institutions, Size of Government and Redistribution: An empirical investigation: http://www.lse.ac.uk/internationalDevelopment/pdf/WP/WP89.pdf

Results show that the stronger democratic institutions are, the lower is government size and the higher the redistributional capacity of the state. Political competition exercises the strongest and most robust effect on the two variables.

https://twitter.com/GarettJones/status/899466295170801664

https://archive.is/sPFII

Fits the high-variance theory of autocracies:

More miracles, more disasters. And there's a lot of demand for miracles.

Measuring the ups and downs of governance: https://www.brookings.edu/blog/future-development/2017/09/22/measuring-the-ups-and-downs-of-governance/

Figure 2: Voice and Accountability and Government Effectiveness, 2016

https://twitter.com/whyvert/status/917444456386666497

https://archive.is/EBQlD

Georgia, Japan, Rwanda, and Serbia ↑ Gov Effectiveness; Indonesia, Tunisia, Liberia, Serbia, and Nigeria ↑ Voice and Accountability.

The logic of hereditary rule: theory and evidence: http://eprints.lse.ac.uk/69615/

Hereditary leadership has been an important feature of the political landscape throughout history. This paper argues that hereditary leadership is like a relational contract which improves policy incentives. We assemble a unique dataset on leaders between 1874 and 2004 in which we classify them as hereditary leaders based on their family history. The core empirical finding is that economic growth is higher in polities with hereditary leaders but only if executive constraints are weak. Moreover, this holds across a range of specifications. The finding is also mirrored in policy outcomes which affect growth. In addition, we find that hereditary leadership is more likely to come to an end when the growth performance under the incumbent leader is poor.

I noted this when the paper was a working paper, but non-hereditary polities with strong constraints have higher growth rates.

study
announcement
polisci
economics
macro
government
policy
contrarianism
hmm
econometrics
counterfactual
alt-inst
institutions
new-religion
thiel
political-econ
stylized-facts
🎩
group-level
longitudinal
c:**
2016
summary
realpolitik
wonkish
mostly-modern
democracy
org:ngo
ideology
definite-planning
social-choice
nascent-state
chart
madisonian
antidemos
cynicism-idealism
kumbaya-kult
whiggish-hegelian
multi
pdf
effect-size
authoritarianism
growth-econ
econ-metrics
wealth-of-nations
wealth
innovation
null-result
endo-exo
leviathan
civil-liberty
property-rights
capitalism
markets
human-capital
curvature
piracy
easterly
bias-variance
moments
outcome-risk
redistribution
welfare-state
white-paper
natural-experiment
correlation
history
cold-war
twitter
social
commentary
spearhead
econotariat
garett-jones
backup
gibbon
counter-revolution
data
visualization
plots
trends
marginal
scitariat
hive-mind
inequality
egalitarianism-hierarchy
world
developing-world
convexity-curvature
endogeno
