nhaliday + stats   301

Stein's example - Wikipedia
Stein's example (or phenomenon or paradox), in decision theory and estimation theory, is the phenomenon that when three or more parameters are estimated simultaneously, there exist combined estimators more accurate on average (that is, having lower expected mean squared error) than any method that handles the parameters separately. It is named after Charles Stein of Stanford University, who discovered the phenomenon in 1955.[1]

An intuitive explanation is that optimizing for the mean-squared error of a combined estimator is not the same as optimizing for the errors of separate estimators of the individual parameters. In practical terms, if the combined error is in fact of interest, then a combined estimator should be used, even if the underlying parameters are independent; this occurs in channel estimation in telecommunications, for instance (different factors affect overall channel performance). On the other hand, if one is instead interested in estimating an individual parameter, then using a combined estimator does not help and is in fact worse.


Many simple, practical estimators achieve better performance than the ordinary estimator. The best-known example is the James–Stein estimator, which works by starting at X and moving towards a particular point (such as the origin) by an amount inversely proportional to the distance of X from that point.
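A minimal Monte Carlo sketch of the phenomenon. It assumes X ~ N(θ, I_p) with known unit variance and shrinks toward the origin with the classic James-Stein factor 1 - (p - 2)/||X||². The true means θ (p = 10), the seed and the number of replications are arbitrary illustrative choices. The code compares the average total squared error of the ordinary estimator X against the shrunk estimate.

```python
import numpy as np

def james_stein(x, sigma2=1.0):
    """Shrink each row of x toward the origin (James-Stein, known noise variance)."""
    p = x.shape[-1]
    norm2 = np.sum(x**2, axis=-1, keepdims=True)
    return (1.0 - (p - 2) * sigma2 / norm2) * x

def compare_mse(theta, n_reps=20_000, seed=0):
    """Monte Carlo estimate of E||estimate - theta||^2 for X and for James-Stein."""
    rng = np.random.default_rng(seed)
    x = theta + rng.standard_normal((n_reps, theta.size))
    mse_ordinary = np.mean(np.sum((x - theta) ** 2, axis=1))
    mse_js = np.mean(np.sum((james_stein(x) - theta) ** 2, axis=1))
    return mse_ordinary, mse_js

theta = np.linspace(-1.0, 1.0, 10)  # arbitrary true means, p = 10
mse_mle, mse_js = compare_mse(theta)
print(f"ordinary: {mse_mle:.2f}  James-Stein: {mse_js:.2f}")
```

The ordinary estimator's total error is about p = 10. The shrunk estimate is noticeably lower, even though each coordinate is estimated from independent data.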
nibble  concept  levers  wiki  reference  acm  stats  probability  decision-theory  estimate  distribution  atoms 
february 2018 by nhaliday
The Gelman View – spottedtoad
I have read Andrew Gelman’s blog for about five years, and gradually, I’ve decided that among his many blog posts and hundreds of academic articles, he is advancing a philosophy not just of statistics but of quantitative social science in general. I’m not a statistician myself, but here is how I would articulate the Gelman View:

A. Purposes

1. The purpose of social statistics is to describe and understand variation in the world. The world is a complicated place, and we shouldn’t expect things to be simple.
2. The purpose of scientific publication is to allow for communication, dialogue, and critique, not to “certify” a specific finding as absolute truth.
3. The incentive structure of science needs to reward attempts to independently investigate, reproduce, and refute existing claims and observed patterns, not just to advance new hypotheses or support a particular research agenda.

B. Approach

1. Because the world is complicated, the most valuable statistical models for the world will generally be complicated. The result of statistical investigations will only rarely be to give a stamp of truth on a specific effect or causal claim, but will generally show variation in effects and outcomes.
2. Whenever possible, the data, analytic approach, and methods should be made as transparent and replicable as possible, and should be fair game for anyone to examine, critique, or amend.
3. Social scientists should look to build upon a broad shared body of knowledge, not to “own” a particular intervention, theoretic framework, or technique. Such ownership creates incentive problems when the intervention, framework, or technique fails and the scientist is left trying to support a flawed structure.


1. Measurement. How and what we measure is the first question, well before we decide on what the effects are or what is making that measurement change.
2. Sampling. Who we talk to or collect information from always matters, because we should always expect effects to depend on context.
3. Inference. While models should usually be complex, our inferential framework should be simple enough for anyone to follow along. And no p values.

He might disagree with all of this, or how it reflects his understanding of his own work. But I think it is a valuable guide to empirical work.
ratty  unaffiliated  summary  gelman  scitariat  philosophy  lens  stats  hypothesis-testing  science  meta:science  social-science  institutions  truth  is-ought  best-practices  data-science  info-dynamics  alt-inst  academia  empirical  evidence-based  checklists  strategy  epistemic 
november 2017 by nhaliday
Use and Interpretation of LD Score Regression
LD Score regression distinguishes confounding from polygenicity in genome-wide association studies: https://sci-hub.bz/10.1038/ng.3211
- Po-Ru Loh, Nick Patterson, et al.


Both polygenicity (i.e. many small genetic effects) and confounding biases, such as cryptic relatedness and population stratification, can yield inflated distributions of test statistics in genome-wide association studies (GWAS). However, current methods cannot distinguish between inflation from bias and true signal from polygenicity. We have developed an approach that quantifies the contributions of each by examining the relationship between test statistics and linkage disequilibrium (LD). We term this approach LD Score regression. LD Score regression provides an upper bound on the contribution of confounding bias to the observed inflation in test statistics and can be used to estimate a more powerful correction factor than genomic control. We find strong evidence that polygenicity accounts for the majority of test statistic inflation in many GWAS of large sample size.
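A toy sketch of the core regression, not the ldsc tool. It assumes the expectation used in the LD Score regression paper: E[χ²_j] ≈ N h² ℓ_j / M + N a + 1, where ℓ_j is the LD score and a captures confounding. The sample size, SNP count, h², the confounding term N a = 0.1 and the LD scores are all hypothetical. The statistics are drawn directly as noncentral χ²(1) variables with that mean. The real method also uses regression weights and a block jackknife, which are omitted here.

```python
import numpy as np

def ld_score_regression(chi2, ld_scores):
    """Unweighted OLS of chi^2 statistics on LD scores; returns (intercept, slope)."""
    design = np.column_stack([np.ones_like(ld_scores), ld_scores])
    (intercept, slope), *_ = np.linalg.lstsq(design, chi2, rcond=None)
    return intercept, slope

rng = np.random.default_rng(1)
n_samples, m_snps, h2 = 50_000, 1_000_000, 0.5   # hypothetical study parameters
confounding = 0.1                                 # plays the role of N*a
ld = rng.uniform(1.0, 200.0, size=200_000)        # hypothetical LD scores
expected = 1.0 + confounding + n_samples * h2 * ld / m_snps   # E[chi^2_j]
chi2 = rng.noncentral_chisquare(df=1, nonc=expected - 1.0)    # mean = 1 + nonc

intercept, slope = ld_score_regression(chi2, ld)
h2_hat = slope * m_snps / n_samples
print(chi2.mean(), intercept, h2_hat)  # mean chi^2 is inflated, intercept ~1.1, h2 ~0.5
```

Here the mean χ² is far above 1, but the intercept attributes only about 0.1 of that inflation to confounding. The rest is assigned to polygenic signal through the slope.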

Supplementary Note: https://images.nature.com/original/nature-assets/ng/journal/v47/n3/extref/ng.3211-S1.pdf

An atlas of genetic correlations across human diseases and traits: https://sci-hub.bz/10.1038/ng.3406


Supplementary Note: https://images.nature.com/original/nature-assets/ng/journal/v47/n11/extref/ng.3406-S1.pdf

ldsc is a command line tool for estimating heritability and genetic correlation from GWAS summary statistics. ldsc also computes LD Scores.
nibble  pdf  slides  talks  bio  biodet  genetics  genomics  GWAS  genetic-correlation  correlation  methodology  bioinformatics  concept  levers  🌞  tutorial  explanation  pop-structure  gene-drift  ideas  multi  study  org:nat  article  repo  software  tools  libraries  stats  hypothesis-testing  biases  confounding  gotchas  QTL  simulation  survey  preprint  population-genetics 
november 2017 by nhaliday
Fitting a Structural Equation Model
seems rather unrigorous: nonlinear optimization, possibility of nonconvergence, doesn't even mention local vs. global optimality...
pdf  slides  lectures  acm  stats  hypothesis-testing  graphs  graphical-models  latent-variables  model-class  optimization  nonlinearity  gotchas  nibble  ML-MAP-E  iteration-recursion  convergence 
november 2017 by nhaliday
multivariate analysis - Is it possible to have a pair of Gaussian random variables for which the joint distribution is not Gaussian? - Cross Validated
The bivariate normal distribution is the exception, not the rule!

It is important to recognize that "almost all" joint distributions with normal marginals are not the bivariate normal distribution. That is, the common viewpoint that joint distributions with normal marginals that are not the bivariate normal are somehow "pathological", is a bit misguided.

Certainly, the multivariate normal is extremely important due to its stability under linear transformations, and so receives the bulk of attention in applications.

note: there is a multivariate central limit theorem, so such applications have no problem
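A quick sketch of the standard counterexample, which is not necessarily the one used in the linked thread. Let X ~ N(0, 1) and Y = S·X, where S is an independent fair random sign. Then Y is exactly N(0, 1), and X and Y are uncorrelated. However, X + Y equals 0 with probability 1/2, which is impossible for a bivariate normal pair. The sample size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.standard_normal(n)
sign = rng.choice([-1.0, 1.0], size=n)  # independent fair coin flip
y = sign * x                            # Y is also exactly N(0, 1)

# Both marginals look standard normal ...
print(np.mean(y), np.var(y))
# ... but X + Y has an atom at 0, so (X, Y) cannot be jointly normal.
print(np.mean(x + y == 0.0))            # about 0.5
```

This example also shows why "uncorrelated normals are independent" needs joint normality: here |Y| = |X| exactly.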
nibble  q-n-a  overflow  stats  math  acm  probability  distribution  gotchas  intricacy  characterization  structure  composition-decomposition  counterexample  limits  concentration-of-measure 
october 2017 by nhaliday
Karl Pearson and the Chi-squared Test
Pearson's paper of 1900 introduced what subsequently became known as the chi-squared test of goodness of fit. The terminology and allusions of 80 years ago create a barrier for the modern reader, who finds that the interpretation of Pearson's test procedure and the assessment of what he achieved are less than straightforward, notwithstanding the technical advances made since then. An attempt is made here to surmount these difficulties by exploring Pearson's relevant activities during the first decade of his statistical career, and by describing the work by his contemporaries and predecessors which seem to have influenced his approach to the problem. Not all the questions are answered, and others remain for further study.

original paper: http://www.economics.soton.ac.uk/staff/aldrich/1900.pdf

How did Karl Pearson come up with the chi-squared statistic?: https://stats.stackexchange.com/questions/97604/how-did-karl-pearson-come-up-with-the-chi-squared-statistic
He proceeds by working with the multivariate normal, and the chi-square arises as a sum of squared standardized normal variates.

You can see from the discussion on p160-161 he's clearly discussing applying the test to multinomial distributed data (I don't think he uses that term anywhere). He apparently understands the approximate multivariate normality of the multinomial (certainly he knows the margins are approximately normal - that's a very old result - and knows the means, variances and covariances, since they're stated in the paper); my guess is that most of that stuff is already old hat by 1900. (Note that the chi-squared distribution itself dates back to work by Helmert in the mid-1870s.)

Then by the bottom of p163 he derives a chi-square statistic as "a measure of goodness of fit" (the statistic itself appears in the exponent of the multivariate normal approximation).

He then goes on to discuss how to evaluate the p-value, and then he correctly gives the upper tail area of a $\chi^2_{12}$ beyond 43.87 as 0.000016. [You should keep in mind, however, that he didn't correctly understand how to adjust degrees of freedom for parameter estimation at that stage, so some of the examples in his papers use too high a d.f.]
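The tail area Pearson reported can be checked without a statistics library. For even degrees of freedom 2m, P(χ²_{2m} > x) = P(Poisson(x/2) ≤ m − 1). This is a closed form that only works for even df, and 12 is even. In general code, `scipy.stats.chi2.sf` would be the usual choice.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper tail P(chi^2_df > x) for even df, via the Poisson-sum identity."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    # P(chi^2_{2m} > x) = P(Poisson(x/2) <= m - 1)
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))

print(f"{chi2_sf_even_df(43.87, 12):.6f}")  # 0.000016
```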
nibble  papers  acm  stats  hypothesis-testing  methodology  history  mostly-modern  pre-ww2  old-anglo  giants  science  the-trenches  stories  multi  q-n-a  overflow  explanation  summary  innovation  discovery  distribution  degrees-of-freedom  limits 
october 2017 by nhaliday
Section 10 Chi-squared goodness-of-fit test.
- pf that chi-squared statistic for Pearson's test (multinomial goodness-of-fit) actually has chi-squared distribution asymptotically
- the gotcha: terms Z_j in sum aren't independent
- solution:
- compute the covariance matrix of the terms: E[Z_iZ_j] = -sqrt(p_ip_j) for i ≠ j and E[Z_j^2] = 1 - p_j, i.e. I - sqrt(p) sqrt(p)^T
- note that an equivalent way of sampling the Z_j is to take a random standard Gaussian and project onto the hyperplane orthogonal to (sqrt(p_1), sqrt(p_2), ..., sqrt(p_r))
- that is equivalent to just sampling a Gaussian w/ 1 less dimension (hence df=r-1)
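A small simulation sketch of the argument above. The cell probabilities, n = 500 and the replication count are arbitrary choices. It checks two things. First, Pearson's statistic has mean r − 1 and variance about 2(r − 1), as a χ²_{r−1} would. Second, the covariance matrix I − sqrt(p) sqrt(p)^T is an orthogonal projection with one zero eigenvalue, which is the dropped dimension.

```python
import numpy as np

def pearson_statistic(counts, p):
    """Pearson's X^2 = sum_j (N_j - n p_j)^2 / (n p_j), one value per row of counts."""
    n = counts.sum(axis=-1, keepdims=True)
    expected = n * p
    return np.sum((counts - expected) ** 2 / expected, axis=-1)

p = np.array([0.1, 0.2, 0.3, 0.4])  # r = 4 cells (arbitrary)
rng = np.random.default_rng(0)
counts = rng.multinomial(500, p, size=50_000)
stats = pearson_statistic(counts, p)
print(stats.mean(), stats.var())     # approx r - 1 = 3 and 2(r - 1) = 6

# Covariance of the Z_j: I - v v^T with v = sqrt(p), a projection with eigenvalues 0, 1, 1, 1.
v = np.sqrt(p)
cov = np.eye(p.size) - np.outer(v, v)
print(np.allclose(cov @ cov, cov), np.linalg.eigvalsh(cov))
```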
pdf  nibble  lecture-notes  mit  stats  hypothesis-testing  acm  probability  methodology  proofs  iidness  distribution  limits  identity  direction  lifts-projections 
october 2017 by nhaliday
self study - Looking for a good and complete probability and statistics book - Cross Validated
I never had the opportunity to visit a stats course from a math faculty. I am looking for a probability theory and statistics book that is complete and self-sufficient. By complete I mean that it contains all the proofs and not just states results.
nibble  q-n-a  overflow  data-science  stats  methodology  books  recommendations  list  top-n  confluence  proofs  rigor  reference  accretion 
october 2017 by nhaliday
Variance of product of multiple random variables - Cross Validated
for independent X_i: var[prod_i X_i] = prod_i (var[X_i] + (E[X_i])^2) - prod_i (E[X_i])^2

two variable case: var[XY] = var[X] var[Y] + var[X] (E[Y])^2 + (E[X])^2 var[Y]
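An exact check of the identity using rational arithmetic. The three small discrete distributions are arbitrary. Independence is what makes E[(prod X_i)^2] = prod E[X_i^2] and E[prod X_i] = prod E[X_i], which the formula relies on. The brute-force side simply enumerates the product distribution.

```python
from fractions import Fraction
from itertools import product
from math import prod

def moments(dist):
    """Mean and variance of a finite distribution given as {value: probability}."""
    mean = sum(v * p for v, p in dist.items())
    var = sum((v - mean) ** 2 * p for v, p in dist.items())
    return mean, var

def var_of_product_formula(dists):
    """prod_i (Var X_i + E[X_i]^2) - prod_i E[X_i]^2, valid for independent X_i."""
    ms = [moments(d) for d in dists]
    return prod(v + m**2 for m, v in ms) - prod(m**2 for m, _ in ms)

def var_of_product_exact(dists):
    """Var(X_1 * ... * X_k) by enumerating the independent joint distribution."""
    joint = {}
    for combo in product(*(d.items() for d in dists)):
        value = prod(v for v, _ in combo)
        joint[value] = joint.get(value, 0) + prod(p for _, p in combo)
    return moments(joint)[1]

half, third = Fraction(1, 2), Fraction(1, 3)
dists = [{1: half, 3: half}, {0: third, 2: third, 5: third}, {-1: half, 4: half}]
print(var_of_product_formula(dists) == var_of_product_exact(dists))  # True
```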
nibble  q-n-a  overflow  stats  probability  math  identity  moments  arrows  multiplicative  iidness  dependence-independence 
october 2017 by nhaliday
Accurate Genomic Prediction Of Human Height | bioRxiv
Stephen Hsu's compressed sensing application paper

We construct genomic predictors for heritable and extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high dimensional statistics (i.e., machine learning). Replication tests show that these predictors capture, respectively, ~40, 20, and 9 percent of total variance for the three traits. For example, predicted heights correlate ~0.65 with actual height; actual heights of most individuals in validation samples are within a few cm of the prediction.
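Compressed-sensing approaches to this problem fit a sparse, L1-penalized linear model (LASSO). The sketch below is only a toy cyclic coordinate-descent LASSO on simulated standardized "genotypes" with a few causal effects, and is not the authors' pipeline. The sizes (n = 200 samples, p = 500 predictors, k = 10 causal), the noise level and the penalty λ = 0.1 are hypothetical. The sketch shows that prediction works even with fewer samples than predictors when the true effects are sparse.

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_sweeps=100):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    resid = y.copy()                       # invariant: resid == y - X @ b
    col_sq = (X**2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(p):
            rho = X[:, j] @ resid / n + col_sq[j] * b[j]   # partial-residual fit
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * (new_bj - b[j])
            b[j] = new_bj
    return b

rng = np.random.default_rng(0)
n, p, k = 200, 500, 10                     # fewer samples than predictors
causal = rng.choice(p, size=k, replace=False)
beta = np.zeros(p)
beta[causal] = 1.0
X = rng.standard_normal((n, p))
y = X @ beta + 0.5 * rng.standard_normal(n)

b_hat = lasso_cd(X, y, lam=0.1)
X_test = rng.standard_normal((1000, p))
y_test = X_test @ beta + 0.5 * rng.standard_normal(1000)
r = np.corrcoef(X_test @ b_hat, y_test)[0, 1]   # out-of-sample predictor accuracy
print(r)
```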


I'm in Mountain View to give a talk at 23andMe. Their latest funding round was $250M on a (reported) valuation of $1.5B. If I just add up the Crunchbase numbers it looks like almost half a billion invested at this point...

Slides: Genomic Prediction of Complex Traits

Here's how people + robots handle your spit sample to produce a SNP genotype:

study  bio  preprint  GWAS  state-of-art  embodied  genetics  genomics  compressed-sensing  high-dimension  machine-learning  missing-heritability  hsu  scitariat  education  🌞  frontier  britain  regression  data  visualization  correlation  phase-transition  multi  commentary  summary  pdf  slides  brands  skunkworks  hard-tech  presentation  talks  methodology  intricacy  bioinformatics  scaling-up  stat-power  sparsity  norms  nibble  speedometer  stats  linear-models  2017  biodet 
september 2017 by nhaliday
Lecture 14: When's that meteor arriving
- Meteors as a random process
- Limiting approximations
- Derivation of the Exponential distribution
- Derivation of the Poisson distribution
- A "Poisson process"
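A small numerical sketch of the two limits in the outline. The first is Binomial(n, λ/n) approaching Poisson(λ) as the time axis is cut into n ever-finer slots. The second is P(no event by time t) = (1 − λ/n)^(nt) approaching e^(−λt), the exponential waiting-time tail. The rate λ = 3 and the time t = 0.5 are hypothetical.

```python
import math

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def max_gap(lam, n, k_max=20):
    """Largest |Binomial(n, lam/n) - Poisson(lam)| pmf difference over k = 0..k_max."""
    return max(abs(binomial_pmf(k, n, lam / n) - poisson_pmf(k, lam)) for k in range(k_max + 1))

lam = 3.0   # expected meteors per unit time (hypothetical rate)
for n in (10, 100, 1000, 10000):
    print(n, max_gap(lam, n))

# Waiting time: P(no event in [0, t]) = (1 - lam/n)^(n t) -> exp(-lam t).
t = 0.5
print((1 - lam / 10000) ** (10000 * t), math.exp(-lam * t))
```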
nibble  org:junk  org:edu  exposition  lecture-notes  physics  mechanics  space  earth  probability  stats  distribution  stochastic-processes  closure  additive  limits  approximation  tidbits  acm  binomial  multiplicative 
september 2017 by nhaliday
Atrocity statistics from the Roman Era
Christian Martyrs
Gibbon, Decline & Fall v.2 ch.XVI: < 2,000 k. under Roman persecution.
Ludwig Hertling ("Die Zahl der Märtyrer bis 313", 1944) estimated 100,000 Christians killed between 30 and 313 CE. (cited -- unfavorably -- by David Henige, Numbers From Nowhere, 1998)
Catholic Encyclopedia, "Martyr": number of Christian martyrs under the Romans unknown, unknowable. Origen says not many. Eusebius says thousands.


General population decline during The Fall of Rome: 7,000,000
- Colin McEvedy, The New Penguin Atlas of Medieval History (1992)
- From 2nd Century CE to 4th Century CE: Empire's population declined from 45M to 36M [i.e. 9M]
- From 400 CE to 600 CE: Empire's population declined by 20% [i.e. 7.2M]
- Paul Bairoch, Cities and economic development: from the dawn of history to the present, p.111
- "The population of Europe except Russia, then, having apparently reached a high point of some 40-55 million people by the start of the third century [ca.200 C.E.], seems to have fallen by the year 500 to about 30-40 million, bottoming out at about 20-35 million around 600." [i.e. ca.20M]
- Francois Crouzet, A History of the European Economy, 1000-2000 (University Press of Virginia: 2001) p.1.
- "The population of Europe (west of the Urals) in c. AD 200 has been estimated at 36 million; by 600, it had fallen to 26 million; another estimate (excluding ‘Russia’) gives a more drastic fall, from 44 to 22 million." [i.e. 10M or 22M]

The geometric mean of these two extremes would come to 4½ per day, which is a credible daily rate for the really bad years.

why geometric mean? can you get it as the MLE given min{X1, ..., Xn} and max{X1, ..., Xn} for {X_i} iid Poissons? some kinda limit? think it might just be a rule of thumb.

yeah, it's a rule of thumb. found it in his book (epub).
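The rule of thumb in arithmetic form: the geometric mean is the midpoint of two estimates on a log scale, so it sits at the same ratio from each extreme. The two martyr figures quoted above are used here purely as example inputs. The result is not an estimate from the source.

```python
import math

def geometric_mean(low, high):
    """sqrt(low * high): the midpoint of two positive estimates on a log scale."""
    return math.sqrt(low * high)

low, high = 2_000, 100_000  # the two martyr figures quoted above, used only as inputs
gm = geometric_mean(low, high)
print(round(gm))  # 14142
# Equidistant from both extremes in log terms (equivalently, by ratio):
print(math.isclose(math.log(gm) - math.log(low), math.log(high) - math.log(gm)))  # True
```

An arithmetic mean (51,000) would sit almost on top of the high estimate, which is why the geometric mean is preferred when guesses span orders of magnitude.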
org:junk  data  let-me-see  scale  history  iron-age  mediterranean  the-classics  death  nihil  conquest-empire  war  peace-violence  gibbon  trivia  multi  todo  AMT  expectancy  heuristic  stats  ML-MAP-E  data-science  estimate  magnitude  population  demographics  database  list  religion  christianity  leviathan 
september 2017 by nhaliday
All models are wrong - Wikipedia
Box repeated the aphorism in a paper that was published in the proceedings of a 1978 statistics workshop.[2] The paper contains a section entitled "All models are wrong but some are useful". The section is copied below.

Now it would be very remarkable if any system existing in the real world could be exactly represented by any simple model. However, cunningly chosen parsimonious models often do provide remarkably useful approximations. For example, the law PV = RT relating pressure P, volume V and temperature T of an "ideal" gas via a constant R is not exactly true for any real gas, but it frequently provides a useful approximation and furthermore its structure is informative since it springs from a physical view of the behavior of gas molecules.

For such a model there is no need to ask the question "Is the model true?". If "truth" is to be the "whole truth" the answer must be "No". The only question of interest is "Is the model illuminating and useful?".
thinking  metabuch  metameta  map-territory  models  accuracy  wire-guided  truth  philosophy  stats  data-science  methodology  lens  wiki  reference  complex-systems  occam  parsimony  science  nibble  hi-order-bits  info-dynamics  the-trenches  meta:science  physics  fluid  thermo  stat-mech  applicability-prereqs  theory-practice 
august 2017 by nhaliday
trees are harlequins, words are harlequins — bayes: a kinda-sorta masterpost
lol, gwern: https://www.reddit.com/r/slatestarcodex/comments/6ghsxf/biweekly_rational_feed/diqr0rq/
> What sort of person thinks “oh yeah, my beliefs about these coefficients correspond to a Gaussian with variance 2.5″? And what if I do cross-validation, like I always do, and find that variance 200 works better for the problem? Was the other person wrong? But how could they have known?
> ...Even ignoring the mode vs. mean issue, I have never met anyone who could tell whether their beliefs were normally distributed vs. Laplace distributed. Have you?
I must have spent too much time in Bayesland because both those strike me as very easy and I often think them! My beliefs usually are Laplace distributed when it comes to things like genetics (it makes me very sad to see GWASes with flat priors), and my Gaussian coefficients are actually a variance of 0.70 (assuming standardized variables w.l.o.g.) as is consistent with field-wide meta-analyses indicating that d>1 is pretty rare.
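To make the Gaussian-vs-Laplace distinction concrete, here is a one-coefficient sketch. It assumes a single observation x ~ N(θ, σ²). With a Gaussian prior θ ~ N(0, τ²), the posterior mode shrinks x proportionally (ridge-like). With a Laplace prior of scale b, the posterior mode soft-thresholds x at σ²/b (lasso-like), which sets small effects exactly to zero. The parameter values are hypothetical.

```python
import math

def map_gaussian_prior(x, noise_var, prior_var):
    """Posterior mode of theta for x ~ N(theta, noise_var), theta ~ N(0, prior_var)."""
    return x * prior_var / (prior_var + noise_var)

def map_laplace_prior(x, noise_var, prior_scale):
    """Posterior mode for x ~ N(theta, noise_var), theta ~ Laplace(0, prior_scale)."""
    threshold = noise_var / prior_scale
    return math.copysign(max(abs(x) - threshold, 0.0), x)

for x in (0.2, 1.0, 3.0):
    print(x, map_gaussian_prior(x, 1.0, 1.0), map_laplace_prior(x, 1.0, 2.0))
```

The Gaussian prior halves every observation here. The Laplace prior zeroes the small one and subtracts a constant 0.5 from the large ones, matching a belief that most effects are near zero and a few are large.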
ratty  ssc  core-rats  tumblr  social  explanation  init  philosophy  bayesian  thinking  probability  stats  frequentist  big-yud  lesswrong  synchrony  similarity  critique  intricacy  shalizi  scitariat  selection  mutation  evolution  priors-posteriors  regularization  bias-variance  gwern  reddit  commentary  GWAS  genetics  regression  spock  nitty-gritty  generalization  epistemic  🤖  rationality  poast  multi  best-practices  methodology  data-science 
august 2017 by nhaliday
Analysis of variance - Wikipedia
Analysis of variance (ANOVA) is a collection of statistical models used to analyze the differences among group means and their associated procedures (such as "variation" among and between groups), developed by statistician and evolutionary biologist Ronald Fisher. In the ANOVA setting, the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether or not the means of several groups are equal, and therefore generalizes the t-test to more than two groups. ANOVAs are useful for comparing (testing) three or more means (groups or variables) for statistical significance. It is conceptually similar to multiple two-sample t-tests, but is more conservative (results in less type I error) and is therefore suited to a wide range of practical problems.

good pic: https://en.wikipedia.org/wiki/Analysis_of_variance#Motivating_example

tutorial by Gelman: http://www.stat.columbia.edu/~gelman/research/published/econanova3.pdf

so one way to think of partitioning the variance:
y_ij = alpha_i + beta_j + eps_ij
Var(y_ij) = Var(alpha_i) + Var(beta_j) + 2 Cov(alpha_i, beta_j) + Var(eps_ij)   (taking eps_ij independent of alpha_i and beta_j)
and alpha_i, beta_j are independent, so Cov(alpha_i, beta_j) = 0

can you make this work w/ interaction effects?
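A simulation sketch of the additive two-way decomposition above. It uses a fully crossed, balanced grid with independent row effects, column effects and noise; the variances 1, 4 and 0.25 are hypothetical. The empirical variance of y is close to the sum of the three component variances. In a balanced grid, the empirical alpha/beta cross term is exactly zero, not just zero in expectation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols = 200, 200
alpha = rng.normal(0.0, 1.0, size=n_rows)          # row effects, Var = 1
beta = rng.normal(0.0, 2.0, size=n_cols)           # column effects, Var = 4
eps = rng.normal(0.0, 0.5, size=(n_rows, n_cols))  # noise, Var = 0.25
y = alpha[:, None] + beta[None, :] + eps           # y_ij = alpha_i + beta_j + eps_ij

total = y.var()
parts = alpha.var() + beta.var() + eps.var()
print(total, parts)  # close: the remaining cross terms with eps are ~0

# Balanced grid: the empirical alpha/beta cross term vanishes identically.
cross = np.mean((alpha[:, None] - alpha.mean()) * (beta[None, :] - beta.mean()))
print(abs(cross) < 1e-12)
```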
data-science  stats  methodology  hypothesis-testing  variance-components  concept  conceptual-vocab  thinking  wiki  reference  nibble  multi  visualization  visual-understanding  pic  pdf  exposition  lecture-notes  gelman  scitariat  tutorial  acm  ground-up  yoga 
july 2017 by nhaliday
Stat 260/CS 294: Bayesian Modeling and Inference
- Priors (conjugate, noninformative, reference)
- Hierarchical models, spatial models, longitudinal models, dynamic models, survival models
- Testing
- Model choice
- Inference (importance sampling, MCMC, sequential Monte Carlo)
- Nonparametric models (Dirichlet processes, Gaussian processes, neutral-to-the-right processes, completely random measures)
- Decision theory and frequentist perspectives (complete class theorems, consistency, empirical Bayes)
- Experimental design
unit  course  berkeley  expert  michael-jordan  machine-learning  acm  bayesian  probability  stats  lecture-notes  priors-posteriors  markov  monte-carlo  frequentist  latent-variables  decision-theory  expert-experience  confidence  sampling 
july 2017 by nhaliday
Econometric Modeling as Junk Science
The Credibility Revolution in Empirical Economics: How Better Research Design Is Taking the Con out of Econometrics: https://www.aeaweb.org/articles?id=10.1257/jep.24.2.3

On data, experiments, incentives and highly unconvincing research – papers and hot beverages: https://papersandhotbeverages.wordpress.com/2015/10/31/on-data-experiments-incentives-and-highly-unconvincing-research/
In my view, it just has to do with the fact that academia is a peer-monitored organization. In the case of (bad) data collection papers, issues related to measurement are typically boring. They are relegated to appendices, and no one really has an incentive to monitor them seriously. The problem is similar in formal theory: no one really goes through the algebra in detail, but it is in principle feasible to do it, and, actually, sometimes these errors are detected. If discussing the algebra of a proof is almost unthinkable in a seminar, going into the details of data collection, measurement and aggregation is not only hard to imagine, but probably intrinsically infeasible.

Something different happens for the experimentalist people. As I was saying, I feel we have come to a point in which many papers are evaluated based on the cleverness and originality of the research design (“Using the World Cup qualifiers as an instrument for patriotism!? Woaw! how cool/crazy is that! I wish I had had that idea”). The sexiness of the identification strategy has too often become a goal in itself. When your peers monitor you paying more attention to the originality of the identification strategy than to the research question, you probably have an incentive to mine reality for ever crazier discontinuities. It is true methodologists have been criticized in the past for analogous reasons, such as being guided by the desire to increase mathematical complexity without a clear benefit. But, if you work with pure formal theory or statistical theory, your work is not meant to immediately answer questions about the real world, but instead to serve other researchers in their quest. This is something that can, in general, not be said of applied CI work.

This post should have been entitled “Zombies who only think of their next cool IV fix”
massive lust for quasi-natural experiments, regression discontinuities
barely matters if the effects are not all that big
I suppose even the best of things must reach their decadent phase; methodological innov. to manias……

Following this "collapse of small-N social psych results" business, where do I predict econ will collapse? I see two main contenders.
One is lab studies. I dallied with these a few years ago in a Kenya lab. We ran several pilots of N=200 to figure out the best way to treat and to measure the outcome. Every pilot gave us a different stat sig result. I could have written six papers concluding different things. I gave up more skeptical of these lab studies than ever before.
The second contender is the long run impacts literature in economic history. We should be very suspicious since we never see a paper showing that a historical event had no effect on modern day institutions or dvpt. On the one hand I find these studies fun, fascinating, and probably true in a broad sense. They usually reinforce a widely believed history argument with interesting data and a cute empirical strategy. But I don't think anyone believes the standard errors. There's probably a HUGE problem of nonsignificant results staying in the file drawer. Also, there are probably data problems that don't get revealed, as we see with the recent Piketty paper (http://marginalrevolution.com/marginalrevolution/2017/10/pikettys-data-reliable.html). So I take that literature with a vat of salt, even if I enjoy and admire the works.
I used to think field experiments would show little consistency in results across place. That external validity concerns would be fatal. In fact the results across different samples and places have proven surprisingly similar across places, and added a lot to general theory.
Last, I've come to believe there is no such thing as a useful instrumental variable. The ones that actually meet the exclusion restriction are so weird & particular that the local treatment effect is likely far different from the average treatment effect in non-transparent ways. Most of the other IVs don't plausibly meet the exclusion restriction. I mean, we should be concerned when the IV estimate is always 10x larger than the OLS coefficient. Thus I find myself much more persuaded by simple natural experiments that use OLS, diff in diff, or discontinuities, alongside randomized trials.

What do others think are the cliffs in economics?
PS All of these apply to political science too. Though I have a special extra target in poli sci: survey experiments! A few are good. I like Dan Corstange's work. But it feels like 60% of dissertations these days are experiments buried in a survey instrument that measure small changes in response. These at least have large N. But these are just uncontrolled labs, with negligible external validity in my mind. The good ones are good. This method has its uses. But it's being way over-applied. More people have to make big and risky investments in big natural and field experiments. Time to raise expectations and ambitions. This expectation bar, not technical ability, is the big advantage economists have over political scientists when they compete in the same space.
(Ok. So are there any friends and colleagues I haven't insulted this morning? Let me know and I'll try my best to fix it with a screed)

Most papers that employ Differences-in-Differences estimation (DD) use many years of data and focus on serially correlated outcomes but ignore that the resulting standard errors are inconsistent. To illustrate the severity of this issue, we randomly generate placebo laws in state-level data on female wages from the Current Population Survey. For each law, we use OLS to compute the DD estimate of its “effect” as well as the standard error of this estimate. These conventional DD standard errors severely understate the standard deviation of the estimators: we find an “effect” significant at the 5 percent level for up to 45 percent of the placebo interventions. We use Monte Carlo simulations to investigate how well existing methods help solve this problem. Econometric corrections that place a specific parametric form on the time-series process do not perform well. Bootstrap (taking into account the auto-correlation of the data) works well when the number of states is large enough. Two corrections based on asymptotic approximation of the variance-covariance matrix work well for moderate numbers of states and one correction that collapses the time series information into a “pre” and “post” period and explicitly takes into account the effective sample size works well even for small numbers of states.
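A minimal sketch of the placebo-law exercise described in the abstract. It does not use the CPS data. Instead it assumes state-level outcomes with AR(1) errors (ρ = 0.8), 50 states, 20 years, a random half of states "treated" from a random start year, and 200 simulations; all of these are hypothetical choices. State and year fixed effects are removed with the two-way within transformation, which is exact for a balanced panel. The conventional i.i.d. OLS standard error is then used for a 5% test. With serially correlated errors, the rejection rate lands far above 5%. With ρ = 0, it returns to roughly 5%.

```python
import numpy as np

def simulate_ar1_panel(rng, n_states, n_years, rho):
    """Outcome panel with AR(1) errors within each state and no true treatment effect."""
    e = np.empty((n_states, n_years))
    e[:, 0] = rng.normal(scale=1 / np.sqrt(1 - rho**2), size=n_states)  # stationary start
    for t in range(1, n_years):
        e[:, t] = rho * e[:, t - 1] + rng.normal(size=n_states)
    return e

def two_way_demean(a):
    """Within transformation for a balanced panel (state and year fixed effects)."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()

def placebo_rejection_rate(n_sims=200, n_states=50, n_years=20, rho=0.8, seed=0):
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        y = simulate_ar1_panel(rng, n_states, n_years, rho)
        treated = np.zeros(n_states, dtype=bool)
        treated[rng.choice(n_states, n_states // 2, replace=False)] = True
        start = rng.integers(5, 16)               # placebo law starts in a random year
        d = (treated[:, None] & (np.arange(n_years)[None, :] >= start)).astype(float)
        d_t, y_t = two_way_demean(d), two_way_demean(y)
        beta = (d_t * y_t).sum() / (d_t**2).sum()  # DD estimate with two-way FE
        resid = y_t - beta * d_t
        dof = n_states * n_years - n_states - n_years   # NT - (N + T - 1) - 1
        se = np.sqrt((resid**2).sum() / dof / (d_t**2).sum())  # conventional OLS SE
        rejections += abs(beta / se) > 1.96
    return rejections / n_sims

print(placebo_rejection_rate(), placebo_rejection_rate(rho=0.0))
```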

‘METRICS MONDAY: 2SLS–CHRONICLE OF A DEATH FORETOLD: http://marcfbellemare.com/wordpress/12733
As it turns out, Young finds that
1. Conventional tests tend to overreject the null hypothesis that the 2SLS coefficient is equal to zero.
2. 2SLS estimates are falsely declared significant one third to one half of the time, depending on the method used for bootstrapping.
3. The 99-percent confidence intervals (CIs) of those 2SLS estimates include the OLS point estimate over 90 percent of the time. They include the full OLS 99-percent CI over 75 percent of the time.
4. 2SLS estimates are extremely sensitive to outliers. When just one outlying cluster or observation is removed, almost half of 2SLS results become insignificant. Things get worse when removing two outlying clusters or observations, as over 60 percent of 2SLS results then become insignificant.
5. Using a Durbin-Wu-Hausman test, less than 15 percent of regressions can reject the null that OLS estimates are unbiased at the 1-percent level.
6. 2SLS has considerably higher mean squared error than OLS.
7. In one third to one half of published results, the null that the IVs are totally irrelevant cannot be rejected, and so the correlation between the endogenous variable(s) and the IVs is due to finite sample correlation between them.
8. Finally, fewer than 10 percent of 2SLS estimates reject instrument irrelevance and the absence of OLS bias at the 1-percent level using a Durbin-Wu-Hausman test. It gets much worse–fewer than 5 percent–if you add in the requirement that the 2SLS CI exclude the OLS estimate.
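A toy sketch of why weak instruments hurt, illustrating points 3, 6 and 7 above. It is not Young's procedure. It assumes a just-identified model in which an unobserved confounder u enters both x and y, so OLS is biased, and z is a valid instrument with first-stage strength π. The true effect (1.0), n = 200, the two π values and the replication count are hypothetical. With a strong instrument, IV is close to the truth. With a very weak one, the IV estimates center near the OLS bias and have heavy tails.

```python
import numpy as np

def iv_and_ols(pi, n, rng, beta=1.0):
    """One draw: just-identified IV (Wald/2SLS) and OLS slopes when x is endogenous."""
    z = rng.standard_normal(n)            # instrument
    u = rng.standard_normal(n)            # confounder entering both x and y
    x = pi * z + u + rng.standard_normal(n)
    y = beta * x + u
    iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
    ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    return iv, ols

def median_abs_error(pi, n=200, reps=2000, seed=0, beta=1.0):
    """Median |estimate - beta| over replications, as [IV error, OLS error]."""
    rng = np.random.default_rng(seed)
    draws = np.array([iv_and_ols(pi, n, rng, beta) for _ in range(reps)])
    return np.median(np.abs(draws - beta), axis=0)

print("strong instrument:", median_abs_error(pi=1.0))
print("weak instrument:  ", median_abs_error(pi=0.02))
```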

Methods Matter: P-Hacking and Causal Inference in Economics*: http://ftp.iza.org/dp11796.pdf
Applying multiple methods to 13,440 hypothesis tests reported in 25 top economics journals in 2015, we show that selective publication and p-hacking is a substantial problem in research employing DID and (in particular) IV. RCT and RDD are much less problematic. Almost 25% of claims of marginally significant results in IV papers are misleading.

Ever since I learned social science is completely fake, I've had a lot more time to do stuff that matters, like deadlifting and reading about Mediterranean haplogroups
Wait, so, from fakest to realest IV>DD>RCT>RDD? That totally matches my impression.
org:junk  org:edu  economics  econometrics  methodology  realness  truth  science  social-science  accuracy  generalization  essay  article  hmm  multi  study  🎩  empirical  causation  error  critique  sociology  criminology  hypothesis-testing  econotariat  broad-econ  cliometrics  endo-exo  replication  incentives  academia  measurement  wire-guided  intricacy  twitter  social  discussion  pseudoE  effect-size  reflection  field-study  stat-power  piketty  marginal-rev  commentary  data-science  expert-experience  regression  gotchas  rant  map-territory  pdf  simulation  moments  confidence  bias-variance  stats  endogenous-exogenous  control  meta:science  meta-analysis  outliers  summary  sampling  ensembles  monte-carlo  theory-practice  applicability-prereqs  chart  comparison  shift  ratty  unaffiliated 
june 2017 by nhaliday
Pearson correlation coefficient - Wikipedia
what does this mean?: https://twitter.com/GarettJones/status/863546692724858880
deleted but it was about the Pearson correlation distance: 1-r
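I guess it's a metric? not quite: 1-r can violate the triangle inequality; sqrt(2(1-r)), which is the Euclidean distance between the standardized vectors up to a factor of sqrt(n), is a (pseudo)metric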
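A quick numerical check of whether the correlation distance 1 − r is a metric. For centered vectors of equal length, Pearson r is the cosine of the angle between them. Three zero-mean vectors in R^3 at 0°, 60° and 120° therefore give r = 0.5, 0.5 and −0.5. The triangle inequality fails for 1 − r (1.5 > 0.5 + 0.5) but holds for sqrt(2(1 − r)). The vectors are constructed purely for illustration.

```python
import numpy as np

def centered_unit(theta):
    """Zero-mean unit vector in R^3 at angle theta inside the plane orthogonal to (1, 1, 1)."""
    e1 = np.array([1.0, -1.0, 0.0]) / np.sqrt(2)
    e2 = np.array([1.0, 1.0, -2.0]) / np.sqrt(6)
    return np.cos(theta) * e1 + np.sin(theta) * e2

def pearson_r(u, v):
    return np.corrcoef(u, v)[0, 1]

def corr_distance(u, v):
    """1 - r, the 'correlation distance'."""
    return 1.0 - pearson_r(u, v)

def sqrt_corr_distance(u, v):
    """sqrt(2 (1 - r)): Euclidean distance between standardized vectors, up to scale."""
    return np.sqrt(2.0 * (1.0 - pearson_r(u, v)))

a, b, c = (centered_unit(np.deg2rad(deg)) for deg in (0.0, 60.0, 120.0))
print(corr_distance(a, c), corr_distance(a, b) + corr_distance(b, c))   # about 1.5 vs 1.0
print(sqrt_corr_distance(a, c), sqrt_corr_distance(a, b) + sqrt_corr_distance(b, c))
```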


A less misleading way to think about the correlation R is as follows: given X,Y from a standardized bivariate distribution with correlation R, an increase in X leads to an expected increase in Y: dY = R dX. In other words, students with +1 SD SAT score have, on average, roughly +0.4 SD college GPAs. Similarly, students with +1 SD college GPAs have on average +0.4 SD SAT scores.

this reminds me of the breeder's equation (but it uses r instead of h^2, so it can't actually be the same)
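A simulation sketch of the "dY = R dX" reading. It draws from a standardized bivariate normal with r = 0.4, matching the SAT/GPA figure only as a hypothetical. Both regression slopes (Y on X and X on Y) come out near r, not 1/r. Among people about 1 SD above the mean in X, the average Y is about +0.4 SD.

```python
import numpy as np

rng = np.random.default_rng(0)
r = 0.4
cov = np.array([[1.0, r], [r, 1.0]])
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=200_000).T

slope_y_on_x = np.polyfit(x, y, 1)[0]   # about r
slope_x_on_y = np.polyfit(y, x, 1)[0]   # also about r, not 1 / r
print(slope_y_on_x, slope_x_on_y)
# +1 SD in X -> about +0.4 SD in Y on average:
print(y[np.abs(x - 1.0) < 0.05].mean())
```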

stats  science  hypothesis-testing  correlation  metrics  plots  regression  wiki  reference  nibble  methodology  multi  twitter  social  discussion  best-practices  econotariat  garett-jones  concept  conceptual-vocab  accuracy  causation  acm  matrix-factorization  todo  explanation  yoga  hsu  street-fighting  levers  🌞  2014  scitariat  variance-components  meta:prediction  biodet  s:**  mental-math  reddit  commentary  ssc  poast  gwern  data-science  metric-space  similarity  measure  dependence-independence 
may 2017 by nhaliday
Deming regression - Wikipedia
Deming regression. The red lines show the error in both x and y. This is different from the traditional least squares method which measures error parallel to the y axis. The case shown, with deviations measured perpendicularly, arises when errors in x and y have equal variances.
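A minimal sketch of the usual closed-form Deming estimator, assuming the ratio delta = Var(y errors) / Var(x errors) is known; delta = 1 is the equal-variance, perpendicular-deviation case from the caption. `deming_fit` is a hypothetical name, and the function assumes the sample covariance s_xy is nonzero.

```python
import math

def deming_fit(xs, ys, delta=1.0):
    """Deming regression with known error-variance ratio delta = var(eps_y) / var(eps_x).

    Returns (intercept, slope). delta = 1 gives orthogonal (perpendicular) regression.
    Assumes the sample covariance of xs and ys is nonzero.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    d = syy - delta * sxx
    slope = (d + math.sqrt(d * d + 4.0 * delta * sxy * sxy)) / (2.0 * sxy)
    return my - slope * mx, slope

# On noisy data, the orthogonal slope lies between the OLS slope of y on x
# (attenuated by error in x) and the reciprocal of the OLS slope of x on y.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 2.0, 1.0, 4.0, 3.0]
b0, b1 = deming_fit(xs, ys)  # OLS y-on-x slope here is 0.8
```

On exact data (y = 1 + 2x) every choice of delta recovers the same line, since there is no error to apportion.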

stats  data-science  regression  methodology  direction  noise-structure  wiki  reference  nibble  multi 
may 2017 by nhaliday
[1502.05274] How predictable is technological progress?
Recently it has become clear that many technologies follow a generalized version of Moore's law, i.e. costs tend to drop exponentially, at different rates that depend on the technology. Here we formulate Moore's law as a correlated geometric random walk with drift, and apply it to historical data on 53 technologies. We derive a closed form expression approximating the distribution of forecast errors as a function of time. Based on hind-casting experiments we show that this works well, making it possible to collapse the forecast errors for many different technologies at different time horizons onto the same universal distribution. This is valuable because it allows us to make forecasts for any given technology with a clear understanding of the quality of the forecasts. As a practical demonstration we make distributional forecasts at different time horizons for solar photovoltaic modules, and show how our method can be used to estimate the probability that a given technology will outperform another technology at a given point in the future.

- p_t = unit price of tech
- log(p_t) = y_0 - μt + ∑_{i <= t} n_i
- n_t iid noise process (the simplest case; the abstract's "correlated" walk lets the n_t be autocorrelated)
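The uncorrelated version of the model in these notes can be sketched as follows: simulate log(p_t), estimate the drift μ from the mean step, and forecast τ steps ahead. This is not the paper's method; it ignores autocorrelated noise and the uncertainty in the estimated μ, so its spread of sigma·sqrt(τ) understates real forecast error. All names and parameter values are hypothetical.

```python
import math
import random

def simulate_log_price(y0, mu, sigma, T, seed=0):
    """log p_t = y0 - mu*t + sum of iid N(0, sigma^2) shocks, t = 0..T."""
    rng = random.Random(seed)
    ys = [y0]
    for _ in range(T):
        ys.append(ys[-1] - mu + rng.gauss(0.0, sigma))
    return ys

def fit_drift(ys):
    """Estimate mu (rate of decline) and the step std dev from first differences."""
    diffs = [b - a for a, b in zip(ys, ys[1:])]
    m = len(diffs)
    mu_hat = -sum(diffs) / m
    # each diff has mean -mu_hat, so d + mu_hat is its deviation from the mean
    var = sum((d + mu_hat) ** 2 for d in diffs) / (m - 1)
    return mu_hat, math.sqrt(var)

def forecast(y_last, mu_hat, sigma_hat, tau):
    """Point forecast of log price tau steps ahead and its naive std dev."""
    return y_last - mu_hat * tau, sigma_hat * math.sqrt(tau)

ys = simulate_log_price(y0=0.0, mu=0.05, sigma=0.1, T=20_000, seed=1)
mu_hat, sigma_hat = fit_drift(ys)
center, spread = forecast(ys[-1], mu_hat, sigma_hat, tau=10)
```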
preprint  study  economics  growth-econ  innovation  discovery  technology  frontier  tetlock  meta:prediction  models  time  definite-planning  stylized-facts  regression  econometrics  magnitude  energy-resources  phys-energy  money  cost-benefit  stats  data-science  🔬  ideas  speedometer  multiplicative  methodology  stochastic-processes  time-series  stock-flow  iteration-recursion  org:mat 
april 2017 by nhaliday
'Capital in the Twenty-First Century' by Thomas Piketty, reviewed | New Republic
by Robert Solow (positive)

The data then exhibit a clear pattern. In France and Great Britain, national capital stood fairly steadily at about seven times national income from 1700 to 1910, then fell sharply from 1910 to 1950, presumably as a result of wars and depression, reaching a low of 2.5 in Britain and a bit less than 3 in France. The capital-income ratio then began to climb in both countries, and reached slightly more than 5 in Britain and slightly less than 6 in France by 2010. The trajectory in the United States was slightly different: it started at just above 3 in 1770, climbed to 5 in 1910, fell slightly in 1920, recovered to a high between 5 and 5.5 in 1930, fell to below 4 in 1950, and was back to 4.5 in 2010.

The wealth-income ratio in the United States has always been lower than in Europe. The main reason in the early years was that land values bulked less in the wide open spaces of North America. There was of course much more land, but it was very cheap. Into the twentieth century and onward, however, the lower capital-income ratio in the United States probably reflects the higher level of productivity: a given amount of capital could support a larger production of output than in Europe. It is no surprise that the two world wars caused much less destruction and dissipation of capital in the United States than in Britain and France. The important observation for Piketty’s argument is that, in all three countries, and elsewhere as well, the wealth-income ratio has been increasing since 1950, and is almost back to nineteenth-century levels. He projects this increase to continue into the current century, with weighty consequences that will be discussed as we go on.


Now if you multiply the rate of return on capital by the capital-income ratio, you get the share of capital in the national income. For example, if the rate of return is 5 percent a year and the stock of capital is six years’ worth of national income, income from capital will be 30 percent of national income, and so income from work will be the remaining 70 percent. At last, after all this preparation, we are beginning to talk about inequality, and in two distinct senses. First, we have arrived at the functional distribution of income—the split between income from work and income from wealth. Second, it is always the case that wealth is more highly concentrated among the rich than income from labor (although recent American history looks rather odd in this respect); and this being so, the larger the share of income from wealth, the more unequal the distribution of income among persons is likely to be. It is this inequality across persons that matters most for good or ill in a society.
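The accounting identity in Solow's example (capital share = rate of return × capital/income ratio) as a tiny sketch. Holding r fixed at 5% for the historical ratios quoted earlier in the review is purely an illustrative assumption; actual returns varied over the period.

```python
def capital_share(r, beta):
    """Capital's share of national income: alpha = r * beta.

    r    -- annual rate of return on capital (0.05 means 5%)
    beta -- capital/income ratio, in years of national income
    """
    return r * beta

alpha = capital_share(0.05, 6)  # the review's worked example
labor_share = 1 - alpha

# Illustration only: r held at 5% with ratios quoted above for Britain
for label, beta in [("pre-1910", 7), ("1950", 2.5), ("2010", 5)]:
    print(f"Britain {label}: capital share = {capital_share(0.05, beta):.1%}")
```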


The data are complicated and not easily comparable across time and space, but here is the flavor of Piketty’s summary picture. Capital is indeed very unequally distributed. Currently in the United States, the top 10 percent own about 70 percent of all the capital, half of that belonging to the top 1 percent; the next 40 percent—who compose the “middle class”—own about a quarter of the total (much of that in the form of housing), and the remaining half of the population owns next to nothing, about 5 percent of total wealth. Even that amount of middle-class property ownership is a new phenomenon in history. The typical European country is a little more egalitarian: the top 1 percent own 25 percent of the total capital, and the middle class 35 percent. (A century ago the European middle class owned essentially no wealth at all.) If the ownership of wealth in fact becomes even more concentrated during the rest of the twenty-first century, the outlook is pretty bleak unless you have a taste for oligarchy.

Income from wealth is probably even more concentrated than wealth itself because, as Piketty notes, large blocks of wealth tend to earn a higher return than small ones. Some of this advantage comes from economies of scale, but more may come from the fact that very big investors have access to a wider range of investment opportunities than smaller investors. Income from work is naturally less concentrated than income from wealth. In Piketty’s stylized picture of the United States today, the top 1 percent earns about 12 percent of all labor income, the next 9 percent earn 23 percent, the middle class gets about 40 percent, and the bottom half about a quarter of income from work. Europe is not very different: the top 10 percent collect somewhat less and the other two groups a little more.

You get the picture: modern capitalism is an unequal society, and the rich-get-richer dynamic strongly suggests that it will get more so. But there is one more loose end to tie up, already hinted at, and it has to do with the advent of very high wage incomes. First, here are some facts about the composition of top incomes. About 60 percent of the income of the top 1 percent in the United States today is labor income. Only when you get to the top tenth of 1 percent does income from capital start to predominate. The income of the top hundredth of 1 percent is 70 percent from capital. The story for France is not very different, though the proportion of labor income is a bit higher at every level. Evidently there are some very high wage incomes, as if you didn’t know.

This is a fairly recent development. In the 1960s, the top 1 percent of wage earners collected a little more than 5 percent of all wage incomes. This fraction has risen pretty steadily until nowadays, when the top 1 percent of wage earners receive 10–12 percent of all wages. This time the story is rather different in France. There the share of total wages going to the top percentile was steady at 6 percent until very recently, when it climbed to 7 percent. The recent surge of extreme inequality at the top of the wage distribution may be primarily an American development. Piketty, who with Emmanuel Saez has made a careful study of high-income tax returns in the United States, attributes this to the rise of what he calls “supermanagers.” The very highest income class consists to a substantial extent of top executives of large corporations, with very rich compensation packages. (A disproportionate number of these, but by no means all of them, come from the financial services industry.) With or without stock options, these large pay packages get converted to wealth and future income from wealth. But the fact remains that much of the increased income (and wealth) inequality in the United States is driven by the rise of these supermanagers.

and Deirdre McCloskey (p critical): https://ejpe.org/journal/article/view/170
nice discussion of empirical economics, economic history, market failures and statism, etc., with several bon mots

Piketty’s great splash will undoubtedly bring many young economically interested scholars to devote their lives to the study of the past. That is good, because economic history is one of the few scientifically quantitative branches of economics. In economic history, as in experimental economics and a few other fields, the economists confront the evidence (as they do not for example in most macroeconomics or industrial organization or international trade theory nowadays).


Piketty gives a fine example of how to do it. He does not get entangled as so many economists do in the sole empirical tool they are taught, namely, regression analysis on someone else’s “data” (one of the problems is the word data, meaning “things given”: scientists should deal in capta, “things seized”). Therefore he does not commit one of the two sins of modern economics, the use of meaningless “tests” of statistical significance (he occasionally refers to “statistically insignificant” relations between, say, tax rates and growth rates, but I am hoping he does not suppose that a large coefficient is “insignificant” because R. A. Fisher in 1925 said it was). Piketty constructs or uses statistics of aggregate capital and of inequality and then plots them out for inspection, which is what physicists, for example, also do in dealing with their experiments and observations. Nor does he commit the other sin, which is to waste scientific time on existence theorems. Physicists, again, don’t. If we economists are going to persist in physics envy let us at least learn what physicists actually do. Piketty stays close to the facts, and does not, for example, wander into the pointless worlds of non-cooperative game theory, long demolished by experimental economics. He also does not have recourse to non-computable general equilibrium, which never was of use for quantitative economic science, being a branch of philosophy, and a futile one at that. On both points, bravissimo.


Since those founding geniuses of classical economics, a market-tested betterment (a locution to be preferred to “capitalism”, with its erroneous implication that capital accumulation, not innovation, is what made us better off) has enormously enriched large parts of a humanity now seven times larger in population than in 1800, and bids fair in the next fifty years or so to enrich everyone on the planet. [Not SSA or MENA...]


Then economists, many on the left but some on the right, in quick succession from 1880 to the present—at the same time that market-tested betterment was driving real wages up and up and up—commenced worrying about, to name a few of the pessimisms concerning “capitalism” they discerned: greed, alienation, racial impurity, workers’ lack of bargaining strength, workers’ bad taste in consumption, immigration of lesser breeds, monopoly, unemployment, business cycles, increasing returns, externalities, under-consumption, monopolistic competition, separation of ownership from control, lack of planning, post-War stagnation, investment spillovers, unbalanced growth, dual labor markets, capital insufficiency (William Easterly calls it “capital fundamentalism”), peasant irrationality, capital-market imperfections, public … [more]
news  org:mag  big-peeps  econotariat  economics  books  review  capital  capitalism  inequality  winner-take-all  piketty  wealth  class  labor  mobility  redistribution  growth-econ  rent-seeking  history  mostly-modern  trends  compensation  article  malaise  🎩  the-bones  whiggish-hegelian  cjones-like  multi  mokyr-allen-mccloskey  expert  market-failure  government  broad-econ  cliometrics  aphorism  lens  gallic  clarity  europe  critique  rant  optimism  regularizer  pessimism  ideology  behavioral-econ  authoritarianism  intervention  polanyi-marx  politics  left-wing  absolute-relative  regression-to-mean  legacy  empirical  data-science  econometrics  methodology  hypothesis-testing  physics  iron-age  mediterranean  the-classics  quotes  krugman  world  entrepreneurialism  human-capital  education  supply-demand  plots  manifolds  intersection  markets  evolution  darwinian  giants  old-anglo  egalitarianism-hierarchy  optimate  morality  ethics  envy  stagnation  nl-and-so-can-you  expert-experience  courage  stats  randy-ayndy  reason  intersection-connectedness  detail-architect 
april 2017 by nhaliday
Educational Romanticism & Economic Development | pseudoerasmus


Did Nations that Boosted Education Grow Faster?: http://econlog.econlib.org/archives/2012/10/did_nations_tha.html
On average, no relationship. The trendline points down slightly, but for the time being let's just call it a draw. It's a well-known fact that countries that started the 1960's with high education levels grew faster (example), but this graph is about something different. This graph shows that countries that increased their education levels did not grow faster.

Where has all the education gone?: http://citeseerx.ist.psu.edu/viewdoc/download?doi=




The Case Against Education: What's Taking So Long, Bryan Caplan: http://econlog.econlib.org/archives/2015/03/the_case_agains_9.html

The World Might Be Better Off Without College for Everyone: https://www.theatlantic.com/magazine/archive/2018/01/whats-college-good-for/546590/
Students don't seem to be getting much out of higher education.
- Bryan Caplan

College: Capital or Signal?: http://www.economicmanblog.com/2017/02/25/college-capital-or-signal/
After his review of the literature, Caplan concludes that roughly 80% of the earnings effect from college comes from signalling, with only 20% the result of skill building. Put this together with his earlier observations about the private returns to college education, along with its exploding cost, and Caplan thinks that the social returns are negative. The policy implications of this will come as very bitter medicine for friends of Bernie Sanders.

Doubting the Null Hypothesis: http://www.arnoldkling.com/blog/doubting-the-null-hypothesis/

Is higher education/college in the US more about skill-building or about signaling?: https://www.quora.com/Is-higher-education-college-in-the-US-more-about-skill-building-or-about-signaling
ballpark: 50% signaling, 30% selection, 20% addition to human capital
more signaling in art history, more human capital in engineering, more selection in philosophy

Econ Duel! Is Education Signaling or Skill Building?: http://marginalrevolution.com/marginalrevolution/2016/03/econ-duel-is-education-signaling-or-skill-building.html
Marginal Revolution University has a brand new feature, Econ Duel! Our first Econ Duel features Tyler and me debating the question, Is education more about signaling or skill building?

Against Tulip Subsidies: https://slatestarcodex.com/2015/06/06/against-tulip-subsidies/




Most American public school kids are low-income; about half are non-white; most are fairly low skilled academically. For most American kids, the majority of the waking hours they spend not engaged with electronic media are at school; the majority of their in-person relationships are at school; the most important relationship they have with an adult who is not their parent is with their teacher. For their parents, the most important in-person source of community is also their kids’ school. Young people need adult mirrors, models, mentors, and in an earlier era these might have been provided by extended families, but in our own era this all falls upon schools.

Caplan gestures towards work and earlier labor force participation as alternatives to school for many if not all kids. And I empathize: the years that I would point to as making me who I am were ones where I was working, not studying. But they were years spent working in schools, as a teacher or assistant. If schools did not exist, is there an alternative that we genuinely believe would arise to draw young people into the life of their community?


It is not an accident that the state that spends the least on education is Utah, where the LDS church can take up some of the slack for schools, while next door Wyoming spends almost the most of any state at $16,000 per student. Education is now the one surviving binding principle of the society as a whole, the one black box everyone will agree to, and so while you can press for less subsidization of education by government, and for privatization of costs, as Caplan does, there’s really nothing people can substitute for it. This is partially about signaling, sure, but it’s also because outside of schools and a few religious enclaves our society is but a darkling plain beset by winds.

This doesn’t mean that we should leave Caplan’s critique on the shelf. Much of education is focused on an insane, zero-sum race for finite rewards. Much of schooling does push kids, parents, schools, and school systems towards a solution ad absurdum, where anything less than 100 percent of kids headed to a doctorate and the big coding job in the sky is a sign of failure of everyone concerned.

But let’s approach this with an eye towards the limits of the possible and the reality of diminishing returns.

The real reason the left would support Moander: the usual reason, because he’s an enemy.

I have a problem in thinking about education, since my preferences and personal educational experience are atypical, so I can’t just gut it out. On the other hand, knowing that puts me ahead of a lot of people that seem convinced that all real people, including all Arab cabdrivers, think and feel just as they do.

One important fact, relevant to this review. I don’t like Caplan. I think he doesn’t understand – can’t understand – human nature, and although that sometimes confers a different and interesting perspective, it’s not a royal road to truth. Nor would I want to share a foxhole with him: I don’t trust him. So if I say that I agree with some parts of this book, you should believe me.


Caplan doesn’t talk about possible ways of improving knowledge acquisition and retention. Maybe he thinks that’s impossible, and he may be right, at least within a conventional universe of possibilities. That’s a bit outside of his thesis, anyhow. Me it interests.

He dismisses objections from educational psychologists who claim that studying a subject improves you in subtle ways even after you forget all of it. I too find that hard to believe. On the other hand, it looks to me as if poorly-digested fragments of information picked up in college have some effect on public policy later in life: it is no coincidence that most prominent people in public life (at a given moment) share a lot of the same ideas. People are vaguely remembering the same crap from the same sources, or related sources. It’s correlated crap, which has a much stronger effect than random crap.

These widespread new ideas are usually wrong. They come from somewhere – in part, from higher education. Along this line, Caplan thinks that college has only a weak ideological effect on students. I don’t believe he is correct. In part, this is because most people use a shifting standard: what’s liberal or conservative gets redefined over time. At any given time a population is roughly half left and half right – but the content of those labels changes a lot. There’s a shift.

I put it this way, a while ago: “When you think about it, falsehoods, stupid crap, make the best group identifiers, because anyone might agree with you when you’re obviously right. Signing up to clear nonsense is a better test of group loyalty. A true friend is with you when you’re wrong. Ideally, not just wrong, but barking mad, rolling around in your own vomit wrong.”
You just explained the Credo quia absurdum doctrine. I always wondered if it was nonsense. It is not.
Someone on Twitter caught it first – got all the way to “sliding down the razor blade of life”. Which I explained is now called “transitioning”.

What Catholics believe: https://theweek.com/articles/781925/what-catholics-believe
We believe all of these things, fantastical as they may sound, and we believe them for what we consider good reasons, well attested by history, consistent with the most exacting standards of logic. We will profess them in this place of wrath and tears until the extraordinary event referenced above, for which men and women have hoped and prayed for nearly 2,000 years, comes to pass.

According to Caplan, employers are looking for conformity, conscientiousness, and intelligence. They use completion of high school or completion of college as a sign of conformity and conscientiousness. College certainly looks as if it’s mostly signaling, and it’s hugely expensive signaling, in terms of college costs and forgone earnings.

But inserting conformity into the merit function is tricky: things become important signals… because they’re important signals. Otherwise useful actions are contraindicated because they’re “not done”. For example, test scores convey useful information. They could help show that an applicant is smart even though he attended a mediocre school – the same role they play in college admissions. But employers seldom request test scores, and although applicants may provide them, few do. Caplan says “The word on the street… [more]
econotariat  pseudoE  broad-econ  economics  econometrics  growth-econ  education  human-capital  labor  correlation  null-result  world  developing-world  commentary  spearhead  garett-jones  twitter  social  pic  discussion  econ-metrics  rindermann-thompson  causation  endo-exo  biodet  data  chart  knowledge  article  wealth-of-nations  latin-america  study  path-dependence  divergence  🎩  curvature  microfoundations  multi  convexity-curvature  nonlinearity  hanushek  volo-avolo  endogenous-exogenous  backup  pdf  people  policy  monetary-fiscal  wonkish  cracker-econ  news  org:mag  local-global  higher-ed  impetus  signaling  rhetoric  contrarianism  domestication  propaganda  ratty  hanson  books  review  recommendations  distribution  externalities  cost-benefit  summary  natural-experiment  critique  rent-seeking  mobility  supply-demand  intervention  shift  social-choice  government  incentives  interests  q-n-a  street-fighting  objektbuch  X-not-about-Y  marginal-rev  c:***  qra  info-econ  info-dynamics  org:econlib  yvain  ssc  politics  medicine  stories 
april 2017 by nhaliday
Placebo interventions for all clinical conditions. - PubMed - NCBI
We did not find that placebo interventions have important clinical effects in general. However, in certain settings placebo interventions can influence patient-reported outcomes, especially pain and nausea, though it is difficult to distinguish patient-reported effects of placebo from biased reporting. The effect on pain varied, even among trials with low risk of bias, from negligible to clinically important. Variations in the effect of placebo were partly explained by variations in how trials were conducted and how patients were informed.

How much of the placebo 'effect' is really statistical regression?: https://www.ncbi.nlm.nih.gov/pubmed/6369471
Statistical regression to the mean predicts that patients selected for abnormalcy will, on the average, tend to improve. We argue that most improvements attributed to the placebo effect are actually instances of statistical regression. First, whereas older clinical trials susceptible to regression resulted in a marked improvement in placebo-treated patients, in a modern series of clinical trials whose design tended to protect against regression, we found no significant improvement (median change 0.3 per cent, p greater than 0.05) in placebo-treated patients.
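The regression argument can be reproduced with a toy simulation. Assumptions (all hypothetical, not from the paper): true severity ~ N(50, 10), each measurement adds independent N(0, 10) error, patients are enrolled only if their baseline score exceeds 70, and nobody receives any treatment. The selected group still "improves" at follow-up, by roughly half its baseline excess over the population mean, because selection picked people whose first measurement error happened to be high.

```python
import random

def regression_to_mean_demo(n=20_000, threshold=70.0, seed=0):
    """Enroll only patients whose noisy baseline exceeds threshold; remeasure untreated.

    Returns (number enrolled, mean baseline, mean follow-up) of the enrolled group.
    """
    rng = random.Random(seed)
    baseline_sel, followup_sel = [], []
    for _ in range(n):
        true_severity = rng.gauss(50.0, 10.0)
        baseline = true_severity + rng.gauss(0.0, 10.0)
        followup = true_severity + rng.gauss(0.0, 10.0)  # no treatment at all
        if baseline > threshold:
            baseline_sel.append(baseline)
            followup_sel.append(followup)
    k = len(baseline_sel)
    return k, sum(baseline_sel) / k, sum(followup_sel) / k

k, mean_baseline, mean_followup = regression_to_mean_demo()
apparent_improvement = mean_baseline - mean_followup
```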

Placebo effects are weak: regression to the mean is the main reason ineffective treatments appear to work: http://www.dcscience.net/2015/12/11/placebo-effects-are-weak-regression-to-the-mean-is-the-main-reason-ineffective-treatments-appear-to-work/

A radical new hypothesis in medicine: give patients drugs they know don’t work: https://www.vox.com/science-and-health/2017/6/1/15711814/open-label-placebo-kaptchuk
People on no treatment got about 30 percent better. And people who were given an open-label placebo got 60 percent improvement in the adequate relief of their irritable bowel syndrome.

Surgery Is One Hell Of A Placebo: https://fivethirtyeight.com/features/surgery-is-one-hell-of-a-placebo/
study  psychology  social-psych  medicine  meta:medicine  contrarianism  evidence-based  embodied-cognition  intervention  illusion  realness  meta-analysis  multi  science  stats  replication  gelman  regularizer  thinking  regression-to-mean  methodology  insight  hmm  news  org:data  org:lite  interview  tricks  drugs  cost-benefit  health  ability-competence  chart 
march 2017 by nhaliday
The genetics of politics: discovery, challenges, and progress
Figure 1. Summary of relative genetic and environmental influences on political traits.

- heritability increases discontinuously on leaving home
- pretty big range of heritability for different particular traits (party identification is lowest w/ largest shared environment by far)
- overall ideology quite highly heritable
- social trust is surprisingly high compared to other measurements I've seen...
- ethnocentrism quite low (sample-dependent?)
- authoritarianism and traditionalism quite high
- voter turnout quite high

Genes, psychological traits and civic engagement: http://rstb.royalsocietypublishing.org/content/370/1683/20150015
We show an underlying genetic contribution to an index of civic engagement (0.41), as well as for the individual acts of engagement of volunteering for community or public service activities (0.33), regularly contributing to charitable causes (0.28) and voting in elections (0.27). There are closer genetic relationships between donating and the other two activities; volunteering and voting are not genetically correlated. Further, we show that most of the correlation between civic engagement and both positive emotionality and verbal IQ can be attributed to genes that affect both traits.

Are Political Orientations Genetically Transmitted?: http://digitalcommons.unl.edu/cgi/viewcontent.cgi?article=1006&context=poliscifacpub
TABLE 1. Genetic and Environmental Influences on Political Attitudes: The 28 Individual Wilson–Patterson Items

The origins of party identification and its relationship to political orientations: http://sci-hub.tw/http://www.sciencedirect.com/science/article/pii/S0191886915002470

All models showed a good overall fit (see Table 3). The data indicate that party identification is substantially heritable, with about 50% of the variation in PID attributable to additive genetic effects. Moreover, the results indicate that the non-genetic influences on party identification stem primarily from unique environmental factors rather than shared ones such as growing up in the same family. This too is not consistent with the Michigan model.

Table 3 also indicates that genetic influences explained about 50% of the variance in liberalism–conservatism. This estimate is similar to previous behavior genetic findings on political attitudes (e.g., Alford et al., 2005; Bouchard, 2004; Hatemi et al., 2014; Kandler, Bleidorn, & Riemann, 2012). The remaining variance was again due primarily to nonshared environmental influences. The latter finding indicates that the Michigan hypothesis that partisan social influences affect political orientations may have some merit, although the substantial level of heritability for this variable suggests that genetic effects also play an important role.


As Table 4 reveals, the best fitting model indicates that 100% of the genetic variance in PID is held in common with liberalism–conservatism ([aC2]/[aC2 + aPID2] = 1.00). Similarly, 73% of the environmental variation in PID is shared with liberalism–conservatism ([eC2]/[eC2 + ePID2] = .73). All told, only 13% of the total variance in PID cannot be explained by variation in liberalism–conservatism (1 - [aC2 + eC2] = .13), as illustrated in Fig. 3. Since only a small proportion of the variance in PID cannot be explained by liberalism–conservatism, the findings are consistent with the hypothesis that genetic and environmental factors influence liberalism–conservatism, which in turn affects party identification. However, as discussed below, other causal scenarios cannot be ruled out.
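The 13% figure can be roughly reconstructed from the other numbers quoted here. Assumptions: PID variance is about 50% additive genetic and about 50% unique environment (shared environment taken as zero, per the earlier paragraph), with 100% of the genetic and 73% of the environmental part shared with liberalism–conservatism. These rounded inputs give 0.135, consistent with the paper's .13 up to rounding. The function name is hypothetical.

```python
def pid_variance_split(h2, e2, genetic_overlap, env_overlap):
    """Split PID variance into the part shared with liberalism-conservatism
    and the PID-specific remainder (shared environment assumed ~0)."""
    common = h2 * genetic_overlap + e2 * env_overlap  # aC2 + eC2
    return common, 1.0 - common                       # 1 - [aC2 + eC2]

common, specific = pid_variance_split(h2=0.50, e2=0.50,
                                      genetic_overlap=1.00, env_overlap=0.73)
```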

Table 4 and Fig. 3 also show that 55% of the total variance in liberalism–conservatism cannot be accounted for by variance in PID.

Fig. 3. Venn diagram mapping the common and specific variance in party identification and liberalism–conservatism.

intuition for how you can figure out overlap of variance: look at how corr(PID, liberal-conservative) differs between MZ and DZ twin pairs, etc., then fit a structural equation model

y_k,i,j,p = r_A a_k,i,j,p + r_C c_k,i,p + r_E e_k,i,j,p (k = MZ or DZ, i = 1..n_k, j = 1,2, p = PID or LC value)

c_k,i,p = r_{C,p} c'_k,i,p + r_{C,common} c'_k,i,common (ditto; no j, since co-twins share c)
e_k,i,j,p = r_{E,p} e'_k,i,j,p + r_{E,common} e'_k,i,j,common (ditto)

MZ twins:
a_MZ,i,j,p = r_{A,p} a'_MZ,i,p + r_{A,common} a'_MZ,i,common (i = 1..n_k, j = 1,2, p = PID or LC value)

DZ twins:
a_DZ,i,j,p = r_{A,p} (sqrt(1/2) a'_DZ,i,p + sqrt(1/2) a'_DZ,i,j,p) + r_{A,common} (sqrt(1/2) a'_DZ,i,common + sqrt(1/2) a'_DZ,i,j,common) (i = 1..n_k, j = 1,2, p = PID or LC value; the sqrt(1/2) weights keep Var(a) the same as for MZ twins while giving co-twins a genetic correlation of 1/2)

Gaussian distribution for the underlying a', c' and e' variables, maximum likelihood, etc.

see page 9 here: https://pinboard.in/u:nhaliday/b:70f8b5b559a9

1. calculate population means μ from data (so just numbers)
2. calculate covariance matrix Σ in terms of latent parameters r_A, r_C, etc. (so variable correlations)
3. assume observed values are Gaussian with those parameters μ, Σ
4. maximum likelihood to figure out the parameters r_A, r_C, etc.
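The four steps above can be sketched for the simpler univariate ACE model (one trait, no PID/LC common factor). Assumptions, all hypothetical: the phenotype is standardized so a2 + c2 + e2 = 1, a coarse grid search stands in for a numerical optimizer, and the data are simulated with a2 = 0.5, c2 = 0.1, e2 = 0.4. The model covariance within a pair is a2 + c2 for MZ twins and 0.5·a2 + c2 for DZ twins.

```python
import math
import random

def simulate_twins(n, a2, c2, e2, r_gen, rng):
    """n twin pairs under a univariate ACE model; r_gen = 1.0 (MZ) or 0.5 (DZ)."""
    pairs = []
    for _ in range(n):
        a_shared = rng.gauss(0.0, 1.0)
        c = rng.gauss(0.0, 1.0)  # shared environment, identical for both twins
        twin = []
        for _ in range(2):
            # unit-variance genetic value with co-twin correlation r_gen
            a = math.sqrt(r_gen) * a_shared + math.sqrt(1.0 - r_gen) * rng.gauss(0.0, 1.0)
            e = rng.gauss(0.0, 1.0)  # unique environment
            twin.append(math.sqrt(a2) * a + math.sqrt(c2) * c + math.sqrt(e2) * e)
        pairs.append((twin[0], twin[1]))
    return pairs

def sample_cov(pairs):
    """Step 1: means from the data; returns the MLE covariance (s11, s12, s22)."""
    n = len(pairs)
    m1 = sum(p[0] for p in pairs) / n
    m2 = sum(p[1] for p in pairs) / n
    s11 = sum((p[0] - m1) ** 2 for p in pairs) / n
    s22 = sum((p[1] - m2) ** 2 for p in pairs) / n
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in pairs) / n
    return s11, s12, s22

def loglik(s, n, var, cov):
    """Steps 2-3: Gaussian log-likelihood of n pairs with sample covariance s
    under model covariance [[var, cov], [cov, var]]."""
    s11, s12, s22 = s
    det = var * var - cov * cov
    trace = (var * s11 - 2.0 * cov * s12 + var * s22) / det
    return -0.5 * n * (math.log(det) + trace + 2.0 * math.log(2.0 * math.pi))

def fit_ace(mz_pairs, dz_pairs, step=0.01):
    """Step 4: grid-search maximum likelihood over (a2, c2), with e2 = 1 - a2 - c2."""
    s_mz, s_dz = sample_cov(mz_pairs), sample_cov(dz_pairs)
    n_mz, n_dz = len(mz_pairs), len(dz_pairs)
    k = int(round(1.0 / step))
    best = None
    for i in range(k + 1):
        for j in range(k + 1 - i):
            a2, c2 = i * step, j * step
            e2 = 1.0 - a2 - c2
            if e2 < step / 2:  # skip e2 ~ 0 so the covariance stays positive definite
                continue
            ll = (loglik(s_mz, n_mz, a2 + c2 + e2, a2 + c2)
                  + loglik(s_dz, n_dz, a2 + c2 + e2, 0.5 * a2 + c2))
            if best is None or ll > best[0]:
                best = (ll, a2, c2, e2)
    return best[1:]

if __name__ == "__main__":
    rng = random.Random(0)
    mz = simulate_twins(5000, 0.5, 0.1, 0.4, 1.0, rng)
    dz = simulate_twins(5000, 0.5, 0.1, 0.4, 0.5, rng)
    print(fit_ace(mz, dz))
```

The bivariate version in the notes works the same way, just with a 4×4 covariance matrix per twin pair (two traits × two twins) built from the r_A, r_C, r_E loadings.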

A Genetic Basis of Economic Egalitarianism: http://sci-hub.tw/10.1007/s11211-017-0297-y
Our results show that a large portion of the variance in a four-item economic egalitarianism scale can be attributed to genetic factors. At the same time, shared environment, as a socializing factor, has no significant effect. The effect of environment seems to be fully reserved for unique personal experience. Our findings further problematize a long-standing view that social justice attitudes are dominantly determined by socialization.

published in the journal "Social Justice Research" by some Hungarians, lol

various political science findings, w/ a few behavioral genetic, focus on Trump, right-wing populism/authoritarianism, and polarization: http://www.nationalaffairs.com/blog/detail/findings-a-daily-roundup/a-bridge-too-far
pdf  study  org:nat  biodet  politics  values  psychology  social-psych  genetics  variance-components  survey  meta-analysis  environmental-effects  🌞  parenting  replication  candidate-gene  GWAS  anthropology  society  trust  hive-mind  tribalism  authoritarianism  things  sociology  expression-survival  civic  shift  ethnocentrism  spearhead  garett-jones  broad-econ  political-econ  behavioral-gen  biophysical-econ  polisci  stylized-facts  neuro-nitgrit  phalanges  identity-politics  tradition  microfoundations  ideology  multi  genetic-correlation  data  database  twin-study  objektbuch  gender  capitalism  peace-violence  military  labor  communism  migration  civil-liberty  exit-voice  censorship  sex  sexuality  assortative-mating  usa  anglo  comparison  knowledge  coalitions  piracy  correlation  intersection  latent-variables  methodology  stats  models  ML-MAP-E  nibble  explanation  bioinformatics  graphical-models  hypothesis-testing  intersection-connectedness  poll  egalitarianism-hierarchy  envy  inequality  justice  westminster  publishing 
february 2017 by nhaliday

