Archive ouverte HAL - The Great Regression. Machine Learning, Econometrics, and the Future of Quantitative Social Sciences
"What can machine learning do for (social) scientific analysis, and what can it do to it? A contribution to the emerging debate on the role of machine learning for the social sciences, this article offers an introduction to this class of statistical techniques. It details its premises, logic, and the challenges it faces. This is done by comparing machine learning to more classical approaches to quantification – most notably parametric regression – both at a general level and in practice. The article is thus an intervention in the contentious debates about the role and possible consequences of adopting statistical learning in science. We claim that the revolution announced by many and feared by others will not happen any time soon, at least not in the terms that both proponents and critics of the technique have spelled out. The growing use of machine learning is not so much ushering in a radically new quantitative era as it is fostering an increased competition between the newly termed classic method and the learning approach. This, in turn, results in more uncertainty with respect to quantified results. Surprisingly enough, this may be good news for knowledge overall."

--- The correct line here is that 90%+ of "machine learning" is rebranded non-parametric regression, which is what the social sciences should have been doing all along anyway, because they have no good theories which suggest particular parametric forms. (Partial exceptions: demography and epidemiology.) If the resulting confidence sets are bigger than they'd like, that's still the actual range of uncertainty they need to live with, until they can reduce it with more and better empirical information, or additional constraints from well-supported theories. (Arguably, this was all in Haavelmo.) I look forward to seeing whether this paper grasps these obvious truths.
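A minimal sketch of the parametric-form point above, on purely synthetic data. It compares a straight-line least-squares fit with a Nadaraya-Watson kernel regression when the true regression function is not linear. Everything here is an illustrative assumption and not taken from the paper: the sine curve standing in for the unknown function, the noise level, the sample size, the bandwidth, and the helper names.

```python
import numpy as np

def linear_fit_predict(x_train, y_train, x_new):
    """Ordinary least squares with an intercept and a single slope."""
    design = np.column_stack([np.ones_like(x_train), x_train])
    coef, *_ = np.linalg.lstsq(design, y_train, rcond=None)
    return coef[0] + coef[1] * x_new

def kernel_regression_predict(x_train, y_train, x_new, bandwidth):
    """Nadaraya-Watson estimator with a Gaussian kernel."""
    scaled = (x_new[:, None] - x_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * scaled ** 2)
    # Each prediction is a locally weighted average of the observed responses.
    return (weights @ y_train) / weights.sum(axis=1)

# Synthetic data: the "unknown" regression function is assumed to be a sine curve.
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-3.0, 3.0, size=n)
true_f = np.sin
y = true_f(x) + rng.normal(scale=0.3, size=n)

# Evaluate both fits against the true function on an interior grid.
grid = np.linspace(-2.5, 2.5, 101)
mse_linear = np.mean((linear_fit_predict(x, y, grid) - true_f(grid)) ** 2)
mse_kernel = np.mean(
    (kernel_regression_predict(x, y, grid, bandwidth=0.3) - true_f(grid)) ** 2
)

print(f"linear fit MSE against the true curve: {mse_linear:.4f}")
print(f"kernel fit MSE against the true curve: {mse_kernel:.4f}")
```

In this setup the linear fit cannot track the curve however much data it gets, while the kernel estimate can. The bandwidth is fixed by hand here; in practice it would be chosen by something like cross-validation.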
QuantEcon is a NumFOCUS fiscally sponsored project dedicated to development and documentation of modern open source computational tools for economics, econometrics, and decision making. We welcome contributions and collaboration from the economics community and other partner organizations.
pdf -- The Cost of Bad Parents: Evidence from Incarceration on Children’s Education
> I find that conditional on conviction, parental incarceration increases years of education by 0.6 years for the children whose parents were on the margin of incarceration. This positive effect is larger when the incarceration is for a violent crime, for boys and when the incarcerated parent is the mother.
Predictive modeling of U.S. health care spending in late life | Science
That one-quarter of Medicare spending in the United States occurs in the last year of life is commonly interpreted as waste. But this interpretation presumes knowledge of who will die and when. Here we analyze how spending is distributed by predicted mortality, based on a machine-learning model of annual mortality risk built using Medicare claims. Death is highly unpredictable. Less than 5% of spending is accounted for by individuals with predicted mortality above 50%. The simple fact that we spend more on the sick—both on those who recover and those who die—accounts for 30 to 50% of the concentration of spending on the dead. Our results suggest that spending on the ex post dead does not necessarily mean that we spend on the ex ante “hopeless.”
health  statistics  prediction  mortality_risk  econometrics  sendhil.mullainathan  for_friends 
