cshalizi + bad_data_analysis   175

[1909.12475] Hidden Stratification Causes Clinically Meaningful Failures in Machine Learning for Medical Imaging
"Machine learning models for medical image analysis often suffer from poor performance on important subsets of a population that are not identified during training or testing. For example, overall performance of a cancer detection model may be high, but the model still consistently misses a rare but aggressive cancer subtype. We refer to this problem as hidden stratification, and observe that it results from incompletely describing the meaningful variation in a dataset. While hidden stratification can substantially reduce the clinical efficacy of machine learning models, its effects remain difficult to measure. In this work, we assess the utility of several possible techniques for measuring and describing hidden stratification effects, and characterize these effects both on multiple medical imaging datasets and via synthetic experiments on the well-characterised CIFAR-100 benchmark dataset. We find evidence that hidden stratification can occur in unidentified imaging subsets with low prevalence, low label quality, subtle distinguishing features, or spurious correlates, and that it can result in relative performance differences of over 20% on clinically important subsets. Finally, we explore the clinical implications of our findings, and suggest that evaluation of hidden stratification should be a critical component of any machine learning deployment in medical imaging."
to:NB  classifiers  data_mining  prediction  bad_data_analysis  statistics  to_teach:data-mining 
17 days ago by cshalizi
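A minimal sketch of how a good aggregate score can sit on top of a failing subset: accuracy computed overall and within each subtype label. The labels and counts are hypothetical, invented for illustration rather than taken from the paper.

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Return overall accuracy and a dict of per-group accuracies."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        totals[g] += 1
        hits[g] += int(t == p)
    overall = sum(hits.values()) / sum(totals.values())
    per_group = {g: hits[g] / totals[g] for g in totals}
    return overall, per_group

# Hypothetical cancer-detection data: 95 common-subtype cases, all detected,
# and 5 cases of a rare aggressive subtype, all missed.
y_true = [1] * 100
y_pred = [1] * 95 + [0] * 5
groups = ["common"] * 95 + ["rare_aggressive"] * 5

overall, per_group = accuracy_by_group(y_true, y_pred, groups)
print(overall)    # 0.95
print(per_group)  # {'common': 1.0, 'rare_aggressive': 0.0}
```

The paper's point is that in practice the subtype label is usually missing, which is what makes the stratification hidden; a breakdown like this is only possible after someone has identified and annotated the subset.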
[1909.06539] Not again! Data Leakage in Digital Pathology
"Bioinformatics of high throughput omics data (e.g. microarrays and proteomics) has been plagued by uncountable issues with reproducibility at the start of the century. Concerns have motivated international initiatives such as the FDA's led MAQC Consortium, addressing reproducibility of predictive biomarkers by means of appropriate Data Analysis Plans (DAPs). For instance, repeated cross-validation is a standard procedure meant at mitigating the risk that information from held-out validation data may be used during model selection. We prove here that, many years later, Data Leakage can still be a non-negligible overfitting source in deep learning models for digital pathology. In particular, we evaluate the impact of (i) the presence of multiple images for each subject in histology collections; (ii) the systematic adoption of training over collection of subregions (i.e. "tiles" or "patches") extracted for the same subject. We verify that accuracy scores may be inflated up to 41%, even if a well-designed 10x5 iterated cross-validation DAP is applied, unless all images from the same subject are kept together either in the internal training or validation splits. Results are replicated for 4 classification tasks in digital pathology on 3 datasets, for a total of 373 subjects, and 543 total slides (around 27,000 tiles). Impact of applying transfer learning strategies with models pre-trained on general-purpose or digital pathology datasets is also discussed."
to:NB  cross-validation  statistics  bad_data_analysis  to_teach:undergrad-ADA  to_teach:data-mining 
17 days ago by cshalizi
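The remedy the abstract describes, keeping all images from one subject together on one side of every split, amounts to assigning folds to subjects rather than to tiles. A minimal sketch in plain Python; the function names and the toy tile list are hypothetical (scikit-learn's `GroupKFold` is a library implementation of the same idea).

```python
import random

def subject_level_folds(subject_ids, k, seed=0):
    """Give each sample (e.g. a tile) a fold number in 0..k-1 such that all
    samples from the same subject share a fold."""
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    fold_of_subject = {s: i % k for i, s in enumerate(subjects)}
    return [fold_of_subject[s] for s in subject_ids]

def split_indices(folds, held_out):
    """Training and validation sample indices when fold `held_out` is held out."""
    train = [i for i, f in enumerate(folds) if f != held_out]
    valid = [i for i, f in enumerate(folds) if f == held_out]
    return train, valid

# Hypothetical collection: 6 subjects contributing 2-5 tiles each.
tiles = ["s1"] * 4 + ["s2"] * 3 + ["s3"] * 5 + ["s4"] * 2 + ["s5"] * 3 + ["s6"] * 4
folds = subject_level_folds(tiles, k=3)
for held_out in range(3):
    train, valid = split_indices(folds, held_out)
    # No subject may contribute tiles to both sides of the split.
    assert not {tiles[i] for i in train} & {tiles[i] for i in valid}
```

Round-robin assignment over shuffled subjects balances the number of subjects per fold but not the number of tiles. Assigning folds per tile instead would let tiles of the same subject land in both training and validation data, which is the leakage the paper measures.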
[1908.08702] Economically rational sample-size choice and irreproducibility
"Several systematic studies have suggested that a large fraction of published research is not reproducible. One probable reason for low reproducibility is insufficient sample size, resulting in low power and low positive predictive value. It has been suggested that insufficient sample-size choice is driven by a combination of scientific competition and 'positive publication bias'. Here we formalize this intuition in a simple model, in which scientists choose economically rational sample sizes, balancing the cost of experimentation with income from publication. Specifically, assuming that a scientist's income derives only from 'positive' findings (positive publication bias) and that individual samples cost a fixed amount, allows to leverage basic statistical formulas into an economic optimality prediction. We find that if effects have i) low base probability, ii) small effect size or iii) low grant income per publication, then the rational (economically optimal) sample size is small. Furthermore, for plausible distributions of these parameters we find a robust emergence of a bimodal distribution of obtained statistical power and low overall reproducibility rates, matching empirical findings. Overall, the model describes a simple mechanism explaining both the prevalence and the persistence of small sample sizes. It suggests economic rationality, or economic pressures, as a principal driver of irreproducibility."

--- To be clear, my skepticism here isn't about the basic idea, which has been articulated about a zillion times (back to Meehl at least...), but rather whether mathing it up with dubious simplifying assumptions adds anything of value.
to:NB  to_be_shot_after_a_fair_trial  bad_data_analysis  statistics  sociology_of_science  economics 
5 weeks ago by cshalizi
[1909.04436] The Prevalence of Errors in Machine Learning Experiments
"Context: Conducting experiments is central to machine learning research to benchmark, evaluate and compare learning algorithms. Consequently it is important we conduct reliable, trustworthy experiments. Objective: We investigate the incidence of errors in a sample of machine learning experiments in the domain of software defect prediction. Our focus is simple arithmetical and statistical errors. Method: We analyse 49 papers describing 2456 individual experimental results from a previously undertaken systematic review comparing supervised and unsupervised defect prediction classifiers. We extract the confusion matrices and test for relevant constraints, e.g., the marginal probabilities must sum to one. We also check for multiple statistical significance testing errors. Results: We find that a total of 22 out of 49 papers contain demonstrable errors. Of these 7 were statistical and 16 related to confusion matrix inconsistency (one paper contained both classes of error). Conclusions: Whilst some errors may be of a relatively trivial nature, e.g., transcription errors, their presence does not engender confidence. We strongly urge researchers to follow open science principles so errors can more easily be detected and corrected, thus as a community reduce this worryingly high error rate with our computational experiments."
to:NB  bad_data_analysis  machine_learning  to_teach:data-mining 
5 weeks ago by cshalizi
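A minimal sketch of the kind of arithmetic constraint check the abstract describes (not the authors' code): recompute precision, recall and accuracy from a reported confusion matrix and flag any reported metric that the counts cannot produce, allowing for rounding. The example numbers are hypothetical.

```python
def derived_metrics(tp, fp, fn, tn):
    """Precision, recall and accuracy implied by a 2x2 confusion matrix of counts."""
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

def inconsistencies(tp, fp, fn, tn, reported, decimals=2):
    """Names of reported metrics that this confusion matrix cannot produce,
    allowing for rounding to `decimals` decimal places."""
    if min(tp, fp, fn, tn) < 0:
        return ["negative cell count"]
    implied = derived_metrics(tp, fp, fn, tn)
    tol = 0.5 * 10 ** (-decimals) + 1e-9  # half a unit in the last reported digit
    return [name for name, value in reported.items()
            if abs(implied[name] - value) > tol]

# Hypothetical paper reporting a confusion matrix alongside rounded metrics.
print(inconsistencies(40, 10, 20, 30,
                      {"precision": 0.80, "recall": 0.67, "accuracy": 0.75}))
# ['accuracy']: (40 + 30) / 100 = 0.70, not 0.75
```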
[1905.11052] Does the $h_α$ index reinforce the Matthew effect in science? Agent-based simulations using Stata and R
"Recently, Hirsch (2019a) proposed a new variant of the h index called the hα index. He formulated it as follows: "we define the hα index of a scientist as the number of papers in the h-core of the scientist (i.e. the set of papers that contribute to the h-index of the scientist) where this scientist is the α-author" (p. 673). The hα index was criticized by Leydesdorff, Bornmann, and Opthof (2019). One of their most important points is that the index reinforces the Matthew effect in science. We address this point in the current study using a recently developed Stata command (h_index) and R package (hindex), which can be used to simulate h index and hα index applications in research evaluation. The user can investigate under which conditions hα reinforces the Matthew effect. The results of our study confirm what Leydesdorff et al. (2019) expected: the hα index reinforces the Matthew effect. This effect can be intensified if strategic behavior of the publishing scientists and cumulative advantage effects are additionally considered in the simulation."
to:NB  bibliometry  bad_data_analysis 
may 2019 by cshalizi
Looking Through Broken Windows: The Impact of Neighborhood Disorder on Aggression and Fear of Crime Is an Artifact of Research Design | Annual Review of Criminology
"Broken windows theory (BWT) has heavily influenced social science and policy over the past 30 years. It posits that disorder in neighborhoods leads to elevated crime by inviting additional criminal activity and by discouraging the positive social behavior that prevents crime. Scholars have debated the veracity of BWT, and here we conduct a meta-analysis of 96 studies to examine the effects of disorder on residents’ (a) general proclivities for aggressive behavior and (b) perceptions of and attitudes toward their neighborhood (e.g., fear of crime), with particular attention to aspects of research design that might confound causal inference. We found no consistent evidence that disorder induces greater aggression or more negative attitudes toward the neighborhood. Studies that found such effects disproportionately utilized weaker research designs that omit key correlates or confound perceptions of disorder with other neighborhood attitudes. We explore implications for theory, research, and policy."
to:NB  bad_data_analysis  crime  sociology  broken_windows 
may 2019 by cshalizi
Interpreting and Understanding Logits, Probits, and Other Nonlinear Probability Models | Annual Review of Sociology
"Methods textbooks in sociology and other social sciences routinely recommend the use of the logit or probit model when an outcome variable is binary, an ordered logit or ordered probit when it is ordinal, and a multinomial logit when it has more than two categories. But these methodological guidelines take little or no account of a body of work that, over the past 30 years, has pointed to problematic aspects of these nonlinear probability models and, particularly, to difficulties in interpreting their parameters. In this review, we draw on that literature to explain the problems, show how they manifest themselves in research, discuss the strengths and weaknesses of alternatives that have been suggested, and point to lines of further analysis."
to:NB  statistics  classifiers  bad_data_analysis  to_teach:undergrad-ADA 
may 2019 by cshalizi
In 2017, the feds said Tesla Autopilot cut crashes 40%—that was bogus | Ars Technica
Unfortunately, the mistake here is so bald that it'd be hard to turn into a good teaching example.
bad_data_analysis  to_teach  driverless_cars 
february 2019 by cshalizi
Overlooked factors in the analysis of parole decisions | PNAS
"Danziger et al. (1) concluded that meal breaks taken by Israeli parole boards influence the boards’ decisions. This conclusion depends on the order of cases being random or at least exogenous to the timing of meal breaks. We examined data provided by the authors and obtained additional data from 12 hearing days (n = 227 decisions).* We also interviewed three attorneys, a parole panel judge, and five personnel at Israeli Prison Services and Court Management, learning that case ordering is not random and that several factors contribute to the downward trend in prisoner success between meal breaks. The most important is that the board tries to complete all cases from one prison before it takes a break and to start with another prison after the break. Within each session, unrepresented prisoners usually go last and are less likely to be granted parole than prisoners with attorneys. Using the same decision rules as Danziger et al., our data indicate that unrepresented prisoners account for about one-third of all cases, but they prevail only 15% of the time, whereas prisoners with counsel prevail at a 35% rate.
"This nonrandom order of cases might have become apparent had the authors not limited their analysis. They lumped together decisions rejecting parole and cases that were deferred to a later date. Theoretically and in practice, deferrals are not comparable to rejections of parole.
"Excluding these deferred cases, our data indicate a success rate of 67% for prisoners with counsel and 39% for unrepresented prisoners. Excluding deferrals in the authors' data yields very similar success rates, beginning at about 75% and dropping to 42% at the end of a session. Thus, we strongly suspect that the pattern of declining success rates is a result of hearing represented prisoners first and unrepresented prisoners last...."
psychology  via:?  bad_data_analysis 
september 2018 by cshalizi
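The letter's alternative explanation is a composition effect: if represented prisoners are heard first and unrepresented ones last, success rates fall across a session without any change in the board's behaviour. A minimal sketch using the letter's success rates with deferrals excluded (67% with counsel, 39% without) and its rough one-third share of unrepresented prisoners; the 12-case session is hypothetical.

```python
P_COUNSEL, P_NO_COUNSEL = 0.67, 0.39  # success rates quoted in the letter

# Hypothetical 12-case session: 8 represented prisoners first, 4 unrepresented last.
session = [P_COUNSEL] * 8 + [P_NO_COUNSEL] * 4

def mean(xs):
    return sum(xs) / len(xs)

first_half = mean(session[:6])   # expected success rate, cases 1-6
second_half = mean(session[6:])  # expected success rate, cases 7-12
print(round(first_half, 3), round(second_half, 3))  # 0.67 0.483
```

Each prisoner's chance depends only on representation, yet the second half of the session looks much worse than the first.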
Rainfall statistics, stationarity, and climate change | PNAS
"There is a growing research interest in the detection of changes in hydrologic and climatic time series. Stationarity can be assessed using the autocorrelation function, but this is not yet common practice in hydrology and climate. Here, we use a global land-based gridded annual precipitation (hereafter P) database (1940–2009) and find that the lag 1 autocorrelation coefficient is statistically significant at around 14% of the global land surface, implying nonstationary behavior (90% confidence). In contrast, around 76% of the global land surface shows little or no change, implying stationary behavior. We use these results to assess change in the observed P over the most recent decade of the database. We find that the changes for most (84%) grid boxes are within the plausible bounds of no significant change at the 90% CI. The results emphasize the importance of adequately accounting for natural variability when assessing change."

--- They really do seem to be saying that because _independent, identically distributed_ random variables have 0 autocorrelation, all autocorrelated time series are non-stationary. This is so unbelievably stupid that I am going to have to read it again very carefully before banging my head into my desk.
to:NB  to_read  bad_data_analysis  time_series  statistics  to_teach:data_over_space_and_time  to_be_shot_after_a_fair_trial 
may 2018 by cshalizi
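A stationary AR(1) process, x_t = phi * x_{t-1} + e_t with |phi| < 1 and i.i.d. noise e_t, has lag-1 autocorrelation phi, so a nonzero lag-1 autocorrelation signals dependence, not nonstationarity. A minimal simulation; phi = 0.5, the sample size and the seed are arbitrary choices.

```python
import random

def simulate_ar1(phi, n, seed=1):
    """Simulate a stationary AR(1) series with unit-variance Gaussian noise,
    starting from its stationary distribution."""
    rng = random.Random(seed)
    sd0 = (1 / (1 - phi ** 2)) ** 0.5  # stationary standard deviation
    x = [rng.gauss(0, sd0)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0, 1))
    return x

def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation (standard ACF estimator)."""
    m = sum(x) / len(x)
    num = sum((a - m) * (b - m) for a, b in zip(x[:-1], x[1:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den

x = simulate_ar1(phi=0.5, n=20000)
r1 = lag1_autocorrelation(x)
print(round(r1, 2))  # close to 0.5, although the process is stationary by construction
```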
The Power of Bias in Economics Research - Ioannidis - 2017 - The Economic Journal - Wiley Online Library
"We investigate two critical dimensions of the credibility of empirical economics research: statistical power and bias. We survey 159 empirical economics literatures that draw upon 64,076 estimates of economic parameters reported in more than 6,700 empirical studies. Half of the research areas have nearly 90% of their results under-powered. The median statistical power is 18%, or less. A simple weighted average of those reported results that are adequately powered (power ≥ 80%) reveals that nearly 80% of the reported effects in these empirical economics literatures are exaggerated; typically, by a factor of two and with one-third inflated by a factor of four or more."
to:NB  economics  statistics  hypothesis_testing  bad_data_analysis  bad_science_journalism  re:neutral_model_of_inquiry  via:d-squared  to_read 
october 2017 by cshalizi
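Why low power and exaggeration go together: if only statistically significant estimates get reported, the reported ones are biased away from zero, and the bias is worst when power is low. A minimal simulation assuming normally distributed estimates with known standard error, a two-sided z-test at the 5% level, and a true effect of one standard error (power about 17%, close to the 18% median the abstract reports). This illustrates the mechanism, not the paper's own estimator.

```python
import random

def exaggeration_ratio(true_effect, se, n_studies=100_000, z_crit=1.96, seed=2):
    """Return (power, mean |estimate| / true_effect among significant estimates)."""
    rng = random.Random(seed)
    significant = []
    for _ in range(n_studies):
        est = rng.gauss(true_effect, se)
        if abs(est) / se > z_crit:  # two-sided z-test
            significant.append(abs(est))
    power = len(significant) / n_studies
    return power, (sum(significant) / len(significant)) / true_effect

power, ratio = exaggeration_ratio(true_effect=1.0, se=1.0)
print(round(power, 2), round(ratio, 1))
```

Rerunning with `true_effect=2.8` (power about 80%) gives a ratio much closer to 1, which is why the abstract treats adequately powered results separately.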
‘Moneyball’ for Professors?
The key paragraph:

"Using a hand-curated data set of 54 scholars who obtained doctorates after 1995 and held assistant professorships at top-10 operations research programs in 2003 or earlier, these statistical models made different decisions than the tenure committees for 16 (30%) of the candidates. Specifically, these new criteria yielded a set of scholars who, in the future, produced more papers published in the top journals and research that was cited more often than the scholars who were actually selected by tenure committees"

--- In other words, "success" here is defined entirely through the worst sort of abuse of citation metrics, i.e., through doing the things which everyone who has seriously studied citation metrics says you should _not_ use them for. (Cf. https://arxiv.org/abs/0910.3529 .) If the objective were to make academic hiring decisions _even less_ sensitive to actual intellectual quality, one could hardly do better.
I am sure that this idea will, however, be widely adopted and go from strength to strength.
bad_data_analysis  academia  bibliometry  social_networks  network_data_analysis  prediction  utter_stupidity  have_read  via:jbdelong  to:blog 
december 2016 by cshalizi
Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates
"Functional MRI (fMRI) is 25 years old, yet surprisingly its most common statistical methods have not been validated using real data. Here, we used resting-state fMRI data from 499 healthy controls to conduct 3 million task group analyses. Using this null data with different experimental designs, we estimate the incidence of significant results. In theory, we should find 5% false positives (for a significance threshold of 5%), but instead we found that the most common software packages for fMRI analysis (SPM, FSL, AFNI) can result in false-positive rates of up to 70%. These results question the validity of some 40,000 fMRI studies and may have a large impact on the interpretation of neuroimaging results."

--- Nichols is a serious guy (and co-author of one of the best fMRI textbooks I've seen). This is pretty awful news for the field.
to:NB  spatial_statistics  hypothesis_testing  fmri  neural_data_analysis  statistics  bad_data_analysis  nichols.thomas_e.  have_read  to:blog 
june 2016 by cshalizi
Ask a silly question, get a silly answer | Stats Chat
Presumably there is a linguistic-pragmatics explanation of this --- people are interpreting the question so that it makes sense as something asked by an intelligent person, quite possibly one more knowledgeable than they are.
bad_data_analysis  bad_science_journalism  surveys  natural_history_of_truthiness  blogged 
january 2015 by cshalizi
[1407.4240] Unconscious lie detection as an example of a widespread fallacy in the Neurosciences
"Neuroscientists frequently use a certain statistical reasoning to establish the existence of distinct neuronal processes in the brain. We show that this reasoning is flawed and that the large corresponding literature needs reconsideration. We illustrate the fallacy with a recent study that received an enormous press coverage because it concluded that humans detect deceit better if they use unconscious processes instead of conscious deliberations. The study was published under a new open-data policy that enabled us to reanalyze the data with more appropriate methods. We found that unconscious performance was close to chance - just as the conscious performance. This illustrates the flaws of this widely used statistical reasoning, the benefits of open-data practices, and the need for careful reconsideration of studies using the same rationale."
to:NB  to_read  statistics  psychology  experimental_psychology  bad_data_analysis  to_teach:undergrad-ADA  have_skimmed 
january 2015 by cshalizi
The Truth About Chicago’s Crime Rates, Part 1 | Chicago magazine | May 2014
The last tag is of course the least of the issues here.
(And tempting as it is to say "what gets measured gets massaged", that seems like not just a counsel of despair, but an _unfounded_ counsel of despair.)
violence  chicago  crime  corruption  juking_the_stats  evidence_based  social_measurement  management  social_science_methodology  to_teach  have_read  bad_data_analysis  to:blog 
august 2014 by cshalizi
For CEOs, Correlation Between Pay and Stock Performance Is Pretty Random - Businessweek
While I agree with the conclusion, I really have to poke holes in this. At the very least, it makes little sense to aggregate over industries this way. (A few years ago, you could have put an orangutan in charge of a commodities company and it would have made money.) And: what if each CEO were paid exactly what they were worth to the company, but there was inevitably a substantial amount of noise in stock returns because of circumstances beyond their control --- what would this plot look like?
economics  corporations  corporate_governance  why_corporations_are_messed_up  finance  to_teach  via:mejn  have_read  bad_data_analysis 
july 2014 by cshalizi
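The closing question can be answered with a toy simulation: give every hypothetical CEO pay exactly equal to their contribution, and let stock returns equal that contribution plus much larger noise from circumstances beyond their control. The noise scale (ten times the spread of contributions), the sample size and the seed are arbitrary assumptions.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(3)
contribution = [rng.gauss(0, 1) for _ in range(5000)]
pay = contribution[:]                                   # paid exactly what they are worth
returns = [c + rng.gauss(0, 10) for c in contribution]  # plus noise they cannot control
r = pearson(pay, returns)
print(round(r, 2))  # roughly 0.1: a scatterplot that "looks random"
```

Pay is perfectly aligned with worth here, yet its correlation with returns is weak, because returns are mostly noise; a weak pay-performance correlation alone cannot distinguish this world from one where pay is arbitrary.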
Infotainment Journalism | Jacobin
"I belabor all this because I take data analysis seriously. The processing and presentation of quantitative data is a key way that facts are manufactured, a source of things people “know” about the world. So it bothers me to see the discursive pollution of things that are essentially vacuous “infotainment” dressed up in fancy terms like “data science” and “data journalism.”"
bad_data_analysis  why_oh_why_cant_we_have_a_better_press_corps  why_oh_why_cant_we_have_a_better_intelligentsia  to:blog 
june 2014 by cshalizi
Posterior-Hacking: Selective Reporting Invalidates Bayesian Results Also by Uri Simonsohn :: SSRN
"Many believe that Bayesian statistics are robust to p-hacking. Many are wrong. In this paper I show with simulations and actual data that the two Bayesian approaches that have been proposed within Psychology, Bayesian inference and Bayes factors, are as invalidated by selective reporting as p-values are. Going Bayesian may offer some benefits, providing a solution to selective reporting is not one of them. Required disclosure is the only solution."
to:NB  statistics  bad_data_analysis  bayesianism  hypothesis_testing  meta-analysis  re:neutral_model_of_inquiry  have_read  to:blog 
march 2014 by cshalizi
The Epistemology of Mathematical and Statistical Modeling: A Quiet Methodological Revolution
Y'know, when your poster-child for the new methodological approach is one part ecological fallacy to one part ignorance of networks to two parts ignorance of causal inference, you might want to reconsider.
bad_data_analysis  social_science_methodology  statistics  social_influence  bad_science  psychology  via:DRMacIver 
march 2014 by cshalizi
On the scalability of statistical procedures: why the p-value bashers just don’t get it. | Simply Statistics
"Enforcing education and practice in data analysis is the only way to resolve the problems that people usually attribute to P-values. In the short term, we should at minimum require all the editors of journals who regularly handle data analysis to show competency in statistics and data analysis."
hypothesis_testing  statistics  bad_data_analysis  data_analysis 
february 2014 by cshalizi
Was Göring a good father? | Stats Chat
Yet more evidence of how the rigorous editorial standards of the big journals assures that only the most careful and important research receives high-profile dissemination.
bad_data_analysis  utter_stupidity  practices_relating_to_the_transmission_of_genetic_information  to_teach:undergrad-ADA 
september 2013 by cshalizi
Let's Put Garbage-Can Regressions and Garbage-Can Probits Where They Belong
"Many social scientists believe that dumping long lists of explanatory variables into linear regression, probit, logit, and other statistical equations will successfully “control” for the effects of auxiliary factors. Encouraged by convenient software and ever more powerful computing, researchers also believe that this conventional approach gives the true explanatory variables the best chance to emerge. The present paper argues that these beliefs are false, and that statistical models with more than a few independent variables are likely to be inaccurate. Instead, a quite different research methodology is needed, one that integrates contemporary powerful statistical methods with classic data-analytic techniques of creative engagement with the data."
to:NB  to_read  bad_data_analysis  social_science_methodology  statistics  linear_regression  regression  to_teach:undergrad-ADA  have_skimmed 
september 2013 by cshalizi
BishopBlog: Have we become slower and dumber?

ETA: Read the update, it gets better. Then read through to the comment by "Flint": Not only is the lead author heavily into the usual IQ-mongering / pseudo-scientific racism / dysgenics nexus (no surprise), his other great scientific passion is the taxonomy of sea monsters. (No, really.)
psychology  iq  bad_data_analysis  re:g_paper 
may 2013 by cshalizi
Mario Draghi’s Economic Ideology Revealed?
"Draghi’s presentation contains a simple but fatal error – or should that be misrepresentation? As the note to the graphs indicates, the productivity measure is expressed in real terms. In other words it shows how much more output an average worker produced in 2012 compared with 2000. So far so good. However, the wage measure that he uses, compensation per employee, is expressed in nominal terms (even if, interestingly, this is not expressly indicated on the slides). In other words the productivity measure includes inflation, the wage measure does not."

--- This is an astonishing howler, if true.
economics  economic_policy  financial_crisis_of_2007--  bad_data_analysis  via:phnk 
march 2013 by cshalizi
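The critique implies that both series must be put in the same terms before they are compared, e.g. by deflating nominal compensation with a price index. A minimal sketch with made-up index numbers (base year 2000 = 100), not the ECB's data:

```python
def real_index(nominal_index, price_index):
    """Deflate a nominal index by a price index (both with the same base year = 100)."""
    return 100 * nominal_index / price_index

# Hypothetical 2012 values, base year 2000 = 100.
nominal_compensation = 130.0
prices = 125.0
real_productivity = 108.0

real_compensation = real_index(nominal_compensation, prices)
print(real_compensation)  # 104.0
```

Compared in mixed terms, nominal compensation (130) appears to have outrun real productivity (108); compared in the same terms, real compensation (104) has lagged behind it.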
Spreadsheets in the Cloud - Not Ready Yet
"Cloud computing is a relatively new technology that facilitates collaborative creation and modification of documents over the internet in real time. Here we provide an introductory assessment of the available statistical functions in three leading cloud spreadsheets namely Google Spreadsheet, Microsoft Excel Web App, and Zoho Sheet. Our results show that the developers of cloud-based spreadsheets are not performing basic quality control, resulting in statistical computations that are misleading and erroneous. Moreover, the developers do not provide sufficient information regarding the software and the hardware, which can change at any time without notice. Indeed, rerunning the tests after several months we obtained different and sometimes worsened results."
to:NB  bad_data_analysis  the_spreadsheet_menace  computational_statistics  to_teach:statcomp 
march 2013 by cshalizi
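One classic way spreadsheet statistics go wrong (offered as an illustration; the paper's specific test results are not reproduced here) is computing a variance with the one-pass "sum of squares minus squared sum" formula, which loses all precision when the data have a large mean relative to their spread. Welford's updating algorithm avoids this. The data below are hypothetical.

```python
def naive_variance(xs):
    """Textbook one-pass formula: (sum x^2 - (sum x)^2 / n) / (n - 1)."""
    n = len(xs)
    s = sum(xs)
    ss = sum(x * x for x in xs)
    return (ss - s * s / n) / (n - 1)

def welford_variance(xs):
    """Welford's numerically stable one-pass sample variance."""
    mean, m2 = 0.0, 0.0
    for k, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / k
        m2 += delta * (x - mean)
    return m2 / (len(xs) - 1)

data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]  # sample variance is exactly 30
print(welford_variance(data))  # 30.0
print(naive_variance(data))    # wrong: rounding error swamps the answer
```

Welford's method subtracts a running mean before squaring, so it never forms the huge, nearly equal sums whose difference the naive formula depends on.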
[1302.3299] The Geography of Happiness: Connecting Twitter sentiment and expression, demographics, and objective characteristics of place
"We conduct a detailed investigation of correlations between real-time expressions of individuals made across the United States and a wide range of emotional, geographic, demographic, and health characteristics. We do so by combining (1) a massive, geo-tagged data set comprising over 80 million words generated over the course of several recent years on the social network service Twitter and (2) annually-surveyed characteristics of all 50 states and close to 400 urban populations. Among many results, we generate taxonomies of states and cities based on their similarities in word use; estimate the happiness levels of states and cities; correlate highly-resolved demographic characteristics with happiness levels; and connect word choice and message length with urban characteristics such as education levels and obesity rates. Our results show how social media may potentially be used to estimate real-time levels and changes in population-level measures such as obesity rates."

--- After reading: oh for crying out loud. The measurement of happiness consists of counting words which have ordinal "happiness scores" and then averaging. Even if you leave aside the bad measurement, the analysis is just the sort of thing I teach in undergraduate data mining, except that I'd warn students against the ecological fallacy, which admittedly might have eliminated the paper.
to:NB  obesity  social_media  re:social_networks_as_sensor_networks  text_mining  bad_data_analysis  have_read 
february 2013 by cshalizi
Put the Brakes on Weight Gain « All 2013 News « News « College of Liberal Arts & Sciences « University of Illinois
Let's see, we've got regression mistaken for causation, an untested parametric regression form, ignoring serial correlation, and the ecological fallacy.
bad_data_analysis  obesity  epidemiology  to_teach:undergrad-ADA  via:rweaver 
february 2013 by cshalizi
Languages cool as they expand: Allometric scaling and the decreasing need for new words : Scientific Reports : Nature Publishing Group
Every single result here, including the "cooling", is entirely explicable as a sampling artifact, once we grant Zipf's law, or indeed any regularly varying distribution of word frequencies.
heavy_tails  bad_data_analysis  linguistics  shot_after_a_fair_trial 
december 2012 by cshalizi
Changing software to nudge researchers toward better data analysis practice « The Hardest Science
On the one hand, this probably wouldn't hurt. (Using R, rather than paying for junk, would get you #1--#3.)
On the other hand, to the extent that things like this _can_ help, it indicates a failure to understand one's tools, much more than bad default settings in the tools.
data_analysis  statistics  bad_data_analysis  social_science_methodology 
november 2012 by cshalizi
Storks Deliver Babies (p = 0.008)
"This article shows that a highly statistically significant correlation exists between stork populations and human birth rates across Europe. While storks may not deliver babies, unthinking interpretation of correlation and p-values can certainly deliver unreliable conclusions."
funny:geeky  funny:malicious  statistics  regression  bad_data_analysis  birds  via:tslumley  to_teach 
october 2012 by cshalizi
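How a common cause manufactures a strong correlation: in this toy simulation a hypothetical confounder (think of a country's size) drives both stork numbers and births, which have no effect on each other. Correlating the two gives a strong association; correlating their residuals after regressing each on the confounder gives roughly zero. All numbers are invented.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def residuals(ys, zs):
    """Residuals from a least-squares regression of ys on zs (with intercept)."""
    n = len(ys)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [y - my - slope * (z - mz) for y, z in zip(ys, zs)]

rng = random.Random(4)
size = [rng.gauss(0, 1) for _ in range(5000)]     # hypothetical confounder
storks = [s + rng.gauss(0, 1) for s in size]      # depends on size only
births = [s + rng.gauss(0, 1) for s in size]      # depends on size only

raw = pearson(storks, births)                                        # about 0.5
partial = pearson(residuals(storks, size), residuals(births, size))  # about 0
```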
Language Log » Ignorance about ignorance
"Thus the survey designers craft a question like this (asked at a time when William Rehnquist was the Chief Justice of the United States):
" “Now we have a set of questions concerning various public figures. We want to see how much information about them gets out to the public from television, newspapers and the like…. What about William Rehnquist – What job or political office does he NOW hold?”
"The answers to such open-ended questions are recorded — as audio recordings and/or as notes taken by the interviewer — and these records are coded, later on, by hired coders.
"The survey designers give these coders very specific instructions about what counts as right and wrong in the recorded answers. In the case of the question about William Rehnquist, the criteria for an answer to be judged correct were mentions of both "chief justice" and "Supreme Court". These terms had to be mentioned explicitly, so all of the following (actual answers) counted as wrong:
"Supreme Court justice. The main one.
"He’s the senior judge on the Supreme Court.
"He is the Supreme Court justice in charge.
"He’s the head of the Supreme Court.
"He’s top man in the Supreme Court.
"Supreme Court justice, head.
"Supreme Court justice. The head guy.
"Head of Supreme Court.
"Supreme Court justice head honcho.
"Similarly, the technically correct answer ("Chief Justice of the United States") was also scored as wrong."

I'm astonished. If this is a general practice, and it amounts to any considerable fraction of survey answers to questions like this, I think our picture of what the public knows has to change a lot (and in a good way).
surveys  bad_data_analysis  social_life_of_the_mind  lupia.arthur 
october 2012 by cshalizi
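The coding rule in the quoted passage can be written as a string test, which makes its strictness plain. A minimal sketch assuming case-insensitive substring matching (the actual coders applied written instructions, not code); the first three answers come from the quoted passage, and the last is a hypothetical answer that would pass.

```python
def coded_correct(answer):
    """Strict rule from the quoted passage: the answer must mention both
    'chief justice' and 'supreme court'."""
    a = answer.lower()
    return "chief justice" in a and "supreme court" in a

answers = [
    "He's the head of the Supreme Court.",
    "Supreme Court justice. The head guy.",
    "Chief Justice of the United States",
    "Chief Justice of the Supreme Court",  # hypothetical passing answer
]
for a in answers:
    print(coded_correct(a), a)
```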
Conclusions of difference require evidence of difference | Stats Chat
"All this is leading up to a story in the Herald, where a group of genetics researchers claim that a well-studied variant in a gene called monoamine oxidase increases happiness in women, but not in men.  We know this is surprising, because the researcher said so — they were expecting a decrease in happiness, and they don’t seem to have been expecting a male:female difference.  The researchers say that the difference could be because of testosterone — and of course it could be, but they don’t present any evidence at all that it is.
"Anyway, as you will be expecting by now, I found the paper (the Herald gets points for giving the journal name), and it is possible to do a simple test for differences in ‘happiness’ effect between men and women. And there isn’t much evidence for a difference. For people who collect p-values: about 0.09 (Bayesian would get a similar conclusion after a lot more work). So, if we didn’t expect a benefit in women and no difference in men, the data don’t give us much encouragement for believing that.
"Testing for differences isn’t the ideal solution — even better would be to fit a model that allows for a smooth variation between constant effect and separate effect — but testing for differences is a good precursor to putting out a press release about differences and trying for headlines all over the world. We can’t expect newspapers to weed this sort of thing out if scientists are encouraging it via press releases."
multiple_comparisons  human_genetics  bad_data_analysis  methodological_advice  why_oh_why_cant_we_have_a_better_academic_publishing_system  re:neutral_model_of_inquiry 
august 2012 by cshalizi
Nathan Explains Science: How to read graphs: The public policy version
Dear God.
Let me emphasize a point Nathan mentioned but could have said more about. The bad-but-experienced teachers are selected to be the bottom 10--24% of experienced teachers. Nonetheless, their median is at the 35th percentile of _all_ teachers: not the 5th or the 12th percentile, or even the 24th, but the 35th. To use this as an argument that experienced teachers aren't, on average, better is really remarkable innumeracy. Conceivably, it could just be that bad teachers leave preferentially, so this is all selection and not learning, but still.
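To see why the 35th percentile matters, here is a sketch under a deliberately simple, hypothetical model. Novice quality is N(0, 1), experienced quality is N(delta, 1), and a fraction `p_exp` of all teachers are experienced. If experience made no difference (`delta = 0`), the median of the bottom 10 to 24% slice of experienced teachers would sit near the 17th percentile of all teachers. Landing at the 35th requires `delta > 0`. Like the data, this model cannot distinguish learning from selective exit. The normal shapes, `p_exp`, and the function names are assumptions made for illustration.

```python
from statistics import NormalDist

STD = NormalDist()

def slice_median_percentile(delta, p_exp, lo=0.10, hi=0.24):
    """Percentile, among ALL teachers, of the median teacher in the
    [lo, hi] quantile slice of EXPERIENCED teachers.

    Hypothetical model: novices ~ N(0, 1), experienced ~ N(delta, 1),
    and a fraction p_exp of all teachers are experienced.
    """
    q = (lo + hi) / 2                 # median of a quantile slice = its midpoint quantile
    v = delta + STD.inv_cdf(q)        # quality of that median experienced teacher
    # Mixture CDF at v: experienced contribute exactly q, novices contribute Phi(v).
    return p_exp * q + (1 - p_exp) * STD.cdf(v)

def delta_for_percentile(target, p_exp, lo=0.10, hi=0.24, tol=1e-10):
    """Bisection for the experience gap that puts the slice median at `target`.

    Assumes (lo + hi) / 2 <= target < the value reached at delta = 10.
    The percentile is increasing in delta, so bisection is valid.
    """
    a, b = 0.0, 10.0
    while b - a > tol:
        m = (a + b) / 2
        if slice_median_percentile(m, p_exp, lo, hi) < target:
            a = m
        else:
            b = m
    return (a + b) / 2

print("no experience effect:", round(slice_median_percentile(0.0, 0.5), 3))
print("gap (in SDs) needed for the 35th percentile:", round(delta_for_percentile(0.35, 0.5), 2))
```

The required gap depends on the assumed `p_exp`. Under any setting of this model, however, a slice median well above the slice's own midpoint percentile implies that the experienced distribution is shifted upward.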
bad_data_analysis  natural_history_of_truthiness  education  visual_display_of_quantitative_information  collins.nathan  evisceration 
august 2012 by cshalizi
Language Log » Texting and language skills
See if they could be persuaded to cough up the data, and make it a case study?
bad_data_analysis  evisceration  linguistics  liberman.mark  to_teach:undergrad-ADA 
august 2012 by cshalizi
Humanities Hackathon « Quomodocumque
"I came home only to encounter this breathless post from the Science blog about a claim that you can use network invariants (e.g. clustering coefficient, degree distribution, correlation of degree between adjacent nodes) to distinguish factually grounded narratives like the Iliad from entirely fictional ones like Harry Potter.  The paper itself is not so convincing.  For instance, its argument on “assortativity,” the property that high-degree nodes tend to be adjacent to one another, goes something like this:
"Real-life social networks tend to be assortative, in the sense that the number of friends I have is positively correlated with the number of friends my friends have.
"The social network they write down for the Iliad isn’t assortative, so they remove all the interactions classified as “hostile,” and then it is.
"The social network for Beowulf isn’t assortative, so they remove all the interactions classified as “hostile,” and then it still isn’t, so they take out Beowulf himself, and then it is, but just barely.
"Conclusion: The social networks of Beowulf and the Iliad are assortative, just like real social networks.
"Digital humanities can be better than this!"

Well, that's one less stupid paper I've got to post about.
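For concreteness, degree assortativity (Newman's coefficient) is the Pearson correlation of the degrees at the two ends of each edge, with every edge counted in both directions. The sketch below computes it in plain Python. It also adds a hypothetical `prune` helper that mirrors the moves described above: dropping "hostile" edges, then dropping a named node. This makes it easy to see that each pruning is another analyst choice, and the coefficient carries no record of those choices. The toy edge list is invented and is not taken from the Iliad or Beowulf data.

```python
import math
from collections import Counter

def degree_assortativity(pairs):
    """Newman's degree assortativity for an undirected simple graph.

    Pearson correlation between the degrees at the two ends of each edge,
    counting every edge in both directions so the measure is symmetric.
    Returns nan for an empty graph or when all edge ends share one degree.
    """
    if not pairs:
        return float("nan")
    deg = Counter()
    for u, v in pairs:
        deg[u] += 1
        deg[v] += 1
    xs, ys = [], []
    for u, v in pairs:
        xs += [deg[u], deg[v]]
        ys += [deg[v], deg[u]]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)

def prune(edges, drop_labels=(), drop_nodes=()):
    """Hypothetical helper: keep (u, v) for labelled edges that survive the filters."""
    return [(u, v) for u, v, label in edges
            if label not in drop_labels
            and u not in drop_nodes and v not in drop_nodes]

# Invented toy network of (u, v, label) triples.
toy = [
    ("A", "B", "friendly"), ("A", "C", "hostile"), ("A", "D", "friendly"),
    ("A", "E", "hostile"), ("B", "C", "friendly"), ("D", "E", "friendly"),
    ("E", "F", "friendly"),
]
print("all edges:       ", degree_assortativity(prune(toy)))
print("no hostile:      ", degree_assortativity(prune(toy, ("hostile",))))
print("no hostile, no A:", degree_assortativity(prune(toy, ("hostile",), ("A",))))
```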
network_data_analysis  bad_data_analysis  bad_science  mythology 
july 2012 by cshalizi
How Robust Standard Errors Expose Methodological Problems They Do Not Fix
"“Robust standard errors” are used in a vast array of scholarship across all fields of empirical political science and most other social science disciplines. The popularity of this procedure stems from the fact that estimators of certain quantities in some models can be consistently estimated even under particular types of misspecification; and although classical standard errors are inconsistent in these situations, robust standard errors can sometimes be consistent. However, in applications where misspecification is bad enough to make classical and robust standard errors diverge, assuming that misspecification is nevertheless not so bad as to bias everything else requires considerable optimism. And even if the optimism is warranted, we show that settling for a misspecified model (even with robust standard errors) can be a big mistake, in that all but a few quantities of interest will be impossible to estimate (or simulate) from the model without bias. We suggest a different practice: Recognize that differences between robust and classical standard errors are like canaries in the coal mine, providing clear indications that your model is misspecified and your inferences are likely biased. At that point, it is often straightforward to use some of the numerous and venerable model checking diagnostics to locate the source of the problem, and then modern approaches to choosing a better model. With a variety of real examples, we demonstrate that following these procedures can drastically reduce biases, improve statistical inferences, and change substantive conclusions."

N.B., much of the point here is to say that using robust standard errors doesn't protect you from many other consequences of getting distributional assumptions wrong. This is true but can often be side-stepped by bootstrapping (for instance, for prediction intervals).
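A rough illustration of the canary, assuming a simple regression of y on x in which the noise spread grows with |x|. The sketch compares the classical standard error of the slope with the HC0 sandwich ("robust") standard error. A ratio well away from 1 flags a misspecified variance model. This is only a crude comparison and not King and Roberts' formal procedure. The simulation settings (`simulate`, sample size, seeds, noise model) are hypothetical. The bootstrap mentioned in the note above is not shown.

```python
import math
import random

def ols_slope_ses(xs, ys):
    """Simple OLS slope with classical and HC0 (sandwich) standard errors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [y - intercept - slope * x for x, y in zip(xs, ys)]
    # Classical: one common error variance, estimated with n - 2 df.
    sigma2 = sum(e ** 2 for e in resid) / (n - 2)
    classical = math.sqrt(sigma2 / sxx)
    # HC0: each squared residual weighted by its own leverage in x.
    hc0 = math.sqrt(sum(((x - mx) * e) ** 2 for x, e in zip(xs, resid))) / sxx
    return slope, classical, hc0

def simulate(n, heteroskedastic, seed):
    """Hypothetical data: y = 1 + 0.5 x + noise, x ~ Uniform(-1, 1)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = []
    for x in xs:
        sd = 2.0 * abs(x) if heteroskedastic else 1.0
        ys.append(1.0 + 0.5 * x + rng.gauss(0.0, sd))
    return xs, ys

for hetero in (False, True):
    xs, ys = simulate(5000, hetero, seed=1)
    slope, classical, robust = ols_slope_ses(xs, ys)
    print(f"heteroskedastic={hetero}: slope={slope:.3f}, "
          f"robust/classical SE ratio={robust / classical:.2f}")
```

In the heteroskedastic case, the slope estimate itself is still fine. The diverging standard errors are the signal that the model's description of the noise is wrong, and anything that depends on that description, such as prediction intervals, is suspect too.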
have_read  via:phnk  to_teach:undergrad-ADA  misspecification  statistics  estimation  social_science_methodology  regression  bad_data_analysis  king.gary  model_checking  to:blog  in_NB 
july 2012 by cshalizi
Language Log » Textual narcissism, replication 2
While Mark is correct that these data are impossible to reconcile with the claims, even granting the methods, the fact is that those methods are exquisitely awful.
evisceration  text_mining  bad_data_analysis  decline_of_american_character  liberman.mark  to_teach:undergrad-ADA 
july 2012 by cshalizi

related tags

academia  acree.brice  adolescence  affirmative_action  anthropology  anti-contrarianism  antifeminist_idiocy  astrology  auerbach.david  bad_data_analysis  bad_management  bad_science  bad_science_journalism  barabasi.albert-laszlo  bayesianism  behavioral_genetics  berk.richard  bibliometry  big_data  big_pharma  bioinformatics  birds  blogged  blogging  books:noted  book_reviews  booze  broken_windows  brooks.david  brumm.maria  budiansky.stephen  burke.timothy  burstein.miriam  cats  causal_inference  cavalli-sforza.l.luca  chalko.tom  chicago  classifiers  class_struggles_in_america  clauset.aaron  climate_change  coates.ta-nehisi  cobb-douglas_production_functions  cognitive_science  cold_war  collins.nathan  computational_statistics  confounding  copulas  corporate_governance  corporations  correlation  corruption  credit_derivatives  crime  cross-validation  cultural_criticism  cultural_transmission  data_analysis  data_mining  debunking  decision_trees  decline_of_american_character  democracy  did_this_really_need_saying?  diffusion_of_innovations  dimon.jamie  disasters  diversity  DNA_testing  douthat.ross  driverless_cars  drones  drugs  drum.kevin  dsquared  earthquakes  ecology  econometrics  economics  economic_growth  economic_policy  education  ellenberg.jordan  elster.jon  ensemble_methods  entropy  epidemiology  estimation  even_the_liberal_new_republic  evidence_based  evisceration  evolutionary_psychology  excel_considered_harmful  experimental_psychology  factor_analysis  farrell.henry  fermi_problems  fienberg.stephen_e.  
filtering  finance  financialization  financial_crisis_of_2007--  financial_speculation  fmri  food  forensics  frankel.jeffrey  freese.jeremy  functional_connectivity  funny:academic  funny:geeky  funny:malicious  funny:pointed  galtons_problem  gelman.andrew  generations  genetics  gene_expression_data_analysis  geoengineering  geology  gives_economists_a_bad_name  gives_physicists_a_bad_name  goldberger.arthur  graphical_models  happiness  harrapan_civilization  harris.christine  have_read  have_skimmed  healy.kieran  heart_attack  heavy_tails  heritability  hierarchical_statistical_models  historical_genetics  historical_materialism  humanities  human_genetics  hypothesis_testing  ideology  indus_valley_civilization  inequality  information_theory  injustice  institutions  intelligence_(spying)  interactome  in_NB  iq  journalism  juking_the_stats  kamin.leon  kanazawa.satoshi  king.gary  klein.ezra  levitt.steven  levy.ferdinand  liberman.mark  libraries  linear_regression  linguistics  literary_criticism  logistic_regression  loud_and_prolonged_applause  lupia.arthur  machine_learning  macroeconomics  management  mcwhorter.john  measurement  mea_copula  mea_maxima_copula  medical_statistics  medicine  mental_testing  meta-analysis  methodological_advice  methodology  misspecification  model_checking  modest_proposals  molecular_biology  moral_depravity  mortgage_crisis  moskos.peter  multiple_comparisons  multiple_testing  murray.charles  music  mythology  national_surveillance_state  natural_history_of_truthiness  network_data_analysis  neural_data_analysis  neuroscience  newman.mark  nichols.thomas_e.  nielsen.michael  nukes  obesity  ok_surprisingly_fragile_data_analysis  or_perhaps_the_nightmare_into_which_we_are_slipping  our_decrepit_institutions  over-fitting  p-values  parapsychology  partnoy.frank  pashler.harold  peer_review  penn.mark  pictish  pielke.roger  pierrehumbert.raymond  poetry  political_science  polling  porter.mason_a.  
practices_relating_to_the_transmission_of_genetic_information  prediction  primates  principal_components  psychoceramics  psychology  public_policy  pullum.geoff  race  racism  racist_idiocy  rational_choice  re:g_paper  re:homophily_and_confounding  re:neutral_model_of_inquiry  re:social_networks_as_sensor_networks  re:your_favorite_dsge_sucks  reactionary_idiocy  regression  reinhart.carmen  renewable_energy  risk_assessment  risk_perception  rogoff.kenneth  routinization  rubin.donald  running_dogs_of_reaction  salmon  sampling_on_the_dependent_variable  satire  science  science_policy  securitization  selection_bias  self-organized_criticality  self-promotion  sex_differences  shot_after_a_fair_trial  sides.john  silver.nate  simon.herbert  skeel.david  smith.noah  social_influence  social_life_of_the_mind  social_measurement  social_media  social_networks  social_neuroscience  social_psychology  social_science_methodology  sociology  sociology_of_science  spatial_statistics  splines  sports  standardized_testing  stanley.h._eugene  stark.philip_b.  
statistics  stepping_stone_model  stochastic_processes  su.shi  suhay.liz  surveys  technological_change  terrorism_fears  text_mining  theory_of_mind  the_american_dilemma  the_continuing_crises  the_nightmare_from_which_we_are_trying_to_awake  the_present_before_it_was_widely_distributed  the_problem_is_not_the_p-values  the_robo-nuclear_apocalypse_in_our_past_light_cone  the_spreadsheet_menace  the_wired_ideology  thinking_in_stereotypes  time_series  to:blog  to:NB  to_be_shot_after_a_fair_trial  to_read  to_teach  to_teach:data-mining  to_teach:data_over_space_and_time  to_teach:linear_models  to_teach:statcomp  to_teach:undergrad-ADA  to_teach:undergrad-research  track_down_references  under_precisely_controlled_experimental_conditions_the_organism_does_what_it_damn_well_pleases  unions  ussr  us_news_and_world_report  us_politics  utter_stupidity  value-added_measurement_in_education  variance_estimation  via:?  via:aaron_clauset  via:ariddell  via:arthegall  via:coates.ta-nehisi  via:d-squared  via:djm1107  via:DRMacIver  via:email  via:fionajay  via:flint_riemen  via:gelman  via:henry_farrell  via:hilzoy  via:io9  via:james-nicoll  via:jbdelong  via:klk  via:krugman  via:languagelog  via:mathbabe  via:mejn  via:orzelc  via:phnk  via:rweaver  via:samii  via:simply_statistics  via:slaniel  via:tony_lin  via:tslumley  via:unfogged  violence  visual_display_of_quantitative_information  voting  vul.edward  wade.nicholas  whats_gone_wrong_with_america  what_the_experiment_died_of  why_corporations_are_messed_up  why_oh_why_cant_we_have_a_better_academic_publishing_system  why_oh_why_cant_we_have_a_better_intelligentsia  why_oh_why_cant_we_have_a_better_press_corps  writing  writing_advice  xkcd  yarkoni.tal  yglesias.matthew 
