I was planning to write something thoughtful today on formal versus substantive universals and how we have made lots of progress wrt the former but quite a bit less wrt the latter. That post, however, will have to wait. Why? Lunch! Over lunch I had an intriguing discussion of the latest academic brouhaha and how it compares with one that linguists know well: “L’affaire Hauser.” For those of you who don’t read the financial pages, or don’t religiously follow Krugman (here, first of many), or don’t watch Colbert (here), let me set the stage.
In 2010, two big-name economists, Carmen Reinhart and Kenneth Rogoff (RR), wrote a paper building on their very important work (This Time Is Different) chronicling the aftermath of financial crises through the ages. The book garnered unbelievable reviews and made the two rock stars of the Great Recession. The paper that followed was equally provocative, though not nearly as well received. The paper claimed to find an important kind of debt threshold which, when crossed, caused economic growth to tank. Actually, this is a bit tendentious. What the paper claimed was that there was a correlation between a debt-to-GDP ratio of 90% or higher and the collapse of growth. Note: correlation, not causation. However, what made the paper hugely influential was the oft-suggested hint that the causality ran from high debt to slow growth rather than the other way around, or some combination of the two. The first interpretation was quickly seized upon by the “Very Serious People” (VSP), aka “austerians,” to justify policies of aggressively cutting budget deficits rather than fiscally priming the economic pump to combat high unemployment. Keynesians like Krugman (and many others, including Larry Summers, another famous Harvardian) argued that the causality ran from slow growth to large deficits, and so the right policy was to boost government spending to fight unemployment, as doing this would also alleviate the debt “problem.” At any rate, it is safe to say that RR’s 2010 paper had considerable political and economic impact. Ok, let’s shift to the present, or at any rate the last week.
Three U Mass economists (Herndon, Ash and Pollin: the lead author a first-year grad student whose econometrics class project was to replicate some well-known result in order to learn econometric methods) showed that the 2010 paper was faulty in several important ways: (i) there was a spreadsheet error that left out some important data (this accounted for a small part of RR’s result), (ii) there was a trimming decision in which some data points that could be deemed relevant, as they trended against the RR conclusion, were left out (this accounted for a decent percentage of the RR effect), and (iii) there was a weighting decision in which one year’s results were weighted the same as 17 years’ worth of results (this accounted for a good chunk of RR’s results). All together, when these were factored in, RR’s empirical claim disappeared. Those who click on the Colbert link above will get to meet the young grad student who started all of this. If you are interested in the incident, just plug “Reinhart and Rogoff” into Google and start reading. To say that this is now all over the news is an understatement.
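Just to make that weighting point concrete, here is a minimal sketch (in Python, with invented numbers rather than RR’s actual data) of how averaging per-country means can let a single country-year count for as much as a long run of years from another country:

```python
# Hedged illustration with invented numbers -- NOT RR's actual dataset.
# Country A: one high-debt year with sharply negative growth.
# Country B: seventeen high-debt years with modest positive growth.
country_years = {
    "A": [-7.6],        # 1 observation
    "B": [2.4] * 17,    # 17 observations
}

# Weighting choice 1: average each country's mean, then average those means.
country_means = [sum(years) / len(years) for years in country_years.values()]
one_country_one_vote = sum(country_means) / len(country_means)

# Weighting choice 2: pool every country-year observation equally.
pooled = [g for years in country_years.values() for g in years]
one_year_one_vote = sum(pooled) / len(pooled)

print(f"country-weighted average growth: {one_country_one_vote:.2f}%")  # about -2.60%
print(f"year-weighted average growth:    {one_year_one_vote:.2f}%")     # about  1.84%
```

Same observations, very different headline numbers. Neither choice is automatically wrong, but a choice with that much leverage plainly needs to be argued for in the paper itself.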
Ok, why do I find this interesting for us? Several reasons.

First, though this is getting well discussed and amply criticized in the media, I have not read anywhere that Harvard is putting together a panel to investigate bad scientific practice. Spreadsheet errors are to be expected. But the other maneuvers look like pretty shoddy empirical practice, i.e. even if defensible, they should be front and center in any paper. They weren’t. But still, no investigation. Why not? It cannot be because this is “acceptable,” for once exposed it seems that everyone finds it odd. Moreover, RR’s findings have been politically very potent, i.e. consequential. So, the findings were important, false and shoddy. Why no investigation? Because this stuff, though really important, is hard to distinguish from what everyone does?
Second, why no exposé in the Chronicle, accompanied by a careful think piece about research ethics? One might think that this would be front-page academic news, and that venues that got all excited over fraud would find it right up their alley to discuss such an influential case.
It is worth comparing this institutional complacency to the reaction our own guardians of scientific virtue had wrt Hauser. They went ape (tamarin?) shit! Professors were empaneled to review his lab’s work, he was censured and effectively thrown out of the university, big-shot journal editors reviled him in the blogosphere, and he was held out as an object lesson in scientific vice. The Chronicle also jumped onto the bandwagon, tsk-tsking about dishonesty and how it derails serious science. Moreover, even after all the results in all the disputed papers were replicated, there were no second thoughts, no revisiting and re-evaluating of the issues, nothing. However, if one were asked to weigh the risks to scientific practice of RR’s behavior and Hauser’s alleged malpractice, it’s pretty clear that the former are far more serious than the latter. RR’s results did not replicate. And, I am willing to bet, their sins are far more common and so pollute the precious data stream much, much more. Indeed, there is a recent paper (here) that suggests that the bulk of research in neuroscience is not replicable, i.e. the data are simply not, in general, reliable. Do we know how generally replicable results in psycho are? Anyone want to lay a bet that the number is not as high as we like to think?
Is this surprising? Not really, I think. We don’t know that much about the brain or the mind. It strikes me that a lot of research consists of looking for interesting phenomena rather than testing coherent hypotheses. When you know nothing, it’s not clear what to count or how to count it. The problem is that the powerful methods of statistics encourage us to think that we know something when in fact we don’t. John Maynard Smith, I think, said that statistics is a tool that allows one to do 20 experiments and get one published in Nature (think p < .05; see the toy simulation below). Fraud is not the problem, and I suspect that it never has been. The problems lie in the accepted methods, which, unless used very carefully and intelligently, can muddy the empirical waters substantially. What recent events indicate (at least to me) is that if you are interested in good data, then it’s the accepted methods that need careful scrutiny. Indeed, if replicability is what we want (and isn’t that the gold standard for data?), maybe we should all imitate Hauser, for he seems to know how to get results that others can get as well.
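Here is the Maynard Smith quip as a toy simulation (a sketch only, assuming Python with numpy and scipy; the “experiments” are pure noise, not data from any real study): run twenty tests where the true effect is zero and see how many clear the conventional p < .05 bar.

```python
# Twenty null experiments, one significance threshold: how many "findings"?
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments, n_per_group = 20, 30

false_positives = 0
for _ in range(n_experiments):
    # Both groups are drawn from the SAME distribution: the true effect is zero.
    a = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
    b = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1

print(f"'significant' results out of {n_experiments}: {false_positives}")
# Expected count is n_experiments * 0.05 = 1, i.e. one publishable-looking
# result per twenty tries even though there is nothing there.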
I will end on a positive note: we linguists are pretty lucky. Our data is easily accessed and very reliable (as Sprouse and Almeida have made abundantly clear). We are also lucky in that we have managed to construct non-trivial theories with reasonable empirical reach. This acts to focus research and, just as importantly, makes it possible to identify “funny looking” data so that it can be subjected to careful testing. Theories guard against gullibility. So, though we don’t in general gather our data as “carefully” as neuroscientists, psychologists and economists gather theirs, we don’t need to. It’s harder to “cheat,” statistically or otherwise, because we have some decent theory and because the data is ubiquitous, easy to access and surprisingly robust. This need not always be so. In the future, we may need to devise fancy experiments to get data relevant to our theories. But to date, informal methods have proven sufficient. Strange that some see this as a problem, given the myriad ways there are to obscure the facts when one is being ultra careful.