Faculty of Language: Methodological Hygiene

Thursday, April 25, 2013

Methodological Hygiene

I was planning to write something thoughtful today on formal versus substantive universals and how we have made lots of progress wrt the former but quite a bit less wrt the latter. This post, however, will have to wait. Why? Lunch! Over lunch I had an intriguing discussion of the latest academic brouhaha, and how it compares with one that linguists know about “L’affaire Hauser.” For those of you that don’t read the financial pages, or don’t religiously follow Krugman (here, first of many), or don’t watch Colbert (here) let me set the stage.

In 2010, two big name economists, Carmen Reinhart and Kenneth Rogoff (RR), wrote a paper building on their very important work (This Time is Different) chronicling the aftermath of financial crises through the ages. The book garnered unbelievable reviews and made the two rock stars of the Great Recession. The paper that followed was equally provocative, though not nearly as well received. The paper claimed to find an important kind of debt threshold which when crossed caused economic growth to tank. Actually, this is a bit tendentious. What the paper claimed was that there was a correlation between debt to GDP ratio of 90% and higher and the collapse of growth. Note: correlation, not causation. However, what made the paper hugely influential was the oft-suggested hint that the causality was from high debt to slow growth rather than the other way around or some combination of the two. The first interpretation was quickly seized upon by the “Very ~~Special~~ Serious People” (VSP), aka “austerians,” to justify policies of aggressively cutting budget deficits rather than fiscally priming the economic pump to combat high unemployment.[1] Keynesians like Krugman (and many others, including Larry Summers, another famous Harvardian) argued that the causality was from slow growth to large deficits and so the right policy was to boost government spending to fight unemployment as doing this would also alleviate the debt “problem.”[2] At any rate, it is safe to say that RR’s 2010 paper had considerable political and economic impact. Ok, let’s shift to the present, or at any rate the last week.

Three U Mass economists (Herndon, Ash and Pollin:the lead being a first year grad student whose econometrics class project was to replicate some well known result in order to learn econometric methods) showed that the 2010 paper was faulty in several important ways: (i) there was a spread sheet error with some important info left out (this accounted for a small part of RR’s result), (ii) there was a trimming decision where some data points that could be deemed relevant as they trended against the RR conclusion were left out (this accounted for a decent percentage of the RR effect), and (iii) there was a weighting decision in which one year’s results were weighted the same as 17 year’s worth of results (this accounted for a good chunk of RR’s results). All together, when these were factored in, RR’s empirical claim disappeared. Those who click onto the Colbert link above will get to meet the young grad student that started all of this. If you are interested in the incident, just plug “Reinhart and Rogoff” into Google and start reading. To say that this is now all over the news is an understatement. Ok, why do I find this interesting for us. Several reasons.

First, though this is getting well discussed and amply criticized in the media, I did not read anywhere that Harvard was putting together a panel to investigate bad scientific practice. Spreadsheet errors are to be expected. But the other maneuvers look like pretty shoddy empirical practice, i.e. even if defensible, they should be up front and center in any paper. They weren’t. But, still no investigation. Why not? It cannot be because this is “acceptable” for once exposed it seems that everyone finds it odd. Moreover, RR’s findings have been politically very potent, i.e. consequential. So, the findings were important, false and shoddy. Why no investigation? Because this stuff though really important is hard to distinguish from what everyone does?

Second, why no expose in the Chronicle accompanied by a careful think piece about research ethics? One might think that this would be front page academic news and that venues that got all excited over fraud might find it right up their alley to discuss such an influential case.

It is worth comparing this institutional complacency to the reaction our own guardians of scientific virtue had wrt Hauser. They went ape (tamarin?) shit! Professors were impanelled to review his lab’s work, he was censured and effectively thrown out of the university, big shot journal editors reviled him in the blogosphere, and he was held out as an object lesson in scientific vice. The Chronicle also jumped onto the band wagon tsk-tsking about dishonesty and how it derails serious science. Moreover, even after all the results in all the disputed papers were replicated no second thoughts, no revisiting and re-evaluating the issues, nothing. However, if one were asked to weigh the risks to scientific practice of RR’s behavior and Hauser’s alleged malpractice it’s pretty clear that the former are far more serious than the latter. Their results did not replicate. And, I am willing to bet, their sins are far more common and so pollute the precious data stream much much more. Indeed, there is a recent paper (here) that suggests that the bulk of research in neuroscience is not replicable, i.e. the data are simply not, in general, reliable. Do we know how generally replicable results in psycho are? Anyone want to lay a bet that the number is not as high as we like to think?

Is this surprising? Not really, I think. We don’t know that much about the brain or the mind. It strikes me that a lot of research consists of looking for interesting phenomena rather than testing coherent hypotheses. When you know nothing, it’s not clear what to count or how to count it. The problem is that the powerful methods of statistics encourages us to think that we know something when in fact we don’t. John Maynard Smith, I think, said that statistics is a tool that allows one to do 20 experiments and get one published in Nature (think p>.05). Fraud is not the problem, and I suspect that it never has been. The problems lie in the accepted methods, which, unless used very carefully and intelligently, can muddy the empirical waters substantially. What recent events indicate (at least to me) is that if you are interested in good data, then it’s the accepted methods that need careful scrutiny. Indeed, if replicability is what we want (and isn’t that the gold standard for data?), maybe we should all imitate Hauser for he seems to know how to get results that others can get as well.

I will end on a positive note: we linguists are pretty lucky. Our data is easily accessed and very reliable (as Sprouse and Almeida have made abundantly clear). We are also lucky in that we have managed to construct non-trivial theories with reasonable empirical reach. This acts to focus research and, just as importantly, makes it possible to identify “funny looking” data so that it can be subjected to careful test. Theories guard against gullibility. So, despite the fact that we don’t in general gather our data as “carefully” as neuroscientists and psychologists and economists gather theirs, we don’t need to. It’s harder to “cheat,” statistically or otherwise, because we have some decent theory and because the data is ubiquitous, easy to access and surprisingly robust. This need not always be so. In the future, we may need to devise fancy experiments to get data relevant to our theories. But to date, informal methods have proven sufficient. Strange that some see this as a problem, given the myriad ways there are to obscure the facts when one is being ultra careful.

[1] VSP is Krugman’s coinage. I am not sure who first coined the second.

[2] The scare quotes are to indicate that there is some debate about whether there actually is a problem, at least in the medium term.

20 comments:

AnonymousApril 27, 2013 at 12:25 PM
I don't really see why there would be an investigation. Some people get the gold and some people get the shaft. Linguists should know this better than any school. For example, what's UG based on? It's not empirical and it's not a spreadsheet error so...

And yet Chomsky has a position at MIT and Pinker has a position at Harvard, even though their arguments scurry faster than a cockroach when held up to the light (for one example, see: http://languagelog.ldc.upenn.edu/nll/?p=4211).

I can't speak too much for the Hauser case, but I think it boils down to money (as all things possibly do). Harvard can shaft Hauser because they need to save face. RR got picked up by some high ranking politicians who smell like endowments. Now if you'll excuse me, I'm off to go hack my version of Excel...
ReplyDelete
Replies
UnknownApril 27, 2013 at 5:12 PM
I admit I also struggled with the comparison between RR and Hauser but think the point Norbert was trying to make in the last paragraph is actually quite interesting. He says linguists are fortunate because:

"Our data is easily accessed and very reliable (as Sprouse and Almeida have made abundantly clear). We are also lucky in that we have managed to construct non-trivial theories with reasonable empirical reach. This acts to focus research and, just as importantly, makes it possible to identify “funny looking” data so that it can be subjected to careful test. Theories guard against gullibility."

So while possibly some linguistic "arguments scurry faster than a cockroach when held up to the light", theories are stable and here to stay; especially the non-trivial ones with reasonable reach. And the easy access to data allows for the public to evaluate theories as mentioned by joemcveigh. Of course the foundation of biolinguistics, MP, is not a theory but a research program. And SMT is not a theory either but a thesis. So I am curious: what are the current theories?
ReplyDelete
Replies
Marc van OostendorpApril 28, 2013 at 9:03 AM
As it happens, the New York Times had an article about another case of fraud, in my home country: http://www.nytimes.com/2013/04/28/magazine/diederik-stapels-audacious-academic-fraud.html

I believe this case can serve to illustrate some of Norbert's points (even though in this case, the professor in question WAS fired). My impression is that social psychology is even worse than economics in its equating science with finding correlations in large data sets: especially if you read Stapel's book it becomes clear that social psychology is all about doing experiments and interpreting them in the 'right' way statistically, and hardly ever about trying to construct a theory with some explanatory depth.

If Stapel's research would not have been fraudulent, not many things would have changed. He found correlations between eating meat and being aggressive, or between seeing the word 'capitalism' and eating M&M's. In this way, he became an academic superstar, at least at the Dutch scale: he published in Science, was a Dean at Tilburg University (where as you may know, a thriving department of linguistics has been closed a few years ago because of its unfeasibility) and appeared on tv a lot with the outcomes of his 'research'.

People are now discussing what should be the consequences of this. The die-hard empiricists say that experiments should be more standardly replicated, we should do statistics on data sets just to see how likely it is that they have been made up, etc. But it seems to me that having a good and solid theory, or a number of competing theories, also helps here.

The point is, once you have a theory, some data can be PROBLEMATIC (or 'funny-looking', as Norbert says) for somebody believing in that theory, so that person will become suspicious and therefore motivated to replicate the experiments, or at least check all the relevant data. This apparently is hardly ever the case in social psychology: the (fabricated) observation that people who see the word 'capitalism' eat more M&Ms was just not problematic for anybody, since nobody had any deep expectations about the relation between seeing that word and consuming chocolate sweets to begin with.

But to be fair, it has to be noted that in this case after a number of years a few junior researchers were brave enough to discover the fraud and talk to the rector about it, and the guy was fired. (A detail which might interest linguists, and which is not mentioned in the NYT article, is that the committee which examined the fraud was led by the well-known psycholinguist Willem Levelt.) And that might shed some light on the asymmetry between the Hauser case and the RR case. The differences might have less to do with issues of methodology than with prestige and political power.
(I have to admit that I know much more about the Stapel case than about Hauser or RR.)
ReplyDelete
Replies
UnknownApril 30, 2014 at 1:23 PM
This comment has been removed by a blog administrator.
ReplyDelete
Replies

Add comment