Sunday, April 28, 2013

More on Science Hygiene

I was going to say a word or two about today's article in the NYT on scientific fraud (here). But Marc van Oostendorp beat me to it. I liked his discussion and have hoisted it from the comments for greater visibility. Here's his comment:

As it happens, the New York Times had an article about another case of fraud, in my home country:

I believe this case can serve to illustrate some of Norbert's points (even though in this case, the professor in question WAS fired). My impression is that social psychology is even worse than economics in its equating science with finding correlations in large data sets: especially if you read Stapel's book it becomes clear that social psychology is all about doing experiments and interpreting them in the 'right' way statistically, and hardly ever about trying to construct a theory with some explanatory depth.

If Stapel's research had not been fraudulent, not much would have changed. He found correlations between eating meat and being aggressive, or between seeing the word 'capitalism' and eating M&M's. In this way, he became an academic superstar, at least on the Dutch scale: he published in Science, was a Dean at Tilburg University (where, as you may know, a thriving department of linguistics was closed a few years ago because of its unfeasibility) and appeared on TV a lot with the outcomes of his 'research'.

People are now discussing what the consequences of this should be. The die-hard empiricists say that experiments should be replicated more routinely, that we should do statistics on data sets just to see how likely it is that they have been made up, etc. But it seems to me that having a good and solid theory, or a number of competing theories, also helps here.

The point is, once you have a theory, some data can be PROBLEMATIC (or 'funny-looking', as Norbert says) for somebody believing in that theory, so that person will become suspicious and therefore motivated to replicate the experiments, or at least check all the relevant data. This apparently is hardly ever the case in social psychology: the (fabricated) observation that people who see the word 'capitalism' eat more M&Ms was just not problematic for anybody, since nobody had any deep expectations about the relation between seeing that word and consuming chocolate sweets to begin with.

But to be fair, it has to be noted that in this case after a number of years a few junior researchers were brave enough to discover the fraud and talk to the rector about it, and the guy was fired. (A detail which might interest linguists, and which is not mentioned in the NYT article, is that the committee which examined the fraud was led by the well-known psycholinguist Willem Levelt.) And that might shed some light on the asymmetry between the Hauser case and the RR case. The differences might have less to do with issues of methodology than with prestige and political power.
(I have to admit that I know much more about the Stapel case than about Hauser or RR.) 

Let me add a word or two.

First, one very significant difference (for me) between the Hauser case and the other two that Bhattacharjee (the author) mentions is that all of Hauser's disputed work was REPLICATED. This is really a very big deal. It means that whatever shortcuts there may have been, they were inconsequential given that the results stand. In fact, I would go further: if the stuff replicates, then this constitutes prima facie evidence that the investigation was done right to begin with. Replicability is the gold standard. The problem with Suk's results and Stapel's is that the stuff was not only dishonest, but totally unstable, or so I gather from the article. Only they could get their results. In this regard Hauser's results are very different.

Second, as Marc points out, it looks like Stapel's work really didn't matter. There was nothing deep there. Nor, a priori, could you expect there to be. His experiments were unlikely to touch the underlying psychological mechanisms of the behavior of interest because the kinds of behaviors Stapel is interested in are just too damn complicated. Great experiments isolate single causal factors. The kinds of powers implicated in this experiment are many and interact, no doubt, in many complex ways. Thus, it is no surprise that the effect sizes were expected to be small. Indeed, Stapel had to cook the numbers so that the effect sizes did not appear to be large: "He knew that the effect he was looking for had to be small in order to be believable..." He was trolling for statistical, not scientific, significance. His ambitions were political rather than scientific. At any rate, he believed that the appearance of credibility was tied to having small but statistically significant effect sizes. Maybe the right question is why anyone should care about small effects in this sort of domain to begin with.

Third, Bhattacharjee ends by noting that fraud is not likely to be the most serious polluter of the data stream. Let me quote here:

   "Fraud like Stapel's - brazen and careless in hindsight - might represent the lesser threat to the integrity of science than the massaging of data and selective reporting of experiments...tweaking results [NH] - like stopping data collection once the results confirm the hypothesis - is a common practice. "I would certainly see that if you do it in more subtle ways, it's more difficult to detect," Ap Dijksterhuis, one of the Netherlands' best-known psychologists, told me..."

So, is fraud bad? Sure. Nobody endorses it in science any more than anywhere else.  But, this article, as Marc so eloquently notes, shows that it is easier to do where one is trolling for correlations rather than exploring underlying causal powers (i.e. in the absence of real theory) and where effect sizes are likely to be small because of the complex interaction of multiple causal factors.  Last, let's never forget about replicability. It matters, and where it exists, fraud may not be easily distinguishable from correct design.
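
For readers who want to see why the "stopping data collection once the results confirm the hypothesis" practice mentioned in the quote above pollutes the data stream, here is a minimal simulation sketch. It is my own illustration, not anything from the article: the function names, the sample sizes, and the simplifying assumption of a z-test on normally distributed data with known variance are all hypothetical choices. There is no real effect in the simulated data, so every "significant" result is a false positive. An honest experimenter who tests once at the planned sample size should be fooled about 5% of the time; one who peeks every ten observations and stops at the first p < 0.05 should be fooled noticeably more often.

```python
import math
import random


def p_value(sample_sum, n):
    """Two-sided p-value for H0: mean = 0, assuming unit variance (z-test)."""
    z = sample_sum / math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))


def run_experiment(rng, max_n, peek_every=None, alpha=0.05):
    """Return True if the experiment ends with a 'significant' result.

    The data are pure noise (true mean 0). If peek_every is set, the
    experimenter tests after every peek_every observations and stops as
    soon as p < alpha; otherwise only one test is run, at max_n.
    """
    total = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(0.0, 1.0)
        is_peek = peek_every is not None and n % peek_every == 0
        if (is_peek or n == max_n) and p_value(total, n) < alpha:
            return True
    return False


def false_positive_rate(trials, max_n, peek_every, seed):
    """Fraction of null experiments that come out 'significant'."""
    rng = random.Random(seed)
    hits = sum(run_experiment(rng, max_n, peek_every) for _ in range(trials))
    return hits / trials


# Hypothetical settings: 2000 simulated studies of up to 100 observations.
fixed_rate = false_positive_rate(2000, 100, peek_every=None, seed=1)
peeking_rate = false_positive_rate(2000, 100, peek_every=10, seed=1)

print(f"test once at n=100:       {fixed_rate:.3f}")
print(f"peek every 10, stop early: {peeking_rate:.3f}")
```

Note that nothing here is fabricated in Stapel's sense: every number is "really" collected. That is exactly why, as Dijksterhuis says, the subtle version is harder to detect.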