Andrew Gelman, a statistician at Columbia (one whose opinions I generally respect, whose blog I read regularly, and whose work, to the degree that I understand it, I really like), has a thing about Hauser (here). What offends him is Hauser’s (alleged) data faking (yes, I use ‘alleged’ because I have personally not seen the evidence, only heard allegations, and given how easy allegations are to make, let's try not to jump to comfortable conclusions). Here he explains why the “faking” is so bad: not rape, murder, or torture bad, but science-wise bad. Why? Because what fake data does is “waste people’s time (and some lab animals’ lives) and slow down the progress of science.” Color me skeptical.
Here’s what I mean: is this a generic claim or one specific to Hauser’s work? If the latter, then I would like to see the evidence that his alleged improprieties had any such effect. Let me remind you again (see here, here) that the results of all of Hauser’s papers that were questioned have since been replicated. Thus, the conclusions of these papers stand. Anyone who relied on them to do their research did just fine. Was there a huge amount of time and effort wasted? Did lab animals get used in vain? Maybe. What’s the evidence? And maybe not: they all replicated. Moreover, if the measure of his crime is wasted time and effort, did Hauser’s papers really lead down more blind alleys and wild goose chases than your average unreplicable psych or neuro paper (here)?
As for the generic claim, I would like to see more evidence for this as well. Among the “time wasters” out there, is faked data really the biggest problem, or even a very big problem? Or is this sort of like Republican "worries" about fake voters inundating the polls and voting for Democrats? My impression is that the misapplication of standard statistical techniques to get BS results that fail to replicate is far more problematic (see here and here). If this is so, then Gelman’s fake data worries may, by misdirection, lead us away from the real time sinks, viz. the production of non-replicable “results,” which, so far as I can tell, is closely tied to the use of BS statistical techniques to coax significance out of one in every 20 or so experiments. We should be so lucky that the main problem is fakery!
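The “one in every 20” arithmetic can be made concrete with a quick simulation (my own back-of-the-envelope sketch, not anything from Gelman or the post): run a pile of “experiments” in which the true effect is exactly zero, test each at the conventional p < 0.05 threshold, and watch roughly 5% of them come up “significant” by chance alone.

```python
# Back-of-the-envelope sketch (illustrative, not from the post): simulate
# experiments where the true effect is zero and count how often a
# two-sample test declares "significance" at p < 0.05.
import math
import random

random.seed(1)

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def null_experiment_is_significant(n=30):
    """Two groups of pure noise; large-sample z approximation to the t-test."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / (n - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (n - 1)
    z = (mean_a - mean_b) / math.sqrt(var_a / n + var_b / n)
    p = 2.0 * (1.0 - normal_cdf(abs(z)))
    return p < 0.05

trials = 2000
false_positives = sum(null_experiment_is_significant() for _ in range(trials))
print(f"'significant' null results: {false_positives}/{trials} "
      f"= {false_positives / trials:.1%}")  # roughly 1 in 20
```

The point of the sketch: a lab that runs 20 such null experiments can expect about one publishable “result” with no fakery whatsoever, which is exactly why non-replication, not fabrication, looks like the bigger time sink.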
So that I am not misunderstood, let me add that nobody I know condones faking data. But this is not because it in some large measure retards the forward march of science (this claim may be true, but it is not a truism); it's because faking is quite generally bad. Period (again, not unlike voter fraud). It should not be practiced or condoned for the same reason that lying, bullying, and plagiarism should not be practiced or condoned: these are all lousy ways to behave. That said, I have real doubts that fake data is the main problem holding back so many of the “sciences,” and claiming otherwise without evidence can misdirect attention from where it belongs. The main problem with many of the “sciences” is the absence of even a modicum of theory, i.e. a lack of insight, an absence of any idea about what’s going on, and all the data mining in the world cannot substitute for one or two really good ideas. The problem I have with Gelman’s obsession is that in the end it suggests a view of science that I find wrongheaded: that data mining is what science is all about. As the posts noted above indicate, I could not disagree more.