Faculty of Language: Science "fraud"

Tuesday, June 16, 2015

Science "fraud"

Here's another piece on this topic in the venerable (i.e. almost always behind the curve) NYT. The point worth noting is the sources of the "crimes and misdemeanors," i.e. what drives retraction and lack of replicability in so many domains. Curiously, outright fraud is not the great plague. Rather (surprise surprise), the main problems come from data manipulation (i.e. abuse and misuse of stats), plagiarism and (I cannot fathom why this is a real problem) publishing the same results in more than one venue. Outright fraud comes in at number four, and the paper does not actually quantify how many such papers there are. So, if you want to make the data better, then beware of statistical methods! They are very open to abuse, either as data trimming or as fishing for results. This does not mean that stats are useless. Of course they aren't. But they are tools open to easy abuse and misunderstanding. This is worth keeping in mind when the stats-inclined rail against our informal methods. It's actually easy to control for bad data in linguistics. To repeat, I am all in favor of using stat tools if useful (e.g. see the last post), but, as is evident, data is not without its own problems just because it is "statistically" represented.

Last point: the NYT reports some dissenters' opinions regarding how serious this problem really is. People delight in being super concerned about these sorts of problems. As I have stated before, I am still not convinced that this is a real problem. The main problem is not bad data but very bad theories. When you know nothing, bad data matters. When you know something, it matters much, much less. Good theory (even at the MLG level) purifies data. The problem is less bad data than the idea that data is the ultimate standard. There is one version of this which is unarguable (that facts matter). But there is another, the strong data-first version, that is pernicious (every data point is sacrosanct). The idea behind the NYT article seems to be that if only we are careful and honest, all the data we produce will be good. This is bunk. There is no substitute for thinking, no matter how much data one gets. And stats hygiene will not make this fact go away.

5 comments:

Noah Motion, June 16, 2015 at 4:18 PM
What do you mean when you write that you cannot fathom why publishing the same results in more than one venue is a problem? Do you mean that you can't fathom how this could be one of the main problems in science? Or do you mean that you don't think it's a problem to publish the same thing in multiple venues?