Here's another piece on this topic in the venerable (i.e. almost always behind the curve) NYT. The point worth noting is the sources of the "crimes and misdemeanors," i.e. what drives retraction and lack of replicability in so many domains. Curiously, outright fraud is not the great plague. Rather (surprise, surprise), the main problems come from data manipulation (i.e. abuse and misuse of stats), plagiarism and (I cannot fathom why this is a real problem) publishing the same results in more than one venue. Outright fraud comes in at number four, and the paper does not actually quantify how many such papers there are. So, if you want to make the data better, then beware of statistical methods! They are very open to abuse, either as data trimming or as fishing for results. This does not mean that stats are useless. Of course they aren't. But they are tools open to easy abuse and misunderstanding. This is worth keeping in mind when the stats-inclined rail against our informal methods. It's actually easy to control for bad data in linguistics. To repeat, I am all in favor of using stat tools if useful (e.g. see the last post), but as is evident, data, just because it is "statistically" represented, is not without its own problems.
Last point: the NYT reports some dissenters' opinions regarding how serious this problem really is. People delight in being super concerned about these sorts of problems. As I have stated before, I am still not convinced that this is a real problem. The main problem is not bad data but very bad theories. When you know nothing, bad data matters. When you know something, it matters much, much less. Good theory (even at the MLG level) purifies data. The problem is less bad data than the idea that data is the ultimate standard. There is one version of this which is unarguable (that facts matter). But there is another, the strong data-first version, that is pernicious (every data point is sacrosanct). The idea behind the NYT article seems to be that if only we are careful and honest, all the data we produce will be good. This is bunk. There is no substitute for thinking, no matter how much data one gets. And stats hygiene will not make this fact go away.