It seems that everywhere you look science is collapsing. I hope my tongue was visibly in cheek as you read that first sentence. There is currently a fad for deploring the irreplicability of experiments in various fields. Here’s a lamentation I read lately, the following being the money line:
The deeper problem is that much of cancer research in the lab—maybe even most of it—simply can’t be trusted. The data are corrupt. The findings are unstable. The science doesn’t work.
Why? Because there is “a replication crisis in biomedicine.”
I am actually skeptical about the claims that the scientific sky is falling. But before I get to that, I have to admit to a bit of schadenfreude. Compared to what we see in large parts of psychology and neuroscience and, if the above is correct, biomedicine, linguistic “data” is amazingly robust and stable. It is easy to get, easy to vet, and easy to replicate. There is no “data” problem in linguistics analogous to what we are hearing exists in the other domains of inquiry. And it is worth thinking about why this is. Here’s my view.
First, FL is a robust mental organ. What I mean by this is that Gs tend to have a large effect on acceptability, and acceptability is something that native speakers can judge reliably (or can be trained to judge reliably). This is a big deal. Linguists are lucky in this way. There are occasional problems inferring Gish properties from acceptability judgments, and we ought not to confuse grammaticality with acceptability. However, as a matter of fact, the two often swing in tandem, and the contribution of grammaticality to acceptability is very often quite large. This need not have been true, but it appears that it is.
We should be appropriately amazed by this. Many things go into an acceptability judgment. However, it is hard to swamp the G factor. This is almost certainly a reflection of the modular nature of FL and Gish knowledge. Gishness doesn’t give a hoot for the many contextual factors involved in language use. Context matters little, coherence matters little, ease of processing matters little. What really matters is formal kashrut. So when contextual/performance factors do affect acceptability, as they do, they don’t wipe out the effects of G.
Some advice: when you are inclined to think otherwise, repeat to yourself colorless green ideas sleep furiously, or recall that instinctively eagles that fly swim puts the instinct with the obviously wrong eagle trait. Gs aren’t trumped by sense or pragmatic naturalness, and because of this we linguists can use very cheap and dirty methods to get reliable data in many domains of interest.
So, we are lucky and we do not have a data problem. However, putting my gloating aside, let’s return to the data crises in science. Let me make three points.
First, experiments are always hard. They involve lots of tacit knowledge on the part of the experimenters. Much of this knowledge cannot be written down in notebooks and is part of what it is to get an experiment to run right (see here). It is not surprising that this knowledge gets easily lost and that redoing experiments from long ago becomes challenging (as the Slate piece makes clear). This need not imply sloppiness or methodological sloth or corruption. Lab notes do not (and likely cannot) record important intangibles, or, if they do, they don’t do so well. Experiments are performances and, as we all know, a score does not record every detail of how to perform a piece. So, even in the best case, experiments, at least complex ones, will be hard to replicate, especially after some time has passed.
Second, IMO, much of the brouhaha occurs in areas where we have confused experiments relevant to science with those relevant to engineering. Science experiments are aimed at isolating basic underlying causal factors. They are not designed to produce a useful product. In fact, they are not engineering at all, for they generally abstract from precisely those problems that are the most interesting engineering-wise. Here's a nice quote from Thornton Fry, once head of Bell Labs' math unit:
The mathematician tends to idealize any situation with which he is confronted. His gases are “ideal,” his conductors “perfect,” his surfaces “smooth.” He calls this “getting down to the essentials.” The engineer is likely to dub it “ignoring the facts.”
Science experiments generally investigate the properties of these ideal objects, and such experiments are not worried about the fine details that the engineer would rightly worry about. This becomes a problem when the findings acquire interest from an engineering point of view. Here’s the Slate piece:
When cancer research does get tested, it’s almost always by a private research lab. Pharmaceutical and biotech businesses have the money and incentive to proceed—but these companies mostly keep their findings to themselves. (That’s another break in the feedback loop of self-correction.) In 2012, the former head of cancer research at Amgen, Glenn Begley, brought wide attention to this issue when he decided to go public with his findings in a piece for Nature. Over a 10-year stretch, he said, Amgen’s scientists had tried to replicate the findings of 53 “landmark” studies in cancer biology. Just six of them came up with positive results.
I am not trying to suggest that being replicable is a bad idea, but I am suggesting that what counts as a good experiment for scientific purposes might not suffice for engineering purposes. Thus, I would not be at all surprised if there were a much smaller replication crisis in molecular or cell biology than there is in biomedicine, the former being further removed from the engineering “promise” of bioscience than the latter. If this is correct, then part of the problem we see might be attributed to the NIH’s and NSF’s insistence that science pay off (“bench to bedside” requirements). Here, IMO, is one of the less positive consequences of the “wider impact” sections of contemporary grants.
Third, at least in some areas, the problem of replication really is a problem of ignorance. When you know very little, an experiment can be very fragile. We try to mitigate the fragility by statistical massaging, but ignorance makes it hard to know what to control for. IMO, domains where we find replicability problems look like domains where our knowledge of the true causal structures is very spotty. This is certainly true of large parts of psychology. It strikes me that the same might hold in biomedicine (medicine being as much an art as a science, as anyone who has visited a doctor likely knows). To repeat Eddington’s dictum: never trust an experiment until it’s been verified by theory! Theory-poor domains will also be experimentally fragile ones. This does not mean that science is in trouble. It means that not everything we call a science really is.
Let me repeat this point more vigorously: there is a tendency to identify science with certain techniques of investigation: experiments, stats, controls, design, etc. But this does not science make. The real sciences are not distinguished by their techniques but are domains where, for some happy reason, we have identified the right idealizations to investigate. Real science arises when our idealizations gain empirical purchase, when they fit. Thinking these up, moreover, is very hard for any number of reasons. Here is one: idealizations rely on abstractions, and some domains lend themselves to abstraction more easily than others. Thus some domains will be more scientifically successful than others. Experiments work and are useful when we have some idea of where the causal joints are, and this comes from correctly conceiving of the problems our experiments are constructed to address. Sadly, in most domains of interest, we know little, and it should be no surprise that when you don’t know much you can be easily misled even if you are careful.
Let me put this another way: there is a cargo cult conception of science that the end-of-science lamentations seem to presuppose. Do experiments, stats, controls, be careful, etc., and knowledge will come. Science on this view is the careful accumulation and vetting of data. Get the data right and the science will take care of itself. It lives very comfortably with an Empiricist conception of knowledge. IMO, it is wrong. Science arises when we manage to get the problem right. Then these techniques (and they are important) gain traction. We then understand what experiments are telling us. The lamentations we are seeing routinely now about the collapse of science have less to do with the real thing than with our misguided conception of what the enterprise consists in. They are a reflection of the overwhelming dominance of Empiricist ideology, which, at bottom, comes down to the belief that insight is just a matter of more and more factual detail. The modern twist on this is that though one fact might not speak for itself, lots and lots of them do (hence the appeal of big data). What we are finding is that there is no real substitute for insight and thought. This might be unwelcome news to many, but that’s the way it is and always will be. The “crisis” is largely a product of the fact that for most domains of interest we have very little idea about what’s going on, and urging more careful attention to experimental detail will not be able to finesse this.
Again, see the work by Jon Sprouse, Diogo Almeida, and colleagues on this. The take-home message from their work is that what we always thought to be reliable data is in fact reliable data and that our methods of collecting it are largely fine.
This is why stats for data collection are not generally required (or useful). I read a nice quote from Rutherford: “If your experiment needs statistics, you ought to have done a better experiment.” You draw the inference.