Wednesday, January 22, 2014

Linguistics and the scientific method

The Scientific Method (SM) is coming in for some hard knocks lately. For example, the NYT has a recent piece rehearsing the replicability problems that “meta-scientists” like John Ioannidis have uncovered (see here). It seems that a lot (maybe most) of the published “findings” served up in our best journals reflect our own biases and ambitions more than they reveal nature’s secrets. Moreover, George Johnson, the NYT science writer (and he is a pretty good popularizer of some neat stuff (see here)), speculates that this may be unavoidable, at least given the costs of big science (hint: it’s really expensive) and the reward structure for adding a new brick to our growing scientific edifice (new surprising results are rewarded, disclosures that the emperor is naked are not). Not surprisingly, there are new calls to fix this in various ways: making your reported data available online when publishing (that this was not done as a rule surprised me, but it seems that in many areas (e.g. psych, econ) one doesn’t as a rule make one’s raw results, computational methods and procedures available. Why? Well, it’s labor intensive to develop these data sets, and why share them with just anyone given that you can use them again and again? It’s a kind of free rider problem), declaring the point of your experiment ahead of time so that you cannot statistically cherry pick the results while trolling for significance, getting more journals to publish failed/successful replications, getting the hiring/promotion/tenure process to value replication work, having labs replicate one another’s “results” as part of their normal operation, etc. In other words, all proposals to fix an otherwise well designed system. These reforms assume that the scientific method is both laudable and workable and what we need are ways to tweak the system so that it better reflects our collective ideals.

However, there are more subversive rumblings out there. First, there is the observation that maybe humans are unfit for SM.  It seems that scientists are human. They easily delude themselves into thinking that their work matters, or as Johnson puts it “the more passionate scientists are about their work, the more susceptible they are to bias.” 

Second, it is often really hard to apply SM’s tough love, for scientists often cannot make explicit what one needs to do to get the damn experiment to work. Tacit knowledge is rife in experimental work and is one of the main reasons one hires a post doc from a lab whose work you admire: so that s/he can teach you how to run the relevant experiments, how to get just the right “feel” for what needs doing.

Third, being a scientific spoiler, the one that rats out all that beautiful work out there, is not nearly as attractive a career prospect as being an original thinker developing new ideas. I doubt that there will ever be a Nobel Prize for non-replication. And, for just this reason, I suspect that money to support the vast amount of replication work that seems to be needed will not be forthcoming.

Fourth, a passion for replication may carry its own darker consequences. Johnson notes that there is “a fear that perfectly good results will be thrown out.” See here for some intelligent discussion of the issue.

None of these points, even if accepted as completely accurate, argue against trying to do something. They simply argue that SM may not be something that scientists can manage well and so the institutions that regulate science must do it for them. The problem, of course, is that these institutions are themselves run by scientists, generally the most successful (i.e. the ones that produce the new and interesting results) and so the problem one finds at the individual level may thereby percolate to the institutional one as well. Think of bankers regulating the SEC or the Fed. Not always a recipe for success.

There is a more radical critique of SM that is also finding a voice, one that questions the implicit conception of science that SM embodies.  Here’s a comment on the SciAm blog that I found particularly stimulating. I’m not sure if it is correct, but it is a useful antidote to what is fast becoming the conventional wisdom.  The point the author (Jared Horvath)  makes is that replication has never been the gold standard for, at least, our most important and influential research. Or as Horvath puts it: “In actuality, unreliable research and irreproducible data have been the status quo since the inception of modern science. Far from being ruinous, this unique feature of research is integral to the evolution of science.” Horvath illustrates this point by noting some of the more illustrious failures. They include Galileo, Millikan, and Dalton to name three. His conclusion? Maybe replicability is not that big a deal, and that our current worries stem from a misperception of what’s really important in the forward march of science. Here’s Horvath:

Many are taught that science moves forward in discreet, cumulative steps; that truth builds upon truth as the tapestry of the universe slowly unfolds.  Under this ideal, when scientific intentions (hypotheses) fail to manifest, scientists must tinker until their work is replicable everywhere at anytime.  In other words, results that aren’t valid are useless.
In reality, science progresses in subtle degrees, half-truths and chance.  An article that is 100 percent valid has never been published.  While direct replication may be a myth, there may be information or bits of data that are useful among the noise.  It is these bits of data that allow science to evolve.  In order for utility to emerge, we must be okay with publishing imperfect and potentially fruitless data.  If scientists were to maintain the ideal, the small percentage of useful data would never emerge; we’d all be waiting to achieve perfection before reporting our work.

I have quite a lot of sympathy for this view of the world, as many of you might have guessed. In fact, I suspect that what drives research forward is as much due to the emergence of good ideas as the emergence of solid data. The picture that SM provides doesn’t reflect the unbelievable messiness of real research, as Horvath notes. It also, I believe, misrepresents the role of data in scientific discourse. It is there to clean up the ideas.

The goal of scientific research is to uncover the underlying powers and mechanisms. We do this by proposing some of these “invisible” powers/mechanisms and asking what sorts of more easily accessible things would result were these indeed true.  In other words, we look for evidence and this comes in many forms: the mechanisms predict that the paper will turn blue if I put it in tea, the mechanisms predict that Sunday will be my birthday, etc. Some of the utility of these postulated mechanisms is that they “make sense” of things (e.g. Darwin’s theory of Natural Selection), or they resolve paradoxes (e.g. special relativity).  At any rate, these are as important (in my view maybe more important) than replicable results. Data matters, but ideas matter more. And (and this is the important part), the ideas that matter are not just summations of the data!

Let me put this another way: if you believe that theories are just compact ways of representing the data, i.e. that they are efficient ways of representing the “facts,” then bad facts must lead to bad theories, for these will inherit the blemishes of their inputs (viz. gigo: garbage in, garbage out). On this view, theories live and die by whether they accurately encode the facts in compact form, the latter being both epistemologically and ontologically prior to the former. If, however, you believe that theories are descriptions of mechanisms, then when there is a conflict between theory and facts, the source of the problem might just as well be with the facts as with the theories. Theoretical mechanisms, not being just summary restatements of the facts, have an integrity independent of the facts used to investigate them. Thus, when a conflict arises, one is faced with a serious problem: is it the data that is misleading or the theory? And, sadly, there is no recipe or algorithm or procedure for solving this conflict. Thought and judgment are required, not computation, not “method,” at least if interpreted as some low level mechanical procedure.

As you might have guessed, IMO the problem with the standard interpretation of SM is that it endorses a pretty unsophisticated empiricist conception of science.  It takes the central scientific problem to be the careful accumulation of accurate data points. Absent this, science grinds to a halt. I doubt it. Science can falter for many reasons, but the principal one is the absence of insight. Absent reasonable theory, data is of dubious value. Given reasonable theory, even “bad” data can be and has been useful.

An aside: It strikes me as interesting that a lot of the discussion over replication has taken medicine, psychology and neuroscience as the problematic subject areas.  In these areas good understanding of the underlying mechanisms is pretty thin. If this is correct, the main problem here is not that the data is “dirty” but that the theory is negligible. In other words, we are largely ignorant of what’s going on. It’s useful to contrast this with what happens in physics when a purportedly recalcitrant data point emerges: it is vetted against serious theory and made to justify itself (see here). Dirty data is a serious problem when we know nothing. It is far less a problem once theory begins to reveal the underlying mechanisms.

What’s the upshot of all of this for linguistics?  To a certain degree, we have been able to finesse the replicability worries, for the cost of experimentation is so low: just ask a native speaker or two. Moreover, as lots of linguistics is based on data from languages that practitioners are native speakers of, checking the “facts” can be done quickly, easily and reliably (as Jon Sprouse and Diogo Almeida have demonstrated in various papers). Of course, this safeguard does not extend as easily to field work on less common languages.  Here we often cannot check the facts by consulting ourselves, and this makes this sort of data less secure. Indeed, this is one of the reasons that the Pirahã “debate” has been so fractious. Putting aside its putative relevance for UG (none!), the data itself has been very contentious, and rightly so (see here).

So where are we linguists? I think pretty well off. We have a decent-ish description of underlying generative mechanisms, i.e. the structure of FL/UG (yes there are gaps and problems, but IMO it’s pretty damn good!) and we have ready access to lots of useful, reliable and usable data. As current work in the sciences goes, we are pretty lucky.


  1. Perhaps there's also a notion of 'indirect replication' whereby, if X proposes an idea with somewhat dodgy replication, and Y, W and Z do further work with somewhat but not completely dodgy replication, the whole thing winds up well corroborated. The take-home message for linguistics might be that people who have problems with Chomsky should think less about him and more about Rizzi and other people at a similar level (conjecturing that if they hadn't decided to accept Minimalism, it might have tanked).

  2. Great write-up. Definitely enlightening.

    I'm curious, as opposed to the common Popperian conception of SM that lauds falsifiability (and therefore replicability, as you've discussed), what do you think of Lakatos' research programmes model of scientific methodology for linguistics? My main concern with Popperian SM is that it is far too strong—if a single bad data point spoils the theory, it's throwing the baby out with the bathwater, so to speak. Do you think that a more nuanced approach per research programmes is more applicable to linguistics and science generally?

    1. Sorry for the delayed reply. Somehow, things got lost etc. So Lakatos? Well as a description of what goes on it is better than Popper. Sadly, however, as Feyerabend pointed out correctly, there is precious little "method," beyond injunctions like "do your best." I think that what makes science interesting is that there are no rules, just some rules of thumb. Sure, look at the data, try to find non-trivial explanations, be responsive to the criticism of your peers, consider what might falsify your claims, consider what might support your claims, etc. But these are anodyne. They boil down to "use your noodle!" With this I can agree.

      However, in more restricted circumstances, local rules of thumb can become more stringent. So, within linguistics, I think it pays to keep your eyes on the big prize: what's the structure of FL/UG and why. One wants answers to these questions and should prize research that seems to address them. This is not a methodological dictum, but it can have regulative force in the right contexts.

      So, general principles/dicta/methods? No, not so much. Rules of thumb: sure, but very weak.

  3. Replicability is a necessary, but not sufficient, component of good science (granting that there is a long-standing debate about what constitutes 'good' science). If you have a theory that posits underlying mechanisms that make concrete predictions, and you then find those predictions borne out, replication will help confirm that this isn't a fluke. Which is to say that, if you've posited good (see above) mechanisms, replicability of some sort is implied.

    On the other hand, the benefits of replication (and replicability in general) without any kind of theoretical understanding are questionable.

  4. Below is a link to a new paper that should be of interest to anyone who is concerned about how the scientific method applies [or should apply] to linguistics, especially the branch closest to Norbert's heart; enjoy.