The Scientific Method (SM) is coming in for some hard knocks
lately. For example, the NYT has a recent piece rehearsing the replicability
problems that “meta-scientists” like John Ioannidis have uncovered (see here).
It seems that a lot (maybe most) of
the published “findings” served up in our best journals reflect our own biases and
ambitions more than they reveal nature’s secrets. Moreover, George Johnson, the
NYT science writer (and he is a pretty good popularizer of some neat stuff (see
here)),
speculates that this may be unavoidable, at least given the costs of big
science (hint: it’s really expensive) and the reward structure for adding a new
brick to our growing scientific edifice (new surprising results are rewarded,
disclosures that the emperor is naked are not). Not surprisingly, there are new
calls to fix this in various ways: making your reported data available online
when publishing (that this was not already the rule surprised me, but it seems
that in many areas (e.g. psych, econ) one does not routinely make one’s raw
results, computational methods, and procedures available. Why? Well, it’s labor-intensive
to develop these data sets, and why share them with just anyone when
you can use them again and again yourself? It’s a kind of free-rider problem),
declaring the point of your experiment ahead of time so that you cannot
statistically cherry-pick the results while trolling for significance (the small
simulation after this paragraph illustrates the worry), getting
more journals to publish failed/successful replications, getting the hiring/promotion/tenure
process to value replication work, having labs replicate one another’s “results”
as part of their normal operation, etc. In other words, these are all proposals to fix an
otherwise well-designed system. These reforms assume that the scientific method
is both laudable and workable, and that what we need are ways to tweak the system so
that it better reflects our collective ideals.
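To make the cherry-picking point concrete, here is a minimal sketch in Python. It is my own illustration, not anything from the NYT piece or the reform proposals, and the numbers (20 outcome measures, 30 subjects per group) are arbitrary assumptions: test enough measures on pure noise and some will come out “significant” at p < 0.05 by chance alone.

```python
# A minimal sketch (illustrative assumptions only) of the "trolling for
# significance" worry: run enough tests on pure noise and some will cross the
# conventional p < 0.05 threshold by chance alone.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_outcomes = 20      # hypothetical number of outcome measures examined post hoc
n_per_group = 30     # hypothetical sample size per group

false_positives = 0
for _ in range(n_outcomes):
    # Both groups are drawn from the same distribution, so every true effect is zero.
    group_a = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
    group_b = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
    result = ttest_ind(group_a, group_b)
    if result.pvalue < 0.05:
        false_positives += 1

print(f"{false_positives} of {n_outcomes} null comparisons came out 'significant'")
```

With twenty tests at the 0.05 level you expect roughly one false positive on average even though nothing real is there, which is why declaring the hypothesis of interest in advance (rather than reporting whichever comparison happened to clear the bar) is on the reform list.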
However, there are more subversive rumblings out there.
First, there is the observation that maybe humans are unfit for SM. It seems that scientists are human. They
easily delude themselves into thinking that their work matters, or, as Johnson
puts it, “the more passionate scientists are about their work, the more
susceptible they are to bias.”
Second, it is often really hard to apply SM’s tough love, for scientists
often cannot make explicit what needs to be done to get the damn
experiment to work. Tacit knowledge is rife in experimental work and is one of
the main reasons why you hire a postdoc from a lab whose work you admire:
so that s/he can teach you how to run the relevant experiments, how to get just
the right “feel” for what needs doing.
Third, being a scientific spoiler, the one who rats out all
that beautiful work out there, is not nearly as attractive a career prospect as
being an original thinker developing new ideas. I doubt that there will ever be
a Nobel Prize for non-replication. And, for just this reason, I suspect that
money to support the vast amount of replication work that seems to be needed
will not be forthcoming.
Fourth, a passion for replication may carry its own darker
consequences. Johnson notes that there is “a fear that perfectly good results
will be thrown out.” See here for
some intelligent discussion of the issue.
None of these points, even if accepted as completely
accurate, argues against trying to do something. They simply argue that SM may
not be something that scientists can manage well, and so the institutions that
regulate science must do it for them. The problem, of course, is that these
institutions are themselves run by scientists, generally the most successful
(i.e. the ones who produce the new and interesting results), and so the problem
one finds at the individual level may thereby percolate to the institutional
one as well. Think of bankers running the SEC or the Fed. Not always a
recipe for success.
There is a more radical critique of SM that is also finding
a voice, one that questions the implicit conception of science that SM
embodies. Here’s
a comment on the SciAm blog that I found particularly stimulating. I’m not sure
if it is correct, but it is a useful antidote to what is fast becoming the
conventional wisdom. The point the
author (Jared Horvath) makes is that
replication has never been the gold standard, at least not for our most
important and influential research. Or as
Horvath puts it: “In actuality, unreliable research and irreproducible data
have been the status quo since the inception of modern science. Far from being
ruinous, this unique feature of research is integral to the evolution of
science.” Horvath illustrates this point by noting some of the more illustrious
failures. They include Galileo, Millikan, and Dalton, to name three. His
conclusion? Maybe replicability is not that big a deal, and our current
worries stem from a misperception of what’s really important in the forward
march of science. Here’s Horvath:
Many are taught that science moves forward in
discrete, cumulative steps; that truth builds upon truth as the tapestry of the
universe slowly unfolds. Under this ideal, when scientific intentions
(hypotheses) fail to manifest, scientists must tinker until their work is
replicable everywhere at anytime. In other words, results that aren’t
valid are useless.
In
reality, science progresses in subtle degrees, half-truths and chance. An
article that is 100 percent valid has never been published. While direct
replication may be a myth, there may be information or bits of data that are useful
among the noise. It is these bits of data that allow science to
evolve. In order for utility to emerge, we must be okay with publishing
imperfect and potentially fruitless data. If scientists were to maintain
the ideal, the small percentage of useful data would never emerge; we’d all be
waiting to achieve perfection before reporting our work.
I have quite a lot of sympathy for this view of the world,
as many of you might have guessed. In fact, I suspect that what drives research
forward is due as much to the emergence of good ideas as to the emergence of solid data. The picture that SM provides
doesn’t reflect the unbelievable messiness of real research, as Horvath notes.
It also, I believe, misrepresents the role of data in scientific discourse. Data
is there to clean up the ideas.
The goal of scientific research is to uncover the underlying
powers and mechanisms. We do this by proposing some of these “invisible”
powers/mechanisms and asking what sorts of more easily accessible things would
result were these indeed true. In other
words, we look for evidence and this comes in many forms: the mechanisms
predict that the paper will turn blue if I put it in tea, the mechanisms
predict that Sunday will be my birthday, etc. Some of the utility of these
postulated mechanisms is that they “make sense” of things (e.g. Darwin’s theory
of Natural Selection), or they resolve paradoxes (e.g. special
relativity). At any rate, these virtues are at least as
important as (in my view, maybe more important than) replicable results. Data
matters, but ideas matter more. And (this is the important part) the ideas
that matter are not just summations of the data!
Let me put this another way: if you believe that theories
are just compact ways of representing
the data, i.e. that they are efficient ways of representing the “facts,” then
bad facts must lead to bad theories, for these will inherit the blemishes of
their inputs (viz. GIGO: garbage in, garbage out). On this view, theories live
and die by whether they accurately encode the facts in compact form, the latter
being both epistemologically and ontologically prior to the former. If, however,
you believe that theories are descriptions of mechanisms, then when there is a
conflict between them, the source of the problem might just as well be with the
facts as with the theories. Theoretical mechanisms, not being just summary
restatements of the facts, have an integrity independent of the facts used to
investigate them. Thus, when a conflict arises, one is faced with a serious
problem: is it the data that is misleading or the theory? And, sadly, there is
no recipe or algorithm or procedure for solving this conflict. Thought and
judgment are required, not computation, not “method,” at least if that is interpreted
as some low-level mechanical procedure.
As you might have guessed, IMO the problem with the standard
interpretation of SM is that it endorses a pretty unsophisticated empiricist
conception of science. It takes the central
scientific problem to be the careful accumulation of accurate data points.
Absent this, science grinds to a halt. I doubt it. Science can falter for many
reasons, but the principal one is the absence of insight. Absent reasonable
theory, data is of dubious value. Given reasonable theory, even “bad” data can
be and has been useful.
An aside: It strikes me as interesting that a lot of the
discussion over replication has taken medicine, psychology and neuroscience as
the problematic subject areas. In these
areas good understanding of the underlying mechanisms is pretty thin. If this
is correct, the main problem here is not that the data is “dirty” but that the
theory is negligible. In other words, we are largely ignorant of what’s going
on. It’s useful to contrast this with what happens in physics when a
purportedly recalcitrant data point emerges: it is vetted against serious
theory and made to justify itself (see here).
Dirty data is a serious problem when we know nothing. It is far less of a problem
once theory begins to reveal the underlying mechanisms.
What’s the upshot of all of this for linguistics? To a certain degree, we have been able to
finesse the replicability worries, for the cost of experimentation is so low:
just ask a native speaker or two. Moreover, as lots of linguistics is based on
data from languages that practitioners are native speakers of, checking the “facts”
can be done quickly, easily, and reliably (as Jon Sprouse and Diego Almeida
have demonstrated in various papers). Of course, this same safeguard does not
extend as easily to fieldwork on less commonly studied languages. Here we often cannot check the facts by
consulting ourselves, and this makes this sort of data less secure. Indeed,
this is one of the reasons that the Piraha “debate” has been so fractious.
Putting aside its putative relevance for UG (none!), the data itself has been
very contentious, and rightly so (see here).
So where are we linguists? I think pretty well off. We have
a decent-ish description of underlying generative mechanisms, i.e. the
structure of FL/UG (yes there are gaps and problems, but IMO it’s pretty damn
good!) and we have ready access to lots of useful, reliable and usable data. As
current work in the sciences goes, we are pretty lucky.
Perhaps there's also a notion of 'indirect replication' whereby, if X proposes an idea with somewhat dodgy replication, and Y, W and Z do further work with somewhat but not completely dodgy replication, the whole thing winds up well corroborated. The take-home message for linguistics might be that people who have problems with Chomsky should think less about him and more about Rizzi and other people at a similar level (conjecturing that if they hadn't decided to accept Minimalism, it might have tanked).
Great write-up. Definitely enlightening.
I'm curious, as opposed to the common Popperian conception of SM that lauds falsifiability (and therefore replicability, as you've discussed), what do you think of Lakatos' research programmes model of scientific methodology for linguistics? My main concern with Popperian SM is that it is far too strong—if a single bad data point spoils the theory, it's throwing the baby out with the bathwater, so to speak. Do you think that a more nuanced approach per research programmes is more applicable to linguistics and science generally?
Sorry for the delayed reply. Somehow, things got lost, etc. So Lakatos? Well, as a description of what goes on, it is better than Popper. Sadly, however, as Feyerabend correctly pointed out, there is precious little "method" beyond injunctions like "do your best." I think that what makes science interesting is that there are no rules, just some rules of thumb. Sure, look at the data, try to find non-trivial explanations, be responsive to the criticism of your peers, consider what might falsify your claims, consider what might support your claims, etc. But these are anodyne. They boil down to "use your noodle!" With this I can agree.
However, in more restricted circumstances, local rules of thumb can become more stringent. So, within linguistics, I think it pays to keep your eyes on the big prize: what's the structure of FL/UG and why. One wants answers to these questions and should prize research that seems to address them. This is not a methodological dictum, but it can have regulative force in the right contexts.
So, general principles/dicta/methods? No, not so much. Rules of thumb: sure, but very weak.
Replicability is a necessary, but not sufficient, component of good science (granting that there is a long-standing debate about what constitutes 'good' science). If you have a theory that posits underlying mechanisms that make concrete predictions, and if you then find those predictions borne out, replication will help confirm that this isn't a fluke. Which is to say that, if you've posited good (see above) mechanisms, replicability of some sort is implied.
On the other hand, the benefits of replication (and replicability in general) without any kind of theoretical understanding are questionable.
Below is a link to a new paper that should be of interest to anyone who is concerned about how the scientific method applies [or should apply] to linguistics, especially the branch closest to Norbert's heart; enjoy.
http://ling.auf.net/lingbuzz/002006