Is science broken/breaking, and if it is, what broke/is breaking it? This question has been asked and answered a lot lately. Here
is another recent contribution by Roy and Edwards (R&E). Their answer is
that it is nearly broken (or at least
severely injured) and that we should fix it by removing the perverse incentives
that currently drive it. Though I am sympathetic to ameliorating many of the
adverse forces R&E identify, I am more skeptical than they are that there
is much of a crisis out there. Indeed, to date, I have seen no evidence showing
that what we see today is appreciably worse than what we had before, either in
the distant past (you know, in the days when science was the more or less exclusive pursuit of whitish men of means) or even the more recent past (you know, when a PhD was still something that women could only dream about). I have seen no evidence showing that published results were once more reliable than
they are now or that progress overall was swifter. Furthermore, from where I sit, things seem
(at least from the outside) to be going no worse than before in the older
“hard” sciences, and in the “softer” sciences the problem is not so much the perverse incentives and the slipshod data management that R&E point to as
the dearth of good ideas that can allow such inquiries to attain some
explanatory depth. So, though I agree that there are many perverse incentives
out there and that there are pressures that can (and often do) lead to bad
behavior, I am unsure whether given the scale of the modern scientific
enterprise things are really appreciably worse today than they were in some
prior golden age (not that I would object to more money being thrown at the
research problems I find interesting!). Let me ramble a bit on these themes.
What broke/is breaking science? R&E point to
hypercompetition among academic researchers. Whence the hypercompetition?
Largely from the fact that universities are operating “more like businesses”
(2). What in particular? (i) the squeezed labor market for academics (fewer tenure-track jobs and a less pleasant work environment), (ii) the reliance on quantitative performance metrics (numbers of papers, research dollars, citations), and (iii) the fall in science research funding (from 2% of GDP in 1960 to 0.78% in 2014)[1] (p. 7). These work together to incentivize scientists to cut corners in various ways.
As R&E put it:
The steady growth of perverse incentives,
and their instrumental role in faculty research, hiring and promotion
practices, amounts to a systematic dysfunction endangering scientific
integrity. There is growing evidence that today’s research publications too
frequently suffer from lack of replicability, rely on biased data-sets, apply
low or sub-standard statistical methods, fail to guard against researcher
biases, and overhype their findings. (p. 8)
So, perverse incentives due to heightened competition for shrinking research dollars and academic positions lead scientists interested in advancing their research and careers to conduct research “more vulnerable to
falsehoods.” More vulnerable than what/when? Well, by implication than some
earlier golden age when such incentives did not dominate and scientists pursued
knowledge in a more relaxed fashion and were not incentivized to cut corners as
they are today. [2]
I like some of this story. I believe, as R&E argue, that scientific life is less pleasant than it was when I was younger (at least for those who make it into an academic position). I also agree that the
pressures of professional advancement make it costly (especially to young
investigators) to undertake hard questions, ones that might resist solution
(R&E quote Nobelist Roger Kornberg as claiming: “If the work you propose to do isn’t virtually certain of success, then it won’t be funded” (p. 8)).[3]
Not surprisingly, then, there is a tendency to concentrate on questions to which already available techniques apply and that hard work and concentrated effort can crack. I further agree that counting publications is,
at best, a crude way of evaluating scientific merit, even if buttressed by
citation counts (but see below). All of this seems right to me, and yet…I am
skeptical that science as a whole is really doing so badly or that there was a
golden age that our own is a degenerate version of. In fact, I suspect that
people who think this don’t know much about earlier periods (there was always plenty of junk published and disseminated) or have an inflated view of the cooperative predilections of our predecessors (though how anyone who has read The Double Helix could think this is beyond me).
But I have a somewhat larger problem with this story: if the
perverse incentives are as R&E describe them, then we should witness their ill effects all across the sciences and not concentrated in a subset of them. In particular, they should affect the hardcore areas (e.g. physics, chemistry, molecular biology) just as they do the softer (more descriptive) domains of inquiry (social psychology, neuroscience).
But it is my impression that this is not
what we find. Rather, we find the problem areas more concentrated, roughly in
those domains of inquiry where, to be blunt, we do not know that much about the
fundamental mechanisms at play. Put another way, the problem is not merely (or even largely) the bad
incentives. The problem is that we often cannot distinguish those domains that are sciency from those domains that are scientific. What’s the difference? The latter have results (i.e. serious theory) that describe non-trivial aspects of the basic mechanisms, whereas the former have methods (i.e. ways of “correctly”
gathering and evaluating data) that are largely deployed to descriptive (vs
explanatory) ends. As Suppes said over 60 years ago: “It’s a paradox of
scientific method that the branches of empirical science that have the least
theoretical developments have the most sophisticated methods of evaluating
evidence.” It should not surprise us that domains where insight is weakest are
also domains where shortcuts are most accessible.
And this is why it’s important to distinguish these domains.
If we look, it seems that the perverse incentives R&E identify are most
apt to do damage in those domains where we know relatively little. Fake data,
non-replicability, bad statistical methods leading to forking paths/P-hacking, researcher biases: these are all serious problems, especially in domains where nothing much is known. In domains with few insights, where all we have is the data, screwing with the data (intentionally or not) is the main source of abuse. And in those domains, when incentives for abuse rise, the enticements to make the data say what we need them to say rise with them. And when the techniques for managing the data can themselves be manipulated to say what you want them to say (or when their proper deployment eludes even the experts in the field (see here and here)), opportunity allows enticement to flower into abuse.
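To make the forking-paths worry concrete, here is a minimal simulation sketch (my own illustration, not anything from R&E; the sample sizes and the number of outcome measures are arbitrary assumptions). It shows how, in a world with no real effect at all, testing several outcomes and reporting only the most favorable p-value inflates the false-positive rate well past the nominal 5%.

```python
# Illustrative sketch only: how "best of many outcomes" reporting inflates
# false positives even when there is no true effect in the data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments = 2000   # number of simulated "studies"
n_per_group = 30       # subjects per condition (arbitrary assumption)
n_outcomes = 10        # outcome measures a researcher could choose among

false_pos_single = 0   # honest: one pre-registered outcome per study
false_pos_hacked = 0   # forking paths: report the smallest p across outcomes

for _ in range(n_experiments):
    # Null world: treatment and control come from the same distribution,
    # so any "significant" difference is a false positive by construction.
    treatment = rng.normal(size=(n_outcomes, n_per_group))
    control = rng.normal(size=(n_outcomes, n_per_group))
    pvals = [stats.ttest_ind(treatment[i], control[i]).pvalue
             for i in range(n_outcomes)]
    false_pos_single += pvals[0] < 0.05
    false_pos_hacked += min(pvals) < 0.05

print(f"Pre-registered outcome, false-positive rate: {false_pos_single / n_experiments:.1%}")
print(f"Best-of-{n_outcomes} outcomes, false-positive rate: {false_pos_hacked / n_experiments:.1%}")
# Typically roughly 5% vs roughly 40%: same data, very different "findings".
```

Nothing in the data-generating process differs between the two rows; only the reporting rule does. That is the sense in which the techniques, not just the data, can be made to say what one wants.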
The problem then is not just
perverse incentives and hypercompetition (these general factors hold in the
mature sciences too) but the fact that in many fields the only bulwark against
scientific malpractice is personal integrity. What we are discovering is that
as a group, scientists are just as prone to pursuing self-interest and career
advancement as any other group. What makes scientists virtuous is not their
characters but their non-trivial knowledge. Good theory serves as a conceptual
brake against shoddy methods. Strong priors (which are what theory provides) are
really important in preventing shoddy data and slipshod thinking from leading
the field astray. If this is right, then the problem is not with the general
sociological observation that the world is in many ways crappier than before,
but with the fact that many parts of what we call the sciences are pretty
immature. There is far less science out there than advertised if we measure a
science by the depth of its insights rather than the complexity of its
techniques (especially its data-management techniques).
There is, of course, a reason why the term ‘science’ is
used to cover so much inquiry. The prestige factor. Being “scientific” endows
prestige, money, power and deference. Science stands behind “expertise,” and
expertise commands many other goodies. There is thus utility in inflating the domain
of “science,” and this inflation widens both the opportunities for, and the rewards of, the kinds of problems that R&E catalogue.
R&E end with a discussion of ways to fix things. The proposed fixes seem worthy, albeit modest. But I personally doubt they will make
much of a difference. They include getting a better fix on the perverting
incentives, finding better ways to measure scientific contribution so that
reward can be tuned to these more accurate metrics, and implementing more
vigilant policing and punishment of malefactors. This might have an effect,
though I suspect it will be quite modest. Most of the suggestions revolve
around ways of short-circuiting data manipulation. That’s why I think these
suggestions will ultimately fail to do much. They misdiagnose the problem.
R&E implicitly take the problem to reside mainly with current perverse incentives
to pollute the data stream for career advancement. The R&E solution amounts
to cleaning up the data stream by eliminating the incentives to dirty it. But
the problem is not only (or mainly) dirty data. The problem is our very modest
understanding, for which yet more data is not a good substitute.
Let me end with a mention of another paper. The other paper
(here)
takes a historical view of metrics and how they have affected research practice in the
past. It notes that the problems we identify as novel today have long been with
us. It notes that this is not the first time people are looking for some kind
of methodological or technological fix to slovenly practice. And that is the
problem: the idea that there is a kind of methodological “fix” available, a dream of a more rigorous scientific method and a clearer scientific ethics. But there is no such method (beyond the trivial “do your best”) and the idea that
scientists qua scientists are more
noble than others is farfetched. Science cannot be automated. Thinking is hard
and ideas, not just data collection, matter. Moreover, this kind of thinking
cannot be routinized and insights cannot be summoned no matter how useful they would be. What the crisis R&E identify points to, IMO, is that where we
don’t know much we can be easily misled and easily confused. I doubt that there
is a methodological or institutional fix for this.
[1]
It is worth pointing out that real GDP in 2016 was over five times what it was in 1960 (roughly $3 trillion vs. $17 trillion) (see here). In real terms, then, there is a lot more money today than there was then for science research. Though the percentage went down, the base it is a percentage of really shot up.
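A rough back-of-the-envelope check, using R&E's percentages and the approximate GDP figures above (all figures rounded, offered only for illustration): 2% of roughly $3 trillion is about $60 billion, while 0.78% of roughly $17 trillion is about $133 billion. On those numbers, absolute real funding roughly doubled even as its share of GDP fell by more than half.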
[2]
Again, what is odd is the dearth of comparative data on these measures. Are
findings less replicable today than in the past? Are data sets more biased than
before? Was statistical practice better in the past? I confess that it is hard
to believe that any of these measures have gotten worse if compared using the
same yardsticks.
[3]
This is odd and I’ve complained about this myself. However, it is also true
that in the good old days science was a far more restricted option for most
people (it is far more open today and many more people and kinds of people can
become scientists). That said, what seems more or less right is that immediately
after the war there was a lot of government money pouring into science and that
made it possible to pursue research that did not show immediate signs of
success. What would be nice to see is evidence that this made for better, more interesting science, rather than more comfortable scientists.
My impression is that many of these problems DO affect the hard sciences as well. How much progress has been made in theoretical physics or chemistry in the last two decades? Plus, they have two advantages: they have far more money for research (at least here in Europe), so, e.g., career prospects for junior researchers and some other problems are not as bad. And they can build on a large theoretical edifice that already exists. Many of the successes of these fields are confirmations of predictions made many years ago.
I am curious to know what you base your assumption on that those hard sciences are doing so well!
Sorry for the delay in replying. Thanksgiving got in the way. Too much food, wine and relaxation. At any rate, my view differs from yours. I see no evidence that there are endemic problems in the hard sciences like those roiling psychology, neuroscience and medicine. The replication crisis has not hit there. The main problem in physics is that the standard theory has done a bit too well. Theory is stuck for a variety of reasons, including the success of the standard theory (no great anomalies leading to big-time theory revision), the lack of verification for some of the leading unification theories (including supersymmetry), and the remoteness of much of high theory from empirical test (this is a big problem, I am told, for string theory, but it is also evident in theories of inflation (multi-verses)).
That said, this strikes me as quite different from what we find in the less hard (very soft?) sciences, where the problem goes quite a bit further: no good theories to speak of in social psych, and problems with data in all of them. These are not what we find in physics or chemistry or most small-scale (cell) biology. In fact, we don't find it in all areas of psych either. Psychophysics seems not plagued by replication or empirical problems, and I have gone on record as noting that the same holds for large chunks of linguistics. The same seems to hold true for large parts of developmental psych.
So, I don't see that the social problems that the papers noted above CORRECTLY identify (I am not disputing that the sociological factors are there everywhere) are having the same deleterious effect on the practice of science that we see in the domains most cited as problematic. The question then is why, and my candidate answer is that the problematic areas are also ones where we just don't know that much. The idea that data should speak for itself and that stat methods should protect us from egregious systematic errors always struck me as misguided. I cite the differential impact of the sociological factors on the practice of science in these different domains as evidence of this.