Thursday, November 16, 2017

Is science broken/breaking?

Is science broken/breaking, and if it is, what broke/is breaking it? This question has been asked and answered a lot lately. Here is another recent contribution by Roy and Edwards (R&E). Their answer is that it is nearly broken (or at least severely injured) and that we should fix it by removing the perverse incentives that currently drive it. Though I am sympathetic to ameliorating many of the adverse forces R&E identify, I am more skeptical than they are that there is much of a crisis out there. Indeed, to date, I have seen no evidence showing that what we see today is appreciably worse than what we had before, either in the distant past (you know, in the days when science was the more or less exclusive pursuit of whitish men of means) or even the more recent past (you know, when a PhD was still something that women could only dream about). I have seen no evidence that shows that published results were once more reliable than they are now or that progress overall was swifter. Furthermore, from where I sit, things seem (at least from the outside) to be going no worse than before in the older “hard” sciences, and in the “softer” sciences the problem is not so much the perverse incentives and slipshod data management that R&E point to as the dearth of good ideas that would allow such inquiries to attain some explanatory depth. So, though I agree that there are many perverse incentives out there and that there are pressures that can (and often do) lead to bad behavior, I am unsure whether, given the scale of the modern scientific enterprise, things are really appreciably worse today than they were in some prior golden age (not that I would object to more money being thrown at the research problems I find interesting!). Let me ramble a bit on these themes.

What broke/is breaking science? R&E point to hypercompetition among academic researchers. Whence the hypercompetition? Largely from the fact that universities are operating “more like businesses” (2). What in particular? (i) The squeezed labor market for academics (fewer tenure track jobs and a less pleasant work environment), (ii) the reliance on quantitative performance metrics (numbers of papers, research dollars, citations), and (iii) the fall in science research funding (from 2% of GDP in 1960 to 0.78% in 2014)[1] (p. 7) work together to incentivize scientists to cut corners in various ways. As R&E put it:

The steady growth of perverse incentives, and their instrumental role in faculty research, hiring and promotion practices, amounts to a systematic dysfunction endangering scientific integrity. There is growing evidence that today’s research publications too frequently suffer from lack of replicability, rely on biased data-sets, apply low or sub-standard statistical methods, fail to guard against researcher biases, and overhype their findings. (p. 8)

So, perverse incentives due to heightened competition for shrinking research dollars and academic positions lead scientists interested in advancing their research and careers to conduct research “more vulnerable to falsehoods.” More vulnerable than what/when? Well, by implication, than in some earlier golden age when such incentives did not dominate and scientists pursued knowledge in a more relaxed fashion and were not incentivized to cut corners as they are today. [2]

I like some of this story. I believe, as R&E argue, that scientific life is less pleasant than it was when I was younger (at least for those who make it into an academic position). I also agree that the pressures of professional advancement make it costly (especially to young investigators) to undertake hard questions, ones that might resist solution (R&E quote Nobelist Roger Kornberg as claiming: “If the work you propose to do isn’t virtually certain of success, then it won’t be funded” (8)).[3] Not surprisingly, then, there is a tendency to concentrate on those questions to which already available techniques apply and that hard work and concentrated effort can crack. I further agree that counting publications is, at best, a crude way of evaluating scientific merit, even if buttressed by citation counts (but see below). All of this seems right to me, and yet…I am skeptical that science as a whole is really doing so badly or that there was a golden age of which our own is a degenerate version. In fact, I suspect that people who think this don’t know much about earlier periods (there really was always lots of junk published and disseminated) or have an inflated view of the cooperative predilections of our predecessors (though how anyone who has read The Double Helix might think this is beyond me).

But I have a somewhat larger problem with this story: if the perverse incentives are as R&E describe them, then we should witness their ill effects all across the sciences and not concentrated in a subset of them. In particular, they should affect the hardcore areas (e.g. physics, chemistry, molecular biology) just as they do the softer (more descriptive) domains of inquiry (social psychology, neuroscience). But it is my impression that this is not what we find. Rather, we find the problem areas more concentrated, roughly in those domains of inquiry where, to be blunt, we do not know that much about the fundamental mechanisms at play. Put another way, the problem is not merely (or even largely) the bad incentives. The problem is that we often cannot distinguish domains that are sciency from domains that are scientific. What’s the difference? The latter have results (i.e. serious theory) that describe non-trivial aspects of the basic mechanisms, whereas the former have methods (i.e. ways of “correctly” gathering and evaluating data) that are largely deployed to descriptive (vs explanatory) ends. As Suppes said over 60 years ago: “It’s a paradox of scientific method that the branches of empirical science that have the least theoretical developments have the most sophisticated methods of evaluating evidence.” It should not surprise us that the domains where insight is weakest are also the domains where shortcuts are most accessible.

And this is why it’s important to distinguish these domains. If we look, it seems that the perverse incentives R&E identify are most apt to do damage in those domains where we know relatively little. Fake data, non-replicability, bad statistical methods leading to forking paths/P-hacking, researcher biases: these are all serious problems, especially in domains where nothing much is known. In domains with few insights, where all we have is the data, screwing with the data (intentionally or not) is the main source of abuse. And in those domains, when incentives for abuse rise, the enticement to make the data say what we need them to say heightens. And when the techniques for managing the data can be manipulated to say what you want them to say (or when their proper deployment eludes even the experts in the field (see here and here)), opportunity allows enticement to flower into abuse.
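To make the forking paths/P-hacking point concrete, here is a minimal simulation sketch (illustrative only, mine rather than anything in R&E, with made-up noise data and an approximate t-test): it repeatedly generates data containing no real effect at all, tries ten arbitrary ways of splitting the sample into groups, and keeps whichever split yields the smallest p-value. “Significant” findings then show up far more often than the nominal 5% rate would suggest.

# Illustrative sketch: how "forking paths" inflate false positives.
# All data here are pure noise; any "effect" found is an artifact of
# trying many analyses and reporting the best-looking one.
import math
import random
import statistics

def t_test_p(a, b):
    # Two-sample test p-value, using a normal approximation for simplicity.
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    z = abs(ma - mb) / se
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

random.seed(0)
n = 40            # subjects per simulated study
trials = 1000     # number of simulated studies
false_positives = 0

for _ in range(trials):
    # Outcome is unrelated to any grouping variable.
    outcome = [random.gauss(0, 1) for _ in range(n)]
    # Ten arbitrary ways of splitting the sample (the "forking paths").
    splits = [[random.random() < 0.5 for _ in range(n)] for _ in range(10)]
    # Keep whichever split gives the smallest p-value.
    best_p = min(
        t_test_p([o for o, g in zip(outcome, s) if g],
                 [o for o, g in zip(outcome, s) if not g])
        for s in splits
    )
    if best_p < 0.05:
        false_positives += 1

print(f"'Significant' findings in {false_positives}/{trials} pure-noise studies")
# Expect roughly 30-40% of runs to look "significant", not the nominal 5%.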

The problem then is not just perverse incentives and hypercompetition (these general factors hold in the mature sciences too) but the fact that in many fields the only bulwark against scientific malpractice is personal integrity. What we are discovering is that, as a group, scientists are just as prone to pursuing self-interest and career advancement as any other group. What makes scientists virtuous is not their characters but their non-trivial knowledge. Good theory serves as a conceptual brake against shoddy methods. Strong priors (which is what theory provides) are really important in preventing shoddy data and slipshod thinking from leading a field astray. If this is right, then the problem lies not with the general sociological observation that the world is in many ways crappier than before, but with the fact that many parts of what we call the sciences are pretty immature. There is far less science out there than advertised if we measure a science by the depth of its insights rather than the complexity of its techniques (especially its data management techniques).

There is, of course, a reason why the term ‘science’ is used to cover so much inquiry: the prestige factor. Being “scientific” confers prestige, money, power and deference. Science stands behind “expertise,” and expertise commands many other goodies. There is thus utility in inflating the domain of “science,” and this widens the possibilities for, and the advantages of, the kinds of problems that R&E catalogue.

R&E end with a discussion of ways to fix things. The fixes seem worthy, albeit modest: getting a better fix on the perverting incentives, finding better ways to measure scientific contribution so that rewards can be tuned to more accurate metrics, and more vigilant policing and punishment of malefactors. These might have an effect, though I suspect it will be quite modest. Most of the suggestions revolve around ways of short-circuiting data manipulation, and that is why I think they will ultimately fail to do much: they misdiagnose the problem. R&E implicitly take the problem to reside mainly in current perverse incentives to pollute the data stream for career advancement, so their solution amounts to cleaning up the data stream by eliminating the incentives to dirty it. But the problem is not only (or mainly) dirty data. The problem is our very modest understanding, for which yet more data is not a good substitute.

Let me end with a mention of another paper (here), which takes a historical view of metrics and how they have affected research practice in the past. It notes that the problems we identify as novel today have long been with us, and that this is not the first time people have looked for some kind of methodological or technological fix for slovenly practice. And that is the problem: the idea that there is a methodological “fix” available, the dream of a more rigorous scientific method and a clearer scientific ethics. But there is no such method (beyond the trivial “do your best”), and the idea that scientists qua scientists are more noble than others is farfetched. Science cannot be automated. Thinking is hard, and ideas, not just data collection, matter. Moreover, this kind of thinking cannot be routinized, and insights cannot be summoned no matter how useful they would be. What the crisis R&E identify points to, IMO, is that where we don’t know much we can be easily misled and easily confused. I doubt that there is a methodological or institutional fix for this.


[1] It is worth pointing out that real GDP in 2016 was over five times higher than it was in 1960 (roughly $3 trillion vs. $17 trillion) (see here). In real terms, then, there is a lot more money today than there was then for science research: though science’s share of GDP went down, GDP itself shot up by more than enough to compensate.
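A back-of-the-envelope calculation using just the rough figures cited here (so illustrative rather than official budget numbers, and pairing the 2014 share with the 2016 GDP figure):

\[
0.02 \times \$3\ \text{trillion} \approx \$60\ \text{billion} \ \ (1960)
\qquad \text{vs.} \qquad
0.0078 \times \$17\ \text{trillion} \approx \$133\ \text{billion} \ \ (\text{today}).
\]

On these numbers, absolute research funding roughly doubled in real terms even as its share of GDP fell.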
[2] Again, what is odd is the dearth of comparative data on these measures. Are findings less replicable today than in the past? Are data sets more biased than before? Was statistical practice better in the past? I confess that it is hard to believe that any of these measures have gotten worse if compared using the same yardsticks.
[3] This is odd and I’ve complained about it myself. However, it is also true that in the good old days science was a far more restricted option for most people (it is far more open today, and many more people, and kinds of people, can become scientists). Still, what seems more or less right is that immediately after the war there was a lot of government money pouring into science, and that made it possible to pursue research that did not show immediate signs of success. What would be nice to see is evidence that this made for better, more interesting science, rather than more comfortable scientists.

2 comments:

  1. My impression is that many of these problems DO affect the hard sciences as well. How much progress has been made in theoretical physics or chemistry in the last two decades? Plus, they have two advantages: they have far more money for research (at least here in Europe), so e.g. the career prospects for junior researchers and some other problems are not as bad. And they can build on a large theoretical edifice that already exists. Many of the successes of these fields are confirmations of predictions made many years ago.

    I am curious to know what you base your assumption on that those hard sciences are doing so well!

    1. Sorry for the delay in replying. Thanksgiving got in the way. Too much food, wine and relaxation. At any rate, my view differs from yours. I see no evidence that there are endemic problems in the hard sciences like those roiling psychology, neuroscience and medicine. The replication crisis has not hit there. The main problem in physics is that the standard theory has done a bit too well. Theory is stuck for a variety of reasons, including the success of the standard theory (no great anomalies leading to big-time theory revision), the lack of verification for some of the leading unification theories (including supersymmetry), and the remoteness of much of high theory from empirical test (this is a big problem, I am told, for string theory, but it is also evident in theories of inflation (multi-verses)). That said, this strikes me as quite different from what we find in the less hard (very soft?) sciences, where the problem goes quite a bit further: no good theories to speak of in social psych, and problems with the data in all of them. These are not what we find in physics or chemistry or most of small-scale (cell) biology. In fact, we don't find it in all areas of psych either. Psychophysics seems not plagued by replication or empirical problems, and I have gone on record as noting that the same holds for large chunks of linguistics. The same seems to hold true for large parts of developmental psych. So, I don't see that the social problems that the papers noted above CORRECTLY identify (I am not disputing that the sociological factors are there everywhere) are having the same deleterious effect on the practice of science that we see in the domains most cited as problematic. The question is then why, and my candidate is that the problematic areas are also ones where we just don't know that much. The idea that data should speak for itself and that stat methods should protect us from egregious systematic errors always struck me as misguided. I cite the differential impact of the sociological factors on the practice of science in these different domains as evidence of this.
