Friday, October 18, 2013

Evaluating Scientific Merit

I have just finished reviewing two papers for possible publication. As you all know, this is hard work. You gotta read the thing, think about it rationally, judge it along several dimensions culminating in an overall assessment, and then write this all up in comprehensible and, hopefully, helpful prose. I find any one of these activities exhausting, and I thus don't count reviewing among my top 10 favorite pastimes. Given my distaste, I have begun to wonder how valuable all this expended effort is. After all, it's worth doing if the process has value even if it is a pain (sort of like dieting or regular exercise). So is it and does it?

Sadly, this has become less and less clear to me of late. In a recent post (here) I pointed you to some work aimed at evaluating the quality of journal publications. Yesterday, I ran across a paper published in PLoS Biology (here) that, if accurate, suggests that the return on investment for reviewing is remarkably slight. Why? Because, as the authors (Adam Eyre-Walker and Nina Stoletzki: (E-WS)) put it: "scientists are poor at estimating the merit of a scientific publication" (6). How poor? Very. Why? Because (i) there is very low intersubjective agreement after the fact on what counts as worthwhile, (ii) there is a strong belief that a paper published in a prestige journal is ipso facto meritorious (though, as E-WS show, this is not a well-justified assumption), and (iii) post hoc citation indices are very noisy indicators of merit. So, it seems that there is little evidence for the common assumption that the cream rises to the top, or that the best papers get published in the best journals, or that the cumbersome and expensive weeding process (i.e. reviewing) really identifies the good stuff and discards the bad.

Now, linguists might respond to this by saying that this all concerns papers in biology, not hard sciences like syntax, semantics and psycholinguistics. And, of course, I sympathize. However, more often than not, what holds in one domain has analogues in others. Would it really surprise you to find out that the same holds true in our little domain of inquiry, even granted the obvious superior intellect and taste of generative grammarians?

Say this is true: what's to be done? I really don't know. Journals, one might think, play three different roles in academia. First, they disseminate research. Second, they help steer research by sorting it into different piles of excellence. Third, they are used for promotion and tenure and for the distribution of scarce research resources, aka grants. In this internet age, journals no longer serve the first function. As for the second, the E-WS results suggest that the effort is not worth it, at least if the aim is to find the good stuff and discard the bad. The last then becomes the real point of the current system: it's a kind of arbitrary way of distributing scarce resources, though if the E-WS findings are correct, journals are the academic equivalent of dice, with luck more than merit determining the outcome.


  1. It's a noisy process, for sure. But if it were really a waste of time, then you'd find no correlation between the assessments of the various reviewers of any given manuscript or grant proposal. That's just not the case. Does the process take time and effort? Of course. Is it imperfect? Of course. Do some fields show higher levels of noise and vindictiveness? Alas, yes (I'm looking at you, syntacticians). But is it a pointless crapshoot? Not even close. To take one example, the most recent ms. that I reviewed was for a well-regarded journal. The submission received three reviews, all saying roughly the same thing: enthusiastic about the originality of the ideas, but with reservations about the case made. The authors took this to heart, came back with a substantially improved paper, and hopefully all will be more-or-less happy. That's a case of the process working well.

  2. So you are saying that in OUR journals, in contrast to theirs, there is high intersubjective agreement and this is a sign of quality. If so, then it seems that our review process is different from what they found in their field, at least when the accepted papers were evaluated after publication. If this is correct, then would you take this to indicate that, at least in their field, reviewing is not doing what we hope it should? Btw, as you would be the first to tell me, anecdote is not the singular of data. Your experience, though interesting, seems somewhat less compelling than their study. That said, maybe our field is different, and not beset by the problems that they found. But is this really that likely? Recall, they come very close to finding no correlation, your nightmare scenario. They concede a weak one, but then raise the reasonable question of whether the payoff is worth the price. So, you may be right: our area is different. We do a good job separating the good from the bad. Any thoughts on why we do so well and they don't?

  3. It seems plausible that most papers are improved by undergoing the review process. That is compatible with the process doing a bad job of selecting the best papers. In other words, even if the papers to be published are in effect chosen at random, it may nonetheless be the case that the majority of published papers are better than they would have been if they'd never been reviewed.

  4. I think they are very often improved (especially by being made more intelligible, pointless detours omitted, etc.), but also sometimes made worse by being made to conform to current fashions that are in fact not very well founded. Nevertheless, on balance, I approve of reviewing, because it means that the thing has at least been looked at by a semi-captive audience, at least some of whom probably act on their moral obligation to actually read it, rather than just toss it in the bin the first time they get bored or annoyed.
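The correlation talk in comment 1 can be made concrete. Below is a minimal sketch of how one would test the "waste of time" null hypothesis: score the same manuscripts with two independent reviewers and compute Pearson's r. The scores here are entirely made up for illustration (they are not from the E-WS study or any real journal); pure chance would put r near zero, while agreement pushes it toward 1.

```python
# Sketch: measuring inter-reviewer agreement with Pearson's r.
# The reviewer scores below are hypothetical, purely for illustration.

def pearson_r(xs, ys):
    """Standard Pearson correlation coefficient between two score lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical 1-10 merit scores from two reviewers on ten manuscripts.
reviewer_a = [8, 3, 6, 9, 2, 7, 5, 4, 8, 6]
reviewer_b = [7, 4, 5, 9, 3, 6, 6, 3, 7, 5]

print(round(pearson_r(reviewer_a, reviewer_b), 2))  # prints 0.92: strong agreement
```

On these invented numbers the reviewers agree strongly; the E-WS claim, as summarized in the post, is that real post-publication assessments look much closer to the r-near-zero end.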