Saturday, January 17, 2015

Numbers and prose

Sometimes, it seems, words do speak louder than numbers. Klea Grohmann sent me a link to a recent piece in Nature reporting that grant scores (numerical, apparently) and funding decisions were not all that closely related. The Medical Research Council (MRC), the funding agency in question, defends this by saying that the raw scores don't always line up with the comments made and that the latter can be more revealing. This sounds plausible to me right now, given that I am reading admissions files. At UMD, recommendation forms ask both for numeric ratings of a candidate's attributes (how smart, original, motivated, mature, etc.) and for a letter. To my surprise, the two often don't match up that well. So I can believe that the same is true at the MRC.

However, I suspect that a comment Ferric Fang makes in the last paragraph is closer to the truth. He notes that grant funding is something of a lottery at present and that, with money becoming scarcer relative to the number and quality of applications, there will be more disappointed (and so disaffected) applicants. Moreover, the fact that so many good proposals are being submitted (and I believe that they are) only makes whatever distinctions are used to decide among them seem arbitrary (largely because they are). Hard decisions arise between equally strong options. And hard decisions rely on more tendentious criteria precisely because the most obviously relevant ones don't decide. Isn't that what stairwells are for?

Last point: I believe that this is likely also true for linguistics grants. There is not enough funding for linguistic research; or, more accurately, the funding has stayed about the same while the number of people shooting for it has greatly increased. I have the impression that landing a grant is much, much harder now than it used to be, with success rates somewhere in the vicinity of 10% or less. I sincerely doubt that the criteria separating the top 10% from the next 10% are all that reliable, which means that a lot of funding decisions are haphazard. Note that this does not mean that they are unfair, or dishonest, or made in some underhanded way, as Ian Eperon suggests in the linked piece. Rather, among roughly equal applications there is no reasonable way to decide, so the choices become noisy, influenced as they must be by rather minor differences among the applications. When all deserve funding but not all can get it, the lack of any real difference between winners and losers does not mean that the winners were selected unfairly. What's unfair about the flip of a coin?

At any rate, could someone with more knowledge than I have of the linguistics funding scene, or of the psycho-/neurolinguistics funding scene, please chime in? I would love to know how much our world resembles theirs. Thanks, Klea.


  1. On the reliability of the review process: last year the program chairs of NIPS (one of the big machine learning conferences), Neil Lawrence and Corinna Cortes, decided to run an experiment with two parallel program committees (PCs) that reviewed the papers independently. As this article puts it:

    "TL;DR Half the papers appearing at NIPS would be rejected if the review process were rerun."

    It's a problem in all fields, even the "hard" branches. The NIPS experiment generated a lot of discussion, but it remains to be seen what effects it will have.

  2. One quantitative remark. I don't see why choices should become more arbitrary once the funding cut-off becomes more extreme. Surely any cut-off point will be associated with noise, and hence arbitrary outcomes. And the number of arbitrary outcomes should be greatest when the cut-off point is in the densest part of the scoring distribution.

    This is related to Alex's point about the NIPS experiment (which is a neat idea, for sure). The phrasing that "half of the papers at NIPS would be rejected in a re-run of the review process" invites the notion that the review process was a waste of time. But that's surely not true. Imagine that you run the Olympic 100-meter final many times. There are 8 guys, generally closely matched. Let's say that Usain Bolt always wins; he's just that good. And then let's say that two other guys trade the silver medal, depending on the race. The rest of the guys are really, really fast, but they never make the top 2. That's the same situation as the NIPS outcome (half of the top finishers change from one run to the next), yet we'd conclude that the outcome is surprisingly reliable.

    1. Thx. I think I agree (I think). My point was that when the number of applicants goes up and the money does not, there are likely to be many more very good proposals chasing the same pot. If there were enough money, all the good applicants could be funded and the criteria for selecting them would be clear. Now imagine that only half can be funded and a choice must be made. Well, then one starts looking for less clear criteria to judge by. These might not be as widely shared, nor as stable even for a single judge. Fine judgments are tough, and, speaking personally, I have been known to change my mind a lot when pressed for decisions based on subtle criteria. That's what I think often goes on once we get past the clear A-list (grants, job candidates, grad students). Within the A-list, rankings are quite a bit harder, more arbitrary and, hence, more changeable.

      Last point: I did not conclude that the review process was unreliable, though I would be interested in knowing how the 50% shifts. Maybe some papers always make it in and some never do. That would be informative. But what if, for any given run, the chance of any paper is 50%? What does this say about the criteria? I doubt that this is what happens, but it may be closer to the truth than your Bolt example. I would love to know more.

  3. One other comment, based on my experience of the review process in the US in various settings at NSF and NIH. The review process works differently at different funding agencies, and that affects how scores might misalign with comments.

    1. Different NSF programs use different procedures. Some programs rely entirely on a panel (i.e., only the handful of people in the room read the proposals). Linguistics follows a model where a panel acts as an "editorial board", weighing many reviews submitted by external experts. The external reviewers can provide crucial expertise (on a language, on a method, etc.), but since they review only one proposal, they have little context. And those reviewers vary greatly in the quality of their reviews ... and in whether they even send one. The job of the panelists is not to average the external review scores (we have spreadsheets for that), but to weigh the comments from the various reviews. That means that the panelists may decide that one reviewer is unfairly negative, or is merely cheer-leading for a friend's proposal (both of those are disappointingly common). The panel's review may therefore rely more heavily on one external reviewer than on others. This approach isn't perfect, but I have seen plenty of cases where it has successfully neutralized the effects of an unfair reviewer.

    2. NIH uses a different model. Reviews are mostly done by a panel, and all panelists (as many as 20) get to score a proposal, although generally only 3 panelists actually read it. This means that much depends on the credibility and oratory of the 3 assigned panelists.

    3. Funding agencies do have a notion of "portfolio balance". They are aware of the different constituencies that they serve/support, and do not want to overly favor one area over others. Nor do they want to create the impression that "Agency X just doesn't like area Y". One can disagree with the strategy, and the same complaint will arise regardless, but it's a definite factor. And in small samples it could lead to further misalignment between scores and funding outcomes.

    4. Yes, there's a lot of good work out there that is not getting funded. But there's also a lot of variability in how well submitters present their proposals. Effective writing really does make a difference. Submitters need to get their reviewers excited about the proposal, and also need to make it easy for reviewers to explain why they are excited. Some proposals are extremely well presented, and others are dreadful (with a lot in the middle, of course).

    1. I think a distinction needs to be made between 'fair' and 'arbitrary.' Things can be very fair if everyone is treated arbitrarily. We attach a lot to these review processes beyond who gets how much. We use the outcomes for hiring, promotion, tenure, etc. The higher-ups treat funding success as a major trump card in their decision making. Now, I think that this is an important factor. But it can become overly prominent. And cases like those Alex C mentions are very useful reminders that fair and arbitrary may not be the same thing.

    2. Yes, you're quite right that 'fair' and 'arbitrary' should be distinguished. And I certainly agree that funding outcomes are a blunt measure for evaluating the merit of research(ers), especially at the tenure stage. That's true of anything that turns a multidimensional scale into a simple yes/no outcome. (If only there were "honorable mentions" in grant submission outcomes.) My only objection is to the slide from "there's arbitrariness in the outcomes" (undeniably true) to "it's a crapshoot, so screw the process". The submitter's task is to do everything in his/her power to try to minimize the uncertainty. My own take on that is that it's paramount to put oneself in the shoes of those making the decisions (reviewers, panelists, program officers), and be willing to sometimes tear up what you've written and start over.

    3. I don't have much first-hand knowledge of the grant review process for cog neuro at NIH, but I have listened to my former advisor discuss the process (along with his many complaints). It seems that the composition of the review panel and its members' expertise and taste are a big factor. The methods and the point of particular proposals are quite likely to be misunderstood by the reviewers, leading to unwarrantedly low assessments, or to suggestions for revision that are erroneous and unhelpful. In this sense, it's not competition among the best that leads to randomness in the funding game, but fundamental problems with the funding game itself.

      I agree with Colin that many grant proposals are likely poorly written, and that we should take care to communicate our ideas and their importance clearly, which should lead to a clearer research program anyway. But I think that the major problems are the availability of funding, the ineptitude of the review process, and the fact that what tends to get funded are projects that sound good rather than projects that are good. I think this leads to a feedback loop in which grant writers dilute their proposals in order to sound good, or perhaps pursue lines of research that sound good rather than ones that are informative and useful. This is a major problem, and I can't think of obvious solutions except to be active in publicly discussing the problem and trying to change the situation.

      As a side note, the impact of politics on funding is horrendous. A close friend of mine is an evolutionary biologist, and she has told me that the entire field of evolutionary biology basically has to forgo NIH funding because of the radioactivity of the term "evolution". I imagine it is a little hard to get funding to study evolution if you can't talk about evolution.

    4. @ Colin: I take it that the quotes in the screw-the-process line are scare quotes rather than a quotation. At least here, I have not said to screw it, and I don't think I suggested malice or bad faith in reviewers, panelists or grant officers. But now that you bring this up…

      The tasks are surely hard, but I am not as certain as you are that particular tastes do not dominate many parts of the process. I have mentioned before that I do not think that theoretical work gets a fair shake within the linguistics panel. I still believe that. Nor do I believe that this is only because theorists have a hard time taking their reviewers' points of view and problems into account. They may, and likely do. But I also think that this kind of work is generally under-appreciated and would be a much harder sell than, say, a project based on analyses of some language L or, for that matter, experimental work. Am I wrong? Maybe.

      There is a tad of blame-the-loser in your note. I assume that it is not intended. But from where I sit, deciders are not as pure and open-minded as you hint (hard as their lot is, no doubt), and the complainers are not as uniformly arrogant, lazy and stiff-necked as you suggest (incompetent and whiny though they certainly are in general). What the current culture values is competition. It supposedly improves the quality of what gets done. Winners especially like this. I am much less sure than others that this is correct. At the very least, the virtues of competition elude me when the number of winners is radically reduced. When that happens, the system, even if fair, may be very deleterious to one's health.

    5. To be clear, I'm voicing an often-heard remark, rather than quoting you.

      I am not, in these remarks at least, taking any position on the purity or open-mindedness of the deciders. I'm just saying that it is helpful to think carefully about who is standing in judgment on one's proposal (or paper, or application). I fully agree that the system is especially hard on highly qualified young researchers who narrowly miss the cut.

      I don't think that I will ever convince you that NSF Linguistics values theoretical work. But the best evidence that I can offer is the comparison between semantics on the one hand and syntax/phonology on the other hand. Semantics has fared (relatively) well in recent years, and I think it's instructive to consider the reasons for that difference.

      As we have discussed in the past, though, I think that another difference lies in the motivation of the submitters. Folks whose work is impossible without the money often just keep coming back. They'll try, and try, and try. In other areas, it's easier for people to walk away from the process. I don't expect you to believe me on this one either, and one can undoubtedly cite exceptions. But having observed the process across hundreds of proposals, I find that the cases that just keep coming back tend to be folks whose research or tenure most depends on it.

      Things that float to the top do tend to have some kind of "wow" factor, which might or might not be correlated with quality. And I would not disagree that some things are easier to convey as "wow" than others. I have certainly seen proposals that are superficially "wow" but have less substance (the case that I'm thinking of right now is not in linguistics).