Faculty of Language: Numbers and prose

Saturday, January 17, 2015

Numbers and prose

Sometimes words do speak louder than numbers, it seems. Klea Grohmann sent me this link to a recent piece in Nature reporting that grant scoring (a number, apparently) and grant getting were not that closely related to one another. The Medical Research Council (MRC), the funding agency, defends this by saying that the raw scores don't always line up with the comments made and that the latter can be more revealing. This sounds plausible to me right now, given that I am reading admissions files. At UMD, letter of recommendation forms ask for both a numeric rating of a candidate's attributes (how smart, original, motivated, mature, etc.) and a letter. To my surprise, these often don't seem to match up that well. So I can believe that this is also true at the MRC.

However, I suspect that what's closer to the truth is a comment made in the last paragraph by Ferric Fang. He notes that grant funding is something of a lottery at present and that, with money becoming scarcer relative to the number and quality of grant applications, there will be more disappointed (and so disaffected) applicants. Moreover, the fact that there are so many good quality grants being submitted (and I believe that there are) only makes whatever distinctions are used to make a decision seem arbitrary (largely because they are so). The hard decisions arise between equally strong options. And hard decisions rely on more tendentious criteria precisely because the most obviously relevant ones don't decide. Isn't that what stairwells are for?

Last point: I believe that this is likely also true for linguistics grants. There is not enough funding for linguistic research or, more accurately, the funding has stayed about the same while the number of people shooting for it has greatly increased. I have the impression that landing a grant is much, much harder now than it used to be, with success rates somewhere in the vicinity of 10% or less. I sincerely doubt that the criteria separating the top 10% from the next 10% are all that reliable, which means that a lot of granting decisions are haphazard. Note that this does not mean that they are unfair, or dishonest, or done in some underhanded way, as Ian Eperon suggests in the linked piece. Rather, it is that among roughly equal applications there is no reasonable way to make a decision, so the choices become noisy, influenced as they must be by rather minor differences among the applications. When all deserve funding but not all can get it, the absence of any real difference between winners and losers does not mean that they were selected unfairly. What's unfair about the flip of a coin?

At any rate, someone with more knowledge about the ling funding scene or the psycho/neuro ling funding scene than I have please chime in. I would love to know how much our world resembles theirs. Thx Klea.

9 comments:

Alex Clark, January 18, 2015 at 5:33 AM
On the reliability of the review process: NIPS (one of the big machine learning conferences) last year had as program chairs Neil Lawrence and Corinna Cortes, who decided to run an experiment with two parallel PCs (program committees), where the papers were reviewed independently. As this article puts it:

"TL:DR Half the papers appearing at NIPS would be rejected if the review process were rerun."

It's a problem in all fields, even the "hard" branches. The NIPS experiment generated a lot of discussion, but it remains to be seen what effects it will have.
Colin Phillips, January 18, 2015 at 8:47 AM
One quantitative remark. I don't see why choices should become more arbitrary once the funding cut-off becomes more extreme. Surely any cut-off point will be associated with noise, and hence arbitrary outcomes. And the number of arbitrary outcomes should be greatest when the cut-off point is in the densest part of the scoring distribution.

This is related to Alex's point about the NIPS experiment (which is a neat idea, for sure). The phrasing that "half of the papers at NIPS would be rejected in a re-run of the review process" invites the notion that the review process was a waste of time. But that's surely not true. Imagine that you run the Olympic 100-meter final many times. There are 8 guys, generally closely matched. Let's say that Usain Bolt always wins -- he's just that good. And then let's say that there are two guys that consistently get the silver medal, depending on the race. The rest of the guys are really, really fast, but they never make the top 2. That's the same situation as the NIPS outcome, but we'd conclude that the outcome is surprisingly reliable.
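
The quantitative point can be checked with a toy simulation. What follows is only a sketch under assumptions that are not in the comment itself: true quality is drawn from a standard normal distribution, each review adds independent Gaussian noise, and the same pool is scored twice and cut at the same acceptance rate. The function name `flipped_decisions` and all parameter values are hypothetical.

```python
import random


def flipped_decisions(n_papers=20000, noise_sd=0.5, accept_rate=0.1, seed=0):
    """Score one pool twice with independent noise and compare the outcomes.

    Returns (number of papers whose decision differs between the two rounds,
             share of first-round acceptances rejected in the second round).
    All distributions and parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Assumed "true" quality of each submission.
    quality = [rng.gauss(0.0, 1.0) for _ in range(n_papers)]
    n_accept = round(n_papers * accept_rate)

    def accepted_set():
        # One noisy review round: observed score = quality + reviewer noise.
        scores = [q + rng.gauss(0.0, noise_sd) for q in quality]
        ranked = sorted(range(n_papers), key=lambda i: scores[i], reverse=True)
        return set(ranked[:n_accept])

    first, second = accepted_set(), accepted_set()
    flipped = len(first ^ second)  # accepted in exactly one of the two rounds
    overturned_share = len(first - second) / n_accept
    return flipped, overturned_share


for rate in (0.1, 0.3, 0.5):
    flipped, share = flipped_decisions(accept_rate=rate)
    print(f"accept {rate:.0%}: {flipped} flipped decisions, "
          f"{share:.0%} of first-round acceptances overturned")
```

Under these assumptions the two measures pull apart. The raw count of flipped decisions should be largest when the cut-off sits near the middle of the distribution, while the share of accepted items that a re-run would overturn should be largest at the strictest cut-off. That may be why a tight funding line feels more arbitrary even though it produces fewer arbitrary outcomes in total.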
Colin Phillips, January 18, 2015 at 9:08 AM
One other comment on my experience of the review process in the US, based on various settings at NSF and NIH. The review process works differently at different funding agencies, and that would affect the way that scores might mis-align with comments.

1. Different NSF programs use different procedures. Some programs rely entirely on a panel (i.e., only the handful of people in the room read the proposals). Linguistics follows a model where a panel acts as an "editorial board", reviewing lots of reviews submitted by external experts. The external reviewers can provide crucial expertise (on a language, on a method, etc.), but since they review only one proposal, they have little context. And those reviews vary greatly in their quality ... and the reviewers vary in whether they even send a review. The job of the panelists is not to average the external review scores (we have spreadsheets for that), but to weigh the comments from the various reviews. That means that the panelists may decide that one reviewer is unfairly negative, or is merely cheer-leading for a friend's proposal (both of those are disappointingly common). The panel's review may therefore rely more heavily on one external reviewer than on the others. This approach isn't perfect, but I have seen plenty of cases where it has successfully neutralized the effects of an unfair reviewer.

2. NIH uses a different model. Reviews are mostly done by a panel, and all panelists (as many as 20) get to score a proposal, although generally only 3 panelists read the proposal. This means that much depends on the trust and oratory of the 3 assigned panelists.

3. Funding agencies do have a notion of "portfolio balance". They are aware of the different constituencies that they serve/support, and do not want to overly favor one area over others. Nor do they want to create the impression that "Agency X just doesn't like area Y". One can disagree with the strategy, and the same complaint will arise regardless, but it's a definite factor. And in small samples it could lead to further mis-alignment between scores and funding outcomes.

4. Yes, there's a lot of good work out there that is not getting funded. But there's also a lot of variability in how well submitters present their proposals. Effective writing really does make a difference. Submitters need to get their reviewers excited about the proposal, and need to also make it easy for reviewers to explain why they are excited. Some proposals are extremely well presented, and others are dreadful (with a lot in the middle, of course).