Comments on Faculty of Language: Scientific Publishing in a Modern World: A Thought Experiment

@Ewan: What sort of observational data do you thin...

2016-09-02T17:48:39.233-07:00

@Ewan: What sort of observational data do you think would be useful? I'm having a hard time imagining how you could collect such data. In order to do so, it seems like you would already have to have some amount of buy-in from institutions for the system you wanted to collect data about so that you could have some people actually using the system and then see how it works for the two things you mentioned above: "getting good quality science" and "performance evaluations for giving out money and jobs".

I don't see that as likely. Unless you have in mind some other way of setting up a small set of users to test run the system. But then I think the results would be biased toward showing the new system to be abysmal because universities/institutions wouldn't have yet bought into the system and so it wouldn't work well for the "performance evaluations for giving out money and jobs" part.

I think Thomas is right that, in order to have some sort of paradigm shift, we need slow change that incrementally builds on how things currently are.

@Thomas: Thanks for this post! I generally like the system you sketch, though I'd like to see a better thought-out solution to the problem of double-blind review that Brooke raised.

And I also think some of these things will be really hard to work toward. As you said, moving towards plain-text authored manuscripts and version control is a really high bar to begin with. Key signing stuff would definitely need a GUI.

I do think some of the other things could be slowly implemented as you said. One other possible place to start, too, is with Lingbuzz. Fairly recently, there were some rumblings about trying to do something, since the Lingbuzz servers always seem to be down. Perhaps that could be a jumping-off point, especially since Lingbuzz already has some sort of versioning built into it. (Though these rumblings have died down for various reasons.)

If anyone is interested in trying to take some concrete steps towards something like what Thomas sketched (or something else), I'd be really interested in collaborating/helping out. I think this is a huge problem that really needs to be addressed.

I'm not sure that any of these options are goo...

2016-09-02T17:24:41.094-07:00

I'm not sure that any of these options are good answers to Brooke's question about double-blind review. The main problem, I think, is that it's not at all clear when would be the appropriate time to reveal that you were the author of a paper. In the current system, there's a very clear time when this happens. And after this point, once the paper has been accepted, there's not as much harm that can be done by (sub)conscious gender/race/etc bias. You have a line that you can put on your CV that contains a prestigious-sounding name like Linguistic Inquiry.

On the other hand, in the system you sketch, there's not a clear point at which a paper has been accepted/published. Say reviewing network X reviews the paper and gives you a good review and a "like". This boosts your paper's rating because likes from reviewing networks are rated highly. Wanting to get credit for this paper, you decide to reveal your authorship. After this, another reviewing network Y reviews your paper and because of (sub)conscious bias, they give you a negative rating and a "dislike". Say this then happens again with network Z.

Your paper is now rated very poorly because the weightings from the negative reviews of networks Y and Z greatly outweigh the one positive rating from network X.
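To make the scenario concrete, here is a minimal sketch of how such a rating could be computed. It assumes the rating is a weighted mean of likes (+1) and dislikes (-1); the function name, the equal weights, and the formula itself are my own illustrative assumptions, not anything specified in the post.

```python
# Hypothetical rating rule: weighted mean of +1 (like) and -1 (dislike) votes.
# The weights and the formula are illustrative assumptions only.
def paper_rating(votes, weights):
    """Return the weighted mean of the votes cast by reviewing networks."""
    total_weight = sum(weights[network] for network in votes)
    weighted_sum = sum(weights[network] * vote for network, vote in votes.items())
    return weighted_sum / total_weight

# Assume all three reviewing networks carry the same weight.
weights = {"X": 1.0, "Y": 1.0, "Z": 1.0}

# Before the author reveals their identity: only network X has reviewed.
before_reveal = paper_rating({"X": +1}, weights)

# After the reveal: networks Y and Z add dislikes.
after_reveal = paper_rating({"X": +1, "Y": -1, "Z": -1}, weights)

print(before_reveal, after_reveal)
```

Under these assumptions, the rating drops from the maximum to below zero once Y and Z weigh in, even though the paper itself has not changed.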

I think it's important to think of a good way to address this issue since this really seems to still be a problem (e.g., here). One off-the-cuff thought is that perhaps a restriction could be implemented such that there could only be one review from one reviewing network in any given field. Off the top of my head, I'm not exactly sure how you would implement this technically, but perhaps it could be something that is just generally agreed upon by everyone. Or someone smarter than me could think of a way to implement it technically. :p

But this would then give you a clear point at which it makes sense to reveal your authorship of a paper.

Actual roll-out would require a slow, incremental ...

2016-08-01T16:23:14.917-07:00

Actual roll-out would require a slow, incremental phase-in that starts out as an extension of the current system and keeps growing in functionality until it is the dominant force in the market, at which point backwards compatibility to the old model can be slowly phased out over 10 or 20 years.

So I would start with torrents as an additional method of distribution for open access articles. Pretty much anybody should like that because it distributes network load, which reduces hosting cost. Then you add a mechanism for signing torrents, and an option for authors to include the source files in the torrent. Then you expand that to include full version history. Then you add multiple signing (publisher + author). And so on. Basically, put the distribution system in place, and once that is ubiquitous you move on to evaluation/reviewing.

There I would start by building up the importance of blogs and pushing for a more flexible system for computing metrics --- everybody knows that impact factor and citation counts are pretty crude, so if you can offer more information publishers and university admins would like that, too.

It really has to be done in this kind of incremental fashion, sort of like open source development: release early, release often. Proprietary development style --- work in secret until you have a finished product to push out --- will not work well, I think.

I think there must be enough (observational) data ...

2016-08-01T14:09:39.954-07:00

I think there must be enough (observational) data available from existing networks + existing sets of experience with peer review. I don't know how to synthesize it, but the sheer number of intuitively plausible ideas about how to attack these problems, and the number of details that spring to mind about the reviewing system that would "of course" need to be implemented, just within the space of a few comments, plus the fact that we don't even really have a proper negative spec sheet (for specific problems with the current system to be avoided) - well, that's not enough to absolutely demand careful empirical work first, I agree. What does lead me to think that the initial product better be pretty damn good is that it will be hard to get buy-in in the first place.

I think we worry about slightly different things h...

2016-07-31T16:17:02.239-07:00

I think we worry about slightly different things here. My main goal was to address the technical side of publishing, because there is this common narrative that publishing must be centralized, inherently costly, and so on. It would be nice to see people dream a little bigger than "how do we add open access to the status quo?".

How you actually implement such a system in practice is a different matter, though the two are intertwined of course. I'd say in practice it will even be a Herculean task to get scientists to author in plain text and use version control. No matter how easy you make the tools, people hate changing their ways.

For the reviewing, I still believe that it isn't too difficult to create incentives. All you need is to attach a score to authors that reflects their reviewing diligence. Suppose you submit a paper, which is reviewed by X, Y, and Z. After a grace period of two months, they look at the latest version of the paper, and while X is happy that you addressed his/her concerns, Y and Z are annoyed because you completely disregarded everything they said. So X gives you a good review score and writes a few nice words, while Y and Z give you low scores and leave some sternly worded remarks. The next time you submit a paper, reviewers can see your less than stellar score, and some may simply decide not to review the paper because of your bad revising track record. Fewer reviewers means fewer signatures from prestigious reviewing networks, which means fewer readers and less valuable papers. So that's a strong incentive: make careful revisions (and document what you did in the version control system), or you're jeopardizing the success of future papers.

Of course that system also needs some checks & balances, you don't want some crazed reviewer to turn you into a pariah. So reviews and reviewers should also be rated by you, the community, and other people that reviewed the paper, and this may reduce or increase their weight.
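As a rough illustration, the score could be a weighted average in which each reviewer's vote counts for more or less depending on how that reviewer is rated. Everything in this sketch is a hypothetical choice made for the example: the `RevisionReview` class, the 0-to-1 scale, the default weight of 1.0, and the specific numbers.

```python
from dataclasses import dataclass


@dataclass
class RevisionReview:
    reviewer: str
    score: float  # assumed scale: 0.0 (ignored the review) to 1.0 (addressed everything)


def revision_score(reviews, reviewer_weight):
    """Average the reviewers' scores, weighted by how much each reviewer is trusted.

    Reviewers missing from reviewer_weight get a default weight of 1.0.
    Returns None if the total weight is zero.
    """
    total = sum(reviewer_weight.get(r.reviewer, 1.0) for r in reviews)
    if total == 0:
        return None
    weighted = sum(reviewer_weight.get(r.reviewer, 1.0) * r.score for r in reviews)
    return weighted / total


# X is happy with the revisions; Y and Z are not.
reviews = [
    RevisionReview("X", 0.9),
    RevisionReview("Y", 0.2),
    RevisionReview("Z", 0.1),
]

# All reviewers trusted equally.
equal = revision_score(reviews, {})

# The community has rated Y and Z as unreliable, so their votes count for less.
discounted = revision_score(reviews, {"Y": 0.25, "Z": 0.25})

print(equal, discounted)
```

With these made-up numbers, down-weighting Y and Z raises the author's score, which is the kind of check against a single crazed reviewer that the paragraph above has in mind.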

I'm sure there's many other things one could try; all the things you suggest sound very reasonable. You are right that at this point this is groping in the dark, but I don't think you'll be able to learn much from experimental studies either. Distributed systems with many checks and balances are inherently dynamic, and they require a certain level of maturity on the part of the community. Researchers have to learn how to interact in this new system, and the dynamics may be very different in a community of 100 and one of 1000. A professional writing/reviewing community is also very different from Amazon reviews or Facebook likes.

If something's horribly broken, you could probably detect that in a trial run, but the real implications will only surface years if not decades after broad adoption (as was the case with our current publishing system). So rather than carefully test and vet everything before the release of the perfect publishing system, it might be better to just do it and roll with the punches.

My response to the evidence about blog comments be...

2016-07-31T03:15:35.417-07:00

My response to the evidence about blog comments being responsive is that I indeed, as you suspect, get a feeling it's an optimistic selection of examples. I don't believe that in the aggregate blog comments are anywhere near as responsive as responses to reviewers, nor that ACL revisions are anywhere near as responsive as journal revisions. It seems pretty self-evident that the distribution of quality-of-responsiveness-to-reviewers changes a lot as a function of how much skin you've got in the game. The current system puts direct financial incentives on improving the quality of the science. The commercial software cases you cite also put some strong financial incentives on the developers to improve the software, so they aren't comparable to this system.

I'd have to see it demonstrated empirically that there's some other way of getting useful feedback in before I'd invest any time or money into a system like this. Downstream, that's needed for two things. One, getting good quality science. Two, performance evaluations for giving out money and jobs. You can bet that the standards for institutions with money as far as demonstrating that this system is not worse than the current one are going to be a lot more rigorous than my own.

That's my gut reaction to the empirical basis of this.

I don't doubt that the right comments model with moderation could put a fair amount of pressure on authors to make substantive changes to their papers. I just don't have a sense that such a model currently exists. The snippet extraction part sounds like a good idea. Maybe something else that doesn't exist right now would be "conditional likes," the equivalent of an accept-with-revisions. Raters could flag individual comments as conditional likes, meaning that they'll be willing to rate your paper higher if you make the necessary changes. Another thing that isn't present in any commenting system is prompts for specific, rather than general, questions. I get the sense that being asked to comment on a fixed, narrow set of dimensions and enforcing that the authors reply, as in Frontiers reviews, is an order of magnitude better in terms of quality of reviews and quality of responses.

It feels like groping in the dark looking for things that would work. I get the intuition about what might or might not work, and some conflicting intuitions about how well they'd work, but the stakes are high. For none of the cases cited here do I have a large set of useful, carefully collected data to check these intuitions against. Is there some funding opportunity that would fund empirical research into the dynamics of networks like this - studying peer review, software development, online comment and reputation systems - so that we can have something to go on? I've had my eyes open for such a thing (as a side project to build a useful review/reputation system for journals and publishing venues).

But that's where your main assertion comes in:...

2016-07-30T18:08:35.086-07:00

But that's where your main assertion comes in: bad papers cannot be salvaged, readers won't give a paper a second chance, so the likes/dislikes system has no bite. But that is operating with the assumptions of a world where papers are regarded as finished products (in contrast to research projects, which are always in flux). In the system I sketch, papers don't need to be monolithic, they can keep evolving, and the readership understands that (because it's an ideal readership that has no preconceptions about what academic writing should look like).

The best analogy is actually software products, in particular video games. Games can get good reviews, but with major caveats like "don't buy now, still too buggy" or "muddy textures, developer promises HD texture pack for next update". And developers actually patch and upgrade their games after the first release because there is a certain grace period of good will in which you get to fix the problems before buyers lose interest and move on. Each patch clearly indicates what issues got fixed, and the gaming community adjusts its opinions accordingly. If the game is just too broken (Assassin's Creed Unity) or the patches arrive way too late (Batman Arkham Knight), then there's no way of turning things around and you do have a major commercial failure on your hands. But that's the equivalent of papers that are so bad they cannot be salvaged without starting from scratch, and we've all seen our fair share of those.

There's also cases of developers releasing patches years after release to drum up publicity for the soon-to-be-released sequel. Or some indie developers just love their fans so much they give them an anniversary present in the form of a few bonus levels. This has all happened. The point being: the publishing system I sketch has a completely different dynamic, it does not follow the logic of incentives that is inherent to our system.

[Part 1 of comment] Maybe I'm too idealistic, ...

2016-07-30T18:06:28.875-07:00

[Part 1 of comment]
Maybe I'm too idealistic, but that strikes me as a very cynical view of how academics write papers. I'll first list a few personal experiences, and then address the more systemic point.

So let's look at the examples. For computational linguistics conferences, you submit full papers and get feedback from the reviewers. Nobody has ever checked whether I integrated that feedback into the published paper, nor do I have to submit a changelog of the revisions. So I could be a lazy sod and just disregard the criticism. But I don't do that, because I want my paper to be as good as possible. It has my name on it, after all.

Similarly, it happens that authors have journal papers accepted for publication with no revisions required. In my case, the editors still urged me to take the reviewers' comments into consideration, but I didn't have to. The reviews themselves said the paper is fine as is. But I still put a lot of effort into revisions, because I could see that the suggested changes would improve the paper.

I also disagree with your claim about blogs. People do correct their blog posts. You'll often find posts where some parts are crossed out, followed by a clarifying remark why the original statement is wrong. And the reason is obvious: somebody pointed out the mistake in the comments, and if the blogger doesn't address that, their readership will stop trusting them and move on to a different blog. Not revising your papers is a good way of building up a bad reputation, with fewer and fewer people willing to read (and review!) your papers.

And this takes us to the systemic point: you absolutely want to revise your paper in the system I sketch because 1) reviews are valuable, and 2) the reviews are available online for everyone to see. And in such an ecosystem, you can bet that there will be a way to automatically link papers to their reviews and extract the juiciest snippets to give readers an immediate impression of a paper's overall quality. In combination with the (dis)likes system, this means that a bad paper simply won't get any attention. And if you're known for not giving a damn about reviews, nobody will write any reviews for you and you'll never get the prestigious signatures from reviewing networks.

In principle, double blind is easy, just don't...

2016-07-30T17:28:22.723-07:00

In principle, double blind is easy: just don't put a name in your paper, don't attach a signature to the torrent, and share the magnet link anonymously. Or use a pseudonym with a separate key, as people do all the time online. There are two shortcomings: if you're the only one seeding that anonymous paper, you're very likely to be the author. More importantly, though, if that paper turns out to be a huge success, somebody else can claim to have authored it if you didn't sign it. Or they can claim to be the real person behind your pseudonym.

Both can be prevented by setting up proxies: universities or reviewing blogs provide a service where you can register the anonymous paper before publication, and they will sign it with their key and seed the torrent. If you want to reveal your identity later on, they vouch for you as the true author. Crucially, this can only happen before publication, otherwise anybody could register your paper with them once it is out in the wild.

I'm sure one could set up even more sophisticated methods like attaching a special signature to the paper that can only be read by somebody with your private key, which only you can have. So there's all kinds of ways to safeguard authorship while allowing for anonymity.
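One simple way to get this effect can be sketched with a hash commitment. Note that this is a stand-in technique, not the key-based signature described above: the author publishes a digest alongside the anonymous paper and later reveals a secret nonce to prove authorship. The function names and the exact layout of the hashed data are assumptions made for the example.

```python
import hashlib
import secrets


def _digest(author, paper, nonce):
    """Hash the nonce, the paper's own hash, and the author name together."""
    paper_hash = hashlib.sha256(paper).digest()
    return hashlib.sha256(nonce + paper_hash + author.encode("utf-8")).hexdigest()


def commit(author, paper):
    """Create a commitment to publish with the anonymous paper.

    Returns (commitment, nonce). The commitment is public; the nonce stays
    secret until the author wants to reveal their identity.
    """
    nonce = secrets.token_bytes(32)  # random, so the author name cannot be guessed by brute force
    return _digest(author, paper, nonce), nonce


def verify(commitment, author, paper, nonce):
    """Check a claimed authorship against the published commitment."""
    return secrets.compare_digest(commitment, _digest(author, paper, nonce))


paper = b"An anonymous manuscript"
commitment, nonce = commit("Alice Example", paper)

# Later, Alice reveals her name and the nonce; anyone can check the claim.
print(verify(commitment, "Alice Example", paper, nonce))
# An impostor without the nonce that matches their own name cannot pass the check.
print(verify(commitment, "Mallory Impostor", paper, nonce))
```

The commitment only helps if it is published together with the paper from the start, which is the same timing constraint as for the proxy registration: it has to happen before the paper is out in the wild.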

I feel like there's something missing from the...

2016-07-30T15:25:01.538-07:00

I feel like there's something missing from the deep review system, and the comment I'm writing now is a case in point. There's no incentive to actually change the paper in response to the reviews. There's every incentive not to - thinking is hard, and doing the ensuing work takes time. In the current system, in contrast, you (1) don't get published, and then (2) don't get jobs/funding if you don't appropriately respond to reviewers (which ideally amounts to "improve the paper").

Thought experiment: what kind of response are you inclined to give to this comment? My impression is that everyone's default response to blog comments is to give a reply that ranges between thoughtless rage and sort-of-thought-out deflections. The chances are pretty low that someone turns around and says, "Ah, I see your point. I take back point 2 completely, and replace it with this new, completely re-thought solution." Once you've thought through and written the damn thing, it's over with, and you have no reason to go back and think through and write any part of it again.

The only potential incentive system here is maybe the paper's popularity, mediated maybe by its likes, and the likes themselves. I'm skeptical as to whether that's enough to make people improve their papers, both because I just don't think it's enough and because I have a feeling the likes and/or the popularity are not going to be responsive to the revision. If the first version was bad, no one's going to go back and read the revision, I'd wager.

it might be the case that I missed this in my read...

2016-07-29T08:48:13.837-07:00

It might be the case that I missed this in my reading, but is there any space for double blind reviews in this system? Nothing's 100% effective, but double blind is a good way to mitigate some worries like subconscious or conscious gender biases at the review stage. The system proposed seems to have more points where biases like that could have an effect.
