Wednesday, July 27, 2016

Scientific Publishing in a Modern World: A Thought Experiment

Norbert and regular readers of this prestigious blog may have seen me participate in some discussions about open access publishing, e.g. in the wake of the Lingua exodus or after Norbert's link to that article purportedly listing a number of arguments in favor of traditional publishers. One thing that I find frustrating about this debate is that pretty much everybody who participates in it frames the issue as how the current publishing model can be reconciled with open access. That is a very limiting perspective, in my opinion. It reminds me of how every company that has approached free/libre and open source software (aka FLOSS) with the mindset of a proprietary business model has failed in that domain or is currently failing (look at what happened to OpenOffice and MySQL after Oracle took control of the projects). In that spirit, I'd like to conduct a thought experiment: what would academic publishing look like if it didn't have decades of institutional cruft to carry around? Basically, if academic publishing hadn't existed until a few years ago, what kind of system would a bunch of technically minded academics be hacking away on?


 As you might have already gleaned from the intro, this post is rather tech-heavy. That said, I'll try my best to keep things accessible for people who have made the much more reasonable decision of not spending many hours a week on Linux Today, HowtoForge, the Arch wiki, or the website of the Electronic Frontier Foundation. Putting aside technical matters, the publishing model we would see in my hypothetical scenario has three fundamental properties:
  1. Individualistic: Every scientist is a publisher. Every scientist is a reviewer. Every scientist is an editor. Instead of an infrastructure that locks scientists into specific roles with lots of institutional oversight, scientists directly share papers, review them, and curate them.
  2. Crowd-sourced: By making every scientist an active part of the publishing model, you open up the way for a 100% crowd-sourced publishing and archiving infrastructure. All the issues that are usually invoked to motivate the current model --- administrative overhead, copy-editing, hosting costs --- are taken care of collectively by the community.
  3. Fully open: The current system involves lots of steps that are hidden away from the community at large. Authors do not share the production pipeline that produced the paper (software, source code, editable figures, raw data), only the final product. Publishers do not share the tools they use for administration, hosting, and editing. Reviewers share their evaluations only with editors, not with the community. The hypothetical system makes all of this available to the community, which can learn from it, critically evaluate it, and improve it as needed.
To keep things concrete, I will (try to) explain how the system works from the perspective of two participants: Joe, who's an average guy and just wants to get his results out there, and Ned, the tech nerd. We'll go through the following steps: writing, distribution, reviewing.

Writing

Joe and Ned do not use the same tools in their writing, but both use systems that separate content from presentation. This means that the source format they write in allows for many different output files to be produced, depending on purpose: pdfs for printing and presentations, epub for e-readers, html for online publishing.

Joe likes things to be simple, so he chose the Markdown dialect used by pandoc. Markdown allows Joe to throw out his word processor and replace it with a text editor of his choice. As a Windows user, Joe eventually wound up with Notepad++ and its pandoc plugin. Joe is happy that he didn't have to pay for Notepad++ (he's sick of spending money on the newest version of MS Office every few years). Notepad++ also loads much faster than any word processor on his aging laptop. Learning Markdown was easy for Joe, as its syntax turns out to be very close to his digital note-taking habits anyway. With the extensions of the pandoc format, he can do almost everything he needs: headings, font formatting (bold, italic), paragraphs, lists, footnotes, figures, tables, links, all the basics are there. Semantic formulas could be easier to write, but it works. Trees are automatically produced from labeled bracketings, although the syntax for adding movement arrows took some getting used to. And for glossed examples he had to adapt an old trick from his word processor and use invisible tables to properly align all the words. But Joe is confident that once more linguists use pandoc, these kinks will be quickly ironed out.
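
To make the "one source, many outputs" idea concrete, here is a minimal sketch of what Joe's build step could look like: a small Python script that calls pandoc once per output format. This is only an illustration; it assumes pandoc (plus a LaTeX engine for the pdf output) is installed, and the file names are invented.

    # build.py: a sketch of Joe's hypothetical build step, turning one
    # Pandoc Markdown source into several output formats.
    # Assumes pandoc is on the PATH; pdf output additionally needs a LaTeX engine.
    import subprocess

    SOURCE = "paper.md"                                  # the plain-text source Joe edits
    OUTPUTS = ["paper.pdf", "paper.epub", "paper.html"]  # print, e-reader, web

    for target in OUTPUTS:
        # pandoc infers the output format from the file extension;
        # --standalone produces a complete, self-contained document.
        subprocess.run(["pandoc", SOURCE, "--standalone", "-o", target], check=True)
        print("built", target)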

In contrast to Joe, Ned believes that a steeper learning curve and a heavier time investment are often worth it in the long run. He has put many hours into learning LaTeX, in particular TikZ, with the major payoff that he now has perfect control over his writing. With TikZ he can handle even the most complicated trees and figures, typesetting semantic formulas is pure joy, glosses work effortlessly, and he can write mathematical formulas that would make any proprietary typesetting software sweat. Ned has to pay special attention to portability, though, since some LaTeX tricks do not carry over well from pdf to HTML and epub. Recently, Ned has also discovered that he can swap in LuaTeX as his engine, which allows him to extend LaTeX with Lua scripts. Overall, Ned is confident that his time investment has paid off and that LaTeX will still be around for many decades to come, thanks to a large community that has been going strong since the mid 80s.

Both Joe and Ned also love that the source files for their papers are now in plain text and can be read by anybody with a text editor. Compilation to an output format improves readability, but the content of the paper is perfectly clear from the source code itself. This ensures that no special software is needed to open and read these files. Even 50 or 100 years from now, researchers won't have a problem reading their papers --- for if civilization has forgotten how to read plain text files, it has forgotten how computers work.
Since everything is plain text, Joe and Ned can also put their papers under version control, e.g. with git. This allows them to record the entire history of a paper, from the first few sentences to the final product. They can also easily sync their papers to an external server --- Ned rolled his own with cgit, Joe went with the more user-friendly GitHub. With easy syncing and detailed version control, it is also very simple for the two of them to write papers collaboratively.
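
As a rough sketch of that routine (and nothing more), the snapshot-and-sync step could be wrapped in a few lines of Python around the standard git commands; the remote name and branch below are assumptions on my part.

    # sync_paper.py: a rough sketch of the commit-and-sync routine described above.
    # Assumes git is installed, the paper lives in a git repository, and a remote
    # (Ned's cgit server or Joe's GitHub repo) is configured as "origin".
    import subprocess

    def commit_and_sync(message, branch="main"):
        """Record the current state of the paper and push it to the remote."""
        subprocess.run(["git", "add", "--all"], check=True)           # stage every change
        subprocess.run(["git", "commit", "-m", message], check=True)  # snapshot with a message
        subprocess.run(["git", "push", "origin", branch], check=True) # sync to the server

    commit_and_sync("Rework section 3 after co-author comments")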

Joe and Ned now happily write their papers using their own workflows. There's no publisher telling them what software or template to use, how long their paper may be, or whether they should use American or British spelling. They each produce the paper that they think conveys their ideas best while being as pleasant to read and look at as possible. They realize that this freedom also means that they cannot rely on somebody else to fix things for them, but with great power comes great responsibility.

Distribution

Both Joe and Ned now have several papers that they want to share with the world. They make them available on their personal websites in a variety of formats, but it occurs to them that this is not a particularly good way of going about things. Sure, by hosting them on their website they clearly signal that they wrote these papers and endorse them in the current form. But their website is not a good venue for promoting their work, nor is it a safe backup. What if their website goes down? What happens after they die? There has to be a better way of doing this.

Joe and Ned briefly consider other options, in particular paper repositories such as Lingbuzz and arXiv. But those have no guarantee of availability either, and they show just how easy it is for Joe and Ned's work to get lost in an uncurated sea of papers. And then there's of course the issue of cost: if a few servers get hammered by thousands of downloads and uploads every hour, whoever has to keep those servers running needs to pay big bucks for hardware and system administration. This makes it hard for volunteers to shoulder the burden, and for-profit repositories cannot be relied on in the long run. It seems that any solution with a single point of failure is no solution at all.

Ned quickly realizes, though, that distributing a paper is not the same as hosting a paper. Taking a hint from pirates and Linux distros, he decides to use peer-to-peer file sharing. He creates a torrent of his paper (all output formats plus the whole version control history) and puts a magnet link to it on his website. He also submits the magnet link to a number of archives. Ned makes sure that his paper is always seeded by running a torrent client on his home router, and he asks everybody who downloads the torrent to keep seeding it. Now Ned can rely on the community to keep his paper available even if his website and all the repositories go offline.
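
For readers who have never looked inside one, a magnet link is nothing mysterious: it is a short URI built around the torrent's info-hash, which the torrent client computes when Ned creates the torrent. The toy snippet below only assembles such a URI; the hash, name, and tracker in it are invented for illustration.

    # magnet_link.py: a toy illustration of the magnet links Ned puts on his website.
    # The info-hash (the SHA-1 of the torrent's "info" dictionary) comes from the
    # torrent client; the values below are placeholders.
    from urllib.parse import quote

    def magnet_uri(info_hash_hex, display_name, tracker=None):
        uri = "magnet:?xt=urn:btih:%s&dn=%s" % (info_hash_hex, quote(display_name))
        if tracker:
            uri += "&tr=" + quote(tracker, safe="")
        return uri

    print(magnet_uri(
        "c12fe1c06bba254a9dc9f519b335aa7c1367a88a",         # placeholder info-hash
        "ned-2016-movement-paper",
        tracker="udp://tracker.example.org:1337/announce",  # hypothetical tracker
    ))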

Because Ned is lucky enough to live in a hypothetical world where ideas succeed based on their merits, other researchers follow Ned in adopting peer-to-peer distribution of their papers. As the number of users grows, so does the number of seeders, and the network becomes more and more resilient to failure. Moreover, paper repositories no longer host pdfs but just magnet links, which take up a few hundred bytes each instead of several megabytes. This reduces bandwidth usage by several orders of magnitude, and the size of the repositories shrinks from terabytes to megabytes. All of a sudden, everybody with a little bit of hard drive space can make a full backup of these repositories or run a mirror. Tech-savvy researchers do so across the field, greatly improving resilience. It is now impossible for a paper repository to disappear --- if one server gets shut down, hundreds of mirrors are ready to take its place. Even if all servers were to magically disappear overnight, many researchers would have local backups on their computers that they could share online to get a new server started. Since the actual distribution of files is completely decoupled from the distribution of magnet links, even a total loss of servers and backups would not mean a loss of papers --- researchers would just have to reshare the magnet links for the torrents, which are still alive and well in the torrent network.

Libraries and professional societies also take notice and start creating dedicated backup torrents that collect entire years or decades of publications under a single magnet link (actually, they create a script to crawl the web and share it with the community, so there is little actual work involved). At first, these much larger torrents (several GB per year) are shared only by professional institutions and power users like Ned with lots of network storage. But as the prices for mass storage keep plummeting, even Joe finds that he can pay his debt to the scientific community by purchasing a NAS with several TB for 200 bucks. The NAS comes with a torrent client built in, so all he has to do is turn it on and seed these backup torrents. Over the years, seeding becomes an expected part of academic life, just like reviewing and administrative duties are in our world --- in contrast to the latter, though, the time commitment is a few minutes per year at most.
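
That "script to crawl the web" could be as unglamorous as the sketch below: fetch a handful of repository index pages and collect every magnet link they contain into one yearly backup list. The URLs are placeholders, and a real harvester would of course need deduplication across mirrors, signature checks, and some politeness towards the servers.

    # harvest_magnets.py: a crude sketch of the yearly harvesting script.
    # The repository URLs are invented; only the general idea matters here.
    import re
    import urllib.request

    REPOSITORY_PAGES = [
        "https://papers.example.org/2016/",
        "https://mirror.example.net/ling/2016/",
    ]
    # btih info-hashes come as 40 hex or 32 base32 characters
    MAGNET_RE = re.compile(r"magnet:\?xt=urn:btih:[0-9A-Za-z]{32,40}[^\s\"'<>]*")

    magnets = set()
    for url in REPOSITORY_PAGES:
        with urllib.request.urlopen(url) as page:
            html = page.read().decode("utf-8", errors="replace")
        magnets.update(MAGNET_RE.findall(html))

    with open("backup-2016.magnets", "w") as out:
        out.write("\n".join(sorted(magnets)))
    print("collected", len(magnets), "magnet links")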

But Ned, eternal tinkerer that he is, still isn't completely happy with the system. Recently some trolls played a prank on him: they downloaded one of his papers, replaced all the figures with 4chan memes, and shared the modified version of the paper as a very similarly named torrent. Many of his colleagues fell for the scam and wrote him dismayed emails about his homophobic agenda. In order to prevent such abuse in the future, Ned decides that torrents need something like a certificate of authenticity. Since Ned already has a PGP key for encrypting email, he decides to sign all his torrents with that key --- torrents that aren't signed with his key clearly aren't his. Again people like the idea, and everybody starts signing their torrents. Professional societies jump on the bandwagon and offer double signing: they countersign, with the society key, any torrent that has already been signed by one of their members. This makes it very easy to design a filter that only accepts torrents signed with one of these society keys. Some nifty programmers also develop a tool that allows other academics to sign torrents they have downloaded and verified for correctness, creating a distributed web of trust in addition to the central verification via professional societies.
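
Ned's certificate of authenticity does not require any new technology. One way it could look in practice, sketched below with the ordinary gpg command-line tool, is a detached, armored signature published next to the magnet link; the key ID and file names are made up for the example.

    # sign_torrent.py: one possible shape for Ned's certificate of authenticity,
    # namely a detached PGP signature over the .torrent file, made with gpg.
    # Key ID and file names are invented.
    import subprocess

    TORRENT = "ned-2016-movement-paper.torrent"
    KEY_ID = "ned@example.org"   # hypothetical PGP key

    # Produces TORRENT + ".asc", which Ned publishes next to the magnet link.
    subprocess.run(
        ["gpg", "--armor", "--detach-sign", "--local-user", KEY_ID, TORRENT],
        check=True,
    )

    # Anybody who has imported Ned's public key can check that the torrent is his.
    subprocess.run(["gpg", "--verify", TORRENT + ".asc", TORRENT], check=True)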

Another thing that irks Ned is that the papers are shared in a distributed manner, while the magnet links are not. If the evil dictator of Tropico wanted to limit access to scientific papers, they could DNS-block all paper repositories. Tech-savvy users would get around a DNS block in minutes, of course, and mirrors can be created faster than they can be blocked. But it still makes it much harder for most researchers to access papers. So Ned looks around a bit and learns about Freenet and Zeronet, which extend the peer-to-peer concept to websites. Ned starts a distributed magnet link repository that can be shared by the community just like any other torrent, without any central servers or centralized DNS records. Now only deep packet inspection could restrict access to these repositories, but since the traffic is encrypted, this isn't possible either. The result is a network that is completely hosted by the scientific community, which guarantees that it is freely accessible around the globe for as long as this community exists.
Joe and Ned now both live in a world where they can easily create and distribute papers. They can rest safe in the knowledge that the burden of paper distribution and archiving is shouldered by the whole community. But while papers are easier to share than ever before, it is also harder than ever before to find good papers. In a sea of thousands of papers, it is hard to separate the wheat from the chaff.

Evaluation and Review

What Joe and Ned's world still lacks is a system for indicating the quality of a paper. That is not the same as a lack of reviews. Reviews are easy, because they do not differ from any other paper: any academic can write a review and distribute it as a torrent. If the torrent isn't signed (and the author didn't include their name in the review), the review is automatically anonymous. But if a researcher wants to know whether a paper is worth an hour of their time, the answer cannot be to spend several hours tracking down reviews and reading them. There must be a way to gauge the quality of a paper within a few seconds, and to easily find more in-depth reviews.

The typical user-based review system of Amazon, Netflix, and co. will not do. It requires complicated software (at least a full LAMP stack), is tied to specific paper repositories, makes backups and mirrors much more complex, and does not work with Freenet or Zeronet. Again we need something that is platform-independent, community-hosted, easy to back up, and built on robust technology that will be around for many years to come.

Note that these are all properties of the paper distribution system in Ned and Joe's world, so the best choice is to integrate the review system directly into the existing infrastructure. We want two levels of review: shallow review, similar to Facebook likes, and deep review, which mirrors modern peer review.

Shallow review amounts to adding up how many people like a paper. In other words, we want to know how much the community trusts a paper, which takes us to a point we already mentioned above: the web of trust. Even good ol' Joe now understands that torrents can be signed to indicate their authenticity via a web of trust, and the same web of trust can be used to indicate the quality of a paper. Instead of just a verification key, researchers have three types of keys: a verification key (to guarantee that they authored the paper), a yea key (for papers they like), and a nay key (for bad papers). After reading a paper, Joe can sign it to indicate its quality, or just do nothing --- e.g. if he feels the paper is neither particularly good nor particularly bad, or he doesn't consider himself qualified to judge, and so on. By tallying up the positive and negative signatures of a paper, one can compute an overall quality score: 87%, 4 out of 5 stars, whatever you want. Readers are free to define their own metric; all they have to do is count the positive and negative signatures and weigh them in some manner. Ned will happily define his own metrics, while Joe goes with a few that have been designed by other people and seem to be well liked in the community. So shallow review is neutral in the sense that it only creates raw data rather than a compound score.
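
To show how little machinery such a reader-defined metric needs, here is a toy example. The signature records are invented, and the particular weighting (society-countersigned votes count three times as much) is just one arbitrary choice among many.

    # quality_score.py: a toy reader-defined metric for shallow review.
    # In the real system the records would be harvested from the yea/nay
    # signatures attached to a paper's torrent; these are made up.
    signatures = [
        {"signer": "joe@example.org",   "vote": "yea", "society_signed": False},
        {"signer": "ned@example.org",   "vote": "yea", "society_signed": True},
        {"signer": "troll@example.org", "vote": "nay", "society_signed": False},
    ]

    def score(sigs, society_weight=3.0):
        """One arbitrary metric: society-countersigned votes count extra."""
        total = weight_sum = 0.0
        for sig in sigs:
            w = society_weight if sig["society_signed"] else 1.0
            total += w if sig["vote"] == "yea" else -w
            weight_sum += w
        # map the result from [-1, 1] to a percentage; stars would work just as well
        return 50 * (1 + total / weight_sum) if weight_sum else None

    print("%.0f%% positive" % score(signatures))   # Joe's metric; Ned rolls his own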

Having three keys instead of one slightly complicates things since one now has to distinguish between keys signing for authenticity and keys signing for quality. But that is something for people even more technically minded than Ned to figure out. Once a standard has been established, the torrent clients for end-users like Joe just need an option to indicate what key should be used for signing. Joe picks the right option in the GUI and thereby casts his digital vote. Paper repositories are updated at fixed intervals with the web-of-trust scores for each torrent. Some dedicated services will be started by the community or commercial enterprises to analyze the readily accessible raw data and convert it into something more insightful, for instance what kind of researchers like your papers. With a minor extension of the infrastructure, Joe and Ned now enjoy a system where each paper has quality data attached to it that can be converted into whatever metric is most useful to authors, readers, universities, and so on.

But a score still isn't exactly a good way of assessing or promoting a paper. Nobody goes through a paper repository to look for the most recent papers with 4+ stars. And that one paper has many more likes than another says little about their relative scientific merit. Ned is worried about this, but before he even has time to come up with a solution, somebody else does it for him: many researchers are already blogging, and many of them use their blogs to review and promote papers. This is the seed for a much more elaborate and professional system for deep review.

In order to attract an audience, blogs need to specialize. Over time, then, some of them start to focus on paper reviews, garnering them a devout following. The blogger writes a review of a paper, taking great care to indicate the reviewed version in that paper's version control history (we'll see in a second why that matters). He or she also signs the paper with a positive or negative key, depending on the overall evaluation. If the blogger is well known, metrics that turn signatures into scores may take this into account and assign this signature more weight than others, which we might consider a reflection of the blogger's impact factor. Some tools will also be able to pick out the signatures of prominent bloggers and do a reverse search to automatically load the review of the paper. Other tools keep track of the signatures on torrents and notify an author when a signature by a prominent reviewer has been added to one of their papers. The author can then take this review into account, revise the paper, and incorporate the new version into the torrent (that's why it matters that reviews link to specific commits in the version history). The reviewer, in turn, can revise their review if they're so inclined. Since blog posts are produced like any other paper, they too can be put under version control and distributed in a peer-to-peer fashion. This creates a reviewing ecosystem where reviewers and authors can interact in a dynamic fashion, all changes and modifications are preserved for history, both papers and reviews are readily accessible to the community, and readers can define their own metrics to determine whether a paper is worth their time based on who signed it.
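
The glue that makes this dynamic possible is tiny: a review only has to record which commit of the paper it evaluated and which quality key it was signed with, so that tools can match reviews, revisions, and signatures automatically. A made-up example of such a record:

    # review_record.py: a sketch of the metadata a reviewing blog might attach to a
    # review so that tools can tie it to the exact version of the paper it discusses.
    # Every field value here is invented for illustration.
    review = {
        "paper_magnet":    "magnet:?xt=urn:btih:c12fe1c06bba254a9dc9f519b335aa7c1367a88a",
        "reviewed_commit": "7f3a2c9",   # git commit of the version the reviewer read
        "verdict":         "nay",       # which quality key the review was signed with
        "review_magnet":   "magnet:?xt=urn:btih:0a1b2c3d4e5f60718293a4b5c6d7e8f901234567",
        "reviewer":        "reviews.example-network.org",
    }
    # An author-side tool can watch for new signatures on their torrents, pull up the
    # linked review, and later publish a revised version as a new commit; the reviewer
    # can then update their verdict for that newer commit.
    print("review of commit %s: %s" % (review["reviewed_commit"], review["verdict"]))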

But things do not stop here. To improve their coverage and grow their audience, some of the blogs join forces and form reviewing networks, which they run under flowery names such as Linguistic Inquiry and Natural Language and Linguistic Theory. These review networks are very prestigious entities, so their signatures can affect scores a lot. Since they have a large readership, they are also essential in getting many people interested in your paper. Many reviewing blogs also seek out experts in the community for reviews, who can remain anonymous if they want. Reviewers are paid by having their signing key signed by the reviewing blog. Since those are prestigious keys, metrics can once again decide to give more weight to the signature of an academic whose key is signed by a reviewing network. Consequently, many academics are eager to review for these blogs in order to increase their own prestige and the influence they wield over the reception of other papers. Without any intervention by Ned, a reviewing system has naturally developed that is incredibly detailed and comprehensive while being easy enough for Joe to participate in.

Wrapping Up

So there you have it, my pipe dream of what publishing would look like if it weren't tied down by existing conventions and instead were designed from the ground up with modern technology in mind. I skipped some points that I consider minor:
  • copy editing: authors should proofread carefully or hire a copy editor if they absolutely want one; dedicated community members can suggest paper revisions via forks and pull requests on git repositories (one of the many advantages of plain text + version control)
  • DOIs: a bad solution since they're centralized, and not really needed if you have magnet links; but if you absolutely want to, you can register each magnet link under a DOI
  • page numbers: make no sense for HTML and epub, so switch to paragraph numbering
  • aggregators: just like blog aggregators, these provide meta-reviews and promote the best entries from the reviewing blogs
  • conference proceedings: a dedicated signing key for validation and review + a separate blog, which may also just be a category on some bigger blog
And then there are also some systemic issues that are hard to fix, in particular the Matthew effect (the rich get richer, the poor get poorer) and ratings hysteria. However, those issues also exist in the current system. Eradicating them is hopeless, I think, though one would like to have some mitigating strategies in place. The fact that signatures are metric-neutral is helpful, as every reader can define their own metrics, and professional organizations can give recommendations on what a good metric should look like. If you have any other suggestions, or you feel the need to burst my naive tech bubble, the comments section is all yours.

13 comments:

  1. This comment has been removed by the author.

  2. It might be the case that I missed this in my reading, but is there any space for double-blind reviews in this system? Nothing's 100% effective, but double-blind review is a good way to mitigate some worries like subconscious or conscious gender biases at the review stage. The system proposed seems to have more points where biases like that could have an effect

    Replies
    1. In principle, double-blind is easy: just don't put a name in your paper, don't attach a signature to the torrent, and share the magnet link anonymously. Or use a pseudonym with a separate key, as people do all the time online. There are two shortcomings: if you're the only one seeding that anonymous paper, you're very likely to be the author. More importantly, though, if that paper turns out to be a huge success, somebody else can claim to have authored it if you didn't sign it. Or they can claim to be the real person behind your pseudonym.

      Both can be prevented by setting up proxies: universities or reviewing blogs provide a service where you can register the anonymous paper before publication, and they will sign it with their key and seed the torrent. If you want to reveal your identity later on, they vouch for you as the true author. Crucially, this can only happen before publication, otherwise anybody could register your paper with them once it is out in the wild.

      I'm sure one could set up even more sophisticated methods like attaching a special signature to the paper that can only be read by somebody with your private key, which only you can have. So there's all kinds of ways to safeguard authorship while allowing for anonymity.

    2. I'm not sure that any of these options are good answers to Brooke's question about double-blind review. The main problem, I think, is that it's not at all clear when would be the appropriate time to reveal that you were the author of a paper. In the current system, there's a very clear time when this happens. And after this point, once the paper has been accepted, there's not as much harm that can be done by (sub)conscious gender/race/etc bias. You have a line that you can put on your CV that contains a prestigious-sounding name like Linguistic Inquiry.

      On the other hand, in the system you sketch, there's not a clear point at which a paper has been accepted/published. Say reviewing network X reviews the paper and gives you a good review and a "like". This boosts your paper's rating because likes from reviewing networks are rated highly. Wanting to get credit for this paper, you decide to reveal your authorship. After this, another reviewing network Y reviews your paper and because of (sub)conscious bias, they give you a negative rating and a "dislike". Say this then happens again with network Z.

      Your paper is now rated very poorly because the weightings from the negative reviews of networks Y and Z greatly outweigh the one positive rating from network X.

      I think it's important to think of a good way to address this issue since this really seems to still be a problem (e.g., here). One off-the-cuff thought is that perhaps a restriction could be implemented such that there could only be one review from one reviewing network in any given field. Off the top of my head, I'm not exactly sure how you would implement this technically, but perhaps it could be something that is just generally agreed upon by everyone. Or someone smarter than me could think of a way to implement it technically. :p

      But this would then give you a clear point at which it makes sense to reveal your authorship of a paper.

  3. I feel like there's something missing from the deep review system, and the comment I'm writing now is a case in point. There's no incentive to actually change the paper in response to the reviews. There's every incentive not to - thinking is hard, and doing the ensuing work takes time. In the current system, in contrast, you (1) don't get published, and then (2) don't get jobs/funding if you don't appropriately respond to reviewers (which ideally amounts to "improve the paper").

    Thought experiment: what kind of response are you inclined to give to this comment? My impression is that everyone's default response to blog comments is to give a reply that ranges between thoughtless rage and sort-of-thought-out deflections. The chances are pretty low that someone turns around and says, "Ah, I see your point. I take back point 2 completely, and replace it with this new, completely re-thought solution." Once you've thought through and written the damn thing, it's over with, and you have no reason to go back and think through and write any part of it again.

    The only potential incentive system here is maybe the paper's popularity, mediated maybe by its likes, and the likes themselves. I'm skeptical as to whether that's enough to make people improve their papers, both because I just don't think it's enough and because I have a feeling the likes and/or the popularity are not going to be responsive to the revision. If the first version was bad, no one's going to go back and read the revision I'd wager.

    Replies
    1. [Part 1 of comment]
      Maybe I'm too idealistic, but that strikes me as a very cynical view of how academics write papers. I'll first list a few personal experiences, and then address the more systemic point.

      So let's look at the examples. For computational linguistics conferences, you submit full papers and get feedback from the reviewers. Nobody has ever checked whether I integrated that feedback into the published paper, nor do I have to submit a changelog of the revisions. So I could be a lazy sod and just disregard the criticism. But I don't do that, because I want my paper to be as good as possible. It has my name on it, after all.

      Similarly, it happens that authors have journal papers accepted for publication with no revisions required. In my case, the editors still urged me to take the reviewers' comments into consideration, but I didn't have to. The reviews themselves said the paper is fine as is. But I still put a lot of effort into revisions, because I could see that the suggested changes would improve the paper.

      I also disagree with your claim about blogs. People do correct their blog posts. You'll often find posts where some parts are crossed out, followed by a clarifying remark why the original statement is wrong. And the reason is obvious: somebody pointed out the mistake in the comments, and if the blogger doesn't address that, their readership will stop trusting them and move on to a different blog. Not revising your papers is a good way of building up a bad reputation, with fewer and fewer people willing to read (and review!) your papers.

      And this takes us to the systemic point: you absolutely want to revise your paper in the system I sketch because 1) reviews are valuable, and 2) the reviews are available online for everyone to see. And in such an ecosystem, you can bet that there will be a way to automatically link papers to their reviews and extract the juiciest snippets to give readers an immediate impression of a paper's overall quality. In combination with the (dis)likes system, this means that a bad paper simply won't get any attention. And if you're known for not giving a damn about reviews, nobody will write any reviews for you and you'll never get the prestigious signatures from reviewing networks.

    2. But that's where your main assertion comes in: bad papers cannot be salvaged, readers won't give a paper a second chance, so the likes/dislikes system has no bite. But that is operating with the assumptions of a world where papers are regarded as finished products (in contrast to research projects, which are always in flux). In the system I sketch, papers don't need to be monolithic, they can keep evolving, and the readership understands that (because it's an ideal readership that has no preconceptions about what academic writing should look like).

      The best analogy is actually software products, in particular video games. Games can get good reviews, but with major caveats like "don't buy now, still too buggy" or "muddy textures, developer promises HD texture pack for next update". And developers actually patch and upgrade their games after the first release because there is a certain grace period of good will in which you get to fix the problems before buyers lose interest and move on. Each patch clearly indicates what issues got fixed, and the gaming community adjusts its opinions accordingly. If the game is just too broken (Assassin's Creed Unity) or the patches arrive way too late (Batman Arkham Knight), then there's no way of turning things around and you do have a major commercial failure on your hands. But that's the equivalent of papers that are so bad they cannot be salvaged without starting from scratch, and we've all seen our fair share of those.

      There's also cases of developers releasing patches years after release to drum up publicity for the soon-to-be-released sequel. Or some indie developers just love their fans so much they give them an anniversary present in the form of a few bonus levels. This has all happened. The point being: the publishing system I sketch has a completely different dynamic, it does not follow the logic of incentives that is inherent to our system.

    3. My response to the evidence about blog comments being responsive is that I indeed, as you suspect, get a feeling it's optimistic selection of examples. I don't believe that in the aggregate blog comments are anywhere near as responsive as responses to reviewers, nor that ACL revisions are anywhere near as responsive as journal revisions. It seems pretty self-evident that the distribution of quality-of-responsiveness-to-reviewers changes a lot as a function of how much skin you've got in the game. The current system puts direct financial incentives on improving the quality of the science. The commercial software cases you cite also put some strong financial incentives on the developers to improve the software, so they aren't comparable to this system.

      I'd have to see it demonstrated empirically that there's some other way of getting useful feedback in before I'd invest any time or money into a system like this. Downstream, that's needed for two things. One, getting good quality science. Two, performance evaluations for giving out money and jobs. You can bet that the standards for institutions with money as far as demonstrating that this system is not worse than the current one are going to be a lot more rigorous than my own.

      That's my gut reaction to the empirical basis of this.

      I don't doubt that the right comments model with moderation could put a fair amount of pressure on authors to make substantive changes to their papers. I just don't have a sense that such a model currently exists. The snippet extraction part sounds like a good idea. Maybe something else that doesn't exist right now would be "conditional likes," the equivalent of an accept-with-revisions. Raters could flag individual comments as conditional likes, meaning that they'll be willing to rate your paper higher if you make the necessary changes. Another thing that isn't present in any commenting system is prompts for specific, rather than general, questions. I get the sense that being asked to comment on a fixed, narrow set of dimensions and enforcing that the authors reply, as in Frontiers reviews, is an order of magnitude better in terms of quality of reviews and quality of responses.

      It feels like groping in the dark looking for things that would work. I have intuitions about what might or might not work, and some conflicting intuitions about how well they'd work, but the stakes are high. On none of the cases cited here do I have a large set of useful, carefully collected data that I could check these intuitions against. Is there some funding opportunity that would fund empirical research into the dynamics of networks like this - studying peer review, software development, online comment and reputation systems - so that we can have something to go on? I've had my eyes open for such a thing (as a side project to build a useful review/reputation system for journals and publishing venues).

    4. I think we worry about slightly different things here. My main goal was to address the technical side of publishing, because there is this common narrative that publishing must be centralized, inherently costly, and so on. It would be nice to see people dream a little bigger than "how do we add open access to the status quo?".

      How you actually implement such a system in practice is a different matter, though the two are intertwined, of course. I'd say in practice it will be a Herculean task just to get scientists to author in plain text and use version control. No matter how easy you make the tools, people hate changing their ways.

      For the reviewing, I still believe that it isn't too difficult to create incentives. All you need is to attach a score to authors that reflects their reviewing diligence. Suppose you submit a paper, which is reviewed by X, Y, and Z. After a grace period of two months, they look at the latest version of the paper, and while X is happy that you addressed his/her concerns, Y and Z are annoyed because you completely disregarded everything they said. So X gives you a good review score and writes a few nice words, Y and Z give you low scores and leave some sternly worded remarks. The next time you submit a paper, reviewers can see your less than stellar score, and some may simply decide not to review the paper because of your bad revising track record. Fewer reviewers means fewer signatures from prestigious reviewing networks, which means fewer readers and less valuable papers. So that's a strong incentive: make careful revisions (and document what you did in the version control system), or you're jeopardizing the success of future papers.

      Of course that system also needs some checks and balances: you don't want some crazed reviewer to turn you into a pariah. So reviews and reviewers should also be rated by you, the community, and other people that reviewed the paper, and this may reduce or increase their weight.

      I'm sure there's many other things one could try; all the things you suggest sound very reasonable. You are right that at this point this is groping in the dark, but I don't think you'll be able to learn much from experimental studies either. Distributed systems with many checks and balances are inherently dynamic, and they require a certain level of maturity on the part of the community. Researchers have to learn how to interact in this new system, and the dynamics may be very different in a community of 100 and one of 1000. A professional writing/reviewing community is also very different from Amazon reviews or Facebook likes.

      If something's horribly broken, you could probably detect that in a trial run, but the real implications will only surface years if not decades after broad adoption (as was the case with our current publishing system). So rather than carefully test and vet everything before the release of the perfect publishing system, it might be better to just do it and roll with the punches.

    5. This comment has been removed by the author.

    6. I think there must be enough (observational) data available from existing networks + existing sets of experience with peer review. I don't know how to synthesize it, but the sheer number of intuitively plausible ideas about how to attack these problems, and the number of details that spring to mind about the reviewing system that would "of course" need to be implemented, just within the space of a few comments, plus the fact that we don't even really have a proper negative spec sheet (for specific problems with the current system to be avoided) - well, that's not enough to absolutely demand careful empirical work first, I agree. What does lead me to think that the initial product better be pretty damn good is that it will be hard to get buy-in in the first place.

    7. Actual roll-out would require a slow, incremental phase-in that starts out as an extension of the current system and keeps growing in functionality until it is the dominant force in the market, at which point backwards compatibility to the old model can be slowly phased out over 10 or 20 years.

      So I would start with torrents as an additional method of distribution for open access articles. Pretty much anybody should like that because it distributes network load, which reduces hosting cost. Then you add a mechanism for signing torrents, and an option for authors to include the source files in the torrent. Then you expand that to include full version history. Then you add multiple signing (publisher + author). And so on. Basically, put the distribution system in place, and once that is ubiquitous you move on to evaluation/reviewing.

      There I would start by building up the importance of blogs and pushing for a more flexible system for computing metrics --- everybody knows that impact factor and citation counts are pretty crude, so if you can offer more information publishers and university admins would like that, too.

      It really has to be done in this kind of incremental fashion, sort of like open source development: release early, release often. Proprietary development style --- work in secret until you have a finished product to push out --- will not work well, I think.

    8. @Ewan: What sort of observational data do you think would be useful? I'm having a hard time imagining how you could collect such data. In order to do so, it seems like you would already have to have some amount of buy-in from institutions for the system you wanted to collect data about so that you could have some people actually using the system and then see how it works for the two things you mentioned above: "getting good quality science" and "performance evaluations for giving out money and jobs".

      I don't see that as likely. Unless you have in mind some other way of setting up a small set of users to test run the system. But then I think the results would be biased toward showing the new system to be abysmal because universities/institutions wouldn't have yet bought into the system and so it wouldn't work well for the "performance evaluations for giving out money and jobs" part.

      I think Thomas is right about this that in order to have some sort of paradigm shift, we need slow change that incrementally builds on how things currently are.

      @Thomas: Thanks for this post! I generally like the system you sketch, though I'd like to see a better thought-out solution to the problem of double-blind review that Brooke raised.

      And I also think some of these things will be really hard to work toward. As you said, moving towards plain-text authored manuscripts and version control is a really high bar to begin with. Key signing stuff would definitely need a GUI.

      I do think some of the other things could be slowly implemented as you said. One other possible place to start, too, is with Lingbuzz. Sometime sort of recently, there were some rumblings about trying to do something since the Lingbuzz servers always seem to be down. Perhaps that could be a jumping off point, especially since Lingbuzz already has some sort of versioning built into it. (Though these rumblings have died down for various reasons.)

      If anyone is interested in trying to take some concrete steps towards something like what Thomas sketched (or something else), I'd be really interested in collaborating/helping out. I think this is a huge problem that really needs to be addressed.
