Wednesday, January 14, 2015

Science writing

I read this piece recently about good science writing. It distinguishes "good" prose from "clear" prose and advocates for the latter. Why? Because some good writing is good because it is stylish and being stylish can involve opacity and indirectness, and hence make reading more demanding than it should be if effective communication is the aim.

I think that there is something here. One of the things I have noted is that many people like to write papers that are sort of like a good mystery. The problem is outlined and several false paths are traversed to raise the suspense and then at the end all the clues that have been sprinkled throughout the paper are collected and a big bushy rabbit is pulled out of the hat to the delight of all. This can involve really good writing, but it probably does not make for a good science paper. Better to make things less a mystery. Say up front what the point is, what the argument will be, what the data will be and then show how the data supports the conclusion and illuminates the point. In fact, if you have written the paper correctly, after the problem has been presented and the data and argument form outlined, the reader can write the rest of the paper him/herself. So, I agree that forgoing the mystery format is a good idea.

However, I think that this can go too far. Lots of science writing is flat and uninteresting. In fact it is "morphemic" in the 'morpheus' sense of sleep inducing. There is no reason why science writing cannot be clear and not flat.  There is no reason why all one's personality need be wrung from the body of the paper. It's nice to hear a writer's voice.

But even more important, it's nice to know why the author thinks that the paper is worth reading.  Lots of the time, papers skip this part perhaps because it is thought of as too subjective. But it's not. A reader is dedicating time (possibly better spent otherwise) to read the piece. The writer owes it to the reader to explain why this investment is worthwhile, at least in the author's opinion.

Much of this applies to linguistic papers too, I believe. So, clarity yes. But, please tell me why I should care and please tell me where I am going to go and how I am going to go there before I start the journey.

And, if at all possible, one more thing. Don't self-congratulate. There is a bad tendency in linguistic writing to applaud one's own arguments. We see things like "this surprising conclusion…" or "this provides strong evidence for…" These are fine things to say about some other proposal. But it is unseemly (and tendentious) to say it about one's own proposal. It's a little like Turing calling his device a "Turing machine." He didn't. Church did. And what's ok for Church to do would not have been ok for Turing to do. He would have sounded like a putz.


  1. Oh boy, one of my biggest pet peeves, the scientific writing style in linguistics.

    "One of the things I have noted is that many people like to write papers that are sort of like a good mystery."
    That is indeed the part I dislike most about linguistics papers because it is a waste of time and paper. It would never occur to a mathematician to outline all the failed proofs he went through before he found the right one; you get one proof and that's it. You might say it is instructive to see why the previous analyses failed, but then it is still more succinct and transparent to present the final system first and then show that altering any of the parameters causes problems.

    Quite generally a shocking number of linguistics papers are incredibly verbose and unorganized. Most of them are over 30 pages with no clear distinction between problem, data, and analysis, which makes it incredibly hard to quickly find a piece of data or extract the details of the technical machinery. Much worse than definition-theorem-proof in mathematics, proposal-implementation-evaluation in CS, or experiment-data-analysis in psychology.

    Now of course different fields have different requirements, so we shouldn't expect linguistic writing to mimic one of these paradigms. But I still think that many linguistics papers could be shortened significantly while increasing readability. Even simple things like explicitly highlighting definitions rather than lumping them into the prose or numbering them consecutively with example sentences would help.

    There is also a strong tradition of walking the reader through the analysis with lots of handholding, examples, multiple explanations of the same thing. You don't see that a lot in computer science or mathematics where the papers are a lot more formal and challenging. Personally, I would rather spend 2 hours working through a dense 8 page paper than 2 hours being annoyed by a verbose 30 page read.

    1. [cont]
      Speaking of definitions, linguistics differs in this respect from more formal fields too. If you take a paper in mathematical linguistics, for example, it is common practice to include a preliminaries section that defines even the most basic things like string language and the notation for string concatenation, even though these are fairly standard. This is done for several reasons. For one thing it makes the paper self-contained and avoids accidental misunderstandings of the notation. More importantly, perhaps, it makes the paper "history-proof": even if some terminology and notation may have changed 50 years from now, readers in 2065 will still be able to pick up and read through those papers fairly easily. But how many people in 2065 will know what is meant by "strong PIC"? I'd wager very few considering that many students don't even know m-command anymore. So please think of your future readers and include a definition.

      Then there's notation, or rather the lack thereof. If you pick up old math papers, e.g. the one by Frege, you'll find that they do not read too differently from linguistics papers, with a lot of prose, analogies and so on. And even if you already know what they're saying, they're very hard to read because of the lack of suitable notation. Syntacticians still don't have a standard notation for basic things like "object DP moves to Spec,vP and the containing VP remnant moves to Spec,TP". There's trees, but those are more specific since they also present irrelevant information such as other nodes in the tree (and they take up a lot of space and cannot be typeset easily without specialized tools).

      Linguistics papers are also very heavy on foot/endnotes. Not nearly as bad as philosophy or other humanities, but much worse than the sciences. Once again not too surprising considering the history of linguistics, but still something that should be changed. I personally can't think of a single footnote I ever found enlightening. There's also a tradition of what one may call weak reference footnotes, i.e. something like "See X, Y, and Z for ideas on feature checking that may have some bearing on this point". I guess that, too, is a humanities remnant where the scientific discourse is, well, much more of a discourse between authors. In other fields there is a much stronger tradition of only including indispensable citations (e.g. because your paper must not be longer than 4 pages).

      "So, clarity yes. But, please tell me why I should care and please tell me where I am going to go and how I am going to go there before I start the journey."
      That I fully agree with, but a good introduction and conclusion can do that in less than 2 pages. I can think of a fair number of Minimalist papers that start out with a five page conceptual discussion that adds very little to the paper. If I have a choice between that and no contextualization at all, I'd rather pick the latter. For a well-written paper it is still fairly easy to figure out quickly if its results are relevant to me.

    2. One final remark: it would be interesting to see how papers change during the publication process in various fields. I have a hunch that in some fields they become more baroque and littered with side remarks and footnotes (the transition from proceedings paper to journal paper in linguistics, as you have pointed out), while in others they may get stripped down to the bare essentials (for instance, I have to strip down a 25 page paper to 16 pages right now, and while I'm not happy about that, I think the end product may actually be a lot more pleasant for the reader).

    3. Well, it seems this really is a bee in sein bonnet, nicht wahr? I tend to agree with most of what you said except one thing. I personally have found lots that is interesting in footnotes, though I suspect that is because that is one of the few places you can say things that are interesting and that are relatively immune from reviewers' comments.

      Two further things: We do have models for short good papers in ling that are not verbose etc. It's NELS and WCCFL and CLS proceedings. These tend to be quite well focused and are aimed to make one point. I generally find them worth the effort to page ratio.

      Last, there is a little mathematical preening going on, or so it sounds to me. One of the points that Halmos makes is that one writes for an audience. Some of my mathematically inclined colleagues when addressing linguists are more interested in impressing them with their technical chops than with explaining what they have in mind. Nor do I think that this is entirely accidental. So, though short is nice, as Halmos notes, it can get in the way of clarity.

      One really last point: a couple of pages to set the scene. Yes that should be enough. But it's not scene setting that I had in mind really. Too often it is unclear why I should care about a purported result. The author simply assumes that what they are saying is interesting. Maybe, but maybe not. This is no less true in the mathy world that you inhabit than the one that I do. Theorems do not speak for themselves. Or at least their import doesn't.

    4. I agree with pretty much everything you say, in particular the remark about proceedings papers (which unfortunately are becoming increasingly rare because they count very little compared to journal publications nowadays). You're also right that footnotes can include interesting observations; the question is how often they actually do. Many footnotes are only two or three lines long, in which case they can either be dropped or incorporated into the main text.

      Anyways, my point wasn't that mathematicians have it all figured out and linguists should copy their style. Rather, it's always a good idea to look at how other fields are handling these things and see whether any of those strategies would be useful for linguistic writing, too. After all it's no accident that psycholinguistics papers usually follow the format of psychology papers --- it's a good format for that type of work. Syntax is closer to the theoretical sciences and math, but the writing style is still very similar to a humanities paper. I've outlined a few things that imho would be more appropriate, but I'm sure there's multiple remedies to the writing quirks that bug me.

      But since we're already at it, let's talk big ideas: could we please start numbering our paragraphs so that page numbers are no longer needed for references and articles can be distributed in formats that do not preserve pagination? Or dump the article format in toto and move to ever-expanding wikis? I know, not gonna happen, but a man can dream... a man can dream.

    5. I like the paragraph idea. Dream away.

    6. Hi, I'm Meaghan, and I'm new here. And I have strong opinions about science writing too!

      I very much agree with most of your points, Thomas, especially on things like separating out definitions and providing a preliminaries section. There's just one thing: I love me some handholding! While if the choice were between rambling, unclear prose and succinct, dense precision, I'd take the latter any day, my preference is for precision and some limited verbosity: examples, a couple of ways of approaching a definition or claim, etc. When I read a really dense and succinct paper, if for any reason I don't understand something, often that was my only chance, and now I'm sunk, but with more information, more angles, and examples, I have several ways in. If it's done well, it's much quicker than struggling through opaque notation and dense argumentation on my own.

      Oh yes, and I'm with Norbert: I love footnotes! Sometimes that's the only interesting part of a paper.

  2. One of the best pieces I ever read about how to write was a 1973 essay by Paul Halmos titled "How to Write Mathematics." Despite the title, it's really about good writing no matter the field. There is an overview here but it really doesn't do justice to the essay. The first several pages especially are just amazing.