Real science data is not natural. It is artificial. It is rarely encountered in the wild and (as Nancy Cartwright has emphasized (see here for discussion)) it standardly takes a lot of careful work to create the conditions in which the facts are observable. The idea that science proceeds by looking carefully at the natural world is deeply misleading, unless, of course, the world you inhabit happens to be CERN. I mention this because one of the hallmarks of a progressive research program is that it supports the manufacture of such novel artificial data and their bundling into large-scale “effects,” artifacts which then become the targets of theoretical speculation. Indeed, one measure of how far a science has gotten is the degree to which the data it concerns itself with is factitious; another is the number of well-established effects it has managed to manufacture. Actually, I am tempted to go further: as a general rule only very immature scientific endeavors are based on naturally available/occurring facts.
Why do I mention this? Well, first, by this measure, Generative Grammar (GG) has been a raging success. I have repeatedly pointed to the large number of impressive effects that GG has collected over the last 60 years and the interesting theories that GGers have developed trying to explain them (e.g. here). Island and ECP effects, binding effects and WCO effects do not arise naturally in language use. They need to be constructed, and in this they are like most facts of scientific interest.
Second, one nice way to get a sense of what is happening in a nearby domain is to zero in on the effects its practitioners are addressing. Actually, more pointedly, one quick and dirty way of seeing whether some area is worth spending time on is to canvass the variety and number of different effects it has manufactured. In what follows I would like to discuss one such effect that has recently come to my attention and that holds some interest for a GGer like me.
A recent paper (here) by Jiwon Yun, Zhong Chen, Tim Hunter, John Whitman and John Hale (YCHWH) discusses an interesting processing fact concerning relative clauses (RC) that seems to hold robustly cross-linguistically. The effect is called the “Subject Advantage” (SA). What’s interesting about this effect is that it holds both in languages where the head precedes the relative clause and in languages where it follows it (i.e. in English-type and Japanese-type languages alike). Why is this interesting?
Well, first, this argues against the idea that the SA simply reflects increasing memory load as a function of the linear distance between gap and filler (i.e. head). Linear distance cannot be the relevant variable: though it could account for SA effects in languages like English, where the head precedes the RC (thus making the subject gap closer to the head than the object gap is), in Japanese-style RCs, where the head follows the clause, the object gap is linearly closer to the head than the subject gap is. A linear account therefore predicts an object advantage there, contrary to experimental fact.
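To make the arithmetic of that argument concrete, here is a toy sketch (the schematic glosses and code are mine, not anything from YCHWH): compute the linear distance in words between gap and head for subject and object relatives in a head-initial and a head-final configuration. The prediction of a purely linear account flips across the two language types.

```python
# Toy illustration (not from YCHWH): linear gap-to-head distance in
# schematic subject relatives (SRC) vs. object relatives (ORC).
# The word strings are simplified glosses, not real corpus data.

def gap_to_head_distance(words):
    """Distance in words between the gap site and the head noun."""
    return abs(words.index("GAP") - words.index("HEAD"))

examples = {
    # English-type: the head noun precedes the relative clause
    "English SRC": ["HEAD", "that", "GAP", "attacked", "the", "senator"],
    "English ORC": ["HEAD", "that", "the", "senator", "attacked", "GAP"],
    # Japanese-type: the head noun follows the relative clause (glossed)
    "Japanese SRC": ["GAP", "senator-ACC", "attacked", "HEAD"],
    "Japanese ORC": ["senator-NOM", "GAP", "attacked", "HEAD"],
}

for label, words in examples.items():
    print(label, gap_to_head_distance(words))

# English-type: SRC distance (2) < ORC distance (5) -> subject advantage.
# Japanese-type: SRC distance (3) > ORC distance (2) -> predicts an OBJECT
# advantage, contrary to the experimentally observed subject advantage.
```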
Second, and here let me quote John Hale (p.c.):
SA effects defy explanation in terms of "surprisal". The surprisal idea is that low probability words are harder, in context. But in relative clauses surprisal values from simple phrase structure grammars either predict effort on the wrong word (Hale 2001) or get it completely backwards --- an object advantage, rather than a subject advantage (Levy 2008, page 1164).
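For readers unfamiliar with the term: surprisal, as standardly defined (this gloss is mine, not Hale’s or Levy’s wording), is the negative log probability of a word given the words that precede it, so words the context makes unlikely are predicted to be hard. A minimal sketch:

```python
import math

def surprisal(p_word_given_context):
    """Surprisal in bits: -log2 P(word | preceding words)."""
    return -math.log2(p_word_given_context)

# A word the context makes likely is predicted to be easy; an unlikely
# word is predicted to be hard. (Illustrative probabilities only.)
print(surprisal(0.5))   # 1.0 bit
print(surprisal(0.01))  # ~6.64 bits
```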
Thus, SA effects are interesting in that they appear to be stable over languages as diverse as English on the one hand and Japanese on the other and seem to be refractory to many of the usual processing explanations.
Furthermore, SA effects suggest that grammatical structure is important, or to put this in more provocative terms, that SA effects are structure dependent in some way. Note that this does not imply that SA effects are grammatical effects, only that G structure is implicated in their explanation. In this, SA effects are a little like Island Effects as understood (here). Purely functional stories that ignore G structure (e.g. memory load keyed to linear distance or surprisal-based word-by-word processing difficulty) seem to be insufficient to explain these effects (see YCHWH 117-118).
So how to explain the SA? YCHWH proposes an interesting idea: what makes object relatives harder than subject relatives is that they involve different amounts of “sentence-medial ambiguity” (the former more than the latter), and resolving this ambiguity takes work that is reflected in processing difficulty. Or, put more flatfootedly, finding an object gap requires getting rid of more grammatical ambiguity than finding a subject gap, and getting rid of this ambiguity requires work, which is reflected in processing difficulty. That’s the basic idea. The work is in the details that YCHWH provides. And there are a lot of them. Here are some.
YCHWH defines a notion of “Entropy Reduction” based on the weighted possible continuations available at a given point in a parse (a toy numerical sketch of the calculation follows the quote below). One feature of this is that the model provides a way of specifying how much work parsing involves at a particular point. This contrasts with, for example, a structural measure of memory load. As note 4 observes, such a measure could explain a subject advantage, but as John Hale (p.c.) has pointed out to me concerning this kind of story:
This general account is thus adequate but not very precise. It leaves open, for instance, the question of where exactly greater difficulty should start to accrue during incremental processing.
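To fix ideas, here is a toy numerical sketch of the Entropy Reduction idea as I read it in YCHWH (the numbers and code are mine, purely illustrative): at each word, take the weighted set of grammatical continuations compatible with the sentence so far, compute its Shannon entropy, and charge the word for any drop in that entropy.

```python
import math

def entropy(weights):
    """Shannon entropy (bits) of a weighted set of possible continuations."""
    total = sum(weights)
    probs = [w / total for w in weights]
    return -sum(p * math.log2(p) for p in probs if p > 0)

def entropy_reduction(weights_before, weights_after):
    """Work attributed to a word: the drop in uncertainty, clipped at zero."""
    return max(0.0, entropy(weights_before) - entropy(weights_after))

# Hypothetical weights: before the word, the grammar-plus-corpus model leaves
# four continuations open; after the word, only two survive.
before = [0.4, 0.3, 0.2, 0.1]
after = [0.7, 0.3]
print(entropy_reduction(before, after))  # ~0.97 bits of disambiguation work
```

On this picture, object relatives are harder because, word by word, they force larger reductions of this kind than subject relatives do, and the calculation says exactly at which word the extra work accrues.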
That said, whether to go for the YCHWH account or the less precise structural memory load account is ultimately an empirical matter. One thing that YCHWH suggests is that it should be possible to make the SA effect disappear given the right kind of corpus data. Here’s what I mean.
YCHWH defines entropy reduction by (i) specifying a G for a language that defines the possible G continuations in that language and (ii) assigning probabilistic weights to these continuations. Thus, YCHWH shows how to combine Gs and probabilities of use of these. Parsing, not surprisingly, relies on the details of a particular G and the details of the corpus of usages of those G possibilities. Thus, what options a particular G allows affects how much entropy reduction a given word licenses, as do the details of the corpus used to probabilize the G. This means that it is possible that the SA might disappear given the right corpus details. Or it allows us to ask what, if any, corpus details could wipe out SA effects (a toy sketch of how a corpus sets these weights follows Hunter’s remarks below). This, as Tim Hunter noted (p.c.), raises two possibilities. In his words:
An interesting (I think) question that arises is: what, if any, different patterns of corpus data would wipe out the subject advantage? If the answer were 'none', then that would mean that the grammar itself (i.e. the choice of rules) was the driving force. This is almost certainly not the case. But, at the other extreme, if the answer were 'any corpus data where SRCs are less frequent than ORCs', then one would be forgiven for wondering whether the grammar was doing anything at all, i.e. wondering whether this whole grammar-plus-entropy-reduction song and dance were just a very roundabout way of saying "SRCs are easier because you hear them more often".
One of the nice features of the YCHWH discussion is that it makes it possible to approach this problem analytically. It would be nice to know what the answer is, both analytically and empirically.
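One way to picture the analytical exercise (a sketch under my own assumptions, not YCHWH’s actual code or results): estimate rule weights from corpus counts by relative frequency, then vary those counts and recompute the word-by-word entropy reductions to see which corpora, if any, erase the predicted subject advantage.

```python
from collections import defaultdict

def relative_frequency_weights(rule_counts):
    """Turn corpus counts of grammar rules into probabilities, normalized
    within each left-hand-side category (relative-frequency estimation)."""
    totals = defaultdict(float)
    for (lhs, rhs), count in rule_counts.items():
        totals[lhs] += count
    return {(lhs, rhs): count / totals[lhs]
            for (lhs, rhs), count in rule_counts.items()}

# Hypothetical counts for how often a relative clause continues with a
# subject gap vs. an object gap. Varying these counts and rerunning the
# entropy-reduction calculation is the experiment Hunter describes.
corpus_a = {("RC", "subject-gap"): 80, ("RC", "object-gap"): 20}
corpus_b = {("RC", "subject-gap"): 20, ("RC", "object-gap"): 80}

print(relative_frequency_weights(corpus_a))
print(relative_frequency_weights(corpus_b))

# The grammar fixes which continuations exist at all; the corpus fixes their
# weights. The question is how much of the subject advantage survives as the
# weights are pushed from corpus_a toward corpus_b.
```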
Another one of the nice features of YCHWH is that it demonstrates how to probabilize MGs of the Stabler variety so that one can view parsing as a general kind of information processing problem. In such a context, difficulties in language parsing are the natural result of general information processing demands. Thus, this conception of parsing locates it in a more general framework of information processing, parsing being one specific application where the problem is to determine the possible G-compatible continuations of a sentence. Note that this provides a general model of how G knowledge can get used to perform some task.
Interestingly, on this view, parsing does not require a parser. Why? Because parsing just is information processing when the relevant information is fixed. It’s not like we do language parsing differently than we do, say, visual scene interpretation once we fix the relevant structures being manipulated. In other words, parsing on the YCHWH view is just information processing in the domain of language (i.e. there is nothing special about language processing except the fact that it is Gish structures that are being manipulated). Or, to say this another way, though we have lots of parsing, there is no parser that does it.
YCHWH is a nice example of a happy marriage of grammar and probabilities to explain an interesting parsing effect, the SA. The latter is a discovery about the ease of parsing RCs that suggests that G structure matters and that language independent functional considerations just won’t cut it. It also shows how easy it is to combine MGs with corpora to deliver probabilistic Gs that are plausibly useful in language use. All in all, fun stuff, and very instructive.
 This is one reason why I find admonitions to focus on natural speech as a source of linguistic data to be bad advice in general. There may be exceptions, but as a general rule such data should be treated very gingerly.
 See, for example, the discussion in the paper by Sprouse, Wagers and Phillips.
 A measure of distance based on structure could explain the SA. For example, there are more nodes separating the object trace and the head than separating the subject trace and the head. If memory load were a function of depth of separation, that could account for the SA, at least at the whole-sentence level. However, until someone defines an incremental version of the whole-sentence structural memory load theory, it seems that only Entropy Reduction can account for the word-by-word SA effect across both English-type and Japanese-type languages.
 The following is based on some correspondence with Tim Hunter. Thus he is entirely responsible for whatever falsehoods creep into the discussion here.