Tuesday, September 15, 2015

Judgments and grammars

A native speaker judges ‘The woman loves himself’ to be odd sounding. I explain this by saying that the structure underlying this string is ungrammatical, specifically that it violates principle A of the Binding Theory. How does what I say explain this judgment? It explains it if we assume the following: grammaticality is a causally relevant variable in judgments of acceptability. It may not be the only variable relevant to acceptability, but it is one of the relevant variables that cause the native speaker to judge as s/he does (i.e. the ungrammaticality of the structure underlying the string causes a native speaker to judge the sentence unacceptable). If this is correct (which it is), the relation between acceptability and grammaticality is indirect. The aim of what follows is to consider how indirect it can be while still leaving the relation between (un)acceptability and (un)grammaticality direct enough for judgments concerning the former to be useful as probes into the structure of the latter (and into the structure of Gs which the notion of grammaticality implicitly reflects).

The above considerations suffice to conclude that (un)acceptability need not be an infallible guide to (un)grammaticality. As the latter is but one factor, then it need not be that the former perfectly tracks the latter. And, indeed, we know that there are many strings that are quite unacceptable but are not ungrammatical. Famous examples include self-embedding (e.g. That that that Mary saw Sam is intriguing is interesting is false), ‘buffalo’ sentences (e.g. Buffalo buffalo buffalo buffalo buffalo buffalo buffalo) and multiple negation sentences (No eye injury is too insignificant to ignore). The latter kinds of sentences are hard to process and are reliably judged quite poor despite their being grammatical. A favorite hobby of psycho-linguists is to find other cases of grammatical strings that trouble speakers as this allows them to investigate (via how sentences are parsed in real time) factors other than grammaticality that are psycho-linguistically important. Crucially, all accept (and have since the “earliest days of Generative Grammar”) that unacceptability does not imply ungrammaticality.

Moreover, we have some reason to believe that acceptability does not imply grammaticality. There are the famous cases like ‘More people visited Rome than I did’ which are judged by speakers to be fine despite the fact that speakers cannot tell you what they mean. I personally no longer think that this shows that these sentences are acceptable. Why? Precisely because there is no interpretation that they support. There is no interpretation for these strings that native speakers consistently recognize, so I conclude from this that they are unacceptable despite “sounding” fine. In other words, “sounding fine” is at best a proxy for acceptability, one that further probing may undermine. It often is good enough and it may be an interesting question to ask why some ungrammatical sentences “sound fine” but the mere fact that they do is not in itself sufficient reason to conclude that these strings are acceptable (let alone grammatical).[1]

So are there any cases of acceptability without grammaticality? I believe the best examples are those where we find subliminal island effects (see here for discussion). In such cases we find sentences that are judged acceptable under the right interpretation. Despite this, they display the kinds of super-additivity effects that characterize islands. It seems reasonable to me to describe these strings as ungrammatical (i.e. as violating island conditions) despite their being acceptable. What this means is that for cases such as these the super-additivity profile is a more sensitive measure of grammaticality than is the bare acceptability judgment. In fact, assuming that the sentence violates island restrictions explains why we find the super-additivity profile. Of course, we would love to know why in these cases (but not in many other island-violating examples) ungrammaticality does not lead to unacceptability. But not knowing why this is so does not in and of itself compromise the conclusion that these acceptable sentences are ungrammatical.[2]
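To make the notion of a super-additivity profile a bit more concrete, here is a minimal sketch. It assumes a 2x2 factorial design (structure: island vs. non-island; dependency length: short vs. long) with z-scored ratings, and it computes a differences-in-differences score. The function name `dd_score`, the condition labels, and all the numbers are hypothetical, invented purely for illustration; they are not data from any study.

```python
from statistics import mean

def dd_score(ratings):
    """Differences-in-differences over a 2x2 design.

    `ratings` maps (structure, length) -> list of z-scored ratings.
    A score near 0 means the two costs simply add up; a clearly
    positive score means the island/long condition is worse than
    the two costs predict on their own (super-additivity).
    """
    m = {cond: mean(vals) for cond, vals in ratings.items()}
    # Cost of island structure when the dependency is long
    cost_long = m[("non-island", "long")] - m[("island", "long")]
    # Cost of island structure when the dependency is short
    cost_short = m[("non-island", "short")] - m[("island", "short")]
    return cost_long - cost_short

# Invented ratings (assumption): the island/long condition still sits
# above the scale midpoint (0 on a z-scored scale), i.e. it is "acceptable".
ratings = {
    ("non-island", "short"): [1.0, 1.2, 0.8],
    ("non-island", "long"):  [0.6, 0.4, 0.5],
    ("island", "short"):     [0.7, 0.9, 0.8],
    ("island", "long"):      [0.1, 0.2, 0.0],
}

score = dd_score(ratings)
island_long_mean = mean(ratings[("island", "long")])
print(f"DD score: {score:.2f}, island/long mean: {island_long_mean:.2f}")
```

With these made-up numbers the island/long mean stays above the midpoint while the DD score is positive, which is the "subliminal" pattern: acceptable on the bare judgment, yet super-additive.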

So, (un)acceptability does not imply (un)grammaticality, nor vice versa. How then can the former be used as a probe into the latter? Well, because this relation is stable often enough. In other words, over a very large domain acceptability judgments track grammaticality, and that is good enough. In fact, as I’ve mentioned more than once, Sprouse, Almeida, and Schutze have shown that these data are very robust and very reliable over a very wide range, and thus are excellent probes into grammaticality. Of course, this does not mean that such judgments are infallible indicators of grammatical structure, but then nobody thought that they ever were. Let me elaborate on this.

We’ve known for a very long time that acceptability is affected by many factors (see Aspects:10-15 for an early sophisticated discussion of these issues), including sentence length, word frequencies, number of referential DPs employed, intonation and prosody, types of embedding, priming, kinds of dependency resolutions required, among others. These factors combine to yield a judgment of (un)acceptability on a given occasion. And these are expected to be (and acknowledged to be) a matter of degree. One of the things that linguists try to do in probing for grammaticality is to compensate for these factors by comparing sentences of similar complexity to one another to isolate the grammatical contribution to the judgment in a particular case (e.g. we compare sentences of equal degree of embedding when probing for island effects). This is frequently doable, though we currently have no detailed account of how these factors interact to produce any given judgment. Let me repeat this: though we don’t have a general theory of acceptability judgments, we have a pretty good idea what factors are involved and when we are careful (and even when we are not, as Sprouse has shown) we can control for these and allow the grammatical factor to shine through a particular judgment. In other words, we can set up a specific experimental situation that reliably tests for G-factors (i.e. we can test whether G-factors are causally relevant in the standard way that experiments typically do, by controlling the hell out of the other factors). This is standard practice in the real sciences, where unpacking interaction effects is the main aim of experimentation. I see no reason why the same should not hold in linguistics.[3]

It is worth noting that the problem of understanding complex data (i.e. data that is reasonably taken to be the result of many causally interacting factors) is not limited to linguistics. It is a common feature of the real sciences (e.g. physics). Geoffrey Joseph has a nice (old) paper discussing this, where he notes (786):[4]

Success at the construction and testing of theories often does not proceed by attempting to explain all, or even most, of the actually available data. Either by selecting appropriate naturally occurring data or by producing appropriate data in the laboratory, the theorist implicitly acknowledges …his decomposing the causal factors at work into more comprehensible components. A consequence of this feature of his methodology is that we are often in the position of having very well-confirmed fundamental theories at hand, but at the same time being unable to formulate complete deductive explanations of natural (complex) phenomena.

This said, it is interesting when (un)acceptability and (un)grammaticality diverge. Why? Because, somewhat surprisingly, as a matter of fact the two track one another so closely (in fact, much more closely than we had any reason to expect a priori). This is what makes it theoretically interesting when the two diverge. Here’s what I mean.[5]

There is no reason why (un)acceptability should have been such a good probe into (un)grammaticality. After all, this is a pretty gross judgment that we ask speakers to make and we are absolutely sure that many factors are involved. Nonetheless, it seems that the two really are closely aligned over a pretty large domain. And precisely because they are, it is interesting to probe where they diverge and why. Figuring out what’s going on is likely to be very informative.

Paul Pietroski has suggested an analogy from the real sciences. Celestial mechanics tracks the actual position of planets in space in terms of their apparent positions. Now, there is no a priori reason why a planet’s apparent position should be a reliable guide to its actual one. After all, we know that the fact that a stick in water looks bent does not mean that it is bent. But at least in the heavens, apparent position data was good enough to ground Kepler’s discoveries and Newton’s. Moreover, precisely because the fit was so good, their apparent divergence in a few cases was rightly taken to be an interesting problem to be solved. The solutions required a complete overhaul of Newton’s laws of gravitation (actually, relativity ended up deriving Newton’s laws as limit cases, so in an important sense these laws were conserved). Note that this did not deny that apparent position was pretty good evidence of actual position. Rather it explained why in the general case this was so and why in the exceptional cases the correlation failed to hold. It would have been a very bad idea for the history of physics had physicists drawn the conclusion that the anomalies (e.g. the perihelion of Mercury) showed that Newton’s laws should be trashed. The right conclusion was that the anomaly needed explanation, and they retained the theory until something came along that explained both the old data and the anomalies.

This seems like a rational strategy, and it should be applied to our divergent cases as well. And it has been to good effect in some cases. The case of unacceptability-despite-grammaticality has generated interesting parsing models that try to explain why self-embedding is particularly problematic given the nature of biological memory (e.g. see work by Rick Lewis and friends). The case of acceptability-despite-ungrammaticality has led to the development of somewhat more refined tools for testing acceptability that have given us criteria other than simple acceptability to measure grammaticality.

The most interesting instance of the divergence, IMO, is the case of Mainland Scandinavian where detectable island violations (remember the super-additivity effects) do not yield unacceptability. Why not? Dunno.[6] But, as in the mechanics case above, the right attitude is not that the failure of acceptability to track grammaticality shows that there are no island effects and that a UG theory of islands is clearly off the mark. Rather the divergence indicates the possibility of an interesting problem here and that there is something that we still do not understand. Need I say, that this latter observation is not a surprise to any working GGer? Need I say that this is what we should expect in normal scientific practice?

So, grammaticality is one factor in acceptability and a reliable robust one at that. However, like most measures, it works better in some contexts than in others, and though this fact does not undermine the general utility of the measure, it raises interesting research questions as to why.

Let me end by repeating that all of this is old hat. Indeed, Chomsky’s discussion in chapter 1 of Aspects is still a valuable intro to these issues (11):

…the scales of grammaticalness and acceptability do not coincide. Grammaticalness is only one of many factors that interact to determine acceptability. Correspondingly, although one might propose various operational tests for acceptability, it is unlikely that a necessary and sufficient operational criterion might be invented for the much more abstract and important notion of grammaticalness.

So, can we treat grammaticalness as some “kind” of acceptability? No, nor should we expect to. Can we use acceptability to probe grammaticalness? Yes, but as in all areas of inquiry there is no guarantee that these judgments are infallible guides. Should we expect to one day have a solid theory of acceptability? Well, some hope for this, but I am skeptical. Phenomena that are the result of the interactions of many factors are usually theoretically elusive. We can tie loose ends down well enough in particular cases, but theories that specify in advance which loose ends are most relevant are hard to come by, and not only in linguistics. There are no general theories of experimental design. Rather there are rules of thumb of what to control for in specific cases informed by practice and some theory. This is true in the “real” sciences, and we should expect no less in linguistics. Those who demand more in the latter case are methodological dualists, holding linguistics to uniquely silly standards.

[1] Other similar examples involve cases where linearly intervening non-licensing material can improve a sentence’s acceptability. There is a lot of work on this involving NPI licensing by clearly non-c-commanding negative elements. This too has been widely discussed in the parsing literature. Again, interpretations for these improved sentences are hard to come by and so it is unclear whether these sentences are actually acceptable despite their tonal improvements.
[2] So far as I can tell, similar reasoning applies to some recent discussion of binding effects in a recent Cognition paper by Cole, Hermon and Yanti that I hope to discuss more fully in the near future.
[3] As Nancy Cartwright observes in her 1983 book (p. 83), the aim of an experiment is to find “quite specific effects peculiarly sensitive to the exact character of the causes [you] want to study.” Experiments are very context sensitive set ups developed to find these effects.  And they often fail to explain a lot. See Geoffrey Joseph quote below.
[4] See his “The many sciences and the one world,” Journal of Philosophy 1980: 773-791.
[5] I owe what follows to some discussion with Paul. He is, of course, fully responsible for my misstatements.
[6] But I believe that Dave Kush and friends have provided the right kind of answer. See chapter 11 here for an example.


  1. Since you mention the case here, I'll mention that our in-depth study on comparative illusions ("More people have been to Russia than I have"), led by Alexis Wellwood, is now finally available.

    In connection with those cases, it's interesting that you propose a 3-way distinction, which I haven't seen discussed much elsewhere: (i) acceptable irrespective of meaning ("sounds good"); (ii) acceptable with a specific meaning; (iii) grammatical. It's worth noting that the Sprouse et al. large scale acceptability judgment studies that you cite here specifically sampled sentences where meaning is not at stake. So they avoided testing your (ii).

    I'll also mention that the main finding from our studies on comparative illusions is that they do seem to be associated with a specific meaning. Just one that shouldn't be grammatically possible. The sentences look like statements that compare quantities of individuals, but they seem to be understood as comparisons of quantities of events. At least, the most consistent effect on the acceptability of the sentences is whether the predicate is repeatable, hence countable. Of course, the effects are on the squishy side, so many things affect how consistently people fall for the illusion.

  2. I have to say that I am a bit surprised to see you, Norbert, suggesting that we separate out "sounds good but I can't pin down its specific meaning" from "acceptable".

    The reason for my surprise is that the distinction between these two categories presupposes an understanding of what "a specific meaning" is, an understanding that some theoreticians may have (or purport to have), but which a participant in an experiment surely has (at best) very indirect access to.

    Obviously, my own methodological proclivities are showing, here. But consider this: the above discussion of acceptability centers around the idea that we have no "direct line" to grammaticality, and that acceptability is a surprisingly-reliable-but-still-indirect conduit for well-formedness. Well, if anything, it would seem to me that we have an even less direct line to [well-formedness ∧ semantic-interpretability]. I know this goes against the grain of the last, say, 15-20 years of research in syntax – where reasoning about syntactic structure from the availability or unavailability of a given reading has become de rigueur. And I'm certainly not advocating that the (wonderful) results of this research be tossed out or anything remotely like that. But to me, it seems that asking naïve participants "is this an acceptable sentence" is methodologically cleaner than asking them "is this sentence acceptable and can you articulate its meaning," or "is this sentence acceptable under this interpretation." Of course you can set up a context or some real-world knowledge that biases towards a specific reading, and then ask if the sentence is acceptable in that context (Lisa Matthewson, Amy Rose Deal, and others have been writing up more formalized methodologies that address these issues in much more careful, systematic ways). That said, I see no a priori reason why we should presuppose that "acceptability" correlates with [syntactically well-formed ∧ semantically interpretable] more directly than it correlates with [syntactically well-formed] alone. That seems to me to be an empirical question.

    1. @Omer. I find Norbert's distinction reasonable (though I was surprised to see the conclusion that it led to). We talk all the time about acceptability of form-meaning pairings, whether explicitly or otherwise. So it's fair to point out that if an example is accepted despite no stable meaning, then it's a different animal. I happen to think that those comparative illusions are not (quite) as semantically mysterious as has been assumed in the past. But Norbert's point stands.

    2. @Colin: I don't doubt that there is an in principle distinction between acceptable-with-stable-meaning and acceptable-without-stable-meaning. What I was taking issue with was the idea that the term "acceptability" should be applied to one and not the other. To me, acceptability is a behavioral measure, and I suspect that in practice these two categories don't separate so neatly (behaviorally).

    3. I see that I may have been too bold in my claim. So let me modify it a bit both in Colin's and Karthik's direction.

      I take it that Gs primarily map meanings with sounds over an infinite domain. I find it reasonable to suppose that the data most relevant for probing this mechanism are those which do this, i.e. produce a structure with both a sound side and a meaning side. Now, curiously, many sentences that are unacceptable are nonetheless quite meaningful. In fact, so far as I can tell this is the norm. So fixed subject violations are nonetheless easy to understand and even relative clause island violations are hardly incomprehensible (i.e. it's pretty clear what they WOULD mean were they ok). So the norm is that a sentence sounds funny but is comprehensible. There are other cases (relatively few in number) where sentences sound right but it is very unclear what they COULD mean, or where the interesting reading is just unavailable. Examples include 'why' extraction from strong islands and, until Colin corrected me, my view of sentences like 'more people visited Rome than I did.' The latter sounds quite good to me, but I have no idea what it could mean.

      Now Karthik notes that this kind of sentence might be as informative as Chomsky's famous 'colorless…' clauses. These too have no sensible interpretation yet tell us that words come in categories that Gs care about. So maybe here we find the same thing: this is of the right form for a comparative. Or, maybe just as useful, it is easily confused for something that is generable with a meaning and a sound. From what Colin says, I assume that these are confused with '(many/some) people visited Rome more often than I did.' This said, the actual sentence does not mean this and cannot. Rather people will assume/assimilate what they heard to something sensible like this for whatever reason. Is this interesting? Sure. So I need to retract.

      That said, I think that the core cases we use are those where the relevant linguistic objects are semantically sensible but "off" for some other reason. Why? Because if something is NOT semantically sensible then it is plausible that its unacceptability has nothing to do with the grammar at all, and as our interest is in probing the structure of Gs and FL/UG we control for these sorts of cases. So, I think that the core data has been meaningful-but-nonetheless-odd.

      That said, Omer and Karthik have rightly nailed me: how does the fall back position sound? Meaningful and well formed?

  3. Somewhat related to Omer’s concerns. If one links acceptability to stable meaning/interpretation, what happens to sentences like “colourless green ideas sleep furiously”? (a) What would such sentences be showing us? (b) What would the status of such sentences be then? Wasn’t the whole point that such sentences seem much more “acceptable” than word-salads using the same words despite being devoid of meaning?

    1. So, I just talked to Paul and I would now add a further point to the one above. First, recall the point of Chomsky's 'colorless' sentence. It was not to show that there was syntax even in the absence of semantic content. Rather it was to show that the acceptability of a sentence could not be reduced to its probability. The sentence was constructed so that no two words were likely to ever be paired together. So you will never find 'colorless-green,' nor 'green-ideas,' nor 'ideas-sleep,' nor 'sleep-furiously.' This was in response to the claim that grammaticality was irrelevant to acceptability and that all we need to do is take a look at bigram probabilities.

      So, is the sentence actually semantically incoherent? Well, not so clear as Paul pointed out. After all, it seems to have some serious implications all of which are false. So, if true then there are colorless green things. But there are none so the sentence is false. It implies that ideas sleep, which under a literal interpretation also seems false. So, the sentence has implications and they are false so the sentence is as well and so it has "content." The content is absurd and obviously false but so is 'John hugged a flying pink unicorn.' So, what do we learn from the Chomsky sentence? Well we learn that pairwise probabilities cannot explain acceptability and that something else is required and that sentences that sound odd might be odd because they are obviously false and/or too fanciful to be tolerated. Also that such sentences might be well formed and even meaningful on analogy with other sentences like 'revolutionary new ideas occur infrequently' with which they share a common structure.

      Now to 'more people visited Rome than I did.' What does this sentence imply? Quite unclear. Here the puzzle is that locally the words seem fine: 'More people visited Rome' is ok though it needs linking to a comparative. It is so linked but the comparative does not deliver the right comparison set. The puzzle then is why despite its obvious incoherence (is it even false?) it sounds ok. Good question. I take it that this is what Colin is working on. Interestingly, the suggestion is not (as I understand it) that the sentence is sorta grammatical. Rather the claim is that it is confused with a grammatical one that has a fine meaning.

      So what's the upshot? I think that the original Chomsky sentence might not be relevant here (though I confess that I understood it as Karthik did until about 5 minutes ago). Second, that the probes into Gs that we need are those that converge at the CI interface. This makes having a plausible interpretation an important pre-requisite for syntactic relevance, at least prima facie. OK?

    3. It might be relevant that 'Colorless green ideas'-type sentences were very common in certain forms of poetry (surrealistic, originally mostly French, I believe) not so long ago, during and before when Chomsky was a child. I'm only familiar with a Greek instance of this (Elytis), but the combination of low transition probabilities (either in a string or a dependency tree) and standard if sometimes rather complex syntax is quite striking. So, in anticipation of colorless green ideas sleeping furiously, we have in 1939 naked feet censing the watercrests with green haste.

      In the English-speaking world at least, this kind of thing is clearly a niche interest of a small number of people, but it used to be a lot more popular (& there's still quite a lot of genuine interest in Elytis in Greece); exactly why this is/was the case might be worth thinking about. At any rate, the genre is not something that Chomsky invented, but had/has an independent existence. Perhaps there are volumes of Verlaine, Baudelaire, Breton and Eluard on his shelves?

  4. @Norbert (Part 1):
    I actually went back to see how Chomsky had used “Colorless green ideas…” in the 50’s. My own original reading of that stuff was that Chomsky was making both points (a) frequency != grammaticality, (b) grammaticality != meaningfulness. Relevant quotes below:

    From Three Models for the Description of Language (pg. 116):
    “Whatever the other interest of statistical approximation in this sense may be, it is clear that it can shed no light on the problems of grammar. There is no general relation between the frequency of a string (or its component parts) and its grammaticalness. We can see this most clearly by considering such strings as
    (14) colorless green ideas sleep furiously.”

    From Syntactic Structures (pg. 15):
    “Second, the notion “grammatical” cannot be identified with “meaningful” or “significant” in any sense. Sentences (1) and (2) are equally nonsensical, but any speaker of English will recognize that only the former is grammatical.
    (1) Colorless green ideas sleep furiously.
    (2) Furiously sleep ideas green colorless.
    Similarly, there is no semantic reason to prefer (3) to (5)…”

    From the second quote it is clear that the sentences were used to show that there was syntax (grammaticality) in the absence of semantic content. In fact, I suspect Chomsky used semantically nonsensical meanings to construct his 0 frequency sentences. So, at least as Chomsky intended the use of his example sentences, it was for both reasons.

  5. @Norbert (Part 2):

    >>So, is the sentence actually semantically incoherent? Well, not so clear as Paul pointed out. After all, it seems to have some serious implications all of which are false. So, if true then there are colorless green things. But there are none so the sentence is false.

    It is not clear to me that “colorless green” is false. It seems to me to have some sort of basic failure in assumptions; some sort of presupposition failure (caveat: I am not a semanticist, so I might be misusing these words).

    Now, I will grant you that even I can see that the way in which 'more people visited Rome than I did.' seems to be bad wrt meaning is different from “colorless green ideas…”. But what’s not clear is that this is the dividing line for acceptability.

    Chomsky in Aspects (pg. 10) says this about acceptability:
    “For the purposes of this discussion, let us use the term “acceptable” to refer to utterances that are perfectly natural and immediately comprehensible without paper-and-pencil analysis, and in no way bizarre or outlandish. Obviously, acceptability will be a matter of degree, along various dimensions. One could go on to propose various operational tests to specify the notion more precisely (for example, rapidity, correctness, and uniformity of recall and recognition, normalcy of intonation).” (Footnote 4 suggests looking at Miller and Isard (1963) for more info)

    "immediately comprehensible" is so vague that it is not clear that it distinguishes the two types of sentences above (also the word “correctness” is indecipherable to me in this context).

    Finally, moving closer to Norbert’s view :) - my own view is that as researchers we probably want to use both meaning and prosody and anything else that might be useful (speed, confidence, RTs, ERPs...) to understand grammaticality. But, we need to separate perhaps the analyst’s use of these things from the notion of acceptability itself, which as Omer mentioned is perhaps best seen as a superficial behavioral measure. Perhaps, another equally consistent view is to say that acceptability as a unified notion is ultimately not completely coherent. But, this should still not preclude a researcher from using some rule-of-thumb understanding of the concept to probe deeper, I think. I mean, what other option do we have! :)