A native speaker judges “The
woman loves himself” to be odd sounding. I explain this by saying that the
structure underlying this string is ungrammatical, specifically that it
violates principle A of the Binding Theory. How does what I say explain this judgment? It explains it if
we assume the following: grammaticality is a causally relevant variable in
judgments of acceptability. It may not be the only variable relevant to acceptability, but it is one of the
relevant variables that cause the native speaker to judge as s/he does (i.e. the
ungrammaticality of the structure underlying the string causes a native speaker to judge the sentence unacceptable). If
this is correct (which it is), the relation between acceptability and
grammaticality is indirect. The aim of what follows is to consider how indirect
it can be while still leaving the relation between (un)acceptability and
(un)grammaticality direct enough for judgments concerning the former to be
useful as probes into the structure of the latter (and into the structure of Gs
which the notion of grammaticality implicitly reflects).
The above considerations suffice to conclude that
(un)acceptability need not be an infallible guide to (un)grammaticality. As the
latter is but one factor in the former, the
former need not perfectly track the latter. And, indeed, we know
that there are many strings that are quite unacceptable but are not
ungrammatical. Famous examples include self-embedding (e.g. “That that that Mary saw Sam is intriguing is
interesting is false”), ‘buffalo’ sentences (e.g. “Buffalo buffalo buffalo buffalo buffalo buffalo buffalo”) and
multiple negation sentences (“No eye
injury is too insignificant to ignore”). These kinds of sentences are
hard to process and are reliably judged quite poor despite their being
grammatical. A favorite hobby of psycho-linguists is to find other cases of
grammatical strings that trouble speakers, as this allows them to investigate (via
how sentences are parsed in real time) factors other than grammaticality that
are psycho-linguistically important. Crucially, all accept (and have since the
“earliest days of Generative Grammar”) that unacceptability does not imply
ungrammaticality.
Moreover, we have some reason to believe that acceptability
does not imply grammaticality. There are the famous cases like “More people visited Rome than I did”
which are judged by speakers to be fine despite the fact that speakers cannot
tell you what they mean. I personally no longer think that this shows that
these sentences are acceptable. Why? Precisely because there is no
interpretation that they support. There is no interpretation for these strings
that native speakers consistently recognize, so I conclude from this that they
are unacceptable despite “sounding” fine. In other words, “sounding fine” is at
best a proxy for acceptability, one that further probing may undermine. It
often is good enough and it may be an interesting question to ask why some
ungrammatical sentences “sound fine” but the mere fact that they do is not in
itself sufficient reason to conclude that these strings are acceptable (let
alone grammatical).[1]
So are there any cases of acceptability without
grammaticality? I believe the best examples are those where we find subliminal
island effects (see here
for discussion). In such cases we find sentences that are judged acceptable
under the right interpretation. Despite this, they display the kinds of
super-additivity effects that characterize islands. It seems reasonable to me
to describe these strings as ungrammatical (i.e. as violating island conditions)
despite their being acceptable. What this means is that for cases such as these
the super-additivity profile is a more sensitive measure of grammaticality than
is the bare acceptability judgment. In fact, assuming that the sentence
violates island restrictions explains
why we find the super-additivity profile.
Of course, we would love to know why in these cases (but not in many
other island violating examples) ungrammaticality does not lead to
unacceptability. But not knowing why this is so does not in and of itself compromise the conclusion that these
acceptable sentences are ungrammatical.[2]
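To make the super-additivity idea concrete, here is a minimal sketch of the kind of computation involved. It assumes the 2x2 factorial design familiar from Sprouse-style island experiments: one factor is whether the sentence contains an island structure, the other is whether the gap sits in the matrix or the embedded clause. The function name, the condition labels, and all of the ratings are hypothetical and invented purely for illustration; this is not the procedure of any particular study. A positive differences-in-differences score means the island-plus-embedded-gap condition is worse than the two independent costs would predict, and that can hold even when the condition's own rating is still fairly high (the "subliminal" case).

```python
# All ratings below are invented z-scores; the condition labels are hypothetical.

def dd_score(ratings):
    """Differences-in-differences for a 2x2 (structure x gap position) design.

    ratings maps (structure, gap) pairs to a mean acceptability rating.
    A score of 0 means the two costs simply add up; a score above 0 means
    the island + embedded-gap condition is worse than additivity predicts.
    """
    # Cost of the island structure when the gap is in the embedded clause.
    embedded_cost = ratings[("non_island", "embedded")] - ratings[("island", "embedded")]
    # Cost of the island structure when the gap is in the matrix clause.
    matrix_cost = ratings[("non_island", "matrix")] - ratings[("island", "matrix")]
    return embedded_cost - matrix_cost


# Purely additive pattern: each factor has its own cost, nothing extra.
additive = {
    ("non_island", "matrix"): 1.0,
    ("non_island", "embedded"): 0.5,
    ("island", "matrix"): 0.75,
    ("island", "embedded"): 0.25,
}

# Classic island pattern: the fourth condition is rated very low.
classic = dict(additive)
classic[("island", "embedded")] = -0.5

# "Subliminal" pattern: the fourth condition is still rated above zero,
# yet the interaction is there.
subliminal = {
    ("non_island", "matrix"): 1.0,
    ("non_island", "embedded"): 0.75,
    ("island", "matrix"): 0.75,
    ("island", "embedded"): 0.25,
}

for name, data in [("additive", additive), ("classic", classic), ("subliminal", subliminal)]:
    print(name, dd_score(data))
```

The point of the third pattern is that the interaction term, not the raw rating of the critical condition, is doing the diagnostic work.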
So, (un)acceptability does not imply (un)grammaticality, nor
vice versa. How then can the former be used as a probe into the latter? Well,
because this relation is stable often
enough. In other words, over a very large domain acceptability judgments
track grammaticality, and that is good enough. In fact, as I’ve
mentioned more than once, Sprouse, Almeida, and Schütze have shown that these
data are very robust and very reliable over a very wide range, and thus are
excellent probes into grammaticality. Of course, this does not mean that
such judgments are infallible indicators of grammatical structure, but then
nobody thought that they ever were. Let me elaborate on this.
We’ve known for a very long time that acceptability is
affected by many factors (see Aspects:10-15
for an early sophisticated discussion of these issues), including sentence
length, word frequencies, number of referential DPs employed, intonation and
prosody, types of embedding, priming, kinds of dependency resolutions required,
among others. These factors combine to yield a judgment of (un)acceptability on
a given occasion. And these are expected to be (and acknowledged to be) a
matter of degree. One of the things that linguists try to do in probing for
grammaticality is to compensate for these factors by comparing sentences of
similar complexity to one another to isolate the grammatical contribution to
the judgment in a particular case (e.g. we compare sentences of equal degree of
embedding when probing for island effects). This is frequently doable, though
we currently have no detailed account of how these factors interact to produce
any given judgment. Let me repeat this: though we don’t have a general theory of acceptability
judgments, we have a pretty good idea what factors are involved and when we are
careful (and even when we are not, as Sprouse has shown) we can control for
these and allow the grammatical factor to shine through a particular judgment.
In other words, we can set up a specific experimental situation that reliably tests
for G-factors (i.e. we can test whether G-factors are causally relevant in the
standard way that experiments typically do, by controlling the hell out of the
other factors). This is standard practice in the real sciences, where unpacking
interaction effects is the main aim of experimentation. I see no reason why the
same should not hold in linguistics.[3]
It is worth noting that the problem of understanding complex
data (i.e. data that is reasonably taken to be the result of many causally
interacting factors) is not limited to linguistics. It is a common feature of
the real sciences (e.g. physics). Geoffrey Joseph has a nice (old) paper
discussing this, where he notes (786):[4]
Success at the construction
and testing of theories often does not proceed by attempting to explain all, or
even most, of the actually available data. Either by selecting appropriate
naturally occurring data or by producing appropriate data in the laboratory,
the theorist implicitly acknowledges …his decomposing
the causal factors at work into more comprehensible components. A consequence
of this feature of his methodology is that we are often in the position of
having very well-confirmed fundamental theories at hand, but at the same time
being unable to formulate complete deductive explanations of natural (complex)
phenomena.
This said, it is
interesting when (un)acceptability and (un)grammaticality diverge. Why? Because,
somewhat surprisingly, as a matter of
fact the two track one another so closely (in fact, much more closely than
we had any reason to expect a priori).
This is what makes it theoretically interesting when the two diverge. Here’s
what I mean.[5]
There is no reason why (un)acceptability should have been
such a good probe into (un)grammaticality. After all, this is a pretty gross
judgment that we ask speakers to make and we are absolutely sure that many
factors are involved. Nonetheless, it seems that the two really are closely
aligned over a pretty large domain. And precisely because they are, it is
interesting to probe where they diverge and why. Figuring out what’s going on
is likely to be very informative.
Paul Pietroski has suggested an analogy from the real
sciences. Celestial mechanics tracks the actual position of planets in space in
terms of their apparent positions. Now, there is no a priori reason why a planet’s apparent
position should be a reliable guide to its actual one. After all, we know that
the fact that a stick in water looks bent does not mean that it is bent. But at
least in the heavens, apparent position data was good enough to ground Kepler’s
discoveries and Newton’s. Moreover, precisely because the fit was so good,
their apparent divergence in a few cases was rightly taken to be an interesting
problem to be solved. The solutions required a complete overhaul of Newton’s
laws of gravitation (actually, relativity ended up deriving Newton’s laws as
limit cases, so in an important sense these laws were conserved). Note
that this did not deny that apparent position was pretty good evidence of
actual position. Rather it explained why in the general case this was so and
why in the exceptional cases the correlation failed to hold. It would have been
a very bad idea for the history of physics had physicists drawn the conclusion
that the anomalies (e.g. the perihelion of Mercury) showed that Newton’s laws
should be trashed. The right conclusion was that the anomaly needed
explanation, and physicists retained the theory until something came along that
explained both the old data and
the anomalies.
This seems like a rational strategy, and it should be
applied to our divergent cases as well. And it has been to good effect in some
cases. The case of unacceptability-despite-grammaticality has generated
interesting parsing models that try to explain why self-embedding is
particularly problematic given the nature of biological memory (e.g. see work
by Rick Lewis and friends). The case of acceptability-despite-ungrammaticality
has led to the development of somewhat more refined tools for testing acceptability
that have given us criteria other than simple acceptability to measure
grammaticality.
The most interesting instance of the divergence, IMO, is the
case of Mainland Scandinavian where detectable island violations (remember the
super-additivity effects) do not yield unacceptability. Why not? Dunno.[6]
But, as in the mechanics case above, the right attitude is not that the failure
of acceptability to track grammaticality shows that there are no island effects
and that a UG theory of islands is clearly off the mark. Rather the divergence
indicates the possibility of an interesting problem here and that there is
something that we still do not understand. Need I say that this latter observation
is not a surprise to any working GGer? Need I say that this is what we should
expect in normal scientific practice?
So, grammaticality is one factor in acceptability and a
reliable robust one at that. However, like most measures, it works better in
some contexts than in others, and though this fact does not undermine the
general utility of the measure, it raises interesting research questions as to
why.
Let me end by repeating that all of this is old hat. Indeed,
Chomsky’s discussion in chapter 1 of Aspects
is still a valuable intro to these issues (11):
…the scales of grammaticalness and
acceptability do not coincide. Grammaticalness is only one of many factors that
interact to determine acceptability. Correspondingly, although one might
propose various operational tests for acceptability, it is unlikely that a
necessary and sufficient operational criterion might be invented for the much
more abstract and important notion of grammaticalness.
So, can we treat grammaticalness as some “kind” of
acceptability? No, nor should we expect to. Can we use acceptability to probe
grammaticalness? Yes, but as in all areas of inquiry there is no guarantee that
these judgments are infallible guides. Should we expect to one day have a solid
theory of acceptability? Well, some hope for this, but I am skeptical.
Phenomena that are the result of the interactions of many factors are usually
theoretically elusive. We can tie loose ends down well enough in particular
cases, but theories that specify in advance which loose ends are most relevant
are hard to come by, and not only in linguistics. There are no general theories
of experimental design. Rather there are rules of thumb of what to control for
in specific cases informed by practice and some theory. This is true in the
“real” sciences, and we should expect no less in linguistics. Those who demand
more in the latter case are
methodological dualists, holding linguistics to uniquely silly standards.
[1] Other similar examples involve cases where linearly intervening non-licensing
material can improve a sentence’s acceptability. There is a lot of work on this
involving NPI licensing by clearly non-c-commanding negative elements. This too
has been widely discussed in the parsing literature. Again, interpretations for
these improved sentences are hard to come by and so it is unclear whether these
sentences are actually acceptable despite their tonal improvements.
[2] So far as I can tell, similar reasoning applies to some recent discussion of
binding effects in a recent Cognition paper by Cole, Hermon and Yanti that I
hope to discuss more fully in the near future.
[3] As Nancy Cartwright observes in her 1983 book (p. 83), the aim of an experiment
is to find “quite specific effects peculiarly sensitive to the exact character
of the causes [you] want to study.” Experiments are very context sensitive set
ups developed to find these effects. And
they often fail to explain a lot. See the Geoffrey Joseph quote in the main text.
[4] See his “The many sciences and the one world,” Journal of Philosophy 1980: 773-791.
[5] I owe what follows to some discussion with Paul. He is, of course, fully
responsible for my misstatements.
Since you mention the case here, I'll mention that our in-depth study on comparative illusions ("More people have been to Russia than I have"), led by Alexis Wellwood, is now finally available.
In connection with those cases, it's interesting that you propose a 3-way distinction, which I haven't seen discussed much elsewhere: (i) acceptable irrespective of meaning ("sounds good"); (ii) acceptable with a specific meaning; (iii) grammatical. It's worth noting that the Sprouse et al. large scale acceptability judgment studies that you cite here specifically sampled sentences where meaning is not at stake. So they avoided testing your (ii).
I'll also mention that the main finding from our studies on comparative illusions is that they do seem to be associated with a specific meaning. Just one that shouldn't be grammatically possible. The sentences look like statements that compare quantities of individuals, but they seem to be understood as comparisons of quantities of events. At least, the most consistent effect on the acceptability of the sentences is whether the predicate is repeatable, hence countable. Of course, the effects are on the squishy side, so many things affect how consistently people fall for the illusion.
I have to say that I am a bit surprised to see you, Norbert, suggesting that we separate out "sounds good but I can't pin down its specific meaning" from "acceptable."
The reason for my surprise is that the distinction between these two categories presupposes an understanding of what "a specific meaning" is, an understanding that some theoreticians may have (or purport to have), but which a participant in an experiment surely has (at best) very indirect access to.
Obviously, my own methodological proclivities are showing, here. But consider this: the above discussion of acceptability centers around the idea that we have no "direct line" to grammaticality, and that acceptability is a surprisingly-reliable-but-still-indirect conduit for well-formedness. Well, if anything, it would seem to me that we have an even less direct line to [well-formedness ∧ semantic-interpretability]. I know this goes against the grain of the last, say, 15-20 years of research in syntax – where reasoning about syntactic structure from the availability or unavailability of a given reading has become de rigueur. And I'm certainly not advocating that the (wonderful) results of this research be tossed out or anything remotely like that. But to me, it seems that asking naïve participants "is this an acceptable sentence" is methodologically cleaner than asking them "is this sentence acceptable and can you articulate its meaning," or "is this sentence acceptable under this interpretation." Of course you can set up a context or some real-world knowledge that biases towards a specific reading, and then ask if the sentence is acceptable in that context (Lisa Matthewson, Amy Rose Deal, and others have been writing up more formalized methodologies that address these issues in much more careful, systematic ways). That said, I see no a priori reason why we should presuppose that "acceptability" correlates with [syntactically well-formed ∧ semantically interpretable] more directly than it correlates with [syntactically well-formed] alone. That seems to me to be an empirical question.
@Omer. I find Norbert's distinction reasonable (though I was surprised to see the conclusion that it led to). We talk all the time about acceptability of form-meaning pairings, whether explicitly or otherwise. So it's fair to point out that if an example is accepted despite no stable meaning, then it's a different animal. I happen to think that those comparative illusions are not (quite) as semantically mysterious as has been assumed in the past. But Norbert's point stands.
@Colin: I don't doubt that there is an in principle distinction between acceptable-with-stable-meaning and acceptable-without-stable-meaning. What I was taking issue with was the idea that the term "acceptability" should be applied to one and not the other. To me, acceptability is a behavioral measure, and I suspect that in practice these two categories don't separate so neatly (behaviorally).
I see that I may have been too bold in my claim. So let me modify it a bit both in Colin's and Karthik's direction.
I take it that Gs primarily map meanings with sounds over an infinite domain. I find it reasonable to suppose that the data most relevant for probing this mechanism are those which do this, i.e. produce a structure with both a sound side and a meaning side. Now, curiously, many sentences that are unacceptable are nonetheless quite meaningful. In fact, so far as I can tell this is the norm. So fixed subject violations are nonetheless easy to understand and even relative clause island violations are hardly incomprehensible (i.e. it's pretty clear what they WOULD mean were they ok). So the norm is that a sentence sounds funny but is comprehensible. There are other cases (relatively few in number) where sentences sound right but it is very unclear what they COULD mean, or where the interesting reading is just unavailable. Examples include 'why' extraction from strong islands and, until Colin corrected me, my view of sentences like 'more people visited Rome than I did.' The latter sounds quite good to me, but I have no idea what it could mean.
Now Karthik notes that this kind of sentence might be as informative as Chomsky's famous 'colorless…' clauses. These too have no sensible interpretation yet tell us that words come in categories that Gs care about. So maybe here we find the same thing: this is of the right form for a comparative. Or, maybe just as useful, it is easily confused for something that is generable with a meaning and a sound. From what Colin says, I assume that these are confused with '(many/some) people visited Rome more often than I did.' This said, the actual sentence does not mean this and cannot. Rather people will assume/assimilate what they heard to something sensible like this for whatever reason. Is this interesting? Sure. So I need to retract.
That said, I think that the core cases we use are those where the relevant linguistic objects are semantically sensible but "off" for some other reason. Why? Because if something is NOT semantically sensible then it is plausible that its unacceptability has nothing to do with the grammar at all and as our interest is in probing the structure of Gs and FL/UG we control for these sorts of cases. So, I think that the core data has been meaningful-but-nonetheless-odd.
That said, Omer and Karthik have rightly nailed me: how does the fall back position sound? Meaningful and well formed?
Somewhat related to Omer’s concerns. If one links acceptability to stable meaning/interpretation, what happens to sentences like “colourless green ideas sleep furiously”? (a) What would such sentences be showing us? (b) What would the status of such sentences be then? Wasn’t the whole point that such sentences seem much more “acceptable” than word-salads using the same words despite being devoid of meaning?
So, I just talked to Paul and I would now add a further point to the one above. First, recall the point of Chomsky's 'colorless' sentence. It was not to show that there was syntax even in the absence of semantic content. Rather it was to show that the acceptability of a sentence could not be reduced to its probability. The sentence was constructed so that no two words were likely to ever be paired together. So you will never find 'colorless-green,' nor 'green-ideas,' nor 'ideas-sleep,' nor 'sleep-furiously.' This was in response to the claim that grammaticality was irrelevant to acceptability and that all we need to do is take a look at bigram probabilities.
So, is the sentence actually semantically incoherent? Well, not so clear, as Paul pointed out. After all, it seems to have some serious implications all of which are false. So, if true then there are colorless green things. But there are none so the sentence is false. It implies that ideas sleep, which under a literal interpretation also seems false. So, the sentence has implications and they are false so the sentence is as well and so it has "content." The content is absurd and obviously false but so is 'John hugged a flying pink unicorn.' So, what do we learn from the Chomsky sentence? Well we learn that pairwise probabilities cannot explain acceptability and that something else is required and that sentences that sound odd might be odd because they are obviously false and/or too fanciful to be tolerated. Also that such sentences might be well formed and even meaningful on analogy with other sentences like 'revolutionary new ideas occur infrequently' with which they share a common structure.
Now to 'more people visited Rome than I did.' What does this sentence imply? Quite unclear. Here the puzzle is that locally the words seem fine: 'More people visited Rome' is ok though it needs linking to a comparative. It is so linked but the comparative does not deliver the right comparison set. The puzzle then is why despite its obvious incoherence (is it even false?) it sounds ok. Good question. I take it that this is what Colin is working on. Interestingly, the suggestion is not (as I understand it) that the sentence is sorta grammatical. Rather the claim is that it is confused with a grammatical one that has a fine meaning.
So what's the upshot? I think that the original Chomsky sentence might not be relevant here (though I confess that I understood it as Karthik did until about 5 minutes ago). Second, that the probes into Gs that we need are those that converge at the CI interface. This makes having a plausible interpretation an important pre-requisite for syntactic relevance, at least prima facie. OK?
It might be relevant that 'Colorless green ideas'-type sentences were very common in certain forms of poetry (surrealistic, originally mostly French, I believe) not so long ago, during and before when Chomsky was a child. I'm only familiar with a Greek instance of this (Elytis), but the combination of low transition probabilities (either in a string or a dependency tree) and standard if sometimes rather complex syntax is quite striking. So, in anticipation of colorless green ideas sleeping furiously, we have in 1939 naked feet censing the watercrests with green haste.
In the English-speaking world at least, this kind of thing is clearly a niche interest of a small number of people, but it used to be a lot more popular (& there's still quite a lot of genuine interest in Elytis in Greece); exactly why this is/was the case might be worth thinking about. At any rate, the genre is not something that Chomsky invented, but had/has an independent existence. Perhaps there are volumes of Verlaine, Baudelaire, Breton and Eluard on his shelves?
@Norbert (Part 1):
I actually went back to see how Chomsky had used “Colorless green ideas…” in the 50’s. My own original reading of that stuff was that Chomsky was making both points (a) frequency != grammaticality, (b) grammaticality != meaningfulness. Relevant quotes below:
From Three Models for the Description of Language (pg. 116):
“Whatever the other interest of statistical approximation in this sense may be, it is clear that it can shed no light on the problems of grammar. There is no general relation between the frequency of a string (or its component parts) and its grammaticalness. We can see this most clearly by considering such strings as
(14) colorless green ideas sleep furiously.”
From Syntactic Structures (pg. 15):
“Second, the notion “grammatical” cannot be identified with “meaningful” or “significant” in any sense. Sentences (1) and (2) are equally nonsensical, but any speaker of English will recognize that only the former is grammatical.
(1) Colorless green ideas sleep furiously.
(2) Furiously sleep ideas green colorless.
Similarly, there is no semantic reason to prefer (3) to (5)…”
From the second quote it is clear that the sentences were used to show that there was syntax (grammaticality) in the absence of semantic content. In fact, I suspect Chomsky used semantically nonsensical meanings to construct his 0 frequency sentences. So, at least as Chomsky intended the use of his example sentences, it was for both reasons.
@Norbert (Part 2):
>>So, is the sentence actually semantically incoherent? Well, not so clear as Paul pointed out. After all, it seems to have some serious implications all of which are false. So, if true then there are colorless green things. But there are none so the sentence is false.
It is not clear to me that “colorless green” is false. It seems to me to have some sort of basic failure in assumptions; some sort of presupposition failure (caveat: I am not a semanticist, so I might be misusing these words).
Now, I will grant you that even I can see that the way in which 'more people visited Rome than I did.' seems to be bad wrt meaning is different from “colorless green ideas…”. But what’s not clear is that this is the dividing line for acceptability.
Chomsky in Aspects (pg. 10) says this about acceptability:
“For the purposes of this discussion, let us use the term “acceptable” to refer to utterances that are perfectly natural and immediately comprehensible without paper-and-pencil analysis, and in no way bizarre or outlandish. Obviously, acceptability will be a matter of degree, along various dimensions. One could go on to propose various operational tests to specify the notion more precisely (for example, rapidity, correctness, and uniformity of recall and recognition, normalcy of intonation).” (Footnote 4 suggests looking at Miller and Isard (1963) for more info)
"immediately comprehensible" is so vague that it is not clear that it distinguishes the two types of sentences above (also the word “correctness” is indecipherable to me in this context).
Finally, moving closer to Norbert’s view :) - my own view is that as researchers we probably want to use both meaning and prosody and anything else that might be useful (speed, confidence, RTs, ERPs...) to understand grammaticality. But, we need to separate perhaps the analyst’s use of these things from the notion of acceptability itself, which as Omer mentioned is perhaps best seen as a superficial behavioral measure. Perhaps, another equally consistent view is to say that acceptability as a unified notion is ultimately not completely coherent. But, this should still not preclude a researcher from using some rule-of-thumb understanding of the concept to probe deeper, I think. I mean, what other option do we have! :)