In a recent post (here), I talked about the relation between two related concepts, acceptability and grammaticality, and noted that the first is the empirical probe that GGers have standardly used to study the second, more important concept. The main point was that there is no reason to think that the two concepts should coincide extensionally (i.e. that some linguistic object (LO) is (un)grammatical iff it is (un)acceptable). We should expect considerable slack between the two, as Chomsky noted long ago in Aspects (and Current Issues). Two things follow from this, one surprising, the other not so much. The expected fact is that there are cases where the two concepts diverge. The more surprising one is that this happens quite rarely (though this is more an impression than a quantifiable claim, at least by me) and that when it does happen it is interesting to try and figure out why.
In this post, I’d like to consider another question: how important are (or should be) notions like degree of acceptability and grammaticality? It is often taken for granted that acceptability judgments are gradient and it is often assumed that this is an important fact about human linguistic competence and, hence, must be grammatically addressed. In other words, we sometimes find the following inchoate argument: acceptability judgments are gradient, therefore grammatical competence must incorporate a gradient grammatical component (e.g. probabilistic grammars or notions like degree of grammaticality). In what follows I would like to address this ‘therefore.’ I have no problem thinking that sentences may be grammatical to some degree or other rather than simply +/- grammatical. What I am far less sure of is whether this possible notion has been even weakly justified. Let’s start with gradient acceptability.
It is not infrequently observed that acceptability judgments (of utterances) are gradient while most theories of G provide categorical classifications of LOs (e.g. sentences). Thus, though GGers (e.g. Chomsky) have allowed that grammaticality might be a graded notion (i.e. the relevant notion being multi-valued, not binary), in practice GG accounts have been based on a categorical understanding of the notion of well-formedness. It is often further mooted that we need (or methodologically desire) a tighter fit between the two notions and that, because acceptability data is gradient, we need a gradient notion of grammaticality. How convincing is this argument?
Let me say a couple of things. First, it is not clear, at least to me, that all or even most acceptability judgment data is gradient. Some facts are clearly pretty black or white with nary a shade of gray. Here’s an example of one such.
Take the sentences (1)-(3):
(1) Mary hugged John
(2) John hugged Mary
(3) John was hugged by Mary
It is a fact universally acknowledged that English speakers in search of meanings will treat (1) and (2) quite differently. More specifically, all speakers know that (1) and (2) don’t mean the same thing, that in (1) Mary is the hugger and John the thing hugged while the reverse is true in (2), that (3) is a paraphrase of (1) (i.e. that the same hugger-huggee relations hold in (3) as in (1)) and that (3) is not a paraphrase of (2). So far as I can tell, these judgments are not in the least variable and they are entirely consistent across all native speakers of English, always. Moreover, these kinds of very categorical data are a dime a dozen.
Indeed, I would go further: if asked to ordinally rank pairs of sentences, over a very wide range, speaker judgments will be very consistent and categorically so. Thus, everyone will judge the conceptually opaque ‘colorless green ideas sleep furiously’ superior to the conceptually opaque ‘ideas furiously sleep green colorless,’ and everyone will judge ‘who is it that John persuaded someone who met to hug him’ to be less acceptable than ‘who is it that persuaded someone who met John to hug him.’ As regards pair-wise relative acceptability, the data are very clear in these cases as well.
Where things become murkier is when we ask people to assign degrees of acceptability to individual sentences or pairs thereof. Here we ask not whether some sentence is better or worse than another, but ask people to provide graded judgments of stimuli (to say how much worse), e.g. to rate this sentence’s acceptability on a scale from 1 to 7 (best to worst). This question elicits graded judgments. Not for all cases, as I doubt that any speaker would be anything but completely sure that (1) and (2) are not paraphrases and that (1) and (3) depict the same events in contrast to (2). But it might be that whereas some assign a 6 to the first wh question above and a 2-3 to the second, some might give different ratings. I am even ready to believe that the same person might give different ratings on different occasions. Say that this is true. What, if anything, should we conclude about the Gs pertinent to these judgments?
One conclusion is that the Gs must be graded because (some of) our acceptability judgments are. But why believe this? Why believe that grammaticality is graded simply because how we measure it using one particular probe is graded (conceding for the moment that acceptability judgments are invariably gradient)? Surely, we don’t conclude that the melting point of lead (viz. 327.5 Celsius) is gradient just because every time we measure it we get a slightly different value. In fact, anytime we measure anything we get different values. So what? Thus, the mere fact that one acceptability measure yields gradient values implies very little about whether grammaticality (i.e. the thing measured by the acceptability judgment) is also gradient. We wouldn’t conclude this about the melting point of lead, so why conclude this about the grammaticality of ‘John saw Mary’ or the ungrammaticality of ‘who did John see Mary’?
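For those who like to see the point run, here is a toy sketch of the measurement analogy. The only number taken from the text is 327.5; the Gaussian error model, its spread and the name `measure` are my assumptions, chosen purely for illustration.

```python
import random
import statistics

TRUE_MELTING_POINT_C = 327.5  # the fixed quantity being measured (value from the text)

def measure(true_value, noise_sd, rng):
    """One noisy reading: the fixed true value plus Gaussian error (an assumed noise model)."""
    return rng.gauss(true_value, noise_sd)

rng = random.Random(0)  # seeded so the run is repeatable
readings = [measure(TRUE_MELTING_POINT_C, 0.4, rng) for _ in range(1000)]

distinct_values = len(set(readings))   # nearly every reading differs from the others
mean_reading = statistics.mean(readings)

print(distinct_values, round(mean_reading, 2))
```

The readings scatter, yet nothing about the quantity being measured is gradient: the constant at the top never changes.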
Moreover, we know a thing or two about how gradient values can arise from the interaction of non-gradient factors. Indeed, phenomena that are the product of many interacting factors can lead to gradient outputs precisely because they combine many disparate factors (think of continuous height, which is the confluence of many interacting integral genetic factors). Doesn’t this suffice to accommodate graded judgments of acceptability even if grammaticality is quite categorical? We have known forever that grammaticality is but one factor in acceptability, so we should not be surprised that the many, many interacting factors that underlie any given acceptability judgment lead to a gradient response even if every one of the contributing factors is NOT gradient at all.
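A minimal sketch of how this could go, under loudly stated assumptions: grammaticality is strictly binary, there are three other binary factors (stand-ins for things like plausibility or parsing load), and they simply add up. The additive rule, the penalty sizes and the names `rating` and `simulate` are all hypothetical; nobody knows how the real factors combine.

```python
import random
from collections import Counter

def rating(grammatical, other_factors):
    """Combine purely categorical inputs into a 1-7 score (1 = best, 7 = worst).

    Assumed, not established: an ungrammatical item costs 3 points and each
    of the other binary factors costs 1 point when it is switched on.
    """
    penalty = 0 if grammatical else 3
    return 1 + penalty + sum(other_factors)

def simulate(grammatical, rng, n=2000):
    """Histogram of ratings when the three non-grammatical factors vary at random."""
    return Counter(
        rating(grammatical, [rng.randint(0, 1) for _ in range(3)])
        for _ in range(n)
    )

rng = random.Random(1)
good = simulate(True, rng)    # ratings for grammatical items
bad = simulate(False, rng)    # ratings for ungrammatical items

print(sorted(good.items()))
print(sorted(bad.items()))
```

By construction the grammatical items can only land between 1 and 4 and the ungrammatical ones between 4 and 7, so the pooled ratings spread across the scale even though every input is a yes/no matter.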
I would go further: what is surprising is not that we sometimes get gradient responses when we ask for them (and that is what asking people to rate stimuli on a 7 point scale is doing) but that we find it easy to consistently bin a large number of similar sentences into +/- piles. That is surprising. Why? Because it suggests that grammaticality must be a robust factor in acceptability for it outshines all the other factors in many cases even when collateral influences are only weakly controlled. That is surprising (at least to me): why can we do this so reliably over a pretty big domain? Of course, the fact that we can is what makes acceptability judgments decent probes into grammatical structure, with all of the usual caveats (i.e. usual given the rest of the sciences) about how measuring might be messy.
Note, I personally have nothing against the conclusion that ‘grammatical’ is not a binary notion but multi-valued. Maybe it is (Chomsky has constantly mentioned this possibility over the years, especially in his earliest writing (see quote at end of post)). So, I am not questioning the possibility. What I want to know is why we should currently assume this? What is the data (or theoretical gain) that warrants this conclusion? It can’t just be variable acceptability judgments for these can arise even if grammaticality is a simple binary factor. In other words noting that we can get speakers to give gradient judgments is not evidence that grammars are gradient.
A second question: how would it change things (theory or practice) were it so? In other words, how do we conceptually or theoretically benefit by assuming that Gs are gradient? Here’s what I mean. I take it as a virtual given that linguistic competence rests (in part) on having a G. Thus it seems a fair question as to what kinds of Gs humans have: what kinds of structures and dependencies are characteristic of human Gs. Once we know this, we can add that the Gs humans use to produce and understand the linguistic world around them code not only the kinds of dependencies that are possible, but the probabilities of usage concerning this or that structure or dependency. In other words, I have no problem probabilizing a G. What I don’t see is how this second process of adding numbers between 0 and 1 to Gs will eliminate the need to specify the class of Gs without their probabilistic clothing. If it won’t, then whether or not Gs carry probabilities will not much affect the question of what the class of possible Gs is.
Let me put this another way: probabilities require a set of options over which these probabilities (the probability “mass” (I love that term)) are distributed. This means that we need some way of specifying the options. This is something that Gs do well: they specify the range of options to which probabilities are then added (as people like John Hale and his friends do) to get probabilistic Gs, useful for the investigation of various kinds of performance facts (e.g. how hard something is to parse). Doing this might even tell us something about which Gs are right (as Tim Hunter has recently been arguing (see here)). This all seems perfectly fine to me. But, but, but, …this all presupposes that we can usefully divide the question into two parts: what is the grammar and how are probabilities computed over these structures. And if this is so, then even if there is a sense in which Gs are probabilistic, it does not in any way suggest that the search for the right Gs is in any way off the mark. In fact, the probabilistic stuff presupposes the grammar stuff and the latter is very algebraic. Why? Because probabilities presuppose a possibility space defined by an algebra over which probabilities are then added. So the question: why should I assume that grammaticality is a gradient notion even if the data that I use to probe it is gradient (if in fact it is)? I have no idea.
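To make the two-part division concrete, here is a toy sketch. The rules and the numbers are invented for illustration (this is not anyone’s actual proposal), and the names `PCFG`, `categorical` and `expand` are hypothetical. The point is only that the numbers sit on top of a set of options that has to be specified first.

```python
from itertools import product
from math import prod

# A toy grammar with probabilities attached to each rewrite option.
# The rules and the weights are made up for illustration.
PCFG = {
    "S":  {("NP", "VP"): 1.0},
    "NP": {("John",): 0.5, ("Mary",): 0.5},
    "VP": {("V", "NP"): 1.0},
    "V":  {("hugged",): 0.6, ("saw",): 0.4},
}

def categorical(pcfg):
    """Strip the numbers: what is left is the plain set of options the G allows."""
    return {lhs: set(options) for lhs, options in pcfg.items()}

def expand(symbol, pcfg):
    """Yield (words, probability) for every string derivable from `symbol`.

    Assumes a non-recursive grammar, so the enumeration terminates.
    """
    if symbol not in pcfg:              # a terminal word
        yield (symbol,), 1.0
        return
    for rhs, p in pcfg[symbol].items():
        for combo in product(*(list(expand(s, pcfg)) for s in rhs)):
            words = tuple(w for ws, _ in combo for w in ws)
            yield words, p * prod(q for _, q in combo)

# The probability mass is spread over exactly the strings the rules generate.
language = {}
for words, p in expand("S", PCFG):
    key = " ".join(words)
    language[key] = language.get(key, 0.0) + p

print(len(language), sum(language.values()))
print("John Mary hugged" in language)   # membership is settled by the rules, not the weights
```

Changing the weights reshuffles the mass over the same strings; it never adds a string the rules do not generate. That is the sense in which the probabilistic stuff presupposes the grammar stuff.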
Let me say this another way. One aim of theory is to decompose a phenomenon to reveal the interacting sub-parts. It would be nice if we could regularly then add these subparts up together to “derive” the observable effect. However, this is only possible in a small number of cases (e.g. physics tells us how to combine forces to get a resultant force, but such laws of combination are surprisingly rare and combination is not possible even in large areas of physics). How then do we identify the interacting factors? By holding the other non-interacting factors constant (i.e. by controlling the “noise”). When successful, this allows us to identify the components of interest and even investigate their properties even though we cannot elaborate in any general way how the various factors combine or even what all of them might be. Thus, being able to control the relevant interfering factors locally (i.e. within a given “experiment”) does not imply that we can globally identify all that is relevant (i.e. identify all potentially relevant factors ahead of time). The demand that Gs be gradient sounds like the demand (i) that we eschew decomposing complex phenomena into their subparts or (ii) that we cannot study parts of a complex phenomenon unless we can explain how the parts work together to produce the whole. The first demand is silly. The second is too demanding. Of course, everyone would love to know how to combine various “forces” to yield a resultant one. But the inability to do this does not mean that we have failed to understand anything about the interacting components. Rather, it only implies what is obvious: that we still don’t understand exactly how they interact. This is standard practice in the real sciences and demanding more from linguists is just another instance of methodological dualism.
Let me end by noting that Chomsky in his early work was happy to think that grammaticality was also a gradient notion. In particular, in Current Issues (9) he writes:
This [human linguistic NH] competence can be represented…as a system of rules that we can call a grammar of his language. To each phonetically possible utterance…the grammar assigns a certain structural description that specifies the linguistic elements of which it is constituted and their structural relations…For some utterances, the structural description will indicate…that they are perfectly well-formed sentences. To others, the grammar will assign structural descriptions that indicate the manner of their deviation from perfect well-formedness. Where the deviation is sufficiently limited, an interpretation can often be imposed by virtue of formal relations to sentences of the generated language.
So, there is nothing that GGers have against notions of grammatical gradience, though, as Chomsky notes, it will likely be parasitic on some notion of perfect well-formedness. However, so far as I can tell, this more elaborate notion (though mooted) has not played a particularly important role in GG. We have had occasional observations that some kinds of sentences are more unacceptable than others and that this might perhaps be related to their violating more grammatical conditions. But this has played a pretty minor role in the development of theories of grammar and, so far as I can determine, has not displaced the idea that we need some notion of absolute well-formedness to support this kind of gradient notion. So, as a practical matter, the gradient notion of grammaticality has been of minor interest (if any).
So do we need a notion like degree of grammaticality? I don’t see it. And I really do not see why we should conclude from the (putative, but quite unclear) fact that utterances are (all?) gradient in their acceptability that GG needs a gradient conception of grammar. Maybe it does, but this is a lousy argument. Moreover, if we do need this notion, it seems at this point like it will be a minor revision to what we already think. In other words, if true, it is not clear that it is particularly important. Why then the fuss? Because many confuse the tools they use with the subject matter being investigated. Those bent in an Empiricist direction are particularly prone to this tendency. If Gs are just statistical summaries of the input, then, as the input is gradient (or so it is assumed), the inferred Gs must be too. Given this conception, the really bad inference from gradient utterances to gradient sentences makes sense. Moral: this is just another reason to avoid being bent in an Empiricist direction.
 What is far less clear is that we can get a well ordering of these pair-wise judgments, i.e. if A is better than B and B is better than C then A will be better than C. Sometimes this works, but I can imagine that sometimes it does not. If I recall correctly, Jon Sprouse discusses this in his thesis.
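 The transitivity worry is easy to state mechanically. A minimal sketch, assuming judgments are recorded as (better, worse) pairs; the sentence labels and the function name `intransitive_triples` are hypothetical, and the data below are invented, not anyone’s experimental results.

```python
from itertools import permutations

def intransitive_triples(prefers):
    """Return triples (a, b, c) with a > b and b > c judged, but also c > a.

    `prefers` is a set of (better, worse) pairs; any such triple means the
    pair-wise judgments cannot be lined up into a single ordering.
    """
    items = sorted({x for pair in prefers for x in pair})
    return [
        (a, b, c)
        for a, b, c in permutations(items, 3)
        if (a, b) in prefers and (b, c) in prefers and (c, a) in prefers
    ]

# Invented judgment sets over three sentences labelled A, B and C.
consistent = {("A", "B"), ("B", "C"), ("A", "C")}
cyclic = {("A", "B"), ("B", "C"), ("C", "A")}

print(intransitive_triples(consistent))
print(intransitive_triples(cyclic))
```

 A single cycle shows up once per rotation of the triple, so the count of hits is a multiple of three.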
 Connectionists used to endorse the holistic idea that decomposing complex phenomena into interacting parts falsifies cognition. The whole brain does stuff and trying to figure out what part of a given effect was due to which cognitive powers was taken to be wrong-headed. I have no idea if this is still a popular view, but if it is, it is not a view endorsed in the “real” sciences. For more discussion of this issue see Geoffrey Joseph’s “The many sciences and the one world,” J of Philosophy December 1980. As he puts it (786):
The scientist does not observe, hypothesize and deduce. He observes, decomposes, hypothesizes, and deduces. Implicit in the methodology of the theorists to whom we owe our knowledge of the laws of the various force fields is the realization that one must formulate theories of the components and leave for the indefinite future the task of unifying the resulting subtheories into a comprehensive theory of the world…A consequence of this feature of his methodology is we are often in the position of having very well-confirmed fundamental theories at hand, but at the same time being unable to formulate complete deductive explanations of natural (complex) phenomena.
If they cannot do this in physics, it would be surprising if we could do this in cognition. Right now aiming to have a comprehensive theory of acceptability strikes me as aiming way too high. IMO, we will likely never get this, and, more importantly, it is not necessary for trying to figure out the structure of FL/UG/G that we do. Demanding this in linguistics while even physics cannot deliver the goods is a form of methodological sadism, best ignored except in the privacy of your lab meetings between consenting researchers.