Up comes a linguist-in-the-street interviewer and asks: “So NH, what would grad students at UMD find to be one of your more annoying habits?” I would answer: my unrelenting obsession with forever banishing from the linguistics lexicon the phrase “grammaticality judgment.” As I never tire of making clear, usually in a flurry of red ball-point scribbles and exclamation marks, the correct term is “acceptability judgment,” at least when used, as it almost invariably is, to describe how speakers rate some bit of data. “Acceptability” is the name of the scale along which such speaker judgments array. “Grammaticality” is how linguists explain (or partly explain) these acceptability judgments. Linguists make grammaticality judgments when advancing one or another analysis of some bit of acceptability data. I doubt that there is an interesting scale for such theoretical assessments.
Why the disregard for this crucial difference among practicing linguists? Here’s a benign proposal. A sentence’s acceptability is prima facie evidence that it is grammatical and that a descriptively adequate G should generate it. A sentence’s unacceptability is prima facie evidence that a descriptively adequate G should not generate it. Given this, using the terms interchangeably is no big deal. Of course, not all facies are prima and we recognize that there are unacceptable sentences that an adequate G should generate and that some things that are judged acceptable nonetheless should not be generated. We thus both recognize the difference between the two notions, despite their intimate intercourse, and interchange them guilt free.
On this benign view, my OCD behavior is simple pedantry, a sign of my inexorable aging and decline. However, I recently read a paper by Katz and Bever (K&B) that vindicates my sensitivities (see here), which, of course, I like very much and am writing to recommend to you (I would nominate it for classic status). Of relevance here, K&B argue that the distinction between grammaticality and acceptability is an important one and that blurring it often reflects the baleful influence of that most pernicious intellectual habit of mind, EMPIRICISM! I have come to believe that K&B are right about this (as well, I should add, about many other things, though not all). So before getting into the argument regarding acceptability and Empiricism, let me recommend it to you again. Like an earlier paper by Bever that I posted about recently (here), this is a Whig History of an interesting period of GG research. There is a sustained critical discussion of early Generative Semantics that is worth looking at, especially given the recent rise of interest in these kinds of ideas. But this is not what I want to discuss here. For the remainder, let me zero in on one or two particular points in K&B that got me thinking.
Let’s start with the acceptability vs grammaticality distinction. K&B spend a lot of time contrasting Chomsky’s understanding of Gs and Transformations with Zellig Harris’s. For Harris, Gs were seen as compact ways of cataloguing linguistic corpora. Here is K&B (15):
…grammars came to be viewed as efficient data catalogues of linguistic corpora, and linguistic theory took the form of a mechanical discovery procedure for cataloguing linguistic data.
Bloomfieldian structuralism concentrated on analyzing phonology and morphology in these terms. Harris’s contribution was to propose a way of extending these methods to syntax (16):
Harris’s particular achievement was to find a way of setting up substitution frames for sentences so that sentences could be grouped according to the environments they share, similar to the way that phonemes or morphemes were grouped by shared environments…Discourse analysis was…the product of this attempt to extend the range of taxonomic analysis beyond the level of immediate constituents.
Harris proposed two important conceptual innovations to extend Structuralist taxonomic techniques to sentences: kernel sentences and transformations. Kernels are a small “well-defined set of forms,” and applying transformations to kernels “yields all the sentence constructions of the language” (17). The cooccurrence restrictions that lie at the heart of the taxonomy are stated at the level of kernel sentences. Transformations of a given kernel define an equivalence class of sentences that share the same discourse “constituency.” K&B put this nicely, albeit in a footnote (16:#3):
In discourse analysis, transformations serve as the means of normalizing texts, that is, of converting the sentences of the text into a standard form so that they can be compared and intersentence properties [viz. their cooccurrences, NH] discovered.
So, for Harris, kernel sentences and transformations are ways of compressing a text’s distributional regularities (i.e. “cataloguing the data of a corpus” (12)).
This is entirely unlike the modern GG conception due to Chomsky, as you all know. But in case you need a refresher: for modern GG, Gs are mental objects, internalized in the brains of native speakers, that underlie their ability to produce and understand an effectively unbounded number of sentences, most of which have never before been encountered (a.k.a. linguistic creativity). Transformations are a species of rule that these mental Gs contain, mapping meaning-relevant levels of G information to sound-relevant (or articulator-relevant) levels of G information. Importantly, on this view, Gs are not ways of characterizing the distributional properties of texts or speech. They are (intended) descriptions of mental structures.
As K&B note, so understood, much of the structure of Gs is not surface visible. The consequence?
The input to the language acquisition process no longer seems rich enough and the output no longer simple enough for the child to obtain its knowledge of the latter by inductive inferences that generalize the distributional regularities found in speech. For now the important properties of the language lie hidden beneath the surface form of sentences and the grammatical structure to be acquired is seen as an extremely complex system of highly intricate rules relating the underlying levels of sentences to their surface phonetic form. (12)
In other words, once one treats Gs as mental constructs the possibility of an Empiricist understanding of what lies behind human linguistic facility disappears as a reasonable prospect and is replaced by a Rationalist conception of mind. This is what made Chomsky’s early writings on language so important. They served to discredit empiricism in the behavioral sciences (though ‘discredit’ is too weak a word for what happened). Or as K&B nicely summarize matters (12):
From the general intellectual viewpoint, the most significant aspect of the transformationalist revolution is that it is a decisive defeat of empiricism in an influential social science. The natural position for an empiricist to adopt on the question of the nature of grammars is the structuralist theory of taxonomic grammar, since on this theory every property essential to a language is characterizable on the basis of observable features of the surface form of its sentences. Hence, everything that must be acquired in gaining mastery of a language is “out in the open”; moreover, it can be learned on the basis of procedures for segmenting and classifying speech that presuppose only inductive generalizations from observable distributional regularities. On the structuralist theory of taxonomic grammar, the environmental input to language acquisition is rich enough, relative to the presumed richness of the grammatical structure of the language, for this acquisition process to take place without the help of innate principles about the universal structure of language…
Give up the idea that Gs are just generalizations of the surface properties of speech, and the plausibility of Empiricism rapidly fades. Thus, enter Chomsky and Rationalism, exit taxonomy and Empiricism.
The shift from the Harris Structuralist to the Chomsky mentalist conception of Gs naturally shifts interest to the kinds of rules that Gs contain and to the generative properties of these rules. And importantly, from a rule-based perspective it is possible to define a notion of ‘grammaticality’ that is purely formal: a sentence is grammatical iff it is generated by the grammar. This, K&B note, is not dependent on the distribution of forms in a corpus. It is a purely formal notion, which, given the Chomsky understanding of Gs, is central to understanding human linguistic facility. Moreover, it allows for several conceptions of well-formedness (phonological, syntactic, semantic, etc.) that together contribute, along with other factors, to a notion of acceptability, but are not reducible to it. So, given the Rationalist conception, it is easy and natural to distinguish the various ingredients of acceptability.
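The purely formal character of this notion can be made concrete with a toy example. The mini-grammar and lexicon below are invented for illustration (they come from nowhere in K&B); the point is only that membership in the generated set is a yes/no matter, fixed by the rules alone, with no reference to corpus frequencies or how natural a string sounds:

```python
# A minimal sketch of "grammatical iff generated by the grammar."
# The toy grammar and lexicon are invented for illustration.

from itertools import product

# Toy context-free rules: a category maps to alternative daughter sequences.
RULES = {
    "S":  [["NP", "VP"]],
    "NP": [["the", "N"]],
    "VP": [["V"], ["V", "NP"]],
    "N":  [["linguist"], ["grammar"]],
    "V":  [["sleeps"], ["describes"]],
}

def expand(symbol):
    """Yield every terminal string (as a word list) the symbol rewrites to."""
    if symbol not in RULES:          # terminal word
        yield [symbol]
        return
    for daughters in RULES[symbol]:
        for parts in product(*(list(expand(d)) for d in daughters)):
            yield [w for part in parts for w in part]

def grammatical(sentence):
    """Purely formal notion: True iff the grammar generates the string."""
    return sentence.split() in list(expand("S"))

print(grammatical("the linguist describes the grammar"))  # True
print(grammatical("the grammar sleeps"))  # True: generated, however odd it sounds
print(grammatical("linguist the sleeps"))                 # False: not generated
```

Note that “the grammar sleeps” comes out grammatical here even though speakers would likely rate it as odd: generation by the rules and acceptability to speakers are exactly the two things the post is urging us to keep apart.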
A view that takes grammaticality to just be a representation of acceptability, the Harris view, finds this to be artificial at best and ill-founded at worst (see K&B quote of Harris p. 20). On a corpus-based view of Gs, sentences are expected to vary in acceptability along a cline (reflecting, for example, how likely they are to be found in a certain text environment). After all, Gs are just compact representations of precisely such facts. And this runs together all sorts of factors that appear diverse from the standard GG perspective. As K&B put it (21):
Statements of the likelihood of new forms occurring under certain conditions must express every feature of the situation that exerts an influence on likelihood of occurrence. This means that all sorts of grammatically extraneous features are reflected on a par with genuine grammatical constraints. For example, complexity of constituent structure, length of sentences, social mores, and so on often exerts a real influence on the probability that a certain n-tuple of morphemes will occur in the corpus.
Or to put this another way: a Rationalist conception of G allows for a “sharp and absolute distinction between the grammatical and the ungrammatical, and between the competence principles that determine the grammatical and anything else that combines with them to produce performance” (29). So, a Rationalist conception understands linguistic performance to be a complex interaction effect of discrete interacting systems. Grammaticality does not track the linguistic environment. Linguistic experience is gradient. It does not reflect the algebraic nature of the underlying sub-systems. ‘Acceptability’ tracks the gradience, ‘grammaticality’ the discrete algebra. Confusing the two threatens a return to structuralism and its attendant Empiricism.
Let me mention one other point that K&B make that I found very helpful. They outline what a Structuralist discovery procedure (DP) is (15). It is “explicit procedures for segmenting and classifying utterances that would automatically apply to a corpus to organize it in a form that meets” four conditions:
1. The G is a hierarchy of classes; the lower units are temporal segments of the speech event, the higher ones classes or sequences of classes.
2. The elements of each level are determined by their distributional features together with their representations at the immediately lower level.
3. Information in the construction of a G flows “upward” from level to level, i.e. no information at a higher level can be used to determine an analysis at a lower level.
4. The main distributional principles for determining class memberships at level Li are complementary distribution and free variation at level Li-1.
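The distributional core of such a procedure (conditions 2 and 4) can be sketched in a few lines. The mini-corpus below is invented for illustration, and this is of course a toy, not Harris’s actual procedure: it groups units into a class when they occur in exactly the same substitution frames, i.e. when they are in free variation with respect to their environments:

```python
# A toy sketch of the distributional step in a discovery procedure:
# units that share all their environments (substitution frames) are
# grouped into one class. The mini-corpus is invented for illustration.

from collections import defaultdict

corpus = [
    ["the", "dog", "barks"],
    ["the", "cat", "barks"],
    ["the", "dog", "sleeps"],
    ["the", "cat", "sleeps"],
]

def environments(corpus):
    """Map each unit to the set of (left, right) frames it appears in."""
    env = defaultdict(set)
    for utt in corpus:
        padded = ["#"] + utt + ["#"]        # '#' marks utterance boundaries
        for i, unit in enumerate(padded[1:-1], start=1):
            env[unit].add((padded[i - 1], padded[i + 1]))
    return env

def classify(corpus):
    """Group units whose environment sets are identical (free variation)."""
    classes = defaultdict(list)
    for unit, frames in environments(corpus).items():
        classes[frozenset(frames)].append(unit)
    return sorted(sorted(c) for c in classes.values())

print(classify(corpus))   # [['barks', 'sleeps'], ['cat', 'dog'], ['the']]
```

The procedure recovers noun-like and verb-like classes from surface distribution alone, with no appeal to a higher level: exactly the bottom-up information flow of condition 3. The trouble, as the next paragraph notes, is what happens when the properties that matter are not visible in such surface frames at all.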
Noting the structure of a discovery procedure (DP) in (1-4) allows us to appreciate why Chomsky stressed the autonomy of levels in his early work. If, for example, the syntactic level is autonomous (i.e. not inferable from the distributional properties of other levels), then the idea that DPs could be adequate accounts of language learning evaporates. And once one focuses on the rules relating articulation and interpretation, a DP for language with the properties in (1-4) becomes very implausible, or, as K&B nicely put it (33):
Given that actual speech is so messy, heterogeneous, fuzzy and filled with one or another performance error, the empiricist’s explanation of Chomskyan rules as having been learned as a purely inductive generalization from a sample of actual speech is hard to take seriously, to say the very least.
So, if one is an Empiricist, then one will have to deny the idea of a G as a rule-based system of the GG variety. Thus, it is no surprise that Empiricists discussing language like to emphasize the acceptability gradients characteristic of actual speech. Or, to put this in terms relevant to the discussion above, it is no surprise that Empiricists will understand ‘grammaticality’ as the limiting case of ‘acceptability.’
Ok, this post, once again, is far too long. Look at the paper. It’s really good and useful. It also is a useful prophylactic against recurring Empiricism and, unfortunately, we cannot have too much of that.
 Sadly, pages 18-19 are missing from the online version. It would be nice to repair this sometime in the future. If there is a student of Tom’s at U of Arizona reading this, maybe you can fix it.
 I would note that in this context corpus linguistics makes sense as an enterprise. It is entirely unclear whether it makes any sense once one gives up this structuralist perspective and adopts a Chomsky view of Gs and transformations. Furthermore, I am very skeptical that there exist Harris-like regularities over texts, even if normalized to kernel sentences. Chomsky’s observation that sentences are not “stimulus bound,” if accurate (and IMO it is), undermines the view that we can say anything at all about the distributions of sentences in texts. We cannot predict with any reliability what someone will say next (unless, of course, it is your mother), and even if we could in some stylized texts, it would tell us nothing about how the sentence could be felicitously used. In other words, there would be precious few generalizations across texts. Thus, I doubt that there are any interesting statistical regularities regarding the distribution of sentences in texts (at least understood as stretches of discourse).
Btw, there is some evidence for this. It is well known that machines trained on one kind of corpus do a piss-poor job of generalizing to a different kind of corpus. This is quite unexpected if figuring out the distribution of sentences in one text gave you a good idea of what would take place in others. Understanding how to order in a restaurant or make airline reservations does not carry over well to a discussion of Trump’s (and the rest of the GOP’s) execrable politics, for example. A long time ago, in a galaxy far far away, I very polemically discussed these issues in a paper with Elan Dresher that still gives me chuckles when I read it. See Cognition 1976, 4, pp. 321-398.
 That this schema looks so much like those characteristic of Deep Learning suggests that it cannot be a correct general theory of language acquisition. It just won’t work, and we know this because it was tried before.
 That speech is messy is a problem, but not the only problem. The bigger one is that there is virtually no evidence in the PLD for many of the G properties (e.g. ECP effects, island effects, binding effects, etc.). Thus the data is both degenerate and deficient.