Faculty of Language: ‘I’ before ‘E’: asterisks and ungenerability

If (1) is a sentence, then what is (2)? More importantly, what does the asterisk in (2) signify?

(1) A doctor might have been there

(2) *A doctor might been have there

And do the asterisks in (2-5) have the same significance?

(3) *Was the guest who fed waffles fed the parking meter?

(4) *Colorless green ideas sleep furiously

(5) *The rat a cat the dog chased chased hid

In conversations, I’ve been assured that practitioners know what they’re doing, and that only troublemakers ask such questions. But often enough, my assurers disagree about what the standard “data marks” mean, especially if asked how ‘*’ differs from ‘#’ and ‘??’. I’m certain that many good students are, quite understandably, confused about how these marks get used. And as a philosopher, I get paid to make trouble. (It’s nice work if you can get it.)

Chapter one of Aspects is, in my view, very clear and very right about the distinction between acceptability and grammaticality. So for present purposes, I’ll take it as given that asterisks indicate a kind of unacceptability that ordinary speakers can detect and report, while grammaticality is a theoretical notion that ordinary speakers may not have (pace Michael Devitt’s Ignorance of Language). But one wants to know how data regarding unacceptability can be evidence for or against theories of implementable procedures that connect articulations with meanings. For example, how does the oddity of (3) bear on theories of Human I-Languages?

As previously discussed, (3) has the meaning of (3a), which is weird but grammatical.

(3a) *The guest who fed waffles was fed the parking meter?

But (3) would be acceptable if it could be understood as having the meaning of (3b).

(3b) #The guest who was fed waffles fed the parking meter?

Here, ‘#’ indicates that (3) cannot be understood as (3b) is understood. Theorists can describe this point about (3) in terms of a grammatical constraint: the auxiliary verb ‘was’ cannot be displaced from the relative clause ‘who was fed waffles’; though it can be displaced from the verb phrase ‘was fed the parking meter’. But whatever the pronunciation, it’s weird to talk about feeding waffles or feeding a guest a parking meter. And since (6) is fine,

(6) Was the guest who fed the parking meter fed waffles?

it seems clear that (3) has a grammatical reading, viz. (3a). But then the unacceptability of (3) reflects a cluster of facts. The perfectly fine thought indicated with the acceptable sentence (3b) can’t be expressed with (3); and the thought that can be expressed, indicated with (3a), is weird.

In one respect, (3) is like (4): grammatical and hence meaningful as opposed to gibberish, yet apt to elicit a “bizarreness reaction” in competent speakers who know some things about waffles, guests, ideas, sleep, etc. In another respect, (3) is like (2), which can’t be understood as having the meaning of (1). It’s quite interesting that (2) is nearly word salad, as opposed to a comprehensible second way of expressing the thought expressed with (1). Compare (7),

(7) *The child seems sleeping

which is a degraded but still comprehensible way of saying that the child seems to be sleeping.

Moral: if (3) is like (4) in one respect, and like (2) in another respect, then it’s important to distinguish sources of unacceptability. But it’s hard to see how one can make the requisite distinctions without positing a procedure that generates articulation-meaning pairs, where the generable meanings are in turn related to mental representations that may or may not be reasonable depictions of language-independent reality. To repeat, the “data point” noted with the asterisk in (3) reflects a cluster of underlying facts: the perfectly reasonable question indicated with (3b) cannot be expressed with (3); and the expressible query, indicated with (3a), is bizarre.

Famously, the sources of unacceptability differ across (2), (4), and (5). It turns out that (5) is a sentence that can be paraphrased with (5a), which is long and awkward, but not crazy.

(5a) The rat that was chased by a cat which the dog chased (was a rat that) hid

Prima facie, the anomaly of (5) has to do with memory limitations and center embedding, as opposed to either constraints on generability of expressions or the kind of “conceptual boggle” that attends thoughts of feeding waffles. This reminds us that Human I-languages can and evidently do (generatively) connect pronunciations with meanings that may go unrecognized by those who have the I-languages, even in cases that involve a relatively small number of words. As Chomsky also noted, it takes work to hear all the possible readings of (8).

(8) I almost had my wallet stolen.
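
The point about (5) can be illustrated with a small sketch: a recursive procedure has no trouble generating center-embedded strings at any depth, even though strings like (5) are hard to parse. The function names and word lists below are hypothetical, and nothing here models memory or parsing; the code only shows that generability and ease of parsing can come apart.

```python
def embed(nouns, verbs):
    """Return a noun phrase in which each later noun is center-embedded.

    verbs[i] is the verb whose subject is nouns[i + 1] and whose object
    is nouns[i], so len(verbs) must equal len(nouns) - 1.
    """
    if len(verbs) != len(nouns) - 1:
        raise ValueError("need exactly one verb per embedded noun")
    if not verbs:
        return nouns[0]
    # The inner clause sits between the head noun and the verb that takes it as object.
    inner = embed(nouns[1:], verbs[1:])
    return f"{nouns[0]} {inner} {verbs[0]}"


def sentence(nouns, verbs, main_verb):
    """Add the main verb after the (possibly embedded) subject."""
    return f"{embed(nouns, verbs)} {main_verb}"


# One level of embedding, then the two levels found in (5).
print(sentence(["the rat", "a cat"], ["chased"], "hid"))
print(sentence(["the rat", "a cat", "the dog"], ["chased", "chased"], "hid"))
```

The second call prints the word-string of (5). Adding a fourth noun and a third verb would be just as easy for the procedure, which is the sense in which the difficulty of (5) is not a matter of generability.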

Though the point regarding (3) is a little different. Given a way of classifying examples like (5) as “grammatical but hard to parse,” one might set such data points aside and try to construct an algorithm that classifies other strings as acceptable or not. But examples like (4) remind us that some unacceptable strings are easily parsed “linearizations” of generable expressions that exhibit perfectly fine grammatical structure. Of course, as (2) illustrates, many unacceptable strings are not linearizations of any generable expressions. So if the aim is to provide a theory of human linguistic understanding, one needs to specify an algorithm that pairs the pronunciation of (4) with its meaning and doesn’t pair the pronunciation of (2) with any meaning that it doesn’t have.

Put another way, (2) presents a special case of unambiguity: zero meanings, as opposed to even one; whereas (3) has one meaning but not two. That is, (2) is not the linearization of any generable structured expression that connects the pronunciation in question with a meaning. It’s not that (2) is a sentence, albeit an unacceptable one. That’s the situation with regard to (3). But the more interesting thing about (3) is that its pronunciation isn’t linked to the meaning of (3b). And the interesting thing about (2) is that its pronunciation isn’t linked to the meaning of (1).
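
A toy sketch may help make the counting concrete. The Python below is an illustration under invented assumptions, not a proposal about Human I-languages: the fragment, the function names (`expressions`, `question`, `meanings`), and the representation of a meaning as a pair of (voice, object) clauses are all hypothetical. It generates structured expressions, linearizes a question only by displacing the main-clause ‘was’, and then asks how many generable expressions share a given pronunciation.

```python
from itertools import product

# Toy fragment, invented for illustration: every name below is hypothetical.
OBJECTS = ["waffles", "the parking meter"]


def expressions():
    """Generate structured expressions as (relative clause, main clause) pairs.

    Each clause is (voice, object), with voice "active" or "passive".
    """
    clauses = list(product(["active", "passive"], OBJECTS))
    return product(clauses, clauses)


def clause_words(voice, obj):
    # A passive clause is pronounced with the auxiliary 'was'.
    return (["was"] if voice == "passive" else []) + ["fed", obj]


def question(rel, main):
    """Linearize a yes/no question, or return None if none is generable.

    Only the main-clause auxiliary may be displaced to the front; the
    auxiliary inside the relative clause stays where it is.
    """
    main_voice, main_obj = main
    if main_voice != "passive":
        return None  # no main-clause 'was' to displace
    words = ["was", "the guest", "who"] + clause_words(*rel) + ["fed", main_obj]
    return " ".join(words)


def meanings(string):
    """Return every generable expression whose question is pronounced as `string`."""
    return [(rel, main) for rel, main in expressions()
            if question(rel, main) == string]


s3 = "was the guest who fed waffles fed the parking meter"
print(meanings(s3))  # one pair: active relative clause, passive main clause
print(meanings("was fed the guest who waffles the parking meter"))  # []
```

On this sketch, the pronunciation of (3) is paired with the (3a) meaning and nothing else, since the expression corresponding to (3b) has no main-clause auxiliary to displace. A scrambled string from the same vocabulary, standing in for (2), is paired with nothing at all.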

It’s also true that the pronunciation of (2) isn’t linked to the meaning of (5a). But that’s hardly surprising. For any word-string, there are endlessly many meanings it doesn’t have. Yet some of those non-meanings can be built up from the word meanings in ways that initially seem no more complicated than the ways that actual expression meanings are built up from word meanings. So absence of homophony, with word salad as a special case, can provide valuable clues about the procedures that generate pronunciation-meaning pairs. And if Human Languages are such procedures, then the data that linguists typically use can indeed serve as evidence for or against theories of Human Languages. Those who don’t adopt an I-language perspective need to say what their target of inquiry is such that it still makes sense to be using the same data.

We probably also need a graded, multi-dimensional notion of grammaticality. For me, (9) is OK though marginal on its only reading. (Did someone find a helpful dog for any vet?)

(9) Was a vet found a dog that helped?

Perhaps (9) deserves a mark less harsh than an asterisk. But in any case, the clear unacceptability of (10) reflects the unavailability of meaning (10a) and the unacceptability of (10b).

(10) *Was a vet helped a dog that found?

(10a) #A vet helped a dog that was found?

(10b) *A vet was helped a dog that found?

The unacceptability of ‘was helped a dog that found’, unlike that of ‘fed waffles’, may well be due to grammar. But to make such distinctions, we need to talk about constrained generative procedures, not conditions on logically possible outputs that might or might not be generable.

Sunday, November 18, 2012
