Monday, October 19, 2015

What's in UG (part 3)

Here is the third and final post on the CHY paper (see here and here).

The second CHY argument goes as follows: (i) the clear categorical complementary distribution of BT-anaphors and pronominals that one finds in languages like English is merely a preference in other languages and (ii) ungrammaticality implies categorical unacceptability. In other words, mere preference (i.e. graded acceptability) is a sure indicator that the acceptability difference cannot reflect G structure.[1] This argument form is one that we’ve encountered before (see here) and it is no more compelling here than it was there, or so I will again argue. Let’s go to the videotape for details.

What is the CHY case of interest? It describes two dialects of Malay. In one, the categorical judgments found in English are replicated (call this M-1). In the other, the same kinds of sentences evoke preference judgments rather than categorical judgments (call this M-2). The argument is that because Gs only license categorical judgments, M-2’s preferences cannot be explained Gishly. But as M-1 and M-2 are so similar, whatever account is offered for one must extend to the other. Thus, because the account of M-2 cannot be a Gish one, the account of M-1 can’t be either. That’s the argument. It is not very good unless one accepts that categorical (un)acceptability is a necessary property of (un)grammaticality. Reject this and the argument goes nowhere. And reject this we should. Here’s why.

The basic judgment data that linguists use involve relative acceptability (usually under an interpretation). Sometimes, the relevant comparison class is obvious and the data is so clean that we can treat the data as categorical (as I argued here, I think that this is not at all uncommon). However, little goes awry if judgment data is glossed in terms of relative acceptability and virtually all the data can be so construed. Now, in these terms, the perception that some judgments (acceptability under an interpretation in this case) are preferable to others is perfectly serviceable. And it may (need not, but may) reflect underlying Gish properties. It will depend on the case at hand.

I mention this because, as noted, CHY describes M-1 and M-2 as making the same distinctions but with M-1 judgments being categorical and M-2 being preferences. CHY concludes that these should be treated in the same way. I agree. However, I do not see how this implies that the distinction is non-grammatical, unless one assumes that preferences cannot reflect underlying grammatical form.  CHY provides no argument for this. It takes it as obvious. I do not.

Is there anything to recommend CHY’s assumption? There is one line of reasoning that I can think of. How is one to explain gradient acceptability (aka preference) if one takes grammaticality to be categorical? This is the question extensively discussed here (see the discussion thread in particular) and here. The problem in the domain of island phenomena is that even when we find the diagnostic marks of islandhood (super additivity effects) and we conclude that there is “subliminal” ungrammaticality, we are left asking why for speakers of one G the effects of ungrammaticality manifest themselves in stronger unacceptability judgments than for speakers of another G. In other words, why, if island violations are ungrammatical, do some find them relatively acceptable? The same question arises in the case of the binding data that CHY discusses. And in both instances the question raised is a good one. What kind of answer should/might we expect?

Here’s a proposal: we should expect ungrammaticality ceteris paribus to get reflected in categorical judgments of unacceptability. However, ceteri are seldom paribused. We know that lots goes into an acceptability judgment and it is hard to keep all things equal. So for example, it is not inconceivable that sentences with many probable parses are more demanding performance-wise than those without. More concretely, imagine a sentence where the visible functional surface vocabulary (FSV) fails to make clear what the underlying structure is. I am assuming, as is standard, that functional vocabulary can be a guide to underlying form (hence the relative “acceptability” of Jabberwocky). Say that in some languages the surface morphology is more closely correlated with the underlying syntactic categories than in others. And say that where the correlation is looser this creates problems mapping from the utterance to the underlying G form. And say that this manifests itself in muddier (un)acceptability judgments. To say this another way: the less ambiguous the mapping from surface forms to underlying forms, the more categorical the judgment will be. If something like this is right, then if we find a language where the morphology does not disambiguate BT-anaphors from exempt anaphors, we might expect acceptability to be less than categorical. Think of it as the acceptability judgment averaging over the two G possibilities (maybe a weighted average). On this scenario, the absence of a “dedicated reflexive” form (see CHY p. 9) in M-2 will make it harder to apply BT than in a language where there is a dedicated form, as in M-1. Note, this is consistent with the assumption that in both languages the G distinguishes well-formed forms from ill-formed forms. However, it is harder to “see” this in M-2, given the obscurity of the surface FSVs, than it is in M-1 where the distinction has been “grammaticalized.”[2]
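To make the "weighted average" idea concrete, here is a toy sketch. Everything in it is an assumption made purely for illustration: the function name, the 0-to-1 acceptability scale, and the parse probabilities are invented, and nothing here comes from CHY or from any measured data. The only point is that a G that is categorical for each parse can still yield a graded judgment when the surface form leaves the parse uncertain.

```python
def expected_acceptability(p_bt_anaphor, acc_if_bt_anaphor, acc_if_exempt):
    """Average the per-parse acceptabilities, weighted by parse probability.

    p_bt_anaphor      -- hypothetical probability that the hearer parses the
                         form as a BT-anaphor (1 - p is the exempt-anaphor parse)
    acc_if_bt_anaphor -- acceptability the G assigns under the BT-anaphor parse
    acc_if_exempt     -- acceptability the G assigns under the exempt parse
    """
    if not 0.0 <= p_bt_anaphor <= 1.0:
        raise ValueError("p_bt_anaphor must lie between 0 and 1")
    return (p_bt_anaphor * acc_if_bt_anaphor
            + (1.0 - p_bt_anaphor) * acc_if_exempt)


# In both toy dialects the G itself is categorical: a non-locally bound form
# is out (0.0) if it is a BT-anaphor and fine (1.0) if it is an exempt anaphor.
ACC_BT, ACC_EXEMPT = 0.0, 1.0

# M-1-like case (assumed): a dedicated reflexive form, so the parse is certain.
m1_judgment = expected_acceptability(1.0, ACC_BT, ACC_EXEMPT)

# M-2-like case (assumed): the form is ambiguous, here split evenly.
m2_judgment = expected_acceptability(0.5, ACC_BT, ACC_EXEMPT)

print(m1_judgment, m2_judgment)
```

On these made-up numbers the M-1-like judgment comes out at the floor of the scale while the M-2-like judgment lands in the middle, even though the underlying G treats the two dialects identically.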

I mention this option because it is consistent with everything that CHY discusses and, as I hope is evident, it leaves the question of the UG status of BT untouched. In short, in this particular case it is easy enough to cook up an explanation for why binding judgments in M-2 are murkier than those in M-1 while still assuming that both reflect the operations of a common UG.[3] Thus, the CHY conclusion is not only based on a debatable premise, but in this particular case there is a pretty obvious way of explaining why the two dialects might provide different acceptability judgments. I should also add that this little story I’ve provided is more than CHY offers. Here’s what I mean.

Curiously, CHY does not explain how M-1 and M-2 are related except to say that M-1 has grammaticalized a distinction that M-2 has not. Which? M-1 has grammaticalized the notion of anaphor. What kind of process is “grammaticalizing”? CHY does not say. It does not provide an account of what grammaticalization actually is; it only points to some of its effects in Malay and suggests that this goes on in creolization. Nor does CHY explain how undergoing grammaticalization renders preferences in the pre-grammaticalization period categorical in the post-grammaticalization period. In CHY “grammaticalization” is Molièresque.[4] Let me offer a proposal of what grammaticalization is (actually this is implicit in CHY’s discussion).

Here’s one proposal: grammaticalization involves sharpening the FSV so that it more directly reflects the underlying G structure. In other words, grammaticalization is a process that aligns surface functional vocabulary with underlying grammatical forms. It might even be the case that language change is driven to sharpen this alignment (though, in my personal opinion, I doubt that the force is very strong. Why? Because FSV muddies over time as well as sharpens, and lots of FSV is very misleading). But if this is what grammaticalization is, then it can hardly challenge the UG nature of BT, as it presupposes it. Grammaticalization is the process whereby the underlying categories of FL/UG act as attractors for overt functional morphology (i.e. LADs try to treat visible functional vocabulary as reflecting underlying G categories and so over time the surface functional vocabulary will come to (more) perfectly delineate UG cleavages). In fact, on this view, CHY inadvertently argues FOR the UG nature of BT, for it assumes that grammaticalization is the operative process linking M-1 and M-2.

As noted, CHY does not explain what grammaticalization is (nor, to my knowledge, has anybody else), though it does note what drives it. It is the usual suspect in such cases: the facilitation of processing (p. 17). Unfortunately, even were this so (and I am skeptical that this actually means anything), it leaves unexplained how languages like M-2 could exist. After all, if processing ease is a good thing, then why should only M-1 partake? The answer must be that something stops M-2 from enjoying the fruits of parsing efficiency. What might this be? Well, how about the fact that the PLD only murkily maps the binding-relevant FSVs (i.e. the surface forms of the anaphoric morphemes) onto the relevant underlying grammatical categories? But, as noted, if this is what grammaticalization is and what it does, then it is not merely compatible with the view that BT is part of UG and that it is innate, but virtually presupposes that something like this must be the case. Attractors cannot attract without existing.

Let me end here, with a diagnosis of what I take the fundamental error that drives the CHY discussion to be. It is not a new mistake, but one that is, sadly, endemic. It rests on the confusion between Greenberg and Chomsky Universals. CHY assumes that BT aims to catalogue the surface distribution of overt morphemes. On this construal, BT is indeed not universal (as even a rabid nativist like me would concede). It is clearly not the case that languages all distinguish overt morphological categories subject to different BT principles. Some languages don’t have a clearly demarcated distinction between overt anaphors and pronominals among their FSVs; some don’t even have dedicated overt functional forms for reflexivization or pronominalization. If one understands UG as committing hostages to surface functional morphology, then CHY is right that BT is not universal. However, this is not how GGers ever understood (or, more accurately, ever should have understood) UG and universals. Chomsky universals are not Greenberg universals. They are more abstract and can be hard to discern from the surface (btw, this is what makes them interesting, IMO). Thus, criticizing BT because it is wrong when understood in Greenbergian terms is not much of a criticism, given that it was never supposed to be so understood (i.e. another Dan Everett moment (see here)). What is surprising is that the distinction between the two kinds of universals seems so difficult for linguists to grasp. Why is this?

Here’s an unfair (though I believe close to accurate) speculation: it results from the confluence of two powerful factors (i) the attraction of Empiricist conceptions of learning and (ii) the fascination with language diversity.

The first is a horse that I have hobbied on many times before. If you think that acquisition is largely inductive, then universals without clear surface reflexes are a challenging concept. Being Eish with a taste for universals leads one naturally, and erroneously, to understand Chomsky universals as Greenberg universals (as Greenberg universals are the only ones that Eism tolerates).

The second force leading to the confusion between Greenberg and Chomsky universals comes from a fascination with linguistic variation (clearly something that is at the center of CHY). FL/UG rests on the idea that underlyingly there is very little real G variation. If one’s interest is in variation, then this notion of UG will seem way off track. Just look at all the differences! To be told that this is just surface morphology will seem unhelpful at best and hostile at worst. The natural response is to look for helpful typological universals and these, not surprisingly, will be Greenbergian. Here the generalizations concern surface patterns, as do typological differences if Chomsky’s conception of FL/UG is on the right track. Typological interests do not require embracing a Greenberg conception of universals (unlike a commitment to Eism, which does). However, it is, I believe, a constant temptation. The fact is that a Chomskyan conception of UG is consistent with the view that there are very few (if any) robust typological (i.e. surface true) universals. UG in Chomsky's sense doesn’t need them. It just needs a way of mapping overt forms to underlying forms. In other words, UG needs to be coupled with a theory of acquisition, but this theory does not require that there be surface true universals. Of course, there may be some, but they are not conceptually required.

So that’s it. CHY’s conclusions only follow from a flawed understanding of what a universal is and what UG enjoins. The argument is not very good. Sadly, it might well be influential, which is why I spent so much effort trying to dismember it. It appeared in an influential cog sci journal. It will be read as undermining the notion of UG and the relevance of PoS reasoning. It will do so not because the arguments are sound but because this is a welcome conclusion to many. I strongly suggest that GGers educate their psycho counterparts and explain to them why Cognition has once again failed to understand what Chomskyan linguistics is all about. I also suggest that understanding a PoS argument be placed at the center of the field’s pedagogical concerns. It really helps to know how to construct one.

[1] I have no idea why this assumption is so robust among linguists. I don’t believe that anyone ever argued the case, and many have argued against it. So, for example, Chomsky explicitly denies that one could operationalize grammaticality in terms of (categorical, or otherwise) judgments of acceptability (see, e.g. Current Issues: 7-9 and chapter 3). In fact, there is little reason to believe that there can be operational criteria of FL/UG notions of grammaticality, as holds true for any interesting abstract scientific notions (see Current Issues: 56-7). If this is correct, then systematic preference judgments might be just as revealing of underlying grammatical form as categorical judgments. At any rate, the assumption that preferences exclusively reflect extra-grammatical factors is tendentious. It really depends.
[2] I return in a moment to explicate this term.
[3] It is actually harder to come up with a good story for variable island effects, though as I’ve mentioned before I believe that Kush’s ambiguity hypothesis is likely on the right track.
[4] As in: why does this morphine put you to sleep? In virtue of its dormitive powers. 
I should add that CHY needs to offer an account of what the process consists in, on pain of undermining its main argument. The argument is that M-1 and M-2 are sensitive to the same distinction. But this distinction cannot be a Gish one because Gish ones result in categorical judgments. But the M-2 judgments are not categorical; therefore the distinction cannot be grammatical. How then to explain M-1 judgments? Well, they are categorical because the non-G distinction in M-2 has been grammaticalized in M-1. This invites the obvious question: what’s the output of the process of grammaticalization? It sounds like the end product is to render the distinction a grammatical one. But if this is so, then the premise that the distinction is the same in both M-1 and M-2 fails, for what is a non-grammatical distinction in M-2 is a grammatical distinction in M-1. The only way to explicate what is going on in the argument is to specify what grammaticalization is and what it does. CHY does not do this.