Faculty of Language: What's in UG (part 1)?

Wednesday, October 7, 2015

What's in UG (part 1)?

This is the first of three posts on a forthcoming Cognition paper arguing against UG. The specific argument is against the Binding Theory. But the form is intended to generalize. The paper is written by excellent linguists, which is precisely why I spend three posts exposing its weaknesses. The paper, because it will appear in Cognition, is likely to be influential. It shouldn’t be. Here’s the first of three posts explaining why.

Let’s start with some truisms: not every property of a language-particular G is innate. Here’s another one: some features of G reflect innate properties of the language acquisition device (LAD). Let’s end with a truth (that should be a truism by now but is still contested by some for reasons that are barely comprehensible): some of the innate LAD structure key to acquiring a G is linguistically dedicated (i.e. not cognitively general but due to UG). These three claims should be obvious. True truisms. Sadly, they are not everywhere and always recognized as such. Not even by extremely talented linguists. I don’t know why this is so (though I will speculate towards the end of this note), but it is. Recent evidence comes from a forthcoming paper in Cognition (here) by Cole, Hermon and Yanti (CHY) on the UG status of the Binding Theory (BT).[1] The CHY argument is that BT cannot explain certain facts in a certain set of Javanese and Malay dialects. It concludes that binding cannot be innate. The very strong implication is that UG contains nothing like BT, and that even if it did it would not help explain how languages differ and how kids acquire their Gs. IMO, this implication is what got the paper into Cognition (anything that ends with the statement or implication that there is nothing special about language (i.e. Chomsky is wrong!!!) has a special preferential HOV lane in the new Cognition’s review process). Boy do I miss Jacques Mehler. Come back Jacques. Please.

Before getting into the details of CHY, let’s consider what the classical BT says.[2] It is divided into three principles and a definition of binding:

A. An anaphor must be bound in its domain

B. A pronominal cannot be bound in its domain

C. An R-expression cannot be bound

(1) An expression E binds an expression E’ iff E c-commands E’ and E is co-indexed with E’.

We also need a definition of ‘domain’ but I leave it to the reader to pick her/his favorite one. That’s the classical BT.

What does it say? It outlines a set of relations that must hold between classes of grammatical expressions. BT-A states that if some expression is in the grammatical category ‘anaphor’ then it must have a local c-commanding binder. BT-B states that if some expression is in the category ‘pronominal’ then it cannot have a local c-commanding binder. And BT-C states, well you know what it states, if…

Now what does BT not say? It says nothing about which phonetically visible expressions fall into which class. It does not say that every overt expression must fall into at least one of these classes. It does not say that every G must contain expressions that fall into these classes. In fact, BT by itself says nothing at all about how a given morphologically/phonetically visible expression distributes or what licensing conditions it must enter into. In other words, by itself BT does not tell us, for example, that (2) is ungrammatical. All it says is that if ‘herself’ is an anaphor then it needs a binder. That’s it.

(2) John likes herself

How then does BT gain empirical traction? It does so via the further assumption that reflexives in English are BT anaphors (and, additionally, that binding triggers morphologically overt agreement in English reflexives). Assuming this, ‘herself’ is subject to principle BT-A and assuming that John is masculine, herself has no binder in its domain, and so violates BT-A above. This means that the structure underlying (2) is ungrammatical and this is signaled by (2)’s unacceptability.
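The reasoning above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything from CHY or from the classical BT literature: the names `Node`, `satisfies_bt` and `john_likes` are made up here, the ‘domain’ is assumed to be the closest dominating clause (the post deliberately leaves that choice open), and agreement is reduced to a toy gender feature that a binder must share with its bindee.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    index: Optional[int] = None    # referential index, e.g. the 1 in John_1
    kind: Optional[str] = None     # "anaphor", "pronominal" or "r-expression"
    gender: Optional[str] = None   # toy stand-in for agreement (assumption)
    parent: Optional["Node"] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        for child in self.children:
            child.parent = self


def dominates(a: Node, b: Node) -> bool:
    """True if a properly dominates b."""
    node = b.parent
    while node is not None:
        if node is a:
            return True
        node = node.parent
    return False


def c_commands(a: Node, b: Node) -> bool:
    """Simplified c-command: neither node dominates the other and
    a's mother dominates b (assumes every mother node branches)."""
    if a is b or dominates(a, b) or dominates(b, a) or a.parent is None:
        return False
    return dominates(a.parent, b)


def binds(a: Node, b: Node) -> bool:
    """Definition (1), plus the assumed agreement requirement."""
    return (a.index is not None and a.index == b.index
            and a.gender == b.gender and c_commands(a, b))


def subtree(root: Node) -> Iterator[Node]:
    yield root
    for child in root.children:
        yield from subtree(child)


def domain(n: Node) -> Optional[Node]:
    """ASSUMPTION: the domain is the closest dominating node labelled S."""
    node = n.parent
    while node is not None and node.label != "S":
        node = node.parent
    return node


def satisfies_bt(n: Node, root: Node) -> bool:
    """Check n against principle A, B or C, depending on its category."""
    binders = [m for m in subtree(root) if binds(m, n)]
    dom = domain(n)
    local = [m for m in binders if dom is not None and dominates(dom, m)]
    if n.kind == "anaphor":
        return bool(local)      # A: must be bound in its domain
    if n.kind == "pronominal":
        return not local        # B: cannot be bound in its domain
    if n.kind == "r-expression":
        return not binders      # C: cannot be bound at all
    return True                 # unclassified: BT says nothing


def john_likes(kind: str, index: int, gender: str) -> bool:
    """Build [S John_1 [VP likes OBJ]] and check the object against BT."""
    john = Node("NP", index=1, kind="r-expression", gender="m")
    obj = Node("NP", index=index, kind=kind, gender=gender)
    root = Node("S", [john, Node("VP", [Node("V"), obj])])
    return satisfies_bt(obj, root)


print(john_likes("anaphor", 1, "f"))     # (2) John likes herself: False
print(john_likes("anaphor", 1, "m"))     # John likes himself: True
print(john_likes("pronominal", 1, "m"))  # John_1 likes him_1: False
print(john_likes("pronominal", 2, "m"))  # John_1 likes him_2: True
```

Note that the checker only does anything once each NP has been assigned a `kind`. That assignment is exactly the part BT itself does not supply.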

As stated, there is a considerable distance between a linguistic object’s surface form and its underlying grammatical one. So what’s the empirical advantage of assuming something as abstract as the classical BT? The most important reason, IMO, is that it helps resolve a critical Poverty of Stimulus (PoS) problem. Let me explain (and I will do this slowly for CHY never actually explains what the specific PoS problem in the domain of binding is (though they allude to the problem as an important feature of their investigation), and this, IMO, allows the paper to end in intellectually unfortunate places).

As BT connoisseurs know, the distribution of overt reflexives and pronouns is quite restricted. Here is the standard data:[3]

(3) a. John₁ likes herself_1/*2

b. John₁ likes himself_1/*2

c. John₁ talked to Bill₂ about himself_1/2/*3

d. John₁ expects Mary₂ to like himself_*1/*2/*3

e. John₁ expects Mary₂ to like herself_*1/2/*3

f. John₁ expects himself_1/*2/*3 to like Mary₂

g. John₁ expects (that) he/himself_*1/*2/*3 will like Mary₂

If we assume that reflexives are BT-A-anaphors then we can explain all of this data. Where’s the PoS problem? Well, lots of these data concern what cannot happen. On the assumption that the ungrammatical cases in (3) are not attested in the PLD, then the fact that a typical English Language Acquisition Device (LAD, aka, kid) converges on the grammatical profile outlined in (3) must mean that this profile in part reflects intrinsic features of the LAD. For example, the fact that kids do not generalize from the acceptability of (3f) to conclude that (3g) should also be acceptable needs to be explained, and it is implausible that the LAD infers that this is an incorrect inference by inspecting unacceptable sentences like (3g), for being unacceptable they will not appear in the PLD (the primary linguistic data the kid actually hears).[4] Thus, how LADs come to converge to Gs that allow the good sentences and prevent the bad ones looks like (because it is) a standard PoS puzzle.

How does assuming that BT is part of UG solve the problem? Well, it doesn’t, not all by itself (and nobody ever thought that it could all by itself). But it radically changes it. Here’s what I mean.

If BT is part of UG then the acquisition problem facing the LAD boils down to identifying those expressions in your language that are anaphors, pronominals and R-expressions. This is not an easy task, but it is easier than figuring this out plus figuring out the data distribution in (3). In fact, I doubt that there is any PLD able to fix the data in (3) (this is after all what the PoS problem in the binding domain consists in). And it is obvious that any theory of binding will need to have the LAD figure out (i.e. learn) using the PLD which overt morphemes (if any) are BT anaphors/pronominals (after all, ‘himself’ is a reflexive in English but not in French and I assume that this fact must be acquired on the basis of PLD). So the best story wrt Plato’s Problem in the domain of binding is one where what must obviously be learned is all that must be learned. Why? Because once I know that reflexives in English are BT anaphors subject to BT-A then I get the knowledge illustrated by the data in (3) as a UG bonus. That’s how PoS problems are solved.[5] So, to repeat: all the LAD needs to do to become binding competent is figure out which overt expressions fall into which binding categories. Do this and the rest is an epistemic freebie.

Furthermore, it’s virtually certain that the UG BT principles act as useful guides for the categorization of morphemes into the abstract categories BT trucks in (i.e. anaphor, pronominal, and R-expression). Take anaphors. If BT is part of UG it provides the LAD with some diagnostics for anaphoricity. Anaphors must have antecedents. These antecedents must be local and high enough. This means that if the LAD hears a sentence like ‘John scratched himself’ in a situation where John is indeed scratching himself then it has prima facie evidence that ‘himself’ is a reflexive (as it fits the BT-A constraints). Of course, the LAD may be wrong (hence the ‘prima facie’ above). For example, say that the LAD also hears pairs of sentences like ‘John loves Mary. She loves himself too’, where ‘himself’ is anaphoric to John. Then the LAD has evidence that reflexives are not just subject to BT-A (i.e. they are at best ambiguous morphemes and at worst not subject to BT-A at all). So, I can see how PLD of the right sort in conjunction with an innate UG-provided BT-A would help with the classification of morphemes into the more abstract categories using simple PLD.[6] That’s another nice feature of an articulate UG.
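To make the ‘prima facie’ logic concrete, here is a deliberately crude Python sketch. It is an illustration only, not a learning model and not a proposal from the literature: the function name `prima_facie_categories` and the input format are invented for the example, and it assumes (unrealistically) that the LAD already knows, for each token, whether the understood antecedent was a local c-commanding one.

```python
from collections import defaultdict


def prima_facie_categories(observations):
    """Guess a BT category for each morpheme.

    observations: iterable of (morpheme, locally_bound) pairs, where
    locally_bound is True if the understood antecedent was local and
    c-commanding in that utterance (an idealized input, by assumption).
    """
    seen = defaultdict(set)
    for morpheme, locally_bound in observations:
        seen[morpheme].add(locally_bound)

    guesses = {}
    for morpheme, values in seen.items():
        if values == {True}:
            guesses[morpheme] = "anaphor"      # fits BT-A so far
        elif values == {False}:
            guesses[morpheme] = "pronominal"   # fits BT-B so far
        else:
            guesses[morpheme] = "mixed"        # ambiguous or exempt
    return guesses


# 'John scratched himself', heard twice, plus one non-locally bound 'him'.
pld = [("himself", True), ("himself", True), ("him", False)]
print(prima_facie_categories(pld))
# {'himself': 'anaphor', 'him': 'pronominal'}

# Hypothetical 'John loves Mary. She loves himself too' datum.
pld.append(("himself", False))
print(prima_facie_categories(pld)["himself"])
# mixed
```

The second call shows why the evidence is only prima facie: one token with a non-local antecedent is enough to withdraw the anaphor guess.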

Please observe: on this view of things UG is an important part of a theory of language learning. It is not itself a theory of learning. This point was made in Aspects, and is as true today as it was then. In fact, you might say that in the current climate of Bayesian excess it is the obvious conclusion to draw: UG limns the hypothesis space that the learning procedure explores. There are many current models of how UG knowledge might be incorporated in more explicit learning accounts of various flavors (see Charles Yang’s work or Jeff Lidz’s stuff for some recent general proposals and worked out examples).

Does any of this suppose that the LAD uses only attested BT patterns in learning to classify expressions? Of course not. For example, the LAD might conclude that ‘itself’ is a BT-A anaphor in English on first encountering it. Why? By generalizing from forms it has encountered before (e.g. ‘herself’, ‘themselves’). Here the generalization is guided not by UG binding properties but by the details of English morphology. It is easy to imagine other useful learning strategies (see note 6). However, it seems likely that one way the LAD will distinguish BT-A from BT-B morphemes will be in terms of their cataphoric possibilities positively evidenced in the PLD.

So, BT as part of UG can indeed help solve a PoS problem (by simplifying what needs to be acquired) and plausibly provides guide-posts towards that classification. However, BT does not suffice to fix knowledge of binding all by itself nor did anyone ever think that it would. Moreover, even the most rabid linguistic nativist (I know because I am one of these) is not committed to any particular pattern of surface data. To repeat, BT does not imply anything about how morphemes fall into any of the relevant categories or even if any of them do or even if there are any relevant surface categories to fall into.

With this as background, we are now ready to discuss CHY. I will do this in the next post.

[1] I have been a great admirer of both Cole and Hermon’s work for a long time. They are extremely good linguists, much better than I could ever hope to be. This paper, however, is not good at all. It’s the paper, not the people, that this post discusses.

[2] I will discuss the GB version for this is what CHY discusses. I personally believe that this version of BT is reducible to the theory of movement (A-chain dependencies actually). The story I favor looks more like the old Lees & Klima account. I hope to blog about the differences in the very near future.

[3] As GGers also know, the judgments effectively reverse if we replace the reflexive with a bound pronoun. This reflects the fact that in languages like English, reflexives and bound pronouns are (roughly) in complementary distribution. This fact results from the opposite requirements stated in BT-A and BT-B. The same effect was achieved in earlier theories of binding (e.g. Lees and Klima) by other means.

[4] From what I know, sentences like (3g) are unattested in CHILDES. Indeed, though I don’t know this, I suspect that sentences with reflexives in ECM subject position are not a dime a dozen either.

[5] I assume that I need not say that once one figures out which (if any) of the morphemes are pronominals then BT-B effects (the opposite of those in (3) with pronouns replacing reflexives) follow apace. As I need not say this, I won’t.

[6] Please note that this is simply an illustration, not a full proposal. There are many wrinkles one could add. Here’s another potential learning principle: LADs are predisposed to analyze dependencies in BT terms if this is possible. Thus the default analysis is to treat a dependency as a BT dependency. But this principle, again, is not an assumption properly part of BT. It is part of the learning theory that incorporates a UG BT.

30 comments:

Alex Drummond, October 7, 2015 at 2:51 PM
The money quote appears to be:

“If this analysis of awake dheen is correct, it constitutes a serious challenge for UG-based approaches to Binding. The presence in a language of a form that is used anaphorically but which is exempt from the Binding requirements of UG would impose a considerable burden on the child acquiring the language. The problem is that a child learning to speak a language would need to learn which forms that are functionally anaphoric in their use are subject to UG principles of Binding and which are not. The existence of UG sanctioned categories for anaphora simplifies learning only if all anaphoric elements (in the nontechnical sense of ‘‘anaphoric” that includes both pronouns and reflexives) are subject to UG principles.”

I can’t really make much sense of this, especially the last sentence. I’m surprised that CH&Y didn’t mention Turkish complex reflexives, which have been known at least since the late 80s to apparently not obey any syntactic binding requirements. A pretty reasonable proposal about how reflexives of this sort work is that they have a complex syntactic structure containing a pronominal, so that the pronominal is shielded from local binding (Kornfilt 2001). This analysis doesn’t require modifying UG to permit the existence of pronouns that are subject neither to Condition A nor Condition B. Some other instances where “shielding” has a more transparent morphological realization are mentioned in Reuland (2001:482). [I’m sure there are much earlier references for some of this stuff; I’m just citing what I know.] Anyway, I don’t see why it would be a big deal if UG were to sanction pronouns that are subject to neither Condition A nor Condition B. It’s not as if CH&Y claim to have found pronouns which obey syntactic constraints on their distribution-under-a-given-interpretation other than those conditions. So the kid still knows that if there is a syntactic constraint at work, it’s Condition A or Condition B.
ewan, October 8, 2015 at 6:15 AM
I'm pessimistic about their result too. It would be one thing if they showed that every logically possible kind of pronominal was possible. They don't. They just show that there are things that refer back that aren't Principle A type or Principle B type. (I thought this was what ziji was? So didn't we already know this?) It would be one thing to show that there aren't any patterns _at all_ in the kinds of pronominals we see, but all this shows is that there's an item somewhere that doesn't fit the pattern. So I preface by saying I think we're on the same page in terms of the conclusion not matching the premises.

But.

I also think that your statement that "it’s virtually certain that the UG BT principles act as useful guides for the categorization of morphemes into the abstract categories BT trucks in" is wrong. It seems to me that Naho Orita's 2013 Cog Sci casts that in serious doubt.

http://ling.umd.edu/~naho/orita_cogsci2013.pdf

The paper tried to build a model that learned categories of pronominal morphemes in terms of what syntactic positions their antecedents could be in (syntactic position coded as to whether it was local and/or c-commanding or not). This turned out to be hard. But, contrary to our intuitions, the strategy that worked as a useful guide was _not_ BT. BT - forget just the categories, the full monty - didn't improve anything. Saying "there are two kinds of pronouns, one with local-c-commanding antecedents **and these must be reflexive** and another with non-local/c-commanding antecedents **and these must not be reflexive**" did not work.

Now we know that it's possible to succeed with this model. If you take the items, give some context in a dialogue, and remove the pronouns, and then get people to tell you, based on the discourse context, what kind of pronoun they expect, then you have, on an item by item basis, some sense of what people's expectations for the reflexive/non-reflexive meaning of the sentence was.

So, briefly then - the result was that having a guess at the meaning, broad strokes, of a sentence was _sufficient_ to learn that, syntactically, there were two categories of pronouns. Binding principles were not.

Is having a good guess at the referent a realistic assumption for kids? I'm happy to give that a categorical no. But is it "virtually certain" that BT would help when they don't? By no means. This model says no.
Unknown, October 8, 2015 at 9:14 AM
At the risk of asking a question that has a very obvious answer, what is the evidence that the stimulus is poor to begin with? Can't you learn which anaphora must be c-commanded by a co-indexed noun phrase and which ones can't just by looking at a parsed corpus? E.g., do we know that there aren't enough examples to notice that "himself" is always in the former category and "he" is always in the latter?
hal, October 9, 2015 at 8:00 AM
Hi Norbert -- thanks for the relatively clear post!

However, I don't understand this bit "On the assumption that the ungrammatical cases in (3) are not attested in the PLD, then the fact that a typical English Language Acquisition Device (LAD, aka, kid) converges on the grammatical profile outlined in (3) must mean that this profile in part reflects intrinsic features of the LAD."

In particular, what's the logic that's being employed in "must mean"? It seems like there are some pretty strong assumptions about how the LAD is working in order for this to be a valid conclusion.

Here's the part I get. We assume (A) kids don't hear the ungrammatical versions in (3) (or at least hear them rarely as noise). We assume (B) kids do figure out which are grammatical and which are ungrammatical (seems right). But then we want to conclude that the kid has some built in inductive bias that helps them with (B). That's the part I don't follow.

(Sorry for being dense -- I'm really trying to understand.)
Olaf K., October 30, 2015 at 3:59 AM
I have a question about the POS argument related to (3g), here split up in two examples:

(3g-i) *John expects that himself will like Mary
(3g-ii) *John expects that heself will like Mary

The idea here is that it is impossible to figure out the ungrammaticality of these examples without some prior BT knowledge. However, for (3g-i) one could say that you have a non-nominative subject in a finite subject position, the impossibility of which has to be acquired independent of BT data. So it’s at least not so clear this is a case of binding POS. For (3g-ii), the question arises why “heself” is an impossible word. One answer could be BT (perhaps with the locality parameter switched to “finite clause”), which forbids nominative anaphors. But is this the only conceivable explanation? Is it conceivable that the impossibility of “heself” reflects a conservative learning strategy pertaining to the acquisition of morphology more generally? “Self” attaches to accusative pronouns only, much like –ize attaches to adjectives only. Or do we want to see a POS problem in the acquisition of –ize too? Just want to be sure that we have tried everything else :)