This is the first of three posts on a forthcoming Cognition paper arguing against UG. The specific argument targets the Binding Theory, but its form is intended to generalize. The paper is written by excellent linguists, which is precisely why I spend three posts exposing its weaknesses. Because it will appear in Cognition, the paper is likely to be influential. It shouldn’t be. These posts explain why.
Let’s start with some truisms. Here’s one: not every property of a language-particular G is innate. Here’s another: some features of G reflect innate properties of the language acquisition device (LAD). Let’s end with a truth (one that should be a truism by now but is still contested by some for reasons that are barely comprehensible): some of the innate LAD structure key to acquiring a G is linguistically dedicated (i.e. due to UG rather than to general cognition). These three claims should be obvious. True truisms. Sadly, they are not everywhere and always recognized as such. Not even by extremely talented linguists. I don’t know why this is so (though I will speculate towards the end of this note), but it is. Recent evidence comes from a forthcoming paper in Cognition (here) by Cole, Hermon and Yanti (CHY) on the UG status of the Binding Theory (BT). The CHY argument is that BT cannot explain certain facts in a certain set of Javanese and Malay dialects. It concludes that binding cannot be innate. The very strong implication is that UG contains nothing like BT, and that even if it did it would not help explain how languages differ and how kids acquire their Gs. IMO, this implication is what got the paper into Cognition (anything that ends with the statement or implication that there is nothing special about language (i.e. Chomsky is wrong!!!) has a special preferential HOV lane in the new Cognition’s review process). Boy do I miss Jacques Mehler. Come back Jacques. Please.
Before getting into the details of CHY, let’s consider what the classical BT says. It is divided into three principles and a definition of binding:
A. An anaphor must be bound in its domain
B. A pronominal cannot be bound in its domain
C. An R-expression cannot be bound
(1) An expression E binds an expression E’ iff E c-commands E’ and E is co-indexed with E’.
We also need a definition of ‘domain’ but I leave it to the reader to pick her/his favorite one. That’s the classical BT.
What does it say? It outlines a set of relations that must hold between classes of grammatical expressions. BT-A states that if some expression is in the grammatical category ‘anaphor’ then it must have a local c-commanding binder. BT-B states that if some expression is in the category ‘pronominal’ then it cannot have a local c-commanding binder. And BT-C states, well you know what it states, if…
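For readers who like to see definitions run, here is a minimal Python sketch of definition (1) and principles A, B and C over a toy tree. Everything named in it (`Node`, `is_domain`, `violations`, `clause`) is my own illustrative invention, not anything from CHY or the BT literature. It assumes the "first branching node" version of c-command, it treats a node's domain as whatever ancestor has been stipulated as one (so you can plug in your favorite definition), and it models English gender agreement crudely by simply refusing to co-index mismatched forms. Note that the category labels are handed to the checker as input.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)  # identity-based equality: nodes are compared with `is`
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    index: Optional[int] = None      # referential index, if any
    category: Optional[str] = None   # "anaphor", "pronominal" or "r-expression"
    is_domain: bool = False          # stipulated binding domain (assumption)
    parent: Optional["Node"] = field(default=None, repr=False)

    def __post_init__(self):
        for child in self.children:
            child.parent = self

def dominates(a, b):
    """True if a properly dominates b."""
    node = b.parent
    while node is not None:
        if node is a:
            return True
        node = node.parent
    return False

def c_commands(a, b):
    """Neither dominates the other, and the first branching node above a dominates b."""
    if a is b or dominates(a, b) or dominates(b, a):
        return False
    node = a.parent
    while node is not None and len(node.children) < 2:
        node = node.parent
    return node is not None and dominates(node, b)

def binds(a, b):
    """Definition (1): a c-commands b and a is co-indexed with b."""
    return a.index is not None and a.index == b.index and c_commands(a, b)

def domain_of(node):
    """Nearest ancestor stipulated to be a binding domain, or None."""
    anc = node.parent
    while anc is not None and not anc.is_domain:
        anc = anc.parent
    return anc

def all_nodes(root):
    yield root
    for child in root.children:
        yield from all_nodes(child)

def violations(root):
    """Return (principle, label) pairs for every BT violation in the tree."""
    out = []
    nodes = list(all_nodes(root))
    for e in nodes:
        if e.category is None:
            continue
        binders = [b for b in nodes if binds(b, e)]
        dom = domain_of(e)
        local = [b for b in binders if dom is not None and dominates(dom, b)]
        if e.category == "anaphor" and not local:
            out.append(("A", e.label))
        elif e.category == "pronominal" and local:
            out.append(("B", e.label))
        elif e.category == "r-expression" and binders:
            out.append(("C", e.label))
    return out

def clause(obj_label, obj_category, obj_index):
    """Toy structure [S John(1) [VP likes OBJ]], with S as the only domain."""
    subj = Node("John", index=1, category="r-expression")
    obj = Node(obj_label, index=obj_index, category=obj_category)
    return Node("S", [subj, Node("VP", [Node("likes"), obj])], is_domain=True)

print(violations(clause("himself", "anaphor", 1)))   # []
print(violations(clause("herself", "anaphor", 2)))   # [('A', 'herself')]
print(violations(clause("him", "pronominal", 1)))    # [('B', 'him')]
```

The checker only ever consults the `category` and `index` fields it is given. Relabel ‘herself’ as something other than an anaphor and the violation disappears, which is the sense in which the principles constrain categories rather than words.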
Now what does BT not say? It says nothing about which phonetically visible expressions fall into which class. It does not say that every overt expression must fall into at least one of these classes. It does not say that every G must contain expressions that fall into these classes. In fact, BT by itself says nothing at all about how a given morphologically/phonetically visible expression distributes or what licensing conditions it must enter into. In other words, by itself BT does not tell us, for example, that (2) is ungrammatical. All it says is that if ‘herself’ is an anaphor then it needs a binder. That’s it.
(2) John likes herself
How then does BT gain empirical traction? It does so via the further assumption that reflexives in English are BT anaphors (and, additionally, that binding triggers morphologically overt agreement in English reflexives). Assuming this, ‘herself’ is subject to BT-A and, assuming that ‘John’ is masculine, it has no binder in its domain, and so violates BT-A above. This means that the structure underlying (2) is ungrammatical and this is signaled by (2)’s unacceptability.
As stated, there is a considerable distance between a linguistic object’s surface form and its underlying grammatical one. So what’s the empirical advantage of assuming something as abstract as the classical BT? The most important reason, IMO, is that it helps resolve a critical Poverty of Stimulus (PoS) problem. Let me explain (and I will do this slowly for CHY never actually explains what the specific PoS problem in the domain of binding is (though they allude to the problem as an important feature of their investigation), and this, IMO, allows the paper to end in intellectually unfortunate places).
As BT connoisseurs know, the distribution of overt reflexives and pronouns is quite restricted. Here is the standard data:
(3) a. John1 likes herself*1/*2
b. John1 likes himself1/*2
c. John1 talked to Bill2 about himself1/2/*3
d. John1 expects Mary2 to like himself*1/*2/*3
e. John1 expects Mary2 to like herself*1/2/*3
f. John1 expects himself1/*2/*3 to like Mary2
g. John1 expects (that) he/himself*1/*2/*3 will like Mary2
If we assume that reflexives are BT-A-anaphors then we can explain all of this data. Where’s the PoS problem? Well, lots of these data concern what cannot happen. On the assumption that the ungrammatical cases in (3) are not attested in the primary linguistic data (PLD), the fact that a typical English Language Acquisition Device (LAD, aka, kid) converges on the grammatical profile outlined in (3) must mean that this profile in part reflects intrinsic features of the LAD. For example, the fact that kids do not generalize from the acceptability of (3f) to conclude that (3g) should also be acceptable needs to be explained, and it is implausible that the LAD infers that this is an incorrect inference by inspecting unacceptable sentences like (3g), for being unacceptable they will not appear in the PLD. Thus, how LADs come to converge on Gs that allow the good sentences and prevent the bad ones looks like (because it is) a standard PoS puzzle.
How does assuming that BT is part of UG solve the problem? Well, it doesn’t, not all by itself (and nobody ever thought that it could all by itself). But it radically changes it. Here’s what I mean.
If BT is part of UG then the acquisition problem facing the LAD boils down to identifying those expressions in your language that are anaphors, pronominals and R-expressions. This is not an easy task, but it is easier than figuring this out plus figuring out the data distribution in (3). In fact, I doubt that there is any PLD able to fix the data in (3) (this is, after all, what the PoS problem in the binding domain consists in). And it is obvious that any theory of binding will need to have the LAD figure out (i.e. learn), using the PLD, which overt morphemes (if any) are BT anaphors/pronominals (after all, ‘himself’ is a reflexive in English but not in French, and I assume that this fact must be acquired on the basis of PLD). So the best story wrt Plato’s Problem in the domain of binding is one where what must obviously be learned is all that must be learned. Why? Because once I know that reflexives in English are BT anaphors subject to BT-A then I get the knowledge illustrated by the data in (3) as a UG bonus. That’s how PoS problems are solved. So, to repeat: all the LAD needs do to become binding competent is figure out which overt expressions fall into which binding categories. Do this and the rest is an epistemic freebie.
Furthermore, it’s virtually certain that the UG BT principles act as useful guides for the categorization of morphemes into the abstract categories BT trucks in (i.e. anaphor, pronominal, and R-expression). Take anaphors. If BT is part of UG it provides the LAD with some diagnostics for anaphoricity. Anaphors must have antecedents. These antecedents must be local and high enough. This means that if the LAD hears a sentence like ‘John scratched himself’ in a situation where John is indeed scratching himself then it has prima facie evidence that ‘himself’ is a reflexive (as it fits BT-A’s constraints). Of course, the LAD may be wrong (hence the ‘prima facie’ above). For example, say that the LAD also hears pairs of sentences like ‘John loves Mary. She loves himself too’, where ‘himself’ is anaphoric to John. Then the LAD has evidence that reflexives are not just subject to BT-A (i.e. they are at best ambiguous morphemes and at worst not subject to BT-A at all). So, I can see how PLD of the right sort, in conjunction with an innate UG-provided BT-A, would help with the classification of morphemes into the more abstract categories using simple PLD. That’s another nice feature of an articulate UG.
Please observe: on this view of things UG is an important part of a theory of language learning. It is not itself a theory of learning. This point was made in Aspects, and is as true today as it was then. In fact, you might say that in the current climate of Bayesian excess it is the obvious conclusion to draw: UG limns the hypothesis space that the learning procedure explores. There are many current models of how UG knowledge might be incorporated in more explicit learning accounts of various flavors (see Charles Yang’s work or Jeff Lidz’s stuff for some recent general proposals and worked out examples).
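The diagnostic reasoning just described can be put in a few lines of Python. This is a toy sketch under loud assumptions, not a learning model anyone has proposed: the `Observation` record, the two boolean features and the verdict strings are all hypothetical, and I simply assume the LAD can tell from context who the antecedent is and whether it is local and c-commanding.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Observation:
    """One hypothetical PLD datum: a morpheme heard with an understood antecedent."""
    morpheme: str
    antecedent_is_local: bool      # antecedent sits inside the morpheme's domain
    antecedent_c_commands: bool    # antecedent is "high enough"

def classify(pld):
    """Tally, per morpheme, the uses that fit BT-A and the uses that clash with it."""
    fits = defaultdict(int)
    clashes = defaultdict(int)
    for obs in pld:
        if obs.antecedent_is_local and obs.antecedent_c_commands:
            fits[obs.morpheme] += 1
        else:
            clashes[obs.morpheme] += 1

    verdicts = {}
    for morpheme in set(fits) | set(clashes):
        if clashes[morpheme] == 0:
            verdicts[morpheme] = "prima facie anaphor"
        elif fits[morpheme] == 0:
            verdicts[morpheme] = "no evidence for BT-A"
        else:
            verdicts[morpheme] = "ambiguous at best"
    return verdicts

# 'John scratched himself': local, c-commanding antecedent.
english_like = [Observation("himself", True, True)]
print(classify(english_like))

# Add 'John loves Mary. She loves himself too', with 'himself' = John:
# the antecedent is neither local nor c-commanding.
mixed = english_like + [Observation("himself", False, False)]
print(classify(mixed))
```

The design point is the division of labor: BT-A supplies the two features worth tracking, and the PLD supplies the counts. Nothing in the sketch fixes in advance which morphemes, if any, end up with which verdict.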
Does any of this suppose that the LAD uses only attested BT patterns in learning to classify expressions? Of course not. For example, the LAD might conclude that ‘itself’ is a BT-A anaphor in English on first encountering it. Why? By generalizing from forms it has encountered before (e.g. ‘herself’, ‘themselves’). Here the generalization is guided not by UG binding properties but by the details of English morphology. It is easy to imagine other useful learning strategies (see note 6). However, it seems likely that one way the LAD will distinguish BT-A from BT-B morphemes will be in terms of their cataphoric possibilities positively evidenced in the PLD.
So, BT as part of UG can indeed help solve a PoS problem (by simplifying what needs to be acquired) and plausibly provides guide-posts towards that classification. However, BT does not suffice to fix knowledge of binding all by itself nor did anyone ever think that it would. Moreover, even the most rabid linguistic nativist (I know because I am one of these) is not committed to any particular pattern of surface data. To repeat, BT does not imply anything about how morphemes fall into any of the relevant categories or even if any of them do or even if there are any relevant surface categories to fall into.
With this as background, we are now ready to discuss CHY. I will do this in the next post.
[1] I have been a great admirer of both Cole and Hermon’s work for a long time. They are extremely good linguists, much better than I could ever hope to be. This paper, however, is not good at all. It’s the paper, not the people, that this post discusses.
[2] I will discuss the GB version for this is what CHY discusses. I personally believe that this version of BT is reducible to the theory of movement (A-chain dependencies actually). The story I favor looks more like the old Lees & Klima account. I hope to blog about the differences in the very near future.
[3] As GGers also know, the judgments effectively reverse if we replace the reflexive with a bound pronoun. This reflects the fact that in languages like English, reflexives and bound pronouns are (roughly) in complementary distribution. This fact results from the opposite requirements stated in BT-A and BT-B. The same effect was achieved in earlier theories of binding (e.g. Lees and Klima) by other means.
[4] From what I know, sentences like (3g) are unattested in CHILDES. Indeed, though I don’t know this, I suspect that sentences with reflexives in ECM subject position are not a dime a dozen either.
[5] I assume that I need not say that once one figures out which (if any) of the morphemes are pronominals then BT-B effects (the opposite of those in (3) with pronouns replacing reflexives) follow apace. As I need not say this, I won’t.
[6] Please note that this is simply an illustration, not a full proposal. There are many wrinkles one could add. Here’s another potential learning principle: LADs are predisposed to analyze dependencies in BT terms if this is possible. Thus the default analysis is to treat a dependency as a BT dependency. But this principle, again, is not an assumption properly part of BT. It is part of the learning theory that incorporates a UG BT.