This is the first of three posts on a forthcoming Cognition paper arguing against UG. The
specific argument is against the Binding Theory, but its form is intended to
generalize. The paper is written by excellent linguists, which is precisely why
I spend three posts exposing its weaknesses. Because it will appear
in Cognition, the paper is likely to be
influential. It shouldn’t be. These posts explain why.
Let’s start with a truism: not every property of a
language-particular G is innate. Here’s another one: some features of G reflect
innate properties of the language acquisition device (LAD). Let’s end with a
truth (that should be a truism by now but is still contested by some for
reasons that are barely comprehensible): some of the innate LAD structure key
to acquiring a G is linguistically dedicated (i.e. due to UG rather than to general cognition). These three claims
should be obvious. True truisms. Sadly, they are not everywhere and always
recognized as such. Not even by extremely talented linguists. I don’t know why
this is so (though I will speculate towards the end of this note), but it is.
Recent evidence comes from a forthcoming paper in Cognition
by Cole, Hermon and Yanti (CHY) on the UG status of the Binding Theory (BT).[1]
The CHY argument is that BT cannot explain certain facts in a certain set of Javanese
and Malay dialects. It concludes that binding cannot be innate. The very strong
implication is that UG contains nothing like BT, and that even if it did it
would not help explain how languages differ and how kids acquire their Gs. IMO,
this implication is what got the paper into Cognition
(anything that ends with the statement or implication that there is nothing
special about language (i.e. Chomsky is wrong!!!) has a special preferential HOV
lane in the new Cognition’s review
process). Boy do I miss Jacques Mehler. Come back Jacques. Please.
Before getting into the details of CHY, let’s consider what
the classical BT says.[2]
It is divided into three principles and a definition of binding:
A. An anaphor must be bound in its domain
B. A pronominal cannot be bound in its domain
C. An R-expression cannot be bound
(1) An expression E binds an expression E’ iff E c-commands E’ and E is co-indexed with E’.
We also need a definition of ‘domain’ but I leave it to the
reader to pick her/his favorite one. That’s the classical BT.
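For readers who like to see definitions run, here is a minimal sketch of (1) and principles A to C over toy trees. Everything in it is an illustrative assumption rather than part of BT itself: the `Node` class, the function names, the simplification that c-command is computed from a node's immediate parent (so trees are assumed to branch at every non-leaf), and above all the choice of "domain" as the nearest node labeled `S`, which is only one of the many definitions the reader is free to pick. Note that the category of each expression (`kind`) is supplied by hand; the principles themselves do not say which words belong to which category.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)  # compare nodes by identity, not by content
class Node:
    label: str                                  # e.g. "S", "NP", "VP", "V"
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None                  # filled in on leaves
    index: Optional[int] = None                 # referential index, if any
    kind: Optional[str] = None                  # "anaphor", "pronominal", "r-expression"
    parent: Optional["Node"] = None

    def __post_init__(self) -> None:
        for child in self.children:
            child.parent = self


def dominates(a: Node, b: Node) -> bool:
    """True iff a properly dominates b."""
    node = b.parent
    while node is not None:
        if node is a:
            return True
        node = node.parent
    return False


def c_commands(a: Node, b: Node) -> bool:
    """Simplified c-command: neither dominates the other and a's parent dominates b."""
    if a is b or dominates(a, b) or dominates(b, a):
        return False
    return a.parent is not None and dominates(a.parent, b)


def binds(a: Node, b: Node) -> bool:
    """Definition (1): a c-commands b and a is co-indexed with b."""
    return a.index is not None and a.index == b.index and c_commands(a, b)


def domain(b: Node) -> Optional[Node]:
    """ASSUMPTION: the domain is the nearest dominating node labeled 'S'."""
    node = b.parent
    while node is not None and node.label != "S":
        node = node.parent
    return node


def nodes(root: Node) -> Iterator[Node]:
    yield root
    for child in root.children:
        yield from nodes(child)


def bound_in(b: Node, scope: Node) -> bool:
    return any(binds(a, b) for a in nodes(scope))


def satisfies_bt(b: Node, root: Node) -> bool:
    """Check b against whichever principle its category triggers."""
    if b.kind == "anaphor":                     # Principle A
        d = domain(b)
        return d is not None and bound_in(b, d)
    if b.kind == "pronominal":                  # Principle B
        d = domain(b)
        return d is None or not bound_in(b, d)
    if b.kind == "r-expression":                # Principle C
        return not bound_in(b, root)
    return True                                 # no category, no requirement


def transitive(subj: Node, verb: str, obj: Node) -> Node:
    """Hypothetical helper: build [S subj [VP verb obj]]."""
    return Node("S", [subj, Node("VP", [Node("V", word=verb), obj])])


john = Node("NP", word="John", index=1, kind="r-expression")
himself = Node("NP", word="himself", index=1, kind="anaphor")
clause = transitive(john, "likes", himself)
print(satisfies_bt(himself, clause))            # True: bound by 'John' inside S
```

Swapping the object for a co-indexed node of kind `"pronominal"` makes `satisfies_bt` return `False`, which is the toy version of the complementarity between A and B.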
What does it say? It outlines a set of relations that must
hold between classes of grammatical expressions. BT-A states that if some expression is in the grammatical
category ‘anaphor’ then it must have a local c-commanding binder. BT-B states
that if some expression is in the
category ‘pronominal’ then it cannot have a local c-commanding binder. And BT-C
states, well you know what it states, if…
Now what does BT not
say? It says nothing about which
phonetically visible expressions fall into which class. It does not say that every overt expression must
fall into at least one of these classes. It does not say that every G must contain expressions that fall into these
classes. In fact, BT by itself says
nothing at all about how a given morphologically/phonetically visible
expression distributes or what licensing conditions it must enter into. In
other words, by itself BT does not
tell us, for example, that (2) is ungrammatical. All it says is that if ‘herself’ is an anaphor then it needs
a binder. That’s it.
(2) John likes herself
How then does BT gain empirical traction? It does so via the
further assumption that reflexives in English are BT anaphors (and,
additionally, that binding triggers morphologically overt agreement in English
reflexives). Assuming this, ‘herself’ is subject to BT-A, and assuming
that ‘John’ is masculine, ‘herself’ has no binder in its domain and
so violates BT-A above. This means that the structure underlying (2) is
ungrammatical and this is signaled by (2)’s unacceptability.
As stated, there is a considerable distance between a
linguistic object’s surface form and its underlying grammatical one. So what’s
the empirical advantage of assuming something as abstract as the classical BT?
The most important reason, IMO, is that it helps resolve a critical Poverty of
Stimulus (PoS) problem. Let me explain (and I will do this slowly, for CHY never
actually explain what the specific PoS
problem in the domain of binding is, though they allude to the problem as an
important feature of their investigation, and this, IMO, allows the paper to
end in intellectually unfortunate places).
As BT connoisseurs know, the distribution of overt
reflexives and pronouns is quite restricted. Here is the standard data:[3]
(3) a. John1 likes herself1/*2
b. John1 likes himself1/*2
c. John1 talked to Bill2 about himself1/2/*3
d. John1 expects Mary2 to like himself*1/*2/*3
e. John1 expects Mary2 to like herself*1/2/*3
f. John1 expects himself1/*2/*3 to like Mary2
g. John1 expects (that) he/himself*1/*2/*3 will like Mary2
If we assume that reflexives are BT-A anaphors then we can
explain all of these data. Where’s the PoS problem? Well, lots of these data
concern what cannot happen. On the
assumption that the ungrammatical cases in (3) are not attested in the primary linguistic data (PLD),
then the fact that a typical English Language Acquisition Device (LAD, aka,
kid) converges on the grammatical profile outlined in (3) must mean that this
profile in part reflects intrinsic features of the LAD. For example, the fact
that kids do not generalize from the acceptability of (3f) to conclude that
(3g) should also be acceptable needs to be explained, and it is implausible that
the LAD learns that this inference is incorrect by inspecting
unacceptable sentences like (3g), for being unacceptable they will not appear
in the PLD.[4]
Thus, how LADs come to converge on Gs that allow the good sentences and prevent the bad ones looks like
(because it is) a standard PoS puzzle.
How does assuming that BT is part of UG solve the problem? Well,
it doesn’t, not all by itself (and
nobody ever thought that it could all by
itself). But it radically changes
it. Here’s what I mean.
If BT is part of UG then the acquisition problem facing the
LAD boils down to identifying those expressions in your language that are
anaphors, pronominals and R-expressions. This is not an easy task, but it is easier
than figuring this out plus figuring
out the data distribution in (3). In fact, I doubt that there is any PLD
able to fix the data in (3) (this is, after all, what the PoS problem in the
binding domain consists in). And it is obvious that any theory of binding
will need to have the LAD figure out (i.e. learn), using the PLD, which overt morphemes
(if any) are BT anaphors/pronominals (after all, ‘himself’ is a reflexive in
English but not in French, and I assume that this fact must be acquired on the
basis of PLD). So the best story wrt Plato’s Problem in the domain of binding
is one where what must obviously be learned is all
that must be learned. Why? Because once I know that reflexives in English are BT
anaphors subject to BT-A then I get the knowledge illustrated by the data in
(3) as a UG bonus. That’s how PoS
problems are solved.[5]
So, to repeat: all the LAD needs to do to become binding competent is figure out
which overt expressions fall into which binding categories. Do this and the
rest is an epistemic freebie.
Furthermore, it’s virtually certain that the UG BT
principles act as useful guides for the categorization of morphemes into the
abstract categories BT traffics in (i.e. anaphor, pronominal, and R-expression). Take anaphors. If BT is part of UG it
provides the LAD with some diagnostics for anaphoricity. Anaphors must have antecedents, and these must be
local and high enough. This means that if the LAD hears a sentence like ‘John scratched himself’ in a situation
where John is indeed scratching himself, then it has prima facie evidence that ‘himself’ is a BT-A anaphor (as it fits the BT-A
constraints). Of course, the LAD may be wrong (hence the ‘prima facie’ above). For example, say that the LAD also hears pairs
of sentences like ‘John loves Mary. She
loves himself too’, where ‘himself’ is anaphoric to ‘John’. Then the LAD has evidence that reflexives are not just subject to BT-A (i.e. they are at
best ambiguous morphemes and at worst not subject to BT-A at all). So, I can
see how simple PLD of the right sort, in conjunction with an innate, UG-provided BT-A,
would help with the classification of morphemes into the more abstract
categories.[6] That’s another nice feature of an articulated
UG.
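The diagnostic just described can be caricatured in a few lines of code. This is only a sketch under loud assumptions: the function name `classify`, the verdict strings, and the idea that each PLD item has already been reduced to a pair (morpheme, whether its antecedent was local and c-commanding) are all hypothetical conveniences, and a real learner would have to weigh noisy evidence rather than apply a categorical rule. What it shows is that the BT-A requirement itself is held fixed, while the PLD only sorts morphemes relative to it.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def classify(observations: Iterable[Tuple[str, bool]]) -> Dict[str, str]:
    """Sort morphemes by whether their observed antecedents fit BT-A.

    Each observation is (morpheme, locally_bound), where locally_bound is
    True when the antecedent was local and c-commanding in that utterance.
    """
    seen: Dict[str, Set[bool]] = defaultdict(set)
    for morpheme, locally_bound in observations:
        seen[morpheme].add(locally_bound)

    verdicts: Dict[str, str] = {}
    for morpheme, values in seen.items():
        if values == {True}:
            # Every use fits BT-A: prima facie an anaphor.
            verdicts[morpheme] = "anaphor candidate"
        elif values == {False}:
            # No use fits BT-A: nothing suggests it is an anaphor.
            verdicts[morpheme] = "not a BT-A anaphor on this evidence"
        else:
            # Mixed evidence: at best ambiguous, at worst not subject to BT-A.
            verdicts[morpheme] = "ambiguous or not subject to BT-A"
    return verdicts


pld = [("himself", True), ("himself", True), ("him", False)]
print(classify(pld))
```

Adding a single counterexample such as `("himself", False)`, the toy analogue of ‘She loves himself too’, moves ‘himself’ out of the anaphor-candidate class.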
Please observe: on this view of things UG is an important part of a theory of language learning.
It is not itself a theory of learning. This point was made in Aspects, and is as true today as it was
then. In fact, you might say that in the current climate of Bayesian excess
it is the obvious conclusion to draw: UG limns the hypothesis space that
the learning procedure explores. There are many current models of how UG
knowledge might be incorporated in more explicit learning accounts of various
flavors (see Charles Yang’s work or Jeff Lidz’s stuff for some recent general
proposals and worked out examples).
Does any of this suppose that the LAD uses only attested BT patterns in learning to
classify expressions? Of course not. For example, the LAD might conclude that
‘itself’ is a BT-A anaphor in English on first encountering it. Why? By
generalizing from forms it has encountered before (e.g. ‘herself’, ‘themselves’).
Here the generalization is guided not by UG binding properties but by the
details of English morphology. It is
easy to imagine other useful learning strategies (see note 6). However, it
seems likely that one way the LAD will distinguish BT-A from BT-B morphemes
will be in terms of their cataphoric possibilities positively evidenced in the
PLD.
So, BT as part of UG can indeed help solve a PoS problem (by
simplifying what needs to be acquired) and plausibly provides guide-posts
towards that classification. However,
BT does not suffice to fix knowledge of binding all by itself nor did anyone ever think that it would. Moreover,
even the most rabid linguistic nativist (I know because I am one of these) is
not committed to any particular
pattern of surface data. To repeat, BT does not imply anything about how
morphemes fall into any of the relevant categories or even if any of them do or
even if there are any relevant surface categories to fall into.
With this as background, we are now ready to discuss CHY. I
will do this in the next post.
[1] I have been a great admirer of both Cole and Hermon’s work for a long time.
They are extremely good linguists, much better than I could ever hope to be.
This paper, however, is not good at all. It’s the paper, not the people, that
this post discusses.
[2] I will discuss the GB version, for this is what CHY discuss. I personally
believe that this version of BT is reducible to the theory of movement (A-chain
dependencies actually). The story I favor looks more like the old Lees &
Klima account. I hope to blog about the differences in the very near future.
[3] As GGers also know, the judgments effectively reverse if we replace the
reflexive with a bound pronoun. This reflects the fact that in languages like
English, reflexives and bound pronouns are (roughly) in complementary
distribution. This fact results from the opposite requirements stated in BT-A
and BT-B. The same effect was achieved in earlier theories of binding (e.g.
Lees and Klima) by other means.
[4] From what I know, sentences like (3g) are unattested in CHILDES. Indeed, though
I don’t know this, I suspect that sentences with reflexives in ECM subject
position are not a dime a dozen either.
[5] I assume that I need not say that once one figures out which (if any) of the
morphemes are pronominals then BT-B effects (the opposite of those in (3) with
pronouns replacing reflexives) follow apace. As I need not say this, I won’t.
[6] Please note that this is simply an illustration, not a full proposal. There are
many wrinkles one could add. Here’s another potential learning principle: LADs
are predisposed to analyze dependencies in BT terms if this is possible. Thus the default analysis is to treat a
dependency as a BT dependency. But this principle, again, is not an assumption properly part of BT.
It is part of the learning theory that incorporates a UG BT.