Impossibility Arguments or
Why I Love Poverty of Stimulus Thinking
Sometimes, to figure out what something does, it is useful to consider what it can't do. This is what impossibility arguments are designed for, and generative grammarians love them (ok, many don't appreciate their beauty, but they should!). The most famous one in linguistics is the Poverty of Stimulus argument (POS). The POS actually fulfills two functions: (i) it shows that the data kids use to acquire their natural language grammars radically underdetermine what they linguistically achieve, and (ii) it offers a conceptual tool for investigating the structure of UG that has remarkable carrying power. Let's consider these two points in turn.
First, a review of the argument in some detail. The POS has several ingredients, but it boils down to a very simple schema:
a. Consider the features of some grammar rule R in some particular language L.
b. Subtract out those features of R that are data driven.
c. The residue characterizes UG.
The argument is easily illustrated, and has been many times. Nonetheless, let's take a look at the standard example and pull it apart. The most discussed example involves Yes/No (Y/N) question formation in English. The facts are simple (btw: portys are Portuguese Water Dogs and I have one; 'Sampson,' now 13, was once an excellent jumper). (1) illustrates an instance of the rule.
(1) Portys can jump → Can portys jump
(2) John is thinking portys can jump →
a. Is John thinking portys can jump
b. *Can John is thinking that portys can jump
(3) Portys that can jump should swim daily
a. Should portys that can jump swim daily
b. *Can portys that jump should swim daily
(4) Can portys that jump swim daily
These facts tell us the following: (1) shows that a Y/N question in English involves moving the auxiliary of the corresponding declarative to the front. (2) tells us that when there is more than one auxiliary (so there is no the auxiliary) one can move only one of them. What's the right way of describing the "correct" one? Consider two possibilities: (i) the right auxiliary is the highest one, (ii) the right auxiliary is the one linearly closest to the beginning. Now the big questions: How does the kid settle on (i) and not on (ii)? Do the data in (1)-(5) help the kid choose between (i) and (ii), and if so, how? Well, simple sentences like (1) and more complex ones like (2) cannot help him/her much, for both (i) and (ii) cover the relevant data (viz. can and is are both the highest and leftmost auxiliaries in (1) and (2)). To find data that distinguish (i) from (ii) we need (3). It is not consistent with (ii) (viz. should is the highest and can is the leftmost auxiliary), so it indicates that the "right" specification of the rule (the one English speakers know) is something like (i), i.e. move the highest auxiliary to the front. (4) is a bonus fact. Restricting Y/N question formation as in (i) explains why (4) is unambiguous. It means (5b) and cannot mean (5a).
(5) a. Is it the case that portys that can jump swim daily
b. Is it the case that portys that jump can swim daily
To review: there are several kinds of facts, a mix of positive and negative data. The positive data divide into those sentences that are acceptable simpliciter (viz. (1), (2a), (3a)) and those acceptable only under a certain interpretation (viz. (4) with interpretation (5b)). The negative data divide into those that are simply unacceptable (viz. (2b), (3b)) and those that are unacceptable under a specific interpretation (viz. (4) with interpretation (5a)). Conclusion: (i) is how English speakers determine the right auxiliary to move.
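The difference between (i) and (ii) is easy to make concrete. Here is a minimal Python sketch, with an invented bracketing of (3) standing in for real constituent structure:

```python
# Toy contrast between rule (ii), "front the leftmost auxiliary", and rule
# (i), "front the structurally highest auxiliary", run on a hand-built
# bracketing of (3) "Portys that can jump should swim daily".
# The bracketing and the AUX set are simplifications, for illustration only.

AUX = {"can", "should", "is"}

# Nested lists stand in for constituent structure; the relative clause
# "that can jump" is buried inside the subject.
tree = [["portys", ["that", "can", "jump"]], "should", ["swim", "daily"]]

def flatten(node):
    """Yield the terminal words left to right (the linear order)."""
    if isinstance(node, str):
        yield node
    else:
        for child in node:
            yield from flatten(child)

def leftmost_aux(tree):
    """Rule (ii): pick the first auxiliary in the word string."""
    return next((w for w in flatten(tree) if w in AUX), None)

def highest_aux(node, depth=0):
    """Rule (i): pick the auxiliary at the shallowest embedding depth."""
    if isinstance(node, str):
        return (depth, node) if node in AUX else None
    hits = [h for h in (highest_aux(c, depth + 1) for c in node) if h]
    return min(hits, default=None)

print(leftmost_aux(tree))    # 'can'    -> builds the bad (3b)
print(highest_aux(tree)[1])  # 'should' -> builds the good (3a)
```

The two rules agree on sentences like (1) and (2), where the highest auxiliary is also the leftmost; only on structures like (3) do they come apart.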
Before moving on, it is important to appreciate that there is nothing conceptually wrong with either a linear or a hierarchical proximity condition. Both are conceptually fine; neither is obviously more complex than the other (if anything, the linear proximity condition might be the simpler one, as linear order seems a more basic conception than hierarchical order), and both could yield perfectly interpretable (though, as we've seen, different) outputs. Nonetheless, it is absolutely clear, given these simple English facts, that (ii) is in fact wrong and (i) (or something like it) is in fact right.
With this settled we can now consider the real problem, the question whose answer allows us to fix the structure of UG: how do native speakers of English come to adopt (i) and reject (ii)? Here's what we can immediately conclude from perusing the data in (1)-(5). They do not do so on the basis of sentences like (1) or (2a), as both hypotheses are consistent with this kind of data. Moreover, sentences like (2b) and (3b), being barely comprehensible, are never produced (and if they were (rarely), they would be treated as noise given that they are gibberish), and sentences like (3a) are vanishingly rare in the primary linguistic data, at least as CHILDES indicates. Legate and Yang do the requisite grunt work and show that there are no ("no," as in zero!) Y/N sentences of the required form (i.e. sentences like (3) and (4)) in the relevant databases.[1] That means that human adults are not driven to adopt (i) by the data available to them in the course of acquisition. There simply is no such data. Conclusion: if opting for (i) is not data driven, then the choice of (i) must be the result of some internal bias in the learner in favor of restrictions like (i) and against those like (ii). Put in generativist terms, the bias towards (i) and against (ii) is part of UG and so need not be learned at all. It is, rather, part of what humans bring to the task of learning language.
To be clear: this argument does not show that Y/N question formation is innate, nor does it imply that all languages will form Y/N questions by moving verbal auxiliaries to the front (they don't). What this argument does show is that rules have a structure-sensitive, rather than a linearly sensitive, format. And this is what's really interesting. The very simple data above concerning a very simple rule of English grammar, Y/N question formation, argue for a very general conclusion about the kinds of grammar rules natural languages possess. They are sensitive to hierarchical relations (e.g. "highest") and oblivious to linear ones (e.g. "leftmost"). Why UG is built this way is a very interesting question (currently a topic of some interesting speculation, which I will sadly ignore). But built that way it is, and this conclusion has very wide-ranging ramifications, viz. that we should expect to find no rule of grammar in any natural language that is stated in linear rather than hierarchical terms. In other words, the reasoning we have engaged in requires that the format restrictions that UG puts on English Y/N questions must regulate grammar rules in general, with the result that all rules in all natural language grammars must embrace (i) and eschew (ii). This is a very strong claim, and one that, to the best of my knowledge, has admirably withstood decades of linguistic investigation of the grammars of many, many diverse languages.
And this, if you sit back and stare at it for a moment, is amazing. Careful consideration of the logic of very simple cases in one language allows one to establish how rules will operate in all languages. One of the marvels of the POS (and one that particularly appeals to my philological ineptitude) is that it allows for a very respectable kind of armchair discovery, one able to establish properties of UG more powerfully than any other method. The reason is that the POS focuses attention on the properties of the grammatical coding system rather than the specifics of any given code. English may code different rules than Swahili or Japanese, but all three languages will code their rules in the same way, necessarily cleaving to the format that UG requires. We know that this must be so given another simple fact: any kid can learn any language if dropped into the right speech community. UG describes this very general language learning capacity. The POS allows us to subtract out the limits of what data can determine. What's left over are innate features of UG that, by the design of the POS, cannot be data sensitive. Consequently, what we learn about UG by using the POS to study the grammar of any one language must necessarily carry over to the grammatical format of another. The POS, though deceptively simple, cuts very deep.
The POS: simple, effective, amazing!
[1] They also consider another possible data source for distinguishing (i) from (ii) (sentences like where's the other dolly that was in there) and find that these appear in the relevant data sets at a rate of about .05%. This is indistinguishable from noise. Put another way: only a system looking for such data could be expected to find it, and such a system would already have been biased towards a Y/N rule incorporating (i). Thus, if this is not noise for the learner, it is because the learner is already looking for the "right" answer, the same conclusion we have reached without considering such cases.
Incidentally, I highly recommend the Legate-Yang paper, as it is the most sophisticated elaboration of the POS argument I've seen. They try to quantify "how much" data is too little data. The punch line is about 1.2% for Y/N questions. Thus, for the case at hand, the CHILDES data is roughly 1/20th of what is needed.
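Taking the two quoted figures at face value, the arithmetic behind "roughly 1/20th" is easy to check (the percentages are the ones quoted above; the division is the only addition here):

```python
# Legate & Yang's estimated threshold vs. the attested rate, as quoted
# in the footnote; this just checks the "roughly 1/20th" claim.
needed = 1.2     # % of input estimated necessary to fix the rule
attested = 0.05  # % of input that even arguably bears on the choice

print(round(needed / attested))  # 24: the attested rate is on the
                                 # order of 1/20th of the threshold
```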
I think you're somewhat overstating the power of the armchair here, because what raises the PoS consideration to the level of a real argument is all the non-armchair work by the creators of CHILDES, plus that of the language-describers who can be taken to have established that nothing like string-sensitive verb or aux preposing actually occurs in any language. (And, that the most similar thing to it that I know about, 2nd position clitics, is only able to escape structure-dependence by putting things after the first phonological word, for essentially phonological reasons (Legate's name pops up again, interestingly ...))
By contrast, some other PoS-based arguments, such as the ones surrounding the that-trace constraint, don't fare so well at all away from the armchair (Hsu & Chater, "The Logical Problem of Language Acquisition: A Probabilistic Perspective," Cognitive Science 34 (2010), 972-1016; also a 2011 article by them + Vitanyi in Cognition).
Again we differ in some respects (this may become a theme, I suspect). Armchairs alone do not a comfy room make! However, the PoS is a highly slighted theoretical tool with great power. A currently influential conception in the field seems to be that linguists just don't have enough facts, or the right kinds of facts, or reliable enough facts, and until we do get these, theorizing is premature. I suspect that the opposite is the case: that the best kind of empirical research is guided by theory, and that a great tool for theory generation, if one's interest is UG, is the PoS. Its particular application is not always dispositive, but it is always very suggestive and more often than not dead on.
I'm not at all sure that the PoS argument-form is 'usually' right (many applications, such as for the necessity of various parameters, and my version of the 'Morphological Blocking Convention' for LFG, are in my current view probably wrong, done in by the Bayesians), but I agree that it raises the right issues, which is really much more important.
Where are we? Results-wise there might be a difference (note 'might': would you trade 'more often than might be expected' for 'usually'?) on whether the cup is half empty or half full, and on the strategy side we agree that it runneth over. I can live with that.
ReplyDeleteI believe you have a typo. You wrote:
(4) Can portys that jump swim daily
Restricting Y/N question formation as in (i) explains why (4) is unambiguous.
It means (5a) and cannot mean (5b).
(5) a. Is it the case that portys that can jump swim daily
b. Is it the case that portys that jump can swim daily
This should read: It means (5b) and cannot mean (5a).
I think no typo actually.
(5a) is for the question Can portys that can jump swim daily? and (5b) is for the ill-formed question *Can portys that jump can swim daily?