Impossibility Arguments or
Why I Love Poverty of Stimulus Thinking
Sometimes, to figure out what something does, it is useful to consider what it can’t do. This is what impossibility arguments are designed for, and generative grammarians love them (ok, many don’t appreciate their beauty, but they should!). The most famous one in linguistics is the Poverty of Stimulus argument (POS). The POS actually fulfills two functions: (i) it shows that the data kids use to acquire their natural language grammars radically underdetermine what they linguistically achieve, and (ii) it offers a conceptual tool for investigating the structure of UG that has remarkable carrying power. Let’s consider these two points in turn.
First, a detailed review of the argument. The POS has several ingredients, but it boils down to a very simple schema:
a. Consider the features of some grammar rule R in some particular language L.
b. Subtract out those features of L that are data driven.
c. The residue characterizes UG.
The argument is easily illustrated, and has been many times. Nonetheless, let’s take a look at the standard example and pull it apart. The most discussed example involves Yes/No (Y/N) question formation in English. The facts are simple (btw: portys are Portuguese Water Dogs, and I have one, ‘Sampson,’ who was once an excellent jumper; he’s now 13). (1) illustrates an instance of the rule.
(1) Portys can jump → Can portys jump
(2) John is thinking portys can jump →
a. Is John thinking portys can jump
b. *Can John is thinking that portys can jump
(3) Portys that can jump should swim daily
a. Should portys that can jump swim daily
b. *Can portys that jump should swim daily
(4) Can portys that jump swim daily
These facts tell us the following: (1) shows that a Y/N question in English involves moving the auxiliary of the corresponding declarative answer to the front. (2) tells us that when there is more than one auxiliary (so there is no single “the” auxiliary) one can move only one of them. What’s the right way of describing the “correct” one? Consider two possibilities: (i) the right auxiliary is the highest one; (ii) the right auxiliary is the one linearly closest to the beginning. Now the big questions: How does the kid settle on (i) and not on (ii)? Do the data in (1)-(5) help the kid choose between (i) and (ii), and if so, how? Well, simple sentences like (1) and more complex ones like (2) cannot help him/her much, for both (i) and (ii) cover the relevant data (viz. can and is are both the highest and the leftmost auxiliaries in (1) and (2) respectively). To find data that distinguishes (i) from (ii) we need (3). (3a) is not consistent with (ii) (viz. should is the highest and can is the leftmost auxiliary), so it indicates that the “right” specification of the rule (the one English speakers know) is something like (i), i.e. move the highest auxiliary to the front. (4) is a bonus fact. Restricting Y/N question formation as in (i) explains why (4) is unambiguous: it means (5b) and cannot mean (5a).
(5) a. Is it the case that portys that can jump swim daily
b. Is it the case that portys that jump can swim daily
To review: there are several kinds of facts, a mix of positive and negative data. The positive data divide into those sentences that are acceptable simpliciter (viz. (1), (2a), (3a)) and those acceptable only under a certain interpretation (viz. (4) with interpretation (5b)). The negative data divide into those that are simply unacceptable (viz. (2b), (3b)) and those that are unacceptable under a specific interpretation (viz. (4) with interpretation (5a)). Conclusion: (i) is how English speakers determine the right auxiliary to move.
Before moving on, it is important to appreciate that there is nothing conceptually wrong with either a linear or a hierarchical proximity condition. Both are conceptually fine, neither is a priori simpler than the other (actually, the linear proximity condition might be the less complex of the two, as linear order seems a simpler notion than hierarchical order), and both could yield perfectly interpretable (though, as we’ve seen, different) outputs. Nonetheless, it is absolutely clear given these simple English facts that (ii) is in fact wrong and (i) (or something like it) is in fact right.
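The two hypotheses are easy to state as procedures, and a toy implementation makes their divergence on (3) concrete. This is just an illustrative sketch, not anyone’s actual model: the AUX set, the bracketed word lists, and both function names are stipulations of mine, with hierarchy encoded crudely by nesting the relative clause as a sub-list.

```python
# Toy demo: hypothesis (i) "front the structurally highest auxiliary"
# vs. hypothesis (ii) "front the linearly leftmost auxiliary".
# Embedded clauses (here, relative clauses) are nested lists.

AUX = {"can", "should", "is"}  # the only auxiliaries in the examples

def flatten(tokens):
    """Linearize a (possibly nested) token list into a flat word list."""
    out = []
    for t in tokens:
        if isinstance(t, list):
            out.extend(flatten(t))
        else:
            out.append(t)
    return out

def front_leftmost_aux(tokens):
    """Hypothesis (ii): move the linearly first auxiliary to the front."""
    words = flatten(tokens)
    i = next(k for k, w in enumerate(words) if w in AUX)
    return [words[i]] + words[:i] + words[i + 1:]

def front_highest_aux(tokens):
    """Hypothesis (i): move the matrix-clause auxiliary to the front,
    never looking inside an embedded clause (a nested list)."""
    i = next(k for k, t in enumerate(tokens)
             if not isinstance(t, list) and t in AUX)
    return [tokens[i]] + flatten(tokens[:i] + tokens[i + 1:])

# (1) "Portys can jump": the two hypotheses agree, so (1) can't decide.
s1 = ["portys", "can", "jump"]
assert front_highest_aux(s1) == front_leftmost_aux(s1)

# (3) "Portys that can jump should swim daily": here they diverge.
s3 = ["portys", ["that", "can", "jump"], "should", "swim", "daily"]
print(" ".join(front_highest_aux(s3)))   # (3a): should portys that can jump swim daily
print(" ".join(front_leftmost_aux(s3)))  # (3b): *can portys that jump should swim daily
```

Only sentences with the shape of (3) pull the two procedures apart, which is exactly why their (near total) absence from the input is the crux of the POS argument.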
With this settled we can now consider the real problem, the question whose answer allows us to fix the structure of UG: how do native speakers of English come to adopt (i) and reject (ii)? Here’s what we can immediately conclude from perusing the data in (1)-(5). They do not do so on the basis of sentences like (1) or (2a), as both hypotheses are consistent with this kind of data. Moreover, sentences like (2b) and (3b), being barely comprehensible, are never produced (and if they were (rarely) they would be treated as noise, given that they are gibberish), and sentences like (3a) are vanishingly rare in the primary linguistic data, at least as CHILDES indicates. Legate and Yang do the requisite grunt work and show that there are no (“no,” as in zero!) Y/N sentences of the required form (i.e. sentences like (3) and (4)) in the relevant databases. That means that English speakers are not driven to adopt (i) by the data available to them in the course of acquisition. There simply is no such data. Conclusion: if opting for (i) is not data driven, then the choice of (i) must be the result of some internal bias in the learner in favor of restrictions like (i) and against those like (ii). Put in generativist terms, the bias towards (i) and against (ii) is part of UG and so need not be learned at all. It is, rather, part of what humans bring to the task of learning language.
To be clear: this argument does not show that Y/N question formation is innate, nor does it imply that all languages will form Y/N questions by moving verbal auxiliaries to the front (they don’t). What this argument does show is that rules have a structure-sensitive, rather than a linearly sensitive, format. And this is what’s really interesting. The very simple data above concerning a very simple rule of English grammar, Y/N question formation, argue for a very general conclusion about the kinds of grammar rules natural languages possess. They are sensitive to hierarchical relations (e.g. “highest”) and oblivious to linear ones (e.g. “leftmost”). Why UG is built this way is a very interesting question (currently a topic of some interesting speculation, which I will sadly ignore). But built that way it is, and this conclusion has very wide-ranging ramifications, viz. that we should expect to find no rule of grammar in any natural language that is stated in linear rather than hierarchical terms. In other words, the reasoning we have engaged in requires that the format restrictions that UG puts on English Y/N questions must regulate grammar rules in general, with the result that all rules in all natural language grammars must embrace (i) and eschew (ii). This is a very strong claim, and one that to the best of my knowledge has admirably withstood decades of linguistic investigation on the grammars of many, many diverse languages.
And this, if you sit back and stare at it for a moment, is amazing. Careful consideration of the logic of very simple cases in one language allows one to establish how rules will operate in all languages. One of the marvels of the POS (and one which particularly appeals to my philological ineptitude) is that it allows for a very respectable kind of armchair discovery, one able to establish properties of UG more powerfully than any other method. The reason is that the POS focuses attention on the properties of the grammatical coding system rather than the specifics of any given code. English may code different rules than Swahili or Japanese, but all three languages will code their rules in the same way, necessarily cleaving to the format that UG requires. We know that this must be so given another simple fact: any kid can learn any language if dropped into the right speech community. UG describes this very general language learning capacity. The POS allows us to subtract out what data can determine. What’s left over are innate features of UG that, by the design of the POS, cannot be data sensitive. Consequently, what we learn about UG using the POS in studying the grammar of any one language must necessarily carry over to the grammatical format of any other. The POS, though deceptively simple, cuts very deep. The POS: simple, effective, amazing!
They also consider another possible data source to distinguish (i) from (ii) (sentences like “where’s the other dolly that was in there”) and find that these appear in the relevant data sets at a rate of about 0.05%. This is indistinguishable from noise. Put another way: only a system looking for such data could be expected to find it, and such a system would already have been biased towards a Y/N rule incorporating (i). Thus, if this is not noise for the learner, it is because the learner is already looking for the “right” answer, the same conclusion we have reached without considering such cases.
Incidentally, I highly recommend the Legate-Yang paper, as it is the most sophisticated elaboration of the POS argument I’ve seen. They try to quantify “how much” data is too little data. The punch line is about 1.2% for Y/N questions. Thus, for the case at hand, the CHILDES data is roughly 1/20th of what is needed.
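The back-of-the-envelope arithmetic is worth making explicit. A minimal sketch, assuming the two figures quoted above (the ~1.2% threshold and the ~0.05% rate from the footnote) are directly comparable rates over the same input:

```python
# Hypothetical figures taken from the discussion above: "needed" is the
# estimated rate of disambiguating sentences required to drive learning,
# "observed" the rate actually found in the CHILDES data sets.
needed = 1.2 / 100
observed = 0.05 / 100

shortfall = needed / observed  # how many times too sparse the input is
print(f"observed rate is about 1/{shortfall:.0f} of the needed rate")
```

On these numbers the relevant data come in at about 1/24 of the threshold, the same ballpark as the “roughly 1/20th” figure above.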