Lately, I’ve been worried that many people (mostly psychologists, but also philosophers, computer scientists and even linguists) do not appreciate the argument from the poverty of the stimulus. They simply do not see what this argument is supposed to show. This difficulty leads to skepticism about the poverty of the stimulus and about generative syntax more generally, which, in turn, interferes with progress towards solving the problem of how children learn language.
An argument from the poverty of the stimulus is based on two observations: (a) there exists a generalization about the grammar of some language and (b) the learner’s experience does not provide sufficient data to support that generalization over a range of alternative generalizations. These two observations support the conclusion that something other than experience must be responsible for the true generalization that holds of the speakers of the language in question. This conclusion invites hypotheses. Typically, these hypotheses have come in the form of innate constraints on linguistic representations, though nothing stops a theorist from proposing alternative sources for the relevant generalization.
But a common response to this argument is that we just didn’t try hard enough. The generalization really is supported in the data, but we just didn’t see the data in the right way. If we only understood a little more about how learners build up their representations of the data, then we would see how the data really does contain the relevant generalization. So, armed with this little bit of skepticism, one can blithely assert that there is no poverty of the stimulus problem based only on the belief that if we linguists just worked a little harder, the problem would dissipate. But skepticism is neither a counterargument nor a counterproposal.
So, I’d like to issue the following challenge. I will present a poverty of the stimulus argument that is not too complicated. I will then show how I looked for the relevant data in the environment and concluded that it really wasn’t there. I will then invite all takers (from whatever field) to show that the correct generalization and none of the alternatives really is available in the data. If someone shows that the relevant data really was there, then I will concede that there was no poverty of the stimulus argument for that phenomenon. Indeed, I will celebrate, because that discovery will represent progress that all students of the human language faculty will recognize as such. Nobody requires that every fact of language derive from innate knowledge; learning which ones do and which ones don’t sounds like progress. And with that kind of progress, I’d be more than happy to repeat the exercise until we discover some general principles.
But if the poverty of the stimulus argument is not overturned for this case, then we can take that failure as confirmation that the problem is real and that the way forward in studying the human language faculty is to ask what property of the learner makes the environmental data evidentiary for building a grammar.
With that preamble out of the way, let’s begin. Consider the judgments in (1-2), which Leddon and Lidz (2006) showed, using experimentally collected data, to be reliable in adult speakers of English:
(1) a. Norbert remembered that Ellen painted a picture of herself
b. * Norbert remembered that Ellen painted a picture of himself
c. Norbert remembered that Ellen was very proud of herself
d. * Norbert remembered that Ellen was very proud of himself
(2) a. Norbert remembered which picture of herself Ellen painted
b. Norbert remembered which picture of himself Ellen painted
c. Norbert remembered how proud of herself Ellen was
d. * Norbert remembered how proud of himself Ellen was
The facts in (1) illustrate a very simple generalization: a reflexive pronoun must take its antecedent in the domain of the closest subject. In all of (1a-d) only Ellen can be the antecedent of the reflexive. Let us assume (perhaps falsely) that this generalization is supported by the learner’s experience and that there is no poverty of the stimulus problem associated with it.
The facts in (2) do not obviously fit our generalization about reflexive pronouns. If we take “closest subject” to be the main clause subject, then we would expect only (b) and (d) to be grammatical. If we take “closest subject” to be the embedded subject, then we expect only (a) and (c) to be grammatical. And, if we take “closest subject” to be underspecified in these cases, then we expect all of (a-d) to be grammatical. So, something’s gotta give. What we need is for the closest subject to be fixed as Ellen in (c-d) but underspecified in (a-b). We’ll get back to a way to do that in a moment.
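The dilemma can be made concrete with a small Python sketch. It encodes, for each of (2a-d), the antecedent that the reflexive’s gender forces, together with the judgments reported above, and checks each uniform construal of “closest subject” against them. The dictionary names and the encoding are illustrative assumptions, not part of any published analysis.

```python
# Antecedent forced by the reflexive's gender in each sentence of (2)
SENTENCES = {
    "2a": "Ellen",    # which picture of herself
    "2b": "Norbert",  # which picture of himself
    "2c": "Ellen",    # how proud of herself
    "2d": "Norbert",  # how proud of himself
}

# Reported adult judgments for (2): True means acceptable
JUDGMENTS = {"2a": True, "2b": True, "2c": True, "2d": False}

# Antecedents permitted under each uniform construal of "closest subject"
CONSTRUALS = {
    "main clause subject": {"Norbert"},
    "embedded subject": {"Ellen"},
    "underspecified": {"Norbert", "Ellen"},
}

def predict(allowed):
    """Predicted acceptability of each sentence, given the set of permitted antecedents."""
    return {label: antecedent in allowed for label, antecedent in SENTENCES.items()}

for name, allowed in CONSTRUALS.items():
    predicted = predict(allowed)
    ok = sorted(label for label, good in predicted.items() if good)
    print(f"{name}: predicts {ok} acceptable; matches judgments: {predicted == JUDGMENTS}")
```

None of the three construals reproduces the judgments, which is exactly why something has to give.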
But first we should see how these patterns relate to the poverty of the stimulus. Leddon and Lidz (2006) also showed that sentences like those in (2) are unattested in speech to children. While we didn’t do a search of every sentence that any child ever heard, we did examine 10,000 wh-questions in CHILDES and we didn’t find a single example of a wh-phrase containing a reflexive pronoun, a non-reflexive pronoun or a name. So, there really is no data to generalize from. Whatever we come to know about these sentences, it must be a generalization beyond the data of experience.
One might complain, fairly, that 10,000 wh-questions is not that many and that if we had looked at a bigger corpus we might have found some with the relevant properties. We did search Google for strings containing wh-phrases like those in (2) and the only hits we got were example sentences from linguistics papers. This gives us some confidence that our estimate of the experience of children is accurate.
If these estimates are correct, the data of experience appears to be compatible with many generalizations, varying in whether Norbert, Ellen or both are possible antecedents in the (a-b) cases, the (c-d) cases or both. With three options for each of the two pairs, there are nine possible patterns. But out of these nine, all English speakers acquire the same one. Something must be responsible for this uniformity. That is the extent of the argument. It doesn’t really have a conclusion, except that something must be responsible for the pattern. The argument is merely the identification of a mystery, inviting hypotheses that explain it.
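The size of the hypothesis space depends on how one carves it up. Assuming, as a simplification, that each pair allows exactly one of three antecedent options (Norbert only, Ellen only, or both), the space can be enumerated as follows; the labels and the encoding of the attested pattern are illustrative.

```python
from itertools import product

# Simplifying assumption: each pair of sentences allows exactly one of these antecedent options
OPTIONS = ("Norbert only", "Ellen only", "both")

# A pattern pairs an option for the argument cases (2a-b) with one for the predicate cases (2c-d)
PATTERNS = list(product(OPTIONS, repeat=2))

# The pattern adult English speakers converge on, per the judgments in (2)
ATTESTED = ("both", "Ellen only")

print(f"{len(PATTERNS)} candidate patterns")
for args_case, pred_case in PATTERNS:
    flag = "  <- attested" if (args_case, pred_case) == ATTESTED else ""
    print(f"(2a-b): {args_case:<12} (2c-d): {pred_case:<12}{flag}")
```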
Here’s a solution that is based on prior knowledge, due to Huang (1993). The first part of the solution is that we maintain our generalization about reflexives: reflexives must find their antecedent in the domain of the nearest subject. The second part capitalizes on the difference between (2a-b), in which the wh-phrase is an argument of the lower verb, and (2c-d), in which the wh-phrase is the lower predicate itself. In (2a-b), the domain of the nearest subject is underspecified. If we calculate it in terms of the “base position” of the wh-phrase, then the embedded subject is the nearest subject and so only Ellen can be the antecedent. If we calculate it in terms of the “surface position” of the wh-phrase, then the matrix subject is the nearest subject and so Norbert can be the antecedent. For (2c-d), however, the closest subject is the same, independent of whether we interpret the wh-phrase in its “base” or “surface” position. This calculation of closest subject follows from the Predicate Internal Subject Hypothesis (PISH): the predicate carries information about its subject wherever it goes. Because of PISH, the wh-phrase [how proud of himself/herself] contains an unpronounced residue of the embedded subject and so is really represented as [how [Ellen] proud of himself/herself], with the inner brackets marking the unpronounced residue. This residue (despite not being pronounced) counts as the nearest subject for the reflexive, no matter where the predicate occurs. Thus, the reflexive must be bound within that domain, and Ellen is the only possible antecedent for that reflexive. So, as long as the learner knows PISH, the pattern of facts in (2) follows deductively. The learner requires no experience with sentences like (2) in order to reach the correct generalization.
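Huang’s account can be rendered as a toy Python procedure. This is a minimal sketch, not a binding theory: positions are reduced to a “base” and a “surface” position, the only candidate antecedents are Ellen and Norbert, and all names (`WhSentence`, `nearest_subject`, the `pish` switch) are hypothetical. Turning the `pish` switch off shows what the account buys: without the residue, (2d) is wrongly predicted to be acceptable.

```python
from dataclasses import dataclass

EMBEDDED_SUBJECT = "Ellen"
MATRIX_SUBJECT = "Norbert"

@dataclass(frozen=True)
class WhSentence:
    label: str
    wh_kind: str      # "argument" (which picture of ...) or "predicate" (how proud of ...)
    antecedent: str   # antecedent forced by the reflexive's gender

def nearest_subject(wh_kind, position, pish):
    """Closest subject for a reflexive inside the wh-phrase, evaluated at one position."""
    if wh_kind == "predicate" and pish:
        # Under PISH the fronted predicate carries a residue of its own subject,
        # so that residue is the closest subject wherever the phrase sits.
        return EMBEDDED_SUBJECT
    # Otherwise the closest subject depends on where the phrase is interpreted.
    return EMBEDDED_SUBJECT if position == "base" else MATRIX_SUBJECT

def allowed_antecedents(sentence, pish=True):
    """Union over the two positions in which the wh-phrase can be evaluated."""
    return {nearest_subject(sentence.wh_kind, pos, pish) for pos in ("base", "surface")}

def acceptable(sentence, pish=True):
    return sentence.antecedent in allowed_antecedents(sentence, pish)

EXAMPLES = [
    WhSentence("2a", "argument", "Ellen"),
    WhSentence("2b", "argument", "Norbert"),
    WhSentence("2c", "predicate", "Ellen"),
    WhSentence("2d", "predicate", "Norbert"),
]

for s in EXAMPLES:
    print(s.label, "with PISH:", acceptable(s), "| without PISH:", acceptable(s, pish=False))
```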
Now, this argument only says that the learner must know that the predicate carries information about its subject with it in the syntax prior to encountering sentences like (2). It doesn’t yet require that knowledge to be innate. So, the poverty of the stimulus problem posed by (2) shifts to the problem of determining whether subjects are generated predicate internally.
Our next question is whether we have independent support for PISH and whether the data that supports PISH can also lead to its acquisition. I can think of several important patterns of facts that argue in favor of PISH. The first (due, I believe, to Jim McCloskey) concerns the relative scope of negation and a universal quantifier in subject position. Consider the following sentences:
(3) a. Every horse didn’t jump over the fence
b. A fiat is not necessarily a reliable car
c. A fiat is necessarily not a reliable car
The important thing to notice about these sentences is that (3a) is ambiguous but that neither (3b) nor (3c) is. (3a) can be interpreted as making a strong claim that none of the horses jumped over the fence or a weaker claim that not all of them jumped. This ambiguity concerns the scope of negation. Does the negation apply to something that includes the universal or not? If it does, then we get the weak reading that not all horses jumped. If it does not, then we get the strong reading that none of them did.
How does this scope ambiguity arise? The case where the subject takes scope over negation is straightforward if we assume (uncontroversially) that scope can be read directly off of the hierarchical structure of the sentence. But what about the reading where negation takes wide scope? We can consider two possibilities. First, it might be that the negation can take the whole sentence in its scope even if it does not occur at the left edge of the sentence. But this possibility is shown to be false by the lack of ambiguity in (3c). If negation could simply take wide scope over the entire sentence independent of its syntactic position, then we would expect (3c) to be ambiguous, contrary to fact. (3c) just can’t mean what (3b) does. The second possibility is PISH: the structure of (3a) is really (4), with the bracketed copy of every horse representing the unpronounced residue of the subject-predicate relation:
(4) every horse didn’t [every horse] jump over the fence
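Given that there are two positions for every horse in the representation, we can interpret negation as taking scope relative to either the higher one or the lower one. Relative to the higher copy, the universal scopes over negation and we get the strong reading that none of the horses jumped; relative to the lower copy, negation scopes over the universal and we get the weak reading that not all of them did.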
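The two scope readings can be checked against a small model. In this sketch (the horse names and the situation are invented for illustration), interpreting the subject in its higher copy yields the every > not reading and interpreting it in its lower copy yields not > every; in a situation where some but not all horses jumped, the two readings come apart, which is what makes (3a) genuinely ambiguous.

```python
def every_over_not(horses, jumped):
    """Subject interpreted in its higher copy: every horse is such that it didn't jump."""
    return all(h not in jumped for h in horses)

def not_over_every(horses, jumped):
    """Subject interpreted in its lower, predicate-internal copy: not every horse jumped."""
    return not all(h in jumped for h in horses)

# Hypothetical situation: some, but not all, of the horses jumped
horses = {"Bucephalus", "Seabiscuit", "Trigger"}
jumped = {"Seabiscuit"}

print("every > not (none jumped):   ", every_over_not(horses, jumped))
print("not > every (not all jumped):", not_over_every(horses, jumped))
```

When no horse jumped, both readings are true, and when every horse jumped, both are false; the readings differ only in mixed situations like the one modeled here.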
Is there evidence in speech to children concerning the ambiguity of (3a)? If there is, then that might count as evidence that they could use to learn PISH and hence solve the poverty of the stimulus problem associated with (2). Here we run into two difficulties. First, Gennari and MacDonald (2005) show that these sentences do not occur in speech to children (and are pretty rare in speech between adults). Second, when we present such sentences to preschoolers, they appear to be relatively deaf to their ambiguity. Julien Musolino and I have written extensively on this topic, and the takeaway message from those papers is (i) that children’s grammars can generate the wide-scope negation interpretation of sentences like (3a), but (ii) that it takes a lot of either pragmatic or priming effort to get that interpretation to reveal itself. So, even if such sentences did occur in speech to children, their dominant interpretation from the children’s perspective would be the one where the subject scopes over negation (even when that interpretation is not consistent with the context or the intentions of the speaker), and so this potential evidence is unlikely to be perceived as evidence of PISH. And if PISH is not learned from that, then we are left with a mystery of how it comes to be responsible for the pattern of facts in (2).
A second argument (due to Molly Diesing) in favor of PISH concerns the interpretation of bare plural subjects, as in (5):
(5) Linguists are available (to argue with)
This sentence is ambiguous between a generic and an existential reading of the bare plural subject. Under the generic reading, it is a general property of linguists (as a whole) that they are available. Under the existential reading, there are some linguists who are available at the moment.
Diesing observes that these two interpretations are associated with different syntactic positions in German. The generic interpretation requires the subject to be outside of the verb phrase. The existential interpretation requires it to be inside the verb phrase (providing evidence for the availability of the predicate-internal position cross-linguistically). So, Diesing argues that we can capture a cross-linguistic generalization about the interpretations of bare plural subjects by positing that the same mapping between position and interpretation occurs in English. The difference is that in English, the existential interpretation is associated with the unpronounced residue of the subject inside the predicate. This is not exactly evidence in favor of PISH, but PISH allows us to link the German and English facts together in a way that a PISH-less theory could not. So we could take it as evidence for PISH.
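The logic of linking German and English can be sketched as a single position-to-reading map applied to both languages. This is a deliberately coarse stand-in for Diesing’s actual proposal (which is stated over syntactic structures, not lists of position labels); the position labels and function names are assumptions made for illustration. What it shows is that the same map yields both readings of (5) only if English has a VP-internal residue.

```python
# Coarse, hypothetical stand-in for Diesing's proposal:
# a bare plural read outside the VP is generic, inside the VP existential.
POSITION_TO_READING = {"VP-external": "generic", "VP-internal": "existential"}

def available_readings(positions):
    """Readings available to a bare plural subject, given every position it (or a copy) occupies."""
    return {POSITION_TO_READING[p] for p in positions}

# German: each reading goes with a distinct overt position
german_external = available_readings(["VP-external"])
german_internal = available_readings(["VP-internal"])

# English with PISH: a pronounced VP-external subject plus an unpronounced VP-internal residue
english_with_pish = available_readings(["VP-external", "VP-internal"])

# English without PISH: only the VP-external position exists
english_without_pish = available_readings(["VP-external"])

print(sorted(english_with_pish), sorted(english_without_pish))
```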
Now, this one is a bit trickier to think about when it comes to acquisition. Should learners take evidence of existential interpretations of bare plural subjects to be evidence of PISH? Maybe, if they already know something about how positions relate to interpretations. But in the end, the issue is moot because Sneed (2007) showed that in speech to children, bare plural subjects are uniformly used with the generic interpretation. How children come to know about the existential readings is itself a poverty of the stimulus argument (and one that could be solved by antecedent knowledge of PISH and the rules for mapping from syntactic position to semantic interpretation). So, if we think that the facts in (2) follow from PISH, then we still need a source for PISH in speech to children.
The final argument that I can think of in favor of PISH comes from Jane Grimshaw. She shows that it is possible to coordinate an active and a passive verb phrase:
(6) Norbert insulted some psychologists and was censured
The argument takes advantage of three independent generalizations. First, passives involve a relation between the surface subject and the object position of the passive verb, represented here by the invisible residue of Norbert:
(7) Norbert was censured [Norbert]
Second, extraction from one conjunct in a coordinated structure is ungrammatical (Ross’s 1968 Coordinate Structure Constraint):
(8) * Who did Norbert criticize the book and Jeff insult
Third, extraction from a conjunct is possible as long as the extracted phrase is associated with both conjuncts (across-the-board extraction):
(9) Who did Norbert criticize and Jeff insult
So, if there were no predicate internal subject position in (6), then we would have the representation in (10):
(10) Norbert [VP insulted some psychologists] and [VP was censured [Norbert]]
This representation violates the coordinate structure constraint and so the sentence is predicted to be ungrammatical, contrary to fact. However, if there is a predicate internal subject position, then the sentence can be represented as an across the board extraction:
(11) Norbert [VP [Norbert] insulted some psychologists] and [VP was censured [Norbert]]
So, we can understand the grammaticality of (6) straightforwardly if it has the representation in (11), as required by PISH.
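The reasoning behind (10) and (11) amounts to a simple check, sketched here in Python under the assumption that a coordination can be represented as a list of conjuncts and that the residue of a moved phrase is written in brackets. The function name is hypothetical, and the check covers only the Coordinate Structure Constraint with its across-the-board exception, nothing else about the syntax.

```python
def respects_csc(conjuncts, moved):
    """Coordinate Structure Constraint with the across-the-board exception:
    a phrase moved out of a coordination must leave a residue in every conjunct or in none."""
    residue = f"[{moved}]"
    has_residue = [residue in conjunct for conjunct in conjuncts]
    return all(has_residue) or not any(has_residue)

# (8): residue of "who" in the second conjunct only
ex_8 = [["Norbert", "criticize", "the", "book"], ["Jeff", "insult", "[who]"]]
# (9): residue of "who" in both conjuncts (across-the-board)
ex_9 = [["Norbert", "criticize", "[who]"], ["Jeff", "insult", "[who]"]]
# (10), no predicate-internal subject: Norbert's residue only in the passive conjunct
ex_10 = [["insulted", "some", "psychologists"], ["was", "censured", "[Norbert]"]]
# (11), with PISH: a residue of Norbert inside each VP conjunct
ex_11 = [["[Norbert]", "insulted", "some", "psychologists"], ["was", "censured", "[Norbert]"]]

for name, conjuncts, moved in [("(8)", ex_8, "who"), ("(9)", ex_9, "who"),
                               ("(10)", ex_10, "Norbert"), ("(11)", ex_11, "Norbert")]:
    print(name, "respects CSC:", respects_csc(conjuncts, moved))
```

Of the two candidate structures for (6), only the PISH representation passes the check, matching the sentence’s grammaticality.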
Do sentences like (6) occur in speech to children? I don’t know of any evidence about this, but I also don’t think it matters. It doesn’t matter because if the learner encountered (6), that datum would support either PISH or the conclusion that movement out of one conjunct in a coordinate structure is grammatical (i.e., that the coordinate structure constraint does not hold). If there is a way of determining that the learner should draw the PISH conclusion and not the other one, I don’t know what it is.
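The learner’s predicament can be stated as hypothesis elimination over two yes/no switches, PISH and the CSC. In this toy sketch (the switches and the acceptance rule are simplifying assumptions), hearing (6) eliminates only the grammar that has the CSC but lacks PISH; a PISH grammar and a CSC-less grammar both survive, so the datum by itself does not decide between them.

```python
from itertools import product

def accepts_6(pish, csc):
    """Toy acceptance rule for (6): 'Norbert insulted some psychologists and was censured'."""
    if pish:
        # Each VP conjunct contains a residue of Norbert, so movement is across the board.
        return True
    # Without PISH, Norbert moves out of the passive conjunct only,
    # which is acceptable only if the CSC does not hold.
    return not csc

# All four combinations of the two switches: (pish, csc)
GRAMMARS = list(product([True, False], repeat=2))
survivors = [g for g in GRAMMARS if accepts_6(*g)]

for pish, csc in survivors:
    print(f"consistent with (6): PISH={pish}, CSC={csc}")
```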
So, there’s a potential avenue for the stimulus-poverty-skeptic to show that the pattern in (2) follows from the data. First show that data like (6) occurs at a reasonable rate in speech to children, whatever reasonable means. Then show how the coordinate structure constraint can be acquired. Then build a model showing how putting (6) together with an already acquired coordinate structure constraint will lead to the postulation of PISH and not to the discarding of the coordinate structure constraint. And if that project succeeds, it will be party time; we will have made serious progress on solving a poverty of the stimulus problem.
But for the moment, the best solution on the table is the one in which PISH is innate. This solution is the best because it explains with a single mechanism the pattern of facts in (2), the ambiguity of (3a), the interpretive properties of (5) and its German counterparts, and the grammaticality of (6). And it explains how each of these can be acquired in the absence of direct positive evidence. Once learners figure out what subjects and predicates look like in their language, these empirical properties will follow deductively because the learners will have been forced by their innate endowment to build PISH-compatible representations.
One final note. I am confident that no stimulus-poverty-skeptics will change their views on the basis of this post (if any of them even see it). And it is not my intention to get them to. Rather, I am offering an invitation to work on the kinds of problems that poverty of the stimulus arguments raise. It is highly likely that the analyses I have presented are incorrect and that scientists with different explanatory tastes would follow different routes to a solution. But we will all have a lot more fun if we engage at least some of the same kinds of problems and do not deny that there are problems to solve. The charge that we haven’t looked hard enough to find out how the data really is evidentiary is hereby dismissed. But if there are stimulus-poverty-skeptics who want to disagree about something real, linguists are available.
Jeff Lidz, November 23, 2014.