This paper by Berwick, Pietroski, Yankama and Chomsky (BPYC) offers an excellent, succinct review of the logic of the Poverty of Stimulus (POS) argument. In addition, it provides an up-to-date critical survey of putative “refutations.” Let me highlight (and perhaps slightly elaborate on) some points that I found particularly useful.
First, they offer an important perspective, one in which the POS is one step in a more comprehensive enterprise. What is the utility of identifying “innate domain specific factors” of linguistic cognition? It is “part of a larger attempt to…isolate the role of other factors [my emphasis]” (1209). In other words, the larger goal is to resolve linguistic cognition into four possible factors: (i) innate domain specific, (ii) innate domain general, (iii) external stimuli, and (iv) effects of “natural law.” Understanding (i) is a critical first step in better specifying the roles of the other three factors and how they combine to produce linguistic competence. As they put it:
The point of a POS argument is not to replace “learning” with appeals to “innate principles” of Universal Grammar (UG). The goal is to identify factor (i) contributions to linguistic knowledge, in a way that helps characterize those contributions. One hopes for subsequent revision and reduction of the initial characterization, so that 50 years later, the posited UG seems better grounded (1210).
I have the sense that many of those who get antsy about UG and POS do so because they see it as smug explanatory complacency: all linguists do is shove all sorts of stuff into UG and declare the problem of linguistic competence solved! Wrong. What linguists have done is create a body of knowledge outlining some non-trivial properties of UG. In this light, for example, GB’s principles and parameters can be understood as identifying a dozen or so “laws of grammar,” which can now themselves become the object of further investigation. Are these laws basic, or derived from more basic cognitive and physical factors? (Minimalists believe the latter.) What are these more basic factors? (Chomsky thinks Merge is the heart of the system, and speculation abounds about whether it is domain general or linguistically specific.) The principles/laws, in other words, become potential explananda in their own right. There is nothing wrong with reducing proposed innate principles of UG to other factors (in fact, this is now the parlor game of choice in contemporary minimalist syntax). However, for such a reduction to be worthwhile, it helps a lot to start with a relatively decent description of the facts that need explaining, and the POS has been a vital tool in establishing these facts. Most critiques of the POS fail to appreciate how useful it is for identifying what needs to be explained.
The second section of the BPYC paper recaps the parade case for the POS: Aux-to-Comp movement in Y/N questions. It provides an excellent and improved description of the main facts that need explanation. It identifies the target of inquiry as the phenomenon of “constrained homophony,” i.e., humans given a word-string will understand it to have only a subset of the interpretations logically attributable to it. Importantly, native speakers find both that strings have the meanings they do and that they lack the meanings they don’t. The core phenomenon is “(un)acceptability under an interpretation.” Simple unacceptability (pure ungrammaticality, with no interpretations at all) is the special case of having zero readings. Thus what needs explanation is how, given ordinary experience, native speakers develop a capacity to identify both the possible and the impossible interpretations, and thus:
…language acquisition is not merely a matter of acquiring a capacity to associate word strings with interpretations. Much less is it a mere process of acquiring a (weak generative) capacity to produce just the valid word strings of the language (1212).
As BPYC go on to show in §4, the main problem with most of the non-generative counterproposals is that they simply misidentify what needs to be explained. None of the three proposals on offer even discusses the problem of constrained homophony, let alone accounts for it. BPYC emphasize this in their discussion of string-based approaches in §4.1. However, the same problem extends to the Reali and Christiansen bi-gram/tri-gram/recurrent network models discussed in §4.3 and to the Perfors, Tenenbaum and Regier paper in §4.2, though BPYC don’t emphasize this in their discussion of the latter two.
The take-home message from §2, §4.1 and §4.2 is that even for this relatively simple case of Y/N question formation, critics of the POS have “solved” the wrong problem (often poorly, as BPYC demonstrate).
Sociologically, the most important section of the BPYC paper is §4.2. Here they review a paper by Perfors, Tenenbaum and Regier (PTR) that has generated a lot of buzz. They effectively show that where PTR is not misleading, it fails to illuminate matters much. The interested reader should take a careful look. I want to highlight two points.
First, contrary to what one might expect given their opening paragraphs, PTR do not engage with the original problem that Aux-inversion generated: whether UG requires that grammatical (transformational) rules be structure dependent. PTR address another question: whether the primary linguistic data contains information that would force an ideal learner to choose a Phrase Structure Grammar (PSG) over either a finite list or a right regular grammar (one that generates only right-branching structures). PTR conclude that, given these three options, there is sufficient information in the Primary Linguistic Data (PLD) for an ideal learner to choose a PSG-type grammar over the other two. Whatever the interest of this substitute question, it is completely unrelated to the original one Chomsky posed: whether grammars have PSG structure says nothing about whether transformations are structure-dependent or linearly dependent processes. This point was made as soon as the PTR paper saw the light of day (I heard Lasnik and Uriagereka make it at a conference at MIT many years ago where the paper was presented), and it is surprising that the published version of PTR did not clearly point out that the authors were going to discuss a completely different problem. I assume it’s because the POS problem is sexier than the one PTR actually address. Nothing like a little bait and switch to goose interest in one’s work.
Putting this important matter aside, it’s worth asking what PTR’s paper shows on its own terms. Curiously, it does not show that kids actually use the PLD to choose among the three competing possibilities. It cannot show this, for PTR are exploring the behavior of ideal learners, not actual ones. How close kids are to ideal learners is an open question, and one similar to a question Chomsky long ago considered. Chomsky’s reasons for moving from evaluation-measure models (as in Aspects) to parameter-setting ones (as in LGB) revolved around the computational feasibility of providing the ordering of alternative grammars necessary for a usable evaluation metric. Chomsky thought (and still thinks) that this is a very tall computational order, one unlikely to be realizable. The same kind of feasibility issue affects PTR’s idealization: how computationally feasible is it to assume that learners are able to order, compare and decide among the many grammars in the hypothesis space compatible with the data? The larger the space of options available for comparison, the more demanding the problem typically is. When looking for needles, choose small haystacks. In the end, what PTR show is that there is usable information in the PLD that, were it used, could select PSGs over the two other alternatives. They do not show that kids do or can use this information effectively.
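The feasibility worry can be made concrete with a toy calculation of my own (it appears in neither BPYC nor PTR, and the numbers are purely illustrative): if candidate grammars are individuated by even a modest number of independent binary choice points, the space of grammars an ideal learner must weigh grows exponentially, and exhaustive comparison does work proportional to that count.

```python
# Toy illustration (mine, not from BPYC or PTR) of why exhaustive
# comparison of candidate grammars gets expensive: with n independent
# binary choice points, the hypothesis space contains 2**n grammars,
# and an ideal learner that scores every candidate against the data
# does work at least linear in that number.

def hypothesis_space_size(n_binary_choices: int) -> int:
    """Number of distinct grammars given n independent binary choices."""
    return 2 ** n_binary_choices

# Even 30 binary choice points yield over a billion candidate grammars.
for n in (10, 20, 30):
    print(n, hypothesis_space_size(n))
```

The point is not that ideal-learner models are incoherent, only that the idealization quietly assumes a comparison procedure whose cost explodes with the size of the hypothesis space.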
The results are, IMHO, more modest still. The POS argument has been used to make the rationalist point that linguistically capable minds come stocked full of linguistically relevant information. PTR’s model agrees. The ideal mind comes with the three possible options pre-coded in the hypothesis space. What they show is that, given such a specification of the hypothesis space, the PLD could be used to choose among them. So, as presented, the three grammatical options (note the domain specificity: it’s grammars in the hypothesis space) are given (i.e. innately specified). What’s “learned” (i.e. data driven) is not the range of options but the particular option selected. What position do PTR argue against? It appears to be the following: only by delimiting the hypothesis space of possible grammars so as to exclude all but PSGs can we explain why the grammars attained are PSGs (which of course they are not, as we have known for a long while, but ignore that here). PTR’s proposal is that it’s OK to widen the options open for consideration to include non-PSG grammars because the PLD suffices to single out PSGs in this expanded space.
I can’t personally identify any generativist who has held the position PTR target. There are two dimensions to a Bayesian scenario: (i) the possible options the hypothesis space delimits, and (ii) a possible weighting of the given options, giving some higher priors than others (making them less marked, in linguistic terms). These dimensions are also part of Chomsky’s abstract description of the options in chapter 1 of Aspects, and they crop up in current work on whether some parameter value is marked. So far as I can tell, the rationalist ambitions the POS serves are equally well met by theories that limit the size of the hypothesis space and by those that widen it but make some options more desirable than others via various kinds of (markedness) measures (viz. priors). Thus, even disregarding the fact that the issue PTR discuss is not what generativists mean by structure dependence, it is not clear how revelatory their conclusions are, as their learning scenario assumes exactly the kind of richly structured, domain specific, innate hypothesis space the POS generally aims to establish. So, if you are thinking that PTR gets you out from under rich domain specific innate structures, think again. Indeed, if anything, PTR pack more into the innate hypothesis space than generativists typically do.
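The division of labor just described — an innately given hypothesis space with priors, and data that merely selects among the pre-specified options — can be sketched in a few lines. This is my own toy illustration of Bayesian selection, not PTR’s actual model; the grammar labels, priors, and likelihood values are all invented for the example.

```python
# Toy sketch of Bayesian grammar selection (illustrative, not PTR's model).
# Note what is GIVEN in advance: the hypothesis space (three grammar
# types) and the priors over it. The data only redistributes belief
# among options that were innately specified from the start.

def posterior(priors, likelihoods):
    """Normalized posteriors: P(h|d) proportional to P(d|h) * P(h)."""
    unnorm = {h: priors[h] * likelihoods[h] for h in priors}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

# Innately specified hypothesis space: three candidate grammar types,
# here with flat priors (an unweighted hypothesis space).
priors = {"finite_list": 1/3, "regular": 1/3, "psg": 1/3}

# Hypothetical likelihoods of the corpus (PLD) under each grammar type;
# the numbers are made up to reflect PTR's claim that the PLD favors PSGs.
likelihoods = {"finite_list": 1e-12, "regular": 1e-9, "psg": 1e-6}

post = posterior(priors, likelihoods)
best = max(post, key=post.get)
```

Replacing the flat priors with unequal ones models option (ii) above: a markedness-style weighting of the same innate hypothesis space. Either way, nothing outside the pre-coded options can ever be “learned.”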
Rabbinic law requires that every Jew rehash the story of the exodus every year on Passover. Why? The rabbis answer: it’s important for everyone in every generation to personally understand the whys and wherefores of the exodus, to feel as if s/he too were personally liberated, lest it be forgotten how much was gained. The seductions of empiricism are almost as alluring as “the fleshpots of Egypt.” Reading BPYC responsively, with friends, is an excellent antidote, lest we forget!