One of the perks of academic life is the opportunity to learn from your colleagues. This semester I am sitting in on Jeff Lidz’s graduate intro to acquisition course. And what’s the first thing I learn? That Jerry Fodor wrote a paper in 1966 on how to exploit Poverty of Stimulus reasoning that is as good as anything I’ve ever seen (including my own wonderful stuff). The paper is called “How to learn to talk: some simple ways” and it appears here. I cannot recommend it highly enough. I want to review some of the main attractions, but really you should go and get a copy. It’s really a good read.
Fodor starts with three obvious points:
1. Speakers have information about the structural relations “within and among the sentences of that language” (105).
2. Some of the speaker’s information must be learned (105).
3. The child must bring to the task of language learning “some amount of intrinsic structure” (106).
None of these starting points can be controversial. (1) just says that speakers have a grammar, (2) that some aspects of this internalized grammar are acquired on the basis of exposure to instances of that grammar and (3) that “any organism that extrapolates from experience does so on the basis of principles that are not themselves supplied by its experience” (106). Of course, whether these principles are general or language-specific is an empirical question.
Given this troika of truisms, Fodor then asks how we can start investigating the “psychological process involved in the assimilation of the syntax of a first language” (106). He notes that this process requires at least three variables (106):
4. The observations (i.e. verbalizations in its vicinity) that the child is exposed to and uses
5. The learning principles that the child uses to “organize and extrapolate these observations”
6. The body of linguistic information that is the output of applying these principles to the data, and that the child will subsequently use in speaking and understanding
We can, following standard convention, call (4) the PLD (Primary Linguistic Data), (5) UG, and (6) the G acquired. Thus, what we have here is the standard description of the problem as finding the right UG that can mediate PLD and G: $\mathrm{UG}(\mathrm{PLD}_L) = G_L$. Or, as Fodor puts it (107): “the child’s data plus his intrinsic structure must jointly determine the linguistic information at which he arrives.”
And this, of course, has an obvious consequence:
…it is a conclusive disproof of any theory about the child’s intrinsic structure to demonstrate that a device having the structure could not learn the syntax of the language on the basis of the kind of data that the child’s verbal environment provides (107).
This just is the Poverty of Stimulus argument (PoS): a tool for investigating the properties of UG by comparing the PLD to the features of the acquired G. In Fodor’s words:
…a comparison of the child’s data with a formulation of the linguistic information necessary to speak the language the child learns permits us to estimate the nature of the complexity of the child’s intrinsic structure. If the information in the child’s data approximates the linguistic information he must master, we may assume that the role of intrinsic structure is relatively insignificant. Conversely, if the linguistic information at which the child arrives is only indirectly or abstractly related to the data provided by the child’s exposure to adult speech, we shall have to suppose that the child’s intrinsic structure is correspondingly complex.
I like Fodor’s use of the term ‘estimate.’ Note that, so framed, the question is not whether there is intrinsic structure in the learner, but how “significant” it is (i.e. how much work it does in accounting for the acquired knowledge). And the measure of significance is the distance between the information the PLD provides and the G acquired.
Though Fodor doesn’t emphasize this, it means that all proposals regarding language learning must advert to Gs (i.e. to rules). After all, these are the end points of the process as Fodor describes it. And emphasizing this has consequences. For example, observing that there are statistical regularities in the PLD may be useful, but it is not enough: a specification of how the observed regularities lead to the rules acquired must also be provided. Regularities are not themselves rules of G (though they can be the basis on which the learner infers/decides what rules G contains), so accounts that stop at adumbrating the statistical regularities in a corpus stop much too soon. Put differently, pointing to such regularities (e.g. noting that a statistical learner can divide some set of linguistic data into two pieces) cannot be the final step in describing what the learner has learned. There must be a specification of the rule.
This really is an important point. Much work on the statistical properties of corpora seems to take for granted that identifying a stats regularity in and of itself explains something. It doesn’t, at least in the case of language. Why? Because we know that there lurk rules in them there corpora. So though finding a regularity may in fact be very helpful (it can identify what the LAD, the language acquisition device, looks for to help determine the properties of the relevant rules), it is not by itself enough. A specification of the route from the regularity to the rule is required, or we have not described what is going on in the LAD.
Fodor then proceeds to flesh this tripartite picture out. The first step is to describe the PLD. It consists of “a sample of the kinds of utterances fluent speakers of his language typically produce” (108). If entirely random (indeed, even if not), this “corpus” will be pretty noisy (e.g. slips of the tongue, utterances in different registers, false starts, etc.). The first task the child faces is to “discover regularities in these data” (109). This means ignoring some portions of the corpus and highlighting and grouping others. In other words, in language acquisition, the data is not given. It must be constructed.
This is not a small point. It is possible that some pre-sorting of the data can be done in the absence of the relevant grammatical categories and rules that are the ultimate targets of acquisition, but it is doubtful (at least to me) that such pre-sorting will get the child very far. If this is correct, then simply regimenting the data in a way useful to G acquisition will already require a specification of given (i.e. intrinsic) grammatical possibilities. Here’s Fodor on this (109):
His problem is to discover regularities in these data that, at the very least, can be relied upon to hold however much additional data is added. Characteristically the extrapolation takes the form of a construction of a theory that simultaneously marks the systematic similarities among the data at various levels of abstraction, permits the rejection of some of the observational data as unsystematic, and automatically provides a general characterization of the possible future observations. In the case of learning language, this theory is precisely the linguistic information at which the child arrives by applying his intrinsic information to the analysis of the corpus. In particular, this linguistic information is at the very least required to provide an abstract account of syntactic structure in terms of which systematically relevant features of the observed utterance can be discarded as violating the formation rules of the dialect, and in terms of which the notion “possible sentence of the language” can be defined (my emphasis, NH).
Nor is this enough. In addition we need principles that bridge from the provided data to the principles. What sorts of bridges? Fodor provides some suggestions.
…many of the assertions the child hears must be true, many of the things he hears referred to must exist, many of the questions he hears must be answerable, and many of the commands he receives must be performable. Clearly the child could not learn to talk if adults talked at random. (109-110).
Thus, in addition to a specification of the given grammatical possibilities, part of the acquisition process involves correlating linguistic input with available semantic information.
This is a very complicated process, as Fodor notes, and involves identifying a-linguistic predicates that can be put into service absent knowledge of a G (those enjoying “epistemological priority” in Chomsky’s sense). Things like UTAH (the Uniformity of Theta Assignment Hypothesis) can play a role here. For example, if we assume that the capacity for parsing a situation into agents and patients and recipients and themes and experiencers and… is a-linguistic, then this information can be used to map words (once identified) onto structures. In particular, assume a mapping principle that says agents are external arguments and themes are internal arguments, and assume that we can identify agenthood and themehood for at least a core set of initially available predicates. Then hearing “Fido is biting Max” allows one to build a representation of this sentence such as [Fido [biting [Max]]]. This representation, when coupled with what UG requires of Gs (e.g. case on DPs, agreement of unvalued phi features, etc.), should allow this initial seeding to generate structures something like [TP Fido is [VP Fido [V’ biting Max]]]. As Fodor notes, even if “Fido is biting Max” is observable, [TP Fido is [VP Fido [V’ biting Max]]] is not. The PoS problem, then, is how to go from things like the first (utterances of sentences) to things like the second (phrase markers of sentences). For this, a specification of the allowable grammatical options seems unavoidable: these options, not themselves given in the data, are what the child needs to organize the data in a usable form.
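To make the seeding step concrete, here is a toy sketch. It assumes, hypothetically, that the child already has an a-linguistic parse of the scene into an event, an agent and a theme, and that the words naming them have been identified. The names `Scene` and `seed_structure` are invented for illustration, and the mapping is just the agent-external/theme-internal assumption described above, not a claim about how UTAH is actually implemented in the learner.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    """Hypothetical a-linguistic parse of a situation into thematic roles."""
    event: str   # word already identified as naming the event, e.g. "biting"
    agent: str
    theme: str


def seed_structure(utterance: str, scene: Scene) -> str:
    words = utterance.split()
    # Only map roles whose fillers were actually heard in the utterance.
    for w in (scene.agent, scene.event, scene.theme):
        if w not in words:
            raise ValueError(f"{w!r} not found in utterance")
    # Assumed mapping principle: the theme is merged with the verb as its
    # internal argument; the agent is merged outside it as the external one.
    verb_phrase = f"[{scene.event} [{scene.theme}]]"
    return f"[{scene.agent} {verb_phrase}]"


print(seed_structure("Fido is biting Max", Scene(event="biting", agent="Fido", theme="Max")))
# [Fido [biting [Max]]]
```

The sketch yields only the initial seed; getting from it to the TP structure, with the auxiliary and the second copy of Fido, is exactly the part that, on this view, UG has to supply.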
One of the most useful discussions in the paper begins around p. 113. Here Fodor distinguishes two problems: (i) specification of a device that, given a corpus, provides the correct G for that input, and (ii) specification of a device that, given a corpus, only attempts “to describe its input in terms of the kinds of relations that are known to be relevant to systematic linguistic description.” (i) is the project of explaining how particular Gs are acquired. (ii) is the project of adumbrating the class of possible Gs. Fodor describes (ii) as an “intermediate problem” (113) on the way to (i) (see here for some discussion of the distinction and some current ways of exploring the bridge). Why “intermediate”? Because it is reasonable to believe that restricting the class of possible Gs (what Fodor describes as “characterizing a device that produces only non-‘phony’ extrapolations of corpuses” (114)) will contribute to understanding how a child settles on its specific G. As Fodor notes, there are “indefinitely many absurd hypotheses” that a given corpus is consistent with, and “whatever intrinsic structure the child brings to the language-learning situation must at least be sufficient to preclude the necessity of running through” them all.
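A toy illustration of project (ii): the sketch below first keeps every hypothesis consistent with a tiny corpus and then discards those stated in terms of phony properties. The inventory `UG_RELATIONS`, the hypotheses and their relation labels are all invented for illustration; the set of "linguistic" relation types merely stands in for whatever UG actually allows Gs to mention.

```python
from typing import Callable, List, NamedTuple, Tuple


class Hypothesis(NamedTuple):
    name: str
    relation_type: str             # kind of relation the hypothesis appeals to
    holds: Callable[[str], bool]   # does the hypothesis fit this sentence?


# Hypothetical stand-in for the relations UG allows Gs to mention.
UG_RELATIONS = {"category", "constituency", "dependency"}


def filter_hypotheses(corpus: List[str],
                      hypotheses: List[Hypothesis]) -> Tuple[List[str], List[str]]:
    # Step 1: every hypothesis the data alone fails to rule out.
    consistent = [h for h in hypotheses if all(h.holds(s) for s in corpus)]
    # Step 2: drop the "phony" ones, i.e. those using relations outside UG.
    non_phony = [h.name for h in consistent if h.relation_type in UG_RELATIONS]
    return [h.name for h in consistent], non_phony


corpus = ["the dog barks", "a cat sleeps"]
hypotheses = [
    Hypothesis("determiner-initial", "category", lambda s: s.split()[0] in {"the", "a"}),
    Hypothesis("at most three words", "string length", lambda s: len(s.split()) <= 3),
    Hypothesis("odd number of characters", "string length", lambda s: len(s) % 2 == 1),
]
consistent, non_phony = filter_hypotheses(corpus, hypotheses)
print(consistent)  # ['determiner-initial', 'at most three words']
print(non_phony)   # ['determiner-initial']
```

Any finite corpus leaves indefinitely many hypotheses like "at most three words" standing; in this toy it is the built-in restriction, not the data, that sets it aside.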
So, one way to start addressing (i) is via (ii). Fodor also suggests another useful step (114-5):
It is worth considering the possibility that the child may bring to the language-learning situation a set of rules that takes him from the recognition of specified formal relations within and among strings in his data to specific putative characterizations of underlying structures for strings of those types. Such rules would implicitly define the space of hypotheses through which the child must search in order to arrive at the precisely correct syntactic analysis of his corpus.
So, Fodor in effect suggests two intermediate steps: a characterization of the range of possible extensions of a corpus (roughly a theory of possible Gs) and a specification of “learning rules” that specify trajectories through this space of possible Gs. This still leaves the hardest problem: how to globally order the Gs themselves (i.e. the evaluation metric). Here’s Fodor again (115):
Presumably the rules [the learning rules, NH] would have to be so formulated as to assume (1) that the number of possible analyses assigned to a given corpus is fairly small; (2) that the correct analysis (or at least a best analysis) is among these; (3) that the rules project no analysis that describes the corpus in terms of the sorts of phony properties already discussed, but that all the analyses exploit only relations of types that sometimes figure in adequate syntactic theories.
Fodor helpfully provides an illustration of such a learning rule (116-7). It maps non-contiguous string dependencies onto underlying G rules that relate these surface dependencies via a movement operation. The learning rule is roughly (1):
(1) Given a string ‘I X J’ where the forms of I and J swing together (think be/-ing in ‘is kissing’) over a variable X (i.e. ‘kiss’), the learner assumes that this comes from a rule like (I, J) X → I X J.
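Here is a minimal sketch of a learning rule in the spirit of (1). It assumes, hypothetically, that the input has already been segmented into morphemes and that the learner only inspects three-morpheme strings; the function name and the `min_fillers` threshold are invented for illustration. A pair (I, J) is taken to swing together if I only ever occurs with J and the pair frames several different X's.

```python
from collections import defaultdict


def propose_discontinuous_rules(corpus, min_fillers=2):
    """Toy version of learning rule (1): for strings I X J in which I and J
    co-vary, propose an underlying rule (I, J) X -> I X J."""
    fillers = defaultdict(set)   # (I, J) -> the different X's seen between them
    partners = defaultdict(set)  # I -> every J that I has been seen with
    for morphemes in corpus:
        if len(morphemes) != 3:
            continue             # the toy only inspects three-morpheme strings
        i, x, j = morphemes
        fillers[(i, j)].add(x)
        partners[i].add(j)
    rules = []
    for (i, j), xs in sorted(fillers.items()):
        # I and J "swing together": I only ever takes this J, over several X's.
        if len(xs) >= min_fillers and partners[i] == {j}:
            rules.append(f"({i}, {j}) X -> {i} X {j}")
    return rules


corpus = [
    ("is", "kiss", "ing"), ("is", "sleep", "ing"),
    ("has", "kiss", "ed"), ("has", "sleep", "ed"),
    ("the", "dog", "barks"),
]
print(propose_discontinuous_rules(corpus))
# ['(has, ed) X -> has X ed', '(is, ing) X -> is X ing']
```

In keeping with Fodor's framing, a rule like this only proposes candidate analyses of underlying structure; whether they survive is up to the rest of the system.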
This is but one example. Fodor notes as well that we might also find suggestions for such rules in the “techniques of substitution and classification traditionally employed in attempts to formulate linguistic discovery procedure[s].” He is careful to add that this is not to endorse attempts to find discovery procedures; that project failed. Rather, in the current, more restricted context, these techniques might prove useful precisely because what we are expecting of them is more limited:
I am proposing…that the child may employ such relations as substitutability-in-frames to arrive at tentative classifications of elements and sequences of elements in his corpus and hence at tentative domains for the application of intrinsic rules for inducing base structures. Whether such a classification is retained or discarded would be contingent upon the relative simplicity of the entire system of which it forms a part.
In other words, these procedures might prove useful for finding surface statistical regularities in strings and mapping them to underlying G rules (drawn from the class of UG possible G rules) that generate these strings.
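The substitutability-in-frames idea can be sketched the same way. The toy below assumes whitespace-tokenized sentences and takes a word's frame to be just its immediate left and right neighbours (a simplifying assumption); two words land in the same tentative class whenever they occur in a shared frame. The function name is hypothetical, and its output is meant only as the tentative classifications the quoted passage describes.

```python
from collections import defaultdict


def classes_by_frames(sentences):
    """Group words into tentative classes by substitutability in frames,
    where a frame is (left neighbour, right neighbour)."""
    frames = defaultdict(set)  # frame -> words observed filling it
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for k in range(1, len(tokens) - 1):
            frames[(tokens[k - 1], tokens[k + 1])].add(tokens[k])

    parent = {}  # union-find forest over words

    def find(word):
        parent.setdefault(word, word)
        while parent[word] != word:
            parent[word] = parent[parent[word]]  # path halving
            word = parent[word]
        return word

    # Words that can substitute for one another in some frame share a class.
    for fillers in frames.values():
        first, *rest = sorted(fillers)
        for other in rest:
            parent[find(other)] = find(first)
        find(first)  # register singleton classes too

    classes = defaultdict(set)
    for word in list(parent):
        classes[find(word)].add(word)
    return sorted(sorted(c) for c in classes.values())


print(classes_by_frames(["the dog barks", "the cat barks", "a dog sleeps", "a cat sleeps"]))
# [['a', 'the'], ['barks', 'sleeps'], ['cat', 'dog']]
```

On a realistic corpus this kind of merging overgeneralizes quickly (a single shared frame suffices to merge two classes), which is the sort of frailty that, on Fodor's picture, the learner's intrinsic structure is expected to compensate for.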
Fodor notes two important virtues of this way of seeing things. First, it allows the learner to take advantage of “distributional regularities” in the input to guide him to “tentative analyses that are required if he is to employ rules that project putative descriptions of underlying structure” (118). Second, these learning procedures need not be perfect, and more often than not might be wrong. The idea is that their frailties will be adequately compensated for by the intrinsic features of the learner (i.e. UG). In other words, he provides a nice vision of how techniques of statistical analysis of the corpus can be combined with principles of UG to explain G acquisition, at least partially (remember, we still need an evaluation metric for a full story).
There is lots more in the paper. It is even imbued with a certain twofold modesty.
First, Fodor’s outline starts by distinguishing two questions: a hard one (that he suggests we put aside for the moment) and an approachable one (that he outlines). The hard question is (in his words) “What sort of device would project a unique correct grammar on the basis of exposure to the data?” The approachable one is “What sort of device would project candidate grammars that are reasonably sensitive to the contents of the corpus and that operate only with the sorts of relations that are known to figure in linguistic descriptions?” (120). IMO, Fodor makes an excellent case for thinking that solving the approachable problem would be a good step towards answering the hard one. PoS arguments fit into this schema in that they allow us to plumb UG, which serves to specify the class of “non-phony” generalizations. Add that to rules taking you from surface regularities to potential G analyses and you have the outlines of a project aimed at addressing the second question.
Fodor’s second modest moment comes when he acknowledges that his proposal “is surely incorrect as stated” (120). Here, IMO, the modesty is misplaced. Fodor’s proposal may be wrong in detail, but it lays out the various kinds of questions that need addressing and some mechanisms for addressing them, and it does so lucidly and concisely.
As I noted, it’s great to sit in on others’ classes. It’s great to discover “unknown-to-you” classics. So, take a busman’s holiday. It’s fun.
 Fodor likes the term ‘intrinsic’ rather than ‘innate’ for he allows for the possibility that some of these principles may themselves be learned. I would also add that ‘innate’ seems to raise red flags for some reason. As Fodor notes (as has Chomsky repeatedly), there cannot reasonably be an “innateness hypothesis.” Where there is generalization from data, there must be principles licensing this generalization. The question is not whether these given principles exist, but what they are. In this sense, everyone is a nativist.
 Of course, if one has a rich enough UG then this will also allow a derivation of the utterance wherein case and agreement have been discharged, but this is getting ahead of ourselves. Right now, what is relevant is that some semantic information can be very useful for acquiring syntactic structure even if, as Fodor notes, syntax is not reducible to semantics.