One of the perks of academic life is the opportunity to
learn from your colleagues. This semester I am sitting in on Jeff Lidz’s
graduate intro to acquisition course. And what’s the first thing I learn? That
Jerry Fodor wrote a paper in 1966 on how to exploit Poverty of Stimulus
reasoning that is as good as anything I’ve ever seen (including my own
wonderful stuff). The paper is called “How to learn to talk: some simple ways”
and it appears here.
I cannot recommend it highly enough. I want to review some of the main
attractions, but really you should go and get a copy. It’s really a good read.
Fodor starts with three obvious points:
1. Speakers have information about the structural relations "within and among the sentences of that language" (105).
2. Some of the speaker's information must be learned (105).
3. The child must bring to the task of language learning "some amount of intrinsic structure" (106).[1]
None of these starting points can be controversial. (1) just
says that speakers have a grammar, (2) that some aspects of this internalized
grammar are acquired on the basis of exposure to instances of that grammar and
(3) that “any organism that extrapolates from experience does so on the basis
of principles that are not themselves supplied by its experience” (106). Of
course, whether these principles are general or language-specific is an
empirical question.
Given this troika of truisms, Fodor then asks how we can
start investigating the “psychological process involved in the assimilation of
the syntax of a first language” (106). He notes that this process requires at
least three variables (106):
4. The observations (i.e. verbalizations in its vicinity) that the child is exposed to and uses
5. The learning principles that the child uses to "organize and extrapolate these observations"
6. The body of linguistic information that is the output of the application of the principles to the data, and that the child will subsequently use in speaking and understanding
We can, following standard convention, call (4) the PLD
(Primary Linguistic Data), (5) UG, and (6) the G acquired. Thus, what we have
here is the standard description of the problem as finding the right UG that
can mediate PLD and G; UG(PLD_L) = G_L. Or as Fodor puts it
(107): “the child’s data plus his intrinsic structure must jointly determine the
linguistic information at which he arrives.”
And this, of course, has an obvious consequence:
…it is a conclusive disproof of any
theory about the child's intrinsic structure to demonstrate that a device
having the structure could not learn the syntax of the language on the basis of
the kind of data that the child’s verbal environment provides (107).
This just is the Poverty of Stimulus argument (PoS): a tool
for investigating the properties of UG by comparing the PLD to the features of
the acquired G. In Fodor’s words:
…a comparison of the child’s data
with a formulation of the linguistic information necessary to speak the
language the child learns permits us to estimate the nature of the complexity
of the child’s intrinsic structure. If the information in the child’s data
approximates the linguistic information he must master, we may assume that the
role of intrinsic structure is relatively insignificant. Conversely, if the
linguistic information at which the child arrives is only indirectly or
abstractly related to the data provided by the child’s exposure to adult
speech, we shall have to suppose that the child's intrinsic structure is
correspondingly complex.
I like Fodor’s use of the term ‘estimate.’ Note, so framed,
the question is not whether there is
intrinsic structure in the learner, but how “significant” it is (i.e. how much
work it does in accounting for the acquired knowledge). And the measure of
significance is the distance between the information the PLD provides and the G
acquired.
Though Fodor doesn’t emphasize this, it means that all
proposals regarding language learning must advert to Gs (i.e. to rules). After
all, these are the end points of the process as Fodor describes it. And
emphasizing this can have consequences. So, for example, observing that there
are statistical regularities in the PLD is perhaps useful, but not enough. A
specification of how the observed regularities lead to the rules acquired must
be provided. In other words, as regularities are not themselves rules of G (though they can be the basis
on which the learner infers/decides what rules G contains), accounts that stop
at adumbrating the statistical regularities in a corpus stop
much too soon. Put differently, pointing to such regularities (e.g. noting that
a statistical learner can divide some set of linguistic data into two pieces)
cannot be the final step in describing what the learner has learned. There must
be a specification of the rule.
This really is an important point. Much stuff on the statistical
properties of corpora seems to take for granted that identifying a statistical
regularity in and of itself explains something. It doesn't, at least in the
case of language. Why? Because we know
that there lurk rules in them there corpora. So though finding a regularity
may in fact be very helpful (identifying that which the LAD looks for to help
determine the properties of the relevant rules) it is not by itself enough. A
specification of the route from the regularity to the rule is required or we have
not described what is going on in the LAD.
Fodor then proceeds to flesh this tri-partite picture out.
The first step is to describe the PLD. It consists of “a sample of the kinds of
utterances fluent speakers of his language typically produce” (108). If
entirely random (indeed, even if not), this “corpus” will be pretty noisy (i.e.
slips of the tongue, utterances in different registers, false starts, etc.).
The first task the child faces is to "discover regularities in these data"
(109). This means ignoring some portions of the corpus and highlighting and
grouping others. In other words, in language acquisition, the data is not given. It must be constructed.
This is not a small point. It is possible that some
pre-sorting of the data can be done in the absence of the relevant grammatical
categories and rules that are the ultimate targets of acquisition, but it is
doubtful (at least to me) that these will get the child very far. If this is
correct, then simply regimenting the data in a way useful to G acquisition will
already require a specification of given
(i.e. intrinsic) grammatical possibilities. Here’s Fodor on this (109):
His problem is to discover
regularities in these data that, at the very least, can be relied upon to hold
however much additional data is added. Characteristically the extrapolation
takes the form of a construction of a theory that simultaneously marks the
systematic similarities among the data at various levels of abstraction,
permits the rejection of some of the observational data as unsystematic, and
automatically provides a general characterization of the possible future
observations. In the case of learning
language, this theory is precisely the linguistic information at which the
child arrives by applying his intrinsic information to the analysis of the
corpus. In particular, this linguistic information is at the very least
required to provide an abstract account of syntactic structure in terms of
which systematically relevant features of the observed utterance can be
discarded as violating the formation rules of the dialect, and in terms of
which the notion “possible sentence of the language” can be defined (my
emphasis, NH).
Nor is this enough. In addition we need principles that
bridge from the data provided to the grammar acquired. What sorts of bridges? Fodor
provides some suggestions.
…many of the assertions the child
hears must be true, many of the things he hears referred to must exist, many of
the questions he hears must be answerable, and many of the commands he receives
must be performable. Clearly the child could not learn to talk if adults talked
at random. (109-110).
Thus, in addition to a specification of the given
grammatical possibilities, part of
the acquisition process involves correlating linguistic input with available
semantic information.
This is a very complicated process, as Fodor notes, and
involves the identification of a-linguistic predicates that can be put into
service absent knowledge of a G (those enjoying "epistemological priority" in
Chomsky’s sense). Things like UTAH can play a role here. For example, if we
assume that the capacity for parsing a situation into agents and patients and
recipients and themes and experiencers and…is a-linguistic, then this information can be used to map words (once
identified) onto structures. In particular, if one assumes a mapping principle
that says agents are external arguments and themes are internal arguments (so
we can identify agenthood and themehood for at least a core number of initially
available predicates) then hearing “Fido is biting Max” allows one to build a
representation of this sentence such as [Fido [biting [Max]]].[2]
This representation when coupled with what UG requires of Gs (e.g. case on DPs,
agreement of unvalued phi features etc.) should allow this initial seeding to
generate structures something like [TP Fido is [VP Fido [V'
biting Max]]]. As Fodor notes, even if "Fido is biting Max" is observable, [TP
Fido is [VP Fido [V' biting Max]]] is not. The PoS
problem then is how to go from things like the first (utterances of sentences) to
things like the second (phrase markers of sentences), and for this, a
specification of the allowable grammatical options seems unavoidable, with
these options (not themselves given in the data) being necessary for the child to
organize the data in a usable form.
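To make the mapping idea concrete, here is a minimal sketch (my own illustration, not anything in Fodor's paper) of how a fixed agent-to-external-argument, theme-to-internal-argument mapping could seed a bracketed structure from an utterance paired with a-linguistically identified roles. The role dictionary, function names, and list-based bracketing are all hypothetical stand-ins.

```python
# A toy illustration (not Fodor's own proposal): seed a phrase marker from
# an utterance plus thematic roles assumed to be identifiable a-linguistically.

# Hypothetical, hand-supplied role frames for a handful of "core" predicates.
ROLE_FRAMES = {
    "biting": {"agent", "theme"},   # bite: agent bites theme
}

def seed_structure(words, roles):
    """Map an (utterance, role-assignment) pair to a nested bracketing,
    using the mapping principle: agent -> external argument,
    theme -> internal argument."""
    agent = next(w for w, r in roles.items() if r == "agent")
    theme = next(w for w, r in roles.items() if r == "theme")
    verb = next(w for w in words if w in ROLE_FRAMES)
    # [agent [verb [theme]]] -- the initial seeding; UG-driven requirements
    # (case, agreement, etc.) would then elaborate this into a fuller tree.
    return [agent, [verb, [theme]]]

if __name__ == "__main__":
    utterance = ["Fido", "is", "biting", "Max"]
    roles = {"Fido": "agent", "Max": "theme"}
    print(seed_structure(utterance, roles))  # ['Fido', ['biting', ['Max']]]
```

The point of the sketch is only that the seeding step uses no string-internal syntactic evidence at all; everything beyond the initial bracketing has to come from the intrinsically given grammatical options.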
One of the most useful discussions in the paper begins
around p. 113. Here Fodor distinguishes two problems: (i) specification of a
device that given a corpus provides the correct G for that input, and (ii)
specification of a device that given a corpus only attempts “to describe its
input in terms of the kinds of relations that are known to be relevant to
systematic linguistic description.” (i) is the project of explaining how
particular Gs are acquired. (ii) is the project of adumbrating the class of
possible Gs. Fodor describes (ii) as an “intermediate problem” (113) on the way
to (i) (see here
for some discussion of the distinction and some current ways of exploring the
bridge). Why “intermediate”? Because it is reasonable to believe that
restricting the class of possible Gs, (what Fodor describes as “characterizing
a device that produces only non-“phony” extrapolations of corpuses” (114)) will
contribute to understanding how a child settles on its specific G. As Fodor
notes, there are "indefinitely many absurd hypotheses" that a given corpus is
consistent with. And “whatever intrinsic structure the child brings to the
language-learning situation must at least be sufficient to preclude the
necessity of running through” them all.
So, one way to start addressing (i) is via (ii). Fodor also
suggests another useful step (114-5):
It is worth considering the
possibility that the child may bring to the language-learning situation a set
of rules that takes him from the recognition of specified formal relations
within and among strings in his data to specific putative characterizations of
underlying structures for strings of those types. Such rules would implicitly
define the space of hypotheses through which the child must search in order to
arrive at the precisely correct syntactic analysis of his corpus.
So, Fodor in effect suggests two intermediate steps: a
characterization of the range of possible extensions of a corpus (roughly a
theory of possible Gs) and a specification of “learning rules” that specify
trajectories through this space of possible Gs. This still leaves the hardest
problem, how to globally order the Gs themselves (i.e. the evaluation metric).
Here’s Fodor again (115):
Presumably the rules [the learning
rules, NH] would have to be so formulated as to assume (1) that the number of
possible analyses assigned to a given corpus is fairly small; (2) that the
correct analysis (or at any event a best analysis) is among these; (3)
that the rules project no analysis that describes the corpus in terms of the
sorts of phony properties already discussed, but that all the analyses exploit
only relations of types that sometimes figure in adequate syntactic theories.
Fodor helpfully provides an illustration of such a learning
rule (116-7). It maps non-contiguous string dependencies into underlying G
rules that relate these surface dependencies via a movement operation. The
learning rule is roughly (1):
(1) Given a string 'I X J' where the forms of I and J swing together (think be/ing in 'is
kissing') over a variable X (i.e. 'kiss'), the learner assumes that
this comes from a rule like (I, J) X → I X J.
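As a rough illustration (mine, not Fodor's) of what such a learning rule might look like in practice, the sketch below scans a toy corpus for the discontinuous be...ing pair around a variable verb stem and, for each attested case, posits an underlying rule of the form (I, J) X → I X J. The regex and the rule notation are simplifying assumptions.

```python
# A toy sketch of learning rule (1): spot the discontinuous dependency
# be ... -ing around a variable verb stem and posit (I, J) X -> I X J.
import re

def find_discontinuous_pairs(sentences):
    """Return candidate (I, J, X) triples where I = a form of 'be',
    J = the suffix '-ing', and X = the intervening verb stem."""
    pattern = re.compile(r"\b(is|are|was|were)\s+(\w+?)ing\b")
    candidates = []
    for s in sentences:
        for aux, stem in pattern.findall(s.lower()):
            candidates.append((aux, "ing", stem))
    return candidates

def posit_rules(candidates):
    """For each attested triple, state the posited underlying rule."""
    return [f"({i}, {j}) {x} -> {i} {x}+{j}" for i, j, x in candidates]

if __name__ == "__main__":
    corpus = ["Fido is biting Max", "The dogs are chasing cats"]
    for rule in posit_rules(find_discontinuous_pairs(corpus)):
        print(rule)
    # (is, ing) bit -> is bit+ing
    # (are, ing) chas -> are chas+ing
```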
This is but one example. Fodor notes as well that we might
also find suggestions for such rules in the “techniques of substitution and
classification traditionally employed in attempts to formulate linguistic
discovery procedure[s].” As Fodor notes, this is not to endorse attempts to find
discovery procedures. This project failed. Rather, in the current more
restricted context, these techniques might prove useful precisely because what
we are expecting of them is more limited:
I am proposing…that the child may
employ such relations as substitutability-in-frames to arrive at tentative
classifications of elements and sequences of elements in his corpus and hence
at tentative domains for the application of intrinsic rules for inducing base
structures. Whether such a classification is retained or discarded would be
contingent upon the relative simplicity of the entire system of which it forms
a part.
In other words, these procedures might prove useful for
finding surface statistical regularities in strings and mapping them to
underlying G rules (drawn from the class of UG possible G rules) that generate
these strings.
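Here is a minimal sketch of the substitution-in-frames idea, assuming a toy corpus and a crude notion of "frame" (the immediately adjacent words). Nothing in it comes from Fodor; it only illustrates how tentative, revisable classes might be extracted before the intrinsic rules take over, and how such a classification could later be kept or discarded.

```python
# A toy sketch of substitutability-in-frames: group words that occur in the
# same (left-word, right-word) frames into tentative distributional classes.
from collections import defaultdict

def frames(sentences):
    """Collect the set of (left, right) frames each word appears in."""
    word_frames = defaultdict(set)
    for s in sentences:
        tokens = ["<s>"] + s.lower().split() + ["</s>"]
        for left, word, right in zip(tokens, tokens[1:], tokens[2:]):
            word_frames[word].add((left, right))
    return word_frames

def tentative_classes(sentences):
    """Group words that share at least one frame -- a crude, revisable
    classification, to be kept or discarded by the larger system."""
    word_frames = frames(sentences)
    by_frame = defaultdict(set)
    for word, fs in word_frames.items():
        for f in fs:
            by_frame[f].add(word)
    return [words for words in by_frame.values() if len(words) > 1]

if __name__ == "__main__":
    corpus = ["the dog bites the cat", "the dog chases the cat"]
    print(tentative_classes(corpus))  # e.g. [{'bites', 'chases'}]
```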
Fodor notes two important virtues of this way of seeing
things. First, it allows the learner to take advantage of “distributional
regularities” in the input to guide him to “tentative analyses that are
required if he is to employ rules that project putative descriptions of
underlying structure” (118). Second, these learning procedures need not be
perfect, and more often than not might be wrong. The idea is that their
frailties will be adequately compensated for by the intrinsic features of the
learner (i.e. UG). In other words, he provides a nice vision of how techniques
of statistical analysis of the corpus can be combined with principles of UG to
explain G acquisition, at least partially (remember, we still need an evaluation
metric for a full story).
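To summarize the division of labor in code form, here is an entirely schematic sketch of my own, with invented candidate analyses and a crude rule-counting measure standing in for the evaluation metric: fallible corpus analysis proposes tentative analyses, a UG filter admits only "non-phony" candidates stated in an allowed vocabulary, and the evaluation metric selects among the survivors.

```python
# An entirely schematic sketch (not Fodor's): fallible corpus analysis proposes
# tentative analyses, a UG filter admits only "non-phony" candidates stated in
# allowed categories, and an evaluation metric picks among the survivors.

ALLOWED_CATEGORIES = {"S", "NP", "VP", "V", "N", "D"}  # hypothetical UG vocabulary

def tentative_analyses(corpus):
    """Statistics-driven, possibly wrong guesses about the corpus (placeholders)."""
    return [
        {"rules": ["S -> NP VP", "VP -> V NP"]},
        {"rules": ["S -> NP VP", "VP -> V NP", "NP -> D N"]},
        {"rules": ["S -> EVERY-THIRD-WORD"]},      # a "phony" extrapolation
    ]

def ug_admissible(analysis):
    """Admit only analyses whose rules use the allowed category vocabulary."""
    symbols = {sym for rule in analysis["rules"]
               for sym in rule.replace("->", " ").split()}
    return symbols <= ALLOWED_CATEGORIES

def evaluation_metric(analysis):
    """Crude simplicity measure: fewer rules is better."""
    return len(analysis["rules"])

def acquire_grammar(corpus):
    candidates = [a for a in tentative_analyses(corpus) if ug_admissible(a)]
    return min(candidates, key=evaluation_metric)

if __name__ == "__main__":
    print(acquire_grammar(["Fido is biting Max"]))
    # {'rules': ['S -> NP VP', 'VP -> V NP']}
```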
There is lots more in the paper. It is even imbued with a
certain twofold modesty.
First, Fodor’s outline starts by distinguishing two
questions: a hard one (that he suggests we put aside for the moment) and an
approachable one (that he outlines). The hard question is (in his words) “What
sort of device would project a unique correct grammar on the basis of exposure
to the data?” The approachable one is “What sort of device would project
candidate grammars that are reasonably sensitive to the contents of the corpus
and that operate only with the sorts of relations that are known to figure in
linguistic descriptions?” (120). IMO, Fodor makes an excellent case for
thinking that solving the approachable problem would be a good step towards
answering the hard one. PoS arguments fit into this schema in that they allow
us to plumb UG, which serves to specify the class of “non-phony”
generalizations. Add that to rules
taking you from surface regularities to potential G analyses and you have the
outlines of a project aimed at addressing the second question.
Fodor’s second modest moment comes when he acknowledges that
his proposal "is surely incorrect as stated" (120). Here, IMO, the
modesty is misplaced. Fodor's proposal may be wrong in detail, but it lays out,
lucidly and concisely, the various kinds of questions that need addressing and
some mechanisms for addressing them.
As I noted, it's great to sit in on others' classes. It's
great to discover “unknown-to-you” classics. So, take a busman’s holiday. It’s
fun.
[1]
Fodor likes the term 'intrinsic' rather than 'innate' because he allows for the
possibility that some of these principles may themselves be learned. I would
also add that ‘innate’ seems to raise red flags for some reason. As Fodor notes
(as has Chomsky repeatedly) there cannot reasonably be an “innateness
hypothesis.” Where there is generalization from data, there must be principles
licensing this generalization. The question is not whether these given
principles exist, but what they are. In this sense, everyone is a nativist.
[2]
Of course, if one has a rich enough UG then this will also allow a derivation
of the utterance wherein case and agreement have been discharged, but this is
getting ahead of ourselves. Right now, what is relevant is that some semantic
information can be very useful for acquiring syntactic structure even if, as Fodor notes, syntax is not
reducible to semantics.
When I click the first 'here' I get a link to The Genesis of Language: A Psycholinguistic Approach Paperback – August 15, 1968 by Frank Smith (Editor), George A. Miller (Editor). Was this intended? Is there a link to the actual paper by Jerry Fodor available?
Sadly, there is not.
Serendipitously enough, I just happened to come across this paper a couple of weeks ago.
I thought it was characteristically 1965.
Not that the observation invalidates the proposals in this paper, but I think it's at least worth noting that Fodor has been sketching out a number of proposals since the 80's that entail a quite contrary view of acquisition.
Some of the findings deepen PoS arguments significantly while distancing further from anything resembling a discovery procedure. I'm thinking of his research into problems of 'holism' in the acquisition and use of language and thought. So far as I know, Fodor hasn't published specifically on the topic of the acquisition of syntax with respect to issues of holism and such, but he often makes comments about contemporary generative theories of syntax and phonology that relate to this recent work.
The tl;dr of it is: having spent the past semester devouring Fodor's work starting from the present and going backwards in time, I could hardly recognize this as one of his papers.
Max, could you elaborate a bit? I'm not sure what you are pointing to. But it sounds interesting.
@Christina and any others who are interested: Here is a scan of Fodor's paper. Sorry for all the markup. I scanned it from a copy of the book in my library.
ReplyDeleteah gee
Thank you very much, Adam.