Last week John Trueswell gave a colloquium talk at UMD that
I unfortunately could not attend. I was in Montreal at a workshop in honor of
an old prof of mine, Jim McGilvray. The
Montreal gig was great and it was a pleasure to be able to fete Jim in person
(he supervised one of my earliest linguistics projects, my undergrad thesis on
a Reichenbachian theory of tense), but I confess that I would have loved to
have been at John’s talk as well. I await the day when some clever physicist
figures out how to allow someone to be in two places at once. Until that happy
time arrives, I thought it appropriate to do some penance for my physical
failings by re-reading a
terrific paper by Roediger and Arnold (R&A) on the history of one-trial
learning experiments. Why the R&A paper? Because John’s recent work on
lexical acquisition is in the one-trial learning tradition whose history they
review. I’ve discussed this work already in a couple of places (here
and here)
but I wanted to draw your attention to the R&A paper again, for it
highlights some of the more interesting implications of this line of research
for topics near and dear to my intellectual prejudices: despite the common
conviction that Empiricism has (at least) something going for it, there is a
remarkable absence of evidence
supporting this very weak view.[1]
Here’s what I mean.
Behind every theory there is an inspirational picture. In
the mental sciences, the two grand traditions, Rationalism (R) and Empiricism
(E), are animated by two contrasting conceptions of the underlying mechanisms
of mental life. For Es, the afflatus is the blank wax tablet: learning
consists of experience imprinting itself on this tablet, and the clarity and distinctness
of the resultant concept/idea is a function of the number of repeated imprints.
The more the experience, the deeper and clearer the resulting acquired concept/idea. Here’s Ebbinghaus’s
version (quoted in R&A: 129):
These relations [between repetition
and performance] can be described figuratively by speaking of the series as
being more or less deeply engraved on some mental substratum. To carry out this
figure: as the number of repetitions increases, the series are engraved more
and more deeply and indelibly; if the number of repetitions is small, the
inscription is but surface deep and only fleeting glimpses of the tracery can
be caught…
Note that on this conception, repeated experience forms the
concepts in the mind (i.e. makes the grooves). Repetition is critical, for the
mind’s main character is its receptivity to external formative forces, the mind itself being structurally rudimentary. On
this view, understanding acquisition requires analyzing the fine structure of
the input, for what minds/brains do in forming mental constructs (ideas,
concepts, etc.) is sift and manipulate these input experiences. It is not
surprising that this view focuses on minds’ significant statistical capacities,
for these are obvious candidate mechanisms for organizing the inputs and
separating the significant wheat from the non-significant chaff.
This contrasts with R-ish proposals. For these, the mind is
very articulated. There is lots of given pre-experiential structure. Thus, the
role of experience is not to construct the relevant concepts attained but to
kick-start them into activation. Experience on this view is a trigger, not an artificer. Not
surprisingly, this conception focuses on discovering the natively provided
mental structures that experience serves (importantly but modestly) to activate.
On considering these two different raw philosophical pictures,
one can understand the intrinsic interest in one-trial learning (OTL). The
existence of OTL would be a problem for E but not for R. The empirical question
then is whether OTL exists and how common it is. Investigating this requires
translating the philosophical pictures into testable theories, and this leads
to learning curves.
R&A observe that one of the biggest pieces of evidence
for the E view of the world is the classical learning curve, you know, the one
that rises from lower left to upper right, decelerating as it goes (reproduced
as a figure in R&A, p. 128).
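One standard way to put that shape into symbols (an illustration chosen here, not a formula R&A are claimed to use) is the linear-operator model of incremental learning: on each trial, response strength p moves a fixed fraction θ of the remaining distance toward 1. Solving the recurrence shows why the curve decelerates: each trial's gain is the previous gain shrunk by the factor (1 − θ).

```latex
\begin{aligned}
p_n &= p_{n-1} + \theta\,(1 - p_{n-1}), \qquad 0 < \theta < 1,\\
1 - p_n &= (1 - \theta)\,(1 - p_{n-1}) = (1 - \theta)^n\,(1 - p_0),\\
p_n &= 1 - (1 - p_0)\,(1 - \theta)^n,\\
p_n - p_{n-1} &= \theta\,(1 - p_0)\,(1 - \theta)^{n-1}.
\end{aligned}
```

Read the E way, p is the depth of the engraving, and every single repetition deepens it a little.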
R&A note that this curve perfectly embodies the E
conception that the Ebbinghaus quote poetically describes. R&A point out
two important features of learning curves consonant with the leading E idea.[2]
First, “[t]he fact that the learning curve shows a gradual increase in
performance is a reflection of the underlying mechanism- the build up of
strength- which is itself also gradual.” And second, that this curve is “the
same across astonishingly different experimental situations and dependent
measures, as well as across species from slugs to humans,” strongly suggesting
general “underlying mechanisms” and general “laws of learning” (129). In a
word, this curve, it is argued, puts paid to the R idea of triggering and its
concomitant conception of a highly structured mind. If learning curves describe
the mechanics of learning, then E beats R. [3]
Unless
this curve is actually an artifact of, e.g., how experimental data are crunched, rather
than a description of an underlying mental mechanism. And that’s where the story
that R&A tell gets really interesting.
In the late 1950s and early 1960s Irvin Rock and William
Estes (these two were big psych shots, look them up) did a series of
experiments that showed (at the very least) that this interpretation of the
curve as describing an underlying mechanism that is similarly smooth and
incremental is premature, and (at the most) that it is false. They showed two
things: (i) that these curves were consistent with an underlying OTL
mechanism (i.e. “although the learning curves were continuous, the underlying
processes were anything but continuous” (p. 129)) and (ii) that there was very
good evidence that OTL is the norm. Let’s discuss each point separately.
Here’s R&A quoting Rock (that’s the “p.186” below) and
then commenting (p. 130) wrt (i):
“Another possibility is that
repetition is essential because only a limited number of associations can be
formed in one trial, and improvement with repetition is only an artifact of
working with long lists of items.” (p. 186). … That subset is learned
perfectly, but all the rest of the associations that were presented are not learned at all…. The “artifact” Rock
referred to is essentially that of averaging across many subjects learning many
lists on many trials: despite the
all-or-none nature of the underlying process, the learning curve will be smooth
when performance is averaged over these several parameters. (my
emphasis; NH)
This is a very important conceptual point, for it divorces
the big E conclusion that the mechanisms of learning are gradual and driven by
repeated environmental inputs (viz. repeated engravings by experience on a
mental substratum) from the fact that learning curves have the shape they do
and can be found quite generally across tasks and species. Put more pointedly,
if this is correct, then the smooth shape of the learning curve implies nothing at all about the smoothness and
gradualness of the underlying mechanism.
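To see point (i) concretely, here is a minimal simulation sketch. It assumes a hypothetical all-or-none learner: each item is either unknown (always answered wrongly) or known (always answered correctly) and switches in a single step with probability c on each trial, with no guessing and no forgetting. Averaging these step functions over many simulated subjects and items yields a smooth, decelerating curve that tracks the incremental formula 1 - (1 - c)**t, even though no individual item ever improves gradually. The function names and parameter values are illustrative assumptions, not taken from Rock, Estes, or R&A.

```python
import random


def all_or_none_curve(n_subjects=200, n_items=20, n_trials=10, c=0.3, seed=0):
    """Mean proportion correct on each trial, averaged over subjects and items.

    Hypothetical all-or-none learner: every item jumps from 'unknown' (always
    wrong) to 'known' (always right) in one step, with probability c per
    trial. Nothing is strengthened gradually.
    """
    rng = random.Random(seed)
    correct_per_trial = [0] * n_trials
    for _ in range(n_subjects):
        for _ in range(n_items):
            learned = False
            for t in range(n_trials):
                if not learned and rng.random() < c:
                    learned = True  # the single, one-trial jump
                correct_per_trial[t] += learned  # True counts as 1
    total = n_subjects * n_items
    return [k / total for k in correct_per_trial]


def incremental_curve(n_trials=10, c=0.3):
    """Linear-operator curve p_t = 1 - (1 - c)**t, starting from p_0 = 0."""
    return [1 - (1 - c) ** t for t in range(1, n_trials + 1)]


if __name__ == "__main__":
    averaged = all_or_none_curve()
    smooth = incremental_curve()
    for trial, (a, b) in enumerate(zip(averaged, smooth), start=1):
        print(f"trial {trial:2d}: all-or-none average {a:.3f}   incremental {b:.3f}")
```

The two columns should agree up to sampling noise, which is the point: the averaged curve alone cannot tell you which mechanism produced it.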
Rock and Estes not only made this important observation but
then went on to show that in classical cases of “learning”[4]
(i.e. acquiring paired associates), there is good evidence against the
classical picture. The basic setup was to have two groups: one (the control
group) that learned a list of pairs by going through the same list again and
again, and a second that learned a list from which the pairs not yet learned
were removed, so that they were not encountered again. The prediction if E is
correct is that the control group will do better than the second group, since
on the E view even a pair not yet recalled has been partly engraved by each
exposure, and only the control group gets to build on that partial strength.
I will not review Rock’s and Estes’ experiments here, as that’s what the R&A
paper does so well. Suffice it to say that, as R&A put it, their papers showed
that “there was no hint for the continuity/incremental hypothesis in the data” (p.131).
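The logic of the two-group design can also be sketched as a simulation. This is a hypothetical illustration, not a reconstruction of Rock's or Estes' procedures: it assumes study-test trials that run until one errorless test, and it implements "not encountered again" by swapping each missed pair for a brand-new pair that starts from zero strength. Under an all-or-none learner a missed pair has zero strength anyway, so discarding it costs nothing and the two groups come out identical. Under an incremental (linear-operator) learner a missed pair has still banked some strength, so throwing it away slows the second group down. The names, list length, and rate are illustrative assumptions.

```python
import random
from statistics import mean


def trials_to_criterion(model, drop_missed, n_items=12, rate=0.5,
                        rng=None, max_trials=1000):
    """Study-test trials until one errorless test (capped at max_trials).

    model: "all_or_none" (a pair is learned outright with probability `rate`
        per study, otherwise unchanged) or "incremental" (recall probability
        moves a fraction `rate` of the remaining distance toward 1 per study).
    drop_missed: if True, every pair missed on a test is swapped for a fresh,
        unstudied pair, so the missed pair is never seen again.
    """
    if model not in ("all_or_none", "incremental"):
        raise ValueError(f"unknown model: {model!r}")
    if rng is None:
        rng = random.Random()
    strength = [0.0] * n_items  # probability of a correct answer, per pair
    for trial in range(1, max_trials + 1):
        # Study phase: each pair on the current list is presented once.
        for i in range(n_items):
            if model == "all_or_none":
                if strength[i] < 1.0 and rng.random() < rate:
                    strength[i] = 1.0  # one-trial jump to "known"
            else:
                strength[i] += rate * (1.0 - strength[i])  # partial gain
        # Test phase: each pair is recalled with probability equal to its strength.
        correct = [rng.random() < s for s in strength]
        if all(correct):
            return trial
        if drop_missed:
            # Missed pairs are replaced by new pairs that start from zero.
            strength = [s if ok else 0.0 for s, ok in zip(strength, correct)]
    return max_trials


def mean_trials(model, drop_missed, runs=2000, seed=1):
    """Average trials to criterion over many simulated subjects."""
    rng = random.Random(seed)
    return mean(trials_to_criterion(model, drop_missed, rng=rng)
                for _ in range(runs))


if __name__ == "__main__":
    for model in ("all_or_none", "incremental"):
        keep = mean_trials(model, drop_missed=False)
        drop = mean_trials(model, drop_missed=True)
        print(f"{model:12s} same list: {keep:5.2f}   missed pairs dropped: {drop:5.2f}")
```

Changing the list length or rate moves the numbers around, but under these assumptions only the incremental learner is hurt by losing its missed pairs; for the all-or-none learner the two conditions are the same process.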
Note that if (ii) is correct, then an account that uses an
incremental mechanism to derive acquisition data is incorrect, even if
those data are correctly described
by the classical learning curve.
Let me beat this horse good and dead: Point (i) shows that a classical
learning curve is consistent with OTL mechanisms. Point (ii) argues that OTL
mechanisms are in fact what we find. Hence,
if correct, theories that deploy incremental mechanisms are wrong, even if they can derive
classical learning curves. This should not be surprising: such
curves graph a correlation between trials and responses. The aim of a theory is
not to “model” the data but to “model” the mechanisms that generate the data.
The E mistake is to infer that smooth incremental data imply smooth
incremental mechanisms. They don’t, though thinking they do is an E-ish
diathesis. [5]
It goes without saying that both Rock’s and Estes’ results
were contested. Methodological problems were purportedly found that confounded
the conclusions. However, and this is interesting, no good evidence for the classical theory appeared to be
forthcoming. Rather, the critics seemed satisfied with a draw, viz. showing
that the Rock/Estes results need not be interpreted as debunking the classical
E view. One particularly cute study that R&A report seemed satisfied with
the conclusion “that the incremental theory is untestable” or, quoting the
critics directly (Underwood and Keppel): “certain theories are not capable of
disproof. Certain aspects of the incremental theory seem to be of this nature”
(p. 135). If this be a vindication of the classical E view imagine what a
refutation would look like.
John’s current work on lexical acquisition develops the
Rock/Estes conception. This is what makes it so interesting for people like me.
If this work is correct, then learning as classical E conceives it doesn’t exist, or,
more modestly, there is precious little evidence in its favor. To me, this has
enormous implications for standard approaches to modeling acquisition that
assume some form of gradual process taking place, some form of gradual hill
climbing or gradual strengthening of connections. Indeed, in some moods (e.g.
now), I think that this work, if correct, implies that Gallistel and Matzel are
correct and there is no general theory of learning to be had, as there are no mechanisms, mental or neural, that
correspond to what the E picture took learning to be.[6]
However, for now I would be happy with more modest conclusions: (i) that there
is precious little evidence in favor of the E conception, (ii) that there is
little evidence in favor of the view that mental mechanisms are gradual and
continuous, and (iii) that there is pretty good evidence that we have mechanisms
that enable what amounts to one-trial “learning” and that this kind of
acquisition requires something very much like the classical R conception of the
mind.[7]
There is no good reason, in other words, for taking the E conception to be the
default and every reason to think that the problem of acquisition is largely
one of getting the pre-packaged representational formats correct.
Let me end with a request and an exhortation. First, the
request: does anyone know of a poster case of learning from the psych literature
that is not susceptible to the Rock/Estes critique? It would be nice to have one.
Second, read the R&A paper and the recent papers developing these ideas by
John and Lila and Charles and Jesse and their students. If their insights are
internalized, we may finally be able to break the grip of E conceptions of
mental mechanisms as the default position. One, at least, can always hope;
after all we got rid of phlogiston, didn’t we?
[1]
I think it was Lila Gleitman who first advanced the following PoS argument:
Empiricism must be innate for what else could explain the widespread conviction
that it is true despite the dearth of evidence in its favor. Lila also was kind
enough to bring the R&A paper to my attention. I should also add that I
doubt that Lila would endorse my interpretation of this work as outlined below.
In fact, I am sure she wouldn’t given this.
[2]
Gallistel and Matzel (see here)
note that the LTP view of brains has been taken to similarly support an E
picture. G&M argue that the E picture is widely accepted in the neurosciences,
despite there being little to recommend it (and a lot to disavow it). R&A’s
discussion meshes well with G&M’s and supports a similar conclusion.
[3]
R&A identify, in passing, an attraction of E views of learning to the
formally inclined that comes from the mathematical tractability of learning
curves. I quote: “Learning curves (like forgetting curves) are smooth and
beautiful, and psychologists with a mathematical bent can have a field day
fitting equations to them” (p.128). One
should never underestimate the attraction of a conclusion that fits snugly with
your available technology.
[4]
Note the scare quotes: if Rock and Estes (and Gleitman and Trueswell and
Company) are right, then learning is a hypothesis
about the mechanisms of acquisition,
not a neutral description of an observed phenomenon. What we observe is change
over time given environmental inputs. This change may be due to learning,
maturation, growth, or whatever. The cognitive question concerns the mechanism
and learning is a proposal for one such.
[5]
Modeling that takes the name of the game to be getting the input/output relations
right is particularly susceptible to this kind of mistake. E
conceptions are prone to this kind of misconception given their picture that
mental structure mirrors the structure of the input. However, modeling I/O
relations confuses the data to be explained with the mechanism that does the
explaining. For a related point in a Bayesian context see Glymour on
“Osiander’s Psychology” in the comments to Jones & Love’s discussion of
modern Bayesianism here
and some blogish discussion by me (here).
J&L note a similarity between earlier behaviorist conceptions and some
modern Bayesian analyses. The above suggests how two programs that appear so
different on the surface might nonetheless lead to the same conceptual place
via a shared partiality to associationism and/or a misunderstanding of what modeling
is supposed to do.
[6]
After all, if (roughly) one-trial “learning” is the norm, then there is little
for E-like mechanisms to do. Acquisition on this view is more akin to transduction
than to mental computation. I would probably be satisfied if it turned out that
there was “a little” learning, but this is a topic for another discussion.
[7]
After all if we don’t acquire knowledge by carefully sifting through the input
it’s because such sifting is not necessary and it wouldn’t be necessary if the
knowledge is basically, already, all there.
I am not sure I get this completely, but does the artifact argument by R&A rely on the learning curve being created by multiple subjects that you average over? Put differently, what would the counterargument against incremental learning be if you find a learning curve within a single individual? This is, for instance, how McClelland & Patterson (2002) reanalyze Pinker & Ullman's data on the acquisition of regular past tense.
They note that the curve results from averaging over individuals, trials and items. Look at the paper.
You mean the R&A paper? They don't say much that I can relate to the case I have in mind, as far as I can tell. I can see how a learning curve averaging over different individuals may hide multiple distinct OTL's for distinct individuals, but if a learning curve for one learner averages over different items (say, different verbs correctly inflected for past tense), then it would at most hide multiple distinct OTL's for different verbs correctly inflected for past tense. But it doesn't hide a distinct OTL for the past tense RULE. Which is good news for the connectionist. What am I missing?
If I understand this correctly, it averages over how many trials it takes to learn the full paradigm, even though whatever is learned is learned at once or not at all. The relevant information is "many subjects learning many lists over many trials." All three cases of averaging can have this effect. The Gleitman and Trueswell stuff involves averaging over how many guesses it takes to acquire the lexical item. Though at each guess you either get it or you don't, averaging over how many guesses it takes an individual on a given list gives the smooth effect. I would look into the Rock paper if you want the details, for he worries about it there. I think that Gleitman and Trueswell do too, so their papers might be a good place to look. Hope this helps.
Yes, I will have a look at the Rock paper itself, coz I don't immediately see how what may be the case for the learning of vocabulary items can be extended to individual learning curves of rules.