Monday, November 4, 2013

Learning is to cognition what phlogiston is to chemistry

Last week John Trueswell gave a colloquium talk at UMD that I unfortunately could not attend. I was in Montreal at a workshop in honor of an old prof of mine, Jim McGilvray. The Montreal gig was great and it was a pleasure to be able to fete Jim in person (he supervised one of my earliest linguistics projects, my undergrad thesis on a Reichenbachian theory of tense), but I confess that I would have loved to have been at John’s talk as well. I await the day when some clever physicist figures out how to allow someone to be in two places at once. Until that happy time arrives, I thought it appropriate to do some penance for my physical failings by re-reading a terrific paper by Roediger and Arnold (R&A) on the history of one-trial learning experiments. Why the R&A paper? Because John’s recent work on lexical acquisition is in the one-trial learning tradition whose history they review. I’ve discussed this work already in a couple of places (here and here) but I wanted to bring your attention to the R&A paper again, for it highlights some of the more interesting implications of this line of research for topics near and dear to my intellectual prejudices: despite the common conviction that Empiricism has (at least) something going for it, there is a remarkable absence of evidence supporting even this very weak view.[1] Here’s what I mean.

Behind every theory there is an inspirational picture. In the mental sciences, the two grand traditions, Rationalism (R) and Empiricism (E), are animated by two contrasting conceptions of the underlying mechanisms of mental life. For Es, the afflatus is the blank wax tablet: learning consists of imprinting by experience on this tablet, and the clarity and distinctness of the resultant concept/idea is a function of the number of repeated imprints. The more the experience, the deeper and clearer the resulting acquired concept/idea. Here’s Ebbinghaus’s version (quoted in R&A: 129):

These relations [between repetition and performance] can be described figuratively by speaking of the series as being more or less deeply engraved on some mental substratum. To carry out this figure: as the number of repetitions increases, the series are engraved more and more deeply and indelibly; if the number of repetitions is small, the inscription is but surface deep and only fleeting glimpses of the tracery can be caught…

Note that on this conception, repeated experience forms the concepts in the mind (e.g. makes the grooves). Repetition is critical, for the mind’s main character is its receptivity to external formative forces, the mind itself being structurally rudimentary. On this view, to understand acquisition requires analyzing the fine structure of the input, for what minds/brains do in forming mental constructs (ideas, concepts, etc.) is sift and manipulate these input experiences. It is not surprising that this view focuses on minds’ significant statistical capacities, for these are obvious candidate mechanisms for organizing the inputs and separating the significant wheat from the non-significant chaff.

This contrasts with Rish proposals. For these, the mind is very articulated. There is lots of given pre-experiential structure. Thus, the role of experience is not to construct the relevant concepts attained but to kick start them into activation. Experience on this view is a trigger, not an artificer. Not surprisingly, this conception focuses on discovering the natively provided mental structures that experience serves (importantly but modestly) to activate.

On considering these two different raw philosophical pictures, one can understand the intrinsic interest in one-trial learning (OTL). The existence of OTL would be a problem for E but not for R. The empirical question then is whether OTL exists and how common it is. Investigating this requires translating the philosophical pictures into testable theories, and this leads to learning curves.

R&A observe that one of the biggest pieces of evidence for the E view of the world is the classical learning curve; you know, the one that rises from low left to high right, decelerating as it goes (as reproduced below from R&A, p. 128).

R&A note that this curve perfectly embodies the E conception that the Ebbinghaus quote poetically describes. R&A point out two important features of learning curves consonant with the leading E idea.[2] First, “[t]he fact that the learning curve shows a gradual increase in performance is a reflection of the underlying mechanism- the build up of strength- which is itself also gradual.” And second, that this curve is “the same across astonishingly different experimental situations and dependent measures, as well as across species from slugs to humans,” strongly suggesting general “underlying mechanisms” and general “laws of learning” (129). In a word, this curve, it is argued, puts paid to the R idea of triggering and its concomitant conception of a highly structured mind. If learning curves describe the mechanics of learning, then E beats R.[3] Unless, that is, this curve is actually an artifact of, e.g., how experimental data is crunched, rather than a description of an underlying mental mechanism. And that’s where the story that R&A tell gets really interesting.

In the late 1950s and early 1960s Irvin Rock and William Estes (these two were big psych shots, look them up) did a series of experiments that showed (at the very least) that this interpretation of the curve as describing an underlying mechanism that is similarly smooth and incremental is premature, and (at the most) that it was false. They showed two things: (i) that these curves were both consistent with an underlying OTL mechanism (i.e. “although the learning curves were continuous, the underlying processes were anything but continuous” (p. 129)) and (ii) that there was very good evidence that OTL is the norm. Let’s discuss each point separately.

Here’s R&A quoting Rock (that’s the “p.186” below) and then commenting (p. 130) wrt (i):

“Another possibility is that repetition is essential because only a limited number of associations can be formed in one trial, and improvement with repetition is only an artifact of working with long lists of items.” (p. 186). … That subset is learned perfectly, but all the rest of the associations that were presented are not learned at all….The “artifact” Rock referred to is essentially that of averaging across many subjects learning many lists on many trials: despite the all-or-none nature of the underlying process, the learning curve will be smooth when performance is averaged over these several parameters. (my emphasis; NH)

This is a very important conceptual point for it divorces the big E conclusion that the mechanisms of learning are gradual and driven by repeated environmental inputs (viz. repeated engravings by experience on a mental substratum) from the fact that learning curves have the shape they do and can be found quite generally across tasks and species. Put more pointedly, if this is correct, then the smooth shape of the learning curve implies nothing at all about the smoothness and gradualness of the underlying mechanism.
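To make the averaging point concrete, here is a minimal simulation sketch. It is my own toy illustration, not a model taken from R&A, Rock, or Estes: it assumes that each item is either wholly unlearned or wholly learned, and that a single exposure flips an unlearned item to learned with some fixed probability. The names `simulate_item`, `averaged_curve`, and `p_learn`, and all the parameter values, are hypothetical. Under these assumptions every individual trajectory is a step function, yet the average over subjects and items should trace out the familiar decelerating curve (in expectation, 1 - (1 - p_learn)^t after t exposures).

```python
import random


def simulate_item(n_trials, p_learn, rng):
    """Return a 0/1 response trajectory for one item under all-or-none learning.

    The item is tested at the start of each trial and then studied once.
    A single study exposure either forms the association completely
    (with probability p_learn) or leaves no trace at all.
    """
    learned = False
    trajectory = []
    for _ in range(n_trials):
        # Test: the response is correct only if the item is already learned.
        trajectory.append(1 if learned else 0)
        # Study: one exposure, all or nothing (hypothetical fixed probability).
        if not learned and rng.random() < p_learn:
            learned = True
    return trajectory


def averaged_curve(n_subjects, n_items, n_trials, p_learn, seed=0):
    """Average the step-function trajectories over all subjects and items."""
    rng = random.Random(seed)
    totals = [0] * n_trials
    n_trajectories = n_subjects * n_items
    for _ in range(n_trajectories):
        for trial, correct in enumerate(simulate_item(n_trials, p_learn, rng)):
            totals[trial] += correct
    return [total / n_trajectories for total in totals]


if __name__ == "__main__":
    # One item: the trajectory jumps from 0 to 1 in a single step.
    print(simulate_item(10, 0.3, random.Random(1)))
    # Many subjects and items: the averaged proportion correct rises smoothly.
    for trial, proportion in enumerate(averaged_curve(50, 20, 10, 0.3), start=1):
        print(f"trial {trial:2d}: proportion correct = {proportion:.2f}")
```

Nothing in the averaged output distinguishes this process from a gradual strengthening one, which is exactly why the shape of the curve alone cannot decide between the two.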

Rock and Estes not only made this important observation but also then went on to show that in classical cases of “learning”[4] (i.e. acquiring paired associates), there is good evidence against the classical picture. The basic setup was to have two groups: one that learned listed pairs by going through one list again and again (the control group) and a second that learned a list from which the non-learned pairs were removed, and replaced with new ones, so that they were not encountered again. The prediction if E is correct is that the control group will do better than the second group, since only the control group gets to accumulate strength on the pairs it has not yet mastered. I will not review Rock’s and Estes’ experiments here as that’s what the R&A paper does so well. Suffice it to say that, as R&A put it, their papers showed that “there was no hint for the continuity/incremental hypothesis in the data” (p.131).
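The logic of the two-group design can also be sketched in a few lines of simulation. Again, this is a toy of my own making under loudly stated assumptions, not a reconstruction of Rock’s or Estes’ actual procedures or analyses: each pair carries a “strength” between 0 and 1, a pair is recalled on test with probability equal to its strength, and in the substitution group any missed pair is swapped for a brand-new pair of strength 0. The two update rules (`all_or_none` and `incremental`), the function names, and every parameter value are hypothetical.

```python
import random


def all_or_none(strength, rng, p_learn=0.3):
    """One exposure either fully forms the association or leaves no trace."""
    if strength < 1.0 and rng.random() < p_learn:
        return 1.0
    return strength


def incremental(strength, rng, rate=0.3):
    """Each exposure closes a fixed fraction of the gap to full strength."""
    return strength + rate * (1.0 - strength)


def trials_to_criterion(update, replace_missed, n_items, max_trials, rng):
    """Count study-test trials until every pair on the list is recalled at once."""
    strengths = [0.0] * n_items
    for trial in range(1, max_trials + 1):
        # Study phase: every pair currently on the list gets one exposure.
        strengths = [update(s, rng) for s in strengths]
        # Test phase: a pair is recalled with probability equal to its strength.
        recalled = [rng.random() < s for s in strengths]
        if all(recalled):
            return trial
        if replace_missed:
            # Substitution group: a missed pair is swapped for a new, unseen pair.
            strengths = [s if ok else 0.0 for s, ok in zip(strengths, recalled)]
    return max_trials  # criterion not reached within the cap


def mean_trials(update, replace_missed, n_runs=200, n_items=8, max_trials=200, seed=1):
    """Average trials to criterion over n_runs simulated subjects."""
    rng = random.Random(seed)
    runs = [
        trials_to_criterion(update, replace_missed, n_items, max_trials, rng)
        for _ in range(n_runs)
    ]
    return sum(runs) / n_runs


if __name__ == "__main__":
    for name, update in [("all-or-none", all_or_none), ("incremental", incremental)]:
        control = mean_trials(update, replace_missed=False)
        substitution = mean_trials(update, replace_missed=True)
        print(f"{name:12s} control = {control:.1f}  substitution = {substitution:.1f}")
```

In this sketch the all-or-none rule makes substitution harmless, because a missed pair had no strength to lose, whereas the incremental rule makes the substitution group markedly slower, because every swap throws away accumulated strength. That is the contrast the design exploits; whether real subjects behave like either toy rule is, of course, the empirical question.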

Note that if (ii) is correct, an account that uses an incremental mechanism to derive acquisition data correctly described by the classical learning curve is incorrect. Let me beat this horse good and dead: Point (i) shows that a classical learning curve is consistent with OTL mechanisms. Point (ii) argues that OTL mechanisms are in fact what we find. Hence, if correct, theories that deploy incremental mechanisms, even if they can derive classical learning curves, are wrong. This should not be surprising: such curves graph a correlation between trials and responses. The aim of a theory is not to “model” the data but to “model” the mechanisms that generate the data. The E mistake is to wrongly infer that smooth incremental data implies smooth incremental mechanisms. It doesn’t, though thinking it does is an Eish diathesis.[5]

It goes without saying that both Rock’s and Estes’ results were contested. Methodological problems were purportedly found that confounded the conclusions. However, and this is interesting, no good evidence for the classical theory appeared to be forthcoming. Rather, the critics seemed satisfied with a draw, viz. showing that the Rock/Estes results need not be interpreted as debunking the classical E view. One particularly cute study that R&A report seemed satisfied with the conclusion “that the incremental theory is untestable” or, quoting the critics directly (Underwood and Keppel): “certain theories are not capable of disproof. Certain aspects of the incremental theory seem to be of this nature” (p. 135). If this be a vindication of the classical E view imagine what a refutation would look like.

John’s current work on lexical acquisition develops the Rock/Estes conception. This is what makes it so interesting for people like me. If they are correct, then learning as classical E conceptions understand it doesn’t exist, or, more modestly, there is precious little evidence in its favor. To me, this has enormous implications for standard approaches to modeling acquisition that assume some form of gradual process taking place, some form of gradual hill climbing or gradual strengthening of connections. Indeed, in some moods (e.g. now), I think that this work, if correct, implies that Gallistel and Matzel are correct and there is no general theory of learning to be had, as there are no mechanisms, mental or neural, that correspond to what the E picture took learning to be.[6] However, for now I would be happy with more modest conclusions: (i) that there is precious little evidence in favor of the E conception, (ii) that there is little evidence in favor of the view that mental mechanisms are gradual and continuous, and (iii) that there is pretty good evidence that we have mechanisms that enable what amounts to one-trial “learning” and that this kind of acquisition requires something very much like the classical R conception of the mind.[7] There is no good reason, in other words, for taking the E conception to be the default and every reason to think that the problem of acquisition is largely one of getting the pre-packaged representational formats correct.

Let me end with a request and an exhortation. First, the request: does anyone have a poster case of learning from the psych literature that is not susceptible to the Rock/Estes critique? It would be nice to have one. Second, the exhortation: read the R&A paper and the recent papers developing these ideas by John and Lila and Charles and Jesse and their students. If their insights are internalized, we may finally be able to break the grip of E conceptions of mental mechanisms as the default position. One, at least, can always hope; after all we got rid of phlogiston, didn’t we?

[1] I think it was Lila Gleitman who first advanced the following PoS argument: Empiricism must be innate, for what else could explain the widespread conviction that it is true despite the dearth of evidence in its favor? Lila also was kind enough to bring the R&A paper to my attention. I should also add that I doubt that Lila would endorse my interpretation of this work as outlined below. In fact, I am sure she wouldn’t, given this.
[2] Gallistel and Matzel (see here) note that the LTP view of brains has been taken to similarly support an E picture. G&M argue that the E picture is widely accepted in the neurosciences, despite there being little to recommend it (and a lot to disavow it). R&A’s discussion merges well with G&M’s and supports a similar conclusion.
[3] R&A identify, in passing, an attraction of E views of learning to the formally inclined that comes from the mathematical tractability of learning curves. I quote: “Learning curves (like forgetting curves) are smooth and beautiful, and psychologists with a mathematical bent can have a field day fitting equations to them” (p.128).  One should never underestimate the attraction of a conclusion that fits snugly with your available technology. 
[4] Note the scare quotes: if Rock and Estes (and Gleitman and Trueswell and Company) are right, then learning is a hypothesis about the mechanisms of acquisition, not a neutral description of an observed phenomenon. What we observe is change over time given environmental inputs. This change may be due to learning, maturation, growth, or whatever. The cognitive question concerns the mechanism, and learning is a proposal for one such.
[5] This is the kind of mistake that modeling that takes the name of the game to be getting the input/output relations right is particularly susceptible to. E conceptions are prone to this kind of misconception given their picture that mental structure mirrors the structure of the input. However, modeling I/O relations confuses the data to be explained with the mechanism that does the explaining. For a related point in a Bayesian context see Glymour on “Osiander’s Psychology” in the comments to Jones & Love’s discussion of modern Bayesianism here and some blogish discussion by me (here). J&L note a similarity between earlier behaviorist conceptions and some modern Bayesian analyses. The above suggests how two programs that appear so different on the surface might nonetheless lead to the same conceptual place via a shared partiality to associationism and/or a misunderstanding of what modeling is supposed to do.
[6] After all, if (roughly) one-trial “learning” is the norm then there is little for E-like mechanisms to do. Acquisition on this view is more akin to transduction than to mental computation. I would probably be satisfied if it turned out that there was “a little” learning, but this is a topic for another discussion.
[7] After all, if we don’t acquire knowledge by carefully sifting through the input, it’s because such sifting is not necessary, and it wouldn’t be necessary if the knowledge is basically, already, all there.


  2. I am not sure I get this completely, but does the artifact argument by R&A rely on the learning curve being created by multiple subjects that you average over? Put differently, what would the counterargument against incremental learning be if you find a learning curve within a single individual? This is for instance how McClelland & Patterson (2002) reanalyze Pinker & Ullman's data on the acquisition of regular past tense.

  3. They note that the curve results from averaging over individuals, trials and items. Look at the paper.

  4. You mean the R&A paper? They don't say much that I can relate to the case I have in mind, as far as I can tell. I can see how a learning curve averaging over different individuals may hide multiple distinct OTL's for distinct individuals, but if a learning curve for one learner averages over different items (say, different verbs correctly inflected for past tense), then it would at most hide multiple distinct OTL's for different verbs correctly inflected for past tense. But it doesn't hide a distinct OTL for the past tense RULE. Which is good news for the connectionist. What am I missing?

  5. If I understand this correctly, it averages over how many trials it takes to learn the full paradigm, even though whatever is learned is learned at once or not at all. The relevant information is "many subjects learning many lists over many trials." All three cases of averaging can have this effect. The Gleitman and Trueswell stuff involves averaging over how many guesses it takes to acquire the lexical item. Though each guess is get it or not get it, averaging over how many guesses it takes an individual on a given list gives the smooth effect. I would look into the Rock paper if you want the details, for he worries about it there. I think that Gleitman and Trueswell do too, so that might be a good place to look. Hope this helps.

  6. Yes, I will have a look at the Rock paper itself, coz I don't immediately see how what may be the case for the learning of vocabulary items can be extended to individual learning curves of rules.