

Monday, November 4, 2013

Learning is to cognition what phlogiston is to chemistry

Last week John Trueswell gave a colloquium talk at UMD that I unfortunately could not attend. I was in Montreal at a workshop in honor of an old prof of mine, Jim McGilvray. The Montreal gig was great and it was a pleasure to be able to fete Jim in person (he supervised one of my earliest linguistics projects, my undergrad thesis on a Reichenbachian theory of tense), but I confess that I would have loved to have been at John’s talk as well. I await the day when some clever physicist figures out how to allow someone to be in two places at once. Until that happy time arrives, I thought it appropriate to do some penance for my physical failings by re-reading a terrific paper by Roediger and Arnold (R&A) on the history of one-trial learning experiments. Why the R&A paper? Because John’s recent work on lexical acquisition is in the one-trial learning tradition whose history they review. I’ve discussed this work already in a couple of places (here and here), but I wanted to bring the R&A paper to your attention again, for it highlights some of the more interesting implications of this line of research for topics near and dear to my intellectual prejudices: despite the common conviction that Empiricism has (at least) something going for it, there is a remarkable absence of evidence supporting even this very weak view.[1] Here’s what I mean.

Behind every theory there is an inspirational picture. In the mental sciences, the two grand traditions, Rationalism (R) and Empiricism (E), are animated by two contrasting conceptions of the underlying mechanisms of mental life. For Es, the afflatus is the blank wax tablet: learning consists of experience imprinting on this tablet, and the clarity and distinctness of the resultant concept/idea is a function of the number of repeated imprints. The more the experience, the deeper and clearer the resulting acquired concept/idea. Here’s Ebbinghaus’s version (quoted in R&A: 129):

These relations [between repetition and performance] can be described figuratively by speaking of the series as being more or less deeply engraved on some mental substratum. To carry out this figure: as the number of repetitions increases, the series are engraved more and more deeply and indelibly; if the number of repetitions is small, the inscription is but surface deep and only fleeting glimpses of the tracery can be caught…

Note that on this conception, repeated experience forms the concepts in the mind (e.g. makes the grooves). Repetition is critical, for the mind’s main character is its receptivity to external formative forces, the mind itself being structurally rudimentary. On this view, understanding acquisition requires analyzing the fine structure of the input, for what minds/brains do in forming mental constructs (ideas, concepts, etc.) is sift and manipulate these input experiences. It is not surprising that this view focuses on minds’ significant statistical capacities, for these are obvious candidate mechanisms for organizing the inputs and separating the significant wheat from the non-significant chaff.

This contrasts with Rish proposals. For these, the mind is highly articulated. There is lots of given, pre-experiential structure. Thus, the role of experience is not to construct the relevant concepts attained but to kick-start them into activation. Experience on this view is a trigger, not an artificer. Not surprisingly, this conception focuses on discovering the natively provided mental structures that experience serves (importantly but modestly) to activate.

On considering these two different raw philosophical pictures, one can understand the intrinsic interest in one-trial learning (OTL). The existence of OTL would be a problem for E but not for R. The empirical question then is whether OTL exists and how common it is. Investigating this requires translating the philosophical pictures into testable theories, and this leads to learning curves.

R&A observe that one of the biggest pieces of evidence for the E view of the world is the classical learning curve: you know, the one that rises from low left to high right, decelerating as it goes (as below, reproduced from R&A p. 128).

[Figure: the classical, negatively accelerated learning curve (R&A, p. 128)]

R&A note that this curve perfectly embodies the E conception that the Ebbinghaus quote poetically describes. R&A point out two important features of learning curves consonant with the leading E idea.[2] First, “[t]he fact that the learning curve shows a gradual increase in performance is a reflection of the underlying mechanism- the build up of strength- which is itself also gradual.” And second, that this curve is “the same across astonishingly different experimental situations and dependent measures, as well as across species from slugs to humans,” strongly suggesting general “underlying mechanisms” and general “laws of learning” (129). In a word, this curve, it is argued, puts paid to the R idea of triggering and its concomitant conception of a highly structured mind. If learning curves describe the mechanics of learning, then E beats R.[3] Unless, that is, the curve is actually an artifact of, e.g., how experimental data is crunched, rather than a description of an underlying mental mechanism. And that’s where the story that R&A tell gets really interesting.

In the late 1950s and early 1960s Irvin Rock and William Estes (these two were big psych shots, look them up) did a series of experiments showing (at the very least) that interpreting the curve as describing an underlying mechanism that is similarly smooth and incremental is premature, and (at most) that it is false. They showed two things: (i) that these curves are consistent with an underlying OTL mechanism (i.e. “although the learning curves were continuous, the underlying processes were anything but continuous” (p. 129)) and (ii) that there was very good evidence that OTL is the norm. Let’s discuss each point separately.

Here’s R&A quoting Rock (that’s the “p.186” below) and then commenting (p. 130) wrt (i):

“Another possibility is that repetition is essential because only a limited number of associations can be formed in one trial, and improvement with repetition is only an artifact of working with long lists of items.” (p. 186). … That subset is learned perfectly, but all the rest of the associations that were presented are not learned at all…. The “artifact” Rock referred to is essentially that of averaging across many subjects learning many lists on many trials: despite the all-or-none nature of the underlying process, the learning curve will be smooth when performance is averaged over these several parameters. (my emphasis; NH)

This is a very important conceptual point for it divorces the big E conclusion that the mechanisms of learning are gradual and driven by repeated environmental inputs (viz. repeated engravings by experience on a mental substratum) from the fact that learning curves have the shape they do and can be found quite generally across tasks and species. Put more pointedly, if this is correct, then the smooth shape of the learning curve implies nothing at all about the smoothness and gradualness of the underlying mechanism.
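Rock’s “artifact” point is easy to demonstrate with a toy simulation (my own sketch, nothing from R&A): give every item a fixed per-trial chance of jumping, all at once, from unlearned to perfectly learned, and then average recall performance across many items. Every individual trajectory is a 0-to-1 step function, yet the average traces out the familiar smooth, negatively accelerated curve.

```python
import random

def all_or_none_learning(n_items=1000, p=0.3, n_trials=10, seed=0):
    """Simulate pure one-trial (all-or-none) learning.

    Each item is either fully learned (recalled with certainty) or not
    learned at all; on every trial each unlearned item is learned with
    probability p. Individual trajectories are step functions. Returns
    the proportion of items recalled after each trial, averaged over
    all items.
    """
    rng = random.Random(seed)
    learned = [False] * n_items
    curve = []
    for _ in range(n_trials):
        learned = [item or rng.random() < p for item in learned]
        curve.append(sum(learned) / n_items)
    return curve

curve = all_or_none_learning()
# The averaged curve rises smoothly and decelerates (its expectation is
# 1 - (1 - p)**t after t trials), even though no single item ever
# "improves" gradually: each one jumps from 0 to 1 on a single trial.
```

The deceleration falls out for free: each trial, a fixed fraction p of the *remaining* unlearned items is converted, so the absolute gains shrink as the pool of unlearned items shrinks.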

Rock and Estes not only made this important observation but also went on to show that in classical cases of “learning”[4] (i.e. acquiring paired associates), there is good evidence against the classical picture. The basic set-up was to have two groups: a control group that learned paired associates by going through one and the same list again and again, and a second group for whom, on each pass, the not-yet-learned pairs were replaced with new pairs, so that a failed pair was never encountered again. The prediction, if E is correct, is that the control group will do better than the second group, since its repeated exposures to the missed pairs should be building up their strength. I will not review Rock’s and Estes’s experiments here, as that’s what the R&A paper does so well. Suffice it to say that, as R&A put it, their papers showed that “there was no hint for the continuity/incremental hypothesis in the data” (p. 131).
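The logic of this design can be sketched in a toy simulation (my own parameters and recall model, not Rock’s or Estes’s actual procedures): under an all-or-none model, swapping missed pairs for fresh ones changes nothing, since the failed exposures left no trace; under an incremental strength model, the control group’s repeated exposures to missed pairs should buy it a clear head start.

```python
import random

def trials_to_criterion(n_pairs=12, p=0.25, substitute=False,
                        incremental=False, seed=0):
    """Toy drop-out design. On each trial every pair on the current
    list is tested. All-or-none model: a pair is recalled with flat
    probability p per exposure, however often it was seen before.
    Incremental model: each prior exposure adds strength, so recall
    probability grows with repetition. Recalled pairs count as learned
    and are dropped. Control condition: missed pairs stay on the list.
    Substitution condition: each missed pair is replaced by a brand-new
    pair (zero prior exposures). Returns trials until list exhausted.
    """
    rng = random.Random(seed)
    exposures = [0] * n_pairs          # prior exposures per listed pair
    trials = 0
    while exposures:
        trials += 1
        missed = []
        for n in exposures:
            if incremental:
                recall_p = 1 - (1 - p) ** (n + 1)  # strength accumulates
            else:
                recall_p = p                       # every exposure alike
            if rng.random() >= recall_p:
                missed.append(0 if substitute else n + 1)
        exposures = missed
    return trials

mean = lambda xs: sum(xs) / len(xs)
runs = range(300)
aon_control = mean([trials_to_criterion(seed=s) for s in runs])
aon_substit = mean([trials_to_criterion(substitute=True, seed=s) for s in runs])
inc_control = mean([trials_to_criterion(incremental=True, seed=s) for s in runs])
inc_substit = mean([trials_to_criterion(incremental=True, substitute=True,
                                        seed=s) for s in runs])
# All-or-none: substitution changes nothing, since failed exposures left
# no trace anyway. Incremental: the control group reaches criterion
# sooner, since its missed pairs carry accumulated strength.
```

So the two pictures come apart empirically: the incremental model predicts a control-group advantage, the all-or-none model predicts none, and this is the contrast the experiments probe.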

Note that if (ii) is correct, an account that uses an incremental mechanism to derive acquisition data correctly described by the classical learning curve is incorrect. Let me beat this horse good and dead: Point (i) shows that a classical learning curve is consistent with OTL mechanisms. Point (ii) argues that OTL mechanisms are in fact what we find. Hence, if correct, theories that deploy incremental mechanisms are wrong even if they can derive classical learning curves. This should not be surprising: such curves graph a correlation between trials and responses. The aim of a theory is not to “model” the data but to “model” the mechanisms that generate the data. The E mistake is to infer that smooth incremental data implies smooth incremental mechanisms. It doesn’t, though thinking it does is an Eish diathesis.[5]

It goes without saying that both Rock’s and Estes’ results were contested. Methodological problems were purportedly found that confounded the conclusions. However, and this is interesting, no good evidence for the classical theory appeared to be forthcoming. Rather, the critics seemed satisfied with a draw, viz. showing that the Rock/Estes results need not be interpreted as debunking the classical E view. The authors of one particularly cute study that R&A report seemed satisfied with the conclusion “that the incremental theory is untestable” or, quoting the critics directly (Underwood and Keppel): “certain theories are not capable of disproof. Certain aspects of the incremental theory seem to be of this nature” (p. 135). If this be a vindication of the classical E view, imagine what a refutation would look like.

John’s current work on lexical acquisition develops the Rock/Estes conception. This is what makes it so interesting for people like me. If they are correct, then learning as classically conceived by E doesn’t exist, or, more modestly, there is precious little evidence in its favor. To me, this has enormous implications for standard approaches to modeling acquisition that assume some form of gradual process, some form of gradual hill climbing or gradual strengthening of connections. Indeed, in some moods (e.g. now), I think that this work, if correct, implies that Gallistel and Matzel are right that there is no general theory of learning to be had, as there are no mechanisms, mental or neural, that correspond to what the E picture took learning to be.[6] However, for now I would be happy with more modest conclusions: (i) that there is precious little evidence in favor of the E conception, (ii) that there is little evidence in favor of the view that mental mechanisms are gradual and continuous, and (iii) that there is pretty good evidence that we have mechanisms that enable what amounts to one-trial “learning” and that this kind of acquisition requires something very much like the classical R conception of the mind.[7] There is no good reason, in other words, for taking the E conception to be the default, and every reason to think that the problem of acquisition is largely one of getting the pre-packaged representational formats correct.

Let me end with a request and an exhortation. First, the request: does anyone have a poster case of learning from the psych literature that is not susceptible to the Rock/Estes critique? It would be nice to have one. Second, the exhortation: read the R&A paper and the recent papers developing these ideas by John and Lila and Charles and Jesse and their students. If their insights are internalized, we may finally be able to break the grip of E conceptions of mental mechanisms as the default position. One, at least, can always hope; after all, we got rid of phlogiston, didn’t we?





[1] I think it was Lila Gleitman who first advanced the following PoS argument: Empiricism must be innate, for what else could explain the widespread conviction that it is true despite the dearth of evidence in its favor? Lila also was kind enough to bring the R&A paper to my attention. I should also add that I doubt that Lila would endorse my interpretation of this work as outlined below. In fact, I am sure she wouldn’t, given this.
[2] Gallistel and Matzel (see here) note that the LTP view of brains has been taken to similarly support an E picture. G&M argue that the E picture is widely accepted in the neurosciences, despite there being little to recommend it (and a lot to disavow it). R&A’s discussion merges well with G&M’s and supports a similar conclusion.
[3] R&A identify, in passing, an attraction of E views of learning to the formally inclined that comes from the mathematical tractability of learning curves. I quote: “Learning curves (like forgetting curves) are smooth and beautiful, and psychologists with a mathematical bent can have a field day fitting equations to them” (p.128).  One should never underestimate the attraction of a conclusion that fits snugly with your available technology. 
[4] Note the scare quotes: if Rock and Estes (and Gleitman and Trueswell and Company) are right, then learning is a hypothesis about the mechanisms of acquisition, not a neutral description of an observed phenomenon. What we observe is change over time given environmental inputs. This change may be due to learning, maturation, growth, or whatever. The cognitive question concerns the mechanism, and learning is a proposal for one such.
[5] This is the kind of mistake that modeling which takes the name of the game to be getting the input/output relations right is particularly susceptible to. E conceptions are prone to this kind of misconception given their picture that mental structure mirrors the structure of the input. However, modeling I/O relations confuses the data to be explained for the mechanism that does the explanation. For a related point in a Bayesian context see Glymour on “Osiander’s Psychology” in the comments to Jones & Love’s discussion of modern Bayesianism here and some blogish discussion by me (here). J&L note a similarity between earlier behaviorist conceptions and some modern Bayesian analyses. The above suggests how two programs that appear so different on the surface might nonetheless lead to the same conceptual place via a shared partiality to associationism and/or a misunderstanding of what modeling is supposed to do.
[6] After all, if (roughly) one trial “learning” is the norm then there is little for E like mechanisms to do. Acquisition on this view is more akin to transduction than to mental computation. I would probably be satisfied if it turned out that there was “a little” learning, but this is a topic for another discussion.
[7] After all if we don’t acquire knowledge by carefully sifting through the input it’s because such sifting is not necessary and it wouldn’t be necessary if the knowledge is basically, already, all there.

Wednesday, December 12, 2012

Does Anyone Ever Learn Anything?


Let’s follow Jimmy and Judy from birth to about five. At birth they say precious little. At five you can’t shut them up. What happened in these five years? Answer: they learned their native language, English say. Obvious, no? Yes. But is it right? Did Judy and Jimmy learn English? Well, to paraphrase a recent political celebrity, it all depends on what you mean by ‘learn’ and ‘English’. It seems undeniable that Judy and Jimmy developed a capacity absent (or at least invisible[1]) at birth, and this capacity can be exercised to converse with some natives, the English-speaking ones, but not others, the Mandarin-speaking ones. However, does this imply that they learned English?

Linguists have long understood that labels like English, French, Swahili, Mandarin, etc. are more convenience terms than terms of art (here’s where linguists mention the Weinreich quip that a language is just a dialect with an army and a navy; real cognoscenti add sotto voce that dialects are just idiolects with epaulettes). However, recent research suggests that we have been far too cavalier about the first half of this doublet. We can all agree that Judy and Jimmy acquired (competence in) English, but did they learn English? Recent (and, as we shall see, not so recent) research into word learning suggests that here we need to look before we leap, something, it appears, that kids do not do, at least when it comes to early word acquisition. I’ve already discussed some of the research by MSTG (here), which argues (very convincingly, in my view) that the early stages of lexical acquisition do not involve the careful statistical weighing of competing alternatives but involve jumping to a conclusion mentally clutched with fierce determination and, with time, forgotten if incorrect, only to set up another ill-supported jump into the lexical abyss (boy was that fun to write!). MSTG support this conclusion by considering learning situations less factitious than the contrived set-ups near and dear to the psych lab. When kids and adults are asked to consider more natural (and hence visually busy) filmed vignettes, in which the targets of lexical labeling are not clearly segregated and identified on pristine picture cards, they acquire word meanings all at once or not at all. This result is important for several reasons.

First and foremost, it provides a concrete illustration of why we should not equate acquisition with learning. MSTG provide diagnostics of learning (multiple hypotheses, statistical weighing of these alternatives, gradual convergence on the right result) and argue that learning so understood fails to hold in more realistic contexts of lexical acquisition. Specific conclusion: in at least one demonstrable case, acquisition does not equal learning. More general conclusion: it is an empirical question whether in any given acquisition context it is true that learning (now understood to be one mechanism among others for the acquisition of knowledge) is taking place. In other words, Jimmy and Judy certainly acquired English, but whether they learned it is entirely open for empirical grabs. Chomsky’s repeated suggestion that we understand language acquisition as a species of growth rather than learning makes an analogous point. MSTG make the case more crisply, I believe, by providing clear diagnostics of learning and showing that there are core cases of “learning” where these signature properties of learning are demonstrably absent.

Second, MSTG provide a rationale for why learning doesn’t hold in their examined cases. Commenting on this (here), I observed that the MSTG results suggest that in such busy contexts the prerequisites for “cross situational learning” do not exist, and this is why an alternative acquisition strategy is employed. Following MSTG’s lead, I even suggested that learning requires the kind of structured hypothesis space provided by the contrived set-ups that MSTG’s more realistic vignettes argue against. This constituted a kind of compromise position: learning applies where acquirers have well-structured hypothesis spaces, and leaping to conclusions holds where this fails to hold.[2] However (and learn from this, you soft-hearted, open-minded intellectual compromisers out there), once again Norbert’s natural generosity of spirit and desire for intellectual group-hug kumbaya moments led him astray. It seems that even this compromise position concedes too much to learning mechanisms. In a companion paper, Trueswell, Medina, Hafri and Gleitman (TMHG) extend the MSTG results to the more stylized learning environments in which options are clearly marked and lexical targets (aka referents) are crisply identified.[3] Even in the artificial setting of the experimental psych lab, kids and adults do the darndest things!

More specifically, like MSTG, TMHG identify the quiddity of “cross situational learning” with the following mechanism:

…keeping track of multiple hypotheses about a word’s meaning across successive learning instances, and gradually converg[ing] on the correct meaning via an intersective statistical process (128).

What they demonstrate is that even in simple stylized psych-lab contexts, when acquirers “are placed in the initial novice state of identifying word-to-referent mappings across learning instances, evidence for such a multiple hypothesis tracking procedure is strikingly absent” (128), and learners don’t “track the cross-trial statistics in order to build the correct mappings between words and referents” (129). What acquirers do is “make a single conjecture upon hearing the word used in context and carry that conjecture forward to be evaluated for consistency with the next observed context” (129). If confirmed, the guess is retained; if disconfirmed, acquirers guess again as if de novo.
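The quoted procedure can be sketched in a few lines of code (a toy rendering under my own assumptions, not the authors’ implementation): the learner holds at most one conjectured referent per word; a conjecture consistent with the current scene is carried forward, and an inconsistent one is thrown away and replaced by a single fresh guess from the scene.

```python
import random

def propose_but_verify(exposures, seed=0):
    """Toy 'propose-but-verify' word learner.

    exposures: a list of (word, referents_in_scene) pairs.
    The learner stores at most one conjectured referent per word. On
    each exposure a conjecture consistent with the current scene is
    kept; an inconsistent (or absent) one is discarded and replaced by
    a single random guess from the scene, as if de novo. No
    cross-situational statistics over alternatives are tracked.
    """
    rng = random.Random(seed)
    lexicon = {}
    for word, scene in exposures:
        if lexicon.get(word) not in scene:       # verify failed
            lexicon[word] = rng.choice(scene)    # propose afresh
    return lexicon

# 'dax' always co-occurs with the dog amid changing distractors. If an
# early guess happens to hit 'dog' it survives every later scene; a
# wrong guess is simply abandoned and re-drawn, with no memory of the
# alternatives it lost to.
exposures = [
    ("dax", ["dog", "cup", "tree"]),
    ("dax", ["dog", "ball"]),
    ("dax", ["dog", "shoe", "hat"]),
]
lexicon = propose_but_verify(exposures)
```

Note the contrast with an intersective cross-situational learner, which would retain all candidate referents from each scene and converge by intersecting them; here nothing but the single current conjecture survives from one exposure to the next.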

TMHG show this (as do MSTG) by considering the dynamics of knowledge fixation: “how learning accuracy unfolded across learning instances” (130). This is very interesting stuff and I strongly recommend that you look at the details. However, just as interesting is the little bit of history that TMHG review. TMHG acknowledge tapping into a long history of criticism of this traditional gradual/comparative conception of learning. In the late 1950s and early 1960s, first Irvin Rock (1957) and then William Estes (1960) used very analogous kinds of arguments to demonstrate that verbal learning was not learning at all but was one-trial guessing. They did not fare so well. As Roediger and Arnold (RA) (2012) put it, “the verbal learning establishment rose up to smite down these new ideas” (2). It is instructive to read the RA paper, for it shows the weakness of the counter-arguments used against Rock and Estes that nonetheless carried the day. Just like today, learning was less a hypothesized mechanism for acquisition than a definitional truth about it.[4]

The TMHG paper ends with an interesting paradox that I would like to briefly discuss as well. Initial word learning is slow and laborious (35-140 words at 14 months) but unbelievably rapid thereafter (12,000 words by age 6, i.e. roughly 7 words per day for a little under 5 years). Why the change? TMHG note other work (Lila’s syntactic bootstrapping hypothesis) which proposes that “the acquisition of syntax and other linguistic knowledge by toddlers and young preschoolers during this time period provides a rich database of additional constraints that permit the learning of many additional words.” It is conceivable that with this knowledge in place, cross situational learning might finally become operative, though as TMHG correctly observe, this is decidedly an empirical question and it is “plausible that a propose-but-verify word learning procedure is at work all along the course of word learning throughout most of the life cycle” (150). It would be interesting in the extreme, in my view, were either conclusion correct.

If the latter conclusion proved true (propose and verify all the way down), then language acquisition might have nothing to do with learning in the technical sense. Of course, this is consistent with grammar acquisition being a case of learning, i.e. perhaps we don’t learn words but we do learn parameters. Maybe, but I’d be very skeptical. If even word acquisition isn’t an instance of learning then it seems to me that the burden of proof that any area of language acquisition involves learning would be pretty high.

If, on the other hand, the former option is correct, (viz. that learning only kicks in when grammar is there to buttress it) then its role in accounting for language acquisition would, in my opinion, be quite modest.  Yes, learning plays a role, but really most of the action lies with the constraints that grammars place on the process. This does not mean that we should not study the extras that learning might be adding (though remember the possibility mooted in the prior paragraph), but I doubt that these cognitive titivations will generate much excitement if they only operate in restricted hypothesis spaces. Learning is interesting when there are lots of options that need sifting, not so much when the range of possible end states is highly restricted. 

MSTG and TMHG show us that terminology matters. If we call acquisition “learning” we’ve loaded the research dice. If MSTG and TMHG are right (and I’d bet quite a bit that they are: any suckers out there?) it looks like we’ve repeatedly loaded them to come up snake eyes. It’s time we cut our losses and open our minds to the possibility that ‘language learning’ like ‘Justice Roberts,’ ‘military intelligence,’ and ‘western civilization’ is an oxymoron.



[1] I include this hedge because every week another (usually French-speaking) psychologist shows that the youngest kids seem to have the most prodigious knowledge. It seems that we have nothing to teach those little know-it-alls.
[2] A kind of learning as first resort, jumping to conclusions a last resort, method. C.f. TMHG p. 130.
[3] It is doubtful that the meaning of a lexical item is simply its referent for reasons that Chomsky has belabored (sadly, quite unsuccessfully) over the years. There is far more structure to lexical items than what they refer to. However, for current purposes, this only further dramatizes the inadequacy of learning as a mechanism for lexical acquisition. 
[4] TMHG also cite another paper by Gallistel and friends that I have not yet read but will try to get hold of and blog about when I do. It argues (cited in TMHG 151), that “in most subjects, in most paradigms, the transition from a low level of responding to an asymptotic level is abrupt.” Oh my. I’ll keep you posted.