Wednesday, January 9, 2013

I before E: Acquisition as Constrained Creation

As children mature, they acquire many things: teeth, languages, toys, viruses, friends, habits, secondary sexual characteristics, drivers’ licences, etc. With regard to some of these—e.g., teeth—all normal children follow a common developmental path that ends in a mature state, allowing for individual variation within narrow limits. Each child acquires her own teeth, which at some level of detail, exhibit a unique cluster of properties: sizes, shapes, gaps, susceptibility to cavities, etc. But no child acquires thirty-seven shark teeth. On the whole, parents and dentists know what to expect. Teeth emerge gradually—starting in the second half of the first year after birth, with the full set of “primary” teeth manifesting in the next two years—and then kids lose their first set, starting around age 7, as their “permanent” teeth start to emerge.
In some respects, language acquisition is similar. But acquired languages, in contrast to teeth, vary in ways tracked by linguistic experience. Kids tend to acquire the languages they are exposed to. (Exposure to English doesn’t yield competence in Japanese.) Another difference is that normal human beings don’t have opinions about odontogenesis. Which is just as well, since we’d get it wrong. Tooth development is a complicated business, starting around 14 weeks after conception, with roots and crowns continuing to morph for a while even after the teeth have “erupted.” By contrast, people are apt to have untutored opinions about how kids acquire languages. And this can influence how theorists describe language acquisition. In particular, it’s often taken to be obvious that kids acquire languages by learning them. This little theory often masquerades as common sense, thereby avoiding critical scrutiny. But let’s remove the mask.
Historically, the little theory has been combined with very implausible assumptions about animal psychology, and an inchoate version of the idea that humans acquire E-languages; see, e.g., Word and Object. But one can hypothesize that kids acquire I-languages by learning them.
The ‘by’ is important here. One can describe I-language acquisition as a kind of learning. One can also describe walking as a kind of locomotion. But walkers don’t walk by locomoting; the fact that walkers locomote doesn’t explain how walkers walk. With this in mind, imagine an alien who wants to know how human children acquire I-languages: do I-languages emerge like teeth; do kids spontaneously create the generative procedures, perhaps after being given a pill; do kids download local I-languages via The Cloud; are I-languages somehow contagious?
Suppose the reply is that kids acquire I-languages by learning them, and that this is why acquired I-languages vary in ways tracked by linguistic experience. The alien will surely ask how kids learn procedures (as opposed to skills). So one might explain Universal Grammar (UG) and offer this idealization: if L is the UG-compatible I-language used in community C, then kids born into C learn that L is the UG-compatible I-language used in C. But then the alien will ask if there isn’t a distinction between (i) learning L and (ii) learning that L is the language that members of C are using, given prior assumptions about which languages they could be using.
A less polite alien would complain that it’s a bait-and-switch to say that kids acquire languages by learning them, and then treat instances of (ii) as instances of (i). But for today, let’s suppose that kids in Topeka acquire English as opposed to Japanese/Finnish/etc. because they somehow learn which (UG-compatible) I-language is “locally dominant,” while kids in Tokyo acquire Japanese as opposed to English/Finnish/etc. because they likewise learn which I-language is locally dominant. Still, it’s one thing to explain why a child acquires English as opposed to Japanese. It’s another thing to explain how the child acquires an I-language at all, and English in particular. So getting back to the little theory, one question is whether a child who acquires L does so by learning that L is the locally dominant I-language.
In principle, one can imagine a positive answer. For L might serve as a “target” in either or both of two senses. A child might start with a “proto-I-language” P, use it as a simple initial model of the locally dominant I-language L, and use experience to successively adjust P in ways that eventually lead to instantiation of L. Or a child might use symptoms of L to determine values of variables in an evaluation procedure that selects L from a space of candidates for being the locally dominant I-language. The first idea is that kids successively approximate L. The second idea is that kids compute which I-language is locally dominant. Either way, L looks like an E-language from an acquirer’s perspective: L is represented as an aspect of the acquirer’s environment. And familiar difficulties beset attempts to approximate generative procedures that exhibit constrained homophony, or infer such procedures from detectable parameter values. A more attractive option, in my view, is that acquired languages are not targets in either sense.
Charles Yang shows how choices for parameter values can be rewarded/punished by experience in ways that lead acquisition devices to converge on what theorists (not the devices) describe as a target grammar. I think Yang’s models shed real light on how kids stabilize on a local I-language. We can even speak of Yang-learning that L is the locally dominant I-language. And clearly, some aspects of L (e.g., how to tense verbs) are learned in some sense. But a Yang-learner doesn’t acquire L by learning that L is the locally dominant I-language. A system of this sort is not in the business of discovering what other systems are doing. It has a procedure for replacing the I-language(s) it uses—under pressure from apparent failures of comprehension and/or communication—without representing or trying to acquire anybody’s I-language.
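
To make the reward/punish idea concrete, here is a minimal sketch in Python. It is not Yang's own code. It assumes a single binary parameter and a linear reward-penalty update, and the names (`yang_style_learner`, `parse_rates`, `gamma`) and the numbers are hypothetical, chosen only for illustration. The point to notice is that the learner never represents anybody's grammar: it only tracks how often its own choices succeed.

```python
import random


def yang_style_learner(parse_rates, steps=20000, gamma=0.01, seed=0):
    """Reward/punish one binary parameter under a linear reward-penalty rule.

    parse_rates[v] is the (hypothetical) probability that a randomly drawn
    input sentence can be parsed when the parameter is set to value v.
    Returns the final probability of choosing value 1.
    """
    rng = random.Random(seed)
    p = 0.5  # initial probability of choosing value 1
    for _ in range(steps):
        # Pick a parameter value according to the current weights.
        choice = 1 if rng.random() < p else 0
        # Experience: did the chosen value let the learner parse the input?
        parsed = rng.random() < parse_rates[choice]
        # Success with value 1, or failure with value 0, shifts weight to 1;
        # otherwise weight shifts to 0.
        if (choice == 1) == parsed:
            p += gamma * (1.0 - p)
        else:
            p -= gamma * p
    return p


if __name__ == "__main__":
    # Value 1 parses every input; value 0 parses only 70% of inputs.
    print(yang_style_learner({0: 0.7, 1: 1.0}))
```

Under these assumptions the weight drifts toward the value that fails less often. A theorist can describe that value as part of the "target grammar," but nothing in the procedure mentions a target.
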
 This coheres with my general hunch that I-languages play an important role in human cognition prior to their use in communication. Suppose that I-languages are basically the same on the meaning side, and some I-language (perhaps phonologically null, with a limited lexicon) will emerge naturally in a child, at least given a few gruntings by conspecifics. That might be enough for certain intrapersonal uses. But for interpersonal communication, a child will need to trade in the “starter I-language” for something a bit flashier, and then keep trading up until she hits on an I-language that supports communication well enough (or until the critical period ends).
This coheres with the phenomenon of creolization. Kids can acquire an I-language that had never been acquired before. That’s bad news for the theory that kids acquire I-languages by learning them. One can say that creolizers try to acquire the locally dominant (UG-compatible) I-language, and end up with what they end up with, as on a quest for the grail. But in any case, kids can acquire an I-language that wasn’t there to be learned. And whoever first acquired (UG-compatible) I-languages didn’t learn them. Those humans may have been kids who heard parental grunting, and then found themselves talking to each other in ways the parents couldn’t follow. One can hypothesize that even these pioneer creolizers were trying to acquire the locally dominant I-language. (Quite an evolutionary trick: emergence of a capacity to learn I-languages before there were any to learn.) But in any case, the pioneers’ descendants have the same power to create I-languages where there are none to learn. So why think that humans ever started to acquire I-languages by learning them? Maybe kids kept doing what the pioneers did, just in response to fancier gruntings. (That coheres with the “continuity” hypothesis discussed here.)
Of course, there’s no guarantee that an I-language L that lets you talk with your peers will let you talk to your kids in ways that will get them to acquire L. And the I-languages currently in stable use are “heritable” in this sense. But any “unheritable” I-languages wouldn’t remain in use for long. So maybe we shouldn’t be too impressed by the fact that kids typically acquire the locally dominant I-language (as opposed to a significantly different variant). It may be that the counterexamples to this generalization quickly die out.


  1. I am not sure I follow the distinction that you are trying to make.

    Take a non-linguistic domain like riding a bicycle.

    So I buy a bicycle and go down to the park and ride it around for a bit and fall off a few times, and watch how other people ride, and after a few hours I have the ability to ride a bicycle.

    Does this count as learning in your view?

    Have I acquired the ability to ride *by* learning?

  2. I can't speak for Paul, but I guess that would just be an empirical question about how people actually do learn (or 'learn') to ride bicycles.

    I guess on the face of it, it seems unlikely that there is any explicitly represented target bicycle-riding procedure involved in that process. I.e., it seems more likely that you just keep making adjustments to your technique until you stop falling over, rather than successively refining a model of a bicycle-riding procedure to more closely approximate a target procedure, or computing which of several models of a bicycle-riding procedure is likely to be closest to the target.

    1. For me it's not an empirical question at all: to me, that is about as canonical an example of learning as I can think of.
      Learning for me just means (among other things) that sort of thing -- improving your performance at some task on the basis of experience.

      So I think there is some redefinition of the term 'learning' here which I am not getting, or some specific technical use that I didn't get the memo about.
      But that may not be Paul's point.

      (I think this is different from the brute-causal triggering argument that was aired earlier, or Norbert's redefinition of learning to exclude sudden learning, apropos of the Medina et al. paper).

    2. If it's just a question of what we ordinarily call "learning", then clearly languages are ordinarily said to be learned. On my understanding (which may not be very good), what would distinguish acquiring the ability to ride a bike *by learning* from (merely) acquiring the ability to ride a bike on the basis of experience is that only in the former case does experience play an evidential role. So:

      *(Merely) on the basis of experience*: I fall over. I follow some rule for modifying some parameter of my bike-riding procedure.

      *By learning*: I fall over. This experience constitutes evidence against some hypothesis I am currently entertaining regarding the correct procedure for riding a bike. I dump this hypothesis for one which is more likely correct given my recently-expanded evidential base.

      In the second case, there is a target that I'm working towards, and the key role of my experience is to provide evidence for/against models of the target. In the first case, there is no target. I just keep modifying the parameters in response to my experiences according to some set of rules. If I'm lucky, my rules/experiences will be such that I eventually end up with the correct set of parameters.
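
      One way to picture the contrast is the following minimal Python sketch. Everything in it is hypothetical: the function names, the single "lean" parameter, and the use of Bayes' rule as a stand-in for "experience playing an evidential role" are assumptions made only for illustration.

```python
def rule_update(lean, fell_left, step=0.1):
    """(Merely) on the basis of experience: a fixed rule nudges a parameter.

    No hypothesis about the correct procedure is represented anywhere.
    """
    return lean + step if fell_left else lean - step


def evidential_update(prior, likelihood_of_fall):
    """By learning: a fall is treated as evidence about candidate procedures.

    prior maps each hypothesis to its probability of being the correct
    procedure; likelihood_of_fall maps each hypothesis to the probability
    of falling if that hypothesis were correct. Returns the posterior
    obtained by Bayes' rule.
    """
    unnormalized = {h: prior[h] * likelihood_of_fall[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: weight / total for h, weight in unnormalized.items()}
```

      In the first function the fall only triggers an adjustment. In the second, the fall changes how much credence each model of the target procedure receives.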

    3. If it's not an empirical question, then it's not an empirical question. But...

      I'm generally suspicious of "paradigm case" arguments. For respectable reasons, smart people once took the sun and moon (but not the earth) to be paradigmatic cases of planets, which were definitely not stars. Not so long ago, the Great Nebula in Andromeda was taken to be just that...a paradigmatic case of a nebula, not a galaxy (since people hadn't heard of galaxies).

      I'm not trying to define or redefine anything. For me, it's about trying to figure out if there is some theoretically interesting sense in which animals often learn things, such that language acquisition is (or heavily involves) learning in that sense.

      If you want to characterize learning in terms of improving performance, that's fine. Then it looks like skills (and not I-languages) are the sorts of "things" one can learn.
      But I was considering the idea that kids acquire I-languages by learning them.

      In your sense, transitioning from not being able to ride a bike to being able to ride one is indeed a clear case of learning, so long as the performance improvement you have in mind is improvement with regard to bike riding. I don't have any special knowledge of what goes on when people acquire the capacity to ride a bike. (I seem to remember reading something suggesting that it was more complicated than Ryle made it out to be.) But as Alex Drummond suggests, I do think it's an empirical question whether there is some theoretically interesting sense in which animals often learn things, such that acquisition of bike-competence is (or heavily involves) learning in that sense.

      If "learning" is just a cover term for many processes that differ in kind--compare "locomotion"--then language acquisition may well be a kind of learning in the way that walking is a kind of locomotion. But then for purposes of explanation, we may as well just talk about language acquisition, and consign "learning" to the flames (or at least back to common sense talk with no real place in science).

      But we can also consider various hypotheses according to which kids acquire I-languages by learning them--in some way that everyone would recognize as learning--and then ask if those hypotheses are true. If none of them are, that may be worth knowing.

      I take it that if water can turn out to be H2O, and heat can turn out to be mean molecular motion, then (in principle) learning might turn out to be a kind of process that wasn't going on in the Skinner boxes, is what underwrites the acquisition of bike-competence and other skills, isn't what underwrites the acquisition of I-languages, etc. In that sense, I think it is indeed an empirical question, which is why I am inclined to fuss about it, in part to get clearer about which empirical question(s) it is.

  3. If a scientist finds out that water is H2O, this doesn't mean that the stuff in the pipes is no longer water. It is still water, we just have a better understanding of what water is.

    Maybe jade is the better example (reaching into our toy chests of standard philosophical examples). So what has traditionally been called jade is apparently two substances, jadeite and nephrite. That doesn't mean that the lumps of stuff that had been called jade stopped being jade -- it just meant that if you were a scientist you stopped using jade as a technical term.

    So I completely agree that when we understand learning we may split it into learning1 and learning2, and find that riding a bike is learning1 and acquiring a language is learning2 -- indeed I think that is quite likely. That is definitely an empirical question: how do we learn languages? And it is a very hard question -- parameter setting, perhaps Yang-style? domain general processes of analogy and abstraction? Bootstrapping from the semantics? What type of learning is it?

    But the next move of saying 'why would we think that language acquisition is learning?' is like saying why would we think that this lump of jade is jade? Well, because it *is* jade. That is not in question -- what is in question is *what type of jade it is*.

    1. For me, the analogous question is not: is jadeite jade? The analogous question is: once we have distinguished jadeite from nephrite, should we use 'jade' as a theoretical term? We can (and legitimately do) use 'learning' as a commonsense notion to initially identify some explananda. But it doesn't follow that 'learning' belongs in any explanans.
      Re targets (below), we're on exactly the same page.

    2. Yes, I mostly agree with that. But take, for example, 'memory', a closely related term that has clearly split into some interesting subclasses (short-term, long-term, etc.) with different biological substrata. One can still use the term 'memory' (as people have been doing on this blog) without any confusion to refer to a whole range of related but distinct information storage capabilities of the brain. And I think the term 'learning' is likely to be in a similar situation.

      I think this is more than just a terminological dispute -- I am pushing back on this because I feel a substantive empirical point is being smuggled in under the guise of just a precisification of terms.
      E.g. to go back to jade -- suppose before we had figured out the jadeite/nephrite split, someone said, "let's call jade from Australia ozite and jade from New Zealand kiwite". Well, that is out of order -- it is an empirical question whether the stuff from Australia is different from the stuff from New Zealand; and deciding a priori to split the terminology assumes something about how that empirical question will ultimately be decided.

      And so until we have the empirical facts, we should use a neutral term (like learning) to cover these cases.
      Someone might say in this case that he *knows* what the facts are -- but there is no broad consensus about this yet.

      (Of course, it's not a bad idea to call the journals and conferences and textbooks 'language acquisition' rather than 'language learning', to avoid confusion with adults learning second languages.)

    3. I agree with all that, except for the thought that 'learning' is a neutral term, as used in a field that has the intellectual history that this one has. I worry that a substantive empirical claim is being smuggled in under the guise of using commonsense vocabulary. But thanks for pressing the worry about undue precisification.

  4. The other point -- about whether there are 'targets' or not is interesting too.
    I think you are right to draw a distinction (if this is the one you mean) between a child acquiring its language by
    a) trying to match the local linguistic environment (trying to fit in)
    b) trying to achieve communicative goals

    Of course, when learning communication systems, these can't get too far apart. Whereas with bicycling, learning to ride fast and learning to ride like other people could in principle be completely separate.

    In learning theory papers of the type I write we often talk about 'targets' and converging to the target, but these are only in the context of analysing the algorithm, and proving it works; these are not generally part of the algorithm itself, which just takes some data and outputs some grammar, with no talk of targets.