There are never enough good papers illustrating the poverty
of the stimulus. Here’s
a recent one that I read by Jennifer Culbertson and David Adger (yes, that
David Adger!) (C&A) that uses artificial language learning tasks as probes
into the kinds of generalizations that learners naturally (i.e. without instruction) make.
Remember that generalization is the name of the game. Everyone agrees that no
generalizing beyond the input, no learning. The debate is not about whether
this exists, but what the relevant dimensions are that guide the generalization
process. One standard view is that it’s just frequency of some kind, often
bigram and trigram frequencies. Another is that the dimension along which a
learner generalizes is more abstract, e.g. along some dimension of linguistic
structure. C&A provide an
interesting example of the latter in the context of artificial language
learning, a technique that, I believe, is still new to most linguists.[1]
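To make the frequency-based view concrete, here is a toy sketch (my own illustration, not a model from C&A's paper; the example strings anticipate the experiment described below) of generalization by bigram counts, and of why raw bigram frequency alone may not decide between candidate orders:

```python
# Toy illustration of the frequency-based view: generalize according to
# bigram counts over the strings in the input (hypothetical exposure data).
from collections import Counter

def bigrams(sentence):
    words = sentence.split()
    return list(zip(words, words[1:]))

# Hypothetical exposure: single postnominal modifiers only.
exposure = ["mice these", "mice three", "mice yellow"]
counts = Counter(bg for s in exposure for bg in bigrams(s))

def bigram_score(sentence):
    """Score a candidate by summing the counts of its attested bigrams."""
    return sum(counts[bg] for bg in bigrams(sentence))

# Each two-modifier candidate contains exactly one attested bigram,
# so bigram frequency alone leaves the choice between them open.
print(bigram_score("mice these yellow"))  # 1
print(bigram_score("mice yellow these"))  # 1
```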
Let me say a word about this technique. Typological
investigation provides a standard method for finding UG universals. The method
is to survey diverse grammars (or more often, and more superficially,
languages) and see what properties they all share. Greenberg was a past master
of this methodology, though from the current perspective, his methods look
rather “shallow” (though the same cannot be said of modern cartographers like Cinque). And looking for common features of diverse grammars seems like a plausible
way to search for invariances. The current typological literature is well
developed in this regard and C&A note that Greenberg’s U20, which their
experiment explores, is based on an analysis of 341 languages (p.2/6). So, these kinds of typological investigations
are clearly suggestive. Nonetheless, I think that C&A are correct in thinking that supplementing this kind of typological evidence with experimental evidence is a very good idea, for it allows one to investigate directly what typological surveys can only address indirectly: to what degree the gaps in the record are principled. We know for a fact that the extant languages/grammars are not the only possible ones. Moreover, we know (or at least I believe) that the sample of grammars at our disposal is a small subset of the possible ones. Since artificial language learning experiments promise to let us probe directly what typological comparison only lets us infer indirectly, better to use the direct method if it is workable.
C&A’s paper offers a nice paradigm for how to do this, and those interested in exploring UG should look at the method with interest.
So what do C&A do? They expose learners to an artificial
version of English wherein the pre-nominal order of determiner, numeral, and adjective is flipped from the English case. So, in “real” English (RE), the order and structure is [Dem [num [adj [N]]]] (as in: these three yellow mice). C&A expose learners to nominal bits
of artificial English (AE) where the dem, num, and adj are postnominal. In
particular, they present learners with data like mice these, mice three, mice
yellow etc. and see how they generalize to examples with more than one
postnominal element, e.g. do learners prefer phrases in AE like mice yellow these or mice these yellow? If learners treat AE
as just like RE except for the postnominal order, then they might be expected to carry over to AE the word order they invariably see pre-nominally in RE (thus to prefer mice these yellow). However, if they retain the scope structure of the expressions in RE and port that over to AE, then they will preserve the bracketing noted above but flip the word order, i.e. [[[[N] adj] num] dem]. On the first hypothesis, learners prefer orders they’ve encountered repeatedly in RE; on the second, they preserve RE’s more abstract scope relations when projecting to the new structures in AE.
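To make the two hypotheses concrete, here is a minimal sketch (my own illustration with made-up names, not C&A's materials) of what each predicts for multi-modifier phrases in AE:

```python
# Two generalization hypotheses for AE, given RE order Dem-Num-Adj-N.
RE_ORDER = ["dem", "num", "adj", "n"]   # surface order in RE
SCOPE = ["dem", "num", "adj"]           # dem scopes over num, num over adj

def surface_preserving(modifiers):
    """Hypothesis 1: keep RE's linear order of modifiers, just postnominal.
    Predicts 'mice these yellow' (N-dem-...-adj)."""
    return ["n"] + sorted(modifiers, key=RE_ORDER.index)

def scope_preserving(modifiers):
    """Hypothesis 2: keep the bracketing [dem [num [adj [N]]]] and mirror
    the linear order. Predicts 'mice yellow these' (N-adj-...-dem)."""
    return ["n"] + sorted(modifiers, key=SCOPE.index, reverse=True)

print(surface_preserving(["dem", "adj"]))  # ['n', 'dem', 'adj']
print(scope_preserving(["dem", "adj"]))    # ['n', 'adj', 'dem']
```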
So what happens? Well, you already know, right? They go for door number 2 and preserve the scope order of RE, thus reliably generalizing to the order ‘N-adj-num-dem.’ C&A conclude, reasonably enough, that “learners overwhelmingly favor structural similarity over preservation of superficial order” (abstract, p.1/6) and that this means that “when they are pitted against one
another, structural rather than distributional knowledge is brought to bear
most strongly in learning a new language” (p.5/6). The relevant cognitive
constraint, C&A conclude, is that learners adopt a constraint “enforcing an
isomorphism in the mapping between semantics and surface word order via
hierarchical syntax.”[2]
This actually coincides with similar biases young kids
exhibit in acquiring their first language. Lidz and Musolino (2006) (L&M)
show a similar kind of preference in relating quantificational scope and
surface word order. Together, C&A and L&M show a strong preference for preserving a direct mapping between overt linear order and hierarchical structure, at least in “early” learning, and, as C&A’s results show, this preference is not a simple left-to-right preference but a genuinely structural one.
One further point struck me. We must couple the observed preference for scope-preserving order with a dispreference for treating surface forms as derived structures, i.e. as products of movement. C&A note that ‘N-dem-num-adj’ order is typologically rare. However, this order is easy enough to derive from a structure like (1) via head movement, given some plausible functional structure. Given (1), N-to-F0 movement suffices.
(1) F0 [Dem [num [adj [N]]]] → [N+F0 [Dem [num [adj [N]]]]]
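Here is a minimal sketch (my own toy encoding, not anything from C&A) of how the N-to-F0 movement in (1) linearizes to the rare N-dem-num-adj order:

```python
# Base structure from (1): F0 [Dem [Num [Adj [N]]]], as nested lists.
base = ["F0", ["dem", ["num", ["adj", ["N"]]]]]

def head_move_N(tree):
    """Adjoin N to F0 (written 'N+F0'); the original N position is
    retained below as a silent copy."""
    f0, complement = tree
    return ["N+" + f0, complement]

def linearize(tree):
    """Flatten the bracketing into surface word order, silencing the
    lower copy of the moved N."""
    if isinstance(tree, str):
        return [] if tree == "N" else [tree]
    return [w for sub in tree for w in linearize(sub)]

print(linearize(head_move_N(base)))  # ['N+F0', 'dem', 'num', 'adj']
```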
We know that there are languages where N moves above determiners (so one gets the order N-det rather than Det-N), and though the N-dem-num-adj order is “rare,” it is, apparently, not unattested. So, there must be more going on. This, it goes without
saying I hope, does not detract from C&A’s conclusions, but it raises other
interesting questions that we might be able to use this technique to explore.
So, C&A have written a fun paper with an interesting
conclusion that deploys a useful method that those interested in FL might find
productive to incorporate into their bag of investigative tricks. Enjoy!
How does this relate to FL, exactly? I can see how you could make a case that it says something about UG, though, even here, you need to be careful, since these are adult native speakers of a language generalizing in a highly constrained space. It could be that it's just telling us something about a specific G and how, having acquired that G, people generalize in this particular space.
I think what Noah is trying to say is: English speakers have learned that English has hierarchical structure (e.g. it is better described by a context-free grammar than by a Markov chain), and that this hierarchical structure typically corresponds to semantic scope. So they form an "overhypothesis" about languages in general and will be biased to assume that any new language they learn should have the same properties. So while it's neat that C&A's participants applied these pretty abstract overhypotheses to an artificial language, it's unclear whether the results of the experiment tell us anything about innate biases. I guess it would be interesting to replicate the experiment with speakers of one of the languages where word order doesn't correspond to semantic scope relations and see if the generalization that participants extract is similar to the one in their native language or to Greenberg's universal.
Hi Tal. We're designing ongoing work to do exactly that - running similar experiments on speakers of Thai and, we hope, Kikuyu or Kitharaka (PNAS made us take out the footnote referring to it). And you're absolutely right. We're very careful in the paper to make clear that this is about how linguistic knowledge is represented and applied, but the larger project connects this to biases in real acquisition that display themselves in terms of typological frequencies. See also Jenny's work with Smolensky and Legendre on Universal 18.
Thanks for your reply, David. Just to be clear, I was commenting on the summary of the paper in the blog post, and not on the paper itself. As you're saying, the paper is pretty cautious about the potential interpretations of the results. Regardless of the source of the learning biases, though (innate / transfer from native language), the fact that they even exist is very interesting IMO, and potentially rules out certain classes of strictly linear-order-based models. Great to hear you're running the experiment with Thai / Kikuyu speakers -- looking forward to reading about the results.
Sorry for the long delay in responding. I think that the original work bears on FL, for it is based on two kinds of evidence: typological evidence concerning these patterns and experimental evidence concerning what happens in artificial languages. It seems that these two streams of evidence converge. Now, one might argue that the experimental evidence just reflects properties of a specific G and how that G influences generalization to a G-like variant. However, this does not address the typological patterns, or at least I don't see that it does (I might be obtuse here). So, if we take these two together, we want a single factor that explains both the experimental evidence that C&A produce and the typological patterns. I took the common factor to be a bias for generalizing along the "scope structure" dimension rather than the "linear order" dimension. Were there such a bias, it would seem to explain BOTH sets of facts.
At any rate, that was why I took the C&A stuff to bear on FL. Of course, it would be good to do yet more experiments looking at typologically different languages, as David indicates he is planning to do.
Like Norbert, I think this work is really interesting, and I had an opportunity to talk to C&A about this last year, but it raises some important questions. In particular, whether adult AGL experiments target the LAD or not. Many people think that the LAD is only active in the critical period -- roughly before puberty. So does using these experiments commit you to the idea that the LAD is still active in adults?
ReplyDeleteThe second interesting question is what the role of these biases is in explaining language acquisition. The naive view is that since children can learn languages that violate these biases (since there are attested languages where they are violated) then any acquisition that is happening is happening instead of these biases rather then because of them. In other words, if you have a bias towards languages that have property P, but you can learn languages that are not P, then it is hard to see how the bias to P can be part of the explanation of how you acquire languages in general.
I feel the pull of that argument, but I think it may be missing a few steps; I guess this is the question that Norbert is raising at the end of the post.
I'm not sure either how a bias like this one could explain language acquisition. It's not even clear that there's a poverty of the stimulus issue involved in the acquisition of noun phrase modifier order (in a natural language, as opposed to the artificial language from C&A's experiment). Biases can still explain language universals and statistical typological generalizations diachronically though: language learners might extract the wrong generalizations from the data such that after a few generations languages that conform to the bias will be more common than languages that don't (see e.g. this paper by Joe Pater and Elliott Moreton: http://people.umass.edu/pater/pater-moreton-2012-offprint.pdf), though I don't know how plausible this story is for noun phrase modifier order.