Tuesday, November 18, 2014

Why Morphology? (part deux)

As Chomsky has repeatedly emphasized, natural language (NL) has two distinctive features: it is hierarchically recursive and it contains a whole bunch of lexical items (LI) with rather distinctive properties (when compared to what one finds in animal communication systems; e.g. they are not stimulus bound and they are very conceptually labile). A natural question that arises is whether these two features are related to one another: does the fact that NLs are hierarchically recursive (actually, generated by Gs that are recursive and produce hierarchical linguistic objects, or the corresponding I-language is such, but let’s forget the niceties for now) have any causal bearing on the fact that NL lexicons are massive, or vice versa?[1] The only proposal that I know of that links these two facts together causally is Lila Gleitman’s hypothesis that vocab acquisition leverages syntactic knowledge. The syntactic bootstrapping thesis is that Gs facilitate (and hence accelerate) vocab acquisition in LADs. By “vocab” I mean “open” class contentful items like ‘shoe’ and ‘rutabaga’ rather than “closed” class items like ‘in’ or ‘the’ or the plural ‘-s’ morpheme. NLs have a surprisingly large number of open class “words” when compared to other animal systems (at least 3 orders of magnitude larger) and it is reasonable to ask why this is so.[2]

Gleitman’s syntactic bootstrapping thesis provides a possible answer: without syntactic leverage, acquiring words is a very, very arduous process (see here, here, and here for discussion of how first words are acquired), and as only humans have syntax, only they will have large vocabs. Oh, btw, by acquisition I intend something pretty trivial: tagging a concept with a label (generally phonological, but not exclusively so, think ASL). I don’t think that this is all there is to vocab acquisition,[3] but it turns out that even this is surprisingly difficult to accomplish (contrary to intimations in the philo of language literature; see Quine on the ‘museum myth’), and it requires lots of machinery to pull off (e.g. a way of generating labels, namely something like a phonology, and a way of identifying the things that need tagging).

I mention all of this because I recently heard a lecture by Anne Christophe (see slides here) that goes into some detail about the possible mechanics behind this leveraging process, and that bears on a question I have been wondering about for a long time: why do NLs have so much phonologically overt morphology? For any English speaker, morphology seems like a terrible idea (it’s a mess and a pain to learn; you should hear my German). However, Christophe argues that morphology and closed class items serve to facilitate the labeling process that drives open class vocab acquisition in humans. In other words, her work (and here I mean the work of her lab, as there are many contributors, as the slides make clear) sketches the following picture: closed class items (and I am including morphological stuff here) provide excellent environments for the identification and tagging of open class expressions. And as closed class items are in fact very closely tied to (correlated with) syntactic structure (think Jabberwocky!!), morphology, broadly construed, is what enables kids to leverage syntax to build vast lexicons. If correct, this forges a close causal connection between the two distinctive properties NLs display: large vocabs and syntactic structure. Let’s call this the Gleitman-Christophe thesis (GCT).

What’s the evidence? Effectively, morphology provides relatively stable landmarks between which content words sit, thus allowing for easy identification and tagging. In other words, morphology allows the LAD to zero in on content words by locating them in relatively fixed morpho-syntactic frames. And very young LADs can use this information, for they are known to be very good at acquiring this fixed syntactic scaffolding using non-linguistic distributional (and statistical) methods (see slide 18).[4] Closed class items are acquired early (they are ubiquitous, frequent and short) and it has been shown that kids can and do exploit them to “find” content words. Christophe reports on new work that shows how useful this can be for categorizing words into initial semantic classes (e.g. distinguishing Ns (canonically things) from Vs (canonically eventish)). She then goes on to describe how acquiring a few words can further enhance the process of acquiring yet more vocab. The first few words act as “seeds” that, once acquired, serve to further leverage the acquisition process. In sum, Christophe describes a process in which prosody (which she discusses in the first part of the talk), morphology and syntax greatly facilitate word acquisition, which in turn enables yet more word acquisition. We thus get a virtuous circle anchored in morpho-syntax, which is in turn anchored in epistemologically prior pre-linguistic capacities to find fixed phono-morpho-syntactic landmarks that allow LADs to quickly fix on new words.[5] This all provides good evidence in favor of the GCT. It also allows one to begin to formulate one answer to my earlier question: Why Morphology?
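To make the “fixed morpho-syntactic frames” idea concrete, here is a minimal sketch of the kind of distributional procedure being described (in the spirit of Mintz-style “frequent frames”). The toy corpus, function name, and threshold are all invented for illustration; real models of this sort run over transcribed child-directed speech.

```python
from collections import defaultdict

def frame_categories(sentences, min_words=2):
    """Group words by the (left-neighbor, right-neighbor) frames they
    occur in. Frequent frames tend to be anchored by closed class items
    ('the __ is'), so words sharing a frame tend to share a category."""
    frame_to_words = defaultdict(set)
    for sent in sentences:
        words = sent.split()
        for i in range(1, len(words) - 1):
            frame_to_words[(words[i - 1], words[i + 1])].add(words[i])
    # keep only frames attested with several distinct middle words
    return {f: ws for f, ws in frame_to_words.items() if len(ws) >= min_words}

# Toy child-directed corpus: the closed class items ('the', 'is',
# 'you', 'it') are frequent and frame the open class words.
corpus = [
    "the dog is here",
    "the ball is here",
    "the cup is there",
    "you push it",
    "you kick it",
]

cats = frame_categories(corpus)
print(sorted(cats[("the", "is")]))  # ['ball', 'cup', 'dog'] -- nounish
print(sorted(cats[("you", "it")]))  # ['kick', 'push'] -- verbish
```

Note that the learner here never consults meaning at all: the nounish and verbish clusters fall out of the closed class anchors alone, which is the Jabberwocky point in miniature.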

So what’s morphology for? (This is a dumb question really, for it could be for lots of things. However, thinking functionally is natural for ‘why’ questions. See below for a just-so story.) Well, among other things, it is there to support large vocabs, and this, I would suggest, is a very big deal. I once suggested the following thought experiment: I/you am/are going to Hungary (I/you speak NO Hungarian) and am/are made the following offer: I/you can have a vocab of 50k words and no syntax whatsoever, or a perfect syntax and a vocab of 10 words. Which would I/you prefer? I/(you?) would take door number 1. One can get a long way on 50k words. Nor is this only for purposes of “communication” (though were there an indirect tie between communicative enhancement and morpho-syntax I would be ok with that). Phenomenologically speaking, tagging a concept has an effect not unlike making an implicit assumption explicit. And explicitness is a very good way to enhance thought (indeed, it feels like it allows one to entertain novel thoughts). Having a word for something allows it to be conceptually accessible and salient in a way that having the concept inchoately does not. In fact, I am tempted to say that having a word for something can change the way you think (i.e. it can affect cognitive competence, not just enhance performance).[6] So, tagging concepts really helps, and tagging a lot of concepts really, really helps. Thus, if you are thinking of putting in an Amazon order for a syntax, I would suggest asking for one that also supports large-scale vocab acquisition (tagging concepts), and the GCT argues that such a syntax would come with phonologically overt morphology and phonologically overt closed class items that our pre-linguistic proclivities can exploit to build large lexicons quickly. In other words, if you want an NL that is good for thinking (and maybe also for communication), get one that has the kinds of relations between morphology and syntax that we actually find.

Note that this view of morphology leaves (completely?) open the question of how morphology functions inside UG. It is consistent with this view that G operations are driven by feature checking requirements, some of which become realized overtly in the phonology (this is characteristic of early minimalist proposals). It is also consistent with the view that they are not (e.g. that they are mere by-products of grammatical operations rather than drivers thereof; this is what we find in EPP-based conceptions of grammatical operations in later minimalism). It is consistent with the idea that morphology exists to fit syntax to phonology (readjustment rules), or that it does not (i.e. that it’s a functionally useless ornamentation). All the GCT requires is that there be reliable correlations between overt markers and the syntax so as to allow the LAD to leverage the syntax in order to acquire a content-rich lexicon.

If this is right, then it might serve as the start of an account of why there is so much overt morphology and/or so many closed class items (very frequent items that function to limn syntactic structure) in NLs. In fact, it suggests that there should be no NL that eschews both, though what the mix needs to be can be quite open. So Chinese and English don’t have much morphology, but they have classifiers (Chinese) or determiners and verbal morphology (English), and these can serve to do the lexicon-building job (Anne C tells me that all the stuff done on French discussed in the slides replicates in Mandarin).

As an aside, IMO, one of the puzzles of morphology is why some languages seem to have so much (Georgian) and some seem to have so little (English, Chinese). If morphology is that important in the grammar, either functionally (e.g. for processing) or grammatically (e.g. it drives operations), then why should some NLs express so much of it overtly while some have almost none at all that is visible? The GCT offers a possible way out: morphology per se is not what we should be looking at. Rather it is morphology plus closed class items; items that give you fixed frames for triangulating on content “words.” There needs to be a sufficiency of these to undergird vocab acquisition, but the mix need not be crucial (in fact, it is not even clear how much of both is sufficient, or if there may be a cost in having too much, or even if these queries make any sense).

Let me end here. NLs are stuffed with what appears to be “useless” (and as an English speaker, cumbersome) morphology. And useless it may be from a purely grammatical point of view (note I say may, leaving the question open). But GCT suggests that overt grammar dependent fixed points can be very useful for building lexicons given our pre-linguistic capacities. And given the virtues of a good sized lexicon for thinking and communicating, a syntax that can support this should have advantages over one that doesn’t (that’s the just so story, btw). If correct, and the data collected so far is non-trivial, this is nice to know for it serves to possibly bridge two big facts about NLs, facts that to date seem (or more accurately, seemed to me) to be entirely independent of one another.

[1] Nothing I say touches on the fact that NL lexical items have very distinctive properties, at least when compared to symbols in animal systems. In other words, the lexicon presents two puzzles: (i) why is it so big? (ii) why do human lexical items function so differently from non-human ones? What follows tries to say something about the first, but leaves the second untouched. Chomsky has discussed some of these distinctive features and Paul Pietroski has a forthcoming book that discusses the topic of lexicalization.
[2] So far as I know, no animal communication system has any closed class expressions, which, if they have no syntax, would not be a surprise given Gleitman’s conjecture.
[3] Paul Pietroski has a terrific forthcoming book on what lexicalization consists in “semantically” and I intend my views on the matter to closely track his. However, for present purposes, we can ignore the semantic side of the word acquisition process and concentrate solely on the process of phonetically tagging LIs.
[4] That kids are really good at identifying morphology has always surprised me. This is far less the case in second language acquisition if my experience is anything to go on. At any rate, it seems that kids rarely make “errors” of commission morphologically. If they screw up, which is surprisingly infrequently, it manifests as errors of omission. Karin Stromswold has work from a while ago documenting this.
[5] This might also provide a model for how to think of the thematic hierarchy. Is this part of UG or not? Well, one reason for thinking not is that it is very hard to define theta roles so that they apply across a broad class of verbs. Dowty showed how hard it is to define ‘agent’ and ‘patient’ etc., and Grimshaw noted that it is largely irrelevant for syntactic concerns anyhow. When does it matter? Really, theta roles matter for UTAH. We need to know where a DP starts its derivational life. If this is its sole role, then what one needs theta roles for is not semantic interpretation in general but for priming the syntax. In this case, all one needs are a few thematically well-behaved verbs (like ‘eat,’ ‘hug,’ ‘hit,’ ‘kiss’) to get the syntax off the ground. Once flying, we don’t need thematic information any more, for there arise other ways, some of them morpho-syntactic, to figure out where a DP came from (think case, or fixed word order position, or agreement patterns). At any rate, like the morphology case, the thematic hierarchy need not be part of UG to be linguistically very important to the LAD.
[6] Think of this as a very weak (almost truistic) version of the Sapir-Whorf hypothesis.


  1. Form is easy, meaning is hard: this should be enshrined somewhere.

    The evidence that closed class words facilitate the acquisition of open class words comes from several places. One type is summarized by Christophe: following an idea that Virginia Valian refers to as "anchoring points" (J. Mem. Lang., 1988), high frequency items (e.g., closed class words) can be used as anchors to determine the properties of low frequency items (e.g., open class words). Roger Brown’s classic “sib” experiment started this all.

    Another strand comes, perhaps surprisingly, from NLP. In a series of insightful papers, Qiuye Zhao, who just finished at Penn, exploited these very ideas to achieve state-of-the-art results for part-of-speech tagging and Chinese word segmentation in a largely unsupervised setting. The key idea is to use high frequency elements, including words AND morphological endings--Norbert: that’s indeed what morphology is for!--to define the distributional profile for the lower frequency elements. (Email me for a copy of her dissertation or find her papers at venues such as EMNLP, ACL etc.)

    But it remains unclear how much this very first step of bootstrapping is statistical (or non-linguistic). Page 19 of the slides cites an important study by Shi and Melancon. When presented with "le mige" (the latter a nonsense French word), 14-month-old infants interpret "mige" as a noun. But when presented with "je mige", they do NOT interpret "mige" as a verb. I have checked the statistical properties of child-directed French and, more extensively, English, which is similar in this respect. The transitional probability of D-N is actually LOWER than that of Pronoun-V: a statistical learner should do better at recognizing “mige” the verb than “mige” the noun.
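    The transitional-probability comparison can be sketched numerically. The tag sequence below is invented (the real comparison is over child-directed corpora); it is rigged only to show how raw bigram transitional probabilities can rank Pronoun-V above D-N, which is the wrong prediction for the infants' behavior.

```python
from collections import Counter

def transitional_probs(tags):
    """Estimate P(tag2 | tag1) from a sequence of category tags."""
    bigram = Counter(zip(tags, tags[1:]))
    unigram = Counter(tags[:-1])
    return {pair: c / unigram[pair[0]] for pair, c in bigram.items()}

# Invented tag sequence standing in for child-directed speech, built so
# that determiners precede many different things ('le chat', 'le petit
# ...') while pronouns almost always precede a verb ('je mange').
tags = (["D", "N"] * 6 + ["D", "A"] * 4
        + ["Pro", "V"] * 9 + ["Pro", "Adv"])

tp = transitional_probs(tags)
print(round(tp[("D", "N")], 2))    # 0.6 -- D is the weaker cue
print(round(tp[("Pro", "V")], 2))  # 0.9 -- yet infants ignore this one
```

    On these (made-up) counts a pure transitional-probability learner would prefer the Pronoun-V frame, the opposite of what Shi and Melancon's infants do.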

    We are not quite sure why the French infants behaved the way they did, but a plausible idea is very much a linguistic one: D-N is part of a phrase but Pronoun-V straddles phrase boundaries. It is a general principle of language that statistical correlation over structurally ill-formed units, no matter how strong, is simply not attended to. BTW, this is exactly what Carl de Marcken showed in his 1995 ACL paper: statistical correlations frequently mislead unless they are constrained by structural principles.

    1. Thx for the info. One of the charms of writing about something you know nothing about is that it is pure speculation. It's nice to know that this can actually be backed up with real work. Thx for the Valian reference. I should have known about this one.

      I could not agree more heartily with your last remark. It is fashionable (I do this too) to set up a dichotomy between structure and induction. But one way of looking at what GG has done is to try and identify those principles and domains that will allow induction to fruitfully proceed.

  2. Regarding this portion of Norbert's post:

    [...] morphology per se is not what we should be looking at. Rather it is morphology plus closed class items.

    It might be worth noting that within a "syntax all-the-way-down" model (as in, say, Distributed Morphology), the disjunction embodied by the 'plus' in this quote would be an illusion. That's because, in such a model, there's little difference between so-called closed class "words" and closed class (derivational) morphemes.

    Of course, there are nuances here -- in particular, inflectional morphology is not as neatly unified with free-standing closed class "words" as derivational morphology is.

    But to me, it's interesting to consider that the child really isn't summing over two classes (closed class "words" and closed class morphemes) at all. It only looks that way to adult linguists (and associated laypersons) who have been indoctrinated into a false belief in the existence of "words" :-)

    NB: I don't think anybody denies that there is such a thing as a phonological word, of course, not even dyed-in-the-wool DMers. But notice that, crucially, closed class items seldom constitute phonological words in their own right. So the existence of phonological words is only useful for ascertaining what the closed class items are insofar as, say, closed class items tend to occur exclusively at the edges of phonological words. (This is something that Anne mentioned in her talk, though I don't remember off hand if it was in the slides.)

  3. A minor quibble. You write, "generally phonological, but not exclusively so, think ASL". Sign languages have a level of structure roughly comparable to phonological structure in spoken languages, and the consensus in the sign language linguistics community is to call this level phonology. One could argue whether this is a neutral choice of terminology, given that the word has the Greek root for voice in it. But that battle is over, and statements like this risk giving the impression that ASL (and other sign languages) lack phonological structure. If instead you meant the possibility of iconic signs, most (all?) iconic signs have phonological structure, and iconic vocabulary exists for spoken languages as well.

    1. You are right. I meant no invidious distinction between ASL and, say, English. I should have said "generally expressed in sound…." Thx.

  4. This stuff is fascinating, I agree. I think the general research strategy -- explaining some quite obvious gross properties of language through the way they facilitate language acquisition -- is great, and may be quite widely applicable as it can explain them without having to be part of UG, and so doesn't aggravate Darwin's problem.
    Furthermore, as Charles points out, it is amenable to computational investigation.

  5. Morphology may well facilitate vocabulary learning, but I'm not sure that tells us how we got so much morphology in the first place. The process of acquisition might explain it: kids seem to be pretty good at identifying the component morphemes of a complex form, even when those morphemes are presented simultaneously (as in ASL verbs of motion - Elissa Newport and Ted Supalla have work from the 80's showing that kids will actually pull out morphemes individually and then [incorrectly] assemble them sequentially for a while before mastering the grammatical rules for simultaneous combination). They can identify morphemes and end up with a regular morphological system even when their input is messy and irregular (Lissa again, in studies of deaf children learning from non-native signers- Simon's the famous one). Hints of the same effects come through in the work on Nicaraguan Sign Language by Annie Senghas et al where it looks like kids may be inventing morphemes where none existed in their (non-linguistic, gestural) input.
    So maybe kids really like segmented/componentially organized systems, or they really 'want' language to be organized that way for whatever reason (UG if you like), and as a result we get lots of morphology.

    1. So, if I get you correctly, your story is that morphology is a by-product of how kids segment things. I am not sure that this is inconsistent with the story I was trying to push. Why morphology? To license vocab acquisition (rather than, say, parsing, or production). For it to do this, of course, kids had better be ok at tracking it (and it appears that they are, as Newport and Supalla a.o. have argued). But the fact that kids are good at it does not mean that languages should display lots of it, or does it?

    2. It does, if one agrees that kids being good at morphology can explain the emergence of morphology in the cases I mentioned (or potential emergence, for NSL). The idea is that kids are SO good at morphology that in some cases they create it where it didn't exist before, leading to languages with more morphology.

      I don't think this idea inconsistent with your story either - this seems to me like a question of function/utility (what helpful things does morphology do once it's there) vs. source (why is it there/how did it get there in the first place).

    3. Yes, they do create it. But I guess I want to know WHY they do. What's the morphology doing for them? Here's one reason: because kids come with UGs that favor morphology. Thus, because they are good at it and UG favors it, when they have the chance to construct Gs with overt morphology they do so. Ok, next question: why does UG care about Gs with morphology? Now one answer is that UG doesn't care about morphology at all. Morphology is just what you get when humans form constituents from strings. If that's so, we should find morphology in non-linguistic systems. Does one? I don't know. But one possibility is that UG "likes" Gs with overt morphology because it promotes another important feature of language: large vocab growth. How does it do this? By allowing the natural talents that kids have (to find and create morphology) to ornament Gs with these properties so that large vocabs will be easily acquirable. Does this make sense? There are two questions: What powers must kids have to allow for a lot of morphology? And why does UG exploit these powers to favor Gs that have this property?

  6. A note from Anne Christophe (whose name I misspelled in the post but have now corrected) regarding the Chinese stuff that I mentioned. She corrects matters. So for the record from her note to me:

    "Anne C tells me that all the stuff done on French discussed in the slides replicates in Mandarin": it looks as if all the research presented replicates in Mandarin (including the baby work etc.), which is of course not the case at all; only the 'Frequent frames' thing has been replicated in Mandarin, and it works quite well thanks to the noun classifiers (and this is not published, unfortunately…)