As Chomsky has repeatedly emphasized, natural language (NL)
has two distinctive features: it is hierarchically recursive and it contains a
whole bunch of lexical items (LIs) with rather distinctive properties when
compared to what one finds in animal communication systems (e.g. they are not
stimulus bound and they are very conceptually labile). A natural question that
arises is whether these two features are related to one another: does the fact
that NLs are hierarchically recursive (actually, generated by Gs that are
recursive and produce hierarchical linguistic objects, or the corresponding
I-language is such, but let’s forget the niceties for now) have any causal bearing on the fact that NL
lexicons are massive, or vice versa?[1]
The only proposal that I know of that links these two facts together causally
is Lila Gleitman’s hypothesis that vocab acquisition leverages syntactic
knowledge. The syntactic bootstrapping thesis is that Gs facilitate (and hence
accelerate) vocab acquisition in LADs. By “vocab” I mean “open” class contentful
items like ‘shoe’ and ‘rutabaga’ rather than “closed” class items like ‘in’ or
‘the’ or the plural ‘-s’ morpheme. NLs have a surprisingly large number of open
class “words” when compared to other animal systems (at least 3 orders of
magnitude larger) and it is reasonable to ask why this is so.[2]
Gleitman’s syntactic bootstrapping thesis provides a
possible answer: without syntactic leverage, acquiring words is a very very
arduous process (see here,
here,
and here
for discussion about how first words are acquired), and as only humans have
syntax, only they will have large vocabs. Oh, btw, by acquisition I intend
something pretty trivial: tagging a concept with a label (generally
phonological, but not exclusively so, think ASL). I don’t think that this is
all there is to vocab acquisition,[3]
but it turns out that even this is surprisingly difficult to accomplish (contrary
to intimations in the philo of language literature; see Quine on
the ‘museum myth’) and it requires lots of machinery to pull off (e.g. a way of
generating labels, namely, something like a phonology and a way of identifying
the things that need tagging).
I mention all of this because I just recently heard a lecture
by Anne Christophe (see slides here)
that goes into some detail about the possible mechanics behind this leveraging
process, and that bears on a question I have been wondering about for a long
time: why do NLs have so much phonologically
overt morphology? For any English
speaker, morphology seems like a terrible idea (it’s a mess and a pain to learn,
you should hear my German). However, Christophe argues that morphology and
closed class items serve to facilitate the labeling process that drives open
class vocab acquisition in humans. In other words, her work (and here I mean
the work of her lab, as there are many contributors, as the slides
make clear) sketches the following picture: closed class items (and I am including
morphological stuff here) provide excellent environments for the identification
and tagging of open class expressions. And as closed class items are in fact very
closely tied to (correlated with) syntactic structure (think Jabberwocky!!),
morphology, broadly construed, is what enables kids to leverage syntax to build
vast lexicons. If correct, this forges a close causal connection between the
two distinctive properties NLs display: large vocabs and syntactic structure. Let’s
call this the Gleitman-Christophe thesis (GCT).
What’s the evidence? Effectively, morphology provides relatively
stable landmarks between which content words sit, thus allowing for easy
identification and tagging. In other words, morphology allows the LAD to zero
in on content words by locating them in relatively fixed morpho-syntactic
frames. And very young LADs can use this information, for they are known to be very good at acquiring this fixed
syntactic scaffolding using non-linguistic
distributional (and statistical) methods (see slide 18).[4]
Closed class items are acquired early (they are ubiquitous, frequent and short)
and it has been shown that kids can and do exploit them to “find” content
words. Christophe reports on new work that shows how useful this can be for
categorizing words into initial semantic classes (e.g. distinguishing Ns
(canonically things) from Vs (canonically eventish)). She then goes on to describe how acquiring a
few words can further enhance the process of acquiring yet more vocab. The
first few words act as “seeds” that once acquired serve to further leverage the
acquisition process. In sum, Christophe describes a process in which prosody
(which Christophe discusses in the first part of the talk), morphology and
syntax greatly facilitate word acquisition, which in turn enables yet more word
acquisition. We thus get a virtuous circle anchored in morpho-syntax, which is
in turn anchored in epistemologically prior pre-linguistic capacities to find
fixed phono-morpho-syntactic landmarks that allow LADs to quickly fix on new
words.[5]
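To get a feel for the mechanics, here is a minimal toy sketch of the “frequent frames” idea that this line of work builds on (an illustration only, not Christophe’s actual model; the little corpus and the frequency threshold below are made up): high-frequency items, which are overwhelmingly closed class, define slots, and the words filling the same slot tend to pattern together categorially.

```python
# Toy sketch of "frequent frames": pairs of high-frequency (closed class-ish)
# items define slots, and the fillers of a slot tend to share a category.
# The corpus and the threshold are invented purely for illustration.
from collections import Counter, defaultdict

corpus = [
    "you want the ball now",
    "you want the doggie now",
    "you see the kitty now",
    "can you kiss the doggie",
    "can you hug the ball",
]
tokens = [s.split() for s in corpus]

# Step 1: approximate the closed class by raw frequency.
freq = Counter(w for sent in tokens for w in sent)
frequent = {w for w, c in freq.items() if c >= 3}

# Step 2: collect the words that occur in an "A _ B" frame
# where both A and B are frequent items.
frames = defaultdict(Counter)
for sent in tokens:
    for a, x, b in zip(sent, sent[1:], sent[2:]):
        if a in frequent and b in frequent:
            frames[(a, b)][x] += 1

# Words sharing a frame are tentatively grouped into one category.
for frame, fillers in frames.items():
    print(frame, dict(fillers))
```

On this toy corpus the ‘you __ the’ frame collects verbs (want, see, kiss, hug) and the ‘the __ now’ frame collects nouns (ball, doggie, kitty), which is the kind of seed categorization (N-ish vs. V-ish) described above.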
This all provides good evidence in favor of the GCT. It also allows one to
begin to formulate one answer to my earlier question: Why Morphology?
So what’s morphology for? (This is a dumb question really, for
it could be for lots of things. However, thinking functionally is natural for
‘why’ questions. See below for a just-so story.) Well, among other things, it is
there to support large vocabs, and this, I would suggest, is a very big
deal. I once suggested the following
thought experiment: I/you am/are going to Hungary (I/you speak NO Hungarian) and am/are made the
following offer: I/you can have a vocab of 50k words and no syntax whatsoever
or a perfect syntax and a vocab of 10 words. Which would I/you prefer? I/(you?)
would take door number 1. One can get a long way on 50k words. Nor is this only
for purposes of “communication” (though were there an indirect tie between
communicative enhancement and morpho-syntax I would be ok with that). Phenomenologically
speaking, tagging a concept has an effect not unlike making an implicit
assumption explicit. And explicitness is a very good way to enhance thought
(indeed, it feels like it allows one to entertain novel thoughts). Having a
word for something allows it to be conceptually accessible and salient in a way
that having the concept inchoately does not. In fact, I am tempted to say that
having a word for something can change
the way you think (i.e. it can affect cognitive competence, not just enhance performance).[6]
So, tagging concepts really helps and tagging a lot of concepts really really
helps. Thus, if you are thinking of putting in an Amazon order for a syntax, I
would suggest asking for one that also supports large scale vocab acquisition
(tagging concepts) and the GCT argues that such a syntax would come with phonologically
overt morphology and phonologically overt closed class items that our
pre-linguistic proclivities can exploit to build large lexicons quickly. In
other words, if you want an NL that is good for thinking (and maybe also for
communication) get one that has the kinds of relations between morphology and
syntax that we actually find.
Note that this view of morphology leaves (completely?) open
the question of how morphology functions inside UG. It is consistent with this
view that G operations are driven by feature checking requirements, some of
which become realized overtly in the phonology (this is characteristic of early
Minimalist proposals). It is also consistent with the view that they are not
(e.g. that they are mere by-products of grammatical operations rather than
drivers thereof; this is what we find in EPP-based conceptions of
grammatical operations in later minimalism). It is consistent with the idea
that morphology exists to fit syntax to phonology (readjustment rules), or that
it’s not (i.e. it’s a functionally useless ornamentation). All the GCT requires is that there be reliable
correlations between overt markers and the syntax so as to allow the LAD to
leverage the syntax in order to acquire a content rich lexicon.
If this is right, then it might serve as the start of an
account of why there is so much overt morphology and/or closed class items
(very frequent items that function to limn syntactic structure) in NLs. In fact, it suggests that there should be no
NL that eschews both, though what the mix needs to be can be quite open. So
Chinese and English don’t have much morphology, but they have classifiers
(Chinese) or Determiners and Verbal morphology (English) and this can serve to
do the lexicon building job (Anne C tells me that all the stuff done on French
discussed in the slides replicates in Mandarin).
As an aside, IMO, one of the puzzles of morphology is why
some languages seem to have so much (Georgian) and some seem to have so little
(English, Chinese). If morphology is that
important in the grammar either functionally (e.g. for processing) or
grammatically (e.g. it drives operations), then why should some NLs express so
much of it overtly while others have almost none at all that is visible? The GCT offers
a possible way out: morphology per se
is not what we should be looking at. Rather it is morphology plus closed class items: items that give
you fixed frames for triangulating on content “words.” There needs to be a sufficient
amount of these to undergird vocab acquisition, but the mix need not be crucial (in
fact, it is not even clear how much of each is sufficient, whether there may be a
cost to having too much, or even whether these queries make any sense).
Let me end here. NLs are stuffed with what appears to be “useless”
(and as an English speaker, cumbersome)
morphology. And useless it may be from a purely grammatical point of view (note
I say may, leaving the question
open). But GCT suggests that overt grammar dependent fixed points can be very
useful for building lexicons given our pre-linguistic capacities. And given the
virtues of a good sized lexicon for thinking and communicating, a syntax that
can support this should have advantages over one that doesn’t (that’s the just
so story, btw). If correct, and the data collected so far is non-trivial, this
is nice to know for it serves to possibly bridge two big facts about NLs, facts
that to date seem (or more accurately, seemed to me) to be entirely independent
of one another.
[1]
Nothing I say touches on the fact that NL lexical items have very distinctive
properties, at least when compared to symbols in animal systems. In other
words, the lexicon presents two puzzles: (i) why is it so big? (ii) why do
human lexical items function so differently from non-human ones? What follows
tries to say something about the first, but leaves the second untouched.
Chomsky has discussed some of these distinctive features and Paul Pietroski has
a forthcoming book that discusses the topic of lexicalization.
[2]
So far as I know, no animal communication system has any closed class
expressions, which, if they have no syntax, would not be a surprise given
Gleitman’s conjecture.
[3]
Paul Pietroski has a terrific forthcoming book on what lexicalization consists
in “semantically” and I intend my views on the matter to closely track his.
However, for present purposes, we can ignore the semantic side of the word
acquisition process and concentrate solely on the process of phonetically tagging
LIs.
[4]
That kids are really good at identifying morphology has always surprised me.
This is far less the case in second language acquisition if my experience is
anything to go on. At any rate, it seems that kids rarely make “errors” of
commission morphologically. If they screw up, which happens surprisingly
infrequently, it manifests as errors of omission. Karin Stromswold has work
from a while ago documenting this.
[5]
This might also provide a model for how to think of the thematic
hierarchy. Is this part of UG or not?
Well, one reason for thinking not is that it is very hard to define theta roles
so that they apply across a broad class of verbs. Dowty showed how hard it is
to define ‘agent’ and ‘patient’ etc. and Grimshaw noted that it is largely
irrelevant for syntactic concerns anyhow. When does it matter? Really, theta
roles matter for UTAH. We need to know where a DP starts its derivational
life. If this is its sole role, then
what one needs theta roles for is not semantic interpretation in general but for
priming the syntax. In this case, all
one needs are a few thematically well-behaved verbs (like ‘eat,’ ‘hug,’ ‘hit,’
‘kiss’) to get the syntax off the ground. Once flying, we don’t need
thematic information any more, for there arise other ways, some of them
morpho-syntactic, to figure out where a DP came from (think case, or fixed word
order position, or agreement patterns). At any rate, like the morphology case,
the thematic hierarchy need not be part of UG to be linguistically very
important to the LAD.
[6]
Think of this as a very weak (almost truistic) version of the Sapir-Whorf
hypothesis.
Form is easy, meaning is hard: this should be enshrined somewhere.
The evidence that closed class words facilitate the acquisition of open class words comes from several places. One type is summarized by Christophe--an idea that Virginia Valian refers to as "anchoring points" (J. Mem. Lg, 1988): high frequency items (e.g., closed class words) can be used as anchors to determine the properties of low frequency items (e.g., open class words). Roger Brown’s classic “sib” experiment started this all.
Another strand comes, perhaps surprisingly, from NLP. In a series of insightful papers, Qiuye Zhao, who just finished at Penn, exploited these very ideas to achieve state of the art results for part of speech tagging and Chinese word segmentation in a largely unsupervised setting. The key idea is to use high frequency elements, including words AND morphological endings--Norbert: that’s indeed what morphology is for!—to define the distributional profile for the lower frequency elements. (Email me for a copy of her dissertation or find her papers at venues such as EMNLP, ACL etc.)
But it remains unclear how much this very first step of bootstrapping is statistical (or non-linguistic). Page 19 of the slides cites an important study by Shi and Melancon (http://www.tpsycho.uqam.ca/NUN/D_pages_Profs/D_GRL/Publications/shi_melancon_2010.pdf). When presented with "le mige", where "mige" is a nonsense French word, 14-month-old infants interpret "mige" as a noun. But when presented with "je mige", they do NOT interpret "mige" as a verb. I have checked the statistical properties of child-directed French and, more extensively, English, which is similar in this respect. The transitional probability of D-N is actually LOWER than that of Pronoun-V: a statistical learner should do better at recognizing "mige" the verb than "mige" the noun.
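For concreteness, the comparison at issue is just forward transitional probability, count(A B)/count(A). A minimal sketch with invented placeholder counts (the real numbers would have to come from child-directed speech corpora):

```python
# Rough sketch of the transitional-probability comparison described above.
# P(B | A) = count(A B) / count(A). The counts below are invented
# placeholders, not real child-directed-speech statistics.
def transitional_probability(count_ab, count_a):
    return count_ab / count_a

# Hypothetical counts: determiner "le" followed by a noun,
# versus pronoun "je" followed by a verb.
count_le, count_le_noun = 5000, 2000
count_je, count_je_verb = 3000, 1800

tp_det_noun = transitional_probability(count_le_noun, count_le)  # 0.40
tp_pro_verb = transitional_probability(count_je_verb, count_je)  # 0.60

# On counts like these the pronoun-verb transition is the stronger cue,
# yet infants generalize from "le mige" (noun) and not from "je mige" (verb).
print(tp_det_noun, tp_pro_verb)
```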
We are not quite sure why the French infants behaved the way they did, but a plausible idea is very much a linguistic one: D-N is part of a phrase but Pronoun-V straddles phrase boundaries. It is a general principle of language that statistical correlation over structurally ill-formed units, no matter how strong, is simply not attended to. BTW, this is exactly what Carl de Marcken showed in his 1995 ACL paper: statistical correlations frequently mislead, unless they are constrained by structural principles.
Thx for the info. One of the charms of writing about something you know nothing about is that it is pure speculation. It's nice to know that this can actually be backed up with real work. Thx for the Valian reference. I should have known about this one.
I could not agree more heartily with your last remark. It is fashionable (I do this too) to set up a dichotomy between structure and induction. But one way of looking at what GG has done is to try and identify those principles and domains that will allow induction to fruitfully proceed.
Regarding this portion of Norbert's post:
[...] morphology per se is not what we should be looking at. Rather it is morphology plus closed class items.
It might be worth noting that within a "syntax all-the-way-down" model (as in, say, Distributed Morphology), the disjunction embodied by the 'plus' in this quote would be an illusion. That's because, in such a model, there's little difference between so-called closed class "words" and closed class (derivational) morphemes.
Of course, there are nuances here -- in particular, inflectional morphology is not as neatly unified with free-standing closed class "words" as derivational morphology is.
But to me, it's interesting to consider that the child really isn't summing over two classes (closed class "words" and closed class morphemes) at all. It only looks that way to adult linguists (and associated laypersons) who have been indoctrinated into a false belief in the existence of "words" :-)
NB: I don't think anybody denies that there is such a thing as a phonological word, of course, not even dyed-in-the-wool DMers. But notice that, crucially, closed class items seldom constitute phonological words in their own right. So the existence of phonological words is only useful for ascertaining what the closed class items are insofar as, say, closed class items tend to occur exclusively at the edges of phonological words. (This is something that Anne mentioned in her talk, though I don't remember off hand if it was in the slides.)
A minor quibble. You write, "generally phonological, but not exclusively so, think ASL". Sign languages have a level of structure roughly comparable to phonological structure in spoken languages, and the consensus in the sign language linguistics community is to call this level phonology. One could argue whether this is a neutral choice of terminology, given that the word has the Greek root for voice in it. But that battle is over, and statements like this risk giving the impression that ASL (and other sign languages) lack phonological structure. If instead you meant the possibility of iconic signs, most (all?) iconic signs have phonological structure, and iconic vocabulary exists for spoken languages as well.
You are right. I meant no invidious distinction between ASL and, say, English. I should have said "generally expressed in sound…." Thx.
This stuff is fascinating, I agree. I think the general research strategy -- explaining some quite obvious gross properties of language through the way they facilitate language acquisition -- is great, and may be quite widely applicable as it can explain them without having to be part of UG, and so doesn't aggravate Darwin's problem.
Furthermore, as Charles points out, it is amenable to computational investigation.
Morphology may well facilitate vocabulary learning, but I'm not sure that tells us how we got so much morphology in the first place. The process of acquisition might explain it: kids seem to be pretty good at identifying the component morphemes of a complex form, even when those morphemes are presented simultaneously (as in ASL verbs of motion - Elissa Newport and Ted Supalla have work from the 80s showing that kids will actually pull out morphemes individually and then [incorrectly] assemble them sequentially for a while before mastering the grammatical rules for simultaneous combination). They can identify morphemes and end up with a regular morphological system even when their input is messy and irregular (Lissa again, in studies of deaf children learning from non-native signers; Simon is the famous one). Hints of the same effects come through in the work on Nicaraguan Sign Language by Annie Senghas et al. where it looks like kids may be inventing morphemes where none existed in their (non-linguistic, gestural) input.
So maybe kids really like segmented/componentially organized systems, or they really 'want' language to be organized that way for whatever reason (UG if you like), and as a result we get lots of morphology.
So, if I get you correctly, your story is that morphology is a by-product of how kids segment things. I am not sure that this is inconsistent with the story I was trying to push. Why morphology? To license vocab acquisition (rather than, say, parsing, or production). For it to do this, of course, kids better be ok at tracking it (and it appears that they are, as Newport and Supalla a.o. have argued). But the fact that kids are good at it does not mean that languages should display lots of it, or does it?
It does, if one agrees that kids being good at morphology can explain the emergence of morphology in the cases I mentioned (or potential emergence, for NSL). The idea is that kids are SO good at morphology that in some cases they create it where it didn't exist before, leading to languages with more morphology.
I don't think this idea is inconsistent with your story either - this seems to me like a question of function/utility (what helpful things does morphology do once it's there) vs. source (why is it there/how did it get there in the first place).
Yes, they do create it. But I guess I want to know WHY they do. What's the morphology doing for them. Here's one reason, because kids come with UGs that favor morphology. Thus, because they are good at it and UG favors it when they have the chance to construct Gs with overt morphology they do so. Ok, next question: why does UG care about Gs with morphology? Now one answer is that UG doesn't care about morphology at all. Morphology is just what you get when humans form constituents from strings. If that's so, we should find morphology in non-linguistic systems. Does one? I don't know. But one possibility is that UG "likes" Gs with overt morphology because it promotes another important feature of language; large vocal growth. How does it do this? By allowing the natural talents that kids have (find and create morphology) to ornament Gs with these properties so that large vocals will be easily acquirable. Does this make sense? There are two questions: What powers must kids have to allow for a lot of morphology? And Why does UG exploit these properties to favor Gs that have this property?
A note from Anne Christophe (whose name I misspelled in the post but have now corrected) regarding the Chinese stuff that I mentioned. She corrects matters. So for the record from her note to me:
ReplyDelete"Anne C tells me that all the stuff done on French discussed in the slides replicates in Mandarin" it looks as if all the research presented replicates in Mandarin (including the baby work etc), which is of course not the case at all; only the 'Frequent frames' thing has been replicated in Mandarin and it works quite well thanks to the noun classifiers (and this is not published unfortunately…)