Saturday, November 12, 2016

Linguistic diversity

Here’s an interesting piece by Nick Evans on the indigenous languages of Australia. It is imbued with a sensibility concerning the study of language quite different than my own (which is partly why I found it interesting) but it also raises some questions that someone who approaches linguistic questions from my direction should find intriguing. In what follows I will discuss both points of con- and di-vergence. But before starting, let me reiterate that I found the piece intriguing and I could imagine spending quite a bit of pleasant time over several cold beers talking to Nick about his work, which is a long-winded way of saying that you should take a look at the piece for yourself.[1]

Some comments:

(1) Nick worries about a question whose utility from where I sit is not at all evident: How to distinguish a language from a dialect (see 4). This is in service of trying to establish the integrity of the Australian language family, which is in turn in service of trying to estimate how fast languages change and how old language families are. The idea that Nick moots is that the Australian language family is 60,000 years old and that this raises the possibility that the emergence of the Faculty of Language is much older still. In other words, Nick takes the dating of the language family question as bearing on the emergence of the FL question. Clearly, the second one is of interest to devotees of the Minimalist Program.

However, I am not sure that I would take the question to be nearly as well posed as Nick does. I do not see that there is a principled way of distinguishing languages from dialects. The one that he proposes is the following: “a language is something that is distinct enough to need its own distinctive descriptive grammar” (5). But what does ‘distinct enough’ mean? Darn if I know. For me a G is a mental construct. It is almost certain that no two Gs are the same (i.e. no two people have exactly the same G). So the question is one of more or less. But so far as I know this becomes a question of G overlap, and the degree of overlap will not be precise. Yet we need some measure of overlap to see how different two Gs are, and hence to measure G change. Maybe such measures exist, but I know of none, and unless one specifies some dimensions of similarity (which may exist; recall, I am no expert on these matters) the rate of change issue becomes hard to specify.

This said, if we could establish a rate of G change then this might be useful in establishing how old FL is, and given that the only evidence we have for when it emerged is indirect (the emergence of complex cultural artifacts (i.e. the big bang)) this would be useful. That said, I doubt that it would significantly alter the backdrop for Darwin’s Problem as it applies to language. The big fact is that FL appeared more or less in one piece and it has not evolved since.  There is no indication from what Nick writes that these older Gs are qualitatively different from contemporary ones. This means that the FL required to acquire them is effectively the same as the one that we still possess. And if that is the case then the logic of Darwin’s Problem as it applies to MP remains unchanged. So far as someone with my interests is concerned, that is enough.

Let me add a question before moving on: is there a measure of G change (or, more ambitiously, of the rate of G change) out there? Note that this would be a measure of how Gs of the same language change. This seems to require reifying languages so that two Gs can be Gs of the same language even if different in detail. So far as I know, modern GG has only an inchoate qualitative purchase on the notion of a language, and it has not been important to make it more precise. In fact, it is part of a dispensable idealization concerning ideal speaker-hearers. Nick’s project requires theoretically grounding an informal notion that has sufficed for most GG inquiries. I am skeptical, but wish him luck.
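To make the "dimensions of similarity" worry concrete, here is a toy sketch of what such a measure might look like. It assumes, purely for illustration, that a G can be reduced to a set of binary P&P-style parameter settings; nothing in this post (or in Nick's piece) commits to that, and the parameter names and the 1,000-year gap below are hypothetical. The measure is just a normalized Hamming distance over the parameters two Gs share.

```python
from typing import Dict

# A G idealized as binary parameter settings. This is an assumption made
# for illustration only, not a claim about what FL/UG actually specifies.
Grammar = Dict[str, bool]


def g_distance(g1: Grammar, g2: Grammar) -> float:
    """Fraction of shared parameters on which the two Gs disagree."""
    shared = sorted(set(g1) & set(g2))
    if not shared:
        raise ValueError("no shared parameters to compare")
    mismatches = sum(g1[p] != g2[p] for p in shared)
    return mismatches / len(shared)


def rate_of_change(g_old: Grammar, g_new: Grammar, years: float) -> float:
    """Distance per year between an older and a newer G."""
    if years <= 0:
        raise ValueError("years must be positive")
    return g_distance(g_old, g_new) / years


# Hypothetical parameter names and values.
g_a = {"head_initial": True, "null_subject": False,
       "wh_movement": True, "case_stacking": False}
g_b = {"head_initial": True, "null_subject": True,
       "wh_movement": True, "case_stacking": True}

print(g_distance(g_a, g_b))            # two of four parameters differ
print(rate_of_change(g_a, g_b, 1000))  # distance divided by elapsed years
```

Note what the sketch takes for granted: that we already know which parameters to compare, that they all count equally, and that the two Gs are Gs "of the same language" at different times. Those are exactly the choices that would need principled grounding.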

(2) Nick raises a second question: why are there so many languages anyhow (8ff)? He asks this in order to focus efforts on identifying “the social processes that drive differentiation.” I also find this question interesting, but in a slightly different way. From my perspective, Gs are products of three factors: (i) the structure of FL/UG, (ii) the nature of the PLD (the input data that the LAD uses to construct its G given the options FL/UG allows) and (iii) the learning theory that LADs use to organize the PLD and to construct a particular G given (i) and (ii).[2] The question I find interesting is why FL/UG makes so many Gs available. Why not simply hardwire in one G and be done with it? Why is FL/UG so open textured and environmentally sensitive (i.e. open to the effects of PLD)? Note that FL/UG could have specified one G in the species (say all Gs have more or less the syntax of “English”). This is roughly what happens in some songbirds: all birds of a species sing the same song. Why isn’t this what happened for language? In P&P terms this would mean an FL/UG with no parameters. Why don’t we have this? And does the fact that we don’t have this tell us anything interesting about FL/UG?
There are several possibilities. Mark Baker has offered a kind of evolutionary rationale. He thinks that Gs are codes that enable speakers of the same language to conceal information from outsiders (here:8):
Suppose that the language faculty has a concealing function as well as a revealing function. Our language faculty could have the purpose of communicating complex propositional information to collaborators while concealing it from rivals that might be listening in.
I say evolutionary, for I am assuming that it is because concealment can confer selective advantages that we have such a code. Though it is an ingenious idea, I am skeptical for the obvious reasons. This parameterized coding scheme is now species-wide and anyone can acquire any of the coding schemes (aka Gs) if placed in the right linguistic environment. If the goal was opacity useful for segregating in-groups from out-groups, then one can imagine that schemes making it impossible (or at least very difficult) for outlanders to acquire the code would have been a superior option. But so far as we can tell, all humans are equally adept at learning any G (i.e. set of parameter values). Perhaps what Mark has in mind is that it is hard to learn a non-native G later in life and this suffices for whatever advantages concealment promotes. Maybe.

I have remarked before that parametrization is a very curious fact (if it is a fact) (here), one that suggests that, contrary to standard assumptions, typological differences tell us very little about the structure of FL. However, putting this to one side, it is interesting that Gs can be so different, and Nick’s question of why there is so much variation is a good one.

What’s his answer? There are social processes that drive differentiation and we need to identify these. He suggests two steps (8-9):

The first step is to see how new linguistic elements are born: new sounds, new grammatical structures, new words, new meanings. What makes the range of these more or less diverse in different groups? For example, does being multilingual add options to the pool? ...

The second step is to find how the society promotes one variant over another. It is clear that some groups have linguistic ideologies that place a high premium on harnessing linguistic means to say “Our clan is different”, “our moiety is different” and so on…

This might be right so far as it goes, but it presupposes that FL/UG allows all of these options to begin with. In other words, given that FL allows diverse Gs, what drives the specific diversity we see? Baker and I are interested in another question: why does FL allow the diversity to begin with? What’s wrong with an FL that, as it were, had no parameters at all?

Here’s my thought: an FL with fixed parameters is more biologically expensive than an open textured one. The idea is that if evolution can rely on there always being enough PLD to allow a child to acquire the local G, then there is no reason for evolution to code information in the genome that the PLD makes readily available. If fixing info in the genome is costly then it will not be put there unless it must be. So, an open textured system is what we should expect. That’s the idea.

I think that this fits pretty well with MP thinking as well. Suppose that what allows FL to emerge is a small addition, say an operation like Merge (an addition that remains very stable and unchanging over time). Given that Merge is consistent with various surface differences, so long as the non-linguistically proprietary parts of FL suffice with Merge to generate Gs, we should not expect more linguistically proprietary info to be biologically coded. If Merge is enough, then it’s all that we will get. Note that this suggests that MP-like systems will not likely have an FL/UG specification of a particular parameter space (see here and here for some discussion). If this can be fleshed out, then the reason we have G diversity is that fixed parameters are costly and MP takes FL to be what we get when we add only a smidgen of linguistically proprietary structure to an otherwise language-ready cognitive system. In other words, typological diversity (PLD sensitive G generation) is just what MP ordered.

(3) Nick provides sort of an antidote for my tolerance for inferring UG principles from the properties of a single G. As he puts it (12-3):

We are just coming out of half a century where generative linguistics, as inspired by the great linguist Noam Chomsky, placed great emphasis on ‘Universal Grammar’, very much seeing all languages as alike with only minor variations. Part of this emphasis meant claiming there are all sorts of imaginable design options that are simply not found in language. For example, Steven Pinker and Paul Bloom wrote, in the early 90s, that ‘‘no language uses noun affixes to express tense’’. Now clearly this is simply wrong for Kayardild. It is an example of what can go wrong, scientifically, when one extrapolates prematurely from too limited a range of cases. Now there’s nothing wrong with the scientific strategy of making strong statements to invite falsification. But what Kayardild shows us – and many other languages I could have used to illustrate the structural originality of Australian languages, in different ways – is that we really need to get out there and describe languages, as they are, to realize the full richness and diversity of how humans have colonized the design space of language through the languages they have built through use.

I say “sort of” because Nick’s observations are not couched in terms of Gs but in terms of languages, and the problems he cites have less to do with the properties of Gs than with their surface manifestations. Chomsky did not (and does not) see “all languages alike.” What he saw/sees was/is that all I-languages are pretty much alike. Missing the ‘I’ prefix threatens confusing Chomsky for Greenberg. I can understand that if one’s interests are mainly typological and diversity is what gets you excited, then dropping the ‘I’ will seem like the best way to import Chomsky’s insights into your work. But this is a mistake (as you knew I would say). It is not the diversity of languages that we need to investigate if the goal is GGish, but the diversity of Gs, and these will only be indirectly related to the surface patterns we observe. The Pinker-Bloom example is very much a Greenberg conception of universal, at least as Nick takes it to be refuted by Kayardild (it appears to deal with features of overt affixes). If we are to learn about FL/UG by exploring the rich “design space of language” then we need to keep in mind that it is I-language space we should be exploring. Moreover, when it comes to I-language space I am less sure than Nick is that

[t]he world of languages holds more possibilities than any linguist has imagined, and Australian languages have taken the ‘design space’ in lots of rare and unusual directions, so that we’re still finding new phenomena that people hadn’t imagined before (14).

In fact, from where I sit, we have actually found relatively few new universals since the mid 1980s. If this is correct, oddly, exploring the ‘design space’ has enriched our understanding of language diversity but has left our understanding of I-language variation pretty much where it was when only a small number of languages served as linguistic model organisms.[3]

That’s it. I think that Nick has asked some interesting questions, the most interesting being why FL/UG allows G variation. We are interested in different things, but the paper was fun to read and Kayardild sounds like it can take you on a wild ride. Like I said, I’d love to have a beer with him.

[1] Thx to Kleanthes for sending me the URL.
[2] This follows Anderson, discussed a bit here.
[3] See here for a partial list. The observant reader will note that most of these are very old. It would be nice to have some candidate universals that are of more recent vintage, say discovered in the last 20 years. If my hunch is right that contributions to the list have been sparse of late, this is interesting and worth trying to understand.


  1. @Norbert: I don't understand your point about tense affixes on nouns and the I-language / E-language divide. You're right that "Kayardild has tense affixes on its nouns" is a statement about an E-language, since there really isn't such a thing as "Kayardild" any more than there is such a thing as "English" – there's just the mental representation of the grammar necessary to generate an I-language in the Kayardild- (or English-) speaking individual.

    So far so good, but how is the step from "Kayardild has tense affixes on its nouns" to "the mental grammar of the idealized Kayardild-speaking individual generates tense affixes on its nouns" different from the step from "English has no overt subjects in the infinitival complement of 'try'" to "the mental grammar of the idealized English-speaking individual disallows overt subjects in the infinitival complement of 'try'"? And if these inferential steps are not qualitatively different, then ceteris paribus we can take this to be a fact about possible I-languages. And so, if we take the Pinker-Bloom statement to be one about possible I-languages, then I think Kayardild does indeed falsify a putative universal concerning possible I-languages.

    Now, all of this is separate from the question of whether the Pinker-Bloom statement ("no language uses noun affixes to express tense") is or isn't a candidate for the status of Chomskyan universal. Let me put this another way: you could imagine Greenbergian universals stated over E-languages (as I believe Greenberg intended them to be), or as universals stated over I-languages (which is what I was entertaining above). "Greenbergian I-universals," if you will. If your interests are at the granularity of Merge, then I can see how this Pinker-Bloom hypothesis would seem irrelevant from your perspective; but I can easily see someone who is interested in differences between syntactic categories seeing the same hypothesis as a candidate to be a Chomskyan universal: "if a language has a reliable noun-verb distinction, then ..." And then you could imagine that if a language had no noun-verb distinction (I have not seen a convincing case that such languages exist, but let's put that aside), it would not bear on this issue, because it is a Chomskyan rather than Greenbergian universal.

    In short: from where I sit, while it might be true that all universals stated over E-languages are Greenbergian, the converse is not true.

  2. I largely agree with Omer's comment, specifically I worry about making the separation between Chomsky universals and Greenberg universals so sharp that Chomsky universals don't have any observable consequences. (I've commented on this point before.)

    But I take it that the important point for Norbert is that Evans apparently takes the reported facts to contradict the emphasis on "seeing all languages as alike with only minor variations". So it all comes down to what sort of variation one takes to be "minor". If you focus on the word-sequences that get externalized, then the difference between putting a tense marker or a case marker on a single word versus putting it on some large span of the sentence might not seem particularly minor. Whereas if you focus on the mental generative procedure and think about case marking as some dependency that gets established between two nodes of a tree, then seeing overt case marking appear on all words dominated by one of the involved nodes is more likely to seem like a "minor variation". (I don't know anything about the details of Kayardild case-marking, haven't even read Evans' description particularly carefully, and make no claims about exactly how well it fits into modern theories of case, but I think the general point can stand that what you take to be "minor" will differ depending on the Chomskyan versus Greenbergian perspective.)

  3. If I remember right, the future tense marker in Kayardild appears on all non-subject NP/DPs. Don't remember the details, but Kayardild also has case-stacking.

    1. Yes, there's a book about it here:

      Also the author has a Yale PhD thesis online.

    2. The theoretical bite here is that the workings of these languages, easily accessible to anybody from about 1995 (Evans' book, Plank's Double Case book, Sadock's autolexical analysis), show that it is completely impossible to make morphological marking dependent on structural adjacency, but this idea clung on at least till 2008, in Baker's Agreement book, and I am not entirely convinced that it has been eradicated from everybody's minds yet (although it is clearly gone from those of many people). The final date after which it should have become untenable would have been 1998, when Rachel Nordlinger's book on Case appeared, but it lingered on for another decade or so, until Norvin Richards's work on Lardil became available.

      My impression is that people do take linguistic diversity seriously now to a considerably greater extent than they did in 1995, but the extent to which practice has shifted is not recognized either in the MP community or outside of it, and the exposition of the theoretical framework still looks to me like a complete mess.

    3. @Avery: thanks for the references!

  4. > In fact, from where I sit, we have actually found relatively few new universals since the mid 1980s.

    Is there a list?