Monday, October 24, 2016

Universal tendencies

Let’s say we find two languages displaying a common pattern, or two languages converging towards a common pattern, or even all languages doing the same. How should we explain this? Stephen Anderson (here, and discussed by Haspelmath here) notes that if you are a GGer there are three available options: (i) the nature of the input, (ii) the learning theory and (iii) the cognitive limits of the LAD (be they linguistically specific or domain general). Note that (ii) will include (iii) as a subpart and will have to reflect the properties of (i) but will also include all sorts of other features (cognitive control, structure of memory and attention, the number of options the LAD considers at one time etc.). These, as Anderson notes, are the only options available to a GGer, for s/he takes G change to reflect the changing distribution of Gs in the heads of a population of speakers. Or, to put this more provocatively: languages don't exist apart from their incarnation in speakers’ minds/brains. And given this, all diachronic “laws” (laws that explain how languages or Gs change over time) must reflect the cognitive, linguistic or computational properties of human minds/brains.

This said, Haspelmath (H) observes (here and here) (correctly in my view) that GGers have long “preferred purely synchronic ways of explaining typological distributions,” and by this he means explanations that allude to properties of the “innate Language Faculty” (see here for discussion). In other words, GGers like to think that typological differences reflect intrinsic properties of FL/UG and that studying patterns of variation will hence shed light on its properties. I have voiced some skepticism concerning this “hence” here. In what follows I would like to comment on H’s remarks on a similar topic. However, before I get into details I should note that we might not be talking about the same thing. Here’s what I mean.

The way I understand it, FL/UG bears on properties of Gs not on properties of their outputs. Hence, when I look at typology I am asking how variation in typologies and historical change might explain changes in Gs. Of course, I use outputs of these Gs to try to discern the properties of the underlying Gs, but what I am interested in is G variation not output variation. This concedes that one might achieve similar (identical?) outputs from different congeries of G rules, operations and filters. In effect, whereas changing surface patterns do signal some change in the underlying Gs, similarity of surface patterns need not. Moreover, given our current accounts there are (sadly) too many roads to Rome; thus the fact that two Gs generate similar outputs (or have moved towards similar outputs from different Gish starting points) does not imply that they must be doing so in the same way. Maybe they are and maybe not. It really all depends.

OK, back to H. He is largely interested in the (apparent) fact (and let’s stipulate that H is correct) that there exist “recurrent paths of changes,” “near universal tendencies” (NUT) that apply in “all or a great majority of languages.”[1] He is somewhat skeptical that we have currently identified diachronic mechanisms to explain such changes and thinks that those on the market do not deliver: “It seems clear to me that in order to explain universal tendencies one needs to appeal to something stronger than “common paths of change,” namely change constraints, or, mutational constraints…” I could not agree more. That there exist recurrent paths of change is a datum that we need mechanisms to explain. It is not yet a complete explanation. Huh?

Recall, we need to keep our questions clear. Say that we have identified an actual NUT (i.e. we have compelling evidence that certain kinds of G changes are “preferred”). If we have this and we find another G changing in the same direction then we can attribute this to that same NUT. So we explain the change by so attributing it. Well, in part: we have identified the kind of thing it is even if we do not yet know why these types of things exist. An analogy: I have a pencil in my hand. I open my hand. The pencil falls. Why? Gravitational attraction. I then find out that the same thing happens when I have a pen, an eraser, a piece of chalk (yes, this horse is good and dead!) and any other school supply at hand. I conclude that these falls are all instances of the same causal power (i.e. gravity). When I then pick up a thumbtack, let it loose, and it too falls, have I explained its fall by saying that it falls because of gravity? Well, up to a point. A small point IMO, but a point nonetheless. Of course we want to know how Gravity does this, what exactly it does when it does it and even why it does it the way that it does, but classifying phenomena into various explanatory pots is often a vital step in setting up the next step of the investigation (viz. identifying and explaining the properties of the alleged underlying “force”).

This said, I agree that the explanation is pretty lame if left like this. Why did X fall when I dropped it? Because everything falls when you drop it. Satisfied? I hope not.

Sadly, from where I sit, many explanations of typological difference or diachronic change have this flavor. In GG we often identify a parameter that has switched value and (more rarely) some PLD that might have led to the switch. This is devilishly hard to do right and I am not dissing this kind of work. However, it is often very unsatisfying given how easy it is to postulate parameters for any observable difference. Moreover, very few proposals actually do the hard work of sketching the presupposed learning theory that would drive the change or looking at the distribution of PLD that the learning theory would evaluate in making the change. To get beyond the weak explanations noted above, we need more robust accounts of the nature of the learning mechanisms and the data that was input to it (PLD) that led to the change.[2] Absent this, we do have an explanation of a very weak sort.

Would H agree? I think so, but I am not absolutely sure of this. I think that H runs together things that I would keep separate. For example: H considers Anderson’s view that many synchronic features of a G are best seen as remnants of earlier patterns. In other words, what we see in particular Gs might be reflections of “the shaping effects of history” and “not because the nature of the Language Faculty requires it” (H quoting Anderson: p. 2). H rejects this for the following reason: he doesn’t see “how the historical developments can have ‘shaping effects’ if they are ‘contingent’” (p. 2). But why not? What does the fact that something is contingent have to do with whether it can be systematically causal? 1066 and all that was contingent, yet its effects on “English” Gs have been long lasting. There is no reason to think that contingent events cannot have long lasting shaping effects.

Nor, so far as I can tell, is there reason to think that this only holds for G-particular “idiosyncrasies.” There is no reason in principle why historical contingencies might not explain “universal tendencies.” Here’s what I mean.

Let’s for the sake of argument assume that there are around 50 different parameters (and this number is surely small). This gives a space of possible Gs (assuming the parameters are independent) of about 1,510,000,000. The current estimate of different languages out there (and I assume, maybe incorrectly, Gs) is on the order of 7,000, at least that’s the number I hear bandied about among typologists. This number is minuscule. It covers .0005% of the possible space. It is not inconceivable that languages in this part of the space have many properties in common purely because they are all in the same part of the space. These common properties would be contingent in a UG sense if we assumed that we only accidentally occupy this part of the space. Or, had we been dropped into another part of the G space we would have developed Gs without these properties. It is even possible that it is hard to get to any other of the G possibilities given that we are in this region. On this sort of account, there might be many apparent universals that have no deep cognitive grounding and are nonetheless pervasive. Don’t get me wrong, I am not saying these exist, only that we really have no knock-down reason for thinking they do not. And if something like this could be true, then the fact that some property did or didn’t occur in every G could be attributed to the nature of the kind of PLD our part of the G space makes available (or how this kind of PLD interacts with the learning algorithm). This would fit with Anderson’s view: contingent yet systematic and attributable to the properties of the PLD plus learning theory.
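The size of the space is easy to check by hand or by machine. What follows is only a minimal sketch: it assumes (the text above does not say so) that each of the 50 parameters is binary as well as independent, and the variable names are purely illustrative. On that assumption the space has 2^50 members, which is much larger than the figure quoted above, so 7,000 languages cover an even smaller share of it.

```python
# Assumption: 50 independent, *binary* parameters (binarity is not stated in the post).
N_PARAMETERS = 50
N_LANGUAGES = 7_000  # rough count of attested languages cited in the post

# Each binary parameter doubles the number of possible Gs.
possible_grammars = 2 ** N_PARAMETERS

# Share of the space occupied by attested languages (one G per language assumed).
coverage = N_LANGUAGES / possible_grammars

print(f"possible Gs: {possible_grammars:,}")
print(f"coverage of the space: {coverage:.2e} (i.e. {coverage * 100:.2e}%)")
```

If parameters are not binary, or not independent, the count changes accordingly; the point about sampling only a sliver of the space is unaffected so long as the space stays large relative to 7,000.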

I don’t think that H (or most linguists) would find this possibility compelling. If something is absent from 7,000 languages (7,000 I tell you!!!) then this could not be an accident! Well maybe not. My only claim is that the basis for this confidence is not particularly clear. And thinking through this scenario makes it clear that gaps in the existing language patterns/Gs are (at best) suggestive about FL/UG properties rather than strongly dispositive. It could be our ambient PLD that is responsible. We need to see the reasoning. Culbertson and Adger provide a nice model for how this might be done (see here).

One last point: what makes PoS arguments powerful is that they are not subject to this kind of sampling skepticism. PoS arguments really do, if successful, shed direct light on FL/UG. Why? Because, if correctly grounded, PoSs abstract away from PLD altogether and so remove this as a causal source of systematicity. Hence, PoSs short-circuit the skeptical suggestions above. Of course, the two kinds of investigation can be combined. However, it is worth keeping in mind that typological investigations will always suffer from the kind of sampling problem noted above and will thus be less direct probes of FL/UG than will PoS considerations. This suggests, IMO, that it would be very good practice to supplement typologically based conclusions with PoS style arguments.[3] Even better would be explicit learning models, though these will be far more demanding given how hard it likely is to settle on what the PLD is for any historical change.[4]

I found H’s discussion of these matters to be interesting and provocative. I disagree with many things that H says (he really is focused on languages rather than Gs). Nonetheless, his discussion can be translated well enough into my own favored terms to be worth thinking about. Take a look.

[1] I say ‘apparent’ for I know very little of this literature though I am willing to assume H is correct that these exist for the sake of argument.
[2] Which does not mean that we lack nice models of what better accounts might look like. Bob Berwick, Elan Dresher, Janet Fodor, Jeff Lidz, Lisa Pearl, William Sakas, Charles Yang, a.o., have provided excellent models of what such explanations would look like.
[3] Again a nice example of this is Culbertson and Adger’s work discussed here. It develops an artificial G argument (meatier than a simple PoS argument) to more firmly ground a typological conclusion.

[4] Hard, but not impossible as the work of Kroch, Lightfoot and Roberts, for example, shows.


  1. Are not all parameters binary? In that case would not the number of possible Gs be 2^50, (i.e., more than 10^15)?

  2. Are you saying that the FoL may allow Gs that have/lack certain properties that none of the languages we have access to actually instantiate?

    1. Yes. Logically this is possible. The range of Gs we see might be a small subset of those FL allows and so generalizing from properties of those we see to those FL allows is potentially risky. Hence, inferring FL structures from typological generalizations, though common and reasonable, is hardly trivial. IMO, this problem does not infect other kinds of inferences to the structure of FL in the same way or degree (e.g. inference from PoS reasoning).

  3. I agree that something that's absent from the 6000-7000 languages we see is not in principle guaranteed to be ruled out by our mental capacities (be they linguistic or otherwise). This is a methodological heuristic, and I think it's one that has served us well. I personally find this heuristic to be much more reasonable than the one that underlies work in the artificial grammar paradigm – namely, that how adults treat novel linguistic data is relevant to how children do so.

    (And I agree with the first commenter regarding the math: 2^50 is 1,125,899,906,842,624.)

    1. I agree with the math too. But that makes the sample even smaller. So, heuristic it is. A good one? Not bad, and as I think one always uses what one can and does the best with it one can, I have no problems with it. But, I think it is worth noting how tendentious it is as a probe for FL/UG. The unquestioned assumption is that this is the best (for some, only) way to investigate FL/UG. I thought that highlighting its weaknesses was a public service. Hope you agree.

    2. >> PoS arguments really do, if successful, shed direct light on FL/UG

      Norbert, it seems to me you are saying that there is something nearly infallible about PoS arguments, which is not true of "heuristics". I very much like PoS arguments, personally, but it is not clear that they necessarily shed light on FL/UG.

      If we go back to Zeno’s paradoxes, which seemed like unassailable arguments for a couple of millennia, then many would argue in retrospect, that they didn’t really shed any light on reality, but really, they now shed light on our collective ignorance (of calculus).

      This is what makes me uncomfortable when anyone starts using PoS arguments as proofs, instead of interesting starting points for future inquiry. I am not sure whether they necessarily shed light on FL/UG or whether they reveal our collective ignorance. But, PoS arguments do make for interesting further questions, which for me is the real reason to follow them.

    3. @Norbert: I agree that it's worth reminding ourselves that this is a heuristic. And, for what it's worth, yes: the math correction means that our coverage is even smaller than you made it out to be. But whether or not this coverage is "too small" depends entirely on the structure of the space. There are certain spaces of size 2^50 for which a sample of 6000 is much more than you need in order to figure out everything about how the space is structured. (Is language that way? I don't know.)

      But all of this is specifically about the question of whether lack-of-crosslinguistic-attestation is dispositive. There is something related, but separate, about which I know we disagree: how necessary is it to look at what *is* attested in other languages, even if one is ultimately interested in the mental infrastructure. I feel confident saying that it's very necessary, simply judging by the history. I don't think it's really disputable that some principles, proposed on the basis of English (and closely related languages), have turned out to be wrong when confronted with facts from a broader set of languages. Here's a few examples: the purported relation between A-movement and case; the Activity Condition; the impossibility of A-movement out of finite clauses; the idea that finiteness is linked to tense in particular (as opposed to aspect, or even person or location; see Ritter & Wiltschko on Halkomelem and Blackfoot); the idea that Spec,TP has anything to do with agreement (an interestingly persistent misapprehension); the idea that the presence or absence of agreement is intrinsically linked to finiteness (or tense, for that matter); etc. And note, these are just from the particular area that I work in – I'm sure that people who work in other areas could make equally good lists from their respective necks of the woods.

      Let me be clear: I think there is tons we can learn from a single language, and most of it stands up to scrutiny (or needs very minor tweaking) when confronted with the broader crosslinguistic picture. But some of it doesn't. Perhaps not for any deep, principled reason (i.e., it could have been otherwise) – but that's just how things have turned out (at least from where I sit).

      So, no, I don't really trust results that have not been vetted by at least a basic crosslinguistic sanity-check. The good news is, most people listening to a talk or reading a paper do this implicitly; they think about how this would work in (other) languages that they know something about. And if there's a problem, they bring it up. That is, after all, how we know that all of the principles I listed above are false :-)

    4. @Karthik:
      Though my love of PoS arguments is nearly unbounded I don't quite go to infallibility (though I do wish I could claim they were apodictic!). What I do think is that they are more DIRECT windows onto the structure of FL/UG than typological inductions over Gs. And this, I believe, is the opposite of the common view, which is why I try to strongly make the case for it.

      Furthermore, I believe that it is important to make this point for another reason. In the real sciences it is understood that some data bear more directly on a given mechanism than do others. This has to do with the structure of the relevant theories and the inferential steps linking them to data. PoS shines a bright light on FL/UG precisely because it abstracts away from PLD when successful. PLD distorts one's view of FL/UG via its influence on the properties of particular Gs. What makes PoS so good is precisely that it abstracts from such interfering details. In other words it focuses on G properties as such and not G+PLD properties. Those interested in languages and their properties will find this kind of argument unconvincing. But that's because I think that they are only secondarily interested in FL/UG. The main interest is in language and its diversity. FL/UG are of subsidiary concern, FL/UG being an abstraction, at best an induction over different Gs.

      Last point: I could not agree with you more re the last point. All of this would be nugatory were PoS argumentation heuristically idle. One hopes it has legs and leads to other good questions, including ones that can be further elaborated and pursued typologically. That said, I would be happy were the logic of the above conceded. If your main interest is in FL/UG, understand that approaching the problem mainly through comparative typology is a very indirect route, one far more indirect than PoS argumentation.

      One very last point: this does not mean to say that PoS should always lead. It may be that setting up a good one is too hard and that a typological investigation is easier and so more cost effective. That there is a direct method does not mean that indirect ones are useless or to be eschewed. Not so. However, the PRESUPPOSITION that the typological approach is the best way to study FL/UG (and this is the default view IMO) is just wrong.

    5. @Omer:
      My argument is not that typology cannot dispose of FL/UG claims. Of course it can, and here it does not matter what the space of G possibilities looks like. If something is claimed impossible it cannot exist. 7,000 languages are more than enough (as you remind me often) to establish that some principle is wrong. However, there is another view that I think is wrong: that comparative typology shines a direct light on FL/UG. Or, to study it you must first gather a set of Gs, preferably very diverse and then see what they have in common. This induction over Gs treats FL/UG as a kind of abstraction over G properties. This, I believe, is the standard conception of UG (and, surprise, I believe that it is an abstract version of Greenberg's conception). It is this view that I find severely wanting. It is not that one can learn a lot about FL/UG from a single language, it's that one can only learn about FL/UG by abstracting away from the variable effects that PLD has that are most responsible for making languages appear diverse. The question is how to abstract from this? IMO, PoS is an excellent way. We need others, and I think that the Culbertson and Adger attempt was an interesting one.

      A last point: one way of reading your comment is that we need typology to refine what we find using other methods. I can live with this. PoS arguments hardly ever implicate an exact mechanism. Rather they identify a class of data that requires an FL/UG source. In other words, they isolate data that directly reflect an FL/UG source. All alone, they do not specify the exact mechanisms as many different ones may suffice. Here comparative data could be valuable for it might allow us to narrow down the specifics at play. That's hard to disagree with, so I won't.

      One last word on sanity checks: were listeners regularly to ask themselves about the PoS implications of some typological proposal the way they regularly apply a cross linguistic sanity check to proposed universals I would be a happy camper. In my experience, the issues hardly ever arise, and when they are mooted a surprising silence ensues. Tant pis.

  4. How much does the approximation of number of parameters vary? Is there not a risk of circularity if the parameters are derived from what the Gs of the languages of the world look like, and then the space of possible Gs is approximated from this number of parameters? Or am I missing something?