Faculty of Language: What does typology teach us about FL?

Monday, November 16, 2015

What does typology teach us about FL?

I have been thinking lately about the following question: What does comparative/typology (C/T) study contribute to our understanding of FL/UG? Observe that I am taking it as obvious that GG takes the structure of FL/UG to be the proper object of study and, as a result, that any linguistic research project must ultimately be justified by the light it can shed on the fine structure of this mental organ. So, the question: what does studying C/T bring to the FL/UG table?

Interestingly, the question will sound silly to many. After all, the general consensus is that one cannot reasonably study Universal Grammar without studying the specific Gs of lots of different languages, the more the better. Many vocal critics of GG complain that GG fails precisely because it has investigated too narrow a range of languages and has, thereby, been taken in by many false universals.

Most GGers agree with spirit of this criticism. How so? Well, the critics accuse GG of being English or Euro centric and GGers tend to reflexively drop into a defensive crouch by disputing the accuracy of the accusation. The GG response is that GG has as a matter of fact studied a very wide variety of languages from different families and eras. In other words, the counterargument is that critics are wrong because GG is already doing what they demand.

The GG reply is absolutely accurate. However, it obscures a debatable assumption, one that indicates agreement with the spirit of the criticism: that only or primarily the study of a wide variety of typologically diverse languages can ground GG conclusions that aspire to universal relevance. In other words, both GG and its critics take the intensive study of typology and variation to be a conceptually necessary part of an empirically successful UG project.

I want to pick at this assumption in what follows. I have nothing against C/T inquiry.[1] Some good friends engage in it. I enjoy reading it. However, I want to put my narrow prejudices aside here in order to try and understand exactly what C/T work teaches us about FL/UG? Is the tacit (apparently widely accepted) assumption that C/T work is essential for (or at least, practically indispensible for or very conducive to) uncovering the structure of FL/UG correct?

Let me not be coy. I actually don’t think it is necessary, though I am ready to believe that C/T inquiry has been a practical and useful way of proceeding to investigate FL/UG. To grease the skids of this argument, let me remind you that most of biology is built on the study of a rather small number of organisms (e. coli, C. elegans, fruitflies, mice). I have rarely heard the argument made that one can’t make general claims about the basic mechanisms of biology because only a very few organisms have been intensively studied. If this is so for biology, why should the study of FL/UG be any different. Why should bears be barely (sorry I couldn’t help it) relevant for biologists but Belarusian be indispensable for linguistics? Is there more to this than just Greenbergian sentiments (which, we can all agree, should be generally resisted)?

So is C/T work necessary? I don’t think it is. In fact, I personally believe that POS investigations (and acquisition studies more generally (though these are often very hard to do right)) are more directly revealing of FL/UG structure. A POS argument if correctly deployed (i.e. well grounded empirically) tells us more about what structure FL/UG must have than surveys (even wide ones) of different Gs do. Logically, this seems obvious. Why? Because POS arguments are impossibility arguments (see here) whereas surveys, even ones that cast a wide linguistic net, are empirically contingent on the samples surveyed. The problem with POS reasoning is not the potential payoff or the logic but the difficulty of doing it well. In particular, it is harder than I would like to always specify the nature of the relevant PLD (e.g. is only child directed speech relevant? Is PLD degree 0+?). However, when carefully done (i.e. when we can fix the relevant PLD sufficiently well), the conclusions of a POS are close to definitive. Not so for cross-linguistic surveys.[2]

Assume I am right (I know you don’t, but humor me). Nothing I’ve said gainsays the possibility that C/T inquiry is a very effective way of studying FL/UG, even if it is not necessary. So, assuming it is an effective way of studying FL/UG, what exactly does C/T inquiry bring to the FL/UG table?

I can think of three ways that C/T work could illuminate the structure of FL/UG.

First, C/T inquiry can suggest candidate universals. Second, C/T investigations can help sharpen our understanding of the extant universals. Third, it can adumbrate the range of Gish variation, which will constrain the reach of possible universal principles. Let me discuss each point in turn.

First, C/T work as a source of candidate universals. Though this is logically possible, as a matter of fact, it’s my impression that this has not been where plausible candidates have come from. From where I sit (but I concede that this might be a skewed perspective) most (virtually all?) of the candidates have come from the intensive study of a pretty small number of languages. If the list I provided here is roughly comprehensive, then many, if not most, of these were “discovered” using a pretty small range of the possible Gs out there. This is indeed often mooted as a problem for these purported universals. However, as I’ve mentioned tiresomely before, this critique often rests on a confusion between Chomsky universals with their Grennbergian eponymous doubles.

Relevantly, many of these candidate universals predate the age of intensive C/T study (say dating from the late 70s and early 80s). Not all of them, but quite a few. Indeed, let me (as usual) go a little further: there have been relatively few new candidate universals proposed over the last 20 years, despite the continually increasing investigation of more and more different Gs. That suggests to me that despite the possibility that many of our universals could have been inductively discovered by rummaging through myriad different Gs, in fact this is not what actually took place.[3] Rather, as in biology, we learned a lot by intensively studying a small number of Gs and via (sometimes inchoate) POS reasoning, plausibly concluded that what we found in English is effectively a universal feature of FL/UG. This brings us to the second way that C/T inquiry is useful. Let’s turn to this now.

The second way that C/T inquiry has contributed to the understanding of FL/UG is that it has allowed us (i) to further empirically ground the universals discovered on the basis of a narrow range of studied languages and, (ii) much more importantly, to refine these universals. So, for example, Ross discovers island phenomena in languages like English and proposes them as due to the inherent structure of FL/UG. Chomsky comes along and develops a theory of islands that proposes that FL/UG computations are bounded (i.e. must take place in bounded domains) and that apparent long distance dependencies are in fact the products of smaller successive cyclic dependencies that respect these bounds. C/T work then comes along and refines this basic idea further. So Rizzi notes that (i) wh-islands are variable (and multiple WH languages like Romanian shows that there is more than one way to apparently violate Wh islands) and (ii) Huang suggests that islands needs to include adjuncts and subjects and (iii) work on the East Asian languages suggests that we need to distinguish island effects from ECP effects despite their structural similarity and (iv) studies of in-situ wh languages allows us to investigate the bounding requirements on overt and covert movement and (v) C/T data from Irish and Chamorro and French and Spanish provides direct evidence for successive cyclic movement even absent islands.

There are many other examples of C/T thinking purifying candidate universals. Another favorite example of mine is how the anaphor agreement effect (investigated by Rizzi and Woolford) shows that Principle A cannot be the last word on anaphor binding (see Omer’s discussion here). This effect strongly argues that anaphor licensing is not just a matter of binding domain size, as the classical GB binding theory proposes.[4] So, finding that nominative anaphors cannot be bound in Icelandic changes the way we should think about the basic form of the binding theory. In other words, considering how binding operates in a language with different case and agreement profiles from English has proven to be very informative about our basic understanding binding principles.[5]

However, though I think this work has been great (and a great resource at parties to impress friends and family), it is worth noting that the range of relevant languages needed for the refinements has been relatively small (what would we do without Icelandic!). This said, C/T work has made apparent the wide range of apparently different surface phenomena that fall into the same general underlying patterns (this is especially true of the rich investigations on case/agreement phenomena). It has also helped refine our understanding by investigating the properties of languages whose Gs make morpho-syntactically explicit what is less surface evident in other languages. So for example, the properties of inverse agreement (and hence defective intervention effects) are easier to study in languages like Icelandic where one finds overt post verbal nominatives than it is in English where there is relatively little useful morphology to track.[6] The analogue of this work in (other) areas of biology is the use of big fat and easily manipulated squid axons (rather than dainty, small and smooshy mice axons) to study neuronal conduction.

Another instance of the same thing comes from the great benefits of C/T work in identifying languages where UG principles of interest leave deeper overt footprints than in others (sometimes very very deep (e.g. inverse control, IMO)). There is no question that the effects of some principles are hard to find in some languages (e.g. island effects in languages which don’t tend to move things around much, or binding effects in Malay-2 (see here)). And there is no doubt that sometimes languages give us extremely good evidence of what is largely theoretical inference in others. Thus, as mentioned, the morphological effects of successive cyclic movement in Irish or Chamorro or verb inversion in French and Spanish make evident at the surface the successive cyclic movement that FL/UG infers from, among other things, island effects. So, there is no question that C/T research has helped ground many FL/UG universals, and has even provided striking evidence for their truth. However (and maybe this is the theorist in me talking), it is surprising how much of these refinements and evidence builds on proposals with a still very narrow C/T basis. What made the C-agreement data interesting, for example, is that it provided remarkably clear evidence for something that we already had pretty good indirect evidence for (e.g. Islands are already pretty good evidence for successive cyclic movement in a subjacency account). However, I don’t want to downplay the contributions of C/T work here. It has been instrumental in grounding lots of conclusions motivated on pretty indirect theoretical grounds, and direct evidence is always a plus. What I want to emphasize is that more often than not, this additional evidence has buttressed conclusions reached on theoretical (rather than inductive) grounds, rather than challenging them.

This leaves the third way that C/T work can be useful: it may not propose but it can dispose. It can help identify the limits of universalist ambitions. I actually think that this is much harder to do than is often assumed. I have recently discussed an (IMO unsuccessful) attempt to do this for Binding Theory (here and here), and I have elsewhere discussed the C/T work on islands and their implications for a UG theory of bounding (here). Here too I have argued that standard attempts to discredit universal claims regarding islands have fallen short and that the (more “suspect”) POS reasoning has proven far more reliable. So, I don't believe that C/T work has, by and large, been successful at clearly debunking most of the standard universals.

However, it has been important in identifying the considerable distance that can lie between a universal underlying principle and its surface expressions. Individual Gs must map underlying principles to surface forms and Gs must reflect this possible variation. Consequently, finding relevant examples thereof sets up interesting acquisition problems (both real time and logical) to be solved. Or, to say this another way, one potential value of C/T work is in identifying something to explain given FL/UG. C/T work can provide the empirical groundwork for studying how FL/UG is used to build Gs, and this can have the effect of forcing us to revise our theories of FL/UG.[7] Let me explain.

The working GG conceit is that the LAD uses FL and its UG principles to acquire Gs on the basis of PLD. To be empirically adequate an FL/UG must allow for the derivation of different Gs (ones that respect the observed surface properties). So, one way to study FL/UG is to investigate differing languages and ask how their Gs (i.e. ones with different surface properties) could be fixed on the basis of available PLD. On this view, the variation C/T discovers is not interesting in itself but is interesting because it empirically identifies an acquisition problem: how is this variation acquired? And this problem has direct bearing on the structure of FL/UG. Of course, this does not mean that any variation implies a difference in FL/UG. There is more to actual acquisition than FL/UG. However, the problem of understanding how variation arises given FL/UG clearly bear on what we take to be in FL/UG.[8]

And this is not merely a possibility. Lots of work on historical change from the mid 1980s onwards can be, and was, seen in this light (e.g. Lightfoot, Roberts, Berwick and Nyogi). Looking for concomitant changes in Gs was used to shed light on the structure of FL/UG parameter space. The variation, in other words, was understood to tell us something about the internal structure of FL/UG. It is unclear to me how many GGers still believe in this view of parameters (see here and here). However, the logic of using G change to probe the structure of FL/UG is impeccable. And there is no reason to limit the logic to historical variation. It can apply just as well to C/T work on synchronically different Gs, closely related but different dialects, and more.

This said, it is my impression that this is not what most C/T work actually aspires to anymore, and this is becuase most C/T research is not understood in the larger context of Plato’s Problem or how Gs are acquired by LADs in real time. In other words, C/T work is not understood as a first step towards the study FL/UG. This is unfortunate for this is an obvious way of using C/T results to study the structure of FL/UG. Why then is this not being done? In fact, why does it not even seem to be on the C/T research radar?

I have a hunch that will likely displease you. I believe that many C/T researchers either don’t actually care to study FL/UG and/or they understand universals in Greenbergian terms. Both are products of the same conception; the idea that linguistics studies languages, not FL. Given this view, C/T work is what linguists should do for the simple reason that C/T work investigates languages and that’s what linguistics studies. We should recognize that this is contrary to the founding conception of modern linguistics. Chomsky’s big idea was to shift the focus of study from languages to the underlying capacity for language (i.e FL/UG). Languages on this conception are not the objects of inquiry. FL is. Nor are Greenberg universals what we are looking for. We are looking for Chomsky universals (i.e. the basic structural properties of FL). Of course, C/T work might advance this investigation. But the supposition that it obviously does so needs argumentation. So let’s have some, and to start the ball rolling let me ask you: how does C/T work illuminate the structure of FL/UG? What are its greatest successes? Should we expect further illumination? Given the prevalence of the activity, it should be easy to find convincing answers to these questions.

[1] I will treat the study of variation and typological study as effectively the same things. I also think that historical change falls into the same group. Why study any of these?

[2] Aside from the fact that induction over small Ns can be hazardous (and right now the actual number of Gs surveyed is pretty small given the class of possible Gs), most languages differ from English in only having a small number of investigators. Curiously, this was also a problem in early modern biology. Max Delbruck decreed that everyone would work on e.coli in order to make sure that the biology research talent did not spread itself too thin. This is also a problem within a small field like linguistics. It would be nice if as many people worked on any other language as work on English. But this is impossible. This is one reason why English appears to be so grammatically exotic; the more people work on a language the more idiosyncratic it appears to be. This is not to disparage C/T research, but only to observe the obvious, viz. that person-power matters.

[3] Why has the discovery of new universals slowed down (if it has, recall this is my impression)? One hopeful possibility is that we’ve found more or less all of them. This has important implications for theoretical work if it is true, something that I hope to discuss at some future point.

[4] Though, as everyone knows, the GB binding theory as revised in Knowledge of Language treats the unacceptability of *John thinks himself/heself is tall as not a binding effect but an ECP effect. The anaphor-agreement effect suggests that this too is incorrect, as does the acceptability of quirky anaphoric subjects in Icelandic.

[5] I proposed one possible reinterpretation of binding theory based in part on such data here. I cannot claim that the proposal has met with wide acceptance and so I only mention it for the delectation of the morbidly curious.

[6] One great feature of overt morphology is that it often allows for crisp speaker acceptability judgments. As this has been syntax’s basic empirical fodder, crisp judgments rock.

[7] My colleague Jeff Lidz is a master of this. Take a look at some of his papers. Omer Preminger’s recent NELS invited address does something similar from a more analytical perspective. I have other favorite practitioners of this art including Bob Berwick, Charles Yang, Ken Wexler, Elan Dresher, Janet Fodor, Stephen Crain, Steve Pinker, and this does not exhaust the list. Though it does exhaust my powers of immediate short term recall.

[8] Things are, of course, more complex. FL/UG cannot explain acquisition all by its lonesome; we also need (at least) a learning theory. Charles Yang and Jeff Lidz provide good paradigms of how to combine FL/UG and learning theory to investigate each. I urge you to take a look.

15 comments:

UnknownNovember 16, 2015 at 1:46 PM
I can only speak for my own line of work, but typology is a lot more important to my current interests than what is going on in individual languages (including their acquisition). I think this point is just a variation of your third argument in defense of C/T, but since it comes from a different subfield with very different methodology it might be of interest nonetheless:

The study of individual languages was very useful in establishing lower bounds on generative capacity (Swiss German for supra-CFL, Yoruba and a few others for supra-MCFL), but it is very unlikely that we will find any more constructions that would push us even higher up the Chomsky hierarchy. And that's not restricted to weak generative capacity, we're also on pretty safe ground with respect to strong generative capacity. The Minimalist grammar framework as it stands right now can handle pretty much anything linguists use in their analyses while keeping expressivity in check and having very interesting formal properties.

To a certain extent, this trivializes the questions that dominate the analysis of individual languages. For example, Korean allows case endings to be dropped on the last DP in fragment answers. The standard view would be that this is a peculiar property that tells us something about how FL operates. From the computational perspective outlined above this phenomenon is entirely unsurprising because the formalism is already capable of generating such languages (any formalism with subcategorization can do this). Assuming free variation, some language is bound to display this behavior, so there's nothing to see here... if that were the end of the story.

The interesting thing is that not all logical possibilities are attested cross-linguistically. I haven't been able to fully map out the space of attested options for case dropping yet, but it's already clear that there are several typological gaps: drop only the penultimate case marker, only the antepenultimate case marker, every case marker that is preceded by an even number of DPs, and so on. Those are also things that the formalism is capable of (once again, any formalim with subcategorization can do that), so their non-existence does require explanation. Those may take the form of certain algebraic properties, showing that the typological gaps are more complex in some sense, or deriving them from independently motivated substantive universals. But whatever the answer, it is only due to the typology that the construction becomes interesting.

Bottom-line: the study of individual languages reveals what's possible, but we also need to know what's impossible. Typology does that. POS arguments can do it too, but it is far from obvious that both cover the same ground. In addition, POS arguments are much more theory-laden than simply mapping out the space of typological (surface) variation.

For my own research, work emphasizing typology (e.g. Inkelas' Interplay of Morphology and Phonology) has proven a lot more useful than the standard approach (e.g. Kramer's Morphosyntax of Gender; not a bad book, just not what I needed). So for my purposes it doesn't matter whether the C/T community is interested in universals --- as long as their work contains succinct tables contrasting which logically possible options are attested/unattested, I'm happy.
ReplyDelete
Replies
AveryAndrewsNovember 16, 2015 at 2:31 PM
"However, I don’t want to downplay the contributions of C/T work here. It has been instrumental in grounding lots of conclusions motivated on pretty indirect theoretical grounds, and direct evidence is always a plus. What I want to emphasize is that more often than not, this additional evidence has buttressed conclusions reached on theoretical (rather than inductive) grounds, rather than challenging them."

In other words, C/T work plays a big role in converting what would otherwise be conjectures into what can reasonably be claimed to be knowledge. We can claim to *know*, for example, that no languages forms questions by moving the first word of some grammatical category into initial position, because if this was described in any grammar of any language, some C/T worker would have noticed it and brought it to our attention.
ReplyDelete
Replies
OmerNovember 17, 2015 at 6:31 AM
One important way in which work on diverse languages can advance our understanding of FL/UG is by challenging (and, in some cases, disproving) certain Chomsky Universals.

Norbert is fond of pointing out that the naïve version of this line of argumentation is fallacious ("language X doesn't have internally-headed free relatives, therefore Chomsky is wrong!"), and he is of course correct about that. But there is another, less nonsensical version of this. Chomsky Universals sometimes take the form of no language has a rule with the formal property P (think of the subj-aux inversion stuff: no language has an inversion rule based on linearly closest rather than structurally closest (except in examples where linearly closest means adjacent)). So if we found a language that could only be successfully modeled using a grammar G that resorts to rules that have the (allegedly unattested) formal property P, we will have disproven the particular Chomsky Universal under consideration.

[DISCLAIMER: The following example relies on my own research; if you don't buy the argumentation therein, then it obviously doesn't exemplify what I'm taking it to exemplify.]

One could envision the following Chomsky Universal:

(1) No language has a rule whose application cannot be enforced exclusively via Interface Conditions (i.e., conditions statable at the interface of syntax with semantics or with morphophonology).

I don't think this is a straw-man; many people read Chomsky's (2000) "Strong Minimalist Thesis" (SMT) to entail something like (1). What my 2014 book attempts to show is that agreement in the Kichean languages disproves (1). [And, if you think that (1) is entailed by the SMT, then it also disproves the SMT.] That argument simply could not be mounted using data from English, as far as I can tell. And so this is an example of data from languages that are considerably less studied than English informing our inventory of Chomsky Universals.
ReplyDelete
Replies

Add comment

Faculty of Language

Comments

Monday, November 16, 2015

What does typology teach us about FL?

15 comments:

Contributors