Showing posts with label Minimalist Program.

Wednesday, September 19, 2018

Generative grammar's Chomsky Problem

Martin Haspelmath (MH) and I inhabit different parts of the (small) linguistics universe. Consequently, we tend to value very different kinds of work and look to answer very different kinds of questions. As a result, when our views converge, I find it worth paying attention. In what follows I note a point or two of convergence. Here is the relevant text that I will be discussing (henceforth MHT (for MH-text)).[1]

MHT’s central claim is that “Chomsky no longer argues for a rich UG of the sort that would be relevant for the ordinary grammarian and, e.g. for syntax textbooks” (1). It extends a similar view to me: “even if he is not as radical about a lean UG as Chomsky’s 21st century writings (where nothing apart from recursion is UG), Hornstein’s view is equally incompatible with current practice in generative grammar” (MHT’s emphasis, (2)).[2]

Given that neither Chomsky nor I seems to be inspiring current grammatical practice (btw, thx for the company MH), MHT notes that “generative grammarians currently seem to lack an ideological superstructure.” MHT seems to suggest that this is a problem (who wants to be superstructure-less after all?), though it is unclear for whom, other than Chomsky and me (what’s a superstructure anyhow?). MHT adds that Chomsky “does not seem to be relevant to linguistics anymore” (2).

MHT ends with a few remarks about Chomsky on alien (as in extra-terrestrial) language, noting a difference between him and Jessica Coon on this topic. Jessica says the following (2):

 When people talk about universal grammar it’s just the genetic endowment that allows humans to acquire language. There are grammatical properties we could imagine that we just don’t ever find in any human language, so we know what’s specific to humans and our endowment for language. There’s no reason to expect aliens would have the same system. In fact, it would be very surprising if they did. But while having a better understanding of human language wouldn’t necessarily help, hopefully it’d give us tools to know how we might at least approach the problem.

This is a pretty vintage late 1980s bioling view of FL. Chomsky demurs, thinking that perhaps “the Martian language might not be so different from human language after all” (3). Why? Because Chomsky proposes that many features of FL might be grounded in generic computational properties rather than idiosyncratic biological ones. In his words:

We can, in short, try to sharpen the question of what constitutes a principled explanation for properties of language, and turn to one of the most fundamental questions of the biology of language: to what extent does language approximate an optimal solution to conditions that it must satisfy to be usable at all, given extralinguistic structural architecture?

MHT finds this opaque (as do I actually) though the intent is clear: To the degree that the properties of FL and the Gs it gives rise to are grounded in general computational properties, properties that a system would need to have “to be usable at all,” then to that degree there is no reason to think that these properties would be restricted to human language (i.e. there is no reason to think that they would be biologically idiosyncratic).

MHT’s closing remark about this is to reiterate his main point: “Chomsky’s thinking since at least 2002 is not really compatible with the practice of mainstream generative grammar” (3-4).

I agree with this, especially MHT’s remark about current linguistic practice. Much of what interests Chomsky (and me) is not currently high up on the GG research agenda. Indeed, I have argued (here) that much of current GG research has bracketed the central questions that originally animated GG research and that this change in interests is what largely lies behind the disappointment many express with the Minimalist Program (MP).

More specifically, I think that though MP has been wildly successful in its own terms and is the natural research direction building on prior results in GG, its central concerns have been of little mainstream interest. If this assessment is correct, it raises a question: why the mainstream disappointment with MP and why has current GG practice diverged so significantly from Chomsky’s? I believe that the main reason is that MP has sharpened the two contradictory impulses that have been part of the GG research program from its earliest days. Since the beginning there has been a tension between those mainly interested in the philological details of languages and those interested in the mental/cognitive/neuro implications of linguistic competence.

We can get a decent bead on the tension by inspecting two standard answers to a simple question: what does linguistics study? The obvious answer is language. The less obvious answer is the capacity for language (aka, linguistic competence). Both are fine interests (actually, I am not sure that I believe this, but I want to be concessive (sorry Jerry)). And for quite a while it did not much matter to everyday research in GG which interest guided inquiry, as the standard methods for investigating the core properties of the capacity for language proceeded via a filigree philological analysis of the structures of language. So, for example, one investigated the properties of the construal modules by studying the distribution of reflexives and pronouns in various languages. Or by studying the locality restrictions on question formation (again in particular languages) one could surmise properties of the mentalist format of FL rules and operations. Thus, the way that one studied the specific cognitive capacity a speaker of a particular language L had was by studying the details of the language L, and the way that one studied more general (universal) properties characteristic of FL and UG was by comparing and contrasting constructions and their properties across various Ls. In other words, the basic methods were philological even if the aims were cognitive and mentalistic.[3] And because of this, it was perfectly easy for the work pursued by the philologically inclined to be useful to those pursuing the cognitive questions and vice versa. Linguistic theory provided powerful philological tools for the description of languages and this was a powerful selling point.

This peaceful commensalism ends with MP. Or, to put it more bluntly, MP sharpens the differences between these two pursuits because MP inquiry only makes sense in a mentalistic/cognitive/neuro setting. Let me explain.

Here is a very short history of GG. It starts with two facts: (1) native speakers are linguistically productive and (2) any human can learn any language. (1) implies that natural languages are open ended and thus can only be finitely characterized via recursive rule systems (aka grammars (Gs)). Languages differ in the rules their Gs embody. Given this, the first item on the GG research agenda was to specify the kinds of rules that Gs have and the kinds of dependencies Gs care about. Having an inventory of such rules sets up the next stage of inquiry.
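To make the “finite means, unbounded output” point concrete, here is a toy sketch (entirely my own illustration; the category labels and the tiny vocabulary are invented, not a grammar anyone has proposed). A handful of recursive rewrite rules is itself a finite object, yet it characterizes an unbounded set of sentences:

```python
# Toy recursive grammar: a finite rule set characterizing an unbounded language.
# (Illustrative only; the categories and words are made up.)
import random

RULES = {
    "S":  [["NP", "VP"]],
    "NP": [["the", "N"], ["the", "N", "that", "VP"]],  # an NP may contain a VP
    "VP": [["V", "NP"], ["V"]],                        # a VP may contain an NP
    "N":  [["dog"], ["cat"], ["linguist"]],
    "V":  [["saw"], ["chased"], ["slept"]],
}

def generate(symbol="S"):
    """Expand a symbol top-down; anything without a rule is a terminal word."""
    if symbol not in RULES:
        return [symbol]
    words = []
    for sym in random.choice(RULES[symbol]):
        words.extend(generate(sym))
    return words

if __name__ == "__main__":
    for _ in range(3):
        print(" ".join(generate()))
    # e.g. "the dog that chased the cat that saw the linguist slept"
```

The loop between NP and VP is what makes the characterized language open ended even though the rule set is finite; that, in miniature, is the sense in which productivity forces recursive Gs.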

The second stage begins with fact (2). Translated into Gish terms it says that any Language Acquisition Device (aka, child) can acquire any G. We called this meta-capacity to acquire Gs “FL” and we called the fine structure of FL “UG.” The fact that any child can acquire any G despite the relative paucity and poverty of the linguistic input data implies that FL has some internal structure. We study this structure by studying the kinds of rules that Gs can and cannot have. Note that this second project makes little sense until we have candidate G rules. Once we have some, we can ask why the rules we find have the properties they do (e.g. structure dependence, locality, c-command). Not surprisingly then, the investigation of FL/UG and the investigation of language particular Gs naturally went hand in hand and the philological methods beloved of typologists and comparative grammarians led the way. And boy did they lead! GB was the culmination of this line of inquiry. GB provided the first outlines of what a plausible FL/UG might look like, one that had grounding in facts about actual Gs. 

Now, this line of research was, IMO, very successful. By the mid 90s, GG had discovered somewhere in the vicinity of 25-35 non-trivial universals (i.e. design features of FL) that were “roughly” correct (see here for a (partial) list). These “laws of grammar” constitute, IMO, a great intellectual achievement. Moreover, they set the stage for MP in much the way that the earlier discovery of rules of Gs set the stage for GB style theories of FL/UG. Here’s what I mean.

Recall that studying the fine structure of FL/UG makes little sense unless we have candidate Gs and a detailed specification of some of their rules. Similarly, if one’s interest is in understanding why our FL has the properties it has, we need some candidate FL properties (UG principles) for study. This is what the laws of grammar provide: candidate principles of FL/UG. Given these we can now ask why we have these kinds of rules/principles and not other conceivable ones. And this is the question that MP sets for itself: why this FL/UG? MP, in short, takes as its explanandum the structure of FL.[4]

Note, if this is indeed the object of study, then MP only makes sense from a cognitive perspective. You won’t ask why FL has the properties it has if you are not interested in FL’s properties in the first place. So, whereas the minimalist program so construed makes sense in a GG setting of the Chomsky variety, where a mental organ like FL and its products are the targets of inquiry, it is less clear that the project makes much sense if one’s interests are largely philological (in fact, it is pretty clear to me that it doesn’t). If this is correct, and if it is correct that most linguists have mainly philological interests, then it should be no surprise that most linguists are disappointed with MP inquiry. It does not deliver what they can use, for it is no longer focused on questions analogous to the ones that were prominent before and which had useful spillover effects. The MP focus is on issues decidedly more abstract and removed from immediate linguistic data than heretofore.

There is a second reason that MP will disappoint the philologically inclined. It promotes a different sort of inquiry. Recall that the goal is explaining the properties of FL/UG (i.e. the laws of grammar are the explananda). But this explanatory project requires presupposing that the laws are more or less correct. In other words, MP takes GB as (more or less) right.[5] MP's added value comes in explaining it, not challenging it.

In this regard, MP is to GB what Subjacency Theory is to Ross’s islands. The former takes Ross’s islands as more or less descriptively accurate and tries to derive them on the basis of more natural assumptions. It would be dumb to aim at such a derivation if one took Ross’s description to be basically wrong headed. So too here. Aiming to derive the laws of grammar requires believing that these are basically on the right track. However, this means that so far as MP is concerned, the GBish conception of UG, though not fundamental, is largely empirically accurate. And this means that MP is not an empirical competitor to GB. Rather, it is a theoretical competitor in the way that Subjacency Theory is to Ross’s description of islands. Importantly, empirically speaking, MP does not aim to overthrow (or even substantially revise the content of) earlier theory.[6]

Now this is a problem for many working linguists. First, many don’t have the same sanguine view that I do of GB and the laws it embodies. In fact, I think that many (most?) linguists doubt that we know very much about UG or FL or that the laws of grammar are even remotely correct. If this is right, then the whole MP enterprise will seem premature and wrong headed to them.  Second, even if one takes these as decent approximations to the truth, MP will encourage a kind of work that will be very different from earlier inquiry. Let me explain.

The MP project so conceived will involve two subparts. The first one is to derive the GB principles. If successful, this will mean that we end up empirically where we started: MP will recover the content of GB. Of course, if you think GB is roughly right, then this is a good place to end up. But the progress will be theoretical, not empirical. It will demonstrate that it is reasonable to think that FL is simpler than GB presents it as being. However, the linguistic data covered will, at least initially, be very much the same. Again, this is a good thing from a theoretical point of view. But if one’s interests are philological and empirical, then this will not seem particularly impressive, as it will largely recapitulate GB's empirical findings, albeit in a novel way.

The second MP project will be to differentiate the structure of FL and to delineate those parts that are cognitively general from those that are linguistically proprietary. As you all know, the MP conceit is that linguistic competence relies on only a small cognitive difference between us and our apish cousins. MP expects FL’s fundamental operations and principles to be cognitively and computationally generic rather than linguistically specific. When Chomsky denies UG, what he denies is that there is a lot of linguistic specificity to FL (again: he does not deny that the GB identified principles of UG are indeed characteristic features of FL). Of course, hoping that this is so and showing that it might be/is are two very different things. The MP research agenda is to make good on this. Chomsky’s specific idea is that Merge and some reasonable computational principles are all that one needs. I am less sanguine that this is all that one needs, but I believe that a case can be made that this gets one pretty far. At any rate, note that most of this work is theoretical and it is not clear that it makes immediate contact with novel linguistic data (except, of course, in the sense that it derives GB principles/laws that are themselves empirically motivated (though recall that these are presupposed rather than investigated)). And this makes for a different kind of inquiry than the one that linguists typically pursue. It worries about finding natural more basic principles and showing how these can be deployed to derive the basic features of FL. So a lot more theoretical deduction and a lot less (at least initially) empirical exploration.

Note, incidentally, that in this context, Chomsky’s speculations about Martians and his disagreement with Coon is a fanciful and playful way of making an interesting point. If FL’s basic properties derive from the fact that it is a well designed computational system (its main properties follow from generic features of computations), then we should expect other well designed computational systems to have similar properties. That is what Chomsky is speculating might be the case.

So, why is Chomsky (and MP work more generally) out of the mainstream? Because mainstream linguistics is (and has always been IMO) largely uninterested in the mentalist conception of language that has always motivated Chomsky’s view of language. For a long time, the difference in motivations between Chomsky and the rest of the field was of little moment. With MP that has changed. The MP project only makes sense in a mentalist setting and invites decidedly non-philological projects without direct implications for further philological inquiry. This means that the two types of linguistics are parting company. That’s why many have despaired about MP. It fails to have the crossover appeal that prior syntactic theory had. MHT's survey of the lay of the linguistic land accurately reflects this, IMO.

Is this a bad thing? Not necessarily, intellectually speaking. After all, there are different projects and there is no reason why we all need to be working on the same things, though I would really love it if the field left some room for the kind of theoretical speculation that MP invites.

However, the divergence might be sociologically costly. Linguistics has gained most of its extramural prestige from being part of the cog-neuro sciences. Interestingly, MP has generated interest in that wider world (and here I am thinking cog-neuro and biology). Linguistics as philology is not tethered to these wider concerns. As a result, linguistics in general will, I believe, become less central to general intellectual life than it was in earlier years when it was at the center of work in the nascent cognitive and cog-neuro sciences. But I could be wrong. At any rate, MHT is right to observe that Chomsky’s influence has waned within linguistics proper. I would go further. The idea that linguistics is and ought to be part of the cog-neuro sciences is, I believe, a minority position within the discipline right now. The patron saint of modern linguistics is not Chomsky, but Greenberg. This is why Chomsky has become a more marginal figure (and why MH sounds so delighted). I suspect that down the road there will be a reshuffling of the professional boundaries of the discipline, with some study of language of the Chomsky variety moving in with cog-neuro and some returning to the language departments. The days of the idea of a larger common linguistic enterprise, I believe, are probably over.


[1]I find that this is sometimes hard to open. Here is the url to paste in:
https://dlc.hypotheses.org/1269 

[2]I should add that I have a syntax textbook that puts paid to the idea that Chomsky’s basic current ideas cannot be explicated in one. That said, I assume that what MHT intends is that Chomsky’s views are not standard text book linguistics anymore. I agree with this, as you will see below.
[3]This was and is still the main method of linguistic investigation. FoLers know that I have long argued that PoS style investigations are different in kind from the comparative methods that are the standard and that when applicable they allow for a more direct view of the structure of FL. But as I have made this point before, I will avoid making it here. For current purposes, it suffices to observe that whatever the merits of PoS styles of investigation, these methods are less prevalent than the comparative method is.
[4]MHT thinks that Chomsky largely agrees with anti UG critics in “rejecting universal grammar” (1). This is a bit facile. What Chomsky rejects is that the kinds of principles we have identified as characteristic of UG are linguistically specific. By this he intends that they follow from more general principles. What he does not do, at least this is not how I read him, is reject the principles of UG as targets of explanation. The problem with Evans and Levinson and Ibbotson and Tomasello is that their work fails to grapple with what GG has found in 60 years of research. There are a ton of non-trivial Gish facts (laws) that have been discovered. The aim is to explain these facts/laws, and ignoring them or not knowing anything about them is not the same as explaining them. Chomsky “believes” that language has properties that previous work on UG has characterized. What he is questioning is whether these properties are fundamental or derived. The critics of UG that MHT cites have never addressed this question so they and Chomsky are engaged in entirely different projects.
            Last point: MHT notes that neophytes will be confused about all of this. However, a big part of the confusion comes from people telling them that Chomsky and Evans/Levinson and Ibbotson/Tomasello are engaged in anything like the same project.
[5]Let me repeat for the record, that one can do MP and presuppose some conception of FL other than GB. IMO, most of the different “frameworks” make more or less the same claims. I will stick to GB because this is what I know best and MP indeed has targeted GB conceptions most directly.
[6]Or, more accurately, it aims to preserve most of it, just as General Relativity aimed to preserve most of Newtonian mechanics.

Wednesday, March 5, 2014

Minimalism's program and its theories

Is there any minimalist theory and if so what are its main tenets? I ask this because I have recently been reading a slew of papers that appear to treat minimalism as a theoretical “framework,” and this suggests that there are distinctive theoretical commitments that minimalist analyses make that render them minimalist (just as there are distinctive assumptions that make a theory GBish, see below). What are these and what makes these commitments “minimalist”? I ask this for it starts to address a “worry” (not really, as I don’t worry much about these things) that I’ve been thinking about for a while, the distinction between a program and a theory. Here’s what I’ve been thinking.

Minimalism debuted in 1993 as a program, in Chomsky’s eponymous paper. There is some debate as to whether chapter 3 of the “Black Book” (BB) was really the start of the Minimalist Program (MP) or whether there were already substantial hints about the nature of MP earlier on (e.g. in what became chapters 1 and 2 of BB).  What is clear is that by 1993, there existed a self-conscious effort to identify a set of minimalist themes and to explore these in systematic ways. These tropes divided into at least two kinds, when seen in retrospect.[1]

First, there was the methodological motif. MP was a call to critically re-examine the theoretical commitments of earlier theory, in particular GB.[2] The idea was to try and concretize methodological nostrums like “simple, elegant, natural theories are best” in the context of then extant syntactic theory. Surprisingly (at least to me)[3], Chomsky showed that these considerations could have a pretty sharp bite in the context of mid 90s theory. I still consider Chomsky’s critical analysis of the GB levels as the paradigm example of methodological minimalism. The paper shows how conceptual considerations impose different burdens of proof wrt the different postulated levels. Levels like PF (which interfaces with the sound systems (AP)) or LF (which interfaces with the belief (CI) systems) need jump no empirical hurdles (they are “virtually conceptually necessary”), in contrast to internal levels like DS and SS (which require considerable empirical justification). In this instance, methodological minimalism rests on the observation that whereas holding that grammars interface with sound and meaning is a truism (and has been since the dawn of time), postulating grammar internal levels is anything but. From this it trivially follows that any theory that postulates levels analogous to DS and SS faces a high burden of proof. This reasoning is just the linguistic version of the anodyne observation that saying anything scientifically non-trivial requires decent evidence. In effect, it is the observation that PF and LF are very uninteresting levels while DS and SS are interesting indeed.

One of the marvels, IMO, of these methodological considerations is that they led rather quickly to a total reconfiguration of UG (eliminating DS and SS from UG is a significant theoretical step) and induced a general suspicion of grammar internal constructs beyond the suspect levels. In addition to DS and SS the 93 paper cast aspersions on traces (replacing them with copies), introduced feature checking, and suggested that government was a very artificial primitive relation whose central role in the theory of grammar called for serious reconsideration.

These themes are more fully developed in BB’s chapter 4, but the general argumentative outlines are similar to what we find in chapter 3. For example, the reasoning developing Bare Phrase Structure has a very similar structure to that concerning the elimination of DS/SS. It starts with the observation that any theory of grammar must have a combination operation (merge) and then goes on to outline what is the least we must assume concerning the properties of such an operation given widely accepted facts about linguistic structures. The minimal properties require little justification. Departures from them do. The trick is to see how far we can get making only anodyne assumptions (e.g. grammars interface with CI/AP, grammars involve very simple rules of combination) and then requiring that what goes beyond the trivial be well supported before being accepted. So far as I can see, there should be nothing controversial in this form of argument or the burdens it places on theory, though there has been, and continues to be, reasonable controversy about how to apply it in particular cases.[4]

However, truth be told, methodological minimalism is better at raising concerns than delivering theory to meet them. So, for example, a grammar with Merge alone is pretty meager. Thus, to support standard grammatical investigation, minimalists have added technology that supplements the skimpy machinery that methodological minimalism motivates.

A prime example of such is the slew of locality conditions minimalists have adopted (e.g. minimality and phase impenetrability) and the feature inventories and procedures for checking them (Spec-X0, AGREE via probe-goal) that have been explored. Locality conditions are tough to motivate on methodological grounds. Indeed, there is a good sense in which grammars that include locality conditions of various kinds and features of various flavors licensed by different feature checking operations are less simple than those that eschew these. However, to be even mildly empirically adequate any theory of grammar will need substantive locality conditions of some kind. Minimalists have tried to motivate them on computational rather than methodological grounds. In particular, minimalists have assumed that bounding the domain of applicable operations is a virtue in a computational system (like a grammar) and so locality conditions of some variety are to be expected to be part of UG. The details, however, are very much open to discussion and require empirical justification.

Let me stress this. I have suggested above that there are some minimalist moves that are methodological defaults (e.g. no DS/SS, copies versus traces, some version of merge). The bulk of current minimalist technology, however, does not fall under this rubric. Its chief motivations are computational and empirical. And here is where we move from minimalism as program to minimalism as theory. Phase theory, for example, does not enjoy the methodological privileges of the copy theory. The latter is the minimal way of coding for the evident existence of non-local dependencies. The former is motivated (at best) in terms of the general virtues of local domains in a computational context and the specific empirical virtues of phase based notions of locality. Phase Theory moves us from the anodyne to the very interesting indeed. It moves us from program to theory, or, more accurately, theories, for there are many ways to realize the empirical and computational goals that motivate phases.

Consider an example, e.g. choosing between a minimalist theory that includes the first, more local version of the phase impenetrability condition (PIC1) and one that includes the second, more expansive version (PIC2). The latter is currently favored because it fits better with a probe-goal technology given data like inverse nominative agreement in Icelandic quirky case clauses. But this is hardly the only technology available, and so the decision in favor of this version of the PIC is motivated neither on general methodological grounds nor on broadly computational ones. It really is an entirely empirical matter: how well does the specific proposal handle the relevant data? In other words, lots of current phase theory is only tangentially related to the larger minimalist themes that motivate the minimalist program. And this is true for much (maybe most) of what gets currently discussed under the rubric of minimalism.

Now, you may conclude from the above that I take this to be a problem. I don’t. What may be problematic is that practitioners of the minimalist art appear to me not to recognize the difference between these different kinds of considerations. So for example, current minimalism seems to take Phases, PIC2, AGREE under Probe-Goal, and Multiple Spell Out (MSO) as defining features of minimalist syntax. A good chunk of current work consists in tweaking these assumptions (which heads are phases? is there multiple agree? must probes be phase heads? are the heads relevant to AP MSO identical to CI MSO? etc.) in response to one or another recalcitrant data set. Despite this, there is relatively little discussion (I know of virtually none) of how these assumptions relate to more general minimalist themes, or indeed to any minimalist considerations. Indeed, from where I sit, though the above are thought of as quintessentially minimalist problems, it is completely unclear to me how (or even if) they relate to any of the features that originally motivated the minimalist program, be they methodological, conceptual or computational. Lots of the technology in use today by those working in the minimalist “framework” is different from what was standard in GB (though lots only looks different, phase theory, for example, being virtually isomorphic to classical subjacency theory), but modulo the technology, the proposals have nothing distinctively minimalist about them. This is not a criticism of the research, for there can be lots of excellent work that is orthogonal to minimalist concerns. However, identifying minimalist research with the particular technical questions that arise from a very specific syntactic technology can serve to insulate current syntactic practice from precisely those larger conceptual and methodological concerns that motivated the minimalist program at the outset.

Let me put this another way: one of the most salutary features of early minimalism is that it encouraged us to carefully consider our assumptions. Very general assumptions led us to reconsider the organization of the grammar in terms of four special levels and reject at least two, and maybe all, level organized conceptions of UG. It led us to rethink the core properties of phrase structure and the relation of phrase structure operations to displacement rules. It led us to appreciate the virtues of the unification of the modules (on both methodological and Darwin’s Problem grounds) and to replace traces (and, for some (moi), PRO) with copies. It led us to consider treating all long distance dependencies, regardless of their morphological surface manifestations, in terms of the same basic operations. These moves were motivated by a combination of considerations. In the early days, minimalism had a very high regard for the effort of clarifying the proffered explanatory details. This was extremely salutary and, IMO, it has been pretty much lost. I suspect that part of the reason for this has been the failure to distinguish the general broad concerns of the minimalist program from the specific technical features of different minimalist theories, thus obscuring the minimalist roots of our theoretical constructs.[5]

Let me end on a slightly different note. Programs are not true or false. Theories are. Our aim is to find out how FL is organized, i.e. we want to find out the truth about FL. MP is a step forward if it helps promote good theories. IMO, it has. But part of minimalism’s charm has been to get us to see the variety of arguments we can and should deploy and how to weight them. One aim is to isolate the distinctive minimalist ideas from the others, e.g. the more empirically motivated assumptions. To evaluate the minimalist program we want to investigate minimalist theories that build on its leading ideas. One way of clarifying what is distinctively minimalist might be by using GB as a point of comparison. Contrasting minimalist proposals with their GBish counterparts would allow us to isolate the distinctive features of each. In the early days, this was standard procedure (look at BB’s Chapter 3!). Now this is rarely done. I suggest we start re-integrating the question “what would GB say” (WWGBS) back into our research methods (here) so as to evaluate how and how much minimalist considerations actually drive current theory. Here’s my hunch: much less than the widespread adoption of the minimalist “framework” might lead you to expect.



[1] Actually there is a third: in addition to methodological and computational motifs there exist evolutionary considerations stemming from Darwin’s Problem. I won’t discuss these here.
[2] Such methodological minimalism could be applied to any theory. Not surprisingly, Chomsky’s efforts were directed at GB, but his methodological considerations could apply to virtually any extant approach.
[3] A bit of a confession: I originally reacted quite negatively to the 93 paper, thinking that it could not possibly be either true or reasonable. What changed my mind was an invitation to teach a winter course in the Netherlands on syntactic theory during the winter of 93. I had the impression that my reaction was the norm, so I decided to dedicate the two weeks I was teaching to defending the nascent minimalist viewpoint. Doing this convinced me that there was a lot more to the basic idea than I had thought. What really surprised me is that taking the central tenets even moderately seriously led to entirely novel ways of approaching old phenomena, including ACD constructions, multiple interrogation/superiority, and QR. Moreover, these alternative approaches, though possibly incorrect, were not obviously incorrect, and they were different. To discover that the minimalist point of view could prove so fecund given what appear to be such bare bones assumptions still strikes me as nothing short of miraculous.
[4] As readers may know, I have tried to deploy similar considerations in the domains of control and binding. This has proven to be very controversial but, IMO, not because of the argument form deployed but due to different judgments concerning the empirical consequences. Some find the evidence in favor of grammar internal formatives like PRO to meet the burden of proof requirement outlined above. Some do not. That’s a fine minimalist debate.
[5] I further suspect that the field as a whole has tacitly come to the conclusion that MP was actually not a very good idea, but this is a topic for another post.

Wednesday, October 3, 2012

Universal Grammar



Everyone believes that humans have a Universal Grammar (UG). Why?  Because it is a one step conclusion licensed by a trivial (and it is trivial) inference from one obvious factual premise (viz. humans are linguistically capable beings) and one major premise (viz. if humans are linguistically capable then there are some mental properties on which this capacity rests).  As UG is the name we give to these mental properties there cannot be a real debate about whether humans have a UG.  What has been contentious is what UG looks like. In what follows I discuss two features that generative linguists attribute to it: (i) UG is exclusive to humans and (ii) UG is linguistically specific.  Chomsky, for example, has claimed both properties for UG. Let’s consider these in turn.

First what does “species-specific” mean? One thing it means is that UG is a property of humans the way that bipedalism or opposable thumbs are. Human genetics ensures that individual humans normally (i.e. exempting pathological cases) come equipped with the capacity to walk erect, to grab stuff and to acquire and use a language. So, just as tigers are biologically built to have stripes, and salmon to return to their birthplaces to spawn, so too humans come biologically equipped to develop linguistic facility. The empirical basis for this observation is overwhelming and not at all subtle. Anyone who observes language acquisition in the wild cannot fail to notice that human children (regardless of socio-economic status, religious affiliation, birth marks, head size, overall IQ or anything else short of pathology) when reared in a linguistic environment come to acquire linguistic competence in the native language they are exposed to. This observational truism leads smoothly to a related truism: that the capacity to develop such linguistic facility is due to mental equipment that individual humans share simply in virtue of being human.

These truisms conceded, even if UG is species specific to humans, it does not follow that UG is exclusively a human endowment. After all, the observation that humans come genetically packaged with a four chamber heart does not imply that other animals do not. Nonetheless, as a matter of fact, it appears that whatever humans have that allows them to develop linguistic competence is not widely shared. Concretely, so far as we can tell (and take it from me, many investigators have tried to tell!), nothing does language like humans do. Apes don’t. Dolphins don’t. Parrots don’t. Or at least they don’t obviously. While it takes considerable effort to show that other animals show language-like behavior, nobody will win a Nobel for demonstrating that 5 year olds (or even 2 year olds) talk. Thus, it’s a safe bet that whatever is going on in the human case is qualitatively different from what we see in other animals.

This said, it is worth noting that the program of describing UG wouldn’t change much were it established that other animals had one. Depending on which animals it was, it might raise additional questions of how these UGs evolved (other apes? look for a common ancestor; apes and dolphins? look for language as a correlate of brain size; birds and bees? who knows). If other animals talked we could (at least in principle) investigate UG by studying how they acquired and used language, though the difficulties of studying UG in this way should not be underestimated. The biggest bonus would likely arise if we decided to treat these non-human talkers as possible targets for the kinds of experiments that are morally and legally forbidden on humans (cut them up, put them in Skinner boxes), though if they really talked like us we might be squeamish about treating them the way we treat white mice, chimps and cute bunny rabbits. Then again, considering the unquenchable (bloodthirsty?) desire for pure knowledge that homo sapiens regularly displays, even pleading animals might not be safe from our inquisitive minds. However, excluding such scenarios, finding another species that talked just like we do would not substantially change the research problem. In fact, it would not make it appreciably different from studying UG by investigating the grammatical properties of different languages (English, Chinese, ASL etc.), something that generative grammarians already do in spades. So though it appears as a matter of fact that UG is exclusively a feature of humans, if the aim is to describe UG it does not much matter that this is so.

Let’s now consider the suggestion that UG is a linguistically specific capacity, to be understood as the claim that UG’s cognitive mechanisms are sui generis, different from the cognitive mechanisms at work in other areas of cognition. There is a stronger and a weaker version of this claim. The stronger one is that all (or most or many) of the cognitive powers that go into linguistic competence differ from those that support other cognitive capacities (e.g. the capacity to identify objects, recognize and “read” other minds, understand causal interactions, navigate home, keep track of where and when you hid your food, etc.). The weak one is that UG enjoys at least one cognitively distinctive feature.  Current speculation among generativists leans towards the weak claim. Much current research in syntax (especially that which flies under the flag of the Minimalist Program) aims to reduce the linguistically specific mechanisms of UG to a small core.  Chomsky, for example, has argued that the only real distinctive feature of UG is (hierarchical) recursion (a product of the operation Merge), the property whereby the outputs of rules can be treated as inputs to these same rules. This allows for the generation of endlessly large linguistic objects, a fact that sits well with the observation that there appears to be no upper bound on the size of admissible phrases and sentences in natural languages. 
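Just to make the recursion point vivid, here is a minimal sketch (again my own, purely illustrative; the “lexical items” are invented): a single binary operation whose outputs are legitimate inputs to further applications of that same operation suffices to build hierarchically structured objects with no upper bound on size.

```python
# Merge as bare binary combination: the output of merge can itself be re-merged.
# (Illustrative only; the lexical items below are made up.)

def merge(a, b):
    """Combine two syntactic objects into a new, hierarchically structured object."""
    return (a, b)

# Atomic items
the, claim, that, mary, left, said = "the", "claim", "that", "Mary", "left", "said"

# Outputs feed back in as inputs:
clause = merge(mary, left)                              # ('Mary', 'left')
phrase = merge(the, merge(claim, merge(that, clause)))
print(phrase)  # ('the', ('claim', ('that', ('Mary', 'left'))))

# Nothing in the operation itself bounds the size of what it can build:
bigger = clause
for _ in range(3):
    bigger = merge(mary, merge(said, merge(that, bigger)))
print(bigger)  # nested pairs spelling out "Mary said that Mary said that ... Mary left"
```

The point is only that hierarchy and unboundedness come for free once one operation can reapply to its own output; everything else about FL would then need some further (ideally generic) motivation.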

How reasonable is this second claim? To my mind, it is almost ineluctable for the following reasons. First, if as discussed above, humans have UG but other animals do not, then one plausible reason for this is that humans have at least one mental power that other animals don’t. The alternative is that human cognition is not qualitatively different from that of other animals but only quantitatively so; all animals share the same basic mechanisms, just that humans have more horse-power under the cranial hood. This option is a favorite of those excited by general learning theories. They tend to be of an empiricist bent (something we will discuss in a later post). On this view, the same cognitive powers are used in every area of cognition, including language. There are two kinds of puzzles this empiricist conception runs into in the domain of language. First, the species specificity problem noted above: why do only humans talk? The second is the separability of linguistic competence from other forms of cognition. It appears that linguistic competence is independent of most other kinds of cognitive competence, e.g. IQ, face recognition, etc. Why so if all involve the same general all purpose cognitive powers? Were linguistic competence a product of general cognitive factors it would be natural to expect that success in acquiring linguistic competence tightly correlated with other cognitive achievements. But it appears that it does not; both the rich and the poor, the high IQed and the low, those with good memories and bad all seem to acquire linguistic competence at roughly the same rate and roughly the same way.

The second reason for thinking that UG involves at least one special cognitive feature is that it would be quite surprising biologically if it did not.  Many animals have (almost) unique capacities (think echo location in bats, or navigation in ants).  These capacities supervene on distinctive cognitive powers (e.g. in ants, the built in capacity to form a compass oriented map using sun position as anchor). Why should it be any different with humans and language? We are not surprised to find that other animals are specifically built to do the special things they do, why should humans be any different? 

Third, the alternative, that all learning relies on general purpose mechanisms, is as coherent as the idea that all perception relies on a general purpose sensing mechanism. Just as seeing involves mental mechanisms and cognitive apparatus different from hearing or smelling or touching or tasting, so too learning language is different from learning faces or learning to recognize objects or fixing causal interactions. Gallistel and King (in Memory and the Computational Brain: 221) make the point well:
…a very general truth about learning mechanisms [is] they do not learn universal truths.  The relevant universal truths are built into the structure of a learning mechanism. Indeed, absent some built-in relevant universal truths and the strong constraints they place on the form of the representation that can be extracted from a given experience, learning would not be possible.
As linguistic representations, from all that we currently know, are formally quite different from other cognitive objects, it would be surprising if their peculiar properties did not require some built-in, language-specific mental mechanisms to allow for their acquisition and use, just as in the case of honey-bees and ants with respect to navigation.

Many resist these conclusions about UG. The idea that UG involves at least one linguistically specific feature is considered particularly controversial. But for the general reasons noted above, I can’t see why anyone would assume anything different. This judgment is reinforced once one takes a look at the linguistic competence humans have in more detail. Linguistic objects are very distinctive. UG must be able to accommodate these distinctive properties. If this means that UG invokes special cognitive powers, we should not be in the least surprised, nor perturbed. That’s the way biology works.