Faculty of Language: parameters

Tuesday, December 13, 2016

Domesticating parameters

I have a confession to make: I am not a fan of parameters. I have come to dislike them for two reasons. First, they don’t fit well with what I take to be Minimalist Program (MP) commitments. Second, it appears that the evidence that they exist is relatively weak (see here and here for some discussion). Let me say a bit more about each point.

First, the fit with MP: as Chomsky has rightly emphasized, the more we pack into UG (the linguistically proprietary features of FL), the harder it is to solve Darwin’s Problem in the domain of language. This is a quasi-logical point, and not really debatable. So, all things being equal, we would like to minimize the linguistically specific content of FL. Parameters are poster children for this sort of linguistically specific information. So any conception of FL that comes with an FL-specified set of ways that Gs can differ (i.e. a specification of the possible options along which they vary) comes at an MP cost. This means that the burden of proof for postulating FL internal parameters is heavy, and their postulation should be resisted unless we are faced with overwhelming evidence that we need them.[1]

This brings us to the second point: it appears that the evidence for such FL internal parameters is weak (or so my informants tell me (when I do fieldwork among variationists)). The classical evidence for parameters comes from the observation that Gs differ wholesale, not just retail. What I mean by this is that surface differences come in large units. The classic example is Rizzi’s elegant proposal linking the fixed subject constraint, pro-drop and subject inversion. What made these proposals more than a little intriguing is that they reduced what looked like very diverse G phenomena to a single source that, further, appeared to be fixable on the basis of degree-0 PLD. This made circumscribing macro-variation via parameters very empirically enticing. The problem was that proposals tying this variation to differences in the setting of a single parameter proved to be empirically problematic.

What was the main problem? It appears that we were able to find Gs that dissociated each of the relevant factors. So we could get the absence of fixed subject condition effects without subject inversion or pro-drop. And we could find pro-drop without subject inversion. This is puzzling if these surface differences all reflect the setting of a single parameter value.
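To make the single-parameter prediction concrete, here is a minimal Python sketch. It assumes, as a simplification, that each of the three properties is a yes/no trait and that one binary parameter sets all three at once; the trait names and the example grammar are hypothetical labels, not a formalization taken from Rizzi’s work. Under that assumption only 2 of the 8 logically possible trait bundles should be attested, so any G showing one of the other six (e.g. pro-drop without subject inversion) counts as a dissociation.

```python
from itertools import product

# Hypothetical labels for the three properties in Rizzi's cluster.
TRAITS = ("pro_drop", "subject_inversion", "no_fixed_subject_effects")

def predicted_bundles():
    """Bundles allowed if one binary parameter fixes every trait at once."""
    return {tuple([value] * len(TRAITS)) for value in (True, False)}

def all_bundles():
    """Every logically possible combination of the three traits."""
    return set(product((True, False), repeat=len(TRAITS)))

def is_dissociated(bundle):
    """True if a grammar's trait bundle falls outside the single-parameter prediction."""
    return bundle not in predicted_bundles()

# A hypothetical G with pro-drop but no subject inversion.
g = (True, False, True)
print(is_dissociated(g))                             # True
print(len(predicted_bundles()), len(all_bundles()))  # 2 8
```

Note that this checks co-occurrence at a single point in time, the synchronic diagnostic; it is silent on whether the traits change together.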

I used to be very moved by these considerations, but a recent little paper on domestication has started me rethinking whether there might be a better argument for parameters, one that focuses less on synchronic facts about how Gs differ and more on how Gs change over time. Let me lay out what I have in mind, but first I want to take a short detour into the biology of domestication, because what follows was prompted by an article on animal domestication (here).

This article illustrates the close conceptual ties between modern P&P theories and biology/genetics. This connection is old news and leaders in both fields have noted the links repeatedly over the years (think Jacob, Chomsky). What is interesting for present purposes is how domestication has the contours of a classic parameter setting story.

It seems that Darwin was the first to note that domestication often resulted in changes not specifically selected for by the breeder (2):

Darwin noticed that, when it came to mammals, virtually all domesticated species shared a bundle of characteristics that their wild ancestors lacked. These included traits you might expect, such as tameness and increased sociability, but also a number of more surprising ones, such as smaller teeth, floppy ears, variable colour, shortened faces and limbs, curly tails, smaller brains, and extended juvenile behaviour. Darwin thought these features might have something to do with the hybridisation of different breeds or the better diet and gentler ‘conditions of living’ for tame animals – but he couldn’t explain how these processes would produce such a broad spectrum of attributes across so many different species.

So we select for tameness and we get floppy ears. Darwin’s observation was strongly confirmed many years later by the dissident Soviet biologist Dmitri Belyaev, who domesticated silver foxes. More specifically (5):

He selected his foxes based on a single trait: tameness, which he measured by their capacity to tolerate human proximity without fear or aggression. Only 5 per cent of the tamest males and 20 per cent of the tamest females were allowed to breed.

Within a few generations, Belyaev started noticing some odd things. After six generations, the foxes began to wag their tails for their caretakers. After 10, they would lick their faces. They were starting to act like puppies. Their appearance was also changing. Their ears grew more floppy. Their tails became curly. Their fur went from silver to mottled brown. Some foxes developed a white blaze. Their snouts got shorter and their faces became broader. Their breeding season lengthened. These pet foxes could also read human intentions, through gestures and glances.

So, selecting for tameness, narrowly specified, brought in its wake tail wagging, floppy ears etc. The reasonable conclusion from this large scale change in traits is that they are causally linked. As the piece puts it (5):

What the Belyaev results suggest is that the manifold aspects of domestication might have a common cause in a gene or set of genes, which occur naturally in different species but tend to be selected out by adaptive and environmental pressures.

There is even a suggested mechanism: something called “neural crest cells.” But the details do not really matter. What matters is the reasoning: things that change together do so because of some common cause. In other words, common change suggests common cause. This is related to (but is not identical to) the discussion about Gs above. That discussion looked at whether G traits necessarily co-occur at any given time. The discussion here zeroes in on whether, when they change, they change together. These are different diagnostics. I mention this because the fact that the traits are not always found together does not imply that they would not change together.

The linguistic application of this line of reasoning is found in Tony Kroch’s diachronic work. He argued that tracking the rates of change of various G properties is a good way of identifying parameters.[2] However, what I did not appreciate when I first read this is that the fact that features change together need not imply that they must always be found together. Here’s what I mean.

Think of dogs. Domestication brings with it floppy ears. So select for approachability and you move from wild foxes with pointy ears to domesticated foxes with floppy ears. However, this does not mean that every domesticated dog will have floppy ears. This feature can be detached from the others (and breeders can do this while holding many other traits constant), even though, absent efforts to detach it, the natural change will be toward floppy ears. So we can select against a natural trait even if the underlying relationship is one that links it to the others. As the quote above puts it: traits that occur naturally together can be adaptively selected out.

In the linguistic case, this suggests that even if a parameter links some properties together, so that if one changes they all do, we need not find them together in any one G. What we find at any one time will be due to a confluence of causes, some of which might obscure otherwise extant causal dependencies.

So where does this leave us? Well, I mention all of this because though I still think that MP considerations argue against FL internal parameters, I don’t believe that the observation that Gs can treat these properties atomically is a dispositive argument against their being parametrically related. Changing together looks like a better indicator of parametric relatedness than living together.

Last point: common change implies common cause. But common cause need not rest on there being FL internal parameters. Such parameters are one way of causally linking seemingly disparate factors; it is not clear that they are the only or even the best way. What made Rizzi’s story so intriguing (at least for me) is that it tied together simple changes visible in main clauses with variation in embedded clause effects that are not visible in the PLD. So one could infer from what is available in the PLD what will be true of the LD in general. These cases are where parameter thinking really pays off, and they still seem to be pretty thin on the ground, as we might expect if indeed FL has no internal parameters.

[1] There is another use of ‘parameter’ where the term is descriptive and connotes the obvious fact that Gs differ. Nobody could (or does) object to parameters in this sense. The usage that challenges MP is the one wherein FL prescribes a (usually finite) set of options that (usually finitely) circumscribe the number of possible Gs. Examples include the pro-drop parameter, the V raising parameter and the head parameter.

[2] See his “Reflexes of grammar in patterns of language change.” You can get this from his website here. Here’s a nice short quote summing up the logic: “…since V to I raising in English is lost in all finite clauses with tensed main verbs and at the same rate, there must be a factor or factors which globally favor this loss” (32).
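Kroch’s paper cited in [2] is usually associated with the Constant Rate Hypothesis: the rise of an innovative form in each context is modeled as a logistic curve, and contexts driven by one underlying grammatical change share the same slope (rate) even when their intercepts (how often the form surfaces in that context) differ. The Python sketch below illustrates only that logic, under stated assumptions: the two contexts, the slope of 0.03 per year and the intercepts are invented numbers, not Kroch’s data. Because the logit of a logistic curve is linear in time, a least-squares fit of logit(share) against time recovers each context’s slope.

```python
import math

def logistic(t, intercept, slope):
    """Share of the innovative form at time t under a logistic model."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * t)))

def logit(p):
    """Inverse of the logistic: log-odds of a share p in (0, 1)."""
    return math.log(p / (1.0 - p))

def fitted_slope(times, shares):
    """Ordinary least-squares slope of logit(share) regressed on time."""
    ys = [logit(p) for p in shares]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(ys) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    var = sum((t - mean_t) ** 2 for t in times)
    return cov / var

# Invented, illustrative parameters: one shared rate, two context intercepts.
SHARED_SLOPE = 0.03
contexts = {"context_A": -2.0, "context_B": -4.0}
years = list(range(0, 200, 10))

slopes = {}
for name, intercept in contexts.items():
    shares = [logistic(t, intercept, SHARED_SLOPE) for t in years]
    slopes[name] = fitted_slope(years, shares)
    print(name, round(slopes[name], 4))
```

Both contexts come out with the shared slope even though, at any given year, the new form is far more frequent in one than the other. A real test would fit noisy corpus counts (typically by logistic regression) and ask whether a single shared slope fits the contexts as well as separate ones; the sketch skips that step.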

Saturday, November 12, 2016

Linguistic diversity

Here’s an interesting piece by Nick Evans on the indigenous languages of Australia. It is imbued with a sensibility concerning the study of language quite different from my own (which is partly why I found it interesting), but it also raises some questions that someone who approaches linguistic questions from my direction should find intriguing. In what follows I will discuss both points of con- and di-vergence. But before starting, let me reiterate that I found the piece intriguing and I could imagine spending quite a bit of pleasant time over several cold beers talking to Nick about his work, which is a long-winded way of saying that you should take a look at the piece for yourself.[1]

Some comments:

(1) Nick worries about a question whose utility from where I sit is not at all evident: How to distinguish a language from a dialect (see 4). This is in service of trying to establish the integrity of the Australian language family, which is in turn in service of trying to estimate how fast languages change and how old language families are. The idea that Nick moots is that the Australian language family is 60,000 years old and that this raises the possibility that the emergence of the Faculty of Language is much older still. In other words, Nick takes the dating of the language family question as bearing on the emergence of the FL question. Clearly, the second one is of interest to devotees of the Minimalist Program.

However, I am not sure that the question is nearly as well posed as Nick takes it to be. I do not see that there is a principled way of distinguishing languages from dialects. The one that he proposes is the following: “a language is something that is distinct enough to need its own distinctive descriptive grammar” (5). But what does ‘distinct enough’ mean? Darn if I know. For me a G is a mental construct. It is almost certain that no two Gs are the same (i.e. no two people have exactly the same G). So the question is one of more or less. But so far as I know, this becomes a question of G overlap, and the degree of overlap will not be precise. Yet we need some measure of overlap to say how different two Gs are, and hence to measure G difference and, from that, G change. Maybe such measures exist, but I know of none, and unless one specifies some dimensions of similarity (which may exist (recall, I am no expert on these matters)), the rate of change issue becomes hard to specify.

This said, if we could establish a rate of G change then this might be useful in establishing how old FL is, and given that the only evidence we have for when it emerged is indirect (the emergence of complex cultural artifacts (i.e. the big bang)) this would be useful. That said, I doubt that it would significantly alter the backdrop for Darwin’s Problem as it applies to language. The big fact is that FL appeared more or less in one piece and it has not evolved since. There is no indication from what Nick writes that these older Gs are qualitatively different from contemporary ones. This means that the FL required to acquire them is effectively the same as the one that we still possess. And if that is the case then the logic of Darwin’s Problem as it applies to MP remains unchanged. So far as someone with my interests is concerned, that is enough.

Let me add a question before moving on: is there a measure of G change (or, more ambitiously, of the rate of G change) out there? Note that this would be a measure of how Gs of the same language change. This seems to require reifying languages so that two Gs can be Gs of the same language even if they differ in detail. So far as I know, modern GG has only an inchoate qualitative purchase on the notion of a language, and it has not been important to make it more precise. In fact, the notion is part of a dispensable idealization concerning ideal speaker-hearers. Nick’s project requires theoretically grounding an informal notion that suffices for most GG inquiries. I am skeptical, but wish him luck.

(2) Nick raises a second question: why are there so many languages anyhow (8ff)? He asks this in order to focus efforts on identifying “the social processes that drive differentiation.” I also find this question interesting, but in a slightly different way. From my perspective, Gs are products of three factors: (i) the structure of FL/UG, (ii) the nature of the PLD (the input data that the LAD uses to construct its G given the options FL/UG allows) and (iii) the learning theory that LADs use to organize the PLD and to construct a particular G given (i) and (ii).[2] The question I find interesting is why FL/UG makes so many Gs available. Why not simply hardwire in one G and be done with it? Why is FL/UG so open textured and environmentally sensitive (i.e. open to the effects of PLD)? Note that FL/UG could have specified one G for the species (say, all Gs have more or less the syntax of “English”). This is roughly what happens in some songbirds: all birds of a species sing the same song. Why isn’t this what happened for language? In P&P terms this would mean an FL/UG with no parameters. Why don’t we have this? And does the fact that we don’t have this tell us anything interesting about FL/UG?

There are several possibilities. Mark Baker has offered a kind of evolutionary rationale. He thinks that Gs are codes that enable speakers of the same language to conceal information from outsiders (here:8):

Suppose that the language faculty has a concealing function as well as a revealing function. Our language faculty could have the purpose of communicating complex propositional information to collaborators while concealing it from rivals that might be listening in.

I say evolutionary, for I am assuming that it is because concealment can confer selective advantages that we have such a code. Though the idea is ingenious, I am skeptical for the obvious reasons. This parameterized coding scheme is now species-wide, and anyone can acquire any of the coding schemes (aka Gs) if placed in the right linguistic environment. If the goal were opacity useful for segregating in-groups from out-groups, then one can imagine that a scheme making it impossible (or at least very difficult) for outlanders to acquire the code would have been a superior option. But so far as we can tell, all humans are equally adept at learning any G (i.e. any set of parameter values). Perhaps what Mark has in mind is that it is hard to learn a non-native G later in life and this suffices for whatever advantages concealment promotes. Maybe.

I have remarked before (here) that parametrization is a very curious fact (if it is a fact), one that suggests that, contrary to standard assumptions, typological differences tell us very little about the structure of FL. However, putting this to one side, it is interesting that Gs can be so different, and Nick’s question of why there is so much variation is a good one.

What’s his answer? There are social processes that drive differentiation and we need to identify these. He suggests two steps (8-9):

The first step is to see how new linguistic elements are born: new sounds, new grammatical structures, new words, new meanings. What makes the range of these more or less diverse in different groups? For example, does being multilingual add options to the pool? ...

The second step is to find how the society promotes one variant over another. It is clear that some groups have linguistic ideologies that place a high premium on harnessing linguistic means to say “Our clan is different”, “our moiety is different” and so on…

This might be right so far as it goes, but it presupposes that FL/UG allows all of these options to begin with. In other words, it asks: given that FL allows diverse Gs, what drives the specific diversity we see? Baker and I are interested in another question: why does FL allow the diversity to begin with? What’s wrong with an FL that, as it were, had no parameters at all?

Here’s my thought: an FL with fixed parameters is more biologically expensive than an open textured one. The idea is that if evolution can rely on there always being enough PLD to allow a child to acquire the local G, then there is no reason for evolution to code in the genome information that the PLD makes readily available. If fixing info in the genome is costly, then it will not be put there unless it must be. So an open textured system is what we should expect. That’s the idea.

I think that this fits pretty well with MP thinking as well. Suppose what allows FL to emerge is a small addition, say an operation like Merge (an addition that remains very stable and unchanging over time). Merge is consistent with various surface differences. Hence, so long as the non-linguistically-proprietary parts of FL together with Merge suffice to generate Gs, we should not expect more linguistically proprietary info to be biologically coded. If Merge is enough, then it’s all that we will get. Note that this suggests that MP-like systems will not likely have an FL/UG specification of a particular parameter space (see here and here for some discussion). If this can be fleshed out, then the reason we have G diversity is that fixed parameters are costly and MP takes FL to be what we get when we add only a smidgen of linguistically proprietary structure to an otherwise language-ready cognitive system. In other words, typological diversity (PLD sensitive G generation) is just what MP ordered.

(3) Nick provides a sort of antidote to my tolerance for inferring UG principles from the properties of a single G. As he puts it (12-3):

We are just coming out of half a century where generative linguistics, as inspired by the great linguist Noam Chomsky, placed great emphasis on ‘Universal Grammar’, very much seeing all languages as alike with only minor variations. Part of this emphasis meant claiming there are all sorts of imaginable design options that are simply not found in language. For example, Steven Pinker and Paul Bloom wrote, in the early 90s, that ‘‘no language uses noun affixes to express tense’’. Now clearly this is simply wrong for Kayardild. It is an example of what can go wrong, scientifically, when one extrapolates prematurely from too limited a range of cases. Now there’s nothing wrong with the scientific strategy of making strong statements to invite falsification. But what Kayardild shows us – and many other languages I could have used to illustrate the structural originality of Australian languages, in different ways – is that we really need to get out there and describe languages, as they are, to realize the full richness and diversity of how humans have colonized the design space of language through the languages they have built through use.

I say “sort of” because Nick’s observations are not couched in terms of Gs but in terms of languages, and the problems he cites have less to do with the properties of Gs than with their surface manifestations. Chomsky did not (and does not) see “all languages as alike.” What he saw/sees was/is that all I-languages are pretty much alike. Dropping the ‘I’ prefix risks confusing Chomsky with Greenberg. I can understand that if one’s interests are mainly typological and diversity is what gets you excited, then dropping the ‘I’ will seem like the best way to import Chomsky’s insights into your work. But this is a mistake (as you knew I would say). If your goal is GGish, it is not the diversity of languages that we need to investigate but the diversity of Gs, and these will only be indirectly related to the surface patterns we observe. The Pinker-Bloom example is very much a Greenbergian conception of a universal, at least as Nick takes Kayardild to refute it (it appears to concern features of overt affixes). If we are to learn about FL/UG by exploring the rich “design space of language,” then we need to keep in mind that it is I-language space we should be exploring. Moreover, when it comes to I-language space, I am less sure than Nick is that

[t]he world of languages holds more possibilities than any linguist has imagined, and Australian languages have taken the ‘design space’ in lots of rare and unusual directions, so that we’re still finding new phenomena that people hadn’t imagined before (14).

In fact, from where I sit, we have actually found relatively few new universals since the mid 1980s. If this is correct, oddly, exploring the ‘design space’ has enriched our understanding of language diversity but has left our understanding of I-language variation pretty much where it was when only a small number of languages served as linguistic model organisms.[3]

That’s it. I think that Nick has asked some interesting questions, the most interesting being why FL/UG allows G variation. We are interested in different things, but the paper was fun to read and Kayardild sounds like it can take you on a wild ride. Like I said, I’d love to have a beer with him.

[1] Thx to Kleanthes for sending me the URL.

[2] This follows Anderson, discussed a bit here.

[3] See here for a partial list. The observant reader will note that most of these are very old. It would be nice to have some candidate universals that are of more recent vintage, say discovered in the last 20 years. If my hunch is right that contributions to the list have been sparse of late, this is interesting and worth trying to understand.

Monday, December 7, 2015

How deep are typological differences?

Linguists like to put languages into groups. Some of these, as in biology, are groupings based on historical descent (Germanic vs Romance), some of long standing (Indo-European vs Ural-Altaic vs Micronesian). Some categorizations are more sensitive to morpho-syntactic form (analytic vs agglutinative), and some are tied to whether the languages got to where they are spoken by tough guys who rode little horses over long distances (Finno-Ugric (and Basque?)). There is a tacit agreement that these groupings are typologically significant and hence linguistically significant as well. In what follows, I want to query the ‘hence.’ I would like to offer a line of argument that concludes that typological differences tell us nothing about FL. Or, to put this another way, the structure of FL in no way reflects the typological differences that linguists have uncovered. Or, to put this in reverse, typological distinctions among languages have no FL import. If this is correct, then typology is not a good probe into the structure of FL. And so if your interest is the structure of FL (i.e. if limning the fine structure of FL is how you measure linguistic significance), you might be well advised to study something other than typology.

Before proceeding let me confess that I am not all that confident about the argument that follows. There are several reasons for this. First, I am unsure that the premises are as solid as I would like them to be. As you will see, it relies on some semi-evolutionary speculation (and we all know how great that is, not!). Second, even given the premises, I am unsure that the logic is airtight. However, I think that the argument form is interesting and it rests on widely held minimalist premises (based on a relatively new and, IMO, very important observation regarding the evolutionary stability of FL), so even if the argument fails it might tell us something about these premises. So, with these caveats, cavils, hedges and CYAs out of the way, here is the argument.

Big Fact 1: the stability of FL. Chomsky has emphasized this point recently. It is the observation that whatever change (genetic, epi-genetic, angelic) led to the re-wiring of the human brain that supports the distinctive, species-specific human linguistic facility, that change has remained intact and unchanged in the species since its biological entrance. How do we know?

We know because of Big Fact 2: any kid can learn any language and any kid learning any language does so in essentially the same way. Thus, for example, a kid from NYC raised in Papua New Guinea (PNG) will acquire the local argot just like a native (and in the same way, with the same stages, making the same kinds of mistakes etc.). And vice versa for a PNGer in NYC, despite the relative biological isolation of PNGers for a pretty long period of time. If you don’t like this pair, plug in any you would like, say Piraha speakers and German speakers or Hebrew speakers and Japanese speakers. A child’s biological background seems irrelevant to which Gs it can acquire and how it acquires them. Thus, since humans separated about 100kya (trek out of Africa and all that), FL has remained biologically stable in the species. It has not changed. That’s the big fact of interest.

Now, observe that 100k years is more than enough time for evolution to work its magic. Think of Darwin’s finches. As soon as a niche opened up, these little critters evolved to exploit it. And quickly filling niches is not reserved just for finches. Humans do the same thing. Think of lactase persistence (here). The capacity to usefully digest milk products arose with the spread of cattle domestication (i.e. roughly 5-10kya).[1] So, humans also evolutionarily track novel “environmental” options and change to exploit them at a relatively rapid rate. If 5-10k years is enough for the evolution of the digestive system, then 100k years should be enough for FL to “evolve” should there be something there to evolve. But, as we saw above, this seems to be false. Or, more accurately, Big Fact 2 implies Big Fact 1 and Big Fact 1 denies that FL has evolved in the last 100k years. In sum, it seems that once the change allowing FL to emerge occurred nothing else happened evolution wise to differentially affect this capacity across humans. So far as we can tell, all human FLs are the same.

We can add to this a third “observation,” or, more accurately, something I believe that linguists think is likely to be true though we probably only have anecdotal evidence for it. Let’s call this Big Fact 3 (understanding the slight tendentiousness of the “fact” part): kids can learn multiple first languages simultaneously and do so in the same way regardless of the languages involved.[2] So, LADs can acquire English and Hebrew (a Germanic and a Semitic language) as easily as German and Swedish (two Germanic languages), or Navajo and French or Basque and Spanish as easily as French and Spanish or… In fact, kids will acquire any two languages no matter how typologically distinct in effectively the same way. In short, typological difference has no discernible impact on the course of acquisition of two first languages. So, not only is there no ethnically-biologically based genetic pre-disposition among FLs for some Gs over others, there is not even a cognitive preference for acquiring Gs of the same type over Gs that are typologically radically different.

If these “facts” are indeed facts, the conclusion seems obvious: to the degree that we understand FL as that cognitive-neural feature of humans that underlies our capacity to acquire Gs, it is the same across all humans (in the same sense that hearts or kidneys are, i.e. abstracting from normal variation), and this implies that it has not evolved despite apparently sufficient time for doing so.[3]

This raises an obvious question: why not? Why does the process of language acquisition not care about typological differences? Or, if typological differences run deep then why have they had no impact on the FLs of people who have lived in distinct linguistic eco-niches?

Here’s one obvious answer: typological differences are irrelevant to FL. However big these differences may seem to linguists, FL sees these typologically “different” languages as all of a piece. In other words, from the point of view of FL, typological variation is just surface fluff.

Same thing, said differently: there is a difference between variation and typology. Variation is a fact: languages appear on the surface to have different properties. Typology is a mid-level theoretical construct. It is the supposition that variation comes in family types, or that variation is (at least in part) grammatically principled. The argument above does not question the fact of variation. It calls into question whether this variation is in any FL sense principled, i.e. whether the mid-level construct is FL significant. It argues that it isn’t.

Let me put this last point more positively. Variation establishes a puzzle for linguists in that kids acquire Gs that result in different surface features. So FL plus a learning theory must be able to accommodate variation. However, if the above is on the right track, then this is not because typological cleavages reflect structural fault lines (or G-attractors) in FL or the learning theory. How exactly FL and learning theories yield distinctive Gs is currently unknown. We have good case studies of how experience can fix different Gs with different surface properties, but I think it is fair to say that there is still lots more fundamental work to be done.[4] Nonetheless, even without knowing how this happens, the argument above suggests that it does not happen in virtue of a typologically differentiated FL.

Let me end with one last observation. Suppose that the above is correct. Then it seems to me that a likely corollary is that FL has no internal parameters. What I mean is that FL does not determine a finite space of possible Gs, as GB envisioned. Why not?

Well, say that acquisition consisted in fixing the values of a finite series of FL internal open parameters. Then why wouldn’t evolution have fixed the FL of speakers of typologically isolated languages so that the relevant typological parameters were no longer open? On the assumption that “closing” such a parameter would yield an acquisition advantage (fixing parameters would reduce the size of the parameter space, so the more fixed parameters the better, as this would simplify the acquisition problem), why wouldn’t evolution take advantage of the eco-niche to speed up G acquisition? Thus, why wouldn’t humans be like finches, with FLs quickly specializing to their typological eco-niches? Doesn’t this suggest that parameters are not internal properties of FL?
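The premise that closing parameters simplifies acquisition can be made concrete with a little arithmetic. On a GB-style picture with n independent binary parameters, the learner’s hypothesis space contains 2^n Gs, and fixing k of them leaves 2^(n-k). The Python sketch below assumes binary, independent parameters and a hypothetical n = 10; neither figure comes from the post, and the function name is illustrative only.

```python
from itertools import product

def grammar_space(n_params, fixed=None):
    """Enumerate all settings of n binary parameters (hypothetical model).

    `fixed` maps a parameter index to a value (0 or 1) that evolution has
    'closed'; only the remaining parameters are left open for the learner.
    """
    fixed = fixed or {}
    free = [i for i in range(n_params) if i not in fixed]
    grammars = []
    for values in product((0, 1), repeat=len(free)):
        setting = dict(fixed)
        setting.update(zip(free, values))
        grammars.append(tuple(setting[i] for i in range(n_params)))
    return grammars

N = 10  # hypothetical number of binary parameters
print(len(grammar_space(N)))                      # 1024
print(len(grammar_space(N, fixed={0: 1, 1: 0})))  # 256
```

Each closed parameter halves the space the learner must search; that is the acquisition advantage the argument appeals to, and the puzzle is why evolution apparently never took it.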

I am pretty sure that readers will find much to disagree with here. That’s great. I think that the line of reasoning above is reasonable and hangs together. Moreover, if correct, I believe that it is important for pretty obvious reasons. But, there are sure to be counter-arguments and other ways of understanding the “facts.” Can’t wait.

[1] This example was provided by Bill Idsardi. Thanks.

[2] By two “first” languages I intend to signal the fact that this is a different process from second language acquisition. From the little I know about this process, there is no strict upper bound on how many first languages one can simultaneously acquire, though I am willing to bet that past 3 or 4 the process gets pretty hairy.

[3] It also casts strong doubt on the idea that FL itself is the product of an evolutionary process. If it is, the question becomes why it stopped when it did and did not continue after humans separated. Why no apparent changes in the last 100k years?

[4] Charles Yang has a forthcoming book on this topic (which I heartily recommend) and Jeff Lidz has done some exemplary work showing how to think of FL and learning theory together to deliver integrated accounts of real time language acquisition. I am sure that there is other work of this kind. Feel free to mention it in the comments.

Tuesday, February 11, 2014

Plato, Darwin, P&P and variation

Alex C (in the comment section here (Feb. 1)) makes a point that I’ve encountered before that I would like to comment on. He notes that Chomsky has stopped worrying about Plato’s Problem (PP) (as has much of “theoretical” linguistics, as I noted in the previous post) and suggests (maybe this is too much to attribute to him; if so, sorry Alex) that this is due to Darwin’s Problem (DP) occupying center stage at present. I don’t want to argue with this factual claim, for I believe that there’s lots of truth to it (though IMO, as readers of the last several posts have no doubt gathered, theory of any kind is largely absent from current research). What I want to observe is that (1) there is a tension between PP and DP and (2) resolving it opens an important place for theoretical speculation. IMO, one of the more interesting facets of current theoretical work is that it proposes a way of resolving this tension in an empirically interesting way. This is what I want to talk about.

First the tension: PP is the observation that the PLD the child uses in developing its G is impoverished in various ways when one compares it to the properties of Gs that children attain. PP, then, is another name for the Poverty of Stimulus Problem (POS). Generative Grammarians have proposed to “solve” this problem by packing FL with principles of UG, many of which are very language specific (LS), at least if GB is taken as a guide to the content of FL. By LS, I mean that the principles advert to very linguisticky objects (e.g. Subjects, tensed clauses, governors, case assigners, barriers, islands, c-command, etc.) and very linguisticky operations (agreement, movement, binding, case assignment, etc.). The idea has been that making UG rich enough and endowing it with LS innate structure will allow our theories of FL to attain explanatory adequacy, i.e. to explain how, say, Gs obey islands even though the PLD lacks the good and bad data relevant to fixing them.

By now, all of this is pretty standard stuff (which is not to say that everyone buys into the scheme (Alex?)), and, for the most part, I am a big fan of POS arguments of this kind and their attendant conclusions. However, even given this, the theoretical problem that PP poses has hardly been solved. What we do have (again assuming that the POS arguments are well founded (which I do believe)) is a list of (plausibly) invariant(ish) properties of Gs and an explanation for why these can emerge in Gs in the absence of the relevant data in the PLD required to fix them. Thus, why do movement rules in a given G resist extraction from islands? Because something like the Subjacency/Barriers theory is part of every Language Acquisition Device’s (LAD) FL, that’s why.

However, even given this, what we still don’t have is an adequate account of how the variant properties of Gs emerge when planted in a particular PLD environment. Why is there V to T in French but not in English? Why do we have inverse control in Tsez but not Polish? Why wh-in-situ in Chinese but multiple wh to C in Bulgarian? The answer GB provided (and, so far as I can tell, the answer still on offer) is that FL contains parameters that can be set in different ways on the basis of PLD, and the various Gs we have are the result of differential parameter setting. This is the story, but we have known for quite a while that this is less a solution to the question of how Gs emerge in all their variety than an explanation schema for a solution. P&P models, in other words, are not so much well-worked-out theories as part of a general recipe for a theory that, were we able to cook it, would produce just the kind of FL that could provide a satisfying answer to the question of how Gs can vary so much. Moreover, as many have observed (Dresher and Janet Fodor are two notable examples, see below), there are serious problems with successfully fleshing out a P&P model.

Here are two: (i) the hope that many variant properties of Gs would hinge on fixing a small number of parameters seems increasingly empirically uncertain. Cedric Boeckx and Fritz Newmeyer have been arguing this for a while, and while their claims are debated (and by very intelligent people, so, at least for a non-expert like me, the dust has not yet settled enough to reach firm conclusions), it seems pretty clear that the empirical merits of earlier proposed parameterizations are less obvious than we took them to be. Indeed, there appears to be some skepticism about whether there are any macro-parameters (in Baker’s sense[1]), and many of the micro-parametric proposals seem to end up restating what we observe in the data: that languages can differ. What made early macro-parameter theories interesting is the idea that differences among Gs come in largish clumps. The relation between a given parameter setting and the attested surface differences was understood as one to many. If, however, it turns out that every parameter correlates with just a single difference, then the value of a parametric approach becomes quite unclear, at least so far as acquisition considerations are concerned. Why? Because it implies that surface differences are just due to differing PLD, not to the different options inherent in the structure of FL. In other words, if we end up with one parameter per surface difference, then variation among Gs will not be as much of a window into the structure of FL as we thought it could be.

Here’s another problem: (ii) the likely parameters are not independent. Dresher (and friends) has demonstrated this for stress systems and Fodor (and friends) has provided analogous results for syntax. The problem with a theory where parameters are not independent is that it makes it very hard to see how acquisition could be incremental. If it turns out that the value of any parameter is conditional on the value of every other parameter (or very many others), then it would seem that we are stuck with a model in which all parameters must be set at once (i.e. instantaneous learning). This is not good! To evade this problem, we need some way of imposing independence on the parameters so that they can be set piecemeal without fear of having to re-set them later on. Both Dresher and Fodor have proposed ways of solving this independence problem (both elaborate a richer learning theory for parameter values to accommodate this problem). But I think that it is fair to say that we are still a long way from a working solution. Moreover, the solutions provided all involve greatly enriching FL in a very LS way. This is where PP runs into DP. So let’s return to the aforementioned tension between PP and DP.
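A toy count may help show why independence matters for incremental learning. It assumes n binary parameters and considers only the two extremes, full independence and full interdependence; the actual proposals by Dresher and Fodor lie somewhere in between and are not modeled here.

```latex
\begin{aligned}
\text{independent:}\quad & n \text{ separate choices, each among } 2 \text{ values:}\quad \sum_{i=1}^{n} 2 = 2n \text{ alternatives in total}\\
\text{fully interdependent:}\quad & 1 \text{ joint choice among}\quad \prod_{i=1}^{n} 2 = 2^{n} \text{ settings}
\end{aligned}
```

For a hypothetical n = 20, an independent learner weighs 40 alternatives, two at a time, while a learner facing fully interdependent parameters must in effect adjudicate among 2^20 = 1,048,576 joint settings at once. This is the sense in which interdependence pushes toward instantaneous rather than incremental learning.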

One way to solve PP is to enrich FL. The problem is that the richer and more linguistically parochial FL is, the harder it becomes to understand how it might have evolved. In other words, our standard GB tack in solving PP (LS enrichment of FL) appears to make answering DP harder. Note I say ‘appears.’ There are really two problems, and they are not equally acute. Let me explain.

As noted above, we have two things that a rich FL has been used to explain: (a) invariances characteristic of all Gs and (b) the attested variation among Gs. In a P&P model, the first ‘P’ handles (a) and the second (b). I believe that we have seen glimmers of how to resolve the tension between PP’s and DP’s demands on FL as regards the principles part of P&P. Things have become far more obscure (and even this might be too kind) with the second, parametric P. Here’s what I mean.

As I’ve argued in the past, one important minimalist project has been to do for the principles of GB what Chomsky did for islands and movement via the theory of subjacency in On Wh Movement (OWM). In this paper Chomsky theoretically unified the disparate island effects by proposing that all non-local (A’) dependency constructions have a common movement core (viz. Move WH) subject to locality restrictions characterized by Bounding Theory (BT). This was a terrifically inventive theory, and aside from rationalizing/unifying Ross’s very disparate Island Effects, the combination of Move WH + BT predicted that all long movement would have to be successive cyclic (and even predicted a few more islands, e.g. subject islands and Wh-islands).[2]

But to get back to PP and DP, one way of regarding MP work over the last 20 years is as an attempt to do for GB modules what Chomsky did for Ross’s Islands. I’ve suggested this many times before but what I want to emphasize here is that this MP project is perfectly in harmony with the PP observation that we want to explain many of the invariances witnessed across Gs in terms of an innately structured FL. Here there is no real tension if this kind of unification can be realized. Why not? Because if successful we retain the GB generalizations. Just as Move WH + BT retain Ross’s generalizations, a successful unification within MP will retain GB’s (more or less) and so we can continue to tell the very same story about why Gs display the invariances attested as we did before. Thus, wrt this POS problem, there is a way to harmonize DP concerns with PP concerns. Of course, this does not mean that we will successfully manage to unify the GB modules in a Move WH + BT way, but we understand what a successful solution would look like and, IMO, we have every reason to be hopeful, though this is not the place to defend this view.

So, the principles part of P&P is, we might say, DP compatible (little joke here for the cognoscenti). The problem lies with the second P. FL on GB was understood to provide not only the principles of invariance but also to specify all the possible ways that Gs could differ. The parameters in GB were part of FL! And it is hard to see how to square this with DP given the terrific linguistic specificity of these parameters. The MP conceit has been to try and understand what Gs do in terms of one (perhaps)[3] linguistically specific operation (Merge) interacting with many general cognitive/computational operations/principles. In other words, the aim has been to reduce the parochialism of the GB version of FL. The problem with the GB conception of parameters is that it is hard to see how to recast them in similarly general terms. All the parameters exploit notions that seem very very linguo-centric. This is especially true of micro parameters, but it is even true of macro ones. So, theoretically, parameters present a real problem for DP, and this is why the problems alluded to earlier have been taken by some (e.g. me) to suggest that maybe FL has little to say about G-variation. Moreover, it might explain why, with DP becoming prominent, some of the interest in PP has seemed to wane. It is due to a dawning realization that maybe the structure of FL (our theory of UG) has little to say directly about grammatical variation and typology. Taken together, PP and DP can usefully constrain our theories of FL, but mainly by licensing certain inferences about what kinds of invariances we will likely discover (indeed have discovered). However, when it comes to understanding variation, if parameters cannot be bleached of their LSity (and right now, this looks to me like a very rough road), I suspect that they will never be made to fit with the leading ideas of MP, which are in turn driven by DP.

So, Alex C was onto something important IMO. Linguists tend to believe that understanding variation is key to understanding FL. This is taken as virtually an article of faith. However, I am no longer so sure that this is a well-founded presumption. DP provides us with some reasons to doubt that the range of variation reflects intrinsic properties of FL. If that is correct, then variation per se may be of little interest to those interested in limning the basic architecture of FL. Studying various Gs will, of course, remain a useful tool for getting the details of the invariant principles and operations right. But, unlike earlier GB P&P models, there is at least an argument to be made (and one that I personally find compelling) that the range of G-variation has nothing whatsoever to do with the structure of FL and so will shed no light on two of the fundamental questions in Generative Grammar: what’s the structure of FL and why?[4]

[1] Though Baker, a really smart guy, thinks that there are, so please don’t take me as endorsing the view that there aren’t any. I just don’t know. This is just my impression from linguist-in-the-street interviews.

[2] The confirmation of this prediction was one of the great successes of generative grammar and the papers by, e.g. Kayne and Pollock, McCloskey, Chung, Torrego, and many others are still worth reading and re-reading. It is worth noting that the Move WH + BT story was largely driven by theoretical considerations, as Chomsky makes clear in OWM. The gratifying part is that the theory proved to be so empirically fecund.

[3] Note the ‘perhaps.’ If even Merge is, in the current parlance, “third factor,” then there is nothing taken to be linguistically special about FL.

[4] Note that this leaves quite a bit of room for “learning” theory. For if the range of variation is not built into FL, then why we see the variation we do must be due to how we acquire Gs given FL/UG. The latter will still be important (indeed critical) in that any learning theory will have to incorporate the isolated invariances. However, a large part of the range of variation will fall outside the purview of FL. I discuss this somewhat in the last chapter of A Theory of Syntax for any of you with a prurient interest in such matters. See, in particular, the suggestion that we drop the switch analogy in favor of a more geometrical one.
