I have a confession to make: I am not a fan of parameters. I have come to dislike them for two reasons. First, they don’t fit well with what I take to be Minimalist Program (MP) commitments. Second, it appears that the evidence that they exist is relatively weak (see here and here for some discussion). Let me say a bit more about each point.
First, the fit with MP: as Chomsky has rightly emphasized, the more we pack into UG (the linguistically proprietary features of FL), the harder it is to solve Darwin’s Problem in the domain of language. This is a quasi-logical point, and not really debatable. So, all things being equal, we would like to minimize the linguistically specific content of FL. Parameters are poster children for this sort of linguistically specific information. So, any conception of FL that comes with an FL-specified set of ways that Gs can differ (i.e. a specification of the range of possible options) comes at an MP cost. This means that the burden of proof for postulating FL-internal parameters is heavy, and that such parameters should be resisted unless we are faced with overwhelming evidence that we need them.
This brings us to the second point: it appears that the evidence for such FL-internal parameters is weak (or so my informants tell me when I do fieldwork among variationists). The classical evidence for parameters comes from the observation that Gs differ wholesale, not just retail. What I mean by this is that surface changes come in large units. The classic example is Rizzi’s elegant proposal linking the fixed subject constraint, pro-drop and subject inversion. What made these proposals more than a little intriguing is that they reduced what looked like very diverse G phenomena to a single source that further appeared to be fixable on the basis of degree-0 PLD. This made circumscribing macro-variation via parameters very empirically enticing. The problem was that the proposals tying this variation to differences in a single parameter setting proved to be empirically problematic.
What was the main problem? It appears that we were able to find Gs that dissociated each of the relevant factors. So we could get absence of fixed subject constraint effects without subject inversion or pro-drop. And we could find pro-drop without subject inversion. And this was puzzling if these surface differences all reflect the setting of a single parameter value.
I used to be very moved by these considerations, but a recent little paper on domestication has started me rethinking whether there may not be a better argument for parameters, one that focuses less on synchronic facts about how Gs differ and more on how Gs change over time. Let me lay out what I have in mind, but first I want to take a short detour into the biology of domestication, because what follows was prompted by that article on animal domestication (here).
This article illustrates the close conceptual ties between modern P&P theories and biology/genetics. This connection is old news and leaders in both fields have noted the links repeatedly over the years (think Jacob, Chomsky). What is interesting for present purposes is how domestication has the contours of a classic parameter setting story.
It seems that Darwin was the first to note that domestication often resulted in changes not specifically selected for by the breeder (2):
Darwin noticed that, when it came to mammals, virtually all domesticated species shared a bundle of characteristics that their wild ancestors lacked. These included traits you might expect, such as tameness and increased sociability, but also a number of more surprising ones, such as smaller teeth, floppy ears, variable colour, shortened faces and limbs, curly tails, smaller brains, and extended juvenile behaviour. Darwin thought these features might have something to do with the hybridisation of different breeds or the better diet and gentler ‘conditions of living’ for tame animals – but he couldn’t explain how these processes would produce such a broad spectrum of attributes across so many different species.
So, we select for tameness and we get floppy ears. Darwin’s observation was strongly confirmed many years later by the dissident Soviet biologist Dmitri Belyaev. Belyaev domesticated silver foxes. More specifically (5):
He selected his foxes based on a single trait: tameness, which he measured by their capacity to tolerate human proximity without fear or aggression. Only 5 per cent of the tamest males and 20 per cent of the tamest females were allowed to breed.
Within a few generations, Belyaev started noticing some odd things. After six generations, the foxes began to wag their tails for their caretakers. After 10, they would lick their faces. They were starting to act like puppies. Their appearance was also changing. Their ears grew more floppy. Their tails became curly. Their fur went from silver to mottled brown. Some foxes developed a white blaze. Their snouts got shorter and their faces became broader. Their breeding season lengthened. These pet foxes could also read human intentions, through gestures and glances.
So, selecting for tameness, narrowly specified, brought in its wake tail wagging, floppy ears, etc. The reasonable conclusion from this large-scale change in traits is that the traits are causally linked. As the piece puts it (5):
What the Belyaev results suggest is that the manifold aspects of domestication might have a common cause in a gene or set of genes, which occur naturally in different species but tend to be selected out by adaptive and environmental pressures.
There is even a suggested mechanism: something called “neural crest cells.” But the details do not really matter. What matters is the reasoning: things that change together do so because of some common cause. In other words, common change suggests common cause. This is related to (but is not identical to) the discussion about Gs above. The above looks at whether G traits necessarily co-occur at any given time. The discussion here zeroes in on whether, when they change, they change together. These are different diagnostics. I mention this because the fact that the traits are not always found together does not imply that they would not change together.
The linguistic application of this line of reasoning is found in Tony Kroch’s diachronic work. He argued that tracking the rates of change of various G properties is a good way of identifying parameters. However, what I did not appreciate when I first read this is that the fact that features change together need not imply that they must always be found together. Here’s what I mean.
Think of dogs. Domestication brings with it floppy ears. So select for approachability and you move from feral foxes with pointy ears to domesticated foxes with floppy ears. However, this does not mean that every domesticated dog will have floppy ears. No, this feature can be detached from the others (and breeders can do this while holding many other traits constant), even though, absent any attempt to detach it, the natural change will be to floppy ears. So we can select against a trait that would naturally arise even if the underlying relationship is one that links it to the others. As the quote above puts it: traits that occur naturally together can be adaptively selected out.
In the linguistic case, this suggests that even if a parameter links some properties together, so that if one changes they all will, we need not find them together in any one G. What we find at any one time will be due to a confluence of causes, some of which might obscure otherwise extant causal dependencies.
So where does this leave us? Well, I mention all of this because though I still think that MP considerations argue against FL internal parameters, I don’t believe that the observation that Gs can treat these properties atomically is a dispositive argument against their being parametrically related. Changing together looks like a better indicator of parametric relatedness than living together.
Last point: common change implies common cause. But common cause need not rest on there being FL-internal parameters. Parameters are one way of causally linking seemingly disparate factors. It is not clear that they are the only or even the best way. What made Rizzi’s story so intriguing (at least for me) is that it tied together simple changes visible in main clauses with variation in (non-PLD-visible) embedded clause effects. So one could infer from what is available in the PLD what will be true of the LD in general. These cases are where parameter thinking really pays off, and they still seem to be pretty thin on the ground, as we might expect if indeed FL has no internal parameters.
There is another use of ‘parameter’ where the term is descriptive and connotes the obvious fact that Gs differ. Nobody could (or does) object to parameters in this sense. The MP-challenging one is the usage wherein FL prescribes a (usually finite) set of options that (usually, finitely) circumscribe the number of possible Gs. Examples include the pro-drop parameter, the V-raising parameter, and the head parameter.
See Kroch’s “Reflexes of grammar in patterns of language change.” You can get this from his website here. Here’s a nice short quote summing up the logic: “…since V to I raising in English is lost in all finite clauses with tensed main verbs and at the same rate, there must be a factor or factors which globally favor this loss” (32).
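
For readers who like to see the “same rate” logic spelled out, here is a minimal sketch of the diagnostic. It is not Kroch’s analysis or his data. Everything in it is an assumption made for illustration: the two context names, the time scale, the slope and intercept values, and the helper functions are all hypothetical, and the frequencies are synthetic, built so that the two contexts share one rate of change while starting from different levels. The point is only to show what “changing together” looks like as a measurement: transform the frequencies to log-odds, fit a line per context, and compare the slopes.

```python
import math

def logistic(t, slope, intercept):
    """Frequency of the new form at time t under a logistic change."""
    return 1.0 / (1.0 + math.exp(-(slope * t + intercept)))

def logit(p):
    """Log-odds transform; turns a logistic curve into a straight line."""
    return math.log(p / (1.0 - p))

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# SYNTHETIC data (an assumption, not real corpus counts): two surface
# contexts that start at different frequencies but change at one shared rate.
times = list(range(10))          # hypothetical time periods
SHARED_SLOPE = 0.5               # hypothetical common rate of change
contexts = {"context_A": -3.0, "context_B": -1.5}  # hypothetical intercepts

fits = {}
for name, intercept in contexts.items():
    freqs = [logistic(t, SHARED_SLOPE, intercept) for t in times]
    fits[name] = fit_line(times, [logit(p) for p in freqs])

for name, (slope, intercept) in fits.items():
    print(f"{name}: slope={slope:.3f}, intercept={intercept:.3f}")
```

Note what the sketch separates: the intercepts (how much of the new form each context shows at a given moment) differ, while the slopes (how fast each context is changing) coincide. With real data the slopes would have to be estimated with noise and compared statistically, which this sketch does not attempt.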