
Friday, January 25, 2013

Parameters, anyone?

This post was prompted by Neil Smith's recent reflections on parameters on this blog. (It also builds on my own writings on the matter, which the interested reader can find on Lingbuzz.)

Neil writes that "Parametric Variation (PV) has been under attack. ... there has been widespread reluctance to accept the existence of parameters as opposed to particular rules (see e.g. Newmeyer 2005, Boeckx 2010). A vigorous counter-movement (see e.g. the special issue of Theoretical Linguistics 36.1, devoted to Holmberg 2010) shows that no consensus has been reached." I beg to differ. I think the attack's over and the consensus has been reached: Parameters (with a capital P, the parameters of the classical period, those actually worth fighting for) are gone. Proof? Even those who fought for them ("vigorously", as Neil puts it) have accepted defeat. They confess that parameters are not part of UG, and they have begun to be pretty upfront about this. Here's a taste, from the first line of the abstract of "Mafioso Parameters and the Limits of Syntactic Variation" by Roberts and colleagues, vigorous fighters, who will present it at this year's WCCFL and at other variation-related events like the Bilbao conference in June:
"We build on recent work proposing non-UG-specified, emergent parameter hierarchies ([1]),
arguing that a system of this kind not only addresses objections levelled against earlier formulations
of parameters ([2], [3], [4]), but also potentially offers a suitably restrictive theory of the nature and
limits of syntactic variation."

See: non-UG-specified ...

So, I guess I have to qualify Norbert's statements stemming from his infatuation with the (L)GB model. We may have pretty good ideas about the laws of grammars regarding some principles, but not (I claim) about those laws/principles that had switches in them (i.e., parameters). The LGB promises in this domain have not been met (sadly). I won't go over the evidence backing up this statement. Take a look at Newmeyer's excellent book (OUP, 2005), my Lingbuzz papers, and references therein.
What I want to do in this post is look at why people hang onto parameters. Here's a list of things I've heard from people (I've discussed some of these in my papers, others are new additions.)

1. The LGB picture was too good to be false.
2. That's the only game in town.
3. The current parametric theory we have is so restrictive, hence so irresistible.
4. The current parametric theory makes great typological predictions.
5. The worry concerning the exponential growth of parameters is nothing to worry about.
6. The current parametric theory has such a nice deductive structure to it.
7. Current parameters are no longer embedded within principles, so don't suffer from problems raised by Newmeyer, Boeckx, and others.
8. If we drop parameters, we are back to the land of Tomasello, Piaget, Skinner, Locke, Satan, mothers-in-law: the land of infinite variation!
9. Current parameters are no longer part of the first factor, they are part of the third factor, so everything is fine.

Let's look at each of these briefly (spoiler alert: all of these arguments in favor of hanging onto parameters fail miserably).

RE 1. People told me that yes, all the parameters put forth so far have not fared very well, certainly not as well as Chomsky predicted in LGB, but they have not lost hope of finding parameters of the right kind. When I ask them to give me an example, they insist that they are working on it, that what they have in mind would be something more abstract than anything proposed so far. (When they pronounced the word abstract, I think I saw raised eyebrows and maybe a repressed smile.) Since I have not (yet) figured out what such a parameter would look like (other than the ones in LGB, which did not materialize), I can't tell you if it exists.

RE 2 (only game in town). A theory can be demonstrably wrong even if we don't yet have a better theory.

RE 3 (restrictiveness). Part of the difficulty here is figuring out what people mean by "current parametric theory" (note: this is not an innocent point). Most of the time, they have the "Chomsky-Borer" conjecture in mind. That's part of the problem: it's a conjecture, not a theory. Luigi Rizzi, for example, put forth the following restrictive definition of parameter:
"A parameter is an instruction for a certain syntactic action expressed as a feature on a lexical item and made operative when the lexical item enters syntax as a head."

Sounds great (and restrictive), except that no one's got the foggiest idea about what counts as a possible feature, lexical item, and head. To be restrictive, the characterization just mentioned ought to be embedded within a theory about these things (i.e., a theory of the lexicon). But even my grandmother knows we don't have a theory there (and she tells me she is not very optimistic we'll have one anytime soon).
Incidentally, this is why the LGB picture about parameters was so great: parameters were embedded within a general theory of grammatical principles. If you take parameters out of that context, they look quite bad.

RE 4. Typological predictions. First, let's not confuse Plato's problem and Greenberg's problem. Second, let's listen to Mark Baker. "If it is correct to reduce all macroparameters to a series of relatively independent microparameters in this way, then one would expect to find a relatively smooth continuum of languages. ... [But] if natural human language permits both macroparameters and microparameters, we would expect there to be parametric clusters in something like the classical way. But we'd expect these clusters to be much noisier, as a result of microparametric variation." I understand the logic, but once we accept (as good minimalists should) that principles of syntax are invariant, and (therefore) that macroparameters are aggregates of microparameters, Baker's picture is no longer tenable. Macro- and micro-parameters are not two different things.


RE 5. Exponential growth. Kayne writes that "We must of course keep in mind that as we discover finer- and finer-grained syntactic differences (by examining more and more languages and dialects) the number of parameters that we need to postulate, although it will rise, can be expected to rise much more slowly than the number of differences discovered, insofar as n independent binary-valued parameters can cover a space of 2^n languages/dialects (e.g. only 8 such parameters could cover 2^8 = 256 languages/dialects, etc.)" (Kayne)

True, but this does not address the worry concerning the exponential growth of parameters. The worry is about the size of the space the child has to go through to acquire her language. Once parameters are removed from the background of principles (see point 3), what's the structure that will guide the child?
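Just to make the arithmetic concrete, here is a toy sketch (mine, not Kayne's; the function name and parameter counts are purely illustrative): the same exponential coverage that makes n binary parameters attractive to the typologist is also the size of the hypothesis space the learner faces if nothing structures those parameters.

```python
# Toy illustration (not from Kayne or this post): n independent binary
# parameters define 2**n grammars. Great coverage for the typologist,
# but also the size of the space a learner must navigate if the
# parameters come with no guiding structure.

def grammar_space(n_parameters: int) -> int:
    """Number of grammars defined by n independent binary parameters."""
    return 2 ** n_parameters

for n in (8, 20, 50):
    print(f"{n} parameters -> {grammar_space(n):,} possible grammars")

# 8 parameters cover 256 grammars (Kayne's example); 50 cover roughly
# 10**15. Compact to state, enormous to search blindly.
```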


RE 6. Deductive structure. Where does the deductive structure come from? In an impoverished theory like minimalists propose, it must come from some place else. (In addition, when you look at the deductive structure you are supposed to find in parametric proposals, what you find is something very ugly, nothing like the nice parameter hierarchy in Baker's Atoms of Language, but something like the Tokyo subway map. I'll devote a separate post to this issue, and will report on work done with my student Evelina Leivada, where we discovered more subway maps than we imagined.)

RE 7. Principles/parameters separation. It's true that parameters are no longer embedded inside GB principles. Minimalism got rid of those nice GB principles. But there is another sense in which current parameters are still embedded within principles (sorry if this is confusing, it's not my fault; read the papers!): many minimalists out there are crypto-GB folks: they still hold on to a rich UG, with lots of lexical principles (micro-principles, with micro-parameters in them). That's what provides artificial life to parameters. But if you get rid of these principles, and become a Proper Minimalist (All you need is Merge), then there is no room for parameters within that single Ur-principle.

RE 8. The fear of infinite variation. This is a myth, friends. Don't believe people who tell you that without parameters, there is no limit to variation in language. Hang on to your wallet. Tell them that if there is even a little bit of invariant syntax (say, Merge), this already puts limits on what can vary in grammar.
Don't believe me? Read Chomsky: (note that he does not sound very worried)

"Most of the parameters, maybe all, have to do with the mappings [to the sensory-motor interface]. It might even turn out that there isn't a finite number of parameters, if there are lots of ways of solving this mapping problem"

By the way, this quote is taken from "The science of language" book that G. Pullum did not seem to like. Turns out that the book is not as "useless" as he said it was. Pullum wrote that the book has "essentially no academic value", but well, it looks like I found something of value in it (I always find something of value in what Chomsky writes. You should try too. It's very easy.)


RE 9. First/Third factor. They are not independent. Every good biologist (I've asked my friend Dick Lewontin) will tell you that the first, second, and third factors define one another; you can't separate them so easily (read Dick's "Triple helix"). So, appealing to the third factor only, abstracting away from the first, is not fair.

In sum: Why would anyone hang onto parameters? How did this post get so long?

--Cedric Boeckx

Thursday, January 24, 2013

The Gallistel-King Conjecture; part deux


A while ago I wrote about an idea that I dubbed the Gallistel-King conjecture (here).  The nub of their conjecture is that (i) cognition requires something like a Turing-von Neumann architecture to support it (i.e. connectionist style systems won’t serve) and (ii) the physical platform for the kinds of computational mechanisms needed exploits the same molecular structure used to pass information across generations. In other words, DNA, RNA and proteins constitute (part of) the cognitive code in addition to being the chemical realization of the genetic code.  Today, Science Daily reports (here) that the European Bioinformatics Institute “have created a way to store data in the form of DNA.” And not just a little, but tons. And not just for an hour but for decades and centuries if not longer.  As Nick Goldman, the lead on the project, says:

We already know that DNA is a robust way to store information because we can extract it from bones of wooly mammoths, which date back tens of thousands of years, and make sense of it. It is also incredibly small and does not need any power for storage, so shipping and keeping it is easy.

G&K observe that these same virtues (viz. stability, energy efficiency and longevity) would be very useful for storing information in brains.  Add to this that DNA has the kind of discrete/digital structure making information thus stored easy to retrieve and appropriate for computation (cf. G&K 167ff. for a computational demo), and it would seem that so storing information is just what we would expect from a reasonably well designed biological thinking machine.

Let me shout from the rooftops so that it is clear: I AM NO EXPERT IN THESE MATTERS. However, if G&K are right then we need a system with read-write memory to support the kinds of cognition we find in animals.  As the Science Daily article reports: “Reading DNA is fairly straightforward” the problem is that “writing it has until now been a major hurdle to making DNA storage a reality.” There are two problems that Goldman and his colleague Ewan Birney had to solve: (i) “using current methods, it is only possible to manufacture DNA in short strings” and (ii) “both writing and reading DNA are prone to errors, particularly when the same DNA letter is repeated.” What Goldman and Birney did was devise a code “using only short strings of DNA, and do it in such a way that creating a run of the same letter would be impossible.”

Before proceeding, note the similarity of this to the kinds of considerations about optimal coding that we were talking about in earlier posts (here).  This is a good example of the kind of thing I was thinking of concerning efficient coding and computation.  Note how sensitive the relevant considerations are to the problem that needs to be solved and to the physical context within which it needs solving.  This is a good example, I would argue, of the kind of efficiency concerns minimalists should be interested in as well.  Ok, so what did they do?

Birney describes it as follows:

So we figured, let’s break up the code into lots of overlapping fragments going in both directions, with indexing information showing where each of the fragments belongs in the overall code, and make a coding scheme that doesn’t allow repeats. That way, you would have to have the same error on four different fragments for it to fail - and that would be very rare.

The upshot, Goldman says, is “a code that is error tolerant using a molecular form we know will last in the right conditions for 10,000 years, or possibly longer...As long as someone knows what the code is, you will be able to read it back if you have a machine that can read DNA.”

Talk about long-term memory! And we certainly all embody machines that can read DNA!
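For readers who like to see the trick spelled out, here is a minimal sketch of a "no letter repeats" code in the spirit of what the article describes. To be clear, this is my own toy illustration, not Goldman and Birney's actual published scheme: the payload is written in base 3 and each digit is realized as one of the three bases that differ from the base just written, so a homopolymer run can never arise. The overlapping-fragments-plus-indexing part of their design is then just redundancy layered on top, so that a read error in one fragment can be outvoted by the others.

```python
# Toy sketch of a repeat-free DNA code (my illustration, not the
# published Goldman/Birney scheme): write the payload in base 3, then
# realize each trit as one of the three bases that differ from the
# base just written, so the same letter never appears twice in a row.

BASES = "ACGT"

def to_trits(data: bytes) -> list:
    """Represent a byte string as a list of base-3 digits."""
    n = int.from_bytes(data, "big")
    trits = []
    while n:
        n, r = divmod(n, 3)
        trits.append(r)
    return trits[::-1] or [0]

def encode(data: bytes) -> str:
    """Map trits to DNA with no consecutively repeated base."""
    prev = "A"  # arbitrary agreed starting point for writer and reader
    out = []
    for t in to_trits(data):
        choices = [b for b in BASES if b != prev]  # three legal next bases
        prev = choices[t]                          # the trit picks one
        out.append(prev)
    return "".join(out)

dna = encode(b"hello world")
print(dna)
assert all(a != b for a, b in zip(dna, dna[1:]))  # no repeated letters
```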

Goldman and Birney see this as a great technological breakthrough; goodbye hard drives, hello DNA.  However, with a little mental squinting it is not that hard to imagine how this technological breakthrough would be just what G&K would have hoped for.

Science often follows the leading technology of the day, especially in the neuro/physio world. In Descartes’ day the brain was imagined as a series of interconnected pipes inspired by the intricate fountains on display (cf. Vaucanson’s duck). Earlier it was clockwork brains. In our day, it has been computers of various kinds. Now, biotech may be pointing to a new paradigm.  If humans can code to DNA and retrieve info from it, why shouldn’t brains?  Moreover, wouldn’t it be odd if brains had all this computational power at their disposal but nature never figured out how to use it?  A little like birds having wings but never learning to fly? As I said, I’m no expert, but you gotta wonder…

Wednesday, January 23, 2013

Minimalism and On Wh Movement


The Generative enterprise as Chomsky envisioned it has been based on five separate but related questions:

1.     What does a given native speaker know about his language? Or, what do particular Gs look like?
2.     What makes it possible for native speakers to acquire Gs? Or, what does UG look like?
3.     How did UG arise in the species? Or, what must be added to non-linguistic cognition to get UGs?
4.     How do native speakers use Gs in performance? Or, how are Gs used to produce and comprehend sentences in real time?
5.     How do brains code for G and UG?

These questions, though separate, are clearly interconnected. For example, it’s very hard, if not impossible, to consider the properties of UG without knowing how particular Gs are put together.  It’s pointless to engage in questions about how UG could have arisen in the species without knowing anything about UG. It’s hard to study how people use their Gs in the absence of descriptions of the Gs that are used.  And, last, it’s hard to study how brains embody G and UG without knowing anything about brains, Gs or UGs. All of this should be evident.  What is less obvious, but still true, is that even when we know a non-trivial amount about the things we are trying to relate, relating them may be really tough. There are many reasons for this. Let’s consider two.

In the last several posts I considered the relation between (2) and (3) above.  I noted that we have a pretty good description of what UG does, e.g. it regulates movement and construal dependencies, identifies the kinds of phrase structures that are linguistically admissible and where expressions that are displaced can be interpreted both phonetically and semantically. GB is a pretty good effective theory of these relations.  However, it is a problematic fundamental theory. Why? Because it’s hard to see how something with the special purpose predicates and intricate internal structure of GB could have arisen in the species. It’s just too different from what we find in other domains of human cognition and much too elaborately structured. Consequently, it’s entirely opaque how something this apparently complex and cognitively idiosyncratic could have arisen in the species, especially in the (apparently) short time available. This motivates a re-think of GB’s depiction of UG. The Minimalist Program is a research program aimed at reducing GB’s parochialism and intricacy by reanalyzing heretofore language specific operations in more general cognitive terms (e.g. Merge) and unifying conditions on grammatical processes in more generic computational terms (e.g. cyclicity as monotonicity/no tampering). As I’ve suggested in other posts, this strategy recapitulates Chomsky’s in ‘On Wh Movement’ (OWM) and I have suggested that OWM provides a unification strategy worth emulating. What specifically did Chomsky do in OWM?

First, he adopted Ross’s theory as an effective account. Thus, he accepted that the lay of the land described by Ross was roughly correct. Why ‘roughly’? Because, though Chomsky adopted Ross’s descriptions of strong islands, he tweaked the data in light of the details of the unifying Subjacency account. In particular, the relevant island data included Wh islands (pace Ross’s theory, which treated Wh-islands as porous) and was generalized to cover subjects in general (not only sentential subjects as in Ross).  Thus, though OWM largely adopted Ross’s description of the data, it modified it as well.

Second, Chomsky unified the various islands under the theory of movement by unifying the movement constructions under a ‘Move alpha’ rubric. Whereas in Ross rules were effectively constructions, Chomsky distilled out a movement component common to all the cases subject to the Subjacency Condition (i.e. ‘move alpha’) and proposed that all and only the move alpha operation was subject to the locality requirements eventuating in island effects when violated.[1] Thus, some constructions were treated as composites: one part move alpha, one part a specific head with its own particular grammatical contribution (‘criteria’ in Rizzi’s sense). And all that mattered in computing locality was the movement part.  In sum, what makes a construction subject to islands in OWM is that it is composed of a move alpha part. None of its other properties matter.[2]

Chomsky adopted a very strong version of this claim: as noted, all and only move alpha is subject to subjacency restrictions. Arguing for this constitutes the better part of OWM. The most interesting empirical component was reducing various kinds of deletion operations investigated by Bresnan and Grimshaw to constructions mediated by movement, most particularly their analysis of comparatives as deletion operations.[3] At any rate, sitting where we are today, it looks like Chomsky’s reanalysis has won the day and that islands are now taken as virtual diagnostics of a movement dependency.

The third leg of the analysis involved the Subjacency Condition itself. This involved several proposals: one concerning the inventory of bounding nodes (which nodes counted in computing distance), one concerning which nodes had “escape hatches,” aka comp(lementizer)s, and one concerning the fine structure of these comps (in particular how many slots it contained).  In elaborating these in OWM Chomsky noted that movement via escape hatches effectively allowed move alpha to finesse the Specified Subject Condition (SSC) and the Propositional Island Constraint (PIC). These were two proposed universals that were retained in revised form in GB, though in later work move alpha was not thought to be subject to them.[4] At any rate, it is interesting to see what Chomsky took to be the computational implications of Subjacency Theory:

… the island constraints can be explained in terms of general and quite reasonable computational properties of formal grammar (i.e. subjacency, a property of cyclic rules that states, in effect, that transformational rules have a restricted domain of potential application; SSC, which states that only the most prominent phrase in an embedded structure is accessible to rules relating it to phrases outside; PIC, which stipulates that clauses are islands subject to the language specific escape hatch..). If this conclusion can be sustained, it will be a significant result, since such conditions as CNPC and the independent wh-island constraint seem very curious and difficult to explain on other grounds. [my emphasis] (p. 89; OWM).

Note what we have here: an attempt to motivate the particular witnessed properties of the theory of bounding on more general computational grounds. This should sound very familiar to minimalist ears: restricted computational domains, prominent targets of operations (think labels in place of subjects), etc.  As we know, theories embedding subjacency were subsequently built into interesting proposals with non-trivial consequences for efficient parsing (e.g. Marcus, Berwick and Weinberg).

The upshot? OWM provides a good model for minimalist unification ambitions.  Extract out a common core operation in seemingly disparate “constructions” and propose general conditions on the applications of this common rule to unify the indicated dependencies. Minimalism has already taken steps in this direction. For example, case theory was unified with movement theory in early minimalism (e.g. Chomsky 1993) by having case discharged in the specifier positions of relevant heads.  Phrase structure theory has been unified with movement theory by treating both as products of Merge (E-merge and I-merge being instances of the same operation as applied to different inputs). More ambitiously (stupidly?) still, some (e.g. yours truly, Boeckx, Nunes, Idsardi, Lidz, Kayne, Zwart, Polinsky, Potsdam) have proposed treating construal rules like Control, reflexivization, and pronominal binding as species of movement hence unifying all of these operations with Merge. 

These last-mentioned proposals require abandoning two key features of GB’s view of movement: (i) that movement into thematic positions is barred and (ii) that all movement results in phonetic gaps at launch sites.[5]  (i) is a plausible consequence of removing D-structure as a “level” and (ii) constitutes a return to earlier versions of Generative Grammar in which transformations included the addition of some designated lexical material (e.g. there, reflexives, bound pronouns, etc.). It is very, very unclear at this moment whether this whole range of unifications is possible.[6] I personally find the data uncovered and the arguments made to be highly suggestive and the empirical hurdles to be less daunting than they appear.  However, this is a personal judgment, not one shared by the field as a whole (sadly).  That said, I would argue that extending the OWM reasoning to other parts of the grammar to unify the modules is the right thing to try if one hopes to address Darwin’s problem in (3) above. Why? Because if this kind of unification succeeded, grammars with the phenomenological properties of GB’s version of UG would follow from the addition of Merge to the cognitive repertoire of our apish ancestors.  In other words, we would have isolated a single change sufficient to allow the development of an FL with GB observable properties. Such a theory could be considered fundamental.

Interestingly, such a theory would not only plausibly provide an answer to Darwin’s problem (in (3)), it would also provide one for Broca’s Problem (in (5)).  Indeed, this post was intended to address this point here (as the title indicates), but as I’ve rambled on long enough, let me delay addressing how Minimalism relates to question (5) until the next post.


[1] Please note: This is a little bit of Whig history and it’s not really fair to Ross. Ross suggested that all island-sensitive constructions involved a specific rule of chopping, an operation that deleted the resumptive pronoun that was part of the constructions of interest. Interestingly, the intuition behind this part of Ross’s analysis has enjoyed somewhat of a revival in recent work (see Merchant and Lasnik in particular), which treats island effects as PF phenomena rather than as restrictions on derivational operations, as Chomsky originally proposed.
[2] This is a fact worth savoring. A priori there is nothing odd about having the specific construction determine locality rather than the created move alpha dependency.  There is nothing contradictory in assuming, e.g. that Focus and question formation would obey islands but that Topicalizations, comparatives and relativizations would not.  But this does not seem to be the way things panned out.
[3] The other really interesting consequence of Subjacency Theory was the implication that all movement was successive cyclic. This prediction was confirmed in work by Kayne and Pollock, Sportiche, and Torrego a.o.  In my opinion, this is still one of the nicest “predictions” any theory in syntax has ever made.
[4] The reason is that in OWM A’-traces were treated as anaphoric elements. This was revised in later work due to some empirical findings due to Lasnik.
[5] To be honest, this is my reconstruction of their results. Kayne and Zwart, for example, retain the theta criterion by assuming that there is a lot more doubling than meets the eye.
[6] Indeed, I believe (actually I think I know this, but let me be coy) that Chomsky is very skeptical about the unification of construal with movement (with the possible exception of reflexivization).  This may account for why binding, when discussed, is suggested to be a CI interface operation fed by the grammar rather than a direct product of the grammar as in all earlier generative accounts.  

Monday, January 21, 2013

Parametric Variation and Darwin’s Problem

What follows is Neil Smith's first post. I am delighted that he agreed to post here.


          Parametric Variation (PV) has been under attack.  Despite Berwick & Chomsky’s (2008:8) assertion that: “... principles do not determine the answers to all questions about language, but leave some questions as open parameters”, there has been widespread reluctance to accept the existence of parameters as opposed to particular rules (see e.g. Newmeyer 2005, Boeckx 2010). A vigorous counter-movement (see e.g. the special issue of Theoretical Linguistics 36.1, devoted to Holmberg 2010) shows that no consensus has been reached. There is a parallel with human types, where the apparently obvious diversity of different ‘races’ disguises profound underlying unity, and specifying the nature of the variation is difficult. It is against this background that I want to raise the issue of the evolvability of PV: how plausible is the claim that PV has evolved, rather than the sceptics’ claim that aggregations of unconnected rules have developed? If it could be demonstrated that PV could not have evolved this would constitute a powerful argument in favour of such scepticism. Providing a sketch of how PV might have evolved doesn’t, of course, show that it did, but it would remove one brick from the sceptics’ edifice.

          PV theory unifies two different domains: typology and acquisition.  Variation among the world’s languages (more accurately the set of internalised I-languages, Chomsky 1986) is defined in terms of parametric differences and, in first language acquisition the child’s task is reduced to setting the values of such parameters on the basis of the stimuli it is exposed to – utterances in the ambient language. Given the strikingly uniform success of first language acquisition, it follows that “the set of possibilities [must] be narrow in range and easily attained by the first language learner” (Smith 2004:83). By hypothesis, the principles do not vary from child to child or from language to language so, as Chomsky (2006:183) puts it, “acquisition is a matter of parameter setting, and is therefore divorced entirely from … the principles of UG.”

          The theory is at once ‘internalist’ (i.e. it is a theory of states of the mind/brain, pertaining to knowledge which is largely unconscious), and universalist. An immediate implication of this position is that the range of parametric choices is known in advance and, as a corollary, it claims that acquisition is largely a process of ‘selection’ rather than instruction (see Piattelli-Palmarini 1989) and that such acquisition is likely to take place in a critical period or periods. For some, “all syntactic variation is parametric” (Kayne, 2005; Manzini & Savoia, 2007; Roberts & Holmberg, 2010, etc.), but this is undesirable because it is too unconstrained. If not all variation is parametric, the responsibility of parametric theory is reduced and we need to redefine the nature and scope of ‘parametric’ variation.  I presuppose a 3-way distinction among Universal Principles, Parameters, and Accidents – typified by irregularities of morphology. Thus we have not only a distinction between Principles and Parameters but also Constraints on Parameters to differentiate parametric choices from ‘Accidents’. That is, we need identity criteria for parameters (see Smith & Law 2009 for discussion).

          Further, theoretical consistency demands that we have the same criteria for syntax and phonology and (if there are relevant examples) for semantic choices at the C-I interface (cf. Chierchia, 1998) and (if definable) for systematic morphological variation. For current purposes I am happy to go along with Chomsky’s observation that “parametrization and diversity too would be mostly maybe entirely restricted to externalization,” (Chomsky 2010:60; to “language shorn of the properties of the sound system” as Smith 2004:43 puts it) hence mainly morphology and phonology. One reason for the multiplicity of languages is then that “the problem of externalization can be solved in many different and independent ways,” (Chomsky, 2010:61) where moreover these may all be ‘optimal’ in different ways. I should note in passing that I am unconvinced by such claims as “To externalize the internally generated expression ‘what John is eating what’, it would be necessary to pronounce ‘what’ twice, and that turns out to place a very considerable burden on computation” (Berwick & Chomsky 2008:11). The burden seems slight, especially given that in first language acquisition children regularly repeat material ‘unnecessarily’ (see e.g. Crain & Pietroski’s 2002: What do you think what pigs eat?).

So, is PV evolvable? One stance it is tempting to adopt is that evolvability[1] is simply another criterion characterising PV. In the absence (in the domain of language) of some independent characterisation of evolvability, however, such a claim would be circular. Indeed, like Fodor & Piattelli-Palmarini (2010:159) I am sceptical that there is any level of evolutionary explanation: “Natural history is just one damned thing after another”, in their colourful terms. That is, there is no unified theory accounting for the fixation of phenotypic traits in natural populations, and there is equally no unified theory accounting for language typology, as witness the ‘accidents’ mentioned above. Evolution is not explicable in terms of a single theory. Even so, variation emerged so there had better be an account. More positively, it seems uncontentious that, as Bennett (2010:31) puts it, “macroevolution may, over the longer term, be driven largely by internally generated genetic change, not adaptation to a changing environment”, that “mutations occur continually, without external influence”, and that “a single small change can have far-reaching and unpredictable effects”. Parallels with language are obvious, though I would not wish to deny the relevance of ‘external influence’.

There are three strands in the alternative account I wish to provide.  First, there is evidence for genetically determined individual variation in knowledge and use of language, including not only pathological but also typically developing populations.  Second, it is plausible to postulate that language learners find different solutions to the problem of externalisation more or less ‘accessible’.  Third, the linguistic typology we see today is asymmetric, with only a subset of the logically possible languages being attested: the observation which initially made (macro-)parameters plausible. But the typology is not as skewed as predicted: too many of the logical possibilities actually are attested, an observation which makes some, conceivably all, such parameters suspect. The classic example is the fractionation of the ‘pro-drop parameter’ when it was observed that the individual sub-parts dissociated. Nonetheless, I think that this imperfect typology is more robust than is standardly accepted, and probably arose as the result of a learning strategy that is still operative in first language acquisition.  If correct, this suggests that rejecting PV on these grounds would be premature. 

Assuming a stage in hominid evolution by which language has appeared, we need to motivate the development in subsequent generations of PV. By hypothesis, the first language to evolve had no PV. That is, the existence, and hence emergence, of PV presupposes the prior evolution of language. Berwick & Chomsky (2008) argue that the source of PV is externalisation and that there is “no reason to suppose that solving the externalisation problem involved an evolutionary change – that is, genomic change”. This claim is not, however, apodictic: the switch-box metaphor suggests that the range of parametric variants is ‘antecedently known’, and implicit in ‘antecedently known’ is genetic control, and in the same paper they talk of “the evolution ... of the mode of externalisation”. We need a more nuanced account to resolve the tension caused by the apparent inconsistency: solving the problem of externalisation probably did not ‘involve’ an evolutionary change in the sense of being caused by such change, but the solution may well have given rise to a situation in which evolutionary, and genomic, change could occur.

In what follows I suggest how in principle PV might have become genetically encoded and, in doing so, explore the need for a transition in looking at PV from a stance where typological variation is primary and first language acquisition secondary, to one (now) where the primary motivation is acquisition and accounting for typological variation is just a useful bonus.  To see how plausible this transition is we turn to the first of the three strands mentioned above, beginning with the status of genetic factors in the knowledge and use of language. It is presumably uncontroversial that absolute (i.e. universal) principles such as structure-dependence must have become genetically encoded with no individual variation.

In general, genetic and neuro-imaging studies have provided cogent evidence only for the aetiology of pathological as opposed to typical development (see e.g. Stromswold, 2010:181; O’Connor et al, 1994). However, it is known on the basis of twin studies (Stromswold, 2010) that different aspects of our linguistic knowledge and its use are subject to genetically determined variation. That is, linguistic variation, which may be due to genetic or environmental factors, is pervasive among individuals in a population, even between siblings, but monozygotic (MZ, or ‘identical’) twins are linguistically more similar to each other than are dizygotic (DZ, or ‘fraternal’) twins, implying that genetic factors are responsible for this variance. Consider some examples from Stromswold (2010). In phonology “two-thirds of the variance in phonemic awareness, and 15 percent of the variance in articulation is due to heritable factors” (p.181); “for over 90 percent of the morphosyntactic tests, the MZ correlation coefficient was larger than the DZ correlation coefficient”; “about a third of the variance in vocabulary” is due to heritable factors. Strikingly, except for the onset of babbling, there is considerable genetic overlap[2] in the emergence of linguistic milestones (e.g. the onset of first words and sentences), and even fluency of language use is subject to genetic conditioning.

All this variation falls on a continuum which crucially includes typical development, suggesting that the emergence of individual linguistic differences, both random and principled, is ubiquitous. More importantly it suggests how cross-linguistic variation might emerge, perhaps ultimately becoming genetically conditioned.

A possible scenario, involving the second strand mentioned above, might be that choices that are now parametric were originally the result of randomly induced rules.  Some choices of rule then clustered, presumably because of perceived structural parallelism, leading to a situation where some of these ‘clustered’ choices were more accessible than others and therefore came to predominate over time. The increased frequency of these patterns facilitated learning and ultimately gave rise to parametrisation. For instance, some properties that are now described as parts of a single macro-parameter (and that therefore participate in parametric cascades) might have developed from alternatives ranked in accessibility according to their conceptual salience or facilitatory effect on processing, where this could vary according to viewpoint – parser or producer. As a simplistic example, the possibility of null subjects is differentially advantageous to the speaker, for whom it constitutes an economical saving of effort, and to the hearer, for whom it constitutes an interpretive problem.

To make the proposal clearer, I first present an abstract schema and then illustrate it with real examples.  Learners might associate property X with property Y or with property Z. Such association in the mind of the child could lead to short-cuts in the learning process: learner A might jump to the conclusion C1 that observing X licensed the conclusion that Y, learner B might jump to the conclusion C2 that observing X licensed the conclusion that Z.  In each case the provisional conclusion would result in a difference in knowledge which itself could result in a skewing of the primary linguistic data for the next generation.  Such skewing would in turn make C1 or C2 more accessible. In an environment gradually enriched by the presence of more and more speakers, both patterns could become equally accessible to some populations with the result that arbitrary or random choice (or no choice) might ensue. The next, crucial, stage is whether the learners’ assumptions become consolidated as parametric alternatives or remain, as they originated, as learning-theoretic strategies. The latter possibility finesses, or at least postpones, the problem of how a subset of linguistic alternatives become genetically encoded – but suggests that ‘antecedently known’ may have a learning-theoretic not a linguistic-theoretic domain of application. 
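To make the dynamics of this schema vivid, here is a toy iterated-learning simulation. It is my own illustration, not anything proposed in the post, and every name and number in it (the sample size, the accessibility bias, the function next_generation) is invented for the purpose: learners sample skewed data from the previous generation, jump to C1 or C2 with a small accessibility bias towards C1, and their output skews the data the next cohort hears.

```python
import random

# Toy iterated-learning sketch of the schema above (my illustration only):
# learners sample utterances from the community, jump to conclusion C1 or
# C2 depending on which the (slightly biased) evidence favours, and their
# output becomes the skewed input for the next generation.

def next_generation(p_c1, n_learners=100, sample_size=10, bias=0.1):
    """Return the proportion of learners adopting C1, given current skew p_c1."""
    adopt_c1 = 0
    for _ in range(n_learners):
        sample = [random.random() < p_c1 for _ in range(sample_size)]
        evidence = sum(sample) / sample_size
        if evidence + bias >= 0.5:   # accessibility bias nudges learners to C1
            adopt_c1 += 1
    return adopt_c1 / n_learners

p = 0.5  # start with no skew at all between C1 and C2
for gen in range(1, 11):
    p = next_generation(p)
    print(f"generation {gen}: {p:.2f} of speakers show C1")

# A tiny bias plus data skewing is enough for one initially arbitrary
# pattern to come to predominate within a handful of generations.
```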

Working in an early parametric framework, but one which suffices for exegetical purposes, Lefebvre (1998:355ff.) discusses a number of correlations between putative parametric choices, where the existence of property A licenses the assumption that the language exhibits property B or property C, leading potentially to the existence of macro-parametric variation. Thus, the presence of serial verb constructions correlates with either the absence of prepositions or the absence of inflectional morphology; the presence of double object constructions coincides with either the presence of preposition stranding or with the direction of theta-role assignment. Lefebvre then observes that all of these correlations fail, either because there are attested exceptions to them or because they are theoretically unmotivated.  This initially disappointing fact has one redeeming feature: it suggests that one might entertain an interpretation on which modes of externalisation were originally unconstrained, and that systematic correlations among linguistic properties might have emerged gradually, giving rise to a relatively constrained set of systems. Crucially the correlations need not emerge as absolutes but as strong tendencies forming the basis for a learning strategy.

A simpler illustration is provided by the case of head-direction. For the child, as for the linguist, the underlying assumption is that sub-parts of the grammar, including head-directionality, are ‘harmonic’ or ‘consistent’. The child acquiring its first language observes that verbs precede their objects and then jumps to the conclusion, on the basis of minimal – or even no – further examples, that its language is head-first.  If the language is consistent, nothing more needs to be said, or acquired; if it is inconsistent, then positive evidence will lead to traditional learning modifying the parameter setting. That is, the parametric story predicts the need for learning in certain situations and hence, importantly, also predicts particular error patterns (over-generalisation to the harmonic) in such situations. If there is positive evidence of the kind mentioned it follows that PV is not conceptually necessary, though it may nonetheless be empirically correct.
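As a sketch of what that learning strategy amounts to (again, my own illustration with invented category names, not part of the post): the learner sets a harmonic default for every phrase type on the strength of a single observation, and retreats category by category only on positive evidence, which is exactly what yields the predicted over-generalisation errors.

```python
# Toy "harmonic default" learner for head-direction (illustration only).
# One observation sets a head-first/head-last default for all phrase
# types; positive evidence later revises individual categories.

HEAD_FIRST, HEAD_LAST = "head-first", "head-last"
CATEGORIES = ["VP", "PP", "NP", "CP"]

def acquire(observations):
    """observations: iterable of (category, order) pairs the child hears."""
    grammar = {}
    for category, order in observations:
        if not grammar:
            # the harmonic jump: generalise the first observed order everywhere
            grammar = {c: order for c in CATEGORIES}
        else:
            # traditional learning: revise just the category actually observed
            grammar[category] = order
    return grammar

# A mostly head-first language that happens to have postpositions:
data = [("VP", HEAD_FIRST), ("NP", HEAD_FIRST), ("PP", HEAD_LAST)]
print(acquire(data))
# Until the PP datum arrives, the learner over-generalises PPs to
# head-first - exactly the error pattern the parametric story predicts.
```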

A corollary of this view is that if such asymmetric choices arose in individuals and then groups of individuals, it would eventuate in typological asymmetry of the well-known kind. That is, greater accessibility would lead not to the kind of absolutes characteristic of standard accounts of PV but to statistical asymmetry in the incidence of harmonic combinations. This more realistic result emerges from invoking the role of learning theory rather than explaining everything in terms of knowledge of language.

Just as I insisted that the criteria for PV extend to phonology as well as syntax, I require the same for evolvability. A phonological example of possible emergent parametrisation can be seen in the differential treatment of post-consonantal glides discussed in Smith & Law 2009. Yip (2003:804) argued that some speakers treat a post-consonantal glide as a secondary articulation of the consonant, others as a segment in its own right:  “the rightful home of /y/ [is] underdetermined by the usual data, leaving room for variation.”  Her conclusion was that “speakers opt for different structures in the absence of conclusive evidence for either.” We used Yip’s observation to argue that this did not exemplify PV because it was not deterministic, but it may still represent a step along the path to parametrisation.  If one pattern becomes more frequent, perhaps at random, it may gradually become consolidated as a parameter.

So I think it is clear that, and even how, PV could have evolved – a view defended in more detail by Roberts & Holmberg 2010 – and conclude with a summary list of possible stages in parametric evolution.

a. Random variation gives rise to choice for the next generation, resulting in incipient typology.

b. First language learners find the solutions to the problem of externalisation provided by this choice more or less ‘accessible’.

c. Hence the linguistic typology became asymmetric (as we see today), with only a subset of the logically possible languages being attested.

d. Current evidence for genetically determined individual variation in knowledge and use of language in all populations suggests genomic change.

e. The asymmetric typology in (c) is more revealing than is standardly accepted, and probably arose as the result of a learning strategy that is still operative in first language acquisition.

f. The putative genetic encoding of PV suggests there was a transition from typological variation to acquisition, where the plausibility of the transition depends on the status of genetic factors in the knowledge and use of language.

g. Universal principles such as structure-dependence must have become genetically encoded with no individual variation. So such encoding is in principle possible.

h. Twin studies show that different aspects of our linguistic knowledge and its use are subject to genetically determined variation.

i. All this variation falls on a continuum which includes typical development, suggesting that the emergence of individual linguistic differences is ubiquitous and that cross-linguistic variation might become genetically conditioned.

j. The child assumes that sub-parts of the grammar are ‘harmonic’ or ‘consistent’. Hence: some choices of rule cluster because of perceived structural parallelism, leading to short-cuts in the learning process. Some such clusters became more accessible than others so came to predominate. Increased frequency of these patterns facilitated learning and gave rise to incipient parametrisation depending on the conceptual salience or facilitatory effect on processing – either parsing or production of the patterns.

k. The data are not always clear-cut and learners may associate property X with property Y (C1) or with property Z (C2). Each provisional conclusion results in a skewing of the primary linguistic data for the next generation.

l. Such skewing makes C1 or C2 more accessible. In an environment gradually enriched by more and more speakers, both patterns could become equally accessible with the result that arbitrary or random choice (or no choice) might ensue.

m. Such asymmetric choices by (groups of) individuals would eventuate in typological asymmetry. Greater accessibility would lead not to the absolutes characteristic of standard accounts of PV but to statistical asymmetry in the incidence of harmonic combinations. This result emerges from invoking learning theory rather than relying exclusively on knowledge of language.

n. A crucial issue is whether the learners’ assumptions become consolidated as parametric alternatives or remain, as they originated, as learning-theoretic strategies. The latter possibility suggests that ‘antecedently known’ may have a learning-theoretic not a linguistic-theoretic domain of application: a ‘third factor’ effect.

o. The current parametric story predicts the need for learning in certain situations and hence also predicts particular error patterns (over-generalisation to the harmonic). If there is positive evidence it follows that PV is not conceptually necessary, but it may nonetheless be empirically correct.


References    

Bennett, Keith. 2010. “The chaos theory of evolution” New Scientist 208:2782, pp.28-31.
Berwick, Robert & Noam Chomsky. 2008. The Biolinguistic Program: The Current State of its Evolution and Development. Forthcoming in Anna Maria Di Sciullo & Calixto Aguero (eds.), Biolinguistic Investigations. Cambridge, MA: MIT Press.
Boeckx, Cedric. 2010. What principles and parameters got wrong.  http://ling.auf.net/lingBuzz/001118.
Chierchia, Gennaro. 1998. Plurality of Mass Nouns and the Notion of ‘Semantic Parameter’. In Susan Rothstein (ed.), Events and Grammar, 53-103. Dordrecht: Kluwer.
Chomsky, Noam. 1986. Knowledge of Language: Its Nature, Origin and Use. New York: Praeger.
Chomsky, Noam. 2006. Language and Mind (3rd edn.). Cambridge: Cambridge University Press.
Chomsky, Noam. 2010. Some simple evo devo theses: how true might they be for language? In Richard Larson, Viviane Déprez & Hiroko Yamakido (eds.) The Evolution of Human Language:  Biolinguistic Perspectives. Cambridge: CUP, pp. 45-62.
Crain, Stephen & Paul Pietroski. 2002. Why language acquisition is a snap. The Linguistic Review 19, 163-183.
Fodor, Jerry A. & Massimo Piattelli-Palmarini. 2010.  What Darwin got Wrong. London: Profile Books.
Holmberg, Anders. 2010. Parameters in Minimalist theory: The Case of Scandinavian. Theoretical Linguistics 36.1:1-48.
Kayne, Richard S. 2005. Some notes on comparative syntax, with special reference to English and French. In Guglielmo Cinque & Richard S. Kayne (eds.), The Oxford Handbook of Comparative Syntax, 3-69. Oxford: Oxford University Press.
Lefebvre, Claire. 1998. Creole Genesis and the Acquisition of Grammar: The Case of Haitian Creole. Cambridge: Cambridge University Press.
Manzini, M. Rita & Leonardo M. Savoia. 2007. A Unification of Morphology and Syntax. London: Routledge.
Newmeyer, Frederick J. 2005. Possible and Probable Languages: A Generative Perspective on Linguistic Typology. Oxford: Oxford University Press.
O'Connor, Neil, Neil Smith, Chris Frith & Ianthi Tsimpli. 1994. Neuropsychology and linguistic talent. Journal of Neurolinguistics 8:95-107.
Piattelli-Palmarini, Massimo. 1989. Evolution, selection and cognition: from learning to parameter setting in biology and in the study of language. Cognition 31, 1–44.
Roberts, Ian & Anders Holmberg. 2010. Introduction: Parameters in minimalist theory. In Theresa Biberauer, Anders Holmberg, Ian Roberts & Michelle Sheehan (eds.), Parametric Variation: Null Subjects in Minimalist Theory, 1-57. Cambridge: Cambridge University Press.
Smith, Neil. 2004. Chomsky: Ideas and Ideals (2nd edn.). Cambridge: Cambridge University Press.
Smith, Neil & Ann Law. 2009. On parametric (and non-parametric) variation. Biolinguistics 3:332-343.
Stromswold, Karin. 2010. Genetics and the evolution of language: What genetic studies reveal about the evolution of language. In Richard Larson, Viviane Déprez & Hiroko Yamakido (eds.) The Evolution of Human Language:  Biolinguistic Perspectives. Cambridge: CUP, pp.176-190.
Wagner, Andreas. 2007. Robustness and Evolvability in Living Systems. Princeton University Press.
Yip, Moira. 2003. Casting doubt on the Onset-Rime distinction. Lingua 113, 779-816.



[1] Defined as the ability of a population of organisms to generate genetic diversity and evolve. For discussion see e.g. Wagner (2007).
[2] That is, genetic variation that predisposes the bearer to manifest several phenotypic traits.