Faculty of Language: Parametric Variation and Darwin’s Problem

What follows is Neil Smith's first post. I am delighted that he agreed to post here.

Parametric Variation (PV) has been under attack. Despite Berwick & Chomsky’s (2008:8) assertion that “... principles do not determine the answers to all questions about language, but leave some questions as open parameters”, there has been widespread reluctance to accept the existence of parameters as opposed to particular rules (see e.g. Newmeyer 2005, Boeckx 2010). A vigorous counter-movement (see e.g. the special issue of Theoretical Linguistics 36.1, devoted to Holmberg 2010) shows that no consensus has been reached. There is a parallel with human types, where the apparently obvious diversity of different ‘races’ disguises profound underlying unity, and specifying the nature of the variation is difficult. It is against this background that I want to raise the issue of the evolvability of PV: how plausible is the claim that PV has evolved, as against the sceptics’ claim that aggregations of unconnected rules have developed? If it could be demonstrated that PV could not have evolved, this would constitute a powerful argument in favour of such scepticism. Providing a sketch of how PV might have evolved doesn’t, of course, show that it did, but it would remove one brick from the sceptics’ edifice.

PV theory unifies two different domains: typology and acquisition. Variation among the world’s languages (more accurately, the set of internalised I-languages, Chomsky 1986) is defined in terms of parametric differences and, in first language acquisition, the child’s task is reduced to setting the values of such parameters on the basis of the stimuli it is exposed to – utterances in the ambient language. Given the strikingly uniform success of first language acquisition, it follows that “the set of possibilities [must] be narrow in range and easily attained by the first language learner” (Smith 2004:83). By hypothesis, the principles do not vary from child to child or from language to language so, as Chomsky (2006:183) puts it, “acquisition is a matter of parameter setting, and is therefore divorced entirely from … the principles of UG.”

The theory is at once ‘internalist’ (i.e. it is a theory of states of the mind/brain, pertaining to knowledge which is largely unconscious) and universalist. An immediate implication of this position is that the range of parametric choices is known in advance. As a corollary, the theory claims that acquisition is largely a process of ‘selection’ rather than instruction (see Piattelli-Palmarini 1989) and that such acquisition is likely to take place in a critical period or periods. For some, “all syntactic variation is parametric” (Kayne, 2005; Manzini & Savoia, 2007; Roberts & Holmberg, 2010, etc.), but this is undesirable because it is too unconstrained. If not all variation is parametric, the responsibility of parametric theory is reduced and we need to redefine the nature and scope of ‘parametric’ variation. I presuppose a three-way distinction among Universal Principles, Parameters, and Accidents, the last typified by irregularities of morphology. Thus we have not only a distinction between Principles and Parameters but also Constraints on Parameters to differentiate parametric choices from ‘Accidents’. That is, we need identity criteria for parameters (see Smith & Law 2009 for discussion).

Further, theoretical consistency demands that we have the same criteria for syntax and phonology and (if there are relevant examples) for semantic choices at the C-I interface (cf. Chierchia, 1998) and (if definable) for systematic morphological variation. For current purposes I am happy to go along with Chomsky’s observation that “parametrization and diversity too would be mostly – maybe entirely – restricted to externalization,” (Chomsky 2010:60; to “language shorn of the properties of the sound system” as Smith 2004:43 puts it), hence mainly morphology and phonology. One reason for the multiplicity of languages is then that “the problem of externalization can be solved in many different and independent ways,” (Chomsky, 2010:61), where moreover these may all be ‘optimal’ in different ways. I should note in passing that I am unconvinced by such claims as “To externalize the internally generated expression ‘what John is eating what’, it would be necessary to pronounce ‘what’ twice, and that turns out to place a very considerable burden on computation” (Berwick & Chomsky 2008:11). The burden seems slight, especially given that in first language acquisition children regularly repeat material ‘unnecessarily’ (see e.g. Crain & Pietroski’s (2002) example ‘What do you think what pigs eat?’).

So, is PV evolvable? One stance it is tempting to adopt is that evolvability[1] is simply another criterion characterising PV. In the absence (in the domain of language) of some independent characterisation of evolvability, however, such a claim would be circular. Indeed, like Fodor & Piattelli-Palmarini (2010:159) I am sceptical that there is any level of evolutionary explanation: “Natural history is just one damned thing after another”, in their colourful terms. That is, there is no unified theory accounting for the fixation of phenotypic traits in natural populations, and there is equally no unified theory accounting for language typology, as witness the ‘accidents’ mentioned above. Evolution is not explicable in terms of a single theory. Even so, variation emerged so there had better be an account. More positively, it seems uncontentious that, as Bennett (2010:31) puts it, “macroevolution may, over the longer term, be driven largely by internally generated genetic change, not adaptation to a changing environment”, that “mutations occur continually, without external influence”, and that “a single small change can have far-reaching and unpredictable effects”. Parallels with language are obvious, though I would not wish to deny the relevance of ‘external influence’.

There are three strands in the alternative account I wish to provide. First, there is evidence for genetically determined individual variation in knowledge and use of language, including not only pathological but also typically developing populations. Second, it is plausible to postulate that language learners find different solutions to the problem of externalisation more or less ‘accessible’. Third, the linguistic typology we see today is asymmetric, with only a subset of the logically possible languages being attested: the observation which initially made (macro-)parameters plausible. But the typology is not as skewed as predicted: too many of the logical possibilities actually are attested, an observation which makes some, conceivably all, such parameters suspect. The classic example is the fractionation of the ‘pro-drop parameter’ when it was observed that the individual sub-parts dissociated. Nonetheless, I think that this imperfect typology is more robust than is standardly accepted, and probably arose as the result of a learning strategy that is still operative in first language acquisition. If correct, this suggests that rejecting PV on these grounds would be premature.

Assuming a stage in hominid evolution by which language has appeared, we need to motivate the development in subsequent generations of PV. By hypothesis, the first language to evolve had no PV. That is, the existence, and hence emergence, of PV presupposes the prior evolution of language. Berwick & Chomsky (2008) argue that the source of PV is externalisation and that there is “no reason to suppose that solving the externalisation problem involved an evolutionary change – that is, genomic change”. This claim is not, however, apodictic. The switch-box metaphor suggests that the range of parametric variants is ‘antecedently known’, and implicit in ‘antecedently known’ is genetic control; moreover, in the same paper they talk of “the evolution ... of the mode of externalisation”. We need a more nuanced account to resolve the tension caused by the apparent inconsistency: solving the problem of externalisation probably did not ‘involve’ an evolutionary change in the sense of being caused by such change, but the solution may well have given rise to a situation in which evolutionary, and genomic, change could occur.

In what follows I suggest how in principle PV might have become genetically encoded and, in doing so, explore the need for a transition in looking at PV from a stance where typological variation is primary and first language acquisition secondary, to one (now) where the primary motivation is acquisition and accounting for typological variation is just a useful bonus. To see how plausible this transition is we turn to the first of the three strands mentioned above, beginning with the status of genetic factors in the knowledge and use of language. It is presumably uncontroversial that absolute (i.e. universal) principles such as structure-dependence must have become genetically encoded with no individual variation.

In general, genetic and neuro-imaging studies have provided cogent evidence only for the aetiology of pathological as opposed to typical development (see e.g. Stromswold, 2010:181; O’Connor et al., 1994). However, it is known on the basis of twin studies (Stromswold, 2010) that different aspects of our linguistic knowledge and its use are subject to genetically determined variation. That is, linguistic variation, which may be due to genetic or environmental factors, is pervasive among individuals in a population, even between siblings, but monozygotic (MZ, or ‘identical’) twins are linguistically more similar to each other than are dizygotic (DZ, or ‘fraternal’) twins, implying that genetic factors are responsible for part of this variance. Consider some examples from Stromswold (2010). In phonology, “two-thirds of the variance in phonemic awareness, and 15 percent of the variance in articulation is due to heritable factors” (p.181); “for over 90 percent of the morphosyntactic tests, the MZ correlation coefficient was larger than the DZ correlation coefficient”; “about a third of the variance in vocabulary” is due to heritable factors. Strikingly, except for the onset of babbling, there is considerable genetic overlap[2] in the emergence of linguistic milestones (e.g. the onset of first words and sentences), and even fluency of language use is subject to genetic conditioning.

All this variation falls on a continuum which crucially includes typical development, suggesting that the emergence of individual linguistic differences, both random and principled, is ubiquitous. More importantly it suggests how cross-linguistic variation might emerge, perhaps ultimately becoming genetically conditioned.

A possible scenario, involving the second strand mentioned above, might be that choices that are now parametric were originally the result of randomly induced rules. Some choices of rule then clustered, presumably because of perceived structural parallelism, leading to a situation where some of these ‘clustered’ choices were more accessible than others and therefore came to predominate over time. The increased frequency of these patterns facilitated learning and ultimately gave rise to parametrisation. For instance, some properties that are now described as parts of a single macro-parameter (and that therefore participate in parametric cascades) might have developed from alternatives ranked in accessibility according to their conceptual salience or facilitatory effect on processing, where this could vary according to viewpoint – parser or producer. As a simplistic example, the possibility of null subjects is differentially advantageous to the speaker, for whom it constitutes an economical saving of effort, and to the hearer, for whom it constitutes an interpretive problem.

To make the proposal clearer, I first present an abstract schema and then illustrate it with real examples. Learners might associate property X with property Y or with property Z. Such association in the mind of the child could lead to short-cuts in the learning process: learner A might jump to the conclusion C₁ that observing X licensed the conclusion that Y, learner B might jump to the conclusion C₂ that observing X licensed the conclusion that Z. In each case the provisional conclusion would result in a difference in knowledge which itself could result in a skewing of the primary linguistic data for the next generation. Such skewing would in turn make C₁ or C₂ more accessible. In an environment gradually enriched by the presence of more and more speakers, both patterns could become equally accessible to some populations with the result that arbitrary or random choice (or no choice) might ensue. The next, crucial, stage is whether the learners’ assumptions become consolidated as parametric alternatives or remain, as they originated, as learning-theoretic strategies. The latter possibility finesses, or at least postpones, the problem of how a subset of linguistic alternatives become genetically encoded – but suggests that ‘antecedently known’ may have a learning-theoretic not a linguistic-theoretic domain of application.
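The abstract schema can be made concrete with a toy simulation. The Python sketch below is purely illustrative and is not part of the proposal itself: the function name, the population size, the number of utterances each learner samples, and the "adopt whichever conclusion the sample favours" rule are all assumptions introduced for the example. It shows only how a provisional conclusion (C₁ or C₂) can skew the data for the next generation, so that one pattern may come to predominate.

```python
import random


def simulate_generations(n_generations=20, n_learners=100, n_utterances=10,
                         initial_share_c1=0.5, seed=0):
    """Return the share of learners adopting C1 in each generation.

    Hypothetical toy model: every learner hears a small sample of the
    previous generation's output and jumps to the conclusion (C1 or C2)
    that the sample favours. All parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    share_c1 = initial_share_c1
    history = [share_c1]
    for _ in range(n_generations):
        adopters = 0
        for _ in range(n_learners):
            # Count how many sampled utterances are consistent with C1.
            heard_c1 = sum(rng.random() < share_c1 for _ in range(n_utterances))
            if heard_c1 * 2 > n_utterances:
                adopters += 1
            elif heard_c1 * 2 == n_utterances and rng.random() < 0.5:
                # Evenly balanced sample: arbitrary (random) choice.
                adopters += 1
        # The learners' conclusions become the data for the next generation.
        share_c1 = adopters / n_learners
        history.append(share_c1)
    return history


if __name__ == "__main__":
    for generation, share in enumerate(simulate_generations()):
        print(generation, round(share, 2))
```

In this sketch a small initial imbalance tends to be amplified from one generation to the next, which is one way of picturing how skewed primary linguistic data could make C₁ or C₂ more accessible. Whether real populations behave in this way is, of course, an empirical question that the toy model does not address.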

Working in an early parametric framework, but one which suffices for exegetical purposes, Lefebvre (1998:355ff.) discusses a number of correlations between putative parametric choices, where the existence of property A licenses the assumption that the language exhibits property B or property C, leading potentially to the existence of macro-parametric variation. Thus, the presence of serial verb constructions correlates with either the absence of prepositions or the absence of inflectional morphology; the presence of double object constructions coincides with either the presence of preposition stranding or with the direction of theta-role assignment. Lefebvre then observes that all of these correlations fail, either because there are attested exceptions to them or because they are theoretically unmotivated. This initially disappointing fact has one redeeming feature: it suggests that one might entertain an interpretation on which modes of externalisation were originally unconstrained, and that systematic correlations among linguistic properties might have emerged gradually, giving rise to a relatively constrained set of systems. Crucially the correlations need not emerge as absolutes but as strong tendencies forming the basis for a learning strategy.

A simpler illustration is provided by the case of head-direction. For the child, as for the linguist, the underlying assumption is that sub-parts of the grammar, including head-directionality, are ‘harmonic’ or ‘consistent’. The child acquiring its first language observes that verbs precede their objects and then jumps to the conclusion, on the basis of minimal – or even no – further examples, that its language is head-first. If the language is consistent, nothing more needs to be said, or acquired; if it is inconsistent, then positive evidence will lead to traditional learning modifying the parameter setting. That is, the parametric story predicts the need for learning in certain situations and hence, importantly, also predicts particular error patterns (over-generalisation to the harmonic) in such situations. If there is positive evidence of the kind mentioned it follows that PV is not conceptually necessary, though it may nonetheless be empirically correct.
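The learning strategy just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a model of actual acquisition: the function name, the category labels ("V", "P", "N", "A") and the order labels ("head-first", "head-last") are hypothetical and introduced only for the example. The learner generalises harmonically from its first observation and revises the setting for a category only when it meets positive evidence for that category.

```python
def learn_head_direction(observations, categories=("V", "P", "N", "A")):
    """Return a mapping from category to head direction.

    observations: iterable of (category, order) pairs, where order is
    'head-first' or 'head-last'. The labels are illustrative assumptions.
    """
    settings = {}
    default = None
    for category, order in observations:
        if default is None:
            # First observation: jump to the harmonic conclusion
            # that every category shares this order.
            default = order
            settings = {c: default for c in categories}
        # Positive evidence overrides the harmonic guess,
        # but only for the category actually observed.
        settings[category] = order
    return settings


if __name__ == "__main__":
    # A consistent language: one observation suffices.
    print(learn_head_direction([("V", "head-first")]))
    # An inconsistent language: evidence about P revises only P.
    print(learn_head_direction([("V", "head-first"), ("P", "head-last")]))
```

Before the evidence about "P" arrives, the sketch assigns "P" the harmonic value, which corresponds to the predicted error pattern of over-generalisation to the harmonic.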

A corollary of this view is that if such asymmetric choices arose in individuals and then groups of individuals, it would eventuate in typological asymmetry of the well-known kind. That is, greater accessibility would lead not to the kind of absolutes characteristic of standard accounts of PV but to statistical asymmetry in the incidence of harmonic combinations. This more realistic result emerges from invoking the role of learning theory rather than explaining everything in terms of knowledge of language.

Just as I insisted that the criteria for PV extend to phonology as well as syntax, I require the same for evolvability. A phonological example of possible emergent parametrisation can be seen in the differential treatment of post-consonantal glides discussed in Smith & Law 2009. Yip (2003:804) argued that some speakers treat a post-consonantal glide as a secondary articulation of the consonant, others as a segment in its own right: “the rightful home of /y/ [is] underdetermined by the usual data, leaving room for variation.” Her conclusion was that “speakers opt for different structures in the absence of conclusive evidence for either.” We used Yip’s observation to argue that this did not exemplify PV because it was not deterministic, but it may still represent a step along the path to parametrisation. If one pattern becomes more frequent, perhaps at random, it may gradually become consolidated as a parameter.

So I think it is clear that, and even how, PV could have evolved – a view defended in more detail by Roberts & Holmberg 2010 – and conclude with a summary list of possible stages in parametric evolution.

a. Random variation gives rise to choice for the next generation, resulting in incipient typology.

b. First language learners find the solutions to the problem of externalisation provided by this choice more or less ‘accessible’.

c. Hence the linguistic typology became asymmetric (as we see today), with only a subset of the logically possible languages being attested.

d. Current evidence for genetically determined individual variation in knowledge and use of language in all populations suggests genomic change.

e. The asymmetric typology in (c) is more revealing than is standardly accepted, and probably arose as the result of a learning strategy that is still operative in first language acquisition.

f. The putative genetic encoding of PV suggests there was a transition from typological variation to acquisition, where the plausibility of the transition depends on the status of genetic factors in the knowledge and use of language.

g. Universal principles such as structure-dependence must have become genetically encoded with no individual variation. So such encoding is in principle possible.

h. Twin studies show that different aspects of our linguistic knowledge and its use are subject to genetically determined variation.

i. All this variation falls on a continuum which includes typical development, suggesting that the emergence of individual linguistic differences is ubiquitous and that cross-linguistic variation might become genetically conditioned.

j. The child assumes that sub-parts of the grammar are ‘harmonic’ or ‘consistent’. Hence: some choices of rule cluster because of perceived structural parallelism, leading to short-cuts in the learning process. Some such clusters became more accessible than others so came to predominate. Increased frequency of these patterns facilitated learning and gave rise to incipient parametrisation depending on the conceptual salience or facilitatory effect on processing – either parsing or production of the patterns.

k. The data are not always clear-cut and learners may associate property X with property Y (C₁) or with property Z (C₂). Each provisional conclusion results in a skewing of the primary linguistic data for the next generation.

l. Such skewing makes C₁ or C₂ more accessible. In an environment gradually enriched by more and more speakers, both patterns could become equally accessible with the result that arbitrary or random choice (or no choice) might ensue.

m. Such asymmetric choices by (groups of) individuals would eventuate in typological asymmetry. Greater accessibility would lead not to the absolutes characteristic of standard accounts of PV but to statistical asymmetry in the incidence of harmonic combinations. This result emerges from invoking learning theory rather than relying exclusively on knowledge of language.

n. A crucial issue is whether the learners’ assumptions become consolidated as parametric alternatives or remain, as they originated, as learning-theoretic strategies. The latter possibility suggests that ‘antecedently known’ may have a learning-theoretic not a linguistic-theoretic domain of application: a ‘third factor’ effect.

o. The current parametric story predicts the need for learning in certain situations and hence also predicts particular error patterns (over-generalisation to the harmonic). If there is positive evidence it follows that PV is not conceptually necessary, but it may nonetheless be empirically correct.

References

Bennett, Keith. 2010. “The chaos theory of evolution” New Scientist 208:2782, pp.28-31.

Berwick, Robert & Noam Chomsky. 2008. The Biolinguistic Program: The Current State of its Evolution and Development. Forthcoming in Anna Maria Di Sciullo & Calixto Aguero (eds.), Biolinguistic Investigations. Cambridge, MA: MIT Press.

Boeckx, Cedric. 2010. What principles and parameters got wrong. http://ling.auf.net/lingBuzz/001118.

Chierchia, Gennaro. 1998. Plurality of Mass Nouns and the Notion of ‘Semantic Parameter’. In Susan Rothstein (ed.), Events and Grammar, 53-103. Dordrecht: Kluwer.

Chomsky, Noam. 1986. Knowledge of Language: Its Nature, Origin and Use. New York: Praeger.

Chomsky, Noam. 2006. Language and Mind (3rd edn.). Cambridge: Cambridge University Press.

Chomsky, Noam. 2010. Some simple evo devo theses: how true might they be for language? In Richard Larson, Viviane Déprez & Hiroko Yamakido (eds.) The Evolution of Human Language: Biolinguistic Perspectives. Cambridge: CUP, pp. 45-62.

Crain, Stephen & Paul Pietroski. 2002. Why language acquisition is a snap. The Linguistic Review 19, 163-183.

Fodor, Jerry A. & Massimo Piattelli-Palmarini. 2010. What Darwin got Wrong. London: Profile Books.

Holmberg, Anders. 2010. Parameters in Minimalist theory: The Case of Scandinavian. Theoretical Linguistics 36.1:1-48.

Kayne, Richard S. 2005. Some notes on comparative syntax, with special reference to English and French. In Guglielmo Cinque & Richard S. Kayne (eds.), The Oxford Handbook of Comparative Syntax, 3-69. Oxford: Oxford University Press.

Lefebvre, Claire. 1998. Creole Genesis and the Acquisition of Grammar: The Case of Haitian Creole. Cambridge: Cambridge University Press.

Manzini, M. Rita & Leonardo M. Savoia. 2007. A Unification of Morphology and Syntax. London: Routledge.

Newmeyer, Frederick J. 2005. Possible and Probable Languages: A Generative Perspective on Linguistic Typology. Oxford: Oxford University Press.

O’Connor, Neil, Neil Smith, Chris Frith & Ianthi Tsimpli. 1994. Neuropsychology and linguistic talent. Journal of Neurolinguistics 8, 95-107.

Piattelli-Palmarini, Massimo. 1989. Evolution, selection and cognition: from learning to parameter setting in biology and in the study of language. Cognition 31, 1–44.

Roberts, Ian & Anders Holmberg. 2010. Introduction: Parameters in minimalist theory. In Theresa Biberauer, Anders Holmberg, Ian Roberts & Michelle Sheehan (eds.), Parametric Variation: Null Subjects in Minimalist Theory, 1-57. Cambridge: Cambridge University Press.

Smith, Neil. 2004. Chomsky: Ideas and Ideals (2nd edn.). Cambridge: Cambridge University Press.

Smith, Neil & Ann Law. 2009. On parametric (and non-parametric) variation. Biolinguistics 3, 332-343.

Stromswold, Karin. 2010. Genetics and the evolution of language: What genetic studies reveal about the evolution of language. In Richard Larson, Viviane Déprez & Hiroko Yamakido (eds.) The Evolution of Human Language: Biolinguistic Perspectives. Cambridge: CUP, pp.176-190.

Wagner, Andreas. 2007. Robustness and Evolvability in Living Systems. Princeton University Press.

Yip, Moira. 2003. Casting doubt on the Onset-Rime distinction. Lingua 113, 779-816.

[1] Defined as the ability of a population of organisms to generate genetic diversity and evolve. For discussion see e.g. Wagner (2007).

[2] That is, genetic variation that predisposes the bearer to manifest several phenotypic traits.


Monday, January 21, 2013
