This post was prompted by Neil Smith's recent reflections on parameters on this blog. (It also builds on my own writings on the matter, which the interested reader can find on Lingbuzz.)
Neil writes that "Parametric Variation (PV) has been under attack. ... there has been widespread reluctance to accept the existence of parameters as opposed to particular rules (see e.g. Newmeyer 2005, Boeckx 2010). A vigorous counter-movement (see e.g. the special issue of Theoretical Linguistics 36.1, devoted to Holmberg 2010) shows that no consensus has been reached." I beg to differ. I think the attack is over and the consensus has been reached: Parameters (with a capital P, the parameters of classical times, those actually worth fighting for) are gone. Proof? Even those who fought for them ("vigorously", as Neil puts it) have accepted defeat. They confess that parameters are not part of UG, and they have begun to be pretty upfront about this. Here's a taste (from the first lines of the abstract of "Mafioso Parameters and the Limits of Syntactic Variation" by Roberts and colleagues, vigorous fighters, who will present it at this year's WCCFL and at other variation-related events like the Bilbao conference in June):
"We build on recent work proposing non-UG-specified, emergent parameter hierarchies ([1]),
arguing that a system of this kind not only addresses objections levelled against earlier formulations
of parameters ([2], [3], [4]), but also potentially offers a suitably restrictive theory of the nature and
limits of syntactic variation."
See: non-UG-specified ...
So, I guess I have to qualify Norbert's statements stemming from his infatuation with the (L)GB model. We may have pretty good ideas about the laws of grammars regarding some principles, but not (I claim) about those laws/principles that had switches in them (i.e., parameters). The LGB promises in this domain have not been met (sadly). I won't go over the evidence backing up this statement. Take a look at Newmeyer's excellent book (OUP, 2005), my Lingbuzz papers, and references therein.
What I want to do in this post is look at why people hang onto parameters. Here's a list of things I've heard from people (I've discussed some of these in my papers, others are new additions.)
1. The LGB picture was too good to be false.
2. That's the only game in town.
3. The current parametric theory we have is so restrictive, hence so irresistible.
4. The current parametric theory makes great typological predictions.
5. The worry concerning the exponential growth of parameters is nothing to worry about.
6. The current parametric theory has such a nice deductive structure to it.
7. Current parameters are no longer embedded within principles, so don't suffer from problems raised by Newmeyer, Boeckx, and others.
8. If we drop parameters, we are back to the land of Tomasello, Piaget, Skinner, Locke, Satan, mothers-in-law: the land of infinite variation!
9. Current parameters are no longer part of the first factor, they are part of the third factor, so everything is fine.
Let's look at each of these briefly (spoiler alert: all of these arguments in favor of hanging onto parameters fail miserably).
RE 1. People told me that yes, all the parameters put forth so far have not fared very well, certainly not as well as Chomsky predicted in LGB, but they have not lost hope of finding parameters of the right kind. When I ask them to give me an example, they insist that they are working on it, that what they have in mind would be something more abstract than anything proposed so far. (When they pronounced the word 'abstract', I think I saw raised eyebrows and maybe a repressed smile.) Since I have not (yet) figured out what such a parameter would look like (other than the ones in LGB, which did not materialize), I can't tell you if it exists.
RE 2 (only game in town). A theory can be demonstrably wrong even if we don't yet have a better theory.
RE 3 (restrictiveness). Part of the difficulty here is figuring out what people mean by "current parametric theory" (note: this is not an innocent point). Most of the time, they have the "Chomsky-Borer" conjecture in mind. That's part of the problem: it's a conjecture, not a theory. Luigi Rizzi, for example, put forth the following restrictive definition of parameter:
"A parameter is an instruction for a certain syntactic action expressed as a feature on a lexical item and made operative when the lexical item enters syntax as a head."
Sounds great (and restrictive), except that no one's got the foggiest idea about what counts as a possible feature, lexical item, and head. To be restrictive, the characterization just mentioned ought to be embedded within a theory about these things (i.e., a theory of the lexicon). But even my grandmother knows we don't have a theory there (and she tells me she is not very optimistic we'll have one anytime soon).
Incidentally, this is why the LGB picture about parameters was so great: parameters were embedded within a general theory of grammatical principles. If you take parameters out of that context, they look quite bad.
RE 4. Typological predictions. First, let's not confuse Plato's problem and Greenberg's problem. Second, let's listen to Mark Baker. "If it is correct to reduce all macroparameters to a series of relatively independent microparameters in this way, then one would expect to find a relatively smooth continuum of languages. ... [But] if natural human language permits both macroparameters and microparameters, we would expect there to be parametric clusters in something like the classical way. But we'd expect these clusters to be much noisier, as a result of microparametric variation." I understand the logic, but once we accept (as good minimalists should) that principles of syntax are invariant, and (therefore) macroparameters are aggregates of microparameters, Baker's picture is no longer tenable. Macro- and micro-parameters are not two different things.
RE 5. Exponential growth. Kayne writes that "We must of course keep in mind that as we discover finer- and finer-grained syntactic differences (by examining more and more languages and dialects) the number of parameters that we need to postulate, although it will rise, can be expected to rise much more slowly than the number of differences discovered, insofar as n independent binary-valued parameters can cover a space of 2^n languages/dialects (e.g. only 8 such parameters could cover 2^8 = 256 languages/dialects, etc.)" (Kayne)
True, but this does not address the worry concerning the exponential growth of parameters. The worry is about the size of the space the child has to go through to acquire her language. Once parameters are removed from the background of principles (see point 3), what's the structure that will guide the child?
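For the arithmetically inclined, here is a back-of-the-envelope way of seeing the point (a toy sketch; the numbers are purely illustrative and nothing hangs on them): Kayne is right that n binary parameters suffice to distinguish 2^n grammars, but it is precisely those 2^n grammars that the learner has to navigate, and without principles structuring the space that number gets astronomical very fast.

```python
# Toy illustration (not anyone's actual proposal): the parameters are few,
# but the hypothesis space they define is 2**n grammars, and that is the
# space the child must search if nothing else structures it.
for n in (8, 30, 100):
    print(f"{n} binary parameters -> {2**n:,} candidate grammars")
```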
RE 6. Deductive structure. Where does the deductive structure come from? In an impoverished theory like minimalists propose, it must come from some place else. (In addition, when you look at the deductive structure you are supposed to find in parametric proposals, what you find is something very ugly, nothing like the nice parameter hierarchy in Baker's Atoms of Language, but something like the Tokyo subway map. I'll devote a separate post to this issue, and will report on work done with my student Evelina Leivada, where we discovered more subway maps than we imagined.)
RE 7. Principles/parameters separation. It's true that parameters are no longer embedded inside GB principles. Minimalism got rid of those nice GB principles. But there is another sense in which current parameters are still embedded within principles (sorry if this is confusing, it's not my fault; read the papers!): many minimalists out there are crypto-GB folks: they still hold on to a rich UG, with lots of lexical principles (micro-principles, with micro-parameters in them). That's what provides artificial life to parameters. But if you get rid of these principles, and become a Proper Minimalist (All you need is Merge), then there is no room for parameters within that single Ur-principle.
RE 8. The fear of infinite variation. This is a myth, friends. Don't believe people who tell you that without parameters, there is no limit to variation in language. Hang on to your wallet. Tell them that even if there is a little bit of invariant syntax (say, Merge), then this puts an end to what can vary in grammar.
Don't believe me? Read Chomsky (note that he does not sound very worried):
"Most of the parameters, maybe all, have to do with the mappings [to the sensory-motor interface]. It might even turn out that there isn't a finite number of parameters, if there are lots of ways of solving this mapping problem"
By the way, this quote is taken from "The science of language" book that G. Pullum did not seem to like. Turns out that the book is not as "useless" as he said it was. Pullum wrote that the book has "essentially no academic value", but well, it looks like I found something of value in it (I always find something of value in what Chomsky writes. You should try too. It's very easy.)
RE 9. First/Third factor. They are not independent. Every good biologist (I've asked my friend Dick Lewontin) will tell you that the first, second, and third factors define one another; you can't separate them so easily (read Dick's "Triple helix"). So, appealing to the third factor only, abstracting away from the first, is not fair.
In sum: Why would anyone hang onto parameters? How did this post get so long?
--Cedric Boeckx
Friday, January 25, 2013
Thursday, January 24, 2013
The Gallistel-King Conjecture; part deux
A while ago I wrote about an idea that I dubbed the
Gallistel-King conjecture (here). The
nub of their conjecture is that (i) cognition requires something like a
Turing-von Neumann architecture to support it (i.e. connectionist style systems
won’t serve) and (ii) the physical platform for the kinds of computational
mechanisms needed exploits the same molecular structure used to pass
information across generations. In other words, DNA, RNA and proteins
constitute (part of) the cognitive code in addition to being the chemical
realization of the genetic code. Today,
Science Daily reports (here) that the European Bioinformatics Institute “have
created a way to store data in the form of DNA.” And not just a little, but
tons. And not just for an hour but for decades and centuries if not
longer. As Nick Goldman, the lead on the
project, says:
We already know that DNA is a
robust way to store information because we can extract it from bones of wooly
mammoths, which date back tens of thousands of years, and make sense of it. It
is also incredibly small and does not need any power for storage, so shipping
and keeping it is easy.
G&K observe that these same virtues (viz. stability,
energy efficiency and longevity) would be very useful for storing information
in brains. Add to this that DNA has the
kind of discrete/digital structure making information thus stored easy to
retrieve and appropriate for computation (c.f. G&K 167ff. for a
computational demo), and it would seem that so storing information is just what
we would expect from a reasonably well designed biological thinking machine.
Let me shout from the rooftops so that it is clear: I AM NO
EXPERT IN THESE MATTERS. However, if G&K are right then we need a system
with read-write memory to support the kinds of cognition we find in
animals. As the Science Daily article reports:
“Reading DNA is fairly straightforward”; the problem is that “writing it has
until now been a major hurdle to making DNA storage a reality.” There are two
problems that Goldman and his colleague Ewan Birney had to solve: (i) “using
current methods, it is only possible to manufacture DNA in short strings” and
(ii) “both writing and reading DNA are prone to errors, particularly when the
same DNA letter is repeated.” What Goldman and Birney did was devise a code
“using only short strings of DNA, and do it in such a way that creating a run
of the same letter would be impossible.”
Before proceeding, note the similarity of this and the kinds
of considerations about optimal coding that we were talking about in earlier
posts (here). This is a good example of
the kind of thing I was thinking of concerning efficient coding and
computation. Note how sensitive the relevant
considerations are to the problem that needs to be solved and to the physical
context within which it needs solving.
This is a good example, I would argue, of the kind of efficiency
concerns minimalists should be interested in as well. Ok, so what did they do?
Birney describes it as follows:
So we figured, let’s break up the
code into lots of overlapping fragments going in both directions, with indexing
information showing where each of the fragments belongs in the overall code,
and make a coding scheme that doesn’t allow repeats. That way, you would have
to have the same error on four different fragments for it to fail – and that
would be very rare.
The upshot, Goldman says is “a code that is error tolerant
using a molecular form we know will last in the right conditions for 10,000
years, or possibly longer...As long as someone knows what the code is, you will
be able to read it back if you have a machine that can read DNA.”
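For concreteness, here is a minimal sketch of the kind of "no-repeats" encoding just described. It is emphatically not Goldman and Birney's actual scheme (theirs uses a Huffman code plus overlapping, indexed fragments); the helper names (byte_to_trits, encode, decode) are my own illustrative inventions. The only point is to show how mapping each incoming digit onto one of the three bases that differ from the previously written base makes a run of the same letter impossible by construction.

```python
# Toy "no-repeats" DNA code (an illustrative sketch, not the published scheme):
# each byte is expressed as six base-3 digits, and each digit picks one of the
# three bases that differ from the previously written base.

BASES = "ACGT"

def byte_to_trits(b):
    """Express a byte (0-255) as six base-3 digits, most significant first."""
    trits = []
    for _ in range(6):
        trits.append(b % 3)
        b //= 3
    return trits[::-1]

def encode(data: bytes) -> str:
    """Encode bytes as a DNA string with no repeated adjacent bases."""
    dna, prev = [], "A"  # arbitrary starting point; the first real base differs from it
    for b in data:
        for t in byte_to_trits(b):
            base = [c for c in BASES if c != prev][t]  # three choices, never prev
            dna.append(base)
            prev = base
    return "".join(dna)

def decode(dna: str) -> bytes:
    """Invert encode() by recovering each digit from the base actually chosen."""
    out, prev, trits = bytearray(), "A", []
    for base in dna:
        trits.append([c for c in BASES if c != prev].index(base))
        prev = base
        if len(trits) == 6:
            val = 0
            for t in trits:
                val = val * 3 + t
            out.append(val)
            trits = []
    return bytes(out)

message = b"DNA storage"
strand = encode(message)
assert decode(strand) == message
assert all(a != b for a, b in zip(strand, strand[1:]))  # no letter ever repeated
```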
Talk about long-term memory! And we certainly all embody
machines that can read DNA!
Goldman and Birney see this as a great technological
breakthrough; good-bye hard drives, hello DNA.
However, with a little mental squinting it is not that hard to imagine
how this technological breakthrough would be just what G&K would have hoped
for.
Science often follows the leading technology of the day,
especially in the neuro/physio world. In Descartes’ day the brain was imagined
as a series of interconnected pipes inspired by the intricate fountains on
display (c.f. Vaucanson’s duck). Earlier it was clockwork brains. In our day,
it was computers of various kinds. Now, biotech may be pointing to a new
paradigm. If humans can code to DNA and
retrieve info from it, why shouldn’t brains?
Moreover, wouldn’t it be odd if brains had all this computational power
at their disposal but nature never figured out how to use it? A little like birds having wings but never
learning to fly? As I said, I’m no expert, but you gotta wonder…
Wednesday, January 23, 2013
Minimalism and On Wh Movement
The Generative enterprise as Chomsky envisioned it has been
based on five separate but related questions:
1. What does a given native speaker know about his language? Or, what do particular Gs look like?
2. What makes it possible for native speakers to acquire Gs? Or, what does UG look like?
3. How did UG arise in the species? Or, what must be added to non-linguistic cognition to get UGs?
4. How do native speakers use Gs in performance? Or, how are Gs used to produce and comprehend sentences in real time?
5. How do brains code for G and UG?
These questions, though separate, are clearly interconnected.
For example, it’s very hard, if not impossible, to consider the properties of UG
without knowing how particular Gs are put together. It’s pointless to engage in questions about
how UG could have arisen in the species without knowing anything about UG. It’s
hard to study how people use their Gs in the absence of descriptions of the Gs
that are used. And, last, it’s hard to
study how brains embody G and UG without knowing anything about brains, Gs or
UGs. All of this should be evident. What
is less obvious, but still true, is that even when we know a non-trivial amount
about the things we are trying to relate, relating them may be really tough.
There are many reasons for this. Let’s consider two.
In the last several posts I considered the relation between
(2) and (3) above. I noted that we have
a pretty good description of what UG does, e.g. it regulates movement and
construal dependencies, identifies the kinds of phrase structures that are
linguistically admissible and where expressions that are displaced can be
interpreted both phonetically and semantically. GB is a pretty good effective theory of these relations. However, it is a problematic fundamental theory. Why? Because it’s
hard to see how something with the special purpose predicates and intricate
internal structure of GB could have arisen in the species. It’s just too
different from what we find in other domains of human cognition and much too
elaborately structured. Consequently, it’s entirely opaque how something this
apparently complex and cognitively idiosyncratic could have arisen in the
species, especially in the (apparently) short time available. This motivates a
re-think of GB’s depiction of UG. The Minimalist Program is a research program
aimed at reducing GB’s parochialism and intricacy by reanalyzing heretofore
language specific operations in more general cognitive terms (e.g. Merge) and
unifying conditions on grammatical processes in more generic computational
terms (e.g. cyclicity as monotonicity/no tampering). As I’ve suggested in other
posts, this strategy recapitulates Chomsky’s in ‘On Wh Movement’ (OWM) and I
have suggested that OWM provides a unification strategy worth emulating. What specifically
did Chomsky do in OWM?
First, he adopted Ross’s theory as an effective account.
Thus, he accepted that the lay of the land described by Ross was roughly
correct. Why ‘roughly’? Because, though Chomsky adopted Ross’s descriptions of
strong islands, he tweaked the data in light of the details of the unifying
Subjacency account. In particular, the relevant island data included Wh islands
(pace Ross’s theory, which treated Wh-islands as porous) and was generalized to
cover subjects in general (not only sentential subjects as in Ross). Thus, though OWM largely adopted Ross’s description
of the data, it modified it as well.
Second, Chomsky unified the various islands under the theory
of movement by unifying the movement constructions under a ‘Move alpha’ rubric.
Whereas in Ross rules were effectively constructions, Chomsky distilled out a
movement component common to all the cases subject to the Subjacency Condition
(i.e. ‘move alpha’) and proposed that all and only the move alpha operation was
subject to the locality requirements eventuating in island effects when
violated.[1]
Thus, some constructions were treated as composites: one part move alpha, one
part a specific head with its own particular grammatical contribution (‘criteria’
in Rizzi’s sense). And all that mattered in computing locality was the movement
part. In sum, what makes a construction
subject to islands in OWM is that it is composed of a move alpha part. None of
its other properties matter.[2]
Chomsky adopted a very strong version of this claim: as
noted, all and only move alpha is
subject to subjacency restrictions. Arguing for this constitutes the better
part of OWM. The most interesting empirical component was reducing various
kinds of deletion operations investigated by Bresnan and Grimshaw to
constructions mediated by movement, most particularly their analysis of
comparatives as deletion operations.[3]
At any rate, sitting where we are today, it looks like Chomsky’s reanalysis has
won the day and that islands are now taken as virtual diagnostics of a movement
dependency.
The third leg of the analysis involved the Subjacency
Condition itself. This involved several proposals: one concerning the inventory
of bounding nodes (which nodes counted in computing distance), one concerning
which nodes had “escape hatches,” aka comp(lementizer)s, and one concerning the
fine structure of these comps (in particular how many slots it contained). In elaborating these in OWM Chomsky noted
that movement via escape hatches effectively allowed move alpha to finesse the
Specified Subject Condition (SSC) and the Propositional Island Constraint
(PIC). These were two proposed universals that were retained in revised form in
GB, though in later work move alpha was not thought to be subject to them.[4]
At any rate, it is interesting to see what Chomsky took to be the computational
implications of Subjacency Theory:
… the island constraints can be
explained in terms of general and quite reasonable computational properties
of formal grammar (i.e. subjacency, a property of cyclic rules that
states, in effect, that transformational rules have a restricted domain of
potential application; SSC, which states that only the most prominent phrase in
an embedded structure is accessible to rules relating it to phrases outside;
PIC, which stipulates that clauses are islands subject to the language specific
escape hatch..). If this conclusion can be sustained, it will be a significant
result, since such conditions as CNPC and the independent wh-island
constraint seem very curious and difficult to explain on other grounds. [my
emphasis] (p. 89; OWM).
Note what we have here: an attempt to motivate the
particular witnessed properties of the theory of bounding on more general
computational grounds. This should sound very familiar to minimalist ears:
restricted computational domains, prominent targets of operations (think labels
in place of subjects) etc. As we know,
theories embedding subjacency were subsequently built into interesting
proposals with non-trivial consequences for efficient parsing (e.g. Marcus,
Berwick and Weinberg).
The upshot? OWM provides a good model for minimalist
unification ambitions. Extract out a
common core operation in seemingly disparate “constructions” and propose
general conditions on the applications of this common rule to unify the
indicated dependencies. Minimalism has already taken steps in this direction.
For example, case theory was unified with movement theory in early minimalism
(e.g. Chomsky 1993) by having case discharged in the specifier positions of
relevant heads. Phrase structure theory
has been unified with movement theory by treating both as products of Merge
(E-merge and I-merge being instances of the same operation as applied to
different inputs). More ambitiously (stupidly?) still, some (e.g. yours truly,
Boeckx, Nunes, Idsardi, Lidz, Kayne, Zwart, Polinsky, Potsdam) have proposed
treating construal rules like Control, reflexivization, and pronominal binding
as species of movement hence unifying all of these operations with Merge.
These last mentioned proposals require abandoning two key
features of GB’s view of movement: (i) that movement into thematic positions is
barred and (ii) that all movement results in phonetic gaps at launch sites. [5]
(i) is a plausible consequence of
removing D-structure as a “level” and (ii) constitutes a return to earlier
versions of Generative Grammar in which transformations included the addition
of some designated lexical material (e.g. there,
reflexives, bound pronouns, etc.). It is very very unclear at this moment
whether this whole range of unifications is possible.[6]
I personally find the data uncovered and the arguments made to be highly
suggestive and the empirical hurdles to be less daunting than they appear. However, this is a personal judgment, not one
shared by the field as a whole (sadly).
That said, I would argue that extending the OWM reasoning to other parts
of the grammar to unify the modules is the right thing to try if one hopes to
address Darwin’s problem in (3) above. Why? Because if this kind of unification succeeded, grammars with the
phenomenological properties of GB’s version of UG would follow from the
addition of Merge to the cognitive repertoire of our apish ancestors. In other words, we would have isolated a single change sufficient to allow the
development of an FL with GB observable properties. Such a theory could be
considered fundamental.
Interestingly, such a theory would not only plausibly
provide an answer to Darwin’s problem (in (3)), it would also provide one for
Broca’s Problem (in (5)). Indeed, this
post was intended to address this point here (as the title indicates), but as
I’ve rambled on long enough, let me delay addressing how Minimalism relates to
question (5) until the next post.
[1]
Please note: This is a little bit of Whig history and it’s not really fair to Ross. Ross suggested that all island sensitive
constructions involved a specific rule of chopping, an operation that deleted
the resumptive pronoun that was part of the constructions of interest.
Interestingly, the intuition behind this part of Ross’s analysis has enjoyed
somewhat of a revival in recent work (see Merchant and Lasnik in particular),
which treats island effects as PF phenomena rather than as restrictions on derivational
operations as Chomsky originally proposed.
[2]
This is a fact worth savoring. A priori
there is nothing odd about having the specific construction determine locality
rather than the created move alpha dependency.
There is nothing contradictory in assuming, e.g. that Focus and question
formation would obey islands but that Topicalizations, comparatives and
relativizations would not. But this does
not seem to be the way things panned out.
[3]
The other really interesting consequence of Subjacency Theory was the
implication that all movement was
successive cyclic. This prediction was confirmed in work by Kayne and Pollock,
Sportiche, and Torrego a.o. In my
opinion, this is still one of the nicest “predictions” any theory in syntax has
ever made.
[4]
The reason is that in OWM A’-traces were treated as anaphoric elements. This
was revised in later work due to some empirical findings due to Lasnik.
[5]
To be honest, this is my
reconstruction of their results. Kayne and Zwart, for example, retain the theta
criterion by assuming that there is a lot more doubling than meets the eye.
[6]
Indeed, I believe (actually I think I know this, but let me be coy) that
Chomsky is very skeptical about the unification of construal with movement
(with the possible exception of reflexivization). This may account for why binding, when
discussed, is suggested to be a CI interface operation fed by the grammar
rather than a direct product of the grammar as in all earlier generative
accounts.
Monday, January 21, 2013
Parametric Variation and Darwin’s Problem
What follows is Neil Smith's first post. I am delighted that he agreed to post here.
Parametric Variation (PV) has been
under attack. Despite Berwick &
Chomsky’s (2008:8) assertion that: “... principles do not determine the answers
to all questions about language, but leave some questions as open parameters”,
there has been widespread reluctance to accept the existence of parameters as
opposed to particular rules (see e.g. Newmeyer 2005, Boeckx 2010). A vigorous
counter-movement (see e.g. the special issue of Theoretical Linguistics 36.1, devoted to Holmberg 2010) shows that no
consensus has been reached. There is a parallel with human types where
the apparently obvious diversity of different ‘races’ disguises profound
underlying unity, and specifying the nature of the variation is difficult. It is against this background that I want to raise the issue of the
evolvability of PV: how plausible is the claim that PV has evolved, rather than
the sceptics’ claim that aggregations of unconnected rules have developed? If
it could be demonstrated that PV could not have evolved this would constitute a
powerful argument in favour of such scepticism. Providing a sketch of how PV
might have evolved doesn’t, of course, show that it did, but it would remove
one brick from the sceptics’ edifice.
PV theory unifies two different
domains: typology and acquisition. Variation among the world’s languages (more
accurately the set of internalised I-languages, Chomsky 1986) is defined in
terms of parametric differences and, in first language acquisition the child’s
task is reduced to setting the values of such parameters on the basis of the
stimuli it is exposed to – utterances in the ambient language. Given the
strikingly uniform success of first language acquisition, it follows that “the
set of possibilities [must] be narrow in range and easily attained by the first
language learner” (Smith 2004:83). By
hypothesis, the principles do not vary from child to
child or from language to language so, as
Chomsky (2006:183) puts it, “acquisition is a matter of
parameter setting, and is therefore divorced entirely from … the principles of
UG.”
The theory is at once ‘internalist’
(i.e. it is a theory of states of the mind/brain, pertaining to knowledge which
is largely unconscious), and universalist. An immediate implication of this
position is that the range of parametric choices is known in advance and, as a
corollary, it claims that acquisition is largely a process of ‘selection’
rather than instruction (see Piattelli-Palmarini 1989) and that such
acquisition is likely to take place in a critical period or periods. For some, “all syntactic
variation is parametric” (Kayne, 2005; Manzini & Savoia, 2007; Roberts
& Holmberg, 2010, etc.), but this is undesirable because it is too unconstrained.
If not all variation is parametric, the responsibility of parametric theory is
reduced and we need to redefine the
nature and scope of ‘parametric’
variation. I presuppose a 3-way distinction among Universal
Principles, Parameters, and Accidents – typified by irregularities of
morphology. Thus we have not only a distinction between Principles and Parameters but also Constraints on
Parameters to differentiate parametric choices from ‘Accidents’. That is, we
need identity criteria for
parameters (see Smith & Law 2009 for discussion).
Further, theoretical
consistency demands that we have the
same criteria for syntax and
phonology and (if there are relevant examples) for semantic choices at
the C-I interface (cf. Chierchia, 1998) and (if definable) for systematic morphological
variation. For current purposes I am happy to go along with Chomsky’s
observation that “parametrization and diversity too
would be mostly – maybe entirely
– restricted to
externalization,” (Chomsky 2010:60; to “language shorn of the properties of the
sound system” as Smith 2004:43, puts it) hence mainly morphology and phonology.
One reason for the multiplicity of languages is then that “the problem of
externalization can be solved in many different and independent ways,”
(Chomsky, 2010:61) where moreover these may all be ‘optimal’ in different ways.
I should note in passing that I am unconvinced by such claims as
“To
externalize the internally generated expression ‘what John is eating what’, it
would be necessary to pronounce ‘what’ twice, and that turns out to place a
very considerable burden on computation” (Berwick & Chomsky 2008:11). The
burden seems slight, especially given that in first language acquisition
children regularly repeat material ‘unnecessarily’ (see e.g. Crain &
Pietroski’s 2002: ‘What
do you think what pigs eat?’).
So, is PV
evolvable? One stance it is tempting to adopt is that evolvability[1] is simply another criterion characterising PV. In the absence (in
the domain of language) of some independent characterisation of evolvability,
however, such a claim would be circular. Indeed, like Fodor &
Piattelli-Palmarini (2010:159) I am sceptical that there is any level of
evolutionary explanation: “Natural history is just one damned thing after
another”, in their colourful terms. That is, there is no unified theory
accounting for the fixation of phenotypic traits in natural populations, and
there is equally no unified theory accounting for language typology, as witness
the ‘accidents’ mentioned above. Evolution is not explicable in terms of a
single theory. Even so, variation emerged so there had better be an account. More
positively, it seems uncontentious that, as Bennett (2010:31) puts it,
“macroevolution may, over the longer term, be driven largely by internally
generated genetic change, not adaptation to a changing environment”, that
“mutations occur continually, without external influence”, and that “a single
small change can have far-reaching and unpredictable effects”. Parallels with
language are obvious, though I would not wish to deny the relevance of ‘external
influence’.
There are three
strands in the alternative account I wish to provide. First, there is evidence for genetically
determined individual variation in knowledge and use of language, including not only pathological but also
typically developing populations.
Second, it is plausible to postulate that language learners find
different solutions to the problem of externalisation more or less
‘accessible’. Third, the linguistic typology
we see today is asymmetric, with only a subset of the logically possible
languages being attested: the observation which initially made
(macro-)parameters plausible. But the typology is not as skewed as predicted:
too many of the logical possibilities actually are attested, an observation
which makes some, conceivably all, such parameters suspect. The classic example
is the fractionation of the ‘pro-drop parameter’ when it was observed that the
individual sub-parts dissociated. Nonetheless, I think that this imperfect
typology is more robust than is standardly accepted, and probably arose as the
result of a learning strategy that is still operative in first language
acquisition. If correct, this suggests
that rejecting PV on these grounds would be premature.
Assuming a stage in
hominid evolution by which language has appeared, we need to motivate the
development in subsequent generations of PV. By hypothesis, the first language
to evolve had no PV. That is, the existence, and hence emergence, of PV
presupposes the prior evolution of language. Berwick & Chomsky (2008) argue
that the source of PV is externalisation and that there is “no reason to
suppose that solving the externalisation problem involved an evolutionary
change – that is, genomic change”. This claim is not, however, apodictic: the
switch-box metaphor suggests that the range of parametric variants is
‘antecedently known’, and implicit in ‘antecedently known’ is genetic control,
and in the same paper they talk of “the evolution ... of the mode of
externalisation”. We need a more nuanced account to resolve the tension caused
by the apparent inconsistency: solving the problem of externalisation probably
did not ‘involve’ an evolutionary change in the sense of being caused by such
change, but the solution may well have given rise to a situation in which
evolutionary, and genomic, change could occur.
In what follows I
suggest how in principle PV might have become genetically encoded and, in doing
so, explore the need for a transition in looking at PV from a stance where
typological variation is primary and first language acquisition secondary, to
one (now) where the primary motivation is acquisition and accounting for
typological variation is just a useful bonus.
To see how plausible this transition is we turn to the first of the
three strands mentioned above, beginning with the status of genetic factors in
the knowledge and use of language. It is presumably uncontroversial that
absolute (i.e. universal) principles such as structure-dependence must have
become genetically encoded with no individual variation.
In general,
genetic and neuro-imaging studies have provided cogent evidence only for the
aetiology of pathological as opposed to typical development (see e.g.
Stromswold, 2010:181; O’Connor et al, 1994). However, it is known on the basis
of twin studies (Stromswold, 2010) that different aspects of our linguistic
knowledge and its use are subject to
genetically determined variation. That is, linguistic variation, which may be
due to genetic or environmental factors, is pervasive among individuals in a
population, even between siblings, but monozygotic (MZ, or ‘identical’) twins
are linguistically more similar to each other than are dizygotic (DZ, or ‘fraternal’)
twins, implying that genetic factors are responsible for this variance. Consider
some examples from Stromswold (2010). In phonology “two-thirds of the variance
in phonemic awareness, and 15 percent of the variance in articulation is due to
heritable factors” (p.181); “for over 90 percent of the morphosyntactic tests,
the MZ correlation coefficient was larger than the DZ correlation coefficient”;
“about a third of the variance in vocabulary” is due to heritable factors.
Strikingly, except for the onset of babbling, there is considerable genetic
overlap[2] in the emergence of linguistic milestones (e.g. the onset of first
words and sentences), and even fluency of language use is subject to genetic
conditioning.
All this variation
falls on a continuum which crucially includes typical development, suggesting
that the emergence of individual linguistic differences, both random and
principled, is ubiquitous. More importantly it suggests how cross-linguistic
variation might emerge, perhaps ultimately becoming genetically conditioned.
A possible
scenario, involving the second strand mentioned above, might be that choices
that are now parametric were originally the result of randomly induced rules. Some choices of rule then clustered,
presumably because of perceived structural parallelism, leading to a situation
where some of these ‘clustered’ choices were more accessible than others and therefore
came to predominate over time. The increased frequency of these patterns
facilitated learning and ultimately gave rise to parametrisation. For instance,
some properties that are now described as parts of a single macro-parameter
(and that therefore participate in parametric cascades) might have developed
from alternatives ranked in accessibility according to their conceptual
salience or facilitatory effect on processing, where this could vary according
to viewpoint – parser or producer. As a simplistic example, the possibility of
null subjects is differentially advantageous to the speaker, for whom it
constitutes an economical saving of effort, and to the hearer, for whom it
constitutes an interpretive problem.
To make the
proposal clearer, I first present an abstract schema and then illustrate it
with real examples. Learners might
associate property X with property Y or with property Z. Such association in
the mind of the child could lead to short-cuts in the learning process: learner
A might jump to the conclusion C1 that observing X licensed the
conclusion that Y; learner B might jump to the conclusion C2 that
observing X licensed the conclusion that Z.
In each case the provisional conclusion would result in a difference in
knowledge which itself could result in a skewing of the primary linguistic data
for the next generation. Such skewing
would in turn make C1 or C2 more accessible. In an
environment gradually enriched by the presence of more and more speakers, both
patterns could become equally accessible to some populations with the result
that arbitrary or random choice (or no choice) might ensue. The next, crucial,
stage is whether the learners’ assumptions become consolidated as parametric
alternatives or remain, as they originated, as learning-theoretic strategies.
The latter possibility finesses, or at least postpones, the problem of how a
subset of linguistic alternatives become genetically encoded – but suggests
that ‘antecedently known’ may have a learning-theoretic not a
linguistic-theoretic domain of application.
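For concreteness, a deliberately crude simulation of this dynamic might look as follows (the function name, the numbers, and the simple 'accessibility bias' are purely illustrative assumptions, not part of the proposal): learners sample X-utterances, jump to C1 or C2, and their conclusions become the primary linguistic data for the next generation.

```python
import random

def simulate(generations=20, learners=100, sample_size=10,
             p_initial=0.5, accessibility_bias=0.05):
    """Iterate the abstract schema: learners sample the ambient data, jump to
    C1 or C2, and their conclusions skew the data for the next generation
    (all parameter values here are illustrative assumptions)."""
    p = p_initial  # proportion of X-utterances currently supporting C1
    history = []
    for _ in range(generations):
        c1_speakers = 0
        for _ in range(learners):
            evidence_for_c1 = sum(random.random() < p for _ in range(sample_size))
            # the learner jumps to a conclusion; C1 is assumed slightly more
            # 'accessible', so near-ties tend to resolve in its favour
            if evidence_for_c1 / sample_size + accessibility_bias > 0.5:
                c1_speakers += 1
        p = c1_speakers / learners  # skewed primary data for the next generation
        history.append(p)
    return history

for gen, prop in enumerate(simulate(), start=1):
    print(f"generation {gen:2d}: proportion of C1 speakers = {prop:.2f}")
```

Even this caricature reproduces the intended effect: a small accessibility edge is amplified across generations, and C1 comes to predominate without any grammatical absolute being involved.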
Working in an
early parametric framework, but one which suffices for exegetical purposes,
Lefebvre (1998:355ff.) discusses a number of correlations between putative
parametric choices, where the existence of property A licenses the assumption
that the language exhibits property B or property C, leading potentially to the
existence of macro-parametric variation. Thus, the presence of serial verb
constructions correlates with either the absence of prepositions or the absence
of inflectional morphology; the presence of double object constructions
coincides with either the presence of preposition stranding or with the
direction of theta-role assignment. Lefebvre then observes that all of these
correlations fail, either because there are attested exceptions to them or
because they are theoretically unmotivated.
This initially disappointing fact has one redeeming feature: it suggests
that one might entertain an interpretation on which modes of externalisation were
originally unconstrained, and that systematic correlations among linguistic
properties might have emerged gradually, giving rise to a relatively
constrained set of systems. Crucially the correlations need not emerge as
absolutes but as strong tendencies forming the basis for a learning strategy.
A simpler
illustration is provided by the case of head-direction. For the child, as for
the linguist, the underlying assumption is that sub-parts of the grammar,
including head-directionality, are ‘harmonic’ or ‘consistent’. The child
acquiring its first language observes that verbs precede their objects and then
jumps to the conclusion, on the basis of minimal – or even no – further
examples, that its language is head-first.
If the language is consistent, nothing more needs to be said, or
acquired; if it is inconsistent, then positive evidence will lead to
traditional learning modifying the parameter setting. That is, the parametric
story predicts the need for learning in certain situations and hence, importantly,
also predicts particular error patterns (over-generalisation to the harmonic)
in such situations. If there is positive evidence of the kind mentioned it
follows that PV is not conceptually necessary, though it may nonetheless be
empirically correct.
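A toy version of such a learner (again purely illustrative: the category labels, the data, and the defaulting rule are assumptions made up for the example, not a claim about the actual acquisition mechanism) might look like this: the first head-direction value observed is generalised to every category, and only positive counterevidence forces a retreat from the harmonic default, category by category, which is exactly where the predicted over-generalisation errors show up.

```python
def learn_head_direction(utterances):
    """utterances: (category, 'head-first' | 'head-last') pairs, in order heard.
    The learner generalises the first value observed to all categories (the
    harmonic jump) and revises a category only on positive counterevidence."""
    default, settings, errors = None, {}, []
    for category, value in utterances:
        if default is None:
            default = value  # jump to the harmonic conclusion on minimal evidence
        expected = settings.get(category, default)
        if value != expected:
            errors.append((category, expected))  # an over-generalisation error
        settings[category] = value  # positive evidence fixes this category's setting
    return default, settings, errors

# A consistent (harmonic) language needs no revision; a mixed one forces retreat.
mixed = [("VP", "head-first"), ("PP", "head-first"), ("NP", "head-last")]
print(learn_head_direction(mixed))
# -> ('head-first', per-category settings, [('NP', 'head-first')]): one predicted
#    error where the learner over-generalised the harmonic default to NP.
```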
A corollary of
this view is that if such asymmetric choices arose in individuals and then
groups of individuals, it would eventuate in typological asymmetry of the
well-known kind. That is, greater accessibility would lead not to the kind of
absolutes characteristic of standard accounts of PV but to statistical
asymmetry in the incidence of harmonic combinations. This more realistic result
emerges from invoking the role of learning theory rather than explaining
everything in terms of knowledge of language.
Just as I insisted
that the criteria for PV extend to phonology as well as syntax, I require the
same for evolvability. A phonological example of possible emergent
parametrisation can be seen in the differential treatment of post-consonantal
glides discussed in Smith & Law 2009. Yip (2003:804) argued that some
speakers treat a post-consonantal glide as a secondary articulation of the
consonant, others as a segment in its own right: “the rightful home of /y/ [is]
underdetermined by the usual data, leaving room for variation.” Her conclusion was that “speakers opt for
different structures in the absence of conclusive evidence for either.” We used Yip’s observation to argue that this did not exemplify PV
because it was not deterministic, but it may still represent a step along the
path to parametrisation. If one pattern
becomes more frequent, perhaps at random, it may gradually become consolidated
as a parameter.
So I think it is clear that, and even how,
PV could have evolved – a view defended in more detail by Roberts &
Holmberg 2010 – and conclude with a summary list of possible stages in parametric
evolution.
a. Random variation gives rise to choice for the next generation, resulting in incipient typology.
b. First language learners find the solutions to the problem of externalisation provided by this choice more or less ‘accessible’.
c. Hence the linguistic typology became asymmetric (as we see today), with only a subset of the logically possible languages being attested.
d. Current evidence for genetically determined individual variation in knowledge and use of language in all populations suggests genomic change.
e. The asymmetric typology in (c) is more revealing than is standardly accepted, and probably arose as the result of a learning strategy that is still operative in first language acquisition.
f. The putative genetic encoding of PV suggests there was a transition from typological variation to acquisition, where the plausibility of the transition depends on the status of genetic factors in the knowledge and use of language.
g. Universal principles such as structure-dependence must have become genetically encoded with no individual variation. So such encoding is in principle possible.
h. Twin studies show that different aspects of our linguistic knowledge and its use are subject to genetically determined variation.
i. All this variation falls on a continuum which includes typical development, suggesting that the emergence of individual linguistic differences is ubiquitous and that cross-linguistic variation might become genetically conditioned.
j. The child assumes that sub-parts of the grammar are ‘harmonic’ or ‘consistent’. Hence: some choices of rule cluster because of perceived structural parallelism, leading to short-cuts in the learning process. Some such clusters became more accessible than others so came to predominate. Increased frequency of these patterns facilitated learning and gave rise to incipient parametrisation depending on the conceptual salience or facilitatory effect on processing – either parsing or production of the patterns.
k. The data are not always clear-cut and learners may associate property X with property Y (C1) or with property Z (C2). Each provisional conclusion results in a skewing of the primary linguistic data for the next generation.
l. Such skewing makes C1 or C2 more accessible. In an environment gradually enriched by more and more speakers, both patterns could become equally accessible with the result that arbitrary or random choice (or no choice) might ensue.
m. Such asymmetric choices by (groups of) individuals would eventuate in typological asymmetry. Greater accessibility would lead not to the absolutes characteristic of standard accounts of PV but to statistical asymmetry in the incidence of harmonic combinations. This result emerges from invoking learning theory rather than relying exclusively on knowledge of language.
n. A crucial issue is whether the learners’ assumptions become consolidated as parametric alternatives or remain, as they originated, as learning-theoretic strategies. The latter possibility suggests that ‘antecedently known’ may have a learning-theoretic not a linguistic-theoretic domain of application: a ‘third factor’ effect.
o. The current parametric story predicts the need for learning in certain situations and hence also predicts particular error patterns (over-generalisation to the harmonic). If there is positive evidence it follows that PV is not conceptually necessary, but it may nonetheless be empirically correct.
References
Bennett, Keith. 2010. “The chaos theory of evolution” New Scientist 208:2782, pp.28-31.
Berwick, Robert & Noam Chomsky. 2008. The
Biolinguistic Program: The Current State of its Evolution and Development.
Forthcoming in Anna Maria Di Sciullo & Calixto Aguero (eds.), Biolinguistic Investigations. Cambridge,
MA: MIT Press.
Boeckx, Cedric. 2010. What principles and parameters got
wrong. http://ling.auf.net/lingBuzz/001118.
Chierchia, Gennaro. 1998. Plurality of Mass Nouns and the Notion of
‘Semantic Parameter’. In Susan Rothstein (ed.), Events and Grammar,
53-103. Dordrecht: Kluwer.
Chomsky, Noam. 1986. Knowledge
of Language: Its Nature, Origin and Use. New York: Praeger.
Chomsky, Noam. 2006. Language and Mind (3rd edn.).
Cambridge: Cambridge University Press.
Chomsky, Noam. 2010. Some
simple evo devo theses: how true might they be for language? In Richard Larson,
Viviane Déprez & Hiroko Yamakido (eds.) The
Evolution of Human Language:
Biolinguistic Perspectives. Cambridge: CUP, pp. 45-62.
Crain, Stephen & Paul Pietroski. 2002. Why language acquisition
is a snap. The Linguistic Review 19,
163-183.
Fodor, Jerry A. & Massimo Piattelli-Palmarini. 2010. What Darwin
got Wrong. London: Profile Books.
Holmberg, Anders. 2010.
Parameters in Minimalist theory: The Case of Scandinavian. Theoretical Linguistics 36.1:1-48.
Kayne, Richard S.
2005. Some notes on comparative syntax, with special reference to English and
French. In Guglielmo Cinque & Richard S. Kayne (eds.), The Oxford Handbook of Comparative Syntax, 3-69. Oxford: Oxford
University Press.
Lefebvre, Claire.
1998. Creole Genesis and the Acquisition
of Grammar: The Case of Haitian Creole. Cambridge: Cambridge University
Press.
Manzini, M. Rita & Leonardo
M. Savoia. 2007. A Unification of
Morphology and Syntax. London: Routledge.
Newmeyer, Frederick J. 2005. Possible and Probable Languages: A Generative Perspective on
Linguistic Typology. Oxford: Oxford University Press.
O'Connor, Neil, Neil Smith, Chris Frith & Ianthi Tsimpli (1994) "Neuropsychology and linguistic talent". Journal of Neurolinguistics 8:95-107.
Piattelli-Palmarini, Massimo. 1989. Evolution, selection and
cognition: from learning to parameter setting in biology and in the study of
language. Cognition 31, 1–44.
Roberts, Ian & Anders Holmberg (2010) “Introduction: Parameters
in minimalist theory”. In Theresa Biberauer, Anders Holmberg, Ian Roberts &
Michelle Sheehan (eds) Parametric
Variation: Null Subjects in Minimalist Theory. Cambridge, Cambridge
University Press; pp. 1-57.
Smith, Neil. 2004. Chomsky: Ideas and Ideals (2nd
edn.). Cambridge: Cambridge University Press.
Smith, Neil & Ann Law (2009) “On parametric (and non-parametric) variation”.
Biolinguistics 3:332-343.
Stromswold, Karin. 2010. Genetics and the evolution of language:
What genetic studies reveal about the evolution of language. In Richard Larson,
Viviane Déprez & Hiroko Yamakido (eds.) The
Evolution of Human Language: Biolinguistic
Perspectives. Cambridge: CUP, pp.176-190.
Wagner, Andreas. 2007. Robustness and Evolvability in Living
Systems. Princeton University Press.
Yip, Moira. 2003. Casting doubt
on the Onset-Rime distinction. Lingua
113, 779-816.