There is a very interesting pair of posts and a long thread of insightful comments relating to parameters, both the empirical support for them and their suitability given current theoretical commitments. Cedric, commenting on Neil's initial post and then adding a longer elaboration, makes the point that nobody seems committed to parameters in the classical sense anymore. Avery and Alex C comment that whatever the empirical shortcomings of parametric accounts, something is always better than nothing, so they reasonably ask what we should replace them with. Alex D rightly points out that the
success of parametric accounts is logically independent of the POS and claims
about linguistic nativism. In this post, I want to reconstruct the history of
how parameter theory arose so as to consider where we ought to go from here.
The thoughts ramble on a bit, because I have been trying to figure this out for
myself. Apologies ahead of time.
In the beginning there was the evaluation metric (EM), and
Chomsky looked on his work and saw that it was deficient. The idea in Aspects was that there is a measure of grammatical complexity built
into FL and that children in acquiring their I-language (an anachronism here)
choose the simplest one compatible with the PLD (viz. the linguistic data
available to and used by the child). EM effectively ordered grammars according
to their complexity. The idea riffed on notions of minimal description length current at the time, but with the important addition that the aim of a grammatical
theory with aspirations of explanatory adequacy was to find the correct UG for
specifying the meta-language relevant to determining the correct notion of “description”
and “length” in minimal description length. The problem was finding the right
things to count when evaluating grammars. At any rate, on this conception, the
theory of acquisition involved finding the best overall G compatible with PLD
as specified by EM. Chomsky concluded
that this conception, though logically coherent, was not feasible as a learning
theory, largely because it looked to be computationally intractable. Nobody had (nor, I believe, has) a good tractable idea of how to compare grammars overall
so as to have a complete ordering. Chomsky in LSLT developed some pair-wise
metrics for the local comparison of alternative rules, but this is a long way
from having a total ordering of the alternative Gs that is required to make EM accounts
feasible. Chomsky’s remedy for this
problem: divorce language acquisition from the evaluation of overall grammar
formats.
The developments of the Extended Standard Theory, which culminated in GB theories, allowed for an alternative conception of acquisition, one that divorces it from measuring overall grammar complexity. How so? Well, first, we eliminated the idea that Gs were compendia of construction-specific rules. And second, we proposed
that UG consists of biologically provided schemata (hence part of UG and hence
not in need of acquisition) that specify the overall shape of a particular G.
On this view, acquisition consists in filling in values for the schematic
variables. Filling in the values of UG-specified variables is a different task from figuring out the overall shape of
the grammar and, on the surface at least, a far more tractable task. The number
of parameters being finite already distinguished this from earlier conceptions.
In the earlier EM view of things there was no reason to think that the space of
grammatical possibilities was finite. Now, as Chomsky emphasized, within a
parameter setting model, the space of alternatives, though perhaps very large,
was still finite and hence the computational problem was different in kind from
the one lightly limned in Aspects.
So, divorcing the question of grammatical formats (via the elimination of rules
or their reduction to a bare minimum form like ‘move alpha’) from the question
of acquisition allowed for what looked like a feasible solution to Plato’s
Problem. In place of Gs being sets of construction-specific rules with EMs measuring their overall collective fitness, we had the idea that Gs were vectors of UG-specified variables with two possible values (and hence "at most" 2^n possible grammars, a finite number of options). Finding the values was divorced
from evaluating sets of rules and this looked feasible.
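To put a number on the hunch, here is a minimal sketch in Python (the parameter names are invented purely for illustration, not a claim about what the actual parameters are): a G is just an assignment of one of two values to each member of a fixed, finite list of parameters, so the space of candidate Gs has at most 2^n members.

```python
from itertools import product

# Hypothetical binary parameters, named only for illustration.
PARAMETERS = ["head_initial", "pro_drop", "wh_movement", "v_raising"]

def all_grammars(parameters):
    """Enumerate every candidate G as a vector of 0/1 parameter settings."""
    for values in product((0, 1), repeat=len(parameters)):
        yield dict(zip(parameters, values))

grammars = list(all_grammars(PARAMETERS))
print(len(grammars))   # 2**4 = 16 candidate grammars
print(grammars[5])     # one candidate: {'head_initial': 0, 'pro_drop': 1, 'wh_movement': 0, 'v_raising': 1}

# The space grows exponentially with the number of parameters,
# but for any fixed n it is finite:
for n in (4, 20, 30):
    print(n, 2 ** n)   # 16, 1048576, 1073741824
```

The only point of the sketch is the shape of the search space: however large 2^n gets, it is a finite set of vectors to be filled in, which is a different kind of problem from ranking an unbounded space of rule systems with an evaluation metric.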
Note that this is largely a conceptual argument. There is a
reasonable hunch but no “proof.” I mention this because other conceptual
considerations (we will get to them) can serve to challenge the conclusion and
make parameter theories less appealing.
In addition to these conceptual considerations, the
comparative grammar research in the 70s, 80s, and 90s provided wow-inducing
empirical confirmation of parameter-based conceptions. It is hard for current
(youngish) practitioners of the grammatical dark arts to appreciate how
exciting early work on parameter setting models was. There were effectively
three lines of empirical support.
1. The comparative synchronic grammar research. For example:
a. The S versus S' parameter distinguishing Italian from English islands (Rizzi, Sportiche, Torrego).
b. The pro drop parameter (correlating null subjects, inversion, and long movement apparently violating the fixed subject/that-t condition) (Rizzi, Brandi and Cordin).
c. The parametric discussions of anaphoric classes (local and long distance anaphors) (Wexler, Borer).
These lines of work, to name just three, all uncovered a huge amount of new linguistic data and argued for the fecundity of parametric thinking.
2. Crain's "continuity thesis," which provided evidence that kids' "mistakes" in acquiring their particular Gs all actually conform to actual adult Gs. This provided evidence that the space of G options is pretty circumscribed, as a parameter theory implies it is.
3. The work on diachronic change by Kroch, Lightfoot, Roberts (and more formal work by Berwick and Niyogi) a.o., which indicated that large shifts in grammatical structure over time (e.g. SOV to SVO) could be analyzed as a small number of simple parameter changes.
So, there was a good conceptual reason for moving to parameter
models of UG and the move proved to be empirically very fecund. Why the current
skepticism? What’s changed?
To my mind, three changes occurred. As usual, I will start
with the conceptual challenges and then proceed to the empirical ones.
The first one can be traced to work first by Dresher and
Kaye, and then taken up and further developed with great gusto by Fodor (viz.
Janet) and Sakas. This work shows that finite parameter setting can present
tractability problems almost as difficult as the ones that Chomsky identified
in his rejection of EM models. What this work demonstrates is that, given currently envisioned parameters, parameter setting cannot be incremental. Why not? Because parameter values are not independent. In other words, the value of one parameter in a particular G may depend crucially on that of another. Indeed, the value of any one may depend on the values of all the others, and this makes for an explosive combinatorial problem. It also makes incremental acquisition mysterious: how do the parameter
values get set if any bit of later PLD can completely overturn values
previously set?
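To see concretely why interdependence undercuts incremental setting, here is a toy sketch (the two parameters and the pattern table are wholly invented and stand in for no real proposal): when a surface pattern reflects the joint setting of two parameters, a single datum can leave each parameter individually open, so a learner that commits early may later have to retract.

```python
from itertools import product

def surface_patterns(p1, p2):
    """Toy 'grammar': two binary parameters jointly determine the observable patterns."""
    patterns = set()
    # Main-clause order depends on BOTH parameters (the interdependence, by design).
    patterns.add("SVO" if (p1 ^ p2) == 0 else "SOV")
    # Only p2 has a reflex all of its own.
    if p2:
        patterns.add("null-subject")
    return patterns

def compatible(datum):
    """All parameter settings that could have produced a given datum."""
    return [(p1, p2) for p1, p2 in product((0, 1), repeat=2)
            if datum in surface_patterns(p1, p2)]

print(compatible("SVO"))           # [(0, 0), (1, 1)] -> neither parameter is fixed on its own
print(compatible("null-subject"))  # [(0, 1), (1, 1)] -> p2 = 1 is fixed, p1 still open

# A learner that greedily commits to (p1, p2) = (0, 0) after hearing "SVO" must
# later retract p1 when "null-subject" arrives (it forces p2 = 1, hence p1 = 1).
```

In this toy, "null-subject" is the kind of datum that settles one parameter's value no matter how the other is set; the cue-based conceptions discussed next generalize exactly that property.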
There have been ingenious solutions to this problem, my
favorite being cue-based conceptions (developed by Dresher, Fodor, Lightfoot
a.o.). These rely on the notion that there is some data in the PLD that unambiguously determines the value of a
parameter. Once set on the basis of this
data, the value need never change.
Triggers effectively impose independence on the parameter space. If this is correct, then it renders UG
yet more linguistically specific; not only are the parameters very
linguistically specific, but the apparatus required to fix these is very
linguistically specific as well. Those who don't like linguistically parochial UGs should really hate both parameter theories and this fix to them. Which
brings us to the second conceptual shift: Minimalism.
The minimalist conceit is to eliminate the parochialism of
FL and show that the linguistically specific structure of UG can be accounted
for in more general cognitive/computational terms. This is motivated both on
general methodological grounds (factoring out what is cognitively general from
what is linguistically specific is good science) and as a first step to
answering Darwin's Problem, as we've discussed at length in other posts. FL-internal parameters are a very big challenge to this project. Why? Because UG-specified parameters encumber FL with very linguistically specific information (e.g. it's hard to see how the pro drop parameter (if correct) could possibly be stated in non-linguistically-specific terms!).
This is what I meant earlier when I noted that conceptual
reasons could challenge Chomsky’s earlier conceptual arguments. Even if parameters made addressing Plato’s
Problem more tractable, they may not be a very good solution to the feasibility
problem if they severely compromise any approach to Darwin’s. This is what
motivates Cedric's concerns (and others', e.g. Terje Lohndal's), I believe, and rightly so. So, the conceptual landscape
has changed and it is not surprising that parameter theories have become less
appealing and so open to challenge.
Moreover, as Cedric also stresses, the theoretical landscape has changed as well. A legacy of the GB era that has survived into Minimalism is the agreement that Gs do not consist of construction-based rules. Rather,
there are very general operations (Merge) with very general constraints (e.g.
Extension, Minimality) that allow for a small set of dependencies universally.
Much of this (not all, but much) can be reanalyzed in non-linguistically-specific terms (or so I believe). With this factored out, there are featural idiosyncrasies located in demands made by specific lexical items, but this kind of idiosyncrasy may be tolerable, as it is segregated to the lexicon, a well-known repository of eccentrics. At any rate, it is easy to see what would motivate a reconsideration of UG-internal parameters.
The tractability problems related to parameter setting noted
by Dresher-Fodor and company simply add to these motivations.
That leaves us with the empirical arguments. These alone are
what make parameter accounts worth endorsing, if they are well founded, and this is what is currently up for
grabs and way beyond my pay grade. Cedric and Fritz Newmeyer (among others)
have challenged the empirical validity of the key results. The most important
discoveries amounted to the clumping of surface effects with the setting of a single parameter value, e.g. pro drop+subject
inversion+no that-t effects together
as a unit. Find one, you find them all.
However, this is what has been challenged. Is it really true that the
groupings of phenomena under single parameter settings are correct? Do these patterns coagulate as proposed? If not, and this I believe is Newmeyer's point, strongly emphasized by Cedric,
then it is not clear what parameters buy us.
Yes, I-languages are different. So? Why think that this difference is due
to different parameter settings? So, there is an empirical argument: are there
data groupings of the kind earlier proposals advocated? Is the continuity
thesis accurate, and if so, how does one explain this without parameters? These are the two big empirical questions, and this is likely where the battle
over parameters has been joined and, one hopes, will ultimately get resolved.
I'd like to emphasize that this is an empirical
question. If the data falls on the
classical side then this is a problem for minimalists and exacerbates our task
of addressing Darwin’s problem. So be it. Minimalism as I understand it has an
empirical core and if it turns out that there is richer structure to UG than I
would like, well tough cookies on me (and you if your sympathies tend in the
same direction)!
Last point and I will end the rambling here. One nice
feature of parameter models is the pretty metaphor they afforded for language acquisition as parameter setting. The switch box model is intuitive and easy to grasp. There is no equivalent for EM models, and this is partly why nobody knew what to do with the damn thing. EM never really got used to generate actual empirical research the way parameter-setting models did, at least not in syntax. So can we envision a metaphor for non-parameter-setting models? I think we can. I offered one in A Theory of Syntax that I'd like to push again here (I know that this is self-aggrandizing, but tooting one's own horn can be so much fun). Here's what I said there (chapter
7):
Assume for a moment that the idea
of specified parameters is abandoned. What then? One attractive property of the GB story was
the picture that it came with. The LAD
was analogized to a machine with open switches.
Learning amounts to flipping the switches ‘on’ or ‘off’. A specific grammar is then just a vector of
these switches in one of the two positions.
Given this view there are at most 2^P grammars (P = number of
parameters). There is, in short, a
finite amount of possible variation among grammars.
We
can replace this picture of acquisition with another one. Say that FL provides the basic operations and
conditions on their application (e.g. like minimality). The acquisition process can now be seen as a
curve fitting exercise using these given operations. There is no upper bound on the ways that
languages might differ though there are still some things that grammars cannot
do. A possible analogy for this
conception of grammar is the variety of geometrical figures that can be drawn
using a straight edge and compass. There
is no upper bound on the number of possible different figures. However, there are many figures that cannot
be drawn (e.g. there will be no triangles with 20 degree angles). Similarly, languages may contain arbitrarily
many different kinds of rules depending on the PLD they are trying to fit.
So think of the basic operations and conditions as the
analogues of the straight edge and compass and think of language acquisition as
fitting the data using these tools. Add to this a few general rules for figure
fitting: add a functional category if required, pronounce a bottom copy of a
chain rather than a top copy, add an escape hatch to a phase head. These are
general procedures that can allow the LAD to escape the strictures of the
limited operations a minimalistically stripped-down FL makes available. The analogy is not perfect. But the picture
might be helpful in challenging the intuitive availability of the switch box
metaphor.
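As an aside, the 20-degree example in the quoted passage is a genuine theorem of the compass-and-straight-edge world, not just a figure of speech. A standard fact, recalled here only to flesh out the analogy: the triple-angle identity cos(3t) = 4cos^3(t) - 3cos(t) gives, for t = 20 degrees, 1/2 = 4x^3 - 3x with x = cos(20 degrees), i.e. 8x^3 - 6x - 1 = 0; this cubic has no rational roots and so is irreducible over the rationals, and since the two tools only ever produce lengths whose degree over the rationals is a power of 2, a 20-degree angle (and hence any triangle containing one) is out of reach.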
That’s it. This post has also been way too long. Kudos to
Neil and Cedric and the various very articulate commenters for making this such
a fruitful topic for thought, at least for me.