The proliferation of handbooks on linguistics points to a gap in the field. There are so many now that there is an obvious need for a handbook of handbooks, consisting of papers that summarize the various handbook summaries. And once we take this first tiny recursive step, as you all know, the sky’s the limit.
You may be wondering why this thought crossed my mind. Well, it’s because I’ve been reading some handbook papers recently, and those that take a historical trajectory through the material often have a penultimate section (before a rousing summary conclusion) with the latest minimalist take on the relevant subject matter. So, we go through the Standard Theory version of X, the Extended Standard Theory version, the GB version, and finally an early minimalist and a late minimalist version of X. This has naturally led me to think about the following questions: what makes an analysis minimalist? When is an analysis minimalist and when is it not? And why should one care?
Before starting, let me immediately enter a caveat. Being true is the greatest virtue an analysis can have, and being minimalist does not imply that an analysis is true. So not being minimalist is not in itself necessarily a criticism of any given proposal, or at least not a decisive one. However, it is, IMO, a legit question to ask of a given proposal whether and how it is minimalist. Why? Well, because I believe that Darwin’s Problem (and the simplicity metrics it favors) is well-posed (albeit fuzzy in places) and therefore that proposals dressed in assumptions that successfully address it gain empirical credibility. So, being minimalist is a virtue and suggestive of truth, even if not its guarantor.
Perhaps I should add that I don’t think that anything guarantees truth in the empirical sciences, and that I also tend to think that truth is the kind of virtue that one only gains slantwise. What I mean by this is that it is the kind of goal one attains indirectly rather than head on. True accounts are ones that economically cover reasonable data in interesting ways, shed light on fundamental questions, and open up new avenues for further research. If a story does all of that pretty well, then we conclude it is true (or well on its way to it). In this way truth is to theory what happiness is to life plans. If you aim for it directly, you are unlikely to get it. Sort of like trying to fall asleep. As insomniacs will tell you, that doesn’t work.
That out of the way, what are the signs of a minimalist
analysis (MA)? We can identify various grades of minimalist commitment.
The shallowest is technological minimalism. On this
conception an MA is minimalist because it expresses its findings in terms of
‘I-merge’ rather than ‘move,’ ‘phases’ rather than ‘bounding nodes’/‘barriers,’
or ‘Agree’ rather than ‘binding.’ There is nothing wrong with this. But
depending on the details there need not be much that is distinctively
minimalist here. So, for example, there are versions of phase theory (so far as I can tell, most versions) that are isomorphic to previous GB theories of subjacency, modulo the addition of v as a bounding node (though see Barriers). The second version of the PIC (i.e. where Spell Out is delayed until the next phase head is merged) is virtually identical to 1-subjacency, and the number of available phase edges corresponds to the specification of “escape hatches.”
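To make the purported isomorphism concrete, here is a rough schematic (my rendering, abstracting from many details):

(i) PIC, second version: in [ZP Z … [HP α [H YP]]], where HP and ZP are phases, YP becomes inaccessible to operations outside HP only once the next phase head Z is merged; H and its edge α remain accessible.

(ii) 1-subjacency: a single movement step may cross at most one bounding node, so long-distance movement must proceed successive cyclically through designated escape hatches (e.g. Spec,CP).

On both conceptions computation proceeds in bounded local chunks, and unbounded dependencies decompose into sequences of short steps through the designated edges/hatches.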
Similarly for many Agree-based theories of anaphora and/or control. In place of local coindexing we express the identical dependency in terms of Agree in probe/goal configurations (antecedents as probes, anaphors as goals), subject to some conception of locality. There are differences, of course, but largely the analyses inter-translate, and the novel nomenclature serves to mask the proposed account’s continuity with prior analyses. In other words, what makes such analyses minimalist is less a grounding in basic features of the minimalist program than a technical isomorphism between current and earlier technology.
Or, to put this another way, when successful, such stories tell us that our earlier GB accounts were no less minimalist than our contemporary ones. Or, to put this yet another way, our current understanding is no less adequate than our earlier understanding (i.e. we’ve lost nothing by going minimalist). This is nice to know. But given that we thought that GB left Darwin’s Problem (DP) relatively intact (this being the main original motivation for going minimalist, i.e. for moving beyond explanatory adequacy), analyses that are effectively the same as earlier GB analyses likely leave DP in the same opaque state. Does this mean that translating earlier proposals into current idiom is useless? No. But such translations often make only a modest contribution to the program as a whole, given the suppleness of current technology.
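Before moving on, it is worth seeing just how tight the inter-translation can be in the binding case. Schematically (my sketch, not anyone’s official formulation):

(i) GB: John praised himself, with himself coindexed with, and A-bound by, a local c-commanding antecedent (Principle A).

(ii) Agree: the antecedent (or a functional head associated with it) serves as a probe that matches the phi-deficient anaphor as its goal within the same local domain (e.g. the same phase).

Governing category maps to phase, c-command to probe-goal search, and coindexation to phi-valuation, more or less point for point.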
There is a second, more interesting kind of MA. It starts
from one of the main research projects that minimalism motivates. Let’s call
this “reductive” or “unificational minimalism” (UM). Here’s what I mean.
The minimalist program (MP) starts from the observation that FL is a fairly recent cognitive novelty, and thus that what is linguistically proprietary is likely to be quite meager. This suggests that most of FL is cognitively or computationally general, with only a small linguistically specific residue. This in turn motivates a research program given a GB backdrop (see here for discussion). Take the GB theory of FL/UG to provide a decent effective theory (i.e. descriptively pretty good but not fundamental) and try to find a more fundamental one that has the GB principles as consequences.
This conception provides a two-pronged research program: (i) eliminate the internal modularity of GB (i.e. show that the various GB modules are all instances of the same principles and operations (see here)), and (ii) show that of the operations and principles required to effect the unification in (i), all save one are cognitively and/or computationally generic. If we can successfully realize this research project, then we have a potential answer to DP: FL arose with the adventitious addition of the one linguistically proprietary operation/principle to the cognitive/computational apparatus the species antecedently had.
Those are the main contours of the research program. UM concentrates on (i) and aims to reduce the different principles and operations within FL to the absolute minimum. It does this by unifying domains that appear disparate on the surface and by paring G options down to that minimum. A reasonable heuristic for this kind of MA is the idea that Gs never do things in more than one way (e.g. there are not two ways (viz. via matching or raising) to form relative clauses; see the schematic contrast below). This is not to deny that different surface patterns obtain, only that they are not the products of distinctive operations.
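For concreteness, here is the contrast the heuristic targets, rendered schematically (my notation, suppressing details):

(i) Raising/promotion: [the [CP book_1 that Frank read book_1]], where the head noun originates inside the relative clause and raises to its surface position.

(ii) Matching: [the book [CP Op_1 that Frank read t_1]], where an external head is related to a distinct, silent clause-internal operator.

A UM-style account insists that one of these mechanisms suffices for all relative clauses, with whatever evidence appears to favor the other derived by independent means.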
Let me put this another way: UM takes the GB disavowal of
constructions to the limit. GB eschewed constructions in that it eliminated
rules like Relativization and Topicalization, seeing both as instances of
movement. However, it did not fully eliminate constructions, for it proposed very different basic operations for (apparently) different kinds of dependencies. Thus, GB distinguishes movement from construal, binding from control, and case assignment from theta checking. In fact, each
of the modules is defined in terms of proprietary primitives, operations and
constraints. This is to treat the modules as constructions. One way of
understanding UM is that it is radically anti-constructivist and recognizes
that all G dependencies are effected in the same way. There is, grammatically
speaking, only ever one road to Rome.
Some of the central results of MP are of this ilk. So, for example, Chomsky’s conception of Merge unifies phrase structure theory and movement theory. The theory of case assignment in the Black Book unifies case theory and movement theory (the former being just a specific reflex of the latter), in much the way that move alpha unifies question formation, relativization, topicalization, etc. The movement theory of control and binding unifies both modules with movement. The overall picture, then, is one in which binding, structure building, case licensing, movement, and control “reduce” to a single computational basis. There aren’t movement rules versus phrase structure rules versus binding rules versus control rules versus case assignment rules. Rather, these are all different reflexes of a single Merge-effected dependency, with different features being licensed via the same operation. It is the logic of “On Wh-Movement” writ large.
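The Merge unification can be stated in a line (my gloss on the familiar set-theoretic formulation): Merge(α, β) = {α, β}. If α and β are distinct objects, this is E(xternal)-merge, i.e. classical structure building. If β is already contained within α, this is I(nternal)-merge, i.e. classical movement, with the “moved” element now just a single object occurring in two positions (the copies). One operation, two modes of application: movement is Merge applied to an element already in the structure, so no separate transformational component need be stipulated.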
There are other examples of the same “less is more” logic: the elimination of D-structure and S-structure in the Black Book, Sportiche’s recent proposals to unify promotion and matching analyses of relativization, the unification of reconstruction and movement via the copy theory of movement (in turn based on a set-theoretic conception of Merge), Nunes’s theory of parasitic gaps, and Sportiche’s proposed elimination of late merger, to name five. All of these are MAs in the specific sense that they aim to show that rich empirical coverage is compatible with a reduced inventory of basic operations and principles, and that the architecture of FL as envisioned in GB can be simplified and unified, thereby advancing the idea that a (one!) small change to the cognitive economy of our ancestors could have led to the emergence of an FL like the one that we have good (GB) evidence to think is ours.
Thus, MAs of the UM variety clearly provide potential answers to the
core minimalist DP question and hence deserve their ‘minimalist’ modifier.
The minimalist ambitions can be greater still. MAs have two related yet distinct goals. The first is to show that svelter Gs do no worse than the more complex ones that they replace (or at least don’t do much worse). The second is to show that they do better. Chomsky contrasted these in chapter three of the Black Book and provided examples illustrating how doing more with less might be possible. I would like to mention a few by way of illustration, after a brief running start.
Chomsky made two methodological observations. First, if a svelter account does (nearly) as well empirically as a grosser one, then it “wins” given MP desiderata. We noted why this was so above regarding DP, but really nobody considers Chomsky’s scoring controversial, given that it is a lead-footed application of Ockham. Fewer assumptions are always better than more, for the simple reason that for a given empirical payoff K, an explanation based on N assumptions leaves each assumption with greater empirical justification than one based on N+1 assumptions. Of course, things are hardly ever this clean, but often they are clean enough, and the principle is not really contestable.
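On one crude way of doing the bookkeeping (mine, not Chomsky’s): if the empirical payoff K is spread evenly over the assumptions doing the explaining, each of N assumptions earns K/N units of support, and K/N > K/(N+1) for any positive K and N. Dropping an assumption while (roughly) preserving coverage thus raises the support each surviving assumption enjoys.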
However, Chomsky’s point extends this reasoning beyond simple assumption counting. For MP it’s not only the number of assumptions that matters but their pedigree. Here’s what I mean.
Let’s distinguish FL from UG. Let ‘FL’ designate whatever allows the LAD to acquire a particular grammar G_L on the basis of the corresponding primary linguistic data PLD_L. Let ‘UG’ designate those features of FL that are linguistically proprietary (i.e. not reflexes of more generic cognitive or computational operations). An MA aims to reduce the UG part of FL. In the best case, UG contains a single linguistically specific novelty.
So, it is not just a matter of counting assumptions. Rather, what matters is counting UG (i.e. linguistically proprietary) assumptions. We prefer those theories of FL with minimal UGs, that is, with minimal language-specific assumptions.
An example of this is Chomsky’s argument against D-structure and S-structure as G-internal levels. Chomsky does not deny that Gs interface with interpretive systems; rather, he objects to treating the interfacing levels as having linguistically special properties.
Of course, Gs interface with sound and meaning. That’s obvious (i.e.
“conceptually necessary”). But this assumption does not imply that there need
be anything linguistically special about the G levels that do the interfacing
beyond the fact that they must be readable by these interfaces. So, any
assumption that goes beyond this (e.g. the theta criterion) needs defending
because it requires encumbering FL with UG strictures that specify the extras
required.
All of this is old hat, and, IMO, perfectly straightforward
and reasonable. But it points to another kind of MA: one that does not reduce
the number of assumptions required for a particular analysis, but that
reapportions the assumptions between UGish ones and generic cognitive-computational
ones. Again, Chomsky’s discussions in chapter 3 of the Black Book provide nice
examples of this kind of reasoning, as does the computational motivation for
phases and Spell Out.
Let me add one more (and this will involve some self-referentiality). One argument against PRO-based conceptions of (obligatory) control is that they require a linguistically “special” account of the properties of PRO. After all, to get the trains to run on time, PRO must be packed with features that force it to be subject to the G constraints it is subject to (PRO needs to be locally minimally bound, occurs largely in non-finite subject positions, and has very distinctive interpretive properties). In other words, PRO is a G-internal formative with special G-sensitive features (often of the possibly unspecified phi-variety) that force it into G relations. Thus, it is problematic from an MP perspective.
Thus a proposal that eschews PRO is prima facie an MA story of control, for it dispenses with the requirement that there exist a G-internal formative with linguistically specific requirements. I would like to add, precisely because I have had skin in this game, that this does not imply that PRO-less accounts of control are correct, or even superior to PRO-based conceptions. No! But it does mean that PRO-less accounts have minimalist advantages over accounts that adopt PRO, as they minimize the UG aspects of FL when it comes to control.
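Schematically, the contrast looks as follows (my rendering):

(i) PRO-based: John_1 hopes [PRO_1 to win], with PRO a dedicated null formative subject to special licensing and interpretation requirements.

(ii) Movement-based: John hopes [John to win], with the lower copy of John left unpronounced, i.e. the same I-merge dependency found in raising (John seems [John to win]).

On the second rendering, the “special” properties of PRO (local minimal antecedence, occurrence in non-finite subject positions) are meant to fall out from the general theory of movement rather than from PRO-specific stipulations.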
Ok, enough self-promotion. Back to the main point. The point is not merely to count assumptions but to minimize UGish ones. In this sense, MAs aim to satisfy Darwin more than Ockham. A good MA minimizes UG assumptions and does (about) as well empirically as more UG-encumbered alternatives. A good sign that a paper is providing an MA of this sort is a manifest concern to minimize the UG nature of the principles assumed.
Let’s now turn to (and end with) the last and most ambitious kind of MA: one that not merely does (almost) as well as more UG-encumbered accounts, but does better. How can one do better? Recall that we should expect MAs to be more empirically brittle than less minimalist alternatives, given that MP assumptions generally restrict an account’s descriptive apparatus. So, how can a svelter account do better? It does so by having more explanatory oomph (see here). Here’s what I mean.
Again, the Black Book provides some examples. Recall Chomsky’s discussion of examples like (1) with structures like (2):

(1) John wonders how many pictures of himself Frank took

(2) John wonders [[how many pictures of himself] Frank took [how many pictures of himself]]

The observation is that (1) has an idiomatic reading just in case Frank is the antecedent of the reflexive.
This can be explained if we assume that there is no D-structure or S-structure level. Without these, binding and idiom interpretation must be defined over the one G level that is input to the CI interface. In other words, idiom interpretation and binding are computed over the same representation, and we thus expect that the requirements of each will affect the possibilities of the other.
More concretely, getting the idiomatic reading of take pictures requires using the lower copy of the wh phrase, while getting John as a potential antecedent of the reflexive requires using the higher copy. If we assume that only a single copy can be retained in the mapping to CI, this implies that if take pictures of himself is understood idiomatically, Frank is the only available local antecedent of the reflexive. The prediction relies on the assumption that idiom interpretation and binding exploit the same representation. By eliminating D-structure, the theory can no longer make D-structure the locus of idiom interpretation, and by eliminating S-structure, it cannot make S-structure the locus of binding. Thus, by eliminating both levels, the proposal predicts a correlation between idiomaticity and reflexive antecedence.
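In other words, the mapping to CI must choose between the two copy-retention options in (3) (my schematic rendering of the choice (2) makes available):

(3a) John wonders [[how many pictures of himself] Frank took] (higher copy retained: John is an available antecedent, but the idiomatic reading of take pictures is lost)

(3b) John wonders [Frank took [how many pictures of himself]] (lower copy retained: the idiom survives, but himself must now be locally bound by Frank)

No single CI representation supports both the idiomatic reading and John as antecedent, which is just the reported correlation.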
It is important to note that a GB-ish theory where idioms are licensed at D-structure and reflexives at S-structure (or later) is compatible with Chomsky’s reported data, but does not predict it. The relevant data can be accommodated in a theory with the two internal levels; what is missing is the prediction that idiomaticity and antecedence must swing together. In other words, the MP story explains what the non-MP story must stipulate. Hence, the explanatory oomph: one gets more explanation with less G-internal apparatus.
There are other examples of this kind of reasoning, but not that many. One of the reasons I have always liked Nunes’s theory of parasitic gaps is that it explains why they are licensed only in overt syntax. One of the reasons that I like the Movement Theory of Control is that it explains why one finds (OC) PRO in the subject position of non-finite clauses. No stipulations necessary: no ad hoc assumptions concerning flavors of case, no simple (but honest) stipulations restricting PRO to such positions. These analyses are minimalist in a strong sense.
Let’s end here. I have tried to identify three kinds of MAs. What makes proposals minimalist is that they either answer, or serve as steps towards answering, the big minimalist question: why do we have the FL we have? How did FL arise in the species? That’s the question of interest. It’s not the only question of interest, but it is an important one. Precisely because the question is interesting, it is worth identifying whether and in what respects a given proposal might be minimalist. Wouldn’t it be nice if papers in minimalist syntax regularly identified their minimalist assumptions, so that we could not only appreciate their empirical virtuosity but also evaluate their contributions to the programmatic goals?