In an earlier post (here)
I mentioned that I was asked to write about the Minimalist Program (MP) for a
volume aimed at comparing various linguistic
“frameworks.” I am personally skeptical about “framework differences.” I
personally find them to be more notational variants of common themes (indeed,
when pressed, I have been known to complain that H/GPSG, LFG, RG, are all
“dialects” of GB) than actual conceptually divergent perspectives. The main
reason is that most linguistic theory lacks any real depth and most of the
frameworks have the wherewithal to mimic one another’s (ahem) deep insights.
So, where others see ideological divergence, I tend to see slight differences
in accent.
That said, I have a further gripe about this kind of volume.
Even were I to recognize that these different frameworks empirically competed
(or should compete) I do not think that MP should be included in the race. MP
is not an alternative to GB (or its
various dialects) but presupposes
that the results of GB are (largely) empirically correct. MP builds on GB results
(and its dialects) and aims to conserve these results. Thus, it is not intended
(or should not be intended) as a
wholesale replacement. For an MPer, GB is wrong the way that Newtonian
Gravitation is wrong when compared to General Relativity: the latter theory
derives the former as a limiting case. It does not reject the former as
misguided, rather it treats it as descriptive rather than fundamental. Indeed, an
important part of the argument in favor of General Relativity is that it can
derive Newton as a special case. If it could not, that would be an excellent
argument that it was fundamentally flawed.
And the MP point? From the MP perspective, as M Antony might
have put matters: MPers come to (largely) praise (and incorporate) GB not to
bury it. The whole point of MP is to show that the “laws” that GB discovered
can be understood in more fundamental MP terms. IMO, it has been less
appreciated than it should be how much MP has succeeded in making good on this
ambition. So, in this post (and another I will put up later on) I will try to
show how far MP has come in making good on its ambitions.
A caveat: if you
find GB (and its cousins) to be hopelessly wrongheaded, then you will find a
theory that derives its results also hopelessly wrongheaded. If you are one of
these, then MP will have, at most, aesthetic interest. If you, like me, take GB
to be more or less correct, then MP’s aesthetic virtues will combine with GB’s
empirical panache to create a very powerful intellectual rush. It will even
create the impression that MP is very much on the right track.
The Merge Hypothesis: Explaining some core features of FL/UG
Here is a list of some characteristic features of FL/UG and its GLs:
(1) a. Hierarchical recursion
b. Displacement (aka, movement)
c. Gs generate natural formats for semantic interpretation
d. Reconstruction effects
e. Movement targets c-commanding positions
f. No lowering rules
g. Strict cyclicity
h. G rules are structure dependent
i. Antecedents c-command their anaphors
j. Anaphors never c-command their antecedents (i.e. Principle C effects and Strong Cross Over Effects)
k. XPs move, X’s don’t, X0s might
l. Control targets subjects of “defective” (i.e. tns or agreement deficiency) clauses
m. Control respects the Principle of Minimal Distance
n. Case and agreement are X0-YP dependencies
o. Reflexivization and Pronominalization are in complementary distribution
p. Selection/subcategorization are very local head-head relations
q. Gs treat arguments and adjuncts differently, with the former less “constrained” than the latter
Note, I am not saying that this exhausts the properties of
FL/UG, nor am I saying that all LINGers agree that
all of these accurately describe FL/UG.
What I am saying is that (1a-q) identify empirically robust(ish) properties of
FL/UG and the generative procedures its GLs allow. Put another way,
I am claiming (i) that certain facts about human GLs (e.g. that they
have hierarchical recursion and movement and binding under c-command and
display principle C effects and obligatory control effects, etc.) are
empirically well-grounded and (ii) that it is appropriate to ask
why FL/UG allows for GLs with
these properties and not others. If you buy this, then welcome to the
Minimalist Program (MP).
I would go further; not only are the assumptions in (i)
reasonable and the question in (ii) appropriate, MP has provided some answers
to the question in (ii). One well-known approach to (1a-h), the Merge Hypothesis (MH), unifies all these
properties, deriving them from the core generative mechanism Merge. Or more
particularly, MH postulates that FL/UG contains a very simple operation (aka,
Merge) that suffices to generate unbounded hierarchical structures (1a) and
that these Merge generated hierarchical structures will also have the seven
additional properties (1b-h). Let’s examine the features of this simple
operation and see how it manages to derive these eight properties.
Unbounded hierarchy implies a recursive procedure.
MH explains this by postulating a simple operation (“Merge”) that generates the
requisite unbounded hierarchical structures. Merge consists of a very simple
recursive specification of Syntactic Objects (SO) coupled with the assumption
that complex SOs are sets.
(2) a. If a is a lexical item then a is an SO
b. If a is an SO and b is an SO then Merge(a,b) is an SO
(3) For a, b, SOs, Merge(a,b) → {a,b}
The inductive step (2b) allows Merge to apply to its own
outputs and thus licenses unboundedly “deep” SOs with sets contained within
sets contained within sets… The Merge Hypothesis is that the “simplest”
conception of this combinatoric operation (the minimum required to generate
unbounded hierarchically organized objects) suffices to explain why FL/UG has
many of the other properties listed in (1).
In what way is Merge the “simplest” specification of
unbounded hierarchy? The operation has three key features: (i) it directly and
uniquely targets hierarchy (i.e. the basic complex objects are sets (which are
unordered), not strings), (ii) it in no way changes the atomic objects combined
in combining them (Inclusiveness), and (iii) it in no way changes the complex
objects combined in combining them (Extension). Inclusiveness and Extension
together constitute the “No Tampering Condition” (NTC). Thus, Merge recursively
builds hierarchy (and only hierarchy) without “tampering” with the inputs in
any way save combining them in a very simple way (i.e. just hierarchy no linear
information).
The key theoretical observation is that if
FL/UG has Merge as its primary generative mechanism,
then it delivers GLs with properties (1a-h). And if this is right,
it provides a proof of concept that it is not premature to ask
why FL/UG is structured as it is. In
other words, this would be a very nice result given the stated aims of MP.
Let’s see how Merge so conceived derives (1a-h).
It should be clear that Gs with Merge can generate unbounded
hierarchical dependencies. Given a lexicon containing a finite list of atoms a,b,g,d,… we
can, using the definitions in (2) and (3), form structures like (4) (try it!).
(4) a. {a, {b, {g, d}}}
b. {{a, b}, {g, d}}
c. {{{a, b}, g}, d}
And given the recursive nature of the operation, we can keep
on going ad libitum. So Merge
suffices to generate an unbounded number of hierarchically organized syntactic
objects.
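For readers who want to take the "try it!" literally, here is a minimal sketch of (2) and (3) in Python. It rests on assumptions that are mine, not part of MH: lexical items are modeled as plain strings, complex SOs as `frozenset` values (unordered and hashable, so sets can contain sets), and the helper names `merge`, `is_so`, and `depth` are hypothetical labels chosen for illustration.

```python
def merge(a, b):
    """(3): Merge(a, b) -> {a, b}. The output is an unordered set."""
    return frozenset({a, b})


def is_so(x, lexicon):
    """(2): a lexical item is an SO, and so is any set built from SOs."""
    if isinstance(x, str):
        return x in lexicon
    return isinstance(x, frozenset) and all(is_so(m, lexicon) for m in x)


def depth(x):
    """Number of set layers above the most deeply embedded atom."""
    if isinstance(x, str):
        return 0
    return 1 + max(depth(m) for m in x)


lexicon = {"a", "b", "g", "d"}

# The three structures in (4), built only by repeated Merge.
s4a = merge("a", merge("b", merge("g", "d")))    # {a, {b, {g, d}}}
s4b = merge(merge("a", "b"), merge("g", "d"))    # {{a, b}, {g, d}}
s4c = merge(merge(merge("a", "b"), "g"), "d")    # {{{a, b}, g}, d}

for so in (s4a, s4b, s4c):
    print(is_so(so, lexicon), depth(so))
```

Same four atoms, three different hierarchies, and nothing stops us from feeding any of these outputs back into `merge` to go one level deeper.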
Merge can also generate structures that model displacement
(i.e. movement dependencies). Movement rules code the fact that a single
expression can enjoy multiple relations within a structure (e.g. it can be both
a complement of a predicate and the subject of a sentence).
Merge allows for the derivation of structures that have this property. And this
is a very good thing given that we know (due to over 60 years of work in
Generative Grammar) that displacement is a key feature of human GLs.
Here’s how Merge does this. Given a structure like (5a)
consider how (2) and (3) yield the movement structure (5b). Observe that in (5b),
b occurs twice. This can be understood as coding
a movement dependency, b being both sister of the SO a and sister of the derived
SO {g, {l, {a, b}}}.
The derivation is in (6).
(5) a. {g, {l, {a, b}}}
b. {b, {g, {l, {a, b}}}}
(6) The SO {g, {l, {a, b}}} and the SO b (within {g, {l, {a, b}}})
merge to form {b, {g, {l, {a, b}}}}
Note that this derivation assumes that once an SO always an
SO. Thus, Merging an SO a to form part of a complex SO b that contains a does
not change (tamper with) a’s status as an SO. Because complex SOs are composed of
SOs, Merge can target a subpart of an SO for further Merging. Thus, NTC allows
Merge to generate structures with the properties of movement; structures where
an SO is a member of two different “sets.”
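The derivation in (6) can be sketched in a few lines of Python. As before, this is an illustration under my own assumptions: strings stand in for lexical items, `frozenset` for complex SOs, and `terms`, `internal_merge`, and `containers` are hypothetical helper names, not anyone's official formalization.

```python
def merge(a, b):
    """Merge(a, b) -> {a, b}, as in (3)."""
    return frozenset({a, b})


def terms(so):
    """Yield so itself and every SO contained anywhere inside it."""
    yield so
    if isinstance(so, frozenset):
        for member in so:
            yield from terms(member)


def internal_merge(so, part):
    """Merge so with one of its own proper subparts (no copying step)."""
    if part == so or part not in set(terms(so)):
        raise ValueError("part must be a proper subpart of so")
    return merge(part, so)


def containers(so, part):
    """All sets within so (so included) that have part as a member."""
    return [t for t in set(terms(so))
            if isinstance(t, frozenset) and part in t]


s5a = merge("g", merge("l", merge("a", "b")))   # (5a)
s5b = internal_merge(s5a, "b")                  # (5b)

print(s5b == frozenset({"b", s5a}))   # the root now has b as a member
print(len(containers(s5b, "b")))      # b is a member of two sets
```

Note that `internal_merge` calls the very same `merge` used for structure building. The only difference is where the second input comes from.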
Let me emphasize an important point: the key feature that
allows Merge to generate movement dependencies (viz. the “once an SO always an
SO” assumption) follows from the assumption that Merge does nothing more than take SOs and form them
into a unit. It otherwise leaves the combined objects alone. Thus, if some
expression is an SO before being merged with another SO then it will retain
this property after being Merged given that Merge in no way changes the
expressions but for combining them. NTC (specifically the Inclusiveness and
Extension Conditions) leaves all properties of the combining expressions
intact. So, if a has some property before being combined with b (e.g. being an SO), it
will have this property after it is combined with b. As being an SO is a
property of an expression, Merging it will not change this, and so Merge can
legitimately combine a subpart of an SO with its container.
Before pressing on, a comment: unifying movement and phrase
building is an MP innovation. Earlier theories of grammar (and early minimalist
theories) treated phrasal dependencies and movement dependencies as the
products of entirely different kinds
of rules (e.g. phrase structure rules vs transformations/Merge vs Copy+Merge).
Merge unifies these two kinds of
dependencies and treats them as different outputs of a single operation. As
such, the fact that FL yields Gs that contain both unbounded hierarchy and displacement operations is
unsurprising. Hierarchy and displacement are flip sides of the same
combinatoric coin. Thus, if Merge is the core combinatoric operation FL makes
available, then MH explains why FL/UG constructs GLs that have both
(1a) and (1b) as characteristic features.
Let’s continue. As should be clear, Merge generated
structures like those in (5) and (6) also provide all we need to code the two
basic types of semantic dependencies: predicate-argument structures (i.e.
thematic dependency) and scope structure. Let me be a bit clearer. The two
basic applications of Merge are those that take two separate SOs and combine
them and those that take two SOs with one contained in the other and combine
them. The former, E-Merge, is fit for the representation of predicate-argument
(aka, thematic) structure. The latter, I-Merge, provides an adequate
grammatical format for representing operator/variable (i.e. scope)
dependencies. There is ample evidence that Gs code for these two kinds of
semantic information in simple constructions like Wh-questions. Thus, it is an
argument in its favor that Merge as
defined in (2) and (3) provides a syntactic format for both. An argument
saturates the predicate it E-merges with and scopes over the SO it I-merges
with. If this is correct, then Merge provides structure appropriate to explain
(1c).
And also (1d). A standard account of Reconstruction Effects
(RE) involves allowing a moved expression to function
as if it still occupied the position from which it moved. This
as-if is redeemed theoretically if the
movement site contains a copy of the moved expression. Why does a displaced
expression semantically comport itself as if it is in its base position?
Because a copy of the moved expression is in the base position.
Or, to put this another way, a copy theory of movement
would go a long way towards providing the technical wherewithal to account for
the possibility of REs. But Merge based accounts of movement like the one above
embody a copy theory of movement. Look at (5b): b is in two positions in
virtue of being I-merged with its container. Thus, b is a member of the lowest
set and the highest. Reconstruction amounts to choosing which “copy” to
interpret semantically and phonetically.
Reducing movement to I-merge explains why movement should allow REs.
Furthermore, having this option follows from a key
assumption concerning Merge. Recall that it eschews tampering. In other words,
if movement is a species of Merge then no-tampering requires coding movement
with “copies.” To see this, contrast how movement is treated in GB.
Within GB, if an expression moves from its base position to some higher position, a
trace is left in the launch site. Thus, a GB version of (5b) would look
something like (7):
(7) {b1, {g, {l, {a, t1}}}}
Two features are noteworthy: (i) in place of a copy in the
launch site we find a trace and (ii) that trace is co-indexed with the moved
expression b.
These features are built into the GB understanding of a movement rule.
Understood from a Merge perspective, this GB conception is doubly suspect for
it violates the Inclusiveness Condition clause of the NTC twice over. It
replaces a copy with a trace and it adds indices to the derived structure.
Empirically, it also mystifies REs. Traces have no contents. That’s what makes
them traces (see note 13). Why are they then able to act as if they did have
them? To accommodate such effects GB adds a layer of theory
specific to REs (e.g. it invokes
reconstruction rules to undo the effects of movement understood in trace
theoretic terms). Having copies in place of traces simplifies matters and
explains how REs are possible. Furthermore, if movement is a species of Merge
(i.e. I-merge) then SOs like (7) are not generable at all as they violate NTC.
More specifically, the only kosher way to code movement and obey the NTC is via
copies. So, the only way to code movement given a simple conception of
syntactic combination like Merge (i.e. one that embodies no tampering) results
in a copy theory of movement that serves to rationalize REs without a
theoretically bespoke theory of reconstruction. Not bad!
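The Inclusiveness point can be made concrete with a small check: does the output contain any atom that was not already in the input? The sketch below is mine and makes the usual toy assumptions (strings as lexical items, `frozenset` as sets). In particular, encoding the indexed mover and the trace of (7) as the strings `"b1"` and `"t1"` is a hypothetical stand-in for GB's richer notation, and `atoms` and `obeys_inclusiveness` are illustrative names.

```python
def merge(a, b):
    """Merge(a, b) -> {a, b}, as in (3)."""
    return frozenset({a, b})


def atoms(so):
    """The set of atomic items occurring anywhere inside so."""
    if isinstance(so, str):
        return {so}
    return set().union(*(atoms(member) for member in so))


def obeys_inclusiveness(output, source):
    """True if output uses no atom beyond those already in source."""
    return atoms(output) <= atoms(source)


s5a = merge("g", merge("l", merge("a", "b")))

# Copy version, (5b): b is simply merged again with its container.
copy_version = merge("b", s5a)

# Trace version, (7): hypothetical encoding with "b1" for the indexed
# mover and "t1" for the co-indexed trace.
trace_version = merge("b1", merge("g", merge("l", merge("a", "t1"))))

print(obeys_inclusiveness(copy_version, s5a))     # nothing new added
print(obeys_inclusiveness(trace_version, s5a))    # new material added
print(sorted(atoms(trace_version) - atoms(s5a)))  # the offending items
```

On this encoding the two offenders are exactly the two additions complained about above: the index on the mover and the trace itself.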
So Merge delivers properties (1a-d), and the gifts just keep
on coming. It also serves up (1e,f,g) as consequences. This time let’s look at
the Extension Condition (EC) codicil to the NTC. EC requires that inputs
to Merge be preserved in the outputs to Merge (any other result would “change”
one of the inputs). Thus, if an SO is input to the operation it will be a
unit/set in the output as well because Merge does no more than create
linguistic units from the inputs. Thus, whatever is a constituent in the input
appears as a constituent with the same properties in the output.
This implies (i) that all I-merge is to a
c-commanding position, (ii) that lowering rules cannot exist, and (iii) that
derivations are strictly cyclic.
The conditions that movement be always upwards to c-commanding positions and
strictly cyclic thus follow trivially from this simple specification of Merge
(i.e. Merge with NTC understood as embodying EC).
An illustration will help clarify this. NTC prohibits
deriving structure (8b) from (8a). Here we Merge g with a. The
output of this instance of Merge obliterates the fact that {a,b} had
been a unit/constituent in (8a), the input to Merge. EC prohibits this. It
effectively restricts I-Merge to the root. So restricted, (8b) is not a licit
instance of I-Merge (note that {a,b} is not a unit in the output). Nor is (8c)
(note that {{a,b},{g,d}} is
not a unit in the output). Nor is a derivation that violates the strict cycle
(as in (8d)). Only (8e) is a grammatically licit Merge derivation, for here all
the inputs to the derivation (i.e. g and {{a,b},{g,d}}) are also units in the
output of the derivation (i.e.
the inputs have been preserved (remain unchanged) in the output). Yes, a
new relation has been added, but no previous ones have been destroyed (i.e. the
derivation is info-preserving (viz. monotonic)). Repeat the slogan: once an SO
always an SO. In deriving (8b-d) one of the inputs (viz. {{a,b},{g,d}})
is no longer a unit in the output and so NTC/EC has been violated.
(8) a. {{a,b},{g,d}}
b. {{{g,a},b},{g,d}}
c. {{g,{a,b}},{g,d}}
d. {{a,b},{d,{g,d}}}
e. {g,{{a,b},{g,d}}}
In sum, if movement is I-merge subject to NTC then all movement will
necessarily be to c-commanding positions, upwards, and strictly cyclic.
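The reasoning about (8) can be checked mechanically. Here is a hedged sketch under the same toy assumptions as before (strings for atoms, `frozenset` for sets); `terms` and `preserves_inputs` are hypothetical helpers that test whether every input to a derivational step is still a unit somewhere in the output. That is the text's informal statement of EC, not a full formalization of it.

```python
def merge(a, b):
    """Merge(a, b) -> {a, b}, as in (3)."""
    return frozenset({a, b})


def terms(so):
    """Yield so itself and every SO contained anywhere inside it."""
    yield so
    if isinstance(so, frozenset):
        for member in so:
            yield from terms(member)


def preserves_inputs(output, inputs):
    """True if each input survives as a unit somewhere in the output."""
    units = set(terms(output))
    return all(i in units for i in inputs)


ab, gd = merge("a", "b"), merge("g", "d")
s8a = merge(ab, gd)                                # (8a), the input

s8b = merge(merge(merge("g", "a"), "b"), gd)       # g tucked in next to a
s8c = merge(merge("g", ab), gd)                    # g tucked in next to {a,b}
s8d = merge(ab, merge("d", gd))                    # d merged below the root
s8e = merge("g", s8a)                              # g merged at the root

print(preserves_inputs(s8b, ["g", s8a]))
print(preserves_inputs(s8c, ["g", s8a]))
print(preserves_inputs(s8d, ["d", s8a]))
print(preserves_inputs(s8e, ["g", s8a]))
```

Only the last call comes out true: (8e) is the one output that still contains (8a) as a unit.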
It is worth noting that these three features are not
particularly recondite properties of FL/UG and find a place in most GG accounts
of movement. This makes their seamless derivation within a Merge based account
particularly interesting.
Last, we can derive the fact that the rules of grammar are
structure dependent ((1h) above), an oft-noted feature of syntactic operations.
Why should this be so? Well, if Merge is the sole syntactic operation, then
non-structure dependent operations are very hard (impossible?) to state. Why?
Because the products of Merge are sets and sets impose no linear requirements
on their elements. If we understand a derivation to be a mapping of phrase
markers into phrase markers and we
understand phrase markers to effectively be sets (i.e. to only specify
hierarchical relations) then it is no surprise that rules that leverage linear
left-right properties of a string cannot be exploited. They don’t exist for
phrase markers eschew this sort of information and thus operations that exploit
left/right (i.e. string based) information cannot be defined.
So, why are rules of G structure dependent?
Because this is the only structural information that Merge based Gs represent.
So, if the basic combinatoric operation that FL/UG allows is Merge, then FL/UG’s
restriction to structure dependent operations is unsurprising.
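A quick way to feel the force of this argument is to try to state a linear rule over Merge outputs. In the sketch below (my toy encoding again: strings for atoms, `frozenset` for sets, with `min_depth` a hypothetical helper) there is simply no "first" or "leftmost" member to ask for, while a hierarchical notion like depth of embedding is easy to define.

```python
def merge(a, b):
    """Merge(a, b) -> {a, b}, as in (3)."""
    return frozenset({a, b})


def min_depth(so, atom, level=0):
    """Shallowest level at which atom occurs in so, or None if absent."""
    if so == atom:
        return level
    if isinstance(so, str):
        return None
    found = [d for d in (min_depth(m, atom, level + 1) for m in so)
             if d is not None]
    return min(found) if found else None


# The order of the arguments leaves no trace in the output.
print(merge("a", "b") == merge("b", "a"))

# A string-based rule would need something like "the first element",
# but a set offers no positional access at all.
try:
    merge("a", "b")[0]
    has_linear_positions = True
except TypeError:
    has_linear_positions = False
print(has_linear_positions)

# Hierarchical information, by contrast, is fully recoverable.
s = merge("a", merge("b", merge("g", "d")))
print(min_depth(s, "a"), min_depth(s, "d"))
```

Nothing here shows that linear order is irrelevant to externalization, of course. The point is only that operations defined over these objects have nothing but hierarchy to refer to.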
This is a good place to pause for a temporary summary:
Research in GG over the last 60 years has uncovered several plausible design
features of FL/UG. (1a-h) summarizes some uncontroversial examples. All of these properties of FL/UG can be
unified if we assume that Merge as outlined in (2) and (3) is the basic
combination operation that FL/UG affords. Put simply, the Merge Hypothesis has
(1a-h) as consequences.
Let me say the same thing more tendentiously. All agree that
a basic feature of FL/UG is that it allows for Gs with unbounded hierarchy. A very
simple inductive procedure sufficient for specifying this property (1a), also
entails many other features of FL/UG (1b-h). What makes this specification
simple is that it directly targets hierarchy and requires that the computation
be strongly monotonic (embody the NTC). Thus we can explain the fact that FL/UG
has these properties by assuming that it embodies a very simple (arguably, the
simplest) version of a procedure that any
empirically adequate theory of FL/UG would have to embody. Or, given that FL/UG allows for unbounded
hierarchical recursion (a non-controversial fact given the fact of Linguistic
Productivity), the simplest (or at
least, very simple) version of the requisite procedure brings in its train
displacement, an adequate format for semantic interpretation, Reconstruction
Effects, movement rules that target c-commanding positions, eschew lowering and
are strictly cyclic, and G operations that are structure dependent. Thus, if
the Merge Hypothesis is true (i.e. if FL/UG has Merge as the basic syntactic
operation), it explains why FL/UG has
this bushel of properties. In other words, the Merge Hypothesis provides a
plausible first step in answering the basic MP question: why does FL/UG have
the properties it has?
Moreover, it is morally certain that something like Merge will be part of any theory of FL/UG precisely
because it is so very simple. It is always possible to add bells and whistles
to the G rules FL/UG makes available. But any theory hoping to be empirically
adequate will contain at least this
much structure. After all, what do (2) and (3) specify? They specify a
recursive procedure for building hierarchical structures that does nothing but
build such structures. Given the fact of Linguistic Productivity and Linguistic
Promiscuity, any theory of FL/UG will contain at least this much. If it does not contain much more than
this, then (1a-h) result. Not bad.