In an earlier post I mentioned that I was asked to write about the Minimalist Program (MP) for a volume aimed at comparing various linguistic “frameworks.” I am personally skeptical about “framework differences.” I find them to be more notational variants of common themes (indeed, when pressed, I have been known to complain that H/GPSG, LFG, and RG are all “dialects” of GB) than actual conceptually divergent perspectives. The main reason is that most linguistic theory lacks any real depth and most of the frameworks have the wherewithal to mimic one another’s (ahem) deep insights. So, where others see ideological divergence, I tend to see slight differences in accent.
That said, I have a further gripe about this kind of volume. Even were I to recognize that these different frameworks empirically competed (or should compete), I do not think that MP should be included in the race. MP is not an alternative to GB (or its various dialects); it presupposes that the results of GB are (largely) empirically correct. MP builds on the results of GB (and its dialects) and aims to conserve them. Thus, it is not intended (or should not be intended) as a wholesale replacement. For an MPer, GB is wrong the way that Newtonian Gravitation is wrong when compared to General Relativity: the latter theory derives the former as a limiting case. It does not reject the former as misguided; rather, it treats it as descriptive, not fundamental. Indeed, an important part of the argument in favor of General Relativity is that it can derive Newton as a special case. If it could not, that would be an excellent argument that it was fundamentally flawed.
And the MP point? From the MP perspective, as Mark Antony might have put it: MPers come to (largely) praise (and incorporate) GB, not to bury it. The whole point of MP is to show that the “laws” that GB discovered can be understood in more fundamental MP terms. IMO, how far MP has succeeded in making good on this ambition has been less appreciated than it should be. So, in this post (and another I will put up later on) I will try to show how far MP has come.
A caveat: if you
find GB (and its cousins) to be hopelessly wrongheaded, then you will find a
theory that derives its results also hopelessly wrongheaded. If you are one of
these, then MP will have, at most, aesthetic interest. If you, like me, take GB
to be more or less correct, then MP’s aesthetic virtues will combine with GB’s
empirical panache to create a very powerful intellectual rush. It will even
create the impression that MP is very much on the right track.
The Merge Hypothesis: Explaining some core features of FL/UG
Here is a list of some
characteristic features of FL/UG and its GLs:
(1) a. Hierarchical recursion
b. Displacement (aka, movement)
c. Gs generate natural formats for semantic interpretation
d. Reconstruction effects
e. Movement targets c-commanding positions
f. No lowering rules
g. Strict cyclicity
h. G rules are structure dependent
i. Antecedents c-command their anaphors
j. Anaphors never c-command their antecedents (i.e. Principle C effects and Strong Cross Over Effects)
k. XPs move, X’s don’t, X0s might
l. Control targets subjects of “defective” (i.e. tns or agreement deficiency) clauses
m. Control respects the Principle of Minimal Distance
n. Case and agreement are X0-YP dependencies
o. Reflexivization and Pronominalization are in complementary distribution
p. Selection/subcategorization are very local head-head relations
q. Gs treat arguments and adjuncts differently, with the former less “constrained” than the latter
Note, I am not saying that this exhausts the properties of
FL/UG, nor am I saying that all LINGers agree that all of these accurately describe FL/UG.[1]
What I am saying is that (1a-q) identify empirically robust(ish) properties of
FL/UG and the generative procedures its GLs allow. Put another way,
I am claiming (i) that certain facts about human GLs (e.g. that they
have hierarchical recursion and movement and binding under c-command and
display principle C effects and obligatory control effects, etc.) are
empirically well-grounded and (ii) that it is appropriate to ask why FL/UG allows for GLs with
these properties and not others. If you buy this, then welcome to the
Minimalist Program (MP).
I would go further: not only are the assumptions in (i) reasonable and the question in (ii) appropriate, MP has provided some answers to the question in (ii). One well-known approach to (1a-h), the Merge Hypothesis (MH), unifies all these properties, deriving them from the core generative mechanism Merge. Or more particularly, MH postulates that FL/UG contains a very simple operation (aka, Merge) that suffices to generate unbounded hierarchical structures (1a) and that these Merge generated hierarchical structures will also have the seven additional properties (1b-h). Let’s examine the features of this simple operation and see how it manages to derive these eight properties.
Unbounded hierarchy implies a recursive procedure.[2]
MH explains this by postulating a simple operation (“Merge”) that generates the
requisite unbounded hierarchical structures. Merge consists of a very simple
recursive specification of Syntactic
Objects (SO) coupled with the assumption that complex SOs are sets.
(2) a. If a is a lexical item[3] then a is an SO
b. If a is an SO and b is an SO then Merge(a,b) is an SO
(3) For a, b SOs, Merge(a,b) → {a,b}
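Definitions (2) and (3) can be mirrored almost line for line in code. The following is a minimal sketch, assuming Python frozensets as a stand-in for the unordered sets of (3) and plain strings as lexical atoms; the toy lexicon and the names `merge` and `is_so` are hypothetical illustrations, not part of any published implementation.

```python
# Hypothetical toy lexicon standing in for the atoms a, b, g, d, l of the text.
LEXICON = {"a", "b", "g", "d", "l"}


def merge(x, y):
    """(3): Merge(x, y) -> {x, y}, an unordered set and nothing more."""
    return frozenset({x, y})


def is_so(obj):
    """Recursive definition of Syntactic Object following (2)."""
    if isinstance(obj, str):
        # (2a) base case: lexical items are SOs.
        return obj in LEXICON
    if isinstance(obj, frozenset):
        # (2b) inductive step: a set built by Merge out of SOs is an SO.
        # Merge(x, x) would yield the singleton {x}, hence 1 or 2 members.
        return len(obj) in (1, 2) and all(is_so(m) for m in obj)
    return False
```

Note that a three-membered set fails `is_so`: nothing in (2) or (3) can build it.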
The inductive step (2b) allows Merge to apply to its own
outputs and thus licenses unboundedly “deep” SOs with sets contained within
sets contained within sets… The Merge Hypothesis is that the “simplest”
conception of this combinatoric operation (the minimum required to generate
unbounded hierarchically organized objects) suffices to explain why FL/UG has
many of the other properties listed in (1).
In what way is Merge the “simplest” specification of
unbounded hierarchy? The operation has three key features: (i) it directly and
uniquely targets hierarchy (i.e. the basic complex objects are sets (which are
unordered), not strings), (ii) it in no way changes the atomic objects combined
in combining them (Inclusiveness), and (iii) it in no way changes the complex
objects combined in combining them (Extension). Inclusiveness and Extension
together constitute the “No Tampering Condition” (NTC). Thus, Merge recursively
builds hierarchy (and only hierarchy) without “tampering” with the inputs in
any way save combining them in a very simple way (i.e. just hierarchy, no linear information).[4]
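The No Tampering Condition can be made concrete by contrasting Merge with a combiner that does tamper. The sketch below is illustrative only: it assumes Python frozensets for sets, and `indexing_combine` is a hypothetical, deliberately non-Merge operation that decorates an input with an index, loosely in the spirit of co-indexation. Merge as set formation satisfies the check by construction; the tampering variant fails it.

```python
def merge(x, y):
    """(3): Merge(x, y) -> {x, y}; frozensets are immutable, so inputs cannot be altered."""
    return frozenset({x, y})


def indexing_combine(x, y, index):
    """Hypothetical tampering combiner (NOT Merge): it rewrites the atom x
    as x + index, so x no longer appears unchanged in the output."""
    return frozenset({f"{x}{index}", y})


def respects_ntc(x, y, output):
    """Extension: x and y survive unchanged as members of the output.
    Inclusiveness: nothing else (indices, labels, order) has been added."""
    return output == frozenset({x, y})


vp = merge("a", "b")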
The key theoretical observation is that if
FL/UG has Merge as its primary generative mechanism,[5]
then it delivers GLs with properties (1a-h). And if this is right,
it provides a proof of concept that it is not premature to ask why FL/UG is structured as it is. In
other words, this would be a very nice result given the stated aims of MP.
Let’s see how Merge so conceived derives (1a-h).
It should be clear that Gs with Merge can generate unbounded
hierarchical dependencies. Given a lexicon containing a finite list of atoms a,b,g,d,… we
can, using the definitions in (2) and (3) form structures like (4) (try it!).
(4) a. {a, {b, {g, d}}}
b. {{a, b}, {g, d}}
c. {{{a, b}, g}, d}
And given the recursive nature of the operation, we can keep
on going ad libitum. So Merge
suffices to generate an unbounded number of hierarchically organized syntactic
objects.
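The structures in (4) can be built mechanically. A minimal sketch, assuming frozensets for sets and strings for the atoms a, b, g, d; the `depth` helper is a hypothetical measure of embedding, introduced only for illustration.

```python
def merge(x, y):
    """(3): Merge(x, y) -> {x, y}."""
    return frozenset({x, y})


def depth(so):
    """Embedding depth: 0 for an atom, 1 + the deepest member for a set."""
    if isinstance(so, str):
        return 0
    return 1 + max(depth(m) for m in so)


a, b, g, d = "a", "b", "g", "d"
ex4a = merge(a, merge(b, merge(g, d)))   # {a, {b, {g, d}}}
ex4b = merge(merge(a, b), merge(g, d))   # {{a, b}, {g, d}}
ex4c = merge(merge(merge(a, b), g), d)   # {{{a, b}, g}, d}

# Because Merge applies to its own outputs, depth can grow without bound.
so = d
for _ in range(10):
    so = merge(a, so)
```

The same four atoms yield three distinct hierarchies, and the loop can be run as long as one likes; only time, not the grammar, limits the depth.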
Merge can also generate structures that model displacement
(i.e. movement dependencies). Movement rules code the fact that a single
expression can enjoy multiple relations within a structure (e.g. it can be both
a complement of a predicate and the subject of a sentence).[6]
Merge allows for the derivation of structures that have this property. And this
is a very good thing given that we know (due to over 60 years of work in
Generative Grammar) that displacement is a key feature of human GLs.
Here’s how Merge does this. Given a structure like (5a), consider how (2) and (3) yield the movement structure (5b). Observe that in (5b), b occurs twice. This can be understood as coding a movement dependency, b being both sister of the SO a and sister of the derived SO {g, {l, {a, b}}}. The derivation is in (6).
(5) a. {g, {l, {a, b}}}
b. {b, {g, {l, {a, b}}}}
(6) The SO {g, {l, {a, b}}} and the SO b (within {g, {l, {a, b}}}) merge to form {b, {g, {l, {a, b}}}}
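The derivation in (6) can be replayed directly. This is a minimal sketch, assuming frozensets for sets; `terms` (every SO contained in an SO, including itself) and `internal_merge` are hypothetical helper names. `terms` is the code-level analogue of "once an SO always an SO": a sub-SO of (5a) is still an SO, so Merge may target it.

```python
def merge(x, y):
    """(3): Merge(x, y) -> {x, y}."""
    return frozenset({x, y})


def terms(so):
    """All SOs contained in so, including so itself."""
    found = {so}
    if isinstance(so, frozenset):
        for member in so:
            found |= terms(member)
    return found


def internal_merge(root, target):
    """Re-merge an SO already contained in root with root itself."""
    if target not in terms(root) or target == root:
        raise ValueError("target must be a proper sub-SO of root")
    return merge(target, root)


a, b, g, l = "a", "b", "g", "l"
ex5a = merge(g, merge(l, merge(a, b)))   # (5a) {g, {l, {a, b}}}
ex5b = internal_merge(ex5a, b)           # (5b) {b, {g, {l, {a, b}}}}
```

In the result, b is an immediate member of two sets: the new root and the lowest set {a, b}.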
Note that this derivation assumes that once an SO always an
SO. Thus, Merging an SO a to form part of a complex SO b that contains a does
not change (tamper with) a’s status as an SO. Because complex SOs are composed of
SOs, Merge can target a subpart of an SO for further Merging. Thus, NTC allows
Merge to generate structures with the properties of movement; structures where
a SO is a member of two different “sets.”
Let me emphasize an important point: the key feature that
allows Merge to generate movement dependencies (viz. the “once an SO always an
SO” assumption) follows from the assumption that Merge does nothing more than take SOs and form them
into a unit. It otherwise leaves the combined objects alone. Thus, if some
expression is an SO before being merged with another SO then it will retain
this property after being Merged given that Merge in no way changes the
expressions but for combining them. NTC (specifically the Inclusiveness and
Extension Conditions) leaves all properties of the combining expressions
intact. So, if a
has some property before being combined with b (e.g. being an SO), it
will have this property after it is combined with b. As being an SO is a property of an expression, Merging it will not change this, and so Merge can legitimately combine a subpart of an SO with its container.
Before pressing on, a comment: unifying movement and phrase
building is an MP innovation. Earlier theories of grammar (and early minimalist
theories) treated phrasal dependencies and movement dependencies as the
products of entirely different kinds
of rules (e.g. phrase structure rules vs transformations/Merge vs Copy+Merge).
Merge unifies these two kinds of
dependencies and treats them as different outputs of a single operation. As
such, the fact that FL yields Gs that contain both unbounded hierarchy and displacement operations is
unsurprising. Hierarchy and displacement are flip sides of the same
combinatoric coin. Thus, if Merge is the core combinatoric operation FL makes
available, then MH explains why FL/UG constructs GLs that have both
(1a) and (1b) as characteristic features.
Let’s continue. As should be clear, Merge generated structures like those in (5) and (6) also provide all we need to code the two basic types of semantic dependencies: predicate-argument structures (i.e. thematic dependency) and scope structure. Let me be a bit clearer. The two basic applications of Merge are those that take two separate SOs and combine them and those that take two SOs with one contained in the other and combine them. The former, E-Merge, is fit for the representation of predicate-argument (aka, thematic) structure. The latter, I-Merge, provides an adequate grammatical format for representing operator/variable (i.e. scope) dependencies. There is ample evidence that Gs code for these two kinds of semantic information in simple constructions like Wh-questions. Thus, it is an argument in its favor that Merge as defined in (2) and (3) provides a syntactic format for both. An argument saturates the predicate it E-merges with and scopes over the SO it I-merges with. If this is correct, then Merge provides structure appropriate to explain (1c).
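E-Merge and I-Merge are not two operations but two cases of one. The sketch below assumes frozensets for sets and uses the hypothetical atoms "eat", "what", "you" as a drastically simplified Wh-question (no tense, no C, no do-support); `merge_kind` only classifies an application of the single `merge` function by whether one input is contained in the other.

```python
def merge(x, y):
    """(3): Merge(x, y) -> {x, y}; the one and only structure builder here."""
    return frozenset({x, y})


def terms(so):
    """All SOs contained in so, including so itself."""
    found = {so}
    if isinstance(so, frozenset):
        for member in so:
            found |= terms(member)
    return found


def merge_kind(x, y):
    """Classify an application of Merge(x, y)."""
    if x in terms(y) or y in terms(x):
        return "I-Merge"   # one input inside the other: displacement, scope
    return "E-Merge"       # two separate SOs: predicate-argument structure


vp = merge("eat", "what")          # 'what' saturates 'eat'
clause = merge("you", vp)          # subject argument
question = merge("what", clause)   # 'what' re-merged: scopes over clause
```

The same function builds all three structures; only the relation between its inputs differs.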
And also (1d). A standard account of Reconstruction Effects
(RE) involves allowing a moved expression to function as if it still occupied the position from which it moved. This as-if is redeemed theoretically if the
movement site contains a copy of the moved expression. Why does a displaced
expression semantically comport itself as if it is in its base position?
Because a copy of the moved expression is
in the base position. Or, to put this another way, a copy theory of movement
would go a long way towards providing the technical wherewithal to account for
the possibility of REs. But Merge based accounts of movement like the one above
embody a copy theory of movement. Look at (5b): b is in two positions in
virtue of being I-merged with its container. Thus, b is a member of the lowest
set and the highest. Reconstruction amounts to choosing which “copy” to
interpret semantically and phonetically.[7]
Reducing movement to I-merge explains why movement should allow REs.[8]
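Reconstruction as "choosing which copy to interpret" can be sketched by listing the occurrences of a moved item. Following footnote 7, the sketch treats copies as occurrences and, as an assumption adopted here for illustration, identifies each occurrence by its sister and its embedding level. `occurrences` and `pick_copy` are hypothetical names; which copy a given interpretive process should choose is exactly the open question footnote 8 mentions, so the choice is left as a parameter.

```python
def merge(x, y):
    """(3): Merge(x, y) -> {x, y}."""
    return frozenset({x, y})


def occurrences(root, target, level=0):
    """Return (level, sister) for each occurrence of target inside root,
    sorted from highest (level 0 = member of the root) to lowest."""
    found = []
    if isinstance(root, frozenset):
        if target in root:
            found.extend((level, sister) for sister in root if sister != target)
        for member in root:
            found.extend(occurrences(member, target, level + 1))
    return sorted(found, key=lambda pair: pair[0])


def pick_copy(root, target, lowest):
    """Reconstruction as a choice: return the lowest or the highest occurrence."""
    occs = occurrences(root, target)
    return occs[-1] if lowest else occs[0]


ex5a = merge("g", merge("l", merge("a", "b")))   # (5a)
ex5b = merge("b", ex5a)                          # (5b)
```

In (5b) the high occurrence of b has (5a) as its sister and the low one has a; interpreting the low one is the reconstructed reading.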
Furthermore, having this option follows from a key
assumption concerning Merge. Recall that it eschews tampering. In other words,
if movement is a species of Merge then no-tampering requires coding movement
with “copies.” To see this, contrast how movement is treated in GB.
Within GB, if a moves from its base position to some higher position a
trace is left in the launch site. Thus, a GB version of (5b) would look
something like (7):
(7) {b1, {g, {l, {a, t1}}}}
Two features are noteworthy: (i) in place of a copy in the
launch site we find a trace and (ii) that trace is co-indexed with the moved
expression b.[9]
These features are built into the GB understanding of a movement rule.
Understood from a Merge perspective, this GB conception is doubly suspect for
it violates the Inclusiveness Condition clause of the NTC twice over. It
replaces a copy with a trace and it adds indices to the derived structure.
Empirically, it also mystifies REs. Traces have no contents. That’s what makes
them traces (see note 13). Why are they then able to act as if they did have
them? To accommodate such effects GB adds a layer of theory specific to REs (e.g. it invokes
reconstruction rules to undo the effects of movement understood in trace
theoretic terms). Having copies in place of traces simplifies matters and
explains how REs are possible. Furthermore, if movement is a species of Merge
(i.e. I-merge) then SOs like (7) are not generable at all as they violate NTC.
More specifically, the only kosher way to code movement and obey the NTC is via
copies. So, the only way to code movement given a simple conception of
syntactic combination like Merge (i.e. one that embodies no tampering) results
in a copy theory of movement that serves to rationalize REs without a
theoretically bespoke theory of reconstruction. Not bad![10]
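The Inclusiveness argument against (7) can be checked mechanically: every atom in a derived SO should come from the lexicon. The sketch assumes a hypothetical toy lexicon and, as a simplification, encodes the GB index and trace as new atom names "b1" and "t1"; nothing in this encoding is meant as a faithful rendering of trace theory.

```python
# Hypothetical toy lexicon for the atoms in (5) and (7).
LEXICON = {"a", "b", "g", "l"}


def merge(x, y):
    """(3): Merge(x, y) -> {x, y}."""
    return frozenset({x, y})


def atoms(so):
    """Collect the atoms occurring anywhere inside an SO."""
    if isinstance(so, str):
        return {so}
    found = set()
    for member in so:
        found |= atoms(member)
    return found


def inclusive(so):
    """Inclusiveness: a derived SO contains only lexical atoms."""
    return atoms(so) <= LEXICON


ex5a = merge("g", merge("l", merge("a", "b")))
copy_version = merge("b", ex5a)                                        # (5b)
trace_version = merge("b1", merge("g", merge("l", merge("a", "t1"))))  # (7)
```

The copy-based structure passes; the trace-based one introduces two objects ("b1", "t1") that no lexical item supplied.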
So Merge delivers properties (1a-d), and the gifts just keep on coming. It also serves up (1e,f,g) as consequences. This time let’s look at the Extension Condition (EC) codicil to the NTC. EC requires that the inputs to Merge be preserved in the output of Merge (any other result would “change” one of the inputs). Thus, if an SO is input to the operation it will be a
unit/set in the output as well because Merge does no more than create
linguistic units from the inputs. Thus, whatever is a constituent in the input
appears as a constituent with the same properties in the output. This implies (i) that all I-merge is to a
c-commanding position, (ii) that lowering rules cannot exist, and (iii) that
derivations are strictly cyclic.[11]
The conditions that movement always be upwards to c-commanding positions and strictly cyclic thus follow trivially from this simple specification of Merge (i.e. Merge with NTC understood as embodying EC).
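Why root-restricted I-Merge yields c-commanding movement can be seen in a small sketch. It assumes frozensets for sets and a simplified set-theoretic c-command (an assumption made here: x c-commands y if x has a sister that is y or contains y). When a term is re-merged with the root, its new sister is the entire old root, so it c-commands everything there, including its own lower occurrence. `c_commands` and `root_merge` are hypothetical names.

```python
def merge(x, y):
    """(3): Merge(x, y) -> {x, y}."""
    return frozenset({x, y})


def terms(so):
    """All SOs contained in so, including so itself."""
    found = {so}
    if isinstance(so, frozenset):
        for member in so:
            found |= terms(member)
    return found


def c_commands(root, x, y):
    """x c-commands y in root if, in some set inside root, x has a sister
    that is y or contains y."""
    if not isinstance(root, frozenset):
        return False
    if x in root and any(s != x and y in terms(s) for s in root):
        return True
    return any(c_commands(m, x, y) for m in root)


def root_merge(root, target):
    """I-Merge restricted to the root, as EC requires."""
    return merge(target, root)


ex5a = merge("g", merge("l", merge("a", "b")))   # (5a)
ex5b = root_merge(ex5a, "b")                     # (5b)
```

Before movement b c-commands only a; after root I-Merge it c-commands every term of (5a).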
An illustration will help clarify this. NTC prohibits deriving structure (8b) from (8a). Here we Merge g with a. The output of this instance of Merge obliterates the fact that {a,b} had been a unit/constituent in (8a), the input to Merge. EC prohibits this. It effectively restricts I-Merge to the root. So restricted, (8b) is not a licit instance of I-Merge (note that {a,b} is not a unit in the output). Nor is (8c) (note that {{a,b},{g,d}} is not a unit in the output). Nor is a derivation that violates the strict cycle (as in (8d)). Only (8e) is a grammatically licit Merge derivation, for here all the inputs to the derivation (i.e. g and {{a,b},{g,d}}) are also units in the output of the derivation; the inputs have been preserved (remain unchanged) in the output. Yes, a new relation has been added, but no previous ones have been destroyed: the derivation is info-preserving (viz. monotonic). Repeat the slogan: once an SO always an SO. In deriving (8b-c), one of the inputs (viz. {{a,b},{g,d}}) is no longer a unit in the output and so NTC/EC has been violated.
(8) a. {{a,b},{g,d}}
b. {{{g,a},b}, {g,d}}
c. {{g,{a,b}},{g,d}}
d. {{a, b}, {d, {g,d}}}
e. {g,{{a,b},{g,d}}}
In sum, if movement
is I-merge subject to NTC then all movement will necessarily be to c-commanding
positions, upwards, and strictly cyclic.
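The EC reasoning about (8) can be checked directly: EC holds if every constituent of the input is still a constituent of the output. A minimal sketch, assuming frozensets for sets; `satisfies_extension` is a hypothetical name for that subset check.

```python
def merge(x, y):
    """(3): Merge(x, y) -> {x, y}."""
    return frozenset({x, y})


def terms(so):
    """All constituents of so, including so itself."""
    found = {so}
    if isinstance(so, frozenset):
        for member in so:
            found |= terms(member)
    return found


def satisfies_extension(before, after):
    """EC: every constituent of the input survives as a constituent of the output."""
    return terms(before) <= terms(after)


a, b, g, d = "a", "b", "g", "d"
ex8a = merge(merge(a, b), merge(g, d))             # (8a)
ex8b = merge(merge(merge(g, a), b), merge(g, d))   # (8b) g merged with a
ex8c = merge(merge(g, merge(a, b)), merge(g, d))   # (8c) g merged with {a,b}
ex8d = merge(merge(a, b), merge(d, merge(g, d)))   # (8d) counter-cyclic
ex8e = merge(g, ex8a)                              # (8e) g merged at the root
```

Only (8e), where the re-merged item attaches to the root, keeps every input constituent intact.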
It is worth noting that these three features are not
particularly recondite properties of FL/UG and find a place in most GG accounts
of movement. This makes their seamless derivation within a Merge based account
particularly interesting.
Last, we can derive the fact that the rules of grammar are
structure dependent ((1h) above), an oft-noted feature of syntactic operations.[12]
Why should this be so? Well, if Merge is the sole syntactic operation, then non-structure dependent operations are very hard (impossible?) to state. Why?
Because the products of Merge are sets and sets impose no linear requirements
on their elements. If we understand a derivation to be a mapping of phrase
markers into phrase markers and we
understand phrase markers to effectively be sets (i.e. to only specify
hierarchical relations) then it is no surprise that rules that leverage linear
left-right properties of a string cannot be exploited. They don’t exist for
phrase markers eschew this sort of information and thus operations that exploit
left/right (i.e. string based) information cannot be defined. So, why are rules of G structure dependent?
Because this is the only structural information that Merge based Gs represent.
So, if the basic combinatoric operation that FL/UG allows is Merge, then FL/UG’s restriction to structure dependent operations is unsurprising.
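The structure dependence argument can be seen in miniature. Assuming frozensets for sets, two "orders" of combination produce one and the same object, so no function of that object can tell them apart and a rule keyed to "leftmost" has nothing to consult. A hierarchical notion, by contrast, is perfectly definable; `shallowest_atoms` (the atoms nearest the root) is a hypothetical example chosen only for illustration.

```python
def merge(x, y):
    """(3): Merge(x, y) -> {x, y}."""
    return frozenset({x, y})


# Combining "a before {b, g}" and "{g, b} before a" yields the same object,
# so any rule f(SO) gives the same result for both: there is no left/right.
ex1 = merge("a", merge("b", "g"))
ex2 = merge(merge("g", "b"), "a")


def shallowest_atoms(so):
    """A structure-dependent notion that IS definable: the atoms closest to the root."""
    frontier = {so}
    while frontier:
        found = {x for x in frontier if isinstance(x, str)}
        if found:
            return found
        # No atom at this level: descend one level into every set.
        frontier = {m for x in frontier for m in x}
    return set()
```

Any attempt to define "the first member" would have to rely on Python's set iteration order, which carries no linguistic information, much as sets carry none.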
This is a good place to pause for a temporary summary:
Research in GG over the last 60 years has uncovered several plausible design
features of FL/UG. (1a-h) summarizes some uncontroversial examples. All of these properties of FL/UG can be
unified if we assume that Merge as outlined in (2) and (3) is the basic
combination operation that FL/UG affords. Put simply, the Merge Hypothesis has
(1a-h) as consequences.
Let me say the same thing more tendentiously. All agree that
a basic feature of FL/UG is that it allows for Gs with unbounded hierarchy. A very
simple inductive procedure sufficient for specifying this property (1a), also
entails many other features of FL/UG (1b-h). What makes this specification
simple is that it directly targets hierarchy and requires that the computation
be strongly monotonic (embody the NTC). Thus we can explain the fact that FL/UG
has these properties by assuming that it embodies a very simple (arguably, the
simplest) version of a procedure that any
empirically adequate theory of FL/UG would have to embody. Or, given that FL/UG allows for unbounded
hierarchical recursion (a non-controversial fact given the fact of Linguistic
Productivity), the simplest (or at
least, very simple) version of the requisite procedure brings in its train
displacement, an adequate format for semantic interpretation, Reconstruction
Effects, movement rules that target c-commanding positions, eschew lowering and
are strictly cyclic, and G operations that are structure dependent. Thus, if
the Merge Hypothesis is true (i.e. if FL/UG has Merge as the basic syntactic
operation), it explains why FL/UG has
this bushel of properties. In other words, the Merge Hypothesis provides a
plausible first step in answering the basic MP question: why does FL/UG have
the properties it has?
Moreover, it is morally certain that something like Merge will be part of any theory of FL/UG precisely
because it is so very simple. It is always possible to add bells and whistles
to the G rules FL/UG makes available. But any theory hoping to be empirically
adequate will contain at least this
much structure. After all, what do (2) and (3) specify? They specify a
recursive procedure for building hierarchical structures that does nothing but
build such structures. Given the fact of Linguistic Productivity and Linguistic Promiscuity, any theory of FL/UG will contain at least this much. If it does not contain much more than this, then (1a-h) follow. Not bad.
[1]
For example, fans of Dependent Case Theory will reject (1n).
[2]
Recall that Linguistic Productivity (LP) implies recursion and that linguistics has discovered ample evidence that GLs can generate structures of arbitrary depth.
[3]
The term lexical item denotes the
atoms that are not themselves products of Merge. These roughly correspond to
the notion morpheme or word, though these notions are
themselves terms of art and it is possible that the naïve notions only roughly
correspond to the technical ones. Every theory of syntax postulates the
existence of such atoms. Thus, what is debatable is not their existence but
their features.
[4]
In my opinion, this line of argument does not require that Merge be the
“simplest” possible operation. It suffices that it be natural and simple. The
conception of Merge in (2) and (3) meets this threshold.
[5]
In the best of all worlds, the sole generative procedure.
[6]
A phrase marker is just a list of relations that the combined atoms enjoy. Derivations that map phrase markers into phrase markers allow an expression to enjoy the different relations coded in the various phrase markers in which it appears.
[7]
Copy is simply a descriptive term here. A more technically accurate variant is
“occurrence.” b
occurs twice in (5b). The logic, however, does not change.
[8]
A full theory of REs would articulate the principles behind choosing which
copies to interpret. See Sportiche (forthcoming) for an interesting substantive
proposal.
[10]
We could go further: Merge based theory cannot have objects like traces. Traces
live on the distinction between Phrase Structure Rules and lexical insertion
operations. They are effectively the phrase structure scaffolding without the
lexical insertion. But, Merge makes no distinction between structure building
and lexical insertion (i.e. between slots and contents). As such, if traces exist, they must be lexical
primitives rather than syntactically derived formatives. This would be a very
weird conception of traces, inconsistent with the GB rendering in note 12. The
same, incidentally, goes for PRO, which we will talk about later on. The
upshot: not only would traces violate No Tampering, they are indefinable given
the “bare phrase structure” nature of movement understood as I-merge.
[11]
The first conjunct only holds if there is no inter-arboreal/sidewards movement.
For now, let’s assume this to be correct.
[12]
For a recent review and defense of the claim see Berwick et al.
Hi,
"I personally find them to be more notational variants of common themes (indeed, when pressed, I have been known to complain that H/GPSG, LFG, RG, are all “dialects” of GB) than actual conceptually divergent perspectives."
There is this old joke: "Your theory is a notational variant of mine and it is wrong." You may check textbooks on LFG and HPSG (eg Bresnan's) to read about movement paradoxes that do not arise or can be solved in LFG/HPSG. The theory about extraction in GPSG and HPSG gets Across the Board extraction right without any further stipulation. And so on. There are differences.
"The main reason is that most linguistic theory lacks any real depth and most of the frameworks have the wherewithal to mimic one another’s (ahem) deep insights."
This is just plainly wrong. There are deep implemented analyses of several languages both in LFG and HPSG. You may read about the CoreGram project here.
https://hpsg.hu-berlin.de/~stefan/Pub/coregram.html
It has large scale grammars for German, Danish, Persian, Maltese and smaller ones for Yiddish, Hindi, French, English. All grammars come with a morphology component that is part of the linguistic theory and with semantics (underspecified semantics, Minimal Recursion Semantics). No such thing exists for GB, let alone MP. You may read up in my grammar theory textbook. It lists implementations for all the alternative frameworks. And it also discusses formalization of GB/MP theories. There are papers called "The GB blues" by authors who gave up in frustration because it was impossible to come up with implementations, since core notions were not worked out. The most prominent implementations were those by Stabler, but they are toy grammars compared to what is available elsewhere. Please read the sections about complexity and interaction of phenomena in the CoreGram paper or in my GT textbook. As Abney said: the more you cover, the more complex it gets.
http://langsci-press.org/catalog/book/25
You may download a virtual machine and play with the grammars:
https://hpsg.hu-berlin.de/Software/Grammix/
They exist, they are consistent, they work and they share a common core. All the goals that Generative Grammar always had are reached in this project. Please drop the statement that all theories are shallow. It is not true.
"So, where others see ideological divergence, I tend to see slight differences in accent."
I agree about move and merge.
https://hpsg.hu-berlin.de/~stefan/Pub/unifying-everything.html
They were part of HPSG right from the beginning, but research style, rhetoric and the overall architecture are quite different. Kayne-style theories would never be entertained in any of the alternative theories.
Best
Stefan
A side note on the following comment:
>You may check textbooks on LFG and HPSG (eg Bresnan's) to read about movement paradoxes that do not arise or can be solved in LFG/HPSG.
Movement paradoxes aren’t paradoxes in any vaguely modern variety of transformational grammar. The idea of a movement paradox sort of makes sense with reference to varieties of TG where (i) there are no surface filters and (ii) apparent optionality is accounted for via optional transformations. It is then indeed puzzling that you can say “That languages are learnable is captured by this theory” but not “This theory captures that languages are learnable” (given that the only way to account for the availability of both structures with other passive verbs is to posit an optional transformation). However, once you have surface filters and/or featural triggers for movement, there is nothing paradoxical about such cases at all. For example, you could add a “* captures that” filter, or have the verb ‘capture’ refuse to select a CP complement lacking whatever feature it is that triggers the movement.
Looks like "epicycles" to me. In any case the difference remains. You have a transformational theory that needs filters and you have other theories that do not. And what about Haider's cases where you have "[Einen Hund füttern, der Hunger hat,] wird man wohl müssen." ('One will probably have to feed a dog that is hungry.') but not "weil man wohl einen Hund füttern, der Hunger hat, müssen wird" ('because one will probably have to feed a dog that is hungry')?
It's important to be clear on what the criticism actually is. You seem to be conceding that there is no paradox. So rather than making vague allusions to epicycles, you need to explain exactly how the data that usually come under the heading of "movement paradoxes" pose a problem for any modern variety of TG. This has not been done in the existing literature, so it would be a useful contribution if you could take the time to spell it out.
As for needing filters, you'll notice that I mentioned a way of handling these cases that doesn't make use of filters. But anyway, every framework has some kind of technical device that the others lack (this is what makes the frameworks different!), so I fail to see how this is a big deal. You might just as well point out that the transformational theory needs transformations while the others do not.
You'll have to elaborate on the significance of the German examples. What makes them more difficult?
Yes, you are right. The theories are different. This was my main point. So, they are not just notational variants of each other.
The point about the German data is that there is fronted stuff that cannot originate from one point since it would be ungrammatical there (The relative clause has to follow all verbs, it cannot go into the middle).
The fact that the theories use different technical devices does not show that they aren't notational variants.
Given your description of the German data, it does not seem any more problematic than the English examples that I referred to. What exactly makes it different and more difficult to deal with?