3. Minimalism: The third epoch
Where are we? We
reviewed how the first period of syntactic research examined how grammars that
had to generate an unbounded number of hierarchically organized objects might
be structured. It did this by postulating rules whose interactions yielded interesting empirical coverage, generating a fair number of acceptable sentences while not generating an interesting number of unacceptable ones. In the process, this early work
discovered an impressive number of effects that served as higher-level targets
of explanation for subsequent theory. To say the same thing a little pompously,
early GG discovered a bunch of “effects” which catalogued deep-seated
generalizations characteristic of the products of human Gs. These effects sometimes fell together as
“laws of grammar” and were taken, reasonably, as consequences of the built-in
structural properties of FL.
This work set the stage for the second stage of research: a
more direct theoretical investigation of the properties of FL. The relevant
entrée to this line of investigation was Plato’s Problem: the observation that
what native speakers know about their languages far exceeds what they could
have learned about it by examining the PLD available to them in the course of
language acquisition. Conceptually, addressing Plato’s Problem suggested a
two-prong attack: first, radical simplification of the rules that Gs contain
and second, enrichment of what FL brings to the task of acquisition. Factoring the complexity built into previous rules out into simple operations like Move α made the language-particular rules that remained easier to acquire. This simplification, however, threatened
generative chaos. The theoretical task was to prevent this. This was
accomplished by enriching the innate structure of FL in principled ways. The
key theoretical innovation was trace theory.
Traces simplified derivations by making them structure preserving, and they allowed for the unification of movement and binding. These theoretical moves
addressed the over-generation problem.[1]
They also set the stage for contemporary minimalist investigations. We turn to
this now.
The main problem with the GB theory of FL from a minimalist
perspective is its linguistic specificity.
Here’s what we mean.
Within GB, FL is very complex, and the proposed innate principles and operations are very linguistically specific. The complexity is
evident in the modular architecture of the basic GB theory as well as in the
specific principles and operations within each module. (26) and (27) reiterate
the basic structure of the theory.
(26) a. X’ theory of phrase structure
     b. Case
     c. Theta
     d. Movement
        i. Subjacency
        ii. ECP
     e. Construal
        i. Binding
     f. Control
(27)  DS: X’-rules, Theta Theory, input to T-rules
        |
        |  Move α (T-rules)/trace theory, output SS
        |
      SS: Case Theory, Subjacency, γ-marking, BT
       /  \
      /    \  Move α (covert movement)
    PF      LF: BT, *γ
Though some critical relations crosscut (many of) the
various modules (e.g. government), the modules each have their own special
features. For example, X’ theory traffics in notions like specifier,
complement, head, maximal projection, adjunct and bar level. Case theory also
singles out heads but distinguishes between those that are case assigning and
those that require case. There is also a case filter, case features and case
assigning configurations (government). Theta theory also uses government, but for the assignment of θ-roles, which are assigned in D-structure by heads and are regulated by the theta criterion, a condition that requires every argument to get one and at most one theta role. Movement exploits another set of concepts and primitives: bounding node/barrier, escape hatch, subjacency principle, antecedent government, head government, γ-marking, among others. Last, the construal rules come in four different types: one for PRO, one for local
anaphors like reflexives and reciprocals, one for pronouns and one for all the
other kinds of DPs, dubbed R-expressions. There is also a specific licensing
domain for anaphors and
pronouns, indexing procedures for the specification of syntactic
antecedence relations and hierarchical requirements (c-command) between an
antecedent and its anaphoric dependent. Furthermore, all of these conditions are
extrinsically ordered to apply at various derivational levels specified in the
T-model.[2]
If the information outlined in (26) and (27) is on the right
track, then FL is richly structured with very domain specific (viz.
linguistically tuned) information. And though such linguistic specificity is a
positive with regard to Plato’s Problem, it raises difficulties when trying to
address Darwin’s Problem (i.e. how FL could have arisen from a pre-linguistic
cognitive system). Indeed, the logic of the two problems seems to have them
pulling in largely opposite directions.
A rich linguistically specific FL plausibly eases the child’s task by
restricting what the child needs to use the PLD to acquire. However, the more cognitively sui generis FL is, the more complicated
the evolutionary path to FL. Thus, from
the perspective of Darwin’s problem, we want the operations and principles of
FL to be cognitively (or computationally) general and very simple. It is this tension that modern GG aims to
address.
The tension is exacerbated when the evolutionary timeline is
considered. The consensus opinion is that humans became linguistically facile
about 100,000 years ago and that the capacity that evolved has remained
effectively unchanged ever since. Thus,
whatever the addition, it must have been relatively minor (the addition of at
most one or two operations/principles). Or, putting this another way, our FL is
what you get when you wed (at most) one (or two) linguistically specific
features with a cognitively generic brain.
Threading the Plato’s/Darwin’s problem needle suggests a
twofold strategy: (i) Simplify GB by unifying the various FL internal modules
and (ii) Show that this simplified FL can be distilled into largely general
cognitive and/or computational parts plus (at most) one linguistically specific
one.[3]
Before illustrating how this might be managed, note that GB
is the target of explanation. In other words, the Minimalist Program (MP) takes
GB to be a good model of what FL looks like. It has largely correctly described
(in extension) the innate structure of FL. However, the GB description is not
fundamental. If MP is realizable, then FL is less linguistically parochial than
GB supposes. If MP is realizable, then FL exploits many generic operations and
principles (i.e. operations and principles not domain restricted to language)
in its linguistic computations. Yet, MP takes GB’s answer to Plato’s Problem to
be largely correct though it disagrees with GB about how domain specific the
innate architecture of FL is. More concretely, MP agrees with GB that Binding
Theory provides a concise description of certain grammatical laws that
accurately reflect the structure of FL. But, though GB’s BT accurately
describes these laws/effects and correctly distinguishes the data that reflects
the innate features of FL from what is acquired, the actual fundamental principles
of binding are different from those identified by GB (though these principles
are roughly derivable from the less domain specific ones that characterize the
structure of FL). Borrowing (perhaps grandiosely) terminology common in
physics, MP takes GB to be a good effective
theory of FL but denies that it is the fundamental
theory of FL. A useful practical consequence of this is to take the
principles of GB to be targets for derivation by the more fundamental
principles that minimalist theories will discover.
That’s the basic idea.
Let’s illustrate with some examples. First let’s do some GB clean-up. If
MP is correct, then GB must be much simpler than it appears to be. One way to
simplify the model is to ask which features of the T-model are trivially true
and which are making substantive claims. Chomsky (1993) argues that whereas PF
and LF are obviously part of any theory of grammar, DS and SS are not (see
below for why). The former two levels are unexciting
features of any conceivable theory,
while the latter two are empirically substantive. To make this point another
way: PF and LF are conceptually motivated while DS and SS must be empirically
motivated. Or, because DS and SS complicate the structure of FL, we should
attribute them to FL only if the facts require it. Note that this implies that if we can
reanalyze the facts that motivate DS and SS in ways that do not require
adverting to these two levels, we can simplify the T-model to its bare
conceptual minimum.
So what is this minimum? PF and LF simply state that FL
interfaces with the conceptual system and the sound system (recall the earlier
<m,s> pairs in section 0). This must
be true: after all, linguistic products obviously
pair meanings and sounds (or motor articulations of some sort). So FL’s
products must interact with the thought system and the sound system. Bluntly put, no theory of grammar that failed
to interact with these two cognitive systems would be worth looking at.
This is not so with DS and SS. These are FL internal levels with FL internal properties. DS is where θ-structure and syntax meet. Lexical items are put into X’-formatted phrase markers in a way that directly reflects their thematic contributions. In particular, all and only θ-positions are occupied in DS and the positions they occupy reflect the thematic
contributions the expressions make. Thus, DPs understood as being the logical
objects of predicates are in syntactic object positions, logical subjects in
syntactic subject positions etc.
Furthermore, DS structure building operations (X’-operations, Lexical Insertion) are different in kind from the Transformations that follow (e.g. Move α), and the T-model stipulates that all DS operations apply before any Move α operation does (viz. DS operations and Move α never interleave). Thus, DS implicitly defines the positions Move α can target and it also helps distinguish the various kinds of phonetically empty categories Gs can contain (e.g. PRO versus traces left by Move α).
So, DS is hardly innocent: it has distinctive rules, which
apply in a certain manner, and produces structures meeting specific structural
and thematic conditions. All of this is non-trivial and coded in
a very linguistically proprietary vocabulary.
One reasonable minimalist ambition is to eliminate DS by showing that
its rules can be unified with those in the rest of the grammar, that its rules
freely mix with other kinds of processes (e.g. movement), and that θ-relations
can be defined without the benefit of a special pre-transformational
level.
This is what early minimalist work did. It showed that it
was possible to unify phrase structure building rules with movement operations,
that both these kinds of processes could interleave and (this is more
controversial) that both structure building rules and movement operations could
discharge θ-obligations.
In other words, that the properties that DS distinguished were not proprietary
to DS when looked at in the right way and so DS did not really exist.
Let’s consider the unification of phrase structure and
movement. X’ theory took a big step in
eliminating phrase structure rules by abandoning the distinction between phrase
structure rules and lexical insertion operations. The argument for doing so is that PS rules
and lexical insertion processes are highly redundant. In particular, a given set of lexical items with specified thematic relations determines which PS rules must apply to generate the right structures to house them. In effect, the
content of the lexical items determines the relevant PS rules. By reconceiving
phrase structure as the projection of lexical information, the distinction
between lexical insertion and PS rules can be eliminated. X’ theory specifies how to project a given lexical item with
given lexical information into a syntactic schema based on its lexical content
instead of first generating the phrase structure and then filtering the
inappropriate options via lexical insertion.
Minimalist theories carry this one step further: they unify the operations that build phrases with the operations that underlie movement. The unification has been
dubbed Merge. What is it? Merge takes
two linguistic items X, Y and puts them together in the simplest imaginable
way. In particular, it just puts them
together: it specifies no order between them, it does not change them in any
way when putting them together and it puts them together in all the ways that
two things can be put together.
Concretely, it takes two linguistic items and forms them into a set.
Let’s see how.
Take the items eat and bagels. ‘Merge (eat, bagels)’ is the operation that forms the set {eat, bagels}. This object, the set, can itself be merged with Noam (i.e. Merge (Noam, {eat, bagels})) to form {Noam, {eat, bagels}}. And we can apply Merge to this too (i.e. Merge (bagels, {Noam, {eat, bagels}})) to get {bagels, {Noam, {eat, bagels}}}. This illustrates
the two possible applications of Merge X,Y: The first two instances apply to
linguistic elements X and Y where neither contains the other. The last applies
to X and Y where one does contain the other (e.g. Y contains X). The first
models PS operations like complementation, the second movement operations like Topicalization. Merge is a very simple rule, arguably (and Chomsky has argued this) the
simplest possible rule that derives an unbounded number of structured objects.
It is recursive and it is information preserving (e.g. like the Projection
Principle was). It unifies phrase building and movement. It also models
movement without the benefit of
traces. Two occurrences of the same element (i.e. the two instances of bagels above) express the movement
dependency. If sufficient, then, Merge is just what we need to simplify the grammar. It simplifies it by uniting phrase building and movement, and it models movement without resorting to very “grammatiky” elements like traces.[4]
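Since Merge is nothing more than set formation, the core idea is easy to render concretely. The following minimal sketch (in Python, purely for illustration; the helper names merge and contains are our own, not anyone's official formalization) puts the two applications of Merge side by side:

    def merge(x, y):
        # Put X and Y together in the simplest imaginable way: an
        # unordered set, with neither input altered in the process.
        return frozenset([x, y])

    def contains(whole, part):
        # True if `part` is `whole` itself or a subterm of it.
        if whole == part:
            return True
        if isinstance(whole, frozenset):
            return any(contains(member, part) for member in whole)
        return False

    # External Merge: neither input contains the other (phrase building).
    vp = merge("eat", "bagels")        # {eat, bagels}
    clause = merge("Noam", vp)         # {Noam, {eat, bagels}}

    # Internal Merge: one input is a subterm of the other (movement).
    assert contains(clause, "bagels")
    topic = merge("bagels", clause)    # {bagels, {Noam, {eat, bagels}}}
    # "bagels" now occurs twice: two copies, no traces.

Nothing in the sketch is language specific: it is just recursive set formation over whatever atoms it is fed, which is precisely the point.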
Merge has other interesting properties, when combined with
plausible generic computational constraints. For example, as noted, the simplest possible
combination operation would leave the combining elements unchanged. Call this
the No Tampering Condition (NTC). The
NTC clearly echoes the GB Projection Principle in being a conservation
principle: objects once created must be conserved. Thus, structure once created
cannot be destroyed. Interestingly, the NTC entails some important grammatical
generalizations that traces and their licensing conditions had been used to
explain before. For example, it is well known that human Gs do not have
lowering rules, like (28) (we use traces to mark whence the movement began):[5]
(28) [ t1 … [ … [β … α1 … ] … ] ]
The structure in (28) depicts the output of an operation that takes α and moves it down, leaving behind an unlicensed trace t1. In GB, the ECP and Principle A filtered such derivations out. However, the NTC suffices to achieve the same end without invoking traces (which, recall, MP aims to eliminate as being too “linguistiky”). How so? Lowering rules violate the NTC. In (28), if α lowers into the structure labeled β, the constituent that is input to the operation (viz. β without α in it) will not be preserved in the output.
This serves to eliminate this possible class of movement operations
without having to resort to traces and their licensing conditions (a good thing
given Minimalist ambitions with regard to Darwin’s Problem).
Similarly, we can derive the fact that when movement occurs the moved expression moves to a position that c-commands the original movement site (another effect derived via the ECP and Principle A in GB). This is illustrated in (29). Recall, movement is just merging a subpart of a phrase marker with another part. So say we want to move α and combine it with the phrase marker β. Then, unless α merges with the root, the movement will violate the NTC. Thus, if Merge obeys the NTC, movement will always be to a c-commanding position. Again a nice result, achieved without the benefit of traces.
(29) [β … [ … α … ] … ]  →  [α [β … [ … α … ] … ]]
In fact, we can go one step further: the NTC plausibly requires the elimination of traces and
their replacement with “copies.” In other words, not only can we replace traces with copies, given the NTC we must do so. The reason is that defining
movement as an operation that leaves behind a “trace” violates the NTC if
strictly interpreted. In (29), for example, were we to replace the lower α on the right of the derivational arrow with a trace (i.e. [e]1), we would not be conserving the input to the derivation in the output. This violates the NTC. Thus, strengthening the Projection Principle to the NTC eliminates the
possibility of traces and requires that we derive earlier trace theoretic
results in other ways. Interestingly, as we have shown, the NTC itself already
prohibits lowering and requires that movement be to a c-commanding position;
two important consequences of the GB theory of trace licensing. Thus, we derive
the same results in a more principled fashion. In case you were wondering, this
is a very nice result.[6]
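The conservation reasoning above can be made concrete too. Continuing the toy sketch from before (again, our own illustrative helpers, not a serious formalization), an operation respects the NTC only if each of its inputs survives, unchanged, as a subterm of its output:

    def respects_ntc(inputs, output):
        # The conservation idea: whatever goes in must come out intact.
        return all(contains(output, x) for x in inputs)

    # Raising (Internal Merge at the root): the whole input tree is
    # conserved, and the re-merged copy c-commands its lower copy for free.
    raised = merge("Noam", clause)                   # {Noam, {Noam, {eat, bagels}}}
    assert respects_ntc(["Noam", clause], raised)

    # "Lowering": tucking Noam inside the VP would rebuild the tree, so
    # the original clause no longer appears intact in the output. NTC violated.
    lowered = merge(merge("Noam", "eat"), "bagels")  # a hand-built lowered output
    assert not respects_ntc([clause], lowered)

Merging at the root is the only way to extend a structure without rebuilding some subpart of it, which is just the c-command result in another guise.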
In sum, not only can we unify movement and phrase building
given a very simple rule like Merge, but arguably the computationally simplest
version of it (viz. one that obeys the NTC, a very generic (non language
specific) cognitive principle regarding computations) will also derive some of
the basic features of movement that traces and their specific licensing
conditions accounted for in GB. In other
words, we get many of the benefits of GB movement theory without their language
specific apparatus (viz. traces).
We can go further still. In the best of all possible worlds,
Merge should be the sole linguistically
specific operation in FL. That means
that the GB relations that the various different modules exploited should all
resolve to a single Merge style dependency. In practice, this means that all
non-local dependencies should resolve to movement dependencies. For example,
binding, control, and case assignment should all be movement dependencies
rather than licensed under the more parochial conditions GB assumed. Once again, we want the GB data to fall out
without the linguistiky GB apparatus.
So, can the modules be unified as expressions of various
kinds of movement dependencies? The outlook is promising. Let’s consider a
couple of illustrative examples to help fix ideas. The idea is that dependencies that in GB are not movement dependencies are now treated as products of movement (which, recall, can be unified with phrase structure under the common operation Merge). So, as a baseline, let’s consider a standard
case of subject to subject raising (A-movement). The contrast in (30) illustrates the well-known fact that raising the subject of a non-finite clause is possible, while raising the subject of a finite clause is not.
(30) a. John1 was believed t1 to be tall
b. *John1 was believed t1 is tall
Now observe that case marking patterns identically in the same contexts (cf. (31)). These are ECM structures, wherein the embedded subject him in (31a) is case licensed by the higher verb believe.[7] On the assumption that believe can case license him iff they form a constituent, we can explain the data in (31) on the model of (30) by assuming that the relevant structure for case licensing is that in (32). Note that where t1 is acceptable in (30) it is also acceptable in (32), and where not, not. In other words, we can unify the two cases as instances of movement.
(31) a. John believes him to be tall
b. *John believes him is tall
(32) a. John [him1 believes [t1 to be tall]]
b. *John [him1 believes [t1 is tall]]
The same approach will serve to unify Control and Reflexivization with movement. (33) illustrates the parallels between Raising and Control. If we assume that PRO is actually the residue of movement (i.e. the grammatical structure of (33c,d) is actually (33e,f)), we can unify the two cases.[8] Note the structural parallels between (30b), (32b), (33b) and (33f).
(33) a. John1 seems t1 to like Mary (Raising)
b. *John1 seems t1 will like Mary
c. John1 expects PRO1 to like Mary (Control)
d. *John1 expects PRO1 will like Mary
e. John1 expects [t1 to like Mary]
f. *John expects [t will like Mary]
The same analysis extends to explain the Reflexivization data in (34) on the assumption that reflexives are the morphological residues of movement.
(34) a. John1 expects himself1 to win
b. John1 expects t1 to win
c. *John expects (t1=)himself will win
d. *John expects t1 will win
These are just illustrations, not full analyses. However, we hope that they serve to make plausible a project that aims to unify phenomena that GB treated as different.
There are various other benefits of unification. Here’s one more: an explanation of the c-command condition on antecedent-anaphor licensing that is part of the GB BT. Recall that, in a Merge based theory, the requirement that a moved expression c-command its launch site is a simple consequence of the NTC. Thus, if Reflexivization is an instance of movement, then the c-command condition packed into BT follows trivially. There are other nice consequences as well, but here is not the place
to go into them. Remember, this is a shortish
Whig History!
Before summing things up, note two features of this
Minimalist line of inquiry. First, it takes the classical effects and laws very seriously. MP approaches build on
prior GG results. Second, it extends the theoretical impulses that drove GB
research. NTC bears more than a passing family resemblance to the Projection
Principle. The radical simplification of Merge continues the process started
with Move α.
The unification of movement, case, reflexivization and control echoes the
unification of movement and binding in GB. The replacement of traces with
copies continues the process of eliminating the cognitive parochialism of
grammatical processes that the elimination of constructions by GB began, as
does the simplification of the T-model by the removal of D-structure and
S-structure as “special” grammatical levels (which, incidentally, is a necessary
step in the unification of the four phenomena above in terms of movement). So
the simplification of rules and derivations, and the unification of the various
dependencies is a well-established theme within GG research, one that
Minimalism is simply further extending. Modern syntax sits squarely on the
empirical and theoretical results of earlier GG research. There is no radical
discontinuity, though there is, one would hope, empirical and theoretical
progress.
4. Conclusion.
As we noted at the outset, one mark of a successful science
is that it is both empirically and theoretically cumulative. Even “revolutions”
in the sciences tend to be conservative in the sense that new theories are (in
part) evaluated by how they explain results from prior theory. Einstein did not discredit Newton. He showed
that Newton’s results were a special case of a more general understanding of
gravity. Quantum mechanics did not overturn classical mechanics but showed that
the latter is a special case of the former (when lots of stuff interacts). This is the mark of a mature discipline. Its earlier discoveries serve as boundary conditions for developing
novelties.
Moreover, this is generally true in several ways. First, a successful field generally has a
budget of “effects” that serve as targets of theoretical explication. Effects
are robust generalizations of (often) manufactured data. By “manufactured” we
mean not generally found in the wild
but the result of careful and deliberate construction. In physics there are many, many of these. Generative Grammar has a nice number of these as well (as
we reviewed in section 2). A nice
feature of effects is that they are relatively immune to shifts in theory. SCO
effects, Complex NP effects, CED effects, Fixed Subject Condition effects, Weak
and Strong Crossover effects etc. are robust phenomena even were there no good theory to explain why they have the properties they do.[9]
This is why effects are good tests for theories.
Groups of effects, aka “laws,” are more theory dependent
than effects but still useful targets of theoretical explanation. Examples of
these in GG are Island conditions (which unify a whole variety of distinct
island effects), Binding conditions, Minimality effects, Locality effects,
etc. As noted, these are more general
versions of the simpler effects noted above and their existence relies on
theoretical unification. Unlike the
simpler effects that compose them, laws are more liable to reinterpretation as
theory progresses for they rely on more theory for their articulation. However,
and this is important, a sign of scientific progress is that these laws are
also generally conserved in later theoretical developments. There may be some tidying up at the edges,
but by and large treating Binding as a unified phenomenon applying to anaphoric
dependencies in general has survived the theoretical shifts from the Standard
theory to GB to MP. So too with the
general observations concerning how movement operations function (viz.
cyclically, no lowering, to c-commanding positions). Good theories, then, conserve prior effects
and tend to conserve prior laws. Indeed,
successful novel theories tend to treat prior theoretical results and laws as
limit cases in the new schema. As our WH
above hopefully illustrates, this is also a characteristic of GG research over
the last 60 years.
Last but not least, novel theories tend to conserve the themes that motivated earlier inquiry.
GB is a very nice theory, which explains a lot.
The shift to Minimalist accounts, we have argued, extends the style of explanation that GB initiated. The focus on
simplification of rules and derivations and the ambition to unify what appear
to be disparate phenomena is not a MP novelty. What is novel (perhaps) is the
scope of the ambition and the readiness to reanalyze grammar specific
constructs (traces, DS, SS, the distinction between phrase structure and
movement, etc.) in more general terms.
But, as we have shown, this impulse is not novel. And, more important still, the ambition has
been made possible by the empirical and theoretical advances that GB
consolidated. This is what happens in
successful inquiry: the results of prior work provide a new set of problems
that novel theory aims to explain without losing the insights of prior theory.
As we’ve argued, GG research has been both very successful
and appropriately conservative. Looked
at in the right way (our WH!), we are where we are because we have been able to
extend and build on earlier results. We
are making haste slowly and deliberately, just as a fruitful scientific program
should. Three cheers for Generative Grammar!!!
[1]
The prettiest possible theory, one that Chomsky advanced in early GB, failed to
hold empirically. The first idea was, effectively, to treat all traces as anaphoric. Though this
worked very well for A-traces, it proved inadequate for A’-traces, which seemed
to function more like R-expressions than anaphors (or at least the “argument”
A’-traces did). A virtue of assimilating A’-traces to R-expressions is that it
led to an explanation of Strong Crossover effects in terms of Principle C. Unfortunately, it failed to explain a range of subject-object and argument-adjunct asymmetries that crystallized as the ECP. These ECP effects led to a whole new set of binding-like conditions
(so-called “antecedent government”) that did not fit particularly comfortably
with other parts of the theory. Indeed, the bulk of GB theory in the last part
of the second epoch consisted in investigations of the ECP and various ways of
trying to explain the subject/object and argument/adjunct effects. Three important ideas came from this work:
first, that the domains relevant for ECP effects are largely identical to those relevant for subjacency effects; second, that ECP effects really do come in two flavors, with the subject-object cases being quite different from the argument-adjunct cases; third, Relativized Minimality. This was an important idea due to Rizzi, and
one that fit very well with later minimalist conceptions. This said, ECP
effects, especially the argument/adjunct asymmetries have proven theoretically
refractory and still remain puzzling, especially in the context of Minimalist
theory.
[2]
By ‘extrinsically’ we mean that the exact point in the derivation at which the
conditions apply is stipulated.
[3]
We distinguish cognitively general from computationally general for there are two
possible sources of relief from GB specificity. Either the
operations/principles are borrowed from other pre-linguistic cognitive domains
or they arise as a general feature of complex computational systems as such.
Chomsky has urged the possibility of the second in various places, suggesting
that these general computational principles might be traced to as yet unknown
physical principles. However, for
research purposes, the important issue lies with the non-linguistic specificity
of the relevant operations and principles, not whether they arise as a result
of general cognition or natural physical law.
[4]
So understood, movement also raises a question that trace theory effectively
answered by stipulation: why are some “copies” phonetically silent? As traces were defined as phonetically empty,
this is not a question that arose within GB. However, given a merge based
conception it becomes important to give a non-stipulative
answer to this question, and lots of interesting theoretical work has tried to
answer it. This is a good example of how pursuit of deeper theory can reveal
explanatory gaps which earlier accounts stipulated away rather than answered.
As should be obvious, this is a very good thing.
[5]
Traces are now being used for purely expository purposes. Minimalist theories
eschew traces, replacing them with copies.
[6]
Before we get too delighted with ourselves, we should note that there are other
trace licensing effects that GB accommodated that are not currently explained
in the more conceptually svelte minimalism.
So for example, there is currently no account for the argument/adjunct
asymmetries that GB spent so much time and effort cataloguing and
explaining.
[7]
This is not quite correct, but it will serve for a Whig History.
[8]
To repeat, we are using trace notation here as a simple convenience. As
indicated above, traces do not exist. What actually occupies the trace positions are copies of the moved expression, as discussed above.
[9]
Which is not to imply that there are none. As of writing, there are interesting
attempts to account for these effects in minimalist terms, some more successful
than others.
Nicely put. I think that a different perception is widespread outside GG, as in Clark & Lappin's (2010 Wiley book p. 8) characterization of the Minimalist Program as "a drastic retreat from the richly articulated, domain specific mechanisms specified in Chomsky's previous theories." The MP is an advance, not a retreat. If it had been a retreat, then some competing theory's solutions would be looking more attractive.
Regarding one of your details: If reflexivization is movement, why doesn't it abide by the CSC? John expects himself and Mary to get along, vs. *Who do you expect Mary and to get along?
Good question. I can think of two lines of attack.
1. Take the reflexives within conjunctions to be hidden pronouns. What I mean is: if you take the complementary distribution with pronouns to be dispositive of reflexives, then one might expect that reflexives within conjuncts are not "real" reflexives. This is not entirely nuts. Consider 'I talked to Sally about Frank and me/myself', 'Sue told Bill about Sheila and her/herself'. These seem quite a bit better than 'I talked to Sally about me' or 'Sue told Bill about her.' If this is right, then…
2. Treat islandhood as a PF effect, even the CSC. If this is right then plausibly gaps are necessary for island effects. This, as you know, is not a novel idea. Maybe the fact that reflexives are A-movement with the copy pronounced shields them from island effects.
That's the best I can do right now. Good question. Research topic?
Another possibility would be to suppose that A-movement isn't subject to the CSC. (We touched on this once before.) It's hard to find very strong arguments one way or the other, but there are some examples (from the link above) where A-movement out of conjuncts doesn't seem particularly bad:
(1) It seems to rain and that Norbert is handsome.
(2) John expected Bill to win the Preakness and that Frank would win the Kentucky Derby.
For the second example let's assume a raising-to-object type analysis.
Neither is perfect but perhaps this is because of the conjunction of finite and non-finite clauses.
I often wonder how seriously we are supposed to take the NTC. Feature Inheritance (Richards 2007, Chomsky 2013) and Agree with feature valuation/checking with deletion require that 'tampering' be permitted. Unless of course we say that No Tampering is a condition on Merge, and that Agree, being an entirely separate operation, is not constrained by it. We still need to say something about what can and cannot be tampered with, and what kind of consequences that would have, because presumably having quite an articulated feature system with tampering permitted could end up allowing us to derive the kinds of things that we wanted to rule out with the NTC in the first place. On the other hand, some kind of tampering seems to be entirely necessary to get feature bundles to interact in any kind of meaningful way.
Any thoughts on this?
I think I've had similar qualms expressed on FoL about feature valuation and NTC. Others tried to convince me not to worry. I still do. So, the real problem you point to is that within current minimalist theory there are actually two operations with different kinds of properties. The first is Merge, which subsumes structure building and movement. The second is Agree. Merge is relatively well behaved given Chomsky's plausible criteria. So that's the poster child when talking about minimalist successes. The second, IMO, is a bit of a dog's breakfast of assumptions (note the IMO here). I really don't like Agree or feature checking or valuation or… Not only does it introduce huge redundancies into the theory if treated as a Probe-Goal relation, but as you note it sits ill with other nice properties. Now, one can simply distinguish these agree phenomena from the other parts of the syntax. But that's not a great idea IMO. We still want to know why and where this stuff came from if Darwin's Problem is one you are interested in. A more radical thought is to try and reconceptualize feature checking along the lines of theta theory. Recall, the standard view is that there are no theta roles. The latter are just interpretive reflexes of grammatical relations. Why not extend the same idea to case and agreement? Why invidiously distinguish such features from theta features? This would shunt much of these phenomena to the PF side of the ledger, but is it clear that this is a bad place for them? At any rate, let me admit that I feel as you do about these and think that this requires us to think harder about features and the role they play in the grammar. Are they central? Are they mere morpho-phonological titivation? Dunno. Again: research topic?
DeleteThere are good reasons, I think, why (the set of grammatical processes/relations covered by) Agree cannot be relocated wholesale to PF. One that I am partial to is the following: if a language has morphologically expressed finite verb agreement at all, then either (i) the only DPs that can move to subject position are those that have been agreed with, or (ii) any DP, bearing any case, can move to subject position. What we don't find are languages where some proper subset of the set of all DPs (e.g. nominatives and accusatives, but not datives) can move to subject position, but that subset does not align with the subset of noun phrases that can control agreement. I've taken this to indicate that, in "type (i)" languages, agreement feeds movement to subject position. Since the latter has LF-consequences (scope), agreement cannot occur "at PF" and still stand in the relevant feeding relation.
DeleteNow, there are several nuances to consider. One is the separation between what Arregi & Nevins call Agree-Link (the probe ascertaining which DP it is going to enter into a feature relationship with) and what they call Agree-Copy (the actual copying of morphological features from the DP to the probe). I think there are really good reasons to think Agree-Copy is "at PF"; so the paragraph above, recast through the prism of the Agree-Link/Agree-Copy division, is strictly about Agree-Link.
Second, it is worth pondering how the generalization in the first paragraph would shake out in a system where there was no (long-distance) Agree, only valuation under (re)merge (with PF and LF each free to privilege the lower copy). If what I've been calling "moves to subject position" in the preceding paragraphs is just "the subset of chains stopping in [Spec,TP] in which PF pronounces the higher copy", then a PF-only conception of Agree could still be responsible for the aforementioned generalization: imagine everything (i.e., every DP in the clause/phase/domain/whatever) moves to [Spec,TP], but the only chains that receive higher-copy pronunciation are those where, at PF, agreement also obtains. But here's the problem: of all of these everything-moves-to-[Spec,TP] chains, only the one that is an agreement chain (and receives higher-copy pronunciation) behaves, scopally, as a subject. On this view, that is a mystery: the question of which of these chains will get to be the agreement-controlling (and hence, higher-copy pronounced) chain is determined after the PF-LF split.
Nice problem. This looks like the old question of how to coordinate case at LF with overt morphological differences that was discussed in the very first minimalist paper leading to case CHECKING. But I'm sure there is more to it. Thx for the puzzle.
DeleteThanks, I've enjoyed reading this series, it's helped me see the logic behind how GG has developed in this tradition.
But I wonder about some of the claims of ‘simplicity’ made within the MP. Just one example: while I agree that the elimination of traces is welcome from the point of view of the NTC (and Inclusiveness and other Good Things), it surely makes things much less simple at the interfaces. Given the copy theory of movement, at both PF and LF interfaces you need some mechanism that can tell, for any constituent, whether or not that constituent has a copy elsewhere in the structure and, if so, whether it is the lower or the higher copy.
I'm not sure that it does make things less simple at the interfaces. Recall that trace theory needed a theory of reconstruction as well. So traces were not bare even in GB. As Chomsky has noted, one of the attractive features of the Copy Theory is that it makes reconstruction less mysterious. What is required on any theory is a mapping between parts of the chain and their CI roles. Neither traces nor copies ARE variables; though they can map to them.
DeleteTwo last points: Yes one needs to make chains. But I think that this is so in all theories. It is not hard to "tell" whether a copy alone is kosher. It needs a case and a theta role. So chains must be reconstructed. It is easier with traces as these are indexed. One could, of course, index copies and so the problem equalizes, I think.
Last: what makes the elimination of traces nice is not really the NTC, it plausibly follows from this. Rather it is that traces are theory internal objects with special properties that seem very linguisticky. Theory internal "special" formatives are conceptually undesirable seen from a DP perspective. Thus, eliminating them with something less special is a good idea. The NTC suggests a way of doing this: just let an expression assume multiple roles in a derivation (i.e. allow it to have more than a single property). That seems conceptually anodyne enough.
What is the justification for the NTC on this whig analysis?
DeleteIt used to be I thought a computational simplicity argument, but I guess the alternative is an evolutionary simplicity argument; i.e. systems that satisfy the NTC are more likely to evolve.
My justification or Chomsky's? For the latter it defines the "simplest" imaginable combination operation. One that fails to preserve this more "complex." You rightly ask in what does the simplicity reside and you enumerate the two options.
What I find interesting is that there is a more historical "justification," if that is the right term. It is simply a generalization of the projection principle, which gave us trace theory. This version delivers what trace theory did (and so shares its conceptual and empirical virtues) and also lays the groundwork for a theory of reconstruction (again, to the degree that connectedness effects hold, this is empirical justification for the Copy Theory and hence the NTC from which the copy theory follows).
It would be nice to find some other virtues by unpacking one of the two routes you mention. Monotonicity is not unknown as a feature of computational systems and NTC delivers that. But what's monotonicity good for? I don't know. Do you? And as for evolvability, we know so little about this that it is hard to take this seriously. But that's the current state of the art, IMO.
It's not just that chains need to be made, but also that the interface mapping has to be able to tell whether it's dealing with the head or foot of a chain. In the GB setup this can be kept a completely local matter: if you're looking at a trace, then you're not at the head of a chain! But given copy theory, if the interface mapping sees e.g. John_1, it's going to have to look at some larger chunk of structure to tell whether this John_1 is the head John_1 or the foot John_1. That seems less simple.
[Interlude: fair point about the elimination of traces being conceptually (if not always practically) separable from the elimination of indices.]
Now the point about reconstruction, I take it, is that just knowing that it's looking at a trace isn't sufficient for the mapping to know what to actually do—so maybe the mapping has to look at a larger chunk of structure either way. But I think the force of this point depends on what your favoured theory of reconstruction phenomena is. There have been theories according to which traces always map to variables, and the needed flexibility comes from elsewhere.
None of this is meant to argue that the copy theory isn't better *overall*. But I think the gain in simplicity might be overstated.
These are fair points. But several observations. Traces do what they need to do by stipulation. They are designed to have these nice properties and so, all things being equal, we should see if these problems can be overcome in a more principled way. By assuming the copy theory we are motivated to look for more principled answers. Second, not all traces/copies get converted into variables, or at least not apparently. Think of ones in A' positions. Worse, some of these are reconstruction sites. So it is not only the head and foot of the chain that are at issue. Third, some of the problems you note re copies might be finessed if copies are distinguished wrt their feature structures. Thus, most copies will not have a complement of theta role and case feature and A'-feature. This may give one a handle on whether some copy is a head or foot of a chain. Those without case are at the foot, for example. Last, the non-syntactic theories of reconstruction might be correct, but they follow from very little. The nice feature of the copy theory is that it invites reconstruction effects, given that an expression is "in" several positions at once and thus might be expected to exercise powers of the positions it occupies. We can mimic this with various non-syntactic theories (and in trace theory) but it's not clear why these effects should hold on these views. I take this to be the main advantage of a copy theory empirically: not that it can do reconstruction but that it makes sense on such a theory.
DeleteThat said, we should not oversell the CTM. It's just nice to see how much mileage one can get from a move that plausibly simplifies UG by cleaning out linguistic specific machinery.
Thx for your comments. Made me think.