1.
Successive Cyclic Movement and Criterial Agreement
Ok, back to the main text: Chomsky next turns to the most interesting cases:
labeling in cases where a and b are “XPs.”[1] So say we had something like {{a,b}, {c,{d…{e…}}}}.
An example would be any “spec-XP” configuration (but don’t use this terminology
in the vicinity of Chomsky because he will immediately tell you that Specs
don’t exist; cf. the funny interchange between David P and Chomsky over this in
the last 30 minutes of the lecture). Here the MLA cannot function given that
there is no single most prominent
atom within the set. In the above, a, b, and c are equally prominent.
Chomsky puts this inability to label to very good use: he uses it to
motivate successive cyclic movement and (Rizzi’s) Criterial Agreement
Principle. Let’s consider these in turn.
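To make the labeling failure concrete, here is a toy sketch in Python (my own rendering, not Chomsky’s formalism): syntactic objects are bare sets, and the MLA is a minimal search for the unique shallowest lexical atom. In a {H, XP} configuration the head wins; in {XP, YP} the search ties and no label is returned. All the names here (`mla`, `closest_atoms`) are mine.

```python
# A toy sketch of the minimal labeling algorithm (MLA) over bare set
# structures. Atoms are strings; complex syntactic objects are frozensets.
# This is an illustration of the idea, not anything from the lectures.

def closest_atoms(so, depth=0):
    """Return (depth, [atoms]) for the shallowest atoms in a syntactic object."""
    if isinstance(so, str):                     # a lexical atom (head)
        return depth, [so]
    best = None
    for member in so:
        d, atoms = closest_atoms(member, depth + 1)
        if best is None or d < best[0]:
            best = (d, list(atoms))
        elif d == best[0]:
            best[1].extend(atoms)
    return best

def mla(so):
    """Label = the unique most prominent atom; None if search is ambiguous."""
    _, atoms = closest_atoms(so)
    return atoms[0] if len(atoms) == 1 else None

# {H, XP}: the head is the unique shallowest atom, so it labels.
print(mla(frozenset({"v", frozenset({"eat", "pizza"})})))   # -> v

# {XP, YP}: several equally prominent atoms; labeling fails.
xp = frozenset({"which", "man"})
yp = frozenset({"T", frozenset({"eat", "pizza"})})
print(mla(frozenset({xp, yp})))                             # -> None
```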
First
successive cyclic movement: One of the ways of evading the impotence of the MLA
when inspecting a {XP,YP} configuration is to move one of the two. Say XP is a DP and YP is a non-finite “T’.”
Then were XP to move, and if this
rendered the copy/occurrence inside the set “invisible” (we return to this),
then there would be a unique most prominent atom (e.g. c in {{a,b}, {c,{d…{e…}}}}).
Note that this assumes that the tail of the movement chain is
unavailable for labeling. Indeed, lower
copies are generally invisible in CS (Chomsky discusses the Icelandic data that
illustrates that tails of chains are not potential interveners). Chomsky notes,
however, that this is not the way we
want to describe matters, for it sounds too traceish. After all, if these are
all copies, then why make invidious distinctions between any of the
occurrences? Isn’t this just sneaking
trace licensing considerations into FL under another name? I think that Chomsky thinks that it would be.
In place of this, Chomsky thinks that we should consider the occurrences as one
big discontinuous element (question:
does this reify the notion of a
chain? Are chains now objects in the
domain of syntactic operations? Is this
a violation of Inclusiveness or NTC? I don’t know, but for some possibly
relevant discussion see the last Chapter of LGB.)
He then proposes that MLA (and maybe all rules of G) must apply to expressions
in the same domains. More particularly, we have something like (1):
(1) a is in domain D iff every
copy of a is in D
So
consider a case of movement, say moving from “Spec TP” to “Spec CP” (for the
reason for the scare quotes ask David Pesetsky), as in (2):
(2) {{which, man}, {C, {{which, man}, {T0…}}}}
Consider labeling the embedded structure {{which, man}, {T0…}}. If {which, man} is not in the relevant
labeling domain (I return to this), then T0
is the single most prominent atom and it will be chosen as the label.
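A sketch of how (1) would break the tie: if the lower copy of {which, man} is treated as invisible, i.e. outside the labeling domain, minimal search over the embedded structure in (2) finds T0 as the unique closest atom. Representing invisibility as an explicit set of skipped objects is my simplification, not anything in the lectures.

```python
# Sketch: minimal search that skips "invisible" objects (lower copies
# outside the labeling domain). The explicit 'invisible' set is my device.

def closest_visible_atoms(so, invisible, depth=0):
    """Like minimal search, but lower copies listed in 'invisible' are skipped."""
    if so in invisible:
        return None                              # a lower copy: ignore it
    if isinstance(so, str):                      # a lexical atom
        return depth, [so]
    best = None
    for member in so:
        found = closest_visible_atoms(member, invisible, depth + 1)
        if found is None:
            continue
        d, atoms = found
        if best is None or d < best[0]:
            best = (d, list(atoms))
        elif d == best[0]:
            best[1].extend(atoms)
    return best

wh = frozenset({"which", "man"})
tp = frozenset({wh, frozenset({"T", frozenset({"left"})})})

# With the lower copy of {which, man} invisible, T is the unique closest atom.
print(closest_visible_atoms(tp, invisible={wh}))    # -> (2, ['T'])
```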
So, what’s the definition of a domain? Chomsky does not say in the lectures (or so
far as I can tell, anywhere else). Here are some things it cannot be if we are
to get the facts right. The domain for MLA cannot be the phase, for this
includes all of (2) and then both
occurrences of {which, man} are in
the same domain and so the MLA cannot apply unambiguously. Say we interpret
‘being in the same domain’ to mean ‘in the same set.’ This won’t serve either, for in that case,
whenever I-merge has applied, the “head” of the chain and the “foot” will never
be in the same domain, as I-merging will always result in copies being elements
of different sets. This would imply
that the head of the chain is never
“visible.” This, note, is not necessarily a terrible conclusion given the
observation that it is usually the target of I-merge that projects (T0
in this case). However, I doubt that Chomsky will like this for it will not force successive cyclic movement,
something that he wants to take place as a by-product of what is required for
the MLA to unambiguously label a set at Transfer.
Let me clarify: Chomsky wants successive cyclicity to follow
from MLA plus the idea that movement makes an expression “invisible” by putting
copies in separate domains. What we are looking for is a notion of ‘domain’
that serves this end. Domains cannot be
phases for all copies will be in the same phase and so the MLA won’t be able to
apply. Nor can Domains be sets for I-merge will always result in copies being
members of different sets (and this means that the head of the chain need never
move out of “Spec” to be invisible). This means that we need a notion of domain
that is not the same as the pair of notions that are motivated by the
conceptual bare minimum (i.e. either the minimal operation (this gives us sets)
or minimal computation (this gives us phases)). We need an additional notion. In fact we need something roughly analogous to
whatever the eventual label “dominates” or whatever is “part of” the labeled
structure. And for this we would need a recursive definition of ‘part-of’ if we
are interested in defining ‘domain’ using sets, which, remember, is the only
structure we have given Merge.[2]
That sets are all there is to FL structure is emphasized by
Chomsky in his discussion of the perils of using phrase marker trees. See his
discussion of trees with labels at the nodes and how some of this is not
interpretable in a Merge based system (around 19-21 minutes). He notes, quite
rightly, that many “minimalist” theories on offer (multidimensional theories
that allow e.g. parallel merge) are not representable in the more austere set
notation.[3] But, then, as we see, it is not clear how to
define ‘domain’ in this more austere theory either, at least if all one can use
are set theoretic notions. At the very
least, the definition of ‘domain’ requires something like the ancestral
relation to define it (an inductive definition) or introduction of the notion
of a lattice with a part-whole relation defined over it.[4] This is not
a trivial addition to the basic inventory of primitive concepts and, from where
I sit, it would seem to require more than Merge to sneak it into the inventory
of basic notions of FL. Note that defining ‘domain’ is trivial if we treat
every occurrence independently of its copies (i.e. we forgo syntactically
quantifying over chains). Then we can define ‘domain of a’ as all elements that are members of the same set as a. However, this won’t serve Chomsky’s
purposes, for moving something out of the Specifier will not thereby render its
tail invisible.
This said, let’s assume there is a reasonable definition of
domain that puts higher copies outside the relevant domain and lower copies
within it (see note 4). This will then force movement of elements in “Spec
positions.”
Observe that “Spec” is doing some work here (i.e. it goes
beyond notational convenience). To see this, consider an example like (3a)
where we raise the DP John and VP
front the embedded VP. This is not a great sentence. But compare it with what
occurs when we raise the embedded T’ instead, leaving John below, as in (3b). This is word salad. But why? Why can’t we break the {XP, YP} symmetry by
moving the “T’” instead of the XP? Note,
we cannot say that this is because X’s don’t move for the whole tenor of
Chomsky’s discussion is that projection is not part of the syntax. We might
claim that these are not interpretable at CI or SM for some reason, but it’s
not really clear why this should be so, IMO.
Why after all is moving the VP interpretable but moving the non-finite
T’ uninterpretable? Another option is to attribute the lousy status of (3b) to
the “case filter” or some MP analogue thereof. But doing this removes the need
to use the MLA to explain successive cyclic DP movement.[5]
Case considerations, which appear to be independently needed, suffice. Note that prohibiting such movement is part
of what X’ theory tried to explain.[6]
At any rate, here it is sufficient to just note a potential puzzle: why don’t
X’s move and why can’t they ever serve to break an unwanted symmetry?[7]
(3)
a. Eat pizza, John never appears to
b. To eat pizza, it never appears John to
Ok, let’s keep moving: Say we can break the {XP,YP} symmetry
by moving. The next question is why/how do we ever stop moving. After all, Gs do generate {XP,YP} configurations.[8]
Here’s Chomsky’s second idea: Agreement provides another
mechanism for getting around the labeling ambiguity problem. Informally, if XP
values some uF of YP then we can treat F shared between XP and YP as the unique
head of {XP,YP}.[9]
Thus, feature agreement provides a way out of the ambiguity problem by
providing a unique feature for the MLA to choose, one that both XP and YP
share.[10]
Chomsky notes that it is not enough
that these two features happen to be the same; they need to have been made the
same by a rule of agreement (i.e. the features were co-valued by Agree). Note that this requires that the MLA apply at the
phase level, for it’s only within a phase that CS “remembers” whether two
features are inherently the same or are the same because of a valuation
operation.
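Informally, then, the agreement route can be sketched like this (the function name and the explicit `agreed` record are mine, standing in for the phase-level memory just described): a shared feature labels only if it was co-valued by Agree within the current phase.

```python
# A hedged sketch of labeling-by-agreement: if the heads of XP and YP share
# a feature that was co-valued by Agree this phase, that shared feature can
# serve as the label of {XP, YP}. Mere accidental identity does not suffice.

def label_by_agreement(xp_head_feats, yp_head_feats, agreed):
    """Return the shared feature only if Agree co-valued it this phase."""
    shared = set(xp_head_feats) & set(yp_head_feats)
    co_valued = [f for f in shared if f in agreed]
    return co_valued[0] if len(co_valued) == 1 else None

# A WH phrase in "Spec CP": WH and C share Q, co-valued by Agree -> label Q.
print(label_by_agreement({"Q", "wh"}, {"Q"}, agreed={"Q"}))      # -> Q

# Features that merely happen to match do not label.
print(label_by_agreement({"Q", "wh"}, {"Q"}, agreed=set()))      # -> None
```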
The agreement route to labeling combines with the above
earlier discussion to derive successive cyclic movement: it applies until a
moving XP hits a criterial position where agreement can license a label via the
MLA.
Interestingly, this suggests that the input to MLA is the
whole phase and not simply the unit that is actually transferred (viz. the
complement of the phase head). Why? Consider a WH in “Spec CP.” If WH values
C’s +Q features (or vice versa)[11]
and if Transfer applies to the complement of the phase head and if MLA is part
of Transfer then the MLA will not be able to “see” whether the features are
products of Agreement, for their actual transfer to the interfaces will only
occur at the next phase up. But by then
we cannot tell if the features just “happen” to be the same or were co-valued
via Agree. So, if simple feature
identity is not enough, then either MLA is not part of Transfer or Transfer
need not actually transfer what it applies to.
Ok: so we get successive cyclic movement via the two ways to
disambiguate the application of the MLA. Effectively, in the absence of feature
agreement, movement is required.
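Putting the two escape hatches together, the stopping condition can be caricatured as a loop (the caricature, and all names in it, are wholly mine): an XP keeps raising through {XP, YP} configurations until it lands in one where a co-valued shared feature supplies a label.

```python
# A caricature of the derived successive-cyclic pattern: a WH keeps raising
# until it reaches a criterial position where agreement licenses a label.

def stops_moving(xp_feats, landing_feats, agreed):
    """An XP can stop iff agreement supplies a unique co-valued shared label."""
    shared = {f for f in xp_feats & landing_feats if f in agreed}
    return len(shared) == 1

def derive(path, xp_feats, agreed):
    """Walk a WH up a list of (site, features) landing sites; report where it halts."""
    for site_name, site_feats in path:
        if stops_moving(xp_feats, site_feats, agreed):
            return site_name
    return None  # no criterial position found: movement never terminates

# The WH passes through intermediate positions and halts at the +Q C,
# where its Q feature has been co-valued by Agree.
path = [("Spec vP", set()),
        ("Spec CP (embedded)", set()),
        ("Spec CP (+Q)", {"Q"})]
print(derive(path, {"Q", "wh"}, agreed={"Q"}))   # -> Spec CP (+Q)
```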
Oddly, this “derivation” of the cycle is not all that
different from a more traditional one (e.g. Boskovic’s). Say we accept
Chomsky’s assumption that the moving WH is the item with the unvalued feature
and say that we take the configuration of feature checking to be {XP,YP}
structures (i.e. the old Spec-head configuration by another designation) then
the WH will keep moving until it can get to a valued position on pain of
crashing the derivation. This is effectively what Chomsky’s technology gives
us. The main difference is the claim that this follows from minimal assumptions
regarding labeling. However, it is not clear to me that the actual technology
is all that different: both require a valuation relation between Wh and C(‘)
(or DP and T(‘)). Both allow the movement to stop when this agreement takes
place. So both require that agreement/valuation be part of the explanatory mix.
The critical question, then, is whether the MLA allows us to
dispense with things like the case filter or the requirement that unvalued
features be valued before Transfer. If we need to assume this anyhow, then it
seems that the MLA account of successive cyclic movement is theoretically
redundant, and that would be a good argument against it. So, let’s assume that this is not the case.
Indeed, if the MLA could allow us to dispense with the whole valued/unvalued
features schtick, that would be one great
argument in its favor. However, from what I can tell, Chomsky wants feature
valuation to motivate phases. Still, if
we assume that only phase heads have unvalued features (contra Chomsky’s claims
in the lecture concerning moving WHs) then the above proposal that elements
move in order to get unvalued features checked cannot work, though something
like the MLA would do the trick. The problem, of course, is that unless the
moving Wh has the unvalued feature, it is completely unclear why it should ever
move from its base position to a +Q-C that is arbitrarily far away. So, it
looks like everyone needs to assume that WHs must move to Specs in virtue of their unvalued features (and this seems to
also suggest that they move there because that’s where valuation takes place).
This then renders the MLA nugatory. I’d love to hear what others think of these
considerations.
My thoughts to this point: The minimalist conceit (at
least for Chomsky) has been to look for the conceptually simplest operations,
because properly understood they will give us what we observe. This worked well
for Merge wrt structured hierarchy and movement. But it does not deliver
labels. So why do we need labels? They are BOCs necessary for interpretation.
But the MLA by itself can’t do that
much. It encounters a bunch of environments where it cannot apply. To allow it
to apply, FL finds different workarounds. But conceptually, why does FL need to
find ways to get around labeling problems when they arise? Why doesn’t it just
stick with the simplest assumptions and just fail to interface where these
problems arise? Let me put this another way. The facts are not explained simply
by the MLA, but the MLA plus the
various ways of evading its impotence (e.g. roots don’t label, tails of chains
don’t count, feature agreement gives us unique labels, T is special (you’ll
see) etc.). So it’s not only the minimal algorithm that is doing the work, but
the (dare I say it) special assumptions concerning everything else that allows
it to work. These “initial conditions” need to get into FL too, and they don’t
look conceptually simple, very domain general, or the trivial results of
efficient computation (Physics rightly construed) to me. So the question is not
only whether the MLA is simple or minimal but whether these other prerequisites
to its proper operation are as well. This said, whether it is minimal or not, it is an interesting theory aiming to
explain deep facts about FL and so we continue our discussion in the next
installment.
[1]
All scare quotes indicate that the enclosed are being used purely for purposes
of exposition.
[2]
It’s interesting to note that the indicated complication arises from treating
Chains as grammatical units. This suggests that understanding what chains are
(“emergent” objects?) and how they function grammatically would be
theoretically useful. For example, there would be nothing unnatural or complex
in assuming that CS treats each occurrence singly rather than as part of a
discontinuous object. This assumption is consistent with treating the
occurrences “together” at CI (e.g. to form operator-variable structures). How
many arguments do we have for assuming that the CS worries about chains rather than
occurrences? There are some intervention effects and the MLA. Any others?
[3]
Try it. What’s the analogue of [a, (b], c) (a parallel merge structure in
which b is both a member of the unit [a, b] and a member of the unit (b, c))? The
set theoretic version of this would be not parallel merge but {a,b} {b,c}. It
is not clear to me that this structure would deliver what is desired.
[4]
Something like: (i) if a is a member of D then a is ‘part of’ D; (ii) if a is
part of D’ and D’ is a member of D then a is part of D. Then we can say that a
is in the domain of D if a is part of D.
Another
way of thinking of this is that the domain is a lattice with D being the
supremum with all constituents being members of the lattice. Thus a and b are
in domain D iff both a and b are parts of D. Think plural individuals with
lexical atoms at the bottom of the lattice related by the +-in-a-circle
operation. Note, this goes beyond the standard minimal set theoretical
conceptions and if you are squeamish about ordered pairs for being too complex,
then this should really set you twitching.
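The recursive definition in (i)-(ii) of this note is easy to state over sets; here is a sketch (function names are mine):

```python
# A sketch of the recursive 'part-of' relation from note 4, with the domain
# of D as everything that is part of D (the "lattice" below D).

def part_of(a, D):
    """(i) members of D are parts of D; (ii) parts of members of D are parts of D."""
    if isinstance(D, frozenset):
        for member in D:
            if member == a or part_of(a, member):
                return True
    return False

def domain(D):
    """Collect every syntactic object that is part of D."""
    parts = set()
    if isinstance(D, frozenset):
        for member in D:
            parts.add(member)
            parts |= domain(member)
    return parts

# b is part of {a, {b, c}} via clause (ii), since it is a member of a member.
D = frozenset({"a", frozenset({"b", "c"})})
print(part_of("b", D))   # -> True
```

Note that `part_of` is exactly the ancestral of membership mentioned in the main text: the inductive step (ii) is what plain set-theoretic membership alone does not give you.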
[7]
Note 2 in the first part provided more relevant evidence about the inertness of
X’.
[8]
Note, conceptually, this is hardly necessary. A very strong version of the MLA
would prohibit {XP, YP} structures altogether. True, this would only allow very
simple clauses, effectively only unaccusatives, but this is hardly an
incoherent possibility. Is this conceptually
simpler than allowing various ways of eluding the strictures of the MLA? I
dunno. Chomsky’s efforts to evade the
MLA’s strictures are interesting. But, at least to me, they hardly seem
conceptually minimal. But hey, I wear stripes and checks together so my taste
sucks! One might argue however, that the MLA is the minimal operation given that we have structures of the
kind that we do. However, this does not explain why we have such structures.
The why might come from something
like a principle of effability: maximize Gs contact with CI (i.e. if you can
think it, then you should be able to say it). But this seems like an odd
principle given Chomsky’s observations concerning efficient communication and
its relation to minimal computation.
[9]
This is informal, for I don’t know whether this involves a novel agreement
operation or is just the reflex of the standard Probe/Goal technology. If the
latter, then it is not XP that values the uF of YP but X0 valuing
the uF of Y0.
[10]
Note that this possibly implies that the problem is not really one about
minimal search but one about
deterministic decision given a search.
If agreement information lives on heads, then MLA must be able to see both X0
and Y0 to see that they agree. So it can search both XP and YP.
Where it fails is in being unable to “decide” which is the label as that
decision is ambiguous except when the label of both is the same. That’s what
agreement provides.
[11]
In the lecture, Chomsky wants the moved WH to have the unvalued features and
have +Q on C value them. I am not sure
why he wants this (though see discussion in text). However, one implication is
that not all unvalued features are on phase heads, contrary to what he suggests
in the second lecture. Moreover, it
seems inconsistent with the standard Probe-Goal architecture in which an
unvalued feature searches for something to value it. I am not sure what to make of this actually. Lecture
3 does not say much about Probes and Goals. It even leaves the suggestion that
valuation takes place between XP and YP rather than X0 and Y0.
There is no reason why the feature that becomes the label need sit on a head,
though that is the standard assumption. But if it does need to do so, and if
agreement is a relation between heads, then it suggests that if Agree is via
probe/goal then WH values C not the other way around.