Friday, July 4, 2014

Comments on lecture 3 – part II

This is the second part of several comments on Chomsky’s third lecture (here). First part is here.

1.     Successive Cyclic Movement and Criterial Agreement

Ok back to the main text: Chomsky next turns to the most interesting cases: labeling in cases where a and b are “XPs.”[1]  So say we had something like {{a,b}, {c,{d…{e…}}}}. An example would be any “spec-XP” configuration (but don’t use this terminology in the vicinity of Chomsky because he will immediately tell you that Specs don’t exist (cf. the funny interchange with David P and Chomsky over this in the last 30 minutes of the lecture)). Here the MLA cannot function given that there is no single most prominent atom within the set.  In the above, a, b, and c are equally prominent.  Chomsky puts this inability to label to very good use: he uses it to motivate successive cyclic movement and (Rizzi’s) Criterial Agreement Principle. Let’s consider these in turn.
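To fix ideas, here is a toy sketch (mine, not Chomsky’s) of the MLA as minimal search over bare set-theoretic objects. Atoms are modeled as strings and phrases as frozensets; the representation choices are assumptions for illustration only, and the head-head case is left aside:

```python
def is_atom(x):
    # Lexical atoms are modeled as strings; anything else is a set-theoretic phrase.
    return isinstance(x, str)

def mla(so):
    """Toy MLA over a two-membered set: return the unique most prominent
    atom, or None when minimal search is ambiguous (the {XP, YP} case)."""
    a, b = so                                   # unpack the two members
    heads = [x for x in (a, b) if is_atom(x)]
    if len(heads) == 1:
        return heads[0]                         # {H, XP}: H labels
    return None                                 # {XP, YP}: no unique label

# {T, {v, ...}}: T is the unique atom, so T labels.
HP = frozenset({"T", frozenset({"v", "eat"})})
assert mla(HP) == "T"

# {{which, man}, {T, ...}}: both members are sets, so labeling fails.
XP_YP = frozenset({frozenset({"which", "man"}), HP})
assert mla(XP_YP) is None
```

The failure on `XP_YP` is the ambiguity the rest of the discussion turns on: nothing in the set is more prominent than anything else, so the search returns no label.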

First, successive cyclic movement: One of the ways of evading the impotence of the MLA when inspecting a {XP,YP} configuration is to move one of the two.  Say XP is a DP and YP is a non-finite “T’.” Then, were XP to move, and if this rendered the copy/occurrence inside the set “invisible” (we return to this), then there would be a unique most prominent atom (e.g. c in {{a,b}, {c,{d…{e…}}}}).  Note that this assumes that the tail of the movement chain is unavailable for labeling.  Indeed, lower copies are generally invisible in CS (Chomsky discusses the Icelandic data that illustrate that tails of chains are not potential interveners). Chomsky notes, however, that this is not the way we want to describe matters, for it sounds too traceish. After all, if these are all copies, then why make invidious distinctions between any of the occurrences?  Isn’t this just sneaking trace licensing considerations into FL under another name?  I think that Chomsky thinks that it would be. In place of this, Chomsky thinks that we should consider the occurrences as one big discontinuous element (question: does this reify the notion of a chain?  Are chains now objects in the domain of syntactic operations?  Is this a violation of Inclusiveness or NTC? I don’t know, but for some possibly relevant discussion see the last chapter of LGB.) He then proposes that the MLA (and maybe all rules of G) must apply to expressions in the same domains. More particularly, we have something like (1):
(1)  a is in domain D iff every copy of a is in D
So consider a case of movement, say moving from “Spec TP” to “Spec CP” (for the reason for scare quotes ask David Pesetsky), as in (2):
                        (2) {{which, man}, {C, {{which, man}, {T0 …}}}}
Consider labeling the embedded structure {{which, man}, {T0 …}}. If {which, man} is not in the relevant labeling domain (I return to this), then T0 is the single prominent atom and it will be chosen as the label.

So, what’s the definition of a domain?  Chomsky does not say in the lectures (or, so far as I can tell, anywhere else). Here are some things it cannot be if we are to get the facts right. The domain for the MLA cannot be the phase, for this includes all of (2), and then both occurrences of {which, man} are in the same domain and so the MLA cannot apply unambiguously. Say we interpret ‘being in the same domain’ to mean ‘in the same set.’  This won’t serve either, for in that case, whenever I-merge has applied, the “head” of the chain and the “foot” will never be in the same domain, as I-merging will always result in copies being elements of different sets. This would imply that the head of the chain is never “visible.” This, note, is not necessarily a terrible conclusion given the observation that it is usually the target of I-merge that projects (T0 in this case). However, I doubt that Chomsky will like this, for it will not force successive cyclic movement, something that he wants to take place as a by-product of what is required for the MLA to unambiguously label a set at Transfer.

Let me clarify: Chomsky wants successive cyclicity to follow from MLA plus the idea that movement makes an expression “invisible” by putting copies in separate domains. What we are looking for is a notion of ‘domain’ that serves this end.  Domains cannot be phases for all copies will be in the same phase and so the MLA won’t be able to apply. Nor can Domains be sets for I-merge will always result in copies being members of different sets (and this means that the head of the chain need never move out of “Spec” to be invisible). This means that we need a notion of domain that is not the same as the pair of notions that are motivated by the conceptual bare minimum (i.e. either the minimal operation (this gives us sets) or minimal computation (this gives us phases)). We need an additional notion. In fact we need something roughly analogous to whatever the eventual label “dominates” or whatever is “part of” the labeled structure. And for this we would need a recursive definition of ‘part-of’ if we are interested in defining ‘domain’ using sets, which, remember, is the only structure we have given Merge.[2]

That sets are all there is to FL structure is emphasized by Chomsky in his discussion of the perils of using phrase marker trees. See his discussion of trees with labels at the nodes and how some of this is not interpretable in a Merge-based system (around 19-21 minutes). He notes, quite rightly, that many “minimalist” theories on offer (multidimensional theories that allow e.g. parallel merge) are not representable in the more austere set notation.[3]  But, then, as we see, it is not clear how to define ‘domain’ in this more austere theory either, at least if all one can use are set theoretic notions.  At the very least, the definition of ‘domain’ requires something like the ancestral relation to define it (an inductive definition) or the introduction of the notion of a lattice with a part-whole relation defined over it.[4]  This is not a trivial addition to the basic inventory of primitive concepts and, from where I sit, it would seem to require more than Merge to sneak it into the inventory of basic notions of FL. Note that defining ‘domain’ is trivial if we treat every occurrence independently of its copies (i.e. we forgo syntactically quantifying over chains). Then we can define ‘domain of a’ as all elements that are members of the same set as a. However, this won’t serve Chomsky’s purposes, for moving something out of the Specifier will not thereby render its tail invisible.

This said, let’s assume there is a reasonable definition of domain that puts higher copies outside the relevant domain and lower copies within it (see note 4). This will then force movement of elements in “Spec positions.”

Observe that “Spec” is doing some work here (i.e. it goes beyond notational convenience). To see this, consider an example like (3a) where we raise the DP John and VP-front the embedded VP. This is not a great sentence. But compare it with what occurs when we raise the embedded T’ instead, leaving John below, as in (3b). This is word salad. But why?  Why can’t we break the {XP, YP} symmetry by moving the “T’” instead of the XP?  Note, we cannot say that this is because X’s don’t move, for the whole tenor of Chomsky’s discussion is that projection is not part of the syntax. We might claim that these are not interpretable at CI or SM for some reason, but it’s not really clear why this should be so, IMO.  Why, after all, is moving the VP interpretable but moving the non-finite T’ uninterpretable? Another option is to attribute the lousy status of (3b) to the “case filter” or some MP analogue thereof. But doing this removes the need to use the MLA to explain successive cyclic DP movement.[5] Case considerations, which appear to be independently needed, suffice.  Note that prohibiting such movement is part of what X’ theory tried to explain.[6] At any rate, here it is sufficient to just note a potential puzzle: why don’t X’s move and why can’t they ever serve to break an unwanted symmetry?[7]
                        (3)       a. Eat pizza, John never appears to
                                    b. To eat pizza, it never appears John to

Ok, let’s keep moving: Say we can break the {XP,YP} symmetry by moving. The next question is why/how do we ever stop moving. After all, Gs do generate {XP,YP} configurations.[8]

Here’s Chomsky’s second idea: Agreement provides another mechanism for getting around the labeling ambiguity problem. Informally, if XP values some uF of YP then we can treat the F shared between XP and YP as the unique head of {XP,YP}.[9] Thus, feature agreement provides a way out of the ambiguity problem by providing a unique feature for the MLA to choose, one that both XP and YP share.[10] Chomsky notes that it is not enough that these two features happen to be the same; they need to have been made the same by a rule of agreement (i.e. the features were co-valued by Agree). Note that this requires that the MLA apply at the phase level, for it’s only within a phase that CS “remembers” whether two features are inherently the same or are the same because of a valuation operation.
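Here is a toy rendering (my own, with made-up feature tokens) of the idea that a shared feature can label {XP, YP} only when its two tokens were co-valued by Agree, not when they are merely accidentally identical in name. The `agreed` set stands in for the phase-level memory the text says the MLA needs:

```python
agreed = set()   # phase-level record of co-valuations performed by Agree

def agree(tok1, tok2):
    # Record that two feature tokens were co-valued within the current phase.
    agreed.add(frozenset({tok1, tok2}))

def label_by_agreement(xp, yp):
    """xp, yp: dicts mapping feature names to feature tokens. A shared
    feature name labels {XP, YP} only if its two tokens were co-valued
    by Agree; mere identity of name is not enough."""
    for f in xp.keys() & yp.keys():
        if frozenset({xp[f], yp[f]}) in agreed:
            return f
    return None

# WH in "Spec CP": both carry a Q feature, but with distinct tokens.
wh = {"Q": "Q-on-wh"}
c  = {"Q": "Q-on-C"}
assert label_by_agreement(wh, c) is None   # same name, no Agree record: no label
agree("Q-on-wh", "Q-on-C")
assert label_by_agreement(wh, c) == "Q"    # co-valued: <Q, Q> labels
```

Because `agreed` is emptied between phases on this picture, the sketch also illustrates why the MLA must apply at the phase level: once transfer has happened, the record of co-valuation is gone and identity of feature names is all that remains.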

The agreement route to labeling combines with the earlier discussion to derive successive cyclic movement: it applies until a moving XP hits a criterial position where agreement can license a label via the MLA.

Interestingly, this suggests that the input to the MLA is the whole phase and not simply the unit that is actually transferred (viz. the complement of the phase head). Why? Consider a WH in “Spec CP.” If WH values C’s +Q features (or vice versa)[11] and if Transfer applies to the complement of the phase head and if the MLA is part of Transfer, then the MLA will not be able to “see” whether the features are products of Agreement, for their actual transfer to the interfaces will only take place at the next phase up. But by then we cannot tell whether the features just “happen” to be the same or were co-valued via Agree.  So, if simple feature identity is not enough, then either the MLA is not part of Transfer or Transfer need not actually transfer what it applies to.

Ok: so we get successive cyclic movement via the two ways to disambiguate the application of the MLA. Effectively, in the absence of feature agreement, movement is required.

Oddly, this “derivation” of the cycle is not all that different from a more traditional one (e.g. Boskovic’s). Say we accept Chomsky’s assumption that the moving WH is the item with the unvalued feature, and say that we take the configuration of feature checking to be {XP,YP} structures (i.e. the old Spec-head configuration by another designation); then the WH will keep moving until it can get to a valued position on pain of crashing the derivation. This is effectively what Chomsky’s technology gives us. The main difference is the claim that this follows from minimal assumptions regarding labeling. However, it is not clear to me that the actual technology is all that different: both require a valuation relation between Wh and C(‘) (or DP and T(‘)). Both allow the movement to stop when this agreement takes place. So both require that agreement/valuation be part of the explanatory mix.

The critical question, then, is whether the MLA allows us to dispense with things like the case filter or the requirement that unvalued features be valued before Transfer. If we need to assume this anyhow, then it seems that the MLA account of successive cyclic movement is theoretically redundant, and that would be a good argument against it.  So, let’s assume that this is not the case. Indeed, if the MLA could allow us to dispense with the whole valued/unvalued features schtick, that would be one great argument in its favor. However, from what I can tell, Chomsky wants feature valuation to motivate phases.  Still, if we assume that only phase heads have unvalued features (contra Chomsky’s claims in the lecture concerning moving WHs) then the above proposal that elements move in order to get unvalued features checked cannot work, though something like the MLA would do the trick. The problem, of course, is that unless the moving Wh has the unvalued feature, it is completely unclear why it should ever move from its base position to a +Q-C that is arbitrarily far away. So, it looks like everyone needs to assume that WHs must move to Specs in virtue of their unvalued features (and this also suggests that they move there because that’s where valuation takes place). This then renders the MLA nugatory. I’d love to hear what others think of these considerations.

My thoughts to this point: The minimalist conceit (at least for Chomsky) has been to look for the conceptually simplest operations, because properly understood they will give us what we observe. This worked well for Merge wrt structured hierarchy and movement. But it does not deliver labels. So why do we need labels? They are BOCs necessary for interpretation. But the MLA by itself can’t do that much. It encounters a bunch of environments where it cannot apply. To allow it to apply, FL finds different workarounds. But conceptually, why does FL need to find ways to get around labeling problems when they arise? Why doesn’t it just stick with the simplest assumptions and just fail to interface where these problems arise? Let me put this another way. The facts are not explained simply by the MLA, but by the MLA plus the various ways of evading its impotence (e.g. roots don’t label, tails of chains don’t count, feature agreement gives us unique labels, T is special (you’ll see), etc.). So it’s not only the minimal algorithm that is doing the work, but the (dare I say it) special assumptions concerning everything else that allows it to work. These “initial conditions” need to get into FL too, and they don’t look conceptually simple, very domain general, or the trivial results of efficient computation (physics rightly construed) to me. So the question is not only whether the MLA is simple or minimal but whether these other prerequisites to its proper operation are as well. This said, whether it is minimal or not, it is an interesting theory aiming to explain deep facts about FL and so we continue our discussion in the next installment.

[1] All scare quotes indicate that the enclosed are being used purely for purposes of exposition.
[2] It’s interesting to note that the indicated complication arises from treating chains as grammatical units. This suggests that understanding what chains are (“emergent” objects?) and how they function grammatically would be theoretically useful. For example, there would be nothing unnatural or complex in assuming that CS treats each occurrence singly rather than as part of a discontinuous object. This assumption is consistent with treating the occurrences “together” at CI (e.g. to form operator-variable structures). How many arguments do we have for assuming that the CS worries about chains rather than occurrences? There are some intervention effects and the MLA. Any others?
[3] Try it. What’s the analogue of [a, (b], c), a “parallel merge” structure in which b is both a member of the unit [a, b] and a member of the unit (b, c)? The set-theoretic version of this would be not parallel merge but {a, b} and {b, c}. It is not clear to me that this structure would deliver what is desired.
[4] Something like: (i) if a is a member of D then a is ‘part of’ D; (ii) if a is part of D′ and D′ is a member of D then a is part of D. Then we can say that a is in the domain of D if a is part of D.
            Another way of thinking of this is that the domain is a lattice with D being the supremum and all constituents being members of the lattice. Thus a and b are in domain D iff both a and b are parts of D. Think plural individuals with lexical atoms at the bottom of the lattice related by the ⊕ operation. Note, this goes beyond the standard minimal set-theoretic conceptions, and if you are squeamish about ordered pairs for being too complex, then this should really set you twitching.
[5] See below for discussion of the analogous proposal for successive cyclic Wh movement.   
[6] Some self promotion: I tried to account for this in terms of Minimality here.
[7] Note 2 in the first part provided more relevant evidence about the inertness of X’.
[8] Note, conceptually, this is hardly necessary. A very strong version of the MLA would prohibit {XP, YP} structures altogether. True, this would only allow very simple clauses, effectively only unaccusatives, but this is hardly an incoherent possibility.  Is this conceptually simpler than allowing various ways of eluding the strictures of the MLA? I dunno.  Chomsky’s efforts to evade the MLA’s strictures are interesting. But, at least to me, they hardly seem conceptually minimal. But hey, I wear stripes and checks together so my taste sucks! One might argue, however, that the MLA is the minimal operation given that we have structures of the kind that we do. However, this does not explain why we have such structures. The why might come from something like a principle of effability: maximize G’s contact with CI (i.e. if you can think it, then you should be able to say it). But this seems like an odd principle given Chomsky’s observations concerning efficient communication and its relation to minimal computation. The minimalist conceit (at least for Chomsky) has been to look for the simplest operations, because properly understood they will give us what we observe. This worked well for Merge wrt structured hierarchy and movement. But it does not deliver labels. So why do we need labels? Say for interpretation. But why do we need ways to get around labeling problems when they arise rather than just fail to interface where the problems exist? The facts are not explained simply by the MLA, but by the MLA plus the various ways of evading its impotence (e.g. roots don’t label, domains, feature agreement, T is special (you’ll see), etc.). So it’s not only the minimal algorithm that is doing the work, but the (dare I say it) special assumptions concerning everything else that allows it to work. These “initial conditions” need to get into FL too, and they don’t look very domain general or the results of efficient computation to me.
[9] This is informal, for I don’t know whether this involves a novel agreement operation or whether it is just the reflex of the standard Probe/Goal technology. If the latter, then it is not XP that values the uF of YP but X0 valuing the uF of Y0.
[10] Note that this possibly implies that the problem is not really one about minimal search but one about deterministic decision given a search. If agreement information lives on heads, then the MLA must be able to see both X0 and Y0 to see that they agree. So it can search both XP and YP. Where it fails is in being unable to “decide” which is the label, as that decision is ambiguous except when the label of both is the same. That’s what agreement provides.
[11] In the lecture, Chomsky wants the moved WH to have the unvalued features and have +Q on C value them.  I am not sure why he wants this (though see discussion in the text). However, one implication is that not all unvalued features are on phase heads, contrary to what he suggests in the second lecture.  Moreover, it seems inconsistent with the standard Probe-Goal architecture in which an unvalued feature searches for something to value it.  I am not sure what to make of this, actually. Lecture 3 does not say much about Probes and Goals. It even leaves the suggestion that valuation takes place between XP and YP rather than X0 and Y0. There is no reason why the feature that becomes the label need sit on a head, though that is the standard assumption. But if it does need to do so, and if agreement is a relation between heads, then it suggests that if Agree is via probe/goal then WH values C, not the other way around.
