Wednesday, July 2, 2014

Comments on lecture 3-part the first

For some mysterious reason, these comments kept growing and growing. I have decided to split them into three parts for easier handling. It also has the advantage of allowing me a break over the coming long weekend. Oh yes, here's a link to the lectures again.

People, fasten your seatbelts and put up your tray tables: in the third lecture Chomsky really puts the pedal to the me(n)tal.  The first two lectures saw Chomsky arguing that the minimal recursive operation (Merge, the single evolutionary “miracle”) in combination with minimal computation (Phases by way of minimal search/minimal “memory” and a gift from physics (if we only understood it better) and so not part of UG) provides Gs that generate an infinite number of structured linguistic objects (SLO) which manifest displacement, reconstruction, morphology and cyclicity effects.  In lecture 3, Chomsky turns his attention to projection (i.e. X’ theory), in particular the idea that phrases are headed. He argues that X’ theory is fundamentally misguided. In particular, Chomsky argues that within the computational system (CS) units are not headed (i.e. there is no analogue of the labeled phrase). The objects transferred to the sensory-motor (SM) and conceptual-intentional (CI) systems are headed, though, so far as I can tell, not with the conventional labels. There are no more CPs or TPs or VPs; rather there are φ-headed, WH-headed, Topic-headed etc. expressions.

In what follows, I will try to present (i) Chomsky’s arguments against labeled expressions in CS, (ii) his proposed minimal labeling algorithm (MLA) and (iii) the main “empirical” payoff of the MLA. I put “empirical” in scare quotes because, as becomes clear and as Chomsky emphasizes, the results deal with highly idealized examples. Getting the suggestions to pan out in detail is left as an exercise/project for the sympathetic.  The main emphasis of lecture 3, like that in the first two, is on conceptual arguments that maintain that any analysis other than the one on offer is, at best, a fancy re-description of the phenomenon of interest. Why? Because Chomsky claims to offer the conceptually most minimal requirements. If correct, all other accounts are more complex and so must carry less explanatory oomph. The methodological assumption is that departures from the conceptual minimum can only stipulate, not explain. As in previous posts, I will throw my 2 cents in occasionally during the course of exposition.

1.     Why Labels/Projections are Problematic

From the perspective of lectures 1 and 2, there is little surprise, really, that Chomsky is hostile to the idea that labeling/projection is part of CS.  The reason is that if Merge is the one “miracle” (and by this I just mean the simple adventitious evo addition that made I-language possible) then labeling (the operation that underlies projection) is a definite complication. Indeed, as Chomsky notes, Merge without labels is sufficient to generate an infinite number of hierarchical LSOs with movement-like dependencies. There is no need for labels to do this, and so adding labels cannot be the conceptually most minimal assumption. Conclusion? No labels in the CS part of G.

But Chomsky believes that empirically there is good evidence for labels (though, curiously, he does not review what this is). So if they don’t arise in CS, where do they come from?  Chomsky proposes that labels are required for LSOs to be legible at the interfaces. In other words, Bare Output Conditions (BOC) demand labeled LSOs. No labels, no phonetic or CIish interpretations.

It is worth observing that this line of argument follows quite neatly from Chomsky’s formulation of the SMT. Recall that the SMT is the view that FL is the optimal realization of interface conditions. What Chomsky means by this is not that human Gs match properties of the interface as well as any G can (this is what I used to think he meant, but I was wrong). Rather it means that FL is what we get from (the) optimal (i.e. simplest, most conceptually minimal) rules and computational principles that make the most minimal adjustments to accommodate the readability conditions of CI (and, secondarily and with a lot more tinkering, SM). As Merge is the conceptually simplest possible rule, if labels exist (and Chomsky assumes they do), they must reflect the demands of the interfaces. So what are labels? They must be the products of the conceptually simplest rules that satisfy the legibility requirements of the interfaces. There is nowhere else for them to come from given Chomsky’s minimalist conceptual considerations, driven as they are by Occam. 

We can take this reasoning a further step: if the interfaces are where labels are needed, then labeling optimally occurs at the point where SLOs are transferred to the interfaces (i.e. during Transfer at the phase level). Thus, labels are not part of the “transformational” component of G.  Merging and CS could not care less about labels. CI and SM care a whole bunch.

A comment: Chomsky likes to note in this lecture that making things simple regularly generates puzzles. I think he is right, and the point is an important one. Stipulations resolve problems, but in very uninteresting ways. So, it is not surprising that conceptual simplification generates empirical problems. Here’s one that Chomsky did not mention in his lecture: there is plenty of evidence that Gs manipulate some kinds of phrases and not others. Thus, English has VP fronting and VP ellipsis; French does not.[1] Or English overtly moves WHs while Chinese does not.  This fact has two sides. First, that English moves verb phrases, not just verbs. Indeed, English doesn’t like to move verbs, but finds moving verb phrases just fine. And second, that English seems not to care how big the verb phrases are. The displaced element can be one word or arbitrarily large so long as it is a verb phrase. The fact that phrases (Max Ps) move (rather than X’s say) is considered a descriptive staple.[2] Such data make sense if units in CS are phrase-like objects (viz. labeled constituents). But sans labels (and hence sans notions like XP and X’), well, let’s just say that why/how we get these kinds of data is a bit of a puzzle. One that I am sure that Chomsky would welcome and I hereby bequeath to him. 

I should add that I believe that these kinds of facts, staples of the GG literature since “the earliest days of Generative Grammar,” are very problematic for an approach like Chomsky’s. To date (for the last decade at least), we have stopped worrying about such data, relegating their eventual understanding to unknown principles of pied piping. Maybe, but it is worth noting that one of the classical reasons for postulating labeled phrases and rules that target them is these kinds of constituency tests. And these tests, at least to me, seem more robust than any of the data that Chomsky cites in lecture 3 (e.g. EPP and Subject/object asymmetries).[3]

2.     The Minimal Labeling Algorithm

Ok, let’s get back to lecture 3.  So where do labels come from? They are the product of the conceptually minimal labeling algorithm (MLA), a rule that looks for the most “prominent” expression in a set.  So, if Merge creates objects like {a,b} then at Transfer, MLA applies to find the most prominent element in the set and that is the set’s Label.  “Most prominent element” means least embedded atom. This works well, Chomsky notes, in the standard head-complement case where a is an atom (viz. an X0 in X’ terms) and b is a set of atoms (an XP in X’ terms). In such a case a is more prominent than anything in b. Note that the MLA assumes that labels must be atoms, for otherwise b is as prominent as a is.  So, the MLA codes part of what X’ theory did: it labels some complex expressions (sets) in terms of properties of the head (in X’ terms) of that complex expression (set). Chomsky assumes that this application of MLA is conceptually unproblematic as it is very simple (i.e. minimal search) and can apply unambiguously (i.e. there must be exactly one most prominent atom to find).[4] Take note of this non-ambiguity assumption, for it plays a major role in Chomsky’s account.
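To fix ideas, here is a minimal sketch of the MLA as just described. It rests on assumptions of mine, not on anything Chomsky spelled out: atoms are modeled as Python strings, Merge outputs as frozensets, and "least embedded" is read as a level-by-level search that stops at the first depth containing any atom. The function name label is hypothetical.

```python
def label(slo):
    """Return the unique least-embedded atom of slo, or None if the search
    is ambiguous (more than one atom at the shallowest depth with atoms)."""
    level = [slo]
    while level:
        # Atoms are strings (an assumption of this sketch); sets are frozensets.
        atoms = [x for x in level if isinstance(x, str)]
        if atoms:
            # Non-ambiguity: exactly one most prominent atom, or no label.
            return atoms[0] if len(atoms) == 1 else None
        # No atom at this depth, so every element is a set: go one level down.
        level = [member for s in level for member in s]
    return None


# Head-complement case {a, b}: a is an atom, b is a set of atoms.
head_complement = frozenset({"eat", frozenset({"the", "dog"})})
print(label(head_complement))             # eat

# Two atoms at the same depth: the search is ambiguous.
print(label(frozenset({"the", "dog"})))   # None
```

On this reading the head-complement case gets its label after inspecting a single level, while {the,dog} returns nothing because two atoms tie for most prominent.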

Chomsky next turns to the problematic cases.[5] First he considers the old “bottom of the tree” problem. Merge can combine two atoms to yield a structure where a and b are both atoms.  An example might be {the,dog} or {eat, it}. Here we would seem to have two atoms and so application of the MLA is ambiguous (something that Chomsky considers a computationally unacceptable situation (I’m not sure why? G operations must be deterministic?)). He resolves this by adopting a Distributed Morphology proposal concerning categorization. In particular, he assumes that lexical atoms are not Ns, Vs, As, etc. but roots. Being an N, V or A is not a property of an atom, but a relation between a root (R) (I am using R because Blogger does not like the standard root symbol) and a functional expression. Thus, a noun like dog is not an atom but a little set (viz. {n, R(dog)}).

To get the labeling to work out right, Chomsky makes a second assumption: roots are too “weak” to label (i.e. they are inherently incapable of labeling). Given this, the MLA must choose n as label of {n,R(dog)} for the root is not a potential label.[6] Thus, the MLA identifies the shallowest element that can act as a label in an {a,b} structure.

In sum, the “bottom of the tree” problem (misnamed as there are no trees if Chomsky is right and thinking in tree terms is positively harmful to your theoretical linguistic health, as Chomsky notes (see below)) follows from the MLA, a rule that seeks out the most prominent potential labeler.

Comment: Chomsky’s assumption is the thin end of a very dangerous wedge, I believe. Recall the minimalist project is to minimize the grammatically parochial in FL. This has been the impetus behind eliminating levels and simplifying rules to their conceptually bare minimum (e.g. eliminate traces). Indeed, as Chomsky has previously argued, we don’t want grammar internal primitives for they exacerbate Darwin’s Problem (DP). With this in mind, what are we to make of elements like n, v, a, the functional atoms that convert roots into categories?  Aren’t these quintessentially grammar internal elements? And if so, don’t they need to get into FL as well? Put another way, roots are just lexical atoms. Categories like n,v,a are grammatical formatives that turn lexical atoms into grammatically manipulable objects.  This suggests that in their absence, LSOs cannot interface with CI. But it seems unlikely to me that n,v,a are CI interface categories. But if not, they are grammatical constructs whose existence exacerbates DP.

Perhaps these functional elements relate to lexicalization, the other mystery Chomsky often discusses. As noted in discussion of lecture 1 (here), Chomsky has identified two very distinctive properties of natural language: hierarchical recursion and the large number and odd interpretive properties of lexical items. Maybe n,v,a relate to the mysterious issues surrounding lexicalization. At any rate, it is not clear to me why Merge can’t work with roots alone or why roots should be so grammatically impotent. We can assume this and it works. But the claim that the weakness assumption is conceptually minimal still needs some argument, I believe. Btw, I generally stiffen when people use ‘weak’ and ‘strong’ in explanations. They are words that often signal that we don’t know what we are talking about. Here, ‘weak’ just means that it cannot label. Don’t be fooled into thinking that it explains why labeling is impossible. It doesn’t. It’s just a stipulation to make the trains run on time.[7] 

Second point: It would be nice to have some more examples to play with. The dog is easy when compared with eat it. What’s the structure of the latter? It cannot be {R(eat), {n,R(it)}} for then this could not be labeled at all, or it would be labeled n as n is the most prominent label-ful expression.  Nor can it be {{v, R(eat)}, it} for the MLA would label this an it-phrase. Nor can it be  {{v,R(eat)}, {n,it}} for here the MLA cannot apply unambiguously. So what is it? Note that this is just the bottom of the tree problem (i.e. {X0, Y0}) all over again, but this time with roots factored in. If anyone has any ideas about this, please let me/us know. One unfortunate feature of the lectures that I mentioned at the outset is the dearth of illustrative examples. There are a few, but we really don’t get to see many full-fledged derivational details, and for fixing ideas this is definitely not a plus. Moreover, Chomsky does not respond well when asked to provide concrete examples. David Pesetsky and Sabine Iatridou (and a few others who I could not identify) asked for these (at least obliquely) during the lectures, but without success. Oh well, I guess it’s up to us. So let me know.
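For anyone who wants to play along, the three candidate structures for eat it can be run through a toy version of the MLA. This is only a sketch under my own assumptions: roots are a hypothetical Root class that can never label, functional atoms are strings, sets are frozensets, and "most prominent" is read as a level-by-level search that skips roots. The names Root, is_labeler and label are invented for illustration, and in the third candidate I treat it as a root, which is a guess.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Root:
    """A lexical root, assumed here to be too 'weak' to serve as a label."""
    name: str


def is_labeler(x):
    # Assumption of this sketch: only non-root atoms (strings) can label.
    return isinstance(x, str)


def label(slo):
    """Return the unique shallowest potential labeler, or None if the
    search is ambiguous or finds nothing that can label."""
    level = [slo]
    while level:
        candidates = [x for x in level if is_labeler(x)]
        if candidates:
            return candidates[0] if len(candidates) == 1 else None
        # Roots are passed over; only sets are searched one level deeper.
        level = [m for s in level if isinstance(s, frozenset) for m in s]
    return None


dog = frozenset({"n", Root("dog")})
cand1 = frozenset({Root("eat"), frozenset({"n", Root("it")})})
cand2 = frozenset({frozenset({"v", Root("eat")}), "it"})
cand3 = frozenset({frozenset({"v", Root("eat")}), frozenset({"n", Root("it")})})

print(label(dog))    # n
print(label(cand1))  # n    (the root eat is skipped, n is the shallowest labeler)
print(label(cand2))  # it   (an it-phrase)
print(label(cand3))  # None (v and n tie, so the search is ambiguous)
```

Under these assumptions none of the three candidates comes out as a v-labeled phrase, which is just the puzzle restated. Note too that the search has to look at the root before passing over it, which is the point of footnote [6].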

[1] I should say, “English” to note that English really does not exist (it is an abstraction). But I won’t. Put in scare quotes where appropriate.
[2] Thus we can get (i) but not (ii):
(i)             Persuade Mary that Frank loved her John did
(ii)           *Persuaded Mary that Frank loved her John
(ii) would be derived by moving T’ (T0+vP) to the front on analogy with (i) where vP is fronted.  At any rate, there is virtually no evidence that Gs ever target X’-projections, a point (and problem) that Chomsky made (and addressed) in his 95 book.
[3] The other classical data motivating endocentric labeling are selection and subcategorization restrictions. The relevant kinds of data are reviewed in Understanding Minimalism chapter 6.
[4] In the simple case, the assumption that labels are atoms follows from the non-ambiguity thesis. However, in the more complex cases that Chomsky discusses, it is not clear why complex elements cannot serve as labels. Nor is it clear to me whether this makes any difference.
[5] Chomsky has long taken the head-complement relation as grammatically basic. This was true in LGB where objects were the best behaved of elements and this is true in this case as well; the simple head complement case being the unproblematic one.  You see it as well in his objections to Specifiers. On the face of it, it’s hard to see why the distinction between first merged element and second merged element should make much of a grammatical difference. But Chomsky seems to have the intuition that first merged elements are less problematic grammatically than non first merged expressions. If anyone knows why, send me a note.
[6] If this is correct, then the search procedure is not “minimal.” The MLA must scan the entire structure, i.e. both the root and the n to find the head, at least in this case. Rather what’s critical is the non-ambiguity assumption, i.e. the assumption that the choice of label is unambiguous.
[7] In addition to informal concepts like weak and strong, another sure sign of mystery mongering is capitalization. Remember SUBJECT or CHAIN?


  1. For your 'eat it' example, isn't it the case that it's not labellable, so the root has to raise to v, then the result is labelled by whatever labels 'it' (say 'it' is D, but I guess it could be more complex with a root at its base, perhaps with the root just having the meaning of a variable (like I suggest in my Bare Resumptives paper))?

    1. Chomsky was quite hostile to head raising so I doubt he'd go for this. Maybe this indicates he should. However, he has little to say about what to do with head mvmt phenomena. Maybe in lecture 4.

    2. I think he does suggest that root to v raising is what happens in either the Olomouc talk or maybe the Keio one (I can't recall), but the idea is also in a draft that's floating around where the idea that root raises to v does other work too.

    3. So label on phrase it leaves is D?

    4. I believe so. Means that all theta relations have to be calculated at phase level.

    5. Interesting. Lecture 4, which I've not gotten to yet, will hopefully discuss head mvmt. As I noted Chomsky was very skeptical about it. It violates the NTC so, on his terms, it should not exist. I assume that this will be another case of a "near contradiction." I love those.

    6. @David:
      "I believe so. Means that all theta relations have to be calculated at phase level."

      This is interesting. The major impetus for having labels in chapter 3 is to ready SLOs for the interfaces. He says that though labels are NOT required for the CS, they are critical to CI and SM interpretation. But, and this is why your point is interesting, it seems that labels muck up theta role interpretation which is why this must take place before the output of labeling is no longer a distant memory. In other words, it seems to be saying that labeling is BAD for CI interpretation. One might think that this undercuts the basic motivation of the proposal.

  2. Assuming that the structure of "eat it" is { { v, R }, { n, R} }, isn't another option agreement between v and n, yielding a phi-label? If this is an option in the case of the Spec-TP cases it should be (in principle) possible here as well. At least that should be a possible approach. I still don't quite understand what the consequences of having the label `phi' should be, however.

  3. Cute idea. Won't be spec-head however. Also, the question of whether this is v or V becomes important, right?

    1. Could you expand on what you mean by "whether this is v or V" question, there?

    2. MP accounts have generally assumed that if x theta marks y then x cannot case/agree with y. This fits if we distinguish v from V. If not there are issues. Also if it is V that agrees, the root bearing set, then we don't have agreement in sp-x configuration. This would not be problematic if v were the agreeing element. The problem would then be what the label of the bottom set is as there is nothing to label it. Of course maybe it needs none. But why not? That's it.