For some mysterious reason, these comments kept growing and
growing. I have decided to split them into three parts for easier handling. It
also has the advantage of allowing me a break over the coming long weekend. Oh yes, here's a link to the lectures again.
People, fasten your seatbelts and put up your tray tables: in the third lecture Chomsky really puts the pedal to the me(n)tal. The first two lectures saw Chomsky arguing that the minimal recursive operation (Merge, the single evolutionary “miracle”), in combination with minimal computation (Phases, by way of minimal search/minimal “memory”, and a gift from physics (if we only understood it better) and so not part of UG), provides Gs that generate an infinite number of structured linguistic objects (SLOs) that manifest displacement,
reconstruction, morphology and cyclicity effects. In lecture 3, Chomsky turns his attention to
projection (i.e. X’ theory), in particular the idea that phrases are headed. He
argues that X’ theory is fundamentally misguided: within the computational system (CS), units are not headed (i.e. there is no analogue of the labeled phrase). The objects transferred to the sensory-motor (SM) and conceptual-intentional (CI) systems are headed, though, so far as I can tell, not headed with the conventional labels. There are no more CPs or TPs or VPs; rather, there are phi-headed, WH-headed, Topic-headed, etc. expressions.
In what follows, I will try to present (i) Chomsky’s
arguments against labeled expressions in CS, (ii) his proposed minimal labeling
algorithm (MLA) and (iii) the main “empirical” payoff of the MLA. I put
“empirical” in scare quotes because as becomes clear, and Chomsky emphasizes,
the results deal with highly idealized examples. Getting the suggestions to pan out in detail is left as exercises/projects for the sympathetic. The main emphasis of lecture 3, like that of the first two, is on conceptual arguments that maintain that any analysis other than the one on offer is, at best, a fancy re-description of the phenomenon of interest. Why? Because Chomsky claims to offer the conceptually
most minimal requirements. If
correct, all other accounts are more complex and so must carry less explanatory
oomph. The methodological assumption is that departures from the conceptual
minimum can only stipulate, not explain. As in previous posts, I will throw my
2 cents in occasionally during the course of exposition.
1. Why Labels/Projections are Problematic
From the perspective of lectures 1 and 2, there is little
surprise, really, that Chomsky is hostile to the idea that labeling/projection
is part of CS. The reason is that if
Merge is the one “miracle” (and by this I just mean the simple adventitious evo
addition that made I-language possible) then labeling (the operation that
underlies projection) is a definite complication. Indeed, as Chomsky notes,
Merge without labels is sufficient to generate an infinite number of hierarchical SLOs with movement-like dependencies. There is no need for labels to do this, and so adding labels cannot be the conceptually most minimal assumption. Conclusion? No
labels in the CS part of G.
But Chomsky believes that empirically there is good evidence for labels (though, curiously, he does not review what this is). So if they don’t arise in CS, where do they come from? Chomsky proposes that labels are required for SLOs to be legible at the interfaces. In other words, Bare Output Conditions (BOC) demand labeled SLOs. No labels, no phonetic or CI-ish interpretations.
It is worth observing that this line of argument follows
quite neatly from Chomsky’s formulation of the SMT. Recall that the SMT is the
view that FL is the optimal realization of interface conditions. What Chomsky
means by this is not that human Gs
match properties of the interface as well as any G can (this is what I used to
think he meant, but I was wrong). Rather, it means that FL is what we get from (the) optimal (i.e. simplest, most conceptually minimal) rules and computational principles that make the most minimal adjustments to accommodate the readability conditions of CI (and, secondarily and with a lot more tinkering, SM). As Merge is the conceptually simplest possible rule, if
labels exist (and Chomsky assumes they do), they must reflect the demands of
the interfaces. So what are labels? They must be the products of the
conceptually simplest rules that satisfy the legibility requirements of the
interfaces. There is nowhere else for them to come from given Chomsky’s
minimalist conceptual considerations, driven as they are by Occam.
We can take this reasoning a further step: if the interfaces are where labels are needed, then labeling optimally occurs at the point where
SLOs are transferred to the interfaces (i.e. during Transfer at the phase
level). Thus, labels are not part of the “transformational” component of
G. Merging and CS couldn’t care less about
labels. CI and SM care a whole bunch.
A comment: Chomsky likes to note in this lecture that making
things simple regularly generates puzzles. I think he is right, and the point
is an important one. Stipulations resolve problems, but in very uninteresting
ways. So, it is not surprising that conceptual simplification generates
empirical problems. Here’s one that Chomsky did not mention in his lecture:
there is plenty of evidence that Gs manipulate some kinds of phrases and not
others. Thus, English has VP fronting and VP ellipsis, while French does not.[1] Or English overtly moves WHs while Chinese does not. This fact has two sides. First, English moves verb phrases, not just verbs. Indeed, English doesn’t like to move verbs, but finds moving verb phrases just fine. And second, English doesn’t seem to care how big the verb phrases are. The displaced element can be one word or arbitrarily large, so long as it is a verb phrase. The fact that phrases (MaxPs) move (rather than, say, X’s) is considered a descriptive staple.[2] Such data make sense if units in CS are phrase-like objects (viz. labeled constituents). But sans labels (and hence sans notions like XP and X’), well, let’s just say that why/how we get these kinds of data is a bit of a puzzle.
One that I am sure that Chomsky would welcome and I hereby bequeath to
him.
I should add that I believe that these kinds of facts,
staples of the GG literature since “the earliest days of Generative Grammar,”
are very problematic for an approach like Chomsky’s. To date (for the last decade at least), we have stopped worrying about such data, relegating their eventual understanding to unknown principles of pied-piping. Maybe, but it is worth noting that one of the classical reasons for postulating labeled phrases and rules that target them is these kinds of constituency tests. And these
tests, at least to me, seem more robust than any of the data that Chomsky cites
in lecture 3 (e.g. EPP and Subject/object asymmetries).[3]
2. The Minimal Labeling Algorithm
Ok, let’s get back to lecture 3. So where do labels come from? They are the product of the conceptually minimal labeling algorithm (MLA), a rule that looks for the most “prominent” expression in a set. So, if Merge creates objects like {a,b}, then at Transfer the MLA applies to find the most prominent element in the set, and that is the set’s label. “Most prominent element” means the least embedded atom. This works well, Chomsky notes, in the standard head-complement case where a is an atom (viz. an X0 in X’ terms) and b is a set of atoms (an XP in X’ terms). In such a case a is more prominent than anything in b. Note that the MLA assumes that labels must be atoms, for otherwise b is as prominent as a is. So, the MLA codes part of what X’ theory did:
it labels some complex expressions (sets) in terms of properties of the head (in X’ terms) of that complex
expression (set). Chomsky assumes that this application of MLA is conceptually
unproblematic as it is very simple (i.e. minimal search) and can apply
unambiguously (i.e. there must be exactly one most prominent atom to find).[4]
Take note of this non-ambiguity assumption, for it plays a major role in
Chomsky’s account.
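To fix ideas (since the lecture itself offers few worked examples), here is a toy rendering of the MLA as minimal search. This is purely my own illustrative sketch, not anything Chomsky presents: syntactic objects are encoded as Python strings (atoms) and frozensets (the outputs of Merge), and the can_label test is my stand-in for whatever distinguishes potential labelers.

```python
# A toy sketch (mine, not Chomsky's) of the MLA: syntactic objects are either
# atoms (plain strings) or two-membered frozensets built by Merge. The labeler
# performs a breadth-first, i.e. minimal, search for the least embedded atom
# that can label; if no unique such atom exists, labeling fails.
from collections import deque

def mla(so, can_label=lambda atom: True):
    """Return the label of syntactic object `so`, or raise if labeling fails."""
    queue = deque([(so, 0)])               # (sub-object, embedding depth)
    candidates, found_depth = [], None
    while queue:
        obj, depth = queue.popleft()
        if found_depth is not None and depth > found_depth:
            break                          # minimal search: stop below the shallowest candidate
        if isinstance(obj, frozenset):
            for member in obj:             # descend one level
                queue.append((member, depth + 1))
        elif can_label(obj):               # an atom that is a potential labeler
            candidates.append(obj)
            found_depth = depth
    if len(candidates) != 1:
        raise ValueError(f"labeling ambiguous or impossible: {candidates}")
    return candidates[0]

# The unproblematic head-complement case: an atom merged with a set of atoms.
print(mla(frozenset({"eat", frozenset({"the", "dog"})})))   # -> eat
```

Nothing hangs on the encoding; the point is just that “most prominent” can be cashed out as “unique shallowest potential labeler.”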
Chomsky next turns to the problematic cases.[5]
First he considers the old “bottom of the tree” problem. Merge can combine two
atoms to yield a structure where a
and b are both atoms. An example might be {the,dog} or {eat, it}.
Here we would seem to have two atoms, and so application of the MLA is ambiguous (something that Chomsky considers a computationally unacceptable situation; I’m not sure why: perhaps G operations must be deterministic?). He resolves this by
adopting a Distributed Morphology proposal concerning categorization. In
particular, he assumes that lexical atoms are not Ns, Vs, As, etc. but roots.
Being an N, V or A is not a property of an atom, but a relation between a root
(R) (I am using R because Blogger does not like the standard root symbol) and a
functional expression. Thus, a noun like dog
is not an atom but a little set (viz. {n, R(dog)}).
To get the labeling to work out right, Chomsky makes a second assumption: roots are too “weak” to label (i.e. they are inherently incapable of labeling). Given this, the MLA must choose n as the label of {n, R(dog)}, for the root is not a potential label.[6] Thus, the MLA identifies the shallowest element that can act as a label in an {a,b} structure.
In sum, the solution to the “bottom of the tree” problem (misnamed, as there are no trees if Chomsky is right, and thinking in tree terms is positively harmful to your theoretical linguistic health, as Chomsky notes (see below)) follows from the MLA, a rule that seeks out the most prominent potential labeler.
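On the same toy encoding (again my own illustration, with R(...) as the ad hoc root notation), the “roots are too weak to label” assumption amounts to a filter on potential labelers, which is what lets n win in {n, R(dog)}:

```python
# Roots (anything written R(...) in this ad hoc notation) are not potential
# labelers, so the categorizer n is the unique shallowest candidate.
not_a_root = lambda atom: not atom.startswith("R(")
print(mla(frozenset({"n", "R(dog)"}), can_label=not_a_root))   # -> n
```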
Comment: Chomsky’s assumption is the thin end of a very
dangerous wedge, I believe. Recall the minimalist project is to minimize the
grammatically parochial in FL. This has been the impetus behind eliminating
levels and simplifying rules to their conceptually bare minimum (e.g. eliminate
traces). Indeed, as Chomsky has previously argued, we don’t want grammar-internal primitives, for they exacerbate Darwin’s Problem (DP). With this in mind, what are we to make of elements like n, v, a, the functional atoms that convert roots into categories? Aren’t these quintessentially grammar-internal elements? And if so, don’t they need to get into FL as well? Put another way, roots are just lexical atoms. Categories like n, v, a are grammatical formatives that turn lexical atoms into grammatically manipulable objects. This suggests that in their absence, SLOs cannot interface with CI. But it seems unlikely to me that n, v, a are CI interface categories. If not, they are grammatical constructs whose existence exacerbates DP.
Perhaps these functional elements relate to lexicalization,
the other mystery Chomsky often discusses. As noted in discussion of lecture 1
(here),
Chomsky has identified two very distinctive properties of natural language:
hierarchical recursion and the large number and odd interpretive properties of
lexical items. Maybe n,v,a relate to
the mysterious issues surrounding lexicalization. At any rate, it is not clear
to me why Merge can’t work with roots alone or why roots should be so
grammatically impotent. We can assume
this and it works. But the claim that the weakness assumption is conceptually
minimal still needs some argument, I believe. Btw, I generally stiffen when
people use ‘weak’ and ‘strong’ in explanations. They are words that often
signal that we don’t know what we are talking about. Here, ‘weak’ just means that roots cannot label. Don’t be fooled into thinking that this explains why labeling is impossible. It doesn’t. It’s just a stipulation to make the trains run on time.[7]
Second point: It would be nice to have some more examples to
play with. The dog is easy when
compared with eat it. What’s the
structure of the latter? It cannot be {R(eat), {n, R(it)}}, for then this could not be labeled at all, or it would be labeled n, as n is the most prominent label-ful expression. Nor can it be {{v, R(eat)}, it}, for the MLA would label this an it-phrase. Nor can it be {{v, R(eat)}, {n, it}}, for here the MLA cannot apply unambiguously. So what is it? Note that this is just
the bottom of the tree problem (i.e. {X0, Y0}) all over
again, but this time with roots factored in. If anyone has any ideas about
this, please let me/us know. One unfortunate feature of the lectures that I
mentioned at the outset is the dearth of illustrative examples. There are a
few, but we really don’t get to see many full-fledged derivational details, and
for fixing ideas this is definitely not a plus. Moreover, Chomsky does not
respond well when asked to provide concrete examples. David Pesetsky and Sabine
Iatridou (and a few others who I could not identify) asked for these (at least
obliquely) during the lectures, but without success. Oh well, I guess it’s up
to us. So let me know.
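For what it’s worth, here is what the toy labeler sketched above returns for the three candidate structures just discussed (same caveats: the encoding and the R(...) strings are mine, purely for illustration). The first comes out labeled n, the second comes out as an it-phrase, and the third fails for ambiguity, which is just the puzzle restated:

```python
# The three candidate structures for "eat it" discussed in the text.
structures = [
    frozenset({"R(eat)", frozenset({"n", "R(it)"})}),                 # labeled n
    frozenset({frozenset({"v", "R(eat)"}), "it"}),                    # labeled it
    frozenset({frozenset({"v", "R(eat)"}), frozenset({"n", "it"})}),  # ambiguous
]
for so in structures:
    try:
        print(mla(so, can_label=not_a_root))
    except ValueError as err:
        print("labeling fails:", err)
```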
[1]
I should say, “English” to note that English really does not exist (it is an
abstraction). But I won’t. Put in scare quotes where appropriate.
[2]
Thus we can get (i) but not (ii):
(i) Persuade Mary that Frank loved her John did
(ii) *Persuaded Mary that Frank loved her John
(ii) would be derived by moving T’ (T0+vP)
to the front, on analogy with (i), where the vP is fronted. At any rate, there is virtually no evidence that Gs ever target X’-projections, a point (and problem) that Chomsky made (and addressed) in his 1995 book.
[3]
The other classical data motivating endocentric labeling are selection and
subcategorization restrictions. The relevant kinds of data are reviewed in Understanding Minimalism chapter 6.
[4]
In the simple case, the assumption that labels are atoms follows from the
non-ambiguity thesis. However, in the more complex cases that Chomsky
discusses, it is not clear why
complex elements cannot serve as labels. Nor is it clear to me whether this
makes any difference.
[5]
Chomsky has long taken the head-complement relation as grammatically basic.
This was true in LGB, where objects were the best-behaved of elements, and it is true here as well: the simple head-complement case is the unproblematic one. You see it as well in his objections to Specifiers. On the face of it, it’s hard to see why the distinction between the first-merged element and the second-merged element should make much of a grammatical difference. But Chomsky seems to have the intuition that first-merged elements are less problematic grammatically than non-first-merged expressions. If anyone knows why, send me a note.
[6]
If this is correct, then the search procedure is not “minimal.” The MLA must scan the entire structure, i.e. both the root and the n, to find the head, at least in this case. Rather, what’s critical is the non-ambiguity assumption, i.e. the assumption that the choice of label is unambiguous.
[7]
In addition to informal concepts like weak and strong, another sure sign of
mystery mongering is capitalization. Remember SUBJECT or CHAIN?
For your 'eat it' example, isn't it the case that it's not labellable, so the root has to raise to v, then the result is labelled by whatever labels 'it' (say 'it' is D, but I guess it could be more complex with a root at its base, perhaps with the root just having the meaning of a variable (like I suggest in my Bare Resumptives paper)).
Chomsky was quite hostile to head raising so I doubt he'd go for this. Maybe this indicates he should. However, he has little to say about what to do with head mvmt phenomena. Maybe in lecture 4.
I think he does suggest that root to v raising is what happens in either the Olomouc talk (http://olinco.upol.cz) or maybe the Keio one (I can't recall), but the idea is also in a draft that's floating around, where the idea that the root raises to v does other work too.
So the label on the phrase it leaves is D?
I believe so. Means that all theta relations have to be calculated at phase level.
Interesting. Lecture 4, which I've not gotten to yet, will hopefully discuss head mvmt. As I noted, Chomsky was very skeptical about it. It violates the NTC so, on his terms, it should not exist. I assume that this will be another case of a "near contradiction." I love those.
Delete@David:
Delete"I believe so. Means that all theta relations have to be calculated at phase level."
This is interesting. The major impetus for having labels in chapter 3 is to ready SLOs for the interfaces. He says that though labels are NOT required for the CS, they are critical to CI and SM interpretation. But, and this is why your popint is interesting, it seems that labels muck up theta role interpretation which is why this must take place before the output of labeling is no longer a distant memory. In other words, it seems to be saying that labeling is BAD for CI interpretation. One might think that this undercuts the basic motivation of the proposal.
Assuming that the structure of "eat it" is {{v, R}, {n, R}}, isn't another option agreement between v and n, yielding a phi-label? If this is an option in the Spec-TP cases, it should be (in principle) possible here as well. At least that should be a possible approach. I still don't quite understand what the consequences of having the label 'phi' should be, however.
Cute idea. Won't be spec-head, however. Also, the question of whether this is v or V becomes important, right?
Could you expand on what you mean by the "whether this is v or V" question there?
MP accounts have generally assumed that if x theta-marks y then x cannot case/agree with y. This fits if we distinguish v from V. If not, there are issues. Also, if it is V that agrees, the root-bearing set, then we don't have agreement in a spec-X configuration. This would not be problematic if v were the agreeing element. The problem would then be what the label of the bottom set is, as there is nothing to label it. Of course maybe it needs none. But why not? That's it.