Monday, July 21, 2014
What's in a Category? [Part 2]
Last week I wondered about the notion of syntactic category, aka part of speech (POS). My worry is that we have no clear idea what kind of work POS are supposed to do for syntax. We have some criteria for assigning POS to lexical items (LI) --- morphology, distribution, semantics --- but there are no clear-cut rules for how these are weighed against each other. Even worse, we have no idea why these are relevant criteria while plausible candidates such as phonological weight and arity seem to be irrelevant.1 So what we have is an integral part of pretty much every syntactic formalism for which we cannot say
- what exactly it encompasses,
- why it is necessary,
- why it shows certain properties but not others.
Monday, July 14, 2014
What's in a Category? [Part 1]
Norbert's most recent comments on Chomsky's lecture implicitly touched on an issue that I've been pondering ever since I realized how categories can be linked to constraints. Norbert's primary concern is the role of labels and how labeling may drive the syntactic machinery. Here's what caught my attention in his description of these ideas:
In effect, labels are how we create equivalence classes of expressions based on the basic atomic inventory. Another way of saying this is that Labeling maps a "complex" set {a,b} to either a or b, thereby putting it in the equivalence class of 'a' or 'b'. If Labels allow Select to apply to anything in the equivalence class of 'a' (and not just to 'a' alone), we can derive [structured linguistic objects] via Iteration.
Unless I'm vastly misconstruing Norbert's proposal, this is a generalization of the idea of labels as distribution classes. Linguists classify slept and killed Mary as VPs because they are interchangeable in all grammatical sentences of English (a toy sketch of this substitution test follows the questions below). Now in Norbert's case the labels presumably aren't XPs but just lexical items, following standard ideas of Bare Phrase Structure. Let's ignore this complication for now (we'll come back to it later, I promise) and just focus on the issue that causes my panties to require some serious untwisting:
- Are syntactic categories tied to distribution classes in some way?
- If not, what is their contribution to the formalism?
- What does it mean for a lexical item to be, say, a verb rather than a noun?
- And why should we even care?
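Here, as promised, is the toy sketch of the substitution test behind "labels as distribution classes". Everything in it (the sentence set, the contexts, the function names) is invented purely for illustration; real distribution classes quantify over all grammatical sentences of the language, not a four-sentence toy set.

```python
# A made-up toy illustration of "labels as distribution classes": two strings
# belong to the same class if swapping one for the other never changes
# grammaticality across a fixed set of contexts.

GRAMMATICAL = {
    "John slept",
    "John killed Mary",
    "Mary slept",
    "Mary killed Mary",
}

CONTEXTS = ["John {}", "Mary {}"]   # "{}" marks the slot being tested

def same_distribution(x, y):
    """True iff x and y are interchangeable in every context."""
    return all(
        (ctx.format(x) in GRAMMATICAL) == (ctx.format(y) in GRAMMATICAL)
        for ctx in CONTEXTS
    )

print(same_distribution("slept", "killed Mary"))   # True: both pattern as VPs
print(same_distribution("slept", "Mary"))          # False: different classes
```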
Wednesday, July 2, 2014
Comments on lecture 3-part the first
For some mysterious reason, these comments kept growing and
growing. I have decided to split them into three parts for easier handling. It
also has the advantage of allowing me a break over the coming long weekend. Oh yes, here's a link to the lectures again.
People, fasten your seatbelts and put up your tray tables:
in the third lecture Chomsky really puts the pedal to the me(n)tal. The first two lectures saw Chomsky arguing
that the minimal recursive operation (Merge, the single evolutionary “miracle”)
in combination with minimal computation (Phases by way of minimal
search/minimal “memory” and a gift from physics (if we only understood it
better) and so not part of UG) provides Gs that generate an infinite number of
structured linguistic objects (SLO) which manifest displacement,
reconstruction, morphology and cyclicity effects. In lecture 3, Chomsky turns his attention to
projection (i.e. X’ theory), in particular the idea that phrases are headed. He
argues that X’ theory is fundamentally misguided. In particular, Chomsky argues
that within the computational system (CS) units are not headed (i.e. there is
no analogue of the labeled phrase), though the objects transferred to the
sensory-motor (SM) and conceptual-intentional (CI) systems are headed, albeit, so far as I can tell, not headed with the
conventional labels. There are no more CPs or TPs or VPs; rather, there are φ-headed,
WH-headed, Topic-headed, etc. expressions.
In what follows, I will try to present (i) Chomsky’s
arguments against labeled expressions in CS, (ii) his proposed minimal labeling
algorithm (MLA) and (iii) the main “empirical” payoff of the MLA. I put
“empirical” in scare quotes because as becomes clear, and Chomsky emphasizes,
the results deal with highly idealized examples. Getting the suggestions to pan
out in detail is left as an exercise/project for the sympathetic. The main emphasis of lecture 3, like that in
the first two, is on conceptual arguments that maintain that any analysis other
than the one on offer is, at best, a fancy re-description of the
phenomenon of interest. Why? Because Chomsky claims to offer the conceptually
most minimal requirements. If
correct, all other accounts are more complex and so must carry less explanatory
oomph. The methodological assumption is that departures from the conceptual
minimum can only stipulate, not explain. As in previous posts, I will throw my
2 cents in occasionally during the course of exposition.
1. Why Labels/Projections are Problematic
From the perspective of lectures 1 and 2, there is little
surprise, really, that Chomsky is hostile to the idea that labeling/projection
is part of CS. The reason is that if
Merge is the one “miracle” (and by this I just mean the simple adventitious evo
addition that made I-language possible) then labeling (the operation that
underlies projection) is a definite complication. Indeed, as Chomsky notes,
Merge without labels is sufficient to
generate an infinite number of hierarchical LSOs with movement-like
dependencies. There is no need for
labels to do this and so, adding
labels cannot be the conceptually most minimal assumption. Conclusion? No
labels in the CS part of G.
But, Chomsky believes that empirically there is good evidence for labels (though, curiously,
he does not review what this is). So if they don’t arise in CS where do they
come from? Chomsky proposes that labels
are required for LSOs to be legible at the interfaces. In other words, Bare
Output Conditions (BOC) demand labeled LSOs. No labels, no phonetic or CIish interpretations.
It is worth observing that this line of argument follows
quite neatly from Chomsky’s formulation of the SMT. Recall that the SMT is the
view that FL is the optimal realization of interface conditions. What Chomsky
means by this is not that human Gs
match properties of the interface as well as any G can (this is what I used to
think he meant, but I was wrong). Rather it means that FL is what we get from
(the) optimal (i.e. simplest, most conceptually minimal) rules and
computational principles that make the most minimal adjustments to accommodate
the readability conditions of CI (and, secondarily and with a lot more
tinkering, SM). As Merge is the conceptually simplest possible rule, then if
labels exist (and Chomsky assumes they do), they must reflect the demands of
the interfaces. So what are labels? They must be the products of the
conceptually simplest rules that satisfy the legibility requirements of the
interfaces. There is nowhere else for them to come from given Chomsky’s
minimalist conceptual considerations, driven as they are by Occam.
We can take this reasoning a further step: if the interfaces
are where labels are needed, then labeling optimally occurs at the point where
SLOs are transferred to the interfaces (i.e. during Transfer at the phase
level). Thus, labels are not part of the “transformational” component of
G. Merging and CS could care less about
labels. CI and SM care a whole bunch.
A comment: Chomsky likes to note in this lecture that making
things simple regularly generates puzzles. I think he is right, and the point
is an important one. Stipulations resolve problems, but in very uninteresting
ways. So, it is not surprising that conceptual simplification generates
empirical problems. Here’s one that Chomsky did not mention in his lecture:
there is plenty of evidence that Gs manipulate some kinds of phrases and not
others. Thus, English has VP fronting and VP ellipsis; French does not.[1]
Or English overtly moves WHs while Chinese does not. This fact has two sides. First, that English
moves verb phrases, not just verbs.
Indeed, English doesn’t like to move verbs, but finds moving verb phrases just fine. And second that
English seems to care less how big
the verb phrases are. The displaced
element can be one word or arbitrarily large so long as it is a verb phrase. The fact that phrases (Max
Ps) move (rather than X′s, say) is considered a descriptive staple.[2]
Such data make sense if units in CS are phrase-like objects (viz. labeled
constituents). But sans labels (and hence sans notions like XP and X’) well,
let’s just say that why/how we get these kinds of data is a bit of a puzzle.
One that I am sure that Chomsky would welcome and I hereby bequeath to
him.
I should add that I believe that these kinds of facts,
staples of the GG literature since “the earliest days of Generative Grammar,”
are very problematic for an approach like Chomsky’s. To date (for the last
decade at least), we have stopped worrying about such data, relegating their
eventual understanding to unknown principles of pied-piping. Maybe, but it is
worth noting that one of the classical reasons for postulating labeled phrases
and rules that target them are these kinds of constituency tests. And these
tests, at least to me, seem more robust than any of the data that Chomsky cites
in lecture 3 (e.g. EPP and Subject/object asymmetries).[3]
2. The Minimal Labeling Algorithm
Ok, let’s get back to lecture 3.
So where do labels come from? They are the product of the conceptually
minimal labeling algorithm (MLA); a rule that looks for the most “prominent”
expression in a set. So, if Merge
creates objects like {a,b} then at Transfer, MLA applies to find the most
prominent element in the set, and that is the set’s Label. “Most prominent element” means the least
embedded atom. This works well, Chomsky notes, in the standard head-complement
case where a is an atom (viz. an X0
in X’ terms) and b is a set of atoms
(an XP in X’ terms). In such a case a
is more prominent than anything in b.
Note that the MLA assumes that labels must
be atoms for otherwise b is as
prominent as a is. So, the MLA codes part of what X’ theory did:
it labels some complex expressions (sets) in terms of properties of the head (in X’ terms) of that complex
expression (set). Chomsky assumes that this application of MLA is conceptually
unproblematic as it is very simple (i.e. minimal search) and can apply
unambiguously (i.e. there must be exactly one most prominent atom to find).[4]
Take note of this non-ambiguity assumption, for it plays a major role in
Chomsky’s account.
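To fix ideas, here is a toy rendering of the MLA as just described, written in Python purely for illustration. The set encoding, the function names, and the use of an exception for ambiguity are my own glosses, not anything Chomsky provides; the sketch just implements "find the least embedded atom, and fail if there is more than one".

```python
# Toy sketch of the MLA: syntactic objects are frozensets built by Merge,
# atoms (lexical items) are strings, and the label of an object is its least
# embedded atom.  The encoding and names are mine, for illustration only.

def atoms_at_depth(so, depth=0):
    """Yield (atom, depth) pairs for every atom inside a syntactic object."""
    if isinstance(so, str):              # an atom
        yield so, depth
    else:                                # a set created by Merge
        for part in so:
            yield from atoms_at_depth(part, depth + 1)

def mla(so):
    """Return the most prominent (least embedded) atom, or fail on ambiguity."""
    found = list(atoms_at_depth(so))
    min_depth = min(d for _, d in found)
    candidates = [a for a, d in found if d == min_depth]
    if len(candidates) != 1:
        raise ValueError(f"labeling ambiguous among {candidates}")
    return candidates[0]

# Head-complement case: an atom merged with a set of atoms labels unambiguously.
vp = frozenset({"eat", frozenset({"the", "dog"})})
print(mla(vp))                           # -> 'eat'

# Two merged atoms leave the MLA with no unique choice.
dp = frozenset({"the", "dog"})
# mla(dp)                                # raises: ambiguous between 'the' and 'dog'
```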
Chomsky next turns to the problematic cases.[5]
First he considers the old “bottom of the tree” problem. Merge can combine two
atoms to yield a structure where a
and b are both atoms. An example might be {the,dog} or {eat, it}.
Here we would seem to have two atoms and so application of the MLA is ambiguous
(something that Chomsky considers a computationally unacceptable situation (I’m
not sure why; perhaps because G operations must be deterministic?)). He resolves this by
adopting a Distributed Morphology proposal concerning categorization. In
particular, he assumes that lexical atoms are not Ns, Vs, As, etc. but roots.
Being an N, V or A is not a property of an atom, but a relation between a root
(R) (I am using R because Blogger does not like the standard root symbol) and a
functional expression. Thus, a noun like dog
is not an atom but a little set (viz. {n, R(dog)}).
To get the labeling to work out right, Chomsky makes a
second assumption: roots are too “weak” to label (i.e. they are inherently
incapable of labeling). Given this, the MLA must choose n as label of {n,R(dog)} for the root is not a potential label.[6]
Thus, the MLA identifies the shallowest atom
that can act as a label in an {a,b} structure.
In sum, the “bottom of the tree” problem (misnamed as there
are no trees if Chomsky is right and thinking in tree terms is positively
harmful to your theoretical linguistic health, as Chomsky notes (see below))
follows from the MLA, a rule that seeks out the most prominent potential
labeler.
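Continuing the toy sketch from above (same caveats: my encoding, not Chomsky's), the root-weakness assumption can be rendered as a filter on label candidates. Writing roots as strings like "R(dog)" and categorizers as "n", "v", "a", the {the, dog} style ambiguity disappears for {n, R(dog)} because the root is simply never a candidate.

```python
# Extension of the toy labeler: roots (written "R(...)" here) are "too weak"
# to label, so they are excluded from the candidate set before minimal search.
# This is my own rendering of the assumption, not a quote of the lectures.

def can_label(atom):
    return not atom.startswith("R(")     # roots cannot serve as labels

def mla_with_weak_roots(so):
    found = [(a, d) for a, d in atoms_at_depth(so) if can_label(a)]
    min_depth = min(d for _, d in found)
    candidates = [a for a, d in found if d == min_depth]
    if len(candidates) != 1:
        raise ValueError(f"labeling ambiguous among {candidates}")
    return candidates[0]

dog = frozenset({"n", "R(dog)"})
print(mla_with_weak_roots(dog))          # -> 'n'; the root is not in the running
```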
Comment: Chomsky’s assumption is the thin end of a very
dangerous wedge, I believe. Recall the minimalist project is to minimize the
grammatically parochial in FL. This has been the impetus behind eliminating
levels and simplifying rules to their conceptually bare minimum (e.g. eliminate
traces). Indeed, as Chomsky has previously argued, we don’t want grammar
internal primitives for they exacerbate Darwin’s Problem (DP). With this in
mind, what are we to make of elements like n,
v, a, the functional atoms that convert roots into categories? Aren’t these quintessentially grammar
internal elements? And if so, don’t they need to get into FL as well? Put
another way, roots are just lexical atoms. Categories like n,v,a are grammatical formatives that turn lexical atoms into
grammatically manipulable objects. This
suggests that in their absence, LSOs cannot interface with CI. But it seems
unlikely to me that n,v,a are CI
interface categories. But if not, they are grammatical constructs whose
existence exacerbates DP.
Perhaps these functional elements relate to lexicalization,
the other mystery Chomsky often discusses. As noted in discussion of lecture 1
(here),
Chomsky has identified two very distinctive properties of natural language:
hierarchical recursion and the large number and odd interpretive properties of
lexical items. Maybe n,v,a relate to
the mysterious issues surrounding lexicalization. At any rate, it is not clear
to me why Merge can’t work with roots alone or why roots should be so
grammatically impotent. We can assume
this and it works. But the claim that the weakness assumption is conceptually
minimal still needs some argument, I believe. Btw, I generally stiffen when
people use ‘weak’ and ‘strong’ in explanations. They are words that often
signal that we don’t know what we are talking about. Here, ‘weak’ just means
that it cannot label. Don’t be fooled into thinking that it explains why labeling is impossible. It
doesn’t. It’s just a stipulation to make the trains run on time.[7]
Second point: It would be nice to have some more examples to
play with. The dog is easy when
compared with eat it. What’s the
structure of the latter? It cannot be {R(eat), {n, R(it)}} for then this could not be labeled at all, or it
would be labeled n as n is the most prominent label-ful
expression. Nor can it be {{v, R(eat)},
it} for the MLA would label this an it-phrase.
Nor can it be {{v, R(eat)}, {n, R(it)}} for
here the MLA cannot apply unambiguously. So what is it? Note that this is just
the bottom of the tree problem (i.e. {X0, Y0}) all over
again, but this time with roots factored in. If anyone has any ideas about
this, please let me/us know. One unfortunate feature of the lectures that I
mentioned at the outset is the dearth of illustrative examples. There are a
few, but we really don’t get to see many full-fledged derivational details, and
for fixing ideas this is definitely not a plus. Moreover, Chomsky does not
respond well when asked to provide concrete examples. David Pesetsky and Sabine
Iatridou (and a few others who I could not identify) asked for these (at least
obliquely) during the lectures, but without success. Oh well, I guess it’s up
to us. So let me know.
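For what it's worth, running the toy labeler from above over the three candidate structures for eat it reproduces exactly the outcomes sketched in the preceding paragraph (again, the encoding is mine and only as good as the assumptions behind it).

```python
# The three candidate structures for "eat it" discussed above, fed to the toy
# labeler.  Each comes out exactly as badly as the text predicts.

c1 = frozenset({"R(eat)", frozenset({"n", "R(it)"})})
c2 = frozenset({frozenset({"v", "R(eat)"}), "it"})
c3 = frozenset({frozenset({"v", "R(eat)"}), frozenset({"n", "R(it)"})})

print(mla_with_weak_roots(c1))   # -> 'n'   (a nominal label for "eat it")
print(mla_with_weak_roots(c2))   # -> 'it'  (an it-phrase)
# mla_with_weak_roots(c3)        # raises: ambiguous between 'v' and 'n'
```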
[1]
I should say, “English” to note that English really does not exist (it is an
abstraction). But I won’t. Put in scare quotes where appropriate.
[2]
Thus we can get (i) but not (ii):
(i)
Persuade Mary that Frank loved her John did
(ii)
*Persuaded Mary that Frank loved her John
(ii) would be derived by moving T’ (T0+vP)
to the front on analogy with (i) where vP is fronted. At any rate, there is virtually no evidence
that Gs ever target X’-projections, a point (and problem) that Chomsky made
(and addressed) in his 1995 book.
[3]
The other classical data motivating endocentric labeling are selection and
subcategorization restrictions. The relevant kinds of data are reviewed in Understanding Minimalism chapter 6.
[4]
In the simple case, the assumption that labels are atoms follows from the
non-ambiguity thesis. However, in the more complex cases that Chomsky
discusses, it is not clear why
complex elements cannot serve as labels. Nor is it clear to me whether this
makes any difference.
[5]
Chomsky has long taken the head-complement relation as grammatically basic.
This was true in LGB where objects
were the best-behaved of elements, and this is true in this case as well, the
simple head-complement case being the unproblematic one. You see it as well in his objections to
Specifiers. On the face of it, it’s hard to see why the distinction between
first merged element and second merged element should make much of a
grammatical difference. But Chomsky seems to have the intuition that first
merged elements are less problematic grammatically than non first merged
expressions. If anyone knows why, send me a note.
[6]
If this is correct, then the search procedure is not “minimal.” The MLA must
scan the entire structure, i.e. both the root and the n to find the head, at least in this case. Rather what’s critical
is the non-ambiguity assumption, i.e. the assumption that the choice of label is unambiguous.
[7]
In addition to informal concepts like weak and strong, another sure sign of
mystery mongering is capitalization. Remember SUBJECT or CHAIN?
Monday, February 3, 2014
Derivation Trees: Syntacticians' Best Friend?
Monday, August 26, 2013
Got Culture?
In the last chapter of Dehaene’s Reading in the Brain he speculates
about one of the really big human questions: whence culture? The book’s big
thesis, concentrating on reading and writing as vehicles for cultural
transmission, is the Neuronal Recycling Thesis (NRT). The idea is simple:
culture supervenes on neuronal mechanisms that arose to serve other ends. Think
exaptation as applied to culture. Thus,
reading and writing are underpinned by proto-letters, which themselves live on
ecologically natural patterns useful for object recognition. So too, the hope goes, for the rest of what
we think of as culture. However, as Dehaene quickly notes, if this is the
source, and “we share most, if not all of these processors [i.e. recycled
structures NH] with other primates, why are we the only species to have
generated immense and well-developed cultures” (loc 4999). Dehaene has little
patience for those who fail to see a qualitative difference between human
cultural achievements and those of our ape cousins.
…the scarcity of animal cultures
and the paucity of their contents stand in sharp contrast to the immense list
of cultural traditions that even the smallest human groups develop
spontaneously. (loc 4999)
Dehaene specifically points to the absence of “graphic
invention” in primates as “not due to any trivial visual or motor limitation”
or to a lack of interest in drawing, apparently (loc 5020). He puts the problem
nicely:
If cultural invention stems from
the recycling of brain mechanisms that humans share with other primates, the
immense discrepancy between the cultural skills of human beings and chimpanzees
needs to be explained. (loc 5020)
He also surveys several putative answers, and finds them
wanting. His remarks on Tomasello (loc 5046-5067) seem to me quite correct,
noting that though Tomasello’s mind-reading account might explain how culture
might spread and how its achievements might be retained cross-generationally:[1]
…it says little…about the initial
spark that triggers cultural invention. No doubt the human species is
particularly gifted at spreading culture – but it is also the only species to create culture in the first place. (loc
5067, his emphasis)
So what’s Dehaene’s proposal?
My own view is that another
singular change was needed - the capacity to arrive at new combinations of
ideas and the elaboration of a conscious mental synthesis (loc 5067).
This is quite a mouthful, and so far as I can see, what
Dehaene means by this is that our frontal lobe got bigger and that this
provided a “neuronal workspace” whose main function is to “assemble, confront,
recombine, and synthesize knowledge” (loc 5089).
I don’t find this particularly enlightening. It’s
neuro-speak for something happened, relevant somethings always involving the
brain (wouldn’t it be refreshing if every once in a while the kidney, liver or
heart were implicated!). In other words, the brain got bigger and we got
culture. Hmm. This might be a bit unfair. Dehaene does say more.
He notes that the primate cortex, in contrast to ours, is
largely modular, with “its own specific inputs, internal structure, and
outputs.” Our prefrontal cortex, in contrast, “emit[s] and receive[s] much more diverse
cortical signals” and so “tend[s] to be less specialized.” In addition, our
brains are less “modular” and have greater “bandwidth.” This works to prevent
“the division of data and allows our behavior to be guided by any combination
of information from past or present experience” (loc 5089).
Broken down to its essentials, Dehaene is here identifying
the demodularization of thought as the key ingredient to the emergence of
culture. As he notes (loc 5168), in this he agrees with Liz Spelke (and others)
who has argued that the general ability to integrate information across modules
is what spices up our thinking beyond what we find in other primates. Interestingly for my purposes here, Spelke
ties this capacity for cross-module integration to the development of linguistic
facility (see here).
This assumption, that language is a necessary condition for
the emergence of the kind of culture we see in humans, is consistent with the
hypothesis Minimalists have been assuming (following people like Tattersall (here))
that the anthropological “big bang,” which occurred in the last 25,000-50,000 years,
piggybacked on the emergence of FL in the last 50,000-100,000 years. Moreover,
it’s language as module buster that gets the whole amazing culture show on the
road.
But what features of language make it a module buster? What allows grammar to “assemble and
recombine” otherwise modular information? What’s the secret linguistic sauce?
Sadly, neither Dehaene nor Spelke says. Which is too bad, as me and my lunch buddies
(thx Paul, Bill) have discussed this question off and on for several years now,
without a lot to show for it. However, let me try to suggest a key
characteristic that we (aka I) believe is implicated. The key is syntax!
The idea is that FL provides a general-purpose syntax for
combining information trapped within modules.
Syntax is key here, for I am assuming (almost certainly wrongly, so feel
free to jump in at any point) that what makes information modular is some feature of
the module-internal representations that makes it difficult for them to
“combine” with extra-modular information. I say syntax because, once information trapped within a module can combine
with information in another module, it appears that, more often than not, the
combination can be interpreted. Thus, it’s not that the combination of
modularly segregated concepts is semantically indigestible; rather, the problem
seems to be getting the concepts to talk to one another in the first place, and,
I take this to mean, to syntactically combine. So module busting will amount to
figuring out how to treat otherwise distinct expressions in the same way. We
need some kind of abstract feature that, when attached to an arbitrary
expression, allows it to combine with any other expression from any other
module. What we need, in effect, is
what Chomsky called an “edge feature” (EF), a thingamajig that allows expressions to
freely combine.
Now, if you are like me, you will not find this proposal a
big step forward for it seems to more name a solution than provide one. After
all, what can EFs be such that they possess such powers? I am not sure, but I am pretty confident that
whatever this power is, it’s purely syntactic. It is an intrinsic property of
lexical atoms and it is an inherited property of congeries of such (i.e.
outputs of Merge). I have suggested (here)
that EFs are, in fact, labels, which function to close Merge in the domain of
the lexical items (LIs). In the same place I proposed that labeling is the
distinctively linguistic operation, which, in concert with other cognitively
recycled operations, allowed for the emergence of FL.
How might labels do this?
Good question. An answer will require addressing a more basic question: what
are labels? We know what they must do:
they must license the combination both of lexical atoms and complexes of
such. Atomic LIs are labels. Complexes of LIs are labeled in virtue of
containing atomic ones. The $64,000 question (doesn’t sound like much of a
prize anymore, does it?) is how to characterize this. Stay tuned.
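To make the closure idea a little more concrete, here is a very speculative sketch (mine, not a considered proposal): if every object Merge builds carries a label drawn from the lexical atoms it contains, then Merge's outputs are the same kind of thing as its inputs and can be merged again, which is all that "closing Merge in the domain of LIs" requires. Which atom projects is simply stipulated here; saying how that choice is made is, of course, the $64,000 question.

```python
# Speculative toy sketch of "labels close Merge in the domain of lexical
# items".  All names are invented for illustration; the projecting atom is
# handed in by stipulation rather than computed.

from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class SO:
    label: str        # an atomic lexical item, inherited from a daughter
    parts: tuple      # the two merged daughters

Syn = Union[str, SO]

def label_of(x: Syn) -> str:
    return x if isinstance(x, str) else x.label

def merge(a: Syn, b: Syn, projector: Syn) -> SO:
    """Combine a and b; the output's label comes from one of its daughters."""
    assert projector in (a, b)
    return SO(label=label_of(projector), parts=(a, b))

vp = merge("eat", "it", projector="eat")    # labeled 'eat'
tp = merge("will", vp, projector="will")    # Merge re-applies to its own output
print(label_of(tp))                         # -> 'will'
```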
So, culture supervenes on language and language is the
recycling of more primitive cognitive operations spiced with a bit of labeling.
Need I say that this is a very “personal” (read “extremely idiosyncratic and
not currently fashionable”) view?
Current MP accounts are very label-phobic. However, the question Dehaene raises is a
good one, especially for theories like MP that presuppose lots of cognitive
recycling.[2] It’s not one whose detailed answer is
anywhere on the horizon. But like all good questions, I suspect that it will
have lots of staying power and will provide lots of opportunities for fun
conversations.