Monday, July 14, 2014

What's in a Category? [Part 1]

Norbert's most recent comments on Chomsky's lecture implicitly touched on an issue that I've been pondering ever since I realized how categories can be linked to constraints. Norbert's primary concern is the role of labels and how labeling may drive the syntactic machinery. Here's what caught my attention in his description of these ideas:
In effect, labels are how we create equivalence classes of expressions based on the basic atomic inventory. Another way of saying this is that Labeling maps a "complex" set {a,b} to either a or b, thereby putting it in the equivalence class of 'a' or 'b'. If Labels allow Select to apply to anything in the equivalence class of 'a' (and not just to 'a' alone), we can derive [structured linguistic objects] via Iteration.
Unless I'm vastly misconstruing Norbert's proposal, this is a generalization of the idea of labels as distribution classes. Linguists classify slept and killed Mary as VPs because they are interchangeable in all grammatical sentences of English. Now in Norbert's case the labels presumably aren't XPs but just lexical items, following standard ideas of Bare Phrase Structure. Let's ignore this complication for now (we'll come back to it later, I promise) and just focus on the issue that causes my panties to require some serious untwisting:
  • Are syntactic categories tied to distribution classes in some way?
  • If not, what is their contribution to the formalism?
  • What does it mean for a lexical item to be, say, a verb rather than a noun?
  • And why should we even care?
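To make the labeling idea from Norbert's quote a bit more tangible, here is a minimal Python sketch under simplifying assumptions of my own. Syntactic objects are either lexical items (plain strings) or two-membered sets built by Merge, and the caller simply stipulates which member projects. Labeling then maps every complex object to a lexical item, and two objects count as equivalent iff they are mapped to the same one. The names `Phrase`, `merge`, `label_of` and `same_class` are hypothetical and purely illustrative.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Phrase:
    """A set {a, b} built by Merge, together with the label assigned to it."""
    members: frozenset  # the two merged syntactic objects
    label: str          # the lexical item the set is mapped to


# A syntactic object is either a lexical item (a plain string) or a Phrase.
SyntacticObject = Union[str, Phrase]


def label_of(x: SyntacticObject) -> str:
    """Return the lexical item whose equivalence class x belongs to."""
    return x if isinstance(x, str) else x.label


def merge(a: SyntacticObject, b: SyntacticObject, head: SyntacticObject) -> Phrase:
    """Form {a, b} and label it with the label of the designated head.

    Which member projects is stipulated by the caller (an assumption of this
    sketch); deciding it automatically is what a theory of labeling must do.
    """
    if head != a and head != b:
        raise ValueError("the label must come from one of the merged objects")
    return Phrase(frozenset({a, b}), label_of(head))


def same_class(x: SyntacticObject, y: SyntacticObject) -> bool:
    """x and y are equivalent iff labeling maps them to the same lexical item."""
    return label_of(x) == label_of(y)


# Iterating Merge: {eat, {the, apple}} ends up in the equivalence class of 'eat'.
dp = merge("the", "apple", head="the")
vp = merge("eat", dp, head="eat")
print(label_of(vp))            # eat
print(same_class(vp, "eat"))   # True
print(same_class(dp, "eat"))   # False
```

Nothing in this toy says where the stipulated choice of head comes from, which is exactly the gap the questions above are about.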

Criteria for Assigning Parts of Speech

Let's assume that we are working in an old-school syntactic framework where each lexical item (LI) has one of finitely many parts of speech (POS), and POS-based subcategorization requirements interact with X'-style projection to capture the head-argument asymmetry. In such a system, POS act as equivalence classes over LIs. Projection serves two purposes:
  1. limited percolation of certain POSs so that subcategorization can be kept strictly local, and
  2. limiting the domain of subcategorization.
The second point captures the fact that heads supposedly do not select for the argument of an argument. So if a verb can take a DP as an argument, it does not matter what the determiner's NP argument looks like. This follows immediately in the standard setup because even if there are different kinds of NP, that information is never percolated high enough to matter at the point where selection of the DP takes place.
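The locality of selection described above can be sketched as a toy model, under assumptions of my own: a phrase exposes only the category percolated from its head, the subcategorization frames in the hypothetical `SUBCAT` table are made up for illustration, and `devour` stands in for any transitive verb. The point is that `licensed` never lets a head look inside its complement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class XP:
    """A projected phrase; from the outside only `label` is visible."""
    label: str                           # category percolated up from the head
    head: str                            # the lexical item heading the phrase
    complement: Optional["XP"] = None    # the head's (single) argument, if any


# Hypothetical subcategorization frames: head -> category of its complement.
SUBCAT = {"devour": "DP", "the": "NP", "every": "NP"}


def licensed(phrase: XP) -> bool:
    """Check every head's selection requirement, one level at a time.

    A head only compares its frame with the label of its complement, so the
    determiner's NP inside a DP is invisible to a verb selecting that DP.
    """
    wanted = SUBCAT.get(phrase.head)  # None means: takes no complement
    got = phrase.complement.label if phrase.complement is not None else None
    if wanted != got:
        return False
    # Embedded phrases are checked separately, again purely locally.
    return phrase.complement is None or licensed(phrase.complement)


dp1 = XP("DP", "the", XP("NP", "apple"))
dp2 = XP("DP", "every", XP("NP", "idea"))
print(licensed(XP("VP", "devour", dp1)))                # True
print(licensed(XP("VP", "devour", dp2)))                # True: devour only sees the DP label
print(licensed(XP("VP", "devour", XP("NP", "apple"))))  # False: devour wants a DP
```

Whatever distinctions one might draw among NPs, they never reach the point where the verb's frame is checked, which is the "no selection of an argument's argument" effect.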

A priori, there are several properties we could use to put LIs into groups:
  • phonological weight
  • morphology
  • string contexts (which strings can be wrapped around the LI and yield a grammatical sentence?)
  • arity
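The string-context criterion can be stated precisely for a finite toy language. The sketch below (the sentence list and the function names are my own, purely for illustration; real languages are of course infinite) collects, for each expression, every pair of left and right strings that turns it into a grammatical sentence, and then groups expressions with identical context sets. In this toy fragment slept patterns with killed Mary, which is the reasoning behind calling both of them VPs.

```python
from collections import defaultdict

# A hypothetical finite fragment standing in for "all grammatical sentences".
LANGUAGE = {
    "John slept", "Mary slept",
    "John killed Mary", "Mary killed Mary",
    "John killed John", "Mary killed John",
}


def contexts(expr: str, language: set) -> frozenset:
    """All (left, right) string pairs that wrap expr into a grammatical sentence."""
    target = expr.split()
    n = len(target)
    found = set()
    for sentence in language:
        words = sentence.split()
        for i in range(len(words) - n + 1):
            if words[i:i + n] == target:
                found.add((" ".join(words[:i]), " ".join(words[i + n:])))
    return frozenset(found)


def distribution_classes(exprs: list, language: set) -> list:
    """Group expressions that occur in exactly the same set of contexts."""
    classes = defaultdict(list)
    for e in exprs:
        classes[contexts(e, language)].append(e)
    return list(classes.values())


classes = distribution_classes(
    ["slept", "killed Mary", "killed John", "John", "Mary", "killed"], LANGUAGE)
print(classes)  # [['slept', 'killed Mary', 'killed John'], ['John', 'Mary'], ['killed']]
```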

Phonological Weight is Irrelevant

Phonological weight is clearly not a helpful notion in this case. Where weight seems to matter, such as with extraposition and Heavy NP shift, it's a measure of the number of LIs rather than their individual length: "Yesterday I introduced to Bill the man" does not sound much worse than "Yesterday I introduced to Bill the supercalifragilisticexpialidocious man", whereas "Yesterday I introduced to Bill the man that you met at Sue's party" is a lot better than the latter despite containing fewer syllables.

I have heard claims that in Latin, fronting of prepositions is dependent on syllable weight such that propter can be fronted but cum cannot. But even if this is true (I have been unable to verify the claim and feel like I've struggled with a few fronted cums during my high school years), it is a very peculiar exception. In general, syntax doesn't care about phonology.

Arity is Irrelevant

A more interesting case is arity. In principle, it seems worthwhile to classify LIs according to how many arguments they take. Determiners and prepositions would cluster together with transitive verbs, possessive 's with ditransitives, and so on. There are many systems for which this is the right way of carving up the inventory. In order to build well-formed formulas of propositional logic, for instance, you only need to know the arity of the symbols. But for language, this is utterly useless. Determiners do not have the same distribution as transitive verbs. Heck, they don't even have the same semantic type (<<e,t>,e> vs. <e,<e,t>>), which is where one would expect arity to be most apparent.
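Here is what arity-only classification looks like where it does work. The sketch checks the well-formedness of propositional formulas written in prefix (Polish) notation using nothing but a table of arities; the symbol names and the prefix format are my own choices. Applying the same criterion to a small hypothetical lexicon (counting internal arguments only, another assumption) lumps the determiner and the preposition together with the transitive verb, and possessive 's with the ditransitive, which is exactly the classification that fails for language.

```python
from collections import defaultdict

# Arity of each symbol of a toy propositional language (prefix notation).
ARITY = {"p": 0, "q": 0, "r": 0, "NOT": 1, "AND": 2, "OR": 2}


def well_formed(tokens: list, arity: dict = ARITY) -> bool:
    """Decide well-formedness of a prefix-notation formula from arities alone.

    `need` counts the subformulas still missing: it starts at 1, and every
    symbol fills one open slot while opening `arity[symbol]` new ones.
    """
    need = 1
    for tok in tokens:
        if need == 0 or tok not in arity:
            return False  # leftover material after a complete formula, or unknown symbol
        need += arity[tok] - 1
    return need == 0


print(well_formed(["AND", "p", "NOT", "q"]))  # True
print(well_formed(["AND", "p"]))              # False: one argument missing
print(well_formed(["p", "q"]))                # False: two formulas, not one

# The same criterion on a hypothetical lexicon, counting internal arguments only.
LEXICON_ARITY = {"the": 1, "on": 1, "devour": 1, "'s": 2, "give": 2, "sleep": 0}
by_arity = defaultdict(list)
for word, n in LEXICON_ARITY.items():
    by_arity[n].append(word)
print(dict(by_arity))  # 'the', 'on' and 'devour' share a class; so do "'s" and 'give'
```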

So two viable candidates are already out. This is a crucial piece of the puzzle, actually: whatever we have to say about categories and labeling, it should explain why the concepts above do not yield natural syntactic classes.

Morphology and Distribution: Quite a Mess

This leaves morphology and string contexts. If you have taught an undergrad-level syntax course before, you know that this is the operational semantics we attach to POSs. We tell our students that intransitives, transitives and ditransitives form a natural class because they all show the same verbal morphology that is very different from that of, say, determiners. And we bolster that argument by showing that since they all project the same VP label, they should have the same syntactic distribution, and lo and behold, they do. They also are all subject to processes that target VPs, for instance VP-ellipsis. So they do indeed cluster together with respect to a variety of morphological and syntactic properties. But like pretty much everything in a linguistics undergrad curriculum, this is a half-truth at best.

First, the morphology argument becomes tricky as soon as you move beyond English. In several Germanic languages, determiners inflect in a manner that is similar to adjectives. So if you are lucky enough to have a sharp polyglot sitting in your course, they will happily bring this to the entire class's attention. Your reaction will probably be to sharpen the morphological argument: adjectives have comparative and superlative forms, determiners do not. Hence the two are still distinct. But some adjectives, your students complain, do not have these forms either, for instance the analogues of alleged or former. At this point you switch gears: you admit that the morphological argument is inconclusive, but fortunately there's also the distribution test. All these adjectives can occur between a determiner and an NP; determiners cannot. Case solved!

Except that really smart students will keep pressing you. Some determiners do occur in such positions, as in his every move. The adjectives that lack comparatives and superlatives cannot be used predicatively, so why do we put them in the same class? What about languages that use morphology to explicitly mark the difference between intransitives and ditransitives? If we need only some amount of overlap in all these properties, how much overlap? Why does everything have to be so fuzzy? Why are there no clear-cut answers without millions of exceptions? Gosh I hate this class, syntax sucks and you're the worst teacher ever!
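The students' question of how much overlap is enough can be made concrete. The sketch below gives each word a set of properties taken from the examples above (my own simplified coding, not a serious analysis) and measures overlap as shared properties divided by all properties either word has, i.e. the Jaccard index; both the measure and the thresholds are assumptions chosen for illustration. On this coding former patterns with every rather than with tall, and whether former and tall land in the same class depends entirely on an arbitrary threshold.

```python
from itertools import combinations

# Simplified property coding based on the examples in the text:
# 'former' lacks comparatives and predicative uses, and 'every' can sit
# between a determiner and a noun ("his every move").
PROFILES = {
    "tall": {"comparative", "predicative", "between_D_and_NP"},
    "former": {"between_D_and_NP"},
    "every": {"between_D_and_NP"},
}


def overlap(a: set, b: set) -> float:
    """Jaccard index: shared properties over all properties either word has."""
    return len(a & b) / len(a | b)


def grouped_together(x: str, y: str, threshold: float) -> bool:
    """Treat two words as one category if their overlap reaches the threshold."""
    return overlap(PROFILES[x], PROFILES[y]) >= threshold


for x, y in combinations(PROFILES, 2):
    print(f"{x} ~ {y}: {overlap(PROFILES[x], PROFILES[y]):.2f}")

# The verdict for 'tall' vs. 'former' flips with the (arbitrary) threshold.
print(grouped_together("tall", "former", 0.3))  # True
print(grouped_together("tall", "former", 0.5))  # False
```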

A Plea for Action

The truth is, there are no perfect answers to your students' questions because there never was a concerted effort to work out that part of syntax (the closest I can think of is cartography, which does little to illuminate the status of POS). My impression is that categories simply weren't considered particularly important to get the theory off the ground; the traditional vocabulary of verb, noun, adjective, adverb, determiner, preposition, and complementizer does the job perfectly fine in most cases. So what we ended up with is a canonical terminology that can be linked to clusters of properties that are realized by prototypical members of the respective classes, but how these criteria are applied in practice is a very fuzzy affair that quickly gives rise to inconsistencies.

Now if categories were just an inessential addendum to the theory, we could probably keep going in the same laid-back fashion. But handwaving is no longer good enough. Minimalists should frown on the idea of a list of POS as yet another substantive universal of UG. Everyone should be miffed that we don't have a good understanding of what POS contribute to the theory, be it in the abstract sense as part of the labeling debate or more concretely in that we can't even define what it means to be a verb. We have reached a point in the development of the theory where we can't make progress on several issues without rethinking the notion of category/POS. More on that next time.


  1. This is a really interesting question -- I think the question of where the categories (or features for that matter) come from is a crucial choice point for theories. Because if they aren't innate, then UG presumably can't contain any statement that refers to them, and that would rule out much of the current conception of UG.
    But I think that maybe draws the innate/learned dichotomy a little too crudely.

    I agree that it is surprising that it isn't discussed more -- if you like playing these games, then Norbert's proposal that Label is the key ingredient is much more plausible than Chomsky's Merge, but if so that only makes the question of where the labels come from more acute.

  2. I second Alex's and your point re labels. It would be nice to know where they come from and what they are. In early MP papers, like ch 4 of the Black Book, labels were LIs themselves. Chomsky did not code for V/N etc but simply 'eat'Ps and 'dog'Ps. The problem is that there are clear generalizations across these phrases, e.g. 'eat'Ps function like 'sleep'Ps and 'give'Ps. Why? Well, we would like to say that they are all VPs, but then the label issue arises again.

    In Aspects, these issues were part of the theory of substantive universals, as Thomas notes. And as a matter of fact, there has been very little work on what substantive universals are and how much they might vary. There was some interesting work on this by Grimshaw and Pesetsky (and I think Pinker originally) on canonical realization rules wherein semantic classes project to syntactic classes. I have no idea how well that approach turned out, however. Maybe someone who knows about it could bring us up to speed. At any rate, what seems evident to me is that GG has had many things to say about structural universals, but very little about substantive universals (at least in syntax).

    Last point: It's not only categories that have eluded insight; even what a lexical item is has been hard to pin down. What's the difference between a concept and a lexical item that tags that concept? Chomsky and Paul Pietroski have worried about this consistently over the last decade (if not longer for Chomsky), but all we have is a bunch of interesting examples and a few observations about what they might entail. So it is not only labels that are obscure. Everything concerning LIs is.

    1. Chomsky did not code for V/N etc but simply 'eat'Ps and 'dog'Ps. The problem is that there are clear generalizations across these phrases, e.g. 'eat'Ps function like 'sleep'Ps and 'give'Ps. Why? Well, we would like to say that they are all VPs, but then the label issue arises again.
      That's actually pretty close to how I think we should approach the issue. I don't want to ruin the suspense at this point; everything will be revealed in the next post. Let's just say for now that Alex C has done some interesting work on categories in MCFGs that, although it does not carry over straightforwardly to Minimalism for various reasons, outlines how one could get a POS-less system off the ground without losing the generalizations provided by POSs.

  3. I was confused by your mention of cartography as coming close to an effort to work out this part of syntax. Doesn't cartography exacerbate the parts-of-speech problem? (Since it explodes the number of categories that need to be distinguished, and in practice – that is, in most languages – most or all of them have no morphology whatsoever.)

    1. closest != close ;)

      Yes, cartography exacerbates the problem, and as I mentioned in the post I don't think it tells us much about POS (that's fine, cartography has different goals). But it's also the only Minimalist work I know of that goes beyond the handful of standard categories. So it constitutes at least some kind of work related to POS.

    2. I think a possible angle on the pedagogical problem lies in the ideas of a) subclasses and b) the distinction between ungrammaticality and 'surrealistic anomaly'. The latter handles things like #colorless green ideas, which mess up the distributional criteria if not managed appropriately, and the former handles the endlessly multiplying subdivisions of the major parts of speech ('modal' vs 'predicative' adjectives, 'main' vs 'auxiliary' verbs in English, ...). The pedagogical problem is that intelligent undergrads can manage these two ideas being introduced at the beginning, but the mediocre mass cannot, in my experience (maybe somebody else could explain it better, but I think it's basically just too hard). Distribution and inflection giving different but mostly consistent indications of the PoS system is also helpful for the bright ones.

      But we also need to figure out our story about what the syntactic categories are doing in the first place.

  4. Delayed response to this interesting post. I wonder if the issue isn't really that the notion of `lexical item' isn't quite right. People seem to think that there's some kind of priority that can be given to what we intuitively think of as words, or, less intuitively, as morphemes. But if the proposals about category-less roots have any traction, and the pronounceable bits are really compositions of roots plus various functional elements, then we don't have a well defined notion of lexical item that comports with the `old-school' intuitions that Thomas lays out above. If we take lexical item to mean `atomic unit that enters into syntactic composition', we have little or no guarantee that that correlates isomorphically with a phonological/morphological unit, or that such atomic units have category features in the traditional sense at all (think of Borer's proposal that roots are categorised in a kind of gestalt way by their context). I guess this is the area where cartographic ideas do actually have something to say about the notion of category: from that perspective a category is a position in a structure, defined by its scopal relations to other such positions (so it's syncategoremic, hence has some relationship with distribution), while a feature is a paradigmatically alternating property of such a position. Peter Svenonius and I riff on this a bit in our Features paper in that Handbook of Minimalist Syntax. But the relevant issue here is that we don't want to look at the distributional/morphological/other phenomenological properties of what we think of intuitively as lexical items and take those as a starting point. We want to ask what theory of syntax provides us with an explanation of these phenomena, including their weirdnesses. 

    And I think such theories require us to dissociate the notion of lexical item from morphophonology (though of course we have to have a theory that will account for their morphophonological properties), probably don't have very much that looks like traditional V or N as properties of words, and place a lot more emphasis on functional categories, which have very few members. My own take on this in my LI monograph is that even functional categories are not lexical items; they're just labels of pieces of structure built up from label-less roots, but that's maybe going a bit far for some!

  5. "probably don't have very much that looks like traditional V or N as properties of words, and place a lot more emphasis on functional categories, which have very few members"

    I'm not sure that I have seen the advantage of trading lexical categories for relations of roots to functional heads. If we can do all of this in a "natural" syntactic way, then that's great, as we now can let syntax do some of our "morphology." But often the rules that relate to the functional-root relation don't seem entirely of a piece with what takes place higher up in the "tree", and this serves to encumber the syntax in unhelpful ways. The problem is, and here I agree with David, that we don't really seem to understand what is going on.

    Moreover, words/morphemes, though proxies for what is really going on, are not bad stand-ins. In addition, they seem to be the sorts of things that psycho-ling and neuro-ling research seems able to get a handle on in fruitful ways. I suspect that when all the work is done we will find that something "like" our notion of morpheme and word is not that far off being a natural kind. Not exactly, but extensionally not that far away.

  6. Well, if Chomsky is right about the externalisation of language being an add-on, one might expect the natural kind to be the atomic components of linguistically coopted conceptual structure. Seems more likely to me.

  7. I agree, but I take it the question then is what exactly it is to "co-opt" a conceptual structure. As Chomsky has also noted, whatever lexical items are, they are very odd. They can span several "concepts" and have a kind of semantic flexibility that is quite unlike the concepts that we find in other parts of the animal kingdom. The only person I know who is thinking hard about this is Paul Pietroski, and so far I have seen nothing reducing the flexibility of our lexical items to the fact that they are syntactic atoms. Indeed, it is not clear that we directly manipulate our concepts grammatically, so much as manipulate objects that map to these in some complex way. At any rate, I agree that we should always be wary of overt morphology. But just as it is at times a good indicator of more abstract underlying structure and operations, there is no reason to think (yet) that words/morphemes are entirely orthogonal to our syntactic atoms.

  8. Again it depends on what lexical items are. Functional items and morphemes don't behave like that: they have fairly rigid meanings. It's true that non-functional items have the properties you say, but to be honest, they are even weirder and can be made up on the spot, as you can see in any sci-fi book, and one might even take imitations of animal calls etc. to be LIs in that sense (I remember reading a novel where some birds screescreed overhead). They are very interesting, but they're perhaps incidental to much of syntax.

    1. You might be right here. I have resisted the move towards focusing on functional categories for it effectively reintroduces phrase structure rules. PS rules were a way of defining structure independent of lexical insertion. Syntax as effectively the properties of functional heads endorses the same idea.

      The problem, as you know, is that to the degree that this is true, it encumbers Darwin's Problem. We need a story as to where these things came from. Why do they have the specific properties they have? Their properties look very linguistiky and so are unlikely to be reducible to more general cognitive or computational concerns. If so, they are part of UG and hence a problem.

      That said, I am sympathetic to the conclusion that this is where the action is. FL provides a sentential template in terms of functional heads with special structural properties. Lexical heads are incidental to what happens. This may be right. I just have to get used to the idea.

  9. I completely agree, and I think that's why this is exactly where a lot of intellectual action needs to be over the next few years. I think there is a way of deUGifying FCats, essentially allowing all possible scopal orders within UG, but then pruning them out during development not only via interfacing with external pld but also via interfacing with whatever turn out to be the simplest computations in non-linguistic cognition. That might be a way of getting universal scope orderings of FCats, but it leaves open the question of which ones get coopted, which I don't have any speculations about. But this does seem to me to be quite an exciting area exactly for the reason you raise: it's an excellent problem.