- what exactly it encompasses,
- why it is necessary,
- why it shows certain properties but not others.
## The Formal Problem: POS Leak

Those of you who remember my 4-part discussion of the connection between constraints and Merge know that POS are actually the locus of an immense amount of power. Too much power, in fact. If it is a restricted theory of syntax you want, you need a theory of POS, because by default, POS leak.
Don't worry, I'm not going to subject you to yet another tour de force through arcane logics and automata models; the basic point can be made in a much simpler fashion. Recall that the syntactic distribution of an LI is one of the criteria we use for determining its category, or rather, the category of the phrase it projects. That is to say, we obviously do not expect all LIs of category V to have the same distribution. For instance, transitives must be followed by a DP, whereas intransitives must not. But we still treat them as Vs because the phrases they project have the same distribution. Wherever you have the VP *slept*, you should be able to substitute the VP *killed Mary*. But that is not the case for the VP *killed herself*, since *John slept* is fine yet *John killed herself* is not.2 Thus the two VPs should actually have distinct categories, say, VP[-refl] and VP[+refl]. But if the label of a constituent is determined by the POS of its head, that means *killed* has different POS in *killed Mary* and *killed herself*. That is the basic idea that underlies my proof that constraints can be encoded directly in the POS of the grammar: both constraints and POS can be used to restrict the distribution of constituents, so we can switch between the two as we see fit.
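To make the refinement trick concrete, here is a minimal sketch in Python (all grammaticality judgments, names, and the two-context grammar are invented for illustration): phrases get grouped into categories by the set of contexts that accept them, so the reflexive constraint ends up compiled directly into the category system.

```python
from collections import defaultdict

# Contexts a VP can be plugged into, as (left, right) string pairs.
contexts = [("John", ""), ("Mary said that John", "")]

# Toy grammaticality judgments, keyed by (left, right, VP).
good = {
    ("John", "", "slept"): True,
    ("John", "", "killed Mary"): True,
    ("John", "", "killed herself"): False,  # *John killed herself
    ("Mary said that John", "", "slept"): True,
    ("Mary said that John", "", "killed Mary"): True,
    ("Mary said that John", "", "killed herself"): False,
}

def distribution(vp):
    """The set of contexts that accept this VP."""
    return frozenset((l, r) for (l, r) in contexts if good[(l, r, vp)])

# Group VPs by distribution: each block is a (refined) category.
categories = defaultdict(list)
for vp in ["slept", "killed Mary", "killed herself"]:
    categories[distribution(vp)].append(vp)

for vps in categories.values():
    print(vps)
# -> ['slept', 'killed Mary'] and ['killed herself']: the binding
#    constraint has been compiled into two distinct VP categories.
```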
This state of affairs is unsatisfying from a linguistic perspective. Distribution is a major criterion for assigning POS, yet this opens up a loophole where categories can be freely refined to capture the distributions enforced by constraints. Categories and constraints become interchangeable: we can completely do away with one as long as we keep the other. We lose generalizations (the two VPs above become completely different syntactic objects), and we allow for massive overgeneration because categories act as a loophole for smuggling in unwanted constraints. All because we do not have the understanding of POS that would allow us to restrict their use in an insightful way.
## UG POS Doesn't Answer the Question

Now it might be tempting to plug the POS leak by fixing the set of POSs across all grammars. This was suggested by Norbert at some point, but by now you'll hopefully agree with me that this is not a helpful way to go about it.
Let's assume that fixing the POS across all languages would actually stop them from leaking by limiting category refinement (I don't think it would, but that's not the point here). The crux is that this step would weaken the notion of POSs even more, to the point of being completely useless. Basically, POSs would be mostly tied to inflectional morphology (an extra-syntactic property, mind you) and syntactic distribution modulo the syntactic constraints. That is to say, we look at the distribution of a given constituent, factor out those parts of the distribution that are already accounted for by independent constraints, and then pick the POS that accounts for the remainder (by assumption one of the fixed POSs should fit the remainder).
But what exactly does this achieve? Without a fixed set of constraints, we can tweak the grammar until the remainder is 0, leaving us with no need whatsoever for POS --- and by extension labeling. And any fixed set of constraints is ultimately arbitrary because we have no principled way of distinguishing between constraints and POS. In addition, it is rather unlikely that the small set of constraints Minimalists are likely to posit would yield a small set of POS that captures the full range of distributions, as evidenced by the huge number of POS used in the average treebank.
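For concreteness, the "remainder" procedure boils down to a trivial set computation (the context sets below are invented placeholders):

```python
# Toy sets, invented for illustration.
full_distribution = {"ctx1", "ctx2", "ctx3", "ctx4"}    # observed contexts
covered_by_constraints = {"ctx2", "ctx4"}               # explained independently

remainder = full_distribution - covered_by_constraints  # what POS must cover
print(remainder)  # {'ctx1', 'ctx3'}

# The worry: keep adding constraints until covered_by_constraints equals
# full_distribution, and the remainder -- hence POS and labeling -- does
# no work at all.
```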
So what's the moral here? The link between POS and constraints is not a funny coincidence or a loophole that needs to be worked around. It reveals something fundamental about the role POS serve in our theory. But the fact that the connection to constraints also opens the floodgates of overgeneration shows that our notion of POS lacks clear delimiting boundaries. That shouldn't come as a surprise: POS are a hodgepodge of morphological and syntactic properties without a clear operational semantics in our linguistic theories.
## Categories, Categories, and Categories

Now that we have established that the problem of POS isn't just one of scientific insight but also has real repercussions for the power of the formalism, what can we do about it? Until a few weeks ago, I wouldn't have had much to offer except for a shrug. But then Alex Clark, a regular in our prestigious FoL comments section, presented some very exciting work at this year's LACL on an algebraic view of categories (joint work with Ryo Yoshinaka, not available online yet). So now I can offer you some first results coupled with my enthusiastic interpretation of where we could take this.
Before we set out on our voyage through the land of algebra, let me fix some terminology, since three distinct notions of category are in play (a toy illustration follows the list):
- A POS is a specific type of feature on an LI, e.g. V, N, A, T, C, ...
- A l[exical]-category is the full feature specification of an LI. In Minimalist grammars, for instance, this includes the POS and all Merge and Move features (as well as their respective order).
- A p[rojection]-category is the label assigned to an interior node, for instance V' or TP.
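Here is a small runnable illustration of the three notions, in Minimalist-grammar-style feature notation (the concrete entries and the `LexicalItem` class are simplifications made up for exposition):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LexicalItem:
    phon: str
    features: tuple  # ordered feature list; the last feature is the POS

# l-categories = full feature specifications:
kill  = LexicalItem("kill",  ("=D", "=D", "V"))  # selects two DPs, projects V
sleep = LexicalItem("sleep", ("=D", "V"))        # selects one DP, projects V

def pos(li):
    """POS = just the category feature."""
    return li.features[-1]

# Same POS, different l-categories:
assert pos(kill) == pos(sleep) == "V"
assert kill.features != sleep.features

# p-categories are the labels of interior nodes (V', VP, ...) that `kill`
# projects in a derivation once its selection features have been checked;
# they live in trees, not in the lexicon.
```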
## Algebra and Categories

At first sight, it seems that Alex's work has nothing to tell us about POS in Minimalist syntax. For one thing, he and Ryo are looking at Multiple Context-Free Grammars (MCFGs) rather than Minimalist grammars, let alone the standard concept of POS that goes beyond mere distribution facts. And more puzzlingly, POS never enter the picture. Since they are working with MCFGs, their concept of category is the MCFG-concept, which amounts to what I called p-categories above. What they show is a particular correspondence between syntactic distribution and the p-categories of specific MCFGs. More precisely, if one takes a string language L that can be generated by a multiple context-free grammar3 and groups strings into equivalence classes in a specific fashion based on their distribution, what one gets is a rich algebraic structure (a residuated lattice, for those in the know) where each node corresponds to a specific category of the smallest MCFG that generates this language L.
Okay, this was pretty dense, so let's break it up into two main insights (a runnable toy example follows the list):
- If one operates with the most succinct MCFG possible, its categories correspond directly to specific distribution classes.
- Those distribution classes are not just a flat collection of atomic entities, they form part of a highly structured system that we can describe and study from an algebraic perspective.
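Here is a toy rendering of insight 1 for the simpler string/CFG case (Clark and Yoshinaka's actual result concerns tuples of strings and MCFGs, but the grouping idea is the same). Using a membership oracle for the language a^n b^n and a small, hand-picked set of candidate strings and contexts, we group strings by their distribution, i.e. by the set of contexts (l, r) such that l + u + r is in the language:

```python
import re
from collections import defaultdict

def in_lang(w):
    """Membership oracle for the context-free language {a^n b^n : n >= 1}."""
    return bool(re.fullmatch(r"a+b+", w)) and w.count("a") == w.count("b")

# Finite candidate contexts and strings (in general one enumerates these
# up to some bound).
contexts = [("", ""), ("a", "b"), ("aa", "bb"), ("a", ""), ("", "b")]
strings  = ["ab", "aabb", "aaabbb", "b", "abb", "a", "aab"]

def distribution(u):
    """All candidate contexts (l, r) such that l+u+r is in the language."""
    return frozenset((l, r) for (l, r) in contexts if in_lang(l + u + r))

classes = defaultdict(list)
for u in strings:
    classes[distribution(u)].append(u)

for members in classes.values():
    print(members)
# -> ['ab', 'aabb', 'aaabbb'], ['b', 'abb'], ['a', 'aab']
# The first class is exactly the category a minimal CFG would call S; the
# others are further distribution classes, and the whole system of such
# classes carries the algebraic structure mentioned above.
```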
Linguists treat POS as equivalence classes --- the lexicon is partitioned by the set of POSs. This invariably leads to a view of POSs as atomic units that are unrelated to each other and cannot be analyzed any further. Hence there are no principled limits on how the lexicon can be carved up by POSs. We can't do much more than write down a list of valid POSs. In other words, we are limited to substantive universals with no attack vector for formal universals. It also means that the only way the concept of POS can be made meaningful is to link each POS to specific properties, which in practice mostly happen to be morphological. But that can only be done if the set of POSs is fixed across languages, which limits the flexibility of POS and once again enforces a substantive-universals treatment of POS.
However, there is no reason why we should think of the lexicon as a collection of partitions. Imagine that the lexicon is ordered instead such that a < b iff LI b can be selected by any LI c that selects LI a.4 The verb *kill* would no longer be marked as a V selecting a DP, but would just have an entry showing that it can select *the* (assuming that *the* is the determiner with the most permissive distribution). This is not a particularly new idea, of course, it just takes Bare Phrase Structure to its logical conclusion: there are no POS at all, only LIs. At the same time, it still allows us to express generalizations across multiple LIs by virtue of the ordering relation that holds between them. In a certain sense this also captures the intuition behind David Adger's suggestion that lexical items may themselves contain syntactic structure, except that we reencode the entailments of the internal structure as an ordering over the entire lexicon. Irrespective of how closely these three ideas are actually connected, the essential point is that we can think of the lexicon as an algebraic object. And algebraic structure can be studied, classified, characterized, manipulated and altered in all kinds of ways.
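A minimal sketch of the ordered lexicon, with invented selection data: we derive the order from a table of which heads accept which selectees, so a < b holds iff every selector that accepts a also accepts b.

```python
# Invented selection data: which arguments each head accepts.
selects = {
    "devour": {"the", "every"},          # demands a full DP
    "see":    {"the", "every", "Mary"},  # also takes bare proper names
}

# Invert the table: for each selectee, the heads that accept it.
selectors_of = {}
for head, args in selects.items():
    for arg in args:
        selectors_of.setdefault(arg, set()).add(head)

def leq(a, b):
    """a <= b iff b is selectable wherever a is."""
    return selectors_of[a] <= selectors_of[b]

print(leq("Mary", "the"))  # True:  'the' goes wherever 'Mary' goes
print(leq("the", "Mary"))  # False: 'devour' accepts 'the' but not 'Mary'

# Reflexive and transitive, but not antisymmetric: 'the' and 'every' have
# identical selector sets, so leq holds in both directions even though they
# remain distinct LIs -- a preorder, not a partial order (cf. footnote 4).
assert leq("the", "every") and leq("every", "the")
```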
## Towards a Solution: An Algebraic Theory of POS

Suppose that we identify a property P that is satisfied by all natural language lexicons, some basic property of the ordering of LIs. Then P would be an indispensable requirement that needs to be preserved no matter what. In this case a constraint can be coded into the lexicon only if the necessary refinement does not destroy property P. And the really neat thing is that constraints can themselves be represented algebraically via tree automata, so this all reduces to the problem of combining algebraic structures while preserving property P --- a cute math problem.
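Since P is exactly what we do not know yet, any concrete demonstration has to use a stand-in. Purely for illustration, the sketch below pretends that P is "the lexicon has a greatest element under the selection order" and checks whether compiling in a constraint --- which refines one LI into two variants --- preserves it; every name and selector set here is invented:

```python
def leq_in(table):
    """Order induced by a selector table: a <= b iff selectors(a) is a
    subset of selectors(b)."""
    return lambda a, b: table[a] <= table[b]

def has_greatest(table):
    """Stand-in property P: some LI is selectable wherever any other is."""
    leq = leq_in(table)
    return any(all(leq(a, top) for a in table) for top in table)

# Before compiling the constraint: 'the' is the most permissive selectee.
before = {"the": {"devour", "see"}, "Mary": {"see"}}

# Compiling an (invented) constraint splits 'the' into two variants with
# incomparable distributions.
after = {"the[+c]": {"devour"}, "the[-c]": {"see"}, "Mary": {"see"}}

print(has_greatest(before))  # True:  P holds
print(has_greatest(after))   # False: the refinement destroyed P,
                             # so this compilation would be blocked
```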
But there's more. Remember how I pointed out in my last post that we should have an explanation as to why phonology and arity do not matter for assigning POS, whereas morphology does? If POS are atomic objects, this is just baffling. But if our primary object of study isn't POS but a structured lexicon, the answer might be that the orders that could be induced by these notions do not satisfy our mystical property P. Conversely, morphology and distribution should have something in common that does give rise to P, or at least does not destroy it.
Finally, POS could be given an operational semantics in terms of succinct encoding similar to what Alex C found for MCFG categories. Maybe POS correspond to nodes in the smallest algebraic structure from which one can assemble the fully structured lexicon. Or maybe they denote in a structure that is slightly lossy with respect to information (not as fine-grained as necessary). It all depends on what the linguistic practice actually turns out to be. But that is the cool thing about this perspective: exploring what linguists are doing when they assign POS reduces to figuring out how they are condensing the lexicon into a more compact (but possibly lossy) structure.
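One hedged guess at what such an operational semantics could look like: condense the preordered lexicon by merging mutually ordered LIs, so that each block of the resulting quotient is a candidate POS (again with invented selection data; an even coarser condensation would be the lossy variant mentioned above).

```python
# Invented selector sets, as in the earlier sketch.
selectors_of = {
    "the":   {"devour", "see"},
    "every": {"devour", "see"},
    "Mary":  {"see"},
}

def leq(a, b):
    return selectors_of[a] <= selectors_of[b]

def condense(items):
    """Quotient the preorder: merge items with both a <= b and b <= a."""
    blocks = []
    for a in items:
        for block in blocks:
            if leq(a, block[0]) and leq(block[0], a):
                block.append(a)
                break
        else:
            blocks.append([a])
    return blocks

print(condense(["the", "every", "Mary"]))  # [['the', 'every'], ['Mary']]
# 'the' and 'every' collapse into one block -- a D-like POS -- while 'Mary'
# stays apart; a lossier condensation might merge those blocks too, trading
# distributional precision for a smaller category inventory.
```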
So there's tons of neat things we could be doing, and none of it is particularly challenging on a mathematical level. Alas, there is one thorny issue that currently prevents this project from getting off the ground (beyond limited time and resources, that is): Just what the heck is our mystery property P? All of the applications above assume that we already know how to restrict the orderings between LIs, and we clearly don't. Well, such is the fate of promising new ideas, you usually have to start from square 1.
Somewhat ironically, the correspondence between POS and constraints that got me to worry about the role of POS in the first place might be of great help here. As linguists, we have a pretty good idea of what kind of constraints we do not want in our grammar, e.g. "the number of nodes is a multiple of 5". We can assume, then, that P is a property that would be lost if we compiled these unwanted constraints into the lexicon. That's a first hint, and if we pool it with other insights about selection and how syntactic scales and orderings tend to work, we might be able to approximate P to a sufficient degree to get some exploratory work started. Here's hoping I'll have a less speculative story for you a couple of years from now.
- Actually, phonological weight has some use in predicting the split between lexical and functional categories, with the former being on average longer than the latter. Personally, I'm inclined to attribute this to extra-grammatical factors --- functional elements are more frequent, and the more frequent a word, the shorter it tends to be. But that doesn't change the fact that phonological weight has some predictive power, yet we do not put it on the same level as morphology and distribution. The intuition, it seems, is that those two are proper syntactic criteria in some sense.↩
- One could of course argue that what is spelled out as killed herself is actually the more abstract VP killed REFL and that John killed REFL is perfectly fine. But you can cook up other examples involving NPIs, PPIs, movement, whatever floats your boat. The basic point is that the category of a constituent does not fully predict its syntactic distribution, which is uncontroversial in any framework that incorporates long-distance dependencies of some kind.↩
- Actually their result pertains only to 2-MCFGs, the weakest kind of MCFGs. Generalizing the result to arbitrary MCFGs doesn't seem particularly difficult, though.↩
- Similar ideas are used in Meaghan Fowlie's treatment of adjunction, and programmers will recognize this as coercive subtyping. Also note that the order is a preorder but not a partial order. That is to say, a < a holds for every a (reflexivity), and a < b and b < c jointly imply a < c (transitivity). But it is not the case that a < b and b < a jointly imply a = b (no antisymmetry). So two words can have the same distribution and still be distinct, cf. *groundhog* and *woodchuck*. Preorders seem to be a very natural construct in natural language; they form the foundation of hyperintensional semantics and also arise organically in the algebraic view of the PCC.↩