One thing I did not make perfectly clear in my last post is that there is an asymmetry between the advantages and disadvantages of feature coding. The main advantage is that you can freely add MSO-constraints to your formalism without having to worry about altering its computational properties. But obviously that isn't much of an advantage if your formalism isn't all that computationally well-behaved to begin with. The disadvantages, on the other hand, are problems for any reasonable theory because they weaken its linguistic fabric by allowing
- limited counting (e.g. odd versus even),
- unattested interdependencies (e.g. the gender of the subject determines if the verb is marked singular or plural),
- wrong typological predictions.[1]
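The parity point can be made concrete with a toy sketch (the names, the `@`-notation, and the automaton are my own illustration, not anything from the formal construction): to enforce a constraint like "an even number of determiners", feature coding splits every category into one copy per automaton state.

```python
# Toy sketch: feature coding for a parity constraint such as "the
# sentence contains an even number of determiners". A 2-state counting
# automaton is compiled into the categories: every category X is split
# into X@0 and X@1, where the annotation tracks the parity of
# determiners in the subtree.

STATES = (0, 1)          # 0 = even number of Ds so far, 1 = odd

def refine(categories):
    """Give each category one copy per automaton state."""
    return {f"{cat}@{q}" for cat in categories for q in STATES}

base = {"D", "N", "V", "C"}
refined = refine(base)

# The category inventory doubles: one extra constraint with |Q| states
# multiplies it by |Q|.
assert len(refined) == len(base) * len(STATES)
```

Merge then only needs to check that the state annotations of the selected arguments combine correctly, so the counting happens entirely inside ordinary feature checking.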
Fixing the Set of Features

This is the solution proposed by Norbert, and it's also what many other syntacticians have suggested when confronted with the result: since feature coding requires new features, just assume that UG fixes a set of features et voilà, all this outrageous stuff is no longer possible.
Now on a purely personal level I am not particularly smitten with this idea (which is code for "I am vehemently opposed to it... sorta, kinda"). That's because working with a fixed set of features would make the math extremely hard and would also jeopardize all the attractive properties of Minimalist grammars, my favorite playground. But pouting and stomping your foot doesn't make for a very convincing argument, so do I have any substantial objections? Why yes, as a matter of fact I do: limiting the set of category features is not guaranteed to work.
If UG fixes the set of category features, then that set must contain every category that is used in some language. But it is far from obvious that every language uses all the features provided by UG, in particular if the distinctions are more fine-grained than the usual macro-classification into nouns, verbs, adpositions, adjectives and adverbs. In that case, languages will have some unused features lying around that can be exploited for feature coding. Now the crazier constraints, like allowing only trees with an odd number of nodes, would still be difficult to pull off because they require that at least half of all UG features go unused by the original language. So fixing the set of features does make feature coding more difficult. However, it does so in the wrong way, because now we predict that the number of categories a language uses is inversely correlated with the complexity of the constraints it can enforce.
This problem becomes even more pressing if one buys into the story I mentioned in the previous post that the presence of constraints in syntax is not at odds with the Minimalist hypothesis because all constraints are enforced via Merge. For then every posited syntactic constraint --- universal or language-specific --- has to be enforced via features, exploding the size of the set of features furnished by UG.
Overall, fixing the set of features is a blunt strategy that is only guaranteed to work if every category feature found in some language is found in every language. But even in this one-in-a-million scenario it isn't a satisfying solution because it solves a formal issue with substantive universals; it deprives us of an opportunity to learn something about Merge and its source of power by stipulating the problem out of existence.
Order is in Order

One common assumption in our treatment of subcategorization is that the category feature of the argument must exactly match the value of the selecting head's Arg feature. This is why, if we split D into D(m) and D(f) to encode gender differences, Arg:D(m) means that only masculine DPs can be selected; feminine DPs or DPs without a gender specification just won't do. But if this exact matching criterion is weakened, the class of constraints that can be enforced by Merge should shrink.
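As a minimal sketch of what exact matching amounts to (function and feature names are my own illustration): the head's Arg value must equal the argument's category character for character.

```python
# Exact-match selection: the selecting head's Arg feature must be
# identical to the argument's category feature.

def can_select_exact(arg_feature, category):
    return arg_feature == category

assert can_select_exact("D(m)", "D(m)")        # masculine DP: fine
assert not can_select_exact("D(m)", "D(f)")    # feminine DP: rejected
assert not can_select_exact("D(m)", "D")       # unspecified gender: rejected
```

It is exactly this string-identity test that feature coding exploits, since any refinement of the categories automatically refines what can be selected.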
Meaghan Fowlie has a very neat paper in the latest MOL proceedings where she argues that adjunction should be treated as standard Merge where exact feature matching is replaced by approximate feature matching. According to Fowlie, the set of category features isn't flat and structureless like a botched pancake. Instead, it has an order defined on top of it. This order, for example, might include a statement of the form N > A. Then every head that selects an N can also select an A. This gives us the entailment that adjectives can be adjuncts of nouns, as in the boy and the young boy.
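Here is a toy rendering of selection under such an ordered category set (the order relation and all names are my own simplification, not Fowlie's actual formalization): a head selecting category X also accepts any Y with X > Y.

```python
# Approximate matching: selection succeeds on an exact match OR when
# the selected category sits below the Arg feature in the order.

ORDER = {("N", "A")}   # N > A: an A may appear where an N is selected

def can_select(arg_feature, category):
    return arg_feature == category or (arg_feature, category) in ORDER

assert can_select("N", "N")      # "the boy": N selected, N supplied
assert can_select("N", "A")      # "the young boy": A fits an N slot
assert not can_select("A", "N")  # the order is asymmetric
```

A fuller version would close ORDER under transitivity, but even this fragment shows how the order, rather than string identity, does the work.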
I find this approach particularly fascinating because it opens up an order-theoretic perspective on the adjunct-argument distinction (and the various hybrid thingies that do not exactly belong to either category). With such an order-theoretic understanding in place, it might be possible to restrict the set of Merge-expressible constraints to those that preserve, or at least do not explicitly contradict, the category order of the original grammar. Mind you, I haven't had time yet to look at this more closely while wearing the crimson cape and oversized cat ears that grant me my mathematical superpowers. Still, putting order constraints on selection looks like the most promising short-term solution to me.
Lexical Obesity

Compiling constraints directly into the feature system can lead to an enormous blow-up in the size of the lexicon. Suppose n is the maximum number of arguments a head may take, and Q is the number of states of the automaton that computes a given MSO-constraint. Then the refined lexicon may be up to Q^n times bigger than the original lexicon. Now if you have some background in complexity theory, you might point out that this is just a linear blow-up, which isn't all that bad. While technically true, the multiplier grows way too quickly in this case. Even if every head selects at most 2 arguments, an automaton for a reasonably sized MSO formula can easily have 50 states or more, so the refined lexicon could be 2500 times the size of the original lexicon or larger. And that's just for a single constraint, whereas an implementation of a real Minimalist analysis would probably require half a dozen MSO constraints. Talk about a ginormous lexicon.
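A quick back-of-the-envelope check of these figures (the numbers are the post's illustration; the helper function is mine):

```python
# With at most n arguments per head and a Q-state automaton, the
# lexicon can grow by a factor of up to Q**n per constraint.

def blowup_factor(num_states, max_args):
    return num_states ** max_args

assert blowup_factor(50, 2) == 2500   # one 50-state constraint, n = 2

# Stacking constraints multiplies the factors, since the product
# automaton of a Q1-state and a Q2-state machine has Q1*Q2 states:
# two 50-state constraints already give 2500 * 2500 = 6,250,000.
assert blowup_factor(50, 2) ** 2 == 6_250_000
```

So half a dozen such constraints would push the multiplier into truly astronomical territory, which is the point of the "ginormous lexicon" worry.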
This blow-up might plausibly put a bound on the kind of constraints that can be enforced by Merge. The difficulty of parsing increases with the size of the grammar because more structures must be built, compared, and kept in memory. Now if there is a subclass of constraints whose underlying automata exhibit certain regularities so that they can be compiled into the grammar without a huge blow-up in the size of the lexicon, then maybe those are the constraints found in natural language, while all others --- albeit formally expressible --- never surface because the corresponding grammars cannot be parsed by humans.
Wrapping up

This is the end of our one-week exploration of the dark, sinister corners of grammar formalisms where MSO-constraints can be smuggled in unnoticed via various feature coding shenanigans.
We saw that grammar formalisms with a standard subcategorization mechanism are capable of expressing all MSO-definable constraints. On a formal level, this actually makes them easier to extend, modify, and study, but it also points out a big loophole in our theories that needs to be plugged. Substantive universals can get the job done in a few select cases, but they do away with both the good and the bad, and they do so in a stipulative way that provides no real insight into the problem. A more promising route (or at least a more titillating one on an intellectual level) is to restrict selection via formal universals and to invoke grammar-external considerations such as parsing needs to prune down the grammar space as a whole.
Some constraints will still slip through, namely those that are structurally identical to attested constraints but switch features around. For example, any grammar that can handle number agreement between subject and verb can also match number on the verb with gender on the subject. But to some extent this problem exists in every formalism: if you have a way to transfer or match features, why is the mechanism restricted to features of the same kind?
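The feature-swapping loophole is easy to see in a toy sketch (entirely my own illustration): any mechanism that compares two feature slots can be pointed at mismatched slots just as easily as at matched ones.

```python
# The same matching machinery that checks number agreement between
# subject and verb can be told to compare any pair of feature slots.

def agree(subject, verb, subj_slot, verb_slot):
    return subject[subj_slot] == verb[verb_slot]

subj = {"num": 1, "gen": 1}   # toy numeric feature values
verb = {"num": 1}

assert agree(subj, verb, "num", "num")   # attested: number agreement
assert agree(subj, verb, "gen", "num")   # unattested, yet nothing blocks it
```

Nothing in the comparison itself knows that "num" should only be checked against "num", which is exactly the question raised above.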
[1] In an attempt to keep things simple, I actually oversimplified the typological entailments in the previous post (kudos to Greg Kobele for pushing me on this issue). First, closure under intersection and relative complement hold only for some formalisms such as TAG and MGs with the Shortest Move Constraint. Second, we have to be specific about the nature of trees under consideration in statements such as every tree that is grammatical in English but not in German. For derivation trees and multi-dominance trees, the examples I gave are definitely valid. For phrase structure trees, however, it depends on whether one can tell for every trace/copy which phrase it is a trace/copy of.