Sunday, October 13, 2013

The Merge Conspiracy [Part 3]

Part 2 and a half showed us the consequences of the MSO-Merge correspondence, both good and bad. In an ideal world, there should be a way to curtail the bad without limiting the good. Alas, ever since J.J. Abrams got his hands on Star Trek I have been uncertain about the degree of idealness of the world we inhabit.

One thing I did not make perfectly clear in my last post is that there is an asymmetry between the advantages and disadvantages of feature coding. The main advantage is that you can freely add MSO-constraints to your formalism without having to worry about altering its computational properties. But obviously that isn't much of an advantage if your formalism isn't all that computationally well-behaved to begin with. The disadvantages, on the other hand, are problems for any reasonable theory because they weaken its linguistic fabric by allowing
  • limited counting (e.g. odd versus even),
  • unattested interdependencies (e.g. the gender of the subject determines if the verb is marked singular or plural),
  • wrong typological predictions.1
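
To make the first item concrete: tracking whether a tree has an odd or even number of nodes requires only a two-state distinction, and it is exactly this kind of state distinction that refined category features can smuggle into the grammar. A toy sketch in Python (the encoding of trees as label/children pairs is my own, purely for illustration):

```python
# Parity of a tree's node count: a "constraint" that needs only two
# states, yet already goes beyond anything natural languages attest.

def parity(tree):
    """tree = (label, [children]); returns 1 for an odd node count, 0 for even."""
    _, children = tree
    # one for this node, plus the parities of the subtrees, mod 2
    return (1 + sum(parity(c) for c in children)) % 2

t = ("VP", [("V", []), ("DP", [("D", []), ("N", [])])])
print(parity(t))  # 1: five nodes, an odd number
```
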
So when push comes to shove, sacrificing the advantages in order to do away with the disadvantages is a reasonable move. It's just not the best case scenario. Keeping this in mind, what are our options to curb the excessive power Merge is granted by feature coding?

Fixing the Set of Features

This is the solution proposed by Norbert, and it's also what many other syntacticians have suggested when confronted with the result: since feature coding requires new features, just assume that UG fixes a set of features et voilà, all this outrageous stuff is no longer possible.

Now on a purely personal level I am not particularly smitten with this idea (which is code for "I am vehemently opposed to it... sorta, kinda"). That's because working with a fixed set of features would make the math extremely hard and would also jeopardize all the attractive properties of Minimalist grammars, my favorite playground. But pouting and stomping your foot doesn't make for a very convincing argument, so do I have any substantial objections? Why yes, as a matter of fact I do: limiting the set of category features is not guaranteed to work.

If UG fixes the set of category features, then that set must contain every category that is used in some language. But it is far from obvious that every language uses all the features provided by UG, in particular if the distinctions are more fine-grained than the usual macro-classification into nouns, verbs, adpositions, adjectives and adverbs. In this case, languages will have some unused features lying around that can be exploited for feature coding. Now the crazier constraints, like allowing only trees with an odd number of nodes, would still be difficult to pull off because this requires that at least half of all UG features are not used by the original language. So fixing the set of features does make feature coding more difficult. However, it does so in the wrong way, because now we predict that the complexity of the constraints a language can enforce is inversely correlated with the number of categories it uses.

This problem becomes even more pressing if one buys into the story I mentioned in the previous post that the presence of constraints in syntax is not at odds with the Minimalist hypothesis because all constraints are enforced via Merge. For then every posited syntactic constraint --- universal or language-specific --- has to be enforced via features, exploding the size of the set of features furnished by UG.

Overall, fixing the set of features is a blunt strategy that is only guaranteed to work if every category feature found in some language is found in every language. But even in this one-in-a-million scenario it isn't a satisfying solution because it solves a formal issue with substantive universals; it deprives us of an opportunity to learn something about Merge and its source of power by stipulating the problem out of existence.

Order is in Order

One common assumption we made in our treatment of subcategorization is that the category feature of the argument must exactly match the value of the selecting head's Arg feature. This is why, if we split D into D(m) and D(f) to encode gender differences, Arg:D(m) means that only masculine DPs can be selected; feminine DPs or DPs without gender specification just won't do. But if this kind of exact matching criterion is weakened, the class of constraints that can be enforced by Merge should shrink.
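
As a toy illustration (categories as plain strings is my own encoding, purely for exposition), exact matching is nothing more than an equality test:

```python
# Exact-match subcategorization, sketched with categories as strings.
# "D(m)" and "D(f)" are the hypothetical refined determiner categories
# from the text.

def can_select(arg_feature: str, arg_category: str) -> bool:
    """A head's Arg value must be identical to the argument's category."""
    return arg_feature == arg_category

print(can_select("D(m)", "D(m)"))  # True: a masculine DP is fine
print(can_select("D(m)", "D(f)"))  # False: a feminine DP won't do
print(can_select("D(m)", "D"))     # False: neither will an unspecified DP
```
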

Meaghan Fowlie has a very neat paper in the latest MOL proceedings where she argues that adjunction should be treated as standard Merge where exact feature matching is replaced by approximate feature matching. According to Fowlie, the set of category features isn't flat and structureless like a botched pancake. Instead, it has an order defined on top of it. This order, for example, might include a statement of the form N > A. Then every head that selects an N can also select an A. This gives us the entailment that adjectives can be adjuncts of nouns, as in the boy and the young boy.
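
A minimal sketch of approximate matching in this spirit (encoding the order as a set of pairs is my own simplification, not Fowlie's formalization): a head whose Arg value is X accepts any category Y with X above or equal to Y in the order.

```python
# Approximate matching: selection succeeds if the selector's Arg feature
# dominates the argument's category in the (reflexive-transitive closure
# of the) category order.

ORDER = {("N", "A")}  # N > A: adjectives fit wherever nouns are selected

def dominates(x, y, order):
    """True iff x >= y in the reflexive-transitive closure of the order."""
    if x == y:
        return True
    return any(a == x and dominates(b, y, order) for (a, b) in order)

def can_select(arg_feature, arg_category, order=ORDER):
    return dominates(arg_feature, arg_category, order)

print(can_select("N", "A"))  # True: 'young' can adjoin to 'boy'
print(can_select("A", "N"))  # False: the order is asymmetric
```
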

I find this approach particularly fascinating because it opens up an order-theoretic perspective on the adjunct-argument distinction (and the various hybrid thingies that do not exactly belong to either category). With such an order-theoretic understanding in place, it might be possible to restrict the set of Merge-expressible constraints to those that preserve, or at least do not explicitly contradict, the category order of the original grammar. Mind you, I haven't had time yet to look at this more closely while wearing the crimson cape and oversized cat ears that grant me my mathematical superpowers. Still, putting order constraints on selection looks like the most promising short-term solution to me.

Lexical Obesity

Compiling constraints directly into the feature system can lead to an enormous blow-up in the size of the lexicon. Suppose n is the maximum number of arguments a head may take, and Q is the number of states of the automaton that computes a given MSO-constraint. Then the refined lexicon may be up to Q^n times bigger than the original lexicon. Now if you have some background in complexity theory, you might point out that this is just a linear blow-up, which isn't all that bad. While technically true, the multiplier grows way too quickly in this case. Even if every head selects at most 2 arguments, an automaton for a reasonably sized MSO formula can easily have over 50 states, so the refined lexicon could be up to 2,500 times the size of the original one. And that's just for a single constraint, whereas an implementation of a real Minimalist analysis would probably require half a dozen MSO constraints. Talk about a ginormous lexicon.
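
The arithmetic behind this estimate can be spelled out directly (the function name is mine; the numbers are the ones from the text):

```python
# Each lexical item may need up to Q**n refined copies: one per
# assignment of automaton states to its n argument slots.

def refinement_factor(Q: int, n: int) -> int:
    return Q ** n

# The worked example: at most 2 arguments per head, 50-state automaton.
print(refinement_factor(50, 2))  # 2500 copies per item, for one constraint

# Half a dozen independent constraints multiply their factors:
print(refinement_factor(50, 2) ** 6)  # on the order of 10**20
```
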

This blow-up might plausibly put a bound on the kinds of constraints that can be enforced by Merge. The difficulty of parsing increases with the size of the grammar because more structures must be built, compared, and kept in memory. Now if there is a subclass of constraints whose underlying automata exhibit certain regularities so that they can be compiled into the grammar without a huge blow-up in the size of the lexicon, then maybe those are the constraints found in natural language while all others --- albeit formally expressible --- never surface because the corresponding grammars cannot be parsed by humans.

Wrapping up

This is the end of our one-week exploration of the dark, sinister corners of grammar formalisms where MSO-constraints can be smuggled in unnoticed via various feature coding shenanigans.

We saw that grammar formalisms with a standard subcategorization mechanism are capable of expressing all MSO-definable constraints. On a formal level, this actually makes them easier to extend, modify, and study, but it also points out a big loophole in our theories that needs to be plugged. Substantive universals can get the job done in a few select cases, but they do away with both the good and the bad, and they do so in a stipulative way that provides no real insights into the problem. A more promising route (or at least more titillating on an intellectual level) is to restrict selection via formal universals and invoke grammar-external considerations such as parsing needs to prune down the grammar space as a whole.

Some constraints will still slip through, namely those that are structurally identical to attested constraints but switch features around. For example, any grammar that can handle number agreement between subject and verb can also match number on the verb with gender on the subject. But to some extent this problem exists in every formalism: if you have a way to transfer or match features, why is the mechanism restricted to features of the same kind?
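
The structural point can be made concrete with a toy sketch (the pairings below are invented for illustration): an agreement mechanism is just a pairing between feature values, and nothing in the mechanism itself forces both values to be of the same kind.

```python
# Agreement as value pairing: the checker is indifferent to whether the
# paired values belong to the same feature type.

def agrees(subj_value, verb_value, pairing):
    return pairing.get(subj_value) == verb_value

number_agreement = {"sg": "sg", "pl": "pl"}   # attested: number matches number
gender_to_number = {"m": "sg", "f": "pl"}     # formally just as easy to state

print(agrees("pl", "pl", number_agreement))  # True
print(agrees("f", "pl", gender_to_number))   # True: gender fixes "number"
```
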

  1. In an attempt to keep things simple, I actually oversimplified the typological entailments in the previous post (kudos to Greg Kobele for pushing me on this issue). First, closure under intersection and relative complement hold only for some formalisms such as TAG and MGs with the Shortest Move Constraint. Second, we have to be specific about the nature of trees under consideration in statements such as every tree that is grammatical in English but not in German. For derivation trees and multi-dominance trees, the examples I gave are definitely valid. For phrase structure trees, however, it depends on whether one can tell for every trace/copy which phrase it is a trace/copy of.


  1. A nit to pick is that if the 'gender' of the subject determined the 'number' of the verb, the latter feature would not be called 'number', but something else (probably gender+number). So the traditional terminology generates a lot of smoke that needs to be cleared away to find the real structural issues, whatever they turn out to be. One of these, I think, will probably be a distinction between 'category' and 'inflectional' features (corresponding to the ones that live in c-structure vs f-structure in LFG), which might or might not prove to be generally useful, since their behavior is different in all sorts of ways, i.e. there is no inflectional sensitivity to the category features of containing items, only to the inflectional ones.

    1. That's a good point. I was thinking along the following lines: Suppose we have a language L where verbs have suffix -s for singular and -p for plural if the subject is neuter. So these guys look like number affixes. Yet with masculine nouns we always get -s and with feminine nouns always -p, even though everywhere else masculine and feminine agreement is indicated by the affixes -m and -f. This kind of pattern strikes me as highly unnatural.

      The split between category features and inflectional features is almost certainly correct on an empirical level, but due to feature coding it has little formal weight because everything related to inflection can be done by category features.

      Maybe over time it will turn out that features themselves are richly structured objects (kind of like a futuristic cross of feature geometries, nano syntax and Distributed Morphology) and that it is this kind of structure that allows for certain kinds of matching but not others. Or maybe if we look at the complexity of the morphological system as a whole it turns out that agreement between features of different types generates systems that are less likely to be inferred by the learner.

    2. But it could happen, tho more likely the other way, for functional reasons (people are more interested in the number of animate than inanimate entities). For example in Ancient Greek, neuter plural nouns are said in the grammars to take singular agreement, which certainly is the case with semantically inanimate nouns. I don't actually know if this happens with semantically animate neuters (such as 'paidia' children); the one example that I dug up in a few minutes on the Perseus project had plural agreement (but I lack the time to browse through several hundred, at least any time soon).

      It's also worth noting that inflectional features in the typical 'inflected' languages are a simplifying abstraction between 'formatives' (overt chunks of form) and facts about meaning and distribution, and you can't really motivate them at all, empirically, without implicit invocation of the evaluation metric. You could in principle dump the standard features and give each formative its own unary feature if you don't mind messing up your grammar. & sound change and other historical effects can do all sorts of strange things to overt forms.

      I think there will have to be some serious story about the difference between inflectional and category features, due to their entanglement with other things. If one member of an inflectional paradigm acquires a new meaning, for example, the others tend to follow quickly (consider the spread of the 'say' meaning of 'I'm like' to the other members of the inflectional paradigm of 'be' in the 1980s), whereas the products of derivational/category-changing morphology often change meaning without dragging along their derivational bases. So 'rationalize' has acquired the meanings of 'cut services to save money' and 'make up phoney reasons for doing something you just did because you wanted to', without affecting the meaning of 'rational'.

    3. So would you say that we should indeed expect every logically possible agreement pattern to be instantiated in some language? And what about boolean combinations thereof, such as "adjective is masculine iff the noun is masculine or (feminine and singular) or (neuter and plural)"? (modulo the caveat that those are just terms we attach to specific affixes)

      If all of that is actually possible in language, I would be happy, one thing less to worry about, but I'm sure many Minimalists would find it hard to swallow. At least I don't see how boolean conditions could be accommodated by the Agree operation as it is currently defined.

    4. Replace the semantically loaded traditional feature names (which afaik have no spelled out mathematical properties whereby they could be identified without knowledge of their (sometimes rather flakey) semantic affiliations) with abstract attributes tied to specific morphological forms, & I'd say yes.

      For example, in Modern Greek the ending '-i' can arguably signal:

      Feminine Singular (of some nouns)
      (add -s to get the genitive, nothing more for nom and acc)
      Nominative Masculine Plural (of others)
      (-on for genitive plural, -us for masculine accusative plural)
      Nominative/Accusative Neuter Plural (of yet others)
      (-on for genitive plural)

      Putting this and a few dozen more equally crazy patterns together gives you all kinds of possibilities if you don't care about having an overall 'simple' (by traditional standards) system of connections between form, meaning, and distribution. And I don't get the impression that these traditional simplicity judgements carry any weight in most of the mathematical investigations.

    5. How do you handle things like German "Er findet und hilft Papageien", where 'Papageien' needs to be both +ACC and +DAT? Or 'Menschen'.


    7. Dalrymple and Kaplan (2000) Language pp 759-798 make a proposal for LFG that the values of features can be sets, e.g. CASE {DAT, ACC}, each member of which must be licensed by the environment. I'm not sure how transportable this would be between frameworks, but perhaps something would work along the lines of 'multiple specifications of one feature value on a syntactic object are OK if they are all licensed by the environment and properly spelled out by the form'.

    8. Feature geometry is one way to tackle the apparently different formal behavior of features; levels of representation (as deployed in LFG, currently not much used in the MP afaik) are another. Another interesting effect, discussed in Baker's 2008 agreement book, is that person features, although very commonly participating in agreement, appear not to get involved in concord (recalling that this distinction, as I am using the terms, is between showing features of something you're next to vs. something you're inside). So a little table:

      Feature                Agreement   Concord   Selection/Government
      Person                 common      never     never
      Gender, Number         common      common    never/rare*
      Case                   common      common    common
      Tense, Mood, Polarity  never       rare      common**
      Part-of-Speech         never       never     common

      So the question is whether some or all of these typological facts are caused by UG facts about representations and what rules can do, or are tendencies caused by functional factors, where the 'never' stuff is not impossible in principle but, for functional reasons, rare enough that we have missed it by chance (without the Tangkic languages, tense-mood-polarity concord would be a 'never').

      *semantic selection for number doesn't count (e.g. 'disperse'); 'animacy' selection in Algonquian languages might be a problem (since certain things that we would regard as inanimate count as animate for that system).

      ** mood is plausibly selected for, tense not.