Wednesday, October 9, 2013

The Merge Conspiracy [Part 2]

Last time in our riveting story of intrigue and cabal, the seemingly innocuous subcategorization requirements encoded via Merge turned out to have a kinky bent to them: by splitting categories into smaller subcategories, we can regulate dependencies such as case-marking and gender agreement using only Merge. But the rabbit hole goes a lot deeper... ... ... ... (in fact, it is so deep I decided to pull a Wachowski and save the second half of the post for tomorrow).

The Big Picture

A little bit over two years ago, Greg Kobele and me showed independently of each other that the feature refinement strategy that works for case-marking and gender agreement can be extended to a vast range of dependencies, including non-local ones. [1,2] As a matter of fact, almost all constraints in the syntactic literature can be compiled directly into the feature system so that they are implicitly enforced by Merge.

As so often in science, Greg and me were standing on the shoulders of giants --- the rare kind of giants with a predilection for formal language theory, who had already shown
  1. the correspondence between constraints and logical formulas,
  2. the equivalence of monadic second-order logic and finite-state tree automata,
  3. a strategy for compiling tree automata directly into grammars.
With these pieces of the puzzle in place, it only takes a couple of pages of jumping through technical hoops before one reaches the marvelous conclusion that a constraint can be expressed by Merge iff it can be defined in monadic second-order logic.

The Big Picture Explained

If you and your mind aren't, respectively, awestruck and blown by the revelation above, then that's probably due to those fancy-schmancy terms that do an excellent job of obscuring the beauty of the underlying idea.

Constraints and Logic

Let's look at the correspondence of constraints and logic first. As linguists we usually encounter logic in semantics, where it is used as a description language for propositions. But logic can also be used to talk about structures, in particular trees. For example, we might want to ensure that every reflexive is c-commanded by a DP (the crudest of all crude approximations of Principle A). In first-order logic, this would be expressed by the formula

∀x [reflexive(x) → ∃y [DP(y) & c-commands(y,x)]]
"for every node x it holds that if x is a reflexive, then there is a node y that is labeled DP and c-commands x"

A tree is well-formed with respect to our original constraint iff the formula above is true in the tree. For the most part, then, we are simply using logic as a description language for defining constraints.

Monadic Second -Order Logic

Just like grammar formalisms, logics differ with respect to their expressive power, so there are some constraints that can be formalized in logic A but not in logic B. Monadic second-order logic (MSO) is a small extension of first-order logic in which one may quantify not only over individual nodes but also over sets of nodes. Set quantification makes it very easy to talk about linguistic domains, which is why MSO has proven extremly useful for the formalization of syntactic constraints. Rogers (1998) [3] actually gives a full MSO implementation of GB with Relativized Minimality, which is very impressive considering the byzantine nature of GB.

There's only two things that are both beyond the reach of MSO and at least of peripheral interest to syntacticians. Despite being a logic, MSO has absolutely no grasp of semantic entailment. Zilch, nada. So MSO cannot regulate the distribution of heads and phrases based on semantic factors such as identity of meaning. It is also impossible to check whether two subtrees are identical, which is a problem if, say, your analysis of ellipsis involves deletion under syntactic identity. Everything else is pretty much fair game, though.

Tree Automata

Finite-state tree automata are closely related to MSO. If you have at least one course in formal language theory under your belt, all you need to know is that tree automata are the tree-analogue of finite-state automata over strings. If you don't (or you lost the belt), here's the quick gist: A tree automaton has a finite set of state symbols, and its job is to decide whether a tree is well-formed by assigning a state symbol to every node in the tree. Crucially, which state a node receives depends only on its label and the states of its daughters. So the assignment of states is a very local process, even more local than Merge. A tree is well-formed iff the state assigned to its root is a designated final state.

Despite their strict locality bound, tree automata are exactly as powerful as MSO --- every tree automaton can be converted into an MSO formula, and every MSO formula can be converted into an equivalent tree automaton. For our purposes, this means that MSO-definable long-distance dependencies can be enforced in a way that's even more local than Merge. Intuitively, one long-distance dependency is decomposed into a chain of many, many local dependencies. It isn't too surprising that Merge can enforce these local dependencies, and that is what allows it to emulate long-distance dependencies as long as they are MSO-definable.

Back to Feature Refinement

Now that we have seen the big picture of things, how does it all tie into the kind of feature refinement we used for case and gender agreement? Well, the states the automaton assigns to the nodes in a tree can be pushed directly into the category system. Look at the tree below, where a gray box has been added for each state.
The complement of the verb is the DP their cat. Its highest projection --- the node labeled DP --- has been assigned the state q. So in a sense, we are not dealing with a DP but a q-DP, so we change the category feature of the DP from D to D(q). We do the same thing for all other XPs, giving us a new tree with refined category features.
In this tree, all subcategorization requirements are violated because the Arg-features still have the old values, but that is easy to fix.

If you keep doing this kind of refinement until all lexical items have been refined in all licit ways, your tenacity will be rewarded with a grammar that generates only those trees that are both generated by the original grammar and deemed well-formed by the automaton.

In sum, if you want to express a constraint via Merge, you formalize it as an MSO formula, convert the formula into an automaton, and compile the state assignments of the automaton directly into the feature system. Charmingly straight-forward, isn't it?

[1] Graf, Thomas (2011): Closure Properties of Minimalist Derivation Tree Languages. In Sylvain Pogodalla and Jean-Philippe Prost (eds.), LACL 2011, Lecture Notes in Artificial Intelligence 6736, 96--111.
[2] Kobele, Gregory M. (2011): Minimalist Tree Languages are Closed under Intersection with Recognizable Tree Languages. In Sylvain Pogodalla and Jean-Philippe Prost (eds.), LACL 2011, Lecture Notes in Artificial Intelligence 6736, 129--144.
[3] Rogers, James (1998): A Descriptive Approach to Language-Theoretic Complexity. CSLI: Stanford.


  1. Where can I read about

    > a strategy for compiling tree automata directly into grammars.

    hopefully with examples of the whole chain down from constraints.

    1. Actually what's a good chain of papers to read to see the rough shape of the mentioned hoops?

    2. Also I should add, this seems like it might be highly relevant to type theoretic concerns about refinement types. Ghani, Atkey, and Johann have a paper ( about when a refinment type (basically a subset type) is inductive. They give an answer that is basically, "whenever the constraint is recursively computable", but it's only a directional implication. This post suggests that there is an identity between tree constraints and some kind of recursive computation, which is a very big deal for type theory if it's correct.

    3. The conversion from MSO to tree automata goes back to Rabin69 and Thatcher&Wright68. Thatcher67 describes how tree automata can be compiled into context-free grammars. Unless you are already well-versed in formal language theory, though, I would recommend that you start with my thesis instead (seriously, this isn't just lame self-promotion). It slowly introduces MSO and tree automata and uses many examples. The only thing missing is the actual conversion step from MSO to automata, but this is explained fairly well in Morawietz's "Two Step Approaches to Natural Language Formalisms", p64--72. If your library doesn't have it, PM me.

    4. As for the relevance to type-theory, that's not really my area of expertise, but there are definitely interesting connections between types and tree languages, that's what Abstract Categorial Grammar is all about. Maybe you can find something helpful in this lovely collection of papers.