Comments on Faculty of Language: Derivation Trees: Syntacticians' Best Friend?

AveryAndrews (2014-02-14):
Thanks! Your thesis before Greg's is my current plan, on the basis of a latest-first order.

Anonymous (2014-02-14):
Sorry for the late reply, this week hasn't been very kind to my spare time. Your question can be split up into two issues:

1) How MGs handle morphosyntax.
2) What kind of movement steps are involved.

Re 1): Morphological agreement is usually not done via feature checking in MGs. Feature checking only drives the assembly of lexical items into trees, for which phi-features like person, number, and gender do not matter all that much. There are several ways to handle agreement, though.

One would be to implement the Agree operation from recent Minimalism, usually in the form of MSO constraints. That's the one solution you would like to avoid.

The other is to say that morphology isn't part of syntax but of the mapping from derivations to phrase structure trees. So lexical items are only abstractly realized in syntax, and assigning them the right surface form is the job of the mapping.

There are also ways of combining these two approaches, and I'm sure there are alternatives I haven't even thought of yet.

Re 2): This is mostly a question of which Minimalist analysis of passive and ECM you want to implement, and as such not really an issue with MGs. However, there is a somewhat troubling aspect to this problem in that the level of nesting is unbounded: any variant of "She is considered (to be believed)^+ to be rich" is grammatical. That's problematic for standard MGs because every lexical item has only a finite number of movement features, so it can only undergo a finite number of movement steps.

This can be tackled in two ways. Greg argues in his thesis (http://home.uchicago.edu/~gkobele/files/Kobele06GeneratingCopies.pdf) that successive-cyclic movement doesn't arise from the feature calculus but is a property of the mapping from derivations to phrase structure trees. So even though there's only one movement step taking place, it may have to touch down at various locations. The other solution is to allow features to survive feature checking, which makes it possible for them to participate in multiple operations. This is explored in Ed Stabler's 2011 survey paper (http://www.linguistics.ucla.edu/people/stabler/Stabler10-Min.pdf). Neither variant increases the power of the formalism.
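[To make the finite-feature point concrete, here is a minimal sketch of MG-style feature checking in Python. It is a toy encoding, not Stabler's exact definitions: expressions are (string, remaining features, pending movers) triples, linearization is crudely approximated by string concatenation, and the lexical entries are invented for the example. It illustrates why an item with a finite feature list can move only finitely often: licensee features are consumed as they are checked.]

```python
def merge(sel, arg):
    """sel's first feature '=f' checks arg's category feature 'f'."""
    f, rest = sel[1][0], sel[1][1:]
    assert f == "=" + arg[1][0], "merge: feature mismatch"
    arg_rest = arg[1][1:]
    if arg_rest:  # arg still has licensee features: it will move later
        return (sel[0], rest, sel[2] + arg[2] + [(arg[0], arg_rest)])
    return (sel[0] + " " + arg[0], rest, sel[2] + arg[2])

def move(exp):
    """exp's '+f' checks a pending mover's '-f'; once a mover's
    feature list is exhausted, it can never move again."""
    f, rest = exp[1][0], exp[1][1:]
    assert f.startswith("+"), "move: no licensor feature"
    for i, (s, fs) in enumerate(exp[2]):
        if fs[0] == "-" + f[1:]:
            movers = exp[2][:i] + exp[2][i + 1:]
            if fs[1:]:  # mover still has licensees: stays pending
                return (exp[0], rest, movers + [(s, fs[1:])])
            return (s + " " + exp[0], rest, movers)
    raise ValueError("no matching mover")

# Toy lexicon: 'who' carries exactly one -wh feature, so one movement step.
who = ("who", ["d", "-wh"], [])
see = ("see", ["=d", "v"], [])
c   = ("did Mary", ["=v", "+wh", "c"], [])

cp = move(merge(c, merge(see, who)))
print(cp[0])  # -> "who did Mary see"
```

[Unbounded nesting as in the raising example would require either re-checkable features or the mapping-based treatment, exactly as the comment describes.]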
AveryAndrews (2014-02-11):
So, a descriptive/technical question about MGs: how to do the classic Passive/Raising cycle in Icelandic, including cases and multi-agreement, e.g. (it's very annoying that blogger trashes any attempt at gloss alignment!):

Ég álit hana vera talda hafa verið ríka
I(N) consider her(A) to-be believed(A) to-have been rich(A)
"I consider her to be believed to have been rich"

Hún er álitin vera talin vera rík
She(N) is considered(F.N.Sg) to-be believed(F.N.Sg) to-be rich(F.N.Sg)
"People think she is believed to be rich"
(Thráinsson 2007:438)

I haven't managed to get the NPs to appear in the right positions with their cases, let alone manage the multi-agreement (I know about constraints, but would hope that they could be avoided for predicate adjective/participle agreement).

AveryAndrews (2014-02-10):
Yes. One thing I'd like to know is whether the very earliest stages of the acquisition of variable word order languages like Greek (more variable than Russian, as far as I can make out, and more accessible than Warlpiri) show any traces of argument ordering specific to individual verbs, or a wide range of possibilities from the earliest possible moment. The literature seems to suggest the latter (with variants being used in pragmatically appropriate circumstances), favoring LFG and standard flavors of Minimalism, as opposed to Construction Grammar and basic TAGs (there are surely ways of fixing TAGs to address this problem), but what I could find doesn't seem to address the absolute beginnings.

davidadger (2014-02-10):
@Avery: interesting. There's a thrust in certain versions of minimalism (the `nanosyntax' strand) to lexicalize the phonologization of whole trees (so you build up a tree, then go and look in your lexicon to see if you've got a phonology for that tree, and if you have, you insert it) that suffers from exactly this problem. Peter Svenonius pointed out to me that it makes it pretty hard to capture wide cross-language generalizations (e.g. V2), as there's no reason to `project' what one verb will do on the basis of the others, in exactly the way you just mentioned. Standard minimalism builds this info into a computational unit (a functional category), which is the locus of these generalizations (e.g. finite C attracts finite T, for V-to-C, and finite C requires an A-bar specifier to give V2; both properties are parameters, predicting languages that do one, the other, or neither). So the nanosyntactic view suffers the TAG problem, and standard minimalism here behaves more like LFG in your scenario.
AveryAndrews (2014-02-09):
In the spirit of Chomsky 1957 and David Marr, I think it's a mistake to worry about algorithms too soon (as long as it can't be proved that there can't be any for some given approach). I don't think looking at whole languages will do, because linguistically significant generalizations as linguists see them are always based on finite amounts of data, and we want the equivalent theories to have the same capacity to project such data to the whole language.

So if your linguistic framework is LFG, and a few verbs in your language are found with SVO, VOS and VSO orders, and then another comes along with only SVO and VSO attested, the most succinct LFG grammar (other things being appropriately fixed) will also produce the VOS order, while the best (basic, unadorned) TAG would not, since each word order will need its own elementary tree for each verb that uses it.

So, if you think LFG is making the right kind of prediction, you should either abandon TAGs for LFG, or modify TAGs to deal with this problem. I think implicit forms of this kind of reasoning were involved in much early generative syntactic practice, but got buried with the rise of P&P and the apparent utility of ideas such as the subset principle; as P&P seems more dubious, it might be time to get more explicit about it.

Alex Clark (2014-02-09):
I thought about this a while ago. You can say something like this. Suppose we have two grammatical formalisms G and H, where each grammar generates a set of sound/meaning pairs. Then G is polynomially reducible to H if there is a polynomial p such that for every grammar g in G, there is a grammar h in H such that L(g) = L(h) (i.e. they generate the same set of sound/meaning pairs) and |h| < p(|g|). Two formalisms are polynomially equivalent if G is reducible to H and H is reducible to G.

This then gives us MCFG < MG and DFA < NFA, etc.

Thinking about positive and negative data takes you into the question of Occam learning algorithms, which is quite murky (Blumer et al. 1987).
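[The DFA < NFA case is the textbook instance of such a succinctness gap, and easy to make concrete: the language "the n-th symbol from the end is a" has an (n+1)-state NFA, while any DFA for it needs 2^n states, since it must remember the last n symbols. A small sketch, with an invented helper name:]

```python
def nfa_accepts(word, n):
    """(n+1)-state NFA for 'n-th symbol from the end is a':
    state 0 loops on everything; reading 'a' may nondeterministically
    guess that the final n-symbol suffix starts here; states 1..n
    then count the remaining symbols down."""
    states = {0}
    for c in word:
        nxt = set()
        for q in states:
            if q == 0:
                nxt.add(0)
                if c == "a":
                    nxt.add(1)
            elif q < n:
                nxt.add(q + 1)
        states = nxt
    return n in states

assert nfa_accepts("abbb", 4) and not nfa_accepts("babb", 4)
# The minimal DFA must track the last n symbols: 2**n states, an
# exponential gap although the two machines are weakly equivalent.
```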
AveryAndrews (2014-02-08):
Is there any substantial work on succinctness in math ling? My first stab would be along the lines that two theoretical frameworks are equivalent in terms of succinctness if the numerical scores of the most succinct grammars they provide for all datasets (including both positive and negative data examples, if we want to duck the negative evidence problem for a moment) can be interconverted by an order-preserving mapping.

Alex Clark (2014-02-07):
Succinctness is a good argument, but it isn't really an argument against PSGs, since GPSG is definitely a PSG and dealt (albeit imperfectly) with that problem. I think there are two separate arguments: one is about whether there is some structure in the syntactic categories of a grammar (which there obviously is, right?), and the other is about whether information is always introduced at the leaves of the derivation tree.

Maybe there isn't any real content to the lexicalisation issue if you have unpronounced elements. The inclusiveness condition (is that the right term?) just seems to be stated without any argument.

Greg Kobele (2014-02-05):
@David: Short answer: no, they have different domains, and thus their union is a function. Longer answer: yes, they do different things, and so even if they are defined as a single function it needs to be defined by cases. Longest answer: if you look at the derivation trees (but multidominant), merge and move are just binary internal nodes. The only difference between them is that merge's daughters are independent, and that move's daughters are not (the one contains the other). So at the level of the derivation tree, they have exactly the properties that Chomsky wanted (this is a recurring theme). But of course, we want to get to strings and meanings. So you could ask whether there is some interpretation of these nodes in the derivation tree which gets us partway to strings and meanings and makes them look similar. And there I'm not so sure. Certainly at the level of directly compositional semantic interpretation, merge is different from move.
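[Greg's "longest answer" can be stated as a one-line classification over derivation tree nodes. The sketch below is a toy encoding, with trees as nested pairs and leaves as strings; it glosses over the fact that in a genuinely multidominant structure the shared daughter is a single token rather than a copy.]

```python
def leaves(t):
    """Leaf strings of a tree encoded as nested pairs."""
    return {t} if isinstance(t, str) else leaves(t[0]) | leaves(t[1])

def node_kind(t):
    """Classify a binary internal node by its daughters' leaf sets."""
    la, lb = leaves(t[0]), leaves(t[1])
    if not (la & lb):
        return "merge"   # independent daughters
    if la <= lb or lb <= la:
        return "move"    # one daughter contained in the other
    return "ill-formed"

vp = ("see", "who")
print(node_kind(vp))           # merge
print(node_kind(("who", vp)))  # move: 'who' re-attached above vp
```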
Greg Kobele (2014-02-05):
@Alex & Thomas: Sylvain Salvati has a very insightful discussion (see page 98: http://link.springer.com/chapter/10.1007%2F978-3-642-21490-5_5) of the difference between the higher-order and first-order derivations we've been talking about here. (That's how I understood your (Alex's) question.) The difference in concision between MGs and MCFGs seems to be due to the (implicit) difference in the type systems used by the respective formalisms: MG rules use universal quantification over (finite) types, MCFG rules don't.

@David: I think that linking syntactic theory to variationist sociolinguistics (among other things) is hugely important.

About regularities in the lexicon: honestly, that's a hard problem. We know from the work on succinctness in computer science that the kinds of generalizations you can express can depend on the power of your descriptive apparatus. So it's not `what patterns are in the data', but rather `what patterns can I describe with my tools'. One strategy is to simply use the most powerful tools available. Another is to use weak tools (and miss possible generalizations), but appeal to some (as yet mythical) learning procedure which tries to re-use already existing feature bundles.

In this particular case, I think it's nice to separate the currently prevailing doctrine (a universal hierarchy of functional projections) from the formal theory (MGs). But that's just my taste.

If you wanted to see the MG version of what you described, a natural way to go would be to consider well-formed derivation trees in isolation from any derivational process. Essentially, you would need to say that there is a constraint/filter which enforces the universal hierarchy, in addition to the other usual constraints/filters, which then only consider selection and licensing features for specifiers. Derivationally, it means that there's a free choice about who you first-merge with, and then everything proceeds as normal, with a filter that stops you from first-merging with the wrong thing.
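[A toy rendering of that last suggestion; the four-member hierarchy and the tree encoding are invented for the example. Structures are built freely, and a filter over the finished derivation tree rejects any head whose complement (its first daughter, in this encoding) does not sit strictly lower in the assumed hierarchy:]

```python
HIERARCHY = ["C", "T", "v", "V"]  # assumed toy cartography, highest first

def respects_hierarchy(tree):
    """tree = (category, [subtrees]); daughter 0 is the complement."""
    cat, kids = tree
    if kids:
        comp = kids[0][0]
        if HIERARCHY.index(comp) <= HIERARCHY.index(cat):
            return False  # complement not lower than its selecting head
    return all(respects_hierarchy(k) for k in kids)

good = ("C", [("T", [("v", [("V", [])])])])
bad  = ("T", [("C", [])])
print(respects_hierarchy(good), respects_hierarchy(bad))  # True False
```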
davidadger (2014-02-05):
@Alex: I think the reasons people don't like PS rules are partly historical. Jackendoff's 1975 take on Chomsky's Remarks was very influential in kickstarting the idea that lexical entries have to be quite complex, leading to lexical rules and an impoverishment of the PS component. Also, since PS rules allow things like exocentric structures, projections that don't match syntactic distributional tests, etc., it seemed more sensible, at least at the time, to directly impose constraints like endocentricity via X-bar constraints on the projections of lexical information, so they were captured as high-level generalizations in the grammar. Of course that assumes that such information (endocentricity, number of levels of projection, etc.) guides the learning of syntactic rules, rather than being derived from the data, which you may not buy (although I do). More recently, I think the argument is that displacement operations tend to be structure preserving, so the same technology should build and transform structure (that's one of the External/Internal Merge motivations), a conclusion reminiscent of the motivations Gazdar put forward in the 80s for GPSG, which takes the displacement operations to be structure preserving precisely because the structure is built directly by the phrase structure rules (plus various ways to percolate features). The E/I-Merge story is a bit more elegant, though, as it says that structure building and displacement just are the same thing.

One question I had for MGians (?): that doesn't seem to be true in MGs, right? Move and Merge have to be defined as different operations?

Anonymous (2014-02-05):
@Bob: "As far as I can tell, such locality conditions on MCTAG derivations don't translate in any natural way to conditions on MG-style derivations." Which constraints in particular do you have in mind? I'm thinking about this in terms of Laura Kallmeyer's characterization of MCTAG as TAG derivation tree languages with various constraints put on them (multicomponent, simultaneity, etc.), and none of them seem to hinge on the representation format in any specific way.
Anonymous (2014-02-05):
@Jeff: "It would be interesting to see the extent to which differences between phonology and syntax can be distilled to simply strings vs. trees." This is actually a very tricky issue worth its own blog post, so I'm just rattling off some quick observations here.

I looked at the subregular complexity of Minimalist derivation tree languages in an older paper of mine (http://thomasgraf.net/doc/papers/fg2011.pdf). Even if we add a slew of new movement operations, as I do in this LACL paper (http://thomasgraf.net/doc/papers/lacl2012.pdf), they are definable in first-order logic with a predicate for dominance, but not in first-order logic with immediate dominance. Over strings these logics correspond to star-free and locally threshold testable, respectively. Locally threshold testable is probably too weak for phonology (unless you restrict your class of models to single words rather than strings of words), so FO with dominance seems like a good first approximation.

[Excursus: we still have those pesky stress patterns in Creek and Cairene Arabic that aren't star-free. The data is iffy, though, so these might turn out to be non-issues on closer inspection.]

The power of the mapping is a lot less clear. Let's ignore copy movement for now. Then:

- MSO-definable transductions provide a reasonable upper bound.

- An approximate lower bound is given by linear deterministic multi bottom-up tree transducers (ldmbtts), which are used for standard phrasal movement to a c-commanding position (there might be a weaker transducer that can pull this off; at this point we do not know).

- The MSO-definable string transductions are exactly the deterministic two-way finite-state transductions, which are too powerful for phonology.

- I don't think ldmbtts can do a lot of work over unary branching trees, so they might actually be equivalent to standard finite-state transducers in the string case.

Anonymous (2014-02-05):
@Alex C: "there seem to be very strong objections to the idea of phrase structure rules" Usually it's a matter of capturing generalizations and/or grammar size. For instance, headedness is a fundamental property of syntax but purely accidental under a PSG analysis. Simple things like subject-verb agreement are tedious with PSGs. This is usually fixed by enriching the lexical representations and adding mechanisms like feature percolation, but once you have these mechanisms you can ditch the phrase structure rules. So from a linguist's perspective, if A can't do the job well without B, but B can do it by itself, then get rid of A.

Ed Stabler has another (imho much better) argument in Appendix A of his parsing paper (http://www.linguistics.ucla.edu/people/stabler/Stabler12-2models.pdf), namely that MGs (a lexicalized formalism) are much more succinct than MCFGs (PSG-based).
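[The "tedious" point is easy to see by counting rules. In a bare PSG, every rule mentioning NP or VP has to be duplicated for each agreement value, so the rule set multiplies with each cross-cutting feature; with features on lexical items, one schema per pattern suffices. A toy count, with the mini-grammar invented for illustration:]

```python
AGR = ["sg", "pl"]

# Bare PSG: agreement is baked into the category names, so every
# rule is repeated once per agreement value it mentions.
psg_rules = (
    [f"S -> NP_{a} VP_{a}" for a in AGR]
    + [f"NP_{a} -> Det N_{a}" for a in AGR]
    + [f"VP_{a} -> V_{a} NP_{b}" for a in AGR for b in AGR]
)
print(len(psg_rules))  # 8 rules for what is descriptively three patterns

# With feature-enriched categories, one schema each suffices:
#   S -> NP[agr=X] VP[agr=X]
#   NP[agr=X] -> Det N[agr=X]
#   VP[agr=X] -> V[agr=X] NP[agr=Y]
```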
Alex Clark (2014-02-05):
There are two ways of writing grammars: you can lexicalise the grammar (i.e. push all the structure onto the leaves of the trees) and have the non-leaf nodes all be kind of trivial, or you can have nontrivial structure on the non-leaf nodes (i.e. have some sort of phrase structure rules). If you have empty (phonetically null/unpronounced) nodes in the tree, then it is really trivial to translate between the two; if you don't, it can be highly nontrivial. But there seem to be very strong objections to the idea of phrase structure rules which I haven't yet understood the basis of... don't know if anyone can help me out here.

Jeff Heinz (2014-02-05):
Greg writes: "we just want a regular set (of trees), and a transduction."

In phonology, I believe we want a regular set of strings and regular string-to-string transductions (actually, particular subregular classes of stringsets and transductions).

I'd like to know more about the regular sets of trees and transductions that Greg, Thomas, Tim, Bob, and others are interested in!

It would be interesting to see the extent to which differences between phonology and syntax can be distilled to simply strings vs. trees.
davidadger (2014-02-05):
Cool. I'll read the idioms stuff. This set of ideas looks very similar to what I've been pursuing in a very informal way in the work where I've tried to connect minimalist syntax with variationist sociolinguistics (yeah, I know). Hmm, maybe I should get some money together to get you guys over here for a workshop.

On a slightly tangential topic: when I first wrote Core Syntax, I started with having the selectional features in an ordered list, but there seem to be some good theoretical and empirical reasons to take there to be just a single selectional feature per LI, so I ended up emptying my lexical items of features connected to the ordering of functional categories (since this ordering seems universal, putting it in lexical items leaves a generalization uncaptured). And for multiple arguments, the way that syntactic theory has been going for the last decade is to remove argument structure properties from lexical items and attribute them instead to functional elements (e.g. v, Appl, my qof head in my LI book, etc.). So it seems to me that the prevailing theoretical wind pushes us in the direction of emptying our units of computation of structured representations, so that all of the structure is in the computation/derivations, rather than in the items themselves. Taking that set of choices in Core Syntax basically allowed me to say that the EPP feature and the `have an argument' selectional feature are the same: basically just syntactic requirements for something to have a specifier (I waved my hands about double objects in CS, as I thought putting in an applicative head was a bit much for an intro course!).

No issue of course with compiling that information (the order of the particular set of functional items associated with particular roots) into `slices', or what I called `rooted extended projections', during the acquisition process, but I don't think kids come armed with richly structured lexical items (qua elements of computation) at the start of the acquisition process.

So I guess the question then is whether one could still do MGs, but take the fundamental units to achieve their ordering via some other mechanism than selection.

Greg Kobele (2014-02-05):
@David: The idea is only implicit in Thomas' slices. I present these sorts of derivation trees explicitly in my papers on idioms (esp. section 5.1) and ellipsis linked above. The paper on idioms is a very abstract (but very general) formalization of Williams' spanning, plus a proposal about interfaces; these slides (http://home.uchicago.edu/~gkobele/files/Kobele12BCGL.pdf) might be clearer. If you are interested in assigning probabilities to derivations in a meaningful way, Tim's paper (http://www.tc.umn.edu/~timh/mgs/pmg.pdf) is a good place to start.

@Bob: I think the only problem with converting higher-order MCTAG derivations to first-order (`MG-style') ones is that the operations being used are not substitute and adjoin. Instead, the operations can be thought of as something like tuples of the usual TAG operations (but indexed with Gorn addresses). Once you specify the actual operations taking place, there is no hindrance to translating any restriction you might like to impose on higher-order derivations into first-order terms. As for (finitely bounded) delay, I think it's best to think of this as a transduction on normal derivation trees. In other words, I think that we should drop the `derivation' in `derivation trees'; we just want a regular set (of trees), and a transduction. (I suspect Thomas would agree with me.)

I wonder if you had something concrete in mind? (And thus that I was talking past you...)
Bob Frank (2014-02-05):
My gratitude to all for this interesting discussion about the relation between the two types of derivation trees. @Greg, thanks for the reference to @Thomas's paper, which I hadn't known about, and which gets at a question that I have been worried about for some time, like @Tim. Though the notion of slices or polynomials allows for a translation of one type of derivation tree into another, it strikes me that it is not completely satisfactory when we consider conditions imposed on the derivation trees for the application of TAG adjoining in different varieties of multi-component TAG (compare tree-local MCTAG to k-delayed MCTAG to set-local MCTAG). As far as I can tell, such locality conditions on MCTAG derivations don't translate in any natural way to conditions on MG-style derivations. And while I suspect that strict tree locality can be preserved using a simple tree transducer, even if the statement of such restrictions is a mess in terms of MG derivations, it's not obvious that the other kinds can be. If such conditions turn out to have linguistic import, this might be a reason to prefer one sort of representation over another. Any thoughts?

davidadger (2014-02-05):
Thanks Greg. I took a look at some of the stuff that Thomas did on slices, and I think they're not quite the same as the way I was thinking about telescoped representations, although there is definitely an exciting idea here: that the grammar can be specified by a set of finite slices, which is very similar to what I proposed. In my own proposal, the idea is that for an actual derivation of a structure via UG principles (so during acquisition, and maybe as long as UG stays `open'), Merge applies to build binary or unary representations, where the output of the operation is labelled by a functional category (not by an operation). However, the way I sketch it in the book, and in a more motivated way in http://ling.auf.net/lingbuzz/002012, an actual grammar (i.e. an acquired I-language) will be a lexicalized/routinized version of the unary projections above the root (sort of Construction Grammar in reverse: the conventionalized or routinized structures are just those given by UG plus the primary linguistic data). So then a grammar can be given by these unary projections (which are like strings, effectively), plus binary Merge/Move, which is, if I've understood it, very like Thomas's slices proposal. The difference is that mine are labelled not by operations, but by functional category labels. So mine are actually much closer to Brody's telescoped representations, which have labels on the nodes, as opposed to specifications of whether the node was built by one operation or another. I need the labels on the nodes, as I don't have functional heads as independent lexical items (qua elements of computation rather than spellouts of structure). Of course, since some of these labels have extra diacritics on them specifying whether the projection line is pronounced at that point (like the Brody-*), or whether they have an EPP feature forcing movement, it amounts to something like the same thing, I think. Wish I had more time to think about this stuff. Roll on September, when I am no longer Dean!

Greg Kobele (2014-02-05):
@Alex: Made-up terminology... Collins' parser (~1999) lexicalizes the treebank CFG, in that non-terminals are marked up with the identity of their head terminal. I think of this as a grammar transform. It has the benefit that head-to-head selection is a local property in the transformed tree, allowing refined statistics to be collected. (This is related to the traditional ideas about idioms in the transformational tradition.)

Alex Clark (2014-02-05):
@Greg: What is the "Collins transform"?

Greg Kobele (2014-02-04):
@Tim: I think, btw, that this representation might be interesting for parsing. Note that essentially the Collins transform has been applied to it. Note also that the sizes of the first-order and higher-order derivation trees are different (the HO tree is usually at least half the size), and so the number of parsing steps in a successful parse will also be much fewer.
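[A small sketch of the transform Greg describes: each non-terminal is annotated with the head word it ultimately dominates. The head-percolation table and the mini-tree are invented for the example; Collins' actual head-finding rules are considerably more detailed.]

```python
HEAD_CHILD = {"S": "VP", "VP": "V", "NP": "N"}  # which daughter supplies the head

def lexicalize(tree):
    """tree = (label, children); a preterminal has its word as children.
    Returns (annotated tree, head word)."""
    label, kids = tree
    if isinstance(kids, str):            # preterminal: the word heads itself
        return (f"{label}[{kids}]", kids), kids
    new_kids, heads = [], {}
    for kid in kids:
        new_kid, head = lexicalize(kid)
        new_kids.append(new_kid)
        heads[kid[0]] = head
    head = heads[HEAD_CHILD[label]]      # percolate the designated daughter's head
    return (f"{label}[{head}]", new_kids), head

t = ("S", [("NP", "Mary"), ("VP", [("V", "saw"), ("NP", "John")])])
print(lexicalize(t)[0])
# ('S[saw]', [('NP[Mary]', 'Mary'),
#             ('VP[saw]', [('V[saw]', 'saw'), ('NP[John]', 'John')])])
```

[After the transform, the dependency between "saw" and each of its arguments is visible within a single local tree, which is what lets head-to-head statistics be collected rule by rule.]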