Monday, February 22, 2016

Derived objects and derivation trees

I am pretty slow to understand things that are new to me. I have a hard time rethinking things in new ways, being very comfortable with what I know. All of which is a long-winded way of saying that when I started reading Stabler's take on Minimalism (and Greg Kobele's, Thomas Graf's, and Tim Hunter's), which emphasizes the importance of derivation trees (as opposed to derived objects) as the key syntactic "objects," I confess that I did not see what was/is at stake. I am still not sure that I do. However, thinking about this stuff led me to some thoughts on the matter, and I thought that I would put them up here so that those better versed in these issues (and also more mentally flexible) could help set me (and maybe a few others) straight.

The way I understand things, derivation trees are ways of representing the derivational history of a sound/meaning pairing. A derivation applies some rules in some sequence, and interpreting each step allows for a phonological and a semantic "yield" (a phonetic form and a meaning correlate of this sequence of steps). Some derivations are licit, some not. This divides the class of derivation trees into those that are G-ok and those that are not. However, and this seems to be an important point, all of this is doable without mentioning derived objects (i.e. classical phrase markers (PMs)). Why? Because PMs are redundant. Derivation trees implicitly code all the information that a phrase marker does, as the latter are just the products that applying rules in a particular sequence yields. Put differently: for every derivation tree it is possible to derive a corresponding PM.

Putting this another way: the mapping to sound and meaning (which every syntactic story must provide) is not a mapping from phrase markers to interpreted phrase markers but from sequences of rules to sound and meaning pairs (i.e. <s,m>). There is no need to get to the <s,m>s by first going through phrase markers. To achieve a pairing of articulations and semantic interpretations we need not transit through abstract syntactic phrase markers. We can go there directly. Somewhat poetically (think Plato's cave), we can think of PMs as shadows cast by the real syntactic objects, the derivational sequences (represented as derivation trees), and though pretty to look at, shadows (i.e. PMs) are, well, shadowy and not significant. At the very least, they are redundant and so, given Occamite sympathies for the cleanly shaved, best avoided.

This line of argument strikes me as pretty persuasive. Were it the case that derived objects/PMs didn't do anything, then though they might be visually useful, they would not be fundamental Gish objects (sort of like linguistic versions of Feynman diagrams). But I think that the emphasis on derivation trees misses one important function of PMs within syntax, and I am not sure that this role is easily or naturally accommodated by an emphasis on derivation trees alone. Let me explain.

A classical view of Gs is that they sequentially map PMs into PMs. Thus, G rules apply to PMs and deliver back PMs. The derivations are generally taken to be Markovian in that the only PM that a G must inspect to proceed to the next rule application is the last one generated. So for the G to licitly get from PM_n to PM_n+1 it need only inspect the properties of PM_n. On this conception, a PM brings information forward, in fact all (and in the best case, no more) of the information you need to know in order to take the next derivational step. On this view, then, derived objects serve to characterize the class of licit derivation trees by making clear what kinds of derivation trees are unkosher. An illustration might help.

So, think of island effects. PMs that have a certain shape prohibit expressions within them from moving to positions outside the island. So an expression E within an island is thereby frozen. How do we code islandhood? Well, some PMs are/contain islands and some do not. If E is within an island at stage PM_n then E cannot move out of that island at stage PM_n+1. Thus, the derivation tree that represents such movement is illicit. From this perspective, we can think of derived objects as bringing information forward in derivational time, information that restricts the possible licit continuations of the derivation tree. Indeed, PMs do this in such a way that all the information relevant to continuing is contained in the last PM derived (i.e. they support completely Markovian derivations). This is one of the things (IMO, the most important thing) that PMs (aka derived objects) brought to the table theoretically.
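To make the Markovian point concrete, here is a toy sketch of my own (the representation, node labels, and function names are all invented for illustration, not any particular formalism): a PM is a nested structure with some nodes flagged as islands, and whether the next movement step is licit is decided by inspecting that last PM alone.

```python
# Toy sketch: a PM is a tuple (label, is_island, children).
# Islandhood is read directly off the last derived PM; no earlier
# derivational stage is ever consulted.
def inside_island(pm, target, under_island=False):
    """True iff the node labeled `target` sits inside an island in `pm`."""
    label, is_island, children = pm
    if label == target:
        return under_island
    return any(inside_island(c, target, under_island or is_island)
               for c in children)

def can_move_out(pm, target):
    # The grammar consults only the current PM: a Markovian check.
    return not inside_island(pm, target)

pm = ("TP", False,
      [("AdjunctCP", True, [("wh", False, [])]),   # island subtree
       ("V", False, [])])
```

Here `can_move_out(pm, "wh")` is false because the wh-phrase sits inside the island, while `can_move_out(pm, "V")` is true; in both cases everything needed for the decision is in the single PM handed in.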

So the relevant question concerning the syntactic "reality" of PMs/derived objects is whether we can recapture this role of PMs without adverting to them. And the answer should be "yes." Why? Well, derived objects just are summations of previous derivational steps. They just code prior history. But if this is what they do, then derived objects are, as described above, redundant, and so, in principle, eliminable. In other words, we can derive all the right <s,m>s by considering the class of licit derivation trees, and we can identify these without peeking at the derived objects that correspond to them.[1]

This line of argument, however, is reminiscent of another one that Stabler (here) critically discusses. He notes that certain kinds of grammars without movement rules (multiple context free grammars, or MCFGs) can mimic movement, with the effect that all the same <s,m>s that a movement-based theory derives such an MCFG can also derive, and in effectively the same way. However, he notes that these MCFGs are far less compact than the analogous transformational grammars and that this can make an important difference cognitively. Here's the abstract:

Minimalist grammars (MGs) and multiple context free grammars (MCFGs) are weakly equivalent in the sense that they define the same languages, a large mildly context sensitive class that properly includes context free languages. But in addition, for each MG, there is an MCFG which is strongly equivalent in the sense that it defines the same language with isomorphic derivations. However, the structure building rules of MGs but not MCFGs are defined in a way that generalizes across categories. Consequently, MGs can be exponentially more succinct than their MCFG equivalents, and this difference shows in parsing models too. An incremental, top-down beam parser for MGs is defined here, sound and complete for all MGs, and hence also capable of parsing all MCFG languages. But since the parser represents its grammar transparently, the relative succinctness of MGs is again evident. And although the determinants of MG structure are narrowly and discretely defined, probabilistic influences from a much broader domain can influence even the earliest analytic steps, allowing frequency and context effects to come early and from almost anywhere, as expected in incremental sentence processing.

Can we apply the same logic to the discussion above? Well, maybe. Even if the derivation trees contain all the same information that a theory with PMs does, do they make it available in the same way that PMs do? Or, if we all agree that certain information must be "carried forward" (Kobele's elegant term) derivationally, might it make a cognitive difference how this information is carried forward, implicitly in a derivation tree or explicitly in a PM? Well, here is one place to look: one thing that PMs allow is for derivations to be Markovian. Is this a cognitively important feature, analogous to being compact? I can imagine that it might be. I can imagine that Gs being Markovian has nice cognitive properties. Of course, this might be false. I just don't know. At any rate, I have no problem believing that how information is carried forward can make a big computational difference. Consider an analogy.

Think of adding up a long column of multi-digit numbers. One useful operation is the "carry" procedure, whereby only the last digit of a column's sum is registered and the rest is carried forward to the next column. But is "carrying" really necessary? Is bringing everything but the last digit over to the next column necessary? Surely not, for the numbers carried can be recovered at every column by simply adding everything up again from the earlier ones. Nonetheless, I suspect that re-adding again and again has a computational cost that carrying does not. Carrying just makes things easier. Ditto with PMs. Even if the information is recoverable from derivation trees, PMs make accessing this information easy.
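The contrast can be made concrete with a toy sketch (my own illustration; there is nothing linguistic about the code): one adder carries a compact summary forward column by column, while the other recovers the carry at each column by re-adding all earlier columns from scratch.

```python
def add_with_carry(columns):
    """Columns come least-significant first, each a list of digits.
    At each column only the last digit of the sum is written down;
    the remainder is carried forward -- a compact summary of
    everything computed so far."""
    carry, digits = 0, []
    for col in columns:
        total = sum(col) + carry
        digits.append(total % 10)
        carry = total // 10
    while carry:                       # spill any final overflow
        digits.append(carry % 10)
        carry //= 10
    return digits[::-1]                # most significant digit first

def add_by_readding(columns):
    """No carrying: the carry into column i is recovered each time by
    re-adding every earlier column from scratch (repeated work)."""
    def carry_into(i):
        return sum(sum(columns[j]) * 10 ** j for j in range(i)) // 10 ** i
    digits = [(sum(col) + carry_into(i)) % 10
              for i, col in enumerate(columns)]
    leftover = carry_into(len(columns))
    while leftover:
        digits.append(leftover % 10)
        leftover //= 10
    return digits[::-1]

# 57 + 68 = 125, given as columns of digits (units column first)
cols = [[7, 8], [5, 6]]
```

Both return [1, 2, 5] for `cols`, but the carrying version does linear work while the re-adding version redoes earlier columns at every step. That is the analogy: the information is recoverable either way; how it is carried forward is what makes the computational difference.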

Let me go further. MGs of the kind that Stabler and his students have developed don't really have much to say about how the kinds of G restrictions on derivations are to be retrieved from derivation trees without explicit mention of the information coded in the structural properties PMs exhibit. The only real case I've seen discussed in depth is minimality, and Stabler-like MGs (minimalist grammars) deal with minimality effects by effectively supposing that they never arise (it is never the case in Stabler MGs that a licit derivation allows two objects to have the same accessible checkable features). This rendering of minimality works well enough in the usual cases that Stabler MG formalizations are good proxies for minimalist theories "in the wild." However, not only does this formalization not conform to the intuition that most syntacticians have about what the minimality condition is, it is also easy to imagine that it is not the right way to formalize minimality effects, for there may well be many derivations where more than one expression carries the requisite features in an accessible way (in fact, I've heard formalists discussing just this point many times; think multiple interrogation, or multiple foci or topics, or even case assignment). This is all to say that the one case where a reasonably general G-condition does get discussed in the MG literature leaves it unclear how this approach should/does treat other conditions that do not seem susceptible to the same coding trick. Or: minimality environments are just one of the conditions that PMs make salient. It would be nice to see how other restrictions that we explain by thinking of derivations as Markovian mappings from PMs to PMs are handled without derived objects.[2] Take structure dependence, Island/ECP effects, or the A-over-A condition, for example.
We know what needs doing: we need to say that some kinds of G relations are illicit between some positions in a derivation tree, so that some extensions of the derivation tree are G-illicit. Is there a nice compact description of what these conditions are that makes no mention, however inadvertently, of PMs?

That's it. I have been completely convinced that derivation trees are indispensable. I am convinced that derived objects (aka PMs) are completely recoverable from derivation trees. I am even convinced that one need not transit through PMs to get to the right <s,m> pairs (in fact, I think that thinking of the mapping via PMs that are "handed over" to the interpretive components is a bad way of thinking of what Gs do). But this does not yet imply, I don't think, that PMs are not important G-like objects. At the very least they describe the kinds of information that we need to use to specify the class of licit derivation trees. Thus, we need an account of how information is brought forward in derivational time in derivation trees and, more importantly, of what is not. Derived objects seem very useful for coding the conditions on G-licit derivation tree continuations. And as these are the very heart of modern GG theory (I would say the pride of what GG has discovered), we want to know how these are coded without PMs.

Let me end with one more historical point. Syntactic Structures argues that transformations are necessary to capture evident generalizations in the data. The argument for affix hopping and Aux movement was not that a PSG couldn't code the facts, but that it did so in a completely ugly, ad hoc and uninformative way. This was the original example of the utility of compact representations. PMs proved useful in similar ways: inspecting their properties allowed for certain kinds of "nice looking" derivations. The structure of a given PM constrained what next derivational step was possible. That's what PMs did well (in addition to feeding <s,m>s). Say we agree that PMs are not required (or really add much) in understanding the mapping between sounds and meanings (i.e. in deriving <s,m> pairs); what of the more interesting use to which PMs were put (i.e. stating restrictions on derivations)? Is this second function as easily and insightfully discharged without PMs? I'd love to know.

[1] It is perhaps noteworthy that there is not a clean match between grammaticality and semantic interpretability. Thus, there are many unacceptable sentences that are easily interpreted and, in fact, have only one possible interpretation (e.g. The child seems sleeping, or Which man did you say that left). This, IMO, is an important fact. We don't want our theories of linguistic meaning to go off the rails if a sentence is ungrammatical, for that would (seem to) imply that it has no licit meaning. Now, there are likely ways to get around this, but I find nothing wrong with the idea that an expression can be semantically well formed even if syntactically illicit, and I would like a theory to allow this. Thus, we don't want a theory to fail to yield a licit <s,m> just because there is no licit derivation. Of course, how to do this is an open question.
[2] See here and here for another possible example of how derivation trees handle significant linguistic details that are easy to "see" if one adopts derived objects. The gist is that bounding "makes sense" in a theory where PMs are mapped into PMs in a computationally reasonable way. It is less clear why sticking traces into derived objects (if understood as yields, i.e. ss or ms in <s,m> pairs) makes any sense at all given their interpretive vacuity.


  1. Marcus Kracht's work on compositionality addresses your
    > idea that an expression can be semantically well formed even if syntactically illicit
    The idea is that there are independent domains (phonological, categorial, semantic), each with their own operations. What we think of as the grammatical operations are really pairings (or triplings) of operations from each of those domains. One application of *merge* to an expression e = <ph, c, m> is the simultaneous application of the sound part to ph, the category part to c, and the meaning part to m. It is therefore no problem to trace the 'meaning' part of *merge* through an otherwise illegitimate series of derivational steps.

    I'm not sure that this is the right way to think about semantic well-formedness despite syntactic ill-formedness. I think it is useful to distinguish three cases: 1) an underivable sound-meaning pair; 2) a derivable but otherwise ill-formed s-m pair (perhaps because there were three subjacency violations); 3) an s which is not part of any derivable s-m pair but which is a few cognitive `repairs' away from a derivable s'-m' pair. I am happy saying that cases like 3 abound; I have no problem understanding the speech of late second language acquirers of English, or of very early first language acquirers. A commonly accepted instance of case 2 comes from processing difficulties like center embeddings (but I used the example I did because I think that it is natural and faithful to the original proposal to view subjacency this way). It is attractive to me to think that all cases of something being "semantically well-formed despite being syntactically ill-formed" are instances of either 2 or 3.

  2. > The gist is that bounding “makes sense” in a theory where PMs are mapped into PMs in a computationally reasonable way. It is less clear why sticking traces into derived objects (if understood as yields, i.e. ss or ms in <s,m> pairs) makes any sense at all given their interpretive vacuity.

    What I got out of the previous discussion on that point is that it’s possible to imagine certain kinds of finite state transducer that might need to insert traces. So thinking in string terms, imagine that each output token is determined by the most recent k input tokens together with the most recent k output tokens. This kind of transducer couldn’t put anything into “storage” for an indefinitely long time, and so would have to regularly copy dummy symbols into the output instead.
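    To illustrate the idea (a toy of my own, with an invented marking scheme: "wh" for the displaced item, "+t" for the dummy trace mark, "V_gap" for the gap site): each output symbol below is chosen by looking only at the current input token and the last k output symbols, so the only way to "remember" an earlier wh-item across unbounded distance is to keep re-writing a trace mark into the output.

```python
def transduce(tokens, k=1):
    """Output-strictly-local sketch: each output symbol depends only
    on the current input token and the last k output symbols; the
    transducer has no other memory or storage."""
    out = []
    for tok in tokens:
        window = out[-k:]
        seen_wh = tok == "wh" or any(o == "wh" or o.endswith("+t")
                                     for o in window)
        if tok == "V" and seen_wh:
            out.append("V_gap")        # mark the gap site
        elif seen_wh and tok != "wh":
            out.append(tok + "+t")     # dummy trace keeps the info local
        else:
            out.append(tok)
    return out
```

    Running it on ["wh", "did", "you", "say", "V"] annotates every intervening word with "+t", which is the loose analogy to successive-cyclic traces: the marker must be re-inserted periodically or the information is lost once it scrolls out of the k-window.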

    1. @Alex: This won't solve anything; you will still be able to remember exactly k units of information about moving things (per state), as if you have more moving things than that you will have forgotten about the earlier ones once you've written all the traces down.

      I think that Thomas must have been thinking of his tier-based representation for movement when he made that complexity claim. I don't see how to make any sense of it otherwise. (I still don't even in tiers, but I'm willing to give him the benefit of the doubt.)

    2. @Greg: I'm not sure what you mean when you say that this won't solve anything. The point was just that if the next output is determined solely by looking back at some finite portion of the preceding input and output, then it will not be possible to "remember" that a particular symbol occurred in the input for an indefinite period of time. So the transducer would have to insert something in the output every so often to "remind" itself. There's a loose analogy between that sort of process and successive-cyclic movement.

    3. @Alex: I meant that since you have a finite number of states, you don't gain anything by temporarily writing down a finite amount of information, as you can just encode that in the state.

    4. @Greg: I was assuming that there were no states in that sense ("the next output is determined solely by looking back at some finite portion of the preceding input and output").

    5. Both of you are correct. With a free choice of states, locality restrictions are pointless since all non-local information is locally passed around by states --- the whole subregular hierarchy gets compressed into a single point. But transductions can indeed be computed with systems where the choice of states is much more restricted, as is the case with, say, the input strictly local functions. And then locality does matter because, intuitively, you can only refer to what you actually see in some subpart of the structure.

      None of these string transductions have been lifted to trees yet; afaik there isn't even a tree analogue of the subsequential transductions, which are much better known than their input/output strictly local subclasses. But it is very easy to see that movement cannot be output strictly local unless you have intermediate landing sites. That by itself will still not be enough because there is no upper bound on the distance between phase edges due to adjunction. But that's where the tiers come in, as Greg correctly surmised: project a tier that does not contain any nodes related to adjunction, and you (probably) get an upper bound on the distance between landing sites.

  3. "[...] Stabler-like MGs (minimalist grammars) deal with minimality effects by effectively supposing that they never arise (it is never the case in Stabler MGs that a licit derivation allows two objects to have the same accessible checkable features)."

    I am far too ignorant of the details of Stablerian MGs to know if this statement is true — but if it is, then I don't see how these grammars can be understood as models of natural language.

    Consider agreement intervention in Basque. It is easy to show that ABS nominals in Basque do not require licensing by a functional head. Additionally, non-clitic-doubled DAT nominals intervene in agreement relations targeting the ABS. And, crucially, neither the ABS nor the DAT have any "accessible checkable features" in that configuration, in that neither requires licensing-by-checking by the time the phi probe enters the derivation.

    The only way I can make sense of the above statement, then, is if we assume that ABS nominals have a feature that requires checking-by-agreement in all derivations except in those licit derivations where a DAT intervenes (and there are such derivations; you get 3sg agreement even with a non-3sg ABS argument). But at this point, haven't you just introduced a feature that amounts to [+/- minimality is in play]? Couldn't you just as easily add a [+ not in an island] feature to every wh-phrase that happens not to be in an island? I guess what I'm saying is that if that's the treatment of minimality, then you've pretty much admitted minimality is real and your system can't handle it.

    But like I said above, it is entirely (entirely!) possible that I have not fully understood what's at stake.

    1. @Omer: I would first qualify the quote to read that "Stablerian MGs often assume a particular constraint on movement (called the SMC) to the effect that ..." This is because Stablerian MGs are a general framework in which all of minimalism as she is practiced can be straightforwardly implemented.

      A closer-to-home example is multiple wh-questions, and some proposals (Frey, Gaertner, and Michaelis) follow Grewendorf in this regard.

      Wrt Basque, a (perhaps too?) simple approach would be to block off/erase/delete/make invisible the ABS's phi features once the DAT enters the derivation.

    2. @Greg: That wouldn't work because, if the DAT is subsequently clitic-doubled, then the features of the ABS become visible again should a yet-higher phi probe search for them. (That's in fact how you get ABS agreement in monoclausal ditransitives; but when the clause is non-finite — and hence lacks clitic doubling — a higher probe is able to target the embedded ABS only in the absence of a DAT.)

  4. It seems to me that this post talks about several issues at once without clearly distinguishing them.

    1) Can we generate the same sound-meaning mappings without using PMs?
    2) Are PMs Markovian in a sense that derivation trees are not?
    3) Do PMs allow for more succinct descriptions of certain well-formedness conditions compared to derivation trees?
    4) Are there structural conditions that are natural over PMs but unnatural over derivation trees? If so, are any of these conditions similar to what we observe in natural languages?

    Let's take them one by one:

    1) Expressivity: Yes, as you point out yourself.

    2) Markovian: Depends. If I read correctly between the lines, your worry is that we can easily put conditions on derivation trees that one might consider instances of look-ahead, whereas this is less natural from a PM perspective. But the devil is in the details here. I can take look-ahead constraints and compile them into the grammar so that they are enforced without look-ahead, so PMs do not protect you from look-ahead. At the same time, it is very simple to restrict the constraints on derivation trees to make them Markovian in your sense. In fact, this is already the case for standard MGs.

    3) Succinctness: Depends on your definition of succinctness. The situation is not at all like MGs vs. MCFGs, where you have an exponential blow-up in grammar size. Over derivation trees, you can still reference every position that a phrase moves to. If you take some constraint like the Proper Binding Condition and look at its implementation over derived trees and derivation trees, the latter won't look much more complicated, though admittedly some additional complexity needs to be added to the derivational definition of c-command if you want it to be exactly identical to c-command in derived trees. But i) it is unclear that you need perfect identity, and ii) the difference in description length is still marginal.

    4) Naturalness: Depends on your definition of naturalness. Any effect brought about by movement could be considered unnatural over derivation trees because instead of the moving phrase you only have a Move node in the derivation. But this is only unnatural if you ignore the fact that Move nodes establish very close links to their movers --- links that syntacticians often think of as multi-dominance branches.

    The last point is also why I'm pretty puzzled whenever syntacticians are hesitant to switch to derivation trees. The switch is very painless; it's pretty much the same as working with multi-dominance trees (yes, we usually drop the movement branches because they do not add any information for MGs, but that doesn't matter for the things syntacticians care about in their work). Yet despite being a superficially minor switch that is easy to adapt to, it opens up a multitude of new perspectives, as you can tell from the discussions we've had on this blog before. Recent examples include parallels between phonology and syntax, the status of successive cyclicity, hidden implications of sideward movement, and the connection between features and constraints.

    1. Two more examples that I'm pretty fond of that haven't been discussed on this blog yet: Greg's unification of syntactic and semantic theories of ellipsis, and his treatment of idioms as the semantic analogue of suppletion. There's a lot more, such as Ed Stabler's top-down parser and the work by Greg, John Hale, Tim Hunter, and me and my students that builds on it, Tim's approach to adjunction, and so on. All of these ideas are strongly informed by the derivational perspective, to such an extent that they probably wouldn't have been proposed if we were still thinking in terms of derived trees.

    2. @Thomas: I think you are factually wrong about that. Idioms as the semantic analogue of suppletion, for example, is an idea that can be found in some of the earliest writing in Distributed Morphology. And so it was proposed independently (and I think chronologically prior to?) any talk of derivation trees in minimalism — see, e.g., Marantz 1997 (PLC proceedings), 2001 ("Words", unpublished ms.)

    3. @Omer: Such are the perils of boiling down a paper to one catchy slogan. A better one would have focused on how the paper reconciles the fact that idioms are syntactically fine-grained but semantically atomic. Or how it ties into psycholinguistics. Or how the existence of idioms is completely unremarkable under a derivational perspective. Just look at Greg's paper and you'll see that it touches on a lot of things and bundles it all together in an intuitive fashion through derivation trees.

    4. Already downloaded it, hopefully will have time to read it tomorrow.

    5. @Thomas: First off, not a bad summary. Second, I don't think that there is any hostility to viewing things in terms of derivation trees. At any rate, I am not averse. What I want to get clear about is whether the things that we exploited PMs for can be easily transferred to derivation trees. If so, then great, let's use them as the basics. This won't even change practice much, for if PMs are redundant, as they are for the mapping, then they will just be useful ways of thinking of things for those who think pictorially. Again, like Feynman diagrams. So, there is no general hostility, just curiosity about the details.

      So, back to the main point: why do GGers like PMs? Because we have observed that the rules that Gs exploit are structure dependent, and PMs are a way of making the relevant structure explicit. So minimality, islands, binding domains, headedness, c-command etc. are all easy to code in PMs. Given that we think that these are key notions, we want something that will allow for their easy coding. My query was to provoke those who know to instruct us all about how we might do this without PMs. I noted that this seemed like a legit question given that the only interesting condition I have seen coded (though I admit to not being expert in any of this, hence the post), viz. minimality, does this by sidestepping the problem rather than coding the condition. Minimality does not arise in Stabler MGs, as you know. Why not? I have no idea. Would it be easy to code? I have no idea. How would it compare to a PM coding? I have no idea. But I do know that unless it can be coded easily and naturally, this is a problem for a view that entirely dumps PMs. So, this was all by way of a question, with some indications of why I take the question to be interesting.

      You note that succinctness and naturalness are all in the eyes of the definers. True. But some definitions seem nicer than others. I know that raising such issues is useful because Stabler did it for "movement" (as did Chomsky before him). He argued that its virtues lie NOT in the mapping to <s,m>s but in the nature of the Gs that do the mapping. G size might matter. Similarly, how far back one need look in a derivation tree might matter. Of course, it might not. But isn't considering things like this the business we are in?

      So, do I have hostility to derivation trees as the fundamental objects? No way. Do I understand how we can code all that we want entirely in these terms? Nope. That's where you come in. Always call a local expert when you run into problems. Thomas, I am calling.

    6. @Norbert: Alright, let me answer the call. Just a quick aside first: I didn't say there's hostility, just hesitation, which is surprising because at first glance derivation trees and multidominance trees almost seem like notational variants. The latter are fairly run-of-the-mill, so one would expect the former to be met by "yeah sure, why not" rather than "I don't know, this feels strange". But since that's not really the case with you anyway, let's move on.

      My hunch is that your query is best answered by a concrete example. Let's take the Adjunct Island Constraint: no phrase may move out of an adjunct. That's a little vague, so here's a more rigorous, GB-style version:

      For all nodes x and y, if x and y are members of a movement chain, then there is no node z such that

      - z is the maximal projection of a head h, and
      - the phrase projected by h is an adjunct, and
      - x c-commands z, and
      - z properly dominates y, and
      - y is not a projection of h.

      I do not define movement chain here, because that's a mess over PMs if you have remnant movement (see e.g. the Collins and Stabler paper).

      Or one more in the vein of Minimalism:

      Let P be some phrase marker containing a potential mover M. Then Move(P,M) is defined only if M is not properly contained by an adjunct.

      Once again I leave out a definition here, this time it's proper containment. But it's pretty similar to the clauses 1, 2, 4, and 5 above, so the two definitions differ little in complexity once you pin down all the terms.

      Now let's look at the derivational equivalent. First two terminological remarks: a Move node x is an occurrence of y iff x checks a feature of y, and the slice root of a lexical item l is the highest node in the derivation that is licensed by some feature of l (this is the derivational equivalent of the highest node projected by l in the PM).

      For all nodes x and y such that x is an occurrence of y, there is no lexical item l such that

      - l is an adjunct, and
      - l and y are distinct, and
      - the slice root of l properly dominates y.

      For a full definition we would also have to define adjunct in all three definitions. But the bottom-line is that I do not see much of a difference between any of these definition. Once you have to clearly spell out your terms, they all boil down to checking that movement paths are not allowed to cross adjunct borders.

      And this is the story for pretty much all constraints over PMs that I am aware of. They aren't any harder to state over derivation trees because MG derivation trees --- in contrast to, say, derivation trees in Tree Adjoining Grammar --- are structurally very similar to PMs despite removing some of the structural fat, as Tim points out.
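      If it helps to see the derivational AIC run, here is a sketch in code (the node encoding and names are invented; it also folds in the extra clause from the bugfix below, that the occurrence must properly dominate the adjunct's slice root):

```python
class Node:
    """Derivation tree node (invented encoding): a parent pointer plus
    a flag marking it as the slice root of an adjunct lexical item."""
    def __init__(self, parent=None, adjunct_slice_root=False):
        self.parent = parent
        self.adjunct_slice_root = adjunct_slice_root

def properly_dominates(a, b):
    """True iff a is a proper ancestor of b."""
    n = b.parent
    while n is not None:
        if n is a:
            return True
        n = n.parent
    return False

def aic_ok(occurrence, mover):
    """Movement from `mover` to its `occurrence` is AIC-licit unless the
    slice root of some adjunct properly dominates the mover while the
    occurrence properly dominates that slice root."""
    n = mover.parent
    while n is not None:
        if n.adjunct_slice_root and properly_dominates(occurrence, n):
            return False
        n = n.parent
    return True

root = Node()
adjunct = Node(parent=root, adjunct_slice_root=True)
trapped = Node(parent=adjunct)   # mover buried inside the adjunct
free = Node(parent=root)         # mover outside any adjunct
```

      With these toy trees, `aic_ok(root, trapped)` is false and `aic_ok(root, free)` is true: the check is just a walk up the movement path looking for a crossed adjunct border, exactly as in the prose definition.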

    7. Let's look at Relativized Minimality as another example.
      For the sake of simplicity, let us reduce the condition to the following: x may not move across y to the specifier of z if y could move to z. This is partially handled by the Shortest Move Constraint in MGs, as you cannot have both x and y carry the feature that would allow them to move to z. But MGs do allow x to move if y has no such feature, and the other way round. So here's how you patch this loophole in two steps:

      Let l be a lexical item with phonetic exponent pe and feature string f_1 ... f_n. Then l is a potential g-mover iff there is some lexical item l' with phonetic exponent pe and feature string f_1 ... f_i g f_{i+1} ... f_n.

      That's just a very roundabout way of saying that l could in principle carry movement feature g. And now we enforce the simplified version of Relativized Minimality introduced above:

      If x is a g-occurrence of y, then there is no z such that

      - z and y are distinct, and
      - x properly dominates z, and
      - the slice root of z properly dominates y, and
      - z is a potential g-mover

      Not particularly shocking either. You might say that the switch between dominance and dominance by slice root is inelegant, but it can all be boiled down to the concept of derivational prominence. And of course all of this would be more intuitive if we looked just at the trees instead of mucking around with definitions; but since succinctness was an issue, I figured a more technical approach would provide better evidence that there is no real difference in that respect.
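      The "potential g-mover" definition is easy to operationalize. Here is a small Python sketch (my own, over a made-up lexicon format of phonetic exponent plus feature tuple) that checks whether an item could in principle carry movement feature g:

```python
from typing import List, Tuple

# A lexical item is modeled as (phonetic exponent, feature string).
LexItem = Tuple[str, Tuple[str, ...]]

def is_potential_g_mover(item: LexItem, g: str,
                         lexicon: List[LexItem]) -> bool:
    """item is a potential g-mover iff the lexicon contains an item
    with the same phonetic exponent whose feature string is item's
    with one extra g inserted somewhere, i.e.
    f_1 ... f_i g f_{i+1} ... f_n."""
    phon, feats = item
    for phon2, feats2 in lexicon:
        if phon2 != phon:
            continue
        # Try every insertion site for g.
        for i in range(len(feats) + 1):
            if feats2 == feats[:i] + (g,) + feats[i:]:
                return True
    return False
```

      The Relativized Minimality clauses above then just quantify over derivation-tree nodes whose lexical items pass this check.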

    8. Bugfix: the derivational definition of the AIC (Adjunct Island Condition) above is missing one clause.

      - x properly dominates the slice root of l.

    9. @Thomas: good, so I see there's no problem per se in stating minimality over derivation trees. So we can reject the (faulty) premise that no two goals ever simultaneously bear the features sought by the probe, and we still have a way of getting minimality effects. This is encouraging.

      So now we come to the question of naturalness. One of the nice properties of minimality when stated over PMs is that it can be understood as a side effect of iterative search. (The search in question is neither breadth-first nor depth-first, but rather something like searching along the spine of the tree. Each iteration consists in asking "does the spec match my needs? if not, does the complement match my needs?" and if the answer is still no, repeating this step with what-was-the-complement as the new domain of evaluation.)
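      The iterative search just described can be sketched as follows (a toy Python rendering of my own, assuming binary spec/comp structure; not anyone's actual proposal):

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Phrase:
    label: str
    spec: Optional["Phrase"] = None
    comp: Optional["Phrase"] = None

def spine_search(root: Phrase,
                 matches: Callable[[Phrase], bool]) -> Optional[Phrase]:
    """Search along the complement spine: at each step, ask whether the
    specifier matches, then whether the complement matches; if neither
    does, descend into the complement and repeat.  The first match
    found is automatically the closest one, which is how minimality
    falls out as a side effect of the search procedure."""
    node = root
    while node is not None:
        if node.spec is not None and matches(node.spec):
            return node.spec
        if node.comp is not None and matches(node.comp):
            return node.comp
        node = node.comp
    return None
```

      Because the search never looks past the first match, a goal higher on the spine always beats a lower one of the same type.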

      The question is whether the following (quoted from your comment above) has a similarly natural interpretation:

      - z and y are distinct, and
      - x properly dominates z, and
      - the slice root of z properly dominates y, and
      - z is a potential g-mover

    10. @Omer: Yes, it captures pretty much the same intuition. Here are several ways of paraphrasing it: 1) Traversing downwards from the Move node towards the mover, you may never cross through the slice of a potential mover. 2) If incremental search along the complement spine leads you towards a potential mover, the derivation is considered ill-formed because the actual mover is further away than it could have been. 3) Among all potential movers for a Move node, the derivationally most prominent one has to be the actual mover.

  5. Norbert wrote: this does not yet imply, I don’t think, that PMs are not important G-like objects. At the very least they describe the kinds of information that we need to use to specify the class of licit derivation trees. Thus, we need an account of how information is brought forward in derivational time in derivation trees and, more importantly, what information is not. Derived objects seem very useful in coding the conditions on G-licit derivation tree continuations. And as these are the very heart of modern GG theory (I would say the pride of what GG has discovered), we want to know how these are coded with PMs.

    Unfortunately I most likely won't have much time this week to participate further in this discussion, but just briefly: As far as I can tell, no one denies that some information has to be carried forward about the current "derivational state" in order to define which transformations (e.g. merge/move steps) are applicable at any particular point. And no one denies that full PMs of the traditional sort are sufficient to carry forward this information. To my mind, the question is whether all of the information that full traditional PMs encode is necessary. Almost certainly it is not all necessary, for the same reasons that various people have intuitions about something like phases: hand-waving a bit, only a certain amount of relatively recent history is relevant.

    1. Ok, we are on the same page here. What we want to know is what PMish information is required and how to make it available. So PMs are succinct ways of coding the "current derivational state" (thx for the terminology). And yes, full PMs are too rich, hence the phase/subjacency concerns. But, if I read your point correctly, we agree that PMs DO add something useful, a specification of the current derivational state. Hence PMs help determine the "next" step that can be taken to expand the derivation tree. If this is so, they are very useful objects, or at least the info contained therein is. The question remains whether the info they carry is redundant, or is easily recoded in derivation tree terms. Thx, and whatever you are busy with, good luck.

    2. There is no way to do without a specification of the current derivational state. Everyone has some object that serves that purpose. The question of whether PMs are useful will only make sense if we specify what a PM is. (Otherwise we could just use the label PM for the whatever-it-is that encodes current derivational state.)

      At least implicitly, a PM is often assumed to be something that has words in it (for example, as the leaf nodes in a tree). Let's put a stake in the ground there. If we do that, then I think it's clear that PMs do contain redundant information, since all that the applicability of derivational operations cares about is the categories of these words, not the identity of the words themselves (their pronunciation, etc.); that's virtually the definition of what a "category" is.

      I think the inclination to assume that PMs have complete words in them stems from the idea that the PM one arrives at at the end of a derivation serves as the basis for interpretation at the interfaces --- in addition to serving as the basis for determining which derivational operations are applicable. If you assume this, then your PMs have to do more than encode current derivational state, and yes, you will obviously need the full PF/LF information about each word. So the question of "how much info you need to carry forward" depends on whether you're carrying it forward for operation-applicability purposes only, or for that plus interface interpretation purposes. When we think in derivation tree terms, however, I think it is almost always assumed that the derivation itself is interpreted, rather than the final derived object. In other words, it's a system that has the same shape as something like CCG: the thing that encodes derivational state is just the categories like NP and S\NP, and the pronunciation and meaning of the larger constructed object is composed step by step in sync with the derivational operations themselves, not by stepping through some tree-shaped derived object.
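      That CCG-style picture can be sketched in a few lines of Python (a toy illustration of my own, with string-based categories and meanings; not the commenter's actual system): the derivational state is just a category, and pronunciation and meaning are composed in lockstep with each derivational step, with no derived tree ever built.

```python
from dataclasses import dataclass

@dataclass
class State:
    cat: str    # e.g. "NP" or "S\\NP": all that operation applicability inspects
    phon: str   # pronunciation composed so far
    sem: str    # meaning composed so far (a string, for the sketch)

def _strip(cat: str) -> str:
    """Drop one layer of outer parentheses, e.g. "(S\\NP)" -> "S\\NP".
    Naive, but sufficient for this sketch."""
    return cat[1:-1] if cat.startswith("(") and cat.endswith(")") else cat

def apply_fwd(f: State, a: State) -> State:
    """Forward application: X/Y + Y => X, composing phon and sem."""
    assert f.cat.endswith("/" + a.cat), "categories do not combine"
    return State(cat=_strip(f.cat[:-len("/" + a.cat)]),
                 phon=f.phon + " " + a.phon,
                 sem=f"{f.sem}({a.sem})")

def apply_bwd(a: State, f: State) -> State:
    """Backward application: Y + X\\Y => X."""
    assert f.cat.endswith("\\" + a.cat), "categories do not combine"
    return State(cat=_strip(f.cat[:-len("\\" + a.cat)]),
                 phon=a.phon + " " + f.phon,
                 sem=f"{f.sem}({a.sem})")
```

      For instance, combining a transitive verb with its object and then its subject yields a State of category S whose phon and sem were built up purely by the derivational steps, never read off a derived tree.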