Faculty of Language: Derived objects and derivation trees

Monday, February 22, 2016

Derived objects and derivation trees

I am pretty slow to understand things that are new to me. So I have a hard time rethinking things in new ways, being very comfortable with what I know, which is all a longwinded way of saying that when I started reading Stabler’s take on Minimalism (and Greg Kobele’s and Thomas Graff’s and Tim Hunter’s) that emphasized the importance of derivation trees (as opposed to derived objects) as the key syntactic “objects” I confess that I did not see what was/is at stake. I am still not sure that I do. However, thinking about this stuff led me to some thoughts on the matter and I thought that I would put them up here so that those better versed on these issues (and also more mentally flexible) could help set me (and maybe a few others) straight.

The way I understand things, derivation trees are ways of representing the derivational history of a sound/meaning pairing. A derivation applies some rules in some sequence and interpreting each step allows for a phonological and a semantic “yield,” (a phonetic form and a meaning correlate of these sequence of steps). Some derivations are licit, some not. This divides the class of derivation trees into those groups that are G-ok and those that are not. However, and this seems to be an important point, all of this is doable without mentioning derived objects (i.e. classical phrase markers (PM)). Why? Because PMs are redundant. Derivation trees implicitly code all the information that a phrase marker does as the latter are just the products that applying rules in a particular sequence yield. Or, for every derivation tree it is possible to derive a corresponding PM.

Putting this another way: the mapping to sound and meaning (which every syntactic story must provide) is not a mapping from phrase markers to interpreted phrase markers but from sequences of rules to sound and meaning pairs (i.e. <s,m>). There is no need to get to the <s,m>s by first going through phrase markers. To achieve a pairing of articulation and semantic interpretations we need not transit through abstract syntactic phrase markers. We can go there directly. Somewhat poetically (think Plato’s cave), we can think of PMs as shadows cast by the real syntactic objects, the derivational sequence (represented as derivation trees), and though pretty to look at shadows (i.e. PMs) are, well, shadowy and not significant. At the very least, they are redundant and so, given Occamite sympathies for the cleanly shaved, best avoided.

This line of argument strikes me as pretty persuasive. Were it the case that derived objects/PMs didn’t do anything, then though they might be visually useful they are not fundamental Gish objects (sort of like linguistic versions of Feynman diagrams). But, I think that the emphasis on derivation trees misses one important function of PMs within syntax and I am not sure that this role is easily or naturally accommodated by an emphasis on derivation trees alone. Let me explain.

A classical view of Gs is that they sequentially map PMs into PMs. Thus, G rules apply to PMs and deliver back PMs. The derivations are generally taken to be Markovian in that the only PM that a G must inspect to proceed to the next rule application is the last one generated. So for the G to licitly get from PM_n to PM_n+1 it need only inspect the properties of PM_n. On this conception, a PM brings information forward, in fact all (and in the best case, no more) of the information you need to know in order to take the next derivational step. On this view, then, derived objects serve to characterize the class of licit derivation trees by making clear what kinds of derivation trees are unkosher. An illustration might help.

So, think of island effects. PMs that have a certain shape prohibit expressions within them from moving to positions outside the island. So an expression E within an island is thereby frozen. How do we code islandhood? Well, some PMs are/contain islands and some do not. If E is within an island at stage PM_n then E cannot move out of that island at stage PM_n+1. Thus, the derivation tree that represents such movement is illict. From this perspective, we can think of derived objects as bringing forward information in derivational time, information that restricts the possible licit continuations of the derivation tree. Indeed, PMs do this in such a way that all the information relevant to continuing is contained in the last PM derived (i.e. it supports completely markovian derivations). This is one of the things (IMO, the most important thing) that PMs (aka derived objects) brought to the table theoretically.

So the relevant question concerning the syntactic “reality” of PMs/derived objects is whether we can recapture this role of PMs without adverting to them. And the answer should be “yes.” Why? Well derived objects just are summations of previous derivational steps. They just code prior history. But if this is what they do, then derived objects are, as described above, redundant, and so, in principle eliminable. In other words, we can derive all the right <s,m>s by considering the class of licit derivation trees and we can identify these without peaking at the derived objects that correspond to them.[1]

This line of argument, however, is reminiscent of another one that Stabler (here) critically discusses. He notes that certain kinds of context free grammars (MCFGs) can mimic movement with the effect that all the same <s,m>s that a movement based theory derives such a CFG can also derive and in effectively the same way. However, he notes that these CFGs are far less compact than the analogous transformational grammars and that this can make an important difference cognitively. Here’s the abstract:

Minimalist grammars (MGs) and multiple context free grammars (MCFGs)

are weakly equivalent in the sense that they define the same languages, a

large mildly context sensitive class that properly includes context free languages.

But in addition, for each MG, there is an MCFG which is strongly

equivalent in the sense that it defines the same language with isomorphic

derivations. However, the structure building rules of MGs but not MCFGs

are defined in a way that generalizes across categories. Consequently,MGs

can be exponentially more succinct than their MCFG equivalents, and this

difference shows in parsing models too. An incremental, top-down beam

parser forMGs is defined here, sound and complete for allMGs, and hence

also capable of parsing all MCFG languages. But since the parser represents

its grammar transparently, the relative succinctness of MGs is again

evident. And although the determinants of MG structure are narrowly and

discretely defined, probabilistic influences from a much broader domain

can influence even the earliest analytic steps, allowing frequency and context

effects to come early and from almost anywhere, as expected in incremental

models.

Can we apply the same logic to the discussion above? Well maybe. Even if the derivation trees contain all the same information that a theory with PMs does, does it make it available in the same way that PMs do? Or, if we all agree that certain information must be “carried forward” (Kobele’s elegant term) derivationally, might it make a cognitive difference how this information is carried forward; implicitly in a derivation tree or explicitly in a PM? Well, here is one place to look: One thing that PMs allow is for derivation to be markovian. Is this a cognitively important feature, analogous to being compact? I can imagine that it might be. I can imagine that Gs being markovian has nice cognitive properties. Of course, this might be false. I just don’t know. At any rate, I have no problem believing that how information is carried forward can make a big computational difference. Consider an analogy.

Think of adding a long column of multi-digit numbers. One useful operation is the “carry” procedure whereby only the last digit of a column is registered and the rest is carried forward to the next column. But is “carrying” really necessary? Is adding all but the last digit to the next column necessary? Surely not, for the numbers carried can be recovered at every column by simply adding everything up again from earlier ones. Nonetheless, I suspect that re-adding again and again has a computational cost that carrying does not. It just makes things easier. Ditto with PMs. Even if the information is recoverable in derivation trees, PMs make accessing this information easy.

Let me go further. MPs of the kind that Stabler and his students have developed don’t really have much to say about how the kinds of G restrictions on derivations are to be retrieved in derivation trees without explicit mention of the information coded in the structural properties PMs exhibit. The only real case I’ve seen discussed in depth is minimality, and Stabler-like MGs (minimalist grammars) deal with minimality effects by effectively supposing that they never arise (it is never the case in Stabler MGs that a licit derivation allows two objects to have the same accessible checkable features). This rendering of minimality works well enough in the usual cases so that Stabler MG formalizations are good proxies for minimalist theories “in the wild.” However, not only does this formalization not conform to the intuition that most syntactitians have about what the minimality condition is, it is furthermore easy to imagine that this is not the right way to formalize minimality effects for there may well be many derivations where more than one expression carries the requisite features in an accessible way (in fact, I’ve heard formalists discussing just this point many times (think multiple interrogation or multiple foci or topics or even case assignment). This is all to say, that the one case where a reasonably general G-condition does get discussed in the MP literature leaves it unclear how MP should/does treat other conditions that do not seem susceptible to the same coding (?) trick. Or, minimality environments are just one of the conditions that PMs make salient. It would be nice to see how other restrictions that we explain by thinking of derivations as markovian mappings from PMs to PMs is handled with derived objects.[2] Take structure dependence or, Island/ECP effects or the A-over-A condition for example. We know what needs doing: we need to say that some kinds of G relations are illicit between some positions in a derivation tree so that some extensions of the derivation tree are G-illicit. Is there a nice compact description of what these conditions are that make no mention, however inadvertently, of PMs?

That’s it. I have been completely convinced that derivation trees are indispensible. I am convinced that derived objects (aka PMs) are completely recoverable from derivation trees. I am even convinced that one need not transit through PMs to get to the right <s,m> pairs (in fact, I think that thinking of the mapping via PMs that are “handed over” to the interpretive components is a bad way of thinking of what Gs do). But this does not yet imply, I don’t think, that PMs are not important G like objects. At the very least they describe the kinds of information that we need to use to specify the class of licit derivation trees. Thus, we need an account of how information is brought forward in derivational time in derivation trees and, more importantly, what is not. Derived objects seem very useful in coding the conditions on G-licit derivation tree continuations. And as these are the very heart of modern GG theory (I would say the pride of what GG has discovered) we want to know how these are coded with PMs.

Let me end with one more historical point. Syntactic Structures argues that transformations are necessary to capture evident generalizations in the data. The argument for affix hopping and Aux movement was not that a PSG couldn’t code the facts, but that it did so in such a completely ugly, ad hoc and uninformative way. This was the original example for the utility of compact representations. PMs proved useful in similar ways: inspecting their properties allowed for certain kinds of “nice looking” derivations. The structure of a given PM constrained what next derivational was possible. That’s what PMs did well (in addition to feeding <s,m>s). Say we agree that PMs are not required (or really add much) in understanding the mapping between sounds and meaning (i.e. in deriving <s,m> pairs) what of the more interesting use to which OMs were made (i.e. stating restrictions on derivations). Is this second function as easily, insightfully discharged without PMs? I’d love to know.

[1] It is perhaps noteworthy that there is not a clear match between grammaticality and semantic interpretability. Thus, there are many unacceptable sentences that are easlity interpreted and, in fact, have only one possible interpretation (e.g. The child seems sleeping, or Which man did you say that left). This, IMO, is an important fact. We odn’t want our theories of linguistic meaning to go off the rails if a sentence is ungrammatical for that would (seem to) imply that it has no licit meaning. Now there are likely ways to get around this, but I find nothing wrong with the idea that an expression can be semantically well formed even if syntactically illicit and I would like a theory to allow this. Thus, we don’t want a theory to not yield a licit <s,m> just because there is no licit derivation. Of course, how to do this, is an open question.

[2] See here and here for another possible example of how derivation trees handle significant linguistic details that are easy to “see” if one adopts derived objects. The gist is that bounding “makes sense” in a theory where PMs are mapped into PMs in a computationally reasonable way. It is less clear why sticking traces into derived objects (if understood as yields (i.e. ss or ms in <s,m> pairs) makes any sense at all given their interpretive vacuity.

24 comments:

Greg KobeleFebruary 22, 2016 at 8:59 AM
Marcus Kracht's work on compositionality addresses your
> idea that an expression can be semantically well formed even if syntactically illicit
The idea is that there are independent domains (phonological, categorial, semantic), each with their own operations. What we think of as the grammatical operations are really pairings (or triplings) of operations from each of those domains. One application of *merge* to an expression e = is the simultaneous application of the sound part to ph, the category part to c, and the meaning part to m. It is therefore no problem to trace the 'meaning' part of *merge* through an otherwise illegitimate series of derivational steps.

I'm not sure that this is the right way to think about semantic well- despite syntactic ill-formedness. I think it is useful to distinguish three cases. 1) an underivable sound-meaning pair. 2) a derivable but otherwise ill-formed s-m pair (perhaps because there were three subjacency violations). 3) an s which is not part of any derivable s-m pair but which is a few cognitive `repairs' away from a derivable s'-m' pair. I am happy saying that cases like 3 abound; I have no problem understanding the speech of late second language acquirers of English, or of very early first language acquirers. A commonly accepted instance of case 2 comes from processing difficulties like center embeddings (but I used the example I did because I think that it is natural and faithful to the original proposal to view subjacency this way). It is attractive to me to think that all cases of something being "semantically well-formed despite being syntactically ill-formed" is an instance of either 2 or 3.
ReplyDelete
Replies
Alex DrummondFebruary 22, 2016 at 10:23 AM
> The gist is that bounding “makes sense” in a theory where PMs are mapped into PMs in a computationally reasonable way. It is less clear why sticking traces into derived objects (if understood as yields (i.e. ss or ms in pairs) makes any sense at all given their interpretive vacuity.

What I got out of the previous discussion on that point is that it’s possible to imagine certain kinds of finite state transducer that might need to insert traces. So thinking in string terms, imagine that each output token is determined by the most recent k input tokens together with the most recent k output tokens. This kind of transducer couldn’t put anything into “storage” for an indefinitely long time, and so would have to regularly copy dummy symbols into the output instead.
ReplyDelete
Replies
OmerFebruary 22, 2016 at 11:23 AM
"[...] Stabler-like MGs (minimalist grammars) deal with minimality effects by effectively supposing that they never arise (it is never the case in Stabler MGs that a licit derivation allows two objects to have the same accessible checkable features)."

I am far too ignorant of the details of Stablerian MGs to know if this statement is true — but if it is, then I don't see how these grammars can be understood as models of natural language.

Consider agreement intervention in Basque. It is easy to show that ABS nominals in Basque do not require licensing by a functional head. Additionally, non-clitic-doubled DAT nominals intervene in agreement relations targeting the ABS. And, crucially, neither the ABS nor the DAT have any "accessible checkable features" in that configuration, in that neither requires licensing-by-checking by the time the phi probe enters the derivation.

The only way I can make sense of the above statement, then, is if we assume that ABS nominals have a feature that requires checking-by-agreement in all derivations except in those licit derivations where a DAT intervenes (and there are such derivations; you get 3sg agreement even with a non-3sg ABS argument). But at this point, haven't you just introduced a feature that amounts to [+/- minimality is in play]? Couldn't you just as easily add a [+ not in an island] feature to every wh-phrase that happens not to be in an island? I guess what I'm saying is that if that's the treatment of minimality, then you've pretty much admitted minimality is real and your system can't handle it.

But like I said above, it is entirely (entirely!) possible that I have not fully understood what's at stake.
ReplyDelete
Replies
UnknownFebruary 22, 2016 at 2:11 PM
It seems to me that this post talks about several issues at once without clearly distinguishing them.

1) Can we generate the same sound-meaning mappings without using PMs?
2) Are PMs Markovian in a sense that derivation trees are not?
3) Do PMs allow for more succinct descriptions of certain well-formedness conditions compared to derivation trees?
4) Are there structural conditions that are natural over PMs but unnatural over derivation trees? If so, are any of these conditions similar to what we observe in natural languages?

Let's take them one by one:

1) Expressivity: Yes, as you point out yourself.

2) Markovian: Depends. If I read correctly between the lines, your worry is that we can easily put conditions on derivation trees that one might consider instances of look-ahead, whereas this is less natural from a PM perspective. But the devil is in the details here. I can take look-ahead constraints and compile them into the grammar so that they are enforced without look-ahead, so PMs do not protect you from look-ahead. At the same time, it is very simple to restrict the constrains on derivation trees to make them Markovian in your sense. In fact, this is already the case for standard MGs.

3) Succinctness: Depends on your definition of succinctness. The situation is not at all like MGs VS MCFGs, where you have an exponential blow-up in grammar size. Over derivation trees, you can still reference every position that a phrase moves to. If you take some constraint like the Proper Binding Condition and look at its implementation over derived trees and derivation trees, the latter won't look much more complicated, though admittedly some additional complexity needs to be added to the derivational definition of c-command if you want it to be exactly identical to c-command in derived trees. But i) it is unclear that you need perfect identity, and ii) the difference in description length is still marginal.

4) Naturalness: Depends on your definition of naturalness. Any effect brought about by movement could be considered unnatural over derivation trees because instead of the moving phrase you only have a Move node in the derivation. But this is only unnatural if you ignore the fact that Move nodes establish very close links to their movers --- links that syntacticians often think of as multi-dominance branches.

The last point is also why I'm pretty puzzled whenever syntacticians are reticent to switch to derivation trees. The switch is very painless, it's pretty much the same as working with multi-dominance trees (yes, we usually drop the movement branches because they do not add any information for MGs, but that doesn't matter for the things syntacticians care about in their work). Yet despite being a superficially minor switch that is easy to adapt to, it opens up a multitude of new perspectives, which you can tell from the discussions we've had on this blog before. Recent examples include parallels between phonology and syntax, the status of successive cyclicity, hidden implications of sideward movement, and the connection between features and constraints.
ReplyDelete
Replies
Tim HunterFebruary 23, 2016 at 6:49 AM
Norbert wrote: this does not yet imply, I don’t think, that PMs are not important G like objects. At the very least they describe the kinds of information that we need to use to specify the class of licit derivation trees. Thus, we need an account of how information is brought forward in derivational time in derivation trees and, more importantly, what is not. Derived objects seem very useful in coding the conditions on G-licit derivation tree continuations. And as these are the very heart of modern GG theory (I would say the pride of what GG has discovered) we want to know how these are coded with PMs.

Unfortunately I most likely won't have much time this week to participate further in this discussion, but just briefly: As far as I can tell, no one denies that some information has to be carried forward about the current "derivational state" in order to define which transformations (e.g. merge/move steps) are applicable at any particular point. And no one denies that full PMs of the traditional sort are sufficient to carry forward this information. To my mind, the question is whether all of the information that full traditional PMs encode is necessary. Almost certainly it is not all necessary, for the same reasons that various people have intuitions about something like phases: hand-waving a bit, only a certain amount of relatively recent history is relevant.
ReplyDelete
Replies

Add comment

Faculty of Language

Comments

Monday, February 22, 2016

Derived objects and derivation trees

24 comments:

Contributors