Thursday, December 12, 2013

Simultaneous rule application: Help!!!

Lately I have been thinking of something and have gotten stuck. Very stuck. This post is a request for help. Here’s the problem. It concerns some current minimalist technology and how it relates to the bigger framework assumptions of the enterprise. Here’s what I don’t quite get: what it means to say that rules apply “all at once” at Spell Out. Let me elaborate.

A recent minimalist innovation is the proposal that congeries of operations apply simultaneously at Spell Out (SO). The idea of operations applying all at once is not in and of itself problematic, for it is easy to imagine that many rules can apply “in parallel.” However, when rules so apply, they are informationally encapsulated in the sense that the output of rule A does not condition the application of rule B. ‘Does not condition’ here means that A neither feeds nor bleeds B’s application. When rules do feed and bleed one another, then the idea that they all apply “all at once” is hard (at least for me) to understand, for if the application of B logically requires information about the output of A then how could they apply “in parallel”? But if they are not applying “in parallel” what exactly does it mean to say that the rules apply “all at once”?
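A toy sketch may make the feeding worry concrete. It is only an illustration under invented assumptions: the two rewrite rules (`RULE_A`, `RULE_B`) and the single-symbol string format are hypothetical and stand in for no actual syntactic operation. Rule A's output is exactly what rule B targets, so A feeds B when the rules apply in sequence, but not when each rule is only allowed to inspect the input.

```python
# Hypothetical rules as (target, replacement) pairs over single symbols.
# A's output ("b") is what B targets, so A feeds B under sequential application.
RULE_A = ("a", "b")
RULE_B = ("b", "c")

def apply_sequentially(symbols, rules):
    """Apply rules one after another; each rule sees the previous rule's output."""
    current = list(symbols)
    for target, replacement in rules:
        current = [replacement if s == target else s for s in current]
    return "".join(current)

def apply_simultaneously(symbols, rules):
    """Apply all rules to the original input; no rule sees another rule's output."""
    table = dict(rules)
    return "".join(table.get(s, s) for s in symbols)

print(apply_sequentially("a", [RULE_A, RULE_B]))    # c  (A feeds B)
print(apply_simultaneously("a", [RULE_A, RULE_B]))  # b  (B never sees A's output)
```

In this toy the two modes give different results for the same input, which is the sense in which a genuine feeding relation seems hard to reconcile with "all at once" application.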

One answer to this question is that I am getting entangled in a preconception, namely that my confusion is the consequence of a “derivational” mindset (DM). The DM picture treats derivations like proofs, each line licensed by some rule applying to the preceding lines.[1] The “all at once” idea rejects this picture and suggests in its place a more model theoretic idiom in which sets of constraints together vet a given object for well-formedness. An object is well formed not if it is derivable from sequentially applied rules, but if, no matter how it was constructed, it meets all the relevant constraints. This should be familiar to those with GBish or OTish educations, for GB and OT are very much “freely generate and filter” kinds of models, the filters being the relevant constraints.[2] If this is correct, then the current suggestion about simultaneous rule application at SO is more accurately understood as a proposal to dump the derivational conception of grammar characteristic of earlier minimalism in favor of a constraint based approach of the GB variety.

Note that to get something like this to work, we would need some way of constructing the objects that the constraints inspect. In GB this was the province of Phrase Structure Rules and ‘Move alpha.’ These two kinds of rules applied freely and generated the structures and dependencies that filters like Principle A/B/C, ECP, Subjacency, etc. vetted. In an MP setting, it is harder to see how this gets done, at least to me. Recall that in an MP setting, there is only one “rule” (i.e. Merge). So, I assume that it would generate the relevant structures and these would be subsequently vetted. In effect the operations of the computational system (i.e. Merge, Agree and anything else, e.g. Feature Transfer, Probing, ???) would apply freely and then the result would be vetted for adequacy. What would this consist in? Well, I assume checking the resultant structures for Minimality, Extension, Inclusiveness, etc. The problem, then, would be to translate these principles, which are easy enough to picture when thought of derivationally, into constraints on freely generated structures. I confess that I am not sure how to do this. Consider the Extension condition. How is one to state this as a well-formedness condition on derived structures rather than on the operations that determine how the structures are derived? Ditto on steroids for Derivational Economy (aka: Merge over Move) or the idea that shorter derivations trump longer ones, or determining what constitutes a chain (which are the copies that form a chain?). Are there straightforward ways of coding these as output conditions on freely generated objects? If so, what are they?

There is another subsidiary more conceptual concern. In early Minimalism output conditions (aka filters) were understood as Bare Output Conditions (BOCs). BOCs were legibility conditions that interfaces, particularly CI, imposed on linguistic products. Now, BOCs were not intended to be linguistic, though they imposed conditions on linguistic objects. This means that whatever filter one proposes needs to have a BOC kind of interpretation. This was always my problem with, for example, the Minimal Link Condition (MLC). Do we really think that chains are CI objects and that “thoughts” impose locality conditions on their interacting parts? Maybe, but, I’m dubious. I can see minimality arising naturally as a computational fact about how derivations proceed. I find it harder to see it as a reflection of how thoughts are constructed.  However, whatever one thinks of the MLC, understanding Economy or Extension or Phase Impenetrability or Inclusiveness as BOCs seems, at least to me, more challenging still.

Things get even hairier, I think, when one considers the range of operations supposed to happen “all at once.” So, for example, if the features of T are inherited from C (as currently assumed) and I-merge is conditioned by Agree, then this suggests that DPs move to Spec T conditional on C having merged with T. But any such movement must violate Extension. The idea seems to be that this is not a problem if all the indicated operations apply simultaneously. But how is this accomplished? How can I-merge be conditioned (fed) by features that are only available under operations that require that a certain structure exists (i.e. C and “TP” have E-merged) but whose existence would preclude Merging the DP (doing so would violate Extension)? One answer: screw Extension. Is this what is being suggested? If not, what?

So, I throw myself on the mercy of those who have a better grasp of the current technology. What is involved in doing operations “all at once”? Are we dumping derivations and returning to a generate-and-filter model? What do we do with apparent bleeding and feeding relations and the dependencies that exploit these notions? Which principles are we to retain, and which dispense with? Extension? Economy? Minimality? How do the rules/operations work? Sample examples of “derivations” would be nice to see. If anyone knows the answer to all or any of these questions, please let me know.

[1] A strong version of this is that only the immediately preceding line can influence what happens “next.”
[2] OT’s filters are ranked, whereas GB filters were not.  However, I don’t believe that this difference makes a difference for my problem.


  1. "... checking the resultant structures for Minimality, Extension, Inclusiveness, etc. The problem, then, would be to translate these principles, which are easy enough to picture when thought of derivationally, into constraints on freely generated structures. I confess that I am not sure how to do this. Consider the Extension condition. How is one to state this as a well-formedness condition on derived structures rather than on the operations that determine how the structures are derived? Ditto on steroids for Derivational Economy (aka: Merge over Move) ..."

    I think there are rarely in-principle problems with re-encoding derivational constraints as representational constraints. To implement the Extension condition, for example, I think all you need to say is that moved phrases must c-command their traces. (This is just the reverse of the old point about "explaining c-command via extension".) And generally, I think you could check satisfaction of Merge-over-Move by reasoning backwards and asking: "Given that the top of this chain is sitting in position X [i.e. given that something moved into position X, in derivational terms], is there anything with a base position higher in the structure than X [i.e. is there anything that was first-merged higher in the structure] of the sort that could have gone into position X?" If so, that's a violation of Merge-over-Move. Some refinements necessary if you have subarrays and so on.
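    The c-command restatement of Extension can be sketched as a check over a finished tree. This is a minimal illustration with assumptions of my own: the `Node` class, binary-branching trees, c-command computed from a node's mother, and chains given as explicit (moved phrase, trace) pairs are all hypothetical choices, not part of the proposal above.

```python
class Node:
    """A tree node that records its mother so dominance can be read off."""
    def __init__(self, label, children=()):
        self.label = label
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self

def dominates(a, b):
    """True if a properly dominates b."""
    node = b.parent
    while node is not None:
        if node is a:
            return True
        node = node.parent
    return False

def c_commands(a, b):
    """a c-commands b if neither dominates the other and a's mother dominates b."""
    if a is b or dominates(a, b) or dominates(b, a):
        return False
    return a.parent is not None and dominates(a.parent, b)

def satisfies_extension(chains):
    """Representational stand-in for Extension: each moved phrase c-commands its trace."""
    return all(c_commands(head, trace) for head, trace in chains)

# Raising to Spec,TP: [TP DP [T' T [vP t v']]]
trace = Node("t")
dp = Node("DP")
tp = Node("TP", [dp, Node("T'", [Node("T"), Node("vP", [trace, Node("v'")])])])
print(satisfies_extension([(dp, trace)]))          # True

# Hypothetical configuration where the moved phrase sits inside a sister constituent:
# [XP [YP DP Y] [ZP t Z]]
low_trace = Node("t")
low_dp = Node("DP")
xp = Node("XP", [Node("YP", [low_dp, Node("Y")]), Node("ZP", [low_trace, Node("Z")])])
print(satisfies_extension([(low_dp, low_trace)]))  # False
```

    Nothing in the check refers to the order in which the tree was built; it only inspects the output.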

    And even if standard derived representations don't specify all the information you would need to "go back in time" and see how the representation was constructed, you can always enrich the representations to include all the historical information you need. In the limit, if you do this and then also eliminate all the information you *don't* need, then you end up with something that would be sensibly called a derivation tree (distinct from a derived tree). From one point of view, including traces in our derived trees is just a way of "looking back in time" in this sense. One order-of-operations issue that I think is not encoded in what are usually thought of as derived representations is the distinction between cyclic adjunction and counter-cyclic adjunction, which makes all the difference in the famous Freidin/Lebeaux antireconstruction effects: presumably there are two derivations there that "build the same structure", only one of which is "valid". But you can always just say that counter-cyclic adjunction puts a little mark somewhere in the output representation, and bingo, everything can be stated representationally again.

    So the bigger question for me is why we might be interested to bother trying to translate between derivational and representational statements of these constraints ... what is the empirical motivation one way or the other?

    1. Nice points. Thx. I think that the only semi substantive question relates to how one wants to think of output conditions. Are they BOCs or not and if they are what properties of the interfaces do they respond to. To me, notions like Extension, Merge over Move etc make sense as derivational conditions. Less so as representational ones and even less so as BOCs. But that might be me. One last point: there may be an empirical issue re Extension and C-command: if there is sidewards movement then it's not clear that every part of the chain need c-command every other part but there may be a derivation where extension is always respected. But this is pretty abstruse stuff.

    2. The distinction between representational and derivational theories has always struck me as muddled in the syntactic literature. Tim's reply touches on two issues: the split between operational and constraint-based approaches, and the availability of new data structures --- derivations rather than representations. Those two aspects are completely independent. Derivations can be defined via constraints and do not need to be built, and representational theories can be operationalized. So from that perspective there isn't much of an issue here at all, as Tim points out.

      However, if you look at the discussions in Epstein et al. (1998), Epstein & Seely (2002) vs Brody (2002), there's yet another layer on top of this split, which is what kind of explanations we want to use. From that perspective, it does not matter that we can switch from derivational to representational and back, because there's still a fixed class of "derivational explanations" and "representational explanations". I never quite figured out what that is meant to convey, and my hunch is that this is exactly what you worry about when you say "[...] notions like Extension, Merge over Move etc make sense as derivational conditions. Less so as representational ones and even less so as BOCs."

      I'm really not sure what to make of this view. The implicit assumption seems to be that the coding of a theory is an integral part for determining its explanatory value. To me, that would only be the case if you could show that one format is a lot more succinct than the other. But whether Merge over Move is enforced in syntax or at the interfaces does not necessarily have this difference in description length (non-trivial issue on a technical level). So in what sense is one version less explanatory than the other?

    3. A handout from my class "Syntactic Models" might be helpful here. It is as close as I have been able to get to a contentful discussion of derivations vs. representations and why we should care. I think the one consideration that could make the discussion meaningful (otherwise it's not) is the existence or non-existence of opacity phenomena -- the same question that has been raised in the debates over OT vs. rules (and derivational flavors of OT). Here's my class handout, so you can see what I mean:

      I imagine the same considerations apply to Chomsky's phrase-level representationalism vs. phase-to-phase derivationalism (cf. Kiparsky's Cyclic OT).

    4. Thanks for the handout, it gives a good overview of the arguments in the literature, but I don't think it addresses the basic point I'm trying to figure out.

      The comparisons in the handout aren't just between "filters vs operations" but rather "operations with derivations VS filters over phrase structure trees". None of the phenomena discussed in the handout would be a problem for a constraint-based theory that operates over derivation trees rather than phrase structure trees. Yet if I understand the Epstein et al agenda correctly, that's still not the kind of explanation we want. I never quite figured out why that should be the case; my hope is that if the quote in my previous post were explored in greater detail, it might finally click in my head.

    5. Actually, I don't want to speak for them, but I believe that Epstein et al. assumed that the derivations were over phrase structure representations, not derivation trees. I actually think that Chomsky has been thinking of this in the same way, at least since he stopped talking about T-markers. Maybe the right answer then is that we should stop thinking in terms of derivations over PS trees and shift to thinking about things in terms of derivation trees. If the problem then goes away, good.

      However, does that mean that you see the issue as possibly meaty if one holds that the manipulated objects are phrase markers?

    6. Thomas: Norbert's right, as far as I know. (And the stuff in my handout actually isn't in the literature -- I mean, the data is, but not the conclusions drawn from it -- which is why I made the handout!) An interesting informal attempt to do something like derivation trees in a post-LSLT world actually appears on the first page of Lakoff's famous paper "On Generative Semantics", by the way. Easy to update in a Merge mode.

    7. It's been a while since I've read Epstein et al., but don't they argue for a tree-based perspective of derivations in the last chapter in order to fix certain problems with the Merge-based definition of c-command? At any rate the issue isn't so much how you think about derivations, but what property makes a theory derivational or representational.

      On a purely technical level Norbert's worries aren't troubling at all. Under very specific assumptions there are constraints that can only be stated over derivations, but those cases aren't very interesting from a linguistic perspective. In general, you can mark-up your basic data structure (derivations, phrase structure trees, feature matrices, etc) with additional information to do what you want irrespective of whether your theory is derivational or representational. But that's just a technical result; there seems to be some kind of invisible boundary beyond which the coding tricks are taken to destroy the derivational/representational spirit of the original account. I'm trying to figure out what this boundary is, and if it restricts the recoding in a meaningful way.

      For example, it is perfectly fine for a representational theory to use indexed traces. However, adding a second index to model effects of derivational timing would probably be viewed as making the theory derivational, even though nothing commits you to a derivational interpretation of this second class of indices. From a technical perspective, they're just indices whose distribution is regulated by certain representational constraints.

      Or take MGs. The way the formalism works, every derivation corresponds to exactly one phrase structure tree, so it suffices to assign derivation trees to sentences rather than phrase structure trees. The set of well-formed derivation trees can be characterized by four representational constraints. So we can view MGs as a representational theory that assigns somewhat abstract trees to sentences. After all, nothing commits us to a derivational interpretation of derivation trees, they are just trees. So what are MGs? Derivational? Representational? Both?

    8. First, Tim and Thomas are right that as a purely technical issue it is easy enough to trade off between derivational and representational idioms. Moreover, as Thomas observes this is just a matter of enriching the representations so as to code for derivational history (and even timing). All this is not only correct, but has been recognized as such, well, since trace theory was introduced and people started considering the question (note: prior to trace theory there could not be adequate representational theories). If this is right, the conclusion should be that either this is NOT the point at issue, or there is no point to be at issue once we clarify away the conceptual fog. So which is it?

      Well, it depends on how seriously one takes notions like Bare Output Condition, Inclusiveness, Economy etc. Inclusiveness, for example, would seem to warn against the kinds of coding tricks that Thomas recommends as palliatives. How so? Well, the idea is that the kinds of indices suggested are not in any reasonable way understood as lexical properties of expressions and as derivations cannot alter lexical info, these are not legit. There are plenty of things to worry about with this idea: it's not that crisp, why can't we add this info etc. But, if there is an issue here, that's where it is.

      Ditto with Bare Output Conditions. The original Chomsky idea is that what we thought of as filters in earlier GB stories should be understood as conditions that the interfaces, particularly CI, impose for legibility. Say this is so, then we can ask for any given "filter" whether this seems like a reasonable feature of CI representations. This again is not a crisp question, but it is not a senseless one, at least to me. So that's the venue for my worry. Even I can provide technical solutions. What I want to know is how these fit into the wider conceptions that Chomsky urged at the start of MP. In other words, if this is the answer, what do we make of the original motivating conceptions? Do we forget about reducing filters to BOCs? Do we allow whatever representational enrichments we want? Do we forget about opacity effects and treat them as mere technical annoyances?

      Of course, one need not be concerned about these issues: indeed maybe NO ONE should be. But, right now I am and to be told that technically there is no problem here, keep on moving nothing to see, does less to soothe my worries than perhaps it should. Or to put this another way: it's our business to decide what kinds of information our representations carry and maybe we cannot decide which way to go at any given time but reducing the worries to technical ones is not where I want to go right now.

    9. Norbert, I know where you're coming from, I'm just curious if it would be possible to put that line of reasoning on some principled foundation. If we consider the whole space of grammars and simply split representational and derivational according to whether the grammar uses rules or constraints, then there is no issue here at all. However, if there is more to this divide, i.e. we can add further criteria to what makes a theory derivational or representational, then we might deal with much smaller classes and it might no longer be possible to translate between them in every case. And then we just have to check where Minimalism falls in this landscape before Chomsky04 and after it in order to see whether your worries can be soothed. I guess we could also approach it from the other direction: how distinct do representational and derivational theories have to be in order for your worries to be unsoothable? Both questions strike me as very interesting on a technical level, but they cannot be tackled without some precise criteria to distinguish derivational from representational theories.

    10. I agree that we need further criteria. What struck me as intriguing about early Minimalist ideas was that they seemed to offer conditions on what should count as a legit addition to a representation. Inclusiveness, for example, seemed to suggest treating indices with kid gloves (I actually have not been impressed with this as indices seem needed to get something like the Tarski trick for quantification off the ground and so a reasonable BOC) and this would be especially true of indices that tracked derivational timing issues. This, in my view, is what makes derivational economy arguments interesting as if these exist and Inclusiveness frowns on the coding trick you mentioned then we have an argument for a derivational approach here. Ditto with what one takes to go to CI. If FI requires a "clean" representation offloaded to CI and if not all copies are interpretable (e.g. those in intermediate CPs seem plausible candidates, though this is hardly logically required with enough lambdas) then CI representations do not have the wherewithal to sustain bounding effects and so these cannot be BOCs and so are derivational. This, at any rate, is how I've been thinking of this.

      Now add to this Chomsky's proposals about "all at once" rule applications and things get confusing, to me. First, it's pretty clear that HE does not want to treat this using filters (I suspect because he still likes to think about this as BOCs). So what does he have in mind? Charles makes some suggestions but they seem hairy to me. However, then maybe he SHOULD be thinking of this in filtering terms. In which case, what does it mean for the earlier dicta? Those are my concerns. I sense that you find notions like Inclusiveness, Extension, Economy too fluffy to be useful. Is this right? Wouldn't it be worth trying to see if the intuitions behind these can be made more precise?

    11. I agree with Norbert that things like Inclusiveness and Bare Output Conditions are probably the best places to look for answers to this question. But in a way, insisting on Inclusiveness or insisting that all of the "weird stuff" is enforced by Bare Output Conditions seems to be almost just a roundabout way of insisting on a derivational theory. I suppose the closest thing to an empirical argument for these would be along the lines of "Darwin's problem": very roughly, we take the funny-looking properties of language to result from an interaction between (a) weird non-linguistic stuff that was already hanging around and (b) some clean, less weird, "single-mutation" addition; and the classical 1995 idea about BOCs keeps (a) and (b) separate by having the former embodied representationally and the latter embodied derivationally. If we accept the relevant assumptions, this seems more or less reasonable, as far as it goes, but I'm not sure how far it really does go. We have some pretty hairy questions to answer, such as (1) whether it's important that our theory provides some distinction to point to that separates the two parts, (2) whether the derivational/representation distinction is the right or only way to draw that distinction, (3) whether we could use this same distinction the other way around (e.g. embody the supposed pre-linguistic stuff derivationally and embody the "single-mutation" stuff representationally?), etc.

    12. YES!!! I think that Tim has, once again, hit the nail on the head. There are two separable points: given some of the standard minimalist assumptions/principles (central dogmas in Crick's sense) what is one to make of notions like doing things "all at once." And there is a second question which is IF this requires going representational, is this a bad thing. Indeed, one can ask Tim's questions more pointedly, as he did: is the way Chomsky cut the problem up re Darwin's problem the right way to do it. This is a very good question to think on, though I doubt we will get persuasive answers in the near term. Big deal. Don't raise them, never answer them.

      Re (3): I suspect not, at least if one thinks that recursion (aka unbounded Merge) is the big addition. I think that Chomsky's emphasis on this as the central big addition pushes him to think that derivations are where the action is and that filters on this basic addition are best seen as interface conditions, aka BOCs. The question then breaks down into what merges are good ones. The answer comes in two parts: those that are simple and "efficient" (in some sense to be made precise) and useful/useable by other cognitive systems (BOCs). The division makes sense, at least in an inchoate sort of way. The question Tim raises is whether there are other ways to make sense of things given a Darwin's Problem perspective. Maybe. It would be nice to sketch one of these out.

  2. I think this is doable but very hairy (as one might expect).

    I’m thinking about the literature, and computational work, on parallelizing derivational rules of phonology, with the use of finite state transducers. You take a pair of strings (of, say, segments), which basically corresponds to the underlying and surface representations, and you feed it through all the FSTs together, moving from left to right one *pair* of letters at a time. Each FST roughly corresponds to a phonological rule. For instance, suppose you have a rule: a->b/_c. You write the FST such that if it sees the pair (a:b), i.e., an “a” in the UR and a “b” in the SR, you move to a state in the FST such that the next pair you see must be (c:c); otherwise the FST crashes and the string pair is rejected. So ac:bc goes through but not ad:bd.
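    The a->b/_c machine just described can be sketched in a few lines. This is a simplified illustration, not any existing toolkit's format: the function names are hypothetical, the UR and SR are assumed to have equal length (no insertion or deletion), and only the context restriction described above is modeled, not whether the rule is obligatory.

```python
def rule_a_to_b_before_c(ur, sr):
    """Accept the pair (ur, sr) iff every a:b pair is immediately followed by c:c."""
    if len(ur) != len(sr):
        return False  # assumption: no insertion or deletion in this toy
    state = "start"
    for pair in zip(ur, sr):
        if state == "need_c":
            if pair != ("c", "c"):
                return False  # the machine "crashes": context not met
            state = "start"
        elif pair == ("a", "b"):
            state = "need_c"
        elif pair[0] != pair[1]:
            return False  # simplification: this machine licenses no other changes
    return state == "start"  # a trailing a:b with nothing after it is rejected

def accepted_by_all(machines, ur, sr):
    """Parallel checking: the pair survives only if every machine accepts it."""
    return all(machine(ur, sr) for machine in machines)

print(accepted_by_all([rule_a_to_b_before_c], "ac", "bc"))  # True
print(accepted_by_all([rule_a_to_b_before_c], "ad", "bd"))  # False
```

    With several rules, each machine would also have to tolerate the changes licensed by the others, which is where the hand-worked rule interactions come in.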

    In practice, it’s very hard to write these FSTs because any derivational interaction among rules will have to be worked out, by hand, to translate into the parallel executions of FSTs. It can be a lot of fun implementing a relatively simple set of data but I think any large scale coverage of the morpho-phonology of a language will be very difficult. It is theoretically possible to take a sequence of derivational rules and compile it into parallel FSTs. The Xerox PARC folks have a tool like that, but the resulting FSTs are very large and not very interpretable: you lose the transparency (and fun) of seeing how the FSTs scan through a pair of strings one character at a time, where you can see which FST is messing things up.

    For morphophonological analysis, the pairs of UR and SR are generated by, essentially, enumerating all pairs of possible character combinations and then checking them by some type of search heuristic (e.g., breadth- or depth-first search). As Barton, Berwick and Ristad point out in their classic study, this could lead to an exponential blowup, especially when dealing with long distance correspondence such as harmony, but in practice, as discussed in the 1980s, I don’t think the problem is that severe.

    To do this for syntax, we need to do a few obvious things. First, pairs of strings should be pairs of (sub)trees, as one might go from top to bottom traversing the structure. Second, the generation problem would have to be solved: complexity is an issue (the alphabet of phonology is usually small) but thankfully, the size of the trees would be bounded by the phase. Third, one would have to work out the interactions among the constraints, formulated derivationally, and translate them into parallel checks. Maybe the syntax based MT folks have worked out something along these lines?

    In Sandiway Fong’s implementation of a GB parser, in theory a parallel system, one (obviously) does not do wild generate and test. A Generalized Left to Right parser is used to (over)generate quite a lot of possible trees, which are then fed through the constraints (Case Filter, Theta, etc.). The application of the constraints is not parallel either but ordered to improve efficiency (no point in checking Principle A on a structure that already violates Case). I think Sandiway has been working on Minimalist parsers and he’s probably best placed to talk about this.
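    The point about ordering the constraints can be illustrated with a small generate-and-filter sketch. It does not reflect how Sandiway's parser is actually implemented: the candidate format (a dict listing the filters a tree violates), `make_filter`, and the log are all hypothetical devices for showing that an ordered, short-circuiting check skips later filters once an earlier one fails.

```python
def make_filter(name, log):
    """Build a filter that records each time it is consulted (hypothetical helper)."""
    def check(candidate):
        log.append(name)
        return name not in candidate["violations"]
    return check

def surviving(candidates, filters):
    """Keep candidates passing every filter; all() stops at the first failure."""
    return [c for c in candidates if all(f(c) for f in filters)]

log = []
filters = [make_filter("case", log), make_filter("principle_a", log)]
candidates = [
    {"id": "tree1", "violations": {"case"}},  # fails Case, so Principle A is never checked
    {"id": "tree2", "violations": set()},     # passes both filters
]

print([c["id"] for c in surviving(candidates, filters)])  # ['tree2']
print(log)  # ['case', 'case', 'principle_a']
```

    Reordering the filters would not change which trees survive in this toy, only how many checks are run.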

    1. Hmm Sandiway's parser is maybe not so different from LFG ... I think some of Lakoff's cognitive grammar writings are also relevant, it actually isn't too hard to implement simple Lakoff-type phonologies in Prolog that apply rules in parallel without compiling them into a single FST. Of course you need strata to get anywhere with fun cases such as Tiberian Hebrew.

  3. I am thinking out loud here.

    "So, for example, If the features of T are inherited from C (as currently assumed) and I-merge is conditioned by Agree, then this suggests that DPs move to Spec T conditional on C having merged with T. But any such movement must violate Extension. The idea seems to be that this is not a problem if all the indicated operations apply simultaneously."

    Simultaneous application can't be if there is a condition governing when operation X is licit that is conditioned on the output of operation Y. The only way around this is to assume that the output of Y is actually recoverable somewhere in the input to X. Here the first round of trouble is that there is an operation Y that merges C and an operation X that moves DP. X must be fed by Y, but instead you need to say that X is actually triggered by the input to Y, I guess, a C in the workspace which must of necessity merge with TP given whatever else is present in the computation before all this simultaneous application happens. Does that sound right? (I know it sounds crazy, but that's not the question.)

    So now the next question is whether this violates extension. Suppose the other thing that was in the input to the computation was TP, and extension says, extend the maximal dominating XP. If extension is interpreted as a simultaneous-application type condition (i.e. only the input counts) then no, extension is not violated. The output, yes, has a further dominating XP (namely CP), but the output is not visible to the SA-extension condition. That has got to be the idea.

    1. Almost there: this assumes (1) that being in the workspace is sufficient to trigger merge with T. But then we must give up the idea that feature passing and/or probing is under sisterhood. In other words, we must rethink the Probe-Goal architecture of the operations (I think that this is wise anyhow, but just saying). (2) Do we need something like Extended projections? Is this what the last paragraph is suggesting?

    2. (1) I figured something would go wrong like this, but I'm not enough of a syntactician to know if we're on the same track - sounds like a detail

      (2) I don't think so. My search for extended projection turned up a lot of references but no clear explanation that would link it to this case so you'll have to fill that in for me too. But what I was suggesting is that the clouds parted for me as I thought this through. That is, I believe that, indeed, with no mechanisms beyond the usual ones (except maybe whatever (1) demands), simultaneous application solves the violation-of-extension problem.

      Let me try and rephrase it, this will help me sort it out. Then maybe you can tell me what the answer to (2) is, or maybe you can help pin down where there's still daylight between what I'm saying and sense. The key is that the Extension Condition be evaluated in a "simultaneous application" way. Forget the term "simultaneous application." All that means is, "doesn't see the output of other operations." EC doesn't see the output of other operations. What does "other" mean? Other than the one it is constraining. In this case, EC is constraining Move-DP. But it doesn't see the output of Merge-C. So as far as it's concerned, the root is TP, not CP. That's how SA solves the EC puzzle.

      So now you tell me if that's solving the wrong problem, solving the right problem but not solving it, or presupposing extended projections.

    3. Actually, I can make it still stronger, now that I think about it. That should make things clearer. Moving to Spec,TP is the ONLY way to satisfy the EC if Move-DP and Merge-C apply simultaneously. There, that should reinforce the fact that yes, Virginia, there is indeed still a non thrown-out EC. One could not move to Spec,CP (doesn't exist) and one couldn't move lower (would violate EC, assuming the TP is in the input).
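    The "input only" reading of the Extension Condition in this sub-thread can be sketched as follows. The encoding is an assumption made purely for illustration: structures are nested tuples whose first element is the root label, and `extension_ok` is a hypothetical check that a movement target is the root of whatever structure the condition is allowed to see.

```python
def root_label(structure):
    """Structures are nested tuples; the first element is the root's label."""
    return structure[0]

def extension_ok(target_label, visible_structure):
    """EC sketch: movement must target the root of the structure the check can see."""
    return target_label == root_label(visible_structure)

# Input to the simultaneous step: a TP, with C still unmerged in the workspace.
input_tp = ("TP", "T", ("vP", "DP", "v'"))
# What the structure looks like once Merge-C has applied.
output_cp = ("CP", "C", input_tp)

# Simultaneous application: EC on Move-DP sees only the input, whose root is TP.
print(extension_ok("TP", input_tp))   # True
# Sequential application: EC sees the output of Merge-C, whose root is CP.
print(extension_ok("TP", output_cp))  # False
# Moving lower than the root of the input fails even on the input-only reading.
print(extension_ok("vP", input_tp))   # False
```

    On this encoding, Spec,TP is the only target that passes when the check is restricted to the input, matching the claim in the last comment.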
