Thursday, February 4, 2016

David Adger; Baggett Lecture 3

Well, he did it! Three excellent and provocative talks on syntactic theory. Here is the third set of slides. In this talk, David went after sidewards movement (SWM), a favorite idea of mine and, though you might not believe this, I sat quite demurely through the whole talk and basically agreed with much of what David had to say. I, not surprisingly, did not buy the conclusion, but I did buy the way that he set up the problem and the way that he approached a solution given his judgments about the empirical viability of SWM. How so?

David makes two important points (i.e. points that I completely agree with).

First, that contrary to what is sometimes said, the current definition(s) of Merge does not by itself rule out SWM as an instance of Merge. It is sometimes claimed that whereas Merge in its E- and I-instances is a binary operation, it becomes a 3-place operation when extended to SWM. What is correct is that one can define SWM to be 3-place and thereby invidiously distinguish it from the other applications, BUT this is not a definition forced by any notions of conceptual simplicity or computational elegance. It is just something one can do if one wants to rule out SWM.

Moreover, as David seemed to concede, there is no really non-ad-hoc way of ruling SWM out by simplifying the definition of Merge. All require further (ahem) refinements (what I would dub extrinsic machinery designed to rule out a perfectly well defined option). As he noted in the talk, and I have been at pains to emphasize in conversation over the years, SWM is what you get when you leave the simple definition alone. To repeat: it is possible to merge two unconnected expressions together (E-merge) and to merge a subpart of one expression to that expression (I-merge). So why is it not also possible to merge the subpart of one expression to another expression that it is not a subpart of (SWM)? In other words, you can "look inside" a constituent and you can have multiple constituents in a "workspace", and this is all you need to allow SWM unless you make things more complicated. And that's why I have always thought that SWM is a natural consequence of a very simple definition of Merge and that preventing it requires either complicating the definition or arguing that more goes into Merge than the simple operation.
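To make the point concrete, here is a minimal sketch (in Python, and emphatically not anyone's official formalism) of Merge as bare set formation over a workspace. The helper names (`subterms`, `apply_merge`) are mine, purely for illustration; the point is that a single, unadorned definition licenses E-merge, I-merge, and SWM alike:

```python
# Toy model: syntactic objects are strings (atoms) or frozensets.
# Merge is just set formation; the workspace is a list of root objects.

def merge(a, b):
    """Simplest Merge: form the set {a, b}."""
    return frozenset({a, b})

def subterms(x):
    """All constituents contained in x, including x itself."""
    out = {x}
    if isinstance(x, frozenset):
        for part in x:
            out |= subterms(part)
    return out

def apply_merge(workspace, a, b):
    # Each argument must be a root or be contained inside one.
    accessible = set().union(*(subterms(r) for r in workspace))
    assert a in accessible and b in accessible
    # Roots that ARE an argument get consumed; a source tree that merely
    # contains an argument stays put (copy-theory style). Crucially,
    # nothing below distinguishes the three cases.
    new_ws = [r for r in workspace if r != a and r != b]
    new_ws.append(merge(a, b))
    return new_ws

# E-merge: two independent roots
print(apply_merge(["the", "dog"], "the", "dog"))
# I-merge: re-merge a subpart of a root with that very root
dp = frozenset({"the", "dog"})
print(apply_merge([dp], "dog", dp))
# SWM: merge a subpart of one root with a DIFFERENT root --
# licensed by exactly the same code path as the other two
print(apply_merge([dp, "saw"], "dog", "saw"))
```

Ruling out the last call would require adding a condition (an AGREE-style c-command requirement on the mover, say, or a bound on the workspace), not simplifying anything.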

Moreover, how to complicate matters is not a mystery. The idea that I-merge requires AGREE (i.e. move = Agree + EPP) suffices to block SWM as ex ante the target does not c-command the mover. Needless to say, someone's modus ponens can be someone else's modus tollens, and one might conclude from this that AGREE is a suspect operation (e.g. someone like moi) that should be forcefully thrown out of our minimalist Eden. But, if SWM proves to be empirically unpalatable, well, this is one way to get rid of it.

Side note: of course, if E-merge and I-merge are actually the very same operation, then why I-merge needs to be licensed by AGREE while E-merge does not (indeed cannot) becomes a bit of a conceptual mystery. And please don't tell me about having to identify the inputs to Merge, as if finding an expression inside a constituent is particularly computationally demanding (indeed, more demanding than finding an element in the lexicon or the numeration).

Second, David thinks that it is important to start thinking about how operations like Merge are algorithmically realized. In fact, his talk presents an algorithm that makes SWM unavailable. In other words, Merge the rule allows it, but the computational implementation of Merge inside an architecture with certain kinds of memory restrictions prevents it. So why not SWM on this view? It's the structure of linguistic memory, stupid!

I liked the ambitions behind this a lot. I've argued before that this line of thinking is what MP should be endorsing (e.g. see here and here and search for SMT on the site for others). It seems to me that David is on board with this now. In particular, that the SMT should concern itself not merely with restrictions imposed on FL by the interfaces AP and CI but also with the kinds of memory structures that we think are necessary to use Gs generated by FL. We know a little about these things now and it is worth speculating as to what this would mean for FL. David's talk can be viewed as an exercise in this kind of thinking. Great.

So, I loved the lecture, but did not buy the conclusion. Let me note why very briefly.

One thing to look for in a new research program is things that are different or novel from the perspective of older research programs. SWM, should it exist, is a novel kind of operation that we would not have thought reasonable within GB, say. However, if the above is right, then in an MP Merge-centric context this is a kind of operation we might expect to find. So, if we do find it, it constitutes an empirical argument in favor of the new way of looking at things. Thus, if SWM exists, then it is very interesting. Of course, this does not mean that it is right. It probably isn't. But it is very interesting and we should not try to get rid of it because it looks novel. Just the opposite: we should look for cases and see how they fare empirically. IMO, SWM analyses have been pretty insightful (e.g. Nunes on parasitic gaps, Uriagereka and Bobaljik and Brown on head movement, moi on adjunct control, and some new stuff on double object constructions whose authors must remain nameless for now). This stuff might all be wrong, but I find the analyses very interesting.

So, thx to David for 3 great lectures. High theory indeed! And also lots of fun. As I noted before, when the lectures become available on video I will link to them.


  1. Taking a glance at the slides, it seems like the mechanisms used to preclude sideward movement are reminiscent of stuff I recently published in a pre-reincarnated journal (Given who my advisor was, I had to publish it under a pseudonym. jk!):

    The driving intuition I had with that is the distinction between the relative ubiquity of where we find regular 'upward' movement and the relative rarity of 'sideward' movement. Though I didn't really address whether that might just be the result of the constraints on what would have been reasonable in GB.

    1. I think that you get to a conclusion similar to David's but by a different route. Yours looks more like the AGREE-based idea where the computational space must be surveyed to find the inputs to Merge, with some searches harder than others. So it's easier to find unattached items than it is to find one item in another, and maybe harder still to find one item in another and one unattached item. This search then ranks the instances of Merge.

      I think that David's idea is a bit different. He thinks that there is a strict 2 cell bound on the part of the work space where things are combined. This makes SWM impossible, not just hard. Moreover, it is not related to search, but to the nature of the memory architecture of the system that implements Merge. This results in a hard constraint preventing SWM altogether.
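      To illustrate the counting argument behind the 2-cell bound, here is a toy tally (my rough reconstruction of the intuition, not David's actual algorithm; the unit counts are just how I read the idea):

```python
# Toy tally: how many distinct units the combination area must hold at
# once for each flavor of Merge, against a hard 2-cell bound.
# (A reconstruction of the intuition, not David's implementation.)

ACTIVE_CELLS = 2  # strict bound on the part of the workspace where things combine

UNITS_NEEDED = {
    "E-merge": 2,  # two independent roots
    "I-merge": 2,  # one root plus a subpart it already contains
    "SWM":     3,  # source root + extracted subpart + distinct target root
}

def licensed(kind):
    """SWM is not merely dispreferred; it simply does not fit."""
    return UNITS_NEEDED[kind] <= ACTIVE_CELLS

for kind, n in UNITS_NEEDED.items():
    print(f"{kind}: needs {n} units -> {'ok' if licensed(kind) else 'impossible'}")
```

      The contrast with a search-based account is visible here: nothing is ranked or harder to find; the third unit simply has nowhere to sit.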

      I admit that I am not sure how to understand "minimal search." Is it part of Merge? Part of the algorithm that is a pre-condition for using Merge? Part of the computational system's architecture? At any rate, your idea and his are family-related, it seems to me, but nonetheless exploit different intuitions.

    2. Brooke, I think Norbert's right (I had a look at the paper). My idea was not about search really (though search is necessary to find terms to Merge); it was about the amount of space you can use for the actual operation itself. SWM needs an extra unit of memory that just isn't there. Still, I think we're aiming at the same target. I'd like to hear from Norbert where he thinks a SWM approach needs to go to handle the reconstruction asymmetries in ATB and Parasitic Gaps. From a SWM perspective there has to be a general principle, easily available to the learner, that predicts the asymmetries, since they're not present in available data. SWM predicts no asymmetries, but they are quite sharp. So I take that as a prima facie argument that the system the child is using for building her grammar of English doesn't allow SWM. If it does, what is the general principle that stops the full copy being available in the PGC or second conjunct?

    3. Yeah, I wasn't lying when I said I had only taken a glance at the slides. I sat down with them last night and the differences are apparent.

      A curious generalization about SWM is that whenever it applies, it seems to require a secondary upward movement such that there is a c-command relation between that higher position and the lower ones. Nunes explicitly encodes this, but I think/hope there can be a deeper explanation of that generalization. I have my ideas, but it's not easy.

    4. I am not a big fan of the reification of workspaces in popular minimalist thinking; they are like an appendix, except they were never really good for anything in the first place.
      I had tried to deal with sidewards movement phenomena in a minimalist setting almost a decade ago now, by using ideas from Manzini and Roussou (and ultimately Gazdar). I introduced them at TAG+, and then extended it to account for the reconstruction asymmetries in PG at a GGS.

    5. You must have had deja vu at my talk yesterday, Greg! But don't you need a forest if you are going to Merge two XPs already constructed? It seems to me that what you have to say is that both of these objects have to be opaque, but that when you do Move, the object has to be transparent, so you can't unify Merge and Move then. Of course, if you do Move via slash, you don't anyway. The other thing I don't like about slash features is that they have to be category-valued, which lets in complex values in feature structures, and then you need a theory about just how complex these can be (why no value that is itself a complex value, allowing long distance selection?). So I like the No Complex Values hypothesis I defended in that 2010 `Minimalist theory of feature structure' paper, which rules out a slash analysis. But otherwise, I guess we agree!

    6. In fact, Greg, maybe that should have been my answer to your question about why workspaces after my talk here in Chicago. Without workspaces, the unification of the structure building and structure changing components of the grammar is lost (so it's not unified in Stabler's system, and CG deals with movement dependencies in other ways). And without a structured workspace (differentiating stored resources vs arguments of an operation), you allow SWM. I think that works, and is maybe a better answer than convergence or cyclicity. Or is there a way of unifying Merge and Move in a Stabler-type system?

    7. merge (a,b) = {a,b}
      move (a) = merge(a,a) = {a,a} = {a}
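      Greg's identity can be checked literally with Python's sets (toy functions mirroring the formulas above, nothing more):

```python
def merge(a, b):
    return frozenset({a, b})

def move(a):
    # move is just self-merge: {a, a} collapses to {a}
    return merge(a, a)

x = frozenset({"the", "dog"})
print(move(x) == frozenset({x}))  # -> True: the singleton {x}
```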

      I explained this in a short note that I sent to Thomas to be posted on this blog.

      I do not think that this is as important as many seem to. In particular, I am very skeptical about any possible payoff this could have wrt explaining language emergence in the species, which seems to be the only motivation which has ever been given for its import.

    8. @Greg: I remember you sending me the note, but only in the context of a conversation with Chris Collins about Transfer. If I was supposed to post it to FOL and didn't, I'm sorry for my negligence. *blush*

      @David: I don't think the choice of feature system is relevant here. You also get sidewards movement "in a single workspace" (like Greg, I'm not too fond of the concept) if your grammar allows for lowering movement. Intuitively, sidewards = upwards followed by downwards. Lowering movement has Greg's hypothetical reasoning with slash features as one conceivable interpretation, but there are others that are completely independent of the feature system.

      The sidewards debate is actually one about how much lowering should be allowed in the grammar. This debate also has some other battlefields, e.g. whether we need TAG-style adjunction, another operation that you can emulate once you have lowering. Knowing these correspondences makes the debate much more interesting because it adds some juicy empirical problems that would otherwise be considered orthogonal to the topic of discussion.

    9. The main advantage of thinking about sideward movement in terms of workspaces is that you get a very natural explanation for why sideward movement out of certain islands is possible. (I.e., the constituents in question weren’t yet islands at the point in the derivation when the movement occurred.) Of course it is possible to do without workspaces by adding other technology instead, but I don’t think that any of the alternatives capture the facts about islands so neatly. You end up having to say something along the lines of “movement out of an island is possible when…”. So for example, you can easily add sideward movement to MGs without introducing any notion of workspace, but you then have to explicitly distinguish SWM from regular movement in terms of its sensitivity to islands, however exactly islands are encoded.

    10. I don't see much of an advantage, but my hunch is that we disagree on what "neatly" means in this context.

      One could just as well say that island constraints apply to movement paths that contain the island-inducing node, and since sideward movement doesn't, it's not subject to that. Or that island conditions induce anti-dominance requirements between LIs and their next associated move node in the derivation, so once again sidewards movement is exempt. Or that sideward movement is not subject to island constraints because it is feature percolation, not movement, as Greg suggests. And that's just three out of an estimated bazillion options. I just don't see how workspaces provide a neater explanation than any of those.

      If we had something like a precisely defined notion of islands we absolutely wanted to stick with, there might be some motivation for going with workspaces. But the reformulations I offered above do not conflict with what we have learned about islands so far. Yes, we intuitively think of them as some kind of barrier that you can't move past, but intuitions are just that, intuitions. Once you write them down in an explicit description language, their complexity turns out to be comparable to any of the alternatives I mentioned above.

      And that's probably where we will agree to disagree. But I just don't get this notion that a perspective is to be preferred to another one because it is more intuitive. Intuitive is not the same as succinct or insightful. For example, the intuitive idea of "movement took place before the phrase became an island" strikes me as one of the more complicated implementations because you have to explicitly introduce a notion of derivational timing that isn't needed for anything else in MGs. Sticking to intuitive notions makes the whole system much harder to understand. That's a general trend that you can also observe in the Collins & Stabler paper, which takes ideas that are considered intuitive/natural and winds up with a system that's much more complicated than MGs.

      It's perfectly fine for a researcher to stick with certain intuitions that make them more productive, but that doesn't mean that these intuitions should be explicitly encoded in the formalism. In particular because what is intuitive greatly depends on the researcher and the questions they ask.

    11. I guess my beef with sidewards movement is really that (i) it makes predictions that things should be identical when they are manifestly not (that's the asymmetries in reconstruction effects) so you need to do something special to make those things different and (ii) it requires non-monotonicity of some sort in the derivation, where information is lower than it normally would be. (i) is fundamentally a question of how the learner will ever have evidence that should stop the reconstruction happening where it does; (ii) is really about restrictiveness of the range of possible form-meaning pairs in a more general sense. The whole conceit of the three lectures was that the current literature is very eclectic in the range of types of derivation it allows, hence in the legitimate form-meaning pairings, and I just think we should pause to consider whether we have let the simplicity of the basic structure building operations drive us towards too unrestrictive an overall system. I was trying to get some restrictiveness back in through constraining the basic operation and using workspaces for that (though one could just directly constrain operations as they apply to objects not workspaces), but of course one could think that it's also because something filters out the structures (some kind of a condition on chains, relaxed just where one wants it to be). I think there's something to be said for the former approach, because it's harder to add ad hoc fixes that just patch the system where it looks like it needs to be patched (changes to the structure building system tend to ramify more widely).

    12. @David: Given the picture you sketch in lecture 3 (in particular, circa slide 14), some of the reconstruction effects one finds in "suspected" sideward movement configurations (e.g., ATB constructions) are indeed asymmetrical (Principle A, Principle C, WCO) – but others are symmetrical (idiom chunks, scope, SCO).

      If this is so, then we are faced with the following two options:

      (a) allow sideward movement, and hope that something else will block those cases where reconstruction effects exhibit asymmetry

      (b) block sideward movement, and hope that something else will account for those cases where reconstruction effects exhibit symmetry

      In your talk, as well as in the comments here, you treat it as self-evident that (b) is preferable to (a). But it's not clear to me that, for example, (b) makes the job any easier for the child than (a) does. Assuming (a) means that the child needs to somehow acquire when and where reconstruction is blocked in a set of constructions that can hardly be assumed to be "readily available" in the input. But doesn't (b) present the child with an entirely analogous and complementary problem?

    13. @David: As far as asymmetries in reconstruction effects go, it's not necessarily the parasitic/non-parasitic nature of the gap that determines whether or not reconstruction to the gap site occurs. For example, the Kearney paradigm is reversed in the case of subject parasitic gaps, where the parasitic gap precedes the real gap. [Wrote this before seeing Omer's comment. The overall pattern of reconstruction effects is clearly pretty intricate.]

      @Thomas: A lot of work on parasitic gaps has had the goal of showing that contrary to appearances, PGs are not a weird exception but an inevitable consequence of the interaction of a number of independently-motivated syntactic principles. The best analyses of PGs make it look like parasitic gaps have to exist given certain other properties of the syntax and/or semantics. Nunes’ analysis succeeds in this respect. If the goal is just to characterize the basic properties of the phenomenon, then of course there are 1001 ways of doing that — it’s not an especially complex phenomenon. I don’t think my position on this has anything in particular to do with preferring analyses that are intuitive (which is not a term that I used, FWIW). My point in favor of Nunes’ analysis was regarding the deductive structure of his theory: the existence of parasitic gaps is derived from a combination of assumptions each of which does some useful work apart from accounting for the existence of parasitic gaps. I quite agree that the details of multiple workspaces and sideward movement can get quite complex. I’d be all for keeping sideward movement and dropping multiple workspaces (which can certainly be done, as you know), but I don’t (yet) know how to do that without introducing special stipulations regarding islands.

      I think there is a danger here of putting too much emphasis on what’s easy or hard from the point of view of MGs. It’s true of course that multiple workspaces are not needed in MGs (and in fact would be difficult or impossible to add to the formalism). However, from another point of view — the one that Nunes and others were adopting — multiple workspaces are necessary to derive any non-uniformly-branching structure using binary (Re)Merge. I don’t think that MGs should get a veto on which theoretical notions are acceptable within Minimalist syntax. After all, MGs began as an attempt to capture certain aspects of informal Minimalist theoretical practice, and those aspects of informal Minimalist syntactic practice which can be easily implemented in MGs are, roughly, just those aspects that Stabler chose to try to capture.

    14. @Alex: My intent certainly isn't to veto anything, I'm perfectly fine with many different ideas floating around. What I was ranting about in the second part of my post is the exact opposite, ideas being pushed as inescapable or superior to the detriment of others without a really strong argument beyond perceived neatness. [For the record, I'm not saying that anyone in this thread has been doing this, like most rants it was a) a little unfocused and b) fueled by a fair number of unrelated experiences.] At best this gives rise to fruitless debates about why something is or isn't neat, at worst it locks you into a specific perspective and makes you miss viable alternatives.

      That is what I originally pointed out about feature systems, workspaces and ATB/sideward movement; these things aren't invariably intertwined in the way that was implicitly suggested. Single workspace + simplex features without slash percolation still allows for sidewards movement. And once you disentangle these things, you see lots of connections that do not emerge from a workspace-based perspective. I read (or misread, it seems) your comment as implying that the workspace-based perspective is the best approach to sidewards movement and thus "overrules" these other perspectives.

    15. No time to dive into this discussion properly, unfortunately, but I feel obliged to add a link to this Stabler paper here, even if just in the interest of a kind of historical completeness:
      But there's a lot packed into that paper, it's really about two or three distinct issues and it can be hard to pull them apart.

  2. @Thomas, you did post a link to Greg's note, here.

    1. Thanks for the link. Too bad I didn't see it before I finished my penance of 100 Hail Marys.

  3. I don't think I've presented it as self-evident, and I don't think it's self-evident (though I think it's more interesting!). My hunch is that the reason scope and strong crossover pattern together is that, to get the composition of meanings needed for ATB or PGC, you need a dependency from the top of the conjuncts/composed adjunct to a base position, and that's what's involved in XO and scope. But that doesn't need a low copy. But in any event, I'm inviting SWM devotees to provide an account of the asymmetry. In the meantime, I'm working out an account of the symmetry (well, if I can get my head around the Moltmann scope claims).