Monday, January 11, 2016

An interesting PoS argument makes it to the "show"

Linguists tend to publish for other linguists. And this is fine. However, it was not always so. There was a time when linguists saw themselves as part of a larger cog-psy community and published in venues frequented by non-linguists. Cognition was a terrific venue for such work and it enabled linguistic discoveries to influence debates about the nature of mind (and, occasionally, even the brain). However, even in these golden years very few linguists published in the leading general science journals and this had the effect of segregating our work from the scientific mainstream. Books like Pinker’s The Language Instinct were effective conduits to the larger scientific community, but really nothing gains scientific street cred like publishing in the big three peer-reviewed high-impact journals: Science, Nature and PNAS. Moreover, as readers of FoL know, I believe that the single best way to politically advance linguistics and protect it economically is to disseminate our results to our fellow scientists. So, with this as prelude, I am delighted to note a paper of that ilk that appeared yesterday in PNAS. The paper (here) has three authors: Chung-hye Han, Julien Musolino and Jeff Lidz (HML). Aside from its being very GFL (i.e. “good for linguistics”) that such things are being published in PNAS, it’s also a very good paper that I heartily recommend you take a look at. It even has the virtue of being a mere 6 pages (a page limit we should encourage our own journals to try to approximate). So what’s in it? Here is a quick précis with some comments.

The paper argues that FL requires that speakers adopt (at least in the unmarked case) a single G when exposed to PLD. The reasoning for this conclusion is based on a novel Poverty of Stimulus (PoS) argument. What makes it novel is that the paper outlines how a particular kind of variation can be driven by properties of FL. Let me explain.

The standard PoS argument looks at invariances across Gs and shows how these can be accounted for with some or another proposed innate feature of FL. Variation among Gs is then attributed to the differential inductive effects of the Primary Linguistic Data (PLD). What HML shows is that this same logic allows FL to accommodate some G variation just in case the PLD is insufficient to fix a G parameter in the LAD (aka language acquisition device (i.e. kid)). In such cases, if FL requires an LAD to construct a single G for given PLD, then we expect to find variable Gs in a population of speakers with the following three key features: (i) Speakers exhibit variability wrt a certain set of (relatively recondite) grammatical phenomena, (ii) This variability is attested between but not within speakers, and (iii) The variability is independent between speakers and parents. Let me say a word about each point.

As regards (i), this is the fact HML discovers (actually this possibility is first described in an earlier 2007 HML paper). HML shows that in Korean the height of the verb explains scope of negation effects. These effects are quite obscure and verb height cannot be induced from the inspection of surface forms as it can be in languages like French and English. In effect HML argues (IMO, shows) that “children’s acquisition of this knowledge [viz. the scope facts, NH]…is not determined by any aspect of experience…because the experience of the language learner does not contain the necessary environmental trigger” (1).

As regards (ii), HML shows that the variation is consistent within speakers across different negative constructions and over time. In other words, once an LAD’s G fixes the position of a verb (and with it negation) it fixes it there consistently.

As regards (iii), the G variation in the population is effectively random. It is not possible to predict any speaker’s positioning of the verb by examining the Gs of parents (or, for that matter, anyone else). In other words, as there is no data that could fix where the verb sits in a Korean G, the fact that it gets fixed is a product of the structure of FL, and so we expect random variation.

This is a very clever argument. Note that it directly supports the logic of general PoS arguments without assuming G invariance of the output. Or, to put this another way: PoS arguments generally proceed from invariant properties of Gs to features of FL. HML notes that it is consistent with PoS logic that there be variation so long as it is random. The fact that one can find such cases further strengthens PoS logic.

The paper has two other virtues, IMO, more directly relevant to syntactic theory.

First, it provides a model of the kinds of things that syntacticians keep asking for. Syntacticians keep asking whether psycho-ling results can help choose between various alternative syntactic proposals. In principle the answer is, of course, yes. However, paradigms of this are hard to find. HML provides an example where a psycho-ling result could be close to dispositive.  The relevant syntactic alternatives hail from the earliest days of Minimalism when V raising was a hot topic of inquiry.

Cast your mind back to the earliest days of the Minimalist Program (MP), indeed all the way back to 1995 and the Black Book. Chapters 2 and 3 presented alternative theories of V raising. The chapter 2 theory (see 138ff) basically provided a theory in which French (where Vs overtly raise to T) is the unmarked case and English (where V does not overtly raise) is the marked one. The markedness contrast is argued to be a reflection of a leading MP idea (viz. economy). The idea is that overt raising involves fewer operations than overt lowering plus LF raising so an English G is less economical than a French one as a matter of FL principle (not UG incidentally, but FL) for it uses more elaborate derivations in getting V to T.

The cost accounting changes in chapter 3 (roughly 195-198) where English “procrastinating” Gs are the unmarked case. The idea is that LF operations without PF effects are more economical than ones that result in PF “deletions” (an idea, btw, that lingers to the present in current MP accounts that take Gs to relate meanings with sound). This requires rethinking how syntactic morphemes are licensed (checking) and how they enter derivations (fully featurally encumbered), but given this and some other assumptions French overt raising in the chapter 3 theory is less grammatically svelte than covert V to T, and hence less preferred.

HML bears directly on these two theories and argues that they are both wrong. Were the chapter 2 story right then we would expect that all Korean Gs assign Korean Vs high positions. Were the chapter 3 theory right they would all have low positions. The fact that both are available and equally so argues that neither option is better than the other. This leaves open the question of what theory allows both Gs to be equally available (see below). I suspect that a single cycle theory with copy deletion can be made to work, but who knows. Now that HML has shown that both are equally fine, we know that neither of the earlier stories can possibly be correct. Note that this does not mean that the HML account is inconsistent with any markedness view of V raising. This brings me to the second virtue of the HML story. It raises an interesting theoretical question.

So far as I know, there is currently no G story for why it is that LADs need to choose a single G for V raising. The data are quite clear that this is what happens, but why this is required is theoretically unclear. Thus, HML raises an interesting grammatical question: what is it about FL/UG that forces a choice? I can imagine some answers having to do with how complex the lexicon is and that having functional heads that optionally assign features to V in one of two ways is more costly than having one that does it only one way. This might have the desired effects if judiciously worked out. However, this then predicts that some Gs, mixed Gs, will be more costly than uniform Gs. This, in effect, makes English the marked case again given the fact that English Gs raise be and have (and maybe modals) but not more “lexical” verbs.[1] At any rate, none of this is a theory, but the HML data raise an interesting theoretical question as well as closing off two reasonable prior alternatives. So it serves as a nice example of how psycho work can impact syntactic theory.

Let me end with one more point. Unlike much publicity concerning linguistics, the HML work offers an excellent example of what linguistics has achieved. It exploits real linguistic advances to make its scientifically interesting point. And this is in contrast to lousy ways of advertising our linguistic wares. One reaction to the invisibility of linguistics in the general scientific culture has been to try to co-opt anything “languagy” to promote linguistics. The word of the year competition at the LSA is an excellent (sad) example.[2] The idea seems to be that this kind of thing garners media attention and that there is no such thing as bad publicity. I could not disagree more. The word of the year has nothing to do with linguistics, nothing to do with the serious advances GG has made, and relies on no expert/professional knowledge that linguistics brings to the scientific table. As such it does nothing to advertise our scientific bona fides. It’s, IMO, crap. And using it to advance the visibility of linguistics is both counterproductive and dishonest. I don’t know about you, but scientific overreach (aka scientism) makes my teeth hurt. This is not what a professional linguistics organization (the LSA) should be doing to promote linguistics. What should it be doing? Advertising work like HML, i.e. making this kind of work more widely accessible to the general scientific community. This is what I had hoped the LSA initiative (noted here) was going to do. To date, so far as I can tell, this hope has not been realized. Instead we get words of the year and worse (see here). It’s almost like the LSA is embarrassed by work in real linguistics. Too bad, for as HML indicates, it can sell well.

So, read the HML paper and advertise it to scientific colleagues outside linguistics proper. It is both interesting in itself and good publicity for what we do. It’s real linguistics with a broader reach.

[1] And this might predict that mixed Gs will necessarily only allow Vs robustly indicated in the data to be “different,” (e.g. raise). This is consistent with what we find in English where be and have are pretty PLD robust. It would be interesting to crank these cases through Charles Yang’s forthcoming learner and see what the limits on exceptionality would be for such a story.
[2] Thx to Alexander Williams for some co-venting about this.


  1. I'm wondering about the assumption in this post that the markedness facts in the syntactic theory should correspond directly with the acquisition facts. Couldn't the theory in which English is the syntactically simpler case (with French T having extra properties that cause raising) be true even if the acquisition facts were exactly as they are (and vice versa)? That is, complexity of derivation does not necessarily imply anything about what happens in acquisition. It might be that the theories differ in terms of syntactic complexity/markedness/etc but nonetheless require data to fix the derivation in the particular languages, with Korean still being the extreme case where there is no data. The stronger claim, which would be the acquisition variant of the derivational theory of complexity, requires an additional assumption about how derivational complexity maps to order of acquisition. What this case does argue against is a syntactic theory in which parameters have default settings that explain acquisition facts (or at least one in which there is a default setting for V-raising).

    1. Not sure I see this. I take markedness to mean that in the absence of evidence to the contrary the unmarked option is realized by a G. Thus, in the absence of PLD the unmarked option is taken.

      What does this imply about the course of acquisition? Well, it indirectly implies that Korean is impossible if you assume, as Chomsky did, that one of the two options is unmarked (with markedness a product of complexity, i.e. the most economical option is unmarked). Why? Because there is no evidence to the contrary to dislodge the unmarked case. As you show, there is no evidence at all, thus no evidence to mark the unmarked option. Thus, there is no PLD relevant to changing the parameter value and so the unmarked one is what the LAD should have as its G.

      So, complexity of derivation does not imply anything about acquisition, but if complexity of derivation is taken as implying markedness then it should. Or maybe I am missing something (quite likely).

    2. I guess the issue is about what "markedness" means. There are three possibilities:
      1) Unmarked = simpler/economical representation; Marked = more complex/less economical representation
      2) Unmarked = typologically common; Marked = typologically uncommon
      3) Unmarked = default for learning; Marked = needs evidence for learning.

      I think that (1) does not necessarily imply (3). That is, you could have a difference in grammatical complexity (economy, etc) with or without making a claim about learning. I think the strongest theory is one in which 1-3 are equated, but it is by no means necessary.

    3. I think that (2) is a notion unsuitable to GG. There is no reason to conclude that because something is typologically common that it represents an unmarked value of a parameter. What is common is a function of PLD and parameter defaults. It is entirely possible for PLD to regularly reset a default parameter to allow for the marked value to be typologically visible.

      Nor need this be the only way to explain surface predominance. If I recall correctly, Berwick and Niyogi presented a learning model in which OV would go to VO pretty quickly and stay there once certain random surface changes occurred. This is not because OV is more marked but because of how their learning system worked.

      Ok, let's now look at (1) and (3). Economy ranks representational options saying that the more economical obtain all things being equal. What makes them unequal? Well, if we understand this to mean that economical systems are the default, then it seems to me that it comes (very) close to saying that this is what a G does unless…Well unless what? One thing might be unless the more economical system cannot converge. But from what I can tell both the Korean cases converge at both AP and CI. So unless…? Well unless the more economical system fails to analyze the PLD correctly. This relates markedness to simplicity, as in the old evaluation metric (which parameters were intended to replace, I believe), and so makes markedness directly relevant for learning. In fact, I believe that this is how the notion was intended to be understood in a parameter based setting (which, recall, replaced the evaluation metric which was intended to bear on learning). Now, it's logically possible that we want to separate (1) and (3). But then I am not sure what (1) is supposed to mean when convergence is not at issue. What is the mark of economy if not as a default? What's the "unless" you are envisioning for the case of V to T that HML discusses?

      So, yes you can have a claim about economy without committing hostages to learning. But the only two versions that I know of are either default but for convergence or default but for the PLD. I know of no other options, though one might conceivably exist. But absent specification of the "unless" clause I believe that (1) and (3) are linked in the case HML discusses.

    4. Ok, Jeff is right (big surprise, right). To see this consider the role that economy plays in the Chomsky ch3 account. French differs from English in having a strong feature on T. This forces movement of V in overt syntax (strong features crash a derivation unless checked by AP, i.e. in overt syntax). What does economy do? It PREVENTS movement of V to T in English as moving in overt syntax is less economical than moving covertly (procrastinate).

      Let's now go to Korean. What prevents a Korean speaker from having an optional strong feature on T? In other words in the absence of evidence a Korean speaker flips a coin and either puts a strong or a weak feature on T. If the former then V is always high. If the latter V is low but economy is still needed to prevent overt raising as an option. So the economy issue does not tell the learner which feature to append to T.

      Why can't a speaker append either, and so end up with 2 Gs wrt T's features? I suggested that this reflected UG. Jeff has suggested that this need not be so as the learning algorithm might force the LAD to choose one or the other but not both. Yang's theory will have this consequence (Jeff tells me), and this has been modeled by Lisa Pearl. This is not unlike the learner noted above in Berwick and Niyogi wrt OV/VO.

      This non UG version makes a prediction, though it might be hard to test. It predicts that in the early days of acquisition a learner will entertain both possibilities for a while. So we should expect learners that allow both kinds of Gs for a while rather than opting for one or the other straightaway. This would be true not only in Korean but in any language. It will be very hard to test in Korean given how subtle the relevant data to test it is. But the learning theory view seems to make this prediction.

      Last point: I think that the acceptability of both structures in Korean points to another consequence for the analysis of V to T. It had better not be that at CI the two Gs look the same given that they have different semantic consequences. So, covert raising does not create structures identical to those that overt raising does. In particular, covert V to T does not bring neg along for the ride, though overt raising does. Nor does Neg raise at LF or neg lower at LF. So, the standard idea that at LF/CI all Gs look the same seems clearly false.

      Thx Jeff.
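
The learning-theoretic alternative raised in the comment above can be illustrated with a toy simulation. This is only a sketch under loudly stated assumptions: it uses a linear reward-penalty update in the general spirit of variational learning, and it is not Yang's actual model or Pearl's implementation. The function names (`learn`, `simulate_population`), the learning rate `gamma`, and the input count are all hypothetical. The key assumption is that Korean PLD never disambiguates verb height, so whichever grammar the learner samples always "succeeds" and gets rewarded.

```python
import random


def learn(p_high=0.5, gamma=0.05, n_inputs=5000, rng=None):
    """Toy learner choosing between a V-high and a V-low grammar.

    Assumption (illustrative only): every input sentence is compatible with
    both grammars, so the sampled grammar is always rewarded and never
    penalized. Returns the full trajectory of P(V-high).
    """
    rng = rng or random.Random()
    trajectory = [p_high]
    for _ in range(n_inputs):
        if rng.random() < p_high:
            # V-high was sampled and parsed the input: shift weight toward it.
            p_high = p_high + gamma * (1.0 - p_high)
        else:
            # V-low was sampled and parsed the input: shift weight away from V-high.
            p_high = (1.0 - gamma) * p_high
        trajectory.append(p_high)
    return trajectory


def simulate_population(n_learners=200, seed=0, **kwargs):
    """Run independent learners on the same uninformative input.

    Returns each learner's final P(V-high).
    """
    rng = random.Random(seed)
    return [learn(rng=rng, **kwargs)[-1] for _ in range(n_learners)]


if __name__ == "__main__":
    finals = simulate_population()
    high = sum(p > 0.99 for p in finals)
    low = sum(p < 0.01 for p in finals)
    print(f"V-high: {high}, V-low: {low}, undecided: {len(finals) - high - low}")
```

Under these assumptions each learner starts out entertaining both grammars and then drifts to one of them, with nothing in the input (or in anyone else's G) determining which. That is the shape of the prediction discussed above: between-speaker variation, within-speaker consistency, and an early period in which both options are live.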

    5. This paper is very nice, and the overall idea is one that seems very plausible, but I think it leaves open a syntactic question to do with the analysis in terms of head movement: how does Neg raising to T allow it to scope over the object? Unless you do some funky playing around with c-command and the definition of segments, there ain't no way that the negation is actually going to have scope over the object: it’s too structurally embedded in the verb. Could the analysis rather be related to reconstruction of the object below the base position of negation? I understand that QPs are frozen wrt each other in scope, but nothing shows they are frozen wrt negation, does it? Or have I missed something? So then the findings would not be about string vacuous head movement, but rather whether reconstruction of an object below Neg is possible. Subjects would never reconstruct (perhaps related to topicality), but object reconstruction would vary by speaker. Or have I missed something obvious that's discussed elsewhere?

      The more general result, which is very cool and stands irrespective of the analytical details, reminds me a bit of Judit Gervain’s Master's thesis where she showed via a cluster analysis over judgments that there were two previously undetected `dialects’ of Hungarian wrt whether you have movement or a null resumptive strategy, with speakers consistent in each, though, as far as I recall, no geographical, age, gender etc distinction was significant.

    6. Nice point. I think that there are two assumptions possible to make this run. One is, as you suggest, assuming that neg adding to the verb allows it to scope as high as the verb does. My recollection is that Baker had a version of this assumption for head movement so that adjunction of the heads to heads would not violate the ECP. The other option is to assume that the height of neg correlates with the verb. So there are multiple positions for Neg, some higher than others. My recollection is that for some of the East Asian languages there is NPI data indicating that neg licenses NPIs in subject position, in contrast to English. Neg is always on the V so to get this to work the assumption is that neg can scope high (from C). This would suggest differential NPI licensing effects in the two "dialects" of Korean that HML discusses. I don't know what the facts are however.

      As for freezing scope wrt negation; not sure. In English neg scope seems to track more or less S-structure c-command, doesn't it? In all cases except for the notorious "Everyone didn't leave." But this is sensitive to the universal nature of the subject. "Many people don't like herring" does not allow the neg>many reading. So, go figure.

    7. I think it's not hard to come up with a mechanism whereby head movement would allow the moving element to c-command out of the landing site (e.g. by finessing the category/segment distinction, as David alluded to above). But abstracting away from the particular technical apparatus, I think the real issue is this: we all agree that we don't want head movement to feed things like c-selection (imagine, for example, a language in which nouns head-move all the way to D; if head movement could feed c-selection, the prediction would be that in such a language, you could then stick nominals headed by a deverbal noun wherever a VP could go). I take it that this kind of thing was part of the motivation for the head-adjunction structure in the first place.

      So we want the post head-movement structure to be asymmetric enough to preserve selection, but symmetric enough to allow c-command out of the landing site. I think I can hazard a guess as to what David's response to this would be: that this tension is an artifact of clinging to head movement as a component of the grammar. Personally, I find that jettisoning head movement accrues debts that I don't know how to repay (in particular, in the area of syntactic cliticization / clitic doubling; see the last two sections of this for a review). I think Matushansky's approach provides a good alternative: "head movement" is just movement of a head to the spec of the next projection, followed by fusing of that spec to the adjacent head. Notice that, if this is so, it would extend the c-command domain of the moving head by at least one projection. (Whether successive iterations of head movement would continue to expand the c-command domain of the original mover depends on what the outputs of iterated applications of this fusion operation look like.)

      Crucially, selection is unaffected by all of this. That is because, being an instance of probe-triggered movement, head movement will only occur to the specifier of a head that projects. And if selection operates over projections, it is the head of the landing site, not the moving head, that will count for these purposes.

      [Keen eyes will notice that this way of talking about selection and projection is incompatible with Chomsky's most recent approaches to labeling. I have to admit, I'm not too worried about that; see here for some of the reasons why.]

    8. @Omer: You have indeed uncovered my secret motivation in raising this issue here - to undermine head-movement wherever it appears! I have serious issues with the Matushansky Gambit, not least the feeding back of a morphologically constructed element into the computation, that then is subject to syntactic operations cyclically - which you note above. Of course, I don't really think there are heads (except roots), so I kinda have to object to the Matushansky Gambit, although if you take, say, the short negation to be a spec in Korean, and allow it to raise to a higher negation in the functional structure (which is version 2 of Norbert's solutions, and what I proposed in my Olomouc talk a few years back), that will work. I'm sure this will raise some interesting exchanges at my Baggett lectures in a few weeks!

    9. A very different (and imho appealing) alternative is furnished by a perspective that treats derivation trees rather than phrase structure trees as the central syntactic object. This is somewhat awkward to explain without trees, but I'll try my best:

      Over derivation trees, derived c-command reduces to dominance: X c-commands Y if a Move node associated to X properly dominates Y. If X head-moves to Z and Y is somewhere in the complement domain of Z, the corresponding Move node properly dominates Y. Consequently, X c-commands Y. However, in the derivation tree itself X is still in its base position, its locus of syntactic dependencies has not changed. This means that the selectional properties of X do not somehow get transferred up to Z.

      Full disclaimer: formally we can of course redefine selection so that this kind of relocation to Z takes place. But the most intuitive picture from a derivational perspective is the one above, i.e. the one that emerges without any extra machinery beyond the derivational definition of derived c-command that is needed anyways.
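
Since this is awkward to picture without trees, here is a minimal sketch of the definition just given. It is a deliberate simplification and not a faithful Minimalist Grammar implementation: the `Node` class, the explicit `mover` annotation on Move nodes, and the helper names are all hypothetical, introduced only for illustration. Real derivation trees associate Move nodes with movers via features rather than by a label.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str                                   # "Merge", "Move", or a lexical item
    children: List["Node"] = field(default_factory=list)
    mover: Optional[str] = None                  # hypothetical: which item a Move node is associated to


def iter_nodes(node):
    """Yield every node of the derivation tree, root first."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def leaves(node):
    """Labels of all lexical items at or below this node."""
    if not node.children:
        return [node.label]
    return [lab for child in node.children for lab in leaves(child)]


def derived_c_commands(root, x, y):
    """X c-commands Y iff some Move node associated to X properly dominates Y."""
    if x == y:
        return False
    return any(
        n.label == "Move" and n.mover == x and y in leaves(n)
        for n in iter_nodes(root)
    )


def base_sisters(root, x):
    """Leaves of X's sister(s) in the derivation tree, i.e. where X still sits."""
    for n in iter_nodes(root):
        if any(not c.children and c.label == x for c in n.children):
            return [lab for c in n.children
                    if c.children or c.label != x
                    for lab in leaves(c)]
    return []


# X merges with Y, Z merges with the result, then X head-moves to Z.
tree = Node("Move", mover="X", children=[
    Node("Merge", [Node("Z"), Node("Merge", [Node("X"), Node("Y")])]),
])
```

With this toy tree, `derived_c_commands(tree, "X", "Y")` holds because the Move node associated to X dominates Y, while `base_sisters(tree, "X")` still returns only Y: X has not been relocated in the derivation tree, so nothing about its selectional locus has changed.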

    10. But Thomas, from that perspective how do you make the Move node pull up the head and not the phrase? They won't be distinguished in a Derivation Tree, will they? I just think that heads aren't in the domain of structure building operations - in my Syntax of Substance system, that's because there are no heads (you have Brody style telescoped trees).

    11. Are head movement and phrasal movement distinguished in the derivation tree? Yes and no.

      I would posit two different "interpretations" of Move, one that takes a phrase and puts it in a position right under the Move node, and another one that takes a head and adjoins it to the head that carries the movement-triggering feature. How one distinguishes between the two is mostly a technical matter. You could have distinct feature types, some kind of structural principle (if there is one, empirically), an economy constraint, and so on. But ultimately a distinction can be made, it's just not expressed in tree-geometric terms in the derivation tree.

      That said, if it turns out that head movement doesn't exist that's fine in my book. I just wanted to point out that the c-command puzzle is not all that puzzling from a derivational perspective; it's healthy to sometimes put aside phrase structure trees and think in a slightly different representational format.

    12. A leaf in a derivation tree is exactly a head. Brody style telescoped trees (which are borrowed from dependency grammar) are simply an equivalent alternative notation for the usual MG derivation tree. Being clear about this dissolves all puzzles about, for example, how such structures are constructed derivationally.

      I myself prefer the post-syntactic conception of head movement, as initially made precise by Ed Stabler, and then expanded upon by myself. It is clear that very many ideas in this domain are just a variant of this (Brody's mirroring, DM, Spanning, etc). From what I can glean from google books, your idea in SoS seems to be prefigured by this tradition.

      I would welcome a very explicit working out of the semantics of constructions involving negation in Korean, under different analytic assumptions (in particular, under a post-syntactic head movement view). C-command only reflects semantic scope to the degree it does under the assumptions that scope takers basically have continuation types, and that function application is the only mode of semantic combination.

    13. Isn't the usual assumption that a specifier of a specifier of Z doesn't c-command into the complement domain of Z? In Thomas' terms, ZP with an XP specifier is a Move node associated to XP, so XP c-commands everything dominated by Z; but without additional assumptions a WP specifier of X doesn't.

      If that is right, then the extension of the derivational definition to head-movement would only let a head c-command from the first step of movement.

      This wouldn't be enough for, e.g. what has been claimed for English negation (in e.g. *Will anybody not help me? vs. Won't anybody help me?), where the suggestion is that Neg is below T to begin with, as seen by the facts of do-support, and doesn't c-command into the subject even if it moves to T (*Anybody won't help me). But if it is carried up to C by T-to-C movement, then it does (see Ian Roberts' 2010 book for discussion).

      Similarly, in Han et al's 2007 analysis of Korean, Neg is separated from T by another functional head, so the extension of the c-command domain needed seems to go beyond what is usually assumed for phrasal movement, contra what Thomas said as far as I understand.

    14. @greg, yes, my own system leans heavily on Brody, and your answer is exactly what I'd assumed about derivation trees, which is why Thomas's initial point puzzled me. I thought Brody was the first to suggest this in his early papers on mirror theory in the latter half of the 90s. Did Stabler propose this before him?
      @peter definitely, which is what David Hall pointed out in his thesis. In Roberts' system, it's not really the moving head that does the job of licensing, it's a feature that is already in C, and it is an agree operation that gives it the wherewithal to license the NPI.
      But I think for Korean these issues can be solved by allowing reconstruction of the object to its base position (Han et al assume it raises for case in the paper, and there is adverbial evidence for this assumption). But that of course means that there's no head movement in these cases, as predicted by the `phonological' view. Which makes me a happy syntactician.

    15. @Peter: Alright, here's my crazy attempt at fitting the English data into the story I proposed above. In "Will anybody not help me?" you have standard T-C movement. In "Won't anybody help me?" T lowers to Neg and Neg raises to C. Something similar would have to happen in Korean.

      As somebody who's blissfully unaware of most empirical facts surrounding head movement, I have no idea if this solution is just unappealing or obviously wrong. It's not in line with standard ideas about cyclicity, but the way that I implemented lowering movement in MGs a few years ago it is actually cyclical in the derivation tree.

    16. @David: Not sure why *Anybody won't help me is a problem for the idea that Neg is able to c-command out of the complex head it is in. I think the following is also bad:

      (1) *Anybody seems to Mary to not have been expected to leave.

      If I'm right about that, then it seems that lower copies of A-movement simply don't count for (this kind of) NPI licensing, at least in English. (Since there are several traces of anybody that are c-commanded by negation, in this example.) Whatever the explanation of that is, it seems to me it will also cover *Anybody won't help me, since in that example, the NPI's surface position is not in the c-command domain of the complex head located in T. [I'm assuming here that it is the head, rather than any other projection, that counts for c-command; and that there is no segment/category monkeybusiness of any kind. Simply your good ol' "first branching node"-style stuff.]

      This is again very much compatible with the Matushanskyan approach I sketched above: any head can c-command from its landing site after head movement, because it just lands in a specifier position. I fully agree with your(=@David) discomfort about the idea of a morphological operation feeding back into syntax – I think that's a nonstarter, actually – which is why I think that m-merger has to be a syntactic operation proper. It is an operation that takes two immediately-c-commanding nonbranching nodes and allows subsequent syntactic cycles to treat them as a single term. (And as I noted, you could formulate this operation in such a way that successive applications would continue to extend the c-command domain of the moving heads.)

      Some people (Hi, Norbert!) might not like this addition to the syntactic repertoire of operations. I admit I am not particularly moved by such considerations, unless they are accompanied by an adequate theory of clitic doubling that doesn't resort to m-merger (or traditional head movement). From where I sit, clitic doubling cannot, pace Sportiche, be treated as garden-variety phrasal movement (overt or covert); that's because, after clitic doubling, neither the full noun phrase nor the doubled clitic counts as an intervener, and I simply don't know how phrasal movement alone could deliver that. Phrasal movement followed by m-merger, however, could. [Here's a definition of "intervener" that delivers this: an occurrence of X is an intervener for agreement iff it c-commands every other occurrence of X and is phrasal. Thx to Norbert for some helpful discussion of this.]

      There's more to say, but this comment has rambled on for long enough.

    17. Rambling or not, I neglected to point out that the idea that clitic doubling is derived by m-merger belongs to Boris Harizanov (2014, NLLT) and was developed in subsequent work by Ruth Kramer (also 2014, NLLT).

    18. I'll have a look at the Harizanov paper, but I can't see why clitic doubling requires syntactic m-merger, and I'm definitely with Norbert in not wanting new operations. In fact, a big chunk of my Baggett lectures, when I come over in a fortnight, will be about how we have a proliferation of Merge type operations (indeed, a Menagerie of Merges): external, internal, parallel, sideways, m-merger, late merger, under merge, ... I feel like it's a bit back to the 70s with a zoo of transformations. We need a bit of a clean-up in the theory.
      On clitic doubling, I quite like the proposal that Daniel and I buried in the Kiowa book (p125ff of Mirrors and Microparameters). Essentially clitics are agreement features that have got trapped in the wrong projection line and are expressed by partly phonologically independent units, phonologically bound to the exponent of the rest of the projection line (cf Hale's 2001 analysis of Navajo prefixal morphology).

    19. Like I said, the reason clitic doubling justifies m-merger (as far as I'm concerned) is syntactic intervention. Lexical noun phrases that have not been clitic doubled are interveners; those that have been clitic doubled are not interveners, nor are the clitics interveners. [It's not about the Activity Condition; I'd gladly explain why, but I'm trying to avoid rambling.]

      I don't have your(pl.) book on hand, but does it have an account of the intervention facts?

    20. No, it is just about the positioning of the clitics in Kiowa. But, if I understand you right, you're saying that m-merger effectively removes the features from intervening. Fine, but those features could be absent for other reasons too, any one of which could correlate with expression as a clitic. For example, let's say that presence of phi in an extended projection (which leads to a clitic being morphologically expressed) deletes phi on the outer layer of a Kase phrase, stopping that phrase from intervening. No m-merger, but lack of intervention follows. Conversely, there are cases where there is m-merger (e.g. Matushansky's suggestion for the analysis of English negation) and you still have intervention (negation is still a syntactic intervener when it is expressed as an inflection on an auxiliary). I may have misunderstood what your argument is, but in any event, there doesn't seem to be a clear correlation between m-merger and syntactic intervention across all cases.
      I guess we can talk about this soon!

    21. Yes – I think I have a retort (I know, try to contain your astonishment), but let's carry on in person!

    22. @David: "I feel like it's a bit back to the 70s with a zoo of transformations. We need a bit of a clean-up in the theory."

      This probably does not relate directly to what you have in mind, but formally it is the case that a lot of additional operations (at least the ones applying to phrases) boil down to adding lowering to the grammar. Late Merge, Sidewards movement, even TAG-style tree adjunction are things that become possible once you allow phrases to move to c-commanded positions. So we can think of all these operations as bundling raising and lowering together into specifically constrained sequences of simple movement steps. The need for raising movement is obvious, the question is how much lowering we need, if any.

      On an even more abstract level, it is the case that all the movement operations you find in the Minimalist literature are first-order definable transductions. So it's not a complete zoo, more like a zoo that only has mammals. That's still pretty bad if you're supposed to run a petting zoo, but at least the children won't be eaten by alligators.

    23. @thomas, yeah, I'd thought of that (first order definability). Hence why the talk on this is a Menagerie of Merges rather than a Zoo of Transformations! Keeping everything within, say, mild context sensitivity is of course important (indeed, a discovery), but that still leaves an awful lot of leeway! For my money, one of the big issues is the link between syntax and interpretation, where a richness of operations and derivational possibilities leads to expectations about which meanings are available, expectations which aren't empirically evident. The child has no evidence in these domains, so shouldn't be able to discern between possibilities. In cases like the Han et al paper, this looks like one case of what is happening (though I think it's about reconstruction of the DP, not string vacuous raising of the head), but for things like roll-up derivations, adult speakers are consistent in not reconstructing, but how is a child to ever know this from the data? Equally, why do languages never show differential reconstruction in the roll-up parts of remnant roll-up derivations? If you restrict the system and disallow roll-up (say), then the child has no choice of analyses. I guess this is just classic PoS stuff, arguing for restrictions on derivational types, even though the derivations all fall within a tightly restricted formal class. (That was lecture 2, while my rant about head-movement above is lecture 1!)

    24. @David, I share your skepticism about the v-raising analysis, though perhaps not so much as you. But I'm even more skeptical of the reconstruction analysis for the following reason: if the effect were about optionally invisible quantifier movements, you would expect more languages to show this kind of inter-speaker variability. For example, in English you can evidently reconstruct the subject to a VP-internal position (every linguist didn't believe that claim). But surely there is no evidence for this fact in speech to children. And, as discussed in my PISH post some months ago, children are strongly biased towards isomorphism early in development. So, why don't we find variability in reconstruction effects there? The evidence is probably about the same for the two cases (i.e., English subjects and Korean objects). It can't simply be the universality of PISH, given that languages like Korean evidently require the isomorphic interpretation for subject quantifiers over negation.

    25. @Jeff I think there's actually a lot of variability in things like 'everyone wasn't convinced' - they are impossible for many people but possible for others, and I don't think there's a dialectal difference that you can tie to geography. There's old work by Guy Carden that talks about this, though I can't recall the details. There's also the existence of scrambling in Korean, which we don't have in English, and that may feed into the analyses that English vs Korean kids allow to be accessible at different points in acquisition.

  2. Super-minor point: the "Word of the Year" thing is not an LSA event. It's the American Dialect Society, which meets concurrently with the LSA and schedules this big draw at the same time as the LSA business meeting.

  3. The within-subject consistency is really remarkable, but I don't understand how it affects the PoS argument: would the argument be weaker if all speakers generalized the ambiguous input in the same way?

    1. Good question. Here's how I am understanding things: The PoS says that PLD matters and so when there is no relevant PLD there are two possibilities. The first is that FL/UG provides the default and so all LADs converge on that result. This is the standard PoS argument for, say, ECP effects or islands or BT. We have many examples of this fork of the argument. The second possibility is that, with no PLD, the LAD flips a coin among the logically possible options. HML provides an instance of this second option. Given that this is a predicted option of the logic, finding an instance of it confirms the general argument form.

      So the argument for Korean would not be a weaker PoS argument should they all have converged, but it would have been less interesting for the logic and validity of the PoS itself, not its conclusions in particular applications.

      That's how I was interpreting things.