Sunday, May 8, 2016

Simplicity and Ockham

Minimalists are moved by simplicity. But what is it that moves us and are we right to be so moved? What makes a hypothesis simple and why is simpler better? What makes a svelte G or a minimal UG better than its more rococo cousins? Here is a little discussion by Elliott Sober, reprising some of the main themes of a new book on Ockham and his many Razors (here). It makes for some interesting reading. Here are a few comments.

Sober’s big point is that simplicity is not an everywhere virtue. We know this already when it comes to art, where complicated and ornate need not mean poor. However, as Sober notes, unless one is a theist, the scientific virtues of simplicity need defending (Newton, for example, defended simple theories by grounding them in the “perfection of God’s works,” not a form of argument that would be that popular today)[1]. As he puts it, how deep a razor cuts “depends on empirical assumptions about the problem.”

I mention this because “simplicity” and Ockham have been important in minimalist discussion, and this suggests that arguing for one or another position based on simplicity is ultimately an empirical argument. Therefore, identifying the (implicit) empirical assumptions that license various simplicity claims is important. Sober discusses three useful versions.

The first of Ockham’s Razors rests on the claim that simpler theories are often empirically more probable. Thus, for example, if you can attribute a phenomenon to a mundane cause rather than an exotic one, go for the mundane one. Why? Because common causes are common and hence more likely. Sober describes this as “avoid chasing zebras.”

This form of argument occurs quite a lot in linguistic practice. In my experience, linguists love to promote the distinctiveness of the non-English language they are expert in. One of the ways this is done is by isolating novel-looking phenomena and providing them with novel-looking analyses. Here is a personal example.

There is a phenomenon of switch reference (SR) wherein the subject of an embedded or adjunct clause is (or is not) marked as coreferential with the matrix subject. SR is generally found in what I would call more “exotic” languages.[2] Thus, for example, English is not generally analyzed as an SR language. But why not? We find cases where subjects of non-matrix clauses are either controlled or obviative wrt higher subjects (e.g. John1 left the party without PRO1/*him1 kissing Mary or John1 would prefer PRO1/for *him1 to leave): when there is a PRO, the non-matrix subject must be coreferential with the matrix subject, and if there is a pronoun, it must be obviative. The English data are typical instances of control. Control phenomena are well studied and common and, so, not particularly recondite. Ockham would suggest treating SR as an instance of control if possible, rather than as something special to these “exotic” languages. However, historically, this is not how things have played out. Rather than reduce the “exotic” to the linguistically “common,” analyses have treated SR as a phenomenon apart. All things being equal, Ockham would argue against this move. Don’t go exotic unless absolutely forced to, and even then only very reluctantly.

Consider now a second razor: all the lights in the house go out. Two explanations: each light bulb burned out independently, or the house lost power. Both explain why the lights are out. However, the single-cause account is preferable. Why? Here’s Sober (7): “Postulating a single common cause is more parsimonious than postulating a large number of independent, separate causes.”
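
To see why, it helps to put toy numbers on the two explanations. Here is a minimal sketch in Python (the probabilities are invented purely for illustration, not taken from Sober): even if power outages are themselves rare, a single outage is vastly more probable than twenty independent bulb failures on the same night.

    # Toy illustration of the light bulb razor (all numbers invented for illustration).
    p_bulb_fails_tonight = 0.01   # hypothetical chance that any one bulb burns out tonight
    p_power_outage = 0.001        # hypothetical chance that the house loses power tonight
    n_bulbs = 20

    p_all_fail_independently = p_bulb_fails_tonight ** n_bulbs   # 1e-40
    print(f"many independent causes: {p_all_fail_independently:.3e}")
    print(f"single common cause:     {p_power_outage:.3e}")
    # The common-cause hypothesis wins by dozens of orders of magnitude, which is
    # the sense in which the more parsimonious explanation is also the more probable one.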

Again, this form of simplicity argument is applicable to linguistic cases. For example, this reasoning underlies Koster’s 1984 argument for unifying A-chains, binding and obligatory control. I have personally found this simplicity argument very compelling (so compelling that I stole the idea and built on it in slightly altered form). Of course it could be that the parallelisms are adventitious. But a single cause is clearly the simpler hypothesis as it would explain why the shared features are shared. Is the simpler account also true? Well who knows? We cannot conclude that the simplest hypothesis is also the true one. We can only conclude that it is the default story, favored until proven faulty, and that we need good reasons to abandon it for a multi-causal account, which, we can see, will have no explanation for the overlapping properties of the “different” constructions.

There is one last razor Sober discusses: “parsimony is relevant to discussing how accurately a model will predict new observations” (8). Put simply, simple hypotheses benefit from not overfitting data. Conversely, the more parameters a theory has, the easier it is for unrepresentative data to mislead it.

This is related to another way that simplicity can matter. Simple theories are useful because they are lead-footed. They make predictions. The more subtle or supple a theory is, the more adjustable parameters it has, the more leeway it provides, the less it says. Simple theories are blunt (and brittle), and even if they are wrong, they may not be very wrong. So, theories that cover given empirical ground more successfully may be paying a high predictive/explanatory price for this success.

Here is another way of making this point. The more supple a theory, the more data it can fit. And this is the problem. We want our theories to be brittle, and simple theories have less wiggle room. This is what allows them to make relatively clear predictions.
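
For those who like the statistical version of the point, here is a minimal sketch in Python of the overfitting idea (the data are simulated purely for illustration): a simple model and a many-parameter model are fit to the same small noisy sample; the many-parameter model hugs the sample more snugly, and for that very reason it tends to predict new observations from the same source less well.

    # Toy illustration of overfitting (all data simulated for illustration).
    import numpy as np

    rng = np.random.default_rng(0)

    def true_process(x):
        return 2.0 * x + 1.0          # the underlying process is simple: a line

    x_train = np.linspace(0, 1, 12)
    y_train = true_process(x_train) + rng.normal(0, 0.3, 12)    # small, noisy sample
    x_test = np.linspace(0, 1, 200)
    y_test = true_process(x_test) + rng.normal(0, 0.3, 200)     # new observations

    for degree in (1, 9):             # few vs. many adjustable parameters
        coefs = np.polyfit(x_train, y_train, degree)
        train_err = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
        test_err = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
        print(f"degree {degree}: train error {train_err:.3f}, test error {test_err:.3f}")
    # The degree-9 polynomial fits the training sample better but typically predicts
    # the new data worse: its extra parameters have absorbed unrepresentative noise.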

Sober ends his short piece by noting that simplicity needs to be empirically grounded. Put another way, there is no a priori notion of simplicity; the notion is somewhat indexical, so when we talk about simplicity, we do so within a certain context of inquiry. I say this because Ockham has come to play an ever larger role in modern syntactic theory in the context of the minimalist program. Unfortunately, it is not always clear in what way simplicity is to be understood in this context. Sometimes the claim seems to be that some stories should be favored because the concepts they deploy are “simpler” than those being opposed (e.g. sets are simpler than trees); sometimes the claim is that more general theories are to be preferred to those which assume the same mechanism but with some constraints (e.g. the most general conception of merge reduces E and I merge to the same basic operation, the implicit claim being that the most general is the simplest); and sometimes it is argued that the simplest operations are the computationally optimal ones (e.g. merge plus inclusiveness plus extension is simpler than any other conception of merge). Whatever the virtues of these claims, they do not appear to be of the standard Ockham’s Razor variety. Let me end with one example that has exercised me for a while.

Chomsky has argued that treating displacement as an instance of merge (I-merge) is simpler than treating it as the combination of merge plus copy. The argument seems to be that there is no “need” for the copy operation once one adopts the simplest conception of merge. The Ockham’s Razor argument might go as follows: everyone needs an operation that puts two separate expressions together. The simplest version of that operation also has the wherewithal to represent displacement. Hence a theory that assumes a copy operation in addition to this conception of merge is adding a superfluous operation. Put differently, Merge+Copy does no more than Merge alone, and so Ockham prefers the latter.

But do the two theories adopt the exact same merge operation? Not obviously, at least to me. Merge in the Copy+Merge theory can range over roots alone (call this merge1). Merge in the “simpler” theory (call this merge2) must range over roots and non-roots. Is one domain “simpler” than the other? I have no idea. But it seems at least an open question whether having a larger domain makes an operation simpler than one that has a more restricted domain. Question: Is addition ranging over the integers more “complex” than addition ranging over the rationals? Beats me.
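
To make the contrast a bit more concrete, here is a toy sketch in Python (my own picture, purely illustrative, not anyone's official formalization): syntactic objects are modeled as nested pairs, Merge1 combines only roots and so needs Copy to model displacement, while Merge2 can also take a subpart of a root as an input (I-merge) and so models displacement with no separate Copy operation.

    # Toy sketch (purely illustrative) of the Merge1+Copy vs. Merge2 contrast.
    # Syntactic objects (SOs) are lexical items (strings) or pairs of SOs (tuples).

    def contains(so, part):
        if so == part:
            return True
        return isinstance(so, tuple) and any(contains(x, part) for x in so)

    def merge1(a, b):
        # Merge1: combines two ROOT objects, i.e. whole free-standing SOs
        # (the root restriction is stipulated, not enforced, in this toy).
        return (a, b)

    def copy(so, subpart):
        # Copy: takes a subpart of an SO and makes it available as a root in the workspace.
        assert contains(so, subpart)
        return subpart

    def merge2(a, b):
        # Merge2: b may be a separate root OR a subpart of a (I-merge); no Copy needed.
        return (a, b)

    vp = merge1("eat", "what")                      # {eat, what}
    # Copy+Merge1 route to displacement: copy the embedded SO, then merge it with the root.
    displaced_a = merge1(copy(vp, "what"), vp)      # ('what', ('eat', 'what'))
    # Merge2 route: merge the root directly with one of its own subparts (I-merge).
    displaced_b = merge2(vp, "what")                # (('eat', 'what'), 'what')
    # Both routes yield an SO in which 'what' occurs twice. The question in the text is
    # whether widening Merge's domain (Merge2) really counts as "simpler" than keeping
    # Merge1 narrow and adding Copy.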

One might also argue that Merge2 should be preferred because it allows for one fewer operation in FL (i.e. it does not have or need the Copy operation). However, how serious an objection is this (putting aside whether Merge2 is simpler than Merge1)? Here’s what I mean.

Here is a line of argument: at bottom, simplicity in UG operations matters in minimalism because we assume that the evolutionary emergence of simple structures/operations is easier to explain than the emergence of complexity.  The latter requires selection and selection requires time, often lots of time. Thus, if we assume that Merge is the linguistically distinctive special sauce that allowed for the emergence of FL, then we want merge to be simple so that its emergence is explicable. We also want the emergence of FL to bring with it both structure building and displacement operations. So, the emergence of Merge should bring with it hierarchical structure building plus displacement. And postulating Merge2 as the evolutionary innovation suffices to deliver this.

How about if we understand merge along the lines of Merge1? Then to get displacement we need Copy in addition to Merge1. Doesn’t adding Copy as a basic operation add to the evolutionary problem of explaining the emergence of structured hierarchy with displacement? Not necessarily. It all depends on whether the copy operation is linguistically proprietary. If it is, then its emergence needs explanation. However, if Copy is a generic cognitive operation, one that our pre-linguistic ancestors had, then Copy comes for free and we do not need Merge2 to explain how displacement arose in FL. Displacement should arise once we add Merge1, given that Copy is already an available operation. So, from the perspective of Darwin’s Problem, there is no obvious sense in which Merge2 is simpler than Merge1. It all really depends on the pre-linguistic cognitive background.[3]

So that’s it. Sober’s essay (and the book it advertises, which I am now reading) is useful and interesting for the minimalistically inclined. Take a look.


[1] And not only because we are no longer theistically inclined. After all, why does God prefer simple theories to complex ones? I love Rube Goldberg devices. Some even have a profound taste for the complicated. For example, Peter Gay says of Thomas Mann: “Mann did not like to be simple if it was at all possible to be complicated.” So, invoking the deity’s preferences can only get one so far (unless, perhaps, one is Newton).
[2] These are tongue in cheek quotes.
[3] I should add that I am a fan of Merge2, though I once argued for the combo of Merge1 + Copy. My reason for opting for Merge2 is that it might explain something that is a problem for the combo theory (viz. why “movement” is target-oriented). This is not the place to go into this, however.

33 comments:

  1. All things being equal, Ockham would argue against this move....

    The "all things being equal" bit has long seemed extremely important to me. It suggests, correctly in my view, that simplicity is secondary to other considerations. My guess is that it's very rare for all other relevant things to be equal.

    The next example in the post gives a nice illustration:

    We can only conclude that it is the default story, favored until proven faulty, and that we need good reasons to abandon it for a multi-causal account, which, we can see, will have no explanation for the overlapping properties of the “different” constructions.

    The single-cause account is simpler than the multi-cause account, yes, but the former is also explanatory in a way that the latter is not. While the simplicity and the explanatory capacity of the two theories are linked, it seems to me that the explanatory capacity can do a lot more work than can simplicity when arguing for the single-cause account.

    Replies
    1. I am not sure that Sober (or I) would like to radically distinguish simplicity from explanatory potential. Sober's question is why simplicity might be a virtue as regards truth. So why believe that there is anything useful in simpler theories? He gives a threefold answer, but one of these involves explanatory potential and, in particular, argues that this goes hand in hand with likely truth. Why? Because if unification works, it functions like the single common cause (the loss of power) in the light bulb case. It is more probable that similarity is due to a common cause than not. So, for Sober, simplicity gains its purchase on truth via the explanatory route, which in turn trades on the unlikelihood of coincidence.

      If this is right, then simplicity is a feature of explanatory stories SOME OF THE TIME. And this is why looking for these kinds of accounts can lead to the truth. Note that this has methodological oomph if it turns out that epistemologically aiming for simplicity is not quite the same thing as aiming towards explanation. I suspect that this is right. The former is rooted in aesthetic considerations, the latter in epistemological ones. The claim is that these can coincide and that when they do, looking for simplicity is a good research strategy. Of course, you might be wrong. There are, after all, adventitious similarities (fish and whales?), but still, not a bad strategy.

  2. Merge in the “simpler” theory (call this merge2) must range over roots and non-roots

    Can you elaborate on this? What do you mean, where does this come from?

    Replies
    1. Only a little. Imagine that Merge could only see the tops of phrases. This means that only roots (unembedded SOs) were mergeable. Copy is not similarly restricted. So if you have a copy operation, then a rule like merge with a more limited domain than the current conception would suffice to deliver displacement structures. You copy anything and then merge the copied expression (which when put into the computation space is a root) with another root. This was actually the 1993/5 idea, more or less. In going to the modern conception we widened the domain of application of merge, and this allowed for the elimination of copy. My point was just that it is not clear to me that widening the domain of application of an operation "simplifies" it. Again, is addition in the domain of the rationals "simpler" than addition in the domain of the integers? Hope this helps.

    2. But Merge can't only see tops of phrases, or else you can't do internal Merge. Merge needs to see terms. Chomsky's simplest version actually needs to be asymmetrical: (at least) one argument of Merge must be a `free' syntactic object. That's basically why I suggested that system of immemorious Merge: the memory space over which Merge operates is very small, another kind of `simplicity'.

    3. @David, whether or not you want to implement movement as internal merge is at issue.

    4. Thanks for explaining, Norbert. As I understand him, Chomsky takes the restriction you describe to follow from the binarity of Merge: if you merge A and B, where B is contained in C, then if C ≠ A, the operation is ternary. This derives what David above calls "asymmetry". Seems right intuitively, but I'm not sure it goes through formally.

    5. This comment has been removed by the author.

    6. @Dennis. Maybe I'm being dumb, but I never really understood that way to rule out sideward movement. So, Merging A with B when B is contained in C, this involves three entities and that's a ternary operation. But I don't see why we are forced to specify the location of B for sideward movement but not upward movement. That is, why do we need to say what B is contained in for some operations and not others?

    7. @David: Matthew is right. Yes, sans Copy you need Chomsky's conception of merge. If you allow Copy then you can do without looking into built structure FOR merge. My only question was whether there was a simplicity argument capable of motivating Chomsky's version of merge. I believe he thinks that there is.

      @Dennis: I am with Brooke here, and David Adger too in his (non-posted) third Baggett lecture. SWM does not require a 3-place merge operation. It may require a 3-place operation if one assumes that one cannot see SOs embedded within other SOs unless one selects the container in the select operation. But why assume that one needs to? Adger gives some mechanisms for accomplishing this, but, IMO, that does not address the central issue: do we WANT to rule out SWM? I have no problem thinking that there are various ways of doing so should it be undesirable. Note too, and this is a jab at Chomsky, that if one needs to ADD something to rule out SWM, then given his views about simplicity, the version that allows for SWM is simpler and so is the default preferred version. Of course, maybe it should be disallowed, but the recursive definition of SO Chomsky assumes allows for SWM if not modified. And as you know, modification implies complication and hence a less simple operation. So, do we want SWM or not? Conceptually, yes. The big issue is the empirical one. I am ready to defend the empirical virtues of SWM, but on the conceptual issue, I think that Chomsky is just wrong to think that the standard definition rules out SWM.

    8. @Brooke, Norbert: I wasn't trying to argue for Chomsky's view here; I'm not sure it's formally compelling. But I do think there is a reason why we have to identify B via C when the latter contains the former. Think of this as the copy/repetitions problem. If you merge A and B, how do you know that B is the B that's in C, rather than some independent B? If you want it to be Internal Merge of (a copy of) B rather than External Merge of (a repetition of) B, then you need to identify B as *the B contained in C*. Whether or not this really and necessarily means that the operation is ternary (perhaps it's weakly ternary ;-)), I don't know; but I can see where the intuition comes from.

    9. @Dennis I get that intuition about why we might need to specify where the B is being merged from so we can distinguish copies from repetitions. Assume that works, why doesn't that also hold of upward movement? I imagine we wanna distinguish copies from repetitions for upward movement too, wouldn't that make both upward and sideward movement (weakly) ternary?

    10. No, because in regular upward movement you only need two terms: A and B, the root, which contains A.

    11. See, this is where I feel like I'm being blind to something. How does the root 'know' where the mover is coming from in this case? Sure, the root dominates the mover, but it's not like the root wears on its sleeve all the things that it dominates. It seems like you would need to specify the location of the mover as root-internal somehow.

    12. The root need not know. Perhaps we can think of it in this way: the Merge operation always relates A, B, and C. If B = C, that's equivalent to it relating A and B, i.e. it's binary -- this allows Internal Merge. If B ≠ C, it's ternary.

    13. This comment has been removed by the author.

    14. Ah! Now I see it! I had apparently been misinterpreting the 'if C contains B' bit to be about what dominates B irrespective of whether C was a root or not (which I think is a fair reading of this). But if its job is to tell us specifically which root B is under, I can totally see the distinction now.

      I guess the question now is how identity can render ternary operations binary, but that's obviously a different question

    15. Yeah, I think that's what Chomsky has in mind, but I should stress again that this is my interpretation (as far as I know, he's never spelled it out).

      Not sure I understand what the identity question is supposed to be. If Merge relates A, B, and C, and C = B, then it's simply not ternary in the first place; it's just binary.

      A different question might be what happens in case we merge A and B, and B = the root contains repetitions (rather than copies) of A. Perhaps this would derive/predict some sort of superiority.

    16. "If Merge relates A, B, and C, and C = B, then it's simply not ternary in the first place; it's just binary."

      It seems like there's an extensional/intensional difference at play here and that only on the extensional view is it binary. I can't see immediately whether this is important or not, but that seems to be the issue

    17. Not sure that's at issue here, since my C = B was meant to mean "intensionally equivalent" throughout. We need someone from the math ling camp to enlighten us.

    18. I don't see that the binary version of Merge would be able to distinguish copies from repetitions in the general case. If the root contains two independent instances of 'John', then setting A=John and B=[...John...John...] doesn't tell you which 'John' has moved. I guess the idea is that Minimality plus some kind of ban on left branch extraction would always disambiguate? But we have e.g.:

      John, I gave a picture of t to John.
      John, I gave a picture of John to t.

      So I'm not sure if that strategy would work.

    19. @Alex: I think this is at least in part an orthogonal issue (although an important one). My assumption above was that you *minimally* need to be able to say whether or not you're relating A to something that's contained in it or not. But then it leads to the question like the one you raise (which, if I understand correctly, is identical to the one I mentioned above).

      Incidentally, I don't think that English topicalization is a good example here, since it's most likely not simple movement but a kind of dislocation.

    20. @Dennis
      I would go one step further than Alex: to the degree that Minimality plus LBE plus phases might work to distinguish copies from repetitions, to that degree SWM can distinguish them too. The same restrictions hold in the SWM case as in the simple I-merge case (or so I argued in the 2009 book).

      I should add two more points. All of this is moot if we allow the G to track selections from the lexicon. This is already done in any theory that has numerations. A numeration with 2 selections of an LI is different from a numeration with one. The question then becomes whether we should allow this tracking in the syntax. So two selections of 'John' are indicially distinguished. The claim is that this is a gross violation of inclusiveness. So, that violation is ok in the numeration but not in the G. I confess that this is too subtle for me. Ok, say that there is no numeration. One still needs to "remember" which are copies and which not. It is claimed that this is all done within the phase. But then why assume that SWM is less capable than I merge of retaining this info? This only follows if phases can only be defined for single rooted structures. But why assume this? So far as I can tell, this is not so. Note, btw, that long distance binding will need something like indices anyhow. So they must exist in either the G or the interfaces. If they exist in the latter then indices are generally cognitively available so why shouldn't the G use them? But this is an issue for another time.

      Very last point. I think that it is critical to understand that whatever our conclusions here, Chomsky's recursive definition of Merge allows SWM. What I mean is that nothing in the definition prohibits it. What MIGHT prohibit it is the algorithm that APPLIES the definition/procedure. It seems to be a matter of how we define SELECT, the operation that chooses the objects to be merged. If one can only select SOs within SOs by first selecting the container SO, then for SWM select must be 3-place. If we can select an SO just in case it is an SO (i.e. SOs are transparent regardless of where they sit), then Select for SWM can be binary. The conceptual question concerns the transparency assumption. In particular: why assume that SOs contained in other SOs can only be selected if the SO they are contained in is also selected? Can there really be a conceptual argument for this assumption? In fact, doesn't the other assumption, viz. that all SOs are qua SOs potential arguments of Merge, seem "simpler" (i.e. less encumbered)? Isn't adding the restriction on select complicating the merge operation? And if so, doesn't one need empirical arguments FOR doing this? And if so, don't the objections to SWM just reduce to the standard empirical ones? That's what I think. I also happen to think that the empirical arguments FOR SWM are pretty good and that the putative problems are not that severe, but that's my view. What I don't see is that there is ANY good conceptual argument against it, if one does not encumber Merge with further doodads, a move that Chomsky finds unfortunate in other venues.

    21. @Norbert: I don't think we have any strong disagreement. I just think it's an interesting (open) question whether or not SWM is fully formally compatible with binary Merge (because I do think there is a good conceptual argument for keeping Merge strictly binary). The copies/repetitions problem is related but somewhat orthogonal, and I certainly didn't mean to say that this problem is specific to SWM. I was merely trying to reconstruct Chomsky's claim that SWM and such cases are non-binary.

      As for your second paragraph: I don't think tracking selection from the lexicon is a viable option. First of all, this is backtracking, which we want to avoid, whether you do it via numerations or not. Numerations shouldn't be a part of the theory unless we need transderivational comparisons; at least I know of no other principled motivation for having them. Secondly, this becomes somewhat more complex when it comes to deciding whether some complex object (say, "the tall man" rather than just "John") has been merged externally or internally. In this case we need to know if DP = {the,{tall,man}} is a copy or a repetition, but it's not something that's been drawn from the lexicon (although its terms are, of course).

    22. I agree that we don't strongly disagree. I was trying to sharpen matters to see what the issues are. Thx for helping do this.

      I agree that numerations are likely not useful anymore given that merge-over-move economy accounts are out. I thought, however, that we still had phase reasons to consider numerations, but I might be out of date here. That said, how exactly does the current grammar track the distinction between copies and originals? I thought that Chomsky allowed this to be tracked within a given phase. In other words, it is a fact about phasal memory that it can distinguish copies from originals (the outputs of E vs I merge) within a phase. If this is not how things are done, how is it done?

      I should add that, IMO, contrary to Chomsky here, indexing selections from the lexicon is not a very computationally onerous task. It is a technical way of distinguishing types from tokens (objects from occurrences), which is something that almost any computational system needs to be able to do (see Marcus's discussion in The Algebraic Mind). Thus, it is likely something NOT proprietary to language. At any rate, we all agree that this needs doing, and so the question becomes how the G does it. I am asking here because I really am not sure what the standard wisdom is. Help is appreciated.
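
      Just to make the type/token point concrete, here is a toy sketch in Python (purely illustrative, nobody's official proposal): each selection from the lexicon gets a fresh index, Copy/I-merge preserves the index, and two occurrences count as copies just in case their indices match.

          # Toy sketch (purely illustrative): indexing selections to tell copies from repetitions.
          import itertools

          _fresh_index = itertools.count()

          def select(lexical_item):
              # Each selection from the lexicon is a new token: same type, fresh index.
              return (lexical_item, next(_fresh_index))

          def copy(token):
              # Copying (as in I-merge or SWM) preserves the index: a new occurrence of the same token.
              return token

          def are_copies(x, y):
              return x == y              # same type AND same index

          john_a = select("John")        # ('John', 0)
          john_b = select("John")        # ('John', 1): a repetition, not a copy
          john_a_moved = copy(john_a)    # same index as john_a

          print(are_copies(john_a, john_a_moved))   # True: one token, two occurrences (copies)
          print(are_copies(john_a, john_b))         # False: two tokens of the same type (repetitions)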

    23. Norbert: I don't think anybody knows how copies are distinguished from repetitions in a Merge-based system (although there are proposals). I think it's a huge gap in this (otherwise very impressive and revolutionary) idea that PSG and TG can be collapsed into Merge. It needs to be resolved before this can be considered a coherent system.

      Yes, Chomsky at some point had the idea that you can do it by tracking EM vs. IM within the phase, but what does that really mean? EM vs. IM is a descriptive, not an ontological distinction; it's the same operation. At some point he argued that you could do it by having EM apply before the phase level and IM at the phase level, but that just re-introduces a Merge vs. Move distinction, and it's also not clear to me how this could be implemented (if Transfer 'sees' IM applying at the phase level, that doesn't in and of itself change anything about the representation shipped off to the interfaces). You can make either Merge or Transfer more complicated by making them add indices (as Collins & Stabler do, if I remember correctly), but it will always remain a complication.

      In short, I don't think there's any standard wisdom, or if there is, I'm not aware of it. I think it's a huge and very important open issue (but note that outside of venues like this one, hardly anyone in the field seems to care).

      I don't know what the phase-related reasons might be to assume numerations/arrays. I don't see why they would be necessary. If phases (assuming we need them at all) can be delimited in any other way (uFs, interface legibility, whatever) they're just redundant if not vacuous.

      Thanks for the pointer to Marcus' work in this context, I should take another look at it. But I think in this case the problem is really introduced by set theory, where there is (as far as I know) no meaning to the statement that two "x"s in a given representation are different objects, whereas two "John"s in NL can be.

    24. Thx for this. It was very helpful. Note that indices are just the device to distinguish two different tokens of the same type. That's what they are designed to do. Chomsky has claimed that indices violate inclusiveness. Maybe. But we actually have no current account of what it is to "take" a lexical atom and use it in a syntactic derivation. Presumably, when we access an atom, the lexicon does not shrink by one; selecting an atom for the syntax does not reduce the size of the lexicon. So, accessing is more akin to tokening. However, whatever the story, I agree that it is important to get it straight.

      I am on board as well with the observation that numerations are an unneeded encumbrance. I thought that phases were defined at some point by the subnumerations associated with them. I see that this may not be required anymore. I am not as moved as others are by the idea that phase heads drive all syntax, and I don't even find the valued/unvalued distinction that useful. But that is a discussion for another time over beer (I'll buy). So thx again.

    25. Interesting discussion. Like Dennis, I'd always taken Chomsky to mean, by ternarity in this case, that Merge would have to have access to 3 objects. But I think if you work this out in any formal way, you end up being unable to make the distinction without, ultimately, reintroducing a disjunction that gives the Merge/Move distinction. You can sort of see it in his latest specifications of what Merge is: (i) Select A from the workspace; (ii) select B (from either the workspace or from A); (iii) Merge(A, B). True, here the disjunction essentially comes for free, if there is only the workspace and objects constructed, but, if constructed objects are available for selection, as they have to be to allow internal Merge, then why, in clause (i) here, can't you select C in A, already constructed? To rule that out, you need to say that only top-level elements in the workspace are accessible. In terms of simplicity, we'd want to make clauses (i) and (ii) the same: (i) select A; (ii) select B; (iii) Merge(A,B), but then we allow sideways move. This is basically what I said in that 3rd Baggett lecture, and is, I think, of a piece with Norbert's view that there doesn't seem to be a simplicity argument here. I don't think the copies/repetition story that Dennis told really helps in the absence of a good theory of that, and I agree with everyone that we don't have such a theory. I tried, in that third Baggett lecture, to say that Merge was maximally simple (as in the version just above) but the data structure it operated over had a cache-register structure, so that Merge can only see inside the register, and that register isn't big enough to allow sideways movement. So that's taking some of the complexity out of Chomsky's definition of move, but placing that complexity in the memory architecture. I guess the conceptual argument for that is not simplicity of structure, but rather efficiency of operations (operations are more optimal if they work over smaller structures). So that's a second kind of economy: not just simplicity of definition, but reduction in required resources.

    26. I agree with David's main points, particularly that there is no current way to squeeze a prohibition against SWM simply from the properties of Merge. I also agree that one way to do this is to go for some property of the memory system and argue that a simple memory structure gets the prohibition. This is what David did in his Baggett lecture 3. Where David and I probably part ways is on how much independent evidence (conceptual or empirical) we have for the particular memory structures needed to get the prohibition. IMO, the conceptual argument is weak at best and the main motivation is to eliminate SWM. If this is correct (and I am pretty sure that David would disagree), then the question resolves down to the empirical arguments for or against SWM. Not surprisingly, my main argument in its favor revolves around the issues of adjunct control and parasitic gaps. I very much like Nunes's PG account (IMO, it is the only extant theory that derives most of the PG effects without too much stipulation) and I also believe that the parallels between adjunct control and complement control require that they be treated in a parallel manner. So, if you like the MTC for complements you will like SWM for adjunct control. Of course, someone's modus ponens can be someone else's modus tollens, so even if you accept the conditional there are (to my mind, very unsavory but logically coherent) ways out.

      Last brief for SWM: it really does seem to be a novelty made available by BPS and merge-style thinking. One of the things we should be open to is considering such theoretical novelties. This is what happens in real sciences when one gets a change in theoretical perspective. IMO, we have been much too quick to discard the possibility of SWM. Even if it turns out to be something that we want to dump, we should not do so before we examine its virtues and vices more carefully. Right now, IMO, most of the rejection has been knee-jerk, more a reflection of GBish prejudices than careful consideration. There have been some good points made (e.g. David's reconstruction arguments) but it would be good to take it seriously enough to compile a convincing list of problems so that we fully understand its downside.

    27. Thanks all for this discussion here. Lots of issues in urgent need of clarification. I'm planning to work on some of them, so this has been very helpful!

  3. This comment has been removed by the author.

  4. On Sober

    Not picking up on pretty much any of the comments, but...

    Sober has been on this topic for MANY years, going back at least to his 1975 book "Simplicity". That one is of some particular interest to linguists because it devotes one chapter (#3) to . . . "The sound pattern of English"! He does so because, as he notes, it was (and probably still is) just about the only instance ever of actual working scientists trying to make explicit (and formal) just what they meant by 'simplicity', not merely invoking it.

    Subsequent works, especially "Reconstructing the past" (1988, which has a chapter on "The philosophical problem of simplicity"), offer further reflections and revisions on the topic. No doubt the current work references the earlier stuff, but the SPE focus of "Simplicity" is not something that comes up all that often in my experience.

    --RC

  5. This comment has been removed by the author.
