
Sunday, February 12, 2017

Strings and sets

I have argued repeatedly that the Minimalist Program (MP) should be understood as subsuming earlier theoretical results rather than replacing them. I still like this way of understanding the place of MP in the history of GG, but there is something misleading about it if taken too literally. Not wrong exactly, but misleading. Let me explain.

IMO, MP is to GB (my favorite exemplar of an earlier theory) as Bounding Theory is to Ross’s Islands. Bounding Theory takes as given that Ross’s account of islands is more or less correct and then tries to derive these truths from more fundamental assumptions.[1] Thus, in one important sense, Bounding Theory does not substitute for Ross’s theory but aims to explain it; it aims to conserve the results of Ross’s theory, more or less.[2] 

Just as accurately, however, Bounding Theory does substitute for Ross’s. How so? It conserves Ross’s theory but does not recapitulate it. Rather, it explains why the things on Ross’s list are there. Furthermore, if successful it will add other islands to Ross’s inventory (e.g. Subject Condition effects) and make predictions that Ross’s did not (e.g. successive cyclicity). So conceived, Ross’s islands are explananda for which Bounding Theory is the explanans.

Note, and this is important, that given this logic Bounding Theory will inherit any (empirical) problems for Ross’s generalizations. Pari passu for GB and MP. I mention this not because it is the topic of today’s sermonette, but just to observe that many fail to appreciate this when criticizing MP. Here’s what I mean.

One way MP might fail is in adopting the assumption that GBish generalizations are more or less accurate. If this assumption is incorrect, then the MP story fails in its presuppositions. And as all good semanticists know, this is different from failing in one’s assertions. Failing this way makes you not so much wrong as uninteresting. And MP is interesting, just as Bounding Theory is interesting, to the degree that what it presupposes is (at least) on the right track.[3]

All of this is by way of (leisurely) introduction to what I want to talk about below. Of the changes MP has suggested, I believe that the most (or, to be mealy-mouthed, one of the most) fundamental has been the proposal that we banish strings as fundamental units of grammar. This shift has been long in coming, but one way of thinking about Chomsky’s set-theoretic conception of Merge is that it dislodges concatenation as the ontologically (and conceptually) fundamental grammatical relation. Let me flesh this out a bit.

The earliest conception of GG took strings as fundamental, strings just being a series of concatenated elements. In Syntactic Structures (SS) (and LSLT, for which SS was a public relations brochure) kernel sentences were defined as concatenated objects generated by PS rules. Transformations took strings (Structural Descriptions) as inputs and delivered strings (Structural Changes) as outputs (that’s what the little glide symbol (which I can’t find to insert) connecting expressions meant). Thus, for example, a typical rule took as input things like (1) and delivered changes like (2), the ‘^’ representing concatenation. PS rules are sets of such strings and transformations are sets of sets of such strings. But the architecture bottoms out in strings and their concatenative structures.[4]

(1)  X^DP1^Y^V^DP2^Z
(2)  X^DP2^Y^V+en^by^DP1
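
To make the contrast with what follows concrete, here is a toy rendering of this string-based picture in Python (my own illustrative encoding, not the LSLT formalism itself): the structural change in (2) is stated directly over the concatenated string in (1), and string positions are all the rule needs to see.

    # Toy sketch of a string-based structural change (illustrative only):
    # the rule inspects and rearranges positions in a concatenated string.
    def passive_like_change(sd):
        x, dp1, y, v, dp2, z = sd.split("^")
        # Z is dropped here only because the schematic change in (2) omits it.
        return "^".join([x, dp2, y, v + "+en", "by", dp1])

    print(passive_like_change("X^DP1^Y^V^DP2^Z"))  # X^DP2^Y^V+en^by^DP1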

This all goes away in merge-based versions of MP.[5] Here phrase markers (PMs) are sets, not strings, and string properties arise via linearization operations like Kayne’s LCA, which map a given set into a linearized string. The important point is that sets are what the basic syntactic operation generates, string properties being non-syntactic properties that only obtain once the syntax is done with its work.[6] String properties are what you get as the true linguistic objects, the sets, are mapped to the articulators. This is a departure from earlier conceptions of grammatical ontology.
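
And here, for contrast, is an equally toy sketch of the merge-based picture (again my own encoding; the linearization function below is a crude stand-in for illustration, not Kayne’s actual LCA): Merge builds unordered sets, and linear order appears only when a separate mapping spells the set out as a string.

    # Toy sketch of the set-based picture (illustrative only). Merge forms
    # unordered sets; the syntactic object itself contains no order.
    def merge(a, b):
        return frozenset([a, b])

    # Crude stand-in for a linearization step (not the actual LCA): spell out
    # lexical atoms before complex sets, roughly "head first". Where the set
    # gives no basis for choosing, the mapping simply has to pick, which is
    # the point: order comes from the mapping, not from the set.
    def linearize(obj):
        if isinstance(obj, str):
            return obj
        atoms = [x for x in obj if isinstance(x, str)]
        complexes = [x for x in obj if not isinstance(x, str)]
        return " ".join(atoms + [linearize(c) for c in complexes])

    vp = merge("eat", merge("the", "apple"))
    print(linearize(vp))  # e.g. "eat the apple" or "eat apple the": the set fixes no order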

This said, it’s an idea with many precursors. Howard Lasnik has a terrific little paper on this in the Aspects 50-years-later volume (Gallego and Ott, eds., an MITWPL product that you can download here). He reviews the history and notes that Chomsky was quite resistant in Aspects to treating PMs as just coding for hierarchical relationships, an idea that James McCawley, among others, had been toying with. Howard reviews Chomsky’s reasoning and highlights several important points that I would like to quickly touch on here (but read the paper, it’s short and very very sweet!).

He notes several things. First, one of the key arguments for Chomsky’s revised conception in Aspects revolved around eliminating some possible but non-attested derivations (see p. 170). Interestingly, as Howard notes, these options were eliminated in any theory that embodied cyclicity. This is important, for when minimalist Chomsky returns to Generalized Transformations as the source of recursion, he parries the problems he noted in Aspects by incorporating a cyclic principle (viz. the Extension Condition) as part of the definition of Merge.[7]

Second, X’ theory was an important way station in separating hierarchical dependencies from linear ones, in that it argued against PS rules in Gs. Once PS rules were dumped, the conceptual link between such rules and the string features of Gs was weakened.

Despite this last point, Lasnik’s paper highlights the Aspects arguments against a set-based conception of phrase structure (i.e. in favor of retaining string properties in PS rules). This is section 3 of Howard’s paper. It is a curious read for a thoroughly modern minimalist, for in Aspects we have Chomsky arguing that it is a very bad idea to eliminate linear properties from the grammar, as was being proposed by, among others, James McCawley. Uncharacteristically (and I mean this as a compliment), Chomsky’s reasoning here is largely empirical. Aspects argues that, when one looks, the Gs of the period presupposed some conception of underlying order in order to fit the empirical facts, and that this presupposition fits very poorly with a set-theoretic conception of PMs (see Aspects: 123-127). The whole discussion is interesting, especially the discussion of free word order languages and scrambling. The basic observation is the following (126):

In every known language the restrictions on order [even in scrambling languages, NH] are quite severe, and therefore rules of realization of abstract structures are necessary. Until some account of such rules is suggested, the set-system simply cannot be considered seriously as a theory of grammar.

Lasnik argues, plausibly, that Kayne’s LCA offered such an account and removed this empirical objection against eliminating string information from basic syntactic PMs.

This may be so. However, from my reading of things I suspect that something else was at stake. Chomsky has not, on my reading, been a huge fan of the LCA, at least not in its full Kaynian generality (see note 6). As Howard observes, what he has been a very big fan of is the observation, going back at least to Reinhart, that, as he says in the Black Book (334), “[t]here is no clear evidence that order plays a role at LF or in the computation from N [numeration, NH] to LF.”

Chomsky’s reasoning is Reinhart’s on steroids. What I mean is that Reinhart’s observations, if memory serves, are largely descriptive, noting that anaphora is largely insensitive to order and that c-command is all that matters in establishing anaphoric dependencies (an important observation to be sure, and one that took some subtle argumentation to establish).[8] Chomsky’s observations go beyond this in being about the implications of such lacunae for a theory of generative procedures. What’s important with respect to linear properties and Gs is not whether linear order plays a discernible role in languages (of course it does), but whether these properties tell us anything about generative procedures (i.e. whether linear properties are factors in how generative procedures operate). This is key. And Chomsky’s big claim is that G operations are exclusively structure dependent, that this fact about Gs needs to be explained, and that the best explanation is that Gs have no capacity to exploit string properties at all. This builds on Reinhart, but it is really making a theoretical point about the kinds of rules/operations Gs contain rather than a high-level observation about antecedence relations and what licenses them.

So the key thing that needs explanation is the absence of linearly sensitive operations in the “core” syntax, the mapping from lexical items to “LF” (CI actually, but I am talking informally here), rather than some way of handling the evident linear properties of language.

This is vintage Chomsky reasoning: look for the dogs that aren’t barking and give a principled explanation for why they are not barking. Why no barking strings? Well, if PMs are sets then we expect Gs to be unable to reference linear properties and thus such information should be unable to condition the generative procedures we find in Gs.

Note that this argument has been a cynosure of Chomsky’s most recent thoughts on structure dependence as well. He reiterates his long-standing observation that T-to-C movement is structure dependent and that no language has a linearly dependent analogue (“move the highest Aux” exists but “move the left-most Aux” never does and is in fact never considered an option by kids building English Gs). He then goes on to explain why no G exploits such linearly sensitive rules. It’s because the rule-writing format for Gs exploits sets, and sets contain no linear information. As such, rules that exploit linear information cannot exist, for the information required to write them is uncodeable in the set-theoretic “machine language” available for representing structure. In other words, we want sets because the (core) rules of G systematically ignore string properties, and this is easily explained if such properties are not part of the G apparatus.
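
A toy illustration of the contrast (my own example sentence and encoding, sketched in Python): over a bracketed structure, “front the matrix Aux” is easy to state, while “front the left-most Aux” has to consult the string and yields the unattested pattern.

    # Toy encoding of "the man who is tall is happy": the subject contains a
    # relative clause with its own "is".
    clause = [["the", "man", ["who", "is", "tall"]], "is", "happy"]

    def flatten(x):
        return [w for item in x
                  for w in (flatten(item) if isinstance(item, list) else [item])]

    def front_leftmost_aux(clause):
        """Linear rule: consult the string and front whichever 'is' comes first."""
        words = flatten(clause)
        i = words.index("is")
        return [words[i]] + words[:i] + words[i+1:]

    def front_matrix_aux(clause):
        """Structure-dependent rule: front the 'is' that is a daughter of the
        root clause, ignoring anything buried inside the subject."""
        i = clause.index("is")
        return [clause[i]] + flatten(clause[:i] + clause[i+1:])

    print(" ".join(front_leftmost_aux(clause)))  # *"is the man who tall is happy" (unattested)
    print(" ".join(front_matrix_aux(clause)))    # "is the man who is tall happy"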

Observe, btw, that it is a short step from this observation to the idea that linguistic objects are meanings paired with sounds (the latter a decidedly secondary feature) rather than sound-meaning pairs in which both interfaces are equally critical. These, as you all know, serve as the start of Chomsky’s arguments against communication-based conceptions of grammar. So eschewing string properties leads to computational rather than communicative conceptions of FL.

The idea that strings are fundamental to Gs has a long and illustrious history. There is no doubt that, empirically, word order matters for acceptability and that languages tolerate only a small number of the possible linear permutations. Thus, in some sense, epistemologically speaking, the linear properties of lexical objects are more readily available (i.e. epistemologically simpler) than their hierarchical ones. If one assumes that ontology should follow epistemology, or if one is particularly impressed with what one “sees,” then taking strings as basic is hard to resist (and as Lasnik noted, Chomsky did not resist it in his young foolish salad days). In fact, if one looks at Chomsky’s reasoning, strings are discounted not because string properties do not hold (they obviously do) but because the internal mechanics of Gs fails to exploit a class of logically possible operations. This is vintage Chomsky reasoning: look not at what exists, but at what doesn’t. Negative data tell us about the structure of particular Gs. Negative G-rules tell us about the nature of UG. Want a pithy methodological precept? Try this: forget the epistemology, or what is sitting there before your eyes, and look at what you never see.

Normally, I would now draw some anti-Empiricist methodological morals from all of this, but this time round I will leave it as an exercise for the reader. Suffice it for now to note that it’s those non-barking dogs that tell us the most about grammatical fundamentals.


[1] Again, our friends in physics make an analogous distinction between effective theories (those that are more or less empirically accurate) and fundamental theories (those that are conceptually well grounded). Effective theory is what fundamental theory aims to explain. Using this terminology, Newton’s theory of gravitation is the effective theory that Einstein’s theory of General Relativity derives as a limiting case.
[2] Note that conserving the results of earlier inquiry is what allows for the accumulation of knowledge. There is a bad meme out there that linguistics in general (and syntax in particular) “changes” every 5 years and that there are no stable results. This is hogwash. However, the misunderstanding is fed by the inability to appreciate that older theories can be subsumed as special cases by newer ones.  IMO, this has been how syntactic theory has generally progressed, as any half decent Whig history would make clear. See one such starting here and continuing for 4 or 5 subsequent posts.
[3] I am not sure that I would actually strongly endorse this claim as I believe that even failures can be illuminating and that even theories with obvious presuppositional failures can point in the right direction. That said, if one’s aim is “the truth” then a presupposition failure will at best be judged suggestive rather than correct.
[4] For those that care, I proposed concatenation as a primitive here, but it was a very different sense of concatenation, a very misleading sense. I abstracted the operation from string properties. Given the close intended relation between concatenation and strings, this was not a wise move, and I hereby apologize.
[5] I have a review of Merge and its set like properties in this forthcoming volume for those that are interested.
[6] One important difference between Kayne’s and Chomsky’s views of linearization is that the LCA is internal to the syntax for the former but is part of the mapping from the syntax proper to the AP interface for the latter. For Kayne, LCA has an effect on LF and derives the basic features of X’ syntax. Not so for Chomsky. Thus, in a sense, linear properties are in the syntax for Kayne but decidedly outside it for Chomsky.
[7] The SS/LSLT version of the embedding transformation was decidedly not cyclic (or at least not structurally monotonic). Note that other conceptions of cyclicity would serve as well; Extension is sufficient, but not necessary.
[8] It’s also not obviously correct. Linear order plays some role in making antecedence possible (think WCO effects) and this is surely true in discourse anaphora. That said, it appears that in Binding Theory proper, c-command (more or less), rather than precedence, is what counts.

20 comments:

  1. Couching the relevant contrast in terms of strings and sets is misleading: 1) the intended effect that linear order cannot be referenced does not rule out string-based grammars, and 2) sets do not prevent you from referencing linear order.

    Let's look at 1 first: The core aspect that is meant to be captured is succinctly expressed via permutation closure with ordered trees: tree t with subtree s(X,Y) is well-formed iff the result of replacing s(X,Y) in t by s(Y,X) is well-formed. A PSG that needs to satisfy this can no longer use rules of the form X -> Z | _ Y to the exclusion of X -> Z | Y _, or X --> A B to the exclusion of X --> B A. The left-sibling order has become irrelevant for rule application. So an ordered data structure does not imply that this order can be meaningfully referenced.
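
    (Just to make the rule-level consequence concrete, here is a toy check for plain context-free rules, nothing more than an illustration of the closure condition:)

      from itertools import permutations

      # A rule set is closed under sibling permutation iff every reordering of
      # a rule's right-hand side is itself a rule.
      def permutation_closed(rules):
          return all((lhs, perm) in rules
                     for lhs, rhs in rules
                     for perm in permutations(rhs))

      closed = {("X", ("A", "B")), ("X", ("B", "A"))}
      not_closed = {("X", ("A", "B"))}
      print(permutation_closed(closed), permutation_closed(not_closed))  # True False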

    As for 2: every MG derivation tree can be represented via nested sets as is familiar from Bare Phrase Structure grammar, yet you can reference the order of the linearized surface string because it is implicitly encoded by the sequence of Merge and Move steps. As far as I can tell, none of the technical assumptions about narrow syntax that have been put forward in the literature prevent you from doing that.

    Am I splitting hairs? Maybe. But it seems more prudent to me to first define as your main insight the property that is to be captured, and then propose a specific mechanism to guide intuition --- rather than the other way round. Because the implementation is a lot harder to make watertight, and also more specific than necessary.

    Replies
    1. If Gs truck in derivations from phrase markers (PM) to phrase markers (something that I believe you might not agree with) then one way, a very good way, of explaining why certain kinds of operations don't apply is by noting that PMs don't code for the relevant information. Now, this does not mean that even if it DID code for it this info could not be ignored. Of course it could be. Imagine PMs as strings with the injunction that we ignore the non-hierarchical features. There, Gs can't advert to string properties. Does this explain anything? Not to me, even though the G will not have rules sensitive to linear features of PMs. So, that we can define PSGs that are built on strings but don't advert to such properties is not interesting. The question is why it cannot and, for example, why the PSG you cite above has the property you give it. The question, in other words, is not whether we can write a G based on strings that ignores them. The question is why such a G would ignore them. A G without the relevant PMs need not face this question.

      Ok, the second point: If the mapping is from PMs to PMs and these are sets, then I am unclear how the implicit linear order assumptions can be referenced. Maybe you could elaborate. My understanding was that the rules mapping the derivation trees to strings and interpretations were consistent with all sorts of possible mappings and that the derivation tree underdetermined any particular output. You are saying that this is not so, or are you?

      Chomsky believes that he has arguments for merge and arguments for PMs being sets. If they are, structure dependence seems to follow on the assumption that Gs map PMs into PMs and then into the interfaces. If syntax only manipulates PMs (not interface objects) then, so far as I can see, the structure dependence of rules follows.

    2. Sorry, huge post incoming even after I edited it down a lot, so most of your specific points will go unaddressed.

      Let's look at the big issue here: description VS explanation. I completely agree with you that we want an explanation, but I don't agree that the standard story provides a real explanation. By itself, it has no restrictive force, and to make it more restrictive you need stipulations that are no better than just flat-out saying "we want permutation closure". I'd even say they are worse because they take the property, cut it up, distribute it over many subparts of the grammar, and thus make it very hard to derive in a uniform way.

      So what are those stipulations? In order to block any of the coding tricks I have in mind, you need to assert that:

      1) There is a fixed, universal set of category features and that every language uses the majority of those categories; status: an easy sell for most linguists

      AND

      2) There is no derivational lookback or look-ahead; status: most people are on board with that

      AND

      3a) Syntax is insensitive to c-command; status: tough sell, c-command still has a central role in a lot of research

      OR

      3b) Syntax can employ c-command unless that would allow it to infer string order; status: circular, you're assuming what you seek to derive

      AND

      4) The feature components of every lexical item can vary so widely that one cannot safely infer its original feature configuration (pre-checking/valuation) from the surrounding structure; status: I'm not aware of any claims in either direction; interesting question though

      So you need 3a or 3b, and neither one is too great a choice. Now I realize that this thought experiment is fairly unconvincing without details --- I cut this part short since it just amounted to me listing various coding tricks and how you would probably discard them as violating the spirit of some other assumptions. But that is exactly my point: you need a rich network of assumptions to get anything from the set PM idea. It is not a light-weight explanation, it comes with lots of ballast.

      I'm sure you still disagree. Fair enough, but let's at least see if we can agree that there is an interesting alternative approach...

    3. Suppose that we just put permutation closure out there as a general property that is to be derived. Then you have multiple attack vectors; I'll just discuss one here. With PSGs, you can note that a grammar that is permutation closed is smaller than one that is not because the former need not repeat rules with different linear orderings. Instead of specifying X --> Y Z and X --> Z Y, one is enough, the other option can be inferred. That's a factorial compression in the best case, which is huge. In MGs, you also get a more compact grammar because you do not need to distinguish between features for left and right arguments, there's just arguments.
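
      To put toy numbers on that (purely illustrative, back-of-the-envelope):

        from math import factorial

        # Made-up rule arities: number of daughters on each right-hand side.
        arities = [2, 2, 3, 4]
        order_sensitive = sum(factorial(n) for n in arities)  # lists every ordering: 2+2+6+24 = 34
        order_free = len(arities)                             # one rule each: 4
        print(order_sensitive, order_free)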

      But why then aren't languages completely free in their word-order if that's the most compact specification? Who knows, might be processing limitations, information structure, some interface requirement. Whatever the reason, we can actually try to calculate if the LCA is a good or maybe even the optimal solution to the problem of minimizing grammar size through permutation closure while fixing word order. The drawback is that you now need to grow the grammar a bit to accommodate new movement steps, but if you already have a certain amount of movement (e.g. due to semantics) the cost may be negligible.

      That's all speculation of course, I haven't done the calculations (I probably should). But the crucial point is that by sticking to one well-defined property, I keep the scenario simple enough that I can do those calculations and explore these ideas. I could also wonder how, say, GPSG's LP/ID rule format would fare in comparison. Because it always boils down to the clear-cut property of permutation closure, rather than the much more ephemeral idea of set PMs, which makes vastly different predictions based on your other assumptions.

      I like to have a playground where we can give very different explanations of one and the same thing, and where the thing that is to be explained is sufficiently clear-cut that no hidden assumption can change what it does. If you immediately take permutation closure out of the picture, you lose that playground, and I don't see that you gain much in exchange.

  2. @Thomas. Permutation closure isn't a property that natural language grammars appear to have, so it's presumably not a property that we're trying to derive. (If the PS rules or equivalents do not generate strings it makes no sense; if they do it's empirically false.)

    The question is why syntactic rules don't appear to have access to linear information. You are right of course that switching from strings to sets doesn't in itself necessarily render this information unavailable, since it could be coded in features etc. But as you've pointed out, pretty much any constraint on the syntax is formally toothless without adequate restrictions on the accompanying feature theory. So the point is well taken, but I'm not sure it's a problem with the "sets not strings" hypothesis itself.

    Replies
    1. Whether permutation closure is empirically true or not isn't really the issue. After all, you have no direct evidence for set PMs either. And you can't embrace the latter on empirical grounds while rejecting the former. So whoever likes the idea of set PMs also has to consider the permutation closure alternative.

      My point was that instead of using set PMs and focusing on that set-like nature of structures, you can instead posit a higher-order property (one of languages rather than structures) that does not commit you to a specific encoding. And that this is advantageous because it clears up certain conceptual issues (linear order in the structure and rules referencing linear order are two completely different things) and opens up new ways of thinking about why string order does not matter, like my grammar size thought experiment above.

      Btw, that string order is irrelevant is not an obvious truism. For instance, while first-conjunct agreement can be independent of string order (true first-conjunct agreement), there is no such thing as true last-conjunct agreement --- if you see last-conjunct agreement, that must be the linearly closest conjunct. Of course one can still recast that in structural terms (though not as directly), or maybe morphosyntax is not part of syntax proper. And it's not really pertinent to this discussion anyways. But since you brought up the empirical status of permutation closure, discussion of empirical data may make for a nice addition to this debate.

    2. @Thomas. I don't follow this at all. The claim that PMs are closed under permutation is not empirically supported, and does not ensure that syntactic rules are unable to make reference to linear order. (Even if PMs are closed under permutation, you can still formulate rules like "move the first Auxiliary to the front of the sentence", since any given PM will order its terminals.)

      So I can't see any reason at all to entertain the hypothesis that PMs are closed under permutation.

      I can, however, see a reason to entertain the hypothesis that PMs are sets of some kind, since this might go some way to explaining why syntactic rules don't appear to make reference to linear order. No doubt there may be some gaps in this line of reasoning that need plugging, but it at least seems to start off in the right direction.

    3. Suppose you have a set PM of the form {A, B}. That set has no intrinsic order, so you cannot say things of the form "if A precedes B in the structure, merge C, else D". You might still be able to say "if A linearly precedes B in the output string, merge C, else D". That depends on your mapping from PMs to strings. For instance, if you know the head-argument relation between A and B and linearization is determined by that, you can still reference string order. So you cannot reference the order in the structure, but maybe in the output string depending on additional factors.

      Now let's do it with an ordered pair [A,B] and permutation closure as a requirement. Can you have a rule of the form "if A precedes B in the structure, merge C, else D"? No. Because that would entail that your language contains [C,[A,B]] but not [C,[B,A]], violating permutation closure. But as before you might be able to reference linear order if the linearization mechanism is determined by something like the head-argument relation so that [A,B] has the same string linearization as [B,A]. So you cannot reference the order in the structure, but maybe in the output string depending on additional factors. Exactly as before.

    4. Damn, the pairs got eaten by the HTML parser, here's the relevant passage again:

      Now let's do it with an ordered pair [A,B] and permutation closure as a requirement. Can you have a rule of the form "if A precedes B in the structure, merge C, else D"? No. Because that would entail that your language contains [C,[A,B]] but not [C,[B,A]], violating permutation closure.

    5. I am not quite following this discussion but it is on a topic that I am very interested in so that's frustrating. Can we tighten up the dialectic a bit?

      The empirical question is whether syntax has access to linear order or not. Is this vacuous or not? It seems to depend on some nontrivial constraints on the mappings to the interfaces, but I don't understand what they are.

      The debate seems a bit like the debate over compositionality post Zadrozny.

    6. @Alex C: The crucial distinction is between linear order in the output string and linear order in the structure and/or rewrite rules. Let's call the former string precedence, the latter structure precedence. The empirical claim is that string precedence is not a factor for syntax.

      Why would one posit that? Two reasons. One argument is the example given by Alex D above, that no language has a rule of the form "front the first auxiliary in the string". That is the weak argument because you can't elegantly state that rule even in formalisms that have linearly ordered structures but restrict linear order to siblings, e.g. CFGs. If A is the left sibling of B, that's easy to state in CFGs. Capturing that all nodes of the subtree rooted in A are to the left of all nodes in the subtree rooted in B cannot be done without introducing many new non-terminal symbols, which makes the grammar much bigger.

      The stronger argument is that structure precedence is not a conditioning factor, either. So a contrast like John_i likes him_i VS *He_i likes John_i is due to other structural factors, e.g. c-command. And this is then supported indirectly by further evidence like Which claim that John made did he like VS *Which claim that John is amazing did he like.

      An even stronger argument is that you can easily define unattested word order if you can define arbitrary linear orders between siblings. But this has been challenged (see all the recent work on headedness parameters), and the same goes for the Principle C data above (Ben Bruening's paper on phase-command and precedence).

      I'm not concerned with the empirical status of string order or even structural order in syntax. My first post was a formal remark: 1) ruling out structural order does not prevent you from referencing string order, nor the other way round, and 2) assuming that your structures are linearly ordered does not mean that this linear order can be a factor for well-formedness. But it also had a methodological component: if you do not want structure precedence to be a factor, set PMs are not a good way of going about it because you're casting a general property in terms of a specific implementation, which restricts the ways you can study and explain the property you are trying to capture. If you assume right away that syntactic structures can never be ordered, then there's no way to think about why permutation closure (i.e. the lack of meaningful structure precedence) may be an advantageous property even if you have linearly ordered structures.

    7. @Thomas: Let me follow up on Alex D's worry (which I share). If we adopt the permutation closure perspective, then any given object will have a particular order, to which rules can refer. Of course, all 'permutations' of that object will exist in the grammar as well, and the rules will be able to refer to their idiosyncratic orders. I could understand it if you were to say "even though syntactic rules can make reference to order, because all orders are possible, it will appear from outside the grammar as though there were no effects of order."

      @AlexC: Getting right to the heart of the matter! I would like to better understand this too. It is not clear to me that the right place to look is at the interface maps. I think (of course) this must be about which heads syntax can construct dependencies between, and the proposal is that c-command between the two heads is all that is needed to determine this. Although there are proposals to the effect that c-command is strongly related to linear order, this computation includes a hereditary condition (if A c-commands B, then all dominated by A precede all dominated by B) that isn't expressible using just atomic c-command predicates.

      The reason why I am dubious about the relevance of interface maps is that I don't see a way of expressing this formally; the maps will involve some sort of (finite state) transduction, and (finite amounts of) linear order can be encoded into the states. The nature of the states, and linguistic categories, is still formally unconstrained, as Thomas always points out.

    8. @Greg: I'm fine with that paraphrase. As far as I can see, it is fully compatible with everything I've said so far.

    9. @Thomas. Right, but that just means you need to pair up an optional "put the first auxiliary at the front" rule with an optional "put the first auxiliary at the end" rule. Supposing that movement of the auxiliary leaves nothing behind, the resulting grammar would satisfy permutation closure (assuming that the tree language obtained by removing both rules from the grammar already does). In effect, the rule would end up being "front any auxiliary", since you can get any node to be the first node in the tree by permutation. That's not the sort of rule that we usually want to rule out when we're thinking about prohibitions on the use of linear information, but it's arguably still an "unnatural" kind of rule. The broader point is that if we try to use permutation closure as a substitute for a direct prohibition on rules that make reference to linear order, the consequences could potentially be a bit surprising, and I'm not convinced that they've been properly worked out. (But if there are relevant proofs or whatnot already then I'd certainly be interested to take a look.)

    10. Oops, meant to say that you can get any node of category C to be the first node of category C by permutation.

    11. that just means you need to pair up an optional "put the first auxiliary at the front" rule with an optional "put the first auxiliary at the end" rule.
      String-first or structure-first? Let's think through both cases:

      1) As I briefly mentioned in my reply to Alex C, the property of being structure-first is not necessarily definable over an ordered tree. A CFG cannot do it without refining categories --- unless you treat the precedence relations between leaves as part of your graph structure (which trees usually do not), you need to define it in terms of the left-sibling relation and reflexive dominance. That actually takes quite a bit of expressivity, more than seems to be needed for any other syntactic phenomena. So the non-existence of "structure-first auxiliary" rules is a very weak argument for set PMs.

      2) If you meant string-first auxiliary, both approaches are in the same boat. The inability to target the string-first auxiliary must be derived from assumptions about computational limitations. For set PMs, you can always compute string precedence via the LCA (linguists do that all the time when they look at a tree), so you have to forbid syntax from doing that. If set PMs get to forbid that, so does any other view where string precedence and structure precedence are dissociated.

      3) The "front any auxiliary" rule isn't ruled out by set PMs either. It's a problem for both views. We usually assume some structural minimality condition, but that doesn't hinge on the presence or absence of structure precedence.

      4) The front/end part is again ambiguous between string and structure, but you won't find any major discrepancies between the two views. With set PMs, movement of X to a new specifier at the root of the tree does produce an element that could be string-first or string-last. Only the asymmetry introduced by the LCA tells you that it is string-first. But the LCA behaves exactly the same over ordered and unordered structures, so there's no noteworthy difference here. And the same parallels are found with structure-first/last.

      There seems to be a belief that the LCA follows naturally from set PMs (but not from something like permutation closure). I briefly explained my thinking on that in my reply to Norbert: a priori syntax could just have a simple "randomly pick a sibling order" mechanism, which from the outside would look exactly like the simple "read the leaves from left to right" mechanism in a permutation-closed language. It's not hard to come up with reasons why humans would struggle with such a language. So some specialized linearization mechanism is needed in either case, and neither format gives you a straight line towards a particular mechanism.

      But permutation closure has a tiny methodological advantage: it immediately allows you to compare grammars generating languages with permutation closure to other grammars, and you'll notice that the former can be specified more compactly. Of course the same compactness also holds for set PMs, but for those nobody entertained the question because we stipulated it away by removing all order from syntax.

    12. @Thomas.

      I meant string first.

      Suppose that we don't want to simply stipulate that there can't be syntactic rules specified in terms of linear precedence. Your suggestion, as I understand it, was that the effects of such a ban could be derived via the permutation closure requirement.

      If rules referring to linear precedence are permitted, then we might expect to find rules that identify the linearly first element of some category C and do something with it (say, move it).

      If this is possible, then given the permutation closure requirement, it should be possible, in effect, for such a rule to locate any element of category C in the structure, since for any element X of category C in a tree S, there is a permutation S' of S such that X is the first element of category C in S'. The exact implications of this will obviously depend on all the gnarly details of the formalism under consideration, but we probably don't want to admit what in effect amount to rules of the form "do [something] to any C in the structure".

      The rule, of course, has to have an output that respects permutation closure when combined with the other rules of the grammar, but that is not difficult to arrange.

      If we just assume that the structures generated have no inherent ordering, then these issues do not arise. So this seems like a potential problem with the approach you were suggesting that doesn't arise for the "sets not strings" approach.

      Is it a real problem or one that's easily fixed? I don't know. As for the gaps in the "sets not strings" argument, I agree with you on these points, as I indicated earlier.

    13. @Alex: Suppose that we don't want to simply stipulate that there can't be syntactic rules specified in terms of linear precedence. Your suggestion, as I understand it, was that the effects of such a ban could be derived via the permutation closure requirement.
      Yes and no. By itself, no assumption about syntactic structure can do anything for you about that if you have already dissociated string precedence from structural precedence. That's what I meant with independent computational restrictions on how much syntax is allowed to infer from its own representations. As long as linear order is determined by syntactic structure, it is implicitly encoded in that structure and hence accessible. Unless we have independent evidence that reconstructing that information exceeds the computational limits of syntax, the best case scenario is a succinctness argument.

      Anyways, the main point of my original post is that this dissociation between string precedence and structural precedence --- which is the Minimalist default --- can be derived with ordered structures, too:

      1) Suppose you want to tie string precedence to structural precedence.
      2) Suppose that permutation closure is a good thing for grammar compactness.
      3) Then you would end up with a language with extremely free word order (way beyond what a free word-order language allows).
      4) There's independent reasons to believe that such grammars would violate some other constraints of human cognition (e.g. due to high memory load).
      5) Since you can't do much about 4, and 2 has its advantages, 1 has to go.
      6) So even with ordered structures you wouldn't want to tie string precedence to that order.

      My initial post didn't articulate that very lucidly --- that's one of the nice things about FoL debates, I get to understand my own ideas more clearly.

      If we just assume that the structures generated have no inherent ordering, then these issues do not arise.
      Why do they not arise? "Move any C" is in no way at odds with set PMs, additional locality assumptions are needed to rule it out. Is the idea that permutation closure somehow elevates "Move any C" into the league of natural rules whereas set PMs do not? If so, I simply don't understand how you derive that difference.

    14. Is the idea that permutation closure somehow elevates "Move any C" into the league of natural rules whereas set PMs do not?

      Sort of. The worry was just that it seems that some instances of such rules will effectively count as natural if "move the first C" counts as natural, since the effect of such rules could be derived by a "move the first C" rule combined with other suitably constructed rules. Let me try again. Say you start out with a tree language L, which may or may not have permutation closure. You then define the language L' of trees derived from L by any sequence of zero or more applications of R1 and R2. R1 permutes the order of sisters. R2 locates the linearly first thing of category C and does something with it (say, moves it to attach at the root on the left). Given R1, L' has the permutation closure property. Now you can in effect move any C, but using rules which seem like pretty good examples of extremely "natural" structure-dependent and linear rules.

      So the point is that putting linear order back into syntactic structures and adding a permutation closure requirement could, perhaps, have unexpected and unwanted side-effects. In particular, even if you block "move any C" rules (which of course, any theory has to do one way or another), they can sneak in through the back door through a mechanism that's not available if syntactic structures don't have an inherent ordering.

      I do grasp the point that removing explicitly encoded order from the structures still allows rules to access the order implicitly encoded in hierarchical relations. But that's a problem for everyone who (i) acknowledges the existence of hierarchical structure and (ii) wants not to have syntactic rules that (in effect) refer to linear order. The problem I'm talking about seems to be a different, additional problem. Or I should say, potential problem. It all depends on how the details work out.
