I have argued repeatedly that the Minimalist Program (MP) should be understood as subsuming earlier theoretical results rather than replacing them. I still like this way of understanding the place of MP in the history of GG, but there is something misleading about it if taken too literally. Not wrong exactly, but misleading. Let me explain.
IMO, MP is to GB (my favorite exemplar of an earlier theory) as Bounding Theory is to Ross’s islands. Bounding Theory takes as given that Ross’s account of islands is more or less correct and then tries to derive these truths from more fundamental assumptions. Thus, in one important sense, Bounding Theory does not substitute for Ross’s account but aims to explain it; it aims to conserve the results of Ross’s theory, more or less.
Just as accurately, however, Bounding Theory does substitute for Ross’s. How so? It conserves Ross’s account but does not recapitulate it. Rather, it explains why the things on Ross’s list are there. Furthermore, if successful it will add other islands to Ross’s inventory (e.g. Subject Condition effects) and make predictions that Ross’s did not (e.g. successive cyclicity). So conceived, Ross’s islands are explananda for which Bounding Theory is the explanans.
Note, and this is important, that given this logic Bounding Theory will inherit any (empirical) problems with Ross’s generalizations. Pari passu for GB and MP. I mention this not because it is the topic of today’s sermonette, but just to observe that many critics of MP fail to appreciate it. Here’s what I mean.
One way MP might fail is in adopting the assumption that GBish generalizations are more or less accurate. If this assumption is incorrect, then the MP story fails in its presuppositions. And as all good semanticists know, this is different from failing in one’s assertions. Failing this way makes you not so much wrong as uninteresting. And MP is interesting, just as Bounding Theory is interesting, to the degree that what it presupposes is (at least) on the right track.
All of this is by way of (leisurely) introduction to what I want to talk about below. Of the changes MP has suggested, I believe the most fundamental (or, to be mealy-mouthed, one of the most fundamental) has been the proposal that we banish strings as fundamental units of grammar. This shift has been long in coming, but one way of thinking about Chomsky’s set-theoretic conception of Merge is that it dislodges concatenation as the ontologically (and conceptually) fundamental grammatical relation. Let me flesh this out a bit.
The earliest conception of GG took strings as fundamental, strings just being a series of concatenated elements. In Syntactic Structures (SS) (and LSLT, for which SS was a public relations brochure) kernel sentences were defined as concatenated objects generated by PS rules. Structural Descriptions took strings as inputs and delivered strings (i.e. Structural Changes) as outputs (that’s what the little glide symbol (which I can’t find to insert) connecting expressions meant). Thus, for example, a typical rule took as input a string like NP^Aux^V^NP and delivered a rearranged string as output, the ‘^’ representing concatenation. PS rules are sets of such strings and transformations are sets of sets of such strings. But the architecture bottoms out in strings and their concatenative structures.
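To make the string-based picture concrete, here is a toy rendering in Python (mine, not anything in SS/LSLT; the rule and labels are made up for illustration): kernel strings are just tuples of concatenated symbols, and a transformation maps one such string to another.

    # A toy, string-based picture of early GG. A structural description is a
    # concatenated string of symbols; a transformation maps string to string.

    def concat(*parts):
        """Concatenation ('^'): glue symbols into a single string object."""
        return tuple(parts)

    def passive_like(sd):
        """A hypothetical passive-style transformation over the SD
        NP1 ^ Aux ^ V ^ NP2, yielding NP2 ^ Aux ^ be ^ V+en ^ by ^ NP1."""
        np1, aux, v, np2 = sd
        return concat(np2, aux, "be", v + "+en", "by", np1)

    sd = concat("NP1", "Aux", "V", "NP2")
    print(passive_like(sd))  # ('NP2', 'Aux', 'be', 'V+en', 'by', 'NP1')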
This all goes away in Merge-based versions of MP. Here phrase markers (PMs) are sets, not strings, and string properties arise via linearization operations like Kayne’s LCA, which maps a given set into a linearized string. The important point is that sets are what the basic syntactic operation generates, string properties being non-syntactic properties that only obtain when the syntax is done with its work. Linear order is what you get as the true linguistic objects, the sets, get mapped to the articulators. This is a departure from earlier conceptions of grammatical ontology.
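To see the contrast, here is an equally toy sketch (mine again; the precedence table is a pure stipulation standing in for a real linearization scheme, not Kayne’s actual LCA): Merge builds order-free sets, and linear order appears only when a separate operation flattens the set.

    # Merge builds sets; linear order appears only at linearization.

    def merge(a, b):
        """Merge(a, b) = {a, b}: an unordered set carrying no linear info."""
        return frozenset({a, b})

    # The set itself cannot say what precedes what, so order must be imposed
    # from outside the syntax. A tiny precedence table plays the role that
    # the LCA (or Chomsky's mapping to AP) plays in the real proposals.
    PRECEDENCE = {"the": 0, "apple": 1}

    def linearize(pm):
        if isinstance(pm, str):
            return [pm]
        # atoms before sub-sets (a head-initial toy); the table breaks ties
        x, y = sorted(pm, key=lambda e: (isinstance(e, frozenset),
                                         PRECEDENCE.get(e, 0)))
        return linearize(x) + linearize(y)

    vp = merge("eat", merge("the", "apple"))   # {eat, {the, apple}}
    print(" ".join(linearize(vp)))             # -> eat the apple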
This said, it’s an idea with many precursors. Howard Lasnik has a terrific little paper on this in the Aspects 50 Years Later volume (Gallego and Ott, eds., an MITWPL product that you can download here). He reviews the history and notes that Chomsky was quite resistant in Aspects to treating PMs as just coding for hierarchical relationships, an idea that James McCawley, among others, had been toying with. Howard reviews Chomsky’s reasoning and highlights several important points that I would like to quickly touch on here (but read the paper; it’s short and very very sweet!).
He notes several things. First, one of Chomsky’s key arguments for his revised conception in Aspects revolved around eliminating some possible but non-attested derivations (see p. 170). Interestingly, as Howard notes, these options were eliminated in any theory that embodied cyclicity. This is important, for when minimalist Chomsky returns to Generalized Transformations as the source of recursion, he parries the problems he noted in Aspects by incorporating a cyclic principle (viz. the Extension Condition) into the definition of Merge.
Second, X’ theory was an important way station in separating out hierarchical dependencies from linear ones, in that it argued against PS rules in Gs. Dumping PS rules conceptually weakened the relation between phrase structure and the string features of Gs.
Despite this last point, Lasnik’s paper highlights the Aspects arguments against a set-based conception of phrase structure (i.e. in favor of retaining string properties in PS rules). This is section 3 of Howard’s paper. It is a curious read for a thoroughly modern minimalist, for in Aspects we have Chomsky arguing that it is a very bad idea to eliminate linear properties from the grammar, as was being proposed by, among others, James McCawley. Uncharacteristically (and I mean this as a compliment), Chomsky’s reasoning here is largely empirical. Aspects argues that, when one looks, the Gs of the period presupposed some conception of underlying order to get the empirical facts to fit, and that this presupposition fits very poorly with a set-theoretic conception of PMs (see Aspects: 123-127). The whole discussion is interesting, especially the discussion of free word order languages and scrambling. The basic observation is the following (126):
In every known language the restrictions on order [even in scrambling languages, NH] are quite severe, and therefore rules of realization of abstract structures are necessary. Until some account of such rules is suggested, the set-system simply cannot be considered seriously as a theory of grammar.
Lasnik argues, plausibly, that Kayne’s LCA offered such an account and so removed this empirical objection against eliminating string information from basic syntactic PMs.
This may be so. However, from my reading of things I suspect that something else was at stake. Chomsky has not, on my reading, been a huge fan of the LCA, at least not in its full Kaynian generality (see note 6). As Howard observes, what he has been a very big fan of is the observation, going back at least to Reinhart, that, as he says in the Black Book (334), “[t]here is no clear evidence that order plays a role at LF or in the computation from N [numeration, NH] to LF.”
Chomsky’s reasoning is Reinhart’s on steroids. What I mean is that Reinhart’s observations, if memory serves, are largely descriptive, noting that anaphora is largely insensitive to order and that c-command is all that matters in establishing anaphoric dependencies (an important observation, to be sure, and one that took some subtle argumentation to establish). Chomsky’s observations go beyond this in being about the implications of such lacunae for a theory of generative procedures. What’s important wrt linear properties and Gs is not whether linearized order plays a discernible role in languages (of course it does), but whether these properties tell us anything about generative procedures (i.e. whether linear properties are factors in how generative procedures operate). This is key. And Chomsky’s big claim is that G operations are exclusively structure dependent, that this fact about Gs needs to be explained, and that the best explanation is that Gs have no capacity to exploit string properties at all. This builds on Reinhart, but it is really making a theoretical point about the kinds of rules/operations Gs contain rather than a high-level observation about antecedence relations and what licenses them.
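For concreteness, here is one standard way of stating the order-free relation at stake, rendered over the set-based PMs sketched above (my code; the definition, X c-commands Y iff Y is contained in X’s Merge-sister, is one common formulation, and note that precedence is never consulted):

    def merge(a, b):
        return frozenset({a, b})

    def contains(pm, x):
        """True if x occurs anywhere inside the PM (reflexive containment)."""
        return pm == x or (isinstance(pm, frozenset)
                           and any(contains(e, x) for e in pm))

    def c_commands(pm, x, y):
        """X c-commands Y iff Y sits inside X's sister. Order never appears."""
        if not isinstance(pm, frozenset):
            return False
        a, b = pm
        if (a == x and contains(b, y)) or (b == x and contains(a, y)):
            return True
        return c_commands(a, x, y) or c_commands(b, x, y)

    tp = merge("John", merge("likes", "himself"))
    print(c_commands(tp, "John", "himself"))   # True, with no appeal to order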
So the key thing that needs explanation is the absence of linearly sensitive operations in the “core” syntax, the mapping from lexical items to “LF” (CI actually, but I am talking informally here), rather than some way of handling the evident linear properties of language.
This is vintage Chomsky reasoning: look for the dogs that aren’t barking and give a principled explanation for why they are not barking. Why no barking strings? Well, if PMs are sets then we expect Gs to be unable to reference linear properties and thus such information should be unable to condition the generative procedures we find in Gs.
Note that this argument has been a cynosure of Chomsky’s most recent thoughts on structure dependence as well. He reiterates his long-standing observation that T-to-C movement is structure dependent and that no language has a linearly dependent analogue (“move the highest Aux” exists, but “move the left-most Aux” never does and is in fact never considered an option by kids building English Gs). He then goes on to explain why no G exploits such linearly sensitive rules. It’s because the rule-writing format for Gs exploits sets, and sets contain no linear information. As such, rules that exploit linear information cannot exist, for the information required to write them is uncodeable in the set-theoretic “machine language” available for representing structure. In other words, we want sets because the (core) rules of G systematically ignore string properties, and this is easily explained if such properties are not part of the G apparatus.
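The asymmetry can be made almost mechanical (my toy again, reusing the set-based PMs): over a set, “structurally highest auxiliary” is directly definable, while “left-most auxiliary” cannot even be written down until something linearizes the set.

    def merge(a, b):
        return frozenset({a, b})

    AUX = {"is", "will", "can"}

    def highest_aux(pm, depth=0):
        """Return (depth, aux) for the least-embedded auxiliary: a purely
        structural notion, statable directly over the set."""
        if isinstance(pm, str):
            return (depth, pm) if pm in AUX else None
        found = [r for e in pm if (r := highest_aux(e, depth + 1)) is not None]
        return min(found, default=None)

    # "the man who is tall will leave": 'will' is structurally highest,
    # though 'is' would come first in the spoken string.
    clause = merge(merge("the-man", merge("who", merge("is", "tall"))),
                   merge("will", "leave"))
    print(highest_aux(clause))   # (2, 'will')

    # No analogous leftmost_aux(pm) is writable here: frozensets supply no
    # order to consult, so the linear rule is uncodeable at this level.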
Observe, btw, that it is a short step from this observation to the idea that linguistic objects are pairings of meanings with sounds (the latter a decidedly secondary feature) rather than pairings of meanings and sounds on an equal footing (where both interfaces are equally critical). These, as you all know, serve as the start of Chomsky’s arguments against communication-based conceptions of grammar. So eschewing string properties leads to computational rather than communicative conceptions of FL.
The idea that strings are fundamental to Gs has a long and illustrious history. There is no doubt that, empirically, word order matters for acceptability and that languages tolerate only a small number of the possible linear permutations. Thus, in some sense, epistemologically speaking, the linear properties of lexical objects are more readily available (i.e. epistemologically simpler) than their hierarchical ones. If one assumes that ontology should follow epistemology, or if one is particularly impressed with what one “sees,” then taking strings as basic is hard to resist (and as Lasnik noted, Chomsky did not resist it in his young foolish salad days). In fact, if one looks at Chomsky’s reasoning, strings are discounted not because string properties do not hold (they obviously do) but because the internal mechanics of Gs fails to exploit a class of logically possible operations. This is vintage Chomsky reasoning: look not at what exists, but at what doesn’t. Negative data tell us about the structure of particular Gs. Negative G-rules tell us about the nature of UG. Want a pithy methodological precept? Try this: forget the epistemology, or what is sitting there before your eyes, and look at what you never see.
Normally, I would now draw some anti-Empiricist methodological morals from all of this, but this time round I will leave it as an exercise for the reader. Suffice it for now to note that it’s those non-barking dogs that tell us the most about grammatical fundamentals.
 Again, our friends in physics make an analogous distinction between effective theories (those that are more or less empirically accurate) and fundamental theories (those that are conceptually well grounded). Effective theory is what fundamental theory aims to explain. Using this terminology, Newton’s theory of gravitation is the effective theory that Einstein’s General Relativity derives as a limiting case.
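 For readers who want the analogy cashed out, the standard textbook statement of the limit (plain physics, nothing linguistic, in my rendering) is:

    g_{\mu\nu} = \eta_{\mu\nu} + h_{\mu\nu}, \qquad |h_{\mu\nu}| \ll 1, \qquad h_{00} = -\frac{2\Phi}{c^{2}}

    \frac{d^{2}x^{i}}{dt^{2}} = -\partial_{i}\Phi, \qquad \nabla^{2}\Phi = 4\pi G\rho

 That is, in the weak-field, slow-motion limit the geodesic equation reduces to Newton’s law and Einstein’s field equations reduce to Poisson’s equation.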
 Note that conserving the results of earlier inquiry is what allows for the accumulation of knowledge. There is a bad meme out there that linguistics in general (and syntax in particular) “changes” every 5 years and that there are no stable results. This is hogwash. However, the misunderstanding is fed by a failure to appreciate that older theories can be subsumed as special cases by newer ones. IMO, this is how syntactic theory has generally progressed, as any half-decent Whig history would make clear. See one such starting here and continuing for 4 or 5 subsequent posts.
 I am not sure that I would actually strongly endorse this claim as I believe that even failures can be illuminating and that even theories with obvious presuppositional failures can point in the right direction. That said, if one’s aim is “the truth” then a presupposition failure will at best be judged suggestive rather than correct.
 For those that care, I proposed concatenation as a primitive here, but it was a very different sense of concatenation, a very misleading sense. I abstracted the operation from string properties. Given the close intended relation between concatenation and strings, this was not a wise move, and I hereby apologize.
 One important difference between Kayne’s and Chomsky’s views of linearization is that the LCA is internal to the syntax for the former but is part of the mapping from the syntax proper to the AP interface for the latter. For Kayne, the LCA has an effect on LF and derives the basic features of X’ syntax. Not so for Chomsky. Thus, in a sense, linear properties are in the syntax for Kayne but decidedly outside it for Chomsky.
 The SS/LSLT version of the embedding transformation was decidedly not cyclic (or at least not structurally monotonic). Note that other conceptions of cyclicity would serve as well, Extension being sufficient but not necessary.
 It’s also not obviously correct. Linear order plays some role in making antecedence possible (think WCO effects) and this is surely true in discourse anaphora. That said, it appears that in Binding Theory proper, c-command (more or less), rather than precedence, is what counts.