Comments

Showing posts with label c-command. Show all posts
Showing posts with label c-command. Show all posts

Sunday, May 18, 2014

SMT III

Much as I enjoy the cut and thrust of debate about the discoveries of Generative Grammar and their significance for understanding FL, I am ready to change the topic, at least for a while. Before, doing so, let me urge those of you who have not been following the comment threads of my last two posts to dip into them. IMO, they are both (mostly) entertaining and actually insightful. I may be over-concluding here, but it looks to me that we have reached a kind of consensus in which almost everyone (there is one conspicuous exception, and I am sure regular readers can guess who this is) concurs that GG has made many serious empirical discoveries (what I dubbed "effects") that call for theoretical explanation.  With this consensus in hand, let’s return to the SMT.

In two previous posts (here and here), I outlined a version of the SMT that had several nice properties (or at least I thought them nice). First, empirically it linked work on syntax directly to work in psycho and vice versa, with results in each carrying clear(ish) consequences for work in the other. The SMT mediated this cross fertilization by endorsing a strong version of the transparency thesis wherein the performance systems used the principles, operations and representations of the competence systems to do what they do (and, this is important, do so well). I offered some illustrations of this two-way commerce and touted its virtues.

The second nice property of this version of the SMT is that it promises to deliver on the idea that grammars are evaluable wrt computational efficiency. Minimalists love to say that some property enhances or detracts from computational efficiency or adds or reduces computational complexity and our computational colleagues never tire from calling them/us out on this.[1] The SMT provides a credible sense in which grammars might be computationally efficient (CE). Grammars, operations, representations etc. are CE just in case transparently embedding them within performance systems allows these performance systems to be efficient. What’s ‘efficient’ mean? Parsers that parse fast are efficient. If these parsers are fast (i.e. efficient) (in part) because they embed grammars with certain specifiable properties then we can say that these grammars are CE. Ditto with acquisition. The SMT conjectures that we are efficient at acquiring our native Gs (in part) because UG has the properties it does. Thus, Gs and UGs are efficient to the degree that they explain why we are so good at doing (performing) what we do linguistically. Thus, given the SMT, Gs and UGs can be vicariously CE (VCE). I have been arguing that minimalists should endorse VCE as what they intend when claiming computational virtues for their minimalist proposals.

Say you buy this much. It would be useful to have a couple of paradigm examples of how to make the argument linking the properties of Gs and UG to CE in performance systems. Indeed, wouldn’t it be nice to have stories that take us from properties like, say, Extension, Cyclicity, and C-command to fast parsing and easy learnability. Fortunately, such illustrative examples exist. Let me bring a nice compact, short, and easily readable one to your attention. Berwick and Wexler (B&W) (here) provide a paradigm case of what I think we need. This paper was written in 1987 (the 80s really were a golden age for this sort of stuff, as Charles has noted), and sad to say, the wisdom it contains seems to have been almost entirely lost. In what follows I give a short précis of the B&W argument, highlighting what I take to be those features of the approach that it would behoove us to “rediscover” and use. It’s a perfect example of SMT reasoning.

B&W focuses on showing that c-command (CC) is “a linguistically-motivated restriction [that] can also be justified on computational grounds” (48)).  How do B&W show this? There are two prongs to the argument. First, B&W show how and under what conditions CC would enhance antecedence retrieval. The main result is that for trees that are as deep as they are wide the computational savings are a reduction in search time from N to log N (N = number of terminals) (48) when CC is transparently embedded in a Marcus Parser (M-Par).[2]

B&W then show that CC follows from a particular property of M-Pars, which B&W dub “constituency completeness” (55). What is this? It is the assumption that “the “interior” of any phrase attached to a node on the active node stack…[is] opaque to further access” (55). More specifically, “once a phrase has been completely built” it is “attached as a single, opaque object to its proper dominating phrase…Crucially, this means that the … [node] now acts as a single, opaque unit…[that will not be]…accessible to syntactic analysis.” As B&W note, “this restriction is simply the core notion of c-command once again” (50).

Constituent Completeness has an analogue within current syntax. It is effectively the Extension Condition (EC). EC states that once a constituent is built it cannot be further reconfigured (i.e tampered with). Furthermore, as several have noted, there is a tight connection between EC and CC, at least for some class of dependencies.[3] It is interesting to see these connections foreshadowed in B&W. Note that the B&W discussion in the current theoretical context lends credence to the idea that EC promotes CE via its relation to CC and M-Pars rapid parsing.

B&W observe that Constituent Completeness (aka, EC) has another pleasant consequence. It’s pivotal in making M-Pars fast. How so? First M-Pars is a species of Bounded Context Parsers (BCP). BCPs (and hence M-Pars) are fast because they move forward by “examining strictly literal contexts around the current locus of parsing” (55).  Thus, parsing decisions can only consult “the local environment of the parse.” To implement this, such local environments must be represented in the very same code that linguists use in describing syntactic objects:

…[a] decision will be made by consulting he local environment of the parse – the S and VP nodes, with the attached verb, and the three input buffer items. Further these items are recorded exactly as written by the linguist – as the nodes S, NP, VP, V and so forth. No “additional” coding is carried out. It this literal use of the parse tree context that distinguishes bounded context parsing…

Thus, Constituent Completeness (viz. EC) “effectively limits the left-hand parsing context that is available…[and] is a necessary requirement for such a parser to work” (55).

In other words, something very like EC contributes to making BCPs/M-Pars fast. Additionally, Constituent Completeness and the transparency assumption together motivate the Berwick and Weinberg proposal that something very like bounded cyclic derivations are necessary for efficient parsing given the relation between bounded left contexts and fast parsing. Every grammatical theory since the mid 80s has included some way of representing bounded cycles (i.e. either phases + PIC or Barriers+Subjacency or Bounding nodes + subjacency). Indeed, as you all know, Berwick and Weinberg argued that Subjacency was sufficient to provide bounded left contexts of the kind their parser required to operate quickly.  In sum, the B&W paper shows how EC (in the guise of Constituent Completeness) and something like bounded domains of computation (phases/subjacent domain) in the context of a M-Parser together can conspire to yield fast parsing. If so, this supports the view that something like EC and phases are computationally efficient. Wow!!

B&W doesn’t stop here. It goes on to speculate about the relation between fast parsing and easy learnability. Wexler and Culicover showed that grammars that have the BDE property (bounded degree of error) can be learned on the basis of degree 2 data.[4] It is possible that BDE and BCP are closely related. Berwick (here) showed that one can derive BDE from BCP and that both properties rely on something like EC and bounded domains of computation (which, to repeat, something like phase/subjacency theory would provide). B&W suggest that BDE might in turn imply BCE, which, if correct, would further support the idea that notions like EC and phases are CE. Indeed, should it prove possible to prove that Gs are easily learned iff they are quickly parsed and that both quick learning and speedy parsing leverage specific properties of G and UG like EC, phases/subjacency, CC etc. then we will have taken a big step in vindicating the SMT.

Let me end with two last comments on B&W.

First, one of the key features of the paper is the proposal to take M-Pars/BCPs as proxy models for efficient parsing and to then study what enables them to be so good. What B&W finds is that part of what makes them good are the data structures they use with the particular properties they encode. Thus, the choice to study M-Pars/BCPs is the choice to study parsers (and maybe BDE learners) “grounded in particular linguistic theories” (58).  As B&W note, this approach is quite unlike what one finds in formal learning theory or “the general results obtained from the parsability of formal languages” (58). B&W starts from a consideration of a narrower class of languages that are “already known to be linguistically relevant” (59). The aim, as B&W sees it, is to evaluate the impact on computations that the data structures we know to be operative in natural language Gs and UG have. As B&W puts it, what the paper develops is a model for studying “the interactions between data structures and algorithms…[as a way] to develop more computationally based linguistic theory” (51). In other words, it offers an interesting and concrete way of understanding the current minimalist interest in CE.

Second, as B&W stresses again and again, this is not the “last word” on the topic (53). But, to my eyes it is a very good first word. In contrast to many computational approaches to parsing and learning, it takes linguistic theory seriously and considers its implications for performance. Let me quote B&W:

…what we want to illustrate here is not the final result but the method of study. We can assess the relative strengths of parsability and learnability in this case, but only because we have advanced specific models for each. These characterizations are still quite specific, being grounded in particular linguistic theories. The results are therefore quite unlike the formal learning theories of Gold (1967), or more recently, of Osherson, Stob and Weinstein (1982) nor are they like the general results obtained from the analysis of the parsability of formal languages. Rather, they hold of a narrower class of languages that are already known to be linguistically relevant. In this respect, what the theories lose in terms of invariance over changes in linguistic theories, they gain in terms of specificity.[5] (58-9)

Thus, it offers a concrete way of motivating actually proposed principles of FL/UG on computational grounds. In other words, it offers a concrete way of exploring the SMT. Not bad for a 1987 paper that has long been ignored. It’s time to go back to the future.




[1] I still fondly recall a day in Potsdam several years ago when Greg Kobele suggested in the question period after a talk I gave that any minimalist claims to computational efficiency are unfounded (actually, he implied worse, BS being what sprang to my mind). At any rate, Greg was right to push. The SMT posts are an attempt to respond.
[2] For trees that are not “perfectly balanced” (i.e. not as deep as wide) the computational savings decline until they go to zero in simple left branching sentences.
[3] Epstein has noted this connection. Hornstein 2009 discusses it ad nauseum and it forms the basis of his argument that all relations mediated by CC should be reduced to movement. This includes pronominal binding. This unification of binding with movement is still very controversial (i.e. only Kayne, Sandiway Fong, me and a couple of other crazies think it possible) and cannot be considered as even nearly settled. This said, the connections to B&W are intriguing.
[4] Like real time parsing, I suspect degree 2 is too lax a standard. We likely want something stricter, something along the lines of Lightfoot’s degree 0+ (main clauses plue a little bit).
[5] This is important. Most formal work on language quite deliberately abstract away from what linguists would consider the core phenomenon of interest: the structure of Gs and UG. They results are general because they are not G/UG dependent. But this is precisely what makes them the wrong way for investigating G/UG properties and for exploring the SMT.

There is a counter argument, that by going specific one is committing hostages to the caprice of theory. There are days where I would sympathize with this. However, as I’ve not yet gone tired of repeating, the rate of real theoretical change within GG is far slower than generally believed. For example, as I noted in the main body of the post, modern theory is often closely related to old proposals (e.g. phases/subjacency or CC/EC). This means that theory change is not quite as radical as often advertised and so the effects of going specific not nearly as baleful as often feared.  This said, I need not be so categorical. General results are not to be pooh-poohed just because they are general. However, as regards the SMT, to the degree that formal results are not based in grammatically specific concepts, to that degree they will not be useful for SMT purposes. So, if you are interested in the SMT, B&W is the right way to go.

An aside: the branch of CS that that B&W take to be of relevance to their discussion is compiler theory, a branch of CS which gets its hand dirty with the nitty gritty details.

Friday, February 14, 2014

Derivation Trees and Phrase Structure

Another snow day, another blog post. Last week we took a gander at derivation trees and noticed that they satisfy a number of properties that should appeal to Minimalists. Loudmouth that I am I took things a step further and proclaimed that there is no good reason to keep using phrase structure trees now that we have this shiny new toy that does everything phrase structure trees do, just better. Considering the central role of phrase structure trees, that's a pretty bold claim... or is it?

Friday, November 16, 2012

QR?


 Caveat Lector! What follows is wonkier than what I have posted heretofore.

Chris Barker has a remark in the recent issue of LI where he presents, what seems to me, pretty convincing evidence that quantificational binding (of a pronoun) does not require c-command, wider scope suffices.  More particularly, he adopts Safir’s scope requirement:

(1) A quantifier must scope over any pronoun that it binds.

and operationalizes scope with (2):

(2) A quantifier can take scope over a pronoun only if it can take scope over an
                 existential inserted in the place of the pronoun.

The test in (2) conceptually divorces ‘scope’ from ‘c-command’ and allows one to investigate whether the two move in tandem, i.e. whether a scopes over b iff a c-commands b. The data Barker presents shows pretty clearly that though c-command is a sufficient condition for scope, it is not obviously necessary.[1] 

Barker’s interesting discussion serves a (perfectly reasonable) PR function. He has another way of doing binding/scope that fits well with the conclusion that it is not tied to c-command.[2] He does not argue in detail against theories that have tried to save the Scope/C-command generalization, but his congeries of counter-examples certainly suggests that defenders of the orthodox  have their work cut out for them (including an earlier counterpart of me should I/he be tempted to defend (2)). Though I could quibble with some of the data (there are a lot of cases involving ‘each,’ which is known to be recalcitrant in many ways, e.g. WCO effects with ‘each’ are quite attenuated (his1 mother considers/believes each boy1 to be perfect/His1 mother yelled at each boy1 to stop shouting), I found weight of data and Barker’s and conclusion based on it pretty convincing.  At the very least, he has made a good case to re-open our minds about the relation between c-command and scope.

This said, I want to point out another consequence of Barker’s argument should his conclusion prove correct. It suggests that UG does not concern itself with quantifier scope and that QR, the grammatical vehicle for adjusting phrase structures in order to bring scope and c-command together, is not a rule of grammar.  This view has a pedigree. Tony Kroch’s thesis argues that the grammar does not regulate variable Q-scope. In fact, it was Chomsky’s work on WCO and the noted parallel between (3a,b) that prompted taking quantifier scope to be a species of A’-movement:[3]

            (3)       a. *Who1 does his1 mother love t1
                        b. *His1 mother loves everyone1

(3a) resists a paraphrase as whose mother loves him and (3b) is infelicitous as everyone’s mother loves him.  Chomsky’s very reasonable point was that the facts in (3) could be collapsed if WCO were an LF fact and (3b) was covertly transformed via QR into a structure analogous to that in (3a).

Soon enough, other facts arose that supported this view (the curious can look at chapter 1 of my book Logical Form for a review of some of this).  However, Barker’s paper suggests that this data is not representative and that the grammar does not regulate scope via QR at all. The paper cites examples where quantifiers scope out of subject islands, relative clauses, subject sentences and adjuncts.  If these data are more or less accurate, then it suggests that QR is not a rule of grammar.  More exactly, QR seems unregulated by the kinds of locality conditions we expect movement operations to be subject to.

There remains a way of finessing this conclusion. Maybe covert movement is not subject to locality restrictions of the kind we see for overt movement. This is a venerable assumption.  Those predisposed to single-cycle theories of syntax (that’s you minimalists out there) might find this assumption challenging.  I suspect that there are ways around this (e.g. treating islands as effectively PF linearization phenomena) but that too will take some work.  At any rate, Barker’s conclusion, if correct, has interesting consequences.

I end this rambling with a question: Why were we so easily convinced that binding required c-command? First, because we understood LF as the input to semantic interpretation and took difference in truth conditions to be sufficient indication of difference in LFs. Thus, if (4) is ambiguous, it must have at least two different LFs. QR delivered two different LFs.

            (4)       At least one boy kissed every girl

Second, as May emphasized, QR was the perfect fix for the regress problem that ACD constructions appeared to present.

The first assumption was now far less attractive, at least to me. There is no reason to assume that the grammar alone determines semantic interpretation. Being one factor among many leaves LF/CI interface issues plenty interesting. The second reason stands. If Barker is right, then it looks like we need to rethink the analysis of ACDs. Some have started doing so, attributing ACD licensing to something like extraposition (Baltin, Fox-Nissenbaum), or ‘afterthoughts’ (Chomsky).  At any rate, the free ride that the QR analysis of ACDs enjoyed by piggy backing on the “independently motivated” rule of QR is probably over.

I confess that I have always found QR a bit unsavory. It didn’t really function like other A’-movement operations (too many landing sites), it really was hard to find overt versions of it (something unexpected if analogized to Wh-movement), it enjoyed different properties than other forms of A-movement (it doesn’t require obligatory reconstruction for if it did it could not license ACDs, but quantified RCs cannot generally drag their sentential parts along for the covert ride for if possible principle C would be systematically violated) etc.  Barker’s paper suggests that we need to rethink not only the c-command condition on binding and scope but the standing of QR as a grammatical operation.



[1] Dave Kush (p.c.) noted that the sufficiency of c-command for scope is quite interesting.  Why should it be that a’s c-commanding b is sufficient for a to scope over b. One can imagine systems where the two notions are entirely divorced and that some c-commanding elements cannot scope over elements they c-command.  Interestingly, one of Reinhart’s tests for bound variable readings of pronouns does seem sensitive to c-command:
(i)             John loves his mother but Bill doesn’t (sloppy ok)
(ii)           John’s father loves him but Bill’s mother doesn’t (no sloppy)
(iii)          Everyone from NYC loves its subway. Does everyone from Boston? (no-ish sloppy)
I find the sloppy readings in (ii) and (iii) marginal at best, though I can be convinced that I am wrong about this.
[2] Earlier work in the mid 80s by George Wilson and Jeff King treats certain cases of bound pronouns as flagged variables in a natural deduction system. This approach also dissociates binding and scope from c-command.  
[3] May’s subsequent work also proved influential.