
Monday, May 5, 2014

What has the SMT done for you lately?

A recent post (here) illustrated how the SMT provides a unified framework for various kinds of research into the structure of FL. In particular, I reviewed some work showing how certain recent findings concerning online parsing would follow were parsers to transparently embed grammars, as the SMT would require. The flow of argument in the work described goes from results in syntax to consequences for online measures of incremental parsing complexity. In other words, this is a case where the SMT, given some properties of the grammar, makes claims about some property of the interface. Here’s a question: can we reverse the direction of argument? Can we find cases where the SMT does grammatically useful work, in that some property of the interface makes a claim about what grammars must look like? In other words, cases where the argument moves from some property of the interfaces to some claims about the right theory of grammar?

Before offering some illustrations, let me note that the first kind of argument is nothing to sneeze at if you are interested in discovering the structure of FL (and who isn’t interested in this?). Why? Because the kind of evidence that comes from things like the filled-gap effect and the plausibility effect is different from the kind of data that acceptability (under an interpretation) judgments provide. And, as every intro philo of science course will tell you, the best support for a theory comes from different kinds of data all pointing to the same conclusion (this is called consilience, a term Whewell invented). Consequently, finding online data that supports conclusions garnered from acceptability data is interesting even if one is mainly interested in competence theories.

This said, for purely selfish reasons, it would still be nice to have examples of arguments going in the other direction as well: implications for grammatical theory from psycho considerations. I have three concrete(ish) examples to offer as models: one that I have talked about before (here), based on work by Pietroski, Lidz, Hunter and Halberda (PLHH), which I would like to remind you of; one on how to understand binding domains, based on work by Dave Kush (here); and one that is entirely self-serving (i.e. based on some work I did on non-obligatory control) (here, ch. 6).

Before proceeding, let me emphasize that the examples are meant to be illustrative of the logic of the SMT. I do actually think that the cited arguments are pretty compelling (of course I LOVE the third one). However, my point here is not to defend their truth but to outline their logic and how this relates to SMT reasoning. Being motivated by the SMT does not imply that an account is true. But given minimalist interests, it is an interesting property for an account to have. This out of the way, let’s consider some cases.

First, PLHH’s argument concerning the meaning of ‘most.’ The argument is that there is a privileged representational format for the meaning of ‘most.’ Its meaning is (1c) and not the truth-functionally equivalent (1a) or (1b):

            (1) Three possible meanings for ‘most’:
                  a. OneToOne+[{x: D(x)}, {x: Y(x)}] iff for some set s, s ⊂ {x: D(x)} and OneToOne[s, {x: Y(x)}]
                  b. |{x: D(x) & Y(x)}| > |{x: D(x) & ¬Y(x)}|
                  c. |{x: D(x) & Y(x)}| > |{x: D(x)}| - |{x: D(x) & Y(x)}|

Why (1c)? Because that’s the one that speakers use when evaluating the quantities of visually presented dot arrays. And if one assumes that the products of well-designed grammars (e.g. meanings) are transparently used by the interfaces, i.e. if one assumes that the SMT is true, then the fact that the visual system uses representations like (1c) in preference to those in (1a,b), even when the others could be used, is evidence that this is what ‘most’ means. In other words, given the SMT, the fact that (1c) is used (and used very efficiently and quickly (see the experiments)) implies that (1c) is the linguistic meaning of ‘most.’
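To see concretely how the three procedures differ even though they agree on truth values, here is a minimal sketch (my own construction, not PLHH’s code) that decides “most of the dots are yellow” in each of the three formats. The set names and helpers are invented for illustration; the point is that each function consults different quantities: (1a) a one-to-one pairing, (1b) the yellow and the non-yellow sets, (1c) only the yellow set and the total set, the quantities the approximate number system delivers for a set and its subset.

```python
# Toy sketch of the three truth-conditionally equivalent procedures in (1).

def most_1a(dots, yellow):
    """(1a): pair off non-yellow dots one-to-one with yellow dots;
    'most' is true iff some yellow dots are left unpaired."""
    unpaired_yellow = list(dots & yellow)
    for _ in dots - yellow:
        if not unpaired_yellow:   # a non-yellow dot with no partner
            return False
        unpaired_yellow.pop()     # pair it with a yellow dot
    return bool(unpaired_yellow)  # leftover yellow dots => 'most'

def most_1b(dots, yellow):
    """(1b): compare |D & Y| with |D & not-Y|. Requires representing
    the negative set {x: D(x) & not Y(x)} directly."""
    return len(dots & yellow) > len(dots - yellow)

def most_1c(dots, yellow):
    """(1c): compare |D & Y| with |D| - |D & Y|. Needs only the
    cardinality of the whole dot set and of the yellow subset."""
    return len(dots & yellow) > len(dots) - len(dots & yellow)

dots, yellow = set(range(20)), set(range(11))   # 11 of 20 dots are yellow
assert most_1a(dots, yellow) == most_1b(dots, yellow) == most_1c(dots, yellow) == True
```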

Consider a second case with similar logic. In his thesis and in recent presentations (here), Dave Kush observes that speakers respect c-command restrictions when parsing sentences that involve quantificational binding. More specifically, in parsing sentences like (2a,b), speakers consider only those antecedents whose c-command domain (CCD) contains the bound pronoun. While parsing, speakers reliably distinguish cases like (2a), where the antecedent c-commands the bound pronoun, from those like (2b), where it doesn’t.

            (2) a. Kathi didn’t think that any janitor1 liked his job when he1 had to clean up
                  b. Kathi didn’t think that any janitor1 liked his job but he1 had to clean up       

Parsing sensitivity to CCDs is further buttressed by the difference found in the online parsing of Strong vs Weak Crossover (S/WCO) effects. Kush provides evidence that incremental parsing respects the former, which invokes CCDs, but not the latter, which does not.[1] As Kush notes, this fits well with earlier work on the binding of reflexives and reciprocals. Kush adds some Hindi data on reciprocals to earlier work by Dillon and Sturt on reflexives to ground this conclusion. Taking these various results together, Kush concludes, very reasonably IMO, that online parsing is sensitive to the c-command relations that bound expressions have with respect to their antecedents.

The conclusion, then, is that incremental parsing computes CCDs in real time. Based on this established fact, Kush then asks a second very interesting follow-up question: how is this condition implemented in human parsers? He notes the following problem. Human memory architecture appears to be content-addressable, and this makes coding CCDs within such an architecture difficult.[2] However, the data clearly indicate that we code something like CCDs, and do so online quickly. So how is this done? Kush suggests that we do not actually code for CCDs but for something that does similar work, something very like the clausemate condition, the restriction that did the heavy lifting in earlier incarnations of syntactic theory. Howard Lasnik and Ben Bruening have recently argued for a return to something like this (Bruening has proposed “phase-command” rather than c-command as the operative condition). Interestingly, as Kush shows, these alternatives to c-command can be made to fit comfortably with the kind of content-addressable memory architecture humans seem to have. Conclusion: our competence grammars use something like clause/phase-command conditions rather than CCDs as the primitive relations relevant to binding. Note that the direction of argument goes from online parsing facts plus facts about human memory architecture to claims about the primitive relations in the competence grammar. What’s of interest here is how the SMT is critical in licensing the argument form. Whether Kush is right or not about the conclusion he draws is, of course, important. But IMO, this mere factual issue is not nearly as interesting as the argument form itself.
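To make the memory point vivid, here is a toy sketch (my construction, not Kush’s implementation; the feature names are invented for illustration) of cue-based, content-addressable retrieval. A clausemate/phase-command condition can be checked by a flat feature match, whereas c-command is a relation over tree geometry that such a match cannot express without redundantly listing every node’s c-commanders.

```python
from dataclasses import dataclass

@dataclass
class Item:
    word: str
    features: dict  # e.g. {'cat': 'DP', 'clause': 2}

def retrieve(memory, cues):
    """Content-addressable retrieval: return every item whose features
    match all the cues -- one parallel match, no tree traversal."""
    return [item for item in memory
            if all(item.features.get(k) == v for k, v in cues.items())]

# A clausemate-style binding condition is just another retrieval cue:
memory = [Item('John', {'cat': 'DP', 'clause': 1}),
          Item('Mary', {'cat': 'DP', 'clause': 2})]

# Antecedents for an anaphor in clause 2: match on category and clause index.
print(retrieve(memory, {'cat': 'DP', 'clause': 2}))  # [Item(word='Mary', ...)]

# By contrast, "x c-commands y" depends on where x and y sit in the tree,
# so checking it requires walking y's path to the root -- serial,
# structure-sensitive work that a flat cue match cannot perform unless
# every item redundantly carries its full set of c-commanders as features.
```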

Let me end with a third example, one from some of my own work. As some of you may know, I have done some work on control phenomena. With many colleagues (thx Jairo, Cedric, Masha, Alex), I have argued that there exists a theory of control, the Movement Theory of Control (MTC), that has pretty good empirical coverage and can effectively be derived given certain central tenets of the Minimalist Program (MP). In particular, once one eliminates D-structure in toto and treats Move as a species of Merge, then the MTC is all but inevitable. None of this is to say that the MTC is empirically correct, but it does mean that it is a deeply minimalist theory. I would go further (and indeed I have) and argue that the MTC is the only deeply minimalist theory of control, and that if something like it is incorrect then either MP is wrong (at least for this area of grammar) or control phenomena are not part of FL (here’s a good place to wave hands about properties of the interface). Why do I mention this? Because the MTC is a theory of obligatory control (OC) and, as we all know, this is not the end of the control menagerie. There is non-obligatory control (NOC) as well. What does the MTC have to say about this?

Well, not that much actually.[3] Here’s what MTCers have said: NOC is the by-product of having a pro in the relevant subject position rather than a PRO (viz. an “A-trace”). And this proposal creates problems for the MTC. How?

Well, to get the data to fall out right, any theory of control must assume that, given a choice between an OC and an NOC configuration, grammars prefer OC.[4] In the context of the MTC this translates into saying that grammars prefer OC-style movement to pro binding. Let’s call this a preference for Move over Bind. This sets up the problem. Here it is.

The MTC explains cases like (3a) on the assumption that the gap in the lowest clause is a product of movement (i.e. an “A-trace”). But what prevents a representation like (3b) with a pro in place of the trace thereby licensing the indicated unavailable interpretation? Nothing, and this is a problem.

            (3) a. John1 expects Mary2 to regret PRO2/*1 shaving himself1
                  b. John1 expects Mary2 to regret pro1 shaving himself1

The SMT provides a possible solution (this is elaborated in detail here, ch. 6). Given the SMT, parsers respect the distinctions grammars make. Thus, parsers must also prefer treating empty categories (ecs) as A-traces rather than pros if they can. So, in parsing a sentence like (4), the parser prefers treating the ec as an A-trace/Copy rather than a pro. But if so, this A-trace must find a (very) local antecedent. Mary fits the bill; John does not (taking John as antecedent would violate minimality).

            (4) John expects Mary to regret ec shaving himself

Given this line of reasoning, (3b) above is not a possible parse of the indicated sentence and so the sentence is judged unacceptable. Note that this account relies on the SMT: the parser must cleave to the contours the grammar lays out. Thus, given the grammatical preference for Move over Bind we cannot parse the ec in (4) as a pro and so the structure in (3b) is in principle unavailable to the parser.
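Schematically, the reasoning amounts to a decision procedure for the parser. The sketch below is my own rendering of the logic just described, not an implementation from the cited chapter; the function and its inputs are invented for illustration.

```python
def analyze_ec(antecedents, inside_island):
    """Decide how to parse an empty category (ec), given the grammatical
    preference for Move over Bind.

    antecedents: potential A-antecedents, closest first.
    inside_island: True if the ec sits inside an island, where movement
    (hence the A-trace analysis) is blocked.
    """
    if not inside_island and antecedents:
        # Preferred parse: an A-trace bound by the *closest* antecedent
        # (minimality), yielding obligatory control.
        return ('A-trace', antecedents[0])
    # Movement blocked: pro is licensed, and its antecedent is free (NOC).
    return ('pro', None)

# (4) John expects Mary to regret ec shaving himself
print(analyze_ec(['Mary', 'John'], inside_island=False))  # ('A-trace', 'Mary')

# An ec inside an island (cf. footnote 3) cannot be an A-trace:
# John said that [ec kissing Mary] would upset Bill
print(analyze_ec(['John'], inside_island=True))           # ('pro', None)
```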

Note that this logic only applies to phonetically null pronouns. The parser need not decide on the status of a phonetically overt pronoun, hence the acceptability of (5) with the same binding relations we were considering in (3b):

            (5) John1 expects Mary to regret him1 shaving himself1

I don’t expect anyone to believe this analysis (well, not as stated here. Once you read the details you will no doubt be persuaded). Indeed, I have had a hard time convincing very many that the MTC is on the right track at all. But, for the nonce I just want to note that the logic deployed above illustrates another use of SMT reasoning. Let’s review.

Given the SMT, there are strong ties between what parsers do and what grammars prescribe. One can argue from the properties of one to those of the other given the transparency assumptions characteristic of the SMT. In this case, we can use it to argue that though there is nothing grammatically wrong with (3b) it is, given the MTC and the grammatical preference for Move over Bind, inherently unparsable, hence unacceptable under the indicated interpretation.

I have reviewed three instances of SMT reasoning where claims about processing have implications for the competence theory. We have already reviewed arguments that move from competence grammars to interface properties. As is evident (I hope), the SMT has interesting implications for the relationship between the properties of performance systems and competence theories. This should not come as a surprise. We have every reason to think that there is an intimate connection between the shapes of data structures and the algorithms that use them efficiently (see Marr or Gallistel and King on this topic). The SMT operationalizes this truism in the domain of language. Of course, whether the SMT is true is an entirely different issue. Maybe it is, maybe it isn’t. Maybe FL’s data structures are well designed, maybe not. However, for the first time in my linguistic life, I am starting to see how the SMT might function in providing interesting arguments to probe the structure of FL. It seems that at least one version of the SMT has empirical clout and licenses interesting inferences about the structure of performance systems given the properties of competence systems AND, vice versa, about the structure of competence systems given the properties of performance systems. This version of the SMT further provides a computationally sensible way of understanding minimalist claims about the efficiency of grammatical formalisms.[5] Of course, this may all be wrong (though not wrong-headed), but it is novel (at least to me) and very very exciting.

Last point and I sign off: the above outlines one version of the SMT. There are various interpretations around. I have no idea whether this version is exactly what Chomsky has been proposing (I suspect that it is in the same intellectual region, but I don’t really care if my exegesis is correct). I like this interpretation because I can make sense of it and because it has a pedigree within Generative Grammar (which I will discuss in a proximate future post). Some don’t seem to like it because it does not make the SMT obviously false or incoherent (you know who you are). The above version of the SMT relies on treating it as an empirical thesis (albeit a very abstract one) about good design. Good design is indexed by various empirical properties, fast parsing and easy learning being conspicuous examples. Whether the FL design is “optimal” (whatever that might mean) is less interesting to me than the question of how/whether our very efficient linguistic performance systems are as good as they are because they exploit FL’s basic properties. It seems to be a fact that we are very good at language processing/production and acquisition. Some might think that how we do these things so well (and why) calls for an explanation. One explanation, the one that I have urged we investigate (partly because I believe we already have some non-trivial evidence bearing on the questions), is that part of what makes us good at what we do is that the data structures that Gs generate (our linguistic knowledge) have the properties they have. In short: why are we fast parsers and good acquirers? Because grammars embody principles like c-command, subjacency, Extension etc. That, folks, is the SMT! And that, folks, is a very interesting conjecture that we are just now beginning to study in a non-trivial manner. And that, folks, is called progress. Yessss!



[1] This raises the interesting question of how to model WCO if one accepts the SMT. Why don’t we find WCO effects in online measures like the ones that pop out for SCO?
[2] Actually the argument is more involved: if we model this feature of human memory in something like an ACT-R framework (this is based on implementations of this idea by Rick Lewis), then coding c-command into the system proves to be very difficult.
[3] Well, it does say that NOC must occur where movement is prohibited, say into islands:
(i)             John said that [ ec kissing Mary] would upset Bill
Thus, the ec in (i) is not the product of movement, so not a case of OC, and thus must be a case of NOC.
[4] Were both equally optional, NOC would render OC effects invisible, as OC’s observable properties are a proper subset of NOC’s.
[5] I will go into this in more detail in a post that I am working on.

9 comments:

  1. I don't know if this counts exactly as what you're looking for with the question "What has the SMT done for you lately?", but I work in an environment where the attitudes of most of my colleagues toward SMT-like ideas range from indifference to incomprehension and hostility, and as I once mentioned to you in private, I feel a "minimalist" attitude has made me feel like I'm playing with a larger hand, so to speak.

    I've recently had the task of coming up with a mapping between syntax and semantics that can be used to (robustly) represent certain aspects of incremental parsing, and what I've come up with (and what has been adopted by some of my colleagues) is heavily inspired by, if not quite identical to, the work on neo-Davidsonian semantics by Tim Hunter.

    A lot of work in incremental parsing is inspired by a heavily functionalist theory of processing (uniform information density) that has its own sense of minimality, as well as a good body of empirical evidence. It's promoted by people, a few of whom are vehemently hostile to SMT-like things. But I'm increasingly convinced that, if you *need* a formalism, anything that can plausibly represent the kinds of ambiguities and revised expectations that UID-style theories represent in a flexible manner is going to have a lot of the merge-y and move-y scaffolding, a similar notion of features, and so on...and a very constrained design that doesn't require a lot of unpacking and repacking.

  2. I have slightly lost track of what the SMT is meant to be at this point. Here are two "empirical claims":

    A: The language faculty is an optimal solution to legibility conditions at the interfaces.

    B: Languages are the way they are so that they can be processed (parsed, produced and acquired) reasonably efficiently.

    So there is the boring but necessary terminological question about which of these is the SMT but let's skip that and go to the interesting scientific question: which of them is true?

    So I think A is clearly false and B is very likely true. And maybe (mirabile dictu) we agree on this. If we do then we can argue about how this relates to transparency in the parser etc. but we need to get that straight first.

    But maybe (based on something N said earlier) we need to split B into B1 and B2:

    B1: Languages have some properties that mean they can be processed efficiently.

    B2: They have those properties so they can be processed efficiently.

    So I buy into both of these, but I can see that one could accept B1 and not B2.

    Replies
    1. I buy B1. I do not currently buy B2: so I don't think that Gs have the properties they have SO THAT they could be efficiently parsed, but they have these properties and as a result they can be so parsed. If you also buy B1 then that's great. The empirical issue then becomes to see which properties language needs so that this is true (indeed, we need to see IF this is true, but we agree on that). I believe that some of these are Extension and cyclicity/Bounding and maybe minimality. A system with these properties will run well when plugged into parsers/acquirers. But, for now let's celebrate that we agree on how to put the issue. That's great.

      As for A: as I've noted, I am not sure what this says. Thus I am not sure that it is false. But I do agree it has been very hard to work with, and the B1 claim is interesting enough for me.

    2. So let's call B or maybe just B1 the Weak Efficiency Thesis (WET) because it is pretty uncontroversial.

      One way of exploring this would be to say, precisely, what we mean by reasonably efficient, and then only consider theories of grammar that are efficient in that sense: efficient in the sense that they can be parsed and learned in some technically efficient way. That is roughly my research program. So there are some explicit ideas about some properties that lead to efficient recognition and learning, and we have some theorems that show this. So I understand that you don't like the particular technical notion of efficiency that is used, but let's agree to disagree on that.

      But I don't see how the constraints you mention will work with the WET:
      " I believe that some of these are Extension and cyclicity/Bounding and maybe minimality. A system with these properties will run well when plugged into parsers/acquirers."

      Why do you think that systems like this will run well? What evidence do you have?

    3. That's the topic of my next post. You won't like it, but it will be put out there in the open. BTW, I am happy to agree to disagree. I will address this as well in my next post. Right now, back to reading theses and grading papers.

  3. I'm confused by the discussion of (3) and (4). What's the difference between data that establishes that "grammars prefer OC style movement over binding", and the fact illustrated in (3)? Why does one pertain to the grammar and the other to the parser?

    Replies
    1. The MTC assumes that 'pro' can be freely generated in the spec of a non-finite TP in English. This pro is what's responsible for non-OC readings (it does not require a c-commanding antecedent, need not be local, etc.). So the problem with (3b) being unavailable in English cannot be that it is an ungrammatical structure. It isn't. So why is it out? Well, the idea was that to get the OC/NOC distinction at all given the MTC, one needs a kind of economy story, with movement being preferred to binding where the two are options. If this is a principle of the grammar then transparency (the SMT) says that it is a principle of the parser, i.e. when an ec is encountered in a parse, assume it is a "trace" rather than a null pronoun, all things being equal. So, the parser will treat the ec in (4) as a trace and look for a local antecedent. Thus, the perfectly acceptable structure with pro there will never be available, due to how the parser transparently reflects properties of the grammar (i.e. the economy condition). So the sentence, were it parsable, would be OK, but it is not parsable, and not so for a principled reason given the SMT. That's the idea.

      I don't expect you to believe this (though I kind of like the explanation), but note that the logic requires the SMT and the transparency assumption between grammars and parsers to run.

    2. Let me see if I get this. On the basis of standard non-OC examples, we suppose that pro is possible in Spec of non-finite TP. This appears to be contradicted by the fact that (3b) is unacceptable. So, in response, we suppose that (3b) is actually grammatical but non-parsable (I take it this means it has something like the same status as a triple-center-embedded sentence). Is that right? If so ... dumb question ... why is the original standard non-OC example not similarly non-parsable?

    3. Because the ec is generally within an island and so cannot be analyzed as a trace:
      (i) John said that [[ec washing himself] was imperative]
      This is the typical case of NOC. Note the ec is within an island and so not analyzable by either the grammar or the parser as a trace of I-merge. Hence, no competition and the parser can drop a pro there.

      The problem for any grammatical theory is that the diagnostics of OC are a proper subset of those for NOC. Hence IF OC did not pre-empt NOC you would never "see" OC. Hence there needs to be some economy condition favoring OC when it is available. Of course, the MTC tells you that one place where it is NOT available is inside an island. Here NOC will be possible if it is licensed by something like a pro. This has the nice property of favoring OC outside island contexts, and this seems roughly correct.

      Note that there are still some issues concerning what to do with the arb interpretation of pro (John thinks that washing oneself is important), though I would suspect that these too are pronouns, just indefinite ones that distribute more or less the way 'one' does. But these are orthogonal to the proposal above.

      Hope this helps.
