
Thursday, April 17, 2014

The SMT again

Two revisions below, thx to Ewan spotting a thinko, a slip of the mind.

I have recently urged that we adopt a particular understanding of the Strong Minimalist Thesis (SMT) (here).  The version that I favor treats the SMT as a thesis about systems that use grammars and suggests that central features of the grammatical representations they use will be crucial to explaining why these systems are efficient. If this proves to be doable, then it is reasonable to describe FL and the grammars it makes available as “well designed” and “computationally efficient.” Stealing from Bob Berwick (here), I will take parsing efficiency to mean real-time parsing, and acquisition efficiency to mean easy acquisition given the PLD.  Put this all together and the SMT is the conjecture that the grammatical format of Gs and UG is critical to allowing parsers, acquirers, producers, etc. to be very good at what they do (i.e. to be well designed). On this view, grammars are “well designed” or “computationally efficient” in virtue of having properties that allow their users to be good at what they do when such grammars are embedded transparently within these systems.

One particularly attractive virtue of this interpretation (for me) is that I understand how I could go about empirically investigating it.  I confess that this is not true for other versions of the SMT that talk about neat fits between grammar and the CI interface, for example. So far as I can tell, we know rather little about the CI interface and so the question of fit is, at best, premature. On the other hand we do know a bit about how parsing works and how acquisition proceeds so we have something to fit the grammar to.[1]

So how to proceed? In two steps, I believe. The first is to see whether use systems (e.g. parsers) actually deploy grammars in real time, i.e. as they parse. Thus, if it is true that the basic features of grammatical representations are responsible for how (e.g.) parsers manage to do what they do so efficiently, then we should find evidence implicating these representations in real-time parsing. Second, we should look for how exactly the implicated features manage to make things so efficient. Thus, we should look for theoretical reasons why parsers that transparently embody, say, Subjacency-like principles would be efficient.  Let me discuss each of these points in turn.


There is increasing evidence from psycho-ling research indicating that real time parsing respects grammatical distinctions, even very subtle ones.  Colin Phillips is a leader in this kind of work and he and his (ex) students (e.g. Brian Dillon, Matt Wagers, Ellen Lau, Dave Kush, Masaya Yoshida) have produced a body of work that demonstrates how very closely parsers respect grammatical conditions like islands, c-command, and local binding domains. And by closely I mean very closely.  So, for example, Colin shows (here) that online parsing respects the grammatical conditions that license parasitic gaps. So, not only do parsers respect islands, but they even treat configurations where island effects are amnestied as if they were not islands. Thus, parsers respect both the general conditions that grammars lay down regarding islands and the exceptions to these general conditions that grammars allow. This is what I mean by ‘close.’

There is a recent excellent demonstration of this from Masaya Yoshida, Lauren Ackerman, Morgan Purier and Rebekah Ward (YLPW) (here are slides from a recent CUNY talk).[2] YLPW analyze the processing of backward sluicing constructions like (1):

(1)  I don’t recall which writer, but the editor notified a writer about a new project

There is an ellipsis “gap” right after which writer that is redeemed by anchoring it to a writer in the following clause. What YLPW are looking to determine is whether the elided gap site is sensitive to online parsing effects. YLPW use a plausibility effect as a probe, as follows.

First, it is well known that a wh in CP triggers an active search for a verb/gap that will give it an interpretation. ‘Active’ here means that the parser uses a top-down predictive process and is eagerly looking to link the wh to a predicate without first consulting bottom-up information that would indicate the link to be ill-advised. YLPW show that the eagerness to “fill a gap” is as true for implicit gaps within ellipsis sites as it is for “real” gaps in regular wh sentences.  YLPW show this by demonstrating a plausibility effect slowdown in sentences like (2a) parallel to the one found in (2b):

(2)  a. I don’t remember which writer/which book, but the editor notified a writer about a new book
b. I don’t remember which writer/which book the editor notified GAP about a new book

When the wh is which book there is a significant pause at notified in both sentences in (2), as contrasted with the same sentences where which writer is the antecedent of the gap.  This is because parsers, we know, greedily try to relate the wh to the first syntactically available position encountered; in the case of which book the wh is not a plausible filler of the gap, and the attempted filling results in a little lingering about the verb (*notify this book about…). If the antecedent is which writer no such pause occurs, for obvious reasons.  The plausibility effect, then, is just a version of the well-known filled gap effect, with a little semantic kicker to add some frisson. At any rate, the first important discovery is that we find the same plausibility effect in both (2a), with the gap inside a sluiced ellipsis site, and (2b), where the gap is “overt.”
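
To make the logic of the probe concrete, here is a minimal sketch of the comparison the plausibility effect rests on. The reading times below are invented for illustration (they are not YLPW's data) and the condition labels are mine; the only point is that the same implausible-filler slowdown at the verb is expected, and found, whether the gap is overt as in (2b) or inside the sluiced ellipsis site as in (2a).

    # Hypothetical mean reading times (ms) at the verb "notified"; numbers invented.
    mean_rt_ms = {
        ("which writer", "overt gap"): 350,    # plausible filler, as in (2b)
        ("which book", "overt gap"): 410,      # implausible filler, (2b): slowdown
        ("which writer", "sluiced gap"): 355,  # plausible filler, as in (2a)
        ("which book", "sluiced gap"): 405,    # implausible filler, (2a): same slowdown
    }

    def plausibility_effect(condition):
        """Slowdown for the implausible wh filler relative to the plausible one."""
        return mean_rt_ms[("which book", condition)] - mean_rt_ms[("which writer", condition)]

    for condition in ("overt gap", "sluiced gap"):
        print(condition, plausibility_effect(condition), "ms")
    # Comparable effects in both conditions are what indicate that the parser
    # actively posits a gap inside the ellipsis site, just as it does for overt gaps.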

The next step is to see if this plausibility/filled gap effect slowdown occurs when the relevant antecedent for the sluiced ellipsis site is inside an island. It is well known that ellipsis is not subject to island restrictions. Thus, if the parser cleaves tightly to the distinctions the grammar makes (as the SMT would lead us to expect), then when the potential gap site sits inside an island we should find no plausibility slowdowns for overt wh-gaps, but we should find them when the gap is inside an ellipsis site, for ellipsis is not subject to island restrictions and so should induce filled gap/plausibility effects even there (thx Ewan).  And that’s exactly what YLPW find. Though plausibility effects are not found at notified in cases like (3), they are found in cases like (4) where the “gap” is inside a sluice site.

(3)  I don’t remember which book [the editor who notified the publisher about some science book] had recommended to me
(4)  I don’t remember which book, but [the editor who notified the publisher about some science book] recommended a new book to me

This is just what we expect from a parser that transparently embeds a UG-like grammar, one that treats wh-gap dependencies, but not ellipsis, as a product of (long) movement.

The conclusion: it seems that parsers make just the distinctions that grammars make when they parse in real time, just as the SMT would lead us to expect.

So, there is growing evidence that parsers transparently embed UG-like grammars.  This readies us for the second step. Why should they do so?  Here, there is less current research that bears on the issue. However, there is work from the 80s by Mitch Marcus, Bob Berwick and Amy Weinberg that showed that a Marcus-style parser that incorporated grammatical features like Subjacency (and, interestingly, Extension) could parse sentences efficiently (effectively, in real time).  This is just what the doctor ordered. It goes without saying (though I will say it) that this work needs updating to bear more directly on the SMT and minimalist accounts of FL. However, it provides a useful paradigm of how one might go about connecting the discoveries concerning online parsing with computational questions of parsing efficiency and their relationship to central architectural features of FL/UG.
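
To give a concrete feel for the kind of claim involved, here is a toy sketch of a deterministic, left-to-right parser with a fixed lookahead buffer, the general sort of device the Marcus/Berwick/Weinberg work explored. To be clear, this is not Marcus's actual parser; the mini lexicon and the attachment decisions are invented purely for illustration. The two points it illustrates are that each incoming word triggers only a bounded amount of work (so a sentence N words long is parsed in time proportional to N), and that a constituent, once popped, is never reopened (an Extension-like property).

    DETERMINERS = {"the", "a"}
    NOUNS = {"editor", "writer", "book"}
    VERBS = {"notified", "recommended"}

    def parse(words, lookahead=3):
        """Toy deterministic left-to-right parse with a bounded lookahead buffer."""
        stack = []    # open constituents
        closed = []   # finished constituents; once popped, never reopened
        buffer = list(words[:lookahead])
        rest = list(words[lookahead:])
        while buffer:
            word = buffer.pop(0)
            # Constant work per word: decide, on the basis of the stack top
            # and the bounded buffer, what to do with the incoming word.
            if word in DETERMINERS:
                stack.append(["NP", word])        # open an NP
            elif word in NOUNS and stack and stack[-1][0] == "NP":
                stack[-1].append(word)            # complete the open NP ...
                closed.append(stack.pop())        # ... and pop it for good
            elif word in VERBS:
                closed.append(["V", word])        # treat the verb as a unit
            else:
                closed.append(["X", word])        # anything else: pass through
            if rest:
                buffer.append(rest.pop(0))        # refill the lookahead window
        return closed

    print(parse("the editor notified the writer".split()))
    # [['NP', 'the', 'editor'], ['V', 'notified'], ['NP', 'the', 'writer']]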

The SMT is a bold conjecture. Indeed, it is likely false, at least in fine detail. This does not, however, detract from its programmatic utility.  The fact is that there is currently lots of research that can be understood as bearing on its accuracy and that fruitfully brings together work in syntax, psycholinguistics and computational linguistics.  The SMT, in other words, is a terrific hypothesis that will generate fascinating work regardless of its ultimate empirical fate.  That’s what we want from a research program and that’s something that the Strong Minimalist Thesis is ready to deliver. Were this all that the Minimalist Program provided, it would have been enough (dayenu!). There is more, but for the nonce, this is more than enough. Yay, for the Minimalist Program!!!




[1] Let me modulate this: we know something about some other parts, see here for discussion of magnitude estimation in the visual domain. Note that this discussion fits well with the version of the SMT deployed here precisely because we know something about how this part of the visual system works. We cannot say as much about most of the other parts of CI. Indeed, we don’t really know how many “parts” CI has.
[2] They are running some more experiments, so this work is not yet finished. Nonetheless, it illustrates the relevant point well, and it is really fun stuff.

67 comments:

  1. Quick correction: you write:

    (i) if [X] then we should find plausibility slowdowns except when [SLUICING]
    (ii) plausibility effects are not found at notified in cases like [NOT SLUICING] [and] they are found in cases like [SLUICING]

    (ii) is correct, (i) is (a fortiori) backwards, no? (i) asserts that there should be plausibility slowdowns in case "NOT SLUICING" and no plausibility slowdowns in case "SLUICING". But that would mean the parser isn't respecting islands in the case where it should, and is respecting islands (i.e., ignoring the possible-but-island-filtered-out gap site) in the case where it shouldn't. Their Experiment 4 shows the opposite (i.e. the right thing), unless my brain is completely farted out.

    Unless I'm missing something, the island experiment also serves as a needed control. One might have argued based only on (their) Experiment 2 (ex (2)) that the reason you see the plausibility effect in the sluicing cases is because the gap-filling doesn't respect the structure at all, it's just the linearly recent wh that triggers it. But Experiment 4 (ex (3-4)) shows (yet again) that active gap filling is island sensitive, pulling apart the linear-filling strategy and the grammatical-filling strategy for these sentences. In other words if you're not already convinced that active gap filling should be island sensitive, E4 serves to COMPLETE E2 as an argument that in the case of sluicing we see the appropriate grammar-hewing; if you are it's as your treatment reads, another piece of evidence ON TOP of E2.

    Replies
    1. You are, of course, correct. I've corrected the prose to reflect the correct description of the prediction. Thx.

  2. OK, now that the facts are straight: is there any sense in maintaining this version of SMT at all? Not to be difficult. It says that the parser should respect the grammar, but the whole premise of (SMT being about) systems that "use the grammar" respecting it presupposes that there can be systems that "use the grammar" which are in some sense separate from the grammar. I'm pretty sure this implies what I call the Explicit Grammar Hypothesis - namely, that there is a possibility of distinguishing between the grammar and the systems that "use" it. I.e. the grammar is not merely implicit in the systems that use it, but has some existence apart from them which can be meaningfully characterized. I've always wondered what this hypothesis - always implicit, never argued for - really amounts to. Consider learning, parsing/recognition, and production. Is there really anything meaningful apart from this? do we think that these systems "use" the grammar, e.g., by accessing it from memory? If so, what's the evidence for this? And if not, what's the force of (this treatment of) SMT?

    Replies
    1. I'm surprised that you find the hypothesis hard to fathom. Take Colin's or Masaya's work. There is nothing logically incoherent about finding that the online systems and the off line systems make different distinctions. So, it COULD have happened that, for example, parsers did not respect islands or that ellipsis did. We would even have accounts as to why this might be the case; and in fact Bever once provided them. One could imagine that systems that parse used heuristics or even systems that did not reflect grammatical categorization. Given this perfectly coherent logical possibility, we could distinguish grammars from the system of rules that parsers use to parse a sentence. We might even give them a name, e.g. heuristics. Moreover, it is logically possible that the "heuristics" the parser uses are different from those the producer or acquirer does.

      Now, say we found this. What could we conclude? Well, one thing we could conclude is that the system of knowledge, the data structures that grammars define, are related to these heuristics but not very directly. So, Bever said that we look for N-V-N templates rather than for syntactic structure. Why N-V-N? Well presumably because English is SVO more or less. I doubt that we would look for this in Japanese, for example. Same with Islands: we don't compute these structures on line and use them in guiding our parse but we do compute them later to "check" our parse. Again this is all coherent and what we should say is somewhat up in the air. We might conclude that we have the wrong theory of grammar, or the wrong parser, or that we have both. The SMT is the hypothesis that this does not happen. That we should expect a high level of transparency between the rules parsers/producers/acquirers use and those that the grammar, AS MEASURED USING OFFLINE JUDGEMENT DATA, for example, gives us. In other words, the off line data and the online data tap the very same system and so we should expect them to coincide. Need this have been true? Not so far as I can tell. Hence, it is a hypothesis; what I would call the SMT.

    2. This comment has been removed by the author.

    3. This comment has been removed by the author.

    4. This comment has been removed by the author.

    5. This comment has been removed by the author.

    6. That's true although what I was saying was not that I find SMT or transparency or non-transparency in whatever form hard to fathom, to be clear. I was saying I think the notion of the grammar being separate in a meaningful sense from the systems that embody it is pervasive but unsupported. Maybe that is what you are getting at - i.e.

      (1) acceptability judgment (AJ) - online measure (OM) mismatches could be interpreted as evidence for the Explicit Grammar Hypothesis (EGH)

      The reasoning being, AJ - OM mismatches are parser - grammar mismatches, and how can there be a parser-grammar mismatch if there isn't a grammar (implicit: "explicit"). Maybe that's what you're suggesting I should consider when considering EGH.

      I'm not so sure. First of all, I'm not immediately inclined to interpret mismatches between "fast stuff" in sentence processing (e.g. self paced reading slowdowns) and "slow stuff" (acceptability judgments) as indicative of the fast stuff being non-faithful (e.g. heuristic). Implicit in (1) is that AJs are different from OMs. Okay, duh. But not just different. Reasoning about THE PARSER (C - comprehension mechanism) from OMs requires assuming C is under investigation with OMs and not with AJs - or a different kind of investigation. AJs are seen as the output of C, while OMs give you some moment by moment trace. But you need to assume that there is a there there, that there is a G apart from C, a grammar apart from the parser. And then and only then you can ask (I try to keep it chill so I've stopped frontin') how much C is like G. And remember, we're now talking about the internal workings, not the inputs and outputs. So what I'm saying is, to argue about whether transparency is true in the sense of "the steps you go through in C are very G-like" on the basis ONLY of C and its output (the kernel of the AJs) - which in this case would just be the question of the timing of when island constraints are in force - always for G, maybe not always for C, if yes then transparent if no then not - THEEEEN you've got to assume that you've defined G independently of C. The EGH, or some weaker version of it, but crucially not the IGH - Implicit Grammar Hypothesis, more on that in my other comment below. So that's the first part - the crucial bit of reasoning where it's licit to assume AJs are necessarily more G-like than OMs presupposes EGH or something like it. So it can't also be taken as evidence for EGH.

    7. Under IGH no such conclusion can be drawn. The IGH is different from transparency (T), although there's still an IGH version of T. IGH implies that there isn't even a coherent way of defining T without looking at MORE than one G-implementing system. It implies that the closest thing you could get would be to ask how similar (say) C and P (production) are in terms of how they work. It says that the knowledge that G is is an abstraction of C and P, nothing more. It's the overlap between C and P, once mapped into the appropriate space where they're comparable. So to take a trivial example but a relevant one here you might ask whether C and P both have island constraints "on all the time" or if in some sense P "avoids" islands more actively than C, which (hypothetically) goes for it and then filters it out. If they align, the island constraint is part of G, if not, it isn't - definitionally. That's simplifying a bit I think but you get the idea.

      Both EGH+T and IGH are worlds in which the parser (say) more or less is the grammar, but in very different ways. IGH says "you can have a BIG gap between parser and grammar - by having a big gap between C and P - " and I think learning, L, as well, but this takes a lot of abstraction to see how works, not in the obvious way, so I'll leave it for now - " -- having a big gap between C and P -- but STILL the parser and the grammar ARE/can be the same thing in a certain sense, in the sense that G is merely an abstraction of C."

    8. Back to SMT. SMT_{Chomsky}:

      (2) FL is an optimal solution to minimal design specifications

      Your idea is it seems to me to get at properties of FL, which is predicated over G's, by looking at C's (or P's) and seeing how efficient it (they) is (I'll leave that one so your C can suffer - G's are above suffering). And here I think "efficient" is being cashed out as "not doing a bunch of non-G-like stuff". Under IGH, I think it still makes sense to look at the overlap between C and P as being "the interesting part". Call it FL - fair enough. It's a solid, though not rock, bit of reasoning to see that overlap as the crucial part for language, and dub it FL. Although we could call it FHQWGADS just as easily.

      So the question - does this means of investigating SMT rely on EGH? That's what I meant. I think the answer is "kind of." I think under IGH it's possible / more plausible that the gap between C (say) and G is due to things that are orthogonal to the "how implementable is this G" question/logic that SMT_{Norbert} (not a distinct SMT at all, but rather a means of investigating SMT - sorry I got this wrong before) relies on - i.e., not facts about G but annoying facts about the peripheral systems - but I don't really know, and I suspect that actually SMTN is equally prone to this caveat.

    9. (I said "that's the first part" - but there is no second part. Sorry.)

    10. I don't think that the SMT relies on the EGH, though that would be one explanation for why all systems that use grammars make the same distinctions, viz. because they are all using the same thing. But, one could imagine making this the case without locating the grammar outside of its various users. I have nothing against thinking this to be so, but I have no present evidence to think that it is. What I am pretty sure of is that we don't want to identify Gs with what we find in parsers. Colin has evidence bearing on this. Recall, he finds filled gap effects into some subject islands; those that license PGs. This means that the parser can and does establish dependencies into these islands AND unless there is a PG downstream speakers will judge these perfectly parsed sentences to be unacceptable. I assume that to describe this we need to say that a parsable dependency is nonetheless ungrammatical. Is this incompatible with the SMT? I don't think so, for the question is how does the parser deploy the relevant information in real time: Given the info the parser has, there may not be enough to conclude that the subject island is indeed an island and so dipping into it would be ok. This would contrast with relative clauses where there is enough local information to conclude this. So, the parser is not the grammar, rather the parser uses the grammar and information flow may affect what the parser concludes. At any rate, I suspect that this is all too far afield.

    11. Right, when I said "parser is grammar under IGH" I was sloganizing, I meant "IGH says grammar is the parser/producer/learner overlap so in a certain sense parser and grammar are inseparable but it's a very different kind than if you're thinking of grammars as separate entities". But anyway for the relation between SMT and EGH see my last comment below. I agree that SMT and EGH are totally separate beings. My musing is on _YOUR_ SMT package, which relies on transparency.

  3. This comment has been removed by the author.

    Replies
    1. I think Ron Kaplan has motivated it somewhere on the basis that parsing and production require different algorithms, but appear to involve the same underlying competence; I don't think you can learn to produce language in a way that is 'free but appropriate to circumstances' without being able to understand similar productions by other people (shared grammar & use rules being the standard of similarity).

  4. Shared competence is fine, but that's different from explicit competence as far as I can see. Shared competence says, give me production mechanism P, comprehension mechanism C. There's something you could call the grammar G(P) and something you could call the grammar G(C) and they're the same. Fine. Of course.

    But now consider what we're saying when we ask "does P [/C] respect G(P) [/C]?". Hard to say exactly what that means, because in a certain sense it seems tautological. We need another ingredient - Berwick and Weinberg's notion of "transparency". How much does P "differ" from G(P)? You'd need some more details to spell out exactly what that means but we can take the Marcus parser as a positive example - there was a negative example in the book if I recall, I don't remember what it was - N will.

    OK. Now. There are two ways of fixing the function G() (which needs to be overloaded for P/C/etc but whatever). One is to say "G is fixed by virtue of the fact that there is an explicit grammar which is separate from P or C" - as I suggested, one example of this might be loading the grammar from storage. Think of loading the Java virtual machine (P or C) and then that loads the program up (G). There might be some other way to spell this out neurophysiologically other than memory access.

    But anyway there is a completely OTHER way you could fix G(.), which is, G(.) is simply DEFINED as the set of all the things I can find in common between P and C; but it is completely implicit in both. The idea would be that, as I learn a language, I am simply learning two separate programs under the rule that they have to be consistent in a bunch of ways. Again, you'd have to do some work to figure out exactly what this means, and where you would ever stop with this task of finding "all" the things that P and C have in common. But you get the idea. G is just the bit where P and C, mapped into the appropriate comparable space, overlap. They both respect islands, for example.

    The slogan - are language mechanisms centralized planners (Explicit Grammar Hypothesis), or friendly anarchists who get along by copying each other (Implicit Grammar Hypothesis)?

    In both cases, something is shared. But SMT_{Norbert} takes a very different meaning on under these two ideas about competence. Under the IGH, it really just works out to "how close is the correspondence between C and P?"; under EGH, that's a derived question - the basic question is "how close is the correspondence between C and G and between P and G?" With EGH, the answer could be "not close at all" and there would still be a grammar. With IGH, if the answer is sufficiently far from "close" then the only overlap might be in their extension (the set of acceptable/generable sound-meaning pairs). We would find ourselves in the position where all the principles of grammar we've discovered are actually principles of (I guess it would be) parsing.

    Replies
    1. This comment has been removed by the author.

    2. I am not sure that I fully follow, so let me ask a question: Say we find online effects analogous to filled gap effects in production, and say that we find that these incorporate island sensitivity in the way it appears that parsing does. What would we say? I would be inclined to say that these processes all use the same system of rules in parsing and producing AND these are the same rules that govern off line judgements. Say that we then went into real time acquisition and found that kids treat islands as different as well (e.g. won't generalize rules into islands were they movement rules). Then these three grammar users all make the same distinction. Say that for ALL relevant grammatically proposed restrictions these three users make the same distinctions. What would we say? I think we would say that they all use the grammar transparently (very transparently) and that's why they all act the same way. Does this mean that the grammar is something "independent" of the systems that use them? I don't know. I don't even know if that is relevant. What it does show is that they all share the same system of knowledge. For now, that would suffice to make the SMT an interesting thesis for I am interpreting it as saying that this is exactly what we will find.

      Defending this position, I believe, will be very hard. I suspect that we will soon enough find ways that grammatical mechanisms are not reflected in parsing data (think 'the key to the cabinets are…'). The SMT bet is that this is very much the exception. I find it heartening to note that for the major features of grammar, c-command and locality, this seems to have some non-trivial support. So is the SMT true? Dunno. Is it an interesting conjecture? Yup. That's good enough for me right now.

    3. Yes yes, who could follow such stuff? Not even my hairdresser.

      OK, the answer should clarify I guess. And recapitulate some of what I wrote in the other comments and then maybe there can be one comment to unify them all at some point. So: I take this

      "What it does show is that they all share the same system of knowledge."

      to be at least compatible with IGH. EGH would add "by virtue of the fact that they interact with some separate representation of G" again, e.g., by accessing it from memory. My claim is that people tend to think of G's in the EGH way and that's no good because there's no reason to think that.

      And the point here was that to even formulate your SMT as "does online processing do what G does" implies EGH. But now what you've got is

      "[C, P and L all work the same and fo]r now, that would suffice to make the SMT an interesting thesis for I am interpreting it as saying that this is exactly what we will find."

      Without the "for now," that's the IGH version of transparency. It's really really different from the EGH version. It says "for grammars to be transparent it is DEFINITIONALLY sufficient for comprehension/production/learning to 'work the same' in some well-defined sense." I stress the tangential point that in fact what counts as "the grammar" in this view is merely an abstraction of the various different linguistic systems, not necessarily meaningful - we could have chosen another abstraction for which transparency wouldn't hold - or the weakest one, the extension, where we hope it does.

      So I think you got about as far as I did. But now I can finish the challenge above to this version of SMT. First let me just ask you to clarify - is it the case that you see your reasoning package here as being in short "transparency is evidence for SMT"? That's what I concluded before. "Transparency is evidence for SMT because it shows that the grammar is usable, therefore efficiently usable." Is that it? If so then we can plug and play the EGH and the IGH versions of transparency and see what happens. I think something.

    4. I think I am saying that if SMT is right then we should find that the systems that use the grammar will display the effects of this use in their real run time. Why? Well, the way I am thinking of this is that part of the reason that these use systems run as well as they do (are very good at doing what they do) is (partly) in virtue of linguistic representations having the properties they have (e.g. subjacency, phases, monotonic derivations, minimality etc.). If this is so, we should expect users to respect the distinctions the G makes in the process of doing what the user does. So, if this is correct, we should expect to see effects of these representations in real time data. Given that this is expected, finding it, I think, supports the SMT, the thesis that users use grammars with the properties competence theories attribute to them to do what they do well.

      So first question, do we indeed find this? I take the Colin and Masaya stuff (and Dillon's, Sturt's, Kush's, Wagers's etc) as evidence that we do actually cleave to the distinctions Gs make in actually computing the structures of sentences as we hear them. So parsing fits with the SMT (at least in part). How about the other users? Don't know. Production is hard to investigate. Acquisition? Well here Wexler and Culicover and later Berwick argue that there is an intimate connection between learning and some grammatical restrictions in Gs that UG imposes. The results are not good enough yet (degree-2, rather than degree-0+) but they are suggestive. Berwick actually ties this to the fact that G learning is dependent on parsing with something like a Marcus parser. At any rate, this is the right way to go. If it can be made to work, it would seem to confirm the SMT again.

      Last point: I take the SMT to be a very productive hypothetical. I suspect that it is not entirely right. However, it is useful for guiding research and it seems to have done so in some areas productively. There are also well-known anomalies (e.g. The key to the cabinets are…). But for now, it has surprised me that islands should have online effects, as do binding locality conditions and c-command requirements. So, let's see where this goes. On one version, the SMT becomes a modern version of the DTC, which, I now think, we gave up on too quickly.

      Last point: It seems to me that the IGH and EGH are refractory to these issues. The SMT is compatible with both, or so it seems to me. Maybe I am wrong however. That said, should all use systems make all the same distinctions online that would be an interesting thing to discover. Let me ask you: would that have an implication for the choice between the two, IGH vs EGH? If not, what would?

    5. (1) All systems do/do not make the same distinctions
      (2) The grammar is/is not in a meaningful sense distinct from these systems

      No, indeed, it seems (2) is independent of (1). But now add

      (3) [SMT] Grammar is an optimal solution to the problem of mapping between certain interfaces

      And now add

      (4) [SMTN] The more of the properties of grammar that can be found in efficient systems, the more the SMT is confirmed

      That's how I'm summing up your first paragraph. Now, I think this will be tangential, but I actually just realized that the systems referenced in (4) don't need to have anything to do with language at all. But we were talking about language systems anyway, say parsing. And we were assuming that by doing online measures, we were getting measures of the "efficient part" of that system.

      And, to be clear, (1) is related to (4) because of (5):

      (5) The only way we get any information about any properties of grammar is via cognitive systems that do languagey stuff (whatever turns out to be the case for (2)); so we can only operationalize (4) by looking for cases of (1), where at least one of the systems in question is efficient.

      But again, that's tangential: just to put the whole picture together.

      So, now I can answer your question. I think (2) interacts with (4). Suppose it were the case empirically that:

      (6) All efficient systems implicated in language are in utter disagreement with respect to how they work.

      In order for this world to make any sense, something like (7) would have to be the case:

      (7) The inefficient parts of the systems that we're looking at when we find out properties of grammar are the ones that are in agreement.

      So, the first stages of parsing are heuristic, the first stages of production are shoddy, and, crucially, in very different ways. In sum, everything would be dysfunctional except for some eleventh hour superman systems who come along to save the day and inject island constraints. For example.

      In this case, (4) would lead us to the conclusion that SMT is false if IGH is true. But under EGH, (4) would have failed us, because we would still have to hold out the possibility that grammar is, in principle, a very good solution to the problem of mapping between the interfaces, except that when that very good solution is combined with the problem of getting that information through the external systems, to/from the form where it can be used by grammar (the interface), that efficiency consistently evaporates.

      To bring it back to earth, EGH allows for the possibility that the Grammar exists and decided to encode its inputs and outputs in ways that are in principle usable by the relevant systems - but godawful for the systems in practice - and then chose the optimal way of mapping between these two sub-optimal encodings. But IGH won't let you do that, because there's no there there.

    6. I would point out that this conclusion completely contradicts the interaction I alluded to before - whereby SMTN presupposed EGH (in something other than its wording). It's the reverse, sort of. SMTN gets stickier if EGH is true. I think.

    7. Whew: more involved than I had dreamed it would be. I am still not sure I get the point but let me try to say it another way.
      All systems make distinctions, and, as you note, these need not be the same. The grammar does as well, and, as you say, this need not be the same as those the users make (though they can be). Now to (3)/(4). The locution you use is Chomsky's, where the SMT is thought of as an optimal solution to bridging AP and CI. This is not a view, I believe, that he now endorses, as he has started downplaying the AP relation. At any rate, the proposal I made is that the SMT is saying something about how things that use grammars look and it is making two claims. First that users embody grammars transparently and second that it is in virtue of so embodying grammars that they are good at what they do.

      What do I mean by "good at what they do"? I intend that part of the reason that Parsers are efficient (incrementally parse quickly) and acquirers are able to do what they do (learn languages easily) is that the representations in Gs that UG makes available have the properties they have. So, because Gs have locality conditions of the subjacency variety they can be used by parsers that humans deploy to parse sentences quickly incrementally. This is the KIND of argument B&W make that I am proposing the SMT generalizes.

      How do we operationalize this version of the SMT? I think we have paradigms of how to do this already: (i) we look to see how incremental systems embody Gs. How transparently do they do so? One answer is that they do so VERY transparently, and we find that online dependency formation (that's all a parse is right?) is regulated by the kinds of relations and conditions that Gs license. Another answer is that the online incremental process is less transparent, e.g. it builds DPs and VPs but makes no distinction between dependencies into islands and those that are not into islands. The SMT leads us to expect that the relation will be very transparent; in the best case we expect to find a match between parsing probes and grammar probes (i.e. late results and early results match (btw: I doubt that this will hold up)). Say we find some of these. This sets up the following question: (ii) do the features of the grammar that the user exploits explain (in part) why the user does as well as it does? Note: we have some sense that parsers are "efficient" independently of whether they embody grammatical properties. They are "fast," "incremental," generally "reliable," etc. That means that the algorithm that underlies the user routines is a good one (in some intuitive sense to be sharpened with research). Now we can ask to what degree its virtues depend on the fact that it crunches representations with the particular formats the competence theory proposes. For example, would the Marcus parser do as well without a finite left bound? Or if popped expressions could be re-opened? If no, then we can say that these features are "efficient" in the derivative sense of being part of a use system that does what it does well in virtue of having features like these.

      Now, I am not sure how this all relates to your points above. I am pretty sure that it is orthogonal to the E/IGH issues you raise. I also agree that evaluating the "truth" of the SMT will be indirect, hence the programmatic nature of the SMT. Moreover, I wouldn't have thought it a proposal worth entertaining (in fact until recently I DIDN'T think it such) until I started thinking about the recent psycho stuff in conjunction with the older findings by Berwick, Weinberg, Marcus etc. With these paradigms in mind, it struck me that there is a real project here and that the bits and pieces are starting to emerge that show that it is empirically explorable in non-trivial ways. Personally, I have been surprised that online measures have made G distinctions as nicely as they have. I am expecting a rocky future.

    8. OK yes. I think your version of the SMT skirts the issue. My point in the previous comment was that if you believe grammars are separate from the systems that implement them, you could have a version of the SMT that's ONLY predicated over grammars, and not the systems that use them. Like (3). And then there would be some operationalizational daylight. It seems like you've formulated your SMT to be compatible with IGH, not necessarily EGH.

    9. What I mean is not that it's INCOMPATIBLE with EGH. Rather I mean that SMTN is not the only way the broader SMT could be true if EGH is true. If IGH is true, then SMTN is I think all there could be of SMT. There could be a different SMT if the G is E.

    10. One more thought on that: given that the relation between SMTN and SMT (which you might call SMTCO, for Chomsky/old) wasn't immediately transparent to me, and some of the bits about "efficiency" weren't transparent to Tim and Alex (or me but I think I sorted it out for myself with the fast systems/slow systems part, not sure if that made much sense to the external world). ... given this, I would vote for a further post with more rumination/exegesis narrowly on these two points, i.e., the logic of your SMT.

  5. So (to pose the question) is there any empirical evidence for one treatment of competence over the other?

    And to add a bit, under IGH, we'd need to say something more about exactly HOW production does islands in order to assess the SMT in this case (I think).

  6. I am confused here about the term "real time". From my conversations with Bob Berwick I think he means some technical sense of real time; i.e. recognized by deterministic (perhaps non-deterministic?) Turing machines in linear time with bounded delay. That is a particularly stringent notion of computational efficiency that is, I think, incompatible with the approach you are taking, where real-time means something else, but I don't know what.

    Replies
    1. I think that I would like this stringent condition. Something Berwick suggests at times is to take the Marcus parser as a good proxy for what the parser looks like, for it runs in real time: a sentence N units long takes N units of time to parse. We parse it as we hear it. That would be a good thing to aim for. No?

  7. Norbert wrote: There is nothing logically incoherent about finding that the online systems and the off line systems make different distinctions. So, it COULD have happened that, for example, parsers did not respect islands. [But that is not what we see.]

    While I agree with the gist of this, the logical possibility that is being ruled out seems to me to be a very strange and complicated one (and this strangeness and complicated-ness often seems to slip by unmentioned). Suppose that some studies of word-by-word reading times (or whatever) revealed to us that "parsers did not respect islands". Then something else would be required to account for the fact that island violations produce unacceptability. I don't think it's enough to just say that "well, while the *parser* doesn't respect islands, the *grammar* does, and it's the grammar that's responsible for the 'slower' things like acceptability judgements". This would have to be supplemented by some story of how the acceptability judgements themselves are computed, after the parser does its non-island-respecting thing: this sounds to me like we are in effect positing two distinct parsers, only one of which is incremental. (What is it that produces acceptability judgements if not a parser of some kind?) One way of doing this is Bever's, which takes grammatical derivations to describe the bottom-up operations of the "second parser", but (and I suppose this is my main point) this goes beyond simply making a distinction between a parser and a grammar. So is this two-parser view being implicitly assumed every time we entertain the logical possibility that "parsers did not respect islands"?

    Norbert wrote: What I am pretty sure of is that we don't want to identify Gs with what we find in parsers. Colin has evidence bearing on this. Recall, he finds filled gap effects into some subject islands; those that license PGs. This means that the parser can and does establish dependencies into these islands AND unless there is a PG downstream speakers will judge these perfectly parsed sentences to be unacceptable. I assume that to describe this we need to say that a parsable dependency is nonetheless ungrammatical.

    On a similar note, I'm a bit confused about what is meant by a "parsable dependency" here (and I suspect this may be related to Alex C's question about the term "real time"). It seems to refer to "a dependency that is not rendered impossible due to memory constraints [or other implementation details]". But what about the ungrammatical subject-verb dependency in "You is tall". That is presumably parsable in this sense, right? If so, then you don't need anything as involved as Colin's PGs experiment to make the general point that parsable dependencies can be ungrammatical.

    Of course the important point of that experiment was that those dependencies had been claimed by others to be unacceptable due to non-parsability rather than due to non-grammaticality. So the finding that those particular dependencies were parsable was relevant to that debate. (Indeed, it seems to me to be a genuinely knock-down argument of a kind that we very rarely get.) But it seems to me to be a much more obvious and less subtle point that there exist dependencies that are parsable and yet ungrammatical. Am I misunderstanding the terms?

    Replies
    1. The background - my interpretation of N's SMT reasoning is that "the more nuancèdly/completely the off-line judgments are captured in on-line measures, the better reason we have to believe the SMT"; because (axiom) on-line measures show us things about "efficient" computations ("real-time" - yes I agree not much more specific, but not Bever Stage 2 if there is such a thing, I'll take that as something to hang on to) - N has yet to correct me on this reconstruction of his logic, so now that's what it is.

      So leaving aside the "grammar" association for the offline judgments, just saying "fast system" and "slow system", accepting that they may (Bever) or may not be qualitatively different - I _think_ N is saying

      (i) there is OBSERVED fast system behavior which is different from slow system behavior

      whereas (given that there's no illusion in "You is tall") that case would be

      (ii) POTENTIAL parses which are different from the slow system behavior

      note the vagueness of "potential" and the lack of tying it to the fast system. That's because I think that case is "parseable" in a different way. It differs in a lexical item _and not anything else_ - so if for example I were to parse the POS tags without any agreement then it would be fine. But there's no evidence that any receptive system does this. So it wouldn't count.

      So then here's the obvious question: why is this case not a failure for SMT? I think that's what's muddying up the reasoning. Why is "fast system tracks islands for YLPW" nice, but "fast system kinda misses the boat on the precise nuances of certain islands but does it in a way that sorta makes sense if you consider that those islands are exceptional in certain other cases for the slow system (but crucially not the case under investigation!)" .. why is that ALSO nice??

    2. (by "this case" I mean the PG-type subject island violation cases)

    3. @Tim:
      I propose taking the SMT to commit hostages to properties of systems that use grammars. Parsers use grammars to parse in real time (as we hear the sentence, incrementally). The proposal is that (i) parsers use these grammars to produce a parse and (ii) that they are able to parse incrementally in real time (as we hear the words) in virtue (at least in part) of the structures that these representations have. Given this view of the SMT we can look for two different kinds of evidence to support it.

      First, we can look to see if incremental parsing honors the distinctions Gs make. There is work suggesting that they do and they do so in surprising detail (that's the upshot of Colin's PG paper and Masaya's ellipsis slides).

      Second, we can ask if having representations that embody locality effects like subjacency or c-command or principles like monotonicity are critical in letting users do what they do well. I interpret earlier work by Berwick, Weinberg, Wexler, Culicover, and Marcus as suggesting that certain features that Gs have are very important in allowing efficient parsing or easy learnability. Now, I do not think that these results suffice. But they are very suggestive of how one might go about looking at these matters. For example, something that interests me now: is endocentricity useful for users? Maybe. Bob reminds me that Carl de Marcken argued that endocentricity has beneficial effects in both domains.

      So that's the thesis. Could it be wrong? Sure. Indeed, it has been assumed to be wildly false for a long time, since the demise of the DTC. If it were wrong we would not expect to find footprints of the grammar in its real time use; say, we would not expect to see FINE grammatical distinctions honored in incremental use.

      Last point: by "parsable dependency" I meant something anodyne (I hope): In Colin's stuff one finds filled gap effects into Noun complement subjects (in contrast to relative clauses). I assumed that the presence of such effects indicated the attempt to forge a link between antecedent and verb. Thus, it is an online indication of a successful antecedent-gap dependency being forged. What is interesting is that the capacity for doing this is compatible with a sentence being judged unacceptable if there is no downstream PG. Thus, we cannot reduce grammaticality to parsability for we can successfully parse (set up the requisite dependency) even though we will judge the sentence unacceptable. In the case you cite, we have not successfully parsed the sentence for there is no successful parse of it. We can attribute our unacceptability reaction to its not having a good parse. True we understand it, so we have done something. But we have not executed a successful parse of the sentence.

      I hope this helps. At any rate, I am glad that the SMT is generating discussion. I think it is a really interesting conjecture, one that unifies lots of current linguistic work from different angles. And that, I believe, is good.

    4. I'm still rather baffled. Not sure what else to say so let me try saying the same thing again just in a different way.

      As you say below in the response to Greg, the question is *how quickly* the knowledge (say, knowledge of island constraints) gets deployed in real time. There is no question of whether or not it gets deployed; if it were never deployed, we would not observe the eventual unacceptability judgements. So the knowledge is deployed at some point. The thing that deploys it is, pretty much by definition, a parser. So there is a parser that respects those constraints (island constraints or whatever). That does not seem to be up for debate.

      The potential finding that is sometimes described as "the parser doesn't respect island constraints", would therefore I think be better described as "there is an additional, first-pass parser that doesn't respect island constraints". This is not impossible or a downright crazy idea, but it does seem like the more surprising possibility. Doesn't it?

      (Perhaps some of the temptation to think of this two-parser view as the less surprising one stems from the fascination at how a parser would manage to achieve what it does on the one-parser view, island constraints and all, so quickly. I share this fascination. It is not obvious how this is achieved, and I would like to know how it is. But it seems "even less obvious" how things would fit together on the two-parser view.)

    5. I guess we are surprised by different things, but that's ok. I am happy with your version. Note that there is no reason to think that the second late parser need actually parse left to right at all. In fact, it could simply try to "generate" the structure bottom up to see if the input can be parsed correctly. This possibility would make it inadequate as a real-time parser I would suppose but would serve for rendering "grammaticality" judgements. So, what surprised me was not that we could give each sentence a parse but that we could do it in real time as the words came in. Now, given that the grammar "generates" sentences bottom up, it is likely that whatever parsing is, it is not generation in this sense given the obvious facts about incrementality. For the parser to be interesting it must be, in some sense, left-right. Now the question of whether the grammar can be used productively by a left-right parser is more interesting. We know that to do this, the parser must make more than a few top-down "predictions." Ok, how faithful are these "predictions" to what the bottom up generation would license? Well, very faithful. How does a left-right parser use a bottom up generative system? Good question. But it is in part by making certain kinds of surprising predictions. So might the first pass parser that operates left-to-right work on principles different from those of a bottom up generation device? Sure. Why not? At any rate, that's what got me surprised.

      So, yes, if we mean by a parser something that gives an analysis of the string into something like a structured object, of course acceptability judgements require parsing. However, the very late nature of this judgement allows for parsing to proceed in exactly the way we believe that real time parsing CANNOT operate.

      Last point: as Paul Pietroski keeps reminding me: what we really want from our on-line parser is something that relates phons to sems. Currently we take something like an annotated S-structure to be a proxy for sems. But they are not sems. So we should also start trying to generate these, not S-structures. But for now, we can, perhaps, abstract away from this issue.

    6. One more small point: I suspect that the problem I have had expressing what I have in mind comes from the fact that "parse" has two different uses, informally. For some it simply means assigning a structure to a string of words. To others it means assigning such a structure as the sentence is heard. On the second, parsers are left-right thingies; on the first they have no obvious preferred direction. The parses that our standard grammars assign are generally done by bottom up generation. This is not a good model of parsing in the second sense.

    7. I share Tim's aesthetic judgements (obviously).
      1. Crocker and Brants have suggested that a correct parser which maintains a very small number of candidate parses might explain the "dual nature -- generally good and occasionally pathological -- of human linguistic performance." It is simple to impose such a narrow beam width on parsers, including those for minimalist grammars.
      2. Top-down parsers, including those for minimalist grammars, work their way through the string from left to right. The difference between the `direction' of the grammar and that of the parser is well-understood. Stabler has a recent paper discussing top-down parsing in MGs.
      3. A directly compositional semantic theory assigns meanings to the intermediate structures generated by a parser. We have a directly compositional semantics for MGs (my paper on montagovian dynamics), and so a parser which incrementally assigns meanings to strings is trivial to implement.
      4. There are many different ideas about how to link the actions of a parser to actual data, both on- and off-line. Late measures of complexity are perfectly compatible with correct and incremental parsers.

    8. corrigendum: left-to-right-ness of a parser is not tied to its top-to-bottom-ness, of course.

    9. Here's the way I understood (I think wrongly) the thrust of Alex C's comment in light of this, tell me if this makes sense:

      (1) The fast parser has to be fast
      (2) but there has to be a lot of stuff that's therefore not handled right by the fast machine (let's not say parser or grammar) because you reject AP ... and thus a bunch of stuff is only computed in the slow machine
      (3) but taken to its logical extreme the SMT would say there's no daylight between the fast machine and the slow machine

      So, given (3) and the other comments about not identifying fast and slow, we're not after the logical extreme. So this is more complicated than I thought. Now I'm asking, what (if any) important properties of language should lead us to reject AP, and, meanwhile what - presumably DISTINCT important properties of language should lead us to accept SMT?

      I don't think that's a topic to work out today, but I think there's got to be a tension there unless I've got this all wrong.

    10. @Greg. Re 4. I don't believe I suggested otherwise. After all that's what left corner parsers do. My point was that this needs showing and that it is not hard to imagine how a fast parser might not be responsive to G-like restrictions while a slow parser might.

    11. @Ewan I think I have the same difficulty. "optimal" means best, not just good, so what sense of "better than all the alternatives" is actually being used in the SMT?

      To be concrete, there is a class of the NTS languages, which are a subclass of the context-free languages that can be parsed incrementally in real time, PAC-learned efficiently from positive data and so on; but natural languages can't be described by them. So in what sense are natural languages "better" than NTS languages?

      I really find it hard to take seriously a notion of optimal efficiency for parsing that allows for a correct parsing algorithm which is intractably hard (viz not in P).


    12. "To be concrete, there is a class of the NTS languages, which are a subclass of the context-free languages that can be parsed incrementally in real time, PAC-learned efficiently from positive data and so on; but natural languages can't be described by them. So in what sense are natural languages "better" than NTS languages?"

      Less concretely, I think that's exactly the question I asked Norbert above. In any case, that's the question I wanted to ask, and I share Alex's feeling that "optimal" must have a rather peculiar meaning here that warrants some more explanation.

    13. +1 to this formulation of the problem ("in what sense are natural languages 'better' than NTS languages?"). The rational analysis part of me would like to at least try to work out in what sense that could be true, for the normal reasons (it will be progress even if it's false). But this, I think, puts the difficulty of the question into relief. It seems obvious that it's not as simple as enumerating properties of natural languages and finding they are uniformly processed efficiently: succeeding unqualifiedly at this could put us in a real pickle.

    14. I suspect the answer to Alex's question will be something like "there is some property P such that NLs have P but NTS languages do not have P, and NLs are the optimal solution for [parsing + learning + P]." Clearly this game can be played forever (or we can win immediately by setting P = "equal to NLs"). On the other hand, if one imposes conditions on what counts as reasonable properties, then this is not vacuous, and one can legitimately ask whether there are some reasonable properties such that NLs are the best among classes with those properties. I think that this is the SMT game.

    15. I wouldn't call "setting P = 'equal to NLs'" a winning move (not even a vacuous winning move) -- it's really just ending up with a circular non-explanation of the form "NLs are the optimal solution for [parsing + learning + being equal to NLs]".

      I think there is something more going on here, though, and that's an odd equivocation on NLs: not just the boring and all too familiar equivocation between grammar and set of expressions, but one between "something that provides a solution" (NLs as grammars that we can use in parsing, say) and "something that is given independently of the solution" (NLs as objects of parsing and learning, with respect to which we can ask questions about optimality).

      But I thought we all agree that when it comes to language, the objects for my ability to deal with language are just the products of your ability to deal with language? And that there is no NL in the latter sense?

      But then I don't think you can explain the kind of grammar we have (answering "Why are NLs not regular, or NTS, or X, but Y") by pointing to optimality for parsing or learning -- that's essentially just doing what Greg caricatured so nicely: NLs have the properties they have because they are NLs, i.e. because they have the properties they have. And because they do have the properties they have, they are actually really good for parsing their own outputs.

      On second thought, I think I've lost track of this discussion and am simply not getting the point at all. But I'd still really like to know in what sense natural languages are 'better' than NTS (or for that matter, regular) languages.

  8. There is an informal sense of real time which means taking n seconds to process n seconds of input; so call this RT-I.
    There is also a formal sense (actually several), which involves multi-tape Turing machines, linear bounds and finite delay; call this RT-F.

    So we observe that humans can in general understand language in RT-I, subject to occasional slowdowns, pauses and complete failures (think garden-path sentences, multiple sentence embeddings, etc.). Call this empirical observation (E).

    So one argument (AR) is that E implies that the set of grammatical strings of the language is in RT-F.

    That may or may not be a good argument, one can say a lot about it, but it doesn't have anything to do with the SMT right?

    The weaker argument (AP) is that E implies that the set of grammatical strings of the language is in PTIME (the class of efficiently parseable languages).

    So clearly if you reject AP you must reject AR (since RT-F is a subset of PTIME). And I think that you and Bob B reject AP.
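    To keep the two arguments straight, here is a compact restatement; the only assumption added is the standard inclusion of the real-time recognizable languages in PTIME:

    \[ \mathrm{AR}:\; E \Rightarrow L \in \text{RT-F}, \qquad \mathrm{AP}:\; E \Rightarrow L \in \text{PTIME}. \]
    \[ \text{RT-F} \subseteq \text{PTIME} \;\text{gives}\; \mathrm{AR} \Rightarrow \mathrm{AP}, \;\text{hence}\; \neg\mathrm{AP} \Rightarrow \neg\mathrm{AR}. \]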

    1. Wrt AR:

      "the sets of grammatical strings of the language are in RT-F.
      . . . That may or may not be a good argument, one can say a lot about it, but it doesn't have anything to do with the SMT right?"

      - actually, from what N and I have triangulated throughout the comments, I take the first statement to be more or less what he understands as the SMT, or at least an immediate consequence of it.

      If N will let me summarize what I got from our various discussions, I believe his version of the SMT is (restricting attention to parsing; this shouldn't change anything):

      [SMTN] The properties of language are "optimal" in the sense that they're there to facilitate efficient parsing.

      So we should expect to see a correlation between efficient parsing and the properties of language, by which is meant the various nuances like island constraints.

      It might be a useful public service to highlight and spell out the tension between !AP and AR a bit further, because it seems to me that the psycholinguists N is referring to, who are trying to demonstrate how nicely on-line measures (and thus efficient mechanisms) track the grammatical nuances of the language, are apt to wind up taking this contradictory position sooner or later if they're not careful.

    2. This may just be me being slow, but wouldn't languages that ensured linear-time parseability and avoided any kind of garden pathing (basically, languages that, if they were spoken, would justify an empirical observation like Alex's without the "subject to occasional...") be even more "optimal" in that sense? And if so, how come we aren't speaking these kinds of languages? And if not, why aren't these languages more "optimal"?

    3. Thanks to Ewan for unpacking my thoughts, but, at the risk of embarrassing myself, let me say something here. I agree that we should take real time parsing PERFORMANCE to be a fact, as Alex observes. The aim is to explain this fact. It is due to a combination of factors, ONE OF WHICH IS THE SHAPE OF HUMAN Gs. The SMT is the proposition that our remarkable ability to do what we do as well as we do IN THE RELEVANT COGNITIVE DOMAIN (say sentences with at most 6 levels of embedding) is in part due to the fact that our Gs have the structures that they in fact have. Thus, the theory of competence, which I assume describes these Gs and their properties, will play a large part in our theory of parsing performance. It won't be the whole thing, but a big part of the account.

      How's this related to Alex's observations? Well, so far as I can tell, his observations concern the efficient recognizability of grammatical strings. It is not clear to me what this has to do with the performance issue of parsing in real time. Moreover, as Bob Berwick has pointed out to me, the relevance of this to the performance issue also eludes Greg Kobele, for he writes in his thesis (appendix 2, p. 250):
      "Mild context sensitivity can be given a bipartite characterization as a list of conditions that a language must meet. The first condition is that the language be among those strings that are recognizable by a deterministic turing machine in polynomial time. This is often called efficient recognizability. Although it contains the words 'efficient' and 'recognize,' this criterion is emphatically *not* related to idea about human language processing."

      Greg put the words right into my mouth. The SMT is a thesis about how the competence theory will interact with performance theories, and it claims that the fact that Gs have the properties the competence theory claims they have will be a significant part of the explanation of why they are very good at what they do.

      Last point: how to investigate the thesis? Berwick suggested someplace that we use the Marcus parser as a benchmark for good performance. It implemented a fair amount of the EST theory and got pretty good linear time performance over a reasonable range of sentences. It also failed where people do. We can inspect its properties and ask why it did so well. Marcus and Berwick & Weinberg argue that this has to do, in part, with the fact that it implemented certain kinds of grammatical constraints. If correct, this is a good SMT result. This, of course, is not the last word on the issue. But it is an excellent FIRST word. It is much better than the formal results that Alex keeps referring us to (and Ewan seems seduced by), for, as Kobele notes, they are irrelevant to the SMT issue at hand.

      Last point: it would be nice to address these issues entirely in formal terms. We cannot, so far as I can see. We can start to build realistic models that incorporate Gs and see how they run. This requires building parsers that parse over a reasonable range (Sandiway Fong has done this and Marcus did this and Berwick did this) and asking what makes them run well. This is the SMT project, at least in part.

    4. @Benjamin: it is entirely consistent with the SMT that what allows a parser to parse well will also force it to stumble at places. This, indeed, is what the Marcus parser does.

  9. @Norbert I am not advocating the arguments AP or AR here, just trying to understand the SMT as you view it.

    So just to be clear, you reject AR and AP, and the research strategy you advocate to investigate the SMT is to implement broad coverage parsers and test them empirically somehow? Presumably on real corpora, or on artificial examples?

    1. Yes, something like what Marcus did and, for that matter, Fong has done. It's not perfect, but a good start. As for what the example set should be, I am catholic here, though I suspect that for the time being real corpora would be more difficult.

      Oh yes: I reject both and do advocate what you say. You got it quite right.

  10. I did indeed say those things, but I intended them to be understood differently. An interesting result of comp sci is the division of the logical space of possible languages into hierarchies (Chomsky hierarchy, complexity hierarchies, etc). It is a non-trivial discovery of linguistics that the actual languages that have been observed are not randomly distributed throughout the logical space of possible languages, but rather cluster into a small group. One of the properties shared by this group is that of being recognizable in polynomial time on a deterministic turing machine. Although it is tempting to think of this property in terms of a parser, there are equivalent characterizations of this class of languages (being describable in first order logic with a least fixed point operator) which do not have anything to do with the dynamic processing of recognizing strings. Still, the fact that all known natural languages can be recognized efficiently is very suggestive! It also means that we can write correct algorithms which do exactly what the grammar says they should do (in terms of accept/reject). This seems like a natural place to start, when trying to develop theories of the human sentence processor. There are otherwise infinitely many procedures that agree with the correct ones on a fixed finite number of strings. The idea to start with a correct procedure is related to the idea of rational analysis in the cog sci/psych literature (in addition to being the basis of the levels hypothesis).
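    (For readers who want the theorem behind the "equivalent characterizations" remark: this is the Immerman-Vardi result from descriptive complexity, which says that, over ordered finite structures, FO(LFP) = PTIME; that is, a property is decidable in polynomial time iff it is definable in first-order logic with a least fixed point operator. Nothing in that characterization mentions processing a string from left to right, which is the point being made.)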
    One thing that has always puzzled me about the idea that the grammar should be ontologically distinct from the parser is what explanatory role that leaves for the grammar. Norbert seems to be suggesting that the grammar should be used for the explanation of acceptability judgments, and the parser for the explanation of other things (like eye-tracking). This can't be right as I've stated it, because we appeal to the parser to explain why center embeddings are unacceptable. So then the grammar is used for the explanation of whatever acceptability judgments that the parser doesn't account for. It feels like this is a fairly slippery slope; how does the grammar actually get used here? We need some sort of theory of the use of the grammar; a separate parser for acceptability. Now we have two parsers; the parser which respects the grammar, and the one which only sort of respects the grammar. What a mess!
    Work in computer science shows how we can use the parser which respects the grammar to do lots of things, including seeming not to respect the grammar (NVN-style heuristic effects). Why not start here? An uncharitable person might describe this as looking for lost keys under the lightpost, but, where else are we going to start looking for them!

    1. Me being slow, but the key sentence here to explain your previous comment is

      "Although it is tempting to think of this property in terms of a parser, there are equivalent characterizations of this class of languages (being describable in first order logic with a least fixed point operator) which do not have anything to do with the dynamic processing of recognizing strings."

      , yes? Trying to decide whether this is to be interpreted as -

      (a) it might not turn out that the actual parsing mechanism operates in polynomial time even though the language is in P

      OR

      (b) it might not turn out that the operation of an efficient parsing mechanism is causally responsible for the fact that the language is in P

      Or something else. ?

    2. I liked the early Greg K's comments better than the revised version. But that is neither here nor there. I would just make two comments.

      First, there are many places to "start," and far be it from me to legislate where to look. However, for my money, as I take the big fact about language use to be that we parse incrementally and quickly (i.e. we understand a sentence as it is being spoken), we want to investigate parsers that have a chance of doing this. So a boundary condition on being "interesting" (for me, but you decide for yourself) is (i) that a model produce pi-lambda pairs in real time and (ii) that it lead to easy learnability. Given these two criteria, I think that the (sadly forgotten) work in the mid-80s by Berwick, Marcus, Weinberg, Fong, etc. is a better place to look. True, these efforts did not start from a principled mathematical basis and they were empirically driven, but they addressed the right questions and produced models for how this kind of investigation might move forward. As early Greg noted, this is far less clear for the well-grounded investigations that he now endorses. Second, since the 80s we have good empirical evidence that parsers/learners really do use UG principles "in real time" to do what they do. Hence we should investigate systems that in fact incorporate these as part of their performance models. This too suggests we start from the mid-80s work. So, being catholic in my research principles, I say let people look where they want, but given my interests, I would look in different places than Greg would.

      Moreover, I think that the SMT has a place in this second program. Thus, what I found particularly interesting about the earlier work is that it in fact tried to explain the success of the systems in part from the kinds of representations being manipulated. This is a good model for SMT-like thinking, in my view. So, not only were the results empirically in the right direction, the work could also serve as a useful model of SMT-like thinking.

      Last point: the grammar is not used to explain acceptability judgments. Rather, these judgments are evidence for the competence theory. I accept that there is a difference between what one knows and how one puts what one knows to use, i.e. competence vs performance. The evidence we have used to triangulate on the former has been acceptability-under-an-interpretation judgments. There is a second question of how quickly this knowledge is deployed in real time. The evidence seems to point to the (to me) surprising conclusion that it is used very quickly and robustly on-line. Unlike Greg, I do not see a slippery slope, just the standard problem of trying to sort out what the data is telling us about the structure of the underlying system. This is what scientific inquiry always worries about, and it is no different here. Some unacceptability can be traced to the structure of the performance system (e.g. it has a limited memory, etc.), some to the nature of the data structures (e.g. structures like this are ill-formed). How to divide up the raw data is what we do. The object of inquiry is not the data, but the mechanisms that cause this data, and, like all interesting problems, arguing from data to mechanism is a complicated affair. We have paradigms for how to do this and, with luck and skill, we might do better. But there is no slippery slope and no principled problem. At least, I don't see the problem.

      So, where to look? Wherever you want. But for my money, were I making a research bet, I would look backwards a little, in fact to the mid-80s, for this, IMO, is the best work done on the relation between competence and performance we have.

    3. @ewan: I would say both; the fact that there is an efficient parser for a language doesn't necessitate that it is being used. And the *why* question is, I'd guess, answered by a combination of stories about the learning algorithm, the environment's influence on the primary linguistic data (the possible presentations, in the sense of Gold), communicative efficiency, the dynamics of language communities, etc.

    4. @Norbert: I don't think anyone here is really criticising the scientific merit of something like the SMT; I certainly accept it implicitly as a very natural thing to go after. But many things proved to be a distraction here:

      - I (and it seems not only me) get confused when you start talking about "parsers" versus "grammars"; after a while I got convinced that even though you were talking loosely about the grammar as if it were ontologically separate, you didn't mean it and were on board with letting the grammar be an implicit property - but then some of the reasoning about the "link between parsers and grammars" becomes highly obscure - I would insist on translating it, a la Colin, to "fast versus slow"

      - facts about "fast stuff" necessarily have implications for what the "slow stuff" can be, not only for WHY it is, but for WHAT it is; you can't dismiss the efficient recognizability property when you're playing this game, because a simple and straightforward version of SMT would actually IMPLY it, no matter what you think of it. The closer the parser gets to doing the sum total of what is done up there in the head (other judgments and all) the easier it is to draw up a proof about the properties of grammars based solely on the fact that "parsing has property X".

      - further distraction on the WHY line - now I can integrate Greg's reply - it sounds like the intention was "have healthy skepticism about drawing conclusions about the link between the parsing efficiency and the reason language is the way it is" . . . which is exactly the SMT enterprise. It doesn't matter whether you put "efficiently recognizable" or "has island constraints" or "has the strict cycle condition" or whatever as the property of grammars. You may be skeptical about the relevance of being in P per se, but the logic here would apply to ANY attempt to draw inferences about why language is the way it is by relating it to the performance of the parser. So you can't endorse that criticism and accept the SMT at the same time.

    5. Two points:
      "Facts about the fast stuff": I can dismiss it if I find that it asks us to look at the wrong problem. Explain why I need to worry about recognizability? What does IT tell me about UG? So far as I can tell, very little. The problem is not to recognize whether an arbitrary string is in my "language." So, I guess I do not see the relevance of this concern. Enlighten me.

      Second: I do not say that language "is the way it is" because of parsing efficiency. I am proposing an interpretation of the SMT: it is the thesis that in virtue of Gs and UG having the properties they have, languages can be efficiently parsed and easily learned. This does not say or imply that Gs and UG have these properties IN ORDER TO ALLOW FOR efficient parsability and easy learning. That is an entirely different claim, having to do with the etiology of FL. The SMT as I understand it commits no hostages to this claim. Indeed, I believe that I noted in the post that should the SMT be true then it RAISES the question of why it is true.

      This said, IF some version of the SMT is true then it raises interesting research questions that are currently being actively investigated, and with interesting beneficial results. So, IF it is true then it implies that real time parsing cleaves closely to grammatical distinctions, as appears to be the case. It further implies that the reason we are able to parse so well (and by this I explicitly mean in something like real time) is that grammatical objects have the representational properties they have. I take it as a datum that we parse really well overall (Marcus notes this, as have many others). The question is whether this is true (yes) and why. I am happy to dump 'optimal' for 'well-designed', taking as evidence of this that we parse in real time. The empirical problem is to understand how and why. The SMT does commit hostages to this, as I understand it.

      So, be as skeptical as you wish to be. The SMT is programmatic. It is a conjecture that right now seems to be bearing fruit, at least in one direction (I plan to talk about what the SMT can do for syntax in a following post). But it would be more fruitful still, IMO, if people redirected their attention from PTIME issues and others of the kind that computationalists find fascinating, to others more akin to what Marcus, Berwick, Fong, etc. looked at. These provided models of fast parsers/learners that covered an interesting domain of data. We could ask OF THESE PROPOSALS how they did what they did. It seems that part of what allows them to do what they did well is that they embodied properties of the kind that competence grammars were proposing. So far as I can tell, this line of inquiry has nothing to do with the line advocated by Greg or Alex or, it seems, you. Fine. We can do different things. But as I am interested in why we parse quickly and learn easily (and by this I mean in real time and on the basis of more or less simple input), this may focus my attention in ways different from the focus of the more computationally adept, who seem to have their sights set on questions only tangentially related to these. Indeed, if I understand things right, they take my interests to be barely coherent and I seem to find theirs largely irrelevant. That's fine: it's a big area and the problems are hard, so maybe the right strategy is to pursue them in different ways. At any rate, that's still how I see the SMT, and I don't see why I need to worry, for my version of the SMT, about the other concerns.

    6. they take my interests to be barely coherent and I seem to find theirs largely irrelevant.
      and that's why I feel so frustrated. I think that our interests are largely identical. Yet somehow, we end up talking in circles.

      Example: (Parsing quickly)
      Of course it is true that we parse in real time (and to meanings to boot) most of the time. As I see it there are two proposals for this. N: there are two parsers, one is a fast and frugal one, which sorta kinda respects the grammar sometimes, the other is a slow and ponderous one, which respects the grammar more. G: there is exactly one parser, which exactly represents the grammar, but uses heuristics to guide it.
      Clearly these are different avenues of explanation of the same basic datum. I have no idea how to decide a priori which one is right/better/more fruitful/etc, but it seems like we're here interested in the same thing.

    7. I would frame the dispute about parsing quickly differently.
      Here are two approaches

      The Formal view: take a grammar class which is well defined and has a PTIME recognition algorithm; use this as the basis for some heuristics (e.g. a beam search width) which take the complexity down from n^6 or whatever to something that might explain the real time parsing (I don't think this has to be linear on a conventional computer, for various reasons).

      The Informal view: PTIME recognition is irrelevant; instead we should take whatever grammar Chomsky says is right (apologies for this irrelevant snarkiness) even if it doesn't have a PTIME algorithm, and experimentally explore heuristics that bring down the parsing time on a conventional computer to something that is roughly linear.

      So we could talk about F versus I (personally I think I has little chance of success, and there are in any event approximately zero people working on it, since this is not what Sandiway is trying to do), but the important question here is whether this has anything to do with the SMT. And I just don't see that it does.
      The SMT is not about "efficiency", which is uncontroversial, but about "optimality", which is highly controversial. And dropping down from optimality to efficiency is tantamount to abandoning the SMT -- in favour of some Weak MT.
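      A back-of-the-envelope version of the Formal view's arithmetic, with the exponent and the beam width treated as illustrative placeholders rather than measured figures: exhaustive chart recognition costs on the order of \(O(|G| \cdot n^{6})\) in the worst case (the "n^6 or whatever" above), while a beam that carries at most \(k\) analyses through the string, with the work per word per analysis bounded by a grammar-dependent constant \(c_G\), costs \(O(k \cdot c_G \cdot n)\), i.e. linear in \(n\). The price is completeness: analyses pruned from the beam are exactly where garden-path-style failures come from.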

    8. @Norbert:

      "Explain why I need to worry about recognizability? What does IT tell me about UG? So far as I can tell, very little. The problem is not to recognize whether an arbitrary string is in my 'language.' "

      I think you are saying it is not obvious to you what the relation is between the properties of parsing/understanding in "real time" (fast), the properties of sound-meaning computations that may or may not work in real time (slow), and non-trivial properties of the "language". In fact, I think you are even saying it seems to you there isn't any useful relation between these things. Let me give it a shot.

      The "language", if we're going to be useful about it, is the set of sounds for which there's some sound-meaning computation that converges. This might seem like a boring non-window into the computation, but it's potentially interesting. Why? Because if I can parse efficiently (find the m for a given s - for our purposes) then I can compute the language efficiently. Now, who's I - that's a counterfactual. It's not something we think the human computation is doing, it's just a fact we can keep in the back of our minds.

      Like, for example, if there's an efficient and correct parser, then the search-for-a-meaning problem is (at least) in P. And if the search-for-a-meaning problem is in P, then the language is in P. I think you don't think the first question is boring, but you see that it necessarily gives an answer to the second problem, so they can't possibly be unrelated.

      So the relation to the SMT is that the SMT implies that the (fast) parser is also correct. It "has the properties of the grammar". In its simplest form that means "it computes what the grammar computes" - and I'm not talking about the string language, I'm talking about the nitty gritty details of how, because that's what you're talking about. And I know that the idea that it's doing things exactly "right" is stronger than what you're trying to say, but the suggestion is that they're close, and how close is yet to be determined. So if they're that close, and the parser is efficient, then you better tell me exactly why it is that you still think you haven't said anything about the "language" property of the grammar. Remember, saying that language is in P is a counterfactual. It says "there exists" some algorithm - well, you just found it: run the parser, and tell me if it converged.

      So then how can you care about one and not about the other? The greater the degree to which the SMT is true, the more you narrow in on it computing everything correctly "as the grammar does" so to speak, and the more you have to be careful about saying WHY you maintain that the AP property is not true. What sentences can't be parsed efficiently? Why? Why doesn't the SMT help in these cases? Knowing this is at least as important as knowing about the cases where it does. That's why efficient recognizability is an issue.
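      Spelled out, the chain being run here, reading "efficient" as polynomial-time (my gloss): if there is a correct parser that runs in time \(O(n^{k})\) and returns a meaning for a sound \(s\) whenever one exists (and reports failure otherwise), then the search-for-a-meaning problem is in P; and since membership (\(s \in L\)?) can then be decided by running that parser and checking whether it converged, the language \(L\) is in P as well.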

    9. @Ewan: If I understand the results about recognizability, they are upper bounds: they say that, for a class of systems (MGs or CFGs or whatever), the "longest" it will take for an arbitrary grammar from this class is XXX. Moreover, they apply to recognizing ANY sentence in the relevant class. So, worst case for any sentence. My question is what does this have to do with a cognitively relevant theory of parsing? Not much, I fear. Why not?

      First, I don't care what ANY MG (or suitably interesting candidate grammar) can do, but what the ones that FL allows do. The upper bound is a very weak condition on this. Furthermore, we have an empirical hint that this worst-case stuff is not relevant, for we parse "normal" sentences in real time, incrementally as we hear them (actually: we either parse very fast or stumble around a lot, e.g. garden paths and 'police police police police police', where we do very badly). Given this, we want a theory that does, for the most part, very well indeed.

      Second, I don't think we care about parsing ALL sentences well. Like I said, let's declare victory if, say, we can parse sentences with 6 levels of embedding of roughly length 25 words and less very fast. That's a good target for now. I really don't care what happens as N goes to infinity. And I don't care for two reasons. First, the obvious one that we are probably pretty bad at parsing 50-word sentences (try reading Henry James as a test). Second, that recognizability results for the very long sentences abstract away from the size of G (as Berwick and Weinberg told us long ago). I would be happy with a version of the SMT that predicted fast parsing for sentences where the properties of G matter, and these are the cases where the size of G is not overwhelmed by the length N.

      So, recognition results make two idealizations whose relevance for the cognitive matters of interest leaves me cold: they are results for the worst-case Gs in a large class, whereas I am interested in the properties of specific Gs, and they worry about sentences of length N as N gets very, very big. I don't expect any cognitively interesting parsing story to say much about what happens in either of these cases. That's why I care about one but not the other.
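      To make the size-of-G point concrete in the one case where the constants are textbook (CKY recognition for a context-free grammar in Chomsky normal form; the numbers are purely illustrative), the running time is \(O(|G| \cdot n^{3})\). For \(n \le 25\) we have \(n^{3} \le 15{,}625\), while a broad-coverage grammar can easily have \(|G|\) in the tens of thousands of rules, so in the range of sentences at issue the grammar term is not a negligible constant, and the asymptotic-in-\(n\) analysis by itself says little about observed speed -- which is the Berwick and Weinberg point.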

    10. Yes, the crucial point is just:

      "let's declare victory if, say, we can parse sentences with 6 levels of embedding of roughly length 25 words and less very fast"

      This is much too casual an attempt to strike the balance between NOT necessarily having overall fastness and WANTING to necessarily have SOME fastness. Whether some simple bound on embeddings like this will turn out to work to get an interesting SMT while not necessarily preserving fastness in toto, I don't know; I don't know enough about the technicalia of syntax. It's very optimistic though, that's the point. In general you will have to worry a lot about which sentences the SMT works for and which sentences it doesn't.

    11. The thing is, we already have parsers for MGs that are polynomial time (you can download them from Stabler's web page); whereas we just don't have parsers for standard transformational grammars or Minimalist Program grammars (i.e. non-Stablerian grammars) that "can parse sentences with 6 levels of embedding of roughly length 25 words and less very fast".
      So methodologically we are comparing something we already know how to do, and that *has already been done*, with something that experts in the field now think is an inappropriate way to approach the problem.
      If we actually had an observed linear time parser that worked on this large finite set of sentences, then this would be a different discussion.

    12. Sorry for the typos..

      Just to continue the discussion: there are some different arguments being run together here.
      A) arguments against worst case asymptotic complexity in general (as used widely in CS)
      B) arguments against formal analysis in general
      C) arguments specifically against considering P time complexity of the set of strings in parsing.

      I have some sympathy with C arguments, in particular that we need to consider the size of the grammar, and perhaps also the number of embeddings (k) or some other parameters (number of simultaneously moving constituents, stack depth, etc. etc.) that may cause problems, which would lead one to what is called a fixed parameter tractability analysis, i.e. a more refined formal analysis that takes account of these.

      On the A arguments, I could point you to a lot of literature, but, crudely, the success of information technology in general suggests that the general analysis has some value. We teach CS students this stuff for a reason.
      There are of course cases where we use a sort of average case complexity analysis -- PAC learning is a good example, where the worst case is that every example you see is the same.

    13. One more thing: "First, I don't care what ANY MG (or suitably interesting candidate grammar) can do but what the ones that FL allows do."
      So yes, I think this is right:
      if we have a class of grammars C (say C = all MGs with any set of features, etc.) but FL only allows some subclass C' (say all MGs with some fixed universal set of features), then we are only interested in the behaviour on C'.
      And since C' is a subset of C, the worst-case behaviour of C will be an upper bound on the worst-case behaviour of C'.

      So I agree that we should study C' not C, but has anyone suggested otherwise?
