A recent post (here)
illustrated how the SMT provides a unified framework for various kinds of
research into the structure of FL. In particular, I reviewed some work showing
how certain recent findings concerning online parsing would follow were parsers
to transparently embed grammars, as the SMT requires. The flow of argument in
the work described goes from results in syntax to consequences for online
measures of incremental parsing complexity. In other words, this is a case
where SMT, given some properties of the grammar, makes claims about some
property of the interface. Here’s a question: can we reverse the direction of
argument? Can we find cases where the SMT does grammatically useful work, in
that some property of the interface makes a claim about what grammars must look
like? In other words, where the argument
moves from some property of the interfaces to some claims about the right
theory of grammar?
Before offering some illustrations, let me note that the
first kind of argument is nothing to sneeze at if you are interested in discovering
the structure of FL (and who isn't interested in this?). Why? Because the kind
of evidence that comes from things like the filled-gap effect and the
plausibility effect is different
from the kind of data that acceptability (under an interpretation) judgments provide.
And, as every intro philo of science course will tell you, the best support for
a theory comes from different kinds of data
all pointing to the same conclusion (this is called consilience (a term Whewell invented)). Consequently, finding
online data that supports conclusions garnered from acceptability data is interesting
even if one is mainly interested in competence theories.
This said, for purely selfish reasons, it would still be
nice to have examples of arguments going in the other direction as well:
implications for grammatical theory from psycholinguistic considerations. I have three
concrete(ish) examples to offer as models: one that I have talked about before
(here),
based on work by Pietroski, Lidz, Hunter and Halberda (PLHH), which I would
like to remind you of; one on how to understand binding domains, based on work
by Dave Kush (here);
and one that is entirely self-serving (i.e. based on some work I did on
non-obligatory control) (here,
ch. 6).
Before proceeding, let me emphasize that the examples are
meant to be illustrative of the logic of the SMT. I do actually think that the
cited arguments are pretty compelling (of course I LOVE the third one).
However, my point here is not to defend their truth but to outline their logic
and how this relates to SMT reasoning. Being motivated by the SMT does not
imply that an account is true. But given minimalist interests, it is an
interesting property for an account to have. This out of the way, let’s
consider some cases.
First, PLHH's argument concerning the meaning of 'most.' The
argument is that there is a privileged representational format for the meaning
of 'most': its meaning is (1c) and not the truth-functionally
equivalent (1a) or (1b):
(1) Three possible meanings for 'most':
a. OneToOne+[{x: D(x)}, {x: Y(x)}] iff for some set s, s ⊂ {x: D(x)} and OneToOne[s, {x: Y(x)}]
b. |{x: D(x) & Y(x)}| > |{x: D(x) & ¬Y(x)}|
c. |{x: D(x) & Y(x)}| > |{x: D(x)}| − |{x: D(x) & Y(x)}|
Why (1c)? Because that's the one speakers use when evaluating the quantities in
visually presented dot arrays. And
if one assumes that the products of well-designed grammars (e.g. meanings) are transparently
used by the interfaces, i.e. if one assumes that the SMT is true, then the fact
that the visual system uses representations like (1c) in preference to those in
(1a,b), even when the others could be used, is evidence that this is what 'most'
means. In other words, the SMT plus the fact that (1c) is used (and used
very efficiently and quickly; see the experiments) implies that (1c) is the linguistic meaning of 'most.'
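To make the contrast vivid, here is a minimal sketch (my own toy illustration; the dot-array setup and the particular set choices are my assumptions, not PLHH's stimuli or code) of three verification procedures in the spirit of (1a-c). They agree on every array, which is exactly why truth conditions alone cannot decide among them and verification behavior can:

```python
import random

# Toy dot array: D is the set of dots, Y the yellow ones (so Y is a subset of D).
random.seed(1)
colors = {i: random.choice(["yellow", "blue"]) for i in range(25)}
D = set(colors)
Y = {i for i in D if colors[i] == "yellow"}

def one_to_one_plus(a, b):
    # "More a's than b's" cashed out by pairing-off rather than by counting:
    # discard one member of each set at a time; true if a outlasts b.
    a, b = set(a), set(b)
    while a and b:
        a.pop()
        b.pop()
    return bool(a)

# (1a)-style verification: one-to-one correspondence, no cardinalities computed
# (here applied to the yellow vs. the non-yellow dots).
most_a = one_to_one_plus(D & Y, D - Y)

# (1b): count the yellow dots and the non-yellow dots, then compare.
most_b = len(D & Y) > len(D - Y)

# (1c): count the yellow dots and all the dots, then subtract.
most_c = len(D & Y) > len(D) - len(D & Y)

assert most_a == most_b == most_c  # truth-functionally equivalent
print(most_a, most_b, most_c)
```

Since the three procedures never disagree on truth value, only on what must be represented and computed, it takes data about how speakers actually verify to reveal which format the meaning has.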
Consider a second case with similar logic. In
his thesis and in recent presentations (here), Dave Kush
observes that speakers respect c-command restrictions when parsing sentences
that involve quantificational binding. More specifically, in parsing sentences
like (2a,b), speakers look for antecedents only among expressions in whose
c-command domain (CCD) the bound pronoun sits. While parsing, speakers reliably distinguish cases
like (2a), where the antecedent c-commands the bound pronoun, from those like
(2b), where it doesn't.
(2) a. Kathi didn't think that any janitor1 liked his job when he1 had to clean up
b. Kathi didn't think that any janitor1 liked his job but he1 had to clean up
Parsing sensitivity to CCDs is further buttressed by the difference
found in the online parsing of Strong vs. Weak Crossover (S/WCO) effects. Kush
provides evidence that incremental parsing respects the former, which invokes
CCDs, but not the latter, which does not.[1] As
Kush notes, this fits well with earlier work on the binding of reflexives and
reciprocals: Kush adds some Hindi data on reciprocals to earlier work by Dillon
and Sturt on reflexives to ground this conclusion. Taking these various results
together, Kush concludes, very reasonably IMO, that online parsing is sensitive
to the c-command relations that bound expressions bear to their
antecedents.
The conclusion, then, is that incremental parsing computes CCDs
in real time. Based on this established fact, Kush then asks a second, very
interesting follow-up question: how is this condition implemented in human
parsers? He notes the following problem: human memory architecture appears to
be content-addressable, and this makes coding CCDs in such an
architecture difficult.[2]
However, the data clearly indicate that we code something like CCDs, and do so
quickly, online. So how is this done? Kush suggests that we do not actually code
for CCDs but for something that does similar work, something very like
the clausemate condition, the restriction that did the heavy lifting in earlier
incarnations of syntactic theory. Howard Lasnik and Ben Bruening have recently
argued for a return to something like this (Bruening has proposed "phase-command"
rather than c-command as the operative condition). Interestingly, as Kush
shows, these alternatives to c-command can be made to fit comfortably with the
kinds of content-addressable memory architectures humans seem to have.
Conclusion: our competence grammars use something like clause/phase-command
conditions rather than CCDs as the primitive relations relevant to
binding. Note that the direction of argument goes from online parsing facts
plus facts about human memory architecture to claims about the primitive
relations in the competence grammar. What's of interest here is how the SMT is
critical in licensing the argument form. Whether Kush is right or not about the
conclusion he draws is, of course, important. But IMO, this merely factual issue
is not nearly as interesting as the argument form itself.
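To see why content-addressable memory favors flat features over relational ones, here is a minimal sketch of cue-based retrieval in the spirit of ACT-R parsing models (the feature names and items are my own hypothetical illustration, not Kush's or Lewis's implementation):

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    word: str
    features: dict = field(default_factory=dict)  # e.g. {'cat': 'DP', 'clause': 2}

def retrieve(memory, cues):
    """Return all items whose stored features match every retrieval cue."""
    return [item for item in memory
            if all(item.features.get(k) == v for k, v in cues.items())]

# Cues like "a quantified DP in clause 2" are flat properties of stored items,
# so content-addressable lookup handles them directly:
memory = [
    Item("Kathi",       {"cat": "DP", "quant": False, "clause": 1}),
    Item("any janitor", {"cat": "DP", "quant": True,  "clause": 2}),
]
print(retrieve(memory, {"cat": "DP", "quant": True, "clause": 2}))

# By contrast, "c-commands the current position" is a relation between tree
# positions: it differs for every new retrieval site, so it cannot be stored
# once as a feature of the item and matched in parallel.
```

The design point: a cue like clause (or phase) membership is a stable property of the stored item and can be matched in parallel, whereas c-command would have to be recomputed relative to each retrieval site, which is just what this kind of memory is bad at.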
Let me end with a third example, one from some of my own work.
As some of you may know, I have done some work on Control phenomena. With many
colleagues (thx Jairo, Cedric, Masha, Alex), I have argued that there exists a
theory of control, the Movement Theory of Control (MTC), that has pretty good
empirical coverage and can effectively be derived given certain central tenets
of the Minimalist Program (MP). In particular, once one eliminates D-structure in toto and treats Move as a species of
Merge, the MTC is all but inevitable. None of this is to say that the
MTC is empirically correct, but it does mean that it is a deeply minimalist theory. I would go further (and indeed I have)
and argue that the MTC is the only
deeply minimalist theory of control and if something like it is incorrect then
either MP is wrong (at least for this area of grammar) or control phenomena are
not part of FL (here’s a good place to wave hands about properties of the
interface). Why do I mention this? Because the MTC is a theory of obligatory control (OC) and, as we all
know, this is not the end of the control menagerie. There is non-obligatory control (NOC) as well.
What does the MTC have to say about this?
Well, not that much actually.[3]
Here's what MTCers have said: NOC is the by-product of having a pro in a subject position rather than a PRO (viz. an "A-trace"). And this
proposal creates problems for the MTC. How?
Well, to get the data to fall out right, any theory of control must
assume that, given a choice between an OC and an NOC configuration, grammars
prefer OC.[4]
In the context of the MTC this translates into saying that grammars prefer OC-style
movement to pro binding. Let's
call this a preference for Move over Bind. This sets up the problem. Here it
is.
The MTC explains cases like (3a) on the assumption that the gap
in the lowest clause is a product of movement (i.e. an "A-trace"). But what
prevents a representation like (3b), with a pro
in place of the trace, thereby licensing the indicated (unavailable)
interpretation? Nothing, and this is a problem.
(3) a. John1 expects Mary2 to regret PRO2/*1 shaving himself1
b. John1 expects Mary2 to regret pro1 shaving himself1
The SMT provides a possible solution (this is elaborated in
detail here,
ch. 6). Given the SMT, parsers respect the distinctions grammars make. Thus,
parsers must also prefer treating ecs as A-traces rather than pros if they can. So, in parsing a
sentence like (4) the parser prefers treating the ec as an A-trace/Copy rather
than a pro. But if so, this A-trace
must find a (very) local antecedent. Mary
fits the bill; John does not (taking John as antecedent would
violate minimality).
(4) John expects Mary to regret ec shaving himself
Given this line of reasoning, (3b) above is not a possible
parse of the indicated sentence and so the sentence is judged unacceptable.
Note that this account relies on the SMT: the parser must cleave to the
contours the grammar lays out. Thus, given the grammatical preference for Move
over Bind we cannot parse the ec in
(4) as a pro and so the structure in
(3b) is in principle unavailable to
the parser.
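As a toy rendering of this logic (my own sketch, not the implementation in the cited chapter), the parser's decision about an empty category can be thought of as a defaulting rule:

```python
# A toy sketch of the Move-over-Bind preference as a parsing heuristic:
# an empty category (ec) is analyzed as an A-trace/copy whenever movement
# is licit, and the parser falls back to pro only where movement is
# blocked (e.g. inside an island).

def analyze_ec(inside_island: bool) -> str:
    """Return the parser's preferred analysis of an empty category."""
    if not inside_island:
        return "A-trace"  # OC parse: needs a local, minimality-respecting antecedent
    return "pro"          # NOC parse: antecedent choice is freer

# (4): movement is licit, so the ec is parsed as a trace bound by 'Mary';
# the pro parse underlying (3b) is never considered.
print(analyze_ec(inside_island=False))  # -> A-trace

# Footnote [3]'s (i): the ec sits inside an island, so the trace analysis
# is unavailable and pro (NOC) is the only option.
print(analyze_ec(inside_island=True))   # -> pro
```

On this picture the unacceptability of (3b) is a parsing fact, not a grammaticality fact: the pro parse is grammatical but never generated.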
Note that this logic only applies to phonetically null
pronouns. The parser need not decide on the status of a phonetically overt
pronoun, hence the acceptability of (5) with the same binding relations we were
considering in (3b):
(5) John1 expects Mary to regret him1 shaving himself1
I don’t expect anyone to believe this analysis (well, not as
stated here. Once you read the details you will no doubt be persuaded). Indeed,
I have had a hard time convincing very many that the MTC is on the right track
at all. But, for the nonce I just want to note that the logic deployed above
illustrates another use of SMT reasoning. Let’s review.
Given the SMT, there are strong ties between what parsers do
and what grammars prescribe. One can argue from the properties of one to those
of the other given the transparency assumptions characteristic of the SMT. In
this case, we can use it to argue that though there is nothing grammatically
wrong with (3b) it is, given the MTC and
the grammatical preference for Move over Bind, inherently unparsable, hence
unacceptable under the indicated interpretation.
I have reviewed three instances of SMT reasoning where
claims about processing have implications for the competence theory. We have
already reviewed arguments that move from competence grammars to interface
properties. As is evident (I hope), the SMT has interesting implications for
the relationship between the properties of performance systems and competence
theories. This should not come as a surprise. We have every reason to think
that there is an intimate connection between the shapes of data structures and
the algorithms that use them efficiently (see Marr or Gallistel and King on
this topic). The SMT operationalizes this truism in the domain of language. Of
course, whether the SMT is true is an entirely different issue. Maybe it is,
maybe it isn’t. Maybe FL’s data structures are well designed, maybe not. However,
for the first time in my linguistic life, I am starting to see how the SMT
might function in providing interesting arguments to probe the structure of FL.
It seems that at least one version of the SMT has empirical clout and licenses
interesting inferences about the structure of performance systems given the
properties of competence systems AND
vice versa, the structure of competence systems given the properties of
performance systems. This version of the SMT further provides a way of
understanding minimalist claims about the computational efficiency of
grammatical formalisms that makes computational
sense.[5] Of course, this may all be wrong (though not wrong-headed), but it is novel (at
least to me) and very very exciting.
Last point and I sign off: The above outlines one version of the SMT. There are
various interpretations around. I have no idea whether this version is exactly
what Chomsky has been proposing (I suspect that it is in the same intellectual
region, but I don’t really care if my exegesis is correct). I like this
interpretation for I can make sense of it and because it has a pedigree within
Generative Grammar (which I will discuss in a proximate future post). Some
don’t seem to like it because it does not make the SMT obviously false or
incoherent (you know who you are). The above version of the SMT relies on
treating it as an empirical thesis (albeit a very abstract one) about good design.
Good design is indexed by various empirical properties: fast parsing and easy
learning being conspicuous examples. Whether the FL design is “optimal”
(whatever that might mean) is less interesting to me than the question of
how/whether our very efficient linguistic performance systems are as good as
they are because they exploit FL’s basic properties. It seems to be a fact that
we are very good at language processing/production and acquisition. Some might
think that how we do these things so well (and why) calls for an explanation.
One explanation, the one that I have urged that we investigate (partly because
I believe we have some non-trivial evidence bearing on the questions already),
is that part of what makes us good at what we do is that the data-structures
that Gs generate (our linguistic knowledge) have the properties they have. In
short: why are we fast parsers and good acquirers? Because grammars embody
principles like c-command, subjacency, Extension etc. That, folks, is the SMT!
And that, folks, is a very interesting conjecture that we are just now beginning to
study in a non-trivial manner. And that, folks, is called progress. Yessss!
[1] This raises the
interesting question of how to model WCO if one accepts the SMT. Why don’t we
find WCO effects in online measures like the ones that pop out for SCO?
[2] Actually the argument is
more involved: if we model this feature of human memory in something like an
ACT-R framework (this is based on implementations of this idea by Rick Lewis)
then coding c-command into the system proves to be very difficult.
[3] Well, it does say that NOC
must occur where movement is prohibited, e.g. inside islands:
(i) John said that [ec kissing Mary] would upset Bill
Thus, the ec in (i) is not the product of movement, so
not a case of OC, and thus must be a case of NOC.
[4] Were both equally optional,
NOC would render OC effects invisible, as the latter's observable properties are
a proper subset of the former's.
[5] I will go into this in
more detail in a post that I am working on.
I don't know if this counts exactly as what you're looking for with the question "What has the SMT done for you lately?", but I work in an environment where the attitudes of most of my colleagues range from mostly indifferent to SMT-like ideas to incomprehension and hostility, and as I once mentioned to you in private, I feel a "minimalist" attitude has made me feel like I'm playing with a larger hand, so to speak.
I've recently had the task of coming up with a mapping between syntax and semantics that can be used to (robustly) represent certain aspects of incremental parsing, and what I've come up with (and what has been adopted by some of my colleagues) is heavily inspired by, if not identical to, the work on neo-Davidsonian semantics by Tim Hunter.
A lot of work in incremental parsing is inspired by a heavily functionalist theory of processing (uniform information density) that has its own sense of minimality, as well as a good body of empirical evidence. It's promoted by people, a few of whom are vehemently hostile to SMT-like things. But I'm increasingly convinced that, if you *need* a formalism, anything that can plausibly represent the kinds of ambiguities and revised expectations that UID-style theories represent in a flexible manner is going to have a lot of the merge-y and move-y scaffolding, a similar notion of features, and so on...and a very constrained design that doesn't require a lot of unpacking and repacking.
I have slightly lost track of what the SMT is meant to be at this point. Here are two "empirical claims":
A: The language faculty is an optimal solution to legibility conditions at the interfaces.
B: Languages are the way they are so that they can be processed (parsed, produced and acquired) reasonably efficiently.
So there is the boring but necessary terminological question about which of these is the SMT but let's skip that and go to the interesting scientific question: which of them is true?
So I think A is clearly false and B is very likely true. And maybe (mirabile dictu) we agree on this. If we do then we can argue about how this relates to transparency in the parser etc. but we need to get that straight first.
But maybe (based on something N said earlier) we need to split B into B1 and B2:
B1: Languages have some properties that mean they can be processed efficiently.
B2: They have those properties so they can be processed efficiently.
So I buy into both of these, but I can see that one could accept B1 and not B2.
I buy B1. I do not currently buy B2: I don't think that Gs have the properties they have SO THAT they could be efficiently parsed; rather, they have these properties and as a result they can be so parsed. If you also buy B1 then that's great. The empirical issue then becomes to see which properties language needs so that this is true (indeed, we need to see IF this is true, but we agree on that). I believe that some of these are Extension and cyclicity/Bounding and maybe minimality. A system with these properties will run well when plugged into parsers/acquirers. But, for now let's celebrate that we agree on how to put the issue. That's great.
As for A: as I've noted, I am not sure what this says. Thus I am not sure that it is false. But I do agree it has been very hard to work with, and the B1 claim is interesting enough for me.
So let's call B, or maybe just B1, the Weak Efficiency Thesis (WET), because it is pretty uncontroversial.
One way of exploring this would be to say, precisely, what we mean by reasonably efficient, and then only consider theories of grammar that are efficient in that sense: efficient in the sense that they can be parsed and learned in some technically efficient way. That is roughly my research program. So there are some explicit ideas about some properties that lead to efficient recognition and learning, and we have some theorems that show this. So I understand that you don't like the particular technical notion of efficiency that is used, but let's agree to disagree on that.
But I don't see how the constraints you mention will work with the WET:
" I believe that some of these are Extension and cyclicity/Bounding and maybe minimality. A system with these properties will run well when plugged into parsers/acquirers."
Why do you think that systems like this will run well? What evidence do you have?
That's the topic of my next post. You won't like it, but it will be put out there in the open. BTW, I am happy to agree to disagree. I will address this as well in my next post. Right now, back to reading theses and grading papers.
I'm confused by the discussion of (3) and (4). What's the difference between data that establishes that "grammars prefer OC style movement over binding", and the fact illustrated in (3)? Why does one pertain to the grammar and the other to the parser?
The MTC assumes that 'pro' can be freely generated in the spec of a non-finite TP in English. This pro is what's responsible for non-OC readings (it does not require a c-commanding antecedent, need not be local, etc.). So the problem with (3b) being unavailable in English cannot be that it is an ungrammatical structure. It isn't. So why is it out? Well, the idea was that to get the OC/NOC distinction at all given the MTC one needs a kind of economy story with movement being preferred to binding where the two are options. If this is a principle of the grammar then transparency (SMT) says that it is a principle of the parser, i.e. when an ec is encountered in a parse, assume it is a "trace" rather than a null pronoun, all things being equal. So, the parser will treat the ec in (4) as a trace and look for a local antecedent. Thus, the perfectly acceptable structure with pro there will never be available, due to how the parser transparently reflects properties of the grammar (i.e. the economy condition). So the sentence, were it parsable, would be OK, but it is not parsable, and not so for a principled reason given the SMT. That's the idea.
DeleteI don't expect you to believe this (though I kind of like the explanation) but that the logic invokes the SMT and the transparency assumption between grammars and parsers to run.
Let me see if I get this. On the basis of standard non-OC examples, we suppose that pro is possible in Spec of non-finite TP. This appears to be contradicted by the fact that (3b) is unacceptable. So, in response, we suppose that (3b) is actually grammatical but non-parsable (I take it this means it has something like the same status as a triple-center-embedded sentence). Is that right? If so ... dumb question ... why is the original standard non-OC example not similarly non-parsable?
Because the ec is generally within an island and so cannot be analyzed as a trace:
(i) John said that [[ec washing himself] was imperative]
This is the typical case of NOC. Note the ec is within an island and so not analyzable by either the grammar or the parser as a trace of I-merge. Hence, no competition and the parser can drop a pro there.
The problem for any grammatical theory is that the diagnostics of OC are a proper subset of those for NOC. Hence IF OC did not pre-empt NOC you would never "see" OC. Hence there needs to be some economy condition favoring OC when it is available. Of course, the MTC tells you that one place where it is NOT available is inside an island. Here NOC will be possible if it is licensed by something like a pro. This has the nice property of favoring OC outside island contexts, and this seems roughly correct.
Note that there are still some issues concerning what to do with the arb interpretation of pro (John thinks that washing oneself is important), though I would suspect that these too are pronouns, just indefinite ones that distribute more or less the way 'one' does. But these are orthogonal to the proposal above.
Hope this helps.