The goal of the Minimalist Program is to reduce the language
specificity of FL (i.e. UG) to the absolute minimum, so as to reduce the
Logical Problem of Language Evolution (LPLE) to manageable size. The basic
premise is that LPLE is more challenging
the more linguisticky FL’s operations and principles are. This almost
immediately implies that to solve the LPLE requires radically de-linguistifying
the structure of FL.
There are several reasonable concerns regarding this version
of LPLE, but I will put these aside here and assume, with Chomsky, that the
question, though vague, is well posed and worth investigating (see here
and here
for some discussion). One obvious strategy for advancing this project is to try to reduce/unify well-grounded linguistic principles of UG with those operative in other domains of cognition. Of the extant UG principles ripe for such reconceptualization, the most tempting (IMO, and in the opinion of many others, as we shall see) is Relativized Minimality (RM). What is RM to be unified with?
Human/biological memory. In particular, it is tempting to see RM effects as
what you get when you shove linguistic objects through the human memory system.[1]
That’s the half-baked idea. In what follows I want to discuss whether it can be
more fully baked.
First off, why think that this is doable at all? The main
reason is the ubiquity of Similarity Based Interference (SBI) effects in the
memory literature. Here is a very
good accessible review of the relevant state of play by Van Dyke and Johns
(VD&J).[2]
It seems that human (in fact all biological) memory is content addressable (CA): a memory is called up in terms of its contents rather than, say, an index. Further, the more the contents of specific memories overlap, the more difficult it is to successfully get at them. More particularly, if one accesses a memory via certain content cues, then the more these cues are shared with other stored items, the more they “overload” the retrieval protocol, making it harder to successfully get the right one. On the (trivial) assumption that memory will be required to deal with the ubiquitous non-(linearly)-adjacent dependencies found in language, we should expect to find restrictions on linguistic dependencies that reflect this memory architecture.[3]
VD&J review various experiments showing the effects that distractors can have on retrieving the right target when these distractors “resemble” the interacting expressions.
Given that memory will create SBIs, it is natural to think that some kinds of dependencies will be favored over others by this kind of memory architecture. Which? Well, ones in which the cues/features relating the dependents are dissimilar from those of the intervening elements. Graphically, (1) represents the relevant configuration. In (1), if non-adjacent X and Y need to be related (say there is a movement or antecedence dependency between the two), then this will be easiest if the cues/features relating them are not also shared by intervening Z-ish elements.
(1) …X…Z…Y…
This should look very familiar to any syntactician who has ever heard the name ‘Luigi Rizzi’ (and if you haven’t, think of either changing fields or getting into a better grad program). Since Rizzi’s 1990 work, (1), in the guise of RM, has standardly been used to explain why WHs cannot move across other WHs (e.g. superiority and Wh-island effects) or heads across other heads (the Head Movement Constraint).
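Just to fix ideas, here’s a toy sketch of the classical RM blocking condition in code (mine, not Rizzi’s; the feature labels are invented for illustration):

```python
# A toy rendering of classical RM over the schema ...X...Z...Y... :
# Z blocks the X-Y dependency when it bears the features the dependency
# is sensitive to. Feature labels here are illustrative, not Rizzi's.

def rm_blocked(dependency_feats, intervener_feats):
    """Classical RM: Z blocks the dependency if it matches its featural class."""
    return dependency_feats <= intervener_feats

print(rm_blocked({"wh"}, {"wh"}))  # True: a wh-phrase over a wh-phrase (wh-islands)
print(rm_blocked({"wh"}, {"D"}))   # False: a plain DP does not intervene
```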
IMO, RM is one of the prettiest
(because simplest) empirically useful ideas to have ever been proposed in
syntax.[4]
Moreover, its family resemblance to the kinds of configurations that induce SBI
effects is hard to miss.[5]
And the lure of relating the two is very tempting, so tempting that resistance
is really perverse. So the question becomes, can we understand RM effects as
species of SBI effects and thus reflections of facts about memory architecture?
Psychologists have been pursuing a similar (though, as we shall see, not identical) hunch for quite a while. There is now good evidence, reviewed in VD&J, that encumbering (working) memory with word lists while sentences are being processed differentially affects processing times of non-local dependencies, and that the difficulty is proportional to how similar the words held in memory are to the words in the sentence that need to be related. Thus, for example, if you are asked to keep in memory the triad TABLE-SINK-TRUCK while processing It was the boat that the guy who lived by the sea sailed after two days, then you do better at establishing the dependency between boat and sail than if you are asked to parse the same sentence with fix in place of sail. Why? Because all three of the memory-list words are fixable, while none are sailable. This makes boat harder to retrieve in the fix sentence than in the sail sentence (pp. 198-9).
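To make the retrieval story concrete, here is a minimal sketch of cue-based retrieval in the spirit of the models VD&J review, though not taken from them; the feature labels and the proportional scoring scheme are my own inventions. The point is just that distractors matching the retrieval cues dilute the target’s share of the odds:

```python
# Toy content-addressable retrieval: items are feature bundles, and a
# retrieval probe is a set of cues. An item's share of the retrieval
# probability is proportional to how many cues it matches, so distractors
# that match the probe's cues "overload" retrieval of the target.

def retrieval_shares(probe, items):
    scores = {name: len(probe & cues) for name, cues in items.items()}
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()} if total else scores

# Memorized list TABLE-SINK-TRUCK plus the target 'boat' (features invented):
items = {
    "boat":  {"noun", "sailable", "fixable"},
    "table": {"noun", "fixable"},
    "sink":  {"noun", "fixable"},
    "truck": {"noun", "fixable"},
}
print(retrieval_shares({"noun", "sailable"}, items))  # 'boat' dominates
print(retrieval_shares({"noun", "fixable"}, items))   # 'boat' drops to a 1-in-4 share
```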
Syntactic form can also induce interference effects. Thus,
the subject advantage inside relative clauses (RC) (i.e. it is easier to parse
subject relatives than object relatives, see here)
is affected by the kinds of DPs present in the RC. In particular, take (2) and (3). The Subject Advantage is the fact that (2) is easier to parse than (3).
(2) The banker that praised the barber climbed the mountain
(3) The banker that the barber praised climbed the mountain
VD&J note that SBI effects are detectable in such cases, as the Subject Advantage can be reduced or eliminated if, in place of a D-NP nominal like the barber, one puts in a pronoun, a quantified DP like everyone, and/or a proper name. The reasoning is that the definite descriptions interfere with one another, while names, pronouns and quantifiers interfere with D-NPs far less.[6]
VD&J offer many more examples making effectively the same point: that the contents of memory can affect sentential parsing of non-local dependencies and that they do so by making retrieval harder.
So, features matter, both syntactic features and “semantic” ones (and, I would bet, other kinds as well).
What are the relevant dimensions of similarity? Well, it appears that many things can disrupt retrieval, grammatical and semantic features included. Thus, both the “semantic” suitability of a word on the memorized word list and the syntactic features that differentiate one kind of nominal from another can serve to interfere with establishing the relation of interest.[7]
Friedmann, Belletti and Rizzi (FBR) (here) report similar results, but this time for acquisition. It appears, for example, that subject relatives are more easily mastered than object relatives, and subject questions more easily than object questions. FBR discuss data from Hebrew. Similar results are reported for Greek by Varlokosta, Nerantzini and Papadopoulou (VNP) here. Moreover, just as in the processing literature, it appears that DPs interfere with one another the more similar they are. Thus, replacing D-NP nominals with relative pronouns and bare WHs (i.e. what vs what book) eases/eliminates the problem. As FBR and VNP note, the Subject Advantage is selective and, in their work, is correlated with the syntactic shape of the intervening nominal.[8] The more similar the nominals are, the more problems they cause.
So, at first blush, the idea that RM effects and SBI effects are really the same thing looks very promising. Both treat the shared features of the interveners and dependents as the relevant source of “trouble.” However (and you knew that ‘however’ was coming, right?), things are likely more complicated. What’s clear is that features do make a difference, including syntactic ones. What’s also clear is that syntactic shape/features are not the only things that matter; many other kinds of features do as well.
Moreover, it is not clear which similarities cause problems and which don’t. For example, the standard RM model (and the one outlined in FBR) distinguishes cases where the features are identical vs. cases where they overlap vs. cases where they are entirely disjoint. The problem with relative clauses like (3), for example, is that the head of the relative clause and the intervening subject have the exact same syntactic D-NP shape, and the reason that subbing in a pronoun or name or quantifier might be expected to mitigate the difficulty is that the subject intervener then shares only some of the head’s features, thereby reducing the minimality effect. So in the case of RCs the story works as we expect.
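For concreteness, here is a toy encoding of that three-way typology (the classification logic follows the identity/inclusion/disjunction idea in FBR, but the feature labels and code are my own illustrative assumptions):

```python
# Toy classifier for featural RM configurations: compare the features of
# the moved/related element with those of the intervener. Difficulty is
# predicted to decrease from identity through inclusion to disjunction.

def frm_relation(target, intervener):
    if target == intervener:
        return "identity"      # worst case: e.g. a D-NP head over a D-NP subject, as in (3)
    if intervener < target:    # intervener's features a proper subset of the target's
        return "inclusion"     # degraded but better, e.g. a bare pronoun intervener
    if not target & intervener:
        return "disjunction"   # no shared features: no intervention effect expected
    return "partial overlap"

head = {"D", "NP"}                      # relative-clause head like 'the banker'
print(frm_relation(head, {"D", "NP"}))  # identity    ('the barber' intervenes)
print(frm_relation(head, {"D"}))        # inclusion   (a pronoun intervenes)
print(frm_relation(head, {"wh"}))       # disjunction (illustrative only)
```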
The problem is that there are other data suggesting that this version of RM delivers the wrong answers in other kinds of cases. For example,
a recent paper by Atkinson, Apple, Rawlins and Omaki (here) (AARO) shows that “the
distribution of D-linking amelioration effect [sic] is not consistent with Featural Relativized Minimality’s
predictions…” (1). AARO argues that carefully controlled rating methods of the
experimental syntax variety show that moving a which-NP over a which-NP
in Spec C is better than moving it over a who
(i.e. (4) is reliably better than (5)). This is not what is expected given the
featural identity in the first case
and mere overlap in the second.[9]
(4) Which athlete did she wonder which coach would recruit
(5) Which athlete did she wonder who would recruit
IMO, superiority shows much the same thing. So (6) is quite
a bit better than (7) to my ear.
(6) I wonder which book which boy read
(7) I wonder which book who read
Once again, the simple syntactic version of RM suggests that
the opposite should be the case. If this is so, then there is more than just
structural similarity involved in RM effects.
This, however, might be a good thing if one’s aim is to treat RM effects as instances of more general SBI effects. We expect many different factors to interact to produce a gradation of effects, with syntactic shape being one factor among many. The AARO data suggest that this might indeed be correct, as do the parallels between the VD&J parsing data and the FBR/VNP acquisition data. So, if AARO is on the right track, it argues in favor of construing RM effects as kinds of SBI effects, and this is what we would expect were RM not a grammatically primitive feature of FL/UG but the reflection of general memory architecture applied to linguistic objects. In other words, this need not be a problem, for it is what one would expect if RM were just a species of SBI (and hence traceable to human memory being content addressable).[10]
What is more problematic, perhaps, is settling what “intervention” means. In the memory literature, intervention is entirely a matter of temporal order (i.e. if Z is active when X and Y are being considered it counts as an “intervener”; roughly speaking, if Z is temporally between X and Y then Z intervenes). For RM, by contrast, intervention is stated in terms of c-command (i.e. Z intervenes between X and Y if X c-commands Z and Z c-commands Y), and this has no simple temporal implications. Thus, the memory literature mainly explores a “linear” notion of intervention while RM relies on a structural one, and so it is not clear that RM effects should be assimilated to memory effects.
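The contrast is easy to state over a toy tree. In this sketch (my own; the nested-list encoding and labels are assumptions, not anything from the literature), Z linearly intervenes between X and Y but, being buried inside a subconstituent, fails to c-command Y and so does not intervene in the RM sense:

```python
# Toy trees as nested lists; leaves are strings. Linear intervention is
# order in the terminal string; hierarchical (RM-style) intervention is
# X c-commands Z and Z c-commands Y.

def leaves(node):
    if isinstance(node, list):
        return [leaf for child in node for leaf in leaves(child)]
    return [node]

def dominates(node, target):
    if isinstance(node, list):
        return any(child is target or dominates(child, target) for child in node)
    return False

def find_parent(node, target):
    if isinstance(node, list):
        for child in node:
            if child is target:
                return node
            hit = find_parent(child, target)
            if hit is not None:
                return hit
    return None

def c_commands(a, b, root):
    """A c-commands B iff some sister of A is, or contains, B."""
    parent = find_parent(root, a)
    return parent is not None and any(
        sib is b or dominates(sib, b) for sib in parent if sib is not a)

def intervenes_linearly(x, z, y, root):
    order = leaves(root)
    return order.index(x) < order.index(z) < order.index(y)

def intervenes_hierarchically(x, z, y, root):
    return c_commands(x, z, root) and c_commands(z, y, root)

X, Z, Y = "X", "Z", "Y"
root = [X, [["det", Z], ["verb", Y]]]            # Z sits inside an embedded phrase
print(intervenes_linearly(X, Z, Y, root))        # True: Z is spoken between X and Y
print(intervenes_hierarchically(X, Z, Y, root))  # False: Z c-commands only 'det'
```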
However, I am not currently sure how big a problem this
might be. Here’s what I mean.
First, much of the literature reviewed in VD&J involves English data, where linear and hierarchical intervention coincide. We know that when these two are pulled apart, in many cases it is hierarchy that matters (see the discussion of the Yun et al. paper here; it shows that the Subject Advantage persists in languages where linear order and hierarchical order go in different directions).
Similarly, Brian Dillon (here) and Dave Kush (in his
unavailable thesis; ask him) show that hierarchical, not linear, intervention
is what’s critical in computing binding relations.
Of course, there are also cases where hierarchy does not rule, and linear intervention seems to matter (e.g. agreement attraction errors and certain NPI licensing illusions are less sensitive to structural information, at least in some online tasks, than hierarchical restrictions suggest they should be).[11]
So both notions of proximity seem to play a role in language processing. I don’t know whether both have an effect in acquisition (but see note 11). Does this mean that we cannot unify RM effects with SBI effects because they apply in different kinds of configurations? Maybe not. Here’s why.
Memory effects arise from two sources: the structure of memory (e.g. whether it is content addressable, rates of decay, number of buffers, RAM, etc.) and the data structures that memory works on. It is thus possible that when memory manipulates syntactic structures it will measure intervention hierarchically because linguistic objects are hierarchically structured. In other words, if phrase markers are bereft of linear order information (as, say, a set-theoretic understanding of phrase markers entails), then when memory deals with them it will not be able to use linear notions to manipulate them, because such objects have no linear structure. In these cases, when task demands use memory to calculate the properties of phrase markers, RM effects are what we expect to see: SBIs with c-command determining intervention. Of course, sentences in use have more than hierarchical structure, and it is reasonable to suppose that this too will affect how linguistic items are handled.[12]
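A tiny illustration of the point, under the stated set-theoretic assumption (labels are placeholders of my own): containment, and hence c-command-style relations, can be computed over such an object, but precedence simply isn’t defined:

```python
# A phrase marker as pure sets: {X, {ZP, Y}}. Sets are unordered, so the
# object encodes hierarchy (containment) but no left-to-right order.

ZP = frozenset({"Z", "W"})
PM = frozenset({"X", frozenset({ZP, "Y"})})

def contains(node, target):
    """Containment is well defined over set-theoretic phrase markers."""
    return isinstance(node, frozenset) and any(
        child == target or contains(child, target) for child in node)

print(contains(PM, "Y"))   # True: Y is inside the phrase marker
# But asking whether Z precedes Y has no answer here: there is no order
# for a memory system to exploit, only containment.
```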
However, this does not prevent thinking of RM as a memory effect defined over PM-like data structures. And there is every reason to hope that this is in fact correct, for if it is, then we can treat RM as a special case of a more general cognitive fact about us: that we have content-addressable memories that are subject to SBI effects. In other words, we can reduce the linguistic specificity of FL.
Maybe an analogy will help here. In a wonderful book, What the Hands Reveal About the Brain (here), Poizner, Bellugi and Klima (PBK) describe the two ways that ASL speakers with brain damage describe spatial layouts. As you all know, space has both a grammatical and a physical sense for an ASL speaker. What PBK note is that when this space is used grammatically it functions differently than when it is used physically. When using it physically, stroke patients with spatial deficits show all the characteristic problems that right-hemisphere stroke patients typically show (e.g. they use only half the space). However, when signing in the space (i.e. using the space linguistically), this particular spatial deficit goes away and patients no longer neglect half the signing space. In other words, how the space is being used, linguistically or physically, determines what deficits are observed.
Say something analogous is true of memory: when it is used in computing grammatical properties, intervention is hierarchical; when it is used otherwise, linear/temporal structure may matter. Thus, what counts as an intervener will depend on what properties memory is being used to determine. If something like this makes sense (or can be made to make sense), then unifying RM effects with SBI effects, both reflecting how human memory works, looks possible.[13]
Enough. This post is both too long and too rambling. Let me
end with a provocative statement. Many at Athens felt that Darwin’s Problem
(the LPLE) is too vague to be useful. Some conceded that it might have poetic charm, that it was a kind of inspirational ditty. Few (none?) thought that it
could support a research program. As I’ve said many times before, I think that
this is wrong, or at least too hasty. The obvious program that LPLE (aka
Darwin’s Problem) supports is a reductive/unificational one. To solve LPLE
requires showing that most of the principles operative in FL are
non-linguistically specific. This means showing how they could be reflections
of something more cognitively (or computationally or physically) general. RM
seems ripe for such a reanalysis in more general terms. However, successfully
showing that RM is a special case of SBI which is grounded in how human memory
operates will take a lot of work, and it might fail. However, the papers I’ve
cited above outline how redeeming this hunch might proceed. Can it work? Who
knows. Should it work? Yup, the LPLE/DP hangs on it.[14]
[1]
The first person I heard making this connection explicitly is Ivan Ortega Santos. He did this in his 896 paper at UMD in about 2007 (a version published here). It appears that others were barking up a similar tree somewhat earlier, as reviewed in the Friedmann, Belletti and Rizzi paper discussed in what follows. The interested reader should go there for references.
[2]
Julie Van Dyke and Brian McElree (VD&M) wrote another paper that I found
helpful (here).
It tries to zero in on a more exact specification of the core properties of
content addressable memory systems. The feature that they identify as key is
the following:
The defining property of a
content addressable retrieval mechanism is that information (cues) in the
retrieval context enables direct access to relevant memory representations,
without the need to search through extraneous representations (164).
In effect, there is no cost to “search.” Curiously, I believe that VD&M get this slightly wrong. CA specifies that information is called up in virtue of substantive properties of its contents. This could in principle be combined with a serial search. However, it is typically combined with a RAM architecture in which all retrieval takes constant time. So general CA theories combine a theory of addressability with a RAM architecture, the latter obviating the costs of search. That said, I will assume that both features are critical to human memory and that the description of CA systems they offer above correctly describes biological memory.
[3]
In other words, unifying RM with CA systems would effectively treat RM effects
as akin to what one finds in self-embedding structures. These are well known to
be very unacceptable despite their grammaticality (e.g. *that that that they
left upset me frightened him concerned her).
[5]
In fact, you might enjoy comparing the pix in VD&J (p. 197) with (1) above
to see how close the conceptions are.
[6] VD&J do not report whether replacing both D-NPs with pronouns or quantifiers reintroduces the Subject Advantage that replacing barber eliminates. On a straightforward reading of RM, the prediction would seem to be that it should. Thus, someone who you/he saw should contrast with someone who saw you/him in roughly the same way that (2) and (3) do. VNP (see below) report some data suggesting that quantifiers might pose separate problems. VD&M, reporting on the original Gordon & Co. studies, note that “their data indicate that similarity-based interference occurs when the second noun phrase is from the same referential class as the first noun phrase, but it is reduced or eliminated when the noun phrases are from different classes (158).” This suggests that the SBI effects are symmetric.
[7]
The scare quotes are here because the relevant examples exploit a “what makes sense” metric, not a type measure. All the listed expressions in the boat-sail and boat-fix examples are of the same semantic type, though only boats are “sailable.” Thus it is really the semantic content that matters here, not some more abstract feature. VD&M review other data pointing to the conclusion that there are myriad dimensions of “similarity” that can induce SBIs.
[8]
VD&J cite the work of Gordon and collaborators, who do not link the abatement of SBIs to syntactic shape but to the nominals’ semantic functions, their “differing referential status.” This could be tested. If Gordon is right, then names that come with overt determiners (e.g. Der Hans in German) should, on the assumption that they function semantically just as names in English do, obviate SBIs when a D-NP is head of the relative. If RM is responsible, then these should function like any other D-NP nominal and show a Subject Advantage.
[9]
This is the main point. Of course there are many details and controls to worry about, which is why AARO is 50 pages rather than a paragraph.
[10]
This might create trouble for the strong RM effects, like moving adjuncts out of WH islands: *How did you wonder whether Bill sang. This is a really bad RM violation, and the question arises: why so bad? Dunno. But then again, we currently do not have a great theory of these ECP effects anyhow. One could of course concoct a series of features that led to the right result, but, unfortunately, one could also find features that would predict the opposite. So, right now, these hard effects are not well understood, so far as I can tell, by anyone.
[11]
Note the very tentative nature of this remark. Are there any results from language processing/production/acquisition that implicate purely linear relations? I don’t know (a little help from those of you in the know would be nice). The NPI stuff and the agreement attraction errors are not entirely insensitive to hierarchical structure. Maybe this: VNP cite work by Friedmann and Costa which shows that children have problems with crossing dependencies in coordinate structures (e.g. The grandma1 drew the girl and t1 smiled). The “crossing” seems to be linear, not hierarchical. At any rate, it is not clear that the psycho data cannot be reinterpreted in large part in hierarchical terms.
[12]
However, from what I can tell, pure linear effects (e.g. due to decay) are
pretty hard to find and where they are found seem to be of secondary
importance. See VD&J and VD&M for discussion. VD&J sum things up as
follows:
…interference is the primary factor contributing to the difficulty of integrating associated constituents…with a more specialized role arising for decay…
[13]
One other concern might be the following: aren’t grammatical restrictions categorical while performance ones are not? Perhaps. But even RM effects lead to graded acceptability, with some violations being much worse than others. Moreover, it is possible that RM effects are SBI effects that have been “grammaticized.” On this view, RM is a grammatical design feature of G objects, one that mitigates the problems that CA memory necessarily imposes. I have been tempted by this view in the past. But now I am unsure. My problem lies with the notion “grammaticization”: I have no idea what this process is, what objects it operates over, or how it takes gradient effects and makes them categorical. At any rate, this is another avenue to explore.
[14]
There are some syntactic implications of unifying RM and SBI effects in terms of the structure of CA memory. For example, if RM effects are due to CA architecture, then issues of minimal search (a staple of current theories) are likely moot. Why? Well, because, as observed (see note 2), CA “enables direct access to relevant memory representations, without the need to search through extraneous representations.” In other words, CA eschews serial search, and so the relevance of minimal search is moot if RM effects are just CA effects. All targets are available “at once,” with none more accessible than any others; no target is nearer or farther than any other. Thus, if RM is a special case of CA, then it is not search that drives it. This does not mean that distance does not matter, just that it does not matter for search. Search turns out to be the wrong notion.
Here’s another possible implication: if decay is a secondary
effect, then distance per se should
not matter much. What will matter is the amount of intervening “similar”
material. This insight is actually coded into most theories of locality: RM is
irrelevant if there is only one DP looking for its base position.
Interestingly, the same is true of phase-based theories, for structures without
two DPs are “weak” phases and these do not impose locality restrictions. Thus,
the problems arise when there are structures supporting two relevant potential
antecedents of a gap, just as a theory of minimality based on SBI/CA would lead
one to suppose.