Friday, August 28, 2015

Stats and the perils of psych results

As you no doubt all know, there is a report today in the NYT about a study in Science that appears to question the reliability of many reported psych experiments. The hyperventilated money quote from the article is the following:

...a painstaking yearslong effort to reproduce 100 studies published in three leading psychology journals has found that more than half of the findings did not hold up when retested.
The clear suggestion is that this is a problem as over half the reported "results" are not. Are not what? Well, not reliable, which means to say that they may or may not replicate. This, importantly, does not mean that these results are "false" or that the people who reported them did something shady, or that we learned nothing from these papers. All it means is that they did not replicate. Is this a big deal?

Here are some random thoughts, but I leave it to others who know more about these things than I do to weigh in.

First, it is not clear to me what should be made of the fact that "only" 39% of the studies could be replicated (the number comes from here). Is that a big number or a small one? What's the base line? If I told you that over 1/3 of my guesses concerning the future value of stocks were reliable then you would be nuts not to use this information to lay some very big bets and make lots of money. If I were able to hit 40% of the time I came up to bat I would be a shoo-in inductee at Cooperstown. So is this success rate good or bad? Clearly the headline makes it look bad, but who knows.

Second, is this surprising? Well, some of it is not. The studies looked at articles from the best journals. But these venues probably publish the cleanest work in the field. Thus, by simple regression to the mean, one would expect replications to not be as clean. In fact, one of the main findings is that even among studies that did replicate, the effect sizes shrank. Well, we should expect this given the biased sample the studies were drawn from.
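For those who like to see the regression-to-the-mean point run, here is a minimal sketch. Everything in it is an assumption of mine for illustration (the true effect size, the noise level, the publication threshold, the function name `simulate`); none of it comes from the Science study. It just shows that if only the cleanest-looking results get published, independent replications of those same results will on average look less clean.

```python
import random
import statistics

def simulate(true_effect=0.2, noise_sd=0.5, threshold=0.8, n_studies=20000, seed=0):
    """Toy model (all parameters hypothetical): a study reports true_effect + noise,
    and only results above `threshold` get published."""
    rng = random.Random(seed)
    published, replications = [], []
    for _ in range(n_studies):
        original = true_effect + rng.gauss(0, noise_sd)
        if original > threshold:  # selection: the journal keeps the cleanest-looking results
            published.append(original)
            # an independent replication of the very same underlying effect
            replications.append(true_effect + rng.gauss(0, noise_sd))
    return statistics.mean(published), statistics.mean(replications)

orig_mean, rep_mean = simulate()
print(f"mean published effect:   {orig_mean:.2f}")
print(f"mean replication effect: {rep_mean:.2f}")
```

Nothing shady happens anywhere in this toy world. The shrinkage comes entirely from selecting on the original result.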

Third, to my mind it's amazing that any results at all replicated given some of the questions that the NYT reports being asked. Experiments on "free will" and emotional closeness? These are very general kinds of questions to be investigating and I am pretty sure that these phenomena are the results of the combined effects of very many different kinds of causes that are hard to pin down and likely subject to tremendous contextual variation due to unknown factors. One gets clean results in the real sciences when causes can be relatively isolated and interaction effects controlled for. It looks like many of the experiments reported were problematic not because of their replicability but because they were not looking for the right sorts of things to begin with. It's the questions, stupid!

Fourth, in my schadenfreudeness I cannot help but delight in the fact that the core data in linguistics, gathered in the very informal ways that it is, is a lot more reliable (see Sprouse and Almeida and Schutze stuff on this). Ha!!! This is not because of our methodological cleverness, but because what we are looking for, grammatical effects, are pretty easy to spot much of the time. This does not mean, of course, that there aren't cases where things can get hairy. But over a large domain, we can and do construct very reliable data sets using very informal methods (e.g. can anyone really think that it's up for grabs whether 'John hugged Mary' can mean 'Mary hugged John'?). The implication of this is clear, at least to me: frame the question correctly and finding effects becomes easier. IMO, many psych papers act as if all you need to do is mind your p-values and keep your methodological snot clean and out will pop interesting results no matter what data you throw in. The limiting case of this is the Big Data craze. This is false, as anyone with half a brain knows. One can go further: what much work in linguistics shows is that if you get the basic question right, damn methodology. It really doesn't much matter. This is not to say that methodological considerations are NEVER important. Only that they are only important in a given context of inquiry and cannot stand on their own.

Fifth, these sorts of results can be politically dangerous even though our data are not particularly flighty. Why? Well, too many will conclude that this is a problem with psychological or cognitive work in general and that nothing there is scientifically grounded. This would be a terrible conclusion and would affect linguistic support adversely.

There are certainly more conclusions/thoughts this report prompts. Let me reiterate what I take to be an important conclusion. What these studies show is that stats is a tool and that method is useful in context. Stats don't substitute for thought. They are neither necessary nor sufficient for insight, though on some occasions they can usefully bring into focus things that are obscure. It should not be surprising that this process often fails. In fact, it should be surprising that it succeeds on occasion and that some areas (e.g. linguistics) have found pretty reliable methods for unearthing causal structure. We should expect this to be hard. The NYT piece makes it sound like we should be surprised that reported data are often wrong and it suggests that it is possible to do something about this by being yet more careful and methodologically astute, doing our stats more diligently. This, I believe, is precisely wrong. There is always room for improvement in one's methods. But methods are not what drive science. There is no method. There are occasional insights and when we gain some it provides traction for further investigation. Careful stats and methods are not science, though the reporting suggests that this is what many think it is, including otherwise thoughtful scientists.

Wednesday, August 26, 2015

I couldn't resist

As many of you know, when Neil Armstrong landed on the moon he forgot to bring along enough indefinite articles. On landing he made the following very famous statement:

That's one small step for man, one giant leap for mankind

The semanticists in the crowd will appreciate that what he meant to say was not this contradiction but "that's one small step for A man…"

For a while I thought that I had found this lost indefinite. It found its way into Kennedy's famous Berlin speech, where he identified with the city's residents by uttering the famous sentence "Ich bin Ein Berliner." However, my German friends assured me that though Kennedy's shout out was well received and was very moving, it did not quite mean what it said. Apparently what he wanted to say was "Ich bin Berliner." The added Ein, the indefinite article, leads to the following translation: "I am a jelly donut," a Berliner being a scrumptious city-specific treat. At any rate, move the indefinite from Kennedy to Armstrong and historical linguistics is set right.

Ok, this is all just preamble for the real point of this post. I heard a great trivial story today that I thought I would relay to you. I know that I don't usually do this sort of stuff, but heh, WTH.  I got this from a friend and it puts Armstrong into a very positive light.  Here is what I got verbatim.


It's not really possible that UG doesn't exist

There are four big facts that undergird the generative enterprise:

1.     Species specificity: Nothing talks like humans talk, not even sorta kinda.
2.     Linguistic creativity: “a mature native speaker can produce a new sentence on the appropriate occasion, and other speakers can understand it immediately, though it is equally new to them” (Chomsky, Current Issues: 7). In other words, a native speaker of a given L has command over a discrete (and, for all practical and theoretical purposes, infinite) set of differently interpreted linguistic expressions.
3.     Plato’s Problem: Any human child can acquire any language with native proficiency if placed in the appropriate speech community.
4.     Darwin’s Problem: Human linguistic capacity is a very recent biological innovation (roughly 50-100 kya).

These four facts have two salient properties. First, they are more or less obviously true. That’s why nobody will win a Nobel prize for “discovering” any of them. It is obvious that nothing does language remotely like humans do, and that any kid can learn any language, and that there is for all practical and theoretical purposes an infinity of sentences a native speaker can use, and that the kind of linguistic facility we find in humans is a recentish innovation in biological terms (ok, the last is slightly more tendentious, but still pretty obviously correct). Second, these facts can usefully serve as boundary conditions on any adequate theory of language. Let’s consider them in a bit more detail.

(1) implies that there is something special about humans that allows them to be linguistically proficient in the unique way that they are. We can name the source of that proficiency: humans (and most likely only humans) have a linguistically dedicated faculty of language (FL) “designed” to meet the computational exigencies peculiar to language.

(2) implies that native speakers acquire Gs (recursive procedures or rules) able to generate an unbounded number of distinct linguistic objects that native speakers can use to express their thoughts and to understand the expressions that other native speakers utter. In other words, a key part of human linguistic proficiency consists in having an internalized grammar of a particular language able to generate an unbounded number of different linguistic expressions. Combined with (1), we get to the conclusion that part of what makes humans biologically unique is a capacity to acquire Gs of the kind that we do.
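To make the "finite rules, unbounded output" point concrete, here is a toy sketch. The two-rule grammar and the function name `embed` are my own illustrative assumptions, not anyone's proposal about actual Gs. One base rule plus one recursive rule suffices to give a distinct sentence for every level of embedding.

```python
def embed(n):
    """Return the sentence of a toy grammar with n levels of clausal embedding."""
    if n == 0:
        return "Mary left"                      # base rule: S -> Mary left
    return "John thinks that " + embed(n - 1)   # recursive rule: S -> John thinks that S

for n in range(3):
    print(embed(n))
```

Since `embed(n)` differs for every `n`, two rules generate as many distinct expressions as you care to ask for, which is all that "discrete infinity" requires.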

(3) implies that all human Gs have something in common; they are all acquirable by humans. This strongly suggests that there are some properties P that all humans have that allow them to acquire Gs in the effortless reflexive way that they do.

Indeed, cursory inspection of the obvious facts allows us to say a bit more: (i) we know that the data available to the child vastly underdetermine the kinds of Gs that we know humans can acquire; thus (ii) it must be the case that some of the limits on the acquirable Gs reflect “the general character of his [the acquirer’s NH] learning capacity rather than the particular course of his experience” (Chomsky, Current Issues: 112). (1), (2) and (3) imply that FL consists in part of language specific capacities that enable humans to acquire some kinds of Gs more easily than others (and, perhaps, some not at all).

Here’s another way of saying this. Let’s call the linguo-centric aspect of FL, UG. More specifically, UG is those features of FL that are linguistically specific, in contrast to those features of FL that are part of human or biological cognition more generally. Note that this allows for FL to consist of features that are not parts of UG. All that it implies is that there are some features of FL that are linguistically proprietary. That some such features exist is a nearly apodictic conclusion given facts (1)-(3). Indeed, it is a pretty sure bet (one that I would be happy to give long odds on) that human cognition involves some biologically given computational capacities unique to humans that underlie our linguistic facility. In other words, the UG part of FL is not null.[1]

The fourth fact implies a bit more about the “size” of UG: (4) implies that the UG part of FL is rather restricted.[2] In other words, though there are some cognitively unique features of FL (i.e. UG is not empty), FL consists of many operations that FL shares with other cognitive components and that are likely shared across species. In other words, though UG has content, most of FL consists of operations and conditions not unique to FL.

Now, the argument outlined above is often taken to be very controversial and highly speculative. It isn’t. That humans have an FL with some unique UGish features is a trivial conclusion from very obvious facts. In short, the conclusion is a no-brainer, a virtual truism! What is controversial, and rightly so, is what UG consists in. This is quite definitely not obvious and this is what linguists (and others interested in language and its cognitive and biological underpinnings) are (or should be) trying to figure out. IMO, linguists have a pretty good working (i.e. effective) theory of FL/UG and have promising leads on its fundamental properties. But, and I really want to emphasize this, even if many/most of the details are wrong the basic conclusion that humans have an FL with UGish touches must be right. To repeat, that FL/UG is a human biological endowment is (or should be) uncontroversial, even if what FL/UG consists in isn’t.[3] 

Let me put this another way, with a small nod to 17th and 18th century discussions about skepticism. Thinkers of this era distinguished logical certainty from moral certainty. Something is logically certain iff its negation is logically false (i.e. only logical truths can be logically certain). Given this criterion, not surprisingly, virtually nothing is certain. Nonetheless, we can and do judge many things to be more or less certain that are neither tautologies nor contradictions. Those things that enjoy a high degree of certainty but are not logically certain are morally certain. In other words, they are worth a sizable bet with long odds given. My claim is the following: that UG exists is morally certain. That there is a species specific dedicated capacity based on some intrinsic linguistically specific computational capacities is as close to a sure thing as we can have. Of course, it might be wrong, but only in the way that our bet that birds are built to fly might be wrong, or fish are built to swim might be. Maybe there is nothing special about birds that allows them to fly (maybe as Chomsky once wryly suggested, eagles are just very good jumpers). Maybe fish swim like I do only more so. Maybe. And maybe you are interested in this beautiful bridge in NYC that I have to sell you. That FL/UG exists is a moral certainty. The interesting question is what’s in it, not if it’s there.

Why do I mention this? Because in my experience, discussions in and about linguistics often tend to run the whether/that and the what/how questions together. This is quite obvious in discussions of the Poverty of Stimulus (PoS). It is pretty easy to establish that/whether a given phenomenon is subject to PoS, i.e. that there is not enough data in the PLD to fix a given mature capacity. But this does not mean that any given solution for that problem is correct. Nonetheless, many regularly conclude that because a proposed solution is imperfect (or worse) that there is no PoS problem at all and that FL/UG is unnecessary. But this is a non-sequitur. Whether something has a PoS profile is independent of whether any of the extant proposed solutions are viable.

Similarly with evolutionary qualms regarding rich UGs: that something like island effects fall under the purview of FL/UG is, IMO, virtually uncontestable. What the relevant mechanisms are and how they got into FL/UG is a related but separable issue. Ok, I want to walk this back a bit: that some proposal runs afoul of Darwin’s Problem (or Plato’s) is a good reason for re-thinking it. But, this is a reason for rethinking the proposed specific mechanism, it is not a reason to reject the claim that FL has internal structure of a partially UGish nature. Confusing questions leads to baby/bathwater problems, so don’t do it!

So what’s the take home message: we can know that something is so without knowing how it is so. We know that FL has a UGish component by considering very simple evident facts. These simple evident facts do not suffice to reveal the fine structure of FL/UG but not knowing what the latter is does not undermine the former conclusion that it exists. Different questions, different data, different arguments. Keeping this in mind will help us avoid taking three steps backwards for every two steps forward.

[1] Note that this is a very weak claim. I return to this below.
[2] Actually, this needs to be very qualified, as done here and here.
[3] This is very like the mathematical distinction between an existence proof vs a constructive proof. We often have proofs that something is the case without knowing what that something is.

Monday, August 24, 2015

Four papers to amuse you

I just returned from an excellent two weeks eating too much, drinking excessively, getting tanned and sweaty, reading novels and listening to weird opera. In the little time I had not so productively dedicated, I ran across four papers that readers might find interesting as they relate to themes that have been discussed more than glancingly in FoL. Here they are.

The first deals with publishing costs and what can be done about them. Here’s a quote that will give you an excellent taste of the content:

“The whole academic publishing industry is a gigantic web of avarice and selfishness, and the academic community has not engaged to the extent it perhaps should have to stop it,” Leeder said.
“Scholarly publishing is a bit like the Trans-Pacific Partnership Agreement. It’s not totally clear what the hell is going on, but you can be sure someone is making a hell of a lot of money from it.”

The second and third discuss how successful pre-registration has been in regulating fishing for significant results. It seems that the results have been striking. Here’s a taste again:

The launch of the registry in 2000 seems to have had a striking impact on reported trial results, according to a PLoS ONE study that many researchers have been talking about online in the past week.
A 1997 US law mandated the registry’s creation, requiring researchers from 2000 to record their trial methods and outcome measures before collecting data. The study found that in a sample of 55 large trials testing heart-disease treatments, 57% of those published before 2000 reported positive effects from the treatments. But that figure plunged to just 8% in studies that were conducted after 2000.

The third is a piece on one problem with peer review. It seems that some like to review themselves and are often very impressed with their own work. I completely understand this. Most interesting to me is that this problem arose in Springer journals. Springer is one of the larger and more expensive publishers. It seems that the gate-keeping function of the prestige journals is not all that it is advertised to be.

The self-review issue, I suspect, though dramatic and fun (Hume provided an anonymous favorable review of the Treatise, if memory serves) is probably the tip of a bigger iceberg. In a small field like linguistics like-minded people often review one another’s papers (like the old joke concerning the New York Review of Each Other’s Books) and this can make it more difficult for unconventional views (those that fall outside the consensus) to get an airing. I believe that this partially explains the dearth of purely theoretical papers in our major journals. There is, as I’ve noted many times, an antipathy for theoretical speculation, an attitude reflected in the standard review process.

The last article is about where “novel” genes come from.  Interestingly they seem to be able to just “spring into existence.” Moreover, it is claimed that this process might well be very common and “very important.” So, the idea that change might just pop up and make important contributions seems to be gaining biological respectability. I assume that I don’t need to mention why this possibility might be of interest to linguists.

Sunday, August 9, 2015

Universals again

I am on vacation and had hoped not to post anything for two weeks. But just as I was settling into a nice comfortable relaxing sojourn I get this in my inbox. It reports on a universal that Ted Gibson discovered that states that words that semantically go together are more often than not linearly close to one another. The explanation, it seems, is that this makes it easier to understand what is being said and easier to express what you have in mind, or so the scientists tell us. My daughter and sister-in-law read this and were surprised to find that this kind of thing passes for insight, but it seems that it does. More interesting still is the desire to see this surprising new finding as bearing on Chomsky's claims about universal grammar. It seems that we finally have evidence that languages have something in common, namely the tendency to put things that semantically go together linearly close to one another. Finally a real universal.

The astute reader will note that once again universal here is understood in Greenberg's terms, not Chomsky's. Happily, Ted Gibson notes as much (to what I take to be the disappointment of the interviewer) when asked how this stuff bears on Chomsky's claims (here). As he notes, it is largely at right angles to the relevant issues concerning Chomsky universals.

I now think that there is really no way to flush this misconception out of the intellectual cogno-sphere. For reasons that completely elude me, people refuse to think about Universals in anything but Greenbergian terms. I suspect that this is partly due to a strong desire to find that Chomsky is wrong. And given that his big claim concerns universal grammar and given that it is not that hard to find that languages robustly differ in their surface properties, it seems pretty easy to show that Chomsky's claims were not only wrong, but obviously so. This fits well into a certain kind of political agenda: not only is Chomsky a political naif, but he is also a scientific one. If only he had bothered looking at the facts rather than remain blinkered by his many preconceptions. Of course, if Chomsky universals are not Greenberg universals then this simple minded dismissal collapses. That's one reason why, IMO, the confusion will be with us for a long time. It's a confusion that pays dividends.

There are several others as well. You all already know that I believe that Empiricism cannot help confusing the two kinds of universals, as Greenberg universals are the only kinds that the Eishly inclined can feel comfortable with. But this may be too fancy an account for why this stuff gets such widespread coverage in the popular press. For the latter, I think the simple desire to sink Chomsky suffices.

At any rate, for those that care, it's hard to believe that anyone will find this "discovery" surprising. Who would have thought that keeping semantically significant units close to one another could be useful. Sure surprised me.

Monday, July 27, 2015

Relativized Minimality and similarity based interference

The goal of the Minimalist Program is to reduce the language specificity of FL (i.e. UG) to the absolute minimum, so as to reduce the Logical Problem of Language Evolution (LPLE) to manageable size. The basic premise  is that LPLE is more challenging the more linguistiky FL’s operations and principles are. This almost immediately implies that to solve the LPLE requires radically de-linguistifying the structure of FL.

There are several reasonable concerns regarding this version of LPLE, but I will put these aside here and assume, with Chomsky, that the question, though vague, is well posed and worth investigating (see here and here for some discussion). One obvious strategy for advancing this project is to try and reduce/unify well grounded linguistic principles of UG with those operative in other domains of cognition. Of the extant UG principles ripe for such reconceptualization, the most tempting, IMO and that of many others as we shall see, is Relativized Minimality (RM). What is RM to be unified with? Human/biological memory. In particular, it is tempting to see RM effects as what you get when you shove linguistic objects through the human memory system.[1] That’s the half-baked idea. In what follows I want to discuss whether it can be more fully baked.

First off, why think that this is doable at all? The main reason is the ubiquity of Similarity Based Interference (SBI) effects in the memory literature. Here is a very good accessible review of the relevant state of play by Van Dyke and Johns (VD&J).[2] It seems that human (in fact all biological) memory is content addressable (CA) (i.e. you call the memory in terms of its contents (rather than, say, an index)). Further, the more the contents of specific memories overlap, the more difficult it is to successfully get at them. More particularly, if one accesses a memory via certain content cues, then the more memories these cues match, the more they “overload” the retrieval protocol, making it harder to successfully get the right one. On the (trivial) assumption that memory will be required to deal with the ubiquitous non-(linearly)-adjacent dependencies found in language we should expect to find restrictions on linguistic dependencies that reflect this memory architecture.[3] VD&J review various experiments that show the effects that distracters can have on retrieving the right target when these distracters “resemble” the interacting expressions.

Given that memory will create SBIs it is natural to think that some kinds of dependencies will be favored over others by this kind of memory architecture. Which? Well ones in which the cues/features relating the dependents are dissimilar from those of the intervening elements. Graphically, (1) represents the relevant configuration. In (1), if non-adjacent X and Y need to be related (say there is a movement or antecedence dependency between the two) then this will be easiest if the cues/features relating them are not also shared by intervening Zish elements.

(1)  …X…Z…Y…

This should look very familiar to any syntactician who has ever heard the name ‘Luigi Rizzi’ (and if you haven’t, think of either changing fields or getting into a better grad program). Since Rizzi’s 1990 work, (1), in the guise of RM, is standardly used to explain why WHs cannot move across other WHs (e.g. superiority and Wh-island effects) or heads over heads (the head movement constraint).

IMO, RM is one of the prettiest (because simplest) empirically useful ideas to have ever been proposed in syntax.[4] Moreover, its family resemblance to the kinds of configurations that induce SBI effects is hard to miss.[5] And the lure of relating the two is very tempting, so tempting that resistance is really perverse. So the question becomes, can we understand RM effects as species of SBI effects and thus reflections of facts about memory architecture?

Psychologists have been pursuing a similar (though as we shall see, not identical) hunch for quite a while. There is now good evidence, which VD&J review, that encumbering (working) memory with word lists while sentences are being processed differentially affects processing times of non-local dependencies and that the difficulty is proportional to how similar the words in memory are to the words in the sentence that need to be related. Thus, for example, if you are asked to keep in memory the triad TABLE-SINK-TRUCK while processing It was the boat that the guy who lived by the sea sailed after two days then you do better at establishing the dependency between boat and sail than if you are asked to parse the same sentence with fix in place of sail. Why? Because all three of the memory list words are fixable, while none are sailable. This fact makes boat harder to retrieve in the second fix sentence than the first sail sentence (p. 198-9).
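Here is a minimal sketch of how cue overload could deliver the boat/sail vs boat/fix contrast. It is not VD&J's model. The feature bundles, the name `retrieval_probability`, and the simple ratio rule (items compete in proportion to how many retrieval cues they match) are all assumptions of mine, chosen only to make the logic visible.

```python
# Hypothetical feature assignments, for illustration only.
MEMORY = {
    "boat":  {"noun", "fixable", "sailable"},
    "table": {"noun", "fixable"},
    "sink":  {"noun", "fixable"},
    "truck": {"noun", "fixable"},
}

def retrieval_probability(target, cues, memory=MEMORY):
    """Chance of retrieving `target` when items compete in proportion
    to the number of retrieval cues they match (an assumed toy rule)."""
    match = {item: len(cues & feats) for item, feats in memory.items()}
    total = sum(match.values())
    return match[target] / total if total else 0.0

# The verb supplies the cues used to find its object in memory.
p_sailed = retrieval_probability("boat", {"noun", "sailable"})
p_fixed = retrieval_probability("boat", {"noun", "fixable"})
print(p_sailed, p_fixed)
```

With sail, only boat matches the distinguishing cue, so it wins the competition more often. With fix, every item on the memory list matches as well as boat does, and retrieval of boat gets correspondingly harder.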

Syntactic form can also induce interference effects. Thus, the subject advantage inside relative clauses (RC) (i.e. it is easier to parse subject relatives than object relatives, see here) is affected by the kinds of DPs present in the RC.  In particular take (2) and (3). The Subject Advantage is the fact that (2) is easier to parse.

(2)  The banker that praised the barber climbed the mountain 
(3)  The banker that the barber praised climbed the mountain

VD&J note that SBI effects are detectable in such cases as the Subject Advantage can be reduced or eliminated if in place of a D-NP nominal like the barber one puts in pronouns, quantified DPs like everyone and/or proper names. The reasoning is that the definite descriptions interfere with one another, while the names, pronouns and quantifiers interfere with D-NPs far less.[6]

VD&J offers many more examples making effectively the same point: that the contents of memory can affect sentential parsing of non-local dependencies and that they do so by making retrieval harder.

So, features matter, both syntactic features and “semantic” ones (and, I would bet other kinds as well).

What are the relevant dimensions of similarity? Well, it appears that many things can disrupt, including grammatical and semantic differences. Thus, the “semantic” suitability of a word on the memorized word list and the syntactic features that differentiate one kind of nominal from another can serve to interfere with establishing the relation of interest. [7]

Friedmann, Belletti and Rizzi (FBR) (here) reports similar results, but this time for acquisition. It appears, for example, that subject relatives are more easily mastered than object relatives, as are subject vs object questions. FBR discusses data from Hebrew. Similar results are reported for Greek by Varlokosta, Nerantzini and Papadopoulou (VNP) here. Moreover, just as in the processing literature, it appears that DPs interfere with one another the more similar they are. Thus, replacing D-NP nominals with relative pronouns and bare whs (i.e. what vs what book) eases/eliminates the problem. As FBR and VNP note, the subject advantage is selective and, in their work, is correlated with the syntactic shapes of the intervening nominal.[8] The more similar they are, the more the problems caused.

So, at first blush, the idea that RM effects and SBI effects are really the same thing looks very promising. Both treat the shared features of the interveners and dependents as the relevant source of “trouble.” However (and you knew that was coming, right?) things are likely more complicated. What’s clear is that features do make a difference, including syntactic ones. However, what’s also clear is that not only syntactic shape/features matter. So do many other kinds.

Moreover, it is not clear which similarities cause problems and which don’t. For example, the standard RM model (and the one outlined in FBR) concentrates on cases where the features are identical vs when they overlap vs when they are entirely disjoint. The problem with relative clauses like (3), for example, is that the head of the relative clause and the intervening subject have the exact same syntactic D-NP shape. The reason subbing in a pronoun or name or quantifier might be expected to mitigate difficulty is that the subject intervener then shares only some of the head’s features, thereby reducing the minimality effects. So in the case of RCs the story works as we expect.

The problem is that there are other data to suggest that this version of RM delivers the wrong answers in other kinds of cases. For example, a recent paper by Atkinson, Apple, Rawlins and Omaki (here) (AARO) shows that “the distribution of D-linking amelioration effect [sic] is not consistent with Featural Relativized Minimality’s predictions…” (1). AARO argues that carefully controlled rating methods of the experimental syntax variety show that moving a which-NP over a which-NP in Spec C is better than moving it over a who (i.e. (4) is reliably better than (5)). This is not what is expected given the featural identity in the first case and mere overlap in the second.[9]

(4)  Which athlete did she wonder which coach would recruit
(5)  Which athlete did she wonder who would recruit

IMO, superiority shows much the same thing. So (6) is quite a bit better than (7) to my ear.

(6)  I wonder which book which boy read
(7)  I wonder which book who read

Once again, the simple syntactic version of RM suggests that the opposite should be the case. If this is so, then there is more than just structural similarity involved in RM effects.

This, however, might be a good thing if one’s aim is to treat RM effects as instances of more general SBI effects. We expect many different factors to interact to provide a gradation of effects, with syntactic shape being one factor among many. The AARO data suggests that this might indeed be correct, as are the parallels between the VD&J parsing data and the FBR/VNP acquisition data. So, if AARO is on the right track, it argues in favor of construing RM effects as kinds of SBI effects, and this is what we would expect were RM not a grammatically primitive feature of FL/UG, but the reflection of general memory architecture when applied to linguistic objects. In other words, this need not be a problem, for this is what one would expect if RM were just a species of SBI (and hence traceable to human memory being content addressable).[10]

What is more problematic, perhaps, is settling what “intervention” means. In the memory literature intervention is entirely a matter of temporal order (i.e. if Z is active when X and Y are being considered it counts as an “intervener”; roughly speaking, if Z is temporally between X and Y then Z intervenes). For RM, in contrast, the general idea is that intervention is stated in terms of c-command (i.e. Z intervenes between X and Y if X c-commands Z and Z c-commands Y), and this has no simple temporal implications. Thus, the memory literature mainly explores a “linear” notion of intervention while RM relies on a structural one, and so it is not clear that RM effects should be assimilated to memory effects.
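To see how the two notions can come apart, here is a small Python sketch. It is a toy under stated assumptions: the `Node` class and function names are mine, c-command is the simplified "sister-containment" version (A c-commands B if neither dominates the other and A's mother dominates B), and "linear" intervention is just position in the left-to-right yield of the tree, standing in for temporal order.

```python
class Node:
    """A bare-bones tree node; children are ordered left to right."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

    def dominates(self, other):
        """Proper dominance: self is an ancestor of other."""
        node = other.parent
        while node is not None:
            if node is self:
                return True
            node = node.parent
        return False


def c_commands(a, b):
    """Simplified c-command: a's mother dominates b, and neither dominates the other."""
    if a is b or a.parent is None or a.dominates(b) or b.dominates(a):
        return False
    return a.parent.dominates(b)


def leaves(node):
    """Terminal nodes in left-to-right order."""
    if not node.children:
        return [node]
    out = []
    for child in node.children:
        out.extend(leaves(child))
    return out


def intervenes_hierarchically(x, z, y):
    """RM-style intervention: X c-commands Z and Z c-commands Y."""
    return c_commands(x, z) and c_commands(z, y)


def intervenes_linearly(root, x, z, y):
    """Linear intervention: terminal Z sits between terminals X and Y in the yield."""
    order = leaves(root)
    ix, iz, iy = order.index(x), order.index(z), order.index(y)
    return min(ix, iy) < iz < max(ix, iy)


# Tree 1: [root X [M [K a Z] [N b Y]]] -- Z is buried inside K, so it does not c-command Y
x, a, z, b, y = (Node(label) for label in "XaZbY")
tree1 = Node("root", [x, Node("M", [Node("K", [a, z]), Node("N", [b, y])])])

# Tree 2: [root X [M Z [N b Y]]] -- Z is a sister of N, so it does c-command Y
x2, z2, b2, y2 = (Node(label) for label in "XZbY")
tree2 = Node("root", [x2, Node("M", [z2, Node("N", [b2, y2])])])

print(intervenes_linearly(tree1, x, z, y), intervenes_hierarchically(x, z, y))
print(intervenes_linearly(tree2, x2, z2, y2), intervenes_hierarchically(x2, z2, y2))
```

In the first tree Z counts as an intervener on the linear measure but not on the c-command measure; in the second it counts on both. English-type cases where the two coincide look like the second tree, which is why they cannot tell the two notions apart.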

However, I am not currently sure how big a problem this might be. Here’s what I mean.

First, much of the literature reviewed in VD&J involves English data where linear and hierarchical intervention will be the same. We know that when these two are pulled apart, in many cases it is hierarchy that matters (see the discussion of the Yun et al. paper here. It shows that the Subject Advantage persists in languages where linear order and hierarchical order go in different directions).

Similarly, Brian Dillon (here) and Dave Kush (in his unavailable thesis; ask him) show that hierarchical, not linear, intervention is what’s critical in computing binding relations.

Of course, there are also cases where hierarchy does not rule, and linear intervention seems to matter (e.g. agreement attraction errors and certain NPI licensing illusions are less sensitive to structural information, at least in some online tasks, than hierarchical restrictions suggest they should be).[11]

So both notions of proximity seem to play a role in language processing. I don’t know whether both have an effect in acquisition (but see note 11). So, does this mean that we cannot unify RM effects with SBI effects for they apply in different kinds of configurations?  Maybe not.  Here’s why.

Memory effects arise from two sources: the structure of memory (e.g. is it content addressable, rates of decay, number of buffers, RAM etc.) and the data structures that memory works on. It is thus possible that when memory manipulates syntactic structures it will measure intervention hierarchically because linguistic objects are hierarchically structured. In other words, if phrase markers are bereft of linear order information (as, say, a set theoretic understanding of phrase markers entails) then when memory deals with these it will not be able to use linear notions to manipulate them because such objects have no linear structure. In these cases, when task demands use memory to calculate the properties of phrase markers, then RM effects are what we expect to see: SBIs with c-command determining intervention.  Of course, sentences when used have more than hierarchical structure and it is reasonable to suppose that this too will affect how linguistic items are used.[12] However, this does not prevent thinking of RM as a memory effect defined over PM-like data structures. And there is every reason to hope that this is in fact correct, for if it is then we can treat RM as a special case of a more general cognitive fact about us: that we have content addressable memories that are subject to SBI effects. In other words, we can reduce the linguistic specificity of FL.

Maybe an analogy will help here. In a wonderful book, What the hands reveal about the brain (here), Poizner, Bellugi and Klima (PBK) describe the two ways that ASL speakers with brain damage describe spatial layouts. As you all know, space has both a grammatical and a physical sense for an ASL speaker. What PBK note is that when this space is used grammatically then it functions differently than when it is used physically. When used physically, stroke patients with spatial deficits show all the characteristic problems that regular right hemisphere stroke patients show (e.g. they use only half the space). However, when signing in the space (i.e. using the space linguistically) then this particular spatial deficit goes away and patients no longer neglect half the signing space. In other words, how the space is being used, linguistically or physically, determines what deficits are observed.

Say something analogous is true with memory; when used in computing grammatical properties, intervention is hierarchical. When used otherwise, linear/temporal structure may arise.  Thus what counts as an intervener will depend on what properties memory is being used to determine. If something like this makes sense (or can be made to make sense) then unifying RM effects as SBI effects with both related to how human memory works looks possible.[13]

Enough. This post is both too long and too rambling. Let me end with a provocative statement. Many at Athens felt that Darwin’s Problem (the LPLE) is too vague to be useful. Some conceded that it might have poetic charm, that it was a kind of inspirational ditty. Few (none?) thought that it could support a research program. As I’ve said many times before, I think that this is wrong, or at least too hasty. The obvious program that LPLE (aka Darwin’s Problem) supports is a reductive/unificational one. To solve LPLE requires showing that most of the principles operative in FL are non-linguistically specific. This means showing how they could be reflections of something more cognitively (or computationally or physically) general. RM seems ripe for such a reanalysis in more general terms. However, successfully showing that RM is a special case of SBI which is grounded in how human memory operates will take a lot of work, and it might fail. Still, the papers I’ve cited above outline how redeeming this hunch might proceed. Can it work? Who knows. Should it work? Yup, the LPLE/DP hangs on it.[14]

[1] The first person I heard making this connection explicitly is Ivan Ortega Santos. He did this in his 896 paper at UMD in about 2007 (a version published here). It appears that others were barking up a similar tree somewhat earlier, as reviewed in the paper by Friedmann, Belletti, Rizzi discussed in what follows. The interested reader should go there for references.
[2] Julie Van Dyke and Brian McElree (VD&M) wrote another paper that I found helpful (here). It tries to zero in on a more exact specification of the core properties of content addressable memory systems. The feature that they identify as key is the following:
The defining property of a content addressable retrieval mechanism is that information (cues) in the retrieval context enables direct access to relevant memory representations, without the need to search through extraneous representations (164).
In effect, there is no cost to “search.” Curiously, I believe that VD&M get this slightly wrong. CA specifies that information is called in virtue of substantive properties of its contents. This could be combined with a serial search. However, it is typically combined with a RAM architecture in which all retrieval is in constant time. So general CA theories combine a theory of addressability with RAM architecture, the latter obviating costs to search. That said, I will assume that both features are critical to human memory and that the description they offer above of CA systems correctly describes biological memory.
[3] In other words, unifying RM with CA systems would effectively treat RM effects as akin to what one finds in self-embedding structures. These are well known to be very unacceptable despite their grammaticality (e.g. *that that that they left upset me frightened him concerned her).
[4] My other favorite is the A-over-A principle. Happily, these two can be unified as two sides of the same RM coin if constituents are labeled. See here for one version of this unification.
[5] In fact, you might enjoy comparing the pix in VD&J (p. 197) with (1) above to see how close the conceptions are.
[6]  VD&J do not report whether replacing both D-NPs with pronouns or quantifiers reintroduces the Subject Advantage that replacing barber eliminates. The prediction would seem to be that it should on a straightforward reading of RM. Thus, someone who you/he saw should contrast with someone who saw you/him in roughly the same way that (2) and (3) do. VNP (see below) report some data suggesting that quantifiers might pose separate problems. VD&M reporting on the original Gordon & Co studies note “their data indicate that similarity-based interference occurs when the second noun phrase is from the same referential class as the first noun phrase, but it is reduced or eliminated when the noun phrases are from different classes (158).” This suggests that the SBI effects are symmetric.
[7] The scare quotes are here for the relevant examples exploit a “what makes sense” metric, not a type measure. All the listed expressions in boat-sail and boat-fix examples are of the same semantic type, though only boats are “sailable.” Thus it is really the semantic content that matters here, not some more abstract features. VD&M review other data that points to the conclusion that there are myriad dimensions of “similarity” that can induce SBIs.
[8] VD&J cite the work of Gordon and collaborators. They do not link the abatement of SBIs to syntactic shape but to their semantic functions, their “differing referential status.” This could be tested. If Gordon is right, then languages with names that come with overt determiners (e.g. Der Hans in German) should, on the assumption that they function semantically the same as names in English do, obviate SBIs when a D-NP is head of the relative. If RM is responsible, then these should function like any other D-NP nominal and show a Subject Advantage.
[9] This is the main point. Of course there are many details and controls to worry about which is why AARO is 50 pages rather than a paragraph.
[10] This might create trouble for the strong RM effects, like moving adjuncts out of WH islands: *How did you wonder whether Bill sang. This is a really bad RM effect. This is correct and the question arises why so bad? Dunno. But then again, we currently do not have a great theory of these ECP effects anyhow. One could of course concoct a series of features that led to the right result, but, unfortunately, one could also find features that would predict the opposite. So, right now, these hard effects are not well understood, so far as I can tell, by anyone.
[11] Note the very tentative nature of this remark. Are there any results from language processing/production/acquisition that implicate purely linear relations? I don’t know (a little help would be nice from those of you in the know). The NPI stuff and the agreement attraction errors are not entirely insensitive to hierarchical structure. Maybe this: VNP cite work by Friedmann and Costa which shows that children have problems with crossing dependencies in coordinate structures (e.g. The grandma1 drew the girl and t1 smiled). The “crossing” seems to be linear, not hierarchical. At any rate, it is not clear that the psycho data cannot be reinterpreted in large part in hierarchical terms.
[12] However, from what I can tell, pure linear effects (e.g. due to decay) are pretty hard to find and where they are found seem to be of secondary importance. See VD&J and VD&M for discussion. VD&J sum things up as follows:
…interference is the primary factor contributing to the difficulty of integrating associated constituents…with a more specialized role arising for decay…
[13] One other concern might be the following: aren’t grammatical restrictions categorical while performance ones are not? Perhaps. Even RM effects lead to graded acceptability, with some violations being much worse than others. Moreover, it is possible that RM effects are SBI effects that have been “grammaticized.” Maybe. So RM is a grammatical design feature of G objects so that such objects mitigate the problems that CA memory necessarily imposes. I have been tempted to this view in the past. But now I am unsure. My problem lies with the notion “grammaticization.” I have no idea what this process is, what objects it operates over, and how it takes gradient effects and makes them categorical. At any rate, this is another avenue to explore.
[14] There are some syntactic implications for the unification of RM and SBI effects in terms of the structure of CA memory. For example, if RM effects are due to CA architecture then issues of minimal search (a staple of current theories) are likely moot. Why? Well, because as observed (see note 2), CA “enables direct access to relevant memory representations, without the need to search through extraneous representations.”
In other words, CA eschews serial search and so the relevance of minimal search is moot if RM effects are just CA effects. All targets are available “at once,” with none further away or less accessible than any other. Thus if RM is a special case of CA then it is not search that drives it. This does not mean that distance does not matter, just that it does not matter for search. Search turns out to be the wrong notion.

Here’s another possible implication: if decay is a secondary effect, then distance per se should not matter much. What will matter is the amount of intervening “similar” material. This insight is actually coded into most theories of locality: RM is irrelevant if there is only one DP looking for its base position. Interestingly, the same is true of phase based theories, for structures without two DPs are “weak” phases and these do not impose locality restrictions. Thus, the problems arise when there are structures supporting two relevant potential antecedents of a gap, just as a theory of minimality based on SBI/CA would lead one to suppose.