Faculty of Language: Darwin's problem; some reasonable

Tuesday, May 12, 2015

Darwin's problem; some reasonable

Karthik Durvasula has pointed me to a thoughtful blog post by the Confused Academic (CA) critical of using Darwin’s Problem (DP) considerations as a criterion in the evaluation of linguistic proposals (see here).[1] The main thrust of the (to repeat, very reasonable) points made is that we know very little about how evolutionary considerations apply to cognitive phenomena in general and linguistic phenomena in particular and, as such, we should not expect too much from DP kinds of considerations. Indeed, as I noted here, there are several problems with giving a detailed account of how FL evolved. Let me remind you of some of the more serious issues, as CA adverts to similar ones.

The most obvious is the remove between cognitive powers and genetic ones. In particular, for an account of the evolution of FL we need a story of how minds are incarnated in brains and how brains are coded in genes. Why? Because evolution rejiggers genomes, which in turn grow brains, which in turn secrete cognition. Sadly, every link of this chain is weak, most particularly in the domain of language (though the rest of cognition is not in much better shape, as Lewontin’s famous piece (noted by CA) emphasizes). We really don’t know much about the genetic bases of brain development, nor do we know much about how brains realize FL. So, though we do know a fair bit about the cognitive structure of FL, we don’t have any really good linking hypotheses taking us from this to the brain and the genome, which are the structures that evolution manipulates to work its magic. In other words, to really explain how FL evolved (at least in detail) we need to account for how the brain structures that embody FL arose via alterations in our ancestors’ genomes (broadly construed to include epigenetic factors), and right now, though we have decent cognitive descriptions of FL, we have no good way of linking these up to brains and genes.

Second, if Lewontin is right (and I for one found his discussion entirely persuasive) then the prospects of giving standard evo accounts of how FL evolved will be largely nugatory due to the virtual impossibility of finding the relevant evidence, e.g. there really exist no “fossil” records to exploit and there is really nothing like our linguistic facility evident in any of our primate “cousins.” This makes constructing a standard evolutionary account empirically very dicey.

In sum, things do not look good, so there arises the very reasonable question of what good DP thinking is for the practicing linguist. That’s the question CA asks. Here’s what I think (acknowledging that the problems CA notes are serious): Despite these problems, I still think that DP thinking is useful. Let me say why.

As I’ve noted (I suspect too many times) before (e.g. here) there is a tension between Plato’s Problem (PP) and DP. The former encourages packing as much as possible into UG to make the acquisition problem easier while the latter eschews this strategy to make the evolvability problem more tractable. Put another way: the more linguistic knowledge is given rather than acquired[2], the easier it is to account for why acquisition is as easy and effortless as it seems to be. However, the more there is packed into FL/UG the more challenging it is to explain how this knowledge could have evolved. This is the tension, and here is why I like DP: this tension is a creative one. These two problems together generate a very interesting theoretical problem: how to have one’s DP cake and PP’s too (i.e. how to allow for a solution to both PP and DP). I have suggested several strategies for how this might be accomplished that lead, IMO, to an interesting research program (e.g. here). My claim is that if you find this program attractive, then you need to take DP semi-seriously. What do I mean by “semi” here? Well, nobody expects to explain how FL actually evolved given how little we know about the relevant bridging assumptions (see above), but by thinking of the problem in tandem with PP we know the kinds of things we need to do (e.g. effectively eliminate the G-internal modularity characteristic of GB-style theories and show that all the apparently different linguistic dependencies outlined in the separate GB modules are effectively one and the same). That’s the first step needed to reconcile DP and PP.[3]

The second, also motivated by DP, is yet more interesting (and challenging): to try and factor out those G operations, principles and primitives that are not linguistically specific. Thus, given DP we want not only a simpler (more elegant, more beautiful yadda, yadda, yadda) theory, but a particular kind of simpler etc. theory. We want one with as little linguistic specificity as possible. Maybe an example would help here.

The cyclic nature of derivations has long been a staple of GG theory. Ok, how to explain this? Earlier GG theory simply stipulated it: rules apply cyclically. The Minimalist Program (MP) has tried to offer slightly deeper motivation. There are two prominent accounts in the literature: the cycle as expression of the Extension Condition (EC) and the cycle as the expression of feature checking (Featural Cyclicity (FC)). FC is the idea that “bad” (unvalued or un-interpretable) features must be discharged quickly (e.g. in the phase or phrase that introduces them). EC says that derivations are monotonic (i.e. constituents that are inputs to a G operation must be constituents in the output of the operation).
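To make the monotonicity reading of EC concrete, here is a toy sketch. Everything in it is an illustrative assumption rather than anything proposed in the literature discussed here: syntactic objects are encoded as nested Python tuples, and the names `merge`, `constituents`, `respects_ec` and `tuck_in` are hypothetical helpers invented for this example. It shows only that merging at the root keeps every input intact as a constituent of the output, while a counter-cyclic operation that reaches inside an already built object does not.

```python
from typing import Union

# Assumption: a syntactic object is a lexical item (str) or a binary tuple.
Tree = Union[str, tuple]

def merge(a: Tree, b: Tree) -> Tree:
    """Combine two syntactic objects at the root (toy, unlabelled Merge)."""
    return (a, b)

def constituents(t: Tree) -> set:
    """Return t together with every subtree it contains."""
    found = {t}
    if isinstance(t, tuple):
        for child in t:
            found |= constituents(child)
    return found

def respects_ec(inputs: list, output: Tree) -> bool:
    """EC as monotonicity: every input survives as a constituent of the output."""
    return all(i in constituents(output) for i in inputs)

def tuck_in(host: Tree, target: Tree, new: Tree) -> Tree:
    """Counter-cyclic toy operation: merge `new` with `target` inside `host`."""
    if host == target:
        return merge(new, host)
    if isinstance(host, tuple):
        return tuple(tuck_in(child, target, new) for child in host)
    return host

vp = merge("eat", "apples")
tp = merge("will", vp)                # root merge: both inputs are preserved
counter = tuck_in(tp, vp, "often")    # rebuilds tp around a modified vp

print(respects_ec(["will", vp], tp))       # True
print(respects_ec([tp, "often"], counter)) # False: the original tp is gone
```

On this encoding, root-extending Merge satisfies the check by construction, whereas the tucking-in step fails it because the input `tp` no longer appears anywhere in the output.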

There are various empirical linguistic-theory internal reasons that have been offered for preferring one or the other of these ideas. Both apply over a pretty general domain of cases and, IMO, it is hard to argue that either is in any relevant sense “simpler” than the other. However, IMO, the world would be a better place minimalistically were the EC the right way to conceptually ground the cycle. Why? Because it has the right generic feel to it because it looks like a very generic property of computations. In other words, IMO, it would be natural to find that cognitive rules systems in general are monotonic (information preserving) so that to find this holding of Gs would not be a surprise. FC, on the other hand, strikes me as relying on quite linguistically special assumptions about the toxicity of certain linguistic features and how quickly they need to be neutralized (some examples of very linguistically special properties: only probes contain toxic features, only phase heads have toxic features, toxic features must be very quickly eliminated, Gs contain toxic features at all). Personally, I find it hard to see FC and its special conception of features generalizing to other cognitive domains or third factor considerations. Of course I could be wrong (indeed given my track record, I am likely wrong!). What’s nice about a DP perspective is that it encourages us to try and make this kind of evaluation (i.e. ask how generic/specific a proposed operation/principle is) and, if my reasoning is on the right track, it suggests that we should try to maintain EC as a core feature precisely because it is plausibly not linguistically specific (i.e. not tied to special properties of human Gs).[4]

You could rightly object that such considerations are hardly dispositive, and I would agree. But if the above form of reasoning is even moderately useful, it suggests that DP considerations can have some theoretical utility. So, DP points to certain kinds of accounts and encourages one to develop theories with a certain look. It encourages the unification of GB modules and the elimination of linguistically specific features of Gs. It does this by highlighting the tension between DP and PP. One might construe all of this in simplicity terms, but from where I sit, DP encourages a very specific kind of simple, elegant theory (i.e. it favors some simple theories over others) and for this alone it earns its theoretical keep.[5]

CA has one other objection to DP that I would like to briefly touch on: DP encourages reductionism and reductionism is not a great methodological stance. I have two comments.

First, I am not personally against reduction if you can get it. My problem with it is that it is very hard to come by, and I do not expect it to occur any day soon in my neck of the scientific woods. However, I do like unification and think that we should be encouraged to seek it out. As my good and great friend Elan Dresher once sagely noted: there are really only two kinds of linguistic papers. The first shows that two things that appear completely different are roughly the same. The second shows that two things that are roughly the same are in fact identical. I don’t know about linguistic papers in general, but this is a pretty good description of what a good many theory papers look like. Unification is the name of the game. So if by reduction we mean unification, then I am for it. Nothing wrong with it and a great deal that is right. In fact, there is nothing wrong with real reduction either, if you can pull it off. Sadly, it is very very hard to do.

Second, I think that for MP to succeed we should expect lots of reduction/unification. We will need to unify the modules (as noted above) and unify certain cognitive and computational operations. If we cannot manage this, then I would conclude that the main ideas/intuitions behind MP are untenable/unworkable/sterile. So, maybe unlike CA, I welcome the reductionist/unificationist challenge. Not only is this good science, DP is betting that it is the next important step for GG.

This said, there is something bad about reduction, and maybe this is what CA was worried about. Some reductionism takes it as obvious that the reducing science is more epistemologically privileged than the reduced one. But I see no reason for thinking that the metaphysics of the case (e.g. if A reduces to B) implies that the reducing theory is more solidly grounded epistemologically than the reduced one. So the fact that A might reduce to B does not mean that B is the better theory and that A’s results must bow to B’s theoretical diktats. Like all science, reduction/unification is a complicated affair. It is best attempted when there is a reasonable body of doctrine in the theories to be related. I suspect that CA (and I am pretty sure Karthik) thinks that this is actually the main problem with DP at this time. It is premature (and if Lewontin is right, it will remain premature for the foreseeable future) precisely because of the problems I noted above concerning the assumptions required to bridge genes and cognition. My only response to this is that I am (slightly) more optimistic. I think DP considerations, even if inchoate, have purchase, though I agree that we should proceed carefully given how little we know of the details. So, we should make haste very very slowly.

Let me end. I think that DP has raised important questions for theoretical linguistics. Like most questions, they are as yet somewhat undefined and fuzzy. Our job is to try and make them clearer and find ways of making them empirically viable. I believe that MP has succeeded in this to a degree. This said, CA (and Karthik) are right to point out the problems. Needless to say (or as I have said), I remain convinced that DPish considerations can and should play a role in how we develop theory moving forward, for if nothing else (and IMO this is enough) it helps imbue the all too vague notions of elegance and simplicity with some local linguistic content. In other words, DP clarifies what kinds of elegant and simple theories of FL and UG we should be aiming for.

[1] Importantly, CA is not anti-minimalist and believes that general criteria like elegance and simplicity (suitably contextualized for linguistics) can play a useful role. It’s DP that bothers CA, not theoretical desiderata on syntactic theories.

[2] Indeed, the more linguistically specific this knowledge is, the easier it is to explain the facility of acquisition despite the limitations in the PLD for G construction.

[3] I should be careful here: it is conceptually possible that one small genetic change creates a brain that looks GBish. Recall, we really don’t know how a fold here and there in a brain unlocks cognition. But, if we assume that the small thing that happened genetically to change our brains resulted in a simple new cognitive operation being added to our previous inventory, we are home free. This seems like a reasonable assumption, though it might be wrong. Right now, it is the simplest least convoluted assumption. So it is a good place to start.

[4] To say what need not be said: none of this implies that EC is right and FC wrong. It means that EC has more MPish value than FC does if you share my views. So, if we want to pursue an MPish line of inquiry, then EC is a very good way to go. Or, you should demand lots of good empirical reasons for rejecting it. DP, like PP, when working well, conceptually orders hypotheses making some more desirable than others all things being equal.

[5] These considerations were prompted by a very fruitful e-mail exchange with Karthik. He pointed out that notions like simplicity etc., in order to be useful, need to be crafted to apply to a domain of inquiry. The above amounts to suggesting that DP helps in isolating the right kind of simplicity considerations.

44 comments:

Unknown, May 12, 2015 at 6:29 AM
It's Durvasula, not Darvusala.
Anonymous, May 12, 2015 at 5:24 PM
I don't think reductionism is really relevant to this issue. I agree that it is vastly premature to try to reduce language to neuropsychology, let alone genetics, but I don't think evocations of DP attempt that. The point of it to me is just a nod to say, "let our theories be *congruent* with the many possible shapes that an evolutionary explanation could take."

Dissenters here seem to be saying that because we know next to nothing about the mechanisms of cognitive evolution, we cannot rule out any speculations about the speed or products of cognitive evolution. I think this is a non-sequitur. Despite us not understanding HOW cognition evolves, we have substantial examples of WHAT has evolved and WHEN, and that precedent ought to set limits on our hypotheses.

We know, for example, that there is no evidence in the 3.5bn years of life on this planet for an organism that evolved in the space of, say, 10,000 years, a cognitive ability for intuitively understanding the fundamental laws of physics. This is an extreme example but the point is this: are you going to say that we can't rule out the possibility that that COULD actually happen just because we don't know how cognition evolves in general, or does it not seem sensible to say that the precedent of life on earth makes it quite likely that there are design constraints making such an evolution impossible even though we don't fully understand what they are?

Obviously, we don't have the luxury of linguistic cognition being so extreme, even though it is unique, so we don't find ourselves at a clear boundary. We can't say with certainty that a large amount of domain-specific knowledge couldn't evolve to be innate in a short time. However, I think it's reasonable to look at the time-frame we have and the communicative abilities of our closest relatives, and then appeal to the general trends in cognitive development throughout the planet's history to propose that we should be looking for a small, non-gradual change, concluding that we need to reduce the innate content we are theorising. It may be drastically wrong but it's a basis for inquiry.

Besides, I think it's important that this gives us cause beyond Occam's Razor to avoid stuffing UG full of domain-specific knowledge because, without DP, why should we automatically evoke simplicity particularly in syntactic operations? Why mightn't we instead desire simplicity in language acquisition by throwing a little more into our bag of innate tricks? There is a complex, multi-layered give-and-take with simplicity and I think it is a falsehood to believe that there will eventually appear only ONE theory that is THE simplest theory and it just so happens that the minimalist program will take us there. There will be competing simple theories in different frameworks and DP could help us choose between them.
Confused Academic, May 12, 2015 at 6:13 PM
@Norbert: First of all, thanks for discussing the content of my post. I too think the issue is relevant to how the field progresses. I think you have represented my position very fairly, and I want to address (deconstruct?) the one particular argument for the utility of DP constraints that you discuss in the post: The extension condition (EC) vs. feature cyclicity (FC).

The fundamental argument for why you prefer EC over FC is this:
(a) “Because it has the right generic feel to it because it looks like a very generic property of computations.”

You also mention:
(b) “it would be natural to find that cognitive rules systems in general are monotonic”.

In regards to (a), we know a fair bit about the general theory of computation, and perhaps Alex Clark, Greg Kobele, Jeff Heinz, or Thomas Graf (and other regular contributors to the blog) could add to the discussion to see if EC is indeed a more reasonable extension of our current understanding of “computation”. Given we know something reasonable about the topic, of course it can serve as a decent constraint on linguistic theorizing, and is in a sense more unificationist. Crucially, this is not a constraint from DP, but from our general theory of computation.

But, (b) has no evidentiary force at all, since we know precious little about other cognitive systems to apply this argument. It boils down to an argument from ignorance. Add to this, that we know nothing about the evolution of cognition, so the expectation of monotonicity is nothing but intuitionist hope at this point (a point that Karthik raised too). Therefore, it seems like pure and unhealthy reductionism, based on intuition! IF this is allowed, we might as well allow associationist constraints, since they are also strongly in sync with many people’s intuitions. (b) to me is simply just-so-y theorizing - something we should all beware.

Ultimately, it is (a) that seems to drive your argument; the argument does not depend on how computation is carried out in natural systems, but actually on the EC being more naturally situated in some view of computation that you have. Therefore, the actual substantive force of the argument comes not from DP considerations, but from your view of what is more natural in a general theory of computation as envisioned by you. Furthermore, to the extent that the minimal (“simplest”) extension is what one means when they say “X” is more naturally situated within a view, your argument (as I pointed out in my blog post) has again appealed at some level to simplicity criteria. This much seems inevitable (again, as touched upon by KD above).

To repeat the basic argument of my own blog post, since we know nothing about the evolution of cognition and are only now slowly discovering the cognitive abilities of other species, how could it possibly put any DP constraint on our theorizing of the language faculty? The arguments I have seen inevitably boil down to simplicity criteria at some level, and don’t actually need the invocation of DP at all. This, as I see it, is a good result, since the invocation of DP in theorizing about the language faculty is reductionism in the bad sense, and not unificationism (given that, to repeat myself, we know nothing of the evolution of cognition).

Note: I initially created this id to point out the silliness of CB’s argumentation. But, more recently I have realized anonymity is a great way to focus on the argument, away from names and associated baggage, as it enforces a blind review of the content/argument at hand. Though I don’t comment much (if at all), I am indeed a regular reader of your blog, and I thank you sincerely for creating a forum where linguists of all stripes can grapple with issues that are core to the enterprise. And thanks to KD for passing on the blog post to you.
davidadger, May 17, 2015 at 11:05 AM
@Greg. I think the reason I take there to be a specification of structure that is not just input/output is that I'm impressed by cases like those discussed by Chomsky in Aspects where knowledge goes beyond use (all the standard cases of memory issues in parsing etc) and a theory that generates configurations and interprets them seems to get that right(ish) without saying much extra. So it probably boils down to that debate and hunches as to how it's best resolved. Also, it's a way of thinking that allows me to get a grip on those sociovariation questions I've been worrying about for a while. And the cyclicity of syntactic computation doesn't seem to track memory issues as usually understood. And ... well, I guess I think all of the answers to these issues fall in one direction for me, which is a generate and interface architecture, rather than an input/output with the grammar as an abstraction architecture.
davidadger, May 18, 2015 at 6:58 AM
@greg. I'm a fan of CCG (Mark was my first ever syntax teacher, when I was 18!) and that comment of Mark's is very compatible with the pretty common view that minimalist syntactic derivations are computations of semantics with spell out as a sideshow. I guess there might be different ways within CCG of building in processing difficulty into the formalism itself (e.g. in particular derivations penalising particular interactions of composition and type-raising, or whatever) while saying that if there's a well formed derivation overall, the structure is grammatical, but I'm not sure that that would actually be helpful.

Maybe it's just about different ways of configuring your abstractions, but I do think there's something to the very simple idea that our knowledge of language goes beyond our parsing capacity (centre embedding being the obvious case), and we want an architecture which that falls out of. A classical way to think of the model of use is as a number of finite transducers operating on the configurations delivered by the generative component, and those transducers are where memory limitations, etc, apply. That allows us to capture the fact that the knowledge of language stored in the generative system is greater than what is usable by the other systems, which are time and memory bounded. If we take the generative system to be an input system in the Marrian sense, it becomes a bit of a mystery as to why we see a disparity between knowledge of language and capacity to parse.

I'm not keen on the position that Avery sketched out (although I'm embarrassingly ignorant of the Peacock-David proposal which I'll go find out about), because I don't think we should duck the issue. It's probably better to be sharp about what the proposal is, so then we can at least be sure if it turns out to be wrong.
davidadger, May 18, 2015 at 2:43 PM
I also don't think it's a mystery! I said IF we think of knowledge of language as an input system, then it'd be a mystery. On the sketch I gave, you'd expect quite a tight linkup, since the systems of use are using linguistically generated configurations. The only thing I think I'm trying to say here is that the generative system that creates the structures that are interpreted is not an input system, and is best specified at something more like Marr's computational level.