Thursday, July 6, 2017

The logic of adaptation

I recently ran across a nice paper on the logic of adaptive stories (here), along with a nice short discussion of its main points (here) (by Massimo Pigliucci (P)). The Olson and Arroyo-Santos paper (OAS) argues that circularity (or “loopiness”) is characteristic of all adaptive explanations (indeed, of all non-deductive accounts) but that some forms of loopiness are virtuous while others are vicious. The goal, then, is to distinguish the good circular arguments from the bad ones, and this amounts to distinguishing small uninteresting circles from big fat wide ones. Good adaptive explanations distinguish themselves from just-so stories in having independent data afforded by the three principal kinds of arguments evolutionary biologists deploy. OAS adumbrates the forms of these arguments and uses this inventory to distinguish lousy adaptive accounts from compelling ones. Of particular interest to me (and I hope FoLers) is the OAS claim that looking at things in terms of how fat a circular/loopy account is will make it easy to see why some kinds of adaptive stories are particularly susceptible to just-soism. What kinds? Well, ones like those applied to the evolution of language, as it turns out. Put another way, OAS leads to Lewontin-like conclusions (see here) from a slightly different starting point.

An example of a just-so story helps to illustrate the logic of adaptation that OAS highlights. Why do giraffes have long necks? So as to be able to eat leaves from tall trees. Note that the fact that giraffes eat from tall trees confirms that having long necks is handy for this activity, and the utility of being able to eat from tall trees would make having a long neck advantageous. This is the loopiness/circularity that OAS insists is part of any adaptational account. OAS further insists that this circularity is not in itself a problem. The problem is that in the just-so case the circle is very small, so small as to almost shrink to a point. Why? Because the evidence for the adaptation and the fact that the adaptation explains are the same: long necks are what we want to explain and also what constitutes the evidence for the explanation. As OAS puts it:

…the presence of a given trait in current organisms is used as the sole evidence to infer heritable variation in the trait in an ancestral population and a selective regime that favored some variants over others. This unobserved selective scenario explains the presence of the observed trait, and the only evidence for the selective scenario is trait presence (168).

In other words, though ‘p implies p’ is unimpeachably true, it is not interestingly so. To get some explanation out of an account that uses these observations we need a broader circle. We need a big fat circle/loop, not an anorexic one.

OAS’s main take-home message is that fattening circles/loops is both eminently doable (in some cases at least) and regularly done. OAS lists three main kinds of arguments that biologists use to fatten up an adaptation account: comparative arguments, population arguments, and optimality arguments. Each brings something useful to the table. Each has some shortcomings. Here’s how OAS describes the comparative method (169):

The comparative method detects adaptation through convergence (Losos 2011). A basic version of comparative studies, perhaps the one underpinning most statements about adaptation, is the qualitative observation of similar organismal features in similar selective contexts.

The example OAS discusses is the streamlined body shapes and fins in animals that live in water. The observation that aquatic animals tend to be sleek and well built for moving around in water strongly suggests that there is something about the watery environment that is driving the observed sleekness.  As this example illustrates, a hallmark of the comparative method is “the use of cross-species variation” (170). The downside of this method is that it “does not examine fitness or heritability directly” and it “often relies on ancestral character state reconstructions or assumptions of tempo and mode that are impossible to test” (171, table 1).

A second kind of argument focuses on variations in a single population and sees how this affects “heritability and fitness between potentially competing individuals” (171). These kinds of studies involve looking at extant populations and seeing how their variations tie up with heritability and fitness. Again, OAS provides an extensive example involving “the curvature of floral nectar spurs” in some flowers (171) and shows how variation and fitness can be precisely measured in such circumstances (i.e. where it is possible to do studies of “very geographically restricted sets of organisms under often unusual circumstances” (172)).

This method, too, has a problem. The biggest drawback is that the population method “examines relatively minor characters that have not gone to fixation” and “extrapolation of results to multiple species and large time scales” is debatable (171, table 1). In other words, it is not that clear whether the situations in which population arguments can be fully deployed reveal the mechanisms that are at play “in generating the patterns of trait distribution observed over geological time and clades” because it is unclear whether the “very local population phenomena are…isomorphic with the factors shaping life on earth at large” (172).

The third type of argument involves optimality thinking. This aims to provide an outline of the causal mechanisms “behind a given variant being favored” and rests on a specification of the relevant laws driving the observed effect (e.g. principles of hydrodynamics for body contour/sleekness in aquatic animals). The downside to this mode of reasoning is that it is not always clear what variables are relevant for optimization.

OAS notes that adaptive explanations are best when one can provide all three kinds of reasons (as one can, for example, in the case of aquatic contour and sleekness; see figure 4 and the discussion in P). Accounts achieve just-so status when none of the three methods can apply and none have been used to generate relevant data. The OAS discussion of these points is very accessible and valuable and I urge you to take a look.

The OAS framing also carries an important moral, one that both OAS and P note: if going from just-so to serious requires fattening with comparative, population, and optimization arguments, then some fashionable domains of evolutionary speculation relying on adaptive considerations are likely to be very just-soish. Under what circumstances will getting beyond hand waving prove challenging? Here’s OAS (184, my emphasis):

Maximally supported adaptationist explanations require evidence from comparative, populational, and optimality approaches. This requirement highlights from the outset which adaptationist studies are likely to have fewer layers of direct evidence available. Studies of single species or unique structures are important examples. Such traits cannot be studied using comparative approaches, because the putatively adaptive states are unique (cf. Maddison and FitzJohn 2015). When the traits are fixed within populations, the typical tools of populational studies are unavailable. In humans, experimental methods such as surgical intervention or selective breeding are unethical (Ruse 1979). As a result, many aspects of humans continue to be debated, such as the female orgasm, human language, or rape (Travis 2003; Lloyd 2005; Nielsen 2009; MacColl 2011). To the extent that less information is available, in many cases it will continue to be hard to distinguish between different alternative explanations to decide which is the likeliest (Forber 2009).

Let’s apply these OAS observations to a favorite of FoLers, the capacity for human language. First, human language capacity is, so far as we can tell, unique to humans. And it involves at least one feature (e.g. hierarchical recursion) that, so far as we can tell, emerges nowhere else in biological cognition. Hence, this capacity cannot be studied using comparative methods. Second, it cannot be studied using population methods, as, modulo pathology, the trait appears (at least at the gross level) fixed and uniform in the human species (any kid can learn any language in more or less the same way). Experimental methods, which could in principle be used (for there probably is some variation across individuals in phenomena that might bear on the structure of the fixed capacity, e.g. differences in language proficiency and acquisition across individuals), will, if pursued, rightly land you in jail or at the World Court in The Hague. Last, optimization methods also appear useless, for it is not clear what function language is optimized for, and so the dimensions along which it might be optimized are very obscure. The obvious ones relating to efficient information transmission are too fluffy to be serious.[1]

P makes effectively the same point, but for evo-psych in general, not just evo-lang. In this he reiterates Lewontin’s earlier conclusions. Here is P:

If you ponder the above for a minute you will realize why this shift from vicious circularity to virtuous loopiness is particularly hard to come by in the case of our species, and therefore why evolutionary psychology is, in my book, a quasi-science. Most human behaviors of interest to evolutionary psychologists do not leave fossil records (i); we can estimate their heritability (ii) in only what is called the “broad” sense, but the “narrow” one would be better (see here); while it is possible to link human behaviors with fitness in a modern environment (iii), the point is often made that our ancestral environment, both physical and especially social, was radically different from the current one (which is not the case for giraffes and lots of other organisms); therefore to make inferences about adaptation (iv) is to, say the least, problematic. Evopsych has a tendency to get stuck near the vicious circularity end of Olson and Arroyo-Santos’ continuum.

There is more, much more, in the OAS paper and P's remarks are also very helpful. So those interested in evolang should take a look. The conclusion both pieces draw regarding the likely triviality/just-soness of such speculations is a timely re-re-re-reminder of Lewontin and the French academy’s earlier prescient warnings. Some questions, no matter how interesting, are likely to be beyond our power to interestingly investigate given the tools at hand.

One last point, added to annoy many of you. Chomsky’s speculations, IMO, have been suitably modest in this regard. He is not giving an evolang account so much as noting that if there is to be one then some features will not be adaptively explicable. The one that Chomsky points to is hierarchical recursion. Given the OAS discussion it should be clear that Chomsky is right in thinking that this will not be a feature amenable to adaptive explanation. What would “variation” wrt Merge be? Somewhat recursive/hierarchical? What would this be and how would the existence of 1-merge and 2-merge systems get you to unbounded Merge? It won’t, which is Chomsky’s (and Dawkins’) point (see here for discussion and references). So, there will be no variation, and no other animals have it, and it doesn’t optimize anything. So there will be no available adaptive account. And that is Chomsky’s point! The emergence of FL, whenever it occurred, was not selected for. Its emergence must be traced to other non-adaptive factors. This conclusion, so far as I can tell, fits perfectly with OAS’s excellent discussion. What Chomsky delivers is all the non-trivial evolang we are likely to get our hands on given current methods, and this is just what OAS, P and Lewontin should lead us to expect.



[1] Note that Chomsky’s conception of optimal and the one discussed by OAS are unrelated. For Chomsky, FL is not optimized for any phenotypic function. There is nothing that FL is for such that we can say that it does whatever better than something else might. For example, structure dependence has no function such that Gs that lacked it would be worse in some way than ones (like ours) that have it.

62 comments:

  1. Can we summarize your argument as 'if we assume (1) that language structure is unrelated to its function, (2) that the language faculty is monolithic and not composed of multiple mechanisms, and (3) that hierarchical recursion is an all-or-nothing phenomenon, unique to language and to humans, then studying the evolution of language is a waste of time'?

    I haven't read the Olson and Arroyo-Santos paper, but it sounds like a reasonable tutorial on what counts as evidence in evolutionary biology, a field that has had to answer the circularity charge from the very beginning. The most interesting conclusion to draw from it, it seems to me, is that work on the evolution of language must find ways to break up language into multiple components to make the comparative method applicable (cf. Fitch'10 OUP), search hard for fitness functions that may have been optimized (cf. Zuidema & de Boer,'09 JoP) and look for relevant individual variation in a population (cf., Lai et al, 2001 Nature).

    (This, incidentally, also explains why Chomsky, Berwick etc. don't have much to offer for people like me, as they do none of this work. If premises 1, 2, 3 hold, fine, there might be nothing to do there. But I am not convinced they do, and think we won't learn more unless we make a serious effort to do the work required).

    1. Agreed. In principle, we could investigate the evolutionary origins of any number of components of language, not just the construct of Merge. For instance, we could follow the lead of the developmentalists and look at phonemic acquisition, the ability to pick up sound patterns and prosody, semantic learning, the development of holophrasis or often-used structural elements, etc. Granted, language doesn't fossilize, so we might never have the empirical evidence to address the relevant questions, but that's different from ruling the investigation out of bounds ab initio. (The latter approach has a tendency to get smacked upside the head by empirical reality.)

      On top of that, we don't necessarily need to bother with unbounded Merge. Empirically, an utterance becomes incomprehensible after a certain number of merges; the precise point depends on exactly what is being Merged (both in terms of structure and semantic content). Chomsky likes to gloss this as a memory limitation, but that just renames the problem instead of solving it. In practice, 3-Merge or 4-Merge might be all we really have, in which case the logical problem of unbounded Merge would simply vanish.

    2. 'The man is watching the dog' takes 5 Merges. You really think that presses the limits of human capacity?
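
      (To spell out the count: a minimal sketch, assuming one plausible bracketing, [[the man] [is [watching [the dog]]]], and treating Merge as binary set formation; the parse and the code are illustrative only, not anyone's analysis.)

```python
# Toy count of Merge applications for "The man is watching the dog",
# assuming the bracketing [[the man] [is [watching [the dog]]]].
merge_count = 0

def merge(x, y):
    """Binary Merge: form the set {x, y} from two syntactic objects."""
    global merge_count
    merge_count += 1
    return frozenset([x, y])

subj = merge("the", "man")   # Merge 1: {the, man}
obj = merge("the", "dog")    # Merge 2: {the, dog}
vp = merge("watching", obj)  # Merge 3: {watching, {the, dog}}
pred = merge("is", vp)       # Merge 4: {is, {watching, {the, dog}}}
root = merge(subj, pred)     # Merge 5: the full clause
print(merge_count)           # -> 5
```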

      We are ok with up to about 7 sentential embeddings. The number of 20 word sentences we are ok with is astronomical. You really aren't being serious, are you?

      Of course there are possible things to study that don't involve merge. I have seen no evidence that they have been studied in interesting ways, but sure, go ahead. But enough of the promissory notes, ok? Demonstrate. Give a concrete example. Show off. Don't talk in principle talk. Do it.

      Last point, abstracting away from Merge means endorsing the distinction between FLN and FLW. As you recall, the skepticism extended to the former, not the latter. IMO, results on the latter are also piss poor, however. Time to produce, my friends. Time to produce.

    3. I was responding to the dismissal of an entire field of study, and pointed out that the argument depends on whether you accept premises 1, 2, 3. I think we can agree that the truth of these premises is debated, even if we individually have strong opinions. Given that empirical sciences are always a game of converging evidence, I think it is unwise to abort an entire field of study based on a bit of reasoning on top of a few premises that might be true, but certainly not self-evidently so.

      Whether the study of the evolution of language has already delivered convincing scenarios of subcomponents of language is another issue. But a scientific field defines its own research questions, and carves up the phenomenon it studies in ways that fit the theories and tools in the field. There is plenty of good computational and experimental work happening under the label 'evolution of language' (e.g., vocal tract reconstruction, experimental iterated learning, compositionality, combinatorial phonology, artificial grammar learning across species), but it doesn't immediately answer how linguist X's favorite feature of language evolved.

      But linguists demanding a theory of how merge, or derivational morphology, or vowel harmony or whatever evolved, is like neuroscientists demanding that linguists provide the function of the thalamus or Brodmann's area 44. That's not how progress is made. Understanding how language emerged in evolution is an incredibly difficult question, we all agree (just like really understanding how language is learned and processed). Much of the progress in any field can only be judged by using criteria from inside the field, although occasionally a field must of course make contact with neighboring fields. With the huge impact that the FoxP2 work has had in human genetics I think the evolution of language field already has at least one big success story, which is quite a lot better than many other subfields of linguistics.



    4. One judges a field by its results, not its ambitions. So, one wants to have good examples of the kinds of things that count as primo findings. I do this regularly for linguistics on FoL. You might not like the stuff, but I try to show what the findings are and why they are important and how they bear on the problems mooted. So, naturally, I would like the same thing from EvoLANG, especially when I am told that I MUST appreciate the Darwinian angle. Ok, so show me something to appreciate.

      Now, it is possible that what there is does not touch on anything of interest to linguists. This too would be worth knowing. This is not a criticism of the work, but it is a criticism (or the foundations of a criticism) for the claim that I must know the work and appreciate it to do my own. And this is what we are constantly told. So we heard endlessly that the innate mechanisms that linguists propose cannot be there because they could not have evolved. Well, this suggests that someone knows how they did or must have evolved. So I ask for details. Your reply, even if right, begs this part of the dialectic. So provide an example of SOME feature, even if not my favorite.

      Second point: The papers I cited are not by linguists. These are biologists and they suggest that given the available techniques for grounding adaptive accounts there is little reason to think that evolang will IN GENERAL rise above the level of just-soism. They argue this point. I report it. I do note that it fits well with Lewontin's (another biologist) similar point decades ago. If you disagree, I offer you space in FoL to rebut the arguments. To provide concrete examples of success that we can use to model further discussion. I cannot wait to see some as this would make further conversation about the issues far more concrete and useful. So, consider yourself invited.

      Last point: None of this bears on Merge. Unless you think that Merge is not effectively unbounded (Steven P above?) then the problem of how a recursive mechanism came into being piecemeal is a very hard one. I have still not been offered anything that looks like a proposal. Hand waving yes. I've gotten a slight cold from the draft actually. But let's see a discussion of how one gets an operation like Merge in small incremental steps. To repeat, Dawkins (another biologist) didn't see how to do it. I don't. Chomsky doesn't. If you do, enlighten us.

    5. Well, the point was that Merge obviously has *some* limits, which one could presumably study empirically. If you agree with that, as you seem to, then we're done: We don't need to explain the evolution of unbounded Merge because it doesn't exist. As a consequence, the logical problems that Dawkins and Chomsky raised don't apply.

      Moreover, attacking evolang in this way is probably not the best idea. One can try to use logic or rhetoric to claim that the evolution of unbounded Merge is impossible, but doing science still requires that you go and check to make sure you're right. As I wrote, reality has a nasty way of showing us that we're not as smart as we think we are.

      That's related to what I meant by "in principle." Chomsky and Dawkins seem to think that the gradual evolution of Merge, and thus a key part of language, is in principle impossible. I'm not willing to grant the "in principle" part, not for something Merge-ish, nor for any other part of language. That doesn't mean that they *did* gradually evolve. It just means that we can ask the question.

      "Of course there are possible things to study that dont involve merge. I have seen no evidence that they have been studies in interesting ways, but sure, go ahead."

      I cannot believe that you really mean this. Do you really consider phonemic acquisition, semantic learning, categorization, etc., etc., to be uninteresting?

      Take phonemic acquisition. We know that, at birth, infants can distinguish among a wide range of speech sounds. As they grow and listen to speech, they slowly lose the ability to distinguish between different sounds that fall within the same phoneme in their native language. So I can't really hear the distinction among the major Chinese tones or all four Hindi plosives, but I can hear the distinction between /r/ and /l/.

      This happens pretty much automatically, but there's no reason that we have to be wired this way. So why are we? You can, if you like, do some comparative analyses, testing for this effect in primates; you can study how this ability (or inability!) develops in children; you can do optimality analyses, exploring whether this optimizes or even just improves our ability to perceive language; you can look at the advantages that phonemic systems have over holistic systems. You can then look at associated brain systems and see how they work.

      I can't give you a single post that reviews all this at the necessary level of detail, and I'm not sure you'd believe it anyway. There's work by Bert de Boer, Imai and Kita, Kuhl, Jusczyk's book, etc.

    6. No, merge has no limits. It is, as an operation, unbounded. Add memory and the sky's the limit. How do we add this? A piece of paper often suffices. At any rate, we do not agree that there are limits to the operation, though there may be to its application. So, yes, we disagree. If you think that Merge really is bounded, then the logic you mention works. As I don't, it doesn't. You solve the problem by denying the premise.

      Nope, I do not consider them to be uninteresting. What have we found? So far as I can tell, relatively little that is evolutionarily interesting. It seems that this exists across species in pretty much the same way it appears in us. BTW, it seems that you can hear the differences, you just don't attend to them much. The idea that it disappears seems somewhat overstated.

      So don't give me a single post that reviews it all. Take a post and review ONE good example and go through the logic and teach us what this entails for evolang investigations. If it is all there then this should be easy, or easy enough. So do it. We would all appreciate the effort.

    7. I am not a big fan of statements like "X cannot evolve", based on intuition alone, whether X is 'unbounded recursion' or the innate mechanisms you talk about. Intuition just isn't so great a guide when it comes to complex processes, although, inevitably, we rely on intuition to decide where to focus our research efforts. My intuition has always been that 7M years is a bit short to evolve 1980s-style Universal Grammar plus the whole suite of human-specific adaptations that the evolutionary psychologists postulate. But I agree 'X cannot evolve' by itself is not much of an argument.

      I also have no problem with, and even much like, work that tries to account for grammaticality/acceptability/differences in meaning of (sets of) sentences, and/or account for similarities and differences between languages, using sophisticated models/grammars.

      What I like less, and where I do think generativists should pay attention to counter-arguments, is work that, based on such linguistic analyses, tries to draw conclusions about innateness and about the inability of learning, evolution, and neural networks to account for the linguistic data. It's not that such an effort is uninteresting per se, but much of it is so ideologically laden, it seems, that valid counterarguments are dismissed without proper consideration. (I realize that this state of affairs might be explained by the fact that many of the counterarguments thrown at Chomsky et al. over many decades have been rather unconvincing, and at some point people have just stopped paying attention. Nevertheless, I do regret that the better arguments apparently get lost in the tsunami of crappy ones).

      So this is not just about evolang, but let's take an evolang topic as an example: iterated learning. Over the last 2 decades there has been a lot of work on trying to understand what happens when a complex learned system like language (or music) is passed on from generation to generation, where each individual in each generation has to learn the whole system from a limited number of examples. This is a rich literature, with very many computer models (e.g., Hurford & Kirby), a number of mathematical results (e.g., Griffiths & Kalish), and a lot of experimental studies with adults, children, and even other species like zebra finches and baboons. It is impossible to summarize this literature in a few lines, and much of it is just interesting for its own sake, but one thing it shows is that there is a complex relationship between learning biases and the properties of the languages that emerge in a population. This is not evidence against innateness, but it does show that simple schemes that equate constraints on variation with innate knowledge are on the wrong track. (My 2003 NIPS paper makes some of this more precise, and in particular focuses on the erroneous use of Gold's Theorem to argue for innateness).

      Another, less evolang-y, example is the claim, frequently repeated on this blog, that there is no gradual path to 'unbounded' recursion, about which we had a long discussion a while ago. Here, too, I find that paying attention to elaborate counterexamples would be in order, leading not to a worked-out alternative theory of how all of this works and evolved, but at the very least to some modesty about what we can and cannot conclude about innateness, evolution and the neural basis of language.

    8. "I am not a big fan of statements like "X cannot evolve", based on intuition alone, whether X is 'unbounded recursion' or the innate mechanisms you talk about. Intuition just isn't so great a guide when it comes to complex processes, although, inevitably, we rely on intuition to decide where to focus our research efforts."

      I think you are missing the point here. The problem wrt recursion is conceptual. I don't think we even have a just-so story as to how unbounded hierarchical recursion could emerge piecemeal. But if there is no just-so story, then the problem is conceptual, not empirical. If I am right in describing the capacity as unbounded, with unbounded recursive embedding being the right mechanism, then someone opposed to the all-at-once scenario has to explain how it COULD arise in bits. The problem is getting to UNbounded from finite. Till this is done there is not much to discuss. Or, let me put this very baldly: I don't see how it is logically possible to get to infinite in small steps. This is why the only coherent option is to deny that we have an unbounded capacity. But this seems to me just wrong. So, that's the problem, and talking about intuitions is beside the point.

      " that valid counterarguments are dismissed without proper consideration." Look, FoL has spent a lot of time and effort going over some of the prime contestants here. It might not have covered them all, but it does try to go over a good many. So this is more than a bit unfair. In fact, from where I sit, the critics hardly ever go over the PoS arguments, even when challenged to do so publicly and regularly.

      "equate constraints on variation with innate knowledge are on the wrong track." Which ones? There are two kinds. Some things seem entirely absent (there is no variation at all, e.g. ECP effects). The other tries to constrain the "range" of variation with parameters. I agree that this has been less successful (and I have written as much on this topic and will be doing so again soon). So, what do you have in mind? BTW, the argument is not re innateness. Everyone believes in innateness for without innate structure there is no "learning" of any kind. The question is not IF but what kind. So, I am not sure what you are driving at here.

      As for the last paragraph, see my first one.

    9. "So this is more than a bit unfair." -- I didn't mean to be disparaging about the FoL blog. I think it is a great forum, even if I often disagree, and it helps me understand better what's worthwhile, valid or controversial in your field. But evolution of language research, neural networks, Bayesian modelling, do get a regular beating, there, and I don't recall seeing you ever change your mind, even when reasonable counterarguments are presented.

      For the case of unbounded recursion we had a long discussion recently, that we need not repeat, but it would be nice to see some acknowledgement of the fact that a continuous-valued system X can mimic a discrete-valued system Y in every way to arbitrary degree, so that you can have X pass every test for recursiveness that you find convincing when it comes to human language, and still have a gradual path to recursion. This doesn't prove anything about how human language did in fact evolve or does in fact work, but it does show the a priori, conceptual argument doesn't fly.

    10. "it would be nice to see some acknowledgement of the fact that a continuous-valued system X can mimick a discrete-valued system Y in every way to arbitrary degree, so that you can have X pass every test for recursiveness that you find convincing when it comes to human language, and still have a gradual path to recursion."

      First, I really don't see how having a continuous system that "mimics" a discrete system tells us anything about gradualism. My issue is not discrete vs continuous (though what a continuous rule relating Ss and Ms would be (with some probability?) is not clear to me) but bounded vs unbounded. We can form unboundedly many discrete pairings of Ms and Ss. If I understand your proposal correctly, what happens is that we actually have, underlyingly, a system that can approximate any discrete system. But then it can approximate any other system as well, ones that we do not find. The fact is that FL allows Gs with very distinctive properties and not others AT ALL. Look, I clearly do not get this. So I invite you to write something up that makes the conceptual argument clear. Show us how you go from systems that allow only one or two applications of Merge (that's just an example, choose any rule you want) to one that allows an arbitrary number of applications of that operation. How we go from one or two levels of embedding to arbitrary levels thereof. Lay the just-so scenario out. You know, do for unbounded hierarchical recursion what we already know how to do for white polar bears and long-necked giraffes. I would find that very helpful. I really would, for then I would know what you have in mind. When this is done, I will be happy to acknowledge that the argument I gave previously about the conceptual problem has been laid to rest. So, please do this. I grant you all the space you need to make the case. I actually am looking forward to this. Nothing better than finding out that you are wrong. Thx in advance.
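
      To make the 1-merge/2-merge contrast concrete, here is a minimal sketch (an editorial toy, not a proposal from either side) of k-bounded Merge closure. Every finite k yields a finite system; the unbounded system corresponds to deleting the bound on the loop, not to a further increment of k, which is one way of picturing where the conceptual gap lies.

```python
def merge_closure(lexicon, k):
    """All syntactic objects buildable from `lexicon` with at most k
    rounds of binary Merge (modeled here as set formation). For every
    finite k the result is a finite set."""
    objects = set(lexicon)
    for _ in range(k):
        objects |= {frozenset([x, y])
                    for x in objects for y in objects if x != y}
    return objects

lex = {"a", "b"}
for k in range(4):
    print(k, len(merge_closure(lex, k)))  # 2, 3, 5, 12: finite at every k
# An unbounded-Merge system replaces `range(k)` with an open-ended loop:
# a change in the kind of procedure, not a larger value of k.
```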

    11. Maybe if you made the a priori argument clearer that would help? E.g. what is the relevant sense of recursion? On the other post where we discussed this there was general confusion on this point.

    12. Give me something like Merge. That sense. A general operation that will yield unboundedly many hierarchically structured objects and that allows for displacement. That's what we find in FL and so that is what we want to evolve.

    13. It seems to me that any self-respecting empiricist (for want of a better term) shouldn't go anywhere near Norbert's challenge (the non-self-respecting should knock themselves out). I take the Chomsky-Dawkins-Norbert line to be trivially true. The point is simply that you can't effectively enumerate an infinite set without the function somewhere or other 'calling itself' (or presupposing the infinity at issue). Hence the analogy with counting: 3 is no closer to infinity (Aleph-0) than 2; in the number case you need some recursive rule for the counting, at some point. I understand the empiricist, therefore, as being in the business of denying that unbounded recursion is a cognitively real phenomenon at all, so they don't want a just-so story of it, no more than they want one of unicorns and their horns. What the empiricist must offer, instead, is some explanation of the illusion of unboundedness (think Hume and causation/necessity). I stopped holding my breath on this one after first reading Descartes, for no approximation will capture the general, counterfactual-supporting character of the competence (e.g., what happens with more/less memory?); only some contingent performative realisation of it would be at best explained, which leaves the general matter unexplained. I think it matters little what precise notion of recursion one has in mind - r.e. will do - although cfg is sufficiently weak to satisfy anything at issue here, I think.

    14. We want a definition of recursion, R, such that the transition from a system which is not R to a system which is R exhibits some sort of discontinuity. How can recursive enumerability be this?

      Norbert, is displacement an important part of this argument?

    15. As I understand it the argument goes something like:

      Premise: recursion (whatever that is) is an all or nothing property.

      Conclusion: recursion could not have evolved gradually.

      But in the first case we are talking about a mathematical property of an abstraction -- a competence grammar that divides sentences into grammatical and ungrammatical -- and in the second case we are talking about the human brain. It's clear we need some more steps in between, to go from elementary facts about formal language theory to some claims about the messy process of evolution.

    16. Alex: The relevant discontinuity is from finite to unbounded. The challenge is to explain how a system could transition without stipulating some 'recursive part', which for present purposes can simply be the inductive clause of the definition of the relevant function, or simply presupposing an infinite domain. So, specifying the problem in basic r.e. terms suffices, and is the way Chomsky and Dawkins present it, for the problem doesn't turn on properties peculiar to stronger conditions, or more restrictive languages. Sure, providing stronger dfs. is required for this and that purpose, and other issues thereon arise. The present point doesn't turn on any of that. I would depart from Norbert here in his appeal to movement in his last post.

    17. That's an interesting point to choose. As I understand the argument, it requires that all nonhuman animals are in the finite part of it. Is that tenable in the light of the literature on birdsong etc.? For example Bob Berwick claims that they are k-reversible regular languages (e.g. here).

      But certainly in that case the example of Jelle's that we discussed before would not work, as it uses an RNN as the initial model which would be recursive in this sense.

    18. This comment has been removed by the author.

    19. I think Alex' point that "in the first case we are talking about a mathematical property of an abstraction ... in the second case we are talking about the human brain" is key here.

      When John writes: "I understand the empiricist, therefore, as being in the business of denying that unbounded recursion is a cognitively real phenomenon ... . What the empiricist must offer, instead, is some explanation of the illusion of unboundedness." that is correct in some sense. But the word 'illusion' has unfortunate connotations. It makes a lot of sense to describe, say, center-embedding constructions with recursive rules that generalize across linguistic examples, and to acknowledge that there is no principled bound on how deep the recursion can go. All these things are very real at a linguistic level of description, even though the underlying neural implementation might not technically be unbounded. That's why I wrote "pass every test for recursiveness that you find convincing when it comes to human language". We shouldn't lose track of the fact that we're trying to understand how real language works, not whether one model can perfectly mimic another model.

      Regarding the repeated invitation to post a worked-out example: we are working on a text describing a precise model illustrating these points, which I'm happy to share when we're done. But models are always simplifications, and you can always find ways in which the model is different from the reality it models. I have no interest in playing a game with shifting goal posts. If there is a clear *operational* definition of recursion (e.g., in the often quoted simpler domains of counting or arithmetic) including clear theory-independent tests for recursiveness (not: there must be a function inside the model that calls itself), then that would make the discussion about evolvability/the gradual route to recursion much more productive.

    20. "that is correct in some sense. But the word 'illusion' has unfortunate connotations. It makes a lot of sense to describe, say, centerembedding contructions with recursive rules, that generalize across linguistic examples, and acknowledge that there is no principle bound on how deep the recursion can go. All these things are very real at a linguistic level of description, even though the underlying neural implementation might not technically be unbounded."

      Yes, John hit the nail on the head. The idea that there is a real recursive mind must be an illusion. So what you are denying is the premise of the argument. You think that there will be an adequate substitute, but color me skeptical until I see it.

      Note, btw, the problem is not that one is a mathematical abstraction while the other deals with brains. We know how to code a recursive procedure into a real live computing device. It is unbounded in that the only thing limiting its unbounded application is memory and time. Add either and the thing will keep going on in the same way that it did for the less elaborate cases. This 'in the same way' is key. This is effectively what you are denying holds for brains. But why brains present a problem of embodiment different in kind than what we get with computers is beyond me. So, the problem is not one of brains vs mathematical abstractions.

      Second, take any recursive function you want (John suggested a couple) and show how to get 'and so on' from a finite basis. If there really is no upper bound then how one gets to this from something that has an upper bound is conceptually unclear. That's the point. And that is why you must always insist that in fact we do have an upper bound and that the 'and so on' is just an extension of 'and so on up to N' where N is pushed further and further out.

      Last point. Can we all agree that N is very big? If so, what possible evo pressure will there be to go from some small N to N = some very big number? What advantage does one get by being able to embed 15 times rather than 10 or 5? I never understood this either. Though the evo point is easy to argue when we assume that N is unbounded, the general evo point holds (I believe) even if N is very big. The problem will then be not explaining how one jumps the qualitative divide, but why one keeps getting pushed out along the quantitative frontier. What's the analogue of good eats on top of trees and long necks in the domain of embeddings and gains from embedding?

      At any rate, I now think we can agree: you deny that there really is unbounded recursion and believe that it is finite, even if very big, and that there will be an adaptive account of this. You concede that IF it is unbounded you have nothing to say (my original point) but demur as to whether this is the correct description. You also agree that to date you have nothing concrete to offer, but don't like it when people conclude from this that as of this moment there is no alternative to the Chomsky-Dawkins point of view. You are selling optimism and hope. Good luck with that.

    21. What Norbert said! Either unboundedness is an illusion (there is some big N) or we know how to get there without building in the infinity, as it were, by supposing a recursive device or some denumerable domain. I can see no reason to favour either, and you can let recursion be defined however you like on the Chomsky hierarchy.

    22. Is birdsong bounded?

      Norbert, when you say a "real live computing device" do you mean a brain or an electronic computer?

    23. The real evolutionary puzzle for me is how the minds of generativists have become so binary... You're a nativist OR an empiricist. Systems are finite or unbounded. Recursion is true or an illusion. You win a discussion or you lose.

      Look, no-one has any idea how language, and the recursive structures it features, came about, and how it is implemented in the brain. You (a.o.) offer a quasi-logical argument that recursion cannot gradually emerge, and I (a.o.) have pointed out that the logical argument is very brittle -- the entire argument is based on the inability to imagine alternatives to treating the mathematical idealizations of formal grammars as cognitively real.

      I have never claimed there is a worked out model of natural language grammar based on other assumptions -- although I would much recommend anyone interested to pay attention to the beautiful syntactic and semantic parsing models of Yoav Goldberg, Mike Lewis, Richard Socher and others making headlines in NLP. I have no problem if you remain skeptical -- much more work is needed, and we need people to point out the flaws. I just wish you'd examine your own silent assumptions just as skeptically, and don't dismiss entire fields of study because of some tiny bit of logical reasoning on top of a contested premise.

    24. Well, oft-times, you get to find out about the world just by thinking about it; such is the present case. I also don't see anything brittle about the argument because it does not rely on any imaginative blinkers. It simply goes:
      P1: Linguistic competence with L is unbounded.
      P2: Unboundedness is given either by stipulation or recursion (again, define that as weak as you like),
      C1: So barring stipulation, competence is served by a recursive device
      P3: Recursion is not an incremental property - there is no more or less (again, any level on the hierarchy will do).
      C2: So, the device serving competence with L can't be incrementally acquired.

      That looks valid. Some of the Ps might be false, but it is no good just wishing they are. The conclusion of a sound argument tells you something about reality, regardless of data or imagination.
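
      Mechanically, the inference does go through once P3 is read as being directly about acquisition, which is the reading the argument needs (and, as the replies below press, the contested one). A sketch of the validity check in Lean, with that reading made explicit; the propositional rendering is an editorial gloss, not John's own formalization:

```lean
-- Validity check only: whether the premises are true is exactly
-- what the thread disputes.
theorem johns_argument
    (Unbounded Stipulated Recursive IncrAcquirable : Prop)
    (P1 : Unbounded)
    (P2 : Unbounded → Stipulated ∨ Recursive)
    -- P3 as read here: a recursive device admits no incremental acquisition
    (P3 : Recursive → ¬IncrAcquirable)
    (noStip : ¬Stipulated) : ¬IncrAcquirable := by
  cases P2 P1 with
  | inl s => exact absurd s noStip   -- stipulation was barred
  | inr r => exact P3 r              -- C1: the device is recursive
```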

    25. @Alex: Birdsong. I tend to believe it is not (and have said as much in other places). It's not a great model for language as it does not involve unbounded hierarchy, but no, I don't think it is. As for "real live computing device" I was thinking of a desktop, not a brain. I was just noting that the problem cannot be incarnation vs abstraction, as abstract things can be incarnated in real stuff in ways we understand. We don't understand how brains compute, but this is at right angles to the unboundedness question. That was the point.

      @Willem: Nope, my mind is binary but not in the way you mention. As I've said many times before, everyone is a nativist. Empiricism is just a kind of impoverished nativism where the relevant mechanisms are pretty anodyne. That's why I have never understood why anyone would think Empiricists enjoy the argumentative high ground. They cannot, as all are nativists, the only relevant question being what's innate, and on this there are a lot more than two positions. So, as regards the first charge, I plead innocent.

      As regards number two however, yes, I make a categorical distinction between bounded and not. Infinite is not like 2 or 200 or 200,000. It doesn't come after some number. There is a difference in kind. So the problem of going from finite to infinite is not like going from 2 to 10. I am sorry that you don't like this, but it is the case all the same. There are distinctions of kind and sometimes you cannot get there from here. So here I plead guilty and proud of it.
      to be continued:

    26. continuation:
      As for discussions, we both won, as I see it. WHY? Because we clearly see where we disagree. That's a very good place to be. You know why I think you are wrong and you know why you think I am missing the point. We have identified the issue: unboundedness. You should be pleased. I am. Of course we can pretend that there is no disagreement here and get into a group hug, but what good would that do? We disagree. We have identified the relevant point of disagreement. That's progress. Btw, recursion IS an illusion for you. It does not really exist. You said as much. Stick to your guns. It's better for all of us.

      The logical argument is not "brittle." That's where we disagree. Your reply is that it is possible that it emerged gradually but I have no idea how. My view is that if I described the problem accurately then this seems an impossibility. The way you show I am wrong to think this is to demonstrate how it COULD happen (not how it did, but COULD). I asked for a just-so story. Not a real one. Just so stories have the virtue of showing that a certain position is coherent and conceivable. You don't have one to give. I sympathize. Does that mean that you must agree with me? Nope. It means that you have no good counter-argument. It's logically possible that... is not an argument. It is a sentiment registering your hopes and dreams. Always nice to have them registered, but not really relevant. So, not brittle. Right now the only game in town. I can live with that. It spends more easily than promissory notes.

      You retreat to what has not been asked for: a worked out model. Again, this is not what was requested. A just-so story is not a worked out model, as the paper discussed in the post makes clear. But it is a start. And NLP models, frankly, are not really that relevant here. We are talking competence, not parsing, or translation or... NLP models incorporate Gs. These are generally recursive. They apply these Gs to do something else. Of course, actual performance is bounded. Nobody lives forever. But the capacity might not be. That's what is at issue. So, this just throws dust in our eyes. A useful rhetorical maneuver but not really to the point. Again, I would stick to the illusion story were I in your place. At least it is relevant and makes sense.

      Oh yes: did you consider the adaptive problem of going from some small finite N to some very big N? This is not a logical problem, but one asking for the pressures that would push us along this continuum. I don't see a likely candidate. It is cousin to the unboundedness concern.

      Ah, silent assumptions: one thing I tend not to be is silent. I try to examine my assumptions by organizing them and making them available so that others can poke holes in them and the arguments they support. I try to defend them and (occasionally) admit when I have nothing to reply with. That's useful. I don't aim for consensus as we are not engaged in politics but in intellectual inquiry. And unlike you, I tend to really value "tiny bits of logical reasoning." You see, I believe that if something does not make logical sense then that is one reason for thinking that it is wrong. Or at the very least highlighting a flaw in how we conceive things to be. Nor do I worry about contested premises. I state mine, you state yours, we argue. My job is not to convince you. My interest is in finding out what's what. Premises and logic are useful tools in this project. If this means that whole domains of inquiry are devalued as they do not stand up to scrutiny, well so much the better. It means that thinking works. I am happy to change my mind, but I want reasons to do so. I understand that this smacks of hubris, but I am old and will risk it.

      I will leave you the last word should you want it because I think I've exhausted our conversation. Thx.

    27. So here is an example that may illustrate where Jelle and I have some anxieties about the argument. I'll take the boundary between regular and cfgs as the discontinuity in question.

      Consider two discrete formal languages.

      L is the language of even length strings over {a,b}.

      M is that subset of L where the string has equal numbers of a's and b's in any order.

      L is regular, M is context-free but not regular, and they are both infinite.

      Pick some family of distributions, D, perhaps generated by an Elman style RNN, parameterised by a value theta in [0,1], where D(theta = 0) has support equal to L and D(theta = 1) has support equal to M, such that D(theta) is everywhere continuous and smooth, say.

      So here we have a smoothly varying family of distributions where at one end D does not have this all or nothing property, and at the other end it does. This is not to deny that at a computational level, D actually is computing these two different things. It's just that at a lower implementation level we can have things smoothly varying.
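
      A minimal runnable rendering of the idea (with a simple exponential reweighting standing in for the Elman RNN; the simplification is editorial, not Alex's actual construction): each string's probability is a polynomial, hence smooth, function of theta, while the support is all of L for every theta < 1 and collapses to exactly M at theta = 1.

```python
from itertools import product

def weight(s, theta):
    """Unnormalized probability of string s under D(theta): even-length
    strings only, downweighted by their a/b imbalance as theta grows."""
    if len(s) % 2 != 0:
        return 0.0  # outside L
    imbalance = abs(s.count("a") - s.count("b"))
    return (1.0 - theta) ** imbalance  # note 0 ** 0 == 1

def distribution(n, theta):
    """D(theta) restricted to strings of length exactly n."""
    strings = ["".join(p) for p in product("ab", repeat=n)]
    ws = [weight(s, theta) for s in strings]
    z = sum(ws)
    return {s: w / z for s, w in zip(strings, ws)}

for theta in (0.0, 0.5, 0.9, 1.0):
    d = distribution(4, theta)
    print(theta, sum(1 for p in d.values() if p > 0))
# -> 16, 16, 16, 6: the weights move smoothly with theta, but the
#    all-or-nothing support property flips only at theta = 1.
```

      (Note where the jump lives: every probability varies continuously in theta, yet the support, the property at issue, changes only at the endpoint, which is just the discontinuity the following comments press on.)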

    28. Thanks Alex. I think I follow the case. First off, note that the kind of case you offer is distinct from the finite/infinite discontinuity - but no matter. Here is a general thought. All competence theories are computational ones (in Marr's sense, more or less), not theories of implementation or even algorithm. Indeed, again following Marr, you can have correct competence theories in the absence of any implementation story. Moreover, any implementation that is available does not explain away or reduce the computational explanations, which might hold under radically different implementations. So, in your example, the underlying smooth distribution does not belie the sharp discontinuity at the computational level, which captures the competence facts. I think the basic point hereabouts was made by Fodor & Pylyshyn. F&P's objection to networks was not that their approximate categorisations and smooth transitions disallowed them from implementing a competence (in our terms); the problem was that the systems, in not being rigid, failed to capture the relevant generalisations or support the right counterfactuals precisely because they failed to make the sharp, in-principle distinctions that characterise the competence; they can only mimic the competence to some degree, much as getting the right answer for the wrong reasons is a mimic, until one changes the conditions (decrease memory) and the wrong answer comes out. The competence theory, in this sense, presents rigid constraints on the class of possible implementations of the relevant processes (vision, language, whatever). To turn to your case, then, if the difference between L and M is a real phenomenon for some class of systems, then the computational level will explain it, not some particular smooth distribution that might realise the transition, but upon which the distinction does not depend. Otherwise put, that distribution of the RNN counts as an implementation of the transition because it belongs to a class of systems that can be specified as computing the relevant functions for L and M, not vice versa.

    29. Sorry, I didn't really follow that. The example was meant to illustrate the incremental acquisition of a property which is not incremental. Does it work? In which case the implication P3 -> C2 in your argument is invalid.

      I could come up with a more elaborate example if needed; but we would need a tighter definition of recursion which seems not to be forthcoming.

    30. Sorry, I meant to add...from the evo perspective, then, if one takes the smooth distribution to model an evo process, one is still left with the discontinuity unexplained, for we only have a computational grasp of the two ends, as it were, not a transition.

    31. Yes indeed. Particularly in this artificial example where the transition is unmotivated.
      Consider the following dumb argument: evenness and oddness are not incremental properties so you cannot move smoothly from 1 to 2. Well of course you can, and at the points in between, like 1.2345, the concepts of oddness and evenness don't really apply. In an analogous way some transitional states of the brain may defy a simple categorisation as generating a well-defined set of strings.

    32. In reply: I don't see that at all. You have two computationally distinct systems, we agree, and no transitional systems between them. What you do have in the scenario you depict is a transition between systems/states that realise the computational states, but still no incremental computational steps. I don't deny that shit happens, as it were. My argument was meant to hold at the computational level. I also think, which I've said a few times, that the issue of sharpening the df. of recursion is a complete red herring. The argument I presented doesn't stand or fall on some or other df. of recursion, for it trades on the finite/infinite distinction - I said nothing about going from reg. to cfg.

    33. Sorry - the sequence of my messages is getting mixed up.

      Right, you can transition between 1 and 2 via some equally well behaved notions of partial sums, limits, or what have you. I suppose I don't understand what the transitional states are supposed to be in the present case. We agree that computationally there aren't any; so, computationally speaking, these steps aren't increments, but great leaps forward.

    34. John, the problem is I think that I don't really understand the argument you make:

      "P2: Unboundedness is given either by stipulation or recursion (again, define that as weak as you like),
      C1: So barring stipulation, competence is served by a recursive device
      P3: Recursion is not an incremental property - there is no more or less (again, any level on the hierarchy will do).
      C2: So, the device serving competence with L can't be incrementally acquired."

      I don't really understand how P2 can be true independently of how we define recursion. Because iteration for example is a way of generating infinity and so on.
      I also don't understand what "unbounded" means in a technical sense. Obviously the number of bananas a monkey can eat is unbounded in some sense, but bounded in others (by the size of the stomach etc.) and birdsong is unbounded as Norbert and Bob Berwick agree.

      I also don't see how C2 is meant to follow from the preceding propositions, in the light of the example I sketched.

    35. Unbounded = countably infinite (although it wouldn't matter if the construal were non-denumerable).
      Recursion - but I don't see why it matters if one is dealing with iteration or a narrower realm on the hierarchy, for there is still a difference between finite and unbounded iteration. That said, I did mention above that we could settle on cfg as the relevant notion without bothering about what the right notion will turn out to be.
      Birdsong - I don't know about the birds, but I don't see the relevance. Assume that some birds have an unbounded competence. Fine. If the argument I offered is OK, then the same problem will arise for the birds as for us. It is not as if the birds are our near ancestors.
      Validity - the argument remains valid because no incremental steps have been specified. C2 speaks of incremental acquisition, not mere acquisition.

    36. But I did specify an infinite number of incremental steps; take theta = 0.5?

      This has the feel of a Zeno paradox.

    37. I hope it's not Zeno:)

      Sure, you have the incremental steps, but I thought we agreed that these steps don't give you a computational transition - the intermediate states are computationally unspecified. It is not like moving from 1 to 2 via partial sums, say (that's what Zeno was confused about, inter alia). So, to return to Norbert's way of putting the challenge, you don't have a just-so story here, but more the claim that categorical differences can be realised by smooth transitions. I don't want to deny that that is interesting - the full significance of it might be beyond my present ken - but only to insist, as you yourself put it, that the transition takes place at a different level from the level at which the computational distinction holds, and it is at this level that the question of acquisition is pitched, in terms of antecedent states to the unbounded gizmo we presently have.

      I mean, just think about it concretely. Imagine we have some evolving population. At t0 no unbounded competence (let's say), at tn such a competence. I can think, like a good Darwinian, that the transition is smooth, but if, God-like, I were to look at each organism, at all intermediate times, I wouldn't find one with half of Merge. That is nonsense. I would come across a single organism - Eve Merge - who has the competence, unlike her conspecifics, but the transition over the population between t0 and tn would be smooth. I think Dawkins thinks of it in such terms.
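
      A crude way to picture the Eve Merge point (a hedged sketch - the coin-flip model and the numbers are made up purely for illustration):

        import random

        def population(freq, size=1000):
            # Each organism either has Merge (True) or lacks it (False) - no halves.
            return [random.random() < freq for _ in range(size)]

        # The population-level frequency rises smoothly from t0 to tn...
        for t, freq in enumerate([0.0, 0.001, 0.01, 0.1, 0.5, 0.9, 1.0]):
            print(t, sum(population(freq)) / 1000)

      The trait is all-or-nothing per organism, while its frequency in the population changes as smoothly as you like: smooth at the population level, categorical at the individual level.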

    38. Well you wouldn't find half of Merge, but you would find some neural architecture that does the relevant computation imperfectly.

      So in the example I gave, one can determine whether a string is in the language M or not by keeping track of the current difference between a's and b's (equivalently, by having a deterministic pushdown automaton with two stack symbols, etc.), and reducing or increasing the count as each a or b is read. This is something that can be done approximately.
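
      A minimal sketch of that counting idea (assuming, purely for illustration, that M is the set of strings with equal numbers of a's and b's - the thread doesn't restate M's definition here):

        def in_M_exact(s):
            # Exact membership: track the running difference between a's and b's.
            count = 0
            for ch in s:
                count += 1 if ch == 'a' else -1
            return count == 0

        def in_M_approx(s, cap=8):
            # The same computation done imperfectly: the counter saturates at
            # +/-cap, as a resource-limited architecture might.
            count = 0
            for ch in s:
                count += 1 if ch == 'a' else -1
                count = max(-cap, min(cap, count))
            return count == 0

      The two functions agree on every string whose running difference stays within the cap, and come apart only beyond it - one concrete sense in which the computation "can be done approximately" and tightened by degrees.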

      So the idea that there would be, in evolution, some distinct moment when there is an exact MERGE computation seems an unmotivated stipulation.

    39. OK, I think I see the idea better, but three things...
      (1) Remember, we are here talking about competence, not performance. Performance, let's suppose, may improve so as to approximate the formally specified device, but we are after the underlying device (computationally speaking). In this light, it would seem as if there is some point where the system stops being one that approximates and starts deciding. I think this takes us back to the issue discussed above: whether what's on offer is a story of how to transition to Merge (unboundedness, anyway) or a story of how it is really an illusion. The idea that all the systems are just approximations of the formal devices is an illusion story. Why? Because it ignores the counterfactuals of what the system could do without the restrictions on time, memory, etc.
      (2) I'm unclear how this example would translate to one concerning the move from the finite to the unbounded, which is the only case the argument turns on. There the issue isn't about approximating membership, but about a rule feature - whether one has variables or a loop, say, in the system.
      (3) You would still have Eve Merge, even if her performance were the same as some conspecific's. Different counterfactuals would hold for her insofar as she had unbounded Merge, whereas her conspecifics didn't.

    40. On point 1): I am generally a fan of the competence/performance distinction as a modeling assumption, but you need to keep an eye on these assumptions and not accept them dogmatically. So why should there be some point where "the system stops being one that approximates and starts deciding"?

      There just aren't any (nomologically valid) counterfactuals under which a human brain can enumerate an infinite set, whether of grammatical sentences or natural numbers. That doesn't deny the existence of a competence grammar; it is just to recognize the existence of performance limitations, even if they fluctuate from time to time.

      2) I think that one could extend this to the finite/unbounded case without much difficulty, but that is only tangentially related to the evolution of human language so let's drop it, unless there is more general interest in this question.

    41. Sure, no actual system could enumerate an infinite set - that's not what I meant. I meant that for any finite bound, there would be some counterfactual situation that would belie such a bound. That's why one wants unboundedness from the system - any bound would be arbitrary and so fail to reflect the capacity of the system. Thus, I think unboundedness is central, as it reflects the nature of the system in abstraction from contingent factors, which will always impose an uninteresting limit on performance.

    42. That's why one wants unboundedness from the system - any bound would be arbitrary and so fail to reflect the capacity of the system.
      That's actually not obvious. In the TAG community, for example, it has been noted that TAGs can only do limited scrambling and that this bound coincides with what the standard view would regard as the performance cut-off point. So a priori it is conceivable that all the supposed performance gaps are competence gaps and we just don't have the right competence view yet.

      One thing I also find surprising is that you and Norbert adopt an E-language view in this case: the discontinuity arises if we look at the set of expressions that can be generated. But as generativists we should look at I-language, i.e. the generating device itself. It's much less clear that there is a lot of discontinuity at this level, which I take to be Alex's point.

      Take a finite-state automaton: the split between finite and infinite corresponds to whether you allow cycles. Whether cycles are trivial or not depends on the encoding, so one could regard that as a separate evolutionary step. And one could regard it as an incremental step because you move from the ability to connect states in a restricted way to connecting them in a more general way.
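
      A toy illustration of how small the change is at the device level (hypothetical automata, just for concreteness):

        # State -> {symbol: next state}; acceptance in state 2.
        ACYCLIC = {0: {'a': 1}, 1: {'b': 2}, 2: {}}          # accepts only "ab" - finite
        CYCLIC  = {0: {'a': 1}, 1: {'b': 2}, 2: {'a': 1}}    # accepts "ab", "abab", ... - infinite

        def accepts(delta, s, start=0, finals=(2,)):
            # A standard deterministic run over the transition table.
            state = start
            for ch in s:
                if ch not in delta[state]:
                    return False
                state = delta[state][ch]
            return state in finals

      The two tables differ by a single entry, yet one language is finite and the other infinite; that one back-edge is the cycle doing all the work.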

      On a much more general note, I think the unboundedness discussion is a red herring. All the things linguists want (self-embedding, tree structures) also emerge with finite languages that, like natural languages, are very large and display a lot of internal regularity.

    43. "so one could regard that as a separate evolutionary step".
      Yes, one could. The point, though, is that it will not be taken piecemeal. Look, nobody is saying, I don't think, that what we think of as Unbounded Hierarchy (my interest) may not itself be compounded of several different operations (in fact, that is MY view). What we (I) am saying is that there is no coherent story that takes us piecemeal from finite versions of this to unbounded ones. You either can do it ad libitum or you cannot. The fact that you can embed once, or twice, does not get you to being able to embed as much as you want. There is no logical relation between the finite capacity and the unbounded one. And I still don't see how "the more general way" will be general enough unless it is "as many times as you want."

      Another way of saying this: I can see what pushes one from "do it once" to "do it twice" to "do it thrice". What pushes you from "do it some finite N times" to "do it as much as you please"? So if this is the correct description - that the competence really is unbounded - then it did not get there from JUST generalizing over finite instances.

    44. (1) I didn't mean to suggest that ANY bound is illegitimate or arbitrary. Local finite bounds do not affect the general point. What would affect it is if the TAG phenomenon you mention held for all operations/relations. No-one thinks that, I take it.
      (2) E/I pertain to functions in extension/intension - the distinction has nothing to do with implementable algorithms or physical devices. Everything I have said relates to functions in intension.
      (3) I think your thought about a move from restriction to greater freedom is an interesting one, but note that the relevant sense of freedom here is absolute, as it were, so it does mark a radical shift, which I mentioned above - it is not that adding a loop or a variable to a system is a big change in itself, but it is a novel component that allows for something very different.
      (4) Yes, the unbounded issue does not matter too much to anything one wants to say about language. The same holds for most of maths, though - how many numbers do you need to do all the arithmetic anyone would ever want to do? The general principles give you an abundance that is largely irrelevant. So what? That doesn't cast doubt on the superabundance, for as soon as one goes to restrict, one is faced with unprincipled limits, more complicated rules, etc., save for some exceptions, perhaps, as with your TAG case.

    45. Still not sure I get the argument because I don't understand the difference between a piecemeal story and one with a radical shift when we're talking about language. For E-language that is easy: the jump from a finite set to an infinite one is not an incremental one. So far so simple. But when we look at the generating device, it is much less clear why a change that takes you from a finite to an infinite extension must qualify as a radical change.

      The only reason to assign a cycle special status is its effect on the output language; from the perspective of the generating device it is unremarkable. It does not change what kind of state transitions are allowed (you don't need loops to get a cycle), it does not alter the memory structure or how recognition works. It is entirely innocuous.

      As for what kind of evolutionary pressure would push you towards unboundedness: succinctness/grammar compactness. Past a certain size, finite languages can be given much more compact descriptions by generalizing them to infinite ones. So we can again imagine a sequence of adaptive steps [Merge 1; Merge 2; ...; Merge n; unbounded Merge], which in terms of grammar size would look something like [x; 3x; ...; x^n; x^n]. Those are fantasy numbers, of course, but one can look at this as a gradual development where no step is particularly privileged. So how do we decide what we have to look at, and what piecemeal vs. radical means for this object/property?
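
      To make the succinctness point concrete, here is a back-of-the-envelope comparison (the symbol counts are as much fantasy numbers as the ones above, and a^k b^k is just a stand-in language):

        def finite_grammar_size(n):
            # Listing S -> a^k b^k separately for k = 1..n costs about 2k+1
            # symbols per rule, so the description grows quadratically with n.
            return sum(2 * k + 1 for k in range(1, n + 1))

        def recursive_grammar_size():
            # S -> a S b | a b covers every k with a fixed 7-symbol description.
            return 7

        for n in (5, 50, 500):
            print(n, finite_grammar_size(n), recursive_grammar_size())

      Past a very small n, the generalized (infinite) description wins, which is the kind of pressure the adaptive sequence above would be tracking.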

      That said, I have the feeling that I don't fully understand what kind of position you are arguing against, so what I'm saying may actually be pretty much in line with your thinking anyway. It's been a long discussion, so it's hard to extract the central point of contention.

    46. I think there are some other arguments about the emergence of recursion; one is how it is learned from a finite set of examples, the other is why languages have evolved/developed to have those properties.

      Those are, I think, independent of the question we are looking at, which is whether discontinuities at the computational level entail an evolutionary discontinuity.

      I think I will state my view and then stop, as I am not sure we will make any progress. Human linguistic abilities are very finite and bounded, but those bounds fluctuate and are arbitrary, so it is essential, I think, to have some sort of competence/performance distinction where we combine a less bounded competence model with some performance bounds to get a principled explanation. The competence grammar here would correspond to some real regularities in the neural architecture of the language-processing parts of the brain, though they might not be declaratively represented with little arrows carved on the neurons. In intermediate cases it may be that we can equally well describe the system as an infinite competence plus some bounds that make it finite, or as a finite competence with some weaker bounds. (I gave an example for the regular/cfg case, which is different and maybe less contentious.) This sort of case seems to imply that the Norbert/Dawkins/Chomsky-style argument that there has to be a Big Bang emergence of Merge is not sound. I could do a similar example for finite/infinite if you want, but I'd like to have some clearer definitions.

    47. I find that useful. As I see it, there are five possible positions:
      (1) The finite/infinite distinction is not real - let's agree this is out.
      (2) It is real, but only as an ideal we can approach approximately - this is the Alexian position I've been contesting.
      (3) It is real at the level of the device, but you need just some innocuous change to bring it about, so it is not evo-problematic - Graffianism.
      (4) Incrementalism - you need a bunch of minor changes, each offering some selective advantage.
      (5) Just like Graffianism, except to say, 'Right, you need some new mechanism, whatever it is, to get you to the unbounded', and it requires no gradualist selective rationale - Chomsky/Dawkins/me/Norbert(?).

    48. My last was in response to TG.

    49. Alex: Thanks. Yes, let's recoup :) Just to say, I think from a semantic perspective the unboundedness is clear - if I don't recognise the unboundedness of the basic connectives, then I lack an understanding of them; ditto for various operators, verbs, etc. Or so it seems to me - a clear performance/competence distinction in the realm of semantics is a no-brainer.

  2. Serious question, stemming from "One judges a field by its results, not its ambitions.":
    what are the important results of linguistics from the past, say, 20 years?

    1. I take it that you believe that earlier work is not up for grabs? So you grant that the GBish world was chock full of results? Given this, I think that I have tried to address some of this in recent posts on why I think MP has been so wildly successful. The unification the Merge Hypothesis has forged is theoretically wonderful, IMO. The demonstrations that properties like c-command, reconstruction, and structure dependence follow from a simple conception of Merge are all pretty good. I am personally also moved by the unification of parts of the theory of construal with the Merge system (others are less impressed, but IMO, they are wrong (no accounting for taste, huh?)). Empirically, we have discovered inverse control phenomena, discovered interesting evidence for kinds of "movement" we did not heretofore entertain (sidewards), and linked scope and case/agreement effects. In addition, there is the excellent work on deletion as island obviation and some theoretical attempts to try to understand why this might hold.

      There are many who are skeptical of the last 20 years of Minimalist work. I am not one of them. As I say, I've posted on this extensively. I assume that you are less impressed. We will see in 100 years who was right.

    2. I'll suggest two things.

      1) Kwiatkowski, Steedman et al.'s (2012) demonstration that at least somewhat decent combinatory categorial grammars can be learned from glossed child-accessible data. This demonstrates the general feasibility of Chomsky's broad program as developed in the 50's and 60's: that some specifically defined metagrammar (format for writing grammars) could specify the 'projection function' that takes language learners from the limited data they encounter to the far more inclusive system that is the language as a whole. Important proviso: CCG is not a fully typologically adequate theory, afaik, so there's plenty more work to be done. Another observation: 60 years ago, C didn't think that any kind of discovery procedure was even possible, but quite a lot of time has passed; our expectations of a 10 year old are different from those of a 60 year old. Yet another observation: the CCG learner involves both corpora and (Bayesian) statistics, which C tends to disparage, but nobody's perfect (and Bayes is not the only option).

      2) The rather rapidly developing case for concentric, mostly binary, hierarchical structure in Noun Phrases. E.g. the discussion of NP-internal agreement discrepancies in Russian and Lebanese Arabic in David Pesetsky's 2013 book, and Velegrakis' 2011 UCL thesis on NP structure in Greek, with the possible readings of all six orders of 'the gold the fake the watch' apparently predicted in accord with binary branching (I would, however, like to see some large-scale empirical validation of this result).

      I think the latter is especially interesting, if it holds up to further empirical scrutiny, because these interpretive facts are subtle, not at all plausible to be directly learnable from the data, and are coming exactly as predicted by a binary syntactic composition mechanism.

      We can also easily imagine a world where the above idea is false: it is the world of Jackendoff's 1977 version of X-bar theory, plus something like LFG's glue semantics, wherein NP structure is flat, and every reordering of the Greek NP would be 2 ways ambiguous.

      Qualification: unlike Minimalists (most of them, 'officially' at any rate), I don't think binary branching is all there is; there is also some kind of iterative mechanism, and maybe more. But in spite of the qualifications, there definitely seems to be a binary bias lurking in the area, which is surely relevant to the biology of language.

  3. Actually I am not at all skeptical of the minimalist program when compared to earlier work. I asked the question with biology of language in mind (i.e., what happened *after* the inception of the MP that informs biological investigations of the language faculty).

    1. Oh sorry. It depends what you intend by 'biological' investigations. I think the main change is still (sadly) largely promissory. So here is where I think we have made some progress. I believe that the distinction between FLN and FLW is useful. I don't believe that there is likely to be much in the way of evolving insight into the question of how Merge (or some analogue) arose. It is a one-shot deal, and if we can make it "simple" enough, then by exercising the phenotypic gambit (the supposition that a simple phenotype means a simple genotype) we have a story that allows Merge and its consequences to emerge. The hard part is finding the NON-linguistic features of FL. Here we don't have much, IMO. So, can we show that minimality, say, is really just an instance of something more generic? I hope we can and that it is not linguistically special, but... So, I think there are projects out there, but we have not made much practical headway. That said, I think that if there is progress to be made it will start by making the FLW vs FLN distinction in some way and go after the FLW stuff. So IF we can get a story of the cognitive and computational non-linguistic resources (a good account of FLW resources), then the project of figuring out how these combine with Merge becomes more than promissory. But we are not there yet. Does this help?

    2. Thanks for the reply. However, it does make one wonder about how to judge the field "by its results, not its ambitions." This would make me think that it cannot be judged favorably wrt biology of language (i.e. the main tenet of the field)

    3. Ah, I seem to have misunderstood your question. I thought you were talking about Evolang. There I believe our steps forward have been very modest, though like Tattersall (see his review of Berwick and Chomsky in the NYRB) I think that the Merge conjecture is a major step forward and, right now, the only serious game in town. If Evolang is "biology" then we have not learned much. But of course it isn't. We have learned a lot about the computational features of FL, and given that FL is a part of human biology, we have learned a lot about that too. But this won't satisfy you. Why not? I have no idea. Imagine in place of humans we considered, well, bees or birds. And imagine that I have a theory of all the possible bee dances, how they are generated, the limits of what they can do, and that I described their various dialects and how they are fixed... Imagine I did that. Would that be biology? Maybe there would be a Nobel prize up for grabs? So why bees but not Italians? Why the structure of birdsong but not Hungarian? Because only the former is biology and the latter linguistics? Really.

      But we, of course, have learned yet more. I am a big fan of recent work by Dehaene, Poeppel, Pylkkänen, Marantz, Hickok, Friederici, etc. They have started finding excellent brain evidence for the kind of hierarchy we care about. Do we know how the brain does this? Nope, but then we don't know how the brain does much (think bees again). And we have found this out in the last 20 years.

      Does this touch cutting-edge syntactic theory very closely? Yes and no. Yes, in that recursive hierarchy is the central feature of G if MP is right. No, in that virtually every theory of G has structures like this. But, and this is important, it is language facts that are driving the desire to find the mechanisms of hierarchy in the brain (as Dehaene and Poeppel make clear).

      Anything else? Sure: we have brain indices for all sorts of ling phenomena. We have psycho models of all sorts for processing and acquisition.

      But you know all of this. I know you do. So the question was not intended to generate this response but to suggest something else. Ok, what? IF the idea is that MP per se has had little influence on what we "call" bio work, this seems to me false as a matter of biography: certainly Dehaene and Friederici have dressed up their work in MP clothing. Did they have to? Nope. But they did. But the more fundamental question is whether the absence of cutting-edge ling in neuro or cognition means that the theory is idle. Well, if that is the indicator, then much of physics is not physical, much of genetics is not biological, and much of behavioral biology is not biology.

      So, the question, as you know, was not innocent. But I don't mind having had the opportunity to answer it in my way.

    4. One more point: I agree that we know relatively little about the neurobiological details or the genetic details of language. I think that this is so for two reasons. First, we are not allowed to do to humans what biologists regularly do to bugs, birds and mammals. I suspect that if ALL biologists could do to other critters was take pix of their brains, then they would be roughly where we are now.

      But there is an additional problem, one that MP, I think, has had some hand in fixing. Until recently, biologists have not worried about the mechanics behind unbounded hierarchy. They did not recognize this as a "thing." But if you don't think that this is a basic cognitive operation, then you won't look for it. MP has had a hand in putting this on the agenda, and so we now have a new kosher question: describe the neural circuits that implement this. You can't look for something that is out of mind. MP has made Merge and unbounded hierarchy THE THING, and I suspect that this is why people like Dehaene are now looking for IT. It must be part of the repertoire of neuro operations, so the brain must have a circuit that executes it, and finding it is now a research topic. If this is so, then MP has had a positive effect on this one branch of biology.

    5. Just to be clear: my question was not an attack on MP. I find virtues in it that previous frameworks lacked.

      So my question was not an innocent one, not because I felt like taking MP down, but because I very rarely find new biological insights from linguistics that go beyond ambition, and I was taken aback by the claim that fields should only be judged by their results (which is something I do agree with, but which left me wondering how linguistics would fare). I was looking for an honest response, which I think I got.

      I of course have no problem with studying the biology and (concomitantly) the evolution of language. It's my favorite topic, and as valid a topic as honey bee behavior. The topic itself is not the problem. I do have a problem with automatically labeling the study of something as biological just because a species needs biology to do it (by the same token, an even wider range of topics falls under physics). So it's not the topic, but how it is studied, or with what in mind. I've seen this "if bees are biology, why isn't language?" argument many times (some of them when I least expected it). That argument seems to hold on to the presupposition that whoever disagrees with the sudden, all-or-nothing, saltationist, merge-centric story of language emergence must have a problem with the idea that studying language can be biology. But no: they just think that story will not lead to results, and that it is misguided. They think that biology has much more to offer and contribute than that story. And they think that turning the impossibility of finding out how merge arose into a good thing (or the best we have) is a pretty disappointing move, after which there seems to be no way to proceed (only for those who make that move, fortunately). It misses the pluralism that honey bee Nobel prize winner Tinbergen advocated (his 1963 paper is in the same category as Turing 1950: many people like to talk about it, but few actually care about what's in there).

      Work on language evolution that doesn't speak to merge can be (and effectively is) interesting. Ok, you could say "we don't know what FL is, but it is at least merge", but that's misleading. It can be at least merge but it isn't only merge. Merge alone won't get you FL.

      The FLN vs FLB (your "FLW") distinction is not really the work of linguistics, but in any case, any degree of usefulness it might have has not been made apparent by anyone (that I know of). If one does adopt the distinction, however, here we disagree on which side of the dichotomy has had more substantial content put into it. FLB is vast; FLN by definition is not (it could be empty for all I know, rendering the distinction vacuous). FLN is pretty, elegant, something Erdős would say belongs in THE BOOK; FLB is messy, ugly, hard to exhaust. Maybe this deters many people from being interested in the FLB side, but such is life.

    6. Thx for this. I think I agree with most everything you've said. I have some hope that FLB work can get somewhere and I buy the idea that FLN is Merge PLUS A BUNCH OF OTHER GENERAL STUFF. It would be great if we could get our hands on what this other stuff is. So what are the mechanics of feature checking in other domains? Are there analogues of locality and what are they? So, I am on board for this and expect this to be a domain where progress is possible, and whose results will be of linguistic interest. Why? Well, though I think that Merge is basic I think that we cannot deduce the GB universals from Merge ALONE. We need locality and other conditions. So I would love to know what these are because I take MP to aim to deduce more or less all of GB universals based on Merge and general cog/computational principles. That's the game, so I need to know something about these other principles.

      Last point: insight into FLN is non-negligible, IMO. But one contribution it will make is, hopefully, focusing attention on the non-recursive aspects of language that we also need (I agree here). The problem with lots of work on language is that it is on "language." Linguists know that this is too big a category. It's like saying that biologists work on "life." Useless. We need more articulation. Bracket hierarchical recursion and what remains? Let's get specific and see if we can explain that. So thx.

  4. I suggest that the focus on Merge is partially misplaced, and that before Merge there was Narrative, that is, an unbounded interface to episodic and layout memory (closely related if not identical facilities). The capacity to tell travel stories about where you went and what you did along the way might be quite useful to often-migratory cooperative breeders using multiple food sources, and supporting it does not require complex phrasal syntax. E.g.

    go that-way
    stop-at river
    go-toward mountain
    stop-at pool
    many fish
    watch-for crocodiles

    Given a certain amount of elaboration of some such facility, some kind of phrase structure would be a useful upgrade, for example by supporting extensive description of particular places and entities in a more compact and less ambiguous manner than an unstructured sequence of very short utterances, allowing the speaker to convey arbitrarily extensive identifying information about them. And the existence of center embedding in natural productions indicates that the memory support for this facility worked more like Algol than like classic (pre-1977) Fortran (for some unknown reason).

    So the most fundamental facility would not be Merge, but rather something that might best be modelled as an iterative mechanism for traversing a network of entities and relationships, and producing a performance from which they can be at least partially reconstructed. I think Tom Roeper is sort of trying to get at this in his concept of 'direct recursion' in his acquisition-of-recursion work, but is struggling to hammer it into the conventional Minimalist framework, where it doesn't quite fit, because the properties of iteration are substantially different from those of embedding - in particular, far less numerically restricted, as Fred Karlsson goes on about at some length in his paper in the 2010 van der Hulst recursion book. But it would appear that both a Merge-like and an iterative mechanism are operative within sentence-level syntax.
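
    A minimal sketch of such an iterative traversal (the route data and the two-slot utterance format are assumptions for illustration, not anything fixed by the comment):

      # Each step is an independent relation-entity pair; narration is pure
      # iteration over the route, with no embedding and no hierarchy.
      ROUTE = [
          ("go", "that-way"),
          ("stop-at", "river"),
          ("go-toward", "mountain"),
          ("stop-at", "pool"),
          ("many", "fish"),
          ("watch-for", "crocodiles"),
      ]

      def narrate(steps):
          # Flat production: emit one short utterance per step.
          for relation, entity in steps:
              yield relation + " " + entity

      for utterance in narrate(ROUTE):
          print(utterance)

    The point of the sketch is that a listener could partially reconstruct the underlying network from the linear performance, with no phrase-level embedding involved.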
