Friday, January 10, 2014

More on recursion

Jeff Watumull, Marc Hauser, Ian Roberts and yours truly, Norbert (WHRH) have a paper out in Frontiers in Psychology (here) that discusses the uses and abuses of the notion of recursion as deployed in current cog sci "debates" about recursion in grammar. Yes, those are scare quotes. WHRH argues that there can be no real debate because critics of the generative program (you all know who the prominent ones are, right?) use 'recursion' in, ahem, a non-standard way. There are several confusions dissected and, hopefully (I know, a vain hope), put to rest. It really is a waste of time endlessly discussing these terminological issues. But Augean stables do need cleaning from time to time, so here is a recent effort.


  1. I think I have a problem with the discussion of unboundedness on the right side of p3; I'd be happier if it presented the recursive function as a *useful model* of the brain, so that the unbounded performance was not attributed to the brain as a physical object, but to that model of its functioning. Many people would strongly question Chomsky's claims about what paper and pencil can do for complex center embeddings; indeed I find Fred Karlsson's examples from early modern German to be essentially unintelligible, even with the aid of his nice indentation formatting. I think the modelling presentation sidesteps this kind of problem; a model that includes the possibility of center embedding can explain many things, adding soft-ish limits explains more, but the structure of the (hopefully most generally illuminating) model as recursion+limits remains.

    More speculatively, another issue is the focus on recursion: what about co-recursion, an apparently widely used tool in computer science for describing processes that in principle go on forever? I think we idealize conversations that way (in many cultures, it's very hard to shut them down).

    1. I had a look at the German examples after Benjamin provided the link to the paper [thanks for that!!] and I have to say they are not unintelligible at all to a native speaker. Not your everyday conversation kind of German, that's true, but for anyone who has encountered Amtsdeutsch these are "small fish". There is BTW interesting psycholinguistic work showing that Germans have an easier time with center embeddings than speakers of English - presumably because we actually use them [of course usage based accounts are not very popular here].

    2. Cool! A center-embedding processing Olympiad between Greeks and Germans would be interesting too, because Modern Greeks also use some fairly complex ones (or at least did so until fairly recently) in poetic style. Greek bureaucratese on the other hand seems fairly rigidly tail-recursive.

  2. Beautiful paper! I couldn't help getting through it before going to sleep. Avery, my impression is that the (un)boundedness issue is explained reasonably. I like the idea of suprasentential discourse. Good night.

  3. I have a number of problems/questions with this paper starting maybe with the fact that it doesn't engage with the recent literature; like, say, Marcus Tomalin's 2007 paper on recursion in Lingua, or the 2010 book by Harry van der Hulst. Tomalin's paper in particular does an excellent job of teasing apart the different senses of recursion as they have been used. I don't think you can read this paper and blithely claim that there is a "standard" use.

    1. Even more relevant is a more recent paper also by Tomalin called "Syntactic Structures and Recursive Devices: A Legacy of Imprecision", in JoLLI in 2011.
      I quote from the abstract:
      "Specifically, it is shown that there were profound ambiguities surrounding the notion of recursion in the 1950s, and that this was partly due to the fact that influential texts such as Syntactic Structures neglected to define what exactly constituted a recursive device. As a result, uncertainties concerning the role of recursion in linguistic theory have prevailed until the present day, and some of the most common misunderstandings that have appeared in recent discussions are examined at some length. This article shows that debates about such topics are frequently undermined by fundamental misunderstandings concerning core terminology, and the full extent of the prevailing haziness is revealed."

      and from the body of the text:
      "As will be shown in detail, the notion of ‘recursion’ was fundamentally ambiguous when it began to be used by linguists in the 1950s, and (as the subtitle of this article implies), these ambiguities have persisted to the present day. This unfortunate (and needless) perpetuation of imprecision has had a deleterious impact upon recent discussions of the role of recursion in linguistic theory."..."Regrettably, the informal discussion of the subject contained in the Hauser, Chomsky, and Fitch paper makes it impossible to determine which one of the above interpretations (if any) is assumed."

    2. Although the WHRH paper could (and maybe should) have cited Tomalin's 2007 and 2011 papers, it is clearly dealing with the issues raised by Tomalin, which is to say that it is attempting to do what Tomalin says linguists should do - be precise about what recursion is and what role it plays in language.

    3. Tomalin stresses the importance of distinguishing Turing's notion of recursion from Godel's, since they are intensionally completely different.
      But WHRH start off by saying "This thesis [i.e. the Hauser Chomsky Fitch paper] was based on the standard mathematical definition of recursion as understood by Gödel and Turing".
      So they make exactly the mistake that Tomalin warns against.
      Moreover they contradict themselves, since what Godel and Turing have in common is recursion qua computability. So from the abstract, what recursion means is computability. But in the body they switch to a completely different notion, namely Godel's general recursion (after confusing it a bit by introducing primitive recursion, a different notion again).
      So frankly, the paper is confused about which meaning of recursion it uses.
      Perhaps Norbert could explain?

      The problem, as Soare (1996) -- in his well-known paper "Computability and Recursion" -- points out, is that Turing Machines, Post systems, Church calculi, and Godelian general recursive functions are all intensionally different processes. A Turing machine is NOT "recursive" in the Godelian sense. So WHRH correctly in my view say that it is important to pay attention to the intension and not just the extension of the functions -- I agree with this -- but then confuse 4 or 5 intensionally different systems and treat them as the same.
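To make the primitive vs. general recursion distinction mentioned above concrete, here is a standard illustration (my own sketch, not from WHRH or Tomalin): Ackermann's function is the textbook example of a function that is computable and general recursive but provably not primitive recursive, and it can nonetheless be computed with a single WHILE loop and an explicit stack, with no function ever calling itself.

```python
def ack(m, n):
    """Ackermann's function, defined by (general) recursion.
    Grows too fast to be primitive recursive."""
    if m == 0:
        return n + 1
    if n == 0:
        return ack(m - 1, 1)
    return ack(m - 1, ack(m, n - 1))

def ack_iter(m, n):
    """The same function with one WHILE loop and an explicit stack:
    no self-calling function anywhere, same extension."""
    stack = [m]
    while stack:
        m = stack.pop()
        if m == 0:
            n += 1
        elif n == 0:
            stack.append(m - 1)
            n = 1
        else:
            stack.append(m - 1)
            stack.append(m)
            n -= 1
    return n
```

Both versions compute the same function, which is exactly the extension/intension point at issue: the self-calling definition and the loop-plus-stack definition are intensionally different procedures for one and the same extension.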

    4. Thanks for elaborating, Alex. This isn't my area of expertise, and I gather I've missed some of the subtleties that are central to distinguishing different notions of recursion. I'll take a look at the Soare paper.

    5. (In what follows, all quotations are Godel’s exact words; I am flipping through my copy of Godel’s Collected Works, and can provide precise references if requested).

    6. It is no error to equate Turing computability with Godel recursiveness. Godel was explicit on this point: “A formal system can simply be defined to be any mechanical procedure for producing formulas, called provable formulas[...]. Turing’s work gives an analysis of the concept of ‘mechanical procedure’ (alias ‘algorithm’ or ‘computation procedure’ or ‘finite combinatorial procedure’). This concept is shown to be equivalent with that of a ‘Turing machine.’” It was important to Godel that the notion of formal system be defined so that his incompleteness results could be generalized: “That my [incompleteness] results were valid for all possible formal systems began to be plausible for me[.] But I was completely convinced only by Turing’s paper.” This clearly holds for the primitive recursive functions: “[primitive] recursive functions have the important property that, for each given set of values of the arguments, the value of the function can be computed by a finite procedure.” And even prior to Turing, Godel saw that “the converse seems to be true if, besides [primitive] recursions [...] recursions of other forms (e.g., with respect to two variables simultaneously) are admitted [i.e., general recursions].” However, pre-Turing, Godel thought that “[t]his cannot be proved, since the notion of finite computation is not defined, but it serves as a heuristic principle.” But Turing proved the true generality of Godel recursiveness. As Godel observed: “The greatest improvement was made possible through the precise definition of the concept of finite procedure, which plays a decisive role in these results [on the nature of formal systems]. There are several different ways of arriving at such a definition, which, however, all lead to exactly the same concept. 
The most satisfactory way, in my opinion, is that of reducing the concept of finite procedure to that of a machine with a finite number of parts, as has been done by the British mathematician Turing.” Elsewhere Godel wrote: “In consequence of [...] the fact that due to A.M. Turing’s work a precise and unquestionably adequate definition of the general notion of formal system can now be given, a completely general version of Theorems VI and XI [of the incompleteness proofs] is now possible.” And Godel did not see formal systems and Turing machines as simply extensionally equivalent: a formal system is as constructive as a proof: “We require that the rules of inference, and the definitions of meaningful formulas and axioms, be constructive; that is, for each rule of inference there shall be a finite procedure for determining whether a given formula B is an immediate consequence (by that rule) of given formulas A1, ..., An[.] This requirement for the rules and axioms is equivalent to the requirement that it should be possible to build a finite machine, in the precise sense of a ‘Turing machine,’ which will write down all the consequences of the axioms one after the other.” This equivalence of formal systems with Turing machines established an absoluteness: “It may be shown that a function which is computable in one of the systems Si or even in a system of transfinite type, is already computable in S1. 
Thus, the concept ‘computable’ is in a certain definite sense ‘absolute,’ while practically all other familiar metamathematical concepts depend quite essentially on the system with respect to which they are defined.” Godel saw this as “a kind of miracle that”, in this equivalence of computability and recursiveness, “one has for the first time succeeded in giving an absolute definition of an interesting epistemological notion, i.e., one not depending on the formalism chosen.” Emil Post went further: The success of proving these equivalences raises Turing-computability/Godel-recursiveness “not so much to a definition or to an axiom but to a natural law” (Post 1936: 105). As a natural law, computability/recursiveness applies to any computational system, including a generative grammar.

    7. Yes, some of those quotes and some broader historical context are in the Soare 1996 paper I mentioned earlier. Of course you are right that all Turing-complete systems from SKI combinators to non-deterministic Turing machines are equivalent in one sense: namely, that they all define the same set of functions, the computable functions.

      But they are all intensionally different, no? P=NP for example is about the difference between deterministic and nondeterministic Turing machines.

    8. The ambiguities pointed out by Tomalin seem to persist into the TG textbook literature, so that Akmajian and Heny (1975) p 45 seem to characterize/define recursion as the capacity to produce an infinite number of outputs, implementing this in PSRs with the possibility of self-similar embedding, while Culicover (1976) p 27, 180 seems to define it as the possibility of self-similar embedding. I haven't yet found useful discussion in earlier sources that generative grammar students would have looked at, such as Aspects and Integrated Theory of Linguistic Descriptions. Emmon Bach's (1974) _Syntactic Theory_ has the best 'for ordinary working linguistics students' discussion that I've encountered so far from that era, not talking about 'recursion' as such, except in the context of recursive functions, but talking about 'recursive elements', symbols in PSG that participate in self-embedding, and distinguishing languages that have essential (central) self-embedding from those with initial and final self-embedding, which can be redescribed with FSMs.
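Bach's central-vs-peripheral distinction can be made concrete with a toy sketch (the "N ... that V N" string shapes are my own invention, not from Bach or Karlsson): final (tail) self-embedding stays inside a regular language, so a finite-state pattern handles it, whereas central self-embedding of the N^k V^k shape needs unbounded memory, here the minimal case of a single counter.

```python
import re

# Final/tail embedding, e.g. "the rat [that bit the cat [that saw the dog]]"
# schematized as N (that V N)*: a regular language, so a finite-state
# pattern (regex) suffices.
tail = re.compile(r'N( that V N)*')

def accepts_central(s):
    """Central embedding schematized as N^k V^k, e.g.
    "the rat [the cat [the dog chased] bit] died".
    Not regular: one unbounded counter is the minimal extra memory."""
    count, seen_v = 0, False
    for sym in s.split():
        if sym == 'N':
            if seen_v:           # an N after the Vs breaks the N^k V^k shape
                return False
            count += 1
        elif sym == 'V':
            seen_v = True
            count -= 1
            if count < 0:        # more Vs than Ns so far
                return False
        else:
            return False
    return count == 0 and seen_v
```

The design point: the tail pattern never needs to remember how many embeddings it has seen, while the central recognizer must count, which is precisely why the latter cannot be redescribed with an FSM.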

    9. Thanks for the interesting history lesson, Avery. But keep in mind WHRH was written to provide clarification: "There are several confusions dissected and, hopefully (I know a vain hope), put to rest". Judging by the comments [most by generativists] it would seem the hope is in vain indeed - but not because the critics of minimalism are the kinds of morons Norbert wishes everyone to believe...

  4. Secondly, like Avery I don't understand what "unbounded" means here. Consider the constant function f(n) = 0. Is this unbounded?

    1. Sure, f(n) = 0 (for all n) is unbounded. If one doesn’t know it in a given case, one might in principle (presumably not in practice) keep putting in new n’s and wait for a reward (that would never come). If you know the function you’ve got the entire unbounded set {(0, 0), (1, 0), (2, 0), ...}.

  5. The paragraph at the top of the second column of page 4 seems to contain some of the most important claims in the article (at least in relation to the Pirahã controversy), but it moves too quickly for me to understand it. In particular, there is a claim that some function or other is not tail recursive, but it's not clear what this function is or how we know that it isn't tail recursive. On top of this, the definition of tail recursion in footnote 2 is a bit sketchy and difficult to relate to the standard notion from computer science. The footnote then claims that “tail recursion is still recursion, so no claim remains that recursion is absent from Pirahã”. I take this to mean that the authors could concede that the relevant function was tail recursive without undermining their core argument. But given the noted equivalence between iteration and tail recursion, how would we then know that the function was defined recursively rather than iteratively?
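To make the iteration/tail-recursion equivalence behind this question concrete, here is a sketch (a hypothetical factorial example of my own, not the function WHRH discuss): a tail-recursive definition, where the recursive call is the last thing done, and its mechanical rewrite as a loop.

```python
def fact_tail(n, acc=1):
    """Tail-recursive: the recursive call is in tail position,
    so no stack of pending multiplications builds up."""
    if n == 0:
        return acc
    return fact_tail(n - 1, acc * n)

def fact_loop(n):
    """The standard rewrite of the tail call as a WHILE loop
    (what a tail-call-optimizing compiler would produce)."""
    acc = 1
    while n != 0:
        n, acc = n - 1, acc * n
    return acc
```

The two compute exactly the same function, which is why no extensional evidence can settle whether a tail-recursive or an iterative definition is the one "really there" in a mind or machine.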

    1. Following up on Alex's question: in what sense is iteration supposed to be equivalent to only tail recursion? I'm not aware that functions defined iteratively are any less expressive than those defined recursively (and what exactly are we talking about here anyway, mathematical functions or actual algorithms, i.e. functions as defined in some programming language?).
      This might just be a(nother) terminological point but it'd be great if this could be cleared up.

    2. perhaps by "iterative" y'all meant "can be written with a single loop"? alex d can tell me if this is too strong or too weak for the two to be equivalent in extension (i.e. functions they compute). otherwise you would have to start talking about the transparency of the mapping between the two algorithms (i.e. intensions).

    3. What kind of loop? WHILE-loops suffice to implement everything in an iterative (rather than recursive) fashion, FOR-loops don't. In any case, there's nothing magic about a function calling itself...
      I'm suspecting I'm thinking at the wrong level of description, but I have no idea how to draw a "recursion" / "iteration" distinction at the purely mathematical level.

    4. In response to Ewan, I share Benjamin's questions. I'm not aware that mathematicians define any such thing as the "tail-recursive functions" (but then this could easily be ignorance on my part). If we're thinking in more programming-languagey terms then most imperative programming languages would remain Turing complete if you banned all forms of recursion, so any “recursive” function taken in extension could be defined without any function calling itself. And the lambda calculus (which isn't imperative) does not permit recursion in the sense of a function calling itself but is nonetheless Turing complete.
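The lambda-calculus point above can be demonstrated directly: a fixed-point combinator recovers recursion even though no function ever refers to itself by name. A sketch in Python's lambda fragment (my illustration, using the Z combinator, the strict-evaluation variant of the Y combinator):

```python
# Z combinator: a fixed-point combinator that works under strict
# (call-by-value) evaluation. No lambda below mentions itself by name.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# 'step' receives "the rest of the recursion" as the argument rec,
# rather than calling itself.
step = lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1)

factorial = Z(step)
```

So "a function calling itself" is not a necessary ingredient: self-application of anonymous terms yields the same extension, which is why the lambda calculus is Turing complete despite banning named self-reference.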

    5. Just a more specific response to Ewan's question. Even the restriction to a single loop doesn't do much (assuming the loop isn't bounded). That's enough to simulate an FSA with two stacks.

    6. hmm. yes. well, I mean, we do know what they were thinking of, obviously, the only case in a normal person's life where the notion "tail call" arises, namely "tail call optimization." god forbid I try to be so charitable I start doing their work for them, but there is most definitely a set of X's such that X is tail-call-optimizable and a set of X's such that X is not. I have some formal object that gives the intension of a function and there is some mechanism for recursion. I construct a mapping to another formal device that can be seen as the intension of a function (let's say a while loop). sometimes there is a way of doing this which is __ and sometimes there is not. this is perfectly well defined, the question is just whether it is doing anything useful outside of compilers (and of course what __ is, as I guess presumably there does exist some not-so-nice way of doing it anyhow).

    7. There's certainly a fact of the matter as to whether any given function call in any given program is a tail call or not, yes. And if you are considering functions intensionally this property might be interesting (e.g. for compiler optimizations, as you point out). The thing is that extensionally speaking the restriction of recursion to tail calls does nothing to restrict the class of definable functions[*]. For this reason it's difficult to see how we could know that some function within the human mind is defined using tail recursion, since one could get the same results by defining it using a loop.

      [*] Conceivably, it might be possible to define a language such that this restriction did actually restrict the class of definable programs, but this would not be the case in any normal programming language.

    8. With loops and arrays you can simulate central recursion - presumably if you are forbidden from using arrays, there will be a difference in what you can do in languages that have 'classic programmers recursion' (LISP, Algol) and languages that don't (Fortran, COBOL).
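The loops-plus-arrays point can be sketched directly (a made-up nested-list example of mine, not tied to any language named in the thread): a centrally recursive traversal, and the same traversal done Fortran-style with one loop and a list playing the role of the array-based stack.

```python
def flatten_rec(tree):
    """'Classic programmer's recursion': the function calls itself
    on each nested sublist."""
    out = []
    for item in tree:
        if isinstance(item, list):
            out.extend(flatten_rec(item))
        else:
            out.append(item)
    return out

def flatten_loop(tree):
    """No recursion at all: one WHILE loop plus an explicit stack,
    the 'array' doing the bookkeeping the call stack did above."""
    out, stack = [], [iter(tree)]
    while stack:
        try:
            item = next(stack[-1])
        except StopIteration:
            stack.pop()          # finished this level of embedding
            continue
        if isinstance(item, list):
            stack.append(iter(item))   # descend into the sublist
        else:
            out.append(item)
    return out
```

Take away the explicit stack (the array) and the loop version can no longer track arbitrarily deep central embedding, which is the difference being pointed at.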

      Fred Karlsson's paper in the van der Hulst book has what seem to me to be good arguments for treating iteration and recursion aka self-similar embedding as distinct and both existent in NL syntax.

    9. Thanks for the pointer (the paper can be accessed here: )

      I'm still sceptical as to whether there really is a meaningful distinction between "mere" iteration and "full-blown" recursion. As Alex D pointed out in the previous comment, when it comes to programming languages there is no difference in expressiveness between WHILE-loops and recursive function calls (whether or not limited to tail-calls, no difference there either). So I'm somewhat puzzled by Karlsson's Table 1 on p. 6, which states that "full-blown recursion [is] not convertible to iteration".

      As for distinguishing co-ordination, apposition and the like, and bringing levels of embedding up, this seems to boil down to the question of what kind of machine is needed to account for syntax, whether (memory-less) finite state suffices or whether a push-down automaton or even something more powerful is needed. But neither "recursion" nor "iteration" seems like a very helpful notion in which to couch this discussion.

      Then again, I might just not understand the relevant notions correctly, so clarification is highly appreciated.

    10. It's worth remembering that at the bottom, all real computers are, like Turing machines, iterative devices, and recursion is implemented using iteration and a stack.

      In practice, there are computations which are much easier to do recursively -- the example I used to use when I taught computer science was to print out all the n factorial permutations of a list. But I don't know of any attempts to characterise such computations.
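The permutation example can be sketched (my reconstruction of the standard recursive solution, not the actual teaching code referred to above): peel off each element in turn and recurse on what remains, a shape that is awkward to express without recursion or an explicit stack.

```python
def perms(xs):
    """All n! orderings of xs: for each element, prepend it to every
    permutation of the remaining elements."""
    if len(xs) <= 1:
        return [list(xs)]
    result = []
    for i, x in enumerate(xs):
        for rest in perms(xs[:i] + xs[i + 1:]):
            result.append([x] + rest)
    return result
```

The recursive statement mirrors the mathematical definition almost word for word; an iterative version is possible, of course, but only by manually managing the partial-permutation state that the call stack handles for free here.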

    11. @Ben: I think Karlsson is implicitly assuming in effect that the programming language doesn't have arrays, so that you can't build and manipulate a pushdown stack using an array and an ordinary variable as a stack pointer. So for example you couldn't program a parser for a centrally recursive language in Basic without the DIM statements. (since the subroutine equivalents in that language aren't recursive either).

      The distinction is linguistically murky in that notations such as the Kleene star have been floating around in PSR notation for a very long time (I think perhaps from the Peters and Lakoff paper about NP coordination in the late 60s, but I haven't yet found my copy of the anthology it's in), without any real, well worked out consensus on when it should be used as opposed to self-similar embedding.

      The non-distinguishing of iteration and recursion increases the bafflement expressed by Alex, in that dogs are perfectly capable of iterated behavior such as barking until they are let in, but it would presumably undermine a lot of recent claims about recursion if ordinary dogs were proposed to be doing it in their everyday lives.

    12. @Alex D: Right, all I was saying was a slightly turned around version of what you say, namely, that for cases which are extensionally indistinguishable with respect to tail recursion versus - well, I guess it would have to be a while loop, no arrays, yes? - in those cases there is still an intensional matter of fact. I think that's also what they're saying in the bottom of fn 2, i.e., if it turns out that the mechanism has a stack or central recursion, then it does, regardless of whether phenomenon y in language x makes demonstrable use of it.

      I am still quite baffled by the surrounding text, as it does not say anything so straightforward and sensible: "outputs are recursed (carried forward on tape) as inputs to strongly generate structured expressions; thus the process is not a form of iteration (equivalently tail recursion) as claimed". Is the logic "structured expressions => ! iteration"? And how on earth do they _equate_ iteration and tail recursion given that the whole point of the last sentence of footnote 2 is that they are intensionally NOT equivalent, even when they are extensionally indistinguishable?! Pardon my punctuation.

    13. As for "if it turns out that the mechanism has a stack or central recursion, then it does, regardless of whether phenomenon y in language x makes demonstrable use of it", I think I agree that that's sensible (minus the "central recursion" bit, what does that add? ultimately, everything bottoms out in "iteration and a stack", as Alex C put it very succinctly).
      But now we really are left with a truism that doesn't even make use of "recursion" once it's put in more precise terms (well, nothing wrong with stating truisms every once in a while).

      (Minor comment: If you kick out "arrays" for the loop, you also have to appropriately constrain tail calls, otherwise the extensional equivalence does not hold (sorry, my computer-science lingo is very rusty, I'm sure one of the Alexes can elaborate this properly) --- you need to ensure that you don't just push everything into the arguments of your tail call. But what exactly is the point of this, again?)

      I'm skeptical about "intensional facts of the matter" here, though, for to repeat:
      "It's worth remembering that at the bottom, all real computers are, like Turing machines, iterative devices, and recursion is implemented using iteration and a stack." (Alex Clark)

    14. - as for tail/central, to be clear, it's not that there's no distinction to be made, it's that it doesn't distinguish at the level of "iteration and a stack", but at a finer level (something like with or without the stack) ...

      - ... however, I agree that there's no demonstrable point. for the first reason you're alluding to, which is that there is no obvious reason to think that there are distinct cognitive mechanisms that map to recursion and iteration in the narrow range of cases where you would wonder about converting between the two ...

      - ... and for the second, which is that neither the set of strings NOR the set of structures generated seems to bear on the intensional question at hand!

      - the fact that there is an implementation level at which the distinction is moot doesn't logically preclude there being evidence for differences in mechanism at some higher level of description of what's going on. thus, the fact that I bottom out to a von neumann machine doesn't change the fact that the code sitting on my hard disk is in lisp.

      - this, in turn, does not mean that I have any idea WHAT the intensional question at hand is in this case!

    15. Talking about the brain it bottoms out in some sort of highly parallel neural architecture that we understand poorly if at all. It seems that one can make a coherent and contentful (albeit wildly speculative) claim that, say, at a suitable level of description, human cognition consists of two systems, one which is iterative, ancient and homologous to systems in non human animals, and one which is evolutionarily more recent, species specific, and which is recursive (in the defined by induction sense). That might even be right. But this is the view, if I understand it correctly, that WHRH are arguing is a misinterpretation of HCF. This is the view that PJ argue against for example.

  6. I'm confused about a number of claims in the paper. Here are a few.

    From the Evolution subsection: "(i) Computability requires proof of a procedure: a conditional branching program that generates new and complex representations by combining and manipulating symbols, as in human language; this productive process is to be contrasted with the retrieval of representations from a lookup table (finite and innately specified or memorized), as in the calls of non-human primates."
    Are the authors committed to thinking that finite sets (and their characteristic functions) are not computable (and so not recursive)? Because if so, it's pretty misleading to claim that their definition of 'recursion' or 'computation' matches Turing's (or Godel's).

    From (Un)Boundedness:
    "First, a recursive function may generate an infinite set yet only produce a finite output because of arbitrary constraints."
    Perhaps this is nit-picky, but I don't know what it would be for a function to produce anything, unless we're using a non-standard definition of functions. Functions are abstract objects. And while generation perhaps makes a little more sense, it's not obvious that it does. At any rate, it can't just be the range (at least in the standard sense of range), as the authors seem to suggest elsewhere. Take the example they use: "For instance, Turing formulated a machine (a recursive function) for the effective calculability of π. The decimal expansion of π is infinite (i.e., 3.14…) such that the machine requires infinite steps and infinite memory to generate its value as would a machine for generating an infinite set of syntactic structures". The function computed by this machine (the machine is not itself a function, but a tuple) is presumably one from integers i to the ith digit of pi. Its range, then, if we're using decimal notation, is just {'0','1','2','3','4','5','6','7','8','9', [and perhaps '.']}, which is, I'm guessing, not what the authors have in mind. Perhaps, though, we could get what the authors seem to be after by saying what's generated is a relation (a set of ordered pairs).
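The range point can be made concrete with a sketch (using 1/7 in place of pi so the arithmetic stays exact; the choice of 1/7 is mine, not the authors'): the digit function has an infinite domain and an infinite graph, yet its range, the set of values it takes, is a small finite set of digits, which is presumably not the "infinite output" intended.

```python
def digit(i):
    """i-th digit after the decimal point of 1/7 = 0.142857142857...
    Defined for every natural number i: infinite domain, infinite graph."""
    return (10 ** (i + 1) // 7) % 10

# The range -- the set of values the function takes -- is nonetheless finite.
values_seen = {digit(i) for i in range(100)}
```

So if "generates an infinite set" is to come out true, the infinite set has to be the graph (the set of ordered pairs), not the range, exactly the repair suggested above.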

    Finally, I don't understand the roles inductive definitions and mathematical induction are supposed to play. Inductive definition, apparently, is supposed to get us to strong as opposed to weak generation (and so, somehow, making formal language theory irrelevant? Then why are we talking about Turing, Godel, and Post at all?!). But it's not clear to me how this is supposed to work. For one thing, a single function can be given different definitions (and some might not even be inductive). And mathematical induction is supposed to get us from bounded to unbounded. I don't know what this means. Is mathematical induction supposed to be a property? What is it a property of? How is it related to the activity that mathematicians engage in? What are the bounded and unbounded things we're moving between?

    I suspect that a good deal of my confusion is merely terminological (but of course, this paper is supposed to be clearing up terminological disputes), so hopefully it can be easily removed.

    1. Very interesting comments so far. The most common denominator seems to be confusion about claims made by WHRH. That's quite a remarkable effect for a paper that is sold as an attempt to clean the 'Augean stables' [whatever was washed away by WHRH it wasn't confusion].

      I'd like to add two points to the growing list of issues requiring clarification:

      1. For decades now Chomsky and those following him as closely as Norbert have told us the only worthy object of linguistic study is I-language and have issued explicit warnings against work like WHRH because it “reinforce[s] inappropriate analogies to the formal sciences” (Chomsky, 1986, 29). Now given that [i] WHRH is teeming with analogies to the formal sciences [in fact I found very little relating specifically to I-language, less to biology] and that [ii] presumably none of the authors think these analogies are inappropriate it would seem that Chomsky [1986] was either incorrect all along or is at least in 2014. Is that so?

      2. One of the big selling features for Chomsky's LAD "program" was that [allegedly] it explains why [i] kids acquire language 'in a snap' [effortless, instantaneous, uniform across the species, etc. etc.] in spite of massive poverty of stimulus and [ii] how language acquisition differs from acquisition of other comparably complex cognitive tasks like say arithmetic [which requires instruction, is tedious, some kids have great difficulties acquiring it, some never do, etc. etc.]. At one point the difference between [i] and [ii] was explained by rich innate resources made available by LAD for [i] but not for [ii]. Now on the recent minimalist model innate resources have been radically reduced to Merge. Rather surprisingly then, in WHRH one reads:

      "We can further illustrate the I- v. E- distinction in the case of language by drawing an analogy to arithmetic. We can define I-arithmetic—represented internal to the mind/brain of an individual of the species Homo sapiens sapiens—as the function in intension that generates as its extension a set of arithmetical theorems (E-arithmetic)."
      and the illuminating footnote suggests:
      "Given that recursive procedures define linguistic and mathematical cognition, investigations into the degree of neural overlap of these domains could bear upon the FLN hypothesis".

      So if we now hypothesize 'neural overlap' between 'I-arithmetic' and 'I-language' and both provide only the recursive procedure then WHAT explains that language is a core domain while arithmetic is 'just for freaks' [to borrow Chomsky's inimitable prose]? As recently as 2009 he told us "If you want to understand an organism, you look at the core domains not the freaky things at the edges" [Chomsky, 2009, 383]. From what I can tell WHRH suggests that studying freaky things at the edges CAN illuminate the core domain [because both share neural overlap]. So was Chomsky [2009] wrong as well?

  7. I was also confused by this passage on page 4: "Ultimately, any boundedness is demonstrably arbitrary as proved by the undisputed fact that recursion is unbounded in some (i.e., most or, as we submit, all) languages: i.e., it follows from mathematical law that recursion is unlearnable and thus must be part of the species endowment (UG), and thus universal."

    I am not sure what sense of the word "recursion" is being used here; and what mathematical law is it that says that recursion can't be learned?

  8. Leaving to one side the terminological confusions, the important claim in the paper is that in HCF 2002, which WHRH endorses, recursion just means Turing computability.
    This raises two interesting issues. First, it is really hard to make sense of remarks in Fitch, Hauser and Chomsky 2005, "The evolution of the language faculty" (the rejoinder to the Pinker and Jackendoff (PJ) critique).

    E.g. "the necessity of recursion in syntax" (p. 189).
    Clearly we don't need full Turing computability for syntax; but we do perhaps need recursion in some other sense.

    "We do not define FLN as recursion by theoretical fiat (note, we say “a key component”), which would contradict the aims of our paper, but offer this as a plausible, falsifiable hypothesis worthy of empirical exploration."

    How could one possibly falsify the claim that FLN is computable?

    Moreover, though PJ clearly and explicitly take recursion to mean "defined by induction" or "functions calling themselves", FHC do not anywhere point out that they are in error. Surely if PJ got it completely wrong then in their rejoinder this would be pointed out?

    Finally, I don't see what the content of this claim is. Essentially everyone in cognitive science from Turing (1950) onwards has assumed that the human mind/brain and indeed animal mind/brains are computable. This is an upper bound on the power of the brain. What would it even mean, empirically, to say that FLN, a component, is computable? Of course it is computable as all of the components of the human brain are computable.
    How can you possibly say that Turing computability is specific to language (i.e. in FLN)? It is the most domain general and universal concept imaginable.

    So I really doubt that HCF actually meant this notion of recursion. In this context it is very interesting indeed that Hauser is a co-author of this paper, but it would be nice to have some confirmation that this was actually the sense of recursion that they meant, since the paper is so confused.

  9. "The procedure concept in ALGOL 60 was also important because it made recursion possible. Although recursion was already known, ALGOL 60 popularised the concept."

    Of course loops existed in widely used programming languages well before ALGOL, so this might illustrate the origin of what I take to be the commonest understanding of recursion among working linguists (found also, more recently, in Culicover's (2009) textbook and Carnie's intro).
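For what it's worth, the contrast can be put in a couple of lines (a toy Python sketch of mine, not from any of the papers): the "procedure calling itself" sense of recursion that ALGOL 60 made available, next to a loop of the kind older languages already had.

```python
# Recursion in the ALGOL 60 sense: a procedure that calls itself ...
def factorial_recursive(n):
    """factorial(n) defined in terms of factorial(n - 1)."""
    return 1 if n == 0 else n * factorial_recursive(n - 1)

# ... versus the same function computed by a loop, a construct available in
# widely used languages well before ALGOL.
def factorial_iterative(n):
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

# Both return 120 for n = 5.
```

The two compute the same function; only the first is "recursive" in the self-calling sense that, I suspect, most working linguists have in mind.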

  10. Another distinction that seems to be ignored in the paper is between 'recursive PS rules' as understood in the classic generative textbooks, and what might be called a 'nonrecursive hierarchy' model of phrase structure, as sometimes expounded in other approaches. Such a hierarchy might for example have levels of sentence, phrase and word, with the idea that a unit of one type is made out of units of lower types, with no phrases inside of phrases, only words (and different types at some of the levels, such as especially 'phrase').
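The contrast can be made concrete with a toy sketch (my own illustration, not from any of the papers under discussion): in a recursive rule set an NP may contain a PP which contains an NP, so phrases nest inside phrases; in a layered "nonrecursive hierarchy" rule set a phrase is built only from words, so no phrase ever contains a phrase.

```python
# Toy rule sets (mine, for illustration only).  In RECURSIVE, NP -> Det N PP
# and PP -> P NP, so an NP can end up inside an NP.  In LAYERED, phrases
# expand directly to word categories, so self-embedding is impossible.
RECURSIVE = {
    "NP": [["Det", "N"], ["Det", "N", "PP"]],
    "PP": [["P", "NP"]],
}
LAYERED = {
    "NP": [["Det", "N"], ["Det", "Adj", "N"]],
}

def expand(grammar, symbol, depth=0, limit=2):
    """Expand a category to word categories, taking the longer rule while depth allows."""
    if symbol not in grammar:
        return [symbol]                     # terminal category (a word slot)
    rule = grammar[symbol][-1] if depth < limit else grammar[symbol][0]
    out = []
    for s in rule:
        out.extend(expand(grammar, s, depth + 1, limit))
    return out

# expand(RECURSIVE, "NP") -> ['Det', 'N', 'P', 'Det', 'N']  (an NP inside a PP inside an NP)
# expand(LAYERED, "NP")   -> ['Det', 'Adj', 'N']            (no self-embedding possible)
```

The `limit` parameter is the "recursion+limits" move: the recursive grammar needs an external bound to stop, while the layered grammar bottoms out by construction.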

    This would fit the facts of Piraha as presented by Everett quite nicely, though there might be motivation for an additional level of 'clause' in between sentence and phrase, if correlative structures exist, which doesn't seem impossible from what I recall.

    Such descriptions would not be workable for languages with center embedding, and provide only messy descriptions for those with lots of edge-embedding such as English. So messy, in fact, that I abandoned my attempt to produce one for English NPs. There is a Bayesian angle to this: without the possibility of self-similar embedding in phrase structure, the extensionally adequate grammar has innumerable small variants that fail to occur in real life (due to the repetitions), whereas, with it, these typologically rare or nonoccurring forms of grammar are longer than the actual ones and so will be disfavored by the prior, aka the evaluation metric.

  11. Link to an early source of Kleene star in PS rules:

    Got it by googling on "lakoff peters symmetric predicates"; it's in the Reibel and Schane reader, which some old folks might have lying around somewhere.

    This refers to Stan Peters' 1966 dissertation.

  12. A comment in two parts:

    I have read some of the WHRH paper, and have puzzled over a number of points that are suggested. The trouble spots come early on in the paper, and it seems to me that there are some confusions here (though I know this paper is trying hard to overcome *other* confusions in the past). Perhaps my comments will be helpful to show how one reader interpreted what is there.

    Here are the two early quotations that I found difficult to get over:

    1. "The function is defined in intension: it specifies the conditions some object would need to satisfy to be subsumed in the set, and it is possible that no such objects are presented. This notion of intension is realized in a Turing machine as rules for conditional branching: e.g., “IF in state qi , reading symbol xi on the tape, THEN write yi , move one space, transition to state qj .” The machine produces a set of outputs if and only if it enters the defined configurations, but that set—the set of possible outputs—is determined (generated) by the rules
    “in advance” of—indeed independent of—any input. For example, independent of its potential application to particular inputs, a rule of arithmetic determines (generates) a range of numbers (i.e., the non-arbitrary set of possible outputs) given a domain of inputs."

    From this (and even more, from the next quotation, just below), my sense is that the writer has not understood what the question was that Turing and his contemporaries (and the previous generation) were trying to come to grips with, and that is how mathematicians can be sure about truths that cover an infinite range of cases when we are only finite creatures with finite time. (Perhaps that is too harsh, and I apologize in advance; but the paper does indeed set itself the goal of clarifying important points that have been misunderstood.) It is the age-old problem of infinity, and in the present context, it could be paraphrased: Let us agree that we can define what we mean by an unbounded process: it is something that can continue indefinitely, an indefinite number of times. At any particular moment in the process's unfolding, it has applied only some finite number of times; but there is no bound to it.

    While there were divergent points of view about the foundations of mathematics in the first decades of the 20th century --- that is putting it mildly --- I don't think anyone has boldly asserted that the output of an imaginary machine, such as a Turing machine, had a set of outputs that were determined in advance of any input (which is what is suggested in the quote above). Now, WHRH put the phrase "in advance" in scare quotes, and that is a bad sign, because it shows that they know that they really should not be using that phrase; they sense that there is something off, something not technically correct, and then they go ahead and use the phrase anyway, not having some other phrase that they are willing to defend. So we can object now that the determination is not *in advance*, and the authors might think they can come back and say that they didn't really mean "in advance"; and we'll have to have this conversation all over again.

    The point of the Turing machine model is to come to grips with the finiteness of our thinking ability. In some cases, we have succeeded in finding ways to make generalizations about what will or will not happen over unbounded periods of time. As far as I know, all of those cases involve some version of the principle of induction, which is why that is so important. (The axiom of choice is perhaps a second way, if we chose to accept it, to say something about infinities.)
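For concreteness, the conditional-branching rule format quoted in the first passage ("IF in state qi, reading symbol xi, THEN write yi, move one space, transition to state qj") can be run mechanically. Here is a toy machine of my own devising, not anything from WHRH:

```python
# A toy Turing-style machine (my own example) in exactly the quoted rule
# format.  This one flips every 0 on the tape to 1, halting at the first
# blank square ('_').
RULES = {
    ("q0", "0"): ("1", 1, "q0"),    # reading 0: write 1, move right, stay in q0
    ("q0", "1"): ("1", 1, "q0"),    # reading 1: leave it alone, move right
    ("q0", "_"): ("_", 0, "halt"),  # reading blank: halt
}

def run(tape, state="q0", pos=0):
    """Apply the conditional-branching rules until the machine halts."""
    while state != "halt":
        write, move, state = RULES[(state, tape[pos])]
        tape[pos] = write
        pos += move
    return tape

# run(list("0101_")) returns ['1', '1', '1', '1', '_']
```

The rule table is finite and deterministic, yet nothing in it mentions any particular input: that is the (limited) sense in which the set of possible outputs is fixed by the rules.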

  13. Part 2:

    So it is wrong, it seems to me, to say that the outputs of a finite program are determined in advance and independent of any input in general (though that is true *in some cases*). The case of the non-halting Turing machine programs is precisely the case where this becomes very tricky. It has been argued that, properly interpreted, a consequence of Gödel's result is that no finite creature, no creature within the physical universe, can know the answer to some questions about halting, and that is a deep and perhaps disturbing discovery. It could be said, in a footnote and as a profession of religious faith, that one believes that there is an infinite God who actually does know, but that could never migrate up to the text (nor could it be published in a mathematics journal).

    In short: "in advance" cannot be interpreted literally, and I think that there is no non-literal interpretation that can replace it with that both makes sense and is true.

    2. "The range of a recursive function can be infinite, as with the arithmetical rule. The computation can run to infinity..."

    No, it can't. Its algorithm can be implemented in an unbounded fashion, but no Turing machine can run to infinity.

    "...but its rules are finitely specified and at any step in the computation only a finite amount of tape has been processed. The finiteness of the machine is fundamental: the set of theorems or data derivable or predictable in a consistent system or theory is non-arbitrary such that an extensional definition (a list of the theorems/data) cannot suffice; a list does not derive and thereby delimit (and thereby explain) the set."

    This, too, seems confused to me. Data or theorems (as special cases of data, perhaps) can be defined by extension, or not. Sometimes that is possible, and sometimes it is not. If a list does not suffice, then a Turing machine could be programmed to generate them, but typically there are many ways to do that, and it is only in a loose and non-technical sense that that enterprise could be said to "explain" the set.

    "Therefore, it is necessary to define the set [presumably, the set of "theorems or data" referred to just above] intensionally by a finite procedure (a rule) to derive or predict what things satisfy the conditions to be subsumed in the set or equivalently to generate (describe) all and only those things the set subsumes. In other words, if a set is non-arbitrary,..."

    Wait: have we stopped talking about "the set" mentioned above, of theorems and data? I think we have. But I am not sure.

    What is a non-arbitrary set? This is not a well-defined notion. Someone familiar with Kolmogorov complexity might take this phrase to mean a set which can be defined by an algorithm that is shorter than the list of all of its members. But this way of thinking (which is by now standard) would never let one say something like "the reason the set subsumes all and only those things it does" ...

    "...then there must exist a reason why it subsumes all and only those things it does; and the rule is the reason."

    No, the algorithm is not the reason. Or do we have in mind a new sense of reason? I'd be grateful if the authors could say whether I have misunderstood their position.

  14. Part 1

    “The machine produces a set of outputs if and only if it enters the defined configurations, but that set—the set of possible outputs—is determined (generated) by the rules “in advance” of—indeed independent of—any input” (Watumull, et al. 2014: 2).

    I and my coauthors stand by the literal interpretation of this statement. I-language generates sets in the way axioms generate theorems. As I have discussed elsewhere, “Chaitin observes, ‘theorems are compressed into the axioms’ so that ‘I think of axioms as a computer program for generating all theorems’ (Chaitin 2005: 65). Consider how a computer program explicitly representing the Euclidean axioms encodes only a finite number of bits; it does not—indeed cannot—encode the infinite number of bits that could be derived from its postulates, but it would be obtuse to deny that such an infinity is implicit (compressed) in the explicit axioms” (Watumull 2013: 308). Similarly, independent of their use, a ruler and compass determine a space of possible shapes. And similarly, “general principles [...] are simply innate to the language faculty, as part of the schematism that determines admissible grammars and the ways in which their rules apply, thus determining the class of languages accessible to humans by application of the language faculty” (Chomsky 1975: 91): given the schematism of UG (as constrained by third factors), a set of possible I-languages and by extension a set of possible expressions are determined (i.e., generated).

  15. Part 2

    In particular, I and my coauthors stand by the notion that expressions are generated “in advance of” input just as the rule of summation generates (i.e., determines) a set of possible sums in advance of the input of any summands. That abstract space of possible sums would exist independent of any concrete adders. The scare quotes were intended not to connote any unseriousness, but rather to scare off any connotations of temporality. (It is not as if the expressions are produced in time and stored for later output. “[A] generative system involves no temporal dimension. In this respect, generation of expressions is similar to other recursive processes such as construction of formal proofs. Intuitively, the proof ‘begins’ with axioms and each line is added to earlier lines by rules of inference or additional axioms. But this implies no temporal ordering. It is simply a description of the structural properties of the geometrical object ‘proof.’” (Chomsky 2007: 6). Chomsky’s scare quotes on “begins” are not unserious: they scare away the specter of erroneous interpretations.)

    The locution “in advance of” simply restates a property criterial of an effective procedure: determinism. Gandy, Turing’s student, defined Church’s Thesis thus: “What is effectively calculable is computable[...]. The word ‘effective’ in the thesis serves to emphasize that the process of calculation is deterministic” (Gandy 1980: 123, 124). As Turing explained in the context of “construct[ing] an automatic machine [i.e., a Turing machine] which will find all the provable formulae of the calculus[...]. It is natural to construct first a choice machine[...] to do this. We can suppose that the choices are always choices between two possibilities 0 and 1. Each proof will then be determined by a sequence of choices i1, i2, ..., in (i1 = 0 or 1, i2 = 0 or 1, ..., in = 0 or 1), and hence the number 2^n + i1·2^(n-1) + i2·2^(n-2) + ... + in completely determines the proof. The [Turing] machine carries out successively proof 1, proof 2, proof 3, ....” (Turing 1936: 252). The Turing machine carries out proofs deterministically in the way a Merge-based system generates expressions deterministically. (As Turing’s biographer, Andrew Hodges, explained in the context of Turing’s theory of the brain qua Turing machine, at a “detailed level of description, [neural functioning] was all determined[...]. It was not the determinism of physics, or chemistry, or of biological cells[...]. It was something more abstract. It was the quality of being fixed in advance” (Hodges 1983: 96).)

  16. Part 3

    (If I may go off on a tangent, Turing was interested in computability not, as Professor Goldsmith implies, solely for its epistemological implications, but further—and fundamentally—for its implications for the ontology of mind. “To understand the Turing model of ‘the brain’, it was crucial to see that it regarded physics and chemistry, including all the arguments about quantum mechanics [...] as essentially irrelevant. In his view, the physics and chemistry were relevant only in as much as they sustained the medium for the embodiment of discrete ‘states’, ‘reading’ and ‘writing’. Only the logical pattern of these ‘states’ could really matter. The claim was that whatever the brain did, it did by virtue of its structure as a logical system, and not because it was inside a person’s head, or because it was [...] made up of a particular kind of biological cell formation. It was a materialist view of mind, but one that did not confuse logical patterns and relations with physical substances and things[...]. In particular it was a different claim from that of behaviourist psychology, which spoke of reducing psychology to physics. The Turing model did not seek to explain one kind of phenomenon, that of the mind, in terms of another. It did not expect to ‘reduce’ psychology to anything. The thesis was that ‘mind’ or psychology could properly be described in terms of Turing machines because they both lay on the same level of description of the world, that of discrete logical systems” (Hodges 1983: 291).)

  17. Part 4

    I and my coauthors stand by the statement “computation can run to infinity, but its rules are finitely specified and at any step in the computation only a finite amount of tape has been processed” (Watumull, et al. 2014: 2). This is simply a paraphrase of Turing: “There is no theoretical difficulty in the idea of a computer with an unlimited store. Of course only a finite part can have been used at any one time. Likewise only a finite amount can have been constructed, but we can imagine more and more being added as required. Such computations have special theoretical interest and will be called infinite capacity computers” (Turing 1950: 438-439). (Gödel was more radical in going beyond “theoretical interest” to philosophize that “although at each stage the number and precision of the abstract terms at our disposal may be finite, both (and, therefore, also Turing’s number of distinguishable states of mind) may converge toward infinity in the course of the application of the procedure” (Gödel 1972: 306).)

    “if a set is non-arbitrary, then there must exist a reason why it subsumes all and only those things it does; and the rule is the reason” (Watumull, et al. 2014: 2). Professor Goldsmith asks “What is a non-arbitrary set?” Elementary: A set is non-arbitrary if conditions on set membership exist. As Carnap explained, “the intension of a predicate ‘Q’ [...] is the general condition which an object y must fulfill in order for [the ascription] ‘Q’ to y” (Carnap 1955: 42). I-language specifies conditions an object must satisfy to be an expression of the language. “For presumably, unless the grammar is the one-sentence grammar which says ‘Any finite sequence of English words is a sentence.’, then there must be some finite sequence of English words which, by implication at least, are ruled out as deviant and I will here and now guarantee to find situations under which informants would produce some of these sentences and hearers will understand some of them[...]. This act of classifying sentences as grammatical or ungrammatical seems to be [...] relying on something like an effective procedure. In this connection, I am of course relying on certain very general hypotheses as to the character of the human brain. To be specific, I would suggest that there are many considerations which point to the idea that a Turing machine [...] is a reasonable model of the human brain” (Putnam 1961: 27, 39). I-language qua Turing machine decides the set of possible expressions: the set is non-arbitrary in this sense. (NB: We do not assume that the desideratum of linguistics is the classification of (un)grammatical sentences. Our project is a description of the mechanism generative of expressions, for only it can explain why the grammatical/ungrammatical distinction even obtains. And ultimately it is the mechanism independent of its inputs/outputs that is of interest.)

    1. "Professor Goldsmith asks “What is a non-arbitrary set?” Elementary: A set is non-arbitrary if conditions on set membership exist."

      I don't really understand this. Is this another vague way of saying a computable set?

      Suppose I take a random set of numbers less than 10^100. This set is computable: it has a finite condition on set membership. But it is clearly arbitrary; its members are not formed according to any rule.

      Consider the set of all indices of halting Turing machines. This has a finite condition -- it is well-defined, and not arbitrary. But it is not computable.
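To make the first point concrete, here is a small Python sketch (my own toy example, not from the paper): fix a pseudo-randomly chosen finite set; membership in it is perfectly computable, yet the set follows no rule beyond its own listing.

```python
import random

# Fix a "random" finite set of numbers below 10**6 (seeded so that it is a
# definite, reproducible object).  Membership is perfectly computable, yet
# the set is arbitrary: its shortest description is essentially the list
# of its members, not any rule.
rng = random.Random(42)
ARBITRARY = frozenset(rng.sample(range(10**6), k=100))

def member(n):
    """An effective (computable) membership condition with no rule behind it."""
    return n in ARBITRARY
```

So "has a computable membership condition" and "is non-arbitrary" come apart in both directions, as the halting-indices example shows for the converse.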

      The Chaitin/Kolmogorov/Solomonoff stuff -- which John Goldsmith is extremely familiar with -- is a completely different theory which you do not discuss or mention or cite (correct me if I am wrong) in the paper in question. I think the Kolmogorov/Chaitin notion of randomness is a good way of making the notion of arbitrariness contentful, and additionally of formalizing versions of learning. But I don't see that it has any relation to the content of your paper.

  18. Part 5

    “if a set is non-arbitrary, then there must exist a reason why it subsumes all and only those things it does; and the rule is the reason” (Watumull, et al. 2014: 2). I and my coauthors stand by this statement. (For a discussion of “the rule is the reason,” see section 5, “Generation and Explanation.”) In the reply to this post, I quote Turing discussing rules and explanations. Here is a snippet:

    A recursive function derives—-and thus explains—a value. A look-up table stipulates—and thus does not explain—a value. (The recursive function establishes epistemological and ontological foundations.) Turing emphasized this distinction, with characteristic wit, in discussing “Solvable and Unsolvable Problems” (1954). Imagine a puzzle-game with a finite number of movable squares. “Is there a systematic way of [solving the puzzle?] It would be quite enough to say: ‘Certainly [b]y making a list of all the positions and working through all the moves, one can divide the positions into classes, such that sliding the squares allows one to get to any position which is in the same class as the one started from. By looking up which classes the two positions belong to one can tell whether one can get from one to the other or not.’ This is all, of course, perfectly true, but one would hardly find such remarks helpful if they were made in reply to a request for an explanation of how the puzzle should be done. In fact they are so obvious that under the circumstances one might find them somehow rather insulting.” Indeed. A look-up table is arbitrary; it is equivalent to a memorized or genetically preprogrammed list. This may suffice for, say, nonhuman animal communication, but not natural language. This is particularly important for an infinite system (such as language), for as Turing explains: “A finite number of answers will deal with a question about a finite number of objects,” such as a finite repertoire of memorized/preprogrammed calls. But “[w]hen the number is infinite, or in some way not yet completed[...], a list of answers will not suffice. Some kind of rule or systematic procedure must be given.”
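Turing's contrast can be put in a few lines of code (a toy example of mine, not from the 1954 paper): a look-up table answers only the finitely many questions it lists, while a rule derives the answer for any inputs whatsoever.

```python
# A stipulated look-up table: it answers only the 100 questions it lists.
TABLE = {(a, b): a + b for a in range(10) for b in range(10)}

# A rule: it derives the answer for any pair of non-negative inputs.
def add(a, b):
    """a + b = a if b == 0, else (a + 1) + (b - 1)."""
    return a if b == 0 else add(a + 1, b - 1)

# TABLE[(3, 4)] is 7, but (300, 4) is simply absent from the table;
# add(300, 4) derives 304 from the same two-line rule.
```

The table "is so obvious" about the cases it covers and silent about all the rest; the rule delimits the whole infinite set of sums.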

    Turing’s ideas are reflected in the sense of explanation Professor Goldsmith identifies with that of Kolmogorov complexity. Chaitin—projecting from Turing to Leibniz—is explicit on this notion of computation as explanation: “a scientific theory [can be seen as] a binary computer program for calculating the observations, which are also written in binary. And you have a law of nature if there is compression, if the experimental data is compressed into a computer program that has a smaller number of bits than are in the data that it explains. The greater the degree of compression, the better the law, the more you understand the data. But if the experimental data cannot be compressed, if the smallest program for calculating it is just as large as it is [...], then the data is lawless, unstructured, patternless, not amenable to scientific study, incomprehensible. In a word, random, irreducible[.] Understanding is compression[.] To comprehend is to compress” (Chaitin 2005: 64, 65). However, we would maintain that compression is necessary but not sufficient. We need to understand the program.
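Chaitin's slogan can even be sketched with an off-the-shelf compressor (a crude, computable stand-in for Kolmogorov complexity, which is itself uncomputable; the example is mine, not Chaitin's):

```python
import random
import zlib

# Lawful data compresses far below its own length; lawless data does not.
patterned = b"ab" * 5000                 # fully captured by a short rule
random.seed(0)
lawless = bytes(random.randrange(256) for _ in range(10000))

compressed_patterned = len(zlib.compress(patterned))  # a few dozen bytes
compressed_lawless = len(zlib.compress(lawless))      # about as big as the data
```

The 10,000-byte patterned string shrinks to a few dozen bytes (there is a "law" behind it); the equally long pseudo-random string barely shrinks at all.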

    1. @Jeff: Please consider my comments here

    2. I have to say I find this recursion clarification project utterly fascinating: we now have a 4-author paper [WHRH] claiming to clarify claims made in an earlier paper [HCF] about recursion. This was followed by two main posts by one of the 4 authors clarifying the clarification paper, and a 5-part post further clarifying the clarifications. In spite of this recursive clarification effort we also have dozens of comments claiming things remain unclear and now, as the latest treat, a comment [by Greg] linking back to an earlier comment specifying what [still] remains unclear. So the only thing missing from this massively recursive project is clarity...