Tuesday, March 17, 2020

Recursion in the Time of Cholera

What better time to think about recursion than when the COVID-19 virus is busy replicating itself all over the world?

The big picture issues will come later—for the moment I will even use the term recursion loosely—because I think they become clear once we crack the empirical nut.  

Languages such as English allow infinite embedding of genitive possessors via the apostrophe -s (1a) but German, and most Germanic languages, typically allow only a single item that is a proper name or a handful of kinship terms (1b and 1c):

(1) a. Maria’s neighbor’s friend’s house
b. Marias Haus (Maria’s house)
c. *Marias Nachbars Freundins Haus (Maria’s neighbor’s friend’s house)

Such structures are all over language: the infinite use of finite means. One well-studied case is adjective embedding (aka stacking) in NPs as in the big red … car, which young children learn very early. Then there is the (in)famous Pirahã debate on whether the language has clausal or any kind of recursive embedding. A principled solution to one such case has the potential of handling them all.

Before we look at how recursive embedding can be learned, let’s first look at how it cannot be learned. First, deep embeddings are rare and deeper embeddings are rarer. There are very few triple possessive embeddings in child-directed English (about 20 in the entire CHILDES input of almost six million words) and no deeper ones whatsoever. Second, and more important, even if the child were to hear, say, a 17-level embedding, there is no reason that infinite, or even 18-level, embedding should follow. Finally, it would not be wise to suppose that, perhaps by default, language recursively embeds across all domains. That would leave us with the huge question of reining in numerous superset grammars all over the place. Plus, there is no evidence that German-learning children ever stray and stumble as in (1c), i.e., having to suppress the recursive rule.

The proposal here is that recursive embedding is an instance of productivity.  The conventional notion of productivity is that it applies to a category of items without restrictions. A case in point is children’s knowledge of determiners, where productivity is defined as the interchangeability of the members in the determiner category—specifically, a/n and the—when used in combination with nouns. The recursive embedding of a rule merely has a different flavor: it applies irrespective of structural positions.   

Let me explain with the example of adjective stacking. We can view it as a case of productivity where the placement of adjectives takes place irrespective of their structural positions. For a noun phrase of the form A1A2N, if an adjective can appear in position A1, then it must be able to appear in position A2 as well, and vice versa. This conception of recursion as productivity enables us to apply learning models such as the Tolerance/Sufficiency Principle (TSP): a rule defined over N lexical items productively generalizes iff e ≤ N/ln N, where e is the cardinality of the subset of items not attested under the rule in the input. A crucial property of the TSP is that N pertains to the child learner’s vocabulary, which is just a few hundred at age 2-3 when most of the recursive rules are learned. Thus, the evidence for rule productivity must come from a small set of early words, which can be approximated by examining the distributional properties of the most frequent types (here: adjectives) in child-directed input.
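
For readers who want to play with the threshold, here is a minimal Python sketch of the TSP condition as stated above (e ≤ N/ln N). The function names are illustrative choices, not taken from any published implementation. The loop shows why a small vocabulary matters: the share of exceptions a rule can tolerate, 1/ln N, shrinks as N grows.

```python
import math


def tolerance_threshold(n: int) -> float:
    """Largest number of unattested items a rule over n items can tolerate: n / ln n."""
    if n < 2:
        # ln 1 = 0, so the threshold is undefined for a single item.
        raise ValueError("n must be at least 2")
    return n / math.log(n)


def is_productive(n: int, e: int) -> bool:
    """True if a rule over n items with e unattested items generalizes under the TSP."""
    return e <= tolerance_threshold(n)


# Illustrative vocabulary sizes only; these are not figures from the post.
for n in (10, 50, 500, 5000):
    theta = tolerance_threshold(n)
    print(f"N={n:5d}  threshold={theta:7.1f}  tolerated share={theta / n:.0%}")
```

A learner with a few hundred words can thus afford a proportionally larger set of gaps in the input than a learner with several thousand.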

This idea has been explored in joint work (with Lydia Grohe and Petra Schulz) that is to appear, COVID-19 permitting, at this year’s GALANA conference in Iceland. For English, we focus on the 49 adjectives among the roughly 550 words known to typical 3-year-olds (found here). We use a part-of-speech tagger to extract “DA1A2N” sequences from a 5.5-million-word child-directed English corpus. All 49 adjectives appear in either A1 or A2 position, of which only 3 fail to appear in both, trivially clearing the TSP threshold (49/ln 49 ≈ 13). A1 and A2 are fully interchangeable: adjective stacking is productive and recursive. For German, we analyze five child-directed corpora of about 3.5 million words. We focus on the 40 most frequent adjectives and extract all “A1A2N” sequences. 38 of the 40 adjectives appear in either A1 or A2 position, of which only 7 fail to appear in both, also clearing the TSP threshold (38/ln 38 ≈ 10). Thus, the productivity of English and German adjective stacking can be rapidly acquired on a distributional basis.
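
The counting step behind these figures can be sketched roughly as follows. This is only a sketch under stated assumptions: it presumes the A1A2N sequences have already been extracted by a tagger, the adjective pairs and vocabulary below are invented toy data rather than corpus results, and the function name is hypothetical. N is taken to be the number of vocabulary adjectives attested in at least one position, and e the number attested in only one.

```python
import math


def stacking_profile(pairs, vocabulary):
    """Return (attested, one_sided) adjective sets for a list of (A1, A2) pairs."""
    pairs = list(pairs)  # allow any iterable, since we scan it twice
    first = {a1 for a1, _ in pairs if a1 in vocabulary}
    second = {a2 for _, a2 in pairs if a2 in vocabulary}
    attested = first | second   # seen in A1 or A2
    one_sided = first ^ second  # seen in exactly one of the two positions
    return attested, one_sided


# Invented toy data, standing in for tagger output such as "the big red car".
vocabulary = {"big", "little", "red", "old", "nice", "new"}
pairs = [
    ("big", "red"), ("little", "old"), ("nice", "big"),
    ("red", "little"), ("old", "new"),
]

attested, one_sided = stacking_profile(pairs, vocabulary)
n, e = len(attested), len(one_sided)
productive = n >= 2 and e <= n / math.log(n)
print(f"N={n}, e={e}, one-sided={sorted(one_sided)}, productive={productive}")
```

In the toy data, "nice" only ever occurs in A1 and "new" only in A2, so they count as the exceptions.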

Possessor embedding can be handled similarly. The rule has the form of “X’s Y” where X is the possessor and Y is the possessee. Productivity of recursive embedding means that nouns appearing in the X position can also appear in the Y position, and vice versa. Again, this only needs to be true of a relatively small number of highly frequent nouns in a young child learner's vocabulary. I have not run the numbers for English but Daoxin Li, a graduate student at Penn, has checked Chinese, which has the identical structure (just replace ’s with 的, the possessor marker). Even in a smaller corpus, 40 out of the top 50 nouns appear in both X and Y positions, again meeting the productivity criterion (50/ln 50 ≈ 13): young children learning Chinese do know this recursive rule very early. And given the state of German, there is no way for the structure to be productive: the child will not generalize the rule to all nouns but only learn the specific lexical class of items that does appear in the possessor position.
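
As a quick check of the arithmetic, the counts reported in this post can be run against the threshold. The (N, e) pairs below are read off the text; for Chinese, e = 10 is an assumption derived from "40 out of the top 50 nouns appear in both positions".

```python
import math

# (label, N, e) as reported in the post; the Chinese e is inferred as 50 - 40.
reported = [
    ("English adjective stacking", 49, 3),
    ("German adjective stacking", 38, 7),
    ("Chinese possessor embedding", 50, 10),
]

results = {}
for label, n, e in reported:
    threshold = n / math.log(n)
    results[label] = e <= threshold
    print(f"{label}: e={e}, N/ln N={threshold:.2f}, productive={results[label]}")
```

All three cases come out below the threshold, the Chinese one by the narrowest margin.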

Yes, the same argument carries over to CP embedding. Almost all of the top 10 recursively embedding verbs (think, know, say, tell, believe, etc.) that young children know and use, which all express propositional attitudes, are robustly attested in both matrix and embedded clause positions, often together, in child-directed English input (e.g., ".. think ... [CP that .. think]", "... know [CP ...  think ... ]", "told .. [CP.. said ...]").

A couple of general remarks before I finish.

  1. If correct, recursive embedding can be learned distributionally; no need to rig a theory of linguistic structures to do so.
  2. It is theoretically possible for a language, or a stage of language development, to have no recursive embedding rules at all. (NB: productive rules needn't be recursive but recursive rules must be productive.)
  3. The current proposal implies that embedding is either infinite (as in most cases) or restricted to level one (as in German possessives). There is nothing in between. This is analogous, in my view, to the nature of counting systems and the knowledge of natural numbers (I believe) it entails: productivity is a categorical notion. 
  4. Recursion, as conceived here, is somewhat independent of representational structure. That is, a linear structure that iterates (e.g., an A* finite state language) can be learned by the same mechanism, provided that the learner treats A as a discrete category of elements. The learning theory applies to linear positions as well as hierarchical positions, where interchangeability appears to be the key. The origin of hierarchy, a la Merge, remains as mysterious as ever.
Take good care, everyone. Better: recursively!


  1. Interesting!

    I would like to point out that there is no need to use the word _recursion_ here (_embedding_ is just enough). Technically speaking Merge itself has nothing to do with recursion - recursion is a manner of applying a function which is defined in terms of itself. I wrote a post about it not so long ago:

    1. Hi Kirill, Thanks for the link: I agree, especially as I entertain the possibility that the TSP, or similar threshold-based learning principles, may be domain general. At the same time, it is conceivable that Merge makes combinatorics possible such that the learner would attend to discrete categories of elements and their positional invariance in the input data. In other words, unless we get a bird/dog/monkey, who doesn't have Merge, to learn embedding (in a linear system), we still do not know whether language is critical for "recursion".

      Be safe.

    2. Thank you for a very clear discussion on drawing a line between Merge vs recursion! As you've argued, we shouldn't be sloppy in our descriptions ('Merge produces a recursive structure' and 'Merge can be applied recursively' not being the same as 'Merge is recursive') in general, and especially when it comes to evolang research.

  2. c. *Marias Nachbars Freundins Haus (Maria’s neighbor’s friend’s house)

    I don't think it has any bearing on your point, but for the sake of completeness I should probably mention that German does stack possessives – just not so much in the genitive case.

    Printable options (I'm a native speaker):

    das Haus der Freundin des Nachbarn von Maria
    the house the.F.GEN friend-F the.M.GEN neighbor.M-GEN of Maria

    das Haus der Freundin von Marias Nachbar(n)
    the house the.F.GEN friend-F of Maria-GEN neighbor.M(-DAT)

    Part of the reason is that the genitive died out 500 years ago in almost all dialects, but remained in the written language, and in the last century or so people in much of Germany have tried to create colloquial registers for it that they speak to their children, so there are now lots of people whose native idiom is Written German with heavy influence from various genitive-free substrates. Like so:

    das Haus von der Freundin von Marias Nachbar(n)
    the house of the.DAT friend.F of Maria-GEN neighbor.M(-DAT)

    das Haus von der Freundin vom Nachbarn von Maria
    the house of the.DAT friend-F of:the.DAT neighbor.M-DAT of Maria

    For comparison, my genitive-free dialect offers two options which could be rendered in Standard German vocabulary as follows (the sound system is so different it can't be represented by German spelling conventions):

    das Haus von der Freundin vom Nachbarn von der Maria
    the house of the.DAT friend-F of:the.DAT neighbor.M-DAT of the Maria

    das Haus von der Freundin von der-Maria-ihrem Nachbarn
    the house of the.DAT friend-F of the.DAT-Maria-her.DAT neighbor-DAT

    (That last one uses the most dread der Dativ ist dem-Genitiv-sein Tod construction, which is very widespread and has made it into the title of an infamous prescriptivist book.)
