Tuesday, March 17, 2020

Recursion in the Time of Cholera

What better time to think about recursion than when the COVID-19 virus is busy replicating itself all over the world?

The big picture issues will come later—for the moment I will even use the term recursion loosely—because I think they become clear once we crack the empirical nut.  

Languages such as English allow infinite embedding of genitive possessors via the apostrophe -s (1a), but German, like most Germanic languages, typically allows only a single possessor, which must be a proper name or one of a handful of kinship terms (1b and 1c):

(1) a. Maria’s neighbor’s friend’s house
b. Marias Haus (Maria’s house)
c. *Marias Nachbars Freundins Haus (Maria’s neighbor’s friend’s house)

Such structures are all over language: the infinite use of finite means. One well-studied case is adjective embedding (aka stacking) in NPs as in the big red … car, which young children learn very early. Then there is the (in)famous Pirahã debate on whether the language has clausal or any kind of recursive embedding. A principled solution to one such case has the potential of handling them all.

Before we look at how recursive embedding can be learned, let’s first look at how it cannot be learned. First, deep embeddings are rare and deeper embeddings are rarer. There are very few triple possessive embeddings in child-directed English (about 20 in the entire CHILDES input of almost six million words) and no deeper ones whatsoever. Second, and more important, even if the child were to hear, say, a 17-level embedding, there is no reason that infinite, or even 18-level, embedding should follow. Finally, it would not be wise to suppose that, perhaps by default, language recursively embeds across all domains. That would leave us with the huge question of reining in numerous superset grammars all over the place. Plus, there is no evidence that German-learning children ever stray and stumble as in (1c), i.e., that they ever have to suppress the recursive rule.

The proposal here is that recursive embedding is an instance of productivity.  The conventional notion of productivity is that it applies to a category of items without restrictions. A case in point is children’s knowledge of determiners, where productivity is defined as the interchangeability of the members in the determiner category—specifically, a/n and the—when used in combination with nouns. The recursive embedding of a rule merely has a different flavor: it applies irrespective of structural positions.   

Let me explain with the example of adjective stacking. We can view it as a case of productivity where the placement of adjectives takes place irrespective of their structural positions. For a noun phrase of the form A1A2N, if an adjective can appear in position A1, then it must be able to appear in position A2 as well, and vice versa. This conception of recursion as productivity enables us to apply learning models such as the Tolerance/Sufficiency Principle (TSP): a rule defined over N lexical items productively generalizes iff e ≤ N/ln N, where e is the cardinality of the subset of items not attested under the rule in the input. A crucial property of the TSP is that N pertains to the child learner’s vocabulary, which is just a few hundred words at age 2-3, when most of the recursive rules are learned. Thus, the evidence for rule productivity must come from a small set of early words, which can be approximated by examining the distributional properties of the most frequent types (here: adjectives) in child-directed input.
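
A minimal sketch of the TSP check, assuming the threshold is read as e ≤ N/ln N. The function names `tsp_threshold` and `is_productive` are made up for illustration and do not come from any published implementation.

```python
import math


def tsp_threshold(n: int) -> float:
    """Largest number of unattested items tolerated for a rule over n items: n / ln(n)."""
    if n < 2:
        # ln(1) = 0, so the ratio is undefined for a one-item category.
        raise ValueError("threshold is only computed here for n >= 2")
    return n / math.log(n)


def is_productive(n: int, e: int) -> bool:
    """True if a rule over n items with e unattested items meets e <= n / ln(n)."""
    return e <= tsp_threshold(n)


# How the threshold scales with the size of the category (illustrative sizes only).
for n in (10, 50, 100, 500):
    print(n, round(tsp_threshold(n), 1))
```

Because the tolerated proportion of exceptions is 1/ln N, it shrinks as N grows, which is why a small early vocabulary makes productivity comparatively easy to establish.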

This idea has been explored in joint work (with Lydia Grohe and Petra Schulz) that is to appear, COVID-19 permitting, at this year’s GALANA conference in Iceland. For English, we focus on the 49 adjectives among the roughly 550 words known to typical 3-year-olds (found here). We use a part-of-speech tagger to extract “DA1A2N” sequences from a 5.5-million-word child-directed English corpus. All 49 adjectives appear in either A1 or A2 position, of which only 3 fail to appear in both, trivially clearing the TSP threshold (49/ln 49 ≈ 13). A1 and A2 are fully interchangeable: adjective stacking is productive and recursive. For German, we analyze five child-directed corpora of about 3.5 million words. We focus on the 40 most frequent adjectives and extract all “A1A2N” sequences. 38 of the 40 adjectives appear in either A1 or A2 position, of which only 7 fail to appear in both, also clearing the TSP threshold (38/ln 38 ≈ 10). Thus, the productivity of English and German adjective stacking can be rapidly acquired on a distributional basis.
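
The counting step can be sketched roughly as follows. This is a toy illustration, not the pipeline used in the study: the utterances, the tag names (`DET`, `ADJ`, `NOUN`), and the helper names are all invented, and a real corpus would first go through a part-of-speech tagger.

```python
import math
from collections import defaultdict

# Toy tagged utterances as (word, tag) pairs; the data and tag labels are hypothetical.
utterances = [
    [("the", "DET"), ("big", "ADJ"), ("red", "ADJ"), ("car", "NOUN")],
    [("a", "DET"), ("red", "ADJ"), ("big", "ADJ"), ("ball", "NOUN")],
    [("the", "DET"), ("little", "ADJ"), ("red", "ADJ"), ("hen", "NOUN")],
    [("a", "DET"), ("nice", "ADJ"), ("little", "ADJ"), ("dog", "NOUN")],
]

PATTERN = ("DET", "ADJ", "ADJ", "NOUN")  # the "D A1 A2 N" frame


def positions_by_adjective(utts):
    """Map each adjective to the set of positions ('A1', 'A2') it fills in the frame."""
    seen = defaultdict(set)
    for utt in utts:
        for i in range(len(utt) - len(PATTERN) + 1):
            window = utt[i:i + len(PATTERN)]
            if tuple(tag for _, tag in window) == PATTERN:
                seen[window[1][0]].add("A1")
                seen[window[2][0]].add("A2")
    return seen


def summarize(seen):
    """Return (N, e, N / ln N): adjectives attested, those missing a position, threshold."""
    n = len(seen)  # assumes at least two adjectives were found, so ln(n) > 0
    e = sum(1 for positions in seen.values() if len(positions) < 2)
    return n, e, n / math.log(n)


seen = positions_by_adjective(utterances)
n, e, threshold = summarize(seen)
print(n, e, round(threshold, 2), e <= threshold)
```

Here N is the number of adjectives attested in either position and e the number not attested in both, mirroring how the counts are described above.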

Possessor embedding can be handled similarly. The rule has the form of “X’s Y” where X is the possessor and Y is the possessee. Productivity of recursive embedding means that nouns appearing in the X position can also appear in the Y position, and vice versa. Again, this only needs to be true of a relatively small number of highly frequent nouns in a young child learner’s vocabulary. I have not run the numbers for English but Daoxin Li, a graduate student at Penn, has checked Chinese, which has the identical structure (just replace ’s with 的 (de), the possessor marker). Even in a smaller corpus, 40 out of the top 50 nouns appear in both X and Y positions, again meeting the productivity criterion (50/ln 50 ≈ 13): young children learning Chinese do know this recursive rule very early. And given the state of German, there is no way for the structure to be productive: the child will not generalize the rule to all nouns but will only learn the specific lexical class of items that does appear in the possessor position.
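
The figures quoted in this post can be re-checked with a few lines of arithmetic. The sketch below assumes the threshold is e ≤ N/ln N and, for Chinese, assumes that the 10 nouns not attested in both positions are the exceptions; the labels and the helper name `clears_threshold` are illustrative only.

```python
import math

# (label, N, e) taken from the counts reported in this post.
# e = items not attested in both positions; the Chinese e is inferred as 50 - 40.
reported = [
    ("English adjectives", 49, 3),
    ("German adjectives", 38, 7),
    ("Chinese possessive nouns", 50, 50 - 40),
]


def clears_threshold(n: int, e: int) -> bool:
    """True if e unattested items out of n fall within n / ln(n)."""
    return e <= n / math.log(n)


results = {label: clears_threshold(n, e) for label, n, e in reported}

for label, n, e in reported:
    print(f"{label}: N={n}, e={e}, N/ln N={n / math.log(n):.2f}, productive={results[label]}")
```

No German possessive counts are given here, so that case is left out rather than guessed at.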

Yes, the same argument carries over to CP embedding. Almost all of the top 10 recursively embedding verbs (think, know, say, tell, believe, etc.) that young children know and use, which all express propositional attitudes, are robustly attested in both matrix and embedded clause positions, often together, in child-directed English input (e.g., ".. think ... [CP that .. think]", "... know [CP ...  think ... ]", "told .. [CP.. said ...]").

A couple of general remarks before I finish.

  1. If correct, recursive embedding can be learned distributionally; no need to rig a theory of linguistic structures to do so.
  2. It is theoretically possible for a language, or a stage of language development, to have no recursive embedding rules at all. (NB: productive rules needn't be recursive but recursive rules must be productive.)
  3. The current proposal implies that embedding is either infinite (as in most cases) or restricted to level one (as in German possessives). There is nothing in between. This is analogous, in my view, to the nature of counting systems and the knowledge of natural numbers (I believe) it entails: productivity is a categorical notion. 
  4. Recursion, as conceived here, is somewhat independent of representational structure. That is, a linear structure that iterates (e.g., an A* finite state language) can be learned by the same mechanism, provided that the learner treats A as a discrete category of elements. The learning theory applies to linear positions as well as hierarchical positions, where interchangeability appears to be the key. The origin of hierarchy, à la Merge, remains as mysterious as ever.
Take good care, everyone. Better: recursively!