If the child is tipped off that “John donated the museum the painting” is no good, then Baker’s Paradox immediately dissolves. But since negative evidence is not systematically available in language acquisition, a perennial contender in the learnability literature has been indirect negative evidence (INE): unattested or unrealized expectations constitute negative evidence.
Let’s consider a case study, one which is simpler than the dative constructions in Baker’s Paradox but has the same character. There is a class of English adjectives that in general can be used predicatively but not prenominally in noun phrases. Many of these adjectives start with an unstressed schwa (“a”) and have acquired the label “a-adjectives” (AA):
(1) a. The cat is asleep. ??The asleep cat.
b. The boss is away. ??The away boss.
c. The dog is awake. ??The awake dog.
d. The child is alone. ??The alone child.
e. The troops are around. ??The around troops.
Boyd and Goldberg (2011, Language) claim that these properties of AAs are genuinely idiosyncratic and require “statistical preemption” to be acquired. On this view, prenominal usage is blocked by the availability of paraphrases such as “the cat that is asleep” or “the scared cat”: “the asleep/afraid cat” are thus prevented. This proposal has precedents in Wexler and Culicover’s Principle of Uniqueness and Clark’s Principle of Contrast, which Pinker, in his 1989 book Learnability and Cognition, describes as a “surrogate for indirect negative evidence”.
INE should be avoided if possible. First, it remains unclear how to implement INE computationally or psychologically. Its standard use is for the learner to avoid the superset trap. If the learner conjectures a superset/larger hypothesis, the failure to observe (some of) the expected expressions may lead them to retreat to the subset/smaller hypothesis. But to do so may require computing, and comparing, the extensions of these two hypotheses to determine the superset-subset relationship, which can be computationally costly (Osherson et al. 1986, MIT Press; Fodor & Sakas 2005, J. Ling.) or even uncomputable. Recent probabilistic approaches in the MDL/Bayesian framework tend to focus on the abstract properties of using INE in an ideal learner, without specifying psychologically motivated learning algorithms. Second, statistical preemption does not seem all that effective. In a grammaticality judgment study of the dative constructions in Baker’s Paradox (Ambridge et al. 2012, Cognition), statistical preemption was found not to offer additional explanatory power beyond the semantic criteria in the Pinker/Levin line of work (more on these in the next post). Finally, and most importantly, INE appears to make wrong predictions. If the ungrammaticality of prenominal AAs is due to the blocking effect of paraphrase equivalents, then the relative clause use of typical adjectives should likewise be blocked if they consistently appear prenominally. In a 3 million word corpus of child directed English that I extracted from CHILDES, there are many adjectives, ranging from very frequent ones (e.g., “red”, which appears thousands of times) to relatively rare ones (e.g., “ancient”), that are exclusively used prenominally to modify the noun. Yet they can be used in a relative clause without any difficulty.
So, what is to be done? How does the child learn what not to say if INE is not up to the job? We must turn to the positive.
There is evidence, and crucially evidence in the PLD, that suggests that the AAs are not as idiosyncratic as they appear, but belong to more general classes of linguistic units. On the one hand, there are non-a-adjectives that show similar restrictions:
(2) a. The chairperson is present. *The present chairperson (spatial sense)
b. The receptionist is out. *The out receptionist
c. The game is over. *The over game
On the other, the ungrammaticality of prenominal use of AAs appears to be associated not with a fixed list but with the aspectual prefix a-, which may be combined with stems to create novel adjectives that show the same type of restriction (Salkoff 1983, Lg.; Coppock 2008, Stanford dissertation):
(3) a. The tree is abud with green shoots.
?* An abud tree is a beautiful thing to see.
b. The water is afizz with bubbles.
?* The afizz water was everywhere.
Larson & Marusic (2004, LI) note that all AAs are decomposable into a- and a stem (bound or free). (4) is their list with a few of my own additions; none is generally acceptable in prenominal use.
(4) abeam, ablaze, abloom, above, abroad, abuzz, across, adrift, afire, aflame, afraid, agape, aghast, agleam, aglitter, aglow, aground, ahead, ajar, akin, alight, alike, alive, alone, amiss, amok, amuck, apart, around, ashamed, ashore, askew, aslant, asleep, astern, astir, asunder, atilt, averse, awake, aware, awhirl, away
By contrast, a- combining with a non-stem forms a typical adjective, as in “the above examples”, “the aloof professor”, “the alert student”, etc.
Note that even if the morphological characterization is true, the acquisition problem does not go away. First, the learner must recognize that the a-stem combination forms a well defined set of adjectives; that is, they must be able to carry out morphological decomposition of these adjectives. Second, they still have to learn that the adjectives thus formed cannot be used prenominally in NPs, which is the main issue at stake.
There is evidence that AAs pattern like PPs. (I thank Ben Bruening for discussion of these matters. Ben has written a few blog posts about AAs, including an exchange with Adele Goldberg.) If the child learns this, and independently knows that PPs cannot be used prenominally in an NP, then they wouldn’t put AAs there either. Of the several diagnostics proposed by a number of authors, the most robust is the ability of AAs to be modified by adverbs such as right, well, etc. that express the meaning of intensity or immediacy:
(5) a. I was well/wide awake at 4am.
b. The race leader is well ahead.
c. The baby fell right/sound asleep.
d. You can go right ahead.
e. The guards are well aware (of the danger).
To be sure, not all AAs may be modified as such (“??I was well/right afraid”), but this kind of adverbial modification is impossible with typical adjectives while it is compatible with PPs:
(6) a. *The car is right/straight/well new/nice/red.
b. The cat ran straight into the room.
c. The rocket soared right across the sky.
d. The search was well under way.
The child must be able to deduce these properties of AAs—that they are made of the prefix a- and an actual stem and that they pattern like PPs—on the basis of positive evidence in the PLD. I examined a 3 million word child directed input dataset from CHILDES, which corresponds to about a year of data for some language learners (Hart & Risley 2003). Extracting all the adjectives and trying to lop off the word initial schwa gives us two lists:
(7) a. Containing a stem: afraid, awake, aware, ashamed, ahead, alone, apart, around, alive, asleep, away
b. Not containing a stem: amazing, annoying, allergic, available, adorable, another,
american, attractive, approachable, acceptable, agreeable, affectionate, adept,
above, aberrant
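The lopping-off procedure behind (7) can be sketched roughly as follows. This is only a toy illustration, not the script actually run on CHILDES: the stem set `KNOWN_STEMS` is simply read off the classification in (7a), the function names are made up for the example, and an orthographic “a” stands in for the word initial schwa.

```python
# Toy stand-in for the stems a learner already knows. ASSUMPTION: this set is
# copied from the post's own classification in (7a), not derived from a corpus.
KNOWN_STEMS = {
    "fraid", "wake", "ware", "shamed", "head", "lone",
    "part", "round", "live", "sleep", "way",
}


def strip_initial_a(word):
    """Return what is left after a word-initial 'a', or None if there is none.

    ASSUMPTION: spelling is used as a crude proxy for the unstressed schwa.
    """
    if len(word) > 1 and word.startswith("a"):
        return word[1:]
    return None


def partition(adjectives, stems):
    """Split adjectives into (a- plus known stem, everything else)."""
    with_stem, without_stem = [], []
    for adj in adjectives:
        rest = strip_initial_a(adj.lower())
        if rest is not None and rest in stems:
            with_stem.append(adj)
        else:
            without_stem.append(adj)
    return with_stem, without_stem


# A handful of the adjectives listed in (7), used as sample input.
sample = ["afraid", "awake", "asleep", "away",
          "amazing", "allergic", "adorable", "adept"]
with_stem, without_stem = partition(sample, KNOWN_STEMS)
print(with_stem)
print(without_stem)
```

A real learner would of course have to discover the stems from the input rather than be handed them; the sketch only shows that, once stems are available, the partition itself is a trivial computation.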
It is clear that the presence or absence of a stem neatly partitions these adjectives into two classes: (7a) are all AAs that contain a (highly frequent) stem and (7b) are all typical adjectives. Of the AAs in (7a), 8 out of 11 (all but afraid, aware and ashamed) were indeed attested with adverbial modification of the type in (5). Should the child now conclude that 8/11 is good enough to generalize to the entire class of AAs?
They should. This is the typical situation in language acquisition. In almost all cases of language learning, the child will not be able to witness the explicit attestation of every member of a linguistic class, never mind novel members that have just entered the language: generalization is necessary. Thus, a productive generalization can (and must) be acquired if the learner witnesses “enough” positive attestations over lexical items. Conversely, if the learner does not witness enough positive instances, they will decide the generalization is unproductive, proceed to lexicalize the positively attested examples and refrain from extending the pattern to novel items. I happen to think this is the typical case of all types of learning. Suppose you encountered 11 exotic animals on an island but have only seen 8 of them breathing fire while the other 3 seem quite friendly; best not get too close.
The key question, then, is what counts as “enough” positive evidence. Again, this is the typical question in language acquisition. To take a well known example, all English children learn that the “-ed” rule is productive because it applies to “enough” verbs (i.e., the regulars) despite the presence of some 120 irregular verbs. We need a model of generalization that goes beyond the attested examples and deduces that the unattested examples would behave like the attested ones.
This post is already getting long. I do have a model of generalization on offer: if a generalization is applicable to N lexical items, the learner can tolerate no more than N/ln N exceptions or unattested examples. (More on this to follow, when we deal with Baker’s Paradox for real.) If N is 11, N/ln N is about 4.59, which gives us a threshold of 4, enough to allow for the missing afraid, aware and ashamed. In other words, the child should be able to conclude something along the lines of “a- plus stem = PP”.
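For readers who want to check the arithmetic, here is a minimal sketch of the N/ln N threshold. The function names are hypothetical, and rounding down to a whole number of exceptions is an assumption on my part here, chosen because it matches the threshold of 4 for N = 11.

```python
import math


def tolerance_threshold(n):
    """Largest whole number of exceptions not exceeding N / ln N.

    ASSUMPTION: the real-valued bound is rounded down (floor).
    """
    if n < 2:
        raise ValueError("N / ln N is only defined here for N >= 2")
    return math.floor(n / math.log(n))


def is_productive(n_items, n_unattested):
    """True if the unattested (or exceptional) items stay within the bound."""
    return n_unattested <= tolerance_threshold(n_items)


# The a-adjective case: 11 AAs in (7a), 8 attested with modification as in (5).
n_items = 11
n_unattested = n_items - 8
print(tolerance_threshold(n_items))          # 11 / ln 11 is about 4.59, so 4
print(is_productive(n_items, n_unattested))  # 3 <= 4, so True
```

Had only 6 of the 11 been attested, the 5 missing items would exceed the bound and the learner would fall back on lexicalizing the attested cases.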