Faculty of Language: The significant gaps in the PoS argument

Wednesday, February 14, 2018

The significant gaps in the PoS argument

I admit it, the title was meant to lure some in with the expectation that Hornstein was about to recant and confess (at last) to the holiness (as in full of holes) of the Poverty of Stimulus argument (PoS). If you are one of these, welcome (and gotcha!). I suspect, however, that you will be disappointed for I am here to affirm once again how great an argument form the PoS actually is, and if not ‘holy’ then at least deserving of the greatest reverence. The remarks below are prompted by an observation in a recent essay by Epstein, Kitahara and Seely (EKS’s) (here p. 51, emphasis mine):

…Recognizing the gross disparity between the input and the state attained (knowledge) is the first step one must take in recognizing the fascination surrounding human linguistic capacities. The chasm (the existence of which is still controversial in linguistics, the so-called poverty of the stimulus debate) is bridged by the postulation of innate genetically determined properties (uncontroversial in biology)…

This is a shocking statement! And what makes it shocking is EKS’s completely accurate observation that many linguists, psychologists, neuroscientists, computer scientists and other language scientists still wonder whether there exists a learning problem in the domain of language at all. Yup, after more than 60 years of, IMO, pretty conclusive demonstrations of the poverty of the linguistic stimulus (and several hundred years of people exploring the logic of induction) the question of whether such a gap exists is still not settled doctrine. To my mind, the only thing that this could mean is that skeptics really do not understand what the PoS in the domain of language (or any other really) is all about. If they did, it would be uncontroversial that a significant gap (in fact, as we shall see several) exists between evidence available to fix the capacity and the capacity attained. Of course, recognizing that significant gaps exist does not by itself suffice to bridge them. However, unrecognized problems are particularly difficult to solve (you can prove this to yourself by trying to solve a couple of problems that you do not know exist right now) and so as a public service I would like to rehearse (one more time and with gusto) the various ways that the linguistic input (aka, Primary Linguistic Data (PLD)) to the language acquisition device (LAD (aka child)) underdetermines the structure of the competence attained (knowledge of G that a native speaker has). The deficiencies are severe and failure to appreciate this stems from a misconception of what G acquisition consists in. Let’s review.

There are three different kinds of gaps.

The first and most anodyne relates to the quality of the input. There are several ways that the quality might be problematic. Here are some.

1. The input PLD is in the form of uttered bits of language. This gap adverts to the fact that there is a slip betwixt phrases/sentences (the structures that we know something about) and the lip(s that utter them). So the PLD in the ambient environment of the LAD is not “perfect.” There are mispronunciations, half thoughts badly expressed, lots of ‘hmms’ and ‘likes’ thrown in for no apparent purpose (except to irritate parents), misperceptions leading to misarticulations, cases of talking with one’s mouth full, and more. In short, the input to the LAD is not ideal. The input data are not perfect examples of the extensional range of sentences and phrases of the language.

2. The range of input data also falls short. Thus, since forever (Newport, Gleitman and Gleitman did the leg work on this over 30 years ago (if not more)) it has been pointed out that utterances addressed to LADs are largely in imperative or interrogative form. When talking to very young kids, native speakers tend to ask a lot of questions (to which the asker already knows the answer so it is not a “real” question) and issue a log of commands (actually many of the putative questions are also rhetorically disguised commands). Kids, in contrast, are basically declarative utterers. In other words, kids don’t grow up talking motherese, though large chunks of the input has a very stylized phon/syntactic contour (at least in some populations). They don’t sound mothereesish and they eschew use of the kinds of sentences directed at them. So even at a gross level, the match between what LADs hear in their ambient environment and what they do mismatches.

So, the input is not perfect and the attained competence is an idealized version of what the LAD actually has access to. Even the Structuralists appreciated this point, knowing full well that texts of actual speech needed curation to be taken as useful evidence for anything. Anyone who has ever read a non-edited verbatim text of an interview knows that the raw uttered data can be pretty messy. As indeed are data of any kind. The inputs vary in the quality of the exemplar, some being closer approximations to the ideal than others. Or to put this another way: there is a sizeable gap between the set of sentences and the set of utterances. Thus, if we assume that LADs acquire sentential competence, there is a gap to be traversed in building the former set from the latter.

The second gap between input and capacity attained is decidedly more qualitative and significant. It is a fact that native speakers are linguistically creative in the sense of effortlessly understanding and producing linguistic objects never before experienced. Linguistic creativity requires that what an LAD acquires on the basis of PLD is not a list of sentences/phrases previously encountered in ambient utterances but a way of generating an open ended list of acceptable sentences/phrases. In other words, what is acquired is (at least) some kind of generative procedure (rule system or G) that can recursively specify the open ended set of linguistic objects of the native speaker’s language. And herein lies the second gap: the outputs of the acquisition process is a G or set of rules while the input is products of this G or set of rules AND products of rules and rules (or sentences and the Gs that generate them) are ontologically different kinds of objects. Or to put this another way: an LAD does not “experience” Gs directly but only via their products and there is a principled gap between these products and the generative procedures that generate them. Furthermore, so far as I know nobody has ever shown how to bridge this ontological divide by, say, using conventional analytical methods. For example, so far as I know the standard substitution methods prized by the transitional probability crowd has yet to converge on the actual VP expansion rule (one that includes ate, ate a potato, ate a potato that I bout at the store, ate a potato that I bought at the store that is around the block, ate a potato that I bought at the store that is around the block near the gin joint whose owner has a red buggy which people from Detroit want to buy for a song, etc. All of these are VPs but so far that they are all VPs is not something that artificial G learners have managed to cover. Recursion really is a pain).

In fact, it is worse than this. For any finite set of data specified extensionally there are an infinite number of different functions that can generate those data. This observation goes back to Hume, and has been clear to anyone that has thought about the issue for at least the last several hundred years. Wittgenstein made this point. Goodman made it. Fodor made it. Chomsky made it. Even I, standing on the shoulders of giants) have been known to make it. It is a very old point. The data (always a finite object) cannot speak for themselves in the sense of specifying a unique function that generate those data. Data cannot bootstrap a unique function. The only way to get from data to the functions that generate them is to make concrete assumptions about the specific nature of the induction and in one way or other, this means specifying (listing them, ranking them) the class of admissible functional targets. In the absence of this, data cannot induce functions at all. As Gs or generative procedures just are functions, there is a qualitative irreducible ontological difference between the PLD and the Gs that the LAD acquires. There is no way to get from any data to any function (induce any G from any finite PLD) without specifying in some way the range of potential candidate Gs. Or, to say this another way, if the target is Gs and the input is a finitely specified list of data, there is no way of uniquely specifying the target in terms of the properties of the data list alone.

All of which leads to consideration of a third gap: the evidence required for choosing among plausible competing Gs to which native speakers converge is underdetermined by the PLD available for deciding among competing Gs. So, not only do we need some way of getting native speakers from data to functions that generate that data, but even given a list of plausible options, there exists little evidence in the PLD itself for choosing among these plausible functions/Gs. This is the problem that Chomsky (and moi) has tended to zero in on in discussions of PoS. So for example, a perfectly simple and plausible transformation would manipulate inputs in virtue of their string properties, another in virtue of their hierarchical properties. We have evidence that native speakers do the latter, not the former. Evidence for this conclusion exists in the wider linguistic data (surveyed by the linguist and including complex cases and unacceptable data) but not the Primary linguistic data the child as access to (which is typically quite simple and well formed). Thus, the fact that LADs induce along structure dependent lines is an induction to a particular G from a given list of possible Gs with no basis in the data justifying the induction. So not only do humans project and not only do they do so uniformly, they do so uniformly in the absence of any available evidence that could guide this uniform projection. There are endlessly many qualitatively different kinds of inductions that could get you from PLD to a G and native speakers project in the same way despite no evidence in the PLD constraining this projection. The gap between plausible Gs and the PLD is strongly underdetermined.

It is worth observing that this gap does not require that the set of grammatical objects be infinite (though making the argument is a whole lot easier if something like this is right). The point is that native speakers make systematic conclusions about novel data. That they conclude anything at all implies something like a rule system or generative procedure. That native speakers largely do so in the same way suggests that they are not (always) just guessing.[1] The systematic projection from the data provided in the input to novel examples implies that LADs generalize in the same way from LAD input. Thus, native speakers project the same Gs from finite sample examples of those Gs. And that is the problem: how do native speakers bridge the gap between samples of Gs and the Gs of which they are samples in the same way despite the fact that there are infinitely many ways to project from a finite samples to functions that generate those samples. Answer: the projection is natively constrained in some way (plug in your favorite theory of UG here). The divide between sample data and Gs that generate them is further exacerbated by the fact that native speakers converge on effectively the same kinds of Gs despite having precious little evidence in the PLD for making a decision among plausible Gs.

So there are three gaps: a gap between utterances and sentences, a gap between sentences/phrases and the Gs that generate them and a gap between the Gs that are projected and the evidence to choose among these Gs in the PLD the child has access to while fixing these Gs. Each gap presents its own challenges, though it is the last two that are, IMO, the most serious. Were the problem restricted to getting from exemplars to idealized examples (i.e. utterances to sentences) then the problem would be solvable by conventional statistical massaging. But that is not the problem. It is at most one problem, and not the big one. So are there chasms that must be jumped, and gaps that must be filled? Yup. And does jumping them/filling them the way we do require something like native knowledge? Yup. And will this soon be widely accepted and understood? Don’t bet on it.

Let me end with the following observation. Randy Gallistel recently gave a talk at UMD observing that it is trivial to construct PoS arguments in virtually every domain of animal learning. The problem is not limited to language or to humans. It saturates the animal navigation literature, the animal foraging literature, and the animal decision literature. The fact, as he pointed out, is that in the real world animals learn a whole lot from very little while Eish theories of learning (e.g. associationism, connectionism, etc) assume that animals learn very little from a whole lot. If this is right, then the main problem with Eish theories (which as EKS note still (sadly) dominate the mental and neural sciences and which lie behind the widespread skepticism concerning PoS as a basic fact of mental life in animals) is that the way that it frames the problem of learning has things exactly backwards. And it is for this reason that Es have a hard time spotting the gaps. The failure of Eism, in other words, is not surprising. If you misconceive the problem your solutions will be misconceived as well.

[1] The ‘always’ is a nod to interesting work by Lidz and friends that suggests that sometimes this is exactly what LADs do.

19 comments:

UnknownFebruary 14, 2018 at 6:51 AM
If you want to have any generalization at all beyond the input, you need to have an innate bias (see Tom Mitchell's 1980 classic). There's no way around it, and any "E" who denies that is confused (I'm not sure if there's a particular E you have in mind here, though). The question has always been about the content of that innate bias.
ReplyDelete
Replies
UtpalFebruary 14, 2018 at 8:56 AM
That link doesn't work. You need to do it directly: http://www.cs.cmu.edu/~tom/pubs/NeedForBias_1980.pdf
ReplyDelete
Replies
UnknownFebruary 14, 2018 at 11:45 AM
So convincing is this reasoning that I think its denial actually amounts to a denial of Gs. Y'know, our competence isn't really unbounded; it's not really independent of performance interference; it's not really sharp in deciding what's grammatical and what isn't; it's not really universal; etc
ReplyDelete
Replies

Add comment