
Saturday, October 25, 2014

The two PoSs again

In a previous post (here) I discussed two possible PoS arguments. I am going to write about this again, mainly to clarify my own thinking. Maybe others will find it useful. Here goes. Oh yes, as this post got away from me lengthwise, I have decided to break it into two parts. Here’s the first.

The first PoS argument (PoS1) aims to explain why some Gs are never attested and the other (PoS2) aims to examine how Gs are acquired despite the degraded and noisy data that the LAD exploits in getting to its G. PoS1 is based on what we might call the “Non-Existing Data Problem” (NEDP), PoS2 on the “Crappy Data Problem” (CDP). What I now believe, and did not believe before (or at least not articulately), is that these are two different problems, each raising its own PoS concerns. In other words, I have come to believe (or at least think I have) that I was wrong, or had been thinking too crudely before (this is a slow fat ball down the middle of the plate for the unkind; take a hard whack!). On the remote possibility that my mistakes were not entirely idiosyncratic, I’d like to ruminate on this theme a little, and in service of this let me wax autobiographical for a moment.

Long long ago in a galaxy far far away, I co-wrote (with David Lightfoot) a piece outlining the logic of the PoS argument (here, see Introduction). In that piece we described the PoS problem as resting on three salient facts (9):

(1)  The speech the child hears is not “completely grammatical” but is filled with various kinds of debris, including slips of the tongue, pauses, incomplete thoughts etc.
(2)  The inference is from a finite number of G products (uttered expressions) to the G operations that generated these products. In other words, the problem is an induction problem where Gs (sets of rules) are projected from a finite number of examples that are the products of these rules.
(3)  The LAD attains knowledge of structures in its language for which there is no evidence in the PLD.

We summarized the PoS problem as follows:

… we see a rich system of knowledge emerging despite a poverty of the linguistic stimulus and despite being underdetermined by the data available to the child. (10)

We further went on to argue that, of these three data under-determination problems, the third is the most important, for it logically highlights the need for innate structure in the LAD. Or, more correctly, if there are consistent generalizations native speakers make that are only empirically manifested in complex structures unavailable in the PLD, then these generalizations must reflect the structure of the LAD rather than that of the PLD. In other words, cases where the NEDP applies can be used as direct probes into the structure of the LAD and, as there are many cases where the PLD is mute concerning the properties of complex constructions (again, think ECP effects, CED effects, Island effects, Binding effects, etc.), these provide excellent (indeed optimal) windows into the structure of FL (i.e. that component of the LAD concerned with acquiring Gs).

I still take this argument form to be impeccable.  However, the chapter went on to say (this is likely my co-author’s fault, of course! (yes, this is tongue in cheek!!!)) the following:

If such a priori knowledge must be attributed to the organism in order to circumvent [(3)], it will also provide a way to circumvent [(1)] and [(2)]. Linguists need not concern themselves with the real extent of deficiencies [(1)] and [(2)]; the degenerateness and finiteness of the data are not real problems for the child because of the fact that he is not totally dependent on his linguistic experience, and he knows certain things a priori; in many areas, exposure to a very limited range of data will enable a child to attain the correct grammar, which in turn will enable him to utter and understand a complex range of sentence types. (12-13).

And this is what I no longer believe. More specifically, I had thought that solving the PoS problem based solely on the NEDP would also suffice to solve the acquisition problem that the LAD faces due to the CDP.  I very much doubt that this is true.  Again, let me say why. As background, let’s consider again the idealizations that bring PoS1 into the clearest focus.

The standard PoS makes the following very idealized assumptions:

(4)  a.   The LAD is an ideal speaker-hearer.  
b.     The PLD is perfect: drawn from a single G and presented “all at once.”
c.     The PLD is “simple”: simple clauses, more or less.

What’s this mean? (4a) abstracts away from reception problems.  The LAD does not “mishear” the input, its attention never wavers, its parsing is always pristine, etc.  In other words, the LAD can extract whatever information the PLD contains.  (4b) assumes that the PLD on offer to the LAD is flawless. Recall that the LAD is exposed to linguistic utterances in which it must look for grammatical structure. The utterances may be better or worse vehicles for these structures. For example, utterances can be muddy (mispronunciations), imperfect (spoonerisms, slips of the tongue), incomplete (hmming and hawing and incomplete thoughts). Moreover, in the typical acquisition environment, the ambient PLD consists of utterances of linguistic expressions (not all of them sentences) generated by a myriad of Gs. In fact, as no two Gs are identical (and even one speaker typically has several registers), it is very unlikely that any single G can cover all of the actual PLD.  (4b) abstracts away from this. It assumes that utterances have no performance blemishes and that all the PLD is the product of a single G.

These assumptions are heroic, but they are also very useful. Why? Because together with (4c) they serve to focus attention on PoS1, which, recall, is an excellent window (when available) into the native structure of FL. (4c) restricts the PLD to “simple” input. As noted (here), a good proxy for “simple” is un-embedded main clauses (plus a little bit, Degree 0+).[1] In effect, assumptions (4a,b) abstract away from the CDP, and (4c) focuses attention on the NEDP and what it implies for the structure of LADs.

As indicated, this is an idealization. Its virtue is that it allows one to focus cleanly on a simple problem with big payoffs if one’s interest is in the structure of FL.[2] The real acquisition situation, however, is known to be very different. In fact, it’s much more like (5):

(5)  a. Noisy Data
b. Non-homogeneous PLD

Thus, the actual PLD is problematic for the LAD in two important ways in addition to being deficient in NEDP terms. First, there is lots of noise in the input, as there is often a large distance between pristine sentences and muddy utterances. On the input side, then, the PLD is hardly uniform (different speakers, registers): it contains unclear speech, interjections, slips of the tongue, incomplete and wayward utterances, etc. On the intake side, the actual LAD (aka: baby) can be inattentive, mishear, have limited intake capacity (memory), etc.  Thus, in contrast to the idealized data assumed for PoS1, the actual PLD can be very much less than perfect.

Second, the PLD consists of expressions from different Gs. In the extreme, as no two people have the exact same G, every acquisition situation is “multi-lingual.” In effect, standard acquisition is more similar to cases of creolization (i.e. multiple “languages” being melded into one) than to the ideal assumed in PoS1 investigations.[3] Thus there is unlikely to be a single G that fits all the actual PLD. Moreover, the noisy data is presented incrementally, not all at once. Therefore, the increments are not only noisy but, across LADs as a whole, the actual PLD is quite variable. It is very likely that no two actual LADs get the same sequence of input PLD.

It is reasonable to believe that these two features can raise their own PoS problems. In fact, Dresher and Fodor/Sakas have shown that relaxing the all-at-once assumption makes parameter setting very challenging if the parameters are not independent (and there is every reason to believe that they are not). Dresher, for example, demonstrated that even a relatively simple stress LAD has serious problems incrementally setting its parameters. I can only imagine the problems that might accrue were the PLD not only presented incrementally, but also drawn from different stress Gs, 10% of which were misleading.
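
To see why incremental setting plus interacting parameters is so treacherous, here is a minimal toy sketch in Python. It is not Dresher’s or Fodor and Sakas’s actual model; the two binary parameters and the tiny “languages” are invented purely for illustration. The learner is greedy, in the spirit of triggering-style learners: when its current grammar fails on a datum, it flips one parameter and keeps the flip only if the flipped grammar parses that datum. Because the target’s surface forms are licensed only when both parameters are set correctly, no single flip ever helps:

```python
import random

# Toy parameter space: two binary parameters, four candidate "grammars".
# Each grammar licenses a tiny set of surface strings (all invented for
# illustration; this is not Dresher's or Fodor & Sakas's actual model).
# The target (1,1) shares no strings with the one-flip neighbors of the
# start state (0,0), so the parameters interact: no single datum can
# motivate changing just one of them.
LANG = {
    (0, 0): {"d"},
    (0, 1): {"e"},
    (1, 0): {"f"},
    (1, 1): {"a", "b", "c"},  # the target grammar
}

def incremental_learner(target=(1, 1), start=(0, 0), steps=1000, seed=0):
    """Greedy one-flip learner: on a parse failure, flip one random
    parameter and keep the flip only if the new grammar parses the
    current datum."""
    rng = random.Random(seed)
    state = start
    for _ in range(steps):
        datum = rng.choice(sorted(LANG[target]))  # incremental PLD
        if datum in LANG[state]:
            continue                   # current grammar covers it
        i = rng.randrange(2)           # try flipping one parameter
        candidate = tuple(1 - v if j == i else v
                          for j, v in enumerate(state))
        if datum in LANG[candidate]:
            state = candidate          # keep only a helpful flip
    return state

print(incremental_learner())  # -> (0, 0): stuck, despite 1000 data points
```

Every datum the learner hears comes from the target grammar, yet it never moves: with non-independent parameters there is no incremental, one-flip path from (0,0) to (1,1).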

And that’s the point I took away from the Gigerenzer & Brighton (G&B) paper: it is unlikely that the biases required to get over the PoS1 hurdle will suffice to get actual LADs over PoS2. What G&B suggest is that getting through the noise and the variance of the actual PLD favors a very selective use of the input data. Indeed, if you match the data too well you are likely not tracking a real G, given that the PLD is not homogeneous, noise free, or closely clustered around a single G. And this is due both to performance considerations (sore throats, blocked noses, “thinkos,” inarticulateness, inattention, etc.) and to non-homogeneity (many Gs producing the ambient PLD).  In the PoS2 context, things like the Bias-Variance Dilemma might loom large. In the PoS1 context they don’t, because our idealizations abstract away from the kinds of circumstances that can lead to them.[4]
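
For readers who like the statistical point made concrete, here is a small sketch of the Bias-Variance trade-off under CDP-like conditions. Everything in it is invented for illustration (it is not G&B’s simulation): the “target G” is a simple linear rule, performance noise is Gaussian, and roughly 10% of the data come from “other Gs.” A flexible model (a degree-9 polynomial) fits the training sample more snugly than a biased, simple one (a line), but generalizes worse:

```python
import numpy as np

# Bias-variance in miniature (all numbers invented; not G&B's actual
# simulations). The "target G" is a line, y = 2x, plus performance
# noise; ~10% of data points come from "other Gs" (large extra noise).
rng = np.random.default_rng(0)

def sample_pld(n):
    x = rng.uniform(-1, 1, n)
    y = 2 * x + rng.normal(0, 0.3, n)      # target G + performance noise
    other = rng.random(n) < 0.1            # ~10% from "other Gs"
    y[other] += rng.normal(0, 2.0, other.sum())
    return x, y

x_train, y_train = sample_pld(30)          # the learner's crappy PLD
x_test, y_test = sample_pld(1000)          # how well the result generalizes

for degree in (1, 9):                      # biased/simple vs. flexible
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.2f}, "
          f"test MSE {test_mse:.2f}")
```

On typical runs the flexible model posts the lower training error and the higher test error: matching crappy data too well means tracking the noise, not the G.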

So, I was wrong to run together PoS1 problems and PoS2 problems. The two kinds of investigations are related, I still believe, but when the PoS1 idealizations are relaxed new PoS problems arise. I will talk about some of this next time.





[1] In modern terms this would be something like the top two phases of a clause (C and v*).
[2] This kind of idealization functions similarly to what we do when we create vacuum chambers within which to drop balls to find out about gravity. In such cases we can physically abstract away from interfering causal factors (e.g. friction). Linguists are not so lucky. Idealization, when it works, serves the same function: to focus on some causal powers to the exclusion of others.
[3] In cases of creolization, if the input is from pidgins then the ambient PLD might not reflect underlying Gs at all, as pidgins may not be G based (though I’m not sure here). At any rate, the idea that actual PLD samples the products of a single G is incorrect. Thus every case of real-life acquisition is one in which the PLD springs from multiple different Gs.
[4] In fact, Dresher and Fodor & Sakas present ways of ignoring some of the data to enforce independence on the parameters, thus allowing them to set parameters incrementally.  Ignoring data and having a bias seem (impressionistically, I admit) related.