Thursday, October 18, 2012

‘I’ before ‘E’: unambiguity

In my next post, I’ll discuss the I-language/E-language distinction that Chomsky introduced in Knowledge of Language. The history of this distinction is illuminating, and it helps explain what the ‘I-’ means. But for today, let me stipulate that an I-language is a generative procedure that connects articulations of some kind (say, sounds or gestures) with meanings of some kind. Let ‘E-language’ be a covering term for anything else—a set of word strings, a social practice, or whatever—that might be called a language.

That’s already enough to make it clear that many alleged “debates” about whether kids learn the languages they acquire (and whether such languages are transformational) aren’t really debates. One “side” argues that humans naturally acquire I-languages that connect articulations with meanings in accord with logically contingent constraints that are not learned. The other “side” shows how a certain kind of learner could acquire an E-language that is like a human I-language in respects other than the ones highlighted by the evidence.

To take a much discussed kind of example, the word-strings indicated with (1-3)
(1)  The guest was fed waffles?
(2)  The guest fed the parking meter?
(3)  The guest who was fed waffles fed the parking meter?
can be used—with rising intonation—to ask yes/no questions, with the corresponding declaratives indicating affirmative answers. With regard to (1), the question can also be asked with (4). But (5) is not another way of asking question (3).
(4)  Was the guest fed waffles?
(5)  Was the guest who fed waffles fed the parking meter?
On the contrary, (5) is understood as the bizarre question indicated with (6).
(6)  The guest who fed waffles was fed the parking meter?
Put another way, (5) is unambiguous: it has the meaning indicated with (6), and it fails to have the meaning indicated with (3). This “negative” fact is of interest. One can easily imagine a generative procedure that connects the pronunciation of (5) with both meanings, or just the meaning of (3). But (5) can only be understood as the bizarre question, even though (3) is the more likely question, given what we know about guests and waffles. Similarly, (7)
(7)  Was the hiker who lost kept walking in circles?
is understood as indicated with (7a) and not (7b).
(7a)  The hiker who lost was kept walking in circles?
(7b)  The hiker who was lost kept walking in circles?
So in acquiring English, one acquires an I-language that connects articulations with meanings in a way that makes (5) and (7) unambiguous.
Now perhaps kids somehow learn that their parents and peers use I-languages that connect articulations with meanings in this constrained way. I doubt it, for reasons that have been reviewed often. (I’ve done my time on such reviews.) But in principle, I can imagine replies of the following form: show how kids could start with a more permissive generative procedure—or a strategy for acquiring I-languages that would support acquisition of I-languages in which (5) is ambiguous—and then use available experience to figure out that the “local” I-languages are more constrained. I have not, however, encountered such replies. What I have encountered (see the reviews just mentioned) are descriptions of machines that can learn to classify strings like (8) as defective, while classifying strings like (9-11) as undefective.
(8)  Was the guest who hungry was tired?
(9)  The guest was hungry?
(10)  Was the guest hungry?
(11)  Was the guest who was hungry tired?
Such machines can, in effect, learn to put an asterisk on (8) while leaving (9-11) unmarked.

But that’s beside the point. The phenomenon illustrated with (1-7) is not that kids acquire hard to learn procedures for putting asterisks on strings. The point is that kids acquire I-languages (procedures that connect articulations with meanings) that are constrained in certain ways. Of course, any biologically implemented procedure will be constrained in ways that are unlearned. But the interesting nativist claim—not rebutted by inventing learnable procedures for classifying strings as defective—is that particular constraints (e.g., those characterized in terms of constraints on displacement) are unlearned.
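
The contrast between the two kinds of procedure can be put in a toy Python sketch. Everything in it is hypothetical and purely illustrative: the names `classify`, `construals`, and `is_unambiguous`, and the tiny hand-written tables, are assumptions of the sketch, not a model of any actual learner or grammar. A string classifier returns only a verdict on a string, while a pairing procedure returns the meanings a pronunciation supports, and that is where the unambiguity of (5) shows up.

```python
# Toy illustration only: hand-written tables stand in for two kinds of procedure.

# A string classifier maps a word string to a verdict (asterisk or no asterisk).
DEFECTIVE = {"was the guest who hungry was tired"}  # example (8)


def classify(string):
    """Return True if the string is left unmarked, False if it gets an asterisk."""
    return string not in DEFECTIVE


# A pairing procedure maps a pronunciation to the set of meanings it supports.
# Meanings are labelled here by the declarative-style questions from the post.
PAIRINGS = {
    "was the guest fed waffles": {"the guest was fed waffles"},
    "was the guest who fed waffles fed the parking meter": {
        # Only the bizarre reading (6) is listed; the reading in (3) is absent.
        "the guest who fed waffles was fed the parking meter",
    },
}


def construals(pronunciation):
    """Return the set of meanings paired with a pronunciation (empty if none)."""
    return PAIRINGS.get(pronunciation, set())


def is_unambiguous(pronunciation):
    """True if the pronunciation is paired with exactly one meaning."""
    return len(construals(pronunciation)) == 1
```

On this sketch, `classify` has nothing to say about how many meanings (5) has. The negative fact lives entirely in what `construals` leaves out.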

Linguists, or their informants, might mark the oddness of (12) as shown below.
(12)  *The guest who fed waffles was fed the parking meter.
But like (7a), (12) can be understood as an English sentence that expresses a crazy thought. Another day, I’ll talk about contrasts with (13) and (14).
(13)  *Colorless green ideas sleep furiously.
(14)  *I might been have there.
There are complications, and not only because acceptability differs from grammaticality. But whatever we say about asterisks, a child who acquires English acquires an I-language that connects the pronunciation of (12) with the corresponding meaning. Such a child also ends up knowing that (12) is a bizarre thing to say. But it’s bizarre because of what (12) means. Likewise for (5). And (5) wouldn’t be bizarre if it could have the meaning of (3).
(5)  Was the guest who fed waffles fed the parking meter?
(3)  The guest who was fed waffles fed the parking meter?
That raises the question of why kids don’t acquire I-languages that are more semantically permissive. Building machines that can learn to put an asterisk on (5) doesn’t address this question, much less suggest that kids learn that strings like (5) are unambiguous. One can try to build a machine that classifies (5) as a “generable but deviant” string and (14) as “ungenerable.” But the question remains: why does (5) have one meaning rather than two? Similar remarks apply to (15), which has the meaning of (16) and not (17).
(15)  Can pigs that fly talk?
(16)  Pigs that fly can talk?
(17)  Pigs that can fly talk?
If we want to understand the human capacity to acquire I-languages—procedures that connect articulations with meanings in certain ways—then it’s hard to see the point of inventing machines that learn to classify strings as generable or not. There are, of course, other goals. But to have a debate about I-languages, both sides have to talk about them. 


  1. On your last argument, that you don't see the point of studying weak learning algorithms -- surely there are several good reasons, among which is that weak learning is more mathematically well defined, that it is easier, that any strong learner will be a fortiori a weak learner and so on.

    We may want a strong learner, but given that this is a hard problem to say the least, it seems appropriate to start by solving some well-defined simpler problems; by breaking the large problem into several smaller ones that one can solve one by one. This seems entirely in line with standard scientific methodology; thoroughly Galilean in fact.

    1. Hi Alex,
      Thanks. That claim in my last paragraph was (and was intended to be) conditional and qualified: *if* the goal is to study the *human* capacity to acquire *I-languages* that allow for ambiguity only in the constrained ways I gestured at, *then* I don't see the point of inventing machines that learn to classify strings as generable or not. As I said, there are other goals. But I don't see any reason for thinking that human kids acquire a capacity to classify strings as generable or not, independent of acquiring an I-language that generates meaningful expressions in a certain way.

      The fact that other projects are more mathematically well-defined may be a reason to pursue those other projects, at least if they lead to interesting results. But if the goal is to understand how kids come to connect pronunciations with meanings in the ways that kids do--and not in the ways that kids don't--then I don't see how it matters if other goals are mathematically better defined. (Though I'm not sure how well-defined the alleged set of *strings* of [my idiolect of] English is. Is 'The child seems sleeping' a member of that set?) And if the question is how kids learn/acquire I-languages, as opposed to how a system might learn/acquire a capacity to classify strings, then I don't really understand the claim that "any strong learner will be a fortiori" a weak learner. What is it to be a weak learner of an I-language?

    2. Just clarifying my terminology to make sure we are on the same wavelength: taking E-language to be the infinite set of sound-meaning pairs, and I-language to be some finite internal procedure that generates them, then a weak learner is just one that converges to an I-language that generates the right set of sound/meaning pairs, or just the right set of sounds (there are two models, depending on what you think the right model of the input to the learner is), and a strong learner is one that converges to one that is isomorphic to the correct I-language (or one of the correct I-languages, since we don't know that there is a unique one for each E-language).

      I completely agree about the weaknesses of looking at these things as sets, rather than something fuzzier and more probabilistic, but the same definitions work for probability distributions over sets, if you prefer that way of modeling things.

    3. I think we're on the same page terminologically. And if being a weak learner of an I-language L is a matter of acquiring a procedure that generates (all and only) the same pronunciation-meaning pairs as the procedure L, that's fine. (Of course, to nail this down, we'd have to be precise about what we meant by 'pronunciation' and 'meaning'.) Generating the same strings as L would be weaker still, at least on the usual construal of 'string'. And either way, I think it's a funny sense of "learning L"--namely learning something that is like L in some respect. But if the goal is to see how a child could acquire a procedure that is in some sense equivalent to the procedure(s) used locally, that's fine with me. The empirical question is then whether string-equivalence (as opposed to other ways in which I-languages can be similar) is an explanatorily interesting dimension of similarity across I-languages. If it is, then great, that's a defense of E-languages as independently interesting. If not, then not.

      The point about 'seems sleeping' wasn't to suggest that grammaticality is a graded and/or probabilistic notion--though I agree with you that it (probably) is. The point was that competent speakers know that this somewhat defective string has the meaning of 'seems to be sleeping' as opposed to 'seems sleepy'. So it seems that speakers of English have an I-language that pairs the pronunciation of 'seems sleeping' with a specific meaning, even though the string is recognized as defective.

      I agree that we don't know in advance that kids in a community converge to the same I-language. Still, I think we do have reason to believe that kids respect the same basic constraints on homophonies. *Given* two grammars that generate all *and only* the same pronunciation-meaning pairs, I agree that we would need to find independent evidence that speakers employ a common procedure corresponding to one of the grammars. But while one can find examples of extensionally equivalent specifications of lexical meanings, it's harder to come up with extensionally equivalent *grammars*, given the ubiquity of constrained homophony.

    4. Yes, I see your point: the natural way is to say we want the learner to converge to something that is isomorphic to the correct I-language (let's assume there is just one). But the problem with that approach is that we do not know what the correct I-language is, or even what the correct class of I-languages is. We have some theories about what they are, but given the fact that we cannot observe them, at least at our current technological state, we need to use some proxy, and extensional equivalence seems a reasonable one since it relies on observable properties, namely E-language. I completely buy the classical arguments that the object of study is I-language (or rather UG) and not E-language, but the fact of the matter is that nearly all the evidence we have is via E-language, and that the learner only has access (noisy, incomplete and so on) to E-language.

      I am quite interested in these points, as I have been working on strong learning algorithms for the last few years, though with only limited success so far. The results I am trying to get at the moment are where we define a class of grammars (of CFGs or Minimalist Grammars, say) and try to learn in the strong sense -- that is to say, to learn from strings a grammar that defines sets of structural descriptions isomorphic to those of the target grammar. That seems a much more directly interesting result for linguistics.

      On your final point, I think this crucially depends on whether you are using a grammar formalism where the derived trees (structural descriptions) and the derivation trees are the same or at least in bijection with each other. If they are, as is the case with e.g. CFGs, then your point is probably correct, but if they are not, as with say regular tree grammars or tree substitution grammars, then it is quite possible we might have different grammars. We might also have slightly different feature systems between individuals that have extensionally equivalent grammars, if the feature systems are learned and not part of UG.

    5. Thanks for the continued interest.

      We probably have different views about what has been learned, over the last 50 years or so, about the (class of) I-languages that human kids naturally acquire. Regarding examples like (1-11) in the post, I don't see how to explain the unambiguities if not via constraints on transformations. So I think kids acquire I-languages that permit transformations in constrained ways. And I think we've learned a fair bit about the constraints, thereby characterizing (very partially and imperfectly) a space of "humanly possible I-languages." Maybe all that work is wrong, or at least not justified. Maybe. But I don't know how to even characterize (alleged) E-languages except by reference to I-languages. So I think the relative priority of 'E' to 'I' is the converse of what you're suggesting. But suppose I'm wrong to think this.

      I'm not sure what you mean by extensional equivalence in this context. And I think this matters. Are we now talking (a la David Lewis) about procedures that generate all and only the same string-meaning pairs, or about procedures that just generate all and only the same strings? If my I-language is a procedure that generates pronunciation-meaning pairs, then a procedure P cannot be *extensionally* equivalent to my I-language if P generates only strings. Of course, we can characterize other notions of equivalence. But it's an open question whether the other notions are interesting if the goal is to characterize my I-language and others as humanly possible I-languages.

      I'm not sure why observability is especially relevant here. But I may be missing your point, since I don't know what it is for evidence to be "via E-language." Do you just mean that the usual data concern perceptible strings of words? That may be right, but I thought the evidence (for theorists) included facts like: this string cannot support this meaning. More importantly, I'm not sure what you mean when you say that learners only have access to E-language. Is the idea that the only inputs to the language acquisition process are word-strings, string-meaning pairs, ... ? And whatever 'E-language' means here, is it definitional that kids don't have "access" to their current I-language at each stage of the acquisition/maturation process?

    6. Yes, likewise -- thanks for taking the time to answer.

      Just to clarify my imprecisions here:
      there are two notions of extensional equivalence, one where we consider just the word-strings (E1-equivalence) and one where we consider the word-meaning pairs (E2-equivalence). If we think of the learner as receiving as input only strings, then the appropriate notion of equivalence is the former; if we consider a model where the learner is receiving word-meaning pairs then the appropriate model is the latter. Of course, the real situation is more complex, as the child receives only partial information about the meaning of the utterance that it can derive from the situational context; and of course is operating under conditions of perceptual noise and so on, but it seems appropriate to idealise here.

      The evidence for the theorist is a bit richer -- but by 'via E-language' I just mean exactly things like 'this string cannot support this meaning'. Suppose we have two grammars (models of I-language) that are extensionally equivalent in the sense that they generate the same set of string-meaning pairs. Then no information of this type (via E-language) will distinguish them (by assumption). So in order to distinguish these two theories, we need either some empirical data of a different type (e.g. psycho/neuro-linguistic data) or a non-empirical principle (e.g. the SMT, or some notion of simplicity).

      Access was the wrong word to use, apologies, I meant more precisely input.
      Certainly, children have access to their current I-language during the learning process, but that is internal, so I wouldn't consider that to be input or data: it's an internal hypothesis, if anything an *output* of the learning algorithm.

      One important point here, which I think is glossed over in standard discussions of the E-language/I-language distinction is that language acquisition is a convergence of children to a grammar which is E2-equivalent (i.e. defines the same set of sound/meaning pairs) to the one in the environment/ the language of the parents. We don't necessarily know that the grammar of the child is also I-equivalent (isomorphic to the I-language of the parents) as well.
      So even in a uniform speech community, we might have slightly different I-languages which are all E2-equivalent.
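
As a rough illustration of the two extensional notions, here is a minimal Python sketch. It rests on loud assumptions: each "grammar" is given extensionally as a small finite set of (string, meaning) pairs, and the names `strings`, `e1_equivalent`, `e2_equivalent`, `constrained`, and `permissive` are invented for the example. Real I-languages are procedures that generate unboundedly many pairs, so this only shows how the definitions relate.

```python
# Toy illustration: a "grammar" is represented only by the finite set of
# (string, meaning) pairs it generates. Nothing here models the procedure itself.

def strings(pairs):
    """Project a set of (string, meaning) pairs onto its strings."""
    return {s for s, _ in pairs}


def e1_equivalent(g1, g2):
    """E1-equivalence: the two grammars generate the same strings."""
    return strings(g1) == strings(g2)


def e2_equivalent(g1, g2):
    """E2-equivalence: the two grammars generate the same string-meaning pairs."""
    return g1 == g2


# Hypothetical grammars built around example (15) from the post.
constrained = {
    ("can pigs that fly talk", "pigs that fly can talk"),
}
permissive = {
    ("can pigs that fly talk", "pigs that fly can talk"),
    ("can pigs that fly talk", "pigs that can fly talk"),
}

print(e1_equivalent(constrained, permissive))  # same strings
print(e2_equivalent(constrained, permissive))  # different pairings
```

In this toy case the two sets are E1-equivalent but not E2-equivalent: they differ only in whether the string is ambiguous. Any two sets that are E2-equivalent are trivially E1-equivalent, while the converse fails.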

      The only substantial disagreement we have is about how well established 'transformation-based' models are. If we compare current models of syntax that are reasonably precise, say modern variants of HPSG, CCG, TAG, and Stabler's minimalist grammars, it is clear that the movement based models are not the only option. Indeed on the particular examples of the auxiliary system, Ivan Sag in "Sex, Lies, and the English Auxiliary System" makes a very convincing case that the non-movement analyses are better than the movement analysis.

      I personally think the distinction between movement-based and non-movement-based models is largely notational, so in my current work I target the MCFG hierarchy, a nontransformational class of models that are strongly equivalent to Stabler's MGs, which count as transformational, I guess.

      Actually, now I have written this, perhaps we don't disagree on this point at all -- if one accepts MGs as a 'reasonable' formalisation of minimalist syntax, and it's the only candidate at the moment, maybe there is an across the board consensus at a high level of abstraction. Indeed the proposal in your paper with Berwick and Chomsky seemed quite close to Stabler's MGs, at least in terms of the combinatorial operations it uses.

  2. OK, lots of convergence now, which is great.

    And let me say that I shouldn't have used 'transformational' in my previous comment...since I wasn't thinking explicitly about the Chomsky-hierarchy, and I wasn't assuming that expressions "move." (The copy theory of "movement" is fine by me.) And I agree that many debates about "movement" are terminological. I was just assuming that one way or another, the structure relevant to articulation will sometimes differ from the structure relevant to meaning, but that such "mismatches" (between PF and LF, in minimalist idiom) are evidently quite constrained. And I like Stabler's grammars, as you suspected. (I've only seen Sag's slides, and haven't had a chance to think through the predictions for constraints on quantifier scope.)

    I'm not sure that your E1-equivalence is really a kind of *extensional* equivalence. But that probably doesn't matter. For my money, the really interesting question is the one you rightly raise in terms of your E2 notion: to what degree do kids in a community converge on a single I-language that is also used by their parents (peers/teachers/grandparents/etc).

    Given creolization and historical cases of fast language change (and Rozz Thornton's work on kids diverging from parental I-languages but in ways that respect constraints respected by other "adult" I-languages), I have doubts about the usual idealizations according to which kids try to acquire a "target" grammar. It might be that kids just keep trying out humanly possible I-languages until they find one that seems to work well enough for communicative purposes, or until they get too old and inflexible to keep trying new ones. (My hunch is that studies of signed languages will yield more insights here than studies of spoken languages. But we'll see.) It would be great to know what proportion of humanly possible I-languages are such that: if you use them, your grandkids will end up acquiring them. Of course, the languages we see in stable crossgenerational use have this property. But that could be, at least in part, a historical fact rather than a manifestation of it being a biological norm for kids to converge to their parents' I-language.

    But now focusing on the (many) cases where kids do acquire an I-language that is at least roughly E2-equivalent to the local I-language(s), you rightly ask (in good Quine-Lewis fashion) why we should think that the acquirers converge to a single I-language, perhaps modulo small variations. In one sense, it will take the next several posts for me to say anything remotely satisfactory about this. (But I'll try...that's where I'm heading). In brief, I think one needs to find new sources of evidence beyond speakers' judgments about potential interpretations for strings. And in that sense, the classical data for linguistic theorizing--like all data--has its limits. But once there is agreement that we're trying to figure out which I-language(s) kids acquire when they acquire English--and that the (hard) task is to specify the procedures that kids use to connect articulations with meanings, and how these procedures are acquired--then I'm all in favor of using whatever tools we can use to answer your question about how much actual I-language variation there is, given a logically possible space of E2-equivalent I-languages. In my own toolkit, I tend to like a mix of minimalist reasoning--to try to identify the "basic operations" that the human language faculty seems to employ--and experimental methods that provide new sources of data that bear on parade cases of intensional equivalence (e.g., the many provably equivalent ways of specifying the extension of a quantificational word like 'most'). But I'm wide open to insights from any corners on how to tease apart hypothetical procedures that pair the same articulations with (all and only) the same meanings/construals.

    1. I also think that there are some difficulties with the notion of I-language equivalence -- if one's conception of I-language is as a bottom-up or top-down generative device, it is clear that this is not what is actually neurally realised: it is a very abstract view. So it is not clear what it means to say that person A and person B have the same I-language; Marr's levels don't quite seem to fit here. The computational level is E2-equivalence, but I-languages aren't algorithmic models as they are abstract competence models not process models.

      But that is beside the point -- can I re-ask my original question now that we have got enough common ground staked out. We have three notions of equivalence and clearly I-equivalence implies E2-equivalence implies E1-equivalence. Now I agree that E1 is too weak, and we can kick back and forth whether we should really be targeting E2 or I equivalence, but given the implications and the hardness of the problems, I have chosen to start with the simpler problem.
      What we want is strong learning of MGs, but starting with weak learning of regular grammars, and then moving to weak learning of CFGs, then MCFGs, then moving to strong learning -- that seems perfectly reasonable to me, and yet you and many others take the view that it is irrelevant.

      Re-reading your first response, you put it quite correctly: "But I don't see any reason for thinking that human kids acquire a capacity to classify strings as generable or not, independent of acquiring an I-language that generates meaningful expressions in a certain way." Just so, this ability is not independent of the I-language -- it is entirely *dependent* on acquiring the I-language, just a trivial consequence of I-language, and therefore modeling this ability might tell us something about the acquisition of I-language.

    2. Somebody might think that for each ability A that we have by virtue of having an underlying competence C (maybe together with a few other cognitive capacities), modeling A is a reasonable way of trying to gain insights about C and/or its acquisition. Some of my friends in philosophy seem to hold a view close to this. But I assume you don't hold any such general view. So do you think there is something special here about human I-languages and their relation to other languages in a certain hierarchy?

      I know you're not suggesting that in general, modeling A is a reasonable research strategy for understanding C if modeling A is easier than trying to describe C itself. But your "simpler" problem--which strikes me as plenty hard...I couldn't do what you do in terms of making progress on it--may just be a *different* problem, subject to different constraints, than the acquisition of human I-languages.

      Even if acquiring a human I-language is one special (overkilling) way of acquiring a string-classifier, I don't yet see any reasons for thinking that if you set out to acquire a string-classifier, you'll end up acquiring something that is interestingly like a human I-language. (And likewise for many alternatives to 'string-classifier'.) But I really mean the "I don't yet see any reasons for." It's not that I think that the strategy you describe can't yield insights about the acquisition of human I-languages. It might. And I don't think the strategy is irrelevant, if only because one way to be relevant is by revealing the difference between acquiring a string classifier and acquiring a human expression generator. The question, to my mind, is whether it is "perfectly reasonable" to suppose that an account of how a certain kind of learner can acquire one kind of procedure (that kids don't acquire) will yield insights into how kids do acquire procedures of another type.

      It might be reasonable *if* we had independent reason for thinking that human language acquisition was fundamentally a matter of learning from the environment, with innate constraints playing only a relatively minor or cognition-general role (e.g., ruling out Gruesome predicates that would make a mess of induction). But absent reason for thinking this, why think that weakly learning a regular grammar or a CFG or ... is (in theoretically interesting respects) like acquiring a human I-language?

    3. I am not sure what you mean by a string-classifier, but it sounds bad ...
      What we acquire are CFGs, MCFGs and so on, without any semantics, as the learner has no semantics as input.
      There are other models that do use semantics, if you think that is more relevant.
      So are CFGs or MCFGs interestingly like human I-languages?
      The derivation trees of Stabler's MGs are exactly like the derivation trees of MCFGs, so if MGs are interestingly like I-languages then so are MCFGs.

      The term 'weak' just refers to the criterion of convergence: we learn a grammar that generates the right set of strings -- so we require it to be correct qua 'string-classifier' in your terms.

      On your final paragraph, if FLA isn't fundamentally a matter of learning from the environment, what is it, now that UG is minimal? Now that parameter setting has been abandoned, there don't seem to be many alternatives on the table, other than some vague comments about third factors.
      Chomsky at UCL this year seemed to say language acquisition did proceed primarily through learning, but perhaps there was some miscommunication.

  3. Paul -- glad to see your postings here, very interesting stuff.

    I wholeheartedly agree that the "focus on asterisks" you bemoan has been detrimental to the field, probably going back to the wrong-headed simplification that what we're doing is building a machine that can separate the "good" from the "bad" sentences. What we're doing, of course, is building a machine that links sound and meaning in an empirically correct way, *including* all kinds of crazy meanings (and crazy sounds, say illicit ellipsis). Seems to me that this whole misunderstanding, and the overstated importance of "acceptability," goes back to the early days, where the generative procedure was taken to generate the good sentences but not the bad ones. This was clearly inspired by formal-language theory, but commits the fallacy of illicitly equating acceptability and grammaticality. I think current "crash-proof syntax" models fall prey to the same fallacy.

    (I didn't have time to read the discussion above, so apologies in case I'm repeating things that are mentioned there.)

  4. I was thinking about this some more, and especially about the issue of assigning meanings to ill-formed sentences like "John seems sleeping".
    I think there are two separate issues here: one is whether you consider a joint model of sound/meaning pairs, versus a model just of the sequences of sounds. The second is whether this model is 'categorical' defining a sharp boundary or boundaries between ill-formed and well-formed, versus a 'graded' model, such as a probabilistic one.
    Clearly these are independent -- one can have a categorical joint model or a graded joint model or a categorical model of just strings (as in formal language theory) or a probabilistic model of strings.
    But it seems like they are getting conflated here.

    1. There are indeed those two issues. But the point I was stressing is closer (though not identical) to the first. It's that the string is, while not fully legit, still interestingly unambiguous: it means that John seems to be sleeping, and not that he seems sleepy (and it's not gibberish). That tells against a picture--quite common in the circles I travel in--according to which syntactic competence is a matter of determining which strings are well-formed formulae (wffs), and semantic competence is a matter of determining which meanings wffs have. I'm fully on board for the idea that grammaticality (as well as acceptability) is a graded notion. I'm skeptical of the idea that humans have a capacity to determine which word strings are wffs--as opposed to a capacity to connect articulations with meanings, in accord with constraints. Though we may be able to judge that certain articulations are "word salad" in the sense of having zero (compositional) meanings.