Wednesday, October 31, 2012

‘I’ before ‘E’: Lewis, Language and Languages

Last week, I located Chomsky’s I-language/E-language distinction in the context of Church’s distinction between intensions (procedures) and extensions (sets of ordered pairs). In passing, I mentioned three claims from “Language and Languages” that deserve more attention.
(1) “…a language does not uniquely determine the grammar that generates it.”
(2)  “I know of no promising way to make objective sense of the assertion that
         a grammar Γ is used by a population P, whereas another grammar Γ',
         which generates the same language as Γ, is not.”
(3)  “I think it makes sense to say that languages might be used by populations
        even if there were no internally represented grammars.”
Lewis, whom I greatly admire, begins his paper in a way that might seem uncontroversial.
What is a language? Something which assigns meanings to certain strings of types of sounds or of marks. It could therefore be a function, a set of ordered pairs of strings and meanings.
But let’s controvert. Why think that in acquiring a spoken language, one acquires something that assigns meanings to strings? Couldn’t one acquire—i.e., come to implement—a procedure that generates meaningful and pronounceable expressions without assigning anything to anything? (Can’t a machine generate coins that depict people and buildings without assigning buildings to people?) One can stipulate that I-languages assign meanings to strings. But then at least some “meaning assigners” are procedures rather than functions in extension.
Moreover, even if all languages are meaning assigners in some nontrivial sense, why think that Human Languages—languages that human children naturally acquire—could be sets of ordered pairs? There is a thin sense of ‘could’ in which the object beneath my fingers could be a badger (cleverly disguised by an evil demon) rather than a computer. Perhaps it follows that my computer could be a badger. But such thin modal claims don’t help much if you want to know what a computer is. So maybe Lewis' modal claim just indicates the hypothesis that Human Languages are sets of a certain sort. Note, however, that the alleged sets would be quirky.
Recalling an earlier post, (4) is at least roughly synonymous with (5), but not with (6).
           (4)   Was the guest who fed waffles fed the parking meter?
           (5)   The guest who fed waffles was fed the parking meter?
           (6)   The guest who was fed waffles fed the parking meter?
Let <S4, M4> be the string-meaning pair corresponding to (4), and likewise for (5-6). Then on Lewis’ view, the elements of English include <S4, M5> but not <S4, M6>. But why is English not a slightly different set that also includes <S4, M6>? One can stipulate that English is the set it is. But then the question is why humans acquire sets like English as opposed to more inclusive sets. And the answer will be that human children naturally acquire certain generative procedures that allow for homophony only in constrained ways. Similar remarks apply, famously, to ‘easy to please’ and ‘eager to please’. Moreover, if English is a set, does it include <S7, M8> or not?
                        (7)  The child seems sleeping.
                        (8)  The child seems to be sleeping.
Such examples suggest that sets of string-meaning pairs are at best derivative, and that the explanatory action lies with generative procedures; see Aspects of the Theory of Syntax. But one can hypothesize that English is a set that may be specified by a grammar Γ, a distinct grammar Γ', and various procedures that kids implement. So perhaps (1) just makes it explicit that Lewis used ‘language’ in a technical/extensional sense, and in (1), ‘the’ should be ‘any’.
Still, (2) and (3) remain puzzling. If there really are Lewis Languages, why is it so hard to make sense of the idea that speakers specify them in certain ways, and easier to make sense of speakers using Lewis Languages without specifying them procedurally? Did Lewis think that it is senseless to say that a certain machine employs procedure (9) as opposed to (10),
              (9)  F(x) = | x - 1 |
            (10)  F(x) = +√(x² - 2x + 1)
or that it is easier to make sense of the corresponding set being “used by a population” without being specified in any way? I doubt it. While talk of meaning can bring out residual behaviorism and/or verificationism (see Quine, or Kripke’s 1982 book on rule-following), I think it’s more important to highlight Lewis’ slide from talk of meaning to talk of truth and semantics.
What could a meaning of a sentence be? Something which, when combined with factual information about the world...yields a truth value. It could therefore be a function from worlds to truth-values—or more simply, a set of worlds.
But why not: sentence meanings could be mental representations of a certain sort? If you think Human Language sentences are true or false, relative to contexts, it’s surely relevant that mental representations are (unlike sets) good candidates for being true or false relative to contexts.
Lewis was, of course, exploring a version of the Davidson-Montague Conjecture (DMC) that each Human Language has a classical semantics. As a first pass, let’s say that a language has a classical semantics just in case its expressions are related to entities—e.g., numbers, things covered by good theoretical generalizations, functions and/or mereological sums defined in terms of such things—in a way that can be recursively specified in terms of truth, reference/denotation, or Tarski-style satisfaction conditions. DMC raises many questions about specific constructions. But it also raises the “meta-question” of how a natural language could ever have a semantics.
One possible answer is that Human Languages connect pronunciations with generable representations of suitable entities. But this I-language perspective raises the question of what work the entities do in theories of meaning. And one needn’t be a behaviorist to wonder if ordinary speakers generate the representations required by theories of truth. Lewis held that a Human Language has its semantics by virtue of being used in accord with conventions of truthfulness and trust, “sustained by an interest in communication;” where typically, these conventions are not represented by ordinary speakers. Given extensionally equivalent clusters of conventions, there may be no fact of the matter about which one governs the relevant linguistic behavior. So Lewis was led to an extensional conception of Human Languages. He could offer convention-based accounts of both languages and language (i.e., linguistic behavior). But one can play on the count/mass polysemy differently, and say that language is the use of an I-language. So instead of embracing (2) and (3) as consequences of Lewis’ metasemantics, one might view them as reductios of his extensional conventionalism; see Chomsky, Reflections on Language.
Lewis recognized the possibility of taking Human Languages to be what he called grammars. But he quickly converted this alternative proposal into a methodological question: “Why not begin by saying what it is for a grammar Γ to be used by a population P?” For which he had an answer: strings are paired with sets of worlds via conventions that do not plausibly determine a unique grammar; better to start by saying what it is for a Lewis Language to be used by a population. But why begin by saying what it is for anything to be used by anyone? Why not start by saying that Human Languages are procedures, partly described by linguists’ grammars, that generate pronounceable meaningful expressions? What might a sentence meaning be? Something which, when it interfaces with human conceptual systems, yields (modulo complications) a truth-evaluable thought. It could therefore be an instruction to build a thought.

Impossibility Arguments or Why I Love Poverty of Stimulus Thinking

Sometimes, to figure out what something does, it is useful to consider what it can’t do. This is what impossibility arguments are designed for, and generative grammarians love them (OK, many don’t appreciate their beauty, but they should!). The most famous one in linguistics is the Poverty of Stimulus argument (POS). The POS actually fulfills two functions: (i) it shows that the data kids use to acquire their natural language grammars radically underdetermine what they linguistically achieve, and (ii) it offers a conceptual tool for investigating the structure of UG that has remarkable carrying power. Let’s consider these two points in turn.

First, a detailed review of the argument. The POS has several ingredients, but it boils down to a very simple schema:
a.     Consider the features of some grammar rule R in some particular language L. 
b.     Subtract out those features of L that are data driven.
c.     The residue characterizes UG. 

The argument is easily illustrated, and has been many times. Nonetheless, let’s take a look at the standard example and pull it apart. The most discussed example involves Yes/No (Y/N) question formation in English. The facts are simple (btw: portys are Portuguese Water Dogs, and I have one, ‘Sampson’, who once (he’s now 13) was an excellent jumper). (1) illustrates an instance of the rule.

(1)   Portys can jump → Can portys jump
(2)   John is thinking portys can jump →
a.     Is John thinking portys can jump
b.     *Can John is thinking that portys can jump
(3)   Portys that can jump should swim daily
a.     Should portys that can jump swim daily
b.     *Can portys that jump should swim daily
(4)   Can portys that jump swim daily

These facts tell us the following: (1) shows that a Y/N question in English involves moving the auxiliary of the corresponding declarative answer to the front. (2) tells us that when there is more than one auxiliary (so there is no unique “the auxiliary”) one can move only one of them. What’s the right way of describing the “correct” one? Consider two possibilities: (i) the right auxiliary is the highest one, (ii) the right auxiliary is the one linearly closest to the beginning. Now the big questions: How does the kid settle on (i) and not on (ii)? Do the data in (1)-(5) help the kid choose between (i) and (ii), and if so, how? Well, simple sentences like (1) and more complex ones like (2) cannot help him/her much, for both (i) and (ii) cover the relevant data (viz. can and is are both the highest and the leftmost auxiliaries in (1) and (2)). To find data that distinguishes (i) from (ii) we need (3). It is not consistent with (ii) (viz. should is the highest and can is the leftmost auxiliary, yet only (3a) is acceptable), so it indicates that the “right” specification of the rule (the one English speakers know) is something like (i), i.e. move the highest auxiliary to the front. (4) is a bonus fact. Restricting Y/N question formation as in (i) explains why (4) is unambiguous: it means (5b) and cannot mean (5a).

(5)   a. Is it the case that portys that can jump swim daily
b. Is it the case that portys that jump can swim daily

To review: there are several kinds of facts, a mix of positive and negative data. The positive data divide into those sentences that are acceptable simpliciter (viz. (1), (2a), (3a)) and those acceptable only under a certain interpretation (viz. (4) with interpretation (5b)). The negative data divide into those that are simply unacceptable (viz. (2b), (3b)) and those that are unacceptable under a specific interpretation (viz. (4) with interpretation (5a)). Conclusion: (i) is how English speakers determine the right auxiliary to move.
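For concreteness, the two hypotheses can be sketched in code. This is a toy illustration only; the bracketed-tree encoding and the function names are my own invention, not any particular theory’s:

```python
AUX = {"can", "should", "is"}

def leftmost_aux(words):
    """Hypothesis (ii): the movable auxiliary is the one linearly
    closest to the beginning of the string."""
    return next(w for w in words if w in AUX)

def highest_aux(tree):
    """Hypothesis (i): the movable auxiliary is the one attached
    highest in the tree (found here by breadth-first search)."""
    queue = [tree]
    while queue:
        node = queue.pop(0)
        for child in node:
            if isinstance(child, list):
                queue.append(child)
            elif child in AUX:
                return child

# (3): "Portys that can jump should swim daily"
words = "portys that can jump should swim daily".split()
tree = [["portys", ["that", ["can", "jump"]]], ["should", ["swim", "daily"]]]

print(leftmost_aux(words))  # can    -> fronting it yields the ill-formed (3b)
print(highest_aux(tree))    # should -> fronting it yields the attested (3a)
```

On simple sentences like (1) and (2), the two functions pick out the same auxiliary; sentences like (3) are the first place where they diverge, which is exactly why such sentences are the crucial data.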

Before moving on, it is important to appreciate that there is nothing conceptually wrong with either a linear or a hierarchical proximity condition. Both are conceptually fine, neither is a priori simpler than the other (if anything, the linear condition might be the less complex, as linear order seems a simpler notion than hierarchical order), and both could yield perfectly interpretable (though, as we’ve seen, different) outputs. Nonetheless, it is absolutely clear given these simple English facts that (ii) is in fact wrong and (i) (or something like it) is in fact right.

With this settled we can now consider the real problem, the question whose answer allows us to fix the structure of UG: how do native speakers of English come to adopt (i) and reject (ii)? Here’s what we can immediately conclude from perusing the data in (1)-(5). They do not do so on the basis of sentences like (1) or (2a), as both hypotheses are consistent with this kind of data. Moreover, sentences like (2b) and (3b), being barely comprehensible, are never produced (and if they were, rarely, they would be treated as noise, given that they are gibberish), and sentences like (3a) are vanishingly rare in the primary linguistic data, at least as Childes indicates. Legate and Yang do the requisite grunt work and show that there are no (“no,” as in zero!) Y/N sentences of the required form (i.e. sentences like (3) and (4)) in the relevant data bases.[1] That means that humans are not driven to adopt (i) by the data available to them in the course of acquisition. There simply is no such data. Conclusion: if opting for (i) is not data driven, then the choice of (i) must be the result of some internal bias in the learner in favor of restrictions like (i) and against those like (ii). Put in generativist terms, the bias towards (i) and against (ii) is part of UG and so need not be learned at all. It is, rather, part of what humans bring to the task of learning language.

To be clear: this argument does not show that Y/N question formation is innate, nor does it imply that all languages will form Y/N questions by moving verbal auxiliaries to the front (they don’t). What this argument does show is that rules have a structure-sensitive, rather than a linearly sensitive, format. And this is what’s really interesting. The very simple data above concerning a very simple rule of English grammar, Y/N question formation, argue for a very general conclusion about the kinds of grammar rules natural languages possess. They are sensitive to hierarchical relations (e.g. “highest”) and oblivious to linear ones (e.g. “leftmost”). Why UG is built this way is a very interesting question (currently a topic of some interesting speculation, which I will sadly ignore). But built that way it is, and this conclusion has very wide-ranging ramifications, viz. that we should expect to find no rule of grammar in any natural language that is stated in linear rather than hierarchical terms. In other words, the reasoning we have engaged in requires that the format restrictions that UG puts on English Y/N questions must regulate grammar rules in general, with the result that all rules in all natural language grammars must embrace (i) and eschew (ii). This is a very strong claim, and one that, to the best of my knowledge, has admirably withstood decades of linguistic investigation on the grammars of many, many diverse languages.

And this, if you sit back and stare at it for a moment, is amazing. Careful consideration of the logic of very simple cases in one language allows one to establish how rules will operate in all languages. One of the marvels of the POS (and one which particularly appeals to my philological ineptitude) is that it allows for a very respectable kind of arm-chair discovery, one able to establish properties of UG more powerfully than any other method. The reason is that POS focuses attention on the properties of the grammatical coding system rather than the specifics of any given code. English may code different rules than Swahili or Japanese, but all three languages will code rules in the same way, necessarily cleaving to the format that UG requires. We know that this must be so given another simple fact: any kid can learn any language if dropped into the right speech community. UG describes this very general language learning capacity. POS allows us to subtract out the limits of what data can determine. What’s left over are innate features of UG that, by the design of the POS, cannot be data sensitive. Consequently, what we learn about UG by using the POS to study the grammar of any one language must necessarily carry over to the grammatical format of another. The POS, though deceptively simple, cuts very deep. The POS: simple, effective, amazing!

[1] They also consider another possible data source to distinguish (i) from (ii) (sentences like ‘where’s the other dolly that was in there’) and find that these appear in the relevant data sets at a rate of about .05%. This is indistinguishable from noise. Put another way: only a system looking for such data could be expected to find it, and such a system would already have been biased towards a Y/N rule incorporating (i). Thus, if this is not noise for the learner, it is because the learner is already looking for the “right” answer, the same conclusion we have reached without considering such cases.
            Incidentally, I highly recommend the Legate-Yang paper, as it is the most sophisticated elaboration of the POS argument I’ve seen. They try to quantify “how much” data is too little data. The punch line is about 1.2% for Y/N questions. Thus, for the case at hand, the Childes data is roughly 1/20th of what is needed.

Friday, October 26, 2012

The Rationalism of Generative Grammar

The theory of mind that Generative Grammar endorses is evidently rationalist. An earlier post gestured (don’t you love that word!) at these philosophical roots. Interestingly (though not surprisingly), the program’s scientific metaphysics is rationalist as well. Here’s what I mean.

I was recently reading a fascinating book by Nancy Cartwright, The Dappled World (highly recommended for philo of science aficionados), in which she contrasts the role of powers/natures/capacities versus regularities in scientific theory. The classical empiricist/Humean tradition rejected the former as occult residues of an earlier search for Aristotelian essences and insisted on founding all scientific knowledge on “the kinds of qualities that appear to us in experience” (79) (recall the dictum: nothing in the intellect that is not first in the senses!). Modern empiricists/Humeans endorse this antipathy to “powers” by treating the laws of nature as summaries of “what things do” (82). Cartwright contrasts this with the view that laws are about powers/natures/capacities, and so not about what things do but “what it is in their nature to do” (82). Here’s a quote that provides a good feel for what she has in mind:

What we have done in modern science, as I see it, is to break the connection between what the explanatory nature is – what it is in and of itself – and what it does. An atom in an excited state, when agitated, emits photons and produces light. It is, I say, in the nature of an excited atom to produce light. Here the explanatory feature – an atom’s being in an excited state – is a structural feature of the atom…For modern science what something really is – how it is defined and identified – and what it is in its nature to do are separate things.

In short, there is an important metaphysical distinction that divides Empiricists and Rationalists. For the former, the laws of nature are in effect summaries (perhaps statistical) of “actually exhibited behaviors”; for the latter, they describe abstract “configurations of properties” or “structures.” The latter underlie, but are distinct from, behaviors (“what appears on the surface”), these being “the result of the complex interaction of natures” (81).

Cartwright notes the close connection between the Rationalist conception of powers/natures/capacities and the analytic method of inquiry characteristic of the physical sciences, often called “Galilean idealization.” She also provides several interesting reasons for insisting on the distinction between what something is versus what it does. Here are two.

First, given that visible behavior is an interaction effect of complex natures it is often impossible to actually see the contribution of the power one is interested in, even in the very contrived circumstances of controlled experiments. She illustrates this using Coulomb’s law and the interfering effects of gravity. As she points out:

Coulomb’s law tells not what force charged particles experience but rather what it is in their nature, qua charged, to experience…What particles that are both massive and charged actually experience will depend on what tendency they have qua charged and what qua massive (82).

Thus, actual measurable forces are the result of the interaction of several powers, and it takes a great deal of idealization, experimentation, calculation and inference to (a) isolate the effects of just one and segregate it from everything else, viz. to find out how two charged bodies “ ‘would interact if their masses were zero.’ ”[1] And (b) to use the results from (a) to find out what the actual powers involved are:

The ultimate aim is to find out how the charged bodies interact not when their masses are zero, nor under any other specific set of circumstances, but how they interact qua charged. 
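A toy numerical sketch may make the point vivid. The two constants are the real ones, but the particle values below are invented for illustration; the point is only that the measured force mixes two contributions, so the force the bodies experience qua charged must be recovered by, in effect, setting the masses to zero:

```python
K = 8.99e9      # Coulomb constant, N*m^2/C^2
G = 6.674e-11   # Newtonian gravitational constant, N*m^2/kg^2

def net_force(q1, q2, m1, m2, r):
    """Measured force between two bodies that are both charged and
    massive: an interaction effect of two distinct capacities."""
    coulomb = K * q1 * q2 / r**2   # what they do qua charged (repulsive here)
    gravity = -G * m1 * m2 / r**2  # what they do qua massive (attractive)
    return coulomb + gravity

# Two microcoulomb charges, one kilogram each, 10 cm apart:
measured = net_force(1e-6, 1e-6, 1.0, 1.0, 0.1)
# The contribution qua charged, i.e. how they "would interact
# if their masses were zero":
qua_charged = net_force(1e-6, 1e-6, 0.0, 0.0, 0.1)
```

No experiment delivers `qua_charged` directly; every measurement delivers something like `measured`, and the Coulombic power is inferred by idealizing away the rest.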

Second, contrary to the accepted wisdom, more often than not in the real world the same cause is not followed by the same effect. In fact, generating stable relations between cause and effect requires very careful contrivance in manufactured, artificial experimental settings. Cartwright refers to these as nomological engines: set-ups that allow for invariant, regular connections between what powers/natures/capacities can do and what they actually do. Except in such settings, the Humean dictum that effects regularly follow causes is hardly apparent.

Outside the supervision of a laboratory or the closed casement of a factory-made module, what happens in one instance is rarely a guide to what will happen in others.  Situations that lend themselves to generalizations are special…(86).
Now, the punch line: Cartwright’s discussion should sound familiar to generative ears. Chomsky’s important distinction between competence and performance is a rationalist one. UG is a theory of human linguistic powers/natures/capacities, not a theory of linguistic behavior. UG is not a summary of behavioral regularities. Indeed, linguistic behavior is a very complex interaction effect, with competence being one of many (very poorly understood) factors behind it. The distinction between what a speaker knows (competence) and how a speaker puts this knowledge to use (performance) echoes Cartwright’s rationalist themes. Similarly, the rejection of the idea that linguistic competence is just a (possibly fancy statistical) summary of behavior should be recognized as the linguistic version of the general Rationalist endorsement of the distinction between powers/natures/capacities and their behavioral/phenomenal effects. Lastly, the Rationalist conception buttresses a reasonable skepticism about a currently common (and sadly fashionable) view that language acquisition is largely a statistical exercise in which minds track environmental regularities, a view more congenial to the empiricist conception that identifies what something is with what it does. This cannot be true: there are precious few such regularities in the wild, and what there is won’t (in fact cannot) alone reveal the powers, capacities and natures of the underlying object of linguistic interest, i.e. the fine structure of UG.

[1] Cartwright observes that though doing (a) is difficult, it is “just a stage; in itself this information is uninteresting” (83-4), a point not unlike one made in a previous post.

Wednesday, October 24, 2012

‘I’ before ‘E’: especially after ‘C’

Consider the function represented with (1) and the function represented with (2), 
                        (1)    F(x) = | x - 1 |
                        (2)    F(x) = +√(x² - 2x + 1)
letting ‘x’ range over the integers. Have we represented the same function twice? Or have we represented two functions that share an “input-output profile,” gestured at with (3)?
                        (3)   { ... , <-2, 3>, <-1, 2>, <0, 1>, <1, 0>, <2, 1>, <3, 2>... }
This question bears, directly and historically, on the I-language/E-language distinction that Chomsky introduced in Knowledge of Language (1986).

For any integer, the absolute value of its predecessor is the positive square root of the successor of the result of subtracting twice that number from its square. There’s no magic here: (x - 1)² = x² - 2x + 1. So the set of ordered pairs specified with (4) is the set specified with (5).
                        (4)  {<x, y> : y = | x - 1 | }
                        (5)  {<x, y> : y = +√(x² - 2x + 1)}
In this respect, (4) and (5) are like ‘Hesperus’/‘Phosphorus’, ‘George Orwell’/‘Eric Blair’, ‘the smallest prime number’/‘the second positive integer’, etc. The morning star is the evening star, regardless of what people do or don’t know. Likewise, the set of ordered pairs <x, y> such that y = | x - 1 | is the set of ordered pairs <x, y> such that y = +√(x² - 2x + 1). So getting back to the initial question, do we have one function or two? Frege and Church, who knew something about functions, held that it depends on what you mean by ‘function’. But they also held that in an important sense, (1) and (2) should be understood as representing different functions that have the same extension.

Frege contrasted Functions with their Courses of Values. But he also said that Functions are “unsaturated,” as reflected with the variable ‘x’ in (1) and (2), and that this precludes referring to Functions. If you can’t imagine why Frege said this, count yourself lucky. It leads to claims like ‘The successor function is not a Function’. Yuck. In 1941, Church made the point clearer in On the Calculi of Lambda Conversion: we can talk about functions as procedures that map inputs onto outputs (functions “in intension”), or as sets of input-output pairs (functions “in extension”). But when a set has infinitely many elements, specifying it—as opposed to just giving hints and using ellipses, as in (3)—requires procedural description. In this sense, the procedural notion is primary, as Frege had noted; and in this sense, (2) specifies a different function than (1).

Church also wanted to ask questions about computability. So he invented a notation for specifying procedures and their input-output profiles. Expressions of his lambda calculus can be construed intensionally so that (6) is true, or construed extensionally so that (7) is true.
                        (6) λx . | x - 1 | ≠ λx . +√(x² - 2x + 1)
                        (7) λx . | x - 1 | = λx . +√(x² - 2x + 1)
But as Church stressed, while extensional interpretation is adequate for some mathematical purposes—viz., when it doesn’t matter how outputs are paired with inputs—you need the intensional interpretation if you want to talk about algorithms (i.e., ways of computing outputs given inputs). In retrospect, this all seems pretty obvious. Eventually, I’ll discuss some ironies regarding how lambdas ended up being used in semantics. Today, the important point is that the ‘I’ in ‘I’-language connotes (among other things) ‘Intensional’ in Church’s procedural sense.
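As it happens, a modern programming language wears this distinction on its sleeve. In Python (my illustration, not Church’s notation), function objects compare intensionally: two procedures are never equal merely because they agree on every input, and extensional agreement over an infinite domain can only be spot-checked on a finite sample:

```python
import math

f = lambda x: abs(x - 1)                   # procedure (1)
g = lambda x: math.isqrt(x**2 - 2*x + 1)   # procedure (2): (x-1)^2 >= 0, so isqrt is safe

# Construed as procedures, they differ, as in (6):
print(f == g)   # False: distinct algorithms, distinct objects

# Construed extensionally, they agree, as in (7), though for an
# infinite domain this can only be checked on a finite sample:
print(all(f(x) == g(x) for x in range(-10_000, 10_000)))   # True
```

That `==` on functions defaults to identity, while agreement of input-output pairs must be verified pointwise, is the programming-language shadow of Church’s point: talk of algorithms requires the intensional construal.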

Chomsky also took I-languages to be internal (“some element of the mind”) and concerned with individuals as opposed to social artifacts. One might add that I-languages are biologically implemented and innately constrained. Alliteration is mnemonic. But Chomsky clearly viewed I-languages as generative procedures (see p. 23). So one crucial contrast is with extensional conceptions of language, according to which the “same language” can be determined by different procedures. Here it’s worth recalling Quine’s obsession with extensionality—and as Chomsky mentions, Lewis’ characterization of language as a social practice (“ruled by convention”) and languages as sets of pairs <s, W> where s is a string of sounds or marks, and W is a set of possible worlds.

My next post will focus on Lewis, who said, “A grammar uniquely determines the language it generates. But a language does not uniquely determine the grammar that generates it.” He added, “I know of no promising way to make objective sense of the assertion that a grammar Γ is used by a population P, whereas another grammar Γ', which generates the same language as Γ, is not.” Really? No way to even make sense of the idea that people use procedure (1) rather than procedure (2)? And then he said, “I think it makes sense to say that languages might be used by populations even if there were no internally represented grammars.” Whatever sense one makes of Lewis, there was room for a procedural alternative to extensional conceptions of languages.

But there are other conceptions: strings of a corpus; Quinean complexes of “dispositions to verbal behavior;” etc. Davidson said that a “radical” interpreter would ascribe languages to speakers; yet it was unclear what this implied for speaker psychology. So Chomsky introduced ‘E-language’ as a cover term for any language, in whatever sense, that is not an I-language. There’s no serious question about whether humans acquire E-languages. If we use ‘acquire’ so that (it comes out true that) kids acquire dispositions, social practices, sets, and corpora, then anyone who acquires English acquires many things that count as E-languages. And there’s no serious question about whether humans acquire I-languages, since there is no alternative account of how we can connect so many articulations with so many meanings as we do; see my earlier post on unambiguity. The interesting question, for purposes of scientific inquiry, is what explains what. Regarding the various “things” that count as languages, we want to know what they are, and which ones are good candidates for being explanatorily primary.

One can imagine discovering that each speaker of French has acquired the same I-language, which is kept in a glass case, guarded by L'Académie française. French children may have a kind of telepathy that lets them access this shared procedure, I-French; where such access is, like cell phone service, imperfect and subject to individual variation. In which case, I-French isn’t internal or individualistic in Chomsky’s sense. (It’s not analytic that I-languages have these features.) Less fancifully, Michael Dummett held that each speaker grasps her native language imperfectly and partially. One is free to posit procedures that connect articulations and meanings that are communally determined. But that raises the question of how kids in a community acquire the alleged public procedures. To be sure, we speak of acquiring citizenship by birth, and acquiring the age of majority. So we can say that each child in Lyon acquires—by participating in social practices (i.e., by growing up and talking)—a thing kept under glass in Paris. But we also speak of adolescents acquiring secondary sexual characteristics, caterpillars acquiring wings, etc. And one might suspect that when a child acquires a capacity to connect articulations with meanings, she does so by implementing her own I-language, where this procedure is relevantly (and deeply) like those her parents/peers use to connect articulations with meanings. More on this next week.