In an earlier post (here),
I reviewed Fodor’s and Chomsky’s argument concluding that anyone who believes
in induction must be a nativist. Why?
Because all extant inductive theories of belief fixation (BF) are selection theories and all selection
theories presuppose a given
hypothesis space that characterizes all the possible
fixable beliefs. Thus, anything that
“learns” (fixes beliefs) must have a representation of what is learned (a given
hypothesis space) which is used to evaluate the input/experience in fixing
whatever beliefs are fixed. Absent this,
it is impossible to define an inductive procedure.[1]
Thus, trivially (or almost tautologically (see note 1)), whatever one’s theory of induction,
be it Rationalist or Empiricist, everyone is a nativist. The question is not whether nativism but
what’s native. And here is where Rationalists and Empiricists actually differ.
Before going on, let me remind you that both Fodor and
Chomsky (and all the participants at Royaumont it seems to me) took this to be
a trivial, nay, almost a tautological consequence of what induction is. However, this does not mean that it is not
worth remembering and repeating. It is still the case that intelligent people
confuse Rationalism with Nativism and assume that Empiricists have no nativist
commitments. The thought is that Rationalists, in contrast with Empiricists, make fancy assumptions about minds and hence bear the burden of proof in any argument about mental structures.
However, once it is recognized that all psychological theory is
necessarily nativist, the burden-shifting maneuver loses much of its punch.
The question becomes not whether the
mind is pre-stocked with all sorts of stuff, but what kind of stuff it is
stuffed with and how this stuff is organized.
Amy Perfors (here)
puts this exactly right (135)[2]:
…because all models implicitly
define a hypothesis space, it does not make sense to compare models according
to whether they build hypothesis spaces in. More interesting questions are:
What is the size of the latent hypothesis space defined by the model? How
strong or inflexible is the prior?...
So given that everyone is a nativist, how do we decide between Rationalist (R) and Empiricist (E) approaches to the mind? First of all, note
that given that everyone is a trivial nativist the debate between Rs and Es
necessarily revolves around how
beliefs are fixed and what this implies for the mind’s native structure. Interestingly,
probing this question ends up focusing on what kind of experience is required
to fix a given belief.
Es have traditionally taken the position that beliefs are
fixed by positive exposures to extensions of the relevant concepts. So, for
example, one fixes the belief that ‘red’ means RED by exposure to red, and that
‘dog’ means DOG by exposure to dogs. Thus, there is no belief fixation without
exposure to tokens in the relevant extensions of a concept. It is in this sense
that Es see the environment as shaping
mental structure. Minds track environmental input and are structured by this
input. The main contribution that minds make to the structure of their contents
is by being receptive to the information that the environment makes available. On
an E view, the trick is to figure out how to extract the information in the signal. As should be obvious,
this sort of view champions the idea that minds are very good statistical
machines able to find valuable informational needles in potentially very large
input haystacks. Rs have no problem with this assumption, but they argue that
it is insufficient to account for our attested cognitive capacities.
More particularly, Rs argue that there is more to the fixation of belief than environmental input. Or, to put the same point another way, the beliefs that get fixed via exposure to input data far outrun the information available in that input. Thus, though the environment can trigger the emergence of beliefs, it does not shape them, for we have ideas/concepts that are not themselves tokened in the input. If this is correct, then, Rs reason, hypothesis spaces are highly structured and what you come to “know” is strongly affected by this given structure. Note that the disagreement between Rs and Es hinges on what it is possible to glean from available input.
So how to approach this disagreement in a semi-rational
manner? This is where the Logical
Problem of Acquisition (LPA) comes in.
What is the LPA? It’s an attempt
to specify the nature of the input data that an Acquisition Device (AD) has access
to and to then compare this to the properties of the attained competence.
Chomsky discusses the general form of this approach in chapter 1 of Reflections on Language (here).
In the study of language, the famous diagram in (1)
concisely describes the relevant issues:
(1) PLD_L -> FL -> G_L
PLD_L is the name we give to the linguistic data from L that a child (actually) uses in building its grammar. FL is, well, you know, and G_L is the resultant grammar that a native speaker attains. One can easily generalize this schema to other domains of inquiry by subbing other relevant domains for “L.” A generalized version of the schema is (2) (‘X’ being a variable ranging over cognitive domains of interest) and a version of it as applied to vision is (3). So, if one’s interest is in visual object recognition (as for example in Marr’s program), we can consider the schema in (3) as outlining the logic to be explored (PVD = Primary visual data, FV = Faculty of Vision, G_V = grammar (i.e. rules) of vision).[3]
(2) PXD -> FX -> G_X
(3) PVD -> FV -> G_V
This schematic rendition of the LPA focuses the R vs E
debate on the information available in PXD. An Eish conception is committed to the view that PXD is quite rich and that it provides a lot of information concerning G_X. To the degree that information about G_X can be garnered from PXD, we need not populate FX with principles to bridge the gap. Rish conceptions rest on the view that PXD is a rather poor source of information relevant to G_X. As a result, Rs assume that FX is generally
quite rich.
Note that both Rs and Es assume that FX has a native
structure. This, recall, is common to both views. The question at issue is how
much belief fixation (or more exactly the fixation of a particular belief) owes to the nature of the data and how much to
the structure of the hypothesis space. As a first approximation one can say
that Rs believe that given hypothesis spaces are pretty highly structured so
that the data required to “search” that space can be quite sparse. Conversely,
the richer the set of available alternatives the more one needs to rely on the
data to fix a given belief. Thus for Rs all the explanatory action lies in
specifying the narrow range of available alternatives, while for Es most of the
explanatory action lies in specifying the (most often nowadays, statistical)
procedures that determine how one moves across a rather expansive set of
possibilities.
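To make this trade-off concrete, here is a minimal toy sketch in Python (mine, not anyone’s actual model; the hypothesis sets, priors, and the “size principle” likelihood are invented purely for illustration). It shows how a small, sharply structured hypothesis space lets very sparse data fix a belief, while the same data leave a large, flat space much less settled:

```python
# Toy illustration (all hypotheses, priors, and data are invented for this
# sketch): Bayesian belief fixation over a pre-given hypothesis space.
# The space itself is presupposed -- that is the shared "nativist"
# commitment; Rs and Es differ over how structured it is.

def likelihood(h, data):
    # Each hypothesis h is the set of strings it licenses; assume each datum
    # is drawn uniformly from that set (a standard "size principle" toy).
    if any(d not in h for d in data):
        return 0.0
    return (1.0 / len(h)) ** len(data)

def posterior(hypotheses, data):
    """Normalize prior * likelihood over the fixed hypothesis space."""
    scores = {h: prior * likelihood(h, data) for h, prior in hypotheses.items()}
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}

# R-style space: two sharply distinguished alternatives (lots of prior structure).
H_structured = {
    frozenset({"a", "ab"}): 0.5,
    frozenset({"a", "ab", "abb", "abbb"}): 0.5,
}

# E-style space: many alternatives, flat prior (little prior structure).
H_flat = {
    frozenset({"a", "ab"} | {"x" * i for i in range(2, k)}): 1.0
    for k in range(3, 21)
}

data = ["a", "ab", "ab"]  # the same sparse input for both learners

best = max(posterior(H_structured, data).values())
print(f"structured space: best hypothesis gets {best:.2f} of the posterior")
best = max(posterior(H_flat, data).values())
print(f"flat space:       best hypothesis gets {best:.2f} of the posterior")
```

Nothing hangs on the particular numbers; the point is only that both learners presuppose a given space of alternatives, and that how much work the data must do depends on how that space is structured.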
The schemas above suggest ways of investigating this
disagreement. Let’s consider some.
E invites the view that, ceteris paribus, variations in PXD should lead to variations in G_X, as the latter closely tracks properties of the former (it is in this sense that Es think of PXD as shaping a person’s mental states). Thus, if some kinds of inputs are systematically absent from an individual’s PXD, we should expect that individual’s cognitive development and attained competence to differ from those of an individual with more “normal” inputs. Hume (our first systematic associationist psychologist) gives a useful version of this view:[4]
…wherever by any accident the
faculties which give rise to any impressions are obstructed in their
operations, as when one is born blind or deaf, not only the impressions are
lost, but also their corresponding ideas; so that there never appear in the
mind the least traces of either of them.
There’s been lots of research over the last 50 years exploring
Hume’s contention in the domain of language acquisition. Lila Gleitman and Barbara Landau (G&L) provide a good brief overview of some of the child language research investigating these matters.[5] They note that the evidence does not support this prediction (at least in the
domain of language). Rather it seems that “humans reconstruct linguistic form
…[despite] the blatantly inadequate information offered in their usable
environment (91).” In other words, it seems that the course of language
acquisition can proceed smoothly (in fact no differently than what happens in
the “normal” case) even when the input to the system is perceptually very
limited and degraded. G&L interpret this Rishly to mean that language acquisition is relatively independent of the nature and quality of the input, which makes sense if it is guided by a rich system of innate knowledge.
G&L illustrate the logic using two kinds of examples:
blind people can and do learn the meanings of words like ‘see’ and ‘look’
without being able to see or look, and people can acquire full native
competence (and can make very subtle “perceptual” distinctions in their
vocabulary) despite being blind and deaf. Indeed, it seems that even extreme
degradation of the sensory channels leaves the process of language acquisition
unaffected.
It is worth noting just how degraded the input can be when
compared to the “normal” case. G&L, reporting Carol Chomsky’s original research on learning via the Tadoma method, write (92):[6]
To perceive speech at all, the
deaf-blind must place their fingers strategically at the mouth and throat of
the speaker, picking up the dynamic movements of the mouth and jaw, the timing
and intensity of the vocal-cord vibration, and the release of air…From this
information, differing radically in kind and quality from the continuously
varying speech wave, the blind-deaf recover the same ornate system of
structured facts as do hearing learners…
In short, there is plenty of evidence that language
acquisition can (and does) take place in the face of extremely degraded input,
at least when compared with the PLD available in the standard case.[7]
The Poverty of Stimulus (PoS) argument also reflects the
logic of the schemas in (1)-(3). As the schema suggests, a PoS argument has two major struts: a description of the available PLD and a description of the grammatical operations of interest (i.e. the relevant rules). The next step compares what information can be gleaned about the operation from the data; the slack is then used to
probe the structure of FL. The standard PoS question is then: what must we
assume about FL so that, given the witnessed PLD, the LAD (language acquisition device) can derive the
relevant rules? As the schema indicates,
the inference is from instances of
rules (used outputs of a grammatical system) to the rules that generate the
observed sentences. Put another way, whatever else is going on, the LPA
requires that FL at least contain
some ways of generalizing beyond the PLD. This is not controversial. What is
controversial is how fancy these methods for generalizing beyond the data have
to be. For Es, the generalizing procedures are quite anodyne. For Rs they are often quite rich.
Well-designed PoS arguments focus on grammatical phenomena
for which there is no likely relevant
information available in the PLD. If Es are right (see Hume above), all
relevant grammatical operations and principles should find (robust?) expression
in the PLD. If Rs are right, we should find lots of cases where speakers develop
grammatical competence even in the absence of relevant PLD (e.g. all agree that “John expects Mary to hug himself” is out and that “John expects himself to hug Mary” is good, where ‘John’ is the antecedent of ‘himself’).
It goes without saying that, given this logic, debate between Es and Rs will revolve around how to specify the PLD in relevant cases (see here for a sophisticated discussion). So for example, all accept the idea that PLD consists of good examples of the relevant operation (e.g. all take “John hugged himself” to be a typical data point bearing on principle A (A)). What of negative data, data indicating that some example is unacceptable with the indicated interpretation (e.g. that “John expects Mary to hug himself” is out)? There is every reason to think that overt
correction of LAD “mistakes” barely occurs. So, in this sense the PLD does not contain negative data. However,
perhaps for the LAD absence of evidence is evidence of absence. In other words,
perhaps for the LAD failing to
witness an example like “John expects Mary to hug himself” leads to the
conclusion that the dependency between ‘John’ and ‘himself’ in these
configurations is illicit. This is entirely possible. So too with other *-cases.[8]
Note that this reasoning requires a fancier FL than one
that simply assumes that all decisions are made on the basis of positive
data. So the logic of LPA is respected
here: we compensate for the absence of certain information in the PLD (i.e.
direct negative evidence) by allowing FL to evaluate expectations of what
should be seen in the PLD were a given construction good.[9]
The question an R would ask an E is whether the capacity to compute such
expectations doesn’t itself require a pretty hefty native capacity. After all,
many things are absent from the data, but only some of these absences tell us
anything (e.g. I would bet that for most cases in the PLD the anaphor is within
5 words of the antecedent, nonetheless “John confidently for a man of his age
and temperament believes himself to be ready to run the marathon” seems fine).
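For concreteness, here is one highly simplified way (a made-up sketch, not a model proposed in any of the work cited here) to cash out “absence of evidence is evidence of absence” probabilistically. The learner compares how likely a corpus containing no instances of the relevant form is under a grammar that licenses the form versus one that forbids it; the per-utterance rate and corpus size below are invented:

```python
import math

# Toy sketch (all numbers invented) of indirect negative evidence: compare
# how probable the learner's "silence" about a form is under a grammar that
# licenses it (G_yes) versus one that rules it out (G_no).

def log_prob_of_never_hearing(p_form_per_utterance, n_utterances):
    """Log-probability of n utterances with zero occurrences of the form."""
    return n_utterances * math.log(1.0 - p_form_per_utterance)

n = 100_000   # utterances in the corpus (made-up number)
p = 1e-4      # per-utterance rate the form would have if it were licit (assumed)

ll_G_yes = log_prob_of_never_hearing(p, n)  # grammar that licenses the form
ll_G_no = 0.0                               # grammar that forbids it: silence is certain

# The evidence for G_no grows with exposure (~ n * p nats here), but only for
# a learner equipped to compute the expectation in the first place.
print(f"log-likelihood gap favoring G_no: {ll_G_no - ll_G_yes:.1f} nats")
```

Note that the comparison only gets off the ground because the learner already has both candidate grammars and can compute what each would lead it to expect, which is just the hefty native capacity the R is pointing to.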
One assumption I commonly make in considering PoS arguments
is that PLD effectively consists of simple acceptable sentences (e.g. “John
likes himself”). This is the so-called
Degree 0 hypothesis (D0H).[10] If the PLD is so restricted, then FL must be very rich indeed, for many robust linguistic phenomena are simply unattested in simple clauses (and recall, induction is impossible in the absence of any data to drive it); e.g. island effects, ECP effects, many binding effects, minimality effects, among others. The D0H may be too strong, but there are two (maybe one, as they are related) reasons for thinking that it is on the right track.
The first is Penthouse Principle (PP) effects. Ross noted long ago that there are many operations restricted to main clauses but virtually none that apply exclusively to embedded clauses. Subject-Aux Inversion and Tag Question formation are two
examples from English. If we assume that something like D0H is right(ish), we expect all idiosyncratic processes to be restricted to main clauses, where substantial evidence for them will be forthcoming. Embedded clauses, on the other hand, should be very regular. At the very least, we expect no operations to apply exclusively to embedded domains (the converse of the PP), since given D0H there can be no evidence to fix them.
The second reason relates to this. It’s a diachronic
argument David Lightfoot gave based on the history of English (here).
It is based on a very nice observation:
main clause properties can affect embedded clause properties but not vice
versa. Lightfoot illustrates this by considering the shift from OV to VO in
English. He notes that in the period in
which the change occurred, embedded clauses always
displayed OV order. Despite this, English changed from OV to VO. Lightfoot reasons as follows: were embedded clause information robustly available, there would have been very good evidence that, despite appearances to the contrary in unembedded clauses, English was OV not VO (i.e. the attested change to VO, which ended up migrating to embedded clauses, would never have occurred). Thus, the fact that English changed in this way (and that influences in the other direction are unattested) follows nicely if something like D0H holds (viz. an LAD does not use embedded clause information in the acquisition of its grammar). Lisa Pearl subsequently elaborated a sophisticated quantitative
version of this argument here
and here.
The upshot: D0H holds. Of course, if it does, then strong versions of PoS arguments for many linguistic phenomena readily spring to mind. No data, no induction. And where induction cannot operate, highly structured, natively given hypothesis spaces must be guiding the AD.
OK, this post has gotten out of control and is far too long.
Let me end by reiterating the take-home message. Rs and Es differ not on whether nativism but
on what is native. And, exploring the latter effectively revolves around
considerations of how much information the data contains (and the child can
use) in fixing its beliefs. This is where the action is. Research like what
G&L review is interesting in that it shows that achieved competence seems
quite insensitive to large variations in the relevant usable data. Classical PoS
arguments are interesting in that they provide cases where it is arguable that
there is no data at all in the input
relevant to fixing a given belief. If
this is so, then the mechanisms of belief fixation must lean very heavily on the highly structured (and hence restricted) nature of the hypothesis space that ADs natively bring to the belief fixation process. In R/E debates everyone
believes that input matters and everyone believes that minds have native
structure. The argument is about how
much each factor contributes to the process. And this is something that can
only be adjudicated empirically. As things stand now, IMO, the fertility of the
Rish position in the domain of language (most of cognition actually) has been
repeatedly demonstrated. Score one (indeed many) for Descartes and Kant.
[1]
In effect, induction serves to locate a member/members from a given set of
alternatives. No pre-specified alternatives, no induction. Thus Fodor’s point: for learning (i.e. belief fixation) to be possible, there must be a given set of concepts that mediate the process.
Fodor emphasizes that this view, though trivial, is
not purely tautological. There does exist a tautological
claim that some have confused with Fodor’s. This misreading interprets Fodor as
saying that any acquired concept must be acquirable (i.e. a principle of modal logic along the lines of: if I do have the concept, then I could have had the concept). Alex Clark, for example, so reads Fodor (here): “There is a tautological claims which is that
I have an innate intellectual endowment that allows me to acquire the concept
SMARTPHONE in some way, on the basis of reading, using them, talking to people
etc. Obviously any concept I have, I must have the innate ability to have it…”
Fodor
notes this possible interpretation of his views at Royaumont (p. 151-2), but
argues that this is not what he is
claiming. He says the following: “The
banal thesis is just that you have the innate potential of learning any concept
you can in fact learn; which reduces, in turn, to the non-insight that whatever
is learnable is learnable. …What I intended to argue is something very much
stronger; the intended argument depends on what learning is like, that is the
view that everybody has always accepted, that it is based on hypothesis
formation and confirmation. According to that view, it must be the case that
the concepts that figure in the hypothesis you come to accept are not only potentially accessible to you, but are actually exploited to mediate the learning…The
point about confirming a hypothesis like "X is miv iff it is red and
square" is that it is required that not only red and square be potentially
available to the organism, but that these notions be effectively used to
mediate between the organism's experiences and its consequent beliefs about the
extension of miv…”
In other words, if inductive logics require given
hypothesis spaces to get off the ground and if we attribute an inductive logic
to a learner then we must also be attributing to them the given hypothesis
space AND we must be assuming that beliefs get fixed in virtue of exploiting the properties of that space. So far as I can tell, this is what every inductivist is in fact
committed to.
[2]
Despite the terminological misstep of identifying Rationalism with Nativism on
p 127.
[3]
In Marr’s program, the grammar includes the rules and derivations that get us
from the grey scale sketch to the 2.5D sketch.
[4]
This is quoted in Gleitman and Landau, see note 5. The quote is from Hume’s Treatise, p. 49.
[6]
Carol Chomsky’s original papers on this topic are included as appendices in the book. They are
well worth reading. On the basis of the reported speech, the Tadoma learners
seem indistinguishable from “normal” native speakers.
[7]
G&L also note the excess of data problem towards the end of their paper.
This is something that Gleitman has explored in more recent work (discussed here and in
links cited there). Lila once noted that a picture is worth a thousand words,
and that is precisely the problem. In the early period of word learning the
child is flooded with logical possibilities when word learning is studied in
naturalistic settings. Here induction
becomes a serious challenge not because there is no information but because
there is too much and narrowing it down to the relevant stuff is very hard.
Lila and colleagues have argued that in such cases what the child does bears
relatively little resemblance to the careful statistical sampling that one
might expect if acquisition were via “learning.” This suggests that there must
be a certain sweet spot where data is available but not too available for
learning (induction) to be a viable form of acquisition. Where this is not
possible other acquisition procedures appear to be at play, e.g. guess and
guess again! Note that this amounts to saying that resource constraints are
key factors in making “learning” an option. In many cases, learning (i.e.
reviewing the alternatives systematically) is simply too costly, and other less
seemingly rational procedures kick in. Interestingly, from an R perspective, it
is precisely when the field of options is narrowed (when syntax kicks in) that
something akin to classical learning appears to become viable.
[8]
For reasons I have never quite understood, many (see here)
have assumed that GGers are hostile to the idea that LADs can use “negative”
data productively. This is simply false.
See Howard Lasnik (here)
for a good review. As Lasnik notes, the
possibility that negative data could be relevant goes back at least to
Chomsky’s LGB (if not earlier). What is
relevant, is not whether negative data might be useful but what kinds of minds
can productively use it. The absence of barking is useful when one is listening for dogs. Thus, the more constrained the space of
options under consideration the easier it is to use absence of evidence as
evidence of absence. If you have no idea what you are looking for, not finding
it is of little informational value.
[9]
For example, Chater and Vitanyi (C&V) (here)
order the available hypotheses according to “simplicity” measured in MDL terms,
not unlike what Chomsky proposed in Aspects.
Not surprisingly, given such an ordering indirect negative evidence can be
usefully exploited (something that would not surprise a GGer). What C&V do not consider is the possibility of cases where there is virtually no relevant positive or negative data in the PLD. This is what is
taken to be the strongest kind of PoS argument and is the central case
discussed in at least one of the references C&V cite (see here).
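For the flavor of how a simplicity ordering lets indirect negative evidence do work, here is a toy MDL-style comparison (all numbers invented; this illustrates the general MDL idea, not C&V’s actual proposal): each grammar is charged for stating itself plus for encoding the corpus under it, so a grammar that also licenses unattested forms pays a per-datum penalty.

```python
import math

# Toy MDL-style scoring (all numbers invented). A grammar is charged for its
# own description plus the cost of encoding the corpus under it; a grammar
# that also licenses unattested forms "wastes" probability on them and so
# pays more per attested datum.

def total_description_length(grammar_bits, n_licensed_forms, corpus):
    data_bits = len(corpus) * math.log2(n_licensed_forms)  # uniform coding of each datum
    return grammar_bits + data_bits

corpus = ["hug himself", "hug her"] * 50   # only attested forms (hypothetical corpus)

tight = total_description_length(grammar_bits=20, n_licensed_forms=2, corpus=corpus)
loose = total_description_length(grammar_bits=10, n_licensed_forms=4, corpus=corpus)

print(f"tight grammar: {tight:.0f} bits, loose grammar: {loose:.0f} bits")
# The tight grammar wins despite costing more to state, which is one way a
# simplicity ordering lets absence of evidence count as evidence of absence.
```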
[10]
Most who think that this is more or less on the right track actually take
“simple” to mean un-embedded binding domains (e.g. Lightfoot). This is sometimes called Degree 0+. Thus, ‘Bill’ is in the PLD in (i) but not in
(ii):
(i) John believes Bill to be intelligent
(ii) John believes (that) Bill is intelligent