Tuesday, September 9, 2014

Rationalism, Empiricism and Nativism -2

In an earlier post (here), I reviewed Fodor’s and Chomsky’s argument concluding that anyone who believes in induction must be a nativist. Why? Because all extant inductive theories of belief fixation (BF) are selection theories and all selection theories presuppose a given hypothesis space that characterizes all the possible fixable beliefs. Thus, anything that “learns” (fixes beliefs) must have a representation of what is learned (a given hypothesis space) which is used to evaluate the input/experience in fixing whatever beliefs are fixed. Absent this, it is impossible to define an inductive procedure.[1] Thus, trivially (or almost tautologically (see note 1)), whatever one’s theory of induction, be it Rationalist or Empiricist, everyone is a nativist. The question is not whether nativism but what’s native. And here is where Rationalists and Empiricists actually differ.

Before going on, let me remind you that both Fodor and Chomsky (and all the participants at Royaumont, it seems to me) took this to be a trivial, nay, almost a tautological consequence of what induction is. However, this does not mean that it is not worth remembering and repeating. It is still the case that intelligent people confuse Rationalism with Nativism and assume that Empiricists have no nativist commitments. On this (mistaken) view, Rationalists contrast with Empiricists in making fancy assumptions about minds and hence bear the burden of proof in any argument about mental structure. However, once it is recognized that all psychological theory is necessarily nativist, the burden-shifting maneuver loses much of its punch. The question becomes not whether the mind is pre-stocked with all sorts of stuff, but what kind of stuff it is stuffed with and how this stuff is organized. Amy Perfors (here) puts this exactly right (135)[2]:

…because all models implicitly define a hypothesis space, it does not make sense to compare models according to whether they build hypothesis spaces in. More interesting questions are: What is the size of the latent hypothesis space defined by the model? How strong or inflexible is the prior?...

So given that everyone is a nativist, how do we decide between Rationalist (R) and Empiricist (E) approaches to the mind? First of all, note that given that everyone is a trivial nativist, the debate between Rs and Es necessarily revolves around how beliefs are fixed and what this implies for the mind’s native structure. Interestingly, probing this question ends up focusing on what kind of experience is required to fix a given belief.

Es have traditionally taken the position that beliefs are fixed by positive exposures to extensions of the relevant concepts. So, for example, one fixes the belief that ‘red’ means RED by exposure to red, and that ‘dog’ means DOG by exposure to dogs. Thus, there is no belief fixation without exposure to tokens in the relevant extensions of a concept. It is in this sense that Es see the environment as shaping mental structure. Minds track environmental input and are structured by this input. The main contribution that minds make to the structure of their contents is by being receptive to the information that the environment makes available. On an E view, the trick is to figure out how to extract information in the signal. As should be obvious, this sort of view champions the idea that minds are very good statistical machines able to find valuable informational needles in potentially very large input haystacks. Rs have no problem with this assumption, but they argue that it is insufficient to account for our attested cognitive capacities.

More particularly, Rs argue that there is more to the fixation of belief than environmental input. Or, to make the same point another way: the beliefs that get fixed via exposure to input data far outrun the information available from that input. Thus, though the environment can trigger the emergence of beliefs, it does not shape them, for we have ideas/concepts that are not themselves tokened in the input. If this is correct, then Rs reason that hypothesis spaces are highly structured and what you come to “know” is strongly affected by this given structure. Note that the disagreement between Rs and Es hinges on what it is possible to glean from available input.

So how to approach this disagreement in a semi-rational manner?  This is where the Logical Problem of Acquisition (LPA) comes in.  What is the LPA?  It’s an attempt to specify the nature of the input data that an Acquisition Device (AD) has access to and to then compare this to the properties of the attained competence. Chomsky discusses the general form of this approach in chapter 1 of Reflections on Language (here).

In the study of language, the famous diagram in (1) concisely describes the relevant issues:

(1)  PLD_L -> FL -> G_L

PLD_L is the name we give to the linguistic data from L that a child (actually) uses in building its grammar. FL is, well, you know, and G_L is the resultant grammar that a native speaker attains. One can easily generalize this schema to other domains of inquiry by subbing other relevant domains for “L.” A generalized version of the schema is (2) (‘X’ being a variable ranging over cognitive domains of interest) and a version of it as applied to vision is (3). So, if one’s interest is in visual object recognition (as for example in Marr’s program), we can consider the schema in (3) as outlining the logic to be explored (PVD = Primary visual data, FV = Faculty of Vision, GV = grammar (i.e. rules) of vision).[3]

(2)  PXD -> FX -> GX
(3)  PVD -> FV -> GV

This schematic rendition of the LPA focuses the R vs E debate on the information available in PXD. An Eish conception is committed to the view that PXD is quite rich and that it provides a lot of information concerning GX. To the degree that information about GX can be garnered from PXD, we need not populate FX with principles to bridge the gap. Rish conceptions rest on the view that PXD is a rather poor source of information relevant to GX. As a result, Rs assume that FX is generally quite rich.

Note that both Rs and Es assume that FX has a native structure. This, recall, is common to both views. The question at issue is how much belief fixation (or more exactly the fixation of a particular belief) owes to the nature of the data and how much to the structure of the hypothesis space. As a first approximation one can say that Rs believe that given hypothesis spaces are pretty highly structured, so that the data required to “search” that space can be quite sparse. Conversely, the richer the set of available alternatives, the more one needs to rely on the data to fix a given belief. Thus for Rs all the explanatory action lies in specifying the narrow range of available alternatives, while for Es most of the explanatory action lies in specifying the (nowadays, most often statistical) procedures that determine how one moves across a rather expansive set of possibilities.
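
To make the trade-off concrete, here is a toy sketch (my own illustration, with invented hypothesis spaces and a standard “size principle” likelihood; nothing here comes from the work discussed in this post): two learners do Bayesian belief fixation over the same positive examples, one with a small, structured hypothesis space, the other with that space padded out by hundreds of arbitrary alternatives.

    # Toy illustration: belief fixation as Bayesian selection from a given
    # hypothesis space. Hypotheses are sets of "observables"; data are
    # positive examples of the target; consistent hypotheses get likelihood
    # 1/|h| per datum (the size principle), inconsistent ones get 0.
    import random

    def posterior_of_target(hypotheses, target, data):
        """Posterior on `target` given positive examples and a flat prior."""
        scores = {}
        for name, h in hypotheses.items():
            if all(d in h for d in data):
                scores[name] = (1.0 / len(h)) ** len(data)
            else:
                scores[name] = 0.0
        return scores[target] / sum(scores.values())

    random.seed(0)
    universe = list(range(1, 101))
    target = set(range(10, 101, 10))                  # "multiples of ten"

    # R-style learner: a small space of three natural classes.
    r_space = {'mult10': target,
               'mult5': set(range(5, 101, 5)),
               'even': set(range(2, 101, 2))}
    # E-style learner: the same three plus 500 arbitrary supersets of the target.
    e_space = dict(r_space)
    non_target = [x for x in universe if x not in target]
    for i in range(500):
        e_space['junk%d' % i] = target | set(random.sample(non_target, 5))

    data = []
    for n in range(1, 21):
        data.append(random.choice(sorted(target)))    # another positive example
        print(n,
              round(posterior_of_target(r_space, 'mult10', data), 3),
              round(posterior_of_target(e_space, 'mult10', data), 3))

With the structured space, the posterior on the target passes .99 after about seven examples; with the padded space it is still below .9 after twenty. The numbers are arbitrary; the moral is just the one in the text: the less structure the hypothesis space brings, the more work the data has to do.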

The schemas above suggest ways of investigating this disagreement. Let’s consider some.

E invites the view that, ceteris paribus, variations in PXD should lead to variations in GX, as the latter closely tracks properties of the former (it is in this sense that Es think of PXD as shaping a person’s mental states). Thus, if some kinds of inputs are systematically absent from an individual’s PXD, we should expect that that individual’s cognitive development and attained competence should differ from that of an individual with more “normal” inputs. Hume (our first systematic associationist psychologist) gives a useful version of this view:[4]

…wherever by any accident the faculties which give rise to any impressions are obstructed in their operations, as when one is born blind or deaf, not only the impressions are lost, but also their corresponding ideas; so that there never appear in the mind the least traces of either of them.

There’s been lots of research over the last 50 years exploring Hume’s contention in the domain of language acquisition. Lila Gleitman and Barbara Landau (G&L) provide a good brief overview of some of the child language research investigating these matters.[5] They note that the evidence does not support this prediction (at least in the domain of language). Rather it seems that “humans reconstruct linguistic form …[despite] the blatantly inadequate information offered in their usable environment (91).” In other words, it seems that the course of language acquisition can proceed smoothly (in fact no differently than what happens in the “normal” case) even when the input to the system is perceptually very limited and degraded. G&L interpret this Rishly to mean that language acquisition is relatively independent of the nature and quality of the input, which makes sense if it is guided by a rich system of innate knowledge.

G&L illustrate the logic using two kinds of examples: blind people can and do learn the meanings of words like ‘see’ and ‘look’ without being able to see or look, and people can acquire full native competence (and can make very subtle “perceptual” distinctions in their vocabulary) despite being blind and deaf. Indeed, it seems that even extreme degradation of the sensory channels leaves the process of language acquisition unaffected.

It is worth noting just how degraded the input can be when compared to the “normal” case. Here are G&L reporting Carol Chomsky’s original research on learning via the Tadoma method (92):[6]

To perceive speech at all, the deaf-blind must place their fingers strategically at the mouth and throat of the speaker, picking up the dynamic movements of the mouth and jaw, the timing and intensity of the vocal-cord vibration, and the release of air…From this information, differing radically in kind and quality from the continuously varying speech wave, the blind-deaf recover the same ornate system of structured facts as do hearing learners…

In short, there is plenty of evidence that language acquisition can (and does) take place in the face of extremely degraded input, at least when compared with the PLD available in the standard case.[7]

The Poverty of Stimulus (PoS) argument also reflects the logic of the schemas in (1-3). As the schema suggests, a PoS argument has two major struts: a description of the available PLD and a description of the grammatical operations of interest (i.e. the relevant rules). The next step compares what information about the operations can be gleaned from the data; the slack is then used to probe the structure of FL. The standard PoS question is then: what must we assume about FL so that, given the witnessed PLD, the LAD can derive the relevant rules? As the schema indicates, the inference is from instances of rules (used outputs of a grammatical system) to the rules that generate the observed sentences. Put another way, whatever else is going on, the LPA requires that FL at least contain some ways of generalizing beyond the PLD. This is not controversial. What is controversial is how fancy these methods for generalizing beyond the data have to be. For Es, the generalizing procedures are quite anodyne. For Rs, they are often quite rich.

Well-designed PoS arguments focus on grammatical phenomena for which there is no likely relevant information available in the PLD. If Es are right (see Hume above), all relevant grammatical operations and principles should find (robust?) expression in the PLD. If Rs are right, we should find lots of cases where speakers develop grammatical competence even in the absence of relevant PLD (e.g. all agree that “John expects Mary to hug himself” is out and that “John expects himself to hug Mary” is good, where ‘John’ is the antecedent of ‘himself’).

It goes without saying that given this logic, debate between Es and Rs will revolve around how to specify the PLD in relevant cases (see here for a sophisticated discussion). So for example, all accept the idea that PLD consists of good examples of the relevant operation (e.g. all take “John hugged himself” to be a typical data point bearing on principle A (A)). What of negative data, data indicating that some example is unacceptable with the indicated interpretation (e.g. that “John expects Mary to hug himself” is out)? There is every reason to think that overt correction of LAD “mistakes” barely occurs. So, in this sense the PLD does not contain negative data. However, perhaps for the LAD absence of evidence is evidence of absence. In other words, perhaps for the LAD failing to witness an example like “John expects Mary to hug himself” leads to the conclusion that the dependency between ‘John’ and ‘himself’ in these configurations is illicit. This is entirely possible. So too with other *-cases.[8]

Note that this reasoning requires a fancier FL than one that simply assumes that all decisions are made on the basis of positive data. So the logic of the LPA is respected here: we compensate for the absence of certain information in the PLD (i.e. direct negative evidence) by allowing FL to evaluate expectations of what should be seen in the PLD were a given construction good.[9] The question an R would ask an E is whether the capacity to compute such expectations doesn’t itself require a pretty hefty native capacity. After all, many things are absent from the data, but only some of these absences tell us anything (e.g. I would bet that for most cases in the PLD the anaphor is within 5 words of the antecedent; nonetheless “John confidently for a man of his age and temperament believes himself to be ready to run the marathon” seems fine).
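
To see how such an expectation-driven inference might run, here is a minimal sketch (the expected rate, the corpus sizes and the fifty-fifty prior are made-up numbers for illustration; this is no one’s actual acquisition model): two candidate grammars, one that licenses the “John expects Mary to hug himself” dependency and one that bars it, compared on the evidence of never having seen it.

    # Indirect negative evidence as a likelihood comparison. If the grammar
    # that licenses the construction predicts it would occur at some small
    # rate p in the PLD, then N utterances with zero occurrences favor the
    # grammar that rules it out.
    import math

    def posterior_illicit(p_expected, n_utterances, prior_illicit=0.5):
        """P(construction is illicit | zero occurrences in n_utterances)."""
        loglik_licit = n_utterances * math.log(1.0 - p_expected)
        loglik_illicit = 0.0                     # predicts zero occurrences
        log_odds = (math.log(prior_illicit) + loglik_illicit
                    - math.log(1.0 - prior_illicit) - loglik_licit)
        return 1.0 / (1.0 + math.exp(-log_odds))

    for n in (1000, 10000, 30000, 100000):
        print(n, round(posterior_illicit(p_expected=1e-4, n_utterances=n), 4))
    # ~0.52 after 1,000 utterances, ~0.95 after 30,000, ~0.9999 after 100,000.

The numbers themselves are not the point. The point, as above, is that the inference only gets going if something (FL, on the R story) already supplies both the representation of the unattested construction and its expected rate of occurrence.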

One assumption I commonly make in considering PoS arguments is that PLD effectively consists of simple acceptable sentences (e.g. “John likes himself”). This is the so-called Degree 0 hypothesis (D0H).[10] If the PLD is so restricted, then FL must be very rich indeed, for many robust linguistic phenomena are simply unattested in simple clauses (and recall, induction is impossible in the absence of any data to drive it); e.g. island effects, ECP effects, many binding effects, minimality effects a.o. The D0H may be too strong, but there are two (maybe one, as they are related) reasons for thinking that it is on the right track.

The first is Penthouse Principle (PP) effects. Ross noted long ago that there are many operations restricted to main clauses but virtually none that apply exclusively to embedded clauses. Subject Aux Inversion and Tag Question formation are two examples from English. If we assume that something like D0H is right(ish), we expect all idiosyncratic processes to be restricted to main clauses, where substantial evidence for them will be forthcoming. Embedded clauses, on the other hand, should be very regular. At the very least we expect no operations to apply exclusively to embedded domains (the converse of the PP), as given D0H there can be no evidence to fix them.

The second reason relates to this. It’s a diachronic argument David Lightfoot gave based on the history of English (here). It rests on a very nice observation: main clause properties can affect embedded clause properties but not vice versa. Lightfoot illustrates this by considering the shift from OV to VO in English. He notes that in the period in which the change occurred, embedded clauses always displayed OV order. Despite this, English changed from OV to VO. Lightfoot reasons as follows: were embedded clause information robustly available, there would have been very good evidence that, despite appearances to the contrary in unembedded clauses, English was OV not VO (i.e. the attested change to VO (which ended up migrating to embedded clauses) would never have occurred). Thus, the fact that English changed in this way (and that influences in the other direction are unattested) follows nicely if something like D0H holds (viz. an LAD does not use embedded clause information in the acquisition of its grammar). Lisa Pearl subsequently elaborated a sophisticated quantitative version of this argument here and here. The upshot: D0H holds. Of course, if it does, then strong versions of PoS arguments for many linguistic phenomena readily spring to mind. No data, no induction. No induction, highly structured natively given hypothesis spaces guiding the AD.

OK, this post has gotten out of control and is far too long. Let me end by reiterating the take-home message. Rs and Es differ not on whether nativism but on what is native. And exploring the latter effectively revolves around considerations of how much information the data contains (and the child can use) in fixing its beliefs. This is where the action is. Research like what G&L review is interesting in that it shows that achieved competence seems quite insensitive to large variations in the relevant usable data. Classical PoS arguments are interesting in that they provide cases where it is arguable that there is no data at all in the input relevant to fixing a given belief. If this is so, then the mechanisms of belief fixation must lean very heavily on the highly structured (and hence restricted) nature of the hypothesis space that ADs natively bring to the belief fixation process. In R/E debates everyone believes that input matters and everyone believes that minds have native structure. The argument is about how much each factor contributes to the process. And this is something that can only be adjudicated empirically. As things stand now, IMO, the fertility of the Rish position in the domain of language (most of cognition actually) has been repeatedly demonstrated. Score one (indeed many) for Descartes and Kant.





[1] In effect, induction serves to locate a member/members from a given set of alternatives. No pre-specified alternatives, no induction. Thus Fodor’s point: learning (i.e. belief fixation) requires a given set of concepts that mediate the process.

Fodor emphasizes that this view, though trivial, is not purely tautological. There does exist a tautological claim that some have confused with Fodor’s. This misreading interprets Fodor as saying that any acquired concept must be acquirable (i.e. a principle of modal logic along the lines of: if I do have the concept then I could have had the concept). Alex Clark, for example, so reads Fodor (here):  “There is a tautological claims which is that I have an innate intellectual endowment that allows me to acquire the concept SMARTPHONE in some way, on the basis of reading, using them, talking to people etc. Obviously any concept I have, I must have the innate ability to have it…”


Fodor notes this possible interpretation of his views at Royaumont (p. 151-2), but argues that this is not what he is claiming. He says the following:  “The banal thesis is just that you have the innate potential of learning any concept you can in fact learn; which reduces, in turn, to the non-insight that whatever is learnable is learnable. …What I intended to argue is something very much stronger; the intended argument depends on what learning is like, that is the view that everybody has always accepted, that it is based on hypothesis formation and confirmation. According to that view, it must be the case that the concepts that figure in the hypothesis you come to accept are not only potentially accessible to you, but are actually exploited to mediate the learning…The point about confirming a hypothesis like "X is miv iff it is red and square" is that it is required that not only red and square be potentially available to the organism, but that these notions be effectively used to mediate between the organism's experiences and its consequent beliefs about the extension of miv…”

In other words, if inductive logics require given hypothesis spaces to get off the ground and if we attribute an inductive logic to a learner, then we must also be attributing to them the given hypothesis space AND we must be assuming that it is in virtue of exploiting the properties of that space that beliefs get fixed. So far as I can tell, this is what every inductivist is in fact committed to.
[2] Despite the terminological misstep of identifying Rationalism with Nativism on p 127.
[3] In Marr’s program, the grammar includes the rules and derivations that get us from the grey scale sketch to the 2.5D sketch.
[4] This is quoted in Gleitman and Landau, see note 5. The quote is from Hume’s Treatise p 49.
[5] See “Every child an isolate: nature’s experiments in language learning,” Chapter 6 of this. See here for a free copy.
[6] Carol Chomsky’s original papers on this topic appear as appendices to the book. They are well worth reading. On the basis of the reported speech, the Tadoma learners seem indistinguishable from “normal” native speakers.
[7] G&L also note the excess of data problem towards the end of their paper. This is something that Gleitman has explored in more recent work (discussed here and in links cited there). Lila once noted that a picture is worth a thousand words, and that is precisely the problem. In the early period of word learning the child is flooded with logical possibilities when word learning is studied in naturalistic settings. Here induction becomes a serious challenge not because there is no information but because there is too much, and narrowing it down to the relevant stuff is very hard. Lila and colleagues have argued that in such cases what the child does bears relatively little resemblance to the careful statistical sampling that one might expect if acquisition were via “learning.” This suggests that there must be a certain sweet spot where data is available but not too available for learning (induction) to be a viable form of acquisition. Where this is not possible other acquisition procedures appear to be at play, e.g. guess and guess again! Note that this amounts to saying that resource constraints are key factors in making “learning” an option. In many cases, learning (i.e. reviewing the alternatives systematically) is simply too costly, and other seemingly less rational procedures kick in. Interestingly, from an R perspective, it is precisely when the field of options is narrowed (when syntax kicks in) that something akin to classical learning appears to become viable.
[8] For reasons I have never quite understood, many (see here) have assumed that GGers are hostile to the idea that LADs can use “negative” data productively. This is simply false. See Howard Lasnik (here) for a good review. As Lasnik notes, the possibility that negative data could be relevant goes back at least to Chomsky’s LGB (if not earlier). What is relevant is not whether negative data might be useful but what kinds of minds can productively use it. The absence of barking is useful when one is listening for dogs. Thus, the more constrained the space of options under consideration, the easier it is to use absence of evidence as evidence of absence. If you have no idea what you are looking for, not finding it is of little informational value.
[9] For example, Chater and Vitanyi (C&V) (here) order the available hypotheses according to “simplicity” measured in MDL terms, not unlike what Chomsky proposed in Aspects. Not surprisingly, given such an ordering, indirect negative evidence can be usefully exploited (something that would not surprise a GGer). What C&V do not consider is the possibility of cases where there is virtually no relevant positive or negative data in the PLD. This is what is taken to be the strongest kind of PoS argument and is the central case discussed in at least one of the references C&V cite (see here).
[10] Most who think that this is more or less on the right track actually take “simple” to mean un-embedded binding domains (e.g. Lightfoot).  This is sometimes called Degree 0+.  Thus, ‘Bill’ is in the PLD in (i) but not in (ii):
(i)             John believes Bill to be intelligent
(ii)           John believes (that) Bill is intelligent

31 comments:

  1. great exposition.

    somewhat orthogonal but:

    I'd like to ask though what you think of Fodor's recent sermons to the effect that this computational story just about only holds for modular domains of cognition, and doesn't scale up. that is, that the computational view of acquisition and thinking which entails all of the details you just described only holds in parts of the mind that work like turing machines (language, perceptual systems): things in which the content is innate, atomistic, compositional, systematic, semantically evaluable in virtue of their syntax etc etc.

    for instance here's a quote from a book review: (http://www.lrb.co.uk/v20/n02/jerry-fodor/the-trouble-with-psychological-darwinism) unfortunately there's no way for me to pull a good quote from this review without turning this comment into a monster. but have a look at the whole thing, it's about 6 or 7 paragraphs in.

    "I think it’s likely, for example, that a lot of rational belief formation turns on what philosophers call ‘inferences to the best explanation’. You’ve got what perception presents to you as currently the fact and you’ve got what memory presents to you as the beliefs that you’ve formed till now, and your cognitive problem is to find and adopt whatever new beliefs are best confirmed on balance. ‘Best confirmed on balance’ means something like: the strongest and simplest relevant beliefs that are consistent with as many of one’s prior epistemic commitments as possible. But, as far as anyone knows, relevance, strength, simplicity, centrality and the like are properties, not of single sentences, but of whole belief systems; and there’s no reason at all to suppose that such global properties of belief systems are syntactic."

    this is obviously not strictly relevant to the linguist but I supposed it would be relevant to you.

    ReplyDelete
    Replies
    1. @Max:
      I tend to think that Fodor is onto something here. His views are consonant with Lila's recent work in which she argues that in open-ended contexts like early word learning, induction is replaced, effectively, by guessing until you hit on a winner. He notes that there is another way to do things: inference to the best explanation. The problem with this is that it is very open textured, contextual and global. And we really don't know how this sort of thing gets done. What counts as best, relevant, simplest etc has proven very hard to "mechanize." It seems that thinking and judgment really currently elude our understanding. I agree with Fodor here.

      The standard reply is that our minds/brains are massively modular. I tend to think that Fodor is right in thinking that this is not really correct. Sadly I have no idea what to suggest as a replacement; that's Fodor's point.

      I've always thought that one of the reasons that we can study syntax and phonology and parts of semantics is that these are relatively informationally encapsulated systems. Were they not, we'd get nowhere. However, not everything is like this, and where it isn't we are stuck.

      Delete
  2. I've got the same intuition.

    So then let me ask you, in light of this, what do you make of the minimalist inverted Y model of the architecture of the language faculty.

    how does information flow in and out in your view? Is the CI interface feeding syntax from some central processor? when something is transferred over to the CI for interpretation does that interpretation become available to the central processor?

    Presumably, a mental representation of a thought needs to be available to the syntactic component so that it knows what structure to build?

    ReplyDelete
    Replies
    1. @Max:
      Ahh, deep questions. I don't think that syntax right now has much to say about these deep questions. We all assume that FL interfaces with a sound and a meaning system. What this comprises, at least on the meaning side, is very unclear. Frankly, this is one methodological reason for why I am not wild about linguistic explanations based on interface properties. We have some idea about what a derivation might look like and how it might be computationally well-designed. I think we have virtually no understanding about CI aside from a few cute asides like Full Interpretation (which is actually not that well defined either).

      The question you point to and the obscurities it invokes becomes clear when one actually looks at the difference between parsing and production studies (though this is not to say that generation is production!). We actually have non-trivial theories of parsing because we can say quite a bit before we need to say anything about the thoughts a linguistic object makes contact with. It's a little like vision work here; we are good until we need to say anything about object recognition. In production we have a hard time even getting off the ground as we don't know what the input to the production system is. Why? Because we assume it is a thought and we don't know what thoughts are or how they are represented. Hence we can say a few things, but not very much.

      So, how does FL relate to the interfaces? Don't know. I don't especially know much about CI: how many "modules" might there be? What's the representational format interfaced with? Is there one? Many? DO NOT KNOW.

      BTW, Jackendoff discusses this somewhere in "Consciousness and the Computational Mind" (I think). He addresses the question of how we talk about what we see, and does this by showing how to link up linguistic and visual representations. I doubt it's right in detail, but it gives you the flavor of what a reasonable approach might look like.

      Delete
    2. Once you allow indirect negative evidence, I think you're somewhere in the intermediate zone between E and R, but where? And how to find out?

      And then the problem arises that we appear not to have any truly convincing demonstrations that anything in particular is acquired but not learnable from evidence under fairly broad, close-to-E assumptions.

      There is a minor and presumably parametric difference between US and UK English in that in the UK, you can say 'let's go to mine' meaning 'let's go to my house', comparable to 'let's go to Bill's', without 'house' in the immediate discourse environment (Cindy hears this kind of thing on the BBC crime shows she watches).

      But, in the approx 6.7 million words of 'child directed speech' on the 5 biggest files of UK English on Childes, this construction does not even occur once, even tho people do say things like "at my house", or "I'm gonna eat yours then".

      This is a simple construction, surely learned from evidence of some kind, and not hard to observe in entertainment made for grownups, but absent from a decent sized corpus of what is supposed to be child directed speech (2-6 years worth of input, depending on SES, according to the Risley and Hart figures that everybody cites). So there's work to be done in figuring what the input is, even as straight text.

      The agreement behavior of quirky-case marked PRO in Icelandic is probably about as good as it gets for something that is arguably not learned, but even there, there is no *scientific* case, due to lack of any definite body of data, argued plausibly to be effectively very similar to PLD (in both amount and nature), from which the overt evidence is lacking.

      Delete
  3. Norbert, you don't mention linguistic nativism at all -- in your view then the POS is not an argument for linguistically specific hypothesis spaces, but just for (highly) structured ones?

    That seems a big change from earlier versions of the argument.

    ReplyDelete
    Replies
    1. I never understood any earlier versions to be anything different. To the extent that there was a well-reasoned argument about this topic suggesting domain-specificity, I read it as secondary and speculative. It might be worth finding a clear example. If it's from Chomsky there is the usual wall of rhetorical flourish to partial out.

      Delete
    2. Conceptually, PoS can surely only argue for structure in the hypothesis space; but some (most?) people, such as Pearl and Sprouse in various publications, have taken UG to mean structure specific to language-learning.

      Delete
    3. From Pullum and Scholz 2002:
      "It is widely believed, outside of linguistics as well as within it, that linguists have developed an argument for linguistic nativism called the "argument from poverty of the stimulus". Linguistic nativism is the view (putatively reminiscent of Cartesian rationalism and opposed to empiricism) that human infants have at least some linguistically specific innate knowledge."
      later
      "The one thing that is clear about the argument from poverty of the stimulus is what its conclusion is supposed to be: it is supposed to show that human infants are equipped with innate mental mechanisms with specific linguistic content that assist the language acquisition process - in short, that the facts about human language acquisition support linguistic nativism."

      Delete
    4. Do they name any specific works that say exactly that? It is something that certainly was floating around in the air, perhaps with nobody writing it down in a completely explicit way.

      Back when I believed in rather than being basically agnostic about the task-specific nature of the child's cheat sheet for language acquisition, it was because the structure seemed to be too weird to be useful for anything else, but at some point that stopped looking like a real argument to me (mostly because of our massive ignorance about everything else). And, back then, nobody had thought of Darwin's Problem.

      Delete
    5. Ewan and Avery are correct: the argument for domain specificity has ALWAYS been empirical. One looks at the principles taken to characterize FL and sees if they generalize to other domains. Until recently, the principles seemed sui generis. The primitives were language specific, the constraints were language specific, and the operations were language specific. MP has raised the issue of whether this need be so, or whether the apparent language specificity is real, or maybe there is some way to unify linguistic operations and principles with those in other domains of cognition. Right now, the accomplishments, and I believe that there have been some, have been modest. So right now, there is lots of domain specificity within FL.

      I understand the desire to unify FL with other cognitive faculties. However, such unification is an accomplishment not a foregone conclusion. To date we have barely managed to describe the different cognitive domains in any depth. Hence unification is somewhat premature. And given what we have, domain specificity looks like a good bet.

      Last point: modularity and domain specificity are closely linked. I will talk about this in my next post. We have every reason to believe that FL is modular. Language is very important and so it would not be surprising if we had a specially designed, dedicated faculty whose concerns were linguistic. After all, we have a visual system distinct from audition and olfaction. If these unify at all, it is not at the "organ" level but at the more primitive circuit level. If this sounds cryptic, read my forthcoming post.

      Delete
    6. Yes, I always took Pullum and Scholz to be wrong on that point. They didn't cite anyone and I wasn't convinced by their mere assertion of "widely."

      Delete
    7. Trusting Pullum on almost anything to do with PoS is akin to taking G.W. Bush's word about WMDs.

      Delete
    8. "Trusting Pullum on almost anything to do with PoS is akin to taking G.W. Bush's word about WMDs."

      Would you be so kind to provide evidence for this analogy, Norbert? I am sure I am not the only one in your audience who is amazed by it - so don't see it as talking to me but as informing them.

      Delete
    9. The bet on task-specificity is not one I'm interested in making because a) I can't see any payoff b) it adds a speculation to a collection of actual results of various levels of abstraction, such as the almost or complete mild context sensitivity of NLs, the absence of relative pronouns when the RC precedes the head, together with questions we don't really know the answer to but do have at least a somewhat coherent approach to trying to answer. People are easily confused, and I think we'd be better off leaving the speculations out of the main presentation, at least.

      Delete
    10. We agree, Avery, that people are easily confused - as evidenced by Norbert's [and others'] hostility towards the 2002 Scholz&Pullum and Pullum & Scholz papers - probably 2 of the most badly misunderstood papers on PoS. We also agree that we'd be better off leaving speculations out of the main presentation. This brings me to the complaints voiced above by some about the lack of evidence in P&S; S&P. If true, such seem fair complaints and one would of course expect that anyone making such complaints would lead by example.

      So how are 'the good guys' doing in the 'make only claims supported by credible empirical evidence' department. I'll take the "Defense of Nativism" [ML] paper Alex C. provided a link to because I assume that it is Norbert-approved and most have read it. So how do these fine nativists support their pretty far reaching claims with empirical evidence? Here is one example:

      “English-speaking children sometimes go through a peculiar stage as they learn how to form certain types of questions. They insert an extra wh-word, saying things like
      (1) What do you think what Cookie Monster eats?
      (2) Who did he say who is in the box?
      These sentences are, of course, ungrammatical in adult English. But it’s not as if these children randomly insert extra wh-words any which way...
      ... though adult English speakers don’t say any of (1)–(5), the pattern found in their children’s speech does appear in other natural languages, including German, Irish, and Chamorro. For example, in German an extra wh-word is used, but not a wh-phrase, making (6) grammatical but (7) ungrammatical:
      (6) Wer_i glaubst du wer_i nach Hause geht?
      Who do you think who goes home?
      (7) *Wessen Buch_i, glaubst du wessen Buch_i Hans liest?
      Whose book do you think whose book Hans is reading?

      There is a reference to 2 papers by Crain+co-author but NOT A SINGLE reference to data confirming that many/most/all kids indeed go through a well defined phase in which they use [1] and [2]. Next, even though ML claim the pattern appears in at least 3 other languages only 1 example is given [again no source is cited] and this example is patently false. As native speaker of German I find [6] and [7] equally bad [if forced to make a ranking I'd say [6] is worse than [7]; an intuition shared by several German linguists I asked]. Why then would ML rest a very important claim [" that 3-year English-speakers talk like Germans when they ask certain sorts of questions"] on such questionable evidence? One has to assume it was too difficult to find an informant of a language that has >90,000,000 native speakers and hundreds, maybe thousands professional linguists. Or maybe this is again the Galilean style at work, where only counts that the conclusions are acceptable not how they were arrived at?

      In any event one has to wonder how anyone who sets the evidential bar so low for work he agrees with can be so outraged about alleged lack of evidence in the work of those he disagrees with.

      Delete
    11. That was probably an oversight (given my proclivities, I won't throw rocks at other people who commit them on occasion); here's a bunch of references I pulled out of a 2003 paper by Claudia Felser that was the first thing I came up with from googling on 'wh copy child language'.

      De Villiers et al., 1990; McDaniel et al., 1995; Thornton, 1990

      But more generally, I think the time has passed for arguing on the basis of striking but isolated phenomena; what is wanted is models that do or don't learn things that are sort of like decent grammars from data that is sort of like PLD. It is now possible for 95 lines of Python whipped up in a few days using nltk to convert CHILDES corpora into a treebank, so the excuses for not making use of these kinds of capabilities have gotten too thin.
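
      Something along these lines, say (a minimal sketch, not the actual script, and it assumes the CHILDES XML files have already been downloaded to a local directory; getting from flat tagged strings to real trees still needs a parser/chunker on top):

      # Pull child-directed utterances out of CHILDES XML with nltk,
      # keeping the %mor-tier POS tags.
      from nltk.corpus.reader import CHILDESCorpusReader

      CORPUS_ROOT = '/path/to/childes/Eng-UK'        # hypothetical path
      reader = CHILDESCorpusReader(CORPUS_ROOT, r'.*\.xml')

      with open('cds_tagged.txt', 'w') as out:
          for fileid in reader.fileids():
              # 'CHI' is the target child; treat everyone else as CDS.
              adults = [p for p in reader.participants(fileid)[0] if p != 'CHI']
              for sent in reader.tagged_sents(fileid, speaker=adults):
                  if sent:
                      out.write(' '.join('%s/%s' % (w, t) for w, t in sent) + '\n')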

      Delete
    12. So I don't quite follow Norbert's answer, but like P and S I have certainly ended up with the view that the conclusion of the POS is that there is domain specific knowledge. The other conclusion is trivial; is there anyone who would reject it?

      So e.g. Berwick, Pietroski, Yankama and Chomsky (2011) Poverty of the Stimulus Revisited are pretty explicit:

      Such differences in outcome arise from four typically interacting factors:
      (1) Innate, domain-specific factors;
      (2) Innate, domain-general factors;
      ....
      [then later]
      The point of a POS argument is not to replace appeals to ‘‘learning’’ with appeals to ‘‘innate principles’’ of Universal Grammar (UG). The goal is to identify phenomena that reveal Factor (1) contributions to linguistic knowledge, in a way that helps characterize those contributions.

      Delete
    13. Could someone come up with a clear quotation that shows the domain-general version of the POS? So I can cite it the next time I write on this.

      I agree with Avery and Ewan (and I guess Norbert?) that the domain specific argument is quite weak; but the conclusion of the domain general version is so weak that it doesn't need much arguing.

      Delete
    14. Alex C. I won't post any quotes of the general PoS version but encourage you to address this gem of logic the next time you write on PoS. In 2001 [The poverty of the stimulus] L&M argue that in addition to the many hypotheses compatible with the PLD there are also many promising dead ends [PDE] and that this fact would make it increasingly unlikely that the child would arrive at any plausible hypothesis at all [pp. 232-33]. As good Galilean scientists they provide zero empirical evidence for at least one PDE from which children could not recover. But let's be charitable and grant them the conclusion that because of such PDEs innate constraints [ICs] are needed.

      Now looking at the 2012 paper 'In defence of Nativism' we find the example of children going through a phase in which they are emulating a pattern that is not present in the language they hear while growing up. That seems to qualify as PDE. M&L argue "Since nativists generally maintain that language-specific principles are part of the Nativist Acquisition Base, they can postulate that children are working with a highly constrained hypothesis space and that some children temporarily adopt a set of rules that are laid out in that space even if these rules aren’t attested to in the data." [p.9]

      So on the one hand we need ICs to prevent children getting stuck on PDEs. On the other hand ICs themselves introduce PDEs. Maybe you [or someone else] can explain to me HOW kids still end up with the English pattern if ICs force them to go through a German phase? Would it not be more parsimonious to keep the German pattern? And HOW can they recover from the PDE if ICs force them into this PDE in the first place? On the nativist story it had better not be because of the PLD.

      It seems to me if ICs are used to [1] explain how children avoid PDEs and [2] why children occasionally end up in PDEs, then maybe we're better off by eliminating ICs ...

      Delete
    15. @ Alex:
      The argument for domain specificity has always been specific. I mean by that the PoS works by isolating a grammatical phenomenon (e.g. island restrictions on movement, locality for binding etc.) and then seeing if THAT FACT can be accounted for without postulating domain specific information. The argument is weighted against doing so as the assumption commonly made is that one postulates such information as a last resort. If there is info in the PLD sufficient to get the requisite generalization/fact then one traces the causal source of the grammatical effect to that. So, though the form of the PoS is general the arguments are all very local. Domain specificity is a conclusion from particular cases. You don't like the conclusion, reanalyze the cases either by showing that the PLD is richer than assumed OR that the same phenomenon can be explained without domain specific assumptions. Promissory notes are not valid currency.

      We have run down this path before Alex. Indeed, many times. Each time we come to this point. I ask you again: show me the money. Derive islands, ECP effects, binding effects, case effects, WHATEVER without any domain specific assumptions. Do this three times and then we can talk seriously. Till then, well..not worth the time and effort to debate.

      Delete
    16. The Northern Hypocrite has spoken again:

      "Promisory notes are not valid currency.

      We have run down this path before Alex. Indeed, many times. Each time we come to this point. I ask you again: show me the money. Derive islands, ECP effects, binding effects, case effects, WHATEVER without any domain specific assumptions. Do this three times and then we can talk seriously. Till then, well..not worth the time and effort to debate."

      Indeed, promissory notes are not acceptable from those questioning NH's orthodoxy. Defenders of the NH orthodoxy on the other hand are under no obligation to 'show the money' and provide even the vaguest suggestion re HOW innate constraints facilitating the acquisition of islands, ECP, binding effects, case effects, WHATEVER are implemented in the postulated domain specific biological language faculty.

      There is indeed no point in engaging in a serious debate...

      Delete
    17. I think part of the UG/PoS mess may stem from the fact that when the idea of decomposing grammar into modules was invented, the general rhetoric surrounding generative grammar didn't fully change to accommodate it. So in Chomsky's Knowledge of Language (1987:4) we read "Consider, for example, the idea that there is a language faculty, a component of the mind/brain, ..."

      Given modules, the idea that a faculty corresponds to a component ceases to be sound; the faculty of language might involve 10 modules, the faculty of learning to tie your shoes 6, 4 of them common to both faculties. And, because modules are by nature more abstract than faculties, the force of the gee-whiz 'argument' that any particular one must be task-specific is greatly weakened.

      That there is no helpful quote for Alex to refer to is annoying, but it is perhaps explained by the fact that this kind of observation is both too simple minded and commonsensical to make into academic literature, and furthermore has no real bearing on the content of substantive work at the moment. Forex I had hoped to find something in my Guessing Rule One paper (unpublished, draft on lingbuzz), but didn't, presumably for the reason that nothing of an immediately useful nature would have been added by pointing out that although the Guessing Rule looks like something specific to the language faculty, it might be just a consequence of things that aren't.

      Delete
    18. @Norbert, we are not discussing (well I'm not anyway) whether there is or is not innate domain specific knowledge. We are discussing whether or not the POS argument is standardly an argument for linguistic nativism or not. I think it is. You seem to think it is too -- so we are agreeing here not disagreeing?
      So yes, there is no point in debating if we are both on the same side...

      Delete
    19. @ Avery: Thank you for drawing attention to the module/component/faculty problem. You are absolutely right: a lot of the confusion in the literature arises because people do not use these terms consistently. But then you cannot really blame individual authors as long as generativists don't put their cards on the table and define the biological language faculty. As long as they get away with massive piles of promissory notes we'll remain stuck with this conceptual mess. Alex C. may find a paper that offers the definition he asked for but he is just as sure to find at least 10 generativists claiming that the author of this paper badly misunderstood the issues [there ARE and should be debates among people who are on the same side!].

      Norbert certainly deserves some kind of perseverance award for generating this blog and being the by far most prolific contributor. Whatever is out there defending his side of the story will be promptly brought to our attention. And if he has answers to our questions he'll provide them. That you won't find a clear definition of the biological LF on this blog should be taken as a very strong indicator that there is none. This is a rather sobering fact after more than 50 years of intense research on I-language [Chomsky tells us this is what generativists ALWAYS were doing and he ought to know].

      Add that the work on the processing sources of island effects by Kluender, Hofmeister, Casasanto and many others offers a valid alternative to the orthodoxy and that work on intervention effects has shown that the showpiece examples of the ECP, the whole basis for the idea, were spurious because material which does not change the critical structure can nullify the effect completely (as discovered by Bresnan in 1976! and rediscovered by Culicover in 1993). At some point you have to ask yourself on what basis Norbert can still claim that the orthodoxy is the most promising approach at the moment. There are no results in biology, valid alternatives in linguistics proper, developmental psychology, computational modelling ...

      Delete
    20. @Alex
      Good to be agreeing. The PoS argument has been used to argue for domain specific nativistic conclusions. But this is not intrinsic to the PoS. The PoS is an argument form. Given the premises you get different kinds of conclusions. The conclusions when addressing linguistic issues have been domain specific. GB conclusions, for example, have been that subjacency is an innate feature of FL, that CP and DP being bounding nodes is FL internal, that binding domains are what they are is domain specific, that principle C holds is part of FL, etc. These conclusions rest on the grammatical details one is interested in explaining and the PLD one surmises is relevant. So yes it has been used to argue for domain specificity but no it is not inherent to the argument form that the conclusions be domain specific. Of course, IMO, there is every reason to think that there will be quite a bit of domain specificity and right now, despite some hopeful minimalist results, the conclusion that there is no domain specificity seems to me heroic (if not worse). But, heh, the field is young, we are making progress and one should never underestimate human theoretical ingenuity.

      Delete
    21. I would be happier if Norbert had said above "conclusions when addressing linguistic issues have been formulated in a domain specific manner, due to insufficient knowledge to formulate them more generally in a useful way".

      The reason is that the postulated Merge Mutation alone does not get us out of the way of the collision between DP and PP, but multifunctional modules do, because most or even all of them might have evolved to do all sorts of useful things that don't require your conspecifics to also have them in order to be useful, and therefore, we have hundreds of millions of years for them to develop rather than a few tens of thousands.

      But, due to ignorance, we can't formulate them in a useful way other than apparently task-specifically. Perhaps that will change when we learn more about animal cognition ... animals are far more cognitively advanced than anybody thought when GG took shape in the 1950s and 1960s.

      My suggested place to look would be the capacity to learn by imitation, which is clearly found in mammals and birds, and iirc at least one lizard species (David Attenborough has a piece on a population of lizards on a Mediterranean island who learned that they could remain active over the winter by ripping open a seed pod of a certain kind of plant; it's hard to see how this could have gotten started without some sort of 'lizard see, lizard do' faculty). This gives us about 300my for the infrastructure for language to develop.

      Delete
    22. @Avery
      I would agree to this in part. It would always be nice to be able to formulate hypotheses in more general ways. Often this leads to better stories. However, I am not sure that the incapacity is peculiar to linguistics. We always want better accounts. Right now the best ones we have seem to imply lots of domain specificity. We have reasons to want something more general. I also want a flying pony. IMO, the best way to get the better stories is to build on the ones that we have, especially as these are neither trivial nor without insight. This is how it's done everywhere else, so why not linguistics?

      As for the imitation capacity. I will post something on this soon that goes into some of this. I like the idea of tracing some of our capacities to very early ancestors. I'm just unsure right now how to do this, as you rightly observe, in a fruitful way.

      Delete
  4. The quote you cited does then go on to say that starting by ascribing things to Factor 1 is a research strategy and that good research will then go on to try and reduce Factor 1 to Factor 2. But that doesn't mean I disagree about what the first part says. I agree that that sounds like a good example of something that's logically wrong. I agree that the thrust of their (qualified) statement is "POS proves there is something domain specific and innate, except that it doesn't really of course." So I take it there's a bit of bravado in this.

    "One hopes for subsequent revision and reduction of the initial character- ization, so that 50 years later, the posited UG seems better grounded. If successful, this approach suggests a perspective familiar from biology and (in our view) important in cognitive science: focus on the internal system and domain-specific constraints; data analysis of external events has its place within those constraints. At least since the 1950s, a prime goal for theoretical linguists has been to uncover POS issues and then accommodate the discovered facts while reducing the domain-specific component (1)."

    A second's reflection on this is enough to reveal that this obviously contradicts "The goal is to identify phenomena that reveal Factor (1) [and not Factor (2)] etc" - if one can take what's been learned through a POS argument about Factor 1 and then swap most of it out for Factor 2 then surely it was not all about Factor 1.

    Reflections on Language (1975) had it, I think, the right way around. However I can't find my copy at the moment. As I recall the first chapter first challenges the assumption that learning is necessarily domain general and trivial, presents a simple aux inversion sentence as an example of an obvious non-trivial induction problem, and then says "it seems really unlikely to me that anything like structure-dependence could be domain general and innate". In some rhetorically overblown way no doubt but, if I remember right, still leaving it fairly clear that this is an option.

    ReplyDelete
    Replies
    1. Ewan, I think you refer to the part where Chomsky compares the simple [structure independent] Hypothesis 1 and the more complex Hypothesis 2 [that contains the structure-dependent rule] and concludes:

      "The only reasonable conclusion is that UG contains the principle that all such rules must be structure-dependent. That is, the child's mind (specifically, its component LT (H,L)) contains the instruction: Construct a structure dependent rule, ignoring all structure-independent rules. The principle of structure-dependence is not learned but forms part of the conditions for learning" [32-3]

      I would interpret this as referring to a domain specific claim [because he talks about the 'reasonably well delimited cognitive domain L' [=language]]. But you are right that his formulation does not rule out domain general as an option.

      Delete