Faculty of Language: On playing nicely together

Tuesday, November 12, 2013

On playing nicely together

UMD has a cognitive science lecture series with colloquia delivered every other Thursday afternoon to a pretty diverse audience of philosophers, linguists, psychologists, and the occasional computer scientist, neuroscientist and mathematician. The papers are presented by leading lights (i.e. those with reputations). The last dignitary to speak to us was Fei Xu from Berekely psych and her talk was on how a rational constructivism, a new take on old debates, will allow us to get beyond the Empiricism/Rationalism (E/R) dualism of yore. Here’s the abstract:

The study of cognitive development has often been framed in terms of the nativist/empiricist debate. Here I present a new approach to cognitive development – rational constructivism. I will argue that learners take into account both prior knowledge and biases (learned or unlearned) as well as statistical information in the input; prior knowledge and statistical information are combined in a rational manner (as is often captured in Bayesian models of cognition). Furthermore, there may be a set of domain-general learning mechanisms that give rise to domain-specific knowledge. I will present evidence supporting the idea that early learning is rational, statistical, and inferential, and infants and young children are rational, constructivist learners.

I had to leave about five minutes before the end of the presentation and missed the question period, however, the talk did get me thinking about the issues Xu mentioned in her abstract. I cannot say that her remarks found fertile ground in my imagination, but they did kick start a train of thought about whether the debate between Es and Rs is worth resolving (or maybe we should instead preserve and sharpen the points of disagreement) and what it would mean to resolve it. Here’s what I’ve been thinking.

When I started in this business in the 1970s, the E/R debate was billed as the “innateness controversy.” The idea was that Rs believe that there are innate mental structures whereas Es don’t. Of course, this is silly (as was quickly observed and acknowledged), for completely unstructured minds cannot do anything, let alone think. Or, more correctly, both Es and Rs recognize that minds generalize, the problem being to specify the nature of these generalizations and the mechanisms that guide it (as mush doesn’t do much generalizing). If so, the E/R question is not whether minds have biologically provided structure that supports generalization but the nature of this structure.

This question, in turn, resolves itself into two further questions: (i) the dimensions along which mental generalizations run (i.e. the primitive features minds come with) and (ii) the procedures that specify how inputs interact with these features to fix our actual concepts. Taking these two questions as basic allows us to recast the E/R debate along the following two dimensions: (a) how “specific” are the innately specified features and (b) how “rational” are the acquisition procedures. Let me address these two issues in turn.

Given that features are needed, what constitutes an admissible one? Es have been more than happy to acknowledge sensory/perceptual features (spf), but have often been reluctant to admit much else. In particular, Es have been averse to cognitive modularity as this would invite admitting domain specific innate features of cognitive computation. Spfs are fine. Maybe domain general features are ok (e.g. “edge,” “animate,” etc.). But domain specific features like “island” or “binder” are not.

A necessary concomitant of restricting the mental feature inventory in this Eish way is a method for building complexes of features from the given simple givens (i.e. a combinatorics). Es generally recognize that mental contents go beyond (or appear to go beyond) the descriptive resources of the primitives. The answer: what’s not primitive is a construct from these primitives. Thus, in this sense, constructivism (in some form) is a necessary part of any Eish account of mind.

A second part of any Eish theory is an account of how the innately given features interact with sensory input to give rise to a cognitive output. The standard assumption has been that this method of combination is rational. ‘Rational’ here means that input is evaluated in a roughly “scientific” manner, albeit unconsciously, viz. (i) the data is carefully sifted, organized and counted and (ii) all cognitive alternatives (more or less) are evaluated with respect to how well they fit with this regimented data. The best alternative (the one that best fits the data) wins. In other words, on this conception, acquisition/development is an inductive procedure not unlike what we find more overtly in scientific practice[1] (albeit tacit) and it rests on an inductive logic with the principles of probability and statistics forming the backbone of the procedure.[2]

Many currently fashionable Bayesian models (BM) embody this E vision. See, for example, the Xu abstract above. BMs assume a given hypothesis space possibly articulated, i.e. seeded with given “prior knowledge and biases,”[3] plus Bayes Rule and maybe “a set of domain-general learning mechanisms.” These combine “statistical information” culled from the environmental input together with the “prior knowledge and biases” in a “rational manner” (viz. Bayesian manner) to derive “domain specific knowledge” from “domain-general” learning mechanisms.”

There are several ways to argue against this E conception and Rs have deployed them all.

First, one can challenge the assumption that features be restricted to spfs or the domain general ones. Linguists of the generative stripe should all be familiar with this kind of argument. Virtually all generative theories tell us that linguistic competence (i.e. knowledge of language) is replete with very domain specific notions (e.g. specified subject, tensed clause, c-commanding antecedent (aka binder), island, PRO, etc.) without which a speaker’s attested linguistic capacity cannot be adequately described.

Second, one might argue against the idea that there is a lot of feature combinatorics going on. Recall, that if one restricts oneself to a small set of domain general features then one will need to analyze concepts that (apparently) fall outside this domain as combinations of these given features. Jerry Fodor’s argument for the innateness of virtually all of our lexical concepts is an instance of this kind of argument. It has two parts. First, it observes that if concept acquisition is statistical (see below) then the hypothesis space must have some version of the acquired concept as a value in that space (i.e. it relies on the following truism: if acquiring concept C involves statistically tracking Cish patterns then the tracker (i.e. a mind) must come pre-specified with Cish possibilities). Second, it argues that there is no way of defining (most of) our lexical concepts from a small set of lexical primitives. Thus, the Cish possibilities must be coded in the hypothesis space as Cish as such and not congeries of non-Cish primitives in combination. Put the two assumptions together and one gets that ‘carburetor’ must be an innate concept (specified as such in the hypothesis space).[4] This form of argument can be deployed anywhere and where it succeeds it serves to challenge the important Eish conception of a relatively small/sparse sensory/perceptual and/or domain general set of primitive features underlying our cognitive capacities.

Third one can challenge the idea that acquisition/development is “rational.”[5] [6] This is perhaps the most ambitious anti E argument for it is an assumption long shared by Rs as well.[7] One can see the roots of this kind of criticism in the distinction between triggering stimuli and formative stimuli. Rs have long argued that the relation between environmental input and cognitive output is less a matter of induction than a form of triggering (more akin to transduction than induction). Understanding ‘trigger’ as ‘hair trigger’ provides a way of challenging the idea that acquisition is a matter of induction at all. A paradigm example of triggering is one trial learning (OTL).[8]

To the degree that OTL exists, it argues against the idea that acquisition is “rational” in any reasonable sense. Minds do not carefully organize and smooth the incoming data and minds do not incrementally evaluate all possible hypotheses against the data so organized in a deliberate manner. If OTL is the predominant way that basic cognitive competence arises, it would argue for re-conceptualizing acquisition as “growth” rather than “learning,” as Chomsky has often suggested. Growth is no less responsive to environmental inputs than learning is, but the responsiveness is not “rational” just brute causal. It is an empirical question, albeit a very subtle one, whether acquisition/development is best modeled as a rational or a brute causal process. IMO, we (indeed I!) have been to quick to assume that induction (aka: learning) is really the only game in town.[9]

Let me end: there is a tendency in academic/intellectual life to split the difference between opposing views and to find compromise positions where we conclude that both sides were right to some degree. You can see this sentiment at work in the Xu abstract above. In interest of honesty, I confess to having been seduced by this kind of sort of gentle compromise myself (though many of you might find this hard to believe). This live and let live policy enhances collegiality and makes everyone feel that their work is valued and hence, valuable (in virtue of being at least somewhat right). I think that this is a mistake. This attitude serves to blur valuable conceptual distinctions, one’s that have far reaching intellectual implications. Rather than bleaching them of difference, we should enhance the opposing E/R conceptions precisely so that we can better use them to investigate mental phenomena. Though, there is nothing wrong with being wrong (and work that is deeply wrong can be very valuable), there is a lot wrong with being namby-pamby. The E/R opposition presents two very different conceptions of how minds/brains work. Maybe the right story will involve taking a little from column E and a little from column R. But right now, I think that enhancing the E/R distinctions and investigating the pure cases is far more productive. At the very least it serves to flush out empirically substantive assumptions that are presupposed rather than asserted and defended. So, from now on, no more Mr. Nice Guy! [10]

[1] Actually, whether we find this in scientific practice is a topic of pretty extensive debate.

[2] There is a decision procedure required as well (e.g. maximize expected utility), but I leave this aside.

[3] Let me repeat something that I have reiterated before: there is nothing in Bayes per se that eschews pretty abstract and domain specific information in the hypothesis space, though as a matter of fact such has been avoided (in this respect, it’s a repeat of the old connectionism discussions). One can be a R-Bayesian, though this does not seem to be a favored combo. Such a creature would have Rish features in the hypothesis space and/or domain specific weightings of features. So as far as features go, Bayesians need not be Es. However, there may still be an R objection to framing the questions Bayes-wise, as I discuss below.

[4] I discuss Jerry’s argument more fully here and here. Note that it is important to remember that this is an argument about the innately given hypothesis space, not about belief fixation. Indeed, Jerry’s original argument assumed that belief fixation was inductive. So, the concept CARBURETOR may be innate but fixing the lexical tag ‘carburetor’ onto the concept CARBURETOR was taken to be inductive.

[5] Fodor’s original argument did not do this. However, he did consider this in his later work, especially in LOT2.

[6] I am restricting discussion here to acquisition/development. None of what I say below need extend to how acquired knowledge is deployed on line in real time, in, e.g. parsing. Thus, for example, it is possible that humans are Bayesian parsers without their being Bayesian learners. Of course, I am not saying that they are, just that this is a possible position.

[7] E.g. the old concept of children as “little linguists” falls into this fold.

[8] See here for some discussion and further links.

[9] There are other ways of arguing against the rationality assumption. Thus, one can argue that full rationality is impossible to achieve as it is computationally unattainable (viz. the relevant computation is intractable). This is a standard critique of Bayesian models, for example. The fall back position is some version of “bounded” rationality. Of course, the tighter the bounds the less “rational” the process. Indeed, in discussions of bounded rationality all the interesting action comes in specifying the bounds for if these are very narrow, the explanatory load shifts from the rational procedures to the non-rational bounds. Economists are currently fighting this out under the rubric of “rational expectations.” One can try to render the irrational rational by going Darwinian; the bounds themselves “make sense” in a larger optimizing context. Here too brute causal forces (e.g. Evo-Devo, natural physical costraints) can be opposed to maximizing selection procedures. Suffice it to say, the E/R debate runs deep and wide. Good! That’s what makes it interesting.

[10] For the record, lots of the thoughts outlined above have been prompted by discussions with Paul Pietroski. I am not sure if he wishes to be associated with what I have written here. I suspect that my discussion may seem to him too compromising and mild.

27 comments:

Alex ClarkNovember 13, 2013 at 4:06 AM
"To the degree that OTL exists, it argues against the idea that acquisition is “rational” in any reasonable sense."

I guess the classic example in language acquisition of OTL is fast mapping -- and that has Bayesian explanations like, say, Fei Xu's own work, that are both "rational" and "probabilistic". I am having trouble seeing how word learning could be other than rational.
ReplyDelete
Replies
ewanNovember 13, 2013 at 10:04 AM
"Here [in evolution] too brute causal forces (e.g. Evo-Devo, natural physical costraints) can be opposed to maximizing selection procedures."

You'll need to clarify (especially for this evolutionary naif) what the role of evo devo is in this context. But "natural physical constraints": isn't the whole point that selective forces do and indeed must select organisms with traits that are "better" in some sense, _up to_ physical limitations? You might have to forgive some naif's coarseness there but I couldn't be too too far off, mm? And isn't the point of the rational cognition program to explain cognitive processes as "as close to optimal as we can get under the circumstances"? Either on an organism level or an evolutionary level, different cases may be different (bug me for examples). In other words the "brute causal forces" you refer to must warrant some explanation for being one way and not another. It seems to me the only useful dichotomy is the phenotype-genotype level dichotomy, in one way or another things must be the way they are for a reason. Unless you reject this, I claim that you are merely elaborating the details of the rational cognition program.
ReplyDelete
Replies
NorbertNovember 13, 2013 at 10:38 AM
Ewan and Alex rightly press me on what I mean by rational cognition. Here's a shot: I take the program to be part of the continuation of a long tradition that sees induction as the key to understanding cognition. Hume is the poster pinup here. The aim of theories of induction was to model how scientific deliberation worked; how data collection and evaluation leads to truth. the idea was (and is) that how data is sampled, how it is organized and how it is used to evaluate alternatives plays a major part in explaining why some investigations lead to truth and some don't. In this context, for example, induction is contrasted with abduction and learning is contrasted with growth and gradual learning with one trial learning. Abduction, one trial learning, growth are NOT rational processes in the normal sense of the term. So, one way of taking my point is that I want to know how the Bayesian Rational Cognition program fits with these ideas. One answer is that it is at right angles to it. Another is that it abstracts away from it. Another is that it embodies it. I have been assuming that it sees itself as part of this tradition (hearing Xu e.g. start with a contrast of the rationalist and Empiricist traditions and claiming that we can eat our two cakes etc suggested that part of the sell of this approach is the traditional one). However, I could be wrong. Others treat this "approach" as effectively a suggestion for a normal form notation without making ANY empirical claims. Is this indeed the whole point? Let's talk Bayes because it's neutral wrt any questions we will be interested in? Or is it that Bayes has content and it is the right tool for the job of explaining development/acquisition. In which case I want to know what it is about Bayes that makes it such. What does IT bring to the table empirically? So when I take it that induction is not the right way of thinking about a problem, I mean it as seen against the great tradition. It is not merely a technical question.

Let me make this point another way: is 'rational' in 'rational cognition' like 'significant' in 'statistically significant'? As we all know statistically significant results can be trivial.
ReplyDelete
Replies
John K PateNovember 13, 2013 at 10:01 PM
I think the relevance of Bayesian methods comes down to whether you think there is uncertainty in the process of language acquisition. Advocates of "deterministic" triggering-based algorithms, which never make a learning mistake (and so are really "inerrant" rather than simply "deterministic") are really proposing that there is no uncertainty in the acquisition process: if we know ahead of time certain relationships between words and categories, the grammar can be straightforwardly decoded from the input.

However, if we accept that there is uncertainty involved in some aspect of language acquisition, then probabilistic models are useful, because they measure uncertainty about the values of hypothesized variables (and Bayesian models are just probabilistic models that represent uncertainty about the model parameters). For example, if we think that children use an inerrant triggering algorithm for learning syntax once they are confident in the part of speech for each word, but that there is uncertainty about the mapping between parts of speech and words, we could use a Bayesian model to measure when there is good evidence for particular sequences of parts of speech.
ReplyDelete
Replies
ewanNovember 15, 2013 at 8:42 AM
There are two cases where people say we should move beyond the difference between X and Y. In one, X versus Y is a false dichotomy. In the other, the question of X versus Y is in simply ill-posed. This is one of the second sorts of cases. It is often useful to go to great lengths to try and precisify their intuitions and those of others, but, frankly, not when there are whole mountains of precisely theory that make genuinely meaningful distinctions about learning...
ReplyDelete
Replies

Add comment