Faculty of Language: November 2013

Tuesday, November 26, 2013

Linguistic appropriate gifts

Thanksgiving is in a couple of days and if you are like me (unlikely, I know) you are starting to think of what presents to get friends and relatives for the holidays. You are also probably starting to wonder how best to answer the question “what’s a linguist do?” when your parents, siblings, nieces, nephews, etc. introduce you to their circle of friends who do understandable things like podiatry and necromancy. You need a spiel and a plan. Here are mine. After mumbling a few things about fish swimming, birds flying and people speaking and saying that I study humans the way Dylanologists study Dylan and archeologists study Sumer (by wading through their respective garbages) I generally do what most academics do: I thrust a book in their hands in the expectation (and hope) that they won’t read it but because they won’t guilt will pre-empt a similar question next time we meet. To work effectively, this book must be carefully selected. Not just anything will do. It must be such that were it read it would serve to enlighten (here I worry less about the immediate recipient and more about some possible collateral damage, e.g. a young impressionable mind picking it up precisely because her/his parental units have disdained it). And just as important, it doesn’t immediately bore you to tears.

Such books do exist, happily. For example, there’s the modern classic by Steve Pinker, The Language Instinct (here). It’s a pretty good intro to what linguists do and why. It’s not tech heavy and it’s full of pretty good jokes and though I would have preferred a little more pop-sci of the Scientific American variety (the old Sci-Am where some actual science was popularized, rather than the science light modern mag), I have friends who read Steve’s book and walked away educated and interested.

A second excellent volume, but for the more ambitious, as it gets into the nuts and bolts of what we do albeit in a very accessible way, is Ray Jackendoff’s Patterns in the Mind (here). One of the best things about the book is the distinction that Ray makes a big deal of right at the start between a pattern matcher and a pattern generator. As he points out, there are an unbounded number of linguistic patterns. A finite set of templates will not serve. In other words, as far as language is concerned minds are not pattern matchers at all (suggesting that the book was mistitled?). This distinction between generative systems and pattern matching systems is very important (see here for some discussion) and Ray does an excellent job of elaborating it. He also gets his hands dirty explaining some of the technology linguists use, how they use it and why they do. It’s not a beach read, but with a little effort, it is user friendly and an excellent example of how to write pop-ling for the interested.

A third good read is Mark Baker’s The Atoms of Language (here). Everyone (almost everyone, me not so much) is fascinated by linguistic diversity and typology. Mark effectively explains how beneath this linguistic efflorescence there are many common themes. None of this will be news to linguists, but it will be eye opening to anyone else. Family/friends who read this and mistake what you do as related to what Mark describes will regard you in a new more respectful light. I would recommend reading this book before you give it as a gift (indeed, read some of the technical papers too) for those who read it will sometimes follow up with hard questions about typology thinking that you, another linguist, will have some answers. It pays to have some patter at your disposal to either further enlighten (or thoroughly confuse and hence disarm) your interlocutor and if you are as badly educated as I am about these matters a little defensive study is advised. The problem with Mark’s book (and this is a small one for the recipient but not for the giver) is that it is a little too erudite and interesting. Many linguists just don’t know even a tenth of what Mark does but this will likely not be clear to the neophyte giftee. The latter’s misapprehension can become your embarrassment so be warned! Luckily, most who you would give this book to can be deterred from asking too many questions by mention of topics like the Cinque Hierarchy, macro vs micro variation, cartography or the Universal Base Hypothesis (if pressed, throw in some antisymmetry stuff!). My advice: read a couple of papers by Mark on Mohawk or get cartographic before next meeting the giftee that might actually read your present.

There are other nice volumes to gift (or re-gift as the case may be). There’s Charles Yang’s The Infinite Gift (here) if your giftee’s tastes run to language acquisition, there is David Lightfoot’s The Language Lottery (here) if a little language change might be of interest and, somewhat less linguistiky but nonetheless a great read (haha), Stan Dehaene’s Reading in the Brain (here). And there are no doubt others that I have missed (sorry).

Before ending, let me add one more to the list, one that I confess to only having recently read. As you know, Stan Dehaene recently was at UMD to give the Baggett lectures. In the third one, he discussed some fMRI work aimed at isolating brain regions that syntax lights up (see here for some discussion). He mentioned that this work benefitted from some earlier papers by Andrea Moro (and many colleagues) using Jabberwocky to look for syntax sensitive parts of the brain. This work is reprised in a very accessible popular book The Boundaries of Babel (here). The first and third parts of the book go over pretty standard syntactic material in a very accessible way (the third part is more speculative and hence maybe more remote from a neophyte’s interests). The sandwiched middle goes over the fMRI work in slow detail. I recommend it for several reasons.

First, it lightly explains the basic physics behind PET and fMRI and discusses what sorts of problems these techniques are useful for and what limitations they have.

Second, it explains just how a brain experiment works. The “subtractive method” is well discussed and its limitations and hazards well plumbed. In contrast to many rah-rah for neuroscience books, Andrea appreciates the value of this kind of investigation without announcing that it is the magic bullet for understanding cog-neuro. In other words, he discusses how hard it is to get anything worthwhile (i.e. understandable) with these techniques.

Third, the experiments he reprises are really very interesting. They aim neurological guns at hard questions, viz. the autonomy of syntax and Universal Grammar. And there are results. It seems that brains do distinguish syntactic from other structure: “syntax can be isolated in hemodynamic terms” (144). It also seems that brains distinguish processes that are UG compatible from those that are not. In particular, the brain can sort out UG compatible rules in an artificial language from those that are not UG kosher. The former progressively activate Broca’s area while the latter deactivate it (see, e.g. p. 175). Andrea reports these findings with the proper degree of diffidence considering how complex the reasoning is. However, it’s both fun and fascinating to consider that syntactic principles are finding neurological resonances. If you (or someone you know) would be interested in an accessible entrée into how neuro methods might combine with some serious syntax, Andrea’s book is a nice launching point.

So, the holidays are once more upon us. Once again family and friends threaten to pry into your academic life. Be prepared!

Thursday, November 21, 2013

Have MOOCs peaked already?

I have been somewhat skeptical that MOOCs are the world changing innovations that supporters have supposed them to be. I have tended to regard the claims that they are "game changers" that would do to universities what online news sources have done to the press (or what online porn has done to the salacious magazine industry) as hype generated by those that want to put their hands on your wallet, or, even worse, the latest new thing that college administrators intend to use to "cut costs" (no doubt in part, my cynical self tells me, to save their own perks and salaries). I have noted that the evidence, such as it is, does not yet indicate that MOOCs are better at delivering a quality education at a good price than old style correspondence courses were and, so far as I can tell, the latter did not serve to upend the university system.

None of this is intended to demean the wonders of online resources. I LOVE them. I like being able to get online courses, I love being able to avail myself of the tons of excellent material more knowledgeable souls have graciously placed online FREE for my benefit. The world is a better place for this. However, I still think that the old-fashioned educational system, where learning is largely a kind of group hug, is close to optimal because it embodies the human fact that education is a strongly social enterprise. Solo efforts can and do occur, but for real education to take place it really helps to have a community that is up close and personal, in a real and not a virtual way.

Why do I repeat this again here? Well, Josh Falk was kind enough to send me this fascinating interview with Sebastian Thrun, the founder of Udacity. Here's Josh's rather good take on the interview:

Sebastian Thrun, the cofounder of a major MOOC company, offers a moderately pessimistic view of his own product.

Some choice quotes:

"We were on the front pages of newspapers and magazines, and at the same time, I was realizing, we don't educate people as others wished, or as I wished. We have a lousy product."

"The sort of simplistic suggestion that MOOCs are going to disrupt the entire education system is very premature."

"We're not doing anything as rich and powerful as what a traditional liberal-arts education would offer you."

There's a lot more to the interview than that, and some of his points are subtler than these cherry-picked quotes would suggest, but it's interesting that even someone who runs this type of business is expressing significant doubts.

The interview has gotten lots of collateral play. Thrun is clearly a serious person, and not just someone out to make a buck. MOOCs may in fact change the world, but right now, the momentum is coming not from their obvious educational superiority but from the large amounts of money to be made in ~~destroying~~ reforming the current university system. IMO, the contest between these different educational platforms will not be fought out in the marketplace where consumers (aka students/parents) vote with their dollars and spend on cheaper MOOC educations rather than costly four year colleges. Rather, right now the strategy of MOOC advocates is to convince Higher Ed to throw in the towel and go MOOC so as to prevent later MOOCification down the road. And the convincing is being done by those that have a lot to gain if they win. In this regard, the dialectic is similar to what we see from the Peterson Foundation wrt Social Security: cut it now because we might need to cut it later. The bottom line? When you read the MOOC hype ask that age old but still relevant question, cui bono? Who benefits?

Tuesday, November 19, 2013

Bayesian claims?

Last week I had two Bayes moments: the first was a vigorous discussion of the Jones and Love paper with computational colleagues with a CS mind set and the second was a paean delivered by Stan Dehaene in his truly excellent Baggett Lectures this year (here). These conversations were quite different in interesting ways.

My computational colleagues, if I understood them correctly (and this is a very open question), saw Bayes as providing a general formal framework useful for the type of problems cognitive neuroscientists encounter. Importantly, on this conception, Bayes is not an empirical hypothesis about how minds/brains compute but an empirically neutral notation whose principal virtue is that it allows you to state matters more precisely than you can in simple English. How so? Well, minds/brains are complicated and having a formal technology that can track this complexity in some kind of normal form (as Bayes does) is useful. On this view, Bayes has all the virtues (and charms?) of double entry bookkeeping.[1] On this view, every problem is amenable to a Bayesian analysis of some kind so there is no way for a Bayes approach as such to be wrong or inapposite, though particular proposals can be better or worse. In short, Bayes so considered is more akin to C++ (an empirically neutral programming language) than to Newton’s mechanics (a description of real world forces).

There is a contrasting view of Bayes that is prevalent in some parts of the cog-neuro world. Here Bayes is taken to have empirical content. In contrast to the first view, its methods can be inappropriate for a given problem and a Bayes approach as such can even be wrong if it can be shown that its design requirements are not met in a given problem domain. On this view, Bayes is understood to be a description of cog-neuro mechanisms and describing a system as Bayesian is to attribute to that system certain distinctive properties.

These two conceptions could not be more different. And I suspect that the move between these two conceptions has muddied the conceptual landscape. Empirically minded cognitive neuroscientists are attracted to the second conception for it commits empirical hostages and makes serious claims (I review some of these below). CS types seem attracted to the first conception precisely because it provides a general notation for dealing with any computational problem and it does so by being general and thus bereft of any interesting empirical content.[2] Whichever perspective you adopt, it’s worth keeping them separate.

I just read a very good exposition of the second conception of Bayes-as-mechanism by O’Reilly, Jbabdi and Behrens (OJB) (here).[3] OJB is at pains (i) to show what the basic characteristics of a Bayes system are, (ii) to illustrate successful cases where the properties so identified gain explanatory purchase within cog-neuro and (iii) to show where the identified properties are not useful in explaining what is going on. Step (iii) illustrates that OJB takes Bayes to be making a serious empirical claim. Here are some details, though I recommend that you read the paper in its entirety as it offers a very good exposition of why neuroscientists have become interested in Bayes, and it’s not just because of (or even mainly due to) its computational explicitness.

OJB identifies the following characteristics of a Bayesian system (BS):

1. BSs represent quantities in terms of probability density functions (PDFs). These represent an observer’s uncertainty about a quantity (1169).

2. BSs integrate information using precision weighting (1170). This means that sources of information are combined in a way that is sensitive to their relative reliability (as measured probabilistically): “It is a core feature of Bayesian systems that when sources of information are combined, they are weighted by their relative reliability (1171).”

3. BSs integrate new information and prior information according to their relative precisions. This is analogous to what occurs when several sources of new information are combined (e.g. visual and haptic info). As OJB puts it: “the combined estimate is partway between the current information and the prior, with the exact position depending on the relative precision of the current estimate and the prior (1171).”

4. BSs represent the values of all parameters in the model jointly, viz. “It is a central characteristic of fully Bayesian models that they represent the full state space (i.e. the full joint probability distribution across all parameters) (1171).”
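Features 2 and 3 can be made concrete with a minimal sketch. It assumes, purely for illustration, that each source of information is summarized as a Gaussian with a mean and a variance, so that precision is just the inverse of the variance. The function name `combine` and the numbers are hypothetical and are not taken from OJB.

```python
def combine(mean_a, var_a, mean_b, var_b):
    """Precision-weighted fusion of two Gaussian estimates (illustrative only)."""
    prec_a = 1.0 / var_a          # precision = inverse variance
    prec_b = 1.0 / var_b
    prec = prec_a + prec_b        # precisions add when sources are combined
    mean = (prec_a * mean_a + prec_b * mean_b) / prec
    return mean, 1.0 / prec       # combined mean and combined variance

# Hypothetical numbers: a vague prior and a more reliable current observation.
prior = (0.0, 4.0)         # mean, variance
observation = (10.0, 1.0)  # mean, variance
posterior_mean, posterior_var = combine(*prior, *observation)
print(posterior_mean, posterior_var)
```

The combined estimate lands partway between the prior and the observation, closer to whichever has the smaller variance, which is the behavior the quoted passages describe. The same function covers both cases in the list: two new sources (e.g. visual and haptic) or one new source and a prior.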

In sum: a BS represents info as PDFs, does precision weighted integration of old and new info, and fully represents and updates all info in the state space. If this is what a BS is, then there are several obvious ways that a given proposal can be non-BS. A useful feature of OJB is that it contrasts each of these definitional features with a non-BS alternative. Here are some ways that a proposed system can be non-BS.

· It represents quantities as exact (contra 1). For example, in the MSTG paper here, learners represented their word knowledge with no apparent measure of uncertainty in the estimate.

· It combines info without sensitivity to its reliability (contra 2). Thus, e.g. in the MSTG paper information is not combined probabilistically by considering the weighting of the prior and input. Rather contrary info leads one to drop the old info completely and arbitrarily choose a new candidate.

· It uses a truncated parameter space or does not compute values for all alternatives in the space (contra 4). Again, in the MSTG paper the relevant word-meaning alternatives are not updated at all as only one candidate at a time is attended to.
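The contrast drawn in these bullets can be sketched in a few lines. This is a toy illustration under stated assumptions, not the MSTG model itself: the candidate meanings, the likelihood values and both function names are hypothetical, and the choice to draw the replacement guess from the current scene is an assumption made only to keep the example small.

```python
import random

def bayes_update(prior, likelihood):
    """BS-style step: every hypothesis in the space gets a new probability."""
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnorm.values())
    return {h: p / total for h, p in unnorm.items()}

def single_guess_update(guess, scene, rng):
    """Non-BS step: keep the one current guess if the scene is consistent
    with it; otherwise drop it entirely and pick a new candidate at random.
    No uncertainty is stored and no other alternative is tracked."""
    if guess in scene:
        return guess
    return rng.choice(sorted(scene))

# Hypothetical word-learning setup: three candidate meanings for one word.
prior = {"dog": 1 / 3, "cat": 1 / 3, "ball": 1 / 3}
# Assumed likelihoods for a scene in which a dog and a cat are present.
likelihood = {"dog": 0.45, "cat": 0.45, "ball": 0.10}
posterior = bayes_update(prior, likelihood)

rng = random.Random(0)
kept = single_guess_update("dog", {"dog", "cat"}, rng)       # guess survives
switched = single_guess_update("ball", {"dog", "cat"}, rng)  # guess replaced
print(posterior, kept, switched)
```

The first learner carries a graded distribution over all three meanings from trial to trial. The second carries a single exact guess and nothing else, which is what makes it non-BS along each of the dimensions listed above.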

The empirical Bayes question then is pretty straightforward conceptually: to what degree do various systems act in a BS manner and when a system deviates along one of the three dimensions of interest, how serious a deviation is it? Once again, OJB offer useful illustrations. For example, OJB notes that multi-sensory integration looks very BSish. It represents incoming info PDFly and does precision weighted integration of the various inputs. Well, almost. It seems that some modalities might be weighted more than “is optimal” (1172). However, by and large, the BS model captures the central features of how this works. Thus, in this case, the BS model is a reasonable description of the relevant mechanism.

There are other cases where the BS idealization is far less successful. For example, it is well known that “adding parameters to a model (more dimensions to the model) increases the size of the state space, and the computing power required to represent and update it, exponentially (1171).” Apparently problems arise even when there are only “a handful of dimensions of state spaces” (1175). Therefore, in many cases, it seems that behavior is better described by “semi-Bayesian models,” (viz. with truncated state spaces) or “non-Bayesian models” (viz. in which some parameters are updated and some ignored) (1175). Or models in which “variance-blind heuristics” substitute for precision weighted integration or “rather than optimizing learning by integrating information over several trials, participants seem to use only one previous exemplar of each category to determine its ‘mean’ (1175).”
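The exponential growth mentioned in the first quote is easy to see with a back-of-the-envelope sketch. It assumes, for illustration only, that the joint distribution is stored on a discretized grid with a fixed number of values per parameter; the figure of 20 values and the name `grid_size` are hypothetical.

```python
def grid_size(values_per_parameter, n_parameters):
    """Number of cells needed to represent the full joint state space
    when each parameter is discretized into the same number of values."""
    return values_per_parameter ** n_parameters

# Hypothetical resolution: 20 values per parameter.
for n in (1, 2, 5, 10):
    print(n, grid_size(20, n))
```

At this resolution five parameters already require 3,200,000 cells and ten require about 10^13, which gives some feel for why a handful of dimensions can be enough to make a fully joint representation implausible.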

OJB describe various other scenarios of interest all with the same aim: to show how to take Bayes seriously as a substantive piece of cog-neuro science. It is precisely because not everything is Bayesian that, in arguing that a mechanism is Bayesian, one might get some explanatory insight from the classification. OJB take Bayes to be a useful description for some class of mechanisms, ones with the three basic characteristics noted above: PDF representations, precision weighted integration and fully jointly specified state space.

OJB points out one further positive feature of taking Bayes in this way: you can start looking for neural mechanisms that execute these functions, e.g. how neurons or populations of neurons might allow for PDF like representations and their integrations. In other words, a mechanistic interpretation of Bayes leads to an understandable research program, one with real potential empirical reach.

Let me end here. I understand the OJB version of Bayes. I am not sure how much it describes linguistic phenomena, but that is an empirical question, and not one that I will be well placed to adjudicate. However, it is understandable. What is less understandable are versions that do not treat Bayes as a hypothesis about mental/neural mechanisms. If Bayes is not this, then why should we care? Indeed, what possible reason can there be for not taking Bayes in this mechanistic way? Is it the fear that it might be wrong so construed? Is being wrong so bad? Isn’t it the aim of cog-neuro to develop and examine theories that could be wrong? So, my question to non-mechanistic Bayesians: what’s the value added?

[1] This sounds more dismissive than it should perhaps. The invention of double entry bookkeeping was a real big deal and if Bayes serves a similar function, then it is nothing to sneeze at. However, if this is what practitioners take its main contribution to be, they should let us know in so many words.

[2] I believe that one can see an interesting version of these contrasting views in the give and take between Alex C and John Pate in the comments section here.

[3] The paper (How can a Bayesian approach inform neuroscience) is behind a paywall. Sorry. If you are affiliated with a university you should be able to get to it pretty easily as it is a Wiley publication.

Tuesday, November 12, 2013

On playing nicely together

UMD has a cognitive science lecture series with colloquia delivered every other Thursday afternoon to a pretty diverse audience of philosophers, linguists, psychologists, and the occasional computer scientist, neuroscientist and mathematician. The papers are presented by leading lights (i.e. those with reputations). The last dignitary to speak to us was Fei Xu from Berkeley psych and her talk was on how rational constructivism, a new take on old debates, will allow us to get beyond the Empiricism/Rationalism (E/R) dualism of yore. Here’s the abstract:

The study of cognitive development has often been framed in terms of the nativist/empiricist debate. Here I present a new approach to cognitive development – rational constructivism. I will argue that learners take into account both prior knowledge and biases (learned or unlearned) as well as statistical information in the input; prior knowledge and statistical information are combined in a rational manner (as is often captured in Bayesian models of cognition). Furthermore, there may be a set of domain-general learning mechanisms that give rise to domain-specific knowledge. I will present evidence supporting the idea that early learning is rational, statistical, and inferential, and infants and young children are rational, constructivist learners.

I had to leave about five minutes before the end of the presentation and missed the question period. However, the talk did get me thinking about the issues Xu mentioned in her abstract. I cannot say that her remarks found fertile ground in my imagination, but they did kick start a train of thought about whether the debate between Es and Rs is worth resolving (or maybe we should instead preserve and sharpen the points of disagreement) and what it would mean to resolve it. Here’s what I’ve been thinking.

When I started in this business in the 1970s, the E/R debate was billed as the “innateness controversy.” The idea was that Rs believe that there are innate mental structures whereas Es don’t. Of course, this is silly (as was quickly observed and acknowledged), for completely unstructured minds cannot do anything, let alone think. Or, more correctly, both Es and Rs recognize that minds generalize, the problem being to specify the nature of these generalizations and the mechanisms that guide it (as mush doesn’t do much generalizing). If so, the E/R question is not whether minds have biologically provided structure that supports generalization but the nature of this structure.

This question, in turn, resolves itself into two further questions: (i) the dimensions along which mental generalizations run (i.e. the primitive features minds come with) and (ii) the procedures that specify how inputs interact with these features to fix our actual concepts. Taking these two questions as basic allows us to recast the E/R debate along the following two dimensions: (a) how “specific” are the innately specified features and (b) how “rational” are the acquisition procedures. Let me address these two issues in turn.

Given that features are needed, what constitutes an admissible one? Es have been more than happy to acknowledge sensory/perceptual features (spf), but have often been reluctant to admit much else. In particular, Es have been averse to cognitive modularity as this would invite admitting domain specific innate features of cognitive computation. Spfs are fine. Maybe domain general features are ok (e.g. “edge,” “animate,” etc.). But domain specific features like “island” or “binder” are not.

A necessary concomitant of restricting the mental feature inventory in this Eish way is a method for building complexes of features from the simple givens (i.e. a combinatorics). Es generally recognize that mental contents go beyond (or appear to go beyond) the descriptive resources of the primitives. The answer: what’s not primitive is a construct from these primitives. Thus, in this sense, constructivism (in some form) is a necessary part of any Eish account of mind.

A second part of any Eish theory is an account of how the innately given features interact with sensory input to give rise to a cognitive output. The standard assumption has been that this method of combination is rational. ‘Rational’ here means that input is evaluated in a roughly “scientific” manner, albeit unconsciously, viz. (i) the data is carefully sifted, organized and counted and (ii) all cognitive alternatives (more or less) are evaluated with respect to how well they fit with this regimented data. The best alternative (the one that best fits the data) wins. In other words, on this conception, acquisition/development is an inductive procedure not unlike what we find more overtly in scientific practice[1] (albeit tacit) and it rests on an inductive logic with the principles of probability and statistics forming the backbone of the procedure.[2]

Many currently fashionable Bayesian models (BM) embody this E vision. See, for example, the Xu abstract above. BMs assume a given hypothesis space possibly articulated, i.e. seeded with given “prior knowledge and biases,”[3] plus Bayes Rule and maybe “a set of domain-general learning mechanisms.” These combine “statistical information” culled from the environmental input together with the “prior knowledge and biases” in a “rational manner” (viz. Bayesian manner) to derive “domain-specific knowledge” from “domain-general learning mechanisms.”
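For readers who want the rule itself, here is the standard statement of Bayes Rule for a discrete hypothesis space. The symbols are introduced here for illustration and are not taken from the abstract: $h$ ranges over the hypotheses in the space $H$, $d$ is the input data, $P(h)$ encodes the “prior knowledge and biases,” and $P(d \mid h)$ carries the “statistical information.”

```latex
P(h \mid d) \;=\; \frac{P(d \mid h)\, P(h)}{\sum_{h' \in H} P(d \mid h')\, P(h')}
```

Note that the sum in the denominator runs over every hypothesis in $H$, so the rule can only ever redistribute belief among alternatives that the hypothesis space already provides.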

There are several ways to argue against this E conception and Rs have deployed them all.

First, one can challenge the assumption that features be restricted to spfs or the domain general ones. Linguists of the generative stripe should all be familiar with this kind of argument. Virtually all generative theories tell us that linguistic competence (i.e. knowledge of language) is replete with very domain specific notions (e.g. specified subject, tensed clause, c-commanding antecedent (aka binder), island, PRO, etc.) without which a speaker’s attested linguistic capacity cannot be adequately described.

Second, one might argue against the idea that there is a lot of feature combinatorics going on. Recall, that if one restricts oneself to a small set of domain general features then one will need to analyze concepts that (apparently) fall outside this domain as combinations of these given features. Jerry Fodor’s argument for the innateness of virtually all of our lexical concepts is an instance of this kind of argument. It has two parts. First, it observes that if concept acquisition is statistical (see below) then the hypothesis space must have some version of the acquired concept as a value in that space (i.e. it relies on the following truism: if acquiring concept C involves statistically tracking Cish patterns then the tracker (i.e. a mind) must come pre-specified with Cish possibilities). Second, it argues that there is no way of defining (most of) our lexical concepts from a small set of lexical primitives. Thus, the Cish possibilities must be coded in the hypothesis space as Cish as such and not congeries of non-Cish primitives in combination. Put the two assumptions together and one gets that ‘carburetor’ must be an innate concept (specified as such in the hypothesis space).[4] This form of argument can be deployed anywhere and where it succeeds it serves to challenge the important Eish conception of a relatively small/sparse sensory/perceptual and/or domain general set of primitive features underlying our cognitive capacities.

Third, one can challenge the idea that acquisition/development is “rational.”[5] [6] This is perhaps the most ambitious anti E argument for it is an assumption long shared by Rs as well.[7] One can see the roots of this kind of criticism in the distinction between triggering stimuli and formative stimuli. Rs have long argued that the relation between environmental input and cognitive output is less a matter of induction than a form of triggering (more akin to transduction than induction). Understanding ‘trigger’ as ‘hair trigger’ provides a way of challenging the idea that acquisition is a matter of induction at all. A paradigm example of triggering is one trial learning (OTL).[8]

To the degree that OTL exists, it argues against the idea that acquisition is “rational” in any reasonable sense. Minds do not carefully organize and smooth the incoming data and minds do not incrementally evaluate all possible hypotheses against the data so organized in a deliberate manner. If OTL is the predominant way that basic cognitive competence arises, it would argue for re-conceptualizing acquisition as “growth” rather than “learning,” as Chomsky has often suggested. Growth is no less responsive to environmental inputs than learning is, but the responsiveness is not “rational,” just brute causal. It is an empirical question, albeit a very subtle one, whether acquisition/development is best modeled as a rational or a brute causal process. IMO, we (indeed I!) have been too quick to assume that induction (aka: learning) is really the only game in town.[9]

Let me end: there is a tendency in academic/intellectual life to split the difference between opposing views and to find compromise positions where we conclude that both sides were right to some degree. You can see this sentiment at work in the Xu abstract above. In the interest of honesty, I confess to having been seduced by this sort of gentle compromise myself (though many of you might find this hard to believe). This live and let live policy enhances collegiality and makes everyone feel that their work is valued and hence, valuable (in virtue of being at least somewhat right). I think that this is a mistake. This attitude serves to blur valuable conceptual distinctions, ones that have far reaching intellectual implications. Rather than bleaching them of difference, we should enhance the opposing E/R conceptions precisely so that we can better use them to investigate mental phenomena. Though there is nothing wrong with being wrong (and work that is deeply wrong can be very valuable), there is a lot wrong with being namby-pamby. The E/R opposition presents two very different conceptions of how minds/brains work. Maybe the right story will involve taking a little from column E and a little from column R. But right now, I think that enhancing the E/R distinctions and investigating the pure cases is far more productive. At the very least it serves to flush out empirically substantive assumptions that are presupposed rather than asserted and defended. So, from now on, no more Mr. Nice Guy! [10]

[1] Actually, whether we find this in scientific practice is a topic of pretty extensive debate.

[2] There is a decision procedure required as well (e.g. maximize expected utility), but I leave this aside.

[3] Let me repeat something that I have reiterated before: there is nothing in Bayes per se that eschews pretty abstract and domain specific information in the hypothesis space, though as a matter of fact such has been avoided (in this respect, it’s a repeat of the old connectionism discussions). One can be a R-Bayesian, though this does not seem to be a favored combo. Such a creature would have Rish features in the hypothesis space and/or domain specific weightings of features. So as far as features go, Bayesians need not be Es. However, there may still be an R objection to framing the questions Bayes-wise, as I discuss below.

[4] I discuss Jerry’s argument more fully here and here. Note that it is important to remember that this is an argument about the innately given hypothesis space, not about belief fixation. Indeed, Jerry’s original argument assumed that belief fixation was inductive. So, the concept CARBURETOR may be innate but fixing the lexical tag ‘carburetor’ onto the concept CARBURETOR was taken to be inductive.

[5] Fodor’s original argument did not do this. However, he did consider this in his later work, especially in LOT2.

[6] I am restricting discussion here to acquisition/development. None of what I say below need extend to how acquired knowledge is deployed on line in real time, in, e.g. parsing. Thus, for example, it is possible that humans are Bayesian parsers without their being Bayesian learners. Of course, I am not saying that they are, just that this is a possible position.

[7] E.g. the old concept of children as “little linguists” falls into this fold.

[8] See here for some discussion and further links.

[9] There are other ways of arguing against the rationality assumption. Thus, one can argue that full rationality is impossible to achieve as it is computationally unattainable (viz. the relevant computation is intractable). This is a standard critique of Bayesian models, for example. The fall back position is some version of “bounded” rationality. Of course, the tighter the bounds the less “rational” the process. Indeed, in discussions of bounded rationality all the interesting action comes in specifying the bounds, for if these are very narrow, the explanatory load shifts from the rational procedures to the non-rational bounds. Economists are currently fighting this out under the rubric of “rational expectations.” One can try to render the irrational rational by going Darwinian; the bounds themselves “make sense” in a larger optimizing context. Here too brute causal forces (e.g. Evo-Devo, natural physical constraints) can be opposed to maximizing selection procedures. Suffice it to say, the E/R debate runs deep and wide. Good! That’s what makes it interesting.

[10] For the record, lots of the thoughts outlined above have been prompted by discussions with Paul Pietroski. I am not sure if he wishes to be associated with what I have written here. I suspect that my discussion may seem to him too compromising and mild.