Tuesday, February 26, 2013

Fodor on Concepts Again

There have been two kinds of objections to Fodor’s argument in the thread of ‘Fodor on Concepts.’ Neither of them really engages with his arguments. Let me quickly review them.

The first comes in two flavors. The first is that if he is right that implies that the concept ‘carburetor’ is innate and this is batty. The second is a variant of this, but invokes Darwin and the holy spirit of biological plausibility, to make the same point: that putting ‘carburetor’ concepts into brains is nuts. 

Let’s say we agree with this.  This implies that there is something wrong with Fodor’s argument. After all it has led to an unwanted conclusion.  If so, what is wrong? Treat Fodor as the new Zeno and let’s clarify the roots of the paradox. This would be valuable and we could all learn something. After all, the argument rests on two simple premises: that if learning is a species of induction then there must be a given hypothesis space and that this hypothesis space cannot itself be learned for it is a precondition of induction, hence not learned, viz. innate. The second premise is that there are a lot more primitive concepts in the hypothesis space than one might have a priori imagined. In particular, if you assume that words denote concepts then given the absence of decomposition for most words, then there are at least as many concepts as words. Let’s consider these premises again.

The first is virtually apodictic. Why? Because all inductive theories to date have the following form: of the alternatives a1…an, choose the ai that best fits the data. The alternatives are given, the data just prunes them down to the ones that are good matches with the input.  If this is so, Fodor notes, then in the context of concept acquisition, this means that a1…an are innate in the sense of not there as a result of inductive processes, but available so that induction can occur.  Like I said, this is not really debatable unless you come up with a novel conceptualization of induction.
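To make the shape of the first premise concrete, here is a minimal sketch of "of the alternatives a1…an, choose the ai that best fits the data." Everything in it is a hypothetical illustration rather than anything Fodor wrote: the names `best_fit`, `a1`, `a2`, `a3`, the integer observations, and the use of a simple agreement count as the measure of fit are all assumptions.

```python
from typing import Callable, Dict, List, Tuple

Datum = Tuple[int, bool]  # (observation, label); a toy data format assumed for illustration


def best_fit(alternatives: Dict[str, Callable[[int], bool]], data: List[Datum]) -> str:
    """Return the name of the GIVEN alternative that agrees with the most data."""
    def fit(name: str) -> int:
        return sum(alternatives[name](x) == label for x, label in data)
    return max(alternatives, key=fit)


# The hypothesis space is handed to the learner up front (hypothetical toy alternatives).
ALTERNATIVES = {
    "a1": lambda x: x % 2 == 0,  # "x is even"
    "a2": lambda x: x > 3,       # "x is greater than 3"
    "a3": lambda x: x % 3 == 0,  # "x is a multiple of 3"
}

# Data labelled by a rule that is inside the space (a2).
inside = [(1, False), (2, False), (4, True), (5, True), (6, True)]
# Data labelled by a rule that is NOT in the space ("x is prime").
outside = [(2, True), (3, True), (4, False), (5, True), (6, False)]

choice_inside = best_fit(ALTERNATIVES, inside)
choice_outside = best_fit(ALTERNATIVES, outside)
print(choice_inside, choice_outside)
```

Whatever the data look like, the learner returns one of the alternatives it was handed. When the labels come from a rule outside the space, it settles for the least bad given alternative; it never adds "x is prime" to the space.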

The second premise is empirical. It is logically possible that most of our concepts are complex combinations of simple ones, i.e. that most concepts are defined in terms of more primitive ones.  Were this so, then concept acquisition would be definition formation. Fodor has argued at great length that this is empirically false, at least if you take words to denote concepts. English words do not by and large resolve into definitions based on simpler concepts.  Again, this conclusion is not that surprising given the work of Austin and the other ordinary language philosophers.  They spent years showing that no two words mean the same thing. The failure of the Fodor-Katz theory of semantic markers pointed to the same conclusion as did the failure of cluster concepts to offer any enlightenment to word/concept meaning.  If most words are definitions based on simpler concepts nobody has really shown how they are.  Note that this does not mean that concepts fail to interrelate. It is consistent with this view that there are scads of implicational relations between concepts. Fodor is happy with meaning postulates, but they won’t suffice. We need definitions for only in this way can we get rid of what I would dub the “grandmother problem.” What is that?

How are you able to recognize your grandmother? One popular neuroscience theory is that your grandmother neuron lights up. Every “concept” has its own dedicated neuron. This would be false, however, if the concept of your grandmother were defined via other concepts. There wouldn’t have to be dedicated grandmother neurons, for the concept ‘grandmother’ would be entirely reducible to the combination of other concepts. However, this is only true if the concept is entirely reducible to the other primitive concepts and only a definition achieves this. So, either most concepts are definable or we must accept that the set of basic concepts is at least as large as any given lexicon, i.e. the concept for ‘carburetor’ is part of the innate hypothesis space.

I sympathize with those who find this conclusion counter-intuitive. However, I have long had problems getting around the argument. The second premise is clearly the weaker link. However, given that we know how to show it to be false, viz. provide a bunch of definitions for a reasonable subset of words, and the fact that this has proven pretty hard to do, it is hard to avoid the conclusion that Fodor is onto something.

Jan Koster has suggested a second way around Fodor’s argument, but it is not one that I understand very well. He suggests that the hypothesis space is itself context sensitive, allowing it to be sensitive to environmental input. Here are two (perhaps confused) reactions: (i) in any given context, the space is fixed and so we reduce to Fodor’s original case. I assume that we don’t fix the values of these contextual indices inductively. Rather, there is a given set of context parameters which, when fixed by context, specify non-parametric values. Fixing these parameters contextually is itself brute causal, not inductive. If this is so, I don’t see how Jan’s proposal addresses Fodor’s argument. (ii) As Alex Drummond (p.c.) observed: “It sure seems like we don’t want context to have too much of an influence on the hypothesis space, because it would make learning via hypothesis testing a bit tricky if you couldn't test the same hypothesis at different times in different situations.” He is right. Too much context sensitivity and you could never step into the same conceptual stream twice. Not a good thing if you are trying to acquire novel concepts via different environmental exposures.

Fodor has a pretty argument. If it’s false, it’s not trivially false. That’s what makes it interesting, very interesting. Your job, Mr Hunt, should you decide to accept it, is to show where it derails.

Monday, February 25, 2013

Fodor on Concepts

There has been a bit of a kerfuffle in the thread to What’s Chomsky Thinking Now concerning Fodor’s claim that all of our concepts are innate.  Unfortunately, with the exception of Alex Drummond, most who have participated in the discussion appear unacquainted with Fodor’s argument.  It has two parts, both of which are interesting. To help focus the indignation of his critics, I will outline them below as a public service.  Before starting however, let me share with you my own personal rule of thumb in these matters, one that I learned from Kuhn’s discussions of Aristotelian physics: when someone very smart says something that you think is obviously dumb then go back and reread it for it may be that you have thoroughly misunderstood the point. I am acquainted with Jerry Fodor. Let me assure you he is very smart. So let’s start.

As noted Fodor has a two pronged argument.  The first part (an excellent very short form of which can be found here 143ff) is an observation about learning as a form of inductive logic. Fodor distinguishes between theories of concept acquisition and theories of belief fixation. The latter is what theories of learning are about. Learning theories have nothing general to say about concept acquisition because, being inductive theories, they presuppose the availability of a set of basic concepts without which the inductive learning mechanism cannot get off the ground.  If this all sounds implausible, considering Fodor’s example will make his intent clear.

Consider someone learning a new word miv in a classical learning context. The subject is shown cards, some of which are miv and some non-miv. The subject is given a cookie whenever s/he correctly identifies the miv cards and is hit by a bolt of lightning when s/he fails (we want the reinforcement here to be very unambiguous). What does the subject do? According to any classical learning theory, s/he considers a hypothesis of the form “X is miv iff X is…”, the blank being filled in with a specification of the features that are criterial for being a miv. The data is then used to assess the truth of the hypotheses with various values of “…”. So if miv means “red and round” then the data will tend to confirm “X is miv iff X is red and round” and disconfirm everything else. This much Fodor takes to be obvious. If learning is a form of inductive inference (and, as he notes, there is no other theory of learning), then it takes the indicated form.
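The miv experiment can be sketched in a few lines. This is only an illustration under stated assumptions: the feature inventory `PRIMITIVES` (including the extra feature "large"), the restriction of the blank to conjunctions of features, and the function names are all hypothetical choices, not part of Fodor's example.

```python
from itertools import combinations, product

# Hypothetical feature inventory. It is fixed before any data arrive.
PRIMITIVES = ("red", "round", "large")


def hypothesis_space(primitives):
    """Every non-empty conjunction of primitives: the candidate fillers for the blank."""
    return [combo
            for size in range(1, len(primitives) + 1)
            for combo in combinations(primitives, size)]


def holds(hypothesis, card):
    """True if the card has every feature named in the hypothesis."""
    return all(card[feature] for feature in hypothesis)


def score(hypothesis, trials):
    """Number of trials on which 'X is miv iff X is <hypothesis>' gives the right answer."""
    return sum(holds(hypothesis, card) == is_miv for card, is_miv in trials)


# All eight possible cards, labelled by the experimenter's rule (miv = red and round).
cards = [dict(zip(PRIMITIVES, values)) for values in product([True, False], repeat=3)]
trials = [(card, card["red"] and card["round"]) for card in cards]

space = hypothesis_space(PRIMITIVES)
scores = {h: score(h, trials) for h in space}
best = max(space, key=lambda h: scores[h])
print(best, scores[best])
```

The data single out the conjunction of "red" and "round" as the only hypothesis that gets every card right. Notice what the data never do: they never touch `PRIMITIVES` or `hypothesis_space`. Those are inputs to the procedure, not outputs of it.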

Fodor then asks where do the hypotheses that are tested come from? In other words, where do the fillers of “…” come from? They are GIVEN. Inductive theories presuppose that the set of alternatives that the data filter are provided up front. Given a hypothesis space, the data (environmental input) can be used to assign each hypothesis a number (a probability) measuring how well that hypothesis fits the data. What inductive theories don’t do is provide the hypothesis space. Another way of making the same point is that what inductive logics (i.e. learning theories) do is explain how, given some input, the user of that logic should/does navigate the hypothesis space: where’s the best place to be given that the data has been such and so. However, if this is what inductive logics do (and, I cannot repeat this enough, all learning theories are species of inductive logics), then the field of concepts used by the inductive logic cannot itself be fixed by the inductive logic. Or as Fodor puts it (147):

You have to be nativistic about the conceptual resources of the organism because the inductive theory of learning simply doesn’t tell you anything about that – it presupposes it – and the inductive theory of learning is the only one we’ve got.

So, Fodor’s argument amounts to pointing out what everybody should be nodding in agreement with: no induction without a hypothesis space. If the inductive theory is a theory of learning, then the hypothesis space must be innate and that means that all the concepts used to define it must be innate as well. As I said, this part of the argument is apodictic, cannot be gainsaid and, in fact, never has been. Even Quine, a rather extreme associationist, agreed that everyone is a nativist to some degree for without some nativism (enough to define the hypothesis space) there can be no induction and hence no learning. Fodor’s point is to emphasize this and use it against theories that suggest that one can bootstrap one’s way from less conceptually complex systems of “knowledge” to more complex ones. If this means that one can expand one’s hypothesis space by learning and ‘learning’ means induction then this is impossible.[1]

None of this should be surprising or controversial. Controversy arises with respect to Fodor’s second prong of the argument. He takes the concepts words tag to be effectively atomic. Another way of making this point in the domain of language is that there is no lexical decomposition, or at least very very little. Why is this assumption so important? Because the relation between the input and the atomic features of the hypothesis space is causal, not inductive. You see a red thing and +red lights up. Pure transduction. Induction proceeds given this first step: count how many of the lit features are red+round vs red+not-round, vs green+round etc. So, for atomic features/concepts the relation between their “lighting up” and the environment is not one of learning (one doesn’t learn to light them up); it’s just a brute fact (they light up). So, and this is an important point, to the degree that most of our words denote atomic concepts (i.e. to the degree that there is no lexical decomposition) to that degree there is no interesting inductive theory of concept acquisition. Note, this does not preclude there being a possibly interesting causal theory, e.g. maybe being exposed to a miv is causally responsible for triggering the concept miv or maybe being exposed to a dax is causally responsible, or maybe being exposed to a miv right after birth is or while being snuggled by your mother etc. The causal triggers might conceivably be very complex and finding them may be very difficult. However, with respect to atomic features, one can only discover brute causal connections, not inductive ones. Fodor’s point is that we should not confuse them as they are very different. Recently Fodor has speculated that prototypes are causally implicated in triggering concepts, but he insists, rightly given his strong atomicity, that this relation is not inductive (See here).

To recap, the logic of the first argument is that primitive concepts cannot be “learned” as they are presupposed for learning to take place. This allows the possibility that one “learns” to combine these primitives in various ways and that’s what concept acquisition is.  Concept acquisition is just learning to form complex concepts. Fodor is conceptually happy with this possibility. It is logically possible that concept “acquisition” amounts to defining new concepts in terms of the primitive ones. As applied to words (which I am assuming denote concepts), it is logically possible that most/many words are complex definitions. Logically possible? Yes. Actually the case? No, or that’s what Fodor has been arguing for a very long time.

His arguments are almost always of the same form: someone proposes some complex definition for a term and he shows that it doesn’t work. Indeed, very few linguists, psychologists or philosophers have managed to provide any but a handful of purported definitions. ‘Bachelor’ may mean unmarried man, but as Putnam noted a long time ago, there are not many words like it.

Fodor is actually in a good position to understand this point for he along with Katz once investigated reducing meanings to feature trees. David Lewis derided this “markerese” approach to semantics (another instance of be careful what you hurl as it may boomerang back at you (see Paul on Harman on Lewis here)), but what really killed it was the realization that virtually all words bottomed out in terminal features referring to the very concept that the featural semantics was intended to explicate. So, e.g. the markerese representation for ‘cat’ ended up having a terminal CAT. This clearly did not move explanation forward, as Fodor realized.

So is Fodor right about definitions? Well, I am slightly less skeptical than he is about the virtues of decomposition. This said, I cannot find good examples showing him to be wrong. As the first part of his argument is unassailable, those that don’t like the conclusion that ‘carburetor’ is innate (i.e. a primitive of our conceptual hypothesis space) had better start looking for ways of defining these words in terms of the available primitives. If past history is any guide, they will fail. Definitions in terms of sense data have come and (happily) gone and cluster concepts, once considered seriously, have long been abandoned. There is a little industry in linguistics working on argument structure in the Hale-Keyser (HK) framework, but, at least from where I sit, Fodor has drawn significant blood in his debates with HK aficionados. Suffice it for now to repeat that this is where the action must be if Fodor is to be proven incorrect, and the ball is clearly not in his court. It is easy to say how to show that he is wrong, viz. show that most/many words denote complex concepts. Actually showing it has proven to be far more challenging.[2]

So that’s the argument. The first step is clearly correct. All the action concerns the second. One further point: there has been a lot of discussion in the thread to the effect that Fodor is advocating a nutty kind of nativism that eschews learning from the environment. As should be clear, this is simply false. If word learning is belief fixation then it can be as inductivist as you like. However, if word learning is concept acquisition then the question entirely revolves around the nature of the primitives, which everyone must take as innate and hence not acquired. Fodor’s bottom line is that hypothesis spaces are not acquired but presupposed and that as a matter of fact there is far less definition than one might have supposed. That’s the argument; fire away!

[1] Alex Clark mentioned Sue Carey’s recent book that appeared to consider this bootstrapping possibility. Gallistel reviewed her book making effectively this point that induction/learning cannot expand a hypothesis space (here). To repeat, all that such theories show is how to most effectively navigate this space given certain data.
[2] One interesting avenue that Paul has been exploring revolves around Frege’s notion of definition. For Frege definition changed a person’s cognitive powers. This is really interesting. Paul’s work starts from Jeff Horty’s discussion of Frege’s notion (here) and considers how to extend it to theories of meaning more generally (cf. here and here).

Friday, February 22, 2013

Acceptability and Grammaticality

Names are important for what you call something can affect how people understand it, no matter how many times you clarify your intentions. Chomsky has been the unfortunate recipient of considerable “criticism” launched against positions that he does not hold. I have discussed this elsewhere but it is worth reiterating for when it comes to criticizing his ideas, it appears that understanding them is hardly a prerequisite. The most insistent zombie criticism comes in frantic popular pieces that regularly announce that Universal Grammar has failed because language X doesn’t have property Y (Piraha and recursion fit this template to a T). The fact that Chomsky’s conception of Universal Grammar is not about languages but about the Faculty of Language (viz. humans have a species specific capacity to acquire languages and Universal Grammar is a specification of the mental powers on which this capacity supervenes) and that it does not require that every language exhibit the same properties seems irrelevant to these critics and the secondary market in Chomsky criticism (you know who you are!). However satisfying it is to reiterate this simple but important point (and please feel free to join me in making it at every available opportunity (Tip: it makes a nice part of any wedding or bar mitzvah speech)), let me leave it aside for now and instead continue my program of ling-speak reform. In earlier posts, I (and Paul) suggested that we drop ‘learning’ for the more generic ‘acquisition’ in describing what kids do and reserve the former term for a particular kind of data driven acquisition. Here I want to consider the use of another term: ‘grammaticality’ as used to describe speakers’ judgments, as in ‘sentence (1a) is grammatical and sentence (1b) is ungrammatical.’

(1)  a. John likes Mary
b. *John like Mary

My proposal is that we swap ‘acceptable’ for ‘grammatical,’ that we use the latter as a predicate of analyses and use the former to describe the data. This terminological proposal amounts to treating ‘grammatical’ as a theoretical term, whereas ‘acceptable’ describes the empirical lay of the land.

This would have several prophylactic advantages.

First, it makes an important distinction between data and what the data is used to probe. We have reliable intuitions of unacceptability but not intuitions of what causes unacceptability. Utterances of sentences can be more or less acceptable.[1] Only sentences can be grammatical and ungrammatical. And, though unacceptability is a prima facie reason for suspecting ungrammaticality, ungrammaticality is neither necessary nor sufficient for the perception of unacceptability.  Grammaticality is a theoretical notion, acceptability an observational one.

So acceptability is a predicate of our linguistic data. At bottom, a big part of that data is judgments of the acceptability of an utterance of a sentence in a specified context (e.g. we ask “could you say BLAH BLAH BLAH in context C to express M?”). Often, a used sentence token can be judged unacceptable without much specification of context of use (e.g. island violations), but often not (e.g. scope ambiguities). Thus, low acceptability can be traced to a variety of reasons, only one of which concerns the utterance’s grammatical status. Indeed, all four possible relations between +/- acceptability (+/-A) and +/- Grammaticality (+/-G) exist. Let’s consider some examples.

An uttered sentence can be unacceptable yet grammatical, –A/+G, the unacceptability attributed to parsing difficulty of some kind. Canonical examples include:

(2)  a. That that that Bill left Mary amused Sam is interesting is sad
b. Dogs dogs dog dog dogs dogs dog
c. The horse raced past the barn fell

(2a) is a case of self embedding (c.f. Chomsky and Miller 1963), (2c) a garden path and (2b), though it has no name, is great for parties (other variants include ‘buffalo buffalo buffalo buffalo buffalo buffalo buffalo’ and ‘skunks skunks skunk skunk skunks skunk’). We have pretty good stories about why these sentences are hard to deal with. For (2a) see Lewis and Vasishth 2005, for (2b) Barton, Berwick and Ristad 1987, and for (2c) virtually anybody working on language processing.

Jon Sprouse discusses other examples of the disconnect between grammaticality and acceptability. For example, it is well known that length alone affects acceptability ratings. When uttered, a short sentence like (3a) is judged superior to a long one like (3b) though every theory of grammar will treat them as equally grammatical:

(3)  a. Who did Bill see
b. Who did Frank say that Bill saw

Interestingly, the contrast goes in the other direction as well. There are sentences that sound really very good when uttered, but are nonetheless clearly ungrammatical. Examples like (4) were first discussed, I believe, by Montalbetti, and recently Alexis Wellwood, Roumi Pancheva and Colin Phillips have been investigating them as examples of grammatical illusions.

(4)  More people visited Rome last year than I did

Such sentences when uttered “sound” really good and garner high judgment ratings. In particular, if native speakers are asked to judge their acceptability, especially if asked to do so quickly, they will rate them very high. Ask them what the uttered sentence means (or could mean) and they are stumped. Consider (5a). It can be paraphrased as (5b). On this model (4) should be understood as (6). But (6) is incomprehensible, true word salad.

(5)  a. More people visited Rome last year than visited Venice
b. The number of people who visited Rome last year is greater than the
number of people who visited Venice last year.

(6)  The number of people who visited Rome last year is greater than …the number that I did?/greater than the number of I that did?

You should appreciate that the incomprehensibility of (4) is quite an accomplishment. It is not actually that easy to construct sentences whose utterances are truly meaningless. Many of our standard “bad” examples are semantically quite transparent. Thus (7a) means (7b) and (8a) has the paraphrase (8b).

(7)  a. *who did you meet a woman who loved
b. Which person is it that you met a woman who loved him

(8)  a. *John seems sleeping
b. John seems to be sleeping

Real semantic mish mash exists, (9a) is word salad and cannot be pretzled into meaning (9b) (it is a well known minimality or PIC (phase impenetrability condition) violation).

(9)  a. *John seems that it was told that Frank left
b. it seems that John was told that Frank left

Note, however, that the semantic incomprehensibility correlates with strong unacceptability. What makes examples like (4) so interesting is that they are judged quite acceptable despite their incoherence.

The other two combinations are the standard cases, +G leading to +A and –G being –A, the ones that make it reasonable to treat (un)acceptability as a leading indicator of (un)grammaticality. But, as the +G/-A and –G/+A cases show, even generally reliable symptoms can mislead.

There is one other upside to adopting this terminological distinction. When theory changes, we often reassess the import of the facts. This is in and of itself not a bad thing. Facts don’t come marked with their significance and a fact’s import might change as theory does. However, though what we make of facts might change, the facts themselves do not (at least not by and large). It is useful to have neutral terminology for describing the explananda and whereas a sentence that is ungrammatical in one theoretical venue might be grammatical in another, an utterable sentence that is unacceptable given one set of theoretical assumptions does not generally become acceptable given another. Grammaticality shifts. Unacceptability is relatively fixed. This is useful to recall when the excitement of theoretical innovation leads us to set aside heretofore central data points. There is nothing wrong in my view in setting recalcitrant facts to one side, at least temporarily; there is a lot wrong in forgetting that these facts exist. Not confusing acceptability with grammaticality might help restrain this otherwise enticing move.

However, the real benefit of making this change is that it will remind us that imputations of grammaticality involve dipping one’s toes into the deep seas of theory (they are part of explaining) while noting differences in acceptability belong to the realm of description. Being careful not to confuse description with explanation is a worthwhile precept of methodological hygiene. Incorporating the distinction between acceptability and grammaticality into our ling-speak is a small inoculation against self-confusion.

[1] I use ‘utterance’ here in a wide sense; so reading a token of a sentence counts as an utterance, hearing one does too. Any used token counts.

Tuesday, February 19, 2013

No New Post

I and David Pesetsky have been busy in the thread to "What's Chomsky Thinking Now" answering various critiques of the Generative Program. The discussion has been lively and, occasionally, enlightening. If you haven't yet taken a look, you might find the discussion amusing. Take a look, in particular, at Paul Postal's comments and Dan Everett's. I have short replies to theirs. And I am sure they will soon have replies to mine.

Wednesday, February 13, 2013

More on DNA Computing

Eric Raimy sent me this link to another paper about DNA computing.  I discussed the relevance of this sort of technology to the Gallistel-King Conjecture (G-KC) before (here). The fact that this sort of work seems to be heating up is intellectually significant. How so? Well, quite often, intellectual life follows the technological cutting edge.  There is a reason why the ‘mind as computer’ analogy became really big in the mid 50s (I’ll let you guess why).  It won’t seem at all absurd to think that brains use large biological molecules to compute with if it turns out that we can use DNA, RNA and proteins in this way.  And as this paper indicates, it seems that our ability to do so is ever increasing.

G-KC is a very bold proposal.  From where we sit right now there is every reason to think that it is wrong.  However, though I know very little about neuroscience beyond what my very smart friends and colleagues tell me, it does not seem at all obvious that conventional ways of thinking have really given us what we need (see here).  We don’t really know how spike trains carry information, we don’t know how to scale up neural nets so that they are feasible computational devices, we don’t know how to code the kinds of things that behavioral studies on both animals and humans indicate we need to explain mental capacities.

Moreover, behavioral studies provide overwhelming evidence for the claim that we compute in a pretty conventional way; we can manipulate variables, bind them in various ways, provide a wide range of values for them, and do this very systematically. The Fodor-Pylyshyn and Marcus arguments stressing this seem to me basically correct. We have no trouble modeling these capacities as standard programs (e.g. Sandiway Fong’s thesis does a fair job of this for a pretty interesting version of GB). Things get a lot hairier when we model these in neural nets.

Given this, the G-KC gains interest, especially when coupled with the constant improvement in the technology of DNA computing and G-K’s arguments concerning the physical limitations of standard neural net proposals. 

Are G-K right? Who knows? But right now their idea is looking less and less whacky. Maybe in a few years it will seem obvious. Stay tuned.