
Showing posts with label Gallistel and Matzel. Show all posts

Wednesday, January 10, 2018

An update on Empiricism vs Rationalism debates

All papers involve at least two levels of discourse. The first comprises the thesis: what is being argued? What evidence is there for what is being argued? What’s the point of what is being argued? What follows from what is being argued? The second is less a matter of content than of style: how does the paper say what it says? This latter dimension most often reveals the (often more obscure) political dimensions of the discipline, and relates to topics like the following: What parts of one’s butt is it necessary to cover so as to avoid annoying the powers that be? Which powers is it necessary to flatter in order to get a hearing or to avoid the ruinous “high standards of scrutiny” that can always be deployed to block publication? Whose work must get cited and whose can be safely ignored? What are the parameters of the discipline’s Overton Window? If you’ve ever been lucky enough to promote an unfashionable position, you will have developed a sensitivity to these “howish” concerns. And if you have not, you should. They are very important. Nothing reveals the core shibboleths (and concomitant power structures) of a discipline more usefully than a forensic inquiry into the nature of the eggshells upon which a paper is treading.

In what follows I would like to do a little eggshell inspection while discussing a pretty good paper from a (for me) unexpected source (a bunch of Bayesians). The authors are Lake, Ullman, Tenenbaum and Gershman (LUTG), all from the BCS group at MIT. The paper (here) is about the next steps forward for an AI that aspires to be cognitively interesting (like the AI of old) and maybe even technologically cutting edge (though LUTG makes this second point very gingerly).

But that is not all the paper is about. LUTG is also a useful addition to the discussions about Empiricism (E) vs Rationalism (R) in the study of mind, though LUTG does not put matters in quite this way (IMO, for quasi-political reasons; recall the “howish” concerns!). To locate LUTG on this axis will require some work on my part. Here goes.

As I’ve noted before, E and R have both a metaphysical and an epistemological side. Epistemologically, E takes minds to be initially (relatively) unstructured, with mental contours forged over time as a by-product of environmental input as sampled by the senses. Minds on this view come to represent their environments more or less by copying their structures as manifest in the sensory input. Successful minds are ones that effectively (faithfully and efficiently) copy the sensory input. As Gallistel & Matzel put it, for Es, minds are “recapitulative,” their function being to reliably take “an input that is part of the training input, or similar to it, [to] evoke[s] the trained output, or an output similar to it” (see here for discussion and links). Another way of putting this point is that E minds are pattern matchers, whose job is to track the patternings evident in the input (see here). Pattern matching is recapitulative, and E relies on the idea that cognition amounts to tracking the patternings in the sensory input to yield reliable patterns.

Coupled with this E theory of mind comes a metaphysics. Reality has a relatively flat causal structure. There is no rich hierarchy of causal mechanisms whose complex interactions lie behind what we experience. Rather, what you sense is all there is. Therefore, effectively tracking and cataloguing this experience and setting up the right I/O associations suffices to get at a decent representation of what exists. There is no hidden causal iceberg of which this is the sensory/perceptual tip. The I/O relations are all there is. Not surprisingly (this is what philosophers are paid for), the epistemology and the metaphysics fit snugly together.

R combines its epistemology and its metaphysics differently. Epistemologically, R regards sensorily innocent minds as highly structured. The structure is there to allow minds to use sensory/perceptual information to suss out the (largely) hidden causal structures that produce these sensations/perceptions. As Gallistel & Matzel put it, R minds are more like information processing systems, structured so as to extract information about the unperceived causal structures of the environment that generate the observed patterns of sensation/perception. On the R view, minds sample the available sensory input to construct causal models of the world which generate the perceived sensory patterns. And in order to do this, minds come richly stocked with the cognitive wherewithal necessary to build such models. R epistemology takes it for granted that what one perceives vastly underdetermines what there is and takes it to be impossible to generate models of what there is from sensory perception without a big boost from given (i.e. unlearned) mental structures that make possible the relevant induction to the underlying causal mechanisms/models.

R metaphysics complements this epistemological view by assuming that the world is richly structured causally and that what we sense/perceive is a complex interaction effect of these more basic complexly interacting causal mechanisms. There are hidden powers etc. that lie behind what we have sensory access to and that in no way “resemble” (or “recapitulate”) the observables (see here for discussion).

Why do I rehearse these points yet again? Because I want to highlight a key feature of the E/R dynamic: what distinguishes E from R is not whether pattern matching is a legit mental operation (both views agree that it is (or at least, can be)). What distinguishes them is that Es think that this is all there is (and all there needs to be), while Rs reserve an important place for another kind of mental process, one that builds models of the underlying complex, non-visible causal systems that generate these patterns. In other words, what distinguishes R from E is the belief that there is more to mental life than tracking the statistical patterns of the inputs. R doesn’t deny that this is operative. R denies that this suffices. There needs to be more, a lot more.[1]

Given this backdrop, it is clear that GG is very much in the R tradition. The basic observation is that human linguistic facility requires knowledge of a G (a set of recursive rules). Gs are not perceptually visible (though their products often are). Further, cursory inspection of natural languages indicates that human Gs are quite complex and cannot be acquired solely by induction (even by sophisticated discovery procedures that allow for inductions over inductions over inductions…). Acquiring Gs requires some mental pre-packaging (aka UG) that enables an LAD to construct a G on the basis of simple, haphazard bits of G output (aka PLD). Once acquired, Gs can be used to execute many different kinds of linguistic behaviors, including parsing and producing novel sentences and phrases. That’s the GG conceit in a small nutshell: human linguistic facility implicates Gs, which implicate UG, both G and UG being systems of knowledge that can be put to multiple uses, including G acquisition and the production and comprehension of an unbounded number of very different novel linguistic expressions within a given natural language. G, UG, competence vs performance: the hallmarks of an R theory of linguistic minds.
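For readers outside GG, here is a toy illustration of what “a set of recursive rules” buys. Two rules over four words, one of which reintroduces S on its own right-hand side, yield ever more (and ever more deeply embedded) sentences as the allowed embedding depth grows, with no upper bound. The grammar, the vocabulary and the function name are hypothetical; this is a sketch of recursion in general, not a claim about how human Gs are actually stated.

```python
# A hypothetical toy grammar: four words and two rules.
#   S -> NP V NP
#   S -> NP "thinks that" S    (recursive: S reappears on the right-hand side)
NPS = ["Mary", "John"]
VERBS = ["saw", "likes"]


def sentences(depth):
    """Return every sentence the toy grammar generates with at most `depth` embeddings."""
    base = [f"{s} {v} {o}" for s in NPS for v in VERBS for o in NPS]
    if depth == 0:
        return base
    # Apply the recursive rule to everything generated at the previous depth.
    embedded = [f"{np} thinks that {s}" for np in NPS for s in sentences(depth - 1)]
    return base + embedded


for d in range(4):
    print(d, len(sentences(d)))  # prints 0 8, 1 24, 2 56, 3 120 (one pair per line)

print(sentences(2)[-1])  # John thinks that John thinks that John likes John
```

Finitely many primitives plus composition alone would stop at the eight base sentences; it is the recursive rule that makes the set unbounded.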

LUTG makes effectively these points but without much discussing the language case (it does at the end, but mainly to say that it won’t discuss it). LUTG’s central critical claim is that human cognition relies on more than pattern matching. There is, in addition, model building, which relies on two kinds of given mental contents to guide learning. The first kind, what LUTG calls “core” (and an R would call “given”), involves substantive specific content (e.g. naïve physics and psychology). The second involves formal properties (e.g. compositionality and certain kinds of formal analogy, which LUTG calls “learning-to-learn”).[2] LUTG notes that these two features pre-condition further learning and are indispensable. LUTG further notes that the two mental powers (i.e. model building and pattern recognition) can likely be fruitfully combined, though it (rightly, IMO) insists that model building is the central cognitive operation. In other words, LUTG recapitulates the observations above: R theories can incorporate E mechanisms, but E mechanisms alone are insufficient to model human cognition, and without the part they leave out, E pattern matching models fit poorly with what we have tons of evidence to be central features of cognitive life. Here’s how the abstract puts it:

We review progress in cognitive science suggesting that truly human-like learning and thinking machines will have to reach beyond current engineering trends in both what they learn and how they learn it. Specifically, we argue that these machines should (1) build causal models of the world that support explanation and understanding, rather than merely solving pattern recognition problems; (2) ground learning in intuitive theories of physics and psychology to support and enrich the knowledge that is learned; and (3) harness compositionality and learning-to-learn to rapidly acquire and generalize knowledge to new tasks and situations. We suggest concrete challenges and promising routes toward these goals that can combine the strengths of recent neural network advances with more structured cognitive models.

The paper usefully goes over these points in some detail. It further notes that this conception of the learning (I prefer the term “acquisition” myself) problem in cognition challenges the associationist assumptions that, LUTG observes (again rightly, IMO), are characteristic of most work in connectionism, machine learning and deep learning. LUTG also points to ways that “model free methods” (aka pattern matching algorithms) might usefully supplement the cognitive basics of model building to improve the performance and implementation of model-based knowledge (see 1.2 and 4.3.2).[3]

Section 3 is the heart of the critique of contemporary AI, which largely ignores the (R) model building ethos that LUTG champions. As it observes, the main fact about humans when compared with current AI models is that humans “learn a lot more from a lot less” (6). How much more? Well, human learning is very flexible and rarely tied to the specifics of the learning situation. LUTG provides a nice discussion of this in the domain of learning written characters. Further, human learning generally requires very little “input.” Often, a couple of minutes or exposures to the relevant stimulus is more than enough. In fact, in contrast to current machine learning or DL systems, humans do not need to have the input curated and organized (or, as Geoff Hinton recently put it (here), humans, as opposed to DLers, “clearly don’t need all the labeled data”). And why not? Because, contrary to what DLers and connectionists and associationists have forever assumed, human minds (and hence brains) are not E devices but R ones. Or as LUTG puts it (9):

People never start completely from scratch, or even close to “from scratch,” and that is the secret to their success. The challenge of building models of human learning and thinking then becomes: How do we bring to bear rich prior knowledge to learn new tasks and solve new problems so quickly? What form does that prior knowledge take, and how is it constructed, from some combination of inbuilt capacities and previous experience?

In short, the blank tablet assumption endemic to E conceptions of mind is fundamentally misguided and a (the?) central problem of cognitive psychology is to figure out what is given and how this given stuff is used. Amen (and about time)!

Section 4 goes over what LUTG takes to be some core powers of human minds. These include naïve theories of how the physical world functions and how animate agents operate. In addition, with a nod to Fodor, it outlines the critical role of compositionality in allowing for human cognitive productivity.[4] It is a nice discussion and makes useful comments on causal models and their relation to generative ones (section 4.2.2).

Now let’s note a few eggshells. A central recurring feature of the LUTG discussion is the observation that it is unclear how or whether current DL approaches might integrate these necessary mechanisms. The paper does not come right out and say that DL models will have a hard time with these unless they radically change their sub-symbolic associationist pattern matching monomania, but it strongly suggests this. Here’s a taste of this recurring theme (and please note the R overtones):

As in the case of intuitive physics, the success that generic networks will have in capturing intuitive psychological reasoning will depend in part on the representations humans use. Although deep networks have not yet been applied to scenarios involving theory of mind and intuitive psychology, they could probably learn visual cues, heuristics, and summary statistics of a scene that happens to involve agents. If that is all that underlies human psychological reasoning, a data-driven deep learning approach can likely find success in this domain.

However, it seems to us that any full formal account of intuitive psychological reasoning needs to include representations of agency, goals, efficiency, and reciprocal relations. As with objects and forces, it is unclear whether a complete representation of these concepts (agents, goals, etc.) could emerge from deep neural networks trained in a purely predictive capacity. Similar to the intuitive physics domain, it is possible that with a tremendous number of training trajectories in a variety of scenarios, deep learning techniques could approximate the reasoning found in infancy even without learning anything about goal-directed or socially directed behavior more generally. But this is also unlikely to resemble how humans learn, understand, and apply intuitive psychology unless the concepts are genuine. In the same way that altering the setting of a scene or the target of inference in a physics-related task may be difficult to generalize without an understanding of objects, altering the setting of an agent or their goals and beliefs is difficult to reason about without understanding intuitive psychology.

Yup. Right on. But why so tentative? “It is unclear”? Nope, it is very clear, and has been for about 300 years, since these issues were first extensively discussed. And we all know why: concepts like “agent,” “goal,” “cause,” and “force” are not observables and are not reducible to observables. So, if they play central roles in our cognitive models, then pattern matching algorithms won’t suffice. And as pattern matching is all that E DL systems countenance, DL is not enough. This is the correct point, and LUTG makes it. But note the hesitancy with which it does so. Is it too much to think that this likely reflects the issue mooted at the outset, namely the implicit politics that one’s dance over the eggshells reveals?

It is not hard to see from how LUTG makes its very reasonable case that it is a bit nervous about DL (the current star of AI). LUTG is rhetorically covering its posterior while (correctly) noting that unreconstructed DL will never make the grade. The same wariness makes it impossible for LUTG to acknowledge its great debt to R predecessors.[5] As LUTG states, its “goal is to build on their [neural networks, NH] successes rather than dwell on their shortcomings” (2). But those who always look forward and never back won’t move forward particularly well either (think Obama and the financial crisis). Understanding that E is deeply inadequate is a prerequisite for moving forward. It is no service to be mealy-mouthed about this. One does not finesse one’s way around very influential bad ideas.

Ok, so I have a few reservations about how LUTG makes its basic points. That said, this is a very useful paper. It is nice to see this coming out of the very influential Bayesian group at MIT and in a prominent place like BBS. I am hoping that it indicates that the pendulum is swinging away from E and towards a more reasonable R conception of minds. As I’ve noted, the analogies with standard GG practice are hard to miss. In addition, LUTG rightly points to the shortcomings of connectionist/deep learning/neural net approaches to mental life. This is good. It may not be news to many of us, but if this signals a return to R conceptions of mind, it is a very positive step in the right direction.[6]


[1] R often goes farther, holding that even tracking the relevant perceptual regularities requires lots of given mental baggage. The world does not come pre-labeled, so zeroing in on the relevant dimensions for inductive generalization itself requires lots of pre-packaged “knowledge.” Geoff Hinton (the daddy of deep learning) seems to have come to a similar view of late concerning how hard it is to get things off the ground without curated data. See below for a reference.
[2] GGers should find this reminiscent of Chomsky’s discussion, in Aspects, of substantive and formal universals and how these interact to create Gs (models) from the ambient, degenerate and deficient linguistic input available to the child (PLD).
[3] Or to put this in GG-friendly terms: LUTG resurrects something very like a competence/performance distinction, model building being analogous to the former and model free methods applied to these models being analogous to the latter. An idea I found interesting is that given models of competence provide a useful domain for performance models wherein sophisticated pattern matching algorithms do their work. Conceptually, this idea seems very similar to what Charles Yang has been advocating as well (see here).
[4] Actually, though compositionality is critical to productivity, it does not suffice for generating an “infinite number of thoughts” or using an “infinite number of sentences” “from a finite set of primitives” (14). For this we need more than compositionality; we need recursion as well.
[5] It is amazing, IMO, how small a role Chomsky plays in LUTG’s discussion. So far as I can tell, all of its major points were developed in the modern period by him. I am pretty sure that some of the authors know this but that highlighting this fact would hurt the paper politically by offending the relevant leading DL lights.
[6] BTW, LUTG has a nice discussion of the biological “plausibility” of neural net models of the brain. The upshot is that, beyond being pictorially suggestive, there is no real reason for thinking that brains are like connectionist nets. As LUTG puts it (20):

Many seemingly well-accepted ideas concerning neural computation are in fact biologically dubious, or uncertain at best…

For example, most neural networks use some form of gradient-based (e.g. backpropagation) or Hebbian learning. It has long been argued, however, that backpropagation is not biologically plausible. As Crick (1989) famously pointed out, backpropagation seems to require that information be transmitted backward along the axon, which does not fit with realistic models of neuronal function…

As LUTG observes, this should not in and of itself stop people from investigating neural nets as possible models of brain computation, but it should put an end to the prejudice that brains are nets because they look net-like. Sheesh!

Tuesday, January 19, 2016

Cogneuro cross training; the Gallistel method

I am getting ready to fly to the Netherlands where I am going to defend Generative Grammar’s (GG) neuro-cognitive relevance. The venue? David Poeppel has been invited to give three lectures on brain and language (see here (neat poster btw)) and I have been invited to comment on the third, Peter Hagoort being the other discussant. The lectures are actually billed as “neurobiological provocations” and David thought that I fell snugly within the extension of the nominal predicate. Given Peter Hagoort’s published views (see here for link and discussion) about GG and his opinion that it has lost its scientific mojo, I suspect (and hope) that the exchange will be lively. The position that I will argue for is pretty simple. Here are the main points:

1.     The following two central claims of GG are, conceptually, near truisms (though, sadly, not recognized as such):
a.     Grammars (G) are real mental objects
b.     FL exists and has some linguistically proprietary structure
2.     At least one defining feature of FL/UG is that it licenses the construction of Gs which generate objects with unbounded hierarchical complexity (aka: recursion).
3.     Most versions of GG identify (more or less) the same kinds of G objects and invoke the same kinds of G principles and operations.
4.     Contrary to received wisdom, GG does not change its theoretical character every 15 minutes. In fact, the history of theory change in GG has been very conservative with later theoretical innovations retaining most of the insights and generalizations of the prior stage.
5.     It’s a big mistake to confuse Greenberg universals with Chomsky universals.
6.     Linguistic data is, methodologically speaking, almost as pure as the driven snow (Yay for Sprouse, Almeida, Schutze, Phillips and a host of others). There is nothing wrong with more careful vetting of the data except that most of the time it’s a pointless expenditure of effort (i.e. little marginal return in insight for the extra time and money) whose main objective seems to be to make things look “sciency” (as in “truthy”).
7.     The autonomy of syntax thesis does not mean that GG eschews semantics. It is, in fact, another (almost certainly truistic) claim about the structure of human Gs (viz. that syntactic structure is not reducible to phonetic or semantic or informational or communicative structure).
8.     GG recognizes that there is G variation and has things to say about it.
9.     Studying linguistic communication is certain to be much harder than studying G competence precisely because the former presupposes some conception of the latter. Gs have a hope of being natural kinds whereas communication is certainly a massive interaction effect and hence will be very hard to study. Disentangling interaction effects is a real pain, and not only in the cog-neuro of language!

That’s what I will be saying, and given Hagoort’s diametrically opposite views on many of these matters, the discussion should be (ahem) lively. However, I have promised to be on my best behavior, and as I believe it is very important for linguistics that cog-neuro appreciate how much GG has to offer, I am going to play as nicely as I know how, all the while defending a Rationalist conception of FL/UG and the Gs that FL generates.

I admit that I have been training a bit for this event. My preparation has mainly consisted of re-reading a bunch of papers, and aside from getting me re-focused on some relevant issues, this has also allowed me to re-engage with some really terrific stuff. I especially want to mention one such paper by Randy Gallistel and Louis Matzel (G&M) (here). I have noted it in other posts, but I don’t think that I ever discussed it in any detail. I want to somewhat rectify that oversight here.

IMO, the paper is indispensable for anyone interested in why neuroscience and cognition have mixed about as well as oil and water. What makes G&M so good? It argues that the problem stems from the deep-seated Empiricism of contemporary neuroscience. This Empiricist bias has largely prevented neuroscience from even asking the right kinds of questions, let alone providing real insights into how brains embody cognition. A commitment to an Empiricist Associationist psychology has blinded neuroscience to interesting questions. Moreover, and this is what is most interesting in G&M, Empiricist blinders have prevented neuroscience from noticing that there is little cognitive evidence in favor of its pet theory of the brain and no neuro evidence for it either. This, G&M argues, has been obscured by a kind of unfortunate intellectual two-step: psychologists believe that some of the best evidence for Associationism comes from neuroscience, and neuroscientists think that some of the best evidence for it comes from psychology. In other words, there is a mutually reinforcing delusion in which associationist learning and synaptic plasticity take in one another’s dirty laundry and, without doing any vigorous washing or even mild rinsing, conclude that the two dirty piles are together crisp and clean. G&M argues that this is fundamentally wrong-headed. Here are the basics of the argument.

G&M identifies two broad approaches to learning and memory. The first is the complex of associationism plus neural nets with Hebbian synapses (ANN) (“what fires together wires together”):

In the associative conceptual framework, the mechanism of learning cannot be separated from the mechanism of memory expression. At the psychological level of analysis, learning is the formation of associations, and memory is the translation of that association into a behavioral change. At the neuroscientific level of analysis, learning is the rewiring of a plastic nervous system by experience, and memory resides in the changed wiring. (170)
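To make the associative picture in this quote concrete, here is a minimal sketch of the textbook Hebbian rule, in which the weight from unit i to unit j grows by a learning rate times the product of their activities (Δw_ij = η · x_i · y_j). It is an illustration under my own assumptions, not G&M’s formalism: the function names, the learning rate and the toy activity vectors are hypothetical. Note that there is no separate memory store anywhere; “recall” is just the input pushed through the changed weights, which is exactly the quoted claim that memory resides in the changed wiring.

```python
def hebbian_update(weights, pre, post, learning_rate=0.1):
    """One step of the plain Hebbian rule: w[i][j] += learning_rate * pre[i] * post[j].

    Units that are active together get a stronger connection ("fire together,
    wire together"). On this view, learning is nothing but this change in wiring.
    """
    return [
        [w + learning_rate * pre[i] * post[j] for j, w in enumerate(row)]
        for i, row in enumerate(weights)
    ]


def recall(weights, pre):
    """'Memory expression': propagate an input pattern through the learned weights.

    There is no separate store to consult; what was learned lives only in the weights.
    """
    n_post = len(weights[0])
    return [sum(pre[i] * weights[i][j] for i in range(len(pre))) for j in range(n_post)]


# Toy example (hypothetical values): input unit 0 repeatedly fires with both output units.
weights = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(5):
    weights = hebbian_update(weights, pre=[1, 0], post=[1, 1])

print(weights)                  # unit 0's outgoing weights have grown; unit 1's are still 0.0
print(recall(weights, [1, 0]))  # the trained input now evokes (roughly) the trained output
```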

This contrasts with an information processing (IP) approach:

[In] the information-processing perspective, learning and memory are distinct mechanisms with different functions: Learning mechanisms extract potentially useful information from experience, while memory carries the acquired information forward in time in a computationally accessible form that is acted upon by the animal at the time of retrieval (Gallistel & King 2009). (170)

G&M notes another critical difference between the two approaches:

The distinction between the associative and information-processing frameworks is of critical importance: By the first view, what is learned is a mapping from inputs to outputs. Thus, the learned behavior (of the animal or the network, as the case may be) is always recapitulative of the input-output conditions during learning: An input that is part of the training input, or similar to it, evokes the trained output, or an output similar to it. By the second view, what is learned is a representation of important aspects of the experienced world. This representation supports input-output mappings that are in no way recapitulations of the mappings (if any) that occurred during the learning. (170)
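A toy contrast may help fix the distinction. In the sketch that follows, the “recapitulative” learner returns the trained output of whichever training input is closest to the query, so it can only ever give back (something like) what it was trained on. The “representational” learner instead stores when events happened and computes with those stored values on demand, so it can answer a question it was never trained on, such as the interval between two events that were never themselves paired. Both learners, the event names and the times are hypothetical illustrations in the spirit of the G&M distinction, not models proposed in the paper.

```python
def make_recapitulative_learner(training_pairs):
    """Associative-style learner: responds to a query with the trained output of the
    most similar trained input (nearest neighbour on a 1-D input)."""
    def respond(query):
        _, output = min(training_pairs, key=lambda pair: abs(pair[0] - query))
        return output
    return respond


def make_representational_learner(event_times):
    """Information-processing-style learner: stores a representation (the time at
    which each event occurred) and computes with it when queried."""
    memory = dict(event_times)
    def interval(earlier, later):
        return memory[later] - memory[earlier]
    return interval


# Recapitulative: an input similar to a training input evokes its trained output.
respond = make_recapitulative_learner([(1.0, "A"), (5.0, "B")])
print(respond(1.2))  # prints A

# Representational: only event times are stored; the tone-light interval was never
# trained as such, yet it can be computed from the stored times.
interval = make_representational_learner({"tone": 10.0, "light": 14.0, "shock": 25.0})
print(interval("tone", "light"))  # prints 4.0
```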

It is not, then, an accident that so many Associationists have a problem distinguishing a model of the data from a model of the system that generates the data. For an ANNer, modeling the data is modeling the system, as the latter is just a way of modeling the I/O relations in the data. The brain, for ANNers, captures the generalizations in the data and more or less faithfully encodes these. Not so for an IPer.

And this leads to a host of other important differences. Here are two that G&M makes much of:

1.     ANN approaches eschew “representations” and, as such, are domain general
2.     IP theories are closely tied to the computational theory of mind and this “leads to the postulation of domain-specific learning mechanisms because no general-purpose computation could serve the demands of all types of learning” (175).

Thus representations, computation and domain specificity are a natural triad, and forsaking one leads naturally to rejecting all. There really is little middle ground, which is precisely why the Empiricism/Rationalism divide is so deep and consequential.

However, interesting though this discussion is, it is not what I wanted to focus on. For me, the most interesting feature of G&M is its critique of Hebbianism (i.e. the fire-wire pairing of synapses), the purported neural mechanism behind ANN views of learning.

The main process behind the Hebbian synapse is known as “long term potentiation” (LTP). This is the process wherein inputs modify transmission across synapses (e.g. increase amplitude and/or shorten latencies), and this modification is taken to causally subvene associative learning. In other words, association is the psychology of choice because synapses reorganize LTPishly, thereby closely tracking the laws of association (i.e. “properties of LTP aligned closely with those of the associative learning process as revealed by behavioral experimentation” (171)).

This putative relation between association and LTP biology has been one of the main arguments for ANN. Thus connectionist neural net models not only look “brainy,” they actually work like brains do! Except, as G&M shows, they don’t really, as “the alignment” between LTP mechanisms and association measured behaviorally “is poor” (171).

How poor is the fit? Well, G&M argues that the details of the LTP process line up very badly with the associationist ones over a host of dimensions. For example, associationist and LTP time scales are vastly different: a few milliseconds for LTP versus (up to) hours for associations. Moreover, whereas LTP formation cares about inter-stimulus intervals (how close the relevant events are in time to one another), associations don’t. They care about the ratio of the CS-US interval to the US-US interval (i.e. whether the CS-US interval is short relative to the interval between successive USs). In sum, as regards “timing,” LTP growth and behavioral association formation are very different.
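The timing point can be made concrete with a bit of arithmetic. Suppose, purely for illustration, that a Hebbian-style mechanism only strengthens a connection when CS and US fall within a fixed coincidence window (50 ms in the sketch; the number is a hypothetical assumption, not a measured value), while behavioral acquisition depends on the ratio of the US-US interval to the CS-US interval. Scaling a whole conditioning protocol by a factor of 1000 leaves that ratio untouched but pushes the CS-US interval far outside any fixed window, so the two criteria come apart.

```python
# Hypothetical parameter, for illustration only: a fixed pairing window for a
# Hebbian-style mechanism that tracks absolute inter-stimulus intervals.
COINCIDENCE_WINDOW_MS = 50


def hebbian_pairs(cs_us_ms, window_ms=COINCIDENCE_WINDOW_MS):
    """Absolute-timing criterion: CS and US count as paired only if the
    CS-US interval falls inside the fixed window."""
    return cs_us_ms <= window_ms


def informativeness_ratio(cs_us_ms, us_us_ms):
    """Ratio criterion: how long the US-US interval is relative to the CS-US
    interval. Only the ratio matters, not the absolute durations."""
    return us_us_ms / cs_us_ms


# The same protocol at two time scales (durations in milliseconds).
fast = {"cs_us_ms": 20, "us_us_ms": 200}
slow = {"cs_us_ms": 20_000, "us_us_ms": 200_000}  # everything scaled by 1000

for name, protocol in [("fast", fast), ("slow", slow)]:
    print(name,
          "ratio:", informativeness_ratio(**protocol),
          "within window:", hebbian_pairs(protocol["cs_us_ms"]))
# Both protocols have ratio 10.0, but only the fast one falls inside the window.
```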

Moreover, as G&M notes, this is widely recognized to be the case (172), but despite this the conclusion that association laws supervene on LTP biology is not called into question. So, the disconnect is acknowledged but no consequences for ANN are drawn (see pp. 172-3 for discussion). Defenders of the link rightly observe that the fact that LTP and association formation don’t track one another does not imply that they are not intimately linked. True, but irrelevant. The issue is not whether the two might be related but whether they indeed are, and the fact that they don’t track one another means that the virtues of either cannot redound to the benefit of the other. In fact, as I note anon, things are worse than this.

But first, here are some other important differences G&M discusses:

·      “Behaviorally measured associations can last indefinitely, whereas LTP always decays and usually does so rapidly” (172).
·      “Both forgotten and extinguished conditioned responses exhibit facilitated reacquisition; that is, they are relearned more efficiently than when they were initially acquired” whereas “following its decay to baseline LTP is neither more easily induced nor more persistent than it was after previous inductions” (172).

G&M provides a handy table (174) enumerating the ways that LTP and associations fail to track one another. Suffice it to say that the two mechanisms seem to be very different and how LTP biology is supposed to support associations is a mystery. And I mean this literally.

Rather than draw the conclusion (problematic for them) that there is little evidence that Hebbian synapses can causally undergird associative learning, ANNers appeal to emergent properties of the network rather than to LTP operations at the synapses to provide the neural glue for associationism (see p. 173). As G&M notes, this is mysterianism, not science. And though I sympathize with the view that we may never understand how the brain does what it does, I don’t consider this a breakthrough in neuroscience. The following is a funny kind of scientific argument to make in favor of ANN: though the main causal mechanism for learning is association via Hebbian synapses, we cannot understand this at the micro level of associations and synapses but only at the macro level of whole brains. Brains are where the action is, in virtue of the structures of their synapses, but how the synapses do this will be forever shrouded in mystery. So much for the virtues of analysis. I love it when hard-headed Empiricism saves itself by flights of mystical fancy.

There is a second line of argument. G&M shows that classical associationist effects require the calculation of intervals (numbers coding for duration) and that Hebbian based neural nets can’t code this kind of info in a usable form (172). As G&M puts it:

“[T]he mechanism that mediates associative learning and memory must be able to encode the intervals between events in a computationally accessible form. There is no hypothesis as to how this could be accomplished through the modification of synaptic transmission” (172).

So, the temporal properties don’t fit together and the basic facts about classical conditioning invoke information that cannot be coded in a set of synapses in terms of LTP. It appears that there really is no there there. What we are left with are arguments from pictograms: ANN stories make for nice pictures (i.e. neural nets look so brainy and synapsy and connectionist nets “learn” so well!) but as the putative fit between neural mechanism and behavioral pattern is very poor (as G&M shows and, it seems, is largely conceded) there is no good biological reason for holding onto associations and no good psychological reason for embracing neural nets. Time to move on.

The rest of G&M outlines what kinds of stories we should expect from an IP cog-neuroscience. Surprise surprise: we are waist deep in representations from the get-go. We are awash with domain-specific computations and mechanisms. In other words, we get what looks to be a Rationalist conception of the mind/brain, or as G&M puts it, “a materialist form of Kantian rationalism” (193). A place, I would add, where GG of the Chomsky variety should feel very much at home. In other words, the problems that neuroscience claims to have with GG are more indicative of problems with the current state of the brain sciences than of problems with GG. GG is not perfect (ok, I am kidding, it is) but there is little reason to believe that what we know about the brain (which is quite limited) precludes accounts of the GG variety, contrary to ANN doctrine.

Conclusion? Time to assign ANN to the trash bin of ideas that looked nice but were entirely off track. The sooner we do this, the sooner we can start addressing the serious problems of relating minds to brains (G&M lists a bunch on p. 175). We have a long way to go. And maybe I can make this point in Nijmegen too, but nicely, very nicely.