Wednesday, February 14, 2018

The significant gaps in the PoS argument

I admit it, the title was meant to lure some in with the expectation that Hornstein was about to recant and confess (at last) to the holiness (as in full of holes) of the Poverty of Stimulus argument (PoS). If you are one of these, welcome (and gotcha!). I suspect, however, that you will be disappointed, for I am here to affirm once again how great an argument form the PoS actually is, and if not ‘holy’ then at least deserving of the greatest reverence. The remarks below are prompted by an observation in a recent essay by Epstein, Kitahara and Seely (EKS) (here p. 51, emphasis mine):

…Recognizing the gross disparity between the input and the state attained (knowledge) is the first step one must take in recognizing the fascination surrounding human linguistic capacities. The chasm (the existence of which is still controversial in linguistics, the so-called poverty of the stimulus debate) is bridged by the postulation of innate genetically determined properties (uncontroversial in biology)…

This is a shocking statement! And what makes it shocking is EKS’s completely accurate observation that many linguists, psychologists, neuroscientists, computer scientists and other language scientists still wonder whether there exists a learning problem in the domain of language at all. Yup, after more than 60 years of, IMO, pretty conclusive demonstrations of the poverty of the linguistic stimulus (and several hundred years of people exploring the logic of induction) the question of whether such a gap exists is still not settled doctrine. To my mind, the only thing that this could mean is that skeptics really do not understand what the PoS in the domain of language (or any other really) is all about. If they did, it would be uncontroversial that a significant gap (in fact, as we shall see, several) exists between the evidence available to fix the capacity and the capacity attained. Of course, recognizing that significant gaps exist does not by itself suffice to bridge them. However, unrecognized problems are particularly difficult to solve (you can prove this to yourself by trying to solve a couple of problems that you do not know exist right now) and so as a public service I would like to rehearse (one more time and with gusto) the various ways that the linguistic input (aka, Primary Linguistic Data (PLD)) to the language acquisition device (LAD (aka child)) underdetermines the structure of the competence attained (knowledge of G that a native speaker has). The deficiencies are severe and failure to appreciate this stems from a misconception of what G acquisition consists in. Let’s review.

There are three different kinds of gaps.

The first and most anodyne relates to the quality of the input. There are several ways that the quality might be problematic. Here are some.

1.     The input PLD is in the form of uttered bits of language. This gap adverts to the fact that there is a slip betwixt phrases/sentences (the structures that we know something about) and the lip(s that utter them). So the PLD in the ambient environment of the LAD is not “perfect.” There are mispronunciations, half thoughts badly expressed, lots of ‘hmms’ and ‘likes’ thrown in for no apparent purpose (except to irritate parents), misperceptions leading to misarticulations, cases of talking with one’s mouth full, and more. In short, the input to the LAD is not ideal. The input data are not perfect examples of the extensional range of sentences and phrases of the language.
2.     The range of input data also falls short. Thus, since forever (Newport, Gleitman and Gleitman did the leg work on this over 30 years ago (if not more)) it has been pointed out that utterances addressed to LADs are largely in imperative or interrogative form. When talking to very young kids, native speakers tend to ask a lot of questions (to which the asker already knows the answer so it is not a “real” question) and issue a lot of commands (actually many of the putative questions are also rhetorically disguised commands). Kids, in contrast, are basically declarative utterers. In other words, kids don’t grow up talking motherese, though large chunks of the input have a very stylized phon/syntactic contour (at least in some populations). They don’t sound motherese-ish and they eschew use of the kinds of sentences directed at them. So even at a gross level, there is a mismatch between what LADs hear in their ambient environment and what they do.

So, the input is not perfect and the attained competence is an idealized version of what the LAD actually has access to. Even the Structuralists appreciated this point, knowing full well that texts of actual speech needed curation to be taken as useful evidence for anything. Anyone who has ever read a non-edited verbatim text of an interview knows that the raw uttered data can be pretty messy. As indeed are data of any kind. The inputs vary in the quality of the exemplar, some being closer approximations to the ideal than others. Or to put this another way: there is a sizeable gap between the set of sentences and the set of utterances. Thus, if we assume that LADs acquire sentential competence, there is a gap to be traversed in building the former set from the latter.

The second gap between input and capacity attained is decidedly more qualitative and significant. It is a fact that native speakers are linguistically creative in the sense of effortlessly understanding and producing linguistic objects never before experienced. Linguistic creativity requires that what an LAD acquires on the basis of PLD is not a list of sentences/phrases previously encountered in ambient utterances but a way of generating an open ended list of acceptable sentences/phrases. In other words, what is acquired is (at least) some kind of generative procedure (rule system or G) that can recursively specify the open ended set of linguistic objects of the native speaker’s language. And herein lies the second gap: the output of the acquisition process is a G or set of rules while the input is products of this G or set of rules, AND products of rules and rules (or sentences and the Gs that generate them) are ontologically different kinds of objects. Or to put this another way: an LAD does not “experience” Gs directly but only via their products and there is a principled gap between these products and the generative procedures that generate them. Furthermore, so far as I know nobody has ever shown how to bridge this ontological divide by, say, using conventional analytical methods. For example, so far as I know the standard substitution methods prized by the transitional probability crowd have yet to converge on the actual VP expansion rule (one that includes ate, ate a potato, ate a potato that I bought at the store, ate a potato that I bought at the store that is around the block, ate a potato that I bought at the store that is around the block near the gin joint whose owner has a red buggy which people from Detroit want to buy for a song, etc. All of these are VPs but so far that they are all VPs is not something that artificial G learners have managed to cover. Recursion really is a pain).
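
To make the list/procedure contrast concrete, here is a minimal sketch of a recursive rule of the VP sort. The two rewrite rules and the tiny noun list are toy assumptions invented for illustration (nobody's actual analysis of English). What matters is that the procedure is a finite object while the set of VPs it specifies is open ended.

```python
# Toy rewrite rules (hypothetical, for illustration only):
#   VP -> "ate" | "ate" NP
#   NP -> "the" N | "the" N "that is near" NP    <- NP calls itself
NOUNS = ["potato", "store", "block", "gin joint"]


def noun_phrase(embeddings: int, index: int = 0) -> str:
    """Build an NP containing `embeddings` nested relative clauses."""
    noun = NOUNS[index % len(NOUNS)]
    if embeddings == 0:
        return f"the {noun}"
    # Recursive step: an NP contains another NP.
    return f"the {noun} that is near {noun_phrase(embeddings - 1, index + 1)}"


def verb_phrase(size: int) -> str:
    """size 0 gives the bare verb; size k >= 1 gives 'ate' plus an NP with k-1 embeddings."""
    if size == 0:
        return "ate"
    return f"ate {noun_phrase(size - 1)}"


if __name__ == "__main__":
    for size in range(4):
        print(verb_phrase(size))
```

Every output is a VP, and no finite list of them is the rule. The learner only ever gets to see strings like the printed ones, never `noun_phrase` itself.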

In fact, it is worse than this. For any finite set of data specified extensionally there are an infinite number of different functions that can generate those data. This observation goes back to Hume, and has been clear to anyone that has thought about the issue for at least the last several hundred years. Wittgenstein made this point. Goodman made it. Fodor made it. Chomsky made it. Even I (standing on the shoulders of giants) have been known to make it. It is a very old point. The data (always a finite object) cannot speak for themselves in the sense of specifying a unique function that generates those data. Data cannot bootstrap a unique function. The only way to get from data to the functions that generate them is to make concrete assumptions about the specific nature of the induction and in one way or other, this means specifying (listing them, ranking them) the class of admissible functional targets. In the absence of this, data cannot induce functions at all. As Gs or generative procedures just are functions, there is a qualitative irreducible ontological difference between the PLD and the Gs that the LAD acquires. There is no way to get from any data to any function (induce any G from any finite PLD) without specifying in some way the range of potential candidate Gs. Or, to say this another way, if the target is Gs and the input is a finitely specified list of data, there is no way of uniquely specifying the target in terms of the properties of the data list alone.
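
For the numerically inclined, the old point can be sketched in a few lines. The four data points and the names `bump` and `candidate` are made up for illustration. The construction itself is a standard one: add to any function that fits the data a multiple of a term that vanishes on every observed input.

```python
# Hypothetical finite "data list": four observations of some unknown function.
DATA = [(0, 0), (1, 1), (2, 4), (3, 9)]


def bump(x: int) -> int:
    """A polynomial that is zero at every observed input and non-zero elsewhere."""
    result = 1
    for observed_x, _ in DATA:
        result *= x - observed_x
    return result


def candidate(k: int):
    """One member of an infinite family: x**2 plus k times the vanishing term."""
    return lambda x: x**2 + k * bump(x)


if __name__ == "__main__":
    for k in (0, 1, -1, 1000):
        f = candidate(k)
        fits = all(f(x) == y for x, y in DATA)
        print(f"k={k}: fits the data = {fits}, value at unseen x=4 is {f(4)}")
```

Each choice of `k` fits the data perfectly and each projects differently to the unseen case. Nothing in `DATA` favors `k = 0`; only a prior restriction on the admissible functions does.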

All of which leads to consideration of a third gap: the evidence required for choosing among plausible competing Gs to which native speakers converge is underdetermined by the PLD available for deciding among competing Gs. So, not only do we need some way of getting native speakers from data to functions that generate that data, but even given a list of plausible options, there exists little evidence in the PLD itself for choosing among these plausible functions/Gs. This is the problem that Chomsky (and moi) has tended to zero in on in discussions of PoS. So for example, a perfectly simple and plausible transformation would manipulate inputs in virtue of their string properties, another in virtue of their hierarchical properties. We have evidence that native speakers do the latter, not the former. Evidence for this conclusion exists in the wider linguistic data (surveyed by the linguist and including complex cases and unacceptable data) but not in the Primary Linguistic Data the child has access to (which is typically quite simple and well formed). Thus, the fact that LADs induce along structure dependent lines is an induction to a particular G from a given list of possible Gs with no basis in the data justifying the induction. So not only do humans project and not only do they do so uniformly, they do so uniformly in the absence of any available evidence that could guide this uniform projection. There are endlessly many qualitatively different kinds of inductions that could get you from PLD to a G and native speakers project in the same way despite no evidence in the PLD constraining this projection. The choice among plausible Gs is strongly underdetermined by the PLD.
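
Here is a toy rendering of the string-based versus structure-dependent contrast, using the familiar auxiliary-fronting case. The example sentences and the hand-supplied parse (subject, auxiliary, predicate) are assumptions made for illustration. In particular, the structure is simply given to the second rule, not learned by it.

```python
def front_first_aux(words: list[str]) -> list[str]:
    """String-based rule: move the linearly first 'is' to the front."""
    i = words.index("is")
    return ["is"] + words[:i] + words[i + 1:]


def front_main_aux(subject: list[str], aux: str, predicate: list[str]) -> list[str]:
    """Structure-dependent rule: move the main-clause auxiliary to the front."""
    return [aux] + subject + predicate


# Simple case, the sort of thing found in the PLD: both rules agree.
simple = (["the", "man"], "is", ["happy"])
flat_simple = simple[0] + [simple[1]] + simple[2]

# Complex case, the subject contains a relative clause: the rules diverge.
complex_ = (["the", "man", "who", "is", "tall"], "is", ["happy"])
flat_complex = complex_[0] + [complex_[1]] + complex_[2]

if __name__ == "__main__":
    print(" ".join(front_first_aux(flat_simple)))
    print(" ".join(front_main_aux(*simple)))
    print(" ".join(front_first_aux(flat_complex)))  # the unacceptable output
    print(" ".join(front_main_aux(*complex_)))      # what speakers actually do
```

On the simple input the two rules are indistinguishable, which is why simple input cannot choose between them.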

It is worth observing that this gap does not require that the set of grammatical objects be infinite (though making the argument is a whole lot easier if something like this is right). The point is that native speakers make systematic conclusions about novel data. That they conclude anything at all implies something like a rule system or generative procedure. That native speakers largely do so in the same way suggests that they are not (always) just guessing.[1] The systematic projection from the data provided in the input to novel examples implies that LADs generalize in the same way from LAD input. Thus, native speakers project the same Gs from finite sample examples of those Gs. And that is the problem: how do native speakers bridge the gap between samples of Gs and the Gs of which they are samples in the same way, despite the fact that there are infinitely many ways to project from finite samples to functions that generate those samples? Answer: the projection is natively constrained in some way (plug in your favorite theory of UG here). The divide between sample data and Gs that generate them is further exacerbated by the fact that native speakers converge on effectively the same kinds of Gs despite having precious little evidence in the PLD for making a decision among plausible Gs.

So there are three gaps: a gap between utterances and sentences, a gap between sentences/phrases and the Gs that generate them and a gap between the Gs that are projected and the evidence to choose among these Gs in the PLD the child has access to while fixing these Gs. Each gap presents its own challenges, though it is the last two that are, IMO, the most serious. Were the problem restricted to getting from exemplars to idealized examples (i.e. utterances to sentences) then the problem would be solvable by conventional statistical massaging. But that is not the problem. It is at most one problem, and not the big one. So are there chasms that must be jumped, and gaps that must be filled? Yup. And does jumping them/filling them the way we do require something like native knowledge? Yup. And will this soon be widely accepted and understood? Don’t bet on it.

Let me end with the following observation. Randy Gallistel recently gave a talk at UMD observing that it is trivial to construct PoS arguments in virtually every domain of animal learning. The problem is not limited to language or to humans. It saturates the animal navigation literature, the animal foraging literature, and the animal decision literature. The fact, as he pointed out, is that in the real world animals learn a whole lot from very little while Eish theories of learning (e.g. associationism, connectionism, etc.) assume that animals learn very little from a whole lot. If this is right, then the main problem with Eish theories (which as EKS note still (sadly) dominate the mental and neural sciences and which lie behind the widespread skepticism concerning PoS as a basic fact of mental life in animals) is that the way they frame the problem of learning has things exactly backwards. And it is for this reason that Es have a hard time spotting the gaps. The failure of Eism, in other words, is not surprising. If you misconceive the problem your solutions will be misconceived as well.

[1] The ‘always’ is a nod to interesting work by Lidz and friends that suggests that sometimes this is exactly what LADs do.

Friday, February 2, 2018

Evolang one more time

Footnotes to Plato is a blog run by a philosopher-biologist named Massimo Pigliucci (MP). It has lots of interesting material and I have personally learned a lot by reading it. Currently, MP is writing a multi part commentary on a new book on evolution Darwin’s Unfinished Symphony by Kevin Laland. It’s on the evolution of culture and its impact on the evolution of mind. It is actually a pretty good read and, unlike much of the literature that discusses mind and culture, it does not fall into the continuity thesis trap that takes what humans do to simply be a beefed up version of what other animals do. In other words, it rightly treats the human case as different in kind and asks how this difference might have arisen. I don’t agree with everything Laland proposes, but it starts with the right presuppositions (what humans have really is different) and proceeds from there (see here for some brief discussion).

MP’s latest installment of his running commentary on Laland’s book (here) addresses the evolution of language. In the chapter, Laland surveys traditional accounts for how language arose in the species. Here is the list that MP compiled:

  • To facilitate cooperative hunting.
  • As a costly ornament allowing females to assess male quality.
  • As a substitute for the grooming exhibited by other primate species.
  • To promote pair bonding.
  • To aid mother-child communication.
  • To gossip about others.
  • To expedite tool making.
  • As a tool for thought.

Laland finds these wanting and adds another contender: language evolved to teach relatives. Laland spends lots of time in previous parts of his book arguing that learning via imitation and observation is a key feature of biological minds and that this power promotes biological success of the evo relevant variety. In this context, it is pretty clear why the pedagogical role of language might find a spotlight: language looks like an excellent medium for molding minds (though parents and teachers might beg to differ regarding how efficient a method it is!). At any rate, Laland’s proposal is that language evolved for instructional purposes, rather than to make tool making easier, or gossip more salacious, or promote pair-bonding or, or, or… Of course, once language arrived on the evo scene it could have served all these other functions as well, but according to Laland that was not what set the whole language thing in motion. Nope, it arose so that one day we could have long and boring faculty meetings at pedagogical institutions like UMD.

MP’s post critically reviews Laland’s proposal and points out that it does not obviously do better on the criteria proposed than do variants of the other rejected approaches. Moreover, MP argues, all these evo scenarios share a common difficulty; because the evolution of language has happened exactly once (i.e. it is a unique evo event) it’s very hard to provide convincing evolutionary evidence of the sort typically on offer for the various alternative scenarios. Here is MP:

For me, though, what makes this chapter the least convincing of those we have read so far is that even if we grant Kevin everything he is arguing for, we are still left, at best, with an hypothetical scenario that falls far short of empirical verification. Yes, maybe language evolved so that we could efficiently teach valuable information to our relatives, and things then went on from there. Or maybe there is a clever variant of one of the other hypotheses now on the table that will be even more convincing than the present analysis. Or perhaps there is yet another scenario that simply nobody has thought up yet. We just don’t know. And to be honest I don’t think we are likely to know any time soon, if ever. Precisely because of a major stumbling block acknowledged by Laland himself: the evolution of language was a unique historical event, and unique historical events are exceedingly difficult (though not impossible) to study.

MP goes on to flag Lewontin’s skepticism regarding the availability of robust evolutionary accounts for cognitive traits given the paucity of footprints in the fossil record left by the exercise of such capacities (see here). Lewontin’s point, which MP endorses, is that it is unlikely that we will ever get enough evidence to build a compelling case for the evolution of any human cognitive trait, including (especially!) language, given its biological uniqueness and the faint traces it physically leaves.

I agree with much of this, but I think that it misses the real problem with Laland’s discussion, and with the other scenarios MP catalogues. The big hole in these accounts is that they fail to specify what exactly language is. In other words, the projects fail from the start as they do not sufficiently specify the cognitive capacity whose evolution we are interested in explaining.[1] What exactly is it that has evolved? What are its key properties/characteristics? Only after specifying these does it make sense to ask how it and they arose. Sadly Laland doesn’t do this. Rather he seems to presuppose that we all know what language is and so specifying the relevant capacity of interest in some detail is unnecessary. But linguists know that this is wrong. Language is not a simple thing, but a very complex capacity and so asking how it evolved is asking how all of these complex intricacies came together in humans and only in humans. So, the real problem with Laland (and MP’s discussion) is not just that relevant data bearing on evolutionary scenarios sucks (though it does) but that most of the discussions out there fail to specify what needs explaining. Only after answering this question in some detail can the evolutionary question even be broached coherently.

Let me expand on this a bit. MP starts his comment on Laland as follows:

Despite much talk of animal communication, that’s just what other species do: communicate. Language is a very special, and highly sophisticated, type of communication. Characterized by grammar, capable of recursivity, inherently open ended. Nothing like that exists anywhere else in the animal world. Why?

Given this preamble, the thing that MP (and I assume Laland) thinks needs explaining is how a certain kind of grammar based system of communication arose, with emphasis on ‘grammar’ (after all, this is one key factor that makes human communicative systems unique).

So what features does such a system have? Well, it generates unboundedly many hierarchically structured objects that pair a meaning with an articulation. But this is not all. In addition, its use is very very labile (there is no apparent restriction on the kinds of topics it can be used to “discuss”) and it exploits a lexicon several orders of magnitude larger than anything else we find in animal communication systems and whose entries have semantic features quite unlike those we find with other animals. In sum, the syntax of human language, the vocab of human language and the applicability of human language are each unique.

More specifically, as GGers know, human Gs embody a very specific form of hierarchical structure (e.g. binary branching, labeled nodes), a very specific form of recursion (e.g. Merge like rather than say FSG like) and human G use is open ended in many different ways (e.g. its use is not stimulus bound (i.e. you can talk about what’s not right before your eyes (viz. independently of the famous 4-Fs) or even actual), the semantics of its atoms are not referentially constricted,[2] its domain of application seems to be topic neutral (i.e. not domain restricted like, say, bee dances or vervet alarm calls)). And all of the above is still a pretty surfacy description of just some of the distinctive features of human language (there is nothing quite like morphology evident in other communication systems either). As any GGer can attest, the descriptions available for each of these features that are empirically well motivated are endless.

I could go on, but even this very cursory and brief description suffices for the main point I want to make: if these are the features that make human language unique then the evolutionary forces Laland lists, including his own, don’t in any obvious way get anywhere near explaining any of them. To wit: How does the fact that language is used to teach relatives or to gossip about them (or others) explain the fact that human Gs are hierarchically recursive, let alone recursive in the specific way that they are? How does the possibility that language promotes pair bonding or can be used to identify predators or to support good ways to hunt explain why human linguistic atoms are not particularly referentially bound? How does the claim that language can guide tool making or teach migration patterns explain why humans can use language in a non-stimulus bound way? How do any of these “functions” explain why the domains of application of human language are so labile? They don’t. Not even close. And that is the real problem. Not only is relevant evidence hard to come by (i.e. Lewontin’s point) but, more importantly, the form of the accounts is conceptually insufficient to explain the (acknowledged) unique features of interest. The problem, in other words, is that the proposals Laland (and MP) survey fail to make contact with the properties that need explaining. And that is far more problematic than simply being empirically hard (maybe, impossible) to verify.

Let me be a little harsher. A standard objection (again stemming from Lewontin) is that many evolutionary accounts are just-so stories. And this is correct. Many are. And this is indeed a failing. Let’s even say that it is a very serious failing. But whatever their vices, just-so stories do have one vital ingredient missing from the accounts Laland and MP survey: were they accurate they would explain the relevant feature. Why did moths go from light colored to dark when pollution arose? Because the white ones were less able to camouflage themselves and were eaten leaving only the dark ones around. I don’t care if this story is entirely correct (but see here reporting that it is). It has the right form (i.e. if correct it would explain why the moths are speckled dark). So too stories we tell about why polar bears are white and why giraffe necks are long. However, this is precisely what is missing from most EvoLang accounts, including Laland’s. Or more precisely, if the features of interest are the ones that MP notes at the outset (which, recall, MP flags as being what makes human communication systems distinctive), then it is entirely unclear how the gossiping, teaching, cooperating would fuel the emergence of a system that is recursive, non-referential, domain general and stimulus free. So, the accounts fail conceptually, not just empirically. These accounts are not even just-so adequate. And that is a big failure. A very big failure. Indeed, an irreparable one![3]

I could go further (and so I will). Given an FL like ours which produces Gs like ours with generative procedures like ours and vocabulary items like ours it is pretty easy to tell a story as to how such a system could be used to do wonderful things, among others teach, gossip, make tools, coordinate hunts, discuss movie reviews and more and more and more. That direction is easy. Given the characteristics of the system of language, the variable uses it can be deployed in service of are pretty easy to understand. Not so the opposite. Even if teaching or bonding or gossiping is important it is not clear why doing any of these things demands a system with the special properties we find. One could imagine a perfectly serviceable teaching system that did not exploit lexical items with the peculiar semantic properties ours do or did not have generative procedures that allowed for the construction of endlessly hierarchically complex structures or that allowed for vastly different kinds of articulators (hands and tongues) or… You get the point, though, sadly, it seems to be a hard one to get. It is the point that Chomsky has been repeatedly making for quite a while now and it correctly flags the fact that an adequate evolutionary account of a capacity logically requires a specification of the capacity whose evolution is being accounted for. This, after all, is the explanandum in any EvoLang account and, as such, is the explanatory target of any admissible explanans. Laland doesn’t spend much time specifying the features that make human language unique (the ones that MP limns) and so spends no time explaining how his candidate proposal leads to communicative systems with these properties. Not surprisingly, then, the accounts he surveys and the one he provides don’t explain how these capacities could have arisen, let alone how they actually did.

So, another discussion of evolang that really gets nowhere. This is nothing new, but it is sad that such smart people (and they are very very smart) are derailed in the same old uninteresting way. We really do know a lot about human language and its unique features. It would be nice if evolutionary types interested in evolang would pay some attention (though I am really not holding my breath).

[1] The very first comment on MP’s post by saphsin correctly makes this point.
[2] See here for some discussion of this and more specifically Paul Pietroski’s discussions of how little linguistic meaning has to do with truth (e.g. Paul’s contribution here and articles on his webpage here).
[3] I do know of a story that does not make this mistake and that concentrates on trying to explain some features in evolutionary terms. It’s one that Bob Brandon and I provided many many years ago here: From Icon to Symbol:  Some Speculations on the Evolution of Natural Language (1986), Philosophy & Biology. Vol. 1.2 pp.169-189. This speculative paper no doubt suffers from Lewontin’s critique, but at least it tries to isolate different features of the overall capacity and say which ones might have an available evolutionary explanation. This virtue is entirely due to Robert Brandon’s efforts (he is a hot shot philosopher of biology and a friend).

Wednesday, January 24, 2018

Gary Marcus on deep learning

An “unknown” commentator left links to two very interesting Gary Marcus (GM) pieces (here1 and here2) on the current state of Deep Learning (DL) research. His two pieces make the points that I tried to make in a previous post (here), but do so much more efficiently and insightfully than I did. They are MUCH better. I strongly recommend that you take a look if you are interested in the topics.

Here are, FWIW, a couple of reactions to the excellent discussion these papers provide.

Consider first here1.

1. GM observes that the main critiques of DL contend not that DL is useless or uninteresting, but (i) that it leaves out a lot if one’s research interests lie with biological cognition, and (ii) that the part that DL leaves out is precisely what theories promoting symbolic computation have always focused on. In other words, the idea that DL suffices as a framework for serious cognition is what is up for grabs not whether it is necessary. Recall, Rs are comfortable with the kinds of mechanisms DLers favor. The E mistake is to think that this is all there is. It isn’t. As GM puts it (here1:4): DL is “not a universal…solvent, but simply…one tool among many…”

I am tempted to go a bit farther (something that Lake et al. (see here) moot as well). I suspect that if one’s goal is to understand cognitive processes then DL will play a decidedly secondary explanatory role. The hard problem is figuring out the right representational format (the kinds of generalizations it licenses and categorizations it encourages). These fixed, DL can work its magic. Without these, DL will be relatively idle. These facts can be obscured by DLers that do not seem to appreciate the kinds of Rish debts their own programs actually incur (a point that GM makes eloquently in here2). However, as we all know a truly blank slate generalizes not at all. We all need built-ins to do anything. The only relevant question is which ones and how much, not whether. DLers (almost always of an Eish persuasion) seem to have a hard time understanding this or drawing the appropriate conclusions from this uncontentious fact.

2. GM makes clear (here1:5) in what sense DL is bad at hierarchy. The piece contrasts “feature-wise hierarchy” with systems that “can make explicit reference to the parts of larger wholes.” GM describes the former as a species of “hierarchical feature detection; you build lines out of pixels, letters out of lines, words out of letters and so forth.” DL is very good at this (GM: “the best ever”). But it cannot do the second at all well, which is the kind of hierarchy we need to describe, say, linguistic objects with constituents that are computationally active. Note that what GM calls “hierarchical feature detection” corresponds quite well with the kind of discovery procedures earlier structuralism advocated and whose limitations Chomsky exposed over 60 years ago. As GM notes, pure DL does not handle at all well the kinds of structures GGers regularly make use of to explain the simplest linguistic facts. Moreover, DL fails for roughly the reasons that Chomsky originally laid out; it does not appreciate the particular computational challenges that constituency highlights.

3. GM has a very nice discussion of where/how exactly DLs fail. It relates to “extrapolation” (see discussion of question 9, 10ff). And why? Because DL networks “don’t have a way of incorporating prior knowledge” that involves “operations over variables.” For these kinds of “extrapolations” we need standard symbolic representations, and this is something that DL eschews (for typically anti-nativist/rationalist motives). So they fail to do what humans find trivially easy (viz. to “learn from examples the function you want and extrapolate it”). Can one build operations over variables into DL systems? GM notes that one can. But in doing so they will not be pure DL devices and will have to allow for symbolic computations and the innate (i.e. given) principles and operations that DLers regularly deny are needed.

4. GM’s second paper also makes for very useful reading. It specifically discusses the AlphaGo programs recently in the news for doing for Go what other programs did for chess (beat the human champions). GM asks whether the success of these programs supports the anti R conclusions that their makers have bruited about. The short answer is ‘NO!’. The reason, as GM shows, is that there is lots of specialized pre-packaged machinery that allows these programs to succeed. In other words, they are elbow deep into very specific “innate” architectural assumptions without which the programs would not function.

Nor should this be surprising for this is precisely what one should expect. The discussion is very good and anyone interested in a good short discussion of innateness and why it is important should take a look.

5. One point struck me as particularly useful. If what GM says is right then it appears that the non nativist Es don’t really understand what their own machines are doing. If GM is right, then they don’t seem to see how to approach the E/R debate because they have no idea what the debate is about. The issue is not whether machines can cognize. The issue is what needs to be in a machine that cognizes. I have a glimmer of a suspicion that DLers (and maybe other Eish AIers) confuse two different questions: (a) Is cognition mechanizable (i.e. does cognition require a kind of mentalistic vitalism)? versus (b) What goes into a cognitively capable mind: how rasa can a cognitively competent tabula be?
These are two very different questions. The first takes mentalism to be opposed to physicalism, the suggestion being that mental life requires something above and beyond the standard computational apparatus to explain how we cognize as we do. The second is a question within physicalism and asks how much “innate” (i.e. given) knowledge is required to get a computational system to cognize as we do. The E answer to the second question is that not much given structure is needed. The Rs beg to differ. However, Rs are not committed to operations and mechanisms that transcend the standard variety computational mechanisms we are all familiar with. No ghosts or special mental stuff required. If indeed DLers confuse these two questions then it explains why they consider whatever program they produce (no matter how jam packed with specialized “given” structures (of the kind that GM notes to be the case with AlphaGo)) as justifying Eism. But as this is not what the debate is about, the conclusion is a non-sequitur. AlphaGo is very Rish precisely because it is very non rasa tabularly.[1]
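
To make the extrapolation point in 3 above a bit more concrete, here is a toy sketch. It is emphatically not a DL network and not taken from GM’s papers. The learner below is a deliberately crude stand-in that, for each output bit, memorizes what it saw in training and falls back on the majority value otherwise. The 4-bit encoding, the function names and the evens-only training set are all assumptions chosen for illustration. The contrast case is a rule stated over a variable.

```python
N_BITS = 4


def to_bits(n: int) -> list[int]:
    """Little-endian bit list, e.g. 6 -> [0, 1, 1, 0]."""
    return [(n >> i) & 1 for i in range(N_BITS)]


def from_bits(bits: list[int]) -> int:
    return sum(b << i for i, b in enumerate(bits))


def train_bitwise_table(pairs):
    """Per bit position: memorize input-bit -> output-bit, plus the majority output bit."""
    tables = []
    for i in range(N_BITS):
        seen, outputs = {}, []
        for x, y in pairs:
            seen[to_bits(x)[i]] = to_bits(y)[i]
            outputs.append(to_bits(y)[i])
        majority = int(sum(outputs) * 2 >= len(outputs))
        tables.append((seen, majority))
    return tables


def predict(tables, x: int) -> int:
    """Use the memorized table where possible, otherwise the majority value."""
    bits = [seen.get(xb, majority) for (seen, majority), xb in zip(tables, to_bits(x))]
    return from_bits(bits)


def identity_rule(x: int) -> int:
    """An operation over a variable: applies to any x, seen or not."""
    return x


# Train on the identity function, but only on even numbers.
tables = train_bitwise_table([(n, n) for n in range(0, 2**N_BITS, 2)])

if __name__ == "__main__":
    for x in (6, 7):
        print(x, "table learner:", predict(tables, x), "rule:", identity_rule(x))
```

The table learner never saw a 1 in the lowest bit, so it cannot produce one, and an odd input comes back with that bit cleared. The rule over a variable has no such trouble, but of course the rule was given, not learned.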

To end: These two pieces are very good and important. DL has been massively oversold. We need papers that keep yelling about how little cloth surrounds the emperor. If your interests are in human (or even animal) cognition then DL cannot be the whole answer. Indeed, it may not even be much of the answer, or the most important part of it. But for now if we can get it agreed that DL requires serious supplementation to get off the ground, that will be a good result. GM’s papers are very good at getting us to this conclusion.

[1] I should add that there are serious mental mysteries that we don’t know how to account for computationally. These are what Chomsky describes as the free use of our capacities and what Fodor discusses under the heading of central systems. We have no decent handle on how we freely exercise our capacities or how complex judgments work. These are mysteries, but these mysteries are not what the E/R debate is mostly about.