Monday, May 30, 2016

Crucial experiments and killer data

In the real sciences, theoretical debate often comes to an end (or at least severely changes direction) when a crucial experiment (CE) settles it. How do CEs do this? They uncover decisive data (aka “killer data” (KD)) that, if accurate, show that one live approach to a problem is deeply empirically flawed.[1] These experiments and their attendant KD become part of the core ideology and serve to eliminate initially plausible explanations from the class of empirically admissible ones.[2]

Here are some illustrative examples of CE: the Michelson-Morley experiment (which did in the ether and ushered in special relativity (here)), the Rutherford Gold Foil experiment that ushered in the modern theory of the atom (here), the recent LIGO experiment that established the reality of gravitational waves (here), the Franklin x-ray diffraction pictures that established the helical structure of DNA (here), the Aspect and Kwiat experiments that signaled the end of local hidden variable theories (here) and (one from Wootton) Galileo’s discovery of the phases of Venus, which ended the Ptolemaic geocentric model of the universe. All of these are deservedly famous for ending one era of theoretical speculation and initiating another. In the real sciences, there are many of these, and they are one excellent indicator that a domain of inquiry has passed from intelligent speculation (often lavishly empirically titivated) to real science. Why? Because only relatively well-developed domains of inquiry are sufficiently structured to allow an experiment to be crucial. To put this another way: crucial experiments must tightly control for wiggle room, and this demands both a broad, well-developed empirical basis and a relatively tight theoretical setting. Thus, if a domain has these, that signals its scientific bona fides.

In what follows, I’d like to offer some KDs in syntax, phenomena that, IMO, rightly terminated (or should, if they are accurate) some perfectly plausible lines of investigation. The list is not meant to be exhaustive, nor is it intended to be uncontroversial.[3] I welcome dissent and additions. I offer five examples.

First, and most famously, polar questions and structure dependence. The argument and effect are well known (see here for one elaborate discussion). But to quickly review, we have an observation about how polar questions are formed in English (Gs “move” an auxiliary to the front of the clause). Any auxiliary? Nope, the one “closest” to the front. How is proximity measured? Well, not linearly. How do we know? Because of (i) the unacceptability of sentences like (1) (which should be well formed if distance were measured linearly) and (ii) the acceptability of those like (2) (which is expected if distance is measured hierarchically).

1.     *Can eagles that fly should swim?
2.     Should eagles that can fly swim?

The conclusion is clear: if polar questions are formed by movement, then the relevant movement rule ignores linear proximity in choosing the right auxiliary to move.[4] Note, as explained in the above linked-to post, the result is a negative one. The KD here establishes that G rules forsake linear information. It does not specify the kind of hierarchical information it is sensitive to. Still, the classical argument puts to rest the idea that Gs manipulate phrase markers in terms of their string properties.[5]
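To make the contrast concrete, here is a minimal sketch, in Python, of the two candidate rules applied to a toy parse of “eagles that can fly should swim.” The tree encoding and helper names are mine, purely illustrative, and not an implementation of any particular grammar formalism.

```python
# Toy parse of "eagles that can fly should swim": (label, children) nodes, word leaves.
tree = ("S",
        [("NP", [("N", ["eagles"]),
                 ("RC", [("C", ["that"]), ("Aux", ["can"]), ("V", ["fly"])])]),
         ("Aux", ["should"]),
         ("VP", [("V", ["swim"])])])

def leaves(node):
    """Flatten a tree into its left-to-right word string."""
    _label, children = node
    words = []
    for child in children:
        words.extend([child] if isinstance(child, str) else leaves(child))
    return words

def linear_rule(node):
    """Front the auxiliary that comes first in the word string."""
    return next(w for w in leaves(node) if w in {"can", "should"})

def structural_rule(node):
    """Front the auxiliary that is a daughter of the root clause,
    ignoring anything buried inside the subject."""
    _label, children = node
    for child in children:
        if not isinstance(child, str) and child[0] == "Aux":
            return child[1][0]
    return None

print(linear_rule(tree))      # 'can'    -> yields the deviant (1)
print(structural_rule(tree))  # 'should' -> yields the acceptable (2)
```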

The second example concerns reflexivization (R). Is it an operation that targets predicates and reduces their adicities by linking their arguments, or is it a syntactic operation that relates nominal expressions? The former treats R as ranging over predicates and their co-arguments. The latter treats R as an operation that syntactically pairs nominal expressions regardless of their argument status. The KD against the predicate-centered approach is found in ECM constructions, where non-co-arguments can be R-related.

3.     Mary expects herself to win
4.     John believes himself to be untrustworthy
5.     Mary wants herself to be elected president

In (3)-(5) the reflexive is anteceded by a non-co-argument. So, ‘John’ is an argument of the higher predicate in (4), and ‘himself’ is an argument of the lower predicate ‘be untrustworthy’ but not of the higher predicate ‘believe.’ Assuming that reflexives in mono-clauses and those in examples like (3)-(5) are licensed by the same rule, this provides KD that R is not an argument-changing (i.e. adicity-lowering)[6] operation but a rule defined over syntactic configurations that relates nominals.[7]
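Here is a minimal sketch, again mine and purely illustrative, of why (4) decides between the two formulations. The toy encodings of argument structure and c-command are hand-coded for this one example, not drawn from any actual implementation.

```python
# (4) "John believes [himself to be untrustworthy]"
# Thematic argument structure: 'himself' is an argument of the embedded
# predicate, not of 'believe', so 'John' and 'himself' are not co-arguments.
arguments = {
    "believe": ["John"],
    "be_untrustworthy": ["himself"],
}

def coargument_R(anaphor, antecedent):
    """Predicate-centered R: link two arguments of one and the same predicate."""
    return any(anaphor in args and antecedent in args for args in arguments.values())

# Configurational R, hand-coded for this example: the antecedent must
# c-command the anaphor (the matrix subject c-commands the ECM subject).
c_command = {("John", "himself")}

def configurational_R(anaphor, antecedent):
    return (antecedent, anaphor) in c_command

print(coargument_R("himself", "John"))        # False: wrongly excludes (4)
print(configurational_R("himself", "John"))   # True: licenses (4), as observed
```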

Here's a third, more recondite example, one that actually eliminated a particular conception of empty categories (EC). In Concepts and Consequences (C&C), Chomsky proposed a functional interpretation of ECs.

A brief advertisement before proceeding: C&C is a really great book whose only vice is that its core idea is empirically untenable. Aside from this, it is a classic and still well worth reading.

At any rate, C&C is a sustained investigation of parasitic gap (PG) phenomena, and it proposes that there is no categorial difference among the various flavors of traces (A vs A’ vs PRO). Rather, there is only one EC, and the different flavors reflect relational properties of the syntactic environment the EC is situated in. This allows for the possibility that an EC can start out its life as a PRO and end its life as an A’-trace without any rule directly applying to it. Rather, if something else moves and binds the PRO, the EC that started out as a PRO will be interpreted as an A- or A’-trace depending on what position the element it is related to occupies (the EC is an A-trace if A-bound and an A’-trace if A’-bound). This forms the core of C&C’s analysis of PGs, and it has the nice property of largely deriving the properties of PGs from more general assumptions about binding theory combined with this functional interpretation of ECs. To repeat, it is a very nice story. IMO, conceptually, it is far better than the Barriers account in terms of chain formation and null operators, which came after C&C. Why? Because the Barriers account is largely a series of stipulations on chain formation posited to “capture” the observed output. C&C provides a principled theory but is wrong; Barriers provides an account that covers the data but is unprincipled.

How was C&C wrong? Kayne provided the relevant KD.[8] He showed that PGs, the ECs inside the adjuncts, are themselves subject to island effects. Thus, though one can relate a PG inside an adjunct (which is an island) to an argument outside the adjunct, the gap inside the island is subject to standard island effects. So the EC inside the adjunct cannot itself be inside another island. Here’s one example:

6.     Which book did you review before admitting that Bill said that Sheila had read?
7.     *Which book did you review before finding someone that read?

The functional definition of ECs implies that ECs that are PGs should not be subject to island effects as they are not formed by movement. This proved to be incorrect and the approach died.  Killed by Kayne’s KD.

A fourth case: P-stranding and case connectedness effects in ellipsis killed the interpretive theory of ellipsis and argued for the deletion account. Once upon a time, the favored account of ellipsis was interpretive.[9] Gs generated phrase markers without lexical terminals. Ellipsis was effectively what one got with lexical insertion delayed to LF. It was subject to various kinds of parallelism restrictions: the non-elided antecedent provided the relevant terminals for insertion into the elided PM (i.e. the one without terminals), with insertion subject to recoverability and to the requirement that terminals land in positions parallel to those they occupy in the non-elided antecedent. Figuratively, the LF of the antecedent was copied into the PM of the elided dependent.

As is well known by now, Jason Merchant provided KD against this position, elaborating earlier (ignored?) arguments by Ross. The KD came in two forms. First, elided structures respect the same case-marking conventions apparent in non-elided constructions. Second, preposition stranding is permitted under ellipsis just in case it is allowed under movement without elision. In other words, it appears that, but for the phonology, elided phrases exhibit the same dependencies apparent in non-elided derivations. The natural conclusion is that elision is derived by deleting structure that is first generated in the standard way. So, the parallelism in case and P-stranding profiles of elided and non-elided structures implies that they share a common syntactic derivational core.[10] This is just what the interpretive theory denies and the deletion theory endorses. Hence the deletion theory has a natural account for the observed syntactic parallelism that Merchant/Ross noted. And indeed, from what I can tell, the common wisdom today is that ellipsis is effectively a deletion phenomenon.

It is worth observing, perhaps, that this conclusion also has a kind of minimalist backing. Bare Phrase Structure (BPS) makes the interpretive theory hard to state. Why? Because the interpretive theory relies on a distinction between structure building and lexical insertion, and BPS does not recognize this distinction. Thus, given BPS, it is unclear how to generate structures without terminals. But as the interpretive theory relies on doing just this, it would seem to be a grammatically impossible analysis in a BPS framework. So, not only is the deletion theory of ellipsis the one we want empirically, it also appears to be the one that conforms to minimalist assumptions.

Note that the virtue of KD is that it does not rely on theoretical validation to be effective. Whether deletion theories are more minimalistically acceptable than interpretive theories is an interesting issue. But whether they are or aren’t does not affect the dispositive nature of the KD wrt the proposals it adjudicates. This is one of the nice features of CEs and KD: they stand relatively independent of particular theories and hence provide a strong empirical check on theory construction. That’s why we like them.

Fifth, and now I am going to be much more controversial: inverse control and PRO-based theories of control. Polinsky and Potsdam (2002) present cases of control in which “PRO” c-commands its antecedent. This, strictly speaking, should be impossible, for such binding violates principle C. However, the sentences are licit with a control interpretation. Other examples of inverse control have since been argued to exist in various other languages. If inverse control exists, it is a KD for any PRO-based conception of control. As all but the movement theory of control (MTC) are PRO-based conceptions of control, if inverse control obtains then the MTC is the only theory left standing. Moreover, as Polinsky and Potsdam have argued since, that inverse control exists makes perfect sense in the context of a copy theory of movement if one allows top copies to be PF deleted. Indeed, as argued here, the MTC is what one expects in the context of a theory that eschews D-structure and adopts the least encumbered theory of merge. But all of this is irrelevant as regards the KD status of inverse control. Whether or not the MTC is right (which, of course, it is), inverse control effects present KD against PRO-based accounts of control given standard assumptions about principle C.

That’s it. Five examples. I am sure there are more. Send in your favorite. These are very useful to have on hand for they are part of what makes a research program progressive. CEs and KDs mark the intellectual progress of a discipline. They establish boundary conditions on adequate further theorizing. I am no great fan of empirics. The data does not do much for me. But I am an avid consumer of CEs and KDs. They are, in their own interesting ways, tributes to how far we’ve come in our understanding and so should be cherished.



[1] Note the modifier ‘deeply.’ Here’s an interesting question that I have no clean answer for: what makes one flaw deep and another a mere flesh wound? One mark of a deep flaw is that it butts up against a bedrock principle of the theory under investigation. So, for example, Galileo’s discovery was hard to reconcile with the Ptolemaic system unless one assumed that the phases of Venus were unlike any of the other phases observed at the time. There was no set of calculations consistent with those most generally in use that could get you the observed effects. Similarly for the Michelson-Morley data. Reconciling theory with these observations required fundamental changes to other basic assumptions. Most data are not like this. They can be reconciled by adding further (possibly ad hoc) assumptions or massaging some principles in new ways. But butting up against a fundamental principle is not that common. That’s why CEs and KD are interesting and worth looking for.
[2] The term “killer data” is found in a great new book on the rise of modern science by David Wootton (here). He argues that the existence of KD is a crucial ingredient in the emergence of modern science. It’s a really terrific book for those of you interested in these kinds of issues. The basic argument is that there really was a distinction in kind between what came after the scientific revolution and its precursors. The chapter on how perspective in painting fueled the realistic interpretation of abstract geometry as applied to the real world is worth the price of the book all by itself.
[3] In this, my list fails to have one property that Wootton highlighted. KDs as a matter of historical fact are widely accepted and pretty quickly too. Not all my candidate KDs have been as successful (tant pis), hence the bracketed qualifying modal.
[4] Please note the conditional: the KD shows that transformations are not linearly sensitive. This presupposes that Y/N questions are transformationally derived. Syntactic Structures argued for a transformational analysis of Aux fronting. A good analysis of the reasons for this is provided in Lasnik’s excellent book (here). What is important to note is that data can become KD only given a set of background assumptions. This is not a weakness.
[5] This raises another question that Chomsky has usefully pressed: why don’t G operations exploit the string properties of phrase markers? His answer is that PMs don’t have string properties as they are sets and sets impose no linear order on their elements.
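A toy rendering of the point, under the assumption that Merge simply forms sets (my sketch, not Chomsky’s notation):

```python
# If Merge builds sets, its output carries no left-right order for a rule to see.
def merge(x, y):
    return frozenset({x, y})

print(merge("eat", "apples") == merge("apples", "eat"))  # True: {eat, apples} = {apples, eat}
# A G operation stated over such objects cannot refer to "leftmost" or "string-adjacent",
# because that information is simply not represented in the phrase marker.
```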
[6] Note: that R relates nominals does not imply that it cannot have the semantic reflex of lowering the adicity of a predicate. So, R applies to John hugged himself to relate the reflexive and John. This might reduce the adicity of hug from 2-place to 1-place. But this is an effect of the rule, not a condition on the rule. The rule couldn’t care less whether the relata are co-arguments.
[7] There are some theories that obscure this conclusion by distinguishing between semantic and syntactic predicates. Such theories acknowledge the point made here in their terminology. R is not an adicity-changing operation, though in some cases it might have the effect of changing predicate adicity (see note 6).
This, btw, is one of my favorite KDs. Why? Because it makes sense in a minimalist setting. Say R is a rule of G. Then, given Inclusiveness, it cannot be an adicity-changing operation (recall, Inclusiveness requires preserving the integrity of the atoms in the course of a derivation, and nothing violates the integrity of a lexical item more than changing its argument structure). Thus, in a minimalist setting, the first view of R seems ruled out.
We can, as usual, go further. We can provide a deeper explanation for this instance of Inclusiveness and propose that adicity-changing rules cannot be stated given the right conception of syntactic atoms (this parallels how thinking of Merge as outputting sets makes it impossible to state rules that exploit linear dependencies among the atoms (see note 5)). How might we do this? By assuming that predicates have at most one argument (i.e. they are 1-place predicates). This is to effectively endorse a strong neo-Davidsonian conception of predicates in which all predicates are 1-place predicates of events and all “arguments” are syntactic dependents (see e.g. Pietroski here for discussion). If this is correct, then there can be no adicity-changing operations grammatically identifying co-arguments of a predicate, as predicates have no co-arguments. Ergo, the nominal-relating R is the only kind of rule a G can have.
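For concreteness, here is the standard neo-Davidsonian rendering of John hugged Mary (the role labels are the conventional ones, used purely for illustration):

\[ \exists e\,[\mathrm{hugging}(e) \wedge \mathrm{Agent}(e,\mathrm{John}) \wedge \mathrm{Theme}(e,\mathrm{Mary})] \]

The verb contributes only a 1-place predicate of events; John and Mary are linked to the event by separate conjuncts rather than to each other as co-arguments of hug, so there is nothing for an adicity-changing operation to operate on.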
[8] If memory serves, I think that he showed this in his Connectedness book.
[9] Edwin Williams developed this theory. Ivan Sag argued for a deletion theory. Empirically the two were hard to pull apart. However, in the context of GB, Williams argued that the interpretive theory was more natural. I think he had a point.
[10] For what it is worth, I have always found the P-stranding facts to be the more compelling. The reason is that all agree that at LF P-stranding is required. Thus the LF of To whom did you speak? involves abstracting over an individual, not a PP type. In other words, the right LF involves reconstructing the P and abstracting over the DP complement; something like (i), not (ii):
(i)             Who1 [you spoke to x1]
(ii)           [To whom]1 [you spoke x1]
An answer to the question given something like (i) is ‘Fred.’ An answer to (ii) could be ‘about Harry.’ It is clear that at LF we want structure like (i) and not (ii). Thus, at LF the right structure in every language necessarily involves P-stranding, even if the language disallows P-stranding syntactically. This is KD for theories that license ellipsis at LF via interpretation rather than via movement plus deletion.
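Schematically (my notation, not Merchant’s), the two abstractions are:

\[ \text{(i)}\;\; \lambda x.\ \text{you spoke to } x \qquad\qquad \text{(ii)}\;\; \lambda P.\ \text{you spoke } P \]

(i) abstracts over individuals, so an answer supplies an individual; (ii) abstracts over PPs, so an answer could in principle supply any PP.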

Friday, May 27, 2016

Testing our tools

Here’s a paper that I just read that makes a very interesting point.[1] The paper by Eric Jonas and Konrad Kording (J&K) has the provocative title “Could a neuroscientist understand a microprocessor?” It tests the techniques of neuroscience by applying them to a structure that we completely understand and asks whether these techniques allow us to uncover what we know to be the correct answer. The “model system” investigated is a vintage processor of the kind used to power video games in very early Apple/Atari/Commodore devices, and the question asked is whether the techniques of cog-neuro can deliver an undergrad-level understanding of how the circuit works. You can guess the answer: Nope! Here’s how J&K put it:

Here we will try to understand a known artificial system, a historic processor by applying data analysis methods from neuroscience. We want to see what kind of an understanding would emerge from using a broad range of currently popular data analysis methods. To do so, we will analyze the connections on the chip, the effects of destroying individual transistors, tuning curves, the joint statistics across transistors, local activities, estimated connections, and whole brain recordings. For each of these we will use standard techniques that are popular in the field of neuroscience. We find that many measures are surprisingly similar between the brain and the processor and also, that our results do not lead to a meaningful understanding of the processor. The analysis cannot produce the hierarchical understanding of information processing that most students of electrical engineering obtain. We argue that the analysis of this simple system implies that we should be far more humble at interpreting results from neural data analysis. It also suggests that the availability of unlimited data, as we have for the processor, is in no way sufficient to allow a real understanding of the brain. (1)

This negative result should, as J&K put it, engender some humility in those who think we understand how the brain works. If J&K are right, our techniques are not even able to suss out the structure of a relatively simple circuit, which, in most ways that count, should be more easily investigated using these current techniques. We can, after all, lesion a circuit to our heart’s delight (but this does not bring us “much closer to an understanding of how the processor works” (5)) and take every imaginable measurement of both individual transistors and of the whole processor (but this does not give “conclusive insight into the computation” (6)) and do full connectivity diagrams, and still we have little idea about how the circuit is structured to do what it does. So, it’s not only the nematode that remains opaque. Even a lowly circuit won’t give up its “secrets” no matter how much data we gather using these techniques.

This is the negative result, and it is interesting. But there is a positive observation that I want to draw your attention to as well. J&K observe that many of their measures on the processor “are surprisingly similar” to those made on brains. The cog-neuro techniques applied to the transistors yield patterns that look remarkably like spike trains (5), look “quite a bit like real brain signals” (6), and “produce results that are surprisingly similar to the results found about real brains” (9). This is very interesting. Why?

Well, there is a standard story in Brainville that promotes the view that brains are entirely different from digital computers. The J&K observation is that a clearly digital system will yield “surprisingly similar” patterns of data if the same techniques are applied to it as are applied to brains. This suggests that standard neuro evidence is consistent with the conclusion that the brain is a standard computing device. Or, more accurately, were it one, the kind of data we in fact find is the kind of data that we should find. Thus, the simple-minded view that brains don’t compute the way that computers do is, at best, motivated by very weak reasoning (IMO, the only real “argument” is that they don’t look like computers).

Why mention this? Because, as you know, there are extremely good reasons, provided by Gallistel among others, to think that the brain must have a standard classical Turing architecture, though we have no current idea how brains realize this. What J&K show is that systems that clearly are classical computational systems in this sense generate the same patterns of data as brains do, which suggests, at the very least, that the conclusion that brains are not classical computers requires much more argument than is standardly provided.

At any rate, take a look at J&K. It is a pretty quick read. Both its negative and positive conclusions are interesting. It also outlines a procedure that is incredibly useful: it always pays to test one’s methods on problems whose answers we already know. If these methods don’t deliver where we know the answer, then we should be wary of over-interpreting results when they are applied to problems we know almost nothing about.



[1] Both Tim Hunter and Jeff Lidz sent me links to it. Thx.

Wednesday, May 25, 2016

AI now

Here are some papers on the current hype over AI. They are moderately skeptical about the current state of the art. Not so much about whether there are tech breakthroughs to be had. All agree that these are forthcoming. The skepticism concerns the implications of this. Let me say a word or two about this.

There is a lot of PR concerning how Big Data will revolutionize our conceptions of how the mind works and how science should be conducted. Big Data is Empiricism on steroids. It is made possible because of hardware breakthroughs in memory and speed of computation. We can do more of what we have always done faster, and this can make a difference. I doubt that this tells us much about human cognition. Or more accurately, what it does tell us is likely wrong. Big Data is often coupled with Deep Learning. And linguists have every reason to believe that Deep Learning is an incorrect model of human cognition. Why? Because it is a modern version of the old discovery procedure. Level-1 generalizations are generalized again at level 2, level-2 generalizations are generalized again at level 3, and so on. As a model of cognition, this tells us that higher levels are just generalizations over lower ones (e.g. from phonemes we get morphemes, from morphemes we get phrase structure, and from phrase structure we get...). GG started from the demonstration that this is an incorrect understanding of linguistic organization. Levels exist, but they are in no sense reducible to the ones lower down. Indeed, whether it makes sense to speak of 'higher' and 'lower' is quite dubious. The levels interact but don't reduce. And any theory of learning that supposes otherwise is wrong if this is right (and there is no reason to think that it is not right, or at least no argument has been presented arguing against level independence). So, Deep Learning and Big Data are, IMO, dead theories walking. We will see this very soon.

The interview with Gary Marcus (here) discusses these issues and notes that, historically, what we have here is more of the same kind of claim that has in the past proven to be wildly oversold. He thinks (and I agree) that we are getting another snow job this time around too. The interview rambles somewhat (bad editing), but there is lots in here to provoke thought.

A second paper on a similar theme is here in Aeon. The fact that it is in Aeon should not be immediately held against it. True, there is reason to be suspicious given the track record, but the paper was not bad, IMO. It argues that there is no "coming of the machines."

Here is a third piece on programming and advice about how not to do it (the Masaya). It interestingly argues for a Marrian conception of programming. Understand the computational problem before you write code. Seems like reasonable advice.

Last point: I mentioned above that Big Data is not only influencing how we conceive of Minds and Brains but also how we think science should be done. The idea seems to be that with enough data, the search for basic causal architecture becomes quaint and unnecessary. We can just vacuum up the data and the generalizations of scientific utility will pop out. On this view, theories (and not only minds) are just compendia of data generalizations, and given that we can now construct these compendia more efficiently and accurately by analyzing more and more data, theory construction becomes a quaint pastime. The only real problem with Empiricism, on this view, was that we did not gather enough data fast enough. And now, Big Data can fix this. The dream of understanding without thinking is finally here. Oy vey!

Monday, May 23, 2016

The return of behaviorism

There is a resurgence of vulgar Empiricism (E). It’s rampant now, but be patient: it will soon die out as the groundless, extravagant claims made on its behalf are, yet again, seen to prove sterile. But it is back and getting airing in the popular press.

Of the above, the only part that is likely difficult to understand is what I intend by ‘vulgar.’ I am not a big fan of the E-weltanschauung, but even within Empiricism there are more and less sophisticated versions. The least sophisticated in the mental sciences is some version of behaviorism (B). What marks it out as particularly vulgar? Its complete repudiation of mental representations (MR). Most of the famous E philosophers (Locke and Hume for example) were not averse to MRs. They had no problem believing that the world, through the senses, produces representations in the mind and that these representations are causally implicated in much of cognitive behavior. What differentiates classical E from classical Rationalism (R) is not MRs but the degree to which MRs are structured by experience alone. For E, the MR structure pretty closely tracks the structure of the environmental input as sampled by the senses. For R, the structure of MRs reflects innate properties of the mind in combination with what the senses provide of the environmental landscape. This is what the debate about blank/wax tablets is all about. Not whether the mind has MRs but whether the properties of the MRs we have reduce to sensory properties (statistical or otherwise) of the environment. Es say ‘yes,’ Rs ‘no.’

Actually this is a bit of a caricature. Everyone believes that the brain/mind brings something to the table. Thus, nobody thinks that the brain/mind is unstructured, as such brains/minds could not generalize, and everyone believes that brains/minds that do not generalize cannot acquire/learn anything. The question, then, is really how structured the brain/mind is. For Es the mind/brain is largely a near perfect absorber of environmental information with some statistical smoothing techniques thrown in. For Rs, extracting useful information from sensory input requires a whole lot of given/innate structure to support the inductions required. Thus, for Es the gap between what you perceive and what you acquire is pretty slim, while for Rs the gap is quite wide and bridging it requires a lot of pre-packaged knowledge. So everyone is a nativist. The debate is over what kind of native structure is imputed.

If this is right, the logical conclusion of E is B. In particular, in the limit, the mind brings nothing but the capacity to perfectly reflect environmental input to cognition. And if this is so, then all talk of MRs is just a convenient way of coding environmental input and its statistical regularities. And if so, MRs are actually dispensable and so we can (and should) dump reference to them. This was Skinner’s gambit. B takes all the E talk of MRs as theoretically nugatory given that all MRs do is recapitulate the structure of the environment as sampled by the senses. MRs, on this view, are just summaries of experience and are explanatorily eliminable. The logical conclusion, the one that B endorses, is to dump the representational middlemen (i.e. MRs) that stand between the environment and behavior. All the brain is, on this view, is a way of mapping between stimulus inputs and behavior, all the talk of MRs just being a misleading way of talking about the history of stimuli. Or: we don’t need this talk of minds and the MRs it suggests; we can just think of the brain as a giant I/O device that “somehow” maps stimuli to behaviors.

Note that without representations there is no real place for information processing or the computer picture of the mind. Indeed, this is exactly the point that critics of E and B have long made (e.g. Chomsky, Fodor, and Gallistel, to name three of my favorites). But, of course, the argument can be aimed in the reverse direction (as Jerry Fodor sagely noted, someone’s modus ponens can be someone else’s modus tollens): ‘If B, then the brain does not process information’ (i.e. the contrapositive of ‘If the brain processes info then not B’). And this is what I mean by the resurgence of vulgar E. B is back, and getting popular press.

Aeon has a recent piece against the view of the brain as an information processing device (here). The author is Robert Epstein. The view is B through and through. The brain is just a vehicle for pairing inputs with behaviors based on reward (no, I am not kidding). Here is the relevant quote (13):

As we navigate through the world, we are changed by a variety of experiences. Of special note are experiences of three types: (1) we observe what is happening around us (other people behaving, sounds of music, instructions directed at us, words on pages, images on screens); (2) we are exposed to the pairing of unimportant stimuli (such as sirens) with important stimuli (such as the appearance of police cars); (3) we are punished or rewarded for behaving in certain ways.

No MRs mediate input and output. I/O is all there is. 

Misleading headlines notwithstanding, no one really has the slightest idea how the brain changes after we have learned to sing a song or recite a poem. But neither the song nor the poem has been ‘stored’ in it. The brain has simply changed in an orderly way that now allows us to sing the song or recite the poem under certain conditions. When called on to perform, neither the song nor the poem is in any sense ‘retrieved’ from anywhere in the brain, any more than my finger movements are ‘retrieved’ when I tap my finger on my desk. We simply sing or recite – no retrieval necessary (14).

No need for memory banks or MRs. All we need is “the brain to change in an orderly way as a result of our experiences” (17). So sensory inputs, rewards, behavioral outputs. And the brain? That organ that mediates this process. Skinner must be schepping nachas!

Let me end with a couple of references and observations.

First, there are several very good long detailed critiques of this Epstein piece out there (Thx to Bill Idsardi for sending them my way). Here and here are two useful ones. I take heart in these quick replies for it seems that this time around there are a large number of people who appreciate just how vulgar B conceptions of the brain are. Aeon, which published this piece, is, I have concluded, a serious source of scientific disinformation. Anything printed therein should be treated with the utmost care, and, if it is on cog-neuro topics, the presumption must be that it is junk. Recall that Vyvyan Evans found a home here too. And talk about junk!

Second, there is something logically pleasing about articles like Epstein’s; they do take an idea to its logical conclusion. B really is the natural endpoint of E. Intellectually, its vulgarity is a virtue, for it displays what much E succeeds in hiding. Critics of E (especially Randy and Jerry) have noted its lack of fit with the leading ideas of computational approaches to neuro-cognition. In an odd way, the Epstein piece agrees with these critiques. It agrees that the logical terminus of E (i.e. B) is inimical to the information processing view of the brain. If this is right, the brain has no intrinsic structure. It is “empty,” a mere bit of meat serving as a physiological venue for combining experience and reward with an eye towards behavior. Randy and Jerry and Noam (and moi!) could not agree more. On this behaviorist view of things the brain is empty and pretty simple. And that’s the problem with this view. The Epstein piece has the logic right, it just doesn’t recognize a reductio, no matter how glaring.

Third, the piece identifies B’s fellow travellers. So, not surprisingly, embodied cognition makes an appearance, and the piece is more than a bit redolent of connectionist obfuscation. In the old days, connectionists liked to make holistic pronouncements about the opacity of the inner workings of their neural nets. This gave the view a nice anti-reductionist feel and legislated as unaskable any questions about how the innards of the system worked. It gave the whole theory a kind of new age, post-modern gloss with an Aquarian appeal. Well, the Epstein piece assembles the same cast of characters in roughly the same way.


Last observation: the critiques I linked to above both dwell on how misinformed this piece is. I agree. There is very little argumentation and what there is, is amazingly thin. I am not surprised, really. It is hard to make a good case for E in general and B in particular. Chomsky’s justly famous review of Skinner’s Verbal Behavior demonstrated this in detail. Nonetheless, E is back. If this be so, for my money, I prefer the vulgar forms, the ones that flaunt the basic flaws. And if you are looking for a good version of a really bad set of Eish ideas, the Epstein article is the one for you.