Friday, September 28, 2018

Pulling back

Today marks FoL's sixth anniversary. I started FoL because I just could not stand reading the junk being written about Generative Grammar (GG) in the popular press. The specific occasion was some horrid coverage (by Bartlett in the Chronicle) of Everett's work on Pirahã and its supposed significance for theories of FL/UG. The discussion was based on the most trivial misunderstanding of the GG enterprise, and I thought that I could help sort matters out and have some fun in the process. I did have fun. I have sorted things out. I have not stopped the garbage.

FoL has continued to try to deal with the junk by both pointing it out and then explaining how it was badly mistaken. This has, sadly, kept me quite busy. There is lots of misunderstanding out there and it never seems to lessen, no matter how many stakes get driven into the hearts of the extremely poor arguments.

In addition to regularly cleaning out the Augean stables, I have also written on other issues that amuse me: Rationalism vs Empiricism, Fodor and representationalism, big data, deep learning, mentalism, PoS argumentation, languistics vs linguistics, universals, Evo Lang, minimalism and its many many virtues, minimalism and its obscurities, minimalism and how to do it right, minimalism and how/why people misinterpret it, computationalism and its implications for cog-neuro, the greatness of Randy's work, interesting findings in neuro that support GG, how to bring neuro and ling closer together (Yay to Embick and Poeppel), and more. It's been a busy 6 years.

In fact, here is how busy. I calculate that I've written about 1 long post per week for the last 6 years. A long post is 4 pages or more. I have also put up shorter ones so that overall I have posted upwards of 600 pieces. And I have enjoyed every minute of this.

However, doing this (finding the articles, reading them, writing the posts, responding to comments, cleaning the site) has taken up a lot of my time, and as I want to write one last book before I call it quits (I am getting very old!), I have decided that I have to cut back. My current plan is to write maybe one post a month, if that. This will allow me to write my magnum opus (this is a joke!), which will be an even more full-throated defense of the unbelievable success of the Minimalist Program. It really has been marvelous and I intend to show exactly how marvelous in about 150 (not more or nobody will take a look) fun-filled pages. If all goes well, I might even post versions on FoL for comment.

This is all a long-winded way of saying that I will be posting much less often, and of thanking you for reading, commenting and arguing with me for the last 6 years. I have learned a lot and enjoyed every minute. But it is time for a break.

One last point: if anyone wishes to post to FoL I am open to looking at things to put up. We still have others that will be contributing content and I am happy to curate more if it comes my way. So feel free to jump in. And again, thx.

Linguistic experiments

How often do we test our theories and basic concepts in linguistics? I don’t know for sure, but my hunch is that it is not that often. Let me explain.

One of the big ideas in the empirical sciences is the notion of the crucial experiment (or "experimentum crucis" (EC) for those of you who prefer "ceteris paribus" to "all things being equal" (psst, I am one of those, so it is 'EC' from now on)) (see here). What is an EC? Wikipedia says the following:

In the sciences, an experimentum crucis (English: crucial experiment or critical experiment) is an experiment capable of decisively determining whether or not a particular hypothesis or theory is superior to all other hypotheses or theories whose acceptance is currently widespread in the scientific community. In particular, such an experiment must typically be able to produce a result that rules out all other hypotheses or theories if true, thereby demonstrating that under the conditions of the experiment (i.e., under the same external circumstances and for the same "input variables" within the experiment), those hypotheses and theories are proven false but the experimenter's hypothesis is not ruled out.

The most famous experiments in the sciences (e.g. Michelson-Morley on Special Relativity, Eddington's on General Relativity, Aspect on Bell's inequality) are ECs, including those that were likely never conducted (e.g. Galileo's dropping things from the tower). What makes them critical is that they are able to isolate a central feature of a theory or a basic concept for testing in a local environment where it is possible to control for the relevant factors. We all know (or we all should know) that it is very hard to test an interesting theoretical claim directly.[1] As the quote above notes, the test critically relies on carefully specifying the "conditions of the experiment" so as to be able to isolate the principle of interest enough for an up or down experimental test.

What happens in such an experiment? Well, we set up ancillary assumptions that are well grounded enough to allow the experiment to focus on the relevant feature up for test. In particular, if the ancillary assumptions are sufficiently well grounded in the experimental situation, then the proposition up for test will be the link in the deductive structure of the setup that is most exposed by the test.

Ancillary assumptions are themselves empirical and hence contestable. That is why ECs are so tough to dream up: to be effective, these ancillary assumptions must, in the context of the experimental setup, be stronger than the theoretical item they are being used to test. If they are weaker than the proposition to be tested, then the EC cannot decisively test that proposition. Why? Well, the ancillary assumption(s) will be weaker links in the chain of experimental reasoning, and an experimental result can always be correctly causally attributed to the weaker ancillary assumptions. This will spare the theoretical principle or concept of interest direct exposure to the test. However, and this is the important thing, it is possible in a given context to marshal enough useful ancillary assumptions that are better grounded in that context than the proposition to be tested. And when this is possible, the conditions for an EC are born.
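In schematic form (this is my gloss on the logic just described, not notation drawn from any of the sources mentioned): with ancillary assumptions $A_1,\dots,A_n$, hypothesis under test $H$, and prediction $P$, an EC is a modus tollens whose ancillary premises are too well grounded to absorb the blame:

\[
(A_1 \wedge \cdots \wedge A_n \wedge H) \Rightarrow P, \qquad \text{experiment: } \neg P, \qquad \therefore\ \neg(A_1 \wedge \cdots \wedge A_n \wedge H)
\]

If each $A_i$ is better grounded in the experimental context than $H$, the negation gets pinned on $H$. If some $A_i$ is weaker than $H$, the result can always be blamed on that $A_i$ instead, and the experiment is not crucial.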

As I noted, I am not sure that we linguists do much ECing. Yes, we argue for and against hypotheses and marshal data to those ends, but it is rare that we set things up to manufacture a stable EC. Here is what I mean.

A large part of linguistic work aims less to test a hypothesis than to apply it (and thereby, possibly, to refine it; note, this is a possibility, not a necessity). For example, say I decide to work on a certain construction C in a certain language L. Say C has some focus properties, namely when the expression appears in a designated position distinct from its "base" position it bears a focus interpretation. I then analyze the mechanisms underlying this positioning. I use movement theory to triangulate on the kind of operation that might be involved. I test this assumption by seeing if it meets the strictures of Subjacency Theory (allows unbounded dependencies yet obeys islands) and if it does, I conclude it is movement. I then proceed to describe some of the finer points of the construction given that it is an A'-movement operation. This might force a refinement of the notion of movement, or island, or phase to capture all the data, but the empirical procedure presupposes that the theory we entered the investigation with is on the right track, though possibly in need of refinement within the grammar of L. The empirical investigation's primary interest is in describing C in L, and in service of this it will refine/revise/repurpose (some) principles of FL/UG.

This sort of work, no matter how creative and interesting, is unlikely to lead to an EC of the principles of FL/UG precisely because of its exploratory nature. The principles are more robust than the ancillary assumptions we will make to fit the facts. And if this is so, we cannot use the description to evaluate the basic principles. Quite the contrary. So, this kind of work, which I believe describes a fair chunk of what gets done, will not generally serve EC ends.

There is a second impediment to ECs in linguistics. More often than not the principles are too gauzy to be pinned down for direct test. Take for example the notion of "identity" or "recoverability." Both are key concepts in the study of ellipsis, but, so far as I can tell, we are not quite sure how to specify them. Or maybe a more accurate claim would be that we have many, many specifications. Is it exact syntactic identity? Or identity as non-distinctness? Or propositional (semantic) identity? Identity of what object at what level? We all know that something like identity is critical, but it has proven to be very hard to specify exactly what notion is relevant. And of course, because of this, it is hard to generate ECs to test these notions. Let me repeat: the hallmark of a good EC is its deductive tightness. In the experimental situation the experimental premises are tight enough and grounded enough to focus attention on the principle/concept of interest. Good ECs are very tight deductive packages. So constructing effective ones is hard and this is why, I believe, there are not many ECs in linguistics.
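To make the indeterminacy concrete, here is a toy sketch (the mini-representations and feature choices are invented for illustration; this is nobody's actual theory of ellipsis) showing how three candidate identity predicates can disagree about one and the same voice-mismatch case, along the lines of "This problem was to have been looked into, but nobody did":

    # Toy representations of an antecedent VP and an ellipsis site.
    # The features and their values are invented for the example.
    antecedent    = {"pred": "look_into", "voice": "passive"}
    ellipsis_site = {"pred": "look_into", "voice": "active"}

    def strict_identity(a, e):
        # exact syntactic identity: every feature must match
        return a == e

    def non_distinctness(a, e):
        # features may differ only if one side leaves them unvalued (None)
        return all(a[f] == e[f] or None in (a[f], e[f]) for f in a)

    def semantic_identity(a, e):
        # propositional identity: same predicate core, voice ignored
        return a["pred"] == e["pred"]

    print(strict_identity(antecedent, ellipsis_site))    # False
    print(non_distinctness(antecedent, ellipsis_site))   # False: both values present
    print(semantic_identity(antecedent, ellipsis_site))  # True: mismatch tolerated

    # An ellipsis site unvalued for voice separates the first two predicates:
    underspecified = {"pred": "look_into", "voice": None}
    print(strict_identity(antecedent, underspecified))   # False
    print(non_distinctness(antecedent, underspecified))  # True

Each predicate licenses a different class of mismatches, so each would anchor a different EC; until we settle which notion FL actually uses, no single crucial test can be built on "identity."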

But this is not always so, IMO. Here are some example ECs that have convinced me.

First: It is pretty clear that we cannot treat case as a byproduct of agreement. What's the EC?[2] Well, one that I like involves the Anaphor Agreement Effect (AAE). Woolford (refining Rizzi) observed that reflexives cannot sit in positions where they would have to value agreement features on a head. The absence of nominative reflexives in languages like English illustrates this. The problem with them is not that they are nominatively case marked, but that they must value the un-valued phi features of T0 and they cannot do this. So, AAE becomes an excellent phi-feature detector and it can be put to use in an EC: if case is a byproduct of phi-feature valuation then we should never find reflexives in (structurally) case marked positions. This is a direct consequence of the AAE. But we do regularly find reflexives in non-nominative positions, hence it must be possible to assign case without first valuing phi-features. Conclusion: case assignment need not piggyback on phi-feature valuation.

Note the role that the AAE plays in this argument. It is a relatively simple and robust principle. Moreover, it is one that we would like to preserve as it explains a really puzzling fact about nominative reflexives: they don't robustly exist! And where we do find them, they don't come from T0s with apparent phi-features, and where we find other case assigning heads that do have unvalued phi-features we don't find reflexives. So, all in all, the AAE looks like a fairly decent generalization and is one that we would like to keep. This makes it an excellent part of a deductive package aimed at testing the idea that case is parasitic on agreement, as we can leverage its retention into a probe of some idea we want to explore. If AAE is correct (main assumption), then if case is parasitic on agreement we shouldn't see reflexives in case positions that require valuing phi features on a nearby head. If case is not parasitic on phi valuation then we will. The experimental verdict is that we do find reflexives in the relevant domains, and the hypothesis that case and phi-feature valuation are two sides of the same coin sinks. A nice tight deductive package. An EC with a very useful result.
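Plugging this into the schema from above (again, my rendering, not the original authors' notation): with $A$ = AAE and $H$ = "case assignment requires phi-valuation,"

\[
A \wedge H \;\Rightarrow\; P\ \text{(no reflexives in structural case positions)}, \qquad \text{but } \neg P, \qquad \therefore\ \neg H
\]

Since $A$ is independently well grounded, the blame falls squarely on $H$.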

Second: Here’s a more controversial EC, but I still think is pretty dispositive. Inverse control provides a critical test for PRO based theories of control. Here’s the deductive package: PRO is an anaphoric dependent of its controller. Anaphoric dependents can never c-command their antecedents as this would violate principle C. Principle C is a very robust characteristic of binding configurations. So, a direct consequence of PRO based accounts of control is the absence of inverse control configurations, configurations in which “PRO” c-commands its antecedent. 

This consequence has been repeatedly tested since Polinsky and Potsdam first mooted the possibility in Tsez, and it appears that inverse control does indeed exist. But regardless of whether you are moved by the data, the logic is completely ECish and unless there is something wrong with the design (which I strongly doubt) it settles the issue of whether Control is a DP-PRO dependency. It cannot be. Inverse control settles the matter. This has the nice consequence that PRO does not exist. Most linguists resist this conclusion but, IMO, that is because they have not fully taken on board the logic of ECs.

Here’s a third and last example: are island effects complexity effects or structural effects? In other words, are island effects the reflections of some generic problem that islands present cognition with or something specific to the structural properties of islands? The former would agree that island effects exist but that they are due to, for example, short term memory overload that the parsing of islands induces. 

The two positions are both coherent and, truth be told, for theoretical reasons, I would rather that the complexity story were the right one. It would just make my life so much easier to be able to say that island effects were not part of my theoretical minimalist remit. I could then ignore them because they are not really reflections of the structure of FL/UG and so I would not have to try and explain them! Boy would that be wonderful! But much as I would love this conclusion, I cannot in good scientific conscience adopt it, for Sprouse and colleagues have done ECs showing that it is very, very likely wrong. I refer you to the Experimental Syntax volume Sprouse and I edited (see here) for discussion and details.

The gist of the argument is that were islands reflexes of things like memory limitations, then we should be able to move island acceptability judgments around by manipulating the short-term memory variable. And we can do this. Humans come with strong vs weak short-term memory capacities. We even have measures of these. Were island effects reflections of such memory capacity, then island effects would differentially affect these two groups. They don't, so it's not. Again the EC comes in a tight little deductive box and the experiment (IMO) decisively settles the matter. Island effects, despite my fondest wishes, really do reflect something about the structural linguistic properties of islands. Damn!
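For concreteness, here is a minimal sketch of the factorial logic behind the Sprouse-style studies (the numbers below are invented; the real work uses large judgment datasets and standardized memory measures). The island effect is measured as a superadditive interaction, a differences-in-differences (DD) score, over a 2x2 design crossing dependency length with island structure:

    # DD score over a 2x2 design: (length: short/long) x (structure: island/non-island).
    def dd_score(ratings):
        """ratings: mean acceptability for the four cells of the design."""
        long_cost_island     = ratings[("short", "island")] - ratings[("long", "island")]
        long_cost_non_island = ratings[("short", "non_island")] - ratings[("long", "non_island")]
        # A DD score well above zero = superadditive penalty = an island effect.
        return long_cost_island - long_cost_non_island

    # Invented group means for high- vs low-working-memory participants.
    high_wm = {("short", "non_island"): 5.9, ("long", "non_island"): 5.4,
               ("short", "island"): 5.8, ("long", "island"): 3.1}
    low_wm  = {("short", "non_island"): 5.7, ("long", "non_island"): 5.1,
               ("short", "island"): 5.6, ("long", "island"): 2.9}

    # If island effects were memory-overload effects, DD should shrink as
    # capacity grows. The finding was that it does not budge.
    print(dd_score(high_wm), dd_score(low_wm))  # ~2.2 vs ~2.1: no capacity effect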

So, we have ECs in linguistics and I would like to see many more. Let me end by saying why.  I have three reasons.

First, it would generate empirical work directly aimed at theoretically interesting issues. The current empirical investigative instrument is the analysis, usually of some construction or paradigm. It starts with an empirical paradigm or construction in some L and it aims at a description and explanation of that paradigm's properties. This is a fine way to proceed and it has served us well. This way of proceeding is particularly apposite when we are theory poor, for it relies on the integrity of the paradigm to get itself going and reaches for the theory in service of a better description and possible explanation. And, as I said, there is nothing wrong with this. However, though it confronts theory, it does so obliquely rather than directly. Or so it looks to me.

To see this, contrast it with the kind of empirical work we see more often in the rest of the sciences. There, empirical work is experimental. Experiments are designed to test the core features of the theory. This requires, first, identifying and refining the key features of the leading ideas, massaging them, explicating them and investigating their empirical consequences. Once that is done, experiments aim to find ways of making these consequences empirically visible. Experiments, in other words, require a lot of logical scaffolding. They are not exploratory but directed towards specific questions, questions generated by the theories they are intended to test. Maybe a slogan would help here: linguistics has lots of exploratory work, some theoretical work, but only a smidgen of experimental work. We could do with some more.

Second, experiments would tighten up the level of argument. I mentioned that ECs come as tight deductive packages. The assumptions, both what is being tested and the ancillary hypotheses, must be specified for an EC to succeed. This is less the case for exploratory work. There we need to string together principles and facts in a serviceable way to cover the empirical domain. This is different from building an airtight box to contain it and prod it and test it. So, I think that a little more experimental thinking would serve to tighten things up.

Third, the main value of ECs is that they eliminate theoretical possibilities and so allow us to more narrowly focus theory construction. For example, if case is not parasitic on agreement, then this suggests different theories of case than ones where they must swing together. Similarly, if PRO does not exist, then theories that rely on PRO are off on the wrong track, no matter how descriptively useful they might be. The role of experiments, in the best of all possible worlds, is to discard attractive but incorrect theory. This is what empirical work is for: to dispose. Now, we do not (and never will) live in the best of all possible scientific worlds. But this does not mean that getting a good bead on the empirical standing of our basic concepts experimentally is not useful.

Let me finish by adding one more thing. Our friends in psycho ling do experiments all the time. Their culture is organized around this procedure. That’s why I have found going to their lab meetings so interesting. I think that theories in Ling are far better grounded and articulated than theories in psycho-ling (that is my personal opinion) but their approach often seems more direct and reasonable. If you have not been in the habit of sitting in on their lab meetings, I would recommend doing so. There is a lot to recommend the logic of experimentation that is part of their regular empirical practice.


[1] Part of the problem with languists' talking about Chomsky's linguistic conception of universals is that they do not appreciate that simply looking at surface forms is unlikely to bear much on the claim being made. Grammars are not directly observable. Languists take this to imply that Chomskyan universals are not testable. But this is not so. They are not trivially testable, which is a whole different matter. Nothing interesting is trivially testable. It requires all sorts of ancillary hypotheses to set the stage for isolating the relevant principle of interest. And this takes lots of work.
[2] This is based on discussions with Omer. Thx.


Wednesday, September 19, 2018

Generative grammar's Chomsky Problem

Martin Haspelmath (MH) and I inhabit different parts of the (small) linguistics universe. Consequently, we tend to value very different kinds of work and look to answer very different kinds of questions. As a result, when our views converge, I find it interesting to pay attention. In what follows I note a point or two of convergence. Here is the relevant text that I will be discussing (henceforth MHT, for MH-text).[1]

MHT’s central claim is that “Chomsky no longer argues for a rich UG of the sort that would be relevant for the ordinary grammarian and, e.g. for syntax textbooks” (1). It extends a similar view to me: “even if he is not as radical about a lean UG as Chomsky 21stcentury writings (where nothing apart from recursion is UG), Hornstein’s view is equally incompatible with current practice in generative grammar” (MHT emphasis, (2)).[2]

Given that neither Chomsky nor I seems to be inspiring current grammatical practice (btw, thx for the company MH), MHT notes that “generative grammarians currently seem to lack an ideological superstructure.” MHT seems to suggest that this is a problem (who wants to be superstructure-less after all?), though it is unclear for whom, other than Chomsky and me (what’s a superstructure anyhow?). MHT adds that Chomsky “does not seem to be relevant to linguistics anymore” (2).

MHT ends with a few remarks about Chomsky on alien (as in extra-terrestrial) language, noting a difference between him and Jessica Coon on this topic. Jessica says the following (2):

 When people talk about universal grammar it’s just the genetic endowment that allows humans to acquire language. There are grammatical properties we could imagine that we just don’t ever find in any human language, so we know what’s specific to humans and our endowment for language. There’s no reason to expect aliens would have the same system. In fact, it would be very surprising if they did. But while having a better understanding of human language wouldn’t necessarily help, hopefully it’d give us tools to know how we might at least approach the problem.

This is a pretty vintage late 1980s bioling view of FL. Chomsky demurs, thinking that perhaps “the Martian language might not be so different from human language after all” (3). Why? Because Chomsky proposes that many features of FL might be grounded in generic computational properties rather than idiosyncratic biological ones. In his words:

We can, in short, try to sharpen the question of what constitutes a principled explanation for properties of language, and turn to one of the most fundamental questions of the biology of language: to what extent does language approximate an optimal solution to conditions that it must satisfy to be usable at all, given extralinguistic structural architecture?

MHT finds this opaque (as do I, actually), though the intent is clear: To the degree that the properties of FL and the Gs it gives rise to are grounded in general computational properties, properties that a system would need to have "to be usable at all," then to that degree there is no reason to think that these properties would be restricted to human language (i.e. there is no reason to think that they would be biologically idiosyncratic).

MHT’s closing remark about this is to reiterate his main point: “Chomsky’s thinking since at least 2002 is not really compatible with the practice of mainstream generative grammar” (3-4).

I agree with this, especially MHT's remark about current linguistic practice. Much of what interests Chomsky (and me) is not currently high up on the GG research agenda. Indeed, I have argued (here) that much of current GG research has bracketed the central questions that originally animated GG research and that this change in interests is what largely lies behind the disappointment many express with the Minimalist Program (MP).

More specifically, I think that though MP has been wildly successful in its own terms and is the natural research direction building on prior results in GG, its central concerns have been of little mainstream interest. If this assessment is correct, it raises a question: why the mainstream disappointment with MP, and why has current GG practice diverged so significantly from Chomsky's? I believe that the main reason is that MP has sharpened the two contradictory impulses that have been part of the GG research program from its earliest days. Since the beginning there has been a tension between those mainly interested in the philological details of languages and those interested in the mental/cognitive/neuro implications of linguistic competence.

We can get a decent bead on the tension by inspecting two standard answers to a simple question: what does linguistics study? The obvious answer is language. The less obvious answer is the capacity for language (aka linguistic competence). Both are fine interests (actually, I am not sure that I believe this, but I want to be concessive (sorry Jerry)). And for quite a while it did not much matter to everyday research in GG which interest guided inquiry, as the standard methods for investigating the core properties of the capacity for language proceeded via a filigree philological analysis of the structures of language. So, for example, one investigated the properties of the construal modules by studying the distribution of reflexives and pronouns in various languages. Or by studying the locality restrictions on question formation (again in particular languages) one could surmise properties of the mentalist format of FL rules and operations. Thus, the way that one studied the specific cognitive capacity a speaker of a particular language L had was by studying the details of the language L, and the way that one studied more general (universal) properties characteristic of FL and UG was by comparing and contrasting constructions and their properties across various Ls. In other words, the basic methods were philological even if the aims were cognitive and mentalistic.[3] And because of this, it was perfectly easy for the work pursued by the philologically inclined to be useful to those pursuing the cognitive questions and vice versa. Linguistic theory provided powerful philological tools for the description of languages and this was a powerful selling point.

This peaceful commensalism ends with MP. Or, to put it more bluntly, MP sharpens the differences between these two pursuits because MP inquiry only makes sense in a mentalistic/cognitive/neuro setting. Let me explain.

Here is a very short history of GG. It starts with two facts: (1) native speakers are linguistically productive and (2) any human can learn any language. (1) implies that natural languages are open ended and thus can only be finitely characterized via recursive rule systems (aka grammars (Gs)). Languages differ in the rules their Gs embody. Given this, the first item on the GG research agenda was to specify the kinds of rules that Gs have and the kinds of dependencies Gs care about. An inventory of such rules then sets up the next stage of inquiry.
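To see why productivity forces recursive rules, here is a toy illustration (the grammar is invented for the example; nothing hangs on its details): a handful of rewrite rules, finitely stated, generate an unbounded set of expressions because NP can reintroduce S.

    import random

    # A finite, recursive toy grammar: NP can contain S, so the output set is unbounded.
    GRAMMAR = {
        "S":  [["NP", "VP"]],
        "NP": [["the", "N"], ["the", "N", "that", "S"]],   # recursion here
        "VP": [["V", "NP"], ["V"]],
        "N":  [["dog"], ["cat"], ["linguist"]],
        "V":  [["saw"], ["chased"], ["slept"]],
    }

    def generate(symbol="S", depth=0):
        options = GRAMMAR[symbol]
        if depth > 3:  # bias against recursion at depth so sampling terminates
            options = [o for o in options if "S" not in o] or options
        words = []
        for piece in random.choice(options):
            words.extend(generate(piece, depth + 1) if piece in GRAMMAR else [piece])
        return words

    print(" ".join(generate()))  # e.g. "the cat that the dog slept chased the linguist"

No finite list of outputs exhausts what these few rules generate, which is just the point about productivity.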

The second stage begins with fact (2). Translated into Gish terms it says that any Language Acquisition Device (aka, child) can acquire any G. We called this meta-capacity to acquire Gs “FL” and we called the fine structure of FL “UG.” The fact that any child can acquire any G despite the relative paucity and poverty of the linguistic input data implies that FL has some internal structure. We study this structure by studying the kinds of rules that Gs can and cannot have. Note that this second project makes little sense until we have candidate G rules. Once we have some, we can ask why the rules we find have the properties they do (e.g. structure dependence, locality, c-command). Not surprisingly then, the investigation of FL/UG and the investigation of language particular Gs naturally went hand in hand and the philological methods beloved of typologists and comparative grammarians led the way. And boy did they lead! GB was the culmination of this line of inquiry. GB provided the first outlines of what a plausible FL/UG might look like, one that had grounding in facts about actual Gs. 

Now, this line of research was, IMO, very successful. By the mid 90s, GG had discovered somewhere in the vicinity of 25-35 non-trivial universals (i.e. design features of FL) that were “roughly” correct (see here for a (partial) list). These “laws of grammar” constitute, IMO, a great intellectual achievement. Moreover, they set the stage for MP in much the way that the earlier discovery of rules of Gs set the stage for GB style theories of FL/UG. Here’s what I mean.

Recall that studying the fine structure of FL/UG makes little sense unless we have candidate Gs and a detailed specification of some of their rules. Similarly, if one's interest is in understanding why our FL has the properties it has, we need some candidate FL properties (UG principles) for study. This is what the laws of grammar provide: candidate principles of FL/UG. Given these we can now ask why we have these kinds of rules/principles and not other conceivable ones. And this is the question that MP sets for itself: why this FL/UG? MP, in short, takes as its explanandum the structure of FL.[4]

Note, if this is indeed the object of study, then MP only makes sense from a cognitive perspective. You won't ask why FL has the properties it has if you are not interested in FL's properties in the first place. So, whereas the minimalist program so construed makes sense in a GG setting of the Chomsky variety, where a mental organ like FL and its products are the targets of inquiry, it is less clear that the project makes much sense if one's interests are largely philological (in fact, it is pretty clear to me that it doesn't). If this is correct, and if it is correct that most linguists have mainly philological interests, then it should be no surprise that most linguists are disappointed with MP inquiry. It does not deliver what they can use, for it is no longer focused on questions analogous to the ones that were prominent before and which had useful spillover effects. The MP focus is on issues decidedly more abstract and removed from immediate linguistic data than heretofore.

There is a second reason that MP will disappoint the philologically inclined. It promotes a different sort of inquiry. Recall that the goal is explaining the properties of FL/UG (i.e. the laws of grammar are the explananda). But this explanatory project requires presupposing that the laws are more or less correct. In other words, MP takes GB as (more or less) right.[5] MP's added value comes in explaining it, not challenging it.

In this regard, MP is to GB what Subjacency Theory is to Ross’s islands. The former takes Ross’s islands as more or less descriptively accurate and tries to derive them on the basis of more natural assumptions. It would be dumb to aim at such a derivation if one took Ross’s description to be basically wrong headed. So too here. Aiming to derive the laws of grammar requires believing that these are basically on the right track. However, this means that so far as MP is concerned, the GBish conception of UG, though not fundamental, is largely empirically accurate. And this means that MP is not an empirical competitor to GB. Rather, it is a theoretical competitor in the way that Subjacency Theory is to Ross’s description of islands. Importantly, empirically speaking, MP does not aim to overthrow (or even substantially revise the content of) earlier theory.[6]

Now this is a problem for many working linguists. First, many don’t have the same sanguine view that I do of GB and the laws it embodies. In fact, I think that many (most?) linguists doubt that we know very much about UG or FL or that the laws of grammar are even remotely correct. If this is right, then the whole MP enterprise will seem premature and wrong headed to them.  Second, even if one takes these as decent approximations to the truth, MP will encourage a kind of work that will be very different from earlier inquiry. Let me explain.

The MP project so conceived will involve two subparts. The first one is to derive the GB principles. If successful, MP will recover the content of GB, which means that we end up empirically where we started. Of course, if you think GB is roughly right, then this is a good place to end up. But the progress will be theoretical, not empirical. It will demonstrate that it is reasonable to think that FL is simpler than GB presents it as being. However, the linguistic data covered will, at least initially, be very much the same. Again, this is a good thing from a theoretical point of view. But if one's interests are philological and empirical, then this will not seem particularly impressive, as it will largely recapitulate GB's empirical findings, albeit in a novel way.

The second MP project will be to differentiate the structure of FL and to delineate those parts that are cognitively general from those that are linguistically proprietary. As you all know, the MP conceit is that linguistic competence relies on only a small cognitive difference between us and our apish cousins. MP expects FL's fundamental operations and principles to be cognitively and computationally generic rather than linguistically specific. When Chomsky denies UG, what he denies is that there is a lot of linguistic specificity to FL (again: he does not deny that the GB-identified principles of UG are indeed characteristic features of FL). Of course, hoping that this is so and showing that it might be/is are two very different things. The MP research agenda is to make good on this. Chomsky's specific idea is that Merge and some reasonable computational principles are all that one needs. I am less sanguine that this is all that one needs, but I believe that a case can be made that this gets one pretty far. At any rate, note that most of this work is theoretical and it is not clear that it makes immediate contact with novel linguistic data (except, of course, in the sense that it derives GB principles/laws that are themselves empirically motivated (though recall that these are presupposed rather than investigated)). And this makes for a different kind of inquiry than the one that linguists typically pursue. It worries about finding natural, more basic principles and showing how these can be deployed to derive the basic features of FL. So, a lot more theoretical deduction and a lot less (at least initially) empirical exploration.

Note, incidentally, that in this context, Chomsky's speculations about Martians and his disagreement with Coon is a fanciful and playful way of making an interesting point. If FL's basic properties derive from the fact that it is a well designed computational system (its main properties follow from generic features of computations), then we should expect other well designed computational systems to have similar properties. That is what Chomsky is speculating might be the case.

So, why is Chomsky (and MP work more generally) out of the mainstream? Because mainstream linguistics is (and has always been, IMO) largely uninterested in the mentalist conception of language that has always motivated Chomsky's view of language. For a long time, the difference in motivations between Chomsky and the rest of the field was of little moment. With MP that has changed. The MP project only makes sense in a mentalist setting, and it invites decidedly non-philological projects without direct implications for further philological inquiry. This means that the two types of linguistics are parting company. That's why many have despaired about MP. It fails to have the crossover appeal that prior syntactic theory had. MHT's survey of the lay of the linguistic land accurately reflects this, IMO.

Is this a bad thing? Not necessarily, intellectually speaking. After all, there are different projects and there is no reason why we all need to be working on the same things, though I would really love it if the field left some room for the kind of theoretical speculation that MP invites.

However, the divergence might be sociologically costly. Linguistics has gained most of its extramural prestige from being part of the cog-neuro sciences. Interestingly, MP has generated interest in that wider world (and here I am thinking cog-neuro and biology). Linguistics as philology is not tethered to these wider concerns. As a result, linguistics in general will, I believe, become less central to general intellectual life than it was in earlier years, when it was at the center of work in the nascent cognitive and cog-neuro sciences. But I could be wrong. At any rate, MHT is right to observe that Chomsky's influence has waned within linguistics proper. I would go further. The idea that linguistics is and ought to be part of the cog-neuro sciences is, I believe, a minority position within the discipline right now. The patron saint of modern linguistics is not Chomsky, but Greenberg. This is why Chomsky has become a more marginal figure (and why MH sounds so delighted). I suspect that down the road there will be a reshuffling of the professional boundaries of the discipline, with some study of language of the Chomsky variety moving in with cog-neuro and some returning to the language departments. The days of the idea of a larger common linguistic enterprise, I believe, are probably over.


[1] I find that this is sometimes hard to open. Here is the url to paste in: https://dlc.hypotheses.org/1269

[2] I should add that I have a syntax textbook that puts paid to the idea that Chomsky's basic current ideas cannot be explicated in one. That said, I assume that what MHT intends is that Chomsky's views are not standard textbook linguistics anymore. I agree with this, as you will see below.
[3] This was and is still the main method of linguistic investigation. FoLers know that I have long argued that PoS style investigations are different in kind from the comparative methods that are the standard and that, when applicable, they allow for a more direct view of the structure of FL. But as I have made this point before, I will avoid making it here. For current purposes, it suffices to observe that whatever the merits of PoS styles of investigation, these methods are less prevalent than the comparative method is.
[4] MHT thinks that Chomsky largely agrees with anti-UG critics in "rejecting universal grammar" (1). This is a bit facile. What Chomsky rejects is the claim that the kinds of principles we have identified as characteristic of UG are linguistically specific. By this he intends that they follow from more general principles. What he does not do, at least this is not what I do, is reject the principles of UG as targets of explanation. The problem with Evans and Levinson and Ibbotson and Tomasello is that their work fails to grapple with what GG has found in 60 years of research. There are a ton of non-trivial Gish facts (laws) that have been discovered. The aim is to explain these facts/laws, and ignoring them or not knowing anything about them is not the same as explaining them. Chomsky "believes" that language has the properties that previous work on UG has characterized. What he is questioning is whether these properties are fundamental or derived. The critics of UG that MHT cites have never addressed this question, so they and Chomsky are engaged in entirely different projects.
Last point: MHT notes that neophytes will be confused about all of this. However, a big part of the confusion comes from people telling them that Chomsky and Evans/Levinson and Ibbotson/Tomasello are engaged in anything like the same project.
[5] Let me repeat for the record that one can do MP and presuppose some conception of FL other than GB. IMO, most of the different "frameworks" make more or less the same claims. I will stick to GB because this is what I know best, and MP indeed has targeted GB conceptions most directly.
[6] Or, more accurately, it aims to preserve most of it, just as General Relativity aimed to preserve most of Newtonian mechanics.

Wednesday, September 12, 2018

The neural autonomy of syntax

Nothing does language like humans do language. This is not a hypothesis. It is a simple fact. Nonetheless, it is often either questioned or only reluctantly conceded. Therefore, I urge you to repeat the first sentence of this post three times before moving forward. It is both true and a truism. 

Let’s go further. The truth of this observation suggests the following non-trivial inference: there is something biologically special about humans that enables them (us) to be linguistically proficient andthis special mental power is linguistically specific. In other words, humans are uniquely cognitively endowed as a matter of biology when it comes to language and this biological gift is tailored to track some specific cognitive feature of language rather than (for example) being (just!) a general increase in (say)generalbrain power. On this view, the traditional GG conception stemming from Chomsky takes FL to be both species specific and domain specific. 

Before proceeding, let me at once note that these are independent specificity theses. I do this because every time I make this point, others insist on warning me that the fact mentioned in the first sentence does not imply the inference I just drew in the second paragraph. Quite right. In fact:

It is logically possible that linguistic competence supervenes on no domain specific capacities but is still species specific in that only humans have (for example) sufficiently powerful general brains to be linguistically proficient. Say, for example, linguistic competence requires at least 500 units of cognitive power (CP) and only human brains can generate this much CP. However, modulo the extra CPs, the mental “programs” the CPs drive are the same as those that (at least some) other cognitive creatures enjoy, they just cannot drive them as fast or as far because of mileage restrictions imposed by low CP brains.

Similarly, it is logically possible that animals other than humans have domain specific linguistic powers. It is conceivable that apes, corvids, platypuses, manatees, and Portuguese water dogs all have brains that include FLs just like ours that are linguistically specific (e.g. syntax focused and not exercised in other cognitive endeavors). Were this so, then both they and we would have brains with specific linguistic sensitivities in virtue of having brains with linguistically bespoke wiring/circuitry or whatever specially tailored brain ware makes FL brains special. Of course, were I one of them I would keep this to myself as humans have the unfortunate tendency of dismembering anything that might yield scientific insight (or just might be tasty). If these other animals actually had an FL I am pretty sure some NIH scientist would be trying to figure out how to slice and dice their brains in order to figure out how its FL ticks.

So, both options are logically possible. But the GG tradition stemming from Chomsky (and this includes yours truly, a fully paid-up member of this tribe) has doubted that these logical options are live, holding instead that when it comes to language only we humans are built for it, and that what makes our cognitive profile special is a set of linguistically specific cognitive functions built into FL and dedicated to linguistic cognition. Or, to put this another way, FL has some special cognitive sauce that allows us to be as linguistically adept as we evidently are, and we alone have minds/brains with this FL.

Nor do the exciting leaps of inference stop here. GG has gone even further out on the empirical limb and suggested that the bespoke property of FL that makes us linguistically special involves an autonomous SYNTAX (i.e. a syntax irreducible to either semantics or phonology and with its own special combinatoric properties). That's right readers, syntax makes the linguistic world go round and only we got it and that's why we are so linguistically special![1] Indeed, if a modern linguistic Ms or Mr Hillel were asked to sum up GG while standing on one foot, s/he could do worse than say: only humans have syntax, all the rest is commentary.

This line of reasoning has been (and still is) considered very contentious. However, I recently ran across a paper by Campbell and Tyler (here, henceforth C&T) that argues for roughly this point (thx to Johan Bolhuis and William Matchin for sending it along). The paper has several interesting features, but perhaps the most intriguing (to me) is that Tyler is one of the authors. If memory serves, when I was growing up, Tyler was one of those who were very skeptical that there was anything cognitively special about language. Happily, it seems that times have changed.

C&T argues that the brain localizes syntactic processing in the left frontotemporal lobe and "makes a strong case for the domain specificity of the frontotemporal syntax system and its autonomy from domain-general networks" (132). So, the paper argues for a neural version of the autonomy of syntax thesis. Let me say a few more words about it.

First, C&T notes that (of course) the syntax dedicated part of the brain regularly interacts with the non-syntactic, domain general parts of the brain. However, the paper rightly notes that this does not argue against the claim that there is an autonomous syntactic system encoded in the brain. It merely means that finding it will be hard, as this independence will often be obscured. More particularly, C&T says the activation of the domain general systems only arises "during task based language comprehension" (133). Tasks include having to make an acceptability judgment. When we focus on pure comprehension, however, without requiring any further "task," we find that "only the left-lateralized frontotemporal syntax system and auditory networks are activated" (133). Thus, the syntax system only links to the domain general ones during "overt task performance" and otherwise activates alone. C&T note that this implies that the syntactic system alone is sufficient for syntactic analysis during language comprehension.

Second, C&T argue that arguments against the neural autonomy of syntax rest on bad definitions of domain specificity. More particularly, according to C&T the benchmarks for autonomy in other studies beg the autonomy question by embedding a “task” in the measure and so “lead to the activation of additional domain-general regions” (133). As C&T notes, when such “tasks” are controlled for, we only find activation in the syntax region.

Third, the relevant notion of syntax is the one GGers know and love. C&T takes syntax to be the prime species-specific feature of the brain and understands syntax in GGish terms to be implicated in "the construction of hierarchical syntactic structures." C&T contrasts hierarchical relations with "adjacency relationships," which it claims "both human and non-human primates are sensitive to" (134). This is pretty much the conventional GG view and C&T endorses it.
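The contrast is easy to make concrete. Here is a toy sketch (the tree and both rules are invented for the example) of the familiar structure-dependence point: an adjacency-based rule for yes/no questions fronts the linearly first auxiliary and fails, while a hierarchical rule targeting the auxiliary of the root clause succeeds.

    # "The man who is tall is happy" with a minimal constituent structure.
    sentence = {
        "cat": "S",
        "children": [
            {"cat": "NP",  "words": ["the", "man", "who", "is", "tall"]},
            {"cat": "Aux", "words": ["is"]},
            {"cat": "AP",  "words": ["happy"]},
        ],
    }

    def flatten(node):
        return node.get("words") or [w for c in node["children"] for w in flatten(c)]

    def linear_rule(node):
        # Adjacency only: front the linearly first auxiliary.
        words = flatten(node)
        i = words.index("is")
        return [words[i]] + words[:i] + words[i + 1:]

    def structural_rule(node):
        # Structure dependent: front the Aux that is a daughter of the root S.
        aux = next(c for c in node["children"] if c["cat"] == "Aux")
        rest = [w for c in node["children"] if c is not aux for w in flatten(c)]
        return aux["words"] + rest

    print(" ".join(linear_rule(sentence)))      # *is the man who tall is happy
    print(" ".join(structural_rule(sentence)))  # is the man who is tall happy

A system sensitive only to adjacency cannot even state the rule that humans actually use; one sensitive to hierarchy can.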

And there is more. C&T endorses the Hauser, Chomsky, Fitch distinction between FLN and FLB. This is not surprising, for once one adopts an autonomy of syntax thesis and appreciates the uniqueness of syntax in human minds/brains, the distinction follows pretty quickly. Let me quote C&T (135):

In this brief overview, we have suggested that it is necessary to take a more nuanced view to differentiating domain-general and domain-specific components involved in language. While syntax seems to meet the criteria for domain-specificity….there are other key components in the wider language system which are domain-general in that they are also involved in a number of cognitive functions which do not involve language.

C&T has one last intriguing feature, at least for a GGer like me. The name 'Chomsky' and the term 'generative grammar' are never mentioned, not even once (shades of Voldemort!). Quite clearly, the set of ideas that the paper explores presupposes the basic correctness of the Chomskyan generative enterprise. C&T argues for a neural autonomy of syntax thesis and, in doing so, it relies on the main contours of the Chomsky/GG conception of FL. Yes, if C&T is correct it adds to this body of thought. But it clearly relies on its main claims and presupposes their essential correctness. A word to this effect would have been nice to see. That said, read the paper. Contrary to the assumptions of many, it argues for a cog-neuro vindication of the Chomsky conception of language. Even if it dares not speak his name.


[1] I suspect that waggle dancing bees and dead reckoning insects also non-verbally advance a cognitive exceptionalism thesis and preen accordingly.

Tuesday, September 4, 2018

Two pictures of the mind (brain)?

Empiricists (E) and Rationalists (R) have two divergent “pictures” of how the mind/brain functions (henceforth, I use ‘mind’ unless brains are the main focus).[1]

For Es, the mind/brain is largely a passive instrument that, when running well, faithfully records the passing environmental scene. Things go awry when the wrong kinds of beliefs intrude between the sensory input and receptive mind to muddy the reception. The best mind is a perfectly receptive mind. Passive is good. Active leads to distortion.[2]

For Rs there is no such thing as a passive mind. What you perceive is actively constructed along dimensions that the mind makes available. Perception is constructed. There is no unvarnished input, as transduction takes place along routes the mind lays out and regulates. More to the point, sensing is an activity guided by mental structure.

All of this is pretty old hat. However, that does not mean that it has been well assimilated into the background wisdom of cog-neuro.  Indeed, from what I can tell, there are large parts of this world (and the closely related Big Data/Deep Mind world) that take the R picture to be contentious and the E picture to be obvious (though as we shall see, this seems to be changing).  I recently ran across several nice pieces that discuss these issues in interesting ways that I would like to bring to your attention. Let me briefly discuss each of them in turn.

The first appeared here (let’s call the post TF (Teppo Felin being the author)) and it amusingly starts by discussing that famous “gorilla” experiment. In case you do not know it, it goes as follows (TF obligingly provides links to Youtube videos that will allow you to be a subject and “see” the gorilla (or not) for yourself). Here is TF’s description (2):

 In the experiment, subjects were asked to watch a short video and to count the basketball passes. The task seemed simple enough. But it was made more difficult by the fact that subjects had to count basketball passes by the team wearing white shirts, while a team wearing black shirts also passed a ball. This created a real distraction.

The experiment came with a twist. While subjects try to count basketball passes, a person dressed in a gorilla suit walks slowly across the screen. The surprising fact is that some 70 per cent of subjects never see the gorilla. When they watch the clip a second time, they are dumbfounded by the fact that they missed something so obvious. The video of the surprising gorilla has been viewed millions of times on YouTube – remarkable for a scientific experiment. Different versions of the gorilla experiment, such as the ‘moonwalking bear,’ have also received significant attention.
Now, it’s hard to argue with the findings of the gorilla experiment itself. It’s a fact that most people who watch the clip miss the gorilla.

The conclusion that is generally drawn (including by heavyweights like Kahneman) is that humans are "blind to the obvious, and blind to our blindness." The important point that TF makes is that this description of the result presupposes that there is available a well defined, mind independent notion of "prominence or obviousness." Or, in my (tendentious) terms, it presupposes an Eish conception of perception and a passive conception of the mind. The problem is that this conception of obviousness is false. As TF correctly notes, "all kinds of things are readily evident in the clip." In fact, I would say that there are likely to be an infinite number of possible things that could be evident in the clip in the right circumstances. As Lila Gleitman once wisely observed, a picture is worth a thousand words and that is precisely the problem. There is no way to specify what is "obvious" in the perception of the clip independent of the mind doing the perceiving. As TF puts it, obviousness only makes sense relativized to perceivers' mental capacities and goals.

Now, 'obviousness' is not a technical cog-neuro term. The scientific term of art is 'salience.' TF's point is that it is quite standardly assumed that salience is an objective property of a stimulus, rather than a mind mediated relation. Here is TF on Kahneman again (3).

Kahneman's focus on obviousness comes directly from his background and scientific training in an area called psychophysics. Psychophysics focuses largely on how environmental stimuli map on to the mind, specifically based on the actual characteristics of stimuli, rather than the characteristics or nature of the mind. From the perspective of psychophysics, obviousness – or as it is called in the literature, 'salience' – derives from the inherent nature or characteristics of the environmental stimuli themselves: such as their size, contrast, movement, colour or surprisingness. In his Nobel Prize lecture in 2002, Kahneman calls these 'natural assessments'. And from this perspective, yes, the gorilla indeed should be obvious to anyone watching the clip.

TF gets one thing askew in this description, IMO: the conception of salience it criticizes is Eish, not psychophysical.[3] True, psychophysics aims to understand how sensation leads to perception, and sensations are tied to the distal stimuli that generate them. But this does not imply that salience is an inherent property of the distal stimulus. The idea that it is, is pure Eism. On this view, minds that "miss" the salient features of a stimulus are minds that are misfiring. But if minds make stimuli salient (rather than simply tracking what is salient), then a mind that misses a gorilla in a video clip when asked to focus on the number of passes being executed by members of a team may be functioning perfectly well (indeed, optimally). For this purpose the gorilla is a distraction, and an efficient mind with the specific count-the-passes mandate in hand might be better placed to accomplish its goal were it to "ignore" the gorilla in the visual scene.[4]

Let me put this another way: if minds are active in perception (i.e. if minds are as Rs have taken them to be) then salience is not a matter of what you are looking at but what you are looking for (this is TF's felicitous distinction). And if this is so, every time you hear some cog-psych person talking about "salience" and attributing to it causal/explanatory powers, you should appreciate that what you are on the receiving end of is Eish propaganda. It's just like when Es press "analogy" into service to explain how minds generalize/induce. There are no scientifically useful notions of either except as relativized to the specific properties of the minds involved. Again, as TF puts it (4):

Rather than passively accounting for or recording everything directly in front of us, humans – and other organisms for that matter – instead actively look for things. The implication (contrary to psychophysics[5]) is that mind-to-world processes drive perception rather than world-to-mind processes.

Yup, sensation and perception are largely mind mediated activities. Once again, Rism is right and Eism is wrong (surprise!).

Now, all of this is probably obvious to you (at least once it is pointed out). But it seems that these points are still considered radical by some. For example, TF rightly observes that this view permeates the Big Data/Deep Learning (BD/DL) hoopla. If perception is simply picking out the objectively salient features of the environment unmediated by distorting preconceptions, then there is every reason to think that being able to assimilate large amounts of input and statistically massage them quickly is the road to cognitive excellence. Deep Minds are built to do just that, and that is the problem (see here for discussion of this issue by "friendly" critics of BD/DL).

But, if Rism is right, then minds are not passive pattern matchers or neutral data absorbers but are active probers of the passing scene looking for information to justify inferences the mind is built to make. And if this is right, and some objective notion of salience cannot be uncritically taken to undergird the notion of relevance, then purely passive minds (i.e. current Deep Minds) won’t be able to separate what is critical from what is not. 

Indeed, this is what lies behind the failure of current AI to get anywhere on unsupervised learning. Learning needs a point of view. Supervised learning provides the necessary perspective in curating the data (i.e. by separating the relevant-to-the-task data (e.g. find the bunny) from the non-relevant-to-the-task data). But absent a curator (that which is necessarily missing from unsupervised learning), the point of view (what is obvious/salient/relevant) must come from the learner (i.e. in this case, the Deep Mind program). So if the goal is to get theories of unsupervised learning, the hard problem is to figure out what minds consider relevant/salient/obvious and to put this into the machine's mind. But, and here is the problem, this is precisely the problem that Eism brackets by taking salience to be an objective feature of the stimulus. Thus, to the degree that BD/DL embraces Eism (IMO, the standard working assumption), to that degree it will fail to address the problem of unsupervised learning (which, I am told, is the problem that everyone (e.g. Hinton) thinks needs solving).[6]
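The point can be made with a deliberately tiny sketch (the items, features, and weightings below are all invented): the same data cluster differently depending on which features the learner treats as salient, so in unsupervised learning the point of view has to come from the learner itself.

    # Three items described by two features. Which two belong together?
    items = {
        "sparrow": {"flies": 1.0, "size": 0.25},
        "bat":     {"flies": 1.0, "size": 0.10},
        "mouse":   {"flies": 0.0, "size": 0.08},
    }

    def nearest(item, candidates, weights):
        # Weighted squared distance: the weights ARE the learner's point of view.
        def dist(a, b):
            return sum(w * (items[a][f] - items[b][f]) ** 2 for f, w in weights.items())
        return min(candidates, key=lambda c: dist(item, c))

    # A learner for whom flying is salient groups the bat with the bird...
    print(nearest("bat", ["sparrow", "mouse"], {"flies": 1.0, "size": 0.0}))  # sparrow
    # ...while one for whom size is salient groups it with the mouse.
    print(nearest("bat", ["sparrow", "mouse"], {"flies": 0.0, "size": 1.0}))  # mouse

Nothing in the data picks between these groupings; the "curation" is done by the weights, i.e. by the learner's built-in point of view.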

TF makes a few other interesting observations, especially as they relate to the political consequences of invidiously comparing human and machine capacities to the detriment of the former. But for present purposes, TF’s utility lies in identifying another way that Eism goes wrong (in addition, for example, to abstracting away from exactly how minds generalize (remember, saying that the mind generalizes via “analogy” is to say nothing at all!)) and makes it harder to think clearly about the relevant issues in cog-neuro.

Sam Epstein develops this same theme in a linguistic context (here (SE)). SE starts by correctly observing that the process of acquiring a particular G relies on two factors: (i) an innate capacity that humans bring to the process and (ii) environmental input (i.e. the PLD). SE further notes that this two-factor model is generally glossed as reflecting the contributions of “nature” (the innate capacity) and “nurture” (the PLD). And herein we find the seeds of a deep Eish misunderstanding of the process, quite analogous to the one TF identified. Let me quote SE (197-198):

[I]t is important to remember—as has been noted before, but perhaps it remains underappreciated—that it is precisely the organism’s biology (nature) that determines what experience, in any domain, can consist of … To clarify, a bee, for example, can perform its waggle dance for me a million times, but that ‘experience’, given my biological endowment, does not allow me to transduce the visual images of such waggling into a mental representation (knowledge) of the distance and direction to a food source. This is precisely what it does mean to a bee witnessing the exact same environmental event/waggle dance. Ultrasonic acoustic disturbances might be experience for my dog, but not for me. Thus, the ‘environment’ in this sense is not in fact the second factor, but rather, nurture is constituted of those aspects of the ill-defined ‘environment’ (which of course irrelevantly includes a K-mart store down the street from my house) that can in principle influence the developmental trajectory of one or more organs of a member of a particular species, given its innate endowment.

In the biolinguistic domain, the logic is no different. The apparent fact that exposure to some finite threshold amount of ‘Tagalog’ acoustic disturbances in contexts (originating from outside the organism, in the ‘environment’) can cause any normal human infant to develop knowledge of ‘Tagalog’ is a property of human infants…. Thus the standard statement that on the one hand, innate properties of the organism and, on the other, the environment, determine organismic development, is profoundly misleading. It suggests that those environmental factors that can influence the development of particular types of organisms are definable, non-biologically—as the behaviorists sought, but of course failed, to define ‘stimulus’ as an organism-external construct. We can’t know what the relevant developmental stimuli are or aren’t, without knowing the properties of the organism.

This is, of course, correct. What counts as input to the language acquisition device (LAD) must be innately specified. Inputs do not come marked as linguistically vs non-linguistically relevant. Further, what the LAD does in acquiring a G is the poster child example of unsupervised learning. And as we noted above, without a supervisor/curator selecting the relevant inputs for the child and organizing them into the appropriate boxes, it’s the structure of the LAD that must be doing the relevant curating for itself. There really is no other alternative.
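A toy rendering of SE's observation (the species, events, and "transducers" below are all invented for illustration): the environment is the same for every organism, but experience is whatever the endowment lets through.

```python
# Invented illustration: "experience" is the organism-filtered image of the
# environment, not the environment itself. The same distal events constitute
# different experience for different innate endowments.

environment = ["waggle_dance", "ultrasonic_whistle", "tagalog_speech", "kmart_sign"]

# What each (made-up) endowment can transduce into usable experience.
transducers = {
    "bee":   {"waggle_dance"},
    "dog":   {"ultrasonic_whistle"},
    "human": {"tagalog_speech", "kmart_sign"},
}

def experience(organism):
    """The 'second factor' is not the environment but what the first
    factor (the innate endowment) makes of it."""
    return [event for event in environment if event in transducers[organism]]

for organism in transducers:
    print(organism, experience(organism))
```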

SE points out an important consequence of this observation for nature vs nurture arguments within linguistics, including Poverty of Stimulus debates.  As SE notes (198): 

… organism external ‘stimuli’ cannot possibly suffice to explain any aspects of the developed adult state of any organism. 

Why? For the simple reason that the relevant PLD “experience” that the LAD exploits is itself a construction of the LAD. The relevant stimulus is the proximal one, and in the linguistic domain (indeed in most cognitively non-trivial domains) the proximal stimulus is only distantly related to the distal one that triggers the relevant transduction. Here is SE once more (199):

…experience is constructed by the organism’s innate properties, and is very different from ‘the environment’ or the behaviorist notion of ‘stimulus’.

As SE notes, all of this was well understood over 300 years ago (SE contains a nice little quote from Descartes). Actually, there was a lively discussion at the start of the “first cognitive revolution” (I think this is Chomsky’s term) that went under the name of the “primary/secondary quality distinction,” which tried to categorize those features of proximal stimuli that reflected objective features of their distal causes and those that did not. This appears to be another place where we have lost clear sight of conceptual ground that our precursors cleared.

SE contains a lot more provocative (IMO, correct) discussion of the implications of the observation that experience is a nature-infested notion. Take a look.

Let me mention one last paper that can be read alongside TF and SE. It is on predictive coding, a current fad, apparently, within the cog-neuro world (here). The basic idea is that the brain makes top-down predictions, based on its internal mental/brain models, about what it should experience, perception amounting to checking these predictions against the “input” and adjusting the mental models to fit it. In other words, perception is cognitively saturated.
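For the mechanically minded, here is a bare-bones caricature of that loop (a simple delta-rule update with invented numbers, not the model from the paper): predict top-down, check the prediction against bottom-up input, and revise the internal model in proportion to the prediction error.

```python
# A minimal sketch of the prediction-error loop described above (invented
# data; not anyone's published model). The "brain" holds an estimate of a
# quantity in the world, predicts each sensory sample top-down, and only
# updates when the bottom-up input disagrees with the prediction.

def perceive(samples, estimate=0.0, learning_rate=0.3):
    """Update an internal estimate by minimizing prediction error."""
    for sample in samples:
        prediction = estimate              # top-down: what the model expects
        error = sample - prediction        # bottom-up input checked against it
        estimate += learning_rate * error  # adjust the model, not the world
        print(f"predicted {prediction:.2f}, saw {sample:.2f}, "
              f"error {error:+.2f}, new estimate {estimate:.2f}")
    return estimate

# Noisy sensory input around a true value of 5.0 (made-up data).
perceive([4.8, 5.3, 4.9, 5.1, 5.0])
```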

This idea seems to be getting a lot of traction of late (a piece in Quanta is often a good indicator that an idea is “hot”). For our purposes, the piece usefully identifies how the new view differs from the one that was previously dominant (7-8):
The view of neuroscience that dominated the 20th century characterized the brain’s function as that of a feature detector: It registers the presence of a stimulus, processes it, and then sends signals to produce a behavioral response. Activity in specific cells reflects the presence or absence of stimuli in the physical world. Some neurons in the visual cortex, for instance, respond to the edges of objects in view; others fire to indicate the objects’ orientation, coloring or shading…
Rather than waiting for sensory information to drive cognition, the brain is always actively constructing hypotheses about how the world works and using them to explain experiences and fill in missing data. That’s why, according to some experts, we might think of perception as “controlled hallucination.”
Note the contrast: perception consists in detecting objective features of the stimulus vs constructing hypotheses about how the world works that are verified against bottom-up “experience.” In other words, a passive feature detector vs an active, hypothesis-constructing-and-testing mind. Or, to be tendentious one more time, an Eish vs an Rish conception of the mental.
One point worth noting. When I was a youngster oh so many decades ago, there was a big fight about whether brain mechanisms are largely bottom-up or top-down computational systems. The answer, of course, is that brains use both kinds of mechanisms. However, the prevalent sentiment in the neuro world was that brains were largely bottom-up systems, with higher levels generalizing over features provided by lower ones. Chomsky’s critique of discovery procedures (see here for discussion) hit at exactly this point, noting that in the linguistic case it was not possible to treat higher levels as simple summaries of the statistical properties of lower ones. Indeed, the flow of information likely went from higher to lower as well. This has a natural interpretation in terms of brain mechanisms involving feedback as well as feedforward loops. Interestingly, this is also what has driven the trend towards predictive coding in the neuro world. It was discovered that the brain has many “top down feedback connections” (7),[7] and this sits oddly with the idea that brains basically sit passively waiting to absorb perceptual inputs. At any rate, there is an affinity between thinking that brains indulge in lots of top-down (feedback) processing and taking brains to be active interpreters of the passing perceptual scene.
That’s it. To repeat the main message, the E vs R conceptions of the mind/brain and how it functions are very different, and importantly so. As the above papers note, it is all too easy to get confused about important matters if the differences between these two views of the mental world are not kept in mind. Or, again to be tendentious: Eism is bad for you! Only a healthy dose of Rism can protect you from walking its fruitless paths. So arm yourself and have a blessed Rish day.

[1]They also have two divergent pictures of how data and theory relate in inquiry, but that is not the topic of today’s sermonette.
[2]I have argued elsewhere (here) that this passivity is what allows Es to have a causal semantic theory. 
[3]Nor, from what I can gather from Kahneman’s Nobel lecture, is he committed to the view that salience is a property of objects. Rather, it is a property of situations a sentient agent finds herself in. The important point for Kahneman is that salience effects are more or less automatic, fast, and unconscious. This is consistent with salience being cognitively guided rather than a transparent reflection of the properties of the object. So, though TF’s point is useful, I suspect that he did not get Kahneman quite right. Happily, none of that matters here.
[4]A perhaps pointless quibble: the fact that people cannot report seeing a gorilla does not mean that they did not perceive one. The perceptual (and even cognitive) apparatus might indeed have registered a gorilla without it being the case that viewers can access this information consciously. Think of being asked about the syntax of a sentence after hearing it and decoding its message. This is very hard to retrieve (it is below consciousness most of the time), but that does not mean that the syntax is not being computed. At any rate, none of this bears on the central issues, but it was a quibble that I wanted to register.
[5]NH: again, I would replace ‘psychophysics’ with ‘Eism.’
[6]As TF notes, this is actually a very old problem within AI. It is the “frame problem.” It was understood to be very knotty and nobody had any idea how to solve it in the general case. But, as TF noted, it has been forgotten “amid the present euphoria with large-scale information- and data-processing” (6).
            Moreover, it is a very hard problem. It is relatively easy to identify salient features given a context. Getting a theory of salience, in contrast (i.e. a specification of the determinants of salience across contexts), is very hard. As Kahneman notes in his Nobel Lecture (456), it is unlikely that we will have one of these anytime soon. Interestingly, early on Descartes identified the capacity of humans to appropriately respond to what’s around them as an example of stimulus-free (i.e. free and creative) behavior. We do not know more about this now than Descartes did in the 17th century, a correct point that Chomsky likes to make.
[7]If recollection serves (but remember, I am old and on the verge of dementia), the connections from higher to lower brain levels are upwards of five times as numerous as those from lower to upper. It seems that the brain is really eager to involve higher-level “expectations” in the process of analyzing incoming sensations/perceptions.