Wednesday, September 19, 2018

Generative grammar's Chomsky Problem

Martin Haspelmath (MH) and I inhabit different parts of the (small) linguistics universe. Consequently, we tend to value very different kinds of work and look to answer very different kinds of questions. As a result, when our views converge, I find it interesting to pay attention. In what follows I note a point or two of convergence. Here is the relevant text that I will be discussing (henceforth MHT (for MHtext)).[1]

MHT’s central claim is that “Chomsky no longer argues for a rich UG of the sort that would be relevant for the ordinary grammarian and, e.g. for syntax textbooks” (1). It extends a similar view to me: “even if he is not as radical about a lean UG as Chomsky 21st century writings (where nothing apart from recursion is UG), Hornstein’s view is equally incompatible with current practice in generative grammar” (MHT emphasis, (2)).[2]

Given that neither Chomsky nor I seems to be inspiring current grammatical practice (btw, thx for the company MH), MHT notes that “generative grammarians currently seem to lack an ideological superstructure.” MHT seems to suggest that this is a problem (who wants to be superstructure-less after all?), though it is unclear for whom, other than Chomsky and me (what’s a superstructure anyhow?). MHT adds that Chomsky “does not seem to be relevant to linguistics anymore” (2).

MHT ends with a few remarks about Chomsky on alien (as in extra-terrestrial) language, noting a difference between him and Jessica Coon on this topic. Jessica says the following (2):

 When people talk about universal grammar it’s just the genetic endowment that allows humans to acquire language. There are grammatical properties we could imagine that we just don’t ever find in any human language, so we know what’s specific to humans and our endowment for language. There’s no reason to expect aliens would have the same system. In fact, it would be very surprising if they did. But while having a better understanding of human language wouldn’t necessarily help, hopefully it’d give us tools to know how we might at least approach the problem.

This is a pretty vintage late 1980s bioling view of FL. Chomsky demurs, thinking that perhaps “the Martian language might not be so different from human language after all” (3). Why? Because Chomsky proposes that many features of FL might be grounded in generic computational properties rather than idiosyncratic biological ones. In his words:

We can, in short, try to sharpen the question of what constitutes a principled explanation for properties of language, and turn to one of the most fundamental questions of the biology of language: to what extent does language approximate an optimal solution to conditions that it must satisfy to be usable at all, given extralinguistic structural architecture?

MHT finds this opaque (as do I actually) though the intent is clear: To the degree that the properties of FL and the Gs it gives rise to are grounded in general computational properties, properties that a system would need to have “to be usable at all” then to that degree there is no reason to think that these properties would be restricted to human language (i.e. there is no reason to think that they would be biologically idiosyncratic). 

MHT’s closing remark about this is to reiterate his main point: “Chomsky’s thinking since at least 2002 is not really compatible with the practice of mainstream generative grammar” (3-4).

I agree with this, especially MHT's remark about current linguistic practice. Much of what interests Chomsky (and me) is not currently high up on the GG research agenda. Indeed, I have argued (here) that much of current GG research has bracketed the central questions that originally animated GG research and that this change in interests is what largely lies behind the disappointment many express with the Minimalist Program (MP).

More specifically, I think that though MP has been wildly successful in its own terms and that it is the natural research direction building on prior results in GG, its central concerns have been of little mainstream interest. If this assessment is correct, it raises a question: why the mainstream disappointment with MP and why has current GG practice diverged so significantly from Chomsky’s? I believe that the main reason is that MP has sharpened the two contradictory impulses that have been part of the GG research program from its earliest days. Since the beginning there has been a tension between those mainly interested in the philological details of languages and those interested in the mental/cognitive/neuro implications of linguistic competence.

We can get a decent bead on the tension by inspecting two standard answers to a simple question: what does linguistics study? The obvious answer is language. The less obvious answer is the capacity for language (aka, linguistic competence). Both are fine interests (actually, I am not sure that I believe this, but I want to be concessive (sorry Jerry)). And for quite a while it did not much matter to everyday research in GG which interest guided inquiry as the standard methods for investigating the core properties of the capacity for language proceeded via a filigree philological analysis of the structures of language. So, for example, one investigated the properties of the construal modules by studying the distribution of reflexives and pronouns in various languages. Or by studying the locality restrictions on question formation (again in particular languages) one could surmise properties of the mentalist format of FL rules and operations. Thus, the way that one studied the specific cognitive capacity a speaker of a particular language L had was by studying the details of the language L and the way that one studied more general (universal) properties characteristic of FL and UG was by comparing and contrasting constructions and their properties across various Ls. In other words, the basic methods were philological even if the aims were cognitive and mentalistic.[3] And because of this, it was perfectly easy for the work pursued by the philologically inclined to be useful to those pursuing the cognitive questions and vice versa. Linguistic theory provided powerful philological tools for the description of languages and this was a powerful selling point.

This peaceful commensalism ends with MP. Or, to put it more bluntly, MP sharpens the differences between these two pursuits because MP inquiry only makes sense in a mentalistic/cognitive/neuro setting. Let me explain.

Here is a very short history of GG. It starts with two facts: (1) native speakers are linguistically productive and (2) any human can learn any language. (1) implies that natural languages are open ended and thus can only be finitely characterized via recursive rule systems (aka grammars (Gs)). Languages differ in the rules their Gs embody. Given this, the first item on the GG research agenda was to specify the kinds of rules that Gs have and the kinds of dependencies Gs care about. An inventory of such rules sets up the next stage of inquiry.
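For readers who like to see the point about finite recursive rule systems made concrete, here is a minimal sketch in Python. The two-rule "grammar" and the function name `sentence` are made up for illustration; nothing here is a claim about the actual rules of English or of any G. It only shows how a finite rule set, one of whose rules re-invokes S, characterizes an unbounded set of distinct sentences.

```python
# Toy illustration only: a hypothetical two-rule grammar, not an analysis of English.
#   S -> NP "left"
#   S -> NP "thinks" S      (the recursive rule: an S can contain another S)
#   NP -> "Kim" | "Lee"     (alternated by depth just to keep the output readable)

def sentence(depth: int) -> str:
    """Return the sentence with `depth` levels of clausal embedding."""
    subject = "Kim" if depth % 2 == 0 else "Lee"
    if depth == 0:
        # Base rule: S -> NP "left"
        return f"{subject} left"
    # Recursive rule: S -> NP "thinks" S
    return f"{subject} thinks {sentence(depth - 1)}"

if __name__ == "__main__":
    for d in range(3):
        print(sentence(d))
```

Each extra level of embedding yields a new, longer sentence, so the two rules never run out of outputs even though the rule system itself is finite.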

The second stage begins with fact (2). Translated into Gish terms it says that any Language Acquisition Device (aka, child) can acquire any G. We called this meta-capacity to acquire Gs “FL” and we called the fine structure of FL “UG.” The fact that any child can acquire any G despite the relative paucity and poverty of the linguistic input data implies that FL has some internal structure. We study this structure by studying the kinds of rules that Gs can and cannot have. Note that this second project makes little sense until we have candidate G rules. Once we have some, we can ask why the rules we find have the properties they do (e.g. structure dependence, locality, c-command). Not surprisingly then, the investigation of FL/UG and the investigation of language particular Gs naturally went hand in hand and the philological methods beloved of typologists and comparative grammarians led the way. And boy did they lead! GB was the culmination of this line of inquiry. GB provided the first outlines of what a plausible FL/UG might look like, one that had grounding in facts about actual Gs. 

Now, this line of research was, IMO, very successful. By the mid 90s, GG had discovered somewhere in the vicinity of 25-35 non-trivial universals (i.e. design features of FL) that were “roughly” correct (see here for a (partial) list). These “laws of grammar” constitute, IMO, a great intellectual achievement. Moreover, they set the stage for MP in much the way that the earlier discovery of rules of Gs set the stage for GB style theories of FL/UG. Here’s what I mean.

Recall that studying the fine structure of FL/UG makes little sense unless we have candidate Gs and a detailed specification of some of their rules. Similarly, if one’s interest is in understanding why our FL has the properties it has, we need some candidate FL properties (UG principles) for study. This is what the laws of grammar provide: candidate principles of FL/UG. Given these we can now ask why we have these kinds of rules/principles and not other conceivable ones. And this is the question that MP sets for itself: why this FL/UG? MP, in short, takes as its explanandum the structure of FL.[4]

Note, if this is indeed the object of study, then MP only makes sense from a cognitive perspective. You won’t ask why FL has the properties it has if you are not interested in FL’s properties in the first place. So, whereas the minimalist program so construed makes sense in a GG setting of the Chomsky variety where a mental organ like FL and its products are the targets of inquiry, it is less clear that the project makes much sense if one’s interests are largely philological (in fact, it is pretty clear to me that it doesn’t). If this is correct and if it is correct that most linguists have mainly philological interests then it should be no surprise that most linguists are disappointed with MP inquiry. It does not deliver what they can use for it is no longer focused on questions analogous to the ones that were prominent before and which had useful spillover effects. The MP focus is on issues decidedly more abstract and removed from immediate linguistic data than heretofore.

There is a second reason that MP will disappoint the philologically inclined. It promotes a different sort of inquiry. Recall that the goal is explaining the properties of FL/UG (i.e. the laws of grammar are the explananda). But this explanatory project requires presupposing that the laws are more or less correct. In other words, MP takes GB as (more or less) right.[5] MP's added value comes in explaining it, not challenging it.

In this regard, MP is to GB what Subjacency Theory is to Ross’s islands. The former takes Ross’s islands as more or less descriptively accurate and tries to derive them on the basis of more natural assumptions. It would be dumb to aim at such a derivation if one took Ross’s description to be basically wrong headed. So too here. Aiming to derive the laws of grammar requires believing that these are basically on the right track. However, this means that so far as MP is concerned, the GBish conception of UG, though not fundamental, is largely empirically accurate. And this means that MP is not an empirical competitor to GB. Rather, it is a theoretical competitor in the way that Subjacency Theory is to Ross’s description of islands. Importantly, empirically speaking, MP does not aim to overthrow (or even substantially revise the content of) earlier theory.[6]

Now this is a problem for many working linguists. First, many don’t have the same sanguine view that I do of GB and the laws it embodies. In fact, I think that many (most?) linguists doubt that we know very much about UG or FL or that the laws of grammar are even remotely correct. If this is right, then the whole MP enterprise will seem premature and wrong headed to them.  Second, even if one takes these as decent approximations to the truth, MP will encourage a kind of work that will be very different from earlier inquiry. Let me explain.

The MP project so conceived will involve two subparts. The first one is to derive the GB principles. If successful, this will mean that we end up empirically where we started. If successful, MP will recover the content of GB. Of course, if you think GB is roughly right, then this is a good place to end up. But the progress will be theoretical not empirical. It will demonstrate that it is reasonable to think that FL is simpler than GB presents it as being. However, the linguistic data covered will, at least initially, be very much the same. Again, this is a good thing from a theoretical point of view. But if one’s interests are philological and empirical, then this will not seem particularly impressive as it will largely recapitulate GB's empirical findings, albeit in a novel way.

The second MP project will be to differentiate the structure of FL and to delineate those parts that are cognitively general from those that are linguistically proprietary. As you all know, the MP conceit is that linguistic competence relies on only a small cognitive difference between us and our apish cousins. MP expects FL’s fundamental operations and principles to be cognitively and computationally generic rather than linguistically specific. When Chomsky denies UG, what he denies is that there is a lot of linguistic specificity to FL (again: he does not deny that the GB identified principles of UG are indeed characteristic features of FL). Of course, hoping that this is so and showing that it might be/is are two very different things. The MP research agenda is to make good on this. Chomsky’s specific idea is that Merge and some reasonable computational principles are all that one needs. I am less sanguine that this is all that one needs, but I believe that a case can be made that this gets one pretty far. At any rate, note that most of this work is theoretical and it is not clear that it makes immediate contact with novel linguistic data (except, of course, in the sense that it derives GB principles/laws that are themselves empirically motivated (though recall that these are presupposed rather than investigated)). And this makes for a different kind of inquiry than the one that linguists typically pursue. It worries about finding natural more basic principles and showing how these can be deployed to derive the basic features of FL. So a lot more theoretical deduction and a lot less (at least initially) empirical exploration.

Note, incidentally, that in this context, Chomsky’s speculations about Martians and his disagreement with Coon are a fanciful and playful way of making an interesting point. If FL’s basic properties derive from the fact that it is a well designed computational system (its main properties follow from generic features of computations), then we should expect other well designed computational systems to have similar properties. That is what Chomsky is speculating might be the case.

So, why is Chomsky (and MP work more generally) out of the mainstream? Because mainstream linguistics is (and has always been IMO) largely uninterested in the mentalist conception of language that has always motivated Chomsky’s view of language. For a long time, the difference in motivations between Chomsky and the rest of the field was of little moment. With MP that has changed. The MP project only makes sense in a mentalist setting and invites decidedly non-philological projects without direct implications for further philological inquiry. This means that the two types of linguistics are parting company. That’s why many have despaired about MP. It fails to have the crossover appeal that prior syntactic theory had. MHT's survey of the lay of the linguistic land accurately reflects this IMO.

Is this a bad thing? Not necessarily, intellectually speaking. After all, there are different projects and there is no reason why we all need to be working on the same things, though I would really love it if the field left some room for the kind of theoretical speculation that MP invites.

However, the divergence might be sociologically costly. Linguistics has gained most of its extra mural prestige from being part of the cog-neuro sciences. Interestingly, MP has generated interest in that wider world (and here I am thinking cog-neuro and biology). Linguistics as philology is not tethered to these wider concerns. As a result, linguistics in general will, I believe, become less at the center of general intellectual life than it was in earlier years when it was at the center of work in the nascent cognitive and cog-neuro sciences. But I could be wrong. At any rate, MHT is right to observe that Chomsky’s influence has waned within linguistics proper. I would go further. The idea that linguistics is and ought to be part of the cog-neuro sciences is, I believe, a minority position within the discipline right now. The patron saint of modern linguistics is not Chomsky, but Greenberg. This is why Chomsky has become a more marginal figure (and why MH sounds so delighted). I suspect that down the road there will be a reshuffling of the professional boundaries of the discipline, with some study of language of the Chomsky variety moving in with cog-neuro and some returning to the language departments. The days of the idea of a larger common linguistic enterprise, I believe, are probably over.

[1]I find that this is sometimes hard to open. Here is the url to paste in: 

[2]I should add that I have a syntax textbook that puts paid to the idea that Chomsky’s basic current ideas cannot be explicated in one. That said, I assume that what MHT intends is that Chomsky’s views are not standard text book linguistics anymore. I agree with this, as you will see below.
[3]This was and is still the main method of linguistic investigation. FoLers know that I have long argued that PoS style investigations are different in kind from the comparative methods that are the standard and that when applicable they allow for a more direct view of the structure of FL. But as I have made this point before, I will avoid making it here. For current purposes, it suffices to observe that whatever the merits of PoS styles of investigation, these methods are less prevalent than the comparative method is.
[4]MHT thinks that Chomsky largely agrees with anti UG critics in “rejecting universal grammar” (1). This is a bit facile. What Chomsky rejects is that the kinds of principles we have identified as characteristic of UG are linguistically specific. By this he intends that they follow from more general principles. What he does not do, at least this is not what I do, is reject the principles of UG as targets of explanation. The problem with Evans and Levinson and Ibbotson and Tomasello is that their work fails to grapple with what GG has found in 60 years of research. There are a ton of non-trivial Gish facts (laws) that have been discovered. The aim is to explain these facts/laws and ignoring them or not knowing anything about them is not the same as explaining them. Chomsky “believes” that language has properties that previous work on UG has characterized. What he is questioning is whether these properties are fundamental or derived. The critics of UG that MHT cites have never addressed this question so they and Chomsky are engaged in entirely different projects.
            Last point: MHT notes that neophytes will be confused about all of this. However, a big part of the confusion comes from people telling them that Chomsky and Evans/Levinson and Ibbotson/Tomasello are engaged in anything like the same project.
[5]Let me repeat for the record, that one can do MP and presuppose some conception of FL other than GB. IMO, most of the different “frameworks” make more or less the same claims. I will stick to GB because this is what I know best and MP indeed has targeted GB conceptions most directly.
[6]Or, more accurately, it aims to preserve most of it, just as General Relativity aimed to preserve most of Newtonian mechanics.

Wednesday, September 12, 2018

The neural autonomy of syntax

Nothing does language like humans do language. This is not a hypothesis. It is a simple fact. Nonetheless, it is often either questioned or only reluctantly conceded. Therefore, I urge you to repeat the first sentence of this post three times before moving forward. It is both true and a truism. 

Let’s go further. The truth of this observation suggests the following non-trivial inference: there is something biologically special about humans that enables them (us) to be linguistically proficient and this special mental power is linguistically specific. In other words, humans are uniquely cognitively endowed as a matter of biology when it comes to language and this biological gift is tailored to track some specific cognitive feature of language rather than (for example) being (just!) a general increase in (say) general brain power. On this view, the traditional GG conception stemming from Chomsky takes FL to be both species specific and domain specific.

Before proceeding, let me at once note that these are independent specificity theses. I do this because every time I make this point, others insist on warning me that the fact mentioned in the first sentence does not imply the inference I just drew in the second paragraph. Quite right. In fact:

It is logically possible that linguistic competence supervenes on no domain specific capacities but is still species specific in that only humans have (for example) sufficiently powerful general brains to be linguistically proficient. Say, for example, linguistic competence requires at least 500 units of cognitive power (CP) and only human brains can generate this much CP. However, modulo the extra CPs, the mental “programs” the CPs drive are the same as those that (at least some) other cognitive creatures enjoy, they just cannot drive them as fast or as far because of mileage restrictions imposed by low CP brains.

Similarly, it is logically possible that animals other than humans have domain specific linguistic powers. It is conceivable that apes, corvids, platypuses, manatees, and Portuguese water dogs all have brains that include FLs just like ours that are linguistically specific (e.g. syntax focused and not exercised in other cognitive endeavors). Were this so, then both they and we would have brains with specific linguistic sensitivities in virtue of having brains with linguistically bespoke wiring/circuitry or whatever specially tailored brain ware makes FL brains special. Of course, were I one of them I would keep this to myself as humans have the unfortunate tendency of dismembering anything that might yield scientific insight (or just might be tasty). If these other animals actually had an FL I am pretty sure some NIH scientist would be trying to figure out how to slice and dice their brains in order to figure out how its FL ticks.

So, both options are logically possible, but the GG tradition stemming from Chomsky (and this includes yours truly, a fully paid up member of this tribe) has doubted that these logical options are live. It holds instead that when it comes to language only we humans are built for it, and that what makes our cognitive profile special is a set of linguistically specific cognitive functions built into FL and dedicated to linguistic cognition. Or, to put this another way, FL has some special cognitive sauce that allows us to be as linguistically adept as we evidently are and we alone have minds/brains with this FL.

Nor do the exciting leaps of inference stop here. GG has gone even further out on the empirical limb and suggested that the bespoke property of FL that makes us linguistically special involves an autonomous SYNTAX (i.e. a syntax irreducible to either semantics or phonology and with its own special combinatoric properties). That’s right readers, syntax makes the linguistic world go round and only we got it and that’s why we are so linguistically special![1] Indeed, if a modern linguistic Ms or Mr Hillel were asked to sum up GG while standing on one foot s/he could do worse than say, only humans have syntax, all the rest is commentary.

This line of reasoning has been (and still is) considered very contentious. However, I recently ran across a paper by Campbell and Tyler (here, henceforth C&T) that argues for roughly this point (thx to Johan Bolhuis and William Matchin for sending it along). The paper has several interesting features, but perhaps the most intriguing (to me) is that Tyler is one of the authors. If memory serves, when I was growing up, Tyler was one of those who were very skeptical that there was anything cognitively special about language. Happily, it seems that times have changed.

C&T argues that the brain localizes syntactic processing in the left frontotemporal lobe and “makes a strong case for the domain specificity of the frontotemporal syntax system and its autonomy from domain-general networks” (132). So, the paper argues for a neural version of the autonomy of syntax thesis. Let me say a few more words about it.

First, C&T notes that (of course) the syntax dedicated part of the brain regularly interacts with the non-syntactic domain general parts of the brain. However, the paper rightly notes that this does not argue against the claim that there is an autonomous syntactic system encoded in the brain. It merely means that finding it will be hard as this independence will often be obscured. More particularly, C&T says the activation of the domain general systems only arises “during task based language comprehension” (133). Tasks include having to make an acceptability judgment. When we focus on pure comprehension, however, without requiring any further “task” we find that “only the left-lateralized frontotemporal syntax system and auditory networks are activated” (133). Thus, the syntax system only links to the domain general ones during “overt task performance” and otherwise activates alone. C&T notes that this implies that the syntactic system alone is sufficient for syntactic analysis during language comprehension.

Second, C&T argue that arguments against the neural autonomy of syntax rest on bad definitions of domain specificity. More particularly, according to C&T the benchmarks for autonomy in other studies beg the autonomy question by embedding a “task” in the measure and so “lead to the activation of additional domain-general regions” (133). As C&T notes, when such “tasks” are controlled for, we only find activation in the syntax region.

Third, the relevant notion of syntax is the one GGers know and love. For C&T takes syntax to be the prime species specific feature of the brain and understands syntax in GGish terms to be implicated in “the construction of hierarchical syntactic structures.” C&T contrasts hierarchical relations with “adjacency relationships” which it claims “both human and non-human primates are sensitive to” (134). This is pretty much the conventional GG view and C&T endorses it.

And there is more. C&T endorses the Hauser, Chomsky, Fitch distinction between FLN and FLB. This is not surprising for once one adopts an autonomy of syntax thesis and appreciates the uniqueness of syntax in human minds/brains the distinction follows pretty quickly. Let me quote C&T (135):

In this brief overview, we have suggested that it is necessary to take a more nuanced view to differentiating domain-general and domain-specific components involved in language. While syntax seems to meet the criteria for domain-specificity….there are other key components in the wider language system which are domain-general in that they are also involved in a number of cognitive functions which do not involve language.

C&T has one last intriguing feature, at least for a GGer like me. The name ‘Chomsky’ or the terms ‘generative grammar’ are never mentioned, not even once (shades of Voldemort!). Quite clearly, the set of ideas that the paper explores presupposes the basic correctness of the Chomskyan generative enterprise. C&T argues for a neural autonomy of syntax thesis and, in doing so, it relies on the main contours of the Chomsky/GG conception of FL. Yes, if C&T is correct it adds to this body of thought. But it clearly relies on its main claims and presupposes their essential correctness. A word to this effect would have been nice to see. That said, read the paper. Contrary to the assumptions of many, it argues for a cog-neuro version of the Chomsky conception of language. Even if it dares not speak his name.

[1]I suspect that waggle dancing bees and dead reckoning insects also non verbally advance a cognitive exceptionalism thesis and preen accordingly.

Tuesday, September 4, 2018

Two pictures of the mind?(brain)

Empiricists (E) and Rationalists (R) have two divergent “pictures” of how the mind/brain functions (henceforth, I use ‘mind’ unless brains are the main focus).[1]

For Es, the mind/brain is largely a passive instrument that, when running well, faithfully records the passing environmental scene. Things go awry when the wrong kinds of beliefs intrude between the sensory input and receptive mind to muddy the reception. The best mind is a perfectly receptive mind. Passive is good. Active leads to distortion.[2]

For Rs there is no such thing as a passive mind. What you perceive is actively constructed along dimensions that the mind makes available. Perception is constructed. There is no unvarnished input, as transduction takes place along routes the mind lays out and regulates. More to the point, sensing is an activity guided by mental structure.

All of this is pretty old hat. However, that does not mean that it has been well assimilated into the background wisdom of cog-neuro.  Indeed, from what I can tell, there are large parts of this world (and the closely related Big Data/Deep Mind world) that take the R picture to be contentious and the E picture to be obvious (though as we shall see, this seems to be changing).  I recently ran across several nice pieces that discuss these issues in interesting ways that I would like to bring to your attention. Let me briefly discuss each of them in turn.

The first appeared here (let’s call the post TF (Teppo Felin being the author)) and it amusingly starts by discussing that famous “gorilla” experiment. In case you do not know it, it goes as follows (TF obligingly provides links to Youtube videos that will allow you to be a subject and “see” the gorilla (or not) for yourself). Here is TF’s description (2):

 In the experiment, subjects were asked to watch a short video and to count the basketball passes. The task seemed simple enough. But it was made more difficult by the fact that subjects had to count basketball passes by the team wearing white shirts, while a team wearing black shirts also passed a ball. This created a real distraction.

The experiment came with a twist. While subjects try to count basketball passes, a person dressed in a gorilla suit walks slowly across the screen. The surprising fact is that some 70 per cent of subjects never see the gorilla. When they watch the clip a second time, they are dumbfounded by the fact that they missed something so obvious. The video of the surprising gorilla has been viewed millions of times on YouTube – remarkable for a scientific experiment. Different versions of the gorilla experiment, such as the ‘moonwalking bear,’ have also received significant attention.
Now, it’s hard to argue with the findings of the gorilla experiment itself. It’s a fact that most people who watch the clip miss the gorilla.
The conclusion that is generally drawn (including by heavyweights like Kahneman) is that humans are “‘blind to the obvious, and blind to our blindness.’” The important point that TF makes is that this description of the result presupposes that there is available a well-defined, mind-independent notion of “prominence or obviousness.” Or, in my (tendentious) terms, it presupposes an Eish conception of perception and a passive conception of the mind. The problem is that this conception of obviousness is false. As TF correctly notes, “all kinds of things are readily evident in the clip.” In fact, I would say that there are likely to be an infinite number of possible things that could be evident in the clip in the right circumstances. As Lila Gleitman once wisely observed, a picture is worth a thousand words and that is precisely the problem. There is no way to specify what is “obvious” in the perception of the clip independent of the mind doing the perceiving. As TF puts it, obviousness only makes sense relativized to perceivers’ mental capacities and goals.
Now, ‘obviousness’ is not a technical cog-neuro term. The scientific term of art is ‘salience.’ TF’s point is that it is quite standardly assumed that salience is an objective property of a stimulus, rather than a mind-mediated relation. Here is TF on Kahneman again (3):
Kahneman’s focus on obviousness comes directly from his background and scientific training in an area called psychophysics. Psychophysics focuses largely on how environmental stimuli map on to the mind, specifically based on the actual characteristics of stimuli, rather than the characteristics or nature of the mind. From the perspective of psychophysics, obviousness – or as it is called in the literature, ‘salience’ – derives from the inherent nature or characteristics of the environmental stimuli themselves: such as their size, contrast, movement, colour or surprisingness. In his Nobel Prize lecture in 2002, Kahneman calls these ‘natural assessments’. And from this perspective, yes, the gorilla indeed should be obvious to anyone watching the clip. 
TF gets one thing askew in this description IMO: the conception of salience it criticizes is Eish, not psychophysical.[3] True, psychophysics aims to understand how sensation leads to perception and sensations are tied to the distal stimuli that generate them. But this does not imply that salience is an inherent property of the distal stimulus. The idea that it is, is pure Eism. On this view, minds that “miss” the salient features of a stimulus are minds that are misfiring. But if minds make stimuli salient (rather than simply tracking what is salient), then a mind that misses a gorilla in a video clip when asked to focus on the number of passes being executed by members of a team may be functioning perfectly well (indeed, optimally). For this purpose the gorilla is a distraction and an efficient mind with the specific count-the-passes mandate in hand might be better placed to accomplish its goal were it to “ignore” the gorilla in the visual scene.[4]
Let me put this another way: if minds are active in perception (i.e. if minds are as Rs have taken them to be) then salience is not a matter of what you are looking at but what you are looking for (this is TF’s felicitous distinction). And if this is so, every time you hear some cog-psych person talking about “salience” and attributing to it causal/explanatory powers, you should appreciate that what you are on the receiving end of is Eish propaganda. It’s just like when Es press “analogy” into service to explain how minds generalize/induce. There is no scientifically useful notion of either available except as relativized to the specific properties of the minds involved. Again as TF puts it (4):
Rather than passively accounting for or recording everything directly in front of us, humans – and other organisms for that matter – instead actively look for things. The implication (contrary to psychophysics[5]) is that mind-to-world processes drive perception rather than world-to-mind processes.

Yup, sensation and perception are largely mind mediated activities. Once again, Rism is right and Eism is wrong (surprise!).

Now, all of this is probably obvious to you (at least once it is pointed out). But it seems that these points are still considered radical by some. For example, TF rightly observes that the opposing Eish view permeates the Big Data/Deep Learning (BD/DL) hoopla. If perception is simply picking out the objectively salient features of the environment unmediated by distorting preconceptions, then there is every reason to think that being able to quickly assimilate large amounts of input and statistically massage them is the road to cognitive excellence. Deep Minds are built to do just that, and that is the problem (see here for discussion of this issue by “friendly” critics of BD/DL).

But, if Rism is right, then minds are not passive pattern matchers or neutral data absorbers but are active probers of the passing scene looking for information to justify inferences the mind is built to make. And if this is right, and some objective notion of salience cannot be uncritically taken to undergird the notion of relevance, then purely passive minds (i.e. current Deep Minds) won’t be able to separate what is critical from what is not. 

Indeed, this is what lies behind the failure of current AI to get anywhere on unsupervised learning. Learning needs a point of view. Supervised learning provides the necessary perspective in curating the data (i.e. by separating out the relevant-to-the-task data (e.g. find the bunny) from the non-relevant-to-the-task data). But absent a curator (that which is necessarily missing from unsupervised learning), the point of view (what is obvious/salient/relevant) must come from the learner (i.e. in this case, the Deep Mind program). So if the goal is to get theories of unsupervised learning, the hard problem is to figure out what minds consider relevant/salient/obvious and to put this into the machine’s mind. But, and here is the problem, this is precisely the problem that Eism brackets by taking salience to be an objective feature of the stimulus. Thus, to the degree that BD/DL embraces Eism (IMO, the standard working assumption), to that degree it will fail to address the problem of unsupervised learning (which, I am told, is the problem that everyone (e.g. Hinton) thinks needs solving).[6]

TF makes a few other interesting observations, especially as relates to the political consequences of invidiously comparing human and machine capacities to the detriment of the former. But for present purposes, TF’s utility lies in identifying another way that Eism goes wrong (in addition, for example, to abstracting away from exactly how minds generalize (remember, saying that the mind generalizes via “analogy” is to say nothing at all!)) and makes it harder to think clearly about the relevant issues in cog-neuro.

Sam Epstein develops this same theme in a linguistic context (here (SE)). SE starts by correctly observing that the process of acquiring a particular G relies on two factors, (i) an innate capacity that humans bring to the process and (ii) environmental input (i.e. the PLD). SE further notes that this two factor model is generally glossed as reflecting the contributions of “nature” (the innate capacity) and “nurture” (the PLD). And herein we find the seeds of a deep Eish misunderstanding of the process, quite analogous to the one that TF identified. Let me quote SE (197-198):

[I]t is important to remember—as has been noted before, but perhaps it remains underappreciated—that it is precisely the organism’s biology (nature) that determines what experience, in any domain, can consist of …
To clarify, a bee, for example, can perform its waggle dance for me a million times, but that ‘experience’, given my biological endowment, does not allow me to transduce the visual images of such waggling into a mental representation (knowledge) of the distance and direction to a food source. This is precisely what it does mean to a bee witnessing the exact same environmental event/waggle dance. Ultrasonic acoustic disturbances might be experience for my dog, but not for me. Thus, the ‘environment’ in this sense is not in fact the second factor, but rather, nurture is constituted of those aspects of the ill-defined ‘environment’ (which of course irrelevantly includes a K-mart store down the street from my house) that can in principle influence the developmental trajectory of one or more organs of a member of a particular species, given its innate endowment.

In the biolinguistic domain, the logic is no different. The apparent fact that exposure to some finite threshold amount of ‘Tagalog’ acoustic disturbances in contexts (originating from outside the organism, in the ‘environment’) can cause any normal human infant to develop knowledge of ‘Tagalog’ is a property of human infants…. Thus the standard statement that on the one hand, innate properties of the organism and, on the other, the environment, determine organismic development, is profoundly misleading. It suggests that those environmental factors that can influence the development of particular types of organisms are definable, non-biologically—as the behaviorists sought, but of course failed, to define ‘stimulus’ as an organism-external construct. We can’t know what the relevant developmental stimuli are or aren’t, without knowing the properties of the organism.

This is, of course, correct. What counts as input to the language acquisition device (LAD) must be innately specified. Inputs do not come marked as linguistically vs non-linguistically relevant. Further, what the LAD does in acquiring a G is the poster child example of unsupervised learning. And as we noted above, without a supervisor/curator selecting the relevant inputs for the child and organizing them into the appropriate boxes, it’s the structure of the LAD that must be doing the relevant curating for itself. There really is no alternative.

SE points out an important consequence of this observation for nature vs nurture arguments within linguistics, including Poverty of Stimulus debates.  As SE notes (198): 

… organism external ‘stimuli’ cannot possibly suffice to explain any aspects of the developed adult state of any organism. 

Why? For the simple reason that the relevant PLD “experience” that the LAD exploits is itself a construction of the LAD. The relevant stimulus is the proximal one, and in the linguistic domain (indeed in most cognitively non-trivial domains) the proximal stimulus is only distantly related to the distal one that triggers the relevant transduction. Here is SE once more (199):

…experience is constructed by the organism’s innate properties, and is very different from ‘the environment’ or the behaviorist notion of ‘stimulus’.

As SE notes, all of this was well understood over 300 years ago (SE contains a nice little quote from Descartes). Actually, there was a lively discussion at the start of the “first cognitive revolution” (I think this is Chomsky’s term) that went under the name of the “primary/secondary quality distinction” that tried to categorize those features of proximate stimuli that reflected objective features of their distal causes and those that did not. Here appears to be another place where we have lost clear sight of conceptual ground that our precursors cleared.

SE contains a lot more provocative (IMO, correct) discussion of the implications of the observation that experience is a nature-infested notion. Take a look.

Let me mention one last paper that can be read alongside TF and SE. It is on predictive coding, a current fad, apparently, within the cog-neuro world (here). The basic idea is that the brain makes top down predictions, based on its internal mental/brain models, about what it should experience, with perception amounting to checking these predictions against the “input” and adjusting the mental models to fit it. In other words, perception is cognitively saturated.
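
The predict-check-adjust loop just described can be made concrete with a deliberately tiny sketch. Everything in it is an illustrative assumption rather than anything taken from the paper: the "mental model" is a single number, the function name `perceive` and the parameter `learning_rate` are made up for the example, and real predictive coding models are hierarchical and far richer.

```python
def perceive(observations, prior=0.0, learning_rate=0.5):
    """Toy predict-check-adjust loop (hypothetical names, one-number 'model')."""
    prediction = prior  # the top-down expectation the system starts with
    errors = []
    for observed in observations:
        error = observed - prediction        # mismatch between expectation and input
        errors.append(error)
        prediction += learning_rate * error  # adjust the internal model to shrink the mismatch
    return prediction, errors


# A constant "stimulus" of 1.0 presented ten times to a system expecting 0.0.
final_prediction, errors = perceive([1.0] * 10)
print(final_prediction)        # the expectation has moved close to 1.0
print(errors[0], errors[-1])   # the surprise is large at first and small at the end
```

The point of the toy is only that what drives the updating is the mismatch between what the system expected and what arrived, not the input taken by itself: change the prior and the very same input produces a different sequence of "surprises."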

This idea seems to be getting a lot of traction of late (a piece in Quanta is often a good indicator that an idea is “hot”). For our purposes, the piece usefully identifies how the new view differs from the one that was previously dominant (7-8):
The view of neuroscience that dominated the 20th century characterized the brain’s function as that of a feature detector: It registers the presence of a stimulus, processes it, and then sends signals to produce a behavioral response. Activity in specific cells reflects the presence or absence of stimuli in the physical world. Some neurons in the visual cortex, for instance, respond to the edges of objects in view; others fire to indicate the objects’ orientation, coloring or shading…
Rather than waiting for sensory information to drive cognition, the brain is always actively constructing hypotheses about how the world works and using them to explain experiences and fill in missing data. That’s why, according to some experts, we might think of perception as “controlled hallucination.”
Note the contrast: perception consists in detecting objective features of the stimulus vs constructing hypotheses about how the world works and verifying them against bottom up “experience.” In other words, a passive feature detector vs an active mind that constructs and tests hypotheses. Or, to be tendentious one more time, an Eish vs an Rish conception of the mental.
One point worth noting. When I was a youngster oh so many decades ago, there was a big fight about whether brain mechanisms are largely bottom up or top down computational systems. The answer, of course, is that the brain uses both kinds of mechanisms. However, the prevalent sentiment in the neuro world was that brains were largely bottom up systems, with higher levels generalizing over features provided by lower ones. Chomsky’s critique of discovery procedures (see here for discussion) hit at exactly this point, noting that in the linguistic case it was not possible to treat higher levels as simple summaries of the statistical properties of lower ones. Indeed, the flow of information likely went from higher to lower as well. This has a natural interpretation in terms of brain mechanisms involving feed forward as well as feed back loops. Interestingly, this is what has also driven the trend towards predictive coding in the neuro world. It was discovered that the brain has many “top down feedback connections” (7)[7] and this sits oddly with the idea that brains basically sit passively waiting to absorb perceptual inputs. At any rate, there is an affinity between thinking that brains indulge in lots of feed forward processing and taking brains to be active interpreters of the passing perceptual scene.
That’s it. To repeat the main message, the E vs R conceptions of the mind/brain and how it functions are very different, and importantly so. As the above papers note, it is all too easy to get confused about important matters if the differences between these two views of the mental world are not kept in mind. Or, again to be tendentious: Eism is bad for you! Only a healthy dose of Rism can protect you from walking its fruitless paths. So arm yourself and have a blessed Rish day.

[1] They also have two divergent pictures of how data and theory relate in inquiry, but that is not the topic of today’s sermonette.
[2] I have argued elsewhere (here) that this passivity is what allows Es to have a causal semantic theory.
[3] Nor, from what I can gather from Kahneman’s Nobel Lecture, is he committed to the view that salience is a property of objects. Rather, it is a property of situations a sentient agent finds herself in. The important point for Kahneman is that assessments of salience are more or less automatic, fast, and unconscious. This is consistent with salience being cognitively guided rather than a transparent reflection of the properties of the object. So, though TF’s point is useful, I suspect that he did not get Kahneman quite right. Happily none of that matters here.
[4] A perhaps pointless quibble: the fact that people cannot report seeing a gorilla does not mean that they did not perceive one. The perceptual (and even cognitive) apparatus might indeed have registered a gorilla without it being the case that viewers can access this information consciously. Think of being asked about the syntax of a sentence after hearing it and decoding its message. This is very hard to retrieve (it is below consciousness most of the time) but that does not mean that the syntax is not being computed. At any rate, none of this bears on the central issues, but it was a quibble that I wanted to register.
[5] NH: again, I would replace ‘psychophysics’ with ‘Eism.’
[6] As TF notes, this is actually a very old problem within AI. It is the “frame problem.” It was understood to be very knotty and nobody had any idea how to solve it in the general case. But, as TF noted, it has been forgotten “amid the present euphoria with large-scale information- and data-processing” (6).
Moreover, it is a very hard problem. It is relatively easy to identify salient features given a context. Getting a theory of salience, in contrast (i.e. a specification of the determinants of salience across contexts), is very hard. As Kahneman notes in his Nobel Lecture (456), it is unlikely that we will have one of these anytime soon. Interestingly, early on Descartes identified the capacity for humans to appropriately respond to what’s around them as an example of stimulus free (i.e. free and creative) behavior. We do not know more about this now than Descartes did in the 17th century, a correct point that Chomsky likes to make.
[7] If recollection serves (but remember I am old and on the verge of dementia) the connections from higher to lower brain levels are upwards of five times as numerous as those from lower to upper. It seems that the brain is really eager to involve higher level “expectations” in the process of analyzing incoming sensations/perceptions.

Monday, August 27, 2018

Revolutions in science; a comment on Gelman

In what follows I am going to wander way beyond my level of expertise (perhaps even rudimentary competence). I am going to discuss statistics and its place in the contemporary “replication crisis” debates. So, reader be warned that you should take what I write with a very large grain of salt. 

Andrew Gelman has a long post (here, AG) where he ruminates about a comparatively small revolution in statistics that he has been a central part of (I know, it is a bit unseemly to toot your own horn, but hey, false modesty is nothing to be proud of either). It is small (or “far more trivial”) when compared to more substantial revolutions in Biology (Darwin) or Physics (Relativity and Quantum mechanics), but AG argues that the “Replication revolution” is an important step in enhancing our “understanding of how we learn about the world.” It may be right. But…

But, I am not sure that it has the narrative quite right. As AG portrays matters, the revolution need not have happened. The same ground could have been covered with “incremental corrections and adjustments.” Why weren’t they? The reactionaries forced a revolutionary change because of their reactions to reasonable criticisms by the likes of Meehl, Mayo, Ioannidis, Gelman, Simonsohn, Dreber, and “various other well-known skeptics.” Their reaction to these reasonable critiques was to charge the critics with bullying or insist that the indicated problems are all part of normal science and will eventually be removed by better training, higher standards etc. This, AG argues, was the wrong reaction and required a revolution, albeit a minor one relatively speaking, to overturn.

Now, I am very sympathetic to a large part of this position. I have long appreciated the work of the critics and have covered their work in FoL. I think that the critics have done a public service in pointing out that stats has served to confuse as often as (maybe more often than) it has served to illuminate. And some have made the more important point (AG prominently among them) that this is not some mistake, but serves a need in the disciplines where it is most prominent (see here). What’s the need? Here is AG:[1]

Not understanding statistics is part of it, but another part is that people—applied researchers and also many professional statisticians—want statistics to do things it just can’t do. “Statistical significance” satisfies a real demand for certainty in the face of noise. It’s hard to teach people to accept uncertainty. I agree that we should try, but it’s tough, as so many of the incentives of publication and publicity go in the other direction.

And observe that the need is Janus faced. It faces inwards to relieve the anxiety of uncertainty and it faces outwards in relieving professional publish-or-perish anxiety. Much to AG’s credit he notices that these are different things, though they are mutually supporting. I suspect that the incentive structure is important, but secondary to the desire to “get results” and “find the truth” that animates most academics. Yes, lucre, fame, fortune, status are nice (well, very nice) but I agree that the main motivation for academics is the less tangible one, wanting to get results just for the sake of getting them. Being productive is a huge goal for any academic, and a big part of the lure of stats, IMO, is that it promises to get one there if one just works hard and keeps plugging away. 

So, what AG says about the curative nature of the mini-revolution rings true, but only in part. I think that the post fails to identify the three main causal spurs to stats overreach when combined with the desire to be a good productive scientist.

The first it mentions, but makes less of than perhaps others have. It is that stats are hard and interpreting them and applying them correctly takes a lot of subtlety. So much indeed that even experts often fail (see here). There is clearly something wrong with a tool that seems to ensure large scale misuse. AG in fact notes this (here), but it does not play much of a role in the post cited above, though IMO it should have. What is it about stats techniques that makes them so hard to get right? That I think is the real question. After all, as AG notes, it is not as if all domains find it hard to get things right. Psychometricians seem to get their stats right most of the time (as do those looking for the Higgs boson). So what is it about those domains where stats regularly fails that makes failure there so general? And this leads me to my second point.

Stats techniques play an outsized role in just those domains where theory is weakest. This is an old hobby horse of mine (see here for one example). Stats, especially fancy stats, induces the illusion that deep significant scientific insights are for the having if one just gets enough data points and learns to massage them correctly (and responsibly, no forking paths for me thank you very much). This conception is uncomfortable with the idea that there is no quick fix for ignorance. No amount of hard work, good ethics, or careful application suffices when we really have no idea what is going on. Why do I mention this? Because many of the domains where the replication crisis has been ripest are domains that are very very hard and where we really don’t have much of an understanding of what is happening. Or maybe to put this more gracefully, either the hypotheses of interest are too shallow and vague to be taken seriously (lots of social psych) or the effects of interest are the results of myriad interactions that are too hard to disentangle. In either case, stats will often provide an illusion of rigor while leading one down a forking garden path. Note, if this is right, then we have no problem seeing why psychometricians were in no need of the replication revolution. We really do have some good theory in domains like sensory perception, and here stats have proven to be reliable and effective tools. The problem is not with stats, but with stats applied where they cannot be guided (and misapplications tamed) by significant theory.

Let me add two more codicils to this point.

First, here I part ways with AG. The post suggests that one source of the replication problem is with people having too great “an attachment to particular scientific theories or hypotheses.” But if I am right this is not the problem, at least not the problem behind the replication crisis. Being theoretically stubborn may make you wrong, but it is not clear why it makes your work shoddy. You get results you do not like and ignore them. That may or may not be bad. But with a modicum of honesty, the most stiff-necked theoretician can appreciate that her/his favorite account, the one true theory, appears inconsistent with some data. I know whereof I speak, btw. The problem here, if there is one, is not one of generating misleading tests and non-replicable results, but of ignoring the (apparent) counter data. And this, though possibly a problem for an individual, may not be a problem for a field of inquiry as a whole.

Second, there is a temptation that needs to be seriously resisted today and that readily leads to replication problems: because of the ubiquity and availability of cheap “data” nowadays, the thought that this time it’s different is very alluring. Big Data types often seem to think that if you get a large enough set of numbers and apply the right stats techniques (rinse and repeat), out will plop The Truth. But this is wrong. Lars Syll puts it well here in a post aptly entitled “Why data is NOT enough to answer scientific questions”:

The central problem with present ‘machine learning’ and ‘big data’ hype is that so many –falsely- think that they can get away with analyzing real-world phenomena without any (commitment to) theory. But –data never speaks for itself. Without a prior statistical set-up, there actually are no data at all to process. And – using a machine learning algorithm will only produce what you are looking for.

Clever data mining tricks are never enough to answer important scientific questions. Theory matters.

So, when one combines the fact that in many domains we have, at best, very weak theory with the fact that nowadays we are flooded with cheap available data, the temptation to go hyper statistical can be overwhelming.

Let me put this another way. As AG notes, successful inquiry needs strong theory and careful measurement. Note the ‘and.’ Many read the ‘and’ as an ‘or’ and allow that strong theory can substitute for paucity of data or that tons of statistically curated data can substitute for virtual absence of significant theory. But this is a mistake, albeit a very tempting one if the alternative is having nothing much of interest or relevance to say at all. And this is what AG underplays: a central problem with stats is that it often tries to sell itself as allowing one to bypass the theory half of the conjunction. Further, because it “looks” technical and impressive (i.e. has a mathematical sheen) it leads to cargo cult science, scientific practice that looks like science rather than being scientific.

Note, this is not bad faith or corrupt practice (though there can be this as well). This stems from the desire to be, what AG dubs, a scientific “hero,” a disinterested searcher for the truth. The problem is not with the ambition, but the added supposition that any problem will yield to scientific inquiry if pursued conscientiously. Nope. Sorry. There are times when there is no obvious way to proceed because we have no idea how to proceed. And in these domains no matter how careful we are we are likely to find ourselves getting nowhere.

I think that there is a third source of the problem that resides in the complexity of the problems being studied. In particular, the fact that many phenomena we are interested in arise from the interaction of many causal sub-systems. When this happens there is bound to be a lot of sensitivity to the particular conditions of the experimental set up and so lots of opportunities for forking paths (i.e. p-hacking) and (unintentional) stats abuse.
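
A back-of-the-envelope calculation suggests why forking paths are so dangerous. The sketch below rests on a loudly simplifying assumption: that each alternative analysis is an independent test at a fixed significance threshold and that there is no real effect to be found. Real forking paths are typically correlated, so the numbers are only indicative. The function name `chance_of_spurious_hit` is made up for the illustration.

```python
def chance_of_spurious_hit(num_analyses, alpha=0.05):
    """Probability that at least one of several analyses comes out 'significant'
    purely by chance, ASSUMING independent tests and no true effect."""
    return 1.0 - (1.0 - alpha) ** num_analyses


for paths in (1, 5, 20):
    print(paths, round(chance_of_spurious_hit(paths), 3))
```

Under these assumptions, a researcher with twenty reasonable-looking ways of slicing the data has roughly a 64 percent chance of finding at least one "significant" result even when nothing is there, and nothing in that arithmetic requires bad faith.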

Now, every domain of inquiry has this problem and needs to manage it. In the physical sciences this is done by (as Diogo once put it to me) “controlling the shit out of the experimental set up.” Physicists control for interaction effects by removing many (most) of the interfering factors. A good experiment requires creating a non-natural artificial environment in which problematic factors are managed via elimination. Diogo convinced me that one of the nice features of linguistic inquiry is that it is possible to “control the shit” out of the stimuli thereby vastly reducing noise generated by an experimental subject. At any rate, one way of getting around the interaction effects problem is to manage the noise by simplifying the experimental set up and isolating the relevant causal sub-systems.

But often this cannot be done, among other reasons because we have no idea what the interacting subsystems are or how they function (think, for example, pragmatics).  Then we cannot simplify the set up and we will find that our experiments are often task dependent and very noisy. Stats offers a possible way out. In place of controlling the design of the set up the aim is to statistically manage (partial out) the noise. What seems to have been discovered (IMO, not surprisingly) is that this is very hard to do in the absence of relevant theory. You cannot control for the noise if you have no idea where it comes from or what is causing it. There is no such thing as a theory free lunch (or at least not a nutritious one). The revolution AG discusses, I believe, has rediscovered this bit of wisdom.

Let me end with an observation special to linguistics. There are parts of linguistics (syntax, large parts of phonology and morphology) where we are lucky in that the signal from the underlying mechanisms is remarkably strong, in that it withstands all manner of secondary effects. Such data are, relatively speaking, very robust. So, for example, ECP or island or binding violations show few context effects. This does not mean to say that there are no effects at all of context wrt acceptability (Sprouse and Co. have shown that these do exist). But the main effect is usually easy to discern. We are lucky. Other domains of linguistic inquiry are far noisier (I mentioned pragmatics, but even large parts of semantics strike me as similar (maybe because it is hard to know where semantics ends and pragmatics begins)). I suspect that a good part of the success of linguistics can be traced to the fact that FL is largely insulated from the effects of the other cognitive subsystems it interacts with. As Jerry Fodor once observed (in his discussion of modularity), to the degree that a psych system is modular, to that degree it is comprehensible. Some linguists have lucked out. But as we more and more study the interaction effects wrt language we will run into the same problems. If we are lucky, linguistic theory will help us avoid many of the pitfalls AG has noted and categorized. But there are no guarantees, sadly.

[1] I apologize for not being able to link to the original. It seems that in the post where I discussed it, I failed to link to the original and now cannot find it. It should have appeared in roughly June 2017, but I have not managed to track it down. Sorry.