Empiricists (E) and Rationalists (R) have two divergent “pictures” of how the mind/brain functions (henceforth, I use ‘mind’ unless brains are the main focus).
For Es, the mind/brain is largely a passive instrument that, when running well, faithfully records the passing environmental scene. Things go awry when the wrong kinds of beliefs intrude between the sensory input and receptive mind to muddy the reception. The best mind is a perfectly receptive mind. Passive is good. Active leads to distortion.
For Rs there is no such thing as a passive mind. What you perceive is actively constructed along dimensions that the mind makes available. Perception is constructed. There is no unvarnished input, as transduction takes place along routes the mind lays out and regulates. More to the point, sensing is an activity guided by mental structure.
All of this is pretty old hat. However, that does not mean that it has been well assimilated into the background wisdom of cog-neuro. Indeed, from what I can tell, there are large parts of this world (and the closely related Big Data/Deep Mind world) that take the R picture to be contentious and the E picture to be obvious (though as we shall see, this seems to be changing). I recently ran across several nice pieces that discuss these issues in interesting ways that I would like to bring to your attention. Let me briefly discuss each of them in turn.
The first appeared here (let’s call the post TF (Teppo Felin being the author)) and it amusingly starts by discussing that famous “gorilla” experiment. In case you do not know it, it goes as follows (TF obligingly provides links to YouTube videos that will allow you to be a subject and “see” the gorilla (or not) for yourself). Here is TF’s description (2):
In the experiment, subjects were asked to watch a short video and to count the basketball passes. The task seemed simple enough. But it was made more difficult by the fact that subjects had to count basketball passes by the team wearing white shirts, while a team wearing black shirts also passed a ball. This created a real distraction.
The experiment came with a twist. While subjects try to count basketball passes, a person dressed in a gorilla suit walks slowly across the screen. The surprising fact is that some 70 per cent of subjects never see the gorilla. When they watch the clip a second time, they are dumbfounded by the fact that they missed something so obvious. The video of the surprising gorilla has been viewed millions of times on YouTube – remarkable for a scientific experiment. Different versions of the gorilla experiment, such as the ‘moonwalking bear,’ have also received significant attention.
Now, it’s hard to argue with the findings of the gorilla experiment itself. It’s a fact that most people who watch the clip miss the gorilla.
The conclusion that is generally drawn (including by heavyweights like Kahneman) is that humans are “‘blind to the obvious, and blind to our blindness.’” The important point that TF makes is that this description of the result presupposes that there is available a well-defined, mind-independent notion of “prominence or obviousness.” Or, in my (tendentious) terms, it presupposes an Eish conception of perception and a passive conception of the mind. The problem is that this conception of obviousness is false. As TF correctly notes, “all kinds of things are readily evident in the clip.” In fact, I would say that there are likely to be an infinite number of possible things that could be evident in the clip in the right circumstances. As Lila Gleitman once wisely observed, a picture is worth a thousand words and that is precisely the problem. There is no way to specify what is “obvious” in the perception of the clip independent of the mind doing the perceiving. As TF puts it, obviousness only makes sense relativized to perceivers’ mental capacities and goals.
Now, ‘obviousness’ is not a technical cog-neuro term. The scientific term of art is ‘salience.’ TF’s point is that it is quite standardly assumed that salience is an objective property of a stimulus, rather than a mind mediated relation. Here is TF on Kahneman again (3).
Kahneman’s focus on obviousness comes directly from his background and scientific training in an area called psychophysics. Psychophysics focuses largely on how environmental stimuli map on to the mind, specifically based on the actual characteristics of stimuli, rather than the characteristics or nature of the mind. From the perspective of psychophysics, obviousness – or as it is called in the literature, ‘salience’ – derives from the inherent nature or characteristics of the environmental stimuli themselves: such as their size, contrast, movement, colour or surprisingness. In his Nobel Prize lecture in 2002, Kahneman calls these ‘natural assessments’. And from this perspective, yes, the gorilla indeed should be obvious to anyone watching the clip.
TF gets one thing askew in this description IMO: the conception of salience it criticizes is Eish, not psychophysical. True, psychophysics aims to understand how sensation leads to perception and sensations are tied to the distal stimuli that generate them. But this does not imply that salience is an inherent property of the distal stimulus. The idea that it is, is pure Eism. On this view, minds that “miss” the salient features of a stimulus are minds that are misfiring. But if minds make stimuli salient (rather than simply tracking what is salient), then a mind that misses a gorilla in a video clip when asked to focus on the number of passes being executed by members of a team may be functioning perfectly well (indeed, optimally). For this purpose the gorilla is a distraction and an efficient mind with the specific count-the-passes mandate in hand might be better placed to accomplish its goal were it to “ignore” the gorilla in the visual scene.
Let me put this another way: if minds are active in perception (i.e. if minds are as Rs have taken them to be) then salience is not a matter of what you are looking at but what you are looking for (this is TF’s felicitous distinction). And if this is so, every time you hear some cog-psych person talking about “salience” and attributing to it causal/explanatory powers, you should appreciate that what you are on the receiving end of is Eish propaganda. It’s just like when Es press “analogy” into service to explain how minds generalize/induce. There are no scientifically useful notions of either available except as relativized to the specific properties of the minds involved. Again as TF puts it (4):
Rather than passively accounting for or recording everything directly in front of us, humans – and other organisms for that matter – instead actively look for things. The implication (contrary to psychophysics) is that mind-to-world processes drive perception rather than world-to-mind processes.
Yup, sensation and perception are largely mind mediated activities. Once again, Rism is right and Eism is wrong (surprise!).
Now, all of this is probably obvious to you (at least once it is pointed out). But it seems that these points are still considered radical by some. For example, TF rightly observes that the opposing Eish view permeates the Big Data/Deep Learning (BD/DL) hoopla. If perception is simply picking out the objectively salient features of the environment unmediated by distorting preconceptions, then there is every reason to think that being able to quickly assimilate large amounts of input and statistically massage them is the road to cognitive excellence. Deep Minds are built to do just that, and that is the problem (see here for discussion of this issue by “friendly” critics of BD/DL).
But, if Rism is right, then minds are not passive pattern matchers or neutral data absorbers but are active probers of the passing scene looking for information to justify inferences the mind is built to make. And if this is right, and some objective notion of salience cannot be uncritically taken to undergird the notion of relevance, then purely passive minds (i.e. current Deep Minds) won’t be able to separate what is critical from what is not.
Indeed, this is what lies behind the failure of current AI to get anywhere on unsupervised learning. Learning needs a point of view. Supervised learning provides the necessary perspective in curating the data (i.e. by separating out the relevant-to-the-task data (e.g. find the bunny) from the non-relevant-to-the-task data). But absent a curator (that which is necessarily missing from unsupervised learning), the point of view (what is obvious/salient/relevant) must come from the learner (i.e. in this case, the Deep Mind program). So if the goal is to get theories of unsupervised learning, the hard problem is to figure out what minds consider relevant/salient/obvious and to put this into the machine’s mind. But, and here is the problem, this is precisely the problem that Eism brackets by taking salience to be an objective feature of the stimulus. Thus, to the degree that BD/DL embrace Eism (IMO, the standard working assumption), to that degree it will fail to address the problem of unsupervised learning (which, I am told, is the problem that everyone (e.g. Hinton) thinks needs solving).
TF makes a few other interesting observations, especially as relates to the political consequences of invidiously comparing human and machine capacities to the detriment of the former. But for present purposes, TF’s utility lies in identifying another way that Eism goes wrong (in addition, for example, to abstracting away from exactly how minds generalize (remember, saying that the mind generalizes via “analogy” is to say nothing at all!)) and makes it harder to think clearly about the relevant issues in cog-neuro.
Sam Epstein develops this same theme in a linguistic context (here (SE)). SE starts by correctly observing that the process of acquiring a particular G relies on two factors, (i) an innate capacity that humans bring to the process and (ii) environmental input (i.e. the PLD). SE further notes that this two factor model is generally glossed as reflecting the contributions of “nature” (the innate capacity) and “nurture” (the PLD). And herein we find the seeds of a deep Eish misunderstanding of the process, quite analogous to the one that TF identified. Let me quote SE (197-198):
[I]t is important to remember—as has been noted before, but perhaps it remains underappreciated—that it is precisely the organism’s biology (nature) that determines what experience, in any domain, can consist of …
To clarify, a bee, for example, can perform its waggle dance for me a million times, but that ‘experience’, given my biological endowment, does not allow me to transduce the visual images of such waggling into a mental representation (knowledge) of the distance and direction to a food source. This is precisely what it does mean to a bee witnessing the exact same environmental event/waggle dance. Ultrasonic acoustic disturbances might be experience for my dog, but not for me. Thus, the ‘environment’ in this sense is not in fact the second factor, but rather, nurture is constituted of those aspects of the ill-defined ‘environment’ (which of course irrelevantly includes a K-mart store down the street from my house) that can in principle influence the developmental trajectory of one or more organs of a member of a particular species, given its innate endowment.
In the biolinguistic domain, the logic is no different. The apparent fact that exposure to some finite threshold amount of ‘Tagalog’ acoustic disturbances in contexts (originating from outside the organism, in the ‘environment’) can cause any normal human infant to develop knowledge of ‘Tagalog’ is a property of human infants…. Thus the standard statement that on the one hand, innate properties of the organism and, on the other, the environment, determine organismic development, is profoundly misleading. It suggests that those environmental factors that can influence the development of particular types of organisms are definable, non-biologically—as the behaviorists sought, but of course failed, to define ‘stimulus’ as an organism-external construct. We can’t know what the relevant developmental stimuli are or aren’t, without knowing the properties of the organism.
This is, of course, correct. What counts as input to the language acquisition device (LAD) must be innately specified. Inputs do not come marked as linguistically vs non-linguistically relevant. Further, what the LAD does in acquiring a G is the poster child example of unsupervised learning. And as we noted above, without a supervisor/curator selecting the relevant inputs for the child and organizing them into the appropriate boxes, it’s the structure of the LAD that must be doing the relevant curating for itself. There really is no other alternative.
SE points out an important consequence of this observation for nature vs nurture arguments within linguistics, including Poverty of Stimulus debates. As SE notes (198):
… organism external ‘stimuli’ cannot possibly suffice to explain any aspects of the developed adult state of any organism.
Why? For the simple reason that the relevant PLD “experience” that the LAD exploits is itself a construction of the LAD. The relevant stimulus is the proximal one, and in the linguistic domain (indeed in most cognitively non-trivial domains) the proximal stimulus is only distantly related to the distal one that triggers the relevant transduction. Here is SE once more (199):
…experience is constructed by the organism’s innate properties, and is very different from ‘the environment’ or the behaviorist notion of ‘stimulus’.
As SE notes, all of this was well understood over 300 years ago (SE contains a nice little quote from Descartes). Actually, there was a lively discussion at the start of the “first cognitive revolution” (I think this is Chomsky’s term) that went under the name of the “primary/secondary quality distinction” that tried to categorize those features of proximate stimuli that reflected objective features of their distal causes and those that did not. Here appears to be another place where we have lost clear sight of conceptual ground that our precursors cleared.
SE contains a lot more provocative (IMO, correct) discussion of the implications of the observation that experience is a nature-infested notion. Take a look.
Let me mention one last paper that can be read alongside TF and SE. It is on predictive coding, a current fad, apparently, within the cog-neuro world (here). The basic idea is that the brain makes top down predictions based on its internal mental/brain models about what it should experience, perception amounting to checking these predictions against the “input” and adjusting the mental models to fit these. In other words, perception is cognitively saturated.
This idea seems to be getting a lot of traction of late (a piece in Quanta is often a good indicator that an idea is “hot”). For our purposes, the piece usefully identifies how the new view differs from the one that was previously dominant (7-8):
The view of neuroscience that dominated the 20th century characterized the brain’s function as that of a feature detector: It registers the presence of a stimulus, processes it, and then sends signals to produce a behavioral response. Activity in specific cells reflects the presence or absence of stimuli in the physical world. Some neurons in the visual cortex, for instance, respond to the edges of objects in view; others fire to indicate the objects’ orientation, coloring or shading…
Rather than waiting for sensory information to drive cognition, the brain is always actively constructing hypotheses about how the world works and using them to explain experiences and fill in missing data. That’s why, according to some experts, we might think of perception as “controlled hallucination.”
Note the contrast: perception consists in detecting objective features of the stimulus vs constructing hypotheses about how the world works and verifying them against bottom up “experience.” In other words, a passive feature detector vs an active mind that constructs and tests hypotheses. Or, to be tendentious one more time, an Eish vs an Rish conception of the mental.
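For those who like their contrasts concrete, here is a toy sketch of the difference in Python. It is my own illustration and is not taken from the Quanta piece or from any predictive coding model in the literature. The function names, the scalar "percept," and the `prior` and `gain` parameters are all hypothetical simplifications. The passive detector simply reports the input. The predictive estimator starts from an internal expectation and revises it by a fraction of the prediction error at each step.

```python
def passive_detector(signal):
    """E-ish toy: the 'percept' is just a copy of each incoming value."""
    return list(signal)


def predictive_estimator(signal, prior=0.0, gain=0.5):
    """R-ish toy: the 'percept' is an internal estimate, nudged by prediction error.

    prior -- hypothetical starting expectation supplied by the 'mind'
    gain  -- hypothetical fraction of the prediction error used to revise it
    """
    estimate = prior
    percepts = []
    for observed in signal:
        error = observed - estimate         # mismatch between input and prediction
        estimate = estimate + gain * error  # revise the internal model
        percepts.append(estimate)
    return percepts


signal = [1.0, 1.0, 1.0, 1.0]
passive = passive_detector(signal)
active_from_zero = predictive_estimator(signal, prior=0.0, gain=0.5)
active_from_one = predictive_estimator(signal, prior=1.0, gain=0.5)

print(passive)           # [1.0, 1.0, 1.0, 1.0]
print(active_from_zero)  # [0.5, 0.75, 0.875, 0.9375]
print(active_from_one)   # [1.0, 1.0, 1.0, 1.0]
```

The same input yields different "percepts" depending on what the estimator expected going in, which is the Rish point in miniature. Nothing here should be read as a claim about how brains actually implement any of this.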
One point worth noting. When I was a youngster oh so many decades ago, there was a big fight about whether brain mechanisms are largely bottom up or top down computational systems. The answer, of course, is that the brain uses both kinds of mechanisms. However, the prevalent sentiment in the neuro world was that brains were largely bottom up systems, with higher levels generalizing over features provided by lower ones. Chomsky’s critique of discovery procedures (see here for discussion) hit at exactly this point, noting that in the linguistic case it was not possible to treat higher levels as simple summaries of the statistical properties of lower ones. Indeed, the flow of information likely went from higher to lower as well. This has a natural interpretation in terms of brain mechanisms involving feed forward as well as feed back loops. Interestingly, this is what has also driven the trend towards predictive coding in the neuro world. It was discovered that the brain has many “top down feedback connections” (7) and this sits oddly with the idea that brains basically sit passively waiting to absorb perceptual inputs. At any rate, there is an affinity between thinking that brains indulge in lots of top down feedback processing and taking brains to be active interpreters of the passing perceptual scene.
That’s it. To repeat the main message, the E vs R conceptions of the mind/brain and how it functions are very different, and importantly so. As the above papers note, it is all too easy to get confused about important matters if the differences between these two views of the mental world are not kept in mind. Or, again to be tendentious: Eism is bad for you! Only a healthy dose of Rism can protect you from walking its fruitless paths. So arm yourself and have a blessed Rish day.
They also have two divergent pictures of how data and theory relate in inquiry, but that is not the topic of today’s sermonette.
Nor, from what I can gather from Kahneman’s Nobel lecture, is he committed to the view that salience is a property of objects. Rather it is a property of situations a sentient agent finds herself in. The important point for Kahneman is that assessments of salience are more or less automatic, fast, and unconscious. This is consistent with salience being cognitively guided rather than a transparent reflection of the properties of the object. So, though TF’s point is useful, I suspect that he did not get Kahneman quite right. Happily none of that matters here.
A perhaps pointless quibble: the fact that people cannot report seeing a gorilla does not mean that they did not perceive one. The perceptual (and even cognitive) apparatus might indeed have registered a gorilla without it being the case that viewers can access this information consciously. Think of being asked about the syntax of a sentence after hearing it and decoding its message. This is very hard to retrieve (it is below consciousness most of the time) but that does not mean that the syntax is not being computed. At any rate, none of this bears on the central issues, but it was a quibble that I wanted to register.
NH: again, I would replace ‘psychophysics’ with ‘Eism.’
As TF notes, this is actually a very old problem within AI. It is the “frame problem.” It was understood to be very knotty and nobody had any idea how to solve it in the general case. But, as TF noted, it has been forgotten “amid the present euphoria with large-scale information- and data-processing” (6).
Moreover, it is a very hard problem. It is relatively easy to identify salient features given a context. Getting a theory of salience, in contrast, (i.e. a specification of the determinants of salience across contexts) is very hard. As Kahneman notes in his Nobel Lecture (456), it is unlikely that we will have one of these anytime soon. Interestingly, early on Descartes identified the capacity for humans to appropriately respond to what’s around them as an example of stimulus free (i.e. free and creative) behavior. We do not know more about this now than Descartes did in the 17th century, a correct point that Chomsky likes to make.
If recollection serves (but remember I am old and on the verge of dementia) the connections from higher to lower brain levels are upwards of five times as numerous as those from lower to upper. It seems that the brain is really eager to involve higher level “expectations” in the process of analyzing incoming sensations/perceptions.