Wednesday, September 12, 2018

The neural autonomy of syntax

Nothing does language like humans do language. This is not a hypothesis. It is a simple fact. Nonetheless, it is often either questioned or only reluctantly conceded. Therefore, I urge you to repeat the first sentence of this post three times before moving forward. It is both true and a truism. 

Let’s go further. The truth of this observation suggests the following non-trivial inference: there is something biologically special about humans that enables them (us) to be linguistically proficient, and this special mental power is linguistically specific. In other words, humans are uniquely cognitively endowed as a matter of biology when it comes to language, and this biological gift is tailored to track some specific cognitive feature of language rather than (for example) being (just!) a general increase in (say) general brain power. On this view, the traditional GG conception stemming from Chomsky takes FL to be both species specific and domain specific. 

Before proceeding, let me at once note that these are independent specificity theses. I do this because every time I make this point, others insist on warning me that the fact mentioned in the first sentence does not imply the inference I just drew in the second paragraph. Quite right. In fact: 

It is logically possible that linguistic competence supervenes on no domain specific capacities but is still species specific in that only humans have (for example) sufficiently powerful general brains to be linguistically proficient. Say, for example, linguistic competence requires at least 500 units of cognitive power (CP) and only human brains can generate this much CP. However, modulo the extra CPs, the mental “programs” the CPs drive are the same as those that (at least some) other cognitive creatures enjoy, they just cannot drive them as fast or as far because of mileage restrictions imposed by low CP brains.

Similarly, it is logically possible that animals other than humans have domain specific linguistic powers. It is conceivable that apes, corvids, platypuses, manatees, and Portuguese water dogs all have brains that include FLs just like ours that are linguistically specific (e.g. syntax focused and not exercised in other cognitive endeavors). Were this so, then both they and we would have brains with specific linguistic sensitivities in virtue of having brains with linguistically bespoke wiring/circuitry or whatever specially tailored brain ware makes FL brains special. Of course, were I one of them I would keep this to myself as humans have the unfortunate tendency of dismembering anything that might yield scientific insight (or just might be tasty). If these other animals actually had an FL I am pretty sure some NIH scientist would be trying to figure out how to slice and dice their brains in order to figure out how its FL ticks.

So, both options are logically possible. But the GG tradition stemming from Chomsky (and this includes yours truly, a fully paid-up member of this tribe) has doubted that these logical options are live, holding instead that when it comes to language only we humans are built for it, and that what makes our cognitive profile special is a set of linguistically specific cognitive functions built into FL and dedicated to linguistic cognition. Or, to put this another way, FL has some special cognitive sauce that allows us to be as linguistically adept as we evidently are, and we alone have minds/brains with this FL.

Nor do the exciting leaps of inference stop here. GG has gone even further out on the empirical limb and suggested that the bespoke property of FL that makes us linguistically special involves an autonomous SYNTAX (i.e. a syntax irreducible to either semantics or phonology and with its own special combinatoric properties). That’s right readers, syntax makes the linguistic world go round and only we got it and that’s why we are so linguistically special![1] Indeed, if a modern linguistic Ms or Mr Hillel were asked to sum up GG while standing on one foot s/he could do worse than say: only humans have syntax, all the rest is commentary.

This line of reasoning has been (and still is) considered very contentious. However, I recently ran across a paper by Campbell and Tyler (here, henceforth C&T) that argues for roughly this point (thx to William Matchin for sending it along). The paper has several interesting features, but perhaps the most intriguing (to me) is that Tyler is one of the authors. If memory serves, when I was growing up, Tyler was one of those who were very skeptical that there was anything cognitively special about language. Happily, it seems that times have changed.

C&T argues that the brain localizes syntactic processing in the left frontotemporal lobe and “makes a strong case for the domain specificity of the frontotemporal syntax system and its autonomy from domain-general networks” (132). So, the paper argues for a neural version of the autonomy of syntax thesis. Let me say a few more words about it.

First, C&T notes that (of course) the syntax dedicated part of the brain regularly interacts with the non-syntactic domain general parts of the brain. However, the paper rightly notes that this does not argue against the claim that there is an autonomous syntactic system encoded in the brain. It merely means that finding it will be hard, as this independence will often be obscured. More particularly, C&T says the activation of the domain general systems arises only “during task based language comprehension” (133). Tasks include having to make an acceptability judgment. When we focus on pure comprehension, however, without requiring any further “task,” we find that “only the left-lateralized frontotemporal syntax system and auditory networks are activated” (133). Thus, the syntax system only links to the domain general ones during “overt task performance” and otherwise activates alone. C&T note that this implies that the syntactic system alone is sufficient for syntactic analysis during language comprehension.

Second, C&T argue that arguments against the neural autonomy of syntax rest on bad definitions of domain specificity. More particularly, according to C&T the benchmarks for autonomy in other studies beg the autonomy question by embedding a “task” in the measure and so “lead to the activation of additional domain-general regions” (133). As C&T notes, when such “tasks” are controlled for, we only find activation in the syntax region.

Third, the relevant notion of syntax is the one GGers know and love. For C&T takes syntax to be the prime species specific feature of the brain and understands syntax in GGish terms to be implicated in “the construction of hierarchical syntactic structures.” C&T contrasts hierarchical relations with “adjacency relationships,” which it claims “both human and non-human primates are sensitive to” (134). This is pretty much the conventional GG view and C&T endorses it.

And there is more. C&T endorses the Hauser, Chomsky, Fitch distinction between FLN and FLB. This is not surprising for once one adopts an autonomy of syntax thesis and appreciates the uniqueness of syntax in human minds/brains the distinction follows pretty quickly. Let me quote C&T (135):

In this brief overview, we have suggested that it is necessary to take a more nuanced view to differentiating domain-general and domain-specific components involved in language. While syntax seems to meet the criteria for domain-specificity… there are other key components in the wider language system which are domain-general in that they are also involved in a number of cognitive functions which do not involve language.

C&T has one last intriguing feature, at least for a GGer like me. The name ‘Chomsky’ and the term ‘generative grammar’ are never mentioned, not even once (shades of Voldemort!). Quite clearly, the set of ideas that the paper explores presupposes the basic correctness of the Chomskyan generative enterprise. C&T argues for a neural autonomy of syntax thesis and, in doing so, it relies on the main contours of the Chomsky/GG conception of FL. Yes, if C&T is correct it adds to this body of thought. But it clearly relies on its main claims and presupposes their essential correctness. A word to this effect would have been nice to see. That said, read the paper. Contrary to the assumptions of many, it argues for a cog-neuro version of the Chomsky conception of language. Even if it dares not speak his name.

[1]I suspect that waggle dancing bees and dead reckoning insects also non-verbally advance a cognitive exceptionalism thesis and preen accordingly.

Tuesday, September 4, 2018

Two pictures of the mind (brain)?

Empiricists (E) and Rationalists (R) have two divergent “pictures” of how the mind/brain functions (henceforth, I use ‘mind’ unless brains are the main focus).[1]

For Es, the mind/brain is largely a passive instrument that, when running well, faithfully records the passing environmental scene. Things go awry when the wrong kinds of beliefs intrude between the sensory input and receptive mind to muddy the reception. The best mind is a perfectly receptive mind. Passive is good. Active leads to distortion.[2]

For Rs there is no such thing as a passive mind. What you perceive is actively constructed along dimensions that the mind makes available. Perception is constructed. There is no unvarnished input, as transduction takes place along routes the mind lays out and regulates. More to the point, sensing is an activity guided by mental structure.

All of this is pretty old hat. However, that does not mean that it has been well assimilated into the background wisdom of cog-neuro.  Indeed, from what I can tell, there are large parts of this world (and the closely related Big Data/Deep Mind world) that take the R picture to be contentious and the E picture to be obvious (though as we shall see, this seems to be changing).  I recently ran across several nice pieces that discuss these issues in interesting ways that I would like to bring to your attention. Let me briefly discuss each of them in turn.

The first appeared here (let’s call the post TF (Teppo Felin being the author)) and it amusingly starts by discussing that famous “gorilla” experiment. In case you do not know it, it goes as follows (TF obligingly provides links to Youtube videos that will allow you to be a subject and “see” the gorilla (or not) for yourself). Here is TF’s description (2):

 In the experiment, subjects were asked to watch a short video and to count the basketball passes. The task seemed simple enough. But it was made more difficult by the fact that subjects had to count basketball passes by the team wearing white shirts, while a team wearing black shirts also passed a ball. This created a real distraction.

The experiment came with a twist. While subjects try to count basketball passes, a person dressed in a gorilla suit walks slowly across the screen. The surprising fact is that some 70 per cent of subjects never see the gorilla. When they watch the clip a second time, they are dumbfounded by the fact that they missed something so obvious. The video of the surprising gorilla has been viewed millions of times on YouTube – remarkable for a scientific experiment. Different versions of the gorilla experiment, such as the ‘moonwalking bear,’ have also received significant attention.
Now, it’s hard to argue with the findings of the gorilla experiment itself. It’s a fact that most people who watch the clip miss the gorilla.

The conclusion that is generally drawn (including by heavyweights like Kahneman) is that humans are “ ‘blind to the obvious, and blind to our blindness.’” The important point that TF makes is that this description of the result presupposes that there is available a well-defined, mind-independent notion of “prominence or obviousness.” Or, in my (tendentious) terms, it presupposes an Eish conception of perception and a passive conception of the mind. The problem is that this conception of obviousness is false. As TF correctly notes, “all kinds of things are readily evident in the clip.” In fact, I would say that there are likely to be an infinite number of possible things that could be evident in the clip in the right circumstances. As Lila Gleitman once wisely observed, a picture is worth a thousand words and that is precisely the problem. There is no way to specify what is “obvious” in the perception of the clip independent of the mind doing the perceiving. As TF puts it, obviousness only makes sense relativized to perceivers’ mental capacities and goals. 

Now, ‘obviousness’ is not a technical cog-neuro term. The scientific term of art is ‘salience.’ TF’s point is that it is quite standardly assumed that salience is an objective property of a stimulus, rather than a mind mediated relation. Here is TF on Kahneman again (3).

Kahneman’s focus on obviousness comes directly from his background and scientific training in an area called psychophysics. Psychophysics focuses largely on how environmental stimuli map on to the mind, specifically based on the actual characteristics of stimuli, rather than the characteristics or nature of the mind. From the perspective of psychophysics, obviousness – or as it is called in the literature, ‘salience’ – derives from the inherent nature or characteristics of the environmental stimuli themselves: such as their size, contrast, movement, colour or surprisingness. In his Nobel Prize lecture in 2002, Kahneman calls these ‘natural assessments’. And from this perspective, yes, the gorilla indeed should be obvious to anyone watching the clip. 

TF gets one thing askew in this description IMO: the conception of salience it criticizes is Eish, not psychophysical.[3] True, psychophysics aims to understand how sensation leads to perception and sensations are tied to the distal stimuli that generate them. But this does not imply that salience is an inherent property of the distal stimulus. The idea that it is, is pure Eism. On this view, minds that “miss” the salient features of a stimulus are minds that are misfiring. But if minds make stimuli salient (rather than simply tracking what is salient), then a mind that misses a gorilla in a video clip when asked to focus on the number of passes being executed by members of a team may be functioning perfectly well (indeed, optimally). For this purpose the gorilla is a distraction, and an efficient mind with the specific count-the-passes mandate in hand might be better placed to accomplish its goal were it to “ignore” the gorilla in the visual scene.[4]

Let me put this another way: if minds are active in perception (i.e. if minds are as Rs have taken them to be) then salience is not a matter of what you are looking at but what you are looking for (this is TF’s felicitous distinction). And if this is so, every time you hear some cog-psych person talking about “salience” and attributing to it causal/explanatory powers, you should appreciate that what you are on the receiving end of is Eish propaganda. It’s just like when Es press “analogy” into service to explain how minds generalize/induce. There are no scientifically useful notions of either except as relativized to the specific properties of the minds involved. Again, as TF puts it (4):

Rather than passively accounting for or recording everything directly in front of us, humans – and other organisms for that matter – instead actively look for things. The implication (contrary to psychophysics[5]) is that mind-to-world processes drive perception rather than world-to-mind processes.

Yup, sensation and perception are largely mind mediated activities. Once again, Rism is right and Eism is wrong (surprise!).
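The looking-at vs looking-for contrast can be made concrete with a toy calculation. The sketch below is my own invented illustration (the objects, feature values, and goal weights are all made up, not anything from TF or the cog-neuro literature): the very same scene gets opposite salience rankings depending on whether salience is read off the stimulus alone or weighted by what the viewer is looking for.

```python
import numpy as np

# Hypothetical feature vectors for a toy gorilla-video scene.
# columns: [motion, contrast, size] (all values invented)
scene = {
    "white-shirt passes": np.array([0.9, 0.6, 0.5]),
    "black-shirt passes": np.array([0.9, 0.8, 0.5]),
    "gorilla":            np.array([0.5, 0.9, 0.9]),
}

def bottom_up_salience(features):
    # The Eish view: salience is an objective property of the stimulus,
    # here crudely the sum of its physical feature strengths.
    return features.sum()

def task_weighted_salience(features, goal_weights):
    # The Rish view: the same features are weighted by what the mind
    # is looking FOR, not just what it is looking AT.
    return features @ goal_weights

# A viewer told to count passes up-weights motion and down-weights
# task-irrelevant features (weights invented for illustration).
count_passes_goal = np.array([1.0, 0.2, 0.1])

for name, feats in scene.items():
    print(f"{name:20s} objective={bottom_up_salience(feats):.2f} "
          f"task-relative={task_weighted_salience(feats, count_passes_goal):.2f}")
```

On these made-up numbers the gorilla is the most salient object “objectively” yet the least salient relative to the count-the-passes goal, which is the whole point: the ranking lives in the weights the mind supplies, not in the stimulus.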

Now, all of this is probably obvious to you (at least once it is pointed out). But it seems that these points are still considered radical by some. For example, TF rightly observes that this view permeates the Big Data/Deep Learning (BD/DL) hoopla. If perception is simply picking out the objectively salient features of the environment unmediated by distorting preconceptions, then there is every reason to think that being able to quickly assimilate large amounts of input and statistically massage them quickly is the road to cognitive excellence. Deep Minds are built to do just that, and that is the problem (see here for discussion of this issue by “friendly” critics of BD/DL). 

But, if Rism is right, then minds are not passive pattern matchers or neutral data absorbers but are active probers of the passing scene looking for information to justify inferences the mind is built to make. And if this is right, and some objective notion of salience cannot be uncritically taken to undergird the notion of relevance, then purely passive minds (i.e. current Deep Minds) won’t be able to separate what is critical from what is not. 

Indeed, this is what lies behind the failure of current AI to get anywhere on unsupervised learning. Learning needs a point of view. Supervised learning provides the necessary perspective by curating the data (i.e. by separating the relevant-to-the-task data (e.g. find the bunny) from the non-relevant-to-the-task data). But absent a curator (which is necessarily missing from unsupervised learning), the point of view (what is obvious/salient/relevant) must come from the learner (i.e. in this case, the Deep Mind program). So if the goal is to get theories of unsupervised learning, the hard problem is to figure out what minds consider relevant/salient/obvious and to put this into the machine’s mind. But, and here is the problem, this is precisely the problem that Eism brackets by taking salience to be an objective feature of the stimulus. Thus, to the degree that BD/DL embraces Eism (IMO, the standard working assumption), to that degree it will fail to address the problem of unsupervised learning (which, I am told, is the problem that everyone (e.g. Hinton) thinks needs solving).[6]
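The point that unsupervised learning needs a built-in point of view can be shown with a deliberately tiny example. The sketch below is my own toy (the data and the “biases” are invented): the very same unlabeled items fall into entirely different groups depending on which attribute the learner is built to attend to, and nothing in the data itself adjudicates.

```python
import numpy as np

# Four unlabeled items with two attributes each.
# columns: [hue, size] (values invented)
items = np.array([
    [0.1, 0.9],   # red,  large
    [0.2, 0.1],   # red,  small
    [0.9, 0.85],  # blue, large
    [0.8, 0.15],  # blue, small
])

def two_means_1d(values, iters=20):
    # A minimal 1-D k-means (k=2): the "learner" groups items by
    # whichever single attribute its bias tells it to attend to.
    centers = np.array([values.min(), values.max()], dtype=float)
    for _ in range(iters):
        labels = (np.abs(values - centers[0]) >
                  np.abs(values - centers[1])).astype(int)
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = values[labels == k].mean()
    return labels

by_hue  = two_means_1d(items[:, 0])  # a learner biased to attend to hue
by_size = two_means_1d(items[:, 1])  # a learner biased to attend to size

print("grouped by hue: ", by_hue)   # pairs items 0,1 vs 2,3
print("grouped by size:", by_size)  # pairs items 0,2 vs 1,3
```

Both partitions are perfectly good “structure in the data”; which one the learner finds is fixed by the bias it brings, which is just the curation problem restated inside the machine.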

TF makes a few other interesting observations, especially as it relates to the political consequences of invidiously comparing human and machine capacities to the detriment of the former. But for present purposes, TF’s utility lies in identifying another way that Eism goes wrong (in addition, for example, to abstracting away from exactly how minds generalize (remember, saying that the mind generalizes via “analogy” is to say nothing at all!)) and makes it harder to think clearly about the relevant issues in cog-neuro.

Sam Epstein develops this same theme in a linguistic context (here (SE)). SE starts by correctly observing that the process of acquiring a particular G relies on two factors: (i) an innate capacity that humans bring to the process and (ii) environmental input (i.e. the PLD). SE further notes that this two-factor model is generally glossed as reflecting the contributions of “nature” (the innate capacity) and “nurture” (the PLD). And herein we find the seeds of a deep Eish misunderstanding of the process, quite analogous to the one TF identified. Let me quote SE (197-198):

[I]t is important to remember—as has been noted before, but
perhaps it remains underappreciated—that it is precisely the organism’s biology
(nature) that determines what experience, in any domain, can consist of …
To clarify, a bee, for example, can perform its waggle dance for me a million times, but that ‘experience’, given my biological endowment, does not allow me to transduce the visual images of such waggling into a mental representation (knowledge) of the distance and direction to a food source. This is precisely what it does mean to a bee witnessing the exact same environmental event/waggle dance. Ultrasonic acoustic disturbances might be experience for my dog, but not for me. Thus, the ‘environment’ in this sense is not in fact the second factor, but rather, nurture is constituted of those aspects of the ill-defined ‘environment’ (which of course irrelevantly includes a K-mart store down the street from my house) that can in principle influence the developmental trajectory of one or more organs of a member of a particular species, given its innate endowment.

In the biolinguistic domain, the logic is no different. The apparent fact that
exposure to some finite threshold amount of ‘Tagalog’ acoustic disturbances in
contexts (originating from outside the organism, in the ‘environment’) can cause
any normal human infant to develop knowledge of ‘Tagalog’ is a property of
human infants…. Thus the standard statement that on the one hand, innate properties of the organism and, on the other, the environment, determine organismic development, is profoundly misleading. It suggests that those environmental factors that can influence the development of particular types of organisms are definable, non-biologically—as the behaviorists sought, but of course failed, to define ‘stimulus’ as an organism-external construct. We can’t know what the relevant developmental stimuli are or aren’t, without knowing the properties of the organism.

This is, of course, correct. What counts as input to the language acquisition device (LAD) must be innately specified. Inputs do not come marked as linguistically vs non-linguistically relevant. Further, what the LAD does in acquiring a G is the poster child example of unsupervised learning. And as we noted above, without a supervisor/curator selecting the relevant inputs for the child and organizing them into the appropriate boxes, it’s the structure of the LAD that must be doing the relevant curating for itself. There really is no other alternative. 

SE points out an important consequence of this observation for nature vs nurture arguments within linguistics, including Poverty of Stimulus debates.  As SE notes (198): 

… organism external ‘stimuli’ cannot possibly suffice to explain any aspects of the developed adult state of any organism. 

Why? For the simple reason that the relevant PLD “experience” that the LAD exploits is itself a construction of the LAD. The relevant stimulus is the proximal one, and in the linguistic domain (indeed in most cognitively non-trivial domains) the proximal stimulus is only distantly related to the distal one that triggers the relevant transduction. Here is SE once more (199):

…experience is constructed by the organism’s innate properties, and is very different from ‘the environment’ or the behaviorist notion of ‘stimulus’.

As SE notes, all of this was well understood over 300 years ago (SE contains a nice little quote from Descartes). Actually, there was a lively discussion at the start of the “first cognitive revolution” (I think this is Chomsky’s term) that went under the name of the “primary/secondary quality distinction,” which tried to categorize those features of proximate stimuli that reflected objective features of their distal causes and those that did not. This appears to be another place where we have lost clear sight of conceptual ground that our precursors cleared.

SE contains a lot more provocative (IMO, correct) discussion of the implications of the observation that experience is a nature-infested notion. Take a look.

Let me mention one last paper that can be read alongside TF and SE. It is on predictive coding, a current fad, apparently, within the cog-neuro world (here). The basic idea is that the brain makes top down predictions, based on its internal mental/brain models, about what it should experience, perception amounting to checking these predictions against the “input” and adjusting the mental models to fit. In other words, perception is cognitively saturated. 

This idea seems to be getting a lot of traction of late (a piece in Quanta is often a good indicator that an idea is “hot”). For our purposes, the piece usefully identifies how the new view differs from the one that was previously dominant (7-8):

The view of neuroscience that dominated the 20th century characterized the brain’s function as that of a feature detector: It registers the presence of a stimulus, processes it, and then sends signals to produce a behavioral response. Activity in specific cells reflects the presence or absence of stimuli in the physical world. Some neurons in the visual cortex, for instance, respond to the edges of objects in view; others fire to indicate the objects’ orientation, coloring or shading…
Rather than waiting for sensory information to drive cognition, the brain is always actively constructing hypotheses about how the world works and using them to explain experiences and fill in missing data. That’s why, according to some experts, we might think of perception as “controlled hallucination.”

Note the contrast: perception consists in detecting objective features of the stimulus vs constructing hypotheses about how the world works, verified against bottom up “experience.” In other words, a passive feature detector vs an active, hypothesis-constructing and -testing mind. Or, to be tendentious one more time, an Eish vs an Rish conception of the mental. 

One point worth noting. When I was a youngster oh so many decades ago, there was a big fight about whether brain mechanisms are largely bottom up or top down computational systems. The answer, of course, is that the brain uses both kinds of mechanisms. However, the prevalent sentiment in the neuro world was that brains were largely bottom up systems, with higher levels generalizing over features provided by lower ones. Chomsky’s critique of discovery procedures (see here for discussion) hit at exactly this point, noting that in the linguistic case it was not possible to treat higher levels as simple summaries of the statistical properties of lower ones. Indeed, the flow of information likely went from higher to lower as well. This has a natural interpretation in terms of brain mechanisms involving feedback as well as feedforward loops. Interestingly, this is what has also driven the trend towards predictive coding in the neuro world. It was discovered that the brain has many “top down feedback connections” (7)[7] and this sits oddly with the idea that brains basically sit passively waiting to absorb perceptual inputs. At any rate, there is an affinity between thinking that brains indulge in lots of top down processing and taking brains to be active interpreters of the passing perceptual scene.
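For readers who like their fads operationalized, here is a deliberately minimal sketch of the predictive-coding idea (my own toy, with invented parameters, not any particular model from the literature): an internal estimate generates a top-down prediction for each sensory sample, and only the bottom-up prediction error revises the internal model.

```python
import random

random.seed(0)

true_value = 5.0      # the hidden state of the world
estimate = 0.0        # the brain's current top-down model of it
learning_rate = 0.1   # how strongly prediction error revises the model

for step in range(200):
    sample = true_value + random.gauss(0, 1.0)  # noisy bottom-up input
    prediction = estimate                        # top-down prediction
    error = sample - prediction                  # prediction error signal
    estimate += learning_rate * error            # revise the model

print(f"final estimate: {estimate:.2f} (true value {true_value})")
```

The point of the toy is the division of labor: the “signal” that travels up is not the stimulus itself but the mismatch between the stimulus and what the model already expected, which is the controlled-hallucination picture in miniature.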
That’s it. To repeat the main message, the E vs R conceptions of the mind/brain and how it functions are very different, and importantly so. As the above papers note, it is all too easy to get confused about important matters if the differences between these two views of the mental world are not kept in mind. Or, again to be tendentious: Eism is bad for you! Only a healthy dose of Rism can protect you from walking its fruitless paths. So arm yourself and have a blessed Rish day.

[1]They also have two divergent pictures of how data and theory relate in inquiry, but that is not the topic of today’s sermonette.
[2]I have argued elsewhere (here) that this passivity is what allows Es to have a causal semantic theory. 
[3]Nor, from what I can gather from Kahneman’s Nobel lecture, is he committed to the view that salience is a property of objects. Rather, it is a property of situations a sentient agent finds herself in. The important point for Kahneman is that these assessments are more or less automatic, fast, and unconscious. This is consistent with salience being cognitively guided rather than a transparent reflection of the properties of the object. So, though TF’s point is useful, I suspect that he did not get Kahneman quite right. Happily, none of that matters here.
[4]A perhaps pointless quibble: the fact that people cannot report seeing a gorilla does not mean that they did not perceive one. The perceptual (and even cognitive) apparatus might indeed have registered a gorilla without it being the case that viewers can access this information consciously. Think of being asked about the syntax of a sentence after hearing it and decoding its message. This is very hard to retrieve (it is below consciousness most of the time) but that does not mean that the syntax is not being computed. At any rate, none of this bears on the central issues, but it was a quibble that I wanted to register.
[5]NH: again, I would replace ‘psychophysics’ with ‘Eism.’
[6]As TF notes, this is actually a very old problem within AI. It is the “frame problem.” It was understood to be very knotty and nobody had any idea how to solve it in the general case. But, as TF noted, it has been forgotten “amid the present euphoria with large-scale information- and data-processing” (6).
            Moreover, it is a very hard problem. It is relatively easy to identify salient features given a context. Getting a theory of salience, in contrast (i.e. a specification of the determinants of salience across contexts), is very hard. As Kahneman notes in his Nobel Lecture (456), it is unlikely that we will have one of these anytime soon. Interestingly, early on Descartes identified the capacity of humans to appropriately respond to what’s around them as an example of stimulus free (i.e. free and creative) behavior. We do not know more about this now than Descartes did in the 17th century, a correct point that Chomsky likes to make.
[7]If recollection serves (but remember, I am old and on the verge of dementia), the connections from higher to lower brain levels are upwards of five times those from lower to upper. It seems that the brain is really eager to involve higher level “expectations” in the process of analyzing incoming sensations/perceptions.

Monday, August 27, 2018

Revolutions in science; a comment on Gelman

In what follows I am going to wander way beyond my level of expertise (perhaps even rudimentary competence). I am going to discuss statistics and its place in the contemporary “replication crisis” debates. So, reader be warned that you should take what I write with a very large grain of salt. 

Andrew Gelman has a long post (here, AG) where he ruminates about a comparatively small revolution in statistics that he has been a central part of (I know, it is a bit unseemly to toot your own horn, but hey, false modesty is nothing to be proud of either). It is small (or “far more trivial”) when compared to more substantial revolutions in Biology (Darwin) or Physics (Relativity and Quantum mechanics), but AG argues that the “Replication revolution” is an important step in enhancing our “understanding of how we learn about the world.” He may be right. But…

But, I am not sure that it has the narrative quite right. As AG portrays matters, the revolution need not have happened. The same ground could have been covered with “incremental corrections and adjustments.” Why wasn’t it? The reactionaries forced a revolutionary change through their responses to reasonable criticisms by the likes of Meehl, Mayo, Ioannidis, Gelman, Simonsohn, Dreber, and “various other well-known skeptics.” Their reaction to these reasonable critiques was to charge the critics with bullying or to insist that the indicated problems are all part of normal science and will eventually be removed by better training, higher standards, etc. This, AG argues, was the wrong reaction, and it required a revolution, albeit a relatively minor one, to overturn.

Now, I am very sympathetic to a large part of this position. I have long appreciated the work of the critics and have covered their work in FoL. I think that the critics have done a public service in pointing out that stats has served to confuse as often as (maybe more often than) it has served to illuminate. And some have made the more important point (AG prominently among them) that this is not some mistake, but serves a need in the disciplines where it is most prominent (see here). What’s the need? Here is AG:[1]

Not understanding statistics is part of it, but another part is that people—applied researchers and also many professional statisticians—want statistics to do things it just can’t do. “Statistical significance” satisfies a real demand for certainty in the face of noise. It’s hard to teach people to accept uncertainty. I agree that we should try, but it’s tough, as so many of the incentives of publication and publicity go in the other direction.

And observe that the need is Janus-faced. It faces inwards to relieve the anxiety of uncertainty, and it faces outwards to relieve professional publish-or-perish anxiety. Much to AG's credit, he notices that these are different things, though they are mutually supporting. I suspect that the incentive structure is important, but secondary to the desire to "get results" and "find the truth" that animates most academics. Yes, lucre, fame, fortune, and status are nice (well, very nice), but I agree that the main motivation for academics is the less tangible one, wanting to get results just for the sake of getting them. Being productive is a huge goal for any academic, and a big part of the lure of stats, IMO, is that it promises to get one there if one just works hard and keeps plugging away.

So, what AG says about the curative nature of the mini-revolution rings true, but only in part. I think that the post fails to identify the three main causal spurs to stats overreach, which do their damage when combined with the desire to be a good, productive scientist.

The first it mentions, but makes less of than perhaps others have. It is that stats are hard, and interpreting and applying them correctly takes a lot of subtlety. So much, indeed, that even experts often fail (see here). There is clearly something wrong with a tool that seems to invite large-scale misuse. AG in fact notes this (here), but it does not play much of a role in the post cited above, though IMO it should have. What is it about stats techniques that makes them so hard to get right? That, I think, is the real question. After all, as AG notes, it is not as if all domains find it hard to get things right. As he notes, psychometricians seem to get their stats right most of the time (as do those looking for the Higgs boson). So what is it about the domains where stats regularly fails that makes getting things right there so hard? And this leads me to my second point.

Stats techniques play an outsized role in just those domains where theory is weakest. This is an old hobby horse of mine (see here for one example). Stats, especially fancy stats, induce the illusion that deep, significant scientific insights are there for the taking if one just gets enough data points and learns to massage them correctly (and responsibly, no forking paths for me thank you very much). This conception sits uncomfortably with the idea that there is no quick fix for ignorance. No amount of hard work, good ethics, or careful application suffices when we really have no idea what is going on. Why do I mention this? Because many of the domains where the replication crisis has been ripest are domains that are very, very hard and where we really don't have much of an understanding of what is happening. Or, to put this more gracefully, either the hypotheses of interest are too shallow and vague to be taken seriously (lots of social psych) or the effects of interest are the results of myriad interactions that are too hard to disentangle. In either case, stats will often provide an illusion of rigor while leading one down a forking garden path. Note, if this is right, then we have no problem seeing why psychometricians were in no need of the replication revolution. We really do have some good theory in domains like sensory perception, and there stats have proven to be reliable and effective tools. The problem is not with stats, but with stats applied where they cannot be guided (and misapplications tamed) by significant theory.
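The forking-paths problem is easy to demonstrate for yourself. Here is a minimal sketch (Python, stdlib only; the sample sizes, the `p_value` helper, and the post-hoc subgroup scheme are all invented purely for illustration): every "experiment" below is pure noise, so the null hypothesis is true by construction, yet letting the analyst keep the best of a handful of post-hoc analyses substantially inflates the nominal 5% false-positive rate.

```python
import math
import random

def p_value(sample):
    """Two-sided p-value for a z-test of zero mean (known sigma = 1)."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))

random.seed(0)
trials, n = 2000, 30
naive = forked = 0
for _ in range(trials):
    # Pure noise: there is no effect to find.
    data = [random.gauss(0, 1) for _ in range(n)]
    if p_value(data) < 0.05:
        naive += 1
    # "Forking paths": also peek at a few post-hoc subgroups
    # and keep whichever analysis yields the smallest p.
    analyses = [data, data[:15], data[15:], data[::2], data[1::2]]
    if min(p_value(s) for s in analyses) < 0.05:
        forked += 1

print(f"one pre-registered test: {naive / trials:.1%} false positives")
print(f"best of five analyses:   {forked / trials:.1%} false positives")
```

The point of the sketch is not the particular numbers but the mechanism: each individual test is perfectly honest; it is the unreported freedom to choose among them after seeing the data that manufactures "significance" out of noise.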

Let me add two more codicils to this point.

First, here I part ways with AG. The post suggests that one source of the replication problem is people having too great "an attachment to particular scientific theories or hypotheses." But if I am right, this is not the problem, at least not the problem behind the replication crisis. Being theoretically stubborn may make you wrong, but it is not clear why it makes your work shoddy. You get results you do not like and ignore them. That may or may not be bad. But with a modicum of honesty, the most stiff-necked theoretician can appreciate that her/his favorite account, the one true theory, appears inconsistent with some data. I know whereof I speak, btw. The problem here, if there is one, is not generating misleading tests and non-replicable results, but ignoring the (apparent) counter-data. And this, though possibly a problem for an individual, may not be a problem for a field of inquiry as a whole.

Second, there is a further temptation that needs to be seriously resisted today, one that leads straight to replication problems: given the ubiquity and availability of cheap "data" nowadays, the temptation to think that this time it's different is very alluring. Big Data types often seem to think that if you get a large enough set of numbers and apply the right stats techniques (rinse and repeat), out will plop The Truth. But this is wrong. Lars Syll puts it well here in a post correctly entitled "Why data is NOT enough to answer scientific questions":

The central problem with present 'machine learning' and 'big data' hype is that so many – falsely – think that they can get away with analyzing real-world phenomena without any (commitment to) theory. But – data never speaks for itself. Without a prior statistical set-up, there actually are no data at all to process. And – using a machine learning algorithm will only produce what you are looking for.

Clever data mining tricks are never enough to answer important scientific questions. Theory matters.

So, when one combines the fact that in many domains we have, at best, very weak theory with the fact that nowadays we are flooded with cheap, available data, the temptation to go hyper-statistical can be overwhelming.

Let me put this another way. As AG notes, successful inquiry needs strong theory and careful measurement. Note the 'and.' Many read the 'and' as an 'or' and allow that strong theory can substitute for a paucity of data, or that tons of statistically curated data can substitute for a virtual absence of significant theory. But this is a mistake, albeit a very tempting one if the alternative is having nothing much of interest or relevance to say at all. And this is what AG underplays: a central problem with stats is that it often tries to sell itself as allowing one to bypass the theory half of the conjunction. Further, because it "looks" technical and impressive (i.e. has a mathematical sheen), it leads to cargo cult science, scientific practice that looks "scientific" rather than being scientific.

Note, this is not bad faith or corrupt practice (though there can be that as well). It stems from the desire to be what AG dubs a scientific "hero," a disinterested searcher for the truth. The problem is not with the ambition, but with the added supposition that any problem will yield to scientific inquiry if pursued conscientiously. Nope. Sorry. There are times when there is no obvious way to proceed because we have no idea how to proceed. And in these domains, no matter how careful we are, we are likely to find ourselves getting nowhere.

I think that there is a third source of the problem, one that resides in the complexity of the problems being studied. In particular, many of the phenomena we are interested in arise from the interaction of many causal sub-systems. When this happens there is bound to be a lot of sensitivity to the particular conditions of the experimental set up, and so lots of opportunities for forking-paths (i.e. p-hacking) style (unintentional) stats abuse.

Now, every domain of inquiry has this problem and needs to manage it. In the physical sciences this is done by (as Diogo once put it to me) “controlling the shit out of the experimental set up.” Physicists control for interaction effects by removing many (most) of the interfering factors. A good experiment requires creating a non-natural artificial environment in which problematic factors are managed via elimination. Diogo convinced me that one of the nice features of linguistic inquiry is that it is possible to “control the shit” out of the stimuli thereby vastly reducing noise generated by an experimental subject. At any rate, one way of getting around interaction effects problem is to manage the noise by simplifying the experimental set up and isolating the relevant causal sub-systems.

But often this cannot be done, among other reasons because we have no idea what the interacting subsystems are or how they function (think, for example, pragmatics). Then we cannot simplify the set up, and we will find that our experiments are often task dependent and very noisy. Stats offers a possible way out. In place of controlling the design of the set up, the aim is to statistically manage (partial out) the noise. What seems to have been discovered (IMO, not surprisingly) is that this is very hard to do in the absence of relevant theory. You cannot control for the noise if you have no idea where it comes from or what is causing it. There is no such thing as a theory-free lunch (or at least not a nutritious one). The revolution AG discusses, I believe, has rediscovered this bit of wisdom.
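A toy example of why partialling out noise without theory fails. In the sketch below (Python, stdlib only; the variable names, the `corr`/`partial_corr` helpers, and the numbers are all invented for illustration), x has no effect on y at all: both merely track a common cause z. Adjusting for z kills the spurious correlation; adjusting for a theory-free stand-in covariate does nothing, because you cannot control for noise whose source you have not identified.

```python
import math
import random

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def partial_corr(a, b, c):
    """Correlation of a and b after partialling out c."""
    rab, rac, rbc = corr(a, b), corr(a, c), corr(b, c)
    return (rab - rac * rbc) / math.sqrt((1 - rac ** 2) * (1 - rbc ** 2))

random.seed(1)
n = 5000
z = [random.gauss(0, 1) for _ in range(n)]            # the true common cause
proxy = [random.gauss(0, 1) for _ in range(n)]        # an irrelevant covariate
x = [zi + random.gauss(0, 1) for zi in z]             # x has no effect on y...
y = [zi + random.gauss(0, 1) for zi in z]             # ...both just track z

print(f"raw corr(x, y):               {corr(x, y):+.2f}")           # spurious, roughly +0.5
print(f"controlling the true cause z: {partial_corr(x, y, z):+.2f}")      # roughly 0
print(f"controlling the wrong thing:  {partial_corr(x, y, proxy):+.2f}")  # still roughly +0.5
```

The moral: statistical adjustment is only as good as the causal story telling you what to adjust for. With the right covariate the spurious effect vanishes; with an atheoretic one, the machinery runs happily and the illusion survives intact.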

Let me end with an observation special to linguistics. There are parts of linguistics (syntax, large parts of phonology and morphology) where we are lucky in that the signal from the underlying mechanisms is remarkably strong: it withstands all manner of secondary effects. Such data are, relatively speaking, very robust. So, for example, ECP, island, or binding violations show few context effects. This is not to say that there are no effects of context at all wrt acceptability (Sprouse and Co. have shown that these do exist). But the main effect is usually easy to discern. We are lucky. Other domains of linguistic inquiry are far noisier (I mentioned pragmatics, but even large parts of semantics strike me as similar (maybe because it is hard to know where semantics ends and pragmatics begins)). I suspect that a good part of the success of linguistics can be traced to the fact that FL is largely insulated from the effects of the other cognitive subsystems it interacts with. As Jerry Fodor once observed (in his discussion of modularity), to the degree that a psych system is modular, to that degree it is comprehensible. Some linguists have lucked out. But as we study more and more of the interaction effects wrt language, we will run into the same problems. If we are lucky, linguistic theory will help us avoid many of the pitfalls AG has noted and categorized. But there are no guarantees, sadly.

[1]I apologize for not being able to link to the original. It seems that in the post where I discussed it, I failed to link to the original and now cannot find it. It should have appeared in roughly June 2017, but I have not managed to track it down. Sorry.

Wednesday, August 22, 2018

Some thoughts on the relation between syntax and semantics, through the lens of compositionality…

Tuesday, August 21, 2018

Language and cognition and evolang

Just back from vacation and here is a starter post on, of all things, Evolang (once again).  Frans de Waal has written a short and useful piece relevant to the continuity thesis (see here). It is useful for it makes two obvious points, and it is important because de Waal is the one making them. The two points are the following:

1.     Humans are the only linguistic species.
2.     Language is not the medium of thought for there are non-verbal organisms that think.

Let me say a few words about each.

de Waal is quite categorical about each point. He is worth quoting here so that the next time you hear someone droning on about how we are just like other animals, just a little more so, you can whip out this quote and flog the interlocutor with it mercilessly.

You won’t often hear me say something like this, but I consider humans the only linguistic species. We honestly have no evidence for symbolic communication, equally rich and multifunctional as ours, outside our species. (3)

That’s right: nothing does language like humans do language, not even sorta kinda. This is just a fact and those that know something about apes (and de Waal knows everything about apes) are the first to understand this. And if one is interested in Evolang then this fact must form a boundary condition on whatever speculations are on offer. Or, to put this more crudely: assuming the continuity thesis disqualifies one from participating intelligently in the Evolang discussion. Period. End of story. And, sadly, given the current state of play, this is a point worth emphasizing again and again and again and… So thanks to de Waal for making it so plainly.

This said, de Waal goes on to make a second important point: even if no other animals have our linguistic capacities, even sorta kinda, it does not mean that some of the capacities underlying language might not be shared with other animals. In other words, the distinction between a faculty of language in the narrow sense versus a faculty of language in the broad sense is a very useful one (I cannot recall right now who first proposed such a distinction, but whoever it was, thx!). This, of course, cheers a modern minimalist's heart cockles, and should be another important boundary condition on any Evolang account.

That said, de Waal’s two specific linguistic examples are of limited use, at least wrt Evolang. The first is bees and monkeys who de Waal claims use “sequences that resemble a rudimentary syntax” (3). The second “most intriguing parallel” is the “referential signaling” of vervet monkey alarm calls. I doubt that these analogous capacities will shed much light on our peculiar linguistic capacities precisely because the properties of natural language words and natural language syntax are where humans are so distinctive. Human syntax is completely unlike bee or monkey syntax and it seems pretty clear that referential signaling, though it is one use to which we put language, is not a particularly deep property of our basic words/atoms (and yes I know that words are not atoms but, well, you know…). In fact, as Chomsky has persuasively argued IMO, Referentialism (the doctrine (see here)) does a piss poor job of describing how words actually function semantically within natural language. If this is right, then the fact that we and monkeys can both engage in referential signaling will not be of much use in understanding how words came to have the basic odd properties they seem to have. 

This, of course, does not detract from de Waal's two correct observations above. We certainly do share capacities with other animals that contribute to how FL functions, and we certainly are unique in our linguistic capacities. The two cases of similarity that de Waal cites, given that they are nothing like what we do, endorse the second point in spades (which, given the ethos of the times, is always worth doing).

Onto point deux. Cognition is possible without a natural language. FoLers are already familiar with Gallistel's countless discussions of dead reckoning, foraging, and caching behavior in various animals. This is really amazing stuff and demands cognitive powers that dwarf ours (or at least mine: e.g. I can hardly remember where I put my keys, let alone where I might have hidden 500 different delicacies time stamped, location stamped, nutrition stamped and surveillance stamped). And they seem to do this without a natural language. Indeed, the de Waal piece has the nice feature of demonstrating that smart people with strong views can agree even if they have entirely different interests. de Waal cites none other than Jerry Fodor to second his correct observation that cognition is possible without natural language. Here's Jerry from The Language of Thought:

‘The obvious (and I should have thought sufficient) refutation of the claim that natural languages are the medium of thought is that there are non-verbal organisms that think.’ (3)

Jerry never avoided kicking a stone when doing so was all a philosophical argument needed. At any rate, here Fodor and de Waal agree. 

But I suspect that there would be more fundamental disagreements down the road. Fodor, contra de Waal, was not that enthusiastic about the idea that we can think in pictures, or at least not fundamentally. The reason is that pictures have little propositional structure, and thinking, especially any degree of fancy thinking, requires propositional structure to get going. The old Kosslyn-Pylyshyn debate over imagery went over all of this, but the main line can be summed up by one of Lila Gleitman's bons mots: a picture is worth a thousand words, and that is the problem. Pictures may be useful aids to thinking, but only if supplied with captions to guide the thinking. In and of themselves, pictures depict too much and hence are not good vehicles for logical linkage. And if this is so (and it is), then where there is cognition there may not be natural language, but there must be a language of thought (LOT) (i.e. something with propositional structure that licenses the inferences characteristic of cognitive expansiveness in a given domain).

Again this is something that warms a minimalist heart (cockles and all). Recall the problem: find the minimal syntax to link the CI system and the AP system. CI is where language of thought lives. So, like de Waal, minimalists assume that there is quite a lot of cognition independent of natural language, which is why a syntax that links to it is doing something interesting.

Truth be told, we know relatively little about LOT and its properties, a lot less than we know about the properties of natural language syntax, IMO. But regardless, de Waal and Fodor are right to insist that we not mix up the two. So don't.

Ok, that’s enough for an inaugural post vaca post. I hope your last two weeks were as enjoyable as mine and that you are ready for the exciting pedagogical times ahead.

Friday, August 10, 2018

A sort-of BS follow-up: Alternatives to LMS

Norbert's recent post on the BSification of academic life skipped one particular wart that's at the front of my mind now that I'm in the middle of preparing classes for the coming semester: Learning Management Systems (LMS). In my case, it's Blackboard, but Moodle and Canvas are also common. It is a truth universally acknowledged that every LMS sucks. It's not even mail client territory, where the accepted truth is "all mail clients suck, but the one I use sucks less". LMS are slow, clunky, inflexible, do not support common standards, and quite generally feel like they're designed by people who couldn't cut it at a real software company. They try to do a million things, and don't do any of them well. Alright, this is where the rant will stop. Cathartic as it may be, these things have been discussed a million times. Originally I had an entire post here that explores how the abysmal state of LMS is caused by different kinds of academic BS, but I'd like to focus on something more positive instead: alternatives to LMS that work well for me and might be useful for you, too. If somebody really wants to hear me complain about LMS and the academic BS surrounding them, let me know in the comments and I'll post that part as a follow-up.

Monday, July 30, 2018

The BSification of everyday academic life

One of the most useful philosophy tracts written in the last 25 years is Harry Frankfurt's On Bullshit (OB, here). OB makes an important distinction between lying and bullshitting, the latter being the more insidious as, in contrast to the former, which shows a regard for the truth (by deliberately contradicting it), the latter could care less. BS's insidiousness arises from two features: (i) its actual disregard for the truth and (ii) its great regard for appearing to be true. Thus, BS prizes truthiness (h/t to Colbert), but could care less about truth.

This is a powerful insight, and it has been weaponized. Tobacco companies and the large fossil fuel energy companies have understood that the best way to stop rational action is to obfuscate the intellectual terrain (here). The aim is not to persuade so much as to make it impossible to conclude. Ignorance really is bliss for some and BS is a very good way of spreading it. 

The institutional BS business is now widespread enough that there are academics that study it. The problem of how doubt is spread is now an academic discipline with a Greek rooted name, ‘agnotology’ (here), to demarcate it from other kinds of rhetorical studies (BS, MS, PhD indeed).[1]

In this post, I point you to a second useful theoretical treatise on the topic, one that expands the BS descriptor from ideas to occupations. David Graeber (DG) has a new book out on the topic and an interview where he discusses its main thesis (here). It notes that BS decisively shapes the ecology of the workplace, so that some jobs are best understood as BS positions. What makes one such? A job is BS (BSJ) when it is "so pointless that even the person doing the job secretly believes that it shouldn't exist" (1). Anyone in an academic environment can probably point to several such (the hierarchy of assistant deans/provosts (and their assistants) is a good place to look for BSJs). DG has a nice taxonomy that I commend to your attention. There are at least six categories: flunkies, goons, duct-tapers, box-tickers, taskmasters, and bean-counters. I am sure you can figure out their respective skill sets from the evocative titles, but what makes DG's discussion illuminating is his anthro-socio take on these positions and what forces lead to their proliferation even in enterprises whose aim is to make money. Universities, where lucre is not the obvious organizing principle, act like hothouses, and the most exotic versions of these six BSJs are spotted regularly, especially when manured with just a dollop of the latest philosophy from our leading schools of business and management.[2]

So, BS abounds. But, sadly, it is not just confined to specific jobs. It is everywhere. We need another category: BS activities (BSA). BSAs are now part of even the necessary parts of life. Here are some observations concerning how BSAs are now standard features of even the good parts of academic life.

I have ranted before about how the "wider consequences" sections of NSF and NIH funding grants have grown in importance. This is the last section of the grant, where you have to say how developing a superior theory of case and agreement will lead to a cure for world hunger, cancer and aphasia. What makes this BS is not merely that it is clearly untrue and unfounded, but that everyone knows that it is, knows that everyone knows that it is, knows that everyone knows that everyone knows that it is… In short, it is BS that everyone recognizes as BS, and nonetheless the process demands that everyone take it seriously enough to act as if it is not. In fact, this is critical: what makes BS insidious is not merely that it could care less about the truth of the matter, but that when institutionalized it requires that those who deal with it take it seriously. BS recognized as such can be funny, and even subversive (Colbert has made a career on this). But BS requirements in an NSF/NIH grant cannot be laughed away. They must be taken seriously, and all involved are forced to pretend that what is obvious BS is not.

And this is what makes it so insidious in academic life. Optimists hope that it can be circumscribed to its own little section of the grant (near the end), limiting its effects. But this is a BS hope. Like the camel's nose under the tent, once in, it spreads everywhere, and quickly. How quickly? Here's a conjecture: the prevalence of institutionalized BS is a contributing factor to the replication crisis. As noted, BS prizes truthiness (the appearance of truth), and stats magic provides packaged (as in stats packages) ways to manufacture truthiness (recall Twain's "lies, damn lies and statistics"). So, with the rise of BS and the strong incentive to avoid being recognized as BS, we get it everywhere, camouflaged in stats. So, first the end sections of the grant, then everywhere. There really is a cost to playing along.

Here is a second recent personal example. I was asked to write a "minimalist" chapter for a volume comparing theoretical approaches in linguistics. I agreed. And this was a mistake. First, others would have done a more mainstream job of it. I am quite certain my take on things is quite idiosyncratic. Moreover, I did not really think that what I had to say fit in with the spirit of the other contributions (and the reviews made just this point). However, I agreed. And I am lucky I did, for it allowed me to experience another small place where BS thrives in academe. Let me relate.

As many of you know, when you contribute something to a non-open-source publisher, you sign away the rights to the work as part of the process of publication. In this case, I got the standard 5-page contract, but this time I read it. It was completely incomprehensible, though from what I could make out, it basically delivered all the rights to the paper (the ideas in it, the phrasing of these ideas, everything) to the publisher. It also forbade me from using the paper or a version thereof in the future. As the publisher was European, there were references to various EU laws that underlay the codicils in the contract. I was assured that the contract was pretty standard and that I should sign on the dotted line.

But I did not like the idea that the paper's contents no longer belonged to me. And because I am old and am no longer all that focused on padding my CV and can afford to lose the $0.00 royalty check that this chapter was going to generate and am not worried about further academic advancement and… I decided to try to understand what it was that the contract was actually saying and what rights I was actually signing away in return for what services. That was where the complete BSishness of the whole thing became evident.

First, the remuneration: the obligation of the publisher was to publish the paper and give me a copy of the tome in which it would appear. In this day and age, receiving such a doorstopper is more like receiving the proverbial white elephant than real remuneration. But that was it. In return, I had to do a whole bunch of things to get the MS in shape, within a certain time frame, etc. All in all, the demands were not unreasonable.

What did I give up? Well, all rights to the paper, or this is what I thought the contract stipulated. I replied that I did not like this idea as I intended to use the material again in a larger project. I also asked about the standard EU laws the contract bandied about and that the publisher insisted needed to be adhered to. And here is when things got fun.

It turns out that nobody I talked to knew what these laws were. Nobody could tell me. Moreover, everyone assured me that regardless of what the contract said, it really didn’t matter because it would not be enforced. Should I decide to use the material again in another project (after a year’s time) the publisher would do nothing about it. Or, more accurately, the publisher rep told me that they knew of no case where anyone who used their own work in future work was called to the mat for doing so. In other words, the contract was a purely formal object whose content was BS and yet we were all obliged to take that content seriously as if it were not BS so that we could get on with ignoring it and get the book published.

Things proceeded from there to the point where we agreed to a two line contract that basically said that they could use the paper, I would not use it for a year and that I could use it after this time as I saw fit. Nobody ever explained the EU laws to me (or to themselves), nobody batted an eye when these EU laws were dropped from the final contract, nobody was concerned to do anything but make this thing go away and have a signed document of little relevance to what was actually happening (or so I was repeatedly assured). It just needed to get signed. And so I did. At least the two codicil version. Pure BS.

The contract that is opaque to all who sign it and all who ask that it be signed is another example of the ritualization of BS in academic life. And like the NSF/NIH version, it coarsens intellectual life. Let me say this more strongly: it is especially corrosive of academic life. Academics are people whose professional obligations involve taking ideas seriously. In fact, this is the main thing we are trained to do, at least within some small domain. This is the core value: be serious about thoughts! BS is the vice that most challenges this virtue. It insists that you take none of it seriously, because seriousness about ideas is what BS is meant to undermine. BS, BS jobs, BS activities, BS forms, BS sections of forms…, all serve to undermine this seriousness. It grubbies the mind, and it keeps on coming. The optimistic view is that it does not really matter. The pessimistic view is that it is too late to change it. The moderately hopeful view is that eternal vigilance is the only possible defense. I swing between the second and the third.

[1]Bull sh*t, more sh*t, piled high and deep!
[2] See here for an amusing take. But beware: the piece will play to many of your prejudices and so should be read critically. For someone like me, it is all too easy to believe most every judgment passed. Still, a sample quote might whet your appetite (1):

As a principal and founding partner of a consulting firm that eventually grew to 600 employees, I interviewed, hired, and worked alongside hundreds of business-school graduates, and the impression I formed of the M.B.A. experience was that it involved taking two years out of your life and going deeply into debt, all for the sake of learning how to keep a straight face while using phrases like “out-of-the-box thinking,” “win-win situation,” and “core competencies.”

I am sure that it has not escaped your notice that this is high class BS and so now we are in the delightful situation where part of the university manufactures what the other part studies. We may have discovered an intellectual perpetual motion machine.