
Thursday, May 25, 2017

Naturalized philosophy

I went to graduate school in philosophy a long time ago. At that time, there was a premium put on “naturalized” research, the idea being that good philosophy needed grounding in a “real” (non-philosophical) discipline. It was a time when Newton and Einstein and Boyle and Gödel and Poincaré joined the usual dead white European males that we all know and love in the pantheon of philosophical greats. In this setting, it is no surprise that Chomsky and his work made frequent appearances in the pages of the most prestigious philo journals and that he was a must-read for a philosopher of language. It actually took some effort for the discipline to relegate Chomsky to the domain of “philosophical naïf” (I think this was Putnam’s phrase), and it coincided with endless debates about the implications of the referentialist worldview for narrow content and semantic meaning. IMO, this work did not deliver much in the way of insight, though it did manage to make quite a few careers. At any rate, Chomsky’s exit from the main stage coincided with a waning of the naturalizing project and a return to the metaphysical (and metalinguistic) abstruseness that philosophy is, it appears, endemically attracted to. If nothing else, de-naturalizing philosophy establishes academic protective boundaries, providing philosophy with a proprietary subject matter that can protect deep thinkers from the naturalizers and their empirical pretensions.[1] Why do I mention this? Because I am a big fan of the kind of naturalized philosophy that the above-mentioned luminaries practiced and so I am usually on the lookout for great examples thereof.

What are the distinctive marks of this kind of work? It generally rests on a few pretty “obvious” empirical premises and demonstrates their fertile implications. Chomsky’s work offers an excellent illustration.

What is Chomsky’s most significant contribution to philosophy (and indeed linguistics)? He identified three problems in need of solution: what does a native speaker know when s/he knows her/his native language? What meta-capacity underlies a native speaker’s capacity to acquire her/his native language? And how did this meta-capacity arise in the species? These are the big three questions he put on the table. And they naturally lead to subsequent questions: How do native speakers use their knowledge to produce and understand language? How do LADs use their meta-capacity to acquire their native capacity? How is one’s knowledge of language embodied in wetware? The last three rely on glimmers of answers to the first three. Chomsky has taught us how to understand the first three.

Here’s the argument. It is based on a few really really really obvious facts. First, that nothing does language like humans do. Birds fly, fish swim, humans do language. It is a species-specific capacity unlike anything we find anywhere else. Second, a native speaker displays linguistic creativity. This means that a native speaker can use and understand an unbounded number of linguistic objects never before encountered and does this relatively effortlessly. Third, any kid can reflexively acquire any language when placed in the right linguistic environment (linguistic promiscuity), an environment which, when one looks even moderately closely, vastly underdetermines the knowledge attained (poverty of the linguistic stimulus). These three facts make it morally certain that linguistic competence involves, in part, the internalization of a G, that the human meta-capacity of interest involves a higher order capacity to acquire certain kinds of Gs and not others, and that this meta-capacity rests on some distinctive species-specific capacities of humans. These three conclusions rest solidly on these obvious facts and together they bring forth a research program: what properties do human Gs have and what is the fine structure of the meta-capacity? That Gs exist and that FL/UG exists is trivially true. What their properties are is anything but.[2]

As FoLers know, the third of these questions, how FL/UG could possibly have arisen, is a relatively recent addition to Chomsky’s agenda. He argues that the relative rapidity of the emergence of FL and its subsequent stability supports an intriguing conclusion: that the change that took place was necessarily pretty small and that whatever is proprietary to language must be quite minor. I tend to think that Chomsky is right about this, and that it motivates a research program that (i) aims to limit what is linguistically special while (ii) demonstrating how this special secret sauce allows for an FL like ours in the context of other, more cognitively and computationally general mental capacities that it is reasonable to believe our pre-linguistic ancestors enjoyed. IMO, this line of thinking is less solidly based on “obvious” facts, but the line of inquiry is sufficiently provocative to be very inviting. Again, the details are up for grabs, as they should be.

So what are the marks of naturalized philosophy? Identifying questions motivated by (relatively) straightforward facts that support a framework for asking more detailed questions using conventional modes of empirical inquiry. Chomsky is a master of this kind of thinking. But he is not alone. All of the above is actually in service of advertising another such effort by Randy Gallistel. The paper of interest, which is a marvelous piece of naturalized philosophy, appeared here in TiCS. I want to say a word or two about it.

Gallistel’s paper is on the coding question. The claim is that this question has been effectively ignored in the cog-neuro world with baleful effects. The aim is to put it front and center on the research agenda and figure out what kind of neural system is compatible with a reasonable answer to that question. The argument in the paper is roughly as follows.

First, there is overwhelming behavioral evidence that animals (including humans) keep track of numerical quantities (see box 1, (3)). Hence the brain must have a way to code for number. It must be able to store these numbers in some way and must be able to transmit this stored information in signals in some way. So it must be able to write this information to memory and read this information from memory.

Second, if the brain does code for number it must do so in some code. There are various kinds, but the two the paper discusses are hash/rate/tally codes vs combinatorial codes (4-5). The former are “unary” codes. What this means is that “to convey a particular number one must use as many code elements as the numerosity to which the number refers.” Thus, if the number is 20 then there are 20 hash marks/strokes/dots (or whatever) representing the number.

The paper distinguishes such codes from “combinatorial” codes. These are the ones we are familiar with. So for example, ‘20’ conveys the number 20 by arranging digits drawn from a stock of 10 in order-sensitive configurations (i.e. ‘21’ differs from ‘12’). Note, combinatorial code patterns are not isomorphic to the things they represent.[3]
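To make the contrast concrete, here is a minimal sketch (mine, not the paper’s; the function names and the base-10 choice are just illustrative). The point it sets up for the efficiency discussion below is simply that a tally code for n needs n elements, while a combinatorial (positional) code needs only on the order of log(n) of them.

```python
def tally_code(n: int) -> str:
    """Unary/tally code: one mark per unit, so 20 is written with 20 marks."""
    return "|" * n

def combinatorial_code(n: int, base: int = 10) -> str:
    """Positional code: order-sensitive digits, so '21' differs from '12'."""
    digits = []
    while n > 0:
        digits.append(str(n % base))
        n //= base
    return "".join(reversed(digits)) or "0"

for n in (20, 1_000_000):
    # The tally code needs n elements; the positional code needs ~log10(n) digits.
    print(n, "tally length:", len(tally_code(n)), "positional length:", len(combinatorial_code(n)))

# Prints:
# 20 tally length: 20 positional length: 2
# 1000000 tally length: 1000000 positional length: 7
```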

The paper explores the virtues of combinatorial codes as against hash/rate/tally codes. The former are “vastly more efficient,” by orders of magnitude. Rate codes “convey 1 bit per spike” (5) while it is known that spike trains convey between 3 and 7 bits per spike. Rate codes are very energy expensive; combinatorial codes can be “exponentially smaller” (6). Last of all, there is evidence that spike trains use combinatorial codes because “reordering the intervals changes the message” (recall ‘21’ vs ‘12’), exactly as expected if the spike trains are expressing a combinatorial code.

The conclusion: the brain uses a combinatorial code. This is interesting because it seems to require that the code be “symbolic” in the sense that its abstract (syntactic) structure matters for the information being conveyed. And this strongly suggests that the info is not stored in synapses, as supposed in a neural net system.

This last conclusion should not be controversial. When first put on the market of ideas, neural nets were confidently sold as being non-representational. Rumelhart and McClelland focused on this as one of their more salient properties, and Fodor and Pylyshyn criticized such models for precisely this reason. The Gallistel paper is making the additional point that being asymbolic is not only cognitively problematic but also neurophysiologically a problem, as the kinds of codes we are pretty sure we need are the kinds that neural nets are designed not to support. And this means that these are the wrong neuro models for the brain: “In neural net models, plastic synapses are molded by experience” and were intended to model “associative bonds” which “were never conceived of as symbols, and neither are their neurobiological proxies” (8).

Note, we can conclude that neural nets are the wrong model even if we have no idea what the correct model is. We can know what kind of code it is and what this means for the right neurophysiology without knowing what the right neurophysiology is. And if the codes are combinatorial/symbolic then there is no way that the right physiology for memory can be neural nets. This takes the Fodor-Pylyshyn critique one major step further.

So, if not in nets, what kind of architecture? Well, you all know by now. The paper notes that we can get everything we want from a chemical computer. We can physically model classical von Neumann/Turing machines in chemistry, with addresses, reading from, writing to, etc. Moreover, chemical computation has some nice biological features. Complex chemicals can be very stable for long periods of time (what we want from a long-term memory store), and writing to such molecules is very energy efficient (8). In addition, chemical computing can be fast (some “can be altered on a nanosecond time scale”) and we know of instances of this kind of chemical computing that are behaviorally relevant. Last, chemical computations are very energy efficient. Both storing and computing can be done cheaply if done chemically.

All of this leads to the conclusion that the locus of neurobiological computing is chemical. And where are the relevant chemicals? Inside the cell. So, in place of neural nets we have the “cell intrinsic memory hypothesis” (1). Happily, there is now evidence that some computing gets done intra-cellularly (1-2). But if some gets done there…

This paper is great naturalized philosophy: we argue from pretty simple behavioral evidence that a certain kind of coding format is required, then that such formats prefer certain kinds of physical systems to support them, and we end with conclusions about the locus of the relevant computations. Thus we move from numbers are required, to combinatorial codes are the right kind, to neural nets won’t cut it, to chemical computing within the cell. The big open meaty empirical question is what particular combinatorial code is exploited. It’s the cog-neuro analogue of how DNA stores genetic information and uses it. Right now, we do not know. At all.

This last analogy to DNA is important, and, IMO, is the strongest reason for thinking that this line of thinking is correct. Conventional computers provide an excellent model of how computation can be physically implemented. We know how to chemically “build” a conventional computer. We know that biology already uses chemistry to store and use information in heredity and development. Is it really plausible that this in-place machinery is not used for cognitive computation? Or as the paper puts it in the last sentence: “Why should the conveyance of acquired information proceed by principles fundamentally different from those that govern the conveyance of heritable information?” Why indeed! Isn’t the contrary assumption (the machinery is there for the using but it is never used) biologically scandalous? Wouldn’t Darwin be turning in his grave if he considered this? Isn’t assuming it to be false a kind of cognitive creationism? Yup, connectionists and neural net types are the Jerry Falwells of biology! Who would have thunk it: the road from Associationism to Creationism is paved with Empiricist intentions. Only Rationalism and Naturalized Philosophy can save you.



[1] Let me quickly add that I consider philosophy training a very useful aid to right thinking. Nothing allows you to acquire a feel for good argumentation like a stressful philosophical workout. And by “good” I mean understanding how premises relate to conclusions, how challenging premises can allow one to understand how to evaluate conclusions, understanding that it is always reasonable to ask what would happen to a conclusion should such and such a premise be removed, etc. In other words, philosophy prizes deductive structure, and this is a useful talent to nurture regardless of what conclusions you are interested in netting and premises you are interested in frying.
[2] Chomsky, as you all know, not only posed the questions but showed how to go about empirically investigating them. This is what puts him in with the Gods: he discovered interesting questions and figured out technology relevant to answering them.
[3] Thus, the numeral’s patterning represents the number in the former (unary) kind of code but not the latter (combinatorial). The difference between the two kinds of codes is similar to the one made (here) between patterns that track the patterning and those that do not.

Tuesday, May 16, 2017

The wildly successful minimalist program

It’s that time of year: spring has sprung, classes are almost over, and all of those commitments you made to write papers three years ago and forgot about are coming due. I am in the midst of one such effort right now (due the end of May). It’s one of those “compare different theories/frameworks” volumes and I have been asked to write on the Minimalist Program (MP). After ignoring the project for a good long time, I initially bridled at the fact that I had agreed to write anything. In order to extricate myself from the promise, I tried to convince the editor that the premise of the volume (that MP was a theory like the others) was false and so a paper on MP would not really be apposite. This tantrum was rejected. I then sulked. Finally, I decided that I would take the bull by the horns and argue that MP, contrary to what I perceive to be the conventional view, has been wildly successful in its own terms and that the reason for its widespread perceived failure is that most critics have refused to accept the premises of MP investigation. Why would they do so? There are several reasons, but the best one (and one that might even be correct) is that the premises of MP investigation (viz. that we know something about the structure of FL/UG and that that something resembles GB) are shaky and so the project is premature. On this view the program is fine, it’s just that we’ve gotten a little ahead of ourselves.

This objection should sound familiar. It is what people who study specific languages and their Gs say about claims about FL and UG. We don’t know enough yet about particular Gs to address questions about FL/UG. Things are more complicated and we need time to sort these out.

I reject this. Things are always more complicated. The time is never right. IMO, GB is a pretty good theory and it is worth trying to see if we can derive some of its features in a more principled way. We will learn something even if we are not completely right about this (which is surely the case). In other words, GB is right enough (or, many of its properties will be part of whatever description turns out to be more accurate) and so trying to see how to derive its properties is a worthwhile project that could teach us something about FL/UG.

This, I should add, is the best reason to demur about MP (and as you can see, I am not sympathetic). Two others spring to mind: (i) MP sharpens the linguistics/languistics kulturkampf and (ii) MP privileges a kind of research that is qualitatively different from what most professionals commonly produce and so is suspect. 

I have beaten both these drums in the past, and I do so again here. I have convinced myself that the biggest practical problem for MP work is that it sharpens the contrast between the bio/cog and the philological perspectives on language. More specifically, MP only makes sense from the bio/cog perspective as it takes FL/UG as the object of inquiry. FL/UG is the explanandum. If you don’t think FL/UG exists (or you are not really interested in whether it exists) then MP will seem, at best, pointless and, at worst, mystical omphaloskepsis. It is an odd fact of life that many find their own interests threatened by those who do not share them. I suspect that MP’s greatest sin in the eyes of many is that it appears to devalue their own interest in language by promoting the study of the underlying faculty. This, of course, does not follow. Tastes differ, interests range. But there can be little doubt that one of Chomsky’s many vices is that, by convincing so many to be fascinated by the problems he has identified, he has robbed many of confidence in their own. MP simply sharpens the issue: doing it at all means buying into the bio-cog program. Abandon hope all languists who enter here.

Second, furthering the MP project will privilege a kind of work distinct in style from that normally practiced. If the aim is unification then MP work will necessarily be quite theoretical, and the relevance of this kind of work for the kinds of language facts that linguists prize will be somewhat remote, at least initially. Why? Because if a primary aim of MP is to deduce the basic features of GB from more fundamental principles, then a good chunk of the hard work will be to propose such principles and see how to deduce the particular properties of GB from them. The work, in other words, will be analytic and deductive rather than descriptive and inductive. Need I mention again how little our community of scholars esteems such work?

If we put these two features of MP inquiry together, we end up with work that is hard-core bio-mentalist and heavily deductive and theoretical in nature. Each feature suffices to generate skepticism (if not contempt) among many working linguists. This, at any rate, is what I argue in the paper that I tried to avoid writing.

I cannot post the whole thing (or at least won’t do so today). But I am going to give you the intro stage-setting (i.e. polemical) bits for your amusement. Here goes, and may you have a happy time with your own thoughtless commitments.

****

What is linguistics about? What is its subject matter? Here are two views.

One standard answer is “language.” Call this the “languistic (LANG) perspective.” Languists understand the aim of a theory of grammar to describe the properties of different languages and identify the common properties they share. Languists frequently observe that there are very few properties that all languages have in common. Indeed, in my experience, the LANG view is that there are almost no language universals that hold without exception and that languages can and do vary arbitrarily and limitlessly. LANGers assume that if there are universals, then they are of the Greenbergian variety, more often statistical tendencies than categorical absolutes.  

There is a second answer to the question, one associated with Chomsky and the tradition in Generative Grammar (GG) his work initiated. Call this the “linguistic (LING) perspective.” Until very recently, linguists have understood grammatical theory to have a pair of related objectives: (i) to describe the mental capacities of a native speaker of a particular language L (e.g. English) and (ii) to describe the meta-capacity that allows any human to acquire the mental capacities underlying a native speaker’s facility in a particular L (i.e. the meta-capacity required to acquire a particular G). LINGers, in other words, take the object of study to be two kinds of mental states, one that grammars of particular languages (i.e. GL) describe and one that “Universal Grammar” (UG) describes. UG, then, names not Greenbergian generalizations about languages but features of the human mental capacity that enables humans to acquire GLs. For linguists, the study of languages and their intricate properties is useful exactly to the degree that it sheds light on both of these mental capacities. As luck would have it, studying the products of these mental capacities (both at the G and UG level) provides a good window on these capacities.

The LANG vs LING perspectives lead to different research programs based on different ontological assumptions. LANGers take language to be primary and grammar secondary. GLs are (at best) generalizations over regularities found in a language (often a more or less extensive corpus or lists of “grammaticality” judgments serving as proxy).[1] For LINGers, GLs are more real than the linguistic objects they generate, the latter being an accidental sampling from an effectively infinite set of possible legitimate objects.[2] On this view, the aim of a theory of a GL is, in the first instance, to describe the actual mental state of a native speaker of L and thereby to indirectly circumscribe the possible legit objects of L. So for LINGers, the mental state comes first (it is more ontologically basic), the linguistic objects are its products, and the etiology of those that publicly arise (are elicited in some way) only partially reflects the more stable, real, underlying mental capacity. Put another way, the products are interaction effects of various capacities, and the visible products of these capacities are the combination of their adventitious complex interaction. So the products are “accidental” in a way that the underlying capacities are not.

LANGers disagree. For them the linguistic objects (be they judgments, corpora, reaction times) come first, GLs being inductions or “smoothed” summaries of these more basic data. For LINGers the relation of a GL to its products is like the relation between a function and its values. For a LANGer it is more like the relation between a smoothed distribution (e.g. a normal distribution) and the scatter plot it approximates.
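Here is a toy illustration of the contrast (my own, and deliberately crude; the mini “grammar” and all the names in it are just stand-ins): a LING-style GL is like a generative procedure that determines an open-ended set of objects, whereas the LANG-style object is more like a summary of whatever sample of its outputs happens to have been observed.

```python
import random
from collections import Counter

# LING-style object: a generative procedure (a toy recursive "grammar")
# that determines an unbounded set of strings.
def generate(depth: int = 0) -> str:
    # optionally embed a further sentence under "Mary said that ..."
    if depth < 5 and random.random() < 0.5:
        return "Mary said that " + generate(depth + 1)
    subject = random.choice(["the dog", "the cat"])
    predicate = random.choice(["barked", "slept"])
    return f"{subject} {predicate}"

# LANG-style object: a "smoothed" summary of an observed sample of outputs.
sample = [generate() for _ in range(1000)]
summary = Counter(sample)
print(summary.most_common(3))
```

The procedure determines the whole (effectively infinite) set of sentences it can produce; the Counter only summarizes the thousand that happened to be sampled.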

LINGers go further: even GLs are not that real. They are less real than UG, the meta-capacity that allows humans to acquire GLs. Why is UG more “real” than GLs? Because in a sense that we all understand, native speakers only accidentally speak the language they are native in. Basically, it is a truism universally acknowledged that any kid could have been native in any language. If this is true (and it is, really), then the fact that a particular person is natively proficient in a particular language is a historical accident. Indeed, just like the visible products of a GL result from a complex interaction of many more basic sub-capacities, a particular individual’s GL is also the product of many interacting mental modules (memory size, attention, the particular data mix a child is exposed to and “ingests,” socio-economic status, the number of hugs and more). In this sense, every GL is the product of a combination of accidental factors and adventitious associated capacities and the meta-capacity for building GLs that humans as a species come equipped with.

If this is right, then there is no principled explanation for why it is that Norbert Hornstein (NH) is a linguistically competent speaker of Montreal English. He just happened to grow up on the West Island of that great metropolis. Had NH grown up in the East End of London he would have been natively proficient in another “dialect” of English and had NH been raised in Beijing then he would have been natively proficient in Mandarin. In this very clear sense, then, NH is only accidentally a native speaker of the language he actually speaks (i.e. has acquired the particular grammatical sense (i.e. GL) he actually has) though it is no accident that he speaks some native language. At least not a biological accident, for NH is the type of animal that would acquire some GL as a normal matter of course (e.g. absent pathological conditions) if not raised in feral isolation. Thus, NH is a native speaker of some language as a matter of biological necessity. NH comes equipped with a meta-capacity to acquire GLs in virtue of the fact that he is human and it is biologically endemic to humans to have this meta-capacity. If we call this meta-capacity the Faculty of Language (FL), then humans necessarily have an FL and necessarily have UG, as the latter is just a description of FL’s properties. Thus, what is most real about language is that any human can acquire the GL of any L as easily as that of any other. A fundamental aim of linguistic theory is to explain how this is possible by describing the fine structure of the meta-capacity (i.e. by outlining a detailed description of FL’s UG properties).

Before moving on, it is worth observing that despite their different interests LINGers and LANGers can co-exist (and have co-existed) quite happily and they can fruitfully interact on many different projects. The default assumption among LINGers is that currently the best way to study GLs is to study their products as they are used/queried. Thus, a very useful way of limning the fine structure of a particular GL is to study the expressions of that GL. In fact, currently, some of the best evidence concerning GLs comes from how native speakers use GLs to produce, parse and judge linguistic artifacts (e.g. sentences). Thus, LINGers, like LANGers, will be interested in what native speakers say and what they say about what they say. This will be a common focus of interest and cross talk can be productive.

Similarly, seeing how GLs vary can also inform one’s views about the fine structure of FL/UG. Thus both LINGers and LANGers will be interested in comparing GLs to see what, if any, commonalities they enjoy. There may be important differences in how LINGers and LANGers approach the study of these commonalities, but at least in principle, the subject matter can be shared to the benefit of each. And, as a matter of fact, until the Minimalist Program (MP) arose, carefully distinguishing LINGer interests from LANGer interests was not particularly pressing. The psychologically and philologically inclined could happily live side by side pursuing different but (often enough) closely related projects. What LANGers understood to be facts about language(s), LINGers interpreted as facts about GLs and/or FL/UG.


MP adversely affects this pleasant commensalism. The strain that MP exerts on this happy LING/LANG co-existence is one reason, I believe, why so many GGers have taken a dislike to MP. Let me explain what I mean by discussing what the MP research question is. For that I will need a little bit of a running start.

Prior to MP, LING addressed two questions based on two evident, rationally uncontestable facts (and, from what I can tell, these facts have not been contested). The first fact is that a native speaker’s capacities cover an unbounded domain of linguistic objects (phrases, sentences etc.). Following Chomsky (1964) we can dub this fact “Linguistic Creativity” (LC).[3] I’ve already adverted to the second fact: any child can acquire any GL as easily as any other. Let’s dub this fact “Linguistic Promiscuity” (LP). Part of a LINGer’s account of LC postulates that native speakers have internalized a GL. GLs consist of generative procedures (recursive rules) that allow for the creation of unboundedly complex linguistic expressions (which partly explains how a native speaker effortlessly deals with the novel linguistic objects s/he regularly produces and encounters).

LINGers account for the second fact, LP, in terms of the UG features of FL. This too is a partial account. UG delineates the limits of a possible GL. Among the possible GLs, the child builds an actual one in response to the linguistic data it encounters and takes in (i.e. the Primary Linguistic Data (PLD)).

So two facts, defining two questions and two kinds of theories: one delimiting the range of possible linguistic expressions for a given language (viz. GLs) and the other delimiting the range of possible GLs (viz. FL/UG). As should be evident, as a practical matter, in addressing LP it is useful to have to hand candidate generative procedures of specific GLs. Let me emphasize this: though it is morally certain that humans come equipped with an FL and build GLs, it is an empirical question what properties these GLs have and what the fine structure of FL/UG is. In other words, that there is an FL/UG and that it yields GLs is not really open for rational debate. What is open for a lot of discussion, and is a very hard question, is exactly what features these mental objects have. Over the last 60 years GG has made considerable progress in discovering the properties of particular GLs and has reasonable outlines of the overall architecture of FL/UG. At least this is what LINGers believe, I among them. And just as the success in outlining (some of) the core features of particular Gs laid the ground for discovering non-trivial features of FL/UG, so the success in limning (some of) the basic characteristics of FL/UG has prepared the ground for yet one more question: why do we have the FL/UG that we have and not some other? This is the MP question. It is a question about possible FL/UGs.

There are several things worth noting about this question. First, the target of explanation is FL/UG and the principles that describe it. Thus, MP only makes sense qua program of inquiry if we assume that we know some things about FL/UG. If nothing is known, then the question is premature. In fact, even if something is known, it might be premature. I return to this anon. 

Second, the MP question is specifically about the structure of FL/UG. Thus, unlike earlier work where discussions of languistic interest can be used to obliquely address LC and LP, the MP question only makes sense from a LING perspective. It is asking about possible FL/UGs and this requires taking a mentalistic stance. Discussing languages and their various properties had better bottom out in some claim about FL/UG’s limits if it is to be of MP relevance.  This means that the kind of research MP fosters will often have a different focus from that which has come before. This will lead LANGers and LINGers to a more obvious parting of the investigative ways. In fact, given that MP takes as more or less given what linguists and languists have heretofore investigated as basic, MP is not really an alternative to earlier theory. More specifically, MP can’t be an alternative to GB because, at least initially, MP is a consumer of GB results.[4] What does this mean?

An analogy might help. Think of the relationship between thermodynamics and statistical mechanics. The laws of thermodynamics are grist for the statistical mechanics mill, the aim being to derive the thermodynamic generalizations from a more principled atomic theory of mechanics. The right way to think of the relation between MP and earlier theory is along the same lines. Take (e.g.) GB principles and see if they can be derived in a more principled way. That’s one way of understanding the MP program, and I will elaborate this perspective in what follows. Note, if this is right, then just as many thermodynamical accounts of, say, gas behavior will be preserved in a reasonable statistical mechanics, so too many GB accounts will be preserved in a decent MP theory of FL. The relation between GB and MP is not that between a true theory and a false one, but between a descriptive theory (what physicists call an “effective” theory) and a more fundamental one.

If this is right, then GB (or whatever FL/UG theory is presupposed) accounts will mostly be preserved in MP reconstructions. And this is a very good thing! Indeed, this is precisely what we expect in science: results of past investigations are preserved in later ones, with earlier work preparing the ground for deeper questions. Why are they preserved? Because they are roughly correct, and thus failing to mimic these results (at least approximately) is an excellent indication that the subsuming proposal is off on the wrong track. A sign that the more fundamental proposal is worth taking seriously is that it recapitulates earlier results, and thus a reasonable initial goal of inquiry is to explicitly aim to redo what has been done before (hopefully, in a more principled fashion).

If this is correct, it should be evident why many might dismiss MP inquiry. First, it takes as true what many will think contentious and tries to derive it. Second, it doesn’t aim to do much more than derive “what we already know” and so does not appear to add much to our basic knowledge, except, perhaps, a long labored (formally involved) deduction of a long recognized fact.

Speaking personally, my own work takes GB as a roughly correct description of FL/UG. Many who work on refining UGish generalizations will consider this tendentious. So be it. Let it be stipulated that at any time in any inquiry things are more complicated than they are taken to be. It is also always possible that we (viz. GB) got things entirely wrong. The question is not whether this is an option. Of course it is. The question is how seriously we should take this truism.

So, MP starts from the assumption that we have a fairly accurate picture of some of the central features of FL and considers it fruitful to inquire as to why we have found these features. In other words, MP assumes that time is ripe to ask more fundamental questions because we have reasonable answers to less fundamental questions. If you don’t believe this then MP inquiry is not wrong but footling.

Many who are disappointed in MP don’t actually ask if MP has failed on its own terms, given its own assumptions. Rather, they challenge the assumptions. They take MP to be not so much false as premature. They take issue with the idea that we know enough about FL/UG to even ask the MP question. I believe that these objections are misplaced. In other words, I will assume that GBish descriptions of FL/UG are adequate enough (i.e. are right enough) to start asking the MP question. If you don’t buy this, MP will not be to your taste and you might be tempted to judge its success in terms of your interests rather than its own questions.



[1] There are few more misleading terms in the field than “grammaticality judgment.” The “raw” data are better termed “acceptability” judgments. Native speakers can reliably rank linguistic objects with regard to relative acceptability (sometimes under an interpretation). These acceptability judgments are, in turn, partial reflections of grammatical competence. This is the official LING view. LANGers need not be as fussy, though they too must distinguish data reflecting judgments in reflective equilibrium from more haphazard reactions. The reason that LANGers differ from LINGers in this regard reflects their different views on what they are studying. I leave it to the reader to run the logic for him/herself.
[2] The term “set” should not be taken too seriously. There is little reason to think that languages are sets with clear in/out conditions or that the objects that GLs generate are usefully thought of as demarcating the boundaries of a language. In fact, LINGers don’t assume that the notion of a language is clear or well conceived. What LINGers do assume is that native speakers have a sense of what kinds of objects their native capacities extend to, that this is an open-ended (effectively infinite) capacity, and that it is (indirectly) manifest in their linguistic behavior (production and understanding of linguistic objects).
[3] Here’s Chomsky’s description of this fact in his (1964:7):
…a mature native speaker can produce a new sentence of his language on the appropriate occasion, and other speakers can understand it immediately, though it is equally new to them. Most of our linguistic experience, both as speakers and hearers, is with new sentences; once we have mastered a language, the class of sentences with which we can operate fluently is so vast that for all practical purposes (and, obviously, for all theoretical purposes), we may regard it as infinite.
[4] Personally, I am a big fan of GB and what it has wrought. But MP style investigations need not take GB as the starting point for minimalist investigations. Any conception of FL/UG will do (e.g. HPSG, RG, LFG etc.). In my opinion, the purported differences among these “frameworks” (something that this edited collection highlights) have been overhyped. To my eye, they say more or less the same things, identify more or less the same limiting conditions and do so in more or less the same ways. In other words, these differing frameworks are largely notational variants of one another, a point that Stabler (2010) makes as well.

Thursday, April 27, 2017

How biological is biolinguistics?

My answer: very, and getting more so all the time. This view will strike many as controversial. For example, Cedric Boeckx (here and here) and David Berlinski (here) (and most linguists in discussions over beer) contend that linguistics is a BINO (biology in name only). After all, there is little biochemistry, genetics, or cellular biology in current linguistics, even of the Minimalist variety. Even the evolang dimension is largely speculative (though, IMO, this does not distinguish it from most of the “serious” stuff in the field). And, as this is what biology is/does nowadays, the argument goes, linguistic pronouncements cannot have biological significance and so the “bio” in biolinguistics is false advertising. That’s the common wisdom as best as I can tell, and I believe it to be deeply (actually, shallowly) misguided. How so?

A domain of inquiry, on this view, is defined by its tools and methods rather than its questions. Further, as the tools and methods of GG are not similar to those found in your favorite domain of biology then there cannot be much bio in biolinguistics. This is a very bad line of reasoning, even if some very smart people are pushing it.  In my view, it rests on pernicious dualist assumptions which, had they been allowed to infect earlier work in biology, would have left it far poorer than it is today. Let me explain.

First, the data linguists use is biological data: we study patterns which would be considered contenders for Nobel Prizes in Physiology or Medicine (i.e. bio Nobels) were they emitted by non-humans. Wait, would be? No, actually were. Unraveling the bee waggle dance was Nobel worthy. And what’s the waggle dance? It’s the way a bee “articulates” (in a sign-language sort of way, but less sophisticated) how far away and in what direction honey lies. In other words, it is a way for bees to map AP expressions onto CI structures that convey a specific kind of message. It’s quite complicated (see here), and describing its figure-8 patterns (direction and size) and how they relate to the position of the sun and the food source is what won von Frisch the prize in Physiology or Medicine. In other words, von Frisch won a bio Nobel for describing a grammar of the bee dance.

And it really was “just” a G, with very little “physiology” or “medicine” implicated. Even at the present time, we appear to know very little about either the neural or genetic basis of the dance or its evolutionary history (or at least Wikipedia and a Google search seem to reveal little beyond anodyne speculations like “Ancestors to modern honeybees most likely performed excitatory movements to encourage other nestmates to forage” or “The waggle dance is thought to have evolved to aid in communicating information about a new nest site, rather than spatial information about foraging sites” (Wikipedia)). Nonetheless, despite the dearth of bee neurophysiology, genetics or evo-bee-dance evolutionary history, the bio worthies granted it a bio Nobel! Now here is my possibly contentious claim: describing the kinds of patterns humans use to link articulations to meanings is no less a biological project than describing waggle dance patterns. Or, to paraphrase my good and great friend Elan Dresher: if describing how a bunch of bees dance is biology, so too is describing how a bunch of Parisians speak French.

Second, it’s not only bees! If you work on bird songs or whale songs or other forms of vocalization or vervet monkey calls you are described as doing biology (look at the journals that publish this stuff)! And you are doing biology even if you are largely describing the patterns of these songs/calls. Of course, you can also add a sprinkle of psychology to the mix and tentatively describe how these calls/songs are acquired to cement your biological bona fides. But, if you study non human vocalizations and their acquisition then (apparently) you are doing biology, but if you do the same thing in humans apparently you are not. Or, to be more precise, describing work on human language as biolinguistics is taken to be wildly inappropriate while doing much the same thing with mockingbirds is biology. Bees, yes. Whales and birds, sure. Monkey calls, definitely. Italian or Inuit; not on your life! Dualism anyone?

As may be evident, I think that this line of reasoning is junk best reserved for academic bureaucrats interested in figuring out how to demarcate the faculty of Arts from that of Science. There is every reason to think that there is a biological basis for human linguistic capacity and so studying manifestations of this capacity and trying to figure out its limits (which is what GG has been doing for well over 60 years) is biology even if it fails to make contact with other questions and methods that are currently central in biology. To repeat, we still don’t know the neural basis or evolutionary etiology of the waggle dance but nobody is lobbying for rescinding von Frisch’s Nobel.

One can go further: Comparing modern work in GG and early work in genetics leads to a similar conclusion. I take it as evident that Mendel was doing biology when he sussed out the genetic basis for the phenotypic patterns in his pea plant experiments. In other words, Mendel was doing biogenetics (though this may sound redundant to the modern ear). But note, this was biogenetics without much bio beyond the objects of interest being pea plants and the patterns you observe arising when you cross breed them. Mendel’s work involved no biochemistry, no evolutionary theory, no plant neuro-anatomy or plant neuro-physiology. There were observed phenotypic patterns and a proposed very abstract underlying mechanism (whose physical basis was a complete mystery) that described how these might arise. As we know, it took the rest of biology a very long time to catch up with Mendel’s genetics. It took about 65 years for evolution to integrate these findings in the Modern Synthesis and almost 90 years until biology (with the main work carried out by itinerant physicists) figured out how to biochemically ground it in DNA. Of course, Mendel’s genetics laid the groundwork for Watson and Crick and was critical to making Darwinian evolution conceptually respectable. But, and this is the important point here, when first proposed, its relation to other domains of biology was quite remote. My point: if you think Mendel was doing biology then there is little reason to think GGers aren’t. Just as Mendel identified what later biology figured out how to embody, GG is identifying operations and structures that the neurosciences should aim to incarnate.  Moreover, as I discuss below, this melding of GG with cog-neuro is currently enjoying a happy interaction somewhat analogous to what happened with Mendel before.

Before saying more, let me make clear that of course biolinguists would love to make more robust contact with current work in biology. Indeed, I think that this is happening and that Minimalism is one of the reasons for this. But I will get to that. For now let’s stipulate that the more interaction between apparently disparate domains of research the better. However, absence of apparent contact and the presence of different methods do not mean that subject matters differ. Human linguistic capacity is biologically grounded. As such, inquiry into linguistic patterns is reasonably considered a biological inquiry into the cognitive capacities of a very specific animal: humans. It appears that dualism is still with us enough to make this obvious claim contentious.

The point of all of this? I actually have two: (i) to note that the standard criticism of GG as not real biolinguistics at best rests on unjustified dualist premises, and (ii) to note that one of the more interesting features of modern Minimalist work has been to instigate tighter ties with conventional biology, at least in the neuro realm. I ranted about (i) above. I now want to focus on (ii), in particular a recent very interesting paper by the group around Stan Dehaene. But first a little segue.

I have blogged before on Embick and Poeppel’s worries about the conceptual mismatch between the core concepts in cog-neuro and those of linguistics (here for some discussion). I have also suggested that one of the nice features of Minimalism is that it has a neat way of bringing the basic concepts closer together so that G structure and its bio substructure might be more closely related. In particular, a Merge-based conception of G structure goes a long way towards reanimating a complexity measure with real biological teeth. In fact, it is effectively a recycled version of the DTC, which, it appears, has biological street cred once again.[1] The cred is coming from work showing that one can take the neural complexity of a structure as roughly indexed by the number of Merge operations required to construct it (see here). A recent paper goes the earlier paper one better by embedding the discussion in a reasonable parsing model based on a Merge-based G. The PNAS paper (henceforth Dehaene-PNAS) (here) has a formidable cast of authors, including two linguists (Hilda Koopman and John Hale), orchestrated by Stan Dehaene. Here is the abstract:
Although sentences unfold sequentially, one word at a time, most linguistic theories propose that their underlying syntactic structure involves a tree of nested phrases rather than a linear sequence of words. Whether and how the brain builds such structures, however, remains largely unknown. Here, we used human intracranial recordings and visual word-by-word presentation of sentences and word lists to investigate how left-hemispheric brain activity varies during the formation of phrase structures. In a broad set of language-related areas, comprising multiple superior temporal and inferior frontal sites, high-gamma power increased with each successive word in a sentence but decreased suddenly whenever words could be merged into a phrase. Regression analyses showed that each additional word or multiword phrase contributed a similar amount of additional brain activity, providing evidence for a merge operation that applies equally to linguistic objects of arbitrary complexity. More superficial models of language, based solely on sequential transition probability over lexical and syntactic categories, only captured activity in the posterior middle temporal gyrus. Formal model comparison indicated that the model of multiword phrase construction provided a better fit than probability-based models at most sites in superior temporal and inferior frontal cortices. Activity in those regions was consistent with a neural implementation of a bottom-up or left-corner parser of the incoming language stream. Our results provide initial intracranial evidence for the neurophysiological reality of the merge operation postulated by linguists and suggest that the brain compresses syntactically well-formed sequences of words into a hierarchy of nested phrases.
A few comments, starting with a point of disagreement: whether the brain builds hierarchical structures is not really an open question. We have tons of evidence that it does, evidence that linguists (among others) have amassed over the last 60 years. How quickly the brain builds such structure (on line, or in some delayed fashion) and how the brain parses incoming strings in order to build such structure is still opaque. So it is misleading to say that what Dehaene-PNAS shows is both that the brain does this and how. Putting things this way suggests that until we had such neural data these issues were in doubt. What the paper does is provide neural measures of these structure-building processes in a nice piece of cog-neuro inquiry where the cog is provided by contemporary Minimalism in the context of a parser and the neuro is provided by brain activity in the gamma range.

Second, the paper demonstrates a nice connection between a Merge based syntax and measures of brain activity. Here is the interesting bit (for me, my emphasis):

Regression analyses showed that each additional word or multiword phrase contributed a similar amount of additional brain activity, providing evidence for a merge operation that applies equally to linguistic objects of arbitrary complexity.

Merge-based Gs treat all combinations as equal regardless of the complexity of the combinations or differences among the items being combined. If Merge is the only operation, then it is easy to sum the operations that provide the linguistic complexity. It’s just the same thing happening again and again, and on the (reasonable) assumption that doing the same thing incurs the same cost we can (reasonably) surmise that we can index the complexity of the task by adding up the required Merges. Moreover, this hunch seems to have paid off in this case. The merges seem to map linearly onto brain activity, as expected if complexity generated by Merge were a good index of the brain activity required to create such structures. To put this another way: a virtue of Merge (maybe the main virtue for the cog-neuro types) is that it simplifies the mapping from syntactic structure to brain activity by providing a common combinatory operation that underlies all syntactic complexity.[2] Here is the Dehaene-PNAS paper (4):
A parsimonious explanation of the activation profiles in these left temporal regions is that brain activity following each word is a monotonic function of the current number of open nodes at that point in the sentence (i.e., the number of words or phrases that remain to be merged).
This makes the trading relation between complexity as measured cognitively and complexity as measured brain-wise transparent when implemented in a simple parser (note the weight carried by “parsimonious” in the quote above). What the paper argues is that this simple, transparent mapping has surprising empirical virtues, and part of what makes it simple is the simplicity of Merge as the basic combinatoric operation.

There is lots more in this paper. Here are a few things I found most intriguing.

A key assumption of the model is that combining words into phrases occurs right after the last word of a constituent, i.e. at its right edge (2-3):
…we reasoned that a merge operation should occur shortly after the last word of each syntactic constituent (i.e., each phrase). When this occurs, all of the unmerged nodes in the tree comprising a phrase (which we refer to as “open nodes”) should be reduced to a single hierarchically higher node, which becomes available for future merges into more complex phrases.
This assumption drives the empirical results. Note that it indicates that structure is being built bottom-up. And this assumption is a key feature of a Merge based G that assumes something like Extension. As Dehaene-PNAS puts it (4):
The above regressions, using “total number of open nodes” as an independent variable, were motivated by our hypothesis that a single word and a multiword phrase, once merged, contribute the same amount to total brain activity. This hypothesis is in line with the notion of a single merge operation that applies recursively to linguistic objects of arbitrary complexity, from words to phrases, thus accounting for the generative power of language
If the parsing respects the G principle of Extension then it will have to build structure in this bottom up fashion. This means holding the “open” nodes on a stack/memory until this bottom up building can occur. The Dehaene-PNAS paper provides evidence that this is indeed what happens.

What kind of evidence? The following (3) (my emphasis):
We expected the items available to be merged (open nodes) to be actively maintained in working memory. Populations of neurons coding for the open nodes should therefore have an activation profile that builds up for successive words, dips following each merge, and rises again as new words are presented. Such an activation profile could follow if words and phrases in a sentence are encoded by sparse overlapping vectors of activity over a population of neurons (27, 28). Populations of neurons involved in enacting the merge operation would be expected to show activation at the end of constituents, proportional to the number of nodes being merged. Thus, we searched for systematic increases and decreases in brain activity as a function of the number of words inside phrases and at phrasal boundaries.
So, a Merge based parser that encodes Extension should show a certain brain activity rhythm indexed to the number of open nodes in memory and the number of Merge operations executed. And this is what the paper found.
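To see concretely what is being counted, here is a rough sketch (mine, not the paper’s code; the bracketed toy sentence, the function and the output format are all just illustrative stand-ins for the paper’s actual materials) of a word-by-word open-node count for a bottom-up parser that closes each phrase as soon as its last word arrives.

```python
def open_node_profile(bracketed: str):
    """Word-by-word 'open node' counts for a bottom-up parser that merges
    (closes) each phrase as soon as its last word has been read.  Returns,
    for every word: (word, open nodes right after the word is read, merges
    the word triggers, open nodes left after those merges)."""
    tokens = bracketed.replace("[", " [ ").replace("]", " ] ").split()
    inside = 0      # completed nodes inside the phrase currently being built
    pending = []    # node counts of the unfinished enclosing phrases
    profile = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "[":                        # a new phrase is opened
            pending.append(inside)
            inside = 0
        else:                                 # an ordinary word: one more open node
            inside += 1
            peak = inside + sum(pending)
            merges = 0
            while i + 1 < len(tokens) and tokens[i + 1] == "]":
                inside = pending.pop() + 1    # the phrase's nodes collapse into one
                merges += 1
                i += 1
            profile.append((tok, peak, merges, inside + sum(pending)))
        i += 1
    return profile

for word, peak, merges, after in open_node_profile("[[the dog][chased [the cat]]]"):
    print(f"{word:7s} open={peak}  merges={merges}  open after merge={after}")

# Prints:
# the     open=1  merges=0  open after merge=1
# dog     open=2  merges=1  open after merge=1
# chased  open=2  merges=0  open after merge=2
# the     open=3  merges=0  open after merge=3
# cat     open=4  merges=3  open after merge=1
```

On the paper’s working assumption that activity following each word is roughly linear in the open-node count, the “open” column gives the predicted build-up and the post-merge column the predicted dips at phrase boundaries.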

Last, and this is very important: the paper notes that Gs can be implemented in different kinds of parsers and tries to see which one best fits the data in their study. There is no confusion here between G and parser. Rather, it is recognized that the effects of a G in the context of a parser can be investigated, as can the details of the parser itself. It seems that for this particular linguistic task, the results are consistent with either a bottom-up or a left-corner parser, with the former being a somewhat better fit for this data (7):
Model comparison supported bottom-up and left-corner parsing as significantly superior to top-down parsing in fitting activation in most regions in this left-hemisphere language network…
Those findings support bottom-up and/or left-corner parsing as tentative models of how human subjects process the simple sentence structures used here, with some evidence in favor of bottom-up over left-corner parsing. Indeed, the open-node model that we proposed here, where phrase structures are closed at the moment when the last word of a phrase is received, closely parallels the operation of a bottom-up parser.
This should not be that surprising a result given the data that the paper investigates. The sentences of interest contain no visible examples where left context might be useful for downstream parsing (e.g. a Wh-element on the left edge (see Berwick and Weinberg for discussion of this)). We have here standard right-branching phrase structure, and for these kinds of sentences non-local left context will be largely irrelevant. As the paper notes (8), the results do “not question the notion that predictability effects play a major role in language processing” and, as it further notes, there are various kinds of parsers that can implement a Merge-based model, including those where “prediction” plays a more important role (e.g. left-corner parsers).
That said, the interest of Dehaene-PNAS lies not only in the conclusion (or maybe not even mainly there), but in the fact that it provides a useful and usable model for how to investigate these computational models in neuro terms. That’s the big payoff, or IMO, the one that will pay dividends in the future. In this, it joins the earlier Pallier et al and the Ding et al papers. They are providing templates for how to integrate linguistic work with neuro work fruitfully. And in doing so, they indicate the utility of Minimalist thinking.
Let me say a word about this: what cog-neuro types want are simple usable models that have accessible, testable implications. This is what Minimalism provides. We have noted the simplicity that Merge-based models afford the investigations above: a simple linear index of complexity. Simple models are what cog-neuro types want, and for the right reasons. Happily, this is what Minimalism is providing and we are seeing its effects in this kind of work.
An aside: let’s hear it for stacks! The paper revives classical theories of parsing and, with them, the idea that brains have stacks important for the parsing of hierarchical structures. This idea has been out of favor for a long time. One of the major contributions of the Dehaene-PNAS paper is to show that dumping it was a bad idea, at least for language and, most likely, other domains where hierarchical organization is essential.
Let me end: there is a lot more in the Dehaene-PNAS paper. There are localization issues (where the operations happen) and arguments showing that simple probability-based models cannot survive the data reviewed. But for current purposes there is a further important message: Minimalism is making it easier to put a lot more run-of-the-mill everyday bio into biolinguistics. The skepticism about the biological relevance of GG and Minimalism for more bio investigation is being put paid to by the efflorescence of intriguing work that combines them. This is what we should have expected. It is happening. Don’t let anyone tell you that linguistics is biologically inert. At least in the brain sciences, it’s coming into its own, at last![3]



[1] Alec Marantz argued that the DTC is really the only game in town. Here’s a quote:
…the more complex a representation- the longer and more complex the linguistic computations necessary to generate the representation- the longer it should take for a subject to perform any task involving the representation and the more activity should be observed in the subject’s brain in areas associated with creating or accessing the representation or performing the task.
For discussion see here.

[2] Note that this does not say that only a Merge-based syntax would do this. It’s just that Merge systems are particularly svelte systems and so using them is easy. Of course many Gs will have Mergish properties and so will also serve to ground the results.
[3] IMO, it is also the only game in town when it comes to evolang. This is also the conclusion of Tattersall in his review of Berwick and Chomsky’s book. So, yes, there is more than enough run-of-the-mill bio to license the biolinguistics honorific.