
Wednesday, March 18, 2015

A (shortish) Whig history of Generative Grammar (part 1)

 0. One of the striking characteristics of modern scientific practice, at least in the successful domains of inquiry, is that it is possible to stand on the shoulders of one’s predecessors (of various heights), thereby allowing one to (occasionally successfully) address questions that would otherwise be out of reach.  This cumulative aspect of scientific inquiry is so distinctive (and important) that it is not unreasonable to use it as a measure of how much scientific traction a domain of investigation has achieved.

The converse is also true: to the degree that a field of inquiry enjoys a “revolution” every decade, to that degree it suggests that it has not yet made the jump from (possibly informed) speculation to scientific inquiry.  Revolutions tend to discard the past, not build on it, and this implies that what is discarded is not worth retaining. Permanent scientific revolution within a field is a leading indicator of ignorance.

Given this epistemological yardstick it is germane to ask how well modern Generative Grammar (GG) has measured up.  Not well, is one standard view.  Here we argue that this misreads the history of GG. We aim to back this judgment up by providing a Whig history (WH) of GG that displays its cumulative character.  Before proceeding, a word of caution. WHs are not “real” histories. They present the past as “an inevitable progression towards ever greater…enlightenment.” Real history is filled with dead ends, lucky breaks, misunderstandings, confusions, petty rivalries, and more.  WHs are not. They focus on the “successful chain of theories and experiments that led to the present-day science, while ignoring failed theories and dead ends” (see here). A less pejorative term for WH is “rational reconstruction.” At its best, a WH can expose the logic behind a set of questions and the attendant inquiries into them, exposing how the theories we consider today have built on the results of earlier theoretical and empirical investigations.  A domain with a credible WH is a domain where the growth of knowledge has in fact been cumulative even if the path leading to this body of knowledge has been extremely crooked. GG has a very credible WH, as we intend to demonstrate.

1. Where to begin?  The modern Generative enterprise rises from the consideration of a handful of obvious facts:
            (1) A competent speaker of a given natural language (NL) has the capacity to deal with an effectively infinite number of linguistic objects.
            (2) The linguistic objects in question are pairings of meanings with “sounds.” Thus, among the things a native speaker of English knows is that Dogs chase cats does not mean the same thing as Cats chase dogs while Cats are chased by dogs does. There are an unbounded number of such facts that a competent native speaker of a given NL knows.
            (3) Any human child can acquire competence in any NL when placed in the appropriate speech community. Thus, any child of any parentage will grow up speaking English if it grows up in NYC, Chinese if it grows up in Shanghai, Hungarian in Budapest, etc. Moreover, all children regardless of the language, or the child, do this in essentially the same way.
            (4) This capacity to acquire anything like an NL is a species-specific capacity that humans have. In other words, no other animals do language like we humans do language.

The first fact suggests that linguistic competence consists (in part) in mastery of a system of rules that specify the NL mastered.  Why a rule system? Because that is the only way to specify an effectively infinite capacity.  We cannot just list the objects in the domain of a native speaker’s competence. The capacity can only be specified in terms of a finite procedure that describes (aka: generates) it. Thus, we conclude that linguistic mastery consists (in part) in acquiring a set of rules (aka: a Grammar (G)) that generate the kinds of linguistic objects that a native speaker has competence with.   

The second fact tells us something more about these Gs. They must specify pairings of meanings with sounds. Thus the rule systems that native speakers have mastered are rules that generate objects with two distinctive properties. Gs are generative procedures that tie a specific meaning profile together with a specific sound profile, and they do this over an effectively infinite domain. So Gs are functions whose range is meaning-sound pairs, viz. an infinite number of objects like this: <m,s>. What’s the domain? Some finite set of “atoms” that can combine again and again to yield more and more complex <m,s> pairs. Let’s call these atoms ‘morphemes.’
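
To make this concrete, here is a minimal sketch (in Python, with an invented two-word lexicon and a single combination rule; nothing here is a claim about actual Gs) of a finite procedure whose range is an effectively unbounded set of <m,s> pairs:

```python
# A toy illustration (not a serious grammar): a finite set of "morphemes"
# plus one recursive rule yields unboundedly many <meaning, sound> pairs.

NOUNS = {"dogs": "DOG", "cats": "CAT"}   # sound -> meaning atom
VERBS = {"chase": "CHASE", "see": "SEE"}

def sentences(depth):
    """Recursively generate <meaning, sound> pairs up to a given depth."""
    if depth == 0:
        for ns, nm in NOUNS.items():
            yield (nm, ns)
        return
    for subj_m, subj_s in sentences(0):
        for vs, vm in VERBS.items():
            for obj_m, obj_s in sentences(depth - 1):
                # meaning: predicate(agent, patient); sound: word string
                yield (f"{vm}({subj_m}, {obj_m})", f"{subj_s} {vs} {obj_s}")

# "dogs chase cats" and "cats chase dogs" come out with distinct meanings,
# and raising `depth` yields ever more complex pairs: an effectively
# infinite range from a finite lexicon and one recursive rule.
for m, s in sentences(1):
    print(m, "<->", s)
```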

Putting this all together, we know from the basic facts and some very elementary reasoning that native speakers master Gs (recursive procedures) that map morphemes into an unbounded range of <m,s>s. This we know. What we don’t know is what the specific rules that Gs contain look like (or for that matter what the ‘m’s and ‘s’s look like). And that brings us to our first research question: describe some rules of NL Gs and specify their variety.

We know another thing about these Gs. They can be acquired by any human when placed in the appropriate linguistic environment. You might be a native speaker of English, but you could just as well have been a native speaker of Swahili but for an accident of birth location. There is nothing that intrinsically makes you a native speaker of English (nor of Swahili, Dutch, Japanese, etc.), though, given the third and fourth facts noted above, there is likely to be something biologically intrinsic to you that makes you capable of acquiring an NL (i.e. a G that generates the NL) at all. Let’s call the capacity that humans have to acquire an NL the ‘Faculty of Language’ (FL).  Note, postulating that there is an FL does not tell us what’s in FL. It simply names a fact: that humans (qua humans) are able to acquire any NL G in more or less the same way. We can think of FL as a function whose range is Gs of NLs.  An obvious research question is: what is the fine structure of this function FL?

This second research question has been addressed in at least two different ways. The first has been to inspect many NL Gs to induce what all Gs have in common. Note that doing this requires having a largish set of candidate Gs and a reasonable variety of them (e.g. Romance Gs, Germanic Gs, Semitic Gs, Austronesian Gs, East Asian Gs, etc.).  Clearly, we cannot inspect their commonalities without having them to inspect.[1]

There is a second way of addressing this question. We can ask what FL must contain in order to produce even a single G, guided only by the kinds of data the child uses. Call the data the child actually uses to guide its G acquisition ‘Primary Linguistic Data’ (PLD).  We can investigate the structure of FL by asking what (if anything) we must assume about the structure of FL to allow it to use the PLD(NL) (i.e. the PLD of a given NL, e.g. uttered bits of English) to arrive at the G(NL) (i.e. the G of that NL, e.g. the grammar of English) that the child attains.

What do we know about the PLD? Actually quite a bit. We know that it consists of relatively simple linguistic forms. There are not many sentences in the PLD that the child has access to (e.g. child-directed language, most likely a superset of what it actually uses) that involve more than 2 levels of embedding. Indeed, most of the PLD seems to consist of simple phrases and sentences.  Moreover, there are virtually no grammatically ill-formed utterances addressed to the child and no correction of mistakes that the child spontaneously makes. Thus, to a good first approximation, we can take the PLD to be simple well-formed phrases and sentences addressed to the child. From this the child builds a G that permits it to generate an effectively unbounded number of phrases and sentences, both simple (like the ones it is exposed to) and complex (language bits with structures unattested in the PLD). The idea is that we can investigate how FL is structured using the following argument form, called ‘The Poverty of the Stimulus Argument’ (POS): assume that anything that the child can acquire on the basis of the PLD is acquired in this way; anywhere that the PLD is insufficient to fix the attained property implicates some built-in (i.e. innate) feature of FL.  In contrast to the comparative method noted above, the POS licenses inferences about FL based on the properties of a single G(NL).  The method is effectively subtractive: whatever can be acquired from the PLD, assume is so acquired; what’s left after you subtract this out is due to fixed (viz. innate) features of FL.[2]
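
The subtractive method can be rendered schematically. Here is a minimal sketch (in Python; the property names and the derivability test are placeholder stand-ins — deciding what actually is derivable from the PLD is the substantive empirical work):

```python
# Schematic rendering of the POS argument's subtractive method.
# The sets and the derivability test below are toy placeholders.

def derivable_from(prop, pld):
    # Stand-in for the hard empirical question: could a child fix
    # this property from simple, well-formed input alone?
    return prop in pld

def pos_residue(attained_g_properties, pld):
    """Charitably assume anything fixable from the PLD is so acquired;
    whatever remains implicates built-in (innate) structure of FL."""
    learned = {p for p in attained_g_properties if derivable_from(p, pld)}
    return attained_g_properties - learned

attained = {"basic word order", "agreement", "structure-dependence"}
pld_fixable = {"basic word order", "agreement"}
print(pos_residue(attained, pld_fixable))   # -> {'structure-dependence'}
```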

Two last points before proceeding further. Note (i) that investigating FL in either way requires that we have some candidate Gs. The output of FL is a G, so until we know something about the operations that NL Gs contain, it is fruitless to pursue this question about FL. Note (ii) that the two forms of hunting for innate features of FL are complementary and both are useful. As we will see in our WH of GG below, both the study of multiple particular Gs and POS evaluations of single Gs have contributed to GG’s understanding of FL’s structure.

Given (4) we know that FL is a biological novelty. That means that there was a time at which our ancestors did not have a FL. A reasonable question to ask is how FL arose in humans. Note that just as having candidate Gs is a precondition for studying the properties of FL, having plausible candidate FLs is a precondition for studying how FL arose in the species.  Let’s be a little clearer here.

Let’s divide FL into those features (i) that are domain specific to the use and mastery of language (call these principles ‘Universal Grammar’ (UG)), (ii) that FL shares with other cognitive capacities (call these ‘domain general cognitive features’ (DGCF)), and (iii) that FL has by physical necessity (PN).  The more that FL is constructed from operations and principles drawn from (ii) and (iii), the simpler the story of how FL could have arisen in the species. For example, if all of the ingredients but one necessary to build our FL are in DGCF or PN, then we can trace the rise of FL in humans to the emergence of that one distinctive property of UG.  Conversely, the more that must be packed into UG, the more involved will be an explanation for how FL arose.

The difficulty is further exacerbated if FL is a relatively recent cognitive innovation, as this leaves less time for (the oft believed) gradual methods of natural selection to work their magic.[3] This leads to the following conclusion: in the best of all possible worlds, FL uses operations and principles that cognitively pre-date the emergence of FL, and these suffice to construct an FL with the properties ours has when the (hopefully, very) small number (possibly zero, but more likely one or two) of language-specific cognitive innovations (i.e. UG) is added to the prior mix.  This line of reasoning suggests another clear project: show that UG is pretty sparse and that very modest UGs can derive the operations and principles of richer UGs when combined with identifiable DGCFs and PNs.

The above formulation highlights an important tension between explaining how Gs arise in a single individual and explaining how FL arose in the species.  The more we pack into UG (operations and principles specific to linguistic capacity), the easier we make the child’s task of projecting a G(NL) from the PLD(NL) it exploits.  However, the richer the UG component of FL, the harder it is for our cognitive ancestors (who, by assumption, were sans FL) to evolve an FL like ours.  That’s the tension, and contemporary GG tries to address it. However, for now, let’s just observe the tension and note that the enterprise of discussing how FL arose in the species can only get productively started once we have some idea of what properties FL has, and for this it is very useful to have some understanding of which operations and principles of FL might be language specific (i.e. part of UG).

To conclude our little conceptual tour of the problem: we have here outlined the logic of the GG research program based on some pretty elementary facts. This project addresses three questions:[4]
           
(5)       a. What properties do individual Gs have?
          b. What properties must FL have so as to enable it to acquire these Gs?
          c. Which of these properties of FL are proprietary to language and which are more general?

Our WH will illustrate how research in GG can be understood as progressively addressing each of these questions, thereby setting the stage for a fruitful investigation of the next questions. This is what we should expect given our observation that (5a) is a precondition for (fruitfully) addressing (5b) and (5b) is a precondition for fruitfully addressing (5c). Note that ‘precondition’ is here used in the conceptual sense.  It does not mean that answers to (5b) might not lead to rethinking claims about (5a), or answers to (5c) to rethinking claims about (5b). Nor does it mean that these questions cannot be (and aren’t) pursued in tandem. As a matter of practice, all three questions are often addressed simultaneously. Rather, what we intend by ‘precondition’ is that, given the nature of the questions asked, it is nugatory to pursue the latter questions without some answers to the previous ones.

One more caveat before we get the show on the road: this is a very idealized account of the problem and its empirical boundary conditions. For example, most GGers do not think that humans have a single grammar of their NL (i.e. it is almost certain that humans develop multiple Gs). Nor do they think that NLs are natural kinds (e.g. looked at carefully, there is nothing like ‘English’ that all so-called speakers of English speak). Ontologically, Gs are more “real” than the NLs they are related to. However, this idealization is adopted because it is recognized that describing the properties of G and FL even under these idealized assumptions is already very hard, and GGers believe that relaxing the idealization will not significantly affect the shape of the answers provided.  Of course, this may be incorrect. But we doubt it, and what follows does not much worry about the legitimacy of so idealizing.[5]

2. So given these three questions, we can divide the WH of GG into three epochs. Early work in GG (say from the mid 50s to the early 70s) concentrated on constructing sample Gs for fragments of a given NL.  The second epoch goes from the mid 70s to the early 90s. This concentrated on simplifying the rules/operations of Gs. This involved categorizing the various rule types a G could have, factoring out common features within these types and enriching UG to prevent massive over-generation (a natural consequence of simplifying the rules, as we shall see). The third epoch goes from the mid 90s to the present. This period focuses on simplifying FL, trying to figure out which aspects of FL’s properties are language specific and which follow from more general cognitive and/or computational principles. The aim here has been to factor out those features of FL that are computationally general, leaving, it is hoped, a very small domain-specific residue. So, three epochs: (i) exploring the rules Gs contain and how they interact, (ii) simplifying the structure of Gs by articulating the structure of UG and (iii) simplifying FL by separating the computationally general wheat from the linguistically specific chaff.

In what follows we describe in slightly more detail the kinds of results each epoch delivered and illustrate the progressive nature of the GG enterprise. Let’s begin with the first epoch.


[1] Though this sounds like an obvious method to pursue in studying the structure of FL, it is actually quite a bit harder to do than one might think. The reason is that Gs do not tend to have the same rules. What they have in common is far more abstract: e.g. all the rules adhere to a common rule schema, or all the rules obey similar constraints in the sense that no rules within Gs show evidence of disobeying them.  This makes any simple-minded process of looking for commonalities quite difficult.
[2] It is worth observing that the POS sets a very high standard for concluding that some property reflects innate structural features of FL. POS assumes that any feature of G that could be a data driven acquisition is one.  However, this conclusion does not follow. Nonetheless, POS allows one to isolate a whole slew of promising candidate generalizations useful in probing the built-in structure of FL.
[3] Even if natural selection can operate quickly, time pressures may matter.
[4] There are others: How is FL instantiated in the brain? How are Gs used in linguistic performance? How is FL used to acquire a G in real time? We return to these.
[5] This does not mean to say that GGers have not explored models that relax these idealizations. For example, a staple of work in historical linguistics is to assume that speakers have multiple Gs that compete. Change is then modeled as the changing dominance relations between these multiple Gs. 

Saturday, March 14, 2015

Defeating linguistic agnotology; or why very vigorous attack on bad ideas is important

In a recent piece I defended the position that intemperance in debate is defensible. More specifically, I believe that there are some positions that are so silly and/or misinformed that treating them with courtesy amounts to a form of disinformation. As any good agnotologist will tell you, simply being admitted to the debate as a reasonable option is 90% of the battle. So, for example, the smoking lobby became toast once it was widely recognized that the "science" it relied on was not merely wrong, but laughable and fraudulent. You can fight facts with other facts, but once a position is the subject of late night comedy, it's a goner.

Why should this be so? Omer Preminger sent me this piece in discussing the Dunning-Kruger Effect (DKE), something that I had never heard of before. It is described as follows:

 ….those with limited knowledge in a domain suffer from a dual burden: Not only do they reach mistaken conclusions and make regrettable errors, but their incompetence robs them of the ability to realize it.

If the DKE is real, then it is not particularly surprising that some debates are interminable. The only constraint on their longevity will be the energy that the parties bring to the "debate." In other words, holding a silly uninformed position is not something that will become evident to the holder no matter how clearly this is pointed out. Thus, the idea that gentle debate will generally result in consensus, though a noble ideal, cannot be assumed to be a standard outcome.

The piece goes on to describe a second apparent fact. It seems that when there is "disagreement" people tend to split the difference.

people have an “equality bias” when it comes to competence or expertise, such that even when it’s very clear that one person in a group is more skilled, expert, or competent (and the other less), they are nonetheless inclined to seek out a middle ground in determining how correct different viewpoints are.

I've seen this in action, especially as regards non-experts. I call it the three remark rule. You go to a talk and someone says something that you believe might be controversial. Nobody in the audience objects. The third time there is no objection, you assume that the point made was true. After all, you surmise, were it false someone would have said so. Of course, there are many other reasons not to say anything, politeness being the highest motive on the list. However, in such cases reticence, even when motivated by the best social motives, has intellectual consequences. Saying nothing, being "nice," can spread false belief. That's why, IMO, it is incumbent on linguists to be very vociferous.

Let me put this another way: reasonable debate among reasonable positions is, well, reasonable. However, there are many views, especially robust in the popular intellectual press, that we know to be garbage. The Everett and Evans stuff on universals is bunk. It is without any intellectual interest once the puns are clarified. Much of the stuff one finds from empiricist minded colleagues (read, psychologists) is similarly bereft of insight or relevance. These positions should not be treated graciously. So, next time you are at a talk and hear these views expounded, make it clear that you think that they are wrong, or, better still, that they miss the point or beg the relevant questions. If you can do this with a little humor, so much the better. Whatever you do, do not treat the views with respect. Treat them with respect and we have every reason to believe that the silliness will spread.

Thursday, March 12, 2015

Getting a grant

Kleanthes Grohmann (recently featured here) sent me this paper (here) which surveys the costs and benefits of grant writing. The participants are from astronomy and psychology (social and personality). The paper is worth taking a look at, for it confirms what many might have thought reflecting on their own experience. So, for example, grant writing is VERY labor intensive. Conservatively, it seems to require about 10 times the amount of time of other scholarly pursuits. Moreover, large amounts of good work are never funded and the collateral benefits of writing a grant and not getting funded seem pretty mild. At any rate, whatever their collateral benefits, it does not seem that these are sufficient to inspire more grant writing effort.

There is more that is interesting:


  • Success does not seem to be gender biased
  • Funding is very competitive, more so if one has not landed a grant in the few years before applying
  • Amount of time spent writing a grant is apparently not correlated with greater funding
  • There is a correlation between applying for more grants and getting more funding, though "the causal direction might go either way"
  • Many good things don't get funded
None of these results are surprising. But take a look. It's interesting. 

Sunday, March 8, 2015

A good Sunday read

Bob Berwick sent me this old review by Richard Lewontin. I'm a big fan of Lewontin's. His paper in the Invitation to CogSci on the evolution of cognition (here) is (or should be) standard reading for anyone interested in Evolang. At any rate, this review of a Carl Sagan book is a terrific (funny) discussion of the scientific method (there is none), the overhyping of science by scientists, the increasing cynicism of the "unwashed" towards this overhyping, science as show business, the cultural bases of (some of) the public's antipathy towards science in the USA, and more. If I had read this before, I didn't remember. I certainly enjoyed it this time around.

Saturday, March 7, 2015

How to make $1,000,000

My mother once told me about an easy way to become a millionaire: start with $10 million. This seems to be advice that second generations are particularly good at following. And not only as regards inter-generational wealth transfer. Like families, journals also enjoy life cycles, with founders giving way to a next generation. And as in families, regression to the mean (i.e. the headlong rush to average) seems to be an inexorable force. However, in contrast to the rise and decline of wealth in families, the move from provocative to staid in journals is rarely catalogued. Rarely, but not never. Here is a paper (by Priva and Austerweil (P&A)) that charts the change in intellectual focus of Cognition, a (once?) very important cogsci journal. What does P&A show? Two things: (i) It shows that the mix of papers in the journal has substantially changed. Whereas in the beginning, there was a fair mix of theory and experimental papers (theory papers predominating), since the mid 2000s the mix has dramatically changed, with experimental papers forming the bulk of academic product. Theory papers have not entirely disappeared, but they have been substantially overtaken by their experimental kin. (ii) That papers on language and development have gone from central topics of interest to a somewhat marginal presence.[1]

How surprising is this?  Let me start by discussing (i), the decline of “theory,” a favorite obsession of mine (see here). Well, first off, from one common perspective, some decline might be expected. We all know the Kuhnian trope: “revolutionary” periods of scientific inquiry where paradigms are contested, big ideas are born and old ones die out (one old fogey at a time, in Planck time) give way to periods of “normal science” where the solid majestic wall of scientific accomplishment is carefully and industriously built brick by careful empirical brick. The picture offered is one in which the basic framework ideas get hashed out and then their implications are empirically fleshed out. I never really liked this way of conceptualizing matters (there is a lot of hashing and fleshing going on all the time in serious work), but I think that this picture has some appeal descriptively. Sadly, it also seems to have normative attractions, especially to next generation editors. Here’s what I mean.

Editing a journal is a lot of work. Much of it thankless. So before I go off the deep end here in a minute, let me personally thank those that take this task on, for their work and commitment is invaluable and what we think of as science could not succeed without such effort. That said, precisely because of how hard it is to do, you need to be driven (nuts?) to start a journal. What drives you? The feeling that there is something new to say but that there is no good place to say it. Moreover, that something is not only new, it is important. And it is not possible to say these new important things in the current journals because the new ideas cut things up in new ways or approach problems from premises that don’t fit into the existing journalistic matrix.[2] So, at the very least, the extant venues are not congenial places to publish, and in some cases are outright hostile.

The emergence of cogsci (something that happened when I was growing up intellectually) had this feel to it. There was a self-conscious cognitive revolution, with very self-conscious revolutionaries. Furthermore, this revolution was fought on several fronts: linguistics, psychology, computer science and philosophy being the four main ones. Indeed, for a while, it was not clear where one left off and the other began. Linguists read philosophy and psychology papers, psychologists knew about Transformational Grammar and could tell Locke from Descartes, philosophers debated what to make of innate ideas, representations and rule following based on work in linguistics and psychology, and computer scientists (especially in AI) worried about the computational properties of mental representations (think Marr and Marcus for example).  Cogsci lived at the intersection of these many disciplines, was nurtured by their cross disciplinary discussions and, for someone like me, cogsci became identified as the investigation of the structures of minds (and one day brains) using the techniques and methods of thought that each discipline brought to the feast. Boy was this exciting. Not surprisingly, the premier journal for the advancement of this vision was Cognition. Why not surprisingly? Because the founding editors, Jacques Mehler and Tom Bever, were two people who thoroughly embodied this new combined intellectual vision (and were and are two of its leading lights) and they built Cognition to reflect it.[3]

A nice way of seeing this is to read Mehler’s “farewell remarks” here. It is very explicit about what gap the journal was intended to fill:

Our aim was to change the publishing landscape in psychology and related disciplines that became part of “Cognitive Science.” …[P]sychology had turned almost exclusively into an experimental discipline with an overt disdain for theory…Linguistics had become a descriptive discipline often favoring normative or purely descriptive over theoretical approaches. Professional journals in line with this outlook generally obliged contributors to write their papers in standard format that privileged the shortest possible introductions and conclusions, methods and procedures used in experiments. Papers by non-experimental scientists say, philosophers of mind or theoretical linguists, were rarely even accepted…. (p. 7)

In service of this, the journal was the venue of lots of BIG debates concerning connectionism, the representational theory of mind, compositionality, AI models of mind, prototypes, domain specificity, computational complexity, core knowledge and much much more. In fact, Cognition did something almost miraculous: It became a truly inter-disciplinary journal, something that administrators and science bureaucrats (including publishers) love to talk about (but, it seems, often fail to appreciate when it happens).

P&A records that this Cognition now seems to be largely gone. It is no longer the journal its editors founded. There is little philosophy and little linguistics or linguistically based psychology. Nor does it seem to any longer be the venue where big ideas are thrashed out. Three illustrations: (i) the critical discussions concerning Bayesian methods in psychology have not occurred in the pages of Cognition,[4] (ii) nor have the Gallistel-like critiques of connectionist neuro-science gotten much of an airing, (iii) nor have extensive critiques of resurgent “language” empiricism (e.g. Tomasello) made an appearance. These have gotten play elsewhere, and that is a good thing, but these dogs have not barked in Cognition, and their absence is a good indicator of how much Cognition has changed. Moreover, this change is no accident. It was policy.

How so? Well, in the same issue in which Mehler penned his farewell, the new incoming editor Gerry Altmann gave his inaugural editorial (here). It’s really worth reading the Mehler and Altmann pieces side by side, if nothing else as an exercise in the sociology of science. I’ve rarely read anything that so embodies (and embraces) the Kuhnian distinction between revolutionary vs normal science. Altmann’s editorial is six pages long. After some standard boilerplate thanking Mehler & Co. for their path-breaking efforts, Altmann sets out his vision of the future. It comes in two parts.

First the ideal paper:

To be published in Cognition, articles must be robust in respect of the fit between the theory, the data and the literature in which the work is grounded. They should have a breadth to them that enables the specific research they describe to make contact with more general issues in cognition; the more explicit this contact, the greater the impact of the research beyond the confines of the specialized research community. (2)

It’s worth contrasting this ideal with the more expansive one provided by Mehler above. In Altmann’s, there is already an emphasis on “data” that was missing from Mehler’s discussion. In other words, Altmann’s ideal has an up-front experimental tilt. Data’s the lede. The vision thing is filler. To see this, read the two sentences in reverse order. The sense of what is important changes. In the actual quoted order what matters is data fit, then idea quality. Reverse the sentences and we get first idea quality and then data fit. Moreover, unlike Mehler’s pitch, what’s clear here is that Altmann does not envision papers that might be good and worthwhile even were they bereft of data to fit. It more or less assumes that the conceptual issues that were at the foundation of the cogsci revolution have all been thoroughly investigated and understood (or were largely irrelevant to begin with (dare I say, maybe even BS?)). More charitably, it assumes that if something new does rise under the cognitive sun, it will arise from the carefully fitted data. In short, the main job of the cogscientist is to see how the theories fit the facts (or vice versa). Theory alone is so your grandparent’s cognition.

The second part of the editorial reinforces this reading. The last 3 pages (i.e. half the editorial), section 3, concern “the appropriate analyses of data” (4). It’s a long discussion of what stats to use and how to use them. There is no equally long section discussing those hot topics/problems, what issues are worth addressing and why. This reinforces the conclusion that what Cognition will henceforth worry most about is data fit and experimental procedure. Sounds like the kind of journal that Mehler and Bever had hoped that Cognition would displace. Indeed, prior to Cognition’s founding, psychology had lots of the kinds of journals that the Altmann editorial aspires to. That’s precisely why Mehler and Bever started their journal. Altmann appears to think that psychology needs one more.

If this read is right, then it is not surprising that Cognition’s content profile has changed over the years. It is not merely that new topics get hot and old ones get stale. Rather, it is that what was once a journal interested in bridging disciplines, critically investigating big issues and provoking thought, “grew up” and happily assumed the role of purveyor of “normal” science. A nice well behaved journal, just like most of the others.

Last two points. Given the apparent dearth of interest in theory, it is not a surprise (to me) that work on language is less represented in the new Cognition. Anything in psychological study that takes linguistic theory seriously will be suspect to those with a great respect for psychological techniques (we don’t gather data the right way, there is a distance between competence and performance, we think that minds are not all-purpose learners, etc.). Thus taking results in linguistic theory as starting points will go against the intellectual grain where theory is less important than data points. This need not have been so. But that it is so is not surprising.

Second, there is a weird part of Altmann’s editorial concerning the “collaborative” nature of science and how this should be reflected in the “editorial structure” of the journal. Basically, it seems to be signaling a departure from past methods. I don’t really know how the Mehler era operated “editorially.” But it would not surprise me were he (and Bever) more activist editors than is commonplace. This would go far, IMO, in explaining why the old Cognition was such a great journal. It expressed the excellent taste of its leaders. This is typically true of great journals. At one time the leading figures edited journals and imposed their tastes on the field, to its benefit. Max Planck edited the journal that published Einstein’s groundbreaking (and very unconventional) papers.[5] Keynes edited the most important economics journal of his day. Mehler and Bever were intellectual leaders and Cognition reflected their excellent taste in questions and problems. It strikes me that the Altmann editorial is a none-too-subtle critique of this. It’s saying that going forward editorial decisions would be more balanced and shared. In other words, more watered down, more common denominatorish, less quirky, more fashionable. There is room for this ideal, one where the aim is to reflect the scientific “consensus.” Today, in fact, this is what most journals do. Mehler and Bever’s Cognition did not.

To end: Cognition has changed. Why? Because it wanted to. It has managed to achieve exactly what the new regime was aiming for. The old Cognition stood apart, had a broad vision and had the courage of its new ideas. The new Cognition has re-joined the fold. A good journal (no doubt). But no longer a distinctive one. It’s not where people go to see the most important ideas in cognition vigorously debated. It’s become a professional’s journal, one among many. Does it publish good papers? Sure. Is it the indispensable journal in cogsci that it once was? Not so much. IMO, that’s really too bad. However, it is educational, for now you know how to make $1,000,000. Just be sure to start off with $10,000,000.




[1] This is all premised on the assumption that the topic model methodology used in the paper accurately reflects what has been going on. This may be incorrect. However, I confess that it accurately reflects what many people I know have noted anecdotally.
[2] Is this PoMo or what? With a tinge of the Wachowskis thrown in.
[3] And you know many of the others. To name a few: Chomsky, two Fodors, Gleitman, Gallistel, Katz, Pylyshyn, Gellman, Garrett, Block, Carey, Spelke, Berwick, Marr, Marcus, a.o.
[4] E.g. Eberhardt & Danks, Brown & Love, Bowers & Davis, Marcus have all appeared in other venues. See here, here and here for some discussion and references.
[5] A friend of mine in theoretical physics once told me that he doubted that papers like Einstein’s great 1905 quartet could be published today. Even by the standards in 1905 they looked strange. Moreover, they were from a nobody working in a patent office. It’s a good thing for Einstein that Planck, one of the leading physicists of his day, was the editor of Annalen der Physik.

Monday, March 2, 2015

A defense of my sentiments

I am not always kind. In fact, wrt some views I have been (and likely will be) pretty unrelentingly negative. I will most likely indulge in ridicule, sarcasm, humor, satire, irony and petulance. I will throw verbal tantrums, make fun of the views, treat them as nonsense and generally avoid taking their claims seriously for the umpteenth time. Many will find this disrespectful (it is). Many will think that this is self-defeating (it may be, not sure). And many will think that this is not the way to treat views you don't agree with. After all, intellectual disagreement should be civilized, right? Yes, but only to a point. There really are some views that are so far gone, so misinformed, so relentlessly clueless that treating them with regard is dishonest. I've discussed such papers in the recent past. My position: Junk is junk and needs to be called out as junk.

Now, why bring this up? Well, it seems that on this matter of critical style I am not alone (yeah!!!). There is at least one other that has voiced similar opinions (and he has a Nobel Prize (well sort of a Nobel Prize. It's in Economics)). It is in another discipline, but the sentiments are the same. He defends his practice here.

Krugman notes three strategies for dealing with ideas that are popular but deeply wrongheaded and that have been continuously shown to be such. The first is to pretend that there is really an issue worth debating and that this is a disagreement among serious people. The second is to point out the wrongness again and again, quietly and politely. The third is to be nasty, snarky and loud. Krugman argues for door number 3. Why? Well, you read it. What's relevant here is that I agree with K here. There are positions that are just silly. They are based on simple confusions, which have been pointed out repeatedly and regularly ignored. There are books that are plain dumb, so misinformed and off base that pretending that they say anything worthwhile is an exercise in mass deception. Most things are NOT like this. But some are. And the problem is that many of these things get highlighted in the semi-popular intellectual press. Like K, I think that the best replies will not be pretty and polite and quiet and "serious." Snark, laughs and satire, irony, jokes and outrage. I plan to pile it on, and have fun in the process.

Sunday, March 1, 2015

Fodor on PoS arguments

One of the perks of academic life is the opportunity to learn from your colleagues. This semester I am sitting in on Jeff Lidz’s graduate intro to acquisition course. And what’s the first thing I learn? That Jerry Fodor wrote a paper in 1966 on how to exploit Poverty of Stimulus reasoning that is as good as anything I’ve ever seen (including my own wonderful stuff). The paper is called “How to learn to talk: some simple ways” and it appears here. I cannot recommend it highly enough. I want to review some of the main attractions, but really you should go and get a copy. It’s really a good read.

Fodor starts with three obvious points:

1.     Speakers have information about the structural relations “within and among the sentences of that language” (105).
2.     Some of the speaker’s information must be learned (105).
3.     The child must bring to the task of language learning  “some amount of intrinsic structure” (106).[1]

None of these starting points can be controversial. (1) just says that speakers have a grammar, (2) that some aspects of this internalized grammar are acquired on the basis of exposure to instances of that grammar and (3) that “any organism that extrapolates from experience does so on the basis of principles that are not themselves supplied by its experience” (106). Of course, whether these principles are general or language-specific is an empirical question.

Given this troika of truisms, Fodor then asks how we can start investigating the “psychological process involved in the assimilation of the syntax of a first language” (106). He notes that this process requires at least three variables (106):

4.     The observations (i.e. verbalizations in its vicinity) that the child is exposed to and actually uses
5.     The learning principles that the child uses to “organize and extrapolate these observations”
6.     The body of linguistic information that is the output of applying the principles to the data and that the child will subsequently use in speaking and understanding

We can, following standard convention, call (4) the PLD (Primary Linguistic Data), (5) UG, and (6) the G acquired. Thus, what we have here is the standard description of the problem as finding the right UG that can mediate PLD and G: UG(PLD_L) = G_L, for L a given language. Or as Fodor puts it (107): “the child’s data plus his intrinsic structure must jointly determine the linguistic information at which he arrives.”
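
The schema lends itself to a type-signature rendering. Here is a minimal sketch (in Python; every type here is an invented stand-in, since what actually goes into each of the three variables is exactly what is at issue):

```python
# Fodor's three variables rendered as a function signature.
# The contents of PLD, UG and Grammar are placeholders, not proposals.

from typing import Callable, Set

PLD = Set[str]                  # (4): the observations the child uses
Grammar = Set[str]              # (6): the linguistic information attained
UG = Callable[[PLD], Grammar]   # (5): the learning principles, UG(PLD_L) = G_L

def toy_ug(pld: PLD) -> Grammar:
    """A vacuous stand-in learner. Any real proposal must say how
    regularities in the PLD determine the *rules* of the attained G."""
    return {f"rule inferred from: {utterance}" for utterance in pld}
```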

And this, of course, has an obvious consequence:

…it is a conclusive disproof of any theory about the child’s intrinsic structure to demonstrate that a device having the structure could not learn the syntax of the language on the basis of the kind of data that the child’s verbal environment provides (107).

This just is the Poverty of Stimulus argument (PoS): a tool for investigating the properties of UG by comparing the PLD to the features of the acquired G. In Fodor’s words:

…a comparison of the child’s data with a formulation of the linguistic information necessary to speak the language the child learns permits us to estimate the nature of the complexity of the child’s intrinsic structure. If the information in the child’s data approximates the linguistic information he must master, we may assume that the role of intrinsic structure is relatively insignificant. Conversely, if the linguistic information at which the child arrives is only indirectly or abstractly related to the data provided by the child’s exposure to adult speech, we shall have to suppose that the child’s intrinsic structure is correspondingly complex.

I like Fodor’s use of the term ‘estimate.’ Note, so framed, the question is not whether there is intrinsic structure in the learner, but how “significant” it is (i.e. how much work it does in accounting for the acquired knowledge). And the measure of significance is the distance between the information the PLD provides and the G acquired.

Though Fodor doesn’t emphasize this, it means that all proposals regarding language learning must advert to Gs (i.e. to rules). After all, these are the end points of the process as Fodor describes it. And emphasizing this can have consequences. So, for example, observing that there are statistical regularities in the PLD is perhaps useful, but not enough. A specification of how the observed regularities lead to the rules acquired must be provided. In other words, as regularities are not themselves rules of G (though they can be the basis on which the learner infers/decides what rules G contains) accounts that stop at adumbrating the statistical regularities in a corpus are accounts that stop much too soon. In other words, pointing to such regularities (e.g. noting that a statistical learner can divide some set of linguistic data into two pieces) cannot be the final step in describing what the learner has learned. There must be a specification of the rule.

This really is an important point. Much stuff on statistical properties of corpora seems to take for granted that identifying a stats regularity in and of itself explains something. It doesn’t, at least in the case of language. Why? Because we know that there lurk rules in them there corpora. So though finding a regularity may in fact be very helpful (identifying that which the LAD looks for to help determine the properties of the relevant rules) it is not by itself enough. A specification of the route from the regularity to the rule is required or we have not described what is going on in the LAD.

Fodor then proceeds to flesh this tri-partite picture out. The first step is to describe the PLD. It consists of “a sample of the kinds of utterances fluent speakers of his language typically produce” (108). If entirely random (indeed, even if not), this “corpus” will be pretty noisy (i.e. slips of the tongue, utterances in different registers, false starts, etc.). The first task the child faces is to “discover regularities in these data (109).” This means ignoring some portions of the corpus and highlighting and grouping others. In other words, in language acquisition, the data is not given. It must be constructed.

This is not a small point. It is possible that some pre-sorting of the data can be done in the absence of the relevant grammatical categories and rules that are the ultimate targets of acquisition, but it is doubtful (at least to me) that these will get the child very far. If this is correct, then simply regimenting the data in a way useful to G acquisition will already require a specification of given (i.e. intrinsic) grammatical possibilities. Here’s Fodor on this (109):

His problem is to discover regularities in these data that, at the very least, can be relied upon to hold however much additional data is added. Characteristically the extrapolation takes the form of a construction of a theory that simultaneously marks the systematic similarities among the data at various levels of abstraction, permits the rejection of some of the observational data as unsystematic, and automatically provides a general characterization of the possible future observations. In the case of learning language, this theory is precisely the linguistic information at which the child arrives by applying his intrinsic information to the analysis of the corpus. In particular, this linguistic information is at the very least required to provide an abstract account of syntactic structure in terms of which systematically relevant features of the observed utterance can be discarded as violating the formation rules of the dialect, and in terms of which the notion “possible sentence of the language” can be defined (my emphasis, NH).

Nor is this enough. In addition we need principles that bridge from the provided data to the grammar. What sorts of bridges? Fodor provides some suggestions.

…many of the assertions the child hears must be true, many of the things he hears referred to must exist, many of the questions he hears must be answerable, and many of the commands he receives must be performable. Clearly the child could not learn to talk if adults talked at random. (109-110).

Thus, in addition to a specification of the given grammatical possibilities, part of the acquisition process involves correlating linguistic input with available semantic information.

This is a very complicated process, as Fodor notes, and involves the identification of a-linguistic predicates that can be put into service absent knowledge of a G (those enjoying “epistemological priority” in Chomsky’s sense). Things like UTAH can play a role here. For example, if we assume that the capacity for parsing a situation into agents and patients and recipients and themes and experiencers and…is a-linguistic, then this information can be used to map words (once identified) onto structures. In particular, if one assumes a mapping principle that says agents are external arguments and themes are internal arguments (so we can identify agenthood and themehood for at least a core number of initially available predicates) then hearing “Fido is biting Max” allows one to build a representation of this sentence such as [Fido [biting [Max]]].[2] This representation, when coupled with what UG requires of Gs (e.g. case on DPs, agreement of unvalued phi features etc.), should allow this initial seeding to generate structures something like [TP Fido is [VP Fido [V’ biting Max]]]. As Fodor notes, even if “Fido is biting Max” is observable, [TP Fido is [VP Fido [V’ biting Max]]] is not. The PoS problem then is how to go from things like the first (utterances of sentences) to things like the second (phrase markers of sentences), and for this, a specification of the allowable grammatical options seems unavoidable, with these options (not themselves given in the data) being necessary for the child to organize the data in a usable form.
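
Here is a minimal sketch of that seeding step (in Python; the bracket notation, the role labels and the UTAH-style mapping rule are illustrative assumptions, not the actual acquisition mechanism):

```python
# A sketch of the UTAH-style seeding described above: a-linguistically
# identified thematic roles are mapped onto an initial phrase marker.

def seed_structure(verb, roles):
    """Agents map to the external argument position, themes to the
    internal argument position; UG-driven requirements (case, agreement)
    then force further structure, e.g. raising the agent to subject."""
    agent, theme = roles["agent"], roles["theme"]
    vp = f"[VP {agent} [V' {verb} {theme}]]"
    return f"[TP {agent} is {vp}]"   # 'is' hardwired for this toy example

print(seed_structure("biting", {"agent": "Fido", "theme": "Max"}))
# -> [TP Fido is [VP Fido [V' biting Max]]] -- a structure that, unlike
# the utterance "Fido is biting Max", is never itself observable.
```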

One of the most useful discussions in the paper begins around p. 113. Here Fodor distinguishes two problems: (i) specification of a device that given a corpus provides the correct G for that input, and (ii) specification of a device that given a corpus only attempts “to describe its input in terms of the kinds of relations that are known to be relevant to systematic linguistic description.” (i) is the project of explaining how particular Gs are acquired. (ii) is the project of adumbrating the class of possible Gs. Fodor describes (ii) as an “intermediate problem” (113) on the way to (i) (see here for some discussion of the distinction and some current ways of exploring the bridge). Why “intermediate”? Because it is reasonable to believe that restricting the class of possible Gs (what Fodor describes as “characterizing a device that produces only non-“phony” extrapolations of corpuses” (114)) will contribute to understanding how a child settles on its specific G. As Fodor notes, there are “indefinitely many absurd hypotheses” that a given corpus is consistent with. And “whatever intrinsic structure the child brings to the language-learning situation must at least be sufficient to preclude the necessity of running through” them all.

So, one way to start addressing (i) is via (ii). Fodor also suggests another useful step (114-5):

It is worth considering the possibility that the child may bring to the language-learning situation a set of rules that takes him from the recognition of specified formal relations within and among strings in his data to specific putative characterizations of underlying structures for strings of those types. Such rules would implicitly define the space of hypotheses through which the child must search in order to arrive at the precisely correct syntactic analysis of his corpus.

So, Fodor in effect suggests two intermediate steps: a characterization of the range of possible extensions of a corpus (roughly a theory of possible Gs) and a specification of “learning rules” that specify trajectories through this space of possible Gs. This still leaves the hardest problem, how to globally order the Gs themselves (i.e. the evaluation metric). Here’s Fodor again (115):

Presumably the rules [the learning rules, NH] would have to be so formulated as to assume (1) that the number of possible analyses assigned to a given corpus is fairly small; (2) that the correct analysis (or at least a best analysis) is among these; (3) that the rules project no analysis that describes the corpus in terms of the sorts of phony properties already discussed, but that all the analyses exploit only relations of types that sometimes figure in adequate syntactic theories.

Fodor helpfully provides an illustration of such a learning rule (116-7). It maps non-contiguous string dependencies into underlying G rules that relate these surface dependencies via a movement operation. The learning rule is roughly (1):

(1)  Given a string ‘I X J’ where the forms of I and J swing together (think be/ing in ‘is kissing’) over a variable X (i.e. ‘kiss’), the learner assumes that this comes from a rule like (I,J) X → I X J.
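
A toy rendering of how such a learning rule might operate (in Python; the regex and the restriction to the single be/ing case are illustrative assumptions):

```python
# A sketch of learning rule (1): spot a discontinuous dependency I...J
# over a variable X and posit an underlying rule (I,J) X -> I X J.

import re

def posit_rule(utterance):
    """If 'is V-ing' occurs, treat be...ing as one discontinuous element
    wrapped around the verb stem X."""
    m = re.search(r"\bis (\w+?)ing\b", utterance)
    if m:
        stem = m.group(1)                       # X, e.g. 'kiss'
        return f"(is, ing) {stem} -> is {stem} ing"
    return None

print(posit_rule("John is kissing Mary"))
# -> (is, ing) kiss -> is kiss ing
# The surface regularity (co-varying 'is' and '-ing') is mapped onto a
# hypothesized underlying rule; UG then vets such hypotheses.
```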

This is but one example. Fodor notes as well that we might also find suggestions for such rules in the “techniques of substitution and classification traditionally employed in attempts to formulate linguistic discovery procedure[s].” As Fodor notes, this is not to endorse attempts to find discovery procedures. This project failed. Rather in the current more restricted context, these techniques might prove useful precisely because what we are expecting of them is more limited:

I am proposing…that the child may employ such relations as substitutability-in-frames to arrive at tentative classifications of elements and sequences of elements in his corpus and hence at tentative domains for the application of intrinsic rules for inducing base structures. Whether such a classification is retained or discarded would be contingent upon the relative simplicity of the entire system of which it forms a part.

In other words, these procedures might prove useful for finding surface statistical regularities in strings and mapping them to underlying G rules (drawn from the class of UG possible G rules) that generate these strings.

Fodor notes two important virtues of this way of seeing things. First, it allows the learner to take advantage of “distributional regularities” in the input to guide him to “tentative analyses that are required if he is to employ rules that project putative descriptions of underlying structure” (118). Second, these learning procedures need not be perfect, and more often than not might be wrong. The idea is that their frailties will be adequately compensated for by the intrinsic features of the learner (i.e. UG). In other words, he provides a nice vision of how techniques of statistical analysis of the corpus can be combined with principles of UG to (partially (remember, we still need an evaluation metric for a full story)) explain G acquisition.

There is lots more in the paper. It is even imbued with a certain twofold modesty.

First, Fodor’s outline starts by distinguishing two questions: a hard one (that he suggests we put aside for the moment) and an approachable one (that he outlines). The hard question is (in his words) “What sort of device would project a unique correct grammar on the basis of exposure to the data?” The approachable one is “What sort of device would project candidate grammars that are reasonably sensitive to the contents of the corpus and that operate only with the sorts of relations that are known to figure in linguistic descriptions?” (120). IMO, Fodor makes an excellent case for thinking that solving the approachable problem would be a good step towards answering the hard one. PoS arguments fit into this schema in that they allow us to plumb UG, which serves to specify the class of “non-phony” generalizations.  Add that to rules taking you from surface regularities to potential G analyses and you have the outlines of a project aimed at addressing the second question.

Fodor’s second modest moment comes when he acknowledges that his proposal “is surely incorrect as stated” (120). Here, IMO, the modesty is misplaced. Fodor’s proposal may be wrong in detail, but it lays out the various kinds of questions that need addressing, and some mechanisms for addressing them, lucidly and concisely.

As I noted, it’s great to sit in on others’ classes. It’s great to discover “unknown-to-you” classics. So, take a busman’s holiday. It’s fun.






[1] Fodor likes the term ‘intrinsic’ rather than ‘innate’ for he allows for the possibility that some of these principles may themselves be learned. I would also add that ‘innate’ seems to raise red flags for some reason. As Fodor notes (as has Chomsky repeatedly), there cannot reasonably be an “innateness hypothesis.” Where there is generalization from data, there must be principles licensing this generalization. The question is not whether these given principles exist, but what they are. In this sense, everyone is a nativist.
[2] Of course, if one has a rich enough UG then this will also allow a derivation of the utterance wherein case and agreement have been discharged, but this is getting ahead of ourselves. Right now, what is relevant is that some semantic information can be very useful for acquiring syntactic structure even if, as Fodor notes, syntax is not reducible to semantics.