
Wednesday, October 7, 2015

What's in UG (part 1)?

This is the first of three posts on a forthcoming Cognition paper arguing against UG. The specific argument is against the Binding Theory. But the form is intended to generalize. The paper is written by excellent linguists, which is precisely why I spend three posts exposing its weaknesses. The paper, because it will appear in Cognition, is likely to be influential. It shouldn’t be. Here’s the first of three posts explaining why.

Let’s start with some truisms: not every property of a language particular G is innate. Here’s another one: some features of G reflect innate properties of the language acquisition device (LAD). Let’s end with a truth (that should be a truism by now but is still contested by some for reasons that are barely comprehensible): some of the innate LAD structure key to acquiring a G is linguistically dedicated (i.e. due to UG, not cognitively general). These three claims should be obvious. True truisms. Sadly, they are not everywhere and always recognized as such. Not even by extremely talented linguists. I don’t know why this is so (though I will speculate towards the end of this note), but it is. Recent evidence comes from a forthcoming paper in Cognition (here) by Cole, Hermon and Yanti (CHY) on the UG status of the Binding Theory (BT).[1] The CHY argument is that BT cannot explain certain facts in a certain set of Javanese and Malay dialects. It concludes that binding cannot be innate. The very strong implication is that UG contains nothing like BT, and that even if it did it would not help explain how languages differ and how kids acquire their Gs. IMO, this implication is what got the paper into Cognition (anything that ends with the statement or implication that there is nothing special about language (i.e. Chomsky is wrong!!!) has a special preferential HOV lane in the new Cognition’s review process). Boy do I miss Jacques Mehler. Come back Jacques. Please.

Before getting into the details of CHY, let’s consider what the classical BT says.[2] It is divided into three principles and a definition of binding:

A. An anaphor must be bound in its domain
B. A pronominal cannot be bound in its domain
C. An R-expression cannot be bound

(1)  An expression E binds an expression E’ iff E c-commands E’ and E is co-indexed with E’.

We also need a definition of ‘domain’ but I leave it to the reader to pick her/his favorite one. That’s the classical BT.

What does it say? It outlines a set of relations that must hold between classes of grammatical expressions. BT-A states that if some expression is in the grammatical category ‘anaphor’ then it must have a local c-commanding binder. BT-B states that if some expression is in the category ‘pronominal’ then it cannot have a local c-commanding binder. And BT-C states, well, you know what it states: if some expression is an R-expression then it cannot have a c-commanding binder at all.

Now what does BT not say? It says nothing about which phonetically visible expressions fall into which class. It does not say that every overt expression must fall into at least one of these classes. It does not say that every G must contain expressions that fall into these classes. In fact, BT by itself says nothing at all about how a given morphologically/phonetically visible expression distributes or what licensing conditions it must enter into. In other words, by itself BT does not tell us, for example, that (2) is ungrammatical. All it says is that if ‘herself’ is an anaphor then it needs a binder. That’s it.

            (2) John likes herself

How then does BT gain empirical traction? It does so via the further assumption that reflexives in English are BT anaphors (and, additionally, that binding triggers morphologically overt agreement in English reflexives). Assuming this, ‘herself’ is subject to principle BT-A and assuming that John is masculine, herself has no binder in its domain, and so violates BT-A above. This means that the structure underlying (2) is ungrammatical and this is signaled by (2)’s unacceptability.
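To make this machinery concrete, here is a toy sketch of definition (1) and BT-A applied to (2). Everything here is invented for exposition (class and function names, the tree encoding), and the "binding domain" is crudely simplified to the whole clause; real proposals differ precisely on how domains are defined.

```python
# Toy encoding of a constituency tree, c-command, binding as in (1),
# and a BT-A check. Illustrative only; not a serious grammar formalism.

class Node:
    def __init__(self, label, children=None, index=None, category=None):
        self.label = label              # e.g. "John", "herself", "VP"
        self.children = children or []  # daughters in the tree
        self.index = index              # referential index (for co-indexation)
        self.category = category        # e.g. "anaphor" (assigned by the learner)
        self.parent = None
        for c in self.children:
            c.parent = self

def dominates(a, b):
    return any(c is b or dominates(c, b) for c in a.children)

def c_commands(a, b):
    """a c-commands b iff a's first branching ancestor dominates b (a != b)."""
    anc = a.parent
    while anc is not None and len(anc.children) < 2:
        anc = anc.parent
    return anc is not None and dominates(anc, b) and a is not b and not dominates(a, b)

def binds(a, b):
    # Definition (1): a binds b iff a c-commands b and they are co-indexed.
    return a.index is not None and a.index == b.index and c_commands(a, b)

def leaves(node):
    return [node] if not node.children else [l for c in node.children for l in leaves(c)]

def satisfies_principle_A(root, anaphor):
    # Simplification: treat the whole clause as the anaphor's binding domain.
    return any(binds(x, anaphor) for x in leaves(root))

# (2) John likes herself: 'John' bears index 1, 'herself' index 2 -- no binder.
herself = Node("herself", index=2, category="anaphor")
tree = Node("TP", [Node("John", index=1), Node("VP", [Node("likes"), herself])])
print(satisfies_principle_A(tree, herself))  # False: BT-A violated

# Co-indexed counterpart (John1 likes himself1): BT-A satisfied.
himself = Node("himself", index=1, category="anaphor")
tree2 = Node("TP", [Node("John", index=1), Node("VP", [Node("likes"), himself])])
print(satisfies_principle_A(tree2, himself))  # True
```

Note that the check only fires once ‘herself’ has been classified as an anaphor, which is exactly the point: the principle does no work until the morpheme-to-category mapping is fixed.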

As stated, there is a considerable distance between a linguistic object’s surface form and its underlying grammatical one. So what’s the empirical advantage of assuming something as abstract as the classical BT? The most important reason, IMO, is that it helps resolve a critical Poverty of Stimulus (PoS) problem. Let me explain (and I will do this slowly for CHY never actually explains what the specific PoS problem in the domain of binding is (though they allude to the problem as an important feature of their investigation), and this, IMO, allows the paper to end in intellectually unfortunate places).

As BT connoisseurs know, the distribution of overt reflexives and pronouns is quite restricted. Here is the standard data:[3]

(3) a. John1 likes herself*1/*2
b. John1 likes himself1/*2
c. John1 talked to Bill2 about himself1/2/*3
d. John1 expects Mary2 to like himself*1/*2/*3
e. John1 expects Mary2 to like herself*1/2/*3
f. John1 expects himself1/*2/*3 to like Mary2
g. John1 expects (that) he/himself*1/*2/*3 will like Mary2

If we assume that reflexives are BT-A-anaphors then we can explain all of this data. Where’s the PoS problem? Well, lots of these data concern what cannot happen. On the assumption that the ungrammatical cases in (3) are not attested in the PLD, then the fact that a typical English Language Acquisition Device (LAD, aka, kid) converges on the grammatical profile outlined in (3) must mean that this profile in part reflects intrinsic features of the LAD. For example, the fact that kids do not generalize from the acceptability of (3f) to conclude that (3g) should also be acceptable needs to be explained, and it is implausible that the LAD infers that this is an incorrect inference by inspecting unacceptable sentences like (3g), for being unacceptable they will not appear in the PLD.[4] Thus, how LADs come to converge on Gs that allow the good sentences and prevent the bad ones looks like (because it is) a standard PoS puzzle.

How does assuming that BT is part of UG solve the problem? Well, it doesn’t, not all by itself (and nobody ever thought that it could all by itself). But it radically changes it. Here’s what I mean.

If BT is part of UG then the acquisition problem facing the LAD boils down to identifying those expressions in your language that are anaphors, pronominals and R-expressions. This is not an easy task, but it is easier than figuring this out plus figuring out the data distribution in (3). In fact, as I doubt that there is any PLD able to fix the data in (3) (this is after all what the PoS problem in the binding domain consists in) and as it is obvious that any theory of binding will need to have the LAD figure out (i.e. learn) using the PLD which overt morphemes (if any) are BT anaphors/pronominals (after all, ‘himself’ is a reflexive in English but not in French and I assume that this fact must be acquired on the basis of PLD) then the best story wrt Plato’s Problem in the domain of binding is where what must obviously be learned is all that must be learned. Why? Because once I know that reflexives in English are BT anaphors subject to BT-A then I get the knowledge illustrated by the data in (3) as a UG bonus.  That’s how PoS problems are solved.[5] So, to repeat: all the LAD needs do to become binding competent is figure out which overt expressions fall into which binding categories. Do this and the rest is an epistemic freebie.

Furthermore, it’s virtually certain that the UG BT principles act as useful guides for the categorization of morphemes into the abstract categories BT trucks in (i.e. anaphor, pronominal, and R-expression). Take anaphors. If BT is part of UG it provides the LAD with some diagnostics for anaphoricity. Anaphors must have antecedents, and these must be local and high enough. This means that if the LAD hears a sentence like John scratched himself in a situation where John is indeed scratching himself then he has prima facie evidence that ‘himself’ is a reflexive (as it fits BT-A constraints). Of course, the LAD may be wrong (hence the ‘prima facie’ above). For example, if the LAD also hears pairs of sentences like John loves Mary. She loves himself too, where ‘himself’ is anaphoric to John, then the LAD has evidence that reflexives are not just subject to BT-A (i.e. they are at best ambiguous morphemes and at worst not subject to BT-A at all). So, I can see how PLD of the right sort, in conjunction with an innate UG-provided BT-A, would help with the classification of morphemes into the more abstract categories using simple PLD.[6] That’s another nice feature of an articulate UG.
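This classification logic can be cartooned as follows. The data, the helper names, and the decisiveness of a single counterexample are all my invention for illustration; this is not a claim about any actual acquisition model.

```python
# Cartoon learner: use BT-A as a diagnostic for sorting morphemes.
# A form consistently used with a local c-commanding antecedent is a
# prima facie anaphor candidate; one non-local use blocks that guess.
# Purely illustrative -- invented names and invented toy data.

def classify(observations):
    """observations: (form, has_local_ccommanding_antecedent) pairs,
    extracted from (sentence, situation) pairs in the PLD."""
    verdict = {}
    for form, local in observations:
        if form not in verdict:
            verdict[form] = "anaphor?"      # prima facie guess on first use
        if not local:
            verdict[form] = "not-anaphor"   # counterevidence is decisive
    return verdict

pld = [
    ("himself", True),   # "John scratched himself" (John scratching John)
    ("himself", True),
    ("him", True),       # a scene compatible with a local antecedent (misleading)
    ("him", False),      # "John said Mary likes him" (him = John): not local
]
print(classify(pld))  # {'himself': 'anaphor?', 'him': 'not-anaphor'}
```

The point of the cartoon is only that BT-A supplies the hypothesis to be confirmed or retracted; the PLD merely sorts forms into the innately given slots.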

Please observe: on this view of things UG is an important part of a theory of language learning. It is not itself a theory of learning. This point was made in Aspects, and is as true today as it was then. In fact, you might say that in the current climate of Bayesian excess it is the obvious conclusion to draw: UG limns the hypothesis space that the learning procedure explores. There are many current models of how UG knowledge might be incorporated in more explicit learning accounts of various flavors (see Charles Yang’s work or Jeff Lidz’s stuff for some recent general proposals and worked out examples).

Does any of this suppose that the LAD uses only attested BT patterns in learning to classify expressions? Of course not. For example, the LAD might conclude that ‘itself’ is a BT-A anaphor in English on first encountering it. Why? By generalizing from forms it has encountered before (e.g. ‘herself’, ‘themselves’). Here the generalization is guided not by UG binding properties but by the details of English morphology.  It is easy to imagine other useful learning strategies (see note 6). However, it seems likely that one way the LAD will distinguish BT-A from BT-B morphemes will be in terms of their cataphoric possibilities positively evidenced in the PLD.

So, BT as part of UG can indeed help solve a PoS problem (by simplifying what needs to be acquired) and plausibly provides guide-posts towards that classification. However, BT does not suffice to fix knowledge of binding all by itself nor did anyone ever think that it would.  Moreover, even the most rabid linguistic nativist (I know because I am one of these) is not committed to any particular pattern of surface data. To repeat, BT does not imply anything about how morphemes fall into any of the relevant categories or even if any of them do or even if there are any relevant surface categories to fall into.

With this as background, we are now ready to discuss CHY. I will do this in the next post.


[1] I have been a great admirer of both Cole and Hermon’s work for a long time. They are extremely good linguists, much better than I could ever hope to be. This paper, however, is not good at all. It’s the paper, not the people, that this post discusses.
[2] I will discuss the GB version for this is what CHY discusses. I personally believe that this version of BT is reducible to the theory of movement (A-chain dependencies actually). The story I favor looks more like the old Lees & Klima account. I hope to blog about the differences in the very near future.
[3] As GGers also know, the judgments effectively reverse if we replace the reflexive with a bound pronoun. This reflects the fact that in languages like English, reflexives and bound pronouns are (roughly) in complementary distribution. This fact results from the opposite requirements stated in BT-A and BT-B. The same effect was achieved in earlier theories of binding (e.g. Lees and Klima) by other means.
[4] From what I know, sentences like (3g) are unattested in CHILDES. Indeed, though I don’t know this, I suspect that sentences with reflexives in ECM subject position are not a dime a dozen either.
[5] I assume that I need not say that once one figures out which (if any) of the morphemes are pronominals then BT-B effects (the opposite of those in (3) with pronouns replacing reflexives) follow apace. As I need not say this, I won’t.
[6] Please note that this is simply an illustration, not a full proposal. There are many wrinkles one could add. Here’s another potential learning principle: LADs are predisposed to analyze dependencies in BT terms if this is possible. Thus the default analysis is to treat a dependency as a BT dependency. But this principle, again, is not an assumption properly part of BT. It is part of the learning theory that incorporates a UG BT.

Saturday, March 7, 2015

How to make $1,000,000

My mother once told me about an easy way to become a millionaire: start with $10 million. This seems to be advice that second generations are particularly good at following. And not only as regards inter-generational wealth transfer. Like families, journals also enjoy life cycles, with founders giving way to a next generation. And as in families, regression to the mean (i.e. the headlong rush to average) seems to be an inexorable force. However, in contrast to the rise and decline of wealth in families, the move from provocative to staid in journals is rarely catalogued. Rarely, but not never. Here is a paper (by Priva and Austerweil (P&A)) that charts the change in intellectual focus of Cognition, a (once?) very important cogsci journal. What does P&A show? Two things: (i) that the mix of papers in the journal has substantially changed. Whereas in the beginning there was a fair mix of theory and experimental papers (theory papers predominating), since the mid 2000s the mix has dramatically shifted, with experimental papers forming the bulk of academic product. Theory papers have not entirely disappeared, but they have been substantially overtaken by their experimental kin. (ii) That papers on language and development have gone from central topics of interest to a somewhat marginal presence.[1]

How surprising is this?  Let me start by discussing (i), the decline of “theory,” a favorite obsession of mine (see here). Well, first off, from one common perspective, some decline might be expected. We all know the Kuhnian trope; “revolutionary” periods of scientific inquiry where paradigms are contested, big ideas are born and old ones die out (one old fogey at a time in Planck time) give way to periods of “normal science” where the solid majestic wall of scientific accomplishment is carefully and industriously built brick by careful empirical brick. The picture offered is one in which the basic framework ideas get hashed out and then their implications are empirically fleshed out. I never really liked this way of conceptualizing matters (there is a lot of hashing and fleshing going on all the time in serious work), but I think that this picture has some appeal descriptively. Sadly, it also seems to have normative attractions, especially to next generation editors. Here’s what I mean.

Editing a journal is a lot of work. Much of it thankless. So before I go off the deep end here in a minute, let me personally thank those that take this task on, for their work and commitment is invaluable and what we think of as science could not succeed without such effort. That said, precisely because of how hard it is to do, you need to be driven (nuts?) to start a journal. What drives you? The feeling that there is something new to say but that there is no good place to say it. Moreover, not only is that something new, it must be important and new. And it is not possible to say these new important things in the current journals because the new ideas cut things up in new ways or approach problems from premises that don’t fit into the existing journalistic matrix.[2] So, at the very least, the extant venues are not congenial places to publish, and in some cases are outright hostile.

The emergence of cogsci (something that happened when I was growing up intellectually) had this feel to it. There was a self-conscious cognitive revolution, with very self-conscious revolutionaries. Furthermore, this revolution was fought on several fronts: linguistics, psychology, computer science and philosophy being the four main ones. Indeed, for a while, it was not clear where one left off and the other began. Linguists read philosophy and psychology papers, psychologists knew about Transformational Grammar and could tell Locke from Descartes, philosophers debated what to make of innate ideas, representations and rule following based on work in linguistics and psychology, and computer scientists (especially in AI) worried about the computational properties of mental representations (think Marr and Marcus for example).  Cogsci lived at the intersection of these many disciplines, was nurtured by their cross disciplinary discussions and, for someone like me, cogsci became identified as the investigation of the structures of minds (and one day brains) using the techniques and methods of thought that each discipline brought to the feast. Boy was this exciting. Not surprisingly, the premiere journal for the advancement of this vision was Cognition. Why not surprisingly? Because the founding editors, Jacques Mehler and Tom Bever, were two people that thoroughly embodied this new combined intellectual vision (and were and are two of its leading lights) and they built Cognition to reflect it.[3]

A nice way of seeing this is to read Mehler’s “farewell remarks” here. It is very explicit about what gap the journal was intended to fill:

Our aim was to change the publishing landscape in psychology and related disciplines that became part of “Cognitive Science.” …[P]sychology had turned almost exclusively into an experimental discipline with an overt disdain for theory…Linguistics had become a descriptive discipline often favoring normative or purely descriptive over theoretical approaches. Professional journals in line with this outlook generally obliged contributors to write their papers in standard format that privileged the shortest possible introductions and conclusions, methods and procedures used in experiments. Papers by non-experimental scientists say, philosophers of mind or theoretical linguists, were rarely even accepted…. (p. 7)

In service of this, the journal was the venue of lots of BIG debates concerning connectionism, the representational theory of mind, compositionality, AI models of mind, prototypes, domain specificity, computational complexity, core knowledge and much much more. In fact, Cognition did something almost miraculous: It became a truly inter-disciplinary journal, something that administrators and science bureaucrats (including publishers) love to talk about (but, it seems, often fail to appreciate when it happens).

P&A records that this Cognition now seems to be largely gone. It is no longer the journal its editors founded. There is little philosophy and little linguistics or linguistically based psychology. Nor does it seem to any longer be the venue where big ideas are thrashed out. Three illustrations: (i) the critical discussions concerning Bayesian methods in psychology have not occurred in the pages of Cognition,[4] (ii) nor have the Gallistel-like critiques of connectionist neuro-science gotten much of an airing, (iii) nor have extensive critiques of resurgent “language” empiricism (e.g. Tomasello) made an appearance. These have gotten play elsewhere, and that is a good thing, but these dogs have not barked in Cognition, and their absence is a good indicator of how much Cognition has changed. Moreover, this change is no accident. It was policy.

How so? Well, in the same issue that Mehler penned his farewell the new incoming editor Gerry Altmann gave his inaugural editorial (here). It’s really worth reading the Mehler and Altmann pieces side by side, if nothing else as an exercise in the sociology of science. I’ve rarely read anything that so embodies (and embraces) the Kuhnian distinction between revolutionary vs normal science. Altmann’s editorial is six pages long. After some standard boilerplate thanking Mehler & Co. for their path-breaking efforts, Altmann sets out his vision of the future. It comes in two parts.

First the ideal paper:

To be published in Cognition, articles must be robust in respect of the fit between the theory, the data and the literature in which the work is grounded. They should have a breadth to them that enables the specific research they describe to make contact with more general issues in cognition; the more explicit this contact, the greater the impact of the research beyond the confines of the specialized research community. (2)

It’s worth contrasting this ideal with the more expansive one provided by Mehler above. In Altmann’s, there is already an emphasis on “data” that was missing from Mehler’s discussion. In other words, Altmann’s ideal has an up front experimental tilt. Data’s the lede. The vision thing is filler. To see this, read the two sentences in reverse order. The sense of what is important changes. In the actual quoted order what matters is data fit then idea quality. Reverse the sentences and we get first idea quality and then data fit. Moreover, unlike Mehler’s pitch, what’s clear here is that Altmann does not envision papers that might be good and worthwhile even were they bereft of data to fit. It more or less assumes that the conceptual issues that were at the foundation of the cogsci revolution have all been thoroughly investigated and understood (or were largely irrelevant to begin with (dare I say, maybe even BS?)). More charitably, it assumes that if something new does rise under the cognitive sun, it will arise from the carefully fitted data. In short, the main job of the cogscientist is to see how the theories fit the facts (or vice versa). Theory alone is so your grandparent’s cognition.

The second part of the editorial reinforces this reading. The last 3 pages (i.e. half the editorial), section 3, concerns “the appropriate analyses of data” (4). It’s a long discussion of what stats to use and how to use them. There is no equally long section discussing hot topics/problems, what issues are worth addressing and why. This reinforces the conclusion that what Cognition will henceforth worry most about is data fit and experimental procedure. Sounds like the kind of journal that Mehler and Bever had hoped that Cognition would displace. Indeed, prior to Cognition’s founding, psychology had lots of the kinds of journals that the Altmann editorial aspires to. That’s precisely why Mehler and Bever started their journal. Altmann appears to think that psychology needs one more.

If this read is right, then it is not surprising that Cognition’s content profile has changed over the years. It is not merely that new topics get hot and old ones get stale. Rather, it is that what was once a journal interested in bridging disciplines, critically investigating big issues and provoking thought, “grew up” and happily assumed the role of purveyor of “normal” science. A nice well behaved journal, just like most of the others.

Last two points. Given the apparent dearth of interest in theory, it is not a surprise (to me) that work on language is less represented in the new Cognition. Anything that takes linguistic theory seriously in psychological study will be suspect to those with a great respect for psychological techniques (we don’t gather data the right way, there is a distance between competence and performance, we think that minds are not all purpose learners etc.). Thus taking results in linguistic theory as starting points will go against the intellectual grain where theory is less important than data points. This need not have been so. But that it is so is not surprising.

Second, there is a weird part of Altmann’s editorial concerning the “collaborative” nature of science and how this should be reflected in the “editorial structure” of the journal. Basically, it seems to be signaling a departure from past methods. I don’t really know how the Mehler era operated “editorially.” But it would not surprise me were he (and Bever) more activist editors than is commonplace. This would go far, IMO, in explaining why the old Cognition was such a great journal. It expressed the excellent taste of its leaders. This is typically true of great journals. At one time the leading figures edited journals and imposed their tastes on the field, to its benefit. Max Planck edited the journals that published Einstein’s groundbreaking (and very unconventional) papers.[5] Keynes edited the most important economics journal of his day. Mehler and Bever were intellectual leaders and Cognition reflected their excellent taste in questions and problems. It strikes me that the Altmann editorial is a none too subtle critique of this. It’s saying that going forward editorial decisions would be more balanced and shared. In other words, more watered down, more common denominatorish, less quirky, more fashionable. There is room for this ideal, one where the aim is to reflect the scientific “consensus.” Today, in fact, this is what most journals do. Mehler and Bever’s Cognition did not.

To end: Cognition has changed. Why? Because it wanted to. It has managed to achieve exactly what the new regime was aiming for. The old Cognition stood apart, had a broad vision and had the courage of its new ideas. The new Cognition has re-joined the fold. A good journal (no doubt). But no longer a distinctive one. It’s not where people go to see the most important ideas in cognition vigorously debated. It’s become a professional’s journal, one among many. Does it publish good papers? Sure. Is it the indispensable journal in cogsci that it once was? Not so much. IMO, that’s really too bad. However, it is educational, for now you know how to make $1,000,000. Just be sure to start off with $10,000,000.




[1] This is all premised on the assumption that the topic model methodology used in the paper accurately reflects what has been going on. This may be incorrect. However, I confess that it accurately reflects what many people I know have noted anecdotally.
[2] Is this PoMo or what? With a tinge of the Wachowskis thrown in.
[3] And you know many of the others. To name a few: Chomsky, two Fodors, Gleitman, Gallistel, Katz, Pylyshyn, Gellman, Garrett, Block, Carey, Spelke, Berwick, Marr, Marcus, a.o.
[4] E.g. Eberhardt & Danks, Brown & Love, Bowers & Davis, Marcus have all appeared in other venues. See here, here and here for some discussion and references.
[5] A friend of mine in theoretical physics once told me that he doubted that papers like Einstein’s great 1905 quartet could be published today. Even by the standards in 1905 they looked strange. Moreover, they were from a nobody working in a patent office. It’s a good thing for Einstein that Planck, one of the leading physicists of his day, was the editor of Annalen der Physik.

Saturday, June 28, 2014

Thomas Can't Into Cognitive Modelling

This week I got to present at CMCL 2014, a workshop on computational models of language-related cognition, i.e. processing, acquisition, discourse representation, and so on. My talk was about the connection between Stabler's top-down parser for Minimalist grammars and the processing of relative clauses, something I've been working on for a while now with Bradley Marcinek, a student of mine. Thanks to Greg Kobele, John Hale and Sabrina Gerth, we already know that the predictions of this parser depend on one's syntactic analysis in interesting ways, so we wanted to extend their line of work to some other well-known phenomena. Long story short, our results are rather messy and it will be a while until we can get this idea truly off the ground.

That is why I won't blog about this research quite yet (except for the shameless self-promotion above) and instead focus on the talks I heard, rather than the one I gave. Don't get me wrong, many of them were very interesting to me on a technical level; some of them even pierced my 90s habitus of acerbic cynicism and got me a bit excited. Quite generally, a fun time was had by all. But the talks made me aware of a gapping hole in my understanding of the field, a hole that one of you (I believe we have some readers with serious modelling chops) may be able to plug for me: Just what is the point of cognitive modelling?