Friday, November 30, 2018

An addendum to the last post

Here is an amusing addendum to the last post. The NYT ran an article (here) discussing the edge that China will have over the rest of the world in AI research. What’s the edge? Cheap labor! Now you might find this odd. After all, AI is supposed to be that which makes labor superfluous (you know, the machines are coming and they are going to take all of the jobs). So, why should cheap labor give the Chinese such an advantage? Easy. Without it you cannot hand-annotate and hand-curate all the data that is getting sucked up. And without that, there is no intelligence, artificial or otherwise. Here is what cheap labor gets you:

Inside, Hou Xiameng runs a company that helps artificial intelligence make sense of the world. Two dozen young people go through photos and videos, labeling just about everything they see. That’s a car. That’s a traffic light. That’s bread, that’s milk, that’s chocolate. That’s what it looks like when a person walks.

As a very perceptive labeler put it (odd that this observation has not been made by those computer scientists pushing Deep Learning and Big Data. I guess that nothing dampens critical awareness more completely than the kaching of the cash register):

“I used to think the machines are geniuses,” Ms. Hou, 24, said. “Now I know we’re the reason for their genius.”

Right now, this is the state of the art. All discussion of moving to unsupervised, uncurated learning is, at this time, idle talk. The money is in labeled data that uses the same old methods we long ago understood would not be useful for understanding either human or animal cognition. What makes humans and animals different from machines is what they come to the learning problem with: lots and lots of pre-packaged innate knowledge. Once we have a handle on what this is, we can begin to ask how it works and how to put it into machines. This is the hard problem. Sadly, much of AI seems to ignore it.

Monday, November 26, 2018

What's innate?

Johan Bolhuis sent me a copy of a recent comment in TiCS (Priors in animal and artificial intelligence (henceforth Priors)) on the utility of rich innate priors in cognition, both in actual animals and artificially in machines. Following Pinker, Priors frames the issue in terms of the blank slate hypothesis (BSH) (tabula rasa for you Latin lovers). It puts the issue as follows (963):

Empiricists and nativists have clashed for centuries in understanding the architecture of the mind: the former as a tabula rasa, and the latter as a system designed prior to experience…The question, summarized in the debate between the nativist Gary Marcus and the pioneer of machine learning, Yann LeCun, is the following: shall we search for a unitary general learning principle able to flexibly adapt to all conditions, including novel ones, or structure artificial minds with driving assumptions, or priors, that orient learning and improve acquisition speed by imposing limiting biases?

Marcus’ paper (here) (whose philosophical framework Priors uses as backdrop for its more particular discussion) relates BSH to the old innateness question, which it contends revolves around “trying to reduce the amount of innate machinery in a given system” (1). I want to discuss this way of putting things, and I will be a bit critical. But before diving in, I want to say that I really enjoyed both papers and I believe that they are very useful additions to the current discussion. They both make excellent points and I agree with almost all their content. However, I think that the way they framed the relevant issue, in terms of innateness and blank slates, is misleading and concedes too much to the Empiricist (E) side of the debate.

My point will be a simple one: the relevant question is not how much innate machinery, but what kind of innate machinery. As Chomsky and Quine observed a long time ago, everyone who discusses learning and cognition is waist deep in a lot of innate machinery. The reason is that learning without a learning mechanism is impossible. And if one has a learning mechanism in terms of which learning occurs, then that learning mechanism is not itself learned. And if it is not learned then it is innate. Or, to put this more simply, the mechanism that allows for learning is a precondition for learning, and preconditions are fixed prior to that which they precondition. Hence all features of the learning mechanism are innate in the simple sense of not themselves being learned. This is a simple logical point, and all who discuss these issues are aware of it. So the question is not, never has been, and never could have been “is there innate structure?” Rather the question is, always has been, and always will be “what structure is innate?”

Why is putting things in this way important? Because arguing about the amount of innate structure gives Eists the argumentative edge. Ockham-like considerations will always favor using less machinery rather than more, all things being equal. So putting things as the Marcus paper and Priors do is to say that the Eist position is methodologically preferable to the Rationalist (R) one. Putting things in terms of what kinds of innate machinery are required (to solve a given learning problem), rather than how much, considerably levels the methodological playing field. If both E and R conceptions require boatloads of innate machinery to get anywhere, then the question moves from whether innate structure is needed (as the BSH slyly implicates) to what sort is needed (which is the serious empirical question).

This said, let’s zero in on some specifics. What makes an approach Eist? There are two basic ingredients. The first important ingredient is associationism (Aism). This is the glue that holds “ideas” together. However, this is not all. There is a second important ingredient: perceptualism (Pism). Pism is the idea that all mental contents are effectively reducible to perceptual contents, which are themselves effectively reducible to sensory concepts (sensationalism (Sism)). 

This pair of claims lies at the center of Eist theories of mind. And the notion of the blank slate emphasizes the second. We find this reflected in a famous Eish slogan: “there is nothing in the mind that is not first in the senses.” The Eish conception unites P/Sism with Aism to get to the conclusion that all mental contents are either primitive sensory/perceptual “ideas” or constructed out of sensory/perceptual input via association. The problems with Eism arise from both sources and revolve around two claims: the denial that mental concepts interrelate other than by association (they have no further interesting logical structure) and that all ideas are congeries of sensory perceptions. These two assumptions combine to provide a strong environmentalist approach to cognition wherein the structure of the environment largely shapes the contents of the mind through the probabilistic distributions of sensory/perceptual inputs. Rism denies both claims. It argues that association is not the fundamental conceptual glue that relates mental contents and denies that all complex mental contents are combinations of sensory/perceptual inputs. To wax metaphorical, for Eists, only sensation can write on our mental blank slates and the greater the sensations the more vivid the images that appear. Rists think this is empirical bunk.

Note that this combination of cognitive assumptions has a third property. Given Eist assumptions, cognition is general purpose. If cognition is nothing but tracking the frequencies of sensory inputs then all cognition is of a piece, the only difference being the sensations/perceptions being tracked. There is no modularity or domain specificity beyond that afforded by the different sensory mechanisms, nor rules of “combination” beyond those tracking the differential exposure to some sensations over others. Thus for Eists, the domain generality of cognition is not an additional assumption. It is the consequence of Eism’s two foundational premises.

Now, we actually know today that Eism will not work (actually, we knew this way back when). In particular, Pism/Sism was very thoroughly explored at the turn of the 20th century and shown to be hopeless. There were vigorous attempts to reduce our conceptual contents to sense data. And these efforts completely failed! Pism/Sism, in other words, is a hopeless position. So hopeless, in fact, that the only place it still survives is in AI and certain parts of psychology. Deep Learning (DL), it seems, is the latest incarnation of P/Sism+Aism. Both Priors and Marcus elegantly debunk DL’s inflated pretensions by showing both that the assumptions are biologically untenable and that they are adhered to more in the PR discussions than in the practice of the parade cases meant to illustrate successful AI learners.[1] I refer you to their useful discussions. See especially their excellent points concerning how much actual learning in humans and animals is based on very little input (i.e. from a very limited number of examples). DL requires Big Data (BD) to be even remotely plausible. And this data must be quite carefully curated (i.e. supervised) to be of use. Both papers make the obvious point that much biological learning is done from very few example cases (sparse data) and is unsupervised (hence not curated). This makes most of what DLers have “discovered” largely irrelevant as models for biologically plausible theories of cognition. Sadly, the two papers do not come right out and say this, though they hint at it furiously. It seems that the political power of DL is such that frankly saying that this emperor is hardly clothed will not be well rewarded.[2] Hence, though the papers make this point, it is largely done in a way that bends over backwards to emphasize the virtues of DL and not appear to be critically shrill. IMO, there is a cost to this politeness.

One last point and I stop. Priors makes a cute observation, at least one that I never considered. Eists of the DL and connectionist variety love plasticity. They want flexible minds/brains because these are what the combination of Aism and P/Sism entails. Priors makes the nice observation that if flexibility is understood as plasticity, then plasticity is something that biology only values in small doses. Brains cease being plastic after a shortish critical period. This, Priors notes, implies that there is a biological cost to being relentlessly open-minded. You can see why I might positively reverberate to this observation.

Ok, nuff said. The two papers are very good and are shortish as well. Priors is perfect for anyone wanting to have a non-human case to illustrate Rish themes in a class on language and mind. The Marcus piece is part of a series of excellent papers he has been putting out reviewing the hype behind DL and taking it down several pegs (though, again, I wish he were less charitable). From these papers and the references they cite, it strikes me that the hype that has surrounded DL is starting to wear thin. Call me a hopeless romantic, but maybe when the overheated PR dies down and it becomes clear that the problems the latest round of Eish accounts solved were not the central problems in cognition, we can return to some serious science.

[1] An aside: there is more than a passing similarity between the old attempts to reduce mental contents to sense data and the current fad in DL of trying to understand everything in terms of pixel distributional properties. History seems to constantly repeat; the first time as insight, the second time as a long con. Not surprisingly, the attempt to extract the notion “object” or “cat” from pixel distributions is no more successful today than were prior attempts to squeeze such notions from sense data. Ditto with algebraic structure from associations. It is really useful to appreciate how long we have known that Eism cannot be a serious basis for cognition. The failures Priors and Marcus observe are not new ones, just the same old failures gussied up in technically spiffier garb.

[2] Some influential voices are becoming far more critical. Shalizi (here) notes that much of DL is simply a repackaging of perceptrons (“extracting features from the environment which work in that environment to make a behaviorally-relevant classification or prediction or immediate action”) and will have roughly the same limitations that perceptrons had (viz. “This sort of perception is fast, automatic, and tuned to very, very particular features of the environment… They generalize to more data from their training environment, but not to new environments…”). Shalizi, like Marcus and Priors, locates the problems with these systems in their lack of “abstract, compositional, combinatorial understanding we (and other animals) show in manipulating our environment, in planning, in social interaction, and in the structure of language.”
In other words, DL is basically the same old stuff repackaged for the credulous “smart” technophilic shopper. You cannot keep selling perceptrons, so you repackage them and sell them as Deep Learning (the ‘deep’ here is, no doubt, the contribution of the marketing department). The fact is that the same stuff that was problematic before is problematic still. There is no way to “abstract” out compositional and combinatorial principles and structures from devices aimed to track “particular features of the environment.”

Monday, November 12, 2018

Guest post by William Matchin on SinFonIJA part 2

Last post, I discussed the main thrust of my comments at the roundtable. Now I turn to some of the points raised by the keynote speakers and my thoughts on them.

Cedric Boeckx

The standard perception of an outsider looking at formal linguistics (e.g., a psychologist or neuroscientist) is that formal linguists, rather than pursuing fundamental questions about the nature of the mind/brain, are really philologists interested in the history and grammar of particular languages divorced from psychology and biology. Cedric’s talk and comments throughout the conference explicitly made this point – that linguists have overwhelmingly focused on the analysis of particular constructions and/or particular languages using the tools of generative grammar, but not really addressing the foundational questions driving the field: the basic architecture of the faculty of language (Humboldt’s problem), the relative contributions of its genetic and environmental components (Plato’s problem), how it is implemented in the human brain (Broca’s problem), and how it evolved (Darwin’s problem). Readers of this blog are certainly aware that Norbert has repeatedly made this point (i.e. the linguist/languist distinction). Cedric’s presence at the conference amounted to being an evangelist for biolinguistics and a return to these core questions, with dire warnings about the future of the field (and job prospects) if linguists do not.

There is a lot of truth in these statements. From my perspective, I rarely see somebody in linguistics (or psycholinguistics) explain to me why exactly I as a neuroscientist should care about this particular analysis or experiment regarding some construction in some language. Neuroscientists and those studying aphasia or other language disorders repeatedly make this point. Of course there are relevant connections, in that analyses of particular phenomena (should) inform the general theory, which is ultimately a theory of biological architecture. When aspiring to communicate with other cognitive scientists, these connections need to be made explicit. Cedric would have the philology essentially stop wholesale – while I do not agree with this extreme view, I do agree that much of this work seems unnecessary to the broader goal of advancing the theory, especially in absence of even attempts to make these connections.

I would like to add two points.

First, this issue is hardly unique to linguistics. Largely the same issues hold in other fields (e.g., why exactly does it matter which lesion distributions are associated with a particular syndrome, either for patient care or theoretical development?). But there isn’t the convenient languistics/philology label to apply to those that seem distracted from the central questions at hand. In some ways this is a credit to linguistics – the philosophical foundations of the field have been laid out explicitly enough to point out the difference between philology and linguistics. Neuroscientists are ostensibly interested in asking fundamental questions about how the brain works. But I think this widespread myopia arises in part because of sociology – we are constantly interacting with our peers, feeling compelled to react to (and express sympathy for) what they are doing, and it is far easier for a new graduate student to perform a neuroimaging experiment and publish a paper following up on what somebody else has done than to reflect on the basic nature of the mind and brain. For good or bad, our field rests on interaction and personal connections: being invited to conferences, having reviewers knowledgeable of our work and sympathetic to it, asking for letters of recommendation. There are few worldly incentives for pursuing the big questions, and this cuts across all of the sciences.

Second, at least within the GG community (as exemplified by SinFonIJA), people really do seem to care about the fundamental questions. The people who gave keynote lectures whose careers are devoted to linguistic analysis within the generative tradition (Ian Roberts, Lanko Marušič, and Tobias Scheer) all presented on topics about the fundamental questions listed above. Every senior linguist I talked to at the conference clearly had thought and reflected deeply on the fundamental issues. In fact, one linguist mentioned to me that the lack of focus on fundamentals is not from lack of interest but rather distraction (in the manner described above). Many of the young people at the conference buzzed Cedric and me with questions and seemed quite interested to hear what we had to say. Apparently these issues are at the forefront of their mind – and it’s always telling to get a sense of what the young people are thinking, because they are the future of the field (that is, if they get jobs).

Overall, I agree with much of what Cedric said, and there were quite similar currents in both of our talks. The main question that I think linguists should be asking themselves is this: what do I really care about? Do I care about the philosophical problems outlined above? Or do I care about analyses of specific languages, i.e. philology? If the answer is the former, then I very much recommend thinking of ways to help reconnect linguistics with the other cognitive sciences. If the answer is the latter, then I don’t have much to say to you except, to quote Cedric, “good luck”.

I love languages – I see the appeal and the comfort of leaving the heavy theoretical work to others. I have spent much of my life learning languages and learning about their corresponding cultures. But that is not nearly enough to sustain the field of linguistics much further into the modern age, in which funding for the humanities is being dramatically cut in the U.S., and in which other scientists (potential collaborators who will still be funded) are very much disenchanted with generative grammar.

One last comment about Cedric’s talk. While we agree on the point above, we disagree about what is currently being done in other fields like language evolution. His perspective seems to be that people are making real progress, and my perspective echoes Chomsky – skepticism of much of this work, particularly with respect to evolution. I think that Cedric has a bit of a “grass is greener” syndrome. However, I do not mean to be completely pessimistic, and the work by people like Alec Marantz, Stanislas Dehaene, Christophe Pallier, Greg Hickok, John Hale, Jonathan Brennan, and others presents promising connections between neurobiology and linguistic theory. As reviewed here on FoL, Randy Gallistel has been highlighting interesting findings in cellular neuroscience that inform us about how neurons actually store representations and perform computations over them.

Ian Roberts

As I mentioned above, Ian Roberts gave an interesting keynote lecture highlighting the viability of a single cycle that underlies all of syntactic derivation. It is this sort of work, reinforcing the basic components of the competence model from a forest (rather than a trees) perspective, that presents a tantalizing opportunity for asking how such properties are implemented in the brain.

However, I was somewhat disappointed in Ian’s response to my presentation calling for such an integration. He pointed out the viability of a purely Platonic view of formal linguistics; that is, that the study of linguistics can be perfectly carried out without concern for integration with biology (to be clear: he did not endorse this view, but merely pointed out its viability). He also seemed to dismiss the absence of invitations to formal linguists from the other cognitive sciences as reflecting flaws in those fields/individuals. The underlying thread was something like: “we’re doing perfectly fine, thank you”.

I do not disagree. One can do platonic linguistics, and cognitive scientists are unfairly dismissive of formal linguistics. But this misses the point (although perhaps not Cedric’s). The point was: assuming we want integration of formal linguistics with other fields (and I think almost everyone agrees that we do, at least given my impressions from the conference), one critical obstacle to this integration, that linguists are in a very good position to address, is how competence relates to performance (or, the grammar-parser relation) on a mechanistic level.

Ian is clearly very thoughtful. But I was worried by his response, because it means he is missing the writing on the wall. Perhaps this is in part because the situation is better in Europe. The cognitive revolution was born in the United States, and I believe that it is also the place of its potential deathbed. The signs may be clearer here than in Europe. Altogether, if the orcs are about to invade your homeland, threatening to rape and pillage, you don’t keep doing what you’re doing while noting that the situation isn’t your fault because you were always nice to the orcs. Instead, you prepare to fight the orcs. And if there is one thing that Cedric and I heartily agree on, it is that the orcs are here.

Tobias Scheer

Tobias’s main point at the roundtable was that there is still a lot of work to do on the competence model before it can be expanded into a performance model of online processing. This is perhaps the best counter to my argument for working on the performance model – that it’s a good idea, but that there are practical limitations of the sort Chomsky outlined in Aspects that have not gone away.

As I discussed in my previous blog post, I often find myself remarking how important this issue is – language behavior is so complicated, and if you add on the complexities of neuroimaging experiments, it is hard to really make anything coherent out of it. The competence-performance distinction has been invaluable to making progress.

The main question is whether or not it is possible to make progress in working on performance. With respect to language, the competence-performance distinction is an absolutely necessary abstraction that allows for focus on a small set of possible data that still allows for analyzing a wide range of constructions across the world’s languages and for theoretical development to occur. The disagreement concerns whether or not it is possible at this time to move beyond this particular abstraction to other, slightly less focused, abstractions, such as a model of real-time language processing that can account for simple constructions and the acquisition of such a model.

This is an empirical assessment. It’s pretty much impossible to understand what the hell people do mechanistically when they perceive a garden-path sentence (much less interpret a neuroimaging experiment on garden-path sentences). But, in my view, it is possible to largely preserve the virtues of the competence-performance distinction with respect to limiting the relevant set of data by only aspiring to develop a performance model for fairly simple cases, such as simple transitive active and passive sentences.

In addition, there might be something (a lot?) to be gained from thinking in real time that could explain troublesome phenomena from the traditional standpoint of linguistic theory. For instance, I don’t know of any better approach to the constituency conflicts Colin Phillips pointed out in his 1996 dissertation and 2003 LI paper than the idea that sentences are built incrementally, which naturally accounts for the conflict of constituency tests[1]. There may be many more such phenomena that could be addressed from the standpoint of real-time processing that help simplify the competence model itself. How do you know until you try?

Ianthi Maria Tsimpli

Ianthi’s talk at the conference presented data illustrating differences in language behavior between monolingual and bilingual speakers.

Ianthi Tsimpli’s comments at the roundtable and much of her other work points out that there are really fantastic opportunities to make more direct connections between work on developmental disabilities and theories of universal grammar, i.e. the genetic contribution of the language faculty. Ianthi was one of the main scientists who studied Christopher, the savant who was severely intellectually disabled yet able to learn many languages fluently. She summarized for me some of the major findings on Christopher regarding British Sign Language (BSL), which I believe illustrate the biological (and likely genetic) autonomy of language from other aspects of cognition.

There are three main facts.

(1) Christopher did essentially as well as L2 learners in learning BSL, despite his severe mental handicap. This is important because it reinforces the notion (that I believe is not really in need of reinforcing) that sign languages are like spoken languages in all the relevant psychological aspects, including the distinction between language and other cognitive domains, but more importantly that language is something different from other things, potentially with distinct biological underpinnings.

(2) The one domain where Christopher struggled is on classifier constructions, which rely heavily on visual-spatial abilities that Christopher is already impaired on. This is not very interesting except for the fact that it clarifies the nature of what may seem like important differences between speech and sign – if you cannot process certain rapid formant transitions because of a disorder of your auditory periphery, you probably will not learn consonant speech sounds very well, but this is merely a barrier to processing speech, not an indicator that the deeper levels of organization between speech and sign are fundamentally different. The same with classifiers – they are exciting properties of sign language that clearly rely heavily on the visual-manual nature of sign languages, but this does not mean much about their more abstract organization in the language system.

Again – there is something about language itself that is not reducible to sensory-motor externalization of language.

(3) Christopher essentially ignored the iconic properties of signs when acquiring the lexicon, whereas hearing L2 learners are quite sensitive to them. This underscores that language acquisition, at its core, really doesn’t care about iconicity, and indicates that while the study of iconicity may be interesting to some, it is orthogonal to the essential properties of language, which are its abstractness and arbitrariness. This fact has been clearly laid out for decades (see e.g. Bellugi and Klima’s paper “Two faces of sign: Iconic and abstract”), but again, is reinforced by Christopher’s remarkable ability to learn BSL effectively while ignoring its iconic elements.

In the roundtable discussion, Ianthi pointed out that there are problems waiting to be addressed that would greatly benefit from the insights of generative grammar. To me, these are the golden opportunities – there are a wide range of disorders of language use, and working on them presents opportunities for collaboration with biologically-oriented fields that generally have much greater funding than linguistics (i.e., language disorders, neuroscience, genetics). I recommend the book chapter she wrote with Maria Kambanaros and Kleanthes Grohmann[2] (edited by Ian Roberts), which discusses in more detail some of this work and highlights the possibilities for fruitful interaction.

Franc (Lanko) Marušič

Lanko Marušič’s talk reported behavioral experiments attempting to explain the roughly universal adjective ordering preferences addressed in the cartographic approach of Cinque. The idea was that if preferences for certain features (such as color, size, shape) come from non-linguistic cognition, then one should find the same preferences in non-linguistic cognition. Thus he reported behavioral experiments that attempted to map out the salience of these features for experimental subjects, ascertaining whether the results agreed with the linguistic ordering preferences. The experiments themselves were a bit complicated and difficult to interpret, as there were many possible confounding variables that the experimenters attempted to grapple with (again, illustrating the deep pitfalls of investigating performance generally). However, this was an experiment that certainly was interesting to me and is exactly the type of thing to interest non-linguists in linguistics.

Outside of the conference, I spent time talking with Lanko. In our informal conversations, he mentioned to me the possibility of attempting to localize syntactic representations in the brain by building off our understanding of the interfaces that syntax must deal with: the conceptual-intentional (CI) and sensory-motor (SM) interfaces. I.e., if language is accurately captured by the Y-model, then syntax should be in the middle of CI and SM. This is a great idea, and happens to be a cornerstone of the model I am currently developing with Greg Hickok. This illustrates that there can in fact be value for neuroscience taken from linguistics – not at the level of a particular construction, but at the high level of broader components of linguistic theory, like the Y-model, cyclicity, etc.

Closing thoughts

Most of the conference presentations were not concerned with the questions I addressed above. Most posters and talks addressed standard questions discussed in linguistics conferences – and these presentations were, for the most part, excellent. I was very happy to be part of this and to remind myself of the high quality of work in linguistics. One of the virtues of linguistics is that it is highly detailed and reflects the health of a mature field – one does not need a general introduction to acceptability judgments, the competence/performance distinction, etc. to understand the talk. These shared underlying assumptions allow for very efficient presentations and discussions as well as progress, at least in terms of analyses of specific constructions.

In some sense, as discussed above, the crisis that I (and Cedric) perceive in linguistics, in the context of the other cognitive sciences, is unfair to linguistics – other fields suffer from the same problems, and there are plenty of healthy aspects to the field. Linguists in general seem more thoughtful about the underlying philosophical issues of science than those in other fields, as evidenced by my conversations with the conference attendees (and particularly keynote speakers).

On the other hand – the crisis is obviously there, deserved or not. I spend much time talking to linguists about the job prospects for graduate students. It seems to me that what linguistics is doing to address this issue is to shift from theoretical focus to working on issues that have a more superficial appeal to other fields, or that can provide training for jobs outside of linguistics (i.e., computational modeling). This might be helpful for getting jobs, but I worry that it essentially hastens the abandonment of the core questions of interest underlying generative grammar: Humboldt’s problem, Plato’s problem, Broca’s problem, Darwin’s problem.

In my view, there is a fantastic opportunity at hand to preserve both these core philosophical problems and jobs: working towards a performance model. This project, broadly construed, could include work along many dimensions, including much of the kind of work currently being done: understanding the appropriate analysis of constructions/linguistic phenomena from the vantage point of incremental derivation, in the style of Phillips’s (2003) analysis of constituency conflicts. With respect to my world, it could mean developing a more realistic understanding of how linguistic theory relates to neurons and genes. In between, it could involve developing a plausible parser/producer that incorporates a syntactic theory (work that Shota Momma is currently pursuing).

At any rate, that’s my two cents. SinFonIJA was a lot of fun, and I cannot thank Marta, Ewa, and Mateusz enough for inviting me and being quite generous in their accommodations. At some point in the near future, conference proceedings will be published in the Journal of Polish Linguistics (edited by Ewa Willim) – stay tuned for what I hope is a very interesting set of papers.

[1] Phillips, C. (1996). Order and structure (Doctoral dissertation, Massachusetts Institute of Technology). Phillips, C. (2003). Linear order and constituency. Linguistic Inquiry, 34(1), 37-90.

Thursday, November 8, 2018

Guest Post by William Matchin: Reflections on SinFonIJA 11, Part 1

Before posting this, let me encourage others to do what William is doing here. Send me stuff to post on FoL. The fact that I have become fat and lazy does not mean that others need go to seed as well. This is the first of two posts on the conference. 


I thought I would write up my thoughts about SinFonIJA 11 in Krakow, Poland, which just finished this past weekend. It was organized by three professors in the Dept. of English Studies at Jagiellonian University in Krakow: Marta Ruda, a former visiting scholar at UMD Linguistics, Mateusz Urban, and Ewa Willim, who was Howard Lasnik’s student and the recipient of the infamous manuscript ‘On the Nature of Proper Government’[1]. All three of them were gracious hosts, and the conference was very well organized, informative, and fun. SinFonIJA is a regional[2] conference on formal linguistic analysis focusing on syntax and phonology, but as a neuroscientist, I felt quite welcome and many of the attendees expressed interest in my work. Kraków is a beautiful city and definitely worth visiting, to boot; if you ever visit, make sure to see the Wieliczka salt mine[3].

I suppose my sense of welcome was helped by the fact that the main theme of the conference was “Theoretical Linguistics within Cognitive science” – I was invited to chair a round table discussion on how linguistics is getting on with the other cognitive sciences these days. Linguistics was a founding member of the modern cognitive sciences during the cognitive revolution of the 1950s and 1960s – perhaps the founding member, with Chomsky’s work in Generative Grammar stimulating interest in deeper, abstract properties of the mind and articulating an alternative to the dominant behaviorist view of language. Marta was the key instigator of this theme – it was a frequent topic of discussion between us while we were both at the UMD Linguistics dept., which has a unique capacity to bridge the gaps between formal linguistic theory and other fields of cognitive science (e.g., acquisition, psycholinguistics, neuroscience). In their own talks, the invited keynote speakers comprising the round table addressed foundational questions underlying linguistic theory as well as the relation between formal linguistics and the cognitive sciences. The main part of this post will reflect on this topic and the round table discussion, but before that I’d like to discuss Zheng Shen’s talk, which highlighted important issues regarding methods in formal linguistics. Much of what I say here reiterates a previous post of mine on FoL[4].

Methods and data in formal linguistics

Lately there has been noise about the quality of data in formal linguistics, with some non-formal linguists calling for linguists to start acting more like psychologists and report p-values (because if you don’t have p-values, you don’t have good data, naturally). My impression is that these concerns are greatly exaggerated and a non sequitur. If anything, my feeling is that formal linguistics, at least of the generative grammar variety, stands on firmer empirical footing than psycholinguistics and neurolinguistics. This is because linguistics rightly focuses on theoretical development, using data as a tool to sharpen theory rather than fixating on data itself. Shen’s talk illustrates this well.

Shen began by discussing his analysis of agreement in right node raising (RNR) and its empirical superiority over other accounts (Shen, 2018[5]). His account rested on a series of traditional informal acceptability judgments, consulting a small number of native speakers of English to derive the patterns motivating his analysis. Interestingly, other authors offered a competing account of agreement in RNR that was not just an alternative analysis but also rested on conflicting data patterns: the two papers disagreed on whether particular constructions were good or bad (Belk & Neelman, 2018; see the abstract submitted by Shen for details[6]). Shen then performed a series of carefully designed acceptability judgment experiments to sort out the source of the discrepancy, ultimately obtaining patterns of data from large groups of naïve participants that essentially agreed with his judgments rather than Belk & Neelman’s.

Psychologists (particularly Ted Gibson & Ev Fedorenko) have lately been heavily critical of methods in formal linguistics, claiming that informal acceptability judgments are unreliable and misleading (Gibson & Fedorenko, 2010; 2013). Their claim of weak quantitative standards in linguistics has been directly contradicted by the exhaustive research of Sprouse & Almeida (2012) and Sprouse, Schütze, & Almeida (2013), which found a replication rate of 95-98% when informal judgments presented in a standard syntax textbook and in a leading linguistics journal were tested with naïve subjects in behavioral experiments[7],[8]. The disagreement about the RNR data appears to support these attacks on formal linguistics by providing a concrete example.

This critique is invalid. First, the two sets of authors agreed on a large set of data, disagreeing only on a small minority of data points that happened to be crucial for the analysis. The two competing theoretical accounts highlighted this small discrepancy, properly focusing attention on resolving the theoretical dispute by cleaning up the relevant data.

Second, Shen’s original judgments were vindicated. In other words, the behavioral experiments essentially replicated the original informal judgments. In fact, Shen noted several rather obvious issues with using naïve subjects: they may not be sensitive to making judgments under particular interpretations; that is, they may judge a string to be acceptable, but not under the crucial interpretation/structural analysis under consideration. It took a large amount of work (and, I assume, money) to handle these issues across multiple experiments in order to (in a nutshell) merely replicate informal judgments that had been obtained far more rapidly and easily. Essentially, no new data points were obtained, only replications. It is not clear why Shen and Belk & Neelman disagreed on the data (potentially because of dialect differences, British vs. American English), but the problem was certainly not with Shen’s informal judgments.

These two facts show that while large-scale experiments can be useful, they are not the drivers of research. Shen’s hard work provided replications in the context of two detailed, competing theoretical analyses. The experimental data were acquired only after the theoretical analyses were proposed, and those analyses were based on informal judgment data. If we took up Gibson & Fedorenko’s (2010) demand to eschew informal judgments entirely, the consequences would be disastrous: slavishly collecting massive amounts of behavioral data and spending inordinate amounts of time analyzing it, all in the absence of theoretical development (one of the drivers of the replicability crisis in much of social psychology). Theory should drive data collection, not the other way around.

With that said, the next section changes gears and discusses the special topic of the conference.

Theoretical linguistics within cognitive science: a crisis?

First, I will summarize my introduction to the round table and the main concerns driving what Cedric Boeckx and I perceive to be a crisis regarding the place of formal linguistics in the cognitive sciences – from my perspective, cognitive neuroscience specifically. As I pointed out in a previous blog post on Talking Brains[9], this crisis is well illustrated by the fact that the Society for the Neurobiology of Language has never had a formal linguist, or even a psycholinguist, as a keynote speaker in its 10 years of existence, despite many presentations by neuroscientists and experts on non-human animal communication systems.

I think there are many reasons for the disconnect: paramount among them are a lack of appreciation for the goals and insights of linguistic theory, sociological factors such as a shortage of people knowledgeable about both domains and the objectives of both sides, and likely many others. My main point was not to review all of the possible reasons. Rather, when speaking to linguists, I thought it appropriate to communicate what linguists can do to rekindle the interaction among these fields (when I talk to cognitive neuroscientists, I do the opposite and discuss what they are missing from linguistics). I drew on my own history of attempting to bridge the gaps among fields, raising what I perceive to be a frustrating barrier: the competence/performance distinction. Consider this line from Aspects (Chomsky, 1965), the authoritative philosophical foundation of the generative grammar research enterprise:

“… by a generative grammar I mean simply a system of rules that in some explicit and well-defined way assigns structural descriptions to sentences.”

The idea that language is a system of rules is powerful. In the context of the mentalistic theory of grammar, it embodies the rejection of behaviorism in favor of a more realistic as well as exciting view of human nature – that our minds are deep and, in many ways, independent of the environment, requiring careful and detailed study of the organism itself in all of its particularities rather than merely a focus on the external world. It calls for a study of the observer, the person, the machine inside of the person’s head that processes sentences rather than the sentences themselves. This idea is what sparked the cognitive revolution and the intensive connection between linguistics and the other cognitive sciences for decades, and led to so many important observations about human psychology.

For a clear example, take the work of one of the conference keynote speakers: Ianthi Tsimpli’s study of Christopher, the mentally impaired savant who apparently had an intact (and in fact, augmented) ability to acquire the grammars of disparate languages[10], including British Sign Language[11], in the face of shocking deficits in other cognitive domains. Or take my own field, which finds that the formal descriptions of language derived from formal linguistic theory, and generative grammar in particular (including syntactic structures with abstract layers of analysis and null elements, or sound expressions consisting of sets of phonological features that can be more or less shared among speech sounds), have quite salient impacts on patterns of neuroimaging data[12],[13].

However, it is one thing to illustrate that hypothesized representations from linguistic theory impact patterns of brain activity, and another to develop a model for how language is implemented in the brain. To do so requires making claims for how things actually work in real time. But then there is this:

“... a generative grammar is not a model for a speaker or a hearer ... When we say that a sentence has a certain derivation with respect to a particular generative grammar, we say nothing about how the speaker or hearer might proceed ... to construct such a derivation”.

The lack of investigation into how the competence model is used poses problems. It is one thing to observe that filler-gap dependencies – sentences with displaced elements involving the theoretical operation Movement (or internal merge, if you like) – induce increased activation in Broca’s area relative to control sentences (Ben-Shachar et al., 2003), but quite another to develop a map of cognitive processes onto the brain. It is most definitely not the case that Broca’s area “does” movement[14].

It is clearly the case that linguists would like to converge with neuroscience and use neuroscience data as much as possible. Chomsky often cites the work of Friederici (as well as Moro, Grodzinsky, and others). For instance, in their recent book Why Only Us, Berwick & Chomsky devote a central part of the book to the brain bases of syntax, adopting Friederici’s theoretical framework for a neurobiological map of syntax and semantics in the brain. Much of my work has pointed out that Friederici’s work, while empirically exceptional and of high quality, makes errant claims about how linguistic operations are implemented in the brain.

Now, I think this issue can be worked on and improved upon. But how? The only path forward that I can see is by developing a model of linguistic performance – one that indicates how linguistic operations or other components of the theory are implemented during real-time sentence processing and language acquisition. In other words, adding temporal components to the theory, at least at an abstract level. This was my main point in introducing the round table – why not work on how exactly grammar relates to parsing and production, i.e. developing a performance model?

At the end of Ian Roberts’s talk, which nicely laid out the argument for strict bottom-up cyclicity at all levels of syntactic derivation, there was some discussion about whether the derivational exposition could be converted to a representational view that does not appeal to order (of course it can). The competence/performance distinction compels linguists to kill off any thought of linguistic operations occurring in time. This makes sense if one’s goal is to focus purely on competence. With respect to making connections to the other cognitive sciences, though, the instinct needs to be the reverse: to actually make claims about how the competence theory relates to performance.

Near the end of my talk I outlined three stances on how the competence grammar (e.g., various syntactic theories of a broadly generative type) relates to real-time processing (in this context, parsing):

1.    The grammar is a body of static knowledge accessed during acquisition, production, and comprehension (Lidz & Gagliardi, 2015). This represents what I take to be the standard generative grammar view – that there is a competence “thing” out there that somehow (in my view, quite mysteriously) mechanistically relates to performance. It’s one thing to adopt this perspective, but quite another to flesh out exactly how it works. I personally find this view to be problematic because I don’t think there are any other analogs or understandings for how such a system could be implemented in the brain and how it constrains acquisition and use of language (but I am open to ideas, and even better – detailed theories).

2.    The grammar is a “specification” of a parser (Berwick & Weinberg, 1984; Steedman, 2000). The idea is that there really is no grammar, but rather that the competence theory is a compact way of describing the structural outputs of the “real” theory of language, the performance models (parser/producer). If this is so, that’s quite interesting, because in my view it completely deprives the competence model of any causal reality, which completely removes its insight into any of the fundamental questions of linguistic theory, such as Plato’s problem – how language is acquired. I do not like this view.

3.    The grammar is a real-time processing device, either directly (Miller, 1962; Phillips, 1996) or indirectly (Fodor et al., 1974; Townsend & Bever, 2001) used during real-time processing and acquisition. I very much like this view. It says that the competence model is a thing that does stuff in real time. It has causal powers and one can straightforwardly understand how it works. While I don’t think that the models advocated for in these citations ultimately succeeded, I think they were spot on in their general approach and can be improved upon.

While I personally heavily favor option (3), I would love to see work that fleshes out any of the above while addressing (or leading the way toward addressing) the core philosophical questions of linguistic theory, as discussed by Cedric Boeckx.

Part 2 of this post raises and addresses some of the comments by the keynote speakers on this topic.

[1] If you don’t know this story, you would do best to hear it from the original participants.
[2] The regional domain consists of the former Austro-Hungarian Empire. This cuts across the borders of current countries, so Krakow is in but Warsaw is out.
[3] Wieliczka is no average mine: in parts it is beautiful and educational, and it is way more fun than it sounds.
[5] Doctoral dissertation.
[7] Gibson, E., & Fedorenko, E. (2010). Weak quantitative standards in linguistics research. Trends in Cognitive Sciences, 14(6), 233-234. Gibson, E., & Fedorenko, E. (2013). The need for quantitative methods in syntax and semantics research. Language and Cognitive Processes, 28(1-2), 88-124. Sprouse, J., & Almeida, D. (2012). Assessing the reliability of textbook data in syntax: Adger's Core Syntax. Journal of Linguistics, 48(3), 609-652. Sprouse, J., Schütze, C. T., & Almeida, D. (2013). A comparison of informal and formal acceptability judgments using a random sample from Linguistic Inquiry 2001–2010. Lingua, 134, 219-248.
[8] 95-98% is probably an underestimate, because there are likely cases where subjects incorrectly report their judgments without properly making the judgment under the relevant interpretation, etc. However, even taking the 95-98% figure at face value, what do we think the replication rate is in certain fields of social psychology? Are formal linguists really supposed to change their way of doing things to match a field that is notorious these days for lack of rigor?
[10] Smith, N. V., & Tsimpli, I. M. (1995). The mind of a savant: Language learning and modularity. Blackwell Publishing.
[11] Smith, N., Tsimpli, I., Morgan, G., & Woll, B. (2010). The signs of a savant: Language against the odds. Cambridge University Press.
[12] Brennan, J. R., Stabler, E. P., Van Wagenen, S. E., Luh, W. M., & Hale, J. T. (2016). Abstract linguistic structure correlates with temporal activity during naturalistic comprehension. Brain and Language, 157, 81-94.
[13] Okada, K., Matchin, W., & Hickok, G. (2018). Phonological feature repetition suppression in the left inferior frontal gyrus. Journal of Cognitive Neuroscience, 1-9.
[14] For evidence on this point, see the following papers. Wilson, S. M., & Saygın, A. P. (2004). Grammaticality judgment in aphasia: Deficits are not specific to syntactic structures, aphasic syndromes, or lesion sites. Journal of Cognitive Neuroscience, 16(2), 238-252. Matchin, W., Sprouse, J., & Hickok, G. (2014). A structural distance effect for backward anaphora in Broca’s area: An fMRI study. Brain and Language, 138, 1-11. Rogalsky, C., Almeida, D., Sprouse, J., & Hickok, G. (2015). Sentence processing selectivity in Broca's area: evident for structure but not syntactic movement. Language, Cognition and Neuroscience, 30(10), 1326-1338.