Comments on Faculty of Language: "The Generative Death March, part 3. Whose death is it anyway?" (blog by Norbert). 20 comments; feed updated 2024-03-28.

AveryAndrews (2016-10-07 01:18):
There's another point, which is that even if there proves to be no UG (interpreting UG as a single, fixed grammar-writing notation that is useful for explaining how languages can be learned, by making some generalizations easier to acquire than others, and clearly better for that purpose than a wide range of alternatives), there are still very large numbers of extremely precise and non-variable regularities that can be described with grammatical rules. They are mixed up with stuff that seems to be mushier, but the regularities of particular languages are there, and the sharpness of many of them was evident to many people long before Chomsky, such as Sapir in his 1921 book <i>Language</i>. So any neural architecture has to deal with this, however unexpected it seems.

Alex Drummond (2016-10-06 16:03):
@Mortimer.
There’s obviously quite a lot in your comments that deserves a response, but since others have already responded to the main points, I just wanted to focus on the following:

<i>The ultimate products of Chomskyan endeavors are neither immediately useful except in doing more Chomskyanism, nor do I see any reason to believe that they correspond any better to the underlying structure -- i.e. contribute to our actual knowledge of the structure of human language -- than traditional grammars do.</i>

Research in generative syntax has uncovered lots of phenomena that went completely unnoticed by traditional grammarians. Many of these are highly informative as to the structure of human language. Antecedent-contained deletion (ACD) would be one example. Without a formal(ish) theory of ellipsis and quantifier scope, there is nothing at all noteworthy or puzzling about a sentence such as “John read every book that Mary did”. But as soon as you start to construct such a theory, you realize that the answers to fundamental questions about syntax, semantics and the syntax/semantics interface hang on the correct analysis of these sentences.

What do you think about this? Do you think that linguists should stop investigating ACD and stop trying to figure out the rules and principles that underlie it? Or do you think that they should keep doing this but somehow make everything “squishier”? Or take your statement that we should “be focusing on doing useful things with [the black box]’s output and not saddling ourselves with an overly rigorous rule system, which is liable to bias our observations.” How might we actually follow this advice in the case at hand? Is the system of rules assumed in typical analyses of ACD too rigorous?
If so, how might we fix this problem?

Greg Kobele (2016-10-06 10:49):
@Mortimer: Thank you for your thoughtful reply. I do not believe, nor do I think linguists in general believe, that in any particular instance the concrete generalization being made is 'right'. What I believe is happening is that we are getting a better handle on the nature/kind/forms of generalizations that seem relevant. I think that this is of fundamental importance.

For example, there is near consensus that the kinds of constructions manifested in natural language are mildly context-sensitive. This allows us to dig down into the nature of things that can 1) describe only such patterns, 2) use (parse/generate) such patterns, and 3) induce such patterns from data. We need to understand this because, whatever we are actually doing when we learn language, the kinds of generalizations we end up with fall into this very restricted set. Recognizing this gives us principled guidance on an otherwise even more horribly underdetermined problem.

Having a good understanding of the kinds of mechanisms which are capable of expressing the relevant kinds of generalizations allows for principled approaches to engineering tasks. I think that Kevin Knight at the ISI is a fantastic example of this, as he is using a kind of graph model for machine translation that corresponds to exactly this class of patterns.

I would be interested to know if you felt that there were a case in which some insight into the nature of a problem was gained by training a neural net to perform well according to some metric on that problem.
I personally feel that neural nets are the wrong level for understanding; while there were certainly monsters who could read assembly the way Neo read the code of the Matrix, a high-level language makes everything easier to comprehend. This is what is motivating my question.

Anonymous (2016-10-06 09:04):
@William Matchin
That is my hope, too. Chomskyans' independent rediscovery of Case and the like makes me cautiously optimistic that they will eventually rejoin the rest of us in expanding our beautiful stamp collection.

Thank you for the pointer -- I am currently working my way backwards through this blog one breakfast at a time, and I'll be reaching your posts soon :)

William Matchin (2016-10-06 07:59):
@Mortimer
I hate self-referencing, but I understand your impressions on this topic and I'd suggest you take a look at my "Brains & Syntax" posts from early September. There I propose a rough theory that reconciles generative syntax with what appear to be reliable generalizations about the nature of language use.

I think that generative syntacticians have the right goals and have made massive progress in understanding the nature of the faculty of language, and that if we work a bit to propose the right linking theory, then we can incorporate the successes of non-generative approaches and more progress can potentially be made.

Anonymous (2016-10-06 07:03):
@Greg Kobele:
I also believe that there are "very robust regularities within and across speech communities". This is unsurprising, seeing as humans are all basically the same and view the world in broadly the same way. I also don't reject the idea of biologically inherited language-forming systems, so <i>some</i> of those regularities being especially robust or even universal is similarly unsurprising.

<i>"That you expect the generalizations to be 'squishier'? Again, they seem not to be"</i>

The generalisations only seem non-squishy if you handwave away the unreliability in actual performance and the disagreement in judgement between native speakers. The number of actual, 100% universal linguistic universals we have discovered can be counted on hands and feet, and even some of those may be accidents of history (would our human sound inventory include clicks if the Khoisan languages had gone extinct 2000 years ago?).

The problem of understanding and describing black boxes is exactly why I contend that linguistics should focus on what it <i>can</i> do given our current understanding of the human mind and our best ability to come up with "rules" for it -- statistically, rather than in absolutes. The ultimate products of Chomskyan endeavours are neither immediately useful except in doing more Chomskyanism, <i>nor</i> do I see any reason to believe that they correspond any better to the underlying structure -- i.e. contribute to our actual knowledge of the structure of human language -- than traditional grammars do. (Which is not to say that I think traditional grammars are an accurate representation of the underlying system!) I think that for all intents and purposes the system is a black box and will stay that way for a long time, which means we should be focusing on doing useful things with its output and not saddling ourselves, at this time, with an overly rigorous rule system that is liable to bias our observations.

<i>"To be sure, you are in a great position to do stuff and make money, but that is not the goal of linguistics."</i>

That's exactly the problem! We currently lack the data, technology and scientific skills, in many interconnected disciplines, to accomplish that "goal" of linguistics. We can't do it. My complaint with Chomskyans isn't that they have the wrong goal (I find the idea of a universal grammar immensely appealing, and as I said previously, I do believe that something of the sort exists), but that they are desperately premature and hence wasting their efforts. They are Greek philosophers trying to come up with quantum physics by watching a ball game.

@Jeff Lidz:

If you think I am making an argument from incredulity, I respectfully suggest you read my post again.

Anonymous (2016-10-06 07:03):
Oh hey, I wasn't actually expecting to get any replies, seeing as I came so late to the discussion (I got here via a recent Pharyngula post) :)
Since the system doesn't seem to allow individual replies, I'll try to respond to people in order -- though I reserve the right to talk to one person at a time if I feel I'm being dogpiled later. I apologise in advance for getting different people's arguments mixed up in my head.

@Omer Preminger:

Your quantum physics analogy is instructive, I think.
The underlying non-squishy quanta of human language are the same as the underlying non-squishy quanta of everything else -- namely, quantum physics (which is an incomplete model of course, but we'll assume that Universal String Theory or whatever will also turn out to be non-squishy).
But you will notice that quantum physics is no use in describing a game of football, nor did we discover it by observing football games. Chomskyans are not attempting to describe language in terms of quantum physics; they are attempting to find higher-level rules while ignoring the messiness of something as complex as a brain at this scale. My contention is that there's no reason to think such tidy higher-level rules exist, and that this assumption runs counter to our experience with other, similarly complex areas of human cognition. Related fields like psychology seem to understand this and have changed their methods accordingly. Meanwhile, Chomskyans handwave away this complexity by claiming that "performance masks competence" etc., rather than going with the face evidence, which suggests that "competence" isn't an absolute (and therefore can't be controlled by clockwork-tidy logical rules).

Charles Reiss (2016-10-05 13:52):
On her sewing/knotting blog, a BA student of mine from about 10 years ago wrote a scathing response to the IT (Ibbotson & Tomasello) paper: "Scientific American says Universal Grammar is dead: a response" http://woolandpotato.com/2016/10/05/scientific-american-says-universal-grammar-is-dead-a-response/

As the title suggests, she not only takes on IT but also questions the judgment of the Scientific American editors.
It is an easy read and a good thing to give to your non-linguist friends who ask you about the death of UG -- send them to Allison, who describes herself thus:

"I sew my own clothes. I knit my own sweaters. I throw pots. I spin fibre into yarns and dye them with plant based dyes. I weave, on occasion. I tat too, when the mood strikes."

And she understands generative linguistics.

Jeff Lidz (2016-10-04 15:56):
@Mortimer, I find it hard to believe that air is a gas. Indeed, I find it hard to believe that air is even a thing. But a little bit of science reveals that it's there. Same deal for grammar.

Greg Kobele (2016-10-04 09:32):
@Mortimer: That is a very strange argument. Linguists believe that there are very robust regularities within and across speech communities. If you are rejecting this, you must at least say something explaining why it seems to be the case.

If you are not rejecting this, what exactly is your point? That we do not know how such generalizations are encoded in the brain? No one on this board would argue with that. That you expect the generalizations to be 'squishier'? Again, they seem not to be, and you need to say at least something in this regard (are they really, but we're not looking at them in the right way?).

The problem with neural networks, as you point out, is that (except perhaps for Bengio and Hinton) they are black boxes.
If you have a performant neural network account of some phenomenon, you are in no better a position to understand that phenomenon than you were before. To be sure, you are in a great position to do stuff and make money, but that is not the goal of linguistics.

There is also very interesting work on compiling 'rules' into networks (see e.g. Smolensky and beim Graben). Beim Graben has been interested in the possibility of phase transitions, which introduce (basically) transitions between configurations that were not present in the rule-based presentation.

Omer (2016-10-03 09:13):
@Mortimer: There are some implicit assumptions in what you're saying, which I think should be brought front and center and discussed:

Assumption 1: If a set of phenomena X is squishy (or, more precisely, looks squishy from our current scientific vantage point), and we hypothesize that X is generated from the interaction of a set of principles Y with a set of complex real-world factors Z, then it follows that Y must be squishy, too.

Assumption 2: If our current understanding of the brain (say, neural networks) is unable to mesh with contemporary theories of linguistics (say, minimalist syntax), the onus is on the latter to change.

As others have written here before, I see no reason to believe that either of these assumptions is valid. With regard to (1), one need look no further than Gleitman's work on, e.g., odd numbers, to see that this is false. (But if one insists, one can also look at our squishy-looking physical universe, which seems, nevertheless, to be underlain by quanta.)
With regard to (2), looking at the brief scientific history that is currently at our disposal, there is no reason to believe this is correct, certainly not as a matter of general principle.

You may disagree, but I think it is helpful to bring (1-2) out into the open, to at least make it clear what it is that the disagreement is really about.

Anonymous (2016-10-03 07:40):
I just find it hard to believe that a squishy, perversely complicated neural network patched together over evolutionary time, like the one that produces human language, should run on a small set of logically precise rules that can be brought into human-readable form.

Not to mention that our knowledge of both human languages and the brain and mind seems much too limited right now to tease out those rules from all the other factors of influence, even if we accept that they do exist. We're not done collecting stamps yet.

I mean, we have trouble figuring out and describing in human-readable form how exactly a virtual neural network conditioned to identify digits or pictures of cats does what it does. We end up saying things like "well, this set of transformations seems to detect rounded edges near the top... this one does... whiskers, maybe?". And that's a system which is 1. orders of magnitude simpler, 2. one we can turn bits of off to see what happens, and 3. one WE BUILT and know exactly how it's structured and what its design principles are.

Any UG "rules" that exist are going to be fuzzy, squishy, unreliable tendencies strongly shaped by reinforcement and susceptible to social factors.
Such tendencies are also a simple (if less satisfying) explanation for errors like the one in example (1): the squishy neural network takes in/conceives of the whole phrase at once, the verb is right next to a plural form, so sometimes a wire gets crossed and a plural is produced because the network is "thinking about" plurals at the time. Just as you might occasionally mistake a shadow for a person (and will be more likely to do so if you've been primed to look for persons) or get the wrong cutlery from the cupboard. It's not a case of "performance masking competence"; it's just that competence isn't perfect, because the network isn't perfect and doesn't run on perfect rules. In that particular case we might say it has a 92% success rate of correctly (i.e. according to our desired output) choosing singular over plural in that particular construction.

As of right now we frankly have no way of telling whether attempts at UG correspond to any sort of underlying structure or are just an alternative way of describing the output.
Basically, I do believe that there <i>is</i> some sort of "universal grammar", and Chomskyan <i>inquiries</i> into it are very interesting. It's just that their conclusions as to what this grammar should specifically look like are very much in the spirit of postwar scientific hubris and its overly simplistic conception of human cognition.

John Cowan (2016-09-16 09:32):
Well, yes. "We cannot draw a line between light and darkness, yet day and night are, upon the whole, tolerably distinct." (Edmund Burke.)
But when it is not a question of drawing a line, but of interpenetration of meanings throughout the domain of discourse, one begins to wonder whether the distinction really has a difference attached.

English grammarians have spent centuries trying to sort out prepositions, subordinating conjunctions, and (some) adverbs: Huddleston & Pullum's view that there is only one lexical class, whose members may take an NP, a clause, or nothing as their objects in a lexically specific way, makes for a great simplification.

mohinish (2016-09-14 13:37):
I really don't understand the fuss about the c/p (competence/performance) distinction. If Russian athletes in Rio, RAM upgrades, binoculars, abacuses, tools, the million-dollar man, string theories, and Beethoven can do it, why can't generative grammars? It's as if David Marr were just someone's long-forgotten uncle.

Anonymous (2016-09-14 11:32):
If the c/p distinction can't be clearly applied across the board, or even if it is unclear how to apply it in general, then that is regrettable and non-ideal, but such a situation doesn't invalidate the distinction, as Jeff says, if one has a sound general argument for it and can apply it in some cases. So, and this thought goes back to Aspects chp.
1, section 1, we know that a grammar that produces some finite set of structures that explain a finite set of acceptable quotidian sentences will also produce infinitely many structures that, counterfactually, would explain the acceptability or ambiguity, say, of sentences with 15 billion clauses that are not performable, as it were, for humans (analogous remarks hold for garden paths, inter alia). So, some c/p distinction is going to be in play as soon as one wants a grammar to be simple (having no stipulated finite bound) and counterfactually robust, going beyond whatever just happens to obtain. So, far from being incoherent, some c/p distinction is nigh-on necessary, which supports the kind of methodology Jeff espouses. Or so it seems to me.

Jeff Lidz (2016-09-14 08:56):
There are no general solutions to any of these problems. Rather, people can act in good faith and build explicit theories to account for phenomena, and then a productive discussion can happen. Of course it is difficult to tell whether a given effect is due to competence or performance. That certainly doesn't mean that the distinction is invalid. It means that it's hard to answer hard questions. Nobody ever said science is easy. But it's impossible if people are unwilling to consider relevant evidence or to build maximally explicit models.

Anonymous (2016-09-14 07:46):
Hi Jeff,
I think you're underestimating how difficult it is to separate performance & competence accounts. To the extent that they really are difficult, if not impossible, to tease apart, the spirit of IT is accurate on this issue.

You point to the Sprouse et al. paper as a "pretty simple" demonstration of this idea. The problem, as Ivan Sag, Laura Staum Casasanto and I pointed out, is that they didn't show their measures of memory were predictive of anything. Imagine someone said they'd decided that cranial capacity was a good measure of memory, then tried to see if cranial capacity was predictive of the magnitude of island effects, and lo and behold, it wasn't. Well, then it would be at least premature, if not outright unscientific, to conclude that working memory is not important for island effects, because cranial capacity hasn't been linked to anything. Sprouse et al. used a memory measure, yes, but didn't show that it had any predictive value for any type of sentence acceptability contrast, and especially not for sentences differing in difficulty. The larger point here is: what is the empirical diagnostic that rigorously & consistently shows a difference between examples assumed to be defined by performance factors, however defined, and competence factors?

By extension, while the Wagers et al. work is a fine demonstration of how off-the-shelf processing models can account for agreement mismatch effects, the conclusion that performance masks competence is not the only possible interpretation. For instance, what if these sorts of "performance" profiles drive changes in agreement systems, or whatever other dimension of the grammar you prefer? That is, if the processing exigencies of a language cause patterns to be noisy, maybe this is exactly the sort of thing that contributes to syntactic change?
Or, for someone who thinks there's no performance-competence split, one could look at the agreement facts as a demonstration that there's a strong bias to have subjects and verbs agree in English, but that this bias is intimately & inextricably interwoven with the ability to detect the signal amidst the noise of the input.

Whatever the case, and regardless of what else IT have to say in their article, there is no agreed-upon way of deciphering what's due to performance and what's due to competence. If you're aware of how you can show that an acceptability contrast or developmental profile is purely driven by competence or purely driven by performance, then that would do the job.

Omer (2016-09-14 02:11):
Funny thing about these "retreats" that Ibbotson and Tomasello talk about: as someone pointed out to me, your average generative linguist seems to have a much better grasp of what 'ergative' means than these authors do (based on their use of it in the piece). I wonder how that ends up happening when generativists are so busy retreating from cross-linguistic data.

(I know this post is dedicated to one of the other alleged "retreats", but I couldn't resist pointing this out.)