Monday, May 20, 2013

Evans-Levinson: the sound and the fury

I confess that I did not read the Evans and Levinson article (EL) (here) when it first came out. Indeed, I didn’t read it until last week. As you might guess, I was not particularly impressed, though not necessarily for the reason you might think. What struck me most was the crudity of the arguments aimed at the Generative Program, something that the (reasonable) commentators (e.g. Baker, Freidin, Pinker and Jackendoff, Harbour, Nevins, Pesetsky, Rizzi, Smolensky and Dupoux a.o.) zeroed in on pretty quickly. The crudity is a reflection, I believe, of a deep-seated empiricism, one that is wedded to a rather superficial understanding of what constitutes a possible “universal.” Let me elaborate.

EL adumbrates several conceptions of universals, all of which the paper intends to discredit. It distinguishes substantive universals from structural universals and subdivides the latter into Chomsky vs Greenberg formal universals. The paper’s mode of argument is to provide evidence against a variety of claims to universality by citing data from a wide variety of languages, data that, EL appears to believe, demonstrate the obvious inadequacy of contemporary proposals. I have no expertise in typology, nor am I philologically adept. However, I am pretty sure that most of what EL discusses cannot, as it stands, touch many of the central claims made by Generative Grammarians of the Chomskyan stripe. To make this case, I will have to back up a bit and then talk on far too long. Sorry, but another long post. Forewarned, let’s begin by asking a question.

What are Generative Universals (GUs) about? They are intended to be, in the first instance, descriptions of the properties of the Faculty of Language (FL). FL names whatever it is that humans have as biological endowment that allows for the obvious human facility for language. It is reasonable to assume that FL is both species and domain specific. The species specificity arises from the trivial observation that nothing does language like humans do (you know: fish swim, birds fly, humans speak!). The domain specificity is a natural conclusion from the fact that this facility arises in all humans in pretty much the same way, independent of other cognitive attributes (i.e. both the musical and the tone deaf, both the hearing impaired and the sharp eared, both the mathematically talented and the innumerate develop language in essentially the same way). A natural conclusion from this is that humans have some special features that other animals don’t as regards language, and that human brains have language-specific “circuits” on which this talent rests. Note, this is a weak claim: there is something different about human minds/brains on which linguistic capacity supervenes. This can be true even if lots and lots of our linguistic facility exploits the very same capacities that underlie other forms of cognition.

So there is something special about human minds/brains as regards language, and Universals are intended to be descriptions of the powers that underlie this facility: both the powers of FL that are part of general cognition and those unique to linguistic competence. Generativists have proposed elaborating the fine structure of this truism by investigating the features of various natural languages and, by considering their properties, adumbrating the structure of the proposed powers. How has this been done? Here again are several trivial observations with interesting consequences.

First, individual languages have systematic properties. It is never the case that, within a given language, anything goes. In other words, languages are rule governed. We call the rules that govern the patterns within a language a grammar. For generativists, these grammars and their properties are the windows into the structure of FL/UG. The hunch is that by studying the properties of individual grammars, we can learn about the faculty that manufactures grammars. Thus, for a generativist, the grammar is the relevant unit of linguistic analysis. This is important, for grammars are NOT surface patterns. The observables linguists have tended to truck in relate to patterns in the data. But these are merely way stations to the data of interest: the grammars that generate these patterns. To talk about FL/UG one needs to study Gs. But Gs are themselves inferred from the linguistic patterns that Gs generate, which are in turn inferred from the natural or solicited bits of linguistic production that linguists bug their friends and collaborators to cough up. So, to investigate FL/UG you need Gs, and Gs should not be confused with their products/outputs, only some of which are actually perceived (or perceivable).

Second, as any child can learn any natural language, we are entitled to infer from the intricacies of any given language that FL/UG has powers capable of dealing with such intricacies. In other words, the fact that a given language does NOT express property P does not entail that FL/UG is not sensitive to P. Why? Because a description of FL/UG is not an account of any given language/G but an account of linguistic capacity in general. This is why one can learn about the FL/UG of an English speaker by investigating the grammar of a Japanese speaker, and about the FL/UG of both by investigating the grammar of a Hungarian, or Swahili, or Slave speaker. Variation among different grammars is perfectly compatible with invariance in FL/UG, as was recognized from the earliest days of Generative Grammar. Indeed, this was the initial puzzle: find the invariance behind the superficial difference!

Third, given that some languages display the signature properties of recursive rule systems (systems that can take their outputs as inputs), it must be the case that FL/UG is capable of concocting grammars that have this property. Thus, whatever G an individual actually has, that individual’s FL/UG is capable of producing a recursive G. Why? Because that individual could have acquired a recursive G even if that individual’s actual G does not display the signature properties of recursion. What are these signature properties? The usual: unboundedly large and deep grammatical structures (i.e. sentences of unbounded size). If a given language appears to have no upper bound on the size of its sentences, then it's a sure bet that the G that generates the structures of that language is recursive in the sense of allowing structures of type A as parts of structures of type A. This, in general, will suffice to generate unboundedly big and deep structures. Examples of this type of recursion include conjunction, conditionals, embedding of clauses as complements of propositional attitude verbs, relative clauses, etc. Linguists have studied these kinds of configurations precisely because they are products of grammars with this interesting property, a property that seems unique to the products of FL/UG, and hence capable of potentially telling us a lot about the characteristics of FL/UG.
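A minimal sketch of the “type A inside type A” idea, assuming a toy grammar with one hypothetical recursive rule (S → Mary thinks that S) and one base rule (S → it rains). Nothing in the two rules mentions a size limit; the depth of embedding is limited only by how many times the recursive rule is applied. The names `derive`, `RECURSIVE_RULE` and `BASE_RULE` are illustrative, not anyone's proposal for the grammar of English.

```python
# Toy grammar (hypothetical, for illustration only).
# The nonterminal <S> may reappear inside its own expansion, which is what
# makes the rule system recursive.
S = "<S>"
RECURSIVE_RULE = "Mary thinks that " + S   # S -> Mary thinks that S
BASE_RULE = "it rains"                     # S -> it rains


def derive(depth: int) -> str:
    """Rewrite <S> with the recursive rule `depth` times, then close it off."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    string = S
    for _ in range(depth):
        # Replace the single remaining <S> with a structure containing a new <S>.
        string = string.replace(S, RECURSIVE_RULE, 1)
    # Terminate the derivation with the non-recursive rule.
    return string.replace(S, BASE_RULE, 1)


if __name__ == "__main__":
    for d in range(4):
        print(d, derive(d))
```

Because the same rule applies at every depth, `derive(1000)` is as well formed as `derive(2)`: the rules themselves define no largest sentence.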

Before proceeding, it is worth noting that the absence of these signature properties in a given language L does not imply that a grammar of L is not basically recursive. Sadly, EL seems to leap to this conclusion (443). Imagine that for some reason a given G puts a bound of 2 levels of embedding on any structure in L. Say it does this by placing a filter (perhaps a morphological one) on more complex constructions. Question: what is the correct description of the grammar of L? Well, one answer is that it does not involve recursive rules for, after all, it does not allow unbounded embedding (by supposition). However, another perfectly possible answer is that it allows exactly the same kinds of embedding that English does, modulo this language-specific filter. In that case the grammar will look largely like the ones that we find in languages like English that allow unbounded embedding, but with the additional filter. There is no reason, just from observing that unbounded embedding is forbidden, to conclude that this hypothetical language L (aka Kayardild or Piraha) has a grammar different in kind from the grammars we attribute to English, French, Hungarian, Japanese etc. speakers. In fact, there is reason to think that the Gs that speakers of this hypothetical language have do in fact look just like those of English etc. The reason is that FL/UG is built to construct these kinds of grammars and so would find it natural to do so here as well. Of course L would seem to have an added (arbitrary) filter on the embedding structures, but otherwise the G would look the same as the G of more familiar languages.
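The hypothetical language L can be sketched in the same toy terms, assuming the rules of a recursive English-style grammar plus one extra output filter that rejects more than two levels of embedding (the bound of 2 from the example above). What the sketch makes concrete is that the rules are literally identical; only the filter differs, so a bounded set of outputs is not by itself evidence of a non-recursive rule system. `derive`, `passes_filter` and `language` are illustrative names, and the enumeration is cut off at a finite `max_depth` only so that the sets can be listed.

```python
from typing import List, Optional

# Hypothetical toy rules shared by the "English-style" grammar and by L.
S = "<S>"
RECURSIVE_RULE = "Mary thinks that " + S   # S -> Mary thinks that S
BASE_RULE = "it rains"                     # S -> it rains


def derive(depth: int) -> str:
    """Apply the recursive rule `depth` times, then the base rule."""
    string = S
    for _ in range(depth):
        string = string.replace(S, RECURSIVE_RULE, 1)
    return string.replace(S, BASE_RULE, 1)


def passes_filter(sentence: str, bound: int = 2) -> bool:
    """L's extra (arbitrary) filter: at most `bound` levels of embedding."""
    return sentence.count("thinks that") <= bound


def language(max_depth: int, bound: Optional[int] = None) -> List[str]:
    """Outputs with up to `max_depth` embeddings; filtered only if `bound` is set."""
    outputs = [derive(d) for d in range(max_depth + 1)]
    if bound is None:
        return outputs                      # English-style: same rules, no filter
    return [s for s in outputs if passes_filter(s, bound)]


english = language(max_depth=5)               # depths 0..5 listed (unbounded in principle)
language_L = language(max_depth=5, bound=2)   # depths 0..2 only
```

Swapping Gs for engines, `passes_filter` plays the role of the governor: remove it and the same machinery produces the unbounded set.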

An analogy might help. I’ve rented cars that have governors on the accelerators that cap speed at 65 mph. The same car without the governor can go well above 90 mph. Question: do the two cars have the same engine? You might answer “no” because of the significant difference in top speeds. Of course, in this case we know that the answer is “yes”: the two cars work in virtually identical ways and have the very same structures but for the governor that prevents the full velocity potential of the rented car from being expressed. So the conclusion that the two cars have fundamentally different engines would be clearly incorrect. Ok: swap Gs for engines and my point is made. Let me repeat it: the point is not that the Gs/engines might be different in kind; the point is that simple observation of the differences does not license the conclusion that they are (viz. you are not licensed to conclude that they are just finite state devices because they don’t display the signature features of unbounded recursion, as EL seems to). And, given what we know about Gs and engines, the burden of proof is on those who conclude from such surface differences to deep structural differences. The argument to the contrary can be made, but simple observations about surface properties just don’t cut it.

Fourth, there are at least two ways to sneak up on properties of UG: (i) collect a bunch of Gs and see what they have in common (what features do all the Gs display?) and (ii) study one or two Gs in great detail and see if their properties could be acquired from input data. If any could not be, then these are excellent candidate basic features of FL/UG. The latter, of course, is the province of the POS argument. Now, note that as a matter of logic the fact that some G fails to have some property P can in principle falsify a claim like (i) but not one like (ii). Why? Because (i) is the claim that every G has P, while (ii) is the claim that if G has P then P is the consequence of G being the product of FL/UG. Absence of P is a problem for claims like (i) but, as a matter of logic, not for claims like (ii) (recall: “if P then Q” is true whenever P is false). Unfortunately, EL seems drawn to the conclusion that P→Q is falsified if ¬P is true. This is an inference that other papers (e.g. Everett’s Piraha work) are also attracted to. However, it is a non sequitur.
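The logical point can be checked mechanically. Here is a small sketch that assumes a made-up set of grammars, each described only by two hypothetical booleans: whether G has property P, and whether P (when present) follows from FL/UG. Claim (i) is modelled as “every G has P” and claim (ii) as “for every G, if G has P then P comes from FL/UG”, using the material conditional. The grammar names and values are invented for illustration.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material conditional: false only when p is true and q is false."""
    return (not p) or q


# Truth table for P -> Q: both rows with P false come out true.
TRUTH_TABLE = {(p, q): implies(p, q) for p, q in product([True, False], repeat=2)}

# Hypothetical grammars: (has_P, P_follows_from_UG).
grammars = {
    "G1": (True, True),
    "G2": (True, True),
}


def claim_i(gs) -> bool:
    """Type (i) universal: every G has P."""
    return all(has_p for has_p, _ in gs.values())


def claim_ii(gs) -> bool:
    """Type (ii) universal: if G has P, then P follows from FL/UG."""
    return all(implies(has_p, from_ug) for has_p, from_ug in gs.values())


before = (claim_i(grammars), claim_ii(grammars))   # (True, True)

# Add a grammar that lacks P altogether (its second value is then irrelevant).
grammars["G3"] = (False, False)
after = (claim_i(grammars), claim_ii(grammars))    # (False, True)
```

Adding a P-less grammar flips claim (i) to false while leaving claim (ii) true, which is exactly the asymmetry described above.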

EL recognizes that the inference from the absence of some property P to the absence of Pish features in UG does not go through. But the paper clearly wants to reach this conclusion nonetheless. Rather than denying the logic, EL asserts that “the argument from capacity is weak” (EL’s emphasis). Why? Because EL really wants all universals to be of the (i) variety, at least if they are “core” features of FL/UG. As these type (i) universals must show up in every G if they are indeed universal, failure to appear in a single grammar is sufficient to call their universality into question. EL is clearly miffed that Generativists in general and Chomsky in particular would hold a nuanced position like (ii). EL seems to think that this is cheating in some way. Why might EL hold this view? Here’s what I think.

As I discussed extensively in another place (here), everyone who studies human linguistic facility appreciates that competent speakers of a language know more than they have been exposed to. Speakers are exposed to bits of language and from this acquire rules that generalize to novel exemplars of that language. No sane observer can dispute this. What’s up for grabs is the nature of the process of generalization. What separates empiricist from rationalist conceptions of FL/UG is the nature of these inductive processes. Empiricists analyze the relevant induction as a species of pattern recognition. There are patterns in the data and these are generalized to all novel cases. Rationalists appreciate that this is an option, but insist that there are other kinds of generalizations, those based on the architectural properties (Smolensky and Dupoux’s term) of the generative procedures that FL/UG allows. These procedures need not “resemble” the outputs they generate in any obvious way, and so conceiving of this as a species of pattern recognition is not useful (again, see here for more discussion). Type (ii) universals fit snugly into this second type, and so empiricists won’t like them. My own hunch is that an empiricist affinity for generalizations based on patterns in the data lies behind EL’s dissatisfaction with “capacity” arguments; they are not the sorts of properties that inspection of cases will make manifest. In other words, the dissatisfaction is generated by Empiricist sympathies and/or convictions which, from where I sit, have no defensible basis. As such, they can and should be discounted. And in a rational world they would be. Alas…
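The decimal-expansion example from the discussion linked above can be made concrete. This is a sketch that assumes a hypothetical learner handed two candidate generative procedures in advance: one for the digits of Pi (Machin's formula, Pi = 16*arctan(1/5) - 4*arctan(1/239), in integer arithmetic) and one for e (summing 1/k!). Inspecting the digit string alone yields no rule for the next digit. Instead, the learner checks which generator reproduces what it has seen and lets that generator predict what comes next. The function names and the two-candidate setup are illustrative assumptions, not a model anyone has proposed.

```python
def arccot(x: int, unity: int) -> int:
    """arctan(1/x) scaled by `unity`, via the alternating Taylor series."""
    total = term = unity // x
    x_squared = x * x
    n, sign = 3, -1
    while term:
        term //= x_squared
        total += sign * (term // n)
        sign, n = -sign, n + 2
    return total


def pi_digits(n: int) -> str:
    """First n decimal digits of Pi (leading 3 included), via Machin's formula."""
    unity = 10 ** (n + 10)                      # 10 guard digits absorb rounding
    pi = 4 * (4 * arccot(5, unity) - arccot(239, unity))
    return str(pi // 10 ** 10)[:n]


def e_digits(n: int) -> str:
    """First n decimal digits of e (leading 2 included), by summing 1/k!."""
    unity = 10 ** (n + 10)                      # 10 guard digits absorb rounding
    total, term, k = 0, unity, 1
    while term:
        total += term                           # adds 1/(k-1)!
        term //= k
        k += 1
    return str(total // 10 ** 10)[:n]


# Hypothetical "innately provided" candidate generators.
CANDIDATES = {"pi": pi_digits, "e": e_digits}


def predict_next(observed: str, candidates=CANDIDATES):
    """Pick the generator consistent with the observed digits; predict the next one."""
    consistent = [name for name, gen in candidates.items()
                  if gen(len(observed)) == observed]
    if len(consistent) != 1:
        return None                             # no unique generator fits
    name = consistent[0]
    return name, candidates[name](len(observed) + 1)[-1]
```

For instance, `predict_next("31415")` selects the Pi generator and predicts 9. The prediction comes from the selected procedure, not from any regularity visible in the five digits themselves.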

Before ending, let me note that I have been far too generous to the EL paper in one respect. I said at the outset that its arguments are crude. How so? Well, I have framed the paper’s main point as a question about the nature of Gs. However, most of the discussion is framed not in terms of the properties of the Gs surveyed but in terms of surface forms that Gs might generate. The discussion of constituency provides a nice example (441). EL notes that some languages display free word order and concludes from this that they lack constituents. However, surface word order facts cannot possibly provide evidence for this kind of conclusion; they can only tell us about surface forms. It is consistent with these facts that elements that are not constituents on the surface were constituents earlier on and were then separated, or will become constituents later on, say on the mapping to logical form. Indeed, in one sense of the term constituent, EL itself insists that discontinuous expressions are constituents, for they form units of interpretation and agreement. The mere fact that elements are discontinuous on the surface tells us nothing about whether they form constituents at other levels. I would not mention this were it not the classical position within Generative Grammar for the last 60 years. Surface syntax is not the arbiter of constituency, at least if one has a theory of levels, as virtually every theory that sees grammars as rules relating meanings to sounds assumes (EL assumes this too). There is nary a grammatical structure in EL, and this is what I meant by my being overgenerous. The discussion above is couched in terms of Gs and their features. In contrast, most of the examples in EL are not about Gs at all, but about word strings. However, as noted at the outset, the data relevant to FL/UG are Gs, and the absence of Gish examples in EL makes most of EL’s cited data irrelevant to Generative conceptions of FL/UG.
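A toy illustration of why surface contiguity and constituency come apart, assuming an invented glossed string in which each nominal word carries a case suffix (-ERG, -ABS) and words sharing a suffix form one unit for agreement and interpretation. The words “big” and “dog” are not adjacent on the surface, yet grouping by the agreement marker recovers them as a single unit. A procedure that looks only at adjacency misses it. The string, the suffixes and the helper names are hypothetical: this is not data from any particular language, nor a model of how any grammar computes constituency.

```python
from collections import defaultdict

# Hypothetical glossed surface string ("big dog saw the man", with the
# adjective separated from its noun); suffixes mark which unit a word joins.
SURFACE = ["big-ERG", "saw", "the-ABS", "man-ABS", "dog-ERG"]


def units_by_agreement(words):
    """Group words that share a case suffix, keeping their surface positions."""
    units = defaultdict(list)
    for position, word in enumerate(words):
        if "-" in word:
            stem, case = word.split("-", 1)
            units[case].append((position, stem))
    return dict(units)


def is_contiguous(positions):
    """True if the positions form an unbroken stretch of the surface string."""
    return list(positions) == list(range(min(positions), max(positions) + 1))


units = units_by_agreement(SURFACE)
erg_positions = [pos for pos, _ in units["ERG"]]
erg_words = [stem for _, stem in units["ERG"]]   # one unit, split on the surface
```

Here the -ABS unit happens to be contiguous and the -ERG unit does not, but both are recovered as units in the same way. Surface adjacency simply is not the criterion being used.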

Again, I suspect that the swapping of string data for G data simply betrays a deep empiricism, one that sees grammars as regularities over strings (string patterns) and FL/UG as higher order regularities over Gs. Patterns within patterns within patterns. Generativists have long given up on this myopic view of what can be in FL/UG.  EL does not take the Generative Program on its own terms and show that it fails. It outlines a program that Generativists don’t adopt and then shows that it fails by standards it has always rejected using data that is nugatory.

I end here: there are many other criticisms worth making about the details, and many of the commentators on the EL piece, better placed than I am to make them, do so. However, to my mind, the real difficulty with EL is not at the level of detail. EL’s main point as regards FL/UG is not wrong; it is simply beside the point. A lot of sound and fury signifying nothing.


  1. I think the connection between the "empiricist affinity for generalizations based on patterns in the data" and "dissatisfaction with 'capacity' arguments" is even more direct than you suggest. When we talk about patterns that are "in the data", what we actually mean (I think) is patterns that can be recognised without reference to any domain-specific concepts, i.e. patterns that you can recognise by taking notice of things like the linear ordering of words, and do not require bringing in notions like c-command, binding domains, bounding nodes, etc. As is (very) frequently pointed out, the only thing that's up for grabs is the kind of generalisation the learner makes, not whether generalisations are made; so in a relevant sense there is no such thing as "patterns in the data". To the extent that the phrase means something, I think it means patterns recognisable without domain-specific machinery, and if the "dissatisfaction with 'capacity' arguments" actually concerns domain-specific capacity (which I guess it does), then these two things are obviously very closely connected.

    (I made a similar comment on the post about picking up patterns in decimal expansions that you linked to.)

    1. I believe you are correct. This is what I tried to suggest when discussing learning the decimal expansion of Pi. There is no way that looking at the pattern of numerals could tell you what the next one could be. Having the generative function could. Moreover, this function could be acquired by looking at the available decimals if, e.g. the choice were between two innately provided functions, e.g. the one for Pi and the one for e. However, this would be very specific knowledge and not the kind of thing that some generic search for patterns would support. So, I agree: it seems that if one thinks that the name of the game is finding the generative procedure that is your G then looking for generic patterns in the data won't get you where you want to go.

    2. You know, on second thought I am not sure I do agree. The image I have in mind is induction via feature detection and generalization. Empiricists are fine with feature detectors, those tuned to features of the stimulus/input. They are even fine with feature detectors over outputs of feature detectors (i.e. higher order feature detection). What, however, if there are no features to detect in the data? There is a pattern there, but not a pattern embedded in detectable features? This is how I see a generable pattern; one where there is no real pattern in the data save, for example, the Pi-patterns and for this one needs a Pi-generator to recognize the pattern at hand.

      Now this may be naive. But Empiricists have always worried about feature detection and feature patterns and have always thought of these as "in" the data. It is because the data is structured that the mind need not be. Rationalists have generally emphasized the deficiency of the input: it doesn't display the relevant features or the relevant patterns. Now, I agree that IF one gets feature patterns in the data then simple general pattern detection algorithms will suffice. But even if this is false, there might be domain-generalish procedures. So I'm not sure that Rationalists in general are committed to domain-specific mechanisms, though in the domain of language I believe that this is virtually certain to be true.

      I hope this makes sense. It might not.

    3. You might be reading more into my comment than I intended. I didn't mean it to be anything grand about empiricism or rationalism in general. I'm just starting from the assumption that the question is "Which generalisations get made?" rather than "Do generalisations get made?", and if we accept that assumption then I don't know what "patterns in the data" can mean, for this phrase seems to suggest that there is no need to make generalisations.

      (This is just a guess, but perhaps I'm bypassing the empiricism/rationalism question altogether by making the 'which generalisations' assumption; is this inherently what would be called a rationalist position?)

    4. Hi Tim,

      I think there are such things as "patterns in the data". Or rather, one can clearly think of cases where the patterns are *not* in the data.
      For example, if two words are indistinguishable (distributionally identical) in the data, then there is no pattern in the data where they are different. So if you are an empiricist, then you are locked into saying that these two must be syntactically identical.

      Whereas, if you are not, then it is possible that there is some innate device which will distinguish between these two.

      I think the "which generalisations" question is a good way of looking at it, but that may be too empiricist for some here. This presupposes that language acquisition is a process of generalising from data rather than "growth" or triggering or some other non-inferential brute causal process.

    5. I'm not sure I follow this. What sort of innate device would a non-empiricist propose, that would distinguish between two distributionally-identical words? Do we mean something that would distinguish them based on non-distributional properties, like say assigning the longer one to the category X and assigning the shorter one to the category Y? Or something that would flip a coin, and assign one to category X and the other to category Y, but might do it the other way around the next time around? (Where by "next time around" I'm really thinking of "in the child next door".) Or is there some other way of thinking of how this could work?

    6. I am handicapped a bit by not knowing what sort of innate mechanisms are currently hypothesized, since Norbert plays his cards pretty close to his chest. But if the mechanisms are *not* looking for patterns in the data, then presumably this is possible.

      So, for example, one might have innate knowledge that there can only be one word of a given syntactic category, in which case if one assigned one word to that category, then the other couldn't be. So it might be that the first encountered word triggers something.
      But this is I think not meant to be a rational process of induction, and so there is no need for it to "make sense" from our point of view.

      Or one might rely on semantic properties, if you think of semantic bootstrapping.

      So this may seem a little half-baked, but it does seem like there is some nontrivial notion of patterns in the data, but perhaps I am filtering it too much through my own preoccupations.

    7. @ Alex: you say: "For example if two words are indistinguishable, (distributionally identical) in the data, then there is no pattern in the data where they are different. So if you are an empiricist, then you are locked into saying that these two must be syntactically identical."

      Is this a hypothetical possibility or have you an actual example of data for which this is the case?

    8. It slightly depends what you mean by data -- but for example Monday and Tuesday. Or examples of free variation like qui/qua in Italian.

    9. I cannot imagine a corpus in which Monday and Tuesday are distributionally identical. Maybe very close but identical?? I think this was my question: are we talking about data sets that have been generated by real speakers [preferably under 'natural circumstances' - like say a 'grown up version' of CHILDES] or about computer generated sets?

  2. Apologies to Tim; I do not mean to ignore you, but I had finished writing this comment before yours appeared, and I truly hope that for once Norbert will answer the questions I ask at the end...

    After reading E&L Norbert is struck by “the crudity of the arguments aimed at the Generative Program”. This is a pretty serious accusation but not backed with any citation of such crude argument. And for several of Norbert’s own arguments ‘crude’ would be a compliment. He writes: “humans have some special features that other animals don’t as regards language and that human brains have language specific “circuits” on which this talent rests”. It would be nice to be told what is the difference between ‘special features’ and ‘specific “circuits”’, or how this imprecise statement relates to anything E&L [or other empiricists] claim. Presumably somewhere in these ‘special features’ lurk the grammars, the relevant units of linguistic analysis. Of course this analysis is indirect via perceivable data and their surface pattern. Here one would expect sophisticated examples of such a backtracking analysis: surface data [SD] → G → UG. Alas, no such luck.

    Similarly unsupported claims continue: “one can learn about the FL/UG of an English speaker by investigating the grammar of a Japanese speaker and the FL/UG of both by investigating the grammar of a Hungarian, or Swahili, or Slave speaker”. This may be true, but why not provide even a single example of how the investigation of Hungarian reveals a property of English grammar that could not have been more easily discovered by studying English? Would this drag us down to the crude level of E&L, who provided example after example in support of their arguments?

    The predictable dismissal of the arguments from Piraha reveals an even higher level of crudeness. Musings about artificial filters and irrelevant analogies to “cars that have governors on the accelerators” confirm [i] that NO empirical finding could persuade Norbert that his theory might be wrong and [ii] that he never read the arguments made in Everett [2012]. Otherwise he would know that the claim is not [to stay within the analogy] that the Piraha ‘car’ cannot go 90m/h but that it achieves this speed in another way than Norbert’s rental car [Piraha language does not have sentence recursion but discursive recursion – for short description see]. And for this reason we could not learn about this peculiarity of Piraha by studying, say, German. Or, to turn the argument against Norbert for a moment: if we believe that we can learn about genuine universals from studying JUST Piraha, we could conclude [entirely wrongly] that sentence recursion is not a property of human language and that those exotic languages [like English] that have sentence recursion have an added epicycle to the ‘normal UG’.

    Ignorance of actual arguments is also revealed in the attack on unnamed empiricists. Norbert claims that unlike Empiricists “Rationalists … insist that there are other kinds of generalizations, those based on the architectural properties” Had he ever read recent works of Elman or Tomasello he would know that these ‘arch empiricists’ do not deny built in architectural constraints. But why bother with such a crude activity as actually reading the arguments when one can rely on hunches?

    The final point deserves attention: “EL does not take the Generative Program on its own terms and show that it fails. It outlines a program that Generativists don’t adopt”. Maybe, for once, Norbert can assume that opponents are not intentionally misrepresenting or attacking straw men but simply do not understand which of the many claims that have been made under the umbrella of 'generativism' are binding at the moment. In other words, can Norbert possibly outline in specific detail WHAT the current program of generativists is? What ARE the ‘specific brain circuits’ that allow humans and humans alone to have language?

  3. Norbert's claim seems to be this: that though we observe the strings, and (partially) observe their interpretations, we are primarily interested in the grammars, which are not observed. (This part is fine).

    Therefore, an argument based on the strings alone can't refute a theory about the grammars.

    But that seems a step too far. Because that would make the theory impossible to refute, since we don't have direct evidence about the grammars, but only (at the moment) via the word strings and associated interpretations.
    So take some putative universal grammatical property -- say binary branching. What sort of evidence would refute this?

    (I am not interested in defending the E and L paper per se, as I have some reservations about their methodology too).

    1. I guess there will always be exotic combinations of unlikely circumstances, noise, and so on, that would allow for some hypothesis to be true in spite of the evidence. Even in physics, where if I understand it there are a lot of very exact and discrete-valued predictions that can be made, you still have to do statistics. "Refute" in the sense of "prove wrong" is too strong, more like "prove implausible." So what we're left with is (scientific) inference. You've got to have a precise and quantitative predictive theory to do that kind of inference right. Of course we can still lean on the cases where theory X predicts something is outright impossible, and I guess that's what linguists try to do, but they get into trouble when they stray from that (and I think they do so quite a lot). Binary branching is something where we aren't really looking to the likelihood function over strings etc to help us out at all. So I don't doubt it will take other sources of evidence and toothful theories of the learner to sort out exactly what the syntactic structure looks like, yes. Sometimes you can find indirect routes to get the likelihood function to say something -- e.g. most of the time it doesn't matter if you've got binary branching, say, but suppose there's some sort of fancy arrangement of dependencies that only holds together in theory X if you assume this version, and maybe there's this universal constraint that would prevent you from getting the facts in this one language any other way -- but the arguments that are made now in favor of binary branching are, I think any honest syntactician would admit, weak conjectures.

    2. @Alex In this case we're not testing a theory of the grammars so much as a theory of the “grammar of grammars”, i.e., the constraints on possible grammars. An argument against a theory of UG from facts about particular languages will typically have to be quite indirect and dependent on substantial (but hopefully independently-justified) theoretical assumptions. Depending on how high you set your standards, this may or may not make such theories “unfalsifiable”, but I don't think it's useful to fuss over this kind of methodological point. It just is the case that there's no quick and easy way of figuring out what's in a person's head from the noises he makes. We just have to do our best to find ways of testing the theories that are on the table. As a matter of fact, there's plenty of internal debate within the field regarding proposed universals, so these theories don't seem to be immune from refutation in practice.

      I like Ewan's binary branching example. As he says, it's not likely that we'll find strong direct evidence from string/interpretation pairs that all syntactic structures are binary branching. But at the same time, it's not an accident that many syntacticians think that universal binary branching is quite likely to be correct. The arguments in Larson 1988 (for a binary branching VP-structure) are illustrative. They depend on various theoretical assumptions (e.g. that the binding theory should be stated in terms of c-command, that a particular construction is a simple coordination and not an instance of ‘gapping’) which in turn require extended theoretical argument to justify.

    3. So from my perspective this is more than some fussy methodological point. Take e.g. GPSG; this made a universal claim, basically that all languages are CFGs (being a bit imprecise and sloppy here ...), and this was refuted by Shieber's 1985 analysis of Swiss German, which concerned the string sets. This was "a precise and quantitative predictive theory", to use Ewan's phrase. For me that seems to be how things should work.
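      A minimal sketch of the string-set pattern at stake, assuming the standard abstract form of Shieber's argument: after Swiss German subordinate clauses are intersected with a suitable regular language and mapped by a homomorphism, one is left with strings of the shape a^m b^n c^m d^n, where (roughly) a and b stand for accusative and dative noun phrases and c and d for the verbs that select them. The function name and the four-symbol alphabet are illustrative assumptions. Checking membership is trivial; the point is that this language is provably not context-free (by the pumping lemma, not shown here), because the counts must match crosswise rather than in nested fashion, and context-free languages are closed under the operations used to extract it.

```python
import re

def in_cross_serial(s: str) -> bool:
    """Membership in {a^m b^n c^m d^n : m, n >= 0}.

    The a's must match the c's and the b's must match the d's, so the
    dependencies cross (a1..am b1..bn c1..cm d1..dn) instead of nesting.
    """
    match = re.fullmatch(r"(a*)(b*)(c*)(d*)", s)
    if match is None:
        return False  # symbols out of order, or a symbol outside {a, b, c, d}
    a, b, c, d = (len(group) for group in match.groups())
    return a == c and b == d

for s in ["aabccd", "abcd", "aabcd", "abdc"]:
    print(s, in_cross_serial(s))
```

      A pushdown automaton can match one pair of counts by stacking and unstacking, but it cannot keep both crossing pairs in step, which is the intuition behind the non-context-freeness result.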

      Just to be clear: GPSG was a theory of grammar (according to Norbert, there is no theoretical difference between GPSG and GB), and it was refuted by evidence about the string sets.

      This of course was only possible because GPSG had been properly formalised.

      So what is different here?

      (As an aside, I think one could make the same thing happen with binary branching using MCFGs, because there is a hierarchy of ranks, unlike with CFGs.)

    4. Thinking aloud here -- but the Larson 88 paper appeals to certain facts about English in support of a particular analysis. Suppose we had a language like English but where those facts were otherwise -- surely this would then be a straightforward refutation of the universal claim?

      Maybe Avery A can chip in here but I think the case-stacking in Kayardild and Martuthunira really is a refutation of certain theories of syntax, as is Greg Kobele's Yoruba relative clause argument.

    5. "Suppose we had a language like English but where those facts were otherwise -- surely this would then be a straightforward refutation of the universal claim?"

      @Alex It wouldn't be straightforward because Larson's arguments depend on lots of supporting theoretical assumptions. This comes out quite clearly in the debate with Jackendoff in Linguistic Inquiry in the late 80s and early 90s. Jackendoff argues for a flat VP structure and does a pretty convincing job of showing that the facts are also compatible with this hypothesis. That's not to say that it's impossible to figure out which structure is the correct one, but it takes a lot of hard theoretical work, not just a simple comparison of data with predictions.

      It's good to make precise quantitative predictions if you can, but it's very difficult to get theories of the initial state to make precise quantitative predictions about the range of acceptable string/interpretation pairs. This applies equally to empiricist theories of the initial state. Whichever way you look at it, there are just too many links in the causal chain. The relation between your initial genetically-determined brain state and the range of string/interpretation pairs you find acceptable 5 years later is just very, very complicated, and there's no way around that.

    6. On the contrary I think that precise empiricist theories tend to make precise predictions -- typically that the set of languages is in some precisely specified class. And this allows them to be refuted -- which is good!

      For example, the Berwick, Pietroski, Yankama and Chomsky, Cogsci paper from last year, that Norbert discussed in an earlier post. One way of viewing part of that paper is that it shows that for example Clark and Eyraud's learning algorithm taken as a theory of language acquisition is false, because it predicts that natural language string sets are substitutable and they are not.
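      To make the substitutability prediction concrete: in Clark and Eyraud's sense, a language L is substitutable if, whenever two non-empty substrings x and y occur in some shared context (u x v and u y v both in L), they occur in exactly the same contexts. The sketch below checks this for a finite set of word sequences treated as the whole language; the function names and the toy data are hypothetical. Applied to a finite sample of a natural language, a reported violation is only suggestive, since the missing string might be unattested rather than ungrammatical.

```python
from collections import defaultdict
from itertools import combinations

def contexts(language):
    """Map each non-empty substring x to the set of contexts (u, v) with u + x + v in language."""
    table = defaultdict(set)
    for sentence in language:
        n = len(sentence)
        for i in range(n):
            for j in range(i + 1, n + 1):
                table[sentence[i:j]].add((sentence[:i], sentence[j:]))
    return table

def substitutability_violation(language):
    """Return a pair (x, y) sharing some context but not all contexts, or None."""
    table = contexts(language)
    for x, y in combinations(table, 2):
        shared = table[x] & table[y]
        if shared and table[x] != table[y]:
            return x, y
    return None

# Sentences are tuples of words; this toy language is hypothetical.
toy = {("a", "b"), ("a", "c"), ("d", "b")}
print(substitutability_violation(toy))
```

      On the toy set, b and c share the context "a _", but only b also occurs after d, so the set is not substitutable (which offending pair gets reported may vary from run to run, since several pairs qualify).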

    7. This comment has been removed by the author.

    8. [this is the deleted comment with some minor revisions.]

      Both nativist and empiricist theories can equally well make those sorts of predictions about language classes. E.g., one could hypothesize that UG permits only Stablerian Minimalist Grammars to be acquired. This hypothesis could then be refuted by demonstrating that some languages aren't in the relevant class of MCS string languages.

      The thing is that predictions of this sort are just not very useful for evaluating syntactic theories. For one thing, there are lots of other grammatical formalisms besides MG which generate MCS string languages. For another, a very simple modification to MG would suffice to permit the generation of non-MCS string languages (see Kobele). Predictions regarding language-classes are therefore a very coarse-grained tool for deciding between theories of UG. That's why discussion of language classes has always had a rather marginal place in the syntactic literature. It's cool to be able to make such mathematically precise predictions, but the predictions just don't tell you very much. More interesting claims regarding the nature of UG and the range of possible grammars tend to be correspondingly harder to test. That's not to say that they're untestable, just that they don't wear their predictions on their sleeves. Since truth and testability are independent and (quite possibly) non-correlated properties, I don't see that as a good reason to reject these claims.

    9. The ambiguity of the term "UG" is a problem here: what do you mean exactly?

    10. The hypothesis would be that UG is such that only Minimalist Grammars can be acquired. This hypothesis would predict that all acquirable languages are within a particular class of MCS languages (absent any further constraints).

      To make this more concrete, you could think of UG on this hypothesis as a schema for lexical items together with a set of universal inference rules of the kind Stabler presented in the early papers on MGs.
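      As a rough illustration of "a schema for lexical items together with universal inference rules", here is a merge-only fragment in the spirit of Stabler's Minimalist Grammars: a lexical item pairs a string with a sequence of features, and merge checks a selector =x against a category x, putting the first-merged argument (the complement) to the right of a lexical head and later arguments (specifiers) to its left. Movement and licensing features are deliberately left out, and the class and function names are illustrative, so this is a sketch of the idea rather than Stabler's full system.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Expr:
    words: Tuple[str, ...]     # the string yielded so far
    features: Tuple[str, ...]  # remaining features; the first one is active
    lexical: bool              # True for a lexical item that has not merged yet

def item(word: str, *features: str) -> Expr:
    return Expr((word,), features, True)

def merge(head: Expr, arg: Expr) -> Optional[Expr]:
    """Check head's selector =x against arg's category x; None if merge is undefined."""
    if not head.features or len(arg.features) != 1:
        return None  # an argument with leftover features would need move, not modelled here
    if head.features[0] != "=" + arg.features[0]:
        return None
    if head.lexical:
        words = head.words + arg.words  # complement goes to the right
    else:
        words = arg.words + head.words  # specifier goes to the left
    return Expr(words, head.features[1:], False)

# Hypothetical toy lexicon.
the, man, dog = item("the", "=n", "d"), item("man", "n"), item("dog", "n")
saw = item("saw", "=d", "=d", "v")

vp = merge(saw, merge(the, dog))      # "saw the dog" : =d v
clause = merge(vp, merge(the, man))   # "the man saw the dog" : v
print(" ".join(clause.words), clause.features)
```

      The grammar of any particular language is then just a lexicon; the two merge cases are the same for all of them, which is the sense in which the inference rules would be universal.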

    11. @Alex C: my intention was not to defend [everything of] the E&L paper, merely to say that the points Norbert made [i] were presented in a tone that should have no room in scientific debates and [ii] did not address the genuine problems for UG [as it had been presented at the time of writing] E&L [and many others] have drawn our attention to.

      From much of the discussion I missed, I gather that those who commented seem to have adopted, at least implicitly, a Platonist point of view and are interested in classes of possible grammars rather than in HUMAN grammars. That is of course a perfectly legitimate line of research, but it really seems odd to claim that it will lead to any insights about how the human mind works (evidence from actual language[s] seems to play only a marginal role in your debates). So has accounting for the latter been put on the back burner or entirely abandoned?

      I further notice that you wrote: "For example, the Berwick, Pietroski, Yankama and Chomsky, Cogsci paper from last year, that Norbert discussed in an earlier post. One way of viewing part of that paper is that it shows that for example Clark and Eyraud's learning algorithm taken as a theory of language acquisition is false, because it predicts that natural language string sets are substitutable and they are not." So you accept for your own work that it can be falsified. Why is it, then, that for the Chomskyan paradigm no criterion for falsification exists anymore? Assuming for a moment that the Piraha evidence is correct: why does it not refute the claim that sentence recursion is a language universal? What IS the evidence that could refute such a claim?

      The following question is for all: if a set of universal inference rules as mentioned by Alex D. exists, and if we can learn about genuine universals by studying ANY randomly picked language, why is there still so much emphasis on studying English? Would it not be more promising to have teams of linguists study Piraha? They could [i] find out whether or not Everett got the facts right [for people who do not speak a word of that language to claim he did not seems very unscientific] and [ii] reveal universals that are shared by all languages. Of course that still leaves the non-trivial task of testing whether these universals ARE present in other languages. And that was, in my book, the genuine contribution of E&L: presenting evidence from a wide variety of human languages that are actually spoken. The kind of evidence that eventually needs to be accounted for by a theory that claims to be universal.

    12. @Christina: classes of possible languages are a legitimate object of study from a naturalistic (non-Platonic) viewpoint. For example, one might say that the class of possible orbits in Newtonian astronomy is the class of conic sections. The actual orbits we observe are just some subset of this.
      If we observe an orbit which is not one of these then we need to modify some auxiliary assumption (e.g. posit an unobserved planet, e.g. Neptune or Vulcan) in order to avoid falsifying the theory.

      Norbert and Alex D are roughly arguing that Chomskyan hypotheses about universals cannot be falsified by any observable data (i.e. about strings and their interpretations), and furthermore that this, though undesirable, is unavoidable given the complexity of the situation.

      (@AlexD is that a fair account of your view?)

      I on the other hand think that this is perfectly avoidable if you are sufficiently precise, and I gave an example of a proposal which is sufficiently precise to be shown to be wrong. It's uncontroversial that wrong theories are better than theories that are "not even wrong" as the jargon has it.

      So I don't think I am disagreeing with you except about Platonism, but I don't want to continue discussing that issue here.

    13. @Alex: Thanks for this. Let's continue with examples from astronomy to express the concern I have about Norbert/AlexD's approach: Ptolemy constructed a complex system of epicycles to accommodate the observations to his theory. This prevented any genuine progress in astronomy for roughly 1000 years. How do we KNOW that the UG hypothesis/program does not rest on foundations as unsound as Ptolemy's astronomy? For Ptolemy, perfectly round orbits seemed to be the 'sacred cows' that could not be questioned no matter what observation revealed. For Chomsky it seems to be innate domain-specific endowment/UG.

      When Paul Postal asked for laws of grammar, Norbert eventually evaded answering by reminding us what a young science linguistics is. Fair point. But then should we not learn from the mistakes that other young sciences made, remain open to the possibility that even [or especially] our most cherished assumptions could be wrong, and take seriously data that seem to threaten them? Possibly the data E&L [and not just them!] have accumulated can be accounted for under a UG framework, but at the same time maybe data from human languages can also be accounted for by alternative frameworks. Given how little we know at this point, what I find most troubling about Norbert's view is not that he might be wrong but that he does not even consider the possibility that he could be...

    14. @AlexD
      Thanks, I think there is a useful distinction between UG as a set of grammars and UG as the initial state, or as a theory of the LAD. In the first sense, we can just stipulate that UG defines a certain class of languages, e.g. MGs or TAGs or whatever. But this is not then a prediction of the theory; it's just a stipulation. You also need a theory of how these can be learned (which is not available).

      Going back to the orbit example: think of "being a conic section" as a universal property of orbits. This is a prediction of Newton's theory. We can't change the prediction without changing the theory; we need a different theory to get a different prediction.

      Alternatively, we could just stipulate that orbits are conic sections, without deriving them from anything deeper. In which case if we wanted we could just change it to any other set of paths (cubic splines or whatever).

      From this point of view UG stipulates a set of grammars, and we can tweak that set of grammars without any problem. In which case, you are quite right, that language classes don't tell you anything interesting. But this is like saying that the shape of the orbits isn't going to tell you anything about gravitation.

      From the other point of view, UG is a learning algorithm -- and this makes a prediction about the class of languages. If you find that natural languages are outside this class, then it refutes that theory. You can't change the prediction without also changing the theory.

      There is probably plenty for you to disagree with here, but the key point is what UG is: the two senses I distinguish tend to get conflated in Chomskyan discourse.

    15. Much as I enjoy the high-level scientific analogies to Newton (remember I suffer from profound physics envy), this discussion seems to me to have gotten too abstract. I made no claim (or did not intend to make one) that string/interpretive properties could never be of empirical use. What I said is that they need to be filtered through some claim about the grammars that would generate these strings. UG is about Gs. To argue against a version of UG you need to say something about Gs. The problem with EL is that they don't couch their empirical difficulties in these terms. As such, what they say is of little interest. Moreover, it takes a pretty sophisticated argument to show that a G does not work. Every now and then a simple one (Shieber's) suffices, but more often than not the arguments are complex. Big surprise. This is always true (which was basically Alex D's point).

      Now, I speculated as to why EL felt comfortable doing what they did (i.e. missing the point in the way they did). I think it's because they believe that only certain kinds of hypotheses concerning UG are really admissible, and the reason they think this is a pretty deep empiricist preconception of what minds do. So, much as I like the discussion of conic sections, this is far too refined. The EL paper is not that sophisticated, and that is precisely one of its problems. The other is that its lack of sophistication is a recognizable one: it suffers from an empiricist bias that the Generative theory of UG eschews. Thus, it cannot help but be unconvincing and crude.

    16. @Alex

      I'm really not saying that theories of UG are immune to falsification from the sort of data that linguists typically consider. I'm just saying what I think is uncontroversial: falsifiability and testability are graded properties, and it's common for scientific theories to be difficult to test. In other words, the fact that a theory does not simply churn out a list of predictions which can be immediately checked against a list of facts doesn't disqualify it. In practice it has been possible to test theories of universals to a pretty satisfactory extent. For example, I think the evidence against Kayne's original antisymmetry theory is quite strong. Relative clauses pose one of the biggest problems for the theory. However, that's not because the theory in itself makes any predictions about RCs, it's because it has proved very difficult to build a satisfactory theory of RCs on top of antisymmetry. EL could have marshaled theoretical arguments of this sort against other hypothesized universals, but they didn't.

      You're now introducing an additional distinction between predictions and stipulations. I think I understand this in general: a theory can either make a prediction by fiat or via a complex and interesting set of deductions. But I'm not sure I see your point in this specific case. Michaelis' original proof that MGs generate MCS languages was quite complex, and (I guess) comparable to the complexity of a demonstration that a given learning algorithm can only learn languages within a particular class. In general, any definite conception of UG, or specification of a learning algorithm, is going to make some kind of prediction about language classes. Nativists are at no disadvantage here. (And I should be careful not to give the impression that only empiricists can make proposals regarding the algorithms underlying language acquisition.)

      I think syntacticians tend to see language classes as a somewhat marginal issue because, as I said before, the predictions are rarely any use for deciding between rival theories. I suspect that the same is or will be true in your domain. Once we have some idea of the smallest class of languages containing all natural languages, then any viable learning algorithm will have to target a superset of that class. That will still leave plenty of different proposals on the table, no?

    17. Norbert wrote: "UG is about Gs. To argue against a version of UG you need to say something about Gs. The problem with EL is that they don't couch their empirical difficulties in these terms."

      This is perhaps a nit-pick, but I'm trying to connect this to what Alex C. is saying: I think it's slightly too strong to say that you need to say something about grammars (beyond, obviously, referring to the theory of grammars you're arguing against), but you do need to at least say something about string sets rather than just strings. It's very unlikely that any particular theory of UG will predict that such-and-such a string will not appear in any language, so arguments of the kind EL make, like "look at this string here where the noun phrase is split into discontinuous chunks", aren't likely to falsify a theory.

      Shieber's or Kobele's arguments against (in effect) particular notions of UG were about string sets, and are much more complicated than EL's: it obviously wasn't enough for Kobele to point to, say, the English sentence "the man thought the man thought" to establish anything about copying. Obviously there's no single string that can't be produced by any MCS grammar, and the same seems very likely to be true for the class of grammars licensed by any remotely plausible theory of UG. So to argue against the MCS hypothesis, it wasn't necessary to propose some alternative class of grammars, but it was necessary to establish something about infinite string sets.

    18. I think in Kobele's case it was string sets under a certain interpretation. These copies had phonological but no semantic import, as I recall. This is what showed that this was an effect of movement with copy pronunciation, rather than just several selections of the same words as in your "the man thought the man thought" example. In other words, the case was interesting for K because it implicated a grammar of a certain sort, one that did not show constant growth but whose length depended on the number of "movement" operations required to generate the string. So the argument relies on assumptions about how grammars relate sound and meaning and what structures they generate and how these structures are interpreted.
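      The constant-growth property mentioned here can be put informally: a language has constant growth if the lengths of its strings, listed in increasing order, never jump by more than some fixed constant. Mildly context-sensitive languages are standardly taken to have it, whereas copying that doubles material with each additional operation produces length sets whose gaps grow without bound. The sketch below only measures the largest gap between consecutive attested lengths up to a chosen bound; the function name and the two toy length sets are hypothetical, and no finite computation like this can prove or refute constant growth.

```python
def max_length_gap(lengths):
    """Largest jump between consecutive distinct string lengths in a finite sample."""
    ordered = sorted(set(lengths))
    return max((b - a for a, b in zip(ordered, ordered[1:])), default=0)

# Hypothetical length sets up to a bound of 2**10:
linear = [2 * n for n in range(1, 513)]    # e.g. a^n b^n: lengths 2, 4, 6, ..., 1024
doubling = [2 ** k for k in range(0, 11)]  # e.g. a^(2^k): lengths 1, 2, 4, ..., 1024

print(max_length_gap(linear))    # 2
print(max_length_gap(doubling))  # 512
```

      Raising the bound leaves the first number at 2 but keeps pushing the second one up, which is the pattern a constant-growth language cannot show.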

      The EL cases are nowhere like this. There are obvious alternatives consistent with their observations about free word order that involve constituency in the standard way. There is nothing about the strong properties they cite that is incompatible with standard assumptions, so far as I can tell. Now I might be wrong, but showing I am requires an argument to the effect that these surface forms are not products of constituents being broken up or being put back together. In other words, it requires discussion of how these strings are generated. Kobele did this (implicitly at least). EL did not even try.

    19. You can make the arguments either just from string sets (weak) or from sets of structures or string sets + interpretations (strong). The string set arguments are obviously more convincing but harder to do. So before Shieber's (weak) argument with Swiss German, there were strong arguments with Dutch which people found less convincing.

      Kobele's arguments are about string sets. As are the Old Georgian arguments, if you believe the data (I don't); see Joshi and Bhatt.

      Of course, Chomsky 59? contains some (technically not very good) arguments along these lines arguing that regular languages are inadequate and that CFGs are inadequate too, so there is some history to this line of argument.

    20. What I find slightly strange about this discussion is that it is uncontroversial that making precise testable predictions is good, and that if your theory fails to make testable predictions then this is very very bad. And yet there is a very strong undercurrent on this blog and elsewhere that seems to take a different view; a contempt for formalisation, a dismissal of data that might refute the current theories. And I am not the first to notice this.

      And it does seem to lead to problems -- take parametric syntax. That has had major empirical problems for decades but it still chugs along quite happily consuming people's research time.

      But this negative view may reflect my ignorance: what are the current properties of UG that make some testable predictions?

    21. This comment has been removed by the author.

    22. I find claims that the Ptolemaic paradigm was unsound and that it prevented development for some 1000 years unfortunate and misleading. In fact, it worked (and still does) in lots of applications. Those who make that claim should explain how the paradigm could have been disproved, say, in 500 or 1000 A.D. Based on what data?

    23. @Alex it's good for theories to make precise testable predictions, but it often takes a lot of work to figure out what these predictions are and find ways of testing them. Surely that is also an uncontroversial point. It's hard to argue with a caricature, but I don't think it is accurate to say that people commenting here have a contempt for formalism or a disregard for data. Various people here have explained why various data points don't refute certain theories. We've also given examples of how proposed universals can be refuted (e.g. my antisymmetry example). It's no response to any of this to question our attitudes to formalism and data.

    24. (The contempt for formalism remark was not directed at you: for all I know you may be very pro-formalisation, and your work may be formal. And I should say I don't really buy into the naive Popperian falsificationism here.)

      Chomskyan linguistics is *famous* for its contempt for formalisation and its disregard for potential counterexamples, which has even been institutionalised in various ways, e.g. the core/periphery distinction.

      Norbert has explicitly said on several occasions on this blog, in the context of our discussions on computation and elsewhere, that he does not think that the process of formalisation is worthwhile. And Chomsky of course is in print saying exactly the same thing. But regardless of what people like this say, the more revealing point is what they do.

      My point, such as it is, is that one of the reasons why it is very hard to make testable predictions about some current linguistic theories is that the theories are imprecise. It is also uncontroversial that it is easier to make precise testable predictions from a precise theory. If your theory is precise, then you can use mathematical or computational methods to produce predictions. GPSG is precise: ergo Shieber's refutation of its central empirical claim only takes a few pages.

      If your theories are not precise, then it is very hard or impossible to make precise testable predictions. Current minimalist proposals are not precise.

      (Two big caveats: Stabler's MGs, and the recent Stabler and Collins proposal, which looks cool but which I haven't looked at yet. But again, it is quite easy to make a testable prediction from these Stablerian theories, e.g. that the string sets are semilinear, which can be refuted by looking at some observable data.)

      So let me go one step further, from making a reasonable technical point (which I hope you accept) to an unreasonable polemical point. I conjecture that one of the reasons that Chomsky and Norbert have explicitly rejected formalisation is that they don't want their theories to be refuted by the likes of E and L (or the likes of Geoff Pullum or me or anyone else). The reason I think this is that Norbert has said several times (perhaps tongue in cheek?) that he "knows" his theory is right, and that, for example, computational considerations cannot refute the theory; rather, their role is to explain *why* the theory is right.
      This of course may not be your view.

      And I know this is unreasonable and not fair, but I am curious as to whether you think this view is completely off beam or not.

      Thanks for the anti-symmetry example. What about parameters, which I am more familiar with?

      TL;DR precise theories are more easily falsifiable than imprecise theories, so if you care about falsifiability you should care about precise theories. Ergo people that don't care about mathematical precision don't care about being falsified.

    25. I can't control what Chomskyan linguistics is "famous" for. Life is annoying like that. In some circles, Obama is famous for being a Muslim socialist born in Kenya, and I guess in some circles "Chomskyan linguistics" is similarly "famous for its contempt for formalisation and its disregard for potential counterexamples" But that doesn't make the charge accurate.

      Most of what we *do* for a living is discover puzzling linguistic phenomena (anything from "how do babies come to know the that-trace effect" to "what's this funny morpheme doing in that verb") -- and try to explain them by evaluating competing proposals that make distinguishable predictions about the phenomena in question. That's almost every paper in so-called "Chomskyan linguistics", including Chomsky's. To carry out this kind of task, you have to be at least as formally explicit about the proposals as the problem at hand warrants. That's what every linguist tries to do. Of course, success varies, but that's life as well.

      The question Chomsky takes up in the remarks for which he appears to be infamous is not normative but *tactical*. Formalization takes time and energy. Formalization can obscure generalizations just as easily as it sometimes clarifies them (cf. figuring out how your favorite app works by looking at machine code).

      Unfortunately, there is a persistent stream of critique in linguistics that looks at someone's seemingly beautiful, instructive paper on, say, how phases explain island conditions in Kikuyu and says: "this is worthless because you haven't given us a formal account of the notion 'occurrence' ...". Now there is a version of this critique that makes eminent sense: it is the one that continues "... and if you look carefully at what you have to mean by 'occurrence' to make your proposal coherent, it leads to terrible problems explaining Korean", or "it leads to logical inconsistencies", or something equally alarming. But there is a common version of this critique that makes no sense, which instead continues "... so you're a useless idiot and your so-called research is crap". It is this version of the critique to which Chomsky has repeatedly attempted to issue a corrective: we should formalize as necessary, and justify the expenditure of time and energy by showing that it contributes to our understanding of genuine problems.

      [continues below]

    26. [continuation of preceding comment]

      Likewise with the notion "counterexample". More or less every paper in syntax has lines like "but this proposal would incorrectly predict that example (17c) should be grammatical" (or "... should be scopally ambiguous", or "... should be acceptable without stress on [bla]", or "unlearnable"). Virtually every good talk at any conference is followed by a discussion section in which the varying empirical predictions of competing proposals are debated. Counterexamples that distinguish proposals are the lifeblood of research in Chomskyan linguistics, just as in any field. Once again, to the extent that Chomsky or "Chomskyan linguistics" has spoken negatively about the notion "counterexample", it is as a corrective -- a response to a stubborn strain in linguistic polemics that thinks that unexplained data, unaccompanied by any competing explanation, can kill theories, e.g. "My language loses its island effects on Thursdays in leap years, and Chomsky's theory can't explain this, so generative grammar is wrong." No linguist, not Chomsky, not anyone else I know, has "disregard for potential counterexamples". Chomsky, in his 1986 book "Barriers", offers a theory of parasitic gap constructions that differs from his own proposal advanced four years earlier (and is conceptually weaker) because of unexpected island effects that he had detected in these constructions during the intervening years. The final chapter of his Lectures on Government and Binding was tacked on to the book at the last minute to repair an empirical mess he'd created with a theory of pro-drop that he advanced in an earlier chapter. That's how we all operate. What we don't do is give up and go fishing just because there are data points that we can't explain. In a mature field, these sorts of points would probably not have to be made, but in linguistics they have to be -- and that's what Chomsky's been doing from time to time, on behalf of all of us.

    27. So we should bear in mind that we are interested in different things: you are interested in syntax, and I am interested in language acquisition, or the LAD/UG. So we may have slightly different views of what a counterexample is.

      "To carry out this kind of task, you have to be at least as formally explicit about the proposals as the problem at hand warrants."
      That is an excellent maxim: as Aristotle puts it
      "It is a mark of the well trained mind never to expect more precision in the treatment of any subject than the nature of that subject permits."

      So broadly, the more remote your theory is from observables, the more precise your theory needs to be if it is not to become empirically vacuous. That is why the most fundamental branches of physics are the most mathematically precise. If you are not precise, then it becomes pure speculation.

      Linguistics, as the post on which this is a comment argues correctly, is exactly one of these fields, where the data is remote from the object of study.
      So I agree with Norbert and AlexD about that part of it.

      How explicit should we be in linguistics? I think that linguists or at least those talking about the fundamental issues of syntax and language acquisition should be fully mathematically precise. I realise that this view is not shared widely in linguistics.

    28. (part 2)

      The first thing to say is that it is entirely possible to be fully explicit and mathematically precise: we have the mathematical and computational tools (e.g. HPSG or Stabler's Minimalist Grammars), and we have the apparatus of computational learning theory. So being precise is a perfectly feasible option. And those people who have been precise have learned a thing or two about writing generative grammars, though they don't always call them that. One of the things they have learned is that grammars are far too complex to reason about directly: one needs computational assistance to figure out what is or is not accepted by the grammar.

      So when I read a syntax paper and the paper says, "If we make this assumption, then these sentences are ruled out and these sentences are ruled in", I don't believe that at all. Or rather, whether this is true or not depends on all of the "filigree" details of the grammar and the precise lexical entries, which are not generally specified. In particular, it is very hard to demonstrate informally that a particular grammar does *not* generate some string of words, because that requires checking all possible derivations using all possible lexical entries for those words, and that needs to be done automatically. Chomskyans (I feel embarrassed using this term, but it serves as a useful shorthand) don't really think about these problems, because they don't try to write large-scale grammars.

      (Whether the method of syntactic analysis as currently practiced has any chance of leading to psychologically real grammars at all is another, entirely separate methodological issue.)

      That is one problem, but Norbert's post raises a larger one. Suppose there is some claimed universal property of all grammars, say P.
      The empirical claim is that for each language there is some grammar that has P that generates that language.
      To falsify that, you need to show that every grammar that generates the language is *not* P. And this requires a universal quantification over all languages, which is impossible to do if the class of grammars is not precisely specified. So the claim that every grammar is P is empirically vacuous, however meaningful it may be in terms of the fundamental psychological reality of the grammar. So claims like "all languages are binary branching", or X-bar syntax, or whatever, seem to be vacuous.

      But the final thing to say is that, "tactically" it is not possible for most linguists to be mathematically precise, because they do not have the technical background.
      Linguists are not trained to be mathematically precise, unless they go to UCLA or a few other places, or transition from some other field.
      Of course there are exceptions, but overall it looks to me like what is necessary is also impossible. Which is a very bad situation to be in.
      And there is no hope of getting out of this situation while you and Norbert and Chomsky are saying that formalisation is pointless.

    29. As others have done, let me thank Norbert for making this blog available as a place where discussions like these can take place. That said, I am often depressed at what passes for discussion. You write:

      "So when I read a syntax paper and the paper says, "If we make this assumption, then these sentences are ruled out and these sentences are ruled in",
      I don't believe that at all. Or rather whether this is true or not depends on all of the 'filigree' details of the grammar and the precise lexical entries, which are not generally specified."

      This is just not true, as a general law. It is often true in specific cases: look more carefully at a phenomenon and you often learn that someone else's simple picture is not so simple after all, just as I wrote. But that's not a law of nature, it's a contingent fact. It is just not practical, if we want to make any progress at all, to wait until we've worked out a theory of everything before we permit ourselves ideas about what's going on in specific cases. I don't even know how that would work. What field in history has made progress that way?

      To continue with an example I mentioned in my previous comment: when Chomsky (following Taraldsen and Engdahl) claimed in "Concepts and Consequences" that in parasitic gap constructions there must be a "real gap" that obeys islands with respect to its A-bar antecedent, and that the parasitic gaps freely disobey islands, he proposed an account of this that relied on notions like "movement", "island", "coindexing" and "trace" that were not fully formalized. They were, however, presented with sufficient clarity for researchers like Kayne, Longobardi and Chomsky himself to notice over the next few years that the generalization was not quite right, and that the correct theory of parasitic gap constructions needed to appeal to configurational connectedness in a particular way that had not even been suspected to be relevant when work began on the topic. Now it's also conceivable that someone like you might have come along in those days and pointed out that if you formalize the notion of movement in way X rather than way Y, you make further progress on the topic, and that would have been great too. But you didn't, and even without your helpful hypothetical intervention, progress was made. You're free to say "I don't believe that at all", but you're just wrong, and the proof is all around you, if you would but look and listen.

      The problem is, looking and listening *is* important. Here I am saying repeatedly that formalization is productive, indeed essential, when there is a point to it. Here's what Chomsky wrote about the same topic in 1990 (in a reply to Pullum in Natural Language & Linguistic Theory):

      "Theories should be formulated clearly enough, and observations firmly enough established, so that inquiry can proceed in a constructive way. Beyond that, experiments can be carried out more carefully and theories made more precise, but the burden of proof is on those who consider the exercise worth undertaking. Sometimes the burden can be met: inquiry is advanced by better data and fuller formalization ... [and] work should be clear enough so that it *could* be formalized further if there is some reason to do so."

      Yet you repeat my words back to me as "you and Norbert and Chomsky are saying that formalisation is pointless". Is that what I said? Is that what Chomsky said? Has Norbert expressed that view anywhere?

    30. Yes, I also would like to thank Norbert again for writing posts that stir up such discussions.

      When I said formalisation in the last sentence I meant a proper mathematical model, rather than just providing a little bit of technical detail here and there, though no more than is used in any other branch of science.
      I also don't think that one necessarily needs to write a grammar for the whole language; rather, one needs a mathematically precise model of a simplified or idealised system, perhaps only for a part of the grammar.
      That is how more or less *all* science works, and I don't think linguistics should be any different.

      So am I being unfair here? I thought about it a bit, trying to be objective, and I don't think I am.
      Because I think you are in fact arguing that formalisation (in my sense) is pointless.

      What you are saying is that formalisation is good when it is necessary, and a pointless waste of energy otherwise.
      Which is very reasonable and hard to disagree with: but "cook until done" is not a very good recipe...
      So it all depends on your answer to the question: When is it necessary?

      So the Chomsky 1990 letter is not really arguing in favour of formalisation. It is defending the lack of formalisation in current theories.
      He is arguing as you are that it is generally unnecessary. I.e. that it is in general pointless.

      You and Chomsky raise one important and convincing argument why it is unnecessary (and thus pointless) which is that in practice, in syntax, the models are precise enough to be used and refuted and thus "progress" occurs.

      But I just don't see this "progress": there is *change* to be sure as the years go by, but I don't see any measurable progress: no building of a consensus of bedrock syntactic facts that everybody agrees on; no usable theories that could be applied in psycholinguistics or language acquisition, or in models of language evolution, or even in Machine Translation. Changing your mind often isn't progress. And I just don't see that the theories of 2013 are any better than the theories of 2003 or 1993.

      (continued in next post)

    31. I work in a philosophy department at the moment, and there is a lot of discussion in the field about whether there is progress in philosophy. All of the indicators that people use to argue that there is no progress in philosophy seem to apply equally well to linguistics; see e.g. "On the Limits of Philosophical Progress" by David Chalmers. And being in the same situation as philosophy is bad for a supposedly empirical science.

      The argument is basically from the lack of convergence to a consensus. If the field is not converging then it is not a fortiori converging to the truth.
      And linguistics is definitely not converging: indeed the opposite. I see less agreement now about fundamental issues like the nature of the language faculty, the role of the strong minimalist thesis, the nature of parameters, and many other basic syntactic issues.
      Twenty years ago, we had various competing syntactic frameworks which made different claims about syntactic structure: are we any closer to resolving these disputes between HPSG and GB? What about CCG or dependency grammar formalisms? Or OT?
      There just isn't the convergence to a broad and stable consensus that one would expect from a functioning science. This has been going on for so long in linguistics that people internal to linguistics (like you and AlexD and so on) seem to think that this is normal and ok.

      And even if one ignores all work done outside mainstream generative grammar, as Chomskyans often do, there is no sign of convergence or consensus, as I have found by hanging out here at this blog.
      So I don't intend this to be an attack on all of Chomskyan linguistics, which wouldn't be a useful comment, but just narrowly a counterargument to your claim that the current low level of formalisation is ok because progress occurs.

      But maybe I am being unfair: A genuine question: have either you or Norbert or Chomsky written a formal paper since 1990? That is to say have you encountered a situation where you felt a proper formalisation was essential? Because beyond what you say about formalisation, there is what you actually do about formalisation, and for example the reception of Stabler's work is to my mind quite revealing.

      So I have given three arguments why it is necessary which you have ignored; let me add one more, external one: linguistics is not taken seriously in the wider scientific community, as Mark Liberman notes on Language Log frequently, and as you have noted also (e.g. in your presidential address). One reason may be the lack of formalisation, which impedes the integration of linguistics into other sciences.

    32. The problem with answering your messages, Alex, is that they consist of such a wide spray of claims, charges, and put-downs, that there is no way of answering the entire message without producing a reply that is just as scattered, but longer. I chose to write a pair of coherent remarks that focused on those aspects of your remarks that seemed to me most in need of a reply. That of course means that you can now resort to the evergreen all-purpose reply in polemical discourse: "I have given three arguments why it is necessary, which you have ignored" -- but this was inevitable. Unfortunately, in your remarks, I see no arguments, just smears: we "Chomskyans" are not smart enough, we don't agree enough, we "ignore all work done outside mainstream generative grammar" ("as Chomskyans often do"), etc.

      And when Chomsky, Norbert and I say as loud as we can that formalization is not pointless, we actually mean that it *is* pointless. Damn those Chomskyans, always saying the opposite of what they mean! Now you've even "thought about it a bit, trying to be objective" and concluded that you're not unfair: when we said P, we really meant not P.

      No, I have not written a paper that is formal "in [your] sense", and perhaps some day I will -- if the questions I want to answer take me in that direction. If I ever did write such a paper, I would do what people in serious fields do on such occasions (and what Chris Collins, syntactic theorist & field linguist, and Ed Stabler, computational linguist & ex-philosopher, did): seek collaborators whose skills complement mine, so that we produce work that is stronger for having been produced by a team with distinct skills but common interests. Is that a problem? Are *your* skills and interests the right ones for every problem? Should everyone be doing exactly what *you* do? I, for one, have never thought it productive to rail against work done in adjoining areas of my field, and I have always considered diversity of talents, skills and interests a strength, not a weakness, of linguistics as a whole. The work done in this diversity does lead to progress and consensus, no matter what you say, and I don't see any way to populate a successful field except with people whose skills and talents overlap but do not coincide. And no matter what you claim, the field I know does offer "usable theories" applied in "psycholinguistics or language acquisition" -- or is it just an illusion that increasing numbers of our students are working in these areas?

      [continues below]

    33. This comment has been removed by the author.

    34. [continuation of previous message, second posting to correct typos]

      Now I can't stop random strong personalities from making their (often rather minor) disagreements look like major quarrels, as they do, but the fact is that we do function pretty well as a field. We don't just "change our mind". We genuinely learn more, and collectively build on our understanding. Crucially, we do that by *disagreeing* at the frontiers of knowledge, just as in any field. But the fact that, say, Norbert and his students might pursue a hunch about control (vs. raising) that differs from the hunch pursued by a former MIT student such as Idan Landau, and that both sets of views differ from those of Susi Wurmbrand (to pick a real example), does not show *lack* of consensus. Rather dramatically, it shows the opposite -- as the very terms of such debates reveal. The core issues are agreed on by all participants in the discussion, and the debate proceeds against a huge background of shared assumptions (supported by decades of earlier research).

      My colleagues and I regularly teach papers that develop HPSG, LFG, Relational Grammar and other syntactic approaches, and guess what -- the situation is exactly the same. (Click on the link to see what we teach in this class.) Most of this work shares so much background with the "Minimalist" syntax taught in earlier classes that getting up to speed is practically trivial. (We race through the Sag, Wasow and Bender textbook for HPSG, and Bresnan's textbook for LFG.) The interesting bits are indeed the *disagreements* -- which are fascinating but strikingly specific. Do long-distance dependencies grow specifier-to-specifier (as in most "minimalist" approaches) or daughter-to-mother (as work in GPSG and HPSG proposes)? Does anaphora care about constituent structure, f-structure or ARG-ST, and does it work the same way cross-linguistically? Are long-distance processes like wh-movement, relation-changing processes like passive, and head-displacing processes like subject-aux inversion instances of the same general rule (internal merge) or a disparate collection of distinct processes (as HPSG and LFG work posits)? But for all that HPSG, LFG, etc. sell themselves as major departures, the amount of background that they share with the supposedly opposing "frameworks" far outweighs the differences -- and the differences mostly arise where secure knowledge ends, just as they should.

    35. Yes it is a bit crass to use the terms "Chomskyan" or "empiricist". And yes, arguing in a comment thread with a 4K character limit is not ideal either.
      But given that this is a comment on a deliberately polemical blog, I hope my tone is not out of order: I have been trying to be polite and direct, and there is no intention to smear or insult you. And if I have caused offense I apologize.

      Let me fish out my specific arguments for formalisation, since they did get lost:

      1) "the more remote your theory is from observables the more precise your theory needs to be if it is not to become empirically vacuous.
      That is why the most fundamental branches of physics are the most mathematically precise. If you are not, then it becomes pure speculation."

      2) "In particular it is very hard to demonstrate informally that a particular grammar does *not* generate some string of words because that requires checking all possible derivations using all possible lexical entries for those words, and that needs to be done automatically."

      3) "Suppose there is some claimed universal property of all grammars, say P.
      The empirical claim is that for each language there is some grammar that has P that generates that language.
      To falsify that you need to show that every grammar that generates the language is *not* P. And this requires a universal quantification over all languages, which is impossible to do if the class of grammars is not precisely specified. "

      4) "linguistics is not taken seriously in the wider scientific community, as Mark Liberman notes on Language Log frequently, and as you have noted also (e.g. in your presidential address). One reason may be the lack of formalisation which impedes the integration of linguistics into other sciences. "

    36. Thank you for the apology. Briefly:

      Concerning 1: I'm not at all sure that your "law of remoteness from observables" is true, but let us imagine it is. Frankly, and perhaps here Norbert might disagree, I don't think that most of what we do in linguistics is so monumentally divorced from observable reality that the only way to make progress is with heavy formal artillery. Channeling Norbert's physics-envy for a moment, I can say I would be delighted to have a glimpse of a future in which the field has reached that level of successful abstraction, but I don't think we're anywhere near there at the moment. What we do is abstract enough to render E&L's empirical claims, even if they were true (which they're not), irrelevant, but all you need to know is a bit of intro syntax to see that. And unlike you, I'm decently sanguine about the level of formalization in most current work on syntax (but always willing to learn of particular cases where my confidence has been misplaced).

      Concerning 2: I'm a bit lost here. I can easily imagine situations in which a proposal I might make turns out to have unintended consequences that can be explained by a more careful examination of the system I am proposing or presupposing. But most of the time, if I claim that wh-movement is blocked by the presence of the adverb "kerplunk", do I really need fancy formal tools or complete grammars running on a supercomputer to see that wh-movement is in fact blocked by that adverb? And before you tell me that if I ran my grammar on a computer, I would be shocked at the unexpected problems it faces, let me reply "of course" -- but I think that would mostly be for other reasons (e.g. the need to make arbitrary analytical decisions about understudied phenomena when you value coverage above understanding), which we can discuss separately on some future comments page.

      Concerning 3: I suspect there's a typo in your claim somewhere, because I can't figure out what you are saying here. Let's leave it for another day.

      Concerning 4: My LSA talk focused on the fact that crazy-bad linguistics papers are getting published in high-profile journals, to the exclusion of anything else. The latest case is discussed in today's Language Log posting by Richard Sproat here. Nothing to do with syntax or Chomsky whatsoever, but the same depressing mess nonetheless. This is indeed a serious problem for linguistics. In my LSA talk and elsewhere I have speculated on some possible reasons, starting with the absence of any knowledge in the public-at-large about the most fundamental discoveries of our field. I doubt that degree of formalization plays any role (except that it does appear possible to blind Science and Nature editors with fancy statistics evaluating irrelevant or bogus data - another topic for a different occasion).

      Thanks for the conversation.

    37. I have been traveling a lot and could not keep up with this debate for a while. One conference I attended in Lisbon showed that, contrary to what David says above, linguists are taken seriously by many non-linguists, and I was rather pleasantly surprised that even some linguists I had considered quite 'Chomskyan' are able to reply politely to opposing views. That is quite a difference from the tone here: "My LSA talk focused on the fact that crazy-bad linguistics papers are getting published in high-profile journals, to the exclusion of anything else." - is it surprising that editors/reviewers accused of accepting work that is 'crazy bad' are not begging to publish MORE from linguists?

      "This is indeed a serious problem for linguistics. In my LSA talk and elsewhere I have speculated on some possible reasons, starting with the absence of any knowledge in the public-at-large about the most fundamental discoveries of our field."

      With all due respect: the public-at-large is ignorant about most discoveries in most fields. That hardly results in 'crazy bad' papers getting published in other fields. But the public-at-large is not entirely stupid. Even someone with as little knowledge about linguistics as Norbert has attributed to me can easily see that no matter how bad the work that got published by Science is, it does not even remotely approach the level of "Science of Language". The public-at-large wonders why Geoff Pullum was smeared in quite a nasty way when he wrote an accurate review of SoL, and why those who claim that "degree of formalization ... [might] blind Science and Nature editors with fancy statistics evaluating irrelevant or bogus data" do not comment with similar harshness on the editors of CUP who did not request substantial revisions to SoL. In other words, the public-at-large is able to notice the enormous elephant in your room - and that you do nothing about it while pointing fingers at everyone else.

    38. I think you're missing the point. What seems to be peculiar about linguistics is that non-linguists feel entitled to an "Oh, language, how hard could that be, I know {insert-your-favorite-"hard"-science}, so I'm already overqualified"-attitude that, as far as I'm aware of, no, say, non-physicist (except for, perhaps, a philosopher) could express with respect to physics without making an utter fool of themself. And "Science" even publishes "linguistics" articles by non-linguists (which, of course, isn't a problem) without bothering to have linguists check whether or not they live up to current standards (or, if they ask a linguist to review, they seem to not care about their judgement). That's exactly the point of the Sproat-languagelog-post linked to above.

    39. @Benjamin: My point was to offer a [partial] explanation for the fact that many/some non-linguists take the attitude you object to. Your field is known to the 'outside world' through the works of Chomsky. I doubt that any non-linguist after reading SS would have considered him/her-self an expert. But after reading SoL it is not surprising that [some] people think: "gosh, I could have written something better than that and I am not even a linguist - so how hard can it be". As long as you [pl] continue to shoot the messengers, you convey to the general public the idea that you think SoL is 'as good as it gets', and you should not complain that your field gets measured by the standards you [seem to] accept.

      You also should not forget that, pace David, there ARE serious debates among linguists about what language is. Some linguists [like Paul Postal] deny that language is a biological organ. That's not some minor detail that will work itself out once we know a bit more. Other linguists [like Dan Everett] claim language is a 'cultural tool'. So you cannot really blame non-linguists for picking one of the definitions of language you disagree with, if not even linguists can agree what language is.

      regarding the attitude towards other sciences: again, look at recent publications by Chomsky: he talks as if he were an expert about work in many fields where he clearly has no expertise. When this is pointed out in forums like this one again the critic is attacked. If this is the standard you accept for your own field [and Norbert has rather proudly admitted that his pay scale is too low to have expertise in biology yet he calls himself a biolinguist and feels competent to publish on evolution] you should not be terribly surprised that [some] others adopt this attitude towards linguistics.

      Finally: do you know who the reviewers for the papers in question were and have you read their reports? If not how do you know they were no linguists and they did not make any suggestions of the nature you seem to think were needed? Assuming for a moment the reviewers were 'mere' psychologists: for decades Chomsky [and certainly not he alone] has claimed that linguistics is a branch of psychology. Should we be surprised then that editors of say 'Science' listen to what a psychologist tells them about the quality of the paper in question? Again, you can continue to fight me [and others who point out the obvious] or you can deal with the elephant in your field...

    40. This comment has been removed by the author.

    41. (deleted comment because text was cut)
      The Sproat case David mentions (did you have a look at that? Here's the link again) doesn't involve anything as controversial as the "true nature" of language; it's basically about intro-level knowledge about (Computational) Linguistics, which, by the way, is certainly not swarming with dogmatic Chomskyans. I suppose you can turn this around and say, see, Chomsky even managed to damage an adjacent field that now doesn't get the respect it deserves. I doubt there's anything to that.

      A similar thing happened years ago, and Joshua Goodman wrote a hilarious reply to an embarrassingly bad article published in, of all places, a Physics journal. Again, nothing to do with Chomsky or Generative Grammar, just unfiltered ignorance by "real" scientists about CompLing 101 knowledge.

    42. I am not sure if you really don't *get* my point or if this is another attempt at humour? Assuming you're serious: no one denies that at times bad papers get published. Nor did I claim such papers should not be criticized. The problem is that these papers are not remotely as bad as Chomsky's forays into nematology or bacteriology or genetics or evolutionary theorizing or...Yet, Norbert and David and you spend much energy complaining about the former but none whatsoever criticizing works like SoL. That makes you [pl] look like hypocrites. So let me ask you: did you have a look at the 'common theory' of language evolution Chomsky *describes* in SoL. If you truly don't know how bad it is ask people who actually work in the field [but be prepared to be treated like an imbecile because it is not even intro-level bad]. Do the same for other biological claims by Chomsky. Or take his references to the allegedly *lost works* of Descartes and erecting his *Cartesianism* on top of them: do you think any serious historian would do such? Take the banality of what Pullum called rightly the 'rocks and kitten argument'. Or the fact that Chomsky has for decades refused to reply to the Katz/Postal criticism that his ontology is incoherent. The list goes on. Where are any criticisms of Norbert or David or you of those issues?

      Editors and referees of Science or Nature may or may not know about the work David listed in his LSA talk. But they DO know that what Chomsky has published as his *science* has nothing to do with biology. And some may have read the reply to Pieter Seuren's [2004] 'Chomsky's minimalism': “Can [Chomsky’s] goal be pursued by ‘normal scientific procedure’? Let us remember we are talking about someone who tries to reinvent the field every time he sits down to write. Why should we expect Chomsky to follow normal scientific practice...?” (Fiengo, 2006, p. 471). - If this is what gets published by serious linguists why should we expect the editors of SCIENCE to accept any work from someone who finds it okay to abandon normal scientific practice?

    43. What is it with "Science of Language"? I don't recall ever defending that book (nor reading it, for that matter). Nor do I recall myself defending Chomsky's specific "proposal" (if you want to go so far as calling it that) about language evolution. Nor his recent interest (shared, I suppose, by Norbert and lots of other people but not me) in "third factors" and the entire Minimalism-thing. I've never been a huge fan of his Cartesian Linguistics, either. Just for the record.
      But why bring up any of this? What I was elaborating on was this specific point of David's which, I think, is exactly right, so I'll repeat it here:

      "My LSA talk focused on the fact that crazy-bad linguistics papers are getting published in high-profile journals, to the exclusion of anything else. The latest case is discussed in today's Language Log posting by Richard Sproat here. Nothing to do with syntax or Chomsky whatsoever, but the same depressing mess nonetheless. This is indeed a serious problem for linguistics. In my LSA talk and elsewhere I have speculated on some possible reasons, starting with the absence of any knowledge in the public-at-large about the most fundamental discoveries of our field."

      If you think this is all Chomsky's (and Norbert's, and David's, and (among many more people) my (I'm quite flattered by being included in this enumeration)) fault I don't think there's anything that would convince you of the opposite. How do they say on the internets, haters gonna hate. To wit:

      "If this is what gets published by serious linguists why should we expect the editors of SCIENCE to accept any work from someone who finds it okay to abandon normal scientific practice?"

      I don't think this is going anywhere, but as my last comment on this, I'd really suggest you look at the specific case David brought up - Richard Sproat on the Indus Script "controversy" - before making comparisons like that.

    44. Okay, let's try this one last time, because I agree this is not a pleasant conversation. I cannot imagine that you truly do not understand my point, but I have credited people with too much wisdom before. So I'll keep this very simple:

      1. I have read the Sproat post David linked to.
      2. I never said the points made there were invalid.
      3. I do not deny that bad papers get published even in high profile journals.
      4. I have previously agreed with David and still agree that it is regrettable that [seemingly] SOME editors/reviewers are ignorant of good work in linguistics.
      5. I have offered a few suggestions for why I think the general public is largely ignorant of some of the good work [In reply to David's point you cite above].

      Since you do not seem to want to get this point here it is again:

      When the topic linguistics comes up non-linguists usually know one name: Noam Chomsky. He is the 'outside face' of your field. Saying this is stating a fact not expressing hatred [I do not hate Chomsky].

      Linguistics is probably unique in having such a well known representative. With this privileged position comes of course responsibility. I think Chomsky has 'let down' your field by publishing many works of the caliber of SoL. My point is not that you should read this book but that it is one by which your field is judged by non-linguists. They see it is of a quality far worse than the work described in the Sproat post. And they notice that, nevertheless, it is praised [read the endorsements] by leading linguists and that someone who critiqued it [Geoff Pullum] gets attacked in a pretty mean-spirited way.

      Further, a well-known linguist [Fiengo] writes that we should not expect Chomsky to follow scientific practice. To my knowledge neither Norbert nor David has objected to this statement in print. Non-linguists have no way to rule out that this silence implies endorsement of Fiengo's assertion. It is up to those who want good linguistic work to be published in journals like Science to distance themselves from the bad stuff that your field has become known for.

      Now you COULD HAVE said: hmm, interesting, I never thought of it this way. There is actually something fairly simple we can do to change the public image of our field. Or you can continue to accuse me [and anyone else saying similar things] of hating Chomsky and complain about the stuff that gets published because some people really do not take your field seriously. Before you fire off another reply ask yourself the Dr. Phil question: how has doing what you [pl] are doing [viciously attacking others but ignoring the well-known problems in your own ranks] worked for you in the past? If you continue doing the same chances are you'll get the same results...

  4. My current take on this stuff is that this "debate" sounds a bit like the "debate" between "frequentists" and "Bayesians", at least in the broad sense. Larry Wasserman (whose blog you should definitely read) makes the point that the difference between Bayesian statistics and frequentist statistics is a difference in goals, not a difference in techniques. If you have Bayesian goals, then it does not matter whether you "identify as Bayesian" or not: you can only make valid or invalid inferences. I think there's something very similar here: asking questions about mental grammars (which I see as very similar to Bayesian notions of "beliefs") means you accept certain things like inductive bias (or, in Bayesian terms, there is no such thing as an uninformative prior: you cannot consistently pursue Bayesian goals without accepting some information in your prior, just as you can't pursue arithmetic without accepting that 1 != 0). Now, if you can claim that UG is just a prior on grammars (and I can't see how it could be otherwise), it is still reasonable to ask to what extent its biological manifestation is used by other cognitive processes.
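A minimal sketch of the "UG as a prior on grammars" reading, assuming a tiny hypothetical hypothesis space of three grammars, each modelled as a finite set of strings, with observed strings drawn uniformly from a grammar's language. The grammar names, their string sets and the prior values are invented for illustration; only Bayes' rule itself is standard.

```python
from fractions import Fraction

# Hypothetical hypothesis space: each "grammar" is modelled as the finite
# set of strings it generates. Real grammars generate infinite languages;
# this is a toy assumption.
GRAMMARS = {
    "G_small": {"ab", "aabb"},
    "G_mid": {"ab", "aabb", "ba"},
    "G_large": {"ab", "aabb", "ba", "bb", "aa"},
}

# Hypothetical "UG" prior: a non-uniform inductive bias over the grammars.
PRIOR = {
    "G_small": Fraction(1, 5),
    "G_mid": Fraction(3, 5),
    "G_large": Fraction(1, 5),
}


def likelihood(grammar, data):
    """P(data | grammar), assuming each observed string is drawn uniformly
    and independently from the grammar's (finite) language."""
    strings = GRAMMARS[grammar]
    if any(s not in strings for s in data):
        return Fraction(0)
    return Fraction(1, len(strings)) ** len(data)


def posterior(data, prior=PRIOR):
    """Bayes' rule: P(G | data) is proportional to P(data | G) * P(G)."""
    unnorm = {g: likelihood(g, data) * prior[g] for g in GRAMMARS}
    total = sum(unnorm.values())
    if total == 0:
        raise ValueError("no grammar with nonzero prior generates the data")
    return {g: p / total for g, p in unnorm.items()}
```

With this prior, `posterior(["ab", "aabb"])` favours `G_mid` (100/187), whereas passing a uniform prior (1/3 each) makes `G_small` the winner on the same data. The data alone do not pick a grammar; some prior has to be supplied, and choosing a "uniform" one is itself a substantive choice.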