Saturday, June 6, 2015

Another Follow-Up on Athens

Following Norbert's example, I would like to talk a little bit about my experience at the Athens workshop. As you might recall, I was rather sceptical after reading the vision statements that much good would come out of all of this. A week later, my evaluation is more nuanced.

First of all, let me say that I can't think of any other syntax conference I've been to that has been that much fun. The format of brief panel presentations followed by one-hour discussions worked incredibly well, much to my own surprise, and the range of topics that were touched on was very diverse (Gillian has a great summary on her blog). That said, I can't shake the feeling that most of the issues people worried about are ultimately small potatoes and that nobody was inclined to really question the foundations of the field, not even as a thought experiment to demonstrate their usefulness. I suppose that puts me in what Gillian calls the Grumpy camp, though I prefer to think of it as a healthy predilection for permanent improvement through criticism. Anyways, let's talk a bit about why all the nice discussions won't bring about any of the changes the field needs, even those everybody in Athens agreed on.

Institutional Issues

There was a broad consensus that generative syntax isn't doing enough outreach and interdisciplinary research and that this is the main reason why, first, the public has no idea what the field is about, and second, Minimalism has little traction in neighboring fields like computational linguistics or psychology. It was also generally agreed upon that generative syntax can engage with these and many more fields, be it sociolinguistics, biology, diachronic linguistics, and so on. But here's the catch: on the third day, all participants were asked to come up with research questions for the future. And that's where you could get a very clear look at the real priorities of the audience.

I don't have access to the compiled list, but as far as I remember the only question relating to any of the interface issues listed above was one on diachrony, and even that one had to be added by Ian Roberts who had been actively championing this connection since the first day of the workshop. So the attitude seems to be: "All this interdisciplinary stuff is super important, but we're not gonna change one iota of our research program to accommodate it. We're gonna be standing over here doing business as usual while we wait for somebody else to do the outreach for us." Well, I'm sorry, but that's not how interdisciplinary research works.

Interdisciplinary research is not a one-way road where you send out missionaries to preach the gospel to the heathens outside generative grammar; it is an active process that requires both parties to assimilate some ideas from the other. If none of the interdisciplinary work gets cited by you, if it is never discussed in your journals, if it has no sway over how you conceptualize certain problems, then it isn't interdisciplinary research, it's a PR campaign. I can't help but think of a remark by Colin Phillips on this blog about the insularity of syntax, in the sense that a lot of syntax-heavy work is still considered outside of syntax because it involves some aspects that have not traditionally been part of the generative enterprise. This is a very unhealthy attitude: if you are so attached to your vocabulary and your canon of research problems that any project that does not fit those criteria is automatically outside your line of work, then fruitful collaboration is nigh impossible. Even if you are raising an army of students doing interdisciplinary work, the fact that this work has lesser status in the eyes of the average syntactician handicaps it on a scientific as well as an institutional level.

This problem is compounded by the fact that Minimalist models operate at a level of granularity that does not line up well with any of the questions asked in neighboring fields. Which takes us to the next point.

The Granularity Issue

One point I kept insisting on throughout the workshop, to the extent that even I couldn't hear it anymore at the end of day 3, is the importance of abstraction. And of course I've also harped on this point in my previous post, but it bears repeating (and I have a few new aspects to add).

Generative syntax has the reputation of being a very abstract science, but it actually isn't. It is very much invested in the nitty-gritty details of its technical machinery, and those details are incredibly nitty and even more gritty. Norbert and David Adger have claimed on this blog that this is merely a matter of presentation, a helpful guide for carving out new problems and generalizations --- syntacticians, they say, are not in any danger of missing the forest for the trees. In my discussion with David Adger I was inclined to agree; that is no longer the case.

My change of heart is due to a little experiment I accidentally conducted during the last but one session of the workshop, where participants were asked to come up with important discoveries of generative syntax, in particular regarding universals. What I found striking is that these universals took the form "for any language L, property P holds of L". Those are certainly interesting universals, but they are not the only kind of known universal. There is a higher-order type of universals, which we might call "class universals" in contrast to the more standard "structure universals". Class universals take the form "property P holds of the class of natural languages". A mathematical example would be that the class of natural languages is not closed under reversal or union, that is, the fact that L and L' are natural languages does not necessarily imply that the mirror image of L is a natural language, nor is the union of L and L' (even if we discard lexical differences). What makes class universals special is that they frequently do not tell you anything about individual languages --- the fact that the reversal of L is not necessarily in the class tells you absolutely nothing about L.
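To make the reversal and union operations concrete, here is a toy sketch in Python that models a language as a finite set of sentences, each a tuple of words (real languages are infinite, so this only illustrates what the operations do, not the universal itself):

```python
def reversal(language):
    """Mirror image of a language: reverse the word order of every sentence."""
    return {tuple(reversed(sentence)) for sentence in language}

def union(l1, l2):
    """All sentences that belong to at least one of the two languages."""
    return l1 | l2

# A tiny fragment of English, with sentences as word tuples:
english = {("the", "dog", "barks"), ("the", "dogs", "bark")}

# The mirror language puts the verb first and the determiner last:
mirror = reversal(english)
```

Non-closure under reversal says that even though english is a natural language, nothing guarantees that mirror is one --- a claim about the class that no amount of inspecting English alone can reveal.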

So against this backdrop I presented the non-closure under reversal example and asked whether anybody could think of linguistic universals that hold at the class level. The replies I got, however, were phrased in terms of structure universals, and as I just pointed out this is bound to fail because class universals do not necessarily tell you anything about individual languages. I believe that this immediate jump back to structure universals is due to the fact that the established Minimalist machinery provides no way of talking about universals that aren't structure universals. It looks like thinking purely in terms of the Minimalist machinery blinds you to properties that cannot be expressed that way. I take this as an indication that the restriction to a very specific level of granularity has been harmful to the field.

I could add some more points about why the current level of granularity is an impediment for connecting syntax to psycholinguistics, how a lot of ink has been spilled on rather immaterial issues of technical implementation, or why formal equivalence results are actually a good thing that we should endorse happily rather than erecting an elaborate system of assumptions just so that one particular implementation can emerge as the winner at the chosen level of description. But I've talked about these things many times before, and Gillian has done an excellent job summarizing this very discussion from a syntactician's perspective.

Who or What Needs to Change?

What I have said above won't convince the dogmatic syntactician who is adamant that the standard methodology of generative syntax is the only way of producing insightful results. I am not sure whether said dogmatic syntactician exists. At least the Athens audience was very open-minded, and David Adger has also argued for pluralism on this blog.

There is a major problem with the pluralism attitude, though. If you have infinite resources, then of course you embrace pluralism since there is absolutely no downside to it. In such a world, it is perfectly fine for syntacticians to just keep asserting the status quo while new resources are poured into producing interdisciplinary research and work that operates at different levels of granularity. If you can do everything, do everything. Alas, resources are, in fact, limited in the real world, and they are even more limited in syntax, and they will keep shrinking if the Minimalist community sticks with its current modus operandi.

The way I see it, syntax will keep losing traction in other fields. While outreach is recognized as important, nobody is willing to make the necessary changes. Note that this isn't just an issue of outreach. Even if, say, biologists were so interested in Minimalism that they wanted to get some research started on their own, they would hit the brick wall of granularity. The majority of research questions that were proposed at the workshop were so specific and technical in nature that nobody outside the narrow corridor of Minimalist thinking could ever see how they are related to something they care about.

Pluralism is not enough because if the majority of syntacticians do not change their approach, the remaining resources will be too little for the non-standard work to gain traction. And that will mean that even the standard work in syntax, without any allies in neighboring fields, will eventually be stripped of all resources. Syntacticians have to actively appreciate problems of broad relevance and encourage research along those lines, otherwise their work will be perceived as irrelevant by the powers that be, with all the negative consequences that entails. As Norbert likes to say, there's no such thing as a booming Classics department.

Just to be perfectly clear, I'm not saying that all syntacticians should suddenly become psycholinguists, sociolinguists, or computational linguists. But the field has to rethink why it does things the way it does and whether it isn't missing important generalizations. At the very least, it has to move away from this attitude of passively following the interdisciplinary work with polite (dis)interest while actively shielding its own work from the implications of that research.


  1. @Thomas - I accept your first point: we are too insular, and for the sake of the future of linguistics, we need to be proactive in interdisciplinary collaboration.

    But I would like to make a minor amendment to your characterization of the list of potential research programs that the assembled group came up with at the Athens event. I think that the list was not quite as bad as you suggest. I happen to have it on my computer, and so I can report that it included, for example:

    27. How does the deductive system interact with syntax?

    30. Can all structures be primed?

    31. How transparent is the mapping between the grammar and the performance systems?

    35. How does subsymbolic meaning get bundled into symbolic units?

    42. What can we say about language contact?

    45. What are the relevant notions of complexity for {the feature system, the derivation, a representation, ...}?

    Personally I can’t imagine how any of these would be approached without a liaison to another field.

    But I don’t dispute the larger point of the first section of your post, and certainly the majority of the topics that people came up with did more to display our passion for our (insular) field than our practical instincts for survival in the current academic climate.

    I'm not so sure that I agree with your suggestion about the significance of syntacticians' collective ignorance of class universals. I think mathematical linguistic work that leads to discoveries like non-closure under reversal is so different from the content of mainstream syntax that it represents a different subfield, and so all of the previous exhortations about collaboration and communication apply.

    1. Thanks for adding the other questions; I was going by memory, which is not a smart strategy when dealing with a list that has over 50 entries. And thanks again for organizing the workshop, I really had a great time, even if all I do in the post is bicker ;)

      Class universals do not need to be mathematical in nature, it simply happens that those are the ones we know of and that are relatively easy to prove correct. But keep in mind that one of the central claims of the PnP approach is a class universal: the class of natural languages is finite. I do find it telling, though, that this is a class universal that falls out from an assumption about how individual grammars are specified --- once again claims are couched in terms of the formalism.

      Here are two class universals that I think have a good chance of being true, and neither one requires any mathematics.

      1) The class of natural languages is sparse in the sense that if one takes a given phenomenon --- say morphosyntactic agreement or the PCC --- and considers all logically possible options, at least one of them is not attested. There is always a typological gap.

      2) The class of natural languages is string-separated in the sense that if two grammars generate different sets of well-formed trees, then they also generate different sets of output strings. That is to say, there are no two languages that agree on all grammaticality judgments for all sentences (modulo lexical differences) but assign different structures to at least one of those sentences.

      Quick excursus: The second universal is particularly interesting because it could have important implications for learnability: Alex C has a paper on learning tree languages that hinges on the assumption that there is a one-to-one mapping from string languages to tree languages, which would be a corollary of string-separation.

      Neither one of the two conjectures requires any kind of mathematical training, they are purely linguistic observations. But as far as I know they have not been put forward in the literature --- at the very least, they were not mentioned in Athens. And I can't resist pointing out once more that these conjectured universals of mine have no direct expression in terms of generative machinery, which I find awfully suspicious.

    2. @Thomas:

      Could you clarify (1). Logically possible with respect to what?

      In the broadest sense, there is a logically possible grammar that does subject-aux inversion with a linear rule, but in a narrower sense, this is a logically impossible grammar (given the assumption/fact that the operations of the grammar manipulate hierarchical objects, not strings).

      So do you mean logically possible in the broadest sense? If so, then I don't see why "these conjectured universals [...] have no direct expression in terms of generative machinery". Take the phenomenon of subject-aux inversion. The possibility of a rule for doing the inverting based on string order is logically possible but unattested. And this can be specified in terms of generative machinery.

      So I assume this is not how you're using 'logically possible'? Or am I missing something else?

      With regard to (2), I agree that this conjecture is likely true, particularly given the fact that it would be optimal for acquisition. But I'm also suspicious that it isn't expressible in terms of generative machinery either. Why couldn't there be a theory of linearization from which (2) follows? In fact, given that (2) is likely true and that it would be optimal for acquisition, it would seem that it is an a priori desideratum for any theory of linearization to derive.

    3. Let me first clarify that my point about expressibility was mostly a psychological one: the generative theories people currently work with have no machinery that is well-suited to talking about these properties (GPSG was a little closer with its Exhaustive Constant Partial Ordering, but that is still a language universal rather than a class universal). Consequently, they fly under the radar. That does not imply that it is impossible to lift generative proposals to this level, but nobody's done that so far.

      Irrespective of that conceptual point I still think that the two conjectured class universals do not fit easily into a generative perspective:

      Regarding (1), I was mostly thinking about morphosyntax there. For the PCC, for example, there are 64 logically possible variants: every combination of IO and DO with distinct features is either licit or illicit, so we have 6 points that can distinguish systems and two possible values per point, giving us 2^6 = 64 (IO-DO combinations with the same feature are often factored out of the PCC; otherwise there would be 2^9 = 512 variants). For islands, there are at least four possible types (+/- extraction of arguments, +/- extraction of adjuncts), but we never find [- extraction of arguments], [+ extraction of adjuncts].
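      Since the numbers in discussions like this tend to get garbled, here is the counting argument spelled out in a few lines of Python (pure arithmetic, assuming nothing beyond a three-way person split on the surface):

```python
from itertools import product

persons = [1, 2, 3]

# Ordered IO/DO combinations with distinct person values:
distinct = [(io, do) for io in persons for do in persons if io != do]
# len(distinct) == 6

# A PCC variant marks each of the 6 combinations as licit or illicit:
variants = list(product([True, False], repeat=len(distinct)))
# len(variants) == 2**6 == 64

# If same-person combinations are not factored out, there are 9 points:
all_pairs = [(io, do) for io in persons for do in persons]
# 2**len(all_pairs) == 2**9 == 512
```

      Of these 64 variants only a handful are attested, and that discrepancy is the sparseness at issue.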

      Each individual gap can be encoded in terms of structure universals, but quantifying over all constructions and stating that there is a gap for each one of them is not something the grammar can do. If you had a very profound theory of grammar where all phenomena are connected by a single abstract mechanism and then put a restriction on that mechanism, then you would be good. But Minimalism is not at that point: the machinery used for the PCC is different from the one for islands, which is different from the one for, say, Bobaljik's *A-B-A generalization for comparatives and superlatives. And there are good reasons for using different mechanisms --- Minimalists try to model the full phenomenon rather than simply accounting for the fact that a gap exists, and since these phenomena all behave very differently, you end up with very different mechanisms.

      For (2), linearization will not help you given current assumptions about the rest of the system. Take a simple case like "John and Peter or Mary", where two different structures are mapped to the same string. So that is a point of conflation your linearization algorithm needs to allow for. Yet nothing natural in Minimalism prevents a language where "and" is always a stronger binder than "or" --- you have to allow subcategorization features to select for specific heads (laugh at vs. *laugh to), so a language could require "or" to select a coordination phrase only if its head is "or", thus indirectly promoting "and" to a stronger operator. So then we have two languages with distinct tree languages but identical string languages.
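      The conflation point can be made concrete with a minimal sketch (trees as nested tuples; nothing Minimalist is assumed here, only that linearization reads off the leaves in order):

```python
def yield_of(tree):
    """Flatten a nested-tuple tree into its tuple of leaf words."""
    if isinstance(tree, str):
        return (tree,)
    return tuple(word for child in tree for word in yield_of(child))

# Two parses of the coordination example:
t1 = (("John", "and", "Peter"), "or", "Mary")  # [[John and Peter] or Mary]
t2 = ("John", "and", ("Peter", "or", "Mary"))  # [John and [Peter or Mary]]

assert t1 != t2                      # distinct trees...
assert yield_of(t1) == yield_of(t2)  # ...mapped to the same string
```

      A grammar that rules out one of the two parses would thus differ from English in its tree language but not in its string language, contradicting (2).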

      It's also important to keep in mind that if you emulate class universals through the grammar as you suggest (which is possible for some, but not all), you pick up extra assumptions as you have to propose a specific encoding that fits into the machinery of your grammar and derives the universal. These are assumptions that you need to get the story off the ground, and as you probably know by now, I like to keep assumptions minimal. Which takes us back to the granularity discussion that prompted my spiel about class universals in the first place.

    4. @Thomas

      I generally agree with your overall take on things regarding the state of the field's current interests. If I were asked to rephrase what you are saying (I know nobody is asking, hence the 'would'), I would say that GGers are currently more interested in finding more mid-level generalizations (MLGs) and/or refining the ones that we have through cross-linguistic investigation. What the field seems less interested in, despite the minimalist hoopla, is explaining why the MLGs we have are the ones we have and not others. This latter can be rephrased as an interest in why some perfectly logical dogs fail to bark, ever.

      Note that this kind of question is precisely what MLGs were designed for. Why, though unbounded movement is possible, is it not possible out of islands? Answer: Bounding Theory. Why bounding theory and why these bounding nodes? On the agenda, but not pressing. Why? The best answer I have been able to come up with is that the question is premature until we've nailed down the right MLGs. I personally think that this is a bad research idea, but I think that many of my wonderful colleagues think otherwise. They are very skeptical that we have the right tiger by the tail.

      That said, people have been trying to explain why certain things are not possible. So Bobaljik both identifies the *ABA pattern and offers an explanation in terms of the elsewhere condition on exponence. It comes with a theory of the hierarchical structure of certain words etc., but the whole schmear is intended as an explanation of why things that are missing must be missing. I would say that both of us want a lot more of this kind of theorizing. And what you and I both want is to extend this theorizing to the MLGs themselves, and this requires going to another level of abstraction, for that is the only way to approach the problem. You can't explain why something is missing unless you treat what is missing and what is there as potential instances of the same thing. This means abstraction. Just as we "lost" constructions (and their specificity) with Subjacency, we will lose (abstract away from) some detail with any unification of the MLGs. And right now GGers are loath to lose this specificity. And this reluctance makes it hard to ask your "why is X not there" question when applied to the properties of MLGs.

    5. @Norbert: That's a great way of putting it, much more to the point. It makes you wonder, though, why an interest in finding more MLGs is considered a reason against abstraction. The fact that very few people so far have looked at what kind of generalizations can be stated at those more abstract levels means that there are still many low-hanging fruits out there. If 100 linguists started mining that level for generalizations, I'm sure we would find a lot of interesting stuff very quickly. And I think the generalizations would still i) be empirically insightful and ii) reveal broad properties of the technical machinery. Isn't that exactly what MLGs are about?

    6. This comment has been removed by the author.

    7. Well, you can guess my answer to your rhetorical question. As I said in Athens, MLGs are not an end in themselves but a way station to understanding the structure of FL. I think that this has been largely forgotten/disputed in recent years. There is a kind of Greenbergian overlay to our understanding of MLGs, if not in the technical detail, then in obscuring their scientific significance. But this is an old song that I've been singing for a while, and though I am sure that everyone values another greatest hits volume, I will refrain from going into a full-blown aria.

    8. This comment has been removed by the author.

    9. @Thomas: I'm confused about something. You mention the PCC as one example of how the class of natural languages is "sparse" (i.e., there are many logically possible PCC-like effects that one never sees). But then in describing this sparseness, you make the assumption that there are 3 person categories, an assumption that is – and I'm being charitable here – rapidly falling out of favor.

      [An aside: I'm assuming, in the paragraph above, that I understood your math correctly. So let me spell it out just to make sure. I understood the "6" in 2^6 to be 3!, i.e., the number of ordered pairs of person features without repetitions (assuming a 3-person system), and thus contrasted with 2^9, where "9" is 3^2, i.e., the number of ordered pairs of person features with repetitions (again, assuming a 3-person system). If this is wrong, then you, and anyone else reading this, can probably skip what follows.]

      I would say that the reason the PCC looks sparse from this vantage point is because the space of possibilities has been mischaracterized (this goes back to Adam Liter's point, above). In particular, the treatment of 'person' as a trivalent feature has artificially inflated the space, and so it looks like it's sparse.

      A more accurate feature-geometry for person features gives rise to a space of PCC-like possibilities that is a much more snug fit for actual, attested PCC effects (see, e.g., Bejar & Rezac 2003, 2009, Harley & Ritter 2002, McGinnis 2005, Nevins 2007, for discussion; I don't mean to imply that all these authors agree with one another or that all the relevant problems have been solved, but I think these are good examples of the relevant kind of proposals).

      At this juncture, I imagine you could retort that this is exactly what you are talking about: that the class of natural languages is sparse (in the aforementioned sense), but that by tailoring a specific solution to the PCC sparseness issue, we have missed the sparseness forest for this particular tree. To that I would respond as follows: this would be true if the feature geometries in question had going for them only their snug fit to the PCC; but crucially, the genesis of these feature geometries in generative grammar was not for anything syntactic, but for modeling the pronoun inventories of different languages (Harley & Ritter 2002). Note that focusing on properties of languages "modulo lexical differences" is a recipe for missing the very kind of data that set this particular strand of research in motion.

      From where I sit, this PCC story reaffirms what Adam said: the PCC looks "sparse" only under a misconception of what person features are like (that they consist of "1st person", "2nd person", and "3rd person", as we were taught in school). So as far as I can tell, the PCC is not a data point for the sparseness, or lack thereof, of the class of natural languages.

    10. @Omer: I intended the math to be computed over surface forms, not the representations used by the grammar. Irrespective of whether language actually has three person features, languages definitely have three morphologically distinct forms according to person. So you can define 2^6 different PCC variants according to how those surface forms may combine (just imagine a 3x3 table, with the diagonal blacked out, and each non-black cell can have a checkmark or a star). That is the sparseness.

      If you have a story in terms of a different feature system (which you do), that is perfectly fine and a good step towards understanding what is going on, but it doesn't change that surface sparseness holds. Your strategy is not too different from what I would do approaching the very same issue from a mathematical perspective: by default, we should expect free variation because that's the simplest story, so if that doesn't hold, that means the space within which we allow free variation is too big and we need to find an appealing way of narrowing it down. For you it's related to what kind of person features the grammar uses, for me it's due to an algebraic property linked to Zwicky's person hierarchy. But whatever the right story might be, the surprising thing is that language obviously has the means to encode a three-way surface split for person, but cannot capitalize on that distinction in other domains. That's the intuition behind sparseness, and I admit that I stated (1) way too sloppily to make that point clear (sometimes it's hard to resist the lure of handwaving).

    11. @Thomas: I don't think surfacism rescues the supposition that person is in any sense trivalent. You say, "languages definitely have three morphologically distinct forms according to person." But that isn't true in any general sense that I am aware of. Languages with a first person exclusive vs. inclusive distinction have four; and, depending on your view of obviative, Algonquian languages might have five. You might choose to exclude such cases on the grounds of a prior theoretic commitment to what does and doesn't count as a person category, but then you can no longer claim that "the math [is] computed over surface forms" (emphasis added).

      So I maintain that the 3x3 table is a construct in any case, and all the PCC teaches us is that it is the wrong construct.

    12. @Omer: Now it's my turn to be confused. That some languages have more than three morphological distinctions with respect to person makes the problem worse, assuming that at least one of them has something like a PCC. The fact of the matter is that there are at least 60 PCC variants I could describe right now that are not attested (even more if we extend the number of surface oppositions to account for these other languages). That's a fact that is completely independent of why they don't exist, and that fact is an instance of sparseness. Just like we do not find all conceivable types of islands (3 out of at least 4) or comparative-superlative systems (due to *ABA), nor symmetric variants of established constraints (e.g. a Principle A' where the reflexive must c-command its antecedent). The slightest alteration of a natural language system has a very high chance of yielding an unattested system. That's a remarkable fact that is anything but a given (for instance, you can run pretty wild within the class of CFLs before you're pushed past its boundaries), and arguing about feature representations is simply missing this very point.

      But here's another, much more narrow attempt at formulating sparseness, maybe that one is less contentious:

      (1') Let M be the class of languages that can be generated by a Minimalist grammar modulo substantive universals. Then most members of M are not natural languages.

      This is still ill-defined because M is infinite, so it makes no sense to talk about most members. And the modulo part is too permissive because it includes grammars with 100 person features, which we are not interested in. But maybe the intuition is clearer now. Take the grammars that are expressive enough to handle natural languages, allow free variation within that class, and most of the languages you get are not natural languages because pretty much no natural language system shows full free variation.

    13. This comment has been removed by the author.

  2. @Thomas: My intent in bringing up four- and five-person systems was not to suggest that we increase the N in your NxN table from N=3, to N=4 or N=5. Quite the opposite: it was supposed to be yet another reason why one should be skeptical of a flat, N-ary structure for person features, in the first place. After all, what is special about the numbers 3, 4, and 5, that's not also shared by 2, 6, or 7?

    (For a more insightful analysis of how, e.g., four-person systems differ from three-person ones, see in particular McGinnis 2005 in Language.)

    But your larger point, concerning how (1') differs from, e.g., properties of context-free languages, is well taken. I would venture, btw, that most practicing generativists understand this on an intuitive level, which is one possible reason why we are skeptical that formal language theory will tell us much about natural language at this juncture; I certainly feel that way (though, of course, intuitive skepticism does not an argument make). I'm not talking about formalized Minimalist Grammars here, but about debates like "how is the mild-context-sensitivity of natural language best characterized?" and the like.

    1. @Omer: The status of formal language theory is rather tangential to the sparseness debate. I used CFLs as a convenient example, but sparseness amounts to more than simply "language is weird from a mathematical perspective". Here's yet another way of looking at it (which is not exactly the same claim as (1'), which is not the same claim as (1); more research is needed to figure out which formulation is closest to the truth).

      Suppose we take all (formal) languages and try to arrange them on a table according to a surface metric. Let's take a metric closely related to the PCC. There are some languages that freely allow DO-IO clitic combinations, and some that do not allow any. Those should be at the opposite corners of the table, let's call them corner F for free and corner B for blocked. The languages between them are arranged in such a fashion that the closer a language is to corner B, the fewer clitic combinations are allowed. Now if I ask you to draw a single circle around all the natural languages on the table, and only those, you can't do it. That's because the way we have arranged the languages by surface distance, it looks more like a Swiss cheese with many non-natural holes between any two natural languages.

      In slightly more technical terms: It holds for every surface metric M that if the class of all languages is arranged according to M, then the convex hull of the subclass of natural languages also includes some non-natural languages. Homework exercise: define surface metric ;)
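      For concreteness, here is a toy sketch of that setup in Python. Everything in it is a simplifying assumption: a "language" is reduced to the set of DO-IO clitic combinations it allows, the surface metric is just the number of allowed combinations (i.e., distance from corner B), and the "attested" patterns are invented placeholders rather than real typology.

```python
from itertools import combinations, product

# Cells: (DO person, IO person) pairs over a three-person inventory,
# with same-person combinations set aside (a simplifying assumption).
cells = [(do, io) for do, io in product([1, 2, 3], repeat=2) if do != io]

# All logically possible clitic-combination patterns: one per subset.
all_langs = [frozenset(s) for r in range(len(cells) + 1)
             for s in combinations(cells, r)]

# Hypothetical "attested" patterns: corner F (free), corner B (blocked),
# plus two stand-ins for intermediate PCC variants (NOT real typology).
F = frozenset(cells)
B = frozenset()
attested = {F, B,
            frozenset(c for c in cells if c[1] == 3),   # placeholder
            frozenset(c for c in cells if c[0] != 3)}   # placeholder

# Crude surface metric: distance from corner B = number of allowed combos.
metric = len

# The "convex hull" along this metric: every language whose metric value
# falls between the attested minimum and maximum.
lo, hi = min(map(metric, attested)), max(map(metric, attested))
hull = [L for L in all_langs if lo <= metric(L) <= hi]

print(len(all_langs), len(hull), len(attested))  # 64 64 4
```

      Under this deliberately crude metric every one of the 64 logically possible patterns falls between the two attested corners, so the hull is maximally padded with non-natural languages; a finer metric would shrink the hull, but per the claim above never down to the natural languages alone.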

    2. @Thomas: I would imagine that we should expect the class of possible natural languages to be at least somewhat bigger than the class of attested natural languages, because of historical accident and/or because some possible natural languages are diachronically unstable for whatever reason.

      This is where I think something like artificial language research is potentially useful. Norbert posted a while back about the piece from Jennifer Culbertson and David Adger that showed up in PNAS, which showed that English-speaking participants inferred that the string order in an artificial language with postnominal modifiers was N-Adj-Num-Dem (rather than the unattested N-Dem-Num-Adj), even though they only ever got N-Adj, N-Num, or N-Dem in their training input. (The reason you might expect N-Dem-Num-Adj is because Dem-Num-Adj is the order in English, and the participants were English speakers.) And although it's admittedly not dispositive evidence that the order N-Dem-Num-Adj is impossible, it's probably some of the better evidence we will ever be able to get, since you cannot really inductively prove the nonexistence/impossibility of something (i.e., the fact that N-Dem-Num-Adj is unattested could be due to historical accident and/or because, for whatever reason, it's diachronically unstable).

      So if you can set up an artificial language experiment that biases the participants as much as possible to acquire a particular grammar and they do not acquire that grammar, then I think it's at least minimally suggestive that it is an impossible grammar, whereas if they do acquire that particular grammar, then perhaps that is evidence that it is a possible natural language and is just not attested because of historical accident and/or diachronic instability.

      Anyway, the point being that perhaps artificial language research can help us identify whether our class of definable languages is too big or if it's okay because the unattested languages in our class of definable languages are just not attested because of historical accidents and/or being diachronically unstable.

      In other words, in the ideal world, I think your (1') is false modulo what I said in the first sentence of this post. It might turn out to be true, but, methodologically at the very least, it would seem to me that it is best to start out assuming that it is false. (It also seems to me that if your (1') is true, then a lot of explanatory adequacy of the theory goes out the window.)

      (I assume that when you said "most members of M are not natural languages" you meant not only that they are unattested but also that they are impossible natural languages. If this assumption is wrong, then never mind what I just said.)

      Moreover, I still don't see why things like (1) cannot be specified in terms of generative machinery. You said "Each individual gap can be encoded in terms of structural universals, but quantifying over all constructions and stating that there is a gap for each one of them is not something the grammar can do. If you had a very profound theory of grammar where all phenomena are connected by a single abstract mechanism and then put a restriction on that mechanism, then you would be good. But Minimalism is not at that point, the machinery used for the PCC is different from the one for islands which is different from the one for say, Bobaljik's *A-B-A generalization for comparatives and superlatives".

      [continued ... ]

    3. [continued ... ]

      I don't understand why this is something that the grammar would need to do. Unless I'm missing something, this only follows if there really is a typological gap in the sense that one of the languages in the class of definable languages is impossible as a natural language. But if, as Omer and I have been suggesting, the impossible language isn't definable, then I don't see why this is a problem.

      And it seems like, at least in the case of subject-aux inversion and the PCC, you might be able to change your assumptions about the primitives of the grammar in such a way as to prevent the unattested language from being definable (i.e., the grammar manipulates hierarchical objects, not strings, and there are two person categories, not three, respectively).

    4. @Adam: I did indeed mean the class of possible natural languages furnished by UG, not just the class of attested natural languages. You are right that the former should be assumed to be a fair bit bigger, due to learnability, processing requirements, and diachronic effects carving out specific subsets. You're also right that this raises the issue of how the class of natural languages could even be identified empirically. I also agree that artificial language learning can provide important clues, though I'm a bit more skeptical about what exactly it is those experiments are testing. But that is something that can be worked out through careful experimentation.

      Regarding your second point, I can only repeat what I already said to Omer; the suggestion that these languages are not definable misses the point. The languages are definable in a very natural sense, that's part of what sparseness is all about. The recourse to features doesn't help. It's like saying the natural numbers aren't a special subset of the integers because negative integers can't be written anyways if the minus sign doesn't exist.

      The feature representation story is not without merit, but it does not make the property disappear. It is an instance of what I described above as putting a restriction on the grammar, in this case on the feature system for person. And as such it is a specific restriction that works for this specific phenomenon. It does not predict that we find the same kind of sparseness for other areas like the typology of islands, yet we find it there, too. So for that one you would have to come up with another restriction since to the best of my knowledge nobody has an idea at this point how an assumption about person features could be unified with a restriction on islands.

      What you're left with, then, is decomposing the unified property of sparseness into a list of seemingly unrelated properties of the grammar. That commits you to a lot of technical detail while still leaving open why typological gaps are so pervasive, so this list does not adequately capture the quantificational force of sparseness. Just like you can't reduce second-order logic to first-order logic by equating each set with an individual.

    5. @Thomas: It's not at all "like saying the natural numbers aren't a special subset of the integers"; it's more like saying the natural numbers aren't a special subset of the set of numbers that my 2nd-grade homeroom teacher thought are special. The set of integers is definable independently of facts about the world; the set of possible PCC effects is not. You chose to ground *your* definition of the set of possible PCC effects in a kind of surfacism (the total number of distinct surface forms), which I think is hopeless to begin with – that number is not crosslinguistically stable, and there is no reason to think that reduction-to-the-worst-case (i.e., taking the set of possible PCC effects to be defined over an NxN table where N=the highest number of person distinctions one ever finds in a single language) is a priori privileged over any other approach.

      This is related to another issue: you advocate here for the "cast-a-wide-net" approach to syntax. It seems to me that, once one chooses to cast the net widely (viz. choosing the largest attested N to construct your NxN table), it is rather unremarkable that one finds that the interior of that net is then only sparsely populated.

    6. @Thomas: I think the analogy to numbers doesn't really go through. The class of natural languages is ultimately bounded by facts of the matter in the world whereas the class of numbers is, in principle, as big as you choose to define it to be.

      So yes, it's true that you can define a grammar that does subject-aux inversion with a linear rule, but the fact of the matter is that grammars don't operate over strings and transformations proceed hierarchically not linearly.

      I honestly don't understand why you're resistant to manipulating the primitives of the grammar in such a way so as to rule out the things that are unattested. This is what explanatory adequacy is about, and it seems to me that any good science or inquiry tries to explain why things are the way they are and, more importantly, why they are not some other way. If something like your (1') were true, explanatory adequacy would go out the window.

      If (1') were true, I cannot imagine why it would be the case that the definable class of languages contains languages that are not natural languages. In other words, we wouldn't be able to answer the question of why L1, L2, and L3 are definable but not natural languages whereas L4, L5, and L6 are definable and natural languages. Why L4, L5, and L6 but not L1, L2, and L3?

      I personally think that question has an answer. Or, at the very least, I think it makes most sense methodologically to assume that that question has an answer. And so we can rule out languages like L1, L2, and L3 by changing the primitives in such a way so as to remove L1, L2, and L3 from the class of definable languages.

      (Then, of course, the next question becomes why those primitives and not some others.)

      It seems to me that your resistance to manipulating the primitives of the grammar in such a way so as to rule out those languages that are impossible natural languages stems solely from the fact that doing so would most likely not be able to be done in a unified way. But I actually don't think there's an a priori reason—or an a posteriori reason, for that matter—to think that this should be so. Linguistic facts are presumably not homogeneous. I imagine that language involves many cognitive operations going on in the brain. Maybe there is only one linguistically specific operation (e.g., Merge), but there are presumably still some other operations that happen. And maybe there are more operations than just Merge, too, such as Agree. And presumably there are restrictions on what the atoms that these operations operate over can look like.

      I think ideally we would be able to unify many of the typological gaps in terms of these different things at stake, but I don't see any reason to think that every single typological gap would have to follow from the exact same aspect of the linguistic system.

      Here's an analogy that I think makes more sense than your number analogy given that natural languages are ultimately objects of the world (even if you define them computationally/mathematically): it's true that we don't have a horned horse (unicorn), and it's also true that we don't have a horned dog. So we have some of what you're calling sparseness. Should we therefore think that the reason for both of these typological gaps is the same reason?

    7. Ah, looks like Omer beat me to posting largely the same point. =]

    8. @Omer: That the number isn't cross-linguistically stable isn't much of an issue, because even if you restrict your attention to just the subclass of natural languages with exactly n surface distinctions for person, sparseness still holds. And you don't even have to do that under the metric approach I outlined above, where all you do is line up languages between "everything goes" and "nothing goes".

      We can actually turn this whole discussion around by asking instead what kind of subclass one has to focus on so that sparseness doesn't hold. Your answer would be to look only at languages with clitic combinations that can be regulated in a system with only two person features (under a specific set of assumptions about how the grammar operates). It might be true that then we don't have sparseness with respect to the PCC, but the class will still be sparse with respect to many other processes and constraints. And once you've narrowed down the class enough that sparseness no longer holds, you have filtered out the great "majority" of this infinite set --- a little bit like filtering out all integers that aren't prime. This outcome is not a logical necessity, and as such I do find it very interesting.

      Note that this doesn't even follow in any trivial sense from positing UG --- MGs incorporate a fair share of universals (assumptions about Merge, Move, the feature calculus, the output structures, linearization) but they are not sparse under a large variety of surface metrics.

    9. @Adam: The analogy is apt. The class of all tree languages is not a natural object either, and that's one possible frame against which sparseness of natural languages can be evaluated and seen to hold. Also, I think we can agree that the numbers 1, 2, and 3 are cognitively real in a certain sense and the set of those three numbers is narrowly restricted, so just take the number example above and substitute "{1,2,3}" for the set of natural numbers. You'll need a more elaborate version of the minus part of the statement, but that doesn't really change the point: we're talking about a property that language satisfies, a very special property that differentiates it from what language could look like in a different reality but simply doesn't.

      It is also a property that should be stated at the level of classes because this is where it is observed. I don't understand why you think that means explanatory adequacy has to go out the window. First of all, explanatory adequacy does not imply that one needs a grammar-based story. Second, I have said several times now that this can in principle be reduced to an abstract property of grammar, just like CFL's closure under reversal can be traced back to properties of CFGs. However, that does not mean that one should state this property in terms of CFGs (for example because CFGs are just one representation of CFLs, and because the statement is less elegant at that level).

      More importantly: at this point we don't have a single abstract property that could provide an explanation for the full breadth of sparseness, and as soon as you break it up into many distinct properties you i) have to explain why language has all these properties, and ii) you no longer have a succinct generalization.

      What I really don't get is that you are effectively arguing for what I said in the first place: some generalizations at the class level may break down into a non-unified story at the level of grammar with several seemingly unrelated factors. I take that as an argument for class-level generalizations, just like we don't state binding theory at the level of neural activation patterns because there seems to be no unified story there at this point.

    10. @Thomas:

      I think you're right that there is much less disagreement here than it seems; I had that sense myself when writing the last comment. However, I do still think there is some disagreement. Let me see if I can backtrack and clarify the points of disagreement.

      We both agree that there are languages that are logically possible—logically possible in the broadest sense (i.e., languages that are internally consistent and coherent)—but are nonetheless not natural languages. For the purposes of this comment, I'll call these 'typological gaps'.

      In your first comment, you said that you don't think these typological gaps are expressible in terms of generative machinery (to which Omer and I both objected). After further clarification, this does not seem to be what you meant. Rather, I now take you to have meant that what is not expressible in terms of generative machinery is not any particular typological gap but instead the fact that, in general, there always seems to be a typological gap.

      If this reconstruction is correct, then this is the first point of disagreement between us. As I said above, I see no reason to think that the fact that there always seems to be a typological gap should be a property that our theory tries to capture. This is what the animal analogy was meant to show. While it could be the case that our lack of horned horses and our lack of horned dogs are due to the same reason, there is no a priori reason to suggest that it must be the same reason (and, in fact, there might be a posteriori reasons to suggest that it should not be the same reason).

      Likewise with linguistic phenomena. There is no a priori reason that I can think of as to why we should expect the lack of linear transformation rules and the lack of certain PCC configurations to follow from the same property of language. It's true that it might be the case, and it would be nice if we could unify a decent number of phenomena (cf. On Wh-Movement), but I don't expect that every linguistic phenomenon actually derives from the same fact about natural language.

      Another place where we agree, however, is that the class of definable languages is going to be bigger than the class of natural languages. We both think the class of definable languages is bigger than the class of natural languages to some trivial extent because of historical accident, diachronic instability, processing constraints, learning constraints, et cetera.

      This, however, is where we part company again. Given your (tentative?) commitment to (1'), you seem to think that the class of definable languages is even bigger still because there are definable languages that are impossible natural languages. As I said above, if this is true, then I think it renders the question of why L1, L2, L3, L4, L5, and L6 are all definable languages but only L4, L5, and L6 are possible natural languages unanswerable. That is, if (1') is true, then I do not think there is a rational answer to that question.

      Moreover, if one is a committed Rationalist, I think one must be committed to there being a rational answer to that question. Being a committed Rationalist, I take this as a rather convincing reductio against (1').

      In other words, it's true when you say that "at this point we don't have a single abstract property that could provide an explanation for the full breadth of sparseness, and as soon as you break it up into many distinct properties you [...] have to explain why language has all these properties", but I would much rather be put in the position of having to explain a few distinct properties—and hopefully it is only a few because hopefully we can unify a good number of them—than be put in the position of not being able to provide a rational answer to the question of why L1, L2, L3, L4, L5, and L6 are all definable languages but only L4, L5, and L6 are possible natural languages.

      [... continued ]

    11. [... continued ]

      So I think those are the two main points of disagreement, although there is definitely some agreement here, too. Having attempted to reconstruct and clarify the points of disagreement, I would like to say one more thing, too. And I would also like to preface it with a disclaimer. :p

      So, disclaimer: I debated for a good while about omitting this part of my comment, as I don't see much point in getting bogged down in analogies since, after all, analogies are not arguments, and they all ultimately break down somewhere. However, I think there is something in the number analogy that might be indicative of some of our disagreement that I just tried to clarify. So onward! :)

      You said "Also, I think we can agree that the numbers 1, 2, and 3 are cognitively real in a certain sense [...]". No, I don't agree, actually, at least not in the way that I think you had in mind. I don't think numbers have any reality. The only thing I think that has any reality is our lexical entries for them and our concepts of them. (God knows what concepts are ...).

      So I still don't think the analogy is apt. Now, my rejection of abstract objects having any reality is—based on my very surfacey familiarity with the philosophical metaphysics literature—admittedly relatively unpopular. However, I don't think one needs to be committed to the nonexistence of abstract objects to show that this analogy is still not apt.

      I think it can be shown using the joke about mathematicians and linguists ("Can you get 9 as prime? I think I can."). We can both intuit the fact that *Who did John punch Fred after kissed Bill is bad, but neither of us could intuit that 492,876,866 is not a prime. It's true that we could both have intuitions about whether it is prime or not, but those intuitions would not have any evidentiary status in determining whether it is actually prime or not. This, I think, shows that the cognitive reality of—i.e., any given individual's concept of—492,876,866 need not line up with the actual properties of 492,876,866. Likewise with 1, 2, and 3, albeit it is much more likely that the cognitive realities (concepts) for these numbers will line up with the properties of the actual numbers since most people are relatively familiar with the properties of 1, 2, and 3.

      So, recoursing to features to explain the nonexistence of certain PCC configurations is not "like saying the natural numbers aren't a special subset of the integers because negative integers can't be written anyways if the minus sign doesn't exist".

      It's quite possible that the third person feature (~ "minus sign") actually doesn't exist as part of the language faculty in speaker's brains; defining and formalizing the third person feature (~ "minus sign") cannot alter that reality and make it exist as part of the language faculty in someone's brain. On the other hand, defining and formalizing the minus sign can make it "exist" even if there is somebody out there who has no concepts in their brain for the minus sign and negative integers.

      Let me end by saying thanks for pursuing this discussion as far as you have. It's been enjoyable and has helped clarify my thinking on some of these issues. Thanks! :)

    12. "In slightly more technical terms: It holds for every surface metric M that if the class of all languages is arranged according to M, then the convex hull of the subclass of natural languages also includes some non-natural languages."

      There is something I don't get in this statement. On the one hand, you seem to be specifying languages through minimalist operations (otherwise, what would be the meaning of "all languages"?). On the other, you then apply to them a "surface metric." But a surface metric which is well-defined on the whole space of languages is not surface, by definition. So it seems to me you are implicitly assuming that your surface metric derives from intrinsic properties of all languages in the space you are considering (personhood in your example of the PCC), but then Adam's and Omer's counter-argument applies: sparseness could either be intrinsic (genuine sparseness) or come from a mistaken notion of the correct range of variation to consider (artificial sparseness), and the second hypothesis seems to me to be overwhelmingly more probable (for the usual biological reasons).

      To pursue the numerical analogy, the two phenomena correspond respectively to composite integers being of density 1 (not sparse at all) in the smallest ring generated by 1 but of density zero (very sparse) in the smallest field generated by 1, with the field structure then being the mistakenly over-general range of variation allowed (and by the way Adam, many people, including me, have a very strong intuition that 492,876,866 is not a prime ;-)).

    13. @Olivier ... Hahaha, good point. Thanks for pointing that out. I don't know why I didn't catch that. I just picked a big prime and increased it by a few ... :p

      Well, hopefully everyone is feeling charitable enough to replace it with some more suitable example ...
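      For what it's worth, a throwaway trial-division script (obviously no part of the linguistic argument) confirms the slip and hunts upward for a nearby number that actually is prime:

```python
# Naive trial division; fine at this magnitude, though far too slow
# for numbers that are much larger.
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

print(is_prime(492_876_866))  # False: the number is even, hence composite

# Search upward for a more suitable (genuinely prime) substitute.
n = 492_876_866 + 1
while not is_prime(n):
    n += 1
print(n)
```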

    14. @Olivier and pretty much everyone else: The notion of "all languages" is indeed what seems to be the crux of the whole discussion. There is of course a mathematical notion, which would simply be the class of all subsets of Sigma* (this allows for strings as well as trees, depending on whether Sigma is a ranked alphabet). But that is not an interesting class because even the recursively enumerable languages are sparse with respect to that class. So what do I have in mind instead? It's basically the closure of the class of natural languages under "naive typological completion". In the case of the PCC, take any PCC table and look at the tables you can get by randomly switching grammaticality values: 64 instead of the attested 4 (for languages with a three-way person split in morphology). For islands, look at what kind of islands you could describe with the current argument/adjunct split: 4 instead of the attested 3. The claim is that if you do that, you'll see that the class of natural languages is sparse.
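      The arithmetic behind "naive typological completion" can be spelled out in a couple of lines; the cell counts (six PCC cells for a three-way person split, two extraction categories for islands) are the ones from the paragraph above:

```python
from itertools import product

# PCC: six DO-IO cells for a three-way person split (same-person
# combinations set aside); flipping each cell's grammaticality value
# independently yields every logically possible table.
pcc_tables = list(product([True, False], repeat=6))
print(len(pcc_tables))    # 64 tables, versus the attested 4

# Islands under a bare argument/adjunct split: each category can
# independently allow or block extraction.
island_types = list(product([True, False], repeat=2))
print(len(island_types))  # 4 types, versus the attested 3
```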

      With a different reference class, you won't necessarily see sparseness. For instance, the class of natural languages, by definition, is not sparse with respect to the class of natural languages. Just like prime numbers are sparse within the set of natural numbers but not within the set {1,2,3}. That doesn't change the fact, though, that sparseness with respect to the class I picked as a measuring rod is an interesting property. It shows us what language is not like even though the reference class is intuitively very natural.
      In particular, it raises the question of why this class isn't closed under naive typological completion.

      That's why I contend that Omer's and Adam's responses miss the point: if you redefine the reference class, you are no longer looking at the same property, nor at the same question. So why did I pick that specific reference class? Because it is sufficiently challenging in the sense that most formalisms fail to predict sparseness, yet at the same time it is fairly light on theory. You do not talk about person features, you just talk about distinct surface forms that express person. We can have endless fights about the feature representation, whereas there is wide consensus that Spanish has a morphological pronoun paradigm with a three-way split according to person. Similarly, we do not need to go into the technical details of islands, we just observe that there are constructions that put restrictions on arguments and/or adjuncts with respect to extraction (and the argument/adjunct distinction is fairly well established).

    15. The second issue is whether sparseness can be accounted for at the level of grammars. So this is basically going one step further and promoting sparseness from a useful tool of inquiry to a property of language that needs to be accounted for, and how we should go about that. It seems that this is what most of Adam's objections are directed at, though I still don't quite understand them.

      What I said about accounting for sparseness strikes me as fairly innocent: Sparseness cannot be stated with Minimalist machinery, it is a higher-order property. So it can only indirectly be explained at the level of grammars; in particular, it might not form a unified problem at that level. Unification is a nice thing, though, so if there is a class-level explanation, I would find that more appealing. What's more, a class-level explanation is not tied to a specific formalism (that was my point about CFLs vs CFGs), which also makes it more general.

      Curiously absent from the discussion is a detailed argument as to why there can't be a class-level explanation or why we shouldn't try to think along these lines. Adam has two things to say on that:

      1) Depending on how we conceptualize the class of natural languages, this question might not even be well-defined. That is a valid point, but I think with respect to sparseness that is an advantage. For example, the reference class above is actually very close to the class MGL of MG-definable languages. So the first question you can ask is: let the class of natural languages be the members of MGL that are learnable in paradigm X; do we already have sparseness? Then we can narrow it down another step: let's pick the learnable MGL's as the new reference set and let's only consider those members that are efficiently parseable. Do we have sparseness? And so on. The point is, we can operationalize the universal into a procedure for teasing apart the typological contribution of the separate components that jointly carve out a smaller and smaller space within which we find all attested natural languages.

      2) Adam's second point is that we shouldn't expect such an explanation to exist. 100% true, but that's no reason not to shoot for it. It is the stronger, more general option, so we shouldn't discard it just because it's outside the standard line of generative reasoning.
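      The step-by-step narrowing in point 1 could be operationalized along these lines; the "languages", the attested subset, and both filters below are pure placeholders, since actual learnability and parseability tests for MGL are themselves open research problems:

```python
# Schematic narrowing procedure: start from a reference class, apply
# one hypothesized filter at a time, and check at each step how large
# a fraction of what remains is attested (sparseness = tiny fraction).
def narrow(reference, attested, filters):
    current = set(reference)
    for name, keep in filters:
        current = {lang for lang in current if keep(lang)}
        ratio = len(attested & current) / len(current)
        print(f"after '{name}': {len(current)} languages, "
              f"attested fraction {ratio:.3f}")
    return current

# Toy instantiation: "languages" are the integers 0..999, the attested
# ones a small arbitrary subset, and the filters stand-ins for
# learnability and efficient parseability.
reference = range(1000)
attested = {6, 12, 24, 48}
filters = [("learnable", lambda lang: lang % 2 == 0),
           ("parseable", lambda lang: lang % 3 == 0)]
remaining = narrow(reference, attested, filters)
```

      Each filter shrinks the reference class; the empirically interesting question is how close the attested fraction gets to 1 before the class collapses onto the attested languages themselves.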

    16. @Thomas: As long as you continue to assume an a priori privileged status for surfacism (e.g. "We can have endless fights about the feature representation, whereas there is wide consensus that Spanish has a morphological pronoun paradigm with a three-way split according to person"), I think you and I will continue to disagree.

      If modern linguistics has taught us anything, it is that taking surface properties at face value is probably the wrong way to go. From my vantage point, things continue to look as follows: you are choosing an arbitrary representation for person features (that it is a surfacist representation makes it no less arbitrary, esp. in light of the aforementioned lessons of modern linguistics), and declaring that in light of that arbitrary representation, things look sparse. I still don't see why that's an interesting result.

      This is not to say that class-level sparseness cannot exist; I am taking issue with the particular claim that the PCC is at all relevant here.

    17. @Omer: "If modern linguistics has taught us anything, it is that taking surface properties at face value is probably the wrong way to go." But that's exactly the point! Sparseness is an attempt to take that observation, generalize it, and highlight that this is, first, a particular property that is anything but a logical necessity, and second, much more pervasive than one would expect based on pure chance.

    18. @Thomas:

      You said "Curiously absent from the discussion is a detailed argument as to why there can't be a class-level explanation or why we shouldn't try to think along these lines".

      The reductio argument that I gave was an attempt to provide a reason as to why we ought not to think along these lines. I think there has been some useful terminology introduced by you and Olivier that might allow me to try recapitulating this point one more time.

      You pointed out that the "the class of natural languages, by definition, is not sparse with respect to the class of natural languages" and you contrast this with what you call a "reference class", suggesting that the class of natural languages will be sparse with respect to the reference class.

      The point that Omer and I are making, I believe, is that we see no point in defining such a reference class. By definition, if you define it in such a way so as to capture sparseness, the reference class will be too big. And, in particular, you will be faced with the question of why you can define L1, L2, L3, L4, L5, and L6 but only L4, L5, and L6 are natural languages. Again, I don't think this is something that can have a rational answer, at least not if you're committed to the primitives that you used to define the reference class—or some notational variant thereof—being the primitives that underlie natural language. Based on my understanding of your tentative commitment to (1'), I understood you as being committed to whatever primitives defined the reference class also being the primitives of natural language.

      And so, given my Rationalist commitments, I think there must necessarily be a rational answer to the question of why we have some natural languages but not other logically possible natural languages and therefore take this as a reason to reject the primitives that have defined the reference class as being the primitives that define the class of natural languages.

      To put this in terms that Olivier used, I think the rational assumption must be that the sparseness is artificial, not genuine.

      This quote from Chomsky's Language and Problems of Knowledge is, I think, perfectly appropriate (and I hope you don't take offense to me quoting this; Chomsky is being a bit dismissive here ... :p ).

      "The status of P-linguistics, or of the study of E-language generally, is quite different. Thus the advocates of P-linguistics have to demonstrate that in addition to the real entities C-English, C-Japanese, etc., and the real mind/brains of their speakers, there are other Platonic objects that they choose to delineate somehow and study. Whatever the merits of this claim, we may simply put the matter aside, noting that people may study whatever abstract object they construct".

      [... continued]

    19. [... continued]

      Now, it could be the case that we've misunderstood you and that you're not committed to the primitives that define the reference class being the same primitives that define the class of natural languages. I believe Olivier also interpreted you as being committed to this, which is why I believe he said "On the one hand, you seem to be specifying languages through minimalist operations (otherwise, what would be the meaning of "all languages"?)".

      But, at any rate, even if you are not committed to this and we've misunderstood you, then I still don't think this would be a good line of thinking to pursue. You suggest a methodology that we might pursue in your latest (1): "For example, the reference class above is actually very close to the class MGL of MG-definable languages. So the first question you can ask is: let the class of natural languages be the members of MGL that are learnable in paradigm X; do we already have sparseness? Then we can narrow it down another step: let's pick the learnable MGL's as the new reference set and let's only consider those members that are efficiently parseable. Do we have sparseness? And so on."

      This procedure is, I think, potentially dangerous. In order for it to work, I think it requires that we know the extension of the class of natural languages. And since we all seem to agree that the set of attested natural languages is going to be smaller than the class of natural languages, I'm not sure how we could ever be confident that we have identified the extension of the class of natural languages.

      I think we would need to know this because we would first need to know that the reference class we have defined contains the class of natural languages. If we don't know this but only think we know this, then we might narrow things down until we have no sparseness left and think we've got it right, but we might actually have identified something smaller than the actual class of natural languages because the reference class never contained the entire class of natural languages to begin with.

      And then, if we were never committed to the primitives that defined the reference class to begin with, we would, I presume, at this point go ahead and try to define the primitives in such a way so that they just generate the class of languages that we have identified by following your procedure. But, if we were wrong about the extension of the class of natural languages to begin with, then we would be trying to define a set of primitives that will ultimately rule out some languages that shouldn't be ruled out.

      I'm not sure if that helped clarify anything or not. Hopefully it did a bit. But anyway, thanks again for the discussion! And I hope you didn't take offense to me quoting Chomsky at you! :) This has been enjoyable and intellectually stimulating, so thank you for all of your thoughtful responses.

    20. Let me chip in; I think there is a good argument here. Back off for a moment to the good old competence/performance distinction. So it is uncontroversial (famous last words) that the set of grammatical sentences is going to be a lot bigger than the set of acceptable sentences, if only because the former is infinite and the latter is finite. Even if you just consider sentences of length 20, if you sample uniformly from the set of grammatical sentences of this length (taking some plausible generative grammar of English with full vocabulary) you will find that only a tiny fraction are acceptable, just because, under current assumptions, it doesn't make sense to put various factors that affect acceptability (e.g. semantic coherence) into the grammar.

      Similarly, we have the small finite set of attested languages (ANL), and some larger set of grammars that are licensed by UG (UGNL). Here are some properties that the ANL have: all languages have a word for the sun; the shortest utterance in every language takes less than 10 seconds to say; all languages have more than 10 lexical items. Now, I don't think anyone is proposing that these are constraints on UG, are they?
      More interestingly, I think that learnability hinges on some properties of string languages that cannot be expressed in UG terms. And I mean "cannot" quite literally, as the properties may be undecidable in general (some technical caveats here). So it seems inevitable that there is going to be some class of "possible human languages" -- i.e. ones that we might actually find people speaking under certain circumstances -- that is much larger than ANL but much smaller than UGNL, indeed sparse with respect to UGNL, though that is a bit too vague for my liking.

      I think you have to be a lot more precise about what you mean by the class of natural languages to make the arguments clear.

    21. @Alex: Thanks for the suggested concepts (though I protest the creation of new acronyms!). I think these are helpful, and I agree that distinguishing between the set of attested natural languages, the set of possible human languages, and the set of UG-definable languages makes sense.

      You say that we should expect the set of possible human languages to be sparse with respect to the set of UG-definable languages, and I think that makes sense. To use Olivier's terms, I agree that there might be some genuine sparseness. But I still think a lot of sparseness is going to be artificial. And as far as I can tell, Thomas is trying to capture all sparseness regardless of whether it is genuine or artificial.

      I think what your proposed conceptual distinctions suggest is that, in order for us to take seriously the proposal that we have genuine sparseness between the set of possible human languages and the set of UG-definable languages, we would need at least a sketch of an account as to why the languages that are in the set of UG-definable languages but not the set of possible human languages are not learnable (and thus not in the set of possible human languages).

      So unless there are extant arguments that are at least semi-convincing that all of the sparseness phenomena that Thomas is trying to unify are instances of genuine sparseness, then I think what Thomas's proposed approach will do is lead to us defining a reference class that is ultimately much bigger than the set of UG-definable languages. And then it's not clear to me what procedure we could use to narrow things down in such a way so as to ensure that we ultimately reach the set of UG-definable languages (given what I said in my last comment).

    22. This will be my last comment on this for a while since I have a transatlantic flight tomorrow and will probably take the weekend to recuperate from that. But a few remarks:

      1) No, I do not take the primitives of the reference class to be the primitives of the class of natural languages. There isn't any need to even posit primitives for sparseness to hold. You define two language classes, compare them, and notice that one has a very different structure from the other. For this basic observation it doesn't matter what primitives you assume. Just like I can define the class of regular languages without picking a particular generation model. And that's also why this is a class-level universal: I can meaningfully talk about it without ever talking about grammars, while the reverse is not true.

      2) The problem of defining the class of natural languages isn't in any way, shape or form restricted to this perspective. It could just as well be leveled against any other approach that posits universals --- how do we know that these universals are in fact universals and not just an accidental property of our restricted sample of attested languages? The answer is that we don't, but it would be a weird coincidence that all known languages only have 3 out of 4 island types. And how do we know that the universals are part of UG and do not arise from some third factor? Again, we don't. But we can look at universals and check what the simplest explanation might be --- if a universal needs to be stipulated in your grammar formalism but falls out naturally from learnability restrictions, you put it in the latter. That might not be how it works in reality (the restriction could be in both modules), but methodologically it's the more appealing solution, and there's nothing wrong with scientific theories being better than reality.

      3) We have a very good idea what kind of language classes are likely superclasses of natural language. For example, there isn't a single known syntactic phenomenon that can't be handled by MGs with overt copy movement but without the Shortest Move Constraint.
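
      The class-level point in (1) can be made concrete with the Myhill-Nerode characterization of the regular languages: two prefixes are equivalent iff no extension separates them, and a language is regular iff it has finitely many such equivalence classes. Nothing in that statement mentions grammars or automata. The sketch below (the `residual_signature` helper and the length bound are my own illustrative choices, not anything from the discussion above) approximates this with short extensions:

```python
# Sketch: Myhill-Nerode characterizes regularity without grammar primitives.
# A prefix's "residual signature" records which short extensions land in the
# language; a regular language has finitely many distinct residuals, while a
# non-regular one keeps producing new ones.
from itertools import product

def residual_signature(member, prefix, alphabet, max_ext=6):
    """Finite approximation of a prefix's residual: the set of extensions
    e (up to length max_ext) such that prefix + e is in the language."""
    exts = [""] + ["".join(t) for k in range(1, max_ext + 1)
                   for t in product(alphabet, repeat=k)]
    return frozenset(e for e in exts if member(prefix + e))

# L1 = {a^n b^n : n >= 1} is not regular: each longer a-prefix has a new residual.
def in_anbn(s):
    n = s.count("a")
    return n > 0 and s == "a" * n + "b" * n

assert len({residual_signature(in_anbn, "a" * n, "ab") for n in range(4)}) == 4

# L2 = (ab)* is regular: distinct prefixes collapse into a few residuals.
def in_abstar(s):
    return len(s) % 2 == 0 and s == "ab" * (len(s) // 2)

assert len({residual_signature(in_abstar, p, "ab")
            for p in ["", "a", "ab", "aba", "abab"]}) == 2
```

      The point of the sketch is only that class membership here is decided by comparing extensional behavior, with no commitment to any particular generating device.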

    23. 4) There are very strong non-learnability results for pretty much any supra-finite class (i.e. any class that properly contains all finite languages). Those results hold under a variety of learning paradigms, and every grammar formalism on the market defines supra-finite classes. Quite generally, learning requires that the class be structured in such a way that the learner can make safe generalizations, and many classes simply do not exhibit the required structure. So learnability will restrict you to a proper subclass of whatever your formalism defines.

      The question is whether learnability is enough to explain all sparseness. I don't think so. In fact, I expect that even once you've factored out whatever you can reasonably factor out via learnability, processing, diachrony, functional concerns, general human cognition (e.g. that first person is somehow more important than third person, or natural counting is something along the lines of 1-2-few-many), you'll still have a decent degree of sparseness left.

      Whether you want to attribute the remainder to UG, historical accident, or a third factor we can't even imagine yet (e.g. physical restrictions on the biochemical computations in our brains) I don't really care all that much about. Because by then the program will have already done its job. We started out with the observation that the class of natural languages has a particular structure (or rather, a hypothesis based on the usual induction from available data), what could be explained via other factors has been explained that way, and we are left with a well-defined remainder that acts as a UG-like restriction, whether it's part of UG or not.

      5) Funny that you would quote Chomsky. Right after I wrote the post Bob Berwick sent me an email where he said that Aspects already contains a passage that talks about sparseness. Unfortunately he didn't remember the precise page number (it should be around p60), and since I'm out of town I haven't had an opportunity yet to check it out.
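
      The flavor of the non-learnability point in (4) can be illustrated with a toy simulation (this is my own illustrative sketch of one half of Gold's argument, not the general theorem, which holds for any learner): a learner that conjectures exactly the strings it has seen identifies every finite language in the limit, but adding a single infinite language to the class already defeats this particular learner.

```python
# Toy sketch (illustrative, not Gold's full proof): the "conservative"
# learner below converges on every finite language, but on an infinite
# language each new datum forces another mind change, so it never converges.
def conservative_learner(text):
    """Yield the learner's conjecture after each datum: exactly the
    finite set of strings observed so far."""
    seen = set()
    for s in text:
        seen.add(s)
        yield frozenset(seen)

# Positive text for the finite language {"a", "aa"}: the conjecture
# stabilizes once both strings have appeared (identification in the limit).
finite_text = ["a", "aa", "a", "aa", "a", "aa"]
finite_guesses = list(conservative_learner(finite_text))
assert finite_guesses[1] == finite_guesses[-1] == frozenset({"a", "aa"})

# Positive text for the infinite language {a^n : n >= 1}: every datum is
# new, so the conjecture changes at every single step.
infinite_text = ["a" * n for n in range(1, 20)]
infinite_guesses = list(conservative_learner(infinite_text))
assert all(g1 != g2 for g1, g2 in zip(infinite_guesses, infinite_guesses[1:]))
```

      Gold's actual result is stronger: no learner at all, however clever, can identify a supra-finite class in the limit from positive text.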

    24. @Thomas:

      So it seems that perhaps there really is very little disagreement here and perhaps largely just terminological and conceptual confusion.

      I can't speak for Omer or Olivier, of course, but thinking that you were committed to the primitives of the reference class being the primitives of natural language was the main reason that I was objecting. I think my interpretation of your commitment to this stemmed from you saying two things in particular.

      First, you said "I can't resist pointing out once more that these conjectured universals of mine have no direct expression in terms of generative machinery, which I find awfully suspicious". I think generative machinery has largely been built as an attempt to specify what the primitives of natural language are, so one thing that I still honestly do not understand about your line of thinking is why it should be suspicious that we cannot express a property that holds of some class of objects defined with arbitrary primitives in terms of primitives that are meant to explain a much narrower class of objects—namely, the class of natural languages.

      (In fact, though I think I now finally largely understand what you have been suggesting all along, this is the last thing that I really still do not understand at all. Although this might be my nominalist metaphysical commitments getting in the way. Let me try to explain. To use Olivier's terminology again, there may be genuine sparseness between the set of possible human languages and the set of UG-definable languages. I think most of the sparseness you're talking about is, however, probably artificial because it stems from having the wrong set of primitives. Yet you seem to take the combination of genuine sparseness and artificial sparseness as a uniform fact of the world and want to derive it from some property (or at least be able to express it as a uniform fact). This is the premise that I reject. I don't think this is a fact about the world because I don't think that the arbitrarily defined reference class has any ontological status. So yes, there may be some genuine sparseness, but I see no reason to unify it with artificial sparseness and try to explain it (cf. also the Chomsky quote and my attempts to explain how your number example is disanalogous).)

      But, at any rate, you saying this in your first comment pushed me in the direction of interpreting you as being committed to the primitives that define the reference class as being the same primitives that define natural language in order to be able to make sense of the claim that this should be suspicious.

      [... continued]

    25. [... continued]

      Second, in your (1') you said "Let M be the class of languages that can be generated by a Minimalist grammar modulo substantive universals. Then most members of M are not natural languages".

      Similar remarks apply here. My understanding is that a Minimalist grammar is an attempt to specify the primitives of natural language. So, to me, (1') reads as something like 'the primitives of natural language define a class of languages that is bigger than the class of natural languages'; this is what the reductio argument that I was giving hinged on.

      Again, I cannot speak for Omer or Olivier, but, based on my understanding of what they said, I wouldn't be surprised if this was how they were also interpreting your line of thinking.

      Sorry that all of this seems to largely have resulted from conceptual confusion and misunderstanding. At the very least, hopefully it's been helpful for both of us to figure out how to prevent similar misunderstandings in the future.

      Of course, it's quite probable that this stems more from my complete lack of familiarity with what computational/mathematical linguists do than anything else. :p

      One last thing: if what you said in (3) is really true—namely, that "We have a very good idea what kind of language classes are likely superclasses of natural language. For example, there isn't a single known syntactic phenomenon that can't be handled by MGs with overt copy movement but without the Shortest Move Constraint"—then, modulo the objection to unifying genuine and artificial sparseness as a fact to be explained, I would be much less skeptical of the methodological approach that you're proposing, which seems to be defining an arbitrary class of languages that is bigger than the class of natural languages and trying to work our way down.

      However, I'm honestly still a bit skeptical of your claim in (3). We do seem to have uncovered a lot of syntactic phenomena, but it's possible that we still might not know what some/many possible ones are. So thinking that we know that the defined reference class is a superclass of natural language on the basis of known syntactic phenomena is a bit worrisome to me.

      But again, I don't really have any familiarity with (the results of) computational/mathematical linguistics research. So this is probably just my ignorance speaking more than anything else.

      Anyway, sorry again for the confusion. Hopefully this last comment is actually useful insofar as it may have identified the origin of the confusion and can be helpful for preventing future misunderstandings.

      And thanks for the Chomsky reference. I'll look for it in my copy of Aspects. :p Thanks again for engaging and for all of the thoughtful responses!

  3. I think the matter is even more urgent for syntax than good press or survival. That's because we are now in a game where the very nature of WHAT we think we are describing is being squeezed from all sides of the cognitive equation. If we don't find a way to talk to the neuro and psycholinguists in a general and systematic fashion, by training ourselves to shift up a level in granularity, and actively seeking out the scientific results emerging from the other direction, then we are surely going to end up with the wrong theory. Well, worse. An irrelevant theory.

    1. I'm very sympathetic to this point, but could you be more specific? Did people at the meeting have ideas for a level of granularity that would make it possible for linguists to talk to cognitive scientists outside of linguistics?

    2. There was some discussion of this, mainly by exhibition. So, for example, Luigi Rizzi and Spyridoula Varlokosta showed how minimality issues could be used to probe problems in language acquisition and processing. But by and large, the issues were not broached.

      This said, let me add a personal note: I know that the minimalist conception of Merge being the fundamental operation in grammar gets people in neuro quite excited. Dehaene is particularly chuffed with this idea and believes he has a way of probing it using MR techniques. We also have many people here at UMD investigating more standard issues concerning variation and its acquisition using GBish models. Finding the right grain, at least in conversing with those interested in psych questions, is not actually very hard. What is hard is getting them to pay attention, even to work done up to their standards using their techniques. I want to stress that Gillian is right as a political matter. But intellectually, IMO, the non-linguists have been very hostile to taking linguistic issues into consideration, and the problem has not been one of granularity.

  4. I don't think there was enough talk about this at Athens. We have a set of ready-made questions that we need psycholinguists to help us with if we are to meet the minimalist challenge of reducing the narrow computation to what is logically necessary, given the work that gets done translating across the interfaces and the work within those other modules of the brain. We need to know what is part of syn-sem and what is not. We won't figure this out by reasoning or by other conceptual considerations, no matter how subtle and complicated, or how many strange-looking endangered languages we look at. As I read it, the Minimalist Programme contains an imperative to understand general cognition better, to understand third factor design issues as they relate to the language system better. While it is healthy to change systems and reaxiomatize a little so that you see problems in a slightly different light, this is surely not the whole point of MP. But it sometimes feels to me as if this is all that is being done in practice. I do not think syntax as a field is taking the deep lessons of the program to their logical conclusion. We need to start thinking of creative ways of testing where those boundaries are, and those methodologies are going to crucially involve both psycholinguists and computational/mathematical linguists. MP essentially forces us out into the messy world of the brain more generally, and means that we cannot afford to be insular. The island that GB created does not have secure boundaries any more, and the icecaps are melting..... Now, I am not saying that everyone has to put Darwin's Problem into the first paragraph of their syntax paper. That does not usually force anyone out of their comfort zone, or really ask the questions about the division of labour between different components that should be at the heart of the rethink.
(At its worst, what I have seen from this kind of enthusiastic embrace of biolinguistics is a kind of Philosophy of Notation, which leaves out the hard work in the middle ground that would make the move to explanation meaningful. )
    I agree with Norbert that many psycholinguists have not been favourably disposed towards us (some of them have strongly held knee-jerk ideologies that are not entirely rational; others of them are just so busy getting on with testing stuff from ready-made paradigms that they know give publishable results that they are in their own little insular world, just like us).
    But there ARE others, and more and more are coming, who are doing work that is absolutely relevant and crucially important to our concerns and that we (read: I) don't seem to know about. There are also lots of other questions that haven't begun to be addressed, but which require some collaboration and creativity even to figure out how to tackle. There are people out there who are having The Conversation already (Colin Phillips, your ears are burning), and yet others who are not so ideologically idealistic that they would not jump on board if we had a well-formed testable question they could get their teeth into.
    I would like to end this grumpy rant with a positive suggestion. It would be great if we had a regular brainstorming and information sharing symposium every two years or so with the interested and relevant psycholinguists and some delegation of interested minimalists and computational/mathematical linguists. The idea would be that the different groups would share their results and new ideas culled from the previous two years, and have a joint conversation about what we know, what we need to know, what we are in a position to test as the next step. Perhaps such a Forum already exists. If it does, I want to be invited to it.