
Sunday, May 25, 2014

The GG game: Plato, Darwin and the POS

Alex Clark has made the following two comments (abstracted) in his comments to this post.

I find it quite frustrating that you challenge me to "pony up a story" but when pressed, you start saying the MP is just a conjecture and a program and not a theory.

So I read the Hauser et al paper where the only language specific bits are recursion and maps to the interfaces -- so where's the learning story that goes with that version of UG/FLN? Nobody gives me a straight answer. They change the subject or start waffling about 3rd factor principles.

I believe that these two questions betray a misunderstanding, one that Alex shares with many others concerning the objectives of the Minimalist Program (MP) and how they relate to those of earlier theory. We can address the issue by asking: how does going beyond explanatory adequacy relate to explanatory adequacy?  Talk on the Rialto is that the former cancels the latter. Nothing could be further from the truth. MP does not cancel the problems that pre-MP theory aimed to address. Aspiring to go beyond explanatory adequacy does not amnesty a theory from explanatory adequacy. Let me explain.

Before continuing, however, let me state that what follows is not Chomsky exegesis.  I am a partisan of Chomsky haruspication (well not him, but his writings), but right now my concern is not to scavenge around his literary entrails trying to find some obscure passage that might, when read standing on one’s head, confuse. I am presenting an understanding of MP that addresses the indicated question above. The two quoted paragraphs were addressed to (at?) me. So here is my answer. And yes, I have said this countless times before.

There are two puzzles, Plato’s Problem (PP) and Darwin’s Problem (DP).  They are interesting because of the light they potentially shed on the structure of FL, FL being whatever it is that allows humans to be as linguistically facile as we are.  The work in the last 60 years of generative grammar (GG) has revealed a lot about the structure of FL in that it has discovered a series of “effects” that characterize the properties of human Gs (I like to pretentiously refer to these as “laws of grammar” and will do so henceforth to irritate the congenitally irritated). Examples of the kinds of properties these Gs display/have include the following: Island effects, binding effects, ECP effects, obviation of Island effects under ellipsis, parasitic gap effects, Weak and Strong Crossover effects etc. (I provided about 30 of these effects/laws in the comments to the above mentioned post, Greg K, Avery and others added a few more).  To repeat again and loudly: THESE EFFECTS ARE EMPIRICALLY VERY WELL GROUNDED AND I TAKE THEM TO BE ROUGHLY ACCURATE DESCRIPTIONS OF THE KIND OF REGULARITIES THAT Gs DISPLAY AND I ASSUME THAT THEY ARE MORE OR LESS EMPIRICALLY CORRECT.  They define an empirical domain of inquiry. Those who don’t agree I consign to the first circle of scientific hell, the domicile of global warming skeptics, flat earthers and evo deniers. They are entitled to their views, but we are not required (in fact, it is a waste of time) to take their views seriously. So I won’t. 

Ok, let’s assume that these facts have been established. What then? Well, we can ask what they can tell us about FL. IMO, they potentially tell us a lot. How so? Via the POS argument. You all know the drill: propose a theory that derives the laws, take a look at the details of the theory, see what it would take to acquire knowledge of this theory which explains the laws, see if the PLD provides sufficient relevant information to acquire this theory. If so, assume that the available data is causally responsible.[1] If not, assume that the structure of FL is causally responsible. Thus, knowledge of the effects is explained by either pointing to the available data that it is assumed the LAD tracks or by adverting to the structure of LAD’s FL. Note, it is critical to this argument to distinguish between PLD and LD as the LAD has potential use of the former while only the linguist has access to the latter. The child is definitely not a little linguist.[2]

All of this is old hat, a hat that I’ve worn in public on this blog countless times before and so I will not preen before you so hatted again.  What I will bother saying again is that this can tell us something about FL. The laws themselves can strongly suggest whether FL is causally responsible for this or that effect we find in Gs. They alone do not tell us what exactly about FL is responsible for this or that effect. In other words, they can tell us where to look, but they don’t tell us what lives there.

So, how does one go from the laws+POS to a conjecture/claim about the structure of FL? Well, one makes a particular proposal that were it correct would derive the effects. In other words, one proposes a hypothesis, just as one does in any other area of the sciences. P,V,T relate to one another via the gas laws. Why? Well maybe it’s because gases are made up of small atoms banging against the walls of the container etc. etc. etc.  Swap gas laws for laws of grammar and atomic theory for innately structured FL and off we go.

So, what kinds of conjectures have people made? Well, here’s one: the principles of GB specify the innate structure of FL.[3] Here’s why this is a hypothesis worth entertaining: Were this true then it would explain why it is that native speakers judge movement out of islands to be lousy and why they like reflexivization where they dislike pronominalization and vice versa. How does it explain these laws? As follows: if the principles of GB correctly characterize FL, then in virtue of this FL will yield Gs that obey the laws of grammar.  So, again, were the hypothesis correct, it would explain why natural languages adhere to the generalizations GG has discovered over the last 60 years.[4]

Now, you may not like this answer. That’s your prerogative. The right response is to then provide another answer that derives the attested effects.  If you do, we can consider this answer and see how it compares with the one provided. Also, you might like the one provided and want to test it further. People (e.g. Crain, Lidz, Wexler, a.o.) have done just that by looking at real time acquisition in actual kids.  At any rate, all of this seems perfectly coherent to me, and pretty much standard scientific practice. Look for laws, try to explain them.

Ok, as you’ve no doubt noticed, the story told assumes that what’s in FL are principles of GB.[5] Doesn’t MP deny this? Yes and No. Yes, it denies that FL codes for exactly these principles as stated in GB. No, it assumes that some feature of FL exists from which the effects of these principles follow. In other words, MP assumes that PP is correct and that it sheds light on the structure of FL. It assumes that a successful POS argument implies that there is something about the structure of the LAD that explains the relevant effect. It even takes the GB description of the effects to be extensionally accurate. So how does it go beyond PP?

Well, MP assumes that what’s in FL does not have the linguistic specificity that GB answers to PP have. Why?

Well, MP argues that the more linguistically specific the contents of FL, the more difficult it will be to address DP. So, MP accepts that GB accurately derives the laws of grammar but assumes that the principles of GB themselves follow from yet more general principles, many of which are domain general so as to be able to accommodate DP in addition to PP.[6] That, at least, is the conjecture. The program is to make good on this hunch. So, MP assumes that the PP problem has been largely correctly described (viz. that the goal is to deduce the laws of grammar from the structure of FL) but that the fine structure of FL is not as linguistically specific as GB has assumed. In other words, that FL shares many of its operations and computational principles with those in other cognitive domains. Of course, it need not share all of them. There may be some linguistically specific features of FL, but not many. In fact, very very few. In fact, we hope, maybe (just maybe, cross my fingers) just ONE.

We all know the current favorite candidate: Merge. That’s Chomsky’s derby entry. And even this, Chomsky suggests, may not be entirely proprietary to FL. I have another, Label. But really, for the purposes of this discussion, it doesn’t matter what the right answer is (though, of course I am right and Chomsky is wrong!!).

So, how does MP go beyond explanatory adequacy? Well, it assumes the need to answer both PP and DP. In other words, it wants the properties of FL that answer PP to also be properties that can answer DP. This doesn’t reject PP. It doesn’t assume that the need to show how the facts/laws we have discovered over 60 years follow from FL has all of a sudden gone away. No. It accepts PP as real and as described and aims to find principles that do the job of explaining the laws that PP aims to explain but hopes to find principles/operations that are not so linguistically specific as to trouble DP.

Ok, how might we go about trying to realize this MP ambition (i.e. a theory that answers both PP and DP)? Here’s a thought: let’s see if we can derive the principles of GB from more domain general operations/principles. Why would this be a very good strategy? Well because, to repeat, we know that were the principles of GB innate features of FL then they would explain why the Gs we find obey the laws of grammar we have discovered (see note 6 for philo of science nostrums). So were we able to derive GB from more general principles then these more general principles would also generate Gs that obeyed the laws of grammar. Here I am assuming the following extravagant rule of inference: if A→B and B→C then A→C. Tricky, eh? So that’s the strategy. Derive GB principles from more domain general assumptions.

How well has MP done in realizing this strategy? Here we need to look not at the aims of the program, but at actual minimalist theories (MT). So how good are our current MT accounts in realizing MP objectives? The answer is necessarily complicated. Why? Because many minimalist theories are compatible with MP (and this relation between theory and program holds everywhere, not just in linguistics). So MP spawns many reasonable MTs. The name of the game if you like MP is to construct MTs that realize the goals of MP and see whether you can get them to derive the principles of GB (or the laws of grammar that GB describes). So, to repeat, how well have we done?

Different people will give different answers. Sadly, evaluations like these require judgment and reasonable people will differ here. I believe that given how hard the problems are, we have done not bad/pretty well for 20 years of work. I think that we have pretty good unifications of many parts of GB in terms of simpler operations and plausibly domain general computational principles. I have tried my own hand at this game (see here). Others have pursued this differently (e.g. Chomsky). But, and listen closely here, MP will have succeeded only if whatever MT it settles on addresses PP in the traditional way.  As far as MP is concerned, all the stuff we thought was innate before is still innate, just not quite in the particular form envisaged. What is unchanged is the requirement to derive the laws of grammar (as roughly described by GB). The only open question for DP is whether this can be done using domain general operations/principles with (at most) a very small sprinkling of domain specific linguistic properties. In other words, the open question is whether these laws are derived directly from principles of GB or indirectly from them (think GB as axioms vs GB as theorems of FL). 

I should add that no MT that I know of is just millimeters away from realizing this MP vision.  This is not a big surprise, IMO. What is a surprise, at least to me, is that we’ve made serious progress towards a good MPish account.  Still, there are lots of domain specific things we have not been able to banish from FL (ECP effects, all those pesky linguistic features (e.g. case), the universal base (and if Cinque is right, it’s a hell of a monster) and more). If we cannot get rid of them, then MP will only be partly realized. That’s ok, programs are, to repeat, not true or false, but fecund or not. MP has been very fertile and we (I?) have reason to be happy with the results so far, and hopeful that progress will continue (yes, I have a relentlessly sunny and optimistic disposition).

With this as prologue, let’s get back to Alex C. On this view, the learning story is more or less the one we had before. MP has changed little.[7] The claim that the principles of GB are innate is one that MP can endorse (and does, given the POS arguments). The question is not whether this is so, but whether the principles themselves are innate or whether they derive from other more general innate principles. MP bets on the second. However, MP does not eschew the conclusion that GB (or some equivalent formulation) correctly characterizes the innate structure of FL. The only question is how directly these principles are instantiated, as axioms or as theorems. Regardless of the answer, the PP project as envisioned since the mid 60s is unchanged and the earlier answers provided are still quite viable (but see caveat in note 7).

In sum, we have laws of grammar and GB explanations of them that, via the POS, argue that FL has GBish structure. MP, by adding DP to the mix, suggests that the principles of GB are derived features of FL, not primitive. This, however, barely changes the earlier conclusions based on POS regarding PP. It certainly does not absolve anyone of having to explain the laws of grammar. It moreover implies that any theory that abstracts away from explaining these laws is a non-starter so far as GG is concerned (Alex C provides a link to one such theory here).[8]

Let me end: here’s the entrance fee for playing the GG game:
1. Acceptance that GG work over the last 60 years has identified significant laws of grammar.
2. Acceptance that a reasonable aim of research is to explain these laws of grammar. This entails developing theories (like GB) which would derive these laws were these theories true (PP).
3. More ambitiously, you can add DP to the mix by looking for theories using more domain general principles/operations from which the principles of GB (or something like them) follow as “theorems” (adopting DP as another boundary condition on successful theory).

That’s the game. You can play or not. Note that everything starts with (1) above. Denial that the laws of grammar exist puts you outside the domain of the serious. In other words, deny this and don’t expect to be taken seriously. Second, GG takes it to be a reasonable project to explain the laws of grammar and their relation to FL by developing theories like GB. Third, DP makes step 2 harder, but it does not change the requirement that any theory must address PP. Too many people, IMO, just can’t wrap their heads around this simple trio of goals. Of course, nobody has to play this game. But don’t be fooled by the skeptics into thinking that it is too ill defined to play. It’s not. People are successfully playing it. It’s just that when these goals and ambitions are made clear many find that they have nothing to add and so want to convince you to stop playing. Don’t. It’s really fun. Ignore their nahnahbooboos.

[1] Note that this does not follow. There can be relevant data in the input and it may still be true that the etiology of the relevant knowledge traces to FL. However, as there is so much that fits POS reasoning, we can put these effects to the side for now.
[2] One simple theory is that the laws themselves are innate. So, for example, one might think that the CNPC is innate. This is one way of reading Ross’s thesis. I personally doubt that this is right as the islands seem to more or less swing together, though there is some variation. So, I suspect that island effects themselves are not innate though their properties derive from structural properties of FL that are, something like what Subjacency theory provides.
[3] As many will no doubt jump out of their skins when they encounter this, let me be a tad careful. Saying that GB is innate does not specify how it is thus. Aspects noted two ways that this could be true: GB restricts the set of admissible hypotheses or it weights the possible alternative grammars/rules by some evaluation measure (markedness). For current purposes, either or both are adequate. GB tended to emphasize the restrictive hypothesis space; Ross, for example, was closer to a theory of markedness.
[4] Observe: FL is not itself a theory of how the LAD acquires a G in real time. Rather it specifies, if descriptively adequate, which Gs are acquirable (relative to some PLD) and what properties these Gs will have.  It is reasonable to suppose that what can be acquired will be part of any algorithm specifying how Gs get acquired, but they are not the same thing.  Nonetheless, the sentence that this note is appended to is correct even in the absence of a detailed “learning theory.”
[5] None of the above or the following relies on it being GB that we use to explain the laws. I happen to find GB a pretty good theory. But if you want something else, fine. Just plug your favorite theory in everywhere I put in ‘GB’ and keep reading.
[6] Again this is standard scientific practice: Einstein’s laws derive Newton’s. Does this mean that Newton’s laws are not real? Yes and No. They are not fundamental, but they are accurate descriptions. Indeed, one indication that Einstein’s laws are correct is that they derive Newton’s as limit cases. So too with statistical mechanics and thermodynamics or quantum mechanics and classical mechanics. That’s the way it works: earlier results (theory/laws) are the target of explanation/derivation for later, more fundamental theory.
[7] The one thing it has changed is to resurrect the idea that learning might not be parameter setting. As noted in various posts, FL internal parameters are a bit of a bother given MP aims. So, it is worth considering earlier approaches that were not cast in these terms, e.g. the approach in Berwick’s thesis.
[8] Its oracular understanding of the acquisition problem simply abstracts away from PP, as Alex D noted. Thus, it is without interest for the problems discussed above.

53 comments:

  1. I think part of the issue is the tension between your footnote 7

    "The one thing it has changed is resurrect the idea that learning might not be parameter setting."

    and the fact that the only work that is really "visible" at the moment is Yang's variational parameter setting and Fodor/Sakas-triggering. So it at least looks (at least to the "outsiders") as if the only work that considers the acquisition problem at the moment is somewhat at odds with the current state-of-the-art, and I think it's fair game pointing this out.

    Your footnote mentions Berwick's thesis, so perhaps you (or Bob?) could elaborate just a bit how this fits into the bigger picture?

    Replies
    1. This is a fair point, one that I discussed here (http://facultyoflanguage.blogspot.ca/2014/02/plat-darwin-p-and-variation.html) in the comments section a little. The tension between MP and a rich FL internal parameter space seems pretty clear to me. So, if MP is on the right track, the existence of variation should not be taken to imply the GB conception of a fixed set of FL internal parameters. So how to resolve the tension?

      Well, first a few observations: First, the existence of variation does not imply that we don't need a rich innately structured FL. The laws of grammar point to a series of invariant features of FL. If these exist, and IMO the evidence is overwhelming that they do, then this circumscribes what needs to be acquired from PLD. Second, there have been problems with parameter setting theories independently of MP concerns (as Dresher and Fodor have discussed, mainly the fact that the parameters are not independent). Third, as Newmeyer has noted, once one goes in for micro-parameters the difference between these and the existence of language specific rules disappears.

      This last point is where Berwick's early work comes in. He was interested in finding ways of learning rules given PLD. I think it is time to think along these lines again: in place of parameter fixing as the picture of acquisition, we should return to rule learning. What kinds of rules? Well, the thread to the above post speculates that it involves mainly figuring out which copies to delete and whether or not to add a functional head. The idea was that FL effectively gives one a fully annotated "LF" derivation and acquisition effectively amounts to matching this annotated object with a "PF" that is also provided. The grammar is a set of rules concerning which copies to keep in "PF."

      This is all speculative, I concede.

      Let me end with one point: it may be that MP fails and that we need a parameter theory. I think that right now this is one of the bigger challenges to the program. It would still be interesting to discover that all of the principles were reducible to a small simple core even if the parameters remained. I am hoping that the sketch above can be fleshed out, using tricks that Bob developed in his thesis a.o. But, I could be wrong (I have been before) and if so, well, we cannot bring PP and DP together in this domain. I am told that in the sciences, this is hardly a novelty. Were this to happen, IMO, the acquisition data is much stronger than the evo data and I would conclude that there is something we still don't know about evo. But I could imagine someone concluding differently. It's too early, however, to give up on the desired reconciliation. Hope this helps.

    2. You are of course right that there are many ways to make progress on the problems of language use, language learning, and language evolution. You (correctly) point out that linguists have discovered many non-trivial generalizations about language and languages, and that one way to make progress is to try to understand these generalizations better (by, e.g., explaining them, or determining that they have been mischaracterized, or ...). You also (correctly) point out that Alex C is doing something different, namely trying to explain how anything could learn natural-language-like patterns. Clearly, as you point out, until Alex C is able to provide an account of the generalizations linguists have discovered, he has not given the complete story. [I have not heard him claim otherwise.]
      Given, however, that we do not yet know the right explanation of the generalizations (nor even if they are (intensionally) true in the form we have given them), a charitable and (I think) compelling perspective on Alex's approach is as follows: "I [Alex] will try to prove more and more learning results. You [linguists] should continue discovering generalizations. We should work together to determine whether, when my learners cannot learn your generalization, (a) the generalization is characterized correctly (more field work) (b) the generalization can be accounted for extragrammatically (in the parser), or (c) the learner is wrong."
      Alex seems frustrated that the only answer given is `c', when `b' is explicitly on the table, and `a' is always a possibility, and is the subject of current theoretical work.

      TL/DR: nahnahbooboo

    3. @Greg:

      Yes, a-c are the logical options. I am far less skeptical than you are about whether we have approximately the right accounts. As I said, I believe that GB and its variants are serviceable "effective theories" in the sense that physicists use the term. The aim is to find more fundamental accounts that derive these. I suspect that you would disagree, as would Alex C.

      As regards a-c: I don't think that the generalizations are "Largely" incorrect. So, I tend to discount this option. There may be changes, but mainly around the edges.
      As for b: I have no problems with this. I just have not seen many attempts to explain the main effects GG has discovered in these terms. The island effects have been the main venue for alternative analyses, and this I paid very close attention to. I have concluded that the current state of play does not favor strong extra-grammatical accounts (Sprouse, etc are what have convinced me of this despite my great sympathy for something like this being true for MP reasons).

      As for c: It's not so much that it is wrong as that it never makes contact with the relevant data. Wrong would be good. Noble failures are instructive. We need some of these. Bob's and Charles's and Ken's work CAN be wrong for it addresses the relevant issues. My problem with Alex's stuff is that it is beside the point, not that it is wrong. And this is an entirely different kind of objection.

    4. @Greg. With regard to (b), punting to the parser does nothing to remove the need for innate structure. It's clearly possible in principle that any alleged grammatical constraint could turn out to be a side-effect of the way the parser operates, but that just shifts the innate structure from one module to another. So, maybe island effects derive from a grammatical constraint or maybe they reflect resource constraints on parsing, but either way they're built in and not learned.

      I'm not sure it's right to characterize the answer given to Alex C as (c). I don't think anyone is basing their objections on the assumption that his learners are wrong. They may, for all we know, offer a perfectly accurate account of how kids learn the stuff that they do in fact learn. After all, if these learners turn out to be unable to learn the ECP from a realistic corpus, then that is no flaw from our point of view. We don't think that kids learn it either!

      With these caveats, I don't see that (a) or (b) are plausible options for any of the items on the list that Norbert gave. But maybe some of these generalizations are empirically faulty or ripe for some kind of functional explanation. If so, let's see the arguments to that effect, and see how many are left standing. My guess would be: most or all of them.

    5. @Alex: "punting to the parser does nothing to remove the need for innate structure. It's clearly possible in principle that any alleged grammatical constraint could turn out to be a side-effect of the way the parser operates, but that just shifts the innate structure from one module to another. So, maybe island effects derive from a grammatical constraint or maybe they reflect resource constraints on parsing, but either way they're built in and not learned."
      What you say is true, but it strikes me as a funny way to use `built in'. A successful punt of, say, islands, would have them emerge from a conspiracy of normal resource limitations, together with the particular structure assigned to those sentences, perhaps as well with heuristics based on usage frequency. I would describe this situation as a reduction of a built in theory of islands to more general principles, and not as `building in' the theory of islands into the parser.

      @Norbert: I don't know whether I am more skeptical than you. It depends on what `approximately right' means. Was phlogiston approximately right? I could go either way...

      @Norbert & Alex: You are both right in that Alex C's work is not (yet) attempting to address the discoveries Norbert adumbrated previously. I am (as you see) far more sympathetic to his research direction than you are, and I think that this lines up with the kinds of questions that people working on mathematical linguistics ask (and why this work as well is nahnahboobooed by the working linguist). Like math lx-ers, Alex C is working on developing learning algorithms for properly mildly context-sensitive classes of languages. This is because all are convinced that we need at least this much power to describe the things we see; if one could learn ECP effects but only for context-free languages, it would be a bitter pill to swallow, as it could not possibly be the right story (the learner would be fundamentally too weak).



    6. @Greg:
      > @Alex: punting to the parser does nothing to remove the need for innate structure.
      > What you say is true, but it strikes me as a funny way to use `built in'.

      As an aside, it constantly strikes me as a funny fact about cognitive science that this strikes anyone as a funny way to use 'built in.' When "innate"/"built in" started implying "not reducible to general principles" or (again distinct) "not reducible to domain general principles" is anybody's guess. I have never understood what they have to do with each other. And as long as we are playing Norbert's game of "What would Chomsky say", I take it that anyone who is able to point out the tension between Darwin's Problem and Plato's Problem is equally baffled by this conflation.

    7. Cedric Boeckx calls the problem that Norbert is interested in Greenberg's problem (GP): explaining why there is the variety of languages that we in fact observe.
      But I am not interested in GP, I am interested in PP (as constrained by DP). And that is, no matter how much Norbert huffs and puffs and calls names, a standard view.

      I think some of the properties in Norbert's list of 95 theses that he has nailed to the door will be attributed to innate properties of the LAD, others may arise from subtle iterated learning effects -- interactions between the learner and cultural effects over the generations -- which seems to be the consensus view from talking to people like Ian Roberts.
      I am quite open to different explanations; but I do think it is a mistake methodologically to assume that they are all attributable to "innate" factors, especially if you want to end up with a minimal UG, for the obvious DP reasons.
      I don't think that saying everything is "innate" counts as an explanation, even if it is in fact true.
      It's like saying that birds have an innate ability to fly; yes, that's true but it doesn't explain how birds can fly.
      Quite apart from the general critique of the term "innate" (Mameli and Bateson).

      In fact I take the opposite methodological stance: we should start off by assuming that these specific linguistic principles are not "innate",
      and then see how far we can get. I.e. Incorporate DP early on. We can't tell what has to be innate until we have figured out how much can be learned.
      So the first order of business must be to understand the limits of general learning mechanisms.

      I am very open to other methodologies and other theories.
      I take the following methodological stance:
      a) we should be precise and explicit in our models
      b) we should try to solve the simple cases first (viz recursive structures)
      c) it is fine to idealize from some extraneous factors
      d) we should use the best available theoretical tools from other disciplines (e.g. the theory of computation)

      This leads me, given my interest in PP, to study or develop the general theory of learning recursively structured grammars from strings, within a framework of computational learning theory.
      In much the same way as someone trying to explain bird flight might try to develop a general theory of aerodynamics.

      Here is an alternative methodology -- one could use the statistical NLP toolkit (Bayes, PCFGs, Pitman-Yor processes, etc.) to try to do unsupervised learning on real corpora of CDS. That also seems like a reasonable way -- computational and empirical but not mathematical. I like a lot of this work, I don't do it myself at the moment, but I may start again. Niyogi has some good discussion in his book of the respective merits of these two approaches.

      What are some other alternative methodologies?

    8. This comment has been removed by the author.

    9. (deleted the previous comment for being snarky and unproductive)

    10. @ Greg:
      I think that the generalizations go somewhat beyond phlogiston accounts in early chemistry. My favorite analogy is the Ideal Gas Laws. Of course if your view is that the generalizations are as bad as you suggest, then I too would not bother with them. As I keep trying to make clear, that is the big divide that we are arguing around. There are some (Alex, and maybe you) who think that we really need to start from scratch and that we've learned almost nothing in 60 years worth of research. That the "discovered" (scare quotes for Alex) laws are best ignored. The reason that is the big divide is that once one acknowledges that they are roughly correct, the next obvious step is to try and explain THEM. That puts demands on proffered accounts, viz. that these laws be targets of explanation. Now, in the island discussion you alluded to, this is what happened, and the debate was productive. I think that the domain specific side got the better of it, but the debate was rational. Why? Each side tried to explain island effects and the explanations could be compared and discussed. The two sides were aiming at the same target. (Btw, it is not clear to me that the non-G side of the debate did not require quite a bit of domain specific "costs" to make their story run, e.g. finite clauses, referential DPs etc. had inherent costs. See the Sprouse and Hornstein volume for a full airing of these debates.)

      This is not what typically happens, however, in the cases I've been excoriating. And, I very strongly suspect, that this is because of the view that the basic generalizations are considered snake oil (or, phlogiston?). So, that's the divide and it behooves us to fess up as to which side we are on as productive discussion is impossible otherwise. As I think many of the comments to these posts indicate.

      @Ewan: Could not agree with you more. An MPer is all for non-domain specific innate structure driving linguistic effects (Chomsky is even more radical, hoping that some physical laws will be pertinent). As I noted, I would personally have loved it if Kluender et al's work had worked and that we could reduce island effects to complexity effects of some general sort. That is the name of the game for MPers. However, the game is not worth playing if we ignore the basic data, i.e. the laws we have discovered over the course of the last 60 years.

    11. @Norbert: "that is the big divide that we are arguing around. There are some (Alex, and maybe you) who think that we really need to start from scratch and that we've learned almost nothing in 60 years worth of research."
      As always in life, things are more complicated. First, going back to the list of phenomena you listed in a previous post, there are several issues:

      1) Do we observe these phenomena (e.g. restrictions on the distribution of pronominals)
      2) Are our laws actual laws, that is to say, do they hold across the board and for every language?
      3) Are the mechanisms we posit in the grammar to derive these laws correct?

      Nobody debates 1, the data is overwhelming. Similarly, we all shy away from 3 and accept that syntactic theory 100 years from now will probably use different technical tools.

      The real issue is 2, how law-like are our laws. That's where things get gray, because of course there are exceptions, both within a language and from a typological perspective. They are few and far between, but they are the prime reason why competing analyses exist, e.g. canonical binding theory vs. Pollard & Sag's theta-hierarchy vs. Bruening's phase-based precedence. Those theories agree on at least 90% of the data (my purely impressionistic guess), but make slightly different predictions for a few constructions, many of which may involve very subtle grammaticality judgments.

      So far so good: if all our theories agree 90% of the time, they all express pretty much the same law, right? Well, no, because the remaining 10% can be very important, in particular from a mathematical perspective. Just think of all the failed arguments to show that English is not weakly context-free, e.g. the respectively construction. It uses a law that holds for the vast majority of such sentences: the number of verbs matches the number of nouns and the i-th verb agrees in number with the i-th noun, as in "John and Bill and Mary sleep, eat, and run, respectively". But while this is a good approximation, things are actually more complicated in a handful of cases (Pullum and Barker 82) and it turns out that the construction is context-free after all.

      Quite generally it is fairly easy to construct languages where the addition or removal of a single sentence affects weak generative complexity. And either change could either increase or decrease this complexity. So while our tentative laws are good approximations and set a base line --- you have to be able to account for the 90% that are safe --- the contentious 10% can have a huge effect for mathematical properties. So if those are the properties that you need to get your work off the ground, our current laws are often not good enough.

    12. @ Thomas
      Every theory in every domain we know of will look different 100 years from now. Thus what you are saying about linguistics applies to them as well. However, I am pretty sure that your caution would be considered ludicrous if extended to these other domains. And there's a good reason why: we will only get to better theories if we take the ones we have very very very seriously.

      I have suggested a distinction before, borrowed from the "real" sciences, between effective and fundamental theory. In physics, Newton provided an effective theory of gravity. Einstein's is more fundamental. Whether it is THE fundamental theory awaits reconciliation with quantum mechanics (the prevailing view seems to be that it is not). However, the distinction is useful in that it also tells us how to develop a more fundamental theory: aim to derive the effective theory as a limit case. In this sense, the effective theory is "true"/"accurate" (so far as it goes) as it is derivable from the more fundamental one. So, let's take what we have seriously enough to investigate it.

      Now, as you note, there are some cases where our competing theories largely overlap, but not completely. I agree that in that case we should keep the penumbras of disagreement before our eyes. Indeed, one aim of fundamental theory is to try and explain these differences. At times, it allows us to adjudicate the competitors theoretically (as well as empirically). This is all philo of science 1 stuff, so I am sure you know it.

      It is also possible that in some cases the disagreements become important. In that case, they should be discussed. However, none of this affects, IMO, the general point: you need to hold things constant if you are to explore what's going on. For many of your concerns, it appears, the differences really matter. For many of mine, they do not. POS arguments and MP theories can take the roughly correct (90%) as accurate reflections of what's there and see how/if a more fundamental theory might account for them. The 90% in common already generates a whole slew of interesting POS arguments and a whole bunch of DP problems. We need not wait for the theories to be "perfect" for interesting inquiry to proceed. Indeed, we NEVER have to do this in ANY domain of scientific inquiry, so urging caution or restraint in the domain of linguistics is a form of the methodological dualism that Chomsky regularly (correctly, IMO) argues against. I would suggest the following methodological maxim: don't hold linguistics to standards that would be laughed out of court in other domains of inquiry. Waiting for the perfect description of the data is one such standard that would not be taken seriously in the real sciences, and so it should not be in linguistics.

      Last point: this all assumes that we can get the "perfect" description absent theory. But as you know, the data do not speak for themselves and theory is required to even get the right analysis. So, like it or not, you're stuck. The question, to repeat, is what to take as the baseline for moving forward. This will be question relative, but that does not mean that ceteris paribus we should be overly skeptical about the "laws" we've identified. There are a whole bunch of questions for which this is more than good enough. But I suspect you know this already and are just trying to justify the unjustifiable skepticism of some of your colleagues. This is very nice of you (you ARE a nice guy!), though in this case what you observe, true though it is, will only serve to muddy the conversation, and that is not a good thing, IMO.

    13. @Norbert: I'm afraid I have to muddy the conversation even more, because I just don't see a big divide here. Instead of getting all philosophical about it, let's look at a concrete example first. Oh, and this is a long post, so I'm splitting it in two parts.

      One thing I've been working on for a while is whether there are any syntactic phenomena that cannot be expressed in terms of MSO-constraints over trees. The best candidate for this is binding theory, because Principle B of canonical binding theory is not MSO-definable. But if you assume that there is a split between discourse-binding (some DRT-style identification of variables) and syntactic binding, then Principle B can be MSO-definable if you furthermore restrict the number of pronouns per binding domain that must not be locally bound. The rest of the project then involved some syntactic and typological work to verify that this is a tenable position. And it turns out it is.

      So is canonical binding theory a good approximation for what I'm interested in? Yes and no. It states things that are irrelevant such as how the binding mechanism is mediated via c-command and makes the wrong claims for some relevant cases (exempt pronouns in adjuncts).

      If you compare canonical binding theory to the abstracted, index-free version I use for the empirical part of the project, they look very different. It deliberately ignores parts that syntacticians care a lot about, such as the correct size of the binding domain, and adds mathematical assumptions that no syntactician would ever make because they are irrelevant for what they care about. Heck, my version doesn't even talk about binding, only about the minimum number of available antecedents. So if we were to measure the overlap between the two, it would be rather minimal.

      But at the same time, my version is compatible with all the extra claims that are part of canonical binding theory. You want to specify a size for the binding domain? Sure, we'll put it in this definition over here, doesn't change anything for my result, though. You want binding to be determined by c-command? Alright, you figure out how it works, doesn't change anything for my result, though. Of course it's interesting why c-command should matter rather than m-command, and why binding domains are no larger than CPs. But those questions are orthogonal to the goals of the project as I conceive of it for now.

    14. On a more abstract level, the same holds for Alex C's work (which does not mean that this is actually his view, which I'm still trying to pinpoint exactly). He needs certain assumptions to get going, the laws do not say anything about those. What the laws do talk about has little effect on his results, but the two are presumably compatible (the laws do not obviously conflict with his assumptions).

      The main difference is that my assumptions were simple, structural, and restricted to just one phenomenon. Hence I could easily verify myself that they hold. Alex needs to assume abstract properties of string languages. They are hard to relate to structural notions, let alone the intricate notions used in linguistic laws. And even if you figure out how to do that, you would still have to check that every construction in every language obeys them. That's a lot harder.

      So where do we stand, overall? I think what you interpret as skepticism is actually Alex C and Greg pointing out that once you start looking at certain issues, it can happen that the laws 1) talk about things you do not need, 2) leave open things that are important for you, 3) make things more complicated than they need to be empirically (e.g. unbounded Principle B), and 4) make claims that are problematic for you and are actually contentious on an empirical level (cf. locally bound pronouns). That doesn't mean that the laws are bad and we have to go back to square 1, it doesn't even mean that you can ignore the laws whenever you feel like it, it just means that for certain projects they are not particularly useful right now even if the project tackles the same empirical domain.

  2. @Thomas
    I guess I fail to understand. You are interested in a different set of questions for which the laws as I understand them do not matter. Far be it from me to stop you from looking at whatever you want. However, I do not see that the problems really overlap or that because you are interested in what you are it renders illegitimate the questions I am asking. This is not the sense I get from Alex. He thinks that my questions are ill formed or silly. He thinks that what he is doing renders my questions moot. I frankly don't see this, nor would I understand your results as bearing on my questions, to the degree I understand them. I am interested in why there are hierarchical and locality conditions on binding. You clearly are not as you are happy to hand code these in. That's fine with me, we are doing very different things. That's all that I want conceded. Maybe they will touch one another in the future, but right now, they are, or appear to me to be, miles apart.

  3. I really don't think your goal is silly; it differs only very slightly from mine.
    There are goals, the methodology we use to approach that goal, and the theories we produce using that methodology. My goal is, IMO, uncontroversial: I want to figure out the structure of the LAD. Your goal is slightly different, I take it: "I am interested in why there are hierarchical and locality conditions on binding."
    You want to figure out why the attested languages have the properties they have. Or is this only as a step towards figuring out UG?

    Our disagreement is about methodology, and as a result about the reliability of the results produced. But you have backed off from any *theoretical* claims. And you have put no theories of the LAD on the table, so there is nothing to disagree with there.

    I still don't understand why you object so strongly to the methodology I use, other than perhaps the fact that I have not yet arrived at a theory that explains the complex syntactic phenomena you are interested in. Or rather I understand the sociological reasons, but not the scientific ones.

  4. @Alex:
    I think that we only appear to have the same goals. My goal is to figure out the structure of the LAD given that we know a lot about it already. What do we know? Well, the 30 or so effects I mentioned characterize some laws of grammar. Moreover, I think that the theories developed in the mid 80s are on the right track (GB being the one I practice, but I find GPSG, LFG, RG, etc. to be notational variants over a large part of the domain of interest).

    Given this view, I am open to any account of why these laws of grammar exist and why the LAD’s Gs adhere to them. I am very open minded here. I have no problem with theories that are domain specific (ascribing linguistically proprietary properties to FL), nor do I eschew domain general explanations which exploit general cognitive mechanisms. In fact, given my MPish inclinations I prefer the latter. However, what I want out of them is that they address the facts as we know them to be. I have little interest in approaches that fail to address these issues.

    As you know, because I’ve said it repeatedly, I think that one of the strengths of GBish like accounts is that they try to explain these laws of grammar. Binding theory aims to explain binding effects, subjacency tries to explain island effects, case theory tries to explain A-movement effects etc. This is a plus for these theories and though I think that they are not quite right, I do think that they are on the right track precisely because they do a fairly decent job of explaining the laws. This belief leads to a proposed approach if one has MPish inclinations: try to derive the principles of GB from more domain general assumptions. Were you to do this, it would derive the laws of grammar (more or less) as a by-product.

    But that’s all detail. What is our methodological difference? It does not live in a rarefied atmosphere. It’s quite basic: we disagree about what the basic things we should explain are. We disagree at very basic levels: e.g. you seem enamored of string properties, I insist the basic data is judgments concerning constrained homophony (sound-meaning pairings). Your work ignores the generalizations I mentioned above, mine directly addresses them. You find only domain general theories admissible, I find this to be an open empirical question and tolerate domain specificity if that is what explains the established data. My wishes are to reduce these to a minimum. My methodology disallows me from ignoring them when I cannot. You cannot explain everything at once. Granted. But when you cannot explain something, you say so and remain appropriately sheepish until you can. You do not disparage conclusions that get the basic facts in service of some hope that one day you might. That’s where our methodologies differ.

    Alex, I have nothing against the following statement: “theory T were it correct would explain facts F. But theory T has features I personally don’t like and would like to explain in more general terms. Right now I cannot do that nor do I know how one could do that. But that is my hope.” It’s always nice to know what one’s hopes are and how far one has gone in realizing them. But it’s a pretty weak argument and until realized does not serve as an actual contender. From where I sit, you take your hopes to constitute a methodological reason for disparaging stories that work pretty well. That is, IMO, a very bad way to proceed. That’s my main objection to your methodology.

    Replies
    1. I haven't really been disparaging any of your stories at all because you aren't telling me what your preferred story of the LAD is. That is my frustration. What is it?

      I try to be very explicit so that Chomsky and Bob B and Massimo PP can disparage away as much as they want. And I am suitably sheepish about the numerous failures and inadequacies of my models. I hope to be able to fix the problems of my current models, just as I have already fixed many (not all!) of the problems with the model that Chomsky et al criticized.

    2. "I insist the basic data is judgments concerning constrained homophony (sound-meaning pairings)"
      I agree with this; you are right to insist.

      The situation perhaps is confused because I assume that the input to the child is just the strings in context. As do Sakas, Fodor, Yang, etc.

    3. @Alex C. It seems to me that your research is complementary to the kind of research that most generative syntacticians do. I for one would be happy to accept (at least as a live possibility) that kids learn some parts of the grammars of their native languages. However, I think that the POS arguments for the innateness of, say, the ECP, are pretty good. Why not just say "sure, the ECP is probably innate; I'm trying to figure out how some of the non-innate bits get learned"? Or is that what you're saying? In that case, your only real disagreement should be with hardcore P&Pers, not generative syntacticians in general. Or do you disagree with them too because you don't accept the POS arguments?

    4. I am happy to take some version of GB as a good partial specification of the structure of the LAD. Were it correct, it would explain why native speakers, for example, treat the antecedent of 'himself' as 'John' in 'John likes himself' and treat 'him' as disjoint from 'John' in 'John likes him.'

      This view specifies what needs "learning" and what does not. We need not learn the principles of BT, as they are innately provided. What does need learning is that 'himself' is a reflexive in English and 'him' is a pronoun. What is left to specify is how these lexical facts are learned given PLD. I do assume that the PLD consists of sentences like 'John likes himself' and sentences like 'John likes him.' I assume that in a given context the child "sees" that the former couples the referent of 'John' with that of 'himself' (the kid can "see" that John is both the liker and the liked) and that it never couples that of 'him' with it. Negative data is fine for me as I have a specified innate theory of binding, so kids can look for dogs that don't bark with no problem.
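
      To make the division of labor concrete, here is a toy sketch of the lexical step only. Everything in it is an illustrative assumption rather than a worked-out acquisition model: the function name `learn_anaphor_classes`, the encoding of PLD as (form, locally_bound) pairs, and the idea that the child can read local coreference off the context. Principles A and B are hard-coded, standing in for the innate part; only the classification of the forms is "learned".

```python
from collections import defaultdict


def learn_anaphor_classes(pld):
    """Classify forms as reflexive or pronoun from toy PLD.

    pld: iterable of (form, locally_bound) pairs, where locally_bound is
    True if the child "sees" the form corefer with a local antecedent.
    This encoding is a hypothetical simplification.
    """
    observed = defaultdict(set)
    for form, locally_bound in pld:
        observed[form].add(locally_bound)

    lexicon = {}
    for form, values in observed.items():
        if values == {True}:
            # Built-in Principle A: reflexives are always locally bound.
            lexicon[form] = "reflexive"
        elif values == {False}:
            # Built-in Principle B: pronouns are never locally bound.
            # This is the "dog that doesn't bark" step.
            lexicon[form] = "pronoun"
        else:
            # Mixed evidence: the toy learner makes no decision.
            lexicon[form] = "undetermined"
    return lexicon


# 'John likes himself' (local coreference) vs. 'John likes him' (none).
toy_pld = [("himself", True), ("him", False), ("himself", True), ("him", False)]
lexicon = learn_anaphor_classes(toy_pld)
```

      The sketch says nothing about how the kid actually uses the PLD in real time, which is the open question flagged next.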

      This leaves a bunch of questions open, many that I leave to my acquisition colleagues (e.g. Jeff Lidz), e.g. How exactly does the kid use the relevant PLD?

      It also raises the question of whether the BT is domain specific or not. In its GB form, it sure looks domain specific, adverting as it does to notions like c-command and binding domain. However, I also believe that the notions might be reducible to more domain general ones once we understand how the structures are built. I have even written on this.

      So what's the LAD look like: well it contains BT. What version of BT? One either identical to the GB version of BT or a version from which this version can be (more or less) deduced.

      Is this proposal perfect? Nope. But, it does pretty well. It does really well in explaining why the LAD doesn't converge on theories of binding where reflexivization and pronominalization are not in complementary distribution.

      So, there is a partial specification of my version of the LAD.

      Second note: I do not assume that the input are strings in a context. I assume, given UTAH that they are sounds coupled with thematic structures. For syntax we can assume that the sounds can be treated as strings. Theta roles (event participants) are also inputs. I do not assume that non thematic information of a semantic variety is available, but thematic information IS epistemologically prior and available to the LAD in the PLD. But we have gone around this track before.


      I cannot imagine that this answer will satisfy you. You will complain that it is not formalized, nor specific enough nor complete. I know this. Nonetheless, it seems to be a pretty good specification and it has the virtue of addressing a real GG finding. Like I keep saying, it is the fact that we come down on opposite sides of this judgement that suggests to me that we are doing very different things.

    5. I do not often find myself agreeing with Norbert but I think, Alex C., sometimes you have to either agree to disagree [as Norbert repeatedly pointed out you two do] or one of you has to change their position. Norbert said some time ago [in essence] that no empirical discovery he could imagine would falsify his core theoretical commitments [CTC]. I have no reason to doubt he was sincere. So when it comes to those CTC you either have to change your mind or accept Norbert's rhetoric. CTC are a case of you're either with me or against me. From watching your debate for a while I think that Norbert is right: you have irreconcilable differences, and I do not quite understand why it seems so difficult to acknowledge that. It would save both of you a lot of typing effort...

      Now maybe I can get some clarification on a rather specific point Norbert makes above. He writes: "So what's the LAD look like: well it contains BT. What version of BT? One either identical to the GB version of BT or a version from which this version can be (more or less) deduced."

      Now, that kids can overcome POS because some version of BT is innate explains nothing UNLESS one has a story about how this version of BT is biologically realized in the child's brain. So do those who defend Norbert's view have such a story, yes or no?

      [I ask +/- confused academics to refrain from inquiring what MY story is. Even if I have none or one that is utterly wrong, that does not remove the burden of proof from Norbert and those defending the same proposal: their proposal requires an innate version of BT - so they need to have a story about biological realization of same]

    6. @AlexD "Why not just say "sure, the ECP is probably innate; I'm trying to figure out how some of the non-innate bits get learned"? Or is that what you're saying?"

      It's not that far from what I am saying. There is an obvious distinction between a theoretical claim, and a methodological assumption that we should keep in mind.

      So something is definitely innate -- and we definitely generalise in some ways and not others. And things like subjacency, the ECP (if I understand its current role) seem plausible candidates to be if not innate themselves then consequences of some structural biases on the sorts of configurations that can be learned -- that is pretty weaselly I know. But I just don't see that saying "ECP is innate" is a good move methodologically, in fact I think it is a bad move. It doesn't help the process of constructing an explanatory theory of the LAD, it doesn't explain anything, and it creates DP problems.
      Just to rephrase, there may be some element X which means that the grammars output by the LAD always have the properties that we describe using the term ECP. But if you want to find out what X is, then I am not convinced that saying "ECP is innate" is the right way to go about it.

      Or another way of saying the same thing, more generally. The LAD is a device that learns the parts that vary and doesn't learn the parts that don't vary. The standard generative syntax strategy has been to start by fixing the parts that don't vary (phase A) and then try to figure out how you could learn the rest (phase B), based I think on some interpretation of Gold's learnability results.
      I advocate the opposite strategy -- start by figuring out how you can learn hierarchically structured grammars from strings, and then see how much these techniques can go -- then when they run out of steam, hypothesize some ingredient X that when combined with the learning algorithms will give the result.
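
      The comment above does not name an algorithm, so the following is only an assumed illustration of what "learning structure from strings" can look like at its crudest: group substrings that occur in exactly the same (left, right) contexts in a sample, and treat each group as a candidate category. The function name `context_classes` and the two-sentence sample are hypothetical, and requiring identical context sets is a deliberately naive criterion.

```python
from collections import defaultdict


def context_classes(sample):
    """Group substrings of the sample by the set of contexts they occur in.

    sample: list of sentences (strings of space-separated words).
    Returns a list of sets of word tuples; each set contains two or more
    substrings that share exactly the same (left, right) contexts.
    """
    contexts = defaultdict(set)
    for sentence in sample:
        words = tuple(sentence.split())
        n = len(words)
        for i in range(n):
            for j in range(i + 1, n + 1):
                # Record the context in which words[i:j] appears.
                contexts[words[i:j]].add((words[:i], words[j:]))

    classes = defaultdict(set)
    for substring, ctxs in contexts.items():
        classes[frozenset(ctxs)].add(substring)

    # Keep only groups that actually put two or more substrings together.
    return [group for group in classes.values() if len(group) > 1]


toy_sample = ["the dog barks", "the cat barks"]
toy_classes = context_classes(toy_sample)
```

      On this toy sample, 'dog' and 'cat' end up in one class, as do 'the dog' and 'the cat'. Whether anything like this scales to the laws under discussion is exactly what is in dispute in this thread.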

      Now this is a methodological point.
      So we can be pluralistic about methodologies -- it seems perfectly reasonable to have multiple groups of people working using different methodologies. At least in my view.
      And perhaps we will all converge on the same answer; after all, none of us has a complete answer. So some people do unsupervised grammar learning on CHILDES data, and some people look at the developmental data and so on.

      Norbert is trying to stipulate that the only admissible methodology is one that starts from the assumption that his theory is correct. I don't think that is remotely defensible. Indeed I think that the methodology he advocates is for various reasons unlikely to lead to the right answer no matter how "fecund" it might be.
      Which is obviously why I am using a different methodology. But it seems impossible to have a discussion here about methodology without Norbert circling the wagons and calling me a climate change denier.

    7. Part 1
      @Alex:
      I am now really confused. You seem to tolerate the idea that the ECP is the result of some innate machinery, but the ECP itself is not innate. But you think that saying that the ECP is innate is not a good way to go. Yet you think that starting with the invariant bits of UG and seeing how to add in the variant bits is a reasonable strategy and methodological pluralism is to be encouraged, but not if it means assuming that the invariant bits are innate.

      I find this all very confusing, for in one way I find what you are saying perfectly congenial and in another deeply wrong headed. The congenial one is that you think that the ECP style effects (and others I assume) are more or less descriptively adequate but not fundamental. That the right fundamental theory will have the ECP etc. as CONSEQUENCES rather than as primitives. This I find both coherent and congenial (very MP in fact). But you then say that assuming the ECP is innate is a bad idea. What I don't see is why. In one perfectly straightforward sense it is, if it derives from something more fundamental that is itself innate. Further, assuming it is tells you to look "inside" at the "biases" if you want to derive it. It tells you that it is not a data driven generalization within a given G, say in contrast to whether your language is head-complement or complement-head, something that is almost certainly a data driven difference. Thus locating the ECP as part of the innate machinery or derived from innate machinery seems to me to be very useful information to have. But you think that assuming this is not the "right way to go about it." So you think that doing this is a mistake. Why?

      Here's one thing I can imagine: saying it is innate acts against trying to explain it. This is indeed possible. However, in practice this is NOT what has happened. There has been a lot of effort expended within GB and MP trying to deduce these effects from other more fundamental looking ideas. The ECP itself has proven to be recalcitrant. However, this is certainly what people have tried to do with the fixed subject effect (Pesetsky and Torrego), Control (moi with many friends), binding (Kayne, Zwart, Lidz & Idsardi, moi, Alex D), Crossover (Kayne) etc. So assuming that these were innate did not in practice prevent people from thinking about them as theoretically tentative way stations to a more fundamental theory. It simply told people what kind of explanation we should be looking for, to repeat: one focused on the built-in biases of the system.
      continued...

    8. Part 2:
      @Alex:
      Let me add a tu quoque here: by refusing to allow that the ECP is innate and instead focusing on the learning algorithms you raise to prominence the view that the ECP is NOT innate but learned, i.e. a data driven generalization. From what I can tell, you don't actually believe this. Nonetheless that is a very natural way to read your remarks. And this is what I have been strongly objecting to. Whatever its etiology/provenance I am quite sure that a very weak argument suffices to conclude that the ECP is not learned. And a methodology that invites this supposition (unless it produces an extremely good demonstration of HOW this might happen) diverts attention from the real issues of explaining these effects. And that kind of methodology is one that I would warn against adopting.

      Last point: as the above may make clearer (and as I have said repeatedly), what I take to be correct is not the fundamental theory, but the conclusions that generalizations like the ECP are roughly accurate and that their explanation will lie in looking at the structure of the innate biases/structures that people bring to the acquisition task. They are NOT data driven generalizations, but reveal properties of the innate architecture. The question is what this architecture looks like. I personally doubt that GB describes the FUNDAMENTAL structure of FL. But I am pretty sure that whatever does will derive GBish kinds of principles (note the 'ish': I don't think that the principles are exactly correct) as consequences because they do more or less explain these generalizations. If you buy this, then we are reading off the same page. So, are we?

    9. "Thus locating the ECP as part of the innate machinery or derived from innate machinery seems to me to be very useful information to have. But you think that assuming this is not "right way to go about it." So you think that doing this is a mistake. Why?"

      It makes the learning problem harder rather than easier -- the end result of adding a lot of innate constraints on the grammar may end up being a class of grammars which is impossible to learn. The result of this is you end up with a theory of the structure of grammar which is incompatible with any plausible learning theory -- which is the state that I think GG is in now. Witness, for example, the absence of a learning theory in your theory of the LAD, above.

      This is, I know, the opposite of how GGers tend to think about learnability.

    10. @ Alex:
      Now I'm confused again. I thought we agreed that ECP effects are true. That means Gs that don't adhere to some principle that has the effect of ruling out ECP violations don't exist. But you don't want to assume that such grammars are unlearnable as it would make learning them harder. But they are not learned, that was the point we agreed on? So I am really lost now: if we assume that ECP Gs don't exist then we rule out Gs that allow them to be learned. But how could that be bad?

      One more point: You start your sentence by saying that "it makes the learning problem harder" but continue by saying that it "may end up…" Which is it? Do we know that it makes the problem harder, or that it may shunt us to a class of Gs with bad learning profiles?

      BTW: I don't agree that I didn't have a learning theory. I thought I told you how to learn binding phenomena. And I think that I can tell a similar story for much more of what we have.

      However, as you have no credible learning story either for any interesting fragment, it's hard for me to see why you think that the problem is the incorporation of constraints like the ECP as part of FL. Indeed, so far as I can tell, your views are effectively hunches, unless there is some proof hiding in your back pocket showing that FLs with GB-like structures built-in are in principle unlearnable. And if you do have such a story, let's hear it.

    11. @Norbert. To be clear, I think the ECP as stated is most likely false.
      But I think it is a good generalisation. I wrote a long thing about this earlier but I can't find it -- but I agree with TG's description earlier.
      "1) Do we observe these phenomena (e.g. restrictions on the distribution of pronominals)
      2) Are our laws actual laws, that is to say, do they hold across the board and for every language?
      3) Are the mechanisms we posit in the grammar to derive these laws correct?

      Nobody debates 1, the data is overwhelming. Similarly, we all shy away from 3 and accept that syntactic theory a 100 years from now will probably use different technical tools."

      So for example GPSG is ok, but does not contain the ECP (the slash metarules do the work) but the grammars are described by the ECP even if they do not contain any traces. So say GPSG is right (we know it's not of course), then assuming the ECP as stated would mean we could never find the right theory. So it would be a bad methodological idea, even though the ECP is in some sense "law-like". (ETA I am not completely sure I understand the role of the ECP, so this might be wrong in details)


      I think we know that it makes the problem harder, and we know that in some circumstances it may make the problem very much harder, so that it is, on our current technical understanding, computationally intractable. We don't know that it will make the problem impossibly hard though, that is just a possibility. I don't have a proof -- this is just a methodological discussion. A proof that FL with GB like structures is unlearnable would of course need a precise description of the theory, as would a proof that it is learnable for that matter. If there is one that you accept as a reasonable characterisation, then point me to it. Stabler's MGs for example?


      "BTW: I don't agree that I didn't have a learning theory. I thought I told you how to learn binding phenomena."
      So terminologically a learning theory would be a theory of how you learn something -- and I don't see any theory there of how you learn what is learned. How is the syntactic structure of the sentences learned? In fact I was expecting a link to a peer-reviewed journal article that I could pore over. If there is one, can you provide a reference?

      This is a blog comment thread so of course I don't expect a full description, but a pointer to a place where I can find a detailed description would be good. Because there are lots of different and incompatible proposals out there that rely on different and incompatible notions of grammar, from Wexler and Culicover's work on now obsolete forms of transformational grammar, to Yang's work on P & P. I really don't know what view you hold.

    12. @Alex C. You seem to be saying that although the ECP is probably innate, we shouldn't say that it is because that would stop us looking for a deeper explanation. However, almost all of the people who actually are looking for deeper explanations of ECP effects think that they are innately determined!

      More generally, it's odd to criticize an explanation for not going deep enough when the alternative is...nothing. When alternative methodologies start coming up with competing explanations, then we can start worrying about depth.

      This cuts both ways, I might add. So e.g., you could very well say the same thing about P&P accounts of grammar acquisition.

    13. This is a question for Alex C.:

      Just why do you accept the Chomskyan story for ECP and resulting innateness suggestions in the first place? If ECP were innate we'd expect it to hold cross-linguistically, or at least within English. Yet, as has been known for a long time, it doesn't. Let's start with a typical example:

      (1) Who do you think __ likes ice cream?

      is good but

      (2) Who do you think that __ likes ice cream?

      is bad. But, as noted by Joan Bresnan, and rediscovered twenty years later by Peter Culicover, you can say

      (3) Who do you think that on any given day __ likes ice cream?

      and it's good. The presence of a modifier (or an extracted phrase, as in

      (4) Robin is someone who-j you can be sure that [on any given table]-i __-j will put exactly the wrong silverware __-i

      which is also good) radically improves such examples. The structural condition that was supposed to account for the difference between (1) and (2) is unchanged in (3), and now there is excellent evidence that the whole thing is prosodic (Kandybowicz among others has some interesting analyses arguing this point cross-linguistically).
      So based on what evidence do you accept the Chomskyan ECP story?

      @Alex D. You write:
      "More generally, it's odd to criticize an explanation for not going deep enough when the alternative is...nothing."
      Excellent point. The problem for Chomskyan accounts is that you cannot say X is innate unless you have some biological implementation story. This is quite different for people who want to account for X without the additional assumption that it must be built into the human genome. They may find an account for X which can be implemented on a computer but not in a human brain [recall the radical difference between chess playing computers and humans]. It is quite possible that there is more than one way to implement 'language' in a computational system. What Chomskyans claim is that language IS implemented in a human brain. Everyone has by now noticed the deafening silence my questions about biological implementation generate. Norbert uses his dislike for me as an excuse to refuse answering questions that reveal a very inconvenient truth for Chomskyans...

    14. "you cannot say X is innate unless you have some biological implementation story."

      Barking (in all its varieties and nuances) is surely innate, yet there is no "biological implementation story" - whatever on earth this means. And anyone who debates this based on the absence of a "biological implementation story" is just not serious.

      Your question of biological implementation has received a "deafening silence" simply because it is a red herring to the issues at hand.

    15. "Barking (in all its varieties and nuances) is surely innate," Hmm, and speaking English [vs. Mohawk] surely is not innate. So, as even Chomsky never tires of asserting, what is at issue is on which side of the innate/acquired divide a given aspect of language is. Now if maybe a serious thinker could answer the question re ECP I'd appreciate it.

      My apologies but I cannot take seriously someone who believes for BIOlinguists the question of biological implementation [the investigation of Chomskyan I-language] is a red herring. Confused individuals who are unaware of biological implementation stories might profit from familiarizing themselves with work done in biology.

    16. @X-ina: I just pointed out a problem with your claim. Instead of acknowledging the error in your claim, you went off on a tangent (as usual, I might add). I guess I am happy being confused. As far as I am concerned, it seems better to accept confusion than be wrong and not accept it.

      Now, please spare me more Chomsky talk. I wish someone would count how many times you invoke Chomsky unnecessarily in a paragraph.

  5. @AlexD ; let me answer here so as not to get mixed up with CB's biolinguistics.

    "@Alex C. You seem to be saying that although the ECP is probably innate, we shouldn't say that it is because that would stop us looking for a deeper explanation. However, almost all of the people who actually are looking for deeper explanations of ECP effects think that they are innately determined! "

    I don't think ECP is innate. I don't think Norbert does either -- at least he has backed off from that, I think, to the claim that there are ECP effects and the ECP effects are caused by some innate bias.
    This is not just an annoying bit of hair-splitting. So let us distinguish two claims.

    1) The ECP is a true intensional statement about the psychologically real grammars.

    2) The ECP is a good descriptive generalisation about the sorts of phenomena we see in the sound/meaning pairings in natural language.

    So the consensus here (I think ??) is that 1) is false and 2) is true. I am sure someone will correct me if this is wrong.

    Let's assume that for the moment.
    Now my argument is this.

    Any theory of linguistics has to have a number of components that need to fit together.

    Two of those components are
    (UG) a specification of the class of possible grammars
    and
    (LAD) a specification of a learning algorithm that selects one of these on the basis of information available to the child.

    UG and LAD need to fit together quite closely. N argues that we can specify UG without thinking about learning algorithms and someone else will come along later and sort out the uninteresting technical details of the learning algorithm. I think this is a mistake for various technical reasons.
    I think that positing innate stuff that may be false as stated, will make the problem of finding a learning algorithm much harder, perhaps even impossible.
    Whereas perhaps some GGers think that this makes the problem easier, perhaps even trivial (if e.g. the class of grammars becomes finite as a result).

    As a side issue, as you know, I don't think that just saying "P is innate" constitutes an explanation of P.
    But if you want to say that it is an explanation, but a shallow one, then I am not going to argue the point,
    and in any event I am not interested in explaining things like the ECP, but in solving Plato's problem.

    Replies
    1. @Alex C:
      "in any event I am not interested in explaining things like the ECP, but in solving Plato's problem."

      The reason I'm interested in the ECP effects is that they are concrete instances of Plato's Problem. So the idea that you are interested in the latter but not the former (or analogous instances of the former) comes close to being incoherent to me.

      Let me reiterate what I think saying that the "ECP is innate" means. There are two possible readings. First, that it points to where we should look for an explanation of ECP EFFECTS (i.e. the range of data, subject/object asymmetries AND argument adjunct asymmetries). Claims that the ECP is innate in this context amount to saying that the etiology of these effects lies in the built-in properties of FL. What this leaves open is what exactly about FL it is that accounts for these effects (e.g. the shape of the hypothesis space, the nature of the priors over that space, the shape of the built-in learning algorithm, the shape of the built-in parsing principles, the fact that we have a 4 chamber heart, whatever).

      There is a stronger version of this that I am personally attracted to, but it is a question of research strategy AND HERE I CAN SEE PEOPLE DISAGREEING. This takes the ECP principles as approximately correct descriptions of how FL is structured. Note, these principles are not effects as they are not data. They are putative explanations of this data. Now, are these innate? I think that a good strategy is to assume that they are pretty good descriptions of the internal structure of FL but they are not the right ones yet. Rather, these principles follow from something much more fundamental. Again the analogy is between islands and Subjacency: the former are explained by the latter. What's the cash value of this stronger assumption? Well, it tells you to look for accounts that derive the ECP principles. So look for stories that have the ECP principles as limit cases. This stronger view takes the ECP to be a target of explanation, and providing such targets can be (and has been) an extremely powerful research heuristic.

      Now if I understand you correctly, you buy the first weaker reading but not the second. Fine. I do the same with respect to Binding Theory in my own work, arguing that earlier versions of BT (Lees-Klima) are better targets for reduction than the GB version. These are closely related to GB's BT, but I understand why one would want to distinguish them. However, whether you take the first or the second version, can we agree that theories are to be evaluated in terms of how well they derive these effects, the ECP being just one of many? And if so, can you point me to work outside the GG tradition that has provided any way of doing this? (And please don't mention Kluender, Sag etc. on islands as I agree that this was serious and this has been dealt with already (not to mention they were effectively GGers)). So where in the MT literature, for example, or anywhere else, do we have non GG models that explain these effects (any of them) without basically assuming that ECP like principles are innate?

      Again, let me invite you to enlighten us about these models in a blog post that I will gladly put up here on the BLOG so that you can walk us through a concrete example or two.

    2. @Alex C: Doesn't this sentiment of yours cut both ways?

      "UG and LAD need to fit together quite closely. N argues that we can specify UG without thinking about learning algorithms and someone else will come along later and sort out the uninteresting technical details of the learning algorithm. I think this is a mistake for various technical reasons.
      I think that positing innate stuff that may be false as stated, will make the problem of finding a learning algorithm much harder, perhaps even impossible."

      First, I don't agree with you that GGers don't have a learning theory in mind. Perhaps it is not fleshed out formally, but there are, as far as I can see, at least some who have some sort of a learning framework in mind.

      Second, you yourself acknowledged earlier that your interest was not in the acquisition of human language per se, but "a general theory about how one can learn hierarchically structured languages from strings". Isn't this disconnected from UG then? If so, doesn't this perspective suffer from exactly the things you mention about the other perspective?

      If UG and the LAD need to fit together closely, then discounting or ignoring the former from a learning algorithm should suffer from exactly the same problems that you are worried about.

      NOTE: This is not a gotcha question/statement. I am simply not a learning theorist, and don't have the knowledge necessary to appreciate the claims. But, it did strike me as a bit puzzling so I ventured to ask. I guess, what I am asking for is a clarification of how a general purpose learner can be adapted to accommodate innate information later.

    3. @CA: taking your points in order.

      First, you are quite right that many GGers have some sort of informal learning theory in mind. But I don't know what it is. For example, Norbert just posted a paper by Berwick, Chomsky and Piattelli-Palmarini ( here ) that criticises three approaches with explicit associated learning theories (partial, incomplete but explicit), but there is no hint there as to what the learning theory is that they advocate.
      Similarly Norbert posted a link here to a paper by Hauser et al. which again contains only the sketchiest outline of some vaguely parameter-setting approach. So it would be nice if some of these models could be spelled out in some detail.

      Second, I am interested primarily in human language acquisition -- my approach is to develop a general theory as you quote. This seems like a reasonable (MP-ish) approach. A theory of aerodynamics that explains how birds fly must be based on a general theory that may also account to a greater or lesser extent how bats, insects, low speed planes and hypersonic jets fly. It seems good scientific practice to model the general phenomenon using general principles.

      Finally you say "If UG and the LAD need to fit together closely, then discounting or ignoring the former from a learning algorithm should suffer from exactly the same problems that you are worried about." This is a very very good question. I would rephrase it as saying "should suffer from a completely different set of problems that are just as hard." Because the problems of fitting UG to a given LAD are in a sense complementary to the problems of fitting a LAD to a given UG.
      But I feel reasonably confident that the right tools are available.
      (one of which I am talking about in a couple of weeks at LACL -- where I will see Thomas G I think). But I am far from having a complete answer to these questions.
      I think to answer these questions fully we need a more integrated approach.

    4. @Alex C:
      "A theory of aerodynamics that explains how birds fly must be based on a general theory that may also account to a greater or lesser extent how bats, insects, low speed planes and hypersonic jets fly."

      Ok, show us how one of them "flies" linguistically. Even a bare sketch of the unacceptable type that people like me like to give would be acceptable. Just a hint of how the general learning theory will derive some set of effects that we both acknowledge to be prime empirical real estate. Your pick. ECP, WCO, Binding, Islands, whatever. Pick one and MT away.

      Again, let me repeat my offer: if the comments section does not afford enough room I will happily post your outline on the main blog. It would be my pleasure.

      Like Alex D, I am not interested in playing 'gotcha' here. I would really love to see an alternative that gets ANY of these effects in a different way than I tend to think they must be derived. I am skeptical, but not yet close minded.

    5. Sorry I am rather busy at the moment and not able to give this discussion the attention it deserves, but Norbert says "However, whether you take the first or the second version, can we agree that theories are to be evaluated in terms of how well they derive these effects, the ECP being just one of many?"
      Well, no. I am interested in a different problem. So yes, I am interested in how children learn that "Which cake did you say that John ate?" is ok and
      "Which child did you say that ate the cake?" is not. But my theory doesn't have much to say about that, yet. To say something concrete we need a strong learning theory for MCFGs, and at the moment we have a weak learning theory for MCFGs (as of 2011) and a strong theory of CFGs (as of the beginning of this year) and we can hope to have some suitable results maybe next year, if all goes well.
      But that still won't really account for this distinction since the relevant sentences are kind of rare in the PLD. So I think we need to look perhaps to the syntax semantics interface, and using a CCG analysis to the unavailability of forward crossing composition. If you can do strong learning of MCFGs (a big if) and if you can translate the CCG style constraints on the rules into MCFG constraints (another big if), then it seems like one could build a plausible explanation of how this is acquired. Now this might or might not "derive ECP effects", and it might not explain the cross linguistic regularities but I am not primarily interested in those problems.

      But I want to emphasize this is just programmatic hand-waving. This is not stuff that is properly formalised, evaluated and peer-reviewed etc. These are just vague ideas that do not constitute an explanation. If you push me for the technical details, I will have to waffle or decline to answer.

      But the topic of this post/thread is your unwillingness to pony up a real story -- a story that is more than just hand-waving. And my argument that the lack of a learning theory is symptomatic of a methodological problem at the heart of GG. I don't think it is productive to enlarge the scope of the discussion to a general issue of theory choice -- my theory is better than your theory. We just aren't going to agree on that. I am interested in finding out if my views on the current state of the art are mistaken (are there lots of cool learning papers that spell out the details that I haven't come across) and if my views on the methodology are mistaken.

    6. @Alex C:
      " I am interested in a different problem."

      Finally. It's what I've been saying all along. We are doing different things and interested in different questions. They may someday relate to one another. Who can tell. I personally doubt it very much. But, I am no Nostradamus (nor was he, apparently).

      So, now maybe we can conclude this discussion and agree. You are doing one thing. I another. What I hold constant, you do not. What I take to be the main research issues, you do not. What you take to be the right shape of the problem, I do not. That means that we see the relevant issues in fundamentally different ways. And that's why when doing this kind of work you've got to choose. Choosing does not mean that you cannot peek into what the "other side" is doing. It does not mean that you need to be dismissive of the projects you do not pursue. But it does mean that if you take one perspective on the problem you will AS OF NOW reject the other one as being way off the mark. That's where we started. I take the main project to be to explain the 30 or so generalizations I keep alluding to (including the ECP). You take this to be a mistake. You take the problem to be to explain how we sift through data that is relatively prevalent in the PLD, while mine is to consider those cases that are not.

      You ask if your methodology is mistaken? Well, not relative to the questions that interest you. The problem is not in the method, IMO, but in the problems that attract you. But this is a question of taste and interests, not method. If you ask me how likely it is that the answers to your questions will be relevant to mine, I would say not very likely, precisely because your approach abstracts away from the core problem as I understand it. Might I be wrong? It's logically possible. But, here I stand on the results we've garnered over the last 60 years of GG work. Mine aims to build on this. Yours starts by rejecting its relevance. That is not a bridgeable gap. One of us is due for a fall.

      So, I am happy to end matters here: we seem to finally agree that we are doing very different things that may or may not one day converge. That seems to be about right.

  6. @Alex C. I don't want to repeat what N/CA are saying, so let me just comment on your (1) and (2). When I say that the ECP is innate (or that ECP effects are innately determined, or something like that), I don't mean either of (1) or (2). I mean that ECP effects derive primarily from innately-determined properties of the grammar (or LAD, processing systems, and whatever else is involved[*]). This does not entail (1); it entails (2) but is a stronger claim.

    On this understanding it is just a fact that the ECP is innate, just as it's a fact that barking is an innately-determined dog behavior. Whether or not that fact explains anything is largely irrelevant. The point is just that things could have been otherwise but aren't. So, there's a parallel universe where dogs learn to bark by observing other dogs, but we're not in that universe. Knowing this places useful constraints on further investigations of barking.

    It bears repeating that everyone who's ever had anything interesting to say about ECP effects (and I wish I belonged to this small but illustrious group!) has been a generative syntactician of some stripe who starts from the assumption that they're innately determined.

    Replies
    1. So what's the alternative to innateness here? If "ECP effects derive primarily from innately-determined properties of the grammar, LAD, processing systems, etc", what would count as not being this way? Presumably the fact that people use construction A more frequently than construction B is due in part to different input frequencies; the LAD is presumably innately-determined to be sensitive to these, as are the processing systems, etc. I imagine, however, that this shouldn't `count'; can you give me an idea of where the categorical line lies?

      "It bears repeating that everyone who's ever had anything interesting to say about ECP effects (and I wish I belonged to this small but illustrious group!) has been a generative syntactician of some stripe who starts from the assumption that they're innately determined."
      They share a number of other properties as well: being human, having hearts, not being politicians, etc. No one looks at ECP effects except for humans with hearts who are not politicians and who are GGers. I don't think that the `typical' GG assumption that things are innately determined plays any essential role in linguistic description or analysis.

    2. This comment has been removed by the author.

    3. "I don't think that the `typical' GG assumption that things are innately determined plays any essential role in linguistic description or analysis."

      @Greg: Isn't it the crucial presupposition behind the fact that evidence from more than one language can be brought to bear on solving an analytic problem? This has been an abundantly productive strategy in all the discussion of ECP effects, for example. Lots of languages appear to show subject/non-subject asymmetries, but each one of them seems to reveal different pieces of the solution to the question of why they exist. If there was not an innately determined core to the phenomenon, I don't see what would be the explanation for the fact that it surfaces in language after language -- and, crucially with regards to your remark, this fact in turn has proved essential to progress on the basic linguistic description and analysis of the phenomenon itself.

      For example, just as a sample of the sort of scientific discourse I have in mind, here’s a paragraph from a recent paper by one of your students at Chicago, Martina Martinović, writing about subject/object extraction asymmetries (ECP effects) in Wolof:

      "[My] analysis improves on Pesetsky and Torrego (2001) by finding an example of a language [Wolof] in which both C and T which moves to C are overt. A weakness of their evidence lies in the fact that in the languages they discuss (Standard English, Belfast English, Spanish), C is always phonologically null, and only T is overt. If my analysis is on the right track, the evidence for a T-to-C analysis of the that-trace effect is considerably strengthened." (http://ling.auf.net/lingbuzz/001874)

      Indeed, as she shows, the relevant phenomena in Wolof look essentially identical to English and Spanish, with a Wolof-specific twist that in turn helps decide among competing analyses of English and Spanish. Maybe she's wrong, or there are other alternatives we should consider, but the productive (IMHO) scientific discourse that passages like this exemplify makes no sense except under the assumption that we are discussing something innate.

    4. @Greg:
      Linguistic patterns can be due to at least two factors: because that's the way the input data patterns or that's the way FL is built. There can, of course, be combinations of these, but let's put them aside.


      The claim is that native speakers' judgments show ECP patterns because of the way FL is structured. What would an example of the converse be? Well, take headedness, i.e. X-comp vs comp-X order. Why do English speakers have X-comp and Japanese have comp-X order? Because kids have tons of evidence within the PLD if they are English speakers that the order is X-comp and Japanese that it is comp-X. Hence THAT fact is reasonably taken to reflect the nature of the input, the 'patterning' of the data.



      There are other examples of the same thing: Question formation in English vs Chinese/Japanese, backwards control in Tsez vs English, T to C in English but not Brazilian Portuguese. Here the stories we tell look primarily to the structure/patterning of the input data, not the structural constraints of FL. So, the line lies along the POS divide: GGers have assumed (maybe incorrectly) that IF patterning of the PLD suffices to "induce" the relevant effect, then that's where the causality lies. Where it cannot do so, then that's an indication that we need to look at the built-in structure of FL.



      I will let Alex D respond to your second point. However, I was one of those involved in the ECP industry in the mid 80s (developing an idea originating with Aoun, Me, Weinberg and Lightfoot and Aoun developed a theory of Generalized Binding for ECP effects as well as fixed subject effects). I can testify first hand that we did think that ECP effects were likely innate and that’s why we found them interesting to work on. We thought that reducing them to binding theory effects made some sense theoretically and fit in well with what we took the basic architecture of a GB like grammar to be. We were likely incorrect, but if you are interested in motivations, the considerations you cite were among them. (BTW, I know that Lasnik had similar views as did Chomsky and Kayne. So, we were all similarly motivated).



      Last point: you say "I don't think that the `typical' GG assumption that things are innately determined plays any essential role in linguistic description or analysis." 
      Depends what fish you want to fry. ECP effects are interesting to study precisely because they might tell us something about the structure of FL. IMO, this makes them far more interesting data than, say, whether a language is VO or OV. If problem selection is included in your "linguistic description or analysis" then you are wrong. If you mean that being innate or not plays no essential role in the details of the analysis, that is correct, at least seen one way. But this is not surprising. That something is innate is a conclusion, not a premise, at least in GB style accounts. Conclusions don't generally tend to act as premises, on pain of circularity. But they do feed other kinds of research (e.g. Crain’s work is a good example, as is Berwick’s and Wexler’s or Lidz’s, a.o.). But then again, conclusions are what motivate inquiry and make some problems more interesting than others. So, IF your interest is in the structure of FL then ECP like problems are good ones to work on as we have good reason to think that whatever causal powers explain them will shed light on the structure of FL. As that's the big game that I am hunting, the distinction is an important one. Of course, you need not hunt them varmints. You might just be interested in language patterns (à la Greenberg). Can't stop you, and might even steal some of your results. But that is a different (though occasionally related) enterprise and it behooves us to keep them conceptually separate.

    5. @Both: Social construction theory is not my strong point, and I often forget that research questions are influenced in this way. You're right to point this out, and I should qualify my remarks accordingly (but I don't quite know how). Still, I think that the question of innateness is orthogonal hereto; it is an explanation of shared properties of natural language, which one can investigate without being committed to the ultimate explanation of.

      @David: The methodology you described is often talked about in terms of innateness, but I think it can be understood just as well in a more agnostic way. It seems a reasonable methodological strategy to try to describe something you don't understand in terms of something you do. This can result in giving you new ideas about the thing you originally understood, etc.
      `Principles and parameters' (P&P) simply pushes this strategy to an extreme. But even here, this has nothing to do with innateness, just the observation that the same descriptive tools suffice to describe the objects in question. Whether these tools are necessary is not something the methodology addresses, and it is agnostic as to where the descriptive generalizations come from (built in, data-driven, etc).

      One might ask (and I sometimes worry about this as I'm going to sleep) about the `unreasonable effectiveness' of the P&P methodology. I see three possibilities, which I do not know how to decide between:

      1) This is so effective because it's on to something right.
      This might be because that's actually how languages work, or because languages have certain properties, which this methodology approximately tracks, etc.

      2) This is so effective because it couldn't possibly not work, for any domain.
      Here the idea is that if we have only finitely many objects of study, this methodology would ultimately arrive at a `least fixed point' grammar, of which the observed objects would be parameterizations.

      3) This only seems effective because we are only sketching out rough accounts.
      According to this possibility, if we were to try to give detailed accounts that you could implement in a parser, the parallels would break down.

    6. @Greg:
      "it is an explanation of shared properties of natural language, which one can investigate without being committed to the ultimate explanation of."

      Sure, and one can study data sets of recordings in cloud chambers without being committed to the view that this tells us something about basic particles, or one can study how color changes when you add chemicals together without being committed to the view that it tells you anything about chemical properties of matter or…you get the point.

      You are drawn to instrumentalist conceptions of linguistics. So far as I can tell, this view, if carried over to other domains, would be considered eccentric, as the above is meant to suggest. Let me repeat that innateness is not an explanation of the linguistic data; rather, it is an account of native speakers' knowledge. Linguistic data is a probe into this structure. This is entirely analogous to what happens in fancier domains of the sciences: you have data, it generates theories and you ask what the theories are theories of. You can be entirely instrumentalist about theories if you wish, i.e. they are just there to organize data points. In physics this is theories as accounts of meter readings. In linguistics this is theories as organizations of judgment data points. This is a position to take. I'm not inclined to take it in linguistics or other parts of the sciences. But hey, do whatever makes you happy. What I would suggest is that you not draw invidious distinctions between what linguists do and what others do. If instrumentalism is good enough for linguistics, then it should be good enough for the rest of the sciences too.

    7. @Norbert:
      I am drawn to anti-realist positions, and (rest assured) not only in linguistics. I wish I hadn't brought this up way back when; I feel like it's a distraction.
      In particular, my comment doesn't rest on this. I was simply pointing out that one can notice, be interested in, and fruitfully investigate, shared properties across languages, without needing to assume a particular reason for their existence. Indeed, finding that a shared property is not `innate' also tells one something about FL.

    8. @Greg:
      Correct, one can do research for all sorts of reasons. But scientists often ask of a theory what it is a theory of. Here's one answer: it is a theory OF FL. Here's another: it's a theory OF LANGUAGE. Here's yet a third: it's a theory OF DATA SETS. Now, these differing answers apply not only in linguistics, but in every other area of science as well. That means that pursuing this discussion in the domain of an immature science like linguistics is counterproductive. Your anti-realism is fine with me. But I object to it because it seems to preclude me asking a question I am interested in: what's the structure of FL? Of course, if in general you don't find such questions interesting, you won't here either.

    9. Reply to ranting academic way back up: I was not aware that it is not permissible on Norbert's blog to express AGREEMENT with Chomsky. In fact I'll go even further and post a link to 4 recent linguistic talks which make his position quite clear:

      http://whamit.mit.edu/2014/06/03/recent-linguistics-talks-by-chomsky/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+whamit+%28Whamit%21%29

      @Alex C.: The question I addressed to you had absolutely nothing to do with biolinguistics [which BTW is not my biolinguistics] - so if you could kindly answer, we'd be grateful.

    10. @CB: I am aware that there are lots of (things that look like) exceptions to many of the standard generalizations like the ECP or relative clause islands. You are correct that if they are false as generalizations, or vary cross-linguistically, then it is a very bad idea to posit them as part of UG.
