Comments on Faculty of Language: "Against open-mindedness; a reply to Mark" (blog by Norbert, http://www.blogger.com/profile/15701059232144474269)

[2014-05-22 05:09:40]
@Alex:
As Alex has ceded the last word, I will take it. Here's the lay of the land. We have some very good, established, more or less true generalizations (including RCIs). The aim is, first, to explain them (that's what subjacency and the ECP were trying to do); then to explain how native speakers' Gs respect them (that's where POS comes in, along with the assumption that large parts of subjacency and the ECP are either themselves innate or derive from something that is); then to explain how they could have arisen (Darwin's problem and MP). These efforts go on simultaneously. Where Alex and I disagree is that though he claims to buy the idea that RCIs (and my other GG facts) are well-grounded generalizations (given some Scandinavian problems), his stories never actually try to explain why Gs respect them. He claims to buy the facts, but he refuses to address the consequence of their being facts. Moreover, when someone tries to address these, he objects if they involve assumptions he doesn't like (e.g. that the theory of subjacency is innate). What does he do? He changes the topic. I have nothing against his topics (though they don't really interest me much personally), but I do have a lot against his little shuffle: yes, I buy the data, but no, I don't have to do anything that addresses it, but I'm still allowed to be oh so skeptical about any approach that does. This habit of Alex's is deeply counterproductive, not to mention annoying. In fact, it suggests that he doesn't really give a damn about the facts he claims to accept, for were he to care he would concede that at this moment he has nothing to say about them. His methods may one day address them, but right now, nothing. That would be a refreshing pronouncement. It would even be true. But the "yes I buy it, but I won't address it" stance is simply squid ink obscuring the empirical landscape.

This will be my last word on this thread as well.
-- Norbert

[2014-05-22 01:13:37]
So we have probably reached the point where the insight we can get by continuing is not worth the typing time, but (ignoring for the moment N's distortions of my views) we are arguing about several different things. So take RCIs. There are three different questions that I am interested in to varying degrees, and that are getting squished together.

(L1A) How does the child learning English/Warlpiri/... come to acquire a grammar that correctly generates the facts that are described by the expression "RCs are islands in English/Warlpiri"?

(TYPO) Why is it that so many (maybe all, in fact) languages are described by the expression "RCs are islands"?

(METHOD) When we construct a theory of (L1A), should we start from typological generalisations like RCI, assuming that they are "innate"? I.e., should we assume that RCI is in UG?

So (L1A) is the fundamental theoretical question of linguistics, and we just aren't going to be able to sort it out in the comments of a blog.

But I think we could have a useful discussion about the methodological point, which is where I do sharply disagree with Norbert -- without, as I have explained several times, denying the truth of many/nearly all of the descriptive generalisations he listed.

I think the final methodological point is a terrible and incredibly counterproductive idea. I sort of understand, when I am being charitable, why people thought this was a good idea 40 years ago, since there were no alternatives, and people knew nothing about genetics, or learning theory, or the theory of computation.

And I feel the force of the POS arguments raised by Alex D, though as I wrote a book on the topic with Shalom Lappin, I am not going to try to recap the reasons why I think these arguments fail.

But to lapse into Norbertian hyperbole, I think assuming the RCI is innate is the *worst possible* starting assumption. Once you make that assumption I think it is impossible to make progress. And I think that is why, nearly 50 years after Aspects, there are no good answers to (L1A) on the table.
-- Alex Clark

[2014-05-21 22:52:53]
@Alex D: Apologies for the incorrect attribution. It's a big thread, so it's easy to lose track of who said what. I was mostly responding to CA's remark that "a good test bed is the kind of stuff that Alex D mentioned earlier: Does the model/story/... make sense in view of the time-course of acquisition actually observed in children?" And looking at it again, I probably misconstrued the point of that remark, too.
-- Anonymous (profile 07629445838597321588)

[2014-05-21 21:01:25]
@Norbert: I know you promised not to talk to me again, but maybe you can explain to your GG community how you can get from Alex's statement above to:

"If Alex wants to go there, let him. But, and here I am addressing the GG community: nothing he has to say is worth worrying about for no discussion can possibly be fruitful.
It's like someone denying that there is recursion because of Pirahã or constituency because of Warlpiri."

How could anything Alex said be on the same level of confusion as someone denying there is recursion because of Pirahã? So why brand Alex as such a lunatic? Why are you so defensive? Charles Yang writes in the next blog post [which has been mostly ignored] that "any scientific hypothesis is guilty until proven innocent": http://facultyoflanguage.blogspot.de/2014/05/how-to-solve-mysteries-reaction-to-our.html

I am unaware that MP [or hypotheses proposed under the MP umbrella] have been proven innocent yet. Are you suggesting GG proposals are not scientific hypotheses? Or will you be going after Charles next for reminding us that, when it comes to certainty that our hypotheses are right, we're all in the same boat? Unless you have some unique divine revelations that give you absolute certainty about the proposal Alex has expressed SOME scepticism about, it is rather irresponsible to tell your community: "nothing he has to say is worth worrying about for no discussion can possibly be fruitful".
-- Anonymous (profile 03443435257902276459)

[2014-05-21 19:34:58]
@benjamin.boerschinger: You are right. I was being sloppy (or just plain ignorant; maybe even "confused" :)).

I meant "model", not "algorithm". But my alter ego and I will stand by the rest of the statements.
-- Confused Academic

[2014-05-21 14:53:34]
2. @Alex/Thomas
Note that this is what happens in the sciences all the time (again, the gas laws). So, when I argue that we should take the results of 60 years of work as given, I do not mean that we should not refine these generalizations. I am arguing that we have done more than enough to show that something very like these generalizations is correct, and that assuming this is a good way of moving forward. In other words, I am recommending a research strategy based on what we have discovered. From what I gather from Alex, his view is that the fact that the generalizations have counterexamples shows us that there is nothing there to explain, and so we can ignore these facts. His learning theory need not accommodate them because they are false. This way, IMO, sterility lies. You want to go that way, I cannot stop you. But really, if you take that road, you are blocking research. We can discuss how to fix the CNPC, but really not much will change conceptually. Of course I could be wrong, and if Alex wants to show me that his learning theories will work fine if we complexify the island descriptions (i.e. he can show how we acquire English and Scandinavian islands using the available PLD where his story cannot explain English alone), well, I would sing his praises to the stars. But we all know that Alex cannot do this, and is not really interested in doing this. He is interested in stopping discussion along these lines in any way he can (right, Alex?).

So, yes, Scandinavian is interesting. I doubt the data as described, but I may be wrong. But whatever the correct description, it will only make the problem harder, so I suggest for theoretical purposes we ignore it for now. Sort of like ignoring the fact that we are not ideal speaker-hearers, that we attain multiple grammars, and that learning is not instantaneous. All fine idealizations. Add the CNPC to the list.

Last point: same goes for the other 29 laws.
-- Norbert

[2014-05-21 14:53:08]
1. @Alex/Thomas
I reply here because the damn site prevented me from replying above. Excuse me.
Let me modulate my very last comment. Say that Scandinavian languages do not obey the CNPC (which, to repeat, I don't believe is true in full generality). But let's say it is. So what? It simply makes Plato's Problem harder. So, as a research strategy, here's what we do. First assume that islands have no exceptions, then see how to factor some exceptions in. Why? Well, first, it holds pretty well. Recall I have been saying that the generalizations are "roughly" accurate. Second, where it holds, it is pretty clear it cannot be holding because learners have evidence that it holds. So, consider islands the analogue of the ideal gas laws: in ideal languages, islands hold. OK, now we have a question. Why? They could not have been learned in those languages (English being a good example). Why? Because there is no relevant data in the PLD. So we must assume, etc. etc.

In sum, the facts of Scandinavian bear on how to refine the generalization, not on what we should conclude about FL regarding this generalization. Again, it's exactly like the gas laws. They held approximately, but served well enough to undergird statistical mechanics.

Now, let's return to Scandinavian. Let's say it really never conforms to the CNPC (again, I don't believe this, but hey, let's say). Could this be accommodated? Sure, there are ways of doing this even within GB. In fact, it would argue for the conclusion that islands are not primitives of UG (a conclusion we came to long ago). What's the right analysis, then? Good question. Say it is A. Now we can ask: is A innate or learned? Well, given English, it cannot be learned (recall A covers both). And off we go.

Alex's skepticism would be worth considering if factoring in the exception that is Scandinavian made the learning problem easier. But of course, if Alex is right, the problem is yet MORE difficult, not less. So, as a research strategy, it makes sense to assume that the Scandinavian facts are interesting outliers that require a somewhat more subtle description of island effects but do not change the overall lay of the land.
-- Norbert

[2014-05-21 14:15:57]
@Thomas. I didn't say that proposed learning algorithms ought necessarily to replicate the acquisition trajectory of children. I'm not sure which part of my comment you're getting that from. The question was just whether any of Alex C's work does anything to undermine POS arguments for the innateness of the ECP etc. The hurdle to overcome here is a much more basic one than replicating the trajectory of kids' acquisition of ECP effects. It's to explain how it could even in principle be possible for kids to learn the ECP on the basis of the data available to them. I'm open to the possibility that kids do quite a bit of grammar learning, but I don't yet see any work on learning algorithms which significantly reduces the force of POS arguments.
-- Alex Drummond

[2014-05-21 13:33:58]
It's not true that mainland Scandinavian fails to obey RC islands. Rather, it seems not to obey SOME RC islands. Curiously, as has been noted, the islands in the same places in English (which does obey them) are better than the cases that are also bad in Scandinavian. How to describe these is a very interesting project.
I personally believe that something along the lines of what Dave Kush has pursued is the right way to go (see his co-authored paper in the Sprouse and Hornstein volume). So, I don't buy the description.

Second, I don't buy this, in part, because if it is accurate it would simply be unlearnable. There is no way in hell that the relevant PLD is available. I also don't buy this because we find the very same contrasts in English.

But let me end: is it Alex's view that ALL the stuff I cited at the outset is false? That there have been no discovered generalizations? That it's an accident that in language after language most of the islands fall into the same group, that binding works the same way, that ECP effects are as portrayed, etc.? If that is the view, then, as I have said again and again, it is akin to global warming denial. There too there are anomalies. There too things are puzzling. There too there are skeptics. And I intend to treat skepticism in my domain rather as I believe those skeptics ought to be treated.

So let me put this bluntly again: you buy that the generalizations we have found are more or less accurate, or you cannot play the GG game. Reject these in toto and there is nothing to discuss. If Alex wants to go there, let him. But, and here I am addressing the GG community: nothing he has to say is worth worrying about, for no discussion can possibly be fruitful. It's like someone denying that there is recursion because of Pirahã or constituency because of Warlpiri.

I have repeatedly said that Alex's skepticism is extreme. Your reconstruction of his views (and his endorsement thereof) tells me that I was right (again).
-- Norbert

[2014-05-21 11:35:41]
@CA, Alex D: I never understood the sentiment that the learning algorithm should replicate the learning trajectory of children. If a psycholinguist told a syntactician that their analysis makes the wrong processing predictions, the syntactician probably wouldn't doubt the analysis but rather the assumptions the psycholinguist makes about memory usage, serial vs. parallel parsing, etc.

The same applies in the case of learnability. Alex's work provides a general learning algorithm, but there are many different ways of implementing this algorithm. If you wanted to code it up, you would have to make many decisions about data structures, control structures, etc. that are irrelevant to the general mechanism of the algorithm but will have an effect on how your program behaves.

And keep in mind that the cognitive architecture of children undergoes many changes during the acquisition process, and we have no idea about the computational implications of this -- mostly because CS only has to deal with quantitative changes while running a program (e.g. dynamically expandable clusters of computers) but not qualitative ones (switching a machine from an x86 CPU to ARM without interrupting the OS, which is simply impossible with current architectures), so nobody has studied anything like this.

In sum, the relation between a learning algorithm and its cognitive execution is very tentative; there is a huge number of variables, and we don't even know what exactly they are.

Sure, you can make assumptions, and if you get something that mirrors reality, great! But as long as we have no principled reason to deem those assumptions plausible, that's just a nice bonus. It is not an essential requirement for a learning algorithm. In particular, a learning algorithm that mirrors the learning trajectory in some experiments you run but hasn't been proven complete and sound is worse than one that deviates from the learning trajectory but is complete and sound.
-- Anonymous (profile 07629445838597321588)

[2014-05-21 11:27:10]
Crossed with Thomas -- exactly. I happened to be at the conference in honour of Elisabet Engdahl, and one of the papers was on this topic; I also had the good fortune to speak to an expert when I was giving a talk in Cambridge last month. There is a very recent paper on this (Christensen, Ken Ramshøj & Nyvad, Anne Mette. 2014. On the nature of escapable relative islands. Nordic Journal of Linguistics 37(1), 29–45.)
-- Alex Clark

[2014-05-21 11:19:28]
So, once again, I agree that
a) relative clauses in English are islands to movement
is a true descriptive generalisation.
However, I think that
b) relative clauses in mainland Scandinavian are islands to movement
is false as a descriptive generalisation, though one may be able to rescue it by some theoretical move (e.g. redefining what a relative clause is).
So I don't think that RCI is correct as a typological universal.

I gave some arguments in my earlier comment about why I am not that interested in explaining typological universals. If they are not clear, I am happy to amplify them.
-- Alex Clark

[2014-05-21 11:14:45]
Addendum: The distribution could also be due to the learner, or to some aspects of diachronic change. The important point is: if you assume that RCs are not universal islands, you should have a model that correctly predicts that they are islands in the majority of languages we know. Because treating it as a coincidence is never satisfying.
-- Anonymous (profile 07629445838597321588)

[2014-05-21 11:11:19]
@Norbert: The contentious issue for Alex, methinks, is the universality of the observed phenomena. For instance, it looks like relative clause extraction is possible in mainland Scandinavian languages. Your solution is probably to posit that the island constraint still holds and that we're not dealing with movement. Alex's solution is to say there is no RC island constraint in these languages.

The latter has some appeal in that a formalism like MGs can enforce island constraints but doesn't have to, so variation is expected. In order to make this an interesting perspective, however, one would have to show that the distribution of RC-island vs. RC-non-island across languages is related to some aspect of MGs (similar to recent attempts to relate the frequency of natural vs. unnatural morphological paradigms to feature decompositions).
-- Anonymous (profile 07629445838597321588)

[2014-05-21 10:46:46]
@Alex C:

I fear that you are becoming attracted to the dark side once again. Here's your quote:

"You may argue that an argument against this is that it fails to explain why, e.g. all languages have some other property P. It does fail to explain this fact, IF IT IS A FACT (my emph, NH). I don't think that is a problem, because there are many other possible 'explanations' for this (P is vacuous, P IS FALSE (my emph), P is an accident, P is a functional property, proto-Earth had P, etc. etc.), which explanations are obviously different depending on what P is (RELATIVE CLAUSE ISLANDS BEING FALSE (MY EMPH), binary branching being vacuous, etc.), and some of which may for some P overlap (function and common descent, for example). My claim is also that L because of its structure could explain universal properties P even if L contains languages which are not P."

Here's my problem. Some time back I thought that we agreed that the list of effects that GG has discovered is more or less accurate. Are you reneging on this? The quote above suggests you are. If you are, I am happy to go into my climate science denier routine. The rough accuracy of these discoveries I do not take to be up for grabs. If you do, let everyone know this so that we can all stop wasting our time.

Given that the effects are not/should not be controversial, what is: well, how to explain them. Why do we find that Gs respect these effects (again, I take this to be a simple descriptive fact that GG has established as more or less accurate)? One answer is that FLs are built to do so. Another answer is that they follow from principles of communication, common descent, or whatever. Fine. Show us the money. My story is that they are part of FL, and that's WHY all Gs appear to respect them. You don't like this story. Fine. Provide another. But provide one for THESE effects. You asked me to give you some and I did. The game is to explain THEM, or at least my game is. You can play whatever game you want. But what you have no right to do is pretend that you have an account of these phenomena or, even worse, pretend that they do not exist. The latter is climate science denial, for it rejects the work of the past 60 years of GG. So, can we agree that ONE project is to try to explain why THESE effects exist in G after G after G? And can we agree that concluding that it is because these IN SOME FORM are innate is ONE possible explanation of these facts? Inquiring minds want to know, Alex. Here are the two questions, and you can answer yes/no:
1. The effects discovered by GG are legit targets of explanation.
2. One possible account is that they, or something that leads to them, is innate.

Your move, Alex. Come clean.
-- Norbert

[2014-05-21 08:27:14]
@CA:
I may be misunderstanding what you mean by "learning algorithm", but this strikes me as somewhat odd:
"In fact, View B [= the learning algorithm does have interesting things to say about the substantive aspects of the observed generalisations in human language] is what unifies some of the 'really different and incompatible models' that Norbert mentions. At least some of them, P&P and UG + Bayes, try to grapple with the substantive aspects of the generalisations."

To the contrary, your usual Bayesian approach is rather agnostic about learning algorithms -- what is of interest is what the posterior looks like, as induced by the specific biases one puts into a model (through specifying prior and likelihood) together with the input. To derive this, you will have to make use of some algorithm or other (except for extremely simple models, for which you can just do the math), but the algorithm is usually not of any particular interest. The focus is really on what certain biases plus data imply, not on how a specific algorithm performs.

Similarly, I think the interesting bit about Yang's variational learner (which I assume you mean when you mention P&P?) is not the "algorithm", which is, in a sense, just error-driven learning, but his setting everything up in terms of multiple grammars.

In any case, for neither "Bayes + UG" nor Yang-style P&P is it the algorithm that has anything substantive to say about the kinds of generalizations observed in human languages.
-- benjamin.boerschinger

[2014-05-21 06:35:12]
@Alex C: "I think you intend it as a criticism, it does highlight a difference in attitude."

I actually didn't mean it as a criticism. I was just pointing out that the algorithm could in theory work for "Neanderthal language", or any hierarchically structured language.

But, as I followed up, this essentially highlights that there are at least two camps out there (nothing new here). View A: the learning algorithm has nothing particularly interesting to say about the substantive aspects of the observed generalisations in human language. View B: the learning algorithm does have interesting things to say about the substantive aspects of the observed generalisations in human language.

In fact, View B is what unifies some of the "really different and incompatible models" that Norbert mentions. At least some of them, P&P and UG + Bayes, try to grapple with the substantive aspects of the generalisations.

You appear to hold View A: "So Neanderthals, dolphins, martians etc .."

But then this is not about the actual acquisition of human language, and it is important to appreciate that, in my opinion. For example, different songbird species have different song repertoires and appear to have different learning abilities. Perhaps they are all finite-state, or even more constrained than that, and one could come up with a general learning mechanism for finite-state languages in general, or for that part of the subregular hierarchy, if the patterns are in fact more precise. But this doesn't actually address the issue at hand, which is: how/why does each of those species develop/acquire its particular songs? Sure, a general-purpose mechanism might help guide our view of the acquisition process. But it in no way (at least to me) directly addresses the issue of prior/innate knowledge, or of the specifics involved in any one species.

Could it be that a general learning algorithm is all that a human being needs (or, for that matter, different species of songbirds)? Sure, but that has to be shown. The algorithm will have to make sense with respect to the questions that Alex D raised earlier (repeat):

Does the model/story/... make sense in view of the time-course of acquisition actually observed in children?

Do children's experiences contain the relevant distributional evidence necessary at/by the time children seem to know the relevant generalizations?

And what I added: Does the generalization IGNORE the relevant patterns observable in the input?

I think I am beginning to repeat myself, so I should probably stop and let others continue the discussion. I hope you won't be offended by that, if you do respond.

Note: I ignored your comment where you replied to the charge of vagueness, because it appears you misunderstood what I meant. I was referring to your vagueness related to the substantive aspects of linguistic generalisations.
-- Confused Academic

[2014-05-21 04:07:36]
@CA: You said "The work might as well be about 'Neanderthal language'."
This is a very interesting remark; I think you intend it as a criticism, but it does highlight a difference in attitude. So: I am trying to develop a general theory about how one can learn hierarchically structured languages from strings, and I think (tentatively...) that I would expect any evolved, learned communication system that is based on hierarchically structured sequences of symbols to fall into these classes. So Neanderthals, dolphins, martians, etc. Different subclasses, of course, but there are general arguments that suggest that these distributional techniques are optimal in a certain technical sense.
That is a point of difference I think from Chomsky's view which was (I don't know if it still is) that there would be very human-specific aspects to UG. The recent evolang paper we discussed here contained none of that rhetoric though so perhaps he has changed his mind on that. Alex Clarkhttps://www.blogger.com/profile/04634767958690153584noreply@blogger.comtag:blogger.com,1999:blog-5275657281509261156.post-84740607339797372014-05-21T02:01:00.110-07:002014-05-21T02:01:00.110-07:00@Thomas: Thank you for your patience. I think it s...@Thomas: Thank you for your patience. I think it slowly emerges that we are talking about different issues when we talk about 'the language learner'. You seem to have the formal learner in mind while I am concerned about what an actual child in the real world does. Obviously the tasks faced by those 2 learners are very different. Yours indeed needs to settle on the correct grammar while for mine that really does not seem to matter [assume 10 grammars overlap completely in the range of the PLD encountered by the kid; why would it matter to her if she 'constructs' grammar 4 or 8?]. Also it makes sense for your learner to consider grammar-construction in isolation while for my learner if grammar construction is a goal at all it is one among many - most important is probably being able to communicate with parents and other kids. As long as she accomplishes that any grammar [fragment] she might settle on should be fine. <br /><br />Also note that POS is less persuasive for real world learning: as you say, kids do not achieve complete mastery of any grammar and they do receive all kind of linguistic and non-linguistic feedback - [even though that may be difficult to measure]. I notice often that non-nativists commit the following error of reasoning: when they have succeeded to model some aspect of language acquisition with a domain general model they will say: see therefore kids do not need a domain specific mechanism either. 
But this does not follow at all - kids still may have a domain-specific mechanism. Similarly, if you show that an abstract learner that depends on PLD alone could not succeed in grammar construction unless there are strong innate constraints, it does not follow that kids [who rely on a host of other information] could not succeed either. <br /><br />As for the evolutionary account: imagine you construct a 'perfect acquisition model' but it turns out such a model could not have evolved in humans. Then you'd know that humans must acquire language in a different way [unless you're a Platonist about language and do not claim it is part of human biology - then it truly does not matter]. <br /><br />I guess for the time being we can just agree that we're interested in different things when we talk about language acquisition. Thanks for the literature files, I'll try to read them soon. One last point: Postal has said repeatedly [in print] that his work is no longer in the GG framework. I think as a matter of courtesy one should accept that Postal knows best what framework Postal's work falls under.Anonymoushttps://www.blogger.com/profile/03443435257902276459noreply@blogger.comtag:blogger.com,1999:blog-5275657281509261156.post-39391068375029019852014-05-21T01:50:36.863-07:002014-05-21T01:50:36.863-07:00My dialectic was very unclear, so I am sorry for any confusion. I am trying to explain what Norbert and I take to be the fundamental problem (language acquisition) and I am definitely not trying to explain typological universals. So explanations of language acquisition should be precise and not vague.<br />So I claim that some class of languages L, for concreteness, say the class of well-nested MCFLs of dimension 2 with the finite context property, is a) learnable and b) contains the natural languages. This then locates the explanation of some universals (e.g.
mild context-sensitivity) in UG, and the explanation of other universals outside of UG. <br />You may argue that an argument against this is that it fails to explain why, e.g., all languages have some other property P. It does fail to explain this fact, if it is a fact. I don't think that is a problem, because there are many other possible "explanations" for this (P is vacuous, P is false, P is an accident, P is a functional property, proto-Earth had P, etc.), explanations which are obviously different depending on what P is (relative clause islands being false, binary branching being vacuous, etc.), and some of which may overlap for some P (function and common descent, for example). My claim is also that L, because of its structure, could explain universal properties P even if L contains languages which are not P.<br />And I gave a very simple example (this is a comment on a blog!). So I am vague about the things I am not trying to explain and precise about the things I am trying to explain. But Norbert is being vague and contradictory (in my opinion) about what he is trying to explain (language acquisition) and that is what I am "sincerely" complaining about.Alex Clarkhttps://www.blogger.com/profile/04634767958690153584noreply@blogger.comtag:blogger.com,1999:blog-5275657281509261156.post-21985290371644996882014-05-21T00:41:00.603-07:002014-05-21T00:41:00.603-07:00@Avery: the notion of "linguistically significant generalization" is quite hard to pin down. Pullum, Hurford, and others have written on this. There are obviously generalizations that one can make that shouldn't be represented in the grammar, and how to draw the distinction between those and the LSGs is a subtle question. Maybe I wouldn't say that they aren't empirical, rather that the boundary between the empirical and the non-empirical is not clear in this case.<br />E.g.
the first part of Hurford 1977 "The Significance of Linguistic Generalizations" where he says "The motto of such linguists could fairly be represented as 'Seek out all the LSG's you can and capture these in your theory.' To cite examples of this attitude would be superfluous: every linguist is familiar with the kudos of 'capturing' LSG's and the odium of 'missing' them.".<br />Alex Clarkhttps://www.blogger.com/profile/04634767958690153584noreply@blogger.comtag:blogger.com,1999:blog-5275657281509261156.post-38775815338169397482014-05-20T21:56:35.604-07:002014-05-20T21:56:35.604-07:00@Alex C: (Part II)
@Alex C: (Part II)<br /><br />c) "Thirdly there are many many different explanations for why all languages have some property P; basic functional explanations,<br />common descent, communicative efficiency etc etc. and of course just coincidence (as Piantadosi and Gibson argue)." <br /><br />Again, to me, this is not an explanation. In fact, it is not at all clear that these different sources of explanation will be mutually compatible. We know at least from the realm of phonology/phonetics that communicative efficiency, production, and perception don't always lead to compatible expectations (and in fact, all of these terms are terribly vague). So, again, as you try to argue with respect to Norbert's statement about different ways of incorporating innate knowledge ("These are all really different and incompatible models"), the ones that you mention are not necessarily in sync. In which case, just as you expect clarification from Norbert, I see it as fair to expect clarification from you as to which of these "potential factors" actually accounts for the observed substantive laws/effects…? In the absence of such a clear exposition, I am afraid you are again being as unclear as you complain others are.<br /><br />Again, it is not my point to say that to focus purely on formal universals/laws is a bad strategy. Sure, go for it. But to allege that the generative viewpoint is being vague/sloppy while you yourself are being so with (substantive) generalizations that are important to generative linguists seems a very inconsistent position to maintain.<br /><br />In fact, I can echo what Norbert (I think) has tried to communicate in the past. You can't say you are playing the game till you take both formal and substantive generalisations/laws/effects seriously.
By systematically ignoring one source of information and presenting potential sources of explanations without precision/clarity, you simply cannot claim that general learning mechanisms (with perhaps little innate information) are going to work. The work might as well be about "Neanderthal language". <br /><br />I acknowledge I have not presented an argument for strong innateness (more precisely, domain specific), but that was never my intention. And as I said earlier, I can't imagine any sensible researcher wanting to hang on to strong innateness if there are potential (general) cognitive explanations. The real question is, "Are there any that work or come close to working?"<br /><br /><br />d) "I don't think that a statement "P is innate" is an explanation of anything". <br /><br />Surely, it is not a big chunk of the explanation. But, it could be the beginning of one. Just as "barking is innate" is the beginning of any work on how dogs develop into adult barkers. Furthermore, "P is not innate" is not any more of a full explanation. What we want is a (reasonably explicit/clear) explanation of ALL the relevant facts.<br /><br /><br />e) The debate/disagreement clearly is about what the relevant facts are that need a learning-model based explanation. And here a good test bed is the kind of stuff that Alex D mentioned earlier: Does the model/story/... make sense in view of the time-course of acquisition actually observed in children?<br /><br />I think a potentially useful exercise given the disagreements on this blog would be to try and clarify (precisely) what would count as a good argument against one's own position. 
Now, it might be futile to try this for general viewpoints/programmes, but it might be worth it to ask such a question about specific aspects.Confused Academichttps://www.blogger.com/profile/12520484961244574460noreply@blogger.comtag:blogger.com,1999:blog-5275657281509261156.post-34957250440353975212014-05-20T21:56:05.129-07:002014-05-20T21:56:05.129-07:00@Alex C: (Part I)
@Alex C: (Part I)<br /><br />a) "I really don't think the "You can learn too much" argument is any good". <br /><br />I can recognize that this might not be an important aspect of linguistic data to you, but the surfeit of data is actually a problem. There are a LOT of potential generalizations in the data that are simply not made. One has to think about why that is the case. What separates the ones that are made from the ones that aren't? By just saying that such arguments are simply not as convincing, we are potentially ignoring a good source of information about the set of generalizations that we need a theory for, especially if only some kinds of generalizations are made, while others are systematically ignored. Of course, you can always say that is not interesting/useful/convincing, but I don't find that attitude particularly useful in understanding language.<br /><br /><br />b) "The classes of languages that they learn are not closed under union and so there are languages where L is learnable but L union the reversal of L is not learnable. So that does potentially provide an answer for why we don't see questions being formed by reversal."<br /><br />Note, this is not very different from the hand-waving you complain that Norbert indulges in with respect to acquisition models. Could the facts that you mention above account for such reversals? Perhaps, but equally well, they might not. For example, a learner that can learn {a^n b^n | n > 0} could also learn {b^n a^n | n > 0}, where "a" =/= "b". Here, the relationship is formally identical, but substantively different.<br /><br />So, it is crucial that one show that the relevant reversals are unlearnable for the proposed (general) learning mechanisms, if you believe that the learning mechanism is the locus of the explanation for it (your answer suggests that you do think this is true for some generalizations, at least).
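The point that {a^n b^n} and its reversal are formally identical can be made concrete: reversing the right-hand side of every production in a context-free grammar yields a grammar for the reversed language, so any hypothesis space closed under this operation treats the two languages symmetrically. A minimal sketch (my own illustration; the rule format and function names are assumptions, not anyone's actual system):

```python
def reverse_grammar(rules):
    """Given CFG rules as {nonterminal: [RHS tuples]}, return the grammar
    of the reversed language: every right-hand side is reversed.
    """
    return {lhs: [tuple(reversed(rhs)) for rhs in rhss]
            for lhs, rhss in rules.items()}

def generate(rules, symbol, depth):
    """All strings derivable from `symbol` in at most `depth` expansion layers.

    Symbols without rules are treated as terminals.
    """
    if symbol not in rules:
        return {symbol}
    if depth == 0:
        return set()
    out = set()
    for rhs in rules[symbol]:
        parts = [generate(rules, s, depth - 1) for s in rhs]
        combos = {""}
        for p in parts:
            combos = {c + x for c in combos for x in p}
        out |= combos
    return out
```

Running `generate` on a grammar for {a^n b^n} (with rules S → ab | aSb) and on its reversal produces mirror-image string sets, which is the sense in which a purely formal learner has no reason to prefer one language over the other.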
Furthermore, what you mention is at least as vague as the things you have been complaining about. Of course, I respect that you have a difference of opinion, or that you value a different set of facts as crucial to understand, but to not acknowledge your own vagueness while complaining about someone else's is not terribly sincere.Confused Academichttps://www.blogger.com/profile/12520484961244574460noreply@blogger.comtag:blogger.com,1999:blog-5275657281509261156.post-45799384649511019202014-05-20T17:06:51.695-07:002014-05-20T17:06:51.695-07:00One aspect of Alex's views that I don't understand is why he thinks that capturing linguistically significant generalizations isn't empirical - what is true about them, like all the other sources of evidence, is that they don't uniquely determine grammars. There does appear to me to be a kind of 'justificatory discontinuity', whereby it is relatively easy to motivate basic ideas about constituency, phrase types and features, and a few other points such as wh-gaps, but then, in order to build up a system that can actually relate sound to meaning in the manner envisioned in the late 1960s and maintained as the main goal ever since, a lot of stuff seems to have to be made up.<br /><br />Originally, transformations seemed to address this problem by letting us work back from overt to covert structures one small step at a time, but they had apparently insuperable problems, and people started making up different kinds of stuff to deal with that, none of these inventions being sufficiently well justified for almost everybody to accept them. But this problem is presumably one of too much ignorance, not one specific to capturing generalizations, which in fact allow quite a lot of pruning of alternatives, excluding, for example, versions of PSG without something like features (since the internal structure of NPs with different feature combinations is (almost?) always almost (completely?)
identical.<br />AveryAndrewshttps://www.blogger.com/profile/17701162517596420514noreply@blogger.comtag:blogger.com,1999:blog-5275657281509261156.post-16467647956287807922014-05-20T12:18:37.717-07:002014-05-20T12:18:37.717-07:00@Christina:<i>I am not convinced by the argument that having innately specified limits on the classes of possible grammars is required/helpful.</i><br />The learner needs to know the target class of languages. Even seemingly minor points of uncertainty can bring down learning algorithms. For instance, the class of strictly 2-local languages is learnable in the Gold paradigm, and so is the class of strictly 3-local languages, but the union of those two classes is not. Even in your scenario where there is only one language, the learner needs to know the target class, because there are infinitely many languages that it could generalize to from a finite amount of input. And since a learner learns a language by constructing a grammar for it, the prior of the learner amounts to a restriction on the class of grammars.<br /><br /><i>But what evolutionary story do you have to convince me [or others] that the generativist framework ought to be preferred?</i><br />I'm not trying to convince you of anything except that the argument you presented above does not show that strong nativism is evolutionarily implausible. That being said, if you're asking why generative frameworks should be preferred to other alternatives for the analysis of language, why would an evolutionary argument be necessary?
The frameworks tackling the kind of phenomena Norbert listed above are all generativist (GB, Minimalism, TAG, CCG, LFG, HPSG, GPSG, Postal's work, Jackendoff & Culicover, Arc-Pair grammar), there is no other game in town.<br /><br /><i>Thanks for the literature reference, do you have by chance anything in pdf form one can access when away from university libraries?</i><br />Here's an <a href="ftp://ftp.cis.upenn.edu/pub/ircs/tr/95-14.ps.Z" rel="nofollow">overview paper</a>. You'll have to unzip it and convert it from ps to pdf.<br /><br /><i>Lets assume on my story only general domain components A,B,C are genetically encoded</i><br />This strikes me as incompatible with the assumption in your thought experiment that a specific grammar is innate, i.e. genetically encoded. At any rate, if you can encode a grammar, you can also encode restrictions on the class of grammars. The former requires more than the latter since the latter can be done by giving what amounts to a partial definition of a grammar, e.g. in Roger's L2KP (see the linked paper).<br /><br /><i>You say "learning is imperfect, wherefore even a single language as input will give rise to multiple grammars in the population. Hence there is no convergence towards a single grammar". This seems compatible with very weak nativist views, what exactly is the contribution of strong nativism then?</i><br />Yes, of course it is. The point was not to support strong nativism but to argue against your argument against strong nativism.<br /><br /><i>I am not aware of anyone making such a claim, can you please provide an example from the literature?</i><br />This has come up in personal discussions with biologist friends of mine, who do not work on language evolution, so their ideas about language are often very "unmentalistic". My point was not that this is a common fallacy that utterly discredits biologists, I just thought it would be a good example for why encoding matters. 
But I already said above that it probably isn't all that great an example.<br /><br /><i>On Chomsky's view I know a grammar when my innate language organ is in a certain state S. Do you share this view?</i><br />I'm agnostic. Formally, knowing a grammar means that the learning algorithm has converged to a grammar that generates all and only the strings of the language from which the input strings were sampled. Realistically, this exact generation requirement is too strong, but for everything I care about the formal abstraction is good enough.Anonymoushttps://www.blogger.com/profile/07629445838597321588noreply@blogger.com
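The "all and only the strings" convergence criterion, and the strictly local example from earlier in this exchange, can be made concrete with a short sketch (my own toy implementation of the standard string-extension learner, not code from any of the discussants): the learner's grammar is just the set of k-factors it has observed, and its conjectured language is every string built entirely from those factors. With k fixed in advance, this identifies any strictly k-local language in the limit from positive data; the earlier point about the union of the 2-local and 3-local classes is that a learner which must hedge between values of k can be forced to revise its conjecture forever on some presentations.

```python
class SLkLearner:
    """String-extension learner for strictly k-local languages.

    The learner's grammar is the set of k-factors (substrings of length k,
    with "#" padding at word edges) observed so far; the conjectured
    language is every string all of whose k-factors occur in that set.
    This is only a sketch of the textbook construction.
    """
    def __init__(self, k):
        self.k = k
        self.factors = set()

    @staticmethod
    def k_factors(word, k):
        padded = "#" * (k - 1) + word + "#" * (k - 1)
        return {padded[i:i + k] for i in range(len(padded) - k + 1)}

    def observe(self, word):
        # Learning step: add the new word's k-factors to the grammar.
        self.factors |= self.k_factors(word, self.k)

    def accepts(self, word):
        # Membership in the conjectured language: subset test on factors.
        return self.k_factors(word, self.k) <= self.factors
```

For k = 2, after observing "ab" and "aab" the learner's grammar is {#a, aa, ab, b#}, so it generalizes to all of a+b+-style strings like "aaab" while still rejecting "ba", whose factors include unseen #b.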