Tuesday, February 11, 2014

Plato, Darwin, P&P and variation

Alex C (in the comment section here (Feb. 1)) makes a point that I’ve encountered before that I would like to comment on. He notes that Chomsky has stopped worrying about Plato’s Problem (PP) (as has much of “theoretical” linguistics as I noted in the previous post) and suggests (maybe this is too much to attribute to him, if so, sorry Alex) that this is due to Darwin’s Problems (DP) occupying center stage at present. I don’t want to argue with this factual claim, for I believe that there’s lots of truth to it (though IMO, as readers of the last several posts have no doubt gathered, theory of any kind is largely absent from current research). What I want to observe is that (1) there is a tension between PP and DP and (2) that resolving it opens an important place for theoretical speculation. IMO, one of the more interesting facets of current theoretical work is that it proposes a way of resolving this tension in an empirically interesting way. This is what I want to talk about.

First the tension: PP is the observation that the PLD the child uses in developing its G is impoverished in various ways when one compares it to the properties of Gs that children attain. PP, then, is another name for the Poverty of Stimulus Problem (POS).  Generative Grammarians have proposed to “solve” this problem by packing FL with principles of UG, many of which are very language specific (LS), at least if GB is taken as a guide to the content of FL.  By LS, I mean that the principles advert to very linguisticky objects (e.g. Subjects, tensed clauses, governors, case assigners, barriers, islands, c-command, etc.) and very linguisticky operations (agreement, movement, binding, case assignment, etc.).  The idea has been that making UG rich enough and endowing it with LS innate structure will allow our theories of FL to attain explanatory adequacy, i.e. to explain how, say, Gs come to obey islands despite the absence from the PLD of the good and bad data relevant to fixing them. 

By now, all of this is pretty standard stuff (which is not to say that everyone buys into the scheme (Alex?)), and, for the most part, I am a big fan of POS arguments of this kind and their attendant conclusions. However, even given this, the theoretical problem that PP poses has hardly been solved. What we do have (again assuming that the POS arguments are well founded (which I do believe)) is a list of (plausibly) invariant(ish) properties of Gs and an explanation for why these can emerge in Gs in the absence of the relevant data in the PLD required to fix them. Thus, why do movement rules in a given G resist extraction from islands? Because something like the Subjacency/Barriers theory is part of every Language Acquisition Device’s (LAD) FL, that’s why.

However, even given this, what we still don’t have is an adequate account of how the variant properties of Gs emerge when planted in a particular PLD environment. Why is there V to T in French but not in English? Why do we have inverse control in Tsez but not Polish? Why wh-in-situ in Chinese but multiple wh to C in Bulgarian? The answer GB provided (and, so far as I can tell, still the answer) is that FL contains parameters that can be set in different ways on the basis of PLD and the various Gs we have are the result of differential parameter setting. This is the story, but we have known for quite a while that this is less a solution to the question of how Gs emerge in all their variety than it is an explanation schema for a solution. P&P models, in other words, are not so much well worked out theories as they are part of a general recipe for a theory that, were we able to cook it, would produce just the kind of FL that could provide a satisfying answer to the question of how Gs can vary so much. Moreover, as many have observed (Dresher and Janet Fodor are two notable examples, see below) there are serious problems with successfully fleshing out a P&P model.

Here are two: (i) the hope that many variant properties of Gs would hinge on fixing a small number of parameters seems increasingly empirically uncertain. Cedric Boeckx and Fritz Newmeyer have been arguing this for a while, and while their claims are debated (and by very intelligent people, so, at least for a non-expert like me, the dust is still too unsettled to reach firm conclusions), it seems pretty clear that the empirical merits of earlier proposed parameterizations are less obvious than we took them to be. Indeed, there appears to be some skepticism about whether there are any macro-parameters (in Baker’s sense[1]) and many of the micro-parametric proposals seem to end up restating what we observe in the data: that languages can differ. What made early macro-parameter theories interesting was the idea that differences among Gs come in largish clumps. The relation between a given parameter setting and the attested surface differences was understood as one to many. If, however, it turns out that every parameter correlates with just a single difference then the value of a parametric approach becomes quite unclear, at least so far as acquisition considerations are concerned. Why? Because it implies that surface differences are just due to differing PLD, not to the different options inherent in the structure of FL. In other words, if we end up with one parameter per surface difference then variation among Gs will not be as much of a window into the structure of FL as we thought it could be.

Here’s another problem: (ii) the likely parameters are not independent. Dresher (and friends) has demonstrated this for stress systems and Fodor (and friends) has provided analogous results for syntax.  The problem with a theory where parameters are not independent is that it makes it very hard to see how acquisition could be incremental. If it turns out that the value of any parameter is conditional on the value of every other parameter (or very many others), then it would seem that we are stuck with a model in which all parameters must be set at once (i.e. instantaneous learning). This is not good! To evade this problem, we need some way of imposing independence on the parameters so that they can be set piecemeal without fear of having to re-set them later on. Both Dresher and Fodor have proposed ways of solving this independence problem (both elaborate a richer learning theory for parameter values to accommodate it). But, I think that it is fair to say that we are still a long way from a working solution. Moreover, the solutions provided all involve greatly enriching FL in a very LS way. This is where PP runs into DP. So let’s return to the aforementioned tension between PP and DP.
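Before returning to it, here is a deliberately toy sketch of the independence problem, invented for illustration rather than drawn from Dresher's or Fodor's actual proposals. It sets up two binary parameters whose surface effects are entangled, so that no single datum bears on the first parameter until the second has been set; a learner that fixes parameters one at a time and never revisits a decision gets stuck, while a learner that evaluates joint settings (the instantaneous idealization) finds the target.

```python
from itertools import product

# Toy illustration (hypothetical parameters and patterns, for exposition only):
# parameter p1 has a visible surface effect only when p2 is "on", so no
# datum can decide p1 until p2 has already been set.

def patterns(p1, p2):
    """Surface word-order patterns generated by a given (p1, p2) setting."""
    if not p2:
        return {"SV"}                        # p1 is surface-inert here
    return {"SV", "OVS"} if p1 else {"SV", "SVO"}

PLD = ["SV", "OVS"]                          # target grammar: p1=1, p2=1

def coverage(setting, data):
    """How many PLD tokens the grammar at `setting` can generate."""
    return sum(d in patterns(*setting) for d in data)

def greedy_learner(data, defaults=(0, 0)):
    """Set parameters one at a time, in a fixed order, never revisiting."""
    setting = list(defaults)
    for i in range(len(setting)):
        for v in (0, 1):
            trial = setting[:i] + [v] + setting[i + 1:]
            if coverage(trial, data) > coverage(setting, data):
                setting[i] = v
    return tuple(setting)

def instantaneous_learner(data):
    """Evaluate every joint setting at once and keep the best."""
    return max(product((0, 1), repeat=2), key=lambda s: coverage(s, data))

print(greedy_learner(PLD))          # (0, 0): every single-parameter step is a tie
print(instantaneous_learner(PLD))   # (1, 1): only joint evaluation finds the target
```

Dresher's cue-based learner and Fodor's structural-triggers approach can be read as ways of restoring enough independence, or at least an ordering, for the one-parameter-at-a-time strategy to work; the sketch only shows why something extra of that sort is needed.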

One way to solve PP is to enrich FL. The problem is that the richer and more linguistically parochial FL is, the harder it becomes to understand how it might have evolved. In other words, our standard GB tack in solving PP (LS enrichment of FL) appears to make answering DP harder. Note I say ‘appears.’ There are really two problems, and they are not equally acute. Let me explain.

As noted above, we have two things that a rich FL has been used to explain: (a) invariances characteristic of all Gs and (b) the attested variation among Gs. In a P&P model, the first ‘P’ handles (a) and the second (b). I believe that we have seen glimmers of how to resolve the tension between PP’s demands on FL and DP’s as regards the principles part of P&P. Where things have become far more obscure (and even this might be too kind) involves the second, parametric P. Here’s what I mean.

As I’ve argued in the past, one important minimalist project has been to do for the principles of GB what Chomsky did for islands and movement via the theory of subjacency in On Wh Movement (OWM). What Chomsky did in this paper was theoretically unify the disparate island effects by proposing that all non-local (A’) dependency constructions share a common movement core (viz. Move WH) subject to the locality restrictions characterized by Bounding Theory (BT). This was a terrifically inventive theory and, aside from rationalizing/unifying Ross’s very disparate Island Effects, the combination of Move WH + BT predicted that all long movement would have to be successive cyclic (and even predicted a few more islands, e.g. subject islands and Wh-islands).[2]

But to get back to PP and DP, one way of regarding MP work over the last 20 years is as an attempt to do for GB modules what Chomsky did for Ross’s Islands. I’ve suggested this many times before, but what I want to emphasize here is that this MP project is perfectly in harmony with the PP observation that we want to explain many of the invariances witnessed across Gs in terms of an innately structured FL. Here there is no real tension, if this kind of unification can be realized. Why not? Because if successful we retain the GB generalizations. Just as Move WH + BT retains Ross’s generalizations, a successful unification within MP will retain GB’s (more or less) and so we can continue to tell the very same story about why Gs display the invariances attested as we did before. Thus, wrt this POS problem, there is a way to harmonize DP concerns with PP concerns. Of course, this does not mean that we will successfully manage to unify the GB modules in a Move WH + BT way, but we understand what a successful solution would look like and, IMO, we have every reason to be hopeful, though this is not the place to defend this view.

So, the principles part of P&P is, we might say, DP compatible (little joke here for the cognoscenti). The problem lies with the second P. FL on GB was understood not only to provide the principles of invariance but also to specify all the possible ways that Gs could differ. The parameters in GB were part of FL! And it is hard to see how to square this with DP given the terrific linguistic specificity of these parameters. The MP conceit has been to try and understand what Gs do in terms of one (perhaps)[3] linguistically specific operation (Merge) interacting with many general cognitive/computational operations/principles.  In other words, the aim has been to reduce the parochialism of the GB version of FL. The problem with the GB conception of parameters is that it is hard to see how to recast them in similarly general terms. All the parameters exploit notions that seem very very linguo-centric. This is especially true of micro-parameters, but it is even true of macro ones. So, theoretically, parameters present a real problem for DP, and this is why the problems alluded to earlier have been taken by some (e.g. me) to suggest that maybe FL has little to say about G-variation. Moreover, it might explain why it is that, with DP becoming prominent, some of the interest in PP has seemed to wane. It is due to a dawning realization that maybe the structure of FL (our theory of UG) has little to say directly about grammatical variation and typology. Taken together, PP and DP can usefully constrain our theories of FL, but mainly in licensing certain inferences about what kinds of invariances we will likely discover (indeed have discovered). However, when it comes to understanding variation, if parameters cannot be bleached of their LSity (and right now, this looks to me like a very rough road), they will never be made to fit with the leading ideas of MP, which are in turn driven by DP. 

So, Alex C was onto something important IMO. Linguists tend to believe that understanding variation is key to understanding FL. This is taken as virtually an article of faith. However, I am no longer so sure that this is a well founded presumption. DP provides us with some reasons to doubt that the range of variation reflects intrinsic properties of FL. If that is correct, then variation per se may be of little interest for those interested in limning the basic architecture of FL. Studying various Gs will, of course, remain a useful tool for getting the details of the invariant principles and operations right. But, unlike earlier GB P&P models, there is at least an argument to be made (and one that I personally find compelling) that the range of G-variation has nothing whatsoever to do with the structure of FL and so will shed no light on two of the fundamental questions in Generative Grammar: what’s the structure of FL and why?[4]





[1] Though Baker, a really smart guy, thinks that there are, so please don’t take me as endorsing the view that there aren’t any. I just don’t know. This is just my impression from linguist-in-the-street interviews.
[2] The confirmation of this prediction was one of the great successes of generative grammar and the papers by, e.g. Kayne and Pollock, McCloskey, Chung, Torrego, and many others are still worth reading and re-reading. It is worth noting that the Move WH + BT story was largely driven by theoretical considerations, as Chomsky makes clear in OWM. The gratifying part is that the theory proved to be so empirically fecund.
[3] Note the ‘perhaps.’ If even Merge is, in current parlance, “third factor,” then there is nothing taken to be linguistically special about FL.
[4] Note that this leaves quite a bit of room for “learning” theory. For if the range of variation is not built into FL, then why we see the variation we do must be due to how we acquire Gs given FL/UG.  The latter will still be important (indeed critical) in that any learning theory will have to incorporate the isolated invariances. However, a large part of the range of variation will fall outside the purview of FL. I discuss this somewhat in the last chapter of A Theory of Syntax for any of you with a prurient interest in such matters. See, in particular, the suggestion that we drop the switch analogy in favor of a more geometrical one.

27 comments:

  1. I agree with nearly all of this, up to and including the pessimistic conclusion about whether the "philological" side of linguistics is really going to shed light on FL.

    But can you study invariance without studying variation? It seems like they are two sides of the same coin. If one says that the class of grammars is some set G, is that specifying the invariant properties or the variant properties?

    It seems like the difference is more one between an "inside out" approach, where you start from some (hypothesized) grammars for natural languages and then try to come up with some G that includes them, and an "outside in" approach, where you start from some very large class like CFGs, or MCFGs or MGs, and then try to shrink it in some way.

    1. Looking at various Gs will be useful in refining the principles. This is indeed one role that all the cross-ling work in the 80s and 90s played. What it won't do, if the conclusion is correct, is DIRECTLY shed light on the nature of FL, as it would have done if parameters were actually specified by FL. This changes the game quite a bit, I believe. IMO, if one looks back at the cross grammar research, it looks like the basic principles articulated in LGB, say, were more or less proven to be pretty correct, though incomplete. There was not much said, for example, about long distance anaphors, and there was some discussion about whether islands really varied from G to G (is Italian really different from English, or are specific WHs extractable in the same contexts in both languages?) and there was some good discussion about how some apparent surface differences did not indicate that one needed to really parameterize them (e.g. fixed subject effects absent in Italian, or differences in case marking allowing for hyper-raising in one language but not another). But, from where I sit, the real success was the identification of these pretty invariant principles which popped up in more or less the same way across Gs. So, this part worked. What did not work was the identification of robust parameters, which cross-ling variation was intended to identify as well. Maybe this is being overly pessimistic, however, for, as I noted, I am not an expert in these areas. So, sure, variational study can help, but not in the way originally envisaged, I think.

      I'm not sure I agree with the two approaches you outlined. Given where we are now, I think that we have a pretty good idea as to what a reasonable chunk of UG looks like. The problem now is to understand why it has these properties. And even in this we have gained some non-trivial understanding due to MP (IMO, of course). Still on the agenda is to figure out those parts that still stick out (mainly ECP and subjacency) in a theoretically unruly manner. Or at least that's what's on my agenda.

    2. I agree with Alex's point that variance and invariance are two sides of the same coin, so I'm not sure it makes sense to say that GB had these two separate components, principles and parameters, and MP has a good shot at explaining the former but not the latter. If you lay out a set of parameters then you're implicitly defining a set of invariant principles, and vice-versa. But one way of re-stating Norbert's point is just which side of that vice-versa we focus on. In GB days a common metaphor was the switchboard, so what was explicitly defined were the parameters, and the invariant principles were, in a certain sense, implicitly stated by assuming that they were there inside the wiring of the box without having any switches that provided access to them and turned them on or off. The "reverse" metaphor which Norbert mentioned to me many years ago and which I like a lot (and which I was expecting to make an appearance near the end of this blog post), is to replace the switchboard with something like a straight edge and compass, where the focus is on the invariant toolkit. You could of course come up with "parameters" describing the array of ways in which a straight edge and compass can be used, so they're really just two sides of the same coin as Alex said, but it just might not be the most insightful way to think about the range of producible constructions. The new metaphor doesn't really get rid of parameters, any more than the switchboard got rid of invariant principles, but flipping the arrangement of which is defined in terms of the other might be useful.

    3. They are indeed related, but the shift away from parameters has an implication for variational studies that I was pointing to and that the metaphor you noted also suggests, viz. that there is no bound on the range of variation among Gs. The idea in the GB era, one that Chomsky emphasized a lot, was that the possible range of variation in a parameters framework was finite. Given a fixed number of two-valued parameters this seems indeed correct. And given this picture, one aim of studying various grammars was to fix these finite parameters. One implication of MP seems to me to be that this is the wrong way to think about variation. There is no fixed limit. There are fixed principles but there is no fixed limit. This is a partial return to the Aspects model, in which there was an evaluation measure which ordered grammars wrt some simplicity measure. But this did not put a hard bound on possible variation. The same with the compass and ruler analogy. There is no limit on the different kinds of figures one can draw with these tools, even if there are many that you cannot. So, is the suggestion to get rid of parameters? Yes. Does this mean no variation? No. Is there only a finite range of grammatical options? Yes on the parameters view, no on the non-parameter view. However, what this means is that we need some way of "guiding" variation, the analogue of the evaluation metric. That's what I'm thinking about now, with only partial success.

    4. Excellent, because I think it would be catastrophic for the reputation of the field to abandon PP at this point; this was supposed to be what generative grammar was about for more than fifty years and suddenly poof, it's gone??

      More details about your perception of the difficulty with the evaluation metric would be welcome, since it's a very old idea, which unfortunately got abandoned by most people in the field at about the time that Stan Peters produced a relatively easy to understand account of it in his 1972 paper on the Projection Problem (PP for both "Plato's Problem" and the "Projection Problem" works for me).

    5. The idea in the GB era, one that Chomsky emphasized a lot, was that the possible range of variation in a parameters framework was finite.

      Fair enough, this is a substantive difference that my previous comment overlooked. (Not that there's anything incoherent about parametrizing an infinite space, but point taken, this is not the way the term "parameter" was used.) And the upshot of this for variation is perhaps that when variation is assumed to be finite, there's some sense or appeal to the idea of looking around to find "all the variation", at which point we'll have a full description of something; but when variation is not finite there is no such end point, even in principle.

    6. The main problem I have with the current views of the Evaluation Metric (EM) is that I don't know what it is. The one in Aspects does seem to be intractable, as Chomsky has noted. We never had a way of globally evaluating grammars and this is why Chomsky thought that the problem became at least conceptually tractable by moving to a finite number of binary parameters. In this I think he was right. Sadly, I do not think that this method has panned out well. So what do we put in its place?

      Here's what I have been thinking. WARNING: this is somewhat half-baked. OK, the input to the LAD is two kinds of info: a thematic+UG grammar structure (I'll illustrate in a minute) and a surface form, roughly a string of words. So for something like (1) we have, GIVEN UG+UTAH, (2):
      (1) John hugged Mary
      (2) [TP John [T hug+past [vP Mary [vP John v [VP hug Mary]]]]]
      You can fiddle with the labels here, but the idea is that we assume that case theory etc. requires relations to case positions, UTAH tells us where the DPs start, and morphosyntax tells us that V raises to T. Again we can fiddle, but these are roughly UG driven. The big GIVEN is that UTAH gives one theta structure and this is "visible", plus the assumption that for A to enter into a grammatical relation with B, A and B have to merge. At any rate, given (2) and the linear order of words in (1), what the learner needs to do is decide which copies in (2) to retain and which to delete to match the word order. In this sentence the problem is pretty trivial. It gets more interesting when we have binding, control etc., but here I assume movement accounts of these so the problem does not really change much (there is the problem of learning the right morphosyntax, but I will put this aside). Where things become hairier, and where I think we need more, is when adverbs, negation, etc. start coming in. Here we have lots of word order possibilities that the simple structures above don't help you with. So here's the idea (very half (quarter, eighth) baked): to facilitate the mapping from (2) to (1) one can throw in functional nodes to hang more structure on, but this comes at a cost. Think Bayes, Chinese restaurants and Indian buffets here. So, what we need is a procedure for complexifying the basic structures like (2) by addition of more functional junk, but this complexifying is costly so that you don't throw in anything anywhere all the time. The problem is that this is where I run out of ideas, in fact run out of technical competence.

      Hope this vague stuff helps. What I think we need, in sum, is not parameters, but a procedure that starts with what we are all natively given and then a rule for adding functional heads as needed to match the surface form by deletion of copies. That's it.
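      A toy rendering of what this might look like (nothing here goes beyond the general shape of the idea above; the flattened chain/slot representation, the matching test and the per-head cost are all invented for illustration): the learner is handed a UG/UTAH-style structure with its copies, picks one copy per chain to pronounce so that the pronounced terminals line up with the observed string, and pays a penalty for every extra functional position it has to posit to absorb material (adverbs, negation) that the base structure does not provide for.

```python
from itertools import product

# Hypothetical flattening of structure (2): (chain id, phonological form),
# left to right; (None, None) is the silent v head.
STRUCTURE = [("J", "John"), ("H", "hugged"), ("M", "Mary"),
             ("J", "John"), (None, None), ("H", "hugged"), ("M", "Mary")]

def pronunciations(structure):
    """Yield every way of pronouncing exactly one copy per chain."""
    chains = {}
    for i, (chain, _form) in enumerate(structure):
        if chain is not None:
            chains.setdefault(chain, []).append(i)
    for choice in product(*chains.values()):
        keep = set(choice)
        yield [form for i, (_c, form) in enumerate(structure) if i in keep]

def analyses(structure, string, head_cost=1.0, max_extra_heads=2):
    """Score (copy choice, extra functional heads) analyses against a string."""
    for n_extra in range(max_extra_heads + 1):
        for spoken in pronunciations(structure):
            # Crude stand-in for "extra heads host extra words": up to
            # n_extra surface words may be left unaccounted for, at a cost.
            unmatched = [w for w in string if w not in spoken]
            in_order = [w for w in string if w in spoken] == spoken
            if in_order and len(unmatched) <= n_extra:
                yield (head_cost * n_extra, spoken, unmatched)

print(min(analyses(STRUCTURE, ["John", "hugged", "Mary"])))
# cost 0.0: some copy choice already linearizes correctly, no extra heads

print(min(analyses(STRUCTURE, ["John", "never", "hugged", "Mary"])))
# cost 1.0: "never" forces one extra functional position
```

      The missing piece, of course, is the prior over added heads (the Chinese-restaurant/Indian-buffet flavored part) and a principled story about where the added heads may attach; the sketch only shows the shape of the search.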

    7. By 'intractable' do you mean computationally intractable in that there is no effective procedure for finding the best grammar (as known from the start), or just a hopeless mess that nobody could work with at all? (just checking)

      The idea of grammar growth by adding FPs appeals to me because although the number of ultimately attainable grammars would be infinite, it would be finite at any given age if there is a maximal rate at which new FPs could be added.

    8. The term Chimsky uses is "not feasible." However, he pointed out that he had no idea how to actually compute an overall simplicity measure over grammars of the Aspects variety and I certainly have no idea how to do this. Once available such an EM would not be hard to use as it would define a simplicity ordering over Gs. The problem is finding the damn thing, or something that looks plausible. So I guess I meant number 2.

    9. 'Chomsky' not 'Chimsky' though I don't believe the latter had any ideas concerning EM either.

    10. This comment has been removed by the author.

    11. So, it appears to me that in syntax hardly anyone really tried just counting the symbols and seeing what happens (I did a little bit in my thesis, but not for whole grammars); there is forex 0 literature on the most obvious pathologies associated with simple PS rules, and people would have had additional problems due to the fact that nobody then knew how to write working fragments that connected sound to meaning with a fixed notation (Montague Grammar had fragments, but written in freestyle math prose rather than a linguistic notation, & TG was of course hopeless).

      So perhaps a second attempt with the much better tools and general understanding that exist now might work out better. Your idea would be that EM=the number of extra FPs needed beyond the starter set provided by UG (assuming that there will be a finite number of parameters associated with each FP, all settings equally valued).

      The Bayesian literature seems to show some fluctuation and arbitrariness about how to calculate the Prior of a grammar, but not to the point, it seems to me, that people couldn't just try stuff and see what sort of works and what really doesn't.

    12. The problem Norbert described sounds extremely similar to the one discussed formally in Stabler (1998) and Kobele et al (2002). (Instead of copies these use multidominance.) The deficiencies in these works are (at least) three.
      1. getting the structures is magic
      2. we needed to be able to `see' silent heads
      3. we need a bound on the amount of lexical ambiguity
      On the other hand, no assumptions were made about UTAH, antecedently given categories, etc.

    13. @Avery: there is quite a history of using the length of the grammar as an evaluation metric, from Solomonoff through the family of MDL and MML approaches up to modern Bayesian approaches by people like Amy Perfors, Phil Blunsom, Mark Johnson and so on. The short and over-simplified version (and this is my take on it, possibly not the general view) is that they work quite well if you have trees as input but don't work if you only have flat strings as input.

      But of course, you have to have a formalized grammar to do this, otherwise you don't have the object whose description you want to count, and you have to have a parser, or you can't check that you actually are generating the examples in the input.
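      For anyone who wants the bare bones of this, here is a minimal two-part description-length comparison. The toy grammars, the symbol-counting scheme and the uniform rule probabilities are illustrative assumptions, not a reconstruction of Solomonoff, MDL/MML or any of the Bayesian models mentioned above. The score is (bits to write the grammar down) plus (bits to encode the corpus given the grammar), and the data term is computed by a brute-force inside calculation, which is exactly the place where, as noted, you cannot do without a parser.

```python
import math

# Two toy grammars for the same tiny corpus: one memorizes whole sentences,
# the other factors them into NP/VP structure. Both grammars, the corpus and
# the coding scheme are made up for illustration.
CORPUS = [f"the {n} {v}".split()
          for n in ("dog", "cat", "bird") for v in ("barks", "sleeps", "eats")]

MEMORIZE = {"S": [list(s) for s in CORPUS]}          # one rule per sentence
GENERAL = {"S": [["NP", "VP"]],
           "NP": [["the", "N"]],
           "N": [["dog"], ["cat"], ["bird"]],
           "VP": [["barks"], ["sleeps"], ["eats"]]}

def grammar_bits(g):
    """Bits to write the grammar down: one symbol per LHS and per RHS slot."""
    inventory = set(g) | {sym for rhss in g.values() for rhs in rhss for sym in rhs}
    n_symbols = sum(1 + len(rhs) for rhss in g.values() for rhs in rhss)
    return n_symbols * math.log2(len(inventory))

def inside(g, sym, sent, i, j):
    """Probability that sym derives sent[i:j], with uniform rule choice."""
    if sym not in g:                                  # terminal symbol
        return 1.0 if j == i + 1 and sent[i] == sym else 0.0
    return sum(seq_inside(g, rhs, sent, i, j) for rhs in g[sym]) / len(g[sym])

def seq_inside(g, seq, sent, i, j):
    """Probability that the symbol sequence seq derives sent[i:j]."""
    if not seq:
        return 1.0 if i == j else 0.0
    return sum(inside(g, seq[0], sent, i, k) * seq_inside(g, seq[1:], sent, k, j)
               for k in range(i + 1, j + 1))          # no empty categories here

def data_bits(g, corpus):
    return sum(-math.log2(inside(g, "S", s, 0, len(s))) for s in corpus)

for name, g in (("memorize", MEMORIZE), ("generalize", GENERAL)):
    print(name, round(grammar_bits(g), 1), "+", round(data_bits(g, CORPUS), 1))
```

      On this corpus the data terms come out equal and the generalizing grammar wins purely on grammar size; shrink the corpus and memorization starts to pay, which is the familiar MDL trade-off.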

    14. I know. But afaik it has not been conducted with plausible assumptions about the input (including some kind of theta role information along with the strings, as suggested by Norbert), nor with an exploratory as opposed to dogmatic attitude as to what the structural formats and rule notations might be.

    15. @Greg. The role of UTAH is vital here as it gives us info about where DPs start their derivational lives. As for empty categories (the silent heads), something like the C-T-v-V structure is given universally and the hope is that we can find a way of adding functional material as needed to map to the given strings. The string info and the theta info, I assume, enjoy what Chomsky called 'epistemological priority,' i.e. it is the kind of info that one can glean without invoking grammatical primitives. In a sense, then, both are "observables." Given that there is a rich enough UG we should be able to get things started and build more complex structure as needed. That's the idea.

      The use of UTAH-like info has been part of the acquisition profile since at least Wexler and Culicover, who used DS/SS pairs as input to the LAD.

      Last point: I am inclined to a VERY thin view of lexical meaning, something very Davidsonian. If this is so, then lots of what we think of as part of the argument structure of predicates does not exist within the grammar. If this is so, then there is not much subcat or selection info, as predicates are bare predicates of events. Paul has worked on this idea extensively and I buy his views on this. At any rate, this means that lots of what we think of as in the purview of the grammar is not, e.g. *John is hugging is not ungrammatical. This, of course, changes the acquisition problem, as subcat info is largely absent. If this is so, then verbs are not ambiguous ever. They always just denote events or, in Paul's terms, are instructions to fetch concepts.

    16. It's reasonable to assume that the strings of acoustic categories -- phones, phonemes, etc. -- are accessible to the learner, since there are reasonable theories, with quite a lot of detail, about how this could happen.
      But assuming that the theta role information is available seems problematic, particularly if you take the word learning work that you were posting about some time ago seriously. Just how does this happen? If learning words is so hard, then how can we learn these event types so easily, particularly given the cross-linguistic variation in the way that meanings are syntactically realized?

    17. There is evidence that kids that are very very young (pre-linguistic in fact) seem to be able to distinguish events with various participants. So they can distinguish an intransitive from a transitive verb. I would be surprised if agents and patients were not "observables" and UTAH tells you how to map these onto syntactic structures. One question you might be asking is whether ALL verbs are learned in this way, given that there are predicates like 'faces' where it is not clear what the "agent" or "patient" is. However, there is no reason to think that UTAH need apply to all predicates to get a foot in the door. There is quite a bit of literature on this in the acquisition world. I believe that even Tomasello has written on this. So, it seems that kids are very good at perceiving events and their participants. If UTAH is right, then they can use this very effectively.

      So, I guess that I am less surprised than you are that kids can use this to break into the grammatical system. I am not sure what earlier posts you are referring to.

    18. Verb meanings do indeed seem to be hard to learn, but kids do learn them, and then the situation of use often provides info about the theta roles. Forex if I throw my teddy bear onto the floor from my high chair, and Daddy says "Why did you throw your teddy bear onto the floor?", if I know roughly what 'throw' means, I get some evidence that the Patient follows the verb.

      Another consideration is that intro descriptively oriented syntax teachers expect students to be able to analyse basic grammatical structure from exceedingly small corpora of translated (sometimes glossed, sometimes not) examples, typically less than 2 dozen sentences, and they can do it.

      So a surprisingly small number of appropriately understood examples might suffice to get things off the ground. Here is a bit of preliminary analysis I've done of the amy20 files from the Bates corpus on CHILDES. The 'patterns' are recurrent utterance formats that I judged might not really be compositional syntax, but something more like construction-grammar templates. http://members.iinet.net.au/~ada/amy20.disc

      I'll add that any criticism I might be making is directed primarily at linguists for ignoring new and interesting mathematical work (perhaps only because it takes some time to learn enough about it not to be a complete idiot?), not to the people who are doing it.

    19. @Norbert: there certainly is a lot on this in the acquisition world, as it is a widely held assumption. Roughly speaking, Pinker-style semantic bootstrapping. But the arguments that children can do this are weak, and often rely on a "what else could it be?" inference, i.e. there is no other way that children could learn syntax, so no matter how implausible it is we should assume that they can. Moreover the experimental evidence -- like the Trueswell/Medina/Gleitman work -- shows that adults can't identify what verbs are being used from the situational context, but need the linguistic context. So in the absence of some worked out theory of how children can learn these theta roles and so on, I don't accept this theory as an adequate explanation. Of course, if children can learn the entire meanings, and the meanings are structured in a way that is close to the syntactic structure, then you can learn the syntax without too much trouble.

      @Avery: I completely agree with the motivating example in your first paragraph, and we know from e.g. some of Pinker's work that children do in fact exploit that sort of information, once they know that transitive verbs are standard ways of representing certain types of event structures and so on. But that information is not available until later phases of acquisition and I find it implausible as an explanation of what is happening in the first 6 months or so.

      Finally, when you try to flesh out these theories, what tends to happen is that lots of things turn out to be innate -- syntactic categories, linking rules, innate notions of event structure, etc. etc. -- and some of these run the risk, we may feel, of violating the constraints that Darwin's problem places on possible solutions. But I have an open mind on how important a role this sort of semantic bootstrapping has in language acquisition, and am happy to revise it upwards in the light of some reasonable evidence.

    20. @Alex. I believe that you are misreading the literature here, or selectively reading it. There is a very early paper by Golinkoff and Hirsh-Pasek that shows that very very young kids (6 months if I recall correctly; at any rate they were in moms' laps being held up) are able to pair intransitive sentences with intransitive "event" scenes and transitive ones with transitive scenes. They were quite able to make this pairing reliably. Second, my recollection of Lila's stuff is not that pairing sentences to events was a challenge but that pairing verbs to events in the absence of nouns was. This is why it is useful for kids to learn a bunch of nouns first (and indeed they seem to learn more nouns than verbs initially). Once nouns are provided, kids are very good at finding the right verbs. This is not a surprise given UTAH, as the nouns are there to be paired with participants of events and knowing participants should be useful in identifying the event. So, I guess we differ here. Yes, UTAH is important in getting the system up and going. And yes, there seems to be evidence that very young kids (apparently pre-linguistic) can use this information in certain contexts. Once UTAH works its magic, of course, there will be other cues to underlying structure in various languages. However, we need something like UTAH to play a role and it seems that it can play this role. Like you, I am relying on the literature for this conclusion, but it seems that this line of reasoning has sufficient support.

    21. Oops -- the paper I was thinking of was "Human simulations of vocabulary learning" by Gillette, Jane and Gleitman, Henry and Gleitman, Lila and Lederer, Anne in Cognition in 1999. I think there are different ways to interpret the evidence.

      Once you know a certain amount about language (whether this is learned or innate): what the nouns and verbs are, some subcat information, some information about how this relates to event structure, etc., then I certainly agree that children use this information in an integrated way.
      The only debate is about when and where it comes from, and how important a role it plays in the early phases. Is UTAH innate? In which case, presumably theta roles are innate, and syntactic categories and a bunch of other stuff that you need to even express UTAH.

      I guess the tricky point is whether this sort of semantic bootstrapping works without any previous linguistic learning, and how little innate information you need to start the ball rolling. If it turns out that to get started you need a lot of stuff built in, then it starts to lose some of its appeal, at least for those of us on the small UG end of the spectrum.

    22. Yes, one needs a lot of stuff built in and, as you note, for me this is not a big surprise or a problem. What I would like is that some of what is built in be non-linguocentric. The part of UTAH that I like is that it projects event participants to structures. I think that the eventish side of things is probably quite generally cognitively available. Why it maps agents to subjects and patients to objects is quite arbitrary, so far as I can see, and the mapping itself must be part of FL. But given this mapping, we can get things going. Of course, were this mapping derivable from something else, I, for one, would be delighted.

    23. & it seems unlikely that we'd need anything like the full UTAH from Baker 1988, maybe just a prominence hierarchy as assumed in many syntactic frameworks, along the lines that more active -> less tightly combined with the predicate, plus a tendency to view 'actions' as things distinct from their 'agents' (i.e. a concept of 'external argument').

      This might be easy to work towards if grammars were attached to a representational semantics along the lines of Jackendoff, rather than directly to model theory.

    24. Another point is that the program of making UG (or perhaps 'Narrow UG' as opposed to 'Broad UG') simpler by shoving as much arguably universal structure as possible to the other side of the interfaces where the chances are better that it could have evolved, would be easier, in the case of semantics, if the semantics were representational in the general style of Jackendoff, rather than model-theoretic as worked out by Greg Kobele.

      I think I've worked out how to adapt LFG's 'glue semantics' to work for MGs, connecting the 'ports' of the meaning-constructors to features rather than to f-structures. This will work for many kinds of semantics where assembly can be done with typed lambda calculus.

    25. @Avery: The most recent (and best) presentation of MG semantics uses only the typed lambda calculus, and can easily be thought of in representational terms.

    26. Yes, nice, as far as I've gotten through it so far. I had wondered if anybody was doing anything with Montagovian dynamics. A quibble, relevant to the title of the top blog posting, is that engineering the directionality of the head of Merge into the feature specifications of lexical items creates a typological/learning problem, in that this directionality does not usually seem to be a property of lexical items, but of the entire language. Possible exceptions being quantity words such as enough and galore in English, which can/must follow the Ns they give the quantities of. But I'm not aware of anything like this ever happening with core arguments.
