Monday, September 15, 2014

Computations, modularity and nativism

The last post (here) prompted three useful comments by Max, Avery and Alex C. Though they appear to make three different points (Max pointing to Fodor’s thoughts on modularity, Avery on indirect negative evidence and Alex C on domain specific nativism) I believe that they all end up orbiting a similar small set of concerns. Let me explain.

Max links to (IMO) one of Fodor's best ever book reviews (here). The review brings together many themes in discussing a pair of books (one by Pinker, the other by Plotkin). It outlines some links between computationalism, modularity, nativism and Darwinian natural selection (DNS). I'll skip the discussion of DNS here, though I know that there will be many of you eager to battle his pernicious and misinformed views (not!). Go at it. What I think is interesting given the earlier post is Fodor's linking together of computationalism, modularity and nativism. How do these ideas talk to one another? Let's start by seeing what they are.

Fodor takes computationalism to be Turing’s “simply terrific idea” about how to mechanize rationality (i.e. thinking). As Fodor puts it (p. 2):

…some inferences are rational in virtue of the syntax of the sentences that enter into them; metaphorically, in virtue of the ‘shapes’ of these sentences.

Turing noted that, wherever an inference is formal in this sense, a machine can be made to execute the inference. This is because…you can make them [i.e. machines NH] quite good at detecting and responding to syntactic relations among sentences.

And what makes syntax so nice? It's LOCAL. Again, as Fodor puts it (p. 3):

…Turing’s account of computation…doesn’t look past the form of sentences to their meanings and it assumes that the role of thoughts in a mental process is determined entirely by their internal (syntactic) structure.

Fodor continues to argue that where this kind of locally focused computation is not available, computationalism ceases to be useful. When does this happen? When belief fixation requires the global canvassing and evaluation of disparate kinds of information, all of which have variable and very non-linear effects on the process. Philosophers call this 'inference to the best explanation' (IBE), and the problem with IBE is that it's a complete and utter mystery how it gets done.[1] Again, as Fodor puts it (p. 3):

[often] your cognitive problem is to find and adopt whatever beliefs are best confirmed on balance. ‘Best confirmed on balance’ means something like: the strongest and simplest relevant beliefs that are consistent with as many of one’s prior epistemic commitments as possible. But as far as anyone knows, relevance, strength, simplicity, centrality and the like are properties, not of single sentences, but of whole belief systems: and there’s no reason at all to suppose that such global properties of belief systems are syntactic.[2]

And this is where modularity comes in; for modular systems limit the range of relevant information for any given computation, and limiting what counts as relevant is critical to syntactifying a problem so that computationalism can operate. IMO, one of the reasons that GG has been a doable and successful branch of cog sci is that FL is modular(ish) (i.e. that something like the autonomy of syntax is roughly correct). 'Modular' means "largely autonomous with respect to the rest of one's cognition" (p. 3). Modularity is what allows Turing's trick to operate. Turing's trick, the mechanization of cognition, relies on the syntactification of inference, which in turn relies on isolating the formal features that computations exploit.

All of which brings us (at last!) to nativism. Modularity just is domain specificity. Computations are modular if they are "more or less autonomous" and "special purpose" and "the information [they] can use to solve [cognitive problems] are proprietary" (p. 3). So construed, if FL is modular, then it will also be domain specific. So if FL is a module (and we have lots of apparent evidence to suggest that it is), then it would not be at all surprising to find that FL is specially tuned to linguistic concerns, that it exploits and manipulates "proprietary information," and that its computations were specifically "designed" to deal with the specific linguistic information it worries about. So, if FL is a module, then we should expect it to contain lots of domain specific computational operations, principles and primitives.

How do we go about investigating the if-clause immediately above? It helps to go back to the schema we discussed in the previous post. Recall the general schema in (1) that we used to characterize the relevant problem in a given domain, 'X' ranging over different domains. (2) is the linguistic case.

(1)  PXD -> FX -> GX
(2)  PLD -> FL -> GL

Linguists have discovered many properties of FL. Before the Minimalist Program (MP) got going, the theories of FL were very linguistically parochial. The basic primitives, operations and principles did not appear to have much to say about other cognitive domains (e.g. vision, face recognition, causal inference). As such, it was reasonable to conclude that the organization of FL was sui generis. And to the degree that this organization had to be taken as innate (which, recall, was based on empirical arguments about what Gs did), to that degree we had an argument for innate domain specific principles of FL. MP has provided (a few) reasons for thinking that earlier theories overestimated the domain specificity of FL's organization. However, as a matter of fact, the unification of FL with other domains of cognition (or computation) has been very very very modest. I know what I am hoping for, and I try not to confuse what I want to be true with what we have good reason to believe is true. You should too. Ambitions are one thing, results quite another. How might one go about realizing these MP ambitions?

If (1) correctly characterizes the problem, then one way of arguing against a dedicated capacity is to show that for various values of 'X,' FX is the same. So, say we look at vision and language: were FL = FV, we would have an argument that the very same kind of information and operations were cognitively at play in both vision and language. I confess that stating things this baldly makes it very implausible that FL does equal FV, but hey, it's possible. The impressive trick would be to show how to pull this off (as opposed to simply expressing hopes or making windy assertions that this could be done), at least for some domains. And the trick is not an easy one to execute: we know a lot about the properties of natural language Gs, and we want an FL that explains these very properties. We don't want a unification with other FXs that trades this hard-won knowledge for some mushy kind of "unification" (yes, these are scare quotes) that sacrifices the specifics we have worked so hard to establish (yes Alex, I'm talking to you). An honest appraisal of how far we've come in unifying the principles across modules would conclude that, to date, we have very few results suggesting that FL is not domain specific. Don't get me wrong: there are reasons to search for such unifications and I for one would be delighted if this happens. But hoping is not doing and ambitions are not achievements. So, if FL is not a dedicated capacity, but is merely the reflection of more general cognitive principles, then it should be possible to find FL being the same as some FX (if not vision, then something else) and to show that this unified FX' (i.e. one which encompasses FL and FX) can derive the relevant Gs with all their wonderful properties given the appropriate PLD. There's a Nobel prize awaiting such a unification, so hop to it.[3]

It is worth noting that there is tons of standard-variety psycho evidence that FL really is modular with respect to other cognitive capacities. Susan Curtiss (here and here) reviews the wealth of double dissociations between language and virtually any other capacity you might be interested in. Thus, at least in one perfectly coherent sense, FL is a module and so a dedicated special purpose system. Language competence swings independently of visual acuity, auditory facility, IQ, hair color, height, vocab proficiency, you name it. So if one takes such dissociations as dispositive (and they are the gold standard), then FL is a module with all that this entails.

However, there is a second way of thinking about what unification of the cognitive modules consists in, and this may be the source of much (what I take to be) confused discussion. In particular, we need to separate out two questions: 'Is FL a module?' and 'Does FL contain linguistically proprietary parts/circuits?' One can maintain that FL is a module without also thinking that its parts are entirely different from those in every other module. How so? Well, FL might be composed from the same kinds of parts present in other modules, albeit put together in distinctive ways. Same parts, same computations, different wiring. If this were so, then there would be a sense in which FL is a module (i.e. it has special distinctive proprietary computations etc.), yet when seen at the right grain it shares many (most? all?) of its basic computational features with other domains of cognition. In other words, it is possible that FL's computations are distinctive and dedicated, and yet built from the same simple parts found in other modules. Speaking personally, this is how I now understand the Minimalist Bet (i.e. that FL shares many basic computational properties with other systems).

This is a coherent position (which does not imply it is correct). At the cellular level our organs are pretty similar. Nonetheless, a kidney is not a heart, and neither is a liver or a stomach. So too with FL and other cognitive "organs." This is a possibility (in fact, I have argued in places that it is also plausible and maybe even true). So, seen from the perspective of the basic building blocks, it is possible that FL, though a separate module, is nonetheless "just like" every other kind of cognition. This version of the "modularity" issue asks not whether FL is a domain specific dedicated system (it is!), but whether it employs primitive circuits/operations proprietary to it (i.e. not shared with other cognitive domains). Here 'domain specific' means using basic operations not attested in other, non-linguistic domains of cognition.

Of course, the MP bet is easy to articulate at a general level. What's hard is to show that it's true (or even plausible). As I've argued before, to collect on this bet requires, first, reducing FL's internal modularity (which in turn requires showing that Binding, movement, control, agreement, etc. are really only apparently different) and, second, showing that this unification rests on cognitively generic basic operations.[4] Believe me when I tell you that this program has been a hard sell.

Moreover, the mainstream Minimalist position is that though this may be largely correct, it is exactly wrong: there are some special purpose linguistic devices and operations (e.g. Merge) that are responsible for Gs' distinctive recursive property. At any rate, I think the logic is clear, so I will not repeat the mantra yet again.

This brings me to the last point I want to make: Avery notes that more often than not, positive evidence relevant to fixing a grammatical option is missing from the PLD. In other words, the PLD is in fact even more impoverished than we tend to believe. He rightly notes that this implies that indirect negative evidence (INE) is more important than we tend to think. Now if he is right (and I have no reason to think that he isn't), then FL must be chock-full of domain specific information. Why? Because INE requires a sharp specification of the options under consideration to be operative. Induction that uses INE effectively must be richer than induction exploiting only positive data.[5] INE demands a more articulated hypothesis space, not less. INE can compensate for poor direct evidence, but only if FL knows what absences it's looking for! You can hear the dogs that don't bark, but only if you are listening for barking dogs. If Avery's cited example is correct (see here), then it seems that FL is attuned to micro-variations, and this suggests a very rich system of very linguistically specific micro-parameters internal to FL. Thus, if Avery is right, FL will contain quite a lot of very domain specific information, and given that this information is logically necessary to exploit INE, these options must be innately specified: FL contains lots of innate domain specific information. Of course, Avery may be wrong, and those that don't like this conclusion are free (indeed urged) to reanalyze the relevant cases (i.e. to indulge in some linguistic research and produce some helpful results).
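To see why the barking-dog point is not just a metaphor, here is a minimal sketch (a toy of my own devising, not anyone's actual acquisition model) of INE as Bayesian updating over a prespecified space of alternatives. The two hypothetical grammars, their made-up prior weights, and the probability each assigns to producing a "marked" construction are all stipulated in advance; that stipulation is exactly the option space the argument turns on.

# Toy illustration: indirect negative evidence only does work over a
# prespecified hypothesis space. Grammars and numbers are invented.

grammars = {
    "G_plus":  {"prior": 0.5, "p_marked": 0.05},  # licenses the marked construction
    "G_minus": {"prior": 0.5, "p_marked": 0.0},   # never produces it
}

def update(grammars, saw_marked):
    """One step of Bayesian belief updating on a single utterance."""
    scores = {}
    for name, g in grammars.items():
        likelihood = g["p_marked"] if saw_marked else 1.0 - g["p_marked"]
        scores[name] = g["prior"] * likelihood
    total = sum(scores.values())
    for name in grammars:
        grammars[name]["prior"] = scores[name] / total
    return grammars

# 200 utterances, none containing the marked construction: the silence
# steadily favors G_minus, but only because G_plus was there predicting
# that the construction should occasionally have shown up.
for _ in range(200):
    grammars = update(grammars, saw_marked=False)

print({name: round(g["prior"], 3) for name, g in grammars.items()})

Drop G_minus (or the stipulation that G_plus produces the construction 5% of the time) and the same 200 silent utterances tell the learner nothing. That is the sense in which hearing the dogs that don't bark requires already listening for barking dogs.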

This is a good place to stop. There is an intimate connection between modularity, computationalism, and nativism. Computations can only do useful work where information is bounded. Bounded information is what modules provide. More often than not, the information that a module exploits is native to it. MP is betting that with respect to FL, there is less language specific basic circuitry than heretofore assumed. However, this does not imply that FL is not a module (i.e. that it is merely part of "general intelligence"). Indeed, given the kinds of evidence that Curtiss reviews, it is empirically very likely that FL is a module. And this can be true even if we manage to unify the internal modules of FL and demonstrate that the requisite remaining computations largely exploit domain general computational principles and operations. Avery's important question remains: how much acquisition is driven by direct positive evidence and how much by indirect negative evidence? Right now, we don't really know (at least not to the level of detail that we want). That's why these are still important research topics. However, the logic is clear, even if the answers are not.



[1] Incidentally, IBE is one of the phenomena that dualists like Descartes pointed to in favor of a distinct mental substance. Dualism, in other words, rests roughly on the observation that much of thought cannot be mechanized.
[2] It's important to understand where the problem lies. The problem is not giving a story in specific cases in specific contexts. We do this all the time. The problem is providing principles that select out the IBE antecedent to a specification of the contextually relevant variables. The hard problem is specifying what is relevant ex ante.
[3] Successful unifications almost always win kudos. Think electricity and magnetism, then electromagnetism with the weak force, terrestrial and celestial mechanics, chemistry and mechanics. These all get their own chapters in the greatest hits of science books. And in each case, it took lots of work to show that the desired unification was possible. There is no reason to think that cognition should be any easier.
[4] I include generic computational principles here, so-called third factor computational principles.
[5] In fact, if I understand Gold correctly (which is a toss up), acquiring modestly interesting Gs strictly using induction over positive data is impossible.

24 comments:

  1. This comment is from Mark Johnson. I am mere midwife here.

    Fodor's review (like pretty much all of his work) is very compelling.

    I'd like to pick up on a component Fodor points out is likely to be central to any rationalist account: "inference to the best explanation". This is what Bayesian methods aim to describe. Bayesian methods provide a specific account of how "to find and adopt whatever new beliefs are best confirmed on balance" given "what perception presents to you as currently the fact and you’ve got what memory presents to you as the beliefs that you’ve formed till now". Bayesian inference is compatible with Turing computation in general and minimalist representations in particular.

    I agree it's still very much an open question whether Bayesian methods can provide a satisfactory account of information integration for things like commonsense knowledge (heck, it's still an open question just what aspects of language Turing computation models can provide a satisfactory account of), but as far as I can tell, Bayesianism or something like it is the best account of information integration we have today.

    As you and Avery point out, the relationship between grammatical properties and the evidence available to the learner can be quite indirect. Even if we can find a complex set of "triggers" for these cases, a Minimalist should demand an explanation of why exactly these triggers are associated with exactly these grammatical properties. Bayesian methods provide an account of how primary data (e.g., "triggers") and inferences (e.g., grammatical properties) are connected, and so provide an explanatory framework for studying these aspects of the human language faculty.

    Replies
    1. @ Mark J

      Yes, Bayes might be very helpful here. This is consonant with some nice work by Jeff Lidz and Annie Gagliardi (discussed here: http://facultyoflanguage.blogspot.com/search?q=Lidz+and+gagliardi). In fact, so far as I can tell, it is roughly the approach proposed in Aspects chapter 1 (p. 31-2) with a few probabilities thrown in. The problem Chomsky later noted, which led to dumping this approach, is that it turned out to be very hard (and still is) to array Gs wrt some general (simplicity) measure, and so it was hard to operationalize the Aspects proposal. He called this the feasibility problem. It was possible to come up with ways of evaluating Gs pairwise, but not more generally, and absent this it was hard to see how to make the evaluation metric work. It strikes me that Bayes needs something analogous to get off the ground, no? Second, even if we had this, there are computational issues with Bayes being able to actually carry out the relevant computations when the space of options gets "big" (with what 'big' means also an issue). Again this speaks to feasibility.

      That said, I agree (and I think Chomsky would) that Bayes methods per se are perfectly compatible with what GGers wish to provide, and it is not Bayes that is the problem in integrating the two kinds of work but rather the silly associationism that often comes packaged with Bayes-like suggestions. Absent this, there is no reason for GGers to eschew Bayes-like models. Indeed at UMD, we welcome the attempts to do just that.

    2. Here's a comment from Trevor Lloyd. He had trouble posting so I have acted as go-between. I have divided it into 2 parts for technical reasons.

      I hope you will permit me to make a contribution to this post. It is also relevant to its immediate precursors. My field is the semantics of word meaning and my knowledge of GG is not extensive but many of the issues discussed here are of interest to me and I am confident I can make a contribution to the discussion, especially on concept formation and PoS, based on a newly discovered aspect of word and concept meaning.

      I have identified in the lexicon something that has largely escaped attention. This is a canonical set of abstract semantic features that are ostensibly non-linguistic, innate and universal. I will not describe the discovery procedure here but I believe the results are robust empirically. I call this small set of features ‘semantic factors’. There are just 24 semantic factors on current estimation but each splits into several sub-factors that make their meanings more explicit. They are quite different from the types of semantic features proposed by the early componentialists, Jackendoff, Pustejovsky or the semantic primitives of Anna Wierzbicka, or the feature norms proposed by Ken McRae.

      The factors provide the structure of the meaning of all (yes, all) words. They can be called 'the alphabet of word meaning' or the elements of a Universal Semantics. This is testimony to their level of primitiveness. The factors do not provide full content. That requires other elements that build on the factoral structure.

      The value of the factors is that they provide for the first time a way to identify the semantic contents of words/concepts in a principled way. It is unfortunate that the controversy about concept formation has raged without any firm knowledge of what concepts are. The factors and the other elements of word meaning provide a new basis for discussing the issue.

    3. Here's part 2 of Trevor Lloyd's comment:

      Knowledge of the factors enables us to build a full 3-part model of word/concept meaning. Configurations of factors are the central constituent. They form the structure of what we learn and what we know in an instant that has the form of a mental gestalt in one or more modalities. (A semantic gestalt is a highly condensed, multi-part, isomorphic representation of an entity). The structure of the gestalt is not fully opaque and can be identified when the factors are known. The third part is World Knowledge. This is personal to a concept holder and is in linguistic form but the configuration of factors is universal. All of these elements are intuitively accessible.

      I will illustrate the value of the factors by considering a child's experience in learning the concept KEY. If it handles a key at an early age, the child forms a visual/tactile gestalt with a factoral structure. (To spell this out at this stage would take too much space.) That will be the extent of its early concept. Later it will learn what keys are used for, which will require structural modification to the gestalt. Later still it may build its world knowledge component of the concept (how they work in locks etc.) that will enable it to use the word in a versatile manner.

      With this knowledge of the nature of concepts it is easy to see: (1) that learning is a drawn-out process, (2) that hypothesis formation is not the best description of it, (3) that experiential information is crucial, and (4) that there is no evidence of an innate concept but strong evidence for innate structural constituents of all non-complex concepts. All of this except the reference to the factors is consistent with our normal understanding of the acquisition process. I wonder what Fodor's response would be.

      A little more about the factors. They are neurally intelligible as they are all arguably instantiated in specific somatotopic regions of the human and animal brain – sensorimotor, kinesthetic, proprioceptive and interoceptive. A key feature is that the factors are of two main types, one comprising physical parameters and the other affective/normative parameters.

      Although the factors have been derived from the lexicon they are not originally linguistic. They apply equally to concepts, cognition, memory, human behaviour and experience. Surprisingly they also can be seen to operate, not just in animal cognition, but across the whole field of biology. This capacity comes from their dimensional character. In relation to biology I describe the factors as 'the dimensions of the space of the interaction of organisms and their environments'. In primitive organisms their dimensionality operates in an ultimately basic way as the parameters of the possibility of life.

      These primitives identified in the lexicon are indeed extraordinary.

      I am well aware that this brief description will stretch credibility to the limit but the empirical evidence for the factors in language is very robust, in my view. I am engaged in writing this up in book form at present but if you or your contributors are interested I can provide more information.

  2. "Now if he is right (and I have no reason to think that he isn’t), then FL must be chocked full of domain specific information. Why? Because INE requires a sharp specification of options under consideration to be operative. Induction that uses INE effectively must be richer than induction exploiting only positive data.[5] INE demands more articulated hypothesis space, not less."

    I may well be misunderstanding this, but this seems backwards. If you have an algorithm that uses INE, then typically it can learn a larger class of languages than if it doesn't. So often the hypothesis class will be larger (in the INE case). Say the class of all probabilistic automata rather than just the class of reversible ones, to take an easy example.

    Replies
    1. Tracking what's not there is very hard unless you know what you are looking for. In the absence of a specification of what to look for, there are way too many things that are absent: prime numbered DPs, VPs with 'unicorn' complements, TPs with complex subjects, etc. Of course, if you KNOW what you are looking for AND YOU DO NOT FIND IT, well, that might be interesting. But absent a specification of the potential target, INE is impossible. Thus to make use of INE you need a pretty rich set of specified alternatives that you are canvassing, and as these underlie learning and are not the result thereof, they must be pre-specified (i.e. innate).

    2. That's not how it works. Typically, you have a probabilistic grammar and you apply some method that increases the likelihood of the data, and this has the effect of assigning probability zero to types of things you don't see (yes, any experts reading, this is an oversimplification in certain important respects). You don't need to explicitly have a model of the things that are absent.
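      For concreteness, here is a toy sketch of the sort of estimation being described (a made-up rule inventory and relative-frequency estimation standing in for a real probabilistic grammar and a real estimator; nothing here is anyone's actual learner). The learner never represents "things that are absent"; it only counts what it sees, and any rule the grammar licenses but the data never exhibit ends up with probability zero as a side effect:

      from collections import Counter

      # Hypothetical rule inventory the learner entertains (its hypothesis space).
      # "NP -> NP PP" is licensed in principle but never used in this toy corpus.
      rules = ["S -> NP VP", "NP -> Det N", "NP -> NP PP", "VP -> V NP", "PP -> P NP"]

      # Toy "parsed input": the rules used in analyzing the utterances seen so far.
      observed = ["S -> NP VP", "NP -> Det N", "VP -> V NP", "NP -> Det N"] * 50
      counts = Counter(observed)

      def lhs(rule):
          return rule.split(" -> ")[0]

      # Relative-frequency (maximum-likelihood) estimate of each rule's probability,
      # normalized over rules sharing a left-hand-side category.
      totals = Counter()
      for r in rules:
          totals[lhs(r)] += counts[r]

      probs = {r: counts[r] / totals[lhs(r)] if totals[lhs(r)] else 0.0 for r in rules}
      print(probs)  # "NP -> NP PP" comes out at 0.0 without ever being tracked as "missing"

      The design choice to note is that the rule inventory (the hypothesis space) is written down in advance and the estimation does the rest.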

    3. The method must rely on a specification of the hypothesis space and the weightings of the alternatives specified therein. Thus, all the possible alternatives are specified, and this is what enables you to calculate increases in likelihood. Now do this without a specification of the possible outcomes. How far do you get? Once one is in for probabilities, then one is already in for hypothesis spaces, and these must be pre-specified for INE to be operative. No such options, no probabilities and no INE.

    4. I agree with that! (at least under the standard modeling assumptions) I was only objecting to the idea that there is some important difference in this respect between INE-using algorithms and ones that don't look at frequency data. They seem in the same boat.

  3. Here's another comment from Mark Johnson. Let's admit that there is something amusing about a computational neanderthal like me coming to the aid of a world class CS person like Mark. Think mice pulling thorns from lion paws! At any rate, here's another comment from Mark. Oh yes, it's in 2 parts due to space issues.

    It seems to me that everyone agrees there needs to be some connection between properties of the primary linguistic data and the linguistic properties that get identified. (*)

    Correct me if I'm wrong, but generative grammarians by and large assume that the relevant properties of the input are Boolean "triggers", i.e., simple patterns that either match an input or not. But as far as I know there's no explanatory theory in generative grammar about how triggers are associated with linguistic properties.

    It's of course possible that the connection is innate, i.e., our genes encode a giant matrix of triggers and associated linguistic properties. But this seems very unlikely for at least two reasons. First, this is a huge burden on evolution. Second, if the matrix of triggers/linguistic properties were encoded "independently" in the genome, we'd expect double dissociations; e.g., syndromes where somehow the trigger/linguistic property matrix gets out of sync with the faculty of language. To take a rather outdated example, suppose the Pro-Drop parameter is "triggered" by rich inflection in the input (Mandarin notwithstanding), and this is independently hard-wired into our genetic code. Then we might expect to see an abnormal population where the connection between the trigger and the linguistic property has been disturbed or disrupted somehow, e.g., a subpopulation where rich inflection doesn't trigger Pro-Drop but triggers something else "by mistake" (WH in situ?). But as far as I know we don't see anything like this.

    Bayesian and related approaches do provide a systematic, explanatory theory of how to infer linguistic properties from primary linguistic data, and it seems to me that generativists need such an account in order to avoid the problems just raised. I don't know if infants are doing Bayesian belief updating while learning language, but it seems likely that there's a systematic connection between the information they're extracting from the input and the linguistic properties they are inferring.

  4. Here's part 2:

    Bayesian theory is perfectly compatible with the hypothesis that all that's available to the "learner" is information about a set of Boolean "triggers". But as Alex points out, it's not necessary to assume there is an innately-specified set of triggers. Indeed, for a broad class of probabilistic grammars, including probabilistic versions of Minimalist Grammars, these Bayesian methods allow us to extract information from an entire sentence in a way that is optimal in a certain sense. Moreover, this information is extracted as a by-product of the parsing process; something the child is presumably doing anyway.

    While Bayesian methods don't require that grammars are probabilistic, in practice Bayesian inference seems to work much better with grammars that are probabilistic. In a sense this is surprising, since the space of probabilistic grammars is "larger" than the space of discrete or categorical grammars (in the sense that the space of discrete grammars can be embedded in the space of probabilistic grammars). However, the size of a space isn't necessarily related to how hard it is to search that space, and in fact problems can often become easier if they are "relaxed", i.e., they are embedded in a larger problem space. "Learning" in the space of probabilistic grammars may be easier than "learning" in the corresponding discrete space because we can take advantage of continuity.


    (*) I'd say "learned" here except that I know that word makes Norbert see red. But in fact what statisticians and computer scientists mean by "learning" is almost exactly the same as what generative grammarians mean by "parameter setting".

    Replies
    1. @ Mark

      The notion of 'trigger' is used in the context of PoS arguments to signal the fact that some of what competent speakers know was not acquired (and yes, I do hate the term "learning" for it invites a specific conception of how input and end state relate) on the basis of instances of the relevant principle. So, for example, we don't acquire the ECP (I am assuming it is a universal here) by witnessing useful instances of the ECP at work "in the data" (positive or negative). Thus the data triggers this fact about our competence (in that we would not necessarily have acquired the ECP had we not been exposed to any PLD at all) rather than "shapes" it. This, at least, is how I understand the trigger notion.

      The evo problem then is not with triggers per se, but the fact of parameters altogether. Are these built in to FL or not? In other words, is there a finite (albeit large) number of ways that Gs can differ and if so are these innately specified? In other words, are the parameters endogenous? It has been very hard to specify relevant parameters (those that make contact with the data) that are not very linguistically specific (this was Avery's point). Thus it would seem that parameters are both innately specified and domain specific. This is not a great thing from an MP perspective. I would love to see these cases reanalyzed. Right now, I don't know how to do this. That's my problem. I'd love to hear that Bayesians can solve it. But from what I've seen, they cannot. They build the same options into their systems as I do and from where I sit I don't see that this solves the problem. Please tell me I am wrong about this. I would be delighted if that were so.

    2. Another addition by the "learned" MJ from down under. I, once more, am mere conduit.

      Many different mechanisms can be used to express cross-linguistic variation. I gather that the currently fashionable approach locates variation in the lexicon, in particular, in the inventory of functional categories present in the lexicon of a language. Although it's often supposed that acquiring lexical entries is easy, that's not necessarily the case, especially if the lexical entry concerned is sufficiently abstract (e.g., phonologically null). For example, determining whether a language's lexicon contains a functional element permitting WH in-situ in a lexicalist account is presumably just as hard (or easy) as setting a WH in-situ parameter in a parameter-setting account.

      As I understand it, a "triggers" approach is the received view of how such parameters are set or such abstract functional categories are acquired. As I explained in my previous post, I'd expect a Minimalist wouldn't be happy with a "triggers" account of acquisition because there's no explanation for how "triggers" are associated with whatever it is that is acquired. A Bayesian account is certainly more minimalist than a "triggers" account: there's a single principle governing how hypotheses (e.g., that a language has a certain parameter setting, or has a certain lexical entry) should be updated in the face of data. The same principle -- Bayesian belief updating -- accounts for the acquisition of the lexicon and whatever is responsible for linguistic variation. Moreover, the information required for Bayesian belief updating is computed as a by-product of the parsing process. This suggests that the child's goal is comprehension, and acquisition happens as a by-product.

      The paper I presented at the International Congress of Linguists last year in Geneva (on my web page) explains all this in more detail. I have another paper draft that applies the ICL paper's Bayesian approach to a probabilistic version of Stabler's Minimalist Grammar and shows that it's possible to acquire more abstract properties such as do-support and verb-second as well as lexical entries; I can post that as well if there's interest.

      (See -- I almost wrote an entire post without using the word "learn"! Surely you could get used to the idea that "learn" is used in a technical sense in computational models to mean "parameter setting". It's not as if generative grammar has never used common words in a technical sense.)

    3. @MJ:
      Two points:
      First, I don't think that it is assumed that learning lexical items (LIs) is easy. We know from people like Lila that learning the first 50 LIs is anything but. We also have some reason (deriving from her recent research) to think that the process is not well modeled along Bayesian lines. Not that one couldn't kick it into that format, but it really doesn't look anything like what we would initially expect. The link between parsing and acquisition goes back at least to Berwick's thesis and has always fit well with GG approaches to the issue, so there is nothing there to worry a GGer. However, generally speaking parsing needs grammars, so it is likely that parsing is as much a by-product of learning as learning is a by-product of parsing.

      Second: I invite you to précis your more recent work on Bayes and learning for the blog. If you agree, send me a 3-4 pager and I will post it for all to see. It would be nice to see your views illustrated for the unwashed (like me). Interested?

    4. I think there's a more 'innocent' way of understanding triggers than the Rube Goldberg mechanism suggested first by Mark with its capacity for pathological behavior, namely, triggers are simply forms of constructions that appear often enough in the PLD to be learned from, rather than so rare that they are probably 'projected' from the data by UG. I find it a bit odd that I can't point to much in the way of concrete, worked out literature sorting this out in specific cases.

      But for a possible example, 'group genitives' where the possessor head is followed by a PP or a relative clause:

      I tripped over the dog in the yard's bone
      We left a note in the man who we met yesterday's mailbox

      appear to be extremely rare in speech (Denison, Scott & Börjars 2010, The real distribution of the English "group genitive", Studies in Language 34:532-564), and, running the Stanford PSG parser over 9.25m words of the CHILDES corpus (all of the big ones available in xml format and a number of the medium-sized ones), I find exactly 2 examples of the prepositional ones, and 0 of the relative clauses. They found a lot more of the PP ones in the BNC spoken corpus (72 fixed phrases, names or titles, 11 'others'), but still only 2 of the relative clause type out of 10m words, so this would seem very plausibly to be 'projected' from the data.

      Whether or not the PP type is projected or constitutes a 'trigger' seems unclear to me; from the CHILDES data, probably not, from the BNC data, probably yes.

      But things that do occur often enough to be triggers would be coordinate NPs with one POS at the end, and indefinite pronouns followed by "else's" (somebody else's, who else's, nobody else's), where I find 57 of each in the CHILDES data (a rather strange coincidence), giving about 6 per million words, which people seem to regard as clearly often enough to constitute a trigger.
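      (For anyone who wants to redo this kind of bookkeeping, the arithmetic behind the per-million figures is trivial; the sketch below just recomputes the rates quoted above and is not the actual corpus pipeline, which involved parsing the CHILDES transcripts.)

      def per_million(hits, corpus_word_count):
          """Occurrences of a construction per million words of running text."""
          return hits * 1_000_000 / corpus_word_count

      # Figures reported above: 57 tokens of indefinite pronoun + "else's"
      # in roughly 9.25 million words of CHILDES speech...
      print(round(per_million(57, 9_250_000), 1))    # ~6.2 per million, the "about 6" above

      # ...versus the relative-clause group genitive: 2 tokens in ~10m words of the BNC.
      print(round(per_million(2, 10_000_000), 1))    # 0.2 per million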

      So although I don't think we can definitely say what is learned vs. not learned, we can at least begin to organize some information as to which things are likely to be learned from the data (i.e. to function as triggers), vs. more likely, much more likely, or virtually certain to be projected from what is learned.

    5. It certainly is interesting to study how things like this can be learned. But how exactly does a learner "know" that these coordinate NPs and "else's" are triggers for group genitives? The intuition is something like "nothing else could have generated these constructions", and this is exactly what the likelihood component of Bayesian updating formalises.

    6. Indeed; I think that what I'm really complaining about is not having any substantial idea of what actually ought to be learnable from what, at least in the domain of complicated structures of the sort traditionally of interest to generative grammarians. And not thinking that the reason for this is that, as usual, there's some important reading I haven't done.

      There are certainly various studies here and there, but they don't add up to a coherent picture, it seems to me, or even bits of a real attempt to produce one.

  5. The remark I'd like to make about the 2nd-to-last paragraph of Norbert's main post is that the apparent domain-specificity of UG is for now only that: apparent, a consequence of our near-total ignorance of how other, possibly related things work, such as picking up skills by imitation. Whether it is real or not remains to be discovered. So I would prefer to wake up to a Sciencedaily.com headline saying 'non-linguistic and pre-human use for UG module found' than 'Chomsky's UG refuted', and the exact same work could get either of these two headlines, depending on how we describe what we are doing to the scientifically at-least-semi-literate public.

    Replies
    1. I am pretty sure that parts of UG will not be prominent in other domains and parts will be. It would be nice to try prising these apart. I confess to thinking that the kind of recursion we find in language will be special and will not find reflection in other domains. However, I can imagine that other aspects of language will be similar to what we find elsewhere. So, I am confident that how we view events is not entirely different from how other animals do. I would also not be surprised to find that many of our concepts coincide with those of other animals. This is all a gesture to the wide vs narrow conception of language that Chomsky, Hauser and Fitch noted, and that it would be good to find a way of articulating. These are conceivable projects. It would be nice if someone started thinking about them.

    2. Isn't it also possible that recursion is part of the "language of thought"? It'd be pretty strange to have a recursive syntax but no recursive LOT. Maybe some other animals have a complex language of thought too? This would lead to a different take on "Darwin's problem".

    3. It would lead to a different take. But right now it looks like whatever recursion animals have is nothing like the kind we have, or there is no evidence suggesting that they have what we have. Given this, I still think that Chomsky is right to think that the recursive trick that we find in FL is really the novelty, and the thing to be explained.

    4. “So, I am confident that how we view events and other animals do is not entirely different. I would also not be surprised to find that many of our concepts coincide with those of other animals.”

      Can this be right? Surely a critical criterion of concepts is displacement from stimulus, something only humans possess. Animals can’t have what is generally meant by a concept.

      “Isn't it also possible that recursion is part of the "language of thought"? It'd be pretty strange to have a recursive syntax but no recursive LOT. Maybe some other animals have a complex language of thought too?”

      LOT must be recursive. But animals can’t have a LOT if they don’t have concepts. I have little idea what their “mental” life might consist of.

      Animals possess many kinds of intelligence, some of it of amazing complexity, but this does not imply anything like concepts or LOTs. Concepts must surely be both independent of stimuli and able to be combined in multiple ways. Animals don’t have that faculty.

      In other recent posts I have sketched the results of research on word meaning that has identified what I call “dimensions of cognition and behaviour” that are arguably common to the human, animal and organismal spheres, the whole biological domain. These do not require concepts to operate behaviourally but are fundamental to human cognition and language.

  6. Another contribution from my left field perspective as outlined very sketchily in my posts of a few days ago on this thread.

    The domain-specificity of UG is an open question but, IMHO, there is no doubt about US = Universal Semantics (as outlined earlier). It is not specific to language. US, the structural dimensions of the space of the semantics of word meaning in the form of 24 primitive, abstract, innate semantic features that I call semantic factors, is also: the structural dimensions of the space of conceptualisation, of cognition, of human and animal behaviour and of organismal interaction with the environment. The small set of semantic factors (such as materiality, particularity, surface, extension, spatiality, action, positiveness/negativeness, availability/possession, uncertainty/definitiveness) does not belong to a language module. Its demonstrable capacity to form the basic structure of all words is arguably derived from its capacity in prior modalities.

    A big question is the relation of US to UG. It seems that the nucleus of UG is merge, which I presume is effected by both syntactic and semantic features of words. The set of semantic factors that comprises US would seem to be as critical as syntactic features to the feature-checking of merge, in that the compatibility of two merging units is partly dependent on their respective semantic factors.

    In terms of Norbert’s two formulas at the beginning, it appears that FX = FL in terms of structural combinatory principles (X being other cognitive and behavioural modalities). These primal cognitive factors, if they are valid, also arguably provide the basis for merge in pre-linguistic human thought phylogenetically and ontogenetically. The cause for the emergence of language needs to be sought elsewhere.
