Monday, October 7, 2013

Domain Specificity


One of the things that seems to bug many about FL/UG is the supposition that it is a domain specific module dedicated to ferreting out specifically linguistic information. Even those who have been reconciled to the possibility that minds/brains come chock full of pre-packaging seem loath to assume that these natively provided innards are linguistically dedicated. This antipathy has even come to afflict generativists of the MP stripe, for domain specificity is believed to be inconsistent with the minimalist ambition to simplify UG by removing large parts of its linguistically idiosyncratic structure. I have suggested elsewhere (here) that this tension is only apparent, and that it is entirely possible to pursue the MP cognitive leveling strategy without abandoning the idea that there is a dedicated FL/UG as part of our human biological endowment. In a new paper, Gallistel and Matzel (G&M) (here) argue that domain specificity is the biological default once one rejects associationism and adopts an information processing model of cognition. Put more crudely, an allergy to domain specificity is just another symptom of latent empiricism (i.e. a sad legacy of associationism).

I, of course, endorse G&M’s position that there is nothing inconsistent between accepting functionally differentiated modules and the assumption that these are largely constructed using common basic operations. And I, of course, love G&M’s position that once one drops any associationist sympathies (and I urge you all to do this immediately for your own intellectual well-being!), the hunt for general learning mechanisms looks, at least in biological domains, ill-advised. Or, put more positively: once one adopts an information processing perspective, domain specificity seems obviously correct. Let’s consider G&M’s points in a little detail.

G&M contrast associationist (A) and information processing (IP) models of learning and memory. The paper is divided, more or less, into two parts. The first several pages comprise a concise critique of associationist/neural net models, in which learning is “the rewiring of a plastic nervous system by experience, and memory resides in the changed wiring” (170). The second part develops the evidence for an IP perspective on neural computations. IP models contrast with A-models in distinguishing the mechanisms for learning (whose function is to “extract potentially useful information from experience”) from those for memory (whose function is to “carr[y] the acquired information forward in time in a computationally accessible form that is acted upon by the animal at the time of retrieval”) (170). Here are some of their central points.

A-models are “recapitulative.” What G&M (170) intend here is that learning consists in finding the pattern in the data (see here): “An input that is part of the training input, or similar to it, evokes the trained output, or an output similar to it.” IP models, by contrast, are “in no way recapitulations of the mappings (if any) that occurred during the learning.” This is the classical difference between rationalist and empiricist conceptions of learning. A-models conceive of environmental input as adapting “behavior” to environmental circumstances. IP-models conceive of learning as building a “representation of important aspects of the experienced world.”
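
To make the recapitulative/representational contrast concrete, here is a toy sketch of my own (the scenario, names, and numbers are invented for illustration, not taken from G&M): an A-style learner that merely adjusts a stimulus-response weight with a delta rule, next to an IP-style learner that extracts an explicit quantity from experience and carries it forward for later computation.

# Toy contrast between an associative (A) learner and an
# information-processing (IP) learner. Illustrative only.

def a_model_learn(trials, rate=0.1):
    """Delta-rule learning: experience 'rewires' a single
    stimulus->response weight. The knowledge just is the changed
    weight, and behavior recapitulates the trained mapping."""
    w = 0.0
    for outcome in trials:          # 1.0 if the outcome occurred, else 0.0
        w += rate * (outcome - w)   # nudge the weight toward the outcome
    return w                        # a response strength, not a fact about the world

def ip_model_learn(intervals):
    """IP-style learning: extract a quantity from experience (here the
    mean inter-event interval) and store it in a computationally
    accessible form for later, possibly novel, computations."""
    return {"mean_interval": sum(intervals) / len(intervals)}

print(a_model_learn([1.0, 1.0, 0.0, 1.0]))   # ~0.25: just an association strength
print(ip_model_learn([2.0, 3.0, 2.5]))       # {'mean_interval': 2.5}

The stored value in the second case can enter computations that never occurred during training (comparing two intervals, adding them, etc.); the weight in the first can only reproduce (something like) the trained input-output mapping.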

A-models gain a lot of their purchase within neuroscience (and psychology) by appearing to link directly to a possible neural mechanism: long-term potentiation (LTP). However, G&M argue vigorously that LTP support for A-models entirely evaporates when the evidence linking LTP to A-models is carefully evaluated. G&M walk us slowly through the various disconnects between standard A-processes of learning and LTP function: their time scales are completely different (“…temporal properties of LTP do not explain the temporal properties of behaviorally measured association formation” (172)); their persistence (i.e. how long the changes in LTP vs. associations last) is entirely different, so “LTP does not explain the persistence of associative learning” (172); their reactivation schedules are entirely different (if L is learned and then extinguished, L is reacquired more quickly, but “LTP is neither more easily [induced] nor more persistent than it was after previous inductions”); and LTP models provide no mechanism for solving the encoding problem (viz. A-learning is mediated by the comparison of different kinds of temporal intervals, and there is no obvious way for LTP nets to do this) except by noting that what gets encoded is emergent, which amounts to punting on the encoding problem rather than addressing it.

In short, there is no support for A-models from neural LTP models. Indeed, the latter seem entirely out of sync with what’s needed to explain memory and learning. As G&M put it: “…if synaptic LTP is the mechanism of associative learning - and more generally, of memory - then it is disappointing that its properties explain neither the basic properties of associative learning nor the essential properties of a memory mechanism” (173). So much for the oft-insinuated claim that connectionist models are preferable because they are neurally plausible (indeed, obvious!).

General conclusion: A-models have no obvious support from standard LTP models, and these standard LTP models are inadequate for handling the simplest behavioral data. In effect, A- and LTP-accounts are the wrong kinds of theories (not wrong in detail, but in conception, and hence without much (if any) redeeming scientific value) if one is interested in understanding the neural bases of cognition.

So what’s the right approach? IP-models. In the last parts of the paper G&M go over some examples.  They note that biologically plausible IP models will all share some important features:

1.     They will involve domain specific computations. Why? “Because no general purpose computation could serve the demands of all types of learning (175),” i.e. domain specificity is the natural expectation for IP models of neuro-cognition.
2.     The different computations will apply the same “primitive operations” in achieving functionally different results (175).[1]
3.     The IP approach to learning mechanisms “requires an understanding of the rudiments of the different domains in which the different learning mechanisms operate” (175). So, for example, figuring out if A is the cause of B, or if A is the edge of B, will involve different computations from each other and from those that mediate the pairing of meanings with sounds.
4.     Though the neuro-science is at a primitive stage right now, “…if learning is the result of domain-specific computations, then studying the mechanism of learning is indistinguishable from studying the neural mechanisms that implement computations (175).”

Note that this will hold as much in the domain of language as in navigation and spatial representation. In other words, once one dumps Associationism (as one must, as it is empirically completely inadequate and intellectually toxic), domain specificity is virtually ineluctable. There exist no interesting general purpose learning systems (just as there is no general sensing mechanism, as Gallistel has been wont to observe). That’s the G&M message. Cognitive computation, if it’s to be neurally based, will be quite specifically tailored to the cognitive tasks at hand, even if built from common primitive circuits.

The most interesting part of G&M, at least to me, was the review of the specific neural cells implicated in animal capacities for locating oneself in space and moving around within it. It seems that neuroscientists are finding “functionally specialized neurons [that] signal abstract properties of the animal’s relation to its spatial environment” (185). These are genetically controlled and, as G&M note, their functional specialization provides “compelling evidence for problem-specific mechanisms.”

Note that the points G&M make above fit very snugly with standard assumptions within the Chomsky version of the generative tradition. In other words, the assumptions that generative linguists make concerning domain specific computations and mechanisms (though not necessarily primitive operations) simply reflect what is, or at least should be, the standard assumption in the study of cognition once Associationism is dumped (may its baneful influence soon disappear). If G&M are right, then there are no good reasons from neuro-biology for thinking that the standard assumptions concerning native domain specific structures for language are exotic or untoward. They are neither. The problem is not with these assumptions, but with the unholy alliance between some parts of contemporary neuroscience and the A-models of learning and cognition that neuro types have uncritically accepted.

If you read the whole G&M paper (some parts involve heavy lifting) and translate it into a linguistics framework, it is very hard to avoid the conclusion that if G&M are correct (and, in case you’ve missed it, IMO they are), then the Chomskyan conception of language, mind, and brain is both anodyne and the only plausible game in cog-neuro town.


[1] A similar point wrt MP and UG is made here.

25 comments:

  1. So what is a good IP model of learning?

    I don't really understand where the boundary is meant to be drawn. I see that neural networks are BAD, and Q-learning is BAD, but what sorts of learning algorithms are good?

    (Parenthetically, I think it is very strange indeed to start from navigation in ants and rats and take this to be the right starting point for understanding language acquisition. Navigation is presumably one of the most ancient and ubiquitous bits of cognition, and seems about as far away from language (recent, human-specific, discrete, learned behavior) as it is possible to get.)

    Replies
    1. Those that are firmly based in domain specific constraints. I am pretty sure that you won't like this, as we've been around this track before in discussing, for example, restrictions on binding. You don't like the story, as it starts with the assumption that large parts of the binding theory are biologically pre-packaged. Thus, there is quite a bit of domain specific knowledge that is presupposed. How does this affect learning? Well, it changes the learning problem from learning the binding theory to learning what the morphological expression of an anaphor/pronoun is. In other words, the problem is to extract information from the data to determine which expressions are anaphors/pronouns (once this is figured out, all the other properties follow), and this is a very different problem from learning the binding principles from the input.
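
      Here's a toy sketch of the reduced learning problem (entirely my own illustration; the data format is invented and grossly simplified): with the binding principles given in advance, the learner just sorts forms into {anaphor, pronoun} on the basis of whether they are ever observed locally bound.

      # Toy sketch: given innate binding principles, 'learning binding'
      # reduces to classifying forms from their observed behavior.
      # Each observation pairs a form with whether it was locally
      # bound in that utterance (a gross simplification).

      def classify_forms(observations):
          ever_locally_bound = {}
          for form, locally_bound in observations:
              prev = ever_locally_bound.get(form, False)
              ever_locally_bound[form] = prev or locally_bound
          # The innate principles do the real work: a form used locally
          # bound is an anaphor; one never so used is a pronoun.
          return {form: ("anaphor" if bound else "pronoun")
                  for form, bound in ever_locally_bound.items()}

      data = [("himself", True), ("him", False), ("himself", True)]
      print(classify_forms(data))  # {'himself': 'anaphor', 'him': 'pronoun'}

      Once the classification is fixed, all the binding behavior follows from the innate principles; nothing about the principles themselves is induced from the input.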

      Now, as I said, I am pretty sure you won't like this because you did not like it before. You will say that this just pushes the problem back, as assuming that there is domain specific knowledge does not explain anything. I and G&M disagree. It changes the learning problem and then raises the "evolution" problem: how did the domain specific information get there? This is an interesting question, and one that should be addressed eventually. But the one way it should not be addressed is by assuming that there is no domain specific knowledge. Why? Because this is false and, if G&M are right, both unproductive and biologically implausible. Moreover, from the little I know, so correct me if I am wrong, we know very little about the evolution of almost everything. So, do we have an evolutionary scenario with any plausibility of how navigation (an old system) evolved? Or, say, the bee communication system? Or foraging behavior in corvids? I don't think we do. Does this mean that the assumption of domain specificity there is ill-advised because we have no good story for its evolution, and therefore it must be that it's all general learning? Nope. Wrong conclusion. Pari passu for language. There are various questions: what's biologically given (and this, G&M argue, is largely domain specific competences) and how did these arise (and here we are largely clueless). It is vital to keep these questions logically apart even as we try to answer both. It seems to me that you like to cram them together precisely in order to dump domain specific knowledge. I think that your reasoning here is seriously flawed, in roughly the way G&M indicate.

    2. Below is the critical passage:

      "from the little I know, so correct me if I am wrong, we know very little about the evolution of almost everything. So, do we have an evolutionary scenrio with any plausibility of how navigation (an old system) evolved? Or say, the bee communication system? Or foraging behavior in corvids? I don't think we do. Does this mean that assumption of domain specificity there is ill advised because we have no good story for its evolution and therefore it must be that it's all general learning? Nope. Wrong conclusion. Pari passu for language."

      I assume for argument's sake that what you say is true. You still have failed to give an argument for why we should assume that language IS like the bee communication system or foraging behaviour. I know Chomsky believes this and therefore you do. But this is not an argument a biologist would accept. So WHY are these vastly different systems supposed to be comparable? If we have no evolutionary account for either, we have to go by what we actually can investigate in living organisms. We have some good evidence for genetically determined DS in bees and ants. We have so far no evidence for genetically determined DS for human language - so on what is your 'pari passu for language' based?

    3. The big difference is how old everything is.

      Navigation has been evolving since animals started moving around -- so for around, what, 500 million years? And human language has been around for 50-100,000 years. That is why, for me, there is a presumption that there hasn't been much domain specificity, while there is no such presumption for, say, bird flight.

      And I thought this was now the orthodox view -- isn't that Darwin's problem? One of the motivations for the MP?
      Just a moment ago you were arguing that all of these supposedly domain specific things in UG could be reduced to third factor principles because the theories were effective but not explanatory. Now you are arguing the opposite.

    4. Ok, so you don't like going to the links that I put in. I understand, I don't either. So here's the recap: there are functionally distinct modules whose parts are pretty common across modules. That's the G&M view and mine. The MP trick is to show how to rebuild FL/UG from these parts. Thus, there is nothing incompatible between domain specificity and the MP program of showing how FL/UG might have evolved from (mostly) common "circuits" with the addition of very few (one, hopefully) novel bits. That's the way I see the program. We have actually discussed this in the past, so I doubt that this view of things will impress you more this time around. Oh well. It's good to have people pursuing different approaches. So, functional modularity, i.e. domain specificity, is compatible with MP. There is, so far as I can tell, no contradiction.

    5. But G & M are making a much stronger point, aren't they? They are claiming that domain general learning algorithms are biologically or neurally or computationally implausible or something... and that does seem to be a different argument. It has had some impact in the MP literature, even though it doesn't seem to be based on very strong arguments.

      There is no reason to think that IP models are necessarily domain specific -- and so their arguments against A models are not arguments for domain specificity.

    6. A supplementary: you say "(one, hopefully) novel bits" -- so I guess this is Merge, right? But here we are talking about learning mechanisms, not representational mechanisms. So what is the novel bit that allows languages to be learned? Is that domain specific or domain general?

    7. As I understand them, they are claiming that there is functional specificity (brains are built to solve very specific kinds of cognitive problems) built from similar "primitive operations." These function to extract very specific kinds of information relevant to the computation at hand. They think that the rejection of domain specificity is largely a hangover (in both senses) from earlier Associationist biases. They further argue that when one looks at bio-cognition in detail one sees tons of domain specificity. This fits perfectly well with an IP view of things, where the whole notion of information only makes sense given a domain of possibilities. At any rate, they provide ample detail for this. They then add that we are beginning to find dedicated low level circuitry for this at the level of brain organization wrt navigation (e.g. the discussion of place cells etc.) and that this organization is biologically provided. That's how I understand their claims.

      So: are the arguments against A-models arguments for domain specificity? No. The argument is that A-models and their incarnation in terms of LTP have blinded us into thinking that the neurobiology dictates A-models. They show that this is false. They further show that IP models do not require similar assumptions and that IN FACT most of bio-cognition, at least that which we understand, is dedicated. My conclusion: there is no a priori reason for thinking that this is less true in the language domain than anywhere else, and thinking otherwise is a regrettable residue of A-thinking. Does this mean that non-domain specificity is impossible? No. But it has no conceptual advantages either, and it looks, if G&M are right, to be on the wrong track in those areas of cognition that we have a handle on.

    8. It's merge or something like it (I'm partial to label myself). As for your second question, I take their point to be that learning in the absence of a specification of the domain of learning is a mug's game. I largely agree with that. Indeed, though I don't have this at my fingertips (I'm on the road, away from my resources), there is a cute quote from Chater, Griffiths, Tenenbaum etc. that they cite that makes largely the same point: the hard part is figuring out the hypothesis space; that's the real challenge. I read G&M as claiming that these will be highly articulated and very domain specific. It is consistent with this that all cognitively specific "spaces" are negotiated in the same ways (though maybe not). The point is that this will not be where the real action re learning is. Again, I am pretty sure that you do not think this is the case, and here we, you vs G&M and moi, would disagree.

    9. In reply to Norbert's comment starting "As I understand them, they are claiming that there is functional specificity"

      But there clearly is an a priori reason to think that language is different from navigation -- namely that language is very, very recent.
      So I completely accept that ants may not have any domain general learning abilities: and if you look at ants you may correctly conclude that everything is domain specific. But that just does not bear on the question of whether there are domain general learning mechanisms in humans, where we have very strong reasons to think that there are -- namely our ability to learn in new domains like violin playing and baseball and calculus and making curry etc. etc.

    10. Lack of embedded comments is, as Thomas mentioned, an inconvenience -- this is in response to Norbert's "It's merge or something like it".

      You say "I read G&M as claiming that these (Alex: the hypothesis spaces) will be highly articulated and very domain specific.".
      But this is incompatible with your claim that the only domain specific representational bit is Merge.. Or is the space simultaneously highly articulated, domain specific, and generated by just one mutation?

    11. I don't see how this affects matters. First, animals can learn novel things, and they do. The question is whether in our core cognitive capacities there is domain specificity. Moreover, it seems entirely probable, at least to me, that learning in novel domains piggybacks on what we can do in our core cognitive domains. So, is language a core cognitive domain? We likely disagree on the answer to this question. If it is, as I assume (after all, we are very good at it, not unlike navigating animals at navigation), then it is reasonable to assume that its underlying system is domain specific. You repeat that this is impossible as there is not enough time for the domain specificity to have evolved. I reply that we have no idea whether this is true, for we have few (no?) models for the time it takes for a dedicated cognitive domain to evolve (again, unless you have something to point us to that addresses this). That leaves the question of how the arguments stack up in discussing the rather refined details of this capacity. I mentioned the binding theory, but there are others. When someone shows me how to derive its effects from domain general principles I'll be happy to discuss matters. Till then...

      Last point: G&M contrast two kinds of learning. The A-model view is that learning is adapting behavior to environmental circumstances. The IP-model view is that learning is building representations of the experienced world. If learning is "representation centered," then having native restrictions on the representations dictates the learning problem.

      Last point: sure, we learn a lot of diverse things. But an old and august explanation of this has centered our labile capacities on the fact that we can talk! And it is not clear, at least to me, whether our capacities, putting these aside, are best seen in terms of general learning algorithms or in terms of something like Dehaene's recycling view of things, where core capacities are wedded to one another. On this conception, even where we generalize, we do so along lines made available by our dedicated core capacities.

      I'll let you have the last word on this (unless you egregiously exploit my kindness (kidding!)). I know we don't agree now. But I have hope for you yet, Alex C!

    12. You are right, it is. I take UG to specify the class of possible Gs. I take merge plus other non-linguistically parochial basic operations to define this class of Gs (e.g. the class of licit dependencies). This space, if we are lucky, is pretty constrained in that only these natively available options are realized. Now, and maybe this is where you are going, we can trade domain restriction for some evaluation metric. I have no views about this right now, as I cannot figure out how to make evaluation metrics count in domains I know anything about. So, I buy the Aspects view of things: UG specifies the hypothesis space and/or the evaluation metric over it, and what Merge did, when combined with other cognitive operations, was allow the class of dependencies that GB (roughly) characterized to be the only possible/probable ones. So, merge is the magic ingredient (in NC's opinion; recall, I like label) that allowed for the domain specific characterizations of the structural restrictions on Gs that FL/UG specifies.

    13. Norbert's conception of 'Domain Specificity' as 'a combination of mostly pre-existing facilities with perhaps one or two new ones (I think features also deserve consideration)' seems to me to be so different from the earlier monolithic one as to deserve a different name. It also deserves clear labelling as a perhaps speculative conclusion rather than a necessary assumption.

      In the bad old days, when there really were people who believed that there was a Universal Learning Facility, DS did have a role in providing a license for linguists to look for strange facts about language that could provide evidence about what was on the child's cheat sheet for acquiring it. But nobody really believes in a ULF anymore, and we really do have a reasonable number of items to put on the cheat sheet (e.g., for Christina, it really does seem to be a universal that if a pronoun precedes and commands an NP with a lexical head, they are not coreferential; this is the part of Principle C that really works in all languages, afaik; other parts of the binding theory are trickier and subject to apparent typological variation).

  2. Two questions and a comment:

    1. Is it possible to post a link to G&M that does not require me to dish out $20?

    2. The main argument seems to be: we have either A models [which are not domain specific, NDS] or IP models [which are domain specific, DS]. A models are bad, therefore a good model must be domain specific. This would of course only follow if A models were the only possible NDS models. How has this been established?

    My comment: I do not think that misguided empiricism/behaviourism is the main motivation for proposing NDS models. The main motivation comes from evolution: tinkering with existing structures and recruiting such structures for new purposes is a lot less 'costly' [in terms of evolutionary book-keeping] than generating novel structures AND maintaining these for a single [DS] purpose. So even if it were true that Merge was the result of a single mutation, this would not explain why this novel structure did not get exapted for new purposes but remained DS over a fairly long period of time [compare how little time it took for lactose tolerance to 'evolve']. An earlier link by Norbert to [https://www.simonsfoundation.org/quanta/20130904-evolution-as-opportunist/ ] seems to suggest he accepts that evolution does not result in DS by default... And, needless to say, I share Alex's scepticism about the usefulness of analogies from ant or rat navigation...

  3. Copy the article title from the $20 page into a browser - and you get it.

  4. It seems to me that the real line between Good and Evil is between people who think that the hypothesis space has some structure that is relevant and can be investigated (even if they think that some implicit procedural characterization that they're working on is the best way to do this), and those who don't get this issue at all, or consider it senseless, completely irrelevant, or otherwise hopeless.

    With the first kind of person, you can apply data and thought to the issue of how best to describe the structure of the hypothesis space, whereas with the second, the only option is to have more alcohol and talk about something else.

    I see no point in assuming that the hypothesis space for knowledge of language has a rich, innate, domain specific structure unless you have some techniques for finding out what it is. But then, given these, you don't have to assume anything; you can just investigate various areas and find out (as Ewan said a few days ago in a reply, if I understood him correctly).

  5. I read part of the G & M paper because I was interested in the arguments linking the debate about A/IP with the debate about domain-specificity: the first being the classic PDP versus Turing machines, symbolic versus subsymbolic processing debate that was hashed over fairly thoroughly in the 80s and 90s, and the second being, for me, the more interesting question.
    And G & M have some good arguments against the naive neural/associationist/behaviorist view, which I repudiate as well.

    But unfortunately G & M don't actually provide any arguments to link these two debates: they just assert the connection without reasons. Unless I am missing something?
    The relevant passage is the one that Norbert quotes: "Framing learning problems as computational problems leads to the postulation of domain-specific learning mechanisms (Chomsky 1975, Gallistel 1999) because no general-purpose computation could serve the demands of all types of learning."
    But that is it. There is no further argument. If someone could explain this argument, that would be great.

    (And the claim is false, at least as far as I understand it. The general computation of the inner product in a Hilbert space can be the basis for a general learning algorithm like SVMs.)
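
    (To make that concrete with a throwaway example of my own - the feature vectors and labels below are invented - one and the same inner-product-based learner can be fit to problems from utterly unrelated domains, since all it consumes is vectors:)

    # One and the same inner-product-based learner (a linear SVM)
    # applied to two unrelated 'domains'. Nothing in the algorithm
    # knows what the feature vectors are about.
    from sklearn.svm import SVC

    clf = SVC(kernel="linear")

    # 'Domain' 1: made-up acoustic features for a phonetic contrast
    X_speech = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
    y_speech = [0, 0, 1, 1]
    print(clf.fit(X_speech, y_speech).predict([[0.15, 0.85]]))  # [0]

    # 'Domain' 2: made-up navigation features (distance, bearing)
    X_nav = [[1.0, 0.0], [2.0, 0.1], [0.0, 1.0], [0.1, 2.0]]
    y_nav = [0, 0, 1, 1]
    print(clf.fit(X_nav, y_nav).predict([[1.5, 0.05]]))         # [0]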

    Then, to make matters worse, one of the *domains* they consider is probabilistic learning! So maybe they are using the word "domain" in some completely different way from the way that Norbert and I do.

    And then on p. 176 they completely botch Bayes' rule. Look at the formula they provide and try to make sense of it: they multiply when they should divide, or something, and the notation is completely wrong. I guess this was just a proofing fail, but it doesn't inspire confidence.
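
    (For reference, the standard form of the rule is P(H|D) = P(D|H) · P(H) / P(D): the posterior on a hypothesis is its likelihood times its prior, divided - not multiplied - by the probability of the data.)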

    I didn't find this paper helpful in clarifying the issues at all.

    Replies
    1. "So maybe they are using the word 'domain' in some completely different way from the way that Norbert and I do."

      The intended meaning of "domain" in this context has never been particularly clear to me. Suppose you have an algorithm for sorting lists. In a sense it is domain general because it really works for any kind of list: top 10 lists, grocery lists, the phone book, lists of lists - it just doesn't matter what the elements are. At the same time it is domain specific because, well, it only works for lists. So what is it? Specific? General?

      The same thing is true for learning. What matters for learning is the structure of the hypothesis space and that this structure allows for sound inference steps from a finite sample. But what exactly the objects in the space represent isn't all that important. So in a sense learning algorithms are domain general since they can be used for a wide range of learning problems, but they are also domain specific because they only work for certain problem spaces.
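
      (A throwaway sketch of the point, of my own devising: a comparison sort is parametric in what the elements are but fixed in what it operates on.)

      # A generic merge sort: 'domain general' in that the elements can
      # be anything with a total order (numbers, strings, lists...),
      # 'domain specific' in that it only ever sorts sequences.

      def merge_sort(xs):
          if len(xs) <= 1:
              return list(xs)
          mid = len(xs) // 2
          left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
          out = []
          while left and right:
              out.append(left.pop(0) if left[0] <= right[0] else right.pop(0))
          return out + left + right

      print(merge_sort([3, 1, 2]))            # a top 10 list, abbreviated
      print(merge_sort(["pears", "apples"]))  # a grocery list
      print(merge_sort([[2], [1], [1, 0]]))   # even lists of lists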

    2. I think you have to decide on a case-by-case basis.

      One way of thinking about it that seems to have had some traction in the past is to think of learning as based on propositionally represented prior knowledge about the domain -- so then one can ask what those 'facts' are about. For example, suppose you know in advance that the syntactic categories of the language are v, V, D, ...; then this is a fact about language and therefore specific to the domain. If you think in terms of Bayesian learning, then the term "prior" implies prior knowledge and seems to lead to this way of thinking.
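
      (A bare-bones sketch of that way of thinking, with invented hypotheses and numbers: the update rule is generic, and whatever domain specificity there is lives in the hypothesis space and the prior.)

      # The Bayesian split: a generic update over a hypothesis space,
      # with the space and prior as the locus of any domain-specific
      # knowledge. All numbers are invented for illustration.

      def posterior(prior, likelihood, datum):
          """Generic Bayesian update over a finite hypothesis space."""
          unnorm = {h: p * likelihood(h, datum) for h, p in prior.items()}
          z = sum(unnorm.values())
          return {h: u / z for h, u in unnorm.items()}

      # Domain-specific content: the candidate analyses and their prior.
      prior = {"category_is_V": 0.5, "category_is_N": 0.5}

      def likelihood(h, datum):
          # Invented: how likely the word is to follow 'to' under each h.
          p = {"category_is_V": 0.8, "category_is_N": 0.1}[h]
          return p if datum == "follows_to" else 1 - p

      print(posterior(prior, likelihood, "follows_to"))
      # {'category_is_V': 0.888..., 'category_is_N': 0.111...}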

      But I think you need to have a concrete proposal before you can evaluate whether it is domain specific. I think the better strategy is to figure out some good theories, and *then* argue about how domain specific they are.

      But one of the key thresholds for me has always been whether the syntactic categories (which clearly are domain specific) are built in or learned, which is why I keep on asking whether they are universal or not.

    3. I think a case is building up that at the top level they are closer to universal than 'diversity linguists' such as Martin Haspelmath and Bill Croft are currently claiming. There really always do seem to be things that one could reasonably call 'nouns' and 'verbs', and also adjectives, though these are problematic because they sometimes seem more like nouns (Greek), other times more like verbs (Indonesian), or split into two kinds going both ways (Japanese). 'Adverbs' are a complete nightmare because they seem to split into a maze of subcategories. There also seem to need to be subcategories, so a related question is whether these can be split indefinitely finely (what Maurice Gross 1979 claimed in his 'Failure of Generative Grammar' paper), or whether there's a limit where real grammatical category splits end and something else takes over.

      And what, incidentally, is the difference between real 'Parts of Speech' and features such as grammatical gender, which do in a sense split the noun category, but show a lot more variation across languages than the top-level Part-of-Speech categories do?

    4. "You are right, it is. I take UG to specify the class of possible Gs." [Norbert] Do you really want to define it as doing only that, rather than also possibly imposing soft biases (aka an evaluation metric).

    5. In an earlier remark I conceded your point. So evaluation metrics are fine.

    6. As I thought, but I'm being hyper fussy about formulation here because of the big potential for misunderstanding and consequent misrepresentation.

    7. And quite right too given earlier discussions.
