Faculty of Language: Computations, modularity and nativism

Monday, September 15, 2014

Computations, modularity and nativism

The last post (here) prompted three useful comments by Max, Avery and Alex C. Though they appear to make three different points (Max pointing to Fodor’s thoughts on modularity, Avery on indirect negative evidence and Alex C on domain specific nativism) I believe that they all end up orbiting a similar small set of concerns. Let me explain.

Max links to (IMO) one of Fodor’s best ever book reviews (here). The review brings together many themes in discussing a pair of books (one by Pinker, the other by Plotkin). It outlines some links between computationalism, modularity, nativism and Darwinian natural selection (DNS). I’ll skip the discussion on DNS here, though I know that there will be many of you eager to battle his pernicious and misinformed views (not!). Go at it. What I think is interesting given the earlier post is Fodor’s linking together computationalism, modularity and nativism. How do these ideas talk to one another? Let’s start by seeing what they are.

Fodor takes computationalism to be Turing’s “simply terrific idea” about how to mechanize rationality (i.e. thinking). As Fodor puts it (p. 2):

…some inferences are rational in virtue of the syntax of the sentences that enter into them; metaphorically, in virtue of the ‘shapes’ of these sentences.

Turing noted that, wherever an inference is formal in this sense, a machine can be made to execute the inference. This is because…you can make them [i.e. machines NH] quite good at detecting and responding to syntactic relations among sentences.

And what makes syntax so nice? It’s LOCAL. Again as Fodor puts it (p. 3):

…Turing’s account of computation…doesn’t look past the form of sentences to their meanings and it assumes that the role of thoughts in a mental process is determined entirely by their internal (syntactic) structure.

Fodor continues to argue that where this kind of locally focused computation is not available, computationalism ceases to be useful. When does this happen? When belief fixation requires the global canvassing and evaluation of disparate kinds of information all of which have variable and very non-linear effects on the process. Philosophers call this ‘inference to the best explanation’ (IBT) and the problem with IBT is that it’s a complete and utter mystery how it gets done.[1] Again as Fodor puts it (p. 3):

[often] your cognitive problem is to find and adopt whatever beliefs are best confirmed on balance. ‘Best confirmed on balance’ means something like: the strongest and simplest relevant beliefs that are consistent with as many of one’s prior epistemic commitments as possible. But as far as anyone knows, relevance, strength, simplicity, centrality and the like are properties, not of single sentences, but of whole belief systems: and there’s no reason at all to suppose that such global properties of belief systems are syntactic.[2]

And this is where modularity comes in; for modular systems limit the range of relevant information for any given computation, and limiting what counts as relevant is critical to allowing one to syntactify a problem and so allow computationalism to operate. IMO, one of the reasons that GG has been a doable and successful branch of cog sci is that FL is modular(ish) (i.e. that something like the autonomy of syntax is roughly correct). ‘Modular’ means “largely autonomous with respect to the rest of one’s cognition” (p. 3). Modularity is what allows Turing’s trick to operate. Turing’s trick, the mechanization of cognition, relies on the syntactification of inference, which in turn relies on isolating the formal features that computations exploit.

All of which brings us (at last!) to nativism. Modularity just is domain specificity. Computations are modular if they are “more or less autonomous” and “special purpose” and “the information [they] can use to solve [cognitive problems] are proprietary” (p. 3). So construed, if FL is modular, then it will also be domain specific. So if FL is a module (and we have lots of apparent evidence to suggest that it is) then it would not be at all surprising to find that FL is specially tuned to linguistic concerns, that it exploits and manipulates “proprietary information,” and that its computations were specifically “designed” to deal with the specific linguistic information it worries about. So, if FL is a module, then we should expect it to contain lots of domain specific computational operations, principles and primitives.

How do we go about investigating the if-clause immediately above? It helps to go back to the schema we discussed in the previous post. Recall the general schema in (1) that we used to characterize the relevant problem in a given domain, ‘X’ ranging over different domains. (2) is the linguistic case.

(1) PXD -> FX -> GX

(2) PLD -> FL -> GL

Linguists have discovered many properties of FL. Before the Minimalist Program (MP) got going, the theories of FL were very linguistically parochial. The basic primitives, operations and principles did not appear to have much to say about other cognitive domains (e.g. vision, face recognition, causal inference). As such it was reasonable to conclude that the organization of FL was sui generis. And to the degree that this organization had to be taken as innate (which, recall, was based on empirical arguments about what Gs did) then to that degree we had an argument for innate domain specific principles of FL. MP has provided (a few) reasons for thinking that earlier theories overestimated the domain specificity of FL’s organization. However, as a matter of fact, the unification of FL with other domains of cognition (or computation) has been very very very modest. I know what I am hoping for and I try not to confuse what I want to be true with what we have good reason to believe is true. You should too. Ambitions are one thing, results quite another. How might one go about realizing these MP ambitions?

If (1) correctly characterizes the problem, then one way of arguing against a dedicated capacity is to show that for various values of ‘X,’ FX is the same. So, say we look at vision and language, then were FL = FV we would have an argument that the very same kind of information and operations were cognitively at play in both vision and language. I confess that stating things this baldly makes it very implausible that FL does equal FV, but hey, it’s possible. The impressive trick would be to show how to pull this off (as opposed to simply expressing hopes or making windy assertions that this could be done), at least for some domains. And the trick is not an easy one to execute: we know a lot about the properties of natural language Gs. And we want an FL that explains these very properties. We don’t want to trade this hard won knowledge for some mushy kind of “unification” (yes, these are scare quotes) that sacrifices the specifics that we have worked so hard to establish (yes Alex, I’m talking to you). An honest appraisal of how far we’ve come in unifying the principles across modules would conclude that, to date, we have very few results suggesting that FL is not domain specific. Don’t get me wrong: there are reasons to search for such unifications and I for one would be delighted if this happens. But hoping is not doing and ambitions are not achievements. So, if FL is not a dedicated capacity, but is merely the reflection of more general cognitive principles, then it should be possible to show that FL is the same as some FX (if not vision, then something else) and that this unified FX’ (i.e. which encompasses FL and FX) can derive the relevant Gs with all their wonderful properties given the appropriate PLD. There’s a Nobel prize awaiting such a unification, so hop to it.[3]

It is worth noting that there is tons of standard variety psycho evidence that FL really is modular with respect to other cognitive capacities. Susan Curtiss (here and here) reviews the wealth of double dissociations between language and virtually any other capacity you might be interested in. Thus, at least in one perfectly coherent sense, FL is a module and so a dedicated special purpose system. Language competence swings independently of visual acuity, auditory facility, IQ, hair color, height, vocab proficiency, you name it. So if one takes such dissociations as dispositive (and it is the gold standard) then FL is a module with all that this entails.

However, there is a second way of thinking about what unification of the cognitive modules consists in and this may be the source of much (what I take to be) confused discussion. In particular, we need to separate out two questions: ‘Is FL a module?’ and ‘Does FL contain linguistically proprietary parts/circuits?’ One can maintain that FL is a module without also thinking that its parts are entirely different from those in every other module. How so? Well, FL might be composed from the same kinds of parts present in other modules, albeit put together in distinctive ways. Same parts, same computations, different wiring. If this were so, then there would be a sense in which FL is a module (i.e. it has special distinctive proprietary computations etc.), yet when seen at the right grain it shares many (most? All?) of its basic computational features with other domains of cognition. In other words, it is possible that FL’s computations are distinctive and dedicated, and that they are built from the same simple parts found in other modules. Speaking personally, this is how I now understand the Minimalist Bet (i.e. that FL shares many basic computational properties with other systems).

This is a coherent position (which does not imply it is correct). At the cellular level our organs are pretty similar. Nonetheless, a kidney is not a heart, and neither is a liver or a stomach. So too with FL and other cognitive “organs.” This is a possibility (in fact, I have argued in places that this is also plausible and maybe even true). So, seen from the perspective of the basic building blocks, it is possible that FL, though a separate module, is nonetheless “just like” every other kind of cognition. This version of the “modularity” issue asks not whether FL is a domain specific dedicated system (it is!), but whether it employs primitive circuits/operations proprietary to it (i.e. not shared with other cognitive domains). Here ‘domain specific’ means uses basic operations not attested in the other domains of non-linguistic cognition.

Of course, the MP bet is easy to articulate at a general level. What’s hard is to show that it’s true (or even plausible). As I’ve argued before, to collect on this bet requires, first, reducing FL’s internal modularity (which in turn requires showing Binding, movement, control, agreement, etc. are really only apparently different) and, second, showing that this unification rests on cognitively generic basic operations.[4] Believe me when I tell you that this program has been a hard sell.

Moreover, the mainstream Minimalist position is that though this may be largely correct, it is exactly wrong: there are some special purpose linguistic devices and operations (e.g. Merge), which are responsible for Gs’ distinctive recursive property. At any rate, I think the logic is clear so I will not repeat the mantra yet again.

This brings me to the last point I want to make: Avery notes that more often than not positive evidence relevant to fixing a grammatical option is missing from the PLD. In other words, Avery notes that the PLD is in fact even more impoverished than we tend to believe. He rightly notes that this implies that indirect negative evidence (INE) is more important than we tend to think. Now if he is right (and I have no reason to think that he isn’t), then FL must be chock full of domain specific information. Why? Because INE requires a sharp specification of options under consideration to be operative. Induction that uses INE effectively must be richer than induction exploiting only positive data.[5] INE demands a more articulated hypothesis space, not a less articulated one. INE can compensate for poor direct evidence but only if FL knows what absences it’s looking for! You can hear the dogs that don’t bark but only if you are listening for barking dogs. If Avery’s cited example is correct (see here), then it seems that FL is attuned to micro variations, and this suggests a very rich system of very linguistically specific micro parameters internal to FL. Thus, if Avery is right, then FL will contain quite a lot of very domain specific information, and given that this information is logically necessary to exploit INE, it looks like these options must be innately specified and that FL contains lots of innate domain specific information. Of course, Avery may be wrong and those that don’t like this conclusion are free (indeed urged) to reanalyze the relevant cases (i.e. to indulge in some linguistic research and produce some helpful results).
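
The logic of INE can be made concrete with a toy Bayesian sketch. Everything in it is an illustrative assumption rather than an analysis of Avery's example: the two candidate grammars `G_small` and `G_big`, the sentence types `a`, `b` and `c`, the uniform sampling of sentences, and the uniform prior are all hypothetical. The only point is that the absence of `c` counts as evidence because the learner already has `G_big` on the table as an option that predicts `c`.

```python
from fractions import Fraction

# Hypothetical hypothesis space: each grammar is the set of sentence
# types it generates, sampled uniformly (an assumption for illustration).
GRAMMARS = {
    "G_small": {"a", "b"},       # does not generate "c"
    "G_big": {"a", "b", "c"},    # superset grammar that also generates "c"
}

def posterior(data, grammars=GRAMMARS):
    """Posterior over grammars given observed sentence types, uniform prior."""
    prior = Fraction(1, len(grammars))
    scores = {}
    for name, language in grammars.items():
        likelihood = Fraction(1)
        for sentence in data:
            # A grammar that cannot generate an observed sentence is ruled out.
            likelihood *= Fraction(1, len(language)) if sentence in language else 0
        scores[name] = prior * likelihood
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}

# Ten observations, none of them "c": the missing "c" favours G_small.
no_c = posterior(["a", "b"] * 5)
# A single "c" is direct positive evidence and eliminates G_small.
with_c = posterior(["a", "b", "c"])
print(no_c["G_small"], with_c["G_small"])
```

Note that nothing happens here unless both options are specified in advance: delete `G_big` from the hypothesis space and the ten `c`-less sentences tell the learner nothing. That is the sense in which the dogs that don't bark are audible only to someone listening for barking.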

This is a good place to stop. There is an intimate connection between modularity, computationalism, and nativism. Computations can only do useful work where information is bounded. Bounded information is what modules provide. More often than not the information that a module exploits is native to it. MP is betting that with respect to FL, there is less language specific basic circuitry than heretofore assumed. However, this does not imply that FL is not a module (i.e. part of “general intelligence”). Indeed, given the kinds of evidence that Curtiss reviews, it is empirically very likely that FL is a module. And this can be true even if we manage to unify the internal modules of FL and demonstrate that the requisite remaining computations largely exploit domain general computational principles and operations. Avery’s important question remains: how much acquisition is driven by direct and how much by indirect negative evidence? Right now, we don’t really know (at least not to the level of detail that we want). That’s why these are still important research topics. However, the logic is clear, even if the answers are not.

[1] Incidentally, IBT is one of the phenomena that dualists like Descartes pointed to in favor of a distinct mental substance. Dualism, in other words, is roughly the observation that much of thought cannot be mechanized.

[2] It’s important to understand where the problem lies. The problem is not giving a story in specific cases in specific contexts. We do this all the time. The problem is providing principles that select out the IBT antecedent to a specification of the contextually relevant variables. The hard problem is specifying what is relevant ex ante.

[3] Successful unifications almost always win kudos. Think electricity and magnetism, the latter two with the weak force, terrestrial and celestial mechanics, chemistry and mechanics. These all get their own chapters in the greatest hits of science books. And in each case, it took lots of work to show that the desired unification was possible. There is no reason to think that cognition should be any easier.

[4] I include generic computational principles here, so-called first factor computational principles.

[5] In fact, if I understand Gold correctly (which is a toss up), acquiring modestly interesting Gs strictly using induction over positive data is impossible.

24 comments:

Norbert, September 18, 2014 at 5:52 AM
This comment is from Mark Johnson. I am mere midwife here.

Fodor's review (like pretty much all of his work) is very compelling.

I'd like to pick up on a component Fodor points out is likely to be central to any rationalist account: "inference to the best explanation". This is what Bayesian methods aim to describe. Bayesian methods provide a specific account of how "to find and adopt whatever new beliefs are best confirmed on balance" given "what perception presents to you as currently the fact and you’ve got what memory presents to you as the beliefs that you’ve formed till now". Bayesian inference is compatible with Turing computation in general and minimalist representations in particular.

I agree it's still very much an open question whether Bayesian methods can provide a satisfactory account of information integration for things like commonsense knowledge (heck, it's still an open question just what aspects of language Turing computation models can provide a satisfactory account of), but as far as I can tell, Bayesianism or something like it is the best account of information integration we have today.

As you and Avery point out, the relationship between grammatical properties and the evidence available to the learner can be quite indirect. Even if we can find a complex set of "triggers" for these cases, a Minimalist should demand an explanation of why exactly these triggers are associated with exactly these grammatical properties. Bayesian methods provide an account of how primary data (e.g., "triggers") and inferences (e.g., grammatical properties) are connected, and so provide an explanatory framework for studying these aspects of the human language faculty.
Alex Clark, September 19, 2014 at 7:09 AM
"Now if he is right (and I have no reason to think that he isn’t), then FL must be chocked full of domain specific information. Why? Because INE requires a sharp specification of options under consideration to be operative. Induction that uses INE effectively must be richer than induction exploiting only positive data.[5] INE demands more articulated hypothesis space, not less."

I may well be misunderstanding this, but this seems backwards. If you have an algorithm that uses INE, then typically it can learn a larger class of languages than if it doesn't. So often the hypothesis class will be larger (in the INE case). Say the class of all probabilistic automata rather than just the class of reversible ones, to take an easy example.
Norbert, September 20, 2014 at 8:39 AM
Here's another comment from Mark Johnson. Let's admit that there is something amusing about a computational neanderthal like me coming to the aid of a world class CS person like Mark. Think mice pulling thorns from lion paws! At any rate, here's another comment from Mark. Oh yes, it's in 2 parts due to space issues.

It seems to me that everyone agrees there needs to be some connection between properties of the primary linguistic data and the linguistic properties that get identified. (*)

Correct me if I'm wrong, but generative grammarians by and large assume that the relevant properties of the input are Boolean "triggers", i.e., simple patterns that either match an input or not. But as far as I know there's no explanatory theory in generative grammar about how triggers are associated with linguistic properties.

It's of course possible that the connection is innate, i.e., our genes encode a giant matrix of triggers and associated linguistic properties. But this seems very unlikely for at least two reasons. First, this is a huge burden on evolution. Second, if the matrix of triggers/linguistic properties were encoded "independently" in the genome, we'd expect double dissociations; e.g., syndromes where somehow the trigger/linguistic property matrix gets out of sync with the faculty of language. To take a rather outdated example, suppose the Pro-Drop parameter is "triggered" by rich inflection in the input (Mandarin notwithstanding), and this is independently hard-wired into our genetic code. Then we might expect to see an abnormal population where the connection between the trigger and the linguistic property has been disturbed or disrupted somehow, e.g., a subpopulation where rich inflection doesn't trigger Pro-Drop but triggers something else "by mistake" (WH in situ?). But as far as I know we don't see anything like this.

Bayesian and related approaches do provide a systematic, explanatory theory of how to infer linguistic properties from primary linguistic data, and it seems to me that generativists need such an account in order to avoid the problems just raised. I don't know if infants are doing Bayesian belief updating while learning language, but it seems likely that there's a systematic connection between the information they're extracting from the input and the linguistic properties they are inferring.

Norbert, September 20, 2014 at 8:40 AM
Here's part 2:

Bayesian theory is perfectly compatible with the hypothesis that all that's available to the "learner" is information about a set of Boolean "triggers". But as Alex points out, it's not necessary to assume there is an innately-specified set of triggers. Indeed, for a broad class of probabilistic grammars, including probabilistic versions of Minimalist Grammars, these Bayesian methods allow us to extract information from an entire sentence in a way that is optimal in a certain sense. Moreover, this information is extracted as a by-product of the parsing process; something the child is presumably doing anyway.

While Bayesian methods don't require that grammars are probabilistic, in practice Bayesian inference seems to work much better with grammars that are probabilistic. In a sense this is surprising, since the space of probabilistic grammars is "larger" than the space of discrete or categorical grammars (in the sense that the space of discrete grammars can be embedded in the space of probabilistic grammars). However, the size of a space isn't necessarily related to how hard it is to search that space, and in fact problems can often become easier if they are "relaxed", i.e., they are embedded in a larger problem space. "Learning" in the space of probabilistic grammars may be easier than "learning" in the corresponding discrete space because we can take advantage of continuity.
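
The point about relaxation and continuity can be illustrated with a minimal sketch. The setup is a hypothetical one, not anything from Mark's own work: a single categorical parameter (a rule that is either off, 0, or on, 1) is embedded in a continuous rate `theta` between 0 and 1, and the counts `k` and `n` are made-up data. In the relaxed space the likelihood has a gradient, so each small step says which direction is better. With one binary parameter the discrete search is of course trivial, so any real advantage would show up only with many interacting parameters.

```python
import math

def log_likelihood(theta, k, n):
    """Log-likelihood of k pattern-bearing sentences out of n under rate theta."""
    return k * math.log(theta) + (n - k) * math.log(1 - theta)

def fit_relaxed(k, n, theta=0.5, rate=0.01, steps=2000):
    """Gradient ascent on theta in the open interval (0, 1)."""
    for _ in range(steps):
        gradient = k / theta - (n - k) / (1 - theta)
        theta += rate * gradient / n                # scale the step by sample size
        theta = min(max(theta, 1e-6), 1 - 1e-6)     # stay inside (0, 1)
    return theta

# Made-up data: 18 of 20 sentences exhibit the pattern.
theta_hat = fit_relaxed(k=18, n=20)
# Map the relaxed estimate back to the nearest categorical setting.
categorical_setting = round(theta_hat)
print(theta_hat, categorical_setting)
```

The continuous estimate settles near the observed proportion, and rounding it recovers a categorical setting. Whether anything like this scales to realistic grammars is exactly the open question.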

(*) I'd say "learned" here except that I know that word makes Norbert see red. But in fact what statisticians and computer scientists mean by "learning" is almost exactly the same as what generative grammarians mean by "parameter setting".
Avery Andrews, September 21, 2014 at 4:19 PM
The remark I'd like to make about the 2nd to the last paragraph of Norbert's main post is that the apparent domain-specificity of UG is for now only apparent; caused by our near total ignorance of how other things that might be related work, such as for example picking up skills by imitation. Whether it is real or not remains to be discovered. So I would prefer to wake up to a Sciencedaily.com headline saying 'non-linguistic and pre-human use for UG module found' than 'Chomsky's UG refuted', and the exact same work could get either of these two headlines, depending on how we describe what we are doing to the scientifically at-least-semi-literate public.
Trevor Lloyd, September 22, 2014 at 3:33 AM
Another contribution from my left field perspective as outlined very sketchily in my posts of a few days ago on this thread.

The domain-specificity of UG is an open question but, IMHO, there is no doubt about US = Universal Semantics (as outlined earlier). It is not specific to language. US, the structural dimensions of the space of the semantics of word meaning in the form of 24 primitive, abstract, innate semantic features that I call semantic factors, is also the structural dimensions of the space of conceptualisation, of cognition, of human and animal behaviour and of organismal interaction with the environment. The small set of semantic factors (such as materiality, particularity, surface, extension, spatiality, action, positiveness/negativeness, availability/possession, uncertainty/definitiveness) does not belong to a language module. Its demonstrable capacity to form the basic structure of all words is arguably derived from its capacity in prior modalities.

A big question is the relation of US to UG. It seems that the nucleus of UG is merge which I presume is effected by both syntactic and semantic features of words. The set of semantic factors that comprises US would seem to be as critical as syntactic features to the feature-checking of merge in that the compatibility of two merging units is partly dependent on their respective semantic factors.

In terms of Norbert’s two formulas at the beginning, it appears that FX = FL in terms of structural combinatory principles (X being other cognitive and behavioural modalities). These primal cognitive factors, if they are valid, also arguably provide the basis for merge in pre-linguistic human thought phylogenetically and ontogenetically. The cause for the emergence of language needs to be sought elsewhere.