Comments on Faculty of Language: Computations. modularity and nativism

“So, I am confident that how we view events and ot...

2014-09-27T02:10:50.065-07:00

“So, I am confident that how we view events and other animals do is not entirely different. I would also not be surprised to find that many of our concepts coincide with those of other animals.”

Can this be right? Surely a critical criterion of concepts is displacement from stimulus, something only humans possess. Animals can’t have what is generally meant by a concept.

“Isn't it also possible that recursion is part of the "language of thought"? It'd be pretty strange to have a recursive syntax but no recursive LOT. Maybe some other animals have a complex language of thought too?”

LOT must be recursive. But animals can’t have a LOT if they don’t have concepts. I have little idea what their “mental” life might consist of.

Animals possess many kinds of intelligence, some of it of amazing complexity, but this does not imply anything like concepts or LOTs. Concepts must surely be both independent of stimuli and able to be combined in multiple ways. Animals don’t have that faculty.

In other recent posts I have sketched the results of research on word meaning that has identified what I call “dimensions of cognition and behaviour” that are arguably common to the human, animal and organismal spheres, the whole biological domain. These do not require concepts to operate behaviourally but are fundamental to human cognition and language.

It would lead to a different take. But right now i...

2014-09-26T12:05:18.656-07:00

It would lead to a different take. But right now it looks like whatever recursion animals have is nothing like the kind we have, or there is no evidence suggesting that they have what we have. Given this, I still think that Chomsky is right to think that the recursive trick that we find in FL is really the novelty, and the thing to be explained.

Indeed; I think that what I'm really complaini...

2014-09-25T04:06:10.196-07:00

Indeed; I think that what I'm really complaining about is not having any substantial idea of what actually ought to be learnable from what, at least in the domain of complicated structures of the sort traditionally of interest to generative grammarians. And not thinking that the reason for this is that, as usual, there's some important reading I haven't done.

There are certainly various studies here and there, but they don't add up to a coherent picture, it seems to me, or even bits of a real attempt to produce one.

It certainly is interesting to study how things li...

2014-09-24T13:36:59.103-07:00

It certainly is interesting to study how things like this can be learned. But how exactly does a learner "know" that these coordinate NPs and "else's" are triggers for group genetives? The intuition is something like "nothing else could have generated these constructions", and this is exactly what the likelihood component of Bayesian updating formalises.

Isn't it also possible that recursion is part ...

2014-09-23T05:14:42.624-07:00

Isn't it also possible that recursion is part of the "language of thought"? It'd be pretty strange to have a recursive syntax but no recursive LOT. Maybe some other animals have a complex language of thought too? This would lead to a different take on "Darwin's probem".

I think there's a more 'innocent' way ...

2014-09-23T04:17:59.034-07:00

I think there's a more 'innocent' way of understanding triggers than the Rube Goldberg mechanism suggested first by Mark with its capacity for pathological behavior, namely, triggers are simply forms of constructions that appear often enough in the PLD to be learned from, rather than so rare that they are probably 'projected' from the data by UG. I find it a bit odd that I can't point to much in the way of concrete, worked out literature sorting this out in specific cases.

But for a possible example, 'group genitives' where the possessor head is followed by a PP or an NP:

I tripped over the dog in the yard's bone
We left a note in the man who we met yesterday's mailbox

appear to be extremely rare in speech (Denison, Scott & Borjars 2010, The real distribution of the English “group genitive”, Studies in Language 34:532-564), and, running the Stanford PSG parser over 9.25m words of the CHILDES corpus (all of the big ones available in xml format and a number of the medium-sized one), I find exactly 2 examples of the prepositional ones, and 0 of the relative clauses. They found a lot more of the PP ones in the BNC spoken corpus (72 fixed phrases, names or titles, 11 'others'), but still only 2 of the relative clause type out of 10m words, so this would seem very plausibly to be 'projected' from the data.

Whether or not the PP type is projected or constitutes a 'trigger' seems unclear to me; from the CHILDES data, probably not, from the BNC data, probably yes.

But things that do occur often enough to be triggers would be coordinate NPs with one POS at the end, and indefnite pronouns followed by "else's" (somebody else's, who else's, nobody else's), where I find 57 of each in the CHILDES data (a rather strange coincidence), given about 6 per million words, which people seem to regard as clearly enough to constitute a trigger.

So although I don't think we can definitely say what is learned vs not learned, we can at least begin to try to organize some information as to what things are likely to be learned/function as triggers, vs more, much more, or virtually certain to be projected from what is learned.

2014-09-22T03:35:32.347-07:00

This comment has been removed by the author.

Another contribution from my left field perspectiv...

2014-09-22T03:33:23.253-07:00

Another contribution from my left field perspective as outlined very sketchily in my posts of a few days ago on this thread.

The domain-specificity of UG is an open question but, IMHO, there is no doubt about US = Universal Semantics (as outlined earlier). It is not specific to language. US, the structural dimensions of the space of the semantics of word meaning in the form of 24 primitive, abstract, innate semantic features that I call semantic factors, is also: the structural dimensions of the space of conceptualisation, of cognition, of human and animal behaviour and of organismal interaction with the environment. The small set of semantic factors (such as, materiality, particularity, surface, extension, spatiality, action, positiveness/negativeness, availability/possession, uncertainty/definitiveness) does not belong to a language module. Its demonstrable capacity to form the basic structure of all words and is arguably derived from its capacity in prior modalities.

A big question is the relation of US to UG. It seems that the nucleus of UG is merge which I presume is effected by both syntactic and semantic features of words. The set of semantic factors that comprises US would seem to be as
critical as syntactic features to the feature-checking of merge in that the compatibility of two merging units is partly dependent on their respective semantic factors.

In terms of Norbert’s two formulas at the beginning, it appears that FX = FL in terms of structural combinatory principles (X being other cognitive and behavioural modalities). These primal cognitive factors, if they are valid, also arguably provide the basis for merge in pre-linguistic human thought phylogenetically and ontogenetically. The cause for the emergence of language needs to be sought elsewhere.

I am pretty sure that parts of UG will not be prom...

2014-09-21T17:26:43.755-07:00

I am pretty sure that parts of UG will not be prominent in other domains and parts will be. It would be nice to try pricing these apart. I confess to thinking that the kind of recursion we find in language will be special and will not find reflection in other domains. However, I can imagine that other aspects of language will be similar to what we find elsewhere. So, I am confident that how we view events and other animals do is not entirely different. I would also not be surprised to find that many of our concepts coincide with those of other animals. This is all a gesturer to the wide vs narrow conception of language that Chomsky, Hauser, Fitch noted, and that it would be good to find a way of articulating. These are conceivable projects. Would be nice if someone started thinking about them.

The remark I'd like to make about the 2nd to t...

2014-09-21T16:19:02.722-07:00

The remark I'd like to make about the 2nd to the last paragraph of Norbert's main post is that the apparent domain-specificity of UG is for now only apparent; caused by our near total ignorance of how other things that might be related work, such as for example picking up skills by imitation. Whether it is real or not remains to be discovered. So I would prefer to wake up to a Sciencedaily.com headline saying 'non-linguistic and pre-human use for UG module found' than 'Chomsky's UG refuted', and the exact same work could get either of these two headlines, depending on how we describe what we are doing to the scientifically at-least-semi-literate public.

@MJ: Two points: FIrst, I don't think that it ...

2014-09-21T09:45:13.815-07:00

@MJ:
Two points:
FIrst, I don't think that it is assumed that learning lexical items (LI) is easy. We know from people like Lila that learning the first 50 LIs is anything but. We also have some reason (deriving from her recent research) to think that the process is not well modeled along Bayesian lines. Not that one couldn't kick it into that format, but that it really doesn't look anything like what we would initially expect. The link between parsing and acquisition goes back at least to Berwick's thesis and has always fit well with GG approaches to the issue, so there is nothing there to worry a GGer. However, generally speaking parsing needs grammars so it is likely that parsing is as much a by product of learning as learning is a by product of parsing.

Second: I invite you to précis your more recent work on Bayes and learning for the blog. If you agree, send me a 3-4 pager and I will post it for all to see. It would be nice to see your views illustrated for the unwashed (like me). Interested?

Another addition by the "learned" MJ fro...

2014-09-21T09:36:51.404-07:00

Another addition by the "learned" MJ from down under. I, once more, am mere conduit.

Many different mechanisms can be used to express cross-linguistic variation. I gather that the currently fashionable approach locates variation in the lexicon, in particular, in the inventory of functional categories present in the lexicon of a language. Although it's often supposed that acquiring lexical entries is easy, that's not necessarily the case, especially if the lexical entry concerned is sufficiently abstract (e.g., phonologically null). For example, determining whether a language's lexicon contains a functional element permitting WH in-situ in a lexicalist account is presumably just as hard (or easy) as setting a WH in-situ parameter in a parameter-setting account.

As I understand it, a "triggers" approach is the received view of how such parameters are set or such abstract functional categories are acquired. As I explained in my previous post, I'd expect a Minimalist wouldn't be happy with a "triggers" account of acquisition because there's no explanation for how "triggers" are associated with whatever it is that is acquired. A Bayesian account is certainly more minimalist than a "triggers" account: there's a single principle governing how hypotheses (e.g., that a language has a certain parameter setting, or has a certain lexical entry) should be updated in the face of data. The same principle -- Bayesian belief updating -- accounts for the acquisition of the lexicon and whatever is responsible for linguistic variation. Moreover, the information required for Bayesian belief updating is computed as a by-product of the parsing process. This suggests that the child's goal is comprehension, and acquisition happens as a by-product.

The paper I presented at the International Congress of Linguists last year in Geneva (on my web page) explains all this in more detail. I have another paper draft that applies the ICL paper's Bayesian approach to a probabilistic version of Stabler's Minimalist Grammar and shows that it's possible to acquire more abstract properties such as do-support and verb-second as well as lexical entries; I can post that as well if there's interest.

(See -- I almost wrote an entire post without using the word "learn"! Surely you could get used to the idea that "learn" is used in a technical sense in computational models to mean "parameter setting". It's not as if generative grammar has never used common words in a technical sense.)

@ Mark The notion of 'trigger' is used in...

2014-09-20T08:54:33.754-07:00

@ Mark

The notion of 'trigger' is used in the context of PoS arguments to signal the fact that some of what competent speakers know was not acquired (and yes I do hate the term "learning" for it invites a specific conception of how input and end state relate) on the basis of instances of the relevant principle. So, for example, we don't acquire the ECP (I am assuming it is a universal here) by witnessing useful instances of the ECP at work "in the data," (positive or negative). Thus the data triggers this fact about our competence (in that we would not necessarily have acquired the ECP if not exposed to PLD at all) rather than "shapes" it. This, at least, is how I understand the trigger notion.

The evo problem then is not with triggers per se, but the fact of parameters altogether. Are these built in to FL or not? In other words, is there a finite (albeit large) number of ways that Gs can differ and if so are these innately specified? In other words, are the parameters endogenous? It has been very hard to specify relevant parameters (those that make contact with the data) that are not very linguistically specific (this was Avery's point). Thus it would seem that parameters are both innately specified and domain specific. This is not a great thing from an MP perspective. I would love to see these cases reanalyzed. Right now, I don't know how to do this. That's my problem. I'd love to hear that Bayesians can solve it. But from what I've seen, they cannot. They build the same options into their systems as I do and from where I sit I don't see that this solves the problem. Please tell me I am wrong about this. I would be delighted if that were so.

Here's aprt 2: Bayesian theory is perfectly c...

2014-09-20T08:40:24.892-07:00

Here's aprt 2:

Bayesian theory is perfectly compatible with the hypothesis that all that's available to the "learner" is information about a set of Boolean "triggers". But as Alex points out, it's not necessary to assume there is an innately-specified set of triggers. Indeed, for a broad class of probabilistic grammars, including probabilistic versions of Minimalist Grammars, these Bayesian methods allow us to extract information from an entire sentence in a way that is optimal in a certain sense. Moreover, this information is extracted as a by-product of the parsing process; something the child is presumably doing anyway.

While Bayesian methods don't require that grammars are probabilistic, in practice Bayesian inference seems to work much better with grammars that are probabilistic. In a sense this is surprising, since the space of probabilistic grammars is "larger" than the space of discrete or categorical grammars (in the sense that the space of discrete grammars can be embedded in the space of probabilistic grammars). However, the size of a space isn't necessarily related to how hard it is to search that space, and in fact problems can often become easier if they are "relaxed", i.e., they are embedded in a larger problem space. "Learning" in the space of probabilistic grammars may be easier than "learning" in the corresponding discrete space because we can take advantage of continuity.

(*) I'd say "learned" here except that I know that word makes Norbert see red. But in fact what statisticans and computer scientists mean by "learning" is almost exactly the same as what generative grammarians mean by "parameter setting".

Here's another comment from Mark Johnson. Let&...

2014-09-20T08:39:59.646-07:00

Here's another comment from Mark Johnson. Let's admit that there is something amusing about a computational neanderthal like me coming to the aid of a world class CS person lie Mark. Think mice pulling thorns from lion paws! At any rate, here's another comment from Mark.Oh yes, it's in 2 parts due to space issues.

It seems to me that everyone agrees there needs to be some connection between properties of the primary linguistic data and the linguistic properties that get identified. (*)

Correct me if I'm wrong, byt generative grammarians by and large assume that the relevant properties of the input are Boolean "triggers", i.e., simple patterns that either match an input or not. But as far as I know there's no explanatory theory in generative grammar about how triggers are associated with linguistic properties.

It's of course possible that the connection is innate, i.e., our genes encode a giant matrix of triggers and associated linguistic properties. But this seems very unlikely for at least two reasons. First, this is a huge burden on evolution. Second, if the matrix of triggers/linguistic properties were encoded "independently" in the genome, we'd expect double dissociations; e.g., syndromes where somehow the trigger/linguistic property matrix gets out of sync with the faculty of language. To take a rather outdated examples, suppose the Pro-Drop parameter is "triggered" by rich inflection in the input (Mandarin not withstanding), and this is independently hard-wired into our genetic code. Then we might expect to see an abnormal population where the connection between the trigger and the linguistic property has been disturbed or disrupted somehow, e.g., a subpopulation where rich inflection doesn't trigger Pro-Drop but triggers something else "by mistake" (WH in situ?). But as far as I know we don't see anything like this.

Bayesian and related approaches do provide a systematic, explanatory theory of how to infer linguistic properties from primary linguistic data, and it seems to me that generativists need such an account in order to avoid the problems just raised. I don't know if infants are doing Bayesian belief updating while learning language, but it seems likely that there's a systematic connection between the information they're extracting from the input and the linguistic properties they are inferring.

I agree with that! (at least under the standard mo...

2014-09-19T14:20:06.881-07:00

I agree with that! (at least under the standard modeling assumptions) I was only objecting to the idea that there is some important difference in this respect between INE-using algorithms and ones that don't look at frequency data. They seem in the same boat.

The method must rely on a specification of the hyp...

2014-09-19T08:36:57.395-07:00

The method must rely on a specification of the hypothesis space and the weightings of alternative specified therein. Thus, all the possible alternatives are specified and this is what enables you to calculate increases in likelihood. Now do this without a specification of the possible outcomes. How far do you get? Once one is in for probabilities then one is already in for hypotheses spaces and these must be pre-specified for INE to be operative. No such options, no probabilities and no INE.

That's not how it works. Typically, you have a...

2014-09-19T07:46:02.442-07:00

That's not how it works. Typically, you have a probabilistic grammar and you apply some method that increases the likelihood of the data, and this has the effect of assigning probability zero to types of things you don't see (yes, any experts reading, this is an oversimplification in certain important respects). You don't need to explicitly have a model of the things that are absent.

Tracking what's not there is very hard unless ...

2014-09-19T07:30:23.775-07:00

Tracking what's not there is very hard unless you know what you are looking for. In the absence of a specification of what to look for there are way too many things that are absent: prime numbered DPs, VPs with 'unicorn' complements, TPs with complex subjects, etc. Of course, if you KNOW what you are looking for AND YOU DO NOT FIND IT, well, that might be interesting. But absent a specification of the potential target, INE is impossible. Thus to make use of INE you need a pretty rich set of specified alternatives that you are canvassing and as these underlie learning and are not the result thereof, they mast be pre-specified (i.e. innate).

"Now if he is right (and I have no reason to ...

2014-09-19T07:09:12.563-07:00

"Now if he is right (and I have no reason to think that he isn’t), then FL must be chocked full of domain specific information. Why? Because INE requires a sharp specification of options under consideration to be operative. Induction that uses INE effectively must be richer than induction exploiting only positive data.[5] INE demands more articulated hypothesis space, not less."

I may well be misunderstanding this, but this seems backwards. If you have an algorithm that uses INE, then typically it can learn a larger class of languages than if it doesn't. So often the hypothesis class will be larger (in the INE case). Say the class of all probabilistic automata rather than just the class of reversible ones, to take an easy example.

Here's part 2 of Trevor Lloyd's comment: ...

2014-09-18T10:33:09.655-07:00

Here's part 2 of Trevor Lloyd's comment:

Knowledge of the factors enables us to build a full 3-part model of word/concept meaning. Configurations of factors are the central constituent. They form the structure of what we learn and what we know in an instant that has the form of a mental gestalt in one or more modalities. (A semantic gestalt is a highly condensed, multi-part, isomorphic representation of an entity). The structure of the gestalt is not fully opaque and can be identified when the factors are known. The third part is World Knowledge. This is personal to a concept holder and is in linguistic form but the configuration of factors is universal. All of these elements are intuitively accessible.

I will illustrate the value of the factors by considering a child’s experience in learning the concept KEY. If it handles a key at an early age the child forms a visual/tactile gestalt with a factoral structure. (To spell this out at this stage would take too much space). That will be the extent of its early concept. Later it will learn what keys are used for which will require structural modification to the gestalt. Later still it may build its world knowledge component of the concept (how they work in locks etc.) that will enable it to use the word in a versatile manner.

With this knowledge of the nature of concepts it is easy to see, 1. that learning is a drawn-out process, 2. hypothesis formation is not the best description for it, 3. experiential information is crucial, 4. there is no evidence of an innate concept but strong evidence for innate structural constituents of all non-complex concepts. All of this except the reference to the factors is consistent with our normal understanding of the acquisition process. I wonder what Fodor’s response would be.

A little more about the factors. They are neurally intelligible as they are all arguably instantiated in specific somatotopic regions of the human and animal brain – sensorimotor, kinesthetic, proprioceptive and interoceptive. A key feature is that the factors are of two main types, one physical parameters and the other affective/normative parameters.

Although the factors have been derived from the lexicon they are not originally linguistic. They apply equally to concepts, cognition, memory, human behaviour and experience. Surprisingly they also can be seen to operate, not just in animal cognition, but across the whole field of biology. This capacity comes from their dimensional character. In relation to biology I describe the factors as 'the dimensions of the space of the interaction of organisms and their environments'. . In primitive organisms their dimensionality operates in an ultimately basic way as the parameters of the possibility of life.

These primitives identified in the lexicon are indeed extraordinary.

I am well aware that this brief description will stretch credibility to the limit but the empirical evidence for the factors in language is very robust, in my view. I am engaged in writing this up in book form at present but if you or your contributors are interested I can provide more information.

Here's a comment from Trevor Lloyd. He had tro...

2014-09-18T10:32:38.738-07:00

Here's a comment from Trevor Lloyd. He had trouble posting so I have acted as go-between. I have divided it into 2 parts for technical reasons.

I hope you will permit me to make a contribution to this post. It is also relevant to its immediate precursors. My field is the semantics of word meaning and my knowledge of GG is not extensive but many of the issues discussed here are of interest to me and I am confident I can make a contribution to the discussion, especially on concept formation and PoS, based on a newly discovered aspect of word and concept meaning.

I have identified in the lexicon something that has largely escaped attention. This is a canonical set of abstract semantic features that are ostensibly non-linguistic, innate and universal. I will not describe the discovery procedure here but I believe the results are robust empirically. I call this small set of features ‘semantic factors’. There are just 24 semantic factors on current estimation but each splits into several sub-factors that make their meanings more explicit. They are quite different from the types of semantic features proposed by the early componentialists, Jackendoff, Pustejovsky or the semantic primitives of Anna Wierzbicka, or the feature norms proposed by Ken McRae.

The factors provide the structure of the meaning of all (yes, all) words. They can be called ‘the alphabet of word meaning’ or the elements of a Universal Semantics. This is testimony to their level of primitiveness. There are just 24 semantic factors on my current estimation but each can splits into several sub-factors that make their meanings more explicit. The factors do not provide full content. That requires other elements that build on the factoral structure.

The value of the factors is that they provide for the first time a way to identify the semantic contents of words/concepts in a principled way. It is unfortunate that the controversy about concept formation has raged without any firm knowledge of what concepts are. The factors and the other elements of word meaning provide a new basis for discussing the issue.

@ Mark J Yes, Bayes might be very helpful here. T...

2014-09-18T07:44:56.823-07:00

@ Mark J

Yes, Bayes might be very helpful here. This is consonant with some nice work by Jeff Lidz and Annie Gagliardi (discussed here: http://facultyoflanguage.blogspot.com/search?q=Lidz+and+gagliardi). In fact, so far as I can tell, it is roughly the approach proposed in Aspects chapter 1 (p. 31-2) with a few probabilities thrown in. The problem that Chomsky later noted for dumping this approach is that it turned out to be very hard (and still is) to array Gs wrt to some general (simplicity) measure and so it was hard to operationailze the Aspects proposal. He called this the feasibility problem. It was possible to come up with ways of evaluating Gs pairwise, but not more generally and absent this it was hard to see how to make the evaluation metric work. It strikes me that Bayes needs something analogous to get off the ground, no? Second, even if we had this, there are computational issues with Bayes being able to actually carry out the relevant computations when the space of options gets "big" (with what 'big' means also an issue). Again this speaks to feasibility.

That said, I agree (and I think Chomsky would) that Bayes methods per se are perfectly compatible with what GGer wish to provide and it is not Bayes that is the problem in integrating the two kinds of work but the rather silly associationsim that often comes packaged with Bayes like suggestions. Absent this, there is no reason for GGers to eschew Bayes like models. Indeed at UMD, we welcome the attempts to do just that.

This comment is from Mark Johnson. I am mere midwi...

2014-09-18T05:52:58.041-07:00

This comment is from Mark Johnson. I am mere midwife here.

Fodor's review (like pretty much all of his work) is very compelling.

I'd like to pick up on a component Fodor points out is likely to be central to any rationalist account: "inference to the best explanation". This is what Bayesian methods aim to describe. Bayesian methods provide a specific account of how "to find and adopt whatever new beliefs are best confirmed on balance" given "what perception presents to you as currently the fact and you’ve got what memory presents to you as the beliefs that you’ve formed till now". Bayesian inference is compatible with Turing computation in general and minimalist representations in particular.

I agree it's still very much an open question whether Bayesian methods can provide a satisfactory account of information integration for things like commonsense knowledge (heck, it's still an open question just what aspects of language Turing computation models can provide a satisfactory account of), but as far as I can tell, Bayesianism or something like it is the best account of information integration we have today.

As you and Avery point out, the relationship between grammatical properties and the evidence available to the learner can be quite indirect. Even if we can find a complex set of "triggers" for these cases, a Minimalist should demand an explanation of why exactly these triggers are associated with exactly these grammatical properties. Bayesian methods provide an account of how primary data (e.g., "triggers") and inferences (e.g., grammatical properties) are connected, and so provide an explanatory framework for studying these aspects of the human language faculty.