Comments


Thursday, March 17, 2016

Crows

Eric Raimy sent me this link to a piece on the neuro of crows (here). The piece argues that cog-neuro should add corvids to the list of "model organisms" (a distinguished group: zebra fish larvae, C. elegans worms, fruit flies and mice). Why? Well, that's why I am linking to this piece. The reasoning is interesting for linguists. Let me indulge myself a bit.

There is a common assumption within linguistics that more is better. In particular, the more languages we study the better it is for linguistics. The assumption is that the best way to study what linguists should study is by looking at more and more languages. Why do we think this? I am not sure. Here are two possible reasons.

First, linguistics is the study of language and thus the more languages we study the further we advance this study. There are indeed some linguists who so conceive of the enterprise. I am not one of them. I am of the opinion that for modern GG the object of study is not languages but the faculty of language (FL), and if this is what you aim to understand, then the idea that we should study more and more languages, on the assumption that each language studied advances our insight into the structure of FL, needs some argument. It may, of course, be correct, but it needs an argument.

One possible argument is that unless we study a wide variety of languages we will not be able to discern how much languages vary (the right parameters) and so will mistake the structure of the invariances. So, if you want to get the invariant properties of FL right, you need to control for the variation, and this can only be done by wide cross-linguistic investigation. Ergo, we must study lots of languages.

I am on record as being skeptical that this is so. IMO, what we have found over the last 30 years (if not longer) is that languages do not vary that much. The generalizations that were discovered mainly on the basis of a few languages seem to have held up pretty well over time. So, I personally find this particular reason on the weak side. Moreover, the correct calculation is not whether cross-linguistic study is ever useful. Of course it is. Rather, the question is whether it is a preferred way of proceeding. It is very labor intensive and quite hard. So we need to know how big the payoffs are. So, though there need be nothing wrong with this kind of inquiry, the presupposition that this is the right way to proceed and that every linguist ought to be grounded in work on some interesting (i.e. not English) language goes way beyond this anodyne prescription.

Note that the author of the Nautilus piece provides arguments for each of the model animals. Zebra fish larvae and C. elegans are there because it is easy to look into their brains. Fruit flies and mice have "easily tweakable genes." So, wrt the project of understanding neural mechanisms, these are good animals to study. Note that the assumption is that the mechanisms are largely the same across animals and so we choose the ones to study on purely pragmatic grounds. Why add the corvid? Precisely because it raises an interesting question about what the neocortex adds to higher cognition. It seems that corvids are very smart but have none. Hence they are interesting.

The linguistic analogue of this sort of reasoning should be obvious. We should study language X because it makes, say, binding easy to study, marking in overt morphological form the underlying categories that we are interested in. Or, we should study language X because it shows the same profiles as language Y but, say, without overt movement, hence suggesting that we need to refine our understanding of movement. There are good pragmatic reasons for studying a heretofore un(der)studied language. But note, these are pragmatic considerations, not principled ones.

Second, that's what linguists are trained to do, so that's what we should do. This is, I am sure we can all agree, a terrible argument. We should not be (like psychologists) a field that defines itself by the tools that it exploits. Technology is good when it embodies our leading insights. Otherwise it is only justifiable on pragmatic grounds. Linguistics is not the study of things using field methods. It is the study of FL and field methods are useful tools in advancing this study. Period.

I should add that I believe that there are good pragmatic reasons for looking at lots of languages. It is indeed true that at times a language makes manifest on the surface pieces of underlying structure that are hard to discern in English (or German or French, to name just two obvious dialects of English). However, my point here is not to dismiss cross-linguistic work, but to argue against the assumption that this is obviously a good thing to do. Not only is this far from evident, IMO, but it also is far from clear to me that intensive study of a single language is less informative than the extensive study of many.

Just to meet expectations, let me add that I think that POS considerations, which are based on the intensive study of single Gs, are a much underused tool of investigation. Moreover, results based on POS reasoning are considered far more suspect than are those based on cross-linguistic investigation. My belief is that this has things exactly backwards. However, I have made this point before, so I will not belabor it now.

Let me return to the linked paper and add one more point. The last paragraph is where we find the argument for adding corvids to our list of model animals (Btw, if corvids are that interesting and smart, there arises a moral issue of whether we should be subjecting them to neuro experiments that amount to torture. I am not sure that we have a right to treat them so).

If, as Nieder told me, “the codes in the avian NCL and the mammalian PFC are the same, it suggests that there is one best neuronal solution to a common functional problem”—be it counting or abstract reasoning. What’s fascinating is that these common computations come from such different machinery. One explanation for this evolutionary convergence could be that—beyond some basic requirements in processing—the manner in which neurons are connected does not make much difference: Perhaps different wiring in the NCL and PFC still somehow leads to the same neural dynamics.

The next step in corvid neuroscience would be to uncover exactly how neurons arrive at solutions to computational challenges. Finding out how common solutions come from different hardware may very well be the key to understanding how neurons, in any organism, give rise to intelligence.

So, what makes corvids interesting is that they suggest that the neural code is somewhat independent of neural architecture. This kind of functionalism was something that Hilary Putnam was one of the first to emphasize. Moreover, as Eric noted in his e-mail to me, it is also the kind of thing that might shed light on some Gallistel like considerations (the kind of information carried is independent of the kind of nets we have which would make sense if the information is not carried in the net architecture).

To end: corvids are neat! Corvid brains might be neurally important. The first is important for youtube, the second for neuroscience. So too in linguistics.






Monday, June 22, 2015

I admit it

I admit it: until very recently when I heard “morphology” I reached for my pillow.  I knew that I was supposed to find it all very interesting, but like cod liver oil, knowing this did not make ingesting it any more pleasant. Moreover, I even developed a spiel that led to the conclusion that I did not have to be interested in it for it was at right angles to those questions that gripped me (and you all know what these are, viz. PP and DP and Empiricism/Rationalism, and FL and UG etc.). Morphology was not as obviously relevant to these questions because much of it dealt with finite (often small) exception-full paradigms. So many rules, so many exceptions, so many data points. Is there really a PoS problem here? Not obviously. So, yes morphology exists and is abundant, but does language really need morphology or is it just an excrescence? At any rate, despite some questions at a very general level (here and here), I was able to divert my gaze and convince myself that I had reason to do so. Then along came Omer thrusting Bobaljik into my hands and I am here to admit that I was completely wrong and that this stuff is great. You may know all about it, but partly as penance, let me say how and why I was wrong and why this stuff is very interesting even for someone with my interests.

Those of you that are better read than I am already know about Jonathan Bobaljik’s (JB) work on the morphology of superlatives. He has a book (here) and a bunch of papers (e.g. here).[1] I want to talk a little about one discovery that he has made that provides a novel (as JB himself notes) take on the classic PoS argument. It should be part of anyone’s bag of PoS examples that gets trotted out when you want to impress family, friends, students, and/or colleagues. It’s simple and very compelling. I have road tested it, and it works, even with those that know nothing about linguistics.

The argument is very simple. The fact concerns morphological patterns one finds when one examines morphological exceptions. In other words, it rests on the discovery that exceptions can be regular. A bunch of languages (though by no means all) can form comparatives and superlatives from base adjectival forms with affixes. English is a good example of one such language. It provides trios such as big, bigg-er, bigg-est and tall, tall-er, tall-est. Note that in these two examples, big and tall are part of the comparative -er form and the superlative -est form. This is the standard pattern in languages that do this kind of thing. Interestingly, there are exceptions. So in English we also find trios like good, bett-er, be-st and bad, worse, wor-st where the comparative and superlative forms are not based on the same base as the simple adjectival form. In other words, the comparative and superlative are suppletive. There are lots of technical ways of describing this, but for my purposes, this suffices.  Here’s what JB established (of course, based on the work of others that JB copiously cites): that if the comparative is suppletive, then so is the superlative. More graphically, if we take the trio of forms as Adj/Comp/Super, we find AAA patterns, ABB patterns and even ABC patterns but we find no ABA patterns and very very few (maybe none?) AAB patterns.[2] JB’s question is why not? And a very good question this is.

JB argues that this follows from how superlatives are constructed and how suppletion reflects the Elsewhere Principle. The interested reader should read JB, but the basic proposal is that superlatives have comparatives as structural subparts.[3] How one pronounces the subpart then has an effect on how one can pronounce the larger structure. So, in effect, if the comparative is suppletive and it is part of the structure of the superlative, then the superlative must be suppletive as well given something like the Elsewhere Principle. This accounts for the absence of the ABA pattern. Explaining the absence of the AAB pattern takes a few more assumptions concerning the locality of morphological operations.[4]
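Since the logic is easy to lose track of, here is a minimal sketch (in Python, and emphatically not JB's actual formalism) of how an Elsewhere-style preference for the most specific vocabulary item rules out ABA once the superlative structurally contains the comparative. The rule inventory and structural assumptions are toy stand-ins of my own; deriving the actual English form be-st rather than bett-est would require further contextual allomorphy that the sketch ignores.

# Toy vocabulary insertion with an Elsewhere-style preference:
# a suppletive item specified for comparative contexts also matches inside
# the superlative, since the superlative is modeled as [[ADJ CMPR] SPRL].

REGULAR = {"good": "good", "big": "big"}      # default (elsewhere) roots
SUPPLETIVE = {("good", "CMPR"): "bett"}       # suppletive root for comparative contexts

def spell_out(root, degree):
    """degree is '', 'CMPR', or 'SPRL'; SPRL structurally contains CMPR."""
    contains_cmpr = degree in ("CMPR", "SPRL")
    # Elsewhere Principle: the more specific (suppletive) item beats the default.
    if contains_cmpr and (root, "CMPR") in SUPPLETIVE:
        base = SUPPLETIVE[(root, "CMPR")]
    else:
        base = REGULAR[root]
    return base + {"": "", "CMPR": "-er", "SPRL": "-est"}[degree]

for root in ("good", "big"):
    print(root, "->", [spell_out(root, d) for d in ("", "CMPR", "SPRL")])
# good -> ['good', 'bett-er', 'bett-est']   (ABB; ABA is simply underivable here)
# big  -> ['big', 'big-er', 'big-est']      (AAA; gemination in bigg-er is ignored)

Because the suppletive item can only be stated for the comparative context, and that context is contained in the superlative, any root that goes suppletive in the comparative automatically goes suppletive in the superlative: ABB or ABC, never ABA.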

All of this may or may not be correct (I am no expert) but it is all very interesting and very plausible. Let’s return to the PoS part of the argument.  JB notes several very interesting properties of these patterns.

First, the pattern (especially the *ABA gap) is linguistically very robust. It occurs in languages where the morphology makes it clear that comparatives are part of superlatives (Czech) and in those where this is not at all evident on the surface (English). Thus, whatever is responsible for the *ABA pattern cannot be something that is surface detectable from inspecting morphologically overt forms. In particular, that it holds in languages like English does not follow from the fact that the comparative-within-superlative structure is evident in English forms. It isn’t. So that *ABA holds quite generally, even when there is no surface evidence suggesting that the superlative contains a comparative subpart, suggests that the postulated nested relationship between comparatives and superlatives drives the overt morphology, rather than the other way around. And this, JB notes, strongly suggests that the *ABA gap implicates some fundamental feature of FL/UG.

Note, incidentally, this is an excellent example where the G of language A can ground conclusions concerning the G of language B, something that only makes sense in the context of a commitment to some form of Universal Grammar. The G of Czech is telling us something about the G of English, which borders on the absurd unless one thinks that human Gs as such can have common properties (viz. a commitment to UG in some form).[5]

Second, suppletion of the relevant sort is pretty uncommon within any single language. So, in English there are only two such suppletive trios (for good and bad). So too in other languages (e.g. Hungarian, Estonian, Persian, Sanskrit, Modern Greek and Portuguese have one suppletive form each).[6] Consequently, the gap only emerges if one looks across a large variety of languages and notes that the *ABA pattern never appears in any of them.

Let me stress these points: (i) the pattern is surface invisible in many languages (e.g. English) in that the words that are relevant to finding it do not wear the pattern on their morphological sleeves. (ii) Moreover, the absent pattern occurs in an exceptional part of the language, suppletions being exceptions to the more clearly rule governed part of the morphology. And (iii) suppletions are rare both within a language and across languages. The numbers we are looking at are roughly 200 forms over 50 or so languages.  Nonetheless, when all of these exceptions from across all of these languages are examined, the absence of ABA patterns shines through clearly. So, there is a kind of threefold absence of relevant data for the child: the pattern is surface invisible in many languages, suppletion triplets are rare in any given language and the pattern is only really well grounded when one considers this small number of exceptional cases across a good number of languages. The PoSity (haha) of relevant evidence is evident. As JB rightly concludes, this begs for an FL/UG explanation.

Third, as JB notes, the absence really is the absence of a pattern. Here’s what I mean. Even among languages in close contact the pattern cannot be accounted for by pointing to roots and morphemes that express these patterns shared across the languages. The reason is that the relevant roots and morphemes in language A are not those that express it in language B even in cases where A and B are geographical and historical neighbors. So there is no plausible account of *ABA in terms of borrowing forms across Gs either geographically or historically local to one another.[7]

As JB eloquently sums things up (here: p 16):

…a growing body of research…finds…order in chaos – robust patterns of regularity that emerge as significant in their cross-linguistic aspect. Systematic gaps in these attested patterns…point to the existence of grammatical principles that abstract away from the peculiarities of individual words in specific languages and restrict the class of possible grammars.

This argument combines classic PoS themes with one important innovation. What’s classic is zeroing in on the absence of some grammatical possibility. UG is invoked to explain the gaps, the exceptions to the observed general patterns. UG is not generally invoked to explain what is visible, but what fails to bark. What is decidedly new, at least to me, is that the relevant pattern is only really detectable across Gs. This is the data the linguist needs to establish *ABA. There is no plausible world in which this kind of data is available in any child’s PLD. Indeed, given the rarity of these forms overall, it is hard to see how this pattern could be detected by linguists in the absence of extensive comparative work.

As noted, JB provides a theory of this gap in terms of the structure of superlatives as containing comparatives plus the Elsewhere Principle. If UG requires that superlatives be built from the comparative and UG adopts the Elsewhere Principle then the *ABA hole follows. These assumptions suffice to provide an explanatorily adequate theory of this phenomenon. However, JB decides to venture into territory somewhat beyond explanatory adequacy and asks why this should be true. More particularly, why must superlatives be built on top of comparatives? He speculates, in effect, that this reflects some more general property; specifically a principle

… limiting the (semantic) complexity of functional morphemes. Perhaps the reason there can be no true superlative (abstract) morpheme combining directly with adjectives without the mediation of a comparative element is that the superlative meaning “more X than all others” contain two semantically rich elements: the comparative operator and a universal quantifier. Perhaps UG imposes a condition that such semantically rich elements must start out as syntactic atoms (i.e. X0 nodes). (here p. 6)

It is tempting to speculate that this proposal is related to the kind of work that Hunter, Pietroski, Halberda and Lidz have done on the semantic structure of most (discussed here). They note that natural language meanings like to use some predicates but not others in expressing quantificational meanings. Their analysis of most critically involves a comparative and a universal component and shows how this fits with the kinds of predicates that the analog number + visual system prefers. One can think of these, perhaps, as the natural semantic predicates and, if so, this might also relate to what JB is pointing to. At any rate, we are in deep waters here, just the kind of ideas that a respect for minimalist questions would lead one to explore.

Let me end with one more very interesting point that JB makes. He notes that the style of explanation developed for the *ABA gap has application elsewhere. When one finds these kinds of gaps, these kinds of part-whole accounts are very attractive. There seem to be other morphological gaps of interest to which this general kind of account can be applied and they have very interesting implications (e.g. case values may be highly structured). At any rate, as JB notes, with this explanatory schema in hand, reverse engineering projects of various kinds suggest themselves and become interesting to pursue.

So let me end. To repeat, boy was I wrong. Morphology provides some gorgeous examples of PoS reasoning, which, moreover, are easy to understand and explain to neophytes. I would suggest adding the *ABA MLG to your budget of handy PoS illustrations. It’s great stuff.


[1] I also have a copy of an early draft of what became part of the book that I will refer to here. I will call this JB-draft.
[2] Is Latin the only case of an ABC pattern? JB probably said but I can’t recall the answer.
[3] This is a classical kind of GG explanation: why do A and B share a property? Because A is a subpart of B. Or: why if B then A? Because A is part of B. Note that this only really makes sense as an explanatory strategy if one is willing to countenance abstract form that is not always visible on the surface. See below.
[4] I won’t discuss this case here, but it clearly bears on question 32 here. Should we find the same locality conditions conditioning morphological and syntactic operations, this would be an interesting (indeed very interesting) reason for treating them the same.
[5] David Pesetsky made just this point forcefully in Athens.
[6] See JB-draft:15.
[7] JB-draft (15-16) notes that neither roots nor affixes that induce the suppletion are preserved across neighboring or historically related languages.

Tuesday, May 27, 2014

A game I play

Every now and then I play this game: how would Chomsky respond?  I do this for a variety of reasons. First, I respect his smarts and I think it is interesting to consider how things would look from his point of view. Second, I have found that trying to understand his position, even when it appears foreign to my way of thinking, has been useful for me in clarifying my own ideas. And third, because given Chomsky's prominence in the field and his influence on how the world views the efforts of GG, it is useful to know how he would defend a certain point of view even if he himself doesn't (or hasn't) defended it in this way.  Regarding the third point: it's been my experience that when one suggests that "GG assumes X" or "GG has property Y" people take this to mean that Chomsky said that "GG assumes X" or "GG has property Y."  I am not always delighted with this way of parsing things, but given the way the world is and given that Chomsky is wickedly smart and very often correct, the game is worth the effort.

In an earlier post (here), I tried to explain why I did not find any of the current attacks on the POS argument in the literature compelling. Part of this consisted in explaining why I thought that the standardly cited reanalyses had "focused on the wrong data to solve the wrong problem" and that as a result there is no reason to think that more work along these lines would ever shed any useful light on POS problems. I suggested that this was how one should really understand the discussion over Polar Questions: the anti-POS "rebuttals" misconstrue the point at issue, get the data wrong and supply answers for the wrong questions. In short, useless.

Why do I mention all of this again? Because there is an excellent recentish paper (here) by Berwick, Chomsky and Piattelli-Palmarini (BCP) that makes at length the points that I tried to make quickly. It is chapter 2 of the book (which is ludicrously expensive and which you should take out from your library) and it suggests that my interpretation of the problem was largely on the right track. For example, I suggested that the original discussion was intended as a technically simple illustration of a much more general point aimed at a neophyte audience. BCP confirms this interpretation, stating that "the examples were selected for expository reasons, deliberately simplified so that they could be presented as illustrations without the need to present more than quite trivial linguistic theory" (20). They further note that the argument that Polar questions are formed using a structure dependent operation is the minimum one could say. It is not itself a detailed analysis but a general conclusion concerning the class of plausible analyses.  I also correctly surmised that the relevant data goes far beyond the simple cases generally discussed and that any adequate theory would have to extend to these more complex cases as well.  To make a long story short: I nailed it!!!

However, for those who want to read a pretty short very good discussion of the POS issue once again, a discussion where Chomsky's current views are very much in evidence, I could not do better than suggest this short readable paper "Poverty of the stimulus stands: why recent challenges fail."

One last point: there is a nice discussion here too of the interplay between PP and DP. As BCP notes, the aim of MPish accounts is to try to derive the effects of UG laden accounts that answer the POS with accounts that exploit less domain specific innate machinery. As they also note, the game is worth playing just in case you take the POS problem seriously and address the relevant data and generalizations. Changing the topic (as Perfors et al does) or ignoring the data (as Clark does and Christiansen does) means that whatever results ensue are irrelevant to the POS question at hand.  I would not have thought that this is worth repeating but for the fact that it appears to be a contentious claim. It isn't. That's why, as BCP indicates, the extant replies are worthless.

Addendum May 28/2014:

In the comments, Noah Motion has provided the following link to a very cheap version of the BCP paper. Thanks Noah.

Sunday, May 25, 2014

The GG game: Plato, Darwin and the POS

Alex Clark has made the following two comments (abstracted) in his comments to this post.

I find it quite frustrating that you challenge me to "pony up a story" but when pressed, you start saying the MP is just a conjecture and a program and not a theory.

So I read the Hauser et al paper where the only language specific bits are recursion and maps to the interfaces -- so where's the learning story that goes with that version of UG/FLN? Nobody gives me a straight answer. They change the subject or start waffling about 3rd factor principles.

I believe that these two questions betray a misunderstanding, one that Alex shares with many others concerning the objectives of the Minimalist Program (MP) and how they relate to those of earlier theory. We can address the issue by asking: how does going beyond explanatory adequacy relate to explanatory adequacy?  Talk on the Rialto is that the former cancels the latter. Nothing could be further from the truth. MP does not cancel the problems that pre-MP theory aimed to address. Aspiring to go beyond explanatory adequacy does not amnesty a theory from explanatory adequacy. Let me explain.

Before continuing, however, let me state that what follows is not Chomsky exegesis.  I am a partisan of Chomsky haruspication (well not him, but his writings), but right now my concern is not to scavenge around his literary entrails trying to find some obscure passage that might, when read standing on one’s head, confuse. I am presenting an understanding of MP that addresses the indicated question above. The two quoted paragraphs were addressed to (at?) me. So here is my answer. And yes, I have said this countless times before.

There are two puzzles, Plato’s Problem (PP) and Darwin’s Problem (DP).  They are interesting because of the light they potentially shed on the structure of FL, FL being whatever it is that allows humans to be as linguistically facile as we are.  The work in the last 60 years of generative grammar (GG) has revealed a lot about the structure of FL in that it has discovered a series of “effects” that characterize the properties of human Gs (I like to pretentiously refer to these as “laws of grammar” and will do so henceforth to irritate the congenitally irritated). Examples of the kinds of properties these Gs display/have include the following: Island effects, binding effects, ECP effects, obviation of Island effects under ellipsis, parasitic gap effects, Weak and Strong Crossover effects etc. (I provided about 30 of these effects/laws in the comments to the above mentioned post, Greg K, Avery and others added a few more).  To repeat again and loudly: THESE EFFECTS ARE EMPIRICALLY VERY WELL GROUNDED AND I TAKE THEM TO BE ROUGHLY ACCURATE DESCRIPTIONS OF THE KIND OF REGULARITIES THAT Gs DISPLAY AND I ASSUME THAT THEY ARE MORE OR LESS EMPIRICALLY CORRECT.  They define an empirical domain of inquiry. Those who don’t agree I consign to the first circle of scientific hell, the domicile of global warming skeptics, flat earthers and evo deniers. They are entitled to their views, but we are not required (in fact, it is a waste of time) to take their views seriously. So I won’t. 

Ok, let’s assume that these facts have been established. What then? Well, we can ask what they can tell us about FL. IMO, they potentially tell us a lot. How so? Via the POS argument. You all know the drill: propose a theory that derives the laws, take a look at the details of the theory, see what it would take to acquire knowledge of this theory which explains the laws, see if the PLD provides sufficient relevant information to acquire this theory. If so, assume that the available data is causally responsible.[1] If not, assume that the structure of FL is causally responsible.  Thus, knowledge of the effects is explained by either pointing to the available data that it is assumed the LAD tracks or by adverting to the structure of LAD’s FL. Note, it is critical to this argument to distinguish between PLD (primary linguistic data) and LD (linguistic data more broadly) as the LAD has potential use of the former while only the linguist has access to the latter. The child is definitely not a little linguist.[2]

All of this is old hat, a hat that I’ve worn in public on this blog countless times before and so I will not preen before you so hatted again.  What I will bother saying again is that this can tell us something about FL. The laws themselves can strongly suggest whether FL is causally responsible for this or that effect we find in Gs. They alone do not tell us what exactly about FL is responsible for this or that effect. In other words, they can tell us where to look, but they don’t tell us what lives there.

So, how does one go from the laws+POS to a conjecture/claim about the structure of FL? Well, one makes a particular proposal that, were it correct, would derive the effects. In other words, one proposes a hypothesis, just as one does in any other area of the sciences. P, V, and T relate to one another via the gas laws. Why? Well maybe it’s because gases are made up of small atoms banging against the walls of the container etc. etc. etc.  Swap gas laws for laws of grammar and atomic theory for innately structured FL and off we go.

So, what kinds of conjectures have people made? Well, here’s one: the principles of GB specify the innate structure of FL.[3] Here’s why this is a hypothesis worth entertaining: Were this true then it would explain why it is that native speakers judge movement out of islands to be lousy and why they like reflexivization where they dislike pronominalization and vice versa. How does it explain these laws? As follows: if the principles of GB correctly characterize FL, then in virtue of this FL will yield Gs that obey the laws of grammar.  So, again, were the hypothesis correct, it would explain why natural languages adhere to the generalizations GG has discovered over the last 60 years.[4]

Now, you may not like this answer. That’s your prerogative. The right response is to then provide another answer that derives the attested effects.  If you do, we can consider this answer and see how it compares with the one provided. Also, you might like the one provided and want to test it further. People (e.g. Crain, Lidz, Wexler, a.o.) have done just that by looking at real time acquisition in actual kids.  At any rate, all of this seems perfectly coherent to me, and pretty much standard scientific practice. Look for laws, try to explain them.

Ok, as you’ve no doubt noticed, the story told assumes that what’s in FL are principles of GB.[5] Doesn’t MP deny this? Yes and No. Yes, it denies that FL codes for exactly these principles as stated in GB. No, it assumes that some feature of FL exists from which the effects of these principles follow. In other words, MP assumes that PP is correct and that it sheds light on the structure of FL. It assumes that a successful POS argument implies that there is something about the structure of the LAD that explains the relevant effect. It even takes the GB description of the effects to be extensionally accurate. So how does it go beyond PP?

Well, MP assumes that what’s in FL does not have the linguistic specificity that GB answers to PP have. Why?

Well, MP argues that the more linguistically specific the contents of FL, the more difficult it will be to address DP. So, MP accepts that GB accurately derives the laws of grammar but assumes that the principles of GB themselves follow from yet more general principles, many of which are domain general, so as to be able to accommodate DP in addition to PP.[6] That, at least, is the conjecture. The program is to make good on this hunch. So, MP assumes that the PP problem has been largely correctly described (viz. that the goal is to deduce the laws of grammar from the structure of FL) but that the fine structure of FL is not as linguistically specific as GB has assumed.  In other words, that FL shares many of its operations and computational principles with those in other cognitive domains. Of course, it need not share all of them. There may be some linguistically specific features of FL, but not many. In fact, very very few. In fact, we hope, maybe (just maybe, cross my fingers) just ONE.

We all know the current favorite candidate: Merge. That’s Chomsky’s derby entry. And even this, Chomsky suggests may not be entirely proprietary to FL. I have another, Label. But really, for the purposes of this discussion, it doesn’t really matter what the right answer is (though, of course I am right and Chomsky is wrong!!).

So, how does MP go beyond explanatory adequacy? Well, it assumes the need to answer both PP and DP. In other words, it wants the properties of FL that answer PP to also be properties that can answer DP. This doesn’t reject PP. It doesn’t assume that the need to show how the facts/laws we have discovered over 60 years follow from FL has all of a sudden gone away. No. It accepts PP as real and as described, and aims to find principles that do the job of explaining the laws that PP aims to explain, but hopes that these principles/operations are not so linguistically specific as to trouble DP.

Ok, how might we go about trying to realize this MP ambition (i.e. a theory that answers both PP and DP)? Here’s a thought: let’s see if we can derive the principles of GB from more domain general operations/principles.  Why would this be a very good strategy? Well because, to repeat, we know that were the principles of GB innate features of FL then they would explain why the Gs we find obey the laws of grammar we have discovered (see note 6 for philo of science nostrums). So were we able to derive GB from more general principles then these more general principles would also generate Gs that obeyed the laws of grammar. Here I am assuming the following extravagant rule of inference: if A→B and B→C then A→C.  Tricky, eh? So that’s the strategy. Derive GB principles from more domain general assumptions.

How well has MP done in realizing this strategy? Here we need to look not at the aims of the program, but at actual minimalist theories (MT). So how good are our current MT accounts in realizing MP objectives? The answer is necessarily complicated. Why? Because many minimalist theories are compatible with MP (and this relation between theory and program holds everywhere, not just in linguistics). So MP spawns many reasonable MTs. The name of the game if you like MP is to construct MTs that realize the goals of MP and see whether you can get them to derive the principles of GB (or the laws of grammar that GB describes). So, to repeat, how well have we done?

Different people will give different answers. Sadly, evaluations like these require judgment and reasonable people will differ here. I believe that given how hard the problems are, we have done not bad/pretty well for 20 years of work. I think that we have pretty good unifications of many parts of GB in terms of simpler operations and plausibly domain general computational principles. I have tried my own hand at this game (see here). Others have pursued this differently (e.g. Chomsky). But, and listen closely here, MP will have succeeded only if whatever MT it settles on addresses PP in the traditional way.  As far as MP is concerned, all the stuff we thought was innate before is still innate, just not quite in the particular form envisaged. What is unchanged is the requirement to derive the laws of grammar (as roughly described by GB). The only open question for DP is whether this can be done using domain general operations/principles with (at most) a very small sprinkling of domain specific linguistic properties. In other words, the open question is whether these laws are derived directly from principles of GB or indirectly from them (think GB as axioms vs GB as theorems of FL). 

I should add that no MT that I know of is just millimeters away from realizing this MP vision.  This is not a big surprise, IMO. What is a surprise, at least to me, is that we’ve made serious progress towards a good MPish account.  Still, there are lots of domain specific things we have not been able to banish from FL (ECP effects, all those pesky linguistic features (e.g. case), the universal base (and if Cinque is right, it’s a hell of a monster) and more). If we cannot get rid of them, then MP will only be partly realized. That’s ok, programs are, to repeat, not true or false, but fecund or not. MP has been very fertile and we (I?) have reason to be happy with the results so far, and hopeful that progress will continue (yes, I have a relentlessly sunny and optimistic disposition).

With this as prologue, let’s get back to Alex C. On this view, the learning story is more or less the one we had before. MP has changed little.[7] The claim that the principles of GB are innate is one that MP can endorse (and does, given the POS arguments). The question is not whether this is so, but whether the principles themselves are innate or whether they derive from other more general innate principles. MP bets on the second. However, MP does not eschew the conclusion that GB (or some equivalent formulation) correctly characterizes the innate structure of FL. The only question is how directly these principles are instantiated, as axioms or as theorems. Regardless of the answer, the PP project as envisioned since the mid-60s is unchanged and the earlier answers provided are still quite viable (but see caveat in note 7).

In sum, we have laws of grammar and GB explanations of them that, via the POS, argue that FL has GBish structure. MP, by adding DP to the mix, suggests that the principles of GB are derived features of FL, not primitive.  This, however, barely changes the earlier conclusions based on POS regarding PP. It certainly does not absolve anyone of having to explain the laws of grammar. It moreover implies that any theory that abstracts away from explaining these laws is a non-starter so far as GG is concerned (Alex C provides a link to one such theory here).[8]

Let me end: here’s the entrance fee for playing the GG game:
1.     Acceptance that GG work over the last 60 years has identified significant laws of grammar.
2.     Acceptance that a reasonable aim of research is to explain these laws of grammar. This entails developing theories (like GB) which would derive these laws were these theories true (PP).
3.     More ambitiously, you can add DP to the mix by looking for theories using more domain general principles/operations from which the principles of GB (or something like them) follow as “theorems,” (adopting DP as another boundary condition on successful theory).

That’s the game. You can play or not. Note that they all start with (1) above. Denial that the laws of grammar exist puts you outside the domain of the serious. In other words, deny this and don’t expect to be taken seriously. Second, GG takes it to be a reasonable project to explain the laws of grammar and their relation to FL by developing theories like GB. Third, DP makes step 2 harder, but it does not change the requirement that any theory must address PP. Too many people, IMO, just can’t wrap their heads around this simple trio of goals. Of course, nobody has to play this game. But don’t be fooled by the skeptics into thinking that it is too ill defined to play. It’s not. People are successfully playing it. It’s just when these goals and ambitions are made clear many find that they have nothing to add and so want to convince you to stop playing. Don’t. It’s really fun. Ignore their nahnahbooboos.

[1] Note that this does not follow. There can be relevant data in the input and it may still be true that the etiology of the relevant knowledge traces to FL. However, as there is so much that fits POS reasoning, we can put these effects to the side for now.
[2] One simple theory is that the laws themselves are innate. So, for example, one might think that the CNPC is innate. This is one way of reading Ross’s thesis. I personally doubt that this is right as the islands seem to more or less swing together, though there is some variation. So, I suspect that island effects themselves are not innate though their properties derive from structural properties of FL that are, something like what Subjacency theory provides.
[3] As many will no doubt jump out of their skins when they encounter this, let me be a tad careful. Saying that GB is innate does not specify how it is thus.  Aspects noted two ways that this could be true: GB restricts the set of admissible hypotheses or it weights the possible alternative grammars/rules by some evaluation measure (markedness). For current purposes, either or both are adequate. GB tended to emphasize the restrictive hypothesis space; Ross, for example, was closer to a theory of markedness.
[4] Observe: FL is not itself a theory of how the LAD acquires a G in real time. Rather it specifies, if descriptively adequate, which Gs are acquirable (relative to some PLD) and what properties these Gs will have.  It is reasonable to suppose that what can be acquired will be part of any algorithm specifying how Gs get acquired, but they are not the same thing.  Nonetheless, the sentence that this note is appended to is correct even in the absence of a detailed “learning theory.”
[5] None of the above or the following relies on it being GB that we use to explain the laws. I happen to find GB a pretty good theory. But if you want something else, fine. Just plug your favorite theory in everywhere I put in ‘GB’ and keep reading.
[6] Again this is standard scientific practice: Einstein’s laws derive Newton’s. Does this mean that Newton’s laws are not real? Yes and No. They are not fundamental, but they are accurate descriptions. Indeed, one indication that Einstein’s laws are correct is that they derive Newton’s as limit cases. So too with statistical mechanics and thermodynamics or quantum mechanics and classical mechanics.  That’s the way it works. Earlier results (theory/laws) being the target of explanation/derivation of later more fundamental theory.
[7] The one thing it has changed is to resurrect the idea that learning might not be parameter setting. As noted in various posts, FL-internal parameters are a bit of a bother given MP aims. So, it is worth considering earlier approaches that were not cast in these terms, e.g. the approach in Berwick’s thesis.
[8] Its oracular understanding of the acquisition problem simply abstracts away from PP, as Alex D noted. Thus, it is without interest for the problems discussed above.

Wednesday, November 28, 2012

Patterns, Patternings and Learning: a not so short ramble on Empiricism and Rationalism


As readers may have noticed (even my mother has noticed!), I am very fond of Poverty of Stimulus arguments (POS). Executed well, POSs generate slews of plausible candidate structures for FL/UG. Given my delight in these, I have always wondered why it is that many other otherwise intelligent looking/sounding people don’t find them nearly as suggestive/convincing as I do. It could be that they are not nearly as acute as they appear (unlikely), or it could be that I am wrong (inconceivable!), or it could be that discussants are failing to notice where the differences lie. I would like to explore this last possibility by describing two different senses of pattern, one congenial to an empiricist mind set, and one not so much. This is not, I suspect, a conscious conviction and so highlighting it may allow for a clearer understanding of where disagreement lies, even if it does not lead to a Kumbaya resolution of differences.  Here goes.

The point I want to make rests on a cute thought experiment suggested by an observation by David Berlinski in his very funny, highly readable and strongly recommended (especially for those who got off on Feyerabend’s jazz-style writing in Against Method) book Black Mischief.  Berlinski discusses two kinds of patterns. The first is illustrated in the following non-terminating decimal expansions:

1.     (a) .222222…
(b) .333333…
(c) .454545…
(d) .123412341234…

If asked to continue into the … range, a normal person (i.e. a college undergrad, the canonical psych subject and the only person buyable with a few “extra” credits, i.e. cheap) would continue (1a) with more 2s, (1b) with more 3s, (1c) with 45s and (1d) with 1234s.  Why? Because the average person would detect the indicated pattern and generalize accordingly.  People are good at detecting patterns of this sort. Hume discussed this kind of pattern recognition behavior, as have empiricists ever since. What the examples in (1) illustrate is constant conjunction, and this leads to a simple pattern that humans have little trouble extracting (at least in the simple cases[1]).

Now as we all know, this will not get us great results for examples like (2).

2.     (a) .141592653589793…
(b) .718281828459045…

The cognoscenti will have recognized (2a) as the decimal part of the decimal expansion of π (the first 15 digits) and (2b) as the decimal part of the decimal expansion of e (the first 15 digits). If our all-purpose undergrad were asked to continue the series he would have a lot of trouble doing so (Don’t take my word for it. Try the next three digits[2]). Why? Because these decimal expansions don’t display a regular pattern as they have none. That’s what makes these numbers irrational in contrast with the rational numbers in (1).  However, and this is important, the fact that they don’t display a pattern does not mean that it is impossible to generate the decimal expansions in (2). It is possible and there are well-known algorithms for doing so (as we display anon). However, though there are generative procedures for calculating the decimal expansions of π and e, these procedures differ from the ones underlying (1) in that the products of the procedures don’t exhibit a perceptible pattern. The patterns, we might say, contrast in that the patterns in (1) carry the procedures for generating them in their patterning (add 2, 3, 45, or 1234 to the end), while this is not so for the examples in (2). Put crudely, constant conjunction and association exercised on the patterning of 2s in (1a) lead to the rule ‘keep adding 2’ as the rule for generating (1a), while inspecting the patterning of digits in (2a) suggests nothing whatsoever about the rule that generates it (e.g. (3a)).  And this, I believe, is an important conceptual fault line separating empiricists from rationalists. For empiricists, the paradigm case of a generative procedure is intimately related to the observable patternings generated, while Rationalists have generally eschewed any “resemblance” between the generative procedure and the objects generated. Let me explain.
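Before the explanation, a quick concrete aside (my own toy illustration, not Berlinski's): a "learner" whose only bias is to look for a repeating period can continue the expansions in (1) straight off, but gets nothing at all out of the digits in (2a).

# A learner that only detects surface periodicity (constant conjunction, roughly).
def find_period(digits, max_period=6):
    """Return the shortest repeating period of the digit string, or None."""
    for p in range(1, max_period + 1):
        if all(digits[i] == digits[i % p] for i in range(len(digits))):
            return digits[:p]
    return None

for s in ["222222", "454545", "123412341234", "141592653589793"]:
    period = find_period(s)
    if period:
        print(s, "-> continue with", period * 2)
    else:
        print(s, "-> no surface pattern; this learner is stuck")

The first three inputs come back with their periods (2, 45, 1234); the digits of π come back with nothing, even though, as noted below, a perfectly good generative procedure for them exists.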

As Chomsky has repeatedly correctly insisted, everybody assumes that learners come to the task of language acquisition with biases.  This just means that everyone agrees that what is acquired is not a list, but a procedure that allows for unbounded extension of the given (finite) examples in determinate ways. Thus, everyone (viz. both empiricists and rationalists (thus, both Chomsky and his critics)) agrees that the aim is to specify what biases a learner brings to the acquisition task. The difference lies in the nature of the biases each is willing to consider. Empiricists are happy with biases that allow for the filtering of patterns from data.[3] Their leading idea is that data reveals patterns and that learning amounts to finding these in the data. In other words, they picture the problem of learning as roughly illustrated by the example in (1).  Rationalists agree that this kind of learning exists,[4] but hold that there are learning problems akin to that illustrated in (2), and that this kind of learning demands a departure from algorithms that look for “simple” patternings of data. In fact, it requires something like a pre-specification of the possible generative procedures. Here’s what I mean.

Consider learning the decimal expansion of π. It’s possible to “learn” that some digit sequence is that of π by sampling the data (i.e. the digits) if, for example, one is biased to consider only a finite number of pre-specified procedures.  Concretely, say I am given the generative procedures in (3a) and (3b) and am shown the digits in (2a). Could I discover how to continue the sequence so armed? Of course. I could quickly come to “know” that (3a) is the right generative procedure and so I could continue adding to the … as desired.

3.     (a) π = 2 ∑_{k=0}^{∞} k!/(2k+1)!! = 2 ∑_{k=0}^{∞} 2^k (k!)^2/(2k+1)! = 2[1 + 1/3(1 + 2/5(1 + 3/7(1 + …)))]

(b) e = lim_{n→∞} (1 + 1/n)^n = 1 + 1/1! + 1/2! + 1/3! + …

How would I come to know this? By plugging several values for k and n into (3a) and (3b) and seeing what pops out. (3a) will spit out the sequence in (2a) and (3b) that of (2b). These generative procedures will diverge very quickly. Indeed, the first computed digit renders us confident that, asked to choose between (3a) and (3b) given the data in (2a), (3a) is an easy choice.  The moral: even if there are no patterns in the data, learning is possible if the range of relevant choices is sufficiently articulated and bounded.
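For the curious, here is a toy version of the thought experiment (illustrative only): compute a handful of terms of each candidate procedure in (3) and compare the output with the observed digits in (2a). The choice is immediate.

from math import factorial

def pi_from_series(terms=60):
    # (3a): pi = 2 * sum_{k>=0} 2^k (k!)^2 / (2k+1)!
    return 2 * sum(2**k * factorial(k)**2 / factorial(2*k + 1) for k in range(terms))

def e_from_series(terms=20):
    # (3b): e = sum_{n>=0} 1/n!
    return sum(1 / factorial(n) for n in range(terms))

observed = "1415926535"                      # the data in (2a)
candidates = {"(3a)": pi_from_series(), "(3b)": e_from_series()}

for name, value in candidates.items():
    decimal_part = f"{value:.15f}".split(".")[1]
    print(name, value, "matches (2a):", decimal_part.startswith(observed))
# (3a) matches the observed digits; (3b) diverges from the very first digit.

The "learner" here has a trivial job precisely because the hypothesis space is articulated and tiny; nothing about the surface digits themselves did the work.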

This is just a thought experiment, but I think that it highlights several features of importance. First, that everyone is knee-deep in given biases, aka innate, given modes of generalization.  The question is not whether these exist but what they are. Empiricists, from the Rationalist point of view, unduly restrict the admissible biases to those constructed to find patterns in the data.  Second, that even in the absence of patterned data, learning is possible if we consider it as a choice among given hypotheses. Structured hypothesis spaces allow one to find generative procedures whose products display no obvious patterns. Bayesians, by the way, should be happy with this last point as nothing in their methods restricts what’s in the hypothesis space. Bayes instructs us how to navigate the space given input data. It has nothing to say about what’s in the space of options to begin with. Consequently, there is no a priori reason for restricting it to some functions rather than others. The matter, in other words, is entirely empirical. Last, it pays to ask whether any problem of interest is more like that illustrated in (1) or in (2). One way of understanding Chomsky’s point is that when we understand what we want to explain, i.e. that linguistic competence amounts to a mastery of “constrained homophony” over an unbounded domain of linguistic objects (see here), then the problem looks much more like that in (2) than in (1), viz. there are very few (1)-type patterns in the data when you look closely and there are even fewer when the nature of the PLD is considered.  In other words, Chomsky’s bet (and on this I think he is exactly right) is that the logical problem of language acquisition looks much more like (2) than like (1).

A historical aside: Here, Cartwright provides the ingredients for a nice reconstructed history. Putting more than a few words in her mouth, it would go something like this:

In the beginning there was Aristotle. For him, minds could form concepts/identify substances from observation of the elements that instanced them (you learn ‘tiger’ by inspecting tigers, tiger-patterns lead to ‘tiger’ concepts/extracted tiger-substances). The 17th century dumped Aristotle’s epistemology and metaphysics. One strain rejected the substances and substituted the patterns visible to the naked eye (there is no concept/substance ‘tiger’, just some perceptible tiger patternings). This grew up to become Empiricism. The second retained the idea of concepts/substances but gave up the idea that these were necessarily manifest in visible surface properties of experience (so ‘tiger’ may be triggered by tigers but the concept contains a whole lot more than what was provided in experience, even what was provided in the patternings).  This view grew up to be Rationalism. Empiricists rejected the idea that conceptual contents contain more than meets the eye. Rationalists gave up the idea that the content of concepts is exhausted by what meets the eye.

Interestingly, this discussion persists. See for example Marr’s critique of Gibsonian theories of visual perception here. In sum, the idea that learning is restricted to patterns extractable from experience, though wrong, has a long and venerable pedigree. So too the Rationalist alternative. A rule of thumb: for every Aristotle there is a corresponding Plato (and, of course, vice versa).


[1] There is surely a bound to this. Consider a decimal expansion whose period is a sequence of 2,500 digits. This would likely be hard to spot and the wonders of “constant” conjunction would likely be much less apparent.
[2] Answer: for π: 2,3,8 and for e: 2,3,5.
[3] Hence the ton of work done on categorization, categorization of prior categorizations, categorization of prior categorizations of prior categorizations…
[4] Or may exist. Whether it does is likely more complicated than usually assumed as Randy Gallistel’s work has shown. If Randy is right, then even the parade cases for associationism are considerably less empiricist than often assumed.