I am about to go dark for a couple of weeks. Vacation time!!! Woohoo!!! But before checking out, here are some pieces that I found fun to read.
First, there is a forthcoming review of the Berwick-Chomsky book Why Only Us in the New York Review of Books. It is by Ian Tattersall, a pretty impressive person in the field. It is a reasonable review, unlike virtually all of the others I have seen. The bottom line is that the Merge conjecture is a reasonable first step toward trying to understand how human linguistic capacity arose in the species. It is not the last word in Tattersall's opinion, but it is a very good first step and pretty much the only reasonable suggestion out there. As this runs completely against the received wisdom, and as it is made by a very serious person who knows something about the topic, it is a great antidote to most of the reviews out there. BTW, Tattersall mentions these other reviews and puts them neatly in their place. Now, you are all expecting a link to the piece. But I cannot provide it, as there is no publicly available URL. However, look for the review and/or get the issue from your library. It is very good and very helpful.
Second, in the previous post I discussed Pullum's views on Putnam's views on linguistics. I disagreed with him concerning their accuracy and utility, though, sadly, not their prevalence and influence. One point that Pullum did not discuss is Putnam's views on meaning and their impact on both linguistics and philosophy. Perhaps Putnam's most influential paper is The meaning of 'meaning'. It always gets trotted out for its important insights regarding narrow content, twin earth thought experiments and the role of the social division of linguistic labor. For my part, I never understood why the last was a big deal, given that in the end we needed to ground meaning in some mind's competence (in Putnam's view, that of some relevant expert). And there was little reason to think that this competence was different in kind from whatever we were looking for in that of the ideal speaker-hearer. Thus the observation, though possibly correct, did not seem particularly profound.
But this is not why I am discussing Putnam again. There is an excellent critical review of the logic of this very influential paper by Paul Pietroski (here). I cannot recommend it highly enough.
Third: STEM. Everywhere you turn at the university, STEM is the order of the day. This is, we are led to believe, because there is a crisis in this country: we are suffering from a paucity of technically trained scientists and engineers. So dump the humanities and let's concentrate on turning out more STEM-trained people. However, this story is BS. There is no dearth of the STEM trained. This NYT piece provides some numbers (here).
Here is a rule of thumb: if there is a scarcity of well-trained people, then their hiring price goes up. We do not expect to see legions of them without work, nor their salaries stagnating. But this is what we see with many, many STEM-trained types. So why the hype? I have my own views (first among these is the idea that all of our societal problems arise from a lack of education rather than from an economic system that skews things in unfortunate directions). However, whether or not my dark suspicions are correct, it is worth knowing that the STEM story is very highly hyped and mostly false.
Last point: this does not imply that STEM competence is an unworthy goal. It only means that it has nothing to do with the job market. Its value lies in the value of a decent scientific education for reasons other than employment.
Fourth: remember SSRN and Elsevier's takeover and how we were told it would not make a difference? Well, here is an update on how well that is going. Hmm.
Fifth: a cute piece on physicists for hire to answer the questions of the lay curious. I wonder if we could have a GG hotline for the tyro curious. It might prevent the sort of junk we regularly see in the press about GG.
That's it. Have a nice two weeks.
Sunday, August 7, 2016
Geoff Pullum has a recent piece in The Chronicle (here) in which he praises a deservedly famous man, Hilary Putnam. Putnam was an important 20th-21st century philosopher who compiled what is arguably the best collection of essays ever in analytic philosophy. Pullum notes all of this and I cannot fault him for his judgment. However, he then takes one more step: he lauds Putnam as the “world’s most brilliant, insightful, and prescient philosopher of linguistics.” That Putnam was brilliant and insightful and (maybe) prescient is not something that I would contest (and did not (here)). That any of this extended to his discussions of linguistic topics strikes me as either a sad commentary on the state of the philosophy of linguistics (this gets my vote) or hyperbole (it was a belated obit after all). At any rate, I want to make clear why I think that Putnam’s writings on these matters are best treated as object lessons rather than insights. Happily, this coincides with my re-reading of Language and Mind. Chomsky takes on some of Putnam’s more (sadly) influential criticisms of GG and (I am sure you will not be surprised to hear from me) demolishes them. The gist of Chomsky’s reply is that there is very little there there. He is right. This has not stopped analogous criticisms from repeatedly being advanced, but they have not gotten more convincing by repetition. Let me elaborate.
Putnam’s most direct critiques of the Chomsky program in GG were his 1967 Synthese paper (“The ‘Innateness Hypothesis’ and Explanatory Models in Linguistics”) and a later companion piece, “What is innate and why.” Chomsky considers Putnam’s arguments in detail in chapter 6 of the expanded edition of Language and Mind, entitled “Linguistics and Philosophy.” Here is the play-by-play.
Chomsky’s critique has three parts:
1. Putnam’s specific critiques “enormously underestimate and misdescribe, the richness of structure, the particular and detailed properties of grammatical form and organization that must be accounted for by a “language acquisition model,” that are acquired by the normal speaker-hearer and that appear to be uniform among speakers and across languages” (179-180).
2. Putnam’s computational claims concerning grammatical simplicity are unfounded (181-2).
3. There is no argument for Putnam’s claim that “general multipurpose learning strategies” are sufficient to account for G acquisition and there is no current reason to think that any such exist when one looks at the grammatical details (184-5).
These are all closely related points, and Pullum is correct in suggesting that these points have repeatedly reappeared in critiques of GG. Thus, it is still true that simplistic views of what is required for G acquisition rely on underestimating and misdescribing what must be explained. It is still true that claims made on behalf of general learning strategies eschew the hard work of showing how the many “laws” linguists have discovered over the last 60 years are to be acquired without quite a bit of what looks like language specific software. Pullum is right: the critics have repeatedly picked up Putnam’s objections even after these have been shown to be inadequate and/or beside the point. Putnam has indeed been influential, and we are the worse for it.
Let me lightly elaborate on these three points.
Critics regularly avoid the hard problems. For example, look at virtually any takedown in the computer science literature of, for example, structure dependence, and you will observe this (see here and here for two recent reiterations of this complaint). All of these miss the point of the argument for structure dependence by concentrating on easily understandable illustrative toy examples intended for the general public (Reali and Christiansen) or misconstruing what the term actually denotes (Perfors et al.).
I have said this before and I will do so again: GGers have discovered many non-trivial mid-level generalizations that are the detailed fodder that fuels Poverty of Stimulus (PoS) arguments that implicate linguistically specific structure for FL. There can be no refutation of these arguments if the generalizations are not addressed. So, Island Effects, ECP effects, Binding effects, Cross Over effects etc. constitute the “hard problems” for non-domain-specific learning architectures. If you think that a general learner is the right way to go, you need to account for these sorts of data. And there is, by now, a lot of this (see here for a partial list). However, advocates of “simpler,” less domain-specific accounts (almost) never broach these details, though absent this the counter-proposals are at best insufficient and at worst idle.
It seems that Putnam is the first in a continuing line of critics who have decided that one can ignore the linguistic details when arguing against undesired cognitive conclusions. As Chomsky notes contra Putnam, there is more to phonology than a “short list of phonemes” from which languages can choose (e.g. there is also cyclic rule application) and there is more to syntax than proper names (e.g. there are also Island effects). Putnam failed to engage with the details (as discussed in Chomsky’s work at the time) and in doing so established a tradition that many have followed. It is not, however, a tradition that anyone should be proud to be part of, whatever its pedigree.
Putnam advanced another argument that is sadly still alive today. He argued that invoking innateness doesn’t solve the acquisition problem, but only “postpones” it. What’s this mean? The argument seems to be that stuffing FL with innate structure is explanatorily sterile as it simply pushes the problem back one step: how did the innate structure get there? Frankly, I find this claim philosophically embarrassing. Why?
One of the main professional requirements of a card-carrying philosopher is that her/his work clarify what point an argument is aiming to make; what question is it trying to answer? Assuming an FL that is structured with domain-specific linguistic structure addresses the question of how an LAD can acquire its language-specific G despite the poverty of the relevant PLD (here’s Chomsky 184-5: “Invoking an innate representation of universal grammar does solve the problem of learning (at least partially), in this case.”) If such a UG-structured FL suffices to solve the PoS problem, it raises a second question: how did the relevant (domain-specific) mental structure get there (i.e. why is FL structured with language-proprietary UGish principles)? Note, these are two different questions (viz. “what FL is required to project a GL from PLDL?” is different from “how did the FL we in fact have get embedded in our mental architecture in the first place?”). Consequently, failing to answer the evolutionary question concerning the etiology of rich innate mental structure does not imply a failure to answer/address the question of how an individual LAD acquires its GL.
Of course, it is not irrelevant either, or might not be. If we could show that a given domain-specific FL could not possibly have evolved, then we could be pretty sure that the postulated innate mental mechanism in the individual cannot be a causal factor in G acquisition. After all, if such an FL cannot be there, then it isn’t there, and if it isn’t there, then it cannot help with the acquisition problem. But, and this is very important, nobody has even the inklings of an argument that even a very rich domain-specific FL could not have arisen in humans. Right now, this impossibility claim is at best a hunch (viz. an ungrounded prejudice). Why? Because we currently have very few ideas about how any cognitive structures evolve (as Lewontin has famously noted). Indeed, even the evolution of seemingly simple non-cognitive structures remains mysterious (see here for a recent example). So any confident claim that even a richly domain-specific FL is evolutionarily impossible is not on the cards right now and is thus a weak counter-argument against an FL that can solve the acquisition problem.
A sidebar: none of this is meant to imply that this evolutionary question is uninteresting. I am an unrepentant Minimalist and take seriously the minimalist problematic: how could an FL such as ours have arisen in the species? As such, I am all in favor of purging FL of as much UG as possible and trading this for general cognitive mechanisms. However, because I consider this an interesting problem, I resist fiat solutions; you know, bold yet vacuous declarations that a general learner can do it all, with no detailed engagement with specific claims, resting instead on bland assurances that it is in principle possible. I like the question so much that I want to see details: actual explanations engaging with specific proposed UG structures. I love reduction; I just don’t like the cheap variety. So derive your favorite UG-based accounts from more general principles and watch me snap to attention.
BTW, Chomsky makes just this point as early as 1972. Here is a quote from his discussion of Putnam (182):
I would, naturally, assume that there is some more general basis in human mental structure for the fact (if it is a fact) that languages have transformational grammars; one of the primary scientific reasons for studying language is that this study may provide some insight into general properties of mind. Given those specific properties, we may then be able to show that transformational grammars are “natural.” This would constitute real progress, since it would enable us to raise the problem of innate conditions on acquisition of knowledge and belief in a more general framework. But it must be emphasized that, contrary to what Putnam asserts, there is no basis for assuming that “reasonable computing systems” will naturally be organized in the specific manner suggested by transformational grammar.
One might argue that Chomsky’s version of minimalism is his way of making good on Putnam’s computational conjecture, though I doubt that Putnam would see it that way. At any rate, Minimalism starts from the recognition that domain specific FLs can solve standard linguistic acquisition problems (i.e. PoS problems) and then tries to reduce the linguistic specificity of the various principles. It does not solve the domain specificity problem by ignoring the relevant domain specific principles.
One more point and I end. In his reply to Putnam, Chomsky outlines a very reasonable strategy for eliminating domain specificity in favor of something like general learning. In his words (184):
A nondogmatic approach to this problem [i.e. the acquisition of language NH] can be pursued, through the investigation of specific areas of human competence, such as language, followed by the attempt to devise a hypothesis that will account for the development of such competence. If we discover that the same “learning strategies” are involved in a variety of cases, and that these suffice to account for the acquired competence, then we will have good reason to believe that Putnam’s empirical hypothesis is correct. If, on the other hand, we discover that different innate systems…have to be postulated, then we will have good reason to believe that an adequate theory of mind will incorporate separate “faculties,” each with unique or partially unique properties.
See here for another discussion elaborating these themes.
To sum up: the problem with Putnam’s philosophical discussions of linguistics is that they entirely missed the mark. They were based on very little detailed knowledge of the GG of the time. They confused several questions that needed to be kept separate and they philosophically begged questions that were (and still are) effectively empirical. The legacy has been a trail of really bad arguments that seem to arise zombie-like despite their inadequacy. Putnam wrote many interesting papers. Unfortunately, his papers on linguistics are not among these. Let these rest in peace.
 There are actually two points being run together here. The first is that any innate structure, whether domain specific or not, begs the explanatory question. The second is that only a domain-specific “rich” FL does so. The form of the argument Putnam presents applies to either, for both call for an evolutionary account of how the mental capacities arose. Humans might, after all, have a richer general cognitive apparatus than our ape cousins, and how it arose would demand explanation even if it were not domain specific. However, the thinking usually is that only domain-specific richness is problematic. In what follows I abstract away from this ambiguity.
 Perhaps it is no surprise that Dan Everett loved this Pullum post. In his words: “Glad you noticed this! He was indeed one of the best of the last 100 years.” This comment does not indicate what Everett found so wonderful, but given the topic of Pullum’s post and Everett’s own added confusions to the philosophical issues, it is reasonable to assume that he found the Putnam critiques of domain-specific nativism compelling. But you knew he would, right?
Saturday, July 30, 2016
It is somewhat surprising that Harper’s felt the need to run a hit piece by Tom Wolfe on Chomsky in its August issue (here). True, such stuff sells well. But given that there are more than enough engaging antics to focus on in Cleveland and Philadelphia one might have thought that they would save the Chomsky bashing for a slow news period. It is a testimony to Chomsky’s stature that there is a publisher of a mainstream magazine who concludes that even two national conventions featuring two of the most unpopular people ever to run for the presidency won’t attract more eyeballs than yet another takedown of Noam Chomsky and Generative Grammar (GG).
Not surprisingly, content wise there is nothing new here. It is a version of the old litany. Its only distinction is the over the top nuttiness of the writing (which, to be honest, has a certain charm in its deep dishonesty and nastiness) and its complete disregard for intellectual integrity. And, a whiff of something truly disgusting that I will get to at the very end. I have gone over the “serious” issues that the piece broaches before in discussions of analogous hit jobs in the New Yorker, the Chronicle of Higher Education, and Aeon (see here and here for example). Indeed, this blog was started as a response to what this piece is a perfect example of: the failure of people who criticize Chomsky and GG to understand even the basics of the views they are purportedly criticizing.
Here’s the nub of my earlier observations: Critics like Everett (among others, though he is the new paladin for the discontented and features prominently in this Wolfe piece too) are not engaged in a real debate for the simple reason that they are not addressing positions that anyone holds or has ever held. This point has been made repeatedly (including by me), but clearly to no avail. The present piece by Wolfe continues in this grand tradition. Here's what I've concluded: pointing out that neither Chomsky nor GG has ever held the positions being “refuted” is considered impolite. The view seems to be that Chomsky has been rude, sneaky even, for articulating views against which the deadly criticisms are logically refractory. Indeed, the critics’ refusal to address Chomsky’s actual views suggests that they think that discussing his stated positions would only encourage him in his naughty ways. If Chomsky does not hold the positions being criticized then he is clearly to blame, for these are the positions that his critics want him to hold so that they can pummel him for holding them. Thus, it is plain sneaky of him not to hold them, and in failing to hold them Chomsky clearly shows what a shifty, sneaky, albeit clever, SOB he really is, because any moderately polite person would hold the views that Chomsky’s critics can demonstrate to be false! Given this, it is clearly best to ignore what Chomsky actually says, for this would simply encourage him in articulating the views he in fact holds, and nobody would want that. For concreteness, let’s once again review what the Chomsky/GG position actually is regarding recursion and Universal Grammar (UG).
The Wolfe piece in Harper’s is based on Everett’s critique of Chomsky’s view that recursion is a central feature of natural language. As you are all aware, Everett believes that he has discovered a language (Piraha) whose G does not recurse (in particular, that forbids clauses to be embedded within clauses). Everett takes the putative absence of recursion within Piraha to rebut Chomsky’s view that recursion is a central feature of human natural language precisely because he believes that it is absent from Piraha Gs. Everett further takes this purported absence as evidence against the GG conception of UG and the idea that humans come with a native-born linguistic facility to acquire Gs. For Everett, human linguistic facility is due to culture, not biology (though why he thinks that these are opposed to one another is quite unclear). All of these Everett tropes are repeated in the Wolfe piece, and if repetition were capable of improving the logical relevance of non-sequiturs, then the Wolfe piece would have been a valuable addition to the discussion.
How does the Everett/Wolfe “critique” miss the mark? Well, the Chomsky-GG view of recursion as a feature of UG does not imply that every human G is recursive. And thinking that it does is to confuse Chomsky Universals (CU) with Greenberg Universals (GU). I have discussed this before in many many posts (type in ‘Chomsky Universals’ or ‘Greenberg Universals’ in the search box and read the hits). The main point is that for Chomsky/GG a universal is a design feature of the Faculty of Language (FL) while for Greenberg it is a feature of particular Gs. The claim that recursion is a CU is to say that humans endowed with an FL construct recursive Gs when presented with the appropriate PLD. It makes no claim as to whether particular Gs of particular native speakers will allow sentences to licitly embed within sentences. If this is so, then Everett’s putative claim that Piraha Gs do not allow sentential recursion has no immediate bearing on the Chomsky-GG claims about recursion being a design feature of FL. That FL must be able to construct Gs with recursive rules does not imply that every G embodies recursive rules. Assuming otherwise is to reason fallaciously, not that such logical niceties have deterred Everett and friends.
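The point that recursion is a property of rule systems (and of the capacity that acquires them) rather than of any particular set of sentences can be illustrated with a toy sketch. The two grammars and the generator below are hypothetical illustrations of mine, not real linguistic analyses: the same generating machinery, the rough analogue of FL here, happily runs both a grammar with a self-embedding rule and one without.

```python
import random

# Two toy grammars (hypothetical illustrations, not real linguistic analyses).
# In EMBEDDING, the VP rule can reintroduce S, so the rule system is recursive;
# in FLAT, no rule reintroduces its own category, so it is not.
EMBEDDING = {
    "S":  [["NP", "VP"]],
    "VP": [["V"], ["V", "that", "S"]],  # a clause embedded within a clause
    "NP": [["John"], ["Mary"]],
    "V":  [["left"], ["thinks"]],
}
FLAT = {
    "S":  [["NP", "VP"]],
    "VP": [["V"]],
    "NP": [["John"], ["Mary"]],
    "V":  [["left"], ["slept"]],
}

def generate(symbol, rules, depth=0, max_depth=3):
    """Expand `symbol` using `rules`; cap embedding depth so generation halts."""
    if symbol not in rules:
        return [symbol]  # terminal word
    options = rules[symbol]
    if depth >= max_depth:  # beyond the cap, drop expansions that re-embed S
        options = [o for o in options if "S" not in o] or options[:1]
    words = []
    for sym in random.choice(options):
        words.extend(generate(sym, rules, depth + 1, max_depth))
    return words

# The same machinery runs either grammar:
print(" ".join(generate("S", EMBEDDING)))  # may contain "thinks that ..."
print(" ".join(generate("S", FLAT)))       # never embeds a clause
```

Nothing in `generate` changes between the two grammars; recursion, or its absence, lives in the rules. By analogy, a capacity built to acquire recursive Gs can perfectly well acquire and run a non-recursive one, which is why a single non-recursive G, even if Everett were right about it, would tell us nothing about the capacity.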
Btw: I use ‘putative claim’ and ‘purported absence’ to highlight an important fact: Everett’s empirical claims are strongly contested. Nevins, Pesetsky and Rodrigues (NPR) have provided a very detailed rebuttal of Everett’s claim that Piraha Gs lack recursion. If I were a betting man, my money would be on NPR. But for the larger issue it doesn’t matter if Everett is right and NPR are wrong. Thus, even were Everett right about the facts (which, I would bet, he isn’t), it would be irrelevant to his conclusion regarding the implications of Piraha for the Chomsky/GG claims concerning UG and recursion.
So what would be relevant evidence against the Chomsky/GG claim about the universality of recursion? Recall that the UG claim concerns the structure of FL, a cognitive faculty that humans come biologically endowed with. So, if the absence of recursion in Piraha Gs resulted from the absence of a recursive capacity in Piraha speakers’ FLs then this would argue that recursion was not a UG property of human FLs. In other words, if Piraha speakers could not acquire recursive Gs then we would have direct evidence that human FLs are not built to acquire recursive Gs. However, we know that this conditional is FALSE. Piraha kids have no trouble acquiring Brazilian Portuguese (BP), a language that everyone agrees is the product of a recursive G (e.g. BP Gs allow sentences to be repeatedly embedded within sentences). Thus, Piraha speakers’ FLs are no less recursively capable than BP speakers’ FLs or English speakers’ FLs or Swahili speakers’ FLs or... We can thus conclude that Piraha FLs are just human FLs and have as a universal feature the capacity to acquire recursive Gs.
All of this is old hat and has been repeated endlessly over the last several years in rebuttal to Everett’s ever more inflated claims. Note that if this is right, then there is no (as in none, nada, zippo, bubkis, gornisht) interesting “debate” between Everett and Chomsky concerning recursion. And this is so for one very simple reason. Equivocation obviates the possibility of debate. And if the above is right (and it is, it really is) then Everett’s entire case rests on confusing CUs and GUs. Moreover, as Wolfe’s piece is nothing more than warmed over Everett plus invective, its actual critical power is zero as it rests on the very same confusion.
But things are really much worse than this. Given how often the CU/GU confusion has been pointed out, the only rational conclusion is that Everett and his friends are deliberately running these two very different notions together. In other words, the confusion is actually a strategy. Why do they adopt it? There are two explanations that come to mind. First, Everett and friends endorse a novel mode of reasoning. Let’s call it modus non sequitur, which has the abstract form “if P why not Q.” It is a very powerful method of reasoning sure to get you where you want to go. Second possibility: Everett and Wolfe are subject to Sinclair’s Law, viz. It is difficult to get a man to understand something when his salary depends upon his not understanding it. If we understand ‘salary’ broadly to include the benefits of exposure in the high brow press, then … All of which brings us to Wolfe’s Harper’s piece.
Happily for the Sinclair inclined, the absence of possible debate does not preclude the possibility of considerable controversy. It simply implies that the controversy will be intellectually barren. And this has consequences for any coverage of the putative debate. Articles reprising the issues will focus on personalities rather than substance, because, as noted, there is no substance (though, thank goodness, there can be heroes engaging in the tireless (remunerative) pursuit of truth). Further, if such coverage appears in a venue aspiring to cater to the intellectual pretensions of its elite readers (e.g. The New Yorker, the Chronicle and, alas, now Harper’s) then the coverage will require obscuring the pun at the heart of the matter. Why? Because identifying the pun (aka equivocation) will expose the discussion as, at best, titillating gossip for the highbrow, at middling, a form of amusing silliness (e.g. perfect subject matter for Emily Litella) and, at worst, a form of celebrity pornography in the service of character assassination. Wolfe’s Harper’s piece is the dictionary definition of the third option.
Why do I judge Wolfe’s article so harshly? Because he quotes Chomsky’s observation that Everett’s claims even if correct are logically irrelevant. Here’s the full quote (39-40):
“It”—Everett’s opinion; he does not refer to Everett by name—“amounts to absolutely nothing, which is why linguists pay no attention to it. He claims, probably incorrectly, it doesn’t matter whether the facts are right or not. I mean, even accepting his claims about the language in question—Pirahã—tells us nothing about these topics. The speakers of this language, Pirahã speakers, easily learn Portuguese, which has all the properties of normal languages, and they learn it just as easily as any other child does, which means they have the same language capacity as anyone else does.”
A serious person might have been interested in finding out why Chomsky thought Everett’s claims “tell us nothing about these topics.” Not Wolfe. Why try to understand issues that might detract from a storyline? No, Wolfe quotes Chomsky without asking what he might mean. Wolfe ignores Chomsky's identification of the equivocation as soon as he notes it. Why? Because this is a hit piece, and identifying the equivocation at the heart of Everett’s criticism would immediately puncture Wolfe’s central conceit (i.e. heroic little guy slaying the Chomsky monster).
Wolfe clearly hates Chomsky. My reading of his piece is that he particularly hates Chomsky’s politics and the article aims to discredit the political ideas by savaging the man. Doing this requires demonstrating that Chomsky, who, as Wolfe notes, is one of the most influential intellectuals of all time, is really a charlatan whose touted intellectual contributions have been discredited. This is an instance of the well-known strategy of polluting the source. If Chomsky’s (revolutionary) linguistics is bunk then so are his politics. A well-known fallacy this, but no less effective for being so. Dishonest and creepy? Yes. Ineffective? Sadly no.
So there we have it. Another piece of junk, but this time in the style of the New Journalism. Before ending however, I want to offer you some quotes that highlight just how daft the whole piece is. There was a time that I thought that Wolfe was engaging in Sokal level provocation, but I concluded that he just had no idea what he was talking about and thought that stringing technical words together would add authority to his story. Take a look at this one, my favorite (p. 39):
After all, he [i.e. Chomsky, NH] was very firm in his insistence that it [i.e. UG, NH] was a physical structure. Somewhere in the brain the language organ was actually pumping the UG through the deep structure so that the LAD, the language acquisition device, could make language, speech, audible, visible, the absolutely real product of Homo sapiens’s central nervous system. [Wolfe’s emphasis, NH].
Is this great, or what! FL pumping UG through the deep structure. What the hell could this mean? Move over “colorless green ideas sleep furiously” we have a new standard for syntactically well-formed gibberish. Thank you Mr Wolfe for once again confirming the autonomy of syntax.
Or this encomium to cargo cult science (37):
It [Everett’s book, NH] was dead serious in an academic sense. He loaded it with scholarly linguistic and anthropological reports of his findings in the Amazon. He left academics blinking . . . and nonacademics with eyes wide open, staring.

Yup, “loaded” with anthro and ling stuff that blinds professionals and leaves neophytes agog. Talk of scholarship. Who could ask for more? Not me. Great stuff.
Here’s one more, where Wolfe contrasts Chomsky and Everett (31):
Look at him! Everett was everything Chomsky wasn’t: a rugged outdoorsman, a hard rider with a thatchy reddish beard and a head of thick thatchy reddish hair. He could have passed for a ranch hand or a West Virginia gas driller.

A Methodist son of a cowboy rather than the son of Russian Ashkenazic Jews infatuated with political “ideas long since dried up and irrelevant,” products “perhaps” of a shtetl mentality (29). Chomsky is an indoor linguist “relieved not to go into the not-so-great outdoors,” desk-bound, “looking at learned journals with cramped type” (27), who “never left the computer, much less the building” (31). Chomsky is someone “very high, in an armchair, in an air conditioned office, spic and span” (36), one of those intellectuals with “radiation-bluish computer screen pallors and faux-manly open shirts” (31), never deigning to muddy himself with the “muck of life down below” (36). His linguistic “hegemony” (37) is “so supreme” that other linguists are “reduced to filling in gaps and supplying footnotes” (27).
Wowser. It may not have escaped your notice that this colorful contrast has an unsavory smell. I doubt that its dog whistle overtones were inaudible to Wolfe. The scholarly blue-pallored desk bound bookish high and mighty (Ashkenazi) Chomsky versus the outdoorsy (Methodist) man of the people and the soil and the wilderness Everett. The old world shtetl mentality brought down by a (lapsed) evangelical Methodist (32). Trump’s influence seems to extend to Harper’s. Disgusting.
That’s it for me. Harper’s should be ashamed of itself. This is not just junk. It is garbage. The stuff I quoted is just a sampling of the piece’s color. It is deeply ignorant and very nasty, with a nastiness that borders on the obscene. Your friends will read this and ask you about it. Be prepared.
 Actually, Greenberg’s own Universals were properties of languages, not Gs. More exactly, they describe surface properties of strings within languages. As recursion is in the first instance a property of systems of rules and only secondarily a property of strings in a language, I am here extending the notion of a Greenberg Universal to apply to properties that all Gs share rather than properties that all languages (i.e. surface products of Gs) share.
 Incidentally, Wolfe does not address these counterarguments. Instead he suggests that NPR are Chomsky’s pawns who blindly attack anyone who exposes Chomsky’s fallacies (see p.35). However, reading Wolfe’s piece indicates that the real reason he does not deal with NPR’s substantive criticisms is that he cannot. He doesn’t know anything, so he must ignore the substantive issues and engage in ad hominem attacks. Wolfe has not written a piece of popular science or even intellectual history for the simple reason that he does not appear to have the competence required to do so.
 It is worth pointing out that sentence recursion is just one example of recursion. So, Gs that repeatedly embed DPs within DPs or VPs within VPs are just as recursive as those that embed clauses within clauses.
 See Wolfe’s discussion of the “law” of recursion on 30-31. It is worth noting that Wolfe seems to think that “discovering” recursion was a big deal. But if it was, Chomsky was not its discoverer, as his discussion of Cartesian precursors demonstrates. Recursion follows trivially from the fact of linguistic creativity. The implications of the fact that humans can and do acquire recursive Gs are significant. The fact itself is a pretty trivial observation.
Wednesday, July 27, 2016
Norbert and regular readers of this prestigious blog may have seen me participate in some discussions about open access publishing, e.g. in the wake of the Lingua exodus or after Norbert's link to that article purportedly listing a number of arguments in favor of traditional publishers. One thing that I find frustrating about this debate is that pretty much everybody who participates in it thinks of the issue as one of how the current publishing model can be reconciled with open access. That is a very limiting perspective, in my opinion, just like every company that has approached free/libre and open source software (aka FLOSS) with the mindset of a proprietary business model has failed in that domain or is currently failing (look at what happened to OpenOffice and MySQL after Oracle took control of the projects). In that spirit, I'd like to conduct a thought experiment: what would academic publishing look like if it didn't have decades of institutional cruft to carry around? Basically, if academic publishing hadn't existed until a few years ago, what kind of system would a bunch of technically-minded academics be hacking away on?
Wednesday, July 20, 2016
Here’s part 2. See here for part 1.
L&M identifies two other important properties that were central to the Cartesian view.
First, human linguistic usage is apparently free from stimulus control “either external or internal.” Cartesians thought that animals were not really free, animal behavior being tightly tied to either environmental exigencies (predators, food location) or to internal states (being hungry or horny). The law of effect is a version of this view (here). I am dubious that this is actually true of animals. And I recall a quip from an experimental psych friend of mine, who claimed that the first law of animal behavior is that the animal does whatever it damn well pleases. But, regardless of whether this is so for animals, it is clearly true of humans as manifest in their use of language. And a good thing too, L&M notes. For this freedom from stimulus control is what allows “language to serve as an instrument of thought and self-expression,” as it regularly does in daily life.
L&M notes that Cartesians did not take unboundedness or freedom from stimulus control to “exceed the bounds of mechanical explanation” (12). This brings us to the third feature of linguistic behavior: the coherence and aptness of everyday linguistic behavior. Thus, even though linguistic behavior is not stimulus bound, and hence not tightly causally bound to external or internal stimuli, linguistic behavior is not scattershot either. Rather it displays “appropriateness to the situation.” As L&M notes, it is not clear exactly how to characterize condign linguistic performance, though “there is no doubt that these are meaningful concepts…[as] [w]e can distinguish normal use of language from the ravings of a lunatic or the output of a computer with a random element” (12). This third feature of linguistic creativity, its aptness/fit to the situation without being caused by it was, for Cartesians, the most dramatic expression of linguistic creativity.
Let’s consider these last two properties a little more fully: (i) stimulus-freedom (SF) and (ii) apt fit (AF).
Note first that both kinds of creativity, though expressed in language, are not restricted to linguistic performances. It’s just that normal language use provides everyday manifestations of both features.
Second, the sources of both these aspects of creativity are, so far as I can tell, still entirely mysterious. We have no idea how to “model” either SF or AF in the general case. We can, of course, identify when specific responses are apt and explain why someone said what they did on specific occasions. However, we have no general theory that illuminates the specific instances. More precisely, it’s not that we have poor theories, it’s that we really have no theories at all. The relevant factors remain mysteries, rather than problems in Chomsky’s parlance. L&M makes this point (12-13):
Honesty forces us to admit that we are as far today as Descartes was three centuries ago from understanding just what enables a human to speak in a way that is innovative, free from stimulus control, and also appropriate and coherent.
The intractability of SF and AF serves to highlight the importance of the competence/performance distinction. The study of competence is largely insulated from these mysterious factors. How so? Well, it abstracts away from use and studies capacities, not their exercise. SF and AF are not restricted to linguistic performances and so are unlikely to be intrinsically linked to the human capacity for language. Hence detaching the capacity should not (one hopes) corrupt its study, even if how competence is used for the free expression of thought remains obscure.
The astute reader will notice that Chomsky’s famous review of Skinner’s Verbal Behavior (VB) leaned heavily on the fact of SF. Or more accurately, the review argued that it was impossible to specify the contours of linguistic behavior by tightly linking it to environmental inputs/stimuli or internal states/rewards. Why? Cartesians have an answer: our behavior is both SF and AF, our verbal behavior included, and so the Skinnerian project is hopeless. Hence any approach to language that focuses on behavior and its immediate roots in environmental stimuli and/or rewards is doomed to failure. Theories built on supposing that SF or AF are false will be either vacuous or evidently false. Chomsky’s critique showed how VB embodied the twin horns of this dilemma. Score one for the Cartesians.
One last point and I quit. Chomsky’s expansive discussion of the various dimensions of linguistic creativity may shed light on “Das Chomsky Problem.” This is the puzzle of how, or whether, two of Chomsky’s interests, politics and linguistics, hook up. Chomsky has repeatedly (and IMO, rightly) noted that there is no logical relation between his technical linguistic work and his anarchist political views. Thus, there is no sense in which accepting the competence/performance distinction, or thinking that TGG is required as part of any solution to linguistic creativity, or thinking that there must be a language dedicated FL to allow for the facts of language acquisition, in any way implies that we should organize societies on democratic bases in which all participants robustly participate, or vice versa. The two issues are logically and conceptually separate.
This said, those parts of linguistic creativity that the Cartesians noted and that remain as mysterious to us today as when they were first observed can ground a certain view of politics. And Chomsky talks about this (L&M:102ff). The Cartesian conception of human nature as creative in the strong Cartesian sense of SF and AF leads naturally to the conclusion that societies that respect these creative impulses are well suited to our nature and that those that repress them leave something to be desired. L&M notes that this creative conception lies at the heart of many Enlightenment and, later, Romantic conceptions of human well-being and the ethics and politics that would support expression of these creative capacities. There is a line of intellectual descent from Descartes through Rousseau to Kant that grounds respect for humans in the capacity for this kind of “freedom.” And Chomsky is clearly attracted to this idea. However, and let me repeat, however, Chomsky has nothing of scientific substance to say about these kinds of creativity, as he himself insists. He does not link his politics to the fact that humans come with the capacity to develop TGGs. As noted, TGGs are at right angles to SF and AF, and competence abstracts away from questions of behavior/performance where SF and AF live. Luckily, there is a lot we can say about capacities independent of considering how these capacities are put to use. And that is one important point of L&M’s extended discussion of the various aspects of linguistic creativity. That said, these three conceptions connect up in Cartesian conceptions of human nature, despite their logical and conceptual independence and so it is not surprising that Chomsky might find all three ideas attractive even if they are relevant for different kinds of projects. Chomsky’s political interests are conceptually separable from his linguistic ones. Surprise, surprise it seems that he can chew gum and walk at the same time!
Ok, that’s it. Too long, again. Take a look at the discussion yourself. It is pretty short and very interesting, not the least reason being how abstracting away from deep issues of abiding interest is often a pre-condition for opening up serious inquiry. Behavior may be what interests us, but given SF and AF it has proven to be refractory to serious study. Happily, studying the structure of the capacity independent of how it is used has proven to be quite a fertile area of inquiry. It would be a more productive world were these insights in L&M more widely internalized by the cog-neuro-ling communities.
 The one area where SFitude might be relevant regards the semantics of lexical items. Chomsky has argued against denotational theories of meaning in part by noting that there is no good sense in which words denote things. He contrasts this with “words” in animal communication systems. As Chomsky has noted, lexical items “pose deep mysteries,” something that referential theories do not appreciate. See here for references and discussion.
Wednesday, July 13, 2016
Once again, this post got away from me, so I am dividing it into two parts.
As I mentioned in a recent previous post, I have just finished re-reading Language & Mind (L&M) and have been struck, once again, by how relevant much of the discussion is to current concerns. One topic, however, that does not get much play today, but is quite well developed in L&M, is its discussion of Descartes’ very expansive conception of linguistic creativity and how it relates to the development of the generative program. The discussion is surprisingly complex and I would like to review its main themes here. This will reiterate some points made in earlier posts (here, here) but I hope it also deepens the discussion a bit.
Human linguistic creativity is front and center in L&M as it constitutes the central fact animating Chomsky’s proposal for Transformational Generative Grammar (TGG). The argument is that a TGG competence theory is a necessary part of any account of the obvious fact that humans regularly use language in novel ways. Here’s L&M (11-12):
…the normal use of language is innovative, in the sense that much of what we say in the course of normal use is entirely new, not a repetition of anything that we have heard before and not even similar in pattern - in any useful sense of the terms “similar” and “pattern” – to sentences or discourse that we have heard in the past. This is a truism, but an important one, often overlooked and not infrequently denied in the behaviorist period of linguistics…when it was almost universally claimed that a person’s knowledge of language is representable as a stored set of patterns, overlearned through constant repetition and detailed training, with innovation being at most a matter of “analogy.” The fact surely is, however, that the number of sentences in one’s native language that one will immediately understand with no feeling of difficulty or strangeness is astronomical; and that the number of patterns underlying our normal use of language and corresponding to meaningful and easily comprehensible sentences in our language is orders of magnitude greater than the number of seconds in a lifetime. It is in this sense that the normal use of language is innovative.
There are several points worth highlighting in the above quote. First, note that normal use is “not even similar in pattern” to what we have heard before. In other words, linguistic competence is not an instance of pattern matching or recognition in any interesting sense of “pattern” or “matching.” Native speaker use extends both to novel sentences and to novel sentence patterns effortlessly. Why is this important?
IMO, one of the pitfalls of much work critical of GG is the assimilation of linguistic competence to a species of pattern matching. The idea is that a set of templates (i.e. in L&M terms: “a stored set of patterns”) combined with a large vocabulary can easily generate a large set of possible sentences, in the sense of templates saturated by lexical items that fit. Note that such templates can be hierarchically organized and so display one of the properties of natural language Gs (i.e. hierarchical structures). Moreover, if the patterns are extractable from a subset of the relevant data then these patterns/templates can be used to project novel sentences. However, what the pattern matching conception of projection misses is that the set of patterns we find in Gs is not finite, and the reason for this is that we can embed patterns within patterns within patterns within…you get the point. We can call the outputs of recursive rules “patterns” but this is misleading, for once one sees that the patterns are endless, Gs are not well conceived of as collections of patterns but as collections of rules that generate patterns. And once one sees this, then the linguistic problem is (i) to describe these rules and their interactions and (ii) to further explain how these rules are acquired (i.e. not how the patterns are acquired).
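The contrast between a finite template inventory and a recursive rule system can be made concrete with a toy sketch. Everything below (the category names, the rules, the depth bound, the `patterns` helper) is invented for illustration and is not a grammar anyone has proposed; the point is only that the template list stays fixed while the rule system keeps yielding new patterns at every added depth.

```python
import itertools

# Toy contrast between a finite template inventory and a recursive rule
# system. All category names and rules are hypothetical illustrations.

# A fixed, finite stock of sentence templates: it never grows.
TEMPLATES = [
    ("Det", "N", "V"),
    ("Det", "N", "V", "Det", "N"),
]

# A recursive grammar: an NP may contain a clause (S), which contains
# another NP, and so on -- patterns within patterns within patterns.
RULES = {
    "S":  [["NP", "VP"]],
    "NP": [["Det", "N"], ["Det", "N", "that", "S"]],
    "VP": [["V"], ["V", "NP"]],
}

def patterns(symbol, depth):
    """Enumerate the terminal patterns derivable from `symbol`
    using at most `depth` nested rule applications."""
    if depth == 0:
        return set()
    if symbol not in RULES:  # terminal category
        return {(symbol,)}
    out = set()
    for expansion in RULES[symbol]:
        parts = [patterns(s, depth - 1) for s in expansion]
        if all(parts):  # every sub-symbol yielded something at this depth
            for combo in itertools.product(*parts):
                out.add(tuple(itertools.chain(*combo)))
    return out

# The template list is stuck at 2 patterns; the rule system produces
# strictly more distinct patterns each time the depth bound is raised.
sizes = [len(patterns("S", d)) for d in (3, 5, 7)]
print(len(TEMPLATES), sizes)
```

Raising the depth bound keeps adding patterns without end, which is the point: a learner that stores the patterns themselves can never finish, while a learner that stores the three rules is done.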
The shift in perspective from patterns (and patternings in the data (see note 5)) to generative procedures and the (often very abstract) objects that they manipulate changes what the acquisition problem amounts to. One important implication of this shift of perspective is that scouring strings for patterns in the data (as many statistical learning systems like to do) is a waste of time because these systems are looking for the wrong things (at least in syntax). They are looking for patterns whereas they should be looking for rules. As the output of the “learning” has to be systems of rules, not systems of patterns, and as rules are, at best, implicit in patterns, not explicitly manifest by them, theories that don’t focus on rules are going to be of little linguistic interest.
Let me make this point another way: unboundedness implies novelty, but novelty can exist without unboundedness. The creativity issue relates to the accommodation of novel structures. This can occur even in small finite domains (e.g. loan words in phonology might be an example). Creativity implies projection/induction, which must specify a dimension of generalization along which inputs can be generalized so as to apply to instances beyond the input. This, btw, is universally acknowledged by anyone working on learning. Unboundedness makes projection a no-brainer. However, it also has a second important implication. It requires that the generalizations being made involve recursive rules. The unboundedness we find in syntax cannot be satisfied via pattern matching. It requires a specification of rules that can be repeatedly applied to create novel patterns. Thus, it is important to keep the issue of unboundedness separate from that of projection. What makes the unboundedness of syntax so important is that it requires that we move beyond the pattern-template-categorization conception of cognition.
Dare I add (more accurately, can I resist adding) that pattern matching is the flavor of choice for the Empiricistically (E) inclined. Why? Well, as noted, everyone agrees that induction must allow generalization beyond the input data. Thus even Es endorse this, for Es recognize that cognition involves projection beyond the input (i.e. “learning”). The question is the nature of this induction. Es like to think that learning is a function from input to patterns abstracted from the input, the input patterns being perceptually available in their patternings, albeit sometimes noisily. In other words, learning amounts to abstracting a finite set of patterns from the perceptual input and then creating new instances of those patterns by subbing novel atoms (e.g. lexical items) into the abstracted patterns. E research programs amount to finding ways to induce/abstract patterns/templates from the perceptual patternings in the data. The various statistical techniques Es explore are in service of finding these patterns in the (standardly, very noisy) input. Unboundedness implies that this kind of induction is, at best, incomplete. Or, more accurately, the observation that the number of patterns is unbounded implies that learning must involve more than pattern detection/abstraction. In domains where the number of patterns is effectively infinite, learning is a function from inputs to rules that generate patterns, not to patterns themselves. See link in note 6 for more discussion.
An aside: Most connectionist learners (and deep learners) are pattern matchers and, in light of the above, are simply “learning” the wrong things. No matter how many “patterns” the intermediate layers converge on from the (mega) data they are exposed to they will not settle on enough given that the number of patterns that human native speakers are competent in is effectively unbounded. Unless the intermediate layers acquire rules that can be recursively applied they have not acquired the right kinds of things and thus all of this modeling is irrelevant no matter how much of the data any given model covers.
Another aside: this point was made explicitly in the quote above but to no avail. As L&M notes critically (11): “it was almost universally claimed that a person’s knowledge of language is representable as a stored set of patterns, overlearned through constant repetition and detailed training.” Add some statistical massaging and a few neural nets and things have not changed much. The name of the inductive game in the E world is to look for perceptually available patterns in the signal, abstract them and use them to accommodate novelty. The unboundedness of linguistic patterns that L&M highlights implies that this learning strategy won’t suffice in the language case, and this is a very important observation.
Ok, back to L&M.
Second, the quote above notes that there is no useful sense of “analogy” that can get one from the specific patterns one might abstract from the perceptual data to the unbounded number of patterns with which native speakers display competence. In other words, “analogy” is not the secret sauce that gets one from input to rules. So, when you hear someone talk about analogical processes, reach for your favorite anti-BS device. If “analogy” is offered as part of any explanation of an inferential capacity you can be absolutely sure that no account is actually being offered. Simply put, unless the dimensions of analogy are explicitly specified, the story being proffered is nothing but wind (in both the Ecclesiastes and the scatological sense of the term).
Third, the kind of infinity human linguistic creativity displays has a special character: it is a discrete infinity. L&M observes that human language (unlike animal communication systems) does not consist of a “fixed, finite number of linguistic dimensions, each of which is associated with a particular nonlinguistic dimension in such a way that selection of a point along the linguistic dimension determines and signals selection of a point along the associated nonlinguistic dimension” (69). Think, for example, of higher pitch or chirp rate being associated with greater intention to aggressively defend territory, or of the way that “readings of a speedometer can be said, with an obvious idealization, to be infinite in variety” (12).
L&M notes that these sorts of systems can be infinite, in the sense of containing “an indefinitely large range of potential signals.” However, in such cases the variation is “continuous” while human linguistic expression exploits “discrete” structures that can be used to “express indefinitely many new thoughts, intentions, feelings, and so on.” ‘New thoughts’ in the previous quote clearly means new kinds of thoughts (e.g. the signals are not all about how fast the car is moving). As L&M makes clear, the difference between these two kinds of systems is “not one of “more” or “less,” but rather of an entirely different principle of organization,” one that does not work by “selecting a point along some linguistic dimension that signals a corresponding point along an associate nonlinguistic dimension.” (69-70).
In sum, human linguistic creativity implicates something like a TGG that pairs discrete hierarchical structures relevant to meanings with discrete hierarchical structures relevant to sounds and does so recursively. Anything that doesn’t do at least this is going to be linguistically irrelevant as it ignores the observable truism that humans are, as matter of course, capable of using an unbounded number of linguistic expressions effortlessly. Theories that fail to address this obvious fact are not wrong. They are irrelevant.
Is hierarchical recursion all that there is to linguistic creativity? No!! Chomsky makes a point of this in the preface to the enlarged edition of L&M. Linguistic creativity is NOT identical to the “recursive property in generative grammars” as interesting as such Gs evidently are (L&M: viii). To repeat, recursion is a necessary feature of any account aiming to account for linguistic creativity, BUT the Cartesian conception of linguistic creativity consists of far more than what even the most explanatorily adequate theory of grammar specifies. What more?
 This is not unique to linguistic cognition. Lots of work in cog sci seems to identify higher cognition with categorization and pattern matching. One of the most important contributions of modern linguistics to cog sci has been to demonstrate that there is much more to cognition than this. In fact, the hard problems have less to do with pattern recognition than with pattern generation via rules of various sorts. See notes 5 and 6 for more off-hand remarks of deep interest.
 I suspect that some partisans of Construction Grammar fall victim to the same misapprehension.
 Many cog-neuro types confuse hierarchy with recursion. A recent prominent example is in Frankland and Greene’s work on theta roles. See here for some discussion. Suffice it to say that one can have hierarchy without recursion, and recursion without hierarchy, in the derived objects that are generated. What makes linguistic objects distinctive is that they are the products of recursive processes that deliver hierarchically structured objects.
 Note that unboundedness implies novelty, but novelty can exist without unboundedness. The creativity issue relates to easy handling of novel structures. This can occur even in small finite domains. Creativity implies projection, which must specify a dimension of generalization along which inputs can be extended to apply to instances beyond the input. Unboundedness makes projection a no-brainer. It further implies that the generalization involves recursive rules. Unboundedness cannot be handled by pattern matching. It requires a specification of rules that can be repeatedly applied to create novel patterns. Thus, it is important to keep the issue of unboundedness separate from that of projection. What makes the unboundedness of syntax so important is that it requires that we move beyond the pattern-template-categorization conception of cognition.
 It is arguable that some rules are more manifest in the data than others are and so are more accessible to inductive procedures. Chomsky makes this distinction in L&M, contrasting surface structure, which contains “formal properties that are explicit in the signal,” with deep structure and transformations, for which there is very little to no such information in the signal (L&M:19). For another discussion of this distinction see (here).
 Thus the hope of unearthing phrases via differential intra-phrase versus inter-phrase transition probabilities.
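That hope can be illustrated with a toy computation. Nothing below comes from the post: the mini corpus, the `trans_prob` helper, and the specific numbers are invented for illustration, and real segmentation work uses far larger corpora and more careful statistics. The sketch just shows the basic idea that within-phrase transitions tend to be more predictable than transitions across a phrase boundary.

```python
from collections import Counter

# Toy sketch of segmentation via transition probabilities. The corpus
# below is invented purely for illustration.
corpus = "the dog barks . the dog sleeps . a cat barks .".split()

pairs = Counter(zip(corpus, corpus[1:]))   # adjacent-word bigram counts
firsts = Counter(corpus[:-1])              # how often each word precedes something

def trans_prob(a, b):
    """Estimated P(b | a): how predictable is b immediately after a?"""
    return pairs[(a, b)] / firsts[a]

# Within the phrase "the dog" the transition is fully predictable in this
# corpus, while across the NP/VP boundary ("dog" -> "barks") it is not,
# since "dog" is also followed by "sleeps". Low-probability transitions
# are the candidate phrase boundaries.
print(trans_prob("the", "dog"), trans_prob("dog", "barks"))
```

The design choice is the crux of the footnote: a learner positing boundaries at probability dips is still only detecting surface patternings, which is exactly the kind of evidence the main text argues is insufficient for rules.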
 We really should distinguish between ‘learning’ and ‘acquisition.’ We should reserve the first term for the pattern recognition variety and adopt the second for the induction to rules variety. Problems of the second type call for different tools/approaches than those in the first and calling both ‘learning’ merely obscures this fact and confuses matters.
 Although this is a sermon for another time, it is important to understand what a good model does: it characterizes the underlying mechanism. Good models model mechanisms, not data. Data provides evidence for mechanism, and unless it does so, it is of little scientific interest. Thus, if a model identifies the wrong mechanism then, no matter how apparently successful it is in covering the data, it is the wrong model. Period. That’s one of the reasons connectionist models are of little interest, at least when it comes to syntactic matters.
I should add that analogous creativity concerns drive Gallistel’s arguments against connectionist brain models. He notes that many animals display an effectively infinite variety of behaviors in specific domains (caching behavior in birds or dead reckoning in ants) and that these cannot be handled by connectionist devices that simply track the patterns attested. If Gallistel is right (and you know that I think he is) then the failure to appreciate the logic of infinity makes many current models of mind and brain beside the point.