Monday, July 9, 2018

Newton as empiricist

One thing Empiricism (E) got right is that there are no foundational assumptions of general scientific utility that are empirically inviolable. Rationalists (R) once thought otherwise, thinking that the basics of the mechanical philosophy (matter is geometrical and all forces are contact forces) as well as an innate appreciation of some of God’s central features could undergird permanent foundations for investigation of the physical world. One of Newton’s big debates with the Cartesians revolved around this point, and he argued (as convincingly examined here by Andrew Janiak) that we have no privileged access to such theoretical starting points. Rather, even our most basic assumptions are subject to empirical evaluation and can be overturned.

This is now the common view. The Rs were wrong, the Es right. Or at least some of them were. Descartes and his crew were metaphysical foundationalists in that they took the contours of reality to be delivered via clear and distinct innate ideas. As these ideas reflected what had to be the case, they served to found physical inquiry in a mechanical view of the world. So for Rs innate ideas were critical in the sciences in that they were metaphysically foundational. Newton denied this. More exactly, he denied that there were empirically unassailable foundations for natural philosophy. And he did this by arguing that even foundational assumptions (e.g. the mechanical philosophy) were subject to experimental evaluation. This he proceeded to do in the Principia, where he argued that the mechanical philosophers had it all wrong, and that mass and gravity were inconsistent with mechanism, and so mechanism was false. Great argument.

As Janiak tells the story, three features of Newton’s story really made his conclusions convincing. First, he provided a mathematical formulation of the force laws and the law of gravitation. Second, he showed how these unify terrestrial and celestial mechanics. That they were unified was a staple of Cartesian mechanical thinking. However, the Cartesians could not show that this conviction was scientifically justifiable. Newton unified the two domains via gravity (plus he threw in the tides as a bonus), thus achieving what the mechanical philosophers wanted by using a force they despised. Third, Newton provided a principled account for Galileo’s observation that acceleration was independent of the shape/size/density of the accelerating objects. Why this should be so was a real puzzle, and an acknowledged one. The link between gravitation and mass that Newton forged had this independence fall out as a trivial consequence. The second and third achievements were substantive within the framework of Cartesian physics (they solved problems that Cartesians recognized as worthy of explanation) but they were inconsistent with Cartesian mechanical philosophy (because they were based on Newton’s conceptions of mass and gravity), as was widely understood. This is why Newton’s physical interpretation of his formal work was resisted, though everyone agreed that the math was wondrous. The problem was not the math, but what Newton took the math to mean.

The Janiak book goes into lots of detail regarding Newton’s approach to arguing against Cartesian orthodoxy. It is a really great read. The principal form of the argument goes as follows: it is possible to know that something is true without knowing exactly how it can be.[1] This involved unpacking Newton’s famous dictum that he does not feign hypotheses. This, Janiak notes, does not mean that Newton did not advance theories. Clearly he did. Rather Newton meant that he believed that he could both say that he knew something to be the case and that he did not know exactly how what he knew to be the case could be the case. Sound familiar? This is a critical distinction, and Newton’s point is that unless one carefully distinguishes the questions one is asking it is very hard to evaluate the quality of the answers. Some data for some questions can be completely compelling. The same data for other questions hardly begin to scratch the surface.

Newton showed that it is possible both to know that gravity is real and not to know exactly why it has the properties we know it to have, or even to know exactly what it is (a relation or a quality). What Newton did know was that one could believe that gravity was real without thinking that it was an inherent property of bodies that acted at a distance. He insisted that local action was consistent with gravity (there was no contradiction between the two) even though he did not know how it could be (he had no constructive theory of local action that included gravity).

I would have put things somewhat differently: Newton knew that gravity existed and what its signature features were and some central cases of its operations. He had an effective theory. But he did not believe that he had a fundamental account of gravity, though he knew that there could not be a mechanical explanation of these gravitational effects. Moreover, he knew that nothing known at the time implied that it was inconsistent with local action of some other yet to be determined kind. So, he knew a lot but not everything, and that was good enough for him.

Note what convinced contemporaries: unification, and explanation of outstanding generalizations. So too today: we want to unify distinctive domains in syntax (MP requires this) and want to explain why the generalizations we appear to have discovered hold. This is the key to moving forward, and imitating Newton is not the worst path forward.

So, Eism did come out right on this point, but most Es did not. Newton was somewhat of an outlier on these foundational matters. He rightly understood that Cartesian metaphysics and its innate-ideas window into reality would not wash. But he did not appear to embrace an Eist epistemology (or at least theory of mind). Unlike many, he did not endorse the view that there are no ideas in the intellect that are not first in the senses. Thus, his beef with Rists was not their nativism but their supposition that being innate conferred some kind of privileged metaphysical status. The way Newton short circuited the metaphysical conclusions was by showing that they did not hold up to experimental and theoretical scrutiny. He did not argue that the ideas could not be useful because they could not exist (i.e. there are no innate ideas) or that the only decent ideas were those based in perception. This was fortunate, for curiously the Eists did not really grasp Newton’s basic take home message: ideas, wherever they come from, require scientific validation through experiment and theory. And every part of every theory is in principle up for grabs, whatever its psychological source. Eists have tended not to understand that this was Newton’s message and have concluded that, because Newton showed that Rist foundationalism failed, it failed because it countenanced innate ideas.

But this was not the problem. The problem came with the added assumption that innate ideas in virtue of being innate are empirically unassailable. Curiously, Eists came up with their own form of foundationalism based on their view that the only good ideas are the ones based in the senses. This idea didn’t turn out very well either and gave rise to its own forms of foundationalism, also false by Newton’s standards.

So, Rs were wrong about the relation of ideas to truth even though they were largely right about the psychology. Newton was right that there are no useful a priori foundations for the sciences (i.e. foundations that are not ultimately empirically justified). Es were right in that they believed that Newton did show that Rism was wrong to the degree that it was foundationalist. Where Es got lost is in rejecting the Rish psychology because it failed to provide metaphysical foundations. Eism ended up looking for epistemological foundations that would demarcate legit thinking (i.e. scientific thinking) from non-legit thinking (everything else). Eists pursued the idea that grounding one’s ideas in the senses would guarantee good foundations. Newton would not have approved. The right conclusion is that there are no epistemological shortcuts to good science.

[1] These are two questions that, though related, should not be confused. I have argued that they often have been within linguistics too. That humans have an FL that underlies their unique capacity to acquire language is almost a tautology, IMO. How FL allows humans to do this is a substantive problem that we have just started to crack.

Monday, July 2, 2018

Apes and the Chomsky Hierarchy

The obit for Koko in a recent post (here), though intended somewhat tongue in cheek, garnered three interesting comments. They pointed to a couple of recent papers that (in some sense that I will return to) extended the intellectual project of which Koko was an important cog when I was in grad school. It was the first EvoLang project that I was exposed to. The aim was to establish the linguistic facility of non-human apes. Which is what got me into that gorilla suit. 

In honor of Chomsky’s birthday, Elan Dresher, Amy Weinberg and yours truly decided to settle the issue once and for all by having Koko publicly debate Noam on the topic of whether or not Koko was linguistically competent. IMO, the debate was a draw (though Noam (the real Noam, not Dresher-Noam) begged to differ). However, whatever the outcome, for a while after the debate I was considered somewhat of an expert on the topic, as many concluded that having been the one in the suit I had an ape’s eye view of the issue and so could add a heretofore unavailable slant on the topic. I milked this for all it was worth (as you would have too, I am sure).

So what was the debate about? The question was whether human language is continuous with similar linguistic capacities in our ape cousins. Eric Lenneberg dubbed the claim that human language is a quantitative extension of qualitatively analogous capacities in our ape cousins the Continuity Thesis (CT). Given CT, our evidently greater human linguistic capacities are just the more modest ones of our ape cousins turbocharged with greater brainpower. Human linguistic competence, in other words, is just ape linguistic competence goosed by a higher IQ. Given CT, we humans are not different in kind. We just have bigger brains (fatter rounder frontal lobes), more under the cranial hood, and our superior verbal capacities just hydroplane on that increased general intelligence.

Koko (Penny Patterson’s verbal gladiator) was not the only warrior in the CT campaign. There were many apes recruited to the good fight (see here). The Gardners had Washoe, Sue Savage-Rumbaugh had Kanzi, Herb Terrace had Nim. There were others too. A lot of effort went into getting these apes to use signs (verbal, manual, computerized) and demonstrate the rudiments of linguistic competence. The aim was to show that they would/could acquire (either naturally, or more standardly after some extensive training) the capacity to articulate novel semantically composed messages using a rudimentary syntax similar to what we find in human natural language. Once this basic syntax was found, we could solve the EvoLang problem by attributing human verbal facility to this rudimentary syntax proportionately enhanced by our bigger brains into the wondrous linguistically creative syntactic engine currently found in humans. Not surprisingly, it was believed that were it possible to show that, linguistically speaking, these other apes were attenuated versions of us, this would demonstrate that Chomsky was wrong and that there was nothing all that special about human linguistic capacity. If CT was correct then the difference between them and us is analogous to the difference between a two cylinder Citroen Deux Chevaux and a Mercedes AMG 12 cylinder. The principles were the same even if the latter dwarfed the horsepower of the former. CT, in sum, obviated the need for (and indeed would be considerable evidence against) a dedicated linguistic mental module.

This really was all the rage for quite a while (hence my several months of apish celebrity). And I would bet (given the widespread coverage of Koko’s demise) that you could still get interviewed on NPR or published in the New Yorker with a groundbreaking (yet heartwarming) story about a talking ape (apes play well, though parrots also get good press). In academic circles, however, the story died. The research was largely shown to be weak in the extreme (“crappy” and “shoddy” are words that come to mind). Herb Terrace’s (non shoddy, not crappy) work with Nim Chimpsky (Mike Seidenberg and Laura Petitto were the unfortunate grad students that did the hard slogging) more or less put to rest the idea that apes had any significant syntax at all, and showed that what they had is nothing like what we see in even the average three year old toddler.

Charles Yang has a really nice discussion of the contrast in capacities between Nim and your average toddler in a recent paper on Zipf’s law as it applies to the linguistic productions of toddlers vs those of Nim (here). By the global measure which that law affords, Charles is able to show that:

Under the Zipfian light, however, the apparent continuity between chimps and children proves to be an illusion. Children have language; chimps do not. Young children spontaneously acquire rules within a short period of time; chimpanzees only show patterns of imitation after years of extensive training. 

Moreover, these “patterns of imitation” do not show the hallmarks of Zipfian diversity that we would expect to see if they were products of grammatical processes. As Charles bluntly puts it: Nim “was memorizing, not learning a rule.”[1]

The ineluctable conclusion is that teaching apes to talk came nowhere close to establishing CT. It failed to establish anything like CT. In short, there is zero evidence from this quarter to gainsay the obvious observation (obvious to anyone but a sophisticated psychologist, that is) that nothing does language like humans do language, not even sorta, kinda. The gap is not merely wide, it is incommensurable. The gap is qualitative, not quantitative, and the central distinguishing difference is that we humans are gifted with a biologically sui generis syntactic capacity. We develop unique Gs on the basis of unique FL/UGs. Yay us![2]

Now, I admit that I thought that this line of discussion was dead and buried, at least in the academic community. But I should have known better. What the commentators noted in their comments to my previous obit for Koko is that it is baaaack, albeit in a new guise, this time sporting a new fancy formal look. The recent papers adverted to, one outlining the original research conducted by a consortium of authors including Stan Dehaene (SD&Co) (here) and a comment on the work explaining its significance by Tecumseh Fitch (TF) (here), resurrect a modern version of CT, but this time in more formidable looking garb. IMO, this time around, the discussion is of even less biolinguistic or evolang relevance. I’d like to talk about these papers now to show why I find them deeply disappointing.

So what does SD&Co do? It shows that Macaques (i.e. monkeys, not apes, so further from humans than is really CT optimal) can be (extensively) trained (10,000-25,000 training trials) to produce supra-regular (manual) sequences whose patterns go beyond the generative capacity of finite state automata. This is argued to be significant for it provides evidence that crossing the regular grammar boundary is not “unique to humans” (SD&Co, 1). This fact is taken to be relevant biolinguistically for it “indicates cognitive capabilities [in non humans, NH]…that approach the computational complexity level of human syntax” (TF, R695).

The experiments taught monkeys to perform sequential operations in the spatial-motor domain (touching figures in sequential patterns on a screen). These manual patterns go beyond those describable by regular grammars. It is argued that mastering this sequential capacity implies that monkeys have acquired supra-regular grammars able to generate these patterns (Phrase Structure Grammars (PSG)), heretofore thought to be “only available in humans” (SD&Co, 1). So, assuming that the experiments with the monkeys were well done (and I don’t have the expertise to deny this, nor do I believe it (nor do I think that it is relevant to do so)), then it appears that non humans can cross over the domain of regular grammars and acquire performance compatible with computational systems in (at least) the PSG part of the Chomsky Hierarchy. Let’s stipulate that this is so. Why should we care? TF provides the argument. Here it is in rough outline:

1.    Only humans show an “unbounded” communicative “expressive power”
2.    Shared traits are a “boon to biologists interested in language” to “test evolutionary hypotheses about adaptive function”
3.    Syntax has “until now resisted the search for parallels in our animal brethren”
4.    It has heretofore been thought that “grammars” are “beyond the capabilities of nonhuman animals”
5.    SD&Co “show[s] that with adequate training, monkeys can break beyond this barrier”
6.    In particular, the monkeys could learn rules beyond those in finite state (regular) grammars
7.    So nonhumans have “supra-regular computational capacities”
8.    And this “suggest[s] that the monkey’s brain possesses the kind of cognitive mechanisms required for human linguistic syntax…”

So there it is, the modern version of CT. Monkeys have what we have, just a little less of it, in that ours requires a handful of examples (5 according to SD&Co) to get it going while theirs needs on the order of 10,000-25,000. Deux Chevaux vs Mercedes it is, once again.

So, is this argument any better or insightful than the earlier Ape versions of CT? Not as far as I can tell. Let me say why I don't think so.

Note that this version of CT is very much more modest than the earlier attempts. Let’s count the ways.

First, earlier versions looked to our immediate evo relations. Here, it’s not our immediate cousins whose capacities are investigated, but monkeys (and they are very, very, very distant relations).

Second, early CT work was interested in demonstrating a productive syntax in service of semantic productivity. What’s wondrous about us is that the syntax is linked to (underlies?) semantic productivity. We are not looking at mere patterns or sequences. What is impressive in language is the fact that the relevant patterns can be semantically deployed to produce and understand an open ended number of distinct kinds of messages/thoughts. The SD&Co experiments have nothing to do with semantic productivity. It is just pattern generation that is at issue. There is no reason provided to think that the monkeys can or could use these patterns for compositional semantic ends. So, one key feature of our syntax (the fact that it ties together meaning and articulation) is set to one side in these experiments.[3] And this is a big retreat from the earlier CT work that correctly recognized that linguistic creativity is the phenomenon of interest.

Third, and related to the second, the kind of grammars the monkeys acquire have properties that our grammars never display. Human Gs don’t have mirror image rules. And the reason we think that such kinds of rules are absent is that human Gs have rules that don’t allow them. They are “structure dependent.” Our rules exploit the hierarchy of linguistic representations, and the syntax eschews their sequential properties. So it is unclear, at least to me, why being able to teach monkeys rules of the kind we never find in human Gs would tell us about the properties of human Gs that fail to pattern in accord with these rules. And if it cannot do this, it also cannot inform us as to how our kinds of Gs evolutionarily arose in the species.

This criticism is analogous to one raised against earlier CT work. It was regularly observed that even the most sophisticated language in nonhumans (the best actually being on dolphins by Herman and published in Cognition many, many, many years ago) made use of an ordinal encoding of “thematic” role to sequential position. The critters were taught a “grammar” to execute commands like “Bring A with B to C” with varying objects in the A,B,C positions. They were pretty good at the end, with the best able to string 5 positional roles and actions together. So, they mastered an ordinal template to roles and actions productively (though not recursively (see note 4)). But, and this was noted at the time and since, it was a sequential template without a hint of hierarchy. And this is completely unlike what we find in human Gs.[4]

Interestingly, SD&Co notes that the same thing holds in its experiments (pp. 6-7). 

Even after extensive training on length 4 sequences, behavioral analysis suggested that monkeys still relied on a simple ordinal memory encoding, whereas pre-schoolers spontaneously used chunking and global geometric structure to compress the information. Thus, the human brain may possess additional computational devices, akin to a “language of thought,” to efficiently represent sequences using a compressed descriptor during inductive learning.

In other words, humans display a sense of constituency (after 5 exposures) while highly trained monkeys never develop one (well, not even after 25,000 training examples). Maybe it’s because human Gs are built around the notion of a constituent, while monkey Gs never are. And as constituency is what allows human semantic productivity (it underlies our conceptions of compositionality) it really is quite an important difference between us and them. In fact, it has been what people like me who claim that human syntax is unique have pointed to forever. As such, discovering that monkeys fail to develop such a sense of hierarchy and constituency seems like a really big difference. In fact, it seems to recapitulate what the earlier CT investigations amply established. In fact, even SD&Co seem to suggest that it marks a qualitative difference, referring as it does to “additional computational devices,” albeit describing these as properties of a distinctive “language of thought” rather than what I would call “syntax.” Note that if this is correct, then one reasonable way of understanding the SD&Co experiments is as establishing that what monkeys do and what we do is qualitatively different and that looking for Gs in our ancestors is a mug’s game (the conclusion opposite to the one that TF draws). In other words, CT is wrong (again) and we should just stop assuming that general intelligence or general computational capacities will come to the evolang rescue.

Fourth, it is important to appreciate what a modest conclusion SD&Co’s paper licenses even if true. It shows that monkeys can manage to acquire capacities with respect to kinds of patterns, including mirror image ones, that are assumed to be relevantly similar to linguistically relevant ones. What makes the patterns linguistically interesting? Regular Gs cannot generate them. They require at least Gs in the PSG part of the Chomsky Hierarchy. The conclusion is that monkeys have computational capacities beyond the regular, ones that lie in at least the PSG precincts. But even if correct, this is a very weak conclusion. Why?

There are billions and billions of supra-regular rules/Gs (indeed there are even humongously many regular ones). Human Gs have zeroed in on one extremely small corner of this space. And it is true that unless non-humans can master supra-regular patterns they cannot have human Gish capacities, because our Gs are supra-regular. However, even if they can master supra-regular Gs, it does not follow that the kinds they can master are in any way similar to those that we deploy in our natural language facility. In other words, the claims in 7/8 above are wildly misleading. The fact that monkeys can have one kind of supra-regular capacity tells us nothing at all about whether they can ever have our kind of supra-regular capacity, the one underlying our linguistic facility. Assuming otherwise is simply a non-sequitur. In other words, TF’s claim that the monkey behavior “suggest[s] that the monkey’s brain possesses the kind of cognitive mechanisms required for human linguistic syntax” is entirely unfounded if this is taken to mean that they have computational mechanisms like those characteristic of our Gs.

Let me put this another way: GGers have provided pretty good descriptions of the kinds of properties human Gs have. The evolang question is how Gs with these features could have arisen in the species. Now, one feature of these Gs is their recursivity. But this is a very weak feature. Any G for a non-finite language in the Chomsky Hierarchy will have this feature. But virtually none of them will have the properties our Gs enjoy. What we want from a decent evolang story is an account of how recursive Gs like ours (ones that generate an unbounded number of hierarchically structured objects capable of supporting meaning and articulation, that are non-counting, that support displacement, etc. etc. etc.) came to be fixed in the species. That’s what we want to understand. There is virtually no reason to think that SD&Co advances this question even one nano-nano-nano meter (how many nano meters to a jot?), for the rules they get the monkeys to acquire look nothing like the rules that characterize our Gs (recall, human Gs eschew mirror image rules, they link meaning with articulation, they show displacement etc.).

Truth be told, GGers (and here I mean Chomsky) might be a little responsible for anyone thinking simple recursion is the central issue. We are responsible for two different reasons. 

The first sin is that many a minimalist talk starts by insisting that the key feature of human linguistic facility is recursion. However, it is not recursion per se that is of interest (or not only recursion) but the very specific kind of recursion that we find in human language. There are boundlessly many recursive systems (even regular languages have recursive Gs that can generate an unbounded number of outputs). Our specific recursive Gs have very distinctive properties and the relevant evolang question is not how recursion arose but how the specific hierarchical recursion we find in humans arose. And there is nothing in these papers that indicates that human linguistic facility is any less qualitatively distinctive than GGers have been arguing it is for the last 60 years. In fact, in some important ways, as I’ve noted, this recent foray into CT is several steps less interesting than the earlier failed work, as it fails to link syntactic recursion to semantic creativity, as earlier work tried to do (even if it largely failed).
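To make the point about recursion per se concrete, here is a minimal sketch in Python. The two toy grammars and the `generate` helper are my own illustrative assumptions, not anything taken from SD&Co or TF. Both grammars are recursive (S reappears on the right-hand side of its own rules) and both yield an unbounded set of strings, yet one is regular and the other is a supra-regular mirror image grammar, and neither looks anything like a human G.

```python
from collections import deque
from itertools import islice

# Hypothetical toy grammars (illustration only): "S" is the sole nonterminal,
# lowercase letters are terminals.
REGULAR = {"S": ["aS", "a"]}        # right-linear, so regular: a, aa, aaa, ...
MIRROR = {"S": ["aSa", "bSb", ""]}  # context-free mirror image strings: w + reversed(w)

def generate(grammar, start="S"):
    """Yield terminal strings breadth-first by expanding the leftmost nonterminal."""
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, or None if the form is all terminals.
        idx = next((i for i, ch in enumerate(form) if ch in grammar), None)
        if idx is None:
            yield form
            continue
        for rhs in grammar[form[idx]]:
            queue.append(form[:idx] + rhs + form[idx + 1:])

regular_sample = list(islice(generate(REGULAR), 4))
mirror_sample = list(islice(generate(MIRROR), 7))
print(regular_sample)
print(mirror_sample)
```

Neither generator ever runs dry, so "is recursive and unbounded" does not by itself separate the regular from the supra-regular, let alone pick out Gs with hierarchy, displacement and the rest.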

The second failing can be laid at the feet of Syntactic Structures. It presents a neat little argument showing that natural language Gs cannot be modeled as finite state automata. This was worth noting at the time because Markov processes were taken to be useful models of human cognition/language.[5] So Chomsky showed that this could not be true even looking at extremely superficial properties found in the sequential patterns of natural language (e.g. if…then… patterns). He then noted that PSGs could generate such patterns and argued that PSGs are also inadequate once one considers slightly more interesting generalizations found in natural language. The second argument, however, does not claim that PSGs cannot generate the relevant patterns, but that they do not do so well (e.g. they miss obvious generalizations, make Gs too complicated etc.). Some concluded that this was a weak argument against PSGs (I disagree), but what’s more important is that the conclusion tacitly drawn was that being PSGish is what language is all about. There is a very strong whiff of this assumption in both papers. But this is wrong. Given Chomsky’s points, being (at least) PSGish is at most a necessary condition, in that Finite State Gs cannot even get off the ground. But we know that not all PSGs are the same and that natural language shows very distinctive PSG properties (e.g. headedness, binarity), so the necessary condition is a very weak one, and having PSG capacities does not imply having those underlying human language. It’s a little like discovering that the secret to life is the prime factorization of an even number and concluding that I am closer to finding it because I know how to factor 6 into its primes. Knowing that it is even is knowing something, just not very much.

Let’s return to the main discussion again. Fifth, if the point is to demonstrate that animals have fancy computational capacities (e.g. ones that require memory more involved than the kind we find in Finite State Machines) then we already have tons of evidence that this is so. In fact, I am willing to bet that being able to identify the position of the sun even when obscured, using that position to locate a food source, calculating a direct route back home despite a circuitous route out, communicating the position of this food source to others by systematically dancing in the dark and by being able to understand this message and reverse the trajectory of flight all the while calibrating the reliability of these messages by comparing their contents with a mental map specifying whether the communicated position is one of the possible positions for the food source is quite a bit more computationally involved than mirror reversing a sequence. In fact, I would bet that one comes close to using full Turing computational resources for this kind of calculation (e.g. memory much more involved than a stack, and computations at least as fancy as mirror reversal). So, if the aim is to show that animals have very fancy computational capacities, we already know that they do (caching behavior is similarly fancy, as is dead-reckoning etc.). Just take a look at some of the behaviors that Gallistel is fond of describing and you can put to rest the question of whether non-human animal cognition can be computationally very involved. It can be. We know this. But, as fancy as it can be, it is fancy in ways different from the ways human linguistic capacity is fancy and there is no reason to think that mastering one gives you a leg up on mastering the other (you try dancing in the dark to tell someone where to find a cheeseburger at 4 AM).

In sum, there is no reason to doubt that non-humans have fancy cognitive computational capacities. But there is also no way currently of going from their very fancy capacities to the ones that underlie our linguistic facility. And that, of course, is the problem of interest. If the aim of SD&Co is to demonstrate that non-humans can be computationally fancy shmancy, it was a waste of time.

Sixth, let me end with a much weaker concern. There is one more sense in which these kinds of experiments strike me as weak. What we believe about humans is that it is (partly) in virtue of having Gs that generate structures of certain sorts that humans have linguistic capacities of the kinds they have. So, for example, GGers argue that it is in virtue of having Merge that Gs allow movement and reconstruction, and hence that humans can creatively understand the interpretation of sentences like Which of his1 books did every author insist you review. Without Gs of this sort, these behaviors could not be accounted for. Now, it is quite unclear to me if most of the literature on sequence pattern capacities in animals (can they master A^nB^n or mirror image sequences?) shows that the animals succeed in virtue of having PSGs that generate such sequences. Maybe they do it some other way, using systems more powerful than PSGs. Let me explain what I mean.

If asked to verify whether a sequence of As and Bs was an A^nB^n sequence, I personally would count the As and Bs and see if they matched up. This would allow me to do the task without using a PSG to generate the string that has n As followed by n Bs. If I count, I am not using a PSG; if I use a PSG, I need not count. Now, I might be off here, but I do not see that the experiments generally reported (including the SD&Co one) show that the animals strut their cognitive stuff in virtue of mastering Gs with the right properties. Note that a PSG can do without counting what I do with it. But another G could solve the problem by counting. So how do we know the animals don’t count (note that we are usually talking of very short sequences (3 or 4 units long))? Or, more specifically, how do we know that they solve the problem by constructing/acquiring a PSG that they use to solve it? The fact that such a G could do this does not imply that this is how the monkeys actually do it. There are many ways to cognitively skin a problem. Using a non counting PSG is one specific way. There are others.[6]
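The contrast between counting and using a PSG can be sketched in a few lines of Python. This is only an illustration under my own assumptions (the function names are hypothetical, and I take n to be at least 1); it makes no claim about what the monkeys, or the SD&Co model, actually compute. The first recognizer counts; the second follows the rules S → A S B | A B and keeps no counter at all.

```python
def accepts_by_counting(s):
    """Counting strategy: tally the leading As, then check for exactly as many Bs."""
    n_a = len(s) - len(s.lstrip("A"))  # number of leading As
    return n_a > 0 and s[n_a:] == "B" * n_a

def accepts_by_grammar(s):
    """PSG strategy: recursive descent for S -> A S B | A B, with no counter."""
    if s == "AB":  # base rule S -> A B
        return True
    # Recursive rule S -> A S B: peel one A off the left and one B off the right.
    return (len(s) > 2 and s[0] == "A" and s[-1] == "B"
            and accepts_by_grammar(s[1:-1]))

probes = ["", "AB", "AABB", "AAABBB", "ABAB", "AAB", "BA", "ABB"]
verdicts = [(accepts_by_counting(p), accepts_by_grammar(p)) for p in probes]
print(verdicts)
```

The two procedures return the same verdict on every string, so success on the task cannot tell us which of them (if either) is doing the work. That is the worry in the paragraph above, and it bites hardest when the sequences are only 3 or 4 units long.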

Ok, enough already. I am a fan of the work of both Dehaene and Fitch. I have learned a lot by reading them. But I don’t see the value of this computational revival of CT. It really does nothing to advance the relevant evolang or biolinguistic issues so far as I can tell. More worrisome is that it is generally oversold and, hence, misleading. It is time to bury the Continuity Thesis in all of its manifestations. Human language really is different. We even have some idea how. For evolang to have any interest, it needs to start asking how it could have evolved, how something with its distinctive properties could have arisen. The current crop of CT-inspired explorations of the Chomsky Hierarchy don’t do this, though they don’t do this in an obscure way that covers up their general irrelevance. I prefer the older ape-based ways of being wrong. They at least made for some fun theater. Thx Koko.

[1] Charles’ most interesting observation in this wonderful little paper (read it!) is the following:

To this day, Nim has provided the only public database of signs from animal language studies. (By contrast, the ability of Koko, the famous talking gorilla who occasionally holds online chats, comes exclusively from her trainer’s interpretation and YouTube clips.) 

It is amazing how far anecdotes can take you when you are pushing a line whose truth is strongly desired. Herb Terrace and company are to be commended for studying the issue scientifically rather than a la NPR.
[2] There was always another thing odd about CT work. Say we could show that apes had linguistic capacities (nearly) identical to ours. That would allow us to explain human linguistic facility by saying that humans inherited it from a common ape/human ancestor.
So given this assumption, how we became facile is easily explained. But this would not really explain the question we are most interested in: how linguistic capacity in general arose. It would only explain why we have it given our ape ancestors had it. But as the really interesting question is how language arose, not how it arose in us, the very same problem pops up now as regards our ape ancestors and monkeys, unless we assume that monkeys too have/had more or less the same linguistic facility as our ape ancestors. And so on through the clades. At some point, heading backwards through our ancestry, we would come to a place where animal X had language while related animal Y did not. And when we got there, the very same problem we are trying to answer in the human/ape case would arise, and arise in the very same form. As such it is unclear what kind of progress we make by attributing to our ancestors (pale) versions of our capacities unless we are willing to go all the way and attribute paler and paler versions of this linguistic capacity all the way down (or out) the evo tree/bush.
This brings us to the real implicit assumption behind the earlier CT enterprise: everything talks. The differences we see are never qualitative ones. For unless we assume this, pushing the problem back a clade or two really does nothing to solve the conceptual puzzle of interest. So the inchoate assumption has always been that language is just a product of general intelligence, intelligence is common across animals (life?) and one only sees language emerge robustly when intelligence crosses a certain quantitative threshold. The only reason that dogs don’t talk is that they are too dumb. But don’t say this around a favored pet because they might get pissed.
Last point: this anti-modularity conception of cognition is just part of the general Eish conception of minds. Gallistel and Matzel have a nice discussion of how Associationism favors general intelligence approaches to the mind because modularity requires specific bespoke architectures. 
[3] Note that earlier CT work aimed at showing that this was possible in animals. Like I note below, dolphins didn’t do badly wrt simple sequence/thematic meaning linkages. So they showed a kind of compositionality in their acquired G.
[4] It was also not productive, capping out at 5 (at most). So there was no recursion here. What we would like to see is recursion coupled to semantic composition via a hierarchical syntax. Neither the dolphins nor the macaques provide this.
[5] My recollection is that heavyweights like Suppes mooted as much.
[6] The work on ‘Most’ (here) revolves around just this theme: truth conditions can be determined by various different generative procedures. The relevant GG question is the nature of the generative procedure. So too here. Are the monkeys solving their problems by using an acquired G that generates the relevant strings or are they doing it in some other way (e.g. by counting or ordinally ordering the inputs)?

Monday, June 25, 2018

Physics envy blues

These are rough days for those who suffer from Physics Envy (e.g. ME!!!!). It appears (at least if the fights in the popular press are any indication) that there is trouble in paradise and that physicists are beset with “a growing sense of unease” (see here, p. 1).[1] Why? Well, for a variety of reasons it seems. For some (e.g. BA) the problem is that the Standard Theory in QM (ST) is unbelievably well grounded empirically (making “predictions verified to within one-in-ten-billion chance of error” (p. 1)) and yet there are real questions that ST has no way of addressing (e.g. “Where does gravity come from? Why do matter particles always possess three, ever heavier copies, with peculiar patterns in their masses, and why does the universe contain more matter than antimatter?” (pp. 1-2)). For others, the problem seems to be that physicists have become over-obsessed with (mathematical) beauty and this has resulted in endless navel gazing and a failure to empirically justify the new theory (see here).[2]

I have no idea if this is correct. However, it is not a little amusing to note that it appears that part of the problem is that ST has just been too successful! No matter where we point the damn thing it gives the right empirical answer, up to 10 decimal places. What a pain!

And the other part of the problem is that it appears that the ways that theorists had hoped to do better (super-symmetry, string theory etc.) have not led to novel empirical results. 

In other words, the theory that we have that is very well grounded doesn’t answer questions to which we would love to have answers and the theories that provide potential answers to the questions we are interested in have little empirical backing. This hardly sounds like a crisis to me. Just the normal state of affairs when we have an excellent effective theory and are asking ever more fundamental questions. At any rate, it’s the kind of problem that I would love to see in linguistics.

However, this is not the way things are being perceived. Rather, the perception is that the old methods are running out of steam and that this requires new methods to replace them. I’d like to say a word or two about this.

First off, there seems to be general agreement that the strategy of unification in which “science strives to explain seemingly disparate ‘surface’ phenomena by identifying, theorizing and ultimately proving their shared ‘bedrock’ origin” (BA:2) has been wildly successful heretofore. Or as BA puts it in a more restrained manner, it “has yielded many notable discoveries.” So the problem is not with the reasonable hope that such theorizing might bear fruit but with the fact that the current crop of ambitious attempts at unification have not been similarly fecund. Again as BA puts it: “It looks like the centuries-long quest for top-down unification has stalled…” (BA:2).

Say that this is so. What is the alternative? Gelman, in discussing similar issues, titles a recent post “When does the quest for beauty lead science astray?” (here). The answer should be, IMO, never. It is never a bad idea to look for beautiful theories because beauty is what we attribute to theories that explain (i.e. have explanatory oomph) and as that is what we want from the sciences it can never be misleading to look for beautiful theories. Never.

However, beauty is not the only thing we want from our theories. We want empirical coverage as well. To be glib, that’s part of what makes science different from painting. Theories need empirical grounding in addition to beauty. And sometimes you can increase a theory’s coverage at the expense of its looks and sometimes you can enhance its looks by narrowing its data coverage. All of this is old hat and if correct (and how could it be false really) then at any given time we want both theories that are pretty and also cover a lot of empirical ground. Indeed, a good part of what makes a theory pretty is how it covers the empirical ground. Let me explain.

SH identifies (here) three dimensions of theoretical beauty: Simplicity, Naturalness and Elegance (I have capitalized them here as they are the three Graces of scientific inquiry).[3]

Theories are simple when they can “be derived from a few assumptions.” The fewer axioms the better. Unification (showing that two things that appear different are actually underlyingly the same) is a/the standard way of inducing simplicity. Note that theories with fewer axioms that cover the same empirical ground will necessarily have more intricate theorems to cover this ground. So simpler theories will have greater deductive structure, a feature necessary for explanatory oomph. There is nothing more scientifically satisfying than getting something for nothing (or, more accurately, getting two (better still some high N) for the price of one). As such, it is perfectly reasonable to prize simpler theories and treat them as evident marks of beauty.[4]

Furthermore, when it comes to simplicity it is possible to glimpse what makes it such a virtue in a scientific context. Simpler theories not only have greater deductive structure, they are also better grounded empirically in the sense that the fewer the axioms, the more empirical weight each of them supports. You can give a Bayesian rendition of this truism, but it is intuitively evident.

The second dimension of theoretical beauty is Naturalness. As SH points out, naturalness is an assumption about types of assumptions, not the number. This turns out to be a notion best unpacked in a particular scientific locale. So, for example, one reason Chomsky likes to mention “computational” properties when talking about FL is that Gs are computational systems so computational considerations should seem natural.[5] Breakthroughs come when we are able to import notions from one domain into another and make them natural. That is why a good chunk of Gallistel’s arguments against neural nets and for classical computational architectures amounts to arguing that classical CS notions should be imported into cog-neuro and that we should be looking to see how these primitives fit into our neuro picture of the brain. The argument with the connectionist/neural net types revolves around how natural a fit there is between the digital conception of the brain that comes from CS and the physical conception that we get from neuroscience. So, naturalness is a big deal, but it requires lots of local knowledge to get a grip. Natural needs an index. Or, to put this negatively, natural talk gets very windy unless grounded in a particular problem or domain of inquiry.

The last feature of beauty is Elegance. SH notes that this is the fuzziest of the three. It is closely associated with the “aha-effect” (i.e. explanatory oomph). It lives in the “unexpected” connections a theory can deliver (a classic case being Dirac’s discovery of anti-matter). SH notes that it is also closely connected to a theory’s “rigidity” (what I think is better described as brittleness). Elegant theories are not labile or complaisant. They stand their ground and break when bent. Indeed, it is in virtue of a theory’s rigidity/brittleness that it succeeds in having a rich deductive structure and explanatory oomph. Why so? Because the more brittle/rigid a theory is, the less room it has for accommodating alternatives, and the fewer things a theory makes possible, the more it explains when what it allows as possible is discovered to be actual.

We see this in linguistics all the time. It is an excellent argument in favor of one analysis A over another B that A implies C (i.e. A and not-C are in contradiction) whereas B is merely consistent with C (i.e. B and not-C are consistent). A less rigid B is less explanatory than a more rigid A, and hence A is the superior explanatory account. Note that this reasoning makes sense only if one is looking at a theory’s explanatory features. By assumption, a more labile account can cover the same empirical ground as a more brittle one. The difference is in what they exclude, not what they cover. Sadly, IMO, the lack of appreciation of the difference between ‘covering the data’ and ‘explaining it’ often leads practitioners to favor “flexible” theories, when just the opposite should be the case. This makes sense if one takes the primary virtue of a theory to be regimented description. It makes no sense at all if one’s aims lie with explanation.

SH’s observations make it clear (at least to me) why theoretical beauty is prized and why we should be pursuing theories that have it. However, I think that there is something missing from SH’s account (and, IMO, Gelman’s discussion of it (here)). It doesn’t bind the three theoretical Graces as explicitly to the notion of explanation as it should. I have tried to do this a little in the comments above, but let me say a bit more, or say the same things one more time in perhaps a slightly different way.

Science is more than listing facts. It trucks in explanations. Furthermore, explanations are tied to the why-questions that identify problems that explanations are in service of elucidating. Beauty is a necessary ingredient in any answer to a why-question but what counts as beautiful will heavily depend on what the particular question at issue is. What makes beauty hard to pin down is this problem/why-question relativity. We want simple theories, but not too simple. What is the measure? Well, stories just as complicated as required to answer the why-question at issue. Ditto with natural. Natural in one domain wrt one question might be unnatural in another wrt a different question. And of course the same is the case with brittle. Nobody wants a theory that is so brittle that it is immediately proven false. However, if one’s aim is explanation then beauty will be relevant and what counts as beautiful will be contestable and rightly contested. In fact, most theorizing is argument over how to interpret Simple, Natural and Elegant in the domain of inquiry. That’s what makes it so important to understand the subtleties of the core questions (e.g. in linguistics: Why linguistic creativity? Why linguistic promiscuity? How did FL/UG arise?). At the end of the day (and, IMO, also at sunrise, high noon and sunset) the value of one’s efforts must be judged against how well the core questions of the discipline have been elucidated and their core problems explained. And this is a messy and difficult business. There is no way around this.

Actually, this is wrong. There is a way around this. One can confuse science with technology and replace explanation with data regimentation and coverage. There is a lot of that going around nowadays in the guise of Big Data and Deep Learning (see here and here). Indeed some are calling for a revamped view of what the aim of science ought to be: simulation replacing explanation and embracing the obscurantism of overly complex uninterpretable “explanations.” For these kinds of views, theoretical beauty really is a sideshow.

At any rate, let me end by reiterating the main point: beauty matters, it is always worth pursuing, it is question relative and cannot really ever be overdone. However, there is no guarantee that the questions we most want answered can be, or that the methods we have used successfully till now will continue to be fruitful. It seems that physicists feel that they are in a rut and that much of what they do is not getting traction. I know how they feel. But why expect otherwise? And what alternative is there to looking for beautiful accounts that cover reasonable amounts of data in interesting and enlightening ways? 

[1] This post is by Ben Allanach, a Cambridge University physicist. I will refer to this post as ‘BA.’
[2] Sabine Hossenfelder (SH) has written a new book on this topic, Lost in Math, that is getting a lot of play in the popular science blogosphere lately. Here is a strong recommendation from one of our own, Shravan Vasishth, who believes that “she deserves to be world famous” for her views.
[3] The Three Graces were beauty/youth, mirth and elegance (see here). This is not a bad fit with our three. Learning from the ancients, it strikes me that adding “mirth” to the catalogue of theoretical virtues would be a good idea. We not only want beautiful theories, we should also aim for ones that delight and make one smile. Great theories are not somber. Rather they tickle one’s fancy and even occasionally generate a fat smile, an “ah damn” shake of the head and an overall good feeling.
[4] SH contrasts this discussion of simplicity with Occam’s razor, claiming that the latter is the injunction to choose the simpler of two accounts that cover the same ground. The conception SH endorses is “absolute simplicity,” not a relative one. I frankly do not understand what this means. Unification makes sense as a strategy because it leads to a simpler theory relative to the non-unified one. Maybe what SH is pointing to is the absence of the ceteris paribus clause in the absolute notion of simplicity. If so, then SH is making the point we noted above: simpler theories might be empirically more challenged and in such circumstances Occam offers little guidance. This is correct. There is no magic formula for deciding what to opt for in such circumstances, which is why Occam is such a poor decision rule most of the time.
[5] See his discussion of Subjacency in On Wh Movement, for example.