Monday, February 29, 2016

Hauser reviews "Why only Us"

Though my interest in Darwin's Problem (DP) is deep, my "expertise," such as it is, is restricted to the logic of the argument. The logic is well known: (i) hierarchical recursion is a distinctive hallmark of human I-languages, (ii) there is no evidence that any other animal displays such recursive powers, (iii) what nonlinguistic evidence there is concerning such powers in humans is of rather recent vintage (roughly 100kya), (iv) logically speaking recursion is an all or nothing affair. The conclusion from (i)-(iv) is that something simple occurred roughly 100kya that in combination with the nonlinguistic cognitive and computational powers extant at the time in our ancestors allowed for this human species specific capacity to emerge. That's the logic.

It looks a lot like the logic of PoS arguments in that it starts from a specification of the capacity of interest and argues backwards to the causal mechanisms that could produce it. In other words, just as GGers investigate human linguistic cognition by first describing what it is that has been acquired and inferring from this what the system of acquisition must look like, so too we investigate evolutionary possibilities concerning language by first specifying what it is that has evolved. Sadly, this is not the general method of investigation. Empiricists (both in psychology and evolutionary biology) seem to think that the direction of argument should be reversed: given that we know what learning and evolution are, they conclude that a species specific FL or species specific characteristics cannot grow in human minds nor have evolved there. The arguments they provide are awful, and not for sophisticated reasons. They are awful because they fail to address what we know about human linguistic capacities. They are based on the premiss, in other words, that the facts that GG has discovered over the last 60 years are bogus. As anyone but a flat earther knows, this is wrong…[1]

Discussion at this level is where my expertise ends. However, DP is more than just logically interesting (it has potential empirical ramifications) and there is more to the logic than I outlined above (Merge may not be the only unique capacity our linguistic facility manifests (see the review below)). Berwick and Chomsky's Why Only Us (WOU) goes into these issues, and Marc Hauser has thought about them hard. So what better way to get into them more deeply than to ask Marc to do a blog post on WOU. He graciously agreed. Here it is. 


Berwick & Chomsky’s Why only us (2016):
Challenges to the what, when, and why?

Marc D. Hauser

Why only us  [WOU] is a wonderful, slim, engaging, and clearly written book by Robert Berwick and Noam Chomsky.  From the authors’ perspective, it is a book about language and evolution. And of course it is.  However, I think it is actually about something much bigger.  It is an argument about the evolution of thought itself, with language being not only one form of thought, but a domain that can impact thought itself, in ways that are truly unique in the animal kingdom.  Seen in this light, WOU provides a framework for thinking about the evolution of thought and a challenge to Darwin’s claim that the human mind is only quantitatively different from other animals. Since this is an idea that I have championed (Hauser, 2009), I am of course a bit partial! Let me unpack all of this by working through Berwick and Chomsky’s arguments, especially those where we don’t quite agree. 

One caveat up front:  as I have written before, including with Berwick and Chomsky (Hauser et al., 2014), I am not convinced that the ideas put forward here or in WOU are testable: animal capacities are far too impoverished to shed any comparative light on the evolution of human language, and the hominid fossil record is either silent or too recent to be of interest. My goal here, therefore, is to focus on the fascinating ideas raised in WOU,  leaving to the side how or whether such ideas might be confronted by significant empirical tests. 

One of the essential moves in WOU is to argue that MERGE —the simplest recursive operation — is the bedrock of our capacity for infinite expression by finite means, one that generates hierarchical structure. Because no other animal has MERGE, and because MERGE  is simple and the essence of language, the evolutionary process may well have occurred rapidly, appearing suddenly in only one species: modern humans or Homo sapiens sapiens (Hss).  To accept this argument, you have to accept at least five premises:

1- MERGE is the essence of language
2- No other animal has MERGE
3- No other hominid has MERGE
4- Due to the simplicity of MERGE, it could evolve quickly, perhaps due to mutation
5- Because you either have or don’t have MERGE (there is no demi-MERGE), there is no option for proto-language.
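To make the first and fifth premises concrete, here is a minimal sketch of MERGE as binary set formation, which is how the operation is usually stated in the minimalist literature. The review itself does not spell out a definition, so the encoding below (Python `frozenset`s, the helper `depth`, the example words) is an assumption made purely for illustration.

```python
def merge(a, b):
    """Combine two syntactic objects into the unordered set {a, b}."""
    return frozenset([a, b])

def depth(obj):
    """Levels of embedding: 0 for a lexical item, 1 + deepest member otherwise."""
    if isinstance(obj, frozenset):
        return 1 + max(depth(member) for member in obj)
    return 0

# Because the output of merge is a legal input to merge, hierarchy comes for free.
np = merge("the", "book")
vp = merge("read", np)
clause = merge("she", vp)
```

Nothing in `merge` limits how often it can reapply to its own output, which is the sense in which there is no halfway version of it: either the output can feed the operation again or it cannot.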

I accept 2 because the comparative literature shows nothing remotely like MERGE.  Whether one looks at data from natural communication, artificial language learning experiments, or animal training studies with human language or language-like tokens, there is simply no evidence of anything remotely recursive.  As Berwick and Chomsky note, the closest one gets is the combinatoric gymnastics observed in birdsong, but these are neither recursive nor do they generate hierarchical structures that shape or generate the variety of meaningful expressions observed in all human languages. 

I also accept 3, though here we don’t really have the evidence to say one way or the other, and even if we did, and it turned out that say Neanderthals had MERGE, it wouldn’t really make much of a difference to the argument.  That is, the fossil record for Neanderthal, though richer than we once thought, says nothing about recursive operations, and nor for that matter does the fossil record for Hss. Both records show interesting signs of creative thought — a topic to which I return — but nothing that would indicate recursive thought or expression.  If evidence emerges that Neanderthals had MERGE, that would simply push back the date of origin for Berwick and Chomsky’s evolutionary account, without changing the core details.    

Let’s turn to 1, 4 and 5 then.  What is interesting about the core argument in WOU is that although Berwick and Chomsky place significant emphasis on MERGE, they fully acknowledge that the recursive machinery must interface with the Conceptual-Intensional system on the one hand, and with the Sensory-Motor system on the other.  However, once one acknowledges the non-trivial roles of CI, SM, and the interfaces, while also recognizing the unique properties of each of these systems, it is no longer possible to accept premise 4, and challenges arise for premise 5.  This analysis lays open the door to some fascinating possibilities, many of which might be explored empirically. I consider a few next.

Berwick and Chomsky devote some of the early material of WOU to review work on vocal imitation in songbirds, including comparative genetic and neurobiological data.  In some ways, the songbird system is a lovely example because the work is exquisitely detailed and shows some nice parallels with our own.  In particular, songbirds learn their song in some of the same ways as young children learn language, including evidence of an innate system that constrains both the timing and material acquired.  However, there are elements of the songbird system that are strikingly different from our own, not mentioned in WOU, but when acknowledged, tell an even more interesting tale about the evolution of Hss — one that is at the same time supportive of the uniqueness claims in WOU while also raising questions about the nature of the uniqueness claim.  Specifically, the songbird system is a striking example of extreme modularity.  The capacity of a songbird to imitate or learn its species-specific song is not a capacity that extends to other calls in its vocal repertoire, nor to any visual display. That is, a songbird can imitate the song material it hears, but nothing else.  Not so for our species, where the capacity to imitate is amodal, or at least bimodal, with sounds and actions copied readily, and from birth. This disconnect from sensory modality is a trademark of human thought, and of course, is a critical feature of our language faculty:  at virtually all levels of detail, including syntax, semantics, phonology, acquisition, and pragmatics, there are no differences between signed and spoken languages. No other animal is like this.  Whether we observe songbirds, dolphins, or non-human primates, an individual born deaf does not emerge with a comparably expressive visual system of communication.  The systems of communicative expression are intimately tied to the modality, such that if one modality is damaged, other modalities are incapable of picking up the tab.  
The fact that our language, and even more broadly, our thoughts, are detached from modality, suggests a fundamental reorganization in our representations and computations.  This takes us to CI, SM, MERGE and the interfaces.

Given the modularity of the songbird system, and the lack of imitative capacities in non-human primates, we also need an account of how a motor system capable of imitating sounds and actions evolved.  This is an account of how SM evolved, but also, about how and when SM interfaced with CI and MERGE.  There is virtually no evidence on offer, and it is hard to imagine what kind of evidence could emerge. For example, the suggestion that Neanderthals had a hyoid bone like Hss is interesting, but doesn’t tell us what they were doing with it, whether it was capable of being deployed in vocal imitation, and thus, of building up the lexicon.  And of course, we don’t know whether or how it was connected to CI or MERGE.  But whatever we discover about this account, it showcases the importance of understanding the evolution of at least one unique property of SM.

When we turn to CI, and in particular, lexical or conceptual atoms, we know extremely little about them, even in fully linguistic human adults.  Needless to say, this makes comparative and developmental work difficult.  But one observation seems fairly uncontroversial: many of our concepts are completely detached from sensory experiences, and thus can’t be defined by them. If we take this as a starting point, we can ask: do animals have anything remotely like this?  On one reading of Randy Gallistel’s elegant work, the answer is “Yes.”  All of the empirical work on number, time and space in animals suggests that such concepts are either not linked to or defined by a particular modality, or minimally, can be expressed in multiple modalities.  Similarly, there is evidence that animals are capable of representing some sense of identity or sameness that is not tied to a modality.  If this is right, and even if these concepts are not as abstract as ours, they suggest a potential comparative approach that, at this point, seems closed off for our recursive capacity.   Having a comparative evolutionary landscape of inquiry not only aids in our analyses, it also raises a challenge to premises 4 and 5, as well as to Richard Lewontin’s comment (supported by Berwick and Chomsky) that we can’t study or understand the evolution of cognition.  Let me take a small detour to describe a gorgeous series of studies on the evolution of cognition to show what can and has been done, and then return to premises 4 and 5.

In most monogamous species, the male and female share the same home range or territory.  In polygynous species, in contrast, there are several females associated with one male, and thus, the male’s home range area encompasses all of the smaller female home ranges.  Based on this observation, Steve Gaulin and his colleagues (Gaulin & Wartell, 1990; Jacobs, Gaulin, Sherry, & Hoffman, 1990; Puts, Gaulin, & Breedlove, 2007) predicted that the spatial abilities of a monogamous vole would show no sex differences, whereas males would show greater abilities than females in a closely related polygynous vole species. Using a maze running task to test for spatial capacity, results provided strong support for the prediction.  Further, the size of the hippocampus — an area of the brain known to play an important role in spatial navigation — was significantly larger in males of the polygynous species when contrasted with females, whereas no sex differences were found for the monogamous species. This, and several other examples, reveal how one can in fact study the evolution of cognition. Lewontin is, I believe, flatly wrong.

Back to premises 4 and 5. If nonhuman animals have abstract, amodal concepts, as some authors suggest, then we have a significant line of empirical inquiry into the evolution of this system.  If our concepts are unique, as authors such as Berwick and Chomsky believe, then there may not be that many empirical options. Perhaps Neanderthals have such concepts, perhaps not. Either way, the evolutionary timescale is short, and the evidence thus far, relatively thin.  On either account, however, there is the pressing need to understand the nature of such concepts as they bear on what I believe is the most interesting side effect of this discussion, and the issues raised in WOU.  In brief, if one concedes that what is unique about language, and thus, its evolutionary history, is MERGE, CI, SM and the interfaces, then a different issue emerges:  are these four ingredients unique to language or part of all aspects of human thought?  Said differently, perhaps WOU is really an account of how our uniquely human system of thought evolved, with language being only one domain in terms of its internal and external systems of expression. Berwick and Chomsky often refer to our Language of Thought as the core of language, and to what is our most dominant use of language: internal thought.  On this view, externalization of this system in expressed language is not at the core of the evolutionary account.  On the one hand, I agree. On the other hand, I think the use of the term Language of Thought or LOT has confused the issue because of the multiple uses of the word “language.” If the essence of the argument in WOU is about the computations and representations of thought, with linguistic thought being one flavor, then I would suggest we call this system the Logic of Thought.  I suggest this substitution of L-words for two reasons.  Language of Thought implies that the system is explicitly linguistic, and I don’t believe it is.  Further, I think Logic of Thought better captures the abstract nature of the ingredients, including the recursive operations, concepts, motor routines, and interfaces.

The Logic of Thought, I would argue, is uniquely human, and underpins not only language, but many other domains as well.  It explains, I believe, why actions that appear similar in other animals are actually not similar at all.  It also provides the ultimate challenge to Darwin’s argument that there is continuity in mental thought between humans and other animals, with differences attributable to quantity as opposed to quality.  In contrast, if the ideas discussed here, and ultimately raised by Berwick and Chomsky are right, then it is the Logic of Thought that is unique to humans.  The Logic of Thought includes all four ingredients: MERGE, CI, SM, and the interfaces. How these components are articulated in different domains is fascinating in its own right, and raises several additional puzzles. For example, if MERGE is the simplest recursive operation, is it one neural mechanism that interfaces with different, domain-specific concepts and actions, or were MERGE-like circuits effectively cloned repeatedly, each subserving a different domain?  The first possibility suggests that damage to this singular MERGE circuit would reveal deficits in multiple domains.  The second option suggests that damage to the MERGE circuit in one domain would only reveal deficits in this domain. To my knowledge, there is no evidence of neuropsychological deficits or imaging studies that point to the nature or distribution of such recursive circuitry.

In sum, WOU is really a terrific book. It is thought provoking and clear.  What more could you want?  My central challenge is that it paints an evolutionary account that can only work if the essence of language is simple, restricted to MERGE.  But language is much more than this.  As such, there has to be more to the evolutionary process.  By raising these issues, I believe Berwick and Chomsky have challenged us to think about another option, one that preserves their title, but focuses on the logic of thought.  Why only us? Much to think about.

Gaulin, S. J., & Wartell, M. S. (1990). Effects of experience and motivation on symmetrical-maze performance in the prairie vole (Microtus ochrogaster). Journal of Comparative Psychology, 104(2), 183–189.
Hauser, M. D. (2009). The possibility of impossible cultures. Nature, 460, 190–196.
Hauser, M. D., Yang, C., Berwick, R. C., Tattersall, I., Ryan, M. J., Watumull, J., et al. (2014). The mystery of language evolution. Frontiers in Psychology, 5(401), 1–12.
Jacobs, L. F., Gaulin, S. J., Sherry, D. F., & Hoffman, G. E. (1990). Evolution of spatial cognition: sex-specific patterns of spatial behavior predict hippocampal size. Proceedings of the National Academy of Sciences, 87(16), 6349–6352.

Puts, D. A., Gaulin, S. J., & Breedlove, S. M. (2007). Sex differences in spatial ability: evolution, hormones and the brain. Evolutionary Cognitive Neuroscience. MIT Press, pp 329-379.

[1] Talking about flat-earthers and limitless ignorance, it is worth comparing Hauser’s review below to one by V. Evans (yes, that V. Evans) here. The best that I can say is that it is consistent with what I have come to expect from Evans’ work, (viz. it is not worse than his other recent output (tant pis)).

Tuesday, February 23, 2016

More on journal quality control and the forward march of science

There is a view that whereas scientists are human, with all the cognitive and moral foibles that entails, Science as an institution is able to rise above these and, in a reasonable amount of time, find the truth. The idea is that though human scientists are partial to their own views, shade things to their advantage, are motivated by ambition, hubris, lucre and worse and might therefore veer from truth in the direction of self advancement and promotion, the institution of Science is self-correcting and the best system we have for finding out what the reality is like (think of the old Upton Sinclair quip: "it's difficult to get a man to understand something when his salary depends upon his not understanding it"). Or, another way of putting this, the moral and cognitive virtues of the ideal scientist are beyond human scientists but not beyond the institution as a whole.

This is a nice story. But, I am skeptical. It relies on an assumption that is quite debatable (and has been debated); that there is a system to scientific inquiry, a scientific method (SM). If there is such, it has resisted adequate description and, from what I can tell, the belief in SM is now considered rather quaint within philosophy, and even among practicing scientists. This is not to deny that there are better and worse arguments in favor of doing things one way or another, and better and worse experiments and better and worse theories and better and worse data. My skepticism does not extend to an endorsement of the view that we cannot rationally compare alternatives. It is to deny that there is a method for deciding this independent of the issues being discussed. There is no global method for evaluating scientific alternatives, though locally the debate can be rationally adjudicated. It is in this sense that I doubt that there is SM. There are many SMs that are at best loosely related to one another and that are tied very closely to the insights a particular field has generated.

If this is so, then one can ask for the source of the conviction that Science must overcome the failures of the individuals that comprise it. One reason is that conversation generally improves things and Science is organized conversation. But, usually the idea goes beyond this. After all, Science is supposed to be more than one big book group. And what is generally pointed to is the quality control that goes into scientific discourse and the self correcting nature of the enterprise. Data gets refined, theories get corrected using experiment. The data dependence and primacy of experiment is often schlepped out when the virtues of SM are being displayed.

There is, of course, some truth to this. However, it seems the structures promoting self-correction might be quite a bit weaker than is often supposed. Andrew Gelman discusses one of these in a recent post (here). He notes that there is a very high cost to the process of correction that is necessary for the nice little story above to be operative. There are large institutional forces against it and individual scientists must bear large costs if they wish to correct these. On the assumption that the virtues of Science supervene on the efforts of scientists, this suggests that the failings of the latter are not so easily filtered out in the former. Or, at the very least, there is little reason to think that they are in a reasonable time.

There is a fear among scientists that if it is ever discovered how haphazard the search for truth actually is, this will discredit the enterprise. The old "demarcation problem" looms as a large PR problem. So, we tell the story of self correcting science, and this story might not be as well scientifically grounded as we might like. Certainly the Gelman post highlights problems, and these are not the only ones we know of (think Ioannidis!). So, is there a problem?

Here's my view: there is no methodological justification of scientific findings. It's not that Science finds truth in virtue of having a good method for doing so, rather some science finds some truth and this supports local methods that allow for the finding of more truths of that kind. Success breeds methodology that promotes more success (i.e. not successful method leading to greater truth). And if this is right, then all of the methodological fussing is beside the point. Interestingly, IMO, you tend to find this kind of methodological navel-gazing in precisely those domains that seem least insightful. As Pat Suppes once put it:
It's a paradox of scientific method that the branches of empirical science that have the least theoretical development have the most sophisticated methods of evaluating evidence.
This may not be that much of a paradox. In areas where we know something, the something we know speaks for itself, and does so eloquently. In areas where we know little, we look to method to cover our ignorance. But method can't do this and insight tolerates sloppiness. Why have we made some scientific progress? I suspect that luck has played a big part. That, and some local hygiene to support the small insights. The rest is largely PR we use to make ourselves feel good, and, of course, to beat those whose views we do not like.

Monday, February 22, 2016

Derived objects and derivation trees

I am pretty slow to understand things that are new to me. So I have a hard time rethinking things in new ways, being very comfortable with what I know, which is all a longwinded way of saying that when I started reading Stabler’s take on Minimalism (and Greg Kobele’s and Thomas Graf’s and Tim Hunter’s) that emphasized the importance of derivation trees (as opposed to derived objects) as the key syntactic “objects” I confess that I did not see what was/is at stake. I am still not sure that I do. However, thinking about this stuff led me to some thoughts on the matter and I thought that I would put them up here so that those better versed on these issues (and also more mentally flexible) could help set me (and maybe a few others) straight.

The way I understand things, derivation trees are ways of representing the derivational history of a sound/meaning pairing. A derivation applies some rules in some sequence and interpreting each step allows for a phonological and a semantic “yield” (a phonetic form and a meaning correlate of this sequence of steps).  Some derivations are licit, some not. This divides the class of derivation trees into those that are G-ok and those that are not. However, and this seems to be an important point, all of this is doable without mentioning derived objects (i.e. classical phrase markers (PM)). Why? Because PMs are redundant. Derivation trees implicitly code all the information that a phrase marker does, as the latter are just the products that applying rules in a particular sequence yields. Or, for every derivation tree it is possible to derive a corresponding PM.
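The redundancy claim can be illustrated with a toy. The sketch below is not Stabler's formalism; the tuple encoding of derivations, the two rules `"merge"` and `"move"`, the trace symbol and all function names are assumptions invented for the example, and movement is restricted to fronting a single word.

```python
# Toy encoding (an assumption for illustration only): a derivation is a lexical
# item (str), ("merge", d1, d2), or ("move", d, word). A PM is a word or a pair.
TRACE = "t"

def extract(pm, word):
    """Replace the first occurrence of `word` in `pm` with a trace."""
    if pm == word:
        return TRACE, True
    if isinstance(pm, tuple):
        left, right = pm
        new_left, found = extract(left, word)
        if found:
            return (new_left, right), True
        new_right, found = extract(right, word)
        return (left, new_right), found
    return pm, False

def derived_object(d):
    """Run the derivation and return the phrase marker it produces."""
    if isinstance(d, str):
        return d
    if d[0] == "merge":
        return (derived_object(d[1]), derived_object(d[2]))
    if d[0] == "move":
        remnant, found = extract(derived_object(d[1]), d[2])
        if not found:
            raise ValueError(f"nothing to move: {d[2]!r}")
        return (d[2], remnant)
    raise ValueError(f"unknown rule: {d[0]!r}")

def yield_from_pm(pm):
    """Read the word string off a phrase marker, skipping traces."""
    if isinstance(pm, str):
        return [] if pm == TRACE else [pm]
    return yield_from_pm(pm[0]) + yield_from_pm(pm[1])

def yield_from_derivation(d):
    """Compute the same word string from the rule sequence, building no PM."""
    if isinstance(d, str):
        return [d]
    if d[0] == "merge":
        return yield_from_derivation(d[1]) + yield_from_derivation(d[2])
    words = yield_from_derivation(d[1])   # otherwise the rule is "move"
    words.remove(d[2])                    # drop the first occurrence of the moved word
    return [d[2]] + words

derivation = ("move", ("merge", "you", ("merge", "saw", "what")), "what")
pm = derived_object(derivation)
```

Here `derived_object` shows that the PM is recoverable from the derivation, and `yield_from_derivation` shows that the sound side of the pairing can be reached without ever constructing it. Whether the same holds gracefully for conditions on derivations is a separate question.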

Putting this another way: the mapping to sound and meaning (which every syntactic story must provide) is not a mapping from phrase markers to interpreted phrase markers but from sequences of rules to sound and meaning pairs (i.e. <s,m>). There is no need to get to the <s,m>s by first going through phrase markers. To achieve a pairing of articulation and semantic interpretations we need not transit through abstract syntactic phrase markers. We can go there directly. Somewhat poetically (think Plato’s cave), we can think of PMs as shadows cast by the real syntactic objects, the derivational sequence (represented as derivation trees), and though pretty to look at shadows (i.e. PMs) are, well, shadowy and not significant. At the very least, they are redundant and so, given Occamite sympathies for the cleanly shaved, best avoided.

This line of argument strikes me as pretty persuasive. Were it the case that derived objects/PMs didn’t do anything, then though they might be visually useful they are not fundamental Gish objects (sort of like linguistic versions of Feynman diagrams). But, I think that the emphasis on derivation trees misses one important function of PMs within syntax and I am not sure that this role is easily or naturally accommodated by an emphasis on derivation trees alone. Let me explain.

A classical view of Gs is that they sequentially map PMs into PMs. Thus, G rules apply to PMs and deliver back PMs. The derivations are generally taken to be Markovian in that the only PM that a G must inspect to proceed to the next rule application is the last one generated. So for the G to licitly get from PMn to PMn+1 it need only inspect the properties of PMn. On this conception, a PM brings information forward, in fact all (and in the best case, no more) of the information you need to know in order to take the next derivational step. On this view, then, derived objects serve to characterize the class of licit derivation trees by making clear what kinds of derivation trees are unkosher. An illustration might help.

So, think of island effects. PMs that have a certain shape prohibit expressions within them from moving to positions outside the island. So an expression E within an island is thereby frozen. How do we code islandhood? Well, some PMs are/contain islands and some do not. If E is within an island at stage PMn then E cannot move out of that island at stage PMn+1. Thus, the derivation tree that represents such movement is illicit. From this perspective, we can think of derived objects as bringing forward information in derivational time, information that restricts the possible licit continuations of the derivation tree. Indeed, PMs do this in such a way that all the information relevant to continuing is contained in the last PM derived (i.e. it supports completely Markovian derivations). This is one of the things (IMO, the most important thing) that PMs (aka derived objects) brought to the table theoretically.
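A rough sketch of what "Markovian" buys here, under heavy simplifying assumptions: PMs are labelled trees, islandhood is a flag stipulated on a node, and movement always targets a position outside every flagged node. The `Node` class and the functions `contains`, `trapped` and `may_front` are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    label: str
    children: tuple        # each child is a word (str) or another Node
    island: bool = False   # stipulated here; a real G would derive it from shape

def contains(pm, word):
    """True if `word` occurs anywhere in `pm`."""
    if isinstance(pm, str):
        return pm == word
    return any(contains(child, word) for child in pm.children)

def trapped(pm, word, in_island=False):
    """True if some occurrence of `word` is dominated by an island node."""
    if isinstance(pm, str):
        return pm == word and in_island
    return any(trapped(child, word, in_island or pm.island) for child in pm.children)

def may_front(pm, word):
    """License the next step by inspecting only the current PM, not its history."""
    return contains(pm, word) and not trapped(pm, word)

plain = Node("VP", ("saw", "what"))
relative = Node("CP", ("who", Node("VP", ("saw", "what"))))
complex_np = Node("NP", ("the", "man", relative), island=True)
blocked = Node("VP", ("met", complex_np))
```

The point of the toy is that `may_front` takes a single argument of type PM. Nothing about how `blocked` was assembled is consulted, only its current shape.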

So the relevant question concerning the syntactic “reality” of PMs/derived objects is whether we can recapture this role of PMs without adverting to them. And the answer should be “yes.” Why? Well, derived objects just are summations of previous derivational steps. They just code prior history. But if this is what they do, then derived objects are, as described above, redundant, and so, in principle eliminable. In other words, we can derive all the right <s,m>s by considering the class of licit derivation trees and we can identify these without peeking at the derived objects that correspond to them.[1]

This line of argument, however, is reminiscent of another one that Stabler (here) critically discusses. He notes that certain kinds of context free grammars (MCFGs) can mimic movement, with the effect that such a CFG can derive all the same <s,m>s that a movement based theory derives, and in effectively the same way. However, he notes that these CFGs are far less compact than the analogous transformational grammars and that this can make an important difference cognitively. Here’s the abstract:

Minimalist grammars (MGs) and multiple context free grammars (MCFGs) are weakly equivalent in the sense that they define the same languages, a large mildly context sensitive class that properly includes context free languages. But in addition, for each MG, there is an MCFG which is strongly equivalent in the sense that it defines the same language with isomorphic derivations. However, the structure building rules of MGs but not MCFGs are defined in a way that generalizes across categories. Consequently, MGs can be exponentially more succinct than their MCFG equivalents, and this difference shows in parsing models too. An incremental, top-down beam parser for MGs is defined here, sound and complete for all MGs, and hence also capable of parsing all MCFG languages. But since the parser represents its grammar transparently, the relative succinctness of MGs is again evident. And although the determinants of MG structure are narrowly and discretely defined, probabilistic influences from a much broader domain can influence even the earliest analytic steps, allowing frequency and context effects to come early and from almost anywhere, as expected in incremental

Can we apply the same logic to the discussion above? Well maybe. Even if the derivation trees contain all the same information that a theory with PMs does, does it make it available in the same way that PMs do? Or, if we all agree that certain information must be “carried forward” (Kobele’s elegant term) derivationally, might it make a cognitive difference how this information is carried forward; implicitly in a derivation tree or explicitly in a PM? Well, here is one place to look: One thing that PMs allow is for derivation to be markovian. Is this a cognitively important feature, analogous to being compact? I can imagine that it might be. I can imagine that Gs being markovian has nice cognitive properties. Of course, this might be false. I just don’t know. At any rate, I have no problem believing that how information is carried forward can make a big computational difference. Consider an analogy.

Think of adding a long column of multi-digit numbers. One useful operation is the “carry” procedure whereby only the last digit of a column is registered and the rest is carried forward to the next column. But is “carrying” really necessary? Is adding all but the last digit to the next column necessary? Surely not, for the numbers carried can be recovered at every column by simply adding everything up again from earlier ones. Nonetheless, I suspect that re-adding again and again has a computational cost that carrying does not. It just makes things easier. Ditto with PMs. Even if the information is recoverable in derivation trees, PMs make accessing this information easy.
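The analogy can be made concrete. This is a minimal sketch assuming non-negative integers and base 10; both function names are made up for the illustration. The first version passes a single carry forward from column to column. The second refuses to carry and instead recovers the same value at each column by re-adding all the lower columns.

```python
def add_with_carry(numbers):
    """Column addition, right to left, passing one carry value forward."""
    width = max(len(str(n)) for n in numbers)
    digits, carry = [], 0
    for col in range(width):
        column_sum = carry + sum((n // 10**col) % 10 for n in numbers)
        digits.append(column_sum % 10)
        carry = column_sum // 10            # all the next column needs to know
    return sum(d * 10**i for i, d in enumerate(digits)) + carry * 10**width

def add_without_carry(numbers):
    """Same answer, but every column recomputes what would have been carried."""
    width = max(len(str(n)) for n in numbers)
    digits = []
    for col in range(width):
        lower_columns = sum(n % 10**col for n in numbers)   # re-add earlier work
        recovered = lower_columns // 10**col
        column_sum = recovered + sum((n // 10**col) % 10 for n in numbers)
        digits.append(column_sum % 10)
    overflow = sum(numbers) // 10**width    # whatever spills past the last column
    return sum(d * 10**i for i, d in enumerate(digits)) + overflow * 10**width

column = [478, 956, 89]
```

Both functions return the same total, so the carry is strictly redundant. But the second one touches every lower column again at each step, so its work grows with the square of the number of columns rather than linearly. That is the sense in which redundant information, carried forward explicitly, can still matter computationally.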

Let me go further. MPs of the kind that Stabler and his students have developed don’t really have much to say about how the kinds of G restrictions on derivations are to be retrieved in derivation trees without explicit mention of the information coded in the structural properties PMs exhibit. The only real case I’ve seen discussed in depth is minimality, and Stabler-like MGs (minimalist grammars) deal with minimality effects by effectively supposing that they never arise (it is never the case in Stabler MGs that a licit derivation allows two objects to have the same accessible checkable features). This rendering of minimality works well enough in the usual cases so that Stabler MG formalizations are good proxies for minimalist theories “in the wild.”  However, not only does this formalization not conform to the intuition that most syntacticians have about what the minimality condition is, it is furthermore easy to imagine that this is not the right way to formalize minimality effects, for there may well be many derivations where more than one expression carries the requisite features in an accessible way (in fact, I’ve heard formalists discussing just this point many times (think multiple interrogation or multiple foci or topics or even case assignment)). This is all to say that the one case where a reasonably general G-condition does get discussed in the MP literature leaves it unclear how MP should/does treat other conditions that do not seem susceptible to the same coding (?) trick. Or, minimality environments are just one of the conditions that PMs make salient. It would be nice to see how other restrictions that we explain by thinking of derivations as Markovian mappings from PMs to PMs are handled without derived objects.[2] Take structure dependence or Island/ECP effects or the A-over-A condition, for example. 
We know what needs doing: we need to say that some kinds of G relations are illicit between some positions in a derivation tree so that some extensions of the derivation tree are G-illicit. Is there a nice compact description of what these conditions are that makes no mention, however inadvertently, of PMs?

That’s it. I have been completely convinced that derivation trees are indispensable. I am convinced that derived objects (aka PMs) are completely recoverable from derivation trees. I am even convinced that one need not transit through PMs to get to the right <s,m> pairs (in fact, I think that thinking of the mapping via PMs that are “handed over” to the interpretive components is a bad way of thinking of what Gs do). But this does not yet imply, I don’t think, that PMs are not important G-like objects. At the very least they describe the kinds of information that we need to use to specify the class of licit derivation trees. Thus, we need an account of how information is brought forward in derivational time in derivation trees and, more importantly, what is not. Derived objects seem very useful in coding the conditions on G-licit derivation tree continuations. And as these are the very heart of modern GG theory (I would say the pride of what GG has discovered) we want to know how these are coded with PMs.

Let me end with one more historical point. Syntactic Structures argues that transformations are necessary to capture evident generalizations in the data. The argument for affix hopping and Aux movement was not that a PSG couldn’t code the facts, but that it did so in such a completely ugly, ad hoc and uninformative way. This was the original example of the utility of compact representations. PMs proved useful in similar ways: inspecting their properties allowed for certain kinds of “nice looking” derivations. The structure of a given PM constrained what next derivational step was possible. That’s what PMs did well (in addition to feeding <s,m>s). Say we agree that PMs are not required (or don’t really add much) in understanding the mapping between sounds and meaning (i.e. in deriving <s,m> pairs). What, then, of the more interesting use to which PMs were put (i.e. stating restrictions on derivations)? Is this second function as easily and insightfully discharged without PMs? I’d love to know.

[1] It is perhaps noteworthy that there is not a clear match between grammaticality and semantic interpretability. Thus, there are many unacceptable sentences that are easily interpreted and, in fact, have only one possible interpretation (e.g. The child seems sleeping, or Which man did you say that left). This, IMO, is an important fact. We don’t want our theories of linguistic meaning to go off the rails if a sentence is ungrammatical, for that would (seem to) imply that it has no licit meaning. Now there are likely ways to get around this, but I find nothing wrong with the idea that an expression can be semantically well formed even if syntactically illicit and I would like a theory to allow this. Thus, we don’t want a theory to fail to yield a licit <s,m> just because there is no licit derivation. Of course, how to do this is an open question.
[2] See here and here for another possible example of how derivation trees handle significant linguistic details that are easy to “see” if one adopts derived objects. The gist is that bounding “makes sense” in a theory where PMs are mapped into PMs in a computationally reasonable way. It is less clear why sticking traces into derived objects (if understood as yields, i.e. ss or ms in <s,m> pairs) makes any sense at all given their interpretive vacuity.

Friday, February 19, 2016

A pair of interesting posts from Andrew Gelman that might provoke

Here is a pair of posts by Andrew Gelman on issues that you might find interesting.[1]

The first (here) is on peer review. It notes the principal virtues and some of the drawbacks of the process. The main virtue, as he sees it, is that it serves as a check on coherence and egregious data mishandlings. The main vice is that (i) peer review mainly enforces the accepted wisdom and (ii) real data problems are too hard for reviewers to find as serious procedural and data vetting is way beyond what it is reasonable to ask a reviewer to do. Thus, a publication in a peer reviewed journal serves largely as a stamp that the paper reflects the views that the journal thinks are reasonable. That is, the paper is a good example of the kinds of papers that the journal publishes. Of course, should the journal’s judgment of what is worthy be off the mark then the fact that something is published therein carries the obvious consequences. The radical conclusion is that peer review is mildly useful but hardly a guarantee that what is being published is likely to be true, roughly on the mark, or even interesting. It reflects the consensus world-view of the journal. That’s it.

I think that Gelman makes an important point here. I have ranted endlessly on how strictly theoretical work in GG is effectively considered journal inadmissible. It is not strictly speaking impossible to get something like this in (though with the exception of Chomsky pieces I don’t think I’ve ever seen such a paper in our leading journals), but for all practical purposes it is. A publishable paper has to have a certain structure, and theoretical investigations rooted in exploring the basic concepts and the ties between them (rather than the empirical consequences of some set of concepts (and this down to actual language specific numbered examples of data in a specific language)) are submitta non grata. Don’t bother. No chance.

Indeed, I would go quite a bit further. In my experience, empirical explorations of ideas that fall on the wrong side of what is considered the right view, even when couched in the same idiom as the standard ones, are treated with great skepticism. I know more than a few examples of papers exploring, for example, the virtues of the Movement Theory of Control that are harshly treated simply because they are exploring the virtues of this theory. In other words, the unstated assumption is that theories that are considered wrong cannot have interesting properties; therefore they should not be investigated and cannot have interesting empirical or conceptual consequences. I believe that this encapsulates the essence of the anti-theoretical worldview. And, from my experience, it is pervasive in our little corner of the sciences.[2]

At any rate, take a look at Gelman’s piece for some (IMO, salutary) push back against the idea that what makes science so reliable and insightful is the peer review process.

The second post (here) is equally interesting. It notes that there are many reasons to collect data, only one of which is to test a theory. The discussion is couched in a distinction between testing and generating theories. It serves as a useful antidote to the idea that looking for generalizations in the data is somehow not being scientific (i.e. chasing significance is evil). It’s not that looking for patterns is bad; rather, generating hypotheses (i.e. looking for patterns) should not be confused with testing hypotheses. Correct. They are not the same. But as Gelman notes, generating hypotheses (especially non-trivial ones) is not easy and needs nurturing. Need I add that this is what theoretical speculation is also for? But, I already ranted about this above, so no more here.

At any rate, two interesting posts which you might enjoy.

[1] Thx to Bill Idsardi for first pointing me to Gelman’s blog.
[2] Part of my physics envy is rooted in my admiration for the tolerance physicists have for thought-experiments and the investigation of models that, though they know them to be empirically inadequate, nonetheless contain interesting features that they hope to incorporate in more adequate accounts down the road. See here for discussion of just one example of this kind of theoretical investigation. The money sentence for current purposes is the following: “The theories involved are generally not viable models of the real world (my emphasis NH), but they have certain features, such as their particle content or high degree of symmetry, which make them useful for solving problems in quantum field theory and quantum gravity.”