Comments

Wednesday, July 27, 2016

Scientific Publishing in a Modern World: A Thought Experiment

Norbert and regular readers of this prestigious blog may have seen me participate in some discussions about open access publishing, e.g. in the wake of the Lingua exodus or after Norbert's link to that article purportedly listing a number of arguments in favor of traditional publishers. One thing that I find frustrating about this debate is that pretty much everybody who participates in it frames the issue as how the current publishing model can be reconciled with open access. That is a very limiting perspective, in my opinion, just like every company that has approached free/libre and open source software (aka FLOSS) with the mindset of a proprietary business model has failed in that domain or is currently failing (look at what happened to OpenOffice and MySQL after Oracle took control of the projects). In that spirit, I'd like to conduct a thought experiment: what would academic publishing look like if it didn't have decades of institutional cruft to carry around? Basically, if academic publishing hadn't existed until a few years ago, what kind of system would a bunch of technically-minded academics be hacking away on?

Wednesday, July 20, 2016

Linguistic creativity 2

Here’s part 2. See here for part 1.

L&M identifies two other important properties that were central to the Cartesian view.

First, human linguistic usage is apparently free from stimulus control “either external or internal.” Cartesians thought that animals were not really free, animal behavior being tightly tied to either environmental exigencies (predators, food location) or to internal states (being hungry or horny). The law of effect is a version of this view (here). I am dubious that this is actually true of animals. And, I recall a quip from an experimental psych friend of mine that claimed that the first law of animal behavior is that the animal does whatever it damn well pleases. But, regardless of whether this is so for animals, it is clearly true of humans as manifest in their use of language. And a good thing too, L&M notes. For this freedom from stimulus control is what allows “language to serve as an instrument of thought and self-expression,” as it regularly does in daily life.

L&M notes that Cartesians did not take unboundedness or freedom from stimulus control to “exceed the bounds of mechanical explanation” (12). This brings us to the third feature of linguistic behavior: the coherence and aptness of everyday linguistic behavior. Thus, even though linguistic behavior is not stimulus bound, and hence not tightly causally bound to external or internal stimuli, linguistic behavior is not scattershot either. Rather it displays “appropriateness to the situation.” As L&M notes, it is not clear exactly how to characterize condign linguistic performance, though “there is no doubt that these are meaningful concepts…[as] [w]e can distinguish normal use of language from the ravings of a lunatic or the output of a computer with a random element” (12). This third feature of linguistic creativity, its aptness/fit to the situation without being caused by it was, for Cartesians, the most dramatic expression of linguistic creativity.

Let’s consider these last two properties a little more fully: (i) stimulus-freedom (SF) and (ii) apt fit (AF).

Note first that both kinds of creativity, though expressed in language, are not restricted to linguistic performances. It’s just that normal language use provides everyday manifestations of both features.

Second, the sources of both these aspects of creativity are, so far as I can tell, still entirely mysterious. We have no idea how to “model” either SF or AF in the general case. We can, of course, identify when specific responses are apt and explain why someone said what they did on specific occasions. However, we have no general theory that illuminates the specific instances.[1] More precisely, it’s not that we have poor theories, it’s that we really have no theories at all. The relevant factors remain mysteries, rather than problems in Chomsky’s parlance. L&M makes this point (12-13):

Honesty forces us to admit that we are as far today as Descartes was three centuries ago from understanding just what enables a human to speak in a way that is innovative, free from stimulus control, and also appropriate and coherent.

The intractability of SF and AF serves to highlight the importance of the competence/performance distinction. The study of competence is largely insulated from these mysterious factors. How so? Well, it abstracts away from use and studies capacities, not their exercise. SF and AF are not restricted to linguistic performances and so are unlikely to be intrinsically linked to the human capacity for language. Hence detaching the capacity should not (one hopes) corrupt its study, even if how competence is used for the free expression of thought remains obscure.

The astute reader will notice that Chomsky’s famous review of Skinner’s Verbal Behavior (VB) leaned heavily on the fact of SF. Or more accurately, the review argued that it was impossible to specify the contours of linguistic behavior by tightly linking it to environmental inputs/stimuli or internal states/rewards. Why? Cartesians have an answer: our behavior, verbal behavior included, is both SF and AF, and so the Skinnerian project is hopeless. Hence any approach to language that focuses on behavior and its immediate roots in environmental stimuli and/or rewards is doomed to failure. Theories built on supposing that SF or AF are false will be either vacuous or evidently false. Chomsky’s critique showed how VB embodied the twin horns of this dilemma. Score one for the Cartesians.

One last point and I quit. Chomsky’s expansive discussion of the various dimensions of linguistic creativity may shed light on “Das Chomsky Problem.” This is the puzzle of how, or whether, two of Chomsky’s interests, politics and linguistics, hook up. Chomsky has repeatedly (and IMO, rightly) noted that there is no logical relation between his technical linguistic work and his anarchist political views. Thus, there is no sense in which accepting the competence/performance distinction, or thinking that TGG is required as part of any solution to linguistic creativity, or thinking that there must be a language-dedicated FL to allow for the facts of language acquisition, in any way implies that we should organize societies on democratic bases in which all participants robustly participate, or vice versa. The two issues are logically and conceptually separate.

This said, those parts of linguistic creativity that the Cartesians noted and that remain as mysterious to us today as when they were first observed can ground a certain view of politics. And Chomsky talks about this (L&M:102ff). The Cartesian conception of human nature as creative in the strong Cartesian sense of SF and AF leads naturally to the conclusion that societies that respect these creative impulses are well suited to our nature and that those that repress them leave something to be desired. L&M notes that this creative conception lies at the heart of many Enlightenment and, later, Romantic conceptions of human well-being and the ethics and politics that would support expression of these creative capacities. There is a line of intellectual descent from Descartes through Rousseau to Kant that grounds respect for humans in the capacity for this kind of “freedom.” And Chomsky is clearly attracted to this idea. However, and let me repeat, however, Chomsky has nothing of scientific substance to say about these kinds of creativity, as he himself insists. He does not link his politics to the fact that humans come with the capacity to develop TGGs. As noted, TGGs are at right angles to SF and AF, and competence abstracts away from questions of behavior/performance where SF and AF live. Luckily, there is a lot we can say about capacities independent of considering how these capacities are put to use. And that is one important point of L&M’s extended discussion of the various aspects of linguistic creativity. That said, these three conceptions connect up in Cartesian conceptions of human nature, despite their logical and conceptual independence and so it is not surprising that Chomsky might find all three ideas attractive even if they are relevant for different kinds of projects. Chomsky’s political interests are conceptually separable from his linguistic ones. Surprise, surprise it seems that he can chew gum and walk at the same time!

Ok, that’s it. Too long, again. Take a look at the discussion yourself. It is pretty short and very interesting, not the least reason being how abstracting away from deep issues of abiding interest is often a pre-condition for opening up serious inquiry. Behavior may be what interests us, but given SF and AF it has proven refractory to serious study. Happily, studying the structure of the capacity independent of how it is used has proven to be quite a fertile area of inquiry. It would be a more productive world were these insights in L&M more widely internalized by the cog-neuro-ling communities.


[1] The one area where SFitude might be relevant regards the semantics of lexical items. Chomsky has argued against denotational theories of meaning in part by noting that there is no good sense in which words denote things. He contrasts this with “words” in animal communication systems. As Chomsky has noted, how lexical items work “pose[s] deep mysteries,” something that referential theories do not appreciate. See here for references and discussion.

Wednesday, July 13, 2016

Linguistic creativity 1

Once again, this post got away from me, so I am dividing it into two parts.

As I mentioned in a recent previous post, I have just finished re-reading Language & Mind (L&M) and have been struck, once again, by how relevant much of the discussion is to current concerns. One topic, however, that does not get much play today but is quite well developed in L&M is its discussion of Descartes’ very expansive conception of linguistic creativity and how it relates to the development of the generative program. The discussion is surprisingly complex and I would like to review its main themes here. This will reiterate some points made in earlier posts (here, here) but I hope it also deepens the discussion a bit.

Human linguistic creativity is front and center in L&M as it constitutes the central fact animating Chomsky’s proposal for Transformational Generative Grammar (TGG). The argument is that a TGG competence theory is a necessary part of any account of the obvious fact that humans regularly use language in novel ways. Here’s L&M (11-12):

…the normal use of language is innovative, in the sense that much of what we say in the course of normal use is entirely new, not a repetition of anything that we have heard before and not even similar in pattern – in any useful sense of the terms “similar” and “pattern” – to sentences or discourse that we have heard in the past. This is a truism, but an important one, often overlooked and not infrequently denied in the behaviorist period of linguistics…when it was almost universally claimed that a person’s knowledge of language is representable as a stored set of patterns, overlearned through constant repetition and detailed training, with innovation being at most a matter of “analogy.” The fact surely is, however, that the number of sentences in one’s native language that one will immediately understand with no feeling of difficulty or strangeness is astronomical; and that the number of patterns underlying our normal use of language and corresponding to meaningful and easily comprehensible sentences in our language is orders of magnitude greater than the number of seconds in a lifetime. It is in this sense that the normal use of language is innovative.

There are several points worth highlighting in the above quote. First, note that normal use is “not even similar in pattern” to what we have heard before.[1] In other words, linguistic competence is not an instance of pattern matching or recognition in any interesting sense of “pattern” or “matching.”  Native speaker use extends both to novel sentences and to novel sentence patterns effortlessly. Why is this important?

IMO, one of the pitfalls of much work critical of GG is the assimilation of linguistic competence to a species of pattern matching.[2] The idea is that a set of templates (i.e. in L&M terms: “a stored set of patterns”) combined with a large vocabulary can easily generate a large set of possible sentences, in the sense of templates saturated by lexical items that fit.[3] Note that such templates can be hierarchically organized and so display one of the properties of natural language Gs (i.e. hierarchical structures).[4] Moreover, if the patterns are extractable from a subset of the relevant data then these patterns/templates can be used to project novel sentences. However, what the pattern matching conception of projection misses is that the patterns we find in Gs are not finite in number, and the reason for this is that we can embed patterns within patterns within patterns within…you get the point. We can call the outputs of recursive rules “patterns,” but this is misleading: once one sees that the patterns are endless, Gs are not well conceived of as collections of patterns but as collections of rules that generate patterns. And once one sees this, then the linguistic problem is (i) to describe these rules and their interactions and (ii) to further explain how these rules are acquired (i.e. not how the patterns are acquired).
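To make the template-vs-rule contrast concrete, here is a toy sketch (all names, sentences, and rules below are invented for illustration; nothing is quoted from L&M). A finite template store, however large, fixes the number of available patterns in advance, while a single recursive rule yields a structurally distinct pattern at every embedding depth:

```python
# Toy contrast: a finite template store vs. a recursive rule.
# Everything here is invented for illustration.

# "Pattern matching": a fixed, finite stock of sentence templates.
TEMPLATES = ["{n} {v}", "{n} {v} and {n} {v}"]

def from_templates(nouns, verbs):
    """Fill each stored template with vocabulary items. The output grows
    with the vocabulary, but the number of distinct *patterns* stays
    bounded by len(TEMPLATES) forever."""
    return [t.format(n=n, v=v) for t in TEMPLATES
            for n in nouns for v in verbs]

# A recursive rule: S -> "Mary left" | "John thinks that " + S.
# One rule, a new embedding pattern at every depth: the patterns
# themselves are unbounded, so it is the rule, not its output
# patterns, that a learner would have to acquire.
def embed(depth):
    if depth == 0:
        return "Mary left"
    return "John thinks that " + embed(depth - 1)
```

The point of the sketch: adding more entries to TEMPLATES never changes its finite character, whereas embed produces a new pattern at every depth, which is, in miniature, why the target of acquisition must be rules that generate patterns rather than the patterns themselves.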

The shift in perspective from patterns (and patternings in the data (see note 5)) to generative procedures and the (often very abstract) objects that they manipulate changes what the acquisition problem amounts to. One important implication of this shift of perspective is that scouring strings for patterns in the data (as many statistical learning systems like to do) is a waste of time because these systems are looking for the wrong things (at least in syntax).[5] They are looking for patterns whereas they should be looking for rules. As the output of the “learning” has to be systems of rules, not systems of patterns, and as rules are, at best, implicit in patterns, not explicitly manifest by them, theories that don’t focus on rules are going to be of little linguistic interest.[6]

Let me make this point another way: unboundedness implies novelty, but novelty can exist without unboundedness. The creativity issue relates to the accommodation of novel structures. This can occur even in small finite domains (e.g. loan words in phonology might be an example). Creativity implies projection/induction, which must specify a dimension of generalization along which inputs can be generalized so as to apply to instances beyond the input. This, btw, is universally acknowledged by anyone working on learning. Unboundedness makes projection a no-brainer. However, it also has a second important implication. It requires that the generalizations being made involve recursive rules. The unboundedness we find in syntax cannot be satisfied via pattern matching. It requires a specification of rules that can be repeatedly applied to create novel patterns. Thus, it is important to keep the issue of unboundedness separate from that of projection. What makes the unboundedness of syntax so important is that it requires that we move beyond the pattern-template-categorization conception of cognition.

Dare I add (more accurately, can I resist adding) that pattern matching is the flavor of choice for the Empiricistically (E) inclined. Why? Well, as noted, everyone agrees that induction must allow generalization beyond the input data. Thus even Es endorse this, for they recognize that cognition involves projection beyond the input (i.e. “learning”). The question is the nature of this induction. Es like to think that learning is a function from input to patterns abstracted from the input, the input patterns being perceptually available in their patternings, albeit sometimes noisily.[7] In other words, learning amounts to abstracting a finite set of patterns from the perceptual input and then creating new instances of those patterns by subbing novel atoms (e.g. lexical items) into the abstracted patterns. E research programs amount to finding ways to induce/abstract patterns/templates from the perceptual patternings in the data. The various statistical techniques Es explore are in service of finding these patterns in the (standardly, very noisy) input. Unboundedness implies that this kind of induction is, at best, incomplete. Or, more accurately, the observation that the number of patterns is unbounded implies that learning must involve more than pattern detection/abstraction. In domains where the number of patterns is effectively infinite, learning[8] is a function from inputs to rules that generate patterns, not to patterns themselves. See the link in note 6 for more discussion.

An aside: Most connectionist learners (and deep learners) are pattern matchers and, in light of the above, are simply “learning” the wrong things. No matter how many “patterns” the intermediate layers converge on from the (mega) data they are exposed to they will not settle on enough given that the number of patterns that human native speakers are competent in is effectively unbounded. Unless the intermediate layers acquire rules that can be recursively applied they have not acquired the right kinds of things and thus all of this modeling is irrelevant no matter how much of the data any given model covers.[9]

Another aside: this point was made explicitly in the quote above, but to no avail. As L&M notes critically (11): “it was almost universally claimed that a person’s knowledge of language is representable as a stored set of patterns, overlearned through constant repetition and detailed training.” Add some statistical massaging and a few neural nets and things have not changed much. The name of the inductive game in the E world is to look for perceptually available patterns in the signal, abstract them, and use them to accommodate novelty. The unboundedness of linguistic patterns that L&M highlights implies that this learning strategy won’t suffice for the language case, and this is a very important observation.

Ok, back to L&M.

Second, the quote above notes that there is no useful sense of “analogy” that can get one from the specific patterns one might abstract from the perceptual data to the unbounded number of patterns with which native speakers display competence. In other words, “analogy” is not the secret sauce that gets one from input to rules. So, when you hear someone talk about analogical processes, reach for your favorite anti-BS device. If “analogy” is offered as part of any explanation of an inferential capacity you can be absolutely sure that no account is actually being offered. Simply put, unless the dimensions of analogy are explicitly specified, the story being proffered is nothing but wind (in both the Ecclesiastes and the scatological sense of the term).

Third, the kind of infinity human linguistic creativity displays has a special character: it is a discrete infinity. L&M observes that human language (unlike animal communication systems) does not consist of a “fixed, finite number of linguistic dimensions, each of which is associated with a particular nonlinguistic dimension in such a way that selection of a point along the linguistic dimension determines and signals selection of a point along the associated nonlinguistic dimension” (69). So, for example, a higher pitch or chirp rate signals a greater intention to aggressively defend territory, much as the “readings of a speedometer can be said, with an obvious idealization, to be infinite in variety” (12).

L&M notes that these sorts of systems can be infinite, in the sense of containing “an indefinitely large range of potential signals.” However, in such cases the variation is “continuous” while human linguistic expression exploits “discrete” structures that can be used to “express indefinitely many new thoughts, intentions, feelings, and so on.”  ‘New thoughts’ in the previous quote clearly meaning new kinds of thoughts (e.g. the signals are not all how fast the car is moving). As L&M makes clear, the difference between these two kinds of systems is “not one of “more” or “less,” but rather of an entirely different principle of organization,” one that does not work by “selecting a point along some linguistic dimension that signals a corresponding point along an associate nonlinguistic dimension.” (69-70).

In sum, human linguistic creativity implicates something like a TGG that pairs discrete hierarchical structures relevant to meanings with discrete hierarchical structures relevant to sounds and does so recursively. Anything that doesn’t do at least this is going to be linguistically irrelevant, as it ignores the observable truism that humans are, as a matter of course, capable of using an unbounded number of linguistic expressions effortlessly.[10] Theories that fail to address this obvious fact are not wrong. They are irrelevant.

Is hierarchical recursion all that there is to linguistic creativity? No!! Chomsky makes a point of this in the preface to the enlarged edition of L&M. Linguistic creativity is NOT identical to the “recursive property in generative grammars,” as interesting as such Gs evidently are (L&M: viii). To repeat, recursion is a necessary feature of any account of linguistic creativity, BUT the Cartesian conception of linguistic creativity consists of far more than what even the most explanatorily adequate theory of grammar specifies. What more?



[1] For an excellent discussion of this see Jackendoff’s very nice (though unfortunately (mis)named) Patterns in the mind (here).  It is a first rate debunking of the idea that linguistic minds are pattern matchers.
[2] This is not unique to linguistic cognition. Lots of work in cog sci seems to identify higher cognition with categorization and pattern matching. One of the most important contributions of modern linguistics to cog sci has been to demonstrate that there is much more to cognition than this. In fact, the hard problems have less to do with pattern recognition than with pattern generation via rules of various sorts. See notes 5 and 6 for more offhanded remarks of deep interest.
[3] I suspect that some partisans of Construction Grammar fall victim to the same misapprehension.
[4] Many cog-neuro types confuse hierarchy with recursion. A recent prominent example is in Frankland and Greene’s work on theta roles. See here for some discussion. Suffice it to say that one can have hierarchy without recursion, and recursion without hierarchy, in the derived objects that are generated. What makes linguistic objects distinctive is that they are the products of recursive processes that deliver hierarchically structured objects.
[5] Note that unboundedness implies novelty, but novelty can exist without unboundedness. The creativity issue relates to the easy handling of novel structures. This can occur even in small finite domains. Creativity implies projection, which must specify a dimension of generalization along which inputs can be extended to apply to instances beyond the input. Unboundedness makes projection a no-brainer. It further implies that the generalization involves recursive rules. Unboundedness cannot be captured by pattern matching. It requires a specification of rules that can be repeatedly applied to create novel patterns. Thus, it is important to keep the issue of unboundedness separate from that of projection. What makes the unboundedness of syntax so important is that it requires that we move beyond the pattern-template-categorization conception of cognition.
[6] It is arguable that some rules are more manifest in the data that others are and so are more accessible to inductive procedures. Chomsky makes this distinction in L&M, contrasting surface structures which contains “formal properties that are explicit in the signal” to deep structure and transformations for which there is very little to no such information in the signal (L&M:19). For another discussion of this distinction see (here).
[7] Thus the hope of unearthing phrases via differential intra-phrase versus inter-phrase transition probabilities.
[8] We really should distinguish between ‘learning’ and ‘acquisition.’ We should reserve the first term for the pattern recognition variety and adopt the second for the induction to rules variety. Problems of the second type call for different tools/approaches than those in the first and calling both ‘learning’ merely obscures this fact and confuses matters.
[9] Although this is a sermon for another time, it is important to understand what a good model does: it characterizes the underlying mechanism. Good models model mechanism, not data. Data provides evidence for mechanism, and unless it does so, it is of little scientific interest. Thus, if a model identifies the wrong mechanism, then no matter how apparently successful it is in covering data, it is the wrong model. Period. That’s one of the reasons connectionist models are of little interest, at least when it comes to syntactic matters.
I should add that analogous creativity concerns drive Gallistel’s arguments against connectionist brain models. He notes that many animals display an effectively infinite variety of behaviors in specific domains (caching behavior in birds or dead reckoning in ants) and that these cannot be handled by connectionist devices that simply track the patterns attested. If Gallistel is right (and you know that I think he is) then the failure to appreciate the logic of infinity makes many current models of mind and brain beside the point.
[10] Note that unboundedness implies novelty, but novelty can exist without unboundedness. The creativity issue relates to the easy handling of novel structures. This can occur even in small sets. Creativity implies projection, which must specify a dimension of generalization along which inputs can be extended to apply to instances beyond the input. Unboundedness makes projection a no-brainer. It further implies that the generalization is due to recursive rules that require more than establishing a fixed number of patterns that can be repeatedly filled to create novel instances of those patterns.

Tuesday, July 12, 2016

Informant/consultant data can be very tricky

Talk about data problems! Here is one we should all be aware of. Beware native speakers with an agenda or a sense of humor. Thx to Paul Pietroski for bringing this severe data problem wrt speaker judgments to my attention.

Sunday, July 10, 2016

Some more pieces on scientific publishing

Here are three more short pieces (here, here, here) on the academic publishing landscape. All three relate to publishing in bio-med and so have only a glancing relation to what goes on in linguistics. We are shielded from many of the problems cited by the relative irrelevance of our work for useful products. There is clearly a lot of pressure on research to come to the right conclusion in some fields. So maybe we should consider our lack of funding from certain sources to be a partial blessing.

The last piece is a bit more interesting than the first two in that it tries to find ways of mitigating the pressures. One of the more interesting claims is that blind review did not do much to help promote more objective reviewing. Another interesting idea is to have reviews signed, so that reviewers are responsible for their comments. Of course, I can imagine that there are also downsides to this, especially if the reviewee is not someone that a reviewer would want to mess with for all sorts of personal or professional reasons. At any rate, interesting stuff.

Wednesday, July 6, 2016

GG re-education camp

Every summer I go back to Generative Grammar (GG) re-education camp. I pick up an old classic (or two) and reread it/them to see what I failed to understand when I read it/them last and what nuggets there remain to mine. This year, prompted by a project that I will tell you about soon (in order to pick your collective brains) I re-read Syntactic Structures (SS), Topics (T) and Language and Mind (L&M) (well, I’m in the middle of the last two and have read the first twice). At any rate, several things struck me and they seemed like good blog fodder, so let me share.

Before I got into linguistics (when I was still but a starry eyed philo major (no cracks please, too easy)) I thought that deep structure was, well, deep. After all, why call it deep structure if it was just another level, without particular significance? Wasn’t it, after all, the place where Gs met semantics (at least in both SS and the standard theory), and wasn’t meaning deep?

Moreover, I was not the only one who thought this. The popular press circulating around GG always seemed to zero in on “deep structure,” surface structure being so, well, surfacy. Like any philosopher, given a choice between plumbing the depths and skimming the surface I was all for going down and deep.

As I grew more sophisticated I came to realize the error of my ways and how terminology had misled me. I would sneer at terminologically naive neophytes who failed to appreciate that “deep” did not mean “fundamental.” I would knowingly intone that deep structure was just another level, of no more intrinsic significance than any other level. I would also glibly point out that meaning was not restricted to deep structure, as the Katz-Postal hypothesis was slowly giving way to interpretive theories of semantic interpretation where surface structure fed some aspects of meaning (Jackendoff 1972 being the seminal text).[1]

And I was wrong. Sophistication be damned, deep structure really was/is deep, even if not in the way that I originally thought. That’s what my summer rereading of the big three above showed me. So, why is deep structure deep in the sense of terrifically significant to the GG enterprise? Here’s why in one phrase: linguistic creativity (LC).

Chomsky noted that the fact of LC was underappreciated. Humans are able to appropriately produce and easily understand a (practically) infinite number of linguistic expressions. Or, as a matter of course, humans produce or parse linguistic expressions they have never before encountered. The capacity to do this requires that they have internalized a system of rules. What kinds? Rules that tightly couple a linguistic expression’s meaning with that linguistic expression’s articulation (sound, gesture). Absent this kind of theory (aka a grammar that generates an infinite number of sound meaning pairings) there is no possible account for this easily observed fact that humans are linguistically creative.

Moreover, as Chomsky argues in SS, L&M and T, the structures required to code for articulation are insufficient to represent core aspects of meaning. Here’s Chomsky in Topics (17):

It is clear…that deep structures must be quite different from this surface structure. For one thing, the surface representation in no way expresses the grammatical relations that are…crucial for semantic interpretation. Secondly, in the case of ambiguous sentences such as, for example, (5), only a single surface structure may be assigned but the deep structures must obviously differ. Such examples …are sufficient to indicate…that deep structures cannot be identified with surface structures. The inability of surface structures to indicate semantically significant grammatical relations (i.e., to serve as deep structures) is one fundamental fact that motivated the development of transformational generative grammar…

Thus any account of LC that wants to account for the human capacity to use an unbounded number of linguistic expressions (i.e. linguistic expressions a given native speaker has never before encountered) must include a system of rules that recursively generates sound-meaning pairings based on different kinds of G-related representations. Given LC, and the fact that meaning structures are different from sound structures, there really is no other logical option than something like a transformational GG.

Before proceeding, I want to make an unpaid political announcement. GG has been regularly accused of dissing meaning. For example, the autonomy of syntax is often misunderstood as the irrelevance of semantics. As you all know, this is completely bogus. The autonomy of syntax thesis is a very, very weak claim. It notes that syntactic properties are not reducible to semantic (or phonetic) ones. It does not claim that meaning (and sound) facts are G irrelevant.

Moreover, Chomsky emphasizes this point in all three works. In both T and L&M he emphasizes that the BIG problem with earlier Structuralism was its inability to accommodate the simplest facts about meaning (in particular what we now call theta roles (who did what to whom)). Thus, how language delivers meaning was at the center of Chomsky’s novel GG proposals and was the central feature of his critique of Structuralism. And this is not something that it takes sophisticated close textual analysis to discover. This leads me to think that many (maybe most) critics of GG’s syntactocentrism simply did not (and have not) read the work being criticized. Not only is this deeply ignorant, but it is intellectually irresponsible as well. Sadly, this kind of ignorant criticism has become a hallmark of the anti-GG literature, something that people like Evans (see here and links provided) and Everett (see here) among others have further personalized. However, what is clear on rereading these classics is that these critiques are not based on even a cursory reading of the relevant texts.

Ok, back to our regularly scheduled programming: LC properly described leads quickly to the modern conception of grammar, one with distinctive levels for the coding of articulatory and semantic information (surface structure (S-S) and deep structure (D-S)) and operations that unite these levels (aka transformations (T)). What made deep structure deep was the realization that LC required it, and once one had D-S and understood it to be structurally distinct from S-S, one needed Ts to relate them, and the whole modern GG enterprise was up and running. Here’s Chomsky in L&M (17):

…the speaker makes infinite use of finite means. His grammar must, then, contain a finite system of rules that generate infinitely many deep and surface structures, appropriately related. It must also contain rules that relate these abstract structures to certain representations of sound and meaning…

Well actually, most (not all) of the modern GG enterprise is motivated by the fact of LC, in particular the project of specifying the properties of particular human Gs and the enterprise of specifying the properties humans must have for acquiring these Gs. The minimalist program adds an extra dimension: the extra question (already mooted in these early works btw) of separating out the linguistically specific factors underlying these two capacities from the more cognitively and computationally general ones that underlie these capacities but are not specifically linguistically dedicated.

So, a recursive G is part of any theory aspiring to address the fact of LC, and given the difference between S-S and D-S, this G will have at least a D-S level, an S-S level, and a T component to relate them. And this brings us to why deep structure was a deep discovery. Critically, structuralism was ready to recognize something like S-S. What structuralism missed was any level analogous to D-S, the level relevant to semantic interpretation. Again Chomsky in L&M (19):

…[M]odern structural and descriptive linguistics … restricts itself to the analysis of what I have called surface structure, to formal properties that are explicit in the signal and to phrases and units that can be determined from the signal by techniques of segmentation and classification…[S]uch taxonomic analysis leaves no place for the deep structures…[which] cannot be derived…by segmentation and classification of segmented units, nor can the transformational operations relating deep and surface structure…

So, what brought down classical structuralism and the Empiricist/Behaviorist psychology that it embraced? Well, the observation that LC required something like D-S. That, in short, is one reason why D-S really is deep.

I should add that the relevance of this line of thinking to G issues has still not been entirely internalized. There is an industry trying to show that phrase structure can be statistically induced from the signal, thinking that were this so the GG enterprise would be fatally wounded (see Elissa Newport’s work on this for example). There is nary a mention of the problem of relating D-Sish facts and S-Sish facts. The idea seems to be that if we could just get hierarchically structured S-Ss from the signal the whole GG project as envisioned by Chomsky over 60 years ago would be discredited as fundamentally empirically flawed. There is little recognition that the problems for structuralism and its attendant empiricist psychology started from the concession that S-S might be amenable to standard analytic (associationist) techniques.[2] The problem was that structuralism left out half the problem, the D-S part. Things, sadly, are no better today in much of the anti-GG literature.

There is a second reason that D-S was considered deep: it pointed to where language was likely to be invariant. Chomsky notes this in L&M discussing the philosophical grammarians (e.g. Port Royal types). He observes that modern conceptions of GG “make the assumption that languages will differ very little despite considerable diversity in superficial realization” (76). Where will languages be “similar”? “[O]nly at the deeper level, the level at which grammatical relations are expressed and at which the processes that provide for the creative aspect of language use are to be found” (77). Thus, D-S and the attendant operations that deliver a corresponding S-S were the natural locus of invariance given the obvious surface diversity of natural languages. In short, the other deep property of D-S was that it and the principles mapping it to S-S were likely to be invariant across Gs, these invariances being key features of UG.

So, deep structure had some important features that arguably made it deep. But I can sense all you minimalists out there developing an uncomfortable intellectual itch that can be characterized roughly as follows: how deep could deep structure be, given that contemporary theories have dispensed with it? Good itch. Let me scratch.

First, we have retained much of the point of deep structure in contemporary theory. So, for example, nobody now thinks that the syntactic structure relevant to surface phonetic form is the same as that required to code for underlying grammatical function/thematic form. Indeed, given the predicate internal subject hypothesis, there is almost no sentence, no matter how simple, in which the underlying semantic subject (the external argument) starts in surface subject position (e.g. Spec T). The structure relevant to phon interpretation is understood as different from the syntax relevant for sem interpretation, and it is taken for granted that any adequate G will have to generate an infinite number of phon-sem pairs. In other words, the moral of D-S has been completely internalized.

So too has the idea that D-S is G invariant. Contemporary syntactic theory does not tolerate variation in the mapping of theta roles to initial phrase marker positions. We are all UTAHers now! Thus, we do not expect Gs just like English but where affected objects are underlying subjects and agents are underlying objects. This is not a dimension of permissible variation. Nor do we expect the mapping principles that deliver CI (and possibly AP) interpretable objects to differ significantly. Operations are constrained by universal principles like phase impenetrability (aka subjacency), the ECP, minimality, etc. When we think of universals in GG, this is the kind of thing we are assuming. GG makes no claim about surface invariances. We expect the overt surface properties of language to vary dramatically. We expect little CI/LF variation and no variation in the principles of UG. Thus, invariance lives in the forms/derivations that feed CI, not in the surface realizations of these derivations. Again, this endorses much of the D-S conceptions outlined in SS, L&M and Topics.

So where's the difference? Modern syntactic theory, minimalism, has largely abandoned the technology of D-S, not its grammatical point. Minimalism no longer assumes that there is a G level like deep structure or D-structure, i.e. a level at which GFs are determined by something like phrase structure rules. This was part of every prior (Chomskyan) GG theory. The rejection of D-S (and its analogues) has been more or less complete.

We have given up the idea that D-S is the product of PS rules which all apply prior to displacement operations. In fact, thoroughly modern minimalists don’t recognize a formal distinction between E-merge and I-merge, both being instances of the same underlying Merge operation. Furthermore, Bare Phrase Structure has eliminated the distinction between structure building and lexical insertion so critical to earlier D-S conceptions. In modern theory there is, strictly speaking, nothing like a PS rule anymore, and so not much left of the idea that the grammatical functions relevant to semantic interpretation are coded via PS rules.
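For what it's worth, the collapse of E-merge and I-merge into a single operation can be sketched set-theoretically, in the spirit of Bare Phrase Structure. The lexical items, labels and helper predicate below are my own illustrative assumptions, not anyone's official formalism:

```python
# A toy, set-based sketch of a single Merge operation: syntactic objects
# are just sets built from lexical items (here, plain strings). "E-merge"
# vs "I-merge" is not a formal distinction: both are the same call,
# differing only in whether the second argument came from inside the first.

def merge(a, b):
    """Merge(a, b) = {a, b}: the sole structure-building operation."""
    return frozenset([a, b])

def contains(obj, part):
    """True if `part` occurs somewhere inside the set-structure `obj`."""
    if obj == part:
        return True
    return isinstance(obj, frozenset) and any(contains(x, part) for x in obj)

# External merge: combine two independent objects.
vp = merge("see", "Mary")       # {see, Mary}
v_bar = merge("John", vp)       # predicate-internal subject: {John, {see, Mary}}

# Internal merge (displacement): re-merge a subpart of the object itself.
assert contains(v_bar, "John")
tp = merge("John", merge("T", v_bar))   # "John" re-merged at the edge

# Same operation both times; only the provenance of an argument differs.
```

The sketch also illustrates the Bare Phrase Structure point in the text: there is no separate PS-rule component and no separate lexical insertion step, just one combinatoric operation applying to lexical items and its own outputs.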

There remains one last residue of the old technical D-S idea. Some (e.g. Chomsky and PRO lovers everywhere) still hold onto the view that the logical GF roles are products of E-merge exclusively. Others (e.g. moi) do not restrict GF marking to E-merge, but allow marking via I-merge as well. This is really the last place where the technical notion of D-S has life. I, of course, believe that I am right and Chomsky is wrong here (though I would not bet on myself, at least not a lot). Beyond this, the technical conception of D-S as a level seems largely gone, though the empirical and conceptual points D-S served have been completely internalized.

Last point: here’s something else that struck me in rereading this literature: why is it that S-S and D-S don’t perfectly match? One can imagine a world in which the two had to coincide. In such a world there would be two articulations of flying planes can be dangerous (one corresponding to each interpretation) and passives (where surface and underlying grammatical relations do not coincide) would not exist. This is a perfectly conceivable universe, but it is not ours. Why not? Why is D-S distinct from S-S? Why don’t they match one to one? Might the mere fact that these two kinds of information are differentially encoded support Chomsky’s recent suggestions that the mapping to articulation is a late accretion and that the primary mapping is from something like D-S to CI? I don’t know, but it is curious that our world is not more neatly arranged. And that it is not should be something we think about, and maybe one day address.
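The flying planes case can be laid out explicitly: one articulated string, two distinct underlying structures, each feeding a different interpretation. The bracketings and semantic glosses below are hand-built illustrations, not parser output:

```python
# One surface string, two deep structures. The bracketings and the
# interpretive glosses are illustrative hand-built stand-ins.

surface = "flying planes can be dangerous"

deep_structures = [
    # "flying" as a gerund: the activity of flying planes is dangerous
    ("[ [ flying planes ]_VP-gerund can be dangerous ]",
     "DANGEROUS(fly(x, planes))"),
    # "flying" as a participle modifying "planes": such planes are dangerous
    ("[ [ flying planes ]_NP can be dangerous ]",
     "DANGEROUS(planes_that_fly)"),
]

# If surface structure exhausted grammatical information, one surface
# representation would somehow have to carry both readings.
for ds, gloss in deep_structures:
    print(f"{surface!r} <= {ds} : {gloss}")
```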

That’s it for now. The discovery of D-S launched the modern GG enterprise. The existence of D-Sish facts and what they mean for GG are now part of the common wisdom. It is fair to say that D-S focused scientific attention on LC and Plato’s problem. If that ain’t “deep” I don’t know what could be.



[1] A great book btw, one that I would still recommend highly.
[2] Btw, this is almost certainly false once one starts thinking about how to “abstract” out categories that allow for recursion. It is one thing to define the VP in John saw the dog via these simple techniques and another to define the VPs in John saw the dog that Bill thinks that Mary kissed using them. Once we consider categories with recursive subparts the standard analytic techniques quickly fail. Simple phrase structure might be statistically coaxed from surface forms. Interesting ones with complex structure will not be.

Thursday, June 30, 2016

Two quick reads

Here are two quick pieces:

The first is on an unexpected consequence of making academic life more family friendly. Here are the facts: women get pregnant. Men don't. Moreover, child bearing comes at an awkward time in the academic life cycle (i.e. right before tenure decision time). US universities have accommodated this by freezing tenure clocks for those choosing to start families, in effect allowing pregnancy to lengthen the tenure clock. The NYT piece reports on a study looking at how this worked out. The finding is that it favored men. The reason is that the leave rule was applied equally to men and women in a family, allowing both to use the lengthening provision. Men were able to use this extra time more effectively than women to burnish their research records. The result: men gained disproportionately from the new liberal leave rule. The piece also discusses ways of "fixing" this, but all in all, things get complicated.

Here's my hunch: the problem arises because of how we insist on evaluating research. There is a kind of assumption that the bigger a CV the better. Line items matter too much. More is better. This really does handicap those who hit a dry patch, and given the biology of families, this means that on average women will have a tougher time of it than men if this is the criterion. We need a rethink here. Simple things like quality not quantity might help. But I suspect a better measure will arise if we shift from maximizing to satisficing: what do we consider a good/reasonable publication record? In fact, might too much publication be as bad as too little? What marks a contribution and what is just academic paper churning?

The second piece is by Frans de Waal on whether animals think. There is a line of thought (I have heard it expressed by colleagues) that denies that animals think because it identifies thinking with having linguistic capacity, and animals don't have such. Hence they cannot think. This, btw, is a standard Cartesian trope as well; animals are machines bereft of res cogitans. De Waal begs to differ, as indeed does Jerry Fodor, who notes (quite rightly IMO) the following in LOT (1975):
‘The obvious (and I should have thought sufficient) refutation of the claim that natural languages are the medium of thought is that there are non-verbal organisms that think.’

Not only do animals think, they do so systematically. Of course, having linguistic capacity changes how you think. But then so does picking up a new word for something that you had no explicit word for. So, language affects thought, but being without language does not entail being thoughtless.

But this is not what I wanted to highlight in this piece. De Waal, one of the most important animal cognition people in the world, notes the obvious here concerning human linguistic capacity and its discontinuity with what we find in animals:

You won’t often hear me say something like this, but I consider humans the only linguistic species. We honestly have no evidence for symbolic communication, equally rich and multifunctional as ours, outside our species. 
In other words, nothing does language like we do. There is no qualitative analogue to human linguistic capacity in the rest of nature. Period.

De Waal, however, makes a second important observation. Despite this unique human talent, there are "pieces" of it in other parts of animal cognition.

But as with so many larger human phenomena, once we break it down into smaller pieces, some of these pieces can be found elsewhere. It is a procedure I have applied myself in my popular books about primate politics, culture, even morality. Critical pieces such as power alliances (politics) and the spreading of habits (culture), as well as empathy and fairness (morality), are detectable outside our species. The same holds for capacities underlying language.
There is a version of this observation that points to something like the Minimalist Program as an important project: find out which pieces are special to us and allow for our linguistic capacities, and which we share with other animals. Of course, the suggestion is that there will be pieces (or, if we are lucky, just one piece) that are special to us and that allow us to do linguistically what nothing else can. At any rate, De Waal is right: if one identifies the capacity for thought with the capacity for language then animals had better have (at least) rudimentary language. Of course, if we don't identify the two, as De Waal and Fodor urge, then there is nothing biologically untoward about one species of primate having a capacity unique among animals.