Here is a piece (here)
by Geoff Pullum (GP) celebrating the 50th anniversary of Aspects. This is a nice little post. GP
has a second post, mentioned in the one linked to above, on the
competence/performance distinction. I’ll put up a companion piece to this
second post anon. Here are a few comments on GP’s Aspects post.
Here GP gives a summary that feels right to me (i.e. my
recollections match GP’s) about the impact that Aspects had on those who first read it. Reading chapter 1 felt
revelatory, like a whole new world
opening up. The links that it forged between broad issues in philosophy (I was
an undergrad in philosophy when I first read it) and central questions in
cognition and computation were electrifying. Everyone in cognition broadly
construed (and I do mean everyone: CSers, psychologists, philosophers) read Aspects and believed that they had to read it. Part of this may have
been due to some terminological choices that Chomsky came to regret (or so I
believe). For example, replacing the notion “kernel sentence” with the notion
“deep structure” led people to think, as GP put it:
Linguistics isn’t a matter of classifying
parts of sentences anymore; it was about discovering something deep, surprising
and hidden.
But this was not the reason for its impact. The reason Aspects was a go-to text was that
chapter 1 was (and still is) a seminal document of the Cognitive Revolution and
the study of mind. It is still the best single place to look if one is
interested in how the study of language can reveal surprising, non-trivial
features about human minds. So perhaps there is something right about the deep in Deep Structure. Here’s what I
mean.
I believe that Chomsky was in a way correct in his choice of
nomenclature. Though Deep Structure itself was/is not particularly “deep,”
understanding the aim of syntax as that which maps phrase markers that represent
meaning-ish information (roughly thematic information, which, recall, was coded
at Deep Structure)[1]
onto structures that feed phonetic expression is deep. Why? Because such a mapping is not evident on the surface and it
involves rules and abstract structure with their own distinctive
properties. Aspects clarifies what is more implicit in Syntactic Structures (and LSLT, which was not then widely
available); namely that syntax manipulates abstract structures (phrase
markers). In particular, in contrast to Harris, who understood Transformations
as mapping sentences (actually items in a corpus (viz. utterances)) to
sentences, Aspects makes clear this
is not the right way to understand transformations or Gs. The latter map phrase
markers to other phrase markers and eventually to representations of sound and
meaning. They may capture relations between sentences, but only very indirectly. And
this is a very big difference in the conception of what a G is and what a
transformation is, and it all arises in virtue of specifying what a Deep
Structure is. In particular, whereas utterances are plausibly observable, the
rules that do the mappings that Chomsky envisaged are not. Thus, what Aspects did was pronounce that the first
object of linguistic study is not
what you see and hear but the rules,
the Gs that mediate two “observables”:
what a sentence means and how it is pronounced. This was a really big deal, and
it remains a big deal (once again, reflect on the difference between Greenberg
and Chomsky Universals). As GP said above, Deep Structure moves us from
meditating on sentences (actually, utterances or items in corpora) to thinking
about G mappings.
Once one thinks of things in this way, then the rest of the
GG program follows pretty quickly: What properties do Gs have in common? How
are Gs acquired on the basis of the slim evidence available to the child? How
are Gs used in linguistic behavior? How did the capacity to form Gs arise in
the species? What must G-capable brains be like to house Gs and FL/UGs? In
other words, once Gs become the focus of investigation, then the rest of the GG
program comes quickly into focus. IMO, it is impossible to understand the
Generative Program without understanding chapter 1 of Aspects and how it reorients attention to Gs and away from, as GP
put it, “classifying parts of sentences.”
GP also points out that many of the details that Aspects laid out have been replaced with
other ideas and technology. There is more than a little truth to this. Most
importantly, in retrospect, Aspects
technology has been replaced by technicalia more reminiscent of the Syntactic Structures (SS)-LSLT era. Most
particularly, we (i.e. minimalists) have abandoned Deep Structure as a level. How
so?
Deep Structure in Aspects
is the locus of G recursion (via PS rules) and the locus of interface with
the thematic system. Transformations did not create larger phrase markers, but
mapped these Deep Structure PMs into others of roughly equal depth and length.[2]
In more contemporary minimalist theories, we have returned to the earlier idea that
recursion is not restricted to one level (the base), but is a function of the
rules that work both to form phrases (as PS rules did in forming Deep Structure
PMs) and transform them (e.g. as movement operations did in Aspects). Indeed, Minimalism has gone
one step further. The contemporary conceit denies that there is a fundamental
distinction between G operations that form constituents/units and those that displace
expressions from one position in a PM to another (i.e. the distinction between
PS rules and Transformations). That’s the big idea behind Chomsky’s modern
conception of Merge, and it is importantly different from every earlier conception of G within Generative Grammar. Just as
LGB removed constructions as central Gish objects, minimalism removed the PS/Transformation
rule distinction as a fundamental grammatical difference. In a merge based
theory there is only one recursive rule and both its instances (viz. E and I
merge) build bigger and bigger structures.[3]
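To make this concrete, here is a minimal sketch (in Python; a toy illustration of my own, not anyone's official formalism) of Merge as a single recursive operation, with External and Internal Merge as two applications of the very same rule, both of which build bigger structures:

```python
# A toy rendering of Merge: syntactic objects are nested frozensets of atoms,
# and one operation does all the structure building. Illustration only.

def merge(a, b):
    """Merge(a, b) = {a, b}: combine two syntactic objects into a larger one."""
    return frozenset({a, b})

def contains(obj, part):
    """True if `part` occurs as a term somewhere inside `obj`."""
    if obj == part:
        return True
    return isinstance(obj, frozenset) and any(contains(x, part) for x in obj)

# External Merge: the two inputs are independent objects.
vp = merge("eat", "apples")         # {eat, apples}
tp = merge("will", vp)              # {will, {eat, apples}}

# Internal Merge ("movement") is the same operation applied to an object and
# one of its own parts; no second rule type is needed for displacement.
assert contains(tp, vp)
displaced = merge(vp, tp)           # {{eat, apples}, {will, {eat, apples}}}

# Both applications enlarge the structure; nothing distinguishes a
# phrase-building step from a displacing step except what the inputs are.
```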
Crucially (see note 3), this conception of structure
building also effectively eliminates lexical insertion as a distinct G
operation, one, incidentally, that absorbed quite a bit of ink in Aspects. However, it appears to me that
this latter operation may be making a comeback. To the degree that I understand
it, the DM idea that there is late lexical insertion
comes close to revitalizing this central Aspects
operation. In particular, on the DM conception, it looks like Merge is understood
to create grammatical slots into which contents are later inserted. This
distinction between an atom and the slot that it fills is foreign to the
original Merge idea. However, I may be wrong about this, and if so, please let
me know. But if I am right, it is a partial return to ideas central to the Aspects inventory of G operations.[4]
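For contrast, here is an equally informal sketch of the slot-then-fill picture just alluded to, in which structure building first creates labeled positions and a later insertion step supplies their content; the function names and the two-step pipeline are invented for illustration and are not DM's (or Aspects') actual machinery:

```python
# A toy rendering of slot-creation plus late insertion. Illustration only.

def build_slot(label, *daughters):
    """Create a node with a label, daughter slots, and as-yet-empty content."""
    return {"label": label, "daughters": list(daughters), "content": None}

def insert(slot, item):
    """Late insertion: fill an already-built position with lexical content."""
    slot["content"] = item
    return slot

# First build the skeleton of positions...
vp = build_slot("VP", build_slot("V"), build_slot("DP"))

# ...then, as a separate and later operation, insert content into the terminals.
insert(vp["daughters"][0], "eat")
insert(vp["daughters"][1], "apples")

# Here the position and the item that fills it are distinct objects, which is
# precisely the distinction the bare Merge sketch above never draws.
```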
In addition, in most contemporary theories, there are two other
lasting residues of the Aspects conception
of Deep Structure. First, Deep Structure in Aspects
is the level where thematic information meets the G. This relation is
established exclusively by PS rules. This
idea is still widely adopted and travels under the guise of the assumption that
only E-Merge can discharge thematic information (related to the duality of interpretation
assumption). This assumption regarding a “residue” of Deep Structure is the point of contention among those who debate whether movement into theta positions is possible (e.g. I buy it, Chomsky doesn’t).[5]
Thus, in one sense, despite the “elimination” of DS as a central minimalist
trope, there remains a significant residue that distinguishes those operations
that establish theta structure in the grammar and those that transform these structures to effect the long-distance displacements that are linguistically ubiquitous.[6]
Second, all agree that theta domains are the smallest (i.e.
most deeply embedded) G domains. Thus, an expression discharges its thematic
obligations before it does anything
else (e.g. case, agreement, criterial checking etc.). This again reflects the Aspects idea that Deep Structures are
inputs to the transformational component. This assumption is still with us, despite the “elimination” of Deep Structure. We (and here I mean everyone,
regardless of whether you believe that movement to a theta position is licit)
still assume that a DP E-merges into a theta position before it I-merges
anywhere else, and this has the consequence that the deepest layer of the
grammatical onion is the theta domain. So far as I know, this assumption is
axiomatic. In fact, why exactly sentences are organized so that the theta domain is embedded in the case/agreement domain, which is in turn embedded in the A’-domain, is entirely unknown.[7]
In short, Deep Structure, or at least some shadowy residue,
is still with us, though in a slightly different technical form. We have
abandoned the view that all thematic
information is discharged before any
transformation can apply. But we have retained the idea that for any given
“argument” its thematic information is discharged before any transformation
applies to it, and most have further retained
the assumption that movement into theta positions is illicit. This is pure Deep
Structure.
Let me end by echoing GP’s main point. Aspects really is an amazing book, especially chapter 1. I still
find it inspirational and every time I read it I find something new. GP is
right to wonder why there haven’t been countless celebrations of the book. I
would love to say that it’s because its basic insights have been fully absorbed into linguistics and the wider cognitive study of language. They haven’t. It’s
still, sadly, a revolutionary book. Read it again and see for yourself.
[1]
Indeed, given the Katz-Postal hypothesis all
semantic information was coded at Deep Structure. As you all know, this idea
was revised in the early 70s with both Deep Structure and Surface Structure
contributing to interpretation. Thematic information was still coded in the first, and scope, binding, and other semantic effects in the second. This led to a rebranding, with Deep
Structure giving way to D-Structure.
This more semantically restricted level was part of every subsequent mainstream
generative G until the more contemporary Minimalist period. And, as you will
see below, it still largely survives in modified form in thoroughly modern
minimalist grammars.
[2]
“Roughly” because there were pruning rules that made PMs smaller, but none that
made them appreciably bigger.
[3]
In earlier theories PS rules built structures that lexical insertion and
movement operations filled. The critical feature of Merge that makes all its
particular applications structure building operations is the elimination of the
distinction between an expression and the slot it occupies. Merge does not
first form a slot and then fill it. Rather expressions are combined directly
without rules that first form positions into which they are inserted.
[4]
Interestingly, this makes “traces” as understood within GB undefinable, rendering both the notion of trace and that of PRO unavailable in a Merge-based theory. As the rabbis of yore were fond of mumbling: Those who
understand will understand.
[5]
Why only E-merge can do so despite
the unification of E and I merge is one of the wedges people like me use to
conceptually motivate the movement theory of control/anaphora. Put another way,
it is only via (sometimes roundabout) stipulation that a minimalist G sans Deep
Structure can restrict thematic discharge to E-merge.
[6]
In other words, contrary to much theoretical advertising, DS has not been entirely eliminated in most theories, though one central feature has been dispensed with.
Interesting that you liken Minimalism to Syntactic Structures rather than Aspects. One could just as well analyze Minimalism as a return to Aspects: Deep Structure is furnished by derivation trees, which can be described in terms of context-free phrase structure grammars, and derivation trees are mapped to phrase structure trees that are only linearly bigger (unless you have overt copying). There is also a difference between Merge and Move in that the mapping brings about major structural changes for the former but not the latter.
I like this perspective because it highlights that we have made a lot of progress in characterizing this mapping. Peters & Ritchie showed that Aspects didn't have a good handle on that, which is why you got the Turing equivalence even with very harsh restrictions on D-Structure. They also pointed out in a follow-up paper (which seems to have been ignored at the time of publication) that bounding the mapping with respect to the size of D-Structure lowers expressivity quite a bit. What you get is a non-standard class that generates
1) all context-free languages,
2) some but not all context-sensitive languages,
3) some (properly) recursively enumerable languages.
In hindsight, we can recognize this as a first rough approximation of the mildly context-sensitive languages. So Aspects was on the right track, but the mapping was still too powerful --- and also too complicated, which is why the Peters and Ritchie proofs are pretty convoluted. Unifying all transformations in terms of a few strongly restricted movement operations (and doing away with deletion) has really cleared things up.
Just to be clear, I'm not saying that your characterization is less adequate. Rather, this is a nice demonstration that one and the same piece of technical machinery can be conceptualized in various ways to highlight different aspects.
@Thomas Graf What's the reference for the follow-up P&R paper?
@INCOLLECTION{PetersRitchie73a,
author = {Peters, Stanley and Ritchie, Robert W.},
title = {Non-Filtering and Local-Filtering Transformational Grammars},
year = {1973},
editor = {Hintikka, Jaakko and Moravcsik, J.M.E. and Suppes, Patrick},
booktitle = {Approaches to Natural Language},
publisher = {Reidel},
address = {Dordrecht},
pages = {180--194}
}
Let me flesh out the technical side a bit. P&R show that a transformational grammar restricted to context-free D-structures and local-filtering transformations is rather peculiar with respect to weak generative capacity. The claims I made above are established as follows:
1) The fact that every context-free language can be generated is an immediate consequence of the context-freeness of D-structures.
2) These restricted transformational grammars cannot generate the language a^(2^(2^n)), which is context-sensitive.
3) P&R show that every recursively enumerable language is the intersection of some transformational language with a regular language. But since the next weaker class --- the class of recursive languages --- is closed under intersection with regular languages, the previous result can hold only if transformational grammars generate some non-recursive (and thus recursively enumerable) languages.
One minor correction to what I said in point 2 above: the paper does not show that some properly context-sensitive languages are generated by this formalism. It is in principle possible that expressivity jumps immediately from context-free all the way up to a specific subset of the recursively enumerable languages. That said, I am pretty sure that context-sensitive languages like a^n b^n c^n or a^(2^n) can be generated by the formalism, though I haven't worked out specific transformational grammars for these languages.
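To make the two languages just mentioned concrete, here is a throwaway sketch (Python, illustration only) of membership tests for a^n b^n c^n and a^(2^n); it shows what these context-sensitive but easily decidable languages look like, and says nothing about how a restricted transformational grammar would generate them:

```python
# Membership tests for the two context-sensitive languages mentioned above.
# Purely illustrative; unrelated to any transformational grammar.

def in_anbncn(s):
    """True iff s = a^n b^n c^n for some n >= 0."""
    n = len(s) // 3
    return len(s) == 3 * n and s == "a" * n + "b" * n + "c" * n

def in_a_pow2(s):
    """True iff s is all a's and its length is a power of two, i.e. s = a^(2^k)."""
    return len(s) > 0 and len(s) & (len(s) - 1) == 0 and set(s) <= {"a"}

assert in_anbncn("aabbcc") and not in_anbncn("aabbc")
assert in_a_pow2("aaaa") and not in_a_pow2("aaa")
```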