Monday, July 21, 2014
What's in a Category? [Part 2]
Last week I wondered about the notion of syntactic category, aka part of speech (POS). My worry is that we have no clear idea what kind of work POS are supposed to do for syntax. We have some criteria for assigning POS to lexical items (LI) --- morphology, distribution, semantics --- but there are no clear-cut rules for how these are weighed against each other. Even worse, we have no idea why these are relevant criteria while plausible candidates such as phonological weight and arity seem to be irrelevant.1 So what we have is an integral part of pretty much every syntactic formalism for which we cannot say
- what exactly it encompasses,
- why it is necessary,
- why it shows certain properties but not others.
Monday, July 14, 2014
What's in a Category? [Part 1]
Norbert's most recent comments on Chomsky's lecture implicitly touched on an issue that I've been pondering ever since I realized how categories can be linked to constraints. Norbert's primary concern is the role of labels and how labeling may drive the syntactic machinery. Here's what caught my attention in his description of these ideas:
In effect, labels are how we create equivalence classes of expressions based on the basic atomic inventory. Another way of saying this is that Labeling maps a "complex" set {a,b} to either a or b, thereby putting it in the equivalence class of 'a' or 'b'. If Labels allow Select to apply to anything in the equivalence class of 'a' (and not just to 'a' alone), we can derive [structured linguistic objects] via Iteration.
Unless I'm vastly misconstruing Norbert's proposal, this is a generalization of the idea of labels as distribution classes. Linguists classify slept and killed Mary as VPs because they are interchangeable in all grammatical sentences of English (a toy sketch of this substitution test follows the questions below). Now in Norbert's case the labels presumably aren't XPs but just lexical items, following standard ideas of Bare Phrase Structure. Let's ignore this complication for now (we'll come back to it later, I promise) and just focus on the issue that causes my panties to require some serious untwisting:
- Are syntactic categories tied to distribution classes in some way?
- If not, what is their contribution to the formalism?
- What does it mean for a lexical item to be, say, a verb rather than a noun?
- And why should we even care?
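Here, as promised, is the toy sketch of the substitution test behind "labels as distribution classes". Everything in it (the sentence set, the contexts, the function names) is invented purely for illustration; real distribution classes quantify over all grammatical sentences of the language, not a four-sentence toy set.

```python
# A made-up toy illustration of "labels as distribution classes": two strings
# belong to the same class if swapping one for the other never changes
# grammaticality across a fixed set of contexts.

GRAMMATICAL = {
    "John slept",
    "John killed Mary",
    "Mary slept",
    "Mary killed Mary",
}

CONTEXTS = ["John {}", "Mary {}"]   # "{}" marks the slot being tested

def same_distribution(x, y):
    """True iff x and y are interchangeable in every context."""
    return all(
        (ctx.format(x) in GRAMMATICAL) == (ctx.format(y) in GRAMMATICAL)
        for ctx in CONTEXTS
    )

print(same_distribution("slept", "killed Mary"))   # True: both pattern as VPs
print(same_distribution("slept", "Mary"))          # False: different classes
```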
Wednesday, July 2, 2014
Comments on lecture 3-part the first
For some mysterious reason, these comments kept growing and
growing. I have decided to split them into three parts for easier handling. It
also has the advantage of allowing me a break over the coming long weekend. Oh yes, here's a link to the lectures again.
People, fasten your seatbelts and put up your tray tables:
in the third lecture Chomsky really puts the pedal to the me(n)tal. The first two lectures saw Chomsky arguing
that the minimal recursive operation (Merge, the single evolutionary “miracle”)
in combination with minimal computation (Phases by way of minimal
search/minimal “memory” and a gift from physics (if we only understood it
better) and so not part of UG) provides Gs that generate an infinite number of
structured linguistic objects (SLO) which manifest displacement,
reconstruction, morphology and cyclicity effects. In lecture 3, Chomsky turns his attention to
projection (i.e. X’ theory), in particular the idea that phrases are headed. He
argues that X’ theory is fundamentally misguided. In particular, Chomsky argues
that within the computational system (CS) units are not headed (i.e. there is
no analogue of the labeled phrase), though the objects transferred to the
sensory-motor (SM) and conceptual-intentional (CI) systems are headed, albeit, so far as I can tell, not headed with the
conventional labels. There are no more CPs or TPs or VPs; rather, there are φ-headed,
WH-headed, Topic-headed, etc. expressions.
In what follows, I will try to present (i) Chomsky’s
arguments against labeled expressions in CS, (ii) his proposed minimal labeling
algorithm (MLA) and (iii) the main “empirical” payoff of the MLA. I put
“empirical” in scare quotes because as becomes clear, and Chomsky emphasizes,
the results deal with highly idealized examples. Getting the suggestions to pan
out in detail is left as an exercise/project for the sympathetic. The main emphasis of lecture 3, like that in
the first two, is on conceptual arguments that maintain that any analysis other
than the one on offer is, at best, a fancy re-description of the
phenomenon of interest. Why? Because Chomsky claims to offer the conceptually
most minimal requirements. If
correct, all other accounts are more complex and so must carry less explanatory
oomph. The methodological assumption is that departures from the conceptual
minimum can only stipulate, not explain. As in previous posts, I will throw my
2 cents in occasionally during the course of exposition.
1. Why Labels/Projections are Problematic
From the perspective of lectures 1 and 2, there is little
surprise, really, that Chomsky is hostile to the idea that labeling/projection
is part of CS. The reason is that if
Merge is the one “miracle” (and by this I just mean the simple adventitious evo
addition that made I-language possible) then labeling (the operation that
underlies projection) is a definite complication. Indeed, as Chomsky notes,
Merge without labels is sufficient to
generate an infinite number of hierarchical LSOs with movement-like
dependencies. There is no need for
labels to do this and so, adding
labels cannot be the conceptually most minimal assumption. Conclusion? No
labels in the CS part of G.
But, Chomsky believes that empirically there is good evidence for labels (though, curiously,
he does not review what this is). So if they don’t arise in CS where do they
come from? Chomsky proposes that labels
are required for LSOs to be legible at the interfaces. In other words, Bare
Output Conditions (BOC) demand labeled LSOs. No labels, no phonetic or CIish interpretations.
It is worth observing that this line of argument follows
quite neatly from Chomsky’s formulation of the SMT. Recall that the SMT is the
view that FL is the optimal realization of interface conditions. What Chomsky
means by this is not that human Gs
match properties of the interface as well as any G can (this is what I used to
think he meant, but I was wrong). Rather it means that FL is what we get from
(the) optimal (i.e. simplest, most conceptually minimal) rules and
computational principles that make the most minimal adjustments to accommodate
the readability conditions of CI (and, secondarily and with a lot more
tinkering, SM). As Merge is the conceptually simplest possible rule, then if
labels exist (and Chomsky assumes they do), they must reflect the demands of
the interfaces. So what are labels? They must be the products of the
conceptually simplest rules that satisfy the legibility requirements of the
interfaces. There is nowhere else for them to come from given Chomsky’s
minimalist conceptual considerations, driven as they are by Occam.
We can take this reasoning a further step: if the interfaces
are where labels are needed, then labeling optimally occurs at the point where
SLOs are transferred to the interfaces (i.e. during Transfer at the phase
level). Thus, labels are not part of the “transformational” component of
G. Merging and CS could care less about
labels. CI and SM care a whole bunch.
A comment: Chomsky likes to note in this lecture that making
things simple regularly generates puzzles. I think he is right, and the point
is an important one. Stipulations resolve problems, but in very uninteresting
ways. So, it is not surprising that conceptual simplification generates
empirical problems. Here’s one that Chomsky did not mention in his lecture:
there is plenty of evidence that Gs manipulate some kinds of phrases and not
others. Thus, English has VP fronting and VP ellipsis; French does not.[1]
Or English overtly moves WHs while Chinese does not. This fact has two sides. First, that English
moves verb phrases, not just verbs.
Indeed, English doesn’t like to move verbs, but finds moving verb phrases just fine. And second that
English seems to care less how big
the verb phrases are. The displaced
element can be one word or arbitrarily large so long as it is a verb phrase. The fact that phrases (Max
Ps) move (rather than X′s, say) is considered a descriptive staple.[2]
Such data make sense if units in CS are phrase-like objects (viz. labeled
constituents). But sans labels (and hence sans notions like XP and X’) well,
let’s just say that why/how we get these kinds of data is a bit of a puzzle.
One that I am sure that Chomsky would welcome and I hereby bequeath to
him.
I should add that I believe that these kinds of facts,
staples of the GG literature since “the earliest days of Generative Grammar,”
are very problematic for an approach like Chomsky’s. To date (for the last
decade at least), we have stopped worrying about such data, relegating their
eventual understanding to unknown principles of pied-piping. Maybe, but it is
worth noting that one of the classical reasons for postulating labeled phrases
and rules that target them are these kinds of constituency tests. And these
tests, at least to me, seem more robust than any of the data that Chomsky cites
in lecture 3 (e.g. EPP and Subject/object asymmetries).[3]
2. The Minimal Labeling Algorithm
Ok, let’s get back to lecture 3.
So where do labels come from? They are the product of the conceptually
minimal labeling algorithm (MLA); a rule that looks for the most “prominent”
expression in a set. So, if Merge
creates objects like {a,b} then at Transfer, MLA applies to find the most
prominent element in the set, and that is the set’s Label. “Most prominent element” means the least
embedded atom. This works well, Chomsky notes, in the standard head-complement
case where a is an atom (viz. an X0
in X’ terms) and b is a set of atoms
(an XP in X’ terms). In such a case a
is more prominent than anything in b.
Note that the MLA assumes that labels must
be atoms for otherwise b is as
prominent as a is. So, the MLA codes part of what X’ theory did:
it labels some complex expressions (sets) in terms of properties of the head (in X’ terms) of that complex
expression (set). Chomsky assumes that this application of MLA is conceptually
unproblematic as it is very simple (i.e. minimal search) and can apply
unambiguously (i.e. there must be exactly one most prominent atom to find).[4]
Take note of this non-ambiguity assumption, for it plays a major role in
Chomsky’s account.
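To fix ideas, here is a toy rendering of the MLA as just described, written in Python purely for illustration. The set encoding, the function names, and the use of an exception for ambiguity are my own glosses, not anything Chomsky provides; the sketch just implements "find the least embedded atom, and fail if there is more than one".

```python
# Toy sketch of the MLA: syntactic objects are frozensets built by Merge,
# atoms (lexical items) are strings, and the label of an object is its least
# embedded atom.  The encoding and names are mine, for illustration only.

def atoms_at_depth(so, depth=0):
    """Yield (atom, depth) pairs for every atom inside a syntactic object."""
    if isinstance(so, str):              # an atom
        yield so, depth
    else:                                # a set created by Merge
        for part in so:
            yield from atoms_at_depth(part, depth + 1)

def mla(so):
    """Return the most prominent (least embedded) atom, or fail on ambiguity."""
    found = list(atoms_at_depth(so))
    min_depth = min(d for _, d in found)
    candidates = [a for a, d in found if d == min_depth]
    if len(candidates) != 1:
        raise ValueError(f"labeling ambiguous among {candidates}")
    return candidates[0]

# Head-complement case: an atom merged with a set of atoms labels unambiguously.
vp = frozenset({"eat", frozenset({"the", "dog"})})
print(mla(vp))                           # -> 'eat'

# Two merged atoms leave the MLA with no unique choice.
dp = frozenset({"the", "dog"})
# mla(dp)                                # raises: ambiguous between 'the' and 'dog'
```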
Chomsky next turns to the problematic cases.[5]
First he considers the old “bottom of the tree” problem. Merge can combine two
atoms to yield a structure where a
and b are both atoms. An example might be {the,dog} or {eat, it}.
Here we would seem to have two atoms and so application of the MLA is ambiguous
(something that Chomsky considers a computationally unacceptable situation (I’m
not sure why; perhaps because G operations must be deterministic?)). He resolves this by
adopting a Distributed Morphology proposal concerning categorization. In
particular, he assumes that lexical atoms are not Ns, Vs, As, etc. but roots.
Being an N, V or A is not a property of an atom, but a relation between a root
(R) (I am using R because Blogger does not like the standard root symbol) and a
functional expression. Thus, a noun like dog
is not an atom but a little set (viz. {n, R(dog)}).
To get the labeling to work out right, Chomsky makes a
second assumption: roots are too “weak” to label (i.e. they are inherently
incapable of labeling). Given this, the MLA must choose n as label of {n,R(dog)} for the root is not a potential label.[6]
Thus, the MLA identifies the shallowest atom
that can act as a label in an {a,b} structure.
In sum, the “bottom of the tree” problem (misnamed as there
are no trees if Chomsky is right and thinking in tree terms is positively
harmful to your theoretical linguistic health, as Chomsky notes (see below))
follows from the MLA, a rule that seeks out the most prominent potential
labeler.
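Continuing the toy sketch from above (same caveats: my encoding, not Chomsky's), the root-weakness assumption can be rendered as a filter on label candidates. Writing roots as strings like "R(dog)" and categorizers as "n", "v", "a", the {the, dog} style ambiguity disappears for {n, R(dog)} because the root is simply never a candidate.

```python
# Extension of the toy labeler: roots (written "R(...)" here) are "too weak"
# to label, so they are excluded from the candidate set before minimal search.
# This is my own rendering of the assumption, not a quote of the lectures.

def can_label(atom):
    return not atom.startswith("R(")     # roots cannot serve as labels

def mla_with_weak_roots(so):
    found = [(a, d) for a, d in atoms_at_depth(so) if can_label(a)]
    min_depth = min(d for _, d in found)
    candidates = [a for a, d in found if d == min_depth]
    if len(candidates) != 1:
        raise ValueError(f"labeling ambiguous among {candidates}")
    return candidates[0]

dog = frozenset({"n", "R(dog)"})
print(mla_with_weak_roots(dog))          # -> 'n'; the root is not in the running
```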
Comment: Chomsky’s assumption is the thin end of a very
dangerous wedge, I believe. Recall the minimalist project is to minimize the
grammatically parochial in FL. This has been the impetus behind eliminating
levels and simplifying rules to their conceptually bare minimum (e.g. eliminate
traces). Indeed, as Chomsky has previously argued, we don’t want grammar
internal primitives for they exacerbate Darwin’s Problem (DP). With this in
mind, what are we to make of elements like n,
v, a, the functional atoms that convert roots into categories? Aren’t these quintessentially grammar
internal elements? And if so, don’t they need to get into FL as well? Put
another way, roots are just lexical atoms. Categories like n,v,a are grammatical formatives that turn lexical atoms into
grammatically manipulable objects. This
suggests that in their absence, LSOs cannot interface with CI. But it seems
unlikely to me that n,v,a are CI
interface categories. But if not, they are grammatical constructs whose
existence exacerbates DP.
Perhaps these functional elements relate to lexicalization,
the other mystery Chomsky often discusses. As noted in discussion of lecture 1
(here),
Chomsky has identified two very distinctive properties of natural language:
hierarchical recursion and the large number and odd interpretive properties of
lexical items. Maybe n,v,a relate to
the mysterious issues surrounding lexicalization. At any rate, it is not clear
to me why Merge can’t work with roots alone or why roots should be so
grammatically impotent. We can assume
this and it works. But the claim that the weakness assumption is conceptually
minimal still needs some argument, I believe. Btw, I generally stiffen when
people use ‘weak’ and ‘strong’ in explanations. They are words that often
signal that we don’t know what we are talking about. Here, ‘weak’ just means
that it cannot label. Don’t be fooled into thinking that it explains why labeling is impossible. It
doesn’t. It’s just a stipulation to make the trains run on time.[7]
Second point: It would be nice to have some more examples to
play with. The dog is easy when
compared with eat it. What’s the
structure of the latter? It cannot be {R(eat), {n, R(it)}} for then this could not be labeled at all, or it
would be labeled n as n is the most prominent label-ful
expression. Nor can it be {{v, R(eat)},
it} for the MLA would label this an it-phrase.
Nor can it be {{v, R(eat)}, {n, R(it)}} for
here the MLA cannot apply unambiguously. So what is it? Note that this is just
the bottom of the tree problem (i.e. {X0, Y0}) all over
again, but this time with roots factored in. If anyone has any ideas about
this, please let me/us know. One unfortunate feature of the lectures that I
mentioned at the outset is the dearth of illustrative examples. There are a
few, but we really don’t get to see many full-fledged derivational details, and
for fixing ideas this is definitely not a plus. Moreover, Chomsky does not
respond well when asked to provide concrete examples. David Pesetsky and Sabine
Iatridou (and a few others who I could not identify) asked for these (at least
obliquely) during the lectures, but without success. Oh well, I guess it’s up
to us. So let me know.
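For what it's worth, running the toy labeler from above over the three candidate structures for eat it reproduces exactly the outcomes sketched in the preceding paragraph (again, the encoding is mine and only as good as the assumptions behind it).

```python
# The three candidate structures for "eat it" discussed above, fed to the toy
# labeler.  Each comes out exactly as badly as the text predicts.

c1 = frozenset({"R(eat)", frozenset({"n", "R(it)"})})
c2 = frozenset({frozenset({"v", "R(eat)"}), "it"})
c3 = frozenset({frozenset({"v", "R(eat)"}), frozenset({"n", "R(it)"})})

print(mla_with_weak_roots(c1))   # -> 'n'   (a nominal label for "eat it")
print(mla_with_weak_roots(c2))   # -> 'it'  (an it-phrase)
# mla_with_weak_roots(c3)        # raises: ambiguous between 'v' and 'n'
```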
[1]
I should say, “English” to note that English really does not exist (it is an
abstraction). But I won’t. Put in scare quotes where appropriate.
[2]
Thus we can get (i) but not (ii):
(i)
Persuade Mary that Frank loved her John did
(ii)
*Persuaded Mary that Frank loved her John
(ii) would be derived by moving T’ (T0+vP)
to the front on analogy with (i) where vP is fronted. At any rate, there is virtually no evidence
that Gs ever target X’-projections, a point (and problem) that Chomsky made
(and addressed) in his 1995 book.
[3]
The other classical data motivating endocentric labeling are selection and
subcategorization restrictions. The relevant kinds of data are reviewed in Understanding Minimalism chapter 6.
[4]
In the simple case, the assumption that labels are atoms follows from the
non-ambiguity thesis. However, in the more complex cases that Chomsky
discusses, it is not clear why
complex elements cannot serve as labels. Nor is it clear to me whether this
makes any difference.
[5]
Chomsky has long taken the head-complement relation as grammatically basic.
This was true in LGB where objects
were the best-behaved of elements, and this is true in this case as well, the
simple head-complement case being the unproblematic one. You see it as well in his objections to
Specifiers. On the face of it, it’s hard to see why the distinction between
first merged element and second merged element should make much of a
grammatical difference. But Chomsky seems to have the intuition that first
merged elements are less problematic grammatically than non first merged
expressions. If anyone knows why, send me a note.
[6]
If this is correct, then the search procedure is not “minimal.” The MLA must
scan the entire structure, i.e. both the root and the n to find the head, at least in this case. Rather what’s critical
is the non-ambiguity assumption, i.e. the assumption that the choice of label is unambiguous.
[7]
In addition to informal concepts like weak and strong, another sure sign of
mystery mongering is capitalization. Remember SUBJECT or CHAIN?
Monday, February 3, 2014
Derivation Trees: Syntacticians' Best Friend?
Monday, August 26, 2013
Got Culture?
In the last chapter of Dehaene’s Reading in the Brain he speculates
about one of the really big human questions: whence culture? The book’s big
thesis, concentrating on reading and writing as vehicles for cultural
transmission, is the Neuronal Recycling Thesis (NRT). The idea is simple:
culture supervenes on neuronal mechanisms that arose to serve other ends. Think
exaptation as applied to culture. Thus,
reading and writing are underpinned by proto-letters, which themselves live on
ecologically natural patterns useful for object recognition. So too, the hope goes, for the rest of what
we think of as culture. However, as Dehaene quickly notes, if this is the
source, and “we share most, if not all of these processors [i.e. recycled
structures NH] with other primates, why are we the only species to have
generated immense and well-developed cultures” (loc 4999). Dehaene has little
patience for those who fail to see a qualitative difference between human
cultural achievements and those of our ape cousins.
…the scarcity of animal cultures
and the paucity of their contents stand in sharp contrast to the immense list
of cultural traditions that even the smallest human groups develop
spontaneously. (loc 4999)
Dehaene specifically points to the absence of “graphic
invention” in primates as “not due to any trivial visual or motor limitation”
or to a lack of interest in drawing, apparently (loc 5020). He puts the problem
nicely:
If cultural invention stems from
the recycling of brain mechanisms that humans share with other primates, the
immense discrepancy between the cultural skills of human beings and chimpanzees
needs to be explained. (loc 5020)
He also surveys several putative answers, and finds them
wanting. His remarks on Tomasello (loc 5046-5067) seem to me quite correct,
noting that though Tomasello’s mind-reading account might explain how culture
might spread and how its achievements might be retained cross-generationally:[1]
…it says little…about the initial
spark that triggers cultural invention. No doubt the human species is
particularly gifted at spreading culture – but it is also the only species to create culture in the first place. (loc
5067, his emphasis)
So what’s Dehaene’s proposal?
My own view is that another
singular change was needed - the capacity to arrive at new combinations of
ideas and the elaboration of a conscious mental synthesis (loc 5067).
This is quite a mouthful, and so far as I can see, what
Dehaene means by this is that our frontal lobe got bigger and that this
provided a “neuronal workspace” whose main function is to “assemble, confront,
recombine, and synthesize knowledge” (loc 5089).
I don’t find this particularly enlightening. It’s
neuro-speak for something happened, relevant somethings always involving the
brain (wouldn’t it be refreshing if every once in a while the kidney, liver or
heart were implicated!). In other words, the brain got bigger and we got
culture. Hmm. This might be a bit unfair. Dehaene does say more.
He notes that the primate cortex, in contrast to ours, is
largely modular, with “its own specific inputs, internal structure, and
outputs.” Our prefrontal cortex, in contrast, “emit[s] and receive[s] much more diverse
cortical signals” and so “tend[s] to be less specialized.” In addition, our
brains are less “modular” and have greater “bandwidth.” This works to prevent
“the division of data and allows our behavior to be guided by any combination
of information from past or present experience” (loc 5089).
Broken down to its essentials, Dehaene is here identifying
the demodularization of thought as the key ingredient to the emergence of
culture. As he notes (loc 5168), in this he agrees with Liz Spelke (and others)
who has argued that the general ability to integrate information across modules
is what spices up our thinking beyond what we find in other primates. Interestingly for my purposes here, Spelke
ties this capacity for cross-module integration to the development of linguistic
facility (see here).
This assumption, that language is a necessary condition for
the emergence of the kind of culture we see in humans, is consistent with the
hypothesis Minimalists have been assuming (following people like Tattersall (here))
that the anthropological “big bang,” which occurred in the last 25,000-50,000 years,
piggybacked on the emergence of FL in the last 50,000-100,000 years. Moreover,
it’s language as module buster that gets the whole amazing culture show on the
road.
But what features of language make it a module buster? What allows grammar to “assemble and
recombine” otherwise modular information? What’s the secret linguistic sauce?
Sadly, neither Dehaene nor Spelke says. Which is too bad, as me and my lunch buddies
(thx Paul, Bill) have discussed this question off and on for several years now,
without a lot to show for it. However, let me try to suggest a key
characteristic that we (aka I) believe is implicated. The key is syntax!
The idea is that FL provides a general-purpose syntax for
combining information trapped within modules.
Syntax is key here, for I am assuming (almost certainly wrongly, so feel
free to jump in at any point) that what makes information modular is some feature of
the module-internal representations that makes it difficult for them to
“combine” with extra-modular information. I say syntax because, once information trapped within a module can combine
with information in another module, it appears that, more often than not, the
combination can be interpreted. Thus, it’s not that the combination of
modularly segregated concepts is semantically indigestible; rather, the problem
seems to be getting the concepts to talk to one another in the first place, and,
I take this to mean, to syntactically combine. So module busting will amount to
figuring out how to treat otherwise distinct expressions in the same way. We
need some kind of abstract feature that, when attached to an arbitrary
expression, allows it to combine with any other expression from any other
module. What we need, in effect, is
what Chomsky called an “edge feature” (EF), a thingamajig that allows expressions to
freely combine.
Now, if you are like me, you will not find this proposal a
big step forward for it seems to more name a solution than provide one. After
all, what can EFs be such that they possess such powers? I am not sure, but I am pretty confident that
whatever this power is, it’s purely syntactic. It is an intrinsic property of
lexical atoms and it is an inherited property of congeries of such (i.e.
outputs of Merge). I have suggested (here)
that EFs are, in fact, labels, which function to close Merge in the domain of
the lexical items (LIs). In the same place I proposed that labeling is the
distinctively linguistic operation, which, in concert with other cognitively
recycled operations, allowed for the emergence of FL.
How might labels do this?
Good question. An answer will require addressing a more basic question: what
are labels? We know what they must do:
they must license the combination both of lexical atoms and complexes of
such. Atomic LIs are labels. Complexes of LIs are labeled in virtue of
containing atomic ones. The $64,000 question (doesn’t sound like much of a
prize anymore, does it?) is how to characterize this. Stay tuned.
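To make the closure idea a little more concrete, here is a very speculative sketch (mine, not a considered proposal): if every object Merge builds carries a label drawn from the lexical atoms it contains, then Merge's outputs are the same kind of thing as its inputs and can be merged again, which is all that "closing Merge in the domain of LIs" requires. Which atom projects is simply stipulated here; saying how that choice is made is, of course, the $64,000 question.

```python
# Speculative toy sketch of "labels close Merge in the domain of lexical
# items".  All names are invented for illustration; the projecting atom is
# handed in by stipulation rather than computed.

from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class SO:
    label: str        # an atomic lexical item, inherited from a daughter
    parts: tuple      # the two merged daughters

Syn = Union[str, SO]

def label_of(x: Syn) -> str:
    return x if isinstance(x, str) else x.label

def merge(a: Syn, b: Syn, projector: Syn) -> SO:
    """Combine a and b; the output's label comes from one of its daughters."""
    assert projector in (a, b)
    return SO(label=label_of(projector), parts=(a, b))

vp = merge("eat", "it", projector="eat")    # labeled 'eat'
tp = merge("will", vp, projector="will")    # Merge re-applies to its own output
print(label_of(tp))                         # -> 'will'
```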
So, culture supervenes on language and language is the
recycling of more primitive cognitive operations spiced with a bit of labeling.
Need I say that this is a very “personal” (read “extremely idiosyncratic and
not currently fashionable”) view?
Current MP accounts are very label-phobic. However, the question Dehaene raises is a
good one, especially for theories like MP that presuppose lots of cognitive
recycling.[2] It’s not one whose detailed answer is
anywhere on the horizon. But like all good questions, I suspect that it will
have lots of staying power and will provide lots of opportunities for fun
conversations.