Faculty of Language was launched on September 28, 2012. This means that it is entering its third year and FoL is now in its terrible twos (TT). So no more cute, quiet, complaisant blog. No more walking on tiptoes around important issues. No more shying away from polemics and vigorous debate. Now that we are in the TTs, it's time to say what we really think and to pursue the intellectual debate loudly and vigorously.
With this in mind, I would like to invite those who have been passive readers to join the fray. FoL was started to focus on the big issues that initially motivated the Generative enterprise. These, IMO, had lost the prominence they once had, and linguistics did not benefit from this. GG was once at the center of the cognitive revolution. Sadly, this is no longer so. It is similarly absent from much discussion in the cog-neuroscience of language. In other words, much of what GG has discovered has remained a well-kept secret, and the influence GG should have had on work in these areas has dissipated. I believe that we need to change this, both for the good of linguistics as a discipline and because we have important (indeed vital) contributions to make to the brain and cognitive sciences.
We need to reconnect with the big issues and vociferously push the consequences of our discoveries hard in the larger cog-neuro community. Part of this involves getting clear about what we think these consequences are, and part involves making sure that what we've done is neither misunderstood nor ignored. And this means speaking up in public venues where the issues are raised and making sure that others get it, even if this means being intellectually pushy. And it means being ready to critique work that we take to ignore, or fly in the face of, all we know.
So let's make year 3 a good boisterous one. No more pussy-footing around. Please send me things YOU find important and relevant. Please send suggestions for things to discuss. Let's make lots of noise!! We will all benefit.
Monday, September 29, 2014
Friday, September 26, 2014
Never trust a fact that is not backed up with a decent theory, and vice versa
Experimental work is really hard. Lila once said, loud enough for me to hear, that you need to be nuts to do an experiment if you really don't have to. The reason is that they are time consuming, hard to get right, and even harder to evaluate. We have recently been finding out how true this is, with paper after paper coming out arguing that much (most?) of what's in our journals is of dubious empirical standing. And this is NOT because of fraud, but because of how experiments get done, reported, evaluated and assessed with respect to professional advancement. Add to this the wonders of statistical methodology (courses in stats seem designed as how-to manuals for gaming the system) and what we end up with is, it appears, junk science with impressive looking symbols. Paul Pietroski sent me this link to a book that reviews some of this in social psychology. But don't snigger, the problems cited go beyond this field, as the piece indicates.
I said that statistical methods are partly to blame for this. This should not be taken to imply that such methods, when well used, are not vital to empirical investigation. Of course they are! The problem is, first, that they are readily abusable, and second, that the industry has often left the impression that facts that are statistically scrutinized are indubitable. In other words, the statistical industry has left the impression that facts are solid while theories are just airy-fairy confabulation, if not downright horse manure. And you get this from the greats, i.e. how theory is fine but in the end the test of a true theory comes from real world experiments, yada yada yada. It is often overlooked how misleading real world experiments can be and how it often takes a lot of theory to validate them.
I think that there is a take home message here. Science is hard. Thinking is hard. There is no magic formula for avoiding error or making progress. What makes these things hard is that they involve judgment, and this cannot be automated or rendered algorithmically safe. Science/thinking/judgment is not long division! But many think that it really is. That speculation is fine so long as it is made to meet the factual tribunal on a regular basis. On this view, the facts are solid, the theories in need of justification. Need I say that this view has a home in a rather particular philosophical view? Need I say that this view has a name (psst, it starts with 'Emp…')? Need I say that this view has, ahem, problems? Need I say that the methodological dicta this view favors are misleading at best? We like to think that there are clear markers of the truth. That there is a method which, if we follow it, will get us to knowledge if only we persevere. There isn't. Here's a truism: We need both solid facts and good theories, and which justify which is very much a local contextual matter. Facts need theoretical speculation as much as theoretical speculations need facts. It's one big intertwined mess, and if we forget this, we are setting ourselves up for tsuris (a technical term my mother taught me).
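Since the complaint above leans on the claim that standard statistical practice is easy to game, here is a minimal sketch of the most familiar mechanism (in Python, my choice; the group sizes, the twenty outcome measures and the cutoff are all invented for illustration and correspond to nothing in the studies discussed): test enough outcomes on pure noise and report whichever one clears the threshold, and "significant" findings appear where there is nothing to find.

```python
# Toy illustration of how "flexible" analysis inflates false positives.
# The two groups are drawn from the SAME distribution (no real effect), but an
# imaginary lab peeks at 20 outcome measures and reports the best one.
# All numbers are invented for illustration.
import random
import statistics

random.seed(1)

N_PER_GROUP = 30
N_OUTCOMES = 20      # how many measures get peeked at
N_STUDIES = 2000
T_CRIT = 2.0         # roughly the two-tailed 5% cutoff for df = 58

def two_sample_t(xs, ys):
    """Plain two-sample t statistic (equal group sizes, pooled variance)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    pooled_sd = ((statistics.variance(xs) + statistics.variance(ys)) / 2) ** 0.5
    return (mx - my) / (pooled_sd * (2 / N_PER_GROUP) ** 0.5)

hits_single, hits_any = 0, 0
for _ in range(N_STUDIES):
    significant = []
    for _outcome in range(N_OUTCOMES):
        a = [random.gauss(0, 1) for _ in range(N_PER_GROUP)]
        b = [random.gauss(0, 1) for _ in range(N_PER_GROUP)]
        significant.append(abs(two_sample_t(a, b)) > T_CRIT)
    hits_single += significant[0]    # honest lab: one pre-registered outcome
    hits_any += any(significant)     # flexible lab: best of 20

print(f"false positive rate, one outcome: {hits_single / N_STUDIES:.2f}")  # close to 0.05
print(f"false positive rate, best of 20 : {hits_any / N_STUDIES:.2f}")     # well over 0.5
```

None of this indicts statistics as such; it just illustrates why "statistically scrutinized" is not the same thing as "indubitable."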
Sunday, September 21, 2014
Mirror mirror in the brain
Lila Gleitman once conjectured that Empiricism, with its
associationist commitments, is innate. How else to explain its zombie like
capacity to repeatedly come back from intellectual death? One possible explanation for associationism’s
robustness is that it never returns in quite the same form. To paraphrase Mark
Twain, Empiricist history never repeats itself, but it often rhymes. I see this rhyming constantly.
Technologically, neural nets were perfectly compatible with Rationalist sentiments (just a matter of initial weightings)[1]; nonetheless, virtually all of the work done in this framework stank of associationism. The same holds, IMO,
for lots of recent Bayesian modeling and deep learning. There is nothing inherent in these approaches
that requires a coupling with Empiricist conceptions, but it seems that every computational innovation is drawn to Empiricism the way flies are to…, well
you know. It seems that we can now add
mirror neurons to the list. Why do I say
this? Because I’ve just finished reading a
terrific new book by Greg Hickok that critically reviews the mirror neuron
literature and its spiritual affinities with Behaviorism. But his book is not
merely a debunking (though don't fret, it does do a lot of that) of some bad
ideas which quickly became widely influential (another characteristic of
Empiricist fads). It is also both a nice accessible report of research on the
neuro frontier from a distinguished practitioner, and a nice case study in the
philo of science. What follows are some reasons I liked the book and why you
might find it worth dipping into.
In case you haven’t heard, mirror neurons are the
philosopher’s stones of contemporary neuro-science. Since their discovery in macaques in the late
1990s in Italy (I don't think macaques are native to Italy, just vacationing
there), they have been used to neuronally explain almost everything of
cognitive interest from language and its evolution to human empathy and autism.
What are these amazing brain mechanisms? Well, it seems that they are neurons that
fire both when the actor is acting and when the actor is watching someone else
act. They are neurons that seem part of both the motor and the conceptual
system. Or at least fire both when a monkey is reaching for something and when
s/he is watching someone else reaching for something. This has led to a robust
version of the motor theory of everything. In other words, understanding is
actually re-doing. I understand what reaching cognitively means by simulating
the reaching that I see. I understand what I hear by producing what I’ve heard.
I understand what someone is feeling, by reproducing the feeling in myself.
Talk about walking a mile in someone else’s shoes! That’s the basic idea, and
if Greg is right (and I am sure he is) this idea has really caught on.
What Greg does in the book is reveal that this simple idea
is, well, at best too simple and at worst, devoid of actual content. The
problem is not with the data: there are neurons that do what they have been
observed to do. However, what these firings mean has, Greg argues, been deeply over-interpreted; over-interpreted to the point that in most cases it is unlikely that much of a claim is being made at all.
The whole book is great, but I particularly recommend the
sections on the role of anomalies in driving research and the terrific deflationary
section on embodied cognition, a notion that really deserves some critical
discussion, which Greg more than provides.
I’ve never understood why neuro types thought that embodied cognition
could serve as a basis for the “semantics” of action words, but it has. I would
recommend reading Greg on this and then, if you still want more, go back and
read Fodor and Pylyshyn on compositionality.
I also recommend you
take a careful look at Greg’s discussion of imitation (chapter 8) and its role
in “learning.” Here’s a short quote to give you the gist. There is
…a logical error in thinking about
imitation as the foundation for more complex capacities like theory of mind, or
that imitation itself had to evolve
to unleash a great leap forward. Maybe we should think the other way around.
Imitation is not the cause but the consequence of the evolution of human
cognitive abilities…(200)
And
For imitation to be at all useful,
you have to know what and when to imitate and you have to have the mental
machinery behind imitative behavior to put it to good use…More specifically, to
understand the role of imitation in language learning, we need to study how
language works…Or to frame it a bit differently, rather than centering our
theoretical efforts on imitation and then seeing what computational tasks
imitation might be useful for, we might center our focus on particular
computational tasks (language, understanding actions, grasping for objects) and
then see what role imitation may play…(201).
Imitation is a current refuge for associationist theories of
learning. And mirror neurons are the latest neuronal candidate for the
grounding of associationism. Greg’s
critical discussion, IMO, effectively blows up this bankrupt train of
theorizing. I’m not surprised, but I am grateful. Someone’s got to clean out
the Augean stables and Greg is very effective with a shovel.
Here’s another quote making the all too common link to
associationism (228):
We’ve been down a similar road
before. Behaviorists had very simple mechanisms (association and reinforcement)
for explaining complex human behavior. But removing the mind as a mediator
between the environment and behavior ultimately didn’t have the required
explanatory oomph. Mirror neuron resonance theory isn’t quite behaviorism, but
there are not many degrees of separation because “it stresses… the primacy of a
“direct matching” between the observation and the execution of an action.” The
notion of “direct matching” removes the sort of operations that might normally
be thought to mediate the relation between observation and action systems…The
consequence of such a move is loss of explanatory power. The mirror neuron
direct matching claim results in a
failure to explain how mirror neurons know when to mirror in the first
place. We then have to look to the “cognitive system” for an explanation, which
lands us back where we started: with a complex mind behind the mirror neuron
curtain of explanation of complex mental functions.
As Greg notes, his critique of mirror neurons is a modern
revamping of Chomsky’s critique of Skinnerian behaviorism. Where it’s clear,
it’s clearly false and where it seems true it borders on the truistic and
vapid.
There is lots more in the book. For neophytes (e.g. me)
there is a good discussion of the dorsal and ventral systems of brain
organization (the how vs what organization of brains that Greg and David
Poeppel did so much to make part of the contemporary common neural wisdom in
the domain of language), and the various kinds of techniques that modern neuro
types use to probe brain structure. In
addition, there are lots of great examples that signal to the careful reader
that Greg is clearly a pretty good surfer and that he loves dogs.
So, if you are looking for a good popular neuro book or just
a good debunking, Greg's book is a great place to go. It would make a marvelous Rosh Hashanah present or a great Yom Kippur stocking stuffer.
Thursday, September 18, 2014
Commenting on Posts
Many have written to tell me that they had problems leaving a comment on a post. I am not sure why, but I suspect it's because you have not chosen an "identity." At the bottom of the comment section there is a box that asks you to choose an identity for posting. I have a google account and post as 'Norbert.' There are other options if you click on the box. But you need one of these to do anything. When I go to my parents' and use their computer, I often forget to check this and my comments disappear (I know, it would be better were this to happen more frequently) never to be seen again. Here's a little primer on how to comment.
IMO, lots of what's valuable on this site has come from the very many useful comments readers have provided. They are often (almost always) more thoughtful than the posts that they are commenting on. So please keep them coming.
Rethinking MOOCs
When MOOCs first came on the scene I expressed skepticism about their ultimate value for teaching and their capacity to really reduce costs without reducing educational quality (here, here). This, remember, was the selling point: more for less. Flash forward to today and it seems that the problems with MOOCs are becoming more and more apparent. Sure, high tech has a role to play in education (sort of like overhead projectors and PowerPoint), but it is not the panacea bureaucrats and entrepreneurs like to hype (no doubt for purely noble reasons, like moving large amounts of money into their own accounts). Well, it seems that MOOCs have hit their high water mark and their general educational value is being reassessed. Not surprisingly, they can bring good results, but only if used labor intensively. Also not surprisingly, it seems that getting people to use them means lowering what MOOCs are used to do. Here's a discussion. I don't buy it all, but it's a good sign of where the discussion is heading.
More on genes and language
Bill Idsardi sent this to me. Research on genes and language seems to be hotting up. Here is a study on rate of early word learning and a genetic difference that correlates with the variable rates.
Wednesday, September 17, 2014
Another Foxp2 article
Rich Hilliard sent me another report on the Foxp2 article that I brought to your attention yesterday. This one is from the CBC, and being a very proud and smug Canadian I am bringing it to your attention as well. In addition, it has a very nice photo of a mouse with a re-engineered "humanized" Foxp2 gene (it really is adorable, btw). It also gives a few more details of the experiment and the different kinds of information that humanized mice integrated better than "just" mice did. Here's the short version of the experiment as told by the CBC: The experimenters
"...trained mice to find chocolate in a maze. The animals had two options: use landmarks like lab equipment and furniture visible from the maze ("at the T-intersection, turn toward the chair") or by the feel of the floor ("smooth turn right, nubby turn left"). Mice with the human gene learned the route as well by seven days as regular mice did by 11….Surprsingly, however, when the scientists removed all the landmarks in the room, so mice could only learn by the feel-of-the-floor rule, the regular rodents did as well as the humanized ones. They also did just as well when the landmarks were present but the floor textiles were removed. It was only when mice culduse both learning techniques that those with the human brain gene excelled."This is the basis for the speculation that Foxp2 helps with language, for Graybiel interprets the results to "suggest" that Foxp2 enhances the capacity to transition "from thinking about something consciously to doing it unconsciously." And this relates to language how? Well when kids learn to speak they transition from consciously mimicking words they hear to speaking automatically. Really? This is the linking hypothesis? Am I alone in thinking that this gives speculation a bad name? It doesn't even rise to the level of a just-so story.
Jerry Fodor is reputed to have said that neuroscience has taught us virtually nothing about the mind. I am not sure that I entirely agree, but I am pretty sure that this work tells us next to nothing about language. Look, I love mice. They sing, they are cute, they run mazes better than I can, they navigate well in the dark. I am even sort of interested that one can put a human Foxp2 gene into a mouse. But the results of this experiment are very modest and have nothing whatsoever to tell us about language. I assume the language link is just there to hype the work. Show business.
Tuesday, September 16, 2014
Some fodder for lunch conversation
The inimitable Bill Idsardi sent me two links to a recent paper on Foxp2 (here and here). The paper, a collaborative effort between Ann Graybiel's lab at MIT and researchers at the Max Planck in Leipzig, studied how mice equipped with a "humanized" form of the Foxp2 gene learned to run mazes. It seems that it helps, well, at least sometimes, in some ways. The big advantage the humanized gene provides is to facilitate the transition between declarative (deliberative) and procedural (automatic) forms of storing new info. At any rate, the mice did better at some tasks than those without the humanized form of the gene. The reports go on to speculate (and I do mean speculate) about how all of this might have something to do with language. Here's the AAAS version for the non-expert: "The results suggest the human version of the FOXP2 gene may enable quick switch to repetitive learning - an ability that could have helped infants 200,000 years ago better communicate with their parents." The emphasis in the previous quote is mine. I don't know if it is possible to make a more hedged "suggestion" but I sincerely doubt it. Even so, the report from Science does quote a skeptic who is "not sure how relevant the findings are to speech" given that the test relies on visual cues while speech relies on auditory ones. I think that were they to ask me I would have been more skeptical still, as I am not sure I see what the bridging assumptions are that take one from facilitated routinization of maze running to even word learning (the capacity that Graybiel cites in the MIT piece as possibly enhanced by this version of FOXP2 (is there a difference between FOXP2 and foxp2? I suspect that the former is the human one and the latter the non-human analogue. At any rate, …)). Maybe, but it would have been nice to hear a little of how these two capacities might be related.
This might be interesting and important work. I am told that Graybiel is a big deal. Still, it is odd how little attempt there is to link this language gene to any language-like effect. I suspect that the reason for this (aside from the fact that it's probably hard at MIT to find anyone (e.g. a linguist) who knows anything about language (and yes, that was sarcastic)) is that biologists are really flummoxed by language. The Science article notes in passing, as if it were obvious, the following: "As a uniquely human trait, language has long baffled evolutionary biologists" (2). Funny, when I say things like this (e.g. that language is a species-specific special capacity and that evolution has little to say about it) furor immediately erupts. However, it seems to be conventional wisdom, at least for Science writers (and both they and I are right about this). At any rate, take a look. It won't take long.
Here's one more thing that you might find interesting. Aaron White sent me this link to Michael Jordan where he discusses deep learning. His discussion of supervised vs unsupervised learning is useful coming from him. It's also short and he is also a big shot in this area so it's worth a quick look.
Thanks again to Bill and Aaron for this. Let me make it official: if you find something that you think would be of general interest, please send it along to me. One hope is that the blog can exploit the wisdom of crowds to make us all more aware of what is happening elsewhere that might be of general interest to us.
Monday, September 15, 2014
Computations, modularity and nativism
The last post (here)
prompted three useful comments by Max, Avery and Alex C. Though they appear to
make three different points (Max pointing to Fodor’s thoughts on modularity,
Avery on indirect negative evidence and Alex C on domain specific nativism) I
believe that they all end up orbiting a similar small set of concerns. Let me
explain.
Max links to (IMO) one of Fodor’s best ever book reviews (here).
The review brings together many themes in discussing a pair of books (one by
Pinker, the other by Plotkin). It outlines some links between computationalism,
modularity, nativism and Darwinian natural selection (DNS). I'll skip the
discussion on DNS here, though I know that there will be many of you eager to
battle his pernicious and misinformed views (not!). Go at it.
What I think is interesting given the earlier post is Fodor’s linking
together computationalism, modularity and nativism. How do these ideas talk to one another? Let’s
start by seeing what they are.
Fodor takes computationalism to be Turing’s “simply terrific
idea” about how to mechanize rationality (i.e. thinking). As Fodor puts it (p.
2):
…some inferences are rational in
virtue of the syntax of the sentences that enter into them; metaphorically, in
virtue of the ‘shapes’ of these sentences.
Turing noted that, wherever an
inference is formal in this sense, a machine can be made to execute the
inference. This is because…you can make them [i.e. machines NH] quite good at
detecting and responding to syntactic relations among sentences.
And what makes syntax
so nice? It’s LOCAL. Again as Fodor
puts it (p. 3):
…Turing’s account of
computation…doesn’t look past the form of sentences to their meanings and it
assumes that the role of thoughts in a mental process is determined entirely by
their internal (syntactic) structure.
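To make Turing's "simply terrific idea" slightly more concrete, here is a toy sketch (mine, not Fodor's or Turing's; the premise strings and the function are invented for illustration) of an inference engine that is sensitive only to the shapes of the sentences it manipulates. It applies modus ponens by string matching and never consults what any symbol means.

```python
# A toy "syntactic" reasoner: it applies modus ponens purely by matching the
# shapes of formulas, with no access to what the symbols mean.

def modus_ponens(premises):
    """Return every Q such that some P and 'if P then Q' are both premises."""
    conclusions = set()
    for sentence in premises:
        if sentence.startswith("if ") and " then " in sentence:
            antecedent, consequent = sentence[3:].split(" then ", 1)
            if antecedent in premises:       # a purely formal (shape) check
                conclusions.add(consequent)
    return conclusions

beliefs = {
    "it is raining",
    "if it is raining then the streets are wet",
    "blickets are zorbs",
    "if blickets are zorbs then grue is a color",   # nonsense, but same shape
}

print(modus_ponens(beliefs))
# both "the streets are wet" and "grue is a color" come out, because the
# inference turns only on form, not on content
```

The nonsense premise is the point: so long as relevance can be read off the form of a sentence, a machine can do the work. The trouble, as the next paragraphs note, starts when relevance depends on everything else you believe.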
Fodor continues to argue that where this kind of locally focused computation is not
available, computationalism ceases to be useful. When does this happen? When belief fixation
requires the global canvassing and evaluation of disparate kinds of information
all of which have variable and very non-linear effects on the process.
Philosophers call this ‘inference to the best explanation’ (IBT) and the
problem with IBT is that it’s a complete and utter mystery how it gets done.[1]
Again as Fodor puts it (p. 3):
[often] your cognitive problem is
to find and adopt whatever beliefs are best confirmed on balance. ‘Best
confirmed on balance’ means something like: the strongest and simplest relevant
beliefs that are consistent with as many of one’s prior epistemic commitments
as possible. But as far as anyone knows, relevance, strength, simplicity,
centrality and the like are properties, not of single sentences, but of whole
belief systems: and there’s no reason at all to suppose that such global
properties of belief systems are syntactic.[2]
And this is where modularity comes in; for modular systems
limit the range of relevant information for any given computation and limiting
what counts as relevant is critical to allowing one to syntactify a problem and
allow computationalism to operate. IMO,
one of the reasons that GG has been a doable and successful branch of cog sci
is that FL is modular(ish) (i.e. that something like the autonomy of syntax is
roughly correct). ‘Modular’ means
“largely autonomous with respect to the rest of one’s cognition” (p. 3).
Modularity is what allows Turing’s trick to operate. Turing’s trick, the
mechanization of cognition, relies on the syntactification of inference, which
in turn relies on isolating the formal features that computations exploit.
All of which brings us (at last!) to nativism. Modularity just is domain specificity. Computations are modular if they are “more or
less autonomous” and “special purpose” and “the information [they] can use to
solve [cognitive problems] are proprietary” (p. 3). So construed, if FL is modular, then it will also be domain specific. So if FL is
a module (and we have lots of apparent evidence to suggest that it is) then it
would not be at all surprising to find that FL is specially tuned to linguistic
concerns. And that it exploits and manipulates “proprietary information” and that
its computations were specifically “designed” to deal with the specific
linguistic information it worries about.
So, if FL is a module, then we
should expect it to contain lots of domain specific computational operations,
principles and primitives.
How do we go about investigating the if-clause immediately above?
It helps to go back to the schema we discussed in the previous post. Recall
the general schema in (1) that we used to characterize the relevant problem in
a given domain, ‘X’ ranging over different domains. (2) is the linguistic case.
(1) PXD -> FX -> GX
(2) PLD -> FL -> GL
Linguists have discovered many properties of FL. Before the Minimalist Program (MP) got going,
the theories of FL were very linguistically parochial. The basic primitives,
operations and principles did not appear to have much to say about other
cognitive domains (e.g. vision, face recognition, causal inference). As such it
was reasonable to conclude that the organization of FL was sui generis. And to the degree that this organization had to be
taken as innate (which, recall, was based on empirical
arguments about what Gs did) then to that degree we had an argument for innate
domain specific principles of FL. MP has
provided (a few) reasons for thinking that earlier theories overestimated the
domain specificity of FL’s organization. However,
as a matter of fact, the unification of FL with other domains of cognition (or
computation) has been very very very modest.
I know what I am hoping for and I try not to confuse what I want to be
true with what we have good reason to be true. You should too. Ambitions are
one thing, results quite another. How might one go about realizing these MP
ambitions?
If (1) correctly characterizes the problem, then one way for
arguing against a dedicated capacity
is to show that for various values of ‘X,’ FX is the same. So, say we look at
vision and language, then were FL = FV we would have an argument that the very
same kind of information and operations were cognitively at play in both vision
and language. I confess, that stating
things this baldly makes it very implausible that FL does equal FV, but hey, it's possible. The impressive trick would be to show how to pull this off (as opposed to simply expressing hopes or
making windy assertions that this could be
done), at least for some domains. And the trick is not an easy one to execute:
we know a lot about the properties of natural language Gs. And we want an FL
that explains these very properties. We don’t want a unification with other FXs
that sacrifices this hard won knowledge to some mushy kind of “unification”
(yes, these are scare quotes) which sacrifices the specifics that we have
worked so hard to establish (yes Alex, I’m talking to you). An honest appraisal
of how far we’ve come in unifying the principles across modules would conclude
that, to date, we have very few results suggesting that FL is not domain specific. Don’t get me wrong:
there are reasons to search for such unifications and I for one would be
delighted if this happens. But hoping is not doing and ambitions are not
achievements. So, if FL is not a dedicated capacity, but is merely the
reflection of more general cognitive principles then it should be possible to
find that FL is the same as some FX (if not vision, then something else) and that this unified FX' (i.e. which
encompasses FL and FX) can derive the relevant Gs with all their wonderful
properties given the appropriate PLD. There’s a Nobel prize awaiting such a
unification, so hop to it.[3]
It is worth noting that there is tons of standard variety
psycho evidence that FL really is modular with respect to other cognitive
capacities. Susan Curtiss (here
and here)
reviews the wealth of double dissociations between language and virtually any
other capacity you might be interested in. Thus, at least in one perfectly
coherent sense, FL is a module and so a dedicated special purpose system.
Language competence swings independently of visual acuity, auditory facility,
IQ, hair color, height, vocab proficiency, you name it. So if one takes such
dissociations as dispositive (and it is the gold standard) then FL is a module
with all that this entails.
However, there is a second way of thinking about what
unification of the cognitive modules consists in and this may be the source of
much (what I take to be) confused discussion. In particular, we need to
separate out two questions: ‘Is FL a module?’ and ‘Does FL contain linguistically proprietary parts/circuits?’ One can maintain that FL is a module without also thinking
that its parts are entirely different
from those in every other module. How so?
Well, FL might be composed from the same kinds of parts present in other
modules, albeit put together in distinctive ways. Same parts, same
computations, different wiring. If this were so, then there would be a sense in
which FL is a module (i.e. it has special distinctive proprietary computations
etc.), yet when seen at the right grain it shares many (most? All?) of its
basic computational features with other domains of cognition. In other words, it
is possible that FL’s computations are distinctive and dedicated, and that they are built from the same simple
parts found in other modules. Speaking personally, this is how I now understand
the Minimalist Bet (i.e. that FL shares many basic computational properties
with other systems).
This is a coherent position (which does not imply it is
correct). At the cellular level our organs are pretty similar. Nonetheless, a
kidney is not a heart, and neither is a liver or a stomach. So too with FL and other cognitive “organs.” This is a possibility (in fact, I have argued
in places that this is also plausible and maybe even true). So, seen from the
perspective of the basic building blocks, it is possible that FL, though a
separate module, is nonetheless “just like” every other kind of cognition. This
version of the “modularity” issue asks not whether FL is a domain specific
dedicated system (it is!), but whether it employs primitive circuits/operations
proprietary to it (i.e. not shared with other cognitive domains). Here ‘domain
specific’ means uses basic operations not attested in the other domains of
non-linguistic cognition.
Of course, the MP bet is easy to articulate at a general
level. What’s hard is to show that it’s true (or even plausible). As I’ve argued before, to collect on this bet
requires, first, reducing FL’s internal modularity (which in turn requires
showing Binding, movement, control, agreement, etc. are really only apparently
different) and, second, showing that this unification rests on cognitively generic
basic operations.[4]
Believe me when I tell you that this program has been a hard sell.
Moreover, the mainstream Minimalist position is that though
this may be largely correct, it is not exactly right: there are some special purpose linguistic
devices and operations (e.g. Merge), which are responsible for Gs distinctive
recursive property. At any rate, I think the logic is clear so I will not
repeat the mantra yet again.
This brings me to the last point I want to make: Avery notes
that more often than not positive evidence relevant to fixing a grammatical
option is missing from the PLD. In other
words, Avery notes that the PLD is in fact even more impoverished than we tend
to believe. He rightly notes that this implies that indirect negative evidence
(INE) is more important than we tend to think.
Now if he is right (and I have no reason to think that he isn’t), then
FL must be chock-full of domain specific information. Why? Because INE
requires a sharp specification of options under consideration to be operative. Induction that uses INE effectively must be
richer than induction exploiting only positive data.[5]
INE demands a more articulated hypothesis space, not less. INE can compensate for poor direct evidence but
only if FL knows what absences it’s looking
for! You can hear the dogs that don’t bark but only if you are listening
for barking dogs. If Avery’s cited example is correct (see here),
then it seems that FL is attuned to micro variations, and this suggests a very
rich system of very linguistically specific micro parameters internal to FL. Thus, if Avery is right,
then FL will contain quite a lot of very domain specific information and given
that this information is logically necessary to exploit INE it looks like these
options must be innately specified
and that FL contains lots of innate domain specific information. Of course,
Avery may be wrong and those that don’t like this conclusion are free (indeed
urged) to reanalyze the relevant cases (i.e. to indulge in some linguistic
research and produce some helpful results).
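To see why INE presupposes a richly specified set of options, it helps to put the barking-dogs point in toy Bayesian terms. In the sketch below (my own illustration; the two grammars, the 2% production rate and the flat priors are all invented), a learner compares a grammar G1 that licenses some construction C with a grammar G2 that does not. The continued absence of C counts as evidence only because both options, and what each of them predicts, are specified in advance.

```python
# Toy illustration of indirect negative evidence: silence is informative only
# relative to a pre-specified set of alternatives. G1 licenses construction C
# (and is assumed to produce it 2% of the time); G2 never produces it.
# All numbers are invented for illustration.

P_C_GIVEN_G1 = 0.02     # chance that a G1 utterance instantiates C
P_C_GIVEN_G2 = 0.0      # G2 never produces C
PRIOR_G1 = PRIOR_G2 = 0.5

def posterior_g1(n_utterances_without_c):
    """P(G1 | n utterances observed, none of them containing C)."""
    like_g1 = (1 - P_C_GIVEN_G1) ** n_utterances_without_c
    like_g2 = (1 - P_C_GIVEN_G2) ** n_utterances_without_c   # always 1.0
    return (like_g1 * PRIOR_G1) / (like_g1 * PRIOR_G1 + like_g2 * PRIOR_G2)

for n in (0, 10, 100, 500):
    print(f"after {n:3d} C-less utterances, P(G1) = {posterior_g1(n):.3f}")
# P(G1) starts at 0.500 and falls toward 0 as C keeps failing to show up
```

Strip away the pre-given alternatives and their predictions and the very same silence tells the learner nothing; that is just the point about needing to listen for the barks that never come.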
This is a good place to stop. There is an intimate connection between modularity,
computationalism, and nativism. Computations can only do useful work where
information is bounded. Bounded information is what modules provide. More often
than not the information that a module exploits is native to it. MP is betting
that with respect to FL, there is less language specific basic circuitry than heretofore assumed. However, this does not
imply that FL is not a module (i.e., that it is just part of "general intelligence"). Indeed,
given the kinds of evidence that Curtiss reviews, it is empirically very likely that FL is a module. And this can
be true even if we manage to unify the internal modules of FL and demonstrate
that the requisite remaining computations largely exploit domain general
computational principles and operations. Avery’s important question remains:
how much acquisition is driven by direct evidence and how much by indirect negative
evidence? Right now, we don’t really know (at least not to the level of detail
that we want). That’s why these are still important research topics. However, the logic is clear, even if the answers
are not.
[1]
Incidentally, IBT is one of the phenomena that dualists like Descartes pointed
to in favor of a distinct mental substance. Dualism, in other words, is roughly
the observation that much of thought cannot be mechanized.
[2]
It’s important to understand where the problem lies. The problem is not giving
a story in specific cases in specific contexts. We do this all the time. The
problem is providing principles that select out the IBT antecedent to a
specification of the contextually relevant variables. The hard problem is
specifying what is relevant ex ante.
[3]
Successful unifications almost always win kudos. Think electricity and
magnetism, then these two with the weak force, terrestrial and celestial
mechanics, chemistry and mechanics. These all get their own chapters in the
greatest hits of science books. And in each case, it took lots of work to show
that the desired unification was possible. There is no reason to think that
cognition should be any easier.
[4]
I include generic computational principles here, so-called first factor
computational principles.
[5]
In fact, if I understand Gold correctly (which is a toss up), acquiring
modestly interesting Gs strictly using induction over positive data is
impossible.
Tuesday, September 9, 2014
Rationalism, Empiricism and Nativism -2
In an earlier post (here),
I reviewed Fodor’s and Chomsky’s argument concluding that anyone that believes
in induction must be a nativist. Why?
Because all extant inductive theories of belief fixation (BF) are selection theories and all selection
theories presuppose a given
hypothesis space that characterizes all the possible
fixable beliefs. Thus, anything that
“learns” (fixes beliefs) must have a representation of what is learned (a given
hypothesis space) which is used to evaluate the input/experience in fixing
whatever beliefs are fixed. Absent this,
it is impossible to define an inductive procedure.[1]
Thus, trivially (or almost tautologically (see note 1)), whatever one’s theory of induction,
be it Rationalist or Empiricist, everyone is a nativist. The question is not whether nativism but
what’s native. And here is where Rationalists and Empiricists actually differ.
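As a toy rendering of the selection point (my illustration, patterned on the "X is miv iff it is red and square" example discussed in note 1; the hypothesis space and the examples are invented), here is a "learner" that fixes a belief by filtering a given hypothesis space against its input. Everything it can come to believe is already written into that space; experience only selects among the options.

```python
# Belief fixation as selection: the learner is handed a hypothesis space and
# uses examples only to eliminate candidates from it.

HYPOTHESIS_SPACE = {        # the "given" space: candidate meanings for "miv"
    "red":            lambda o: o["color"] == "red",
    "square":         lambda o: o["shape"] == "square",
    "red and square": lambda o: o["color"] == "red" and o["shape"] == "square",
    "red or square":  lambda o: o["color"] == "red" or o["shape"] == "square",
}

def fix_belief(examples):
    """Keep every hypothesis consistent with all labelled examples."""
    live = dict(HYPOTHESIS_SPACE)
    for obj, is_miv in examples:
        live = {name: h for name, h in live.items() if h(obj) == is_miv}
    return set(live)

examples = [
    ({"color": "red",  "shape": "square"}, True),
    ({"color": "red",  "shape": "circle"}, False),
    ({"color": "blue", "shape": "square"}, False),
]
print(fix_belief(examples))   # {'red and square'}
```

Nothing in the sketch says whether the given space should be small and richly structured (the R bet) or vast and generic (the E bet). That is exactly the question the rest of this post turns to.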
Before going on, let me remind you that both Fodor and
Chomsky (and all the participants at Royaumont it seems to me) took this to be
a trivial, nay, almost a tautological consequence of what induction is. However, this does not mean that it is not
worth remembering and repeating. It is still the case that intelligent people
confuse Rationalism with Nativism and assume that Empiricists have no nativist
commitments. This makes it seem as though Rationalists contrast with Empiricists in
making fancy assumptions about minds and hence bear the burden of proof in any
argument about mental structures.
However, once it is recognized that all psychological theory is
necessarily nativist, the burden-shifting manoeuver loses much of its punch.
The question becomes not whether the
mind is pre-stocked with all sorts of stuff, but what kind of stuff it is
stuffed with and how this stuff is organized.
Amy Perfors (here)
says this exactly right (135)[2]:
…because all models implicitly
define a hypothesis space, it does not make sense to compare models according
to whether they build hypothesis spaces in. More interesting questions are:
What is the size of the latent hypothesis space defined by the model? How
strong or inflexible is the prior?...
So, given that everyone is a nativist, how do we decide between Rationalist (R) and Empiricist (E) approaches to the mind? First of all, note
that given that everyone is a trivial nativist the debate between Rs and Es
necessarily revolves around how
beliefs are fixed and what this implies for the mind’s native structure. Interestingly,
probing this question ends up focusing on what kind of experience is required
to fix a given belief.
Es have traditionally taken the position that beliefs are
fixed by positive exposures to extensions of the relevant concepts. So, for
example, one fixes the belief that ‘red’ means RED by exposure to red, and that
‘dog’ means DOG by exposure to dogs. Thus, there is no belief fixation without
exposure to tokens in the relevant extensions of a concept. It is in this sense
that Es see the environment as shaping
mental structure. Minds track environmental input and are structured by this
input. The main contribution that minds make to the structure of their contents
is by being receptive to the information that the environment makes available. On
an E view, the trick is to figure out how to extract information in the signal. As should be obvious,
this sort of view champions the idea that minds are very good statistical
machines able to find valuable informational needles in potentially very large
input haystacks. Rs have no problem with this assumption, but they argue that
it is insufficient to account for our attested cognitive capacities.
More particularly, Rs argue that there is more to the
fixation of belief than environmental input. Or, to make the same point another way: the beliefs that get fixed via exposure to input data far
outrun the information available from that input. Thus, though the environment can trigger the emergence of beliefs, it does not shape them, for we have
ideas/concepts that are not themselves tokened in the input. If this is correct, then Rs reason that
hypothesis spaces are highly structured and what you come to “know” is strongly
affected by this given structure. Note
that the disagreement between Rs and Es hinges on what it is possible to glean
from available input.
So how to approach this disagreement in a semi-rational
manner? This is where the Logical
Problem of Acquisition (LPA) comes in.
What is the LPA? It’s an attempt
to specify the nature of the input data that an Acquisition Device (AD) has access
to and to then compare this to the properties of the attained competence.
Chomsky discusses the general form of this approach in chapter 1 of Reflections on Language (here).
In the study of language, the famous diagram in (1)
concisely describes the relevant issues:
(1) PLD_L -> FL -> G_L
PLD_L is the name we give to the linguistic data from L that a child (actually) uses in building its grammar. FL is, well you know, and G_L is the resultant grammar that a native speaker
attains. One can easily generalize this
schema to other domains of inquiry by subbing other relevant domains for “L.” A
generalized version of the schema is (2) (‘X’ being a variable ranging over
cognitive domains of interest) and a version of it as applied to vision is (3).
So, if one’s interest is in visual object recognition (as for example in Marr’s
program), we can consider the schema in (3) as outlining the logic to be
explored (PVD = Primary visual data, FV = Faculty of Vision, GV = grammar (i.e.
rules) of vision).[3]
(2) PXD -> FX -> GX
(3) PVD -> FV -> GV
This schematic rendition of the LPA focuses the R vs E
debate on the information available in PXD. An Eish conception is committed to the view that PXD is quite rich and that it provides a lot of information
concerning GX. To the degree that
information about GX can be garnered from PXD to that degree we need not
populate FX with principles to bridge the gap. Rish conceptions rest on the
view that PXD is a rather poor source of information relevant to GX. As a result, Rs assume that FX is generally
quite rich.
Note that both Rs and Es assume that FX has a native
structure. This, recall, is common to both views. The question at issue is how
much belief fixation (or more exactly the fixation of a particular belief) owes to the nature of the data and how much to
the structure of the hypothesis space. As a first approximation one can say
that Rs believe that given hypothesis spaces are pretty highly structured so
that the data required to “search” that space can be quite sparse. Conversely,
the richer the set of available alternatives the more one needs to rely on the
data to fix a given belief. Thus for Rs all the explanatory action lies in
specifying the narrow range of available alternatives, while for Es most of the
explanatory action lies in specifying the (most often nowadays, statistical)
procedures that determine how one moves across a rather expansive set of
possibilities.
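The tradeoff can be put in back-of-the-envelope terms (the likelihoods and the confidence threshold below are invented and nothing hangs on the exact numbers): with a uniform prior, the amount of data a simple Bayesian learner needs before settling on the right hypothesis grows with the number of live alternatives it starts out entertaining.

```python
# Back-of-the-envelope version of the R vs E tradeoff: with a flat prior, the
# data needed to home in on the right hypothesis grows with the size of the
# hypothesis space. Likelihoods and threshold are invented for illustration.

P_TRUE  = 0.9    # probability the correct hypothesis assigns to each datum
P_WRONG = 0.5    # probability every incorrect hypothesis assigns to it
TARGET  = 0.95   # posterior confidence we want in the correct hypothesis

def data_needed(space_size):
    """Smallest n with P(correct | n data points) > TARGET, uniform prior."""
    n = 0
    while True:
        like_true  = P_TRUE ** n
        like_wrong = (space_size - 1) * P_WRONG ** n
        if like_true / (like_true + like_wrong) > TARGET:
            return n
        n += 1

for size in (2, 10, 1_000, 1_000_000):
    print(f"hypothesis space of {size:>9,}: {data_needed(size):3d} data points")
# with these made-up likelihoods: 6, 9, 17 and 29 data points respectively
```

And this is the friendliest possible case for the E side, since every wrong hypothesis here is easy to tell apart from the right one; the R arguments below turn on cases where the relevant evidence is not merely sparse but missing altogether.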
The schemas above suggest ways of investigating this
disagreement. Let’s consider some.
E invites the view that, ceteris
paribus, variations in PXD should lead to variations in GX as the latter
closely tracks properties of the former (it is in this sense that Es think of
PXD as shaping a person’s mental
states). Thus, if some kinds of inputs
are systematically absent in an individual's PXD, we should expect that that individual's cognitive development and attained competence should differ from that of an individual with more "normal" inputs. Hume (our first systematic
associationist psychologist) gives a useful version of this view:[4]
…wherever by any accident the
faculties which give rise to any impressions are obstructed in their
operations, as when one is born blind or deaf, not only the impressions are
lost, but also their corresponding ideas; so that there never appear in the
mind the least traces of either of them.
There’s been lots of research over the last 50 years exploring
Hume’s contention in the domain of language acquisition. Lila Gleitman and Barbara Landau (G&L)
provide a good brief overview of some of the child language research
investigating these matters.[5]
It notes that the evidence does not support this prediction (at least in the
domain of language). Rather it seems that “humans reconstruct linguistic form
…[despite] the blatantly inadequate information offered in their usable
environment (91).” In other words, it seems that the course of language
acquisition can proceed smoothly (in fact no differently than what happens in
the “normal” case) even when the input to the system is perceptually very
limited and degraded. G&L interpret this Rishly to mean that language
acquisition is relatively independent of the nature and quality of the
input, which makes sense if it is guided by a rich system of innate knowledge.
G&L illustrate the logic using two kinds of examples:
blind people can and do learn the meanings of words like ‘see’ and ‘look’
without being able to see or look, and people can acquire full native
competence (and can make very subtle “perceptual” distinctions in their
vocabulary) despite being blind and deaf. Indeed, it seems that even extreme
degradation of the sensory channels leaves the process of language acquisition
unaffected.
It is worth noting just how degraded the input can be when
compared to the "normal" case. Here are G&L reporting on Carol Chomsky's original research on learning via the Tadoma method (92):[6]
To perceive speech at all, the
deaf-blind must place their fingers strategically at the mouth and throat of
the speaker, picking up the dynamic movements of the mouth and jaw, the timing
and intensity of the vocal-cord vibration, and the release of air…From this
information, differing radically in kind and quality from the continuously
varying speech wave, the blind-deaf recover the same ornate system of
structured facts as do hearing learners…
In short, there is plenty of evidence that language
acquisition can (and does) take place in the face of extremely degraded input,
at least when compared with the PLD available in the standard case.[7]
The Poverty of Stimulus (PoS) argument also reflects the
logic of the schemas in (1-3). As the schema suggests, a PoS argument has two major struts: a description of the available PLD and a description of the grammatical operations of interest (i.e. the relevant rules). The next step compares what information can be gleaned about the operation from the data; the slack is then used to probe the structure of FL. The standard PoS question is then: what must we
assume about FL so that given the
witnessed PLD, the LAD can derive the
relevant rules? As the schema indicates,
the inference is from instances of
rules (used outputs of a grammatical system) to the rules that generate the
observed sentences. Put another way, whatever else is going on, the LPA
requires that FL at least contain
some ways of generalizing beyond the PLD. This is not controversial. What is
controversial is how fancy these methods for generalizing beyond the data have
to be. For Es, the generalizing procedures are quite anodyne. For Rs they are often quite rich.
Well-designed PoS arguments focus on grammatical phenomena
for which there is no likely relevant
information available in the PLD. If Es are right (see Hume above), all
relevant grammatical operations and principles should find (robust?) expression
in the PLD. If Rs are right, we should find lots of cases where speakers develop
grammatical competence even in the absence of relevant PLD (e.g. all agree that “John expects Mary to hug
himself” is out and that “John expects himself to hug Mary is good” where
‘John’ is the antecedent of ‘himself’).
It goes without saying that given this logic, debate between Es and Rs will revolve around how to specify the PLD in relevant cases (see here for a sophisticated discussion). So for example, all accept the idea that PLD consists of good examples of the relevant operation (e.g. all take "John hugged himself" to be a typical data point bearing on Principle A (A)). What of
negative data, data that some example is unacceptable with the indicated
interpretation (e.g. that “John expects Mary to hug himself” is out)? There is every reason to think that overt
correction of LAD “mistakes” barely occurs. So, in this sense the PLD does not contain negative data. However,
perhaps for the LAD absence of evidence is evidence of absence. In other words,
perhaps for the LAD failing to
witness an example like “John expects Mary to hug himself” leads to the
conclusion that the dependency between ‘John’ and ‘himself’ in these
configurations is illicit. This is entirely possible. So too with other *-cases.[8]
Note that this reasoning requires a fancier FL than one
that simply assumes that all decisions are made on the basis of positive
data. So the logic of LPA is respected
here: we compensate for the absence of certain information in the PLD (i.e.
direct negative evidence) by allowing FL to evaluate expectations of what
should be seen in the PLD were a given construction good.[9]
The question an R would ask an E is whether the capacity to compute such
expectations doesn’t itself require a pretty hefty native capacity. After all,
many things are absent from the data, but only some of these absences tell us
anything (e.g. I would bet that for most cases in the PLD the anaphor is within
5 words of the antecedent, nonetheless “John confidently for a man of his age
and temperament believes himself to be ready to run the marathon” seems fine).
One assumption I commonly make in considering PoS arguments
is that PLD effectively consists of simple acceptable sentences (e.g. “John
likes himself”). This is the so-called
Degree 0 hypothesis (D0H).[10] If the PLD is so restricted, then FL must be very rich indeed, for many robust
linguistic phenomena are simply unattested
(and recall, induction is impossible in the absence of any data to drive it) in
simple clauses; e.g. island effects, ECP effects, many binding effects,
minimality effects a.o. The D0H may be too strong, but there are two (maybe one
as they are related) reasons for thinking that it is on the right track.
The first is Penthouse Principle (PP) Effects. Ross noted long ago that there are many
operations restricted to main clauses but virtually none that apply exclusively
to embedded clauses. Subject Aux Inversion and Tag Question formation are two
examples from English. If we assume that
something like D0H is right(ish) we expect all idiosyncratic processes to be
restricted to main clauses where substantial evidence for them will be
forthcoming. Embedded clauses, on the
other hand, should be very regular. At the very least we expect no operations to apply exclusively to embedded domains (the converse of the PP), as given D0H there can be no evidence to fix them.
The second reason relates to this. It’s a diachronic
argument David Lightfoot gave based on the history of English (here).
It is based on a very nice observation:
main clause properties can affect embedded clause properties but not vice
versa. Lightfoot illustrates this by considering the shift from OV to VO in
English. He notes that in the period in
which the change occurred, embedded clauses always
displayed OV order. Despite this, English changed from OV to VO. Lightfoot reasons as follows: were embedded clause information robustly available, there would have been very good evidence that, despite appearances to the contrary in unembedded clauses, English was OV not VO (i.e. the attested change to VO (which ended up migrating to embedded clauses) would never have occurred). Thus, the fact that English changed in this way (and that influences in the other direction are unattested) follows nicely if something like D0H holds (viz. the LAD does not use embedded clause information in the acquisition of its grammar). Lisa Pearl subsequently elaborated a sophisticated quantitative
version of this argument here
and here.
The upshot: D0H holds. Of course, if it does, then strong versions of PoS arguments for many linguistic phenomena readily spring to mind. No data, no induction. No induction, highly structured
natively given hypothesis spaces guiding the AD.
OK, this post has gotten out of control and is far too long.
Let me end by reiterating the take-home message. Rs and Es differ not on whether there is nativism but on what is native. And exploring the latter effectively revolves around considerations of how much information the data contains (and the child can use) in fixing its beliefs. This is where the action is. Research like that which G&L review is interesting in that it shows that achieved competence seems quite insensitive to large variations in the relevant usable data. Classical PoS arguments are interesting in that they provide cases where it is arguable that there is no data at all in the input relevant to fixing a given belief. If this is so, then the mechanisms of belief fixation must lean very heavily on the highly structured (and hence restricted) nature of the hypothesis space that ADs natively bring to the belief fixation process. In R/E debates everyone believes that input matters and everyone believes that minds have native structure. The argument is about how much each factor contributes to the process. And this is something that can only be adjudicated empirically. As things stand now, IMO, the fertility of the Rish position in the domain of language (and most of cognition, actually) has been repeatedly demonstrated. Score one (indeed many) for Descartes and Kant.
[1]
In effect, induction serves to locate a member or members of a given set of alternatives. No pre-specified alternatives, no induction. Thus Fodor's point: for learning (i.e. belief fixation) to be possible, there must be a given set of concepts that mediate the process.
Fodor emphasizes that this view, though it may seem trivial, is not purely tautological. There does exist a tautological claim that some have confused with Fodor's. This misreading interprets Fodor as saying that any acquired concept must be acquirable (i.e. a principle of modal logic along the lines of: if I do have the concept, then I could have
had the concept). Alex Clark, for example, so reads Fodor (here): “There is a tautological claims which is that
I have an innate intellectual endowment that allows me to acquire the concept
SMARTPHONE in some way, on the basis of reading, using them, talking to people
etc. Obviously any concept I have, I must have the innate ability to have it…”
Fodor
notes this possible interpretation of his views at Royaumont (p. 151-2), but
argues that this is not what he is
claiming. He says the following: “The
banal thesis is just that you have the innate potential of learning any concept
you can in fact learn; which reduces, in turn, to the non-insight that whatever
is learnable is learnable. …What I intended to argue is something very much
stronger; the intended argument depends on what learning is like, that is the
view that everybody has always accepted, that it is based on hypothesis
formation and confirmation. According to that view, it must be the case that
the concepts that figure in the hypothesis you come to accept are not only potentially accessible to you, but are actually exploited to mediate the learning…The
point about confirming a hypothesis like "X is miv iff it is red and
square" is that it is required that not only red and square be potentially
available to the organism, but that these notions be effectively used to
mediate between the organism's experiences and its consequent beliefs about the
extension of miv…”
In other words, if inductive logics require given
hypothesis spaces to get off the ground and if we attribute an inductive logic
to a learner then we must also be attributing to them the given hypothesis
space AND we must be assuming that it is in virtue of exploiting the properties of that space that beliefs get fixed. So far as I can tell, this is what every inductivist is in fact
committed to.
[2]
Despite the terminological misstep of identifying Rationalism with Nativism on p. 127.
[3]
In Marr’s program, the grammar includes the rules and derivations that get us
from the grey scale sketch to the 2.5D sketch.
[4]
This is quoted in Gleitman and Landau (see note 4). The quote is from Hume's Treatise, p. 49.
[6]
Carol Chomsky's original papers on this topic are included as appendices in the book. They are well worth reading. On the basis of the reported speech, the Tadoma learners seem indistinguishable from "normal" native speakers.
[7]
G&L also note the excess of data problem towards the end of their paper.
This is something that Gleitman has explored in more recent work (discussed here and in
links cited there). Lila once noted that a picture is worth a thousand words,
and that is precisely the problem. In the early period of word learning the
child is flooded with logical possibilities when word learning is studied in
naturalistic settings. Here induction
becomes a serious challenge not because there is no information but because
there is too much and narrowing it down to the relevant stuff is very hard.
Lila and colleagues have argued that in such cases what the child does bears
relatively little resemblance to the careful statistical sampling that one
might expect if acquisition were via “learning.” This suggests that there must
be a certain sweet spot where data is available but not too available for
learning (induction) to be a viable form of acquisition. Where this is not possible, other acquisition procedures appear to be at play, e.g. guess and guess again! Note that this amounts to saying that resource constraints are key factors in making "learning" an option. In many cases, learning (i.e. reviewing the alternatives systematically) is simply too costly, and other, seemingly less rational, procedures kick in. Interestingly, from an R perspective, it
is precisely when the field of options is narrowed (when syntax kicks in) that
something akin to classical learning appears to become viable.
[8]
For reasons I have never quite understood, many (see here)
have assumed that GGers are hostile to the idea that LADs can use “negative”
data productively. This is simply false.
See Howard Lasnik (here)
for a good review. As Lasnik notes, the
possibility that negative data could be relevant goes back at least to
Chomsky's LGB (if not earlier). What is relevant is not whether negative data might be useful but what kinds of minds can productively use it. The absence of barking is useful when one is listening for dogs. Thus, the more constrained the space of options under consideration, the easier it is to use absence of evidence as
evidence of absence. If you have no idea what you are looking for, not finding
it is of little informational value.
[9]
For example, Chater and Vitanyi (C&V) (here)
order the available hypotheses according to “simplicity” measured in MDL terms,
not unlike what Chomsky proposed in Aspects.
Not surprisingly, given such an ordering, indirect negative evidence can be usefully exploited (something that would not surprise a GGer). What C&V do not consider is the possibility of cases where there is virtually no relevant positive or negative data in the PLD. This is what is
taken to be the strongest kind of PoS argument and is the central case
discussed in at least one of the references C&V cite (see here).
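To give a feel for the kind of simplicity ordering at issue (a toy illustration only, emphatically not C&V's actual model), an MDL-style learner scores each candidate grammar by the bits needed to write the grammar down plus the bits needed to encode the data given that grammar, and prefers the cheapest total. The grammars and bit costs below are invented.

```python
# Toy MDL-style hypothesis ordering: total cost = bits to state the
# grammar + bits to encode the observed data under that grammar.
# Everything here is made up for illustration.

from math import log2

def mdl_score(grammar_bits, data, likelihood):
    """grammar_bits: cost of writing the grammar down.
    likelihood: maps a datum to its probability under the grammar."""
    data_bits = sum(-log2(likelihood(d)) for d in data)
    return grammar_bits + data_bits

# A pretend corpus of 'a'/'b' tokens, skewed 9 to 1.
data = ['a'] * 90 + ['b'] * 10

# H1: a tiny grammar that treats both tokens as equally likely.
h1 = mdl_score(10, data, lambda d: 0.5)

# H2: a slightly costlier grammar that encodes the skew.
h2 = mdl_score(20, data, lambda d: 0.9 if d == 'a' else 0.1)

print(round(h1, 1), round(h2, 1))  # H2 wins: its extra grammar cost
                                   # buys a much cheaper data encoding.
```

On an ordering like this, an over-general grammar pays for its generality in the data term, which is (roughly) why systematically missing-but-expected data can tip the balance toward a more restrictive grammar, i.e. why indirect negative evidence becomes usable.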
[10]
Most who think that this is more or less on the right track actually take
“simple” to mean un-embedded binding domains (e.g. Lightfoot). This is sometimes called Degree 0+. Thus, ‘Bill’ is in the PLD in (i) but not in
(ii):
(i) John believes Bill to be intelligent
(ii) John believes (that) Bill is intelligent