The new AI is called "deep learning." There is a lot of money going into it (Google, I think, just bought Hinton, the big cheese in this area) and a lot of useful applications being explored. It is, of course, being trotted out as the new panacea for cognitive theory. Sadly (or maybe happily), the stuff is beyond my pay grade, but the little I understand of it makes it look a lot like the structuralist discovery procedures of yore. Here's a very short article on it in the Financial Times. I'm not sure whether you will be able to get to it, as it may be behind a paywall. If you can get to it, it's worth a look. What I liked is the blend of respect for what the technology can do combined with a certain skepticism regarding the more grandiose claims. Here's a taste from Oren Etzioni:
“There are plenty of exciting individual applications. But if you scratch a little deeper, the technology goes off a cliff,” he said.
Thursday, July 31, 2014
Wednesday, July 30, 2014
Built-in maps
Bill Idsardi sent me this piece on bird navigation. It seems that Swainson's thrushes have genetically built-in maps that get them from Canada to Mexico and parts of South and Central America. Moreover, it appears that these maps come in two flavors, some taking a coastal route and others taking a more medial one. As the author puts it, these routes can overlap, and when they do (well, let me quote here) "they have a chance to (ahem) mingle." Mingling has consequences, and these consequences can apparently end up with mixed maps. As the researchers put it: "it is believed birds have genetic instructions on which direction they need to head and how long they need to fly," though, as conceded, "it's still a mystery how, exactly, a bird's DNA tells it where to go."
This is interesting stuff (the area of study is called "vector navigation"). I bring it to your attention here for the obvious reason: whatever is genetically coded is very fancy: maps and routes. And the fact that the scientists have no idea how it is so coded does not stop them from concluding that it is so coded.
This is quite different from the attitudes in the study of humans, as you all know. Chomsky has pointed repeatedly to the methodological dualism that pops out whenever human mental capacities are studied. Were thrushes humans, the usual critics would be falling all over themselves arguing that postulating such inborn mechanisms is methodologically ill advised (and not at all explanatory), that DNA could not possibly code for maps and routes, and that there must be some very subtle learning mechanism lying behind the attested capacities (almost certainly using some Bayesian learning procedure while still in the egg!). In other words, standard scientific practice would have been suspended were thrushes humans. You draw the relevant moral.
Sunday, July 27, 2014
Comments on lecture 4-II
This is the second installment of comments on lecture 4. The first part is here.
The Minimal Labeling Algorithm (MLA)
Chomsky starts the discussion by reviewing the basics of his approach to labels. Here are the key assumptions (a toy sketch of the MLA follows the list):
1. Labels are necessary for interpreting structured linguistic objects (SLO) at the interfaces. I.e., labels are not required for the operations of the computational system (CS).
2. Labels are assigned by the MLA as follows:
a. All constructed SLOs must be labeled (see 1).
b. Labels are assigned within phases.
c. The MLA minimally searches a set (i.e., an SLO) to identify the most prominent "element." This element is the label of the set.[1]
i. In cases like {X, YP} (i.e., in which X is atomic and Y complex), the MLA chooses X: the shallowest element capable of serving as a label.
ii. In {XP, YP} (i.e., in cases where both members of the set are complex), there is no unique choice to serve as label. In such cases, if XP and YP "agree" (e.g., phi-feature agreement, or WH-feature agreement), then the MLA chooses the common agreement features as the label.
3. Ancillary assumptions:
a. Only XPs that are heads of their chains are "visible" within a given set. Thus, in {XP, YP}, if XP is not the head of its XP chain, then it is invisible within the set {XP, YP} (i.e., movement removes X within XP as a potential label).
b. Roots are inherently incapable of labeling.
c. T is parametrically a label. The capacity of T to serve as a label is related to "richness of agreement." It is "rich" in Italian, so Italian T can serve as a label. It is "poor" in English, so English T cannot serve as a label.
d. If a "weak" head (root or T) agrees with some X in XP, then the agreement features can serve as a label.
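To make the search procedure concrete, here is a toy rendering of the MLA in Python. This is my own sketch, not anything Chomsky provides: SLOs are modeled as strings (atoms) or frozensets, the inventory of non-labeling atoms (roots and weak English T) is stipulated by hand, and chain/copy invisibility (3a) is ignored. It implements the "shallowest element capable of serving as a label" reading of 2c-i.

```python
from typing import Optional, Union

SO = Union[str, frozenset]  # a syntactic object: lexical atom or set of SOs

# Stipulated inventory of atoms incapable of labeling (roots and weak
# English T, per 3b-c above); the particular names are illustrative only.
NON_LABELERS = {"T", "R(left)", "R(kiss)"}

def is_atom(so: SO) -> bool:
    return isinstance(so, str)

def mla(so: frozenset) -> Optional[str]:
    """Breadth-first minimal search for the shallowest atom able to label."""
    frontier = list(so)  # begin the search with the members of the set
    while frontier:
        labelers = [x for x in frontier if is_atom(x) and x not in NON_LABELERS]
        if len(labelers) == 1:
            return labelers[0]   # a unique shallowest labeler: done
        if len(labelers) > 1:
            return None          # no unique choice: labeling fails
        # no capable atom at this depth: descend one level and search again
        frontier = [m for x in frontier if not is_atom(x) for m in x]
    return None

# On this reading {T, vP} comes out labeled 'v', since the search simply
# skips T (a non-labeler) and keeps going -- exactly the puzzle raised in
# the comments below.
vP = frozenset({"v", frozenset({"R(left)", frozenset({"n", "R(John)"})})})
assert mla(frozenset({"T", vP})) == "v"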
These assumptions suffice to explain several interesting features of FL: the Fixed Subject Condition (FSC) (the subject/object asymmetries in "ECP effects"), EPP effects, and the presence of displacement. Let's see how.
Consider first the EPP. {T, YP} requires a label. In Italian this is not a problem, for rich agreement endows T with labeling prowess.[2] English finesses this problem by raising a DP to "Spec" T. In a finite clause, this induces agreement between the DP and the TP (well, T' in the "old" system, but whatever), and the shared phi features can serve as the label. If, however, DP fails to raise to Spec T, or if DP in Spec T I-merges into some higher position, then it will not be available for agreement, and the set that contains T will not receive a label and so will not be interpretable at CI or SM.
This is the account for the unacceptability of the sentences in (1) and (2) (traces used for convenience):

(1) * left John
(2) * Who1 did John say that t1 saw Mary
Comments: Note that for this account to go through, we must assume that in {T, YP} the "head" of Y is not a potential label. The fact that T cannot serve as a label does not yet imply that minimal search cannot find one. Thus, say the complement of T were vP (here a weak v). Then the structure is {T, vP}. If T is not a potential label, then the shallowest possible label is v. Thus, we should be able to label the whole set 'v' and not move anything up. Why then is movement required?
One possible reason is that John cannot stay in its base position. One reason it might have to move is that John cannot get case there, forcing John to move. The problem, however, is that on a Probe-Goal system with non-transitive v as a weak phase, T can probe John and assign it case (perhaps as a by-product of phi agreement, though I believe that this is empirically dubious). Thus, given Chomsky's standard assumptions, John can discharge whatever checking obligations it has without moving a jot.
So maybe it needs to move for some other reason. One consistent with the assumptions above is that it needs to move so that {R(left), John} can be labeled.[3] Recall, however, that Chomsky assumes that roots are universally incapable of labeling (3b above). (Question: is 3b a stipulation of UG or does it follow from more general minimalist assumptions? If the former, then it exacerbates DP and so is an unwelcome stipulation (which is not to say that it is incorrect, but given GM something we should be suspicious of).) The structure of the set {R(left), John} is actually {R(left), {n, R(John)}}. In the latter there is a highest potential label, namely 'n.' So, if the MLA is charged with finding the most prominent potential label, then it would appear that even without movement of {n, R(John)}, the MLA could apply unambiguously. Once again, it is not clear why I-merge is required.
Indeed, things are more obscure yet. In this lecture Chomsky suggests that roots raise and combine with higher functional heads. This implies that in {v, {R(left), {n, R(John)}}}, R(left) vacates the lowest set and unites with 'v.' But this movement will make 'R(left)' invisible in the lowest set, again allowing 'n' to label it. So, once again, it is not clear why John needs to raise to Spec T and why 'v' cannot serve to label {T, vP}.
Here's another possibility: maybe Chomsky is assuming a theoretical analogue of defective intervention. Here's what I mean. The MLA looks not for the highest potential labeler, but for the highest lexical atom, whether it can serve as a label or not. So in {T, vP}, T is the highest atom; it's just that it cannot label. So, unless something moves to its spec to agree with it, we will not be able to label {T, vP} and we will face interpretive problems at the interfaces. On this interpretation, then, the MLA does not look for the highest possible labeling atomic element, but simply for the most prominent element regardless of its labeling capacities. This will have the effect of forcing I-merge of John to Spec-T.
So, let's so interpret the MLA. Chomsky suggests that the same logic will force raising of an object to Spec-R(V) in a standard transitive clause.[4] Thus in something like (3a), the structure of the complement of v* is (3b), and movement of the object to Spec-R(kiss), as in (3c), could allow for the set to be labeled.

(3) a. Mary kissed John
b. {v*, {R(kiss), John}}
c. {John, {R(kiss), John}}
However, once again, this presupposes that raising the root to v*, which Chomsky assumes to be universally required, will not suffice to disambiguate the structure for labeling.[5]
So, the EPP follows given this way of reading the proposal: the MLA searches not for the first potential label, but for the closest lexical atom. If it finds one, it is the label, if it can be. If it cannot be, then tough luck: we need to label some other way. One way would be for its Spec to be occupied by an agreeing element; then the agreement can serve as the label. So even atoms that cannot label (indeed, that inherently cannot label) can serve to interfere with other elements serving as labels. This forces EPP movement in English so that agreement can resolve the ambiguity that stifles the MLA.
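To see the difference concretely, here is the same toy machinery under this stricter reading, continuing the sketch above (and reusing `is_atom`, `NON_LABELERS` and `vP` from it). The `agree` argument, supplied by hand, is my own device for the shared-feature case; none of this is Chomsky's formulation.

```python
def mla_strict(so: frozenset, agree: Optional[str] = None) -> Optional[str]:
    """The 'defective intervention' reading: search halts at the closest
    atom whether or not it can label; if it cannot, only shared agreement
    features (the `agree` argument) can label the set."""
    frontier = list(so)
    while frontier:
        atoms = [x for x in frontier if is_atom(x)]
        if len(atoms) == 1:
            x = atoms[0]
            # a weak atom like T blocks any deeper search
            return x if x not in NON_LABELERS else agree
        if len(atoms) > 1:
            return agree         # no unique atom: only agreement can label
        frontier = [m for x in frontier if not is_atom(x) for m in x]
    return agree

# {T, vP}: T is the closest atom but cannot label, so labeling fails
# outright; nothing below T is even considered. This is what forces I-merge.
assert mla_strict(frozenset({"T", vP})) is None

# {XP, YP} after John raises to Spec-T: no unique closest atom, so the
# shared phi features (stipulated here) label the set, per 2c-ii.
# (Copy invisibility, 3a, is again ignored in this toy.)
DP = frozenset({"n", "R(John)"})
assert mla_strict(frozenset({DP, frozenset({"T", vP})}), agree="phi") == "phi"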
Question: how are non-finite TPs labeled? If they are strong, then there is no problem. But clearly they generally don't display any agreement (at least, any overt morphology). Thus, for infinitives, the link to "rich" morphology is questionable. If non-finite Ts are weak, however, then how can we get successive cyclic movement? Recall, the DP must remain in Spec T to allow for labeling. Thus, successive cyclic movement should make labeling impossible and we should get EPP problems. In short, if a finite T requires a present subject in order to label the "T'," why not a non-finite T? A puzzle.
Question: I am still not clear how this analysis applies to Existential Constructions (EC). In an earlier lecture, Chomsky insisted (in reply to David P) that the EPP is mainly "about" the existence of null expletives (note: I didn't understand this, as I said in my comments on lecture 3). Ok: so English needs an expletive to label "T'," but Italian doesn't. It's not that Italian has a null expletive, but that it has nothing in Spec-T, as nothing is required. So, what happens in English? How exactly does there help label the "TP"?

Recall, the idea is that in {XP, YP} configurations there isn't an unambiguous most prominent atom, and so the common agreement features serve as the label. Does this mean that in ECs there and the T share agreement features?[6] I am happy enough with this conclusion, but I thought that Chomsky took the agreement in ECs to be between the associate and T. Thus the agreement would not be between T and there but between T and the associate. How then does there figure in all of this? What's it doing? Let me put this another way: the idea that Chomsky is pursuing is that agreement provides a way for the MLA to find a label when the structure is ambiguous. Agreement resolves the problem by identifying a common feature (or features) of XP and YP and using it to label the whole. But in ECs it is standardly assumed that the agreement is not with the XP in Spec T but with an associate in the complement domain of T. So, either it is not agreement that resolves the labeling problem in ECs, or there has agreement features, or ECs are, despite appearances, not {XP, YP} SLOs. At any rate, I am not sure what Chomsky would say about these cases, and they seem central to his proposal. Inquiring minds want to know.
[1] Chomsky seems to say that phase heads (PH) determine the label. I am not at all sure why we need assume that it's the PH that, via the MLA, determines the label. It seems to me that the MLA can function without a PH being involved at all to, as it were, "choose" the label. What is needed is that the MLA be a phase-level operation that applies, I assume, at Transfer. However, I may be wrong about how Chomsky thinks of the role of PHs in labeling, though I think this is what he actually says.
From what I can tell, the MLA requires that the choice of label respect minimal search and that it be deterministic. I have interpreted this to mean that the MLA applies to a given set of elements to unambiguously choose the label. It is important that the MLA does not tolerate labeling ambiguity (i.e., in a given structure exactly one element can serve as the label and it will be chosen as the label), for this is what forces movement and agreement, as we shall see. However, I do not see that the MLA requires that PHs actually do the labeling (i.e., choose the label). What is needed is that in every set there be a uniquely shallowest potential label and that the MLA choose it.
I am not clear why PHs are required (if they are) to get the MLA to operate correctly. This may be a residue of an idea that Chomsky later abandons, namely that all rules are products of properties of the phase head. Chomsky, as I noted, dumps this assumption at the end of lecture 4, and this might suffice to liberate the MLA from PHs. Note that it would also allow matrix clauses to be labeled. Minimal search is generally taken to be restricted to sisters of probes. PHs then do not "see" their specifiers, making labeling of a matrix clause impossible. Freeing the MLA from PHs would eliminate this problem.
[2] Reminder: this is not an explanation. "Rich agreement" is just the name we give to the fact that T can do this in some languages and not in others. How it manages to do this is unclear. I mention this to avoid confusing a diacritical statement for an explanation. It is well known that rich morphological agreement is neither necessary nor sufficient to license EPP effects, though there is a long tradition that identifies morphological richness as the relevant parameter. There is an equally long tradition that realizes that this carries very little explanatory force.
[3] 'R(X)' means the root of X.
[4] Chomsky uses this to derive the Postal effects, discussed by Koizumi and by Lasnik and Saito, which indicate that accusative case marking endows an expression with scope higher than its apparent base position. Chomsky finds this movement really counterintuitive and so thinks that the system he develops is really terrific precisely because it gets this as a consequence. It is worth pointing out that if case were assigned in Spec-X configurations, as in the earliest MP proposals, the movement would be required as well. I am sure that Chomsky would not like this, though it is pretty clear that Spec-X structures are not nearly as unpleasant to his sensibilities as they once were, given that these are the configurations where agreement licenses labeling. That said, on his view it is not the configuration per se that licenses agreement.
[5] Chomsky does suggest that in head raising the raised head "labels" the derived {X, Y} structure. If so, then maybe one needs to label the VP before the V raises. I am frankly unclear about all of this.
[6] A paper I wrote with Jacek Witkos on ECs suggested that in ECs there inherits features from the nominal it is related to (it starts out as a kind of dummy determiner and moves), and that the agreement one sees is thus not directly with the associate. This would suffice here, though I strongly doubt that Chomsky would be delighted with this fix.
Wednesday, July 23, 2014
Academia as business
One of the forces behind the "crapification" of academic life (see here) is the fact that it is increasingly being managed by "bidnessmen." Part of this is because governments have decided to drop support for universities (the cold war is over and we beat the Russians), and the shortfall needs to be made up from somewhere, so universities have effectively turned themselves into fund-raising machines; and heading up a supercharged money-grubbing machine requires the unctuous talents that only big bucks can attract. So, as Chomsky might put it, universities are no longer educational institutions with endowments but endowments with educational institutions. As all good minimalists know, being after a with means being of lesser significance!
At any rate, (here) is a piece that discusses some of the current dynamics. The ever increasing bureaucratization of the university is not a bug that can be eliminated, but a feature of how things are done now. Given the current state of play, universities do need alternative sources of funding and rich people and corporations (other "kinds" of people, at least in the USA) are the obvious source. Catering to these sources of income requires work and the people that are good at it, not surprisingly, are only tangentially interested in what universities do. The figures on the relative growth rates of faculty to administrators and the relevant salary comparisons are significant, IMO. It's not actually clear that much can be done, but it's worth knowing that this is no accident. It's just how things work now.
Monday, July 21, 2014
What's in a Category? [Part 2]
Last week I wondered about the notion of syntactic category, aka part of speech (POS). My worry is that we have no clear idea what kind of work POS are supposed to do for syntax. We have some criteria for assigning POS to lexical items (LI) --- morphology, distribution, semantics --- but there are no clear-cut rules for how these are weighed against each other. Even worse, we have no idea why these are relevant criteria while plausible candidates such as phonological weight and arity seem to be irrelevant.[1] So what we have is an integral part of pretty much every syntactic formalism for which we cannot say
- what exactly it encompasses,
- why it is necessary,
- why it shows certain properties but not others.
Comments on lecture 4-I
I have just finished listening to Chomsky's fourth lecture, and so this will be the last series of posts on them (here). If you have not seen the lectures, let me again suggest that you take the time to watch. They are very good and well worth the (not inconsiderable) time commitment.
In 4, Chomsky does three things. First, he again tries to sell the style of investigation that the lectures as a whole illustrate. Second, he reviews the motivations and basic results of his way of approaching Darwin's Problem. Third, he proposes ways of tidying up some of the loose ends that the outline in 3 generates (at least, they were loose ends that I did not understand). Let me review each of these points in turn.
1. The central issues and the Strong Minimalist Thesis (SMT)
Chomsky, as is his wont, returns to the key issues as he sees them. There are two of particular importance.

First, he believes that we should be looking for simple theories. He names this dictum Galileo's Maxim (GM). GM asserts (i) that nature is simple and (ii) that it is the task of the scientist to prove that it is. Chomsky notes that this is not merely good general methodological advice (which it is), but that in the particular context of the study of FL there are substantive, domain-specific reasons for adopting it. Namely: Darwin's Problem (DP). Chomsky claims that DP rests on three observations: (i) that our linguistic competence is not learnable from simple data, (ii) that there is no analogue of our linguistic capacity anywhere else in the natural world, and (iii) that the capacity for language emerged recently (in the last 100k years or so), emerged suddenly, and has remained stable in its properties since its emergence.[1] These three points together imply that we have a non-trivial FL, that it is species specific, and that it arose as a result of a very "simple" addition to our ancestors' cognitive repertoire. So, in addition to the general (i.e., external to the specific practice of linguistics) methodological virtues of looking for simple and elegant theories, DP provides a more substantive (i.e., internal to linguistics) incentive, as simple theories are just the sorts of things that could emerge rapidly in a lineage and remain stable after emerging.
I very much like this way of framing the central aims of the Minimalist Program (MP). It reconciles two apparently contradictory themes that have motivated MP. The first theme is that looking for simple theories is just good methodology and so MP is nothing new. On this reading, MP is just the rational extension of GG theorizing, just the application of general scientific principles/standards of rational inquiry to linguistic investigations. On this view, MP concerns are nothing new and the standards MP applies to theory evaluation are just the same as they always were. The second view, one that also seems to be a common theme, is that MP does add a new dimension to inquiry. DP, though always a concern, is now ripe for investigation. And thinking about DP motivates developing simple theories for substantive reasons internal to linguistic investigations, motivations in addition to the standard ones prompted by concerns of scientific hygiene. On this view, raising DP to prominence changes the relevant standards for theoretical evaluation. Adding DP to Plato's Problem, then, changes the nature of the problem to be addressed in interesting ways.
This combined view, I think, gets MP right. It is both novel and old hat. What Chomsky notes is that at some times, depending on how developed the theory is, new questions can emerge or become accented, and at those times the virtues of simplicity have a bite that goes beyond general methodological concerns. Another way of saying this, perhaps, is that there are times (now being one in linguistics) where the value of theoretical simplicity is elevated and the task of finding simple, non-trivial, coherent theories is the central research project. The SMT is intended to respond to this way of viewing the current project (I comment on this below).
Chomsky makes a second very important point. He notes that our explanatory target should be the kinds of effects that GG has discovered over the last 60 years. Thus, we should try to develop accounts as to why FL generates an unbounded number of structured linguistic objects (SLO), why it incorporates displacement operations, why it obeys locality restrictions (strict cyclicity, PIC), why there is overt morphology, why there are subject/object asymmetries (Fixed Subject Effects/ECP), why there are EPP effects, etc. So, Chomsky identifies both a method of inquiry (viz. Galileo's Maxim) and a target of inquiry (viz. the discovered laws and effects of GG). Theory should aim to explain the second while taking DP very, very seriously.
The SMT, as Chomsky sees it, is an example of how to do this (actually, I don't think he believes it is merely an example, but the only conceptually coherent way to proceed). Here's the guts of the SMT: look for the conceptually simplest computational procedures that generate SLOs and that are interpreted at CI and (secondarily) SM. Embed these conceptually simple operations in a computationally efficient system (one that adheres to obvious and generic principles of efficient computation like minimal search, No Tampering, Inclusiveness, and memory-load reduction) and show that from these optimal starting points one can derive a good chunk of the properties that GG has discovered natural language grammars to have. And, when confronted with apparent counter-examples to the SMT, look harder for a solution that redeems the SMT. This, Chomsky argues, is the right way, today, to do theoretical syntax.
I like almost all of this, as you might have guessed. IMO, the only caveat I would add is that the conceptually simple is often very hard to discern. Moreover, what Occam might endorse, DP might not. I have discussed before that what's simple in a DP context might well depend on what was cognitively available to our ancestors prior to the emergence of FL. Thus, there may be many plausible simple starting points that lead to different kinds of theories of FL, all of which respond to Chomsky's methodological and substantive vision of MP. For what it's worth, contra Chomsky, I think (or at least believe that it is rational to suggest) that Merge is not simple but complex, and that it is composed of a more cognitively primitive operation (viz. Iteration) and a novel part (viz. Labeling). For those who care about this, I discuss what I have in mind further here, in part 4 (the finale) of my comments on lecture 3.[2] However, that said, I could not agree with Chomsky's general approach more. An MP that respects DP should deify GM and target the laws of GG. Right on.
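For the curious, here is a minimal sketch of the decomposition I have in mind, in toy Python and purely for illustration: the function names and the crude head-finding step are mine, not anything in the lectures. Iteration is the cognitively generic part (bare set formation); Labeling is the novel, linguistic part layered on top; Merge is just their composition.

```python
from typing import Optional, Tuple, Union

SO = Union[str, frozenset]  # an atom or a set of syntactic objects

def iterate(x: SO, y: SO) -> frozenset:
    """The cognitively primitive part: bare combination into an unordered set."""
    return frozenset({x, y})

def label(so: frozenset) -> Tuple[Optional[str], frozenset]:
    """The linguistically novel part: endow the set with a head. The
    head-finding step here is a crude stand-in for a real labeling
    algorithm (e.g. the MLA of lecture 4): pick the unique atomic member."""
    atoms = [m for m in so if isinstance(m, str)]
    return (atoms[0] if len(atoms) == 1 else None, so)

def merge(x: SO, y: SO) -> Tuple[Optional[str], frozenset]:
    """Merge as composition: nothing more than label(iterate(x, y))."""
    return label(iterate(x, y))

head, so = merge("eat", frozenset({"the", "apple"}))
assert head == "eat"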
[1] Chomsky has a nice riff where he notes that though it seems to him (and to any sane researcher) that (i)-(iii) are obviously correct, these are nonetheless highly controversial claims, if judged by the bulk of research on language. He particularly zeros in on big-data statistical learning types and observes (correctly, in my view) that not only have they not been able to deliver on even the simplest PoS problems (e.g., structure dependence in Y/N questions), but that they are currently incapable of delivering anything of interest, given that they have misconstrued the problem to be solved. Chomsky develops this theme further, pointing out that to date, in his opinion, we have learned nothing of interest from these pursuits, either in syntax or semantics. I completely agree, and have said so here. Still, I get great pleasure in hearing Chomsky's completely accurate dismissive comments.
[2] I also discuss this in a chapter co-written with Bill Idsardi, forthcoming in a collection edited by Peter Kosta from Benjamins.
Thursday, July 17, 2014
Big money, big science and brains
Gary Marcus here discusses a recent brouhaha taking place in the European neuroscience community. The kerfuffle, not surprisingly, is about how to study the brain. In other words, it's about money. The Europeans have decided to spend a lot of Euros (real money!) to try to find out how brains function. Rather than throw lots of it at many different projects haphazardly and see which gain traction, the science bureaucrats in the EU have decided to pick winners (an unlikely strategy for success given how little we know, but bureaucratic hubris really knows no bounds). And, here's a surprise: many of those left behind are complaining.
Now, truth be told, in this case my sympathies lie with (at least some of) those cut out. One of these is Stan Dehaene, who, IMO, is really one of the best cog-neuro people working today. What makes him good is his understanding that good neuroscience requires good cognitive science (i.e., that trying to figure out how brains do things requires having some specification of what it is that they are doing). It seems that this, unfortunately, is a minority opinion. And this is not good. Marcus explains why.
His op-ed makes several important points concerning the current state of the neuro art, in addition to providing links to the aforementioned funding battle (I admit it: I can't help enjoying watching others fight important "intellectual battles" that revolve around very large amounts of cash). His most important point is that, at this point in time, we really have no bridge between cognitive theories and neuro theories. Or as Marcus puts it:

What we are really looking for is a bridge, some way of connecting two separate scientific languages — those of neuroscience and psychology.

In fact, this is a nice and polite way of putting it. What we are really looking for is some recognition from the hard-core neuro community that their default psychological theories are deeply inadequate. You see, much of the neuro community consists of crude (as if there were another kind) associationists, and the neuro models they pursue reflect this. I have pointed to several critical discussions of this shortcoming in the past by Randy Gallistel and friends (here). Marcus himself has usefully trashed the standard connectionist psycho models (here). However, they just refuse to die, and this has had the effect of diverting attention from the important problem that Marcus points to above: finding that bridge.
Actually, it's worse than that. I doubt that Marcus's point of view is widely shared in the neuro community. Why? They think that they already have the required bridge. Gallistel & King (here) review the current state of play: connectionist neural models combine with associationist psychology to provide a unified picture of how brains and minds interact. The problem is not that neuroscience has no bridge; it's that it has one, and it's a bridge to nowhere. That's the real problem. You can't find what you are not looking for, and you won't look for something if you think you already have it.
And this brings us back to the aforementioned battle in Europe. Markram and colleagues have a project. It is described here as attempting to "reverse engineer the mammalian brain by recreating the behavior of billions of neurons in a computer." The game plan seems to be to mimic the behavior of real brains by building a fully connected brain within the computer. The idea seems to be that once we have this fully connected neural net of billions of "neurons," it will become evident how brains think and perceive. In other words, Markram and colleagues "know" how brains think: it's just a big neural net.[1] What's missing is not the basic concepts, but the details. From their point of view the problem is roughly to detail the fine structure of the net (i.e., what's connected to what). This is a very complex problem, for brains are very complicated nets. However, nets they are. And once you buy this, then the problem of understanding the brain becomes, as Science put it (in the July 11, 2014 issue), "an information technology" issue.[2]
And that's where Marcus and Dehaene and Gallistel and a few notable others disagree: they think that we still don't know the most basic features of how the brain processes information. We don't know how it stores info in memory, how it retrieves it from memory, how it calls functions, how it binds variables, how, in a word, it computes. And this is a very big thing not to know. It means that we don't know how brains incarnate even the most basic computational operations.
In the op-ed, Marcus develops an analogy, one that Gallistel is also fond of pointing to, between the state of current neuroscience and biology before Watson and Crick.[3] Here's Marcus on the cognition-neuro bridge again:
Such bridges don't come easily or often, maybe once in a generation, but when they do arrive, they can change everything. An example is the discovery of DNA, which allowed us to understand how genetic information could be represented and replicated in a physical structure. In one stroke, this bridge transformed biology from a mystery — in which the physical basis of life was almost entirely unknown — into a tractable if challenging set of problems, such as sequencing genes, working out the proteins that they encode and discerning the circumstances that govern their distribution in the body.

Neuroscience awaits a similar breakthrough. We know that there must be some lawful relation between assemblies of neurons and the elements of thought, but we are currently at a loss to describe those laws. We don't know, for example, whether our memories for individual words inhere in individual neurons or in sets of neurons, or in what way sets of neurons might underwrite our memories for words, if in fact they do.
The presence of money (indeed, even the whiff of lucre) has a way of sharpening intellectual disputes. This one is no different. The problem from my point of view is that the wrong ideas appear to be cashing in. Those controlling the resources do not seem (as Marcus puts it) "devoted to spanning the chasm." I am pretty sure I know why, too: they don't see one. If your psychology is associationist (even if only tacitly so), then the problem is one of detail, not principle. The problem is getting the wiring diagram right (it is very complex, you know); the problem is getting the right probes to reveal the detailed connections that constitute the full networks. The problem is not fundamental but practical; these are problems that, we can be confident, will advance if we throw lots of money at them.
And, as always, things are worse than this. Big money calls forth busy bureaucrats whose job it is to measure progress, write reports, and convene panels to manage the money and the science. The basic problem is that fundamental science is impossible to manage, due to its inherent unpredictability (as Popper noted long ago). So in place of basic fundamental research, big money begets big science, which begets the strategic pursuit of the manageable. This is not always a bad thing. When questions are crisp and we understand roughly what's going on, big science can find us the Higgs field or W bosons. However, when we are awaiting our "breakthrough," the virtues of this kind of research are far more debatable. Why? Because in this process, sadly, the hard fundamental questions can easily get lost, for they are too hard (quirky, offbeat, novel) for the system to digest. Even more sadly, this kind of big-money science follows a Gresham's Law sort of logic, with Big (heavily monied) Science driving out small-bore fundamental research. That's what Marcus is pointing to, and he is right to be disappointed.
[1] I don't understand why the failure of the full wiring diagram of the nematode (which we have) to explain nematode behavior has made so little impression on the leading figures in the field (Christof Koch is an exception here). If the problem were just the details of the wiring diagram, then nematode "cognition" should be an open book, which it most definitely is not.
[2] And these large-scale technology/Big Data projects are a bureaucrat's dream. Here there is lots of room to manage the project, set up indices of progress and success, and do all the pointless things that bureaucrats love to do. Sadly, this has nothing to do with real science. Popper noted long ago that the problem with scientific progress is that it is inherently unpredictable. You cannot schedule the arrival of breakthrough ideas. But this very unpredictability is what makes such research unpalatable to science managers, and why it is that they prefer big, all-encompassing, sciency projects to the real thing.
[3] Gallistel has made an interesting observation about this earlier period in molecular biology. Most of the biochemistry predating Watson and Crick has been thrown away. The genetics that predates Watson and Crick has largely survived, although elaborated. The analogy in the cognitive neurosciences is that much of what we think of as cutting-edge neuroscience might possibly disappear once Marcus's bridge is built. Cognitive theory, however, will largely remain intact. So, curiously, if the prior developments in molecular biology are any guide, the cognitive results in areas like linguistics, vision, and face recognition will prove to be far more robust, when insight finally arrives, than the stuff that most neuroscientists are currently invested in. For a nice discussion of this earlier period in molecular biology, read this. It's a terrific book.