Thursday, July 31, 2014

A short piece on deep learning

The new AI is called "deep learning." There is a lot of money going into it (Google, I think, just bought Hinton, the big cheese in this area) and a lot of useful applications being explored. It is, of course, being trotted out as the new panacea for cognitive theory. Sadly (or maybe happily), the stuff is beyond my pay-grade, but the little I understand of it makes it look a lot like the structuralist discovery procedures of yore. Here's a very short article on it in the Financial Times. I'm not sure whether you will be able to get to it, as it may be behind a paywall. If you can get to it, it's worth a look. What I liked is the blend of respect for what it can do combined with a certain skepticism regarding the more grandiose claims. Here's a taste from Oren Etzioni:

“There are plenty of exciting individual applications. But if you scratch a little deeper, the technology goes off a cliff,” he said.

Wednesday, July 30, 2014

Built in maps

Bill Idsardi sent me this piece on bird navigation. It seems that Swainson's thrushes have genetically built-in maps that get them from Canada to Mexico and parts of South and Central America. Moreover, it appears that these maps come in two flavors, some taking a coastal route and others taking a more medial one. As the author puts it, these routes can overlap and when they do (well, let me quote here) "they have a chance to (ahem) mingle." Mingling has consequences, and these consequences can apparently end up producing mixed maps. As the researchers put it: "it is believed birds have genetic instructions on which direction they need to head and how long they need to fly," though, as conceded, "it's still a mystery how, exactly, a bird's DNA tells it where to go."

This is interesting stuff (the area of study is called "vector navigation"). I bring it to your attention here for the obvious reason: whatever is genetically coded is very fancy: maps and routes. And the fact that the scientists have no idea how it is so coded does not stop them from concluding that it is so coded.

This is quite different from the attitudes in the study of humans, as you all know. Chomsky has pointed repeatedly to the methodological dualism that pops out whenever human mental capacities are studied. Were thrushes humans, the usual critics would be falling all over themselves arguing that postulating such inborn mechanisms is methodologically ill advised (and not at all explanatory), that DNA could not possibly code for maps and routes, and that there must be some very subtle learning mechanism lying behind the attested capacities (almost certainly using some Bayesian learning procedure while still in the egg!). In other words, standard scientific practice would have been suspended were thrushes humans. You draw the relevant moral.

Sunday, July 27, 2014

Comments on lecture 4-II

This is the second installment of comments on lecture 4. The first part is here.

The Minimal Labeling Algorithm (MLA)
Chomsky starts the discussion by reviewing the basics of his approach to labels.  Here are the key assumptions:
1. Labels are necessary for interpreting structured linguistic objects (SLO) at the interfaces; i.e., labels are not required for the operations of the computational system (CS).
2. Labels are assigned by the MLA as follows:
   a. All constructed SLOs must be labeled (see 1).
   b. Labels are assigned within phases.
   c. The MLA minimally searches a set (i.e., an SLO) to identify the most prominent "element." This element is the label of the set.[1]
      i. In cases like {X, YP} (i.e., in which X is atomic and Y complex), the MLA chooses X: the shallowest element capable of serving as a label.
      ii. In {XP, YP} (i.e., in cases where both members of the set are complex), there is no unique choice to serve as label. In such cases, if XP and YP "agree" (e.g., phi-feature agreement or WH-feature agreement), then the MLA chooses the common agreement features as the label.
3. Ancillary assumptions:
   a. Only XPs that are heads of their chains are "visible" within a given set. Thus, in {XP, YP}, if XP is not the head of its chain, then it is invisible within the set (i.e., movement removes XP as a potential label).
   b. Roots are inherently incapable of labeling.
   c. T is parametrically a label. The capacity of T to serve as a label is related to "richness of agreement": it is "rich" in Italian, so Italian T can serve as a label; it is "poor" in English, so English T cannot.
   d. If a "weak" head (a root or T) agrees with some X in XP, then the agreement features can serve as a label.

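To fix ideas, here is a toy rendering of the MLA read literally, as in 2c-i: minimal search returns the shallowest atom capable of serving as a label. This is my own illustrative reconstruction, not anything in the lectures (the classes and flags are expository inventions, and chain invisibility, 3a, is omitted):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:
    name: str
    can_label: bool = True   # False for roots (3b) and, parametrically, English T (3c)

def mla_literal(slo):
    """Read 2c-i literally: breadth-first minimal search for the shallowest
    atom capable of serving as a label; an SLO is an Atom or a frozenset
    of SLOs."""
    level = [slo]
    while level:
        labelers = [x for x in level if isinstance(x, Atom) and x.can_label]
        if len(labelers) == 1:
            return labelers[0].name
        if len(labelers) > 1:
            return None              # no unique choice: agreement must decide (2c-ii)
        # nothing at this depth can label: search one level deeper
        level = [m for x in level if not isinstance(x, Atom) for m in x]
    return None
```

Fed {T, {v, {R(left), {n, R(John)}}}} with a weak English T, this literal search returns 'v', and fed {R(left), {n, R(John)}} it returns 'n': nothing forces I-merge, which is just the worry pressed in the comments that follow.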
The assumptions in 2 and 3 suffice to explain several interesting features of FL: the Fixed Subject Condition (FSC) (the subject/object asymmetries in "ECP effects"), EPP effects, and the presence of displacement. Let's see how.

Consider first the EPP. {T, YP} requires a label. In Italian this is not a problem, for rich agreement endows T with labeling prowess.[2] English finesses this problem by raising a DP to "Spec" T. In a finite clause, this induces agreement between the DP and the TP (well, T' in the "old" system, but whatever) and the shared phi features can serve as the label. If, however, DP fails to raise to Spec T, or if the DP in Spec T I-merges into some higher position, then it will not be available for agreement and the set that contains T will not receive a label, and so will not be interpretable at CI or SM. This accounts for the unacceptability of the sentences in (1) and (2) (traces used for convenience):

    (1) *left John
    (2) *Who1 did John say that t1 saw Mary

Comments: Note that for this account to go through, we must assume that in {T, YP} the "head" of Y is not a potential label. The fact that T cannot serve as a label does not yet imply that minimal search cannot find one. Thus, say the complement of T were vP (here a weak v). Then the structure is {T, vP}. If T is not a potential label, then the shallowest possible label is v. Thus, we should be able to label the whole set 'v' and not move anything up. Why then is movement required?

One possible reason is that John cannot stay in its base position. One reason it might have to move is that John cannot get case there, forcing John to move. The problem, however, is that on a Probe-Goal system with non-transitive v as a weak phase, T can probe John and assign it case (perhaps as a by-product of phi agreement, though I believe that this is empirically dubious). Thus, given Chomsky's standard assumptions, John can discharge whatever checking obligations it has without moving a jot.

So maybe it needs to move for some other reason. One consistent with the assumptions above is that it needs to move so that {R(left), John} can be labeled.[3] Recall, however, that Chomsky assumes that roots are universally incapable of labeling (3-c above). (Question: is 3-c a stipulation of UG or does it follow from more general minimalist assumptions? If the former, then it exacerbates DP and so is an unwelcome stipulation (which is not to say that it is incorrect, but given GM something we should be suspicious of).) The structure of the set {R(left), John} is actually {R(left), {n, R(John)}}. In the latter there is a highest potential label, namely 'n.' So, if the MLA is charged with finding the most prominent potential label, then it would appear that even without movement of {n, R(John)}, the MLA could unambiguously apply. Once again, it is not clear why I-merge is required.

Indeed, things are more obscure yet. In this lecture Chomsky suggests that roots raise and combine with higher functional heads. This implies that in {v, {R(left), {n, R(John)}}}, R(left) vacates the lowest set and unites with 'v.' But this movement will make 'R(left)' invisible in the lowest set, again allowing 'n' to label it. So, once again, it is not clear why John needs to raise to Spec T and why 'v' cannot serve to label {T, vP}.

Here’s another possibility: Maybe Chomsky is assuming a theoretical analogue of defective intervention. Here’s what I mean.  The MLA looks not for the highest potential labeler, but for the highest lexical atom, whether it can serve as a label or not. So in {T, vP}, T is the highest atom, it’s just that it cannot label. So, unless something moves to its spec to agree with it, we will not be able to label {T, vP} and we will face interpretive problems at the interfaces.  On this interpretation, then, the MLA does not look for the highest possible labeling atomic element, but simply the most prominent element regardless of its labeling capacities.  This will have the effect of forcing I-merge of John to Spec-T.

So, let's so interpret the MLA. Chomsky suggests that the same logic will force raising of an object to Spec-R(V) in a standard transitive clause.[4] Thus, in something like (3a), the structure of the complement of v* is (3b), and movement of the object to Spec-R(kiss), as in (3c), could allow for the set to be labeled.

    (3) a. Mary kissed John
        b. {v*, {R(kiss), John}}
        c. {John, {R(kiss), John}}

However, once again, this presupposes that raising the root to v*, which Chomsky assumes to be universally required, will not suffice to disambiguate the structure for labeling.[5]

So, the EPP follows given this way of reading the proposal: the MLA searches not for the first potential label, but for the closest lexical atom. If it finds one, it is the label if it can be. If it cannot be, then tough luck: we need to label some other way. One way would be for its Spec to be occupied by an agreeing element; then the agreement can serve as the label. So even atoms that inherently cannot label can serve to interfere with other elements serving as labels. This forces EPP movement in English so that agreement can resolve the ambiguity that stifles the MLA.

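For concreteness, the reading just settled on can be rendered as a toy procedure. Again, this is my own reconstruction, not anything from the lectures, and the classes and feature sets are purely illustrative:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:
    name: str
    can_label: bool = True           # False for roots and (weak) English T
    phi: frozenset = frozenset()     # agreement features, e.g. {"3sg"}

@dataclass(frozen=True)
class Phrase:                        # an already-built SLO, abbreviated to its
    label: str                       # label and the features it makes visible
    phi: frozenset = frozenset()

def mla_revised(a, b):
    """Label the set {a, b} on the 'defective intervention' reading:
    search halts at the most prominent atom whether or not it can label."""
    atoms = [x for x in (a, b) if isinstance(x, Atom)]
    if len(atoms) == 1:
        # {X, YP}: search stops at X. A weak X (root, English T) simply
        # fails to label; nothing deeper is consulted, so I-merging an
        # agreeing Spec is the only way out.
        x = atoms[0]
        return x.name if x.can_label else None
    if len(atoms) == 2:
        return None                  # head-head sets: set aside here
    # {XP, YP}: no unique most prominent atom; shared agreement
    # features (phi, WH) serve as the label
    shared = a.phi & b.phi
    return ("AGR", shared) if shared else None
```

On this rendering, weak English T leaves {T, vP} unlabeled even though v could in principle label, forcing I-merge of a phi-matching DP, after which the resulting {DP, T'} labels via shared agreement. This is the sense in which a non-labeling atom "intervenes."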
Question: how are non-finite TPs labeled? If they are strong, then there is no problem. But they clearly don't generally display any agreement (at least any overt morphology). Thus, for infinitives, the link to "rich" morphology is questionable. If non-finite Ts are weak, however, then how can we get successive cyclic movement? Recall, the DP must remain in Spec T to allow for labeling. Thus, successive cyclic movement should make labeling impossible and we should get EPP problems. In short, if a finite T requires a present subject in order to label the "T'", why not a non-finite T? A puzzle.

Question: I am still not clear how this analysis applies to Existential Constructions (EC). In an earlier lecture, Chomsky insisted (in reply to David P) that the EPP is mainly "about" the existence of null expletives (note: I didn't understand this, as I said in the comments to lecture 3). Ok: so English needs an expletive to label "T'" but Italian doesn't. It's not that Italian has a null expletive, but that it has nothing in Spec-T, as nothing is required. So, what happens in English? How exactly does there help label the "TP"? Recall, the idea is that in {XP,YP} configurations there isn't an unambiguous most prominent atom, and so the common agreement features serve as the label. Does this mean that in ECs there and the T share agreement features?[6] I am happy enough with this conclusion, but I thought that Chomsky took the agreement in ECs to be between the associate and T. Thus the agreement would not be between T and there but between T and the associate. How then does there figure in all of this? What's it doing? Let me put this another way: the idea that Chomsky is pursuing is that agreement provides a way for the MLA to find a label when the structure is ambiguous. The label resolves the problem by identifying a common feature (or features) of XP and YP and using it to label the whole. But in ECs it is standardly assumed that the agreement is not with the XP in Spec T but with an associate in the complement domain of T. So, either it is not agreement that resolves the labeling problem in ECs, or there has agreement features, or ECs are, despite appearances, not {XP,YP} SLOs. At any rate, I am not sure what Chomsky would say about these questions, and they seem central to his proposal. Inquiring minds want to know.

[1] Chomsky seems to say that phase heads (PH) determine the label. I am not at all sure why we need assume that it’s the PH that via the MLA determines the label. It seems to me that MLA can function without a PH head being involved at all to, as it were, “choose” the label. What is needed is that the MLA be a phase level operation that applies, I assume, at Transfer. However, I may be wrong about how Chomsky thinks of the role of PHs in labeling, though I think this is what Chomsky actually says.
From what I can tell, the MLA requires that the choice of label respect minimal search and that it be deterministic. I have interpreted this to mean that the MLA applies to a given set of elements to unambiguously choose the label. It is important that the MLA does not tolerate labeling ambiguity (i.e. in a given structure exactly one element can serve as the label and it will be chosen as the label), for this is what forces movement and agreement, as we shall see. However, I do not see that the MLA requires that PHs actually do the labeling (i.e. choose the label). What is needed is that in every set there be a uniquely shallowest potential label and that the MLA choose it.
I am not clear why PHs are required (if they are) to get the MLA to operate correctly. This may be a residue of an idea that Chomsky later abandons, namely that all rules are products of properties of the phase head. Chomsky, as I noted, dumps this assumption at the end of lecture 4 and this might suffice to liberate the MLA from PHs. Note it would also allow for matrix clauses to be labeled. Minimal search is generally taken to be restricted to sisters of probes. PHs then do not "see" their specifiers, making labeling of a matrix clause impossible. Freeing the MLA from PHs would eliminate this problem.
[2] Reminder: this is not an explanation. "Rich agreement" is just the name we give to the fact that T can do this in some languages and not in others. How it manages to do this is unclear. I mention this to avoid confusing a diacritical statement for an explanation. It is well known that rich morphological agreement is neither necessary nor sufficient to license EPP effects, though there is a long tradition that identifies morphological richness as the relevant parameter. There is an equally long tradition that realizes that this carries very little explanatory force.
[3] ‘R(X)’ means root of X.
[4] Chomsky uses this to derive the Postal effects discussed by Koizumi and by Lasnik and Saito, which indicate that accusative case marking endows an expression with scope higher than its apparent base position. Chomsky finds this movement really counterintuitive and so thinks that the system he develops is really terrific precisely because it gets this as a consequence. It is worth pointing out that if case were assigned in Spec-X configurations, as in the earliest MP proposals, the movement would be required as well. I am sure that Chomsky would not like this, though it is pretty clear that Spec-X structures are not nearly as unpleasant to his sensibilities as they once were, given that these are the configurations where agreement licenses labeling. That said, it is not the configurations themselves that license agreement per se.
[5] Chomsky does suggest that in head raising the raised head "labels" the derived {X,Y} structure. If so, then maybe one needs to label the VP before the V raises. I am frankly unclear about all of this.
[6] A paper I wrote with Jacek Witkos on ECs suggested that in ECs there inherited features from the nominal it was related to (it started out as a kind of dummy determiner and moved) and that the agreement one sees is not then directly with the associate. This would suffice here, though I strongly doubt that Chomsky would be delighted with this fix.

Wednesday, July 23, 2014

Academia as business

One of the forces behind the "crapification" of academic life (see here) is the fact that it is increasingly being managed by "bidnessmen." Part of this is because governments have decided to drop support for universities (the cold war is over and we beat the Russians) and the shortfall needs to be made up from somewhere, so universities have effectively turned themselves into fund raising machines, and heading up a supercharged money-grubbing machine requires the unctuous talents that only big bucks can attract. So, as Chomsky might put it, universities are no longer educational institutions with endowments but endowments with educational institutions. As all good minimalists know, being after a with means being of lesser significance!

At any rate, (here) is a piece that discusses some of the current dynamics.  The ever increasing bureaucratization of the university is not a bug that can be eliminated, but a feature of how things are done now.  Given the current state of play, universities do need alternative sources of funding and rich people and corporations (other "kinds" of people, at least in the USA) are the obvious source.  Catering to these sources of income requires work and the people that are good at it, not surprisingly, are only tangentially interested in what universities do. The figures on the relative growth rates of faculty to administrators and the relevant salary comparisons are significant, IMO. It's not actually clear that much can be done, but it's worth knowing that this is no accident. It's just how things work now.

Monday, July 21, 2014

What's in a Category? [Part 2]

Last week I wondered about the notion of syntactic category, aka part of speech (POS). My worry is that we have no clear idea what kind of work POS are supposed to do for syntax. We have some criteria for assigning POS to lexical items (LI) --- morphology, distribution, semantics --- but there are no clear-cut rules for how these are weighed against each other. Even worse, we have no idea why these are relevant criteria while plausible candidates such as phonological weight and arity seem to be irrelevant.1 So what we have is an integral part of pretty much every syntactic formalism for which we cannot say
  • what exactly it encompasses,
  • why it is necessary,
  • why it shows certain properties but not others.
Okay, that's a pretty unsatisfying state of affairs. Actually, things are even more unsatisfying once you look at the issue from a formal perspective. But the formal perspective also suggests a way out of this mess.

Comments on lecture 4-I

I have just finished listening to Chomsky’s fourth lecture and so this will be the last series of posts on them (here). If you have not seen them, let me again suggest that you take the time to watch. They are very good and well worth the (not inconsiderable) time commitment.

In 4, Chomsky does three things. First, he again tries to sell the style of investigation that the lectures as a whole illustrate. Second, he reviews the motivations and basic results of his way of approaching Darwin's Problem. Third, he proposes ways of tidying up some of the loose ends that the outline in 3 generates (at least, they were loose ends that I did not understand). Let me review each of these points in turn.

1. The central issues and the Strong Minimalist Thesis (SMT)

Chomsky, as is his wont, returns to the key issues as he sees them. There are two of particular importance.

First, he believes that we should be looking for simple theories. He names this dictum Galileo's Maxim (GM). GM asserts (i) that nature is simple and (ii) that it is the task of the scientist to prove that it is. Chomsky notes that this is not merely good general methodological advice (which it is), but that in the particular context of the study of FL there are substantive domain-specific reasons for adopting it. Namely: Darwin's Problem (DP). Chomsky claims that DP rests on three observations: (i) that our linguistic competence is not learnable from simple data, (ii) that there is no analogue of our linguistic capacity anywhere else in the natural world, and (iii) that the capacity for language emerged recently (in the last 100k years or so), emerged suddenly, and has remained stable in its properties since its emergence.[1] These three points together imply that we have a non-trivial FL, that it is species specific, and that it arose as a result of a very "simple" addition to our ancestors' cognitive repertoire. So, in addition to the general (i.e. external to the specific practice of linguistics) methodological virtues of looking for simple and elegant theories, DP provides a more substantive (i.e. internal to linguistics) incentive, as simple theories are just the sorts of things that could emerge rapidly in a lineage and remain stable after emerging.

I very much like this way of framing the central aims of the Minimalist Program (MP).  It reconciles two apparently contradictory themes that have motivated MP. The first theme is that looking for simple theories is just good methodology and so MP is nothing new.  On this reading, MP is just the rational extension of GG theorizing, just the application of general scientific principles/standards of rational inquiry to linguistic investigations.  On this view, MP concerns are nothing new and the standards MP applies to theory evaluation are just the same as they always were.  The second view, one that also seems to be a common theme, is that MP does add a new dimension to inquiry. DP, though always a concern, is now ripe for investigation. And thinking about DP motivates developing simple theories for substantive reasons internal to linguistic investigations, motivations in addition to the standard ones prompted by concerns of scientific hygiene.  On this view, raising DP to prominence changes the relevant standards for theoretical evaluation. Adding DP to Plato’s Problem, then, changes the nature of the problem to be addressed in interesting ways.

This combined view, I think, gets MP right. It is both novel and old hat. What Chomsky notes is that at some times, depending on how developed the theory is, new questions can emerge or become accented, and at those times the virtues of simplicity have a bite that goes beyond general methodological concerns. Another way of saying this, perhaps, is that there are times (now being one in linguistics) when the value of theoretical simplicity is elevated and the task of finding simple, non-trivial, coherent theories is the central research project. The SMT is intended to respond to this way of viewing the current project (I comment on this below).

Chomsky makes a second very important point. He notes that our explanatory target should be the kinds of effects that GG has discovered over the last 60 years. Thus, we should try to develop accounts of why FL generates an unbounded number of structured linguistic objects (SLO), why it incorporates displacement operations, why it obeys locality restrictions (strict cyclicity, PIC), why there is overt morphology, why there are subject/object asymmetries (Fixed Subject Effects/ECP), why there are EPP effects, etc. So, Chomsky identifies both a method of inquiry (viz. Galileo's Maxim) and a target of inquiry (viz. the discovered laws and effects of GG). Theory should aim to explain the second while taking DP very, very seriously.

The SMT, as Chomsky sees it, is an example of how to do this (actually, I don't think he believes it is merely an example, but rather the only conceptually coherent way to proceed). Here's the guts of the SMT: look for the conceptually simplest computational procedures that generate SLOs and that are interpreted at CI and (secondarily) SM. Embed these conceptually simple operations in a computationally efficient system (one that adheres to obvious and generic principles of efficient computation like minimal search, No Tampering, Inclusiveness, memory-load reduction) and show that from these optimal starting points one can derive a good chunk of the properties that GG has discovered natural language grammars to have. And, when confronted with apparent counter-examples to the SMT, look harder for a solution that redeems the SMT. This, Chomsky argues, is the right way, today, to do theoretical syntax.

I like almost all of this, as you might have guessed. IMO, the only caveat I would add is that the conceptually simple is often very hard to discern. Moreover, what Occam might endorse, DP might not. I have discussed before that what's simple in a DP context might well depend on what was cognitively available to our ancestors prior to the emergence of FL. Thus, there may be many plausible simple starting points that lead to different kinds of theories of FL, all of which respond to Chomsky's methodological and substantive vision of MP. For what it's worth, contra Chomsky, I think (or at least believe that it is rational to suggest) that Merge is not simple but complex, and that it is composed of a more cognitively primitive operation (viz. Iteration) and a novel part (viz. Labeling). For those who care about this, I discuss what I have in mind further here, in part 4 (the finale) of my comments on lecture 3.[2] However, that said, I could not agree with Chomsky's general approach more. An MP that respects DP should deify GM and target the laws of GG. Right on.

[1] Chomsky has a nice riff where he notes that though it seems to him (and to any sane researcher) that (i)-(iii) are obviously correct, nonetheless these are highly controversial claims, if judged by the bulk of research on language. He particularly zeros in on big data statistical learning types and observes (correctly in my view) that not only have they not been able to deliver on even the simplest PoS problems (e.g. structure dependence in Y/N questions) but that they are currently incapable of delivering anything of interest given that they have misconstrued the problem to be solved. Chomsky develops this theme further, pointing out that to date, in his opinion, we have learned nothing of interest from these pursuits either in syntax or semantics. I completely agree and have said so here. Still, I get great pleasure in hearing Chomsky’s completely accurate dismissive comments. 
[2] I also discuss this in a chapter co-written with Bill Idsardi forthcoming in a collection edited by Peter Kosta from Benjamins.

Thursday, July 17, 2014

Big money, big science and brains

Gary Marcus here discusses a recent brouhaha taking place in the European neuro-science community. The kerfuffle, not surprisingly, is about how to study the brain. In other words, it's about money. The Europeans have decided to spend a lot of Euros (real money!) to try to find out how brains function. Rather than throw lots of it at many different projects haphazardly and see which gain traction, the science bureaucrats in the EU have decided to pick winners (an unlikely strategy for success given how little we know, but bureaucratic hubris really knows no bounds). And, here’s a surprise, many of those left behind are complaining. 

Now, truth be told, in this case my sympathies lie with (at least some) of those cut out.  One of these is Stan Dehaene, who, IMO, is really one of the best cog-neuro people working today.  What makes him good is his understanding that good neuroscience requires good cognitive science (i.e. that trying to figure out how brains do things requires having some specification of what it is that they are doing). It seems that this, unfortunately, is a minority opinion. And this is not good. Marcus explains why.

His op-ed makes several important points concerning the current state of the neuro art, in addition to providing links to the aforementioned funding battle (I admit it: I can't help enjoying watching others fight important "intellectual battles" that revolve around very large amounts of cash). His most important point is that, at this point in time, we really have no bridge between cognitive theories and neuro theories. Or as Marcus puts it:

What we are really looking for is a bridge, some way of connecting two separate scientific languages — those of neuroscience and psychology.

In fact, this is a nice and polite way of putting it. What we are really looking for is some recognition from the hard-core neuro community that their default psychological theories are deeply inadequate. You see, much of the neuro community consists of crude (as if there were another kind) associationists, and the neuro models they pursue reflect this. I have pointed to several critical discussions of this shortcoming in the past by Randy Gallistel and friends (here).  Marcus himself has usefully trashed the standard connectionist psycho models (here). However, they just refuse to die and this has had the effect of diverting attention from the important problem that Marcus points to above; finding that bridge.

Actually, it’s worse than that. I doubt that Marcus’s point of view is widely shared in the neuro community. Why? They think that they already have the required bridge. Gallistel & King (here) review the current state of play: connectionist neural models combine with associationist psychology to provide a unified picture of how brains and minds interact.  The problem is not that neuroscience has no bridge, it’s that it has one and it’s a bridge to nowhere. That’s the real problem. You can’t find what you are not looking for and you won’t look for something if you think you already have it.

And this brings us back to the aforementioned battle in Europe. Markram and colleagues have a project. It is described here as attempting to "reverse engineer the mammalian brain by recreating the behavior of billions of neurons in a computer." The game plan seems to be to mimic the behavior of real brains by building a fully connected brain within the computer. The idea seems to be that once we have this fully connected neural net of billions of "neurons," it will become evident how brains think and perceive. In other words, Markram and colleagues "know" how brains think; it's just a big neural net.[1] What's missing is not the basic concepts, but the details. From their point of view the problem is roughly to detail the fine structure of the net (i.e. what's connected to what). This is a very complex problem, for brains are very complicated nets. However, nets they are. And once you buy this, then the problem of understanding the brain becomes, as Science put it (in the July 11, 2014 issue), "an information technology" issue.[2]

And that’s where Marcus and Dehaene and Gallistel and a few notable others disagree: they think that we still don’t know the most basic features of how the brain processes information. We don’t know how it stores info in memory, how it retrieves it from memory, how it calls functions, how it binds variables, how, in a word, it computes. And this is a very big thing not to know. It means that we don’t know how brains incarnate even the most basic computational operations.

In the op-ed, Marcus develops an analogy, one that Gallistel is also fond of, between the state of current neuroscience and biology before Watson and Crick.[3] Here's Marcus on the cognition-neuro bridge again:

Such bridges don’t come easily or often, maybe once in a generation, but when they do arrive, they can change everything. An example is the discovery of DNA, which allowed us to understand how genetic information could be represented and replicated in a physical structure. In one stroke, this bridge transformed biology from a mystery — in which the physical basis of life was almost entirely unknown — into a tractable if challenging set of problems, such as sequencing genes, working out the proteins that they encode and discerning the circumstances that govern their distribution in the body.
Neuroscience awaits a similar breakthrough. We know that there must be some lawful relation between assemblies of neurons and the elements of thought, but we are currently at a loss to describe those laws. We don’t know, for example, whether our memories for individual words inhere in individual neurons or in sets of neurons, or in what way sets of neurons might underwrite our memories for words, if in fact they do.

The presence of money (indeed, even the whiff of lucre) has a way of sharpening intellectual disputes. This one is no different. The problem from my point of view is that the wrong ideas appear to be cashing in. Those controlling the resources do not seem (as Marcus puts it) “devoted to spanning the chasm.” I am pretty sure I know why too: they don’t see one. If your psychology is associationist (even if only tacitly so), then the problem is one of detail, not principle. The problem is getting the wiring diagram right (it is very complex, you know) and getting the right probes to reveal the detailed connections that make up the full networks. The problem is not fundamental but practical; the sort of problem that, we can be confident, will advance if we throw lots of money at it.

And, as always, things are worse than this. Big money calls forth busy bureaucrats whose job it is to measure progress, write reports, and convene panels to manage the money and the science. The basic problem is that fundamental science is impossible to manage due to its inherent unpredictability (as Popper noted long ago). So in place of basic fundamental research, big money begets big science, which begets the strategic pursuit of the manageable. This is not always a bad thing. When questions are crisp and we understand roughly what's going on, big science can find us the Higgs field or W bosons. However, when we are awaiting our "breakthrough," the virtues of this kind of research are far more debatable. Why? Because in this process, sadly, the hard fundamental questions can easily get lost, for they are too hard (quirky, offbeat, novel) for the system to digest. Even more sadly, this kind of big money science follows a Gresham’s Law sort of logic, with Big (heavily monied) Science driving out small-bore fundamental research. That’s what Marcus is pointing to, and he is right to be disappointed.

[1] I don’t understand why the failure of the full wiring diagram of the nematode (which we have) to explain nematode behavior has made so little impression on the leading figures in the field (Christof Koch is an exception here).  If the problem were just the details of the wiring diagram, then nematode “cognition” should be an open book, which it most definitely is not.
[2] And these large-scale technology/Big Data projects are a bureaucrat’s dream. Here there is lots of room to manage the project, set up indices of progress and success, and do all the pointless things that bureaucrats love to do. Sadly, this has nothing to do with real science.  Popper noted long ago that the problem with scientific progress is that it is inherently unpredictable. You cannot schedule the arrival of breakthrough ideas.  But this very unpredictability is what makes such research unpalatable to science managers, and it is why they prefer big, all-encompassing sciency projects to the real thing.
[3] Gallistel has made an interesting observation about this earlier period in molecular biology. Most of the biochemistry predating Watson and Crick has been thrown away.  The genetics that predates them has largely survived, though elaborated.  The analogy in the cognitive neurosciences is that much of what we think of as cutting-edge neuroscience may well disappear once Marcus’s bridge is built. Cognitive theory, however, will largely remain intact.  So, curiously, if the prior developments in molecular biology are any guide, the cognitive results in areas like linguistics, vision, face recognition, etc. will prove far more robust when insight finally arrives than the stuff that most neuroscientists are currently invested in.  For a nice discussion of this earlier period in molecular biology read this. It’s a terrific book.