Wednesday, January 24, 2018

Gary Marcus on deep learning

An “unknown” commentator left links to two very interesting Gary Marcus (GM) pieces (here1 and here2) on the current state of Deep Learning (DL) research. His two pieces make the points that I tried to make in a previous post (here), but do so much more efficiently and insightfully than I did. They are MUCH better. I strongly recommend that you take a look if you are interested in the topics.

Here are, FWIW, a couple of reactions to the excellent discussion these papers provide.

Consider first here1.

1. GM observes that the main critiques of DL contend not that DL is useless or uninteresting, but (i) that it leaves out a lot if one’s research interests lie with biological cognition, and (ii) that the part that DL leaves out is precisely what theories promoting symbolic computation have always focused on. In other words, what is up for grabs is whether DL suffices as a framework for serious cognition, not whether it is necessary. Recall, Rs are comfortable with the kinds of mechanisms DLers favor. The E mistake is to think that this is all there is. It isn’t. As GM puts it (here1:4): DL is “not a universal…solvent, but simply…one tool among many…”

I am tempted to go a bit farther (something that Lake et al. (see here) moot as well). I suspect that if one’s goal is to understand cognitive processes then DL will play a decidedly secondary explanatory role. The hard problem is figuring out the right representational format (the kinds of generalizations it licenses and categorizations it encourages). These fixed, DL can work its magic. Without these, DL will be relatively idle. These facts can be obscured by DLers who do not seem to appreciate the kinds of Rish debts their own programs actually incur (a point that GM makes eloquently in here2). However, as we all know, a truly blank slate generalizes not at all. We all need built-ins to do anything. The only relevant question is which ones and how much, not whether. DLers (almost always of an Eish persuasion) seem to have a hard time understanding this or drawing the appropriate conclusions from this uncontentious fact.

2. GM makes clear (here1:5) in what sense DL is bad at hierarchy. The piece contrasts “feature-wise hierarchy” with systems that “can make explicit reference to the parts of larger wholes.” GM describes the former as a species of “hierarchical feature detection; you build lines out of pixels, letters out of lines, words out of letters and so forth.” DL is very good at this (GM: “the best ever”). But it cannot do the second at all well, which is the kind of hierarchy we need to describe, say, linguistic objects with constituents that are computationally active. Note that what GM calls “hierarchical feature detection” corresponds quite well with the kind of discovery procedures earlier structuralism advocated and whose limitations Chomsky exposed over 60 years ago. As GM notes, pure DL does not handle at all well the kinds of structures GGers regularly make use of to explain the simplest linguistic facts. Moreover, DL fails for roughly the reasons that Chomsky originally laid out; it does not appreciate the particular computational challenges that constituency highlights.

3. GM has a very nice discussion of where/how exactly DLs fail. It relates to “extrapolation” (see discussion of question 9, 10ff). And why? Because DL networks “don’t have a way of incorporating prior knowledge” that involves “operations over variables.” For these kinds of “extrapolations” we need standard symbolic representations, and this is something that DL eschews (for typically anti-nativist/rationalist motives). So they fail to do what humans find trivially easy (viz. to “learn from examples the function you want and extrapolate it”). Can one build DL systems that employ operations over variables? GM notes that one can. But such systems will not be pure DL devices: they will have to allow for symbolic computations and for the innate (i.e. given) principles and operations that DLers regularly deny are needed.
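
To make the extrapolation point concrete, here is a minimal sketch in Python. It is my own toy illustration and not GM's model or anything from his papers: the names (NearestNeighbourLearner, identity_rule and so on) are hypothetical, and a nearest-neighbour memorizer stands in, very loosely, for a purely "recapitulative" learner. Both learners are asked about the identity function. The memorizer only ever sees even numbers during training, whereas the rule is stated over a variable.

```python
# Illustrative sketch only: a toy "recapitulative" learner versus a rule over a variable.
# Neither is GM's model; all names here are hypothetical.

def to_bits(n, width=4):
    """Encode a non-negative integer as a tuple of bits, most significant first."""
    return tuple((n >> i) & 1 for i in reversed(range(width)))

def from_bits(bits):
    """Decode a tuple of bits back into an integer."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

def hamming(a, b):
    """Number of positions at which two bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))

class NearestNeighbourLearner:
    """Stores (input, output) pairs; answers with the output of the closest stored input."""
    def __init__(self):
        self.memory = []

    def train(self, pairs):
        self.memory.extend(pairs)

    def predict(self, x):
        _, output = min(self.memory, key=lambda pair: hamming(pair[0], x))
        return output

def identity_rule(x):
    """An operation over a variable: whatever x is, return x."""
    return x

# Training data: the identity function, but only on even numbers (last bit always 0).
training = [(to_bits(n), to_bits(n)) for n in range(0, 16, 2)]
learner = NearestNeighbourLearner()
learner.train(training)

seven = to_bits(7)
memorised_answer = from_bits(learner.predict(seven))  # closest stored input is 6
rule_answer = from_bits(identity_rule(seven))
print(memorised_answer, rule_answer)  # prints: 6 7
```

The memorizer does fine inside the space it was trained on and fails just outside it, since it has never seen a final bit set to 1. The rule extrapolates trivially, but only because the variable was given and not learned.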

4. GM’s second paper also makes for very useful reading. It specifically discusses the AlphaGo programs recently in the news for doing for Go what other programs did for chess (beat the human champions). GM asks whether the success of these programs supports the anti R conclusions that their makers have bruited about. The short answer is “NO!” The reason, as GM shows, is that there is lots of specialized pre-packaged machinery that allows these programs to succeed. In other words, they are elbow deep into very specific “innate” architectural assumptions without which the programs would not function.

Nor should this be surprising for this is precisely what one should expect. The discussion is very good and anyone interested in a good short discussion of innateness and why it is important should take a look.

5. One point struck me as particularly useful. If what GM says is right then it appears that the non nativist Es don’t really understand what their own machines are doing. If GM is right, then they don’t seem to see how to approach the E/R debate because they have no idea what the debate is about. The issue is not whether machines can cognize. The issue is what needs to be in a machine that cognizes. I have a glimmer of a suspicion that DLers (and maybe other Eish AIers) confuse two different questions: (a) Is cognition mechanizable (i.e. does cognition require a kind of mentalistic vitalism)? versus (b) What goes into a cognitively capable mind: how rasa can a cognitively competent tabula be?

These are two very different questions. The first takes mentalism to be opposed to physicalism, the suggestion being that mental life requires something above and beyond the standard computational apparatus to explain how we cognize as we do. The second is a question within physicalism and asks how much “innate” (i.e. given) knowledge is required to get a computational system to cognize as we do. The E answer to the second question is that not much given structure is needed. The Rs beg to differ. However, Rs are not committed to operations and mechanisms that transcend the standard variety computational mechanisms we are all familiar with. No ghosts or special mental stuff required. If indeed DLers confuse these two questions then it explains why they consider whatever program they produce (no matter how jam packed with specialized “given” structures (of the kind that GM notes to be the case with AlphaGo)) as justifying Eism. But as this is not what the debate is about, the conclusion is a non-sequitur. AlphaGo is very Rish precisely because it is very non rasa tabularly.[1]

To end: These two pieces are very good and important. DL has been massively oversold. We need papers that keep yelling about how little cloth surrounds the emperor. If your interests are in human (or even animal) cognition then DL cannot be the whole answer. Indeed, it may not even be much or the most important part of the answer. But for now if we can get it agreed that DL requires serious supplementation to get off the ground, that will be a good result. GM’s papers are very good at getting us to this conclusion.



[1] I should add that there are serious mental mysteries that we don’t know how to account for computationally. Chomsky describes these under the heading of the free use of our capacities, and Fodor discusses them under the heading of central systems. We have no decent handle on how we freely exercise our capacities or how the complex judgments work. These are mysteries, but these mysteries are not what the E/R debate is mostly about.

Monday, January 22, 2018

The Gallistel-King conjecture; an update

As time goes by, bets against the veracity of the Gallistel-King conjecture (see here and here) are becoming longer and longer. Don’t get me wrong. The cog-neuro world is not about to give up on its love affair with connectionism. It’s just that as the months pass, the problems with this (sadly, hyper Empiricist) view of things become ever more evident and this readies people for a change. Moreover, as you can’t beat something with nothing but a promise of something (you actually need a concrete something), it is heartening to see that the idea of classical computation within the neuron/cell is becoming ever more conventional. Here is a recent report that shows how far things have come.

It shows how living cells can classically compute, in the sense of programmable circuits (“predictable and programmable RNA-RNA interactions”), which “resemble” conventional electronic circuits, with the added feature that they “self-assemble” within cells and “sense incoming messages and respond to them by producing a particular computational output.” Furthermore, “these switches can be combined…to produce more complex logic gates capable of evaluating and responding to multiple outputs, just like a computer may take several variables and perform sequential operations like addition and subtraction in order to reach a final result.” Recall that, as Gallistel has long argued, being able to compute a number, store it, and use it for further computation is precisely the kind of neural computation we need to be cognitively adequate. We now know that cells have the chemical wherewithal to accomplish this reliably using little RNA circuits, and that this is actually quite easy for the cell to do (“The RNA-only approach to producing cellular nanodevices is a significant advance, as earlier efforts required the use of complex intermediaries, like proteins”).
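
For readers who want to see why "switches combined into logic gates" gets you to arithmetic, here is a minimal sketch in Python. It is ordinary textbook Boolean logic and in no way a model of the RNA circuits in the report: the function names are my own, and the only point is that a handful of gates suffices to compute a number, store it, and reuse it.

```python
# Illustrative sketch: composing simple gates into addition.
# Plain Boolean logic with hypothetical names; not a model of the RNA circuits.

def AND(a, b):
    return a & b

def OR(a, b):
    return a | b

def XOR(a, b):
    return a ^ b

def full_adder(a, b, carry_in):
    """Add three bits using only gates; return (sum_bit, carry_out)."""
    partial = XOR(a, b)
    sum_bit = XOR(partial, carry_in)
    carry_out = OR(AND(a, b), AND(partial, carry_in))
    return sum_bit, carry_out

def add(x, y, width=8):
    """Add two non-negative integers bit by bit, least significant bit first."""
    result, carry = 0, 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result  # any overflow beyond `width` bits is dropped

stored = add(5, 9)       # compute a number and store it ...
reused = add(stored, 3)  # ... then use it in a further computation
print(stored, reused)    # prints: 14 17
```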


So, the idea that cells can classically compute is true. It would be surprising if evolution developed an entirely novel computational procedure instead of exploiting the computational potential of readily available ones to get our cognitive capacities off the ground. This is possible (of course) but seems like a weird way to proceed if the ingredients for a standard kind of computation (symbolic) are already there for the taking. This is the point of the Gallistel-King conjecture, and to me, it seems like a very good one.

Wednesday, January 10, 2018

An update on Empiricism vs Rationalism debates

All papers involve at least two levels of discourse. The first comprises the thesis: what is being argued? What evidence is there for what is being argued? What’s the point of what is being argued? What follows from what is being argued? The second is less a matter of content than of style: how does the paper say what it says? This latter dimension most often reveals the (often more obscure) political dimensions of the discipline, and relates to topics like the following: What parts of one’s butt is it necessary to cover so as to avoid annoying the powers that be? Which powers is it necessary to flatter in order to get a hearing or to avoid the ruinous “high standards of scrutiny” that can always be deployed to block publication? Whose work must get cited and whose can be safely ignored? What are the parameters of the discipline’s Overton Window? If you’ve ever been lucky enough to promote an unfashionable position, you have developed a sensitivity to these “howish” concerns. And if you have not, you should. They are very important. Nothing reveals the core shibboleths (and concomitant power structures) of a discipline more usefully than a forensic inquiry into the nature of the eggshells upon which a paper is treading. In what follows I would like to do a little eggshell inspection in discussing a pretty good paper from a (for me) unexpected source (a bunch of Bayesians). The authors are Lake, Ullman, Tenenbaum and Gershman (LUTG), all from the BCS group at MIT. The paper (here) is about the next steps forward for an AI that aspires to be cognitively interesting (like the AI of old), and maybe even technologically cutting edge (though LUTG makes this second point very gingerly).

But that is not all the paper is about. LUTG is also a useful addition to the discussions about Empiricism (E) vs Rationalism (R) in the study of mind, though LUTG does not put matters in quite this way (IMO, for quasi-political reasons, recall “howish” concerns!). To locate LUTG on this axis will require some work on my part. Here goes.

As I’ve noted before, E and R have both a metaphysical and an epistemological side. Epistemologically, E takes minds to be initially (relatively) unstructured, with mental contour forged over time as a by-product of environmental input as sampled by the senses.  Minds on this view come to represent their environments more or less by copying their structures as manifest in the sensory input.  Successful minds are ones that effectively (faithfully and efficiently) copy the sensory input. As Gallistel & Matzel put it, for Es minds are “recapitulative,” their function being to reliably take “an input that is part of the training input, or similar to it, [to] evoke[s] the trained output, or an output similar to it” (see here for discussion and links). Another way of putting this point is that E minds are pattern matchers, whose job is to track the patternings evident in the input (see here). Pattern matching is recapitulative and E relies on the idea that cognition amounts to tracking the patternings in the sensory input to yield reliable patterns.

Coupled with this E theory of mind comes a metaphysics. Reality has a relatively flat causal structure. There is no rich hierarchy of causal mechanisms whose complex interactions lie behind what we experience. Rather, what you sense is all there is. Therefore, effectively tracking and cataloguing this experience and setting up the right I/O associations suffices to get at a decent representation of what exists. There is no hidden causal iceberg of which this is the sensory/perceptual tip. The I/O relations are all there is. Not surprisingly (this is what philosophers are paid for), the epistemology and the metaphysics fit snugly together.

R combines its epistemology and its metaphysics differently. Epistemologically, R regards sensory innocent minds as highly structured. The structure is there to allow minds to use sensory/perceptual information to suss out the (largely) hidden causal structures that produce these sensations/perceptions. As Gallistel & Matzel put it, R minds are more like information processing systems structured to enable minds to extract information about the unperceived causal structures of the environment that generate the observed patterns of sensation/perception. On the R view, minds sample the available sensory input to construct causal models of the world which generate the perceived sensory patterns. And in order to do this, minds come richly stocked with the cognitive wherewithal necessary to build such models. R epistemology takes it for granted that what one perceives vastly underdetermines what there is and takes it to be impossible to generate models of what there is from sensory perception without a big boost from given (i.e. unlearned) mental structures that make possible the relevant induction to the underlying causal mechanisms/models.

R metaphysics complements this epistemological view by assuming that the world is richly structured causally and that what we sense/perceive is a complex interaction effect of these more basic complexly interacting causal mechanisms. There are hidden powers etc. that lie behind what we have sensory access to and that in no way “resemble” (or “recapitulate”) the observables (see here for discussion).

Why do I rehearse these points yet again? Because I want to highlight a key feature of the E/R dynamic: what distinguishes E from R is not whether pattern matching is a legit mental operation (both views agree that it is (or at least, can be)). What distinguishes them is that Es think that this is all there is (and all there needs to be), while Rs reserve an important place for another kind of mental process, one that builds models of the underlying complex non visible causal systems that generate these patterns. In other words, what distinguishes E from R is the belief that there is more to mental life than tracking the statistical patterns of the inputs. R doesn’t deny that this is operative. R denies that this suffices. There needs to be more, a lot more.[1]

Given this backdrop, it is clear that GG is very much in the R tradition. The basic observation is that human linguistic facility requires knowledge of a G (a set of recursive rules). Gs are not perceptually visible (though their products often are). Further, cursory inspection of natural languages indicates that human Gs are quite complex and cannot be acquired solely by induction (even sophisticated discovery procedures that allowed for inductions over inductions over inductions…). Acquiring Gs requires some mental pre-packaging (aka, UG) that enables an LAD to construct a G on the basis of simple haphazard bits of G output (aka, PLD). Once acquired, Gs can be used to execute many different kinds of linguistic behaviors, including parsing and producing novel sentences and phrases. That’s the GG conceit in a small nutshell: human linguistic facility implicates Gs which implicates UGs, both G and UG being systems of knowledge that can be put to multiple uses, including G acquisition, and the production and comprehension of an unbounded number of very different novel linguistic expressions within a given natural language. G, UG, competence vs performance: the hallmarks of an R theory of linguistic minds.
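
For the non-linguists, here is a minimal sketch in Python of what "a set of recursive rules" buys you. The toy grammar, its three rules and its four-word vocabulary are hypothetical and chosen only for illustration; nobody thinks a real G looks like this. What matters is that one rule (VP → thinks S) re-introduces S, so the number of distinct sentences keeps growing as more embedding is allowed, although the rule set itself never changes.

```python
# Illustrative toy grammar; the rules and vocabulary are hypothetical.
import itertools

RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Mary"], ["John"]],
    "VP": [["left"], ["thinks", "S"]],  # the second option re-introduces S: recursion
}

def expand(symbol, depth):
    """Yield every word sequence derivable from `symbol` with at most `depth` nested S's."""
    if symbol not in RULES:
        yield [symbol]  # a terminal word
        return
    if symbol == "S":
        if depth == 0:
            return  # no more embedding allowed
        depth -= 1
    for option in RULES[symbol]:
        parts = [list(expand(child, depth)) for child in option]
        for combo in itertools.product(*parts):
            yield [word for part in combo for word in part]

def count_sentences(depth):
    """Count the distinct sentences available at a given bound on embedding."""
    return sum(1 for _ in expand("S", depth))

print([count_sentences(d) for d in range(1, 5)])  # prints: [2, 6, 14, 30]
```

Delete the recursive option and the count stays at 2 no matter how much depth you allow. Keep it, and a finite G yields an unbounded set of expressions, none of which need ever have shown up in the input.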

LUTG makes effectively these points but without much discussing the language case (it does at the end but mainly to say that it won’t discuss it). LUTG’s central critical claim is that human cognition relies on more than pattern matching. There is, in addition, model building, which relies on two kinds of given mental contents to guide learning. The first kind, what LUTG calls “core” (and an R would call “given”) involves substantive specific content (e.g. naïve physics and psychology). The second involves formal properties (e.g. compositionality and certain kinds of formal analogy (what LUTG calls “learning-to-learn”)).[2] LUTG notes that these two features pre-condition further learning and are indispensable. LUTG further notes that the two mental powers (i.e. model building and pattern recognition) can likely be fruitfully combined, though it (rightly, IMO) insists that model building is the central cognitive operation. In other words, LUTG recapitulates the observations above that R theories can incorporate E mechanisms and that E mechanisms alone are insufficient to model human cognition and that without the part left out, E pattern matching models are poor fits with what we have tons of evidence to be central features of cognitive life. Here’s how the abstract puts it:

We review progress in cognitive science suggesting that truly human-like learning and thinking machines will have to reach beyond current engineering trends in both what they learn and how they learn it. Specifically, we argue that these machines should (1) build causal models of the world that support explanation and understanding, rather than merely solving pattern recognition problems; (2) ground learning in intuitive theories of physics and psychology to support and enrich the knowledge that is learned; and (3) harness compositionality and learning-to-learn to rapidly acquire and generalize knowledge to new tasks and situations. We suggest concrete challenges and promising routes toward these goals that can combine the strengths of recent neural network advances with more structured cognitive models.

The paper usefully goes over these points in some detail. It further notes that this conception of the learning (I prefer the term “acquisition” myself) problem in cognition challenges associationist assumptions which LUTG observes (again rightly IMO) are characteristic of most work in connectionism, machine learning and deep learning. LUTG also points to ways that “model free methods” (aka pattern matching algorithms) might usefully supplement the model building cognitive basics to improve performance and implementation of model based knowledge (see 1.2 and 4.3.2).[3]

Section 3 is the heart of the critique of contemporary AI, which largely ignores the (R) model building ethos that LUTG champions. As it observes, the main fact about humans when compared with current AI models is that humans “learn a lot more from a lot less” (6). How much more? Well, human learning is very flexible and rarely tied to the specifics of the learning situation. LUTG provides a nice discussion of this in the domain of learning written characters. Further, human learning generally requires very little “input.” Often, a couple of minutes or exposures to the relevant stimulus is more than enough. In fact, in contrast to current machine learning or DL systems humans do not need to have the input curated and organized (or, as Geoff Hinton recently put it (here): humans, as opposed to DLers, “clearly don’t need all the labeled data.”). And why not? Because unlike what DLers and connectionists and associationists have forever assumed, human minds (and hence brains) are not E devices but R ones. Or as LUTG puts it (9):

People never start completely from scratch, or even close to “from scratch,” and that is the secret to their success. The challenge of building models of human learning and thinking then becomes: How do we bring to bear rich prior knowledge to learn new tasks and solve new problems so quickly? What form does that prior knowledge take, and how is it constructed, from some combination of inbuilt capacities and previous experience?

In short, the blank tablet assumption endemic to E conceptions of mind is fundamentally misguided and a (the?) central problem of cognitive psychology is to figure out what is given and how this given stuff is used. Amen (and about time)!

Section 4 goes over what LUTG takes to be some core powers of human minds. This includes naïve theories of how the physical world functions and how animate agents operate. In addition, with a nod to Fodor, it outlines the critical role of compositionality in allowing for human cognitive productivity.[4] It is a nice discussion and makes useful comments on causal models and their relation to generative ones (section 4.2.2).

Now let’s note a few eggshells. A central recurring feature of the LUTG discussion is the observation that it is unclear how or whether current DL approaches might integrate these necessary mechanisms. The paper does not come right out and say that DL models will have a hard time with these without radically changing their sub-symbolic associationist pattern matching monomania, but it strongly suggests this. Here’s a taste of this recurring theme (and please note the R overtones).

As in the case of intuitive physics, the success that generic networks will have in capturing intuitive psychological reasoning will depend in part on the representations humans use. Although deep networks have not yet been applied to scenarios involving theory of mind and intuitive psychology, they could probably learn visual cues, heuristics, and summary statistics of a scene that happens to involve agents. If that is all that underlies human psychological reasoning, a data-driven deep learning approach can likely find success in this domain.

However, it seems to us that any full formal account of intuitive psychological reasoning needs to include representations of agency, goals, efficiency, and reciprocal relations. As with objects and forces, it is unclear whether a complete representation of these concepts (agents, goals, etc.) could emerge from deep neural networks trained in a purely predictive capacity. Similar to the intuitive physics domain, it is possible that with a tremendous number of training trajectories in a variety of scenarios, deep learning techniques could approximate the reasoning found in infancy even without learning anything about goal-directed or socially directed behavior more generally. But this is also unlikely to resemble how humans learn, understand, and apply intuitive psychology unless the concepts are genuine. In the same way that altering the setting of a scene or the target of inference in a physics-related task may be difficult to generalize without an understanding of objects, altering the setting of an agent or their goals and beliefs is difficult to reason about without understanding intuitive psychology.

Yup. Right on. But why so tentative? “It is unclear?” Nope, it is very clear, and has been for about 300 years since these issues were first extensively discussed. And we all know why. Because concepts like “agent,” “goal,” “cause,” “force,” are not observables and not reducible to observables. So, if they play central roles in our cognitive models then pattern matching algorithms won’t suffice. But as this is all that E DL systems countenance, then DL is not enough. This is the correct point, and LUTG makes it. But note the hesitancy with which it does so. Is it too much to think that this likely reflects the issue mooted at the outset about the implicit politics that one’s dance over the eggshells reveals?

It is not hard to see from how LUTG makes its very reasonable case that it is a bit nervous about DL (the current star of AI). LUTG is rhetorically covering its posterior while (correctly) noting that unreconstructed DL will never make the grade. The same wariness makes it impossible for LUTG to acknowledge its great debt to R predecessors.[5] As LUTG states, its “goal is to build on their [neural networks, NH] successes rather than dwell on their shortcomings” (2). But those who always look forward and never back won’t move forward particularly well either (think Obama and the financial crisis). Understanding that E is deeply inadequate is a prerequisite for moving forward. It is no service to be mealy-mouthed about this. One does not finesse one’s way around very influential bad ideas.

Ok, so I have a few reservations about how LUTG makes its basic points. That said, this is a very useful paper. It is nice to see this coming out of the very influential Bayesian group at MIT and in a prominent place like B&BS. I am hoping that it indicates that the pendulum is swinging away from E and towards a more reasonable R conception of minds. As I’ve noted, the analogies with standard GG practice are hard to miss. In addition, LUTG rightly points to the shortcomings of connectionist/deep learning/neural net approaches to mental life. This is good. It may not be news to many of us, but if this signals a return to R conceptions of mind, it is a very positive step in the right direction.[6]


[1] R often goes farther: that even tracking the relevant perceptual regularities requires lots of given mental baggage. The world does not come pre-labeled. So zeroing in on the relevant dimensions for inductive generalization itself requires lots of pre-packaged “knowledge.” Geoff Hinton (the daddy of deep learning) seems to have come to a similar view of late concerning how hard it is to get things off the ground without curated data. See below for a reference.
[2] GGers should find this reminiscent of Chomsky’s discussion in Aspects of substantive and formal universals and how these interact to create Gs (models) of the ambient degenerate and deficient linguistic input available to the child (PLD).
[3] Or to put this in GG friendly terms: LUTG resurrects something very like a competence/performance distinction, model building being analogous to the former and model free methods applied to these models being analogous to the latter. An idea I found interesting is that given models of competence provide a useful domain for performance models wherein sophisticated pattern matching algorithms do their work. Conceptually, this idea seems very similar to what Charles Yang has been advocating as well (see here).
[4] Actually, though compositionality is critical to productivity, it does not suffice for generating an “infinite number of thoughts” or using an “infinite number of sentences” “from a finite set of primitives” (14). For this we need more than compositionality; we need recursion as well.
[5] It is amazing, IMO, how small a role Chomsky plays in LUTG’s discussion. So far as I can tell, all of its major points were developed in the modern period by him. I am pretty sure that some of the authors know this but that highlighting this fact would hurt the paper politically by offending the relevant leading DL lights.
[6] BTW, LUTG has a nice discussion of the biological “plausibility” of neural net models of the brain. The short point is that short of being pictorially suggestive, there is no real reason for thinking that brains are like connectionist nets. As LUTG puts it (20):

Many seemingly well-accepted ideas concerning neural computation are in fact biologically dubious, or uncertain at best…

For example, most neural networks use some form of gradient based (e.g. backpropagation) or Hebbian learning. It has long been argued, however, that backpropagation is not biologically plausible. As Crick (1989) famously pointed out, backpropagation seems to require that information be transmitted backward along the axon, which does not fit with realistic models of neuronal function…

As LUTG observes, this should not in and of itself stop people from investigating neural nets as possible models of brain computation, but it should put an end to the prejudice that brains are nets because they look net-like. Sheesh!

Thursday, January 4, 2018

Phi features, binding, and A-positions

Preface

This post continues a theme started here and here. Broadly, this series of posts is an attempt to highlight the daylight that exists between syntax and semantics.

I have several motivations for writing these posts. First, writing them, and reading & replying to comments, really helps me sharpen my own thinking on the issues. (Whether I’m convincing anyone but myself is a separate matter, of course.) Additionally, though, it is my impression that when it comes to the syntax-semantics mapping, the working assumption that the mapping in question is transparent – a wholly legitimate research heuristic, of course – is in practice often elevated to the status of ontological principle. This, in turn, licenses potentially problematic inferences about syntax. And it is these cases that I wish to highlight.

I hasten to add that I’m not sure there’s anything different in kind here from what goes on in any other “interface” work. That is, I don’t mean to impugn syntax-semantics work in particular (as opposed to, say, syntax-morphology work or whatever else). It’s just that the particular syntax-semantics inferences I’m talking about are ones that I often bump up against in my own work, and I often get the feeling that they are accorded the status of “established truths” – which places the burden of proof on any proposal that would contradict them. It’s this view that I’d like to challenge here.

Finally, for interesting discussions pertaining to the substance of this post in particular, I’d like to thank Amy Rose Deal – who should not, of course, be held responsible for any of its contents; in fact I’m fairly sure she would disagree!

Okay, let’s get to it...

––––––––––––––––––––

What is an “A-position”? Originally, the ‘A’ was supposed to be a mnemonic for “Argument” – the idea being that an A-position is any position that could, in principle, introduce arguments. A particular set of properties was then shown to correlate with being in, or moving to, an A-position. Most important for our current purposes are the binding-related ones: A-positions were the positions from which one could antecede novel binding dependencies. Hence the well-known kind of asymmetry between (1a) and (1b):

Some distractions

Here are three very short pieces that might amuse you.

The first (here) discusses the complex ways that birds cooperate while singing to enhance their partner's responses. This is pretty sophisticated behavior and it strikes me as having a more than passing resemblance to turn taking activity in cooperative conversation. If this analogy is on the right track, then it is a case where something that we find in human language use has analogues in other species. Note that, so far as we can tell, cooperation of this sort does not endow the cooperators with anything like unbounded hierarchical syntax of the kind found in human language. Which just goes to show (if this were needed) that the fact that communication can be socially directed and involves cooperation does not suffice to explain its formal properties. I am sure you did not need reminding of this, though there are some who still suggest that ultimately such forms of cooperation will get one all the way to recursive syntax.

Here's another piece on plant cognition, this time decision making. Their strategic thinking is quite striking, with plants suiting their responses to the strategic options available to them. Their "behavior" is very context sensitive and it appears that they maximize their access to light using several different strategies appropriately. How they do this is unclear, but that they do it seems well established. As Michael Gruntman, one of the researchers, noted: "Such an ability to choose between different responses according to their outcome could be particularly important in heterogeneous environments, where plants can grow under neighbors with different size, age, or density, and should therefore be able to choose their appropriate strategy." And all without brains.

The third piece (here) is a spot by Gelman. It more or less speaks for itself but it usefully makes the point again that stats without theory usually produces junk. We cannot repeat this often enough, especially given his observation that this message has not filtered through to the professionals that use the statistical machinery.