Friday, March 30, 2018

Three varieties of theoretical research

As FoLers know, I do not believe that linguists (or at least syntacticians) highly prize theoretical work. Just the opposite, in fact. This, IMO, is why the field has tolerated (rather than embraced) the minimalist project (MP) and why so many professionals believe MP to have been largely a failure despite what (again, IMO) are its evident overall successes. I have argued this at length before, so I will not do so again here. Rather, I would like to report on an interesting paper that I have just re-read that tries to elucidate three distinct kinds of theoretical work. The paper is an old one (published in 2000). It’s called “Thinking about Mechanisms” and the authors, three philosophers, are Peter Machamer, Lindley Darden and Carl Craver (MDC). Here is a link. The paper concentrates on elucidating the notion of a mechanism and argues that it is the key explanatory notion within neurobiology and molecular genetics. The discussion is interesting and I recommend it. In what follows, I would like to pick out some points, more or less at random, that MDC makes and relate them to linguistic theorizing. This, I hope, will encourage others to look more kindly on theoretical work.

MDC defines the notion of a mechanism as follows:

Mechanisms are entities and activities organized such that they are productive of regular changes from start or set-up to finish or termination conditions… To give a description of a mechanism for a phenomenon is to explain that phenomenon, i.e. to explain how it was produced. (3)

So, mechanisms are theoretical constructs whose features (the “entities,” their “properties,” and the “activities” they partake in) explain how phenomena of interest arise. So mechanisms produce phenomena (in biology, in real time) in virtue of the properties of their parts and the activities they engender. [1]

MDC divides a mechanistic description into three parts: (i) Set-up Conditions, (ii) Termination Conditions, and (iii) Intermediate Activities.

The first, set-up conditions, are “idealized descriptions” of the beginning of the mechanism. Termination conditions are “idealized states or parameters describing a privileged endpoint.” The intermediate steps provide an account of how one gets from the initial set-up to the termination conditions, which describe the phenomenon of interest. (11-12)

This should all sound vaguely familiar. To me it sounds very much like what linguists do in providing a grammatical derivation of a sentence of interest. We start with an initial structure (e.g. a D(eep) S(tructure) representation) and explain some feature of a sentence (e.g. why the syntactic subject is interpreted as a thematic object) by showing how various operations (i.e. transformations) lead from the initial state to the termination state. Doing this explains why the sentence of interest has the properties to be explained. Indeed, it has them in virtue of being the endpoint of the licit derivation provided.
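To make the parallel vivid, here is a deliberately toy sketch in Python of a derivation as an MDC-style mechanism: set-up conditions, intermediate activities, and termination conditions. The representation (a flat word list) and the single "transformation" are invented for exposition; this is a cartoon of the passive example above, not a serious implementation of any grammar formalism.

```python
# Toy illustration only: a "derivation" as set-up conditions,
# intermediate activities (transformations), and termination conditions.
# The flat-list representation and simplified passive example are invented.

setup = ["was", "expected", "John", "to", "win"]  # crude DS: thematic object in base position

def passivize_raise(structure):
    """Toy 'transformation': move the thematic object to subject position."""
    rest = [w for w in structure if w != "John"]
    return ["John"] + rest  # leaves the copy/trace implicit

def derive(setup, activities):
    """Run the intermediate activities from set-up to termination conditions."""
    stages = [setup]
    for act in activities:
        stages.append(act(stages[-1]))
    return stages

stages = derive(setup, [passivize_raise])
print(stages[0])   # set-up conditions
print(stages[-1])  # termination conditions: "John was expected to win"
```

The point of the cartoon is only that the explanatory object is the whole path from set-up to termination, not the end state alone.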

Note too that both mechanisms and generative procedures (GPs) focus on idealized situations. GPs describe the linguistic competence of an ideal speaker-hearer or the FL of an idealized LAD. So too with biological mechanisms. They describe idealized hearts or kidneys or electrical conduction at a synapse. Actual instances are not identical to these, though they function in the same ways (it is hoped). No two hearts are the same, yet every idealized heart is identical to any other.

This said, derivations are not actually mechanisms in MDC’s sense, for they do not operate in real time (unlike the ones biologists typically describe, e.g. synaptic transmission or protein synthesis). However, generative procedures (GPs) are the “mechanisms” of interest within GG, for it is (at least in large part) in virtue of the properties of GPs that we explain why native speakers judge the linguistic objects in their native languages as they do and why Gs have the properties they have. Furthermore, as in biology, the aim of linguistics is to elucidate the basic properties of GPs and to explain why they have the properties they have and not others. So, GPs in linguistics are analogous to mechanisms in other parts of biology. Phenomena are interesting exactly to the degree that they serve to shine light on the fine structure of mechanisms in biology. Ditto with GPs in linguistics.

MDC notes that a decent way to write a history of biology is to trace out the history of its mechanisms. I cannot say whether this is so for the rest of biology, but as regards linguistics, there are many worse ways of tracing the history of modern GG than by outlining how the notion of GP has evolved over the last 60 years. There is a reasonable argument to be made (and I have tried to make it; see here and the following four posts) that the core understanding of GP has become simpler and more general in this period, and that the Minimalist Program is a conservative extension of prior work describing the core properties of a human linguistic GP. Not surprisingly, this has analogues in the other kinds of biological theorizing MDC discusses.

So, the core explanatory construct in biology according to MDC is the mechanism. As MDC puts it: “…a mechanistic explanation…renders a phenomenon intelligible…Intelligibility arises not from an explanation’s correctness, but rather from an elucidative relation between the explanans (the set-up conditions and the intermediate entities and activities) and the explanandum (the termination condition or the phenomenon to be explained)” (21).

MDC is at pains to point out that this “elucidative relation” holds regardless of the accuracy of the description. So explanatory potential is independent of truth, and what theorizing aims at is theories with such potential. Explanatory potential relies on elucidating how something could work, not how it does. The gap between possibility and actuality is critical for the theoretical enterprise. It’s what allows it a certain degree of autonomy.

For such autonomy to be possible it is critical to appreciate that explanatory potential (what I have elsewhere called “oomph”) is not reducible to regularity of behavior. Again MDC (21-22):

We should not be tempted to follow Hume and later logical empiricists into thinking that the intelligibility of activities (or mechanisms) is reducible to their regularity. Descriptions of mechanisms render the end stage intelligible by showing how it is produced by bottom-out entities and activities. To explain is not merely to redescribe one regularity as a series of several. Rather, explanation involves revealing the productive relation. It is the unwinding, bonding, and breaking that explain protein synthesis; it is the binding, bending, and opening that explain the activity of Na+ channels. It is not the regularities that explain the activities but the activities that sustain the regularities.

In other words, mechanisms are not (statistical) summaries of what something regularly does. Regularities/summaries do not (and cannot) explain, and as mechanisms aim to explain they must be more than such summaries no matter how regular. Mechanisms outline how a phenomenon has (or could have) arisen, and this requires outlining the structures and principles that mechanisms deploy to “generate” the phenomenon of interest.[2]

Importantly, it is the relative independence of explanatory potential from truth that allows theory to have an independent existence. MDC suggests three different grades of theoretical involvement, summed up in three related but different questions: How possibly? How plausibly? How actually? Let me elaborate.

Explanations are hard. They are hard precisely because they must go beyond recapitulating the phenomenon of interest. Finding the right concepts and putting them together in the right way can be demanding. Here is an example of what I mean (see here for an earlier discussion).

The Minimalist Program (MP) has largely ignored ECP effects of the argument/adjunct asymmetry variety. Why so? I would contend it is because it is quite unclear how to understand these effects in MP terms. In this respect ECP effects contrast with island effects. There are MP-compatible versions of the latter, largely recapitulating earlier versions of Subjacency Theory. IMO, such accounts are not particularly elegant, nor particularly insightful. However, it is possible to pretty directly trade bounding nodes for phases, escape hatches for phase edges, and Subjacency Principles for Phase Impenetrability Conditions in a largely one-for-one swap, and thereby end up with a theory no worse than the older GB stories but cast in an acceptable MP idiom. This does not constitute a great theoretical leap forward (and so, if this is correct, for these phenomena thinking a la MP does not deepen our understanding), but at least it is clear how island effects could hold within an MP-style conception of G. They reduce to Subjacency Effects, albeit with all the parts suitably renamed. In other words, the theoretical and conceptual resources of MP are adequate to recapitulate (if not much illuminate) those of earlier GB.

This is not so for ECP effects. Why not? Well, for several reasons, but the two big ones are that the ECP is a trace-licensing condition and that the technology behind it appears to run afoul of inclusiveness. Let’s discuss each point in turn.

The big idea behind the ECP is that traces are grammatically toxic unless tamed. They can be tamed by being marked (gamma-marked) by a local antecedent in the course of the derivation. The distinction between arguments and adjuncts arises from the assumption that argument A’-chains can be reduced, thereby eliminating any –gamma-marked traces they contain and so not cancelling the derivation at LF (recall, –gamma-marked expressions kill a derivation). So, traces are toxic, +gamma-marking tames them, and deletion acts differently for adjuncts and arguments, which is why the former are more restricted than the latter. This, plus a kind of uniformity principle on chains (not a great or intuitive principle IMO, but maybe this is just me) that invidiously distinguishes adjunct from argument chains,[3] yields the desired empirical payoff.
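For readers who like the bookkeeping laid out, here is a toy Python sketch of the logic just described: traces are toxic unless +gamma-marked, argument traces may be deleted by chain reduction, and any surviving –gamma trace cancels the derivation at LF. The encoding is invented; this is a cartoon of the Lasnik-Saito-style system sketched above, not anyone’s actual proposal.

```python
# Cartoon of the gamma-marking logic described above (invented representation).
# A trace is a pair (kind, gamma) where kind is "arg" or "adj"
# and gamma is "+" (licensed) or "-" (unlicensed).

def lf_converges(traces):
    """A derivation converges iff no -gamma trace survives to LF.
    Argument traces are deletable (chain reduction); adjunct traces are not."""
    surviving = [t for t in traces if t[0] == "adj"]  # arg traces get deleted
    return all(gamma == "+" for _, gamma in surviving)

# Argument extraction from an island: a -gamma arg trace, but it is deletable,
# so the derivation converges (degraded, but not an ECP violation).
print(lf_converges([("arg", "-")]))   # True
# Adjunct extraction from an island: the -gamma adj trace survives to LF,
# cancelling the derivation (the classic ECP effect).
print(lf_converges([("adj", "-")]))   # False
```

The cartoon makes the later point concrete too: the system works, but it works by marking and deleting, which is exactly what MP assumptions frown on.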

Given the complexity of the ECP data, this is an achievement. Whether it constitutes much of an explanation is something people can disagree about. However, whatever its value, it runs afoul of what appear to be basic MP assumptions. For example, MP eschews traces, hence there is little conceptual place for a module of the grammar whose job it is to license them. Second, MP derivations bar adding little diacritics to expressions in the course of a derivation. If indices are technicalia non grata, what are we to make of +/–gamma marks? Last, MP derivations are taken to be monotonic (No Tampering), and hence frown upon operations that delete information on the “LF” side of a derivation. But deleting –gamma-marked traces is what “explains” the argument/adjunct difference. So, the standard GB story doesn’t really fit with basic MP assumptions, and this makes it fruitful to ask how ECP effects could possibly be modeled in MP-style accounts. And this is a job for theorists: to come up with a story that could fit, to find the right combination of MP-compatible concepts that would yield roughly the right empirical outcomes.[4] The theoretical challenge, then, in the first instance, is to show how ECP effects could possibly fit into an MP setting, given the elimination of traces and a commitment to derivational monotonicity.

There are additional why-questions out there begging for how-possibly scenarios: e.g. Why case? Why are phrase markers organized so that theta domains are within case/agreement domains, which are within A’ (information structure) domains? Why are reflexivization and pronominalization in complementary distribution? Why are selection and subcategorization so local? I could go on and on. These why-questions are hard not because we have tons of possible explanatory options but cannot figure out which one to run with, but because we have few candidate theories to run with at all. And that is a theoretical challenge, not just an empirical one. It’s in situations like these that how-possibly becomes a pressing and interesting issue. Sadly, it is also something that many working syntacticians barely attend to.

MDC notes a second level of theoretical involvement: how plausible a certain possible story is. Clearly, asking this requires having a how-possibly scenario or two sketched out. It is tempting to think that plausibility is largely a matter of empirical coverage. But I would like to suggest otherwise. IMO, plausibility is evaluated along two dimensions: how well the novel theory covers the older (gross) empirical terrain and how many novel lines of inquiry it prompts. A theory is plausible to the degree that it largely conserves the results and empirical coverage of prior theory (what one might call the “stylized facts”) and to the degree that it successfully explains things that earlier theory left stipulative.

Clearly, plausibility is more demanding than possibility. Plausible theories not only explain, but have verisimilitude (we think that they have a decent chance of being correct). What are the marks of a plausible account? Well, they cover roughly the same empirical territory as the theory they replace and they explain what the earlier theory stipulated. Here are a couple of examples.

I believe that movement theories of binding and control are plausible precisely because they are able to explain why Obligatory Control (OC) and reflexivization have many of the properties they do. For example, we typically find neither in the subject position of finite clauses (e.g. John expects PRO to/*will win, John expects him(he)self to/*will win). Why not? Well, if the movement theory is right, then they are parts of A-chains and so should pattern like what we find in analogous raising constructions (e.g. John was expected t to/*would win), and they do. So the movement theory derives what is largely stipulated in earlier accounts and exposes as systematically related what earlier theory treated as coincidental (that finite subject positions don’t allow PRO, reflexives or A-traces). Does this make such accounts true? Nope. But it does enhance their plausibility. Thus, being able to unify these disparate phenomena and provide principled explanations for the distribution of OC PRO and for the relative paucity of nominative reflexives enhances their claims on truth.

Note that here plausibility hinges on (1) accepting that prior accounts are roughly descriptively accurate (i.e. doing what decent science always does: building on past work and insight) and (2) explaining their stipulated features in a principled way. When a story has these two features it moves from possible to plausible. Of course, demonstrating plausibility is not trivial, and what some consider plausibility-enhancing others will find wanting. But that is as it should be. The point is not that theorizing is dispositive (nothing is) but that it strives for goals different from empirical coverage (and this is not intended to disparage the latter).

Let me put this another way. When one has a possible explanation in hand, it is time to start looking for evidence in its favor. In other words, rather than looking for ways to reject the account, one looks for reasons to accept it as a serious one. Trying to falsify (i.e. rigorously test) a proposal has its place, but so does looking for support. However, trying to falsify a merely possible theory is premature. What one should test are the plausible ones, and that means finding ways to elevate the possible to a higher epistemological plane: the territory of the plausible. That’s what how-plausibly theory aims to do: find the fit between something that is possible and what has come before, and show that the new possible story is a fecund extension of the old. It is an extension in that it covers much of the same territory. It is fecund in that it improves on what came before. This kind of theorizing is also hard to pull off, but like how-possibly theory, it relies heavily on theoretical imagination.

Which brings us to how-actually investigations. This is where theory and data really meet and where something bearing a family resemblance to falsification comes into play. Say we have a plausible theory; the next step is to tease out ways of testing its central assumptions. This, no doubt, sounds obvious. But common practice suggests otherwise. Much of what goes on in my little area of linguistics fails to test central postulates and largely concentrates on seeing how to fit current theoretical conceptions to available data (e.g. how to apply a Probe-Goal account to some configuration of agreement/case data). There is nothing wrong with this, of course. But it is not quite “testing” the theory in the sense of isolating its central premises/concepts and seeing how they fly. Let me give you an example.

I personally know of very few critical tests driven by thoughtful theorizing. But I do know of one: the Aoun/Choueiri account of reconstruction effects (RE). The reasoning is as follows: if REs are reflections of the copy theory of movement (as every good Minimalist believes), then where there is no movement, there should be no reconstruction (notice movement is a necessary, not a sufficient, condition for RE). There is no movement from islands, therefore there should be no RE within islands. Aoun/Choueiri then goes on to argue that resumption in Lebanese Arabic is a movement dependency (Demirdache argued this first, I believe) and that whereas REs are available when an antecedent binds a resumptive outside an island, they systematically fail to arise with resumptives inside islands. This argues for two central conclusions: (i) that REs are indeed parasitic on movement and (ii) that resumption is a movement dependency. This vindicates the copy theory, and with it a central precept of MP.

For now, forget about whether Aoun/Choueiri is right about the facts.[5] The important point here is the logic. The test is interesting because it very clearly implicates key features of current theory: the copy theory of movement, islands as restrictions on movement, and REs as piggybacking on copies. These are three central features and the argument, if correct, tests them. And this is interesting precisely because they are central ideas in any MP-style account. Moreover, it is very clear how the premises bear on the testable conclusion.[6] They can be laid out (that’s where theory comes in, BTW: in laying out the premises and showing how together they have certain testable consequences) and a prediction squeezed from them. Moreover, the premises, as noted, are theoretically robust. The Copy Theory of Movement is a core feature of MP architectures, and locality conditions such as islands are central parts of any reasonable GG theory of syntax. Hence if these came apart it would indicate something seriously amiss with how we conceptualize the fundamentals of FL/UG. And that is what makes the Aoun/Choueiri argument impressive.
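Since the premises really can be laid out, here is the deductive shape of the Aoun/Choueiri argument as a toy Python sketch. Everything here is an invented encoding of just the logic of the premises; none of the Lebanese Arabic data or analysis is represented.

```python
# The Aoun/Choueiri deduction as a toy (invented encoding of the premises).
# Premise 1: reconstruction effects (REs) require movement (copies).
# Premise 2: there is no movement out of islands.
# Premise 3 (Aoun/Choueiri): resumption in Lebanese Arabic is a movement
#   dependency, so resumptives fall under premises 1 and 2.

def movement_possible(inside_island):
    """Premise 2: islands block movement."""
    return not inside_island

def reconstruction_predicted(inside_island):
    """Premise 1: RE only where movement could have created a copy."""
    return movement_possible(inside_island)

print(reconstruction_predicted(inside_island=False))  # True: REs available
print(reconstruction_predicted(inside_island=True))   # False: REs should fail
```

Trivial as the code is, that is the point: the prediction follows mechanically once the premises are stated, which is what makes the test a test of the premises themselves.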

Like I said, I personally know of only a couple of cases like this. What makes this one useful here is that it illustrates how to successfully do how-actually theory (i.e. it is a paradigm case of how-actually theoretical practice): find consequences of core conceptions and use them to test the core ideas. We all know that most of what we believe today is likely wrong, at least in detail. Knowing this, however, does not mean that we cannot test the core features of our accounts. But this requires determining what is central (which requires theoretical evaluation and judicious imagination) and figuring out how to tease consequences from it (which requires analytical acumen). In testing a proposal to see how things actually stand, we need to lead from theory to data, and this means thinking theoretically, respecting the deductive structure that makes a theory the theory it is.

How possibly, how plausibly, how actually: three grades of theoretical involvement. All are useful. All require attention to the deductive structure of the core ideas that constitute theory. All start with these ideas and move outwards towards the phenomena that, correctly used, can help us refine and improve them. Right now, theoretical work is largely absent from the discipline, at least of the how-possibly and how-plausibly variety. Even the how-actually kind is far less common than commonly supposed.


[1] MDC distinguishes “substantivalist” and “process” ontologists wrt their different understandings of mechanism. The difference appears to reside in whether mechanisms comprise both “entities” and “activities” or whether activities alone suffice. MDC takes it as obvious that reducing entities to activities is hopeless (“As far as we know, there are no activities in …biology…that are not activities of entities” (5)). I mention this because it is redolent of the current discussion on FoL (Idsardi and Raimy discussing Hale and Reiss) concerning substance-free phonology. It is curious that the same kind of discussion takes place in a very different venue, and it is worth taking a look at it in this domain to gain leverage on the one in ours.
[2] As an old friend (Louise Antony) once remarked: in answer to a question like “why did this book drop to the floor when I let go?” it is not helpful to answer that “it always drops whenever anyone lets go.”
[3] I am sure it is not news that the distinction is not accurately described in terms of arguments and adjuncts. But for the record, the absence of pair-list readings of WHs extracted from weak islands seems to show the same acceptability profile as adjuncts even though the WH moves from the complement position. The difference seems to be less argument/adjunct and more individual-variable vs. higher-level-variable interpretation.
[4] FWIW, I think that this is where a minimality-style explanation of the Rizzi-Cinque variety might be a better fit than the Lasnik-Saito/Barriers approach. But even this story needs some detailed reworking.
[5] There is some evidence from Jordanian Arabic contradicting it, though I am not sure whether I believe it yet. Of course, you can take what I believe and still need the full fare of about $2 to get a metro ride in DC.
[6] I often find it surprising how few papers of a purported theoretical nature actually set out their premises clearly and deduce the conclusion of interest. More often, we use theory like putty, smearing it on our favorite empirical findings to see if, copiously applied, it can hold the data together. Though this method can yield interesting results, it is not theory-driven and generally fails to address an identifiable theoretical question.
 

Thursday, March 29, 2018

Imagine no substantive possessions

Bill Idsardi & Eric Raimy
 

Let’s return now to the beginning of the exposition. Reiss 2016:1 starts out with a Lennon-Ono riff ( https://www.rollingstone.com/music/news/yoko-ono-added-as-songwriter-on-john-lennons-imagine-w488104 , thanks Karthik!):
“Imagine a theory of phonology that makes no reference to well-formedness, repair, contrast, typology, variation, language change, markedness, ‘child phonology’, faithfulness, constraints, phonotactics, articulatory or acoustic phonetics, or speech perception.”
(I wonder if you can.) Having excluded all of this stuff, he wants to argue “that something remains that is worthy of the name ‘phonology’.” Unless he’s using these terms in ways we don’t understand, there would seem to be no substance left at all, as the resulting phonology can’t make reference to the motor and perceptual interfaces, and any statements about precedence relations (phonotactics) are excluded. We’re also puzzled about how one can construct a formal system without employing any well-formedness conditions (axioms). And a theory without any substance is not a theory of anything. Similarly, an interface that doesn’t effectively transmit any information between two modules is not an interface but the lack of one.

Put in Marrian terms (Marr 1982, you knew this was in the cards when you started reading), there have to be some linking hypotheses between the computational, algorithmic and implementational levels (Marr p. 330: "the real power of the approach lies in the integration of all three levels of attack", emphasis added), and there must be reasonable interfaces, which include compatibility in data structures between any connecting sub-modules contributing to the overall solution of the problem (e.g. the different visual coordinate systems, Marr 1982:317ff).

Max Papillon tells me [wji] that I’m misreading all this, and that I’m not the target audience anyway. (I do get it that I’m not considered much of a phonologist these days, the basis of a long-running joke in the Maryland department.) Perhaps, then, this is all a Feyerabend 1975-ish move, providing an ascetic formalist tonic to the hedonistic excess of substance, as Feyerabend 1978:127 explains in one reply to a book review (in the section called “Conversations with Illiterates”):
“I do not say that epistemology should become anarchic or that the philosophy of science should become anarchic. I say that both disciplines should receive anarchism as medicine. Epistemology is sick, it must be cured, and the medicine is anarchy. Now medicine is not something one takes all the time. One takes it for a certain period of time, and then one stops.” (emphasis in original)
Strengthening the comparison with Feyerabend, I recall Joe Pater’s comment to Mark Hale after Mark’s talk at the MIT Phonology 2000 conference: “I know what you are. You’re a philosopher!” The Feyerabend analogy is how I understood the Hale & Reiss 2000 charge of “substance abuse”: there’s too much appeal to substance, and this should be reduced (take your medicine). As a methodological maxim, "Reduce Substance!", I'm all on board. But let's not confuse ourselves into thinking that all reference to substance can be completely eliminated, for the theory has to be about something.

Fortunately, we think there are relatively concrete proposals to be made that start right where Chomsky suggests, with features and precedence. Our proposal (tune in next time) can be read as Raimy 2000 on steroids, with dollops of Avery & Idsardi 1999, Poeppel & Idsardi 2012 and Kazanina, Bowers & Idsardi 2017. (Do people take steroids with dollops of anything? Maybe with those articles as a chaser? Sorry for the mixed metaphors.)

Let’s sum up here with an attempt at our understanding of what substance and substantive should mean in the context of developing modular theories for complicated things like speech and vision. Entities or relations in the model are substantive to the degree that they do explanatory work within the model and have lawful connections across the interfaces to entities and relations in other modules. Such things are the substance of the theory. Entities or relations in the model that do not have such lawful connections are the (purely) formal or non-substantive things (we will suggest some). But this can be a hard matter to establish in any particular case, for the lawful connections will tend to be partial rather than total. The bumper sticker version of all this is “substantive = veridical and useful”.

Next time: Swifties

Tuesday, March 27, 2018

Beyond Epistodome

Note: This post is NOT by Norbert. It's by Bill Idsardi and Eric Raimy. This is the first in a series of posts discussing the Substance Free Phonology (SFP) program, and phonological topics more generally.

Bill Idsardi and Eric Raimy

Before the beginning

For and Against Method is a fascinating book, documenting the correspondence between Paul Feyerabend and Imre Lakatos in the years just before Lakatos died. Feyerabend proposed a series of exchanges with Lakatos, with Lakatos explicating his Methodology of Scientific Research Programmes and Feyerabend taking the other side, making the arguments that became Against Method. We’ll try something similar here, on the Faculty of Language blog, relating to the question of substance in phonology.

Beyond Epistodome

Note: not “Epistemodome”, because phonology cares not one whit about etymology. Over severalteen posts we will consider the Substance Free Phonology (SFP) program outlined by Charles Reiss and Mark Hale in a number of publications, especially Hale & Reiss 2000, 2008 and Reiss 2016, 2017. We will be concentrating mainly on Reiss 2016 ( http://ling.auf.net/lingbuzz/003087/current.pdf ). Although we agree with many of their proposals, we reject almost all of the rationales they offer for them. Because that’s such an unusual combination of views, we thought that this would be a useful forum for discussion. (And we don’t think any journal would want to publish something like this anyway.) Because this is the Faculty of Language blog (FLog? FoLog? vote in the comments!), we will start with a reading. Today’s reading is from the book of LGB, chapter 1, page 10 (Chomsky 1981):
“In the general case of theory construction, the primitive basis can be selected in any number of ways, so long as the condition of definability is met, perhaps subject to conditions of simplicity of some sort. [fn 12: See Goodman (1951).] But in the case of UG, other considerations enter. The primitive basis must meet a condition of epistemological priority. That is, still assuming the idealization to instantaneous language acquisition, we want the primitives to be concepts that can plausibly be assumed to provide a preliminary, pre-linguistic analysis of a reasonable selection of presented data, that is, to provide the primary linguistic data that are mapped by the language faculty to a grammar; relaxing the idealization to permit transitional stages, similar considerations hold. [fn 13: On this matter, see Chomsky (1975, chapter 3).] It would, for example, be reasonable to suppose that such concepts as “precedes” or “is voiced” enter into the primitive basis …” (emphasis added)
So the motto here is not “substance free”, but rather “substance first, not much of that, and not much of anything else either”. Since we’re writing this during Lent (we gave up sanity for Lent), the message of privation seems appropriate. And we are sure that the minimalist ethos is clear to this blog’s readers as well. Reiss 2016:16-7 makes a different claim:
“• phonology is epistemologically prior to phonetics
Hammarberg (1976) leads us to see that for a strict empiricist, the somewhat rounded-lipped k of coop and the somewhat spread-lipped k of keep are very different. Given their distinctness, Hammarberg makes the point, obvious yet profound, that we linguists have no reason to compare these two segments unless we have a paradigm that provides us with the category k. Our phonological theory is logically prior to our phonetic description of these two segments as “kinds of k”. So our science is rationalist. As Hammarberg also points out, the same reasoning applies to the learner -- only because of a pre-existing built-in system of categories used to parse can the learner treat the two ‘sounds’ as variants of a category: “phonology is logically and epistemologically prior to phonetics”. Phonology provides equivalence classes for phonetic discussion.” (emphasis added)
Two claims of epistemological priority enter, one claim leaves (or maybe none). The pre-existing built-in system of categories used to parse includes: (1) the features (Chomsky, Reiss 2016:18), which they both agree are substantive (Chomsky: “concepts … [that] provide a preliminary pre-linguistic analysis”; Reiss 2016:26: “This work [Hale & Reiss 2003a, 2008, 1998] accepts the existence of innate substantive features”) and (2) precedence (Chomsky; Reiss is mum on this point), also substantive. (We will get to our specific proposal in post #3.)

In the case of the learner, it’s not clear if a claim of epistemological priority can be made in either direction. In our view children have both structures: they come with innate, highly specified motor, perceptual and memory architectures along with a phonology module which has interfaces to those three entities (and probably others besides, as aspects of phonological representations are available for subsequent linguistic processing, Poeppel & Idsardi 2012, and are available in at least limited ways to introspection and metalinguistic judgements, say the central systems of Fodor 1983). The goal for the child is to learn how to transfer information among these systems for the purposes of learning and using the sound structures of the languages that they encounter. We do agree with SFP that a fruitful way of approaching this question is with a system of ordered rules within the phonological component (Bromberger & Halle 1989).

In terms of evolutionary (bio-linguistic) priority, it seems blindingly clear that the supporting auditory, motor and memory systems pre-date language, and the phonology module is the new kid on the block. (Whether animal call systems, because they also connect memory, action and perception, are homologous to phonology is an empirical matter; see Hauser 1996.) In terms of epistemological priority for scientific investigation there would seem to be a couple of ways to proceed here (Hornstein & Idsardi 2014).
One is to see the human system as primates + X, essentially the evolutionary view, and ask what the minimal X is that we need to add to our last common ancestor to account for modern human abilities. The answer for phonology might be “not much” (Fitch 2018). But there’s another view, more divorced from actual biology, which tries to build things up from first principles. In this case that would mean asking what we can conclude about any system that needs to connect memory, action, and perception systems of any sort, a “Good Old Fashioned Artificial Intelligence” (GOFAI) approach (Haugeland 1985; see https://en.wikipedia.org/wiki/Symbolic_artificial_intelligence). This seems to be closer to what Hale & Reiss have in mind, maybe. If so, then by this general MAP definition animal call systems would qualify as phonologies. As would a lot of other activities, including reading-writing, reading-typing, rituals (Staal 1996), dancing, kung-fu fighting, etc. (Nightmares about long-ago semiotics classes ensue.) But there are problems (maybe not insurmountable) in proceeding this way. It’s not clear that there is a general theory of sensation and perception, or of action. And what there is (e.g. Fechner/Weber laws, i.e. sensory systems do logarithms) doesn’t seem particularly helpful in the present context. We think that Gallistel 2007 is particularly clear on this point:
“From a computational point of view, the notion of a general purpose learning process (for example, associative learning), makes no more sense than the notion of a general purpose sensing organ—a bump in the middle of the forehead whose function is to sense things. There is no such bump, because picking up information from different kinds of stimuli—light, sound, chemical, mechanical, and so on—requires organs with structures shaped by the specific properties of the stimuli they process. The structure of an eye—including the neural circuitry in the retina and beyond—reflects in exquisite detail the laws of optics and the exigencies of extracting information about the world from reflected light. The same is true for the ear, where the exigencies of extracting information from emitted sounds dictates the many distinctive features of auditory organs. We see with eyes and hear with ears—rather than sensing through a general purpose sense organ--because sensing requires organs with modality-specific structure.” (emphasis added)
So our take on this is that we’re going to restrict the term phonology to humans for now, which means we will need to investigate the human systems for memory, action and perception in terms of their roles in human language, in order to be able to understand the interfaces. But we agree with the strategy of finding a small set of primitives (features and precedence) that we can map across the memory-action-perception (MAP) interfaces and seeing how far we can get with that inside phonology. Following Fitch’s (2018) phonological continuity hypothesis, though, we will consider the properties of phonology-like systems (especially auditory pattern recognition) in other animals, such as ferrets and finches, to be informative about human phonology (see also Yip 2013, Samuels 2015). How much phonological difference does it make that ASL is signed-viewed instead of spoken-heard? Maybe none or maybe a lot, probably some. The idea that there would be action-perception features in both cases seems perfectly fine, though they would obviously be connecting different things (e.g. joint flexion/extension and object-centered angle in signed languages, and orbicularis oris activation and FM sweep in spoken languages). Does it matter that object-centered properties are computed further along the cortical visual processing stream (perirhinal cortex) whereas FM sweeps are identifiable in primary auditory cortex (A1)? Can we ignore the sub-cortical differences between the visual pathway to V1 (simple) and the ascending auditory pathway to A1 (complex)? Does it matter that V1 is two-dimensional (retinotopic), and so computations there have access to notions such as spatial frequency that don’t have any clear correlates in the auditory system? Do we need to add spatial relations between features to the precedence relation in our account of ASL? (The answer to this last one is almost certainly yes.)
Again, we agree that it’s a good tactic to go as far as we can with features and precedence in both cases, but we won’t be surprised if we end up explanatorily short, especially for ASL. To address a technical point: can you learn equivalence classes? Yes, you can; that’s what unsupervised learning and cluster analysis algorithms do (Hastie, Tibshirani & Friedman 2001). Those techniques aren’t free of assumptions either (No Free Lunch theorems, Wolpert 1996), but given some reasonable starting assumptions (innate or otherwise) they do seem relevant to human speech category formation (Dillon, Dunbar & Idsardi 2013; see also Chandrasekaran, Koslov & Maddox 2014), even if we ultimately restrict this to feature selection or (de-)activation instead of feature “invention”.

Next time: Just my imagination (running away with me)
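A brief appendix on the technical point about learning equivalence classes: the sort of clustering at issue can be sketched in a few lines. The “formant-like” token cloud below is invented for illustration (nothing here is drawn from the cited papers), and the deterministic 2-means initialization is a simplification of standard k-means:

```python
import random

def two_means(points, iters=20):
    # Deterministic 2-means: start from the extreme points along the first
    # dimension, then alternate assignment and center-update steps.
    centers = [min(points), max(points)]
    for _ in range(iters):
        a, b = [], []
        for p in points:
            da = (p[0] - centers[0][0]) ** 2 + (p[1] - centers[0][1]) ** 2
            db = (p[0] - centers[1][0]) ** 2 + (p[1] - centers[1][1]) ** 2
            (a if da <= db else b).append(p)
        centers = [
            tuple(sum(x) / len(c) for x in zip(*c)) if c else centers[j]
            for j, c in enumerate((a, b))
        ]
    return centers, (a, b)

# Invented "vowel tokens": (F1, F2)-like pairs scattered around two
# hypothetical category means; the numbers are made up for illustration.
rng = random.Random(1)
data = [(rng.gauss(300, 30), rng.gauss(2300, 100)) for _ in range(50)] + \
       [(rng.gauss(700, 30), rng.gauss(1200, 100)) for _ in range(50)]

centers, (group_a, group_b) = two_means(data)
```

With two well-separated token clouds, the recovered centers end up near the category means without any labels being supplied, which is the sense in which equivalence classes can be “learned”, given prior assumptions about the representation space the tokens live in (the No Free Lunch point).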

Friday, March 16, 2018

Lotsa luck, please

One of the things that makes doing research (and by this I mean original research) hard is that there is no guarantee of success. There really are no good rules for doing it. More often than not, it’s unclear what the question being asked should be, let alone what it actually is. It’s similarly unclear what kinds of answers would serve to advance said partially formulated question, or what kinds of data would offer the best kinds of evidence. The research game is played while the rules are being made, remade and remade again, and even if it is played well there is no reason to think that the whole damn thing might not collapse or spin out of control. There’s no insurance one can buy to protect against this kind of failure. That’s just the way it is. Put more concisely: there is no scientific method wherein you press a research button, work hard, and reap the rewards.

But why not? Why can’t we automate the whole process? Why do we fail so often? Well, interestingly, people have been thinking about this question, and a recent Sci Am blog post discusses research on the topic (see here). It seems that one reason things are so hard is that a lot of the process is driven by luck, and this, again interestingly, has important implications.

First, the claims: science likes to think of itself as a meritocracy, perhaps the paradigmatic meritocracy. Scientists are judged by their work, the successful ones being so largely because they are smarter, more disciplined, more careful, and more creative than others. Research success leads to increased resources to fund further success, and in the best of all worlds resources flow to those with proven track records.

Rewards enhance this perception and promote a hero-conception of discovery in which a brave person battles against the forces of scientific darkness and, through originality, grit and brains to burn, triumphs over ignorance. The average Nobel prize winner is a case in point. In this regard, such prizes are considered significantly different from lotteries: their rewards are deserved because earned, not the result of dumb luck. The Sci Am piece asks whether Nobel prizes and lotteries are not more similar than generally believed. And if they are, what does this do to the conception of merit and of rewards to the deserving?

There is one more point to make: this meritocracy is not only taken to be right and proper but is also understood to be the best way to advance the scientific search for truth. Merit gains rewards, which advances the common project. But is this actually the case?

Here’s what I mean. What if it turned out that luck plays a VERY large role in success? What if success in science (and in life) is not a matter of the best and brightest and hardest working gaining as a result of their smarts, creativity and perseverance, but is more a product of being in the right place at the right time? What, in other words, should happen if we came to appreciate that luck plays an inordinately large role in success? Were this so, and the blog post linked to cites work arguing for this conclusion, then the idea that science should be run as a meritocracy would require some rethinking.

How good are the arguments proffered? Well, not bad, but not dispositive either. They consist of two kinds of work: some modeling (largely of the ‘toy’ variety) and some review of empirical work that argues the models are pointing in the right direction. I leave it to you to evaluate the results. FWIW, IMO, the argument is pretty good and, at the very least, goes some way towards noting something that I have long considered obvious: that lots of success, be it academic or otherwise, is due to luck. As my mother used to say: “it’s better to be lucky than smart.” Apparently, the models bear her out.

What follows from this if it is correct? Well, the biggest implications are for those activities where we reward people based on their track records (e.g. promotion, tenure and funding). In what follows I want to avoid discussing promotion and tenure and concentrate on funding, for the papers note some interesting features of funding mechanisms that avoid the meritocratic route. In particular, it seems that the most efficient way to fund research, the one that gets the most bang for the buck, targets “diversity” rather than “excellence” (the latter term is so abused nowadays that I assume it will shortly be a synonym for BS).

For example, one study following over 1,200 Quebec researchers for 15 years concludes: “both in terms of quantity of papers produced and of their scientific impact, the concentration of research funding in the hands of a so-called ‘elite’ of researchers produces diminishing marginal returns” (5). Indeed, the most efficient way to distribute funds according to the models (and the empirical studies back this up) is equally and consistently over a lifetime of research (6):

…the best funding strategy of them all was one where an equal number of funding was distributed to everyone. Distributing funds at a rate of 1 unit every five years resulted in 60% of the most talented individuals having a greater than average level of success, and distributing funds at a rate of 5 units every five years resulted in 100% of the most talented individuals having an impact! This suggests that if a funding agency or government has more money available to distribute, they'd be wise to use that extra money to distribute money to everyone, rather than to only a select few. As the researchers conclude,

"[I]f the goal is to reward the most talented person (thus increasing their final level of success), it is much more convenient to distribute periodically (even small) equal amounts of capital to all individuals rather than to give a greater capital only to a small percentage of them, selected through their level of success - already reached - at the moment of the distribution."

This is what one would expect if luck (“serendipity and chance” (5)) is the dominant factor in scientific breakthrough.
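The kind of toy model at issue is simple enough to sketch. Below is a minimal talent-versus-luck simulation in the spirit of the models the post describes; every parameter (event rates, funding amounts, talent distribution) is invented for illustration, not taken from the cited studies:

```python
import random

def simulate(n_agents=1000, steps=80, equal_funding=False, seed=1):
    """Toy talent-vs-luck model: capital changes through random events, with
    periodic funding either spread equally or concentrated on current top
    earners. All parameters are illustrative."""
    rng = random.Random(seed)
    # talent ~ Normal(0.6, 0.1), clipped to [0, 1]; everyone starts equal
    talent = [min(1.0, max(0.0, rng.gauss(0.6, 0.1))) for _ in range(n_agents)]
    capital = [10.0] * n_agents
    for t in range(steps):
        for i in range(n_agents):
            r = rng.random()
            if r < 0.25 and rng.random() < talent[i]:
                capital[i] *= 2        # lucky event, exploited with prob = talent
            elif 0.25 <= r < 0.5:
                capital[i] /= 2        # unlucky event
        if t % 5 == 0:                 # periodic funding round
            if equal_funding:
                for i in range(n_agents):
                    capital[i] += 1.0
            else:                      # concentrate on the current top 10%
                top = sorted(range(n_agents), key=lambda j: capital[j], reverse=True)
                for i in top[: n_agents // 10]:
                    capital[i] += 10.0
    return talent, capital

def talented_doing_well(talent, capital):
    # fraction of the top-10%-talented whose capital exceeds the mean
    mean_c = sum(capital) / len(capital)
    top = sorted(range(len(talent)), key=lambda j: talent[j], reverse=True)
    top = top[: len(talent) // 10]
    return sum(capital[i] > mean_c for i in top) / len(top)
```

Comparing `talented_doing_well` under the two funding regimes, across many seeds, is the experiment; the studies quoted above lead one to expect equal distribution to put more of the most talented above average, but the sketch itself makes no claim beyond making the setup concrete.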

So, does this mean it is only luck? No; clearly, other things matter. But luck plays an outsized role. Where talent etc. comes in is in readying one to exploit the luck that comes one’s way (a prepared mind is a lucky one?). The post ends with the following reasonable observation based on the studies it reviews:

The results of this elucidating simulation, which dovetail with a growing number of studies based on real-world data, strongly suggest that luck and opportunity play an underappreciated role in determining the final level of individual success. As the researchers point out, since rewards and resources are usually given to those who are already highly rewarded, this often causes a lack of opportunities for those who are most talented (i.e., have the greatest potential to actually benefit from the resources), and it doesn't take into account the important role of luck, which can emerge spontaneously throughout the creative process. The researchers argue that the following factors are all important in giving people more chances of success: a stimulating environment rich in opportunities, a good education, intensive training, and an efficient strategy for the distribution of funds and resources. They argue that at the macro-level of analysis, any policy that can influence these factors will result in greater collective progress and innovation for society (not to mention immense self-actualization of any particular individual).

So, there may be a good reason for why research feels so precarious. It is. It requires luck to succeed, lots of luck, lots of consistent luck. And if this is correct, it suggests that the winner-take-all strategies that funding agencies tend to favor are likely quite counterproductive, for they rely on picking winners, which is very hard to do if the distribution of winners is largely a matter of luck.

That said, I doubt that things will change very soon. First, in an era of big science, big grants are needed, and if there is a money shortage, then the only way to have big grants is to eliminate little ones. Second, there is an internal dynamic: winners like to think that their success is due to their own efforts. That’s the charm of meritocracy. And as winners tend to make the rules, don’t expect the promotion of “excellence” (and the rewarding of it) to end anytime soon, even if ending it would make the scientific life a whole lot better.

Last point: a while ago FoL discussed an interesting interview with the biologist Sydney Brenner (here). It generated a lively discussion in the comments that bears on the above. The point that Brenner made is that science as practiced today would have stifled the breakthrough research that was carried on in his youth. Some noted a certain kind of nostalgia for a bygone era in Brenner’s remarks, a period with its own substantial downsides.  This is likely correct. However, in light of the “luck is critical” thesis, Brenner’s point might have been based on the fact that in his day funding was more widely spread out among the relevant players and so it was possible for more people to “get” lucky. The problem then with the current state of play is not merely the insufficient level of funding, but the distribution of that funding across potential recipients.  In earlier days, the money flowed to the bulk of the research community. Nowadays it does not. And if luck matters, then the spread matters too. More pointedly, if luck matters, then rewarding the successful is a bad strategy.


