Sunday, April 7, 2013

Operationalizing the Strong Minimalist Thesis

In the sciences, it takes a lot of work for a new idea to take hold.  Aside from a modicum of conceptual clarity, a conceptual innovation must be operationalized, and this requires offering canonical or paradigmatic models of its application. I mention this because for me one of the recurring difficulties with the Minimalist Program has been figuring out what makes any given proposal/analysis minimalist (i.e. the path from Minimalist Program to Minimalist Theory is often obscure). So, while being a minimalist is all very nice (e.g. it greatly (minimally?) enhances my self-esteem), what is more absent than it should be are clear examples of doing minimalism; examples of what makes a particular analysis/proposal minimalist or how the abstract leading ideas get concretized in everyday work. Fortunately, I have recently read some papers that, I believe, can serve as parade cases of minimalist thinking and provide clear examples of how the Strong Minimalist Thesis can be incarnated. They are the topic of today’s sermon.

First, what’s the Strong Minimalist Thesis (SMT)?  It is the claim that “language is an optimal solution” to interface conditions:

…the human faculty of language FL [is] an optimal solution to minimal design specifications, conditions that must be satisfied for language to be usable at all…for each language L (a state of FL), the expressions generated by L must be “legible” to systems that access these objects at the interface between FL and external systems – external to FL, internal to the person. (DbP 1).

The systems that L interfaces with use its generated objects. The SMT proposes that the objects of L are well designed for the cognitive interfaces that use them in doing what they do. Put another way, the generated objects can be used as is (i.e. without further alteration) to do what needs getting done (viz. the information they contain need not be further repackaged for the interfaces to use them for whatever tasks they set their “hands” to.)[1] What the hell does this mean?  Two papers by Pietroski, Lidz, Halberda and Hunter (PLHH) (here and here) provide a useful concrete model for interpreting these abstract claims. Their discussion centers on the correct representation of the meaning of most. Here’s what PLHH do.

The papers are interested in figuring out how sentences containing ‘most’ affect visual perception (i.e. how the visual system uses grammatical information in making a visual judgment). Specifically, how does someone who hears (1) judge whether a certain presented array of dots verifies (1)?

(1)  Most of the dots are blue

It’s a given that (1) is true iff the number of blue dots exceeds the number of non-blue dots. The question is how does one represent the italicized information and does it matter to what people do.  The problem becomes interesting in that there are several ways of representing the quantity information in (1) that are not intensionally equivalent (viz. they use different predicates and different relations) despite being truth functionally the same (in Frege speak: they involve different routes to the same truth value). Here are three possible representations for the meaning of most.

            (2)       a. OneToOnePlus[{x: D(x)}, {x: Y(x)}] iff for some set s, s ⊂ {x: D(x)} and OneToOne[s, {x: Y(x)}][2]
                        b. |{x: D(x) & Y(x)}| > |{x: D(x) & ¬Y(x)}|
                        c. |{x: D(x) & Y(x)}| > |{x: D(x)}| − |{x: D(x) & Y(x)}|

For D= ‘dot’ and Y= ‘blue,’ the (2a) representation carries out the evaluation of the dot scene by pairing the blue with the non-blue dots and seeing if there is at least one extra blue dot left over. The second, in (2b), sees if the size of the set of blue dots is greater than the size of the set of non-blue dots, and the third, (2c), sees if the size of the set of blue dots is greater than the size of the set of all the dots minus the size of the set of blue dots. (2a) differs from the others in using a distinct predicate (viz. OneToOne), while (2b) and (2c) differ in how the sets are compared: the former directly calculates the size of the set of non-blue dots, while the latter never directly evaluates this set numerically (i.e. it does so indirectly, by subtracting the number of blue dots from the number of dots in the entire set).
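To make the contrast concrete, here is a minimal sketch of the three verification routes as procedures. It is only an illustration of the logic in (2), not PLHH's model: the function names and the list-of-color-strings encoding of a dot scene are my own assumptions. The point to notice is that all three return the same truth value while computing it over different intermediate quantities.

```python
from typing import List


def verify_2a(dots: List[str], color: str = "blue") -> bool:
    """(2a)-style route: pair target dots with non-target dots one to one;
    true iff at least one target dot is left over."""
    target = [d for d in dots if d == color]
    others = [d for d in dots if d != color]
    while target and others:
        # Remove one matched pair at a time; no cardinalities are computed.
        target.pop()
        others.pop()
    return bool(target)


def verify_2b(dots: List[str], color: str = "blue") -> bool:
    """(2b)-style route: compare |target| with |non-target| directly."""
    n_target = sum(d == color for d in dots)
    n_non_target = sum(d != color for d in dots)
    return n_target > n_non_target


def verify_2c(dots: List[str], color: str = "blue") -> bool:
    """(2c)-style route: compare |target| with |all dots| - |target|;
    the non-target set is never enumerated on its own."""
    n_target = sum(d == color for d in dots)
    n_all = len(dots)
    return n_target > n_all - n_target


# Hypothetical scene: 6 blue, 3 red, 2 yellow dots.
scene = ["blue"] * 6 + ["red"] * 3 + ["yellow"] * 2
results = [verify_2a(scene), verify_2b(scene), verify_2c(scene)]
```

For this scene all three procedures agree that (1) is true. What distinguishes them is which sets each has to pick out and measure along the way, and that is exactly what the experimental question turns on.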

PLHH reason as follows: There are two possibilities when speakers are asked to evaluate dot scenes on hearing (1): (i) the visual/counting system might find some of these representations more congenial than the others. (ii) Or, the visual/counting system might use any of these three truth functionally equivalent representations given the right circumstances. If (i) holds then this visual-counting interface favors one representation over the other two. Why? Because “linguistic meanings are related [to] the cognitive systems that are used to evaluate sentences for truth and falsity.” More specifically: “a declarative sentence S is semantically associated with a canonical procedure for determining whether S is true…[and] competent speakers are biased towards strategies that directly reflect canonical specifications of truth conditions.” They dub this thesis the Interface Transparency Thesis (ITT). Put in slightly more “minimalist” terms: in a well-designed grammar, its products will supply the information the interfaces need in a way transparent to those needs. More specifically, the interfaces will “use” the kinds of information the linguistic structure directly encodes. Or, the information that the grammatical representations encode and the information that the interface uses are one and the same.

Before going on, observe that the ITT provides a useful (and, as PLHH demonstrate, usable) interpretation of “optimal solution to interface conditions”: for a given interface (i.e. system that uses language) how transparent is the mapping between the information made available from L and the information that the interface uses to do what it does?  SMT amounts to the hypothesis that a strong transparency holds between the information as coded in L and the information these various interfaces exploit to do what they do.  If considerable transparency holds then SMT is vindicated. If not, not.[3] So among other things, one very useful contribution of PLHH’s papers is that they provide a substantive yet manageable interpretation of the SMT. But that is not all.

I would not be going through all of this were it not the case that PLHH demonstrate that not all representations of most are created equal.  They provide very good reasons to conclude that (2c) is the right semantic representation of most (or, is clearly superior to (2a,b)). Demonstrating this is conceptually simple but the argument is quite complicated and rich. Showing (2c) is superior to (2a/b) requires knowing a lot about properties of the interface, in this case knowing how people count (humans use two counting systems with different properties) and how people “count” what they see.  Luckily, this is a well-studied domain of visual perception (Halberda (one of the ‘H’s) has done a lot of basic work on how humans do this) and so it is possible to contrive visual dot scenes that would favor one or another of the representational formats in (2) and see what happens. The answer is that the information in (2c) is what humans compute, even when things are visually arranged so that (2a) or (2b) would be simple to apply.[4] The bottom line: humans have a bias for (2c) and the source of this bias is reasonably attributed to the fact that the visual-counting system likes the information as represented in (2c), as per the ITT.

Assume that this is correct. Can we go further and explain the properties of the representation (2c) in terms of the properties of this interface?  Let me be clear: PLHH show that one representation is preferred to others. We can attribute this to the meaning of most being (2c) coupled with the ITT.  Given this we can ask the next question: is the representational format of (2c) explicable in terms of the properties of this interface? Recall, the SMT suggests that FL (a late emerging system) is the “optimal solution” to interface requirements. This suggests that the properties of FL are what they are because of the properties of the interfaces that use them.  PLHH show that for some features of (2c) this explanatory chit can be cashed in. Here’s their very interesting argument.

First, note that in (2b,c) different sets are being selected for enumeration (recall (2c) says nothing direct about the non-blue dots). This said, it’s a fact that humans are very good at selecting positive features in an array (e.g. blue dots or red dots or green dots) but not at selecting negatively specified features (e.g. not-blue dots). This clearly argues against (2b) in that one is directed to select the non-blue dots. Second, it has been shown that subjects (human adults) “always attend and enumerate the superset of all dots,” which is good news for (2c) as this is a required part of the specified computation. Third, it can be shown that when subjects use the Approximate Number System (ANS), the one used in this task, they can “estimate the cardinality of up to three sets in parallel.” This means that if there are blue dots, red dots, yellow dots, green dots and mauve dots, (2b) could not be used to evaluate (1) in such a scene (i.e. the requirements in (2b) do not scale up very well, whereas those in (2c) do). In sum, as PLHH put it:

A meaning like [(2c)]…is straightforwardly verified with these resources, since the sets required for verification (one color plus the superset) are easily and automatically attended by the visual system. Moreover, this meaning does not become less plausible as the number of color subsets increases.

In other words, given the ITT in this domain (for which PLHH have provided evidence) and given the properties of the ANS and the visual system, representations like (2c) perfectly fit the structural capacities of the interface.  Thus, the meaning of most as specified in (2c) fits the noted interface specifications to a (SM)T!
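The scaling point can be put in a toy form. The sketch below rests on two assumptions of mine, not on anything PLHH state as a model: that the non-blue set, being negatively specified, would have to be assembled from one set per other color, and that the "up to three sets in parallel" figure quoted above acts as a hard cap. All names are hypothetical.

```python
from typing import Iterable

# Cap taken from the quoted "up to three sets in parallel" (treated here as a hard limit).
ANS_PARALLEL_LIMIT = 3


def sets_needed_2b(colors: Iterable[str], target: str = "blue") -> int:
    """(2b) under the assumption above: the target set plus one set
    for every other color present in the scene."""
    other_colors = set(colors) - {target}
    return 1 + len(other_colors)


def sets_needed_2c(colors: Iterable[str], target: str = "blue") -> int:
    """(2c): always the target set plus the superset of all dots."""
    return 2


def within_ans_limit(n_sets: int) -> bool:
    """True if the required sets could be estimated in parallel."""
    return n_sets <= ANS_PARALLEL_LIMIT


five_colors = ["blue", "red", "yellow", "green", "mauve"]
need_2b = sets_needed_2b(five_colors)
need_2c = sets_needed_2c(five_colors)
```

On the five-color scene from the text, `need_2b` is 5, which exceeds the cap, while `need_2c` stays at 2 however many colors are added. With only two colors both routes fit within the limit, which is why the multi-color scenes are the informative ones.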

The work is gorgeous. But aside from its stand-alone value, it’s really useful for minimalists to contemplate and absorb. The Interface Transparency Thesis provides a useful concept for investigating how interfaces and grammars “fit.” If such a fit can be established, it is possible (sometimes) to argue from properties of the interface to properties of the representations. Minimalists should understand and absorb this two-step tango, for it serves to operationalize the SMT, moving it from a frequently annoying slogan to a research problem, something every minimalist should welcome.

Let me end by noting that PLHH are not alone in deploying this argument. Berwick and Weinberg (BW) (here, where the notion ‘transparency’ was also mooted and discussed) develop an earlier version of this argument.[5] Their version of the ITT considers another interface, the parser (i.e. those interfaces that underlie parsing utterances in real time) and asks what grammatical properties would allow for optimal parsing (roughly parsing in linear time). BW showed that parsers with bounded left contexts would serve nicely and argued that grammars that respected some version of cyclicity+subjacency would perfectly fit the bill. So, if we assume that parsers use the structures generated by L to parse then a cyclic+subjacent compliant grammar would be the perfect fit.  The form of argument is exactly the same as that in PLHH, with the relevant interface this time being those that underlie parsing.

So, the upshot: there are now some paradigm cases out there of how to argue for the SMT.  Deploying these arguments requires knowing a lot about grammar and a lot about some interface property. However, as these two cases show, such arguments can be made. Moreover, they can be made convincingly.  It seems that the SMT is not merely a guiding regulative ideal but even one that can be empirically evaluated. Pretty damn good!

The take home message? One effective way of investigating the SMT is to identify some interface system (the parser, the visual system, the ANS) and see how it uses the grammatical information provided by L. The SMT leads to the expectation that it uses the information “transparently,” and that the details of how the interface works can explain why the representation looks like it does. This is hard to pull off, for it requires knowing a lot both about the grammar and the interface at issue. It suggests that future syntacticians will need to have new skill sets and/or be very collaborative. This will no doubt be demanding. But, hey, who ever said that cognitive-biolinguistics would be easy? The most anyone promised was that it would be fun, and, if these cases are any indication, crammed with more than a touch of intellectual beauty as well.

[1] If I understand the notion “covering grammar” correctly, then one might say that the competence grammar is the covering grammar for the relevant interface.  In the best case it is the grammar that every interface uses.
[2] OneToOne is a function that pairs individuals in D and Y one to one.
[3] Note the word ‘considerable.’  The relevant evaluation will revolve around some estimation of the degree of transparency and this may be a labile notion.  It may be possible to make these estimations on a case by case basis without having a general measure of transparency.
[4] As PLHH note: humans can in fact apply the predicates in (2a) and (2b) in non-quantificational tasks. Thus their failure to apply them in these “linguistic” contexts cannot be traced to some general human incapacity to deploy them.
[5] In addition, Colin Phillips’s proposal that the grammar is identical to the parser can be interpreted as postulating a very strong transparency assumption for this interface.
