
Showing posts with label Transparency Thesis.

Friday, April 19, 2013

One Final Time into the Breach (I hope): More on the SMT


The discussion of the SMT posts has gotten more abstract than I hoped. The aim of the first post discussing the results by Pietroski, Lidz, Halberda and Hunter was to bring the SMT down to earth a little and concretize its interpretation in the context of particular linguistic investigations.  PLHH investigate the following: there are many ways to represent the meaning of most, all of which are truth functionally equivalent. Given this, are the representations empirically equivalent or are there grounds for choosing one representation over the others? PLHH propose to get a handle on this by investigating how these representations are used by the ANS+visual system in evaluating dot scenes wrt statements like most of the dots are blue. They discover that the ANS+visual system always uses one of three possible representations to evaluate these scenes even when use of the others would be both doable and very effective in that context. When one further queries the core computational predilections of the ANS+visual system it turns out that the predicates that it computes easily coincide with those that the "correct" representation makes available. The conclusion is that one of the three representations is actually superior to the others qua linguistic representation of the meaning of most, i.e. it is the linguistic meaning of most.  This all fits rather well with the SMT. Why? Because the SMT postulates that one way of empirically evaluating candidate representations is with regard to their fit with the interfaces (ANS+visual) that use them. In other words, the SMT bids us look to how grammars fit with interfaces and, as PLHH show, if one understands 'fit' to mean 'be transparent with' then one meaning trumps the others when we consider how the candidates interact with the ANS+visual system.

It is important to note that things need not have turned out this way empirically. It could have been the case that, despite the core capacities of the ANS+visual system, the evaluation procedure the interface used when evaluating most sentences was highly context dependent, i.e. in some cases it used the one-to-one strategy, in others the '|dots ∩ blue| - |dots ∩ not-blue|' strategy and sometimes the '|dots ∩ blue| - [|dots| - |dots ∩ blue|]' strategy.  But, and this is important, this did not happen. In all cases the interface exclusively used the third option, the one that fit very snugly with the basic operations of the ANS+visual system. In other words, the representation used is the one that the SMT (interpreted as the Interface Transparency Thesis) implicates. Score one for the SMT.
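To make the three candidate strategies concrete, here is a minimal sketch in Python (my own toy illustration, not PLHH's materials; the function names and the sample dot scene are invented for exposition). All three procedures deliver the same verdicts, which is exactly why truth conditions alone cannot decide among them:

```python
# Toy illustration (not PLHH's code): three truth-functionally equivalent
# ways to verify "most of the dots are blue" against a dot scene.

def most_one_to_one(dots, blue):
    """Strategy 1: pair blue dots with non-blue dots one-to-one;
    true iff some blue dots remain once the non-blue dots are used up."""
    non_blue = dots - blue
    paired = list(zip(sorted(blue), sorted(non_blue)))
    return len(blue) > len(paired)

def most_blue_vs_nonblue(dots, blue):
    """Strategy 2: compare |dots ∩ blue| with |dots ∩ not-blue|;
    requires representing the negatively specified set of non-blue dots."""
    return len(dots & blue) > len(dots - blue)

def most_blue_vs_rest(dots, blue):
    """Strategy 3: compare |dots ∩ blue| with |dots| - |dots ∩ blue|;
    needs only the cardinalities of the dots and of the blue dots."""
    return len(dots & blue) > len(dots) - len(dots & blue)

# Same verdict on any scene, so inference/truth-condition data cannot choose.
dots = {f"d{i}" for i in range(10)}
blue = {f"d{i}" for i in range(6)}
assert (most_one_to_one(dots, blue)
        == most_blue_vs_nonblue(dots, blue)
        == most_blue_vs_rest(dots, blue))
```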

Note that the argument puts together various strands: it relies on specific knowledge of how the ANS+visual system functions. It relies on specific proposals for the meaning of most and, given these, it investigates what happens when we put them together. The kicker is that if we assume that the relation between the linguistic representation and what the ANS+visual system uses to evaluate dot scenes is "transparent" then we are able to predict[1] which of the three candidate representations will in fact be used in a linguistic+ANS+visual task (i.e. the task of evaluating a dot scene for a given most sentence[2]).[3]

The upshot: we are able to use information from how the interface behaves to determine a property of a linguistic representation.  Read that again slowly: PLHH argue that understanding how these tasks are accomplished provides evidence for what the linguistic meanings are (viz. what the correct representations of the meanings are). In other words, experiments like this bear on the nature of linguistic representations and a crucial assumption in tying the whole beautiful package together is the SMT interpreted along the lines of the ITT. 

As I mentioned in the first post on the SMT and Minimalism (here), this is not the only exemplar of the SMT/ITT in action. Consider one more, this time concentrating on work by Colin Phillips (here). As previously noted (here), there are methods for tracking the online activities of parsers. So, for example, the Filled Gap Effect (FGE) tracks the time course of mapping a string of words into structured representations.  Question: what rules do parsers use in doing this? The SMT/ITT answer is that parsers use the "competence" grammars that linguists with their methods investigate. Colin tests this by considering a very complex instance: gaps within complex subjects. Let's review the argument.

First some background.  Crain and Fodor (1985) and Stowe (1986) discovered that the online process of relating a "filler" to its "gap" (e.g. in trying to assign a Wh a theta role by linking it to its theta assigning predicate) is very eager.  Parsers try to shove wayward Whs into positions even if these are filled by another DP.  This eagerness shows up behaviorally as slowdowns in reading times when the parser discovers a DP already homesteading in the thematic position it wants to shove the un-theta marked DP into. Thus in (1a) (in contrast to (1b)), there is a clear and measurable slowdown in reading times at Bill because it is a place where the who could have received a theta role.

(1)  a. Who did you tell Bill about
b. Who did you tell about Bill

Thus, given the parser's eagerness, the FGE becomes a probe for detecting linguistic structure built online. A natural question is where do FGEs appear? In other words, do they "respect" conditions that "competence" grammars code?  BTW, all I mean by 'competence grammars' are those things that linguists have proposed using their typical methods (ones that some Platonists seem to consider the only valid windows into grammatical structure!). The answer appears to be that they do. Colin reviews the literature and I refer you to his discussion.[4]  How do FGEs show that parsers respect grammatical structure? Well, they seem not to apply within islands! In other words, parsers do not attempt to relate Whs to gaps within islands. Why? Well, given the SMT/ITT it is because Whs could not have moved from positions within islands and so these are not potential theta marking sites for the Whs that the parser is eagerly trying to theta mark. In other words, given the SMT/ITT we expect parser eagerness (viz. the FGE) to be sensitive to the structure of grammatical representations, and it seems that it is.

Observe again, that this is not a logical necessity. There is no a priori reason why the grammars that parsers use should have the properties that linguists have postulated, unless one adopts the SMT/ITT that is. But let’s go on discussing Colin’s paper for it gets a whole lot more subtle than this. It’s not just gross properties of grammars that parsers are sensitive to, as we shall presently see.

Colin considers gaps within two kinds of complex subjects. Both prevent direct extraction of a Wh (2a/3a); however, sentences like (2b) license parasitic gaps while those like (3b) do not:

(2)  a. *What1 did the attempt to repair t1 ultimately damage the car
b. What1 did the attempt to repair t1 ultimately damage t1
(3)  a. *What1 did the reporter that criticized t1 eventually praise the war
b. *What1 did the reporter that criticized t1 eventually praise t1

So the grammar allows gaps related to extracted Whs in (2b) but not (3b), but only if this is a parasitic gap.  This is a very subtle set of grammatical facts.  What is amazing (in my view nothing short of unbelievable) is that the parser respects these parasitic gap licensing conditions.  Thus, what Colin shows is that we find FGEs at the italicized expressions in (4a) but not (4b):

(4)  a. What1 did the attempt to repair the car ultimately …
b. What1 did the reporter that criticized the war eventually …

This is a case where the parser is really tightly cleaving to distinctions that the grammar makes. It seems that the parser codes for the possibility of a parasitic gap while processing the sentence in real time.  Again, this argues for a very transparent relation between the “competence” grammar and the parsing grammar, just as the SMT/ITT would require.

I urge the interested to read Colin’s article in full. What I want to stress here is that this is another concrete illustration of the SMT.  If grammatical representations are optimal realizations of interface conditions then the parser should respect the distinctions that grammatical representations make. Colin presents evidence that it does, and does so very subtly. If linguistic representations are used by interfaces, then we expect to find this kind of correlation. Again, it is not clear to me why this should be true given certain widely bruited Platonic conceptions. Unless it is precisely these representations that are used by the parser, why should the parser respect its dicta?  There is no problem understanding how this could be true given a standard mentalist conception of grammars. And given the SMT/ITT we expect it to be true. That we find evidence in its favor strengthens this package of assumptions.

There are other possible illustrations of the SMT/ITT.  We should develop a sense of delight at finding these kinds of data. As Colin's stuff shows, the data are very complex and, in my view, quite surprising, just like PLHH's stuff. In addition, such cases can act as concrete illustrations of how to understand the SMT in terms of Interface Transparency.  An added bonus is that they stand as a challenge to certain kinds of Platonist conceptions, I believe.  Bluntly: either these representations are cognitively available or we cannot explain why the ANS+visual system and the parser act as if they were. If Platonic representations are cognitively (and neurally, see note 4) available, then they are not different from what mentalists have taken to be the objects of study all along. If from a Platonist perspective they are not cognitively (and neurally) available then Platonists and mentalists are studying different things and, if so, they are engaged in parallel rather than competing investigations. In either case, mentalists need take heed of Platonist results exactly to the degree that they can be reinterpreted mentalistically. Fortunately, many (all?) of their results can be so interpreted.  However, where this is not possible, they would be of absolutely no interest to the project of describing linguistic competence. Just metaphysical curiosities for the ontologically besotted.



[1] Recall, as discussed here, ‘predict’ does not mean ‘explain.’
[2] Remember, absent the sentence and in specialized circumstances the visual system has no problem using strategies that call on powers underlying the other two non-exploited strategies. It’s only when the visual system is combined with the ANS and with the linguistic most sentence probe that we get the observed results.
[3] Actually, I overstate things here: we are able to predict some of the properties of the right representation, e.g. that it doesn’t exploit negatively specified predicates or disjunctions of predicates.
[4] Actually, there are several kinds of studies reviewed, only some of which involve FGEs. Colin also notes EEG studies that show P600 effects when one has a theta-undischarged Wh and one crosses into an island. I won’t make a big deal out of this, but there is not exactly a dearth of neuro evidence available for tracking grammatical distinctions.  They are all over the place. What we don’t have are good accounts of how brains implement grammars. We have tons of evidence that brain responses track grammatical distinctions, i.e. that brains respond to grammatical structures. This is not very surprising if you are not a dualist. After all we have endless amounts of behavioral evidence (viz. acceptability judgments, FGEs, eye movement studies, etc.) and on the assumption that human behavior supervenes on brain properties it would be surprising if brains did not distinguish what human subjects distinguish behaviorally. I mention this only to state the obvious: some kinds of Platonism should find these kinds of correlations challenging. Why should brains track grammatical structure if these live in Platonic heavens rather than brains?  Just asking.

Wednesday, April 10, 2013

Plato and the SMT


On communicating with others about the SMT post (here) I have come to believe that it can be used to illustrate what I find unfortunate about a Platonist conception of linguistics. So with the sincere desire of afflicting the platonistically comfortable, I would like to outline how the version of the SMT I discussed earlier makes relatively little sense from a Platonist perspective. Moreover, given the demonstrated empirical fecundity of this kind of SMT research, so much the worse for a Platonist conception. In other words, any conception of linguistics that insulates itself from these kinds of considerations is one to be shunned, dumped, dispensed with, thrown onto the garbage heap of very bad ideas. Strong words, eh? Good, that was my intent. Let me elaborate.

The previous post suggested that an excellent way of understanding the SMT is in terms of the degree of transparency holding between linguistic representations and those exploited by the interfaces to do whatever they do.  Pietroski, Lidz, Halberda and Hunter (PLHH) argue that the right linguistic representation for the meaning of most concerns quite a bit more than getting the truth conditions right.  It also includes getting a decent account of how these representations are deployed to "count." The argument proceeds as follows: They observe that there are many distinct ways of presenting the truth conditions and that some are more congenial to the ANS+visual system than others in being favored by the ANS+visual interface when it evaluates sentences like Most of the dots are blue when presented with various dot arrays. Specifically, they demonstrate that subjects evaluate the truth of such sentences in such scenes by applying the operations and predicates explicitly represented in (1c), not those in the truth functionally equivalent (1a,b). Indeed, they show that representations like (1a,b) are shunned even when they could have been effectively used.[1]

            (1)  a. OneToOnePlus*: [{x: D(x)}, {x: Y(x)}] iff for some set s, s ⊂ {x: D(x)} and OneToOne[s, {x: Y(x)}]
                 b. |{x: D(x) & Y(x)}| > |{x: D(x) & ¬Y(x)}|
                 c. |{x: D(x) & Y(x)}| > |{x: D(x)}| - |{x: D(x) & Y(x)}|

What's this mean? Well, given that these three representations are truth functionally identical they must support all the same inferences.  So looking at standard linguistic data (e.g. what inferences each supports) fails to choose between them. However, as they get to these same truth conditions in different ways (i.e. using different predicates and operations), if you assume that the interfaces use the representations provided by L to do what they do (i.e. the SMT), then not all truth functionally equivalent representations of most need be empirically equal. Why not? Because some (e.g. (1c)) might explain how humans judge visual displays while the others (e.g. (1a,b)) do not. In other words, as the ANS+visual system evaluates sentences like most of the dots are blue by comparing the size of the set of blue dots and the size of the set of all the dots minus the blue dots (i.e. by using the information as depicted in (1c)) this argues for (1c) being the correct representation of the meaning of most.  More interesting still, PLHH show that if we assume that it is a design feature of linguistic representations that they perfectly cater to the needs of the interfaces (viz. the SMT understood in terms of transparency) we can explain why (1c) has some of the properties it does (hint: because the interfaces are capable of doing some things well and not others).[2]
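To see concretely why (1c) might suit the ANS+visual system better than (1b) even though they are truth functionally equivalent, here is a minimal sketch (my own illustration, not PLHH's; the premise that the ANS readily estimates only positively specified sets is stated as an assumption in the comments):

```python
# Toy sketch (mine, not PLHH's): (1b) vs (1c) as verification procedures.
# Working assumption for illustration: the ANS+visual system can deliver
# cardinality estimates for sets it can pick out directly (the dots, the
# blue dots), but not for negatively specified sets (the non-blue dots).

def ans_estimate(selected_set):
    # Stand-in for a (here, exact) ANS cardinality estimate.
    return len(selected_set)

def verify_1c(dots, blue):
    """(1c): |{x: D(x) & Y(x)}| > |{x: D(x)}| - |{x: D(x) & Y(x)}|.
    Uses only two positively specified estimates plus subtraction."""
    n_blue = ans_estimate(dots & blue)
    n_all = ans_estimate(dots)
    return n_blue > n_all - n_blue

def verify_1b(dots, blue):
    """(1b): |{x: D(x) & Y(x)}| > |{x: D(x) & ¬Y(x)}|.
    Requires estimating the negatively specified set of non-blue dots."""
    return ans_estimate(dots & blue) > ans_estimate(dots - blue)

dots, blue = set(range(20)), set(range(12))
print(verify_1c(dots, blue), verify_1b(dots, blue))  # same verdict, different demands
```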

Now, I really like this argument form. It provides a way of unpacking the SMT in terms of the Transparency Thesis. It is able to generate rich non-trivial hypotheses about linguistic representations that are susceptible to empirical testing (as PLHH demonstrate).  However, and here is the punchline, none of this makes much sense from a Platonist perspective. Why? Because it presupposes that linguistic representations can be investigated by considering how they fit with other features of human cognition. More specifically, it assumes that we can learn something about linguistic structures by investigating how they are used by other human mental organs (here, the ANS+visual system).  Or to put this another way, entailment relations and judgments of truth conditions do not exhaust the evidence relevant to discovering the properties of linguistic representations. As PLHH demonstrate, how humans verify these truth conditions (i.e. how the products of FL interact with performance-interface systems) is relevant as well. But this only makes sense if these representations are psychologically active, i.e. they don't live in some disembodied platonic heaven. Put crudely, though competence is not the same as performance, on my view of linguistics (and PLHH's and Berwick and Weinberg's and Colin Phillips' and Poeppel's and (I believe) Chomsky's, in fact that of all those with a reasonable view of scientific practice) it is possible to learn about the structure of competence grammars (i.e. about how sentences are represented mentally) by looking at how they are put to use.  Some (e.g. the later Katz[3]) seem to have concluded, from the fact that this is hard to do and that many peeks at the interface fail to reveal anything interesting, that this is in principle impossible.  This is a very counter-productive view to take. And, talking methodology, this is the strongest kind of criticism one can level against a proposed conception. Bluntly, Platonism is a bad position to adopt for it blinkers your investigative options by precluding certain questions from consideration.

I mentioned a while ago (here) that one thing that I don't like about the Platonic Attitude is that it serves to confine linguistic investigation to its own little empirical ghetto. Roughly: "grammaticality" judgments count, entailments are ok, ditto synonymy and paraphrase. However, no psycho evidence please: what kids do in acquiring their grammar, or how people parse sentences or how dot displays are evaluated are not relevant to linguistics proper. This latter kind of data is in principle incapable of probing linguistic representations. Sure, once linguists have done their work, we can ask how their constructs might be used. But only linguistic evidence is relevant in evaluating linguistic structure; psycho, bio and neuro evidence is in principle beside the point. What the work on most (and the other stuff discussed in the previous post) shows is that this attitude cuts research off from a bevy of interesting questions and investigations. Of course, one can take a cramped view of linguistic research, but, really, why do it?  Why cleave to a view that imposes blinders on research? Why draw metaphysical lines in the sand that hinder investigation?  My conclusion: not only are there no good arguments against the mentalist conception of grammar and the cog-bio conception of linguistics but there are good methodological reasons to reject Platonism in this domain. One cannot stop people from choosing a cramped scientific aesthetic if this is where they are determined to go, but there are good reasons of scientific hygiene to shun this step. Consider yourself duly warned!



[1] Recall, here 'could have been' means that the interface has the power to apply the relevant information in (1a,b) (i.e. they present evidence that in other contexts it does so) and it would be a very good (viz. optimal) strategy to do so in some of the arrays presented.  So, logically speaking, all three of the alternatives could have been on an equal footing, circumstances choosing between them. This, however, turned out to be false, and that is very, very interesting.
[2] I confess that I am not sure that I believe that this is a reasonable metaphysical assumption, though it is an excellent methodological principle. Thus, it works well as a directive for exploring the properties of linguistic representations. However, I find it harder to see how the interfaces could cause FL to have the properties it has. Darwin and his buddies could in principle be useful here, were it not for the fact that the time apparently available seems pretty short, hence not leaving much room for natural selection to work its magic. For now, I am putting these sophistications aside. Maybe I’ll post more on this later. Be warned!
[3] In fact, the early Katz (he of Katz and Fodor 1963 ("The structure of a semantic theory" in Language)) had the more expansive conception outlined here. Someone who might know (p.c.) speculates that the "failure" to empirically ground the derivational theory of complexity is what led Katz to throw in the towel and retreat to Platonism. Too bad. But unlike some later epigones, at least Katz had it right the first time.

Monday, April 1, 2013

A Neat New Argument and a Hint of the Future?


Syntacticians often act like parochial snobs.  They are snobs in that some (e.g. me) believe that the kinds of data and explanations offered within syntax are deeper than those offered in other domains. There really is non-trivial syntactic theory and a whole budget of effects that these theories explain.  Syntacticians (e.g. me) can be parochial in believing that only these kinds of questions are worth asking and investigating. A consequence of this is the oft-exhibited habit of evaluating the interest of other linguistic questions in terms of how much they address questions in syntax.  This habit can be especially pronounced when syntacticians consider work in psycholinguistics.  When evaluating psycho work, it is not uncommon for syntacticians to expect psycholinguists to provide evidence/methods for helping to choose among competing cutting edge syntactic alternatives.  As this rarely happens, syntacticians often come to have jaundiced views about the intellectual contributions of psycholinguistic research.

This is clearly a rather perverse way of evaluating psycho research. The issue is not whether psycholinguistics can answer syntactic questions but whether they have their own interesting questions concerning the form and function of FL. Over the years I have made it a habit to sit in on lab meetings of my psycho colleagues and two things have struck me. First, that wrt syntactic theory, to date, psycho techniques have contributed little that purely syntactic methods have not delivered more cleanly and quickly (but see below) and, second, that psycholinguists have found fascinating effects and have developed interesting theories of these effects that greatly expand our understanding of how FL is used in real time, both learning and processing.  Let me elaborate each point just a bit.

Work on a good many questions in language processing and acquisition can proceed just fine in blissful ignorance of the latest findings in syntax.  For example, studying the acquisition or processing of long distance dependencies will not be greatly affected by whether one assumes that movement is actually an expression of Merge (I-merge) or an independent operation in its own right. The difference between these two conceptions is too fine grained to be captured by (at least current) psycho methods.  However, this does not mean that there is nothing worth studying. For example, regardless of how a WH gets to clause initial position, we have an interesting question: how eagerly does the parser try to find the gap the WH is related to? One possibility is that it waits to find a gap and then tries to relate the WH to it. A second option is that it tries to link the WH immediately on sighting a theta marking host predicate without waiting to see if there is a gap after the predicate to fill.  Thus in a sentence like (1), we can ask if the parser tries to interpret who as complement of tell after/before seeing if there is a gap there.

(1)  Who did you tell Bill about

So given the question 'how "eager" is the parser?', the next question is how to study its eagerness. The answer: via a very interesting phenomenon discovered in the 1980s (Laurie Stowe in 1984), the so-called "Filled Gap Effect" (FGE). If you put people in front of a computer and have them read a sentence word by word and measure how they do this, it turns out that in a sentence like (1) readers will pause longer at Bill than they would in reading a sentence like (2):

(2)  Who did you tell about Bill

This is a very reliable effect. Interpretation? Readers are trying to thematically interpret who when they get to tell and must rescind this when they get to Bill in (1) but not in (2). In other words, readers "prefer" giving a thematic role to who, even at the cost of having to rescind this assignment soon after, over waiting to see if there is an available role to assign, even if this means just waiting one word. Conclusion: parsers are very eager to interpret uninterpreted material. And this eagerness can be measured and used to probe how parsers use grammars to construct sentence interpretations in real time, i.e. the FGE is a marvelous tool for probing the relation between grammars and parsers.
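For readers who like things spelled out, here is a toy sketch of the "eager" (active) gap-filling idea (my own illustration, not an implemented model from the parsing literature; the one-verb lexicon and the reanalysis-cost stand-in are invented for exposition):

```python
# Toy sketch of eager gap filling: the parser posits an object gap for an
# unassigned wh-filler as soon as it hits a theta-assigning verb, and pays a
# reanalysis cost (the FGE slowdown) if that position turns out to be filled.

THETA_ASSIGNERS = {"tell"}          # toy lexicon

def eager_parse(words):
    filler = None
    costs = []                      # per-word "reading time" stand-in
    for w in words:
        cost = 1                    # baseline cost per word
        if w == "who":
            filler = "wh"           # store the unassigned wh-filler
        elif w in THETA_ASSIGNERS and filler == "wh":
            filler = "posited-gap"  # eagerly posit an object gap after the verb
        elif filler == "posited-gap":
            if w != "about":        # toy check: object position is lexically filled
                cost += 1           # reanalysis cost = filled gap effect
            filler = None
        costs.append(cost)
    return costs

print(eager_parse("who did you tell Bill about".split()))  # extra cost at "Bill"
print(eager_parse("who did you tell about Bill".split()))  # no extra cost
```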

Let me give an illustrative example. Consider the following question: Given that parsers use grammars, what is the relation between competence grammars (ones beloved of syntacticians) and parsing grammars?  A strong position is that there is a very high level of "transparency" between the two.[1] What's this mean? Well, that the categories and relations that the grammar specifies are identical to those that the parser exploits/respects in real time.  For example, the categories that grammars deploy (e.g. DP, VP, CP) are what the parser tries to recover and the conditions the grammar respects (e.g. minimality, c-command, subjacency) the parser does as well. A good deal of work in parsing over the last 15 years has been aimed at specifying the degree of transparency between competence grammars and parsing grammars.  For example, psycholinguists have investigated whether parsers respect c-command in trying to find a bound anaphor's possible antecedents and whether parsers display cross-over effects.[2] My colleague, Colin Phillips, has a really beautiful set of results bearing on how parsers "do" movement, particularly whether gap filling "obeys" islands (see here).  The answer is "yes." How does he know? Building on earlier work by others, he shows that the FGE only appears if the "gap" sits in a possible movement site.  Gaps within islands do not trigger FGEs (unless they are licensable parasitic gaps (this is Colin's great find)).  Gaps generable by movement do trigger FGEs.  The argument is subtle and well worth reading (if you've read it already, read it again to your kids; it makes for a lively bedtime experience!). All this makes sense if parsing grammars and competence grammars are (largely) the same. Ergo…

Note that these psycho results build on what syntactic research has revealed about competence. It uses these results to address a related very interesting question, viz. the Transparency Thesis (TT) and it does so by exploiting a psycho probe, viz. FGE, manifest in online reading tasks.  Most interestingly in my view, as TT becomes more and more empirically grounded (and the evidence in its favor is already pretty good IMO), syntacticians may finally get what they’ve been asking for: psycholinguistic constraints on adequate competence theories (Syntacticians, careful what you ask for lest you get it!).  After all, if TT is right, then syntactic theories that fail to support transparent parsing grammars should be less valued than those that do. In other words if TT proves empirically tenable, then online psycho results will prove highly relevant to evaluating the empirical adequacy of proposed competence grammars. 

How far away is this day? Well, I want to end by presenting a possible glimpse of the future.  In some recent work Shevaun Lewis, Dave Kush and Brad Larson (LKL) have used FGEs to probe the syntactic derivation of constructions like (3) (see here for some slides):

(3)  What and when will we eat

Not surprisingly, these coordinated WH questions have rather elaborate syntactic properties. I will not detail them for you (read the slides), except to say that LKL are led to analyze these by treating the two WHs rather differently. LKL propose that the inner WH lands in clause initial position via movement while the outer one is base generated there. The evidence for this conclusion is two-fold. First, there is an acceptability contrast between (4a) and (4b), the latter being quite a bit worse than the former (and yes, they ran the relevant acceptability judgment studies to show this). Second, the what in (4a) fails to induce an FGE.  If FGEs are diagnostic of movement dependencies (as above) then the absence of these in (4a) is just what we would expect, and apparently what we get. To my knowledge this is the first time a technique borrowed from psycho has been pressed into service to support a novel syntactic conclusion.  Terrific!

(4)  a. What and when will we eat something
b. When and what will we eat something

It is the sign of a progressing research program that novel questions and techniques keep springing up. The aim is to reconcile these, producing a bigger and bigger coherent picture.  To date, in my view, a great deal of what we have discovered about how FL does syntax has come from the careful analysis of natural language grammars. The LKL results are signaling a slightly different future: I have a dream that is deeply rooted in the Generative enterprise. I have a dream that one day linguistics will rise up and live the true meaning of its biolinguistic roots. I have a dream that one day syntacticians and psycholinguists (and eventually neuroscientists) will use each other's work to strongly constrain their common research project of understanding how FL is structured and how it is used. I have a dream, and LKL provides a glimpse of that wonderful and glorious future.



[1] The term ‘transparency’ is from Berwick and Weinberg 1984.
[2] For a good review of the first, cf. Dillon (here). All I have to offer for the second is some hot-off-the-presses work by Kush, Lidz and Phillips. Poster from the latest CUNY (here).