Monday, June 20, 2016

Classical case theory

So what’s classical (viz. GB) Case Theory (CCT) a theory of? Hint: it’s not primarily about overt morphological case, though given some ancillary assumptions, it can be (and has been) extended to cover standard instances of morphological case in some languages. Nonetheless, as originally proposed by Jean-Roger Vergnaud (JRV), it has nothing whatsoever to do with overt case. Rather, it is a theory of (some of) the filters proposed in “Filters and Control” by Chomsky and Lasnik (F&C).

What do the F&C filters do? They track the distribution of overt nominal expressions. (Overt) D/NPs are licit in some configurations and not in others. For example, they shun the subject positions of non-finite clauses (modulo ECM), and they don’t like being complements to Ns or As, nor to passivized verbs. JRV’s proposal, outlined in his famous letter to Chomsky and Lasnik, is that it is possible to simplify the F&C theory if we reanalyze the key filters as case effects; specifically if we assume that nominals need case and that certain heads assign case to nominals in their immediate vicinity. Note that JRV understood the kind of case he was proposing to be quite abstract. It was certainly not something evident from the surface morphology of a language. How do I know? Because F&C filters, and hence JRV’s CCT, were used to explain the distribution of all nominals in English and French and these two languages display very little overt morphology on most nominals. Thus, if CCT was to supplant filters (which was the intent) then the case at issue had to be abstract. The upshot: CCT always trucked in abstract case.

So what about morphologically overt case? Well, CCT can accommodate it if we add the assumption that abstract case, which applies universally to all nominals in all Gs to regulate their distribution, is morphologically expressed in some Gs (a standard GG maneuver). Do this and abstract case can serve as the basis of a theory of overt morphological case. But, and this is critical, the assumption that the mapping from abstract to concrete case can be phonetically pretty transparent is not a central feature of the CCT.[1]

I rehearse this history because it strikes me that lots of discussion of case nowadays thinks that CCT is a theory of the distribution of morphological case marking on nominals. Thus, it is generally assumed that a key component of CCT assigns nominative case to nominals in finite subject positions and accusative to those in object slots etc. From early on, many observed that this simple morphological mapping paradigm is hardly universal. This has led many to conclude that CCT must be wrong. However, this only follows if this is what CCT was a theory of, which, I noted above, it was not.

Moreover, and this is quite interesting actually, so far as I can tell the new case theorists (the ones that reject the CCT) have little to say about the topic CCT or F&C’s filters tried to address. Thus, for example, Marantz’s theory of dependent case (aimed at explaining the morphology) is weak on the distribution of overt nominals. This suggests that CCT and the newer Morphological Case Theory (MCT) are in complementary distribution: what the former takes as its subject matter and what the latter takes as its subject matter fail to overlap. Thus, at least in principle, there is room for both accounts; both a theory of abstract case (CCT) and a theory of morphological case (MCT). The best theory, of course, would be one in which both types of case are accommodated in a single theory (this is what the extension of the CCT to morphology hoped to achieve). However, were these two different, though partially related, systems, this would be an acceptable result for many purposes.[2]

Let’s return to the F&C filters and the CCT for a moment. What theoretically motivated them? We know what domain of data they concerned themselves with (the distribution of overt nominals).[3] But why have any filters at all?

F&C was part of the larger theoretical project of simplifying transformations. In fact, it was part of the move from construction based G rules to rules like move alpha (MA). Pre MA, transformations were morpheme sensitive and construction specific. We had rules like relative clause formation and passive and question formation. These rules applied to factored strings which met the rules’ structural conditions (SD). The rules applied to these strings to execute structural changes (SC). The rules applied cyclically, could be optional or obligatory and could be ordered wrt one another (see here for some toy illustrations). The theoretical simplification of the transformational component was the main theoretical research project from the mid 1970s to the early-mid 1980s. The simplification amounted to factoring out the construction specificity of earlier rules, thereby isolating the fundamental displacement (aka, movement) property. MA is the result. It is the classical movement transformations shorn of their specificity. In technical terms, MA is a transformation without specified SDs or SCs. It is a very very simple operation and was a big step towards the merge based conception of structure and movement that many adopt today.

How were filters and CCT part of this theoretical program? Simplifying transformations by eliminating SDs and SCs makes it impossible to treat transformations as obligatory. What would it mean to say that a rule like MA is obligatory? Obliged to do what exactly? So adopting MA means having optional movement transformations. But optional movement of anything anywhere (which is what MA allows) means wildly overgenerating. To regulate this overgeneration without SDs and SCs requires something like filters. Those in F&C regulated the distribution of nominals in the context of a theory in which MA could freely move them around (or not!). Filters make sure that nominals vacate the wrong places and end up in the right ones. You don’t move for case strictly speaking. Rather the G allows free movement (it’s not for anything as there are no SDs that can enforce movement) but penalizes structures that have nominals in the wrong places. In effect, we take the power of SDs and SCs out of the movement rules themselves and put it into the filters. F&C (and CCT which rationalized them) outline one type of filter; Rizzi’s criterial conditions provide another variety. Theoretically, the cost of simplifying the rules is adding the filters.[4]
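The division of labor just described (free movement plus an output filter) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the two position names, the `CASE_POSITIONS` set, and the functions `move_alpha` and `case_filter` are hypothetical and are not drawn from F&C or from JRV's letter. It only shows the logic of a raising structure like "__ seems [__ to be happy]".

```python
# Toy generate-and-filter sketch. All names here are hypothetical illustrations.

# The two positions in a raising structure like "__ seems [__ to be happy]".
POSITIONS = ["matrix_subject", "embedded_subject"]

# Assumption: only the finite (matrix) subject position receives abstract case;
# the subject position of the non-finite clause does not.
CASE_POSITIONS = {"matrix_subject"}


def move_alpha(nominal, start):
    """Apply movement freely: the nominal may stay put or land anywhere else."""
    stay = [(nominal, start)]
    moved = [(nominal, pos) for pos in POSITIONS if pos != start]
    return stay + moved


def case_filter(output):
    """An overt nominal is licit only in a position where case is assigned."""
    _nominal, position = output
    return position in CASE_POSITIONS


# Nothing forces movement; the filter just discards the bad outputs.
candidates = move_alpha("John", start="embedded_subject")
survivors = [c for c in candidates if case_filter(c)]
```

Note that `move_alpha` has no structural description of its own: the work of excluding "John" in the embedded subject position is done entirely by `case_filter` applying to the outputs.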

So, we moved from complex to simple rules at the price of Gs with filters of various sorts. Why was this a step forward? Two reasons.

First, MA lies behind Chomsky’s unification of Ross’s Islands via Subjacency Theory (ST) (and, IMO, is a crucial step in the development of trace theory and the ECP). Let me elaborate. Once we reduce movement to its essentials, as MA does, then it is natural to investigate the properties of movement as such, properties like island sensitivity (a.o.). Thus, ‘On Wh Movement’ (OWM) demonstrates that MA as such respects islands. Or, to put this another way, ST is not construction specific. It applies to all movement dependencies regardless of the specific features being related. Or, MA serves to define what a movement dependency is and ST regulates this operation regardless of the interpretive ends the operation serves, be it focus or topic, or questions, or relativization or clefts or… If MA is involved, islands are respected. Or, ST is a property of MA per se, not the specific constructions MA can be “part” of.[5]

Second, factoring MA out from movement transformations and replacing SDs/SCs with filters focuses attention on the question of where these filters come from. Are they universal (part of FL/UG) or language specific? One of the nice features of CCT was that it had the feel of a (potential) FL/UG principle. CCT Case was abstract. The relations were local (government). Gs as diverse as those found in English, French, Icelandic and Chinese looked like they respected these principles (more or less). Moreover, were CCT right, then it did not look easily learnable given that it was empirically motivated by negative data. So, simplifying the rules of G led to the discovery of plausible universal features of FL/UG. Or, more cautiously, it led to an interesting research program: looking for plausible universal filters on simple rules of derivation.[6]

What should we make of all of this today in a more minimalist setting? Well, so far as I can tell, the data that motivated the F&C filters and the CCT, as well as the theoretical motivation of simplifying G operations, are still with us. If this is so, then some residue of the CCT reflects properties of FL/UG. And this generates a minimalist question: Is CCT linguistically proprietary? Why Case features at all? How, if at all, is abstract case related to (abstract?) agreement? What, if anything, relates CCT and MCT? How is case discharged in a model without the government relation? How is case related to other G operations? Etc. You know the drill.[7] IMO, we have made some progress on some of these questions (e.g. treating case as a by-product of Merge/Agree) and no progress on others (e.g. why there is case at all).[8] However, I believe research has been hindered, in part, by forgetting what CCT was a theory of and why it was such a big step forward.

Before ending, let me mention one more property of abstract case. In minimalist settings abstract case freezes movement. Or, more correctly, in some theories case marking a nominal makes it ineligible for further movement. This “principle” is a reinvention of the old GB observation that well formed chains have one case (marked on the head of the chain) and one theta role (marked on the foot). If this is on the right track (which it might not be) the relevant case here is abstract. So, for example, a quirky subject in a finite subject position in a language like Icelandic can no more raise than can a nominative marked subject. If we take the quirky case marked subject to be abstractly case marked in the same way as the nominative is, then this follows smoothly. Wrt abstract case (i.e. ignoring the morphology) both structures are the same. To repeat, so far as I know, this application of abstract case was not a feature of CCT.

To end: I am regularly told that CCT is dead, and maybe it is. But the arguments generally brought forward in obituary seem to me to be at right angles to what CCT intended to explain. What might be true is that extensions of CCT to include morphological case need re-thinking. But the original motivation seems intact and, from what I can tell, something like CCT is the only theory around to account for these classical data.[9] And this is important. For if this is right, then minimalists need to do some hard thinking in order to integrate the CCT into a more friendly setting.

[1] Nor, as I recall, did people think that it was likely to be true. It was understood pretty early on that inherent/quirky case (I actually still don’t understand the difference, btw) does not transparently reflect the abstract case assigned. Indeed, the recognized difference between structural case and inherent case signaled early on that whatever abstract case was morphologically, it was not something easily read off the surface.
[2] Indeed, Distributed Morphology might be the form that such a hybrid theory might take.
[3] Actually, there was a debate about whether only overt nominals were relevant. Lasnik had a great argument suggesting that A’-traces also need case marking. Here is the relevant data point: * The man1 (who/that) it was believed t1 to be smart. Why is this relative clause unacceptable even if we don’t pronounce the complementizer? Answer: the A’-t needs case. This, to my knowledge, is the only data against the idea that case exclusively regulates the distribution of overt nominal expressions. Let me know if there are others out there.
[4] Well, if you care about overgeneration. If you don’t, then you can do without filters or CCT.
[5] Whether this is an inherent property of movement rather than, say, overt movement, was widely investigated in the 1980s. As you all know, Huang argued that ST is better viewed as an SS filter rather than part of the definition of MA.
[6] I should add, that IMO, this project was tremendously successful and paved the way for the Minimalist Program.
[7] The two most recent posts (here and here) discuss some of these issues.
[8] Curiously, the idea that case and agreement are effectively the same thing was not part of CCT. This proposal is a minimalist one. Its theoretical motivation is twofold: first to try to reduce case and agreement to a common “mystery,” one being better than two. Second, because if case is a feature of nominals then probes are not the sole locus of uninterpretable features. Case is the quintessential uninterpretable feature. CCT understood it to be a property of nominals. This sits uncomfortably with a probe/goal theory in which all uninterpretable features are located in probes (e.g. phase heads). One way to get around this problem is to treat case as a by-product of the “real” agreement operation initiated by the probe.
            From what I gather, the idea that case reduces to agreement is currently considered untenable. This does not bother me in the least given my general unhappiness with probe/goal theories. But this is a topic for another discussion.
[9] Reducing nominal distribution to syntactic selection is not a theory as the relevant features are almost always diacritical.

Saturday, June 18, 2016

Modern university life

Here are two articles on modern academic life that might interest you.

The first is on grad student unionization. I have heard many people argue that grad student unions would severely negatively affect the mentor-mentee relation that lies at the heart of grad education. How? By setting up an adversarial relationship between the two mediated by a bureaucracy (the union) whose interest is not fully in line with that of the grad student. I have never been moved by this, but I have been moved by the observation that grad student life is currently pretty hard with a less than stellar prospect of landing a job at the end (see here for some discussion). The piece I link to goes over these arguments in some detail. Its conclusion is that the objections are largely bad. However, even were it true that grad student unions would change the prof-student mentoring relationship, it is not clear to me that this would not be a cost worth bearing. Grad students are in an extremely exploitable position. This is when unions make sense.

The second piece is about how the composition of university personnel has changed over the last several years. It confirms the observation that tenure track faculty has shrunk and that part-time faculty has risen. But it notes that the problem is likely not the growth in admin people or other non-prof personnel. It seems that this group has stayed relatively stable. This said, the paper does not investigate funding issues (are non-profs sucking up more of the money than they used to?) nor does it discuss how much money at universities is now being diverted from the core missions of teaching and research to the “entertainment” part of current university life (i.e. new gym facilities, art centers, fancy dorms, support staff for entrepreneurship, etc.). Here is the conclusion. I will keep my eye out for the promised sequel.

The results of this analysis suggest that the share of employees at colleges who are administrators has not been much higher in recent years than it was in 1987. There has been growth, though, in the other professionals employment category. This growth is potentially related to a growth of amenities and other programs outside of the teaching and research that have been the traditional focus of colleges and universities, although this is difficult to ascertain due to the broad nature of this category. An additional result in the analysis is that the share of faculty who are full-time employees has been declining. This decline has occurred within the public sector, the private sector, and the for-profit sector.

One limitation of the analysis here is that it considers only employment and not spending on salaries, amenities, or anything else. However, I plan to address spending by colleges and universities in a future Economic Commentary.

Thursday, June 16, 2016

Filters and bare output conditions

I penned this post about a week ago but futzed around with it till today. Of course, this means that much of what I have to say has already been said better and more succinctly by several commentators to Omer’s last post (here, you will note similarities to claims made by Dan Milway, David Adger and Omer). So if you want a verbose rehearsal of some of the issues they touched on, read ahead.

GB syntax has several moving parts. One important feature is that it is a generate and filter syntax (rather than a crash proof theory); one in which rules apply “freely” (they need meet no structural conditions as in, for example, the Standard Theory) and some of the outputs generated by freely applying these rules are “filtered out” at some later level. The Case Filter is the poster child illustration of this logic. Rules of (abstract) case assignment are free, but if a nominal fails to receive a case, the case filter kills the derivation (in modern parlance, the derivation crashes) at a later level. In GB, there is an intimate connection between rules applying freely and filters that dispose of the over-generated grammatical detritus.

Flash forward to the minimalist program (MP). What happens to these filters? Well, like any aspect of G and FL/UG the question that arises is whether these features are linguistically proprietary or reflexes of (i) efficient computation or (ii) properties of the interpretive interfaces. The latter are called “Bare Output Conditions” (BOC) and the most common approach to GBish filters within MP is to reinterpret them as BOCs.

In MP, features mediate the conversion from syntactic filters to BOCs. The features come in two flavors; the inherently interpretable and the inherently un-interpretable. Case (on DPs or T/v) or agreement features (on T or C) are understood as being “un-interpretable” at the CI interface. If convergent derivations are those that produce syntactic objects that are interpretable at both interfaces (or at least at CI, the important interface for today’s thoroughly modern minimalists) then derivations that reach the interface with un-interpretable features result in non-convergence (aka crash). Gs describe how to license such features. So filters are cashed in for BOCs by assuming that the syntactic features GB regulated are un-interpretable “time bombs” (Omer’s term) which derail Full Interpretation at the interfaces.

There is prima facie reason for doubting that these time bombs exist. After all, if Gs have them then derivations should never converge. So either such features cannot exist OR G derivations must be able to defuse them in some way. As you all know, checking un-interpretable features serves to neuter their derivation crashing powers, and a great deal of G commerce in many MP theories exists to pacify the features that would otherwise cause derivational trouble. Indeed, a good deal of research in current syntax involves deploying and checking these features to further various empirical ends.

Though MPers don’t discuss this much, there is something decidedly odd about a “perfect” or “optimally designed” theory that enshrines at its core toxic features. Why would a perfect craftsman have ordered those? In early MP it was argued that such features were required to “explain” movement/displacement, it too being considered an “imperfection.”[1] However, in current MP, movement is the byproduct of the simplest/best possible theory of Merge so displacement cannot be an imperfection. This then re-raises the question of why we have un-interpretable features at all. So far as I can tell, there is nothing conceptually amiss with a theory in which all operations are driven by the need to link expressions in licit interpretable relationships (e.g. getting anaphors linked to antecedents, getting a DP in the scope of a Topic or Focus marker). The main problem with this view is empirical; case and agreement features exist and there is no obvious interpretive utility to them. To my knowledge we currently have no good theoretical story addressing why Gs contain un-interpretable features. But, to repeat, I fail to see how there is anything well-designed about putting un-interpretable features into Gs only to then chase them around in an effort to license them.

As MP progressed, the +/- interpretable distinction came to be supplemented with another; the +/- valued distinction. To my recollection, this latter distinction was intended to replace the +/- interpretable distinction, but like so much work in syntax the former distinction remained.[2] Today, we have four cells at our disposal and every one of them has been filled (i.e. someone in some paper has used them).[3]

So, +/- interpretable and +/- valued are the MP way of translating filters into MP acceptable objects. It is part of the effort to make filters less linguistically proprietary by tracing their effects to non-linguistic properties of the interfaces. Did this effort succeed?

This is the topic of considerable current debate. Omer (my wonderful colleague) has been arguing that filters and BOCs just won’t work (here). He has lots of data aimed at showing that probes whose un-interpretable features are not cashiered do not necessarily lead to unacceptability. On the basis of this he urges a return to a much earlier conception of grammar, of the Syntactic Structures/Aspects variety, wherein rules apply to effect structural changes (SC) when their structural descriptions (SD) are met. These rules can be obligatory. Importantly, obligatory rules whose SDs never occur do not crash derivations in virtue of not applying. They just fail to apply and there is no grammatical consequence of this failure. If we see rules of agreement as obligatory rules and their feature specifications as SDs and understand them to be saying “if there is a match for this feature then match” then we can cover lots of empirical ground as regards agreement without filters (and so without BOCs).
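The contrast between the two conceptions can be made concrete with a small sketch. This is only an illustration under stated assumptions: the `Probe` class, the representation of goals as `(label, features)` pairs, and both function names are hypothetical, and nothing here is taken from Omer's actual formalization.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Probe:
    feature: str                 # the feature the rule's SD mentions, e.g. "pl"
    value: Optional[str] = None  # filled in by the SC if the rule applies


def agree_as_obligatory_rule(probe, goals):
    """SD: some goal bears probe.feature. SC: record that goal on the probe.

    If the SD is met, the rule must apply. If the SD is never met, the rule
    simply does not apply, and nothing crashes.
    """
    for label, features in goals:
        if probe.feature in features:
            probe.value = label
            return True   # SD met, SC executed
    return False          # SD not met: no application, no penalty


def agree_with_filter(probe, goals):
    """Filter/BOC variant: an unvalued probe at the interface is a crash."""
    agree_as_obligatory_rule(probe, goals)
    if probe.value is None:
        raise ValueError("crash: unchecked feature reached the interface")
    return True
```

With goals `[("subj", {"sg"})]` and a probe seeking `"pl"`, the first function returns `False` and leaves the structure alone, while the second raises. The empirical claim at issue is that the first behavior is the one that natural language agreement displays.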

Furthermore, if this works, then we remove the need to understand the recondite issues of interpretability as applied to these kinds of features. Agreement becomes a fact about the nature of G rules and their formats (a return of SDs and SCs) rather than the structure of interfaces and their properties. Given that we currently know a lot more about the computational system than we do about the SI interface (in fact, IMO, we know next to nothing about the properties of SI), this seems like a reasonable move. It even has some empirical benefits as Omer shows.

There is an interesting conceptual feature of this line of attack. The move to filters was part of a larger project for simplifying Movement Transformations (MT).[4] GB simplified them by removing SDs and SCs from their formal specifications.[5] Filters mitigated the resulting over-generation. So filters were the theoretical price paid for simplifying MTs to that svelte favorite Move alpha. The hope was that these filters were universal[6] and so did not need to be acquired (i.e. part of UG).[7] Omer’s work shows that this logic was quite correct. The price of eliminating filters is complicating rules by adding SDs and SCs back in (albeit in altered form).

One last point: I have a feeling that filters are making a comeback. Early MP theories where Greed was a big deal were effectively theories where the computational procedures carried the bulk of the explanatory load. But nowadays there seems to be a move to optional rules and with them filters will likely be proposed again (e.g. see Chomsky on labels). We should recall that in an MP setting filters are BOCs (or we should hope they are). And this places an obligation on those proposing them to give them some kind of BOCish interpretation (hence Chomsky’s insistence that labels are necessary for CI interpretation). And these are not always easy to provide. For example, it is easy to understand minimality effects as by-products of the computational system (e.g. minimal search, minimal computation), but there are arguments that minimality is actually an output condition (i.e. a filter) that applies late (e.g. at Spell Out). Ok, that would seem to make it a BOC. But what kind of BOC is that? Why for SI reasons would minimality hold? I am not saying it doesn’t. But if it is applied to outputs then we need a story, at least if we care about the goals of MP.

[1] The idea was that relating toxic features with movement reduced two imperfections (and so MP puzzles) to one.
[2] The replacement was motivated, I believe, on the grounds that nobody quite knew what interpretability consisted in. It thus became a catch-all diacritic rather than a way of explaining away filters as BOCs. Note: were features only valued on Spell Out to AP this problem might have been finessed, at least for morphologically overt features. Overt features are interpretable at the AP interface even if semantically without value and hence troublesome to CI. However, valuation in the course of the derivation results in features of dubious value at CI. If full interpretation governs CI (i.e. every feature must be interpreted) then valued features need to be interpretable and we are back where we started, but with another layer of apparatus.
[3] Here’s my curmudgeon self: and this is progress?! Your call.
[4] This is extensively discussed in Chomsky’s “Conditions on Rules.” A great, currently largely unread, paper.
[5] There were good learnability motivations for this simplification as well. Reintroducing SDs and SCs will require that we revisit these learnability concerns. All things being equal, the more complex a rule, the harder it is to acquire. As Omer’s work demonstrates, there is quite a lot of variation in agreement/case patterns and so lots for the LAD to figure out.
[6] This is why they were abstract, btw. The learning problem was how to map abstract features onto morphologically overt ones, not how to “acquire” the abstract features. From what I can tell, Distributed Morphology buys into this picture; the problem of morphology being how to realize these abstracta concretely. This conception is not universally endorsed (e.g. see Omer’s stuff).
[7] Of course encumbering UG creates a minimalist problem hence the reinterpretation in terms of BOCs. Omer’s argument is that neither the GB filters strategy nor the MP BOC reinterpretation works well empirically.

Wednesday, June 15, 2016

Case & agreement: beware of prevailing wisdom

Someone recently told me a (possibly apocryphal) story about the inimitable Mark Baker. The story involves Mark giving a plenary lecture somewhere on the topic of case. To open the lecture, possibly-apocryphal-Mark says something along the following lines:
Those of you who don't work on case probably have in your heads some rough sketch of how case works. (e.g. Agree in person/number/gender between a designated head and a noun phrase, resulting in that noun phrase being case-marked.) What you need to realize is that basically nobody who actually works on case believes that this is how case works.
Now, whether or not this is really how it all went down, possibly-apocryphal-Mark has a point. In fact, I'm here to tell you that his point holds not only of case, but of agreement, too.

In one sense, this situation is probably not all that unique to case & agreement. I'm sure presuppositions and focus alternatives don't actually work the way that I (whose education on these matters stopped at the introductory stage) think they work, either. The thing is, no less than the entire feature calculus of minimalist syntax is built on this purported model of case & agreement. [If you don't believe me, go read "The Minimalist Program" again; you'll find that things like the interpretable-uninterpretable distinction are founded on the (supposed) behavior of person/number/gender and case (277ff.).] And it is a model of case & agreement that – to repeat – simply doesn't work.

So what model am I talking about? I'm really talking about a pair of intertwined theories of case and of agreement, which work roughly as follows:
  1. there is a Case Filter, and it is implemented through feature-checking: each noun phrase is born with a case feature that, were it to reach the interfaces (PF/LF) unchecked, would cause ungrammaticality (a.k.a., a "crash"); this feature is checked when the noun phrase enters into an agreement relation with an appropriate functional head (T0, v0, etc.), and only if this agreement relation involves the full set of nominal phi features (person, number, gender)
  2. agreement is also based on feature-checking: the aforementioned functional heads (T0, v0, etc.) carry "uninterpretable person/number/gender features"; if these reach the interfaces (PF/LF) unchecked, the result is – you guessed it – ungrammaticality (a.k.a., a "crash"); these uninterpretable features get checked when they are overwritten with the valued person/number/gender features found on the noun phrase
Thus, on this view, case & agreement live in something of a happy symbiosis: agreement between a functional head and a noun phrase serves to check what would otherwise be ungrammaticality-causing features on both elements.

From the vantage point of 2016, however, I think it is quite safe to say that none of this is right. And, in fact, even the Abstractness Gambit (the idea that (1) and (2) are operative in the syntax, but morphology obscures their effects) cannot save this theory.

What follows builds heavily on some of my own work (though far from exclusively so; some of the giants whose shoulders I am standing on include Marantz, Rezac, Bobaljik, and definitely-not-apocryphal Mark Baker) – and so I apologize in advance if some of this comes across as self-promoting.


Let's start with (1). Absolutive(=ABS) is a structural case, but there are ABS noun phrases that could not possibly have been agreed with, living happily in grammatical Basque sentences. How do we know they could not possibly have been agreed with (not even "abstractly")? Because we know that (non-clitic-doubled) dative arguments in Basque block agreement with a lower ABS noun phrase, and we can look specifically at ABS arguments that have a dative coargument. (Indeed, when the dative coargument is removed or clitic-doubled, morphologically overt agreement with the ABS – impossible in the presence of the dative coargument – becomes possible.)

So if an ABS noun phrase in Basque has a dative coargument, we know that this ABS noun phrase could not have been targeted for agreement by a head like v0 or T0 (because they are higher than the dative coargument). Notice that this rules out agreement with these heads regardless of whether that supposed agreement is overt or not; it is a matter of structural height, coupled with minimality. The distribution of overt agreement here serves only to confirm what our structural analysis already leads us to expect.
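The height-plus-minimality reasoning can be spelled out as a toy procedure. The sketch below is an assumption-laden illustration, not the analysis in the cited papers: the `Argument` class and the `agreement_target` function are hypothetical, and the arguments are simply listed from structurally highest to lowest beneath the probe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Argument:
    case: str                    # e.g. "DAT" or "ABS"
    clitic_doubled: bool = False


def agreement_target(arguments):
    """Search top-down from a probe (v0/T0) sitting above all the arguments.

    Assumptions for this sketch: a non-clitic-doubled dative halts the search
    (minimality), a clitic-doubled dative is transparent, and an ABS argument
    is a viable target. Returns the ABS argument that gets agreed with, or
    None if the probe cannot reach it.
    """
    for arg in arguments:  # ordered from structurally highest to lowest
        if arg.case == "DAT":
            if arg.clitic_doubled:
                continue   # transparent: keep searching downward
            return None    # intervener: the lower ABS is out of reach
        if arg.case == "ABS":
            return arg
    return None
```

On these assumptions, `agreement_target([Argument("DAT"), Argument("ABS")])` returns `None` whether or not any agreement would have been pronounced, which is the sense in which the result is about structure rather than overt morphology.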

And yet despite the fact that it could not have been targeted for agreement, there is our ABS noun phrase, living its life, Case Filter be damned. [For the curious, note that this is crucially different from seemingly similar Icelandic facts, which Bobaljik (2008) suggests might be handled in terms of restructuring. That is because whether the embedded predicate is ditransitive (=has a dative argument) or monotransitive (=lacks one) cannot, to the best of my knowledge, affect the restructuring possibilities of the embedding predicate one bit.]

If you would like to read more about this, see my 2011 paper in NLLT, in particular pp. 929 onward. (That paper builds on the analysis of the relevant Basque constructions that was in my 2009 LI paper, so if you have questions about the analysis itself, that's the place to look.)


Moving to (2), this is demonstrably false, as well. This can be shown using data from the K'ichean languages (a branch of Mayan). These languages have a construction in which the verb agrees either with the subject or with the object, depending on which of the two bears marked features. So, for example, Subj:3sg+Obj:3pl will yield the same agreement marking (3pl) as Subj:3pl+Obj:3sg will. It is relatively straightforward to show that this is not an instance of Multiple Agree (i.e., the verb does not "agree with both arguments"), but rather an instance of the agreeing head looking only for marked features, and skipping constituents that don't bear the features it is looking for. Just like an interrogative C0 will skip a non-[wh] subject to target a [wh] object, so will the verb in this construction skip a [sg] (i.e., non-[pl]) subject to target a [pl] object.

This teaches us that 3sg noun phrases are not viable targets for the relevant head in K'ichean. Ah, but now you might ask: "What if both the subject and the object are 3sg?" The facts are that such a configuration is (unsurprisingly) fine, and an agreement form which is glossed as "3sg" shows up in this case (so to speak; it is actually phonologically null). That's all well and good; but what happened to the unchecked uninterpretable person/number/gender features on the head? Remember, they couldn't have been checked, because everything is now 3sg. And if 3sg things were viable targets for this head, then you could get "3sg" agreement in a Subj:3sg+Obj:3pl configuration, too – by simply targeting the subject – but in actuality, you can't. [This line of reasoning is resistant even to the "but what about null expletives?" gambit: if the uninterpretable phi features on the head were checked by a null expletive, then the expletive is either formally plural or formally singular. If it is singular, then we already know it could not have been a viable target for this head; if it is plural, and it has been targeted for agreement, then we predict plural agreement morphology, contrary to fact. Thus, alternatives based on a null expletive do not work here.]
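
The pattern just described can be sketched in a few lines. This is a toy illustration restricted to third-person number, which is all the examples above involve; the function names `probe_for_plural` and `agreement_form`, and the use of the strings "sg" and "pl", are assumptions made for the sketch. The probe looks only for [pl], skips anything that lacks it, and when it finds nothing, the unmarked form surfaces without any agreement having taken place.

```python
from typing import Optional, Sequence


def probe_for_plural(arguments: Sequence[str]) -> Optional[int]:
    """Return the index of the highest argument bearing [pl], or None.

    `arguments` lists number values from structurally highest to lowest,
    e.g. ("sg", "pl") for Subj:3sg + Obj:3pl. Singular arguments are
    skipped because they lack the feature the probe is looking for.
    """
    for index, number in enumerate(arguments):
        if number == "pl":
            return index
    return None


def agreement_form(subject: str, obj: str) -> str:
    """Gloss of the agreement marker in the toy model."""
    target = probe_for_plural((subject, obj))
    if target is None:
        return "3sg"  # unmarked form: no target was found, nothing was agreed with
    return "3pl"


forms = {
    ("sg", "pl"): agreement_form("sg", "pl"),
    ("pl", "sg"): agreement_form("pl", "sg"),
    ("sg", "sg"): agreement_form("sg", "sg"),
}
```

Note that `agreement_form("sg", "sg")` succeeds even though `probe_for_plural` returned `None`. In the sketch, as in the argument above, a probe that finds no target is not a failure of the derivation.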

What about Last Resort? It is entirely possible that grammar has an operation that swoops in should any "uninterpretable features" have made it to the interface unchecked, and deletes the offending features. But now ask yourself this: what prevents this operation from swooping in and deleting the features on the head even when there was a viable agreement target there for the taking (e.g. a 3pl nominal)? That is, why can't you just gratuitously fail to agree with an available target, and just have the Last Resort operation take care of your unchecked features later? The only possible answer is that the grammar "knows that this would be cheating"; the grammar makes sure the Last Resort is just that – a last resort – it keeps track of whether you could have agreed with a nominal, and only if you couldn't have are you then eligible for the deletion of offending features. Put another way, the compulsion to agree with an available target is not reducible to just the state of the relevant features once they reach the interfaces; it is obligatory independently of such considerations. You see where this is going: if this bookkeeping / independent obligatoriness is going on anyway, uninterpretable features become 100% redundant. They bear exactly none of the empirical burden (i.e., there is no single derivation in the entire grammar that would be ruled out by unchecked features, only by illicit application of the Last Resort operation).
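
The redundancy claim can be checked mechanically in a deliberately tiny model. Everything here is an assumption made for illustration: a derivation is reduced to two booleans (was a viable target available, and did agreement happen), the Last Resort deletion is assumed to fire automatically whenever features are left unchecked, and the function names are hypothetical. The sketch compares the feature-based system with one that simply says "agree if and only if you can".

```python
from itertools import product
from typing import Dict, List, Tuple


def possible_derivations() -> List[Tuple[bool, bool]]:
    """All (target_available, agreed) pairs; agreeing without a target is impossible."""
    return [
        (target_available, agreed)
        for target_available, agreed in product([True, False], repeat=2)
        if target_available or not agreed
    ]


def verdict_with_features(target_available: bool, agreed: bool) -> Dict[str, bool]:
    """Uninterpretable features plus a policed Last Resort deletion."""
    checked = agreed
    last_resort_applied = not checked          # assumed to fire on any leftover features
    last_resort_licit = not target_available   # the bookkeeping: only if no target existed
    unchecked_at_interface = not checked and not last_resort_applied
    converges = (not unchecked_at_interface) and (
        not last_resort_applied or last_resort_licit
    )
    return {"converges": converges, "ruled_out_by_features": unchecked_at_interface}


def verdict_obligatory_operation(target_available: bool, agreed: bool) -> bool:
    """No features at all: agreement applies exactly when a target is available."""
    return agreed == target_available


comparison = [
    (d, verdict_with_features(*d), verdict_obligatory_operation(*d))
    for d in possible_derivations()
]
same_verdicts = all(f["converges"] == o for _, f, o in comparison)
features_ever_decisive = any(f["ruled_out_by_features"] for _, f, _ in comparison)
```

In this model the two systems give identical verdicts on every derivation, and the unchecked-features filter never decides anything. All of the work is done by the licensing condition on Last Resort, which is just the obligatoriness of agreement under another name.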

Bottom line: there is no grammatical device of any efficacy that corresponds to this notion of "uninterpretable person/number/gender feature."


At this juncture, you might wonder what, exactly, I'm proposing in lieu of (1-2). The really, really short version is this: agreement and case are transformations, in the sense that they are obligatory when their structural description is met, and irrelevant otherwise. (Retro, ain't it?) To see what I mean, and how this solves the problems associated with (1) and (2), I'm afraid you'll have to read some of my published work, in particular chapters 5, 8, and 9 of my 2014 book. Again, sorry for the self-promotional nature of this.



Every practicing linguist has, in their head, a "toy theory" of various phenomena that are not that linguist's primary focus. This is natural and probably necessary, because no one can be an expert in everything. The difference, when it comes to case and especially when it comes to agreement, is that these phenomena have been (implicitly or explicitly, rightly or wrongly) taken as the exemplar of feature interaction in grammar. And so other members of the field have (implicitly or explicitly) taken this toy theory of case & agreement as a model of how their own feature systems should work.

And lest you think I have constructed a straw man, let me end with an example. If you follow my own work, you know that I have been involved in a debate or two recently where my position has amounted to "such and such phenomenon X is not reducible to the same mechanism that underlies agreement in person/number/gender." What strikes me about these debates is the following: if A is the mechanism that underlies agreement, these (attempted) reductions are not reductions-to-A at all; they are reductions-to-the-LING-101-version-of-A (e.g. Chomsky's Agree), which – to paraphrase possibly-apocryphal-Mark – nobody who works on agreement thinks (or, at least, nobody who works on agreement should think) is a viable theory of agreement.

Now, it is logically possible that a feature calculus that was invented to capture agreement in person/number/gender (e.g. Agree), and turns out to be ill-suited for that purpose, is nevertheless – by sheer coincidence – the right theory for some other phenomenon (or set of phenomena) X. But even if that turns out to be the case, because the mechanism in question doesn't account for agreement in the first place, there is no "reduction" here at all.