Note: This post is NOT by Norbert. It's by Bill Idsardi and Eric Raimy. This is the first in a series of posts discussing the Substance Free Phonology (SFP) program, and phonological topics more generally.
Bill Idsardi and Eric Raimy
Before the beginning
For and against method is a fascinating book, documenting the correspondence between Paul Feyerabend and Imre Lakatos in the years just before Lakatos died. Feyerabend proposed a series of exchanges with Lakatos, with Lakatos explicating his Methodology of Scientific Research Programs, and Feyerabend taking the other side, the arguments that became Against Method. We’ll try something similar here, on the Faculty of Language blog, relating to the question of substance in phonology.
Beyond Epistodome
Note: not “Epistemodome”, because phonology cares not one whit about etymology.
Over severalteen posts we will consider the Substance Free Phonology (SFP) program outlined by Charles Reiss and Mark Hale in a number of publications, especially Hale & Reiss 2000, 2008 and Reiss 2016, 2017. We will be concentrating mainly on Reiss 2016 ( http://ling.auf.net/lingbuzz/003087/current.pdf ). Although we agree with many of their proposals, we reject almost all of the rationales they offer for them. Because that’s such an unusual combination of views, we thought that this would be a useful forum for discussion. (And we don’t think any journal would want to publish something like this anyway.) Because this is the Faculty of Language blog (FLog? FoLog? vote in the comments!), we will start with a reading. Today’s reading is from the book of LGB, chapter 1, page 10 (Chomsky 1981):
“In the general case of theory construction, the primitive basis can be selected in any number of ways, so long as the condition of definability is met, perhaps subject to conditions of simplicity of some sort. [fn 12: See Goodman (1951).] But in the case of UG, other considerations enter. The primitive basis must meet a condition of epistemological priority. That is, still assuming the idealization to instantaneous language acquisition, we want the primitives to be concepts that can plausibly be assumed to provide a preliminary, pre-linguistic analysis of a reasonable selection of presented data, that is, to provide the primary linguistic data that are mapped by the language faculty to a grammar; relaxing the idealization to permit transitional stages, similar considerations hold. [fn 13: On this matter, see Chomsky (1975, chapter 3).] It would, for example, be reasonable to suppose that such concepts as “precedes” or “is voiced” enter into the primitive basis …” (emphasis added)
So the motto here is not “substance free”, but rather “substance first, not much of that, and not much of anything else either”. Since we’re writing this during Lent (we gave up sanity for Lent), the message of privation seems appropriate. And we are sure that the minimalist ethos is clear to this blog’s readers as well. Reiss 2016:16-7 makes a different claim:
“• phonology is epistemologically prior to phonetics…
Hammarberg (1976) leads us to see that for a strict empiricist, the somewhat rounded-lipped k of coop and the somewhat spread-lipped k of keep are very different. Given their distinctness, Hammarberg make the point, obvious yet profound, that we linguists have no reason to compare these two segments unless we have a paradigm that provides us with the category k. Our phonological theory is logically prior to our phonetic description of these two segments as “kinds of k”. So our science is rationalist. As Hammarberg also points out, the same reasoning applies to the learner -- only because of a pre-existing built-in system of categories used to parse, can the learner treat the two ‘sounds’ as variants of a category: “phonology is logically and epistemologically prior to phonetics”. Phonology provides equivalence classes for phonetic discussion.” (emphasis added)
Two claims of epistemological priority enter, one claim leaves (or maybe none). The pre-existing built-in system of categories used to parse includes: (1) the features (Chomsky, Reiss 2016:18), which they both agree are substantive (Chomsky: “concepts … [that] provide a preliminary pre-linguistic analysis”, Reiss 2016:26: “This work [Hale & Reiss 2003a, 2008, 1998] accepts the existence of innate substantive features”) and (2) precedence (Chomsky; Reiss is mum on this point), also substantive. (We will get to our specific proposal in post # 3.) In the case of the learner, it’s not clear if a claim of epistemological priority can be made in either direction.
In our view children have both structures: they come with innate, highly specified motor, perceptual and memory architectures along with a phonology module which has interfaces to those three entities (and probably others besides, as aspects of phonological representations are available for subsequent linguistic processing, Poeppel & Idsardi 2012, and are available in at least limited ways to introspection and metalinguistic judgements, say the central systems of Fodor 1983). The goal for the child is to learn how to transfer information among these systems for the purposes of learning and using the sound structures of the languages that they encounter. We do agree with SFP that a fruitful way of approaching this question is with a system of ordered rules within the phonological component (Halle & Bromberger 1989). In terms of evolutionary (bio-linguistic) priority, it seems blindingly clear that the supporting auditory, motor and memory systems pre-date language, and the phonology module is the new kid on the block. (Whether animal call systems because they also connect memory, action and perception are homologous to phonology is an empirical matter, see Hauser 1996.) In terms of epistemological priority for scientific investigation there would seem to be a couple ways to proceed here (Hornstein & Idsardi 2014). One is to see the human system as primates + X, essentially the evolutionary view, and ask what the minimal X is that we need to add to our last common ancestor to account for modern human abilities. The answer for phonology might be “not much” (Fitch 2018). But there’s another view, more divorced from actual biology, which tries to build things up from first principles. 
So in this case that would mean asking what we can conclude about any system that needs to connect memory, action, and perception systems of any sort, a “Good Old Fashioned Artificial Intelligence” (GOFAI) approach (Haugeland 1985, see https://en.wikipedia.org/wiki/Symbolic_artificial_intelligence ). This seems to be closer to what Hale & Reiss have in mind, maybe. If so, then by this general MAP definition animal call systems would qualify as phonologies. As would a lot of other activities, including reading-writing, reading-typing, rituals (Staal 1996), dancing, kung-fu fighting, etc. (Nightmares about long ago semiotics classes ensue.) But there are problems (maybe not insurmountable) in proceeding in this way. It’s not clear that there is a general theory of sensation and perception, or of action. And what there is (e.g. Fechner/Weber laws, i.e. sensory systems do logarithms) doesn’t seem particularly helpful in the present context. We think that Gallistel 2007 is particularly clear on this point:
“From a computational point of view, the notion of a general purpose learning process (for example, associative learning), makes no more sense than the notion of a general purpose sensing organ—a bump in the middle of the forehead whose function is to sense things. There is no such bump, because picking up information from different kinds of stimuli—light, sound, chemical, mechanical, and so on—requires organs with structures shaped by the specific properties of the stimuli they process. The structure of an eye—including the neural circuitry in the retina and beyond—reflects in exquisite detail the laws of optics and the exigencies of extracting information about the world from reflected light. The same is true for the ear, where the exigencies of extracting information from emitted sounds dictates the many distinctive features of auditory organs. We see with eyes and hear with ears—rather than sensing through a general purpose sense organ--because sensing requires organs with modality-specific structure.” (emphasis added)
So our take on this is that we’re going to restrict the term phonology to humans for now, and so we will need to investigate the human systems for memory, action and perception in terms of their roles in human language, in order to be able to understand the interfaces. But we agree with the strategy of finding a small set of primitives (features and precedence) that we can map across the memory-action-perception (MAP) interfaces and seeing how far we can get with that inside phonology. Following Fitch 2018 (The phonological continuity hypothesis), though, we will consider the properties of phonology-like systems (especially auditory pattern recognition) in other animals, such as ferrets and finches, to be informative about human phonology (see also Yip 2013, Samuels 2015). How much phonological difference does it make that ASL is signed-viewed instead of spoken-heard? Maybe none or maybe a lot, probably some.
The idea that there would be action-perception features in both cases seems perfectly fine, though they would obviously be connecting different things (e.g. joint flexion/extension and object-centered angle in signed language and orbicularis oris activation and FM sweep in spoken languages). Does it matter that object-centered properties are further along the cortical visual processing stream (perirhinal cortex) whereas FM sweeps are identifiable in primary auditory cortex (A1)? Can we ignore the sub-cortical differences between the visual pathway to V1 (simple) and the ascending auditory pathway to A1 (complex)? Does it matter that V1 is two dimensional (retinotopic) and so computations there have access to notions such as spatial frequency that don’t have any clear correlates in the auditory system? Do we need to add spatial relations between features to the precedence relation in our account of ASL? (The answer seems to be almost certainly yes.) Again, we agree that it’s a good tactic to go as far as we can with features and precedence in both cases, but we won’t be surprised if we end up explanatorily short, especially for ASL. To address a technical point, can you learn equivalence classes? Yes, you can; that’s what unsupervised learning and cluster analysis algorithms do (Hastie, Tibshirani & Friedman 2001). Those techniques aren’t free of assumptions either (No Free Lunch Theorems, Wolpert 1996), but given some reasonable starting assumptions (innate or otherwise) they do seem relevant to human speech category formation (Dillon, Dunbar & Idsardi 2013, see also Chandrasekaran, Koslov & Maddox 2014) even if we ultimately restrict this to feature selection or (de-)activation instead of feature “invention”.
Next time: Just my imagination (running away with me)
I don't think the Reiss logic on features is necessarily that widely accepted, even within SFP; seems to me that a Mielke-style story is pretty consistent with the rest of the SFP worldview. I talk about this and also touch on the modality issue in this chapter: Samuels, B.D. (2012) ‘The emergence of phonological forms.’ In A.M. Di Sciullo (ed.), Towards a biolinguistic understanding of grammar: essays on interfaces, 193-213. Amsterdam: John Benjamins.
Thanks Bridget. My view on features is really pretty close to Charles's, but for different reasons I think (which we'll get to eventually). What's your understanding of the constituent elements of "sound patterns" in the emergentist view (the right-hand side of Mielke's Figure 1.4)? Groups of sounds?
Yes, thanks for the comment Bridget. My main question and concern with Mielke-type emergence stories on features is the acquisition space explosion problem. One of the main things I saw in the poster I presented at the LSA was that adding features grows the acquisition space exponentially, so we should try to limit the input features for phonological acquisition. Standard SPE-esque feature systems already have a huge acquisition space, so I don't see the advantage in making it larger by allowing for arbitrarily learned features.
Another twist on this is that the actual used space in a Dresherian hierarchy is not strictly correlated with number of features but also which features are used. For example, 'mid' as used by UPSID (should be ok as a Mielke feature, right?) blows things up a bit.
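To make the acquisition-space worry in this comment concrete, here is a minimal counting sketch. It assumes binary features and counts two things: fully specified segments (2^n) and feature bundles in which each feature is +, -, or unspecified (3^n), which is one common way of counting candidate natural-class descriptions. The function names are hypothetical, and this is not necessarily the measure used in the LSA poster mentioned above.

```python
def full_segments(n_features: int) -> int:
    """Number of fully specified segments with n binary features (assumed)."""
    return 2 ** n_features


def class_descriptions(n_features: int) -> int:
    """Number of feature bundles where each feature is +, -, or unspecified."""
    return 3 ** n_features


if __name__ == "__main__":
    # Each added feature doubles the segment space and triples the
    # space of possible class descriptions.
    for n in (5, 10, 20):
        print(n, full_segments(n), class_descriptions(n))
```

On this way of counting, every learned feature added to the inventory multiplies the space of possible class descriptions by three, which is the sense in which the growth is exponential.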
Related to my comment above, I see the question about a Mielke type story to be a trade-off between rules and representations. Mielke's argument seems to be that since we can find rules that have 'crazy sets' of segments then we must have more arbitrary features so we can write one rule to do it. The other way is to keep the smaller set of features and allow rule repetition/duplication where something like the RUKI rule is two to four rules that are suspiciously similar. This may be a 'conspiracy' but I'll take a conspiracy over a hypothesis space explosion. YMMV.
Note: I recognize the promissory note on the LSA work and will do my best to be prompt in getting it written up.
How much phonological difference does it make that ASL is signed-viewed instead of spoken-heard? Maybe none or maybe a lot, probably some.
Undoubtedly, the biological differences of the modality make phonetic differences which the phonology has to acknowledge. However, there is strong evidence, which I've written about in a recent CLS paper, that aspects of sign phonology show exactly the same formal complexity class (or, equivalently, the same logical power) as their spoken equivalents. The real question with sign (and also with Fitch's Continuity Hypothesis) is not "How different is it?" but rather "Where does the difference lie?" One way a formal language theory approach (which is about as substance-free as you can get) helps is by clearly laying out the roles representation and complexity play. However you change a representation, you can clearly show the effects on the computational power required of the phonology. The issue of substance is in a sense independent from these results, but as Reiss notes, "substance-free" has many usages.
Thanks Jon for the comment and the link to the paper. I don't think it will come as a surprise to anyone that I endorse exploring formal models whole-heartedly. But, as Jeff Heinz and Thomas Graf have pointed out, different representational systems (data structures) can result in different logical complexity to describe systems of patterns. For example, describing CFGs as string sets is different from describing them with tree automata. Or +1 is first-order definable from < but not vice-versa. What I suspect is that sign languages add at least one spatial relation to the basic temporal one in spoken language. So I think that the hard test case will be to find some sign language generalization that is an interaction between spatial and temporal relations and see what formal power that requires. One version of the formal language approach would "hide" the spatial relations inside the alphabet of the formal language. But I'm not sure that this would cause the temporal relations to be non-sub-regular. I'm all for the algebraic approach, but I worry that this will be akin to trying to compare groups (with one operation) with fields (with two operations). There, the elements of a field do form a group under one operation (+), and (throwing away 0) also form a group under the other operation (·). So many (maybe all) group properties would look the same in the field. But the field also has laws regarding the relationship between the two operations (distributivity) that don't follow from the group structure alone.
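The remark that +1 is first-order definable from < can be checked on small finite orders. The sketch below assumes the standard definition succ(x, y) := x < y and there is no z with x < z < y, and evaluates it by brute force over the domain {0, ..., n-1}; the function name is hypothetical. It illustrates only this direction. The converse (recovering < from +1) needs something like transitive closure, which the code does not attempt to show.

```python
def successor_pairs(n: int) -> set:
    """Pairs (x, y) with x < y and no z strictly between them, over range(n)."""
    domain = range(n)
    return {
        (x, y)
        for x in domain
        for y in domain
        # First-order definition of successor using only the order relation <.
        if x < y and not any(x < z < y for z in domain)
    }


if __name__ == "__main__":
    print(sorted(successor_pairs(5)))
```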
I think that the hard test case will be to find some sign language generalization that is an interaction between spatial and temporal relations and see what formal power that requires.
This is exactly right. The sign community has endorsed this view, and there is considerable evidence toward autosegmental models of sign structure, which Wendy Sandler posits as a tradeoff between "sequentiality and simultaneity". This allows even more than just one spatial relation. If the sign representation can be modeled with, say, graph languages, as Adam Jardine has shown with other autosegmental processes (coincidentally I am doing this exact work now), then we can clearly see what contribution the representation makes.
Thanks again Jon. Here's the closest I can come to what I have in mind.
Spoken languages do not seem to be able to enforce a condition of temporal symmetry (i.e. a palindrome language). This follows from the sub-regular conjecture as non-trivial palindrome languages are non-regular (though there are interesting variations on this question that are regular; the CFG and PDA constructions for palindrome languages are standard fare in automata theory texts). So a system of sub-regular relations isn't sufficient to define a symmetry relation that we could test strings for.
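The non-regularity of palindrome languages can be illustrated with a small brute-force sketch in the spirit of the Myhill-Nerode argument. It assumes a two-letter alphabet and counts how many prefixes of a given length can be told apart by some continuation; all names here are hypothetical. If that count keeps growing with the prefix length, no finite-state machine can track the distinctions.

```python
from itertools import product


def is_palindrome(s: str) -> bool:
    return s == s[::-1]


def strings_up_to(alphabet: str, max_len: int):
    """All strings over the alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def residual_classes(prefix_len: int, alphabet: str = "ab") -> int:
    """Count prefixes of this length that some suffix distinguishes."""
    suffixes = list(strings_up_to(alphabet, prefix_len))
    signatures = set()
    for letters in product(alphabet, repeat=prefix_len):
        prefix = "".join(letters)
        # Record which continuations turn this prefix into a palindrome.
        signatures.add(tuple(is_palindrome(prefix + s) for s in suffixes))
    return len(signatures)


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, residual_classes(n))
```

Every prefix of length n is separated from every other by its own reversal, so the count is 2^n for this alphabet, and a recognizer would need a number of states that grows without bound.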
However, signed languages have signs that are visually symmetrical. Harry van der Hulst pointed out to me many years ago that "motorcycle" (with 2 throttles) and "tape-recorder" (with a motion that would snap the tape) are two that are non-iconic in interesting ways, which he suggested at the time showed some kind of abstract pressure to enforce/prefer symmetry. Harry argues recently ( http://wrap.warwick.ac.uk/65928/2/WRAP_KIT%2024-04-14-SK%20%283%29.pdf ) that this is a general cognitive constraint, not something specific to signed language, because it "holds equally well for signs and spontaneous gestures". But it seems to me that you would still need to be able to specify a relation of symmetry in order to recognize it or categorize by it. I think that it is likely that the visual system has "special hardware" to assess symmetry in the context of visual object recognition (Palmer Vision Science pp375ff) and human body and facial attractiveness (e.g. http://psycnet.apa.org/buy/2012-00536-001 this one has an interesting experimental technique).
So if signed languages use visual symmetry to define a class of sounds they would at least need access to primitives able to (collectively) code for this property, which would seem to make the class of spatial relations different from the class of temporal relations.
Now I think we have a long way to go to show whether this is the case or not, but I think it at least gives teeth to the question.
Ah, I see what you mean! As far as temporal symmetry, I'd point to the extremely bounded temporal nature of signs (signs are canonically monosyllabic with a CVC-like structure) as avoiding palindrome or even First-Last Harmony territory, since there's no unbounded enforcement requiring full regular or context-free power.
I also agree that having independent articulators (not even mentioning prosodic facial stuff) is a whole new ball game. One would just have to show HOW a spatial relation is defined in terms of a model signature. One reason I like the autosegmental approach is that it makes this distinction between sequential and spatial relations very clear. However, the downside is that the autosegmental models eventually start looking like Purkinje cells. Now, it could be (pure speculation) that the general cognitive symmetry constraint you're talking about is present, and that sign balances this power against the sub-regular temporal requirements banning sequential symmetry by limiting the size of the sign. That trade-off I think could be tested for, and would be interesting particularly with reduplication.
Thanks Jon. I agree with most of this, but we need to keep in mind that we still need to get long-temporal-distance patterns, such as are captured by Strictly Piecewise statements in the sub-regular hierarchy.
[There will be at least two pieces to this reply]
I'm going to respond first to something at the end of Bill and Eric's (B&E) post, the idea that you can learn equivalence classes (ECs). I think a nice example is suggested by Stanislas Dehaene in his pop book Reading in the Brain:
Experiments show that very little training suffices to DeCoDe, At An EsSeNtIaLly NoRmAl SpEeD, EnTiRe SeNtEnCes WhOsE LeTtErS HaVe BeEn PrInTeD AlTeRnAtElY iN uPpErCaSe aNd In LoWeRcAsE.
So, we seem to be able to construct an EC that includes, say, a and A, (in an unbounded number of shapes, sizes, colors and fonts), and treat the members of such a class as identical for the fast, automatic task of reading.
We also seem to be able to create an EC that treats [p] and [ph] as `the same' (in some sense) if we speak English, and not do so if we speak Thai.
It does not follow, of course, that ALL the equivalence classes that we have can be created. It is hard to imagine how the class of YES-NO questions, for example, can be created from acoustic inputs like
Is mommy tall?
Does daddy snore?
Does Wilbur want yet another cheese scone?
Wasn't the frog annoying?
[If the inputs are something more than acoustic inputs, like full syntactic trees minus a Question head, we're begging the question (no pun intended), since we have to explain where all those other node labels came from.]
So, are phonological features more like the category ``the letter A'' or more like the category ''YES-NO question''?
I'm not sure, but I'm going to push the latter for now.
Let me try to clear up a few things:
1) This is not an argument, but I should point out that Hammarberg, a phonetician by training, sees himself as solidly in the rationalist Chomskyan camp. I once sent him fan mail and he suggested in his reply (p.c.) that his rationalist outlook guaranteed that he never could be taken seriously in the empiricist world of phonetics. I recommend his papers `The Metaphysics of Coarticulation' and 'The Raw and the Cooked' (as in `raw data', which he argues does not exist).
2) I'm going to claim that the LGB passage that B&E cite is inconsistent. Chomsky says that the primitive basis (of UG) provides a ``pre-linguistic'' analysis. ''Pre-linguistic'' is of course a kind of ''non-linguistic''. But if it is part of UG, it should be linguistic, by definition.
The discussion only becomes about linguistic issues when those ``pre-linguistic'' primitives get organized into or build linguistic ones. Notice that Chomsky mentions the `concept' ``is voiced'', by which I assume he means that the vocal folds are vibrating. That is different from ``+VOICED'' which we know has a very complicated relationship to what is actually happening in the throat. For illustration, let me use an old-fashioned, over-simplified example. The phonetic correlates of the difference between the words bet [bEt] and bed [bEd] are mostly found in the length of the vowels, but we traditionally say that the phonological difference involves +/-VOICED final stops. So, my interpretation of the LGB passage is that the sub/pre-linguistic `concepts' like ``is voiced'' parse stimuli into (innate) categories like ``+VOICED''.
I am not an expert on the latest animal ``phonology'' results, but I'd say that when your monkey can treat vowel duration, short VOT of a stop and vocal fold vibration during a nasal ALL THE SAME, then we'll say he has +VOICED. Til then, he is just sensitive to ``is voiced'', vocal fold vibration.
To reiterate: +VOICED is a (potentially misleading) name for one category into which humans parse sounds that are like human voices. Sensitivity to vocal fold vibration seems to trigger parsing into this category more than some other things, but so do many other properties, like longish vowels. To say that other critters are sensitive to ``voicing'' is not the same as saying that they have a category +VOICE.
3)Let's turn to `precedence' as a primitive. Bill and Eric have done very interesting work on precedence relations in phonological representations, work that has inspired my own, so we probably agree about many precedence related matters, and we all want to see it given a starring role in phonology. However, we all know, as well, that actual temporal precedence and phonological precedence are not the same. Here are some things to keep in mind:
a. Forget phonology---even auditory perception is subject to illusions of precedence, as work in the Auditory Scene Analysis framework has shown.
b. In other domains of cognition, such as touch, there are also illusions of temporal precedence---see the cutaneous rabbit effect, where the mind has to `go back in time' to construct where intermediate illusory hops occurred.
c. As Sapir pointed out in discussing Nootka, the sequencing of an oral gesture and a glottal gesture differs in glottalized sonorants vs. obstruents, but the sequencing is ignored, and the two orders are treated as phonologically identical.
So, again, as with voicing, I suggest that Chomsky's `precedence' is sub/pre-linguistic, and that there is a complicated relationship among temporal-perceptual-phonological precedence. For me, in phonology, to say that a morpheme contains segments with precedence relations among them (viz., strings) is just to say that the phonological representation is an ordered set of segments---precedence is a mathematico-logical notion in (my) phonology. (I am ignoring things like syllable structure here.)
(All this relates to what vision people like Pylyshyn call the `stimulus independence' of cognitive categories, but I won't go into that.)
4) One more point. I understand B&E to be raising the possibility that +VOICED and +HIGH are not pre-existing innate phonological categories. If instead each learner constructs their own set of features from more basic primitives, statements like the following lose meaning:
``Morris has a +HIGH vowel in the word [hip] and Noam has a +HIGH vowel in the word [hip].''
Well, we could mean that each uses the highest (frontest, unroundedest, tensest) vowel of his (idiosyncratically constructed) inventory in (his version of) [hip]. But as an absolute statement of identity, the original sentence is no good, since we are using ``+HIGH'' to refer to two different things. If we take this path, then every single cross-linguistic generalization referring to features loses its meaning. (All markedness claims become meaningless---which might be an improvement on their current status.)
Now turn to (morpho-)syntactic categories and features. If Morris's Tense feature is not the same as Noam's Tense feature, but each is instead an idiosyncratically constructed category, I think syntactic theory becomes impossible.
I don't see a way out of having innate features in phonology or syntax at this point.
On point (2) about VOICING, actually no, we don't assume that the 'vocal cords are vibrating' is the actual substance of laryngeal features. We (I'll force Bill's hand on this if he disagrees) assume that the laryngeal distinctive features are actually based on articulatory positions of the larynx which manipulate 'vocal cord vibration'. We will elaborate on this in a different post if I understand all of the different stuff we have cooking here.
An important point on this is that if Joe Salmons is correct about the very very very strong correlation between regressive voice assimilation based around the 'slack' feature and bidirectional spreading of the 'spread' feature, the differential substance of whether vocal cords are abducted vs. the thyroid cartilage being rocked back provides potential insight into this bias among phonological processes.
Thanks Charles. Very quick replies here, as later posts will deal with these topics again.
1. I believe in a cast of innate features much like yours. (Just in case anyone is confused.) And I agree that the written alphabet is interestingly different, which for me puts graphology beyond the scope of phonology. But I'm sure that there are plenty of people writing xfst code who are perfectly happy to lump spoken and written language together, and for engineering purposes that seems like an efficient thing to do.
2. I don't think Chomsky is being inconsistent, he's saying that the primitives in UG have reasonable correspondences to entities across the interfaces.
3. I don't think that phonological precedence is isomorphic to physical precedence, but that phonological precedence has a reasonable relation to what codes as precedence in motor and perceptual systems. (And there might be several kinds of codes here.) As you say, it is well known that the thresholds for detecting two events and for detecting the order of two events differ from each other and across sensory modalities. So for me part of the understanding of phonological precedence would include knowing what the motor and perceptual systems can temporally distinguish and code for so that there could be reasonable interfaces to them. Given the existence of neural oscillators in the motor system (Gallistel 1980 ch 4, 5) and the perceptual system ( https://www.sciencedirect.com/science/article/pii/S1053811913006721 ) I think that also gives us a way of understanding some kinds of "loops in time" as repetition of events at the same phase in successive cycles driven by exogenous or endogenous frequencies.
Let me clarify my point that Eric responded to. Whatever the typical correlates of what we call +VOICE (or whatever), it seems clear that humans will categorize/parse/transduce many different stimuli as having THAT feature.
Let's choose another one like phonological length. A word-initial "long" L in Malay will be longer than a short one, but a word-initial "long" K will have, not a longer articulation, but a stronger burst than a phonologically short one. Humans are endowed with a DELUSION OF IDENTITY that makes them encode both L: and K: as what we call +LONG (humor me, and treat it like a feature). Very different stimuli get categorized as members of the same Equivalence Class.
Do monkeys do THAT in the same way we do? Or are they free of the DELUSIONS OF IDENTITY that seem to be a/the crucial part of having human phonological representations? It is not enough to show monkeys being sensitive to some of the cues we use. We need to show that they ignore the same differences that we do. So, I guess, I am claiming that in the formula HUMAN=PRIMATE+X, X is a particular set of delusions.
Charles's comment is exactly on point, and unfortunately the work on other animals is pretty complicated to do. Mitchell Steinschneider has worked on the neural encoding of VOT in primates and humans distinguishing between single-on and double-on responses. So there are auditory neurons, in both primates and humans, that show double-on responses to long VOTs (> 40ms or so, https://www.ncbi.nlm.nih.gov/pubmed/10561410 then look for the rest of Steinschneider's articles) but only single-on responses to short VOTs, such that responses to 0ms and 20ms were essentially the same. So some of these delusions of identity are certainly observable in A1 (they come with the hardware). I think that this is the same point that you, Charles, were making before about detection thresholds for distinctness and order. So presumably this stuff isn't interesting for your point, because these delusions of identity are inherited from the general auditory system.
(Just an aside here. Among the biologists there's quite a bit of worry about whether primates are an adequate animal model here, as they don't do vocal learning.)
So what you're asking for here, quite correctly I think, is to get some animal to perform a go/no-go task on some delusions of identity. For example, go on long consonants, no-go on short consonants. I wouldn't be surprised if we can train the ferrets to do this, as the group here at UMD has already trained them to recognize Mandarin tones across different speakers with different pitch ranges. But I think the potential objection is the same one that you raised before about letter caseforms: maybe the ferrets are doing it by simply constructing an (arbitrary) equivalence class, like "A" = "a". But maybe this is what you think +LONG is?
Now my own idea about what the ferrets (and humans) are doing for tones is that there are pre-existing circuits that take the average first and second derivatives of the pitch over the course of a "syllable" (i.e. one theta cycle since we're still talking auditory cortex). This is a very effective data compression technique which certainly yields interesting equivalence classes which would then cause delusions of identity: e.g. negative, zero, and positive average first derivatives result from falling, level and rising fundamental frequency contours.
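The averaging idea in this comment can be sketched with finite differences. The code below is only an illustration under stated assumptions: a pitch track is a list of f0 values in Hz sampled at a fixed spacing `dt`, and a tolerance `tol` (in Hz per second) decides what counts as "level". The function names, the sampling interval, the tolerance, and the example values are all hypothetical, not measurements from the ferret work.

```python
def mean_derivatives(f0, dt):
    """Average first and second finite-difference derivatives of a pitch track.

    Expects at least three samples so that a second difference exists.
    """
    d1 = [(b - a) / dt for a, b in zip(f0, f0[1:])]
    d2 = [(b - a) / dt for a, b in zip(d1, d1[1:])]
    return sum(d1) / len(d1), sum(d2) / len(d2)


def contour_class(f0, dt=0.01, tol=5.0):
    """Map a pitch track to falling / level / rising by the sign of its mean slope."""
    mean_slope, _ = mean_derivatives(f0, dt)
    if mean_slope > tol:
        return "rising"
    if mean_slope < -tol:
        return "falling"
    return "level"


if __name__ == "__main__":
    low_voice_fall = [120, 115, 110, 105]    # invented values, Hz
    high_voice_fall = [250, 240, 230, 220]   # invented values, Hz
    print(contour_class(low_voice_fall), contour_class(high_voice_fall))
```

Because only the sign of the average slope is kept, contours from speakers with very different pitch ranges fall into the same class, which is the kind of data compression that would yield equivalence classes across speakers.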
I feel like I'm being pedantic here but I can't countenance +LONG here for a few reasons.
The first is that when we are really trying to figure out fundamental aspects of phonology, shorthand won't do. We want the real crucial cases, not shorthand 'close enough' ones and you admit yourself that +LONG is not a feature.
The second is that I do believe that word initial geminate stops are longer in articulation than short ones in the closure phase. This may be obscured in word initial position but the 'equivalence class' here can be fairly accurate. The question is just how one notices/perceives/hears longer silence without a preceding boundary. Which leads to the third point.
The point that is posed is how does 'louder' get equated with 'length' and an unstated assumption here is that there are only one-to-one cues between acoustics/articulation and the phonological representation. I'm pretty sure this is a straw man position because we know that there are trading relations among cues to phonological representations. loud~long appears to be one that is used a lot in tradeoffs. Think stress. For long stops this even makes some sort of 'common sense' (be careful) because a longer closure will lead to greater supraglottal pressure buildup which should be 'louder'.
And D... I think C's point here about +LONG and Bill's response identifies a part of this discussion that I think gets very weird very quick. I like weird but it can be very confusing at least for me...
What do 'illusions' or known perceptual inaccuracies like the temporal streaming ones tell us about phonological substance? If I am tracking, C suggests that illusions provide evidence for 'substance freeness' but I don't think I like that interpretation. Let me talk about Bill's Mandarin hearing ferrets...
One feature that Bill lauds about these fancy ferrets at UMD is that they can tell Mandarin tones apart from each other even when it is acoustically 'illusory' (presumably a small woman's low tone is pitched higher than a large man's high tone...). Is this substantive? If we can actually show that a neural circuit creates this 'illusion' as a fundamental part of the processing of the input then I think this is substance. This situation would not be an example of learning an arbitrary equivalence class, which I do agree would show 'substance free' behavior.
I am really annoyed that 'embodied' as a term has already been used for ideas that I don't buy because I think it captures some of what I see here. Wetware is substantive and if it happens to create illusions then the illusions are substantive. Very weird and very annoying because we have to sort through cases individually.