[Note: the following owes tremendous debts to Jeff Heinz and Paul Pietroski, but that does NOT imply their endorsement. But they can provide endorsements or disavowals in the comments if they want to. They also have the right to remain silent, because ‘Murica.]
Taking phonology to be the mind/brain model for speech, it needs to have interfaces to at least three other systems: the articulatory motor control system (action), the auditory system (perception) and the long term memory system (memory), forming a Memory-Action-Perception (MAP) loop (Poeppel & Idsardi 2012). Likewise, signed languages will have to have interfaces to action (the motor system for the hands, arms, etc.), perception (the visual system) and to memory. As discussed in the comments last week, we think therefore that sign language phonology probably also includes spatial primitives which spoken language phonology lacks.
So the data structures inside phonology proper must be able to effectively receive and send information across those interfaces. This condition is a basic tenet of the minimalist program. Pietroski (2003: 198, in Chomsky and His Critics) is on-point and unmistakable:
"Indeed, a sentence is said to be a pair of instructions -- called "PF" and LF" -- for the A/P and C/I systems. If these instructions are to be usable by the extralinguistic systems, PFs and LFs may have to respect constraints that would be arbitrary from a purely linguistic perspective."
(And see Chomsky's replies in the same book for further remarks on the importance of the interface conditions, e.g. p 275.) That is, the data structures inside phonology must be sufficiently similar to the data structures in the directly connecting modules to allow for the relevant information to be transferred. Such interface conditions are also definitely part of Marr's program, though people seem to miss this point, perhaps because it's buried in his detailed account of the visual system, e.g. pp 317 ff on transforms between coordinate systems.
Reiss and Hale often refer to the interfaces as “transducers”, but the whole of phonology is a transducer between LTM and motor control, and between perception and LTM (“From memory to speech and back”, Halle 2003). (There also seem to be some connections between action and perception that do not include a way station at memory. We ignore those things here.) Moreover, at times the SFP proposals seem to imply transducers which would have very impressive computational abilities, and which look implausible to us as direct interfaces which can be instantiated in human neural hardware (such interfaces in perceptual systems seem largely limited to affine transformations and certain changes in topology and discretization, such as are accomplished in the vision system, Palmer 1999; though see Koch 1999 for an idea of how much computation a single neuron might be able to do -- a lot).
So, first, we will adopt the proposals of Jakobson, Fant and Halle 1952 (minus the acoustic definitions, which are nevertheless very relevant for engineering applications), and Halle 1983 regarding distinctive features and their neural instantiation, but drawing the feature set from Avery & Idsardi 1999. Here is a relevant diagram from Halle 1983:
And here is A&I’s wildly speculative proposal:
We believe that SFP (or at least the Reiss & Hale contingent) is sort of ok with this, although they remain more open on the feature set (which for them continues to include things like [±voiced] -- by the way, also in the Halle 1983 figure above -- despite Halle & Stevens 1971, Iverson & Salmons 1995 et seq).
We understand the perception side to involve neural assemblies including spectro-temporal receptive fields (STRFs, Mesgarani, David, Fritz & Shamma 2008, Mesgarani, Cheung, Johnson & Chang 2014) and a coupled dual-time-window temporal analysis (Poeppel 2003, Giraud & Poeppel 2012). On the action side, we’ll go with Bouchard, Mesgarani, Johnson & Chang 2013 and another off-the-shelf component, Guenther 2014. Much less is known about the relevant memory systems, but we’ll take stuff like Hasselmo 2012 and Murray, Wise & Graham 2017 as some starting points.
Most importantly, in our opinion (we've been telegraphing this point in previous posts), we need a reasonable understanding of what “precedes” is, and we think precedes needs to be front and center in the theory. We feel that it is very unfortunate that phonological practice has often favored implicit depictions of “precedes” (as horizontal position in a diagram) rather than being explicit about it (but we’ve rehearsed these arguments before, to not much effect). We take “precedes” to be a temporal relation at the action and perception interfaces, providing the basis in the phonology for notions such as “before” and “after” and “at the same time, more or less”. (And for speech “more or less” appears to be about 50ms, Saberi & Perrott 1999, again see Poeppel & Giraud & Ghitza & co. It's not impossible for the auditory system to detect changes that are faster than this, but such rapid transitions will be encoded in phonology as features rather than as separate events.) We feel that it bears repeating here that the precedes relation in phonology is interfacing to the data structures and relations for time that are available in motor control and auditory perception and memory, which themselves are not going to capture perfectly the physical nature of time (whatever that is). That is, the acuity of precedes will be limited by things such as the fact that there is an auditory threshold for the detection of order between two physical events. (A point that Charles brought up in the comments last week.)
There are a number of ways to construct pertinent data structures and relations, but we will choose to do this in terms of events, abstract points in time (compare Carson-Berndsen’s 1998 Time-Map Phonology in which events cover spans or intervals of linear time). This will probably seem weird at first, but we believe that it leads to a better overall model. NB: This is absolutely NOT the ONLY way to go about formalizing phonology. SFP (Bale & Reiss 2018) take quite a different approach based on set theory. We will discuss the differences in a later post or two.
Within the phonology that means that we have at least:
- events/elements/entities, which are points in abstract time. We will use lower case letters (e, f, g, ...) to indicate these.
- features, which we construct as properties of events. When we don’t care what their content is we will use upper case letters to indicate them (F, G, H, …). So Fe means that event e has feature F. We will enclose specific features in brackets, following common usage in phonology, e.g. [spread]e. Notationally, [F, G]e will mean Fe AND Ge. We will drop the event variable when it’s clear in context (think Haskell point-free notation).
- precedes, a 2-place relation of order over events, notated e^f (e precedes f). The exact “meaning” of this relation is a little tricky given that we are not going to put many restrictions on it. For example, following Raimy 2000, we will allow “loops in time”. We're not sure that model internal relations really have any "meaning" apart from how they function inside the system and across the interfaces, but if it helps, e^f is something like "after e you can send f next" at the motor interface and "perceived e and then perceived f next" at the perceptual interface.
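To make (1-3) a bit more concrete, here is a minimal sketch in Python of what such a workspace might look like. The class and names are ours, purely for illustration, and the Python sets are just bookkeeping, not a theoretical commitment (see the discussion of set structure below):

```python
# Illustrative only: events are bare tokens (abstract points in time), features
# are properties of events, and precedes is an unrestricted binary relation.
class EFP:
    def __init__(self):
        self.events = set()        # e, f, g, ...
        self.features = {}         # event -> set of feature labels, e.g. {'spread'}
        self.precedes = set()      # pairs (e, f) meaning e^f

    def add_event(self, e, *feats):
        self.events.add(e)
        self.features.setdefault(e, set()).update(feats)
        return e

    def add_precedence(self, e, f):
        # deliberately unrestricted: self-edges and loops are allowed
        self.precedes.add((e, f))
```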
(We’ll do things in this way partly because of Bromberger 1988, though I [wji] still don’t think I fully understand Sylvain’s point. Also, having (1-3) allows us to steal some of Paul Pietroski’s ideas.)
So far (1-3) give us a directed multigraph (it allows self-edges and multiple edges between nodes); and it comes with no guarantees of connectedness yet. (And Jon Rawski would like us to point out that there's a foreshadowing of a model-theoretic approach here. Jon, please say more in the comments if you'd like.) We suppose this thing needs a name, so let’s call it Event-Feature-Precedence (EFP) Theory (would PFE be better? that could be pronounced [p͡fɛ], as in “[p͡fɛ], that’s not much of a theory”). With (1-3) we have a feature-based version of Raimy 2000 (as opposed to its original x-tier orientation), but since we will allow events to have multiple properties (features), we can recreate Raimy diagrams, such as this one for “kitty-kitty” where the symbols are the usual shorthands for combinations of features.
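Using the sketch above, that kind of loop can be encoded roughly as follows (segment symbols stand in for feature bundles; the anchors and details of Raimy's actual diagram may differ):

```python
g = EFP()
k  = g.add_event('e1', 'k')   # 'k' etc. are shorthands for feature bundles
i1 = g.add_event('e2', 'I')
t  = g.add_event('e3', 't')
i2 = g.add_event('e4', 'i')
for a, b in [(k, i1), (i1, t), (t, i2), (i2, k)]:   # the last edge is the loop
    g.add_precedence(a, b)
# Linearizing this graph by traversing the loop once yields "kitty-kitty".
```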
And now, for some random quotes about non-linear time (add more in the comments, please!):
“This time travel crap, just fries your brain like a egg.” Looper
“There is no time. Many become one.” Arrival
“Thirty-one years ago, Dick Feynman told me about his "sum over histories" version of quantum mechanics. "The electron does anything it likes," he said. "It just goes in any direction at any speed, forward or backward in time, however it likes, and then you add up the amplitudes and it gives you the wave-function." I said to him, "You're crazy." But he wasn't." Freeman Dyson
(Note: Freeman Dyson is a physicist, not a movie).
We also take properties (features) to be brain states, as in Halle 1983. Then [spread]e means that event e has the property spread glottis. (For us being brain states doesn't preclude the properties from being other things too. We mean (1-3) in a Marrian way across implementations, algorithms and problem specifications.) This discussion will sometimes be cast as if features are single neurons. This is certainly a vast over-simplification, but it will do for present purposes. For us this means that a feature (= neuron (group)) can be “activated”. (The word “feature” seems to induce a lot of confusion, so we might call these constructs fneurons, which we will insist should be pronounced [fnɚɑ̃n] without a prothetic [ɛ].) The innervation of the [spread] fneuron in event e (in conjunction with the correct state of volitional control circuits) will cause a signal to be sent to the motor control system that will ultimately drive the descending laryngeal nerve to innervate the posterior cricoarytenoid muscle (and reciprocally de-innervate the lateral cricoarytenoid muscle). On the perception side, we assume (facts not in evidence because we’re too lazy to look through all of the STRFs in Mesgarani et al 2008) that there are auditory neurons whose STRFs calculate the intensity difference between Bark bands 1 and 2 versus Bark bands 3 and 4 (probably modulated by the overall spectral tilt in Bark bands 5 to 10). The greater this intensity difference the more likely a [spread] fneuron is to be innervated (activated past threshold). We doubt that there’s much effect of volitional control on the perceptual side, as auditory MMNs can be observed in comatose patients, or at least in those that eventually recover, 30/33 patients in Fischer, Morlet & Giard 2000. To Charles's point last week about phonological delusions, it's well known that large positive values of voice onset time (VOT) can signal [spread] also, without the inclusion of voice quality differences in the first few pitch periods of the vowel. So the working hypothesis is that the neurons for [spread] are connected to auditory neurons with at least two kinds of STRFs, the Bark-based one mentioned above, and a neuron yielding a double-on response, see various publications by Steinschneider.
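For concreteness, here is a toy version of that speculative [spread] detector. The band energies, the tilt modulation and the threshold are all invented placeholders, not a claim about actual STRFs:

```python
# Compare the energy in Bark bands 1-2 against bands 3-4, modulated by a crude
# tilt proxy over bands 5-10, and "fire" past a threshold. Illustrative only.
def spread_activation(bark_energy, threshold=0.5):
    """bark_energy: list of per-band intensities, index 0 = Bark band 1."""
    low  = bark_energy[0] + bark_energy[1]        # bands 1-2
    mid  = bark_energy[2] + bark_energy[3]        # bands 3-4
    tilt = sum(bark_energy[4:10]) / 6.0           # rough spectral tilt, bands 5-10
    diff = (low - mid) / (1.0 + tilt)             # bigger difference -> more activation
    return diff > threshold                       # "innervated" past threshold
```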
As many people are aware, graphs (and multigraphs) are usually defined over sets (or bags or multisets) of vertices and edges. We haven’t included the set stuff here. Why not? The idea, for the moment at least, is that we’re calculating in a workspace, and there isn’t significant substructure in the workspace in terms of individuated phonological forms. So the workspace universe provides the set structures, such as they are, the events, the properties of the events and the relations between events. This could well be a big mistake, as it seems to preclude asking (or answering) questions like “do these two words rhyme?” as this would involve comparing sub-structures of two different phonological representations. That is, in order to evaluate rhyme (or alliteration, or …) we would have to evaluate/find a matching relation between two sub-graphs, so we would need to be able to represent two separate graphs in the workspace and know which one was which. To do this we could add labels to keep track of multiple representations, or add the extra set structure. There are (different) mathematical consequences for either move, and it isn’t at all clear which would be preferable. So we won’t do anything for now. That is, we’re wimping out on this question.
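For what it's worth, the "labels" option might look something like this (purely illustrative, and exactly the extra bookkeeping we're wary of committing to):

```python
ws = EFP()
c1 = ws.add_event(('cat', 'e1'), 'k')   # events keyed by (form-label, event-id)
h1 = ws.add_event(('hat', 'e1'), 'h')
# Rhyme (or alliteration) would then be a matching problem over the sub-graphs
# that share each form label.
```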
Finally, we will say once more that there are some strong similarities between this approach and Carson-Berndsen’s work. But in Time-Map Phonology time is represented using intervals on a continuous linear timeline whereas here we have discretized time instead and we allow precedence to be non-linear.
Next time: It’s Musky
Let me start this longish post by saying that it is nice to see a phonological discussion develop on this wonderful blog. As a kind of a preamble, I am going to comment on some of the claims made in the previous post, and then I will focus on some issues raised in this post, but all in all my comments will revolve around the phonology-phonetics interface, or to be more specific, around the relationship between phonological features and phonetic substance.
What is substance free phonology about? Idsardi and Raimy (I&R) (sometimes “I” is used in these posts, sometimes “we”, so forgive me if I misattribute some claims) offer the following interpretation of the SFP enterprise:
“(…) how I understood the Hale & Reiss 2000 charge of “substance abuse”: there’s too much appeal to substance, and this should be reduced (take your medicine). As a methodological maxim "Reduce Substance!" then I'm all on board.”
I don’t think that SFP is to be interpreted as a phonological methodology. SFP is not a recommendation that can roughly be read as: ‘take it easy with the substance abuse, though a little abuse here and there is tolerable’. Rather, I take SFP to be a theoretically coherent (coherent in the broader context of generative linguistics and the rationalist approach to the study of language) and an empirically testable claim about the nature of phonological competence: phonological rules do not refer to articulatory, or acoustic, or perceptual information. In other words, phonology (as an aspect of the mind/brain) treats features (and other units of phonological representation) as arbitrary symbols. From the point of view of phonology, then, features are substance-free units. This, of course, does not mean that features are not related to phonetic substance, and such a conceptualization of features does not preclude the construction of a neurobiologically plausible interface theory (even spelled out in Marr’s terms). I will return to this point in a moment, but first let me just clarify what I mean by ‘substance’ in ‘substance free phonology’.
I&R write the following:
“let's not confuse ourselves into thinking that all reference to substance can be completely eliminated, for the theory has to be about something”
And before that:
“And a theory without any substance is not a theory of anything.”
I take ‘substance’ to mean ‘phonetic substance’, i.e., things like movements of the tongue, values of formants, loudness, duration expressed in milliseconds etc. I don’t think that ‘substance’ should be equated with ‘any kind of content’ or ‘aboutness’, which is what the above quotes seem to suggest (correct me if I am misinterpreting). In the final paragraph of the same post, I&R elaborate on their understanding of the term ‘substance’: “substantive = veridical and useful”. But in my opinion ‘veridical and useful’ is not at all what ‘substance’ means in ‘substance free phonology’. Can’t purely formal operations (e.g., merge) be veridical and useful? Features understood as substance-free units are veridical and useful in at least two senses: In conjunction with rules they allow for the expression of linguistically relevant generalizations, and they play a role in the phonetic interpretation of the surface representation.
[continued below]
In SFP literature two interesting and seemingly opposite formulations can be found: ‘substantive features’ (e.g., in Reiss 2016: 26; also 2018: 446 in the published version) and ‘substance-free features’ (e.g., in Hale & Kissock 2007: 84). While this introduces some confusion, it is not a contradiction. The notion ‘substantive features’ just means that, unlike for example certain formal aspects of rules (e.g., set unification, if you believe in that), features are somehow related to phonetic substance – the task is to find out how exactly. It is important to note that even in Halle (1983/2002), to which I&R refer, features are understood in this manner -- from the phonological point of view they are abstract and substance-free, and are related to phonetic substance indirectly:
“Considerations of this nature were much in our minds thirty years ago when Jakobson, Fant and I were working on Preliminaries to Speech Analysis, and it was these considerations that led us to draw a sharp distinction between distinctive features, which were abstract phonological entities, and their concrete articulatory and acoustic implementation. Thus, in Preliminaries we spoke not of ‘articulatory features’ or of ‘acoustic features,’ but of ‘articulatory’ and/or ‘acoustic correlates’ of particular distinctive features.” (Halle 1983/2002: 108-109)
Assuming that phonological features can be regarded as abstract, substance-free mental units (which on purely phonological grounds is necessary for reasons discussed by Hale & Reiss 2008) the question is: How do features relate to phonetic substance? This is one of the central issues regarding the phonology-phonetics interface, and a question that Charles and I have recently addressed in this paper (I’ll refer to the paper as ‘CP’ because we call that interface theory Cognitive Phonetics). I&R write that if we were to banish substance from phonology, including features, “the resulting phonology can’t make reference to the motor and perceptual interfaces”. The purpose of CP was to explore how the substance-free conception of the phonological module of grammar (and particularly features, as elements of SRs) interfaces with the SM system in charge of speech production (we were looking into externalization via speech, putting speech perception aside for the most part, but that was just for reasons of space and manageability), while adhering to the condition of biolinguistic (i.e., neurobiological) plausibility. It turned out that you only need two very simple Marrian algorithms to convert substance-free features into data structures which are directly interpretable by the motor system: A1 assigns neuromuscular activity to each feature, A2 arranges this activity temporally. We proposed (roughly and tentatively) that on the implementational level the Spt and the anterior insula execute A1, the cerebellum and basal ganglia execute A2, and the supplementary motor area integrates A1 and A2 before shipping a final ‘true phonetic representation’ to M1. (The neuro discourse is a bit more sophisticated in the actual paper). We argued that assuming substance-free features + simple transduction algorithms is better (for both the phonology and the phonology-phonetics interface) than assuming substance-laden features.
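Very roughly, the A1/A2 division of labor could be sketched like this. The correlate table and the timing function are invented placeholders, only meant to show the shape of the two steps, not what we actually propose in the paper:

```python
# A1 assigns (placeholder) neuromuscular correlates to each feature; A2 arranges
# that activity in concrete time. Speech rate lives in A2, not in phonology.
CORRELATES = {
    'spread': ['posterior_cricoarytenoid+'],
    'round':  ['orbicularis_oris+'],
}

def A1(surface_rep):
    # surface_rep: an ordered list of feature bundles (sets of feature labels)
    return [[c for f in bundle for c in CORRELATES.get(f, [])]
            for bundle in surface_rep]

def A2(activity, rate_ms=80):
    # assign each bundle of activity a concrete onset time
    return [(i * rate_ms, bundle) for i, bundle in enumerate(activity)]

true_phonetic_rep = A2(A1([{'spread'}, {'round'}]))
```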
[continued below]
Describing all the workings of our interface theory would take me too far afield (and would smell of trying to hijack the discussion), but let me just say that, broadly, CP is very much in agreement with the following statements by I&R:
“Put in Marrian terms there have to be some linking hypotheses between the computational, algorithmic and implementational levels”
“(…) it [phonology -- vv] needs to have interfaces to at least three other systems: the articulatory motor control system (action), the auditory system (perception) and the long term memory system (memory), forming a Memory-Action-Perception (MAP) loop”
but does not seem to be in (complete) agreement with the following statements:
“[we’re] taking phonology to be the mind/brain model for speech” (cf. “phonology [is] the study of the mental model for human speech” in Idsardi & Monahan 2016: 141)
“the data structures inside phonology must be sufficiently similar to the data structures in the directly connecting modules to allow for the relevant information to be transferred”
“the whole of phonology is a transducer between LTM and motor control, and between perception and LTM”
I'll pause here to make room for discussion, fully aware that I owe an explanation as to why I consider the latter three statements slightly problematic.
Thanks Veno. I hope having discussions just like this will lead us all to a sharper understanding of the issues. As I've said before, I think that our positions are actually pretty close. So maybe it's partly about the sloganeering.
I think that your "dueling" SFP quotes at the beginning of reply #2 highlight something that keeps coming up, which might be framed as a distinction between "proximal" vs "distal" substance. Saying that a module meets its interface conditions for us means that it communicates effectively with the *adjacent* modules. So the overlap in the data structures is the proximal substance (being veridical across the modules and useful within the module). The relation through a series of such modular connections would be "distal substance".
When you say "phonological rules do not refer to articulatory, or acoustic, or perceptual information" if you mean, for example, that auditory neurons do not directly participate in the calculations within the phonology module (they are not part of the calculation "loop", no sub-routines in those modules are asked for their opinions about phonological computations) then we agree. But if by "refer" you mean "lawfully related to" then we disagree, because we think that the phonological units (here features) have direct, lawful relations (here in STRF terms). And you seem to have a very similar account of how the elements in phonology are directly lawfully related to pre-motor and motor cell assemblies. This falls under our definition of "substance" -- veridical information that does real work in the theory.
So maybe you would be ok with the formulation that phonological features are proximally substantive, but distally non-substantive (because the chain of connections is too long)?
So I'll push just a bit more to try to sharpen the question still. We've said that "substantive" = "veridical and useful", and we've further said that the important idea for modular theories is "proximally substantive", not "distally substantive". Are you ok with this? Or would you like to provide a different definition for "substance"/"substantive"? Or do you think we're wrong to draw a proximal/distal distinction in this way?
Sorry I'm a little late to this party. But I want to waive any right to silence. While I have nothing to say about the substance of phonology, I can say that some of what Bill and Eric attributed to me was inherited, long ago, from Morris.
It's no accident that we've ended up with a shared conception of a "minimal" language faculty that pairs semantic instructions with phonological instructions--with something like conjunction as the semantic analog of precedence, and "eventish" predicates playing an important role in how "syntactic structures" (assembled via something like MERGE) interface with representations that respect agrammatical constraints (on agrammatical operations).
In Morris' essay in the 1990 "Invitation to Cognitive Science" volume, he describes utterances as actions that are manifested via the “gymnastics executed by certain anatomical structures...If utterances are regarded as “dances” performed by…movable portions of the vocal tract, then one must also suppose that underlying each utterance (“dance”) there is a “score” in some “choreographic” notation that instructs each “dancer” what to do and when (p.47)."
That passage stuck in my head. I've always understood the minimalist talk of "semantic instructions" as an invitation to explore potential analogies on the "Conceptual-Intentional" side of the street. And for better or worse, I think of LFs as "scores" in a notation that tells a "concept-builder" what to do in what order. (The first footnote in "Conjoining Meanings" cites the passage above.) Of course, this abstract picture leaves room for many kinds of instructions. One can imagine minds in which grammatical combinations of lexical items are not construed eventishly, and *lengthening* an expression is in no way correlated with logical *strengthening*. But for creatures like us, a longer score seems to be a way of saying more--modulo a few familiar suspects (e.g., negation, disjunction, and conditionalization). If that's right, then we should probably abandon any idea that on the semantic side, combination is construed in a logically neutral way. (Alas, that idea remains alive, or at least undead.) At some point, it would be nice to explore the potential phonology/semantics analogies in more detail.
The conception of features as ‘instructions to the articulators’ (e.g. Kenstowicz & Kisseberth 1979: 239) raises more questions about their relation to substance than it answers. While I do think that features are lawfully related to articulatory movements, I think that that relation is indirect and complex. Take, for example, the feature +HIGH. In order for the sensorimotor (SM) system to execute this feature it needs to know for how long the +HIGH configuration must be maintained; importantly, this temporal information is irrelevant for phonological computation. If we care about the competence/performance distinction, we cannot ascribe temporal information expressed in milliseconds to features (or segments) because the duration of a speech sound depends on speech rate, and surely we agree that speech rate is not part of competence. This is why we drew a line between the phonological module of the grammar (competence) and cognitive phonetics (performance). Both are cognitive (and ultimately neurobiological) systems: the first one computes (i.e., preserves the representational format), the second one transduces (i.e., changes the representational format). This is also the reason why I think it is slightly misleading to define phonology as the mental model for speech -- speech contains so much information that phonological computation systematically ignores; and also why I don’t quite agree that phonology is a transducer between LTM and motor control (since phonological computation preserves the representational format, and therefore cannot be a transducer).
In my view, effective communication between adjacent modules does not entail overlap or identity in data structures characteristic for each module. (If there is identity in data structures between two modules, then why do we think that we’re dealing with two modules instead of a single module?) Rather, I take it that the distinctness of data structures is surmounted by transduction (a conversion of one data structure into a different data structure), which can be found between many different systems within a single organism. For example, the process of hearing entails several transductions: air pressure differentials are transduced into biomechanical vibrations of the tympanic membrane and the ossicles of the middle ear, which are transduced via the oval window into fluidic movements within the cochlea, which are in turn transduced by the organ of Corti into electrical signals. In our paper, Charles and I proposed that transduction is also present at the phonology-phonetics interface: SRs (consisting of features) are transformed into what we call ‘true phonetic representations’ and it is this representational format that can be viewed as a “score” (what Paul mentioned) for articulation, not the format with which phonology operates. The ‘proximal substance’ is a bit suspect to me since it relies on the assumption of an overlap/identity in data structures between phonology and the SM system.
Thanks Veno, I think we're getting close to clarifying the differences here, which are not all that large, but I do consider important, and I think you do too. With the clarifications, I then think the differences turn into empirical issues which I think can then be investigated. A few things, I'm sure this discussion will continue across subsequent posts.
VV: "In order for the sensorimotor (SM) system to execute this feature it needs to know for how long the +HIGH configuration must be maintained; importantly, this temporal information is irrelevant for phonological computation."
I'd be a bit more cautious here. If you allow for autosegmental association of +HIGH, or for geminates, then there is some (perhaps small) use of timing information in the phonology. The mapping to motor commands isn't going to be isomorphic, but my feeling is that it will be quasi-homomorphic, preserving some aspects of the difference. (Put rather bluntly phonological geminates will not be systematically shorter in articulation or audition than singletons.) Also, the motor system *eventually* has to establish the temporal extents, it's not at all clear to me that the temporal extents are established by the *beginning* of the motor calculation (in fact it seems pretty clear to me that they aren't). I think the motor system only receives a rough idea from the phonology of how long the gestures should be. That part is in the interface, the rest of the calculation of the temporal extents is within the motor system.
VV: "we cannot ascribe temporal information expressed in milliseconds to features"
I'm sure that you can find quotes (even some from me) expressed in milliseconds. When you find some of mine, then either I was being sloppy, offering some frame of reference for the reader, or I would now disavow the statements. The idea that we're pursuing here is that brain timing is done by endogenous oscillations (with theta and gamma bands being particularly important for speech). Even this is certainly too simple an answer, but we will try not to mix endogenous and exogenous descriptions of time. I was just on a NACS/Kinesiology dissertation defense on walking yesterday, and all of the discussion of time was done in terms of phase in the gait cycle. That's the sort of thing we would mean by time here, phase in delta-theta-gamma bands (which now has me worried about possible fraternity connotations).
VV: "I take that the distinctness of data structures is surmounted by transduction (a conversion of one data structure into a different data structure) ..."
I agree with this view in the case of mechano-transducers and chemo-transducers near the sensory peripheries (e.g. Bossomaier 2012). But biology has built special purpose hardware in these cases. Once we're approaching cortex the possibility of whole-sale changes in the data structures seems much less common or likely to me. This is the -otopic part of the various cortical representations, e.g. retinotopic organization of V1, tonotopic (which should really be called cochleotopic) organization in A1, and so on. The best case for a truly wild mapping (supporting your view) is the olfactory system:
"For example, the somatic sensory and visual cortices, described in the preceding chapters all feature spatial maps of the relevant receptor surface, and the auditory cortex features frequency maps. Whether any analogous maps exist in the pyriform cortex (or the olfactory bulb) is not yet known. Indeed, until recently it has been difficult to imagine on what sensory qualities an olfactory map would be based, ..." (Purves et al Neuroscience)
And we agree with David Poeppel that far too much emphasis has been placed on spatial maps. We'd be happy with other kinds of neural coding schemes (rate codes, oscillatory codes, etc.).
We're admittedly taking a very Marr/CS view on the data structures here. We're saying that communication across the modules will be possible in the case that there are corresponding structures (overlapping data structures, *not* identical). It won't be an isomorphism, but we're expecting it to be limited to a class of operations including affine transformations, quantization (and its inverse), and certain kinds of topological changes (such as the conversion of the linear wavelength of light into a circular opponent-process data structure in which purple is next to red). We don't believe it's "anything goes" in here.
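To give the flavor of that restricted class of operations, here's a trivial sketch (the numbers and the particular map are arbitrary): an affine transform of a continuous coordinate followed by quantization.

```python
import numpy as np

def affine(x, A, b):
    return A @ x + b

def quantize(x, step):
    return np.round(x / step) * step

coords = np.array([0.3, 1.7])                       # some continuous sensory coordinate
A, b = np.eye(2) * 2.0, np.array([0.1, -0.1])       # arbitrary affine map
mapped = quantize(affine(coords, A, b), step=0.5)   # discretized interface representation
```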
In the case of the gestural scores, we would expect to see a homomorphism over events, with the time extents under-determined by the point nature of the phonological representation for events that we're proposing, but which would have to respect the ^ relation across the interface (which could become "smudged" by later calculations inside the motor system). In the case of the features (properties), we're expecting lawful correspondences for the collection of properties in an event to an initial collection of descending motor nerve instructions (which are then re-organized within the motor system). That is, we think it's important to look at what you think the motor system *inputs* look like, not their outputs, which have definitely been transformed by the computations within the motor system.
Gestural scores do seem like a useful candidate for one level of representation, and I think to make good progress here we have to now try to say what we think the gestural score looks like (its data structure) at the *beginning* of the motor calculation (not at the end of that calculation). We're saying that the three pieces of our phonology all have correspondences and simple mappings across the interface, to form the starting point of the motor calculation. But what looks simple to us might look complex to you. Our working definition of simple is affine transformations, quantization (and its inverse) and some topological changes (I'll admit that my topology is rusty, so I'm less clear on what these are).
Yes, Veno thanks for continuing the discussion. One thing that I want to add here is that I think there might be a latent disagreement or misunderstanding about how many modules we are talking about. Incorrectly or not, I get the impression that Veno (sorry to talk about you while you are right here) assumes that there is one big (or maybe small) phonology module that feeds directly into the 'cognitive phonetics' module which then goes to the MS. Along with this assumption is that a 'gestural score' continuous type representation is down in the MS somewhere. This appears to me to be a kind of 'fewer modules are better' type of position.
We on the other hand (Bill can correct this here but he has never thrown anything at me when I talk this way) believe that there are more modules than probably what other folk assume. As part of this Browman & Goldstein's Articulatory Phonology (and maybe Carson-Berndsen’s 1998 Time-Map Phonology, I say maybe because I haven't read it) could be distinct modules that are in the speech chain between the MS and LTM. Regardless of whether this is correct or not, I think we (at least me and Bill) are talking about the last module which interfaces with LTM when we are talking about EFP here. I'm sure everyone else's mileage varies here...
To me the importance of thinking about how many modules/links there are in the speech chain affects what one thinks about the overlap conditions between modules. Fewer modules (or less? I'll talk pretty one day) will give the impression that there can be much more deviance between modules or more freedom in the mapping from one to the other. More modules allow for a much tighter restriction on the interfaces between modules. Tighter interface conditions enforce 'baby steps' in the transformations of representations.
A final point on this is that if there are more modules involved here, then I take any veracity between distant modules to be strong evidence for some sort of substance in features.
Thank you Bill and Eric for your replies. I’ll try to clarify one of my previous points, as it seems to me that it was taken slightly differently than I intended. I have a feeling that our discussion is beginning to narrow on very specific issues, which is only possible because we agree on so many fundamental concepts.
One idea that I tried to argue for is that it cannot be the case that the data structure that exits the phonological module of the grammar is the same data structure that enters what’s informally called ‘the phonetic implementation system’ (PIS). Why not? Because the output of phonology – a surface representation – lacks information that the PIS (I’m having a hard time not pronouncing this as [pʰɪs]) needs in order to produce speech. The logic of the argument is basically this:
(1) Outputs of the phonological module, SRs consisting of features, do not contain substantial and temporal information.
(2) The PIS requires articulatory, auditory and temporal information in order to produce speech.
∴ SRs are not legible to the PIS and phonology cannot in principle feed speech production directly.
∴ The interface between phonology and the PIS is mediated by transduction.
By “substantial and temporal information” in (1) I mean, for example, the information about which muscles to contract and for how long. Why would (1) hold? Because phonological computation treats features equivalently despite wild variation in their articulatory (and concomitant acoustic) realizations. As Charles mentioned in a previous post, if we define phonological features through precise, richly specified articulatory configurations or acoustic measures, then we won’t be able to formally capture a tremendous amount of obviously important generalizations. On the other hand, the fact that Sylvester Stallone cannot contract his orbicularis oris (due to the paralysis of a facial nerve) in the same way as, say, Bruce Willis, is phonologically irrelevant, i.e., we should not assume that Sly’s [+ROUND] is different than Bruce’s [+ROUND]. And that irrelevancy suggests to me that features contain only a very rough, highly abstract characterization of what needs to be achieved in articulatory and auditory terms. Another kind of information that I think is missing from SRs is exact temporal information. Note that I am not talking about abstract timing, but rather about concrete time; abstract timing is useful for phonology, but not enough for speech production. Take, for example, the SR [sluga] (meaning ‘servant’ in Croatian). While the phonological precedence relation s^l^u holds, in speech the articulatory realization of [u]’s [+ROUND] feature is temporally (over)extended across all of [l]’s duration and most (or all) of [s]’s duration. In other words, [s] is [–ROUND], but it is realized as if it were [+ROUND]. Note also that anticipatory coarticulation cannot be a simple consequence of speech organ inertia – it has to be planned/calculated before the final efferent neuromuscular instructions are sent to speech effectors, thus motivating a cognitive processing stage which is distinct from both phonology and from (traditionally conceived) articulatory phonetics. This seems to suggest that what enters the motor system is not an SR (i.e., features), but rather a more richly specified data structure created by cognitive transduction; while there might be some isomorphism between data structures characteristic for phonology and the motor system, this isomorphism is quite modest in my opinion.
[continued below]
Of course, in principle it is not impossible that all that information is indeed encoded in features, and phonology systematically ignores it. Faced with this possibility, I’d be inclined to use the same line of reasoning that Chomsky (2013: 39) employs in arguing about the lack of linear order in syntax: If syntactic computation in terms of minimal structural distance always prevails over computation in terms of minimal linear distance, then the null-hypothesis should be that linear order is not available in syntax. If phonology systematically ignores certain information which is pertinent for speaking, then the null-hypothesis should be that that information is not available in phonology.
Finally, as Eric said, I take phonology to be a single module of the grammar. The main reason for this is its computational autonomy: phonology manipulates symbols (features, syllables, feet etc.) in a certain way (I’d say via rules, but OT would disagree) without changing the available representational alphabet. While I have not heard of a double dissociation of phonology from the rest of the grammar, I have heard about phonological processing being separable from non-linguistic acoustic processing (e.g. Phillips et al. 2000; Dehaene-Lambertz and Pena 2001; Dehaene-Lambertz et al. 2002), suggesting that phonology is informationally encapsulated at least from audition. Eric said that you (pl.) “believe that there are more modules than probably what other folk assume”. Did you mean that phonology is multimodular or that the speech chain is multimodular? (The multimodularity of the entire speech chain seems indisputable.)
Thanks again Veno. Again, yes, I think we agree on the facts here, and most of the interpretation. Let me just take a few points briefly.
1. Sly Stallone's peripheral nerve damage. In this case I would say that Sly even has the same motor outputs, but that they're just ineffectual at the periphery. I take this to be a case very much like the discussion in Gallistel 1980: pp122ff of the amputation of the mesothoracic legs in cockroaches and the resulting effect on their walking patterns. The central pattern generator, the coupled oscillators and the contralateral requirement all remain the same, in place, and operative. It's just the very periphery that changes (we've cut two legs off). Same thing for Sly, in my opinion (well we didn't cut his legs off), and so we don't learn anything from his nerve damage about phonology or even about the central motor system.
2. [sluga] Here I think we need to see the full proposals, and I'm afraid that we're both offering mostly promissory notes so far. Is there no spreading of [round] within the phonology for you?
3. Gestural scores. For the purposes of better exposition and explication, maybe we could both take Articulatory Phonology (AP) gestural scores as an intermediate representation. If so, then the basis of a proposal from us would be to map our events into the Browman and Goldstein 1989 "point notation" for AP and then follow the rest of their account through the gestural scores and onto the motor system (with some updates from Guenther 2016). It seems to me that our events, features and precedence relations have pretty simple and straightforward mappings to their point notation: events to events, features to gestures, and precedence to precedence and phase-synchronization. If we then want to do gestural extension of (the gestural correlate of) [round] within the gestural score calculation that seems ok too. I.e. I think I would be ok with a gestural account of [round] gesture extension within the calculations done over gestural scores. But, to my previous point again, the initial AP representation at the interface with phonology can then just read the phonological information directly, because it has a superset of, or a direct mapping for, those datatypes (events, features, precedence).
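A cartoon of that mapping, using the EFP sketch from the post above (the feature-to-gesture table is invented; phase-coupling would get added later, inside the gestural-score computation):

```python
FEATURE_TO_GESTURE = {'round': 'lip-protrusion', 'spread': 'glottal-abduction'}

def efp_to_ap(g):
    # events map to events, features to (placeholder) gestures,
    # and precedence is read off directly; phasing is computed downstream
    gestures = {e: {FEATURE_TO_GESTURE.get(f, f) for f in g.features.get(e, set())}
                for e in g.events}
    coupling = set(g.precedes)
    return gestures, coupling
```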
I think I may need some clarifications about your formalization, but it might turn out that my confusion is merely terminological. You open the next post with:
Recall from last time that we are trying to formalize phonology in terms of events (e, f, g, ...; points in abstract time), distinctive features (F, G, ...; properties of events), and precedence, a non-commutative relation of order over events (e^f, etc.). So far, together this forms a directed multigraph.
You've used the term multigraph a few times now, but I don't understand why. A multigraph allows for multiple edges between adjacent vertices, but AFAICS there's no real need (yet?) for that.
Your precedence relation is a preorder, it is transitive and reflexive, but not a(nti)symmetric. I.e. you can have a in E and b in E s.t. a^b and b^a and a<>b (these are your "loops"). Your use of "non-commutative" is also confusing to me, as that's typically a descriptor of operations...is that what you mean by the a(nti)symmetry?
Also, I'm not clear on what "time" means here. You refer to your events as "points in abstract time" (also as "abstract points in time", which is maybe the same?)...I'm not sure what "abstract time" is, but if it's anything like real time, which is basically the ne plus ultra of total orders (at least in the subrelativistic, non-quantum world we're dealing with in the present case), then it seems to me that your precedence relation is something rather different, and so saying that you "allow loops in time" is also confusing at best (Feynman and heptapods notwithstanding).
Sorry to nitpick...I like the idea of a full and tight formal model of phonology (I'm slowly working through Charles & Alan's book in parallel) and so want to make sure I'm grokking it all.
Hi again Fred. Yes, I think you're grokking it. We allow more than the usual DAGs from computer science. We allow cycles (loops), self-edges, and multiple parallel edges. It is definitely true that we haven't illustrated all of these things yet. You can have a look at Eric's publications for previews of those things.
I apologize for the old-fashioned terminology. When I was a math undergrad 35 years ago it seems that we weren't as crisp with the terminology. For example, if relations are viewed as functions from pairs to T/F, then commutative amounts to the same thing as symmetric.
^ isn't reflexive (we don't require a^a for all a), but it isn't anti-reflexive either (we allow a^a).
^ is not symmetric (a^b doesn't imply b^a), but also not anti-symmetric (where a^b and b^a implies a=b), nor asymmetric (where a^b implies not b^a). So it's non-symmetric (this term doesn't seem as common in usage as "non-commutative" or "non-Abelian").
And ^ is also not transitive, for a^b and b^c does *not* imply that a^c. It's more like successor/child than like less-than/descendent.
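A quick toy check of those claims on a three-event relation with a loop (a^b, b^c, c^a):

```python
E = {'a', 'b', 'c'}
R = {('a', 'b'), ('b', 'c'), ('c', 'a')}   # the e^f pairs

reflexive  = all((x, x) in R for x in E)                         # False: a^a is not required
symmetric  = all((y, x) in R for (x, y) in R)                    # False: a^b without b^a
transitive = all((x, z) in R
                 for (x, y1) in R for (y2, z) in R if y1 == y2)  # False: a^b, b^c but no a^c
print(reflexive, symmetric, transitive)
```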
So what *does* time mean here? Well, it certainly appears that it isn't all that much like "real" time, though it does have a kind of scheduling time interpretation (do this next, maybe). And we think it has an interesting relationship to phase-angle time in oscillatory systems (e.g. Articulatory Phonology or the walking gait cycle that I mentioned, and ultimately the endogenous neural rhythms in delta, theta and gamma bands). Given the "weirdness" of time here, this might be the most SFP thing that we're proposing.
I've been resisting True Detective's "time is a flat circle" from season one, but I give in now.
I am finally able to come back to the discussion, which seems to have taken a Kafka-esque turn (Rambo qua roach) in my absence. One of these days, I'll have to come clean and take the blame for making a mess of the phrase "Substance Free". I think Mark warned me about this at some point, so don't blame him.