The past two or three weeks have marked the Chinese New Year and the 106th birthday of Zhou Youguang (周有光), the codifier of pinyin, the official Chinese system for transcribing Mandarin into the Roman alphabet. In the late 1970s, when I took a Structure of Mandarin course at SOAS, there was still competition between various romanization systems, but Zhou has lived to see his supersede all others. Pinyin (pīnyīn, 拼音), which literally means ‘spell sound’, has provided a phonetic tool for millions of Mandarin learners.
Thanks to a seasonal attic clearance, I happened to spend the Chinese New Year reading Henry Sweet’s 1884 paper for the Philological Society, The Practical Study of Language. Sweet (on whom Bernard Shaw partly based Henry Higgins in Pygmalion) was, like Zhou, a champion of the phonetic approach to language learning, and his advice is still relevant.
Sweet deplores (I retain his reformed spellings)
the tendency of gramarians to regard the spoken language as a coruption of the literary language
and upholds
the general axiom – equaly important for the practical and the scientific study of language – that the living spoken form of every language should be made the foundation of its study.
The language I’ve studied most recently happens to be Mandarin. In my sporadic attempts to learn the basics, I’ve used both of the best-known audio courses, the Pimsleur [ˈpɪmzlɚ] and the Michel Thomas [mɪˈʃɛl ˈtɒməs].
First let me say that I’d recommend both: they’re better than any of the book-based courses I’ve tried. They broadly follow the same method, drip-feeding vocab and asking you to translate increasingly complex utterances into Mandarin; you’re given pauses for this, after which you hear answers. The Pimsleur is drier in style and contains more speech from native speakers. The Michel Thomas is cosier and particularly good for those who aren’t “language naturals”: you eavesdrop on lessons given to two ordinary English-speaking students who make mistakes and are corrected, with a single native speaker on hand.
For the pronunciation of consonants and vowels, both courses follow the Sweet-approved precept that “the forein language should be lernt by imitation”. But it’s optimistic to think that English-speaking students will pick up Mandarin’s lexical tones by imitation alone, and both courses point them out explicitly. Indeed the Michel Thomas lays more emphasis on tone than any course I’ve encountered (book, audio or classroom), consistently referring to mnemonic colour-coded hand-shapes; the proof of the pudding is that both of the recorded students (one American, one British, neither a linguistic savant) do a generally good job of recalling and performing words’ tones without prompting.
But there’s a wrinkle. In Mandarin, three of the five tones are stable, and relatively easy to learn. These are Tone 1 (high level – in Michel Thomas, “green thumb out”), Tone 2 (a high rise, i.e. mid to high – “blue finger up”), and Tone 4 (a fall from high to low – “black finger down”). Here are ‘lose’, ‘ten’, and ‘be’ as pronounced by Google Translate (segmentally, they’re all shi):
The wrinkle is that the other two tones, Tone 3 and the neutral tone, vary depending on context – a fact that goes unacknowledged by the two courses.
Neither Pimsleur nor Michel Thomas has much to say about neutral tone; neither points out that its pitch is conditioned by the tone of the preceding syllable. But I don’t think that’s a huge problem. If learners always pronounce neutral-tone syllables with shortened vowels and a middish pitch, they won’t often be far wrong.
Tone 3 is more problematic. Its default pitch is low (level or slightly falling), but there are two other forms: when it’s followed by another Tone 3 within a unit, it becomes a high rise (= Tone 2), and when it’s phrase-final, speakers optionally produce it as a low rise (low to mid).
Here, again from Google Translate, is ‘I’m fine’, which is made up of three syllables, wo hen hao, literally ‘I very good’. All three are Tone 3 syllables, but each exhibits a different variant of it. The last syllable exhibits the optional phrase-final low rise; the middle syllable becomes Tone 2, because it makes a phrasal unit with the following Tone 3 syllable; and the first syllable has the default low pitch:
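For readers who like to see a rule spelled out, here is a rough Python sketch of the three Tone 3 variants as I've described them. It is only an illustration under stated assumptions: the function name realise_tone3, the pitch labels, and the hand-supplied grouping of syllables into units are all my own inventions, and the rule looks only at the underlying tone of the next syllable, so it makes no claim about longer runs of Tone 3 inside one unit.

```python
def realise_tone3(units, final_rise=True):
    """Label the pitch of each syllable in a phrase.

    units: list of phrasal units, each a list of (syllable, tone) pairs,
           with tones 1-4 and 0 for neutral. The grouping into units is
           supplied by hand (an assumption of this sketch).
    final_rise: whether the speaker uses the optional phrase-final low rise.
    """
    out = []
    for u_idx, unit in enumerate(units):
        for s_idx, (syl, tone) in enumerate(unit):
            if tone != 3:
                # Only Tone 3 is modelled here; other tones pass through.
                out.append((syl, f"Tone {tone}"))
                continue
            # Underlying tone of the next syllable in the same unit, if any.
            next_in_unit = unit[s_idx + 1][1] if s_idx + 1 < len(unit) else None
            is_phrase_final = (u_idx == len(units) - 1
                               and s_idx == len(unit) - 1)
            if next_in_unit == 3:
                out.append((syl, "high rise (= Tone 2)"))
            elif is_phrase_final and final_rise:
                out.append((syl, "low rise"))
            else:
                out.append((syl, "low"))
    return out


# 'I'm fine': wo stands alone, hen and hao form a unit.
phrase = [[("wo", 3)], [("hen", 3), ("hao", 3)]]
for syllable, pitch in realise_tone3(phrase):
    print(syllable, "->", pitch)
```

With the grouping given, this labels wo as low, hen as a high rise and hao as a low rise, matching the three variants just described.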
The contextual variability of Tone 3 isn’t a marginal aspect of the language. Tone 3 syllables constitute about a quarter of Mandarin’s vocabulary, and sequences of them are common. Mandarin courses stereotypically begin with these exchanges: “Hello.” “Hello.” “How are you?” “I’m very well. What about you?” “I’m well too.” The Mandarin equivalents of these consist entirely of Tone 3 syllables, plus two neutral tone syllables.
Not only do Pimsleur and Michel Thomas teach that Tone 3 has a single unvarying pronunciation; worse, they claim that this pronunciation is not as described above, but falling-rising. In the Michel Thomas course, this is reinforced with the colour-coded mnemonic “Red V for Victory”.
It’s one thing for written language courses to misrepresent pronunciation. But when it’s an audio-only course, there’s bound to be trouble. The Pimsleur narrator repeatedly draws our attention to patterns we don’t actually hear. Listen to the discrepancy between the pronunciation and the description of the Mandarin word for ‘but’, keshi, whose first syllable has Tone 3 in its default (low) form, followed by a syllable with Tone 4 (fall):
Michel Thomas goes one step further, and falsifies the spoken Mandarin to fit the incorrect English description:
When the American student is asked to say ‘but’, he unsurprisingly but incorrectly produces an English-style fall-rise on the first syllable; this is accepted as correct and followed again with the artificial version by the native speaker (pauses edited out):
And when the other student attempts ‘America’ (meiguo, which has Tone 3 on the first syllable and Tone 2 on the second), she puts a big British fall-rise on the first syllable – for which the incorrect teaching is of course to blame. This is perceived as too deviant by the native speaker, who insists on her own artificial low-falling-rising pattern; whereupon the English-speaking teacher chimes in to confuse matters further by reiterating the party line about the V for Victory but then pronouncing mei as a low rise with no initial fall whatsoever:
Here, from Pimsleur, is the real, actual pronunciation of ‘America’ (with neither a V nor even a rise on mei):
You might be wondering where the “V” obsession comes from – unless of course you know a little pinyin. Pinyin uses one invariant diacritic for Tone 3 syllables, and it is, surprise surprise, a V-shape: wǒ hěn hǎo, kěshì, měiguó.
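To make the invariance concrete, here is a small Python sketch of how a tone number turns into a pinyin diacritic. The function name add_tone_mark is hypothetical, and the placement rule (a or e takes the mark, then the o of ou, otherwise the last vowel) is the usual convention as I understand it rather than anything drawn from the courses. The point to notice is that the function takes no context at all: a Tone 3 syllable gets the same V-shaped mark wherever it stands.

```python
import unicodedata

# Unicode combining marks for Tones 1-4: macron, acute, caron, grave.
# Neutral tone (written 0 here, my own convention) gets no mark.
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}


def add_tone_mark(syllable, tone):
    """Return the plain pinyin syllable with the diacritic for `tone`."""
    mark = TONE_MARKS.get(tone)
    if mark is None:
        return syllable
    # Conventional placement: a or e first, then the o of "ou",
    # otherwise the last vowel of the syllable.
    if "a" in syllable:
        i = syllable.index("a")
    elif "e" in syllable:
        i = syllable.index("e")
    elif "ou" in syllable:
        i = syllable.index("ou")
    else:
        positions = [k for k, ch in enumerate(syllable) if ch in "iou\u00fc"]
        if not positions:
            return syllable  # no vowel found; leave the input untouched
        i = positions[-1]
    marked = syllable[: i + 1] + mark + syllable[i + 1:]
    # Compose letter + combining mark into a single character where possible.
    return unicodedata.normalize("NFC", marked)


print(" ".join(add_tone_mark(s, 3) for s in ["wo", "hen", "hao"]))
# prints: wǒ hěn hǎo
```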
It may be that a complex falling-rising pattern was once more evident in Mandarin than it is in the modern language; the very fact that Tone 3 has developed context-dependent alternants is evidence for its having had a more complex (and so less stable) form. To Zhou Youguang, the written mark [ˇ] clearly seemed the best option. But the fall and the rise are rarely both present; any fall is never more than slight; and Tone 3 never resembles the fall-rise intonation of English and other European languages. Moreover, disregarding its alternation is like teaching English learners that drive and driven have the same vowels, or do and don’t, or that –ed is pronounced the same in wanted and walked.
It might be objected that tone alternations are too hard for western learners. But both Pimsleur and Michel Thomas are happy to teach, quite early on, that the negative word bu has Tone 4 as a default, but Tone 2 when it’s followed by another Tone 4. Why would they go out of their way to emphasise this detail of tone alternation? The answer’s simple: because this alternation is shown in pinyin: bú before Tone 4 syllables, bù elsewhere.
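The bu rule is simple enough to state in a couple of lines of Python. This is just a sketch of the spelling alternation as described above; the function name spell_bu and the use of 0 for neutral tone are my own assumptions.

```python
def spell_bu(next_tone):
    """Pinyin spelling of the negative word bu, given the tone (0-4)
    of the syllable that follows it."""
    # Tone 2 (acute accent) before a Tone 4 syllable, Tone 4 (grave) elsewhere.
    return "b\u00fa" if next_tone == 4 else "b\u00f9"


print(spell_bu(4))  # before a Tone 4 syllable such as shi 'be'
print(spell_bu(3))  # before a Tone 3 syllable such as hao 'good'
```

Unlike the Tone 3 mark, the output here depends on what follows, which is exactly the kind of context-sensitivity the spelling is willing to show for this one word.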
So these courses are not, first and foremost, teaching us the “living spoken form” of the language. They’re teaching us to speak like a writing system. They teach that Tone 3 is pronounced invariantly because it’s written invariantly. They teach that its pitch is V-shaped because it’s written with a V-shape. Tonal alternation is taught for the word bu because it’s written for the word bu. And neutral tone has no written mark, so its pitch gets no description.
It’s deeply ironic that – despite the global victory of pinyin, Zhou’s system to ‘spell sound’, and despite our living in an age of easily downloadable audio tuition – deference to writing still has the power to undermine the practical study of language. I do like the Pimsleur and the Michel Thomas courses, but I wish their creators had read Sweet (1884):
The first great step wil be to discard the ordinary spelling entirely in teaching pronunciation, and substitute a purely fonetic one, giving a genuin and adequate reprezentation of the actual language, not, as is too often the case, of an imaginary language, spoken by imaginary ‘corect speakers’.