Lucas quiz – the answers

This post supplies my interpretations, with discussion, of the audio clips in my George Lucas ear-training quiz for non-native learners/users of English. Many thanks to all those who’ve submitted their answers so far. If you haven’t yet taken the quiz but would like to before reading these answers, you’re welcome to do so at any time. All responses are useful and appreciated, and you can submit your answers anonymously.

Two points are repeatedly shown up by the captioning software. The first is that its errors are phonetically quite reasonable, reflecting inadequate ‘higher level’ knowledge of words’ probability in a given context. The second is that the errors frequently involve weak syllables, notably the weak forms which function words typically have in native speech. Any non-native who has to understand native speech, whether for work, pleasure or language testing, must expect such weak forms.

The answers

1. I hated school, uh, I loved to build things

Lucas prolongs the first person pronoun to something like [aaaiij] while he decides what to say next. The captioner has quite reasonably interpreted this as two syllables, I then E. Note that natives will often say diphthong-vowel sequences without the glottal stop which non-natives are prone to insert between them.

Lucas’s FACE vowel in hated is quite a close monophthong, [he̝ɾɪd]. (There’s a well-established tradition of transcribing AmE FACE and GOAT as monophthongs, /e/ and /o/.) The captioner found this particular vowel token close enough to be the vowel of he.

Context tells us that Lucas almost certainly intended the past tense loved rather than the present tense love, but in phonetic terms this isn’t an error. Final /t/ and /d/ may be deleted between consonants if the preceding consonant agrees in voicing. In loved‿to, the /d/ is eligible for deletion, making it sound identical to love to. In native speech, the contrast between past and present forms is often neutralized in this way.
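For readers who like to see a rule spelled out mechanically, here is a minimal Python sketch of the deletion condition just described. It is only an illustration: the function name, the simplified consonant inventory and the way words are written as lists of phoneme symbols are all my own assumptions, and real speech is of course more variable than a yes/no rule.

```python
# Simplified, hypothetical consonant inventory (illustration only).
VOICED = {"b", "d", "g", "v", "ð", "z", "ʒ", "dʒ", "m", "n", "ŋ", "l", "r", "w", "j"}
VOICELESS = {"p", "t", "k", "f", "θ", "s", "ʃ", "tʃ", "h"}


def is_consonant(phoneme):
    """True if the symbol is in the simplified consonant inventory."""
    return phoneme in VOICED or phoneme in VOICELESS


def final_stop_deletable(word, next_word):
    """Check whether a word-final /t/ or /d/ is eligible for deletion.

    Conditions, as stated in the text: the stop stands between two
    consonants, and the preceding consonant agrees with it in voicing.
    Words are lists of phoneme symbols.
    """
    if len(word) < 2 or not next_word:
        return False
    final, before, after = word[-1], word[-2], next_word[0]
    if final not in ("t", "d"):
        return False
    if not (is_consonant(before) and is_consonant(after)):
        return False
    # Voicing agreement: /d/ after a voiced consonant, /t/ after a voiceless one.
    return (final in VOICED) == (before in VOICED)


loved = ["l", "ʌ", "v", "d"]
to = ["t", "ə"]
hated = ["h", "e", "ɾ", "ɪ", "d"]

print(final_stop_deletable(loved, to))  # /vd/ agree in voicing, consonant follows
print(final_stop_deletable(hated, to))  # /d/ follows a vowel, so not eligible
```

On this sketch, loved‿to meets the condition while hated‿to does not, because the /d/ of hated follows a vowel.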

2. …cars and engines… worked in a foreign car service

The subject of the verb worked is I, but Lucas omits this as it’s already been uttered in the preceding context: ‘I loved working on cars and engines…’

Lucas doesn’t pronounce worked very strongly. The captioner has correctly identified that the initial consonant is labial, and that the vowel is r-coloured.

3. and, uh, was almost killed and as a result of that sort of sat in a/the hospital for a long time thinking about my place in the world

As in 1., the captioner has interpreted a prolonged syllable as a two-syllable initialism. The vowel quality of Lucas’s ‘uh’ is somewhat frontish.

The captioner identifies the phrase as a result, although the a is not clearly pronounced. But it’s missed the preceding and, which is clearly present in its most common form, /ən/.

Lucas’s sort of sat in [so˞vsætn] has been interpreted as service at /sə˞vɨsæt/. In terms of word-recognition this seems a big error, but comparison of those transcriptions shows how much of the phonetic material it’s got right.

The sequence /n ð/ often becomes [nː] or simply [n], so [ɪnə] here could be either in a or in the.

4. and, uh, took my very bad grades to a junior college

This begins [ænəː], which is consistent with the name Anna. Clearly the captioner has difficulty identifying pause vowels, preferring to choose real words. It has also missed the unreleased /k/ on took. If Lucas had really said to my, then to would probably have been pronounced in its weak pre-consonant form /tə/, but what he says is [tɵk̚] or [tʊk̚].

5. and then, uh, when I graduated and I was about to go on to the last two years of college

The captioner has interpreted Lucas’s when I as ten. This may at first seem puzzling, but immediately before when Lucas produces a dental/alveolar click, quite a common pre-utterance habit with many speakers (I often do it myself). I think the captioner has interpreted the click as the plosion of a /t/.

Lucas’s and is weakened to just a nasal, very common in native speech. And I was is roughly [n̩awəz], making now is a reasonable interpretation of that stretch.

In native speech, of frequently weakens to /ə/ before a consonant, making it homophonous with the indefinite article.

6. uh, at, uh, San Francisco State to get a degree in anthropology

Again, the captioner prefers to interpret pause vowels as words. The first is taken to be the, the second to be to, both of which commonly contain schwa.

Lucas has been associated with the San Francisco area all his life. For him it’s a completely familiar name and here he compresses it down into something like [sæɱsɪsko]. Little wonder the captioner guesses only three syllables, send to the.

I’m surprised that the captioner missed the prolonged weak form of to [təː] between state and get. The final /t/ of state and the initial /t/ of to are combined into one long [tː], absolutely routine in native speech.

7. my best friend who I’d grown up with since I was four years old

Lucas contracts I had into I’d, as is normal in native speech. If the utterance had actually been who had grown up, the /h/ on the auxiliary verb had would probably have been lost, so it’s a reasonable error.

8. an artist, a photographer, I’d done a lot of photography and I wanted to be an

The captioner misses another contraction here, I’d. Phonetically, the contracted had survives only in slight lengthening of [dː].

Lucas’s a lot of photography and is [əˈlɑɾəfəˈtɑgrfiən]. The phrase a lot of a talker fear would be something like [əˈlɑɾəvəˈtɑkrfiər]. Phonetically, that is, it’s mostly correct. This kind of error just goes to show how much ‘higher level’ knowledge is involved in our comprehension of speech: our minds naturally and rapidly home in on word choices that are more likely in a given context. It’s very difficult to give this wide-ranging capacity to a machine.
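One rough way to put a number on ‘mostly correct’ is to count how many symbols differ between the two transcriptions. The Python sketch below does this with a plain edit distance (Levenshtein distance). This is my own illustration, not a description of how the captioner works, and it makes the simplifying assumption that each character in the transcription, stress marks included, counts as one unit.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-symbol
    insertions, deletions or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, symbol_a in enumerate(a, start=1):
        curr = [i]
        for j, symbol_b in enumerate(b, start=1):
            cost = 0 if symbol_a == symbol_b else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Transcriptions taken from the paragraph above.
said = "əˈlɑɾəfəˈtɑgrfiən"    # a lot of photography and
heard = "əˈlɑɾəvəˈtɑkrfiər"   # a lot of a talker fear

distance = edit_distance(said, heard)
print(distance, "of", len(said), "symbols differ")
```

The two strings differ in only three of seventeen symbols (f/v, g/k and n/r), which is the sense in which the captioner’s guess is phonetically close even though the words are wildly wrong.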

9. and it wasn’t a school of photography, it was a school of cinematography

Natives usually pronounce of as /ə(v)/, and in AmE there is little difference (for many speakers, none at all) between schwa and the STRUT vowel of love. So the difference between Lucas’s school of and the captioner’s school love is largely one of durational subtleties.

Note that Lucas moves the main stress of cinematography from the fourth syllable to the first syllable, because of the contrast with photography. This is quite normal in native speech.

10. I/I’d never heard of such a silly thing

This is something like [ænvrˈhə˞ɾ(əsədʒə)ˈsɨliˈθɪŋ]. The portion in parentheses (of such a) is so compressed that it’s not too surprising the captioner has missed it.

In native speech evening is generally two syllables, not three, so the interpretation of silly thing [ˈsɨliˈθɪŋ] as -sal evening [ˈsɨlˈivnɪŋ] is not too far off.

Lucas gives this fairly common English expression (silly can be replaced with stupid, crazy, etc) its usual intonation pattern, with the main accent on heard. Such a silly thing is de-accented, implying that the silliness is obvious. Similarly, the compound noun rehearsal evening would have the main stress on the first word; so the captioner’s error has not only segmental but also prosodic justification.

At this tempo I can’t tell whether there’s a contracted perfective had between I and never. In BrE, the perfective would be more likely. AmE is more able to use the plain past with perfective meaning.

11. so, uh, and I hadn’t really paid much attention to the movies when I was young

The captioner ignores rather than misinterprets the hesitation vowel here, which is the right thing to do: careful human subtitlers wouldn’t include them.

Again, a contraction has been missed, and this one really matters, because it makes the sentence negative. Lucas’s hadn’t really is [hædnrili]; the [n] is small, but it’s there. Context helps the native listener, because the following word much also indicates a negative, not…much. The corresponding positive sentence would have been I’d really paid a lot of attention.

Lucas really does swallow the words to the. The striking thing here is not that the captioner missed them, but that the native listener’s mind so readily reconstructs them from so little phonetic evidence.

12. you know, ‘Bridge on the River Kwai’ and a few things like that, but

David Lean’s 1957 film epic The Bridge on the River Kwai was a huge success, won many Oscars and is widely considered an all-time great (it starred Alec Guinness, whom Lucas would later cast as Obi-Wan Kenobi in Star Wars). This is specific knowledge that the captioner apparently lacks, the problem augmented by the weakness of the, and, a. Lucas’s eight syllables on the River Kwai and a few have been interpreted as only five, in reply into. The take-home point for non-natives is just how weak such weak forms can easily be.

13. and, um, I, we didn’t get a television til I was ten

The captioner has missed another contracted negative: Lucas’s we didn’t get a is [wiɾɪŋgɛɾə]. Natives use the word not in its strong form far, far less than non-natives. Didn’t often becomes di’n’t in conversational speech.

The vowel of til is quite mid-central here, so tower is not as wild a guess as it looks.

14. it blew everybody away in the Film Department, and they sent it out to festivals, and it won like 47 festivals

The errors here all involve the overlooking of weak vowels. The parenthesized vowels are very weak in Lucas’s (a)way(i)n, so it’s not surprising this has been interpreted as when. The same applies to sent(i)t, which has been interpreted as one syllable, turn. Clearly the captioner is not very familiar with the word festivals, though it’s common enough in the film world. Both fast balls and vessels miss the word’s weak middle vowel. Lastly, the parenthesized vowel of and (i)t won has been missed.

Note that Lucas breaks the rule that repeated items should be de-accented. He puts a nuclear accent on the repetition of festivals despite the fact that he’s only just said it. Nonetheless, I recommend that non-natives learn and use the rule.

15. whether it was anthropology, uh, whether it was art

Whether and weather are homophones for most speakers in America (and England), so this is not a phonetic error. The captioner has missed both of the weak it pronouns. The phonetic evidence for these is slight but real: a decrease in volume corresponding to the /t/, perhaps with some glottalization.

16. so, no matter which route I took, because I cared about all of them

Lucas says route very clearly, using the pronunciation with the MOUTH vowel which is pretty common in America. It seems the captioner is more familiar with the pronunciation with the GOOSE vowel.

The interpretation of Lucas’s took as talk is puzzling. He says it clearly. If it were either of the AmE pronunciations of talk, /tɑk/ or /tɔk/, it would be backer, opener and longer. Perhaps the ‘problem’ is precisely that Lucas says it so strongly, with nuclear intonation; the captioner may be more familiar with took in a weaker pronunciation.

The past tense ending on cared is a flapped [ɾ], but is definitely there.

17. just had never happened, so you were just doomed to be a ticket-taker at Disneyland

Here you is very weak, its main effect being to pull forward the preceding vowel of so, creating something like [sɛj]. The captioner has reasonably identified this as a single syllable, the subject pronoun they.

The second syllable of ticket is also weak, with no voiced vowel, giving something like [tɪktːekr]. The unfortunate consequence of this is the interpretation dick taker.

Lucas’s meaning is that it was unprecedented in his day for a film school graduate to get work in the movies, so the closest one could get was a menial job at Disneyland.

18. happiness is pleasure, and happiness is joy

Lucas’s joy is nuclear-accented and therefore long, with the two elements of the diphthong clearly differentiated, something like [dʒoːeː]. It’s therefore not too surprising that the captioner has detected two syllables. Julie is one of the few two-syllable words beginning with /dʒ/ and an o/u type vowel, followed by an e/i type vowel.

19. um, and it uh peaks and then goes down, it peaks very high

Again the captioner has interpreted a hesitation as a word, detecting the final nasal and guessing at on.

Lucas’s then is weak and missing the initial /ð/ as is quite possible after and, giving a rapid [ənen] or [ənɪn]. (Compare the optionality of /ð/ in 3., where [ɪnə] might be in a or in the.) The captioner has found only one syllable and guessed at in.

The second it is weak but its vowel is definitely there. The captioner seems to have decided that this was just the release of the final /n/ in down.

Detecting pics for peaks is definitely a mistake: native listeners, I’m fairly sure, will hear this as the FLEECE vowel of peaks and not the KIT vowel of pics (regardless of the context). It’s interesting to see the captioner struggling, as so many non-natives do, with this subtle but vital contrast in native English.

20. …doesn’t work, so, if you’re trying to sustain that level of peak pleasure

The final consonant of peak isn’t released, and I don’t hear a clear indication of its place of articulation; perhaps the best transcription is [piʔplɛʒr]. Certainly the context tells us that this must be peak rather than Pete. Having chosen Pete, the captioner seems to have decided that the preceding word is more likely to be with than of, though the latter clearly corresponds better to the audio.

21. it’s a selfish, self-centered emotion, that’s created by a self-centered motive of greed

Lucas’s it’s a is very weak [ɪsə] and has been missed by the captioner.

His initial vowel in emotion is schwa, as is typical for AmE, so it sounds exactly the same as a motion. (Many BrE speakers, me included, begin emotion with the KIT vowel.)

Subordinating that’s and auxiliary has can easily be more similar in native speech than non-natives might expect. Both typically contain schwa. /ð/ is a very weak sound, while /h/ is often lost from the beginning of an unstressed syllable. And Lucas has omitted the /t/, as he did in it’s.

The of is weak [əv], again typical of native speech. The captioner seems to have ignored the faint schwa and interpreted this as an extension of the final /v/ of motive.

22. …happiness. So, with that, I’m gone

Diphthongs like the MOUTH vowel of out may have very little change in quality when clipped (shortened) by a following voiceless obstruent like /t/, as in without. So phonetically it’s quite reasonable for the captioner to have heard with that as without.

A very Happy New Year to all!

7 replies
  1. Seth
    Seth says:

    Fascinating post! Here are my thoughts on some of the things you wrote:

    3. Note also that the vowel he uses in sort there is lightly-rounded and centralized. I’ve noticed a similar thing in posh English speech, where sort of is often something like [ˈsəɾəv]. There the R isn’t pronounced, however.

    10. To my American ears, that one sounds like They’d never heard of such a silly thing. I’d have to watch the video these clips came from to see the context he said it in.

    14. The vowel he uses in the first syllable of festivals is noticeably open and retracted to my ear. The first syllable of that word does indeed sound close to fast to me, so I’m not surprised the captioner interpreted it that way. Maybe the captioner’s interpretation of the second syllable as balls has something to do with the secondary articulation of Lucas’ /l/ there?

    16. His vowel in took does sound a bit open to me there, so maybe that also helped lead to the interpretation of that word as talk.

    19. Again I think the quality of his vowel in um there might’ve also had something to do with the captioner’s interpretation. The vowel has a quality between [ɑ] and [ʌ]. I thought the interpretation of high as hot there was interesting. In AmE, both words start with [h] and then an open, unrounded vowel. Then the tongue moves higher and fronter in the mouth.

    20. I hear a clear indication of the final consonant in peak’s place of articulation. I think I read somewhere that [k̚] and [t̚] (both with no audible release) have different effects on the quality of the end of a preceding vowel. The word he says doesn’t sound like Pete, although I can see how non-native speakers might have a hard time differentiating those two words.

    Happy New Year to you too.

    • Geoff Lindsey
      Geoff Lindsey says:

      Thanks for commenting in such detail, Seth. Re 20, the acoustic effects of consonants on surrounding vowels are known as formant transitions. The signature of velars is the converging of the second and third formants, known in the trade as ‘velar pinch’. Spectrographic analysis shows that there is some pinch in the peak of 20, but this might indicate either a move towards a fronted velar or a small diphthongal movement of the [ij] kind.

      The pinch is more marked in the second peaks of 19, where I hear a velar somewhat more clearly.

      But others may agree more with your ears than with mine.

  2. Andrew Usher
    Andrew Usher says:

    As I was perhaps the one that led you to consider American English more and thus come up with this quiz, I should provide my comments. (Having read this of course I can’t answer the quiz itself, and as a native speaker I shouldn’t anyway). Generally, your comments on the automated captioning are good and I don’t need to critique every one.

    1. Lucas’s ‘hated’ is indeed an unremarkable GA pronunciation. But I think it a mistake to always lump monophthongal FACE and GOAT together. Surely pronunciation guides have usually done so, but I think even within the British Isles, and certainly in America, the former is more common than the latter. Note that Lucas’s GOAT is not monophthongal, and indeed such a [o:] (except before a liquid) is often seen as a regionalism in the US.

    2. To my ears, Lucas says ‘_and_ worked …’. The very weak ‘and’ is correctly perceived by the transcriber as nasality; the fact that START before voiceless consonants (as in ‘marked’) can drift toward NURSE (this I consider related to ‘Canadian raising’) only helps the perception of ‘marked’.

    3. I agree in having the American NORTH/FORCE (in ‘sort of’) as [o] rather than [ɔ]; dictionary transcriptions are way behind the times on this one. Similarly SQUARE would be better as [e] than [ɛ]; these both may owe to Kenyon’s presumably extinct dialect having very lowered pre-[ɹ] vowels.

    I definitely think it’s ‘in the’ hospital, not only because that’s the usual American idiom but that I hear some segment (probably a flap) after the [n].

    5. Strange, isn’t it, that we use this sound – the click which we never use phonetically, so often. I also do it, and I suppose nearly everyone does it sometimes.

    6. Lucas’s ‘San Francisco’ is still four syllables, though the second is quite slurred. Weakening to three would be disfavored in English due to our alternating stress preference.

    8. The captioner has to be condemned for producing the nonsense ‘a talker fear’, though you’re quite right that it’s not far off phonetically given Lucas’s LOT/THOUGHT merger as [ɑ].

    9. This is controversial, but I don’t think Americans really merge STRUT (generally [ɐ] or nearly) and schwa in position (unlike many in England and Wales). There’s some overlap, and the strong forms of ‘of’ and a few other words do have STRUT. But I have even a minimal pair: the adjective ‘just’ (always STRUT) vs. the adverb ‘just’ (normally schwa even when stressed, unless hyper-articulating).

    10. Nasty for the captioner! I didn’t even get it when listening, because no context surrounded it.

    13. I would say that ’til’ is reduced toward schwa, which should not encourage the interpretation ‘tower’ – in GA, at any rate, the first element of that diphthong is never higher than STRUT (see above).

    14. Monophthongal FACE is misinterpreted again. It’s hard to understand how ‘festivals’, which is hardly slurred, can be missed here.

    16. ‘Route’ is notoriously variable in America. I’ve made a conscious decision to pronounce the noun always with GOOSE and the verb with MOUTH; I think this accords with the most common usages. Certainly ‘router’ and ‘routing’ in computing are always MOUTH here.

    17. It’s not ‘you’ that is very weak but the preceding ‘so’.

    18. Strange we didn’t get ‘Joey’ rather than ‘Julie’; neither’s very good but the former is surely closer.

    19. Lucas’s FLEECE is also monophthongal, which might cause the hearing as ‘pics’ if the captioner expects an off-glide on /i:/. ‘High’ is pronounced with the two elements somewhat separated (as was ‘joy’ in the previous one) but I don’t see how the second can be heard as a ‘t’.

    20. Already answered; ‘Pete’ is just a mistake.

    22. True to an extent, but here ‘with that’ has a voiceless and then a voiced ‘th’, impossible for ‘without’.


    • Geoff Lindsey
      Geoff Lindsey says:

      Many thanks for providing all this, Andrew. Re 1, the other day I was struck that this dictionary, in which the IPA seems based on AmE (I don’t own it), chooses for FACE but for GOAT. Re 9, I’m sure there are Americans who merge STRUT and schwa. After I gave a talk to a general audience not long ago, an American (GenAm) ESL teacher came up to me with his dictionary; he said the one thing he’d always wanted to know was why it used two different symbols, ʌ and ə, for the same vowel.

      • Andrew Usher
        Andrew Usher says:

        I don’t have the dictionary you mention but it seems plausible that they started with Kenyon and Knott’s reasonable (for GA) /e/ and /o/ but changed the former – this is called a ‘world’ dictionary – to avoid clash with the unfortunate Gimson convention of using /e/ for DRESS.

        As for STRUT/schwa I did write that there’s some overlap. Schwa is normally unstressed and very underspecified phonetically, especially with the weak vowel merger. As bolstered by the formant diagrams in the recent article, American schwa has a common STRUT-like realisation in morpheme-final (and certainly phrase-final, as you mentioned for RP) position. But that doesn’t indicate a merger. People are far too influenced by the transcriptions and conventions they’ve encountered, many of which here do have STRUT=schwa starting in grade school, to trust them in general.

        In a similar discussion on John Wells’s blog, John Cowan proposed a test for the true merger: can a speaker pronounce ‘fungus’ (under stress) with the same vowel twice? If not, as for Cowan, and for me, there is no merger. I can force myself to say ‘fungus’ with schwa in the first syllable but it sounds really weird! (_Fungus_ is a good choice because its schwa is morpheme-internal, can’t be deleted, and isn’t adjacent to a liquid or nasal that can color it.)

  3. dainichi
    dainichi says:

    Native Danish and Japanese speaker here.

    In 13, I think I hear a very reduced “think” between the “I” and the “we”. It could also be that he’s cutting himself off after the “I” with some sort of glottal stop, but there’s definitely something strange going on.

    Also, later in that same clip, could that be “until” (or rather, ‘ntil), instead of “till”? I base that on the rhythm of the sentence and what I perceive as slight lengthening of the final [n] in “television”.

    • Andrew Usher
      Andrew Usher says:

      I think you’re right; I didn’t catch it because I was only focusing on what Geoff had commented on for that clip. Neither is definite, though; ‘think’ in ‘I think’ is mental, at most, nothing is uttered but an unintelligible sound – he switches from ‘I’ to ‘we’ so fast it could hardly be otherwise, and the latter might just be hesitation rather than a reduced syllable.

      Neither is as clear as the ‘and’ in 2 that Geoff missed. I paid attention to that because he notes ‘marked’ for ‘worked’ and I realised a better explanation of that is the reduced ‘and’ before ‘worked’.

