Animating speech can be one of the most difficult tasks in animation; the process of matching the mouth movements of your animation to the phonemes of your audio track is most commonly known as lip-synching. For a quick fix, you can just animate the mouth opening and closing, which is a simple shortcut, especially when animating for the web. But if you want to add actual expression and realistic mouth movements, it helps to study how the shape of the mouth changes with each sound. There are dozens upon dozens of variations, but my sketches are renderings of the ten basic shapes of the Preston Blair phoneme series. (They're also an example of what happens when Adri dashes off ten-minute sketches from memory rather than detailed artwork.)
These ten basic phoneme shapes can match almost any sound of speech, in varying degrees of expression, and with in-between frames moving from one shape to the next, they are remarkably accurate.
You may want to keep this for reference.
- A and I: For the A and I vowel sounds, the lips are generally pulled a bit wider, teeth open, tongue visible and flat against the floor of the mouth.
- E: The E phoneme is similar to the A and I, but the lips are stretched a bit wider, the corners uplifted more, and the mouth and teeth closed a bit more.
- U: For the U sound, the lips are pursed outwards, drawn into a pucker but still somewhat open; the teeth open, and the tongue somewhat lifted.
- O: Again the mouth is drawn to a pucker, but the lips don't purse outwards, and the mouth is rounder, the tongue flat against the floor of the mouth.
- C, D, G, K, N, R, S, Th, Y, and Z: Long list, wasn't it? This configuration pretty much covers all the major hard consonants: lips mostly closed, stretched wide, teeth closed or nearly closed.
- F and V: Mouth at about standard width, but teeth pressed down into the lower lip. At times there can be variations closer to the D/Th configuration.
- L: The mouth is open and stretched apart much like the A/I configuration, but
- M, B, and P: These sounds are made with the lips pressed together; it's the duration that matters. "M" is a long hold, "mmm"; "B" is a shorter hold and then a part, almost a "buh" sound; "P" is a quick hold followed by a puff of air.
- W and Q: These two sounds purse the mouth the most, almost closing it over the teeth, with just the bottoms of the upper teeth visible, sometimes not even that. Think of a "rosebud mouth".
- Rest Position: Think of this as the "slack" position, when the mouth is at rest--only with the thread of drool distinctly absent.
When you're drawing or modeling your animation, listen to each word and the syllable combinations within it; you can usually break them down into variations of these ten phoneme sets. Note that my drawings aren't perfectly symmetrical; that wasn't just shoddy sketching. No two people express themselves in an identical fashion, and each has individual facial quirks that make their speech and expressions asymmetrical.
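If you're scripting your animation rather than drawing it by hand, the breakdown above can be roughed out as a lookup table. What follows is only a minimal sketch in Python, and it makes some loud assumptions: the shape names and the `mouth_shapes` function are hypothetical, it works from spelling rather than from the actual sounds in your audio track (so silent letters and odd spellings will fool it), and letters the list doesn't cover (H, J, and X) are simply skipped.

```python
# Hypothetical names for the ten mouth shapes; the groupings follow the list above.
SHAPES = {
    "A/I": ["a", "i"],
    "E": ["e"],
    "U": ["u"],
    "O": ["o"],
    "Hard consonant": ["c", "d", "g", "k", "n", "r", "s", "th", "y", "z"],
    "F/V": ["f", "v"],
    "L": ["l"],
    "M/B/P": ["m", "b", "p"],
    "W/Q": ["w", "q"],
}
REST = "Rest"  # the tenth shape: the slack mouth before and after speech

# Invert the table so each letter (or "th") points at its mouth shape.
LETTER_TO_SHAPE = {
    sound: shape for shape, sounds in SHAPES.items() for sound in sounds
}


def mouth_shapes(word):
    """Return a rough sequence of mouth shapes for one word.

    Assumption: spelling stands in for sound, which is crude.
    Letters with no entry in the table are skipped.
    """
    text = word.lower()
    shapes = []
    i = 0
    while i < len(text):
        # Try the two-letter "th" before falling back to a single letter.
        pair = text[i:i + 2]
        sound = pair if pair in LETTER_TO_SHAPE else text[i]
        i += len(sound)
        shape = LETTER_TO_SHAPE.get(sound)
        if shape is None:
            continue  # not covered by the list (h, j, x, punctuation)
        # Collapse repeats: a doubled letter holds the same shape.
        if not shapes or shapes[-1] != shape:
            shapes.append(shape)
    # Start and end at rest so the mouth opens from, and returns to, slack.
    return [REST] + shapes + [REST]


if __name__ == "__main__":
    for word in ("hello", "bath", "moo"):
        print(word, "->", mouth_shapes(word))
```

Treat the result as a first pass to correct by ear. The table can't tell you how long to hold each shape, and as the M, B, and P entry points out, sometimes the duration is the whole difference.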
Whenever I look through animation discussions on lip-synching—and some become vastly more complicated than this—I always remind myself of Kermit the Frog.
A guy’s hand in a sock puppet. The mouth opens. The mouth closes. Sometimes the mouth scrunches up.
It’s easy to become seduced by interesting complications.
It’s better to embrace simplicity.