Journal
IEEE TRANSACTIONS ON MULTIMEDIA
Volume 8, Issue 3, Pages 542-549
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TMM.2006.870732
Keywords
animation; audio-visual mapping; HMM; IOHMM; learning
In this paper, we formulate the problem of synthesizing facial animation from an input audio sequence as a dynamic audio-visual mapping. We propose modeling this mapping with an input-output hidden Markov model (IOHMM). An IOHMM is an HMM whose output and transition probabilities are conditional on the input sequence. We train IOHMMs using the expectation-maximization (EM) algorithm, with a novel architecture that uses neural networks to explicitly model the relationship between the transition probabilities and the input. Given an input sequence, the output sequence is synthesized by maximum likelihood estimation. Experimental results demonstrate that IOHMMs can generate natural, good-quality facial animation sequences from the input audio.
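To make the defining property of an IOHMM concrete (transition and output probabilities that depend on the input), here is a minimal Python sketch. It is not the architecture from the paper: the abstract says the transitions are modeled with neural networks, whereas this sketch assumes a single linear layer followed by a softmax, a scalar output, and a Gaussian output density whose mean is linear in the input. All names (`transition_matrix`, `forward_likelihood`, `W`, `b`, `A`, `c`) are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def transition_matrix(W, b, u):
    """Row i holds P(s_t = j | s_{t-1} = i, u_t = u).

    Assumption: a linear-softmax stand-in for the paper's neural networks.
    W[i][j] is a weight vector with the same length as u; b[i][j] is a bias.
    """
    rows = []
    for W_i, b_i in zip(W, b):
        scores = [sum(w * x for w, x in zip(w_ij, u)) + b_ij
                  for w_ij, b_ij in zip(W_i, b_i)]
        rows.append(softmax(scores))
    return rows


def gaussian_pdf(y, mean, var):
    """Density of a scalar Gaussian."""
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def forward_likelihood(inputs, outputs, W, b, A, c, var, prior):
    """P(y_1..y_T | u_1..u_T) via the forward recursion.

    Assumption: state j emits a scalar y_t ~ N(A[j] . u_t + c[j], var),
    so the output density is also conditioned on the input.
    """
    n = len(prior)
    alpha = None
    for t, (u, y) in enumerate(zip(inputs, outputs)):
        emit = [gaussian_pdf(y, sum(a * x for a, x in zip(A[j], u)) + c[j], var)
                for j in range(n)]
        if t == 0:
            alpha = [prior[j] * emit[j] for j in range(n)]
        else:
            P = transition_matrix(W, b, u)  # recomputed at every step from u_t
            alpha = [emit[j] * sum(alpha[i] * P[i][j] for i in range(n))
                     for j in range(n)]
    return sum(alpha)


# Toy two-state model with one-dimensional inputs (illustrative values only).
W = [[[1.0], [-1.0]], [[-0.5], [0.5]]]
b = [[0.0, 0.0], [0.0, 0.0]]
A = [[1.0], [-1.0]]
c = [0.0, 0.0]
inputs = [[0.2], [0.9], [-0.4]]
outputs = [0.1, 0.8, 0.5]
likelihood = forward_likelihood(inputs, outputs, W, b, A, c, 1.0, [0.5, 0.5])
```

Unlike a standard HMM, the transition matrix here is recomputed from the current input at every time step, which is what lets the state dynamics follow the audio. In an EM training loop, a quantity like `likelihood` would be what the E-step recursions build on; the training and synthesis procedures themselves are not reproduced here.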