Does reproduction of more precise spatiotemporal dynamics for 3D avatar faces increase the recognition accuracy of facial expressions?

S Nagata, Y Arai, Y Inaba, S Akamatsu

Department of Applied Informatics, Hosei University, Japan
Contact: syunsuke.nagata@akamatsu.info

Facial expressions are recognized by humans more accurately when presented as moving images than as still images. A common method of creating motion for facial expressions is to synthesize the intermediate frames between the starting neutral face and the final frame corresponding to the peak of the expression by an image morphing technique (i.e., linear interpolation of the images). However, to reproduce more precise spatiotemporal dynamics for 3D avatar faces, we adopted a different approach [Kuratate et al., 2005, Journal of the IIEEJ, 34(4), 336-343], in which a 3D shape model of the face is deformed based on the motion data of a real human face measured by a motion capture system. For both the 3D shape data and the motion data, we calculated the displacement from the neutral face and represented it in low-dimensional parameters by PCA. By machine learning, we derived a transformation matrix that estimates the parameters representing the 3D shape from those of the motion. This allows the 3D face to be deformed dynamically under the control of motion capture data when generating facial expressions (see the sketches below). In a preliminary subjective experiment, facial expressions synthesized dynamically by the proposed method were recognized more accurately than motion pictures generated by the conventional linear morphing method, as well as still images.
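The morphing baseline the abstract refers to can be sketched minimally as follows; this shows only the pixel-wise linear interpolation the abstract parenthetically describes (a full morphing pipeline would also interpolate feature-point geometry), and the image size and frame count are illustrative assumptions, not values from the study.

```python
import numpy as np

def morph_sequence(neutral: np.ndarray, peak: np.ndarray, n_frames: int = 30):
    """Yield n_frames images blending linearly from the neutral-face
    image to the expression-peak image."""
    for t in np.linspace(0.0, 1.0, n_frames):
        yield (1.0 - t) * neutral + t * peak

# Placeholder images standing in for photographs of a real face.
neutral = np.zeros((256, 256, 3))
peak = np.ones((256, 256, 3))
frames = list(morph_sequence(neutral, peak))
```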
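The proposed pipeline (displacements from the neutral face, PCA parametrization, learned linear mapping from motion parameters to shape parameters) might be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the random arrays stand in for registered 3D face scans and motion-capture marker trajectories, the component counts are arbitrary, and ordinary least squares is one plausible choice for the "machine learning" step the abstract leaves unspecified.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Training data (placeholders): N frames of 3D-shape and motion-capture
# data, each expressed as displacement from the neutral face.
N, V, M = 200, 5000, 30                          # frames, vertices, markers
shape_disp = rng.standard_normal((N, V * 3))     # 3D-shape displacements
motion_disp = rng.standard_normal((N, M * 3))    # motion-capture displacements

# Low-dimensional representation of both data sets by PCA.
k_shape, k_motion = 20, 10                       # assumed component counts
pca_shape = PCA(n_components=k_shape).fit(shape_disp)
pca_motion = PCA(n_components=k_motion).fit(motion_disp)
b_shape = pca_shape.transform(shape_disp)        # (N, k_shape)
b_motion = pca_motion.transform(motion_disp)     # (N, k_motion)

# Learn a transformation matrix W with least squares so that
# b_shape ~= b_motion @ W, i.e. shape parameters estimated from
# motion parameters.
W, *_ = np.linalg.lstsq(b_motion, b_shape, rcond=None)

def synthesize_shape(motion_frame: np.ndarray) -> np.ndarray:
    """Map one motion-capture displacement frame to a full 3D-shape
    displacement, to be added to the neutral face mesh."""
    b_m = pca_motion.transform(motion_frame[None, :])  # (1, k_motion)
    b_s = b_m @ W                                      # (1, k_shape)
    return pca_shape.inverse_transform(b_s)[0]         # (V * 3,)

# Runtime: drive the avatar from a new motion-capture frame.
new_frame = rng.standard_normal(M * 3)
print(synthesize_shape(new_frame).shape)               # (15000,)
```

The design point this illustrates is that, once the mapping is learned, every frame of captured motion produces a full 3D face shape directly, so the animation inherits the measured spatiotemporal dynamics rather than the uniform timing imposed by linear interpolation between two endpoint images.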
