MSadTalker: Modified Stylized Audio-Driven Single Image Talking Face Animation Based on Head Motion Generation and Visual Silence Detection
DOI: https://doi.org/10.70695/IAAI202601A8

Keywords: Talking Head Synthesis; Audio-Driven Animation; Head Pose Generation; Silence Detection; Cross-Lingual Robustness

Abstract
To address two critical issues in stylized audio-driven single-image talking face animation (SadTalker), namely unnatural head motion in cross-lingual speech and unsynchronized lip movement during silent periods, this paper presents a modified version of SadTalker, called MSadTalker. The proposed method integrates head motion generation and lip-motion-based silence detection into the original SadTalker framework. Specifically, a cosine function is employed to generate natural head motion, while lip movement analysis is applied to detect visual silence. The head motion generation module produces stable, human-like head rotations from preset amplitude and frequency parameters, effectively suppressing unnatural jitter in cross-lingual scenarios. The silence detection mechanism identifies silent intervals by computing derivatives of lip keypoint motion and applying a threshold-based judgment, directly suppressing unnecessary head and lip movements during silence to improve end-to-end synchronization and realism. Experiments demonstrate that MSadTalker achieves higher stability and robustness across multiple language environments, including Chinese and English, exhibiting smoother and more natural head motion trajectories along with more stable posture maintenance during silent periods.
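The two mechanisms described above can be sketched in a few lines of NumPy. This is a minimal illustration of the general ideas, not the paper's implementation: the amplitude, frequency, phase, and threshold values are illustrative assumptions, and `lip_openness` stands in for whatever lip-keypoint distance the actual pipeline extracts.

```python
import numpy as np

def generate_head_motion(num_frames, fps=25.0, amplitude_deg=2.0, freq_hz=0.5):
    """Cosine-driven head pose trajectory (sketch).

    Produces smooth yaw/pitch angle sequences from preset amplitude and
    frequency parameters, as the abstract describes. All parameter values
    here are assumed for illustration.
    """
    t = np.arange(num_frames) / fps
    yaw = amplitude_deg * np.cos(2 * np.pi * freq_hz * t)
    # A phase-shifted, smaller pitch component avoids perfectly correlated axes.
    pitch = 0.5 * amplitude_deg * np.cos(2 * np.pi * freq_hz * t + np.pi / 4)
    return yaw, pitch

def detect_visual_silence(lip_openness, threshold=0.05):
    """Threshold the frame-to-frame derivative of a lip-keypoint distance.

    Frames whose lip motion velocity stays below the (assumed) threshold
    are flagged as visually silent, so head and lip motion can be frozen there.
    """
    velocity = np.abs(np.diff(lip_openness, prepend=lip_openness[0]))
    return velocity < threshold
```

In a driving loop, the silence flags would gate the generated pose, e.g. holding the previous yaw/pitch value wherever `detect_visual_silence` returns `True`, which matches the abstract's goal of stable posture during silent intervals.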