MSadTalker: Modified Stylized Audio-Driven Single Image Talking Face Animation Based on Head Motion Generation and Visual Silence Detection

Authors

  • Yuanlin Wang, Guangdong Polytechnic Normal University
  • Wen He, Guangdong Polytechnic Normal University
  • Qijun Yao, Guangdong Polytechnic Normal University
  • Jichen Yang, Guangdong Polytechnic Normal University

DOI:

https://doi.org/10.70695/IAAI202601A8

Keywords:

Talking Head Synthesis; Audio-Driven Animation; Head Pose Generation; Silence Detection; Cross-Lingual Robustness

Abstract

To address two critical issues in stylized audio-driven single-image talking face animation (SadTalker), namely unnatural head motion in cross-lingual speech and unsynchronized lip movement during silent periods, this paper presents a modified version of SadTalker called MSadTalker. The proposed method integrates head motion generation and lip-motion-based silence detection into the original SadTalker framework. Specifically, a cosine function is used to generate natural head motion, while lip movement analysis is applied to detect visual silence. The head motion generation module produces stable, human-like head rotations from preset amplitude and frequency parameters, effectively suppressing unnatural jitter in cross-lingual scenarios. The silence detection mechanism identifies silent intervals by computing derivatives of lip keypoint motion and applying a threshold-based judgment, directly suppressing unnecessary head and lip movements during silence to improve end-to-end synchronization and realism. Experiments demonstrate that MSadTalker achieves higher stability and robustness across multiple language environments, including Chinese and English, with smoother and more natural head motion trajectories and more stable posture maintenance during silent periods.
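The two mechanisms described in the abstract can be illustrated with a minimal sketch. The function names, the amplitude/frequency values, and the silence threshold below are hypothetical placeholders, not the paper's actual parameters: a cosine trajectory stands in for the head motion generator, and a frame-to-frame derivative of lip keypoints with a threshold stands in for visual silence detection.

```python
import numpy as np

def head_pose(t, amplitude_deg=3.0, freq_hz=0.25):
    """Smooth cosine head rotation (e.g., yaw) in degrees at time t (seconds).

    Amplitude and frequency are preset parameters; the values here are
    illustrative, not those used in the paper.
    """
    return amplitude_deg * np.cos(2 * np.pi * freq_hz * t)

def is_silent(lip_keypoints, threshold=0.5):
    """Flag frames as visually silent via a lip-motion derivative threshold.

    lip_keypoints: array of shape (T, K, 2) with K lip landmarks per frame.
    Returns a boolean array of length T; True marks a visually silent frame.
    """
    deriv = np.diff(lip_keypoints, axis=0)                 # (T-1, K, 2) frame deltas
    motion = np.linalg.norm(deriv, axis=-1).mean(axis=-1)  # mean lip motion per frame
    silent = motion < threshold                            # threshold-based judgment
    return np.concatenate([[silent[0]], silent])           # pad back to length T
```

In a pipeline like the one described, frames flagged by `is_silent` would have head and lip motion coefficients frozen or damped, while `head_pose` supplies the smooth baseline rotation during speech.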

Published

2026-03-31

How to Cite

Wang, Y., He, W., Yao, Q., & Yang, J. (2026). MSadTalker: Modified Stylized Audio-Driven Single Image Talking Face Animation Based on Head Motion Generation and Visual Silence Detection. Innovative Applications of AI, 3(1), 30-38. https://doi.org/10.70695/IAAI202601A8