A Dual-Stage Chinese Instruction Jailbreaking Framework for Generative Large Language Models

Authors

  • Yingkun Huang, China Electronics Data Corporation
  • Xiaoru Zhuang, School of Mechanical and Electrical Engineering, Shenzhen Polytechnic University
  • Shihao Song, China Electronics Data Corporation

DOI:

https://doi.org/10.70695/IAAI202504A5

Keywords:

Large Language Models; Prompt Injection; Jailbreak; Chinese Context; Security Evaluation

Abstract

Large Language Models (LLMs) equipped with advanced reasoning capabilities have demonstrated impressive performance across natural language tasks, yet remain susceptible to context-dependent or partially obfuscated safety-sensitive instructions, particularly in Chinese-language settings. To systematically assess these risks, this paper introduces a Dual-Stage Instruction Safety Evaluation Framework (DISEF) comprising Virtualized Scenario Embedding (VSE), which embeds queries into semantically benign contexts to examine alignment stability under scenario-driven shifts, and Formal Payload Splitting (FPS), a controlled diagnostic technique for analyzing robustness when models process fragmented or implicitly encoded risk-related content. The framework is validated using the IJCAI 2025 Generative LLM Security Attack-Defense benchmark, covering prompt diversity, risk-consistency assessment, and content-level risk distribution across multiple representative LLMs. Experimental findings reveal notable discrepancies in alignment robustness, highlighting cross-model vulnerability patterns and exposure points within Chinese instruction-processing pathways. The proposed framework provides actionable insights for strengthening safety alignment, enhancing threat detection mechanisms, and supporting the development of standardized evaluation approaches for next-generation generative AI systems.

Published

2025-12-31

How to Cite

Huang, Y., Zhuang, X., & Song, S. (2025). A Dual-Stage Chinese Instruction Jailbreaking Framework for Generative Large Language Models. Innovative Applications of AI, 2(4), 11-20. https://doi.org/10.70695/IAAI202504A5