A Dual-Stage Chinese Instruction Jailbreaking Framework for Generative Large Language Models

Yingkun Huang; Xiaoru Zhuang; Shihao Song

doi:10.70695/IAAI202504A5

Authors

Yingkun Huang, China Electronics Data Corporation
Xiaoru Zhuang, School of Mechanical and Electrical Engineering, Shenzhen Polytechnic University
Shihao Song, China Electronics Data Corporation

DOI:

https://doi.org/10.70695/IAAI202504A5

Keywords:

Large Language Models; Prompt Injection; Jailbreak; Chinese Context; Security Evaluation

Abstract

Large Language Models (LLMs) equipped with advanced reasoning capabilities have demonstrated impressive performance across natural language tasks, yet remain susceptible to context-dependent or partially obfuscated safety-sensitive instructions, particularly in Chinese-language settings. To systematically assess these risks, this paper introduces a Dual-Stage Instruction Safety Evaluation Framework (DISEF) comprising Virtualized Scenario Embedding (VSE), which embeds queries into semantically benign contexts to examine alignment stability under scenario-driven shifts, and Formal Payload Splitting (FPS), a controlled diagnostic technique for analyzing robustness when models process fragmented or implicitly encoded risk-related content. The framework is validated using the IJCAI 2025 Generative LLM Security Attack-Defense benchmark, covering prompt diversity, risk-consistency assessment, and content-level risk distribution across multiple representative LLMs. Experimental findings reveal notable discrepancies in alignment robustness, highlighting cross-model vulnerability patterns and exposure points within Chinese instruction-processing pathways. The proposed framework provides actionable insights for strengthening safety alignment, enhancing threat detection mechanisms, and supporting the development of standardized evaluation approaches for next-generation generative AI systems.
