Tailoring Multimodal AIGC to Time-Honored Brands: A Stable Diffusion-Based Framework for Visual Generation and Evaluation
DOI: https://doi.org/10.70695/IAAI202504A6

Keywords:
Time-honored Brand; Stable Diffusion; Cultural Feature Embedding; Multimodal Control; Parameter-Efficient Fine-Tuning; Reliability Calibration; Visual Generation

Abstract
To address the dual requirements of cultural expression and engineering practicality in the visual design of time-honored brands, this study proposes an adaptive optimization architecture based on Stable Diffusion. The framework derives composable cultural tokens via Textual Inversion and applies LoRA and DreamBooth for parameter-efficient fine-tuning of both generic and brand-proprietary styles. By integrating ControlNet and IP-Adapter, the system fuses layout and style priors, while a dual-channel gating mechanism enables collaborative control over semantics and composition. At inference time, prompt-adherence reliability is calibrated through CFG-Rescale, attention reweighting, and temperature scaling. Extensive experiments on public multimodal datasets and real-world brand scenarios demonstrate significantly improved alignment between objective metrics and human evaluations. Robustness tests and component ablation studies confirm the method's stability and the necessity of each component, while A/B testing reveals clear advantages in cost-effectiveness and operational efficiency. This research ultimately provides a replicable and verifiable technical solution for the visual generation needs of both cultural-heritage and commercial brands.
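The CFG-Rescale calibration mentioned in the abstract can be illustrated with a minimal sketch. Standard classifier-free guidance extrapolates the conditional noise prediction, which at high guidance scales can overshoot and wash out detail; CFG-Rescale counteracts this by rescaling the guided prediction so its standard deviation matches that of the conditional prediction, then blending with a factor phi. The function and parameter names below (`cfg_rescale`, `guidance_scale`, `rescale_phi`) are illustrative assumptions, not identifiers from the paper.

```python
import numpy as np

def cfg_rescale(noise_cond, noise_uncond, guidance_scale=7.5, rescale_phi=0.7):
    """Classifier-free guidance with CFG-Rescale (illustrative sketch).

    noise_cond / noise_uncond: model noise predictions with and
    without the text condition (same shape).
    """
    # Standard classifier-free guidance extrapolation
    noise_cfg = noise_uncond + guidance_scale * (noise_cond - noise_uncond)
    # Rescale so the guided prediction's std matches the conditional one,
    # preventing the over-saturated, low-detail look of large guidance scales
    noise_rescaled = noise_cfg * (noise_cond.std() / noise_cfg.std())
    # Blend rescaled and raw guided predictions with factor phi
    return rescale_phi * noise_rescaled + (1 - rescale_phi) * noise_cfg
```

With `rescale_phi=0` this reduces to plain classifier-free guidance, and with `rescale_phi=1` the output's standard deviation exactly matches the conditional prediction's, so the factor trades off prompt adherence against tonal fidelity.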