Combining the Optimal Seed Number Determining and SCE-based Data Augmentation for Logical Access Attack Detection
DOI: https://doi.org/10.70695/skknxc45
Abstract
Recent advances in voice conversion and text-to-speech technologies enable the creation of natural-sounding speech, which poses a threat to automatic speaker verification (ASV) systems. In response, research into spoofing countermeasures has intensified to protect ASV systems from these attacks. While advanced countermeasures can detect known types of spoofing attacks, their effectiveness diminishes against unknown attacks that do not appear in the training set. In this work, we first propose a method for finding the best random seed number. It addresses the challenge of determining an optimal baseline and makes the results easy for us and for others to replicate and potentially improve. Building on this optimal baseline, we propose a novel data augmentation technique termed SCE, which is rooted in signal companding and expanding. Specifically, signal companding employs the A-law and μ-law algorithms, whereas signal expanding uses 24-bit and 32-bit variants of the original training set. We expect data augmentation with SCE to enhance the robustness of the detection system. Our experiments use the ASVspoof 2019 logical access corpus and a ResNet-based system. The results show that the SCE technique outperforms many leading single systems, demonstrating its ability to handle the unpredictable nature of attacks. Notably, it achieves a tandem detection cost function (t-DCF) of 0.050 and an equal error rate (EER) of 1.60%, which places it among the top-performing systems reported to date.
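As a rough illustration of the four augmented variants named above, the following Python sketch applies A-law and μ-law companding and re-quantizes a waveform to 24-bit and 32-bit grids. It is not the authors' implementation. The companding uses the continuous ITU-T G.711 curves (μ = 255, A = 87.6) followed by 8-bit quantization, and the bit-depth variants are modelled as uniform PCM re-quantization. All function names are hypothetical and samples are assumed to be floats in [-1, 1]; these are assumptions for illustration only.

```python
import math
from typing import Callable, Dict, List, Tuple

# Assumed companding parameters (the standard ITU-T G.711 values).
MU = 255.0
A = 87.6


def mu_law_compress(x: float, mu: float = MU) -> float:
    """Continuous mu-law compression of a sample x in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y: float, mu: float = MU) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(((1.0 + mu) ** abs(y) - 1.0) / mu, y)


def a_law_compress(x: float, a: float = A) -> float:
    """Continuous A-law compression: linear near zero, logarithmic above 1/A."""
    ax = abs(x)
    if ax < 1.0 / a:
        y = a * ax / (1.0 + math.log(a))
    else:
        y = (1.0 + math.log(a * ax)) / (1.0 + math.log(a))
    return math.copysign(y, x)


def a_law_expand(y: float, a: float = A) -> float:
    """Inverse of a_law_compress."""
    ay = abs(y)
    if ay < 1.0 / (1.0 + math.log(a)):
        x = ay * (1.0 + math.log(a)) / a
    else:
        x = math.exp(ay * (1.0 + math.log(a)) - 1.0) / a
    return math.copysign(x, y)


def quantize(x: float, bits: int) -> float:
    """Round x in [-1, 1] to the nearest level of a signed `bits`-bit PCM grid."""
    scale = 2 ** (bits - 1)
    q = max(-scale, min(scale - 1, round(x * scale)))  # clip to the integer range
    return q / scale


# Hypothetical registry pairing each law with its compressor and expander.
CODECS: Dict[str, Tuple[Callable[[float], float], Callable[[float], float]]] = {
    "a_law": (a_law_compress, a_law_expand),
    "mu_law": (mu_law_compress, mu_law_expand),
}


def companding_variant(signal: List[float], law: str, bits: int = 8) -> List[float]:
    """Compress, quantize to `bits` bits (8, as in G.711), then expand back."""
    compress, expand = CODECS[law]
    return [expand(quantize(compress(s), bits)) for s in signal]


def bit_depth_variant(signal: List[float], bits: int) -> List[float]:
    """Re-quantize the waveform to a `bits`-bit PCM grid (e.g. 24 or 32)."""
    return [quantize(s, bits) for s in signal]


def sce_augment(signal: List[float]) -> Dict[str, List[float]]:
    """Return the four SCE-style variants of one training waveform."""
    return {
        "a_law": companding_variant(signal, "a_law"),
        "mu_law": companding_variant(signal, "mu_law"),
        "pcm24": bit_depth_variant(signal, 24),
        "pcm32": bit_depth_variant(signal, 32),
    }
```

Under this model, the companded variants introduce small, level-dependent quantization distortion (finer steps for quiet samples, coarser steps for loud ones), whereas re-quantizing audio that was originally stored at 16 bits onto a 24-bit or 32-bit grid leaves the sample values unchanged.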
