AdvReverb: Rethinking the Stealthiness of Audio Adversarial Examples to Human Perception
Published in IEEE Transactions on Information Forensics and Security, 2024
Recommended citation: Meng Chen, Li Lu*, Jiadi Yu, Zhongjie Ba, Feng Lin, Kui Ren. "AdvReverb: Rethinking the Stealthiness of Audio Adversarial Examples to Human Perception." IEEE Transactions on Information Forensics and Security, vol. 19, pp. 1948-1962, 2024. doi: 10.1109/TIFS.2023.3345639.
IEEE Transactions on Information Forensics and Security is a premier journal on information security and signal processing, sponsored by the IEEE Signal Processing Society. IEEE TIFS is rated as a CCF-A journal.
Abstract: As some of the most representative applications built on deep learning, audio systems, including keyword spotting, automatic speech recognition, and speaker identification, have recently been shown to be vulnerable to adversarial examples, raising broad concerns in both academia and industry. Existing attacks follow the adversarial example generation paradigm from computer vision, i.e., overlaying optimized additive perturbations on original voices. However, because additive perturbations are inherently audible to humans, balancing stealthiness and attack capability remains a challenging problem. In this paper, we rethink the stealthiness of audio adversarial examples and introduce another kind of audio distortion, i.e., reverberation, as a new perturbation format for stealthy adversarial example generation. Such convolutional adversarial perturbations are crafted as real-world impulse responses and behave as natural reverberation to deceive humans. Based on this idea, we propose AdvReverb to construct, optimize, and deliver phoneme-level convolutional adversarial perturbations on both speech and music carriers with a well-designed objective.
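To illustrate the core idea of a convolutional (rather than additive) perturbation, the sketch below convolves a voice carrier with an impulse response, so the distortion manifests as reverberation instead of overlaid noise. This is a minimal illustration only, not the paper's optimization procedure: the function name, the synthetic exponentially decaying impulse response, and the toy sine-wave carrier are all assumptions standing in for a learned room impulse response and real speech.

```python
import numpy as np
from scipy.signal import fftconvolve


def apply_convolutional_perturbation(waveform, impulse_response):
    """Convolve a carrier signal with an impulse response (RIR),
    so the perturbation sounds like natural reverberation.
    The result is truncated to the carrier's length and peak-limited."""
    reverbed = fftconvolve(waveform, impulse_response, mode="full")[: len(waveform)]
    peak = np.max(np.abs(reverbed))
    # Normalize only if convolution pushed the signal past full scale.
    return reverbed / peak if peak > 1.0 else reverbed


# Toy example: 1 s of a 440 Hz tone at 16 kHz as the carrier, and a
# synthetic exponentially decaying noise tail as a stand-in RIR.
sr = 16000
t = np.arange(sr) / sr
carrier = 0.5 * np.sin(2 * np.pi * 440 * t)

rng = np.random.default_rng(0)
tail = np.exp(-np.linspace(0.0, 8.0, sr // 4))
rir = 0.05 * tail * rng.standard_normal(sr // 4)
rir[0] = 1.0  # direct-path component so the original signal dominates

adv = apply_convolutional_perturbation(carrier, rir)
```

Because the identity impulse (a unit spike at lag zero) leaves the carrier unchanged, optimizing the impulse response lets the attack trade off attack success against how far the result drifts from a plausible reverberation.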