From One Stolen Utterance: Assessing the Risks of Voice Cloning in the AIGC Era

Published in Proceedings of IEEE S&P, 2025

Recommended citation: Kun Wang, Meng Chen, Li Lu*, Jingwen Feng, Qianniu Chen, Zhongjie Ba, Kui Ren, and Chun Chen. "From One Stolen Utterance: Assessing the Risks of Voice Cloning in the AIGC Era." Proceedings of IEEE S&P. San Francisco, CA, USA. 2025. doi: to appear.

The IEEE Symposium on Security and Privacy (IEEE S&P) is the premier forum for presenting developments in computer security and electronic privacy, and for bringing together researchers and practitioners in the field. IEEE S&P is one of the top four security conferences and a CCF-A conference.

Abstract: The advent of voice cloning has fundamentally threatened the role of voice as a unique biometric. Many reported criminal incidents have already demonstrated its significant risk of identity forgery. Previous works explored the risks of voice cloning in constrained settings, which require victim speakers either to have already appeared in the training data of voice cloning models or to have leaked dozens of minutes of their speech samples to adversaries. However, with the rapid progress of voice cloning in the AIGC (Artificial Intelligence Generated Content) era, these requirements have largely been relaxed, leaving the exact risks of state-of-the-art (SOTA) voice cloning techniques unclear. To clarify them, this paper conducts a large-scale study in real-world scenarios to assess the risks of advanced voice cloning techniques. This study involves 5 SOTA voice cloning techniques (open-source and commercial), evaluated against 8 SOTA voice authentication systems (open-source and real-world) and 30 human listeners, using voice data from over 7,000 speakers (public and custom). Through experimental and theoretical analysis, this study reveals that 1) SOTA voice cloning techniques pose severe threats in spoofing voice authentication systems and human listeners; 2) demographic factors such as the age and gender of victim speakers have a subtle impact on voice cloning attacks; 3) human listeners’ subjective opinions and background knowledge about voice cloning play an important role in their susceptibility to attacks; 4) advanced detection methods still fail to identify voice cloning samples as expected.

View the full paper