VoiceCloak: Adversarial Example Enabled Voice De-Identification with Balanced Privacy and Utility

Published in Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), 2023

Recommended citation: Meng Chen, Li Lu*, Junhao Wang, Jiadi Yu, Yingying Chen, Zhibo Wang, Zhongjie Ba, Feng Lin, Kui Ren. "VoiceCloak: Adversarial Example Enabled Voice De-Identification with Balanced Privacy and Utility." Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT). 7(2), pp. 48:1-48:21. Cancun, Mexico. 2023. doi: 10.1145/3596266.

The Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT) is the premier international journal in ubiquitous computing; it evolved from the top-ranked international conference ACM UbiComp, a CCF-A conference.

Abstract: Faced with the threat of identity leakage during voice data publishing, users confront a privacy-utility dilemma when enjoying voice services. Existing machine-centric studies employ direct modification or text-based re-synthesis to de-identify users' voices, but cause inconsistent audibility for human participants in emerging online communication scenarios, such as virtual meetings. In this paper, we propose a human-centric voice de-identification system, VoiceCloak, which uses adversarial examples to balance the privacy and utility of voice services. Instead of typical additive examples that induce perceivable distortions, we design a novel convolutional adversarial example that modulates perturbations into real-world room impulse responses. Benefiting from this, VoiceCloak prevents user identity from being exposed to Automatic Speaker Identification (ASI) while preserving the voice's perceptual quality for non-intrusive de-identification. Moreover, VoiceCloak learns a compact speaker distribution through a conditional variational auto-encoder to synthesize diverse targets on demand. Guided by these pseudo targets, VoiceCloak constructs adversarial examples in an input-specific manner, enabling any-to-any identity transformation for robust de-identification.
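The key distinction the abstract draws, between an additive adversarial example and a convolutional one carried by a room impulse response, can be illustrated with a minimal sketch. The function names, the toy impulse response, and the random stand-in signals below are all hypothetical illustrations, not the paper's actual optimization procedure (which crafts the perturbation to fool an ASI model).

```python
import numpy as np

def additive_example(speech, delta):
    # Typical additive adversarial example: x' = x + delta.
    # The noise delta sits on top of the waveform and is often audible.
    return speech + delta

def convolutional_example(speech, rir):
    # Convolutional adversarial example in the spirit of the abstract:
    # the perturbation is embedded in a (perturbed) room impulse
    # response that is convolved with the clean speech, so the change
    # resembles natural reverberation rather than additive noise.
    out = np.convolve(speech, rir)[: len(speech)]
    return out / (np.max(np.abs(out)) + 1e-9)  # normalize to avoid clipping

# Toy usage with random stand-ins for speech and an RIR (illustrative only).
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)               # 1 s of audio at 16 kHz
rir = np.zeros(512)
rir[0] = 1.0                                      # direct-path component
rir[100:300] = 0.05 * rng.standard_normal(200)    # "perturbed" reflections
adv = convolutional_example(speech, rir)
print(adv.shape)  # (16000,)
```

In the actual system, the reflection taps would be optimized against a speaker-identification model rather than drawn at random; the sketch only shows why the convolutional form stays perceptually close to ordinary reverberated speech.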

View the full paper