How do you detect a cloned voice? The simple answer… deep learning. Hugely enjoyed presenting our different detection algorithms, and the relative benefits of each, at the 2023 IEEE International Workshop on Information Forensics and Security (WIFS) in Nuremberg, Germany.
We presented a range of methods, from black-box neural networks to human-understandable features. Although deep learning outperforms all the other algorithms, consistently getting close to 100% accuracy in the lab, it struggles to generalize in the wild and offers very little explainability. Simple perceptual features such as pauses, volume, and breaths are easy to understand, but they are no longer enough to combat increasingly human-like cloning tools. However, we did find that signal processing features extracted from the audio waveform offered a good compromise between explainability and accuracy.
Paper available in IEEE conference proceedings, or pre-print via arXiv: https://arxiv.org/pdf/2307.07683.pdf