Maty, Hany and I have officially released the DeepSpeak dataset*. Tired of using the usual poor-quality, non-consensual, and limited deepfake datasets found online, we decided to make our own. We built a tool to collect webcam data from consenting crowd workers online and used it to create a diverse range of audio and visual deepfakes.
Now in its second iteration, the dataset comprises over 50 hours of footage of 500 diverse individuals (recorded with their consent) and 50 hours of deepfake video footage, consisting of multiple variants from: (1) three face-swap deepfake generators; (2) four lip-sync deepfake generators; and (3) three avatar deepfake generators. The dataset also contains both natural voices and voices from three deepfake voice generators (shout-out to ElevenLabs and PlayAI for making their APIs available to us).
The data is publicly available on request, under academic and commercial licenses, at the links below.
Paper: https://lnkd.in/gtjBSHJR
Dataset v1.1: https://lnkd.in/gFDbwDja
Dataset v2.0: https://lnkd.in/gXb-jTeP
Work on v3.0 is already underway.
* yes, we named our dataset before the release of the DeepSeek AI model!