Abstract:
Speaker recognition technology, deployed in sectors like banking, education, recruitment, immigration, law enforcement, and healthcare, relies heavily on biometric data. However, the ethical implications and biases inherent in the datasets driving this technology have not been fully explored. Through a longitudinal study of close to 700 papers published at the ISCA Interspeech Conference from 2012 to 2021, we investigate how dataset use has evolved alongside the widespread adoption of deep neural networks. Our study identifies the most commonly used datasets in the field and examines their usage patterns. The analysis reveals significant shifts in data practices since the advent of deep learning: a small number of datasets dominate speaker recognition training and evaluation, and the majority of studies evaluate their systems on a single dataset. For four key datasets (Switchboard, Mixer, VoxCeleb, and ASVspoof), we conduct a detailed analysis of metadata and collection methods to assess ethical concerns and privacy risks. Our study highlights numerous challenges related to sampling bias, re-identification, consent, disclosure of sensitive information, and security risks in speaker recognition datasets, and emphasizes the need for more representative, fair, and privacy-aware data collection in this domain.
Published in: IEEE Transactions on Biometrics, Behavior, and Identity Science (Volume: 7, Issue: 1, January 2025)