Abstract:
Analyzing speech signals to infer a speaker's age can be valuable for purposes such as voice authentication or advertisements targeted at specific age groups. Applying machine learning algorithms to uncover the main features that reflect age is challenging, since it requires a deep understanding of how the voice changes with age. To address this problem, we developed two methods that classify a person's age into a group (teens, twenties, thirties, forties, fifties, or sixties) based on his or her voice. The first method converts speech into Mel-spectrogram images and feeds them into a Convolutional Neural Network (CNN), treating the problem as an image classification task. The second method extracts important acoustic features, including Mel-Frequency Cepstral Coefficients (MFCCs), Spectral Contrast, Spectral Roll-Off, and Spectral Bandwidth, and then classifies them using machine learning algorithms, namely K-Nearest Neighbors (KNN) and Label Propagation (LP). These classifiers achieved an accuracy of 95%. Finally, we combined the two methods to enhance the model's robustness, attaining an overall accuracy of 97%, which is the highest reported in the literature for the age classification task on the Mozilla Common Voice dataset.
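As a rough illustration of the second method's classification step, the following pure-Python sketch implements a plain K-Nearest Neighbors majority vote over per-clip feature vectors. The two-dimensional vectors (a hypothetical mean MFCC value and a spectral roll-off in Hz), their age-group labels, the choice of k=3, and the z-score scaling step are all assumptions made for illustration; they are invented toy values, not the paper's data or configuration, and the abstract does not state how features were normalized.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training vectors closest to query."""
    # Rank training indices by distance to the query vector.
    ranked = sorted(range(len(train_x)), key=lambda i: euclidean(train_x[i], query))
    neighbours = [train_y[i] for i in ranked[:k]]
    counts = Counter(neighbours)
    best = max(counts.values())
    # Tie-break: among labels with the top vote count, return the one
    # whose member appears earliest (i.e. is closest) in the ranked list.
    for label in neighbours:
        if counts[label] == best:
            return label


def fit_standardizer(rows):
    """Per-dimension mean and population std (std of 0 replaced by 1)."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((r[d] - means[d]) ** 2 for r in rows) / n) or 1.0
        for d in range(dims)
    ]
    return means, stds


def standardize(row, means, stds):
    """Apply z-score scaling to one feature vector."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


# Hypothetical toy features: [mean MFCC value, spectral roll-off in Hz].
train_x = [
    [-4.0, 3000.0], [-3.8, 3100.0], [-4.2, 2950.0],  # "twenties"
    [2.0, 3050.0], [2.2, 2980.0], [1.9, 3120.0],     # "sixties"
]
train_y = ["twenties"] * 3 + ["sixties"] * 3
query = [-3.9, 3100.0]

# Unscaled: the roll-off column (in Hz) dominates the distance.
raw_label = knn_predict(train_x, train_y, query, k=3)

# Scaled: fit the standardizer on training data only, then apply it to both.
means, stds = fit_standardizer(train_x)
scaled_train = [standardize(r, means, stds) for r in train_x]
scaled_label = knn_predict(scaled_train, train_y, standardize(query, means, stds), k=3)

print("raw:", raw_label)        # raw: sixties
print("scaled:", scaled_label)  # scaled: twenties
```

The toy data is arranged so that only the MFCC column separates the two groups. Without scaling, the roll-off values (thousands of Hz) dominate the Euclidean distance and the query is labeled "sixties"; after standardization both columns contribute on comparable scales and the vote returns "twenties". This is why distance-based classifiers such as KNN are usually paired with some form of feature scaling, although the exact preprocessing used in the paper is not given in the abstract.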
Date of Conference: 11-13 June 2023
Date Added to IEEE Xplore: 05 July 2023