In voice therapy, expert listeners widely use perceptual evaluation to assess pathological and normal voice quality. This approach is inherently subjective: it is prone to listener bias, and high inter- and intra-listener variability is common. Research has therefore emerged on the automatic assessment of pathological voices using a combination of subjective and objective analyses. The present study aimed to develop a complementary automatic assessment system for voice quality based on the well-known GRBAS scale, using a battery of multidimensional acoustic measures as input to Deep Neural Networks. A 44-dimensional set of parameters, including Mel-frequency Cepstral Coefficients, Smoothed Cepstral Peak Prominence and Long-Term Average Spectrum, was adopted. In addition, a state-of-the-art automatic assessment system based on Modulation Spectrum (MS) features and GMM classifiers was used as a comparison system. The classification results of the proposed method showed a moderate correlation with subjective GRBAS scores of dysphonic severity and outperformed the MS-GMM system, with a best accuracy of around 81.53%. The findings indicate that such an assessment system can be used as an appropriate evaluation tool for determining the presence and severity of voice disorders.
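The two figures of merit mentioned above, classification accuracy and correlation with subjective scores, can be illustrated with a minimal sketch. It assumes that each voice sample carries one listener rating and one predicted grade on the 0 to 3 GRBAS scale, and it uses the Pearson coefficient, since the abstract does not say which correlation measure was used. The function names and the ratings below are hypothetical and are not data from the study.

```python
from math import sqrt


def accuracy(predicted, rated):
    """Fraction of samples whose predicted grade equals the listener rating."""
    if len(predicted) != len(rated) or not rated:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(p == r for p, r in zip(predicted, rated))
    return hits / len(rated)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Made-up illustrative grades on the 0-3 scale (assumption, not study data).
rated = [0, 1, 2, 3, 2, 1, 0, 3]
predicted = [0, 1, 2, 2, 2, 1, 1, 3]

print("accuracy:", accuracy(predicted, rated))  # 6 of 8 grades match: 0.75
print("pearson r:", round(pearson(predicted, rated), 3))
```

Accuracy counts only exact matches, whereas the correlation also credits predictions that are off by one grade in the right direction, which is one reason the two numbers can tell different stories for an ordinal scale such as GRBAS.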
Cite as: Xie, S., Yan, N., Yu, P., Ng, M.L., Wang, L., Ji, Z. (2016) Deep Neural Networks for Voice Quality Assessment Based on the GRBAS Scale. Proc. Interspeech 2016, 2656-2660, doi: 10.21437/Interspeech.2016-986
@inproceedings{xie16d_interspeech,
  author={Simin Xie and Nan Yan and Ping Yu and Manwa L. Ng and Lan Wang and Zhuanzhuan Ji},
  title={{Deep Neural Networks for Voice Quality Assessment Based on the GRBAS Scale}},
  year=2016,
  booktitle={Proc. Interspeech 2016},
  pages={2656--2660},
  doi={10.21437/Interspeech.2016-986}
}