Abstract
Automatic speaker recognition has garnered significant research attention and performs impressively under matched conditions, where training and testing environments are similar. Its efficacy, however, diminishes considerably under mismatched conditions such as noise and reverberation. Feature extraction plays a pivotal role in determining a speaker recognition system's overall performance, and Gammatone Frequency Cepstral Coefficients (GFCCs) are a commonly employed feature in this domain. GFCCs are robust to variations such as diverse speaking styles and languages; nevertheless, they remain sensitive to background conditions, including noise and reverberation, which cause a significant decline in system performance. In response to this challenge, a novel "Entrocy" feature has been proposed. Entrocy is the Fourier transform of the entropy sequence of an audio segment and estimates how the information content (entropy) varies over time. A composite feature vector is formed by combining the Entrocy feature with GFCCs. The performance of this proposed approach was assessed using an i-vector PLDA baseline speaker recognition system. Notably, the Entrocy feature consistently outperforms the well-established GFCC features, exhibiting robustness under these challenging conditions. Experiments on speaker verification in controlled environments reveal that the system can deliver high performance provided the reverberation time does not exceed 1.0 s and the speech samples are longer than 5 s. These results highlight the effectiveness of the proposed method in reducing the equal error rate and improving the detection error trade-off, ultimately enhancing the system's overall accuracy.
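The Entrocy computation described above can be sketched as follows. This is a minimal illustration, not the authors' exact implementation: the frame length, hop size, choice of spectral entropy as the per-frame entropy measure, and the number of retained coefficients are all assumptions made for the example.

```python
import numpy as np

def entrocy(signal, frame_len=400, hop=160, n_coeffs=12):
    """Sketch of the 'Entrocy' feature: the Fourier transform of the
    frame-wise entropy sequence of an audio segment, capturing how the
    information content of the signal varies over time."""
    # Split the signal into overlapping frames.
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])

    # Per-frame spectral entropy: treat the normalised power spectrum
    # of each windowed frame as a probability distribution.
    power = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1)) ** 2
    p = power / (power.sum(axis=1, keepdims=True) + 1e-12)
    entropy = -(p * np.log2(p + 1e-12)).sum(axis=1)  # one value per frame

    # Entrocy: magnitude of the Fourier transform of the entropy
    # sequence, keeping the first few low-order coefficients.
    return np.abs(np.fft.rfft(entropy))[:n_coeffs]
```

A composite vector would then concatenate this with the GFCCs of the same segment, e.g. `np.concatenate([gfcc_vector, entrocy(segment)])`.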
Al-Karawi, K.A., Mohammed, D.Y. Using combined features to improve speaker verification in the face of limited reverberant data. Int J Speech Technol 26, 789–799 (2023). https://doi.org/10.1007/s10772-023-10048-7