
Line spectral frequency-based features and extreme learning machine for voice activity detection from audio signal

Published in: International Journal of Speech Technology

Abstract

Voice activity detection (VAD) refers to the task of identifying the vocal segments in an audio clip. By discarding the non-vocal portions of an input signal, it reduces computational overhead and improves the recognition performance of speech-based systems. In this paper, a VAD technique is presented that uses line spectral frequency-based statistical features, termed LSF-S, coupled with extreme learning machine (ELM)-based classification. Experiments were performed on a database of more than 350 hours of audio drawn from diverse sources, and an encouraging overall accuracy of 99.43% was obtained.
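The abstract describes a two-stage pipeline: frame-level line spectral frequencies (LSFs) summarised into clip-level statistics (LSF-S), followed by an extreme learning machine (ELM) classifier. The sketch below illustrates one way such a pipeline could be assembled; the LPC order, frame and hop sizes, the choice of statistics (mean and standard deviation per LSF coefficient), the number of hidden neurons, and the use of librosa for framing and LPC estimation are all illustrative assumptions, not the settings reported in the paper.

```python
# Minimal sketch of an LSF-S + ELM voice activity detector (assumed parameters).
import numpy as np
import librosa  # assumed available for framing and LPC estimation


def lpc_to_lsf(a):
    """Convert LPC coefficients a = [1, a1, ..., ap] into line spectral frequencies."""
    a_ext = np.concatenate([a, [0.0]])
    p_poly = a_ext + a_ext[::-1]   # symmetric polynomial  P(z) = A(z) + z^-(p+1) A(1/z)
    q_poly = a_ext - a_ext[::-1]   # antisymmetric polynomial Q(z) = A(z) - z^-(p+1) A(1/z)
    roots = np.concatenate([np.roots(p_poly), np.roots(q_poly)])
    ang = np.angle(roots)
    # Keep the p angles strictly inside (0, pi); these are the LSFs.
    return np.sort(ang[(ang > 1e-6) & (ang < np.pi - 1e-6)])


def lsf_s_features(y, sr, order=12, frame_len=0.025, hop=0.010):
    """Clip-level LSF-S descriptor: per-coefficient mean and std over frames
    (a placeholder for the exact statistics used in the paper)."""
    n_win = int(frame_len * sr)
    frames = librosa.util.frame(y, frame_length=n_win, hop_length=int(hop * sr)).T
    lsfs = []
    for fr in frames:
        fr = fr.astype(float) * np.hamming(n_win)
        if np.allclose(fr, 0.0):
            continue                       # skip silent/degenerate frames
        lsf = lpc_to_lsf(librosa.lpc(fr, order=order))
        if len(lsf) == order:
            lsfs.append(lsf)
    lsfs = np.array(lsfs)
    return np.concatenate([lsfs.mean(axis=0), lsfs.std(axis=0)])


class ELM:
    """Basic single-hidden-layer ELM: random input weights, least-squares output weights."""

    def __init__(self, n_hidden=500, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        y = np.asarray(y)
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)          # hidden-layer activations
        T = np.eye(int(y.max()) + 1)[y]           # one-hot class targets
        self.beta = np.linalg.pinv(H) @ T         # closed-form output weights
        return self

    def predict(self, X):
        return np.argmax(np.tanh(X @ self.W + self.b) @ self.beta, axis=1)
```

Under these assumptions, a matrix of LSF-S vectors (one per clip or segment) would be passed to ELM().fit(X_train, y_train), and predict(X_test) would label each segment as vocal or non-vocal.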


Notes

  1. Retrieved Jan 24, 2018 from https://azure.microsoft.com/en-in/services/cognitive-services/speaker-recognition/.

  2. Retrieved Jan 24, 2018 from https://www.nuance.com/omni-channel-customer-engagement/security/multi-modal-biometrics/freespeech.html.

  3. Retrieved Jan 24, 2018 from https://www.youtube.com.


Acknowledgements

The authors wish to thank Dr. Chayan Halder of the University of Engineering and Management, Kolkata, Miss Payel Rakshit of Maheshtala College, Budge Budge, and Miss Ankita Dhar of West Bengal State University, Barasat, for their help throughout this work. They would also like to thank Mr. Debajyoti Bose of the University of Petroleum and Energy Studies, Dehradun, for his assistance.

Author information

Corresponding author: Kaushik Roy.

About this article

Cite this article

Mukherjee, H., Obaidullah, S.M., Santosh, K.C. et al. Line spectral frequency-based features and extreme learning machine for voice activity detection from audio signal. Int J Speech Technol 21, 753–760 (2018). https://doi.org/10.1007/s10772-018-9525-6
