Abstract
Given the ever-increasing volume of audio content, audio processing has become a vital need. In the aerospace field, voice commands could be used instead of data commands to speed up command transmission, to help crew members complete their tasks through hands-free control of supplemental equipment, and to serve as a redundant system that increases the reliability of command transmission. This paper proposes a voice command detection (VCD) framework for aerospace applications that decodes voice commands into comprehensible, executable commands at an acceptable speed and with a low false alarm rate. The framework is based mainly on a keyword spotting method, which extracts pre-defined target keywords from the input voice commands. These keywords are the input arguments to the proposed rule-based language model (LM), which decodes the voice commands from the keywords and their locations. Two keyword spotters are trained and used in the VCD system. The phone-based keyword spotter is trained on the TIMIT database; speaker adaptation methods are then used to adjust the parameters of the trained models using non-native speaker utterances. The word-based keyword spotter is trained on a database prepared specifically for aerospace applications. The experimental results show that the word-based VCD system decodes the voice commands with a true detection rate of 88% and a false alarm rate of 12%, on average. Additionally, using speaker adaptation methods in the phone-based VCD system improves the true detection rate and the false alarm rate by about 21% each.
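To make the idea of decoding from keywords and their locations more concrete, the following is a minimal sketch of what a rule-based decoding step might look like. It is not the paper's language model: the keyword sets `ACTIONS` and `TARGETS`, the function `decode_command`, the single ordering rule, and the output format are all hypothetical and chosen only for illustration. The input is assumed to be a list of (keyword, start time) pairs as a keyword spotter might emit them.

```python
from typing import List, Optional, Tuple

# Hypothetical vocabulary; the paper's actual keyword set is not given here.
ACTIONS = {"open", "close", "start", "stop"}
TARGETS = {"valve", "camera", "pump"}


def decode_command(keywords: List[Tuple[str, float]]) -> Optional[str]:
    """Map spotted (keyword, start_time) pairs to an executable command.

    Assumed rule: an action keyword must be followed, later in time, by a
    target keyword. Any other pattern is rejected by returning None.
    """
    # Order the detections by their location (start time) in the utterance.
    ordered = sorted(keywords, key=lambda kw: kw[1])
    action = None
    for word, _ in ordered:
        if word in ACTIONS:
            action = word  # remember the most recent action keyword
        elif word in TARGETS and action is not None:
            return f"{action.upper()}_{word.upper()}"
    return None


if __name__ == "__main__":
    print(decode_command([("valve", 1.2), ("open", 0.4)]))
    print(decode_command([("valve", 0.1), ("open", 0.5)]))
```

In this sketch the same two keywords yield a command only when they occur in the assumed order, which illustrates why keyword location, and not just keyword identity, can matter to a rule-based decoder.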







Cite this article
Tabibian, S. A voice command detection system for aerospace applications. Int J Speech Technol 20, 1049–1061 (2017). https://doi.org/10.1007/s10772-017-9467-4
DOI: https://doi.org/10.1007/s10772-017-9467-4