Abstract
This research presents a relatively new approach: a smart solution that grew out of research activity initially based on Google Glass and available speech detection services. Over the last year, the authors carried out a series of development, testing, and evaluation steps on the prototype in order to decide which approach provides better results than third-party speech detection services such as Google Speech API or IBM Watson Speech to Text. This finding should significantly help the authors during data evaluation and testing of the developed smart solution. The starting point is a functional basic solution, a prototype, that the authors had already developed. This prototype worked properly and was usable, but it still had some disadvantages to be addressed. To achieve the best possible results, the authors added another element to their solution. This element comes from statistical analysis and is called the Hidden Markov Model, which has been used in speech recognition applications for more than twenty years. A challenging problem in this domain concerns significant data savings, server load, and detection quality, and it leaves room for further improvement through follow-up research and testing. The authors studied many articles and scientific sources in order to find the best solution for more efficient speech recognition usable in their developed prototype (and for this article).
Acknowledgements
This work and the contribution were supported by the project of Students Grant Agency—FIM, University of Hradec Kralove, Czech Republic. Ales Berger is a student member of the research team.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Berger, A., Maly, F. (2020). Speech Activity Detection for Deaf People: Evaluation on the Developed Smart Solution Prototype. In: Huk, M., Maleszka, M., Szczerbicki, E. (eds) Intelligent Information and Database Systems: Recent Developments. ACIIDS 2019. Studies in Computational Intelligence, vol 830. Springer, Cham. https://doi.org/10.1007/978-3-030-14132-5_5
DOI: https://doi.org/10.1007/978-3-030-14132-5_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-14131-8
Online ISBN: 978-3-030-14132-5
eBook Packages: Intelligent Technologies and Robotics (R0)