Abstract
A novel approach for assisting bidirectional communication between people with normal hearing and hearing-impaired people is presented. Whereas existing assistive devices such as hearing aids and cochlear implants are vulnerable to extreme noise conditions or post-surgery side effects, the proposed concept is an alternative approach in which spoken dialogue is achieved by employing a robust speech recognition technique that accounts for noisy environmental factors, without any attachment to the human body. The proposed system is a portable device with an acoustic beamformer for directional noise reduction, capable of performing speech-to-text transcription based on a keyword spotting method. It is also equipped with a user interface optimized for hearing-impaired people, enabling intuitive and natural device usage across diverse domain contexts. The experimental results confirm that the proposed interface design is feasible for realizing an effective and efficient intelligent agent for the hearing-impaired.
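To illustrate the directional noise reduction idea mentioned above, the following is a minimal delay-and-sum beamformer sketch in Python with NumPy. It is an illustrative assumption, not the paper's actual beamformer (which is not specified in the abstract): each microphone channel is time-aligned toward the talker's direction and averaged, so uncorrelated noise partially cancels while the aligned speech adds coherently. The signal values, delays, and noise level below are invented for the demo.

```python
import numpy as np

def delay_and_sum(mic_signals, delays_samples):
    """Align each microphone channel by its integer sample delay and average.

    mic_signals: (n_mics, n_samples) array of synchronized recordings
    delays_samples: per-mic delays (in samples) steering toward the talker
    """
    n_mics, _ = mic_signals.shape
    out = np.zeros(mic_signals.shape[1])
    for sig, d in zip(mic_signals, delays_samples):
        out += np.roll(sig, -d)  # advance the channel by d samples
    return out / n_mics

# Two-mic demo: the same 1 kHz tone arrives 3 samples later at mic 2,
# and each mic adds its own uncorrelated noise.
fs = 16000
t = np.arange(1024) / fs
clean = np.sin(2 * np.pi * 1000 * t)
rng = np.random.default_rng(0)
mics = np.stack([clean + 0.3 * rng.standard_normal(t.size),
                 np.roll(clean, 3) + 0.3 * rng.standard_normal(t.size)])

enhanced = delay_and_sum(mics, delays_samples=[0, 3])

# Averaging aligned channels reduces uncorrelated noise power, so the
# beamformer output is closer to the clean tone than a single microphone.
err_single = np.mean((mics[0] - clean) ** 2)
err_beam = np.mean((enhanced - clean) ** 2)
print(err_beam < err_single)
```

In a real device the steering delays would come from the array geometry and the estimated direction of arrival, typically with fractional-delay filtering in the frequency domain rather than integer sample shifts.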
Acknowledgments
This research was supported by the Ministry of Health & Welfare R&D programme (A111189) and partially supported by a Seokyeong University grant programme in 2013.
Cite this article
Lee, S., Kang, S., Han, D.K. et al. Dialogue enabling speech-to-text user assistive agent system for hearing-impaired person. Med Biol Eng Comput 54, 915–926 (2016). https://doi.org/10.1007/s11517-015-1447-8