
Enhancing Speech Recorded from a Wearable Sensor Using a Collection of Autoencoders

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1087)

Abstract

Assistive Technology (AT) encompasses the use of technological devices to improve the learning process or the general capabilities of people with disabilities. One of the major tasks in AT is the development of devices that offer alternative or augmentative communication capabilities.

In this work, we implemented a simple AT device with a low-cost sensor for registering speech signals. The recorded sound is perceived as low-quality and corrupted, and is therefore unsuitable for integration into speech recognition systems, automatic transcription, or general recognition of vocal-tract sounds for people with disabilities.

We propose the use of a group of artificial neural networks, each of which improves a different aspect of the signal. Speech enhancement studies normally focus on improving signals degraded under specific, known conditions, such as background noise, reverberation, or other natural noise. In our case, the conditions that degrade the sound are unknown, and this uncertainty makes enhancing the speech in a real-life application considerably more challenging.
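To illustrate the denoising idea behind such networks, the following is a minimal sketch of a single denoising autoencoder trained on noisy/clean frame pairs. It is not the paper's actual architecture: the data (short sinusoid frames standing in for speech features), the network size, and all training hyperparameters are assumptions chosen only to make the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "clean" feature frames: short sinusoids of varying frequency.
# (Illustrative data only; the paper's real inputs are speech features.)
t = np.linspace(0.0, 1.0, 16)
freqs = rng.uniform(2.0, 4.0, size=200)
clean = np.stack([np.sin(2.0 * np.pi * f * t) for f in freqs])
noisy = clean + 0.5 * rng.standard_normal(clean.shape)  # corrupted observations

# One-hidden-layer denoising autoencoder: tanh encoder, linear decoder,
# trained by full-batch gradient descent on noisy -> clean pairs.
n_in, n_hid = clean.shape[1], 8
W1 = 0.1 * rng.standard_normal((n_in, n_hid)); b1 = np.zeros(n_hid)
W2 = 0.1 * rng.standard_normal((n_hid, n_in)); b2 = np.zeros(n_in)
lr, n = 0.1, len(clean)

for _ in range(3000):
    h = np.tanh(noisy @ W1 + b1)        # encode the corrupted frames
    out = h @ W2 + b2                   # decode toward the clean targets
    err = out - clean                   # gradient of 0.5 * MSE
    gW2, gb2 = h.T @ err / n, err.mean(axis=0)
    dh = (err @ W2.T) * (1.0 - h**2)    # backprop through tanh
    gW1, gb1 = noisy.T @ dh / n, dh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

denoised = np.tanh(noisy @ W1 + b1) @ W2 + b2
mse_before = float(np.mean((noisy - clean) ** 2))
mse_after = float(np.mean((denoised - clean) ** 2))
print(f"MSE noisy vs clean:    {mse_before:.4f}")
print(f"MSE denoised vs clean: {mse_after:.4f}")
```

In the paper's setting, several such networks are combined, each targeting a different aspect of the degraded signal rather than a single known noise type.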

The results show the capacity of artificial neural networks to enhance the quality of the sound under several objective evaluation measures. This proposal can therefore become a way of treating these kinds of signals to improve robust speech recognition systems and increase the real possibilities of implementing low-cost AT devices.

Supported by the University of Costa Rica.



Acknowledgements

This work was supported by the University of Costa Rica (UCR), Project No. 322-B9-105 and ED-3416.

Author information

Correspondence to Marvin Coto-Jiménez.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

González-Salazar, A., Gutiérrez-Muñoz, M., Coto-Jiménez, M. (2020). Enhancing Speech Recorded from a Wearable Sensor Using a Collection of Autoencoders. In: Crespo-Mariño, J., Meneses-Rojas, E. (eds) High Performance Computing. CARLA 2019. Communications in Computer and Information Science, vol 1087. Springer, Cham. https://doi.org/10.1007/978-3-030-41005-6_26

  • DOI: https://doi.org/10.1007/978-3-030-41005-6_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-41004-9

  • Online ISBN: 978-3-030-41005-6

  • eBook Packages: Computer Science, Computer Science (R0)
