Label Noise Robustness and Learning Speed in a Self-Learning Vocal User Interface

Ons, Bart; Gemmeke, Jort F.; Van hamme, Hugo

doi:10.1007/978-1-4614-8280-2_22



Abstract

A self-learning vocal user interface learns to map user-defined spoken commands to intended actions. The interface is trained by mining the speech input together with the action it provokes on a device. Although this generic procedure allows a great deal of flexibility, it comes at a cost. Two requirements are important for a user-friendly learning environment. First, the self-learning interface should be robust against typical errors that occur in the interaction between a non-expert user and the system. For instance, the user may give the system a wrong learning example by commanding “Turn on the television!” while pushing the power button on the wrong remote control. The spoken command is then paired with the wrong action; we refer to such errors as label noise. Second, the mapping between voice commands and intended actions should be learned fast, i.e. from few examples. To meet these requirements, we implemented learning through supervised non-negative matrix factorization (NMF). We tested keyword recognition accuracy for different levels of label noise and different training set sizes. Our learning approach is robust against label noise, but learning speed (fast mapping) still leaves room for improvement.
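The abstract names supervised NMF but does not give the model, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. It assumes a common formulation: each training utterance is a non-negative acoustic vector (a column of `A`), its action is a one-hot label column, and the stacked matrix `[A; L]` is factorized as `W @ H` using Lee and Seung's multiplicative updates for the generalized Kullback-Leibler divergence. At test time the acoustic part `W_a` is kept fixed, activations are inferred from acoustics alone, and the action is read out as the largest entry of `W_l @ H`. Label noise is simulated by replacing a label, with probability `p`, by a different random class. All function names, the toy data generator and the parameter values are hypothetical.

```python
import numpy as np

# Assumed supervised-NMF formulation for illustration; not necessarily the authors' exact model.
EPS = 1e-9  # guards against division by zero


def kl_nmf(V, W, H, n_iter, update_W=True):
    """Lee-Seung multiplicative updates minimizing the generalized KL divergence D(V || WH)."""
    ones = np.ones_like(V)
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + EPS))) / (W.T @ ones + EPS)
        if update_W:
            W *= ((V / (W @ H + EPS)) @ H.T) / (ones @ H.T + EPS)
    return W, H


def one_hot(labels, n_classes):
    """One column per utterance, with a 1 in the row of its (possibly noisy) action label."""
    L = np.zeros((n_classes, len(labels)))
    L[labels, np.arange(len(labels))] = 1.0
    return L


def train(A, labels, n_classes, n_comp, n_iter=500, seed=0):
    """Factorize the stacked matrix [A; L] so each latent part ties an acoustic pattern to actions."""
    rng = np.random.default_rng(seed)
    V = np.vstack([A, one_hot(labels, n_classes)])
    W = rng.random((V.shape[0], n_comp)) + 0.1
    H = rng.random((n_comp, V.shape[1])) + 0.1
    W, _ = kl_nmf(V, W, H, n_iter)
    n_feat = A.shape[0]
    return W[:n_feat], W[n_feat:]  # acoustic part W_a, label part W_l


def predict(A, W_a, W_l, n_iter=500, seed=1):
    """Infer activations H from acoustics only (W_a fixed), then pick argmax of W_l @ H."""
    rng = np.random.default_rng(seed)
    H = rng.random((W_a.shape[1], A.shape[1])) + 0.1
    _, H = kl_nmf(A, W_a, H, n_iter, update_W=False)
    return np.argmax(W_l @ H, axis=0)


def corrupt_labels(labels, n_classes, p, rng):
    """Label noise: with probability p, replace a label by a random *different* class."""
    noisy = labels.copy()
    flip = rng.random(len(labels)) < p
    offsets = rng.integers(1, n_classes, size=flip.sum())  # offsets in 1..n_classes-1
    noisy[flip] = (labels[flip] + offsets) % n_classes
    return noisy


def make_data(n_per_class, n_classes, block, rng):
    """Toy non-negative 'utterance' vectors: each keyword lights up its own feature block."""
    labels = np.repeat(np.arange(n_classes), n_per_class)
    A = 0.2 * rng.random((n_classes * block, len(labels)))
    for n, c in enumerate(labels):
        A[c * block:(c + 1) * block, n] += 5.0
    return A, labels


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    K = 4  # number of hypothetical commands/actions
    A_tr, y_tr = make_data(20, K, 10, rng)
    A_te, y_te = make_data(10, K, 10, rng)
    for p in (0.0, 0.2, 0.4):
        y_noisy = corrupt_labels(y_tr, K, p, rng)
        W_a, W_l = train(A_tr, y_noisy, K, n_comp=2 * K)
        acc = np.mean(predict(A_te, W_a, W_l) == y_te)
        print(f"label noise {p:.1f}: test accuracy {acc:.2f}")
```

Because each latent part pools the labels of many utterances with similar acoustics, a minority of wrong labels dilutes its label row rather than flipping its argmax, which is one intuition for why this kind of model can tolerate label noise. The toy data are far easier than real speech, so the printed accuracies say nothing about the results reported in the chapter.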

Acknowledgements

This work is funded by IWT-SBO project 100049 (ALADIN).

Author information

Authors and Affiliations

Department ESAT-PSI, KU Leuven, Leuven, Belgium
Bart Ons, Jort F. Gemmeke & Hugo Van hamme

Corresponding author

Correspondence to Bart Ons.

Editor information

Editors and Affiliations

IMMI-CNRS, Orsay, France
Joseph Mariani
LIMSI-CNRS, Orsay, France
Sophie Rosset
IMMI-CNRS, Orsay, France
Martine Garnier-Rizet
LIMSI-CNRS, Orsay, France
Laurence Devillers

About this paper

Cite this paper

Ons, B., Gemmeke, J.F., Van hamme, H. (2014). Label Noise Robustness and Learning Speed in a Self-Learning Vocal User Interface. In: Mariani, J., Rosset, S., Garnier-Rizet, M., Devillers, L. (eds) Natural Interaction with Robots, Knowbots and Smartphones. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8280-2_22

DOI: https://doi.org/10.1007/978-1-4614-8280-2_22
Published: 28 August 2013
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8279-6
Online ISBN: 978-1-4614-8280-2
eBook Packages: Engineering, Engineering (R0)
