Abstract
A self-learning vocal user interface learns to map user-defined spoken commands to intended actions. The interface is trained by mining the speech input together with the action it provokes on a device. Although this generic procedure allows a great deal of flexibility, it comes at a cost: two requirements must be met to create a user-friendly learning environment. First, the self-learning interface should be robust against typical errors that occur in the interaction between a non-expert user and the system. For instance, a user may give the system a wrong learning example by commanding “Turn on the television!” while pushing the power button on the wrong remote control. The spoken command is then supervised by the wrong action; we refer to such errors as label noise. Second, the mapping between voice commands and intended actions should be learned quickly, i.e. from few examples. To meet these requirements, we implemented learning through supervised non-negative matrix factorization (NMF). We tested keyword recognition accuracy for different levels of label noise and different training set sizes. Our learning approach is robust against label noise, but some improvement regarding fast mapping is desirable.
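The idea of supervising NMF with possibly wrong action labels can be illustrated with a small sketch. It is not the chapter's implementation: the toy data, the function names (`kl_nmf`, `train`, `predict`, `add_label_noise`), the 20% noise rate, and the choice of Kullback-Leibler multiplicative updates are all assumptions made for illustration. Label indicators and acoustic features are stacked into one non-negative matrix and factorized jointly; at test time only the acoustic part is used to estimate activations, from which label scores are reconstructed.

```python
import numpy as np

def kl_nmf(V, W, H, n_iter=200, update_W=True, eps=1e-9):
    """Multiplicative updates that decrease the KL divergence D(V || W H)."""
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        if update_W:
            WH = W @ H + eps
            W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H

def one_hot(ids, n_classes):
    """Return an (n_classes x n_utterances) indicator matrix."""
    return np.eye(n_classes)[ids].T

def add_label_noise(ids, n_classes, noise_rate, rng):
    """Replace a fraction of the labels by a different, randomly chosen label."""
    noisy = ids.copy()
    n_flip = int(round(noise_rate * len(ids)))
    idx = rng.choice(len(ids), size=n_flip, replace=False)
    # A non-zero offset modulo n_classes guarantees the new label differs.
    offsets = rng.integers(1, n_classes, size=n_flip)
    noisy[idx] = (noisy[idx] + offsets) % n_classes
    return noisy

def train(labels, acoustics, n_components, n_iter=200, seed=0):
    """Jointly factorize stacked label and acoustic data; return both dictionaries."""
    rng = np.random.default_rng(seed)
    V = np.vstack([labels, acoustics])
    W = rng.random((V.shape[0], n_components)) + 0.1
    H = rng.random((n_components, V.shape[1])) + 0.1
    W, _ = kl_nmf(V, W, H, n_iter)
    n_labels = labels.shape[0]
    return W[:n_labels], W[n_labels:]

def predict(W_label, W_acoustic, acoustics, n_iter=200, seed=0):
    """Estimate activations from acoustics only, then reconstruct label scores."""
    rng = np.random.default_rng(seed)
    H = rng.random((W_acoustic.shape[1], acoustics.shape[1])) + 0.1
    _, H = kl_nmf(acoustics, W_acoustic.copy(), H, n_iter, update_W=False)
    return W_label @ H

# Toy data (hypothetical): each command has one fixed acoustic prototype.
rng = np.random.default_rng(0)
n_classes, n_features, n_utts = 4, 30, 60
prototypes = rng.random((n_features, n_classes))
ids = rng.integers(0, n_classes, size=n_utts)
acoustics = prototypes[:, ids] + 0.05 * rng.random((n_features, n_utts))

# Corrupt 20% of the supervision, train on the noisy labels, score on the true ones.
noisy_ids = add_label_noise(ids, n_classes, 0.2, rng)
W_label, W_acoustic = train(one_hot(noisy_ids, n_classes), acoustics, n_classes)
scores = predict(W_label, W_acoustic, acoustics)
accuracy = (scores.argmax(axis=0) == ids).mean()
print(f"accuracy against the true labels: {accuracy:.2f}")
```

Varying `noise_rate` and `n_utts` in such a sketch mirrors the two experimental factors named in the abstract, label noise level and training set size, although results on toy data say nothing about the results reported in the chapter.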
Acknowledgements
This work is funded by IWT-SBO project 100049 (ALADIN).
Copyright information
© 2014 Springer Science+Business Media New York
About this paper
Cite this paper
Ons, B., Gemmeke, J.F., Van hamme, H. (2014). Label Noise Robustness and Learning Speed in a Self-Learning Vocal User Interface. In: Mariani, J., Rosset, S., Garnier-Rizet, M., Devillers, L. (eds) Natural Interaction with Robots, Knowbots and Smartphones. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8280-2_22
DOI: https://doi.org/10.1007/978-1-4614-8280-2_22
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8279-6
Online ISBN: 978-1-4614-8280-2
eBook Packages: Engineering, Engineering (R0)