Label Noise Robustness and Learning Speed in a Self-Learning Vocal User Interface

  • Conference paper
Natural Interaction with Robots, Knowbots and Smartphones

Abstract

A self-learning vocal user interface learns to map user-defined spoken commands to intended actions. The interface is trained by mining the speech input together with the action it provokes on a device. Although this generic procedure allows a great deal of flexibility, it comes at a cost, and two requirements must be met to create a user-friendly learning environment. First, the interface should be robust against typical errors in the interaction between a non-expert user and the system. For instance, the user may give the system a wrong learning example by commanding “Turn on the television!” while pushing the power button on the wrong remote control. The spoken command is then supervised by a wrong action; we refer to such errors as label noise. Second, the mapping between voice commands and intended actions should be learned quickly, i.e., from few examples. To meet these requirements, we implemented learning through supervised non-negative matrix factorization (NMF). We tested keyword recognition accuracy for different levels of label noise and different training set sizes. Our learning approach is robust against label noise, but some improvement regarding fast mapping is desirable.
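The idea of supervising NMF with possibly wrong action labels can be illustrated with a minimal sketch. It is not the authors' implementation; it rests on several assumptions. The acoustic features are random synthetic non-negative vectors rather than real speech representations. Supervision is done by stacking a one-hot label matrix on top of the features. The factorization uses the common multiplicative updates for the generalized Kullback-Leibler divergence. All function names (`flip_labels`, `train`, `predict`) and parameters are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero in the updates


def flip_labels(labels, n_classes, noise_rate, rng):
    """Simulate label noise: swap a fraction of labels for a different class."""
    noisy = labels.copy()
    flip = rng.random(labels.size) < noise_rate
    # An offset in 1..n_classes-1 guarantees the new class differs from the old one.
    offsets = rng.integers(1, n_classes, size=int(flip.sum()))
    noisy[flip] = (noisy[flip] + offsets) % n_classes
    return noisy


def update_h(V, W, H):
    """One multiplicative KL-divergence update of the activations H."""
    WH = W @ H + EPS
    return H * (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + EPS)


def update_w(V, W, H):
    """One multiplicative KL-divergence update of the dictionary W."""
    WH = W @ H + EPS
    return W * ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + EPS)


def train(features, labels, n_classes, n_iter=200, seed=0):
    """Factor the stacked [labels; features] matrix; return both dictionary parts."""
    rng = np.random.default_rng(seed)
    Y = np.eye(n_classes)[:, labels]          # one-hot labels, n_classes x n_utterances
    V = np.vstack([Y, features])              # supervision stacked on acoustics
    W = rng.random((V.shape[0], n_classes)) + EPS
    H = rng.random((n_classes, V.shape[1])) + EPS
    for _ in range(n_iter):
        H = update_h(V, W, H)
        W = update_w(V, W, H)
    return W[:n_classes], W[n_classes:]       # label part, acoustic part


def predict(features, W_label, W_acoustic, n_iter=100, seed=0):
    """Estimate activations from acoustics only, then read off the best label."""
    rng = np.random.default_rng(seed)
    H = rng.random((W_acoustic.shape[1], features.shape[1])) + EPS
    for _ in range(n_iter):
        H = update_h(features, W_acoustic, H)
    return np.argmax(W_label @ H, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n_classes, n_features, n_train, n_test = 4, 20, 80, 40
    prototypes = rng.random((n_features, n_classes))   # synthetic "commands"
    y_train = rng.integers(0, n_classes, size=n_train)
    y_test = rng.integers(0, n_classes, size=n_test)
    x_train = prototypes[:, y_train] + 0.05 * rng.random((n_features, n_train))
    x_test = prototypes[:, y_test] + 0.05 * rng.random((n_features, n_test))
    for noise_rate in (0.0, 0.2, 0.4):
        noisy = flip_labels(y_train, n_classes, noise_rate, rng)
        W_label, W_acoustic = train(x_train, noisy, n_classes)
        accuracy = np.mean(predict(x_test, W_label, W_acoustic) == y_test)
        print(f"label noise {noise_rate:.0%}: accuracy {accuracy:.2f}")
```

Varying `noise_rate` and `n_train` in the demo loop mirrors, in miniature, the two experimental factors named in the abstract: the level of label noise and the size of the training set. The printed accuracies depend on the synthetic data and say nothing about the results reported in the paper.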

Acknowledgements

This work is funded by IWT-SBO project 100049 (ALADIN).

Author information

Corresponding author

Correspondence to Bart Ons.

Copyright information

© 2014 Springer Science+Business Media New York

About this paper

Cite this paper

Ons, B., Gemmeke, J.F., Van hamme, H. (2014). Label Noise Robustness and Learning Speed in a Self-Learning Vocal User Interface. In: Mariani, J., Rosset, S., Garnier-Rizet, M., Devillers, L. (eds) Natural Interaction with Robots, Knowbots and Smartphones. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8280-2_22

  • DOI: https://doi.org/10.1007/978-1-4614-8280-2_22

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-8279-6

  • Online ISBN: 978-1-4614-8280-2

  • eBook Packages: Engineering, Engineering (R0)
