Identifying User Intents in Vietnamese Spoken Language Commands and Its Application in Smart Mobile Voice Interaction

  • Conference paper
Intelligent Information and Database Systems (ACIIDS 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9621)

Abstract

This paper presents a lightweight machine learning model and a fast conjunction matching method for identifying the user intent behind spoken text commands. Both were integrated into a mobile virtual assistant for Vietnamese (VAV) so that it can understand what users want to do on their smartphones. In the scope of our work, a user intent is an action associated with a particular mobile application. Given an input spoken command, an accurate classifier identifies the application, and a flexible conjunction matching algorithm determines the action. Both the classifier and the conjunction matcher are compact enough to be stored and executed directly on mobile devices. To evaluate them, we annotated a medium-sized data set and conducted experiments under different settings, achieving high accuracy for both application and action identification.
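The two-stage structure described in the abstract (first pick the application, then pick the action within it) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the classifier or the matching rules, so a simple keyword-overlap score stands in for the trained classifier, and "conjunction matching" is read here as requiring every term of a rule to appear in the command. The names `APP_KEYWORDS`, `ACTION_RULES`, and the English sample commands are hypothetical and are not taken from the paper.

```python
from typing import Optional, Set, Tuple

# Hypothetical keyword lists per application.
# This is a stand-in for a trained classifier, not the paper's model.
APP_KEYWORDS = {
    "alarm": {"alarm", "wake", "remind"},
    "phone": {"call", "dial", "phone"},
}

# Hypothetical action rules: each rule is a conjunction of terms,
# and it matches only if every term occurs in the command.
ACTION_RULES = {
    "alarm": [
        ("set_alarm", {"set", "alarm"}),
        ("cancel_alarm", {"cancel", "alarm"}),
    ],
    "phone": [
        ("make_call", {"call"}),
    ],
}


def identify_application(tokens: Set[str]) -> Optional[str]:
    """Pick the application whose keyword list overlaps the command most."""
    best_app, best_score = None, 0
    for app, keywords in APP_KEYWORDS.items():
        score = len(tokens & keywords)
        if score > best_score:
            best_app, best_score = app, score
    return best_app


def identify_action(app: str, tokens: Set[str]) -> Optional[str]:
    """Return the first action whose required terms are all present."""
    for action, required in ACTION_RULES.get(app, []):
        if required <= tokens:  # subset test = conjunction of terms
            return action
    return None


def identify_intent(command: str) -> Optional[Tuple[str, str]]:
    """Map a command to an (application, action) pair, or None."""
    tokens = set(command.lower().split())
    app = identify_application(tokens)
    if app is None:
        return None
    action = identify_action(app, tokens)
    if action is None:
        return None
    return app, action
```

Under these assumptions, `identify_intent("please set an alarm for six")` yields the pair `("alarm", "set_alarm")`, while a command that matches no application keyword yields `None`. Restricting the action rules to the application chosen in the first stage keeps the rule set small, which is one plausible way such a matcher could stay compact enough for on-device use.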

Notes

  1. Microsoft Skype Translator and AT&T Speech-to-Speech Translation.

  2. Wit.ai: https://wit.ai.

  3. VAV: https://play.google.com/store/apps/details?id=com.mdnteam.vav.

Acknowledgment

This work was supported by the project QG.15.29 from Vietnam National University, Hanoi (VNU).

Corresponding author

Correspondence to Thi-Lan Ngo.

Copyright information

© 2016 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ngo, TL. et al. (2016). Identifying User Intents in Vietnamese Spoken Language Commands and Its Application in Smart Mobile Voice Interaction. In: Nguyen, N.T., Trawiński, B., Fujita, H., Hong, TP. (eds) Intelligent Information and Database Systems. ACIIDS 2016. Lecture Notes in Computer Science, vol 9621. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49381-6_19

  • DOI: https://doi.org/10.1007/978-3-662-49381-6_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-49380-9

  • Online ISBN: 978-3-662-49381-6

  • eBook Packages: Computer Science, Computer Science (R0)
