Identifying User Intents in Vietnamese Spoken Language Commands and Its Application in Smart Mobile Voice Interaction

Ngo, Thi-Lan; Nguyen, Van-Hop; Vuong, Thi-Hai-Yen; Nguyen, Thac-Thong; Nguyen, Thi-Thua; Pham, Bao-Son; Phan, Xuan-Hieu

doi:10.1007/978-3-662-49381-6_19

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9621)

Included in the following conference series:

Asian Conference on Intelligent Information and Database Systems

Abstract

This paper presents a lightweight machine learning model and a fast conjunction matching method for identifying the user intent behind spoken text commands. The model and the method were integrated into a mobile virtual assistant for Vietnamese (VAV) so that it can understand what mobile users want to do on their smartphones when they issue a command. User intent, in the scope of our work, is an action associated with a particular mobile application. Given an input spoken command, the application is identified by an accurate classifier, while the action is determined by a flexible conjunction matching algorithm. Both the classifier and the conjunction matcher are compact enough to be stored and executed directly on mobile devices. To evaluate them, we annotated a medium-sized data set, conducted experiments under different settings, and achieved high accuracy for both application and action identification.
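
The two-stage design described above (first pick the application, then pick the action within it) can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword-overlap scorer merely stands in for the trained classifier, the rule format is one plausible reading of "conjunction matching" (every keyword group in a rule must be matched by at least one token), and all names, vocabularies, and English example commands are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical application vocabularies (assumption, not from the paper).
# A trained classifier would replace this lookup in a real system.
APP_KEYWORDS: Dict[str, Set[str]] = {
    "alarm": {"alarm", "wake", "remind"},
    "call": {"call", "dial", "phone"},
    "music": {"play", "song", "music"},
}

# Hypothetical action rules per application (assumption, not from the paper).
# Each rule is a conjunction (AND) of groups; each group is a set of
# alternatives (OR). Rules are tried in order, most specific first.
ACTION_RULES: Dict[str, List[Tuple[str, List[Set[str]]]]] = {
    "alarm": [
        ("delete_alarm", [{"delete", "remove", "cancel"}, {"alarm"}]),
        ("set_alarm", [{"set", "create", "wake"}]),
    ],
    "call": [("make_call", [{"call", "dial"}])],
    "music": [("play_music", [{"play"}])],
}


def tokenize(command: str) -> List[str]:
    """Lowercase the command and split it on whitespace."""
    return command.lower().split()


def identify_application(tokens: List[str]) -> Optional[str]:
    """Stand-in classifier: pick the application with the most keyword hits."""
    present = set(tokens)
    scores = {app: len(words & present) for app, words in APP_KEYWORDS.items()}
    best = max(scores, key=lambda app: scores[app])
    return best if scores[best] > 0 else None


def identify_action(app: str, tokens: List[str]) -> Optional[str]:
    """Return the first action whose groups are all matched by some token."""
    present = set(tokens)
    for action, groups in ACTION_RULES.get(app, []):
        if all(group & present for group in groups):
            return action
    return None


def identify_intent(command: str) -> Optional[Tuple[str, Optional[str]]]:
    """Map a command to an (application, action) pair, or None if no app fits."""
    tokens = tokenize(command)
    app = identify_application(tokens)
    if app is None:
        return None
    return app, identify_action(app, tokens)
```

Under these assumptions, `identify_intent("cancel the alarm at six")` yields the pair `("alarm", "delete_alarm")`, because both groups of the first alarm rule are matched. Keeping the rules as plain sets makes the matcher small and cheap to evaluate, which is in the spirit of the on-device constraint mentioned in the abstract.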

Notes

1. Microsoft Skype Translator and AT&T Speech-to-Speech Translation.
2. Wit.ai: https://wit.ai.
3. VAV: https://play.google.com/store/apps/details?id=com.mdnteam.vav.

Acknowledgment

This work was supported by the project QG.15.29 from Vietnam National University, Hanoi (VNU).

Author information

Authors and Affiliations

University of Information and Communication Technology, Thai Nguyen University, Thai Nguyen, Vietnam
Thi-Lan Ngo
University of Engineering and Technology, Vietnam National University, Hanoi, Vietnam
Thi-Lan Ngo, Van-Hop Nguyen, Thi-Hai-Yen Vuong, Thac-Thong Nguyen, Thi-Thua Nguyen, Bao-Son Pham & Xuan-Hieu Phan

Corresponding author

Correspondence to Thi-Lan Ngo.

Editor information

Editors and Affiliations

Wrocław University of Technology, Wrocław, Poland
Ngoc Thanh Nguyen
Wrocław University of Technology, Wrocław, Poland
Bogdan Trawiński
Iwate Prefectural University, Takizawa, Japan
Hamido Fujita
National University of Kaohsiung, Kaohsiung, Taiwan
Tzung-Pei Hong

About this paper

Cite this paper

Ngo, TL. et al. (2016). Identifying User Intents in Vietnamese Spoken Language Commands and Its Application in Smart Mobile Voice Interaction. In: Nguyen, N.T., Trawiński, B., Fujita, H., Hong, TP. (eds) Intelligent Information and Database Systems. ACIIDS 2016. Lecture Notes in Computer Science, vol 9621. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49381-6_19

DOI: https://doi.org/10.1007/978-3-662-49381-6_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-49380-9
Online ISBN: 978-3-662-49381-6
eBook Packages: Computer Science (R0)
