A Template-Based Approach for Generating Vietnamese References from Flat MR Dataset in Restaurant Domain

Nguyen, Dang Tuan; Tran, Trung

doi:10.1007/978-981-33-4370-2_17

Dang Tuan Nguyen & Trung Tran

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1306)

Included in the following conference series:

International Conference on Future Data and Security Engineering


Abstract

In recent years, research in natural language generation (NLG) has focused on corpus-based systems, either for specific domains or across domains. Training data for such systems consists of meaning representations (MRs) paired with natural language (NL) references. In the first part of this article, we introduce a Vietnamese Flat MR dataset, the first Vietnamese dataset for training end-to-end, data-driven NLG systems in the restaurant domain. We then establish a method for generating references on this dataset. The core of the method consists of two stages: (i) sentence planning, which determines the semantic template of the output text; and (ii) surface realization, which selects appropriate Vietnamese phrases to replace the corresponding predicates (slot-value pairs) of the Flat MR in the semantic template. The evaluation results show that the dataset and the proposed generation method contribute to the development of this NLG research direction.
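The two stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bracketed MR syntax, the slot names, the templates, and the Vietnamese phrases are all hypothetical, chosen only to show how a template is first selected from the slots present (sentence planning) and then filled with phrases looked up per slot-value pair (surface realization).

```python
import re

# Hypothetical phrase lexicon: maps a (slot, value) predicate to a Vietnamese phrase.
# These entries are illustrative only and are not taken from the paper's dataset.
LEXICON = {
    ("food", "Vietnamese"): "món Việt",
    ("area", "city centre"): "ở trung tâm thành phố",
}

# Hypothetical semantic templates, keyed by the set of slots present in the MR.
TEMPLATES = {
    frozenset({"name", "food", "area"}): "{name} là nhà hàng phục vụ {food} {area}.",
    frozenset({"name", "food"}): "{name} là nhà hàng phục vụ {food}.",
}


def parse_flat_mr(mr):
    """Parse an assumed 'slot[value], slot[value]' string into a dict."""
    return dict(re.findall(r"(\w+)\[([^\]]*)\]", mr))


def plan_sentence(slots):
    """Sentence planning: choose the template that matches the slots present."""
    return TEMPLATES[frozenset(slots)]


def realize(template, slots):
    """Surface realization: replace each slot with a phrase for its value.

    Values without a lexicon entry (for example, restaurant names) are copied as-is.
    """
    phrases = {slot: LEXICON.get((slot, value), value) for slot, value in slots.items()}
    return template.format(**phrases)


def generate(mr):
    """Run both stages on one flat MR string."""
    slots = parse_flat_mr(mr)
    return realize(plan_sentence(slots), slots)
```

Calling `generate("name[Phở 24], food[Vietnamese], area[city centre]")` would select the three-slot template and fill it from the lexicon. Keying templates by slot set is one simple design choice; a real system would need a fallback for slot combinations that have no template.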



Author information

Authors and Affiliations

Sai Gon University, Ho Chi Minh City, Vietnam
Dang Tuan Nguyen & Trung Tran


Corresponding author

Correspondence to Dang Tuan Nguyen.

Editor information

Editors and Affiliations

Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam
Tran Khanh Dang
Johannes Kepler University of Linz, Linz, Austria
Josef Küng
Hosei University, Tokyo, Japan
Makoto Takizawa
Sungkyunkwan University, Suwon, Korea (Republic of)
Tai M. Chung


About this paper

Cite this paper

Nguyen, D.T., Tran, T. (2020). A Template-Based Approach for Generating Vietnamese References from Flat MR Dataset in Restaurant Domain. In: Dang, T.K., Küng, J., Takizawa, M., Chung, T.M. (eds) Future Data and Security Engineering. Big Data, Security and Privacy, Smart City and Industry 4.0 Applications. FDSE 2020. Communications in Computer and Information Science, vol 1306. Springer, Singapore. https://doi.org/10.1007/978-981-33-4370-2_17


DOI: https://doi.org/10.1007/978-981-33-4370-2_17
Published: 19 November 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-33-4369-6
Online ISBN: 978-981-33-4370-2
eBook Packages: Computer Science, Computer Science (R0)
