Abstract
In recent years, research in natural language generation (NLG) has focused on corpus-based systems, either for a specific domain or across domains. Training data for such systems consists of meaning representations (MRs) paired with natural language (NL) references. In the first part of this article, we introduce a Vietnamese Flat MR dataset, the first Vietnamese dataset for training end-to-end, data-driven NLG systems in the restaurant domain. We then establish a method for generating references on this dataset. The method has two core stages: (i) sentence planning, which determines the semantic template of the output text; and (ii) surface realization, which selects appropriate Vietnamese phrases to replace the corresponding predicates (slot-value pairs) of the Flat MR in the semantic template. The evaluation results show that the dataset and the proposed generation method contribute to the development of this NLG research direction.
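The two stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the bracketed `slot[value]` MR syntax (borrowed from the E2E style), the slot names, the `TEMPLATES` and `LEXICON` tables, the Vietnamese phrases, and all function names are assumptions made for illustration only.

```python
import re

# Hypothetical semantic templates, keyed by the set of slots present in the MR.
# Stage (i), sentence planning, picks one of these.
TEMPLATES = {
    frozenset({"name", "food"}): "{name} là nhà hàng {food}.",
    frozenset({"name", "food", "priceRange"}): "{name} là nhà hàng {food} {priceRange}.",
}

# Hypothetical lexicon mapping (slot, value) predicates to Vietnamese phrases.
# Stage (ii), surface realization, looks phrases up here.
LEXICON = {
    ("food", "Italian"): "phục vụ món Ý",
    ("priceRange", "cheap"): "với giá rẻ",
}


def parse_flat_mr(mr):
    """Parse 'slot[value], slot[value]' into a dict (assumed MR syntax)."""
    pairs = re.findall(r"([^,\[\]]+)\[([^\]]*)\]", mr)
    return {slot.strip(): value.strip() for slot, value in pairs}


def plan_sentence(slots):
    """Stage (i): choose the semantic template matching the slots in the MR."""
    key = frozenset(slots)
    if key not in TEMPLATES:
        raise KeyError(f"no template for slots {sorted(key)}")
    return TEMPLATES[key]


def realize(template, slots):
    """Stage (ii): replace each slot-value predicate with a Vietnamese phrase.

    Values without a lexicon entry (e.g. proper names) are copied verbatim.
    """
    phrases = {s: LEXICON.get((s, v), v) for s, v in slots.items()}
    return template.format(**phrases)


def generate(mr):
    slots = parse_flat_mr(mr)
    return realize(plan_sentence(slots), slots)


if __name__ == "__main__":
    print(generate("name[Hoa Sen], food[Italian], priceRange[cheap]"))
```

Keying the templates on the set of slots keeps the planning step a simple lookup. A fuller system would likely hold several templates per slot set and several phrases per predicate, so that one MR can yield multiple references.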
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Nguyen, D.T., Tran, T. (2020). A Template-Based Approach for Generating Vietnamese References from Flat MR Dataset in Restaurant Domain. In: Dang, T.K., Küng, J., Takizawa, M., Chung, T.M. (eds) Future Data and Security Engineering. Big Data, Security and Privacy, Smart City and Industry 4.0 Applications. FDSE 2020. Communications in Computer and Information Science, vol 1306. Springer, Singapore. https://doi.org/10.1007/978-981-33-4370-2_17
DOI: https://doi.org/10.1007/978-981-33-4370-2_17
Publisher Name: Springer, Singapore
Print ISBN: 978-981-33-4369-6
Online ISBN: 978-981-33-4370-2
eBook Packages: Computer Science, Computer Science (R0)