Surface Realisation Using Factored Language Models and Input Seed Features

Barros, Cristina; Lloret, Elena

doi:10.1007/978-3-030-02840-4_2

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10633)

Included in the following conference series:

Mexican International Conference on Artificial Intelligence

Abstract

The Natural Language Generation research field needs to move towards the design and development of flexible and adaptive techniques capable of producing language automatically for any domain, language and purpose. In light of this, the aim of this paper is to study the appropriateness of factored language models for the surface realisation stage, and thereby to present an almost fully language-independent statistical approach. Its main novelty is that it can be adapted to generate texts for different purposes or domains thanks to an input seed feature that guides the whole generation process. In the context of this research, the seed input is a phoneme, and the goal is to generate a complete, meaningful sentence that maximises the number of words containing that phoneme. We experimented with different factors, including lemmas and part-of-speech tags, based on a trigram language model. The analysis carried out with several configurations of the proposed approach showed an improvement of 47% for English and 40% for Spanish in the total number of meaningful generated sentences, with respect to traditional language models.
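To make the idea of seed-guided generation concrete, the following is a minimal sketch under strong simplifying assumptions, and not the authors' implementation. It uses a plain word trigram model rather than a factored one (no lemma or part-of-speech factors), it treats a letter as a stand-in for a phoneme, and it decodes greedily. The toy corpus, the function names `train_trigrams` and `generate`, and the `bonus` weight are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def train_trigrams(sentences):
    """Count trigram continuations: (w1, w2) -> Counter of next words."""
    counts = defaultdict(Counter)
    for sent in sentences:
        tokens = ["<s>", "<s>"] + sent.split() + ["</s>"]
        for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
            counts[(a, b)][c] += 1
    return counts


def generate(counts, seed, bonus=2.0, max_len=12):
    """Greedy decoding that favours words containing the seed.

    Assumption: `seed` is a letter used as a crude proxy for a phoneme,
    and `bonus` is a hypothetical multiplier applied to the trigram
    probability of any candidate word that contains the seed.
    """
    context = ("<s>", "<s>")
    words = []
    while len(words) < max_len:
        candidates = counts.get(context)
        if not candidates:
            break
        total = sum(candidates.values())

        def score(word):
            prob = candidates[word] / total
            return prob * (bonus if seed in word else 1.0)

        # Sorting first makes tie-breaking deterministic (alphabetical).
        best = max(sorted(candidates), key=score)
        if best == "</s>":
            break
        words.append(best)
        context = (context[1], best)
    return words


# Toy corpus, invented for this example.
corpus = [
    "she sells sea shells",
    "she sees the shore",
    "he walks home",
    "he walks home",
]
counts = train_trigrams(corpus)

print(generate(counts, "s"))             # ['she', 'sees', 'the', 'shore']
print(generate(counts, "s", bonus=1.0))  # ['he', 'walks', 'home']
```

With the bonus switched off (`bonus=1.0`), the decoder follows the trigram counts alone; with it switched on, the seed steers the output towards a sentence whose words contain the seed. A factored model would additionally condition on factors such as lemmas or part-of-speech tags, which this sketch omits.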

Acknowledgment

This research work has been partially funded by the Generalitat Valenciana through the project “DIIM2.0: Desarrollo de técnicas Inteligentes e Interactivas de Minería y generación de información sobre la web 2.0” (PROMETEOII/2014/001), and partially funded by the Spanish Government through projects TIN2015-65100-R and TIN2015-65136-C2-2-R, as well as by the project “Análisis de Sentimientos Aplicado a la Prevención del Suicidio en las Redes Sociales (ASAP)” funded by Ayudas Fundación BBVA a equipos de investigación científica.

Author information

Authors and Affiliations

Department of Software and Computing Systems, University of Alicante, Apdo. de Correos 99, 03080, Alicante, Spain
Cristina Barros & Elena Lloret

Corresponding author

Correspondence to Cristina Barros.

Editor information

Editors and Affiliations

Universidad Autónoma del Estado de Hidalgo, Pachuca, Mexico
Félix Castro
INFOTEC Aguascalientes, Aguascalientes, Mexico
Sabino Miranda-Jiménez
Tecnológico de Monterrey, Atizapán de Zaragoza, Mexico
Miguel González-Mendoza

About this paper

Cite this paper

Barros, C., Lloret, E. (2018). Surface Realisation Using Factored Language Models and Input Seed Features. In: Castro, F., Miranda-Jiménez, S., González-Mendoza, M. (eds) Advances in Computational Intelligence. MICAI 2017. Lecture Notes in Computer Science, vol 10633. Springer, Cham. https://doi.org/10.1007/978-3-030-02840-4_2

DOI: https://doi.org/10.1007/978-3-030-02840-4_2
Published: 01 January 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02839-8
Online ISBN: 978-3-030-02840-4
eBook Packages: Computer Science, Computer Science (R0)