Surface Realisation Using Factored Language Models and Input Seed Features

  • Conference paper
Advances in Computational Intelligence (MICAI 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10633)

Abstract

The field of Natural Language Generation needs to move towards flexible and adaptive techniques capable of producing language automatically for any domain, language and purpose. In light of this, this paper studies the appropriateness of factored language models for the surface realisation stage, presenting an almost fully language-independent statistical approach. Its main novelty is that it can be adapted to generate texts for different purposes or domains through an input seed feature that guides the whole generation process. In this research, the seed input is a phoneme, and the goal is to generate a complete, meaningful sentence that maximises the number of words containing that phoneme. We experimented with different factors, including lemmas and part-of-speech tags, based on a trigram language model. Across several configurations of the proposed approach, the total number of meaningful generated sentences improved by 47% for English and 40% for Spanish with respect to traditional language models.
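To make the idea of seed-guided generation concrete, the following is a minimal sketch and not the authors' implementation. It assumes a plain word-level trigram model (no lemma or part-of-speech factors), treats a letter match as a crude stand-in for "the word contains the seed phoneme", and uses a hypothetical multiplicative `boost` to favour seed-bearing words during greedy decoding. The function names and the toy corpus are illustrative only.

```python
from collections import defaultdict

START, END = "<s>", "</s>"

def train_trigrams(sentences):
    """Count trigram continuations: (w1, w2) -> {w3: count}."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        tokens = [START, START] + sentence.split() + [END]
        for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
            counts[(w1, w2)][w3] += 1
    return counts

def generate(counts, seed, max_len=10, boost=3.0):
    """Greedy decoding that prefers words containing the seed.

    Assumption: `seed in word` is a letter-level proxy for a phoneme match,
    and `boost` is a hypothetical weight, not a value from the paper.
    """
    context = (START, START)
    words = []
    for _ in range(max_len):
        candidates = counts.get(context)
        if not candidates:
            break
        total = sum(candidates.values())

        def score(word):
            prob = candidates[word] / total  # relative trigram frequency
            bonus = boost if (word != END and seed in word) else 1.0
            return prob * bonus

        # sorted() makes tie-breaking deterministic (alphabetical order)
        best = max(sorted(candidates), key=score)
        if best == END:
            break
        words.append(best)
        context = (context[1], best)
    return words

corpus = ["a cat ran home", "a cat ran home", "a cat sat still"]
model = train_trigrams(corpus)
print(" ".join(generate(model, "s", boost=1.0)))  # no seed preference
print(" ".join(generate(model, "s", boost=3.0)))  # seed-guided
```

With `boost=1.0` the sketch behaves like an ordinary greedy trigram generator; raising `boost` shifts the output towards sentences with more seed-bearing words. A factored model would additionally condition on features such as lemmas or part-of-speech tags, which this sketch omits.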

Notes

  1. http://www.dyslexia-reading-well.com/support-files/the-44-phonemes-of-english.pdf

Acknowledgment

This research has been partially funded by the Generalitat Valenciana through the project “DIIM2.0: Desarrollo de técnicas Inteligentes e Interactivas de Minería y generación de información sobre la web 2.0” (PROMETEOII/2014/001), by the Spanish Government through projects TIN2015-65100-R and TIN2015-65136-C2-2-R, and by the project “Análisis de Sentimientos Aplicado a la Prevención del Suicidio en las Redes Sociales (ASAP)”, funded by Ayudas Fundación BBVA a equipos de investigación científica.

Author information

Correspondence to Cristina Barros.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Barros, C., Lloret, E. (2018). Surface Realisation Using Factored Language Models and Input Seed Features. In: Castro, F., Miranda-Jiménez, S., González-Mendoza, M. (eds) Advances in Computational Intelligence. MICAI 2017. Lecture Notes in Computer Science, vol 10633. Springer, Cham. https://doi.org/10.1007/978-3-030-02840-4_2

  • DOI: https://doi.org/10.1007/978-3-030-02840-4_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-02839-8

  • Online ISBN: 978-3-030-02840-4

  • eBook Packages: Computer Science, Computer Science (R0)
