The Role of Linked Data in Content Selection

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8862)

Abstract

This paper explores whether Linked Data is an appropriate knowledge source for content selection. Content selection is a crucial subtask in Natural Language Generation that determines which contents of a knowledge source are relevant to a given communicative goal. The online era has made it possible to accumulate extensive amounts of generic knowledge, some of which has been made available as structured knowledge sources for computational natural language processing. This paper proposes a content selection model that uses a generic structured knowledge source, DBpedia, which structurally replicates its unstructured counterpart, Wikipedia. The proposed model uses log likelihood to rank contents from DBpedia Linked Data by their relevance to a communicative goal. We performed experiments using DBpedia as the Linked Data resource and two keyword datasets as communicative goals: keywords extracted from the QALD-2 training dataset were used to optimize parameters, and the QALD-2 test dataset was used for testing. The results were evaluated against a verbatim-based selection strategy and showed that our model performs 18.03% better than verbatim selection.
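The abstract says that DBpedia contents are ranked by log likelihood but does not give the exact formulation. The sketch below assumes a corpus-comparison log-likelihood (G²) statistic of the kind used in keyword and term extraction: a term's frequency in a goal-related corpus is compared against its frequency in a general reference corpus, and terms that are over-represented in the goal corpus receive positive scores. How the goal and reference corpora are built, the scoring of a triple as the sum of its token scores, and the helper names `term_scores`, `rank_triples` and `verbatim_select` are all hypothetical illustrations, not the authors' method. Triples are assumed to have already been reduced to plain-text labels (e.g. `dbo:length` to `length`).

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """G2 log-likelihood for a term seen a times in a target corpus of c tokens
    and b times in a reference corpus of d tokens (c, d > 0, a + b > 0)."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:  # 0 * log(0) is taken as 0
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2.0 * ll


def term_scores(goal_tokens, reference_tokens):
    """Hypothetical helper: score each goal-corpus term; terms that are not
    relatively more frequent in the goal corpus score 0."""
    goal, ref = Counter(goal_tokens), Counter(reference_tokens)
    c, d = len(goal_tokens), len(reference_tokens)
    scores = {}
    for term, a in goal.items():
        b = ref[term]
        scores[term] = log_likelihood(a, b, c, d) if a / c > b / d else 0.0
    return scores


def _tokens(triple):
    # Assumption: only the predicate and object labels carry content words.
    _, predicate, obj = triple
    return (predicate + " " + obj).lower().split()


def rank_triples(triples, scores):
    """Hypothetical helper: rank (subject, predicate, object) triples by the
    summed scores of their tokens, most relevant first."""
    return sorted(triples, key=lambda t: sum(scores.get(w, 0.0) for w in _tokens(t)),
                  reverse=True)


def verbatim_select(triples, keywords):
    """Hypothetical verbatim baseline: keep triples whose labels contain a
    keyword as an exact (case-insensitive) token."""
    kws = {k.lower() for k in keywords}
    return [t for t in triples if kws & set(_tokens(t))]


if __name__ == "__main__":
    goal = "river length river mouth".split()
    reference = "population city river country population area city leader".split()
    triples = [("Nile", "country", "Egypt"),
               ("Nile", "length", "6650"),
               ("Nile", "mouth", "Mediterranean Sea")]
    print(rank_triples(triples, term_scores(goal, reference)))
    print(verbatim_select(triples, ["length"]))
```

In this toy run the `country` triple ranks last because none of its tokens are over-represented in the goal corpus, whereas the verbatim baseline only recovers the triple that literally contains the keyword.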

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Perera, R., Nand, P. (2014). The Role of Linked Data in Content Selection. In: Pham, DN., Park, SB. (eds) PRICAI 2014: Trends in Artificial Intelligence. PRICAI 2014. Lecture Notes in Computer Science, vol 8862. Springer, Cham. https://doi.org/10.1007/978-3-319-13560-1_46

  • DOI: https://doi.org/10.1007/978-3-319-13560-1_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13559-5

  • Online ISBN: 978-3-319-13560-1

  • eBook Packages: Computer Science, Computer Science (R0)
