Automatic Extraction of Explicit and Implicit Keywords to Build Document Descriptors

  • Conference paper
Progress in Artificial Intelligence (EPIA 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8154)

Abstract

Keywords are single-word and multiword terms that describe the semantic content of documents. They are useful in many applications, such as document search and indexing, and as summaries for human readers. Keywords can be explicit, occurring in the document itself, or implicit: not written in the document but semantically related to its content. This paper presents a statistical approach to building document descriptors from explicit and implicit keywords extracted automatically from the documents. The approach is language-independent, and we show comparative results for three European languages.
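The abstract does not describe the statistics the authors use, so the following is only a toy sketch of the explicit/implicit distinction, not the paper's method. It assumes plain tf-idf for explicit keywords and simple cross-document co-occurrence for implicit ones; the function names `explicit_keywords` and `implicit_keywords` and the tiny corpus are hypothetical.

```python
from collections import Counter
import math


def explicit_keywords(docs, doc_id, top_n=3):
    """Rank terms that occur in docs[doc_id] by a plain tf-idf score.

    Assumption: tf-idf stands in for whatever metric the paper uses.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    tf = Counter(docs[doc_id])
    scores = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


def implicit_keywords(docs, doc_id, explicit, top_n=3):
    """Rank terms absent from docs[doc_id] by how often they co-occur,
    in other documents, with the explicit keywords of docs[doc_id].

    Assumption: raw co-occurrence counts stand in for a real
    semantic-relatedness measure.
    """
    present = set(docs[doc_id])
    explicit = set(explicit)
    counts = Counter()
    for i, doc in enumerate(docs):
        if i == doc_id:
            continue
        terms = set(doc)
        overlap = len(terms & explicit)
        if overlap:
            for t in terms - present:
                counts[t] += overlap
    return sorted(counts, key=lambda t: (-counts[t], t))[:top_n]


# Hypothetical toy corpus: each document is a list of already-tokenized terms.
docs = [
    ["solar", "panel", "energy"],
    ["solar", "energy", "photovoltaic"],
    ["panel", "photovoltaic", "solar"],
    ["football", "match", "goal"],
]
explicit = explicit_keywords(docs, 0, top_n=2)
implicit = implicit_keywords(docs, 0, explicit)
print(explicit, implicit)
```

For the first document of this toy corpus, `photovoltaic` is proposed as an implicit keyword: it never appears in that document, but it co-occurs elsewhere with the document's explicit keywords. Nothing in the sketch is language-specific, which matches the language-independence claimed in the abstract only in spirit.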

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ventura, J., Silva, J. (2013). Automatic Extraction of Explicit and Implicit Keywords to Build Document Descriptors. In: Correia, L., Reis, L.P., Cascalho, J. (eds) Progress in Artificial Intelligence. EPIA 2013. Lecture Notes in Computer Science, vol 8154. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40669-0_42

  • DOI: https://doi.org/10.1007/978-3-642-40669-0_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40668-3

  • Online ISBN: 978-3-642-40669-0

  • eBook Packages: Computer Science (R0)
