Unsupervised Extraction of Conceptual Keyphrases from Abstracts

  • Conference paper
  • Published in: Semantic Keyword-Based Search on Structured Data Sources (IKC 2016)

Abstract

The extraction of meaningful keyphrases is important for a variety of applications, such as recommender systems, literature browsing tools, and automatic document categorization. Because the task is not trivial, a large number of approaches have been introduced in the past, either focusing on single aspects of the process or exploiting the characteristics of a particular type of document. Precise keyphrases are especially helpful when supporting the user in grasping the topics of a document (e.g., in the display of search results). In such situations, however, usually only the abstract or a short excerpt is available, which most approaches do not take into account. First, methods based on word frequency are not appropriate in this case, since short texts do not provide sufficient word statistics for a frequency analysis. Second, many existing methods are supervised and therefore depend on domain knowledge or manually annotated data, which are not available in many scenarios. We therefore present an unsupervised graph-based approach for extracting meaningful keyphrases from abstracts of scientific articles. We show that even though our method does not rely on manually annotated data or corpora, it works surprisingly well.
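
To make the general idea of unsupervised, graph-based keyword ranking concrete, the following is a minimal sketch and not the method of this paper, whose details are not given in the abstract. It assumes a word co-occurrence graph built with a sliding window and ranks words by degree centrality. The stop-word list, the window size, and all function names are illustrative assumptions.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list (an assumption, not taken from the paper).
STOP_WORDS = {"a", "an", "and", "are", "for", "from", "in", "is",
              "of", "on", "the", "to", "we", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOP_WORDS]


def cooccurrence_graph(tokens, window=2):
    """Link each token to the tokens that follow it within `window` positions."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def rank_by_degree(graph):
    """Order words by number of distinct neighbours, ties broken alphabetically."""
    return sorted(graph, key=lambda w: (-len(graph[w]), w))


if __name__ == "__main__":
    sample = "We present an unsupervised graph-based approach for keyphrase extraction"
    print(rank_by_degree(cooccurrence_graph(tokenize(sample)))[:3])
```

No corpus-wide word statistics or annotated training data are needed here: the ranking depends only on which words appear near each other in the single short text, which is why graph-based schemes of this kind are a natural fit for abstracts.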

Acknowledgements

This research is supported by BMWi grants KF3358702KM4, KF2885203KM4.

Author information

Correspondence to Marcus Thiel.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Ludwig, P., Thiel, M., Nürnberger, A. (2017). Unsupervised Extraction of Conceptual Keyphrases from Abstracts. In: Calì, A., Gorgan, D., Ugarte, M. (eds) Semantic Keyword-Based Search on Structured Data Sources. IKC 2016. Lecture Notes in Computer Science, vol 10151. Springer, Cham. https://doi.org/10.1007/978-3-319-53640-8_4

  • DOI: https://doi.org/10.1007/978-3-319-53640-8_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-53639-2

  • Online ISBN: 978-3-319-53640-8

  • eBook Packages: Computer Science, Computer Science (R0)
