Unsupervised Extraction of Conceptual Keyphrases from Abstracts

Ludwig, Philipp; Thiel, Marcus; Nürnberger, Andreas

doi:10.1007/978-3-319-53640-8_4

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10151)

Included in the following conference series:

International KEYSTONE Conference on Semantic Keyword-Based Search on Structured Data Sources

Abstract

The extraction of meaningful keyphrases is important for a variety of applications, such as recommender systems, literature browsing, and automatic document categorization. Since this task is not trivial, a large number of approaches have been introduced, either focusing on single aspects of the process or exploiting the characteristics of a certain type of document. Precise keyphrases are especially helpful when supporting the user in grasping the topics of a document (e.g. in the display of search results). However, in such situations usually only the abstract or a short excerpt is available, which most approaches do not take into account. First, methods based on word frequency are not appropriate in this case, since short texts do not provide sufficient word statistics for a frequency analysis. Second, many existing methods are supervised and therefore depend on domain knowledge or manually annotated data, which are unavailable in many scenarios. We therefore present an unsupervised graph-based approach for extracting meaningful keyphrases from abstracts of scientific articles. We show that even though our method does not rely on manually annotated data or corpora, it works surprisingly well.
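
To make the idea of unsupervised, graph-based keyphrase ranking concrete, the following is a minimal sketch and not the method of this paper, whose details are not given in the abstract. It assumes a generic scheme: candidate phrases are maximal runs of non-stopwords, words are linked when they are adjacent inside a candidate, and each candidate is scored by the summed degree centrality of its words. The stopword list and all function names (tokenize, candidate_phrases, cooccurrence_graph, degree_centrality, rank_keyphrases) are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real systems use fuller lists).
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "not", "of", "on", "or", "such", "that", "the", "this",
    "to", "we", "which", "with",
}


def tokenize(text):
    """Lowercase the text; return words (hyphenated words stay whole) and
    every other non-space character as a separate single-character token."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*|[^\sa-z]", text.lower())


def candidate_phrases(tokens):
    """Split the token stream at stopwords and non-word tokens.
    Each maximal run of remaining words is one candidate phrase."""
    phrases, current = [], []
    for tok in tokens:
        is_word = "a" <= tok[0] <= "z"
        if not is_word or tok in STOPWORDS:
            if current:
                phrases.append(tuple(current))
            current = []
        else:
            current.append(tok)
    if current:
        phrases.append(tuple(current))
    return phrases


def cooccurrence_graph(phrases):
    """Undirected graph: two words are linked if adjacent in a candidate."""
    graph = defaultdict(set)
    for phrase in phrases:
        for word in phrase:
            graph[word]  # touching the key registers isolated words as nodes
        for left, right in zip(phrase, phrase[1:]):
            if left != right:
                graph[left].add(right)
                graph[right].add(left)
    return graph


def degree_centrality(graph):
    """Node degree divided by (n - 1), the usual normalization."""
    n = len(graph)
    if n <= 1:
        return {word: 0.0 for word in graph}
    return {word: len(nbrs) / (n - 1) for word, nbrs in graph.items()}


def rank_keyphrases(text, top_k=5):
    """Score each distinct candidate by the summed centrality of its words
    and return the top_k (phrase, score) pairs, best first."""
    phrases = candidate_phrases(tokenize(text))
    centrality = degree_centrality(cooccurrence_graph(phrases))
    scores = {" ".join(p): sum(centrality[w] for w in p) for p in set(phrases)}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

Calling rank_keyphrases on the text of an abstract returns a list of (phrase, score) pairs. Summing centralities favors longer candidates; averaging instead would favor short phrases built from well-connected words. Neither choice is claimed to match the approach presented in the paper, and other centrality measures (for example betweenness or closeness) could be substituted for degree.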

Acknowledgements

This research is supported by BMWi grants KF3358702KM4, KF2885203KM4.

Author information

Authors and Affiliations

Faculty of Computer Science, Otto von Guericke University Magdeburg, Magdeburg, Germany
Philipp Ludwig, Marcus Thiel & Andreas Nürnberger

Corresponding author

Correspondence to Marcus Thiel.

Editor information

Editors and Affiliations

Department of Computer Science and Information Systems, Birkbeck University of London, London, UK
Andrea Calì
Computer Science Department, Technical University of Cluj-Napoca, Cluj-Napoca, Romania
Dorian Gorgan
Computer and Decision Engineering (CoDE) Department, Université Libre de Bruxelles, Brussels, Belgium
Martín Ugarte

About this paper

Cite this paper

Ludwig, P., Thiel, M., Nürnberger, A. (2017). Unsupervised Extraction of Conceptual Keyphrases from Abstracts. In: Calì, A., Gorgan, D., Ugarte, M. (eds) Semantic Keyword-Based Search on Structured Data Sources. IKC 2016. Lecture Notes in Computer Science, vol 10151. Springer, Cham. https://doi.org/10.1007/978-3-319-53640-8_4

DOI: https://doi.org/10.1007/978-3-319-53640-8_4
Published: 15 February 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-53639-2
Online ISBN: 978-3-319-53640-8
eBook Packages: Computer Science, Computer Science (R0)