Skip to main content

SemCluster: Unsupervised Automatic Keyphrase Extraction Using Affinity Propagation

  • Conference paper
  • First Online:
Book cover Advances in Computational Intelligence Systems (UKCI 2017)

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 650))

Included in the following conference series:

Abstract

Keyphrases provide important semantic metadata for organizing and managing free-text documents. As data grow exponentially, there is a pressing demand for automatic and efficient keyphrase extraction methods. We introduce in this paper SemCluster, a clustering-based unsupervised keyphrase extraction method. By integrating an internal ontology (i.e., WordNet) with external knowledge sources, SemCluster identifies and extracts semantically important terms from a given document, clusters the terms, and, using the clustering results as heuristics, identifies the most representative phrases and singles them out as keyphrases. SemCluster is evaluated against two baseline unsupervised methods, TextRank and KeyCluster, over the Inspec dataset under an F1-measure metric. The evaluation results clearly show that SemCluster outperforms both methods.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 129.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 169.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    http://www.dpedia.org.

  2. 2.

    http://yago-knowledge.org.

  3. 3.

    DBpedia schema is available at http://mappings.dbpedia.org/server/ontology/classes/.

  4. 4.

    The alignment results are available at http://www.pdlab.io/semcluster/inspec.rar.

  5. 5.

    The lookup code is available at https://github.com/dbpedia/lookup.

  6. 6.

    ESA code is available at http://treo.deri.ie/easyesa/.

  7. 7.

    http://opennlp.apache.org.

  8. 8.

    Wordnet v.3.1 is available at https://wordnet.princeton.edu/wordnet/download.

  9. 9.

    Hulth’s dataset copy is available at http://pdllab.io/semcluster/inspec.rar.

References

  1. Turney, P.D.: Learning algorithms for keyphrase extraction. Inf. Retr. 2(4), 303–336 (2000)

    Article  Google Scholar 

  2. Hasan, K.S., Ng, V.: Automatic keyphrase extraction: a survey of the state of the art. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 1262–1273 (2014)

    Google Scholar 

  3. Washio, T., Motoda, H.: State of the art of graph-based data mining. ACM SIGKDD Explor. Newsl. 5(1), 59–68 (2003)

    Article  Google Scholar 

  4. Sonowane, S.S., Kulkarni, P.A.: Graph based representation and analysis of text document: a survey of techniques. Int. J. Comput. Appl. 96, 1–8 (2014)

    Google Scholar 

  5. Mihalcea, R., Tarau, P.: TextRank: bringing order into texts. Association for Computational Linguistics (2004)

    Google Scholar 

  6. Liu, Z., Li, P., Zheng, Y., Sun, M.: Clustering to find exemplar terms for keyphrase extraction. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pp. 257–266 (2009)

    Google Scholar 

  7. Steier, A.M., Belew, R.K.: Exporting phrases: a statistical analysis of topical language. In: Second Symposium on Document Analysis and Information Retrieval, pp. 179–190 (1993)

    Google Scholar 

  8. Barker, K., Cornacchia, N.: Using noun phrase heads to extract document keyphrases. In: Conference of the Canadian Society for Computational Studies of Intelligence, pp. 40–52. Springer (2000)

    Google Scholar 

  9. Litvak, M., Last, M.: Graph-based keyword extraction for single-document summarization. In: Proceedings of the Workshop on Multi-source Multilingual Information Extraction and Summarization, pp. 17–24 (2008)

    Google Scholar 

  10. Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: bringing order to the web. Technical Report, Stanford InfoLab (1999)

    Google Scholar 

  11. Tsatsaronis, G., Varlamis, I., Nrvg, K.: SemanticRank: ranking keywords and sentences using semantic graphs. In: Proceedings of the 23rd International Conference on Computational Linguistics, pp. 1074–1082 (1999)

    Google Scholar 

  12. Bracewell, D.B., Ren, F., Kuriowa, S.: Multilingual single document keyword extraction for information retrieval. In: Proceedings of 2005 IEEE International Conference on Natural Language Processing and Knowledge Engineering, pp. 517–522 (2005)

    Google Scholar 

  13. Hulth, A.: Improved automatic keyword extraction given more linguistic knowledge. In: Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, pp. 216–223. Association for Computational Linguistics (2003)

    Google Scholar 

  14. Miller, G.A., Beckwith, R., Fellbaum, C., Gross, D., Miller, K.J.: Introduction to WordNet: an on-line lexical database. Int. J. Lexicogr. 3(4), 235–244 (1990)

    Article  Google Scholar 

  15. Alrehamy, H., Walker, C.: Personal data lake with data gravity pull. In: Proceedings of the 2015 IEEE Fifth International Conference on Big Data and Cloud Computing, pp. 160–167 (2015)

    Google Scholar 

  16. Navigli, R.: Word sense disambiguation: a survey. ACM Comput. Surv. 41(2) (2009)

    Google Scholar 

  17. Patwardhan, S., Banerjee, S., Pedersen, T.: SenseRelate::TargetWord: a generalized framework for word sense disambiguation. In Proceedings of the ACL 2005 on Interactive Poster and Demonstration Sessions, pp. 73–76. Association for Computational Linguistics (2005)

    Google Scholar 

  18. Meng, L., Huang, R., Gu, J.: A review of semantic similarity measures in WordNet. Int. J. Hybrid Inf. Technol. 6(1), 1–12 (2013)

    Google Scholar 

  19. Wu, Z., Palmer, M.: Verbs semantics and lexical selection. In: Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics, pp. 133–138. Association for Computational Linguistics (1994)

    Google Scholar 

  20. Frey, B.J., Dueck, D.: Clustering by passing messages between data points. Science 315(5814), 972–976 (2007)

    Article  MathSciNet  MATH  Google Scholar 

  21. Liu, F., Pennell, D., Liu, F., Liu, Y.: Unsupervised approaches for automatic keyword extraction using meeting transcripts. In: Proceedings of Human Language Technologies, pp. 620–628. Association for Computational Linguistics (2009)

    Google Scholar 

  22. Gabrilovich, E., Markovitch, S.: Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In: Proceedings of the 20th International Joint Conference on Artificial Intelligence, pp. 1606–1611 (2007)

    Google Scholar 

  23. Witten, I., Milne, D.: An effective, low-cost measure of semantic relatedness obtained from Wikipedia links. In: Proceeding of AAAI Workshop on Wikipedia and Artificial Intelligence: An Evolving Synergy, pp. 25–30. AAAI Press, Chicago (2008)

    Google Scholar 

  24. Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13(2), 281–305 (2012)

    MathSciNet  MATH  Google Scholar 

  25. Jiang, J.J., Conrath, D.W.: Semantic similarity based on corpus statistics and lexical taxonomy. arXiv preprint http://arxiv.org/abs/cmp-lg/9709008 (1997)

  26. Moro, A., Cecconi, F., Navigli, R.: Multilingual word sense disambiguation and entity linking for everybody. In: Proceedings of the 2014 International Conference on Posters and Demonstrations, pp. 25–28. CEUR-WS.org (2014)

    Google Scholar 

  27. Guan, R., Shi, X., Marchese, M., Yang, C., Liang, Y.: Text clustering with seeds affinity propagation. IEEE Trans. Knowl. Data Eng. 23(4), 627–637 (2011)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Coral Walker .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Alrehamy, H.H., Walker, C. (2018). SemCluster: Unsupervised Automatic Keyphrase Extraction Using Affinity Propagation. In: Chao, F., Schockaert, S., Zhang, Q. (eds) Advances in Computational Intelligence Systems. UKCI 2017. Advances in Intelligent Systems and Computing, vol 650. Springer, Cham. https://doi.org/10.1007/978-3-319-66939-7_19

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-66939-7_19

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66938-0

  • Online ISBN: 978-3-319-66939-7

  • eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics