SemCluster: Unsupervised Automatic Keyphrase Extraction Using Affinity Propagation

Alrehamy, Hassan H.; Walker, Coral

doi:10.1007/978-3-319-66939-7_19

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 650)

Included in the following conference series:

UK Workshop on Computational Intelligence

Abstract

Keyphrases provide important semantic metadata for organizing and managing free-text documents. As data volumes grow exponentially, there is a pressing demand for automatic and efficient keyphrase extraction methods. This paper introduces SemCluster, a clustering-based unsupervised keyphrase extraction method. By integrating an internal ontology (i.e., WordNet) with external knowledge sources, SemCluster extracts semantically important terms from a given document, clusters them, and uses the clustering results as heuristics to select the most representative phrases as keyphrases. SemCluster is evaluated against two baseline unsupervised methods, TextRank and KeyCluster, on the Inspec dataset using the F1-measure. The results show that SemCluster outperforms both baselines.
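
For readers unfamiliar with the F1-measure named in the abstract, the following minimal sketch shows how precision, recall, and F1 can be computed for one document's extracted keyphrases against a gold-standard list. It is illustrative only: the function name `keyphrase_prf`, the exact-match criterion after lowercasing, and the sample phrases are assumptions made here, not details taken from the paper, which may normalize phrases differently (for example by stemming).

```python
def keyphrase_prf(extracted, gold):
    """Return (precision, recall, F1) for one document.

    Assumption: two phrases match when they are identical after
    trimming whitespace and lowercasing. This is a simplification
    chosen for the sketch, not the paper's stated matching rule.
    """
    extracted_set = {p.strip().lower() for p in extracted}
    gold_set = {p.strip().lower() for p in gold}
    if not extracted_set or not gold_set:
        return 0.0, 0.0, 0.0

    correct = len(extracted_set & gold_set)
    precision = correct / len(extracted_set)
    recall = correct / len(gold_set)
    if precision + recall == 0:
        return precision, recall, 0.0

    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical sample data, invented for illustration.
extracted = ["affinity propagation", "keyphrase extraction", "WordNet", "free text"]
gold = ["keyphrase extraction", "affinity propagation", "semantic similarity",
        "wordnet", "clustering"]

precision, recall, f1 = keyphrase_prf(extracted, gold)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

In this sample, three of the four extracted phrases appear in the five-phrase gold list, so precision is 3/4 and recall is 3/5. A corpus-level score would aggregate such counts over all documents; how the paper aggregates them is not stated in the abstract.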

Notes

1. http://www.dbpedia.org.
2. http://yago-knowledge.org.
3. DBpedia schema is available at http://mappings.dbpedia.org/server/ontology/classes/.
4. The alignment results are available at http://www.pdlab.io/semcluster/inspec.rar.
5. The lookup code is available at https://github.com/dbpedia/lookup.
6. ESA code is available at http://treo.deri.ie/easyesa/.
7. http://opennlp.apache.org.
8. WordNet v.3.1 is available at https://wordnet.princeton.edu/wordnet/download.
9. Hulth’s dataset copy is available at http://pdllab.io/semcluster/inspec.rar.

Author information

Authors and Affiliations

School of Computer Science and Informatics, Cardiff University, Cardiff, UK
Hassan H. Alrehamy & Coral Walker

Corresponding author

Correspondence to Coral Walker.

Editor information

Editors and Affiliations

Xiamen University, Xiamen Shi, Fujian, China
Fei Chao
Cardiff University, Cardiff, United Kingdom
Steven Schockaert
Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
Qingfu Zhang

About this paper

Cite this paper

Alrehamy, H.H., Walker, C. (2018). SemCluster: Unsupervised Automatic Keyphrase Extraction Using Affinity Propagation. In: Chao, F., Schockaert, S., Zhang, Q. (eds) Advances in Computational Intelligence Systems. UKCI 2017. Advances in Intelligent Systems and Computing, vol 650. Springer, Cham. https://doi.org/10.1007/978-3-319-66939-7_19

DOI: https://doi.org/10.1007/978-3-319-66939-7_19
Published: 05 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66938-0
Online ISBN: 978-3-319-66939-7
eBook Packages: Engineering, Engineering (R0)
