A Comparison of Graph-Based and Statistical Metrics for Learning Domain Keywords

Kouznetsov, Alexandre; Zouaq, Amal

doi:10.1007/978-3-319-13332-4_21

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8863)

Included in the following conference series:

Pacific Rim Knowledge Acquisition Workshop

Abstract

In this paper, we present a comparison of unsupervised and supervised methods for key-phrase extraction from a domain corpus. The unsupervised methods we evaluated use individual statistical measures and graph-based measures, while the supervised methods apply machine learning models that combine these statistical and graph-based measures. The graph-based measures are applied to a graph that connects terms and compound expressions through conceptual relations and represents a whole corpus about a domain, rather than a single document. Using three datasets from different domains, we observed that supervised methods outperform unsupervised ones. We also found that the graph-based measures Degree and Reachability outperform the standard baseline TF-IDF and the other graph-based measures in the majority of cases, while the co-occurrence-based measure Pointwise Mutual Information outperforms all the other metrics, including the graph-based measures, when taken individually.
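
To make the individual measures named in the abstract more concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not define Degree or Reachability, so the sketch assumes common readings: degree as the number of edges touching a term, and reachability as the number of other terms that can be reached by following directed edges. PMI is computed with the usual log-ratio of joint to independent probabilities. The toy graph, the counts, and all function names are hypothetical.

```python
import math
from collections import defaultdict, deque

# Hypothetical corpus-level term graph: directed (source, target) relations.
EDGES = [
    ("neural network", "layer"),
    ("neural network", "training"),
    ("training", "gradient"),
    ("layer", "gradient"),
    ("dataset", "training"),
]


def build_out_adjacency(edges):
    """Map each source term to the set of terms it points to."""
    out_adj = defaultdict(set)
    for src, dst in edges:
        out_adj[src].add(dst)
    return out_adj


def degree(node, edges):
    """Assumed definition: number of edges touching the node (in plus out)."""
    return sum(1 for src, dst in edges if node in (src, dst))


def reachability(node, out_adj):
    """Assumed definition: count of other nodes reachable via directed edges."""
    seen = {node}
    queue = deque([node])
    while queue:  # breadth-first traversal from the start node
        current = queue.popleft()
        for nxt in out_adj.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) - 1  # exclude the start node itself


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information: log2( p(x, y) / (p(x) * p(y)) )."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))


if __name__ == "__main__":
    out_adj = build_out_adjacency(EDGES)
    for term in ("neural network", "training", "gradient"):
        print(term, degree(term, EDGES), reachability(term, out_adj))
    # Hypothetical counts: the pair co-occurs 10 times among 1000 windows.
    print(round(pmi(10, 20, 50, 1000), 3))
```

Ranking candidate terms by any one of these scores gives an unsupervised extractor of the kind the abstract compares; the supervised variants would instead feed several such scores to a classifier as features.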

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, Royal Military College of Canada, CP 17000, Succursale Forces, Kingston, Canada, K7K 7B4
Alexandre Kouznetsov & Amal Zouaq

Editor information

Editors and Affiliations

University of Tasmania, Private Bag 87, 7001, Hobart, Tasmania, Australia
Yang Sok Kim & Byeong Ho Kang
Department of Computing, Faculty of Science, Macquarie University, 2109, Sydney, Australia
Deborah Richards

About this paper

Cite this paper

Kouznetsov, A., Zouaq, A. (2014). A Comparison of Graph-Based and Statistical Metrics for Learning Domain Keywords. In: Kim, Y.S., Kang, B.H., Richards, D. (eds) Knowledge Management and Acquisition for Smart Systems and Services. PKAW 2014. Lecture Notes in Computer Science, vol 8863. Springer, Cham. https://doi.org/10.1007/978-3-319-13332-4_21

DOI: https://doi.org/10.1007/978-3-319-13332-4_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13331-7
Online ISBN: 978-3-319-13332-4
eBook Packages: Computer Science, Computer Science (R0)
