A Two-Level Keyphrase Extraction Approach

Ali, Chedi Bechikh; Wang, Rui; Haddad, Hatem

doi:10.1007/978-3-319-18117-2_29


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9042)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

In this paper, we present a new two-level approach to extracting keyphrases from textual documents. The approach relies on linguistic analysis to extract candidate keyphrases and on statistical analysis to rank and filter them into the final set of keyphrases. We evaluated the approach on three publicly available corpora whose documents vary in length, domain and language, including English and French. We obtained improvements in precision, recall and F-measure. Our results indicate that the approach is independent of document length, domain and language.
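
The two levels described above can be illustrated with a minimal sketch. The abstract does not specify the linguistic patterns or the statistical measure the authors use, so everything here is an assumption made for illustration: the input is already part-of-speech tagged, candidates are runs of adjectives and nouns that end in a noun, and ranking uses a smoothed TF-IDF score. The function names `candidates` and `rank` and the tag names are hypothetical.

```python
import math
from collections import Counter


def candidates(tagged):
    """Level 1 (linguistic, assumed pattern): runs of ADJ/NOUN tokens ending in a NOUN."""
    out, run = [], []
    # A sentinel token flushes the last run at the end of the document.
    for word, tag in list(tagged) + [("", "END")]:
        if tag in ("ADJ", "NOUN"):
            run.append((word.lower(), tag))
            continue
        # Trim trailing adjectives so each candidate ends in a noun.
        while run and run[-1][1] != "NOUN":
            run.pop()
        if run:
            out.append(" ".join(w for w, _ in run))
        run = []
    return out


def rank(docs_tagged, doc_index, top_k=3):
    """Level 2 (statistical, assumed measure): rank one document's candidates by TF-IDF."""
    cands_per_doc = [candidates(d) for d in docs_tagged]
    n_docs = len(cands_per_doc)
    # Document frequency: in how many documents each candidate occurs.
    df = Counter()
    for cands in cands_per_doc:
        df.update(set(cands))
    tf = Counter(cands_per_doc[doc_index])
    # Smoothed IDF keeps scores positive even for a one-document corpus.
    scores = {p: tf[p] * math.log((1 + n_docs) / df[p]) for p in tf}
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_k]


# Toy, hand-tagged documents (illustrative data, not from the paper's corpora).
DOC0 = [("Keyphrase", "NOUN"), ("extraction", "NOUN"), ("uses", "VERB"),
        ("linguistic", "ADJ"), ("analysis", "NOUN"), ("and", "CONJ"),
        ("keyphrase", "NOUN"), ("extraction", "NOUN")]
DOC1 = [("Statistical", "ADJ"), ("analysis", "NOUN"), ("ranks", "VERB"),
        ("linguistic", "ADJ"), ("analysis", "NOUN")]

if __name__ == "__main__":
    print(rank([DOC0, DOC1], doc_index=0))
```

In this toy setup a phrase that is frequent in one document but rare across the collection rises to the top, which is the general intuition behind filtering candidates statistically. A real pipeline would take its tags from a part-of-speech tagger for the target language.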

Author information

Authors and Affiliations

ISG, Tunis University, Tunis, Tunisia
Chedi Bechikh Ali
LISI laboratory, INSAT, Carthage University, Tunis, Tunisia
Chedi Bechikh Ali
School of Computer Science and Software Engineering, The University of Western Australia, Crawley, Australia
Rui Wang
Department of Computer Engineering, Faculty of Engineering, Mevlana University, Konya, Turkey
Hatem Haddad

Corresponding author

Correspondence to Chedi Bechikh Ali.

Editor information

Editors and Affiliations

Centro de Investigación en Computación, Instituto Politécnico Nacional, Mexico DF, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Ali, C.B., Wang, R., Haddad, H. (2015). A Two-Level Keyphrase Extraction Approach. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science, vol 9042. Springer, Cham. https://doi.org/10.1007/978-3-319-18117-2_29

DOI: https://doi.org/10.1007/978-3-319-18117-2_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18116-5
Online ISBN: 978-3-319-18117-2
eBook Packages: Computer Science (R0)
