Abstract
In this paper, we present a new two-level approach for extracting KeyPhrases from textual documents. The approach relies on a linguistic analysis to extract candidate KeyPhrases and on a statistical analysis to rank the candidates and filter them down to the final KeyPhrases. We evaluated the approach on three publicly available corpora whose documents vary in length, domain and language, including English and French. We obtained improvements in Precision, Recall and F-measure. Our results indicate that the approach is independent of document length, domain and language.
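The two-level idea can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify the linguistic or statistical analysis, so this example assumes a small stopword list as a stand-in for candidate extraction and plain phrase frequency as a stand-in for ranking. All names (`STOPWORDS`, `extract_candidates`, `rank_candidates`, `precision_recall_f1`) are hypothetical. The last function shows how Precision, Recall and F-measure are conventionally computed against a set of reference KeyPhrases under exact matching.

```python
import re
from collections import Counter

# Assumption: a tiny English stopword list stands in for a real linguistic
# analysis (e.g. part-of-speech patterns), which the abstract does not detail.
STOPWORDS = {"a", "an", "the", "of", "to", "and", "or", "in", "on", "for",
             "with", "is", "are", "we", "our", "from", "that", "this", "by"}


def extract_candidates(text):
    """Level 1 (stand-in): candidate phrases are runs of non-stopword tokens."""
    candidates = []
    # Split on punctuation first so a phrase never crosses a clause boundary.
    for fragment in re.split(r"[.,;:!?()\n]+", text.lower()):
        current = []
        for token in re.findall(r"[a-z][a-z\-]*", fragment):
            if token in STOPWORDS:
                if current:
                    candidates.append(" ".join(current))
                current = []
            else:
                current.append(token)
        if current:
            candidates.append(" ".join(current))
    return candidates


def rank_candidates(candidates, top_k=5, max_words=3):
    """Level 2 (stand-in): drop long candidates, rank the rest by frequency."""
    counts = Counter(c for c in candidates if len(c.split()) <= max_words)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [phrase for phrase, _ in ranked[:top_k]]


def precision_recall_f1(predicted, gold):
    """Exact-match Precision, Recall and F-measure over two phrase collections."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    text = ("Keyphrase extraction is useful. Keyphrase extraction is a step "
            "in document clustering. Document clustering is hard.")
    phrases = rank_candidates(extract_candidates(text), top_k=2)
    print(phrases)
    print(precision_recall_f1(phrases, ["keyphrase extraction", "text mining"]))
```

Keeping the two levels as separate functions means either stand-in could be replaced (for instance by a language-specific noun-phrase chunker or a corpus-level weighting scheme) without touching the other.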
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Ali, C.B., Wang, R., Haddad, H. (2015). A Two-Level Keyphrase Extraction Approach. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science, vol 9042. Springer, Cham. https://doi.org/10.1007/978-3-319-18117-2_29
DOI: https://doi.org/10.1007/978-3-319-18117-2_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18116-5
Online ISBN: 978-3-319-18117-2
eBook Packages: Computer Science, Computer Science (R0)