Abstract
Many academic journals and conferences require that each article include a list of keyphrases. These keyphrases should convey the contents and topics of the article, and they can save precious time in tasks such as filtering, summarization, and categorization. In this paper, we investigate the automatic extraction and learning of keyphrases from scientific articles written in English. First, we introduce various baseline extraction methods; some of them, formalized by us, are very successful for academic papers. Then, we integrate these methods using different machine learning methods. The best results were achieved by J48, an improved variant of C4.5. These results are significantly better than those achieved by previous extraction systems regarded as the state of the art.
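The abstract does not name the individual baseline methods, so the following is only a minimal sketch of the general idea, under stated assumptions: each candidate phrase is scored by simple baseline signals, and those scores become the feature vector that a learner such as J48 would later combine. The two features used here (term frequency and relative position of first occurrence), the stopword list, and every function name are illustrative assumptions, not the authors' method. The hand-set ranking at the end merely stands in for the learned combination.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = frozenset({
    "a", "an", "and", "are", "by", "for", "from", "in", "is",
    "of", "on", "that", "the", "this", "to", "we", "with",
})


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def candidate_phrases(tokens, max_len=3):
    """Yield (phrase, start_index) for each n-gram with n <= max_len
    that neither starts nor ends with a stopword."""
    for i in range(len(tokens)):
        for n in range(1, max_len + 1):
            gram = tokens[i:i + n]
            if len(gram) < n:  # ran past the end of the document
                break
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram), i


def baseline_features(text, max_len=3):
    """Map each candidate phrase to two hypothetical baseline scores:
    tf        -- number of occurrences in the text
    first_pos -- token index of the first occurrence / number of tokens
    """
    tokens = tokenize(text)
    counts, first = Counter(), {}
    for phrase, idx in candidate_phrases(tokens, max_len):
        counts[phrase] += 1
        first.setdefault(phrase, idx)
    total = max(len(tokens), 1)
    return {p: {"tf": counts[p], "first_pos": first[p] / total} for p in counts}


def rank(features, top_k=5):
    """Hand-set ordering (frequent first, then early, then alphabetical).
    A trained classifier would replace this step."""
    ordered = sorted(
        features,
        key=lambda p: (-features[p]["tf"], features[p]["first_pos"], p),
    )
    return ordered[:top_k]
```

In a learning setup, each dictionary returned by `baseline_features` would be paired with a label indicating whether the phrase appears among the author-assigned keyphrases, and the labeled vectors would be passed to a decision tree learner in place of `rank`.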
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
HaCohen-Kerner, Y., Gross, Z., Masa, A. (2005). Automatic Extraction and Learning of Keyphrases from Scientific Articles. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2005. Lecture Notes in Computer Science, vol 3406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30586-6_74
DOI: https://doi.org/10.1007/978-3-540-30586-6_74
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24523-0
Online ISBN: 978-3-540-30586-6
eBook Packages: Computer Science (R0)