Keyword Extraction Using Support Vector Machine

Zhang, Kuo; Xu, Hui; Tang, Jie; Li, Juanzi

doi:10.1007/11775300_8

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4016)

Abstract

This paper is concerned with keyword extraction. By keyword extraction, we mean extracting a subset of words or phrases from a document that can describe the ‘meaning’ of the document. Keywords benefit many text mining applications. However, a large number of documents do not have keywords, so keywords must be assigned before these applications can benefit from them. Several research efforts have addressed keyword extraction. These methods make use of ‘global context information’ alone, which restricts extraction performance. A thorough and systematic investigation of the issue is thus needed. In this paper, we propose to make use of not only ‘global context information’ but also ‘local context information’ for extracting keywords from documents. As far as we know, utilizing both ‘global context information’ and ‘local context information’ in keyword extraction has not been sufficiently investigated previously. We also propose methods for performing the task on the basis of Support Vector Machines (SVMs) and define the features used in the model. Experimental results indicate that the proposed SVM-based method can significantly outperform the baseline methods for keyword extraction. The proposed method has also been applied to document classification, a typical text mining task. Experimental results show that the accuracy of document classification can be significantly improved by using the keyword extraction method.
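As a rough illustration of the idea in the abstract, the sketch below treats keyword extraction as binary classification of candidate words with a linear SVM. The abstract does not list the paper's features or its SVM configuration, so everything here is an assumption made for illustration: the stopword list, the two ‘global’ features (relative frequency and position of first occurrence), the single ‘local’ feature (how often the preceding word is a stopword), the function names, and the plain subgradient-descent trainer.

```python
import re
from collections import Counter

# Hypothetical stopword list, used only for this illustration.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "for", "on"}


def extract_features(text):
    """Map each non-stopword to [rel_frequency, earliness, after_stopword_rate].

    The first two features stand in for 'global context information'
    (statistics over the whole document); the third stands in for
    'local context information' (the word's immediate neighbour).
    These are assumed features, not the feature set defined in the paper.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    n = len(tokens)
    counts = Counter()
    first_pos = {}
    after_stop = Counter()
    for i, tok in enumerate(tokens):
        if tok in STOPWORDS:
            continue
        counts[tok] += 1
        first_pos.setdefault(tok, i)
        if i > 0 and tokens[i - 1] in STOPWORDS:
            after_stop[tok] += 1
    return {
        w: [counts[w] / n, 1.0 - first_pos[w] / n, after_stop[w] / counts[w]]
        for w in counts
    }


def score(w, b, x):
    """Signed decision value of a linear model; positive means 'keyword'."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Minimise L2-regularised hinge loss by subgradient descent.

    Labels in y must be +1 (keyword) or -1 (not a keyword).
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * score(w, b, xi) < 1:
                # Margin violated: step along the hinge-loss subgradient.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Margin satisfied: only the regulariser contributes.
                w = [wj - lr * lam * wj for wj in w]
    return w, b
```

In use, one would compute `extract_features` for each training document, label each candidate word +1 or -1 according to the document's known keywords, train with `train_linear_svm`, and then rank the candidates of a new document by `score`. A kernel SVM from an established library would be the more realistic choice; the hand-written linear trainer only keeps the sketch dependency-free.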

Author information

Authors and Affiliations

Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, P.R.China
Kuo Zhang, Hui Xu, Jie Tang & Juanzi Li

Editor information

Editors and Affiliations

Chinese University of Hong Kong, Hong Kong, China
Jeffrey Xu Yu
Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, 153-8505, Tokyo, Japan
Masaru Kitsuregawa
Department of Computing, Hong Kong Polytechnic University, Hong Kong
Hong Va Leong

Cite this paper

Zhang, K., Xu, H., Tang, J., Li, J. (2006). Keyword Extraction Using Support Vector Machine. In: Yu, J.X., Kitsuregawa, M., Leong, H.V. (eds) Advances in Web-Age Information Management. WAIM 2006. Lecture Notes in Computer Science, vol 4016. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11775300_8

DOI: https://doi.org/10.1007/11775300_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35225-9
Online ISBN: 978-3-540-35226-6
eBook Packages: Computer Science, Computer Science (R0)
