Abstract
Keyphrase extraction plays an important role in automatic document understanding. To give concise and comprehensive information about the content of a document, the keyphrases extracted from it should meet two requirements. First, the keyphrases should be diverse with respect to each other, so that they do not carry duplicated information. Second, the keyphrases together should cover the various aspects of the topics in the document, so that no information is lost unnecessarily. In this paper, we address automatic keyphrase extraction with an emphasis on the diversity and coverage of the keyphrases, which most conventional keyphrase extraction approaches ignore. Specifically, we formulate the task as a subset learning problem in the framework of structural learning and employ a structural SVM to perform it. Experiments on a scientific literature dataset show that our approach outperforms several state-of-the-art keyphrase extraction approaches, which verifies the benefits of explicitly enhancing diversity and coverage.
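The interplay between diversity and coverage can be illustrated with a small sketch. This is not the paper's structural SVM; it is a simple greedy subset selection under stated assumptions: each candidate phrase is tagged with the topics it touches, each topic has a hand-set weight, and a phrase is only rewarded for topics that are not yet covered. The function name `greedy_keyphrase_subset`, the topic tags, and the weights are all hypothetical. In a structural SVM setting such weights would be learned from labeled documents rather than set by hand.

```python
from typing import Dict, List, Optional, Set


def greedy_keyphrase_subset(
    candidates: Dict[str, Set[str]],
    topic_weights: Dict[str, float],
    k: int,
) -> List[str]:
    """Greedily pick up to k phrases that maximize weighted topic coverage.

    A phrase earns credit only for topics that no earlier pick has covered,
    so redundant phrases have zero gain (diversity) while heavily weighted
    topics are picked up first (coverage).
    """
    selected: List[str] = []
    covered: Set[str] = set()
    for _ in range(k):
        best_phrase: Optional[str] = None
        best_gain = 0.0
        # Iterate in sorted order so that ties are broken deterministically.
        for phrase in sorted(candidates):
            if phrase in selected:
                continue
            new_topics = candidates[phrase] - covered
            gain = sum(topic_weights.get(t, 0.0) for t in new_topics)
            if gain > best_gain:
                best_phrase, best_gain = phrase, gain
        if best_phrase is None:
            break  # no remaining phrase adds uncovered topic weight
        selected.append(best_phrase)
        covered |= candidates[best_phrase]
    return selected


# Hypothetical candidates and topic weights, for illustration only.
candidates = {
    "structural svm": {"learning"},
    "structural learning": {"learning"},
    "keyphrase extraction": {"extraction"},
    "diversity and coverage": {"diversity", "coverage"},
}
topic_weights = {"learning": 3.0, "extraction": 2.0, "diversity": 1.0, "coverage": 1.0}

picked = greedy_keyphrase_subset(candidates, topic_weights, k=3)
print(picked)
```

In this toy input, "structural svm" and "structural learning" touch the same topic, so once one of them is chosen the other contributes no further gain and is left out, even if more slots are available.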
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Ni, W., Liu, T., Zeng, Q. (2012). Extracting Keyphrase Set with High Diversity and Coverage Using Structural SVM. In: Sheng, Q.Z., Wang, G., Jensen, C.S., Xu, G. (eds) Web Technologies and Applications. APWeb 2012. Lecture Notes in Computer Science, vol 7235. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29253-8_11
DOI: https://doi.org/10.1007/978-3-642-29253-8_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29252-1
Online ISBN: 978-3-642-29253-8
eBook Packages: Computer Science (R0)