Extracting Keyphrase Set with High Diversity and Coverage Using Structural SVM

  • Conference paper
Web Technologies and Applications (APWeb 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7235)

Abstract

Keyphrase extraction plays an important role in automatic document understanding. To convey concise and comprehensive information about the content of a document, the keyphrases extracted from it should meet two requirements. First, the keyphrases should be diverse from one another so as to avoid carrying duplicated information. Second, the keyphrases should cover the various aspects of the topics in the document so as to avoid unnecessary information loss. In this paper, we address automatic keyphrase extraction with an emphasis on the diversity and coverage of keyphrases, which are generally ignored by most conventional keyphrase extraction approaches. Specifically, the task is formulated as a subset learning problem in the framework of structural learning, and a structural SVM is employed to perform it. Experiments on a scientific literature dataset show that our approach outperforms several state-of-the-art keyphrase extraction approaches, which verifies the benefits of explicitly enhancing diversity and coverage.
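
The abstract does not give the structural SVM formulation, so the following is only a rough illustration of the two requirements, not the authors' method. It swaps in a simple greedy heuristic: each candidate phrase is mapped to a set of topic labels, and a phrase is rewarded for topics it newly covers and penalized for topics already covered. The function name `select_keyphrases`, the `redundancy_penalty` weight, and the toy candidate data are all hypothetical.

```python
from typing import Dict, List, Set


def select_keyphrases(
    candidates: Dict[str, Set[str]],
    k: int,
    redundancy_penalty: float = 0.5,  # hypothetical weight, not from the paper
) -> List[str]:
    """Greedily pick up to k phrases that add coverage without repeating topics."""
    selected: List[str] = []
    covered: Set[str] = set()
    remaining = dict(candidates)

    def gain(phrase: str) -> float:
        topics = remaining[phrase]
        newly_covered = len(topics - covered)  # coverage reward
        overlap = len(topics & covered)        # diversity penalty
        return newly_covered - redundancy_penalty * overlap

    while remaining and len(selected) < k:
        # Sort first so that ties are broken alphabetically and deterministically.
        best = max(sorted(remaining), key=gain)
        if gain(best) <= 0:
            break  # every remaining phrase is redundant
        selected.append(best)
        covered |= remaining.pop(best)
    return selected


# Toy data (assumed): phrase -> topics it touches.
candidates = {
    "structural svm": {"learning", "svm"},
    "svm": {"svm"},
    "keyphrase extraction": {"keyphrase", "extraction"},
    "diversity": {"diversity"},
}
print(select_keyphrases(candidates, k=3))
```

In this toy run the phrase "svm" is never chosen once "structural svm" has been selected, because it adds no new topic. A learned model such as the one described in the abstract would instead fit the trade-off between coverage and redundancy from training data rather than fixing it by hand.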

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ni, W., Liu, T., Zeng, Q. (2012). Extracting Keyphrase Set with High Diversity and Coverage Using Structural SVM. In: Sheng, Q.Z., Wang, G., Jensen, C.S., Xu, G. (eds) Web Technologies and Applications. APWeb 2012. Lecture Notes in Computer Science, vol 7235. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29253-8_11

  • DOI: https://doi.org/10.1007/978-3-642-29253-8_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29252-1

  • Online ISBN: 978-3-642-29253-8

  • eBook Packages: Computer Science, Computer Science (R0)
