Improving Sentence Similarity Measurement by Incorporating Sentential Word Importance

  • Conference paper
AI 2010: Advances in Artificial Intelligence (AI 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6464)

Abstract

Measuring similarity between sentences plays an important role in textual applications such as document summarization and question answering. While various sentence similarity measures have recently been proposed, these measures typically take word importance into account only through inverse document frequency (IDF) weighting. IDF values are based on global information compiled over a large corpus of documents, and we hypothesise that, at the sentence level, better performance can be achieved by using a measure of the importance of a word within the sentence in which it appears. In this paper we show how the PageRank graph-centrality algorithm can be used to assign a numerical measure of importance to each word in a sentence, and how these values can be incorporated into various sentence similarity measures. Results from applying the measures to a difficult sentence clustering task demonstrate that incorporating sentential word importance leads to a statistically significant improvement in clustering performance, as evaluated using a range of external clustering criteria.
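
The abstract does not say how the word graph is constructed or exactly how the importance values enter each similarity measure, so the following is only an illustrative sketch under stated assumptions rather than the authors' method. It assumes a TextRank-style co-occurrence graph (words within a small window are linked), runs a plain power-iteration PageRank over it, and then uses the resulting per-sentence scores as term weights in a cosine similarity where IDF weights would otherwise be used. The function names `word_importance` and `weighted_cosine`, the `window` parameter, and the damping value of 0.85 are all hypothetical choices made for illustration.

```python
from collections import defaultdict
import math


def word_importance(tokens, window=2, damping=0.85, iterations=50):
    """Return a PageRank score for each distinct word of one sentence.

    Assumption: edges come from co-occurrence within `window` following
    tokens; the paper may build its graph differently.
    """
    # Undirected, weighted co-occurrence graph stored as directed pairs.
    weights = defaultdict(float)
    for i, u in enumerate(tokens):
        for v in tokens[i + 1 : i + 1 + window]:
            if u != v:
                weights[(u, v)] += 1.0
                weights[(v, u)] += 1.0

    words = sorted(set(tokens))
    out_weight = {w: 0.0 for w in words}
    for (u, _), wt in weights.items():
        out_weight[u] += wt

    n = len(words)
    scores = {w: 1.0 / n for w in words}
    for _ in range(iterations):
        # Teleportation term shared by every word.
        new = {w: (1.0 - damping) / n for w in words}
        # Words with no edges pass their score uniformly to all words.
        dangling = sum(scores[w] for w in words if out_weight[w] == 0.0)
        for w in words:
            new[w] += damping * dangling / n
        # Each word passes its score along its edges, proportional to weight.
        for (u, v), wt in weights.items():
            new[v] += damping * scores[u] * wt / out_weight[u]
        scores = new
    return scores


def weighted_cosine(tokens_a, tokens_b):
    """Cosine similarity using sentential importance as the term weights."""
    imp_a = word_importance(tokens_a)
    imp_b = word_importance(tokens_b)
    dot = sum(imp_a[w] * imp_b[w] for w in imp_a.keys() & imp_b.keys())
    norm_a = math.sqrt(sum(x * x for x in imp_a.values()))
    norm_b = math.sqrt(sum(x * x for x in imp_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    s1 = "the cat sat on the mat".split()
    s2 = "a cat sat on a mat".split()
    print(word_importance(s1))
    print(weighted_cosine(s1, s2))
```

Because the scores are computed per sentence, the same word can receive different weights in different sentences, which is the contrast with corpus-level IDF that the abstract draws. A word-to-word semantic similarity could replace the co-occurrence edge weights without changing the rest of the sketch.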

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Skabar, A., Abdalgader, K. (2010). Improving Sentence Similarity Measurement by Incorporating Sentential Word Importance. In: Li, J. (ed.) AI 2010: Advances in Artificial Intelligence. AI 2010. Lecture Notes in Computer Science, vol 6464. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17432-2_47

  • DOI: https://doi.org/10.1007/978-3-642-17432-2_47

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-17431-5

  • Online ISBN: 978-3-642-17432-2

  • eBook Packages: Computer Science, Computer Science (R0)
