Improving Sentence Similarity Measurement by Incorporating Sentential Word Importance

Skabar, Andrew; Abdalgader, Khaled

doi:10.1007/978-3-642-17432-2_47

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6464)

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence

Abstract

Measuring similarity between sentences plays an important role in textual applications such as document summarization and question answering. While various sentence similarity measures have recently been proposed, these measures typically take word importance into account only through inverse document frequency (IDF) weighting. IDF values are based on global information compiled over a large corpus of documents, and we hypothesise that, at the sentence level, better performance can be achieved by using a measure of the importance of a word within the sentence in which it appears. In this paper we show how the PageRank graph-centrality algorithm can be used to assign a numerical measure of importance to each word in a sentence, and how these values can be incorporated within various sentence similarity measures. Results from applying the measures to a difficult sentence clustering task demonstrate that incorporating sentential word importance leads to a statistically significant improvement in clustering performance, as evaluated using a range of external clustering criteria.
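The idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the word graph is built or how the importance values enter each similarity measure, so everything below is an assumption made for illustration: the character-bigram function `toy_word_similarity` stands in for whatever word-to-word similarity is actually used, and `importance_weighted_cosine` is a hypothetical way of combining the scores, not the authors' measure. The PageRank step itself is the standard weighted power iteration with a damping factor.

```python
import math
from itertools import combinations


def toy_word_similarity(a, b):
    """Stand-in word similarity (an assumption): Dice overlap of character bigrams."""
    grams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    grams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def sentence_word_importance(words, similarity=toy_word_similarity,
                             damping=0.85, iterations=50):
    """Weighted PageRank over the distinct words of one sentence.

    Returns a dict mapping each word to a score; the scores sum to 1.
    """
    vocab = sorted(set(words))
    n = len(vocab)
    if n == 0:
        return {}
    # Symmetric edge weights between every pair of distinct words.
    weight = {u: {} for u in vocab}
    for u, v in combinations(vocab, 2):
        s = similarity(u, v)
        if s > 0:
            weight[u][v] = s
            weight[v][u] = s
    out_sum = {u: sum(weight[u].values()) for u in vocab}
    rank = {u: 1.0 / n for u in vocab}
    for _ in range(iterations):
        # Words with no edges (dangling nodes) spread their mass uniformly.
        dangling = sum(rank[u] for u in vocab if out_sum[u] == 0)
        new_rank = {}
        for v in vocab:
            incoming = sum(rank[u] * weight[u][v] / out_sum[u] for u in weight[v])
            new_rank[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank


def importance_weighted_cosine(imp_a, imp_b):
    """Cosine similarity between two sentences represented by their importance scores."""
    dot = sum(imp_a[w] * imp_b[w] for w in imp_a.keys() & imp_b.keys())
    norm_a = math.sqrt(sum(x * x for x in imp_a.values()))
    norm_b = math.sqrt(sum(x * x for x in imp_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Usage: score the words of two sentences, then compare the sentences.
imp1 = sentence_word_importance("the cat sat on the mat".split())
imp2 = sentence_word_importance("a cat lay on a mat".split())
similarity_12 = importance_weighted_cosine(imp1, imp2)
```

Unlike IDF, the scores here depend only on the sentence itself: the same word can receive different importance values in different sentences, which is the property the abstract hypothesises to be useful.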

Author information

Authors and Affiliations

Department of Computer Science and Computer Engineering, La Trobe University, Bundoora, Australia
Andrew Skabar & Khaled Abdalgader

Editor information

Editors and Affiliations

School of Computer and Information Science, University of South Australia, 5095, Mawson Lakes, SA, Australia
Jiuyong Li

Cite this paper

Skabar, A., Abdalgader, K. (2010). Improving Sentence Similarity Measurement by Incorporating Sentential Word Importance. In: Li, J. (ed.) AI 2010: Advances in Artificial Intelligence. AI 2010. Lecture Notes in Computer Science, vol 6464. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17432-2_47

DOI: https://doi.org/10.1007/978-3-642-17432-2_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17431-5
Online ISBN: 978-3-642-17432-2
eBook Packages: Computer Science, Computer Science (R0)
