Addressing the Variability of Natural Language Expression in Sentence Similarity with Semantic Structure of the Sentences

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5476)

Abstract

In this paper, we present a new approach that incorporates the semantic structure of sentences, in the form of verb-argument structure, to measure semantic similarity between sentences. The variability of natural language expression makes it difficult for existing text similarity measures to accurately identify semantically similar sentences, since sentences conveying the same fact or concept may differ lexically and syntactically. Conversely, sentences that share many words do not necessarily convey the same meaning. This significantly affects the performance of many text mining applications that involve sentence-level judgment. Our evaluation shows that processing sentences at the semantic level significantly improves the performance of similarity measures.
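
The contrast drawn in the abstract, between sentences that share words and sentences that share meaning, can be illustrated with a minimal sketch. This is not the measure proposed in the paper, whose details the abstract does not give. It assumes each sentence has already been reduced to a single hand-written verb-argument frame, and the role names ("verb", "agent", "patient"), the function names, and the use of Jaccard overlap are all hypothetical choices made for illustration only.

```python
from typing import Dict, Set

# Assumption: a frame maps a role label to the set of lowercase tokens filling it.
Frame = Dict[str, Set[str]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two token sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def bag_of_words_similarity(f1: Frame, f2: Frame) -> float:
    """Ignore roles: pool all tokens of each frame and compare the pools."""
    words1 = set().union(*f1.values())
    words2 = set().union(*f2.values())
    return jaccard(words1, words2)


def role_aware_similarity(f1: Frame, f2: Frame) -> float:
    """Compare tokens role by role, averaged over every role in either frame."""
    roles = sorted(set(f1) | set(f2))
    if not roles:
        return 0.0
    scores = [jaccard(f1.get(r, set()), f2.get(r, set())) for r in roles]
    return sum(scores) / len(scores)


# Hand-annotated frames (hypothetical, not produced by a real parser).
dog_chases_cat = {"verb": {"chase"}, "agent": {"dog"}, "patient": {"cat"}}
cat_chases_dog = {"verb": {"chase"}, "agent": {"cat"}, "patient": {"dog"}}
# "The cat was chased by the dog": different syntax, same roles as the first frame.
passive_variant = {"verb": {"chase"}, "agent": {"dog"}, "patient": {"cat"}}

print(bag_of_words_similarity(dog_chases_cat, cat_chases_dog))  # 1.0
print(role_aware_similarity(dog_chases_cat, cat_chases_dog))    # 0.333...
print(role_aware_similarity(dog_chases_cat, passive_variant))   # 1.0
```

Under these assumptions, the word-level score cannot tell "the dog chased the cat" from "the cat chased the dog", while the role-aware score separates them and still treats the passive paraphrase as equivalent. A real system would obtain the frames from a semantic role labeler and would need a softer word-to-word comparison than exact token match.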

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Achananuparp, P., Hu, X., Yang, C.C. (2009). Addressing the Variability of Natural Language Expression in Sentence Similarity with Semantic Structure of the Sentences. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, TB. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2009. Lecture Notes in Computer Science, vol 5476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01307-2_52

  • DOI: https://doi.org/10.1007/978-3-642-01307-2_52

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01306-5

  • Online ISBN: 978-3-642-01307-2

  • eBook Packages: Computer Science, Computer Science (R0)
