Improving Text Similarity Measurement by Critical Sentence Vector Model

  • Conference paper
Information Retrieval Technology (AIRS 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3689)

Abstract

We propose the Critical Sentence Vector Model (CSVM), a novel model for measuring text similarity. CSVM accounts for both the structural and the semantic information of a document. Unlike existing methods based on keyword vectors, e.g. the Vector Space Model (VSM), CSVM measures document similarity by comparing critical sentence vectors extracted from the documents. Experiments show that CSVM outperforms VSM in calculating text similarity.
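
The contrast drawn in the abstract can be illustrated with a minimal sketch. The first part is a plain keyword-vector (VSM-style) cosine similarity over raw term frequencies. The second part, `sentence_level_similarity`, is a hypothetical stand-in that compares documents sentence by sentence; the abstract does not say how CSVM selects critical sentences or builds their vectors, so the sentence splitting, the best-match averaging, and all function names here are assumptions made for illustration only, not the authors' method.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (a plain keyword vector)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def vsm_similarity(doc_a, doc_b):
    """Keyword-vector baseline: one vector per document."""
    return cosine(term_vector(doc_a), term_vector(doc_b))


def split_sentences(text):
    """Naive sentence splitter (assumption: ., ! or ? ends a sentence)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def _directed(sents_a, sents_b):
    """Mean, over sentences of A, of the best cosine match in B."""
    if not sents_a or not sents_b:
        return 0.0
    vecs_b = [term_vector(s) for s in sents_b]
    best = [max(cosine(term_vector(s), vb) for vb in vecs_b) for s in sents_a]
    return sum(best) / len(best)


def sentence_level_similarity(doc_a, doc_b):
    """Hypothetical sentence-vector comparison, symmetrized.

    Every sentence is used here; choosing only 'critical' sentences
    is left out because the abstract does not describe how it is done.
    """
    a, b = split_sentences(doc_a), split_sentences(doc_b)
    return 0.5 * (_directed(a, b) + _directed(b, a))
```

One property of the baseline is visible directly: `vsm_similarity("dog bites man", "man bites dog")` is 1 up to floating-point rounding, because a single keyword vector discards word order and sentence boundaries. The sentence-level variant keeps sentence boundaries, which is one simple way to retain some of the structural information that the abstract says CSVM exploits.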

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, W., Wong, KF., Yuan, C., Li, W., Xia, Y. (2005). Improving Text Similarity Measurement by Critical Sentence Vector Model. In: Lee, G.G., Yamada, A., Meng, H., Myaeng, S.H. (eds) Information Retrieval Technology. AIRS 2005. Lecture Notes in Computer Science, vol 3689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562382_44

  • DOI: https://doi.org/10.1007/11562382_44

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29186-2

  • Online ISBN: 978-3-540-32001-2

  • eBook Packages: Computer Science, Computer Science (R0)