Abstract
We propose the Critical Sentence Vector Model (CSVM), a novel model for measuring text similarity. CSVM accounts for both the structural and the semantic information of a document. Unlike existing methods based on keyword vectors, such as the Vector Space Model (VSM), CSVM measures document similarity by comparing critical sentence vectors extracted from the documents. Experiments show that CSVM outperforms VSM in calculating text similarity.
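To make the contrast concrete, the following minimal Python sketch shows the keyword-vector baseline (VSM with cosine similarity over term frequencies) next to a sentence-level comparison. The abstract does not specify how critical sentences are selected or how their vectors are built, so `sentence_level_similarity` is only an illustrative stand-in, not the CSVM algorithm: it assumes the sentences have already been chosen and scores them by a symmetric best-match average. All function names here are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector over lowercased alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def vsm_similarity(doc_a, doc_b):
    """VSM baseline: one keyword vector per whole document."""
    return cosine(term_vector(doc_a), term_vector(doc_b))


def _best_match_average(src, dst):
    """Average, over vectors in src, of the best cosine match found in dst."""
    if not src or not dst:
        return 0.0
    return sum(max(cosine(s, d) for d in dst) for s in src) / len(src)


def sentence_level_similarity(sents_a, sents_b):
    """ASSUMPTION, not the paper's method: compare pre-selected sentences
    one vector per sentence, using a symmetric best-match average."""
    vecs_a = [term_vector(s) for s in sents_a]
    vecs_b = [term_vector(s) for s in sents_b]
    return 0.5 * (_best_match_average(vecs_a, vecs_b)
                  + _best_match_average(vecs_b, vecs_a))
```

A call such as `vsm_similarity(doc_a, doc_b)` ignores where in the document each keyword occurs, whereas the sentence-level variant only rewards terms that co-occur within matching sentences, which is one simple way structural information can enter the score.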
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, W., Wong, KF., Yuan, C., Li, W., Xia, Y. (2005). Improving Text Similarity Measurement by Critical Sentence Vector Model. In: Lee, G.G., Yamada, A., Meng, H., Myaeng, S.H. (eds) Information Retrieval Technology. AIRS 2005. Lecture Notes in Computer Science, vol 3689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562382_44
DOI: https://doi.org/10.1007/11562382_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29186-2
Online ISBN: 978-3-540-32001-2
eBook Packages: Computer Science, Computer Science (R0)