Improving Text Similarity Measurement by Critical Sentence Vector Model

Li, Wei; Wong, Kam-Fai; Yuan, Chunfa; Li, Wenjie; Xia, Yunqing

doi:10.1007/11562382_44

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3689)

Included in the following conference series:

Asia Information Retrieval Symposium

Abstract

We propose the Critical Sentence Vector Model (CSVM), a novel model for measuring text similarity. CSVM accounts for both the structural and the semantic information of a document. In contrast to existing methods based on keyword vectors, e.g. the Vector Space Model (VSM), CSVM measures document similarity by measuring the similarity between critical sentence vectors extracted from the documents. Experiments show that CSVM outperforms VSM in calculating text similarity.
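
To make the contrast in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. `vsm_similarity` shows the familiar keyword-vector baseline as cosine similarity over raw term frequencies. `sentence_level_similarity` is a purely hypothetical illustration of comparing documents sentence by sentence: it treats every sentence as "critical" and averages best-match cosine scores, because the abstract does not say how CSVM selects or weights critical sentences. All function names, the tokenizer, and the sentence splitter are assumptions made for this sketch.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphanumeric runs (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def vsm_similarity(doc_a, doc_b):
    """Keyword-vector baseline: one term-frequency vector per document."""
    return cosine(Counter(tokenize(doc_a)), Counter(tokenize(doc_b)))


def split_sentences(doc):
    """Naive splitter on ., ! and ? (assumption; real text needs more care)."""
    return [s.strip() for s in re.split(r"[.!?]+", doc) if s.strip()]


def sentence_level_similarity(doc_a, doc_b):
    """Hypothetical sentence-level comparison, NOT the CSVM from the paper.

    Every sentence is treated as critical. Each sentence of doc_a is matched
    to its most similar sentence in doc_b, and the best scores are averaged.
    """
    vecs_a = [Counter(tokenize(s)) for s in split_sentences(doc_a)]
    vecs_b = [Counter(tokenize(s)) for s in split_sentences(doc_b)]
    if not vecs_a or not vecs_b:
        return 0.0
    best = [max(cosine(va, vb) for vb in vecs_b) for va in vecs_a]
    return sum(best) / len(best)


if __name__ == "__main__":
    d1 = "The cat sat on the mat. Dogs bark loudly."
    d2 = "Dogs bark loudly. The cat sat on the mat."
    print(vsm_similarity(d1, d2))
    print(sentence_level_similarity(d1, d2))
```

The document-level vector discards sentence boundaries, whereas the sentence-level variant keeps one vector per sentence. That structural difference is the only point this sketch is meant to illustrate.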

Author information

Authors and Affiliations

Department of Systems Engineering, the Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Wei Li, Kam-Fai Wong & Yunqing Xia
State Key Laboratory of Intelligent Technology and System, Tsinghua University, Beijing, 100084, China
Chunfa Yuan
Department of Computing, Hong Kong Polytechnic University, Hung Hom, Hong Kong
Wenjie Li

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Pohang University of Science and Technology, San 31, Hyoja-dong, Nam-gu, 790-784, Pohang, Korea
Gary Geunbae Lee
Computer and Communication Media Research, NEC Corp., Miyazaki 4-1-1, Miyamae-ku, 216-8555, Kawasaki, Japan
Akio Yamada
Human-Computer Communications Laboratory, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong
Helen Meng
School of Engineering, Information and Communications University, 119, Munjiro, Yuseong-gu, 305-732, Daejeon, Korea
Sung Hyon Myaeng

About this paper

Cite this paper

Li, W., Wong, KF., Yuan, C., Li, W., Xia, Y. (2005). Improving Text Similarity Measurement by Critical Sentence Vector Model. In: Lee, G.G., Yamada, A., Meng, H., Myaeng, S.H. (eds) Information Retrieval Technology. AIRS 2005. Lecture Notes in Computer Science, vol 3689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562382_44

DOI: https://doi.org/10.1007/11562382_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29186-2
Online ISBN: 978-3-540-32001-2
eBook Packages: Computer Science, Computer Science (R0)
