Abstract
We examine text retrieval strategies based on a sparsified concept decomposition matrix. The centroid vector of a tightly structured text collection provides a general description of the documents in that collection, and the centroid vectors of all such collections together form the columns of a concept matrix. The original text data matrix can then be projected into the concept space spanned by these concept vectors. We propose a procedure for text retrieval based on the sparsified concept decomposition (SCD) matrix. Our experimental results show that, compared with the popular technique of latent semantic indexing (LSI) with singular value decomposition (SVD), SCD-based retrieval may improve retrieval accuracy and reduce storage cost.
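As a rough illustration of the pipeline summarized in the abstract, the following Python sketch clusters a toy term-document matrix, builds a concept matrix from the cluster centroids, projects the data onto the concept space by least squares, sparsifies the resulting coefficient matrix, and ranks documents against a query. It is only a sketch under stated assumptions, not the authors' implementation. The spherical k-means style clustering, the magnitude threshold used for sparsification, the toy data, and all names (normalize_columns, spherical_kmeans, threshold, init) are illustrative choices that the abstract does not specify.

```python
import numpy as np

def normalize_columns(M):
    """Scale each column to unit Euclidean norm (zero columns are left as-is)."""
    norms = np.linalg.norm(M, axis=0, keepdims=True)
    norms[norms == 0] = 1.0
    return M / norms

def spherical_kmeans(A, init, iters=20):
    """Cluster unit-norm document columns of A by cosine similarity.

    `init` lists the column indices used as starting centroids
    (a hypothetical convenience so that the toy run is deterministic).
    Returns the concept matrix C (terms x k) and the cluster labels.
    """
    C = A[:, init].copy()
    for _ in range(iters):
        labels = np.argmax(C.T @ A, axis=0)      # nearest centroid by cosine
        for j in range(C.shape[1]):
            members = A[:, labels == j]
            if members.shape[1] > 0:
                C[:, j] = members.sum(axis=1)    # unnormalized centroid
        C = normalize_columns(C)                 # concept vectors have unit norm
    labels = np.argmax(C.T @ A, axis=0)          # labels for the final centroids
    return C, labels

# Toy term-document matrix: 6 terms (rows) x 6 documents (columns).
# Documents 0-2 mostly use terms 0-2, documents 3-5 use terms 3-5.
A = normalize_columns(np.array([
    [2.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 2.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.2, 2.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 2.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 1.0, 2.0],
]))

# Concept matrix: one normalized centroid per cluster.
C, labels = spherical_kmeans(A, init=[0, 3])

# Concept decomposition: Z minimizes ||A - C Z||_F (least-squares projection).
Z = np.linalg.lstsq(C, A, rcond=None)[0]

# Sparsification (assumed strategy): drop coefficients of small magnitude.
threshold = 0.05
Z_sparse = np.where(np.abs(Z) >= threshold, Z, 0.0)

# Retrieval: cosine similarity between the query and each approximated
# document C @ z_j, computed without forming the dense product C @ Z_sparse.
q = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0])     # query on terms 4 and 5
G = C.T @ C
doc_norms = np.sqrt(np.sum(Z_sparse * (G @ Z_sparse), axis=0))
doc_norms = np.maximum(doc_norms, 1e-12)         # guard against division by zero
scores = ((C.T @ q) @ Z_sparse) / (np.linalg.norm(q) * doc_norms)
ranking = np.argsort(-scores)

print("cluster labels:", labels)
print("nonzeros before/after:", np.count_nonzero(Z), np.count_nonzero(Z_sparse))
print("ranking:", ranking)
```

In this sketch only the small concept matrix C and the sparsified coefficient matrix need to be kept for querying, which is the kind of storage saving the abstract refers to. A real system would likely hold Z_sparse in a sparse matrix format and choose the number of clusters and the threshold empirically.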
The research work of the authors was supported in part by the U.S. National Science Foundation under grants CCR-0092532 and ACR-0202934, and in part by the U.S. Department of Energy Office of Science under grant DE-FG02-02ER45961.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gao, J., Zhang, J. (2004). Text Retrieval Using Sparsified Concept Decomposition Matrix. In: Zhang, J., He, JH., Fu, Y. (eds) Computational and Information Science. CIS 2004. Lecture Notes in Computer Science, vol 3314. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30497-5_82
DOI: https://doi.org/10.1007/978-3-540-30497-5_82
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24127-0
Online ISBN: 978-3-540-30497-5
eBook Packages: Computer Science (R0)