Abstract
Latent Semantic Indexing (LSI) has proved effective at capturing the semantic structure of document collections and is widely used in content-based text retrieval. However, in many real-world applications involving very large document collections, LSI suffers from high computational complexity, which stems from the Singular Value Decomposition (SVD). As a result, the folding-in method is widely used in practice as an approximation to LSI. However, folding-in is generally implemented "as is", without detailed analysis of its effectiveness and performance. Consequently, its performance cannot be guaranteed. In this paper, we first illustrate the underlying principle of the folding-in method from a linear algebra point of view and analyze some commonly used existing techniques. Based on this theoretical analysis, we propose a novel algorithm to guide the implementation of the folding-in method. Our method is justified and evaluated through a series of experiments on various classical IR data sets. The results indicate that our method is effective and performs consistently across different document collections.
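For readers unfamiliar with folding-in, the following is a minimal sketch of the standard projection as it is commonly described in the LSI literature. It is not the algorithm proposed in this paper, which the abstract does not specify. The toy term-document matrix, the rank k = 2, and the helper names `lsi_fit` and `fold_in` are assumptions made purely for illustration.

```python
import numpy as np


def lsi_fit(A, k):
    """Rank-k truncated SVD of a term-document matrix A (terms x docs)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep the k largest singular triplets; rows of V_k are document vectors.
    return U[:, :k], s[:k], Vt[:k, :].T


def fold_in(d, U_k, s_k):
    """Project a term vector d into the existing k-dim space without
    recomputing the SVD: d_hat = Sigma_k^{-1} U_k^T d."""
    return (U_k.T @ d) / s_k


# Hypothetical toy collection: 5 terms (rows) x 4 documents (columns).
A = np.array([
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
], dtype=float)

U_k, s_k, V_k = lsi_fit(A, k=2)

# Folding in a document that was already indexed recovers its row of V_k.
d_existing = fold_in(A[:, 0], U_k, s_k)

# Folding in an unseen document costs one matrix-vector product, but the
# basis U_k is left unchanged, so the result is only an approximation of
# what a full SVD over the enlarged collection would give.
d_new = fold_in(np.array([0.0, 1.0, 1.0, 0.0, 1.0]), U_k, s_k)
```

The trade-off the abstract refers to is visible here: the new document never influences `U_k` or `s_k`, which is what makes folding-in cheap and also what makes its quality depend on how well the existing basis represents the added documents.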
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, X., Jin, X. (2006). Understanding and Enhancing the Folding-In Method in Latent Semantic Indexing. In: Bressan, S., Küng, J., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2006. Lecture Notes in Computer Science, vol 4080. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11827405_11
DOI: https://doi.org/10.1007/11827405_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37871-6
Online ISBN: 978-3-540-37872-3
eBook Packages: Computer Science, Computer Science (R0)