Understanding and Enhancing the Folding-In Method in Latent Semantic Indexing

  • Conference paper
Database and Expert Systems Applications (DEXA 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4080)

Abstract

Latent Semantic Indexing (LSI) has proved effective at capturing the semantic structure of document collections and is widely used in content-based text retrieval. However, in many real-world applications dealing with very large document collections, LSI suffers from the high computational complexity of Singular Value Decomposition (SVD). As a result, the folding-in method is widely used in practice as an approximation to LSI. The folding-in method, however, is generally implemented "as is", without detailed analysis of its effectiveness and performance. Consequently, its performance cannot be guaranteed. In this paper, we first illustrated the underlying principle of the folding-in method from a linear algebra point of view and analyzed some commonly used existing techniques. Based on this theoretical analysis, we proposed a novel algorithm to guide the implementation of the folding-in method. Our method was justified and evaluated through a series of experiments on various classical IR data sets. The results indicated that our method was effective and performed consistently across different document collections.
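
For orientation, the textbook form of folding-in can be sketched in a few lines of Python with NumPy. This is only the standard projection commonly described in the LSI literature, d_hat = Sigma_k^{-1} U_k^T d, and not the enhanced algorithm proposed in the paper, which the abstract does not detail. The function names (`lsi_fit`, `fold_in`) and the toy matrix are hypothetical and chosen purely for illustration.

```python
import numpy as np

def lsi_fit(A, k):
    """Rank-k truncated SVD of a term-by-document matrix A (terms x docs)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # U_k: terms x k, s_k: k singular values, V_k: docs x k
    return U[:, :k], s[:k], Vt[:k, :].T

def fold_in(d, U_k, s_k):
    """Project a term vector d into the existing k-dimensional space
    without recomputing the SVD: d_hat = Sigma_k^{-1} U_k^T d."""
    return (U_k.T @ d) / s_k

# Toy term-by-document matrix (5 terms, 4 documents); values are made up.
A = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1],
              [1, 0, 0, 1]], dtype=float)

U_k, s_k, V_k = lsi_fit(A, k=2)

# Sanity check: folding in a document that was already part of the SVD
# reproduces its existing coordinates (the matching row of V_k).
d_hat = fold_in(A[:, 0], U_k, s_k)

# Folding in a genuinely new document only approximates what a full
# recomputation of the SVD would give, since U_k and s_k stay fixed.
d_new = np.array([1, 1, 0, 0, 0], dtype=float)
d_new_hat = fold_in(d_new, U_k, s_k)
```

Because `U_k` and `s_k` are never updated, the cost per new document is a single matrix-vector product, but the folded-in coordinates can drift from those of a recomputed SVD as more documents are added. That trade-off is the one the abstract refers to when it calls folding-in an approximation to LSI.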

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, X., Jin, X. (2006). Understanding and Enhancing the Folding-In Method in Latent Semantic Indexing. In: Bressan, S., Küng, J., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2006. Lecture Notes in Computer Science, vol 4080. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11827405_11

  • DOI: https://doi.org/10.1007/11827405_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-37871-6

  • Online ISBN: 978-3-540-37872-3

  • eBook Packages: Computer Science, Computer Science (R0)
