ABSTRACT
Latent Semantic Indexing (LSI) dramatically reduces the dimension of the document space by mapping it into a space spanned by conceptual indices. Empirically, the number of concepts needed to represent the documents is far smaller than the number of distinct words in their textual representation. Although this mapping largely obviates the problem of lexical matching, it incurs a high computational cost compared with document parsing, indexing, query matching, and updating. This paper shows that LSI is based on a unitary transformation, for which there are computationally more attractive alternatives. This is exemplified by the Haar transform, which is memory-efficient and can be computed in linear to sublinear time. The principal advantages of LSI are thus preserved while the computational costs are drastically reduced.
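The abstract's central claim is that the Haar transform is unitary and linear-time. A minimal Python sketch of that property follows, under stated assumptions: a 1-D orthonormal Haar transform is applied to each document's term vector (length padded to a power of two), and dimension is reduced by keeping the k coarsest coefficients. The abstract does not say how FLSI orders terms or selects coefficients, so `reduce_document` and its truncation rule are hypothetical illustrations, not the paper's method. Because the full transform is unitary, query-document inner products are unchanged until coefficients are dropped.

```python
import math
from typing import List

INV_SQRT2 = 1.0 / math.sqrt(2.0)


def _check_power_of_two(n: int) -> None:
    if n == 0 or n & (n - 1):
        raise ValueError("vector length must be a power of two")


def haar_forward(x: List[float]) -> List[float]:
    """Orthonormal 1-D Haar transform; O(n) total work for n = 2^m."""
    _check_power_of_two(len(x))
    out = list(x)
    length = len(out)
    while length > 1:  # each pass touches only the first `length` entries
        half = length // 2
        approx = [(out[2 * i] + out[2 * i + 1]) * INV_SQRT2 for i in range(half)]
        detail = [(out[2 * i] - out[2 * i + 1]) * INV_SQRT2 for i in range(half)]
        out[:half] = approx
        out[half:length] = detail
        length = half
    # Layout: [scaled overall sum, coarsest detail, ..., finest details]
    return out


def haar_inverse(c: List[float]) -> List[float]:
    """Inverse of haar_forward (its transpose, since the transform is orthonormal)."""
    _check_power_of_two(len(c))
    out = list(c)
    length = 1
    while length < len(out):
        merged: List[float] = []
        for a, d in zip(out[:length], out[length:2 * length]):
            merged.append((a + d) * INV_SQRT2)
            merged.append((a - d) * INV_SQRT2)
        out[:2 * length] = merged
        length *= 2
    return out


def reduce_document(term_vector: List[float], k: int) -> List[float]:
    """Hypothetical reduction step: keep only the k coarsest Haar coefficients."""
    return haar_forward(term_vector)[:k]


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    doc = [3.0, 1.0, 0.0, 0.0, 2.0, 2.0, 0.0, 1.0]    # toy term counts over 8 terms
    query = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
    print(dot(query, doc))                                           # score in term space
    print(dot(reduce_document(query, 8), reduce_document(doc, 8)))   # same, up to rounding
    print(dot(reduce_document(query, 2), reduce_document(doc, 2)))   # approximate, 2 dims
```

Each pass of `haar_forward` halves the working length, so it performs n/2 + n/4 + ... + 1 pairwise operations in total, i.e. O(n) per document and in place apart from temporaries, whereas standard LSI must first compute a (truncated) SVD of the whole term-document matrix.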
Index Terms
- Unitary operators for fast latent semantic indexing (FLSI)