Abstract
Efficient information searching and retrieval methods are needed to navigate the ever-increasing volumes of digital information. Traditional lexical information retrieval methods can be inefficient and often return inaccurate results. To overcome problems such as polysemy and synonymy, concept-based retrieval methods have been developed. One such method is Latent Semantic Indexing (LSI), a vector-space model that uses the singular value decomposition (SVD) of a term-by-document matrix to represent terms and documents in a k-dimensional space. As with other vector-space models, LSI is an attempt to exploit the underlying semantic structure of word usage in documents. During the query matching phase of LSI, a user's query is first projected into the term-document space and then compared to all terms and documents represented in the vector space. Using some similarity measure, the nearest (most relevant) terms and documents are identified and returned to the user. The current LSI query matching method requires that the similarity measure be computed between the query and every term and document in the vector space. In this paper, the kd-tree searching algorithm is used within a recent LSI implementation to reduce the time and computational complexity of query matching. The kd-tree data structure stores the term and document vectors in such a way that only those terms and documents that are most likely to qualify as nearest neighbors to the query are examined and retrieved.
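The contrast between exhaustive query matching and kd-tree matching can be illustrated with a minimal sketch. It is not the implementation described in the paper: the function names (`normalize`, `build`, `nearest`, `brute_force`), the toy dimension k = 3, and the random vectors standing in for LSI term and document vectors are all assumptions made for illustration. The sketch also assumes Euclidean distance on unit-length vectors, for which the nearest neighbor is also the vector with the highest cosine similarity.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length so Euclidean order matches cosine order."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def build(items, depth=0):
    """Build a kd-tree from (label, vector) pairs, cycling the split axis by depth."""
    if not items:
        return None
    axis = depth % len(items[0][1])
    items = sorted(items, key=lambda it: it[1][axis])
    mid = len(items) // 2  # the median item becomes the splitting node
    return {
        "item": items[mid],
        "axis": axis,
        "left": build(items[:mid], depth + 1),
        "right": build(items[mid + 1:], depth + 1),
    }


def nearest(node, query, best=None, stats=None):
    """Return (squared distance, label) of the stored vector closest to query."""
    if node is None:
        return best
    if stats is not None:
        stats["visited"] += 1  # count how many stored vectors are examined
    label, vec = node["item"]
    d2 = sum((a - b) ** 2 for a, b in zip(vec, query))
    if best is None or d2 < best[0]:
        best = (d2, label)
    diff = query[node["axis"]] - vec[node["axis"]]
    if diff < 0:
        near, far = node["left"], node["right"]
    else:
        near, far = node["right"], node["left"]
    best = nearest(near, query, best, stats)
    # Search the far side only if the splitting plane is closer than the best match.
    if diff * diff < best[0]:
        best = nearest(far, query, best, stats)
    return best


def brute_force(items, query):
    """Exhaustive matching: compare the query with every stored vector."""
    return min(
        (sum((a - b) ** 2 for a, b in zip(vec, query)), label)
        for label, vec in items
    )


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical stand-ins for k-dimensional LSI document vectors (k = 3 here).
    docs = [(f"doc{i}", normalize([rng.gauss(0, 1) for _ in range(3)]))
            for i in range(500)]
    tree = build(docs)
    query = normalize([rng.gauss(0, 1) for _ in range(3)])
    stats = {"visited": 0}
    print("kd-tree match:    ", nearest(tree, query, stats=stats)[1])
    print("brute-force match:", brute_force(docs, query)[1])
    print("vectors examined: ", stats["visited"], "of", len(docs))
```

The pruning test in `nearest` is what lets the search skip whole subtrees. How much work it saves depends on the dimension k and on how the vectors are distributed, so the count printed by this toy example should not be read as a performance claim for real LSI collections.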
Cite this article
Hughey, M., Berry, M. Improved Query Matching Using kd-Trees: A Latent Semantic Indexing Enhancement. Information Retrieval 2, 287–302 (2000). https://doi.org/10.1023/A:1009915010963