Modified LSI Model for Efficient Search by Metric Access Methods

Skopal, Tomáš; Moravec, Pavel

doi:10.1007/978-3-540-31865-1_18


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3408)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

Text collections represented in the LSI model are hard to search efficiently (i.e., quickly), since no indexing method exists for LSI matrices. The inverted file, often used in both the Boolean and the classic vector model, cannot be used effectively, because query vectors in the LSI model are dense. One possible route to efficient search in LSI matrices is the use of metric access methods (MAMs). Instead of the cosine measure, MAMs can use the deviation metric as an equivalent dissimilarity measure for query processing. However, the intrinsic dimensionality of collections represented by LSI matrices is often large, which degrades the search performance of MAMs. In this paper we introduce σ-LSI, a modification of LSI in which we artificially decrease the intrinsic dimensionality of LSI matrices. This is achieved by adjusting the singular values produced by SVD. We show that suitable adjustments can dramatically improve search efficiency with MAMs, while precision and recall are preserved or become only slightly worse.
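
As a rough illustration of the two ingredients named in the abstract, the following Python sketch builds rank-k LSI document vectors with NumPy and compares them with a deviation metric. It assumes the deviation metric is the angle between two vectors (the arccosine of the cosine measure). The abstract does not say how σ-LSI adjusts the singular values, so the `adjust` hook and the squaring function passed to it are hypothetical placeholders that only show where such an adjustment would plug in; they are not the authors' formula. The function names and the toy matrix are likewise made up for this example.

```python
import numpy as np


def lsi_document_vectors(term_doc, k, adjust=None):
    """Return rank-k LSI document vectors, one row per document.

    `adjust` is a hypothetical hook applied to the k largest singular
    values. The source does not specify the sigma-LSI adjustment, so
    whatever function is passed here is an assumption.
    """
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    s_k = s[:k]
    if adjust is not None:
        s_k = adjust(s_k)
    # Row j holds document j in the k-dimensional concept space,
    # scaled by the (possibly adjusted) singular values.
    return vt[:k].T * s_k


def deviation(x, y):
    """Angle between x and y in radians (assumed form of the deviation metric)."""
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    # Clip to guard against rounding slightly outside [-1, 1].
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


# Toy 6-term x 5-document matrix with made-up weights.
rng = np.random.default_rng(0)
term_doc = rng.random((6, 5))

docs = lsi_document_vectors(term_doc, k=3)
# Placeholder adjustment (squaring), for illustration only.
docs_adj = lsi_document_vectors(term_doc, k=3, adjust=lambda s: s ** 2)

# Rank documents by increasing deviation from a query vector.
query = docs[0]
ranking = sorted(range(len(docs)), key=lambda j: deviation(query, docs[j]))
```

Because the arccosine is monotonically decreasing, sorting by increasing angle gives the same order as sorting by decreasing cosine similarity, while the angle additionally satisfies the triangle inequality that metric access methods rely on for pruning.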

Author information

Authors and Affiliations

FMP, Department of Software Engineering, Charles University in Prague, Malostranské nám. 25, 118 00, Prague, Czech Republic
Tomáš Skopal
FEECS, Department of Computer Science, Technical University of Ostrava, 17. listopadu 15, 708 33, Ostrava, Czech Republic
Pavel Moravec

Editor information

Editors and Affiliations

Departamento de Electrónica y Computación, Universidad de Santiago de Compostela, Spain
David E. Losada
Departamento de Ciencias de la Computación e Inteligencia Artificial E.T.S.I. Informática y de Telecomunicación, Universidad de Granada, 18071, Granada, Spain
Juan M. Fernández-Luna

Cite this paper

Skopal, T., Moravec, P. (2005). Modified LSI Model for Efficient Search by Metric Access Methods. In: Losada, D.E., Fernández-Luna, J.M. (eds) Advances in Information Retrieval. ECIR 2005. Lecture Notes in Computer Science, vol 3408. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31865-1_18

DOI: https://doi.org/10.1007/978-3-540-31865-1_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25295-5
Online ISBN: 978-3-540-31865-1
eBook Packages: Computer Science, Computer Science (R0)
