Abstract
In text retrieval, query processing in the vector model has been shown to be qualitatively more effective than searching in the Boolean model. However, current methods for processing many-term queries in the classic vector model are inefficient, and for the LSI (latent semantic indexing) model no efficient method exists even for few-term queries. In this paper we propose a method of vector query processing based on metric indexing, which is efficient especially for the LSI model. In addition, we propose the concept of approximate semi-metric search, which can further improve the efficiency of the retrieval process. Results of experiments on a moderately sized text collection are included.
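To illustrate the general idea of metric indexing for vector queries, the following minimal sketch assumes the angle between two vectors (the arccosine of their cosine similarity) as the distance. The abstract does not name the distance function or the index structure, so that choice, the single-pivot filter, and all function names here are illustrative assumptions rather than the authors' method. The sketch shows how the triangle inequality lets a range query skip documents without computing their distance to the query.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def angular_distance(u, v):
    """Angle in radians between u and v; unlike raw cosine
    similarity, the angle satisfies the triangle inequality."""
    # Clamp to [-1, 1] to guard against floating-point rounding.
    c = max(-1.0, min(1.0, cosine_similarity(u, v)))
    return math.acos(c)

def build_index(docs, pivot):
    """Hypothetical one-pivot index: store each document's
    distance to the pivot (computed once, at indexing time)."""
    return [angular_distance(pivot, d) for d in docs]

def range_query(query, radius, docs, pivot, pivot_dists):
    """Return (indices of documents within `radius` of the query,
    number of documents pruned without a distance computation)."""
    d_qp = angular_distance(query, pivot)
    hits, skipped = [], 0
    for i, (doc, d_pd) in enumerate(zip(docs, pivot_dists)):
        # Triangle inequality: d(q, doc) >= |d(q, p) - d(p, doc)|.
        # If that lower bound exceeds the radius, doc cannot qualify.
        if abs(d_qp - d_pd) > radius:
            skipped += 1
            continue
        if angular_distance(query, doc) <= radius:
            hits.append(i)
    return hits, skipped

docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pivot = [1.0, 0.0]
pivot_dists = build_index(docs, pivot)
hits, skipped = range_query([1.0, 0.1], 0.2, docs, pivot, pivot_dists)
```

In this toy run, two of the three documents are pruned by the pivot filter alone. Tree-based metric indexes apply the same lower-bound argument hierarchically, to whole regions of documents rather than to one document at a time.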
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Skopal, T., Moravec, P., Pokorný, J., Snášel, V. (2004). Metric Indexing for the Vector Model in Text Retrieval. In: Apostolico, A., Melucci, M. (eds) String Processing and Information Retrieval. SPIRE 2004. Lecture Notes in Computer Science, vol 3246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30213-1_26
DOI: https://doi.org/10.1007/978-3-540-30213-1_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23210-0
Online ISBN: 978-3-540-30213-1
eBook Packages: Springer Book Archive