Abstract
To speed up retrieval in large collections of data, index structures partition the data into subsets so that queries can be evaluated without examining the entire collection. As the complexity of modern data types grows, metric spaces have become a popular paradigm for similarity retrieval. We propose a new index structure, called D-Index, that combines a novel clustering technique with the pivot-based distance searching strategy to speed up the execution of similarity range and nearest neighbor queries over large files whose objects are stored on disk. We have qualitatively analyzed D-Index and verified its properties on an actual implementation. We have also compared D-Index with other index structures and demonstrated its superiority on several real-life data sets. In contrast to tree organizations, the D-Index structure is suitable for dynamic environments with a high rate of delete/insert operations.
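For readers unfamiliar with the pivot-based strategy the abstract mentions, the following is a minimal sketch of generic pivot filtering for a similarity range query in a metric space. It is not the D-Index itself and does not include its clustering technique. The class name `PivotTable`, the `euclidean` helper, and the choice of pivots are assumptions made only for illustration. The idea is that, by the triangle inequality, an object o can be discarded without computing d(q, o) whenever |d(q, p) - d(o, p)| > r for some pivot p.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance; any metric satisfying the triangle inequality works."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PivotTable:
    """Hypothetical illustration: precomputed object-to-pivot distances."""

    def __init__(
        self,
        objects: Sequence[Point],
        pivots: Sequence[Point],
        dist: Callable[[Point, Point], float],
    ) -> None:
        self.objects = list(objects)
        self.pivots = list(pivots)
        self.dist = dist
        # One row per object, one column per pivot.
        self.table: List[List[float]] = [
            [dist(o, p) for p in self.pivots] for o in self.objects
        ]

    def range_query(self, q: Point, r: float) -> Tuple[List[Point], int]:
        """Return objects within distance r of q, plus the number of
        object distances that actually had to be computed."""
        q_dists = [self.dist(q, p) for p in self.pivots]
        results: List[Point] = []
        computed = 0
        for o, row in zip(self.objects, self.table):
            # Triangle inequality: |d(q,p) - d(o,p)| <= d(q,o).
            # If the lower bound already exceeds r, o cannot qualify.
            if any(abs(dq - do) > r for dq, do in zip(q_dists, row)):
                continue
            computed += 1
            if self.dist(q, o) <= r:
                results.append(o)
        return results, computed
```

A usage example under the same assumptions: build the table over a 10 x 10 grid of points with two corner pivots, then call `PivotTable(objects, pivots, euclidean).range_query((5.0, 5.0), 1.0)`. The returned count shows how many of the 100 objects survived the pivot filter and needed a real distance computation.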
Cite this article
Dohnal, V., Gennaro, C., Savino, P. et al. D-Index: Distance Searching Index for Metric Data Sets. Multimedia Tools and Applications 21, 9–33 (2003). https://doi.org/10.1023/A:1025026030880
DOI: https://doi.org/10.1023/A:1025026030880