Distributed Processing of Similarity Queries

Papadopoulos, Apostolos N.; Manolopoulos, Yannis

doi:10.1023/A:1026509108054

Published: January 2001

Distributed and Parallel Databases, Volume 9, pages 67–92 (2001)

Abstract

Many modern applications in diverse fields demand the efficient manipulation of very large multidimensional datasets. Efficient and effective query processing techniques are therefore needed to provide acceptable response times. In this paper, we study the processing of nearest-neighbor similarity queries in large distributed multidimensional databases, where objects are represented as vectors in a vector space and are distributed in a multi-computer environment. The departure from the centralized case brings a number of advantages and (unfortunately) a number of difficulties that need to be overcome. To this end, four query evaluation strategies are presented, namely Concurrent Processing (CP), Selective Processing (SP), Two-Phase Processing (2PP) and Probabilistic Processing (PRP). The proposed techniques are compared analytically and experimentally, in order to identify the advantages of each one and the cases where each is best applied. Experimental results are presented, demonstrating the performance of each method under different parameter values. We also investigate the impact of the derived data that should be maintained in order to process similarity queries efficiently.
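To make the setting concrete, the following is a minimal sketch of a k-nearest-neighbor query over vectors partitioned across several sites. It is not a reproduction of any of the four strategies, which the abstract names but does not define. It only illustrates the baseline idea of asking every site for its local k best matches in parallel and merging the answers. The function names (`local_knn`, `distributed_knn`), the use of threads to stand in for remote servers, and the Euclidean distance are all assumptions made for illustration.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor


def local_knn(points, query, k):
    """Return the k points of one site closest to the query as (distance, point) pairs."""
    return heapq.nsmallest(k, ((math.dist(p, query), p) for p in points))


def distributed_knn(sites, query, k):
    """Query every site concurrently, then merge the local answers into a global top-k.

    Threads stand in for remote servers here (an assumption of this sketch).
    """
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        partial = list(pool.map(lambda pts: local_knn(pts, query, k), sites))
    # Any global k-nearest neighbor is also among the k nearest at its own site,
    # so merging the local top-k lists is sufficient.
    return heapq.nsmallest(k, (pair for answer in partial for pair in answer))


# Three hypothetical sites, each holding a few 2-D vectors.
sites = [
    [(0.0, 0.0), (5.0, 5.0)],
    [(1.0, 1.0), (9.0, 9.0)],
    [(2.0, 2.0), (0.5, 0.5)],
]
result = distributed_knn(sites, (0.0, 0.0), 2)
nearest_points = [point for _, point in result]
```

In this baseline every site does work and returns k candidates even when few of them can reach the final answer. That cost is one plausible motivation for strategies that contact sites selectively or in phases.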

Author information

Authors and Affiliations

Apostolos N. Papadopoulos, Yannis Manolopoulos
Data Engineering Research Lab., Department of Informatics, Aristotle University, 54006, Thessaloniki, Greece

About this article

Cite this article

Papadopoulos, A.N., Manolopoulos, Y. Distributed Processing of Similarity Queries. Distributed and Parallel Databases 9, 67–92 (2001). https://doi.org/10.1023/A:1026509108054

Issue Date: January 2001
DOI: https://doi.org/10.1023/A:1026509108054
