
Distributed Processing of Similarity Queries

Distributed and Parallel Databases

Abstract

Many modern applications in diverse fields require the efficient manipulation of very large multidimensional datasets, so efficient and effective query processing techniques are needed to provide acceptable response times. In this paper, we study the processing of similarity nearest neighbor queries in large distributed multidimensional databases, where objects are represented as vectors in a vector space and are distributed across a multi-computer environment. Departing from the centralized case brings a number of advantages and, unfortunately, a number of difficulties that must be overcome. From this perspective, four query evaluation strategies are presented: Concurrent Processing (CP), Selective Processing (SP), Two-Phase Processing (2PP), and Probabilistic Processing (PRP). The proposed techniques are compared analytically and experimentally to identify the advantages of each and the cases in which each should be applied. Experimental results demonstrate the performance of each method under different parameter values. We also investigate the impact of the derived data that must be maintained in order to process similarity queries efficiently.
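
To make the problem setting concrete, the following is a minimal sketch of a k-nearest-neighbor query over vectors partitioned across several sites. It is an illustration only and does not reproduce CP, SP, 2PP, or PRP as the paper defines them, since the abstract does not specify those algorithms. The function names `local_knn` and `distributed_knn`, the use of Euclidean distance, and the in-memory lists standing in for remote sites are all assumptions made for the example.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor


def local_knn(site_vectors, query, k):
    """Return the k nearest vectors held by one site as (distance, vector) pairs."""
    # Euclidean distance is an assumption; any metric could be substituted.
    return heapq.nsmallest(k, ((math.dist(v, query), v) for v in site_vectors))


def distributed_knn(sites, query, k):
    """Ask every site for its local k nearest, then merge into the global k nearest."""
    # Threads stand in for remote sites answering in parallel (hypothetical setup).
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        partials = pool.map(lambda s: local_knn(s, query, k), sites)
        candidates = [pair for part in partials for pair in part]
    # Every global k-nearest neighbor is among its own site's local k nearest,
    # so merging the local answers is sufficient.
    return heapq.nsmallest(k, candidates)


# Three hypothetical sites, each holding a few 2-D vectors.
sites = [
    [(0.0, 0.0), (5.0, 5.0)],
    [(1.0, 1.0), (9.0, 9.0)],
    [(0.5, 0.0), (7.0, 1.0)],
]
result = distributed_knn(sites, (0.0, 0.0), 2)
```

In this naive scheme every site is contacted and each returns k candidates, so the coordinator receives k times the number of sites and keeps only k. That cost is the kind of overhead that motivates weighing alternative evaluation strategies against one another.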

Papadopoulos, A.N., Manolopoulos, Y. Distributed Processing of Similarity Queries. Distributed and Parallel Databases 9, 67–92 (2001). https://doi.org/10.1023/A:1026509108054