Efficient range query processing in metric spaces over highly distributed data

Distributed and Parallel Databases

Abstract

Similarity search in P2P systems has attracted considerable attention recently, and several important applications, such as distributed image search, can profit from the proposed distributed algorithms. In this paper, we address the challenging problem of efficiently processing range queries in metric spaces, where data is horizontally distributed across a super-peer network. Our approach relies on SIMPEER (Doulkeridis et al. in Proceedings of VLDB, pp. 986–997, 2007), a framework that dynamically clusters peer data in order to build distributed routing information at the super-peer level. SIMPEER allows the evaluation of exact range and nearest neighbor queries in a distributed manner that reduces communication cost, network latency, bandwidth consumption and computational overhead at each individual peer. We extend SIMPEER by focusing on efficient range query processing and providing recall-based guarantees for the quality of the result retrieved so far. This is especially useful for range queries that produce result sets of high cardinality and incur high processing costs, where the complete result set would overwhelm the user. Our framework employs statistics to estimate an upper bound on the number of possible results for a range query, so that each super-peer may decide not to propagate the query further, thereby reducing the scope of the search. We provide an experimental evaluation of our framework and show that our approach performs efficiently, even under a high degree of distribution.
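
The two ideas named in the abstract, cluster-based routing and recall-based early termination, can be illustrated with a minimal single-process sketch. This is not the SIMPEER implementation: the `Cluster` class, the function names, and the stopping rule below are assumptions made for illustration only. The sketch relies on the triangle inequality (a cluster with a given center and radius can contain results of a range query only if the distance from the query to the center is at most the query radius plus the cluster radius) and on the observation that, if `upper_bound` is at least the true result count, then `len(results) / upper_bound` never overestimates the recall achieved so far.

```python
import math
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical cluster summary: a center, a covering radius, and members."""
    center: tuple
    radius: float   # max distance from center to any member
    points: list    # member objects (in a real system, held by some peer)


def dist(a, b):
    # Euclidean distance is used here; any metric would work the same way.
    return math.dist(a, b)


def may_intersect(cluster, q, r):
    # Triangle inequality: the query ball (q, r) can overlap the cluster
    # only if d(q, center) <= r + cluster radius.
    return dist(q, cluster.center) <= r + cluster.radius


def range_query(clusters, q, r, upper_bound=None, target_recall=1.0):
    """Collect points within distance r of q, skipping clusters that cannot
    contribute. If an upper bound on the total result count is supplied
    (an assumed statistic), stop once the guaranteed recall is reached."""
    results = []
    for c in clusters:
        if not may_intersect(c, q, r):
            continue  # pruned: no member can lie within r of q
        results.extend(p for p in c.points if dist(q, p) <= r)
        if upper_bound and len(results) / upper_bound >= target_recall:
            break  # recall guarantee met; do not search further
    return results


clusters = [
    Cluster((0, 0), 1.0, [(0, 0), (0.5, 0), (1, 0)]),
    Cluster((10, 10), 1.0, [(10, 10), (10.5, 10)]),
    Cluster((2, 0), 1.0, [(1.5, 0), (3, 0)]),
]
exact = range_query(clusters, (0, 0), 1.6)
partial = range_query(clusters, (0, 0), 1.6, upper_bound=4, target_recall=0.5)
```

Here `exact` visits every cluster that may intersect the query, while `partial` stops as soon as at least half of the (assumed) upper bound of 4 results has been retrieved, which mirrors a super-peer choosing not to propagate the query further.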

References

  1. Banaei-Kashani, F., Shahabi, C.: SWAM: a family of access methods for similarity-search in peer-to-peer data networks. In: Proceedings of CIKM, pp. 304–313 (2004)

  2. Batko, M., Gennaro, C., Zezula, P.: A scalable nearest neighbor search in P2P systems. In: Proceedings of DBISP2P, pp. 79–92 (2004)

  3. Batko, M., Novak, D., Falchi, F., Zezula, P.: On scalability of the similarity search in the world of peers. In: Proceedings of InfoScale, p. 20 (2006)

  4. Bawa, M., Condie, T., Ganesan, P.: LSH forest: self-tuning indexes for similarity search. In: Proceedings of WWW, pp. 651–660 (2005)

  5. Bharambe, A.R., Agrawal, M., Seshan, S.: Mercury: supporting scalable multi-attribute range queries. In: Proceedings of SIGCOMM, pp. 353–366 (2004)

  6. Chavez, E., Navarro, G., Baeza-Yates, R., Marroquin, J.L.: Searching in metric spaces. ACM Comput. Surv. 33(3), 273–321 (2001)


  7. Ciaccia, P., Patella, M., Zezula, P.: M-tree: an efficient access method for similarity search in metric spaces. In: Proceedings of VLDB, pp. 426–435 (1997)

  8. Ciaccia, P., Patella, M., Zezula, P.: A cost model for similarity queries in metric spaces. In: Proceedings of PODS, pp. 59–68 (1998)

  9. Crainiceanu, A., Linga, P., Gehrke, J., Shanmugasundaram, J.: P-tree: a P2P index for resource discovery applications. In: Proceedings of WWW (2004)

  10. Crainiceanu, A., Linga, P., Machanavajjhala, A., Gehrke, J., Shanmugasundaram, J.: P-ring: an efficient and robust P2P range index structure. In: Proceedings of SIGMOD, pp. 223–234 (2007)

  11. Crespo, A., Garcia-Molina, H.: Routing indices for peer-to-peer systems. In: Proceedings of ICDCS, pp. 23–32 (2002)

  12. Datta, A., Hauswirth, M., John, R., Schmidt, R., Aberer, K.: Range queries in trie-structured overlays. In: Proceedings of P2P, pp. 57–66 (2005)

  13. Doulkeridis, C., Vlachou, A., Kotidis, Y., Vazirgiannis, M.: Peer-to-peer similarity search in metric spaces. In: Proceedings of VLDB, pp. 986–997 (2007)

  14. Falchi, F., Gennaro, C., Zezula, P.: A content-addressable network for similarity search in metric spaces. In: Proceedings of DBISP2P, pp. 126–137 (2005)

  15. Ganesan, P., Bawa, M., Garcia-Molina, H.: Online balancing of range-partitioned data with applications to peer-to-peer systems. In: Proceedings of VLDB, pp. 444–455 (2004)

  16. Hjaltason, G.R., Samet, H.: Index-driven similarity search in metric spaces. ACM Trans. Database Syst. 28(4), 517–580 (2003)


  17. Jagadish, H.V., Ooi, B.C., Tan, K.-L., Yu, C., Zhang, R.: iDistance: an adaptive B+-tree based indexing method for nearest neighbor search. ACM Trans. Database Syst. 30(2), 364–397 (2005)


  18. Jagadish, H.V., Ooi, B.C., Vu, Q.H.: BATON: a balanced tree structure for peer-to-peer networks. In: Proceedings of VLDB, pp. 661–672 (2005)

  19. Jagadish, H.V., Ooi, B.C., Vu, Q.H., Zhang, R., Zhou, A.: VBI-tree: a peer-to-peer framework for supporting multi-dimensional indexing schemes. In: Proceedings of ICDE, p. 34 (2006)

  20. Kalnis, P., Ng, W.S., Ooi, B.C., Tan, K.-L.: Answering similarity queries in peer-to-peer networks. Inf. Syst. 31(1), 57–72 (2006)


  21. Tung, A.K.H., Zhang, R., Koudas, N., Ooi, B.C.: Similarity search: a matching based approach. In: Proceedings of VLDB, pp. 631–642 (2006)

  22. Liu, B., Lee, W.-C., Lee, D.L.: Supporting complex multi-dimensional queries in P2P systems. In: Proceedings of ICDCS, pp. 155–164 (2005)

  23. Novak, D., Zezula, P.: M-Chord: a scalable distributed similarity search structure. In: Proceedings of InfoScale, p. 19 (2006)

  24. Ntarmos, N., Pitoura, T., Triantafillou, P.: Range query optimization leveraging peer heterogeneity. In: Proceedings of DBISP2P (2005)

  25. Ratnasamy, S., Francis, P., Handley, M., Karp, R., Schenker, S.: A scalable content-addressable network. In: Proceedings of SIGCOMM, pp. 161–172 (2001)

  26. Shen, H.T., Shu, Y., Yu, B.: Efficient semantic-based content search in P2P network. IEEE Trans. Knowl. Data Eng. 16(7), 813–826 (2004)


  27. Shu, Y., Ooi, B.C., Tan, K.-L., Zhou, A.: Supporting multi-dimensional range queries in peer-to-peer systems. In: Proceedings of P2P, pp. 173–180 (2005)

  28. Stoica, I., Morris, R., Karger, D., Kaashoek, M.F., Balakrishnan, H.: Chord: a scalable peer-to-peer lookup service for internet applications. In: Proceedings of SIGCOMM, pp. 149–160 (2001)

  29. Vlachou, A., Doulkeridis, C., Kotidis, Y., Vazirgiannis, M.: SKYPEER: efficient subspace skyline computation over distributed data. In: Proceedings of ICDE, pp. 416–425 (2007)

  30. Yang, B., Garcia-Molina, H.: Designing a super-peer network. In: Proceedings of ICDE, pp. 49–60 (2003)

  31. Yu, C., Ooi, B.C., Tan, K.-L., Jagadish, H.V.: Indexing the distance: an efficient method to KNN processing. In: Proceedings of VLDB (2001)

Author information

Corresponding author

Correspondence to Christos Doulkeridis.

About this article

Cite this article

Doulkeridis, C., Vlachou, A., Kotidis, Y. et al. Efficient range query processing in metric spaces over highly distributed data. Distrib Parallel Databases 26, 155 (2009). https://doi.org/10.1007/s10619-009-7047-6

  • DOI: https://doi.org/10.1007/s10619-009-7047-6
