Supporting K nearest neighbors query on high-dimensional data in P2P systems

  • Research Article
  • Frontiers of Computer Science in China

Abstract

Peer-to-peer (P2P) systems have been widely used for sharing and exchanging data and resources among large numbers of computer nodes. Many kinds of data objects that can be described by high-dimensional feature vectors, such as text, images, and genome sequences, are starting to leverage P2P technology. Most existing work has focused on queries over data objects with one or a few attributes and is therefore not applicable to high-dimensional data objects. In this study, we investigate K nearest neighbors (KNN) queries on high-dimensional data objects in P2P systems. We propose an efficient query algorithm, along with solutions to the technical challenges raised by high dimensionality, such as search space resolution and incremental search space refinement. An extensive simulation using both synthetic and real data sets demonstrates that our proposal efficiently supports KNN queries on high-dimensional data in P2P systems.
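The abstract does not describe the query algorithm itself, so the following is only a generic illustration of distributed KNN search, not the authors' method. It is a minimal sketch, assuming each peer owns an axis-aligned region of the feature space (the hypothetical `Peer` class and `build_peers` partitioning) and that the querier can see every peer's region bounds (a global frontier, which a real overlay would instead build from neighbor links). Peers are contacted in increasing order of their lower-bound distance `min_dist` to the query, and the current K-th best distance serves as a search radius that shrinks as better candidates arrive, loosely analogous to incremental search space refinement.

```python
from __future__ import annotations

import heapq
import math
import random
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical peer: owns an axis-aligned box [lo, hi] and the points inside it."""
    lo: list[float]
    hi: list[float]
    points: list[tuple[float, ...]] = field(default_factory=list)


def build_peers(points, lo, hi, capacity, depth=0):
    """Hypothetical partitioning: halve the widest side of the box until a peer holds <= capacity points."""
    if len(points) <= capacity or depth >= 40:
        return [Peer(list(lo), list(hi), list(points))]
    dim = max(range(len(lo)), key=lambda j: hi[j] - lo[j])
    mid = (lo[dim] + hi[dim]) / 2
    left_hi = list(hi)
    left_hi[dim] = mid
    right_lo = list(lo)
    right_lo[dim] = mid
    left = [p for p in points if p[dim] < mid]
    right = [p for p in points if p[dim] >= mid]
    return (build_peers(left, lo, left_hi, capacity, depth + 1)
            + build_peers(right, right_lo, hi, capacity, depth + 1))


def min_dist(q, lo, hi):
    """Lower bound: Euclidean distance from q to the nearest point of the box [lo, hi]."""
    total = 0.0
    for x, a, b in zip(q, lo, hi):
        if x < a:
            total += (a - x) ** 2
        elif x > b:
            total += (x - b) ** 2
    return math.sqrt(total)


def knn_brute_force(points, q, k):
    """Reference answer: the same result a query contacting every peer would return."""
    return sorted((math.dist(q, p), p) for p in points)[:k]


def knn_best_first(peers, q, k):
    """Contact peers in increasing min_dist order; stop once no remaining peer can beat the K-th distance.

    Returns (list of (distance, point) sorted by distance, number of peers contacted).
    """
    if k <= 0:
        return [], 0
    frontier = [(min_dist(q, p.lo, p.hi), i) for i, p in enumerate(peers)]
    heapq.heapify(frontier)
    best = []  # max-heap of the current K candidates, stored as (-distance, point)
    contacted = 0
    while frontier:
        bound, i = heapq.heappop(frontier)
        # The K-th best distance acts as a search radius that shrinks as candidates improve.
        if len(best) == k and bound > -best[0][0]:
            break
        contacted += 1
        for pt in peers[i].points:
            d = math.dist(q, pt)
            if len(best) < k:
                heapq.heappush(best, (-d, pt))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, pt))
    return sorted((-neg_d, pt) for neg_d, pt in best), contacted


if __name__ == "__main__":
    random.seed(7)
    dims = 16
    data = [tuple(random.random() for _ in range(dims)) for _ in range(2000)]
    peers = build_peers(data, [0.0] * dims, [1.0] * dims, capacity=50)
    q = tuple(random.random() for _ in range(dims))
    result, contacted = knn_best_first(peers, q, k=5)
    assert [d for d, _ in result] == [d for d, _ in knn_brute_force(data, q, 5)]
    print(f"contacted {contacted} of {len(peers)} peers")
```

Because every point lies inside its peer's box, `min_dist` never overestimates, so the pruned search returns the same K distances as the brute-force scan. With 16 dimensions and only a few thousand points, each box is split along only a handful of dimensions, so the bounds stay loose and the query may contact a large fraction of peers; this weak pruning is one illustration of why high dimensionality makes distributed KNN search difficult.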

Author information

Correspondence to Wang-Chien Lee.

About this article

Cite this article

Li, M., Lee, WC., Sivasubramaniam, A. et al. Supporting K nearest neighbors query on high-dimensional data in P2P systems. Front. Comput. Sci. China 2, 234–247 (2008). https://doi.org/10.1007/s11704-008-0026-7

  • DOI: https://doi.org/10.1007/s11704-008-0026-7