
Multi-dimensional data density estimation in P2P networks

Distributed and Parallel Databases

Abstract

Estimating the global data distribution in Peer-to-Peer (P2P) networks is an important problem that has not yet been well addressed. Such an estimate benefits many P2P applications, including load-balancing analysis, query processing, and data mining. In this paper, we propose a novel algorithm based on compact multi-dimensional histogram information that achieves high estimation accuracy at low estimation cost. The data distribution is maintained in a multi-dimensional histogram that is partitioned among peers without overlap, and each partition is further condensed into a set of discrete cosine transform (DCT) coefficients. By exchanging information, each peer hierarchically accumulates this compact information into the entire histogram and can then estimate the global data density accurately and efficiently. We present algorithms for hierarchically accumulating the DCT coefficients, together with a detailed theoretical analysis and proof of the density estimation error. Our extensive performance study confirms the effectiveness and efficiency of our methods for density estimation in dynamic P2P networks.
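The abstract gives only the outline of the method. The minimal Python sketch below illustrates why DCT-condensed histogram partitions can be accumulated cheaply, and it rests on several assumptions. First, it assumes a shared 2-D n × n grid. Second, each peer's partition is treated as a histogram that is zero outside the cells it owns. Third, it uses an orthonormal DCT-II. Fourth, a simple zonal rule keeps the coefficients with u + v < keep. Because the DCT is linear, the retained coefficients of the global histogram equal the sum of the peers' retained coefficients. The DC term alone yields the total count. This is not the authors' algorithm: the function names `compress`, `merge`, and `estimate_density` are hypothetical, and neither the hierarchical exchange protocol nor the paper's error analysis is modelled.

```python
import math
from typing import Dict, List, Tuple

Grid = List[List[float]]
Coeffs = Dict[Tuple[int, int], float]


def dct_basis(n: int) -> List[List[float]]:
    """Orthonormal DCT-II basis: basis[k][i] = a(k) * cos(pi * (2i + 1) * k / (2n))."""
    basis = []
    for k in range(n):
        a = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        basis.append([a * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return basis


def compress(hist: Grid, keep: int) -> Coeffs:
    """2-D DCT of an n x n histogram, keeping only the low-frequency
    coefficients with u + v < keep (assumed zonal selection rule)."""
    n = len(hist)
    c = dct_basis(n)
    coeffs: Coeffs = {}
    for u in range(n):
        for v in range(n):
            if u + v < keep:
                coeffs[(u, v)] = sum(
                    c[u][x] * c[v][y] * hist[x][y] for x in range(n) for y in range(n)
                )
    return coeffs


def merge(a: Coeffs, b: Coeffs) -> Coeffs:
    """The DCT is linear, so the coefficients of the sum of two
    non-overlapping partial histograms are the sum of their coefficients."""
    out = dict(a)
    for key, val in b.items():
        out[key] = out.get(key, 0.0) + val
    return out


def estimate_density(coeffs: Coeffs, n: int, x: int, y: int) -> float:
    """Inverse DCT at cell (x, y) from the retained coefficients, divided by
    the total count, which the DC term encodes as coeffs[(0, 0)] * n."""
    c = dct_basis(n)
    cell = sum(val * c[u][x] * c[v][y] for (u, v), val in coeffs.items())
    total = coeffs[(0, 0)] * n
    return cell / total


# Two hypothetical peers, each owning a disjoint half of a 4 x 4 grid.
n = 4
peer_a = [[3, 1, 0, 0], [2, 2, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
peer_b = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 1, 4, 2], [0, 0, 2, 3]]

approx = merge(compress(peer_a, keep=3), compress(peer_b, keep=3))  # 6 coefficients each
exact = merge(compress(peer_a, keep=2 * n - 1), compress(peer_b, keep=2 * n - 1))  # all 16
print(estimate_density(approx, n, 2, 2), estimate_density(exact, n, 2, 2))
```

With every coefficient retained, the estimate at cell (2, 2) matches the true fraction 4/21 up to floating-point rounding. With only 6 of 16 coefficients, peers exchange far less data but obtain a smoothed estimate. Truncated estimates can even be slightly negative in sparse cells. In both cases the estimated densities still sum to 1, because only the DC basis function has a nonzero sum over the grid.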



Author information

Correspondence to Aoying Zhou.

About this article

Cite this article

Zhou, M., Qian, W., Gong, X. et al. Multi-dimensional data density estimation in P2P networks. Distrib Parallel Databases 26, 261 (2009). https://doi.org/10.1007/s10619-009-7045-8

  • DOI: https://doi.org/10.1007/s10619-009-7045-8
