Efficient parallel processing of range queries through replicated declustering

Distributed and Parallel Databases

Abstract

A common technique for minimizing I/O in data-intensive applications is declustering data over parallel servers. This technique distributes data among several disks so that query retrieval can be parallelized, thus improving performance. We focus on optimizing access to large spatial data and the most common type of query on such data, namely range queries. An optimal declustering scheme is one in which the processing of every range query is balanced uniformly among the available disks. It has been shown that single-copy declustering schemes are non-optimal for range queries. In this paper, we combine replication with parallel disk declustering for efficient processing of range queries. We note that replication is widely used in database applications for purposes such as load balancing, fault tolerance, and data availability. We establish theoretical foundations for replicated declustering and propose a class of replicated declustering schemes, periodic allocations, which are shown to be strictly optimal for certain numbers of disks. We propose a framework for replicated declustering that uses a limited amount of replication, and we provide extensions for applying it to real data, including arbitrary grids and large numbers of disks. Our framework also provides an effective indexing scheme that enables fast identification of the data of interest on parallel servers. In addition to optimal processing of single queries, we show that this framework is effective for parallel processing of multiple queries. We present experimental results comparing the proposed replication scheme with other techniques for both single and multiple queries, on synthetic and real data sets.
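To make the cost model behind this abstract concrete, here is a minimal Python sketch. It assumes a 2-D grid of buckets, M disks, and a retrieval cost equal to the largest number of buckets any one disk must read (the lower bound is ceil(|Q|/M)). The first copy uses the classic Disk Modulo mapping (i + j) mod M; the second copy, (i + 2j) mod M, is a hypothetical linear-modular stand-in and not necessarily one of the paper's periodic allocations. With replicas, each bucket is read from one of its copies; the sketch chooses copies with a capacitated bipartite matching (equivalent to a max-flow formulation), which is an illustrative choice rather than the paper's confirmed retrieval algorithm. All function names are hypothetical.

```python
from functools import partial
from itertools import combinations_with_replacement
from math import ceil


def linear_alloc(i, j, m, a=1, b=1):
    """Map bucket (i, j) to a disk in [0, m) as (a*i + b*j) mod m.

    a = b = 1 is Disk Modulo; other (a, b) pairs are a hypothetical
    stand-in for a periodic allocation used for a second copy.
    """
    return (a * i + b * j) % m


def range_buckets(x0, y0, x1, y1):
    """All buckets of the inclusive range query [x0..x1] x [y0..y1]."""
    return [(i, j) for i in range(x0, x1 + 1) for j in range(y0, y1 + 1)]


def single_copy_cost(buckets, alloc, m):
    """Parallel I/O cost with one copy: the most buckets any disk must read."""
    load = [0] * m
    for i, j in buckets:
        load[alloc(i, j, m)] += 1
    return max(load)


def _feasible(buckets, allocs, m, t):
    """True if every bucket can be read from one of its replica disks
    with at most t reads per disk (augmenting-path bipartite matching)."""
    assigned = [[] for _ in range(m)]  # disk -> indices of buckets it reads

    def augment(b, seen):
        for d in {f(*buckets[b], m) for f in allocs}:
            if d in seen:
                continue
            seen.add(d)
            if len(assigned[d]) < t:
                assigned[d].append(b)
                return True
            # Disk d is full: try to move one of its buckets to another replica.
            for k, other in enumerate(assigned[d]):
                if augment(other, seen):
                    assigned[d][k] = b
                    return True
        return False

    return all(augment(b, set()) for b in range(len(buckets)))


def replicated_cost(buckets, allocs, m):
    """Smallest per-disk read count when each bucket has one copy per alloc."""
    t = ceil(len(buckets) / m)  # no layout can beat ceil(|Q| / m)
    while not _feasible(buckets, allocs, m, t):
        t += 1
    return t


def all_queries(n):
    """Every axis-aligned range query on an n x n grid."""
    spans = list(combinations_with_replacement(range(n), 2))  # lo <= hi
    for x0, x1 in spans:
        for y0, y1 in spans:
            yield range_buckets(x0, y0, x1, y1)


if __name__ == "__main__":
    n, m = 4, 4
    disk_modulo = linear_alloc                     # copy 1: (i + j) mod m
    second_copy = partial(linear_alloc, a=1, b=2)  # hypothetical copy 2
    queries = list(all_queries(n))
    opt_1 = sum(single_copy_cost(q, disk_modulo, m) == ceil(len(q) / m)
                for q in queries)
    opt_2 = sum(replicated_cost(q, [disk_modulo, second_copy], m)
                == ceil(len(q) / m) for q in queries)
    print(f"{len(queries)} queries; optimal with 1 copy: {opt_1}; "
          f"with 2 copies: {opt_2}")
```

For the 2 x 2 query at the grid origin with M = 4, Disk Modulo places two of the four buckets on disk 1, so it needs 2 parallel reads; with the second copy, one bucket can be read from each disk, meeting the bound ceil(|Q|/M) = 1. Adding a copy can never raise the cost, because reading every bucket from the first copy remains a valid schedule.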

Author information

Correspondence to Guadalupe Canahuate.

Additional information

Recommended by: Ahmed Elmagarmid

Supported by U.S. Department of Energy (DOE) Award No. DE-FG02-03ER25573, and National Science Foundation (NSF) grant CNS-0403342.

About this article

Cite this article

Ferhatosmanoglu, H., Tosun, A.Ş., Canahuate, G. et al. Efficient parallel processing of range queries through replicated declustering. Distrib Parallel Databases 20, 117–147 (2006). https://doi.org/10.1007/s10619-006-9362-5

  • DOI: https://doi.org/10.1007/s10619-006-9362-5