research-article

Divide-and-conquer scheme for strictly optimal retrieval of range queries

Published: 30 November 2009

Abstract

Declustering distributes data among parallel disks to reduce retrieval cost using I/O parallelism. Many schemes have been proposed for single-copy declustering of spatial data. Recently, declustering using replication has gained a lot of interest, and several schemes with different properties have been proposed. It is computationally expensive to verify the optimality of replication schemes designed for range queries, and existing schemes verify optimality for up to 50 disks. In this article, we propose a novel method to find replicated declustering schemes that render all spatial range queries optimal. The proposed scheme uses threshold-based declustering, divisibility of large queries for optimization, and an optimistic approach to computing maximum flow. The proposed scheme is generic and works for any number of dimensions. Experimental results show that, using 3 copies, there exist allocations that render all spatial range queries optimal for up to 750 disks in 2 dimensions and, with the exception of several values, for up to 100 disks in 3 dimensions. The proposed scheme significantly improves the search for strictly optimal replicated declustering schemes and will be a valuable tool for answering open problems on replicated declustering.
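To make "strictly optimal" concrete: a range query covering |Q| buckets on N disks is retrieved optimally when no disk serves more than ceil(|Q|/N) buckets. With replication, checking this is a bipartite assignment (maximum flow) problem between buckets and the disks holding their copies. The sketch below is an illustration under stated assumptions, not the article's algorithm: it uses plain augmenting-path matching rather than the optimistic maximum-flow computation or the divide-and-conquer optimization the abstract mentions, it enumerates every query on a small 2-D grid by brute force, and the names `retrievable` and `all_range_queries_optimal` as well as the example allocation functions are hypothetical.

```python
from itertools import product
from math import ceil


def retrievable(buckets, replicas, num_disks, limit):
    """Return True if every bucket can be read from one of its replica
    disks while no disk serves more than `limit` buckets."""
    load = [[] for _ in range(num_disks)]  # buckets currently assigned to each disk

    def assign(bucket, seen):
        # Search for an augmenting path starting at `bucket`.
        for disk in replicas[bucket]:
            if disk in seen:
                continue
            seen.add(disk)
            if len(load[disk]) < limit:
                load[disk].append(bucket)
                return True
            # Disk is full: try to move one of its buckets to another replica.
            for slot, other in enumerate(load[disk]):
                if assign(other, seen):
                    load[disk][slot] = bucket
                    return True
        return False

    return all(assign(b, set()) for b in buckets)


def all_range_queries_optimal(side, num_disks, copies):
    """Brute-force check of every rectangular query on a side x side grid.
    `copies` is a list of functions (i, j) -> disk, one per replica."""
    replicas = {
        (i, j): [f(i, j) for f in copies]
        for i in range(side)
        for j in range(side)
    }
    for i0, j0 in product(range(side), repeat=2):
        for i1 in range(i0, side):
            for j1 in range(j0, side):
                query = [
                    (i, j)
                    for i in range(i0, i1 + 1)
                    for j in range(j0, j1 + 1)
                ]
                limit = ceil(len(query) / num_disks)
                if not retrievable(query, replicas, num_disks, limit):
                    return False
    return True


if __name__ == "__main__":
    # Hypothetical single-copy allocations used only for illustration.
    print(all_range_queries_optimal(5, 5, [lambda i, j: (i + 2 * j) % 5]))
    print(all_range_queries_optimal(4, 4, [lambda i, j: (i + j) % 4]))
```

In this sketch the 5-disk allocation passes, while the 4-disk allocation fails because a 2 x 2 query places two buckets on the same disk. Brute-force enumeration of this kind grows quickly with the number of disks, which is the verification cost the abstract refers to.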


        Reviews

        David Gary Hill

        The most common query type in a large database (say, a terabyte (TB) or larger), such as a relational or spatial database, is a range query. In a range query, a user specifies a range of values for each dimension of interest within a dataset. The user receives an output result of "the set of items in the dataset that have values within the specified range for each dimension." When the efficient retrieval of all the requested items is a challenge, the time it takes to finish retrieving all of the items from a range query might be unacceptably long. Previous research has focused on efficient retrieval structures and methods that use input/output (I/O) parallelism, which involves storage techniques that access data from multiple disks. Numerous declustering schemes, which distribute data among parallel disks, have been proposed. This paper discusses overcoming the limitations of single-copy declustering schemes, and proposes using replication to achieve optimal queries. Tosun claims that the proposed replicated declustering scheme is "generic and works for any number of dimensions." The paper covers the proposed scheme in mathematical detail. Tosun includes a couple of examples that show how this approach might be integrated with commercial applications. Readers who have a technical interest in speeding up range queries for databases may find this paper useful.

        Online Computing Reviews Service


        • Published in

          ACM Transactions on Storage  Volume 5, Issue 3
          November 2009
          153 pages
          ISSN:1553-3077
          EISSN:1553-3093
          DOI:10.1145/1629075
          Copyright © 2009 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 30 November 2009
          • Revised: 1 March 2009
          • Accepted: 1 March 2009
          • Received: 1 May 2008


          Qualifiers

          • research-article
          • Research
          • Refereed
