Abstract
Declustering distributes data among parallel disks to reduce retrieval cost through I/O parallelism. Many schemes have been proposed for single-copy declustering of spatial data. More recently, declustering with replication has attracted considerable interest, and several schemes with different properties have been proposed. Verifying the optimality of replication schemes designed for range queries is computationally expensive, and existing work verifies optimality only for up to 50 disks. In this article, we propose a novel method for finding replicated declustering schemes that render all spatial range queries optimal. The proposed scheme combines threshold-based declustering, the divisibility of large queries into smaller ones for optimization, and an optimistic approach to computing maximum flow. The scheme is generic and works for any number of dimensions. Experimental results show that, using 3 copies, there exist allocations that render all spatial range queries optimal for up to 750 disks in 2 dimensions and, with the exception of several values, for up to 100 disks in 3 dimensions. The proposed scheme significantly speeds up the search for strictly optimal replicated declustering schemes and will be a valuable tool for answering open problems in replicated declustering.
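To make the optimality condition concrete, the following is a minimal brute-force sketch, not the method proposed in the article. It assumes an n x n grid of buckets, range queries without wraparound, and the usual definition that a query of b buckets on N disks is optimal when no disk serves more than ceil(b/N) of them. The function names `retrievable` and `all_range_queries_optimal` and the dictionary layout of `alloc` are illustrative assumptions. The flow check is done with a simple augmenting-path assignment rather than the optimistic maximum-flow computation the abstract mentions.

```python
from itertools import product
from math import ceil


def retrievable(buckets, copies, num_disks, cap):
    """Return True if every bucket can be read from one of its replica disks
    with no disk serving more than `cap` buckets (capacitated bipartite matching)."""
    assigned = [[] for _ in range(num_disks)]  # buckets currently read from each disk

    def place(bucket, seen):
        # Try each replica disk once per augmenting search.
        for disk in copies[bucket]:
            if disk in seen:
                continue
            seen.add(disk)
            if len(assigned[disk]) < cap:
                assigned[disk].append(bucket)
                return True
            # Disk is full: try to move one of its buckets to another replica.
            for i, other in enumerate(assigned[disk]):
                if place(other, seen):
                    assigned[disk][i] = bucket
                    return True
        return False

    return all(place(b, set()) for b in buckets)


def all_range_queries_optimal(alloc, n, num_disks):
    """alloc[(i, j)] is the tuple of disks storing bucket (i, j) of an n x n grid.
    Checks every axis-aligned rectangular query (no wraparound, an assumption)."""
    for r0, c0 in product(range(n), repeat=2):
        for r1, c1 in product(range(r0, n), range(c0, n)):
            query = [(i, j) for i in range(r0, r1 + 1) for j in range(c0, c1 + 1)]
            if not retrievable(query, alloc, num_disks, ceil(len(query) / num_disks)):
                return False
    return True


# Single copy, 2 disks, checkerboard allocation on a 2 x 2 grid.
checkerboard = {(i, j): ((i + j) % 2,) for i in range(2) for j in range(2)}
# Single copy with every bucket on disk 0: a 1 x 2 query already needs two reads.
skewed = {(i, j): (0,) for i in range(2) for j in range(2)}
# Three copies on 3 disks, i.e. every bucket stored on every disk.
full = {(i, j): (0, 1, 2) for i in range(3) for j in range(3)}

print(all_range_queries_optimal(checkerboard, 2, 2))  # True
print(all_range_queries_optimal(skewed, 2, 2))        # False
print(all_range_queries_optimal(full, 3, 3))          # True
```

Enumerating every query and solving a matching problem for each is what makes verification expensive as the number of disks grows, which is the cost the abstract refers to.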
Index Terms
- Divide-and-conquer scheme for strictly optimal retrieval of range queries