Abstract
Most previous work on declustering has focused on proposing good mapping functions under the assumption that the data space is partitioned equally in all dimensions. In this paper, we relax the equal-partition restriction by choosing a smaller number of dimensions as split axes, and we study the effects of grid-like partitioning methods on the performance of a mapping function that is widely used in declustering algorithms. To this end, we propose a cost model that estimates the number of grid cells intersecting a range query, and we apply the best mapping scheme known so far to the partitioned grid cells. Experiments show that our cost model is highly accurate across all ranges of selectivities and dimensions. By applying different partitioning schemes to the Kronecker sequence mapping function [5], which is known to be the best mapping function for high-dimensional data so far, we achieve up to a 23-fold performance gain. We therefore conclude that the performance of a mapping function depends heavily on the partitioning scheme applied. Our cost model also gives clear criteria for selecting the number of split dimensions out of d dimensions to improve the performance of a mapping function for declustering.
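To make the quantity behind the cost model concrete, the following minimal sketch counts how many grid cells a single box-shaped range query overlaps when each dimension of the unit data space is cut into a given number of equal intervals. It is not the cost model proposed in the paper, which is not reproduced here; the function name `cells_hit`, the unit-cube data space, and the example query are all assumptions made for illustration.

```python
from math import prod


def cells_hit(lo, hi, splits):
    """Count grid cells overlapped by the closed box [lo, hi] in the unit cube.

    splits[i] is the number of equal-width intervals on dimension i
    (hypothetical parameter name); a value of 1 means dimension i is
    not used as a split axis.
    """
    per_dim = []
    for a, b, p in zip(lo, hi, splits):
        first = min(int(a * p), p - 1)  # index of the interval containing a
        last = min(int(b * p), p - 1)   # index of the interval containing b
        per_dim.append(last - first + 1)
    # The box overlaps every combination of the per-dimension intervals.
    return prod(per_dim)


# Two grids with the same total of 256 cells in a 4-dimensional space.
query_lo = (0.2, 0.2, 0.2, 0.2)
query_hi = (0.3, 0.3, 0.3, 0.3)
equal_grid = cells_hit(query_lo, query_hi, (4, 4, 4, 4))     # all 4 axes split
fewer_axes = cells_hit(query_lo, query_hi, (16, 16, 1, 1))   # only 2 axes split
print(equal_grid, fewer_axes)
```

For this one query the two grids are touched in a different number of cells even though both contain 256 cells, which is the kind of dependence on the partitioning scheme that the abstract describes. A single query says nothing about which scheme is better on average; that comparison is what the paper's cost model and experiments address.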
References
Atallah, M.J., Prabhakar, S.: (Almost) Optimal Parallel Block Access for Range Queries. In: Proc. PODS Conf., pp. 205–215 (2000)
Berchtold, S., Böhm, C., Kriegel, H.-P.: Improving the Query Performance of High-Dimensional Index Structures by Bulk Loading R-trees. In: Schek, H.-J., Saltor, F., Ramos, I., Alonso, G. (eds.) EDBT 1998. LNCS, vol. 1377, pp. 216–230. Springer, Heidelberg (1998)
Bhatia, R., Sinha, R.K., Chen, C.-M.: Declustering Using Golden Ratio Sequences. In: Proc. ICDE Conf., pp. 271–280 (2000)
Chen, C.M., Cheng, C.T.: From Discrepancy to Declustering: Near optimal multidimensional declustering strategies for range queries. In: Proc. PODS Conf., pp. 29–38 (2002)
Chen, C.-M., Bhatia, R., Sinha, R.K.: Multidimensional Declustering Schemes Using Golden Ratio Sequence and Kronecker Sequences. IEEE TKDE 15(3), 659–670 (2003)
Du, H.C., Sobolewski, J.S.: Disk Allocation for Cartesian Files on Multiple-Disk Systems. ACM Trans. Database Systems 7(1), 82–102 (1982)
Faloutsos, C., Bhagwat, P.: Declustering Using Fractals. In: Proc. Parallel and Distributed Information Systems Conf., pp. 18–25 (1993)
Faloutsos, C., Metaxas, D.: Disk Allocation Methods Using Error Correcting Codes. IEEE Trans. on Computers 40(8), 907–914 (1991)
Fang, M.T., Lee, R.C.T., Chang, C.C.: The Idea of De-Clustering and Its Applications. In: Proc. VLDB Conf., pp. 181–188 (1986)
Kuo, S.-W., Winslett, M., Cho, Y., Lee, J.: New GDM-based Declustering Methods for Parallel Range Queries. In: Proc. IDEAS Symp., pp. 119–127 (1999)
Kim, H.C., Li, K.J.: Declustering Spatial Objects by Clustering for Parallel Disks. In: Mayr, H.C., Lazanský, J., Quirchmayr, G., Vogel, P. (eds.) DEXA 2001. LNCS, vol. 2113, pp. 450–459. Springer, Heidelberg (2001)
Kim, H.C., Lopez, M., Leutenegger, S.T., Li, K.J.: Efficient Declustering of Nonuniform Multidimensional Data Using Shifted Hilbert Curves. In: Lee, Y., Li, J., Whang, K.-Y., Lee, D. (eds.) DASFAA 2004. LNCS, vol. 2973, pp. 694–707. Springer, Heidelberg (2004)
Kim, M.H., Pramanik, S.: Optimal File Distribution For Partial Match Retrieval. In: Proc. SIGMOD Conf., pp. 173–182 (1988)
Liu, D.R., Shekhar, S.: Partitioning Similarity Graphs: A Framework for Declustering Problems. Information Systems 21(6), 475–496 (1996)
Liu, D.R., Wu, M.Y.: A Hypergraph Based Approach to Declustering Problems. Distributed and Parallel Databases 10(3), 269–288 (2001)
Prabhakar, S., Abdel-Ghaffar, K., El Abbadi, A.: Cyclic Allocation of Two-Dimensional Data. In: Proc. ICDE Conf., pp. 94–101 (1998)
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kim, TW., Kim, HC., Li, KJ. (2004). A Study on Grid Partition for Declustering High-Dimensional Data. In: Yakhno, T. (eds) Advances in Information Systems. ADVIS 2004. Lecture Notes in Computer Science, vol 3261. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30198-1_35
DOI: https://doi.org/10.1007/978-3-540-30198-1_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23478-4
Online ISBN: 978-3-540-30198-1