Selective Replicated Declustering for Arbitrary Queries

Oktay, K. Yasin; Turk, Ata; Aykanat, Cevdet

doi:10.1007/978-3-642-03869-3_37

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5704)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

Data declustering is used to minimize query response times in data-intensive applications. In this technique, the query retrieval process is parallelized by distributing the data among several disks; it is useful in applications, such as geographic information systems, that access huge amounts of data. Declustering with replication extends declustering by allowing data replicas in the system. Many replicated declustering schemes have been proposed, and most of them generate two or more copies of every data item. However, some applications have very large data sizes, and even keeping two copies of every data item may not be feasible. In such systems, selective replication is a necessity. Furthermore, existing replication schemes are not designed to exploit query distribution information when such information is available. In this study, we propose a replicated declustering scheme that decides both which data items to replicate and how to assign all data items to disks when replication capacity is limited. We use the available query information to decide the replication and partitioning of the data, with the goal of optimizing aggregate parallel response time. We propose and implement a Fiduccia-Mattheyses-like iterative improvement algorithm to obtain a two-way replicated declustering, and we use this algorithm in a recursive framework to generate a multi-way replicated declustering. Experiments conducted with arbitrary queries on real datasets show that, especially under low replication constraints, the proposed scheme yields better performance than existing replicated declustering schemes.
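
To make the objective concrete, the following is a minimal sketch of how aggregate parallel response time could be evaluated for a two-way replicated declustering. It is not the authors' algorithm or cost function: it assumes two identical disks and unit retrieval cost per data item, and the names `query_response_time`, `aggregate_response_time`, `placement`, and the toy queries are all hypothetical. Under those assumptions, a replicated item can be read from either disk, so the busier disk must serve at least its own unreplicated items and at least half of the query.

```python
from math import ceil


def query_response_time(query, placement):
    """Minimum number of parallel retrieval rounds for one query on two disks.

    `placement` maps each item to the set of disks (a subset of {0, 1}) that
    hold a copy. Assumption: unit retrieval cost per item, identical disks.
    """
    only0 = sum(1 for item in query if placement[item] == {0})
    only1 = sum(1 for item in query if placement[item] == {1})
    # Replicated items may be served by either disk, so the best schedule is
    # bounded below by each disk's unreplicated load and by a perfect split.
    return max(only0, only1, ceil(len(query) / 2))


def aggregate_response_time(queries, placement):
    """Frequency-weighted sum of response times over (items, frequency) pairs."""
    return sum(freq * query_response_time(items, placement)
               for items, freq in queries)


# Toy workload: three pairwise queries over items a, b, c. Any assignment of
# single copies to two disks puts two of the items on the same disk.
queries = [({"a", "b"}, 2), ({"b", "c"}, 1), ({"a", "c"}, 1)]

no_replicas = {"a": {0}, "b": {1}, "c": {0}}
one_replica = {"a": {0}, "b": {1}, "c": {0, 1}}  # c is selectively replicated

print(aggregate_response_time(queries, no_replicas))  # 5
print(aggregate_response_time(queries, one_replica))  # 4
```

In this toy workload a single replica of `c` removes the only conflict, which illustrates why choosing replicas from query information can pay off even when replication capacity is far below one extra copy per item.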

Author information

Authors and Affiliations

Department of Computer Engineering, Bilkent University, 06800, Ankara, Turkey
K. Yasin Oktay, Ata Turk & Cevdet Aykanat

Editor information

Editors and Affiliations

Department of Software Technology, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands
Henk Sips, Dick Epema & Hai-Xiang Lin

About this paper

Cite this paper

Oktay, K.Y., Turk, A., Aykanat, C. (2009). Selective Replicated Declustering for Arbitrary Queries. In: Sips, H., Epema, D., Lin, HX. (eds) Euro-Par 2009 Parallel Processing. Euro-Par 2009. Lecture Notes in Computer Science, vol 5704. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03869-3_37

DOI: https://doi.org/10.1007/978-3-642-03869-3_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03868-6
Online ISBN: 978-3-642-03869-3
eBook Packages: Computer Science, Computer Science (R0)
