Abstract
Clustering network sites is a vital issue in parallel and distributed database systems (DDBS). Grouping distributed database network sites into clusters is considered an efficient way to minimize the communication time required for query processing. However, clustering network sites is still an open research problem, since finding an optimal solution is NP-complete. The main contribution in this field is to find a near-optimal solution that groups distributed database network sites into disjoint clusters in order to minimize the communication time required for data allocation. Grouping a large number of network sites into a small number of clusters effectively reduces the transaction response time, results in better data distribution, and improves distributed database system performance. We present a novel algorithm for clustering distributed database network sites based on communication time, since database query processing is time dependent. Extensive experimental tests and simulations are conducted on this clustering algorithm. The experimental and simulation results show that a better network distribution is achieved, with significant improvements in server load balance and network delay, lower communication time between network sites, and higher distributed database system performance.
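The abstract does not spell out the clustering procedure, so the following is only an illustrative sketch of the general idea of grouping sites into disjoint clusters by communication time. It is not the algorithm proposed in the article. The function name `cluster_sites`, the threshold parameter `max_time`, the greedy first-fit rule, and the sample timings are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def cluster_sites(
    comm_time: Dict[Tuple[str, str], float],
    sites: List[str],
    max_time: float,
) -> List[List[str]]:
    """Greedy first-fit grouping of sites into disjoint clusters.

    A site joins the first existing cluster in which its communication
    time to every member is at most max_time; otherwise it starts a new
    cluster. This is a hypothetical heuristic, not the paper's method.
    """

    def cost(a: str, b: str) -> float:
        # Communication time is treated as symmetric; a missing pair is
        # treated as unreachable (infinite time).
        if a == b:
            return 0.0
        return comm_time.get((a, b), comm_time.get((b, a), float("inf")))

    clusters: List[List[str]] = []
    for site in sites:
        for cluster in clusters:
            if all(cost(site, member) <= max_time for member in cluster):
                cluster.append(site)
                break
        else:
            # No existing cluster is close enough to this site.
            clusters.append([site])
    return clusters


# Made-up pairwise communication times (arbitrary units) for four sites.
example_times = {
    ("S1", "S2"): 2.0,
    ("S1", "S3"): 9.0,
    ("S1", "S4"): 10.0,
    ("S2", "S3"): 8.0,
    ("S2", "S4"): 7.5,
    ("S3", "S4"): 1.5,
}
example_clusters = cluster_sites(
    example_times, ["S1", "S2", "S3", "S4"], max_time=3.0
)
print(example_clusters)
```

With these sample values, sites that communicate quickly with each other end up in the same cluster, and every site belongs to exactly one cluster. A greedy pass like this depends on the order in which sites are visited and gives no optimality guarantee, which is consistent with the abstract's point that the optimal problem is NP-complete and only near-optimal solutions are practical.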
Cite this article
Hababeh, I. Improving network systems performance by clustering distributed database sites. J Supercomput 59, 249–267 (2012). https://doi.org/10.1007/s11227-010-0436-9
DOI: https://doi.org/10.1007/s11227-010-0436-9