Abstract
Data-intensive applications are those that explore, query, analyze, and, in general, process very large data sets. Such applications can usually be implemented in parallel in a natural way but, in many cases, these implementations suffer severe performance problems caused mainly by load imbalance, inefficient use of available resources, and improper data-partitioning policies. Notably, the problem becomes harder when the conditions causing these problems change at run time. This paper proposes a methodology for dynamically improving the performance of certain data-intensive applications in homogeneous clusters by adapting the size and number of data partitions, and the number of processing nodes, to the current application conditions. To this end, each exploration is monitored and the gathered data are used to dynamically tune the performance of the application. The tuning parameters covered by the methodology are: (i) the partition factor of the data set, (ii) the distribution of the data chunks, and (iii) the number of processing nodes to be used. The methodology assumes that a single execution comprises multiple related explorations of the same partitioned data set, and that data chunks are ordered according to their observed processing times so that the most time-consuming partitions are assigned first. The methodology has been validated with the well-known bioinformatics tool BLAST and through extensive experimentation using simulation. Reported results are encouraging in terms of reducing the total execution time of the application (up to 40% in some cases).
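The chunk-distribution policy described above — order partitions by the processing time measured in a previous exploration and hand out the most expensive ones first — can be sketched as a greedy longest-processing-time-first (LPT) assignment. The sketch below is illustrative only: the function name, the `chunk_times` input, and the use of a min-heap of worker loads are assumptions, not the paper's implementation.

```python
import heapq

def assign_chunks_lpt(chunk_times, num_workers):
    """Greedy longest-processing-time-first assignment (illustrative sketch).

    chunk_times: dict mapping chunk id -> processing time measured in a
                 previous exploration of the same partitioned data set.
    Returns (assignment, makespan), where assignment maps each worker
    index to the list of chunk ids it receives.
    """
    # Hand out the most time-consuming partitions first.
    ordered = sorted(chunk_times, key=chunk_times.get, reverse=True)
    # Min-heap of (accumulated load, worker index): each chunk goes
    # to the currently least-loaded worker.
    loads = [(0.0, w) for w in range(num_workers)]
    heapq.heapify(loads)
    assignment = {w: [] for w in range(num_workers)}
    for chunk in ordered:
        load, w = heapq.heappop(loads)
        assignment[w].append(chunk)
        heapq.heappush(loads, (load + chunk_times[chunk], w))
    makespan = max(load for load, _ in loads)
    return assignment, makespan
```

For example, five chunks with measured times 8, 7, 6, 5, and 4 on two workers yield loads of 17 and 13; a plain round-robin assignment in arbitrary order can do considerably worse when chunk costs are skewed, which is why the methodology reorders chunks between explorations.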
Rosas, C., Sikora, A., Jorba, J. et al. Improving Performance on Data-Intensive Applications Using a Load Balancing Methodology Based on Divisible Load Theory. Int J Parallel Prog 42, 94–118 (2014). https://doi.org/10.1007/s10766-012-0199-4