Abstract
Modern computation is growing more complex, and its resource requirements are steadily increasing. High Throughput Computing is one technique for dealing with this complexity. Over time, computing clusters become heavily overloaded, which degrades performance. Because Computer Supported Cooperative Working (CSCW) has no central coordinator, load balancing is more complex. An overloaded node does not participate in a CSCW network because it has no spare capacity. This paper proposes migrating computation-intensive jobs from overloaded nodes to underloaded nodes so that the overloaded nodes can participate in CSCW, which improves performance by increasing the number of participating nodes. Evaluation of the proposed approach shows that the availability and performance of CSCW clusters improve by 30%-40% with fault-tolerance-based load balancing.
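The migration idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the `Node` class, the `plan_migrations` function, and the `high` and `low` watermarks are all hypothetical names and values chosen for illustration. The sketch only plans which job moves where. The actual transfer of a running process, which the paper's title indicates is done with BLCR checkpoint/restart, is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node; not taken from the paper."""
    name: str
    capacity: float                            # load the node can carry
    jobs: dict = field(default_factory=dict)   # job id -> estimated load

    @property
    def load(self) -> float:
        return sum(self.jobs.values())


def plan_migrations(nodes, high=0.8, low=0.5):
    """Plan moves of heavy jobs from overloaded to underloaded nodes.

    A node is treated as overloaded above `high * capacity` and as
    underloaded below `low * capacity`; both thresholds are assumptions.
    Returns a list of (job id, source name, target name) tuples.
    """
    moves = []
    for src in nodes:
        while src.jobs and src.load > high * src.capacity:
            # Pick the most compute-intensive job on the overloaded node.
            job = max(src.jobs, key=src.jobs.get)
            cost = src.jobs[job]
            # Targets must be underloaded now and must not become
            # overloaded after receiving the job.
            targets = [
                n for n in nodes
                if n is not src
                and n.load < low * n.capacity
                and n.load + cost <= high * n.capacity
            ]
            if not targets:
                break  # nowhere to place the job; leave the node as it is
            dst = min(targets, key=lambda n: n.load / n.capacity)
            dst.jobs[job] = src.jobs.pop(job)
            moves.append((job, src.name, dst.name))
    return moves
```

With three nodes of equal capacity, one overloaded and two lightly loaded, the planner moves the heaviest job to the least-loaded node and stops once the source drops below the high watermark. A real deployment would also have to account for checkpoint size and transfer time, which this sketch ignores.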
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hariyale, H., Vardhan, M., Pandey, A., Mishra, A., Kushwaha, D.S. (2012). Load Balancing in Cluster Using BLCR Checkpoint/Restart. In: Meghanathan, N., Nagamalai, D., Chaki, N. (eds) Advances in Computing and Information Technology. Advances in Intelligent Systems and Computing, vol 176. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31513-8_74
DOI: https://doi.org/10.1007/978-3-642-31513-8_74
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31512-1
Online ISBN: 978-3-642-31513-8
eBook Packages: Engineering, Engineering (R0)