Abstract
Efficient management of a distributed system is a common problem for university and commercial computer centres, and handling node failures is a major aspect of it. Failures that are rare in a small commodity cluster become common at large scale, and there should be a way to overcome them without restarting all parallel processes of an application. The efficiency of existing methods can be improved by forming a hierarchy of distributed processes. That way, only the lower levels of the hierarchy need to be restarted when a leaf node fails, and only the root node needs special treatment. The process hierarchy changes in real time, and the workload is dynamically rebalanced across online nodes. This approach makes it possible to implement efficient partial restart of a parallel application, and transactional behaviour for computer centre service tasks.
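The idea of restarting only the affected part of the hierarchy can be illustrated with a small sketch. It is not the authors' implementation: the `Node` class, its method names, and the round-robin assignment are assumptions made for illustration, and failure of the root node (which the abstract says needs special treatment) is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node in a process hierarchy (illustrative only)."""
    name: str
    children: list["Node"] = field(default_factory=list)
    tasks: list[str] = field(default_factory=list)
    online: bool = True

    def distribute(self, tasks):
        """Hand tasks round-robin to online children; keep them if there are none."""
        targets = [c for c in self.children if c.online]
        if not targets:
            self.tasks.extend(tasks)
            return
        for i, task in enumerate(tasks):
            targets[i % len(targets)].distribute([task])

    def collect(self):
        """Return every task currently held in this subtree."""
        held = list(self.tasks)
        for child in self.children:
            held.extend(child.collect())
        return held

    def clear(self):
        """Forget all tasks held in this subtree."""
        self.tasks.clear()
        for child in self.children:
            child.clear()

    def fail_child(self, child):
        """Mark one child offline and restart only its tasks on the online siblings."""
        lost = child.collect()
        child.online = False
        child.clear()
        self.distribute(lost)  # tasks on the other children are left untouched
        return lost


# Assumed toy setup: one root with three leaf nodes and six tasks.
root = Node("root", children=[Node("a"), Node("b"), Node("c")])
root.distribute([f"t{i}" for i in range(6)])
restarted = root.fail_child(root.children[1])
```

In this toy run only the two tasks that were on node `b` are reassigned; the tasks already running on `a` and `c` stay where they are, which is the partial-restart behaviour the abstract describes.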
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Gankevich, I., Tipikin, Y., Degtyarev, A., Korkhov, V. (2015). Novel Approaches for Distributing Workload on Commodity Computer Systems. In: Gervasi, O., et al. Computational Science and Its Applications -- ICCSA 2015. ICCSA 2015. Lecture Notes in Computer Science, vol 9158. Springer, Cham. https://doi.org/10.1007/978-3-319-21410-8_20
DOI: https://doi.org/10.1007/978-3-319-21410-8_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21409-2
Online ISBN: 978-3-319-21410-8
eBook Packages: Computer Science, Computer Science (R0)