Improving the performance of I/O-intensive applications on clusters of workstations

Cluster Computing

Abstract

Load balancing in workstation-based cluster systems has been investigated extensively, mainly with a focus on the effective use of global CPU and memory resources. However, if a significant portion of the applications running in the system is I/O-intensive, traditional load-balancing policies can substantially degrade system performance. In this paper, two I/O-aware load-balancing schemes, referred to as IOCM and WAL-PM, are presented to improve the overall performance of a cluster system under a general and practical workload that includes I/O activities. The proposed schemes dynamically detect I/O load imbalance among the nodes of a cluster and determine whether to migrate some I/O load from overloaded nodes to less-loaded or underloaded nodes. In WAL-PM, currently running jobs are eligible for migration only if the migration improves overall performance. Besides balancing I/O load, the scheme judiciously takes into account both CPU and memory load sharing in the system, thereby maintaining the same level of performance as existing schemes when I/O load is low or well balanced. Extensive trace-driven simulations with both synthetic and real I/O-intensive applications show that: (1) compared with existing schemes that consider only CPU and memory, the proposed schemes improve mean slowdown by up to a factor of 20; (2) compared with existing approaches that consider only I/O and use non-preemptive job migration, the proposed schemes improve mean slowdown by up to a factor of 10; (3) under CPU- and memory-intensive workloads, our scheme improves performance over existing approaches that consider only I/O by up to 47.5%.
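The abstract names two ideas without spelling out the algorithms: the mean-slowdown metric and an I/O-aware decision about whether to move load off an overloaded node. The Python sketch below illustrates both under stated assumptions; it is not the authors' IOCM or WAL-PM. Slowdown is taken here in its common sense (a job's response time in the shared system divided by its execution time on an idle, dedicated node). The load indices, the 1.5× overload threshold, and the helpers `pick_io_target` and `should_preempt` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Job:
    """A finished job; times are in seconds (hypothetical fields)."""
    name: str
    wall_time: float       # observed response time in the shared cluster
    dedicated_time: float  # execution time on an idle, dedicated node


@dataclass
class Node:
    """Hypothetical per-node load indices; units are illustrative only."""
    name: str
    io_load: float   # e.g. number of pending I/O requests
    cpu_load: float  # e.g. run-queue length


def mean_slowdown(jobs: list[Job]) -> float:
    """Average of per-job slowdown = wall-clock time / dedicated execution time."""
    return mean(j.wall_time / j.dedicated_time for j in jobs)


def pick_io_target(nodes: list[Node], source: Node,
                   threshold: float = 1.5) -> Optional[Node]:
    """Return a less I/O-loaded node if `source` is I/O-overloaded, else None.

    Assumed rule (not the paper's exact criterion): a node is overloaded when
    its I/O load exceeds `threshold` times the cluster-wide average I/O load.
    """
    avg_io = mean(n.io_load for n in nodes)
    if source.io_load <= threshold * avg_io:
        return None  # I/O load on the source is acceptable; do nothing
    candidates = [n for n in nodes if n is not source and n.io_load < avg_io]
    if not candidates:
        return None  # no node is below average, so migration would not help
    # Prefer the lightest I/O load; break ties by CPU load to keep CPU balance.
    return min(candidates, key=lambda n: (n.io_load, n.cpu_load))


def should_preempt(remaining_local: float, remaining_remote: float,
                   migration_cost: float) -> bool:
    """Migrate a running job only if the estimated time saved exceeds the cost."""
    return remaining_local > remaining_remote + migration_cost


if __name__ == "__main__":
    cluster = [Node("n1", io_load=12.0, cpu_load=1.0),
               Node("n2", io_load=2.0, cpu_load=3.0),
               Node("n3", io_load=1.0, cpu_load=2.0)]
    target = pick_io_target(cluster, cluster[0])
    print(target.name if target else "no migration")
    print(mean_slowdown([Job("a", 10.0, 5.0), Job("b", 6.0, 6.0)]))
```

In the example cluster the average I/O load is 5, so `n1` (12) exceeds the assumed 7.5 threshold and the lightest candidate `n3` is chosen. `should_preempt` mirrors the abstract's condition that a running job moves only when overall performance improves, reduced here to a single-job time estimate for illustration.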

Author information

Correspondence to Xiao Qin.

Additional information

Xiao Qin received the BSc and MSc degrees in computer science from Huazhong University of Science and Technology in 1992 and 1999, respectively. He received the PhD degree in computer science from the University of Nebraska-Lincoln in 2004. He is currently an assistant professor in the Department of Computer Science at the New Mexico Institute of Mining and Technology. His research interests include parallel and distributed systems, storage systems, real-time computing, performance evaluation, and fault tolerance. He has served on the program committees of international conferences such as CLUSTER, ICPP, and IPCCC. During 2000–2001, he was on the editorial board of IEEE Distributed Systems Online. He is a member of the IEEE.

Hong Jiang received the B.Sc. degree in Computer Engineering in 1982 from Huazhong University of Science and Technology, Wuhan, China; the M.A.Sc. degree in Computer Engineering in 1987 from the University of Toronto, Toronto, Canada; and the PhD degree in Computer Science in 1991 from Texas A&M University, College Station, Texas, USA. Since August 1991 he has been at the University of Nebraska-Lincoln, Lincoln, Nebraska, USA, where he is Associate Professor and Vice Chair in the Department of Computer Science and Engineering. His present research interests are computer architecture, parallel/distributed computing, computer storage systems and parallel I/O, performance evaluation, middleware, networking, and computational engineering. He has over 70 publications in major journals and international conferences in these areas, and his research has been supported by NSF, DOD, and the State of Nebraska. Dr. Jiang is a member of the ACM, the IEEE Computer Society, ACM SIGARCH, and ACM SIGCOMM.

Yifeng Zhu received the B.E. degree in Electrical Engineering from Huazhong University of Science and Technology in 1998 and the M.S. degree in computer science from the University of Nebraska-Lincoln (UNL) in 2002. He is currently working towards his Ph.D. degree in the Department of Computer Science and Engineering at UNL. His main research interests are parallel I/O, networked storage, parallel scheduling, and cluster computing. He is a student member of the IEEE.

David Swanson received a Ph.D. in physical (computational) chemistry from the University of Nebraska-Lincoln (UNL) in 1995, after which he worked as an NSF-NATO postdoctoral fellow at the Technical University of Wroclaw, Poland, in 1996, and subsequently as a National Research Council Research Associate at the Naval Research Laboratory in Washington, DC, from 1997 to 1998. In early 1999 he returned to UNL, where he has coordinated the Research Computing Facility and currently serves as an Assistant Research Professor in the Department of Computer Science and Engineering. The Office of Naval Research, the National Science Foundation, and the State of Nebraska have supported his research in areas such as large-scale parallel simulation and distributed systems.

About this article

Cite this article

Qin, X., Jiang, H., Zhu, Y. et al. Improving the performance of I/O-intensive applications on clusters of workstations. Cluster Comput 9, 297–311 (2006). https://doi.org/10.1007/s10586-006-9742-7

  • DOI: https://doi.org/10.1007/s10586-006-9742-7
