Paravirtualization effect on single- and multi-threaded memory-intensive linear algebra software

Cluster Computing

Abstract

Previous studies have shown that paravirtualization imposes minimal performance overhead on High Performance Computing (HPC) workloads while offering numerous benefits to the field. In this study, we investigate the impact of paravirtualization on the performance of automatically tuned software systems. We compare peak performance, performance degradation under constrained memory, performance degradation in multi-threaded applications, and inter-VM shared memory performance. To this end, we examine how well ATLAS, a quintessential example of an autotuning software system, tunes the BLAS library routines for paravirtualized systems. Our results show that the combination of ATLAS and Xen paravirtualization delivers native execution performance and nearly identical memory hierarchy performance profiles in both single- and multi-threaded scenarios. Furthermore, we show that it is possible to share memory among OS instances at native speeds. These results reveal new benefits for memory-intensive applications arising from the ability to slim down the guest OS without affecting system performance. In addition, our findings support a novel and very attractive deployment scenario for computational science and engineering codes on virtual clusters and computational clouds.

Author information

Corresponding author

Correspondence to Lamia Youseff.

Additional information

This work is sponsored in part by NSF grants (ST-HEC-0444412 and CCF-0331645).

About this article

Cite this article

Youseff, L., Seymour, K., You, H. et al. Paravirtualization effect on single- and multi-threaded memory-intensive linear algebra software. Cluster Comput 12, 101–122 (2009). https://doi.org/10.1007/s10586-009-0080-4

  • DOI: https://doi.org/10.1007/s10586-009-0080-4
