Paravirtualization effect on single- and multi-threaded memory-intensive linear algebra software

Youseff, Lamia; Seymour, Keith; You, Haihang; Zagorodnov, Dmitrii; Dongarra, Jack; Wolski, Rich

doi:10.1007/s10586-009-0080-4

Published: 24 January 2009

Cluster Computing, Volume 12, pages 101–122 (2009)

Lamia Youseff¹,
Keith Seymour²,
Haihang You²,
Dmitrii Zagorodnov¹,
Jack Dongarra² &
Rich Wolski¹

Abstract

Previous studies have revealed that paravirtualization imposes minimal performance overhead on High Performance Computing (HPC) workloads, while exposing numerous benefits for this field. In this study, we investigate the impact of paravirtualization on the performance of automatically tuned software systems. We compare peak performance, performance degradation in memory-constrained situations, performance degradation in multi-threaded applications, and inter-VM shared memory performance. For comparison purposes, we examine the proficiency of ATLAS, a quintessential example of an autotuning software system, in tuning the BLAS library routines for paravirtualized systems. Our results show that the combination of ATLAS and Xen paravirtualization delivers native execution performance and nearly identical memory hierarchy performance profiles in both single- and multi-threaded scenarios. Furthermore, we show that it is possible to achieve memory sharing among OS instances at native speeds. These results expose new benefits to memory-intensive applications arising from the ability to slim down the guest OS without affecting system performance. In addition, our findings support a novel and very attractive deployment scenario for computational science and engineering codes on virtual clusters and computational clouds.

Author information

Authors and Affiliations

Dept. of Computer Science, University of California, Santa Barbara, USA
Lamia Youseff, Dmitrii Zagorodnov & Rich Wolski
Dept. of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, USA
Keith Seymour, Haihang You & Jack Dongarra

Corresponding author

Correspondence to Lamia Youseff.

Additional information

This work is sponsored in part by NSF grants (ST-HEC-0444412 and CCF-0331645).

About this article

Cite this article

Youseff, L., Seymour, K., You, H. et al. Paravirtualization effect on single- and multi-threaded memory-intensive linear algebra software. Cluster Comput 12, 101–122 (2009). https://doi.org/10.1007/s10586-009-0080-4

Received: 30 December 2008
Accepted: 06 January 2009
Published: 24 January 2009
Issue Date: June 2009
DOI: https://doi.org/10.1007/s10586-009-0080-4
