Abstract
OpenMP has become the dominant standard for shared memory programming. Traditionally used on symmetric multiprocessor (SMP) systems, it has more recently also been adopted on parallel architectures with distributed shared memory, such as NUMA machines. This combines the advantages of OpenMP’s easy-to-use programming model with the scalability and cost-effectiveness of NUMA architectures.
In NUMA (Non-Uniform Memory Access) environments, however, OpenMP codes suffer from the longer latencies of remote memory accesses. This can be observed on both hardware and software distributed shared memory (DSM) systems. In this paper we present SIMT/OMP, a simulation environment capable of modeling NUMA scenarios and providing comprehensive performance data about the interconnection traffic. We use this tool to study the impact of NUMA on the performance of OpenMP applications and show how the memory layout of these codes can be improved with the help of a visualization tool. Based on these techniques, we have achieved performance increases of up to a factor of five on some of our benchmarks, especially in larger system configurations.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tao, J., Schulz, M., Karl, W. (2005). SIMT/OMP: A Toolset to Study and Exploit Memory Locality of OpenMP Applications on NUMA Architectures. In: Chapman, B.M. (eds) Shared Memory Parallel Programming with Open MP. WOMPAT 2004. Lecture Notes in Computer Science, vol 3349. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31832-3_5
DOI: https://doi.org/10.1007/978-3-540-31832-3_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24560-5
Online ISBN: 978-3-540-31832-3
eBook Packages: Computer Science, Computer Science (R0)