SIMT/OMP: A Toolset to Study and Exploit Memory Locality of OpenMP Applications on NUMA Architectures

  • Conference paper
Shared Memory Parallel Programming with Open MP (WOMPAT 2004)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3349)

Abstract

OpenMP has become the dominant standard for shared memory programming. It is traditionally used for Symmetric Multiprocessor Systems, but has more recently also found its way to parallel architectures with distributed shared memory like NUMA machines. This combines the advantages of OpenMP’s easy-to-use programming model with the scalability and cost-effectiveness of NUMA architectures.

In NUMA (Non-Uniform Memory Access) environments, however, OpenMP codes suffer from the longer latencies of remote memory accesses. This can be observed for both hardware and software DSM systems. In this paper we present SIMT/OMP, a simulation environment capable of modeling NUMA scenarios and providing comprehensive performance data about the interconnection traffic. We use this tool to study the impact of NUMA on the performance of OpenMP applications and show how the memory layout of these codes can be improved with the help of a visualization tool. Based on these techniques, we have achieved performance increases of up to a factor of five on some of our benchmarks, especially in larger system configurations.
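
To make the effect of memory layout concrete, the following is a minimal toy model written for this summary. It is not SIMT/OMP and does not reproduce the paper's measurements. It assumes one thread per node, a block (static) loop schedule, and made-up latency values; all names (`static_owner`, `count_remote`, `all_on_node0`, `block_layout`) and all sizes are hypothetical. It counts how many accesses of a parallel loop over an array land on a remote node under two page placements.

```python
# Toy model (not SIMT/OMP): count remote accesses for one parallel loop
# over an array under two different page placements.

def static_owner(i, n, num_threads):
    """Thread that executes iteration i under a block (static) schedule."""
    chunk = (n + num_threads - 1) // num_threads  # ceiling division
    return i // chunk

def count_remote(n, num_nodes, page_size, placement):
    """Count accesses to a[i], i in range(n), that hit a page on another node.

    Assumes one thread per node, so thread t runs on node t.
    `placement` maps a page index to the node that holds that page.
    """
    remote = 0
    for i in range(n):
        node_of_thread = static_owner(i, n, num_nodes)
        node_of_page = placement(i // page_size)
        if node_of_thread != node_of_page:
            remote += 1
    return remote

# Hypothetical sizes, measured in array elements.
N, NODES, PAGE = 4096, 4, 64
pages_per_node = (N // PAGE) // NODES

def all_on_node0(page):
    """Every page on node 0, e.g. after a serial initialization."""
    return 0

def block_layout(page):
    """Pages distributed in blocks that follow the loop schedule."""
    return page // pages_per_node

# Assumed latencies, for illustration only.
LOCAL_NS, REMOTE_NS = 100, 500

def estimated_time(remote):
    """Total access time if every access pays its full latency."""
    return (N - remote) * LOCAL_NS + remote * REMOTE_NS

remote_default = count_remote(N, NODES, PAGE, all_on_node0)
remote_block = count_remote(N, NODES, PAGE, block_layout)
print("remote accesses:", remote_default, "vs", remote_block)
print("estimated slowdown of default layout:",
      estimated_time(remote_default) / estimated_time(remote_block))
```

In this model, three of the four threads access only remote pages when all data sits on node 0, while the block layout makes every access local. The ratio it prints depends entirely on the assumed latencies and should not be read as a prediction for any real machine.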

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tao, J., Schulz, M., Karl, W. (2005). SIMT/OMP: A Toolset to Study and Exploit Memory Locality of OpenMP Applications on NUMA Architectures. In: Chapman, B.M. (ed.) Shared Memory Parallel Programming with Open MP. WOMPAT 2004. Lecture Notes in Computer Science, vol 3349. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31832-3_5

  • DOI: https://doi.org/10.1007/978-3-540-31832-3_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24560-5

  • Online ISBN: 978-3-540-31832-3

  • eBook Packages: Computer Science, Computer Science (R0)
