Abstract
OpenMP is becoming an important shared memory programming model due to its portability, scalability, and flexibility. However, as with any programming paradigm, cache access behavior significantly influences the performance of OpenMP applications. Reducing cache misses therefore becomes a critical issue for high-performance computing. This can be achieved by optimizing the source code, but also through adequate coherence schemes.
This work studies the behavior of various cache coherence protocols, including both hardware-based mechanisms and software-based relaxed models. The goal is to examine how well individual schemes perform with different architectures and applications, in order to find general ways to support cache design in shared memory systems. The study is based on a simulation environment capable of modeling the parallel execution of OpenMP programs. Initial experimental results show that relaxed models are scalable and can serve as an efficient alternative to hardware coherence mechanisms.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tao, J., Karl, W. (2004). Impact of Cache Coherence Models on Performance of OpenMP Applications. In: Danelutto, M., Vanneschi, M., Laforenza, D. (eds) Euro-Par 2004 Parallel Processing. Euro-Par 2004. Lecture Notes in Computer Science, vol 3149. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27866-5_19
DOI: https://doi.org/10.1007/978-3-540-27866-5_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22924-7
Online ISBN: 978-3-540-27866-5
eBook Packages: Springer Book Archive