Abstract
Integrating the memory controller on the processor die enables ever larger core counts in commodity shared-memory systems with Non-Uniform Memory Architecture (NUMA) properties. Shared-memory parallelization with OpenMP is an elegant and widely used approach to leverage the power of such systems. On these systems, the binding of OpenMP threads to compute cores and the corresponding memory association are becoming ever more critical for obtaining optimal performance. In this work we provide a method to measure the number of remote-socket memory accesses a thread generates. We use the available CPU performance-monitoring counters in combination with thread binding on a quad-socket Nehalem-EX system. We use Vampir to visualize the collected data.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Iwainsky, C. et al. (2011). An Approach to Visualize Remote Socket Traffic on the Intel Nehalem-EX. In: Guarracino, M.R., et al. Euro-Par 2010 Parallel Processing Workshops. Euro-Par 2010. Lecture Notes in Computer Science, vol 6586. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21878-1_64
DOI: https://doi.org/10.1007/978-3-642-21878-1_64
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21877-4
Online ISBN: 978-3-642-21878-1
eBook Packages: Computer Science (R0)