Performance Tuning of x86 OpenMP Codes with MAQAO

  • Conference paper
Tools for High Performance Computing 2009

Abstract

Failing to find the best optimization sequence for a given application can lead to compiler-generated code that performs poorly or is otherwise inappropriate. To improve on the compilation process, performance must be analyzed from the generated assembly code. This paper presents MAQAO, a tool for the performance analysis of multithreaded codes (currently only OpenMP programs are supported). MAQAO relies on static performance evaluation to identify compiler optimizations and to assess the performance of loops. It uses static binary rewriting to read and instrument object files or executables. Static binary instrumentation allows probes to be inserted at the instruction level. Memory accesses can be captured to help tune the code, but the resulting traces must be compressed. MAQAO can analyze the results and provide hints for tuning the code. We show on several examples how this can help users improve their OpenMP applications.
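
The abstract notes that captured memory-access traces need compression but does not say how MAQAO does it. As a rough illustration of why loop-generated traces compress well, the sketch below folds a sequence of addresses into constant-stride runs. It is not MAQAO's algorithm: the names `compress_trace` and `expand_trace` and the `(base, stride, count)` encoding are assumptions made for this example only.

```python
from typing import Iterable, List, Tuple

# Hypothetical encoding: (base address, stride in bytes, number of accesses).
Run = Tuple[int, int, int]


def compress_trace(addresses: Iterable[int]) -> List[Run]:
    """Greedily fold an address trace into constant-stride runs (lossless)."""
    runs: List[Run] = []
    for addr in addresses:
        if runs:
            base, stride, count = runs[-1]
            if count == 1:
                # The second access of a run fixes its stride.
                runs[-1] = (base, addr - base, 2)
                continue
            if addr == base + stride * count:
                # The access continues the current run.
                runs[-1] = (base, stride, count + 1)
                continue
        # Otherwise start a new run at this address.
        runs.append((addr, 0, 1))
    return runs


def expand_trace(runs: Iterable[Run]) -> List[int]:
    """Rebuild the original address sequence from its runs."""
    return [base + stride * i for base, stride, count in runs for i in range(count)]


# Illustrative trace: four 8-byte-strided accesses, then three 64-byte-strided ones.
trace = [0x1000 + 8 * i for i in range(4)] + [0x2000, 0x2040, 0x2080]
runs = compress_trace(trace)
```

Here seven accesses collapse into two runs, and `expand_trace(runs)` returns the original sequence. A regular loop over an array produces exactly this kind of pattern, which is why per-instruction traces from loop nests tend to shrink considerably under stride-based schemes.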

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Barthou, D., Charif Rubial, A., Jalby, W., Koliai, S., Valensi, C. (2010). Performance Tuning of x86 OpenMP Codes with MAQAO. In: Müller, M., Resch, M., Schulz, A., Nagel, W. (eds) Tools for High Performance Computing 2009. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11261-4_7

  • DOI: https://doi.org/10.1007/978-3-642-11261-4_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-11260-7

  • Online ISBN: 978-3-642-11261-4

  • eBook Packages: Computer Science, Computer Science (R0)