OTFX: An In-memory Event Tracing Extension to the Open Trace Format 2

Wagner, Michael; Knüpfer, Andreas; Nagel, Wolfgang E.

doi:10.1007/978-3-319-49956-7_1

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10049)

Included in the following conference series:

International Conference on Algorithms and Architectures for Parallel Processing

Abstract

In event-based performance analysis, the amount of collected data is one of the most urgent challenges. It can massively slow down application execution, overwhelm the underlying file system, and introduce significant measurement bias due to intermediate memory buffer flushes. To address these issues, we propose an in-memory event tracing approach that dynamically adapts the volume of application events to an amount that is guaranteed to fit into a single memory buffer, thereby avoiding file interaction entirely. The underlying concepts include runtime filtering, enhanced encoding techniques, and novel strategies for runtime event reduction. They further include the hierarchical memory buffer, a multi-dimensional, hierarchical data structure that allows these concepts to be realized with minimal overhead. We demonstrate the capabilities of our concepts with a prototype implementation called OTFX, based on the Open Trace Format 2, a state-of-the-art open-source tracing library used by the performance analyzers Vampir, Scalasca, and TAU.
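
The idea of keeping a trace within one fixed memory buffer can be illustrated with a minimal sketch, written in Python rather than the C of OTF2. The abstract does not specify the buffer layout or the reduction strategy, so everything here is an assumption made for illustration: the class name `DepthBinnedBuffer`, the choice to bin events by call-stack depth, and the policy of discarding the deepest bin once the capacity is exceeded. It is not the OTFX implementation or its API.

```python
class DepthBinnedBuffer:
    """Toy event buffer that never holds more than `capacity` events.

    Hypothetical illustration only: events are grouped by call-stack
    depth so that one whole group can be discarded cheaply.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.bins = {}         # call depth -> list of events at that depth
        self.size = 0          # total number of stored events
        self.max_depth = None  # deepest level still recorded (None = all)

    def record(self, depth, event):
        # Runtime filter: skip levels that were discarded earlier.
        if self.max_depth is not None and depth > self.max_depth:
            return
        self.bins.setdefault(depth, []).append(event)
        self.size += 1
        # Runtime reduction: shrink until the buffer fits again.
        while self.size > self.capacity:
            self._reduce()

    def _reduce(self):
        # Drop the deepest bin and filter that level from now on.
        deepest = max(self.bins)
        self.size -= len(self.bins.pop(deepest))
        self.max_depth = deepest - 1


buf = DepthBinnedBuffer(capacity=4)
for depth, name in [(0, "main"), (1, "solve"), (2, "kernel"),
                    (2, "kernel"), (1, "exchange")]:
    buf.record(depth, name)
print(buf.size, buf.max_depth, sorted(buf.bins))
```

In this sketch the fifth event overflows the buffer, so the two depth-2 events are dropped and later depth-2 events are filtered at record time. Because the buffer never needs to be flushed to a file, recording stays a purely in-memory operation.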

Author information

Authors and Affiliations

Barcelona Supercomputing Center, 08034, Barcelona, Spain
Michael Wagner
Center for Information Services and HPC (ZIH), 01062, Dresden, Germany
Michael Wagner, Andreas Knüpfer & Wolfgang E. Nagel

Corresponding author

Correspondence to Michael Wagner.

Editor information

Editors and Affiliations

Carlos III University of Madrid, Getafe, Spain
Jesus Carretero
Carlos III University of Madrid, Getafe, Spain
Javier Garcia-Blas
Mathematical Support for Computers, N. I. Lobachevsky State University of Nizhny Novgorod, Nizhny Novgorod, Russia
Victor Gergel
Research Computing Center (RCC), Moscow State University, Moscow, Russia
Vladimir Voevodin
Research Computing Center (RCC), Moscow State University, Moscow, Russia
Iosif Meyerov
E.U. Politécnica, Universidad de Extremadura, Cáceres, Spain
Juan A. Rico-Gallego
Ingenieria de Sistemas Informáticos, Universidad de Extremadura, Cáceres, Spain
Juan C. Díaz-Martín
Universitat Politècnica de València, Valencia, Spain
Pedro Alonso
Distributed and Parallel Systems Group, Institute for Computer Science, Innsbruck, Austria
Juan Durillo
Carlos III University of Madrid, Getafe, Spain
José Daniel Garcia Sánchez
UCD School of Computer Science, University College Dublin, Dublin, Ireland
Alexey L. Lastovetsky
University of Calabria, Rende (CS), Italy
Fabrizio Marozzo
Information Science and Engineering, Central South University, Changsha, Hunan, China
Qin Liu
Information Science and Engineering, Central South University, Changsha, Hunan, China
Zakirul Alam Bhuiyan
Ludwig Maximilian University of Munich, Munich, Germany
Karl Fürlinger
Informatik 10 - Rechnertechnik, Technische Universität München, Munich, Germany
Josef Weidendorfer
High Performance Computing Center (HLRS), Stuttgart, Germany
José Gracia

About this paper

Cite this paper

Wagner, M., Knüpfer, A., Nagel, W.E. (2016). OTFX: An In-memory Event Tracing Extension to the Open Trace Format 2. In: Carretero, J., et al. Algorithms and Architectures for Parallel Processing. ICA3PP 2016. Lecture Notes in Computer Science, vol 10049. Springer, Cham. https://doi.org/10.1007/978-3-319-49956-7_1

DOI: https://doi.org/10.1007/978-3-319-49956-7_1
Published: 19 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49955-0
Online ISBN: 978-3-319-49956-7
eBook Packages: Computer Science, Computer Science (R0)
