An Instrumentation Approach for Hardware-Agnostic Software Characterization

International Journal of Parallel Programming

Abstract

Simulators and empirical profiling data are often used to understand how suitable a specific hardware architecture is for an application. However, simulators can be slow, and empirical profiling-based methods can only provide insights about the existing hardware on which the applications are executed. While the insights obtained in this way are valuable, such methods cannot be used to evaluate a large number of system designs efficiently. Analytical performance evaluation models are fast alternatives, particularly well-suited for system design-space exploration. However, to be truly application-specific, they need to be combined with a workload model that captures relevant application characteristics. In this paper we introduce PISA, a framework based on the LLVM infrastructure that is able to generate such a model for sequential and parallel applications by performing hardware-independent characterization. Characteristics such as instruction-level parallelism, memory access patterns and branch behavior are analyzed per thread or process during application execution. To illustrate the potential of the framework, we provide a detailed characterization of a representative benchmark for graph-based analytics, Graph 500. Finally, we analyze how the properties extracted with PISA across Graph 500 and SPEC CPU2006 applications compare to measurements performed on x86 and POWER8 processors.
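
To make the idea of a hardware-independent memory access characteristic concrete, the following is a minimal sketch of one commonly used metric, the reuse distance: for each access, the number of distinct addresses touched since the previous access to the same address. It is an illustration only and is not taken from PISA; the function name `reuse_distances` and the representation of the trace as a list of addresses are assumptions made for this example.

```python
from collections import OrderedDict


def reuse_distances(trace):
    """Return the reuse distance of every access in an address trace.

    The reuse distance of an access is the number of distinct addresses
    referenced since the previous access to the same address. First-time
    accesses have no previous use and are reported as None.
    Hypothetical helper for illustration; a simple O(N * M) approach.
    """
    # Addresses ordered from least recently used to most recently used.
    lru = OrderedDict()
    distances = []
    for addr in trace:
        if addr in lru:
            keys = list(lru.keys())
            # Count the distinct addresses that are more recent than addr.
            distances.append(len(keys) - 1 - keys.index(addr))
            lru.move_to_end(addr)
        else:
            distances.append(None)
            lru[addr] = True
    return distances


if __name__ == "__main__":
    # Toy trace of symbolic addresses; real traces would hold integers.
    print(reuse_distances(["a", "b", "c", "a", "b", "b"]))
```

Because the metric depends only on the order of addresses in the trace, it does not assume any particular cache size or organization, which is what makes this kind of property usable as input to an analytical model.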

Acknowledgments

This work is conducted in the context of the joint ASTRON and IBM DOME project and is funded by the Netherlands Organisation for Scientific Research (NWO), the Dutch Ministry of EL&I, and the Province of Drenthe. We would like to thank Evelina Dumitrescu for running part of the OpenMP and MPI PISA characterizations.

Author information

Corresponding author

Correspondence to Andreea Anghel.

Cite this article

Anghel, A., Vasilescu, L.M., Mariani, G. et al. An Instrumentation Approach for Hardware-Agnostic Software Characterization. Int J Parallel Prog 44, 924–948 (2016). https://doi.org/10.1007/s10766-016-0410-0

  • DOI: https://doi.org/10.1007/s10766-016-0410-0
