
Performance Evaluation of Stencil Computations Based on Source-to-Source Transformations

  • Conference paper
  • Published in: High Performance Computing (CARLA 2018)

Abstract

Stencil computations are common in High Performance Computing (HPC) applications; they consist of a pattern that applies the same calculation to every point of a data domain. The Finite-Difference Method is an example of a stencil computation and is used to solve real-world problems governed by Partial Differential Equations in diverse areas (electromagnetics, fluid dynamics, geophysics, etc.). Although a large body of literature on optimizing this class of applications is available, evaluating and optimizing their performance on different HPC architectures remains a challenge. In this work, we implemented the 7-point Jacobi stencil in a Source-to-Source Transformation Framework (BOAST) to evaluate performance on different HPC architectures. The results show that the same source code can be executed on current architectures with improved performance, and that this approach helps programmers develop applications without depending on hardware-specific features.
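The abstract describes the stencil pattern only in words. As an illustration, one sweep of a 7-point Jacobi stencil updates each interior point from its own value and its six face neighbours, u'(i,j,k) = c0·u(i,j,k) + c1·[u(i±1,j,k) + u(i,j±1,k) + u(i,j,k±1)], reading only the old grid u and writing a separate grid u'. The Python sketch below is a minimal, unoptimized reference under stated assumptions: the coefficient names c0 and c1, the nested-list layout u[k][j][i] and the choice to copy boundary points unchanged are illustrative, not taken from the paper, and it is not the BOAST-generated kernel evaluated in this work.

```python
# Minimal reference sketch of one sweep of a 7-point Jacobi stencil on a 3-D grid.
# Assumptions (illustrative, not from the paper): the grid is stored as nested
# lists u[k][j][i], c0 weights the centre point, c1 weights the six face
# neighbours, and boundary points are copied unchanged.
from typing import List

Grid = List[List[List[float]]]


def make_grid(nz: int, ny: int, nx: int, value: float = 0.0) -> Grid:
    """Allocate an nz x ny x nx grid filled with `value`."""
    return [[[value] * nx for _ in range(ny)] for _ in range(nz)]


def jacobi_7pt_sweep(u: Grid, c0: float, c1: float) -> Grid:
    """Return a new grid after one Jacobi sweep; `u` is only read, never written."""
    nz, ny, nx = len(u), len(u[0]), len(u[0][0])
    # Start from a copy so boundary values are carried over unchanged.
    out = [[row[:] for row in plane] for plane in u]
    for k in range(1, nz - 1):
        for j in range(1, ny - 1):
            for i in range(1, nx - 1):
                # Sum of the six face neighbours (x, y and z directions).
                neighbours = (
                    u[k][j][i - 1] + u[k][j][i + 1]
                    + u[k][j - 1][i] + u[k][j + 1][i]
                    + u[k - 1][j][i] + u[k + 1][j][i]
                )
                out[k][j][i] = c0 * u[k][j][i] + c1 * neighbours
    return out


if __name__ == "__main__":
    # A constant field is preserved when c0 + 6 * c1 == 1.
    g = jacobi_7pt_sweep(make_grid(4, 4, 4, 1.0), c0=0.25, c1=0.125)
    print(g[1][1][1])  # prints 1.0
```

Writing into a separate output grid is what makes this a Jacobi update: every point in a sweep sees only values from the previous sweep, so the three loops carry no dependencies between iterations and can be reordered, tiled or parallelized, which is the kind of freedom that source-to-source and auto-tuning tools exploit when generating optimized variants.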



Acknowledgments

This work was supported by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and the Fundação de Amparo à Pesquisa do Estado do Rio Grande do Sul (FAPERGS). The research has received funding from the EU H2020 Programme and from MCTI/RNP-Brazil under the HPC4E Project, grant agreement no. 689772. It was also supported by Intel under the Modern Code project, and by the PETROBRAS oil company under Ref. 2016/00133-9. We also thank RICAP, partially funded by the Ibero-American Program of Science and Technology for Development (CYTED), Ref. 517RT0529.

Author information

Correspondence to Víctor Martínez.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Martínez, V., Serpa, M.S., Pavan, P.J., Padoin, E.L., Navaux, P.O.A. (2019). Performance Evaluation of Stencil Computations Based on Source-to-Source Transformations. In: Meneses, E., Castro, H., Barrios Hernández, C., Ramos-Pollan, R. (eds) High Performance Computing. CARLA 2018. Communications in Computer and Information Science, vol 979. Springer, Cham. https://doi.org/10.1007/978-3-030-16205-4_16


  • DOI: https://doi.org/10.1007/978-3-030-16205-4_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-16204-7

  • Online ISBN: 978-3-030-16205-4

  • eBook Packages: Computer Science, Computer Science (R0)
