Abstract
This paper introduces an automatic tuning method for the tiling parameters required in an implementation of the three-dimensional FDTD method based on time-space tiling. In this tuning process, an appropriate range for the tile size is first determined by trial experiments using cubic tiles. The tile shape is then optimized by using the Monte Carlo method. The tiled FDTD kernel was multi-threaded and its performance with the tuned parameters was evaluated on multi-core processors. When compared with a naively implemented kernel, the performance of the tuned FDTD kernel was improved by more than a factor of two.
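The two-stage tuning process described above can be illustrated with a minimal sketch. It is not the authors' implementation. The names `tune_tiles`, `measure`, `spread`, and `toy_cost` are hypothetical, the rule that derives the search range from the best cubic tile is an assumption, and the toy cost function stands in for timing a real tiled FDTD kernel. The temporal extent of the tile is left out for brevity.

```python
import random


def tune_tiles(measure, cubic_sizes, n_samples=50, spread=2.0, seed=0):
    """Two-stage search: cubic sweep, then random sampling of tile shapes.

    measure     -- callable taking a (tx, ty, tz) tuple and returning a cost
                   (for example, measured kernel time); hypothetical interface
    cubic_sizes -- candidate edge lengths for the cubic trial runs
    spread      -- assumed factor that sets the range around the best edge
    """
    rng = random.Random(seed)

    # Stage 1: trial runs with cubic tiles to find a promising tile size.
    best_edge = min(cubic_sizes, key=lambda s: measure((s, s, s)))
    best_shape = (best_edge, best_edge, best_edge)
    best_cost = measure(best_shape)

    # Assumed range rule: each edge may vary within
    # [best_edge / spread, best_edge * spread].
    lo = max(1, int(best_edge / spread))
    hi = max(lo, int(best_edge * spread))

    # Stage 2: Monte Carlo sampling of non-cubic shapes inside that range.
    for _ in range(n_samples):
        shape = tuple(rng.randint(lo, hi) for _ in range(3))
        cost = measure(shape)
        if cost < best_cost:
            best_shape, best_cost = shape, cost

    return best_shape, best_cost


def toy_cost(shape):
    # Hypothetical stand-in for a timed kernel run: cheapest at (8, 16, 32).
    tx, ty, tz = shape
    return abs(tx - 8) + abs(ty - 16) + abs(tz - 32) + 1.0


if __name__ == "__main__":
    shape, cost = tune_tiles(toy_cost, [4, 8, 16, 32, 64])
    print("selected tile shape:", shape, "cost:", cost)
```

In a real setting, `measure` would run the tiled kernel with the given tile shape and return the elapsed time. Because the second stage only replaces the cubic result when a sampled shape is cheaper, the selected shape is never worse than the best cubic tile under the same cost function.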
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Minami, T., Hibino, M., Hiraishi, T., Iwashita, T., Nakashima, H. (2015). Automatic Parameter Tuning of Three-Dimensional Tiled FDTD Kernel. In: Daydé, M., Marques, O., Nakajima, K. (eds) High Performance Computing for Computational Science -- VECPAR 2014. VECPAR 2014. Lecture Notes in Computer Science, vol 8969. Springer, Cham. https://doi.org/10.1007/978-3-319-17353-5_24
DOI: https://doi.org/10.1007/978-3-319-17353-5_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-17352-8
Online ISBN: 978-3-319-17353-5
eBook Packages: Computer Science (R0)