Abstract
This paper proposes coarse grain task parallelization of the Ground Motion Simulator (GMS), an earthquake simulation program that uses the Finite Difference Method (FDM) to solve the wave equations in 3-D heterogeneous structures. The parallelization targets various cc-NUMA servers built on IBM, Intel and Fujitsu multicore processors. GMS has been developed by the National Research Institute for Earth Science and Disaster Prevention (NIED) in Japan. Earthquake wave propagation simulations are important numerical applications that help save lives by predicting earthquake damage to residential areas. Parallel processing with strong scaling is required to compute these simulations precisely and quickly. The proposed method uses the OSCAR compiler to exploit coarse grain task parallelism efficiently and obtain scalable speed-ups under strong scaling. The OSCAR compiler can analyze data dependence and control dependence among coarse grain tasks such as subroutines, loops and basic blocks. Moreover, the paper presents locality optimizations that take the boundary calculations of FDM into account, and a new static scheduler that enables more efficient task scheduling on cc-NUMA servers. Relative to sequential execution, the performance evaluation shows speed-ups of 110 times using 128 cores on the POWER7-based 128-core cc-NUMA server Hitachi SR16000 VM1; 37.2 times using 64 cores on the Xeon E7-8830-based 64-core cc-NUMA server BS2000; 19.8 times using 32 cores on the Xeon X7560-based 32-core cc-NUMA server HA8000/RS440; 99.3 times using 128 cores on the SPARC64 VII-based 256-core cc-NUMA server Fujitsu M9000; and 9.42 times using 12 cores on the POWER8-based 12-core cc-NUMA server Power System S812L.
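The abstract does not describe GMS's exact discretization. To show why FDM boundary calculations matter for task partitioning, the following uses a simplified stand-in, which is an assumption rather than the GMS scheme: the scalar wave equation with a spatially varying wave speed, discretized with second-order central differences on a uniform grid of spacing h. One explicit time step then reads:

```latex
% Assumed model (simplification): \partial^2 u / \partial t^2 = c(\mathbf{x})^2 \nabla^2 u.
% GMS itself solves the wave equations in 3-D heterogeneous structures; this is not its scheme.
% Uniform grid spacing h, time step \Delta t, wave speed c_{i,j,k} sampled at each grid point.
u^{n+1}_{i,j,k} = 2u^{n}_{i,j,k} - u^{n-1}_{i,j,k}
  + \left(\frac{c_{i,j,k}\,\Delta t}{h}\right)^{2}
    \left( u^{n}_{i+1,j,k} + u^{n}_{i-1,j,k}
         + u^{n}_{i,j+1,k} + u^{n}_{i,j-1,k}
         + u^{n}_{i,j,k+1} + u^{n}_{i,j,k-1}
         - 6u^{n}_{i,j,k} \right)
```

Each point at step n+1 reads its six neighbors at step n, and the scheme is stable only when max c · Δt / h ≤ 1/√3. When the grid is split into blocks that are handled by separate coarse grain tasks, the planes at each block boundary read values owned by the adjacent block. This is where dependences between neighboring tasks arise, and where data locality can be exploited.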
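The following is a minimal sketch of the general idea behind coarse grain task parallelization. It builds a task graph from data dependences between coarse grain tasks and then assigns the tasks to cores ahead of time with a static scheduler. Several simplifying assumptions apply:

- The dependence test works at whole-array granularity, using read/write sets.
- Control dependence is not modeled.
- The scheduler is a generic critical-path list scheduler. It is not the OSCAR compiler's analysis or the paper's cc-NUMA scheduler, and it ignores memory placement and data-transfer costs.

All task names, array names and costs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """A coarse grain task (e.g. a loop or subroutine call) with the arrays it touches."""
    name: str
    cost: int                                  # estimated execution time (arbitrary units)
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)


def build_dependences(tasks):
    """Map each task index to the indices of earlier tasks it must wait for.

    Task j depends on an earlier task i when they conflict on an array:
    flow (i writes, j reads), anti (i reads, j writes) or output (both write).
    """
    preds = {j: set() for j in range(len(tasks))}
    for j, tj in enumerate(tasks):
        for i in range(j):
            ti = tasks[i]
            if (ti.writes & tj.reads) or (ti.reads & tj.writes) or (ti.writes & tj.writes):
                preds[j].add(i)
    return preds


def bottom_levels(tasks, preds):
    """Longest path from each task to the end of the graph, used as its priority."""
    succs = {i: set() for i in range(len(tasks))}
    for j, ps in preds.items():
        for i in ps:
            succs[i].add(j)
    level = {}
    for i in reversed(range(len(tasks))):      # program order is a topological order
        level[i] = tasks[i].cost + max((level[s] for s in succs[i]), default=0)
    return level


def list_schedule(tasks, preds, num_cores):
    """Greedy static list scheduling: repeatedly take the ready task with the highest
    bottom level and place it on the core where it can start earliest."""
    level = bottom_levels(tasks, preds)
    finish, core_free, assignment = {}, [0] * num_cores, {}
    remaining = set(range(len(tasks)))
    while remaining:
        ready = [t for t in remaining if all(p in finish for p in preds[t])]
        t = max(ready, key=lambda k: (level[k], -k))    # ties go to program order
        data_ready = max((finish[p] for p in preds[t]), default=0)
        core = min(range(num_cores), key=lambda c: max(core_free[c], data_ready))
        start = max(core_free[core], data_ready)
        finish[t] = start + tasks[t].cost
        core_free[core] = finish[t]
        assignment[tasks[t].name] = (core, start, finish[t])
        remaining.remove(t)
    return assignment, max(finish.values())


# Hypothetical time step split into two spatial blocks (names and costs are made up).
tasks = [
    Task("init", 2, writes={"v"}),
    Task("stress_block0", 4, reads={"v"}, writes={"s0"}),
    Task("stress_block1", 4, reads={"v"}, writes={"s1"}),
    Task("velocity_block0", 3, reads={"s0"}, writes={"u0"}),
    Task("velocity_block1", 3, reads={"s1"}, writes={"u1"}),
    Task("output", 1, reads={"u0", "u1"}),
]
preds = build_dependences(tasks)
assignment, makespan = list_schedule(tasks, preds, num_cores=2)
for name, (core, start, end) in assignment.items():
    print(f"{name:16s} core {core}  [{start}, {end})")
print("makespan:", makespan, " sequential:", sum(t.cost for t in tasks))
```

With two cores, the two block chains run side by side. The makespan is 10 units, against 17 for sequential execution. That equals the critical path init → stress → velocity → output, so adding cores would not shorten this toy graph. A real scheduler for cc-NUMA servers would also weigh where each block's data resides.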
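The reported speed-ups use different core counts, so it may help to normalize them as parallel efficiency E_p = S_p / p, where S_p is the speed-up on p cores. This is a standard metric added here only for illustration; the abstract does not report it. A minimal sketch over the figures quoted above:

```python
# Speed-ups over sequential execution as reported in the abstract:
# (server and processor, cores used, speed-up)
results = [
    ("Hitachi SR16000 VM1 (POWER7)", 128, 110.0),
    ("BS2000 (Xeon E7-8830)", 64, 37.2),
    ("HA8000/RS440 (Xeon X7560)", 32, 19.8),
    ("Fujitsu M9000 (SPARC64 VII)", 128, 99.3),
    ("Power System S812L (POWER8)", 12, 9.42),
]


def efficiency(speedup, cores):
    """Parallel efficiency E_p = S_p / p (1.0 would be ideal linear scaling)."""
    return speedup / cores


for server, cores, speedup in results:
    print(f"{server:30s} p={cores:4d}  S_p={speedup:6.2f}  E_p={efficiency(speedup, cores):6.1%}")
```

For example, 110 / 128 ≈ 0.86 on the SR16000 VM1 and 37.2 / 64 ≈ 0.58 on the BS2000. The M9000 figure uses 128 of that server's 256 cores.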
Acknowledgment
The authors would like to thank the members of the Hitachi-Waseda collaborative research project, Hitachi, Ltd., and NIED for their support.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Shimaoka, M., Wada, Y., Kimura, K., Kasahara, H. (2016). Coarse Grain Task Parallelization of Earthquake Simulator GMS Using OSCAR Compiler on Various Cc-NUMA Servers. In: Shen, X., Mueller, F., Tuck, J. (eds) Languages and Compilers for Parallel Computing. LCPC 2015. Lecture Notes in Computer Science, vol 9519. Springer, Cham. https://doi.org/10.1007/978-3-319-29778-1_15
DOI: https://doi.org/10.1007/978-3-319-29778-1_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-29777-4
Online ISBN: 978-3-319-29778-1
eBook Packages: Computer Science, Computer Science (R0)