Coarse Grain Task Parallelization of Earthquake Simulator GMS Using OSCAR Compiler on Various Cc-NUMA Servers

Shimaoka, Mamoru; Wada, Yasutaka; Kimura, Keiji; Kasahara, Hironori

doi:10.1007/978-3-319-29778-1_15

Conference paper
First Online: 20 February 2016

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9519)

Abstract

This paper proposes coarse grain task parallelization of the Ground Motion Simulator (GMS), an earthquake simulation program that uses the Finite Difference Method (FDM) to solve the wave equations in 3-D heterogeneous structures, on various cc-NUMA servers built on IBM, Intel and Fujitsu multicore processors. GMS was developed by the National Research Institute for Earth Science and Disaster Prevention (NIED) in Japan. Earthquake wave propagation simulations are important numerical applications because predicting earthquake damage to residential areas helps save lives. Running these simulations both precisely and quickly requires parallel processing with strong scaling. The proposed method uses the OSCAR compiler to exploit coarse grain task parallelism efficiently and obtain scalable speed-ups under strong scaling. The OSCAR compiler can analyze data dependence and control dependence among coarse grain tasks such as subroutines, loops and basic blocks. In addition, the paper presents locality optimizations that take the boundary calculations of FDM into account and a new static scheduler that enables more efficient task scheduling on cc-NUMA servers. The performance evaluation reports the following speed-ups against sequential execution on the same server: 110 times using 128 cores on the POWER7 based 128-core cc-NUMA server Hitachi SR16000 VM1; 37.2 times using 64 cores on the Xeon E7-8830 based 64-core cc-NUMA server BS2000; 19.8 times using 32 cores on the Xeon X7560 based 32-core cc-NUMA server HA8000/RS440; 99.3 times using 128 cores on the SPARC64 VII based 256-core cc-NUMA server Fujitsu M9000; and 9.42 times using 12 cores on the POWER8 based 12-core cc-NUMA server Power System S812L.
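
To make the reported speed-ups easier to compare across servers, they can be converted into parallel efficiency, that is, speed-up divided by the number of cores used. The sketch below is not part of the paper; it only applies that standard definition to the figures quoted in the abstract, and the dictionary keys and the `parallel_efficiency` helper are illustrative names introduced here.

```python
# Speed-up and number of cores used, as quoted in the abstract.
# The labels are illustrative; the numbers come from the abstract.
results = {
    "Hitachi SR16000 VM1 (POWER7)": (110.0, 128),
    "BS2000 (Xeon E7-8830)": (37.2, 64),
    "HA8000/RS440 (Xeon X7560)": (19.8, 32),
    "Fujitsu M9000 (SPARC64 VII)": (99.3, 128),  # 128 of 256 cores used
    "Power System S812L (POWER8)": (9.42, 12),
}


def parallel_efficiency(speedup: float, cores: int) -> float:
    """Return speed-up per core (1.0 would be ideal linear scaling)."""
    return speedup / cores


efficiencies = {
    name: parallel_efficiency(speedup, cores)
    for name, (speedup, cores) in results.items()
}

for name, eff in efficiencies.items():
    print(f"{name}: {eff:.1%}")
```

Under this definition the SR16000 VM1 result corresponds to roughly 86% efficiency on 128 cores, while the two Xeon servers land near 60%. The abstract does not explain the gap, so the numbers alone should not be read as a statement about its cause.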

Acknowledgment

The authors would like to thank the members of the Hitachi-Waseda collaborative research project, Hitachi, Ltd., and NIED for their support.

Author information

Authors and Affiliations

Advanced Multicore Processor Research Institute, Waseda University, 27 Waseda-machi, Shinjuku-ku, Tokyo, 162-0042, Japan
Mamoru Shimaoka, Yasutaka Wada, Keiji Kimura & Hironori Kasahara
Department of Information Science, Meisei University, 2-1-1 Hodokubo, Hino, Tokyo, 191-8506, Japan
Yasutaka Wada

Corresponding author

Correspondence to Mamoru Shimaoka.

Editor information

Editors and Affiliations

North Carolina State University, Raleigh, North Carolina, USA
Xipeng Shen
North Carolina State University, Raleigh, North Carolina, USA
Frank Mueller
North Carolina State University, Raleigh, North Carolina, USA
James Tuck

About this paper

Cite this paper

Shimaoka, M., Wada, Y., Kimura, K., Kasahara, H. (2016). Coarse Grain Task Parallelization of Earthquake Simulator GMS Using OSCAR Compiler on Various Cc-NUMA Servers. In: Shen, X., Mueller, F., Tuck, J. (eds) Languages and Compilers for Parallel Computing. LCPC 2015. Lecture Notes in Computer Science, vol 9519. Springer, Cham. https://doi.org/10.1007/978-3-319-29778-1_15

DOI: https://doi.org/10.1007/978-3-319-29778-1_15
Published: 20 February 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-29777-4
Online ISBN: 978-3-319-29778-1
eBook Packages: Computer Science, Computer Science (R0)
