A porting and optimization of search for neighbour-particle in MPS method for GPU by using OpenACC

Authors:
Takaaki Miyajima

Numerical Simulation Research Unit, Aeronautical Technology Directorate, Japan Aerospace Exploration Agency (JAXA), Tokyo, Japan

Kenichi Kubota

Numerical Simulation Research Unit, Aeronautical Technology Directorate, Japan Aerospace Exploration Agency (JAXA), Tokyo, Japan

Naoyuki Fujita

Numerical Simulation Research Unit, Aeronautical Technology Directorate, Japan Aerospace Exploration Agency (JAXA), Tokyo, Japan


HEART '17: Proceedings of the 8th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies, June 2017, Article No.: 12, Pages 1–6. https://doi.org/10.1145/3120895.3120903

Published: 07 June 2017


ABSTRACT

The Moving Particle Semi-implicit (MPS) method is a particle method used in fields such as computational fluid dynamics. Target fluids and objects are divided into particles, and each particle interacts with its neighbouring particles. The search for neighbour particles is the main bottleneck of the MPS method: it accounted for 56% of the total processing time. In this paper, we port the "search for neighbour-particle" part of the MPS method to the GPU by using OpenACC and optimize it. We present three different optimizations and evaluate them with three data sets of 25,704, 224,910 and 2,247,750 particles on four GPUs: NVIDIA K20c, GTX1080, P100 (PCIe) and P100 (NVLink). With 2,247,750 particles, the P100 (NVLink) GPU achieves a 41.5 times speed-up over the CPU version running 24 MPI processes.
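To make the offloaded step concrete, the following is a minimal sketch of a neighbour-particle search annotated with OpenACC directives. It is not the authors' implementation: the function name `find_neighbours`, the separate `x`/`y`/`z` arrays, the `MAX_NEIGHBOURS` cap and the brute-force all-pairs loop are assumptions chosen for brevity. Practical MPS codes usually restrict the search to nearby cells (buckets) rather than testing every pair.

```c
#include <stddef.h>

/* Assumed cap on neighbours stored per particle (hypothetical value). */
#define MAX_NEIGHBOURS 64

/*
 * For each particle i, store the indices of all particles j != i that lie
 * closer than the effective radius r_e. Brute force, O(n^2) pair tests.
 *
 * count[i]                     : number of neighbours found for particle i
 * list[i * MAX_NEIGHBOURS + k] : index of the k-th neighbour of particle i
 *                                (only the first count[i] slots are valid)
 *
 * A compiler without OpenACC support ignores the directives, so the same
 * code also runs serially on a CPU.
 */
void find_neighbours(int n, const double *x, const double *y, const double *z,
                     double r_e, int *count, int *list)
{
    const double r2 = r_e * r_e;  /* compare squared distances, no sqrt */

    /* One particle per parallel iteration; inputs copied to the device,
       results copied back to the host when the region ends. */
    #pragma acc parallel loop copyin(x[0:n], y[0:n], z[0:n]) \
                              copyout(count[0:n], list[0:n*MAX_NEIGHBOURS])
    for (int i = 0; i < n; ++i) {
        int c = 0;

        /* Inner loop stays sequential: it appends to a per-particle list. */
        #pragma acc loop seq
        for (int j = 0; j < n; ++j) {
            if (j == i) continue;
            const double dx = x[j] - x[i];
            const double dy = y[j] - y[i];
            const double dz = z[j] - z[i];
            if (dx * dx + dy * dy + dz * dz < r2 && c < MAX_NEIGHBOURS) {
                list[i * MAX_NEIGHBOURS + c] = j;
                ++c;
            }
        }
        count[i] = c;
    }
}
```

A caller allocates `count` with `n` entries and `list` with `n * MAX_NEIGHBOURS` entries, then invokes `find_neighbours` once per time step. Keeping the outer loop parallel and the inner loop sequential avoids write conflicts, because each particle writes only to its own slice of `list`.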



Published in

HEART '17: Proceedings of the 8th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies
June 2017
172 pages
ISBN:9781450353168
DOI:10.1145/3120895

Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
GPU
MPS
Moving Particle Semi-implicit
OpenACC
Performance optimization
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 22 of 50 submissions, 44%