Using heterogeneous computing for scattering prediction in scenarios with several source configurations

López-Portugués, M.; López-Fernández, J. A.; Ranilla, José; Ayestarán, R. G.; Las-Heras, F.

doi:10.1007/s11227-015-1618-2

Published: 16 January 2016

Volume 73, pages 57–74, (2017)
The Journal of Supercomputing

M. López-Portugués¹,
J. A. López-Fernández¹,
José Ranilla ORCID: orcid.org/0000-0003-2941-3741²,
R. G. Ayestarán¹ &
F. Las-Heras¹

Abstract

In this work, we present a tool for solving large scattering problems with several acoustic source configurations. These problems entail a large matrix multiplication in which the matrices must be generated on demand, so that problems can be solved on systems with less memory than storing the full matrices would require. We have analysed and developed four versions: one based on multiple matrix-vector products, two different approaches built on tiled matrix multiplication, and one heterogeneous implementation that uses a GPU and a Xeon Phi simultaneously. To test these implementations, we have used different devices: multicore CPUs, a Xeon Phi accelerator, and a Tesla GPU. Compared to our initial work, the peak speedup of the new solutions is \(25\times\) for CPU, \(17\times\) for Phi, \(20\times\) for GPU, and \(20\times\) for the heterogeneous GPU + Phi implementation. Finally, the tool presented in this work can be adapted and applied to other fields whenever the problem at hand requires a large matrix multiplication whose elements must be generated on demand (e.g. the inverse scattering problem in electromagnetics).
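The idea of a tiled matrix multiplication with on-demand generation can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`kernel`, `tiled_product`, `full_product`), the tile size, and the stand-in kernel (a Green's-function-like expression on an artificial distance) are all assumptions made for illustration. The sketch computes Y = A X, where each column of X plays the role of one source configuration and only one tile of A exists in memory at a time.

```python
import cmath
import math


def kernel(i, j, k=2.0):
    """Hypothetical on-demand generator for the matrix element A[i][j]."""
    r = 1.0 + abs(i - j)  # stand-in distance (assumption), never zero
    return cmath.exp(-1j * k * r) / (4.0 * math.pi * r)


def tiled_product(m, n, x, tile=4):
    """Compute Y = A X while generating A one tile at a time.

    x is an n-by-s list of lists, one column per source configuration.
    Only a block of at most tile-by-tile elements of A is held at once.
    """
    s = len(x[0])
    y = [[0j] * s for _ in range(m)]
    for i0 in range(0, m, tile):
        i1 = min(i0 + tile, m)
        for j0 in range(0, n, tile):
            j1 = min(j0 + tile, n)
            # Generate the current tile of A on demand.
            a_tile = [[kernel(i, j) for j in range(j0, j1)]
                      for i in range(i0, i1)]
            # Reuse the tile for every source configuration before
            # discarding it.
            for ii, row in enumerate(a_tile):
                yi = y[i0 + ii]
                for jj, a in enumerate(row):
                    xj = x[j0 + jj]
                    for c in range(s):
                        yi[c] += a * xj[c]
    return y


def full_product(m, n, x):
    """Reference version: build all of A in memory, then multiply."""
    a = [[kernel(i, j) for j in range(n)] for i in range(m)]
    s = len(x[0])
    return [[sum(a[i][j] * x[j][c] for j in range(n)) for c in range(s)]
            for i in range(m)]
```

The point of the tiled form is that each generated element of A is reused across all columns of X, whereas a version built on repeated matrix-vector products would have to regenerate A once per source configuration. In a real implementation the innermost loops would presumably be handed to an optimized matrix-multiplication routine on each device.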

Notes

Multicore CPU: 8 cores at 2.0 GHz (Hyper-Threading and Turbo Boost disabled).
Tesla GPU: 2,496 CUDA cores at 706 MHz and 5 GB of device memory.
Xeon Phi: 60 cores at 1.053 GHz (4 threads/core) and 8 GB of RAM.

Acknowledgments

This work has been partially supported by the “Ministerio de Economía y Competitividad” of Spain / FEDER under grants TEC2012-38142-C04-04 and TEC2015-67387-C4-3-R; and by the “Gobierno del Principado de Asturias” / FEDER under project FC-15-GRUPIN14-114.

Author information

Authors and Affiliations

Departamento de Ingeniería Eléctrica, Electrónica, de Computadores y Sistemas, Universidad de Oviedo, Gijón, Spain
M. López-Portugués, J. A. López-Fernández, R. G. Ayestarán & F. Las-Heras
Departamento de Informática, Universidad de Oviedo, Gijón, Spain
José Ranilla

Corresponding author

Correspondence to José Ranilla.

About this article

Cite this article

López-Portugués, M., López-Fernández, J.A., Ranilla, J. et al. Using heterogeneous computing for scattering prediction in scenarios with several source configurations. J Supercomput 73, 57–74 (2017). https://doi.org/10.1007/s11227-015-1618-2

Published: 16 January 2016
Issue Date: January 2017
DOI: https://doi.org/10.1007/s11227-015-1618-2
