
Using heterogeneous computing for scattering prediction in scenarios with several source configurations

The Journal of Supercomputing

Abstract

In this work, we present a tool for solving large scattering problems with several acoustic source configurations. These problems entail a large matrix multiplication in which the matrices must be generated on demand, so that the problems can be solved on systems with less memory than would be needed to store the full matrices. We have developed and analysed several versions: one based on multiple matrix-vector products, two different approaches built on tiled matrix multiplication, and one heterogeneous implementation that uses a GPU and a Xeon Phi simultaneously. To test these implementations, we have used different devices: multicore CPUs, a Xeon Phi accelerator, and a Tesla GPU. Compared with our initial work, the peak speedup of the new solutions is \(25\times\) for the CPU, \(17\times\) for the Xeon Phi, \(20\times\) for the GPU, and \(20\times\) for the heterogeneous GPU + Xeon Phi implementation. Finally, the tool presented in this work can be adapted to other fields whenever the problem requires a large matrix multiplication whose elements must be generated on demand (e.g. the inverse scattering problem in electromagnetics).
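The idea of a tiled matrix multiplication with on-demand generation can be illustrated with a minimal sketch. It is not the authors' implementation: the `kernel` function, the stand-in distance it uses, the tile size, and all names are assumptions chosen only for illustration. The sketch computes C = A·B, where each column of B corresponds to one source configuration, while holding only one tile of A in memory at a time.

```python
import cmath


def kernel(i, j, k=1.0):
    """Hypothetical on-demand entry A[i][j] (a Green's-function-like term)."""
    r = 1.0 + abs(i - j)  # stand-in "distance"; chosen so it is never zero
    return cmath.exp(1j * k * r) / r


def tiled_product(m, n, b, tile=4, entry=kernel):
    """Compute C = A @ B while generating A one tile at a time.

    A is m x n and is never stored whole. b is an n x s list of lists,
    with one column per source configuration.
    """
    s = len(b[0])
    c = [[0j] * s for _ in range(m)]
    for i0 in range(0, m, tile):
        i1 = min(i0 + tile, m)
        for j0 in range(0, n, tile):
            j1 = min(j0 + tile, n)
            # Generate only the current tile of A; it is discarded afterwards.
            a_tile = [[entry(i, j) for j in range(j0, j1)]
                      for i in range(i0, i1)]
            # Accumulate this tile's contribution to rows i0..i1-1 of C.
            for di, row in enumerate(a_tile):
                c_row = c[i0 + di]
                for dj, a_ij in enumerate(row):
                    b_row = b[j0 + dj]
                    for col in range(s):
                        c_row[col] += a_ij * b_row[col]
    return c
```

With this layout, peak memory for A is bounded by `tile * tile` entries regardless of `m` and `n`. In an optimized version, the inner accumulation would presumably be delegated to a tuned matrix-multiplication routine on each device rather than written as explicit loops.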


Notes

  1. Multicore CPU: 8 cores at 2.0 GHz (Hyper-Threading and Turbo Boost disabled).

  2. Tesla GPU: 2,496 CUDA cores at 706 MHz and 5 GB of device memory.

  3. Xeon Phi: 60 cores at 1.053 GHz (4 threads/core) and 8 GB of RAM.


Acknowledgments

This work has been partially supported by the “Ministerio de Economía y Competitividad” of Spain / FEDER under grants TEC2012-38142-C04-04 and TEC2015-67387-C4-3-R; and by the “Gobierno del Principado de Asturias” / FEDER under project FC-15-GRUPIN14-114.

Author information


Corresponding author

Correspondence to José Ranilla.


About this article


Cite this article

López-Portugués, M., López-Fernández, J., Ranilla, J. et al. Using heterogeneous computing for scattering prediction in scenarios with several source configurations. J Supercomput 73, 57–74 (2017). https://doi.org/10.1007/s11227-015-1618-2


  • DOI: https://doi.org/10.1007/s11227-015-1618-2
