ABSTRACT
The 3D holographic display has long been anticipated as a future human interface because it does not require users to wear special devices. However, in addition to the slow progress of display device technology, its heavy computational requirements have prevented such displays from being realized. A recent study indicates that objects and holograms of several giga-pixels must be processed in real time to achieve high resolution and a wide viewing angle. To address this problem, we first propose a new data distribution method that uses a basic FFT-based O(N log N) computation but requires no inter-node communication during the computation on a multi-GPU cluster. We then implement the method on a multi-GPU cluster, applying several single-node and multi-node optimization and parallelization techniques. The experimental results show that the intra-node optimizations attain an 11.52-fold speed-up over the original single-node code. Further, multi-node optimizations using 8 nodes with 2 GPUs per node attain an execution time of 4.28 s for generating a 1.6 giga-pixel hologram from a 3.2 giga-pixel object. This corresponds to a 237.92-fold speed-up over sequential CPU processing using a conventional FFT-based algorithm.
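As a rough illustration of what the reported figures imply, the following sketch derives the sequential CPU time and the hologram throughput from the numbers quoted above. The constant and function names are hypothetical, and the derived values are simple arithmetic consequences of the abstract's figures, not measurements reported by the authors. It assumes the speed-up is defined as sequential time divided by cluster time.

```python
# Figures quoted in the abstract.
NODES = 8
GPUS_PER_NODE = 2
CLUSTER_TIME_S = 4.28      # execution time on the full cluster, in seconds
SPEEDUP_VS_CPU = 237.92    # relative to sequential CPU processing
HOLOGRAM_PIXELS = 1.6e9    # size of the generated hologram


def implied_cpu_time_s(cluster_time_s: float, speedup: float) -> float:
    """Sequential time implied by speedup = t_cpu / t_cluster (assumed definition)."""
    return cluster_time_s * speedup


def pixels_per_second(pixels: float, seconds: float) -> float:
    """Hologram pixels generated per second of wall-clock time."""
    return pixels / seconds


total_gpus = NODES * GPUS_PER_NODE
cpu_time = implied_cpu_time_s(CLUSTER_TIME_S, SPEEDUP_VS_CPU)
throughput = pixels_per_second(HOLOGRAM_PIXELS, CLUSTER_TIME_S)

print(f"GPUs used:            {total_gpus}")
print(f"Implied CPU time:     {cpu_time:.1f} s ({cpu_time / 60:.1f} min)")
print(f"Hologram throughput:  {throughput:.3e} pixels/s")
```

Under that assumption, the sequential baseline works out to roughly 1018 s, or about 17 minutes per hologram, compared with 4.28 s on 16 GPUs.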