Accelerating Stencil Computation with Fully Homomorphic Encryption Using GPU

  • Conference paper
Euro-Par 2024: Parallel Processing (Euro-Par 2024)

Abstract

Stencil computation with fully homomorphic encryption (FHE) is an emerging area with significant potential to protect the sensitive data of HPC applications in outsourced computing environments. However, the computational overhead introduced by FHE can drastically reduce the performance of stencil computations compared to unencrypted implementations. This paper proposes two optimized algorithms for stencil computation with FHE tailored to GPU platforms: Matrix Overlap Processing (MOP) and Matrix Fixed-point Processing (MFP). MOP divides the input matrix into multiple slices, encrypts the elements at the same position across slices into a single ciphertext, and processes them with a uniform computing pattern. MFP directly encrypts neighbouring elements into ciphertexts, stores them in a table, and processes them in parallel on the GPU. The experimental results show that the proposed methods achieve significant speedups over the corresponding OpenMP implementations on the CPU: on the GPU, the MOP implementation achieves a speedup of 8.7×, and the MFP implementation achieves a speedup of 10.3×.
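The slice-packing idea described for MOP can be illustrated with a small plaintext simulation. This is only a sketch of one possible reading of the abstract, not the authors' implementation: plain Python lists stand in for ciphertext slots, no encryption is performed, and the 5-point averaging stencil, the one-row halo between slices, and all function names (`pack_slices`, `stencil_packed`, `unpack`) are assumptions made for illustration.

```python
WEIGHT = 0.2  # assumed 5-point averaging stencil


def stencil_plain(m):
    """Reference: average each interior cell with its 4 neighbours."""
    h, w = len(m), len(m[0])
    out = [row[:] for row in m]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            total = m[r][c] + m[r - 1][c] + m[r + 1][c] + m[r][c - 1] + m[r][c + 1]
            out[r][c] = total * WEIGHT
    return out


def pack_slices(m, num_slices):
    """Split the interior rows into equal bands, each with a one-row halo
    above and below, and pack same-position elements of all slices into
    one slot vector (a stand-in for a single ciphertext)."""
    h, w = len(m), len(m[0])
    interior = h - 2
    assert interior % num_slices == 0, "interior rows must divide evenly"
    band = interior // num_slices
    packed = [
        [[m[s * band + r][c] for s in range(num_slices)] for c in range(w)]
        for r in range(band + 2)
    ]
    return packed, band


def slot_add(a, b):
    """Slot-wise addition, mimicking a homomorphic add."""
    return [x + y for x, y in zip(a, b)]


def slot_scale(a, k):
    """Slot-wise multiplication by a plaintext constant."""
    return [x * k for x in a]


def stencil_packed(packed):
    """Apply the same stencil pattern once per position; every slice is
    updated at the same time because it occupies its own slot."""
    rows, w = len(packed), len(packed[0])
    out = {}
    for r in range(1, rows - 1):
        for c in range(1, w - 1):
            acc = packed[r][c]
            for nb in (packed[r - 1][c], packed[r + 1][c],
                       packed[r][c - 1], packed[r][c + 1]):
                acc = slot_add(acc, nb)
            out[(r, c)] = slot_scale(acc, WEIGHT)
    return out


def unpack(out, m, band):
    """Scatter slot s of each packed result back to the rows of slice s."""
    result = [row[:] for row in m]
    for (r, c), slots in out.items():
        for s, value in enumerate(slots):
            result[s * band + r][c] = value
    return result
```

In this sketch, a 6-row matrix packed with `pack_slices(m, 2)` needs only two packed rows of stencil work instead of four, since both slices are updated by the same slot-wise operations. In a real FHE setting the slot vectors would be ciphertexts and the adds and scalings would be homomorphic operations.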

Acknowledgments

This work was supported by the Natural Science Foundation of Hubei Province of China [grant number 2023AFB394] and the Knowledge Innovation Program of Wuhan-Shuguang Project [grant number 2022010801020283].

Author information

Corresponding author

Correspondence to Pei Li.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhou, X., Li, P., Chen, J., Yao, S. (2024). Accelerating Stencil Computation with Fully Homomorphic Encryption Using GPU. In: Carretero, J., Shende, S., Garcia-Blas, J., Brandic, I., Olcoz, K., Schreiber, M. (eds) Euro-Par 2024: Parallel Processing. Euro-Par 2024. Lecture Notes in Computer Science, vol 14803. Springer, Cham. https://doi.org/10.1007/978-3-031-69583-4_15

  • DOI: https://doi.org/10.1007/978-3-031-69583-4_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-69582-7

  • Online ISBN: 978-3-031-69583-4

  • eBook Packages: Computer Science, Computer Science (R0)
