MLFormer: a high performance MPC linear inference framework for transformers

  • Research Article
  • Journal of Cryptographic Engineering

Abstract

Transformer-based models are widely used in natural language processing tasks, and their application has been further extended to computer vision. When such deep learning services are deployed on cloud platforms, data security becomes a crucial concern. To address these security concerns, multi-party computation (MPC) is employed to prevent data and model leakage during the inference process. However, the Transformer model introduces several challenges for MPC computation: the time overhead of the Softmax (normalized exponential) function, the accuracy loss caused by the "dynamic range" of the approximated division and exponential, and the high memory overhead when processing long sequences. To overcome these challenges, we propose MLFormer, an MPC-based inference framework for transformer models in the semi-honest adversary model, built on CrypTen (Knott et al., Adv. Neural Inf. Process. Syst. 34: 4961–4973, 2021), a secure machine learning framework developed by the Facebook AI Research group. In this framework, we replace softmax attention with linear attention, whose time and memory complexity are linear in the input length; this modification eliminates the softmax function entirely, resulting in lower time and memory overhead. To preserve the accuracy of linear attention, we propose scaled linear attention, which addresses the dynamic-range issue caused by the MPC division used, together with a new approximate division function that reduces the computational time of the attention block. Furthermore, to improve the efficiency and accuracy of the MPC exponential and reciprocal operations commonly used in transformer models, we propose a novel MPC exponential protocol and are the first to integrate the efficient reciprocal protocol of Bar-Ilan and Beaver (Proceedings of the 8th Annual ACM Symposium on Principles of Distributed Computing, pp. 201–209, 1989) into our framework. Additionally, we optimize the computation of causal linear attention, which is used in the private inference of auto-regressive tasks, with novel CUDA kernel functions. All of the preceding optimizations contribute to a more accurate and efficient framework. Experimental results demonstrate that our framework achieves comparable accuracy with reduced inference time and GPU memory overhead compared to the original transformer model; the speedup reaches 78.79% over the traditional private transformer at an input length of 1024 patches.
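To make the core idea concrete, the sketch below contrasts standard softmax attention with the linear attention that MLFormer adopts, following the formulation of Katharopoulos et al. [19]. This is a minimal plaintext (non-MPC) illustration in PyTorch: the feature map phi(x) = elu(x) + 1, the tensor shapes, and the epsilon guard are illustrative assumptions, and the actual framework evaluates the same computation under CrypTen secret sharing, using the scaled variant and the approximate division and reciprocal protocols described in the abstract.

import torch
import torch.nn.functional as F

def softmax_attention(q, k, v):
    # Standard attention: O(n^2) time and memory in the sequence length n.
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5       # (n, n) score matrix
    return torch.softmax(scores, dim=-1) @ v           # (n, d_v) output

def linear_attention(q, k, v, eps=1e-6):
    # Linear attention: O(n) time and memory, no softmax/exponential required.
    phi_q = F.elu(q) + 1                               # feature map phi(x) = elu(x) + 1, (n, d)
    phi_k = F.elu(k) + 1                               # (n, d)
    kv = phi_k.transpose(-2, -1) @ v                   # (d, d_v) summary, independent of n
    z = phi_q @ phi_k.sum(dim=0, keepdim=True).transpose(-2, -1)  # (n, 1) normaliser
    return (phi_q @ kv) / (z + eps)                    # (n, d_v) output

# Hypothetical usage on random data: both variants produce outputs of the same shape.
q, k, v = torch.randn(8, 16), torch.randn(8, 16), torch.randn(8, 16)
out_soft = softmax_attention(q, k, v)
out_lin = linear_attention(q, k, v)

The division by the normaliser z in the last line of linear_attention is exactly where the "dynamic range" issue arises once the values are secret-shared, which is what motivates the scaled linear attention and the efficient division and reciprocal protocols proposed in the paper.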

Data Availability Statement

No datasets were generated or analysed during the current study.

Notes

  1. \(\left| (\mathrm{MPC} - \mathrm{Actual}) / \mathrm{Actual} \right|\)

References

  1. Knott, B., Venkataraman, S., Hannun, A., Sengupta, S., Ibrahim, M., van der Maaten, L.: CrypTen: Secure multi-party computation meets machine learning. Adv. Neural Inf. Process. Syst. 34, 4961–4973 (2021)

  2. Bar-Ilan, J., Beaver, D.: Non-cryptographic fault-tolerant computing in constant number of rounds of interaction. In: Proceedings of the 8th Annual ACM Symposium on Principles of Distributed Computing, pp. 201–209 (1989)

  3. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017)

  4. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv:2010.11929, [cs.CV] (2020)

  5. Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)

  6. Radford, A., Narasimhan, K., Salimans, T., Sutskever, I., et al.: Improving language understanding by generative pre-training. OpenAI (2018)

  7. Hashemi, H., Wang, Y., Annavaram, M.: DarKnight: a data privacy scheme for training and inference of deep neural networks. arXiv:2006.01300, [cs.CR] (2020)

  8. Sun, X., Zhang, P., Liu, J.K., Yu, J., Xie, W.: Private machine learning classification based on fully homomorphic encryption. IEEE Trans. Emerg. Top. Comput. 8(2), 352–364 (2018)

  9. Lindell, Y.: Secure multiparty computation. Commun. ACM 64(1), 86–96 (2020)

  10. Mohassel, P., Zhang, Y.: SecureML: A system for scalable privacy-preserving machine learning. In: 2017 IEEE Symposium on Security and Privacy, pp. 19–38 (2017). IEEE

  11. Yao, A.C.: Protocols for secure computations. In: 23rd Annual Symposium on Foundations of Computer Science (sfcs 1982), pp. 160–164 (1982). IEEE

  12. Kumar, N., Rathee, M., Chandran, N., Gupta, D., Rastogi, A., Sharma, R.: CrypTFlow: Secure TensorFlow inference. In: 2020 IEEE Symposium on Security and Privacy, pp. 336–353 (2020). IEEE

  13. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  14. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556, [cs.CV] (2015)

  15. Wang, Y., Suh, G.E., Xiong, W., Lefaudeux, B., Knott, B., Annavaram, M., Lee, H.-H.S.: Characterization of MPC-based private inference for transformer-based models. In: 2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 187–197 (2022). IEEE

  16. Wang, S., Li, B.Z., Khabsa, M., Fang, H., Ma, H.: Linformer: Self-attention with linear complexity. arXiv:2006.04768, [cs.LG] (2020)

  17. Xiong, Y., Zeng, Z., Chakraborty, R., Tan, M., Fung, G., Li, Y., Singh, V.: Nyströmformer: A Nyström-based algorithm for approximating self-attention. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 14138–14148 (2021)

  18. Baker, C.T., Taylor, R.: The numerical treatment of integral equations. J. Appl. Mech. 46(4), 969 (1979)

  19. Katharopoulos, A., Vyas, A., Pappas, N., Fleuret, F.: Transformers are RNNs: Fast autoregressive transformers with linear attention. In: International Conference on Machine Learning, pp. 5156–5165 (2020). PMLR

  20. Riazi, M.S., Weinert, C., Tkachenko, O., Songhori, E.M., Schneider, T., Koushanfar, F.: Chameleon: A hybrid secure computation framework for machine learning applications. In: Proceedings of the 2018 on Asia Conference on Computer and Communications Security, pp. 707–721 (2018)

  21. Tan, S., Knott, B., Tian, Y., Wu, D.J.: CryptGPU: Fast privacy-preserving machine learning on the GPU. In: 2021 IEEE Symposium on Security and Privacy (SP), pp. 1021–1038 (2021). IEEE

  22. Dong, Y., Chen, X., Song, X., Li, K.: FlexBNN: fast private binary neural network inference with flexible bit-width. IEEE Trans. Inf. Forensics Secur. (2023)

  23. Zhang, F., Chen, Z., Zhang, C., Zhou, A.C., Zhai, J., Du, X.: An efficient parallel secure machine learning framework on GPUs. IEEE Trans. Parallel Distrib. Syst. 32(9), 2262–2276 (2021)

  24. Sutradhar, K., Om, H.: A privacy-preserving comparison protocol. IEEE Trans. Comput. 72(6), 1815–1821 (2023). https://doi.org/10.1109/TC.2022.3215640

  25. Resende, A., Railsback, D., Dowsley, R., Nascimento, A.C., Aranha, D.F.: Fast privacy-preserving text classification based on secure multiparty computation. IEEE Trans. Inf. Forensics Secur. 17, 428–442 (2022)

  26. Feng, Q., He, D., Liu, Z., Wang, H., Choo, K.-K.R.: SecureNLP: A system for multi-party privacy-preserving natural language processing. IEEE Trans. Inf. Forensics Secur. 15, 3709–3721 (2020)

  27. Chen, T., Bao, H., Huang, S., Dong, L., Jiao, B., Jiang, D., Zhou, H., Li, J., Wei, F.: THE-X: privacy-preserving transformer inference with homomorphic encryption. arXiv:2206.00216, [cs.CR] (2022)

  28. Child, R., Gray, S., Radford, A., Sutskever, I.: Generating long sequences with sparse transformers. arXiv:1904.10509, [cs.LG] (2019)

  29. Kitaev, N., Kaiser, Ł., Levskaya, A.: Reformer: The efficient transformer. arXiv:2001.04451, [cs.LG] (2020)

  30. Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT, vol. 1, p. 2 (2019)

  31. Conneau, A., Lample, G.: Cross-lingual language model pretraining. Adv. Neural Inf. Process. Syst. 32 (2019)

  32. Beaver, D.: Efficient multiparty protocols using circuit randomization. In: Advances in Cryptology—CRYPTO’91: Proceedings 11, pp. 420–432 (1992). Springer

  33. Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (ELUs). arXiv:1511.07289, [cs.LG] (2016)

  34. Watson, J.-L., Wagh, S., Popa, R.A.: Piranha: A GPU platform for secure computation. In: 31st USENIX Security Symposium, pp. 827–844 (2022)

  35. Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv:1708.07747, [cs.LG] (2017)


Acknowledgements

This work is partially supported by the National Natural Science Foundation of China (62002023, 62002239, 62372417, 62132008, 62071222, U22B2030, U20A20176), Guangdong Provincial Key Laboratory IRADS (2022B1212010006, R0400001-22), Guangdong Province General Universities Key Field Project (New Generation Information Technology) (2023ZDZX1033), Zhejiang Lab open research project (No. K2022PD0AB03), the Natural Science Foundation of Jiangsu Province (BK20220075) and the Fok Ying-Tong Education Foundation for Young Teachers in the Higher Education Institutions of China (No. 20193218210004).

Author information

Authors and Affiliations

Authors

Contributions

(1) Siqi Liu made substantial contributions to the conception, design, and implementation of the work; (2) Siqi Liu, Zhusen Liu, and Donglong Chen analyzed the results and wrote the main manuscript; (3) Wangchen Dai, Lu Zhou, Zhe Liu, Ray C. C. Cheung, and Çetin Kaya Koç gave sufficient guidance on the paper revision; (4) All authors reviewed the manuscript.

Corresponding authors

Correspondence to Donglong Chen or Wangchen Dai.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Liu, S., Liu, Z., Chen, D. et al. MLFormer: a high performance MPC linear inference framework for transformers. J Cryptogr Eng 15, 2 (2025). https://doi.org/10.1007/s13389-024-00365-1

