MLFormer: a high performance MPC linear inference framework for transformers

  • Research Article
  • Journal of Cryptographic Engineering

Abstract

Transformer-based models are widely used in natural language processing tasks, and their application has been further extended to computer vision. When such deep learning services are deployed on cloud platforms, data security becomes a crucial concern. To address these security concerns, multi-party computation (MPC) is employed to prevent data and model leakage during the inference process. However, the Transformer model introduces several challenges for MPC computation: the time overhead of the Softmax (normalized exponential) function, the accuracy loss caused by the "dynamic range" of the approximated division and exponential, and the high memory overhead when processing long sequences. To overcome these challenges, we propose MLFormer, an MPC-based inference framework for transformer models in the semi-honest adversary model, built on CrypTen (Knott et al., Adv. Neural Inf. Process. Syst. 34: 4961–4973, 2021), a secure machine learning framework developed by the Facebook AI Research group. In this framework, we replace softmax attention with linear attention, whose time and memory complexity are linear in the input length; this modification eliminates the softmax function entirely, resulting in lower time and memory overhead. To preserve the accuracy of linear attention, we propose scaled linear attention, which addresses the dynamic-range issue caused by the MPC division used, together with a new approximate division function that reduces the computational time of the attention block. Furthermore, to improve the efficiency and accuracy of the MPC exponential and reciprocal operations commonly used in transformer models, we propose a novel MPC exponential protocol and are the first to integrate the efficient reciprocal protocol of Bar-Ilan and Beaver (Proceedings of the 8th Annual ACM Symposium on Principles of Distributed Computing, pp. 201–209, 1989) into our framework. Additionally, we optimize the computation of causal linear attention, which is used in the private inference of auto-regressive tasks, with novel CUDA kernel functions. All of the preceding optimizations contribute to a more accurate and efficient framework. Experimental results demonstrate that our framework achieves comparable accuracy with reduced inference time and GPU memory overhead compared to the original transformer model; the speedup reaches 78.79% over the traditional private transformer at an input length of 1024 patches.
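To make the core idea concrete, the sketch below contrasts standard softmax attention with the linear attention that MLFormer adopts, following the formulation of Katharopoulos et al. [19]. This is a minimal plaintext (non-MPC) illustration in PyTorch: the feature map phi(x) = elu(x) + 1, the tensor shapes, and the epsilon guard are illustrative assumptions, and the actual framework evaluates the same computation under CrypTen secret sharing, using the scaled variant and the approximate division and reciprocal protocols described in the abstract.

import torch
import torch.nn.functional as F

def softmax_attention(q, k, v):
    # Standard attention: O(n^2) time and memory in the sequence length n.
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5       # (n, n) score matrix
    return torch.softmax(scores, dim=-1) @ v           # (n, d_v) output

def linear_attention(q, k, v, eps=1e-6):
    # Linear attention: O(n) time and memory, no softmax/exponential required.
    phi_q = F.elu(q) + 1                               # feature map phi(x) = elu(x) + 1, (n, d)
    phi_k = F.elu(k) + 1                               # (n, d)
    kv = phi_k.transpose(-2, -1) @ v                   # (d, d_v) summary, independent of n
    z = phi_q @ phi_k.sum(dim=0, keepdim=True).transpose(-2, -1)  # (n, 1) normaliser
    return (phi_q @ kv) / (z + eps)                    # (n, d_v) output

# Hypothetical usage on random data: both variants produce outputs of the same shape.
q, k, v = torch.randn(8, 16), torch.randn(8, 16), torch.randn(8, 16)
out_soft = softmax_attention(q, k, v)
out_lin = linear_attention(q, k, v)

The division by the normaliser z in the last line of linear_attention is exactly where the "dynamic range" issue arises once the values are secret-shared, which is what motivates the scaled linear attention and the efficient division and reciprocal protocols proposed in the paper.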

Data Availability Statement

No datasets were generated or analysed during the current study.

Notes

  1. \(\left| (\mathrm{MPC} - \mathrm{Actual}) / \mathrm{Actual} \right|\)

References

  1. Knott, B., Venkataraman, S., Hannun, A., Sengupta, S., Ibrahim, M., van der Maaten, L.: CrypTen: Secure multi-party computation meets machine learning. Adv. Neural Inf. Process. Syst. 34, 4961–4973 (2021)

  2. Bar-Ilan, J., Beaver, D.: Non-cryptographic fault-tolerant computing in constant number of rounds of interaction. In: Proceedings of the 8th Annual ACM Symposium on Principles of Distributed Computing, pp. 201–209 (1989)

  3. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017)

  4. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv:2010.11929, [cs.CV] (2020)

  5. Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)

  6. Radford, A., Narasimhan, K., Salimans, T., Sutskever, I., et al.: Improving language understanding by generative pre-training. OpenAI (2018)

  7. Hashemi, H., Wang, Y., Annavaram, M.: DarKnight: a data privacy scheme for training and inference of deep neural networks. arXiv:2006.01300, [cs.CR] (2020)

  8. Sun, X., Zhang, P., Liu, J.K., Yu, J., Xie, W.: Private machine learning classification based on fully homomorphic encryption. IEEE Trans. Emerg. Top. Comput. 8(2), 352–364 (2018)

  9. Lindell, Y.: Secure multiparty computation. Commun. ACM 64(1), 86–96 (2020)

  10. Mohassel, P., Zhang, Y.: SecureML: A system for scalable privacy-preserving machine learning. In: 2017 IEEE Symposium on Security and Privacy, pp. 19–38 (2017). IEEE

  11. Yao, A.C.: Protocols for secure computations. In: 23rd Annual Symposium on Foundations of Computer Science (sfcs 1982), pp. 160–164 (1982). IEEE

  12. Kumar, N., Rathee, M., Chandran, N., Gupta, D., Rastogi, A., Sharma, R.: CrypTFlow: Secure TensorFlow inference. In: 2020 IEEE Symposium on Security and Privacy, pp. 336–353 (2020). IEEE

  13. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  14. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556, [cs.CV] (2015)

  15. Wang, Y., Suh, G.E., Xiong, W., Lefaudeux, B., Knott, B., Annavaram, M., Lee, H.-H.S.: Characterization of MPC-based private inference for transformer-based models. In: 2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 187–197 (2022). IEEE

  16. Wang, S., Li, B.Z., Khabsa, M., Fang, H., Ma, H.: Linformer: Self-attention with linear complexity. arXiv:2006.04768, [cs.LG] (2020)

  17. Xiong, Y., Zeng, Z., Chakraborty, R., Tan, M., Fung, G., Li, Y., Singh, V.: Nyströmformer: A Nyström-based algorithm for approximating self-attention. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 14138–14148 (2021)

  18. Baker, C.T., Taylor, R.: The numerical treatment of integral equations. J. Appl. Mech. 46(4), 969 (1979)

  19. Katharopoulos, A., Vyas, A., Pappas, N., Fleuret, F.: Transformers are RNNs: Fast autoregressive transformers with linear attention. In: International Conference on Machine Learning, pp. 5156–5165 (2020). PMLR

  20. Riazi, M.S., Weinert, C., Tkachenko, O., Songhori, E.M., Schneider, T., Koushanfar, F.: Chameleon: A hybrid secure computation framework for machine learning applications. In: Proceedings of the 2018 on Asia Conference on Computer and Communications Security, pp. 707–721 (2018)

  21. Tan, S., Knott, B., Tian, Y., Wu, D.J.: CryptGPU: Fast privacy-preserving machine learning on the GPU. In: 2021 IEEE Symposium on Security and Privacy (SP), pp. 1021–1038 (2021). IEEE

  22. Dong, Y., Chen, X., Song, X., Li, K.: FlexBNN: fast private binary neural network inference with flexible bit-width. IEEE Trans. Inf. Forensics Secur. (2023)

  23. Zhang, F., Chen, Z., Zhang, C., Zhou, A.C., Zhai, J., Du, X.: An efficient parallel secure machine learning framework on GPUs. IEEE Trans. Parallel Distrib. Syst. 32(9), 2262–2276 (2021)

  24. Sutradhar, K., Om, H.: A privacy-preserving comparison protocol. IEEE Trans. Comput. 72(6), 1815–1821 (2023). https://doi.org/10.1109/TC.2022.3215640

  25. Resende, A., Railsback, D., Dowsley, R., Nascimento, A.C., Aranha, D.F.: Fast privacy-preserving text classification based on secure multiparty computation. IEEE Trans. Inf. Forensics Secur. 17, 428–442 (2022)

  26. Feng, Q., He, D., Liu, Z., Wang, H., Choo, K.-K.R.: SecureNLP: A system for multi-party privacy-preserving natural language processing. IEEE Trans. Inf. Forensics Secur. 15, 3709–3721 (2020)

  27. Chen, T., Bao, H., Huang, S., Dong, L., Jiao, B., Jiang, D., Zhou, H., Li, J., Wei, F.: THE-X: privacy-preserving transformer inference with homomorphic encryption. arXiv:2206.00216, [cs.CR] (2022)

  28. Child, R., Gray, S., Radford, A., Sutskever, I.: Generating long sequences with sparse transformers. arXiv:1904.10509, [cs.LG] (2019)

  29. Kitaev, N., Kaiser, Ł., Levskaya, A.: Reformer: The efficient transformer. arXiv:2001.04451, [cs.LG] (2020)

  30. Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT, vol. 1, p. 2 (2019)

  31. Conneau, A., Lample, G.: Cross-lingual language model pretraining. Adv. Neural Inf. Process. Syst. 32 (2019)

  32. Beaver, D.: Efficient multiparty protocols using circuit randomization. In: Advances in Cryptology—CRYPTO’91: Proceedings 11, pp. 420–432 (1992). Springer

  33. Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (ELUs). arXiv:1511.07289, [cs.LG] (2016)

  34. Watson, J.-L., Wagh, S., Popa, R.A.: Piranha: A GPU platform for secure computation. In: 31st USENIX Security Symposium, pp. 827–844 (2022)

  35. Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv:1708.07747, [cs.LG] (2017)


Acknowledgements

This work is partially supported by the National Natural Science Foundation of China (62002023, 62002239, 62372417, 62132008, 62071222, U22B2030, U20A20176), Guangdong Provincial Key Laboratory IRADS (2022B1212010006, R0400001-22), Guangdong Province General Universities Key Field Project (New Generation Information Technology) (2023ZDZX1033), Zhejiang Lab open research project (No. K2022PD0AB03), the Natural Science Foundation of Jiangsu Province (BK20220075) and the Fok Ying-Tong Education Foundation for Young Teachers in the Higher Education Institutions of China (No. 20193218210004).

Author information

Authors and Affiliations

Authors

Contributions

(1) Siqi Liu made substantial contributions to the conception, design, and implementation of the work; (2) Siqi Liu, Zhusen Liu, and Donglong Chen analyzed the results and wrote the main manuscript; (3) Wangchen Dai, Lu Zhou, Zhe Liu, Ray C. C. Cheung, and Çetin Kaya Koç gave sufficient guidance on the paper revision; (4) All authors reviewed the manuscript.

Corresponding authors

Correspondence to Donglong Chen or Wangchen Dai.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Liu, S., Liu, Z., Chen, D. et al. MLFormer: a high performance MPC linear inference framework for transformers. J Cryptogr Eng 15, 2 (2025). https://doi.org/10.1007/s13389-024-00365-1

