Abstract
Large-scale machine learning (LSML) models, such as GPT-3.5, which powers the well-known ChatGPT chatbot, have revolutionized our perception of AI by enabling more natural, context-aware, and interactive experiences. Yet, training such large models requires months of computation on expensive hardware, including GPUs, orchestrated by specialized software, so-called LSML frameworks. Due to the model size, neither the on-device memory of the GPUs nor the host RAM can hold all parameters simultaneously during training. Therefore, LSML frameworks dynamically offload data to NVMe storage and reload it just in time.
In this paper, we investigate the security of NVMe-offloaded data in LSML against poisoning attacks and present NVMevade, the first untargeted poisoning attack on NVMe offloads. NVMevade allows an attacker to degrade model performance as well as slow down or even stall the training process. For instance, we demonstrate that an attacker can stealthily increase training time by 182%, thus inflating the cost of model training. To address this vulnerability, we develop NVMensure, the first defense that guarantees the integrity and freshness of NVMe-offloaded data in LSML. In a large-scale study, we demonstrate the robustness of NVMensure against poisoning attacks and explore the runtime-efficiency and security trade-offs it can provide. We tested 22 different NVMensure configurations and report an overhead between 9.8% and 64.2%, depending on the selected security level. We also expect NVMensure to be effective against targeted poisoning attacks, which do not exist yet but might be developed in the future.
Notes
- 1.
An NVIDIA A100 GPU [28] provides 80 GB of on-device memory.
- 2.
NVMe is an interface specification for PCIe attached Flash and SSD storage devices.
- 3.
We posit that the attacker has effectively circumvented OS access controls, thus obtaining necessary user-level permissions to access and manipulate NVMe files.
- 4.
The replay attack naturally also affects the model performance, but the effect was marginal and not noticeable in our experiments.
- 5.
For the stealthiness evaluation, we compared the execution log (console output) with and without the attack. We found only differing timestamps and minimal changes in loss values, both of which are normal across runs.
- 6.
In our experiments, the training then stopped. In real-world scenarios, however, the training can roll back to a previous checkpoint and continue automatically.
- 7.
Since LSML models are normally trained on publicly available data that can be scrutinized by experts, dataset privacy is not a concern.
References
Aumasson, J.-P., Neves, S., Wilcox-O’Hearn, Z., Winnerlein, C.: BLAKE2: simpler, smaller, fast as MD5. In: Jacobson, M., Locasto, M., Mohassel, P., Safavi-Naini, R. (eds.) ACNS 2013. LNCS, vol. 7954, pp. 119–135. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38980-1_8
Bagdasaryan, E., Shmatikov, V.: Blind backdoors in deep learning models. In: USENIX Security (2021)
Bloom, B.H.: Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13(7), 422–426 (1970)
Bojar, O., et al.: Findings of the 2016 conference on machine translation. In: Proceedings of the First Conference on Machine Translation (2016)
Brown, T., et al.: Language models are few-shot learners. In: NeurIPS (2020)
Bubeck, S., et al.: Sparks of Artificial General Intelligence: Early experiments with GPT-4. arXiv preprint arXiv:2303.12712 (2023)
Chen, H., Fu, C., Zhao, J., Koushanfar, F.: ProFlip: targeted trojan attack with progressive bit flips. In: IEEE/CVF ICCV (2021)
Chen, X., Liu, C., Li, B., Lu, K., Song, D.: Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning. arXiv preprint arXiv:1712.05526 (2017)
Collobert, R., Weston, J.: A unified architecture for natural language processing: deep neural networks with multitask learning. In: ICML (2008)
Costan, V., Devadas, S.: Intel SGX explained. Cryptology ePrint Archive (2016)
El Merabet, H., Hajraoui, A.: A survey of malware detection techniques based on machine learning. IJACSA (2019)
Fan, B., Andersen, D.G., Kaminsky, M., Mitzenmacher, M.D.: Cuckoo filter: practically better than bloom. In: CoNEXT (2014)
National Institute of Standards and Technology: Secure Hash Standard (SHS). FIPS PUB (1995)
Goldblum, M., et al.: Dataset security for machine learning: data poisoning, backdoor attacks, and defenses. IEEE PAMI 45(2), 1563–1580 (2022)
Graves, A., Mohamed, A.R., Hinton, G.: Speech recognition with deep recurrent neural networks. In: ICASSP (2013)
Guo, D., Liu, Y., Li, X., Yang, P.: False negative problem of counting bloom filter. IEEE Trans. Knowl. Data Eng. 22(5), 651–664 (2010)
Hilal, W., Gadsden, S.A., Yawney, J.: Financial fraud: a review of anomaly detection techniques and recent advances. Expert Syst. Appl. 193, 116429 (2022)
International Organization for Standardization: Information processing — Use of longitudinal parity to detect errors in information messages. ISO Standard ISO 1155, ISO (2001)
Jagielski, M., Oprea, A., Biggio, B., Liu, C., Nita-Rotaru, C., Li, B.: Manipulating machine learning: poisoning attacks and countermeasures for regression learning. In: IEEE S&P (2018)
Jang, I., Tang, A., Kim, T., Sethumadhavan, S., Huh, J.: Heterogeneous isolated execution for commodity GPUs. In: ASPLOS (2019)
Kinney, S.L.: Trusted Platform Module Basics: Using TPM in Embedded Systems. Elsevier (2006)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NeurIPS (2017)
Le Quoc, D., Gregor, F., Singh, J., Fetzer, C.: SGX-PySpark: secure distributed data analytics. In: WWW (2019)
Microsoft Mechanics: What runs ChatGPT? Inside Microsoft’s AI supercomputer | Featuring Mark Russinovich (2023). https://youtu.be/Rk3nTUfRZmo
Microsoft Research: Turing NLG: A 17 Billion Parameter Language Model by Microsoft (2021). https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft/
Mutlu, O., Kim, J.S.: RowHammer: a retrospective. IEEE TCAD 39(8), 1555–1571 (2020)
Narayanan, D., et al.: PipeDream: generalized pipeline parallelism for DNN training. In: ACM SOSP (2019)
Nvidia: A100 GPU (2023). https://www.nvidia.com/en-us/data-center/a100/
Nvidia: DGX Systems (2023). https://www.nvidia.com/de-de/data-center/dgx-systems/
OpenAI: ChatGPT (2023). https://openai.com/research/chatgpt
OpenAI: GPT-4 Technical Report. arXiv preprint arXiv:2303.08774 (2023)
Orenbach, M., Lifshits, P., Minkin, M., Silberstein, M.: Eleos: ExitLess OS services for SGX enclaves. In: EuroSys (2017)
Ozga, W., Quoc, D.L., Fetzer, C.: Perun: Secure Multi-Stakeholder Machine Learning Framework with GPU Support. arXiv preprint arXiv:2103.16898 (2021)
Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. In: NeurIPS (2019)
Paudice, A., Muñoz-González, L., Gyorgy, A., Lupu, E.C.: Detection of Adversarial Training Examples in Poisoning Attacks through Anomaly Detection. arXiv preprint arXiv:1802.03041 (2018)
Peri, N., et al.: Deep k-NN defense against clean-label data poisoning attacks. In: Bartoli, A., Fusiello, A. (eds.) ECCV 2020. LNCS, vol. 12535, pp. 55–70. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-66415-2_4
Peterson, W.W., Brown, D.T.: Cyclic codes for error detection. In: Proceedings of the IRE (1961)
Post, M.: A call for clarity in reporting BLEU scores. In: Proceedings of the Third Conference on Machine Translation: Research Papers (2018)
Quoc, D.L., Gregor, F., Arnautov, S., Kunkel, R., Bhatotia, P., Fetzer, C.: SecureTF: a secure TensorFlow framework. In: ACM/IFIP Middleware (2020)
Radford, A., Narasimhan, K., Salimans, T., Sutskever, I., et al.: Improving language understanding by generative pre-training. In: OpenAI (2018)
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I., et al.: Language models are unsupervised multitask learners. OpenAI Blog (2019)
Raffel, C., et al.: Exploring the limits of transfer learning with a unified text-to-text transformer. JMLR 21(1), 5485–5551 (2020)
Rajbhandari, S., et al.: DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale. arXiv preprint arXiv:2201.05596 (2022)
Rajbhandari, S., Rasley, J., Ruwase, O., He, Y.: ZeRO: memory optimizations toward training trillion parameter models. In: SC 2020 (2020)
Rajbhandari, S., Ruwase, O., Rasley, J., Smith, S., He, Y.: ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning. arXiv preprint arXiv:2104.07857 (2021)
Rakin, A.S., He, Z., Fan, D.: TBT: targeted neural network attack with bit trojan. In: IEEE/CVF CVPR (2020)
Rakin, A.S., He, Z., Li, J., Yao, F., Chakrabarti, C., Fan, D.: T-BFA: targeted bit-flip adversarial weight attack. IEEE PAMI 44(11), 7928–7939 (2022)
Rasley, J., Rajbhandari, S., Ruwase, O., He, Y.: DeepSpeed: system optimizations enable training deep learning models with over 100 billion parameters. In: SIGKDD (2020)
Ren, J., et al.: ZeRO-offload: democratizing billion-scale model training. In: USENIX ATC (2021)
Rivest, R.: The MD5 Message-Digest Algorithm. IETF (1992)
Saha, A., Subramanya, A., Pirsiavash, H.: Hidden trigger backdoor attacks. In: AAAI (2020)
Sarwate, D.V.: Computation of cyclic redundancy checks via table look-up. Commun. ACM 31(8), 1008–1013 (1988)
Shoeybi, M., Patwary, M., Puri, R., LeGresley, P., Casper, J., Catanzaro, B.: Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. arXiv preprint arXiv:1909.08053 (2020)
Smith, S., et al.: Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model (2022)
Tian, Z., Cui, L., Liang, J., Yu, S.: A comprehensive survey on poisoning attacks and countermeasures in machine learning. ACM Comput. Surv. 55(8), 1–35 (2022)
Tramèr, F., Boneh, D.: Slalom: fast, verifiable and private execution of neural networks in trusted hardware. In: ICLR (2018)
Tsai, C.C., Porter, D.E., Vij, M.: Graphene-SGX: a practical library OS for unmodified applications on SGX. In: USENIX ATC (2017)
Volos, S., Vaswani, K., Bruno, R.: Graviton: trusted execution environments on GPUs. In: USENIX OSDI (2018)
Xia, G., Chen, J., Yu, C., Ma, J.: Poisoning attacks in federated learning: a survey. IEEE Access 11, 10708–10722 (2023)
Xiao, H., Biggio, B., Brown, G., Fumera, G., Eckert, C., Roli, F.: Is feature selection secure against training data poisoning? PMLR (2015)
Yang, C., Wu, Q., Li, H., Chen, Y.: Generative Poisoning Attack Method Against Neural Networks. arXiv preprint arXiv:1703.01340 (2017)
Yao, F., Rakin, A.S., Fan, D.: DeepHammer: depleting the intelligence of deep neural networks through targeted chain of bit flips. In: USENIX Security (2020)
Zhu, J., et al.: Enabling rack-scale confidential computing using heterogeneous trusted execution environment. IEEE S&P (2020)
Acknowledgment
We thank the Private AI Collaborative Research Institute, which is co-sponsored by Intel Labs (www.private-ai.org), for partially supporting this research.
A Instantiation for DeepSpeed
Below, we provide implementation details of our approaches for DeepSpeed [48].
A.1 NVMevade
Table 6 depicts the parameters of NVMevade for attack modes 1 and 2. Further, our scanning process confirmed that DeepSpeed offloads three types of data: model parameters, optimizer states, and gradients. In the following, we provide additional information for the different attack modes.
Modes 1 and 2. During the poisoning process via bit-flips, we simultaneously attacked all offloaded data. The most impactful perturbations were made in intermediate training parameters, namely gradients and optimizer states, leading to immediate gradient overflows. In benign training, DeepSpeed scales gradients using a scaling factor to address gradient underflows. However, in the overflow scenario implanted by NVMevade, the current training step is skipped and the scaling factor is halved, stalling training progress. Once the scaling factor reaches a configured minimum, the entire DeepSpeed process aborts with an exception, which typically occurs within a few steps of malicious overflows. When tuning the parameters for mode 2, the objective is to minimize the occurrence of overflows while still ensuring a significant level of detrimental modifications, thereby preventing the training process from terminating abruptly.
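The loss-scaling dynamics described above can be sketched as follows. This is an illustrative Python model, not DeepSpeed's actual implementation; the class name, initial scale, and minimum scale are our own choices for demonstration.

```python
# Illustrative sketch of dynamic loss scaling: on a gradient overflow the
# step is skipped and the scale is halved; once the scale falls below a
# configured minimum, training aborts with an exception.

class LossScaler:
    def __init__(self, init_scale: float = 2 ** 16, min_scale: float = 1.0):
        self.scale = init_scale
        self.min_scale = min_scale
        self.skipped_steps = 0

    def step(self, grads_overflowed: bool) -> bool:
        """Return True if the optimizer step should be applied."""
        if grads_overflowed:
            self.skipped_steps += 1
            self.scale /= 2  # halve the scaling factor
            if self.scale < self.min_scale:
                raise RuntimeError("loss scale below minimum: training aborted")
            return False  # skip this training step
        return True

# With poisoned gradients, every step overflows, so the scale decays
# geometrically and the process aborts within a few steps.
scaler = LossScaler(init_scale=2 ** 4)
steps_until_abort = 0
try:
    while True:
        scaler.step(grads_overflowed=True)
        steps_until_abort += 1
except RuntimeError:
    pass
```

Starting from a scale of 16 with a minimum of 1, four overflowing steps are tolerated (scale 8, 4, 2, 1) before the fifth overflow drops the scale below the minimum and aborts, matching the "within a few steps" behavior observed in the attack.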
Mode 3. Because DeepSpeed performs a basic sanity check on file sizes, a replayed file may need its size adjusted if a newly offloaded file with the same name but a different size appears during the replay attack. The file can be shrunk with Python’s truncate function or grown by zero-padding. However, our experiments did not encounter this scenario.
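The size adjustment just described could be sketched as follows; the helper name `match_file_size` is hypothetical and not part of NVMevade's actual code.

```python
import os
import tempfile

def match_file_size(path: str, expected_size: int) -> None:
    """Make the file at `path` exactly `expected_size` bytes long:
    shrink it via truncate, or grow it by appending zero bytes."""
    actual = os.path.getsize(path)
    if actual > expected_size:
        os.truncate(path, expected_size)           # shrink
    elif actual < expected_size:
        with open(path, "ab") as f:                # zero-pad to grow
            f.write(b"\x00" * (expected_size - actual))

# Usage: adjust a (stand-in) replayed offload file to an expected size.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"replayed-tensor-bytes")  # 21 bytes
    path = tmp.name
match_file_size(path, 16)  # shrink to 16 bytes
match_file_size(path, 32)  # then zero-pad up to 32 bytes
```

Truncation discards trailing bytes, while zero-padding leaves the original prefix intact, which is why this adjustment can slip past a check that only compares sizes.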
A.2 NVMensure
NVMensure is implemented in C++ within the DeepNVMe [45] library. The runtime of the integrity mechanisms varies with the implementation: MD5 and BLAKE2bp use C++ reference implementations, while SHA-256 uses a highly efficient kernel implementation accessed through the kernel’s crypto API. We implemented CRC-32 [52] and LRC [18] ourselves in C++.
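For illustration, the two classes of checksums named above can be sketched in Python; this is not the C++ NVMensure code. `hashlib.blake2b` stands in for the BLAKE2bp variant used in the implementation, and the LRC is the byte-wise XOR parity of the block.

```python
import hashlib

def blake2b_digest(data: bytes) -> bytes:
    """Cryptographic digest of an offloaded block (stand-in for BLAKE2bp)."""
    return hashlib.blake2b(data).digest()

def lrc(data: bytes) -> int:
    """Longitudinal redundancy check: XOR parity over all bytes."""
    parity = 0
    for b in data:
        parity ^= b
    return parity

# Record the digest at offload time, verify it at reload time.
block = b"offloaded optimizer state"
stored = blake2b_digest(block)
assert blake2b_digest(block) == stored       # intact block verifies
tampered = b"x" + block[1:]
assert blake2b_digest(tampered) != stored    # modification is detected
```

The trade-off mirrors the overhead range reported in the paper: an XOR parity is nearly free but detects only simple corruption, whereas a cryptographic digest resists deliberate tampering at a higher per-block cost.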
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Krauß, T., Götz, R., Dmitrienko, A. (2024). Security of NVMe Offloaded Data in Large-Scale Machine Learning. In: Tsudik, G., Conti, M., Liang, K., Smaragdakis, G. (eds) Computer Security – ESORICS 2023. ESORICS 2023. Lecture Notes in Computer Science, vol 14347. Springer, Cham. https://doi.org/10.1007/978-3-031-51482-1_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-51481-4
Online ISBN: 978-3-031-51482-1
eBook Packages: Computer Science (R0)