Abstract
Recent works have developed an emerging attack, called Model Stealing (MS), that steals the functionality of remote models and puts the privacy of cloud-based machine learning services under threat. In this paper, we propose a new defense against MS attacks based on Semantic Inspection (called SeInspect). SeInspect achieves two main breakthroughs in this line of work. First, state-of-the-art MS attacks tend to craft malicious queries within a distribution close to benign ones. This characteristic increases the stealthiness of these attacks and allows them to circumvent most existing MS defenses. In SeInspect, we introduce a semantic-feature-based detection method that amplifies the query distribution discrepancy between malicious and benign users. Thus, SeInspect can detect stealthy MS attacks with a higher detection rate than existing defenses. Second, in our evaluation, we notice that existing defenses significantly increase the response latency of the model service due to repetitive user-by-user inspection (e.g., by 7.01 times for PRADA, EuroS&P 2019). To mitigate this problem, we propose to analyze semantic features with a two-layer defense mechanism. The first layer performs a “quickshot” over users in batches and picks out all potentially malicious users. The second layer then identifies attackers in a user-by-user manner. In our evaluation, we test SeInspect against eight typical MS attacks. The results show that SeInspect detects two more attacks than prior works while reducing latency by at least \(54.00\%\).
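As a rough illustration of the two-layer mechanism summarized above, the following sketch shows how a batched first-layer screen can precede per-user inspection; the monitor objects and method names are hypothetical placeholders, not the paper's actual API.

```python
def inspect_users(queries_by_user, global_monitor, local_monitor):
    """Two-layer inspection sketch: a batched "quickshot" first, then a
    per-user check only for users the first layer finds suspicious."""
    # Layer 1: screen all users in one batch using their semantic features.
    suspicious = [user for user, queries in queries_by_user.items()
                  if global_monitor.is_suspicious(queries)]
    # Layer 2: inspect only the flagged users one by one.
    return [user for user in suspicious
            if local_monitor.is_attacker(queries_by_user[user])]
```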
References
Tramèr, F., Zhang, F., Juels, A., Reiter, M.K., Ristenpart, T.: Stealing machine learning models via prediction APIs. In: 25th USENIX Security Symposium (USENIX Security 16), pp. 601–618 (2016)
Juuti, M., Szyller, S., Marchal, S., Asokan, N.: PRADA: protecting against DNN model stealing attacks. In: 2019 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 512–527. IEEE (2019)
Zhang, Z., Chen, Y., Wagner, D.: SEAT: similarity encoder by adversarial training for detecting model extraction attack queries. In: Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security, AISec 2021, pp. 37–48. Association for Computing Machinery, New York (2021). https://doi.org/10.1145/3474369.3486863
Pal, S., Gupta, Y., Kanade, A., Shevade, S.: Stateful detection of model extraction attacks. arXiv preprint arXiv:2107.05166 (2021)
Kesarwani, M., Mukhoty, B., Arya, V., Mehta, S.: Model extraction warning in MLaaS paradigm. In: Proceedings of the 34th Annual Computer Security Applications Conference, pp. 371–380 (2018)
Sadeghzadeh, A.M., Dehghan, F., Sobhanian, A.M., Jalili, R.: Hardness of samples is all you need: protecting deep learning models using hardness of samples. arXiv preprint arXiv:2106.11424 (2021)
Pal, S., Gupta, Y., Shukla, A., Kanade, A., Shevade, S., Ganapathy, V.: ActiveThief: model extraction using active learning and unannotated public data. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 865–872 (2020)
Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z.B., Swami, A.: Practical black-box attacks against machine learning. In: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pp. 506–519 (2017)
Gao, R., et al.: Maximum mean discrepancy test is aware of adversarial attacks. In: International Conference on Machine Learning, pp. 3564–3575. PMLR (2021)
Orekondy, T., Schiele, B., Fritz, M.: Knockoff nets: stealing functionality of black-box models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4954–4963 (2019)
Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)
Angluin, D.: Queries and concept learning. Mach. Learn. 2(4), 319–342 (1988)
Kariyappa, S., Prakash, A., Qureshi, M.K.: MAZE: data-free model stealing attack using zeroth-order gradient estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13814–13823 (2021)
Yuan, X., Ding, L., Zhang, L., Li, X., Wu, D.O.: ES attack: model stealing against deep neural networks without data hurdles. IEEE Trans. Emerg. Top. Comput. Intell. (2022)
Batina, L., Bhasin, S., Jap, D., Picek, S.: CSI NN: reverse engineering of neural network architectures through electromagnetic side channel. In: 28th USENIX Security Symposium (USENIX Security 19), pp. 515–532 (2019)
Oh, S.J., Schiele, B., Fritz, M.: Towards reverse-engineering black-box neural networks. In: Samek, W., Montavon, G., Vedaldi, A., Hansen, L.K., Müller, K.-R. (eds.) Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. LNCS (LNAI), vol. 11700, pp. 121–144. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-28954-6_7
He, X., Jia, J., Backes, M., Gong, N.Z., Zhang, Y.: Stealing links from graph neural networks. In: 30th USENIX Security Symposium (USENIX Security 21), pp. 2669–2686 (2021)
Chen, K., Guo, S., Zhang, T., Xie, X., Liu, Y.: Stealing deep reinforcement learning models for fun and profit. In: Proceedings of the 2021 ACM Asia Conference on Computer and Communications Security, pp. 307–319 (2021)
Takemura, T., Yanai, N., Fujiwara, T.: Model extraction attacks on recurrent neural networks. J. Inf. Process. 28, 1010–1024 (2020)
Gong, X., Wang, Q., Chen, Y., Yang, W., Jiang, X.: Model extraction attacks and defenses on cloud-based machine learning models. IEEE Commun. Mag. 58(12), 83–89 (2020)
Gong, Z., Jiang, W., Zhan, J., Song, Z.: Model stealing defense with hybrid fuzzy models: work-in-progress. In: 2020 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), pp. 30–31. IEEE (2020)
Mori, Y., Nitanda, A., Takeda, A.: BODAME: bilevel optimization for defense against model extraction. arXiv preprint arXiv:2103.06797 (2021)
Kariyappa, S., Qureshi, M.K.: Defending against model stealing attacks with adaptive misinformation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2020)
Orekondy, T., Schiele, B., Fritz, M.: Prediction poisoning: towards defenses against DNN model stealing attacks. arXiv preprint arXiv:1906.10908 (2019)
Lee, T., Edwards, B., Molloy, I., Su, D.: Defending against neural network model stealing attacks using deceptive perturbations. In: 2019 IEEE Security and Privacy Workshops (SPW), pp. 43–49. IEEE (2019)
Zheng, H., Ye, Q., Hu, H., Fang, C., Shi, J.: BDPL: a boundary differentially private layer against machine learning model extraction attacks. In: Sako, K., Schneider, S., Ryan, P.Y.A. (eds.) ESORICS 2019. LNCS, vol. 11735, pp. 66–83. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-29959-0_4
Kesarwani, M., Mukhoty, B., Arya, V., Mehta, S.: Model extraction warning in MLaaS paradigm. In: Proceedings of the 34th Annual Computer Security Applications Conference, ACSAC 2018, pp. 371–380. Association for Computing Machinery, New York (2018). https://doi.org/10.1145/3274694.3274740
Jia, H., Choquette-Choo, C.A., Chandrasekaran, V., Papernot, N.: Entangled watermarks as a defense against model extraction. In: 30th USENIX Security Symposium (USENIX Security 21), pp. 1937–1954 (2021)
Zhu, L., Li, Y., Jia, X., Jiang, Y., Xia, S.-T., Cao, X.: Defending against model stealing via verifying embedded external features. In: ICML 2021 Workshop on Adversarial Machine Learning (2021)
Ahmed, M., Mahmood, A.N., Hu, J.: A survey of network anomaly detection techniques. J. Network Comput. Appl. 60, 19–31 (2016)
Quiring, E., Arp, D., Rieck, K.: Forgotten siblings: unifying attacks on machine learning and digital watermarking. In: 2018 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 488–502. IEEE (2018)
Correia-Silva, J.R., Berriel, R.F., Badue, C., de Souza, A.F., Oliveira-Santos, T.: Copycat CNN: stealing knowledge by persuading confession with random non-labeled data. In: 2018 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2018)
Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. J. Mach. Learn. Res. 13(25), 723–773 (2012). jmlr.org/papers/v13/gretton12a.html
Leucht, A., Neumann, M.H.: Dependent wild bootstrap for degenerate U- and V-statistics. J. Multivariate Anal. 117, 257–280 (2013)
Liu, F., Xu, W., Lu, J., Zhang, G., Gretton, A., Sutherland, D.J.: Learning deep kernels for non-parametric two-sample tests. In: ICML (2020)
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. U21A20464, 61872283), the Natural Science Basic Research Program of Shaanxi (No. 2021JC-22), the Key Research and Development Program of Shaanxi (No. 2022GY-029), CNKLSTISS, and the China 111 Project.
A Appendix
A.1 Maximum Mean Discrepancy
The maximum mean discrepancy (MMD) measures the closeness between two distributions \(\mathbb {P}\) and \(\mathbb {Q}\) [33], represented as:

$$\mathrm{MMD}(\mathcal {F}, \mathbb {P}, \mathbb {Q}) = \sup _{f \in \mathcal {F}} \big ( \mathbb {E}_{X \sim \mathbb {P}}[f(X)] - \mathbb {E}_{Y \sim \mathbb {Q}}[f(Y)] \big ),$$

where \(\mathcal {F}\) is a set containing all continuous functions, and \(X\) and \(Y\) are represented in practice by independent and identically distributed (iid) datasets drawn from \(\mathbb {P}\) and \(\mathbb {Q}\), respectively. The MMD depends on the choice of \(\mathcal {F}\). To ensure that the MMD test is consistent in power and admits an analytic solution, \(\mathcal {F}\) is restricted to the unit ball of a reproducing kernel Hilbert space (RKHS). The kernel-based MMD is thereby defined as:

$$\mathrm{MMD}^2(\mathbb {P}, \mathbb {Q}; \mathcal {H}_k) = \mathbb {E}\big [k(X, X')\big ] + \mathbb {E}\big [k(Y, Y')\big ] - 2\,\mathbb {E}\big [k(X, Y)\big ],$$

where \(X, X' \sim \mathbb {P}\), \(Y, Y' \sim \mathbb {Q}\), and \(k\) is the kernel associated with the RKHS \(\mathcal {H}_k\).
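For concreteness, the kernel MMD above can be estimated from finite samples as sketched below; this is a generic NumPy illustration with a Gaussian kernel and a fixed bandwidth, which are assumptions rather than the exact configuration used in the paper.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and b."""
    sq_dists = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
    return np.exp(-sq_dists / (2 * sigma**2))

def mmd2_biased(x, y, sigma=1.0):
    """Biased empirical estimate of MMD^2 between samples x ~ P and y ~ Q."""
    k_xx = gaussian_kernel(x, x, sigma)
    k_yy = gaussian_kernel(y, y, sigma)
    k_xy = gaussian_kernel(x, y, sigma)
    return k_xx.mean() + k_yy.mean() - 2 * k_xy.mean()

# Example: samples from shifted Gaussians give a larger MMD^2 than samples
# drawn from the same distribution.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 8))
y = rng.normal(0.5, 1.0, size=(200, 8))
print(mmd2_biased(x, y))                                      # distributions differ
print(mmd2_biased(x, rng.normal(0.0, 1.0, size=(200, 8))))    # same distribution, near 0
```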
B.2 Settings of the Global Monitor
In our experiments, three types of global monitors are evaluated: RF, XGBoost, and LightGBM. The final output of the global monitor follows a “one-vote veto” mechanism over four sub-models:

$$\mathcal {M}_{\mathcal {G}}(x) = \mathcal {M}^{1}_{\mathcal {G}}(x) \vee \mathcal {M}^{2}_{\mathcal {G}}(x) \vee \mathcal {M}^{3}_{\mathcal {G}}(x) \vee \mathcal {M}^{4}_{\mathcal {G}}(x),$$

where x denotes an input; that is, an input is flagged as malicious as soon as any sub-model flags it. The four sub-models focus on different proportions of attack samples, namely \([10\%, 25\%)\), \([25\%, 40\%)\), \([40\%, 70\%)\), and \([70\%, 100\%)\). To better detect attack samples at low proportions, we restrict the attack-sample proportions of \(\mathcal {M}^{1}_{\mathcal {G}}(x)\) and \(\mathcal {M}^{2}_{\mathcal {G}}(x)\) to smaller ranges. A detailed description of each sub-model is given in Table 7.
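A minimal sketch of such a one-vote veto ensemble is given below; the LightGBM sub-models, feature shapes, and training interface are illustrative assumptions, not the exact implementation described in the paper.

```python
import numpy as np
from lightgbm import LGBMClassifier  # RF or XGBoost sub-models work the same way

class GlobalMonitor:
    """One-vote veto ensemble: an input is flagged malicious if ANY sub-model flags it."""

    def __init__(self, n_submodels=4):
        self.submodels = [LGBMClassifier() for _ in range(n_submodels)]

    def fit(self, feature_sets, label_sets):
        # Each sub-model is trained on batches covering a different range of
        # attack-sample proportions, e.g. [10%, 25%), [25%, 40%), [40%, 70%), [70%, 100%).
        for model, features, labels in zip(self.submodels, feature_sets, label_sets):
            model.fit(features, labels)

    def predict(self, x):
        # x: semantic-feature matrix of shape (n_inputs, n_features)
        votes = np.stack([model.predict(x) for model in self.submodels])
        return np.any(votes, axis=0).astype(int)  # 1 = potentially malicious
```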
C.3 Accuracy Reduction on FashionMNIST
We present the accuracy reduction of the three defenses on FashionMNIST in Table 6. Compared with PRADA, SeInspect reduces accuracy by \(12.67\%\). It is notable that SeInspect causes a more significant accuracy reduction on SVHN than on FashionMNIST (especially on the type-2 attacks T-RND-FGSM and COLOR), because the surrogate model on FashionMNIST can be trained with far fewer queries than on SVHN.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Liu, X., Ma, Z., Liu, Y., Qin, Z., Zhang, J., Wang, Z. (2022). SeInspect: Defending Model Stealing via Heterogeneous Semantic Inspection. In: Atluri, V., Di Pietro, R., Jensen, C.D., Meng, W. (eds) Computer Security – ESORICS 2022. ESORICS 2022. Lecture Notes in Computer Science, vol 13554. Springer, Cham. https://doi.org/10.1007/978-3-031-17140-6_30
DOI: https://doi.org/10.1007/978-3-031-17140-6_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-17139-0
Online ISBN: 978-3-031-17140-6
eBook Packages: Computer Science (R0)