Abstract
Recent works have developed an emerging attack, called Model Stealing (MS), that steals the functionality of remote models and puts the privacy of cloud-based machine learning services under threat. In this paper, we propose a new defense against MS attacks based on Semantic Inspection (called SeInspect). SeInspect achieves two main breakthroughs in this line of work. First, state-of-the-art MS attacks tend to craft malicious queries within a distribution close to benign ones. This characteristic increases the stealthiness of these attacks and allows them to circumvent most existing MS defenses. In SeInspect, we introduce a semantic-feature-based detection method that amplifies the query distribution discrepancy between malicious and benign users. Thus, SeInspect can detect stealthy MS attacks with a higher detection rate than existing defenses. Second, in our evaluation, we notice that existing defenses significantly increase the response latency of the model service due to repetitive user-by-user inspection (e.g., by 7.01 times for PRADA, EuroS&P 2019). To mitigate this problem, we propose to analyze semantic features with a two-layer defense mechanism. The first layer performs a “quickshot” over users in batches and picks out all potentially malicious users. The second layer then identifies attackers in a user-by-user manner. In our evaluation, we test SeInspect against eight typical MS attacks. The results show that SeInspect detects two more attacks than prior works while reducing latency by at least \(54.00\%\).
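As a rough illustration of the two-layer mechanism summarized above, the following sketch shows how a batched first-layer screen can precede per-user inspection; the monitor objects and method names are hypothetical placeholders, not the paper's actual API.

```python
def inspect_users(queries_by_user, global_monitor, local_monitor):
    """Two-layer inspection sketch: a batched "quickshot" first, then a
    per-user check only for users the first layer finds suspicious."""
    # Layer 1: screen all users in one batch using their semantic features.
    suspicious = [user for user, queries in queries_by_user.items()
                  if global_monitor.is_suspicious(queries)]
    # Layer 2: inspect only the flagged users one by one.
    return [user for user in suspicious
            if local_monitor.is_attacker(queries_by_user[user])]
```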
References
Tramèr, F., Zhang, F., Juels, A., Reiter, M.K., Ristenpart, T.: Stealing machine learning models via prediction APIs. In: 25th USENIX Security Symposium (USENIX Security 16), pp. 601–618 (2016)
Juuti, M., Szyller, S., Marchal, S., Asokan, N.: PRADA: protecting against DNN model stealing attacks. In: 2019 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 512–527. IEEE (2019)
Zhang, Z., Chen, Y., Wagner, D.: SEAT: similarity encoder by adversarial training for detecting model extraction attack queries. In: Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security, AISec 2021, pp. 37–48. Association for Computing Machinery, New York (2021). https://doi.org/10.1145/3474369.3486863
Pal, S., Gupta, Y., Kanade, A., Shevade, S.: Stateful detection of model extraction attacks. arXiv preprint arXiv:2107.05166 (2021)
Kesarwani, M., Mukhoty, B., Arya, V., Mehta, S.: Model extraction warning in MLaaS paradigm. In: Proceedings of the 34th Annual Computer Security Applications Conference, pp. 371–380 (2018)
Sadeghzadeh, A.M., Dehghan, F., Sobhanian, A.M., Jalili, R.: Hardness of samples is all you need: protecting deep learning models using hardness of samples. arXiv preprint arXiv:2106.11424 (2021)
Pal, S., Gupta, Y., Shukla, A., Kanade, A., Shevade, S., Ganapathy, V.: ActiveThief: model extraction using active learning and unannotated public data. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 865–872 (2020)
Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z.B., Swami, A.: Practical black-box attacks against machine learning. In: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pp. 506–519 (2017)
Gao, R., et al.: Maximum mean discrepancy test is aware of adversarial attacks. In: International Conference on Machine Learning, pp. 3564–3575. PMLR (2021)
Orekondy, T., Schiele, B., Fritz, M.: Knockoff nets: stealing functionality of black-box models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4954–4963 (2019)
Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)
Angluin, D.: Queries and concept learning. Mach. Learn. 2(4), 319–342 (1988)
Kariyappa, S., Prakash, A., Qureshi, M.K.: MAZE: data-free model stealing attack using zeroth-order gradient estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13814–13823 (2021)
Yuan, X., Ding, L., Zhang, L., Li, X., Wu, D.O.: ES attack: model stealing against deep neural networks without data hurdles. IEEE Trans. Emerg. Top. Comput. Intell. (2022)
Batina, L., Bhasin, S., Jap, D., Picek, S.: CSI NN: reverse engineering of neural network architectures through electromagnetic side channel. In: 28th USENIX Security Symposium (USENIX Security 19), pp. 515–532 (2019)
Oh, S.J., Schiele, B., Fritz, M.: Towards reverse-engineering black-box neural networks. In: Samek, W., Montavon, G., Vedaldi, A., Hansen, L.K., Müller, K.-R. (eds.) Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. LNCS (LNAI), vol. 11700, pp. 121–144. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-28954-6_7
He, X., Jia, J., Backes, M., Gong, N.Z., Zhang, Y.: Stealing links from graph neural networks. In: 30th USENIX Security Symposium (USENIX Security 21), pp. 2669–2686 (2021)
Chen, K., Guo, S., Zhang, T., Xie, X., Liu, Y.: Stealing deep reinforcement learning models for fun and profit. In: Proceedings of the 2021 ACM Asia Conference on Computer and Communications Security, pp. 307–319 (2021)
Takemura, T., Yanai, N., Fujiwara, T.: Model extraction attacks on recurrent neural networks. J. Inf. Process. 28, 1010–1024 (2020)
Gong, X., Wang, Q., Chen, Y., Yang, W., Jiang, X.: Model extraction attacks and defenses on cloud-based machine learning models. IEEE Commun. Mag. 58(12), 83–89 (2020)
Gong, Z., Jiang, W., Zhan, J., Song, Z.: Model stealing defense with hybrid fuzzy models: work-in-progress. In: 2020 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), pp. 30–31. IEEE (2020)
Mori, Y., Nitanda, A., Takeda, A.: BODAME: bilevel optimization for defense against model extraction. arXiv preprint arXiv:2103.06797 (2021)
Kariyappa, S., Qureshi, M.K.: Defending against model stealing attacks with adaptive misinformation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2020)
Orekondy, T., Schiele, B., Fritz, M.: Prediction poisoning: towards defenses against DNN model stealing attacks. arXiv preprint arXiv:1906.10908 (2019)
Lee, T., Edwards, B., Molloy, I., Su, D.: Defending against neural network model stealing attacks using deceptive perturbations. In: 2019 IEEE Security and Privacy Workshops (SPW), pp. 43–49. IEEE (2019)
Zheng, H., Ye, Q., Hu, H., Fang, C., Shi, J.: BDPL: a boundary differentially private layer against machine learning model extraction attacks. In: Sako, K., Schneider, S., Ryan, P.Y.A. (eds.) ESORICS 2019. LNCS, vol. 11735, pp. 66–83. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-29959-0_4
Kesarwani, M., Mukhoty, B., Arya, V., Mehta, S.: Model extraction warning in MLaaS paradigm. In: Proceedings of the 34th Annual Computer Security Applications Conference, ACSAC 2018, pp. 371–380. Association for Computing Machinery, New York (2018). https://doi.org/10.1145/3274694.3274740
Jia, H., Choquette-Choo, C.A., Chandrasekaran, V., Papernot, N.: Entangled watermarks as a defense against model extraction. In: 30th USENIX Security Symposium (USENIX Security 21), pp. 1937–1954 (2021)
Zhu, L., Li, Y., Jia, X., Jiang, Y., Xia, S.-T., Cao, X.: Defending against model stealing via verifying embedded external features. In: ICML 2021 Workshop on Adversarial Machine Learning (2021)
Ahmed, M., Mahmood, A.N., Hu, J.: A survey of network anomaly detection techniques. J. Network Comput. Appl. 60, 19–31 (2016)
Quiring, E., Arp, D., Rieck, K.: Forgotten siblings: unifying attacks on machine learning and digital watermarking. In: 2018 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 488–502. IEEE (2018)
Correia-Silva, J.R., Berriel, R.F., Badue, C., de Souza, A.F., Oliveira-Santos, T.: Copycat CNN: stealing knowledge by persuading confession with random non-labeled data. In: 2018 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2018)
Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. J. Mach. Learn. Res. 13(25), 723–773 (2012). jmlr.org/papers/v13/gretton12a.html
Leucht, A., Neumann, M.H.: Dependent wild bootstrap for degenerate U- and V-statistics. J. Multivariate Anal. 117, 257–280 (2013)
Liu, F., Xu, W., Lu, J., Zhang, G., Gretton, A., Sutherland, D.J.: Learning deep kernels for non-parametric two-sample tests. In: ICML (2020)
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. U21A20464, 61872283), the Natural Science Basic Research Program of Shaanxi (No. 2021JC-22), the Key Research and Development Program of Shaanxi (No. 2022GY-029), CNKLSTISS, and the China 111 Project.
A Appendix
A.1 Maximum Mean Discrepancy
The maximum mean discrepancy (MMD) measures the closeness between two distributions \(\mathbb {P}\) and \(\mathbb {Q}\) [33], represented as:

$$\mathrm{MMD}(\mathcal {F}, \mathbb {P}, \mathbb {Q}) = \sup _{f \in \mathcal {F}} \big ( \mathbb {E}_{X \sim \mathbb {P}}[f(X)] - \mathbb {E}_{Y \sim \mathbb {Q}}[f(Y)] \big ),$$

where \(\mathcal {F}\) is a set containing all continuous functions, and \(X\) and \(Y\) are represented in practice by independent and identically distributed (iid) datasets drawn from \(\mathbb {P}\) and \(\mathbb {Q}\), respectively. The MMD depends on the choice of \(\mathcal {F}\). To ensure that the MMD test is consistent in power and admits an analytic solution, \(\mathcal {F}\) is restricted to the unit ball of a reproducing kernel Hilbert space (RKHS). The kernel-based MMD is thereby defined as:

$$\mathrm{MMD}^2(\mathbb {P}, \mathbb {Q}; \mathcal {H}_k) = \mathbb {E}\big [k(X, X')\big ] + \mathbb {E}\big [k(Y, Y')\big ] - 2\,\mathbb {E}\big [k(X, Y)\big ],$$

where \(X, X' \sim \mathbb {P}\), \(Y, Y' \sim \mathbb {Q}\), and \(k\) is the kernel associated with the RKHS \(\mathcal {H}_k\).
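For concreteness, the kernel MMD above can be estimated from finite samples as sketched below; this is a generic NumPy illustration with a Gaussian kernel and a fixed bandwidth, which are assumptions rather than the exact configuration used in the paper.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and b."""
    sq_dists = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
    return np.exp(-sq_dists / (2 * sigma**2))

def mmd2_biased(x, y, sigma=1.0):
    """Biased empirical estimate of MMD^2 between samples x ~ P and y ~ Q."""
    k_xx = gaussian_kernel(x, x, sigma)
    k_yy = gaussian_kernel(y, y, sigma)
    k_xy = gaussian_kernel(x, y, sigma)
    return k_xx.mean() + k_yy.mean() - 2 * k_xy.mean()

# Example: samples from shifted Gaussians give a larger MMD^2 than samples
# drawn from the same distribution.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 8))
y = rng.normal(0.5, 1.0, size=(200, 8))
print(mmd2_biased(x, y))                                      # distributions differ
print(mmd2_biased(x, rng.normal(0.0, 1.0, size=(200, 8))))    # same distribution, near 0
```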
B.2 Settings of the Global Monitor
In our experiments, three types of global monitors are evaluated: RF, XGBoost, and LightGBM. The final output of the global monitor follows a “one-vote veto” mechanism over four sub-models:

$$\mathcal {M}_{\mathcal {G}}(x) = \mathcal {M}^{1}_{\mathcal {G}}(x) \vee \mathcal {M}^{2}_{\mathcal {G}}(x) \vee \mathcal {M}^{3}_{\mathcal {G}}(x) \vee \mathcal {M}^{4}_{\mathcal {G}}(x),$$

where x denotes an input; that is, an input is flagged as malicious as soon as any sub-model flags it. The four sub-models focus on different proportions of attack samples, namely \([10\%, 25\%)\), \([25\%, 40\%)\), \([40\%, 70\%)\), and \([70\%, 100\%)\). To better detect attack samples at low proportions, we restrict the attack-sample proportions of \(\mathcal {M}^{1}_{\mathcal {G}}(x)\) and \(\mathcal {M}^{2}_{\mathcal {G}}(x)\) to smaller ranges. A detailed description of each sub-model is given in Table 7.
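A minimal sketch of such a one-vote veto ensemble is given below; the LightGBM sub-models, feature shapes, and training interface are illustrative assumptions, not the exact implementation described in the paper.

```python
import numpy as np
from lightgbm import LGBMClassifier  # RF or XGBoost sub-models work the same way

class GlobalMonitor:
    """One-vote veto ensemble: an input is flagged malicious if ANY sub-model flags it."""

    def __init__(self, n_submodels=4):
        self.submodels = [LGBMClassifier() for _ in range(n_submodels)]

    def fit(self, feature_sets, label_sets):
        # Each sub-model is trained on batches covering a different range of
        # attack-sample proportions, e.g. [10%, 25%), [25%, 40%), [40%, 70%), [70%, 100%).
        for model, features, labels in zip(self.submodels, feature_sets, label_sets):
            model.fit(features, labels)

    def predict(self, x):
        # x: semantic-feature matrix of shape (n_inputs, n_features)
        votes = np.stack([model.predict(x) for model in self.submodels])
        return np.any(votes, axis=0).astype(int)  # 1 = potentially malicious
```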
C.3 Accuracy Reduction on FashionMNIST
We present the accuracy reduction of the three defenses on FashionMNIST in Table 6. Compared with PRADA, SeInspect reduces accuracy by \(12.67\%\). It is notable that SeInspect causes a more significant accuracy reduction on SVHN than on FashionMNIST (especially on the type-2 attacks T-RND-FGSM and COLOR), because the surrogate model on FashionMNIST can be trained with far fewer queries than on SVHN.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Liu, X., Ma, Z., Liu, Y., Qin, Z., Zhang, J., Wang, Z. (2022). SeInspect: Defending Model Stealing via Heterogeneous Semantic Inspection. In: Atluri, V., Di Pietro, R., Jensen, C.D., Meng, W. (eds) Computer Security – ESORICS 2022. ESORICS 2022. Lecture Notes in Computer Science, vol 13554. Springer, Cham. https://doi.org/10.1007/978-3-031-17140-6_30
DOI: https://doi.org/10.1007/978-3-031-17140-6_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-17139-0
Online ISBN: 978-3-031-17140-6
eBook Packages: Computer Science (R0)