
SeInspect: Defending Model Stealing via Heterogeneous Semantic Inspection

Conference paper
Computer Security – ESORICS 2022 (ESORICS 2022)
Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13554)

Abstract

Recent works have developed an emerging attack, called Model Stealing (MS), which steals the functionality of remote models and thereby puts the privacy of cloud-based machine learning services under threat. In this paper, we propose a new defense against MS attacks based on Semantic Inspection (called SeInspect). SeInspect achieves two main breakthroughs in this line of work. First, state-of-the-art MS attacks tend to craft malicious queries within a distribution close to benign ones. This characteristic increases the stealthiness of these attacks and allows them to circumvent most existing MS defenses. In SeInspect, we introduce a semantic-feature-based detection method that amplifies the query distribution discrepancy between malicious and benign users. Thus, SeInspect can detect stealthy MS attacks with a higher detection rate than existing defenses. Second, in our evaluation, we observe that existing defenses significantly increase the response latency of the model service due to repetitive user-by-user inspection (e.g., by 7.01 times for PRADA, EuroS&P 2019). To mitigate this problem, we propose to analyze semantic features with a two-layer defense mechanism. The first layer performs a “quickshot” over users in batches and picks out all potentially malicious users. The second layer then identifies the attacker in a user-by-user manner. In our evaluation, we test SeInspect against eight typical MS attacks. The results show that SeInspect can detect two more attacks than prior works while reducing latency by at least \(54.00\%\).
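To make the two-layer workflow above concrete, the following minimal sketch (our own illustration under stated assumptions, not the authors' implementation) shows the control flow of a batch-level “quickshot” followed by per-user inspection. The functions batch_monitor and user_monitor, and the user attributes semantic_features and query_history, are hypothetical placeholders standing in for SeInspect's two detection layers.

# A conceptual sketch of a two-layer inspection flow (illustrative only).
def two_layer_inspect(users, batch_monitor, user_monitor):
    # Layer 1: a batch-level "quickshot" over semantic features picks out
    # potentially malicious users.
    suspicious = [u for u in users if batch_monitor(u.semantic_features)]
    # Layer 2: only the flagged users are inspected one by one.
    return [u for u in suspicious if user_monitor(u.query_history)]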


References

  1. Tramèr, F., Zhang, F., Juels, A., Reiter, M.K., Ristenpart, T.: Stealing machine learning models via prediction APIs. In: 25th USENIX Security Symposium (USENIX Security 16), pp. 601–618 (2016)

  2. Juuti, M., Szyller, S., Marchal, S., Asokan, N.: PRADA: protecting against DNN model stealing attacks. In: 2019 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 512–527. IEEE (2019)

  3. Zhang, Z., Chen, Y., Wagner, D.: SEAT: similarity encoder by adversarial training for detecting model extraction attack queries. In: Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security (AISec 2021), pp. 37–48. Association for Computing Machinery, New York (2021). https://doi.org/10.1145/3474369.3486863

  4. Pal, S., Gupta, Y., Kanade, A., Shevade, S.: Stateful detection of model extraction attacks. arXiv preprint arXiv:2107.05166 (2021)

  5. Kesarwani, M., Mukhoty, B., Arya, V., Mehta, S.: Model extraction warning in MLaaS paradigm. In: Proceedings of the 34th Annual Computer Security Applications Conference, pp. 371–380 (2018)

  6. Sadeghzadeh, A.M., Dehghan, F., Sobhanian, A.M., Jalili, R.: Hardness of samples is all you need: protecting deep learning models using hardness of samples. arXiv preprint arXiv:2106.11424 (2021)

  7. Pal, S., Gupta, Y., Shukla, A., Kanade, A., Shevade, S., Ganapathy, V.: ActiveThief: model extraction using active learning and unannotated public data. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 865–872 (2020)

  8. Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z.B., Swami, A.: Practical black-box attacks against machine learning. In: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pp. 506–519 (2017)

  9. Gao, R., et al.: Maximum mean discrepancy test is aware of adversarial attacks. In: International Conference on Machine Learning, pp. 3564–3575. PMLR (2021)

  10. Orekondy, T., Schiele, B., Fritz, M.: Knockoff nets: stealing functionality of black-box models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4954–4963 (2019)

  11. Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)

  12. Angluin, D.: Queries and concept learning. Mach. Learn. 2(4), 319–342 (1988)

  13. Kariyappa, S., Prakash, A., Qureshi, M.K.: MAZE: data-free model stealing attack using zeroth-order gradient estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13814–13823 (2021)

  14. Yuan, X., Ding, L., Zhang, L., Li, X., Wu, D.O.: ES attack: model stealing against deep neural networks without data hurdles. IEEE Trans. Emerg. Top. Comput. Intell. (2022)

  15. Batina, L., Bhasin, S., Jap, D., Picek, S.: CSI NN: reverse engineering of neural network architectures through electromagnetic side channel. In: 28th USENIX Security Symposium (USENIX Security 19), pp. 515–532 (2019)

  16. Oh, S.J., Schiele, B., Fritz, M.: Towards reverse-engineering black-box neural networks. In: Samek, W., Montavon, G., Vedaldi, A., Hansen, L.K., Müller, K.-R. (eds.) Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. LNCS (LNAI), vol. 11700, pp. 121–144. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-28954-6_7

  17. He, X., Jia, J., Backes, M., Gong, N.Z., Zhang, Y.: Stealing links from graph neural networks. In: 30th USENIX Security Symposium (USENIX Security 21), pp. 2669–2686 (2021)

  18. Chen, K., Guo, S., Zhang, T., Xie, X., Liu, Y.: Stealing deep reinforcement learning models for fun and profit. In: Proceedings of the 2021 ACM Asia Conference on Computer and Communications Security, pp. 307–319 (2021)

  19. Takemura, T., Yanai, N., Fujiwara, T.: Model extraction attacks on recurrent neural networks. J. Inf. Process. 28, 1010–1024 (2020)

  20. Gong, X., Wang, Q., Chen, Y., Yang, W., Jiang, X.: Model extraction attacks and defenses on cloud-based machine learning models. IEEE Commun. Mag. 58(12), 83–89 (2020)

  21. Gong, Z., Jiang, W., Zhan, J., Song, Z.: Model stealing defense with hybrid fuzzy models: work-in-progress. In: 2020 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), pp. 30–31. IEEE (2020)

  22. Mori, Y., Nitanda, A., Takeda, A.: BODAME: bilevel optimization for defense against model extraction. arXiv preprint arXiv:2103.06797 (2021)

  23. Kariyappa, S., Qureshi, M.K.: Defending against model stealing attacks with adaptive misinformation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2020)

  24. Orekondy, T., Schiele, B., Fritz, M.: Prediction poisoning: towards defenses against DNN model stealing attacks. arXiv preprint arXiv:1906.10908 (2019)

  25. Lee, T., Edwards, B., Molloy, I., Su, D.: Defending against neural network model stealing attacks using deceptive perturbations. In: 2019 IEEE Security and Privacy Workshops (SPW), pp. 43–49. IEEE (2019)

  26. Zheng, H., Ye, Q., Hu, H., Fang, C., Shi, J.: BDPL: a boundary differentially private layer against machine learning model extraction attacks. In: Sako, K., Schneider, S., Ryan, P.Y.A. (eds.) ESORICS 2019. LNCS, vol. 11735, pp. 66–83. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-29959-0_4

  27. Kesarwani, M., Mukhoty, B., Arya, V., Mehta, S.: Model extraction warning in MLaaS paradigm. In: Proceedings of the 34th Annual Computer Security Applications Conference (ACSAC 2018), pp. 371–380. Association for Computing Machinery, New York (2018). https://doi.org/10.1145/3274694.3274740

  28. Jia, H., Choquette-Choo, C.A., Chandrasekaran, V., Papernot, N.: Entangled watermarks as a defense against model extraction. In: 30th USENIX Security Symposium (USENIX Security 21), pp. 1937–1954 (2021)

  29. Zhu, L., Li, Y., Jia, X., Jiang, Y., Xia, S.-T., Cao, X.: Defending against model stealing via verifying embedded external features. In: ICML 2021 Workshop on Adversarial Machine Learning (2021)

  30. Ahmed, M., Mahmood, A.N., Hu, J.: A survey of network anomaly detection techniques. J. Netw. Comput. Appl. 60, 19–31 (2016)

  31. Quiring, E., Arp, D., Rieck, K.: Forgotten siblings: unifying attacks on machine learning and digital watermarking. In: 2018 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 488–502. IEEE (2018)

  32. Correia-Silva, J.R., Berriel, R.F., Badue, C., de Souza, A.F., Oliveira-Santos, T.: Copycat CNN: stealing knowledge by persuading confession with random non-labeled data. In: 2018 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2018)

  33. Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. J. Mach. Learn. Res. 13(25), 723–773 (2012). jmlr.org/papers/v13/gretton12a.html

  34. Leucht, A., Neumann, M.H.: Dependent wild bootstrap for degenerate U- and V-statistics. J. Multivariate Anal. 117, 257–280 (2013)

  35. Liu, F., Xu, W., Lu, J., Zhang, G., Gretton, A., Sutherland, D.J.: Learning deep kernels for non-parametric two-sample tests. In: International Conference on Machine Learning (2020)


Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. U21A20464, 61872283), the Natural Science Basic Research Program of Shaanxi (No. 2021JC-22), Key Research and Development Program of Shaanxi (No. 2022GY-029), CNKLSTISS, the China 111 Project.

Author information

Correspondence to Zhuo Ma or Yang Liu.

A Appendix

A.1 Maximum Mean Discrepancy

The maximum mean discrepancy (MMD) measures the closeness between two distributions \(\mathbb {P}\) and \(\mathbb {Q}\) [33], represented as:

$$\begin{aligned} \text {MMD}(\mathcal {F},\mathbb {P},\mathbb {Q}) = \sup _{f\in \mathcal {F}} \left| \mathbb {E}_{X\sim \mathbb {P}}[f(X)] - \mathbb {E}_{Y\sim \mathbb {Q}}[f(Y)] \right| , \end{aligned}$$
(12)

where \(\mathcal {F}\) is a class of continuous functions, and X and Y are samples drawn i.i.d. from \(\mathbb {P}\) and \(\mathbb {Q}\), respectively. The value of the MMD depends on the choice of \(\mathcal {F}\). To ensure that the MMD test is consistent in power, and thus admits an analytic solution, \(\mathcal {F}\) is restricted to the unit ball of a reproducing kernel Hilbert space (RKHS). The kernel-based MMD is then defined as:

$$\begin{aligned} \text {MMD}(\mathcal {F},\mathbb {P},\mathbb {Q}) = \sup _{f\in \mathcal {H}_k,\, \left\| f \right\| _{\mathcal {H}_k} \le 1 } \left| \mathbb {E}_{X\sim \mathbb {P}}[f(X)] - \mathbb {E}_{Y\sim \mathbb {Q}}[f(Y)] \right| , \end{aligned}$$
(13)

where k is the kernel associated with the RKHS \(\mathcal {H}_k\).
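For concreteness, the following is a minimal sketch (our own illustration, not SeInspect's implementation) of the standard biased estimator of the squared kernel MMD with a Gaussian (RBF) kernel; the bandwidth value, sample sizes, and toy data are assumptions for demonstration only.

# Biased estimate of the squared kernel MMD with an RBF kernel (illustrative).
import numpy as np

def rbf_kernel(a, b, bandwidth):
    # k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)) for all pairs of rows.
    sq_dists = np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd2(x, y, bandwidth=1.0):
    # Biased (V-statistic) estimate of MMD^2 between samples x ~ P and y ~ Q.
    k_xx = rbf_kernel(x, x, bandwidth)
    k_yy = rbf_kernel(y, y, bandwidth)
    k_xy = rbf_kernel(x, y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()

# Toy usage: two slightly shifted Gaussians give a noticeably larger MMD
# estimate than two samples drawn from the same distribution.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 2))
y = rng.normal(0.5, 1.0, size=(200, 2))
print(mmd2(x, y))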

A.2 Settings of the Global Monitor

In our experiments, three types of global monitors are evaluated: RF, XGBoost, and LightGBM. The final output of the global monitor is based on a “one-vote veto” mechanism over four sub-models:

$$\begin{aligned} \mathcal {M_G}&=\{ \mathcal {M}^1_{\mathcal {G}}, \mathcal {M}^2_{\mathcal {G}}, \mathcal {M}^3_{\mathcal {G}}, \mathcal {M}^4_{\mathcal {G}} \} , \\ \text {s.t.}\quad \mathcal {M_G}(x)&= \mathcal {M}^{1}_{\mathcal {G}}(x) \vee \mathcal {M}^{2}_{\mathcal {G}}(x) \vee \mathcal {M}^{3}_{\mathcal {G}}(x) \vee \mathcal {M}^{4}_{\mathcal {G}}(x), \end{aligned}$$
(14)

where x denotes an input. The four sub-models focus on different proportions of attack samples: \([10\%, 25\%)\), \([25\%, 40\%)\), \([40\%, 70\%)\) and \([70\%, 100\%)\). To better detect attack samples at low proportions, we restrict \(\mathcal {M}^{1}_{\mathcal {G}}\) and \(\mathcal {M}^{2}_{\mathcal {G}}\) to the two narrower ranges. A detailed description of each sub-model is given in Table 7.
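As an illustration of the veto rule in Eq. (14), the sketch below (our own assumption, not the authors' code) treats the four sub-models as binary classifiers exposing a scikit-learn-style predict() that returns 1 for “malicious”; the function name and argument shapes are hypothetical.

# "One-vote veto" over four sub-models (illustrative sketch).
from typing import Sequence

def global_monitor(sub_models: Sequence, features) -> bool:
    # Flag the input as malicious if any sub-model (each trained on a
    # different attack-sample proportion range) votes "malicious",
    # i.e. a logical OR over the sub-model outputs.
    return any(int(m.predict([features])[0]) == 1 for m in sub_models)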

Table 6. Test set accuracy (%) of the surrogate model obtained by MS attackers when the target model is deployed with different defenses on FashionMNIST. ‘Accuracy’ (%) denotes the accuracy of the surrogate model at the time the attack is detected. ‘Reduction’ (%) indicates the reduction in test set accuracy caused by the defense. ‘Top’ (%) represents the highest test set accuracy of the surrogate model without defense or budget constraints. ‘-’ denotes that the defender fails to detect the attack within the query budget.
Table 7. Settings of the global monitors (RF, XGBoost and LightGBM) on FashionMNIST and SVHN.

A.3 Accuracy Reduction on FashionMNIST

We present the accuracy reduction of the three defenses on FashionMNIST in Table 6. Compared with PRADA, SeInspect reduces accuracy by \(12.67\%\). Notably, SeInspect causes a more significant accuracy reduction on SVHN than on FashionMNIST (especially for the type-2 attacks T-RND-FGSM and COLOR), because the surrogate model can be trained with far fewer queries on FashionMNIST than on SVHN.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Liu, X., Ma, Z., Liu, Y., Qin, Z., Zhang, J., Wang, Z. (2022). SeInspect: Defending Model Stealing via Heterogeneous Semantic Inspection. In: Atluri, V., Di Pietro, R., Jensen, C.D., Meng, W. (eds) Computer Security – ESORICS 2022. ESORICS 2022. Lecture Notes in Computer Science, vol 13554. Springer, Cham. https://doi.org/10.1007/978-3-031-17140-6_30


  • DOI: https://doi.org/10.1007/978-3-031-17140-6_30


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-17139-0

  • Online ISBN: 978-3-031-17140-6

