Abstract
We introduce a form of steganography in the domain of machine learning which we call training set camouflage. Imagine Alice has a training set on an illicit machine learning classification task. Alice wants Bob (a machine learning system) to learn the task. However, sending either the training set or the trained model to Bob can raise suspicion if the communication is monitored. Training set camouflage allows Alice to compute a second training set on a completely different – and seemingly benign – classification task. By construction, sending the second training set will not raise suspicion. When Bob applies his standard (public) learning algorithm to the second training set, he approximately recovers the classifier on the original task. We formulate training set camouflage as a combinatorial bilevel optimization problem and propose solvers based on nonlinear programming and local search. Experiments on real classification tasks demonstrate the feasibility of such camouflage.
References
Alfeld, S., Zhu, X., Barford, P.: Explicit defense actions against test-set attacks. In: AAAI, pp. 1274–1280 (2017)
Balbach, F.J., Zeugmann, T.: Teaching randomized learners. In: Lugosi, G., Simon, H.U. (eds.) COLT 2006. LNCS (LNAI), vol. 4005, pp. 229–243. Springer, Heidelberg (2006). https://doi.org/10.1007/11776420_19
Barreno, M., Nelson, B., Joseph, A.D., Tygar, J.: The security of machine learning. Mach. Learn. 81(2), 121–148 (2010)
Barreno, M., Nelson, B., Sears, R., Joseph, A.D., Tygar, J.D.: Can machine learning be secure? In: Proceedings of the 2006 ACM Symposium on Information, Computer and Communications Security (2006)
Biggio, B., Roli, F.: Wild patterns: ten years after the rise of adversarial machine learning. arXiv preprint arXiv:1712.03141 (2017)
Brakerski, Z.: Fully homomorphic encryption without modulus switching from classical GapSVP. In: Safavi-Naini, R., Canetti, R. (eds.) CRYPTO 2012. LNCS, vol. 7417, pp. 868–886. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32009-5_50
Brakerski, Z., Gentry, C., Vaikuntanathan, V.: (Leveled) fully homomorphic encryption without bootstrapping. ACM Trans. Comput. Theory (TOCT) 6(3), 13 (2014)
Brückner, M., Kanzow, C., Scheffer, T.: Static prediction games for adversarial learning problems. J. Mach. Learn. Res. 13, 2617–2654 (2012)
Brückner, M., Scheffer, T.: Nash equilibria of static prediction games. In: Advances in Neural Information Processing Systems (2009)
Brückner, M., Scheffer, T.: Stackelberg games for adversarial prediction problems. In: ACM SIGKDD (2011)
Bulò, S.R., Biggio, B., Pillai, I., Pelillo, M., Roli, F.: Randomized prediction games for adversarial machine learning. IEEE Trans. Neural Netw. Learn. Syst. 28, 2466–2478 (2016)
Bussieck, M.R., Pruessner, A.: Mixed-integer nonlinear programming. SIAG/OPT Newsl. Views News 14(1), 19–22 (2003)
Cachin, C.: An information-theoretic model for steganography. In: Aucsmith, D. (ed.) IH 1998. LNCS, vol. 1525, pp. 306–318. Springer, Heidelberg (1998). https://doi.org/10.1007/3-540-49380-8_21
Chandramouli, R.: A mathematical approach to steganalysis. In: Proceedings SPIE, vol. 4675, pp. 4–25 (2002)
Cox, I.J., Kalker, T., Pakura, G., Scheel, M.: Information transmission and steganography. In: Barni, M., Cox, I., Kalker, T., Kim, H.-J. (eds.) IWDW 2005. LNCS, vol. 3710, pp. 15–29. Springer, Heidelberg (2005). https://doi.org/10.1007/11551492_2
Dalvi, N., Domingos, P., Sanghai, S., Verma, D., et al.: Adversarial classification. In: ACM SIGKDD (2004)
Dziugaite, G.K., Roy, D.M., Ghahramani, Z.: Training generative neural networks via maximum mean discrepancy optimization. arXiv preprint arXiv:1505.03906 (2015)
Fridrich, J.: Feature-based steganalysis for JPEG images and its implications for future design of steganographic schemes. In: Fridrich, J. (ed.) IH 2004. LNCS, vol. 3200, pp. 67–81. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30114-1_6
Gentry, C., Sahai, A., Waters, B.: Homomorphic encryption from learning with errors: conceptually-simpler, asymptotically-faster, attribute-based. In: Canetti, R., Garay, J.A. (eds.) CRYPTO 2013. LNCS, vol. 8042, pp. 75–92. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40041-4_5
Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. J. Mach. Learn. Res. 13(Mar), 723–773 (2012)
Hardt, M., Megiddo, N., Papadimitriou, C., Wootters, M.: Strategic classification. In: ACM ITCS (2016)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE CVPR, pp. 770–778 (2016)
Hearst, M.A., Dumais, S.T., Osuna, E., Platt, J., Schölkopf, B.: Support vector machines. IEEE Intell. Syst. Appl. 13(4), 18–28 (1998)
Hoerl, A.E., Kennard, R.W.: Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12(1), 55–67 (1970)
Hopper, N.J., Langford, J., von Ahn, L.: Provably secure steganography. In: Yung, M. (ed.) CRYPTO 2002. LNCS, vol. 2442, pp. 77–92. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45708-9_6
Hosmer Jr., D.W., Lemeshow, S., Sturdivant, R.X.: Applied Logistic Regression, vol. 398. Wiley, Hoboken (2013)
Huang, L., Joseph, A.D., Nelson, B., Rubinstein, B.I., Tygar, J.: Adversarial machine learning. In: AISEC (2011)
Joachims, T.: A probabilistic analysis of the Rocchio algorithm with TFIDF for text categorization. Technical report, Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA (1996)
Johnson, N.F., Jajodia, S.: Exploring steganography: seeing the unseen. Computer 31(2), 26–34 (1998)
Juels, A., Ristenpart, T.: Honey encryption: security beyond the brute-force bound. In: Nguyen, P.Q., Oswald, E. (eds.) EUROCRYPT 2014. LNCS, vol. 8441, pp. 293–310. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-55220-5_17
Menezes, A.J., Van Oorschot, P.C., Vanstone, S.A.: Handbook of Applied Cryptography. CRC Press, Boca Raton (1996)
Ker, A.D.: Steganalysis of LSB matching in grayscale images. IEEE Signal Process. Lett. 12(6), 441–444 (2005)
Kerckhoffs, A.: La Cryptographie Militaire (Part I). Journal des Sciences Militaires 9, 5–38 (1883)
Kerckhoffs, A.: La Cryptographie Militaire (Part II). Journal des Sciences Militaires 9, 161–191 (1883)
Kloft, M., Laskov, P.: A poisoning attack against online anomaly detection. In: NIPS Workshop on Machine Learning in Adversarial Environments for Computer Security. Citeseer (2007)
Kloft, M., Laskov, P.: Online anomaly detection under adversarial impact. In: AISTATS, pp. 405–412 (2010)
Kloft, M., Laskov, P.: Online anomaly detection under adversarial impact (2011)
Kohavi, R.: A study of cross-validation and bootstrap for accuracy estimation and model selection. In: IJCAI, vol. 14(2), pp. 1137–1145, Montreal (1995)
Krasin, I., et al.: Openimages: a public dataset for large-scale multi-label and multi-class image classification. Dataset (2017). https://github.com/openimages
Krenn, R.: Steganography and steganalysis (2004)
Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images (2009)
Laskov, P., Kloft, M.: A framework for quantitative security analysis of machine learning. In: Proceedings of the 2nd ACM Workshop on Security and Artificial Intelligence (2009)
Letchford, J., Vorobeychik, Y.: Optimal interdiction of attack plans. In: AAMAS (2013)
Liu, J., Zhu, X.: The teaching dimension of linear learners. J. Mach. Learn. Res. 17(162), 1–25 (2016)
Liu, W., Chawla, S.: A game theoretical model for adversarial learning. In: IEEE International Conference on Data Mining Workshops (ICDMW 2009) (2009)
López-Alt, A., Tromer, E., Vaikuntanathan, V.: On-the-fly multiparty computation on the cloud via multikey fully homomorphic encryption. In: Proceedings of the Forty-Fourth Annual ACM Symposium on Theory of Computing, pp. 1219–1234. ACM (2012)
Lowd, D., Meek, C.: Adversarial learning. In: ACM SIGKDD, pp. 641–647. ACM (2005)
Maganbhai, P.A.K., Chouhan, K.: A study and literature review on image steganography. Int. J. Comput. Sci. Inf. Technol. 6, 685–688 (2015)
Mei, S., Zhu, X.: Using machine teaching to identify optimal training-set attacks on machine learners. In: Twenty-Ninth AAAI Conference on Artificial Intelligence (2015)
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
Queirolo, F.: Steganography in images. Final Communications Report 3 (2011)
Reyzin, L., Russell, S.: More efficient provably secure steganography. Department of Computer Science, Boston University (2003)
Rich, E., Knight, K.: Artificial Intelligence. McGraw-Hill, New York (1991)
Rivest, R.L., Adleman, L., Dertouzos, M.L.: On data banks and privacy homomorphisms. Found. Secur. Comput. 4(11), 169–180 (1978)
Simmons, G.J.: The prisoners’ problem and the subliminal channel. In: Chaum, D. (ed.) Advances in Cryptology, pp. 51–67. Springer, Heidelberg (1984). https://doi.org/10.1007/978-1-4684-4730-9_5
Singh, K.U.: A survey on image steganography techniques. Int. J. Comput. Appl. 97(18) (2014)
Smart, N.P., Vercauteren, F.: Fully homomorphic encryption with relatively small key and ciphertext sizes. In: Nguyen, P.Q., Pointcheval, D. (eds.) PKC 2010. LNCS, vol. 6056, pp. 420–443. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13013-7_25
Steinwart, I.: On the influence of the kernel on the consistency of support vector machines. J. Mach. Learn. Res. 2(Nov), 67–93 (2001)
Tan, K.M.C., Killourhy, K.S., Maxion, R.A.: Undermining an anomaly-based intrusion detection system using common exploits. In: Wespi, A., Vigna, G., Deri, L. (eds.) RAID 2002. LNCS, vol. 2516, pp. 54–73. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-36084-0_4
Thompson, A.: All the news (2017). https://www.kaggle.com/snapcrack/all-the-news
Van Tilborg, H.C., Jajodia, S.: Encyclopedia of Cryptography and Security. Springer, Heidelberg (2014)
Vorobeychik, Y., Li, B.: Optimal randomized classification in adversarial settings. In: AAMAS (2014)
Wu, H.C.: The Karush-Kuhn-Tucker optimality conditions in an optimization problem with interval-valued objective function. Eur. J. Oper. Res. 176(1), 46–59 (2007)
Zhang, L., Wu, J., Zhou, N.: Image encryption with discrete fractional cosine transform and chaos. In: Fifth International Conference on Information Assurance and Security (IAS 2009), vol. 2, pp. 61–64. IEEE (2009)
Zhang, X., Zhu, X., Wright, S.: Training set debugging using trusted items. In: AAAI (2018)
Acknowledgment
This work is supported in part by NSF 1545481, 1704117, 1623605, 1561512, and the MADLab AF Center of Excellence FA9550-18-1-0166.
Appendix A: MMD as Eve's Detection Function
One critical component of our camouflage framework is Eve's detection function \(\varPsi \): how she determines whether a training set is suspicious. Eve's detection function is a two-sample test, since its goal is to discern whether the two sets \({\mathcal {C}}, D\) are drawn from the same distribution. In what follows we discuss using Maximum Mean Discrepancy (MMD) [20] as Eve's detection function, as we do in our experiments. MMD is a widely used two-sample test [17], but of course other detection functions can be used in (1). We first review basic \(\mathbf{MMD}\), following [20]. Let \(p\) and \(p'\) be two Borel probability measures defined on a topological space \(\mathcal {Z}\). Given a class of functions \(\mathcal {F}\) with \(f:\mathcal {Z}\mapsto {\mathbb R}, f\in \mathcal {F}\), \(\mathbf{MMD}\) is defined as \( \mathbf{MMD}(p,p')=\sup _{f\in \mathcal {F}}(E_{{\mathbf z}}[f({{\mathbf z}})]-E_{{{\mathbf z}}'}[f({{\mathbf z}}')]) \). Any unit ball in a reproducing kernel Hilbert space (RKHS) can be used as the function class \(\mathcal {F}\) if the kernel is universal (e.g., the Gaussian and Laplace kernels [58]). With this function class, \(\mathbf{MMD}\) is a metric, so \(\mathbf{MMD}(p,p') = 0 \Leftrightarrow p = p'\). Computing \(\mathbf{MMD}\) requires the population expectations, which are generally unknown in practice. We therefore use an empirical estimate, replacing the population expectations with empirical means over i.i.d. samples \(Z=\{{{\mathbf z}}_1,\ldots ,{{\mathbf z}}_n\}\) and \(Z'=\{{{\mathbf z}}'_1,\ldots ,{{\mathbf z}}'_m\}\) drawn from \(p\) and \(p'\), respectively. We define
\( \mathbf{MMD}(Z,Z')=\left[\frac{1}{n^2}\sum _{i,j=1}^{n}k({{\mathbf z}}_i,{{\mathbf z}}_j)-\frac{2}{nm}\sum _{i=1}^{n}\sum _{j=1}^{m}k({{\mathbf z}}_i,{{\mathbf z}}'_j)+\frac{1}{m^2}\sum _{i,j=1}^{m}k({{\mathbf z}}'_i,{{\mathbf z}}'_j)\right]^{1/2}, \)
where k is the kernel of the RKHS. Let \(d=\vert \mathbf{MMD}(Z,Z')-\mathbf{MMD}(p,p')\vert \). Gretton et al. show that \( P\left( d > 2 \left( \sqrt{\frac{K}{n}} + \sqrt{\frac{K}{m}}\right) + \epsilon \right) \le 2 e^{-\frac{\epsilon ^2nm}{2K(n+m)}} \), where K is an upper bound on the kernel values. We convert this bound into a one-sided hypothesis test. Under the null hypothesis \(p=p'\) we have \(\mathbf{MMD}(p,p')=0\), and we consider positive deviations of \(\mathbf{MMD}(Z,Z')\) from \(\mathbf{MMD}(p,p')\). Equating the right-hand side with \(\alpha \) (the probability of incorrectly declaring \(p\ne p'\), i.e., the type I error) gives a level-\(\alpha \) hypothesis test; solving for \(\epsilon \) as a function of \(\alpha \) gives \( \alpha = e^{-\frac{\epsilon ^2nm}{2K(n+m)}} \Rightarrow \epsilon = \sqrt{\frac{2K(n+m)}{nm}\log \frac{1}{\alpha }} \). We retain the null hypothesis if \( \mathbf{MMD}(Z,Z') - T <0 \), where the threshold is \(T = 2 \left( \sqrt{\frac{K}{n}} + \sqrt{\frac{K}{m}}\right) + \sqrt{\frac{2K(n+m)}{nm}\log \frac{1}{\alpha }}.\) This defines Eve's detection function \(\varPsi ({\mathcal {C}},D)\) at level \(\alpha \): \( \varPsi ({\mathcal {C}},D)\equiv \mathbf{MMD}({\mathcal {C}},D) - T. \) If \(\varPsi ({\mathcal {C}},D) \ge 0\), Eve concludes that \(D\) is not drawn i.i.d. from \(\mathbb {Q}_{({{\mathbf x}}, y)}\) and flags it as suspicious.
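As a concrete illustration, the following minimal NumPy sketch computes the biased empirical \(\mathbf{MMD}\) statistic and the level-\(\alpha \) threshold test derived above. The function names, the default \(\alpha \), and the vectorized kernel computation are our own expository choices rather than the paper's implementation; inputs are assumed to be NumPy arrays with one instance per row, and for the RBF kernel used below the kernel bound is \(K=1\).

import numpy as np

def rbf_kernel(Z1, Z2, sigma):
    # Kernel matrix with entries k(z_i, z_j) = exp(-||z_i - z_j||^2 / (2 sigma^2)).
    sq = (np.sum(Z1 ** 2, axis=1)[:, None]
          + np.sum(Z2 ** 2, axis=1)[None, :]
          - 2.0 * Z1 @ Z2.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def mmd(Z, Zp, sigma):
    # Biased empirical MMD estimate between samples Z (n x d) and Zp (m x d).
    n, m = len(Z), len(Zp)
    stat = (rbf_kernel(Z, Z, sigma).sum() / n ** 2
            - 2.0 * rbf_kernel(Z, Zp, sigma).sum() / (n * m)
            + rbf_kernel(Zp, Zp, sigma).sum() / m ** 2)
    return np.sqrt(max(stat, 0.0))

def psi(C, D, sigma, alpha=0.05, K=1.0):
    # Eve's detection function Psi(C, D) = MMD(C, D) - T at level alpha,
    # where K upper-bounds the kernel values (K = 1 for the RBF kernel).
    n, m = len(C), len(D)
    T = (2.0 * (np.sqrt(K / n) + np.sqrt(K / m))
         + np.sqrt(2.0 * K * (n + m) / (n * m) * np.log(1.0 / alpha)))
    return mmd(C, D, sigma) - T  # >= 0 means D is flagged as suspicious

A nonnegative value of psi(C, D, sigma) corresponds to \(\varPsi ({\mathcal {C}},D) \ge 0\), i.e., Eve flags \(D\) as suspicious.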
For all our experiments Eve used the RBF kernel \(k({{\mathbf z}}_i, {{\mathbf z}}_j) = \exp \left( -\frac{\Vert {{\mathbf z}}_i - {{\mathbf z}}_j \Vert ^2}{2\sigma ^2}\right) \). Eve set \(\sigma \) to the median distance between points in the camouflage pool, as proposed in [20]. Eve also included the scaled class label as a feature dimension: \([{{\mathbf x}}_i, c \mathbbm {1}\{y_i=1\}]\), where \(c=\max _{k,l\,:\, y_k = y_l} \Vert {{\mathbf x}}_{k} - {{\mathbf x}}_{l}\Vert \) and \(\mathbbm {1}\{\cdot \}\) is the indicator function. This augmented feature enables Eve to monitor both features and labels. When using the NLP solver, Alice only has to consider instances from the camouflage pool. She calculated \(\mathbf{MMD}\) in the following manner:
\( \mathbf{MMD}({\mathcal {C}},D)=\left[\frac{1}{N^2}\sum _{i,j=1}^{N}k({{\mathbf z}}_i,{{\mathbf z}}_j)-\frac{2}{Nm}\sum _{i=1}^{N}\sum _{j:\,{{\mathbf z}}_j\in D}k({{\mathbf z}}_i,{{\mathbf z}}_j)+\frac{1}{m^2}\sum _{i,j:\,{{\mathbf z}}_i,{{\mathbf z}}_j\in D}k({{\mathbf z}}_i,{{\mathbf z}}_j)\right]^{1/2}, \) where \(N\) is the size of the camouflage pool and \(m=\vert D\vert \). Since \(D\) is selected from the pool, every term is a kernel value between two pool instances, so the entire kernel matrix can be precomputed once.
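The bandwidth choice and label augmentation described above can be sketched in the same style. This is again an illustrative sketch: median_heuristic_sigma and augment_labels are hypothetical names, and labels are assumed to be encoded so that the positive class is 1.

def median_heuristic_sigma(Z):
    # Median pairwise distance between points, the bandwidth choice of [20].
    sq = (np.sum(Z ** 2, axis=1)[:, None]
          + np.sum(Z ** 2, axis=1)[None, :]
          - 2.0 * Z @ Z.T)
    dists = np.sqrt(np.maximum(sq, 0.0))
    return np.median(dists[np.triu_indices(len(Z), k=1)])

def augment_labels(X, y):
    # Append the scaled label c * 1{y_i = 1} as an extra feature dimension,
    # where c is the largest intra-class distance, so that MMD monitors
    # labels as well as features.
    y = np.asarray(y)
    c = 0.0
    for k in range(len(X)):
        for l in range(len(X)):
            if y[k] == y[l]:
                c = max(c, float(np.linalg.norm(X[k] - X[l])))
    return np.hstack([X, c * (y == 1).astype(float)[:, None]])

In this sketch Eve would compute sigma once on the augmented camouflage pool, e.g. sigma = median_heuristic_sigma(augment_labels(X_pool, y_pool)), and then flag any candidate training set whose augmented features give psi(pool_aug, D_aug, sigma) >= 0.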
Cite this paper
Sen, A., Alfeld, S., Zhang, X., Vartanian, A., Ma, Y., Zhu, X.: Training Set Camouflage. In: Bushnell, L., Poovendran, R., Başar, T. (eds.) Decision and Game Theory for Security (GameSec 2018). LNCS, vol. 11199. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01554-1_4