Abstract
In security applications, an attacker may violate the data stationarity assumption that underlies most machine learning techniques. This problem, known as the domain shift problem, arises when the training (source) and test (target) data follow different distributions. The inherently adversarial nature of security applications considerably affects the robustness of a learning system; a classifier designer therefore needs to evaluate the robustness of a learning system under potential attacks during the design phase. Previous studies have investigated the effect of reduced feature vectors on the security evaluation of a classifier and demonstrated that traditional feature selection techniques can make performance under attack even worse; this motivated an adversary-aware feature selection algorithm for improving the robustness of learning systems. However, prior work on domain adaptation techniques, which are fundamental to addressing the domain shift problem, shows that the original feature space may not be directly suitable for correcting this distribution mismatch, because some features may have been distorted by the domain shift. In this paper, we propose a domain-invariant feature extraction model based on domain adaptation in order to address the domain shift caused by an adversary. We first conduct an experiment that graphically shows the effect of a successful attack on the MNIST handwritten digit classification task. We then design synthetic datasets to investigate the effect of a reduced feature vector on the performance of a learning system under attack. Finally, on a spam filtering application, the proposed feature extraction model significantly outperforms both adversary-aware and traditional feature selection models.
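The domain shift the abstract describes can be illustrated with a small, self-contained sketch. This is not the paper's method: it uses a toy nearest-centroid classifier and a CORAL-style second-order feature alignment purely as a stand-in for domain-invariant feature extraction, on synthetic Gaussian data. All function names and parameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_domain(n, shift, scale):
    """Two Gaussian classes along the first axis, then an affine domain shift."""
    X0 = rng.normal([0.0, 0.0], 1.0, size=(n, 2))
    X1 = rng.normal([3.0, 0.0], 1.0, size=(n, 2))
    X = np.vstack([X0, X1]) * scale + shift
    y = np.array([0] * n + [1] * n)
    return X, y

def coral(Xs, Xt, eps=1e-5):
    """Align source features to the target domain: whiten with the source
    covariance, recolor with the target covariance, recenter at the target mean."""
    Cs = np.cov(Xs, rowvar=False) + eps * np.eye(Xs.shape[1])
    Ct = np.cov(Xt, rowvar=False) + eps * np.eye(Xt.shape[1])
    def msqrt(C, inv=False):
        w, V = np.linalg.eigh(C)
        w = np.maximum(w, eps)
        return (V * w ** (-0.5 if inv else 0.5)) @ V.T
    return (Xs - Xs.mean(0)) @ msqrt(Cs, inv=True) @ msqrt(Ct) + Xt.mean(0)

def centroid_acc(Xtr, ytr, Xte, yte):
    """Train a nearest-centroid classifier on (Xtr, ytr), score it on (Xte, yte)."""
    c = np.stack([Xtr[ytr == k].mean(0) for k in (0, 1)])
    pred = np.argmin(((Xte[:, None] - c) ** 2).sum(-1), axis=1)
    return float((pred == yte).mean())

# Source domain vs. a shifted-and-rescaled target domain (simulated domain shift).
Xs, ys = make_domain(500, shift=0.0, scale=1.0)
Xt, yt = make_domain(500, shift=[4.0, 0.0], scale=0.5)

acc_raw = centroid_acc(Xs, ys, Xt, yt)                # train on raw source
acc_aligned = centroid_acc(coral(Xs, Xt), ys, Xt, yt)  # train on aligned source
print(f"target accuracy without adaptation: {acc_raw:.2f}")
print(f"target accuracy with CORAL-style alignment: {acc_aligned:.2f}")
```

Without adaptation the source-trained centroids land far from the target classes and accuracy collapses toward chance; after the alignment step the source statistics match the target's and accuracy recovers, which is the intuition behind extracting domain-invariant features rather than selecting among the original, shift-distorted ones.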
Cite this article
Khorshidpour, Z., Tahmoresnezhad, J., Hashemi, S. et al. Domain invariant feature extraction against evasion attack. Int. J. Mach. Learn. & Cyber. 9, 2093–2104 (2018). https://doi.org/10.1007/s13042-017-0692-6