Abstract
In recent years there has been significant progress in applying machine-learning-based classifiers, such as PDFRate, to identify malware disguised as benign files (our study focuses on PDF files). However, unlike other fields where statistical techniques are used, malware detection cannot rely on a fundamental assumption of ML-based techniques: that the training data is representative of the prospective input. Instead, malware can be deliberately crafted to break ML classifiers. We present a thorough study of EvadeML, a prominent Genetic Programming based technique for evading ML-based malware classifiers, together with the results of our improvements to its implementation. EvadeML has shown a 100% evasion success rate against two target PDF malware classifiers, PDFRate and Hidost. We modified EvadeML to achieve better evasion efficiency against another PDF malware classifier, AnalyzePDF, and observed a significant improvement over the original EvadeML. We also tested our modified approach against the PDFRate classifier and obtained a 100% success rate, matching the original EvadeML.
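The abstract describes EvadeML only at a high level, as a Genetic Programming search for malicious variants that a classifier scores as benign. The following is a minimal sketch of that kind of loop under heavily simplified assumptions: a PDF is modelled as a flat list of object tokens, and `toy_classifier`, `toy_oracle`, `BENIGN_OBJECTS`, and the `fitness` definition are all hypothetical stand-ins. It is not EvadeML's implementation nor the authors' modification; it only illustrates the mutate, check-with-oracle, score-with-classifier, select cycle.

```python
import random
from typing import Callable, List, Optional

# Toy model: a "PDF" is a flat list of object tokens. This is a hypothetical
# stand-in for a parsed PDF object tree, not EvadeML's representation.
Doc = List[str]
Scorer = Callable[[Doc], float]
Oracle = Callable[[Doc], bool]

# Hypothetical benign objects the mutation step may insert or substitute.
BENIGN_OBJECTS = ["/Font", "/Image", "/Metadata", "/Page", "/Annot"]


def toy_classifier(doc: Doc) -> float:
    """Hypothetical maliciousness score in [0, 1]: share of suspicious tokens."""
    suspicious = sum(1 for tok in doc if tok in ("/JavaScript", "/OpenAction"))
    return suspicious / len(doc)


def toy_oracle(doc: Doc) -> bool:
    """Hypothetical oracle: a variant stays malicious if its payload survives."""
    return "/JavaScript" in doc


def mutate(doc: Doc, rng: random.Random) -> Doc:
    """Return a copy of doc with one random insert, delete, or replace applied."""
    child = list(doc)
    op = rng.choice(["insert", "delete", "replace"])
    i = rng.randrange(len(child))
    if op == "insert":
        child.insert(i, rng.choice(BENIGN_OBJECTS))
    elif op == "delete" and len(child) > 1:
        del child[i]
    else:  # replace (also used when deleting would empty the document)
        child[i] = rng.choice(BENIGN_OBJECTS)
    return child


def fitness(doc: Doc, classifier: Scorer, oracle: Oracle, threshold: float = 0.5) -> float:
    """Positive iff the variant is still malicious AND scores below threshold."""
    if not oracle(doc):
        return float("-inf")
    return threshold - classifier(doc)


def evolve(seed: Doc, classifier: Scorer, oracle: Oracle, pop_size: int = 20,
           generations: int = 50, threshold: float = 0.5,
           rng_seed: int = 0) -> Optional[Doc]:
    """Genetic search for an evasive variant; returns None if none is found."""
    rng = random.Random(rng_seed)
    population = [mutate(seed, rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population,
                        key=lambda d: fitness(d, classifier, oracle, threshold),
                        reverse=True)
        if fitness(ranked[0], classifier, oracle, threshold) > 0:
            return ranked[0]  # evades the classifier, payload intact
        # Selection: keep the fitter half of the valid variants; if every
        # variant lost its payload, restart from the original seed.
        parents = [d for d in ranked if oracle(d)][: pop_size // 2] or [seed]
        population = parents + [mutate(rng.choice(parents), rng)
                                for _ in range(pop_size - len(parents))]
    return None


if __name__ == "__main__":
    seed = ["/Catalog", "/OpenAction", "/JavaScript", "/Page"]
    variant = evolve(seed, toy_classifier, toy_oracle)
    print(variant, None if variant is None else toy_classifier(variant))
```

The sketch uses mutation-only selection to stay short; a fuller Genetic Programming search would typically also apply crossover between variants, and a real oracle would execute the PDF in a sandbox rather than check for a token.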
Acknowledgement
This research was conducted in the Information Security Education and Awareness (ISEA) Lab of the Indian Institute of Technology Guwahati, Assam, India. The authors acknowledge IIT Guwahati, ISEA, and the Ministry of Electronics and Information Technology (MeitY), Government of India, for their support.
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Dey, S., Kumar, A., Sawarkar, M., Singh, P.K., Nandi, S. (2019). EvadePDF: Towards Evading Machine Learning Based PDF Malware Classifiers. In: Nandi, S., Jinwala, D., Singh, V., Laxmi, V., Gaur, M., Faruki, P. (eds) Security and Privacy. ISEA-ISAP 2019. Communications in Computer and Information Science, vol 939. Springer, Singapore. https://doi.org/10.1007/978-981-13-7561-3_11
DOI: https://doi.org/10.1007/978-981-13-7561-3_11
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-7560-6
Online ISBN: 978-981-13-7561-3
eBook Packages: Computer Science, Computer Science (R0)