Abstract
Malware detection remains an open problem. Every day, numerous attacks use malware to steal private information, disrupt services, or sabotage industrial systems. In this paper, we combine three kinds of contextual information, namely static, dynamic, and instruction-based, for malware detection. This leads to the definition of more than thirty thousand features, a large feature set that covers a wide range of a sample's characteristics. Through experiments with one million files, we show that this feature set leads to machine-learning-based models that can detect both malware seen at roughly the time the models are built and malware first seen months after the models were built (i.e., the detection models remain effective for months after they are built). This may be due to the comprehensiveness of the feature set.
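As a rough illustration of the two ideas summarized above (fusing features from several contexts into one vector, and testing on samples first seen after the model was built), consider the minimal sketch below. It is not the authors' pipeline: the function names, the `first_seen` field, the cutoff date, and all feature values are hypothetical, and the real feature set has more than thirty thousand dimensions rather than four.

```python
from datetime import date

def fuse_features(static, dynamic, instruction):
    """Concatenate per-context feature vectors in a fixed order (hypothetical helper)."""
    return list(static) + list(dynamic) + list(instruction)

def temporal_split(samples, cutoff):
    """Split samples by first-seen date: before the cutoff for training,
    on or after the cutoff as the 'future' test set."""
    train = [s for s in samples if s["first_seen"] < cutoff]
    future = [s for s in samples if s["first_seen"] >= cutoff]
    return train, future

# Toy samples with made-up values; label 1 = malicious, 0 = benign.
samples = [
    {"first_seen": date(2014, 1, 10), "static": [0.1, 3.0], "dynamic": [5.0], "instruction": [0.2], "label": 1},
    {"first_seen": date(2014, 2, 3), "static": [0.4, 1.0], "dynamic": [0.0], "instruction": [0.9], "label": 0},
    {"first_seen": date(2014, 6, 21), "static": [0.2, 2.0], "dynamic": [7.0], "instruction": [0.3], "label": 1},
]

# Assumed cutoff: the date on which the model is considered "built".
train, future = temporal_split(samples, cutoff=date(2014, 3, 1))
train_vectors = [fuse_features(s["static"], s["dynamic"], s["instruction"]) for s in train]
print(len(train), len(future), train_vectors[0])
```

A classifier would be fit on `train_vectors` and then scored on the fused vectors of `future`, which mimics detecting malware first seen months after the model was built.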
Notes
For more information, see https://goo.gl/FCEPLh
Acknowledgements
We thank VirusTotal for providing us with the dataset analyzed in this paper. We also thank John Charlton for proofreading the paper. The research was supported in part by ARO Grant #W911NF-13-1-0141 and NSF Grants #1111925, #IIS-1213026, and #CNS-1461926.
Cite this article
Saleh, M., Li, T. & Xu, S. Multi-context features for detecting malicious programs. J Comput Virol Hack Tech 14, 181–193 (2018). https://doi.org/10.1007/s11416-017-0304-8
DOI: https://doi.org/10.1007/s11416-017-0304-8