Abstract
The proliferation of malware poses a serious threat to the security of computer systems. Traditional signature-based anti-virus systems fail to detect polymorphic and metamorphic malware, as well as new, previously unseen malicious executables. Data mining methods such as Naive Bayes and Decision Tree have been studied, but only on small collections of executables. In this paper, based on an analysis of the Windows API calls made by PE files, we develop the Intelligent Malware Detection System (IMDS), which uses Objective-Oriented Association (OOA) mining-based classification. IMDS is an integrated system consisting of three major modules: a PE parser, an OOA rule generator, and a rule-based classifier. An OOA_Fast_FP-Growth algorithm is adapted to generate OOA rules for classification efficiently. We perform a comprehensive experimental study on a large collection of PE files obtained from the anti-virus laboratory of KingSoft Corporation to compare various malware detection approaches. The experimental results show that IMDS outperforms popular anti-virus software such as Norton AntiVirus and McAfee VirusScan in both accuracy and efficiency, as well as previous data mining-based detection systems that employed Naive Bayes, Support Vector Machine (SVM), and Decision Tree techniques. Our system has already been incorporated into the scanning tool of KingSoft’s Anti-Virus software.
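The idea of classifying with rules of the form "API-call set implies malware" can be illustrated with a minimal sketch. It is not the OOA_Fast_FP-Growth algorithm named in the abstract: it enumerates small itemsets by brute force, and it assumes the usual definitions of support (fraction of all samples that contain the itemset and carry the objective label) and confidence (that count divided by the number of samples containing the itemset). The function names, thresholds, and the four toy samples are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def mine_ooa_rules(samples, objective, min_support, min_confidence, max_len=2):
    """Brute-force search for rules `itemset -> objective` (illustrative only).

    samples: list of (set_of_api_names, label) pairs.
    Returns a list of (itemset, support, confidence) tuples.
    """
    n = len(samples)
    apis = sorted(set().union(*(calls for calls, _ in samples)))
    rules = []
    for k in range(1, max_len + 1):
        for itemset in combinations(apis, k):
            items = set(itemset)
            # Labels of all samples whose API set contains the itemset.
            covered = [label for calls, label in samples if items <= calls]
            if not covered:
                continue
            hits = sum(1 for label in covered if label == objective)
            support = hits / n
            confidence = hits / len(covered)
            if support >= min_support and confidence >= min_confidence:
                rules.append((itemset, support, confidence))
    return rules


def classify(calls, rules, objective="malware", default="benign"):
    """Label a file with the objective if any mined rule's itemset matches."""
    for itemset, _, _ in rules:
        if set(itemset) <= calls:
            return objective
    return default


# Hypothetical toy data: each sample is the set of APIs a PE file imports.
samples = [
    ({"CreateFileA", "WriteFile", "RegSetValueExA"}, "malware"),
    ({"CreateFileA", "RegSetValueExA"}, "malware"),
    ({"CreateFileA", "ReadFile"}, "benign"),
    ({"ReadFile", "CloseHandle"}, "benign"),
]
rules = mine_ooa_rules(samples, "malware", min_support=0.5, min_confidence=1.0)
for itemset, support, confidence in rules:
    print(itemset, support, confidence)
```

A real system would replace the exhaustive enumeration with a frequent-pattern miner, since the number of candidate itemsets grows combinatorially with the number of distinct API calls.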
References
Adleman, L.: An abstract theory of computer viruses (invited talk). In: CRYPTO ’88: Proceedings on Advances in Cryptology, pp. 354–374, New York, NY, USA. Springer, New York (1990)
Agrawal, R., Imielinski, T.: Mining association rules between sets of items in large databases. In: Proceedings of SIGMOD (1993)
Agrawal, R., Srikant, R.: Fast algorithms for association rule mining. In: Proceedings of VLDB-94 (1994)
Cheng, H., Yan, X., Han, J., Hsu, C.: Discriminative frequent pattern analysis for effective classification. In: Proceedings of IEEE 23rd International Conference on Data Engineering (ICDE-07) (2007)
Christodorescu, M., Jha, S.: Static analysis of executables to detect malicious patterns. In: Proceedings of the 12th USENIX Security Symposium (2003)
Fan M. and Li C. (2003). Mining frequent patterns in an FP-tree without conditional FP-tree generation. J. Comput. Res. Dev. 40: 1216–1222
Filiol E. (2005). Computer Viruses: from Theory to Applications. Springer, Heidelberg
Filiol E. (2006). Malware pattern scanning schemes secure against black-box analysis. J. Comput. Virol. 2(1): 35–50
Filiol E., Jacob G. and Liard M.L. (2007). Evaluation methodology and theoretical model for antiviral behavioural detection strategies. J. Comput. Virol. 3(1): 27–37
Han J. and Kamber M. (2006). Data Mining: Concepts and Techniques, 2nd edn. Morgan Kaufmann, San Francisco
Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: Proceedings of SIGMOD, pp. 1–12, May (2000)
Hsu C. and Lin C. (2002). A comparison of methods for multiclass support vector machines. IEEE Trans. Neural Netw. 13: 415–425
Jain A., Duin R. and Mao J. (2000). Statistical pattern recognition: A review. IEEE Trans. Pattern Anal. Mach. Intell. 22: 4–37
Kephart, J., Arnold, W.: Automatic extraction of computer virus signatures. In: Proceedings of 4th Virus Bulletin International Conference, pp. 178–184 (1994)
Kolter, J., Maloof, M.: Learning to detect malicious executables in the wild. In: Proceedings of KDD’04 (2004)
Kwak N. and Choi C. (2002). Input feature selection by mutual information based on Parzen window. IEEE Trans. Pattern Anal. Mach. Intell. 24: 1667–1671
Langley, P.: Selection of relevant features in machine learning. In: Proceedings of AAAI Fall Symposium (1994)
Lee, T., Mody, J.: Behavioral classification. In: Proceedings of 2006 EICAR Conference (2006)
Liu, B., Hsu, W., Ma, Y.: Integrating classification and association rule mining. In: Proceedings of KDD’98 (1998)
Lo R., Levitt K. and Olsson R. (1995). MCF: A malicious code filter. Comput. Secur. 14: 541–566
McGraw G. and Morrisett G. (2002). Attacking malicious code: report to the infosec research council. IEEE Softw. 17(5): 33–41
Peng, H., Long, F., Ding, C.: Feature selection based on mutual information: Criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27 (2005)
Rabek, J., Khazan, R., Lewandowski, S., Cunningham, R.: Detection of injected, dynamically generated, and obfuscated malicious code. In: Proceedings of the 2003 ACM Workshop on Rapid Malcode, pp. 76–82 (2003)
Schultz, M., Eskin, E., Zadok, E.: Data mining methods for detection of new malicious executables. In: Security and Privacy, 2001 Proceedings. 2001 IEEE Symposium on 14–16 May, pp. 38–49 (2001)
Shen, Y., Yang, Q., Zhang, Z.: Objective-oriented utility-based association mining. In: Proceedings of IEEE International Conference on Data Mining (2002)
Sung, A., Xu, J., Chavez, P., Mukkamala, S.: Static analyzer of vicious executables (SAVE). In: Proceedings of the 20th Annual Computer Security Applications Conference (2004)
Swets J. and Pickett R. (1982). Evaluation of Diagnostic System: Methods from Signal Detection Theory. Academic Press, New York
Tan P., Steinbach M. and Kumar V. (2005). Introduction to Data Mining. Addison Wesley, Reading
Vapnik V. (1999). The Nature of Statistical Learning Theory. Springer, Heidelberg
Wang, J., Deng, P., Fan, Y., Jaw, L., Liu, Y.: Virus detection using data mining techniques. In: Proceedings of IEEE International Conference on Data Mining (2003)
Witten H. and Frank E. (2005). Data Mining: Practical Machine Learning Tools with Java Implementations. Morgan Kaufmann, San Francisco
Xu, J., Sung, A., Chavez, P., Mukkamala, S.: Polymorphic malicious executable scanner by API sequence analysis. In: Proceedings of the International Conference on Hybrid Intelligent Systems (2004)
Ye, Y., Wang, D., Li, T., Ye, D.: IMDS: Intelligent malware detection system. In: Proceedings of ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD 2007) (2007)
Yin, X., Han, J.: CPAR: Classification based on predictive association rules. In: Proceedings of 3rd SIAM International Conference on Data Mining (SDM’03), May (2003)
Zuo Z. and Tian Zhou M. (2004). Some further theoretical results about computer viruses. Comput. J. 47(6): 627–633
Zuo Z., Zhu Q.-x. and Zhou M.-t. (2005). On the time complexity of computer viruses. IEEE Trans. Inf. Theory 51(8): 2962–2966
Additional information
A short version of this paper appeared in [33]. The work is partially supported by NSF IIS-0546280 and an IBM Faculty Research Award. The authors would also like to thank the members of the anti-virus laboratory at KingSoft Corporation for their helpful discussions and suggestions.
Cite this article
Ye, Y., Wang, D., Li, T. et al. An intelligent PE-malware detection system based on association mining. J Comput Virol 4, 323–334 (2008). https://doi.org/10.1007/s11416-008-0082-4
DOI: https://doi.org/10.1007/s11416-008-0082-4