Skip to main content
Log in

An intelligent PE-malware detection system based on association mining

  • Original Paper
  • Published:
Journal in Computer Virology Aims and scope Submit manuscript

Abstract

The proliferation of malware has presented a serious threat to the security of computer systems. Traditional signature-based anti-virus systems fail to detect polymorphic/metamorphic and new, previously unseen malicious executables. Data mining methods such as Naive Bayes and Decision Tree have been studied on small collections of executables. In this paper, resting on the analysis of Windows APIs called by PE files, we develop the Intelligent Malware Detection System (IMDS) using Objective-Oriented Association (OOA) mining based classification. IMDS is an integrated system consisting of three major modules: PE parser, OOA rule generator, and rule based classifier. An OOA_Fast_FP-Growth algorithm is adapted to efficiently generate OOA rules for classification. A comprehensive experimental study on a large collection of PE files obtained from the anti-virus laboratory of KingSoft Corporation is performed to compare various malware detection approaches. Promising experimental results demonstrate that the accuracy and efficiency of our IMDS system outperform popular anti-virus software such as Norton AntiVirus and McAfee VirusScan, as well as previous data mining based detection systems which employed Naive Bayes, Support Vector Machine (SVM) and Decision Tree techniques. Our system has already been incorporated into the scanning tool of KingSoft’s Anti-Virus software.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Similar content being viewed by others

References

  1. Adleman, L.: An abstract theory of computer viruses (invited talk). In: CRYPTO ’88: Proceedings on Advances in Cryptology, pp. 354–374, New York, NY, USA. Springer, New York (1990)

  2. Agrawal, R., Imielinski, T.: Mining association rules between sets of items in large databases. In: Proceedings of SIGMOD (1993)

  3. Agrawal, R., Srikant, R.: Fast algorithms for association rule mining. In: Proceedings of VLDB-94 (1994)

  4. Cheng, H., Yan, X., Han, J., Hsu, C.: Discriminative frequenct pattern analysis for effective classification. In: Proceedings of IEEE 23rd International Conference on Data Engineering (ICDE-07) (2007)

  5. Christodorescu, M., Jha, S.: Static analysis of executables to detect malicious patterns. In: Proceedings of the 12th USENIX Security Symposium (2003)

  6. Fan M. and Li C. (2003). Mining frequent patterns in an fp-tree without conditional fp-tree generation. J. Comput. Res. Dev. 40: 1216–1222

    Google Scholar 

  7. Filiol E. (2005). Computer Viruses: from Theory to Applications. Springer, Heidelberg

    MATH  Google Scholar 

  8. Filiol E. (2006). Malware pattern scanning schemes secure against black-box analysis. J. Comput. Virol. 2(1): 35–50

    Article  Google Scholar 

  9. Filiol E., Jacob G. and Liard M.L. (2007). Evaluation methodology and theoretical model for antiviral behavioural detection strategies. J. Comput. Virol. 3(1): 27–37

    Article  Google Scholar 

  10. Han J. and Kamber M. (2006). Data Mining: Concepts and Techniques, 2nd edn. Morgan Kaufmann, San Francisco

    Google Scholar 

  11. Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: Proceedings of SIGMOD, pp. 1–12, May (2000)

  12. Hsu C. and Lin C. (2002). A comparison of methods for multiclass support vector machines. IEEE Trans. Neural Netw. 13: 415–425

    Article  Google Scholar 

  13. Jain A., Duin R. and Mao J. (2000). Statistical pattern recognition: A review. IEEE Trans. Pattern Anal. Mach. Intell. 22: 4–37

    Article  Google Scholar 

  14. Kephart, J., Arnold, W.: Automatic extraction of computer virus signatures. In: Proceedings of 4th Virus Bulletin International Conference, pp. 178–184 (1994)

  15. Kolter, J., Maloof, M.: Learning to detect malicious executables in the wild. In: Proceedings of KDD’04 (2004)

  16. Kwak N. and Choi C. (2002). Input feature selection by mutual information based on parzen window. IEEE Trans. Pattern Anal. Mach. Intell. 24: 1667–1671

    Article  Google Scholar 

  17. Langley, P.: Selection of relevant features in machine learning. In: Proceedings of AAAI Fall Symposium (1994)

  18. Lee, T., Mody, J.: Behavioral classification. In: Proceedings of 2006 EICAR Conference (2006)

  19. Liu, B., Hsu, W., Ma, Y.: Integreting classification and association rule mining. In: Proceedings of KDD’98 (1998)

  20. Lo R., Levitt K. and Olsson R. (1995). Mcf: A malicious code filter. Comput. Secur. 14: 541–566

    Article  Google Scholar 

  21. McGraw G. and Morrisett G. (2002). Attacking malicious code: report to the infosec research council. IEEE Softw. 17(5): 33–41

    Article  Google Scholar 

  22. Peng, H., Long, F., Ding, C.: Feature selection based on mutual information: Criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27 (2005)

  23. Rabek, J., Khazan, R., Lewandowski, S., Cunningham, R.: Detection of injected, dynamically generated, and obfuscated malicious code. In: Proceedings of the 2003 ACM Workshop on Rapid Malcode, pp. 76–82 (2003)

  24. Schultz, M., Eskin, E., Zadok, E.: Data mining methods for detection of new malicious executables. In: Security and Privacy, 2001 Proceedings. 2001 IEEE Symposium on 14–16 May, pp. 38–49 (2001)

  25. Shen, Y., Yang, Q., Zhang, Z.: Objective-oriented utility-based association mining. In: Proceedings of IEEE International Conference on Data Mining (2002)

  26. Sung, A., Xu, J., Chavez, P., Mukkamala, S.: Static analyzer of vicious executables (save). In: Proceedings of the 20th Annual Computer Security Applications Conference (2004)

  27. Swets J. and Pickett R. (1982). Evaluation of Diagnostic System: Methods from Signal Detection Theory. Academic Press, New York

    Google Scholar 

  28. Tan P., Steinbach M. and Kumar V. (2005). Introduction to Data Mining. Addison Wesley, Reading

    Google Scholar 

  29. Vapnik V. (1999). The Nature of Statistical Learning Theory. Springer, Heidelberg

    Google Scholar 

  30. Wang, J., Deng, P., Fan, Y., Jaw, L., Liu, Y.: Virus detection using data mining techniques. In: Proceedings of IEEE International Conference on Data Mining (2003)

  31. Witten H. and Frank E. (2005). Data Mining: Practical Machine Learning Tools with Java Implementations. Morgan Kaufmann, San Francisco

    Google Scholar 

  32. Xu, J., Sung, A., Chavez, P., Mukkamala, S.: Polymorphic malicous executable sanner by api sequence analysis. In: Proceedings of the International Conference on Hybrid Intelligent Systems (2004)

  33. Ye, Y., Wang, D., Li, T., Ye, D.: IMDS: Intelligent malware detection system. In: Proccedings of ACM International Conference on Knowlege Discovery and Data Mining (SIGKDD 2007) (2007)

  34. Yin, X., Han, J.: Cpar: Classification based on predictive association rules. In: Proceedings of 3rd SIAM International Conference on Data Mining (SDM’03), May (2003)

  35. Zuo Z. and Tian Zhou M. (2004). Some further theoretical results about computer viruses. Comput. J. 47(6): 627–633

    Article  Google Scholar 

  36. Zuo Z., Zhu Q.-x. and Zhou M.-t. (2005). On the time complexity of computer viruses. IEEE Trans. Inf. Theory 51(8): 2962–2966

    Article  MathSciNet  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Tao Li.

Additional information

A short version of the paper is appeared in [33]. The work is partially supported by NSF IIS-0546280 and an IBM Faculty Research Award. The authors would also like to thank the members in the anti-virus laboratory at KingSoft Corporation for their helpful discussions and suggestions.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Ye, Y., Wang, D., Li, T. et al. An intelligent PE-malware detection system based on association mining. J Comput Virol 4, 323–334 (2008). https://doi.org/10.1007/s11416-008-0082-4

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11416-008-0082-4

Keywords

Navigation