HDM-Analyser: a hybrid analysis approach based on data mining techniques for malware detection

Eskandari, Mojtaba; Khorshidpour, Zeinab; Hashemi, Sattar

doi:10.1007/s11416-013-0181-8

Original Paper
Published: 17 February 2013

Volume 9, pages 77–93, (2013)
Journal of Computer Virology and Hacking Techniques

Mojtaba Eskandari¹,
Zeinab Khorshidpour¹ &
Sattar Hashemi¹

Abstract

Today’s security threats, such as malware, are more sophisticated and targeted than ever, and they are growing at an unprecedented rate. Various approaches have been introduced to deal with them. One is signature-based detection, an effective and widely used method for detecting malware; however, it has a substantial problem in detecting new instances. In other words, it is useful only from the second attack of a given malware onwards. Because malware proliferates rapidly and extracting signatures demands considerable human effort, this approach is tedious; thus, an intelligent malware detection system is required to deal with new malware threats. Most intelligent detection systems use data mining techniques to distinguish malware from benign programs. One of the pivotal phases of these systems is extracting features from malware samples and benign ones in order to build at least one learning model. This phase, called “Malware Analysis”, plays a significant role in these systems. Since the API call sequence is an effective feature for recognising unknown malware, this paper focuses on extracting this feature from executable files. There are two major kinds of approach to analysing an executable file. The first is “Static Analysis”, which analyses a program at the source code level. The second is “Dynamic Analysis”, which extracts features by observing a program’s activities, such as system requests, during its execution. Static analysis has to traverse the program’s execution paths in order to find the called APIs. Because it does not have sufficient information about the decision-making points in the given executable file, it cannot extract the real sequence of called APIs. Although dynamic analysis does not have this drawback, it suffers from execution overhead; thus, the feature extraction phase takes noticeable time.
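To make the idea of an API call sequence as a feature concrete, the following is a minimal sketch of turning one recorded call sequence into a fixed-length count vector that a data mining classifier could consume. It is an illustration only, not the feature representation used in the paper: the n-gram encoding, the function names `api_ngrams` and `to_feature_vector`, the vocabulary, and the sample trace are all assumptions.

```python
from collections import Counter
from typing import List, Sequence, Tuple

NGram = Tuple[str, ...]


def api_ngrams(calls: Sequence[str], n: int = 2) -> Counter:
    """Count contiguous n-grams of API names in one call sequence."""
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))


def to_feature_vector(calls: Sequence[str],
                      vocabulary: Sequence[NGram],
                      n: int = 2) -> List[int]:
    """Map a call sequence to counts over a fixed n-gram vocabulary."""
    counts = api_ngrams(calls, n)
    return [counts.get(gram, 0) for gram in vocabulary]


# Hypothetical trace and vocabulary, invented for illustration.
trace = ["CreateFileW", "WriteFile", "CloseHandle", "CreateFileW", "WriteFile"]
vocab = [
    ("CreateFileW", "WriteFile"),
    ("WriteFile", "CloseHandle"),
    ("RegSetValueExW", "CloseHandle"),
]
vector = to_feature_vector(trace, vocab)
print(vector)  # [2, 1, 0]
```

Whether the sequence comes from static traversal or from dynamic monitoring, the downstream learner sees the same kind of vector; the two analyses differ in how faithfully the sequence reflects what the program really calls.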
In this paper, a novel hybrid approach, HDM-Analyser, is presented, which takes advantage of both dynamic and static analysis methods to increase speed while preserving accuracy at a reasonable level. HDM-Analyser is able to predict the majority of decision-making points by using the statistical information gathered by dynamic analysis; therefore, there is no execution overhead at scanning time. The main contribution of this paper is taking the accuracy advantage of dynamic analysis and incorporating it into static analysis in order to improve the accuracy of static analysis. In fact, the execution overhead is incurred in the learning phase; thus, it is not imposed on the feature extraction phase, which is performed during the scanning operation. The experimental results demonstrate that HDM-Analyser attains better overall accuracy and time complexity than static and dynamic analysis methods.
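The hybrid idea can be sketched in a few lines, assuming a toy control-flow graph. This is not the HDM-Analyser algorithm, whose details the abstract does not give; it only illustrates the general pattern of collecting branch statistics from executed paths in a learning phase and then using them to choose a path through a static graph at scan time, without running the program. The graph encoding, node names, and the functions `learn_branch_stats` and `predict_api_sequence` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical CFG encoding: node id -> (API called at the node or None, successors).
Cfg = Dict[str, Tuple[Optional[str], List[str]]]


def learn_branch_stats(observed_paths: List[List[str]]) -> Dict[str, Counter]:
    """Learning phase: count which successor followed each node in executed paths."""
    stats: Dict[str, Counter] = defaultdict(Counter)
    for path in observed_paths:
        for node, nxt in zip(path, path[1:]):
            stats[node][nxt] += 1
    return stats


def predict_api_sequence(cfg: Cfg, entry: str, stats: Dict[str, Counter],
                         max_steps: int = 1000) -> List[str]:
    """Scanning phase: walk the CFG statically, taking the most frequently
    observed successor at each decision point (first successor if unseen)."""
    sequence: List[str] = []
    node = entry
    for _ in range(max_steps):  # bound the walk so loops in the graph terminate
        api, successors = cfg[node]
        if api is not None:
            sequence.append(api)
        if not successors:
            break
        seen = stats.get(node)
        node = max(successors, key=lambda s: seen[s] if seen else 0)
    return sequence


# Toy graph and traces, invented for illustration.
cfg: Cfg = {
    "entry": ("CreateFileW", ["check"]),
    "check": (None, ["write", "delete"]),
    "write": ("WriteFile", ["exit"]),
    "delete": ("DeleteFileW", ["exit"]),
    "exit": ("CloseHandle", []),
}
traces = [
    ["entry", "check", "write", "exit"],
    ["entry", "check", "write", "exit"],
    ["entry", "check", "delete", "exit"],
]
stats = learn_branch_stats(traces)
predicted = predict_api_sequence(cfg, "entry", stats)
print(predicted)  # ['CreateFileW', 'WriteFile', 'CloseHandle']
```

In this sketch the cost of running samples is paid once, when `traces` are collected; the scan-time walk is a pure graph traversal, which mirrors the abstract's claim that execution overhead is confined to the learning phase.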

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Shiraz University, Shiraz, Iran
Mojtaba Eskandari, Zeinab Khorshidpour & Sattar Hashemi

Corresponding author

Correspondence to Mojtaba Eskandari.

About this article

Cite this article

Eskandari, M., Khorshidpour, Z. & Hashemi, S. HDM-Analyser: a hybrid analysis approach based on data mining techniques for malware detection. J Comput Virol Hack Tech 9, 77–93 (2013). https://doi.org/10.1007/s11416-013-0181-8

Received: 22 May 2012
Accepted: 29 January 2013
Published: 17 February 2013
Issue Date: May 2013
DOI: https://doi.org/10.1007/s11416-013-0181-8
