DeepAM: a heterogeneous deep learning framework for intelligent malware detection

Ye, Yanfang; Chen, Lingwei; Hou, Shifu; Hardy, William; Li, Xin

doi:10.1007/s10115-017-1058-9


Regular Paper
Published: 09 May 2017

Volume 54, pages 265–285 (2018)

Knowledge and Information Systems

Yanfang Ye ORCID: orcid.org/0000-0001-8376-7239¹,
Lingwei Chen¹,
Shifu Hou¹,
William Hardy¹ &
Xin Li¹


Abstract

With computers and the Internet being essential in everyday life, malware poses serious and evolving threats to their security, making the detection of malware of utmost concern. Accordingly, there has been much research on intelligent malware detection that applies data mining and machine learning techniques. Though these methods have achieved great results, most of them are built on shallow learning architectures. Owing to its superior ability to learn features through a multilayer deep architecture, deep learning is starting to be leveraged in industrial and academic research for a range of applications. In this paper, based on the Windows application programming interface (API) calls extracted from portable executable (PE) files, we study how a deep learning architecture can be designed for intelligent malware detection. We propose a heterogeneous deep learning framework composed of an AutoEncoder stacked with multilayer restricted Boltzmann machines (RBMs) and a layer of associative memory to detect new, unknown malware. The proposed deep learning model performs greedy layer-wise training for unsupervised feature learning, followed by supervised parameter fine-tuning. Unlike existing works, which use only files with class labels (either malicious or benign) during the training phase, we use both labeled and unlabeled file samples to pre-train the multiple layers of the heterogeneous deep learning framework from the bottom up for feature learning. A comprehensive experimental study on a large, real-world file collection from Comodo Cloud Security Center is performed to compare various malware detection approaches. Promising experimental results demonstrate that our proposed deep learning framework can further improve overall malware detection performance compared with traditional shallow learning methods, deep learning methods with a homogeneous framework, and other existing anti-malware scanners.
The proposed heterogeneous deep learning framework can also be readily applied to other malware detection tasks.
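The greedy layer-wise, unsupervised pretraining described above can be illustrated with a small sketch. This is a simplified example under stated assumptions, not the authors' implementation: it assumes each PE file is encoded as a binary vector marking which Windows API calls it uses (the `API_VOCAB` list and the helper names `encode` and `pretrain_stack` are hypothetical), and it stacks only RBMs trained with one-step contrastive divergence (CD-1). The AutoEncoder bottom layer, the associative memory layer, and the supervised fine-tuning stage are omitted. Because no labels are used at this stage, unlabeled files can contribute to feature learning.

```python
import math
import random

# Hypothetical API vocabulary for illustration only; DeepAM's actual
# feature set is not reproduced here.
API_VOCAB = ["CreateFileA", "WriteFile", "RegSetValueExA", "VirtualAllocEx",
             "WriteProcessMemory", "CreateRemoteThread", "GetProcAddress",
             "LoadLibraryA"]


def encode(api_calls):
    """Binary presence/absence vector over API_VOCAB (an assumed encoding)."""
    present = set(api_calls)
    return [1.0 if api in present else 0.0 for api in API_VOCAB]


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class RBM:
    """Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_probs(self, h):
        return [sigmoid(self.b[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b))]

    def cd1_update(self, v0, lr):
        """One CD-1 step on a single sample; returns squared reconstruction error."""
        ph0 = self.hidden_probs(v0)
        h0 = [1.0 if self.rng.random() < p else 0.0 for p in ph0]  # sample hidden states
        pv1 = self.visible_probs(h0)  # mean-field reconstruction of the input
        ph1 = self.hidden_probs(pv1)
        for i in range(len(self.b)):
            for j in range(len(self.c)):
                self.W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])
            self.b[i] += lr * (v0[i] - pv1[i])
        for j in range(len(self.c)):
            self.c[j] += lr * (ph0[j] - ph1[j])
        return sum((x - r) ** 2 for x, r in zip(v0, pv1))


def pretrain_stack(data, layer_sizes, epochs=50, lr=0.1, seed=0):
    """Greedy layer-wise, unsupervised pretraining of a stack of RBMs.

    Each RBM is trained on the hidden-unit probabilities of the layer below.
    No labels are used, so unlabeled files can be included in `data`.
    """
    rng = random.Random(seed)
    layers, inputs = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(len(inputs[0]), n_hidden, rng)
        for _ in range(epochs):
            for v in inputs:
                rbm.cd1_update(v, lr)
        layers.append(rbm)
        inputs = [rbm.hidden_probs(v) for v in inputs]  # features for next layer
    return layers, inputs


if __name__ == "__main__":
    files = [
        ["CreateFileA", "WriteFile"],
        ["VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"],
        ["LoadLibraryA", "GetProcAddress", "RegSetValueExA"],
        ["CreateFileA", "WriteFile", "LoadLibraryA"],
    ]
    data = [encode(f) for f in files]
    layers, features = pretrain_stack(data, layer_sizes=[6, 4])
    print(len(layers), len(features), len(features[0]))  # 2 4 4
```

In a complete pipeline, the top-layer outputs in `features` would then be passed, together with class labels where available, to a supervised stage that fine-tunes the whole stack; that stage is not shown here.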


Notes

https://www.virustotal.com/.


Acknowledgements

The authors would also like to thank the anti-malware experts of Comodo Security Lab for the data collection, as well as for helpful discussions and support. This work is partially supported by the US National Science Foundation under Grant CNS-1618629.

Author information

Authors and Affiliations

Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV, 26506, USA
Yanfang Ye, Lingwei Chen, Shifu Hou, William Hardy & Xin Li


Corresponding author

Correspondence to Yanfang Ye.


About this article

Cite this article

Ye, Y., Chen, L., Hou, S. et al. DeepAM: a heterogeneous deep learning framework for intelligent malware detection. Knowl Inf Syst 54, 265–285 (2018). https://doi.org/10.1007/s10115-017-1058-9


Received: 12 May 2016
Accepted: 27 April 2017
Published: 09 May 2017
Issue Date: February 2018
DOI: https://doi.org/10.1007/s10115-017-1058-9
