Malware Behavior Profiling from Unstructured Data

Chiam, Yoong Jien; Maarof, Mohd Aizaini; Kassim, Mohamad Nizam; Zainal, Anazida

doi:10.1007/978-3-030-49345-5_14

Conference paper
First Online: 01 August 2020

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1182)

Abstract

Recently, the emergence of new malware has posed a major threat, especially in the finance sector, where much online banking data has been stolen by adversaries. Information about a malware threat needs to be collected immediately after an outbreak, since early detection can save others from becoming victims. Unfortunately, there is a time delay before information about new malware reaches malware databases such as ExploitDB. A pre-emptive approach is therefore needed to gather first-hand information about new malware as a preventive measure. One method is to extract information from open-source data such as online news using Named Entity Recognition (NER). However, existing NER systems cannot accurately extract domain-specific entities from online news. The aim of this paper is to extract malware entities and their behaviour attributes using an extended version of NER with Hidden Markov Models (HMM) and Conditional Random Fields (CRF). A malware-annotated corpus is produced to support supervised learning of the named entity tagger. The results show that CRF performs slightly better than HMM. A few experiments are performed to optimize the performance of CRF in terms of feature extraction. Finally, the malware behaviour information is visualized on a dashboard that combines a few statistical graphs produced with matplotlib. The purpose of visualizing the malware behaviour profile extracted from online news is to help cyber security experts better understand malware behaviour.
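
To make the feature-extraction step mentioned in the abstract more concrete, the following Python sketch shows one common way token-level features are prepared for a CRF tagger and how BIO-style output tags can be collapsed into entity spans. It is an illustration under assumptions, not the authors' configuration: the feature set, the function names (token_features, sentence_features, bio_to_entities), the label names (MALWARE, TARGET), and the sample sentence with the made-up malware name "ExampleBot" do not come from the paper.

```python
def token_features(tokens, i):
    """Build a feature dict for tokens[i] with a one-token context window.

    The chosen features are hypothetical; the paper's feature set is not
    given in the abstract.
    """
    word = tokens[i]
    features = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.suffix3": word[-3:].lower(),
        "word.isupper": word.isupper(),
        "word.hasdigit": any(ch.isdigit() for ch in word),
    }
    if i > 0:
        features["prev.lower"] = tokens[i - 1].lower()
    else:
        features["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        features["next.lower"] = tokens[i + 1].lower()
    else:
        features["EOS"] = True  # end of sentence
    return features


def sentence_features(tokens):
    """Feature dicts for every token in one sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


def bio_to_entities(tokens, tags):
    """Collapse BIO tags into a list of (label, text) entity spans."""
    entities, label, span = [], None, []
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label
        )
        if starts_new:
            if span:
                entities.append((label, " ".join(span)))
            label, span = tag[2:], [token]
        elif tag.startswith("I-"):
            span.append(token)
        else:  # "O" tag closes any open span
            if span:
                entities.append((label, " ".join(span)))
            label, span = None, []
    if span:
        entities.append((label, " ".join(span)))
    return entities


# Invented example sentence and tags (labels are assumptions).
tokens = ["ExampleBot", "steals", "online", "banking", "credentials"]
tags = ["B-MALWARE", "O", "B-TARGET", "I-TARGET", "I-TARGET"]
print(sentence_features(tokens)[0])
print(bio_to_entities(tokens, tags))
```

Feature dictionaries of this shape are what dictionary-based CRF toolkits typically consume, and the extracted (label, text) pairs could then be counted per malware name to feed the kind of statistical graphs the abstract describes.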

Acknowledgement

This work is a collaboration between Universiti Teknologi Malaysia and CyberSecurity Malaysia. It is partly supported by the Research Management Centre (RMC) at the Universiti Teknologi Malaysia (UTM) under High Impact Research Grant (HIR) (VOT PY/2018/02890).

Author information

Authors and Affiliations

Cyber Threat Intelligence Lab, Information Assurance and Security Research Group, School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia, 81310, Skudai, Johor, Malaysia
Yoong Jien Chiam, Mohd Aizaini Maarof, Mohamad Nizam Kassim & Anazida Zainal
Cyber Security Responsive Services Division, CyberSecurity Malaysia, Level 7, Tower 1, Menara Cyber Axis, Cyberjaya, Selangor, Malaysia
Mohamad Nizam Kassim

Corresponding author

Correspondence to Anazida Zainal.

Editor information

Editors and Affiliations

Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs (MIR), Auburn, WA, USA
Ajith Abraham
Department of Computer Science and Engineering, Vardhaman College of Engineering, Hyderabad, Telangana, India
M. A. Jabbar
Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid, Madrid, Spain
Sanju Tiwari
ISEP - Instituto Superior de Engenharia do Porto, Porto, Portugal
Isabel M. S. Jesus

About this paper

Cite this paper

Chiam, Y.J., Maarof, M.A., Kassim, M.N., Zainal, A. (2021). Malware Behavior Profiling from Unstructured Data. In: Abraham, A., Jabbar, M., Tiwari, S., Jesus, I. (eds) Proceedings of the 11th International Conference on Soft Computing and Pattern Recognition (SoCPaR 2019). SoCPaR 2019. Advances in Intelligent Systems and Computing, vol 1182. Springer, Cham. https://doi.org/10.1007/978-3-030-49345-5_14

DOI: https://doi.org/10.1007/978-3-030-49345-5_14
Published: 01 August 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-49344-8
Online ISBN: 978-3-030-49345-5
eBook Packages: Intelligent Technologies and Robotics (R0)
