Abstract
In the present cyber landscape, the sophistication of malware attacks is rising steadily. Advanced Persistent Threats (APTs) and other sophisticated attacks employ complex, intelligent malware that integrates numerous malicious capabilities into a single sample, known as multipurpose malware. As attacks grow more complicated, it is increasingly important to know what a detected malware sample can do and to understand its complete range of functionalities. Traditional malware analysis focuses on malware detection and family classification. Family classification reveals the dominant capability rather than the full range of capabilities present in the malware, which is insufficient. Hence, we propose MalXCap to extract the multiple functionalities (termed malware capabilities) hidden within a single malware sample. MalXCap employs dynamic analysis and captures malware capabilities by identifying patterns in API call sequences. At present, there is no publicly available malware capability dataset. Therefore, we analyze 8k malware samples collected from the public domain, identify 12 different capabilities, and prepare a dataset. We use this dataset to train MalXCap to learn the API sequence patterns that indicate different malicious capabilities. MalXCap achieves an accuracy of 97.02% and a Hamming loss of 0.0025. Analyzing the capabilities of malware enables security professionals to understand the advanced techniques used in malware, summarize the attack, and develop better countermeasures.
Notes
1. A pioneer security firm. https://www.picussecurity.com.
2. A pioneer security firm. https://www.mandiant.com.
3. MalwareBazaar. https://bazaar.abuse.ch.
4. H0lyGh0st: f8fc2445a9814ca8cf48a979bff7f182d6538f4d1ff438cf259268e8b4b76f86.
5. Medusa: 26af2222204fca27c0fdabf9eefbfdb638a8a9322b297119f85cce3c708090f0.
6. GpCode: e9ffda70e3ab71ee9d165abec8f2c7c52a139b71666f209d2eaf0c704569d3b1.
7. LockBit: 2ecf1fe02d8fb099b68e4d9bceeeadbe5fc8347f5a76d52f35ed48b516963735.
Acknowledgement
We thank the C3i (Cyber Security and Cyber Security for Cyber-Physical Systems) Innovation Hub at IIT Kanpur for partially funding this research project. Special thanks to Mr. Vikas Maurya for his insightful feedback.
Appendix
Binary Relevance (BR). Binary Relevance is a popular and straightforward problem transformation method. In this method, we use 12 single-label binary classifiers based on Gaussian naive Bayes, one for each of the 12 capabilities.
As illustrated in Fig. 5, each classifier outputs 0 or 1 for its malware capability. We take the union of the outputs predicted by all classifiers as the multi-label output for the given sample. Because each classifier is trained independently, the effectiveness of this model suffers if the target labels in the dataset are dependent on or correlated with each other.
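The following is a minimal sketch of the Binary Relevance idea in plain Python, not the implementation used in this work. The tiny `GaussianNB` class, the helper names, and the toy data (two features, three hypothetical capability labels) are all assumptions made for illustration.

```python
import math


class GaussianNB:
    """Tiny Gaussian naive Bayes for one binary label (illustrative only)."""

    def fit(self, X, y):
        self.stats = {}
        for c in (0, 1):
            rows = [x for x, t in zip(X, y) if t == c]
            if not rows:
                continue  # this label state never occurs in the training data
            cols = list(zip(*rows))
            means = [sum(col) / len(col) for col in cols]
            # A small variance floor avoids division by zero on constant features.
            varis = [sum((v - m) ** 2 for v in col) / len(col) + 1e-6
                     for col, m in zip(cols, means)]
            self.stats[c] = (math.log(len(rows) / len(X)), means, varis)
        return self

    def predict_one(self, x):
        def log_post(c):
            log_prior, means, varis = self.stats[c]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                for xi, m, v in zip(x, means, varis))
        return max(self.stats, key=log_post)


def fit_binary_relevance(X, Y):
    """Train one independent binary classifier per label column of Y."""
    n_labels = len(Y[0])
    return [GaussianNB().fit(X, [row[j] for row in Y]) for j in range(n_labels)]


def predict_binary_relevance(models, x):
    """Collect the per-label 0/1 outputs into one multi-label vector."""
    return [m.predict_one(x) for m in models]


# Toy data: 4 samples, 2 features, 3 hypothetical capability labels.
X = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
Y = [[1, 0, 1], [1, 0, 1], [0, 1, 1], [0, 1, 1]]
models = fit_binary_relevance(X, Y)
pred = predict_binary_relevance(models, [0.15, 0.85])  # -> [1, 0, 1]
```

Note that no classifier ever sees another classifier's label, which is why correlated labels are a weakness of this transformation.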
Classifier Chain (CC). This method overcomes the limitation of Binary Relevance by modeling label correlation with a chain of binary classifiers whose length equals the number of target labels. As shown in Fig. 6, \(m_i\) represents a data sample that \(C_1\) takes as input (step 1) to predict the output \(l_1\) (step 2), where \(l_1 \in \{0,1\}\). Next, \(C_2\) takes \(m_i\) and \(l_1\) combined as input (step 3) and produces the output \(l_2\) (step 4), where \(l_2 \in \{0,1\}\). The chain continues in the same way up to \(C_n\); we then take the union of the outputs of every \(C_x\), where \(1 \le x \le n\), to produce a multi-label output of dimension \(1 \times n\). In this way, the CC method addresses the label correlation problem present in the Binary Relevance method.
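The chaining step can be sketched as follows. This is an illustrative sketch only: a nearest-centroid rule stands in for the base classifier (an assumption chosen to keep the example short), and the function names and toy data are hypothetical.

```python
def fit_centroid(X, y):
    """Stand-in base learner: one mean vector per class (nearest-centroid)."""
    cents = {}
    for c in set(y):
        rows = [x for x, t in zip(X, y) if t == c]
        cents[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents


def predict_centroid(cents, x):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, cents[c]))
    return min(cents, key=dist)


def fit_chain(X, Y):
    """Train C_1..C_n; C_j sees the features plus the true labels l_1..l_{j-1}."""
    chain = []
    for j in range(len(Y[0])):
        X_aug = [x + y[:j] for x, y in zip(X, Y)]
        chain.append(fit_centroid(X_aug, [y[j] for y in Y]))
    return chain


def predict_chain(chain, x):
    """Feed each predicted label forward as an extra input to the next link."""
    labels = []
    for cents in chain:
        labels.append(predict_centroid(cents, x + labels))
    return labels


# Toy data: label 2 always matches label 1, label 3 is always its opposite.
X = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
Y = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]]
chain = fit_chain(X, Y)
pred = predict_chain(chain, [0.15, 0.85])  # -> [1, 1, 0]
```

At training time each link is fitted on the true earlier labels, while at prediction time it receives the earlier predictions; this is one common convention for classifier chains.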
Label Powerset (LP). This method addresses the issue of simultaneously assigning multiple labels to an instance by treating each distinct combination of labels as a single class. As shown in Table 6, if a data sample is associated with two target labels, \(L_1\) and \(L_3\), it receives a new target label \(L_{1,3}\); repeating this for all data samples transforms the dataset into a single-label dataset. In the worst case, the LP method generates \(2^{|L|}\) new single-label target classes for a label set \(L\). Thus, the number of classes, and with it the computational cost, grows exponentially with the number of target labels.
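The label transformation itself is small enough to show directly. The sketch below uses hypothetical helper names and toy label vectors; any single-label classifier could then be trained on `y_single`.

```python
def to_powerset(Y):
    """Map each distinct label vector to one single-label class id."""
    classes = {}
    y_single = []
    for row in Y:
        key = tuple(row)
        if key not in classes:
            classes[key] = len(classes)  # assign ids in order of first appearance
        y_single.append(classes[key])
    return y_single, classes


def from_powerset(class_id, classes):
    """Recover the multi-label vector behind a single-label class id."""
    inverse = {v: k for k, v in classes.items()}
    return list(inverse[class_id])


# Toy label matrix with three labels; rows 1 and 3 share the combination {L1, L3}.
Y = [[1, 0, 1], [0, 1, 0], [1, 0, 1], [1, 1, 0]]
y_single, classes = to_powerset(Y)   # y_single -> [0, 1, 0, 2]
restored = from_powerset(2, classes)  # -> [1, 1, 0]

# Worst-case number of classes for 12 capability labels.
worst_case = 2 ** 12  # -> 4096
```

In practice only the combinations that actually occur in the data become classes, so the observed count is usually far below the worst case.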
Multi-label k Nearest Neighbors (ML-kNN). ML-kNN is a lazy learning approach that combines the concepts of kNN and Bayesian probability to make predictions for multi-label classification. It consists of two phases: a training phase and a prediction phase. In the training phase, the first step is to preprocess the data. Let N denote the number of training instances and L the total number of target labels. Each training instance i is represented by a feature vector \(X_i\) of dimension D (where D depends on the feature transformation method), and its label vector \(Y_i\) is a binary vector of length L indicating the presence or absence of each label. Next, for each class j, we estimate the prior probability \(P(Y_j)\) and the conditional probabilities \(P(X|Y_j)\) for each feature given the class using maximum likelihood estimation. We follow the formula given below:
where \(P(Y_j)\) represents the prior probability and \(P(X_k|Y_j)\) represents the conditional probabilities. We then store the transformed training instances and their corresponding label vectors. In the subsequent prediction phase, we convert the test instance into the same format as the training instances. Let \(X_{test}\) denote the feature vector of the test instance. We use Euclidean distance as the distance metric to find the K training instances most similar to \(X_{test}\) based on the feature values. Let \(N_k\) denote the indices of these nearest neighbors. Now, for each label j, we calculate the conditional probability \(P(Y_j|X_{test})\) using Bayes' theorem:
where \(X_k\) represents the feature values of the \(k^{th}\) nearest neighbor, and Z represents a normalization constant. The product \(\prod \) is taken over all K nearest neighbors.
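Taken together, the prior, the product over the K nearest neighbors, and the normalization constant Z described above suggest a posterior of the following general shape. This is a sketch inferred from those stated definitions, assuming the neighbor terms are treated as conditionally independent given the label; it is not a formula quoted from the source.

\[
P(Y_j \mid X_{test}) = \frac{1}{Z}\, P(Y_j) \prod_{k \in N_k} P(X_k \mid Y_j)
\]

Under that assumption, Z would be the sum of the unnormalized right-hand side over the two states of label j (present and absent), so that the two posteriors sum to one.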
Finally, we select the labels with the highest probabilities \(P(Y_j|X_{test})\) as the predicted labels for the given test instance. In short, ML-kNN finds the K nearest neighbors by feature similarity and assigns labels based on their votes, weighted by the estimated label probabilities.
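For a runnable illustration, the sketch below follows the standard ML-kNN formulation of Zhang and Zhou (2007), which scores each label from the number of neighbors carrying it, with Laplace smoothing `s`, rather than from a product of feature likelihoods. It is a stand-in under that assumption, and the function names, `k`, `s`, and the toy data are hypothetical.

```python
def knn_indices(X, x, k, skip=None):
    """Indices of the k training points closest to x (Euclidean distance)."""
    order = sorted(
        (i for i in range(len(X)) if i != skip),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)))
    return order[:k]


def fit_mlknn(X, Y, k=2, s=1.0):
    """Estimate smoothed priors and neighbor-count likelihoods per label."""
    n, n_labels = len(X), len(Y[0])
    model = {"X": X, "Y": Y, "k": k, "prior": [], "like": []}
    # Neighbors of each training point, excluding the point itself.
    neigh = [knn_indices(X, X[i], k, skip=i) for i in range(n)]
    for j in range(n_labels):
        pos = sum(row[j] for row in Y)
        model["prior"].append((s + pos) / (2 * s + n))
        # counts[h][c]: training points with label state h whose k neighbors
        # contain exactly c positives for label j.
        counts = [[0] * (k + 1), [0] * (k + 1)]
        for i in range(n):
            c = sum(Y[m][j] for m in neigh[i])
            counts[Y[i][j]][c] += 1
        model["like"].append([
            [(s + counts[h][c]) / (s * (k + 1) + sum(counts[h]))
             for c in range(k + 1)]
            for h in (0, 1)])
    return model


def predict_mlknn(model, x):
    """Assign label j when the posterior for 'present' beats 'absent'."""
    neigh = knn_indices(model["X"], x, model["k"])
    out = []
    for j, (p1, like) in enumerate(zip(model["prior"], model["like"])):
        c = sum(model["Y"][m][j] for m in neigh)
        out.append(int(p1 * like[1][c] > (1 - p1) * like[0][c]))
    return out


# Toy data: two well-separated clusters, each with its own label.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
Y = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
model = fit_mlknn(X, Y, k=2)
pred = predict_mlknn(model, [0.05, 0.05])  # -> [1, 0]
```

Because the per-label decision compares the two unnormalized posteriors, the normalization constant cancels and never needs to be computed.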
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Saha, B., Rani, N., Shukla, S.K. (2023). MalXCap: A Method for Malware Capability Extraction. In: Meng, W., Yan, Z., Piuri, V. (eds) Information Security Practice and Experience. ISPEC 2023. Lecture Notes in Computer Science, vol 14341. Springer, Singapore. https://doi.org/10.1007/978-981-99-7032-2_14
DOI: https://doi.org/10.1007/978-981-99-7032-2_14
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7031-5
Online ISBN: 978-981-99-7032-2
eBook Packages: Computer Science (R0)