MalXCap: A Method for Malware Capability Extraction

Saha, Bikash; Rani, Nanda; Shukla, Sandeep Kumar

doi:10.1007/978-981-99-7032-2_14

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14341)

Included in the following conference series:

International Conference on Information Security Practice and Experience

Abstract

In the present cyber landscape, the sophistication of malware attacks is rising steadily. Advanced Persistent Threats (APTs) and other sophisticated attacks employ complex and intelligent malware. Such malware integrates numerous malicious capabilities into a single sample, known as multipurpose malware. As attacks grow more complicated, it becomes increasingly important to know what a detected malware sample can do and to understand its complete range of functionalities. Traditional malware analysis focuses on malware detection and family classification. Family classification provides insight into the dominant capability rather than the full range of capabilities present in the malware, which is insufficient. Hence, we propose MalXCap to extract the multiple functionalities (termed malware capabilities) hidden within a single malware sample. MalXCap employs dynamic analysis and captures malware capabilities by identifying patterns in API call sequences. At present, there is no publicly available malware capability dataset. Therefore, we analyze 8k malware samples collected from the public domain, identify 12 different capabilities, and prepare a dataset. We use this dataset to train MalXCap to learn the patterns of API sequences that indicate different malicious capabilities. MalXCap demonstrates its effectiveness by achieving an accuracy of 97.02% and a Hamming loss of 0.0025. Analyzing the capabilities of malware enables security professionals to understand the advanced techniques used in malware, summarize the attack, and develop better countermeasures.

Notes

1. A pioneer security firm. https://www.picussecurity.com.
2. A pioneer security firm. https://www.mandiant.com.
3. MalwareBazaar. https://bazaar.abuse.ch.
4. H0lyGh0st: f8fc2445a9814ca8cf48a979bff7f182d6538f4d1ff438cf259268e8b4b76f86.
5. Medusa: 26af2222204fca27c0fdabf9eefbfdb638a8a9322b297119f85cce3c708090f0.
6. GpCode: e9ffda70e3ab71ee9d165abec8f2c7c52a139b71666f209d2eaf0c704569d3b1.
7. LockBit: 2ecf1fe02d8fb099b68e4d9bceeeadbe5fc8347f5a76d52f35ed48b516963735.

Acknowledgement

We thank the C3i (Cyber Security and Cyber Security for Cyber-Physical Systems) Innovation Hub at IIT Kanpur for partially funding this research project. Special thanks go to Mr. Vikas Maurya for his insightful feedback.

Author information

Authors and Affiliations

Indian Institute of Technology Kanpur, Kanpur, India
Bikash Saha, Nanda Rani & Sandeep Kumar Shukla

Corresponding author

Correspondence to Bikash Saha.

Editor information

Editors and Affiliations

Technical University of Denmark, Kongens Lyngby, Denmark
Weizhi Meng
Xidian University, Xi'an, China
Zheng Yan
University of Milan, Milan, Italy
Vincenzo Piuri

Appendix

Binary Relevance (BR). Binary Relevance is a popular and straightforward problem transformation method. In this method, we use 12 Gaussian naive Bayes based single-label binary classifiers, one per capability, to predict the 12 capabilities.

As illustrated in Fig. 5, each classifier outputs 0 or 1 for its malware capability. We take the union of the outputs predicted by all classifiers and treat it as the multi-label output for the given sample. The model’s effectiveness suffers if the dataset’s target labels are dependent on or correlated with each other, because each classifier is trained independently of the others.
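The Binary Relevance scheme can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the chapter's implementation: the small `GaussianNB` class, the function names, and the two-feature, three-label toy data are all hypothetical, and a real pipeline would use 12 labels and API-sequence features.

```python
import math


class GaussianNB:
    """Tiny Gaussian naive Bayes for one binary label (illustrative only)."""

    def fit(self, X, y):
        self.stats = {}
        for c in (0, 1):
            rows = [x for x, t in zip(X, y) if t == c]
            if not rows:
                continue  # this class never occurs for the label
            prior = len(rows) / len(X)
            means = [sum(col) / len(rows) for col in zip(*rows)]
            # A small constant keeps the variance strictly positive.
            variances = [
                sum((v - m) ** 2 for v in col) / len(rows) + 1e-9
                for col, m in zip(zip(*rows), means)
            ]
            self.stats[c] = (prior, means, variances)
        return self

    def predict_one(self, x):
        def log_posterior(c):
            prior, means, variances = self.stats[c]
            score = math.log(prior)
            for v, m, var in zip(x, means, variances):
                score += -0.5 * math.log(2 * math.pi * var)
                score -= (v - m) ** 2 / (2 * var)
            return score

        return max(self.stats, key=log_posterior)


def fit_binary_relevance(X, Y):
    """Train one independent binary classifier per label column of Y."""
    n_labels = len(Y[0])
    return [GaussianNB().fit(X, [row[j] for row in Y]) for j in range(n_labels)]


def predict_binary_relevance(models, x):
    """Collect the 0/1 output of every classifier into one label vector."""
    return [model.predict_one(x) for model in models]


# Hypothetical toy data: 2 features, 3 labels (assumption, not from the chapter).
X_toy = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9], [1.0, 1.0], [0.9, 0.8]]
Y_toy = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [1, 1, 1], [1, 1, 1]]

br_models = fit_binary_relevance(X_toy, Y_toy)
print(predict_binary_relevance(br_models, [0.95, 0.15]))
```

Because every classifier sees only the features, none of them can exploit the fact that, in this toy data, the third label appears only together with the first two.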

Classifier Chain (CC). This method addresses the limitation of Binary Relevance, namely the label correlation problem, by using a chain of binary classifiers whose length equals the number of target labels. As shown in Fig. 6, $m_i$ represents a data sample that $C_1$ takes as input (step 1) to predict the output $l_1$ (step 2), where $l_1 \in \{0,1\}$. Next, $C_2$ takes $m_i$ combined with $l_1$ as input (step 3) and produces the output $l_2$ (step 4), where $l_2 \in \{0,1\}$. The chain continues in the same way up to $C_n$; we then take the union of the outputs of every $C_x$, where $1 \le x \le n$, to produce a multi-label output of dimension $1 \times n$. In this way, the CC method addresses the label correlation problem present in the Binary Relevance method.
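The chaining steps can be sketched as follows. The text does not state which base classifier the chain uses, so the `NearestCentroid` class here is a hypothetical stand-in chosen for brevity; the function names and toy data are likewise assumptions. During training each classifier is given the true earlier labels, which is one common choice; at prediction time it receives the labels predicted so far.

```python
class NearestCentroid:
    """Stand-in binary classifier (assumption; any binary learner would do)."""

    def fit(self, X, y):
        self.centroids = {}
        for c in sorted(set(y)):
            rows = [x for x, t in zip(X, y) if t == c]
            self.centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))

        return min(self.centroids, key=sq_dist)


def fit_chain(X, Y):
    """Train C_1..C_n, where C_j sees the features plus labels l_1..l_{j-1}."""
    chain = []
    for j in range(len(Y[0])):
        X_aug = [list(x) + list(y[:j]) for x, y in zip(X, Y)]
        chain.append(NearestCentroid().fit(X_aug, [y[j] for y in Y]))
    return chain


def predict_chain(chain, x):
    """Feed each predicted label forward to the next classifier in the chain."""
    labels = []
    for clf in chain:
        labels.append(clf.predict_one(list(x) + labels))
    return labels  # one 0/1 entry per label, i.e. a 1 x n output


# Hypothetical toy data: 2 features, 3 labels (assumption, not from the chapter).
X_toy = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9], [1.0, 1.0], [0.9, 0.8]]
Y_toy = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [1, 1, 1], [1, 1, 1]]

cc_chain = fit_chain(X_toy, Y_toy)
print(predict_chain(cc_chain, [0.95, 0.9]))
```

The order of the labels in the chain matters, since later classifiers can only condition on labels that come before them.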

Label Powerset (LP). This method addresses the issue of simultaneously assigning multiple labels to an instance. It considers the label combination of every instance in the dataset and treats each distinct combination as one class. As shown in Table 6, if a data sample is associated with two target labels, $L_1$ and $L_3$, it obtains the new target label $L_{1,3}$; repeating this for all data samples transforms the dataset into a single-label dataset. In the worst case, the LP method generates $2^{|L|}$ new single-label target classes for a set $L$ of multi-label target labels. The method’s computational cost is therefore a problem, since it grows exponentially with the number of target labels.

Table 6. Label Powerset Transformation
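The transformation can be illustrated with a short sketch. The label rows below are made up for illustration and are not the contents of Table 6; the function names are hypothetical.

```python
def label_powerset_transform(Y):
    """Map each distinct label combination to one single-label class id."""
    combo_to_class = {}
    y_single = []
    for row in Y:
        combo = tuple(row)
        if combo not in combo_to_class:
            combo_to_class[combo] = len(combo_to_class)
        y_single.append(combo_to_class[combo])
    return y_single, combo_to_class


def class_name(combo):
    """Render a combination such as (1, 0, 1) in the L_{1,3} notation."""
    active = [str(i + 1) for i, bit in enumerate(combo) if bit]
    return "L_{" + ",".join(active) + "}"


# Hypothetical multi-label targets over three labels L_1, L_2, L_3.
Y_multi = [[1, 0, 1], [0, 1, 0], [1, 0, 1], [1, 1, 0]]

y_single, combo_to_class = label_powerset_transform(Y_multi)
for combo, class_id in combo_to_class.items():
    print(class_id, class_name(combo))

# Worst case for the 12 capabilities mentioned in the abstract: 2^12 classes.
worst_case_classes = 2 ** 12
print(worst_case_classes)
```

Only combinations that actually occur in the data become classes, so the number of classes seen in practice is bounded by the number of samples as well as by $2^{|L|}$.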


Multi-label k Nearest Neighbors (ML-kNN). ML-kNN is a lazy learning approach that combines the concepts of kNN and Bayesian probability to make predictions for multi-label classification. It consists of two phases: a training phase and a prediction phase. In the training phase, the first step is to preprocess the data. Let N denote the number of training instances and L the total number of target labels. Each training instance i is represented by a feature vector $X_i$ of dimension D (where D depends on the feature transformation method), and its label vector $Y_i$ is a binary vector of length L indicating the presence or absence of each label. Then, for each label j, we estimate the prior probability $P(Y_j)$ and the conditional probabilities $P(X|Y_j)$ of each feature given the label using maximum likelihood estimation, according to the following formulas:

$$P(Y_j) = \frac{\text{Number of instances with label } Y_j}{N} \tag{10}$$

$$P(X_k|Y_j) = \frac{\text{Number of samples with label } Y_j \text{ and feature value } X_k}{\text{Number of samples with label } Y_j} \tag{11}$$

where $P(Y_j)$ represents the prior probability and $P(X_k|Y_j)$ represents the conditional probabilities. We then store the transformed training instances and their corresponding label vectors. In the prediction phase, we convert the test instance into the same format as the training instances. Let $X_{test}$ denote the feature vector of the test instance. We use the Euclidean distance as the distance metric to find the K training instances most similar to $X_{test}$ based on the feature values. Let $N_k$ denote the indices of these nearest neighbors. Then, for each label j, we calculate the conditional probability $P(Y_j|X_{test})$ using Bayes’ theorem:

$$P(Y_j|X_{test}) = \frac{P(Y_j) \cdot \prod_k P(X_k|Y_j)}{Z} \tag{12}$$

where $X_k$ represents the feature values of the $k^{th}$ nearest neighbor and Z represents a normalization constant. The product $\prod$ is taken over all K nearest neighbors.

Finally, we select the labels with the highest probabilities $P(Y_j|X_{test})$ as the predicted labels for the given test instance. By considering the label probabilities and feature similarities, ML-kNN finds the K nearest neighbors and assigns labels based on their votes.
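The description above is compressed, so the sketch below does not transcribe the posterior formula term by term. Instead it follows the commonly published ML-kNN formulation of Zhang and Zhou, in which the likelihood for a label is estimated from how many of an instance's K nearest neighbors carry that label, and a label is assigned when its posterior exceeds that of its absence. The smoothing parameter `s`, the function names, and the toy data are assumptions made for illustration only.

```python
import math


def k_nearest(X, x, k, skip=None):
    """Indices of the k training rows closest to x (Euclidean distance)."""
    order = sorted(
        (i for i in range(len(X)) if i != skip),
        key=lambda i: math.dist(X[i], x),
    )
    return order[:k]


def fit_mlknn(X, Y, k=2, s=1.0):
    """Training phase: label priors and neighbor-count likelihoods."""
    n, n_labels = len(X), len(Y[0])
    neighbors = [k_nearest(X, X[i], k, skip=i) for i in range(n)]
    model = {"X": X, "Y": Y, "k": k, "prior": [], "like1": [], "like0": []}
    for j in range(n_labels):
        n_pos = sum(row[j] for row in Y)
        # Smoothed fraction of training instances that carry label j.
        model["prior"].append((s + n_pos) / (2 * s + n))
        # c1[c] / c0[c]: number of instances with / without label j that
        # have exactly c neighbors carrying label j.
        c1, c0 = [0] * (k + 1), [0] * (k + 1)
        for i in range(n):
            votes = sum(Y[m][j] for m in neighbors[i])
            (c1 if Y[i][j] else c0)[votes] += 1
        model["like1"].append([(s + c) / (s * (k + 1) + sum(c1)) for c in c1])
        model["like0"].append([(s + c) / (s * (k + 1) + sum(c0)) for c in c0])
    return model


def predict_mlknn(model, x):
    """Prediction phase: compare posteriors for label present vs. absent."""
    nearest = k_nearest(model["X"], x, model["k"])
    labels = []
    for j, prior in enumerate(model["prior"]):
        votes = sum(model["Y"][m][j] for m in nearest)
        p_present = prior * model["like1"][j][votes]
        p_absent = (1 - prior) * model["like0"][j][votes]
        labels.append(1 if p_present > p_absent else 0)
    return labels


# Hypothetical toy data: three clusters, 2 features, 3 labels (assumption).
X_toy = [
    [1.0, 0.1], [0.9, 0.2], [0.95, 0.05],
    [0.1, 1.0], [0.2, 0.9], [0.05, 0.95],
    [1.0, 1.0], [0.9, 0.8], [0.95, 0.9],
]
Y_toy = [[1, 0, 0]] * 3 + [[0, 1, 0]] * 3 + [[1, 1, 1]] * 3

mlknn_model = fit_mlknn(X_toy, Y_toy, k=2)
print(predict_mlknn(mlknn_model, [0.95, 0.15]))
```

The model stores the training set itself, which is what makes the approach lazy: the neighbor search is deferred until a test instance arrives.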

About this paper

Cite this paper

Saha, B., Rani, N., Shukla, S.K. (2023). MalXCap: A Method for Malware Capability Extraction. In: Meng, W., Yan, Z., Piuri, V. (eds) Information Security Practice and Experience. ISPEC 2023. Lecture Notes in Computer Science, vol 14341. Springer, Singapore. https://doi.org/10.1007/978-981-99-7032-2_14


DOI: https://doi.org/10.1007/978-981-99-7032-2_14
Published: 08 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7031-5
Online ISBN: 978-981-99-7032-2
eBook Packages: Computer Science, Computer Science (R0)
