Skip to main content

From Fine-Grained to Refined: APT Malware Knowledge Graph Construction and Attribution Analysis Driven by Multi-stage Graph Computation

  • Conference paper
  • First Online:
Computational Science – ICCS 2024 (ICCS 2024)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14832))

Included in the following conference series:

  • 557 Accesses

Abstract

In response to the growing threat of Advanced Persistent Threat (APT) in network security, our research introduces an innovative APT malware attribution tool, the APTMalKG knowledge graph. This knowledge graph is constructed from comprehensive APT malware data and refined through a multi-stage graph clustering process. We have incorporated domain-specific meta-paths into the GraphSAGE graph embedding algorithm to enhance its effectiveness. Our approach includes an ontology model capturing complex APT malware characteristics and behaviors, extracted from sandbox analysis reports and expanded intelligence. To manage the graph’s granularity and scale, we categorize nodes based on domain knowledge, form a correlation subgraph, and progressively adjust similarity thresholds and edge weights. The refined graph maintains crucial attribution data while reducing complexity. By integrating domain-specific meta-paths into GraphSAGE, we achieve improved APT attribution accuracy with an average accuracy of 91.16%, an F1 score of 89.82%, and an average AUC of 98.99%, enhancing performance significantly. This study benefits network security analysts with an intuitive knowledge graph and explores large-scale graph computing methods for practical scenarios, offering a multi-dimensional perspective on APT malware analysis and attribution research, highlighting the value of knowledge graphs in network security.

Supported by Youth Innovation Promotion Association, CAS (No. 2023170).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  1. Malware Attribute Enumeration and Characterization (MAEC) (2023). https://maecproject.github.io/. Accessed 11 Nov 2023

  2. Balan, G., Gavriluţ, D.T., Luchian, H.: Using API calls for sequence-pattern feature mining-based malware detection. In: Su, C., Gritzalis, D., Piuri, V. (eds.) ISPEC 2022, pp. 233–251. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-21280-2_13

  3. Busch, J., Kocheturov, A., Tresp, V., Seidl, T.: Nf-gnn: network flow graph neural networks for malware detection and classification. In: Proceedings of the 33rd International Conference on Scientific and Statistical Database Management, pp. 121–132. Association for Computing Machinery (2021)

    Google Scholar 

  4. Chang, H.Y., Yang, T.Y., Zhuang, C.J., Tseng, W.L.: Ransomware detection by distinguishing api call sequences through lstm and bert models. Comput. J. 13, 5439 (2023)

    Google Scholar 

  5. Cremer, F., Sheehan, B., Fortmann, M., Kia, A.N., Mullins, M., Murphy, F., Materne, S.: Cyber risk and cybersecurity: a systematic review of data availability. Geneva Papers Risk Insur. Issues Pract. 47, 698–736 (2022)

    Article  Google Scholar 

  6. CyberMonitor, Robert Haist, K., et al.: APT and cybercriminals campaign collection. GitHub repository (2022). https://github.com/CyberMonitor/APT_CyberCriminal_Campagin_Collections

  7. Do Xuan, C., Huong, D.: A new approach for apt malware detection based on deep graph network for endpoint systems. Appl. Intell. 52(12), 14005–14024 (2022)

    Article  Google Scholar 

  8. Dutta, S., Rastogi, N., Yee, D., Gu, C., Ma, Q.: Malware knowledge graph: a comprehensive knowledge base for malware analysis and detection. In: 2021 IEEE Network Security and Privacy Protection International Conference (NSPW) (2021)

    Google Scholar 

  9. Feurer, M., et al.: auto-sklearn: automated machine learning toolkit (2023). https://automl.github.io/auto-sklearn/master/. gitHub repository

  10. Hasan, M.M., Islam, M.U., Uddin, J.: Advanced persistent threat identification with boosting and explainable AI. SN Comput. Sci. 4, 271–279 (2023)

    Article  Google Scholar 

  11. Kiesling, E., Ekelhart, A., Kurniawan, K., Ekaputra, F.: The SEPSES knowledge graph: an integrated resource for cybersecurity. In: Ghidini, C., et al. (eds.) ISWC 2019. LNCS, vol. 11779, pp. 198–214. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-30796-7_13

  12. Kiran Bandla, S.C.: Aptnotes data. GitHub repository (2021). https://github.com/aptnotes/data

  13. Lee, K., Lee, J., Yim, K.: Classification and analysis of malicious code detection techniques based on the apt attack. Appl. Sci. 13, 2894 (2023)

    Article  Google Scholar 

  14. Li, S., Zhou, Q., Zhou, R., Lv, Q.: Intelligent malware detection based on graph convolutional network. J. Supercomput. 78, 4182–4198 (2022)

    Article  Google Scholar 

  15. Li, S., Zhang, Q., Wu, X., Han, W., Tian, Z.: Attribution classification method of apt malware in IoT using machine learning techniques. Secur. Commun. Netw. 2021, 1–12 (2021)

    Google Scholar 

  16. Li, Z., Zeng, J., Chen, Y., Liang, Z.: AttacKG: constructing technique knowledge graph from cyber threat intelligence Reports. In: Atluri, V., Di Pietro, R., Jensen, C.D., Meng, W. (eds.) ESORICS 2022, pp. 589–609. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-17140-6_29

  17. MLG at Neo4j. Community detection (2022). https://neo4j.com/docs/graph-data-science/current/algorithms/community/

  18. Moon, H.-J., Bu, S.-J., Cho, S.-B.: Directional graph transformer-based control flow embedding for malware classification. In: Yin, H., et al. (eds.) IDEAL 2021. LNCS, vol. 13113, pp. 426–436. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-91608-4_42

  19. Peng, C., Xia, F., Naseriparsa, M., Osborne, F.: Knowledge graphs: opportunities and challenges. Artif. Intell. Rev. 56, 13071–13102 (2023)

    Article  Google Scholar 

  20. RedDrip7. Apt_digital_weapon: indicators of compromise (IOCS) collected from public resources and categorized by qi-anxin. GitHub repository (2022)

    Google Scholar 

  21. Ren, Y., Xiao, Y., Zhou, Y., Zhang, Z., Tian, Z.: Cskg4apt: a cybersecurity knowledge graph for advanced persistent threat organization attribution. IEEE Trans. Knowl. Data Eng. 35, 5695–5709 (2023)

    Google Scholar 

  22. Renz, M., Kröger, P., Koschmider, A., Landsiedel, O., de Sousa, N.T.: Cross domain fusion for spatiotemporal applications: taking interdisciplinary, holistic research to the next level. Informatik Spektrum 45, 271–277 (2022)

    Article  Google Scholar 

  23. Sahoo, D.: Cyber threat attribution with multi-view heuristic analysis. In: Choo, K.-K.R., Dehghantanha, A. (eds.) Handbook of Big Data Analytics and Forensics, pp. 53–73. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-74753-4_4

  24. Sharma, A., Gupta, B.B., Singh, A.K., Saraswat, V.K.: Advanced persistent threats (apt): evolution, anatomy, attribution and countermeasures. J. Ambient. Intell. Humaniz. Comput. 14, 9355–9381 (2023)

    Article  Google Scholar 

  25. Sikos, L.F.: Cybersecurity knowledge graphs. Knowl. Inf. Syst. 65, 3511–3531 (2023)

    Article  Google Scholar 

  26. Soni, H., Kishore, P., Mohapatra, D.P.: Opcode and API based machine learning framework for malware classification. In: 2022 2nd International Conference on Intelligent Technologies (CONIT), pp. 1–7 (2022)

    Google Scholar 

  27. Tekerek, A., Yapici, M.M.: A novel malware classification and augmentation model based on convolutional neural network. Comput. Secur. 112, 102515 (2022)

    Article  Google Scholar 

  28. VirusTotal. Virustotal: analyse suspicious files and URLs to detect malware. Website (2022). https://www.virustotal.com/

  29. Wai, F.K., Thing, V.L.L.: Clustering based opcode graph generation for malware variant detection. In: 2021 18th International Conference on Privacy, Security and Trust (PST), pp. 1–11 (2021)

    Google Scholar 

  30. Wei, C., Li, Q., Guo, D., Meng, X.: Toward identifying apt malware through API system calls. Secur. Commun. Netw. 2021, 8077220 (2021)

    Article  Google Scholar 

  31. Wu, X.W., Wang, Y., Fang, Y., Jia, P.: Embedding vector generation based on function call graph for effective malware detection and classification. Neural Comput. Appl. 34, 8643–8656 (2022)

    Article  Google Scholar 

  32. Xuan, C.D., Dao, M.H.: A novel approach for apt attack detection based on combined deep learning model. Neural Comput. Appl. 33, 13251–13264 (2021)

    Article  Google Scholar 

Download references

Funding

Supported by Youth Innovation Promotion Association, CAS (No. 2020166) and Youth Innovation Promotion Association, CAS (No. 2023170).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Qiuyun Wang .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Jing, R., Jiang, Z., Wang, Q., Wang, S., Li, H., Chen, X. (2024). From Fine-Grained to Refined: APT Malware Knowledge Graph Construction and Attribution Analysis Driven by Multi-stage Graph Computation. In: Franco, L., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2024. ICCS 2024. Lecture Notes in Computer Science, vol 14832. Springer, Cham. https://doi.org/10.1007/978-3-031-63749-0_6

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-63749-0_6

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-63748-3

  • Online ISBN: 978-3-031-63749-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics