Advanced persistent threat detection via mining long-term features in provenance graphs

Xu, Fan; Zhao, Qinxin; Liu, Xiaoxiao; Wang, Nan; Gao, Meiqi; Wen, Xuezhi; Zhang, Dalin

doi:10.1007/s11704-024-40610-8

Research Article
Published: 28 January 2025

Frontiers of Computer Science, Volume 19, article number 1910809 (2025)

Fan Xu^1,2,na1,
Qinxin Zhao^3,na1,
Xiaoxiao Liu^4,
Nan Wang^1,
Meiqi Gao^4,5,6,
Xuezhi Wen^4 &
Dalin Zhang^1

Abstract

Advanced Persistent Threats (APTs) are difficult to detect because of their “low-and-slow” attack patterns and frequent use of zero-day vulnerabilities. Extracting long-term features is therefore often crucial to this task. In this work, we propose a novel end-to-end APT detection framework named Long-Term Feature Association Provenance Graph Detector (LT-ProveGD). Specifically, LT-ProveGD encodes the contextual information of the dynamic provenance graph in a space-efficient manner while preserving its topological information. To combat “low-and-slow” attacks, LT-ProveGD uses an autoencoder with an integrated multi-head attention mechanism to extract long-term dependencies within the encoded representations. Furthermore, to facilitate the detection of previously unknown attacks, we leverage the Jenks natural breaks method, which enables detection without relying on specific attack information. Extensive experiments on five widely used datasets, with comparisons against state-of-the-art attack detection methods, demonstrate the superior effectiveness of LT-ProveGD.
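
As an illustration of the thresholding idea mentioned in the abstract, the following is a minimal sketch of a two-class Jenks natural breaks split applied to a list of anomaly scores. It is not the authors' implementation: the function names, the use of exactly two classes, the example scores, and the choice to place the threshold midway between the two classes are all assumptions made for this example. The abstract does not state how LT-ProveGD computes or partitions its scores.

```python
from typing import List, Sequence


def jenks_two_class_break(scores: Sequence[float]) -> float:
    """Return a threshold that splits the scores into two classes with
    minimal total within-class sum of squared deviations (two-class Jenks)."""
    values = sorted(scores)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two scores")

    # Prefix sums let us evaluate each candidate class in O(1).
    prefix = [0.0]
    prefix_sq = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def ssd(lo: int, hi: int) -> float:
        # Sum of squared deviations from the mean of values[lo:hi].
        count = hi - lo
        total = prefix[hi] - prefix[lo]
        total_sq = prefix_sq[hi] - prefix_sq[lo]
        return total_sq - total * total / count

    # Try every split point and keep the one with the smallest total deviation.
    best_split = min(range(1, n), key=lambda i: ssd(0, i) + ssd(i, n))

    # Assumption: report the midpoint between the two neighbouring values.
    return (values[best_split - 1] + values[best_split]) / 2.0


def flag_anomalies(scores: Sequence[float]) -> List[bool]:
    """Mark every score above the natural break as anomalous."""
    threshold = jenks_two_class_break(scores)
    return [s > threshold for s in scores]


if __name__ == "__main__":
    # Hypothetical reconstruction errors; two of them are clearly larger.
    example = [0.02, 0.03, 0.025, 0.04, 0.9, 0.035, 0.85]
    print(jenks_two_class_break(example))
    print(flag_anomalies(example))
```

Because the break is derived from the score distribution itself, no labelled attack samples are needed to set the threshold, which matches the motivation given in the abstract.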

Acknowledgements

This work was supported in part by the Fundamental Research Funds for the Central Universities (2024JBMC031), the Open Fund of Advanced Cryptography and System Security Key Laboratory of Sichuan Province (No. SKLACSS-202312), the CCF-NSFOCUS Open Fund, the National Natural Science Foundation of China (Grant Nos. 62202042, U20A6003, 62076146, 62021002, U19A2062, 62127803, U1911401 and 6212780016), the Fundamental Research Funds for the Central Universities, JLU, the Industrial Technology Infrastructure Public Service Platform Project ‘Public Service Platform for Urban Rail Transit Equipment Signal System Testing and Safety Evaluation’ (No. 2022-233-225), and the Ministry of Industry and Information Technology of China.

Author information

Fan Xu and Qinxin Zhao contributed equally to this work.

Authors and Affiliations

1. School of Cyberspace Science and Technology, Beijing Jiaotong University, Beijing, 100044, China
Fan Xu, Nan Wang & Dalin Zhang
2. University of Science and Technology of China, Hefei, 230026, China
Fan Xu
3. The State Key Laboratory for Novel Software Technology, Department of Computer Science and Technology, Nanjing University, Nanjing, 210023, China
Qinxin Zhao
4. School of Software Engineering, Beijing Jiaotong University, Beijing, 100044, China
Xiaoxiao Liu, Meiqi Gao & Xuezhi Wen
5. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, China
Meiqi Gao
6. Advanced Cryptography and System Security Key Laboratory of Sichuan Province, Chengdu, 610000, China
Meiqi Gao

Corresponding author

Correspondence to Nan Wang.

Ethics declarations

Competing interests: The authors declare that they have no competing interests or financial conflicts to disclose.

Additional information

Fan Xu received his BS degree from Dalian University of Technology, China in 2022. He is currently working towards his MS degree at University of Science and Technology of China (USTC), China. His research interests include graph representation learning, AI4Science, and anomaly detection.

Qinxin Zhao obtained the BSc degree from Nanjing University, China in 2024. She is currently pursuing the PhD degree at the School of Software Engineering, Nanjing University, China.

Xiaoxiao Liu received the MSc degree in software engineering from Beijing Jiaotong University (BJTU), China in 2023. Her research interest is APT attack detection.

Nan Wang received the BE degree from the Harbin Institute of Technology, China in 2016 and the PhD degree from Tsinghua University, China in 2021. She is currently an assistant professor with the School of Cyberspace Science and Technology, Beijing Jiaotong University, China.

Meiqi Gao received her bachelor’s degree from Tianjin Foreign Studies University, China in 2022 and is currently pursuing a master’s degree at Beijing Jiaotong University, China. Her research interests include deep learning, network security, and anomaly detection.

Xuezhi Wen received his bachelor’s degree from Shijiazhuang Tiedao University, China in 2022 and is currently pursuing a master’s degree at Beijing Jiaotong University, China. His research interests include deep learning, network security, and anomaly detection.

Dalin Zhang received his PhD degree in computer science and technology from Beijing University of Posts and Telecommunications, China in 2014. His research interests include software security, intelligent transportation systems, and machine learning.

Electronic supplementary material