Abstract
Many activities in the cybersecurity realm, for example those captured by call graphs, can be represented as graph streams. In this paper, we introduce an innovative method to detect Advanced Persistent Threats (APTs) from their onset. Unique to our approach is the ability to capture both structural and temporal aspects, which are crucial for differentiating between benign and malicious activities. To overcome the challenges of processing streaming data, we leverage hashing techniques to obtain a compact data representation. Combined with a dynamic machine learning framework, this representation enables fast, incremental detection with minimal memory usage. Empirical evaluations underscore the efficacy of our approach, which allows a real-time response by pinpointing APTs in the initial stages of their activity.
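The abstract does not specify the hashing scheme or the learning framework, so the following is only a minimal illustrative sketch of the general idea, not the authors' method. It assumes each stream element is a typed edge (source type, edge type, destination type, timestamp) tagged with a graph identifier. It uses the hashing trick (feature hashing with a sign hash) to fold every edge into a fixed-width vector, so memory per graph stays constant however many edges arrive. The temporal aspect is approximated by a log2 bucket of the inter-arrival gap, and a cosine distance to the nearest known-benign sketch stands in for a trained detector. All names (`StreamingGraphSketch`, `DIM`, `anomaly_score`, `process_stream`), the token format, and the threshold are hypothetical.

```python
import hashlib
import math
from collections import defaultdict

DIM = 256  # hypothetical sketch width: memory per graph is O(DIM), independent of edge count


def _hash64(token: str, seed: bytes) -> int:
    """Deterministic 64-bit hash of a token (blake2b keyed with a seed)."""
    digest = hashlib.blake2b(token.encode(), digest_size=8, key=seed).digest()
    return int.from_bytes(digest, "little")


class StreamingGraphSketch:
    """Fixed-size feature-hashing summary of one graph's edge stream (illustrative only)."""

    def __init__(self, dim: int = DIM):
        self.vec = [0.0] * dim
        self.last_ts = None

    def update(self, src_type: str, edge_type: str, dst_type: str, ts: float) -> None:
        # Temporal feature (assumption): log2 bucket of the gap since this graph's previous edge.
        gap = 0.0 if self.last_ts is None else max(ts - self.last_ts, 0.0)
        bucket = int(math.log2(gap + 1.0))
        self.last_ts = ts
        # Structural feature: the typed edge itself, combined with the temporal bucket.
        token = f"{src_type}|{edge_type}|{dst_type}|dt{bucket}"
        index = _hash64(token, b"index") % len(self.vec)
        sign = 1.0 if _hash64(token, b"sign") & 1 else -1.0
        self.vec[index] += sign  # O(1) incremental update per edge


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def anomaly_score(sketch, benign_sketches) -> float:
    """1 - best cosine similarity to any known-benign sketch (list must be non-empty)."""
    return 1.0 - max(cosine(sketch.vec, b.vec) for b in benign_sketches)


def process_stream(edges, benign_sketches, threshold=0.5):
    """Consume (graph_id, src, etype, dst, ts) tuples; yield (graph_id, ts) the first time a graph crosses threshold."""
    sketches = defaultdict(StreamingGraphSketch)
    alerted = set()
    for graph_id, src, etype, dst, ts in edges:
        sketch = sketches[graph_id]
        sketch.update(src, etype, dst, ts)
        if graph_id not in alerted and anomaly_score(sketch, benign_sketches) > threshold:
            alerted.add(graph_id)
            yield graph_id, ts
```

Each update touches a single vector cell, so ingestion is O(1) per edge, and scoring costs O(DIM) per benign sketch. That is what lets a graph be scored after every edge, while its stream is still arriving, rather than only once it is complete.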






Data Availability
The datasets used during the current study are publicly available: https://drive.google.com/drive/folders/1Kp3JQsZz2X61efHU4mTEWHdF-ZSun8ad
Acknowledgements
This work is supported by the French National Research Agency (ANR) under grant ANR-20-CE39-0008.
Author information
Contributions
Walid MEGHERBI, Abd Errahmane KIOUCHE, Mohammed HADDAD, and Hamida SEBA all made significant contributions to the research, design, and interpretation of the study.
Ethics declarations
Ethical and Informed Consent for data used
This study does not make use of any personal data, and therefore does not require anyone’s informed consent.
Conflicts of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Megherbi, W., Kiouche, A.E., Haddad, M. et al. Detection of advanced persistent threats using hashing and graph-based learning on streaming data. Appl Intell 54, 5879–5890 (2024). https://doi.org/10.1007/s10489-024-05475-1
DOI: https://doi.org/10.1007/s10489-024-05475-1