Detection of advanced persistent threats using hashing and graph-based learning on streaming data

Applied Intelligence

Abstract

Many activities in the cybersecurity realm can be represented as graph streams, such as streams of call graphs. In this paper, we introduce a method to detect Advanced Persistent Threats (APTs) from their onset. Unique to our approach is the ability to capture both structural and temporal aspects, which are crucial for differentiating between benign and malicious activities. To overcome the challenges of streaming data processing, we use hashing techniques to obtain a compact data representation. Combined with a dynamic machine learning framework, this representation enables fast, incremental detection with minimal memory usage. Empirical evaluations underscore the efficacy of our approach, which allows a real-time response by pinpointing APTs at the initial stages of their activity.
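To make the idea of a compact, hash-based representation of a graph stream concrete, the following is a minimal sketch and not the authors' algorithm, which is not described on this page. It assumes that each incoming edge carries a source label, an edge label, and a destination label, and it folds every edge into a fixed-size signed-count vector with the generic feature-hashing trick. The names GraphSketch, bucket_and_sign, and SKETCH_SIZE, the sketch width of 64, and the cosine comparison are all hypothetical choices made for illustration.

```python
import hashlib
import math
from typing import Tuple

SKETCH_SIZE = 64  # hypothetical sketch width; not taken from the paper


def bucket_and_sign(feature: str, size: int = SKETCH_SIZE) -> Tuple[int, int]:
    """Deterministically map a string feature to a (bucket index, +1/-1 sign) pair."""
    digest = hashlib.blake2b(feature.encode("utf-8"), digest_size=8).digest()
    value = int.from_bytes(digest, "big")
    # Lowest bit chooses the sign, remaining bits choose the bucket.
    return (value >> 1) % size, 1 if value & 1 else -1


class GraphSketch:
    """Fixed-size signed-count sketch of a graph that arrives edge by edge."""

    def __init__(self, size: int = SKETCH_SIZE) -> None:
        self.size = size
        self.counts = [0] * size

    def add_edge(self, src_label: str, edge_label: str, dst_label: str) -> None:
        """Fold one labeled edge into the sketch; memory use stays constant."""
        feature = f"{src_label}|{edge_label}|{dst_label}"
        index, sign = bucket_and_sign(feature, self.size)
        self.counts[index] += sign

    def cosine(self, other: "GraphSketch") -> float:
        """Cosine similarity between two sketches (0.0 if either one is all zeros)."""
        dot = sum(a * b for a, b in zip(self.counts, other.counts))
        norm_a = math.sqrt(sum(a * a for a in self.counts))
        norm_b = math.sqrt(sum(b * b for b in other.counts))
        if norm_a == 0 or norm_b == 0:
            return 0.0
        return dot / (norm_a * norm_b)
```

In such a setup, each monitored graph would keep one sketch that is updated as edges arrive, and an incremental detector could compare the current sketch against sketches of known benign behaviour. The memory cost depends only on the sketch width, not on the number of edges seen, which is the property the abstract attributes to hashing.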

Data Availability

The datasets used during the current study are publicly available at https://drive.google.com/drive/folders/1Kp3JQsZz2X61efHU4mTEWHdF-ZSun8ad

Notes

  1. https://gitlab.liris.cnrs.fr/gladis/hadapt

Acknowledgements

This work is supported by the French National Research Agency (ANR) under grant ANR-20-CE39-0008.

Author information

Contributions

Walid Megherbi, Abd Errahmane Kiouche, Mohammed Haddad, and Hamida Seba all made significant contributions to the research, design, and interpretation of the study.

Corresponding author

Correspondence to Abd Errahmane Kiouche.

Ethics declarations

Ethical and Informed Consent for data used

This study does not make use of any personal data, and therefore does not require anyone’s informed consent.

Conflicts of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

About this article

Cite this article

Megherbi, W., Kiouche, A.E., Haddad, M. et al. Detection of advanced persistent threats using hashing and graph-based learning on streaming data. Appl Intell 54, 5879–5890 (2024). https://doi.org/10.1007/s10489-024-05475-1

  • DOI: https://doi.org/10.1007/s10489-024-05475-1

Keywords