Structural Node Representation Learning for Detecting Botnet Nodes

Carpenter, Justin; Layne, Janet; Serra, Edoardo; Cuzzocrea, Alfredo; Gallo, Carmine

doi:10.1007/978-3-031-36805-9_47

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13956)

Included in the following conference series:

International Conference on Computational Science and Its Applications

Abstract

Private consumers, small businesses, and even large enterprises are all increasingly at risk from botnets. Botnets are known for spearheading Distributed Denial-of-Service (DDoS) attacks, spamming large populations of users, and causing critical harm to major organizations. The spread of Internet-of-Things (IoT) devices has led to these devices being exploited for cryptocurrency mining, in-transit data interception, and sending logs containing private data to the master botnet. Different techniques have been developed to identify these botnet activities, but only a few use Graph Neural Networks (GNNs) to analyze host activity by representing host communications as a directed graph. Although GNNs are intended to extract structural graph properties, they are prone to overfitting, which causes them to fail when applied to a previously unseen network. In this study, we test the hypothesis that structural graph patterns can be used for efficient botnet detection, and we present SIR-GN, a structural iterative representation learning methodology for graph nodes. Our approach is built to work well with unseen data, and our model provides a vector representation for every node that captures its structural information. Finally, we demonstrate that, when the collection of node representation vectors is fed into a neural network classifier, our model outperforms state-of-the-art GNN-based algorithms in detecting bot nodes within unknown networks.
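
To make the idea of a structural node representation concrete, the sketch below builds a directed graph from a list of host-to-host communications and assigns every node a small vector of structural features (in-degree and out-degree). This is not SIR-GN and not the authors' method; it is only a minimal illustration, and the function name `degree_features` and the example flows are hypothetical.

```python
from collections import defaultdict


def degree_features(edges):
    """Return {node: (in_degree, out_degree)} for a directed edge list.

    Illustrative only: a two-dimensional structural vector per node,
    far simpler than the representation described in the abstract.
    """
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        out_deg[src] += 1   # src initiated one more communication
        in_deg[dst] += 1    # dst received one more communication
        nodes.update((src, dst))
    return {n: (in_deg[n], out_deg[n]) for n in sorted(nodes)}


# Hypothetical flows: host "a" contacts three peers, and "b" contacts "a".
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "a")]
features = degree_features(edges)
```

Nodes with the same vector (here "c" and "d") play the same structural role regardless of their identity, which is the property a classifier would rely on when it is applied to a network it has never seen.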

Acknowledgement

This work was partially supported by project SERICS (PE00000014) under the MUR National Recovery and Resilience Plan funded by the European Union - NextGenerationEU.

Author information

Authors and Affiliations

Computer Science Department, Boise State University, Boise, ID, USA
Justin Carpenter, Janet Layne & Edoardo Serra
iDEA LAB, University of Calabria, Rende, Italy
Alfredo Cuzzocrea & Carmine Gallo

Corresponding author

Correspondence to Alfredo Cuzzocrea.

Editor information

Editors and Affiliations

University of Perugia, Perugia, Italy
Osvaldo Gervasi
University of Basilicata, Potenza, Italy
Beniamino Murgante
Monash University, Clayton, VIC, Australia
David Taniar
Kyushu Sangyo University, Fukuoka, Japan
Bernady O. Apduhan
University of Minho, Braga, Portugal
Ana Cristina Braga
University of Cagliari, Cagliari, Italy
Chiara Garau
National Technical University of Athens, Athens, Greece
Anastasia Stratigea

About this paper

Cite this paper

Carpenter, J., Layne, J., Serra, E., Cuzzocrea, A., Gallo, C. (2023). Structural Node Representation Learning for Detecting Botnet Nodes. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2023. ICCSA 2023. Lecture Notes in Computer Science, vol 13956. Springer, Cham. https://doi.org/10.1007/978-3-031-36805-9_47

DOI: https://doi.org/10.1007/978-3-031-36805-9_47
Published: 30 June 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36804-2
Online ISBN: 978-3-031-36805-9
eBook Packages: Computer Science, Computer Science (R0)
