Toward Generating a Large Scale Intrusion Detection Dataset and Intruders Behavioral Profiling Using Network and Transportation Layers Traffic Flow Analyzer (NTLFlowLyzer)

Shafi, MohammadMoein; Lashkari, Arash Habibi; Roudsari, Arousha Haghighian

doi:10.1007/s10922-025-09917-0


Published: 10 March 2025

Journal of Network and Systems Management, Volume 33, article number 44 (2025)

MohammadMoein Shafi¹, Arash Habibi Lashkari¹,² & Arousha Haghighian Roudsari³


Abstract

In today’s digital landscape, network security and intrusion detection systems (IDSs) are crucial because of our growing dependence on interconnected systems and data exchange. An IDS continuously monitors network traffic to detect threats and ensure the security and integrity of modern digital infrastructure. However, IDSs face several challenges, including low accuracy, high false positive rates, the absence of an effective behavioral profiling model, and the need for better visualization capabilities. This paper introduces a pattern extraction and profiling system that addresses limitations in characterizing diverse network activities. We introduce a novel attribute selection algorithm, a new approach for characterizing network activities, and a novel concept of local and global profiling built around a “super feature”. Our approach, which comprises Attribute Extraction, Relation Extraction, and Entity Extraction, forms a robust foundation for precise activity characterization and accurate profiling. By emphasizing sub-behaviors through Local and Global profiling, we mitigate the high false positive rates common in previous methods. The approach culminates in a neural network architecture that weights the sub-profiles and models the influence of the global profile in shaping comprehensive activity profiles. We validate the approach in practice by developing a new network traffic analyzer, NTLFlowLyzer, which extracts over 300 features, and by introducing the updated benchmark dataset BCCC-CSE-CIC-IDS2018. The experimental results show that the proposed Local and Global profiling is effective in profiling different malicious activities.
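For readers unfamiliar with flow-based feature extraction, the following is a minimal, generic Python sketch of what a network and transport layer flow analyzer computes: packets are grouped into bidirectional flows by their 5-tuple, and per-flow statistics are derived from each group. It is not NTLFlowLyzer's implementation. The `Packet` record, `flow_key`, `extract_flow_features`, and all feature names are hypothetical, and the sketch assumes packets have already been parsed from a capture. NTLFlowLyzer reports over 300 features; only six illustrative ones are shown here.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Packet:
    """Hypothetical pre-parsed packet record (not an NTLFlowLyzer type)."""
    ts: float        # capture timestamp in seconds
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str       # e.g. "TCP" or "UDP"
    length: int      # packet length in bytes


def flow_key(p: Packet) -> tuple:
    """Direction-independent 5-tuple, so both directions of a conversation share one key."""
    a = (p.src_ip, p.src_port)
    b = (p.dst_ip, p.dst_port)
    lo, hi = sorted((a, b))
    return (lo, hi, p.proto)


def extract_flow_features(packets: list[Packet]) -> dict[tuple, dict[str, float]]:
    """Group packets into bidirectional flows and compute a few illustrative features.

    Simplification (assumption): no idle/active timeouts and no TCP FIN/RST
    handling, so every packet sharing a 5-tuple lands in a single flow.
    """
    flows: dict[tuple, list[Packet]] = {}
    for p in sorted(packets, key=lambda p: p.ts):
        flows.setdefault(flow_key(p), []).append(p)

    features = {}
    for key, pkts in flows.items():
        # The sender of the first packet defines the forward direction.
        initiator = (pkts[0].src_ip, pkts[0].src_port)
        fwd = [p for p in pkts if (p.src_ip, p.src_port) == initiator]
        bwd = [p for p in pkts if (p.src_ip, p.src_port) != initiator]
        features[key] = {
            "duration": pkts[-1].ts - pkts[0].ts,
            "fwd_packets": len(fwd),
            "bwd_packets": len(bwd),
            "fwd_bytes": sum(p.length for p in fwd),
            "bwd_bytes": sum(p.length for p in bwd),
            "mean_packet_length": mean(p.length for p in pkts),
        }
    return features


# Example: a three-packet TCP exchange between a client and a web server.
pkts = [
    Packet(0.0, "10.0.0.1", "10.0.0.2", 40000, 80, "TCP", 60),
    Packet(0.5, "10.0.0.2", "10.0.0.1", 80, 40000, "TCP", 1500),
    Packet(1.0, "10.0.0.1", "10.0.0.2", 40000, 80, "TCP", 40),
]
for key, feats in extract_flow_features(pkts).items():
    print(key, feats)
```

A production analyzer would also split flows on idle timeouts and TCP termination flags; the sketch omits this deliberately so the grouping and direction logic stay visible.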


Data Availability

Following the publication of this paper, the updated intrusion detection dataset, BCCC-CSE-CIC-IDS2018, will be publicly available on our website [8]. The implementation code for NTLFlowLyzer will also be available in its GitHub repository [10].

References

[1] Abdulganiyu, O.H., Ait Tchakoucht, T., Saheed, Y.K.: A systematic literature review for network intrusion detection system (IDS). Int. J. Inf. Secur. 1–38 (2023)
[2] de Neira, A.B., Kantarci, B., Nogueira, M.: Distributed denial of service attack prediction: challenges, open issues and opportunities. Comput. Netw. 222, 109553 (2023)
[3] Alashhab, Z.R., Anbar, M., Singh, M.M., Hasbullah, I.H., Jain, P., Al-Amiedy, T.A.: Distributed denial of service attacks against cloud computing environment: survey, issues, challenges and coherent taxonomy. Appl. Sci. 12(23), 12441 (2022)
[4] Markevych, M., Dawson, M.: A review of enhancing intrusion detection systems for cybersecurity using artificial intelligence (AI). Int. Confer. Knowl.-Based Org. 29, 30–37 (2023)
[5] Aloqaily, M., Kanhere, S., Bellavista, P., Nogueira, M.: Special issue on cybersecurity management in the era of AI. J. Netw. Syst. Manage. 30(3), 39 (2022)
[6] Shafi, M., Lashkari, A.H., Rodriguez, V., Nevo, R.: Toward generating a new cloud-based distributed denial of service (DDoS) dataset and cloud intrusion traffic characterization. Information 15(4), 195 (2024)
[7] Thakkar, A., Lohiya, R.: A review of the advancement in intrusion detection datasets. Proced. Comput. Sci. 167, 636–645 (2020)
[8] BCCC-CSE-CIC-IDS2018: BCCC updated intrusion detection dataset (2018) (BCCC-CSE-CIC-IDS2018). https://www.yorku.ca/research/bccc/ucs-technical/cybersecurity-datasets-cds/
[9] Lashkari, A.H., Draper-Gil, G., Mamun, M.S.I., Ghorbani, A.A., et al.: Characterization of Tor traffic using time based features. In: ICISSP, pp. 253–262 (2017)
[10] BCCC: Network and transport layer flow analyzer (NTLFlowLyzer). Behaviour-Centric Cybersecurity Center (BCCC). https://github.com/ahlashkari/NTLFlowLyzer
[11] Tang, B., Wang, J., Yu, Z., Chen, B., Ge, W., Yu, J., Lu, T.: Advanced persistent threat intelligent profiling technique: a survey. Comput. Electr. Eng. 103, 108261 (2022)
[12] Shafi, M., Lashkari, A.H., Roudsari, A.H.: NTLFlowLyzer: towards generating an intrusion detection dataset and intruders behavior profiling through network and transport layers traffic analysis and pattern extraction. Comput. Secur. 148, 104160 (2025)
[13] Masdari, M., Khezri, H.: A survey and taxonomy of the fuzzy signature-based intrusion detection systems. Appl. Soft Comput. 92, 106301 (2020)
[14] Ayyagari, M.R., Kesswani, N., Kumar, M., Kumar, K.: Intrusion detection techniques in network environment: a systematic review. Wirel. Netw. 27(2), 1269–1285 (2021)
[15] Khraisat, A., Gondal, I., Vamplew, P., Kamruzzaman, J.: Survey of intrusion detection systems: techniques, datasets and challenges. Cybersecurity 2(1), 1–22 (2019)
[16] Hajj, S., El Sibai, R., Bou Abdo, J., Demerjian, J., Makhoul, A., Guyeux, C.: Anomaly-based intrusion detection systems: the requirements, methods, measurements, and datasets. Trans. Emerging Telecommun. Technol. 32(4), e4240 (2021)
[17] Kocher, G., Kumar, G.: Machine learning and deep learning methods for intrusion detection systems: recent developments and challenges. Soft. Comput. 25(15), 9731–9763 (2021)
[18] Devi, M., Nandal, P., Sehrawat, H.: A novel rule-based intrusion detection framework for secure wireless sensor networks (2023)
[19] Einy, S., Oz, C., Navaei, Y.D.: The anomaly- and signature-based IDS for network security using hybrid inference systems. Math. Probl. Eng. 2021, 1–10 (2021)
[20] Varzaneh, Z.A., Kuchaki Rafsanjani, M.: Intrusion detection system using a new fuzzy rule-based classification system based on genetic algorithm. Intell. Dec. Technol. 15(2), 231–237 (2021)
[21] Asad, H., Gashi, I.: Dynamical analysis of diversity in rule-based open source network intrusion detection systems. Empir. Softw. Eng. 27, 1–30 (2022)
[22] Díaz-Verdejo, J., Muñoz-Calle, J., Estepa Alonso, A., Estepa Alonso, R., Madinabeitia, G.: On the detection capabilities of signature-based intrusion detection systems in the context of web attacks. Appl. Sci. 12(2), 852 (2022)
[23] Liao, H.-J., Lin, C.-H.R., Lin, Y.-C., Tung, K.-Y.: Intrusion detection system: a comprehensive review. J. Netw. Comput. Appl. 36(1), 16–24 (2013)
[24] Mushtaq, E., Zameer, A., Umer, M., Abbasi, A.A.: A two-stage intrusion detection system with auto-encoder and LSTMs. Appl. Soft Comput. 121, 108768 (2022)
[25] Tama, B.A., Comuzzi, M., Rhee, K.-H.: TSE-IDS: a two-stage classifier ensemble for intelligent anomaly-based intrusion detection system. IEEE Access 7, 94497–94507 (2019)
[26] Dina, A.S., Manivannan, D.: Intrusion detection based on machine learning techniques in computer networks. Internet Things 16, 100462 (2021)
[27] Zavrak, S., İskefiyeli, M.: Anomaly-based intrusion detection from network flow features using variational autoencoder. IEEE Access 8, 108346–108358 (2020)
[28] Salo, F., Injadat, M., Nassif, A.B., Shami, A., Essex, A.: Data mining techniques in intrusion detection systems: a systematic literature review. IEEE Access 6, 56046–56058 (2018)
[29] Guibene, K., Messai, N., Ayaida, M., Khoukhi, L.: A data mining-based intrusion detection system for cyber physical power systems. In: Proceedings of the 18th ACM International Symposium on QoS and Security for Wireless and Mobile Networks, pp. 55–62 (2022)
[30] Mohan, L., Jain, S., Suyal, P., Kumar, A.: Data mining classification techniques for intrusion detection system. In: 2020 12th International Conference on Computational Intelligence and Communication Networks (CICN), pp. 351–355. IEEE (2020)
[31] Monzer, M.-H., Beydoun, K., Ghaith, A., Flaus, J.-M.: Model-based IDS design for ICSs. Reliab. Eng. Syst. Saf. 225, 108571 (2022)
[32] Sonchack, J., Aviv, A.J., Smith, J.M.: Cross-domain collaboration for improved IDS rule set selection. J. Inf. Secur. Appl. 24, 25–40 (2015)
[33] Sagala, A.: Automatic Snort IDS rule generation based on honeypot log. In: 2015 7th International Conference on Information Technology and Electrical Engineering (ICITEE), pp. 576–580. IEEE (2015)
[34] Tomandl, A., Fuchs, K.-P., Federrath, H.: REST-Net: a dynamic rule-based IDS for VANETs. In: 2014 7th IFIP Wireless and Mobile Networking Conference (WMNC), pp. 1–8. IEEE (2014)
[35] Afzal, Z., Lindskog, S.: IDS rule management made easy. In: 2016 8th International Conference on Electronics, Computers and Artificial Intelligence (ECAI), pp. 1–8. IEEE (2016)
[36] AlYousef, M.Y., Abdelmajeed, N.T.: Dynamically detecting security threats and updating a signature-based intrusion detection system’s database. Proced. Comput. Sci. 159, 1507–1516 (2019)
[37] Li, W., Tug, S., Meng, W., Wang, Y.: Designing collaborative blockchained signature-based intrusion detection in IoT environments. Futur. Gener. Comput. Syst. 96, 481–489 (2019)
[38] Wang, Y., Meng, W., Li, W., Li, J., Liu, W.-X., Xiang, Y.: A fog-based privacy-preserving approach for distributed signature-based intrusion detection. J. Parallel Distrib. Comput. 122, 26–35 (2018)
[39] Zhang, C., Jia, D., Wang, L., Wang, W., Liu, F., Yang, A.: Comparative research on network intrusion detection methods based on machine learning. Comput. Secur. 121, 102861 (2022)
[40] Ahmad, Z., Shahid Khan, A., Wai Shiang, C., Abdullah, J., Ahmad, F.: Network intrusion detection system: a systematic study of machine learning and deep learning approaches. Trans. Emerging Telecommun. Technol. 32(1), e4150 (2021)
[41] Kim, T., Pak, W.: Real-time network intrusion detection using deferred decision and hybrid classifier. Futur. Gener. Comput. Syst. 132, 51–66 (2022)
[42] Qiu, W., Ma, Y., Chen, X., Yu, H., Chen, L.: Hybrid intrusion detection system based on Dempster–Shafer evidence theory. Comput. Secur. 117, 102709 (2022)
[43] Herrera-Semenets, V., Hernández-León, R., van den Berg, J.: A fast instance reduction algorithm for intrusion detection scenarios. Comput. Electr. Eng. 101, 107963 (2022)
[44] Baldini, G., Amerini, I.: Online distributed denial of service (DDoS) intrusion detection based on adaptive sliding window and morphological fractal dimension. Comput. Netw. 210, 108923 (2022)
[45] Asif, M., Abbas, S., Khan, M., Fatima, A., Khan, M.A., Lee, S.-W.: MapReduce based intelligent model for intrusion detection using machine learning technique. J. King Saud Univer.-Comput. Inf. Sci. (2021)
[46] Hou, J., Liu, F., Lu, H., Tan, Z., Zhuang, X., Tian, Z.: A novel flow-vector generation approach for malicious traffic detection. J. Parallel Distrib. Comput. 169, 72–86 (2022)
[47] Rabbani, M., Wang, Y.L., Khoshkangini, R., Jelodar, H., Zhao, R., Hu, P.: A hybrid machine learning approach for malicious behaviour detection and recognition in cloud computing. J. Netw. Comput. Appl. 151, 102507 (2020)
[48] Herrmann, D., Banse, C., Federrath, H.: Behavior-based tracking: exploiting characteristic patterns in DNS traffic. Comput. Secur. 39, 17–33 (2013)
[49] Imran, M., Haider, N., Shoaib, M., Razzak, I., et al.: An intelligent and efficient network intrusion detection system using deep learning. Comput. Electr. Eng. 99, 107764 (2022)
[50] Liu, Q., Wang, D., Jia, Y., Luo, S., Wang, C.: A multi-task based deep learning approach for intrusion detection. Knowl.-Based Syst. 238, 107852 (2022)
[51] Ravi, V., Chaganti, R., Alazab, M.: Recurrent deep learning-based feature fusion ensemble meta-classifier approach for intelligent network intrusion detection system. Comput. Electr. Eng. 102, 108156 (2022)
[52] Li, B., Wang, Y., Xu, K., Cheng, L., Qin, Z.: DFAID: density-aware and feature-deviated active intrusion detection over network traffic streams. Comput. Secur. 118, 102719 (2022)
[53] Shiravi, A., Shiravi, H., Tavallaee, M., Ghorbani, A.A.: Toward developing a systematic approach to generate benchmark datasets for intrusion detection. Comput. Secur. 31(3), 357–374 (2012)
[54] Garcia, S., Grill, M., Stiborek, J., Zunino, A.: An empirical comparison of botnet detection methods. Comput. Secur. 45, 100–123 (2014)
[55] Moustafa, N., Slay, J.: UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In: 2015 Military Communications and Information Systems Conference (MilCIS), pp. 1–6. IEEE (2015)
[56] Lawrence, H., Ezeobi, U., Tauil, O., Nosal, J., Redwood, O., Zhuang, Y., Bloom, G.: CUPID: a labeled dataset with pentesting for evaluation of network intrusion detection. J. Syst. Arch. 129, 102621 (2022)
[57] Benesty, J., Chen, J., Huang, Y., Cohen, I.: Pearson correlation coefficient. Noise Reduct. Speech Process. 1–4 (2009)
[58] Awerbuch, B.: A new distributed depth-first-search algorithm. Inf. Process. Lett. 20(3), 147–150 (1985)
[59] Chen, Y.-C.: A tutorial on kernel density estimation and recent advances. Biostat. Epidemiol. 1(1), 161–187 (2017)
[60] Heer, J.: Fast & accurate Gaussian kernel density estimation. In: 2021 IEEE Visualization Conference (VIS), pp. 11–15. IEEE (2021)
[61] Sheikhpour, R., Sarram, M.A., Sheikhpour, R.: Particle swarm optimization for bandwidth determination and feature selection of kernel density estimation based classifiers in diagnosis of breast cancer. Appl. Soft Comput. 40, 113–131 (2016)
[62] He, Y.-L., Ye, X., Huang, D.-F., Huang, J.Z., Zhai, J.-H.: Novel kernel density estimator based on ensemble unbiased cross-validation. Inf. Sci. 581, 327–344 (2021)
[63] Kotsiantis, S., Kanellopoulos, D.: Association rules mining: a recent overview. GESTS Int. Trans. Comput. Sci. Eng. 32(1), 71–82 (2006)
[64] Zeng, Y., Yin, S., Liu, J., Zhang, M.: Research of improved FP-growth algorithm in association rules mining. Sci. Program. 2015, 6–6 (2015)
[65] Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proceedings of ICNN’95-International Conference on Neural Networks, vol. 4, pp. 1942–1948. IEEE (1995)
[66] Poli, R., Kennedy, J., Blackwell, T.: Particle swarm optimization: an overview. Swarm Intell. 1, 33–57 (2007)
[67] Zhang, Z.: Improved Adam optimizer for deep neural networks. In: 2018 IEEE/ACM 26th International Symposium on Quality of Service (IWQoS), pp. 1–2. IEEE (2018)
[68] Wei, Z., Wang, J., Zhao, Z., Shi, K.: Toward data efficient anomaly detection in heterogeneous edge-cloud environments using clustered federated learning. Futur. Gener. Comput. Syst. 164, 107559 (2025)
[69] Dilworth, R., Gudla, C.: Harnessing PU learning for enhanced cloud-based DDoS detection: a comparative analysis. arXiv preprint arXiv:2410.18380 (2024)
[70] Shafi, M., Lashkari, A.H., Mohanty, H.: Unveiling malicious DNS behavior profiling and generating benchmark dataset through application layer traffic analysis. Comput. Electr. Eng. 118, 109436 (2024)
[71] Zou, F., Ren, Y., Zhu, J., Tang, J.: Detecting data leakage in DNS traffic based on time series anomaly detection. In: 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys), pp. 503–510. IEEE (2021)
[72] Lison, P., Mavroeidis, V.: Neural reputation models learned from passive DNS data. In: 2017 IEEE International Conference on Big Data (Big Data), pp. 3662–3671. IEEE (2017)
[73] Göcs, L., Johanyák, Z.C.: Identifying relevant features of CSE-CIC-IDS2018 dataset for the development of an intrusion detection system. arXiv preprint arXiv:2307.11544 (2023)
[74] Sarhan, M., Layeghy, S., Portmann, M.: Feature analysis for machine learning-based IoT intrusion detection. arXiv preprint arXiv:2108.12732 (2021)
[75] Cambiaso, E., Papaleo, G., Chiola, G., Aiello, M.: Slow DoS attacks: definition and categorisation. Int. J. Trust Manag. Comput. Commun. 1(3–4), 300–319 (2013)
[76] Cambiaso, E., Papaleo, G., Aiello, M.: Taxonomy of slow DoS attacks to web applications. In: Recent Trends in Computer Networks and Distributed Systems Security: International Conference, SNDS 2012, Trivandrum, India, October 11–12, 2012. Proceedings 1, pp. 195–204. Springer (2012)
[77] Lashkari, A.H., Gil, G.D., Mamun, M.S.I., Ghorbani, A.A.: Characterization of Tor traffic using time based features. In: Proceedings of the 3rd International Conference on Information System Security and Privacy, SCITEPRESS (2017)


Acknowledgements

The authors acknowledge the grant from Canada Research Chair—Tier II (#CRC-2021-00340) and the Natural Sciences and Engineering Research Council of Canada—NSERC (#RGPIN-2020-04701)—to Arash Habibi Lashkari.

Funding

The authors acknowledge the grant from Canada Research Chair—Tier II (#CRC-2021-00340) and the Natural Sciences and Engineering Research Council of Canada—NSERC (#RGPIN-2020-04701)—to Arash Habibi Lashkari.

Author information

Authors and Affiliations

Department of Electrical Engineering and Computer Science, York University, Toronto, ON, Canada
MohammadMoein Shafi & Arash Habibi Lashkari
Behavior-Centric Cybersecurity Center (BCCC), School of Information Technology, York University, Toronto, ON, Canada
Arash Habibi Lashkari
School of Computing, Gachon University, Seongnam, South Korea
Arousha Haghighian Roudsari


Contributions

MohammadMoein Shafi: Conceptualized, designed, and implemented the model and all related code, and wrote and prepared the main manuscript text. Arash Habibi Lashkari: Contributed to supervision, conceptualization, writing (review and editing), and securing funding. Arousha Haghighian Roudsari: Provided advice and suggestions as a collaborator, and reviewed and edited the manuscript.

Corresponding author

Correspondence to MohammadMoein Shafi.

Ethics declarations

Conflict of interest

The authors have no relevant conflicts of interest to disclose concerning the content of this paper.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Consent for Publication

Permitted.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Shafi, M., Lashkari, A.H. & Roudsari, A.H. Toward Generating a Large Scale Intrusion Detection Dataset and Intruders Behavioral Profiling Using Network and Transportation Layers Traffic Flow Analyzer (NTLFlowLyzer). J Netw Syst Manage 33, 44 (2025). https://doi.org/10.1007/s10922-025-09917-0


Received: 05 May 2024
Revised: 06 December 2024
Accepted: 18 February 2025
Published: 10 March 2025
DOI: https://doi.org/10.1007/s10922-025-09917-0
