High-performance network traffic analysis for continuous batch intrusion detection

The Journal of Supercomputing

Abstract

Network traffic analysis is used to detect intrusions and to manage application traffic. Continuous batch network traffic analysis is a computationally demanding task. Because network traffic intensity varies naturally, with recurring peaks and valleys, a network analysis cluster may have to be severely over-dimensioned to support 24/7 continuous capture and processing of packet blocks. In this paper, we characterize the computational requirements of processing network traffic packets under several conditions; this characterization is a useful tool for generating network workloads in simulated scenarios. Our target MapReduce jobs are map-intensive and include string-matching-based virus and malware detection. We present an architecture for a Hadoop-based network analysis solution that includes a scheduler, report on using this approach in a small cluster, and show scheduling performance results obtained through simulation. The scheduler targets a cloud-based traffic analysis solution that bursts traffic to the cloud to overcome local resource limitations. The results show that the amount of traffic burst to the cloud can be reduced by up to 50 % while still achieving continuous batch traffic analysis with run times comparable to single-job run times.
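The abstract does not describe the scheduler's algorithm, so the following is only a minimal, hypothetical sketch of why bursting is needed and how tolerating a bounded local backlog can lower the volume sent to the cloud. It assumes a simple fluid model: traffic is captured in fixed-length periods, the local cluster can analyse a fixed amount of traffic per period (`local_capacity`), and whatever cannot be analysed locally in time is burst to the cloud. The names `simulate_bursting`, `BurstResult`, `local_capacity`, and `max_backlog` are illustrative and not taken from the paper, and the bounded-backlog rule is a swapped-in policy, not the authors' scheduler.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BurstResult:
    burst_per_block: List[float]  # amount sent to the cloud for each captured block
    total_burst: float            # sum of burst_per_block over the whole trace
    final_backlog: float          # work still queued locally when the trace ends


def simulate_bursting(block_sizes: List[float],
                      local_capacity: float,
                      max_backlog: float = 0.0) -> BurstResult:
    """Hypothetical fluid model of continuous batch analysis with cloud bursting.

    block_sizes:    traffic captured in each fixed-length capture period
    local_capacity: traffic the local cluster can analyse per period
    max_backlog:    traffic allowed to carry over to the next period instead of
                    being burst (0.0 means every block must finish in its own period)
    """
    if local_capacity <= 0 or max_backlog < 0:
        raise ValueError("local_capacity must be positive and max_backlog non-negative")

    backlog = 0.0
    bursts: List[float] = []
    for size in block_sizes:
        pending = backlog + size                      # work queued at the start of this period
        excess = max(0.0, pending - local_capacity)   # work the local cluster cannot finish now
        burst = max(0.0, excess - max_backlog)        # only what cannot be deferred is burst
        backlog = excess - burst                      # the remainder waits for the next period
        bursts.append(burst)

    return BurstResult(bursts, sum(bursts), backlog)


if __name__ == "__main__":
    trace = [80, 150, 60, 140, 50]  # made-up block sizes, arbitrary units
    print(simulate_bursting(trace, local_capacity=100).total_burst)                  # prints 90.0
    print(simulate_bursting(trace, local_capacity=100, max_backlog=20).total_burst)  # prints 50.0
```

With the made-up trace above and a local capacity of 100 units per period, the strict policy (no carry-over) bursts 90 units, while allowing 20 units of carry-over bursts only 50. The price is latency: deferred traffic is analysed after its own capture period has ended, which is why any real scheduler has to bound the backlog.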

Figs. 1–13

Notes

  1. http://hadoop.apache.org.

  2. http://www.pravail.com.

  3. https://www.openstack.org/software/icehouse/.

  4. http://docs.openstack.org/developer/sahara/.

  5. The CAIDA UCSD http://www.caida.org/data/passive/trace_stats/.

Acknowledgments

Work (partially) funded by the Operational Programme for Competitiveness and Internationalisation - COMPETE 2020 within project POCI-01-0145-FEDER-006961, and by FCT – Portuguese Foundation for Science and Technology as part of projects UID/EEA/50014/2013 and UID/CEC/00027/2013.

Author information

Correspondence to Jorge G. Barbosa.

About this article

Cite this article

Morla, R., Gonçalves, P. & Barbosa, J.G. High-performance network traffic analysis for continuous batch intrusion detection. J Supercomput 72, 4107–4128 (2016). https://doi.org/10.1007/s11227-016-1743-6
