An unsupervised approach for traffic trace sanitization based on the entropy spaces

Telecommunication Systems

Abstract

The accuracy and reliability of an anomaly-based network intrusion detection system depend on the quality of the data used to build a normal behavior profile. However, obtaining such datasets is not trivial due to privacy, obsolescence, and suitability issues. This paper presents an approach to traffic trace sanitization based on identifying anomalous patterns in a three-dimensional entropy space of flow traffic data captured from a campus network. Anomaly-free datasets are generated by filtering out attacks and traffic segments that shift the typical position of centroids in the entropy space. Our analyses were performed on real-life traffic traces and show that the sanitized datasets are homogeneous and consistent in terms of cluster centroids and probability distributions of the PCA-transformed entropy space.
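
The idea of an entropy space can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name the three features, the normalization, or the filtering rule, so the flow fields (source address, destination address, destination port), the division by log2 of the number of flows, and the median-plus-radius filter below are all assumptions chosen for illustration. The filter is a simple stand-in for the centroid analysis described in the paper.

```python
import math
from collections import Counter
from statistics import median


def normalized_entropy(values):
    """Shannon entropy of the empirical distribution of `values`, scaled to [0, 1].

    Assumption: the entropy is divided by log2 of the number of observations.
    """
    n = len(values)
    if n <= 1:
        return 0.0
    counts = Counter(values)
    h = sum((c / n) * math.log2(n / c) for c in counts.values())
    return h / math.log2(n)


def entropy_point(flows):
    """Map one non-empty time slot of flows to a point in a 3-D entropy space.

    Each flow is a hypothetical (src_ip, dst_ip, dst_port) tuple.
    """
    src, dst, port = zip(*flows)
    return (normalized_entropy(src),
            normalized_entropy(dst),
            normalized_entropy(port))


def sanitize(points, radius):
    """Keep points within `radius` of the coordinate-wise median.

    Assumption: the median is used as a simple robust centre; the paper's
    own clustering and filtering rule may differ.
    """
    centre = tuple(median(axis) for axis in zip(*points))
    return [p for p in points if math.dist(p, centre) <= radius]
```

Under these assumptions, a time slot dominated by one host probing many destinations on a single port maps to a point with low source and port entropy and high destination entropy, which lies far from the points produced by ordinary slots and is dropped by `sanitize`.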

Acknowledgments

We would like to thank CONACyT for its support through its “SEP-CONACyT Ciencia Básica CB-2011” research funding for project number 167859. We would also like to thank the Telecommunications Focus Group at Tecnológico de Monterrey, Campus Monterrey.

Author information

Corresponding author

Correspondence to Pablo Velarde-Alvarado.

About this article

Cite this article

Velarde-Alvarado, P., Vargas-Rosales, C., Martinez-Pelaez, R. et al. An unsupervised approach for traffic trace sanitization based on the entropy spaces. Telecommun Syst 61, 609–626 (2016). https://doi.org/10.1007/s11235-015-0017-6

  • DOI: https://doi.org/10.1007/s11235-015-0017-6
