Abstract
The accuracy and reliability of an anomaly-based network intrusion detection system depend on the quality of the data used to build its normal-behavior profile. However, obtaining such datasets is not trivial because of privacy, obsolescence, and suitability issues. This paper presents an approach to traffic trace sanitization based on identifying anomalous patterns in a three-dimensional entropy space of flow traffic data captured from a campus network. Anomaly-free datasets are generated by filtering out attacks and traffic segments that shift the typical position of the centroids in the entropy space. Our analyses, performed on real-life traffic traces, show that the sanitized datasets are homogeneous and consistent in terms of cluster centroids and probability distributions of the PCA-transformed entropy space.
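To make the notion of an entropy space concrete, the following is a minimal sketch of how one time window of flow records could be mapped to a single point in a three-dimensional entropy space, and how a centroid of such points could be computed. It is illustrative only and is not the authors' implementation: the three flow features (`src_port`, `dst_ip`, `dst_port`), the normalization by the logarithm of the number of flows, and the function names are all assumptions, since the abstract does not specify them.

```python
import math
from collections import Counter

# Hypothetical feature names; the abstract does not say which three
# flow features span the entropy space.
FEATURES = ("src_port", "dst_ip", "dst_port")


def normalized_entropy(values):
    """Shannon entropy of the empirical distribution of `values`,
    divided by log2(len(values)) so the result lies in [0, 1].
    (This normalization is an assumption made for the sketch.)"""
    n = len(values)
    counts = Counter(values)
    if n <= 1 or len(counts) <= 1:
        return 0.0  # a single observed value carries no uncertainty
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h / math.log2(n)


def entropy_point(flows, features=FEATURES):
    """Map one window of flow records (dicts) to a point in entropy space."""
    return tuple(normalized_entropy([f[k] for f in flows]) for k in features)


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


if __name__ == "__main__":
    window = [
        {"src_port": 1025, "dst_ip": "10.0.0.5", "dst_port": 80},
        {"src_port": 1026, "dst_ip": "10.0.0.5", "dst_port": 80},
    ]
    print(entropy_point(window))
```

Under these assumptions, each window yields one point, and a window whose point lies far from the centroid of the other windows would be a candidate for closer inspection. How the paper actually selects the windows to filter out is not stated in the abstract.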
Acknowledgments
We would like to thank CONACyT for its support through “SEP-CONACyT Ciencia Básica CB-2011” research funding for project number 167859. We would also like to thank the Telecommunications Focus Group at Tecnológico de Monterrey, Campus Monterrey.
Cite this article
Velarde-Alvarado, P., Vargas-Rosales, C., Martinez-Pelaez, R. et al. An unsupervised approach for traffic trace sanitization based on the entropy spaces. Telecommun Syst 61, 609–626 (2016). https://doi.org/10.1007/s11235-015-0017-6
DOI: https://doi.org/10.1007/s11235-015-0017-6