An unsupervised approach for traffic trace sanitization based on the entropy spaces

Velarde-Alvarado, Pablo; Vargas-Rosales, Cesar; Martinez-Pelaez, Rafael; Toral-Cruz, Homero; Martinez-Herrera, Alberto F.

doi:10.1007/s11235-015-0017-6

Published: 31 March 2015

Telecommunication Systems, Volume 61, pages 609–626 (2016)

Pablo Velarde-Alvarado¹,
Cesar Vargas-Rosales²,
Rafael Martinez-Pelaez³,
Homero Toral-Cruz⁴ &
Alberto F. Martinez-Herrera²

Abstract

The accuracy and reliability of an anomaly-based network intrusion detection system depend on the quality of the data used to build a normal behavior profile. However, obtaining these datasets is not trivial due to privacy, obsolescence, and suitability issues. This paper presents an approach to traffic trace sanitization based on the identification of anomalous patterns in a three-dimensional entropy space of the flow traffic data captured from a campus network. Anomaly-free datasets are generated by filtering out attacks and traffic pieces that modify the typical position of centroids in the entropy space. Our analyses were performed on real-life traffic traces and show that the sanitized datasets are homogeneous and consistent in terms of cluster centroids and probability distributions of the PCA-transformed entropy space.
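To make the idea of an entropy space concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not say which flow features span the three dimensions, how entropy is normalized, or how centroid shifts are detected. The field names (`src_port`, `dst_ip`, `dst_port`), the normalization by the logarithm of the window size, and the distance `threshold` are all assumptions chosen for illustration. Each time window of flows becomes one point in a three-dimensional space, and windows that lie far from the centroid are flagged as candidates for removal.

```python
import math
from collections import Counter


def normalized_entropy(values):
    """Shannon entropy of the empirical distribution of `values`,
    divided by log2(len(values)) so the result lies in [0, 1].
    The normalization is an assumption, not taken from the paper."""
    n = len(values)
    if n <= 1:
        return 0.0
    counts = Counter(values)
    h = sum(-(c / n) * math.log2(c / n) for c in counts.values())
    return h / math.log2(n)


def entropy_point(flows, fields=("src_port", "dst_ip", "dst_port")):
    """Map one time window of flow records (dicts) to a 3-D point.
    The three field names are hypothetical placeholders."""
    return tuple(normalized_entropy([f[name] for f in flows]) for name in fields)


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def flag_outliers(points, threshold):
    """Indices of windows whose Euclidean distance from the centroid
    exceeds `threshold` (a hypothetical tuning parameter)."""
    c = centroid(points)
    return [i for i, p in enumerate(points) if math.dist(p, c) > threshold]
```

A sanitization pass in this spirit would compute `entropy_point` for every window of a trace, call `flag_outliers`, and drop the flagged windows. The paper itself works with cluster centroids and a PCA-transformed space, which this sketch does not attempt to reproduce.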

Acknowledgments

We would like to thank CONACyT for its support through the “SEP-CONACyT Ciencia Básica CB-2011” research funding for project number 167859. We would also like to thank the Telecommunications Focus Group at Tecnológico de Monterrey, Campus Monterrey.

Author information

Authors and Affiliations

Area of Basic Sciences and Engineering, Autonomous University of Nayarit, 63155, Tepic, Nayarit, Mexico
Pablo Velarde-Alvarado
Department of Electrical and Computer Engineering, Tecnológico de Monterrey, Campus Monterrey, 64849, Monterrey, Nuevo Leon, Mexico
Cesar Vargas-Rosales & Alberto F. Martinez-Herrera
Department of Information Technology, Autonomous University of Ciudad Juarez, Chihuahua, Mexico
Rafael Martinez-Pelaez
Department of Sciences and Engineering, University of Quintana Roo, 77019, Chetumal, Quintana Roo, Mexico
Homero Toral-Cruz

Corresponding author

Correspondence to Pablo Velarde-Alvarado.

About this article

Cite this article

Velarde-Alvarado, P., Vargas-Rosales, C., Martinez-Pelaez, R. et al. An unsupervised approach for traffic trace sanitization based on the entropy spaces. Telecommun Syst 61, 609–626 (2016). https://doi.org/10.1007/s11235-015-0017-6

Issue Date: March 2016