Abstract
Anomaly detection in a communication network is a powerful tool for predicting faults, detecting network sabotage attempts, and learning user profiles for marketing purposes and quality-of-service improvements. In this article, we convert the unsupervised data mining learning problem into a supervised classification problem. We propose three methods for creating an associative anomaly within a given commercial traffic database and demonstrate how, using the Principal Component Analysis (PCA) algorithm, we can detect anomalous network behavior and distinguish between a regular data stream and a data stream that deviates from the routine, at the IP network layer level. Although PCA has been used in the past for anomaly detection, there are very few examples where such tasks were performed on real traffic data collected and shared by a commercial company.
The article presents three innovations. The first is the use of an up-to-date database produced by the users of an international communications company. The dataset for the data mining algorithm was retrieved from a data center that monitors and collects low-level network transportation log streams from all over the world. The second is the ability to label several types of anomalies in untagged datasets by organizing and prearranging the database. The third is the ability not only to detect an anomaly but also to color the anomaly type, i.e., to identify, classify and label some forms of the abnormality.
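The abstract does not spell out how PCA separates regular from deviating streams. One common formulation, given here only as a minimal sketch, fits a principal subspace on routine traffic and scores each new record by its squared distance from that subspace. Everything in the example is an assumption made for illustration: the three synthetic features, the function names `fit_pca`, `residual_score` and `fit_threshold`, the power-iteration solver, and the mean-plus-three-standard-deviations threshold. It is not the authors' pipeline and does not attempt the anomaly coloring step.

```python
import math
import random


def fit_pca(rows, k, iters=200, seed=0):
    """Return (mean, components): the top-k principal axes of `rows`.

    Uses power iteration with deflation on the sample covariance matrix,
    which is adequate for a small illustrative feature set.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining variance is numerically zero
                break
            v = [x / norm for x in w]
        eigval = sum(v[i] * cov[i][j] * v[j]
                     for i in range(d) for j in range(d))
        components.append(v)
        # Deflate: remove the variance explained by this component.
        for i in range(d):
            for j in range(d):
                cov[i][j] -= eigval * v[i] * v[j]
    return mean, components


def residual_score(x, mean, components):
    """Squared distance from x to the fitted principal subspace."""
    c = [xi - mi for xi, mi in zip(x, mean)]
    proj = [0.0] * len(c)
    for v in components:
        coeff = sum(ci * vi for ci, vi in zip(c, v))
        proj = [p + coeff * vi for p, vi in zip(proj, v)]
    return sum((ci - pi) ** 2 for ci, pi in zip(c, proj))


def fit_threshold(rows, mean, components, n_sigmas=3.0):
    """Assumed rule: mean + n_sigmas * std of the training residual scores."""
    scores = [residual_score(r, mean, components) for r in rows]
    mu = sum(scores) / len(scores)
    sd = math.sqrt(sum((s - mu) ** 2 for s in scores) / len(scores))
    return mu + n_sigmas * sd


# Synthetic "routine" records: three correlated features plus small noise.
rng = random.Random(1)
normal = []
for _ in range(200):
    t = rng.uniform(-1.0, 1.0)
    normal.append([t + rng.gauss(0, 0.05),
                   2 * t + rng.gauss(0, 0.05),
                   -t + rng.gauss(0, 0.05)])

mean, comps = fit_pca(normal, k=1)
threshold = fit_threshold(normal, mean, comps)


def is_anomaly(x):
    """Flag a record whose residual score exceeds the fitted threshold."""
    return residual_score(x, mean, comps) > threshold
```

A record that follows the routine correlation, such as `[0.5, 1.0, -0.5]`, stays close to the subspace, while one that breaks it, such as `[3.0, 0.0, 3.0]`, yields a large residual. On real traffic, the number of retained components and the threshold would need to be tuned against labeled examples.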
This work was supported by the Israel Innovation Authority (formerly the Office of the Chief Scientist and MATIMOP).
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Segal, Y., Vilenchik, D., Hadar, O. (2018). Detecting and Coloring Anomalies in Real Cellular Network Using Principle Component Analysis. In: Dinur, I., Dolev, S., Lodha, S. (eds) Cyber Security Cryptography and Machine Learning. CSCML 2018. Lecture Notes in Computer Science(), vol 10879. Springer, Cham. https://doi.org/10.1007/978-3-319-94147-9_6
DOI: https://doi.org/10.1007/978-3-319-94147-9_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-94146-2
Online ISBN: 978-3-319-94147-9
eBook Packages: Computer Science, Computer Science (R0)