Abstract
Cloud computing resources are sometimes hijacked for fraudulent use. While some fraudulent use manifests as a small-scale resource consumption, a more serious type of fraud is that of fraud storms, which are events of large-scale fraudulent use. These events begin when fraudulent users discover new vulnerabilities in the sign up process, which they then exploit in mass. The ability to perform early detection of these storms is a critical component of any cloud-based public computing system.
In this work we analyze telemetry data from Microsoft Azure to detect fraud storms and raise early alerts on sudden increases in fraudulent use. The use of machine learning approaches to identify such anomalous events involves two inherent challenges: the scarcity of these events, and at the same time, the high frequency of anomalous events in cloud systems.
We compare the performance of a supervised approach to the one achieved by an unsupervised, multivariate anomaly detection framework. We further evaluate the system performance taking into account practical considerations of robustness in the presence of missing values, and minimization of the model’s data collection period.
This paper describes the system, as well as the underlying machine learning algorithms applied. A beta version of the system is deployed and used to continuously control fraud levels in Azure.
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Bhaduri, K., Das, K., Matthews, B.L.: Detecting abnormal machine characteristics in cloud infrastructures. In: 2011 IEEE 11th International Conference on Data Mining Workshops (ICDMW), pp. 137–144. IEEE (2011)
Zhu, Q., Tung, T., Xie, Q.: Automatic fault diagnosis in cloud infrastructure. In: 2013 IEEE 5th International Conference on Cloud Computing Technology and Science (CloudCom), pp. 467–474. IEEE (2013)
Hormozi, E., Hormozi, H., Akbari, M.K., Javan, M.S.: Using of machine learning into cloud environment (A Survey): managing and scheduling of resources in cloud systems. In: 2012 Seventh International Conference on P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC), pp. 363–368 (2012)
Beloglazov, A., Buyya, R.: Energy efficient resource management in virtualized cloud data centers. In: Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, pp. 826–831. IEEE Computer Society (2010)
Beloglazov, A., Abawajy, J., Buyya, R.: Energy-aware resource allocation heuristics for efficient management of data centers for cloud computing. Future Gener. Comput. Syst. 28, 755–768 (2012)
Hashizume, K., Rosado, D.G., Fernández-Medina, E., Fernandez, E.B.: An analysis of security issues for cloud computing. J. Internet Serv. Appl. 4, 1–13 (2013)
Fernandes, D.A., Soares, L.F., Gomes, J.V., Freire, M.M., Inácio, P.R.: Security issues in cloud environments: a survey. Int. J. Inf. Secur. 13, 113–170 (2014)
Bontempi, G., Ben Taieb, S., Le Borgne, Y.-A.: Machine learning strategies for time series forecasting. In: Aufaure, M.-A., Zimányi, E. (eds.) eBISS 2012. LNBIP, vol. 138, pp. 62–77. Springer, Heidelberg (2013)
Aggarwal, C.C.: Outlier analysis. Springer Science & Business Media (2013)
Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: A survey. ACM Comput. Surv. CSUR. 41, 15 (2009)
Wang, C., Talwar, V., Schwan, K., Ranganathan, P.: Online detection of utility cloud anomalies using metric distributions. In: 2010 IEEE Network Operations and Management Symposium (NOMS), pp. 96–103. IEEE (2010)
Wang, C., Viswanathan, K., Choudur, L., Talwar, V., Satterfield, W., Schwan, K.: Statistical techniques for online anomaly detection in data centers. In: 2011 IFIP/IEEE International Symposium on Integrated Network Management (IM), pp. 385–392 (2011)
Dean, D.J., Nguyen, H., Gu, X.: Ubl: unsupervised behavior learning for predicting performance anomalies in virtualized cloud systems. In: Proceedings of the 9th International Conference on Autonomic Computing, pp. 191–200. ACM (2012)
Vallis, O., Hochenbaum, J., Kejariwal, A.: A novel technique for long-term anomaly detection in the cloud. In: Proceedings of the 6th USENIX Conference on Hot Topics in Cloud Computing, USENIX Association, Berkeley, CA, USA, pp. 15–15 (2014)
Li, L., McCann, J., Pollard, N.S., Faloutsos, C.: Dynammo: Mining and summarization of coevolving sequences with missing values. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 507–516. ACM (2009)
Wellenzohn, K., Mitterer, H., Gamper, J., Böhlen, M.H., Khayati, M.: Missing Value Imputation in Time Series using Top-k Case Matching
Xie, Y., Huang, J., Willett, R.: Changepoint detection for high-dimensional time series with missing data. ArXiv Prepr. ArXiv12085062 (2012)
Modi, C., Patel, D., Borisaniya, B., Patel, H., Patel, A., Rajarajan, M.: A survey of intrusion detection techniques in Cloud. J. Netw. Comput. Appl. 36, 42–57 (2013)
Bay, S., Saito, K., Ueda, N., Langley, P.: A framework for discovering anomalous regimes in multivariate time-series data with local models. In: Symposium on Machine Learning for Anomaly Detection, Stanford, USA (2004)
Rousseeuw, P.J., Driessen, K.V.: A fast algorithm for the minimum covariance determinant estimator. Technometrics 41, 212–223 (1999)
Multivariate normal distribution (2015). http://en.wikipedia.org/w/index.php?title=Multivariate_normal_distribution&oldid=651587942
Genz, A., Bretz, F.: Computation of multivariate normal and t probabilities. Springer Science & Business Media (2009)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Neuvirth, H., Finkelstein, Y., Hilbuch, A., Nahum, S., Alon, D., Yom-Tov, E. (2015). Early Detection of Fraud Storms in the Cloud. In: Bifet, A., et al. Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2015. Lecture Notes in Computer Science(), vol 9286. Springer, Cham. https://doi.org/10.1007/978-3-319-23461-8_4
Download citation
DOI: https://doi.org/10.1007/978-3-319-23461-8_4
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23460-1
Online ISBN: 978-3-319-23461-8
eBook Packages: Computer ScienceComputer Science (R0)