ABSTRACT
When developing a machine learning pipeline, engineers need to validate the effectiveness of the machine learning methods they use in order to assess the quality of the resulting forecasts.
As machine learning models and methods are widely deployed across monitoring systems, assessing these methods within such systems has become a pressing scientific problem. This research concludes that the current standard metrics are not sufficient for an accurate assessment of the machine learning methods in use.
This research provides a new composite rating for anomaly detection tailored to the use cases of cloud monitoring systems. Its main difference from the standard metrics is that the new approach accounts for integration with business processes, resource demands, and a critical view of false-positive alerts. The new approach may be used for model assessment in monitoring systems with similar requirements:
- Cost-effective use of computing resources
- A low number of false positives
- Fast detection of anomalies
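The abstract does not give the exact rating formula, but a minimal sketch of such a composite score is shown below. The weights, term names, and normalization are illustrative assumptions, not the paper's definition; the point is that false-positive rate, detection latency, and compute cost enter the rating alongside standard detection quality.

```python
# Hypothetical composite rating for anomaly detectors.
# The formula and weights are illustrative assumptions, not the paper's method.

def composite_rating(precision: float,
                     recall: float,
                     false_positive_rate: float,
                     detection_delay_s: float,
                     cpu_seconds: float,
                     w_quality: float = 0.5,
                     w_fp: float = 0.3,
                     w_speed: float = 0.1,
                     w_cost: float = 0.1) -> float:
    """Combine detection quality with operational penalties; higher is better."""
    # Standard quality term: F1 score.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fp_penalty = 1.0 - false_positive_rate      # fewer false alerts -> higher
    speed = 1.0 / (1.0 + detection_delay_s)     # faster detection -> higher
    cost = 1.0 / (1.0 + cpu_seconds)            # cheaper inference -> higher
    return w_quality * f1 + w_fp * fp_penalty + w_speed * speed + w_cost * cost

# Two detectors with equal F1 but different alert noise and compute cost:
noisy = composite_rating(0.8, 0.8, false_positive_rate=0.4,
                         detection_delay_s=5.0, cpu_seconds=120.0)
quiet = composite_rating(0.8, 0.8, false_positive_rate=0.05,
                         detection_delay_s=5.0, cpu_seconds=30.0)
```

Under this sketch, the quieter and cheaper detector scores higher even though both have identical F1, which is exactly the distinction the standard metrics miss.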
Furthermore, this research proposes new methods of computing capacity planning for different anomaly detection methods. These methods are not limited to anomaly detection and can serve as a basis for developing capacity planning for other machine learning techniques and approaches.
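The paper's capacity-planning procedure is not detailed in the abstract; a hedged sketch of the general idea is given below: measure the per-sample CPU time and peak memory of one scoring pass, then extrapolate linearly to the metric-stream rate of a monitoring fleet. The helper names, the linear-scaling assumption, and the utilization headroom are all illustrative assumptions.

```python
# Hypothetical capacity-planning helpers for an anomaly detector's inference
# path. Linear scaling and the 70% utilization target are assumptions.
import time
import tracemalloc

def measure_inference_cost(predict, batch, repeats=5):
    """Return (seconds per sample, peak traced bytes) for a scoring pass."""
    tracemalloc.start()
    start = time.perf_counter()
    for _ in range(repeats):
        predict(batch)
    elapsed = (time.perf_counter() - start) / repeats
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed / len(batch), peak_bytes

def plan_capacity(sec_per_sample, samples_per_second, core_utilization=0.7):
    """CPU cores needed to score a metric stream in real time,
    leaving headroom so cores are not driven to 100% utilization."""
    return sec_per_sample * samples_per_second / core_utilization

# Usage: profile a stand-in scoring function, then size the deployment.
sec_per_sample, peak = measure_inference_cost(
    lambda xs: [abs(x) for x in xs], list(range(1000)))
cores = plan_capacity(sec_per_sample, samples_per_second=50_000)
```

In practice the per-sample cost would be measured with a profiler such as perf or Fil (cited by the paper) rather than this toy timer, but the extrapolation step is the same.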
CCS CONCEPTS
· Applied computing → Operations research → Forecasting · Computer systems organization → Architectures → Distributed architectures → Cloud computing → Forecasting · Computing methodologies → Machine learning
- Metrics for machine learning evaluation methods in cloud monitoring systems