ABSTRACT
We propose a novel approach to resource demand profiling for resource-intensive monolithic workflows that consist of distinct phases. Workflow profiling aims to estimate the resource demands of workflows. Such estimates are important for workflow scheduling in data centers and enable efficient use of the available resources. Our approach treats workflows as black boxes; that is, it relies solely on recorded system-level metrics, which is the standard scenario from the perspective of data center operators. The approach first performs an offline analysis of a dataset of resource consumption values recorded across different runs of a given workflow. For this analysis, we apply the time series segmentation algorithm PELT and the clustering algorithm DBSCAN to extract the individual phases and their respective resource demands. We then use the results of this analysis to train a Hidden Markov Model in a supervised manner for online phase detection. Furthermore, we provide a method that uses this phase detection to update the resource demand profiles while the workflows are running. We evaluate our approach on Earth Observation workflows that process satellite data. The results indicate that our approach already works in some common scenarios. For cases where contention changes the behavior of individual phases too much, we identify room for improvement and outline next steps.
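To make the supervised training and online detection steps more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the offline analysis has already assigned a phase label to every sample, that a single metric (here CPU cores in use) is observed, and that each phase emits Gaussian-distributed values. All function names, phase names, and numbers are hypothetical and chosen for illustration only.

```python
import math
from statistics import fmean, pstdev


def train_supervised_hmm(runs, phase_labels):
    """Estimate HMM parameters by counting over labeled runs (assumed labels)."""
    states = sorted({p for labels in phase_labels for p in labels})
    # Add-one smoothing so unseen starts and transitions keep a small probability.
    start = {s: 1.0 for s in states}
    trans = {a: {b: 1.0 for b in states} for a in states}
    samples = {s: [] for s in states}
    for values, labels in zip(runs, phase_labels):
        start[labels[0]] += 1.0
        for a, b in zip(labels, labels[1:]):
            trans[a][b] += 1.0
        for x, p in zip(values, labels):
            samples[p].append(x)
    n = sum(start.values())
    start = {s: c / n for s, c in start.items()}
    for a in states:
        row = sum(trans[a].values())
        trans[a] = {b: c / row for b, c in trans[a].items()}
    # Gaussian emission per phase: (mean, std), with a floor on the std.
    emis = {s: (fmean(v), max(pstdev(v), 1e-3)) for s, v in samples.items()}
    return states, start, trans, emis


def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def detect_phases_online(model, observations):
    """Forward filtering: after each new sample, report the most likely phase."""
    states, start, trans, emis = model
    belief, detected = None, []
    for x in observations:
        if belief is None:
            prior = start
        else:
            prior = {b: sum(belief[a] * trans[a][b] for a in states) for b in states}
        post = {s: prior[s] * gaussian_pdf(x, *emis[s]) for s in states}
        total = sum(post.values())
        if total == 0.0:  # sample is implausible under every phase: keep the prior
            post, total = dict(prior), sum(prior.values())
        belief = {s: p / total for s, p in post.items()}
        detected.append(max(states, key=belief.get))
    return detected


# Hypothetical labeled runs of one workflow (CPU cores in use per sample).
runs = [
    [1.0, 1.2, 0.9, 7.8, 8.1, 8.0, 7.9, 2.1, 2.0, 1.9],
    [1.1, 0.8, 8.2, 7.7, 8.0, 2.2, 1.8],
]
phase_labels = [
    ["load"] * 3 + ["compute"] * 4 + ["write"] * 3,
    ["load"] * 2 + ["compute"] * 3 + ["write"] * 2,
]
model = train_supervised_hmm(runs, phase_labels)
detected = detect_phases_online(model, [1.05, 0.95, 7.9, 8.1, 8.0, 2.05, 1.95])
print(detected)
```

In this sketch, the per-phase emission mean could serve as the demand estimate for the currently detected phase. How the paper actually updates the profiles is not specified in the abstract, so that mapping is an assumption.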
Index Terms
- Resource Demand Profiling of Monolithic Workflows