research-article

IPOC: An Adaptive Interval Prediction Model based on Online Chasing and Conformal Inference for Large-Scale Systems

Authors:

Xiaofeng Gao

KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

Pages 202 - 212

https://doi.org/10.1145/3580305.3599396

Published: 04 August 2023

Abstract

In large-scale systems, system complexity and demand volatility produce diverse and dynamic workloads that are difficult to predict accurately. In this work, we address an online interval prediction problem (OnPred-Int) and adopt ensemble learning to solve it. We show that ensemble learning for OnPred-Int can be modeled as a dynamic deterministic Markov Decision Process (Dd-MDP) and convert it into a stateful online learning task. We then propose IPOC, a lightweight and flexible model that produces effective confidence intervals while adapting to the dynamics of real-time workload streams. At each time step, IPOC selects a target model and chases it using a designed chasing oracle, producing accurate confidence intervals in the process. The effectiveness of IPOC is theoretically validated through sublinear regret analysis and satisfaction of the confidence interval requirements. In addition, we conduct extensive experiments on 4 real-world datasets, comparing against 19 baselines. To the best of our knowledge, we are the first to apply the frontier theory of online learning to time series prediction tasks.
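
To make the problem setting concrete, the following is a minimal sketch of online interval prediction over a small ensemble. It is not IPOC: the chasing oracle is not specified in the abstract, so this sketch swaps in two simpler, well-known pieces as stand-ins, namely follow-the-leader model selection and an adaptive conformal update of the miscoverage level. All names (`online_intervals`, `empirical_quantile`, `demo_coverage`), the two toy forecasters, and the parameter values are assumptions made for illustration.

```python
import math
import random


def empirical_quantile(values, q):
    """q-quantile of values; infinite when q >= 1 or no data is available."""
    if not values or q >= 1.0:
        return float("inf")
    ordered = sorted(values)
    index = min(int(max(q, 0.0) * len(ordered)), len(ordered) - 1)
    return ordered[index]


def online_intervals(stream, models, target_alpha=0.1, gamma=0.02, warmup=20):
    """Yield (lower, upper, y) for every observation after the warm-up."""
    alpha = target_alpha                    # working miscoverage level
    cumulative_loss = [0.0] * len(models)   # absolute error per model
    residuals = []                          # past errors of the chosen model
    history = []
    for y in stream:
        if len(history) >= warmup:
            forecasts = [model(history) for model in models]
            # Stand-in for target selection: follow the current leader.
            best = min(range(len(models)), key=lambda i: cumulative_loss[i])
            width = empirical_quantile(residuals, 1.0 - alpha)
            lower, upper = forecasts[best] - width, forecasts[best] + width
            miss = 0.0 if lower <= y <= upper else 1.0
            # Adaptive conformal step: widen after a miss, narrow after a hit.
            alpha += gamma * (target_alpha - miss)
            for i, forecast in enumerate(forecasts):
                cumulative_loss[i] += abs(y - forecast)
            residuals.append(abs(y - forecasts[best]))
            yield lower, upper, y
        history.append(y)


def demo_coverage(n=1000, seed=0):
    """Empirical coverage on a synthetic workload with a level shift."""
    rng = random.Random(seed)
    stream = [
        math.sin(t / 10.0) + (2.0 if t > n // 2 else 0.0) + rng.gauss(0.0, 0.2)
        for t in range(n)
    ]
    models = [
        lambda h: h[-1],                       # last-value forecaster
        lambda h: sum(h[-5:]) / len(h[-5:]),   # short moving average
    ]
    results = list(online_intervals(stream, models))
    covered = sum(1 for lower, upper, y in results if lower <= y <= upper)
    return covered / len(results)
```

With `target_alpha=0.1`, the update on `alpha` keeps long-run coverage near 90% even across the level shift, because each miss lowers `alpha` and thereby widens the following intervals. The quantile rule here is deliberately simplified and omits the finite-sample correction used in standard conformal prediction.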

Supplementary Material

MP4 File (rtfp0591-2min-promo.mp4)

Presentation video


Cited By

Chen Y, Yang K, An Z, Holder B, Paloutzian L, Bali K, Du W, Baeza-Yates R, Bonchi F (2024). MARLP: Time-series Forecasting Control for Agricultural Managed Aquifer Recharge. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 4862-4872. Online publication date: 25-Aug-2024.
https://dl.acm.org/doi/10.1145/3637528.3671533
Luo Y, Gao M, Yu Z, Ge H, Gao X, Cai T, Chen G, Baeza-Yates R, Bonchi F (2024). Integrating System State into Spatio Temporal Graph Neural Network for Microservice Workload Prediction. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 5521-5531. Online publication date: 25-Aug-2024.
https://dl.acm.org/doi/10.1145/3637528.3671508
Liu M, Zuo Y, Luo Y, Wu D, Zhen P, Guo J, Gao X (2024). Weather-Conditioned Multi-graph Network for Ride-Hailing Demand Forecasting. Service-Oriented Computing, 341-356. Online publication date: 7-Dec-2024.
https://doi.org/10.1007/978-981-96-0808-9_26

Index Terms

IPOC: An Adaptive Interval Prediction Model based on Online Chasing and Conformal Inference for Large-Scale Systems
1. Information systems
  1. Information systems applications
    1. Data mining
      1. Data stream mining
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Machine learning theory
      1. Online learning theory

Information & Contributors

Information

Published In

KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 2023

5996 pages

ISBN:9798400701030

DOI:10.1145/3580305

General Chairs:
Ambuj Singh (UC Santa Barbara, USA),
Yizhou Sun (UC Los Angeles, USA)

Program Chairs:
Leman Akoglu (Carnegie Mellon University, USA),
Dimitrios Gunopulos (University of Athens, Greece),
Xifeng Yan (UC Santa Barbara, USA),
Ravi Kumar (Google, USA),
Fatma Ozcan (Google, USA),
Jieping Ye (Alibaba DAMO Academy)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

National Key R&D Program of China
Shanghai Municipal Science and Technology Major Project
National Natural Science Foundation of China

Conference

KDD '23: The 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 6 - 10, 2023

Long Beach, CA, USA

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics & Citations

Total Citations: 3
Total Downloads: 517
Downloads (Last 12 months): 241
Downloads (Last 6 weeks): 20

Reflects downloads up to 15 Feb 2025
