Abstract
Probabilistic forecasting offers insights beyond point estimates, supporting more informed decision-making. This paper introduces the Neural Quantile Function with Recurrent Neural Networks (NQF-RNN), a model for multistep-ahead probabilistic time series forecasting. NQF-RNN combines neural quantile functions with recurrent neural networks, making it applicable across diverse time series datasets. The model uses a monotonically increasing neural quantile function and is trained with a continuous ranked probability score (CRPS)-based loss function. NQF-RNN’s performance is evaluated on synthetic datasets generated from multiple distributions and on six real-world time series datasets with both periodicity and irregularities. NQF-RNN demonstrates competitive performance on synthetic data and outperforms benchmarks on real-world data, achieving lower average forecast errors across most metrics. Notably, NQF-RNN surpasses benchmarks in CRPS, a key probabilistic metric, and in tail-weighted CRPS, which assesses tail-event forecasting with a narrow prediction interval. The model outperforms other deep learning models by 5% to 41% in CRPS, with improvements of 5% to 53% in left tail-weighted CRPS and 6% to 34% in right tail-weighted CRPS. Against its baseline model, DeepAR, NQF-RNN achieves a 41% improvement in CRPS, indicating its effectiveness in generating reliable prediction intervals. These results highlight NQF-RNN’s robustness in managing complex and irregular patterns in real-world forecasting scenarios.
Data Availability
All datasets can be downloaded from the following URLs:
- synthetic data: https://doi.org/10.7910/DVN/W04WWC
- electricity [1, 44]: https://archive.ics.uci.edu/dataset/321/electricityloaddiagrams20112014
- traffic [1, 44]: https://archive.ics.uci.edu/dataset/204/pems+sf
- solar [45]: https://www.nrel.gov/grid/solar-power-data.html
- M4-hourly [46]: https://github.com/Mcompetitions/M4-methods/tree/master
- tourism-monthly, tourism-quarterly [47]: https://robjhyndman.com/publications/the-tourism-forecasting-competition
References
Salinas D, Flunkert V, Gasthaus J, Januschowski T (2020) Deepar: Probabilistic forecasting with autoregressive recurrent networks. Int J Forecast 36(3):1181–1191
Rasul K, Sheikh A-S, Schuster I, Bergmann UM, Vollgraf R (2021) Multivariate probabilistic time series forecasting via conditioned normalizing flows. In: International conference on learning representations. https://openreview.net/forum?id=WiGQBFuVRv
Yang L, Zhang Z, Song Y, Hong S, Xu R, Zhao Y, Zhang W, Cui B, Yang M-H (2023) Diffusion models: A comprehensive survey of methods and applications. ACM Comput Surv 56(4):1–39
Cannon AJ (2011) Quantile regression neural networks: Implementation in r and application to precipitation downscaling. Comput Geosci 37(9):1277–1284
Gasthaus J, Benidis K, Wang Y, Rangapuram SS, Salinas D, Flunkert V, Januschowski T (2019) Probabilistic forecasting with spline quantile function rnns. In: The 22nd international conference on artificial intelligence and statistics, pp 1901–1910. PMLR
Chilinski P, Silva R (2020) Neural likelihoods via cumulative distribution functions. In: Conference on uncertainty in artificial intelligence, pp 420–429. PMLR
Alcántara A, Galván IM, Aler R (2022) Direct estimation of prediction intervals for solar and wind regional energy forecasting with deep neural networks. Eng Appl Artif Intell 114:105128
Sun R, Li C-L, Arik SÖ, Dusenberry MW, Lee C-Y, Pfister T (2023) Neural spline search for quantile probabilistic modeling. In: Proceedings of the AAAI conference on artificial intelligence, vol 37, pp 9927–9934
Wen R, Torkkola K, Narayanaswamy B, Madeka D (2017) A multi-horizon quantile recurrent forecaster. In: NIPS 2017 time series workshop
Zhang X-Y, Watkins C, Kuenzel S (2022) Multi-quantile recurrent neural network for feeder-level probabilistic energy disaggregation considering roof-top solar energy. Eng Appl Artif Intell 110:104707
Gouttes A, Rasul K, Koren M, Stephan J, Naghibi T (2021) Probabilistic time series forecasting with implicit quantile networks. In: ICML 2021 Time Series Workshop
Park Y, Maddix D, Aubet F-X, Kan K, Gasthaus J, Wang Y (2022) Learning quantile functions without quantile crossing for distribution-free time series forecasting. In: International conference on artificial intelligence and statistics, pp 8127–8150. PMLR
Hu J, Tang J, Liu Z (2024) A novel time series probabilistic prediction approach based on the monotone quantile regression neural network. Inf Sci 654:119844
Cannon AJ (2018) Non-crossing nonlinear regression quantiles by monotone composite quantile regression neural network, with application to rainfall extremes. Stoch Env Res Risk A 32:3207–3225
Koenker R, Hallock KF (2001) Quantile regression. J Econ Perspect 15(4):143–156
Wang Q, Ma Y, Zhao K, Tian Y (2020) A comprehensive survey of loss functions in machine learning. Ann Data Sci, pp 1–26
Zou H, Yuan M (2008) Composite quantile regression and the oracle model selection theory. Ann Stat 36(3):1108–1126
Moon SJ, Jeon J-J, Lee JSH, Kim Y (2021) Learning multiple quantiles with neural networks. J Comput Graph Stat 30(4):1238–1248
Gneiting T, Raftery AE (2007) Strictly proper scoring rules, prediction, and estimation. J Am Stat Assoc 102(477):359–378
Rasul K, Seward C, Schuster I, Vollgraf R (2021) Autoregressive denoising diffusion models for multivariate probabilistic time series forecasting. In: International conference on machine learning, pp 8857–8868. PMLR
Zhang H, Zhang Z (1999) Feedforward networks with monotone constraints. In: IJCNN’99. International joint conference on neural networks. Proceedings (Cat. No. 99CH36339), vol 3, pp 1820–1823. IEEE
Friederichs P, Thorarinsdottir TL (2012) Forecast verification for extreme value distributions with an application to probabilistic peak wind prediction. Environmetrics 23(7):579–594
Lin F, Zhang Y, Wang K, Wang J, Zhu M (2022) Parametric probabilistic forecasting of solar power with fat-tailed distributions and deep neural networks. IEEE Trans Sustain Energy 13(4):2133–2147
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
Jozefowicz R, Zaremba W, Sutskever I (2015) An empirical exploration of recurrent network architectures. In: International conference on machine learning, pp 2342–2350. PMLR
Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. Advances in neural information processing systems 27
Goodfellow I, Bengio Y, Courville A (2016) Deep Learning. MIT Press. http://www.deeplearningbook.org
Williams RJ, Zipser D (1989) A learning algorithm for continually running fully recurrent neural networks. Neural Comput 1(2):270–280
Rudin W (1976) Principles of Mathematical Analysis, vol 3. McGraw-Hill
Koenker R, Machado JA (1999) Goodness of fit and related inference processes for quantile regression. J Am Stat Assoc 94(448):1296–1310
Taylor JW (1999) Evaluating volatility and interval forecasts. J Forecast 18(2):111–128
Davis PJ, Rabinowitz P (2007) Methods of numerical integration. Courier Corporation
Van Der Walt S, Colbert SC, Varoquaux G (2011) The numpy array: a structure for efficient numerical computation. Comput Sci Eng 13(2):22–30
Gneiting T, Ranjan R (2011) Comparing density forecasts using threshold- and quantile-weighted scoring rules. J Bus Econ Stat 29(3):411–422
Zamo M, Naveau P (2018) Estimation of the continuous ranked probability score with limited information and applications to ensemble weather forecasts. Math Geosci 50(2):209–234
Bollerslev T (1986) Generalized autoregressive conditional heteroskedasticity. J Econom 31(3):307–327
Meinshausen N, Ridgeway G (2006) Quantile regression forests. J Mach Learn Res 7(6)
Petropoulos F, Apiletti D, Assimakopoulos V, Babai MZ, Barrow DK, Taieb SB, Bergmeir C, Bessa RJ, Bijak J, Boylan JE et al (2022) Forecasting: theory and practice. Int J Forecast 38(3):705–871
Hewamalage H, Bergmeir C, Bandara K (2021) Recurrent neural networks for time series forecasting: Current status and future directions. Int J Forecast 37(1):388–427
Kunz M, Birr S, Raslan M, Ma L, Januschowski T (2023) Deep learning based forecasting: a case study from the online fashion industry. In: Forecasting with artificial intelligence: theory and applications, pp 279–311. Springer
Chung J, Gulcehre C, Cho K, Bengio Y (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. In: NIPS 2014 workshop on deep learning, December 2014
Challu C, Olivares KG, Oreshkin BN, Ramirez FG, Canseco MM, Dubrawski A (2023) Nhits: Neural hierarchical interpolation for time series forecasting. Proceedings of the AAAI conference on artificial intelligence 37:6989–6997
Chen Z, Ma M, Li T, Wang H, Li C (2023) Long sequence time-series forecasting with deep learning: A survey. Inf Fusion 97:101819
Yu H-F, Rao N, Dhillon IS (2016) Temporal regularized matrix factorization for high-dimensional time series prediction. Advances in neural information processing systems 29
Lai G, Chang W-C, Yang Y, Liu H (2018) Modeling long- and short-term temporal patterns with deep neural networks. In: The 41st international ACM SIGIR conference on research & development in information retrieval, pp 95–104
Makridakis S, Spiliotis E, Assimakopoulos V (2018) The m4 competition: Results, findings, conclusion and way forward. Int J Forecast 34(4):802–808
Athanasopoulos G, Hyndman RJ, Song H, Wu DC (2011) The tourism forecasting competition. Int J Forecast 27(3):822–844
Funding
This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2023S1A5A2A03086550).
Author information
Contributions
Jungyoon Song: Conceptualization, Methodology, Visualization, Writing – original draft; Woojin Chang: Writing – review & editing, Supervision; Jae Wook Song: Conceptualization, Writing – review & editing, Validation, Supervision.
Ethics declarations
Ethical Approval
All datasets used in this article are publicly available, and no consent was required for their use.
Conflicts of Interest
The authors have no competing interests to declare that are relevant to the content of this article.
Competing interests
The authors have no financial interests that could have appeared to influence the work reported in this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A Abbreviations & Symbols
The abbreviations and symbols used in this paper are summarized in Table 9.
Appendix B Proper scoring rules
Based on the work of [19], let \(\mathcal{P}\) be a convex class of probability distributions on the sample space \(\Omega\), and let \(S(P, y)\) denote the score assigned when the forecast \(P \in \mathcal{P}\) is issued and \(y \in \Omega\) materializes. Write

$$S(P, Q) = \int_{\Omega} S(P, y)\,\mathrm{d}Q(y) \tag{B1}$$

for the expected score under \(Q\) when the probabilistic forecast is \(P\). The scoring rule \(S\) is proper relative to \(\mathcal{P}\) if

$$S(Q, Q) \ge S(P, Q) \quad \text{for all } P, Q \in \mathcal{P}. \tag{B2}$$

It is strictly proper relative to \(\mathcal{P}\) if (B2) holds with equality if and only if \(P = Q\).
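As a concrete link to the loss used in this paper: the negative CRPS is proper in the sense of (B2), and the CRPS admits a well-known quantile-function representation as an integral of the pinball (quantile) loss over all quantile levels, which is what makes it a natural training criterion for quantile-function models such as NQF-RNN:

$$\operatorname{CRPS}(F, y) = \int_{-\infty}^{\infty} \bigl(F(x) - \mathbb{1}\{y \le x\}\bigr)^{2}\,\mathrm{d}x = \int_{0}^{1} 2\,\rho_{\alpha}\bigl(y - F^{-1}(\alpha)\bigr)\,\mathrm{d}\alpha, \qquad \rho_{\alpha}(u) = u\bigl(\alpha - \mathbb{1}\{u < 0\}\bigr).$$

Here \(F\) is the forecast CDF, \(F^{-1}\) its quantile function, and \(\rho_\alpha\) the pinball loss at level \(\alpha\); note that the CRPS is negatively oriented (lower is better), whereas (B1)–(B2) are stated for positively oriented scores as in [19].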
Appendix C Detailed preprocessing of datasets
C.1 Synthetic datasets
The synthetic dataset used in this study comprises four distinct time series input variables. The first input variable consists of scaled observations from a one-lagged time series, denoted by \(z_{i,t-1}\). The second input variable serves as a time index, represented by \(t-1\) within the range \([0, \dots , 41]\). The third input variable represents weekly seasonality, calculated as \(t - 1 \;\text {mod}\; 7\), within the range \([0, \dots , 6]\). The fourth input variable is an item-related covariate, represented by an integer identifier that specifies the group to which a particular observation belongs. To standardize these input features, both the second and third variables are normalized using z-score normalization.
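For concreteness, the sketch below shows one way to assemble these four input variables for a single synthetic series of length 42; it is a minimal illustration with hypothetical names (T, zscore, the group identifier), not the paper's exact pipeline.

```python
import numpy as np

T = 42                      # series length, matching the time index range [0, ..., 41]
rng = np.random.default_rng(0)
z = rng.normal(size=T)      # placeholder for a scaled synthetic series z_{i,t}

# Variable 1: scaled one-lagged observation z_{i,t-1} (first step has no lag; pad with 0)
lag1 = np.concatenate(([0.0], z[:-1]))

# Variable 2: time index t-1 in [0, ..., 41]
time_idx = np.arange(T, dtype=float)

# Variable 3: weekly seasonality, (t-1) mod 7 in [0, ..., 6]
weekday = time_idx % 7

# Variable 4: item-related covariate, an integer group identifier (assumed value)
item_id = np.full(T, 3.0)

def zscore(x):
    return (x - x.mean()) / x.std()

# z-score normalization of the time-index and seasonality covariates only
features = np.stack([lag1, zscore(time_idx), zscore(weekday), item_id], axis=1)
print(features.shape)       # (42, 4): T time steps, D = 4 input variables
```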
C.2 Real-world datasets
For the real-world datasets, the input for each series is denoted as \(x_{i} \in \mathbb{R}^{T \times D}\), where each time step \(x_{i,t} \in \mathbb{R}^{D}\) stacks the supplementary time-dependent covariates. The first input variable is the scale-adjusted observation of the one-lagged time series, \(z_{i,t-1}\). Additional input variables include time-related covariates derived from the date information in each dataset, such as hours (ranging from 0 to 23), weekdays (ranging from 0 to 6), months (ranging from 1 to 12), and quarters (ranging from 1 to 4). These covariates are represented by integer values and used as input features.
Another time-related covariate, age, represents the temporal distance from the initial observation within each time series. This variable is adjusted according to the time granularity specific to each dataset.
Additionally, an item-related covariate is included to identify specific series within each dataset. This covariate is also represented by an integer value; for example, in the electricity dataset, it corresponds to the customer index (ranging from 0 to 370) and is linked to the input embedding dimension. The time-related covariates undergo z-score normalization to ensure consistency across varied scales and to enhance model performance. The dataset is generated using a stride equal to the decoder length to avoid overlapping instances in model training, validation, and testing during forecasting.
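A minimal sketch of this preprocessing for an hourly series is given below; the encoder/decoder lengths, the scaling rule, and the variable names are assumptions for illustration, not the paper's exact settings (monthly and quarterly datasets would use month/quarter covariates instead of hour/weekday).

```python
import numpy as np
import pandas as pd

enc_len, dec_len = 168, 24           # assumed encoder/decoder lengths for an hourly series
idx = pd.date_range("2014-01-01", periods=1000, freq="h")
z = pd.Series(np.random.rand(1000), index=idx)

# Time-related covariates derived from the date information, then z-score normalized
cov = pd.DataFrame({
    "hour": idx.hour,                # 0..23
    "weekday": idx.weekday,          # 0..6
    "month": idx.month,              # 1..12
    "age": np.arange(len(idx)),      # temporal distance from the first observation
}, index=idx)
cov = (cov - cov.mean()) / cov.std()

# Scale-adjusted one-lagged observation z_{i,t-1} (a simple per-series scale; the
# paper's exact scaling may differ)
scale = z.mean() + 1.0
lag1 = (z / scale).shift(1).fillna(0.0)

X = np.column_stack([lag1.values, cov.values])

# Non-overlapping instances: the window stride equals the decoder length
windows = [
    (X[s : s + enc_len], z.values[s + enc_len : s + enc_len + dec_len])
    for s in range(0, len(z) - enc_len - dec_len + 1, dec_len)
]
print(len(windows), windows[0][0].shape, windows[0][1].shape)
```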
Appendix D Hyperparameter sets for models
D.1 Synthetic datasets
For the synthetic datasets, the hyperparameters for each model are defined as follows. For the QRF model, the number of estimators is set to 50 or 100 and the maximum depth to 5, 10, or 15. Deep learning-based models share a standardized set of hyperparameters, including a learning rate of 0.005 or 0.01 and item-related embedding dimensions of 1 or 5. The QRNN, SQF-RNN, IQN-RNN, and NQF-RNN models are configured with RNN hidden dimensions of 20 or 60 across three hidden layers. The SQF-RNN model sets the number of knot positions L to 5 or 10. For the NQF-RNN model, the quantile function comprises DNN layers configured as [32, 16] or [8, 4]. The NHITS model uses an aggregation kernel size of [2, 2, 1], [2, 2, 2], or [4, 2, 1] and frequency downsampling of [4, 2, 1], [14, 7, 1], or [28, 14, 1], with three blocks, linear interpolation mode, and Maxpool1d as the pooling mode.
D.2 Real-world datasets
For the real-world datasets, the hyperparameters for each model are as follows. For the QRF model, the number of estimators is set to 50 or 100 and the maximum depth to 5, 10, or 15. Deep learning-based models employ a standard set of hyperparameters, including a learning rate of 0.001 or 0.005 and item-related embedding dimensions of 5 or 20. The QRNN, SQF-RNN, IQN-RNN, and NQF-RNN models have RNN hidden dimensions of 20, 40, or 60 across three hidden layers. The SQF-RNN model sets the number of knot positions L to 5 or 10. For the NQF-RNN model, the quantile function comprises DNN layers configured as [64, 32, 16, 8], [128, 64], [64, 32], or [16, 8]. The NHITS model uses an aggregation kernel size of [2, 2, 1], [2, 2, 2], or [4, 2, 1] and frequency downsampling of [4, 2, 1], [24, 12, 1], or [168, 24, 1], with three blocks, linear interpolation mode, and Maxpool1d as the pooling mode. For each hyperparameter set, the model achieving the best validation performance is selected for the final evaluation.
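For illustration, the NQF-RNN grid for the real-world datasets can be written down as a plain search space; the sketch below enumerates its combinations, and the validate() selection step is hypothetical, standing in for whatever validation routine is used.

```python
from itertools import product

# NQF-RNN search space for the real-world datasets (values copied from the text above)
grid = {
    "learning_rate": [0.001, 0.005],
    "embedding_dim": [5, 20],
    "rnn_hidden_dim": [20, 40, 60],      # three hidden layers in all cases
    "qf_dnn_layers": [[64, 32, 16, 8], [128, 64], [64, 32], [16, 8]],
}

configs = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(configs))  # 2 * 2 * 3 * 4 = 48 candidate configurations

# Hypothetical selection loop: keep the config with the best validation score
# best = min(configs, key=lambda cfg: validate(cfg))  # validate() is assumed, not shown
```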
Appendix E Visualization of probabilistic forecasting in four real-world datasets
Examples of results for the traffic, M4-hourly, tourism-monthly, and tourism-quarterly datasets are presented in Figs. E1, E2, E3, and E4, respectively.
In panel (a), the solid gray line outside the yellow box represents the conditioning range, which serves as input to the models along with various covariates. Using this input, the models perform multistep-ahead forecasting over the prediction range, depicted by a solid black line. Panels (b) through (j) display the probabilistic forecasting results within the prediction range for the different models. The solid red line denotes the point forecasts, while the dotted red, orange, and yellow lines correspond to the 0.1/0.9, 0.05/0.95, and 0.01/0.99 quantile pairs bounding the prediction intervals, respectively.
Figure E1 suggests that the forecasts across models exhibit comparable patterns but differ in the scale of the \(y\)-axis. Figure E2 shows that, while the time series generally maintain consistent patterns, notable scale differences are also present. Figure E3 indicates that the time series share similar scales but exhibit irregularities. In Fig. E4, the time series predominantly adhere to consistent historical patterns.
In terms of performance, the GARCH and QRF models demonstrate inadequate predictive accuracy. QRNN encounters challenges in forecasting beyond the median range, while DeepAR tends to produce relatively broader prediction intervals compared to other probabilistic time series models. In contrast, other deep learning-based probabilistic time series models exhibit more effective pattern learning and generate reasonable prediction intervals. Notably, NQF-RNN achieves the narrowest prediction intervals across all datasets, effectively capturing the evolution of the original time series.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Song, J., Chang, W. & Song, J.W. NQF-RNN: probabilistic forecasting via neural quantile function-based recurrent neural networks. Appl Intell 55, 183 (2025). https://doi.org/10.1007/s10489-024-06077-7