
Learning Solutions of Stochastic Optimization Problems with Bayesian Neural Networks

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2024 (ICANN 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15016)


Abstract

Mathematical solvers use parametrized Optimization Problems (OPs) as inputs to yield optimal decisions. In many real-world settings, some of these parameters are unknown or uncertain. Recent research focuses on predicting the values of these unknown parameters from available contextual features, aiming to decrease decision regret through end-to-end learning approaches. However, these approaches disregard prediction uncertainty and therefore leave the mathematical solver susceptible to producing erroneous decisions when predictions have low confidence. We propose a novel framework that models prediction uncertainty with Bayesian Neural Networks (BNNs) and propagates this uncertainty into the mathematical solver with a Stochastic Programming technique. The differentiable nature of BNNs and differentiable mathematical solvers allows for two different learning approaches: in the Decoupled learning approach, we update the BNN weights to increase the quality of the predictive distribution of the OP parameters, while in the Combined learning approach, we update the weights to directly minimize the expected OP cost function in a stochastic end-to-end fashion. We perform an extensive evaluation on synthetic data with various noise properties and on a real dataset, showing that decision regret is generally lower (better) with both proposed methods. The code is available at https://github.com/AlanLahoud/BNNSOP.


References

  1. Agrawal, A., Amos, B., Barratt, S., Boyd, S., Diamond, S., Kolter, J.Z.: Differentiable convex optimization layers. In: Advances in Neural Information Processing Systems, vol. 32. Curran Associates Inc. (2019)

  2. Amos, B., Kolter, J.Z.: OptNet: differentiable optimization as a layer in neural networks. In: International Conference on Machine Learning, pp. 136–145. PMLR (2017)

  3. Ban, G.Y., Rudin, C.: The big data newsvendor: practical insights from machine learning. Oper. Res. 67(1), 90–108 (2019)

  4. Bayraksan, G., Love, D.K.: Data-driven stochastic programming using phi-divergences. In: The Operations Research Revolution, pp. 1–19. INFORMS (2015)

  5. Bell, D.E.: Regret in decision making under uncertainty. Oper. Res. 30(5), 961–981 (1982)

  6. Birge, J.R., Louveaux, F.: Introduction to Stochastic Programming. Springer Science & Business Media (2011)

  7. Blundell, C., Cornebise, J., Kavukcuoglu, K., Wierstra, D.: Weight uncertainty in neural network. In: International Conference on Machine Learning, pp. 1613–1622. PMLR (2015)

  8. Demirović, E., et al.: An investigation into prediction + optimisation for the knapsack problem

  9. Donti, P., Amos, B., Kolter, J.Z.: Task-based end-to-end model learning in stochastic optimization. Adv. Neural Inf. Process. Syst. 30 (2017)

  10. Elmachtoub, A.N., Grigas, P.: Smart "predict, then optimize". Manage. Sci. 68(1), 9–26 (2017)

  11. Ferber, A., Wilder, B., Dilkina, B., Tambe, M.: MIPaaL: mixed integer program as a layer. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 1504–1511 (2020)

  12. Gah-Yi, B., Rudin, C.: The big data newsvendor: practical insights from machine learning. Oper. Res. 67(1), 90–108 (2018)

  13. Grimes, D., Ifrim, G., O'Sullivan, B., Simonis, H.: Analyzing the impact of electricity price forecasting on energy cost-aware scheduling. Sustain. Comput. Inform. Syst. 4(4), 276–291 (2014). Special Issue on Energy Aware Resource Management and Scheduling (EARMS)

  14. Hannah, L.A.: Stochastic optimization. Int. Encycl. Soc. Behav. Sci. 2, 473–481 (2015)

  15. Hoseinzade, E., Haratizadeh, S.: CNNpred: CNN-based stock market prediction using a diverse set of variables. Expert Syst. Appl. 129, 273–285 (2019)

  16. Hüllermeier, E., Waegeman, W.: Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods. Mach. Learn. 110(3), 457–506 (2021)

  17. Ifrim, G., O'Sullivan, B., Simonis, H.: Properties of energy-price forecasts for scheduling. In: Milano, M. (ed.) CP 2012. LNCS, pp. 957–972. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33558-7_68

  18. Jospin, L.V., Laga, H., Boussaid, F., Buntine, W., Bennamoun, M.: Hands-on Bayesian neural networks – a tutorial for deep learning users. IEEE Comput. Intell. Mag. 17(2), 29–48 (2022)

  19. Kendall, A., Gal, Y.: What uncertainties do we need in Bayesian deep learning for computer vision? Adv. Neural Inf. Process. Syst. 30 (2017)

  20. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: Bengio, Y., LeCun, Y. (eds.) 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, 14–16 April 2014, Conference Track Proceedings (2014)

  21. Kong, L., Cui, J., Zhuang, Y., Feng, R., Prakash, B.A., Zhang, C.: End-to-end stochastic optimization with energy-based model. Adv. Neural Inf. Process. Syst. 35, 11341–11354 (2022)

  22. Lahoud, A.A., Schaffernicht, E., Stork, J.A.: DataSP: a differential all-to-all shortest path algorithm for learning costs and predicting paths with context. arXiv preprint arXiv:2405.04923 (2024)

  23. Li, X., Shou, B., Qin, Z.: An expected regret minimization portfolio selection model. Eur. J. Oper. Res. 218(2), 484–492 (2012)

  24. Mandi, J., Guns, T.: Interior point solving for LP-based prediction+optimisation. Adv. Neural Inf. Process. Syst. 33, 7272–7282 (2020)

  25. Pearce, T., Leibfried, F., Brintrup, A.: Uncertainty in neural networks: approximately Bayesian ensembling. In: International Conference on Artificial Intelligence and Statistics, pp. 234–244. PMLR (2020)

  26. Powell, W.B.: A unified framework for stochastic optimization. Eur. J. Oper. Res. 275(3), 795–821 (2019)

  27. Rockafellar, R.T., Uryasev, S., et al.: Optimization of conditional value-at-risk. J. Risk 2, 21–42 (2000)

  28. Wilder, B., Dilkina, B., Tambe, M.: Melding the data-decisions pipeline: decision-focused learning for combinatorial optimization. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 1658–1665 (2019)


Acknowledgement

This work has been supported by the Industrial Graduate School Collaborative AI & Robotics funded by the Swedish Knowledge Foundation Dnr:20190128, and the Knut and Alice Wallenberg Foundation through Wallenberg AI, Autonomous Systems and Software Program (WASP).

Author information

Correspondence to Alan A. Lahoud.

Appendices

Appendix A. Limitations of Uncertainty Propagation

This paper focuses on computing \(\arg\min_{\boldsymbol{z}} \mathbb{E}_{\hat{\boldsymbol{y}}}\,f(\boldsymbol{z}, \hat{\boldsymbol{y}})\). This problem simplifies to \(\arg\min_{\boldsymbol{z}} f(\boldsymbol{z}, \mathbb{E}_{\hat{\boldsymbol{y}}}\hat{\boldsymbol{y}})\) when the expected value of the objective function is replaced by the objective evaluated at the expected value of the predictions, but this simplification is only valid under certain conditions. If those conditions are met, we recommend solving the argmin by directly computing the expected value of the predictions (Decoupled).
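As a sanity check of this distinction, the following minimal numerical sketch (not from the paper; the cost function and distribution are illustrative placeholders) contrasts the minimizer of the expected cost with the minimizer obtained by plugging in the expected prediction:

```python
# Minimal numerical illustration (placeholder data): for a nonlinear cost,
# E_y[f(z, y)] generally differs from f(z, E[y]), so optimizing the plug-in
# objective can pick a different z than the stochastic objective.
import numpy as np

rng = np.random.default_rng(0)
y_samples = rng.normal(loc=2.0, scale=1.5, size=100_000)  # predictive samples of y

def f(z, y):
    # Asymmetric newsvendor-style cost: shortage is 4x as expensive as excess.
    return 4.0 * np.maximum(y - z, 0.0) + 1.0 * np.maximum(z - y, 0.0)

z_grid = np.linspace(-2.0, 8.0, 1001)
expected_cost = np.array([f(z, y_samples).mean() for z in z_grid])   # E_y[f(z, y)]
plugin_cost   = np.array([f(z, y_samples.mean()) for z in z_grid])   # f(z, E[y])

print("argmin of E_y[f(z, y)]:", z_grid[expected_cost.argmin()])  # near the 0.8-quantile of y
print("argmin of f(z, E[y])  :", z_grid[plugin_cost.argmin()])    # near the mean of y
```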

Appendix A.1. Linear Objective Functions with Respect to the Unknown Variable

If \(f(\boldsymbol{z}, \boldsymbol{y})\) is linear with respect to \(\boldsymbol{y}\), then \(\mathbb{E}_{\hat{\boldsymbol{y}}}\,f(\boldsymbol{z}, \hat{\boldsymbol{y}}) = f(\boldsymbol{z}, \mathbb{E}_{\hat{\boldsymbol{y}}}\hat{\boldsymbol{y}})\). Applying the argmin with respect to \(\boldsymbol{z}\) on both sides, we have \(\arg\min_{\boldsymbol{z}} \mathbb{E}_{\hat{\boldsymbol{y}}}\,f(\boldsymbol{z}, \hat{\boldsymbol{y}}) = \arg\min_{\boldsymbol{z}} f(\boldsymbol{z}, \mathbb{E}_{\hat{\boldsymbol{y}}}\hat{\boldsymbol{y}})\).

Appendix A.2. Balanced Newsvendor Problem

When \(c_s = c_e\) in the NV problem, the optimal order quantity \(\arg\min_{z} \mathbb{E}_{\hat{y}}\,f(z, \hat{y})\) corresponds to the median of \(\hat{y}\)'s distribution, given by the \(\frac{c_s}{c_s+c_e} = 0.5\) quantile. If \(\hat{y}\)'s distribution is Gaussian, this median equals the mean, simplifying the argmin to the mean of \(\hat{y}\). This observation extends to both Gaussian models and the NVQP, highlighting that propagating uncertainty becomes more beneficial as the imbalance between \(c_s\) and \(c_e\) grows.
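A small sketch of this quantile argument, assuming a scalar newsvendor with Gaussian predictive samples (the costs and distribution parameters below are illustrative placeholders, not the paper's settings):

```python
# The optimal scalar newsvendor order is the c_s / (c_s + c_e) quantile of the
# predictive distribution. With c_s == c_e and a symmetric (e.g. Gaussian)
# distribution, this collapses to the mean, so uncertainty propagation adds little.
import numpy as np

rng = np.random.default_rng(1)
y_hat = rng.normal(loc=10.0, scale=2.0, size=50_000)  # predictive samples of demand

def newsvendor_order(y_samples, c_s, c_e):
    return np.quantile(y_samples, c_s / (c_s + c_e))

print(newsvendor_order(y_hat, c_s=1.0, c_e=1.0))  # ~10.0: the mean suffices
print(newsvendor_order(y_hat, c_s=9.0, c_e=1.0))  # ~12.6: the spread of y_hat now matters
```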

Appendix B. Newsvendor Problem as Quadratic Programming

Following [9, 12], we reformulate Eq. 8 by introducing new decision variables \(\boldsymbol{z_s} = \boldsymbol{y}-\boldsymbol{z}\) and \(\boldsymbol{z_e} = \boldsymbol{z}-\boldsymbol{y}\), with added constraints to align with the original problem's bounds. This leads to a QP formulation: \(\arg\min_{\boldsymbol{v}} \frac{1}{2}\boldsymbol{v}^\intercal \boldsymbol{H} \boldsymbol{v} + \boldsymbol{k}^\intercal \boldsymbol{v}\) subject to \(\boldsymbol{A}\boldsymbol{v}\preceq \boldsymbol{b}\), where \(\boldsymbol{H} = 2\,\mathrm{diag}[\boldsymbol{Q}, \boldsymbol{Q_s}, \boldsymbol{Q_e}]\), \(\boldsymbol{v} = [\boldsymbol{z}, \boldsymbol{z_s}, \boldsymbol{z_e}]\), \(\boldsymbol{k} = [\boldsymbol{c}, \boldsymbol{c_s}, \boldsymbol{c_e}]\), \(\boldsymbol{A} = [-\boldsymbol{I_{3d_z}}, [-\boldsymbol{I_{d_z}}, -\boldsymbol{I_{d_z}}, 0], [\boldsymbol{I_{d_z}}, 0, -\boldsymbol{I_{d_z}}], [\boldsymbol{p}, 0, 0]]^{\intercal}\), and \(\boldsymbol{b}=[0, 0, 0, -\boldsymbol{y}, \boldsymbol{y}, B]\). Here \(\boldsymbol{z}^*(\boldsymbol{y})\) is the primary variable of interest. Assuming \(\boldsymbol{H}\) is positive definite ensures convexity. The formulation's efficiency depends on the item count. It is initially suitable for a single prediction vector \(\boldsymbol{y}\), but we propose a Stochastic Programming extension that generalizes it to multiple predictions.
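For concreteness, a minimal cvxpy sketch of this single-prediction QP is shown below; it states the problem with explicit variables z, z_s, z_e rather than the stacked (H, k, A, b) form, and all numeric values are placeholders rather than the paper's settings.

```python
# Sketch of the quadratic newsvendor (NVQP) above for one prediction vector y.
# Names mirror the text (Q, Q_s, Q_e, c, c_s, c_e, p, B); values are illustrative.
import cvxpy as cp
import numpy as np

d_z = 3
y = np.array([4.0, 2.0, 6.0])                       # predicted demands
c, c_s, c_e = np.ones(d_z), 4.0 * np.ones(d_z), 1.0 * np.ones(d_z)
Q, Q_s, Q_e = 0.1 * np.eye(d_z), 0.2 * np.eye(d_z), 0.2 * np.eye(d_z)
p, B = np.ones(d_z), 10.0                           # prices and budget

z   = cp.Variable(d_z)                              # order quantities (primary variable)
z_s = cp.Variable(d_z)                              # shortage slack, stands for y - z
z_e = cp.Variable(d_z)                              # excess slack, stands for z - y

objective = (c @ z + c_s @ z_s + c_e @ z_e
             + cp.quad_form(z, Q) + cp.quad_form(z_s, Q_s) + cp.quad_form(z_e, Q_e))
constraints = [z >= 0, z_s >= 0, z_e >= 0,
               z_s >= y - z, z_e >= z - y,          # linearized max(., 0) terms
               p @ z <= B]                          # budget row [p, 0, 0] of A
cp.Problem(cp.Minimize(objective), constraints).solve()
print("z*(y) =", z.value)
```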

Appendix B.1. Newsvendor Problem as Stochastic Quadratic Programming

When propagating the uncertainty of \(\boldsymbol{y}\) in a Monte Carlo fashion with \(M\) samples, the formulation above becomes \(\arg\min_{\boldsymbol{v}} \frac{1}{2}\boldsymbol{v}^\intercal \boldsymbol{H} \boldsymbol{v} + \boldsymbol{k}^\intercal \boldsymbol{v}\) s.t. \(\boldsymbol{A}\boldsymbol{v}\preceq \boldsymbol{b}\), where \(\boldsymbol{H} = 2\,\mathrm{diag}([\boldsymbol{Q} \quad \frac{\boldsymbol{Q_s}}{M} \quad \dots \quad \frac{\boldsymbol{Q_s}}{M} \quad \frac{\boldsymbol{Q_e}}{M} \quad \dots \quad \frac{\boldsymbol{Q_e}}{M}])\); \(\boldsymbol{k} = [\boldsymbol{c} \quad \frac{\boldsymbol{c_s}}{M} \quad \dots \quad \frac{\boldsymbol{c_s}}{M} \quad \frac{\boldsymbol{c_e}}{M} \quad \dots \quad \frac{\boldsymbol{c_e}}{M}]\); \(\boldsymbol{v} = [\boldsymbol{z} \quad \boldsymbol{z_s}^{(1)} \quad \dots \quad \boldsymbol{z_s}^{(M)} \quad \boldsymbol{z_e}^{(1)} \quad \dots \quad \boldsymbol{z_e}^{(M)}]\); \(\boldsymbol{A} = [-\boldsymbol{I_F}, [-\boldsymbol{I_{B1}}, -\boldsymbol{I_{BB}}, \boldsymbol{0_{BB}}], [\boldsymbol{I_{B1}}, \boldsymbol{0_{BB}}, -\boldsymbol{I_{BB}}], [\boldsymbol{p}, 0, 0]]^\intercal\); and \(\boldsymbol{b} = [\boldsymbol{0_{F}}, -\boldsymbol{y}^{(1)}, \dots, -\boldsymbol{y}^{(M)}, \boldsymbol{y}^{(1)}, \dots, \boldsymbol{y}^{(M)}, B]\), where \(\boldsymbol{I_F} = \boldsymbol{I_{d_z + 2Md_z}}\), \(\boldsymbol{0_F} = \boldsymbol{0_{d_z + 2Md_z}}\) (a 1D vector), \(\boldsymbol{I_{BB}} = \boldsymbol{I_{Md_z}}\), \(\boldsymbol{0_{BB}} = \boldsymbol{0_{Md_z}}\), and \(\boldsymbol{I_{B1}}\) is \(\boldsymbol{I_{d_z}}\) repeated for \(M\) rows. This is a generalization of the quadratic newsvendor experiment proposed in [9]. Note that \(\boldsymbol{v} \in \mathbb{R}^{d_z+2Md_z}\), \(\boldsymbol{H} \in \mathbb{R}^{(d_z+2Md_z) \times (d_z+2Md_z)}\), \(\boldsymbol{k} \in \mathbb{R}^{d_z+2Md_z}\), \(\boldsymbol{A} \in \mathbb{R}^{(d_z + 4Md_z+1) \times (d_z+2Md_z)}\), and \(\boldsymbol{b} \in \mathbb{R}^{d_z + 4Md_z+1}\). Therefore, both the number of items and the prediction sample size play an important and approximately equal role in the time needed to solve each instance of the OP. In practice, the complexity of the QP problem depends on the decision variable dimension, which is \(d_z+2Md_z\).
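A corresponding sketch of this Monte Carlo (sample-average) version, again in cvxpy with placeholder data; each sample j receives its own slack pair while the order vector z is shared:

```python
# Sketch of the stochastic NVQP above with M predictive samples (illustrative data).
import cvxpy as cp
import numpy as np

d_z, M = 3, 16
rng = np.random.default_rng(0)
Y = rng.normal(4.0, 1.0, size=(M, d_z))             # M sampled demand vectors y^(j)
c, c_s, c_e = np.ones(d_z), 4.0 * np.ones(d_z), 1.0 * np.ones(d_z)
Q, Q_s, Q_e = 0.1 * np.eye(d_z), 0.2 * np.eye(d_z), 0.2 * np.eye(d_z)
p, B = np.ones(d_z), 12.0

z = cp.Variable(d_z)                                 # shared order vector
Zs = cp.Variable((M, d_z))                           # shortage slacks, one row per sample
Ze = cp.Variable((M, d_z))                           # excess slacks, one row per sample

per_sample = sum(c_s @ Zs[j] + cp.quad_form(Zs[j], Q_s)
                 + c_e @ Ze[j] + cp.quad_form(Ze[j], Q_e) for j in range(M))
objective = c @ z + cp.quad_form(z, Q) + per_sample / M   # sample average over the M scenarios

constraints = [z >= 0, Zs >= 0, Ze >= 0, p @ z <= B]
constraints += [Zs[j] >= Y[j] - z for j in range(M)]
constraints += [Ze[j] >= z - Y[j] for j in range(M)]

cp.Problem(cp.Minimize(objective), constraints).solve()
print("stochastic z* =", z.value)                    # decision dimension: d_z + 2*M*d_z
```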

Appendix C. Portfolio Risk Minimization as an LP

With the same strategy as in Appendix B, we use the auxiliary variable \(\boldsymbol{u} = \max \{-\boldsymbol{y}^\intercal \boldsymbol{z}, 0\}\) to rewrite the POP formulation from the main text as

$$\begin{aligned} (\boldsymbol{z}^*, \boldsymbol{u}^*)(\boldsymbol{y}) = \arg\min_{\boldsymbol{z}, \boldsymbol{u}} \; \boldsymbol{0}^\intercal \boldsymbol{z} + \boldsymbol{u} \quad \text{s.t. } -[\boldsymbol{z}, \boldsymbol{u}] \preceq 0, \; -\boldsymbol{y}^\intercal \boldsymbol{z} \preceq \boldsymbol{u}, \; -\boldsymbol{p}^\intercal \boldsymbol{z} \le -R. \end{aligned}$$
(6)

Note that the zero constant in the objective function is only to reinforce that \(\boldsymbol{z}\) is also part of the set of decision variables.
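A minimal cvxpy sketch of Eq. (6) follows; the return vector y, the nominal returns p, and the required return R below are illustrative placeholders, not the paper's values.

```python
# Single-sample portfolio LP: u is the auxiliary variable standing in for max(-y^T z, 0).
import cvxpy as cp
import numpy as np

d_z = 4
y = np.array([0.02, -0.01, 0.03, 0.01])             # predicted asset returns
p = np.array([0.015, 0.005, 0.025, 0.010])          # nominal returns used in the constraint
R = 0.01                                            # required portfolio return

z = cp.Variable(d_z)                                # portfolio weights
u = cp.Variable()                                   # u >= max(-y @ z, 0)

objective = np.zeros(d_z) @ z + u                   # zero coefficients keep z in the objective
constraints = [z >= 0, u >= 0, -y @ z <= u, p @ z >= R]
cp.Problem(cp.Minimize(objective), constraints).solve()
print("z* =", z.value, " u* =", u.value)
```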

Appendix C.1. Portfolio Risk Minimization as a Stochastic LP

Given a set of samples \(\boldsymbol{y}^{(j)}\) as input, as suggested in [27], the equation above can be rewritten in a stochastic programming fashion as

$$\begin{aligned} (\boldsymbol{z}^*(\boldsymbol{y}), \boldsymbol{u}^*(\boldsymbol{y})) = \arg\min_{\boldsymbol{z}, \boldsymbol{u}} \; &\boldsymbol{0}^\intercal \boldsymbol{z} + \frac{1}{M} \sum_{j=1}^{M} \boldsymbol{u}^{(j)}\\ \text{s.t. } & -\boldsymbol{z} \preceq 0, \; -\boldsymbol{u}^{(j)} \preceq \boldsymbol{0}, \; -(\boldsymbol{y}^{(j)})^\intercal \boldsymbol{z} - \boldsymbol{u}^{(j)} \preceq \boldsymbol{0} \quad \forall j \in \{1, \dots, M\},\\ & -\boldsymbol{p}^\intercal \boldsymbol{z} \le -R. \end{aligned}$$
(7)

For implementation purposes, we follow [28] and add a small quadratic term to the LP so that the OP fits the Amos & Kolter QP solver [2].
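A sketch of the stochastic LP (7) with such a small quadratic regularizer, written in cvxpy for readability (the paper's pipeline instead routes the resulting QP through a differentiable solver); the data and the regularization weight eps are placeholders:

```python
# Stochastic portfolio LP with one auxiliary u^(j) per sample and a small
# quadratic term (eps) so the problem becomes a strictly convex QP.
import cvxpy as cp
import numpy as np

d_z, M, eps = 4, 32, 1e-4
rng = np.random.default_rng(0)
Y = rng.normal(0.01, 0.02, size=(M, d_z))            # sampled return vectors y^(j)
p, R = np.full(d_z, 0.01), 0.008

z = cp.Variable(d_z)
u = cp.Variable(M)                                   # auxiliary variables u^(1..M)

objective = cp.sum(u) / M + eps * (cp.sum_squares(z) + cp.sum_squares(u))
constraints = [z >= 0, u >= 0, p @ z >= R]
constraints += [-Y[j] @ z <= u[j] for j in range(M)]

cp.Problem(cp.Minimize(objective), constraints).solve()
print("z* =", z.value)
```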

Appendix D. Implementation Details

NNs were implemented in PyTorch and trained with the Adam optimizer, with learning rates of 0.0015 for the NV, 0.002 for the NVQP, and 0.001 for the POP experiments. The Decoupled Bayesian Neural Network (BNN) used a learning rate of 0.0007, whereas the Combined BNN's rate ranged between 0.0004 and 0.0007. An exponential scheduler adjusted the learning rate by a factor of 0.99. The hyperparameter K balances the data loss and the regularization term; it was selected without hyperparameter optimization. Training was performed on Nvidia RTX 2080 GPUs, with models evaluated on the validation set before reporting test results. Gaussian Process baselines were implemented with scikit-learn using a radial basis function kernel, optimizing the length scale and the white-noise level. For multi-output tasks (NVQP and POP), separate Gaussian Processes for each output proved more effective.
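A minimal PyTorch sketch of the described optimizer and scheduler setup; the model architecture and loss below are placeholders, and only the Adam learning rate and the 0.99 exponential decay follow the text.

```python
# Adam + exponential learning-rate decay (gamma=0.99), as described above.
import torch

model = torch.nn.Sequential(torch.nn.Linear(10, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=0.0015)           # e.g. the NV setting
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.99)

for epoch in range(100):
    optimizer.zero_grad()
    loss = model(torch.randn(32, 10)).pow(2).mean()                   # placeholder loss
    loss.backward()
    optimizer.step()
    scheduler.step()                                                   # lr *= 0.99 each epoch
```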

Rights and permissions

Reprints and permissions

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Lahoud, A.A., Schaffernicht, E., Stork, J.A. (2024). Learning Solutions of Stochastic Optimization Problems with Bayesian Neural Networks. In: Wand, M., Malinovská, K., Schmidhuber, J., Tetko, I.V. (eds) Artificial Neural Networks and Machine Learning – ICANN 2024. ICANN 2024. Lecture Notes in Computer Science, vol 15016. Springer, Cham. https://doi.org/10.1007/978-3-031-72332-2_11


  • DOI: https://doi.org/10.1007/978-3-031-72332-2_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72331-5

  • Online ISBN: 978-3-031-72332-2

  • eBook Packages: Computer Science, Computer Science (R0)
