Abstract
Carbon emissions are a critical global concern, and reducing energy consumption and emissions effectively is a challenge for the industrial sector that is strongly emphasized in supply chain management. The complexity arises from the intricate coupling between carbon trading and ordering: the large state space and the various constraints involved make cost optimization difficult, and carbon quota constraints combined with sequential decision-making further compound the challenge for businesses. Existing research relies on rule-based and heuristic numerical simulation, which struggles to adapt to time-varying environments. We develop a unified framework from the perspective of Constrained Markov Decision Processes (CMDPs). Constrained deep reinforcement learning (DRL), with the powerful high-dimensional representations of neural networks and effective decision-making under constraints, provides a potential solution for supply chain management with carbon trading and is a crucial tool for studying enterprise cost optimization. This paper constructs a constrained DRL algorithm, Double Order based on PPO-Lagrangian (DOPPOL), to address a supply chain management model that integrates carbon trading decisions with ordering decisions. The results indicate that businesses can optimize both business and carbon costs, thereby increasing overall profit, while adapting to various demand uncertainties; DOPPOL outperforms the traditional (s, S) method in fluctuating demand scenarios. By introducing carbon trading, enterprises can adjust supply chain orders and carbon emissions through interaction and thereby improve operational efficiency. Finally, we emphasize the significant role of carbon pricing in enterprise contracts with respect to profitability, as reasonable prices help control carbon emissions and reduce costs. Our research is of great importance for climate change control and for promoting sustainability.
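As background for the abstract's framing, a minimal sketch of the generic CMDP objective and its PPO-Lagrangian relaxation follows; the notation (reward return $J_r$, cost return $J_c$, threshold $d$, multiplier $\lambda$, step size $\eta$) is ours and not necessarily the paper's, and reading the cost $c$ as per-period carbon emissions with $d$ the carbon quota is an assumption based on the abstract:

$$
\max_{\pi}\ J_r(\pi)=\mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty}\gamma^{t}\,r(s_t,a_t)\Big]
\quad\text{s.t.}\quad
J_c(\pi)=\mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty}\gamma^{t}\,c(s_t,a_t)\Big]\le d.
$$

PPO-Lagrangian relaxes this constrained problem into the unconstrained saddle-point problem

$$
\min_{\lambda\ge 0}\ \max_{\pi}\ \mathcal{L}(\pi,\lambda)=J_r(\pi)-\lambda\big(J_c(\pi)-d\big),
$$

updating the policy with PPO's clipped surrogate objective applied to $\mathcal{L}$ and the multiplier by projected gradient ascent, $\lambda\leftarrow\max\big(0,\ \lambda+\eta\,(J_c(\pi)-d)\big)$, so that $\lambda$ grows while the cost constraint is violated and decays toward zero when it is satisfied. The (s, S) baseline mentioned above is the classical inventory rule that orders up to level S whenever the inventory position falls below the reorder point s.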
Acknowledgements
This work was sponsored by the National Science and Technology Major Project of China (2022ZD0114900), the National Natural Science Foundation of China (62376013), the Young Elite Scientists Sponsorship Program by CAST (2022QNRC002), and the Beijing Municipal Science & Technology Commission (Z231100007423015).
Author information
Contributions
QW wrote the main manuscript text and prepared the figures. YY provided valuable suggestions. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, Q., Yang, Y. Carbon trading supply chain management based on constrained deep reinforcement learning. Auton Agent Multi-Agent Syst 38, 38 (2024). https://doi.org/10.1007/s10458-024-09669-2