Abstract
Multimedia content delivery in advanced networks faces exponential growth in data volumes, rendering existing solutions inadequate. This research investigates deep reinforcement learning (DRL) for autonomous optimization without extensive pre-collected datasets. The work compares two prominent DRL algorithms, Dueling Deep Q-Network (DDQN) and Deep Q-Network (DQN), for multimedia delivery in simulated bus networks. DDQN uses a “dueling” architecture that estimates the state value and the per-action advantages separately, accelerating learning; DQN approximates the optimal policy with a single deep neural network. The environment simulates urban buses whose passenger file requests and cache sizes are modeled on real-world data. A comparative analysis evaluates cumulative rewards and losses over 1500 training episodes to assess learning efficiency, stability, and performance. Results show that DDQN converges faster and achieves 32% higher cumulative rewards than DQN, although DQN improved over successive runs despite inconsistencies. The study establishes DRL’s promise for automated decision-making in content caching while identifying enhancements needed for DQN. Future research should evaluate generalizability across problem domains, investigate hybrid models, and test on physical systems. Overall, DDQN emerged as the more efficient algorithm, highlighting DRL’s potential to enable intelligent agents that optimize multimedia delivery.
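The abstract's distinction between the two algorithms rests on how the dueling architecture recombines its two streams into Q-values: Q(s, a) = V(s) + A(s, a) − mean_a A(s, a), where subtracting the mean advantage keeps the decomposition identifiable. The following is a minimal sketch of that aggregation step only (the function name and plain-list inputs are illustrative, not from the paper, and the full networks are omitted):

```python
def dueling_q_values(state_value, advantages):
    """Combine the dueling streams into Q-values.

    Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)

    state_value: scalar output of the value stream, V(s).
    advantages:  list of per-action outputs of the advantage stream, A(s, a).
    """
    # Center the advantages so that V(s) alone carries the state's value,
    # making the V/A decomposition unique.
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


# Example: one cache-decision state with three candidate actions.
q = dueling_q_values(2.0, [1.0, -1.0, 0.0])
# The action ordering follows the advantages; here action 0 is preferred.
```

In a plain DQN, by contrast, a single network head emits the Q-values directly, with no such decomposition.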
Data availability
The data that support the findings of this study are available from the corresponding author, upon reasonable request.
Funding
This study was funded by (1) the Guangxi Vocational Education Teaching Reform Research Project, China, “Exploration and Practice of the Chain Teaching System of ‘Six in One’ in Higher Vocational Colleges under the Perspective of ‘Entrepreneurship and Innovation’,” project number GXGZJG2021B096, and (2) the Project of Improving the Basic Scientific Research Ability of Young and Middle-aged Teachers in Colleges and Universities in Guangxi, China, “Research on the Development of a ‘Smart Car Service’ Operation Management Training System Based on WeChat,” project number 2022KY1235.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
Informed consent was obtained from all the individual participants included in the study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lan, D., Shin, I. Boosting in-transit entertainment: deep reinforcement learning for intelligent multimedia caching in bus networks. Soft Comput 27, 19359–19375 (2023). https://doi.org/10.1007/s00500-023-09354-8