Fast reinforcement learning algorithms for joint adaptive source coding and transmission control in IoT devices with renewable energy storage

Namjoonia, Farnoosh; Sheikhi, Marzieh; Hakami, Vesal

doi:10.1007/s00521-021-06656-6

Original Article
Published: 04 January 2022

Neural Computing and Applications, volume 34, pages 3959–3979 (2022)

Abstract

Energy harvesting and source coding are two key techniques for mitigating the battery limitations of Internet of Things (IoT) devices. However, these techniques come at the expense of added complexity in the control and optimization of the digital communication chain. In this paper, to strike a balance between the competing goals of energy-efficient communication, high-fidelity reconstruction at the IoT gateway, low packet drop ratio, and timeliness of data, we address the delay-constrained joint lossy compression and transmission control problem for an IoT device with rechargeable energy storage. Given the stochastic dynamics arising from the random nature of the energy harvesting process and of the fading channel, we formulate a stochastic optimization problem as a constrained Markov decision process (CMDP) and use the standard Lagrangian technique to recast the problem in unconstrained form. To compute the optimal control policy, we propose a two-timescale stochastic approximation algorithm that combines a reinforcement learning (RL) algorithm for estimating the CMDP’s value function with stochastic gradient descent for estimating the Lagrange multiplier. Specifically, we propose three RL procedures: one based on standard Q-learning, and two accelerated learning procedures, namely post-decision state (PDS) learning and virtual experience (VE) learning. The accelerated procedures exploit the known system dynamics and batch updates to overcome the slowness caused by the asynchronous updating pattern of Q-learning. Simulation results demonstrate that the proposed PDS and VE learning algorithms speed up convergence to the optimal control policy by one and two orders of magnitude, respectively.
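As a rough illustration of the two-timescale idea described in the abstract, the following Python sketch runs tabular Q-learning on a Lagrangian cost while a slower projected stochastic gradient step adjusts the multiplier. It is not the authors' algorithm: the toy model (random transition weights, per-step cost `cost`, constraint cost `cons`, bound `delta`), the discounted criterion, the step-size exponents, and the epsilon-greedy exploration are all assumptions made for this example, and the PDS and VE accelerations are not shown.

```python
import random


def lagrangian_q_learning(n_states=4, n_actions=3, delta=0.5, gamma=0.9,
                          steps=20000, seed=0):
    """Two-timescale sketch: Q-learning (fast) plus multiplier update (slow)."""
    rng = random.Random(seed)
    # Hypothetical toy CMDP: random transition weights, costs, constraint costs.
    P = [[[rng.random() for _ in range(n_states)]
          for _ in range(n_actions)] for _ in range(n_states)]
    cost = [[rng.random() for _ in range(n_actions)] for _ in range(n_states)]
    cons = [[rng.random() for _ in range(n_actions)] for _ in range(n_states)]

    Q = [[0.0] * n_actions for _ in range(n_states)]
    lam = 0.0  # Lagrange multiplier, kept non-negative by projection
    s = 0
    for n in range(1, steps + 1):
        alpha = 1.0 / n ** 0.6  # fast timescale (value estimate)
        beta = 1.0 / n          # slow timescale; beta / alpha -> 0

        # Epsilon-greedy action selection on the current cost estimates.
        if rng.random() < 0.1:
            a = rng.randrange(n_actions)
        else:
            a = min(range(n_actions), key=lambda b: Q[s][b])

        # Sample the next state from the (unnormalized) transition weights.
        s_next = rng.choices(range(n_states), weights=P[s][a])[0]

        # Q-learning step on the Lagrangian cost: cost + lam * constraint cost.
        target = cost[s][a] + lam * cons[s][a] + gamma * min(Q[s_next])
        Q[s][a] += alpha * (target - Q[s][a])

        # Slow multiplier step driven by the observed constraint violation.
        lam = max(0.0, lam + beta * (cons[s][a] - delta))
        s = s_next

    policy = [min(range(n_actions), key=lambda b: Q[st][b])
              for st in range(n_states)]
    return Q, lam, policy


if __name__ == "__main__":
    Q, lam, policy = lagrangian_q_learning()
    print("multiplier:", lam)
    print("greedy policy:", policy)
```

Only the visited state-action pair is updated at each step, which is the asynchronous updating pattern the abstract identifies as the source of slowness in plain Q-learning.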

Availability of data and material

Data sharing is not applicable—no new data were generated.

Funding

No funding was received to assist with the preparation of this manuscript.

Author information

Authors and Affiliations

School of Computer Engineering, Iran University of Science and Technology, Tehran, 16846-13114, Iran
Farnoosh Namjoonia & Marzieh Sheikhi
Center of Excellence in Future Networks, School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
Vesal Hakami

Corresponding author

Correspondence to Vesal Hakami.

Ethics declarations

Conflict of interest

All authors declare that they have no conflicts of interest that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Namjoonia, F., Sheikhi, M. & Hakami, V. Fast reinforcement learning algorithms for joint adaptive source coding and transmission control in IoT devices with renewable energy storage. Neural Comput & Applic 34, 3959–3979 (2022). https://doi.org/10.1007/s00521-021-06656-6

Received: 07 May 2021
Accepted: 27 October 2021
Published: 04 January 2022
Issue Date: March 2022
DOI: https://doi.org/10.1007/s00521-021-06656-6
