Efficiency and productivity for decision making on low-power heterogeneous CPU+GPU SoCs

The Journal of Supercomputing

Abstract

Markov decision processes provide a formal framework for a computer to make decisions autonomously and intelligently when the effects of its actions are not deterministic. This formalism has had tremendous success in many disciplines; however, its implementation on platforms with scarce computing capabilities and power, as is the case in robotics or autonomous driving, is still limited. Solving this computationally complex problem efficiently under these constraints calls for high-performance accelerator hardware and parallelized software. In this work, we compare off-line-tuned static and dynamic heterogeneous scheduling strategies against adaptive ones for executing value iteration, a core procedure in many decision-making methods such as reinforcement learning and task planning, on a low-power heterogeneous CPU+GPU SoC that uses only 10–15 W. Our experimental results show that CPU+GPU heterogeneous strategies considerably reduce both computation time and energy consumption. Compared with a multicore (TBB) implementation, they can be up to 54% faster and 57% more energy-efficient; compared with a GPU-only (OpenCL) implementation, they can be up to 61% faster and 65% more energy-efficient. We also explore how raising the abstraction level of the programming model eases the programming effort. To that end, we compare the TBB+OpenCL and TBB+oneAPI implementations of our heterogeneous schedulers, observing that the oneAPI versions require up to \(5\times\) less programming effort and incur only 3–8% overhead if the scheduling strategy is selected carefully.
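To make the workload concrete, the following is a minimal sketch of value iteration, not the authors' implementation. It assumes a tiny hand-made MDP and a hypothetical `gpu_share` parameter that statically splits each sweep over the states into two chunks. In a heterogeneous scheduler each chunk would go to a different device; here both run serially on the CPU. Because every sweep reads only the previous iteration's values, the state updates are independent and the split does not change the result.

```python
from typing import List, Tuple

# transitions[s][a] is a list of (probability, next_state, reward) triples.
# This layout is an assumption made for the sketch, not the paper's format.
Transitions = List[List[List[Tuple[float, int, float]]]]


def sweep_chunk(transitions: Transitions, values: List[float],
                gamma: float, begin: int, end: int) -> List[float]:
    """Bellman backup for states in [begin, end), reading only old values."""
    new_vals = []
    for s in range(begin, end):
        best = max(
            sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
            for outcomes in transitions[s]
        )
        new_vals.append(best)
    return new_vals


def value_iteration(transitions: Transitions, gamma: float = 0.9,
                    tol: float = 1e-8, gpu_share: float = 0.5,
                    max_iters: int = 10_000) -> Tuple[List[float], int]:
    """Iterate sweeps until the largest per-state change drops below tol."""
    n = len(transitions)
    values = [0.0] * n
    # Static partition: the first `split` states form the hypothetical
    # "GPU" chunk, the remaining states form the "CPU" chunk.
    split = int(n * gpu_share)
    for iteration in range(1, max_iters + 1):
        new_values = (sweep_chunk(transitions, values, gamma, 0, split)
                      + sweep_chunk(transitions, values, gamma, split, n))
        delta = max(abs(a - b) for a, b in zip(new_values, values))
        values = new_values
        if delta < tol:
            return values, iteration
    return values, max_iters


# Toy MDP: state 0 can "stay" (reward 0) or "go" (reaches state 1 with
# probability 0.8 and reward 1, otherwise stays); state 1 is absorbing.
mdp: Transitions = [
    [[(1.0, 0, 0.0)], [(0.8, 1, 1.0), (0.2, 0, 0.0)]],
    [[(1.0, 1, 0.0)]],
]

values, iterations = value_iteration(mdp)
```

For this toy MDP the fixed point of state 0 satisfies V(0) = 0.8 + 0.18 V(0), so the computed value should approach 0.8 / 0.82 regardless of the chosen `gpu_share`.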


Notes

  1. A detailed report on the matter is available in [5], Sects. 3.2 and 3.3.

  2. ScheduleUSM is the same code as ScheduleBUFF.

References

  1. Barber R, Crespo J, Gomez C, Hernandez A, Galli M (2019) Mobile robot navigation in indoor environments: geometric, topological, and semantic navigation, chapter 5. Intech Open, London, pp 393–640

  2. Bellman R (1954) The theory of dynamic programming. Bull Am Math Soc 60(6):503–515

  3. Bertsekas DP (2007) Dynamic programming and optimal control, vol 2, 3rd edn. Athena Scientific, Nashua

  4. Boucherie RJ, van Dijk NM (eds) (2017) Markov decision processes in practice. Springer

  5. Constantinescu DA (2017) Optimization of a decision making algorithm under uncertainty for heterogeneous platforms. Master’s thesis, Universidad de Málaga. https://doi.org/10.13140/RG.2.2.24922.70082

  6. Coradeschi S et al (2014) GiraffPlus: a system for monitoring activities and physiological parameters and promoting social interaction for elderly. In: Hippe ZS, Kulikowski JL, Mroczek T, Wtorek J (eds) Human–Computer Systems Interaction: Backgrounds and Applications 3. Springer, New York

  7. Corbera F, Rodríguez A, Asenjo R, Navarro A, Vilches A, Garzarán MJ (2015) Reducing overheads of dynamic scheduling on heterogeneous chips. arXiv preprint arXiv:1501.03336

  8. Dios AJ, Asenjo R, Navarro AG, Corbera F, Zapata EL (2011) High-level template for the task-based parallel wavefront pattern. In: 18th International Conference on High Performance Computing

  9. Fernández-Madrigal JA, Cruz-Martin AM, Aguilar-Moreno M, Vega IF (2019) CRUMB: cognitive-robotics-supporting mobile base (consulted 1st of August, 2019). http://babel.isa.uma.es/crumb

  10. Gordon GJ (1999) Approximate solutions to Markov decision processes. Ph.D. thesis, Carnegie Mellon University, Pittsburgh. http://reports-archive.adm.cs.cmu.edu/anon/1999/CMU-CS-99-143.pdf

  11. Khronos Group (2019) SYCL specification: SYCL integrates OpenCL devices with modern C++, v1.2.1

  12. Hernandez B, Pérez H, Rudomin I, Ruiz S, de Gyves O, Toledo L (2014) Simulating and visualizing real-time crowds on GPU clusters. Comput Sist 18(4):651–664

  13. Iannucci S, Chen Q, Abdelwahed S (2016) High-performance intrusion response planning on many-core architectures. In: International Conference on Computer Communication and Networks (ICCCN). IEEE, pp 1–6

  14. Intel (2019) Intel oneAPI Programming Guide (Beta)

  15. Jaskowski W (2017) Mastering 2048 with delayed temporal coherence learning, multi-stage weight promotion, redundant encoding and carousel shaping. In: IEEE Transactions on Computational Intelligence and AI in Games

  16. Mnih V, Badia AP, Mirza M, Graves A, Lillicrap T, Harley T, Silver D, Kavukcuoglu K (2016) Asynchronous methods for deep reinforcement learning. In: International Conference on Machine Learning, pp 1928–1937

  17. Munir A, Gordon-Ross A, Ranka S (2015) Modeling and optimization of parallel and distributed embedded systems. Wiley, New York

  18. Navarro A, Corbera F, Rodriguez A, Vilches A, Asenjo R (2019) Heterogeneous parallel_for template for CPU-GPU chips. Int J Parallel Program 47(2):213–233

  19. Powell WB (2011) Approximate dynamic programming: solving the curses of dimensionality, 2nd edn. Wiley, New York

  20. Puterman ML (2005) Markov decision processes: discrete stochastic dynamic programming (Wiley series in probability and statistics). Wiley, New York

  21. Coppelia Robotics (2019) V-REP: virtual robot experimentation platform (consulted 1st of August, 2019). www.coppeliarobotics.com

  22. Rodríguez A, Navarro A, Asenjo R, Corbera F, Gran R, Suárez D, Nunez-Yanez J (2019) Parallel multiprocessing and scheduling on the heterogeneous Xeon+FPGA platform. J Supercomput. https://doi.org/10.1007/s11227-019-02935-1

  23. Ruiz S, Hernández B (2015) A parallel solver for Markov decision process in crowd simulations. In: 2015 Fourteenth Mexican International Conference on Artificial Intelligence (MICAI). IEEE, pp 107–116

  24. Sigaud O, Buffet O (2013) Markov decision processes in artificial intelligence. Wiley, New York

  25. Tai L, Liu M (2016) Mobile robots exploration through CNN-based reinforcement learning. Robot Biomim 3(1):24

  26. Thakur A, Svec P, Gupta SK (2012) GPU based generation of state transition models using simulations for unmanned surface vehicle trajectory planning. Robot Auton Syst 60(12):1457–1471

  27. Vega IF (2016) Development of a programming environment for a simulated TurtleBot-2 robot with a WidowX manipulator arm through the connection of V-REP and MATLAB. B.Sc. thesis, University of Málaga

  28. Voss M, Asenjo R, Reinders J (2019) Pro TBB: C++ parallel programming with threading building blocks. Apress, New York

  29. White D (1993) Markov decision processes. Wiley, New York

  30. Wiering M, Otterlo M (eds) (2012) Reinforcement learning: state-of-the-art. Springer, New York

  31. Willhalm T, Dementiev R, Fay P (2020) Performance counter monitor (PCM) (consulted 21st of January, 2020). https://github.com/opcm/pcm

  32. Wu Z (2017) Parallelizing model checking algorithms using multi-core and many-core architectures. Ph.D. thesis, Nanyang Technological University, Singapore

  33. Yamaguchi U, Saito F, Ikeda K, Yamamoto T (2015) HSR, human support robot as research and development platform. In: International Conference on Advanced Mechatronics: Toward Evolutionary Fusion of IT and Mechatronics, pp 39–40

  34. Zhou H, Khatri SP, Hu J, Liu F, Sze C (2017) Fast and highly scalable Bayesian MDP on a GPU platform. In: International Conference on Bioinformatics, Computational Biology, and Health Informatics, pp 158–167

Acknowledgements

This work is a result of the research project TIN2016-80920-R, funded by the Spanish Government. It has also been supported by Junta de Andalucía under research projects UMA18-FEDERJA-108, UMA18-FEDERJA-113, and TEP-2279.

Author information

Corresponding author

Correspondence to Denisa-Andreea Constantinescu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Constantinescu, DA., Navarro, A., Corbera, F. et al. Efficiency and productivity for decision making on low-power heterogeneous CPU+GPU SoCs. J Supercomput 77, 44–65 (2021). https://doi.org/10.1007/s11227-020-03257-3

  • DOI: https://doi.org/10.1007/s11227-020-03257-3
