Efficiency and productivity for decision making on low-power heterogeneous CPU+GPU SoCs

Constantinescu, Denisa-Andreea; Navarro, Angeles; Corbera, Francisco; Fernández-Madrigal, Juan-Antonio; Asenjo, Rafael

doi:10.1007/s11227-020-03257-3

Published: 23 March 2020

Volume 77, pages 44–65, (2021)

The Journal of Supercomputing

Abstract

Markov decision processes provide a formal framework for a computer to make decisions autonomously and intelligently when the effects of its actions are not deterministic. This formalism has had tremendous success in many disciplines; however, its implementation on platforms with scarce computing capabilities and power, as is the case in robotics or autonomous driving, is still limited. High-performance accelerator hardware and parallelized software make it possible to solve this computationally complex problem efficiently under these constraints. In particular, in this work, we evaluate off-line-tuned static and dynamic versus adaptive heterogeneous scheduling strategies for executing value iteration (a core procedure in many decision-making methods, such as reinforcement learning and task planning) on a low-power heterogeneous CPU+GPU SoC that uses only 10–15 W. Our experimental results show that CPU+GPU heterogeneous strategies considerably reduce the computation time and energy required: they can be up to 54% faster and 57% more energy-efficient than a multicore (TBB) implementation, and up to 61% faster and 65% more energy-efficient than a GPU-only (OpenCL) implementation. Additionally, we explore the impact of raising the abstraction level of the programming model to ease the programming effort. To that end, we compare the TBB+OpenCL and TBB+oneAPI implementations of our heterogeneous schedulers, observing that the oneAPI versions require up to \(5\times\) less programming effort and incur only 3–8% overhead if the scheduling strategy is selected carefully.
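
For readers unfamiliar with value iteration, the following is a minimal sequential sketch of the procedure the abstract refers to. It is not the authors' TBB, OpenCL, or oneAPI implementation: the function names, the two-state toy MDP, the discount factor, and the tolerance are all assumptions chosen for illustration. Within one sweep, each state's update reads only the previous value estimates, so the states can be updated independently, which is the property that makes it natural to split the work between CPU cores and a GPU.

```python
def q_value(P, R, V, s, a, gamma):
    # Expected return of taking action a in state s and then following V.
    # P[s][a] is a list of (next_state, probability) pairs (assumed layout).
    return R[s][a] + gamma * sum(prob * V[nxt] for nxt, prob in P[s][a])


def value_iteration(P, R, gamma=0.9, tol=1e-6, max_sweeps=10_000):
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(max_sweeps):
        # One sweep: every state is updated from the previous estimate V,
        # so the per-state updates do not depend on each other.
        new_V = [
            max(q_value(P, R, V, s, a, gamma) for a in range(len(P[s])))
            for s in range(n_states)
        ]
        delta = max(abs(new - old) for new, old in zip(new_V, V))
        V = new_V
        if delta < tol:
            break
    # Greedy policy with respect to the final value estimates.
    policy = [
        max(range(len(P[s])), key=lambda a: q_value(P, R, V, s, a, gamma))
        for s in range(n_states)
    ]
    return V, policy


# Hypothetical toy MDP: state 1 is a rewarding absorbing state.
# In state 0, action 0 stays put; action 1 reaches state 1 with probability 0.8.
P = [
    [[(0, 1.0)], [(1, 0.8), (0, 0.2)]],
    [[(1, 1.0)], [(1, 1.0)]],
]
R = [
    [0.0, 0.0],
    [1.0, 1.0],
]
V, policy = value_iteration(P, R)
```

In this toy problem the greedy policy in state 0 is to take the uncertain move toward state 1, because staying put never collects any reward.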

Notes

A detailed report on the matter is available in [5], Sects. 3.2 and 3.3.
ScheduleUSM is the same code as ScheduleBUFF.

References

Barber R, Crespo J, Gomez C, Hernandez A, Galli M (2019) Mobile robot navigation in indoor environments: geometric, topological, and semantic navigation, chapter 5. Intech Open, London, pp 393–640
Bellman R (1954) The theory of dynamic programming. Bull Am Math Soc 60(6):503–515
Bertsekas DP (2007) Dynamic programming and optimal control, vol 2, 3rd edn. Athena Scientific, Nashua
Boucherie RJ, van Dijk NM (eds) (2017) Markov decision processes in practice. Springer
Constantinescu DA (2017) Optimization of a decision making algorithm under uncertainty for heterogeneous platforms. Master’s thesis, Universidad de Málaga. https://doi.org/10.13140/RG.2.2.24922.70082
Coradeschi S et al (2014) GiraffPlus: a system for monitoring activities and physiological parameters and promoting social interaction for elderly. In: Hippe ZS, Kulikowski JL, Mroczek T, Wtorek J (eds) Human–Computer Systems Interaction: Backgrounds and Applications 3. Springer, New York
Corbera F, Rodríguez A, Asenjo R, Navarro A, Vilches A, Garzarán MJ (2015) Reducing overheads of dynamic scheduling on heterogeneous chips. arXiv preprint arXiv:1501.03336
Dios AJ, Asenjo R, Navarro AG, Corbera F, Zapata EL (2011) High-level template for the task-based parallel wavefront pattern. In: 18th International Conference on High Performance Computing
Fernández-Madrigal JA, Cruz-Martin AM, Aguilar-Moreno M, Vega IF (2019) CRUMB: cognitive-robotics-supporting mobile base (consulted 1st of August, 2019). http://babel.isa.uma.es/crumb
Gordon GJ (1999) Approximate solutions to Markov decision processes. Ph.D. thesis, Carnegie Mellon University, Pittsburgh. http://reports-archive.adm.cs.cmu.edu/anon/1999/CMU-CS-99-143.pdf
Khronos Group (2019) SYCL specification: SYCL integrates OpenCL devices with modern C++, v1.2.1
Hernandez B, Pérez H, Rudomin I, Ruiz S, de Gyves O, Toledo L (2014) Simulating and visualizing real-time crowds on GPU clusters. Comput Sist 18(4):651–664
Iannucci S, Chen Q, Abdelwahed S (2016) High-performance intrusion response planning on many-core architectures. In: International Conference on Computer Communication and Networks (ICCCN). IEEE, pp 1–6
Intel (2019) Intel oneAPI programming guide (Beta)
Jaskowski W (2017) Mastering 2048 with delayed temporal coherence learning, multi-stage weight promotion, redundant encoding and carousel shaping. In: IEEE Transactions on Computational Intelligence and AI in Games
Mnih V, Badia AP, Mirza M, Graves A, Lillicrap T, Harley T, Silver D, Kavukcuoglu K (2016) Asynchronous methods for deep reinforcement learning. In: International Conference on Machine Learning, pp 1928–1937
Munir A, Gordon-Ross A, Ranka S (2015) Modeling and optimization of parallel and distributed embedded systems. Wiley, New York
Navarro A, Corbera F, Rodriguez A, Vilches A, Asenjo R (2019) Heterogeneous parallel_for template for CPU-GPU chips. Int J Parallel Program 47(2):213–233
Powell WB (2011) Approximate dynamic programming: solving the curses of dimensionality, 2nd edn. Wiley, New York
Puterman ML (2005) Markov decision processes: discrete stochastic dynamic programming (Wiley series in probability and statistics). Wiley, New York
Coppelia Robotics (2019) V-REP: virtual robot experimentation platform (consulted 1st of August, 2019). www.coppeliarobotics.com
Rodríguez A, Navarro A, Asenjo R, Corbera F, Gran R, Suárez D, Nunez-Yanez J (2019) Parallel multiprocessing and scheduling on the heterogeneous Xeon+FPGA platform. J Supercomput. https://doi.org/10.1007/s11227-019-02935-1
Ruiz S, Hernández B (2015) A parallel solver for Markov decision process in crowd simulations. In: 2015 Fourteenth Mexican International Conference on Artificial Intelligence (MICAI). IEEE, pp 107–116
Sigaud O, Buffet O (2013) Markov decision processes in artificial intelligence. Wiley, New York
Tai L, Liu M (2016) Mobile robots exploration through CNN-based reinforcement learning. Robot Biomim 3(1):24
Thakur A, Svec P, Gupta SK (2012) GPU based generation of state transition models using simulations for unmanned surface vehicle trajectory planning. Robot Auton Syst 60(12):1457–1471
Vega IF (2016) Development of a programming environment for a simulated TurtleBot-2 robot with a WidowX manipulator arm through the connection of V-REP and MATLAB. B.Sc. thesis, University of Málaga
Voss M, Asenjo R, Reinders J (2019) Pro TBB: C++ parallel programming with threading building blocks. Apress, New York
White D (1993) Markov decision processes. Wiley, New York
Wiering M, Otterlo M (eds) (2012) Reinforcement learning: state-of-the-art. Springer, New York
Willhalm T, Dementiev R, Fay P (2020) Performance counter monitor (PCM) (consulted 21st of January, 2020). https://github.com/opcm/pcm
Wu Z (2017) Parallelizing model checking algorithms using multi-core and many-core architectures. Ph.D. thesis, Nanyang Technological University, Singapore
Yamaguchi U, Saito F, Ikeda K, Yamamoto T (2015) HSR, human support robot as research and development platform. In: International Conference on Advanced Mechatronics: Toward Evolutionary Fusion of IT and Mechatronics, pp 39–40
Zhou H, Khatri SP, Hu J, Liu F, Sze C (2017) Fast and highly scalable Bayesian MDP on a GPU platform. In: International Conference on Bioinformatics, Computational Biology, and Health Informatics, pp 158–167

Acknowledgements

This work is a result of the research project TIN2016-80920-R, funded by the Spanish Government. It has also been supported by Junta de Andalucía under research projects UMA18-FEDERJA-108, UMA18-FEDERJA-113, and TEP-2279.

Author information

Authors and Affiliations

Department of Computer Architecture, Universidad de Málaga, Málaga, Spain
Denisa-Andreea Constantinescu, Angeles Navarro, Francisco Corbera & Rafael Asenjo
Department of Systems Engineering and Automation, Universidad de Málaga, Málaga, Spain
Juan-Antonio Fernández-Madrigal

Corresponding author

Correspondence to Denisa-Andreea Constantinescu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Constantinescu, DA., Navarro, A., Corbera, F. et al. Efficiency and productivity for decision making on low-power heterogeneous CPU+GPU SoCs. J Supercomput 77, 44–65 (2021). https://doi.org/10.1007/s11227-020-03257-3

Published: 23 March 2020
Issue Date: January 2021
DOI: https://doi.org/10.1007/s11227-020-03257-3
