Coarse-Grained Task Parallelization by Dynamic Profiling for Heterogeneous SoC-Based Embedded System

Authors:

Chaitali Chakrabarti

ACM Transactions on Embedded Computing Systems, Volume 24, Issue 1

Article No.: 18, Pages 1 - 32

https://doi.org/10.1145/3704635

Published: 10 December 2024

Abstract

In this study, we introduce a methodology for automatically transforming user applications written in C/C++ into a parallel representation consisting of coarse-grained tasks, based on dynamic profiling. Such a parallel representation is suitable for mapping applications onto heterogeneous SoCs. We present our approach for instrumenting the user application binary during compilation with parallel primitives that enable the runtime system to schedule and execute independent, computation-intensive coarse-grained tasks concurrently. We use the proposed compilation and code transformation methodology to retarget each application for execution on a heterogeneous SoC composed of processor cores and accelerators. We demonstrate the capabilities of our integrated compile-time and runtime flow through task-level parallelization and functionally correct execution of real-world applications in the communication systems and radar processing domains. We further demonstrate the functionality of the integrated system by executing six distinct applications with different degrees of parallelism on four different platforms: an eight-core general-purpose processor, a heterogeneous SoC simulator, and two heterogeneous SoCs built on the Xilinx Zynq UltraScale+ FPGA and the Nvidia Jetson AGX board. Our integrated approach offers a path forward for application developers to take full advantage of the target SoC without requiring them to become hardware or parallel programming experts.
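The abstract does not detail how profiling data becomes a set of concurrently schedulable tasks. As a minimal sketch, assuming a profiling run that records each function invocation in program order together with its measured execution time and the buffers it reads and writes, the Python below shows one generic way to (1) flag computation-intensive calls as coarse-grained task candidates and (2) group calls into "waves" whose members have no read-after-write, write-after-read, or write-after-write conflicts and can therefore run concurrently. The names `ProfiledCall`, `mark_hot_calls`, and `schedule_waves`, the 0.2 threshold, and the example buffers are all hypothetical; this is an illustration of the general idea, not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ProfiledCall:
    """One invocation recorded by a hypothetical dynamic profiling run."""
    name: str
    seconds: float                                       # measured execution time
    reads: frozenset = field(default_factory=frozenset)  # buffers read by the call
    writes: frozenset = field(default_factory=frozenset) # buffers written by the call


def mark_hot_calls(calls, threshold):
    """Flag calls whose share of total measured time is at least `threshold`."""
    total = sum(c.seconds for c in calls)
    return [total > 0 and c.seconds / total >= threshold for c in calls]


def schedule_waves(calls):
    """Assign each call (given in program order) to a wave.

    A call must start after every earlier call it conflicts with through a
    read-after-write, write-after-read, or write-after-write on a shared
    buffer. Calls that end up in the same wave are mutually independent.
    """
    waves = []
    for j, b in enumerate(calls):
        wave = 0
        for i in range(j):
            a = calls[i]
            conflict = (a.writes & b.reads) or (a.reads & b.writes) or (a.writes & b.writes)
            if conflict:
                wave = max(wave, waves[i] + 1)
        waves.append(wave)
    return waves


# Hypothetical two-channel pipeline: two independent FFTs read the same input.
calls = [
    ProfiledCall("read_input", 0.5, frozenset({"file"}), frozenset({"raw"})),
    ProfiledCall("fft_ch0", 4.0, frozenset({"raw"}), frozenset({"spec0"})),
    ProfiledCall("fft_ch1", 4.0, frozenset({"raw"}), frozenset({"spec1"})),
    ProfiledCall("combine", 1.5, frozenset({"spec0", "spec1"}), frozenset({"out"})),
]
hot = mark_hot_calls(calls, threshold=0.2)
waves = schedule_waves(calls)
for call, is_hot, wave in zip(calls, hot, waves):
    print(f"wave {wave}: {call.name}{' (hot task)' if is_hot else ''}")
# wave 0: read_input
# wave 1: fft_ch0 (hot task)
# wave 1: fft_ch1 (hot task)
# wave 2: combine
```

Dependences are computed over all profiled calls, not only the hot ones, so that a small call sitting between two kernels still enforces ordering. In this example, the two FFT calls dominate runtime and share a wave, which makes them the natural candidates to dispatch concurrently to separate cores or accelerators.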
