Abstract
This paper investigates the feasibility of using heterogeneous computing for future advanced driver assistance systems (ADAS) applications. In particular, we take a lane detection algorithm (LDA) as a test case. The algorithm is customized into FPGA-GPU heterogeneous implementations that can be executed under either a constant-workload or a balanced-workload scheme. The heterogeneous executions are then evaluated in terms of performance and energy consumption, and compared with single-accelerator execution. Experiments show that heterogeneous execution alleviates both the performance and the energy bottlenecks that arise when only a single accelerator is used. Moreover, compared with FPGA-only execution, the balanced-workload scheme improves performance by 236.9% and 42.9% on our two test platforms, respectively, while keeping the energy cost low.
Acknowledgments
This work is supported in part by a scholarship from the China Scholarship Council (CSC) under Grant No. 201506270152.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Wang, X., Liu, L., Huang, K., Knoll, A. (2017). Exploring FPGA-GPU Heterogeneous Architecture for ADAS: Towards Performance and Energy. In: Ibrahim, S., Choo, KK., Yan, Z., Pedrycz, W. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2017. Lecture Notes in Computer Science, vol 10393. Springer, Cham. https://doi.org/10.1007/978-3-319-65482-9_3
DOI: https://doi.org/10.1007/978-3-319-65482-9_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65481-2
Online ISBN: 978-3-319-65482-9
eBook Packages: Computer Science, Computer Science (R0)