short-paper

Heterogeneous Managed Runtime Systems: A Computer Vision Case Study

Authors:

Christos Kotselidis,

James Clarkson,

Andrey Rodchenko,

Mikel Luján

VEE '17: Proceedings of the 13th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments

Pages 74 - 82

https://doi.org/10.1145/3050748.3050764

Published: 08 April 2017

Abstract

Real-time 3D space understanding is becoming prevalent across a wide range of applications and hardware platforms. To meet the desired Quality of Service (QoS), computer vision applications tend to be heavily parallelized and to exploit any available hardware accelerators. Current approaches to achieving real-time computer vision revolve around programming languages typically associated with High Performance Computing, along with binding extensions for OpenCL or CUDA execution.
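As a rough illustration of why such workloads parallelize well, the sketch below back-projects a depth image into a camera-space vertex map, a per-pixel step of the kind found in KinectFusion-style pipelines. It is a minimal sketch under stated assumptions: the class `DepthToVertex`, the method `depthToVertex`, and the pinhole intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical, and the parallelism uses a plain CPU parallel stream. It is not the paper's runtime, nor its GPGPU/FPGA offload mechanism.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class DepthToVertex {
    // Hypothetical helper: back-projects a row-major depth image (metres) into
    // camera-space vertices with a pinhole model (assumed intrinsics fx, fy, cx, cy).
    // Output layout: 3 floats (x, y, z) per pixel, row-major.
    static float[] depthToVertex(float[] depth, int width, int height,
                                 float fx, float fy, float cx, float cy) {
        float[] vertices = new float[width * height * 3];
        // Every pixel is independent, so the loop is data parallel. A CPU parallel
        // stream stands in here for accelerator offload; each task writes only
        // its own three output slots, so there are no data races.
        IntStream.range(0, width * height).parallel().forEach(i -> {
            int x = i % width;
            int y = i / width;
            float d = depth[i];
            int o = 3 * i;
            if (d > 0f) {
                vertices[o]     = d * (x - cx) / fx;
                vertices[o + 1] = d * (y - cy) / fy;
                vertices[o + 2] = d;
            }
            // A zero (invalid) depth leaves the vertex at (0, 0, 0).
        });
        return vertices;
    }

    public static void main(String[] args) {
        int w = 4, h = 2;
        float[] depth = new float[w * h];
        Arrays.fill(depth, 2.0f);
        float[] v = depthToVertex(depth, w, h, 1f, 1f, 2f, 1f);
        // Pixel (x=3, y=1) is index 7, so its vertex starts at offset 21.
        System.out.println(v[21] + " " + v[22] + " " + v[23]);
    }
}
```

Because each output element depends only on its own input pixel, the loop body has no cross-iteration dependencies. That property is what makes kernels like this candidates for mapping onto GPU work-items or FPGA pipelines without changing their semantics.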

Such implementations, although high-performing, lack portability across diverse hardware resources and accelerators. In this paper, we show how a complex computer vision application can be implemented within a managed runtime system. We discuss the complexities of achieving high-performing and portable execution across embedded and desktop configurations. Furthermore, we demonstrate that it is possible to achieve the QoS target of over 30 frames per second (FPS) by exploiting FPGA and GPGPU acceleration transparently through the managed runtime system.
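A 30 FPS QoS target corresponds to a budget of 1000 / 30 ≈ 33.3 ms per frame for the whole pipeline. The sketch below shows one simple way to check measured per-frame latencies against such a target by computing average throughput. The class and method names and the frame timings are hypothetical, not measurements from the paper. The check also ignores worst-case frame times, which a real-time evaluation may need to consider separately.

```java
public class QosCheck {
    // Per-frame time budget (ms) implied by a frames-per-second target.
    static double frameBudgetMs(double targetFps) {
        return 1000.0 / targetFps;
    }

    // Average throughput (frames per second) over measured per-frame latencies (ms).
    static double achievedFps(double[] frameTimesMs) {
        double totalMs = 0.0;
        for (double t : frameTimesMs) {
            totalMs += t;
        }
        return frameTimesMs.length / (totalMs / 1000.0);
    }

    // True if average throughput reaches the target; per-frame jitter is ignored.
    static boolean meetsQoS(double[] frameTimesMs, double targetFps) {
        return achievedFps(frameTimesMs) >= targetFps;
    }

    public static void main(String[] args) {
        // Hypothetical per-frame latencies, not results from the paper.
        double[] fast = {30.0, 32.0, 31.0, 33.0};
        double[] slow = {40.0, 45.0, 38.0, 42.0};
        System.out.printf("budget at 30 FPS: %.2f ms%n", frameBudgetMs(30.0));
        System.out.printf("fast: %.1f FPS, meets QoS: %b%n", achievedFps(fast), meetsQoS(fast, 30.0));
        System.out.printf("slow: %.1f FPS, meets QoS: %b%n", achievedFps(slow), meetsQoS(slow, 30.0));
    }
}
```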

Cited By

Fumero J, Blanaru F, Stratikopoulos A, Dohrmann S, Viswanathan S, Kotselidis C, Bruno R, Moss E (2023). Unified Shared Memory: Friend or Foe? Understanding the Implications of Unified Memory on Managed Heaps. Proceedings of the 20th ACM SIGPLAN International Conference on Managed Programming Languages and Runtimes, 143-157. DOI: 10.1145/3617651.3622984. Online publication date: 19-Oct-2023.
https://dl.acm.org/doi/10.1145/3617651.3622984

Papadakis O, Andronikakis A, Foutris N, Papadimitriou M, Stratikopoulos A, Zakkak F, Xekalakis P, Kotselidis C, Bruno R, Moss E (2023). A Multifaceted Memory Analysis of Java Benchmarks. Proceedings of the 20th ACM SIGPLAN International Conference on Managed Programming Languages and Runtimes, 70-84. DOI: 10.1145/3617651.3622978. Online publication date: 19-Oct-2023.
https://dl.acm.org/doi/10.1145/3617651.3622978

Papadakis O, Andronikakis A, Foutris N, Papadimitriou M, Stratikopoulos A, Zakkak F, Xekalakis P, Kotselidis C, Blackburn S, Petrank E (2023). Scaling Up Performance of Managed Applications on NUMA Systems. Proceedings of the 2023 ACM SIGPLAN International Symposium on Memory Management, 1-14. DOI: 10.1145/3591195.3595270. Online publication date: 6-Jun-2023.
https://dl.acm.org/doi/10.1145/3591195.3595270

Information

Published In

VEE '17: Proceedings of the 13th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments

April 2017

261 pages

ISBN: 9781450349482

DOI: 10.1145/3050748

ACM SIGPLAN Notices, Volume 52, Issue 7 (VEE '17)

July 2017

256 pages

ISSN: 0362-1340

EISSN: 1558-1160

DOI: 10.1145/3140607

Editor: Matthew Fluet

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper
Research
Refereed limited

Conference

VEE '17: 13th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments

April 8-9, 2017

Xi'an, China

Acceptance Rates

VEE '17 paper acceptance rate: 18 of 43 submissions, 42%

Overall acceptance rate: 80 of 235 submissions, 34%

Bibliometrics

Total Citations: 25

Total Downloads: 503

Downloads (last 12 months): 24

Downloads (last 6 weeks): 4

Reflects downloads up to 02 Mar 2025.

Citations

Hartley T, Zakkak F, Nisbet A, Kotselidis C, Luján M (2022). Just-In-Time Compilation on ARM—A Closer Look at Call-Site Code Consistency. ACM Transactions on Architecture and Code Optimization 19:4, 1-23. DOI: 10.1145/3546568. Online publication date: 16-Sep-2022.
https://dl.acm.org/doi/10.1145/3546568

Matsumoto K, Ugawa T, Iwasaki H, Lippautz M, Chisnall D (2022). Replication-based object persistence by reachability. Proceedings of the 2022 ACM SIGPLAN International Symposium on Memory Management, 43-56. DOI: 10.1145/3520263.3534653. Online publication date: 14-Jun-2022.
https://dl.acm.org/doi/10.1145/3520263.3534653

Horta E, Chuang HV, Sathish N, Philippidis C, Barbalace A, Olivier P, Ravindran B, Zhang K, Gherbi A, Venkatasubramanian N, Veiga L (2021). Xar-trek. Proceedings of the 22nd International Middleware Conference, 104-118. DOI: 10.1145/3464298.3493388. Online publication date: 6-Dec-2021.
https://dl.acm.org/doi/10.1145/3464298.3493388

Papadimitriou M, Markou E, Fumero J, Stratikopoulos A, Blanaru F, Kotselidis C, Titzer B, Xu H, Zhang I (2021). Multiple-tasks on multiple-devices (MTMD): exploiting concurrency in heterogeneous managed runtimes. Proceedings of the 17th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 125-138. DOI: 10.1145/3453933.3454019. Online publication date: 7-Apr-2021.
https://dl.acm.org/doi/10.1145/3453933.3454019

Papadimitriou M, Fumero J, Stratikopoulos A, Kotselidis C, Titzer B, Xu H, Zhang I (2021). Automatically exploiting the memory hierarchy of GPUs through just-in-time compilation. Proceedings of the 17th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 57-70. DOI: 10.1145/3453933.3454014. Online publication date: 7-Apr-2021.
https://dl.acm.org/doi/10.1145/3453933.3454014

Kotselidis C, Diamantopoulos S, Akrivopoulos O, Rosenfeld V, Doka K, Mohammed H, Mylonas G, Spitadakis V, Morgan W, Di Natale G, Fummi F (2020). Efficient compilation and execution of JVM-based data processing frameworks on heterogeneous co-processors. Proceedings of the 23rd Conference on Design, Automation and Test in Europe, 175-179. DOI: 10.5555/3408352.3408392. Online publication date: 9-Mar-2020.
https://dl.acm.org/doi/10.5555/3408352.3408392

Kotselidis C, Diamantopoulos S, Akrivopoulos O, Rosenfeld V, Doka K, Mohammed H, Mylonas G, Spitadakis V, Morgan W (2020). Efficient Compilation and Execution of JVM-Based Data Processing Frameworks on Heterogeneous Co-Processors. 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE), 175-179. DOI: 10.23919/DATE48585.2020.9116246. Online publication date: Mar-2020.
https://doi.org/10.23919/DATE48585.2020.9116246
