Efficient Phase-Functioned Real-time Character Control in Mobile Games: A TVM Enabled Approach

Authors:

Minwen Deng

ICPP '22: Proceedings of the 51st International Conference on Parallel Processing

Article No.: 15, Pages 1 - 9

https://doi.org/10.1145/3545008.3545095

Published: 13 January 2023

Abstract

In this paper, we propose a highly efficient computing method for game character control with phase-functioned neural networks (PFNN). The primary challenge in accelerating PFNN on mobile platforms is that PFNN dynamically generates its weight matrices from an argument, the phase, which is specific to each game character. Existing libraries, which generally assume frozen weight matrices, are therefore inefficient at accelerating PFNN. The situation becomes even worse when multiple characters are present. To address these challenges, we reformulate the equations and leverage the deep learning compiler stack TVM to build a cross-platform, high-performance implementation. Evaluations reveal that our solutions deliver close-to-peak performance on various platforms, from high-performance servers to energy-efficient mobile platforms. This work is publicly available at https://github.com/turbo0628/pfnn_tvm.
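To make the difficulty concrete, here is a minimal sketch, not the authors' implementation. It assumes the original PFNN formulation of Holden et al., where the weights at a given phase are a cubic Catmull-Rom blend of four control weight sets, and it models a single small layer without bias or activation. All function names are hypothetical. The second forward function shows one way linearity could be exploited so that every matrix product uses a frozen control matrix; whether this matches the reformulation in the paper is an assumption.

```python
import math


def catmull_rom_basis(t):
    """Cubic Catmull-Rom weights for control points (k-1, k, k+1, k+2), t in [0, 1)."""
    t2, t3 = t * t, t * t * t
    return (
        -0.5 * t + t2 - 0.5 * t3,
        1.0 - 2.5 * t2 + 1.5 * t3,
        0.5 * t + 2.0 * t2 - 1.5 * t3,
        -0.5 * t2 + 0.5 * t3,
    )


def blend_coeffs(phase):
    """Map a phase in radians to one blend coefficient per control matrix (4 total)."""
    scaled = 4.0 * (phase % (2.0 * math.pi)) / (2.0 * math.pi)
    k = int(scaled) % 4          # index of the control point at or before the phase
    t = scaled - int(scaled)     # fractional position between control points
    coeffs = [0.0] * 4
    for n, b in enumerate(catmull_rom_basis(t)):
        coeffs[(k + n - 1) % 4] += b   # the phase is cyclic, so indices wrap around
    return coeffs


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def forward_naive(controls, phase, x):
    """Build the phase-specific weight matrix first, then multiply (per character)."""
    c = blend_coeffs(phase)
    rows, cols = len(controls[0]), len(controls[0][0])
    W = [[sum(c[j] * controls[j][r][q] for j in range(4)) for q in range(cols)]
         for r in range(rows)]
    return matvec(W, x)


def forward_reformulated(controls, phases, xs):
    """Multiply by the four frozen control matrices, then blend the results.

    Uses W(p) x = sum_j c_j(p) (W_j x); only the cheap blend depends on the phase.
    """
    outputs = []
    for phase, x in zip(phases, xs):
        c = blend_coeffs(phase)
        partial = [matvec(controls[j], x) for j in range(4)]
        outputs.append([sum(c[j] * partial[j][r] for j in range(4))
                        for r in range(len(partial[0]))])
    return outputs
```

In the reformulated version the products with the control matrices do not depend on the phase, so for many characters they could in principle be stacked into ordinary matrix-matrix multiplications with fixed weights. The per-character phase then only affects the final four-term blend.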

Index Terms

1. Computer systems organization
  1. Dependable and fault-tolerant systems and networks
    1. Redundancy
  2. Embedded and cyber-physical systems
    1. Embedded systems
    2. Robotics
2. Networks
  1. Network properties
    1. Network reliability

Information

Published In

ICPP '22: Proceedings of the 51st International Conference on Parallel Processing

August 2022

976 pages

ISBN:9781450397339

DOI:10.1145/3545008

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

General Program of Key Technology from Shenzhen
Shenzhen Basic Research Fund
National Science Foundation of China
National Key Research and Development Program of China
Strategic Priority CAS Project
Key Research and Development Project of Guangdong Province

Conference

ICPP '22: 51st International Conference on Parallel Processing

August 29 - September 1, 2022

Bordeaux, France

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%
