Abstract
With the growth of applications such as image recognition, speech recognition, ADAS, and AIoT, artificial intelligence (AI) frameworks are becoming popular in various industries. Many neural network frameworks now exist for executing AI models, especially for training and inference, including TensorFlow, Caffe, MXNet, PyTorch, Core ML, TensorFlow Lite, and NNAPI. With so many emerging frameworks, an exchange format is needed to move models between them. To meet this need, the Khronos Group created a draft standard known as the Neural Network Exchange Format (NNEF). However, because NNEF is new, conversion tools that would allow model exchange among the various AI frameworks are still missing. In this work, we fill this gap by devising NNAPI conversion tools for NNEF. Our work allows NNEF to execute inference tasks on host and Android platforms and to flexibly invoke the Android Neural Networks API (NNAPI) on the Android platform to speed up inference operations. We invoke NNAPI by dividing the input NNEF model into multiple submodels and letting NNAPI execute these submodels. To determine how to divide the input model, we develop an algorithm named BFSelector, which is based on a classic breadth-first search and incorporates cost constraints. Our preliminary experimental results show that our support of NNEF on NNAPI achieves a speedup over the baseline of 1.32 to 22.52 times for API level 27 and of 4.56 to 211 times for API level 28, where the baseline is the NNEF-to-Android platform conversion without invoking NNAPI. The experiments include AI models such as LeNet, AlexNet, MobileNet_V1, MobileNet_V2, VGG-16, and VGG-19.
References
TensorFlow. https://www.tensorflow.org/
MXNet. https://mxnet.apache.org/
PyTorch. https://pytorch.org/
TensorFlow Lite. https://www.tensorflow.org/lite/
NNAPI. https://developer.android.com/ndk/guides/neuralnetworks
The Khronos Group. https://www.khronos.org/
NNEF Overview. https://www.khronos.org/nnef
Sherstinsky A (2020) Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Phys D Nonlinear Phenom 404:132306
Protocol Buffers. https://developers.google.com/protocol-buffers/
Google. https://www.google.com/
Android Studio. https://developer.android.com/studio/
LeCun Y, Bottou L, Bengio Y, Haffner P et al (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105
Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861
Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L-C (2018) Mobilenetv2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4510–4520
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M et al (2015) Imagenet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252
He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE international conference on computer vision, pp. 1026–1034
KhronosGroup: NNEF-Tools. GitHub, 12 June 2019. https://github.com/KhronosGroup/NNEF-Tools/
Yeoman: the web's scaffolding tool for modern webapps. https://yeoman.io/
ImageNet. http://www.image-net.org/
Chen T, Moreau T, Jiang Z, Zheng L, Yan E, Shen H, Cowan M, Wang L, Hu Y, Ceze L et al (2018) TVM: an automated end-to-end optimizing compiler for deep learning. In: Proceedings of the 13th USENIX symposium on operating systems design and implementation (OSDI 18), pp 578–594
Roesch J, Lyubomirsky S, Weber L, Pollock J, Kirisame M, Chen T, Tatlock Z (2018) Relay: a new IR for machine learning frameworks. In: Proceedings of the 2nd ACM SIGPLAN international workshop on machine learning and programming languages, pp 58–68
Lai M-Y, Sung C-Y, Lee J-K, Hung M-Y (2020) Enabling Android NNAPI flow for TVM runtime. In: Proceedings of the 49th international conference on parallel processing (ICPP): workshops, pp 1–8
Hung M-Y, Lai M-Y, Sung C-Y, Lee J-K (2020) A generic method to utilize vendor-specific AI accelerator on Android mobile for TVM. In: TVM and deep learning compilation conference, Seattle
Lee C-L, Chao C-T, Lee J-K, Hung M-Y, Huang C-W (2019) Accelerate DNN performance with sparse matrix compression in halide. In: Proceedings of the 48th international conference on parallel processing: workshops, pp 1–6
Develop applications and solutions that emulate human vision with the Intel Distribution of OpenVINO toolkit. https://software.intel.com/en-us/openvino-toolkit
Compute Library for Deep Neural Networks. https://github.com/intel/clDNN
Yu M-S, Chen T-L, Lee J-K (2020) Accelerating NNEF framework on OpenCL devices using clDNN. In: Proceedings of the international workshop on OpenCL, pp 1–2
Bai J, Lu F, Zhang K et al. ONNX: open neural network exchange. GitHub repository. https://github.com/onnx/onnx
ONNX Runtime. https://www.onnxruntime.ai/
Acknowledgements
This work was supported in part by the Ministry of Science and Technology (MOST) of Taiwan and MediaTek.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A: Converting operators from NNEF to NNAPI
Considering NNEF version 1.0 and NNAPI at API level 28, the operators handled by our converter can be grouped into six types. The following subsections explain these six operator types in detail.
A.1 Operators with the same format
In this category, the format for NNEF and NNAPI operators is the same; therefore, we can translate these operators directly from NNEF to NNAPI without modifying their input/output tensor shapes and data layouts. Currently, our converter supports only a single operator of this kind: reshape. Taking this operator as an example, an NNEF snippet is shown in Listing 8, and the corresponding NNAPI code snippet is shown in Listing 9. In Listing 9, lines 2 and 4 describe the input and output shapes, respectively—the same as the NNEF version shown in Listing 8. Lines 5 to 12 prepare and add corresponding operands to the model. Lines 13 to 17 specify the input/output of the operator and add it to the model. As shown in the code snippet, the NNAPI reshape takes two inputs that have the same value/shape as the NNEF version and one output, which also has the same shape as the NNEF version.
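To make the structure of such a conversion concrete, the following is a minimal sketch of how an NNAPI reshape can be assembled; the operand indices and the [1,1,28,28] to [1,784] shapes are illustrative assumptions, not the exact contents of Listings 8 and 9.

```cpp
#include <android/NeuralNetworks.h>
#include <cstdint>

// Hedged sketch: emit an NNAPI RESHAPE equivalent to the NNEF operator
//   output = reshape(input, shape = [1, 784]);
// All shapes and operand indices below are illustrative.
void addReshape(ANeuralNetworksModel* model) {
  // Operand 0: the float32 input tensor; its shape matches the NNEF input.
  uint32_t inDims[4] = {1, 1, 28, 28};
  ANeuralNetworksOperandType inType = {
      ANEURALNETWORKS_TENSOR_FLOAT32, 4, inDims, 0.0f, 0};
  ANeuralNetworksModel_addOperand(model, &inType);            // index 0

  // Operand 1: the target shape, a constant 1-D INT32 tensor holding [1, 784].
  uint32_t shapeDims[1] = {2};
  ANeuralNetworksOperandType shapeType = {
      ANEURALNETWORKS_TENSOR_INT32, 1, shapeDims, 0.0f, 0};
  ANeuralNetworksModel_addOperand(model, &shapeType);         // index 1
  int32_t newShape[2] = {1, 784};
  ANeuralNetworksModel_setOperandValue(model, 1, newShape, sizeof(newShape));

  // Operand 2: the output tensor; its shape matches the NNEF output.
  uint32_t outDims[2] = {1, 784};
  ANeuralNetworksOperandType outType = {
      ANEURALNETWORKS_TENSOR_FLOAT32, 2, outDims, 0.0f, 0};
  ANeuralNetworksModel_addOperand(model, &outType);           // index 2

  // RESHAPE takes the data tensor plus the shape tensor and yields one
  // output, with the same values/shapes as the NNEF version.
  uint32_t inputs[2] = {0, 1};
  uint32_t outputs[1] = {2};
  ANeuralNetworksModel_addOperation(model, ANEURALNETWORKS_RESHAPE,
                                    2, inputs, 1, outputs);
}
```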


A.2 Operators with different attributes
In the second category, the NNEF and NNAPI operators have different attributes; thus, when converting this type of operator, we must identify the purpose of each attribute and set a value corresponding to it. For example, in addition to input tensors, NNAPI has an extra attribute that is used to specify the type of activation to invoke on the result. In contrast, the add operation of NNEF has no such attribute; therefore, we currently set it to ANEURALNETWORKS_FUSED_NONE in our converter, which means no activation is used on the result. An example of converting this type of operator is shown in Listing 10 and Listing 11. There may be additional opportunities for optimization in this example, because we could set the value of the attribute according to the topology of the input NNEF graph. For example, an add followed by ReLU in NNEF can be translated into an add in NNAPI, with the attribute set to ANEURALNETWORKS_FUSED_RELU.
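As a hedged sketch of this conversion (with illustrative shapes and operand indices, not the exact contents of Listings 10 and 11), the extra NNAPI attribute appears as one additional constant scalar operand:

```cpp
#include <android/NeuralNetworks.h>
#include <cstdint>

// Hedged sketch: translate NNEF "add" into NNAPI ADD, whose third input
// selects the fused activation that NNEF's add lacks.
void addElementwiseAdd(ANeuralNetworksModel* model) {
  // Operands 0 and 1: the two float32 input tensors of the NNEF add.
  uint32_t dims[4] = {1, 8, 32, 32};
  ANeuralNetworksOperandType tensorType = {
      ANEURALNETWORKS_TENSOR_FLOAT32, 4, dims, 0.0f, 0};
  ANeuralNetworksModel_addOperand(model, &tensorType);        // index 0
  ANeuralNetworksModel_addOperand(model, &tensorType);        // index 1

  // Operand 2: the extra NNAPI attribute selecting a fused activation.
  // NNEF add carries no such attribute, so the converter uses FUSED_NONE.
  ANeuralNetworksOperandType actType = {
      ANEURALNETWORKS_INT32, 0, nullptr, 0.0f, 0};
  ANeuralNetworksModel_addOperand(model, &actType);           // index 2
  int32_t act = ANEURALNETWORKS_FUSED_NONE;
  ANeuralNetworksModel_setOperandValue(model, 2, &act, sizeof(act));

  // Operand 3: the output tensor, same shape as the inputs.
  ANeuralNetworksModel_addOperand(model, &tensorType);        // index 3

  uint32_t inputs[3] = {0, 1, 2};
  uint32_t outputs[1] = {3};
  ANeuralNetworksModel_addOperation(model, ANEURALNETWORKS_ADD,
                                    3, inputs, 1, outputs);
}
```

Under the fusion optimization mentioned above, act would instead be set to ANEURALNETWORKS_FUSED_RELU and the separate ReLU operation dropped.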

A.3 Operators with different names
In this category, operator names may differ between NNEF and NNAPI but have the same functionality. When translating this type of operator, the correct matching operator name must be found in NNAPI, and it must be ensured that it has the same properties as the NNEF operator. Take sigmoid as an example. The sigmoid operator is called logistic in NNAPI, but it has the same behavior as the sigmoid defined in NNEF. Listing 12 and Listing 13 illustrate the use of this operator in the two systems.
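Because only the name differs, the conversion reduces to emitting the renamed operation. A minimal sketch (the helper and its operand indices are illustrative assumptions):

```cpp
#include <android/NeuralNetworks.h>
#include <cstdint>

// Hedged sketch: the NNEF operator "sigmoid" is emitted as NNAPI LOGISTIC.
// `in` and `out` are the indices of float32 tensor operands with identical
// shapes, assumed to have been added to the model already.
void addSigmoid(ANeuralNetworksModel* model, uint32_t in, uint32_t out) {
  uint32_t inputs[1] = {in};
  uint32_t outputs[1] = {out};
  ANeuralNetworksModel_addOperation(model, ANEURALNETWORKS_LOGISTIC,
                                    1, inputs, 1, outputs);
}
```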


A.4 Operators with different tensor shapes
In this category, an operator has different tensor shape requirements in NNEF and NNAPI. This type of operator can be identified by studying the tensor shape specification of each operator in NNEF and NNAPI. For example, an NNEF operator may require one of its input tensors to be N-dimensional, whereas NNAPI requires it to be (N±1)-dimensional. When translating this type of operator, we must inspect the meaning of each dimension and find a way to reduce or increase the dimensionality so that it meets the NNAPI requirement. Listing 14 and Listing 15 illustrate this situation: the convolution of NNEF requires the bias shape to be two-dimensional, while NNAPI requires it to be one-dimensional. However, the NNEF specification fixes the first dimension of the bias shape to 1, so this dimension can be removed directly without affecting the size of the tensor. Therefore, our converter keeps the second bias dimension and discards the first when converting the convolution from NNEF to NNAPI.
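A hedged one-function sketch of this particular dimension fix (the helper name is hypothetical, not the converter's actual code):

```cpp
#include <cstdint>
#include <vector>

// NNEF declares a convolution bias with shape [1, C]; NNAPI CONV_2D expects
// [C]. Because the NNEF specification fixes the leading dimension to 1,
// dropping it does not change the number of elements in the tensor.
std::vector<uint32_t> biasShapeForNNAPI(const std::vector<uint32_t>& nnefShape) {
  // nnefShape = {1, C}  ->  {C}
  return {nnefShape[1]};
}
```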


A.5 Operators with different data layouts
For the data layout, NNEF uses the NCHW layout for 4-D tensors, while NNAPI uses NHWC; therefore, the layout difference must be bridged. Our solution inserts a transposition both before and after each operator that takes 4-D tensors as input and output when the layouts differ. By setting the transpose permutation correctly, we ensure that all 4-D tensors use the same data layout as NNAPI, as seen in Listing 14 and Listing 15. In the example NNAPI convolution code, there are two transpositions before the convolution, each taking [0,2,3,1] as the permutation. This changes the data layout of the input 4-D tensors from NCHW to NHWC, which NNAPI accepts. After the convolution, one transposition with permutation [0,3,1,2] changes the data layout of the output 4-D tensor from NHWC back to NCHW. This ensures that when NNAPI passes the output 4-D tensor back to NNEF-RT, it uses the same data layout as NNEF.
Note that our converter inserts a transpose operation if and only if the 4-D tensor is the graph input or output of the NNEF model. In contrast, when an input 4-D tensor stems from another NNAPI operator, we know that it already uses the NHWC data layout; therefore, there is no need to transpose it to NCHW and then back to NHWC. For example, suppose we are converting the NNEF model shown in Listing 16. The conversion result is similar to the code snippet shown in Listing 17. We can see that no transpositions occur between convolution and max_pool, but they do occur before convolution (because it is a graph input) and after max_pool (because it is a graph output).
Furthermore, if the output tensor of a reshape operation is 4-D and is not a graph output, our converter transposes it implicitly (when appropriate). Thus, the following operator can use the output from the reshape directly without the need to insert transpose operations. As shown in Listing 16 and Listing 17, the reshape output in NNEF is [1,1,28,28], while in NNAPI it is [1,28,28,1], which uses the NHWC data layout so that the subsequent convolution can take it directly as input.
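The core of this layout bridging can be sketched as follows; the helper and its calling convention are illustrative assumptions, not the converter's actual code. Permutation {0,2,3,1} turns NCHW into NHWC (used where a graph-input tensor enters an NNAPI operator), and {0,3,1,2} turns NHWC back into NCHW (used where a tensor is returned to NNEF-RT as a graph output).

```cpp
#include <android/NeuralNetworks.h>
#include <cstdint>

// Hedged sketch: emit an NNAPI TRANSPOSE of `input` with the given 4-element
// permutation and return the index of the new output operand.
uint32_t addTranspose(ANeuralNetworksModel* model, uint32_t input,
                      const uint32_t inDims[4], const int32_t perm[4],
                      uint32_t nextFreeIndex) {
  // Operand nextFreeIndex: the constant 1-D permutation tensor.
  uint32_t permDims[1] = {4};
  ANeuralNetworksOperandType permType = {
      ANEURALNETWORKS_TENSOR_INT32, 1, permDims, 0.0f, 0};
  ANeuralNetworksModel_addOperand(model, &permType);
  ANeuralNetworksModel_setOperandValue(model, nextFreeIndex, perm,
                                       4 * sizeof(int32_t));

  // Operand nextFreeIndex + 1: the output tensor, with permuted dimensions.
  uint32_t outDims[4];
  for (int i = 0; i < 4; ++i) outDims[i] = inDims[perm[i]];
  ANeuralNetworksOperandType outType = {
      ANEURALNETWORKS_TENSOR_FLOAT32, 4, outDims, 0.0f, 0};
  ANeuralNetworksModel_addOperand(model, &outType);

  uint32_t inputs[2] = {input, nextFreeIndex};
  uint32_t outputs[1] = {nextFreeIndex + 1};
  ANeuralNetworksModel_addOperation(model, ANEURALNETWORKS_TRANSPOSE,
                                    2, inputs, 1, outputs);
  return nextFreeIndex + 1;  // index of the relaid-out result
}
```

Tensors flowing between two NNAPI operators skip this helper entirely, which is exactly why no transpositions appear between convolution and max_pool in Listing 17.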


A.6 Operators with variations
In this category, operators may have variations, such as ReLU, ReLU1, and ReLU6 in NNAPI. NNEF does not include such variations; it uses ReLU and min to express ReLU1 and ReLU6. In our work, we provide an extension to NNEF to express ReLU1 and ReLU6, as seen in Listing 18. We modified NNEF-Parser so that ReLU can take an extra attribute called n; for example, setting n to 6 represents ReLU6. In this way, NNEF can express the same ReLU variations as NNAPI. Listing 19 shows the conversion result of Listing 18: the ReLU with n set to 6 in NNEF is translated into ReLU6 in NNAPI.
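The selection of the NNAPI variant from the extended attribute can be sketched as follows; the helper name and the fallback behavior for unsupported n are illustrative assumptions.

```cpp
#include <android/NeuralNetworks.h>
#include <cstdint>

// Hedged sketch: map the extended NNEF relu attribute n onto NNAPI's
// ReLU variants; anything else falls back to plain RELU.
int32_t reluOperationForN(int n) {
  switch (n) {
    case 1:  return ANEURALNETWORKS_RELU1;  // clamps the result to [-1, 1]
    case 6:  return ANEURALNETWORKS_RELU6;  // clamps the result to [0, 6]
    default: return ANEURALNETWORKS_RELU;   // plain max(0, x)
  }
}
```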

Cite this article
Chang, YM., Sung, CY., Sheu, YC. et al. Support NNEF execution model for NNAPI. J Supercomput 77, 10065–10096 (2021). https://doi.org/10.1007/s11227-021-03625-7