
Support NNEF execution model for NNAPI

The Journal of Supercomputing

Abstract

With the growth of applications such as image recognition, speech recognition, ADAS, and AIoT, artificial intelligence (AI) frameworks have become popular across many industries. Many neural network frameworks are now available for training and inference, including TensorFlow, Caffe, MXNet, PyTorch, Core ML, TensorFlow Lite, and NNAPI. With so many emerging frameworks, exchange formats are needed to move models between them. To meet this need, the Khronos Group created a standard draft known as the Neural Network Exchange Format (NNEF). However, because NNEF is new, conversion tools that would allow models to be exchanged among the various AI frameworks remain missing. In this work, we fill this gap by devising NNAPI conversion tools for NNEF. Our work allows NNEF to execute inference tasks on host and Android platforms and flexibly invokes the Android Neural Networks API (NNAPI) on the Android platform to speed up inference operations. We invoke NNAPI by dividing the input NNEF model into multiple submodels and letting NNAPI execute these submodels. We develop an algorithm named BFSelector, based on a classic breadth-first search with cost constraints, to determine how to divide the input model. Our preliminary experimental results show that our support of NNEF on NNAPI achieves a speedup over the baseline of 1.32 to 22.52 times for API 27 and of 4.56 to 211 times for API 28, where the baseline is the NNEF-to-Android platform conversion without invoking NNAPI. The experiments include AI models such as LeNet, AlexNet, MobileNet_V1, MobileNet_V2, VGG-16, and VGG-19.




References

  1. TensorFlow. https://www.tensorflow.org/

  2. Caffe. http://caffe.berkeleyvision.org/

  3. MXNet. https://mxnet.apache.org/

  4. PyTorch. https://pytorch.org/

  5. Core ML. https://developer.apple.com/documentation/coreml/

  6. TensorFlow Lite. https://www.tensorflow.org/lite/

  7. NNAPI. https://developer.android.com/ndk/guides/neuralnetworks

  8. The Khronos Group. https://www.khronos.org/

  9. NNEF Overview. https://www.khronos.org/nnef

  10. Sherstinsky A (2020) Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Phys D Nonlinear Phenom 404:132306


  11. Protocol buffers. https://developers.google.com/protocol-buffers/

  12. Google. https://www.google.com/

  13. Android Studio. https://developer.android.com/studio/

  14. LeCun Y, Bottou L, Bengio Y, Haffner P et al (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324


  15. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105

  16. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) Mobilenets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861

  17. Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L-C (2018) Mobilenetv2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4510–4520

  18. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556

  19. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444


  20. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M et al (2015) Imagenet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252


  21. He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE international conference on computer vision, pp 1026–1034

  22. TensorFlow Lite. https://www.tensorflow.org/lite

  23. The Khronos Group. NNEF-Tools. GitHub. https://github.com/KhronosGroup/NNEF-Tools/

  24. Yeoman: the web's scaffolding tool for modern webapps. https://yeoman.io/

  25. MNIST. http://yann.lecun.com/exdb/mnist/

  26. CIFAR-10. https://www.cs.toronto.edu/~kriz/cifar.html

  27. ImageNet. http://www.image-net.org/

  28. Chen T, Moreau T, Jiang Z, Zheng L, Yan E, Shen H, Cowan M, Wang L, Hu Y, Ceze L et al (2018) TVM: an automated end-to-end optimizing compiler for deep learning. In: Proceedings of the 13th USENIX symposium on operating systems design and implementation (OSDI 18), pp 578–594

  29. Roesch J, Lyubomirsky S, Weber L, Pollock J, Kirisame M, Chen T, Tatlock Z (2018) Relay: a new IR for machine learning frameworks. In: Proceedings of the 2nd ACM SIGPLAN international workshop on machine learning and programming languages, pp 58–68

  30. Lai M-Y, Sung C-Y, Lee J-K, Hung M-Y (2020) Enabling android nnapi flow for tvm runtime. In: Proceedings of the 49th international conference on parallel processing-ICPP: workshops, pp 1–8

  31. Hung M-Y, Lai M-Y, Sung C-Y, Lee J-K (2020) A generic method to utilize vendor-specific AI accelerator on android mobile for TVM. In: TVM and deep learning compilation conference, Seattle

  32. Lee C-L, Chao C-T, Lee J-K, Hung M-Y, Huang C-W (2019) Accelerate DNN performance with sparse matrix compression in halide. In: Proceedings of the 48th international conference on parallel processing: workshops, pp 1–6

  33. Develop applications and solutions that emulate human vision with the Intel Distribution of OpenVINO toolkit. https://software.intel.com/en-us/openvino-toolkit

  34. Compute Library for Deep Neural Networks. https://github.com/intel/clDNN

  35. Yu M-S, Chen T-L, Lee J-K (2020) Accelerating NNEF framework on OpenCL devices using clDNN. In: Proceedings of the international workshop on OpenCL, pp 1–2

  36. Bai J, Lu F, Zhang K et al ONNX: open neural network exchange. GitHub repository. https://github.com/onnx/onnx

  37. ONNX Runtime. https://www.onnxruntime.ai/


Acknowledgements

This work was supported in part by MOST of Taiwan and Mediatek.

Author information


Corresponding author

Correspondence to Jenq-Kuen Lee.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Converting operators from NNEF to NNAPI

Considering NNEF version 1.0 and NNAPI at API level 28, the operators handled by our converter fall into six types. The following subsections explain each of these types in detail.

A.1 Operators with the same format

In this category, the format for NNEF and NNAPI operators is the same; therefore, we can translate these operators directly from NNEF to NNAPI without modifying their input/output tensor shapes and data layouts. Currently, our converter supports only a single operator of this kind: reshape. Taking this operator as an example, an NNEF snippet is shown in Listing 8, and the corresponding NNAPI code snippet is shown in Listing 9. In Listing 9, lines 2 and 4 describe the input and output shapes, respectively—the same as the NNEF version shown in Listing 8. Lines 5 to 12 prepare and add corresponding operands to the model. Lines 13 to 17 specify the input/output of the operator and add it to the model. As shown in the code snippet, the NNAPI reshape takes two inputs that have the same value/shape as the NNEF version and one output, which also has the same shape as the NNEF version.

[Listing 8: reshape in NNEF; Listing 9: reshape in NNAPI]
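Since the original listings appear only as images, the following is a minimal C sketch of the pattern Listing 9 follows, not the paper's exact code. It adds an ANEURALNETWORKS_RESHAPE operation mirroring an NNEF reshape; the [1,1,28,28] to [1,784] shapes, the operand indices, and the function name are illustrative assumptions, and the model handle is assumed to come from ANeuralNetworksModel_create.

#include <android/NeuralNetworks.h>

/* Hedged sketch of an NNAPI reshape; shapes and indices are illustrative. */
void add_reshape(ANeuralNetworksModel* model) {
    uint32_t inDims[]  = {1, 1, 28, 28};   /* operand 0: input tensor        */
    uint32_t shpDims[] = {2};              /* operand 1: 1-D shape tensor    */
    uint32_t outDims[] = {1, 784};         /* operand 2: output tensor       */

    ANeuralNetworksOperandType in  = {ANEURALNETWORKS_TENSOR_FLOAT32, 4, inDims,  0.0f, 0};
    ANeuralNetworksOperandType shp = {ANEURALNETWORKS_TENSOR_INT32,   1, shpDims, 0.0f, 0};
    ANeuralNetworksOperandType out = {ANEURALNETWORKS_TENSOR_FLOAT32, 2, outDims, 0.0f, 0};

    ANeuralNetworksModel_addOperand(model, &in);   /* index 0 */
    ANeuralNetworksModel_addOperand(model, &shp);  /* index 1 */
    ANeuralNetworksModel_addOperand(model, &out);  /* index 2 */

    /* The target shape carries the same value as the NNEF shape attribute. */
    static const int32_t shapeData[] = {1, 784};
    ANeuralNetworksModel_setOperandValue(model, 1, shapeData, sizeof(shapeData));

    uint32_t inputs[]  = {0, 1};
    uint32_t outputs[] = {2};
    ANeuralNetworksModel_addOperation(model, ANEURALNETWORKS_RESHAPE,
                                      2, inputs, 1, outputs);
}

As in the description above, both inputs and the output keep the same values and shapes as their NNEF counterparts; only the API surface changes.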

A.2 Operators with different attributes

In the second category, the NNEF and NNAPI operators have different attributes; thus, when converting this type of operator, we must identify the purpose of each attribute and set a corresponding value. For example, in addition to the input tensors, the add operation of NNAPI takes an extra attribute that specifies the type of activation to apply to the result. The add operation of NNEF has no such attribute; therefore, our converter currently sets it to ANEURALNETWORKS_FUSED_NONE, meaning that no activation is applied to the result. An example of converting this type of operator is shown in Listing 10 and Listing 11. This example also leaves room for optimization, because the attribute value could be set according to the topology of the input NNEF graph: an add followed by ReLU in NNEF can be translated into a single add in NNAPI with the attribute set to ANEURALNETWORKS_FUSED_RELU.

[Listing 10: add in NNEF; Listing 11: add in NNAPI]
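As a hedged illustration of the extra attribute (not the paper's exact Listing 11), the sketch below adds an ANEURALNETWORKS_ADD whose third scalar input selects the fused activation; the shapes, operand indices, and function name are assumptions.

#include <stddef.h>
#include <android/NeuralNetworks.h>

/* Hedged sketch: NNEF's add has no activation attribute, so the NNAPI
 * fused-activation input is pinned to ANEURALNETWORKS_FUSED_NONE. */
void add_elementwise_add(ANeuralNetworksModel* model) {
    uint32_t dims[] = {1, 224, 224, 3};
    ANeuralNetworksOperandType tensor = {ANEURALNETWORKS_TENSOR_FLOAT32, 4, dims, 0.0f, 0};
    ANeuralNetworksOperandType scalar = {ANEURALNETWORKS_INT32, 0, NULL, 0.0f, 0};

    ANeuralNetworksModel_addOperand(model, &tensor);  /* 0: x                */
    ANeuralNetworksModel_addOperand(model, &tensor);  /* 1: y                */
    ANeuralNetworksModel_addOperand(model, &scalar);  /* 2: fused activation */
    ANeuralNetworksModel_addOperand(model, &tensor);  /* 3: x + y            */

    /* Would be ANEURALNETWORKS_FUSED_RELU if the NNEF add is followed by relu. */
    static const int32_t act = ANEURALNETWORKS_FUSED_NONE;
    ANeuralNetworksModel_setOperandValue(model, 2, &act, sizeof(act));

    uint32_t in[]  = {0, 1, 2};
    uint32_t out[] = {3};
    ANeuralNetworksModel_addOperation(model, ANEURALNETWORKS_ADD, 3, in, 1, out);
}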

A.3 Operators with different names

In this category, operators provide the same functionality in NNEF and NNAPI but under different names. When translating this type of operator, we must find the matching operator name in NNAPI and ensure that it has the same properties as the NNEF operator. Take sigmoid as an example: this operator is called logistic in NNAPI but behaves exactly as the sigmoid defined in NNEF. Listing 12 and Listing 13 illustrate the use of this operator in the two systems.

[Listing 12: sigmoid in NNEF; Listing 13: logistic in NNAPI]
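The following hedged sketch (again not the paper's exact Listing 13) shows how the name change is purely mechanical: the NNEF sigmoid becomes ANEURALNETWORKS_LOGISTIC with the same single input and output. Shapes, indices, and the function name are assumptions.

#include <android/NeuralNetworks.h>

/* Hedged sketch: NNEF "sigmoid" maps to the differently named but
 * functionally identical ANEURALNETWORKS_LOGISTIC. */
void add_sigmoid(ANeuralNetworksModel* model) {
    uint32_t dims[] = {1, 1000};
    ANeuralNetworksOperandType t = {ANEURALNETWORKS_TENSOR_FLOAT32, 2, dims, 0.0f, 0};

    ANeuralNetworksModel_addOperand(model, &t);  /* 0: input  */
    ANeuralNetworksModel_addOperand(model, &t);  /* 1: output */

    uint32_t in[]  = {0};
    uint32_t out[] = {1};
    ANeuralNetworksModel_addOperation(model, ANEURALNETWORKS_LOGISTIC, 1, in, 1, out);
}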

A.4 Operators with different tensor shapes

In this category, an operator has different tensor shape requirements in NNEF and NNAPI. We identify such operators by studying the tensor shape specification of each operator in both standards. For example, an NNEF operator may require one of its input tensors to be N-dimensional, whereas NNAPI requires it to be (N±1)-dimensional. When translating this type of operator, we must inspect the meaning of each dimension and find a way to reduce or increase the dimensionality so that it meets the NNAPI requirement. Listing 14 and Listing 15 illustrate this situation: the convolution of NNEF requires the bias shape to be two-dimensional, while NNAPI requires it to be one-dimensional. However, the NNEF specification fixes the first dimension of the bias shape to 1, so this dimension can be removed directly without affecting the size of the tensor. Therefore, our converter keeps the second bias dimension and discards the first when converting a convolution from NNEF to NNAPI.

[Listing 14: convolution in NNEF; Listing 15: convolution in NNAPI]
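A minimal sketch of the shape reduction just described, under the assumption that the bias arrives as an NNEF [1, C] shape; the helper name is ours, not the paper's.

#include <assert.h>
#include <stdint.h>

/* Hedged sketch: convert an NNEF bias shape [1, C] to the 1-D [C] form
 * that NNAPI expects. The leading dimension is fixed to 1 by the NNEF
 * specification, so dropping it does not change the tensor size. */
static uint32_t nnef_bias_to_nnapi(const uint32_t nnefShape[2],
                                   uint32_t nnapiShape[1]) {
    assert(nnefShape[0] == 1);      /* guaranteed by the NNEF specification */
    nnapiShape[0] = nnefShape[1];   /* keep only the channel dimension      */
    return 1;                       /* NNAPI dimension count                */
}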

A.5 Operators with different data layouts

For 4-D tensors, NNEF uses the NCHW data layout, while NNAPI uses NHWC; therefore, we must bridge the two layouts wherever they differ. We insert a transposition both before and after any operator that takes 4-D tensors as input and output when the layouts differ. By setting the transpose permutation correctly, we ensure that all 4-D tensors use the same data layout as NNAPI, as seen in Listing 14 and Listing 15. In the example NNAPI convolution code, two transpositions appear before the convolution, each taking [0,2,3,1] as the permutation; this changes the layout of the input 4-D tensors from NCHW to NHWC, which NNAPI accepts. After the convolution, one transpose with permutation [0,3,1,2] changes the layout of the output 4-D tensor from NHWC back to NCHW. This guarantees that when NNAPI passes the output 4-D tensor back to NNEF-RT, it uses the same data layout as NNEF. A sketch of the inserted transpose follows.
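As a hedged sketch of the inserted layout fix (not the paper's exact code), the snippet below adds an ANEURALNETWORKS_TRANSPOSE (available at API level 28) with permutation [0,2,3,1] so that a 4-D NCHW graph input becomes NHWC before reaching an NNAPI operator; the shapes, operand indices, and function name are assumptions.

#include <android/NeuralNetworks.h>

/* Hedged sketch: NCHW -> NHWC transpose inserted at a graph input. */
void add_nchw_to_nhwc(ANeuralNetworksModel* model) {
    uint32_t nchw[]     = {1, 3, 224, 224};  /* 0: NNEF-side tensor, NCHW  */
    uint32_t permDims[] = {4};               /* 1: permutation, 1-D int32  */
    uint32_t nhwc[]     = {1, 224, 224, 3};  /* 2: NNAPI-side tensor, NHWC */

    ANeuralNetworksOperandType inT   = {ANEURALNETWORKS_TENSOR_FLOAT32, 4, nchw, 0.0f, 0};
    ANeuralNetworksOperandType permT = {ANEURALNETWORKS_TENSOR_INT32, 1, permDims, 0.0f, 0};
    ANeuralNetworksOperandType outT  = {ANEURALNETWORKS_TENSOR_FLOAT32, 4, nhwc, 0.0f, 0};

    ANeuralNetworksModel_addOperand(model, &inT);    /* 0 */
    ANeuralNetworksModel_addOperand(model, &permT);  /* 1 */
    ANeuralNetworksModel_addOperand(model, &outT);   /* 2 */

    /* NCHW -> NHWC; the inverse permutation [0,3,1,2] restores NCHW
     * after the last NNAPI operator, as described above. */
    static const int32_t perm[] = {0, 2, 3, 1};
    ANeuralNetworksModel_setOperandValue(model, 1, perm, sizeof(perm));

    uint32_t in[]  = {0, 1};
    uint32_t out[] = {2};
    ANeuralNetworksModel_addOperation(model, ANEURALNETWORKS_TRANSPOSE, 2, in, 1, out);
}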

Note that our converter inserts a transpose operation if and only if the 4-D tensor is the graph input or output of the NNEF model. In contrast, when an input 4-D tensor stems from another NNAPI operator, we know that it already uses the NHWC data layout; therefore, there is no need to transpose it to NCHW and then back to NHWC. For example, suppose we are converting the NNEF model shown in Listing 16. The conversion result is similar to the code snippet shown in Listing 17. We can see that no transpositions occur between convolution and max_pool, but they do occur before convolution (because it is a graph input) and after max_pool (because it is a graph output).

Furthermore, if the output tensor of a reshape operation is 4-D and is not a graph output, our converter transposes it implicitly (when appropriate). Thus, the following operator can use the output of the reshape directly, without additional transpose operations. As shown in Listing 16 and Listing 17, the NNEF reshape output is [1,1,28,28], while the NNAPI reshape output is [1,28,28,1], which uses the NHWC data layout so that the subsequent convolution can take it directly as input.

[Listing 16: NNEF model; Listing 17: its NNAPI conversion result]

A.6 Operators with variations

In this category, an operator may have variations, such as ReLU, ReLU1, and ReLU6 in NNAPI. NNEF does not include such variations; it expresses ReLU1 and ReLU6 with relu and min. In our work, we extend NNEF to express ReLU1 and ReLU6 directly, as seen in Listing 18. We modified the NNEF parser so that relu can take an extra attribute called n; setting n to 6, for example, represents ReLU6. In this way, NNEF gains the same ReLU variations as NNAPI. Listing 19 shows the conversion result of Listing 18: the relu with n set to 6 in NNEF is translated into ReLU6 in NNAPI.

[Listing 18: extended NNEF relu; Listing 19: its NNAPI conversion result]
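A hedged sketch of the variant selection our converter could perform when it sees the extended relu; the helper name and the int-valued n parameter are our assumptions, while the three operation codes are NNAPI's own.

#include <android/NeuralNetworks.h>

/* Hedged sketch: pick the NNAPI ReLU variant from the extra n attribute
 * of the extended NNEF relu. */
static ANeuralNetworksOperationType relu_variant(int n) {
    switch (n) {
        case 1:  return ANEURALNETWORKS_RELU1;  /* relu with n = 1 */
        case 6:  return ANEURALNETWORKS_RELU6;  /* relu with n = 6 */
        default: return ANEURALNETWORKS_RELU;   /* plain relu      */
    }
}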


Cite this article

Chang, YM., Sung, CY., Sheu, YC. et al. Support NNEF execution model for NNAPI. J Supercomput 77, 10065–10096 (2021). https://doi.org/10.1007/s11227-021-03625-7
