DeepMaker: A multi-objective optimization framework for deep neural networks in embedded systems

https://doi.org/10.1016/j.micpro.2020.102989

Abstract

Deep Neural Networks (DNNs) are compute-intensive learning models with growing applicability in a wide range of domains. Due to their computational complexity, DNNs benefit from implementations that utilize custom hardware accelerators to meet performance and response-time constraints as well as classification accuracy requirements. In this paper, we propose the DeepMaker framework, which aims to automatically design a set of highly robust DNN architectures for embedded devices, the processing units closest to the sensors. DeepMaker explores and prunes the design space to find improved neural architectures. Our proposed framework takes advantage of a multi-objective evolutionary approach that exploits a pruned design space inspired by a dense architecture. DeepMaker considers accuracy along with network size as two objectives to build a highly optimized network that fits within limited computational resource budgets while delivering an acceptable accuracy level. Compared with the best result on the CIFAR-10 dataset, a network generated by DeepMaker achieves up to a 26.4x compression rate while losing only 4% accuracy. In addition, DeepMaker maps the generated CNN onto programmable commodity devices, including an ARM processor, a high-performance CPU, a GPU, and an FPGA.

Introduction

In recent years, deep learning, which uses deep neural networks as the learning model, has shown excellent performance on many challenging artificial intelligence and machine learning tasks, such as image classification [1], speech recognition [2], and unsupervised learning [3]. In particular, Convolutional Neural Networks (CNNs) have achieved massive success in visual recognition tasks in the past few years and are applied to various computer vision applications [4]. CNNs have penetrated a broad spectrum of platforms, from workstations to embedded devices, due to their powerful learning capabilities.

However, as CNN architectures become increasingly complex in order to improve accuracy, the energy consumption of inference is also becoming a bottleneck. Dealing with the enormous computing throughput demands of upcoming complex learning models in the context of big data will become even more acute, as the failure of the traditional energy and performance scaling paradigm to meet the requirements of modern applications pushes the computing landscape towards inefficiency [42]. On the other hand, leveraging high-performance cloud infrastructures to provide the required computational capacity is not always feasible, especially for mission-critical applications, due to limited network bandwidth, privacy constraints, power-efficiency concerns, and the inability to guarantee worst-case response times.

Generally, two approaches are used to tackle these challenges: 1) diminishing the network size by leveraging network pruning techniques during the training phase [1], and 2) employing customized hardware accelerators [13,9,35]. However, optimizing the network architecture at design time should be considered as a third approach, since the choice of architecture strongly impacts both the performance and the output quality of DNNs. To benefit from this opportunity, we propose a neural acceleration framework, named DeepMaker, which automatically generates a robust DNN in terms of network accuracy and network size, and then maps the generated network to an embedded device. Unlike previous neural architecture search solutions, whose focus is only on improving accuracy, DeepMaker also considers network size as a second objective of the search in order to adaptively find a DNN that fits resource-limited embedded devices. For this purpose, DeepMaker is equipped with a Multi-Objective Optimization (MOO) method that solves the neural architecture search problem by finding a set of Pareto-optimal solutions. The design space is pruned by taking inspiration from a cutting-edge architecture, DenseNet [6], to speed up convergence to an optimal result.
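To make the two-objective selection concrete, the following minimal Python sketch (not the authors' implementation; the candidate names and scores are hypothetical) shows how architectures that have already been trained and scored on classification error and parameter count can be filtered down to a Pareto-optimal set:

    # Minimal sketch (not the authors' code): keep only the Pareto-optimal
    # candidates when each architecture is scored on two objectives to be
    # minimized: classification error and parameter count.
    candidates = [
        {"arch": "net_A", "error": 0.060, "params": 1_200_000},
        {"arch": "net_B", "error": 0.085, "params": 300_000},
        {"arch": "net_C", "error": 0.058, "params": 2_500_000},
        {"arch": "net_D", "error": 0.090, "params": 900_000},  # dominated by net_B
    ]

    def dominates(a, b):
        """a dominates b if it is no worse on both objectives and better on at least one."""
        return (a["error"] <= b["error"] and a["params"] <= b["params"]
                and (a["error"] < b["error"] or a["params"] < b["params"]))

    pareto_front = [c for c in candidates
                    if not any(dominates(o, c) for o in candidates if o is not c)]

    for c in pareto_front:
        print(c["arch"], c["error"], c["params"])

An evolutionary multi-objective method such as NSGA-II repeatedly applies this kind of non-dominated selection while mutating and recombining the surviving architectures.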

The proposed DeepMaker framework uses a multi-objective neuro-evolutionary approach to explore the space of deep neural architectures and find near-optimal ones, and then maps the generated network onto the given hardware. An overview of the proposed framework is illustrated in Fig. 1. The configuration file of DeepMaker comprises predefined parameters for the MOO algorithm and network training parameters. As shown in Fig. 1, the input of the framework is a dataset for generating a neural network.

To approximate an application, developers first need to identify the approximation region of the code and then provide a training dataset for the specified code block so that it can be mimicked by a DNN generated by DeepMaker. The approximation region should be both a hotspot and relatively insensitive to quality loss in both data and operations. We define a hotspot as a code region that consumes considerable energy or occupies a significant part of the execution time [7].
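As an illustration of this mimicking step, the following minimal sketch trains a small MLP to approximate a hypothetical numeric hotspot; the hotspot function, network shape, and training settings are illustrative only and are not taken from the paper (Keras is used here because it is cited in the references):

    # Minimal sketch (not the authors' code): approximate a numeric hotspot
    # with a small MLP. The hotspot function and network sizes are hypothetical.
    import numpy as np
    from tensorflow import keras

    def hotspot(x):                      # hypothetical code region to be mimicked
        return np.sin(x[:, 0]) * np.exp(-x[:, 1] ** 2)

    x_train = np.random.uniform(-1.0, 1.0, size=(10_000, 2)).astype("float32")
    y_train = hotspot(x_train)

    mlp = keras.Sequential([
        keras.Input(shape=(2,)),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(1),           # regression output replacing the exact result
    ])
    mlp.compile(optimizer="adam", loss="mse")
    mlp.fit(x_train, y_train, epochs=20, batch_size=128, verbose=0)
    # At run time, the approximated code block would call mlp.predict(...)
    # instead of executing the original hotspot.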

The output of the DeepMaker framework is a set of optimized architectures. Network pruning is a popular solution for diminishing the amount of network computation. In addition to design space exploration, DeepMaker can apply a network pruning method to a dense architecture to accelerate finding the optimal neural networks. In a nutshell, our main contributions in DeepMaker are as follows:

  • We developed a multi-objective neuro-evolutionary method to discover near-optimal DNN architectures in terms of accuracy and network size.

  • We applied a network pruning method [39] to the optimized neural network architectures designed by DeepMaker to obtain a higher network compression rate.

  • We support both Multi-Layer Perceptron (MLP) and Convolutional Neural Network (CNN) models, meeting the required accuracy of diverse applications ranging from mathematical functions to image classification.

  • We adaptively find the best architecture with respect to resource budget and execution time constraints, and then map the generated network onto different platforms to evaluate the applicability of DeepMaker.

The remainder of this paper is organized as follows: Section 2 gives preliminaries on CNN and the MOO algorithm. Details of the proposed framework are presented in Section 3, which consists of two solutions for network optimization: Design Space Exploration and Neural Network Pruning. The experimental results are presented in Section 4. Section 5 reviews related work in this scope, after which Section 6 concludes the paper.

Section snippets

Automatic design of deep neural network architecture

State-of-the-art approaches for automatically designing DNN architectures can be categorized into hyperparameter optimization, reinforcement learning, and evolutionary approaches.

  • a) Hyperparameter Optimization: From a machine learning point of view, the problem of designing a DNN architecture can be modeled as a hyperparameter optimization problem. Many hyperparameter optimization methods have been proposed, such as Grid Search (GS) [16], gradient search [17], Random

Convolutional Neural Networks (CNNs)

A Convolutional Neural Network (CNN) is a multi-layer neural network that is composed of neurons ordered in a layered structure. The neurons in different layers perform different kinds of computations and have different connection structures. The four essential layers of CNNs are convolutional layers (Conv), activation layers (Act), pooling layers (Pool), and fully-connected layers (FC). A typical NN structure is composed of several stacks of {Conv-Act-Pool} at the beginning, and a few stacks
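To make this layering concrete, a minimal Keras sketch of the {Conv-Act-Pool} plus fully-connected ordering described above might look as follows (layer counts and widths are illustrative and are not the architectures generated by DeepMaker):

    # Minimal sketch of the typical {Conv-Act-Pool} x N + FC layering;
    # layer counts and widths are illustrative only.
    from tensorflow import keras
    from tensorflow.keras import layers

    cnn = keras.Sequential([
        keras.Input(shape=(32, 32, 3)),         # e.g. a CIFAR-10-sized input
        layers.Conv2D(32, 3, padding="same"),   # Conv
        layers.Activation("relu"),              # Act
        layers.MaxPooling2D(2),                 # Pool
        layers.Conv2D(64, 3, padding="same"),   # Conv
        layers.Activation("relu"),              # Act
        layers.MaxPooling2D(2),                 # Pool
        layers.Flatten(),
        layers.Dense(128, activation="relu"),   # FC
        layers.Dense(10, activation="softmax"), # FC classifier head
    ])
    cnn.summary()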

The proposed framework

This section explains the DeepMaker framework. DeepMaker uses two solutions for network optimization: 1) Design Space Exploration for designing the network architecture and 2) Neural Network Pruning for compressing the model size, which are presented in Sections 3.1 and 3.2, respectively.
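The pruning details of [39] are not reproduced in this excerpt; as a generic illustration of the idea, the following sketch zeroes out the smallest-magnitude weights of a layer to reach a target sparsity (a common magnitude-pruning heuristic, not necessarily the exact method used by DeepMaker):

    # Generic magnitude-pruning sketch (not necessarily the method of [39]):
    # zero out the smallest-magnitude weights to reach a target sparsity.
    import numpy as np

    def prune_layer(weights, sparsity=0.5):
        """Return a copy of `weights` with the smallest |w| entries set to zero."""
        flat = np.abs(weights).ravel()
        k = int(sparsity * flat.size)
        if k == 0:
            return weights.copy()
        threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
        pruned = weights.copy()
        pruned[np.abs(pruned) <= threshold] = 0.0
        return pruned

    layer = np.random.randn(64, 128).astype("float32")
    pruned = prune_layer(layer, sparsity=0.75)
    print("non-zero fraction:", np.count_nonzero(pruned) / pruned.size)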

The DeepMaker framework is composed of frontend and backend layers. The frontend is responsible for generating the optimized DNN while the backend layer deals with hardware configuration and mapping. The

Experimental results

In this section, we first introduce the datasets used in the experiments. We then present the experimental results of design space exploration and network pruning, respectively. Finally, we discuss the hardware implementation of the proposed framework on four prevalent hardware platforms: a Xilinx UltraScale+ FPGA, an NVIDIA Tesla M60 GPU, an Intel Core i7-7820, and an ARM Cortex-A15.

Conclusions

CNN technology is ever-evolving, and increasingly complex processing models are being developed, which becomes an obstacle for embedded systems, where memory and energy are often constrained resources. To handle this problem, we proposed DeepMaker, a framework that automatically generates a highly optimized CNN for commercial embedded devices. DeepMaker alleviates the huge computational cost of CNNs by minimizing the network architecture at design time. To reach this goal, DeepMaker

Declaration of Competing Interest

None.

Acknowledgment

This work has been supported by KKS within the DeepMaker and DPAC projects.


References (46)

  • A. Krizhevsky et al., Imagenet classification with deep convolutional neural networks, Adv. Neural Inf. Process. Syst. (2012).
  • G. Hinton et al., Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups, IEEE Signal Process. Mag. (2012).
  • V. Mnih et al., Human-level control through deep reinforcement learning, Nature (2015).
  • R. Zhang et al., Colorful image colorization.
  • J. T. et al., Learning both weights and connections for efficient neural networks, Adv. Neural Inf. Process. Syst. (2015).
  • G. Huang et al., Densely connected convolutional networks.
  • A. Yazdanbakhsh et al., AxBench: a multiplatform benchmark suite for approximate computing, IEEE Des. Test (2017).
  • K. Deb et al., A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Trans. Evol. Comput. (2002).
  • C. Zhang et al., Caffeine: towards uniformed representation and acceleration for deep convolutional neural networks.
  • F. Chollet, Keras, GitHub, 2015. [Online].
  • H. Esmaeilzadeh et al., Power challenges may end the multicore era, Commun. ACM (2013).
  • B. Falsafi et al., FPGAs versus GPUs in data centers, IEEE Micro (2017).
  • H. Sharma et al., DNNWEAVER: from high-level deep network models to FPGA acceleration, IEEE Int. Conf. Mechatron. Electron. Autom. Eng. (2015).
  • Y. LeCun et al., Gradient-based learning applied to document recognition, Proc. IEEE (1998).
  • A. Krizhevsky and G. Hinton, CIFAR-10 dataset.
  • J. Bergstra et al., Algorithms for hyperparameter optimization.
  • Y. Bengio, Gradient-based optimization of hyperparameters, Neural Comput. (2000).
  • J. Bergstra et al., Random search for hyper-parameter optimization, J. Mach. Learn. Res. (2012).
  • J. Snoek et al., Practical Bayesian optimization of machine learning algorithms, Adv. Neural Inf. Process. Syst. (2012).
  • Y. Sun, B. Xue, and M. Zhang, Evolving deep convolutional neural networks for image classification (2017).
  • B. Baker, O. Gupta, N. Naik, and R. Raskar, Designing neural network architectures using reinforcement learning (2016).
  • B. Zoph and Q.V. Le, Neural architecture search with reinforcement learning (2016), arXiv preprint.
  • Z. Zhong, J. Yan, and C.L. Liu, Practical network blocks design with Q-Learning (2017), arXiv preprint.

    Mohammad Loni is a Ph.D. student at the School of Innovation, Design, and Engineering at Mälardalen University since October 2017. He received his B.Sc. degree in computer hardware engineering and the M.Sc. degree in computer science from Shiraz University in 2017. He is a member of the Dependable Platforms for Autonomous Systems and Control (DPAC), HERO, DeepMaker and FAST-ARTS projects at Mälardalen University. He is working on efficient implementation of neural networks on FPGA.

    Sima Sinaei received the B.S. degree from Shahid Bahonar University of Kerman, Iran, in 2008, and the M.S. and Ph.D. degrees from the University of Tehran, Iran, in 2011 and 2018, respectively, all in computer engineering (hardware and computer architecture). She is currently working as a postdoc researcher at IDT, Mälardalen University, Sweden. Her current research interests include machine learning, deep learning, neural network architecture optimization, design methodology for heterogeneous embedded systems, design space exploration, and mapping algorithms for multiprocessor systems.

    Ali Zoljodi is a Master's student at Shiraz University of Technology. He is working on efficient implementation of stereo-vision algorithms on multi-core platforms.

    Masoud Daneshtalab (http://www.idt.mdh.se/~md/) is currently a tenured associate professor at Mälardalen University (MDH) in Sweden, where he co-leads the Heterogeneous System research group (www.es.mdh.se/hero/). He has represented Sweden in the management committee of the EU COST Action IC1202: Timing Analysis on Code-Level (TACLe). Since 2016 he has been on the Euromicro Board of Directors and a member of the HiPEAC network. His research interests include interconnection networks, hardware/software co-design, deep learning architectures, and multi-objective optimization. He has published 2 books, 7 book chapters, and over 200 refereed international journal and conference papers.

    Mikael Sjödin: His current research goal is to find methods that will make software development cheaper and faster and yield software with higher quality. Concurrently, Mikael has also been pursuing research in the analysis of real-time systems, where the goal is to find theoretical models for real-time systems that allow their timing behavior and memory consumption to be calculated. Mikael received his Ph.D. in computer systems in 2000 from Uppsala University (Sweden). Since then he has been working in both academia and industry with embedded systems, real-time systems, and embedded communications. Previous affiliations include Newline Information, Melody Interactive Solutions, and CC Systems. In 2006 he joined the MRTC faculty as a full professor specializing in real-time systems and vehicular software systems.
