Distributed fine-tuning of CNNs for image retrieval on multiple mobile devices

https://doi.org/10.1016/j.pmcj.2020.101134

Abstract

The high performance of recent mobile devices has enabled deep learning to be extended to such devices as well. However, because their computing power is not yet sufficient to perform on-device training, a pre-trained model is usually downloaded to mobile devices, and only inference is performed on them. This situation leads to the problem that accuracy may be degraded if the characteristics of the data for training and those for inference are sufficiently different. In general, fine-tuning allows a pre-trained model to adapt to a given data set, but it has also been perceived as difficult on mobile devices. In this paper, we introduce our on-going effort to improve the quality of mobile deep learning by enabling fine-tuning on mobile devices. To reduce its cost to a level that mobile devices can handle, a lightweight fine-tuning method is proposed, and its cost is further reduced by using distributed computing across mobile devices. The proposed technique has been applied to LetsPic-DL, a group photoware application under development in our research group. It required only 24 seconds to fine-tune a pre-trained MobileNet with 100 photos on five Galaxy S8 units, yielding excellent image retrieval accuracy, a 27–35% improvement.

Introduction

Thanks to the high computing power of recent mobile devices, there is an emerging trend to also utilize the superior performance of deep learning on mobile devices. Most notably, many leading companies have released software that enables us to develop deep learning applications running on mobile devices. For example, Google released TensorFlow Lite [1], Facebook released Caffe2Go [2], Apple released Core ML [3], and Baidu released Mobile Deep Learning (MDL) [4]. They are commonly optimized for on-device performance, which minimizes inference latency, memory requirements, and energy consumption. In addition, mobile chip vendors have incorporated artificial intelligence (AI) capabilities into their chips. For example, Huawei announced Kirin 970 for a mobile AI computing platform [5].

With this emerging trend, many deep learning applications are being developed for Android and iOS smartphones. In particular, image processing benefits from mobile deep learning. For example, Facebook is using Caffe2Go for the “style transfer” of images or videos in the Facebook app [2], and Apple has employed Core ML to detect faces from photos in iPhone albums [6]. Furthermore, in academia, several research prototypes, such as DeepEye [7], DeepMon [8], and DeepX [9], have been developed for on-device deep learning for image or video processing. However, even with these developments in on-device deep learning, further improvements are still necessary due to constrained resources in a single mobile device. For example, a deep learning model can be too large to be fully loaded into the memory of a single device [7]. Moreover, object recognition based on mobile deep learning can take more than 100 s [8].

To overcome the abovementioned resource limitations of on-device deep learning, we can leverage the resources of mobile devices in the local vicinity [10]. These multiple devices form a mobile ad-hoc cloud [11], in which each device is connected by wireless communication and pre-commissioned by the job clients. This “on-device” deep learning approach has several benefits over the traditional offloading-based approach [12], which moves compute-intensive operations from mobile devices to powerful infrastructures (e.g., the remote cloud). First, it better protects user data privacy, because the data does not leave the mobile ad-hoc cloud [10], [11]. Moreover, even within a mobile ad-hoc cloud, devices need to share only deep learning models, not user data [1], thus not compromising user data privacy at all. Second, it guarantees low inference latency, because it does not involve network round-trips between the mobile device and the remote cloud [7], [8], [9]; thus, it can be used in an offline environment where a network connection to the remote cloud is not available [1]. Even in such an environment, e.g., after a disaster, a mobile ad-hoc cloud can still provide the resources of nearby mobile devices [13], [14], [15], [16]. Third, it eliminates the cloud (or remote server) maintenance cost [8], which has never been negligible. For example, using an Amazon EC2 g3.4xlarge instance costs one dollar per hour.

Despite significant improvements in mobile deep learning, previous studies have concentrated on the inference step rather than the training step, because training remains too computationally intensive to be performed on smartphones.1 Accordingly, mobile deep learning software provides us with various pre-trained models such as MobileNet [18], SqueezeNet [19], and Inception V3 [20].

A popular remedy for the inherent limitations of a pre-trained model is fine-tuning,2 which makes fine adjustments so that the model adapts to the data set at hand, thus further improving its performance [22], [21], [23]. Although fine-tuning is widely adopted in non-mobile environments, it has yet to be adopted in mobile environments. Without careful consideration of constrained resources, fine-tuning itself could also be too expensive to run on mobile devices. Even in state-of-the-art mobile deep learning platforms (e.g., TensorFlow Lite), transfer learning must be performed outside of the mobile device [1], which invalidates the benefits of mobile deep learning. Therefore, efficient on-device fine-tuning is urgently required considering the recent explosion of mobile deep learning.

In this paper, we develop a distributed on-device fine-tuning method for deep neural networks (DNNs), which operates completely within mobile devices. This outcome is accomplished via the following two main technical components:

  • Mobile distributed computing: On a mobile ad-hoc cloud, we run a MapReduce [24] job to perform fine-tuning in a distributed manner. Each mobile device individually processes part of the entire data that it owns. Thus, data parallelism [25] on the mobile ad-hoc cloud boosts the performance of on-device fine-tuning.

  • Lightweight fine-tuning: Furthermore, we design a lightweight fine-tuning algorithm, which runs very fast even on a mobile device with constrained resources. Using the data stored in the mobile ad-hoc cloud, we re-train the weights of only three fully-connected layers without touching those of the convolutional layers, because the convolutional layers are the main performance bottlenecks [26], [8].
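The interplay of these two components can be illustrated with a small, self-contained sketch in plain Python with synthetic data (this is an illustration, not the LetsPic-DL code): each simulated device runs the map step, computing the gradient of a single fully-connected layer on its own data shard while the convolutional backbone is treated as a frozen feature extractor, and the reduce step averages the per-device gradients before a synchronous weight update.

```python
import random

random.seed(0)

def local_gradient(weights, shard):
    """Map step: one device computes the squared-error gradient of a
    single linear (fully-connected) layer on its own shard. The frozen
    convolutional layers are abstracted away: each sample is already a
    fixed feature vector x with a target value y."""
    grad = [0.0] * len(weights)
    for x, y in shard:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2 * err * xi / len(shard)
    return grad

def reduce_gradients(grads):
    """Reduce step: average the per-device gradients."""
    return [sum(g[i] for g in grads) / len(grads)
            for i in range(len(grads[0]))]

# Synthetic "photos": feature vectors a frozen CNN backbone might emit.
true_w = [0.5, -1.0, 2.0]
data = []
for _ in range(100):
    x = [random.uniform(-1, 1) for _ in range(3)]
    data.append((x, sum(w * xi for w, xi in zip(true_w, x))))

# Partition the 100 samples evenly across five simulated devices.
shards = [data[i::5] for i in range(5)]

weights = [0.0, 0.0, 0.0]
lr = 0.1
for _ in range(200):                                   # synchronous rounds
    grads = [local_gradient(weights, s) for s in shards]  # map
    avg = reduce_gradients(grads)                         # reduce
    weights = [w - lr * g for w, g in zip(weights, avg)]

print([round(w, 2) for w in weights])  # converges toward true_w
```

Because every shard has the same size, averaging the shard gradients is exactly equivalent to a full-batch gradient step, which is why the synchronous scheme recovers the target weights.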

To the best of our knowledge, our work is the first attempt to realize fine-tuning of DNNs on multiple mobile devices.

Collaborative photography involves collocated users taking photos in a large physical space for group activities. This is becoming increasingly popular in various domains such as education and research [27], [28], [29] and tourism and leisure [30], [31], [29]. It usually involves a shared goal to be completed using teamwork. For instance, a group of students are asked to visit a cultural heritage site and comprehensively photograph ancient monuments. This necessity has led to the development of several mobile apps, including Mobiphos [30] and LetsPic [29].

Most such apps provide photo sharing among group members in support of effective collaboration. We contend that content-based image retrieval (CBIR) should further improve the effectiveness of such teamwork. As shown in Fig. 1, when a group member takes a photo, photos that are similar content-wise can be quickly retrieved from those taken by other group members, and they appear at the bottom of the viewfinder. This feature is very useful for coordinating a shared goal, e.g., by identifying remaining work and reducing duplicated work. Therefore, we incorporated CBIR into LetsPic,3 a group photoware application which had been developed in our research group for four years, and we plan to release its next version LetsPic-DL. Here, our on-device fine-tuning plays an important role in improving the quality of CBIR.
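The retrieval step described above can be sketched in a few lines, assuming photos have already been embedded into feature vectors by a CNN (the photo ids and vectors below are hypothetical): similarity is measured by cosine similarity, and the top-k most similar gallery photos are returned.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, gallery, k=3):
    """Return the ids of the k gallery photos most similar to the query."""
    ranked = sorted(gallery.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [photo_id for photo_id, _ in ranked[:k]]

# Hypothetical embeddings of photos shared by group members.
gallery = {
    "alice_01": [0.9, 0.1, 0.0],
    "bob_07":   [0.8, 0.2, 0.1],
    "carol_03": [0.0, 0.1, 0.9],
}
print(retrieve([1.0, 0.0, 0.0], gallery, k=2))  # → ['alice_01', 'bob_07']
```

In the app, the query vector would come from the photo just taken, and the two nearest neighbors would be the thumbnails shown at the bottom of the viewfinder.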

SW & demo: The source code of LetsPic-DL is available at https://github.com/kaist-dmlab/LetsPic-DL, and a demo video is available at http://goo.gl/BwEy8j.

More scenarios: The technology for LetsPic-DL can be used in various other scenarios. A few example scenarios are presented in Section 6.

Performance: In our extensive experiments using the prototype of LetsPic-DL, fine-tuning with 100 photos was achieved very quickly (24 s) on five recent smartphones (Samsung Galaxy S8 [32]), thanks to the abovementioned advantages. Furthermore, performing such lightweight fine-tuning significantly improved the accuracy of content-based image retrieval by 27–35% in popular benchmarks such as CIFAR-100 [33], Food-101 [34], and Caltech-Faces [35] over using just the pre-trained model.

Scope:

Among various DNNs, we focus on convolutional neural networks (CNNs) because the main target of LetsPic-DL is content-based image retrieval. It is widely recognized that CNNs are highly suited to image classification or recognition [22], [36], [23], [37], [38], [39], [40], [41].
We focus on mobile distributed computing to realize fast on-device fine-tuning. Local computations on each device can be accelerated by exploiting mobile graphics processing units (GPUs) [9], [8], [7]. Because leveraging hardware acceleration is orthogonal to our work, we do not discuss that issue in this paper.

Outline: The remainder of this paper is organized as follows. Section 2 reviews the state-of-the-art mobile deep learning technologies. Section 3 presents the infrastructure for distributed on-device fine-tuning. Section 4 presents the design and architecture of our proposed platform, LetsPic-DL. Section 5 evaluates the performance of LetsPic-DL. Section 6 discusses other possible usage scenarios. Finally, Section 7 concludes this paper.

Section snippets

Models

AlexNet [36] and VGG16 [40], which have been regarded as representative CNNs, have 60 million and 138 million parameters respectively, resulting in not only a large model size but also a low learning speed. Thus, it is essential to reduce the size of a model for the mobile environment, whose computing power and memory are limited compared to the server environment. There are two popular techniques along this direction: quantization [42] and factorization [18]. First, quantization
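The quantization idea mentioned above can be sketched as follows. This is a generic 8-bit linear quantizer for illustration, not the specific scheme of [42]: float weights are mapped to integer codes plus a scale and offset, shrinking storage roughly fourfold relative to 32-bit floats at the cost of a bounded rounding error.

```python
def quantize(weights, num_bits=8):
    """Linearly quantize float weights to num_bits-bit integer codes.
    Returns the codes plus the (scale, offset) needed to reconstruct
    approximate floats on the device."""
    lo, hi = min(weights), max(weights)
    levels = (1 << num_bits) - 1          # 255 for 8 bits
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo

def dequantize(codes, scale, lo):
    """Reconstruct approximate float weights from integer codes."""
    return [c * scale + lo for c in codes]

w = [-1.5, -0.3, 0.0, 0.7, 2.1]
codes, scale, lo = quantize(w)
approx = dequantize(codes, scale, lo)
max_err = max(abs(a - b) for a, b in zip(w, approx))
print(max_err <= scale / 2 + 1e-9)  # error bounded by half a step
```

Rounding to the nearest code keeps the per-weight error within half a quantization step, which is why 8-bit models typically lose little accuracy despite the 4x size reduction.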

Our mobile infrastructure

Proposed platform: LetsPic-DL

Overview: The high-level design considerations of LetsPic-DL are summarized as follows:

  • No cloud offloading: We use only the local resources of smartphones, without any cloud offloading, to perform the fine-tuning of the CNN as well as content-based image retrieval (CBIR).

  • Significant accuracy enhancement: Through the fine-tuning of the CNN using the photos stored in smartphones, we aim to improve the accuracy of CBIR significantly.

  • Reasonably low overhead: While achieving a significant accuracy

Implementation

The fine-tuning capability of LetsPic-DL was implemented on top of TensorFlow Lite [1]. Because it supports only the inference step, we customized it to support the training step as well by adopting the corresponding source code of TensorFlow for Java. This integration was feasible because the Java API and the Android API are very similar to each other.

Regarding feature extraction for content-based image retrieval (CBIR), we used the FC2 layer for the fine-tuned model and the FC1 layer for

Additional application scenarios

In this section, we suggest two potential application scenarios for the LetsPic-DL technology, thereby showing its versatility.

First, in mobile crowdsensing, a group of individuals having mobile devices collectively share data and extract information to infer a status of common interest. For example, this can be used for monitoring risky mountain trails [53]. When a group of acquaintances climb a mountain, their walking motion data are captured by their smartphones and used for inferring the

Conclusion

In this paper, we presented our on-going effort to develop an on-device fine-tuning method for CNNs. The proposed method has been applied to our group photoware application LetsPic-DL. The significance of our work is first to realize the fine-tuning of CNNs on resource-constrained mobile devices. According to our extensive evaluation, in the mobile ad-hoc cloud which consists of five Galaxy S8 smartphones, a 27–35% improvement in accuracy was achieved at a small cost of 24–47 seconds spent for

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the Korea Institute of Energy Technology Evaluation and Planning (KETEP) and the Ministry of Trade, Industry & Energy (MOTIE) of the Republic of Korea (No. 20183010141100).

This work was partly supported by the Korea Institute of Startup & Entrepreneurship Development (KISED) grant funded by the Seoul Center for Creative Economy and Innovation (CCEI) (No. 1016414501).

References (54)

  • N. Fernando, et al., Mobile cloud computing: A survey, Future Gener. Comput. Syst. (2013)
  • N. Patel, et al., Mobiphos: A study of user engagement with a mobile collocated-synchronous photo sharing application, Int. J. Hum.-Comput. Stud. (2009)
  • Google, Introduction to TensorFlow Lite (2017)
  • Y. Jia, et al., Delivering real-time AI in the palm of your hand (2016)
  • Apple, Core ML - Integrate machine learning models into your app (2017)
  • Baidu, Mobile Deep Learning (2018)
  • Huawei, HUAWEI reveals the future of mobile AI at IFA 2017 (2017)
  • Apple Computer Vision Machine Learning Team, An on-device deep neural network for face detection, Apple Mach. Learn. J. (2017)
  • A. Mathur, N.D. Lane, S. Bhattacharya, A. Boran, C. Forlivesi, F. Kawsar, DeepEye: Resource efficient local execution...
  • L.N. Huynh, Y. Lee, R.K. Balan, DeepMon: Mobile GPU-based deep learning framework for continuous vision applications,...
  • N.D. Lane, S. Bhattacharya, P. Georgiev, C. Forlivesi, L. Jiao, L. Qendro, F. Kawsar, DeepX: A software accelerator for...
  • I. Yaqoob, et al., Mobile ad hoc cloud: A survey, Wirel. Commun. Mob. Comput. (2016)
  • K. Ha, Z. Chen, W. Hu, W. Richter, P. Pillai, M. Satyanarayanan, Towards wearable cognitive assistance, in: Proc. 12th...
  • E.E. Marinelli, Hyrax: Cloud Computing on Mobile Devices using MapReduce (2009)
  • G. Huerta-Canepa, D. Lee, A virtual cloud computing provider for mobile devices, in: Proc. 1st ACM Workshop on Mobile...
  • A. Dou, V. Kalogeraki, D. Gunopulos, T. Mielikainen, V.H. Tuulos, Misco: A MapReduce framework for mobile systems, in:...
  • J.-w. Lee, et al., Maximizing MapReduce job speed and reliability in the mobile cloud by optimizing task allocation, J. Pervasive Mob. Comput. (2019)
  • Y. You, et al., ImageNet training in minutes (2017)
  • A.G. Howard, et al., MobileNets: Efficient convolutional neural networks for mobile vision applications (2017)
  • F.N. Iandola, et al., SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size (2016)
  • C. Szegedy, et al., Rethinking the inception architecture for computer vision (2015)
  • J. Yosinski, et al., How transferable are features in deep neural networks?
  • K. Lin, H.-F. Yang, J.-H. Hsiao, C.-S. Chen, Deep learning of binary hash codes for fast image retrieval, in: Proc....
  • Z. Zhou, J. Shin, L. Zhang, S. Gurudu, M. Gotway, J. Liang, Fine-tuning convolutional neural networks for biomedical...
  • J. Dean, S. Ghemawat, MapReduce: Simplified data processing on large clusters, in: Proc. Sixth Sympo. on Operating...
  • J. Dean, et al., Large scale distributed deep networks, in: Proc. 25th Int'l Conf. on Neural Information Processing...
  • H.B. McMahan, M. Streeter, Delay-tolerant algorithms for asynchronous distributed online learning, in: Proc. 27th Int'l...