
CAP’NN: A Class-aware Framework for Personalized Neural Network Inference

Published: 09 December 2022

Abstract

We propose a framework for Class-aware Personalized Neural Network Inference (CAP’NN), which prunes an already-trained neural network model based on the preferences of individual users. Specifically, by adapting to the subset of output classes that each user is expected to encounter, CAP’NN is able to prune not only ineffectual neurons but also miseffectual neurons that confuse classification, without the need to retrain the network. CAP’NN also exploits the similarities among pruning requests from different users to minimize the timing overheads of pruning the network. To achieve this, we propose a clustering algorithm that groups similar classes in the network based on the firing rates of neurons for each class and then implement a lightweight cache architecture to store and reuse information from previously pruned networks. In our experiments with VGG-16, AlexNet, and ResNet-152 networks, CAP’NN achieves, on average, up to 47% model size reduction while actually improving the top-1(5) classification accuracy by up to 3.9%(3.4%) when the user only encounters a subset of the trained classes in these networks.
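The abstract's two core mechanisms, pruning neurons according to their per-class firing rates and clustering classes with similar firing-rate signatures so that pruning results can be cached and reused across users, can be illustrated with a short sketch. The Python below is a hypothetical illustration under assumed names, thresholds, and randomly generated profiling data; it is not the authors' implementation of CAP'NN.

```python
# Minimal sketch, assuming: (1) per-class neuron firing rates have already been
# profiled on the trained network, (2) a neuron is pruned when it rarely fires
# for every class in the user's subset, and (3) classes are grouped by plain
# k-means over their firing-rate signatures. All values and names are illustrative.
import numpy as np

rng = np.random.default_rng(0)

num_classes, num_neurons = 10, 256
# firing_rates[c, n]: fraction of class-c inputs that activate neuron n
# (randomly generated here purely as stand-in profiling data).
firing_rates = rng.random((num_classes, num_neurons))

def class_aware_mask(rates, user_classes, threshold=0.7):
    """Keep a neuron only if it fires often enough for at least one class
    that this particular user is expected to encounter."""
    user_rates = rates[user_classes]              # shape: (|subset|, num_neurons)
    return user_rates.max(axis=0) >= threshold    # boolean keep-mask per neuron

def cluster_classes(rates, num_clusters=3, iters=20, seed=1):
    """Group classes with similar firing-rate signatures (simple k-means),
    so a cached pruned model can be reused for similar user requests."""
    local_rng = np.random.default_rng(seed)
    centers = rates[local_rng.choice(len(rates), num_clusters, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(rates[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)             # nearest center per class
        for k in range(num_clusters):
            if np.any(labels == k):               # recompute the center of each group
                centers[k] = rates[labels == k].mean(axis=0)
    return labels

user_classes = [1, 4, 7]                          # hypothetical per-user class subset
keep = class_aware_mask(firing_rates, user_classes)
print(f"kept {keep.sum()}/{num_neurons} neurons for class subset {user_classes}")
print("class-to-cluster assignment:", cluster_classes(firing_rates))
```

Under this reading of the abstract, a later request whose class subset falls into an already-seen cluster could be served from the cache instead of re-deriving a pruning mask, which is how the timing overhead of pruning would be amortized across users.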

Published In

ACM Transactions on Embedded Computing Systems, Volume 21, Issue 5 (September 2022), 526 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3561947
Editor: Tulika Mitra

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 09 December 2022
Online AM: 21 March 2022
Accepted: 19 February 2022
Revised: 11 January 2022
Received: 29 June 2021
Published in TECS Volume 21, Issue 5

Author Tags

1. Class-aware pruning
2. personalized inference
3. energy-efficient inference

Qualifiers

• Research-article
• Refereed
