
CAP’NN: A Class-aware Framework for Personalized Neural Network Inference

Published: 09 December 2022

Abstract

We propose a framework for Class-aware Personalized Neural Network Inference (CAP’NN), which prunes an already-trained neural network model based on the preferences of individual users. Specifically, by adapting to the subset of output classes that each user is expected to encounter, CAP’NN is able to prune not only ineffectual neurons but also miseffectual neurons that confuse classification, without the need to retrain the network. CAP’NN also exploits the similarities among pruning requests from different users to minimize the timing overheads of pruning the network. To achieve this, we propose a clustering algorithm that groups similar classes in the network based on the firing rates of neurons for each class and then implement a lightweight cache architecture to store and reuse information from previously pruned networks. In our experiments with VGG-16, AlexNet, and ResNet-152 networks, CAP’NN achieves, on average, up to 47% model size reduction while actually improving the top-1(5) classification accuracy by up to 3.9%(3.4%) when the user only encounters a subset of the trained classes in these networks.
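The abstract's two core mechanisms, pruning neurons according to their per-class firing rates and clustering classes with similar firing-rate signatures so that pruning results can be cached and reused across users, can be illustrated with a short sketch. The Python below is a hypothetical illustration under assumed names, thresholds, and randomly generated profiling data; it is not the authors' implementation of CAP'NN.

```python
# Minimal sketch, assuming: (1) per-class neuron firing rates have already been
# profiled on the trained network, (2) a neuron is pruned when it rarely fires
# for every class in the user's subset, and (3) classes are grouped by plain
# k-means over their firing-rate signatures. All values and names are illustrative.
import numpy as np

rng = np.random.default_rng(0)

num_classes, num_neurons = 10, 256
# firing_rates[c, n]: fraction of class-c inputs that activate neuron n
# (randomly generated here purely as stand-in profiling data).
firing_rates = rng.random((num_classes, num_neurons))

def class_aware_mask(rates, user_classes, threshold=0.7):
    """Keep a neuron only if it fires often enough for at least one class
    that this particular user is expected to encounter."""
    user_rates = rates[user_classes]              # shape: (|subset|, num_neurons)
    return user_rates.max(axis=0) >= threshold    # boolean keep-mask per neuron

def cluster_classes(rates, num_clusters=3, iters=20, seed=1):
    """Group classes with similar firing-rate signatures (simple k-means),
    so a cached pruned model can be reused for similar user requests."""
    local_rng = np.random.default_rng(seed)
    centers = rates[local_rng.choice(len(rates), num_clusters, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(rates[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)             # nearest center per class
        for k in range(num_clusters):
            if np.any(labels == k):               # recompute the center of each group
                centers[k] = rates[labels == k].mean(axis=0)
    return labels

user_classes = [1, 4, 7]                          # hypothetical per-user class subset
keep = class_aware_mask(firing_rates, user_classes)
print(f"kept {keep.sum()}/{num_neurons} neurons for class subset {user_classes}")
print("class-to-cluster assignment:", cluster_classes(firing_rates))
```

Under this reading of the abstract, a later request whose class subset falls into an already-seen cluster could be served from the cache instead of re-deriving a pruning mask, which is how the timing overhead of pruning would be amortized across users.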

Published In

ACM Transactions on Embedded Computing Systems, Volume 21, Issue 5 (September 2022), 526 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3561947
Editor: Tulika Mitra

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 09 December 2022
Online AM: 21 March 2022
Accepted: 19 February 2022
Revised: 11 January 2022
Received: 29 June 2021
Published in TECS Volume 21, Issue 5

Author Tags

1. Class-aware pruning
2. personalized inference
3. energy-efficient inference

Qualifiers

• Research-article
• Refereed
