DOI: 10.1145/3400286.3418273

Channel-Wise Attention and Channel Combination for Knowledge Distillation

Published: 25 November 2020

Abstract

Knowledge distillation is a strategy for building machine learning models efficiently by making use of the knowledge embedded in a pretrained model. The teacher-student framework is a well-known way to apply knowledge distillation: a teacher network contains the knowledge for a specific task, and a student network with a simpler architecture is built to inherit that knowledge. This paper proposes a new approach that uses an attention mechanism to extract knowledge from a teacher network. The attention function determines which channels of the teacher network's feature maps are used to train the student network, so that the student learns only useful features. This approach allows a new model to learn useful features while taking model complexity into account.
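
The abstract does not give the exact formulation, so the following is only a minimal PyTorch sketch of the general idea, under assumptions of our own: an attention score is computed for each channel of the teacher's feature map and used to weight a per-channel distillation loss, steering the student toward the channels the attention marks as useful. The class name, the gating network, the 1x1 projection for mismatched channel counts, and the choice of a weighted per-channel MSE are illustrative assumptions, not the authors' method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttentionDistillationLoss(nn.Module):
    """Sketch of a channel-wise attention loss for feature distillation.

    Attention scores are computed from the teacher's feature map and used
    to weight the per-channel distance between teacher and student
    features, so channels deemed unimportant contribute less to the
    distillation loss. All names here are illustrative assumptions.
    """

    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        # Project student features to the teacher's channel dimension.
        self.proj = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)
        # Small gating network producing one attention score per teacher channel.
        self.gate = nn.Sequential(
            nn.Linear(teacher_channels, teacher_channels // 4),
            nn.ReLU(inplace=True),
            nn.Linear(teacher_channels // 4, teacher_channels),
        )

    def forward(self, f_student: torch.Tensor, f_teacher: torch.Tensor) -> torch.Tensor:
        # f_student: (B, Cs, H, W), f_teacher: (B, Ct, H, W)
        f_student = self.proj(f_student)                         # (B, Ct, H, W)
        pooled = F.adaptive_avg_pool2d(f_teacher, 1).flatten(1)  # (B, Ct)
        attn = torch.softmax(self.gate(pooled), dim=1)           # (B, Ct), sums to 1
        # Per-channel mean-squared error, weighted by the attention scores.
        per_channel_mse = (f_student - f_teacher).pow(2).mean(dim=(2, 3))  # (B, Ct)
        return (attn * per_channel_mse).sum(dim=1).mean()
```

In practice such a loss would typically be added to the student's usual task loss (e.g., cross-entropy) with a weighting coefficient, and the gating network would be trained jointly with the student.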



    Published In

    RACS '20: Proceedings of the International Conference on Research in Adaptive and Convergent Systems
    October 2020
    300 pages
    ISBN:9781450380256
    DOI:10.1145/3400286


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. Attention
    2. Knowledge distillation
    3. Visual representation

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    RACS '20

    Acceptance Rates

    RACS '20 paper acceptance rate: 42 of 148 submissions (28%)
    Overall acceptance rate: 393 of 1,581 submissions (25%)

    Article Metrics

    • Total citations: 0
    • Total downloads: 72
    • Downloads (last 12 months): 11
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 03 Mar 2025
