Abstract:
Feature-map based knowledge distillation has proven effective in improving the performance of the student model. Existing works focus mainly on the formulation of knowledge but ignore the difference in channel counts that arises from heterogeneous teacher-student architectures. They generally adopt a handcrafted matching or an input-independent association matrix, which can cause semantic mismatch and thus suboptimal performance. To resolve this problem, we present an input-dependent channel association module. This module automatically generates an allocation matrix in a cross-attention manner, enabling each student channel to be dynamically connected to its semantically related teacher channel according to its learning state. An alternating training scheme is applied for stable optimization. Extensive experiments on image classification across a variety of settings and popular network architectures demonstrate the effectiveness of the proposed strategy.
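
The abstract describes the mechanism but gives no code, so the following is a minimal PyTorch sketch of how a cross-attention channel association of this kind could be realized. The class name, the pooled per-channel descriptors, the embedding size, and the MSE alignment loss are illustrative assumptions rather than the authors' implementation, and the alternating training scheme is omitted.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ChannelAssociation(nn.Module):
        # Hypothetical sketch (not the paper's code): each student channel is
        # summarized by a pooled descriptor, attends over teacher channel
        # descriptors, and the resulting input-dependent allocation matrix
        # reassembles teacher channels into the student's channel layout.
        def __init__(self, dim=128, pool=4):
            super().__init__()
            d = pool * pool                   # descriptor length per channel (assumed)
            self.q_proj = nn.Linear(d, dim)   # student descriptors -> queries
            self.k_proj = nn.Linear(d, dim)   # teacher descriptors -> keys
            self.scale = dim ** -0.5
            self.pool = pool

        def forward(self, f_s, f_t):
            # f_s: (B, Cs, H, W) student feature map
            # f_t: (B, Ct, H, W) teacher feature map (same spatial size assumed)
            B, Cs, H, W = f_s.shape
            # Fixed-size per-channel descriptors via adaptive pooling.
            d_s = F.adaptive_avg_pool2d(f_s, self.pool).flatten(2)  # (B, Cs, pool^2)
            d_t = F.adaptive_avg_pool2d(f_t, self.pool).flatten(2)  # (B, Ct, pool^2)
            q = self.q_proj(d_s)                                    # (B, Cs, dim)
            k = self.k_proj(d_t)                                    # (B, Ct, dim)
            # Allocation matrix: each row is a distribution over teacher channels.
            a = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)  # (B, Cs, Ct)
            # Reassemble teacher channels to align with the student's channels.
            f_t_aligned = (a @ f_t.flatten(2)).view(B, Cs, H, W)
            return f_t_aligned, a

For example, with a 64-channel student and a 256-channel teacher sharing an 8x8 spatial size, the aligned teacher map can feed a standard feature-map distillation loss:

    assoc = ChannelAssociation()
    f_s = torch.randn(2, 64, 8, 8)       # narrow student features
    f_t = torch.randn(2, 256, 8, 8)      # wider teacher features
    f_t_aligned, a = assoc(f_s, f_t)
    loss = F.mse_loss(f_s, f_t_aligned)  # assumed distillation objective
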
Published in: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 04-10 June 2023
Date Added to IEEE Xplore: 05 May 2023