Abstract:
Previous paradigms have combined self-supervised learning (SSL) with knowledge distillation to compress a self-supervised teacher model into a smaller student. In this work, we devise a self-supervised explorative distillation (SSED) algorithm to improve the representation quality of lightweight models. We introduce a heterogeneous teacher that helps the student learn rich feature representations, so that the discriminative feature information contained in the network itself is captured. SSED encourages the student to learn more diverse and complete representations for both the original class-recognition task and the self-supervised learning task. Extensive experiments show that SSED effectively improves accuracy on both large and small models and surpasses current top-performing SSL methods. In particular, the linear evaluation of our ResNet-18, trained with a ResNet-50 teacher, achieves 65.5% ImageNet top-1 accuracy, which is 1.4% and 4.9% higher than OSS and DisCo, respectively. Code is available at https://github.com/nanxiaotong/SSED.
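For context, the abstract describes distilling a larger self-supervised teacher (ResNet-50) into a lightweight student (ResNet-18) while the student also optimizes its own self-supervised objective. The sketch below is not the authors' released implementation; it is a minimal illustration of that general setup, where the projector sizes, the InfoNCE temperature, and the cosine-distance distillation term are all illustrative assumptions.

# Minimal sketch (not the released SSED code) of combining a frozen
# self-supervised teacher with a lightweight student via feature distillation
# plus a contrastive SSL objective. Module names, projector sizes, and the
# temperature are illustrative assumptions.
import torch
import torch.nn.functional as F
from torch import nn
from torchvision.models import resnet18, resnet50


class Projector(nn.Module):
    """Small MLP head mapping backbone features to a shared embedding space."""
    def __init__(self, in_dim, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, out_dim), nn.ReLU(inplace=True),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)


# Frozen teacher backbone (weights would come from an SSL-pretrained checkpoint).
teacher = resnet50(weights=None)
teacher.fc = nn.Identity()
for p in teacher.parameters():
    p.requires_grad = False

student = resnet18(weights=None)
student.fc = nn.Identity()
proj_s = Projector(512)    # ResNet-18 feature dim
proj_t = Projector(2048)   # ResNet-50 feature dim


def info_nce(z1, z2, temperature=0.2):
    """Contrastive loss between two augmented views of the same batch."""
    logits = z1 @ z2.t() / temperature
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)


def distill_step(view1, view2):
    """One training step: SSL loss on the student plus distillation to the teacher."""
    z1, z2 = proj_s(student(view1)), proj_s(student(view2))
    with torch.no_grad():
        zt = proj_t(teacher(view1))
    loss_ssl = info_nce(z1, z2)
    loss_kd = (2 - 2 * (z1 * zt).sum(dim=-1)).mean()  # cosine-distance distillation
    return loss_ssl + loss_kd


if __name__ == "__main__":
    v1, v2 = torch.randn(8, 3, 224, 224), torch.randn(8, 3, 224, 224)
    print(distill_step(v1, v2).item())

In this sketch the total loss is simply the sum of the two terms; how SSED actually weights or combines its objectives, and how the heterogeneous teacher differs from a plain frozen backbone, is specified in the paper and repository rather than here.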
Published in: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 04-10 June 2023
Date Added to IEEE Xplore: 05 May 2023