Impact Statement:
Advanced deep neural networks often rely on complex architectures that severely hinder their deployment on resource-limited devices. Knowledge distillation, one of the most popular model compression techniques, aims to transfer the knowledge of one or more cumbersome teacher models to a compact student model. However, existing KD methods either laboriously select a particular teacher or simply assign equal or fixed weights to multiple teachers, resulting in a tedious teacher selection procedure and poor distillation efficiency. In this article, we therefore propose a novel reinforcement learning-based KD approach to overcome these limitations, and our experimental results demonstrate that it consistently outperforms other state-of-the-art methods on two real-world tasks. The proposed method achieves average improvements of 5.7% and 15.7% in terms of RMSE and score, respectively, on the machine RUL prediction task, and 8.1% in terms of mean localization error on the indoor localization task.
Abstract:
As one of the most popular and effective methods in model compression, knowledge distillation (KD) attempts to transfer knowledge from one or more large-scale networks (i.e., teachers) to a compact network (i.e., the student). In the multiteacher scenario, existing methods typically assign equal or fixed weights to the different teacher models during distillation, which can be inefficient because teachers may perform differently, or even contradictorily, on different training samples. To address this issue, we propose a novel reinforced knowledge distillation method with negatively correlated teachers, which are generated via negative correlation learning. Negative correlation learning encourages the teachers to learn different aspects of the data, so that their ensemble is more comprehensive and better suited to multiteacher KD. Subsequently, a reinforced KD algorithm is proposed to dynamically employ the proper teachers for different training instances via a dueling double deep Q-network (DDQN). ...
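To make the distillation objective concrete, the sketch below shows a per-sample weighted multi-teacher KD loss in PyTorch. It is only a minimal illustration under assumed names (weighted_multi_teacher_kd_loss, temperature T, weight matrix w): the per-sample teacher weights are stand-ins for the selections a policy such as the dueling DDQN would produce, and the softmax/KL form assumes a classification-style distillation loss rather than the authors' exact formulation.

```python
# Minimal sketch: per-sample weighted multi-teacher distillation loss.
# Function and variable names are illustrative assumptions, not the paper's code.
import torch
import torch.nn.functional as F

def weighted_multi_teacher_kd_loss(student_logits, teacher_logits_list, weights, T=4.0):
    """
    student_logits: (B, C) raw outputs of the student network.
    teacher_logits_list: list of K tensors, each (B, C), one per teacher.
    weights: (B, K) per-sample teacher weights, e.g. produced by a selection
             policy (a dueling double DQN in the paper); rows should sum to 1.
    T: softmax temperature used for distillation.
    """
    log_p_student = F.log_softmax(student_logits / T, dim=1)       # (B, C)
    kd = student_logits.new_zeros(student_logits.size(0))          # (B,)
    for k, t_logits in enumerate(teacher_logits_list):
        p_teacher = F.softmax(t_logits / T, dim=1)                 # (B, C)
        # Per-sample KL divergence between teacher k and the student.
        kl = F.kl_div(log_p_student, p_teacher, reduction="none").sum(dim=1)
        kd = kd + weights[:, k] * kl
    return (T * T) * kd.mean()

# Usage example: 3 teachers, batch of 8, 10 classes; uniform weights stand in
# for the per-sample teacher selection made by the reinforcement-learning agent.
B, C, K = 8, 10, 3
student_out = torch.randn(B, C)
teacher_outs = [torch.randn(B, C) for _ in range(K)]
w = torch.full((B, K), 1.0 / K)
loss = weighted_multi_teacher_kd_loss(student_out, teacher_outs, w)
```

In this form, replacing the uniform weights with instance-dependent weights is what lets different teachers dominate the distillation signal on different training samples.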
Published in: IEEE Transactions on Artificial Intelligence ( Volume: 5, Issue: 6, June 2024)