Channel-Wise Attention and Channel Combination for Knowledge Distillation
Abstract
Recommendations
Hierarchical Multi-Attention Transfer for Knowledge Distillation
Knowledge distillation (KD) is a powerful and widely applicable technique for compressing deep learning models. The main idea is to transfer knowledge from a large teacher model to a small student model (see the loss sketch after this list), where the ...
Knowledge Distillation with Classmate
Advanced Intelligent Computing Technology and Applications. Abstract: Knowledge distillation, a type of model compression algorithm, has been widely adopted due to its ease of implementation and effectiveness. However, transferring knowledge from a teacher network to a student network encounters a bottleneck. ...
Knowledge Fusion Distillation: Improving Distillation with Multi-scale Attention Mechanisms
Abstract: The success of deep learning has brought breakthroughs in many fields. However, the increased performance of deep learning models is often accompanied by an increase in their depth and width, which conflicts with the storage, energy consumption, ...
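All of the recommended articles above, like the present one, build on the standard teacher-student formulation of knowledge distillation. For orientation only, the following is a minimal sketch of the classic logit-based KD loss (soft targets from a temperature-softened teacher), not the channel-wise attention or channel-combination method of this article; the function name kd_loss and the hyperparameters T and alpha are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Sketch of the classic logit-based knowledge distillation loss.

    Blends hard-label cross-entropy with a KL term that pulls the
    student's temperature-softened class distribution toward the
    teacher's. T and alpha are illustrative hyperparameters.
    """
    # Hard-label term: ordinary cross-entropy against ground truth.
    ce = F.cross_entropy(student_logits, labels)
    # Soft-label term: KL divergence between softened distributions,
    # scaled by T^2 so its gradient magnitude stays comparable.
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    kl = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)
    return alpha * ce + (1.0 - alpha) * kl

# Example usage with random tensors standing in for model outputs.
if __name__ == "__main__":
    student_logits = torch.randn(8, 10)    # batch of 8, 10 classes
    teacher_logits = torch.randn(8, 10)
    labels = torch.randint(0, 10, (8,))
    print(kd_loss(student_logits, teacher_logits, labels).item())
```

Channel-wise attention approaches typically augment or replace this logit term with losses over weighted or combined feature-map channels, which the sketch above does not attempt.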
Publisher
Association for Computing Machinery
New York, NY, United States