Research Article
DOI: 10.1145/3474085.3475329

Data-Free Ensemble Knowledge Distillation for Privacy-conscious Multimedia Model Compression

Published: 17 October 2021

Abstract

Recent advances in deep learning bring impressive performance to multimedia applications. Compressing these models and deploying them on resource-limited edge devices therefore becomes attractive. Knowledge distillation (KD) is one of the most popular model compression techniques. However, most well-performing KD approaches require the original training dataset, which is often unavailable due to privacy concerns, while existing data-free KD methods perform much worse than their data-dependent counterparts. In this paper, we analyze previous data-free KD methods from the data perspective and point out that relying on a single pre-trained model limits their performance. We then propose a Data-Free Ensemble knowledge Distillation (DFED) framework, which consists of a student network, a generator network, and multiple pre-trained teacher networks. During training, the student mimics the behavior of the teacher ensemble on samples synthesized by the generator, which is trained to enlarge the prediction discrepancy between the student and the teachers. A moment-matching loss term further assists generator training by minimizing the distance between the activations induced by synthesized samples and those of real samples. We evaluate DFED on three popular image classification datasets, and the results demonstrate significant improvements over previous works. An ablation study verifies the effectiveness of each component of the proposed framework.
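
The abstract outlines an alternating, adversarial training scheme: a generator synthesizes samples on which the student and the teacher ensemble disagree, the student then closes that gap, and a moment-matching term keeps the synthesized samples statistically plausible. The PyTorch-style sketch below is only an illustration of that scheme under stated assumptions, not the authors' implementation: the ensemble prediction is taken as the plain average of teacher logits, the discrepancy is measured with KL divergence, and the moment-matching term compares the batch statistics entering each teacher BatchNorm layer with that layer's running statistics (a common data-free stand-in for real-sample activations). All names (`Generator`, `attach_bn_hooks`, `train_step`), architectures, and loss weights are hypothetical.

```python
# A minimal sketch of generator/student alternation for data-free ensemble
# distillation. Assumptions are noted in the lead-in; everything below is
# illustrative rather than the paper's released code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Generator(nn.Module):
    """Maps noise vectors to image-shaped synthetic samples."""

    def __init__(self, nz=100, ch=3, size=32):
        super().__init__()
        self.size = size
        self.fc = nn.Linear(nz, 64 * (size // 2) ** 2)
        self.body = nn.Sequential(
            nn.BatchNorm2d(64),
            nn.LeakyReLU(0.2),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(64, ch, 3, padding=1),
            nn.Tanh(),
        )

    def forward(self, z):
        h = self.fc(z).view(z.size(0), 64, self.size // 2, self.size // 2)
        return self.body(h)


def attach_bn_hooks(teacher, store):
    """Record (batch mean, batch var, running mean, running var) for every
    BatchNorm2d layer of a frozen teacher during its forward pass."""
    def make_hook(bn):
        def hook(module, inputs, output):
            x = inputs[0]
            store.append((x.mean(dim=[0, 2, 3]),
                          x.var(dim=[0, 2, 3], unbiased=False),
                          bn.running_mean, bn.running_var))
        return hook

    for m in teacher.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.register_forward_hook(make_hook(m))


def train_step(generator, student, teachers, bn_stats, opt_g, opt_s,
               nz=100, batch_size=64, alpha=1.0, device="cpu"):
    """One alternating update: the generator enlarges the student/ensemble
    discrepancy (plus moment matching); the student then minimizes it."""
    # Generator step: synthesize samples on which the ensemble and student disagree.
    bn_stats.clear()
    x = generator(torch.randn(batch_size, nz, device=device))
    t_logits = torch.stack([t(x) for t in teachers]).mean(dim=0)  # ensemble average
    s_logits = student(x)
    discrepancy = F.kl_div(F.log_softmax(s_logits, dim=1),
                           F.softmax(t_logits, dim=1), reduction="batchmean")
    moment = sum(F.mse_loss(mu, rm) + F.mse_loss(var, rv)
                 for mu, var, rm, rv in bn_stats)
    g_loss = -discrepancy + alpha * moment  # maximize discrepancy, match moments
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

    # Student step: mimic the ensemble on freshly synthesized samples.
    # (Any gradients that leaked into the student above are cleared here.)
    with torch.no_grad():
        x = generator(torch.randn(batch_size, nz, device=device))
        t_logits = torch.stack([t(x) for t in teachers]).mean(dim=0)
    s_loss = F.kl_div(F.log_softmax(student(x), dim=1),
                      F.softmax(t_logits, dim=1), reduction="batchmean")
    opt_s.zero_grad()
    s_loss.backward()
    opt_s.step()
    return g_loss.item(), s_loss.item()
```

In use, each pre-trained teacher would be switched to eval() mode with its parameters frozen, attach_bn_hooks would be called once per teacher with a shared bn_stats list, and opt_g / opt_s would optimize only the generator and the student, respectively.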

      Published In

      MM '21: Proceedings of the 29th ACM International Conference on Multimedia
      October 2021
      5796 pages
      ISBN: 9781450386517
      DOI: 10.1145/3474085

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. data-free
      2. ensemble
      3. knowledge distillation
      4. model compression

      Conference

      MM '21: ACM Multimedia Conference
      October 20 - 24, 2021
      Virtual Event, China

      Acceptance Rates

      Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

      Cited By

      • (2024) Sampling to Distill: Knowledge Transfer from Open-World Data. Proceedings of the 32nd ACM International Conference on Multimedia, 2438-2447. https://doi.org/10.1145/3664647.3680618
      • (2024) Evaluations of Machine Learning Privacy Defenses are Misleading. Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security, 1271-1284. https://doi.org/10.1145/3658644.3690194
      • (2024) DTCNet: Transformer-CNN Distillation for Super-Resolution of Remote Sensing Image. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 17, 11117-11133. https://doi.org/10.1109/JSTARS.2024.3409808
      • (2024) A New Data-Free Knowledge Distillation Method for Non-IID Federated Learning. 2024 6th International Conference on Communications, Information System and Computer Engineering (CISCE), 678-681. https://doi.org/10.1109/CISCE62493.2024.10652604
      • (2024) Precise and automated lung cancer cell classification using deep neural network with multiscale features and model distillation. Scientific Reports, 14(1). https://doi.org/10.1038/s41598-024-61101-7
      • (2024) Investigation on lightweight identification method for pavement cracks. Construction and Building Materials, 447, 138017. https://doi.org/10.1016/j.conbuildmat.2024.138017
      • (2023) Simultaneously Training and Compressing Vision-and-Language Pre-Training Model. IEEE Transactions on Multimedia, 25, 8194-8203. https://doi.org/10.1109/TMM.2022.3233258
      • (2023) Multi-Target Cross-Dataset Palmprint Recognition via Distilling From Multi-Teacher. IEEE Transactions on Instrumentation and Measurement, 72, 1-14. https://doi.org/10.1109/TIM.2023.3284057
      • (2023) Locally private estimation of conditional probability distribution for random forest in multimedia applications. Information Sciences, 642(C). https://doi.org/10.1016/j.ins.2023.119111
      • (2022) CDFKD-MFS: Collaborative Data-Free Knowledge Distillation via Multi-Level Feature Sharing. IEEE Transactions on Multimedia, 24, 4262-4274. https://doi.org/10.1109/TMM.2022.3192663
