Research Article
DOI: 10.1145/3474085.3475329

Data-Free Ensemble Knowledge Distillation for Privacy-conscious Multimedia Model Compression

Published: 17 October 2021

Abstract

Recent advances in deep learning bring impressive performance to multimedia applications. Compressing these models and deploying them on resource-limited edge devices therefore becomes attractive. Knowledge distillation (KD) is one of the most popular model compression techniques. However, most well-performing KD approaches require the original training dataset, which is often unavailable due to privacy concerns, while existing data-free KD methods perform much worse than their data-dependent counterparts. In this paper, we analyze previous data-free KD methods from the data perspective and point out that relying on a single pre-trained model limits their performance. We then propose a Data-Free Ensemble knowledge Distillation (DFED) framework, which consists of a student network, a generator network, and multiple pre-trained teacher networks. During training, the student mimics the behavior of the teacher ensemble on samples synthesized by the generator, which is trained to enlarge the prediction discrepancy between the student and the teachers. A moment-matching loss term further assists generator training by minimizing the distance between the activations induced by synthesized samples and those of real samples. We evaluate DFED on three popular image classification datasets, and the results demonstrate significant improvements over previous works. An ablation study verifies the effectiveness of each component of the proposed framework.
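
The abstract outlines an alternating, adversarial training scheme: a generator synthesizes samples on which the student and the teacher ensemble disagree, the student then closes that gap, and a moment-matching term keeps the synthesized samples statistically plausible. The PyTorch-style sketch below is only an illustration of that scheme under stated assumptions, not the authors' implementation: the ensemble prediction is taken as the plain average of teacher logits, the discrepancy is measured with KL divergence, and the moment-matching term compares the batch statistics entering each teacher BatchNorm layer with that layer's running statistics (a common data-free stand-in for real-sample activations). All names (`Generator`, `attach_bn_hooks`, `train_step`), architectures, and loss weights are hypothetical.

```python
# A minimal sketch of generator/student alternation for data-free ensemble
# distillation. Assumptions are noted in the lead-in; everything below is
# illustrative rather than the paper's released code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Generator(nn.Module):
    """Maps noise vectors to image-shaped synthetic samples."""

    def __init__(self, nz=100, ch=3, size=32):
        super().__init__()
        self.size = size
        self.fc = nn.Linear(nz, 64 * (size // 2) ** 2)
        self.body = nn.Sequential(
            nn.BatchNorm2d(64),
            nn.LeakyReLU(0.2),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(64, ch, 3, padding=1),
            nn.Tanh(),
        )

    def forward(self, z):
        h = self.fc(z).view(z.size(0), 64, self.size // 2, self.size // 2)
        return self.body(h)


def attach_bn_hooks(teacher, store):
    """Record (batch mean, batch var, running mean, running var) for every
    BatchNorm2d layer of a frozen teacher during its forward pass."""
    def make_hook(bn):
        def hook(module, inputs, output):
            x = inputs[0]
            store.append((x.mean(dim=[0, 2, 3]),
                          x.var(dim=[0, 2, 3], unbiased=False),
                          bn.running_mean, bn.running_var))
        return hook

    for m in teacher.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.register_forward_hook(make_hook(m))


def train_step(generator, student, teachers, bn_stats, opt_g, opt_s,
               nz=100, batch_size=64, alpha=1.0, device="cpu"):
    """One alternating update: the generator enlarges the student/ensemble
    discrepancy (plus moment matching); the student then minimizes it."""
    # Generator step: synthesize samples on which the ensemble and student disagree.
    bn_stats.clear()
    x = generator(torch.randn(batch_size, nz, device=device))
    t_logits = torch.stack([t(x) for t in teachers]).mean(dim=0)  # ensemble average
    s_logits = student(x)
    discrepancy = F.kl_div(F.log_softmax(s_logits, dim=1),
                           F.softmax(t_logits, dim=1), reduction="batchmean")
    moment = sum(F.mse_loss(mu, rm) + F.mse_loss(var, rv)
                 for mu, var, rm, rv in bn_stats)
    g_loss = -discrepancy + alpha * moment  # maximize discrepancy, match moments
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

    # Student step: mimic the ensemble on freshly synthesized samples.
    # (Any gradients that leaked into the student above are cleared here.)
    with torch.no_grad():
        x = generator(torch.randn(batch_size, nz, device=device))
        t_logits = torch.stack([t(x) for t in teachers]).mean(dim=0)
    s_loss = F.kl_div(F.log_softmax(student(x), dim=1),
                      F.softmax(t_logits, dim=1), reduction="batchmean")
    opt_s.zero_grad()
    s_loss.backward()
    opt_s.step()
    return g_loss.item(), s_loss.item()
```

In use, each pre-trained teacher would be switched to eval() mode with its parameters frozen, attach_bn_hooks would be called once per teacher with a shared bn_stats list, and opt_g / opt_s would optimize only the generator and the student, respectively.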

      Published In

      MM '21: Proceedings of the 29th ACM International Conference on Multimedia
      October 2021
      5796 pages
      ISBN: 9781450386517
      DOI: 10.1145/3474085

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. data-free
      2. ensemble
      3. knowledge distillation
      4. model compression

      Conference

      MM '21: ACM Multimedia Conference
      October 20 - 24, 2021
      Virtual Event, China

      Acceptance Rates

      Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

      Cited By

      • (2024) Sampling to Distill: Knowledge Transfer from Open-World Data. Proceedings of the 32nd ACM International Conference on Multimedia, 2438-2447. https://doi.org/10.1145/3664647.3680618
      • (2024) Evaluations of Machine Learning Privacy Defenses are Misleading. Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security, 1271-1284. https://doi.org/10.1145/3658644.3690194
      • (2024) DTCNet: Transformer-CNN Distillation for Super-Resolution of Remote Sensing Image. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 17, 11117-11133. https://doi.org/10.1109/JSTARS.2024.3409808
      • (2024) A New Data-Free Knowledge Distillation Method for Non-IID Federated Learning. 2024 6th International Conference on Communications, Information System and Computer Engineering (CISCE), 678-681. https://doi.org/10.1109/CISCE62493.2024.10652604
      • (2024) Precise and automated lung cancer cell classification using deep neural network with multiscale features and model distillation. Scientific Reports, 14(1). https://doi.org/10.1038/s41598-024-61101-7
      • (2024) Investigation on lightweight identification method for pavement cracks. Construction and Building Materials, 447, 138017. https://doi.org/10.1016/j.conbuildmat.2024.138017
      • (2023) Simultaneously Training and Compressing Vision-and-Language Pre-Training Model. IEEE Transactions on Multimedia, 25, 8194-8203. https://doi.org/10.1109/TMM.2022.3233258
      • (2023) Multi-Target Cross-Dataset Palmprint Recognition via Distilling From Multi-Teacher. IEEE Transactions on Instrumentation and Measurement, 72, 1-14. https://doi.org/10.1109/TIM.2023.3284057
      • (2023) Locally private estimation of conditional probability distribution for random forest in multimedia applications. Information Sciences, 642(C). https://doi.org/10.1016/j.ins.2023.119111
      • (2022) CDFKD-MFS: Collaborative Data-Free Knowledge Distillation via Multi-Level Feature Sharing. IEEE Transactions on Multimedia, 24, 4262-4274. https://doi.org/10.1109/TMM.2022.3192663
