DOI: 10.1145/3442381.3449944
Research Article

AID: Active Distillation Machine to Leverage Pre-Trained Black-Box Models in Private Data Settings

Published: 03 June 2021

Abstract

This paper presents an active distillation method that enables a local institution (e.g., a hospital) to find the best queries within a given budget for distilling an on-server black-box model's predictive knowledge into a local surrogate with transparent parameterization. This allows local institutions to better understand the predictive reasoning of the black-box model in their own local data contexts, or to further customize the distilled knowledge with private datasets that cannot be centralized and fed into the server model. The proposed method thus addresses several challenges of deploying machine learning (ML) in industrial settings (e.g., healthcare analytics) with strong proprietary constraints, including: (1) the opaqueness of the server model's architecture, which prevents local users from understanding its predictive reasoning in their local data contexts; (2) the increasing cost and risk of uploading local data to the cloud for analysis; and (3) the need to customize the server model with private on-site data. We evaluated the proposed method on both benchmark and real-world healthcare data and observed significant improvements over existing local distillation methods. A theoretical analysis of the proposed method is also presented.
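The budgeted query-and-distill loop described above can be illustrated with a minimal sketch. The code below is not the paper's algorithm: the uncertainty-based query selection rule, the logistic-regression surrogate, and the black_box_predict interface (one prediction call per queried example, counted against the budget) are all illustrative assumptions, written with numpy and scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def distill_with_budget(black_box_predict, pool_X, budget,
                        seed_size=20, rounds=5, seed=0):
    """Fit a transparent surrogate on labels queried from a black-box model.

    Illustrative sketch only: the query-selection criterion here (surrogate
    uncertainty) stands in for whatever criterion the on-server method uses.
    """
    rng = np.random.default_rng(seed)

    # Spend part of the budget on an initial random batch of queries.
    first = rng.choice(len(pool_X), size=seed_size, replace=False)
    queried = set(first.tolist())
    X_q = pool_X[first]
    y_q = black_box_predict(X_q)          # one black-box call per queried example
    surrogate = LogisticRegression(max_iter=1000).fit(X_q, y_q)

    per_round = (budget - seed_size) // rounds
    for _ in range(rounds):
        remaining = np.array([i for i in range(len(pool_X)) if i not in queried])
        if per_round <= 0 or len(remaining) == 0:
            break
        # Illustrative selection rule: query where the surrogate is least confident.
        confidence = surrogate.predict_proba(pool_X[remaining]).max(axis=1)
        pick = remaining[np.argsort(confidence)[:per_round]]
        queried.update(pick.tolist())
        X_q = np.vstack([X_q, pool_X[pick]])
        y_q = np.concatenate([y_q, black_box_predict(pool_X[pick])])
        surrogate = LogisticRegression(max_iter=1000).fit(X_q, y_q)
    return surrogate
```

Here black_box_predict is a hypothetical wrapper around whatever prediction API the server model exposes; the surrogate's parameters can then be inspected or fine-tuned locally on private data, which is the use case the abstract describes.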

Cited By

  • (2023) Collaborative causal inference with fair incentives. In Proceedings of the 40th International Conference on Machine Learning, 28300-28320. 10.5555/3618408.3619582. Online publication date: 23-Jul-2023.
  • (2023) Active learning for data streams: a survey. Machine Learning 113:1, 185-239. 10.1007/s10994-023-06454-2. Online publication date: 20-Nov-2023.
  • (2022) M3Care: Learning with Missing Modalities in Multimodal Healthcare Data. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2418-2428. 10.1145/3534678.3539388. Online publication date: 14-Aug-2022.
  • (2021) Fault-tolerant federated reinforcement learning with theoretical guarantee. In Proceedings of the 35th International Conference on Neural Information Processing Systems, 1007-1021. 10.5555/3540261.3540339. Online publication date: 6-Dec-2021.

Published In

WWW '21: Proceedings of the Web Conference 2021
April 2021
4054 pages
ISBN: 9781450383127
DOI: 10.1145/3442381

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 June 2021

Author Tags

  1. deep learning
  2. disease risk prediction
  3. model distillation

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WWW '21
WWW '21: The Web Conference 2021
April 19 - 23, 2021
Ljubljana, Slovenia

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Article Metrics

  • Downloads (Last 12 months): 14
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 11 Feb 2025
