DOI: 10.1145/3442381.3449944
Research Article

AID: Active Distillation Machine to Leverage Pre-Trained Black-Box Models in Private Data Settings

Published: 03 June 2021

Abstract

This paper presents an active distillation method that enables a local institution (e.g., a hospital) to find the best queries within a given budget for distilling an on-server black-box model's predictive knowledge into a local surrogate with transparent parameterization. This allows local institutions to better understand the predictive reasoning of the black-box model in their own local data contexts, or to further customize the distilled knowledge with private datasets that cannot be centralized and fed into the server model. The proposed method thus addresses several challenges of deploying machine learning (ML) in industrial settings (e.g., healthcare analytics) with strong proprietary constraints, including: (1) the opaqueness of the server model's architecture, which prevents local users from understanding its predictive reasoning in their local data contexts; (2) the increasing cost and risk of uploading local data to the cloud for analysis; and (3) the need to customize the server model with private on-site data. We evaluated the proposed method on both benchmark and real-world healthcare data and observed significant improvements over existing local distillation methods. A theoretical analysis of the proposed method is also presented.
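The budgeted query-and-distill loop described above can be illustrated with a minimal sketch. The code below is not the paper's algorithm: the uncertainty-based query selection rule, the logistic-regression surrogate, and the black_box_predict interface (one prediction call per queried example, counted against the budget) are all illustrative assumptions, written with numpy and scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def distill_with_budget(black_box_predict, pool_X, budget,
                        seed_size=20, rounds=5, seed=0):
    """Fit a transparent surrogate on labels queried from a black-box model.

    Illustrative sketch only: the query-selection criterion here (surrogate
    uncertainty) stands in for whatever criterion the on-server method uses.
    """
    rng = np.random.default_rng(seed)

    # Spend part of the budget on an initial random batch of queries.
    first = rng.choice(len(pool_X), size=seed_size, replace=False)
    queried = set(first.tolist())
    X_q = pool_X[first]
    y_q = black_box_predict(X_q)          # one black-box call per queried example
    surrogate = LogisticRegression(max_iter=1000).fit(X_q, y_q)

    per_round = (budget - seed_size) // rounds
    for _ in range(rounds):
        remaining = np.array([i for i in range(len(pool_X)) if i not in queried])
        if per_round <= 0 or len(remaining) == 0:
            break
        # Illustrative selection rule: query where the surrogate is least confident.
        confidence = surrogate.predict_proba(pool_X[remaining]).max(axis=1)
        pick = remaining[np.argsort(confidence)[:per_round]]
        queried.update(pick.tolist())
        X_q = np.vstack([X_q, pool_X[pick]])
        y_q = np.concatenate([y_q, black_box_predict(pool_X[pick])])
        surrogate = LogisticRegression(max_iter=1000).fit(X_q, y_q)
    return surrogate
```

Here black_box_predict is a hypothetical wrapper around whatever prediction API the server model exposes; the surrogate's parameters can then be inspected or fine-tuned locally on private data, which is the use case the abstract describes.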

Cited By

  • (2023) Collaborative causal inference with fair incentives. In Proceedings of the 40th International Conference on Machine Learning, 28300-28320. 10.5555/3618408.3619582. Online publication date: 23-Jul-2023.
  • (2023) Active learning for data streams: a survey. Machine Learning 113:1, 185-239. 10.1007/s10994-023-06454-2. Online publication date: 20-Nov-2023.
  • (2022) M3Care: Learning with Missing Modalities in Multimodal Healthcare Data. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2418-2428. 10.1145/3534678.3539388. Online publication date: 14-Aug-2022.
  • (2021) Fault-tolerant federated reinforcement learning with theoretical guarantee. In Proceedings of the 35th International Conference on Neural Information Processing Systems, 1007-1021. 10.5555/3540261.3540339. Online publication date: 6-Dec-2021.

Published In

WWW '21: Proceedings of the Web Conference 2021
April 2021
4054 pages
ISBN: 9781450383127
DOI: 10.1145/3442381

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 June 2021

Author Tags

  1. deep learning
  2. disease risk prediction
  3. model distillation

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WWW '21
WWW '21: The Web Conference 2021
April 19 - 23, 2021
Ljubljana, Slovenia

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Article Metrics

  • Downloads (Last 12 months): 14
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 11 Feb 2025
