Abstract
A data custodian of a large organization (such as a Commonwealth Data Integrating Authority), referred to here as the teacher, can readily build an intelligent model that is well trained on comprehensive data collected from multiple sources. However, because of information security and privacy regulations, full access to the trained model and its training data is usually restricted to the teacher and is not available to any unit (or branch) of that organization. Therefore, if a unit, referred to here as the student, needs a capability similar to that of the teacher’s model, the student has to train a similar model from scratch on its own dataset. Such a dataset is usually unlabelled, so labelling it requires substantial manual effort. Inspired by Iterative Machine Teaching, we propose a novel collaboration pipeline that enables the teacher to iteratively guide the student in selecting the samples most worth labelling from the student’s own dataset. This significantly reduces the human labelling required while avoiding breaches of regulation and information security. The effectiveness and efficiency of the proposed pipeline are empirically demonstrated on two publicly available healthcare datasets in comparison with baseline methods. This work has broad implications for the healthcare sector, facilitating data modelling where large labelled datasets are not accessible to each unit.
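The teacher-guided selection loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not state the scoring rule or what information the two parties exchange. The sketch assumes a two-feature logistic model, a scoring rule borrowed from the omniscient-teacher setting of iterative machine teaching, synthetic data, and a simulated human labeller. All names (`W_TEACHER`, `teacher_pick`, `run`) are hypothetical.

```python
import math
import random

# Assumption: the teacher's private weights; they are never sent to the student.
W_TEACHER = [2.0, -1.0]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad(w, x, y):
    # Gradient of the logistic loss for one sample with label y in {0, 1}.
    p = sigmoid(dot(w, x))
    return [(p - y) * xi for xi in x]

def teacher_pick(w_student, pool, lr):
    # Teacher side (hypothetical): score every unlabelled candidate by the
    # change in squared distance to the teacher's weights that one student
    # gradient step would cause, using the teacher's own prediction as a
    # stand-in label. Only the index of the best candidate is returned.
    diff = [ws - wt for ws, wt in zip(w_student, W_TEACHER)]
    best_i, best_score = None, float("inf")
    for i, x in pool.items():
        y_hat = 1 if dot(W_TEACHER, x) > 0 else 0
        g = grad(w_student, x, y_hat)
        score = lr * lr * dot(g, g) - 2.0 * lr * dot(diff, g)
        if score < best_score:
            best_i, best_score = i, score
    return best_i

def run(rounds=30, lr=0.5, seed=0):
    rng = random.Random(seed)
    # Student's unlabelled pool: synthetic 2-D feature vectors.
    pool = {i: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for i in range(200)}
    w = [0.0, 0.0]   # student starts from scratch
    labelled = []    # indices the student had to send for human labelling
    for _ in range(rounds):
        i = teacher_pick(w, pool, lr)
        x = pool.pop(i)
        # Simulated human labeller (assumption: labels agree with the teacher).
        y = 1 if dot(W_TEACHER, x) > 0 else 0
        labelled.append(i)
        g = grad(w, x, y)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w, labelled
```

In this sketch the student labels only 30 of its 200 samples, and the only thing returned by the teacher is a sample index. Whether the student may share its weights or features with the teacher under the regulations mentioned above is an assumption of the sketch, not something the abstract specifies.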
Acknowledgements
This research is supported by an Australian Government Research Training Program Scholarship. We also thank the Australian Government Department of Health for supporting this work.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, Y., Peng, X., Clarke, A., Schlegel, C., Jiang, J. (2022). Machine Teaching-Based Efficient Labelling for Cross-unit Healthcare Data Modelling. In: Long, G., Yu, X., Wang, S. (eds) AI 2021: Advances in Artificial Intelligence. AI 2022. Lecture Notes in Computer Science, vol 13151. Springer, Cham. https://doi.org/10.1007/978-3-030-97546-3_26
DOI: https://doi.org/10.1007/978-3-030-97546-3_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-97545-6
Online ISBN: 978-3-030-97546-3
eBook Packages: Computer Science, Computer Science (R0)