Machine Teaching-Based Efficient Labelling for Cross-unit Healthcare Data Modelling

Wang, Yang; Peng, Xueping; Clarke, Allison; Schlegel, Clement; Jiang, Jing

doi:10.1007/978-3-030-97546-3_26

Conference paper
First Online: 19 March 2022

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13151)

Abstract

A data custodian of a large organization (such as a Commonwealth Data Integrating Authority), referred to here as the teacher, can readily build an intelligent model trained on comprehensive data collected from multiple sources. However, because of information security and privacy regulations, full access to the trained model and its training data is usually restricted to the teacher and is not available to any unit (or branch) of that organization. Therefore, if a unit, referred to as the student, needs functionality similar to that of the teacher's model, it has to train a comparable model from scratch on its own dataset. Such a dataset is usually unlabelled, so labelling it imposes a heavy workload. Inspired by Iterative Machine Teaching, we propose a novel collaboration pipeline that enables the teacher to iteratively guide the student in selecting the samples most worth labelling from the student's own dataset. This significantly reduces the need for human labelling while preventing breaches of regulation and information security. The effectiveness and efficiency of the proposed pipeline are empirically demonstrated on two publicly available healthcare datasets in comparison with baseline methods. This work has broad implications for the healthcare sector, facilitating data modelling where large labelled datasets are not accessible to each unit.

Acknowledgements

This research is supported by an Australian Government Research Training Program Scholarship. We also thank the Australian Government Department of Health for supporting this work.

Author information

Authors and Affiliations

Australian AI Institute, University of Technology Sydney, Ultimo, Australia
Yang Wang, Xueping Peng & Jing Jiang
Health Economics and Research Division, Australian Department of Health, Canberra, Australia
Yang Wang, Allison Clarke & Clement Schlegel

Corresponding author

Correspondence to Xueping Peng.

Editor information

Editors and Affiliations

University of Technology Sydney, Sydney, NSW, Australia
Guodong Long
RMIT University, Melbourne, VIC, Australia
Xinghuo Yu
University of Queensland, Brisbane, QLD, Australia
Sen Wang

About this paper

Cite this paper

Wang, Y., Peng, X., Clarke, A., Schlegel, C., Jiang, J. (2022). Machine Teaching-Based Efficient Labelling for Cross-unit Healthcare Data Modelling. In: Long, G., Yu, X., Wang, S. (eds) AI 2021: Advances in Artificial Intelligence. AI 2022. Lecture Notes in Computer Science, vol 13151. Springer, Cham. https://doi.org/10.1007/978-3-030-97546-3_26

DOI: https://doi.org/10.1007/978-3-030-97546-3_26
Published: 19 March 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-97545-6
Online ISBN: 978-3-030-97546-3
eBook Packages: Computer Science, Computer Science (R0)