Machine Teaching-Based Efficient Labelling for Cross-unit Healthcare Data Modelling

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13151)

Abstract

The data custodian of a large organization (such as a Commonwealth Data Integrating Authority), referred to here as the teacher, can readily build an intelligent model trained on comprehensive data collected from multiple sources. However, because of information security and privacy regulations, full access to the trained model and its training data is usually restricted to the teacher and is not available to any unit (or branch) of the organization. Consequently, a unit, referred to here as the student, that needs a function similar to the teacher's model has to train its own model from scratch on its own dataset. Such a dataset is usually unlabelled, so labelling it requires substantial manual effort. Inspired by Iterative Machine Teaching, we propose a novel collaboration pipeline in which the teacher iteratively guides the student to select, from the student's own dataset, the samples most worth labelling. This significantly reduces the amount of human labelling required while avoiding breaches of regulation and information security. The effectiveness and efficiency of the proposed pipeline are empirically demonstrated on two publicly available healthcare datasets in comparison with baseline methods. This work has broad implications for the healthcare sector, facilitating data modelling where large labelled datasets are not accessible to each unit.
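The teacher-guided selection loop described above can be illustrated with a toy sketch. This is not the pipeline proposed in the paper, whose details are not given in the abstract. It assumes a noise-free linear model, a squared loss, and the greedy "omniscient teacher" rule from Iterative Machine Teaching, in which the teacher picks the sample whose gradient step would bring the student closest to the teacher's weights. It also assumes the teacher may see the student's current weights and unlabelled features and returns only an index, so the teacher's weights never leave the teacher. All function names, parameters, and the synthetic data are hypothetical.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def teacher_pick(w_student, w_teacher, pool, available, lr):
    """Teacher side: return only the index of the unlabelled sample whose
    gradient step would move the student closest to the teacher's weights."""
    best_idx, best_dist = None, float("inf")
    for i in available:
        x = pool[i]
        y_hat = dot(w_teacher, x)  # teacher's prediction, used for scoring only
        err = dot(w_student, x) - y_hat
        w_next = [w - lr * err * xi for w, xi in zip(w_student, x)]
        d = sq_dist(w_next, w_teacher)
        if d < best_dist:
            best_idx, best_dist = i, d
    return best_idx


def run(rounds=10, pool_size=200, dim=3, lr=0.5, seed=0):
    """Simulate the loop; returns (squared distance per round, labels used)."""
    rng = random.Random(seed)
    # Assumption: synthetic teacher weights and student pool with features in [-1, 1].
    w_teacher = [rng.uniform(-1, 1) for _ in range(dim)]
    pool = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pool_size)]
    w_student = [0.0] * dim
    available = list(range(pool_size))
    history = [sq_dist(w_student, w_teacher)]
    labels_used = 0
    for _ in range(rounds):
        i = teacher_pick(w_student, w_teacher, pool, available, lr)
        available.remove(i)
        x = pool[i]
        # Stand-in for the human annotator: here the label agrees with the teacher.
        y = dot(w_teacher, x)
        labels_used += 1
        # Student side: one squared-loss gradient step on the newly labelled sample.
        err = dot(w_student, x) - y
        w_student = [w - lr * err * xi for w, xi in zip(w_student, x)]
        history.append(sq_dist(w_student, w_teacher))
    return history, labels_used


if __name__ == "__main__":
    history, labels_used = run()
    print(f"labels used: {labels_used}")
    print(f"squared distance to teacher: {history[0]:.4f} -> {history[-1]:.6f}")
```

In this toy setting only one sample is labelled per round, so the labelling budget equals the number of rounds rather than the pool size. With features bounded in [-1, 1], three dimensions, and a step size of 0.5, each update cannot increase the distance to the teacher's weights; real healthcare models and noisy labels would not offer that guarantee.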

Acknowledgements

This research is supported by an Australian Government Research Training Program Scholarship. We also thank the Australian Government Department of Health for supporting this work.

Author information

Corresponding author

Correspondence to Xueping Peng.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, Y., Peng, X., Clarke, A., Schlegel, C., Jiang, J. (2022). Machine Teaching-Based Efficient Labelling for Cross-unit Healthcare Data Modelling. In: Long, G., Yu, X., Wang, S. (eds) AI 2021: Advances in Artificial Intelligence. AI 2022. Lecture Notes in Computer Science, vol 13151. Springer, Cham. https://doi.org/10.1007/978-3-030-97546-3_26

  • DOI: https://doi.org/10.1007/978-3-030-97546-3_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-97545-6

  • Online ISBN: 978-3-030-97546-3

  • eBook Packages: Computer Science, Computer Science (R0)
