
Efficient Medical Image Assessment via Self-supervised Learning

  • Conference paper
Data Augmentation, Labelling, and Imperfections (DALI 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13567)


Abstract

High-performance deep learning methods typically rely on large annotated training datasets, which are difficult to obtain in many clinical applications due to the high cost of medical image labeling. Existing data assessment methods commonly require knowing the labels in advance, which is not feasible when the goal is 'knowing which data to label.' To this end, we formulate and propose a novel and efficient data assessment strategy, the EXponentiAl Marginal sINgular valuE (\(\textsf{EXAMINE}\)) score, to rank the quality of unlabeled medical image data based on their useful latent representations extracted via Self-supervised Learning (SSL) networks. Motivated by the theoretical implications of the SSL embedding space, we leverage a Masked Autoencoder [8] for feature extraction. Furthermore, we evaluate data quality based on the marginal change of the largest singular value after excluding a data point from the dataset. We conduct extensive experiments on a pathology dataset. Our results indicate the effectiveness and efficiency of our proposed method for selecting the most valuable data to label.
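To make the idea concrete, the sketch below (illustrative code added here, not taken from the paper) scores each unlabeled point by the drop in the largest singular value of the SSL feature matrix when that point is left out, which is the marginal quantity the \(\textsf{EXAMINE}\) score is built on. The random features, the leave-one-out loop, and the plain difference are stand-in assumptions; in the paper the features come from a Masked Autoencoder, and the exponential weighting of the score is not reproduced here.

```python
# Minimal sketch of the marginal-singular-value idea (illustrative only).
# Assumption: `features` holds one row per unlabeled image, e.g. embeddings
# from a pretrained Masked Autoencoder; random data stands in for them here.
import numpy as np

def marginal_singular_value_scores(features: np.ndarray) -> np.ndarray:
    """Score each point by how much the largest singular value of the
    feature matrix drops when that point is excluded (a larger drop means
    the point contributes more to the dominant direction of the data)."""
    lambda_full = np.linalg.svd(features, compute_uv=False)[0]
    scores = np.empty(features.shape[0])
    for i in range(features.shape[0]):
        reduced = np.delete(features, i, axis=0)
        lambda_reduced = np.linalg.svd(reduced, compute_uv=False)[0]
        scores[i] = lambda_full - lambda_reduced  # marginal change, >= 0
    return scores

rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 64))          # 200 images, 64-d SSL features
ranking = np.argsort(-marginal_singular_value_scores(feats))
print("highest-scoring points to label first:", ranking[:10])
```

The naive loop recomputes one SVD per exclusion purely to show which quantity is being ranked; it is not meant as an efficient implementation.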


Notes

  1. In practice, there are approximation methods for calculating the Shapley value, but they still require roughly \(\mathcal {O}(T\textrm{poly}(N))\) computation [11].

  2. \(\lambda _S > \lambda _{S\backslash \{i\}}\) is guaranteed by the properties of singular values: excluding a data point cannot increase the largest singular value of the feature matrix (a small numerical check follows these notes).

  3. The running time of LOO and Data Shapley can increase significantly if a deep neural network is used as the utility model.
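As a quick numerical check of note 2 (an illustration added here, not taken from the paper), the snippet below verifies on random data that deleting any single row of a feature matrix does not increase its largest singular value:

```python
# Numerical check for note 2: removing a row of X cannot increase the
# largest singular value, so lambda_S >= lambda_{S \ {i}} for every i.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 8))
lam_full = np.linalg.svd(X, compute_uv=False)[0]
for i in range(X.shape[0]):
    lam_without_i = np.linalg.svd(np.delete(X, i, axis=0), compute_uv=False)[0]
    assert lam_full >= lam_without_i
print("largest singular value never increased after removing a row")
```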

References

  1. Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning, pp. 1597–1607. PMLR (2020)

  2. Chen, Y., Wei, C., Kumar, A., Ma, T.: Self-training avoids using spurious features under domain shift. Adv. Neural Inf. Process. Syst. 33, 21061–21071 (2020)

  3. Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)

  4. Fadahunsi, K.P., et al.: Protocol for a systematic review and qualitative synthesis of information quality frameworks in eHealth. BMJ Open 9(3), e024722 (2019)

  5. Fadahunsi, K.P., et al.: Information quality frameworks for digital health technologies: systematic review. J. Med. Internet Res. 23(5), e23479 (2021)

  6. Ghorbani, A., Zou, J.: Data Shapley: equitable valuation of data for machine learning. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 97, pp. 2242–2251. PMLR (2019)

  7. Grill, J.B., et al.: Bootstrap your own latent: a new approach to self-supervised learning. Adv. Neural Inf. Process. Syst. 33, 21271–21284 (2020)

  8. He, K., Chen, X., Xie, S., Li, Y., Dollár, P., Girshick, R.: Masked autoencoders are scalable vision learners. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16000–16009 (2022)

  9. Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)

  10. Jia, R., et al.: Towards efficient data valuation based on the Shapley value. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 1167–1176. PMLR (2019)

  11. Jia, R., Sun, X., Xu, J., Zhang, C., Li, B., Song, D.: An empirical and comparative analysis of data valuation with scalable algorithms (2019)

  12. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT, pp. 4171–4186 (2019)

  13. Lee, J.D., Lei, Q., Saunshi, N., Zhuo, J.: Predicting what you already know helps: provable self-supervised learning. Adv. Neural Inf. Process. Syst. 34 (2021)

  14. Loshchilov, I., Hutter, F.: SGDR: stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983 (2016)

  15. Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017)

  16. Redman, T.C.: Data Driven: Profiting from Your Most Important Business Asset. Harvard Business Press (2008)

  17. Tosh, C., Krishnamurthy, A., Hsu, D.: Contrastive learning, multi-view redundancy, and linear models. In: Algorithmic Learning Theory, pp. 1179–1206. PMLR (2021)

  18. Veeling, B.S., Linmans, J., Winkens, J., Cohen, T., Welling, M.: Rotation equivariant CNNs for digital pathology. In: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11071, pp. 210–218. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00934-2_24

  19. Zhang, R., Isola, P., Efros, A.A.: Colorful image colorization. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9907, pp. 649–666. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46487-9_40


Acknowledgement

This work is supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) and an NVIDIA Hardware Award.

Author information


Correspondence to Chun-Yin Huang or Xiaoxiao Li.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Huang, CY., Lei, Q., Li, X. (2022). Efficient Medical Image Assessment via Self-supervised Learning. In: Nguyen, H.V., Huang, S.X., Xue, Y. (eds) Data Augmentation, Labelling, and Imperfections. DALI 2022. Lecture Notes in Computer Science, vol 13567. Springer, Cham. https://doi.org/10.1007/978-3-031-17027-0_11


  • DOI: https://doi.org/10.1007/978-3-031-17027-0_11


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-17026-3

  • Online ISBN: 978-3-031-17027-0

  • eBook Packages: Computer Science, Computer Science (R0)
