Abstract
Skin tumors are among the most common diseases worldwide, and survival rates could be drastically increased if cancerous lesions were identified early. The intrinsic visual ambiguity of skin tumors in multi-modal imaging data makes precise diagnosis highly challenging, especially at an early stage. To achieve high diagnostic accuracy, all available clinical data (imaging and non-imaging) from multiple sources should be exploited, and the missing-modality problem must be tackled when some modality becomes unavailable. To this end, we first devise a disease-wise pairing that remixes data samples by pairing all accessible patient data falling into the same disease category. We further propose a novel cross-modality fusion module, integrated into our transformer-based multi-modality classification framework, that effectively fuses multi-source data (clinical images, dermoscopic images, and patient-wise clinical metadata) for skin tumor diagnosis. Extensive quantitative experiments show an absolute increase of 6.5% in averaged F1 and 2.8% in accuracy over the prior leading method for classifying five common skin tumors on the Derm7pt dataset of 1011 cases. More importantly, our method achieves an overall 88.5% classification accuracy on a large-scale in-house dataset of 5601 patients across ten skin tumor classes (pigmented and non-pigmented). This experiment further validates the robustness of our method and suggests its potential clinical usability in a more realistic and pragmatic clinical setting.
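The abstract does not spell out the mechanics of the cross-modality fusion module, but fusion of this kind is typically built on cross-attention: tokens from one modality (e.g. clinical-image features) attend over tokens from another (e.g. dermoscopic-image or metadata features). The sketch below is purely illustrative and not the paper's implementation; all function names and the toy feature vectors are hypothetical, and the code uses plain Python in place of a deep-learning framework.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_modal_attention(query_tokens, context_tokens):
    """Each query token (one modality) attends over context tokens
    (another modality); returns the attention-weighted fused tokens."""
    d = len(context_tokens[0])
    fused = []
    for q in query_tokens:
        # scaled dot-product attention weights over the other modality
        weights = softmax([dot(q, k) / math.sqrt(d) for k in context_tokens])
        fused.append([sum(w * k[i] for w, k in zip(weights, context_tokens))
                      for i in range(d)])
    return fused

# Toy example: 2 clinical-image tokens attend over 3 dermoscopic tokens.
clinical = [[1.0, 0.0], [0.0, 1.0]]
dermoscopic = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fused = cross_modal_attention(clinical, dermoscopic)
```

Because each fused token is a convex combination of the context tokens, missing-modality handling can amount to simply dropping the absent modality's tokens from `context_tokens`; how the actual framework handles this is described in the paper body, not here.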
Acknowledgement
This work was supported by the National Key R&D Program of China (2020YFC2008703) and the Project of Intelligent Management Software for Multimodal Medical Big Data for New Generation Information Technology, Ministry of Industry and Information Technology of the People's Republic of China (TC210804V).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Xu, J. et al. (2022). RemixFormer: A Transformer Model for Precision Skin Tumor Differential Diagnosis via Multi-modal Imaging and Non-imaging Data. In: Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2022. MICCAI 2022. Lecture Notes in Computer Science, vol 13433. Springer, Cham. https://doi.org/10.1007/978-3-031-16437-8_60
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16436-1
Online ISBN: 978-3-031-16437-8