research-article

Tolerating Data Missing in Breast Cancer Diagnosis from Clinical Ultrasound Reports via Knowledge Graph Inference

Published: 14 August 2021

Abstract

Medical diagnosis through artificial intelligence has been drawing increasing attention. For breast lesions, clinical ultrasound reports are the most commonly used data in the diagnosis of breast cancer. However, these reports inevitably suffer from missing data. Although previous approaches have made progress on tackling data imprecision, nearly all of them cannot accept inputs with missing data. A common way to alleviate the issue is to fill the missing values with artificial data, but this filling strategy introduces additional noise that does not exist in the raw data. Inspired by the open world assumption, we regard the missing data in clinical ultrasound reports as non-observed facts, and propose KGSeD, a knowledge graph embedding based model that tolerates missing data and thereby circumvents the pollution caused by data filling. KGSeD follows an encoder-decoder framework: the encoder incorporates the structural information of the graph via embedding, and the decoder diagnoses patients by inferring their links to clinical outcomes. Comparative experiments show that KGSeD achieves noticeable diagnostic performance. When data are missing, KGSeD yields the most stable performance among the compared approaches, showing better tolerance to missing data.
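
The idea in the abstract can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' KGSeD implementation: the report fields, patient names, and outcome entities are hypothetical, the embeddings are random rather than trained, and a TransE-style translation score stands in for whatever decoder the paper actually uses. It shows only the core point: a missing field produces no triple at all (no imputed value), and diagnosis is framed as scoring candidate links from a patient to a clinical outcome.

```python
import random

# Hypothetical ultrasound reports; None marks a field the report omits.
REPORTS = {
    "patient_1": {"shape": "irregular", "margin": "spiculated", "echo": None},
    "patient_2": {"shape": "oval", "margin": None, "echo": "hypoechoic"},
}
OUTCOMES = ["outcome:benign", "outcome:malignant"]  # hypothetical labels


def reports_to_triples(reports):
    """Emit (head, relation, tail) facts for observed fields only."""
    triples = []
    for patient, fields in reports.items():
        for relation, value in fields.items():
            if value is None:
                continue  # open world: unobserved, so no fact and no filler
            triples.append((patient, relation, f"{relation}:{value}"))
    return triples


def init_embeddings(names, dim, rng):
    """Random vectors standing in for trained encoder output (assumption)."""
    return {name: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for name in names}


def transe_score(head, relation, tail):
    """Negative L1 distance of head + relation from tail (higher is more plausible)."""
    return -sum(abs(h + r - t) for h, r, t in zip(head, relation, tail))


def predict_outcome(patient, entities, relations, outcomes):
    """Decoder step: pick the outcome whose link to the patient scores highest."""
    return max(
        outcomes,
        key=lambda o: transe_score(entities[patient], relations["diagnosis"], entities[o]),
    )


triples = reports_to_triples(REPORTS)
rng = random.Random(0)
entity_names = sorted({e for h, _, t in triples for e in (h, t)} | set(OUTCOMES))
relation_names = sorted({r for _, r, _ in triples} | {"diagnosis"})
entities = init_embeddings(entity_names, 8, rng)
relations = init_embeddings(relation_names, 8, rng)

print(len(triples), "observed facts")
print(predict_outcome("patient_1", entities, relations, OUTCOMES))
```

With trained embeddings, the same scoring call would rank outcomes for a patient regardless of how many report fields were observed. In this sketch the printed prediction carries no meaning because the vectors are random.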

      Published In

      KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
      August 2021
      4259 pages
      ISBN:9781450383325
      DOI:10.1145/3447548
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. computer-aided diagnosis
      2. knowledge graph
      3. medical ultrasound data
      4. open world assumption

      Qualifiers

      • Research-article

      Acceptance Rates

      Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
