Cancer classification with data augmentation based on generative adversarial networks

Wei, Kaimin; Li, Tianqi; Huang, Feiran; Chen, Jinpeng; He, Zefan

doi:10.1007/s11704-020-0025-x

Research Article
Published: 09 September 2021

Frontiers of Computer Science, Volume 16, article number 162601 (2022)

Kaimin Wei^1,2,
Tianqi Li^1,2,
Feiran Huang^1,2,
Jinpeng Chen^3 &
…
Zefan He^1,2

Abstract

Accurate diagnosis is a significant step in cancer treatment. Machine learning can support doctors in prognostic decision-making, but its performance is often weakened by the high dimensionality and small quantity of genetic data. Fortunately, deep learning can effectively process high-dimensional data. However, the problem of inadequate data remains unsolved and lowers the performance of deep learning. To address it, we propose a generative adversarial model that uses non-target cancer data to help train the target generator. We use a reconstruction loss to further stabilize model training and improve the quality of generated samples. We also present a cancer classification model to optimize classification performance. Experimental results show that the mean absolute error of the cancer gene data generated by our model is 19.3% lower than that of DC-GAN, and that the classification accuracy obtained with our generated data is higher than that obtained with data created by GAN. As for the classification model, its classification accuracy reaches 92.6%, which is 7.6% higher than that of the model without any generated data.
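
The two quantities named in the abstract, a reconstruction term added to the generator objective and the mean absolute error used for evaluation, can be illustrated with a minimal sketch. This is not the authors' implementation: the L1 form of the reconstruction term, the non-saturating adversarial term, the weighting factor recon_weight, and all function names are assumptions made only for illustration.

```python
import math


def mean_absolute_error(real, generated):
    """Average absolute difference between two equally sized expression vectors."""
    if len(real) != len(generated) or not real:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(r - g) for r, g in zip(real, generated)) / len(real)


def generator_loss(disc_score_on_fake, real, reconstructed, recon_weight=1.0):
    """Adversarial term plus a weighted reconstruction term (assumed L1).

    disc_score_on_fake: discriminator probability in (0, 1] that the
        generated sample is real.
    recon_weight: hypothetical weighting factor, not taken from the paper.
    """
    adversarial = -math.log(disc_score_on_fake)  # non-saturating GAN loss
    reconstruction = mean_absolute_error(real, reconstructed)
    return adversarial + recon_weight * reconstruction


def relative_reduction(baseline_error, our_error):
    """Fraction by which our_error is lower than baseline_error."""
    return (baseline_error - our_error) / baseline_error
```

Under these assumptions, a statement such as "19.3% lower than DC-GAN" corresponds to relative_reduction(baseline_error, our_error) being about 0.193, where both errors are computed with mean_absolute_error on the same held-out samples.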

Acknowledgements

This work was supported by the National Key R&D Program of China (2018YFB1402600, 2017YFB0802203), the National Natural Science Foundation of China (Grant Nos. 61972178, 61702043, 61906075, 61932010), the Key-Area Research and Development Program of Guangdong Province (2019B010137005), the Natural Science Foundation of Guangdong Province (2017A030313334, 2019A1515011753, 2019A1515011920), the Science and Technology Program of Guangzhou of China (201802010061), and the Beijing Natural Science Foundation (4194086).

Author information

Authors and Affiliations

College of Information Science and Technology, Jinan University, Guangzhou, 510632, China
Kaimin Wei, Tianqi Li, Feiran Huang & Zefan He
Guangdong Key Laboratory of Data Security and Privacy Protection, Guangzhou, 510632, China
Kaimin Wei, Tianqi Li, Feiran Huang & Zefan He
School of Software Engineering, Beijing University of Posts and Telecommunications, Beijing, 100876, China
Jinpeng Chen

Corresponding author

Correspondence to Feiran Huang.

Additional information

Kaimin Wei is now an associate professor at the College of Information Science and Technology, Jinan University, China. He received his PhD degree from Beihang University, China. His research interests include mobile networks, crowd sensing, machine learning, and AI security.

Tianqi Li is a graduate student at the College of Information Science and Technology, Jinan University, China. She received her bachelor's degree in computer science and technology from Guizhou University, China. Her research interests include bioinformatics, computer vision, and machine learning.

Feiran Huang received his BSc degree from Central South University, China in 2011. He received his PhD degree in computer software and theory from the School of Computer Science and Engineering, Beihang University, China in 2019. He is currently a lecturer at the College of Information Science and Technology & College of Cyber Security, Jinan University, China. He has published over 10 papers in venues such as TIP, TOMM, TCYB, TITS, ACM MM, CIKM, and ICMR. His research interests include social media analysis and multi-modal learning.

Jinpeng Chen is now an associate professor at the School of Software Engineering, Beijing University of Posts and Telecommunications, China. His research interests include social network analysis, recommender systems, data mining, and machine learning.

Zefan He is an undergraduate majoring in Computer Science and Technology at Jinan University, China. Her research interests include crowdsensing, blockchain, and machine learning.
