Abstract
Recent research demonstrates that classifying cancer subtypes from gene expression data offers several advantages over traditional classification. However, because such data typically contain thousands of features, classification is infeasible for humans without efficient and accurate algorithms. This paper reports an empirical study of how to find an efficient and accurate machine learning method for classifying human cancer subtypes from the gene expression data of cancer cells. Several well-developed machine learning algorithms address this kind of problem, including the Naive Bayes classifier, Support Vector Machines (SVM), Random Forests, and Neural Networks. Here we build two prediction models, one using SVM and one using Random Forest, each combined with a feature selection approach (FSA), to predict the subtype of lung cancer cell lines. The two models achieve similar accuracy, both above 90%. However, SVM runs much faster than Random Forest.
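The abstract does not say how the FSA works, so the following is only a minimal sketch of the general "select genes, then classify" pipeline under stated assumptions. It assumes a filter-style selection that ranks genes by a one-way ANOVA F-statistic on the training split and keeps the top k. It then trains a linear SVM with the Pegasos sub-gradient method. The synthetic data, k = 10, the regularization value and all function names (f_scores, top_k, project, train_linear_svm, make_data) are hypothetical. This is not the authors' FSA or their SVM configuration, and the Random Forest model is omitted for brevity.

```python
import math
import random
import statistics


def f_scores(X, y):
    """One-way ANOVA F-statistic of each feature (gene) across the class labels y."""
    classes = sorted(set(y))
    n, k = len(y), len(classes)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        grand_mean = statistics.fmean(col)
        ss_between = ss_within = 0.0
        for c in classes:
            vals = [v for v, label in zip(col, y) if label == c]
            m = statistics.fmean(vals)
            ss_between += len(vals) * (m - grand_mean) ** 2
            ss_within += sum((v - m) ** 2 for v in vals)
        ms_within = ss_within / (n - k)
        # F = (SS_between / (k - 1)) / (SS_within / (n - k)); infinite if classes have zero spread
        scores.append(ss_between / (k - 1) / ms_within if ms_within > 0 else math.inf)
    return scores


def top_k(scores, k):
    """Indices of the k highest-scoring features."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def project(X, cols):
    """Keep only the selected columns and append a constant 1.0 so the SVM can learn a bias."""
    return [[row[j] for j in cols] + [1.0] for row in X]


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss (labels +1/-1)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step from the L2 regularizer
            if margin < 1.0:  # hinge loss is active: move toward the correct side
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, X):
    return [1 if sum(wj * xj for wj, xj in zip(w, row)) >= 0 else -1 for row in X]


def make_data(n_samples=120, n_genes=500, n_informative=10, seed=1):
    """Hypothetical expression matrix: only the first n_informative genes differ by subtype."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = 1 if i % 2 == 0 else -1
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for j in range(n_informative):
            row[j] += label * 1.0
        X.append(row)
        y.append(label)
    return X, y


X, y = make_data()
X_train, y_train, X_test, y_test = X[:80], y[:80], X[80:], y[80:]
# Select genes on the training split only, so no test information leaks into the model.
genes = top_k(f_scores(X_train, y_train), k=10)
w = train_linear_svm(project(X_train, genes), y_train)
preds = predict(w, project(X_test, genes))
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
print("selected genes:", sorted(genes))
print(f"test accuracy: {accuracy:.2f}")
```

The split-then-select order is the design choice that matters most here: choosing genes on the full data set before splitting tends to inflate the reported accuracy. With scikit-learn, the same idea would typically be expressed as a Pipeline of SelectKBest(f_classif) followed by SVC or RandomForestClassifier.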
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, Y., Wang, XD., Qiu, M., Zhao, H. (2019). Machine Learning for Cancer Subtype Prediction with FSA Method. In: Qiu, M. (ed.) Smart Computing and Communication. SmartCom 2019. Lecture Notes in Computer Science, vol. 11910. Springer, Cham. https://doi.org/10.1007/978-3-030-34139-8_39
DOI: https://doi.org/10.1007/978-3-030-34139-8_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34138-1
Online ISBN: 978-3-030-34139-8
eBook Packages: Computer Science, Computer Science (R0)