Abstract
An early diagnosis of cancer is essential for a good prognosis, and the identification of differentially expressed genes can enable better personalization of the treatment plan by targeting those genes in therapy. This work proposes a pipeline that predicts the presence of lung cancer and its subtype, allowing the identification of differentially expressed genes for the lung adenocarcinoma and squamous cell carcinoma subtypes. A gradient boosted tree model is used for the classification tasks based on RNA-seq data. The main focus of the present work is the analysis of the gene expressions that best differentiate cancerous from normal tissue and of the features that distinguish between lung cancer subtypes. Differentially expressed genes are analyzed by hierarchical clustering in order to identify gene signatures that are commonly regulated and biological signatures associated with a specific subtype. This analysis highlighted patterns of commonly regulated genes already known in the literature as cancer- or subtype-specific genes, as well as others that are not yet documented.
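The two stages named in the abstract, classification with a gradient boosted tree model followed by hierarchical clustering of the most relevant genes, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic rather than RNA-seq counts, scikit-learn's `GradientBoostingClassifier` stands in for whichever boosted-tree implementation the authors used, feature importance is used as a simple proxy for gene relevance, and the function names, the correlation distance, and the average linkage are hypothetical choices.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split


def make_synthetic_expression(n_samples=200, n_genes=50, n_informative=6, seed=0):
    """Build a toy expression matrix (samples x genes) with binary labels.

    Hypothetical stand-in for normalized RNA-seq data: the first half of the
    informative genes is shifted up in class 1, the second half shifted down.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_genes))
    y = rng.integers(0, 2, size=n_samples)
    half = n_informative // 2
    X[y == 1, :half] += 2.0
    X[y == 1, half:n_informative] -= 2.0
    return X, y


def rank_and_cluster(X, y, top_k=6, n_clusters=2, seed=0):
    """Fit a boosted-tree classifier, keep the top_k genes, and cluster them."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    model = GradientBoostingClassifier(random_state=seed)
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)

    # Rank genes by impurity-based importance, most important first.
    top_genes = np.argsort(model.feature_importances_)[::-1][:top_k]

    # Cluster the selected genes by the similarity of their expression
    # profiles across samples (rows = genes, columns = samples).
    profiles = X[:, top_genes].T
    tree = linkage(profiles, method="average", metric="correlation")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return accuracy, top_genes, labels


if __name__ == "__main__":
    X, y = make_synthetic_expression()
    accuracy, top_genes, labels = rank_and_cluster(X, y)
    print("held-out accuracy:", accuracy)
    print("top genes (column indices):", top_genes)
    print("cluster label per top gene:", labels)
```

On real data, the columns would be named genes and the cluster labels would group genes with similar regulation patterns; the same two functions could be applied once for cancer versus normal tissue and once for the subtype task.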
Acknowledgment
This work was partially funded by the Project TAMI - Transparent Artificial Medical Intelligence (NORTE-01-0247-FEDER-045905), financed by the European Regional Development Fund (ERDF) through the North Portugal Regional Operational Program - NORTE 2020, and by the Portuguese Foundation for Science and Technology - FCT under the CMU - Portugal International Partnership.
This work is also financed by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia, under PhD grant number 2021.05767.BD.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Ramos, B., Pereira, T., Silva, F., Costa, J.L., Oliveira, H.P. (2022). Differential Gene Expression Analysis of the Most Relevant Genes for Lung Cancer Prediction and Sub-type Classification. In: Pinho, A.J., Georgieva, P., Teixeira, L.F., Sánchez, J.A. (eds) Pattern Recognition and Image Analysis. IbPRIA 2022. Lecture Notes in Computer Science, vol 13256. Springer, Cham. https://doi.org/10.1007/978-3-031-04881-4_15
DOI: https://doi.org/10.1007/978-3-031-04881-4_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-04880-7
Online ISBN: 978-3-031-04881-4
eBook Packages: Computer Science, Computer Science (R0)