
Performing Cancer Diagnosis via an Isoform Expression Ranking-based LSTM Model

Published: 14 November 2023

Abstract

The known set of genetic factors involved in the development of several types of cancer has expanded considerably, making it easier to devise and implement better therapeutic strategies. The automatic diagnosis of cancer, however, remains a complex task because of the high heterogeneity of tumors and the biological variability between samples. In this work, a long short-term memory (LSTM) network-based model is proposed for diagnosing cancer from transcript-based data. An efficient method that transforms data into gene/isoform expression-based rankings was formulated, allowing important information to be embedded directly in the relative order of the elements of a ranking, which can subsequently ease the classification of samples. The proposed predictive model leverages the power of deep recurrent neural networks and is able to learn existing patterns in the individual rankings of isoforms describing each sample of the population. To evaluate the suitability of the proposal, an extensive experimental study was conducted on 17 transcript-based datasets. The results showed the effectiveness of this novel approach and indicated that gene/isoform expression-based rankings contain valuable information that can lead to more effective cancer diagnosis.
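The ranking idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact transformation, so everything here is an assumption for illustration: the function names, the made-up isoform IDs and expression values, the descending sort order, and the tie-breaking rule are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def expression_to_ranking(expression: Dict[str, float]) -> List[str]:
    """Return isoform IDs ordered from highest to lowest expression.

    Assumption: ties are broken by isoform ID so the output is deterministic.
    """
    return sorted(expression, key=lambda iso: (-expression[iso], iso))


def ranking_to_tokens(ranking: List[str], vocabulary: Dict[str, int]) -> List[int]:
    """Map a ranking of isoform IDs to integer tokens for a sequence model."""
    return [vocabulary[iso] for iso in ranking]


# Hypothetical sample: isoform IDs and expression values are made up.
sample = {"iso_A": 2.5, "iso_B": 10.0, "iso_C": 0.0, "iso_D": 2.5}

# Fixed vocabulary shared by all samples (here built from the sorted IDs).
vocabulary = {iso: idx for idx, iso in enumerate(sorted(sample))}

ranking = expression_to_ranking(sample)
tokens = ranking_to_tokens(ranking, vocabulary)

print(ranking)  # isoform IDs, most expressed first
print(tokens)   # the same ranking as integer tokens
```

In a sketch like this, each sample becomes a sequence whose order, rather than its raw expression values, carries the signal. Such a token sequence could then be passed to an embedding layer followed by an LSTM classifier, although that pipeline is an assumption and not the architecture confirmed by the abstract.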

Published In

ACM Transactions on Intelligent Systems and Technology, Volume 14, Issue 6
December 2023
493 pages
ISSN: 2157-6904
EISSN: 2157-6912
DOI: 10.1145/3632517
Editor: Huan Liu

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

Published: 14 November 2023
Online AM: 22 September 2023
Accepted: 05 September 2023
Revised: 22 June 2023
Received: 29 December 2022

Author Tags

1. Cancer diagnosis
2. Gene/isoform expression-based ranking
3. Long Short-Term Memory network

Qualifiers

• Research-article

Funding Sources

• Health Institute Carlos III of Spain
