Recent Dimensionality Reduction Techniques for High-Dimensional COVID-19 Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 13483)

Abstract

We are living through the later years of the COVID-19 pandemic, during which almost the entire research community has focused on the challenges that constantly arise. From a computational and mathematical perspective, several experimental studies must deal with datasets of ultra-high volume and ultra-high dimensionality. An indicative example is DNA sequencing technologies, which offer a more realistic picture of human diseases at the molecular biology level but produce data of high complexity and ultra-high dimensionality. Dimensionality reduction techniques are the first choice for addressing this complexity, since they reveal the hidden structure of the data in the original multidimensional space. Such techniques can also improve the efficiency of machine learning tasks such as classification and clustering. In this direction, we study the behavior of seven well-known and cutting-edge dimensionality reduction techniques tailored for RNA-sequencing data. Alongside this study, we propose an extension of the Random projection and Geodesic distance t-Stochastic Neighbor Embedding (RGt-SNE) algorithm, a recent improvement of t-Stochastic Neighbor Embedding (t-SNE), by suggesting a new distance criterion for the kernel matrix construction. Our results show the potential of the proposed algorithm and, at the same time, highlight the complexity of COVID-19 data, which are not separable and therefore pose a significant challenge for the machine learning field.
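
To make the two preprocessing ideas named in RGt-SNE concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the algorithm's name describes its stages: a random projection of the high-dimensional data, followed by geodesic distances approximated as shortest paths on a nearest-neighbor graph, whose output would then be passed to a t-SNE implementation that accepts precomputed distances. The new distance criterion proposed in the paper is not described in this abstract and is not implemented here. The function names, parameter values, and synthetic count matrix are all hypothetical.

```python
import numpy as np


def random_projection(X, n_components, seed=0):
    """Project rows of X onto n_components random Gaussian directions."""
    rng = np.random.default_rng(seed)
    R = rng.normal(size=(X.shape[1], n_components)) / np.sqrt(n_components)
    return X @ R


def geodesic_distances(Z, n_neighbors=5):
    """Approximate geodesic distances as shortest paths on a symmetric kNN graph."""
    n = Z.shape[0]
    # Pairwise Euclidean distances in the projected space.
    diff = Z[:, None, :] - Z[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    # Keep only edges to each point's nearest neighbours; all others start at infinity.
    G = np.full((n, n), np.inf)
    idx = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    G[rows, idx.ravel()] = D[rows, idx.ravel()]
    G = np.minimum(G, G.T)  # make the graph undirected
    np.fill_diagonal(G, 0.0)
    # Floyd-Warshall: relax every path through intermediate node k.
    for k in range(n):
        G = np.minimum(G, G[:, k:k + 1] + G[k:k + 1, :])
    # Disconnected pairs stay infinite; cap them at the largest finite distance.
    finite = np.isfinite(G)
    G[~finite] = G[finite].max()
    return G


# Synthetic stand-in for an expression matrix (40 samples x 500 features); an assumption.
rng = np.random.default_rng(42)
X = rng.poisson(lam=2.0, size=(40, 500)).astype(float)
Z = random_projection(X, n_components=20)
geo = geodesic_distances(Z, n_neighbors=5)
print(Z.shape, geo.shape)
```

The dense Floyd-Warshall loop is cubic in the number of samples and is used here only for brevity; a sparse shortest-path routine would be the more practical choice on real data.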

Acknowledgements

This project has received funding from the Hellenic Foundation for Research and Innovation (HFRI) and the General Secretariat for Research and Technology (GSRT), under grant agreement No. 1901.

Author information

Corresponding author

Correspondence to Ioannis L. Dallas.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Dallas, I.L., Vrahatis, A.G., Tasoulis, S.K., Plagianakos, V.P. (2022). Recent Dimensionality Reduction Techniques for High-Dimensional COVID-19 Data. In: Chicco, D., et al. Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2021. Lecture Notes in Computer Science, vol 13483. Springer, Cham. https://doi.org/10.1007/978-3-031-20837-9_18

  • DOI: https://doi.org/10.1007/978-3-031-20837-9_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20836-2

  • Online ISBN: 978-3-031-20837-9

  • eBook Packages: Computer Science, Computer Science (R0)
