Abstract
The COVID-19 pandemic has drawn the attention of almost the entire research community to a steady stream of new challenges. From a computational and mathematical perspective, many experimental studies now produce datasets of ultra-high volume and ultra-high dimensionality. An indicative example is DNA sequencing technologies, which offer a more realistic picture of human diseases at the molecular biology level but produce data of high complexity and ultra-high dimensionality. Dimensionality reduction techniques are a first choice for addressing this complexity, since they can reveal the structure hidden in the original multidimensional space and can improve the efficiency of machine learning tasks such as classification and clustering. In this direction, we study the behavior of seven well-known and cutting-edge dimensionality reduction techniques tailored for RNA-sequencing data. Alongside this comparison, we propose an extension of the Random projection and Geodesic distance t-Stochastic Neighbor Embedding (RGt-SNE) algorithm, a recent improvement of t-Stochastic Neighbor Embedding (t-SNE), in which we suggest a new distance criterion for the kernel matrix construction. Our results show the potential of the proposed algorithm and, at the same time, highlight the complexity of COVID-19 data, which are not separable and therefore pose a significant challenge for the machine learning field.
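The two preprocessing ideas named in RGt-SNE, random projection followed by geodesic distances, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Gaussian projection matrix, the k-nearest-neighbour graph, and the plain Euclidean edge weights are all assumptions made for illustration, and the new distance criterion proposed in the paper is not described in the abstract, so it is not reproduced here. The data are synthetic.

```python
import numpy as np


def random_projection(X, n_components, rng):
    """Project the rows of X onto n_components random Gaussian directions."""
    n_features = X.shape[1]
    # Entries drawn from N(0, 1/n_components) so that squared distances
    # are preserved in expectation.
    R = rng.normal(0.0, 1.0 / np.sqrt(n_components),
                   size=(n_features, n_components))
    return X @ R


def geodesic_distances(Z, n_neighbors):
    """Shortest-path distances over a symmetric k-nearest-neighbour graph.

    Assumption: edges are weighted by Euclidean distance; any other
    distance criterion could be substituted when building D.
    """
    n = Z.shape[0]
    # Dense pairwise Euclidean distances (fine for small n only).
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)

    # Keep only edges to each point's nearest neighbours, in both directions.
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    nearest = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = nearest.ravel()
    G[rows, cols] = D[rows, cols]
    G[cols, rows] = D[cols, rows]

    # Floyd-Warshall: allow paths through each intermediate node k in turn.
    for k in range(n):
        G = np.minimum(G, G[:, k, None] + G[None, k, :])

    # Pairs in different connected components stay infinite; cap them at
    # the largest finite distance so downstream methods receive finite input.
    finite = np.isfinite(G)
    G[~finite] = G[finite].max()
    return G


# Synthetic stand-in for an expression matrix: 40 samples, 500 features.
rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(40, 500)).astype(float)
Z = random_projection(X, n_components=20, rng=rng)
G = geodesic_distances(Z, n_neighbors=5)
print(Z.shape, G.shape)
```

A distance matrix such as `G` could then be handed to a t-SNE implementation that accepts precomputed distances, for example scikit-learn's `TSNE` with `metric="precomputed"` and `init="random"`. The dense Floyd-Warshall loop is chosen here for brevity and would need to be replaced by a sparse shortest-path routine for realistically sized data.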
Acknowledgements
This project has received funding from the Hellenic Foundation for Research and Innovation (HFRI) and the General Secretariat for Research and Technology (GSRT), under grant agreement No. 1901.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Dallas, I.L., Vrahatis, A.G., Tasoulis, S.K., Plagianakos, V.P. (2022). Recent Dimensionality Reduction Techniques for High-Dimensional COVID-19 Data. In: Chicco, D., et al. Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2021. Lecture Notes in Computer Science, vol 13483. Springer, Cham. https://doi.org/10.1007/978-3-031-20837-9_18
DOI: https://doi.org/10.1007/978-3-031-20837-9_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20836-2
Online ISBN: 978-3-031-20837-9
eBook Packages: Computer Science, Computer Science (R0)