Abstract
Genomes are complex biological structures that encode information that can be translated into several levels, such as genes and proteins. Identifying relevant patterns in genomes is of paramount importance, as they may indicate states of biological or medical relevance. Among the patterns that can be detected, anomalies are especially relevant. Anomalies are instances that, under a given metric, do not resemble the rest of the observations under study. Detecting them matters because their presence may indicate a systematic error at some stage of the analyzed process or structure, or may indicate that the studied system or phenomenon is undergoing a phase transition or another relevant drift in its dynamics. Here, we applied unsupervised anomaly detection algorithms to the codon usage of the genomes of thousands of SARS-CoV-2 viruses isolated in Mexico. Codon usage summarizes the relative frequency of appearance of nucleotide triplets, or codons, which code for amino acids, the building blocks of proteins. By applying several algorithms, we detected patterns of epidemiological relevance, namely genomes that are anomalous in their codon usage. These anomalies are relevant not only because they had not been previously detected in data from Mexico, but also because they allow identification of one of their possible sources. Most of them were found in two neighboring Mexican states, Puebla and Tlaxcala. In addition, we found that almost all anomalies come from subjects who were treated in the same laboratory. Based on the evidence presented here, we conclude that anomaly detection algorithms are relevant for epidemic surveillance.
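The abstract does not name the specific algorithms used, so the following is only a hypothetical illustration of the general idea, not the authors' method. It assumes codon usage is computed as plain relative codon frequencies over an in-frame coding sequence (triplets with ambiguous bases such as `N` are skipped), and it swaps in a deliberately simple detector: the Euclidean distance of each codon-usage vector to the mean vector, flagged when it exceeds the mean score plus `k` standard deviations. The names `codon_usage`, `centroid_distance_scores`, `flag_anomalies` and the cutoff parameter `k` are illustrative assumptions.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

# All 64 codons in a fixed order, so every genome maps to the same vector layout.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codon_usage(seq: str) -> list[float]:
    """Relative frequency of each codon, reading `seq` in frame from position 0.

    Assumption: `seq` is an in-frame coding sequence. RNA 'U' is mapped to 'T',
    triplets containing ambiguous bases (e.g. 'N') are skipped, and a trailing
    incomplete triplet is ignored.
    """
    counts = dict.fromkeys(CODONS, 0)
    seq = seq.upper().replace("U", "T")
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in counts:
            counts[codon] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


def centroid_distance_scores(vectors: list[list[float]]) -> list[float]:
    """Euclidean distance from each codon-usage vector to the mean vector."""
    dim = len(vectors[0])
    centroid = [mean(v[j] for v in vectors) for j in range(dim)]
    return [sqrt(sum((v[j] - centroid[j]) ** 2 for j in range(dim))) for v in vectors]


def flag_anomalies(vectors: list[list[float]], k: float = 2.0) -> list[int]:
    """Indices whose score exceeds mean + k * std of all scores (hypothetical rule)."""
    scores = centroid_distance_scores(vectors)
    cutoff = mean(scores) + k * pstdev(scores)
    return [i for i, s in enumerate(scores) if s > cutoff]


if __name__ == "__main__":
    # Toy data: nine identical "typical" sequences and one with skewed codon usage.
    typical = "ATGGCTGCAAAAGGT" * 10
    skewed = "ATGCCCCCCCCCCCC" * 10
    vectors = [codon_usage(s) for s in [typical] * 9 + [skewed]]
    print(flag_anomalies(vectors))  # → [9]
```

In practice one would replace the centroid-distance score with a stronger unsupervised detector (for example an isolation forest or an autoencoder reconstruction error) while keeping the same codon-usage representation.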
Acknowledgments
AN thanks PAPIIT under project TA101323 for financial support. SM and BS received CONAHCYT scholarships for their postgraduate studies.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Martínez, S., Salas, B., Pérez, N., Neme, A. (2025). Unsupervised Anomaly Detection Algorithms Unveil Relevant Temporal and Spatial Patterns in the SARS COV2 Codon Usage in México. In: Martínez-Villaseñor, L., Ochoa-Ruiz, G. (eds) Advances in Soft Computing. MICAI 2024. Lecture Notes in Computer Science, vol 15247. Springer, Cham. https://doi.org/10.1007/978-3-031-75543-9_3
DOI: https://doi.org/10.1007/978-3-031-75543-9_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-75542-2
Online ISBN: 978-3-031-75543-9
eBook Packages: Computer Science, Computer Science (R0)