Abstract
Over the past fifteen years, Next-Generation Sequencing technologies, with their high efficiency and reduced costs, have marked a turning point for research in fields such as molecular biology, genetics, and molecular medicine. Projects that previously required long timeframes and significant investment can now be completed quickly at a fraction of the cost. A direct consequence of the spread of these systems is the exponential increase in data generated by RNA-Seq experiments. Much of this data originates from biological samples (cells, tissues, mucus, etc.) of organisms with absent or incomplete genomic annotations. Compounding the problem, the surge in data has not been matched by the development of software tools capable of analyzing RNA-Seq data for such organisms. Currently available tools have several limitations: a) they operate in silos, each supporting only certain types of analysis, which complicates the biological interpretation of results; b) they are often executable only via Web interfaces, and so cannot exploit the parallelism and efficiency offered by modern supercomputers; c) functional analysis tools rely on outdated functional annotations or support only a limited set of organisms with genomic annotation; d) only one comparison (between two experimental conditions) can be tested per run. To overcome these limitations, we present IGUANER (DIfferential Gene expression and fUnctionAl aNalyzER), a software tool designed to provide integrated and up-to-date analysis of RNA-Seq data from any organism, regardless of its level of genomic annotation.
Acknowledgments
Part of this research is based on the Cooperative Research Project at the Research Center for Biomedical Engineering CRP-BE-2057.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Pinna, V., Di Martino, J., Liberati, F., Bottoni, P., Castrignanò, T. (2024). IGUANER - DIfferential Gene Expression and fUnctionAl aNalyzER. In: Sachdeva, S., Watanobe, Y. (eds) Big Data Analytics in Astronomy, Science, and Engineering. BDA 2023. Lecture Notes in Computer Science, vol 14516. Springer, Cham. https://doi.org/10.1007/978-3-031-58502-9_5
DOI: https://doi.org/10.1007/978-3-031-58502-9_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-58501-2
Online ISBN: 978-3-031-58502-9
eBook Packages: Computer Science, Computer Science (R0)