YMLA: A comparative platform to carry out functional enrichment analysis for multiple gene lists in yeast
Introduction
Biologists used to study one gene or a few genes at a time. With recent advances in high-throughput technologies and sequencing methods, it is now routine to obtain several lists of genes under various cellular conditions [1], [2]. For example, a list of genes targeted by a transcription factor (TF) can be obtained via chromatin immunoprecipitation (ChIP) experiments coupled with microarray or high-throughput sequencing [3], [4], [5]. The development of RNA sequencing (RNA-seq) techniques makes it possible to obtain multiple lists of highly expressed genes under different environmental stimuli (such as heat or poisons) [6], [7]. Various mass spectrometry methods are also available to identify the proteins present in different samples [8], [9], [10], [11]. Functional mechanism hypotheses can then be inferred from the identified gene/protein lists. Therefore, recognizing the functional properties of multiple gene lists is of great interest to the molecular biology community.
Several genome-wide annotations have been established over the past decades. Common annotations include gene ontology (GO) terms from the GO consortium [12], biological pathways from KEGG [13], protein–protein interaction (PPI) data from BioGRID [14], and transcription factor binding sites (TFBSs) [15]. These annotation terms can suggest potential functional features of the obtained gene lists [16], [17]. Statistical enrichment tests make this concrete: when co-functioning genes share a particular functional feature, that feature is significantly over-represented in the gene list relative to the percentage of genes annotated with the same feature in the whole genome [18]. Such significantly enriched features can thus form functional hypotheses about the gene groups under study. Based on this concept, several enrichment analysis tools were developed for the functional annotation of gene lists. For example, GoMiner [19] provides enriched GO terms for a given gene list, and the DAVID [20] database was constructed to provide comprehensive functional annotations for a gene group.
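The over-representation idea described above can be sketched with a one-sided hypergeometric test. This is a minimal illustration only: the text does not state which statistical test any particular tool uses, and the function names and all counts below are hypothetical.

```python
from math import comb


def fold_enrichment(genome_size, annotated_in_genome, list_size, annotated_in_list):
    """Ratio of the feature's frequency in the gene list to its frequency in the genome."""
    list_fraction = annotated_in_list / list_size
    genome_fraction = annotated_in_genome / genome_size
    return list_fraction / genome_fraction


def hypergeom_enrichment_p(genome_size, annotated_in_genome, list_size, annotated_in_list):
    """Upper-tail hypergeometric p-value.

    Probability of drawing at least `annotated_in_list` annotated genes when
    `list_size` genes are sampled without replacement from a genome of
    `genome_size` genes, of which `annotated_in_genome` carry the feature.
    """
    max_hits = min(annotated_in_genome, list_size)
    total = comb(genome_size, list_size)
    tail = sum(
        comb(annotated_in_genome, k)
        * comb(genome_size - annotated_in_genome, list_size - k)
        for k in range(annotated_in_list, max_hits + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 300 of 6000 genes carry the feature (5%),
    # and 20 of the 100 genes in the input list carry it (20%).
    print(fold_enrichment(6000, 300, 100, 20))
    print(hypergeom_enrichment_p(6000, 300, 100, 20))
```

A feature would be reported as enriched when this p-value, after correction for the number of features tested, falls below a chosen significance threshold.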
RNA-seq or ribosome profiling (ribo-seq) experiments often lead researchers to extract multiple groups of genes from various environmental conditions, raising the need for concurrent and comparative analysis of multiple gene lists. Through simultaneous analysis of multiple gene lists, the shared or distinct functional features among these gene groups can provide insights into the regulatory mechanisms at work under different cellular conditions. To facilitate comparative enrichment analysis of multiple gene lists, a tool with the following functionalities is needed: (1) comparative analysis, summarization, and visualization of the enriched features among multiple gene lists; (2) an integrative platform that covers diverse functional aspects; and (3) customized feature set reanalysis to facilitate easy comparison and retrieval of mechanism hypotheses. Existing popular gene list enrichment analysis tools (selected using a Google Scholar citation threshold of 1000) include DAVID [20], BiNGO [21], GOrilla [22], WEGO [23], GoMiner [19], GOStat [24], g:Profiler [25], FatiGO [26], and MAPPFinder [27]. Nevertheless, these tools focus on a single gene list and cannot easily handle the enrichment analysis of multiple gene lists [28]. Although two recent tools, ToppCluster [29] and FLAME [30], were developed to consider multiple gene lists, their designs do not fully support comparative analysis, result visualization, and customized feature set reanalysis. Moreover, all these tools cover only a limited number of features (fewer than 17). As a result, no multiple gene list analysis platform currently fulfills all the above research requirements.
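The notion of shared versus distinct enriched features can be illustrated with plain set operations. This is a sketch under assumptions: the function name, the list names, and the feature labels are hypothetical, and it takes the per-list sets of enriched features as already computed.

```python
def compare_enriched_features(enriched):
    """Split enriched features into those shared by all lists and those unique to one list.

    `enriched` maps a gene list name to the set of features found enriched in it.
    Returns (shared, distinct), where `distinct` maps each list name to the
    features enriched in that list and in no other list.
    """
    names = list(enriched)
    if not names:
        return set(), {}
    shared = set.intersection(*(set(enriched[name]) for name in names))
    distinct = {}
    for name in names:
        others = set().union(*(set(enriched[other]) for other in names if other != name))
        distinct[name] = set(enriched[name]) - others
    return shared, distinct


if __name__ == "__main__":
    # Hypothetical enrichment results for two conditions.
    results = {
        "heat_stress": {"GO: protein folding", "TATA box", "PPI hub"},
        "oxidative_stress": {"GO: redox homeostasis", "TATA box"},
    }
    shared, distinct = compare_enriched_features(results)
    print("shared:", sorted(shared))
    for name, features in distinct.items():
        print(name, "only:", sorted(features))
```

Features that are neither shared by all lists nor unique to one (possible with three or more lists) fall outside both groups in this sketch and would need a separate membership table.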
In this work, we constructed the YMLA (Yeast Multiple List Analyzer) platform to provide integrative and comparative enrichment analysis for multiple yeast gene lists. Yeast was selected as the model organism owing to the availability of comprehensive experimental data and gene annotations. We collected four categories of yeast datasets (ontology and functional annotation data, functional gene group data, gene/protein property data, and high-throughput probing data) to evaluate 39 features for multiple gene lists. Tabular output and visualization using heatmaps and network plots were implemented in YMLA to facilitate comparative analysis and investigation of the shared or distinct features among multiple gene lists. Moreover, we designed a customized feature set reanalysis function in YMLA to assist in forming mechanism hypotheses based on the chosen features. We demonstrated the biological applicability of YMLA using two example gene groups: the genes with the highest translation efficiency and the genes with the lowest translation efficiency in the yeast genome. Based on the enrichment analysis provided by YMLA, novel insights (e.g., codon usage may determine translation efficacy and steady-state mRNA levels) can be deduced. In summary, YMLA aims to handle the routine functional analysis of multiple gene lists identified in high-throughput experiments. These feature enrichment analyses on multiple yeast gene groups can further shed light on the functional mechanisms of higher eukaryotes. YMLA is available online at https://cosbi7.ee.ncku.edu.tw/YMLA/.
Construction and contents
The Yeast Multiple List Analyzer (YMLA) platform is designed to support comparative feature enrichment analysis for multiple gene lists. The construction of YMLA can be divided into 3 steps. First, four categories of yeast datasets were gathered in YMLA: ontology and functional annotation data, functional gene groups, gene/protein properties, and high-throughput probing results. The collection encompasses 23 datasets and results in 39 features for functional enrichment analysis. We then
Functions implemented in YMLA
YMLA aims to provide comparative functional enrichment analysis for multiple gene lists in yeast. Four major functions were implemented in YMLA to fulfill this need (Fig. 1): (1) Multiple gene list enrichment analysis. Users can input multiple gene lists in YMLA to perform batch enrichment analysis. (2) Comparative visualization. The feature enrichment results are shown in a table by default. The adjusted feature enrichment p-values in the table cells are presented as negative logarithm scores.
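As an illustration of how adjusted p-values can be turned into negative logarithm scores for a table or heatmap, the sketch below applies a Benjamini-Hochberg correction and then takes the negative base-10 logarithm. Both choices are assumptions: the text does not say which adjustment method or logarithm base is used, and the function names and input values are hypothetical.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is capped
    # by the adjusted values of all larger p-values (enforces monotonicity).
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def neg_log10_score(p_value, floor=1e-300):
    """Negative base-10 logarithm; `floor` avoids log(0) for underflowed p-values."""
    return -math.log10(max(p_value, floor))


if __name__ == "__main__":
    # Hypothetical raw enrichment p-values for four features of one gene list.
    raw = [0.01, 0.04, 0.03, 0.005]
    for p_raw, p_adj in zip(raw, benjamini_hochberg(raw)):
        print(p_raw, p_adj, neg_log10_score(p_adj))
```

With this scoring, larger values mean stronger enrichment, which is convenient when the scores are mapped onto a color scale.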
Conclusions
The YMLA (Yeast Multiple List Analyzer) platform was constructed in this research to help biologists handle the functional feature enrichment analysis and comparison for multiple gene lists obtained from high-throughput experiments in yeast. To facilitate the usage of this platform, an easy-to-use web interface that can generate tabular results and visualization plots was built. Compared with previous works, YMLA collects the most comprehensive datasets and can further compare and extract the
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported by National Cheng Kung University and the National Science and Technology Council of Taiwan [MOST 108-2628-E-006-004-MY3, MOST 111-2636-B-006-013, MOST 110-2221-E-006-198-MY3, MOST 111-2221-E-006-151-MY3, MOST 110-2222-E-006-017, and MOST 111-2221-E-006-231].
References (85)
- et al. Cancer DEIso: An integrative analysis platform for investigating differentially expressed gene-level and isoform-level human cancer markers. Comput. Struct. Biotechnol. J. (2021)
- et al. Simultaneous improvement in the precision, accuracy, and robustness of label-free proteome quantification by optimizing data manipulation chains. Mol. Cell. Proteom. (2019)
- et al. SSRTool: a web tool for evaluating RNA secondary structure predictions based on species-specific functional interpretability. Comput. Struct. Biotechnol. J. (2022)
- et al. Identification and distinct regulation of yeast TATA box-containing genes. Cell (2004)
- et al. Rapidly regulated genes are intron poor. Trends Genet. (2008)
- How introns enhance gene expression. Int. J. Biochem. Cell Biol. (2017)
- et al. cisMEP: an integrated repository of genomic epigenetic profiles and cis-regulatory modules in Drosophila. BMC Syst. Biol. (2014)
- et al. regCNN: identifying Drosophila genome-wide cis-regulatory modules via integrating the local patterns in epigenetic marks and transcription factor binding motifs. Comput. Struct. Biotechnol. J. (2022)
- et al. YPIBP: A repository for phosphoinositide-binding proteins in yeast. Comput. Struct. Biotechnol. J. (2021)
- et al. Mechanisms and consequences of alternative polyadenylation. Mol. Cell (2011)