YMLA: A comparative platform to carry out functional enrichment analysis for multiple gene lists in yeast

doi:10.1016/j.compbiomed.2022.106314

Computers in Biology and Medicine

Volume 151, Part B, December 2022, 106314

https://doi.org/10.1016/j.compbiomed.2022.106314

Highlights

• Analyzing comparative features among multiple gene lists is now a routine task.
• Comparative analysis and hypothesis deduction are the required critical functions.
• Existing analysis tools provide limited features and lack these functions.
• YMLA deposits 39 yeast features and enables comparative analysis and visualization.
• YMLA also provides customized feature set reanalysis to aid hypothesis formation.

Abstract

Comparative analysis of the functional features of multiple gene lists is now a routine task owing to advances in high-throughput experiments. Several enrichment analysis tools have been developed in the past. However, these tools mainly focus on a single gene list and contain only gene ontology or interaction features. Worse, comparative investigation and customized feature set reanalysis are still unavailable. Therefore, we constructed the YMLA (Yeast Multiple List Analyzer) platform in this research. YMLA includes 39 yeast features and facilitates comparative analysis among multiple gene lists via tabular views, heatmaps, and network plots. Moreover, a customized feature set reanalysis function was implemented in YMLA to help form mechanism hypotheses based on a selected subset of enriched features. We demonstrated the biological applicability of YMLA via example lists consisting of genes with top/bottom translation efficiency values. The analysis results provided by YMLA reveal novel facts consistent with previous experiments. YMLA is available at https://cosbi7.ee.ncku.edu.tw/YMLA/.

Introduction

Biologists used to study one gene or a few genes at a time. With recent advances in high-throughput technologies and sequencing methods, it is now routine to obtain several lists of genes under various cellular conditions [1], [2]. For example, a list of genes targeted by a given transcription factor (TF) can be obtained via chromatin immunoprecipitation (ChIP) experiments coupled with microarray or high-throughput sequencing [3], [4], [5]. The development of RNA sequencing (RNA-seq) techniques makes it possible to obtain multiple lists of highly expressed genes under different environmental stimuli (such as heat or poisons) [6], [7]. Various mass spectrometry methods are also available to identify the proteins present in different samples [8], [9], [10], [11]. Based on the identified multiple gene/protein lists, functional mechanism hypotheses can be inferred. Therefore, recognizing the functional properties of multiple gene lists is of great interest to the molecular biology community.

Several genome-wide annotation resources have been established over the past decades. Common annotations include gene ontology (GO) terms from the GO consortium [12], biological pathways from KEGG [13], protein–protein interaction (PPI) data from BioGRID [14], and transcription factor binding sites (TFBSs) [15]. These annotation terms can help suggest the potential functional features of the obtained lists of genes [16], [17]. In a statistical enrichment test, genes that function together in a particular feature appear in the gene list at a significantly higher proportion than the proportion of genes annotated with that same feature in the whole genome [18]. Such significantly enriched features can thus form the basis of functional hypotheses about the gene groups under study. Based on this concept, several enrichment analysis tools were developed for the functional annotation of gene lists. For example, GoMiner [19] can provide enriched GO terms for a given gene list, and the DAVID [20] database was constructed to provide comprehensive functional annotations for a gene group.
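The over-representation idea described above can be illustrated with a minimal sketch. It assumes a one-sided hypergeometric test, which is a common choice for this kind of enrichment analysis; the text here does not state which test any particular tool uses, and the function names and gene counts below are hypothetical.

```python
from math import comb


def enrichment_p_value(list_hits, list_size, genome_hits, genome_size):
    """Upper-tail hypergeometric probability P(X >= list_hits).

    X is the number of feature-annotated genes seen when drawing
    list_size genes at random from a genome of genome_size genes,
    of which genome_hits carry the feature.
    """
    max_hits = min(list_size, genome_hits)
    total = comb(genome_size, list_size)
    tail = sum(
        comb(genome_hits, k) * comb(genome_size - genome_hits, list_size - k)
        for k in range(list_hits, max_hits + 1)
    )
    return tail / total


def over_representation_ratio(list_hits, list_size, genome_hits, genome_size):
    """Proportion of annotated genes in the list over the genome-wide proportion."""
    return (list_hits / list_size) / (genome_hits / genome_size)


# Hypothetical counts: 12 of 100 listed genes carry a feature that
# annotates 200 of 6000 genes in the genome.
p = enrichment_p_value(list_hits=12, list_size=100, genome_hits=200, genome_size=6000)
ratio = over_representation_ratio(12, 100, 200, 6000)
print(f"ratio = {ratio:.2f}, p = {p:.3g}")
```

A ratio well above 1 together with a small p-value marks the feature as enriched in the list relative to the genome background.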

RNA-seq or ribosome profiling (ribo-seq) experiments often lead researchers to extract multiple groups of genes from various environmental conditions, raising the need for concurrent and comparative analysis of multiple gene lists. Through simultaneous analysis of multiple gene lists, the shared or distinct functional features among these gene groups can provide insights into the regulatory mechanisms at work under different cellular conditions. To facilitate comparative enrichment analysis of multiple gene lists, a tool that provides the following functionalities is needed: (1) Comparative analysis, summarization, and visualization of the enriched features among multiple gene lists. (2) An integrative platform that includes diverse functional aspects. (3) Customized feature set reanalysis to facilitate easy comparison and retrieval of mechanism hypotheses. Existing popular gene list enrichment analysis tools (under the criterion of more than 1000 Google Scholar citations) include DAVID [20], BiNGO [21], GOrilla [22], WEGO [23], GoMiner [19], GOStat [24], g:Profiler [25], FatiGO [26], and MAPPFinder [27]. Nevertheless, these tools focus on a single gene list and cannot easily handle the enrichment analysis of multiple gene lists [28]. Although two recent works, ToppCluster [29] and FLAME [30], were developed to consider multiple gene lists, they do not fully support comparative analysis, result visualization, and customized feature set reanalysis in their designs. In addition, all these tools contain only a limited number of features (fewer than 17). As a result, no multiple gene list analysis platform currently fulfills all the above research requirements.
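As a rough illustration of what identifying shared or distinct features among gene lists involves, the following sketch applies plain set operations to the enriched features of each list. It is not taken from any of the tools named above; the function name, list names, and feature labels are hypothetical placeholders.

```python
def shared_and_distinct(enriched):
    """Split enriched features into those shared by all lists and those unique to one.

    enriched maps a gene list name to the set of features found
    enriched in that list.
    """
    names = list(enriched)
    if not names:
        return set(), {}
    # Features enriched in every list.
    shared = set.intersection(*(set(enriched[n]) for n in names))
    distinct = {}
    for n in names:
        # Features enriched in any of the other lists.
        others = set().union(*(enriched[m] for m in names if m != n))
        distinct[n] = set(enriched[n]) - others
    return shared, distinct


# Hypothetical enrichment results for two gene lists.
results = {
    "list_1": {"feature_A", "feature_B", "feature_C"},
    "list_2": {"feature_A", "feature_D"},
}
shared, distinct = shared_and_distinct(results)
print(sorted(shared))
print({name: sorted(feats) for name, feats in distinct.items()})
```

Features that are enriched in some but not all lists fall into neither group here; a tabular or heatmap view is better suited to showing those partial overlaps.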

In this work, we constructed the YMLA (Yeast Multiple List Analyzer) platform to provide integrative and comparative enrichment analysis of multiple yeast gene lists. Yeast was selected as the model organism owing to the availability of comprehensive experimental data and gene annotations. We collected four categories of yeast datasets (ontology and functional annotation data, functional gene group data, gene/protein property data, and high-throughput probing data) to evaluate 39 features for multiple gene lists. Tabular views and visualizations using heatmaps and network plots were implemented in YMLA to facilitate comparative analysis and investigation of the shared or distinct features among multiple gene lists. Moreover, we designed a customized feature set reanalysis function in YMLA to assist in forming mechanism hypotheses based on the chosen features. We demonstrated the biological applicability of YMLA using two example gene groups: a list of genes with the highest translation efficiency and a list of genes with the lowest translation efficiency in the yeast genome. Based on the enrichment analysis provided by YMLA, novel facts (e.g., codon usage may determine translation efficacy and steady-state mRNA levels) can be deduced. In summary, YMLA aims to handle the routine functional analysis of multiple gene lists identified in high-throughput experiments. These feature enrichment analyses of multiple yeast gene groups can further shed light on the functional mechanisms of higher eukaryotes. YMLA is available online at https://cosbi7.ee.ncku.edu.tw/YMLA/.

Section snippets

Construction and contents

The Yeast Multiple List Analyzer (YMLA) platform is designed to support comparative feature enrichment analysis for multiple gene lists. The construction of YMLA can be divided into 3 steps. First, four categories of yeast datasets were gathered in YMLA: ontology and functional annotation data, functional gene groups, gene/protein properties, and high-throughput probing results. The collection encompasses 23 datasets and results in 39 features for functional enrichment analysis. We then

Functions implemented in YMLA

YMLA aims to provide comparative functional enrichment analysis for multiple gene lists in yeast. Four major functions were implemented in YMLA to fulfill this need (Fig. 1): (1) Multiple gene list enrichment analysis. Users can input multiple gene lists in YMLA to perform batch enrichment analysis. (2) Comparative visualization. The feature enrichment results are shown in a table by default. The adjusted feature enrichment p-values in the table cells are presented as negative logarithm scores.
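A minimal sketch of how raw enrichment p-values could be turned into such scores is given here. The excerpt does not name the adjustment method or the logarithm base, so the Benjamini-Hochberg procedure and base 10 used in this sketch are assumptions, and the function names and p-values are hypothetical.

```python
from math import log10


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: the adjustment method is not stated in the text;
    BH is used here only as a common example.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def neg_log10_scores(adjusted, floor=1e-300):
    """Convert adjusted p-values to -log10 scores (base 10 is an assumption).

    The floor avoids taking the logarithm of zero.
    """
    return [-log10(max(p, floor)) for p in adjusted]


# Hypothetical raw p-values for four features tested on one gene list.
raw = [0.01, 0.04, 0.03, 0.5]
adj = benjamini_hochberg(raw)
scores = neg_log10_scores(adj)
print([round(a, 4) for a in adj])
print([round(s, 2) for s in scores])
```

On this scale, larger scores correspond to stronger enrichment, which makes cells directly comparable across gene lists in a table or heatmap.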

Conclusions

The YMLA (Yeast Multiple List Analyzer) platform was constructed in this research to help biologists handle the functional feature enrichment analysis and comparison for multiple gene lists obtained from high-throughput experiments in yeast. To facilitate the usage of this platform, an easy-to-use web interface that can generate tabular results and visualization plots was built. Compared with previous works, YMLA collects the most comprehensive datasets and can further compare and extract the

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by National Cheng Kung University and the National Science and Technology Council of Taiwan [MOST 108-2628-E-006-004-MY3, MOST 111-2636-B-006-013, MOST 110-2221-E-006-198-MY3, MOST 111-2221-E-006-151-MY3, MOST 110-2222-E-006-017, and MOST 111-2221-E-006-231].

References (85)

Yang T.-H. et al. Cancer DEIso: An integrative analysis platform for investigating differentially expressed gene-level and isoform-level human cancer markers. Comput. Struct. Biotechnol. J. (2021)

Tang J. et al. Simultaneous improvement in the precision, accuracy, and robustness of label-free proteome quantification by optimizing data manipulation chains. Mol. Cell. Proteom. (2019)

Yang T.-H. et al. SSRTool: a web tool for evaluating RNA secondary structure predictions based on species-specific functional interpretability. Comput. Struct. Biotechnol. J. (2022)

Basehoar A.D. et al. Identification and distinct regulation of yeast TATA box-containing genes. Cell (2004)

Jeffares D.C. et al. Rapidly regulated genes are intron poor. Trends Genet. (2008)

Shaul O. How introns enhance gene expression. Int. J. Biochem. Cell Biol. (2017)

Yang T.-H. et al. cisMEP: an integrated repository of genomic epigenetic profiles and cis-regulatory modules in Drosophila. BMC Syst. Biol. (2014)

Yang T.-H. et al. regCNN: identifying Drosophila genome-wide cis-regulatory modules via integrating the local patterns in epigenetic marks and transcription factor binding motifs. Comput. Struct. Biotechnol. J. (2022)

Rathod J. et al. YPIBP: A repository for phosphoinositide-binding proteins in yeast. Comput. Struct. Biotechnol. J. (2021)

Di Giammartino D.C. et al. Mechanisms and consequences of alternative polyadenylation. Mol. Cell (2011)

Presnyak V. et al. Codon optimality is a major determinant of mRNA stability. Cell (2015)

Christiano R. et al. Global proteome turnover analyses of the yeasts S. cerevisiae and S. pombe. Cell Rep. (2014)

Holstege F.C. et al. Dissecting the regulatory circuitry of a eukaryotic genome. Cell (1998)

Lee H.C. et al. Exon junction complex enhances translation of spliced mRNAs at multiple steps. Biochem. Biophys. Res. Commun. (2009)

Yu C.-H. et al. Codon usage influences the local rate of translation elongation to regulate co-translational protein folding. Mol. Cell (2015)

Yang Q. et al. Consistent gene signature of schizophrenia identified by a novel feature selection strategy from comprehensive sets of transcriptomic data. Brief. Bioinform. (2020)

Harbison C.T. et al. Transcriptional regulatory code of a eukaryotic genome. Nature (2004)

Lefrançois P. et al. Global analysis of transcription factor-binding sites in yeast using ChIP-Seq

Yang T.-H. et al. Inferring functional transcription factor-gene binding pairs by integrating transcription factor binding data with transcription factor knockout data. BMC Syst. Biol. (2013)

Bendjilali N. et al. Time-course analysis of gene expression during the Saccharomyces cerevisiae hypoxic response. G3: Genes Genom. Genet. (2017)

Yang T.-H. et al. Human IRES Atlas: an integrative platform for studying IRES-driven translational regulation in humans. Database (2021)

Yang T.-H. et al. iPhos: a toolkit to streamline the alkaline phosphatase-assisted comprehensive LC-MS phosphoproteome investigation. BMC Bioinformatics (2014)

Tang J. et al. ANPELA: analysis and performance assessment of the label-free quantification workflow for metaproteomic studies. Brief. Bioinform. (2020)

Fu J. et al. Optimization of metabolomic data processing using NOREVA. Nat. Protoc. (2022)

Ashburner M. et al. Gene ontology: tool for the unification of biology. Nature Genet. (2000)

Kanehisa M. et al. KEGG for linking genomes to life and the environment. Nucleic Acids Res. (2007)

Oughtred R. et al. The BioGRID interaction database: 2019 update. Nucleic Acids Res. (2019)

Castro-Mondragon J.A. et al. JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles. Nucleic Acids Res. (2022)

Yang T.-H. An aggregation method to identify the RNA meta-stable secondary structure and its functionally interpretable structure ensemble. IEEE/ACM Trans. Comput. Biol. Bioinform. (2022)

Yang T.-H. Transcription factor regulatory modules provide the molecular mechanisms for functional redundancy observed among transcription factors in yeast. BMC Bioinformatics (2019)

Zeeberg B.R. et al. GoMiner: a resource for biological interpretation of genomic and proteomic data. Genome Biol. (2003)

Dennis G. et al. DAVID: database for annotation, visualization, and integrated discovery. Genome Biol. (2003)

Maere S. et al. BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics (2005)

Eden E. et al. GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics (2009)

Ye J. et al. WEGO: a web tool for plotting GO annotations. Nucleic Acids Res. (2006)

Beissbarth T. et al. GOstat: find statistically overrepresented Gene Ontologies within a group of genes. Bioinformatics (2004)

Raudvere U. et al. g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Res. (2019)

Al-Shahrour F. et al. FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics (2004)

Doniger S.W. et al. MAPPFinder: using Gene Ontology and GenMAPP to create a global gene-expression profile from microarray data. Genome Biol. (2003)

Huang D.W. et al. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. (2009)

Kaimal V. et al. ToppCluster: a multiple gene list feature analyzer for comparative enrichment clustering and network-based dissection of biological systems. Nucleic Acids Res. (2010)

Thanati F. et al. FLAME: a web tool for functional and literature enrichment analysis of multiple gene lists. Biology (2021)
