Predicting the associations between microbes and diseases by integrating multiple data sources and path-based HeteSim scores
Introduction
Trillions of microbes populate the human body, mainly bacteria, viruses, archaea, fungi, protozoa and other eukaryotes, and they inhabit the surfaces of different organs such as the skin, mouth, gut, vagina and gastrointestinal tract [1]. With the rapid development of microbiome and metagenome sequencing technology, differentially regulated microorganisms have been identified under different conditions, showing that microbes are closely associated with human health and disease [2]. For example, the Human Microbiome Project (HMP) and the Earth Microbiome Project (EMP) were established with the goal of investigating the relationship between microbiota and human diseases [3–5]. The normal microbiota forms a healthy commensal or symbiotic relationship with the host by inhabiting various organs [4]. The large repertoire of combined microorganisms and their gene products provides diverse biochemical and metabolic activities [1]. In addition, the commensal microbiota is regarded as "a forgotten organ" that contributes to the development of the immune system [6], protection against pathogens [7] and drug metabolism [8]. Meanwhile, the microbe–host interaction is a mutualistic symbiotic relationship shaped by co-evolution between humans and their symbiotic microbes. Microbial communities vary between individuals and are influenced by host genetics [9] and by host environment, including diet [10], season [11], antibiotics [12] and smoking [13]. Thus, an imbalance (dysbiosis) of microbial communities may cause disease [14]. For example, low microbial diversity has been linked to obesity and inflammatory bowel disease [15], [16]. Fecal microbiota transplantation (FMT) has become the most widely accepted treatment for intestinal microbiota dysbiosis [17].
Recently, with the rapid development of sequencing technology and analytic systems [18], [19], microbes related to diseases such as cardiovascular disease [20], cancer [21], autoinflammatory disease [22] and metabolic syndrome (e.g. obesity and diabetes) [23], [24] have been identified. These studies have advanced the understanding of disease formation, diagnosis and therapy. However, experiment-based methods for identifying microbe-disease associations are laborious and time-consuming; moreover, the interaction between the host and the microbial community is dynamic and complex. Therefore, identifying the relationships between microbes and diseases with systems biology approaches remains a challenge.
Recently, several computational methods have been proposed for studying microorganisms and human diseases [25], [26], [27]. These methods help us to understand microbial communities more comprehensively and systematically. Furthermore, the Human Microbe-Disease Association Database (HMDAD), which collects microbe-disease associations at the genus level, makes it possible to investigate potential microbe-disease associations [28]. Several methods have been developed on the assumption that microbes involved in phenotypically similar diseases tend to be functionally similar, and vice versa. Chen et al. [29] presented a computational model called KATZHMDA, which infers potential disease-related microbes by integrating the number of walks between nodes and the lengths of those walks. Wang et al. [30] adopted a semi-supervised learning framework called LRLSHMDA, which integrates Gaussian interaction profile kernel similarity and Laplacian regularized least squares (LapRLS) classification. RWRH [31] applied a random walk with restart algorithm on the heterogeneous network to rank microbes by their association with a given disease. Huang et al. [32] introduced PBHMDA, which scores each candidate microbe-disease pair with a special depth-first search algorithm. Zou et al. [33] employed a bi-random walk algorithm on the heterogeneous network to predict potential microbe-disease associations. Bao et al. [34] proposed a non-parametric universal network-based method to predict microbes associated with the investigated diseases. Wu et al. [35] employed particle swarm optimization to tune the parameters of a random walk with restart model on the microbe-disease heterogeneous network.
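To make the walk-counting idea behind Katz-style methods concrete, the following minimal sketch sums the walks between two nodes over lengths 1 to k, damping longer walks by a factor beta. It is an illustration only and not the KATZHMDA implementation: the toy graph, the function name truncated_katz and the values of beta and max_len are all assumptions.

```python
import numpy as np

def truncated_katz(adjacency: np.ndarray, beta: float = 0.1, max_len: int = 3) -> np.ndarray:
    """Return sum over l = 1..max_len of beta**l * A**l (hypothetical helper)."""
    adjacency = np.asarray(adjacency, dtype=float)
    scores = np.zeros_like(adjacency)
    power = np.eye(adjacency.shape[0])
    for length in range(1, max_len + 1):
        power = power @ adjacency              # entry (i, j) counts walks of this length
        scores += (beta ** length) * power     # longer walks contribute less
    return scores

# Toy heterogeneous graph (assumed data): nodes 0-1 are microbes, nodes 2-3 are diseases.
# Edges: microbe0-microbe1 (similarity), microbe0-disease0, microbe1-disease1,
# disease0-disease1 (similarity).
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
])
S = truncated_katz(A, beta=0.1, max_len=3)
microbe_disease_scores = S[:2, 2:]   # rows: microbes, columns: diseases
print(microbe_disease_scores)
```

In this toy graph the unobserved pair (microbe 0, disease 1) still receives a positive score, because the two nodes are connected by walks of length 2 through a similar microbe and a similar disease.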
Although several computational methods have achieved good results, the known associations are still only the tip of the iceberg, because most disease-related microbes remain unknown. Meanwhile, the HeteSim measure [36] has achieved good performance in predicting investment behavior [37], microRNA-disease associations [38], lncRNA-protein interactions [39], drug-target interactions [40] and disease genes [41]. Inspired by these results, we apply the HeteSim measure, which computes the relatedness of a pair of objects (of the same or different types) in a heterogeneous network as a path-constrained similarity score and thus takes into account the subtle semantics of different paths.
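As a rough illustration of a path-constrained score, the sketch below computes a normalized HeteSim-style value for a single length-2 path (microbe to disease to disease). The source walks from the microbe and the target walks backwards from the disease; the score is the cosine of the two probability distributions over the middle node type. The toy matrices and the function names are assumptions, and the sketch handles only this even-length case, so it should not be read as the MDPH_HMDA implementation.

```python
import numpy as np

def row_normalize(matrix):
    """Turn each row into a probability distribution (all-zero rows stay zero)."""
    matrix = np.asarray(matrix, dtype=float)
    sums = matrix.sum(axis=1, keepdims=True)
    return np.divide(matrix, sums, out=np.zeros_like(matrix), where=sums > 0)

def hetesim_length2(source_to_middle, target_to_middle):
    """Cosine between source-side and target-side reach probabilities at the middle type."""
    left = row_normalize(source_to_middle)     # source -> middle transition probabilities
    right = row_normalize(target_to_middle)    # target -> middle (reverse direction)
    raw = left @ right.T
    norms = np.linalg.norm(left, axis=1)[:, None] * np.linalg.norm(right, axis=1)[None, :]
    return np.divide(raw, norms, out=np.zeros_like(raw), where=norms > 0)

# Assumed toy data: 2 microbes x 2 diseases, and a 2 x 2 disease similarity matrix.
md = np.array([[1, 0],
               [1, 1]])
dd = np.array([[1.0, 0.5],
               [0.5, 1.0]])

# Path microbe -> disease -> disease: the target disease reaches the middle disease via dd.T.
scores = hetesim_length2(md, dd.T)
print(scores)
```

Because the score is a cosine of non-negative vectors, every value lies between 0 and 1, which makes scores from different paths comparable before they are combined.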
In this study, we developed a new method to predict potential microbe-disease associations by integrating Multiple Data sources and Path-based HeteSim scores for Human Microbe-Disease Associations (MDPH_HMDA). The method applies the HeteSim measure to a heterogeneous graph consisting of a microbe similarity network, a disease similarity network and a weighted microbe-disease association network. Microbe-microbe similarity was calculated by integrating microbe-microbe functional scores with the Gaussian interaction profile kernel similarity matrix for microbes, and disease-disease similarity was calculated by integrating symptom-based disease similarity scores with the Gaussian interaction profile kernel similarity matrix for diseases. The normalized HeteSim algorithm was then used to weight the known microbe-disease pairs, and the HeteSim scores of different paths on the heterogeneous graph were integrated to score potential microbe-disease associations. Leave-one-out cross validation (LOOCV) was used to evaluate the performance of MDPH_HMDA; it yielded an AUC (area under the ROC curve) of 0.9015, which is higher than that of the previously proposed BiRWHMDA model. Case studies of type 2 diabetes, colorectal carcinoma, asthma and inflammatory bowel disease (IBD) show that most of the top 10 predicted disease-related microbes have been validated by PubMed bibliographic records. Therefore, MDPH_HMDA is an effective approach for predicting novel microbe-disease associations.
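For readers unfamiliar with Gaussian interaction profile (GIP) kernel similarity, the sketch below shows the commonly used form: each microbe (or disease) is represented by its row (or column) of the association matrix, and similarity decays exponentially with the squared distance between profiles, with the bandwidth scaled by the mean squared profile norm. The toy matrix, the function name gip_kernel and the default gamma_prime = 1 are assumptions; how these kernels are combined with the functional and symptom-based scores is not specified in this excerpt and is not shown.

```python
import numpy as np

def gip_kernel(profiles, gamma_prime: float = 1.0) -> np.ndarray:
    """GIP kernel: exp(-gamma * ||p_i - p_j||^2), gamma = gamma_prime / mean(||p_i||^2)."""
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = (profiles ** 2).sum(axis=1)
    mean_sq = sq_norms.mean()
    if mean_sq == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq
    # Pairwise squared Euclidean distances between interaction profiles.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * (profiles @ profiles.T)
    sq_dists = np.maximum(sq_dists, 0.0)       # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)

# Assumed toy association matrix: 3 microbes (rows) x 3 diseases (columns).
md = np.array([[1, 0, 1],
               [1, 0, 0],
               [0, 1, 0]])

km = gip_kernel(md)      # microbe-microbe similarity from row profiles
kd = gip_kernel(md.T)    # disease-disease similarity from column profiles
print(km)
print(kd)
```

Both matrices are symmetric with ones on the diagonal, so they can be used directly as similarity networks for objects that lack other similarity information.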
Datasets
The dataset was derived from the Human Microbe-Disease Association Database (HMDAD, http://www.cuilab.cn/hmdad) [28], which contains 483 verified microbe-disease associations involving 39 diseases and 292 microbes. The microorganisms were curated at the genus level, because many microbiome studies used 16S rRNA sequencing and reported genus-level information. After duplicate associations were removed, 450 distinct associations remained. Then, the adjacency matrix MD was constructed to
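The de-duplication and matrix construction described in this snippet can be sketched as follows. The record names are placeholders rather than HMDAD entries, the helper build_adjacency is hypothetical, and placing microbes in rows and diseases in columns is an assumption, since the excerpt does not state the orientation of MD.

```python
import numpy as np

def build_adjacency(pairs):
    """Build a binary microbe x disease matrix from (microbe, disease) records."""
    distinct = sorted(set(pairs))                       # drop duplicate associations
    microbes = sorted({m for m, _ in distinct})
    diseases = sorted({d for _, d in distinct})
    m_index = {name: i for i, name in enumerate(microbes)}
    d_index = {name: j for j, name in enumerate(diseases)}
    md = np.zeros((len(microbes), len(diseases)), dtype=int)
    for microbe, disease in distinct:
        md[m_index[microbe], d_index[disease]] = 1      # 1 marks a known association
    return md, microbes, diseases

# Placeholder records (not real data); the second and third entries are duplicates.
records = [
    ("microbe_a", "disease_x"),
    ("microbe_b", "disease_x"),
    ("microbe_b", "disease_x"),
    ("microbe_b", "disease_y"),
]
md, microbes, diseases = build_adjacency(records)
print(md)
```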
Performance evaluation with LOOCV
Leave-one-out cross validation (LOOCV) was implemented on the microbe-disease associations in the HMDAD database to assess the performance of MDPH_HMDA. In the LOOCV framework, each known microbe-disease pair was left out in turn as the test sample, and all the other known microbe-disease pairs were used as training samples. All pairs between the 39 diseases and 292 microbes other than the known microbe-disease associations were regarded as candidate samples. The predictive performance was evaluated by
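The leave-one-out procedure can be sketched as below. Each known pair is hidden in turn, the model is re-scored on the remaining associations, and the hidden pair is compared with the candidate (unknown) pairs. Summarizing each fold by the fraction of candidates that the hidden pair outranks, then averaging, is one simple AUC-style summary and is an assumption here, as is the placeholder scorer, which merely counts microbe-disease-microbe-disease paths and is not MDPH_HMDA.

```python
import numpy as np

def placeholder_scores(md: np.ndarray) -> np.ndarray:
    """Stand-in scorer (assumption): number of M-D-M-D paths between each pair."""
    md = md.astype(float)
    return md @ md.T @ md

def loocv_auc(md: np.ndarray, score_fn) -> float:
    """Average, over known pairs, of the share of candidates ranked below the held-out pair."""
    known = np.argwhere(md == 1)
    candidates = md == 0                      # unknown pairs serve as candidate samples
    fold_values = []
    for i, j in known:
        train = md.copy()
        train[i, j] = 0                       # hide the test association
        scores = score_fn(train)
        test_score = scores[i, j]
        cand_scores = scores[candidates]
        wins = (test_score > cand_scores).sum() + 0.5 * (test_score == cand_scores).sum()
        fold_values.append(wins / cand_scores.size)
    return float(np.mean(fold_values))

# Assumed toy association matrix: 3 microbes x 3 diseases.
md = np.array([[1, 1, 0],
               [1, 1, 0],
               [0, 0, 1]])
auc = loocv_auc(md, placeholder_scores)
print(round(auc, 4))
```

Any scoring function that takes the training matrix and returns a matrix of the same shape can be passed as score_fn, so the same loop can be reused to compare methods on identical folds.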
Conclusion
With the development of high-throughput sequencing technology and analysis systems, the relationship between microbes and the human host has been widely studied. Clinical evidence has shown that microorganisms play important roles in human health and disease. Microorganisms can serve not only as biomarkers for disease diagnosis and prognosis, but are also related to disease mechanisms [71]. However, the associations between microbes and diseases are still far from fully characterized. Based on the assumption that the
Acknowledgements
This paper is supported by the National Natural Science Foundation of China [Grant numbers 61672334, 61502290 and 61401263]. The funding agencies had no role in the study design, the data collection and analysis, the decision to publish, or the preparation of the manuscript.
Chunyan Fan received an M.S. degree in Biophysics from the College of Life Science, Shaanxi Normal University, and is pursuing a Ph.D. degree in Computer Software and Theory at the School of Computer Science, Shaanxi Normal University. Her research interests include the identification and functional analysis of noncoding RNAs and the application of network science in biological research.
References (75)
Rapidly expanding knowledge on the role of the gut microbiome in health and disease
Biochim. Biophys. Acta
(2014)
An immunomodulatory molecule of symbiotic bacteria directs maturation of the host immune system
Cell
(2005)
Human genetics shape the gut microbiome
Cell
(2014)
A systematic analysis of biosynthetic gene clusters in the human microbiome reveals a common family of antibiotics
Cell
(2014)
Microbes in gastrointestinal health and disease
Gastroenterology
(2009)
Gut microbiota community adaption during young children fecal microbiota transplantation by 16s rDNA sequencing
Neurocomputing
(2016)
Prioritizing disease-causing microbes based on random walking on the heterogeneous network
Methods
(2017)
Investment behavior prediction in heterogeneous information network
Neurocomputing
(2016)
Prediction and validation of association between microRNAs and diseases by multipath methods
Biochim. Biophys. Acta
(2016)
Relevance search for predicting lncRNA–protein interactions based on heterogeneous network
Neurocomputing
(2016)
The multistep nature of cancer
Trends Genet.
Relationship between Helicobacter pylori CagA status and colorectal cancer
Am. J. Gastroenterol.
The gut microbiota–masters of host development and physiology
Nat. Rev. Microbiol.
A framework for human microbiome research
Nature
Structure, function and diversity of the healthy human microbiome
Nature
Meeting report: the terabase metagenomics workshop and the vision of an Earth microbiome project
Stand. Genomic Sci.
Streptococcal antagonism in oral biofilms: Streptococcus sanguinis and Streptococcus gordonii interference with Streptococcus mutans
J. Bacteriol.
Gut microbiota: a potential new territory for drug targeting
Nat. Rev. Drug Discov.
Diet rapidly and reproducibly alters the human gut microbiome
Nature
Seasonal variation in human gut microbiome composition
PLoS One
The subgingival microbiome of clinically healthy current and never smokers
ISME J.
A core gut microbiome in obese and lean twins
Nature
A human gut microbial gene catalogue established by metagenomic sequencing
Nature
Next-generation sequencing of the bacterial 16S rRNA gene for forensic soil comparison: a feasibility study
J. Forensic Sci.
Microbial taxonomy in the post-genomic era: rebuilding from scratch?
Arch. Microbiol.
The contributory role of gut microbiota in cardiovascular disease
J. Clin. Invest.
The microbiome and cancer
Nat. Rev. Cancer
Dietary modulation of the microbiome affects autoinflammatory disease
Nature
A metagenome-wide association study of gut microbiota in type 2 diabetes
Nature
Innate immunity and intestinal microbiota in the development of Type 1 diabetes
Nature
mmnet: an R package for metagenomics systems biology analysis
Biomed. Res. Int.
Computational methodology for predicting the landscape of the human-microbial interactome region level influence
J. Bioinform. Comput. Biol.
MetaQuery: a web server for rapid annotation and quantitative analysis of specific genes in the human gut microbiome
Bioinformatics
An analysis of human microbe-disease associations
Brief. Bioinform.
A novel approach based on KATZ measure to predict associations of human microbiota with non-infectious diseases
Bioinformatics
LRLSHMDA: Laplacian regularized least squares for human microbe-disease association prediction
Sci. Rep.
PBHMDA: path-based human microbe-disease association prediction
Front. Microbiol.
Xiujuan Lei is a professor and Ph.D. supervisor at Shaanxi Normal University. She received the Ph.D. degree from Northwestern Polytechnical University in 2005. Her research interests include bioinformatics and intelligent computing.
Ling Guo is a senior experimentalist in the College of Life Science at Shaanxi Normal University. Her interests include bioinformatics, evolution and cell biology.
Aidong Zhang is a distinguished professor in the Department of Computer Science and Engineering at the State University of New York at Buffalo. She is an IEEE Fellow. Her current research interests include data mining, bioinformatics, health informatics and database systems.