Elsevier

Neurocomputing

Volume 323, 5 January 2019, Pages 76-85

Predicting the associations between microbes and diseases by integrating multiple data sources and path-based HeteSim scores

https://doi.org/10.1016/j.neucom.2018.09.054

Abstract

A variety of microbial communities, renowned as “a forgotten organ” of the human body, have significant impacts on human health and disease. Identifying the associations between microbes and diseases can provide valuable insights for understanding complex disease pathogenesis, as well as for diagnosis, therapy, prevention, prognosis and drug discovery. Because experiment-based methods require long sampled time series to identify microbe-disease associations, computational methods offer a valuable route to understanding complex diseases. However, the ability of computational prediction models to discover novel and effective microbial candidates for complex diseases is still limited. Here, we developed a new method to predict potential microbe-disease associations by integrating Multiple Data sources and Path-based HeteSim scores for Human Microbe-Disease Associations (MDPH_HMDA). First, a heterogeneous network was constructed, in which microbe similarity was measured by combining microbe-microbe functional similarity with Gaussian interaction profile kernel similarity for microbes, and disease similarity was measured by combining symptom-based human disease similarity with Gaussian interaction profile kernel similarity for diseases. Then, the normalized HeteSim measure was employed to weight the known microbe-disease pairs, and the HeteSim scores of the microbe-disease-disease path and the microbe-microbe-disease path were integrated to calculate relatedness scores for potential microbe-disease associations. MDPH_HMDA achieved a reliable prediction performance, with an AUC (the area under the ROC curve) of 0.9015 in leave-one-out cross validation, and the results showed that our method can effectively find potential associations between microbes and diseases.
Furthermore, case studies were conducted on representative diseases, including type 2 diabetes, colorectal carcinoma, asthma and inflammatory bowel disease (IBD), in which the potential microbes associated with each disease were ranked as candidate disease-causing microbes. The reliable performance shows that our proposed method could serve as a powerful computational tool to identify more microbe-disease associations and to advance medical research.

Introduction

Trillions of microbes populate the human body, mainly bacteria, viruses, archaea, fungi, protozoa and other eukaryotes, inhabiting the surfaces of different organs such as the skin, mouth, gut, vagina and gastrointestinal tract [1]. With the rapid development of microbiome and metagenome sequencing technologies, microorganisms that are differentially regulated under different conditions have been identified, showing that microbes are closely associated with our health and disease [2]. For example, the Human Microbiome Project (HMP) and the Earth Microbiome Project (EMP) were launched with the goal of investigating the relationship between microbiota and human diseases [3–5]. The normal microbiota forms a healthy commensal or symbiotic relationship with the host by inhabiting various organs [4]. This vast reservoir of microorganisms and their gene products provides diverse biochemical and metabolic activities [1]. In addition, the commensal microbiota is considered “a forgotten organ”, which contributes to the development of the immune system [6], protection against pathogens [7] and drug metabolism [8]. Meanwhile, the microbe–host interaction is also a mutualistic symbiotic relationship shaped by co-evolution between humans and their symbiotic microbes. Microbial communities vary between individuals and are influenced by host genetics [9] and by environmental factors such as diet [10], season [11], antibiotics [12] and smoking [13]. Thus, imbalance or dysbiosis of microbial communities may cause diseases [14]. For example, low microbial diversity can cause obesity and inflammatory bowel disease [15], [16]. Fecal microbiota transplantation (FMT) has become the most widely accepted treatment for intestinal microbiota dysbiosis [17].
With the rapid development of sequencing technologies and analytic systems [18], [19], microbes related to diseases such as cardiovascular disease [20], cancer [21], autoinflammatory disease [22] and metabolic syndrome (e.g. obesity and diabetes) [23], [24] have been identified. These studies have advanced the discovery and understanding of disease formation, diagnosis and therapy. However, experiment-based methods for identifying microbe-disease associations are laborious and time-consuming; moreover, the interaction between the host and its microbial community is dynamic and complex. Therefore, identifying the relationships between microbes and diseases with systems biology approaches is still a challenge.

Recently, several computational methods have been proposed for studying microorganisms and human diseases [25], [26], [27]. These methods have helped us understand microbial communities more comprehensively and systematically. Furthermore, the Human Microbe-Disease Association Database (HMDAD), which collects microbe-disease associations at the genus level, makes it possible to investigate potential microbe-disease associations [28]. Based on the assumption that microbes involved in phenotypically similar diseases tend to be functionally similar, and vice versa, several methods have been developed. Chen et al. [29] presented a computational model, KATZHMDA, that infers potential disease-related microbes by integrating the number of walks between nodes and their lengths. Wang et al. [30] proposed LRLSHMDA, a semi-supervised learning framework that integrates Gaussian interaction profile kernel similarity with Laplacian regularized least squares (LapRLS) classification. RWRH [31] applied a random walk with restart algorithm on the heterogeneous network and took the top-ranked microbes as those most associated with a disease. Huang et al. [32] introduced PBHMDA, which scores each candidate microbe-disease pair with a special depth-first search algorithm. Zou et al. [33] employed a bi-random walk algorithm on the heterogeneous network to predict potential microbe-disease associations. Bao et al. [34] proposed a non-parametric universal network-based method to predict associated microbes for investigated diseases. Wu et al. [35] employed particle swarm optimization to optimize the parameters of a random walk with restart model on the microbe-disease heterogeneous network.
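Several of the models above, such as RWRH [31] and the model of Wu et al. [35], rely on random walk with restart. As a generic illustration only, a minimal NumPy sketch of random walk with restart over a heterogeneous microbe-disease network is given below. The block layout (microbes first, diseases second), the column normalization, the restart probability of 0.7 and the helper names `hetero_adjacency` and `random_walk_with_restart` are assumptions and do not reproduce the settings of any cited model.

```python
import numpy as np


def hetero_adjacency(SM, SD, MD):
    """Stack microbe similarity SM (nm x nm), disease similarity SD (nd x nd)
    and the association matrix MD (nm x nd) into one (nm + nd) square matrix,
    with microbes first and diseases second (an assumed layout)."""
    return np.block([[SM, MD], [MD.T, SD]])


def random_walk_with_restart(W, seed, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W_norm @ p + r * p0 until the L1 change < tol.

    W: non-negative adjacency of the whole heterogeneous network.
    seed: 0/1 vector marking the seed node(s), e.g. one disease of interest.
    restart: restart probability r (illustrative default, not a published value).
    """
    W = np.asarray(W, dtype=float)
    col_sums = W.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # isolated nodes keep an all-zero column
    W_norm = W / col_sums                  # column-stochastic transition matrix
    p0 = np.asarray(seed, dtype=float)
    p0 = p0 / p0.sum()                     # restart mass spread evenly over seeds
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * W_norm @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Toy network: 2 microbes and 2 diseases; the seed is the first disease (index 2).
SM = np.array([[1.0, 0.5], [0.5, 1.0]])
SD = np.array([[1.0, 0.2], [0.2, 1.0]])
MD = np.array([[1.0, 0.0], [0.0, 1.0]])
p = random_walk_with_restart(hetero_adjacency(SM, SD, MD), seed=[0, 0, 1, 0])
microbe_scores = p[:2]                     # steady-state scores used to rank microbes
```

Because the transition matrix is column-stochastic when no node is isolated, the scores stay a probability distribution; ranking the microbe part of `p` gives candidate microbes for the seed disease.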

Although several computational methods have achieved good results, the known associations are still only the tip of the iceberg, because most disease-related microbes remain unknown. Meanwhile, the HeteSim measure [36] has achieved good performance in predicting investment behavior [37], microRNA-disease associations [38], lncRNA-protein interactions [39], drug-target interactions [40] and disease genes [41]. Inspired by this, we apply the HeteSim measure to calculate the relatedness of heterogeneous object pairs (of the same or different types) as a path-constrained similarity score, which takes into account the subtle semantic meaning of different paths.
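To illustrate the path-constrained idea, the sketch below computes normalized HeteSim for length-2 paths such as microbe-disease-disease and microbe-microbe-disease, following the general HeteSim formulation in which the two halves of a path are walked from both ends and meet in the middle; the normalized score is the cosine of the two reachable-probability vectors. It also shows that, for a binary microbe-disease matrix, the normalized HeteSim of a single microbe-disease link reduces to 1/sqrt(deg(m) * deg(d)). All function names are hypothetical, and the equal-weight averaging in `path_scores` is an assumption: this section does not state how MDPH_HMDA combines the two path scores.

```python
import numpy as np


def row_normalize(W):
    """Convert a non-negative relation matrix into transition probabilities."""
    W = np.asarray(W, dtype=float)
    s = W.sum(axis=1, keepdims=True)
    s[s == 0] = 1.0                        # rows without links stay all-zero
    return W / s


def hetesim_two_step(W_ab, W_bc, normalized=True):
    """(Normalized) HeteSim for a length-2 path A-B-C that meets at type B.

    Left half: A -> B with U_AB = row_normalize(W_ab).
    Reversed right half: C -> B with U_CB = row_normalize(W_bc.T).
    Score = inner product of the two distributions over B (cosine if normalized).
    """
    left = row_normalize(W_ab)                               # |A| x |B|
    right = row_normalize(np.asarray(W_bc, dtype=float).T)   # |C| x |B|
    scores = left @ right.T                                  # |A| x |C|
    if normalized:
        denom = (np.linalg.norm(left, axis=1)[:, None]
                 * np.linalg.norm(right, axis=1)[None, :])
        scores = np.divide(scores, denom, out=np.zeros_like(scores), where=denom > 0)
    return scores


def hetesim_edge_weights(MD):
    """Normalized HeteSim on the single relation M-D of a binary matrix MD.

    Splitting every edge into an intermediate object yields
    1 / sqrt(deg(m) * deg(d)) for linked pairs and 0 for unlinked ones.
    """
    MD = (np.asarray(MD) > 0).astype(float)
    deg_m = MD.sum(axis=1)[:, None]
    deg_d = MD.sum(axis=0)[None, :]
    denom = np.sqrt(deg_m * deg_d)
    return np.divide(MD, denom, out=np.zeros_like(MD), where=denom > 0)


def path_scores(MD, SM, SD):
    """Hypothetical pipeline: weight known pairs, score both paths, average."""
    W = hetesim_edge_weights(MD)
    mdd = hetesim_two_step(W, SD)          # microbe -> disease -> disease
    mmd = hetesim_two_step(SM, W)          # microbe -> microbe -> disease
    return (mdd + mmd) / 2.0               # equal weighting is an assumption
```

Since every vector involved is non-negative, each normalized score lies in [0, 1], which keeps scores from different paths on a comparable scale before they are combined.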

In this study, a new method was developed to predict potential microbe-disease associations by integrating Multiple Data sources and Path-based HeteSim scores for Human Microbe-Disease Associations (MDPH_HMDA). It executes the HeteSim measure on a heterogeneous graph consisting of a microbe similarity network, a disease similarity network and a weighted microbe-disease association network. Microbe-microbe similarity was calculated by integrating microbe-microbe functional scores with the Gaussian interaction profile kernel similarity matrix for microbes, and disease-disease similarity by integrating symptom-based disease similarity scores with the Gaussian interaction profile kernel similarity matrix for diseases. Then, the normalized HeteSim algorithm was executed to weight the known microbe-disease pairs, and the HeteSim scores of different paths were integrated to score potential microbe-disease associations on the heterogeneous graph. Leave-one-out cross validation (LOOCV) was introduced to evaluate the performance of MDPH_HMDA; it obtained an AUC (the area under the ROC curve) of 0.9015, higher than that of the previously proposed BiRWHMDA model. Case studies of type 2 diabetes, colorectal carcinoma, asthma and inflammatory bowel disease (IBD) demonstrate that most of the top 10 predicted disease-related microbes have been confirmed by PubMed bibliographic records. Therefore, MDPH_HMDA is an effective approach for predicting novel microbe-disease associations.
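The Gaussian interaction profile (GIP) kernel similarity mentioned above can be sketched as follows, using the bandwidth normalization common in related microbe-disease work: gamma equals gamma' divided by the mean squared norm of the interaction profiles, with gamma' = 1. The integration rule in `integrate_similarity` (average where a functional or symptom-based score exists, GIP kernel otherwise) is a hypothetical choice for illustration; this section does not specify the exact rule used by MDPH_HMDA.

```python
import numpy as np


def gip_kernel(IP, gamma_prime=1.0):
    """GIP kernel similarity between the rows of an interaction-profile matrix.

    IP: one row per entity, e.g. MD for microbes or MD.T for diseases.
    K[i, j] = exp(-gamma * ||IP[i] - IP[j]||^2), where
    gamma = gamma_prime / mean_i ||IP[i]||^2.
    """
    IP = np.asarray(IP, dtype=float)
    sq_norms = (IP ** 2).sum(axis=1)
    mean_sq = sq_norms.mean()
    gamma = gamma_prime / mean_sq if mean_sq > 0 else 0.0
    # squared Euclidean distance between every pair of profiles
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * IP @ IP.T
    d2 = np.maximum(d2, 0.0)               # clip tiny negatives from round-off
    return np.exp(-gamma * d2)


def integrate_similarity(S_func, S_gip):
    """Hypothetical rule: average the two scores where a functional (or
    symptom-based) score exists, otherwise fall back to the GIP kernel."""
    S_func = np.asarray(S_func, dtype=float)
    return np.where(S_func > 0, (S_func + S_gip) / 2.0, S_gip)
```

For microbes the profiles are the rows of MD (`gip_kernel(MD)`), and for diseases the rows of its transpose (`gip_kernel(MD.T)`). Normalizing gamma by the mean profile norm keeps the kernel bandwidth comparable across matrices of different density.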

Section snippets

Datasets

The dataset was derived from the Human Microbe-Disease Association Database (HMDAD, http://www.cuilab.cn/hmdad) [28], which integrates 483 verified microbe-disease associations involving 39 diseases and 292 microbes. These microorganisms were curated at the genus level, because many microbiome studies used 16S rRNA sequencing and reported genus-level information. Here, duplicate interactions were removed, leaving 450 distinct associations. Then, the adjacency matrix MD was constructed to
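The deduplication step and the construction of a binary adjacency matrix can be sketched as below. The input format (a list of (microbe, disease) name tuples), the orientation (microbes as rows, diseases as columns) and the function name `build_adjacency` are assumptions; the export format of HMDAD is not described in this snippet.

```python
import numpy as np


def build_adjacency(pairs):
    """Deduplicate (microbe, disease) records and build a binary matrix MD.

    pairs: iterable of (microbe_name, disease_name) tuples.
    Returns MD (rows = microbes, columns = diseases) and the two index lists.
    """
    unique_pairs = sorted(set(pairs))                # drop duplicate records
    microbes = sorted({m for m, _ in unique_pairs})
    diseases = sorted({d for _, d in unique_pairs})
    m_idx = {m: i for i, m in enumerate(microbes)}
    d_idx = {d: j for j, d in enumerate(diseases)}
    MD = np.zeros((len(microbes), len(diseases)), dtype=int)
    for m, d in unique_pairs:
        MD[m_idx[m], d_idx[d]] = 1                   # 1 marks a known association
    return MD, microbes, diseases
```

After deduplication, the number of ones in MD equals the number of distinct associations, which gives a quick consistency check against the reported count.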

Performance evaluation with LOOCV

Leave-one-out cross validation (LOOCV) is implemented on the microbe-disease associations in the HMDAD database to assess the performance of MDPH_HMDA. In the LOOCV framework, each known microbe-disease pair was left out in turn as the test sample, and all the other known microbe-disease pairs were used as training samples. All pairs between the 39 diseases and 292 microbes except the known microbe-disease associations are regarded as candidate samples. The predictive performance was evaluated by
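A minimal LOOCV driver following the protocol described above can be sketched as follows. The scoring function is passed in as a parameter (for example, a path-based HeteSim scorer), and the returned value averages, over all left-out pairs, the fraction of candidate pairs ranked below the test pair, with ties counted as one half. This rank-based summary and the name `loocv_auc` are assumptions for illustration; the paper's AUC may have been computed differently from its ROC curve.

```python
import numpy as np


def loocv_auc(MD, score_fn):
    """Rank-based LOOCV over the known associations in MD.

    For each known pair: hide it, recompute scores with score_fn(train),
    and compare the hidden pair with every candidate (originally unknown) pair.
    Returns the mean, over left-out pairs, of the fraction of candidates
    scored below the test pair (ties count 1/2).
    """
    MD = np.asarray(MD, dtype=float)
    candidates = MD == 0                     # all originally unknown pairs
    known = np.argwhere(MD > 0)
    per_pair = []
    for i, j in known:
        train = MD.copy()
        train[i, j] = 0                      # hide the test association
        S = score_fn(train)
        test_score = S[i, j]
        cand_scores = S[candidates]
        below = (cand_scores < test_score).sum()
        ties = (cand_scores == test_score).sum()
        per_pair.append((below + 0.5 * ties) / cand_scores.size)
    return float(np.mean(per_pair))
```

A scorer that returns a constant matrix yields exactly 0.5, which is a convenient sanity check before plugging in a real scoring model.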

Conclusion

With the development of high-throughput sequencing technology and analysis systems, the relationship between microbes and the human host has been widely researched. Clinical evidence has shown that microorganisms play important roles in human health and disease. Microorganisms can serve not only as biomarkers for disease diagnosis and prognosis, but are also related to disease mechanisms [71]. However, the associations between microbes and diseases are still far from clearly detected. Based on the assumption that the

Acknowledgements

This paper is supported by the National Natural Science Foundation of China (Grant numbers 61672334, 61502290 and 61401263). The funding agencies had no role in the study design, the data collection and analysis, the decision to publish, or the preparation of the manuscript.

Chunyan Fan received her M.S. degree in Biophysics from the College of Life Science, Shaanxi Normal University, and is pursuing a Ph.D. degree in Computer Software and Theory at the School of Computer Science, Shaanxi Normal University. Her research interests include the identification and functional analysis of noncoding RNAs and the application of network science in biological research.

References (75)

  • B. Vogelstein et al.

    The multistep nature of cancer

    Trends Genet.

    (1993)
  • H. Shmuely

    Relationship between Helicobacter pylori CagA status and colorectal cancer

    Am. J. Gastroenterol.

    (2001)
  • F. Sommer et al.

    The gut microbiota–masters of host development and physiology

    Nat. Rev. Microbiol.

    (2013)
  • B.A. Methé

    A framework for human microbiome research

    Nature

    (2012)
  • C. Huttenhower

    Structure, function and diversity of the healthy human microbiome

    Nature

    (2012)
  • J.A. Gilbert

    Meeting report: the terabase metagenomics workshop and the vision of an Earth microbiome project

    Stand. Genomic Sci.

    (2010)
  • J. Kreth et al.

    Streptococcal antagonism in oral biofilms: Streptococcus sanguinis and Streptococcus gordonii interference with Streptococcus mutans

    J. Bacteriol.

    (2008)
  • W. Jia

    Gut microbiota: a potential new territory for drug targeting

    Nat. Rev. Drug Discov.

    (2008)
  • L.A. David

    Diet rapidly and reproducibly alters the human gut microbiome

    Nature

    (2014)
  • E.R. Davenport

    Seasonal variation in human gut microbiome composition

    PLoS One

    (2014)
  • M.R. Mason

    The subgingival microbiome of clinically healthy current and never smokers

    ISME J.

    (2015)
  • P.J. Turnbaugh

    A core gut microbiome in obese and lean twins

    Nature

    (2009)
  • J. Qin

    A human gut microbial gene catalogue established by metagenomic sequencing

    Nature

    (2010)
  • E.M. Jesmok et al.

    Next-generation sequencing of the bacterial 16S rRNA gene for forensic soil comparison: a feasibility study

    J. Forensic Sci.

    (2016)
  • C.C. Thompson

    Microbial taxonomy in the post-genomic era: rebuilding from scratch?

    Arch. Microbiol.

    (2015)
  • W.H. Tang et al.

    The contributory role of gut microbiota in cardiovascular disease

    J. Clin. Invest.

    (2014)
  • R.F. Schwabe et al.

    The microbiome and cancer

    Nat. Rev. Cancer

    (2013)
  • J.R. Lukens

    Dietary modulation of the microbiome affects autoinflammatory disease

    Nature

    (2014)
  • J. Qin

    A metagenome-wide association study of gut microbiota in type 2 diabetes

    Nature

    (2012)
  • L. Wen

    Innate immunity and intestinal microbiota in the development of Type 1 diabetes

    Nature

    (2008)
  • Y. Cao

    mmnet: an R package for metagenomics systems biology analysis

    Biomed. Res. Int.

    (2015)
  • E.D. Coelho

    Computational methodology for predicting the landscape of the human-microbial interactome region level influence

    J. Bioinform. Comput. Biol.

    (2015)
  • S. Nayfach et al.

    MetaQuery: a web server for rapid annotation and quantitative analysis of specific genes in the human gut microbiome

    Bioinformatics

    (2015)
  • W. Ma

    An analysis of human microbe-disease associations

    Brief. Bioinform.

    (2017)
  • X. Chen

    A novel approach based on KATZ measure to predict associations of human microbiota with non-infectious diseases

    Bioinformatics

    (2018)
  • F. Wang

    LRLSHMDA: Laplacian regularized least squares for human microbe-disease association prediction

    Sci. Rep.

    (2017)
  • Z.A. Huang

    PBHMDA: path-based human microbe-disease association prediction

    Front. Microbiol.

    (2017)
Xiujuan Lei is a professor and Ph.D. supervisor at Shaanxi Normal University. She received her Ph.D. degree from Northwestern Polytechnical University in 2005. Her research interests include bioinformatics and intelligent computing.

Ling Guo is a senior experimentalist in the College of Life Science at Shaanxi Normal University. Her interests include bioinformatics, evolution and cell biology.

Aidong Zhang is a Distinguished Professor in the Department of Computer Science and Engineering at the State University of New York at Buffalo. She is an IEEE Fellow. Her current research interests include data mining, bioinformatics, health informatics and database systems.
