1 Introduction

Gene Ontology (GO) [1] is a taxonomy of biological terms that represents the properties of genes and/or gene products (e.g., proteins). It is organized as a directed acyclic graph (DAG) that describes the relationships among the terms. Gene products are annotated to pertinent GO terms through annotation corpora. There are three GOs: biological process (BP), cellular component (CC), and molecular function (MF). Lord et al. [7] pioneered the use of ontology-based semantic similarity measures (SSMs) in genomics. An SSM is a quantitative function, \( SSM (t_1,t_2)\), that measures the closeness between two terms \(t_1\) and \(t_2\) based on their semantic representations in a given ontology. Subsequently, a variety of GO-based SSMs have been proposed and successfully applied to different genomics applications [4, 9].

A high similarity score between two proteins indicates that they are annotated with similar cellular components (if the CC ontology is used) or with similar biological processes (if the BP ontology is used). This gives indirect evidence that the two proteins are more likely to interact than pairs with a low similarity score. Hence several studies have used the GO-based SSM between two gene products involved in a protein-protein interaction (PPI) as a confidence score of the interaction. However, GO is updated regularly through the addition, deletion, and merging of terms along with their annotations. This may affect the similarity score of a protein pair calculated over different versions of the ontology. To the best of our knowledge, no study has systematically examined the effect of the evolution of GO on SSMs. In this paper, we systematically study whether changes in GO affect the performance of GO-based SSMs. Further, we compare multiple GO-based SSMs under this setting for the task of scoring confidence of PPIs.

Section 2 briefly discusses the necessary background and terminology. In Sect. 3, we describe the datasets, the GO versions, and the evaluation metrics used. Results are discussed and analyzed in Sect. 4, and Sect. 5 concludes the paper.

2 Background

Semantic Similarity Measure (SSM). SSMs can be categorized mainly into two approaches: edge- and node-based [10]. The edge-based approach mainly considers the shared paths between two ontology terms and does not take the annotation information of terms into account. Node-based SSMs compute the similarity between two terms by comparing their properties, common ancestors, and descendants. This approach is less sensitive to the topological structure of the ontology but more sensitive to changes in annotations. SSMs such as [2, 14] try to combine the node- and edge-based approaches and are commonly referred to as hybrid approaches. A few methods, such as TCSS [4], are designed around the complex structure of the GO DAG.

SSMs are defined for two individual terms, but a protein is annotated with a set of terms. So if two proteins \(p_1\) and \(p_2\) are annotated with the sets of terms S and T, respectively, then \( SSM (p_1,p_2)\) is calculated as \( SSM (S,T)\), which requires combining the SSMs of the individual term pairs. Generally, three strategies are used in the literature: maximum (MAX), average (Avg), and best-match average (BMA). In the MAX and Avg strategies, the similarity between S and T is calculated as the maximum and the average, respectively, of the term-pair similarities over the set \(S \times T\). The term-pair SSMs between two sets of terms can be arranged as a matrix with one row per term in S and one column per term in T. BMA is defined as the average of all maximum similarity scores on each row and column of this matrix.
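The three combination strategies can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `combine_ssm` and the toy similarity matrix are hypothetical, and BMA is read literally from the definition above, i.e., the mean over the pooled row maxima and column maxima (some implementations instead average the mean of the row maxima and the mean of the column maxima).

```python
from statistics import mean

def combine_ssm(sim, strategy="BMA"):
    """Combine term-pair similarities into one protein-pair score.

    sim[i][j] holds SSM(s_i, t_j) for s_i in S (rows) and t_j in T (columns).
    """
    rows = [list(r) for r in sim]
    cols = [list(c) for c in zip(*rows)]
    if strategy == "MAX":
        return max(v for r in rows for v in r)
    if strategy == "AVG":
        return mean(v for r in rows for v in r)
    if strategy == "BMA":
        # Assumption: pool the best match of every row and every column.
        return mean([max(r) for r in rows] + [max(c) for c in cols])
    raise ValueError(f"unknown strategy: {strategy}")

# Hypothetical scores: S has three terms (rows), T has two terms (columns).
toy = [[0.9, 0.2],
       [0.4, 0.6],
       [0.1, 0.3]]
```

On this toy matrix MAX is driven by a single term pair (0.9), whereas BMA (0.66) lets every term in S and T contribute its best match.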

3 Experimental Design

GOs and SSMs. We consider BP and CC ontologies along with MAX and BMA in the evaluation. These ontologies and strategies are the most relevant for scoring confidence of PPIs [8]. We exclude electronically inferred annotations (IEA) as they are not verified by human experts. Further, we consider only those PPIs where both the interacting proteins are annotated to at least one GO term other than the root.

We select five different Bioconductor versions of GO and corresponding annotation corpora: 3.0 (2014-09-13), 3.1 (2015-03-13), 3.2 (2015-09-19), 3.3 (2016-03-05), and 3.4 (2016-09-21). We consider six state-of-the-art SSMs proposed by Resnik [12], Lin [6], Schlicker et al. [13], Jiang and Conrath [5], Wang et al. [14], and Jain and Bader [4], referred to as Resnik, Lin, Rel, Jiang, Wang, and TCSS, respectively, in the rest of the paper. Resnik and TCSS with MAX strategy have been considered to be the best SSMs for scoring confidence of PPIs by several studies [4, 9]. We also consider RDS, RNS, and RES, proposed recently by Paul and Anand [8]. The selected nine SSMs encompass all types of SSMs, as discussed in Sect. 2.

Datasets. We use the core subset of yeast PPIs from the DIP database (Database of Interacting Proteins) [15], downloaded on 29.10.2015, as positive instances. As done in [4], an equal number of negative PPI instances are generated independently by randomly choosing protein pairs that are annotated in BP and CC and are not present in the iRefWeb database [11], a combined database of all known PPIs, accessed on 27.11.2015.
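The negative-set construction can be sketched as rejection sampling over random protein pairs. This is only a sketch under assumptions: `sample_negatives`, its parameters, and the fixed seed are hypothetical names chosen for illustration, and the source does not state how the random choice was actually implemented.

```python
import random

def sample_negatives(proteins, known_ppis, n, seed=0):
    """Draw n distinct unordered protein pairs that are not known interactions."""
    pool = sorted(set(proteins))
    pool_set = set(pool)
    known = {frozenset(p) for p in known_ppis}
    # Count known pairs that lie inside the pool to check feasibility.
    blocked = sum(1 for k in known if len(k) == 2 and k <= pool_set)
    available = len(pool) * (len(pool) - 1) // 2 - blocked
    if n > available:
        raise ValueError("not enough non-interacting pairs to sample from")
    rng = random.Random(seed)
    negatives = set()
    while len(negatives) < n:
        pair = frozenset(rng.sample(pool, 2))  # two distinct proteins
        if pair not in known:
            negatives.add(pair)
    return sorted(tuple(sorted(p)) for p in negatives)
```

Here `proteins` would be the proteins annotated in both BP and CC, and `known_ppis` the interactions listed in the reference database.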

Proteins involved in a pathway are more likely to interact among themselves and likely to be annotated to the same or similar GO terms and thus should show high similarity scores. A set of 11 yeast (S. cerevisiae) KEGG pathways is selected as in [8]. During the selection of pathways, the authors of [8] try to maintain a trade-off between functional diversity and computational time required for the experiment.

Evaluation Metrics. A similarity measure can classify a set of PPIs into two groups: positives and negatives, for a given cutoff similarity score. Hence an SSM can be treated as a binary classifier. We utilize the area under the ROC curve (AUC) as an evaluation metric for binary classifiers.
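Because the AUC equals the probability that a randomly chosen positive PPI receives a higher score than a randomly chosen negative one (ties counted as one half), it can be computed without fixing any cutoff. The following is a minimal sketch of that rank-based formulation; the function name `auc` and the example scores are hypothetical, and the quadratic loop is meant for clarity rather than speed.

```python
def auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical similarity scores for positive and negative PPIs.
positives = [0.9, 0.8, 0.4]
negatives = [0.5, 0.3, 0.2]
```

For these scores, eight of the nine positive-negative pairs are ordered correctly, so `auc(positives, negatives)` is 8/9.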

For each KEGG pathway, an intra-set average similarity is computed as the average of all pairwise similarities of proteins within the pathway. An inter-set average similarity for every two pathways is computed as the average of all pairwise cross-similarities of proteins between the two pathways. The discriminating power (DP) of a pathway is defined, as in [3], as the ratio of its intra-set average similarity to the average of all inter-set average similarities between that pathway and the other pathways. Thus the DP quantifies the ability of an SSM to distinguish among functionally different sets of proteins (e.g., KEGG pathways).
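The DP computation can be sketched as follows. The function names, the toy pathways, and the toy similarity function are hypothetical, and the sketch assumes that the intra-set average runs over distinct protein pairs only (the source does not say whether self-pairs are included).

```python
from itertools import combinations, product
from statistics import mean

def intra_similarity(members, sim):
    """Average similarity over all distinct protein pairs within one pathway."""
    return mean(sim(a, b) for a, b in combinations(members, 2))

def inter_similarity(members_p, members_q, sim):
    """Average similarity over all cross pairs between two pathways."""
    return mean(sim(a, b) for a, b in product(members_p, members_q))

def discriminating_power(name, pathways, sim):
    """Intra-set average divided by the mean of all inter-set averages."""
    target = pathways[name]
    inter = [inter_similarity(target, members, sim)
             for other, members in pathways.items() if other != name]
    return intra_similarity(target, sim) / mean(inter)

# Hypothetical data: proteins sharing a first letter are "functionally close".
pathways = {"A": ["a1", "a2"], "B": ["b1", "b2"], "C": ["c1", "c2"]}

def toy_sim(x, y):
    return 0.8 if x[0] == y[0] else 0.2
```

With this toy measure every pathway has an intra-set average of 0.8 and inter-set averages of 0.2, so `discriminating_power("A", pathways, toy_sim)` is 4; a DP well above 1 indicates that the measure separates the pathway from the others.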

4 Results and Discussion

ROC curve analysis: Table 1 summarizes the AUCs of the top five SSMs for the different versions of the BP ontology. The changes in AUC are insignificant for all SSMs, which suggests that the evolution of GO has no noticeable impact on their overall classification performance. This is easy to explain. An AUC of 1 implies a perfect classifier, while an AUC of 0.5 indicates a random classifier, so the practical range of AUC for a reasonably good classifier is very limited (generally [0.7, 1]). Unless the majority of the PPIs are affected by the changes in GO, high variability in AUC across GO versions is not expected. By affected we mean that, for a given PPI, an SSM produces different similarity scores for different GO versions. In fact, the majority of PPIs in the dataset are not significantly affected by the changes in GO.

Table 1. The area under the curves (AUCs) of SSMs for the different GO-BP versions. The best AUC for each strategy is shown in bold.

To take a closer look at the impact, we identify the PPIs whose similarity scores change across the versions of GO. For each SSM, we select the PPIs that are common to all five GO versions (more than \(99\%\) of PPIs are common). For each selected PPI, we calculate the standard deviation of its five similarity scores, one per GO version. We then sort the PPIs by standard deviation in descending order and select the top \(10\%\). These are the \(10\%\) of PPIs most affected by the changes in GO. An equal number of negative PPIs are selected from the already generated negative PPIs for the corresponding SSM. Finally, the AUC is computed on the selected positive and negative PPIs for each GO version. The resulting AUCs of the two best-performing SSMs for the different versions of GO-BP are shown in Table 2.
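The selection of the most affected PPIs can be sketched as a ranking by standard deviation. The function name `most_affected`, the `fraction` parameter, and the toy scores are hypothetical. The source does not say whether the population or the sample standard deviation was used; since every PPI has the same number of scores, both produce the same ranking, and the sketch uses the population form.

```python
from statistics import pstdev

def most_affected(scores_by_ppi, fraction=0.10):
    """Return the top `fraction` of PPIs ranked by score variability.

    scores_by_ppi maps a PPI to its similarity scores, one per GO version.
    """
    ranked = sorted(scores_by_ppi,
                    key=lambda ppi: pstdev(scores_by_ppi[ppi]),
                    reverse=True)
    k = max(1, int(len(ranked) * fraction))  # keep at least one PPI
    return ranked[:k]

# Hypothetical scores over five GO versions: nine stable PPIs, one variable.
scores = {(f"P{i}", f"Q{i}"): [0.5] * 5 for i in range(9)}
scores[("P9", "Q9")] = [0.40, 0.45, 0.60, 0.70, 0.75]
```

Calling `most_affected(scores)` on the ten toy PPIs keeps only the single pair whose score drifts across versions; lowering the cutoff from the top \(10\%\) toward \(100\%\) corresponds to raising `fraction` toward 1.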

Now the performance variations of SSMs across GO versions are clearly visible. For RES, we observe relative changes of approximately \(8\%\) and \(4\%\) with the MAX and BMA strategies, respectively. Similarly, for TCSS, we observe relative changes of approximately \(6\%\) and \(7\%\) with the MAX and BMA strategies, respectively. These changes are observed between versions 3.0 and 3.4. Similar observations are made for the other SSMs and for the other ontologies. We also observe that, across all measures, the overall variability is higher in CC than in BP.

Table 2. The area under the curves (AUCs) of two best performing SSMs for the different GO-BP versions with top 10% most affected PPIs.

To find a general pattern of variability among SSMs, we repeat the aforementioned process for different cutoffs (\(100\%\) to the top \(10\%\)) of affected PPIs. Here a cutoff of \(100\%\) implies that all PPIs are considered and hence, the majority of them have no change in their similarity score. The mean AUCs (of five GO versions) achieved by SSMs in increasing order of variability of PPIs are shown in Fig. 1.

SSMs with the BMA strategy are more robust than those with the MAX strategy. As variability increases, almost all SSMs with the BMA strategy either improve on or maintain their initial performance in both ontologies. In BP in particular, the improvement is smoother and more consistent. With the MAX strategy, however, the performance fluctuates considerably, and the irregularity is greater in CC. It therefore seems that the MAX strategy overestimates similarity in many cases, especially in CC.

All SSMs exhibit higher robustness in BP than in CC. Examining each SSM separately gives further insights (see Figs. 2 and 3). With all data considered (\(100\%\)), SSMs with the MAX strategy give better AUCs than those with BMA. However, as variability increases (by removing PPIs whose scores do not change over the evolution of GO), SSMs with BMA obtain higher AUCs. For TCSS, although BMA improves its performance continuously, it does not surpass the performance of MAX, particularly in BP. In fact, the difference in performance between MAX and BMA for TCSS and Resnik shrinks as variability increases, and the two strategies perform almost equally on highly variable PPIs (>50%).

RES-BMA consistently produces the highest AUCs as variability increases. In general, RES, RNS, and TCSS show comparatively high robustness. With the top \(10\%\) variable PPIs, the highest mean AUC is 0.949/0.957 (BP/CC), produced by RES-BMA, while the second-highest mean AUC is 0.922/0.940 (BP/CC), produced by TCSS with MAX or BMA.

Fig. 1. The mean AUCs of five GO versions achieved by SSMs at different cutoffs of affected PPIs.

Fig. 2. The mean AUCs achieved by Lin, Rel, RDS, and Wang at different cutoffs of affected PPIs, with each SSM plotted individually.

Fig. 3. The mean AUCs achieved by RNS, RES, TCSS, and Resnik at different cutoffs of affected PPIs, with each SSM plotted individually.

Set-discriminating power of KEGG pathways: For each GO version and SSM, we calculate the DP value of each pathway with respect to the other 10 pathways. We then take the mean DP value for each GO version. Table 3 shows the mean DP values of all 11 pathways for each GO-BP version and SSM.

Table 3. The mean DP values of all the 11 pathways for each GO-BP version and SSM. The best DP values are shown in bold.

The majority of SSMs produce quite similar DP values over the evolution of GO, since few PPIs are affected by the changes in GO. RES almost always produces the highest DP values in both ontologies, particularly with the BMA strategy. TCSS shows competitive performance in both ontologies, while Jiang achieves good DP values in BP only. Significant differences between the MAX and BMA strategies in both BP and CC are observed only with RES, TCSS, and, to some extent, RNS.

RES-BMA shows continuous and significant improvement over the evolution of GO. We can assume that a newer GO version represents more accurate and complete information than an older one, and a robust SSM should reflect that positively. RES-BMA almost continuously improves its DP value over the evolution of the BP ontology (5.59, 5.76, 6.38, 6.58, and 6.50), except for the last version (Ver. 3.4), whereas the other SSMs keep their performance nearly unchanged. In fact, the changes between the last two GO-BP versions (Ver. 3.3 to Ver. 3.4), particularly in edges, are very small (+0.30%) compared with those between other successive versions (the average successive change is +2.91%). Hence the changes in GO are reflected better by RES-BMA than by the other SSMs.

5 Conclusion

In this paper, we systematically study how similarity measures are affected by the evolution of Gene Ontology for the task of scoring confidence of PPIs. We observe that the performance of each measure is affected by the regular updates of GO. All SSMs exhibit satisfactory robustness with the BMA strategy in the BP ontology only. SSMs with the MAX strategy tend to overestimate, particularly in CC. Although RES-BMA, TCSS-BMA, and RNS-BMA exhibit comparatively good robustness, the changes in GO are reflected better by RES-BMA than by the others.