PLA-GNN: Computational inference of protein subcellular location alterations under drug treatments with deep graph neural networks
Introduction
Proteins are sorted to appropriate subcellular compartments or secreted outside the cell after or along with the translation process [1,2]. The molecular function of a protein is highly correlated with its subcellular localization [3]. The aberrant translocation of a protein may affect its normal molecular function and may involve it in an incorrect biological process [4,5]. Environmental stresses may alter protein sorting destinations [6], which is a response of a living cell to a changing environment. Protein mis-localization events are related to complex disorders, including Alzheimer's disease [7], amyotrophic lateral sclerosis [8] and acute myeloid leukemia [9]. Interfering with the protein sorting process using pharmaceutical substances is one therapeutic strategy for complex diseases [10,11], and several such attempts have been reported [12].
Human protein subcellular localizations have been systematically mapped by experiments [13]. However, this mapping process is expensive and time-consuming [14], so it is impractical to determine every mis-localization event in a given cellular state in this way. Here, a cellular state means a cell in its normal living state, a disease state, or a disease state with drug perturbations. Therefore, computational estimations are considered as alternative approaches to determine protein mis-localization events [15,16,17].
Predicting protein subcellular locations in a fixed cellular state has been well studied [18,19,20,21]. Many computational methods exist for this task, and some can predict protein subcellular locations in a tissue-specific or lineage-specific manner [20,22,23,24]. These approaches use protein sequences [18,19,25], structures [26,27] and interactions [16,28] to estimate protein subcellular locations. However, only a handful of studies have tried to predict alterations of protein subcellular locations across different cellular states [17,29,30,31]. These studies generally fall into two categories: image-based and omics-based methods.
Image-based methods take immunohistochemical images [20] or immunofluorescence images [21] as input. They use image analysis algorithms along with machine learning models to identify protein subcellular locations in different cellular states. By comparing prediction results across cellular states, these methods can report protein mis-localization events [20,21]. Omics-based methods take protein sequences and interactions as input and use systems biology methods to report mis-localization events. For example, Lee et al. integrated protein sequences, PPI (protein-protein interaction) networks and gene expression profiles to find mis-localized proteins in gliomas [31]. As another example, the PROLocalizer predictor used sequence mutations to detect protein mis-localizations in diseases [29,30].
Neither strategy can be applied as a common pipeline. Image-based methods face two challenges: the lack of fluorescence images and the limited resolution of immunohistochemical images [32]. Omics-based methods usually use the PPI network of the normal state to mimic PPI networks in other cellular states, assuming that changes in PPIs can be ignored, because PPI networks in different cellular states are usually not available [16]. However, this assumption is self-contradictory. Given that PPIs are usually physical interactions, a protein whose subcellular location has changed would be less likely to interact with proteins in its original subcellular compartment, so its interacting proteins would change as well. Assuming a universal PPI network across cellular states therefore discards the most informative changes. Although gene expression data may compensate for this assumption to some extent, prediction performance is inevitably affected [16].
Li et al. proposed the DPPN-SVM method [17] in accordance with the differential network biology concept [33]. They used gene expression profiles to estimate PPI networks in different cellular states: the PPI network in a given cellular state is estimated by adding certain interactions to, and removing certain interactions from, the normal-state network. Using this strategy, DPPN-SVM identified a series of potentially mis-localized proteins in breast cancer and validated them against the literature.
Although attempts have been made to predict mis-localized proteins in diseases, as far as we know, no existing study can computationally identify mis-localized proteins under drug therapies. In this work, we propose a new computational method for this purpose. We estimated PPI networks under drug treatments. Graph neural network models were trained to aggregate high-order topological information of PPI networks, as it has been reported that high-order interaction information is more dominant in PPI networks [34,35]. We name our method PLA-GNN (Protein Localization Alterations by Graph Neural Network).
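To illustrate what aggregating high-order topological information can mean, the following minimal sketch applies repeated mean aggregation on a toy graph. It is not the PLA-GNN architecture: the function names (`mean_aggregate`, `propagate`), the mean aggregator, and the three-node graph are all assumptions made for illustration, and no trainable weights are involved.

```python
def mean_aggregate(adjacency, features):
    """One round of message passing: every node averages its own feature
    vector with the feature vectors of its direct neighbors."""
    updated = {}
    for node, neighbors in adjacency.items():
        group = [node] + sorted(neighbors)
        dim = len(features[node])
        updated[node] = [
            sum(features[member][i] for member in group) / len(group)
            for i in range(dim)
        ]
    return updated


def propagate(adjacency, features, rounds):
    """Apply mean aggregation repeatedly; after k rounds a node has
    received information from nodes up to k hops away."""
    for _ in range(rounds):
        features = mean_aggregate(adjacency, features)
    return features


# Hypothetical toy path graph A - B - C with one-hot node features.
adjacency = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
features = {
    "A": [1.0, 0.0, 0.0],
    "B": [0.0, 1.0, 0.0],
    "C": [0.0, 0.0, 1.0],
}

one_hop = propagate(adjacency, features, rounds=1)
two_hop = propagate(adjacency, features, rounds=2)
```

After one round, node A carries no signal from node C, which is two hops away; after two rounds it does. Stacking aggregation layers is how a graph neural network can reach beyond direct interaction partners.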
We took TSA (trichostatin A), bortezomib, and tacrolimus as examples in our study. TSA, an antifungal antibiotic, is a potent and specific inhibitor of histone deacetylase (HDAC) activity [36]. Bortezomib is a dipeptide boronic acid derivative and a proteasome inhibitor. It has been reported that bortezomib enhances docetaxel-induced cell death and inhibits cell migration in breast cancer [37]. Tacrolimus is a calcineurin inhibitor used to prevent rejection in transplants and to treat moderate to severe atopic dermatitis [38]. Our results indicate that, when these drugs are administered, several proteins that are highly related to their pharmacological mechanisms may undergo localization alterations. This may provide useful information for pharmacological studies. Our method has the potential to become a common pipeline for predicting protein localization alterations in drug therapies.
PPI network
We downloaded PPI records from the BioGRID database [39]. To construct a high-quality working dataset, we screened the raw PPI records strictly according to the following steps: (1) Only interactions between two human proteins were kept. (2) All interactions between two identical proteins were excluded. (3) Duplicate and redundant records were removed. (4) Non-physical interaction records were excluded. We kept only interactions of type MI:0915 (physical association),
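The four screening steps can be sketched as a single pass over the records. This is an illustrative sketch only: the record field names (`taxid_a`, `protein_a`, `interaction_type`, and so on) are hypothetical and do not correspond to BioGRID column names, reversed pairs are assumed to count as duplicates, and the set of accepted interaction types is left as a parameter because the source text breaks off after MI:0915.

```python
HUMAN_TAXID = 9606  # NCBI taxonomy identifier for Homo sapiens


def filter_ppi_records(records, allowed_types=frozenset({"MI:0915"})):
    """Return unique, undirected protein pairs that pass all four screens."""
    seen = set()
    kept = []
    for rec in records:
        # (1) keep only interactions between two human proteins
        if rec["taxid_a"] != HUMAN_TAXID or rec["taxid_b"] != HUMAN_TAXID:
            continue
        # (2) drop interactions between two identical proteins
        if rec["protein_a"] == rec["protein_b"]:
            continue
        # (4) drop records whose interaction type is not in the allowed set
        if rec["interaction_type"] not in allowed_types:
            continue
        # (3) drop duplicates; A-B and B-A are assumed to be the same record
        pair = tuple(sorted((rec["protein_a"], rec["protein_b"])))
        if pair in seen:
            continue
        seen.add(pair)
        kept.append(pair)
    return kept


def make_record(a, b, taxid_a, taxid_b, interaction_type):
    """Build a hypothetical record for the toy example below."""
    return {
        "protein_a": a,
        "protein_b": b,
        "taxid_a": taxid_a,
        "taxid_b": taxid_b,
        "interaction_type": interaction_type,
    }


raw_records = [
    make_record("P1", "P2", 9606, 9606, "MI:0915"),   # kept
    make_record("P2", "P1", 9606, 9606, "MI:0915"),   # reversed duplicate
    make_record("P1", "P1", 9606, 9606, "MI:0915"),   # self-interaction
    make_record("P1", "P5", 9606, 10090, "MI:0915"),  # non-human partner
    make_record("P3", "P4", 9606, 9606, "MI:0000"),   # type not in allowed set
    make_record("P4", "P3", 9606, 9606, "MI:0915"),   # kept
]

filtered = filter_ppi_records(raw_records)
```

Storing each pair as a sorted tuple makes the network undirected by construction, so the duplicate check and any later edge lookup use the same key.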
Network topology adjustment
The PPI network has a total of 1,376,072 interactions in the control state. When creating the dynamic PPI network, a total of 577,969,681 differential Pearson correlation coefficient (PCC) values of protein pairs were calculated for each of the three drugs. Topology adjustments were carried out according to these values. We finally obtained 2,202,772 interactions with the TSA treatment, 2,295,812 interactions with the bortezomib treatment, and 1,367,114 interactions with the tacrolimus treatment. Distributions of differential
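A topology adjustment driven by differential PCC values can be sketched as follows. The sketch rests on several assumptions that the text above does not specify: the differential PCC is taken as the treated-state PCC minus the control-state PCC, an edge is added when this difference reaches `add_cutoff` and removed when it falls to `remove_cutoff`, and both cutoffs as well as the expression profiles are hypothetical. As a side note, 577,969,681 equals 24,041 squared, which would be consistent with one value per ordered protein pair over 24,041 proteins, although the protein count is not stated here; the sketch visits each unordered pair once.

```python
from math import sqrt

# Arithmetic check on the reported number of differential PCC values.
N_PROTEINS_IMPLIED = 24_041  # inferred from the count, not stated in the text
N_DIFFERENTIAL_VALUES = N_PROTEINS_IMPLIED ** 2


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: PCC is undefined, treated as 0 here
    return cov / sqrt(var_x * var_y)


def adjust_topology(edges, control_expr, treated_expr, add_cutoff, remove_cutoff):
    """Add or remove edges according to the change in co-expression."""
    proteins = sorted(control_expr)
    adjusted = set(edges)
    for i, p in enumerate(proteins):
        for q in proteins[i + 1:]:
            diff = (pearson(treated_expr[p], treated_expr[q])
                    - pearson(control_expr[p], control_expr[q]))
            if diff >= add_cutoff:
                adjusted.add((p, q))      # co-expression gained under treatment
            elif diff <= remove_cutoff:
                adjusted.discard((p, q))  # co-expression lost under treatment
    return adjusted


# Hypothetical expression profiles (four samples per state).
control_expr = {"P1": [1, 2, 3, 4], "P2": [4, 3, 2, 1], "P3": [1, 2, 3, 4]}
treated_expr = {"P1": [1, 2, 3, 4], "P2": [1, 2, 3, 4], "P3": [4, 3, 2, 1]}
control_edges = {("P1", "P3"), ("P2", "P3")}

treated_edges = adjust_topology(
    control_edges, control_expr, treated_expr, add_cutoff=1.0, remove_cutoff=-1.0
)
```

In this toy case the P1-P2 edge is added, the P1-P3 edge is removed, and the P2-P3 edge is left unchanged. An adjustment of this kind can either grow or shrink the network, which matches the interaction counts reported above for the three drugs.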
Conclusions
Computational prediction of protein subcellular localizations has been studied for over two decades. However, only a handful of studies have considered protein subcellular location alterations in different cellular states. Notably, no existing study has considered drug treatment states. We took TSA, bortezomib, and tacrolimus as examples to develop PLA-GNN, which detects protein subcellular location alterations in drug perturbation states. We integrated gene expression profiles and PPIs to create a
Author contributions
RHW collected the data, constructed the model, implemented the algorithm, performed experiments and partially wrote the manuscript. TL analyzed the results and partially wrote the manuscript. HLZ partially analyzed the results. PFD supervised the whole study, conceptualized the algorithm, analyzed the results and partially wrote the manuscript.
Funding
This work was supported by the National Natural Science Foundation of China [NSFC 61872268].
Data availability statement
The code and data for reproducing the results of this paper are available on GitHub (https://github.com/quinlanW/PLA-GNN).
Declaration of competing interest
None declared.
References (48)
- et al. Co-translational targeting and translocation of proteins to the endoplasmic reticulum. Biochim. Biophys. Acta (2013)
- et al. Protein sorting gone wrong - VPS10P domain receptors in cardiovascular and metabolic diseases. Atherosclerosis (2016)
- et al. Dysfunctional diversity of p53 proteins in adult acute myeloid leukemia: projections on diagnostic workup and therapy. Blood (2017)
- et al. Protein mislocalization: mechanisms, functions and clinical applications in cancer. Biochim. Biophys. Acta Rev. Canc. (2014)
- et al. Recent progress in protein subcellular location prediction. Anal. Biochem. (2007)
- et al. Targeting mitochondria to overcome conventional and bortezomib/proteasome inhibitor PS-341 resistance in multiple myeloma (MM) cells. Blood (2004)
- et al. Posttranslational protein translocation across the membrane of the endoplasmic reticulum. Biol. Chem. (1999)
- et al. Coordinated protein sorting, targeting and distribution in polarized cells. Nat. Rev. Mol. Cell Biol. (2008)
- et al. Protein sorting at the trans-Golgi network. Annu. Rev. Cell Dev. Biol. (2014)
- et al. Molecular chaperones and stress-inducible protein-sorting factors coordinate the spatiotemporal distribution of protein aggregates. Mol. Biol. Cell (2012)