PLA-GNN: Computational inference of protein subcellular location alterations under drug treatments with deep graph neural networks

https://doi.org/10.1016/j.compbiomed.2023.106775

Highlights

  • PLA-GNN is the first computational method for predicting mis-localized proteins under drug treatments.

  • We report a list of proteins that are mis-localized under TSA, bortezomib or tacrolimus treatment.

  • We find that protein mis-localization events may not be rare under drug treatments.

  • The topology of the PPI network is highly distorted under drug treatments.

Abstract

Aberrant protein sorting has been observed in many conditions, including complex diseases, drug treatments, and environmental stresses. It is important to systematically identify protein mis-localization events in a given condition. Experimental methods for finding mis-localized proteins are costly and time-consuming. Prediction of protein subcellular localization has been studied for many years. However, only a handful of existing works have considered alterations of protein subcellular location. We propose a computational method for identifying alterations of protein subcellular locations under drug treatments. We took three drugs, TSA (trichostatin A), bortezomib and tacrolimus, as examples for this study. By introducing dynamic protein-protein interaction networks, we applied graph neural network algorithms to aggregate topological information under different conditions. We systematically report potential protein mis-localization events under drug treatments. As far as we know, this is the first attempt to find protein mis-localization events computationally under drug treatment conditions. Evidence from the literature indicates that a number of proteins that are highly related to the pharmacological mechanisms of these drugs may undergo localization alterations. We name our method PLA-GNN (Protein Localization Alteration by Graph Neural Networks). It can be extended to other drugs and other conditions. All datasets and code of this study have been deposited in a GitHub repository (https://github.com/quinlanW/PLA-GNN).

Introduction

Proteins are sorted to appropriate subcellular compartments or secreted outside the cell after or during translation [1,2]. The molecular function of a protein is highly correlated with its subcellular localization [3]. Aberrant translocation of a protein may affect its normal molecular function and may involve it in an incorrect biological process [4,5]. Environmental stresses may alter protein sorting destinations [6], as part of the response of a living cell to a changing environment. Protein mis-localization events are related to complex disorders, including Alzheimer's disease [7], amyotrophic lateral sclerosis [8] and acute myeloid leukemia [9]. Interfering with the protein sorting process using pharmaceutical substances is one therapeutic approach to complex diseases [10,11], and several such attempts have been made [12].

Human protein subcellular localizations have been systematically mapped by experiments [13]. However, this mapping process is very expensive and time-consuming [14]. It is impractical to determine every mis-localization event in a given cellular state in this way. Here, a cellular state means a cell in its normal living state, a disease state, or a disease state with drug perturbations. Therefore, computational estimation is considered an alternative approach for determining protein mis-localization events [15,16,17].

Predicting protein subcellular locations in a fixed cellular state has been well studied [18,19,20,21], and many computational methods exist. These methods can predict protein subcellular location in a tissue-specific or lineage-specific manner [20,22,23,24]. They use protein sequences [18,19,25], structures [26,27] and interactions [16,28] to estimate protein subcellular locations. However, only a handful of studies have tried to predict alterations of protein subcellular locations across different cellular states [17,29,30,31]. These studies generally fall into two categories: image-based and omics-based methods.

Image-based methods take immunohistochemical images [20] or immunofluorescence images [21] as input. They use image analysis algorithms along with machine learning models to identify protein subcellular locations in different cellular states. By comparing prediction results in different cellular states, these methods can report protein mis-localization events [20,21]. Omics-based methods take protein sequences and interactions as input, and systems biology methods are then used to report mis-localization events. For example, Lee et al. integrated protein sequences, PPI (protein-protein interaction) networks and gene expression profiles to find mis-localized proteins in gliomas [31]. As another example, the PROLocalizer predictor used sequence mutations to detect protein mis-localizations in diseases [29,30].
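
As a rough illustration of the comparison step described above, the following Python sketch flags proteins whose predicted per-compartment scores shift between two cellular states. The score dictionaries, the function name `localization_changes` and the 0.3 cutoff are all hypothetical and chosen only for illustration; they are not taken from any of the cited methods.

```python
from typing import Dict, List, Tuple

# protein -> {compartment: predicted score}; a hypothetical layout
Scores = Dict[str, Dict[str, float]]


def localization_changes(
    control: Scores, treated: Scores, min_delta: float = 0.3
) -> List[Tuple[str, str, float]]:
    """Return (protein, compartment, delta) where the score shift is at least min_delta."""
    changes = []
    # Only proteins predicted in both states can be compared.
    for protein in sorted(control.keys() & treated.keys()):
        compartments = control[protein].keys() | treated[protein].keys()
        for comp in sorted(compartments):
            delta = treated[protein].get(comp, 0.0) - control[protein].get(comp, 0.0)
            if abs(delta) >= min_delta:
                changes.append((protein, comp, round(delta, 3)))
    return changes


# Toy predictions (illustrative values only).
control = {
    "P1": {"nucleus": 0.9, "cytoplasm": 0.1},
    "P2": {"nucleus": 0.2, "cytoplasm": 0.8},
}
treated = {
    "P1": {"nucleus": 0.3, "cytoplasm": 0.7},
    "P2": {"nucleus": 0.25, "cytoplasm": 0.75},
}
changes = localization_changes(control, treated)
```

In this toy case only P1 is reported, with a gain in the cytoplasm score and a matching loss in the nucleus score; P2 stays below the cutoff.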

Neither strategy can be applied as a general pipeline. Image-based methods face two challenges: the lack of fluorescence images and the limited resolution of immunohistochemical images [32]. Omics-based methods usually use the PPI network of the normal state to stand in for PPI networks in other cellular states, assuming that changes in PPIs can be ignored. This is because PPI networks in different cellular states are usually not available [16]. However, this assumption is self-contradictory. Given that PPIs are usually physical interactions, a protein whose subcellular location has changed would be less likely to interact with proteins in its original subcellular compartment, so its set of interacting proteins would likely change as well. Assuming a universal PPI network across cellular states therefore discards the most informative changes. Although gene expression data may compensate for this assumption to some extent, prediction performance is inevitably affected [16].

Li et al. proposed the DPPN-SVM method [17] in accordance with the differential network biology concept [33]. They used gene expression profiles to estimate PPI networks in different cellular states. The PPI network in a given cellular state can be estimated by adding and removing certain interactions from the normal state network. Using this strategy, DPPN-SVM identified a series of potentially mis-localized proteins in breast cancer and validated them against the literature.

Although attempts have been made to predict mis-localized proteins in diseases, as far as we know, no existing study computationally identifies mis-localized proteins under drug therapies. In this work, we propose a new computational method for predicting mis-localized proteins under drug therapies. We estimated PPI networks under drug treatments. Graph neural network models were trained to aggregate high-order topological information of PPI networks, as high-order interaction information has been reported to be more dominant in PPI networks [34,35]. We name our method PLA-GNN (Protein Localization Alterations by Graph Neural Network).
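
To illustrate what aggregating high-order topological information can mean, the sketch below applies a generic GCN-style propagation step (symmetrically normalized adjacency with self-loops) twice on a toy path graph. This is a simplified illustration without learned weights or nonlinearities, not the PLA-GNN architecture; the graph, the features and the function names are assumptions.

```python
from typing import List

Matrix = List[List[float]]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Compute D^-1/2 (A + I) D^-1/2, the normalization used in GCN-style layers."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own signal.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [
        [a_hat[i][j] / (deg[i] ** 0.5 * deg[j] ** 0.5) for j in range(n)]
        for i in range(n)
    ]


def propagate(norm_adj: Matrix, features: Matrix) -> Matrix:
    """One aggregation step: each node takes a weighted sum over itself and its neighbours."""
    n, d = len(features), len(features[0])
    return [
        [sum(norm_adj[i][k] * features[k][j] for k in range(n)) for j in range(d)]
        for i in range(n)
    ]


# Toy path graph 0 - 1 - 2 - 3 (hypothetical, not a real PPI network).
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
features = [[1.0], [0.0], [0.0], [0.0]]  # signal placed on node 0 only

norm = normalized_adjacency(adj)
one_hop = propagate(norm, features)  # information from direct partners
two_hop = propagate(norm, one_hop)   # information from partners of partners
```

After one step, the signal placed on node 0 reaches only its direct neighbour; after two steps it also reaches node 2. Stacking such steps is how a graph neural network exposes interaction information beyond direct partners.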

We took TSA (trichostatin A), bortezomib, and tacrolimus as examples in our study. TSA, an antifungal antibiotic, is a potent and specific inhibitor of histone deacetylase (HDAC) activity [36]. Bortezomib is a dipeptide boronic acid derivative and a proteasome inhibitor. It has been reported that bortezomib enhances docetaxel-induced cell death and inhibits cell migration in breast cancer [37]. Tacrolimus is a calcineurin inhibitor used to prevent transplant rejection and to treat moderate to severe atopic dermatitis [38]. Our results indicate that, when these drugs are administered, several proteins that are highly related to their pharmacological mechanisms may undergo localization alterations. This may provide useful information for pharmacological studies. Our method has the potential to become a general pipeline for predicting protein localization alterations under drug therapies.

Section snippets

PPI network

We downloaded PPI records from the BioGRID database [39]. To construct a high-quality working dataset, we screened the raw PPI records strictly according to the following steps: (1) Only interactions between two human proteins were kept. (2) All interactions between two identical proteins were excluded. (3) Duplicate and redundant records were removed. (4) Non-physical interaction records were excluded. We kept only interactions of type MI:0915 (physical association),
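
The screening steps above can be sketched in Python as follows. The record layout (keys `a`, `b`, `taxid_a`, `taxid_b`, `type`) is a hypothetical simplification rather than the actual BioGRID file format, and treating A-B and B-A as the same interaction is an assumption. Only MI:0915 is listed as a kept type because it is the only one visible in the text above.

```python
from typing import Dict, Iterable, List, Tuple

HUMAN_TAXID = 9606            # NCBI taxonomy ID for Homo sapiens
KEPT_TYPES = {"MI:0915"}      # physical association; other kept types, if any, are not shown in the text


def filter_ppi(records: Iterable[Dict]) -> List[Tuple[str, str]]:
    """Apply the four screening steps to simplified (hypothetical) PPI records."""
    seen = set()
    edges = []
    for r in records:
        # (1) keep only interactions between two human proteins
        if r["taxid_a"] != HUMAN_TAXID or r["taxid_b"] != HUMAN_TAXID:
            continue
        # (2) drop interactions between two identical proteins
        if r["a"] == r["b"]:
            continue
        # (4) drop records that are not of a kept (physical) type
        if r["type"] not in KEPT_TYPES:
            continue
        # (3) drop duplicates, assuming interactions are undirected
        key = tuple(sorted((r["a"], r["b"])))
        if key in seen:
            continue
        seen.add(key)
        edges.append(key)
    return edges


# Toy records (illustrative only); "other" is a placeholder for any non-kept type.
records = [
    {"a": "TP53", "b": "MDM2", "taxid_a": 9606, "taxid_b": 9606, "type": "MI:0915"},
    {"a": "MDM2", "b": "TP53", "taxid_a": 9606, "taxid_b": 9606, "type": "MI:0915"},
    {"a": "TP53", "b": "TP53", "taxid_a": 9606, "taxid_b": 9606, "type": "MI:0915"},
    {"a": "TP53", "b": "Trp53", "taxid_a": 9606, "taxid_b": 10090, "type": "MI:0915"},
    {"a": "BRCA1", "b": "BARD1", "taxid_a": 9606, "taxid_b": 9606, "type": "other"},
    {"a": "HDAC1", "b": "HDAC2", "taxid_a": 9606, "taxid_b": 9606, "type": "MI:0915"},
]
edges = filter_ppi(records)
```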

Network topology adjustment

The PPI network has a total of 1,376,072 interactions in the control state. When creating the dynamic PPI network, a total of 577,969,681 differential PCC (Pearson correlation coefficient) values of protein pairs were calculated for each of the three drugs. Topology adjustments were carried out according to these values. We finally obtained 2,202,772 interactions with the TSA treatment, 2,295,812 interactions with the bortezomib treatment, and 1,367,114 interactions with the tacrolimus treatment. Distributions of differential
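
A minimal sketch of how differential PCC values could drive topology adjustment is given below, reading PCC as the Pearson correlation coefficient between expression profiles. The thresholds `add_above` and `remove_below`, the toy expression profiles and the function names are hypothetical; the cutoffs actually used are not stated in this excerpt.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def adjust_network(
    edges: Iterable[Tuple[str, str]],
    control_expr: Dict[str, List[float]],
    treated_expr: Dict[str, List[float]],
    add_above: float = 1.0,      # hypothetical cutoff for adding an interaction
    remove_below: float = -1.0,  # hypothetical cutoff for removing an interaction
) -> Set[FrozenSet[str]]:
    """Add or remove edges according to the change in PCC between two states."""
    adjusted = {frozenset(e) for e in edges}
    proteins = sorted(control_expr.keys() & treated_expr.keys())
    for p, q in combinations(proteins, 2):
        diff = pearson(treated_expr[p], treated_expr[q]) - pearson(
            control_expr[p], control_expr[q]
        )
        pair = frozenset((p, q))
        if diff >= add_above:
            adjusted.add(pair)       # co-expression gained under treatment
        elif diff <= remove_below:
            adjusted.discard(pair)   # co-expression lost under treatment
    return adjusted


# Toy expression profiles over four samples (illustrative only).
control_expr = {"A": [1, 2, 3, 4], "B": [4, 3, 2, 1], "C": [1, 2, 3, 4]}
treated_expr = {"A": [1, 2, 3, 4], "B": [1, 2, 3, 4], "C": [4, 3, 2, 1]}
control_edges = [("A", "C"), ("B", "C")]
adjusted = adjust_network(control_edges, control_expr, treated_expr)
```

In this toy case the A-B interaction is added, A-C is removed and B-C is left unchanged. As an arithmetic aside, 577,969,681 equals 24,041 squared, which is the size of a full square matrix over 24,041 proteins; the protein count itself is not stated here, so this is only an observation.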

Conclusions

Computational prediction of protein subcellular localizations has been studied for over two decades. However, only a handful of studies have considered protein subcellular location alterations in different cellular states. Notably, no existing study has considered drug treatment states. We took TSA, bortezomib, and tacrolimus as examples to develop PLA-GNN, which detects protein subcellular location alterations in drug perturbation states. We integrated gene expression profiles and PPIs to create a

Author contributions

RHW collected the data, constructed the model, implemented the algorithm, performed experiments and partially wrote the manuscript. TL analyzed the results and partially wrote the manuscript. HLZ partially analyzed the results. PFD supervised the whole study, conceptualized the algorithm, analyzed the results and partially wrote the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China [NSFC 61872268].

Data availability statement

The code and data for reproducing the results of this paper are available on GitHub (https://github.com/quinlanW/PLA-GNN).

Declaration of competing interest

None declared.

References (48)

  • C. Kontaxi et al. Lysine-directed post-translational modifications of Tau protein in Alzheimer's disease and related tauopathies. Front. Mol. Biosci. (2017)
  • J.-E. Kim et al. Altered nucleocytoplasmic proteome and transcriptome distributions in an in vitro model of amyotrophic lateral sclerosis. PLoS One (2017)
  • R. Hill et al. Targeting nucleocytoplasmic transport in cancer therapy. Oncotarget (2014)
  • M.-C. Hung et al. Protein localization in disease and therapy. J. Cell Sci. (2011)
  • P.J. Thul et al. A subcellular map of the human proteome. Science (2017)
  • R. Horwitz et al. Whole cell maps chart a course for 21st-century cell biology. Science (2017)
  • T. Ideker et al. Differential network biology. Mol. Syst. Biol. (2012)
  • K. Lee et al. Proteome-wide discovery of mislocated proteins in cancer. Genome Res. (2013)
  • G.-P. Li et al. DPPN-SVM: computational identification of mis-localized proteins in cancers by integrating differential gene expressions with dynamic protein-protein interaction networks. Front. Genet. (2020)
  • P. Du et al. Predicting multisite protein subcellular locations: progress and challenges. Expert Rev. Proteomics (2013)
  • P. Du et al. Recent progress in predicting protein sub-subcellular locations. Expert Rev. Proteomics (2011)
  • Y.-Y. Xu et al. An image-based multi-label human protein subcellular localization predictor (iLocator) reveals protein mislocalizations in cancer tissues. Bioinformatics (2013)
  • L.P. Coelho et al. Determining the subcellular location of new proteins from microscope images using local features. Bioinformatics (2013)
  • K.-C. Chou et al. iLoc-Euk: a multi-label classifier for predicting the subcellular localization of singleplex and multiplex eukaryotic proteins. PLoS One (2011)