Abstract:
Exploring the functions of proteins is crucial for explaining cellular mechanisms, treating diseases, and developing new drugs. Due to experimental limitations, large-sca...Show MoreMetadata
Abstract:
Exploring the functions of proteins is crucial for explaining cellular mechanisms, treating diseases, and developing new drugs. Due to experimental limitations, large-scale identification of protein function remains a challenging task in cell biology. Here we propose DeepFusionGo, a novel protein function prediction method that adopts a graph representation learning approach (GraphSAGE) to extract features from heterogeneous data sources. First, we generate embeddings from protein sequences using the pre-trained protein language model and InterPro domains with scaling gradient. Then we integrate these two embeddings with adaptive feature weights to the PPI graph and use GraphSAGE to generate the representation vector. Finally, we build the classification model to predict protein function based on the concatenated feature vector. The experimental results show that DeepFusionGO outperforms existing state-of-the-art methods, including sequence-based DeepGOPLUS, and PPI-based DeepGraphGO. DeepFusionGO also performs well in difficult protein function prediction. We demonstrate that selecting an appropriate protein features fusion method can improve the prediction performance, and using the PPI network and the protein representation vector obtained from the protein language model through the GraphSAGE algorithm is an effective way to mine potential functional clues. The source code and data sets are available at: https://github.com/Hhhzj-7/DeepFusionGO.
Date of Conference: 06-08 December 2022
Date Added to IEEE Xplore: 02 January 2023
ISBN Information: