Abstract
More than one-third of the proteins in the Protein Data Bank contain metal ions. Correctly identifying metal ion-binding residues is important for understanding protein function and designing novel drugs. Because metal ions are small and highly versatile, computationally predicting their binding sites from protein sequence remains challenging. Existing sequence-based methods suffer from low accuracy because they lack structural information, and they are time-consuming because they rely on multiple sequence alignment. Here, we propose LMetalSite, an alignment-free sequence-based predictor for binding sites of the four most frequently observed metal ions (Zn2+, Ca2+, Mg2+ and Mn2+). LMetalSite leverages a pretrained language model to rapidly generate informative sequence representations and employs a transformer to capture long-range dependencies. Multi-task learning is adopted to compensate for the scarcity of training data and to capture the intrinsic similarities between different metal ions. LMetalSite was shown to surpass state-of-the-art structure-based methods by more than 19.7%, 14.4%, 36.8%, and 12.6% in AUPR on the four independent tests, respectively. Further analyses indicated that the self-attention modules are effective at learning the structural contexts of residues from protein sequence.
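The results above are reported in AUPR, the area under the precision-recall curve, which suits binding-site prediction because binding residues are a small minority of all residues. As a minimal sketch of how such a score can be computed from per-residue labels and predicted probabilities, the function below uses the step-wise average-precision estimator. This estimator, the function name `average_precision`, and the toy labels and scores are assumptions for illustration; the abstract does not state how AUPR was computed.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    labels: 1 for a binding residue, 0 otherwise (hypothetical input format).
    scores: predicted binding probabilities, one per residue.
    Assumes scores contain no ties; tied scores are ranked in input order.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank residues from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    true_pos = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_pos += 1
            # Precision at this rank, weighted by the recall step 1 / n_pos.
            ap += (true_pos / rank) / n_pos
    return ap


# Toy example: 8 residues, 2 of which bind a metal ion (made-up values).
toy_labels = [0, 1, 0, 0, 1, 0, 0, 0]
toy_scores = [0.10, 0.90, 0.20, 0.80, 0.70, 0.05, 0.30, 0.15]
toy_ap = average_precision(toy_labels, toy_scores)
```

In the toy example the two binding residues are ranked first and third, so the score is (1/1 + 2/3) / 2, or about 0.83. A predictor that ranks every binding residue above every non-binding residue scores 1.0.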
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Author biographies:
Qianmu Yuan is a PhD student in the School of Computer Science and Engineering at Sun Yat-sen University. His research interests lie in deep learning, graph neural networks, protein structure prediction, and protein function prediction.
Sheng Chen is a PhD student in the School of Computer Science and Engineering at Sun Yat-sen University. His research interests include deep learning, protein design, protein structure prediction, and graph neural networks.
Yu Wang is a research professor at Peng Cheng National Laboratory in Shenzhen. His research interests include AI for systems biology, particularly foundation models in biomedical research.
Huiying Zhao is an associate research fellow in the Sun Yat-sen Memorial Hospital at Sun Yat-sen University. Her research interests include pathogenic gene analysis, protein function, and RNA function prediction.
Yuedong Yang is a professor in the School of Computer Science and Engineering at Sun Yat-sen University. He currently focuses on developing AI algorithms and the HPC platform for biomedicine.