DeepHBSP: A Deep Learning Framework for Predicting Human Blood-Secretory Proteins Using Transfer Learning

  • Regular Paper
Journal of Computer Science and Technology

Abstract

The identification of blood-secretory proteins and the detection of protein biomarkers in blood have important clinical value. Existing methods for predicting blood-secretory proteins are mainly based on traditional machine learning algorithms and rely heavily on annotated protein features. Unlike traditional machine learning algorithms, deep learning algorithms can automatically learn better feature representations from raw data, and are therefore expected to be more promising for predicting blood-secretory proteins. We present a novel deep learning model, DeepHBSP, which combines transfer learning with an architecture integrating a binary classification network and a ranking network to identify blood-secretory proteins from amino acid sequence information alone. During training, the loss function of DeepHBSP applies a descriptive loss to the binary classification network and a compactness loss to the ranking network. The feature extraction subnetwork of DeepHBSP is a multi-lane capsule network. Additionally, transfer learning is used to train a highly accurate, generalizable model from a small sample of blood-secretory proteins. The main contributions of this study are as follows: 1) a novel deep learning architecture integrating a binary classification network and a ranking network, which outperforms existing traditional machine learning algorithms and other state-of-the-art deep learning architectures for biological sequence analysis; 2) a blood-secretory protein prediction model that uses only amino acid sequences, overcoming the heavy dependence of existing methods on annotated protein features; 3) the blood-secretory proteins predicted by our model show statistical significance when compared with existing blood-based cancer biomarkers.
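The abstract names the building blocks of DeepHBSP but not their exact form. As a rough illustration only, the sketch below assumes a one-hot encoding over the 20 standard amino acids, uses the capsule "squash" non-linearity from Sabour et al. (2017), and takes the descriptive loss as binary cross-entropy and the compactness loss as defined by Perera and Patel (2019) for one-class feature learning. All function names and the weighting parameter `lam` are hypothetical, not the published DeepHBSP settings; a real model would use a tensor library rather than plain Python lists.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet: 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence, max_len):
    """Encode a protein sequence as a max_len x 20 one-hot matrix (assumed encoding).

    Longer sequences are truncated, shorter ones zero-padded, and
    non-standard residues (e.g. 'X', 'U') become all-zero rows.
    """
    matrix = [[0.0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, aa in enumerate(sequence[:max_len]):
        idx = INDEX.get(aa.upper())
        if idx is not None:
            matrix[pos][idx] = 1.0
    return matrix


def squash(vector, eps=1e-9):
    """Capsule non-linearity: keeps the direction, maps the length into [0, 1)."""
    sq_norm = sum(v * v for v in vector)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm + eps)
    return [scale * v for v in vector]


def descriptive_loss(probs, labels, eps=1e-12):
    """Mean binary cross-entropy for the (hypothetical) classification branch."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def compactness_loss(features):
    """Compactness loss of Perera and Patel for a batch of n k-dim features.

    Each x_i is compared with the mean m_i of the other n - 1 features;
    the loss is sum_i ||x_i - m_i||^2 / (n * k). Requires n >= 2.
    """
    n, k = len(features), len(features[0])
    col_sums = [sum(f[d] for f in features) for d in range(k)]
    total = 0.0
    for f in features:
        for d in range(k):
            m = (col_sums[d] - f[d]) / (n - 1)  # leave-one-out mean
            total += (f[d] - m) ** 2
    return total / (n * k)


def joint_loss(probs, labels, rank_features, lam=0.1):
    """Hypothetical combined objective: descriptive loss + lam * compactness loss."""
    return descriptive_loss(probs, labels) + lam * compactness_loss(rank_features)
```

For example, `compactness_loss([[0.0], [2.0]])` returns 4.0, since each feature lies 2 units from the mean of the other, while identical features give a loss of 0. The compactness term pulls the ranking-branch features of same-class samples together, while the cross-entropy term keeps the two classes separable.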

Author information

Corresponding authors

Correspondence to Ying Li or Yan-Chun Liang.

Supplementary Information

ESM 1

(PDF 909 kb)

About this article

Cite this article

Du, W., Sun, Y., Bao, HM. et al. DeepHBSP: A Deep Learning Framework for Predicting Human Blood-Secretory Proteins Using Transfer Learning. J. Comput. Sci. Technol. 36, 234–247 (2021). https://doi.org/10.1007/s11390-021-0851-9

  • DOI: https://doi.org/10.1007/s11390-021-0851-9
