Protein remote homology detection combining PCA and multiobjective optimization tools

  • Research Paper
Evolutionary Intelligence

Abstract

Protein remote homology detection (PRHD) is an intensively researched and complex problem in computational biology. The objective of PRHD is to predict the structural and functional characteristics of un-annotated protein sequences by means of homologies. Two sequences are said to be homologous if they share a common ancestry. PRHD aims to detect, via computational techniques, the evolutionary relationships among protein sequences that are only distantly related. Predicting the functions of protein sequences is computationally intensive. The number of features extracted from protein sequences is very high, and the presence of insignificant, redundant features degrades the performance of a computational model with respect to time and cost. Hence, Principal Component Analysis (PCA) is used to transform the high-dimensional input matrix into a lower-dimensional matrix. In the first stage, 531 physico-chemical properties from the AAIndex database are considered for each protein sequence as features for homology detection. The protein sequences are taken from protein families in the UniProtKB database. Using PCA, a set of 185 representative features is extracted from the 531 features. In the second stage, NSGA-II and NSGA-III, two powerful multiobjective optimization techniques, are used to efficiently search the non-zero eigenspace and retrieve distinguishable eigenvectors. The approach is tested on the widely used UniProt and SCOP benchmark datasets. To compare the efficiency of the two algorithms, numerical simulations were performed with the controlling parameters kept the same. In these simulations, NSGA-III outperformed NSGA-II.
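The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the function names `pca_reduce` and `pareto_front` are hypothetical, and the objectives in the toy example are placeholders, since the abstract does not state which objectives NSGA-II and NSGA-III optimize. The first function reduces a 531-column feature matrix to 185 principal components; the second extracts the non-dominated (Pareto) set, the ranking idea that both NSGA variants build on.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the eigenvectors of the sample covariance matrix,
    # ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return X_centered @ components.T, components, explained[:n_components]


def pareto_front(scores):
    """Indices of non-dominated rows, assuming every objective is minimized."""
    scores = np.asarray(scores, dtype=float)
    front = []
    for i, candidate in enumerate(scores):
        # A row dominates the candidate if it is no worse in every
        # objective and strictly better in at least one.
        dominated = np.any(
            np.all(scores <= candidate, axis=1)
            & np.any(scores < candidate, axis=1)
        )
        if not dominated:
            front.append(i)
    return front


# Synthetic stand-in (assumption): 300 sequences x 531 physico-chemical features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 531))

Z, components, explained = pca_reduce(X, 185)
print(Z.shape)  # (300, 185)

# Toy two-objective scores for four hypothetical candidate solutions.
print(pareto_front([[1, 5], [2, 2], [3, 3], [5, 1]]))  # [0, 1, 3]
```

In a full NSGA-II or NSGA-III run, this non-dominated ranking would be applied repeatedly to a population of candidate eigenvector subsets, combined with crowding distance (NSGA-II) or reference-point association (NSGA-III) to preserve diversity.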

Author information

Corresponding author

Correspondence to Mukti Routray.

About this article

Cite this article

Routray, M., Vipsita, S. Protein remote homology detection combining PCA and multiobjective optimization tools. Evol. Intel. 16, 67–76 (2023). https://doi.org/10.1007/s12065-021-00642-6

  • DOI: https://doi.org/10.1007/s12065-021-00642-6
