Skip to main content
Log in

Fast H-DROP: A thirty times accelerated version of H-DROP for interactive SVM-based prediction of helical domain linkers

  • Published:
Journal of Computer-Aided Molecular Design Aims and scope Submit manuscript

Abstract

Efficient and rapid prediction of domain regions from amino acid sequence information alone is often required for swift structural and functional characterization of large multi-domain proteins. Here we introduce Fast H-DROP, a thirty times accelerated version of our previously reported H-DROP (Helical Domain linker pRediction using OPtimal features), which is unique in specifically predicting helical domain linkers (boundaries). Fast H-DROP, analogously to H-DROP, uses optimum features selected from a set of 3000 ones by combining a random forest and a stepwise feature selection protocol. We reduced the computational time from 8.5 min per sequence in H-DROP to 14 s per sequence in Fast H-DROP on an 8 Xeon processor Linux server by using SWISS-PROT instead of Genbank non-redundant (nr) database for generating the PSSMs. The sensitivity and precision of Fast H-DROP assessed by cross-validation were 33.7 and 36.2%, which were merely ~2% lower than that of H-DROP. The reduced computational time of Fast H-DROP, without affecting prediction performances, makes it more interactive and user-friendly. Fast H-DROP and H-DROP are freely available from http://domserv.lab.tuat.ac.jp/.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Similar content being viewed by others

References

  1. Han JH, Batey S, Nickson AA, Teichmann SA, Clarke J (2007) The folding and evolution of multidomain proteins. Nat Rev Mol Cell Biol 8(4):319–330

    Article  CAS  Google Scholar 

  2. Itoh K, Sasai M (2008) Cooperativity, connectivity, and folding pathways of multidomain proteins. Proc Natl Acad Sci USA 105(37):13865–13870

    Article  CAS  Google Scholar 

  3. Jacobs SA, Podell ER, Wuttke DS, Cech TR (2005) Soluble domains of telomerase reverse transcriptase identified by high-throughput screening. Protein Sci 14(8):2051–2058

    Article  CAS  Google Scholar 

  4. Jawhari A, Boussert S, Lamour V, Atkinson RA, Kieffer B, Poch O, Potier N, van Dorsselaer A, Moras D, Poterszman A (2004) Domain architecture of the p62 subunit from the human transcription/repair factor TFIIH deduced by limited proteolysis and mass spectrometry analysis. Biochemistry 43(45):14420–14430

    Article  CAS  Google Scholar 

  5. Song AX, Chang YG, Gao YG, Lin XJ, Shi YH, Lin DH, Hang QH, Hu HY (2005) Identification, expression, and purification of a unique stable domain from human HSPC144 protein. Protein Expr Purif 42(1):146–152

    Article  CAS  Google Scholar 

  6. Both D, Steiner EM, Stadler D, Lindqvist Y, Schnell R, Schneider G (2013) Structure of LdtMt2, an L, D-transpeptidase from Mycobacterium tuberculosis. Acta Crystallogr Sect D 69(Pt 3):432–441

    Article  Google Scholar 

  7. Hasegawa J, Tokuda E, Tenno T, Tsujita K, Sawai H, Hiroaki H, Takenawa T, Itoh T (2011) SH3YL1 regulates dorsal ruffle formation by a novel phosphoinositide-binding domain. J Cell Biol 193(5):901–916

    Article  CAS  Google Scholar 

  8. Chikayama E, Kurotani A, Tanaka T, Yabuki T, Miyazaki S, Yokoyama S, Kuroda Y (2010) Mathematical model for empirically optimizing large scale production of soluble protein domains. BMC Bioinform 11(1):1–9

    Article  Google Scholar 

  9. Hondoh T, Kato A, Yokoyama S, Kuroda Y (2006) Computer-aided NMR assay for detecting natively folded structural domains. Protein Sci 15(4):871–883

    Article  CAS  Google Scholar 

  10. Bondugula R, Lee MS, Wallqvist A (2009) FIEFDom: a transparent domain boundary recognition system using a fuzzy mean operator. Nucleic Acids Res 37(2):452–462

    Article  CAS  Google Scholar 

  11. Dumontier M, Yao R, Feldman HJ, Hogue CW (2005) Armadillo: domain boundary prediction by amino acid composition. J Mol Biol 350(5):1061–1073

    Article  CAS  Google Scholar 

  12. Ebina T, Toh H, Kuroda Y (2009) Loop-length-dependent SVM prediction of domain linkers for high-throughput structural proteomics. Biopolymers 92(1):1–8

    Article  CAS  Google Scholar 

  13. Ebina T, Toh H, Kuroda Y (2011) DROP: an SVM domain linker predictor trained with optimal features selected by random forest. Bioinformatics 27(4):487–494

    Article  CAS  Google Scholar 

  14. Eickholt J, Deng X, Cheng J (2011) DoBo: Protein domain boundary prediction by integrating evolutionary signals and machine learning. BMC Bioinform 12(1):1–8

    Article  Google Scholar 

  15. Miyazaki S, Kuroda Y, Yokoyama S (2002) Characterization and prediction of linker sequences of multi-domain proteins by a neural network. J Struct Func Genom 2(1):37–51

    Article  CAS  Google Scholar 

  16. Miyazaki S, Kuroda Y, Yokoyama S (2006) Identification of putative domain linkers by a neural network – application to a large sequence database. BMC Bioinform 7(1):1–9

    Article  Google Scholar 

  17. Sim J, Kim SY, Lee J (2005) PPRODO: prediction of protein domain boundaries using neural networks. Proteins 59(3):627–632

    Article  CAS  Google Scholar 

  18. Suyama M, Ohara O (2003) DomCut: prediction of inter-domain linker regions in amino acid sequences. Bioinformatics 19(5):673–674

    Article  CAS  Google Scholar 

  19. Tanaka T, Kuroda Y, Yokoyama S (2003) Characteristics and prediction of domain linker sequences in multi-domain proteins. J Struct Func Genom 4(2–3):79–85

    Article  CAS  Google Scholar 

  20. Xue Z, Xu D, Wang Y, Zhang Y (2013) ThreaDom: extracting protein domain boundary information from multiple threading alignments. Bioinformatics 29(13):i247–i256

    Article  CAS  Google Scholar 

  21. Tanaka T, Yokoyama S, Kuroda Y (2006) Improvement of domain linker prediction by incorporating loop-length-dependent characteristics. Biopolymers 84:161–168

    Article  CAS  Google Scholar 

  22. Reddy Chichili VP, Kumar V, Sivaraman J (2013) Linkers in the structural biology of protein–protein interactions. Protein Sci 22(2):153–167

    Article  CAS  Google Scholar 

  23. George RA, Heringa J (2002) An analysis of protein domain linkers: their classification and role in protein folding. Protein Eng 15(11):871–879

    Article  CAS  Google Scholar 

  24. Gokhale RS, Khosla C (2000) Role of linkers in communication between protein modules. Curr Opin Chem Biol 4(1):22–27

    Article  CAS  Google Scholar 

  25. Zaki N (2009) Protein–protein interaction prediction using homology and inter-domain linker region information. In: Ao S-I, Gelman L (eds) Advances in Electrical Engineering and Computational Science. Springer Netherlands, Dordrecht, pp 635–645

    Chapter  Google Scholar 

  26. Argos P (1990) An investigation of oligopeptides linking domains in protein tertiary structures and possible candidates for general gene fusion. J Mol Biol 211(4):943–958

    Article  CAS  Google Scholar 

  27. Zhu X, Zhao X, Burkholder WF, Gragerov A, Ogata CM, Gottesman ME, Hendrickson WA (1996) Structural analysis of substrate binding by the molecular chaperone DnaK. Science 272(5268):1606–1614

    Article  CAS  Google Scholar 

  28. Ebina T, Suzuki R, Tsuji R, Kuroda Y (2014) H-DROP: an SVM based helical domain linker predictor trained with features optimized by combining random forest and stepwise selection. J Comput-Aided Mol Des 28(8):831–839

    Article  CAS  Google Scholar 

  29. Ebina T, Umezawa Y, Kuroda Y (2013) IS-Dom: a dataset of independent structural domains automatically delineated from protein structures. J Comput-Aided Mol Des 27(5):419–426

    Article  CAS  Google Scholar 

  30. Xu Y, Xu D, Gabow HN (2000) Protein domain decomposition using a graph-theoretic approach. Bioinformatics 16(12):1091–1104

    Article  CAS  Google Scholar 

  31. Hirose S, Shimizu K, Kanai S, Kuroda Y, Noguchi T (2007) POODLE-L: a two-level SVM prediction system for reliably predicting long disordered regions. Bioinformatics 23(16):2046–2053

    Article  CAS  Google Scholar 

  32. Liaw A, Wiener M (2002) Classification and Regression by randomForest. R News 2 (3):18–22.

    Google Scholar 

  33. Jones DT (1999) Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol 292(2):195–202

    Article  CAS  Google Scholar 

  34. Ahmad S, Sarai A (2005) PSSM-based prediction of DNA binding sites in proteins. BMC Bioinform 6:33

    Article  Google Scholar 

  35. Mishra NK, Chang J, Zhao PX (2014) Prediction of membrane transport proteins and their substrate specificities using primary sequence information. PloS ONE 9 (6):e100278.

    Article  Google Scholar 

  36. Suzek BE, Huang H, McGarvey P, Mazumder R, Wu CH (2007) UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics 23(10):1282–1288

    Article  CAS  Google Scholar 

  37. El-Manzalawy Y, Abbas M, Malluhi Q, Honavar V (2016) FastRNABindR: Fast and Accurate Prediction of Protein–RNA Interface Residues. PloS ONE 11 (7):e0158445.

    Article  Google Scholar 

  38. Bairoch A, Apweiler R (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res 28(1):45–48

    Article  CAS  Google Scholar 

  39. Garg A, Raghava GP (2008) ESLpred2: improved method for predicting subcellular localization of eukaryotic proteins. BMC Bioinform 9:503

    Article  Google Scholar 

  40. Miao Z, Westhof E (2015) Prediction of nucleic acid binding probability in proteins: a neighboring residue network based score. Nucleic Acids Res 43(11):5340–5351

    Article  CAS  Google Scholar 

  41. Schnoes AM, Brown SD, Dodevski I, Babbitt PC (2009) Annotation error in public databases: misannotation of molecular function in enzyme superfamilies. PLoS Comput Biol 5(12):e1000605

    Article  Google Scholar 

  42. Sawa J, Malet H, Krojer T, Canellas F, Ehrmann M, Clausen T (2011) Molecular adaptation of the DegQ protease to exert protein quality control in the bacterial cell envelope. J Biol Chem 286(35):30680–30690

    Article  CAS  Google Scholar 

  43. Kabsch W, Sander C (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22(12):2577–2637

    Article  CAS  Google Scholar 

Download references

Acknowledgements

This work was supported by the Japan Society for the Promotion of Science (JSPS) postdoctoral fellowship to TR.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yutaka Kuroda.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 640 KB)

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Richa, T., Ide, S., Suzuki, R. et al. Fast H-DROP: A thirty times accelerated version of H-DROP for interactive SVM-based prediction of helical domain linkers. J Comput Aided Mol Des 31, 237–244 (2017). https://doi.org/10.1007/s10822-016-9999-8

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10822-016-9999-8

Keywords

Navigation