Abstract
A challenging Pattern Recognition problem in Bioinformatics concerns the detection of a functional relation between two proteins even when they show very low sequence similarity – this is the so-called Protein Remote Homology Detection (PRHD) problem. In this paper we propose a novel approach to PRHD, which casts the problem into a Multiple Instance Learning (MIL) framework, which seems very suitable for this context. Experiments on a standard benchmark show very competitive performances, also in comparison with alternative discriminative methods.
Keywords
M. Bicego and P. Lovato were partially supported by the University of Verona through the program “Bando di Ateneo per la Ricerca di Base 2015”.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsNotes
- 1.
Along the text we will refer equivalently to K-mers or N-grams.
- 2.
Available at http://noble.gs.washington.edu/proj/svm-pairwise/.
- 3.
Available at http://bioinformatics.hitsz.edu.cn/main/~binliu/remote.
- 4.
Downloadable from http://www.chibi.ubc.ca/gist/ [14].
References
Chen, J., Guo, M., Wang, X., Liu, B.: A comprehensive review and comparison of different computational methods for protein remote homology detection. Brief. Bioinf. 19, 1–14 (2016)
Chen, Y., Bi, J., Wang, J.Z.: MILES: multiple-instance learning via embedded instance selection. IEEE Trans. Pattern Anal. Mach. Intell. 28(12), 1931–1947 (2006)
Cheplygina, V., Tax, D., Loog, M.: Dissimilarity-based ensembles for multiple instance learning. IEEE Trans. Neural Netw. Learn. Syst. 27(6), 1379–1391 (2016)
Cucci, A., Lovato, P., Bicego, M.: Enriched bag of words for protein remote homology detection. In: Robles-Kelly, A., Loog, M., Biggio, B., Escolano, F., Wilson, R. (eds.) S+SSPR 2016. LNCS, vol. 10029, pp. 463–473. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49055-7_41
Dietterich, T., Lathrop, R., Lozano-Pérez, T.: Solving the multiple instance problem with axis-parallel rectangles. Artif. Intell. 89(1–2), 31–71 (1997)
Dong, Q., Lin, L., Wang, X.: Protein remote homology detection based on binary profiles. In: Hochreiter, S., Wagner, R. (eds.) BIRD 2007. LNCS, vol. 4414, pp. 212–223. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-71233-6_17
Dong, Q., Wang, X., Lin, L.: Application of latent semantic analysis to protein remote homology detection. Bioinformatics 22(3), 285–290 (2006)
Fung, G., Dundar, M., Krishnapuram, B., Rao, R.: Multiple instance learning for computer aided diagnosis. Proc. Adv. Neural Inf. Process. Syst. 19, 425–432 (2007)
Gribskov, M., Robinson, N.: Use of receiver operating characteristic (ROC) analysis to evaluate sequence matching. Comput. Chem. 20(1), 25–33 (1996)
Kittler, J., Hatef, M., Duin, R.P., Matas, J.: On combining classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 20(3), 226–239 (1998)
Kuang, R., Wang, K., Wang, K., Siddiqi, M., Freund, Y., Leslie, C.: Profile-based string kernels for remote homology detection and motif extraction. J. Bioinf. Comput. Biol. 3(03), 527–550 (2005)
Kuksa, P.P., Pavlovic, V.: Efficient evaluation of large sequence kernels. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 759–767. ACM (2012)
Leslie, C., Eskin, E., Noble, W.: The spectrum kernel: a string kernel for SVM protein classification. In: PSB, pp. 566–575 (2002)
Liao, L., Noble, W.: Combining pairwise sequence similarity and support vector machines for detecting remote protein evolutionary and structural relationships. J. Comput. Biol. 10(6), 857–868 (2003)
Liu, B., Wang, X., Lin, L., Dong, Q., Wang, X.: A discriminative method for protein remote homology detection and fold recognition combining top-n-grams and latent semantic analysis. BMC Bioinf. 9(1), 510 (2008). https://doi.org/10.1186/1471-2105-9-510
Liu, B., et al.: Combining evolutionary information extracted from frequency profiles with sequence-based kernels for protein remote homology detection. Bioinformatics 30(4), 472–479 (2014)
Lovato, P., Cristani, M., Bicego, M.: Soft Ngram representation and modeling for protein remote homology detection. IEEE/ACM Trans. Comput. Biol. Bioinf. 14(6), 1482–1488 (2017)
Lovato, P., Giorgetti, A., Bicego, M.: A multimodal approach for protein remote homology detection. IEEE/ACM Trans. Comput. Biol. Bioinf. (TCBB) 12(5), 1193–1198 (2015)
Pekalska, E., Duin, R.P.W.: The Dissimilarity Representation for Pattern Recognition: Foundations and Applications, Machine Perception and Artificial Intelligence, vol. 64. World Scientific, Singapore (2005)
Rangwala, H., Karypis, G.: Profile-based direct kernels for remote homology detection and fold recognition. Bioinformatics 21(23), 4239–4247 (2005)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Mensi, A., Bicego, M., Lovato, P., Loog, M., Tax, D.M.J. (2018). Protein Remote Homology Detection Using Dissimilarity-Based Multiple Instance Learning. In: Bai, X., Hancock, E., Ho, T., Wilson, R., Biggio, B., Robles-Kelly, A. (eds) Structural, Syntactic, and Statistical Pattern Recognition. S+SSPR 2018. Lecture Notes in Computer Science(), vol 11004. Springer, Cham. https://doi.org/10.1007/978-3-319-97785-0_12
Download citation
DOI: https://doi.org/10.1007/978-3-319-97785-0_12
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-97784-3
Online ISBN: 978-3-319-97785-0
eBook Packages: Computer ScienceComputer Science (R0)