Abstract
This paper addresses the problem of intelligent recognition of human herpes viruses based on the analysis of their protein sequences. To compare proteins, we use a new dissimilarity measure based on finding an optimal sequence alignment. In previous work, we proved that this way of comparing sequences yields a measure with the properties of a metric. These properties make the proposed measure more convenient and effective to use in further analysis than traditional similarity measures such as the Needleman-Wunsch alignment score. The herpes virus recognition results show that the metric properties improve classification quality. In addition, we present an updated computational scheme for the proposed metric that speeds up protein comparison.
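The abstract contrasts the proposed metric with the traditional Needleman-Wunsch similarity score; the authors' metric itself is not reproduced here. As a minimal sketch under stated assumptions, the Python code below computes a Needleman-Wunsch global alignment score with illustrative match/mismatch/gap values (not a protein substitution matrix such as PAM), converts it into a distance with the standard kernel-induced formula sqrt(s(a,a) + s(b,b) - 2 s(a,b)), and checks the triangle inequality explicitly, because raw alignment scores are not in general positive semidefinite, so the induced distance is not guaranteed to be a metric. The function names, scoring values, labels, and toy sequences are hypothetical.

```python
import itertools
import math


def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions; protein analysis normally
    uses a substitution matrix (e.g. PAM or BLOSUM) instead of match/mismatch.
    """
    n, m = len(a), len(b)
    # prev[j] holds the best score for aligning the previous prefix of a with b[:j]
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [i * gap] + [0] * m
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[m]


def induced_distance(a: str, b: str) -> float:
    """Distance induced by a similarity s: sqrt(s(a,a) + s(b,b) - 2 s(a,b)).

    This is a metric only when s is a positive semidefinite kernel. Raw
    alignment scores are not, so the value is clipped at zero and the
    triangle inequality has to be checked rather than assumed.
    """
    val = nw_score(a, a) + nw_score(b, b) - 2 * nw_score(a, b)
    return math.sqrt(max(0.0, val))


def triangle_violations(seqs):
    """Return all ordered triples (x, y, z) with d(x, z) > d(x, y) + d(y, z)."""
    bad = []
    for x, y, z in itertools.permutations(seqs, 3):
        if induced_distance(x, z) > induced_distance(x, y) + induced_distance(y, z) + 1e-9:
            bad.append((x, y, z))
    return bad


def nearest_neighbour_label(query, labelled):
    """1-NN recognition: return the label of the closest (sequence, label) pair."""
    return min(labelled, key=lambda item: induced_distance(query, item[0]))[1]


if __name__ == "__main__":
    # Toy sequences made up for illustration; not real herpes virus proteins.
    toy = ["MKTAYIAK", "MKTAHIAK", "MKAYIK", "GKTAYIAKQ"]
    print(nw_score("MKTAYIAK", "MKTAHIAK"))
    print(induced_distance("MKTAYIAK", "MKTAHIAK"))
    print(triangle_violations(toy))
    print(nearest_neighbour_label("MKTAYIAK", [("MKTAHIAK", "A"), ("GGGGGGGG", "B")]))
```

For the two eight-residue toy sequences that differ in one position, the score is 6 and the induced distance is 2.0. The nearest-neighbour rule is only one simple way to use a dissimilarity for recognition; the abstract does not specify which classifier the authors use.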
References
Huleihel, M., Shufan, E., Zeiri, L., Salman, A.: Detection of vero cells infected with Herpes simplex types 1 and 2 and Varicella Zoster viruses using Raman spectroscopy and advanced statistical methods. PLoS ONE 11(4), e0153599 (2016). https://doi.org/10.1371/journal.pone.0153599
McGeoch, D.J., Rixon, F.J., Davison, A.J.: Topics in herpesvirus genomics and evolution. Virus Res. 117, 90–104 (2006). https://doi.org/10.1016/j.virusres.2006.01.002
Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48(3), 443–453 (1970). https://doi.org/10.1016/0022-2836(70)90057-4
Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. J. Mol. Biol. 147(1), 195–197 (1981). https://doi.org/10.1016/0022-2836(81)90087-5
Zhang, Z., Schwartz, S., Wagner, L., Miller, W.: A greedy algorithm for aligning DNA sequences. J. Comput. Biol. 7(1–2), 203–214 (2000). https://doi.org/10.1089/10665270050081478
Durbin, R., Eddy, S.R., Krogh, A., Mitchison, G.: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, p. 356. Cambridge University Press, Cambridge (1998)
Vapnik, V.N.: Statistical Learning Theory, p. 768. Wiley, Hoboken (1998)
Schölkopf, B., Tsuda, K., Vert, J.-P.: Kernel Methods in Computational Biology, p. 410. MIT Press, Cambridge (2004)
Aizerman, M.A., et al.: Potential Functions Method in Machine Learning Theory, p. 384. Nauka, Moscow (1970). (in Russian)
Sulimova, V.V.: Kernel functions for analysis of signals and symbolic sequences of different length, p. 122. Ph.D. thesis, Tula (2009). (in Russian)
Miklós, I., Novak, A., Satija, R., Lyngso, R., Hein, J.: Stochastic models of sequence evolution including insertion-deletion events. Stat. Methods Med. Res. 18(5), 453–485 (2009). https://doi.org/10.1177/0962280208099500
Seeger, M.: Covariance kernels from Bayesian generative models. In: Dietterich, T.G., Becker, S., Ghahramani, Z. (eds.) Advances in Neural Information Processing Systems, vol. 14, pp. 905–912. MIT Press (2002)
Abramov, V.I., Seredin, O.S., Mottl, V.V.: Pattern recognition training by support object method in Euclidean metric spaces with affine operations. In: Proceedings of Tula State University. Natural Sciences Series, vol. 2, no. 1, pp. 119–136. TSU, Tula (2013). (in Russian)
Pekalska, E.M.: Dissimilarity representations in pattern recognition. Concepts, Theory and Applications. Ph.D. thesis, p. 344 (2005). ISBN 90-9019021-X
Seredin, O.S., Mottl, V.V.: Support object method for pattern recognition training in arbitrary metric spaces. In: Proceedings of Tula State University. Natural Sciences Series, vol. 4, pp. 178–196. TSU, Tula (2015). (in Russian)
Braverman, E.M.: Experiments on training a machine for pattern recognition. Ph.D. thesis. Moscow (1961). (in Russian)
Xing, E.P., Ng, A.Y., Jordan, M.I., Russell, S.: Distance metric learning with application to clustering with side-information. In: Becker, S., Thrun, S., Obermayer, K. (eds.) Advances in Neural Information Processing Systems, vol. 15, pp. 521–528. MIT Press (2003)
Bellet, A., Habrard, A., Sebban, M.: A survey on metric learning for feature vectors and structured data. CoRR (2013). http://arxiv.org/abs/1306.6709
Wang, J., Sun, K., Sha, F., Marchand-Maillet, S., Kalousis, A.: Two-stage metric learning. In: Proceedings of the 31st International Conference on Machine Learning, Cycle 2, vol. 32, pp. 370–378 (2014)
Schultz, M., Joachims, T.: Learning a distance metric from relative comparisons. In: Thrun, S., Saul, L.K., Schölkopf, B. (eds.) Advances in Neural Information Processing Systems, vol. 16, pp. 41–48. MIT Press (2004)
Wang, J., Do, H., Woznica, A., Kalousis, A.: Metric learning with multiple kernels. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 24, pp. 1–9. Curran Associates, Inc. (2011)
Cao, M., Zhang, H., Park, J., Daniels, N.M., Crovella, M.E., et al.: Going the distance for protein function prediction: a new distance metric for protein interaction networks. PLoS ONE 8(10), e76339 (2013). https://doi.org/10.1371/journal.pone.0076339
Rogen, P., Fain, B.: Automatic classification of protein structure by using Gauss integrals. Proc. Natl. Acad. Sci. USA 100(1), 119–124 (2002). https://doi.org/10.1073/pnas.2636460100
Dayhoff, M., Schwartz, R., Orcutt, B.: A model of evolutionary change in proteins. Atlas of Protein Sequence and Structure 5(3), 345–352 (1978)
Mottl, V.V.: Metric spaces admitting linear operations and inner product. Doklady Math. 67(1), 140–143 (2003)
Sulimova, V., Seredin, O., Mottl, V.: Metrics on the basis of optimal alignment of biomolecular sequences. JMLDA 2(3), 286–304 (2016). https://doi.org/10.21469/22233792.2.3.03
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic local alignment search tool. J. Mol. Biol. 215(3), 403–410 (1990). https://doi.org/10.1006/jmbi.1990.9999
Lipman, D.J., Pearson, W.R.: Rapid and sensitive protein similarity searches. Science 227(4693), 1435–1441 (1985). https://doi.org/10.1126/science.2983426
Pearson, W.R.: Flexible sequence similarity searching with the FASTA3 program package. Methods Mol. Biol. 185–219 (2000). https://doi.org/10.1385/1-59259-192-2:185
Sakoe, H., Chiba, S.: Dynamic programming optimization for spoken word recognition. IEEE Trans. Acoust. Speech Signal Process. 26(1), 43–49 (1978). https://doi.org/10.1109/tassp.1978.1163055
Myers, C., Rabiner, L.R., Rosenberg, A.E.: Performance tradeoffs in dynamic time warping algorithms for isolated word recognition. IEEE Trans. Acoust. Speech Signal Process. 28(6), 623–635 (1980). https://doi.org/10.1109/tassp.1980.1163491
Silva, D.F., Batista, G.E.A.P.A.: Speeding up all-pairwise dynamic time warping matrix calculation. In: Proceedings of the 2016 SIAM International Conference on Data Mining, pp. 837–845 (2016). https://doi.org/10.1137/1.9781611974348.94
Virus Database at University College London (VIDA). http://www.biochem.ucl.ac.uk/bsm/virus_database/VIDA3/VIDA.html
Lanckriet, G., De Bie, T., Cristianini, N., Jordan, M.I., Noble, W.S.: A statistical framework for genomic data fusion. Bioinformatics 20(16), 2626–2635 (2004). https://doi.org/10.1093/bioinformatics/bth294
Acknowledgements
This work is supported by the Russian Foundation for Basic Research, Grant 15-07-08967.
The results of the research project are published with the financial support of Tula State University within the framework of scientific project 2017-18PUBL.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Sulimova, V., Seredin, O., Mottl, V. (2019). Recognition of Herpes Viruses on the Basis of a New Metric for Protein Sequences. In: Strijov, V., Ignatov, D., Vorontsov, K. (eds) Intelligent Data Processing. IDP 2016. Communications in Computer and Information Science, vol 794. Springer, Cham. https://doi.org/10.1007/978-3-030-35400-8_5
DOI: https://doi.org/10.1007/978-3-030-35400-8_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-35399-5
Online ISBN: 978-3-030-35400-8
eBook Packages: Computer Science, Computer Science (R0)