Recognition of Herpes Viruses on the Basis of a New Metric for Protein Sequences

  • Conference paper

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 794))

Abstract

This paper addresses the problem of intelligent recognition of human herpes viruses based on the analysis of their protein sequences. To compare proteins, we use a new dissimilarity measure based on finding an optimal sequence alignment. In previous work, we proved that the proposed way of comparing sequences generates a measure that has the properties of a metric. These properties allow more convenient and effective use of the proposed measure in further analysis, in contrast to traditional similarity measures such as Needleman-Wunsch alignment. The results of herpes virus recognition show that the metric properties make it possible to improve classification quality. In addition, we present an updated computational scheme for the proposed metric that speeds up the comparison of proteins.
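
The abstract does not give the definition of the proposed metric, so the following is only an illustrative sketch of the general idea that an alignment-based dissimilarity can satisfy the metric axioms. It uses the classical unit-cost edit distance computed by dynamic programming over global alignments, which is known to be a metric; it is not the authors' measure. The function names, cost parameters, and toy sequences are assumptions made for this example.

```python
from itertools import product


def alignment_distance(a: str, b: str, sub_cost: int = 1, gap_cost: int = 1) -> int:
    """Minimum total cost of a global alignment of a and b.

    With the default unit costs this is the classical edit distance,
    used here as a stand-in, NOT the metric proposed in the paper.
    """
    # prev[j] holds the cost of aligning the processed prefix of a with b[:j].
    prev = [j * gap_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap_cost]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j - 1] + (0 if ca == cb else sub_cost),  # match or substitution
                prev[j] + gap_cost,                           # gap in b
                curr[j - 1] + gap_cost,                       # gap in a
            ))
        prev = curr
    return prev[-1]


def satisfies_metric_axioms(seqs) -> bool:
    """Check identity, symmetry, and the triangle inequality on a finite set."""
    for x, y in product(seqs, repeat=2):
        d = alignment_distance(x, y)
        if (d == 0) != (x == y) or d != alignment_distance(y, x):
            return False
    return all(
        alignment_distance(x, z) <= alignment_distance(x, y) + alignment_distance(y, z)
        for x, y, z in product(seqs, repeat=3)
    )


# Made-up toy fragments, not real herpes virus proteins.
toy = ["MKV", "MRKV", "MKL", "KV"]
print(alignment_distance("MKV", "MRKV"))
print(satisfies_metric_axioms(toy))
```

A similarity score such as a raw Needleman-Wunsch score generally gives no such guarantee, which is the contrast the abstract draws. Methods that assume a metric space can rely on the triangle inequality when the dissimilarity passes checks of this kind.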

Acknowledgements

This work is supported by the Russian Foundation for Basic Research, Grant 15-07-08967.

The results of the research project are published with the financial support of Tula State University within the framework of the scientific project - 2017-18PUBL.

Author information

Corresponding author

Correspondence to Valentina Sulimova.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Sulimova, V., Seredin, O., Mottl, V. (2019). Recognition of Herpes Viruses on the Basis of a New Metric for Protein Sequences. In: Strijov, V., Ignatov, D., Vorontsov, K. (eds) Intelligent Data Processing. IDP 2016. Communications in Computer and Information Science, vol 794. Springer, Cham. https://doi.org/10.1007/978-3-030-35400-8_5

  • DOI: https://doi.org/10.1007/978-3-030-35400-8_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-35399-5

  • Online ISBN: 978-3-030-35400-8

  • eBook Packages: Computer Science, Computer Science (R0)
