Introduction to the Peptide Binding Problem of Computational Immunology: New Results

Foundations of Computational Mathematics

Abstract

We attempt to establish geometrical methods for amino acid sequences. To measure the similarity of these sequences, we define a kernel on strings using only the sequence structure and a good amino acid substitution matrix (e.g. BLOSUM62). The kernel is used in learning machines to predict binding affinities of peptides to human leukocyte antigen DR (HLA-DR) molecules. On both fixed-allele (Nielsen and Lund in BMC Bioinform. 10:296, 2009) and pan-allele (Nielsen et al. in Immunome Res. 6(1):9, 2010) benchmark databases, our algorithm achieves state-of-the-art performance. The kernel is also used to define a distance on an HLA-DR allele set, and a clustering analysis based on this distance precisely recovers the serotype classifications assigned by WHO (Holdsworth et al. in Tissue Antigens 73(2):95–170, 2009; Marsh et al. in Tissue Antigens 75(4):291–455, 2010). These results suggest that our kernel relates the sequence structure of both peptides and HLA-DR molecules well to their biological functions, and that it offers a simple, powerful and promising methodology for immunology and amino acid sequence studies.

Notes

  1. Allele: an alternative form of a gene that occurs at a specified chromosomal position (locus) [22].

  2. ftp://ftp.ebi.ac.uk/pub/databases/imgt/mhc/hla/DRB_prot.fasta.

  3. We have found from a number of different experiments that “they do not cluster”. (Perhaps the geometric phenomenon here is in the higher dimensional scaled topology, i.e. the Betti numbers \(\beta_i > 0\) for \(i > 0\) [4].)

  4. Both the data set and the 5-fold partition are available at http://www.cbs.dtu.dk/suppl/immunology/NetMHCII-2.0.php.

  5. Both the data set and the 5-part partition are available at http://www.cbs.dtu.dk/suppl/immunology/NetMHCIIpan-2.0.

  6. The data set was downloaded from http://www.immuneepitope.org/list_page.php?list_type=mhc&measured_response=&total_rows=64797&queryType=true, on May 23, 2012.

  7. The code is available at http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?netMHCIIpan.

  8. Another way of measuring distance between clusters is the Hausdorff distance.
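As an illustration of Note 8, the Hausdorff distance between two finite clusters can be sketched as follows. This is only a minimal sketch: the function name `hausdorff`, the helper `abs_diff` and the toy numeric clusters are hypothetical and are not taken from the paper, where the points would be alleles and `dist` the distance defined on the allele set.

```python
def hausdorff(a, b, dist):
    """Hausdorff distance between two finite, non-empty collections a and b.

    `dist` is any symmetric point-to-point distance (an assumption of this
    sketch; the paper's allele distance would be plugged in here).
    """
    def directed(xs, ys):
        # Largest distance from a point of xs to its nearest point of ys.
        return max(min(dist(x, y) for y in ys) for x in xs)

    # The Hausdorff distance is the larger of the two directed distances.
    return max(directed(a, b), directed(b, a))


def abs_diff(x, y):
    """Toy distance on the real line, used only for the example."""
    return abs(x - y)


# Two toy "clusters" of real numbers (hypothetical data).
cluster_a = [0.0, 1.0]
cluster_b = [0.5, 4.0]
d_ab = hausdorff(cluster_a, cluster_b, abs_diff)
print(d_ab)
```

Here the point 4.0 in the second cluster is far from every point of the first, so it alone determines the result. This sensitivity to a single outlying point is the main way the Hausdorff distance differs from averaged linkage criteria.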

References

  1. M. Andreatta, Discovering sequence motifs in quantitative and qualitative peptide data. Ph.D. thesis, Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, 2012.

  2. N. Aronszajn, Theory of reproducing kernels, Trans. Am. Math. Soc. 68, 337–404 (1950).

  3. A. Baas, X.J. Gao, G. Chelvanayagam, Peptide binding motifs and specificities for HLA-DQ molecules, Immunogenetics 50, 8–15 (1999).

  4. L. Bartholdi, T. Schick, N. Smale, S. Smale, A.W. Baker, Hodge theory on metric spaces, Found. Comput. Math. 12(1), 1–48 (2012).

  5. E.E. Bittar, N. Bittar (eds.), Principles of Medical Biology: Molecular and Cellular Pharmacology (JAI Press, London, 1997).

  6. F.A. Castelli, C. Buhot, A. Sanson, H. Zarour, S. Pouvelle-Moratille, C. Nonn, H. Gahery-Ségard, J.-G. Guillet, A. Ménez, B. Georges, B. Maillère, HLA-DP4, the most frequent HLA II molecule, defines a new supertype of peptide-binding specificity, J. Immunol. 169, 6928–6934 (2002).

  7. F. Cucker, D.X. Zhou, Learning Theory: An Approximation Theory Viewpoint (Cambridge University Press, Cambridge, 2007).

  8. W.H.E. Day, H. Edelsbrunner, Efficient algorithms for agglomerative hierarchical clustering methods, J. Classif. 1(1), 7–24 (1984).

  9. I.A. Doytchinova, D.R. Flower, In silico identification of supertypes for class II MHCs, J. Immunol. 174(11), 7085–7095 (2005).

  10. Y. El-Manzalawy, D. Dobbs, V. Honavar, On evaluating MHC-II binding peptide prediction methods, PLoS ONE 3, e3268 (2008).

  11. M. Galan, E. Guivier, G. Caraux, N. Charbonnel, J.-F. Cosson, A 454 multiplex sequencing method for rapid and reliable genotyping of highly polymorphic genes in large-scale studies, BMC Genom. 11(296) (2010).

  12. G.H. Golub, M. Heath, G. Wahba, Generalized cross-validation as a method for choosing a good ridge parameter, Technometrics 21, 215–224 (1979).

  13. D. Graur, W.-H. Li, Fundamentals of Molecular Evolution (Sinauer Associates, Sunderland, 2000).

  14. W.W. Grody, R.M. Nakamura, F.L. Kiechle, C. Strom, Molecular Diagnostics: Techniques and Applications for the Clinical Laboratory (Academic Press, San Diego, 2010).

  15. D. Haussler, Convolution kernels on discrete structures. Tech. report, 1999.

  16. S. Henikoff, J.G. Henikoff, Amino acid substitution matrices from protein blocks, Proc. Natl. Acad. Sci. USA 89, 10915–10919 (1992).

  17. R. Holdsworth, C.K. Hurley, S.G. Marsh, M. Lau, H.J. Noreen, J.H. Kempenich, M. Setterholm, M. Maiers, The HLA dictionary 2008: a summary of HLA-A, -B, -C, -DRB1/3/4/5, and -DQB1 alleles and their association with serologically defined HLA-A, -B, -C, -DR, and -DQ antigens, Tissue Antigens 73(2), 95–170 (2009).

  18. R.A. Horn, C.R. Johnson, Topics in Matrix Analysis (Cambridge University Press, Cambridge, 1994).

  19. L. Jacob, J.-P. Vert, Efficient peptide–MHC-I binding prediction for alleles with few known binders, Bioinformatics 24(3), 358–366 (2008).

  20. C.A. Janeway, P. Travers, M. Walport, M.J. Shlomchik, Immunobiology, 5th edn. (Garland Science, New York, 2001).

  21. N. Jojic, M. Reyes-Gomez, D. Heckerman, C. Kadie, O. Schueler-Furman, Learning MHC I–peptide binding, Bioinformatics 22(14), e227–e235 (2006).

  22. T.J. Kindt, R.A. Goldsby, B.A. Osborne, J. Kuby, Kuby Immunology (Freeman, New York, 2007).

  23. C. Leslie, E. Eskin, W.S. Noble, The spectrum kernel: a string kernel for SVM protein classification, in Pacific Symposium on Biocomputing, vol. 7 (2002), pp. 566–575.

  24. H.H. Lin, G.L. Zhang, S. Tongchusak, E.L. Reinherz, V. Brusic, Evaluation of MHC-II peptide binding prediction servers: applications for vaccine research, BMC Bioinform. 9(Suppl 12), S22 (2008).

  25. O. Lund, M. Nielsen, C. Kesmir, A.G. Petersen, C. Lundegaard, P. Worning, C. Sylvester-Hvid, K. Lamberth, G. Røder, S. Justesen, S. Buus, S. Brunak, Definition of supertypes for HLA molecules using clustering of specificity matrices, Immunogenetics 55(12), 797–810 (2004).

  26. O. Lund, M. Nielsen, C. Lundegaard, C. Keşmir, S. Brunak, Immunological Bioinformatics (MIT Press, Cambridge, 2005).

  27. M. Maiers, G.M. Schreuder, M. Lau, S.G. Marsh, M. Fernández-Viña, H. Noreen, M. Setterholm, C.K. Hurley, Use of a neural network to assign serologic specificities to HLA-A, -B and -DRB1 allelic products, Tissue Antigens 62(1), 21–47 (2003).

  28. S.G.E. Marsh, E.D. Albert, W.F. Bodmer, R.E. Bontrop, B. Dupont, H.A. Erlich, M. Fernández-Viña, D.E. Geraghty, R. Holdsworth, C.K. Hurley, M. Lau, K.W. Lee, B. Mach, M. Maiers, W.R. Mayr, C.R. Müller, P. Parham, E.W. Petersdorf, T. Sasazuki, J.L. Strominger, A. Svejgaard, P.I. Terasaki, J.M. Tiercy, J. Trowsdale, Nomenclature for factors of the HLA system, 2010, Tissue Antigens 75(4), 291–455 (2010).

  29. M. Nielsen, O. Lund, NN-align. An artificial neural network-based alignment algorithm for MHC class II peptide binding prediction, BMC Bioinform. 10, 296 (2009).

  30. M. Nielsen, C. Lundegaard, T. Blicher, B. Peters, A. Sette, S. Justesen, S. Buus, O. Lund, Quantitative predictions of peptide binding to any HLA-DR molecule of known sequence: NetMHCIIpan, PLoS Comput. Biol. 4(7), e1000107 (2008).

  31. M. Nielsen, S. Justesen, O. Lund, C. Lundegaard, S. Buus, NetMHCIIpan-2.0: improved pan-specific HLA-DR predictions using a novel concurrent alignment and weight optimization training procedure, Immunome Res. 6(1), 9 (2010).

  32. D. Ou, L.A. Mitchell, A.J. Tingle, A new categorization of HLA DR alleles on a functional basis, Hum. Immunol. 59(10), 665–676 (1998).

  33. J. Robinson, M.J. Waller, P. Parham, N. de Groot, R. Bontrop, L.J. Kennedy, P. Stoehr, S.G. Marsh, IMGT/HLA and IMGT/MHC: sequence databases for the study of the major histocompatibility complex, Nucleic Acids Res. 31(1), 311–314 (2003).

  34. R. Sadiq, S. Tesfamariam, Probability density functions based weights for ordered weighted averaging (OWA) operators: an example of water quality indices, Eur. J. Oper. Res. 182(3), 1350–1368 (2007).

  35. H. Saigo, J.-P. Vert, N. Ueda, T. Akutsu, Protein homology detection using string alignment kernels, Bioinformatics 20(11), 1682–1689 (2004).

  36. H. Saigo, J.P. Vert, T. Akutsu, Optimizing amino acid substitution matrices with a local alignment kernel, BMC Bioinform. 7, 246 (2006).

  37. J. Salomon, D.R. Flower, Predicting class II MHC-peptide binding: a kernel based approach using similarity scores, BMC Bioinform. 7, 501 (2006).

  38. B. Schölkopf, A.J. Smola, Learning with Kernels (MIT Press, Cambridge, 2001).

  39. A. Sette, J. Sidney, Nine major HLA class I supertypes account for the vast preponderance of HLA-A and -B polymorphism, Immunogenetics 50(3–4), 201–212 (1999).

  40. A. Sette, L. Adorini, S.M. Colon, S. Buus, H.M. Grey, Capacity of intact proteins to bind to MHC class II molecules, J. Immunol. 143(4), 1265–1267 (1989).

  41. J. Shawe-Taylor, N. Cristianini, Kernel Methods for Pattern Analysis (Cambridge University Press, Cambridge, 2004).

  42. J. Sidney, H.M. Grey, R.T. Kubo, A. Sette, Practical, biochemical and evolutionary implications of the discovery of HLA class I supermotifs, Immunol. Today 17(6), 261–266 (1996).

  43. J. Sidney, B. Peters, N. Frahm, C. Brander, A. Sette, HLA class I supertypes: a revised and updated classification, BMC Immunol. 9(1) (2008).

  44. S. Smale, L. Rosasco, J. Bouvrie, A. Caponnetto, T. Poggio, Mathematics of the neural response, Found. Comput. Math. 10(1), 67–91 (2010).

  45. S. Southwood, J. Sidney, A. Kondo, M.F. del Guercio, E. Appella, S. Hoffman, R.T. Kubo, R.W. Chesnut, H.M. Grey, A. Sette, Several common HLA-DR types share largely overlapping peptide binding repertoires, J. Immunol. 160(7), 3363–3373 (1998).

  46. G. Thomson, N. Marthandan, J.A. Hollenbach, S.J. Mack, H.A. Erlich, R.M. Single, M.J. Waller, S.G.E. Marsh, P.A. Guidry, D.R. Karp, R.H. Scheuermann, S.D. Thompson, D.N. Glass, W. Helmberg, Sequence feature variant type (SFVT) analysis of the HLA genetic association in juvenile idiopathic arthritis, in Pacific Symposium on Biocomputing’2010 (2010), pp. 359–370.

  47. J.-P. Vert, H. Saigo, T. Akutsu, Convolution and local alignment kernel, in Kernel Methods in Computational Biology, ed. by B. Schölkopf, K. Tsuda, J.-P. Vert (MIT Press, Cambridge, 2004), pp. 131–154.

  48. G. Wahba, Spline Models for Observational Data (SIAM, Philadelphia, 1990).

  49. L. Wan, G. Reinert, F. Sun, M.S. Waterman, Alignment-free sequence comparison (II): theoretical power of comparison statistics, J. Comput. Biol. 17(11), 1467–1490 (2010).

  50. P. Wang, J. Sidney, C. Dow, B. Mothé, A. Sette, B. Peters, A systematic assessment of MHC class II peptide binding predictions and evaluation of a consensus approach, PLoS Comput. Biol. 4, e1000048 (2008).

  51. C. Widmer, N.C. Toussaint, Y. Altun, O. Kohlbacher, G. Rätsch, Novel machine learning methods for MHC class I binding prediction, in Pattern Recognition in Bioinformatics, vol. 6282, ed. by T.M.H. Dijkstra, E. Tsivtsivadze, E. Marchiori, T. Heskes (Springer, Berlin, 2010), pp. 98–109.

  52. R.R. Yager, On ordered weighted averaging aggregation operators in multicriteria decision making, IEEE Trans. Syst. Man Cybern. 18(1), 183–190 (1988).

  53. J.W. Yewdell, J.R. Bennink, Immunodominance in major histocompatibility complex class I-restricted T lymphocyte responses, Annu. Rev. Immunol. 17, 51–88 (1999).

Acknowledgements

The authors would like to thank Shuaicheng Li for pointing out to us that the portions of DRB alleles that are in contact with peptides can be obtained from the non-aligned DRB amino acid sequences by the use of two markers, “RFL” and “TVQ”. We thank Morten Nielsen for his criticism regarding over-fitting.

We thank Yiming Cheng for his suggestions on the computer code, which were very helpful for speeding up the algorithm for evaluating \(K^3\). He also discussed with us the influence on HLA–peptide binding prediction of using different representations of the alleles, and of adjusting the index β in the kernel according to the sequence length. Although these topics are not included in the paper, they have some potential for future work.

We also thank Felipe Cucker for reviewing our draft and making many improvements, and Santiago Laplagne for pointing out a bug in the code for Table 2.

The work described in this paper is supported by GRF grant [Project No. 9041544], [Project No. CityU 103210] and [Project No. 9380050].

Author information

Corresponding author

Correspondence to Stephen Smale.

Additional information

Communicated by Teresa Krick.

Appendix: The BLOSUM62-2 Matrix

We list the whole BLOSUM62-2 matrix in Table 8. Table 9 explains the amino acids denoted by the capital letters.

Table 8 The BLOSUM62-2 matrix
Table 9 The list of the amino acids

From the Introduction, we see that the matrix Q can be recovered from the BLOSUM62-2 matrix once the marginal probability vector p is available. The latter vector is obtained by

$$p = \bigl(\mbox{[BLOSUM62-2]}\bigr)^{-1} v_1, $$

where \(v_{1} = (1,\ldots,1)\in\mathbb{R}^{20}\) is the vector with all coordinates equal to 1. The exact values of the matrix Q are available at http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/src/algo/blast/composition_adjustment/matrix_frequency_data.c#L391.
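The recovery step above can be sketched in a few lines of Python. The sketch assumes that each BLOSUM62-2 entry has the form \(Q(i,j)/(p_i p_j)\), which is the relation the Introduction is referred to for and is not restated in this section. Under that assumption, the row sums of Q give \([\mbox{BLOSUM62-2}]\,p = v_1\), hence the formula for p. The helper names (`solve`, `recover_p_and_q`) and the 3-letter toy matrix are hypothetical; they stand in for the real 20×20 data, which is not reproduced here.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, copied so that the inputs are left untouched.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def recover_p_and_q(b2):
    """Recover (p, Q), assuming b2[i][j] = Q[i][j] / (p[i] * p[j])."""
    n = len(b2)
    p = solve(b2, [1.0] * n)  # p = [b2]^{-1} v_1
    q = [[b2[i][j] * p[i] * p[j] for j in range(n)] for i in range(n)]
    return p, q


# Toy symmetric joint distribution on a 3-letter alphabet (hypothetical numbers).
Q_TOY = [[0.30, 0.05, 0.05],
         [0.05, 0.20, 0.05],
         [0.05, 0.05, 0.20]]
P_TOY = [sum(row) for row in Q_TOY]  # marginal probabilities of Q_TOY
B_TOY = [[Q_TOY[i][j] / (P_TOY[i] * P_TOY[j]) for j in range(3)] for i in range(3)]

p_rec, q_rec = recover_p_and_q(B_TOY)
print(p_rec)
```

With the real data, `B_TOY` would be replaced by the 20×20 matrix of Table 8. The recovered Q could then be compared against the NCBI values as a sanity check.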

About this article

Cite this article

Shen, WJ., Wong, HS., Xiao, QW. et al. Introduction to the Peptide Binding Problem of Computational Immunology: New Results. Found Comput Math 14, 951–984 (2014). https://doi.org/10.1007/s10208-013-9173-9
