Introduction to the Peptide Binding Problem of Computational Immunology: New Results

Shen, Wen-Jun; Wong, Hau-San; Xiao, Quan-Wu; Guo, Xin; Smale, Stephen

doi:10.1007/s10208-013-9173-9


Published: 17 September 2013

Foundations of Computational Mathematics, Volume 14, pages 951–984 (2014)


Abstract

We attempt to establish geometrical methods for amino acid sequences. To measure the similarity of these sequences, we define a kernel on strings using only the sequence structure and a good amino acid substitution matrix (e.g. BLOSUM62). The kernel is used in learning machines to predict the binding affinities of peptides to human leukocyte antigen DR (HLA-DR) molecules. On both fixed-allele (Nielsen and Lund in BMC Bioinform. 10:296, 2009) and pan-allele (Nielsen et al. in Immunome Res. 6(1):9, 2010) benchmark databases, our algorithm achieves state-of-the-art performance. The kernel is also used to define a distance on a set of HLA-DR alleles, and a clustering analysis based on this distance precisely recovers the serotype classifications assigned by WHO (Holdsworth et al. in Tissue Antigens 73(2):95–170, 2009; Marsh et al. in Tissue Antigens 75(4):291–455, 2010). These results suggest that our kernel relates the sequence structure of both peptides and HLA-DR molecules well to their biological functions, and that it offers a simple, powerful and promising methodology for immunology and amino acid sequence studies.
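
To make the idea of a string kernel built from a substitution matrix concrete, the following Python sketch sums products of letter similarities over all pairs of equal-length contiguous substrings, normalizes the result, and derives a distance from it. It is an illustration only: the construction, the function names (`k1`, `k3`, `k3_hat`, `kernel_distance`), the exponent `beta`, and the toy scores in `TOY_SIM` are assumptions made here, not the definitions or the BLOSUM62 values used in the article.

```python
import math

# Hypothetical positive similarity scores for a 3-letter toy alphabet.
# These are NOT BLOSUM62 values; they only stand in for a substitution matrix.
TOY_SIM = {
    ("A", "A"): 2.0, ("A", "L"): 0.5, ("A", "K"): 0.4,
    ("L", "L"): 2.5, ("L", "K"): 0.3,
    ("K", "K"): 3.0,
}


def k1(x, y, beta=1.0):
    """Letter-level similarity: symmetric table lookup raised to the power beta."""
    key = (x, y) if (x, y) in TOY_SIM else (y, x)
    return TOY_SIM[key] ** beta


def k3(f, g, beta=1.0):
    """Sum, over all pairs of equal-length contiguous substrings of f and g,
    of the product of their letter-level similarities."""
    total = 0.0
    for i in range(len(f)):
        for j in range(len(g)):
            prod = 1.0
            k = 0
            # Extend the substring pair starting at (i, j) one letter at a time.
            while i + k < len(f) and j + k < len(g):
                prod *= k1(f[i + k], g[j + k], beta)
                total += prod
                k += 1
    return total


def k3_hat(f, g, beta=1.0):
    """Normalized kernel, so that a string compared with itself scores 1."""
    return k3(f, g, beta) / math.sqrt(k3(f, f, beta) * k3(g, g, beta))


def kernel_distance(f, g, beta=1.0):
    """Distance induced by the normalized kernel: sqrt(2 - 2 * k3_hat)."""
    return math.sqrt(max(0.0, 2.0 - 2.0 * k3_hat(f, g, beta)))


if __name__ == "__main__":
    print(k3_hat("ALK", "KLA"), kernel_distance("ALK", "KLA"))
```

With the toy table above and `beta = 1`, the normalized value is at most 1 and equals 1 for identical strings, so the induced distance from a string to itself is 0 up to rounding.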


Notes

Allele: an alternative form of a gene that occurs at a specified chromosomal position (locus) [22].
ftp://ftp.ebi.ac.uk/pub/databases/imgt/mhc/hla/DRB_prot.fasta.
We have found from a number of different experiments that “they do not cluster”. (Perhaps the geometric phenomenon here is in the higher dimensional scaled topology, i.e. the Betti numbers $\beta_i > 0$ for $i > 0$ [4].)
Both the data set and the 5-fold partition are available at http://www.cbs.dtu.dk/suppl/immunology/NetMHCII-2.0.php.
Both the data set and the 5-part partition are available at http://www.cbs.dtu.dk/suppl/immunology/NetMHCIIpan-2.0.
The data set was downloaded from http://www.immuneepitope.org/list_page.php?list_type=mhc&measured_response=&total_rows=64797&queryType=true, on May 23, 2012.
The code is published in http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?netMHCIIpan.
Another way of measuring distance between clusters is the Hausdorff distance.
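
As an illustration of the last note, the Hausdorff distance between two finite clusters can be sketched in a few lines of Python. The function name `hausdorff` and the use of a user-supplied `dist` callable are assumptions for this example; the article does not specify an implementation.

```python
def hausdorff(cluster_a, cluster_b, dist):
    """Hausdorff distance between two finite, non-empty clusters.

    dist is any symmetric distance function on pairs of elements.
    """
    def directed(xs, ys):
        # Largest distance from a point of xs to its nearest point in ys.
        return max(min(dist(x, y) for y in ys) for x in xs)

    return max(directed(cluster_a, cluster_b), directed(cluster_b, cluster_a))


if __name__ == "__main__":
    # Toy clusters on the real line with the absolute difference as distance.
    print(hausdorff([0, 1], [0, 3], lambda a, b: abs(a - b)))
```

In the toy call, the point 3 is at distance 2 from its nearest neighbour in the first cluster, which is the largest such gap in either direction.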

References

M. Andreatta, Discovering sequence motifs in quantitative and qualitative peptide data. Ph.D. thesis, Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, 2012.
N. Aronszajn, Theory of reproducing kernels, Trans. Am. Math. Soc. 68, 337–404 (1950).
A. Baas, X.J. Gao, G. Chelvanayagam, Peptide binding motifs and specificities for HLA-DQ molecules, Immunogenetics 50, 8–15 (1999).
L. Bartholdi, T. Schick, N. Smale, S. Smale, A.W. Baker, Hodge theory on metric spaces, Found. Comput. Math. 12(1), 1–48 (2012).
E.E. Bittar, N. Bittar (eds.), Principles of Medical Biology: Molecular and Cellular Pharmacology (JAI Press, London, 1997).
F.A. Castelli, C. Buhot, A. Sanson, H. Zarour, S. Pouvelle-Moratille, C. Nonn, H. Gahery-Ségard, J.-G. Guillet, A. Ménez, B. Georges, B. Maillère, HLA-DP4, the most frequent HLA II molecule, defines a new supertype of peptide-binding specificity, J. Immunol. 169, 6928–6934 (2002).
F. Cucker, D.X. Zhou, Learning Theory: An Approximation Theory Viewpoint (Cambridge University Press, Cambridge, 2007).
W.H.E. Day, H. Edelsbrunner, Efficient algorithms for agglomerative hierarchical clustering methods, J. Classif. 1(1), 7–24 (1984).
I.A. Doytchinova, D.R. Flower, In silico identification of supertypes for class II MHCs, J. Immunol. 174(11), 7085–7095 (2005).
Y. El-Manzalawy, D. Dobbs, V. Honavar, On evaluating MHC-II binding peptide prediction methods, PLoS ONE 3, e3268 (2008).
M. Galan, E. Guivier, G. Caraux, N. Charbonnel, J.-F. Cosson, A 454 multiplex sequencing method for rapid and reliable genotyping of highly polymorphic genes in large-scale studies, BMC Genom. 11, 296 (2010).
G.H. Golub, M. Heath, G. Wahba, Generalized cross-validation as a method for choosing a good ridge parameter, Technometrics 21, 215–224 (1979).
D. Graur, W.-H. Li, Fundamentals of Molecular Evolution (Sinauer Associates, Sunderland, 2000).
W.W. Grody, R.M. Nakamura, F.L. Kiechle, C. Strom, Molecular Diagnostics: Techniques and Applications for the Clinical Laboratory (Academic Press, San Diego, 2010).
D. Haussler, Convolution kernels on discrete structures. Tech. report, 1999.
S. Henikoff, J.G. Henikoff, Amino acid substitution matrices from protein blocks, Proc. Natl. Acad. Sci. USA 89, 10915–10919 (1992).
R. Holdsworth, C.K. Hurley, S.G. Marsh, M. Lau, H.J. Noreen, J.H. Kempenich, M. Setterholm, M. Maiers, The HLA dictionary 2008: a summary of HLA-A, -B, -C, -DRB1/3/4/5, and -DQB1 alleles and their association with serologically defined HLA-A, -B, -C, -DR, and -DQ antigens, Tissue Antigens 73(2), 95–170 (2009).
R.A. Horn, C.R. Johnson, Topics in Matrix Analysis (Cambridge University Press, Cambridge, 1994).
L. Jacob, J.-P. Vert, Efficient peptide–MHC-I binding prediction for alleles with few known binders, Bioinformatics 24(3), 358–366 (2008).
C.A. Janeway, P. Travers, M. Walport, M.J. Shlomchik, Immunobiology, 5th edn. (Garland Science, New York, 2001).
N. Jojic, M. Reyes-Gomez, D. Heckerman, C. Kadie, O. Schueler-Furman, Learning MHC I–peptide binding, Bioinformatics 22(14), e227–e235 (2006).
T.J. Kindt, R.A. Goldsby, B.A. Osborne, J. Kuby, Kuby Immunology (Freeman, New York, 2007).
C. Leslie, E. Eskin, W.S. Noble, The spectrum kernel: a string kernel for SVM protein classification, in Pacific Symposium on Biocomputing, vol. 7 (2002), pp. 566–575.
H.H. Lin, G.L. Zhang, S. Tongchusak, E.L. Reinherz, V. Brusic, Evaluation of MHC-II peptide binding prediction servers: applications for vaccine research, BMC Bioinform. 9(Suppl 12), S22 (2008).
O. Lund, M. Nielsen, C. Kesmir, A.G. Petersen, C. Lundegaard, P. Worning, C. Sylvester-Hvid, K. Lamberth, G. Røder, S. Justesen, S. Buus, S. Brunak, Definition of supertypes for HLA molecules using clustering of specificity matrices, Immunogenetics 55(12), 797–810 (2004).
O. Lund, M. Nielsen, C. Lundegaard, C. Keşmir, S. Brunak, Immunological Bioinformatics (MIT Press, Cambridge, 2005).
M. Maiers, G.M. Schreuder, M. Lau, S.G. Marsh, M. Fernandez-Viña, H. Noreen, M. Setterholm, C.K. Hurley, Use of a neural network to assign serologic specificities to HLA-A, -B and -DRB1 allelic products, Tissue Antigens 62(1), 21–47 (2003).
S.G.E. Marsh, E.D. Albert, W.F. Bodmer, R.E. Bontrop, B. Dupont, H.A. Erlich, M. Fernández-Viña, D.E. Geraghty, R. Holdsworth, C.K. Hurley, M. Lau, K.W. Lee, B. Mach, M. Maiers, W.R. Mayr, C.R. Müller, P. Parham, E.W. Petersdorf, T. Sasazuki, J.L. Strominger, A. Svejgaard, P.I. Terasaki, J.M. Tiercy, J. Trowsdale, Nomenclature for factors of the HLA system, 2010, Tissue Antigens 75(4), 291–455 (2010).
M. Nielsen, O. Lund, NN-align. An artificial neural network-based alignment algorithm for MHC class II peptide binding prediction, BMC Bioinform. 10, 296 (2009).
M. Nielsen, C. Lundegaard, T. Blicher, B. Peters, A. Sette, S. Justesen, S. Buus, O. Lund, Quantitative predictions of peptide binding to any HLA-DR molecule of known sequence: NetMHCIIpan, PLoS Comput. Biol. 4(7), e1000107 (2008).
M. Nielsen, S. Justesen, O. Lund, C. Lundegaard, S. Buus, NetMHCIIpan-2.0: improved pan-specific HLA-DR predictions using a novel concurrent alignment and weight optimization training procedure, Immunome Res. 6(1), 9 (2010).
D. Ou, L.A. Mitchell, A.J. Tingle, A new categorization of HLA DR alleles on a functional basis, Hum. Immunol. 59(10), 665–676 (1998).
J. Robinson, M.J. Waller, P. Parham, N. de Groot, R. Bontrop, L.J. Kennedy, P. Stoehr, S.G. Marsh, IMGT/HLA and IMGT/MHC: sequence databases for the study of the major histocompatibility complex, Nucleic Acids Res. 31(1), 311–314 (2003).
R. Sadiq, S. Tesfamariam, Probability density functions based weights for ordered weighted averaging (OWA) operators: an example of water quality indices, Eur. J. Oper. Res. 182(3), 1350–1368 (2007).
H. Saigo, J.-P. Vert, N. Ueda, T. Akutsu, Protein homology detection using string alignment kernels, Bioinformatics 20(11), 1682–1689 (2004).
H. Saigo, J.P. Vert, T. Akutsu, Optimizing amino acid substitution matrices with a local alignment kernel, BMC Bioinform. 7, 246 (2006).
J. Salomon, D.R. Flower, Predicting class II MHC-peptide binding: a kernel based approach using similarity scores, BMC Bioinform. 7, 501 (2006).
B. Schölkopf, A.J. Smola, Learning with Kernels (MIT Press, Cambridge, 2001).
A. Sette, J. Sidney, Nine major HLA class I supertypes account for the vast preponderance of HLA-A and -B polymorphism, Immunogenetics 50(3–4), 201–212 (1999).
A. Sette, L. Adorini, S.M. Colon, S. Buus, H.M. Grey, Capacity of intact proteins to bind to MHC class II molecules, J. Immunol. 143(4), 1265–1267 (1989).
J. Shawe-Taylor, N. Cristianini, Kernel Methods for Pattern Analysis (Cambridge University Press, Cambridge, 2004).
J. Sidney, H.M. Grey, R.T. Kubo, A. Sette, Practical, biochemical and evolutionary implications of the discovery of HLA class I supermotifs, Immunol. Today 17(6), 261–266 (1996).
J. Sidney, B. Peters, N. Frahm, C. Brander, A. Sette, HLA class I supertypes: a revised and updated classification, BMC Immunol. 9(1) (2008).
S. Smale, L. Rosasco, J. Bouvrie, A. Caponnetto, T. Poggio, Mathematics of the neural response, Found. Comput. Math. 10(1), 67–91 (2010).
S. Southwood, J. Sidney, A. Kondo, M.F. del Guercio, E. Appella, S. Hoffman, R.T. Kubo, R.W. Chesnut, H.M. Grey, A. Sette, Several common HLA-DR types share largely overlapping peptide binding repertoires, J. Immunol. 160(7), 3363–3373 (1998).
G. Thomson, N. Marthandan, J.A. Hollenbach, S.J. Mack, H.A. Erlich, R.M. Single, M.J. Waller, S.G.E. Marsh, P.A. Guidry, D.R. Karp, R.H. Scheuermann, S.D. Thompson, D.N. Glass, W. Helmberg, Sequence feature variant type (SFVT) analysis of the HLA genetic association in juvenile idiopathic arthritis, in Pacific Symposium on Biocomputing’2010 (2010), pp. 359–370.
J.-P. Vert, H. Saigo, T. Akutsu, Convolution and local alignment kernel, in Kernel Methods in Computational Biology, ed. by B. Schoelkopf, K. Tsuda, J.-P. Vert (MIT Press, Cambridge, 2004), pp. 131–154.
G. Wahba, Spline Models for Observational Data (SIAM, Philadelphia, 1990).
L. Wan, G. Reinert, F. Sun, M.S. Waterman, Alignment-free sequence comparison (II): theoretical power of comparison statistics, J. Comput. Biol. 17(11), 1467–1490 (2010).
P. Wang, J. Sidney, C. Dow, B. Mothé, A. Sette, B. Peters, A systematic assessment of MHC class II peptide binding predictions and evaluation of a consensus approach, PLoS Comput. Biol. 4, e1000048 (2008).
C. Widmer, N.C. Toussaint, Y. Altun, O. Kohlbacher, G. Rätsch, Novel machine learning methods for MHC class I binding prediction, in Pattern Recognition Bioinformatics, vol. 6282, ed. by T.M.H. Dijkstra, E. Tsivtsivadze, E. Marchiori, T. Heskes (Springer, Berlin, 2010), pp. 98–109.
R.R. Yager, On ordered weighted averaging aggregation operators in multicriteria decision making, IEEE Trans. Syst. Man Cybern. 18(1), 183–190 (1988).
J.W. Yewdell, J.R. Bennink, Immunodominance in major histocompatibility complex class I-restricted T lymphocyte responses, Annu. Rev. Immunol. 17, 51–88 (1999).


Acknowledgements

The authors would like to thank Shuaicheng Li for pointing out to us that the portions of DRB alleles that contact peptides can be obtained from the non-aligned DRB amino acid sequences by the use of two markers, “RFL” and “TVQ”. We thank Morten Nielsen for his criticism regarding over-fitting.

We thank Yiming Cheng for his suggestions on the computer code, which were very helpful for speeding up the algorithm for evaluating $K^3$. He also discussed with us how HLA–peptide binding prediction is influenced by using different representations of the alleles and by adjusting the index β in the kernel according to the sequence length. Although these topics are not included in the paper, they have some potential for future work.

We are also grateful to Felipe Cucker for reviewing our draft and suggesting many improvements. We thank Santiago Laplagne for pointing out a bug in the code for Table 2.

The work described in this paper is supported by GRF grant [Project No. 9041544], [Project No. CityU 103210], and [Project No. 9380050].

Author information

Authors and Affiliations

Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
Wen-Jun Shen & Hau-San Wong
Microsoft Search Technology Center Asia, Beijing, China
Quan-Wu Xiao
Department of Statistical Science, Duke University, Durham, NC, USA
Xin Guo
Department of Mathematics, City University of Hong Kong, Kowloon, Hong Kong
Stephen Smale


Corresponding author

Correspondence to Stephen Smale.

Additional information

Communicated by Teresa Krick.

Appendix: The BLOSUM62-2 Matrix

We list the whole BLOSUM62-2 matrix in Table 8. Table 9 explains the amino acids denoted by the capital letters.

Table 8 The BLOSUM62-2 matrix


Table 9 The list of the amino acids


From the Introduction, we see that the matrix Q can be recovered from the BLOSUM62-2 matrix once the marginal probability vector p is available. The latter vector is obtained by

$$p = \bigl(\mbox{[BLOSUM62-2]}\bigr)^{-1} v_1, $$

where $v_{1} = (1,\ldots,1)\in\mathbb{R}^{20}$ is the vector all of whose coordinates equal 1. The exact matrix Q can be obtained from http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/src/algo/blast/composition_adjustment/matrix_frequency_data.c#L391.
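
The recovery step can be checked numerically on a small example. The sketch below assumes, as the formula for p suggests, that each BLOSUM62-2 entry has the form Q(x, y) / (p(x) p(y)), where p is the vector of row sums of Q; under that assumption the matrix applied to p gives the all-ones vector. The 3-by-3 joint probability matrix `Q` is a made-up toy, not BLOSUM62 data, and the helper `solve` is a hypothetical name for a plain Gaussian elimination.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]  # augmented copy
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n + 1):
                a[r][k] -= factor * a[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(a[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (a[r][n] - s) / a[r][r]
    return x


# Toy symmetric joint probability matrix (entries sum to 1); not real data.
Q = [
    [0.20, 0.05, 0.05],
    [0.05, 0.30, 0.05],
    [0.05, 0.05, 0.20],
]
n = len(Q)
p = [sum(row) for row in Q]  # marginal probabilities

# Assumed form of the substitution matrix: M(x, y) = Q(x, y) / (p(x) p(y)).
M = [[Q[i][j] / (p[i] * p[j]) for j in range(n)] for i in range(n)]

# Recover p as M^{-1} v_1, then Q entry by entry.
p_rec = solve(M, [1.0] * n)
Q_rec = [[M[i][j] * p_rec[i] * p_rec[j] for j in range(n)] for i in range(n)]

if __name__ == "__main__":
    print(p_rec)
    print(Q_rec)
```

The same two steps would apply to the 20-by-20 case, provided the matrix is invertible.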


About this article

Cite this article

Shen, WJ., Wong, HS., Xiao, QW. et al. Introduction to the Peptide Binding Problem of Computational Immunology: New Results. Found Comput Math 14, 951–984 (2014). https://doi.org/10.1007/s10208-013-9173-9


Received: 26 August 2012
Revised: 05 May 2013
Accepted: 17 July 2013
Published: 17 September 2013
Issue Date: October 2014
DOI: https://doi.org/10.1007/s10208-013-9173-9
