Abstract
The human adaptive immune response relies on the recognition of short peptides by proteins of the major histocompatibility complex (MHC). MHC class II molecules are responsible for recognizing antigens external to a cell. Understanding their specificity is an important step in the design of peptide-based vaccines. The high degree of polymorphism in MHC class II makes predicting which peptides bind (and then usually cause an immune response) a challenging task. These predictions typically rely on machine learning methods, which require a sufficient number of data points. Because data are scarce, reliable prediction models are currently available for only about 7% of all known alleles.
We show how to transform the problem of MHC class II binding peptide prediction into a well-studied machine learning problem called multiple instance learning. For alleles with sufficient data, we show how to build a well-performing predictor using standard kernels for multiple instance learning. Furthermore, we introduce a new method for training a classifier for an allele that requires no binding data for the target allele itself. Instead, we use binding peptide data from other alleles, together with similarities between the structures of the MHC class II alleles, to guide the learning process. For the first time, this allows predictors to be constructed for about two thirds of all known MHC class II alleles. The average performance of these predictors on 14 test alleles is 0.71, measured as area under the ROC curve.
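As a rough illustration of the multiple instance learning view described above, the sketch below treats each peptide as a "bag" whose instances are its contiguous substrings of a fixed length, and compares two bags with a normalized set kernel in the spirit of the multi-instance kernels of Gärtner et al. The window length of 9, the position-wise match kernel, the exponent `p`, and all function names are assumptions made for this example; the abstract does not specify them, and this is not the authors' implementation.

```python
from itertools import product

CORE_LEN = 9  # assumed instance (binding-core) length; not given in the abstract


def peptide_to_bag(peptide, core_len=CORE_LEN):
    """Turn a peptide into a bag: all contiguous substrings of length core_len."""
    if len(peptide) < core_len:
        raise ValueError("peptide is shorter than the assumed core length")
    return [peptide[i:i + core_len] for i in range(len(peptide) - core_len + 1)]


def instance_kernel(a, b):
    """Toy instance kernel (an assumption): fraction of positions with equal residues.

    All instances have the same length, so this is a sum of per-position
    delta kernels and therefore positive semi-definite.
    """
    return sum(x == y for x, y in zip(a, b)) / len(a)


def set_kernel(bag_x, bag_y, p=1):
    """Sum of instance kernel values, raised to the power p, over all instance pairs."""
    return sum(instance_kernel(x, y) ** p for x, y in product(bag_x, bag_y))


def normalized_set_kernel(bag_x, bag_y, p=1):
    """Set kernel scaled so that every bag has similarity 1 with itself."""
    k_xy = set_kernel(bag_x, bag_y, p)
    k_xx = set_kernel(bag_x, bag_x, p)
    k_yy = set_kernel(bag_y, bag_y, p)
    return k_xy / (k_xx * k_yy) ** 0.5


if __name__ == "__main__":
    # Made-up peptide strings, used only to exercise the functions.
    bag_a = peptide_to_bag("ACDEFGHIKLM")
    bag_b = peptide_to_bag("ACDEFGHIKLW")
    print(bag_a)
    print(normalized_set_kernel(bag_a, bag_b, p=2))
```

A matrix of such bag-level kernel values could then be handed to any kernel method that accepts a precomputed kernel, for example a support vector machine; which learner and which kernel parameters the authors actually used is not stated in the abstract.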
Availability: The methods are integrated into the EpiToolKit framework, for which a web server is available at http://www.epitoolkit.org/mhciimulti
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pfeifer, N., Kohlbacher, O. (2008). Multiple Instance Learning Allows MHC Class II Epitope Predictions Across Alleles. In: Crandall, K.A., Lagergren, J. (eds) Algorithms in Bioinformatics. WABI 2008. Lecture Notes in Computer Science, vol 5251. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87361-7_18
DOI: https://doi.org/10.1007/978-3-540-87361-7_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87360-0
Online ISBN: 978-3-540-87361-7
eBook Packages: Computer Science (R0)