Abstract
This chapter reviews the challenges machine-learning specialists face when assisting virologists by automatically predicting the outcome of HIV therapy.
Optimizing HIV therapies is crucial since the virus rapidly develops mutations to evade drug pressure. Modern anti-HIV regimens comprise multiple drugs in order to prevent, or at least delay, the development of resistance mutations. In recent years, large databases have been collected to allow the automatic analysis of relations between the virus genome, other clinical and demographic information, and the failure or success of a therapy. The EuResist integrated database (EID), collected from about 18,500 patients and 65,000 different therapies, is probably one of the largest clinical genomic databases. Only one third of the therapies in the EID contain therapy response information, and only 5% of the therapy records have response information as well as genotypic data. This leads to two specific challenges: (a) semi-supervised learning, a setting where many samples are available but only a small proportion of them are labeled, and (b) missing data.
We review a solution for the first setting: a novel dimensionality reduction framework that combines information-theoretic considerations with geometric constraints over the simplex. The framework is formulated to find an optimal low-dimensional geometric embedding of the simplex that preserves pairwise distances. This similarity-based clustering solution was tested on toy data and textual data. We show that this solution, although it outperforms other methods and provides good results on a small sample of the EuResist data, is impractical for the large EuResist dataset. In addition, we review a generative-discriminative prediction system that successfully overcomes the missing-value challenge.
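To make the idea of a distance-preserving embedding of the simplex concrete, the following is a minimal sketch and not the chapter's framework. It assumes the square root of the Jensen-Shannon divergence as the distance between probability vectors and swaps in classical multidimensional scaling as a generic stand-in for the embedding step. The function names, the Dirichlet toy data, and the target dimension are all illustrative assumptions.

```python
import numpy as np


def js_distance(p, q):
    """Square root of the base-2 Jensen-Shannon divergence (assumed distance)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(a, b):
        # Terms with a == 0 contribute nothing; b > 0 wherever a > 0.
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return np.sqrt(max(jsd, 0.0))  # guard against tiny negative round-off


def pairwise_js(points):
    """Symmetric matrix of pairwise distances between rows of `points`."""
    n = len(points)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = js_distance(points[i], points[j])
    return d


def classical_mds(d, k):
    """Classical MDS: a generic stand-in, not the method reviewed in the chapter."""
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    vals, vecs = np.linalg.eigh(gram)
    order = np.argsort(vals)[::-1][:k]        # k largest eigenvalues
    top_vals = np.clip(vals[order], 0.0, None)
    return vecs[:, order] * np.sqrt(top_vals)


# Hypothetical toy data: 20 points drawn on the 4-dimensional simplex.
rng = np.random.default_rng(0)
points = rng.dirichlet(np.ones(5), size=20)
dist = pairwise_js(points)
embedding = classical_mds(dist, k=2)
```

The rows of `embedding` are low-dimensional coordinates whose Euclidean distances approximate the entries of `dist`. The pairwise matrix grows quadratically with the number of samples, which hints at why distance-based approaches of this kind can become costly on large datasets.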
Apart from reviewing the EuResist project and related challenges, this chapter provides an overview of recent developments in machine-learning-based prediction methods for HIV drug resistance.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Rosen-Zvi, M., Aharoni, E., Selbig, J. (2009). HIV-1 Drug Resistance Prediction and Therapy Optimization: A Case Study for the Application of Classification and Clustering Methods. In: Biehl, M., Hammer, B., Verleysen, M., Villmann, T. (eds) Similarity-Based Clustering. Lecture Notes in Computer Science(), vol 5400. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01805-3_10
DOI: https://doi.org/10.1007/978-3-642-01805-3_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01804-6
Online ISBN: 978-3-642-01805-3
eBook Packages: Computer Science, Computer Science (R0)