DOI: 10.1145/2147805.2147841
short-paper

Sequence-based prediction of HIV-1 coreceptor usage: utility of n-grams for representing gp120 V3 loops

Published: 01 August 2011

ABSTRACT

Human immunodeficiency virus type 1 (HIV-1) infects host cells that express both the CD4 surface membrane receptor, which binds the viral envelope glycoprotein gp120, and either the CCR5 (R5) or CXCR4 (X4) chemokine coreceptor, which interacts principally with the V3 loop region of gp120. Coreceptor selectivity, or tropism, depends on the sequence patterns encoding HIV-1 viral strains, and medications designed to bind and inhibit each coreceptor are currently on the market and in development. Because HIV-1 coreceptor usage must be determined before such a drug is administered, and because experimental assays for this purpose are costly and time-consuming, there is now considerable interest in applying machine learning algorithms directly to classify HIV-1 coreceptor usage based on the V3 loop region of gp120. Here, for the first time, a number of n-gram approaches (n-grams being subsequences formed by a sliding window of size n) are described for representing, as feature vectors, two large datasets of V3 loop peptide sequences obtained from HIV-1 viruses with known coreceptor usage, and the random forest algorithm is implemented for classification. These datasets were previously retrieved and used to develop combined sequence-structure based classifiers and sequence-based string kernel classifiers, respectively. A comparison of the accuracy reported for those complex classifiers with the performance achieved here using simpler and more computationally efficient n-grams reveals significant advantages while highlighting limitations.
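
The n-gram representation mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the window sizes, the feature weighting, or the software used, so everything below is an assumption made for illustration: the function names are hypothetical, features are raw counts over the 20 standard amino acids, and the input peptide is a toy string, not a sequence from the paper's datasets.

```python
from collections import Counter
from itertools import product

# Assumption: features are defined over the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngrams(sequence, n):
    """Return all overlapping length-n subsequences (sliding window, step 1)."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def ngram_vocabulary(n, alphabet=AMINO_ACIDS):
    """Enumerate every possible n-gram over the alphabet in a fixed order."""
    return ["".join(chars) for chars in product(alphabet, repeat=n)]


def ngram_feature_vector(sequence, n, alphabet=AMINO_ACIDS):
    """Map a peptide to a fixed-length vector of n-gram counts.

    n-grams containing symbols outside the alphabet (gaps, ambiguity
    codes) have no slot in the vocabulary and are silently ignored.
    """
    counts = Counter(ngrams(sequence, n))
    return [counts[gram] for gram in ngram_vocabulary(n, alphabet)]


if __name__ == "__main__":
    toy_peptide = "CTRPN"  # toy input for illustration only
    print(ngrams(toy_peptide, 2))
    vector = ngram_feature_vector(toy_peptide, 2)
    print(len(vector), sum(vector))
```

Each sequence yields a vector of fixed length (20^n entries for window size n), regardless of sequence length, so no multiple alignment is needed before classification. Vectors built this way could be passed to any random forest implementation together with the known R5/X4 labels.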


Published in

BCB '11: Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine
August 2011
688 pages
ISBN: 9781450307963
DOI: 10.1145/2147805
General Chairs: Robert Grossman, Andrey Rzhetsky
Program Chairs: Sun Kim, Wei Wang

Copyright © 2011 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 254 of 885 submissions, 29%