ABSTRACT
Human immunodeficiency virus type 1 (HIV-1) infects host cells that express both the CD4 surface membrane receptor, which binds the viral envelope glycoprotein gp120, and either the CCR5 (R5) or CXCR4 (X4) chemokine coreceptor, each of which interacts principally with the V3 loop region of gp120. Coreceptor selectivity, or tropism, depends on the sequence patterns of HIV-1 viral strains, and medications designed to bind and inhibit each coreceptor are currently on the market or in development. Because HIV-1 coreceptor usage must be determined before such a drug is administered, and because experimental assays for this purpose are costly and time-consuming, there is now considerable interest in applying machine learning algorithms directly to classify HIV-1 coreceptor usage from the V3 loop region of gp120. Here, for the first time, several n-gram approaches (n-grams being subsequences formed by a sliding window of size n) are described for representing as feature vectors two large datasets of V3 loop peptide sequences obtained from HIV-1 viruses with known coreceptor usage, and the random forest algorithm is used for classification. These datasets were previously retrieved and used to develop combined sequence-structure based classifiers and sequence-based string kernel classifiers, respectively. Comparing the accuracy reported for those complex classifiers with the performance achieved here using simpler and more computationally efficient n-grams reveals significant advantages while also highlighting limitations.
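As a rough illustration of the sliding-window idea described in the abstract, the sketch below turns a peptide string into a fixed-length vector of n-gram counts. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `ngrams` and `ngram_vector`, the 20-letter amino acid alphabet, the use of raw counts as features, and the toy peptide string are all hypothetical choices made for this example.

```python
from collections import Counter
from itertools import product

# Assumption: the 20 standard amino acids, in a fixed (alphabetical) order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngrams(sequence, n):
    """Return every length-n subsequence produced by a sliding window."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def ngram_vector(sequence, n, alphabet=AMINO_ACIDS):
    """Count each possible n-gram over the alphabet, in a fixed order.

    The vector has len(alphabet) ** n entries. N-grams containing symbols
    outside the alphabet (for example gap characters) are simply not
    represented in the output.
    """
    counts = Counter(ngrams(sequence, n))
    vocabulary = ["".join(p) for p in product(alphabet, repeat=n)]
    return [counts[gram] for gram in vocabulary]


# Toy peptide string used only for illustration, not taken from the datasets.
toy_peptide = "CTRPNNNTRK"
bigrams = ngrams(toy_peptide, 2)
features = ngram_vector(toy_peptide, 2)
print(bigrams)
print(len(features), sum(features))
```

Vectors of this kind, one per V3 loop sequence and paired with the known coreceptor label, could then be passed to any random forest implementation. With n = 2 the vector already has 400 entries, so the choice of n trades off context against sparsity.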
Index Terms
- Sequence-based prediction of HIV-1 coreceptor usage: utility of n-grams for representing gp120 V3 loops