DOI: 10.1145/2382936.2382944
research-article

Multi-target protein-chemical interaction prediction using task-regularized and boosted multi-task learning

Published: 07 October 2012

ABSTRACT

Interactions between proteins and small-molecule chemicals modulate many protein functions and biological processes, and identifying these interactions is a crucial step in modern drug discovery. Supervised learning methods for predicting protein-chemical interactions (PCI) have been widely studied, but their performance is limited by the scarcity of binding data for many proteins. In addition, many complex diseases, such as Alzheimer's disease and cancers, are associated with multiple target proteins, and chemicals that selectively modulate only one of these targets cannot treat such diseases effectively. In this paper we propose two multi-task learning (MTL) algorithms for predicting active compounds for multiple proteins related to the same disease, some of which may have very few binding examples. The first method optimizes the likelihood of compound features under a Gaussian prior, while the second boosts compound features using a number of independent boosting classifiers. Experimental studies demonstrate significant performance improvements of our MTL methods over baseline methods. Our MTL methods are also able to accurately identify promiscuous compounds that interact with multiple related proteins.
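The abstract does not give the exact formulation of the first method, so the following is only a minimal sketch of one common way to regularize related tasks with a Gaussian prior, not the authors' algorithm. It assumes each protein is a binary classification task with logistic loss, that each task weight vector is a shared vector plus a task-specific deviation, and that both carry zero-mean Gaussian priors (which become L2 penalties in the MAP objective). The names `fit_mtl_logreg`, `objective`, `lam_shared`, and `lam_task`, as well as the synthetic data, are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def objective(tasks, w0, V, lam_shared=0.1, lam_task=1.0):
    """Sum of per-task mean logistic losses plus Gaussian-prior (L2) penalties."""
    total = 0.5 * lam_shared * (w0 @ w0) + 0.5 * lam_task * np.sum(V * V)
    for t, (X, y) in enumerate(tasks):
        z = X @ (w0 + V[t])
        # log(1 + exp(z)) - y*z is the logistic negative log-likelihood, y in {0, 1}
        total += np.mean(np.logaddexp(0.0, z) - y * z)
    return total

def fit_mtl_logreg(tasks, lam_shared=0.1, lam_task=1.0, lr=0.1, n_iter=500):
    """Gradient descent on the objective above.

    tasks: list of (X, y) pairs, one per protein (assumed layout);
           X has shape (n_t, d), y holds 0/1 activity labels.
    Returns the shared weights w0 (d,) and task deviations V (T, d).
    """
    d = tasks[0][0].shape[1]
    w0 = np.zeros(d)
    V = np.zeros((len(tasks), d))
    for _ in range(n_iter):
        g0 = lam_shared * w0      # gradient of the prior on shared weights
        GV = lam_task * V         # gradient of the prior on task deviations
        for t, (X, y) in enumerate(tasks):
            p = sigmoid(X @ (w0 + V[t]))
            g = X.T @ (p - y) / len(y)   # gradient of the mean logistic loss
            g0 += g                      # every task informs the shared weights
            GV[t] += g                   # only task t informs its own deviation
        w0 -= lr * g0
        V -= lr * GV
    return w0, V

# Synthetic illustration: one data-rich task and one with only five examples.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0, 0.5, 0.0, 0.0])

def make_task(n):
    X = rng.normal(size=(n, 5))
    y = (X @ w_true > 0).astype(float)
    return X, y

tasks = [make_task(200), make_task(5)]
w0, V = fit_mtl_logreg(tasks)
```

In this sketch a larger `lam_task` pulls every task toward the shared weights, which is how a task with very few binding examples can borrow strength from data-rich related tasks; a smaller value lets the tasks behave more independently.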

Published in

BCB '12: Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine
October 2012
725 pages
ISBN: 9781450316705
DOI: 10.1145/2382936

Copyright © 2012 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

BCB '12 paper acceptance rate: 33 of 159 submissions, 21%
Overall acceptance rate: 254 of 885 submissions, 29%
