ABSTRACT
Interactions between proteins and small-molecule chemicals modulate many protein functions and biological processes, and identifying these interactions is a crucial step in modern drug discovery. Supervised learning methods for predicting protein-chemical interactions (PCI) have been widely studied, but their performance is largely limited by the scarcity of binding data for many proteins. In addition, many complex diseases, such as Alzheimer's disease and cancers, are associated with multiple target proteins, and chemicals that selectively modulate only one of these targets cannot treat such diseases effectively. In this paper we propose two multi-task learning (MTL) algorithms for predicting active compounds for multiple proteins related to the same disease, some of which may have very few binding examples. The first method optimizes the likelihood of compound features under a Gaussian prior, while the second boosts compound features using a number of independent boosting classifiers. Experimental studies demonstrate significant performance improvements of our MTL methods over baseline methods. Our MTL methods are also able to accurately identify promiscuous compounds that interact with multiple related proteins.
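The abstract does not give the exact formulation, so the following is only a minimal sketch of one common way to realize likelihood optimization under a Gaussian prior in a multi-task setting, not the authors' algorithm. It assumes binary compound features, a logistic-regression likelihood, and per-protein weights written as a shared vector plus a task-specific deviation, each with a zero-mean Gaussian prior (equivalent to an L2 penalty). The function names, the penalty strengths `lam_shared` and `lam_task`, and the toy data are all hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_mtl_logreg(tasks, n_features, lam_shared=0.01, lam_task=1.0,
                   lr=0.1, epochs=500):
    """Fit shared weights w0 and per-task deviations V by gradient descent.

    tasks: list of (X, y) pairs, one per protein; X is a list of feature
    vectors and y a list of 0/1 activity labels (assumed encoding).
    Task t scores a compound x with the weight vector w0 + V[t].
    """
    w0 = [0.0] * n_features
    V = [[0.0] * n_features for _ in tasks]
    n_total = sum(len(y) for _, y in tasks)
    for _ in range(epochs):
        # Gradient of the Gaussian-prior (L2) terms.
        g0 = [lam_shared * w for w in w0]
        gV = [[lam_task * v for v in vt] for vt in V]
        # Gradient of the averaged logistic loss, accumulated over all tasks.
        for t, (X, y) in enumerate(tasks):
            for x, label in zip(X, y):
                z = sum((w0[j] + V[t][j]) * x[j] for j in range(n_features))
                err = (sigmoid(z) - label) / n_total
                for j in range(n_features):
                    g0[j] += err * x[j]
                    gV[t][j] += err * x[j]
        for j in range(n_features):
            w0[j] -= lr * g0[j]
            for t in range(len(tasks)):
                V[t][j] -= lr * gV[t][j]
    return w0, V


def predict_proba(w0, v, x):
    # Probability that compound x is active for the task with deviation v.
    return sigmoid(sum((a + b) * xi for a, b, xi in zip(w0, v, x)))


# Toy data (hypothetical): features are [substructure_1, substructure_2, bias].
active, inactive = [1, 0, 1], [0, 1, 1]
task_a = ([active] * 4 + [inactive] * 4, [1] * 4 + [0] * 4)  # data-rich protein
task_b = ([active, inactive], [1, 0])                        # data-poor protein
w0, V = fit_mtl_logreg([task_a, task_b], n_features=3)
p_active_b = predict_proba(w0, V[1], active)
p_inactive_b = predict_proba(w0, V[1], inactive)
```

In this sketch the stronger penalty on the task-specific deviations keeps each protein's model close to the shared weights, which is how a protein with very few binding examples can borrow information from related, better-characterized proteins.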
Index Terms
- Multi-target protein-chemical interaction prediction using task-regularized and boosted multi-task learning