Abstract
Virtual screening (VS) methods fall into two categories: structure-based virtual screening (SBVS), which relies on knowledge of the target’s 3D structure, and ligand-based virtual screening (LBVS), which uses information from at least one known ligand. However, activity prediction for new bioactive molecules in highly diverse data sets remains insufficiently accurate, and the results are not comprehensive, because only one approach is applied at a time. This paper proposes applying a boosting ensemble method, MultiBoost, to LBVS, evaluated on a well-known chemoinformatics database, the MDL Drug Data Report (MDDR). The experimental results were compared with those of Support Vector Machines (SVM). The results showed that MultiBoost ensemble classifiers improved the prediction of new bioactive molecules in highly diverse data.
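To make the MultiBoost idea concrete, the following is a minimal sketch, not the implementation used in the paper. MultiBoost combines AdaBoost with wagging: boosting runs in sub-committees, and the instance weights are re-drawn at random at the start of each new sub-committee. Everything specific here is an assumption made for illustration: the one-bit decision stump as base learner, the invented 3-bit "fingerprints" (not MDDR data), the function names, and the simplification of skipping a round when a stump is no better than chance.

```python
import math
import random

def train_stump(X, y, w):
    """Return (weighted error, bit index, polarity) of the best one-bit rule."""
    total = sum(w)
    best = None
    for j in range(len(X[0])):
        # Weighted error of the rule "bit j set => active (+1)".
        err = sum(wi for xi, yi, wi in zip(X, y, w)
                  if (1 if xi[j] else -1) != yi) / total
        # The inverted rule has the complementary error.
        for polarity, e in ((1, err), (-1, 1.0 - err)):
            if best is None or e < best[0]:
                best = (e, j, polarity)
    return best

def stump_predict(stump, x):
    _, j, polarity = stump
    return polarity if x[j] else -polarity

def wagging_weights(n, rng):
    """Draw random instance weights (exponential draws), rescaled to sum to n."""
    raw = [-math.log(1.0 - rng.random()) for _ in range(n)]
    scale = n / sum(raw)
    return [r * scale for r in raw]

def multiboost(X, y, rounds=6, committees=2, seed=0):
    """Simplified MultiBoost: AdaBoost sub-committees with wagging resets."""
    rng = random.Random(seed)
    n = len(X)
    w = [1.0] * n
    per_committee = math.ceil(rounds / committees)
    ensemble = []  # (vote weight, stump) pairs
    for t in range(rounds):
        if t > 0 and t % per_committee == 0:
            w = wagging_weights(n, rng)  # a new sub-committee starts here
        stump = train_stump(X, y, w)
        err = stump[0]
        if err >= 0.5:
            # No better than chance: skip this round and re-draw the weights.
            w = wagging_weights(n, rng)
            continue
        if err < 1e-12:
            # Perfect stump: give it a large fixed vote and re-draw the weights.
            ensemble.append((math.log(1e10), stump))
            w = wagging_weights(n, rng)
            continue
        ensemble.append((math.log((1.0 - err) / err), stump))
        # AdaBoost-style update: the misclassified and the correctly
        # classified instances each end up with half of the total weight.
        w = [max(wi / (2 * err) if stump_predict(stump, xi) != yi
                 else wi / (2 * (1 - err)), 1e-8)
             for xi, yi, wi in zip(X, y, w)]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps: +1 (active) or -1 (inactive)."""
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

# Invented 3-bit fingerprints: "active" (+1) when at least two bits are set.
X = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [1 if sum(x) >= 2 else -1 for x in X]

model = multiboost(X, y, rounds=6, committees=2, seed=0)
accuracy = sum(predict(model, x) == yi for x, yi in zip(X, y)) / len(X)
print(f"stumps kept: {len(model)}, training accuracy: {accuracy:.2f}")
```

No single stump can express the "two of three bits" rule, but a weighted vote over several stumps can, which is the motivation for using an ensemble. A real LBVS experiment would replace the toy data with molecular fingerprints and activity labels, and would evaluate on held-out compounds rather than on the training set.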
Acknowledgments
This work is supported by the Ministry of Higher Education (MOHE) and Research Management Centre (RMC) at the Universiti Teknologi Malaysia (UTM) under the Research University Grant Category (VOT Q.J130000.2528.16H74).
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Hashim, H., Saeed, F. (2017). Prediction of New Bioactive Molecules of Chemical Compound Using Boosting Ensemble Methods. In: Mohamed, A., Berry, M., Yap, B. (eds) Soft Computing in Data Science. SCDS 2017. Communications in Computer and Information Science, vol 788. Springer, Singapore. https://doi.org/10.1007/978-981-10-7242-0_22
DOI: https://doi.org/10.1007/978-981-10-7242-0_22
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7241-3
Online ISBN: 978-981-10-7242-0
eBook Packages: Computer Science, Computer Science (R0)