Prediction of New Bioactive Molecules of Chemical Compound Using Boosting Ensemble Methods

  • Conference paper
Soft Computing in Data Science (SCDS 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 788)

Abstract

Virtual screening (VS) methods can be categorized into structure-based virtual screening (SBVS), which relies on knowledge of the target’s 3D structure, and ligand-based virtual screening (LBVS), which uses information from at least one known ligand. However, activity prediction for new bioactive molecules in highly diverse data sets remains insufficiently accurate, and the results are not comprehensive enough because only one approach is applied at a time. This paper proposes applying the boosting ensemble method MultiBoost to LBVS, using the well-known chemoinformatics database, the MDL Drug Data Report (MDDR). The experimental results were compared with those of Support Vector Machines (SVM). The final outcomes showed that MultiBoost ensemble classifiers improved the effectiveness of predicting new bioactive molecules in highly diverse data.

Acknowledgments

This work is supported by the Ministry of Higher Education (MOHE) and Research Management Centre (RMC) at the Universiti Teknologi Malaysia (UTM) under the Research University Grant Category (VOT Q.J130000.2528.16H74).

Author information

Corresponding author

Correspondence to Faisal Saeed.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Hashim, H., Saeed, F. (2017). Prediction of New Bioactive Molecules of Chemical Compound Using Boosting Ensemble Methods. In: Mohamed, A., Berry, M., Yap, B. (eds) Soft Computing in Data Science. SCDS 2017. Communications in Computer and Information Science, vol 788. Springer, Singapore. https://doi.org/10.1007/978-981-10-7242-0_22

  • DOI: https://doi.org/10.1007/978-981-10-7242-0_22

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-7241-3

  • Online ISBN: 978-981-10-7242-0

  • eBook Packages: Computer Science, Computer Science (R0)
