Abstract
We investigate the classification performance of circular fingerprints in combination with the Naive Bayes Classifier (MP2D), Inductive Logic Programming (ILP) and Support Vector Inductive Logic Programming (SVILP) on a standard molecular benchmark dataset comprising 11 activity classes and about 102,000 structures. The Naive Bayes Classifier treats features independently, whereas ILP combines structural fragments to create new features with higher predictive power. SVILP is a recently presented method that adds a support vector machine after common ILP procedures. The performance of the methods is evaluated via a number of statistical measures, namely recall, specificity, precision, F-measure, Matthews Correlation Coefficient, area under the Receiver Operating Characteristic (ROC) curve and enrichment factor (EF). According to the F-measure, which takes both recall and precision into account, SVILP is the superior method for seven of the 11 classes. The Bayes Classifier gives the best recall for eight of the 11 targets, but has much lower precision, specificity and F-measure. The SVILP model, on the other hand, has the highest recall for only three of the 11 classes, but generally far superior specificity and precision. To evaluate the statistical significance of the SVILP superiority, we employ McNemar’s test, which shows that SVILP performs significantly (p < 0.05) better than both other methods for six of the 11 activity classes, while being superior at a lower level of significance for three of the remaining classes. While the Bayes Classifier was previously shown to perform very well in molecular classification studies, these results suggest that SVILP is able to extract additional knowledge from the data, thus improving classification results further.
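To make the evaluation measures named above concrete, the following minimal Python sketch computes recall, specificity, precision, F-measure, the Matthews Correlation Coefficient and an enrichment factor from a binary confusion matrix, together with a McNemar statistic for comparing two classifiers on the same test set. The function names and the example counts are illustrative assumptions, not taken from the study. The enrichment factor is written here in one common form (hit rate among predicted actives divided by the hit rate in the whole set), and McNemar's test is shown with the continuity correction; the abstract does not state which variants the authors used.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Threshold-based measures for one activity class.

    Assumes every denominator is non-zero (at least one active, one
    inactive, and one predicted active); no guarding is done here.
    """
    recall = tp / (tp + fn)            # fraction of actives retrieved
    specificity = tn / (tn + fp)       # fraction of inactives rejected
    precision = tp / (tp + fp)         # fraction of predictions that are active
    f_measure = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "mcc": mcc,
    }


def enrichment_factor(tp, fp, tn, fn):
    """Assumed definition: precision divided by the base rate of actives."""
    total = tp + fp + tn + fn
    return (tp / (tp + fp)) / ((tp + fn) / total)


def mcnemar_chi2(b, c):
    """McNemar statistic with continuity correction.

    b: cases classifier A got right and classifier B got wrong.
    c: cases classifier A got wrong and classifier B got right.
    """
    return (abs(b - c) - 1) ** 2 / (b + c)


def mcnemar_p_value(b, c):
    """Upper-tail p-value of a chi-square distribution with 1 degree of freedom."""
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(mcnemar_chi2(b, c) / 2))


if __name__ == "__main__":
    # Hypothetical counts for a single activity class.
    print(confusion_metrics(tp=40, fp=10, tn=940, fn=10))
    print(enrichment_factor(tp=40, fp=10, tn=940, fn=10))
    # Hypothetical discordant counts for two classifiers.
    print(mcnemar_chi2(b=25, c=10), mcnemar_p_value(b=25, c=10))
```

Only the discordant pairs (b and c) enter McNemar's test, which is why it suits comparing classifiers evaluated on identical test compounds: cases that both methods get right or both get wrong carry no information about which method is better.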












Acknowledgements
E.O. Cannon, R.C. Glen and J.B.O. Mitchell thank Unilever plc and the EPSRC for funding. A. Bender thanks the Education Office of the Novartis Institutes for BioMedical Research for a postdoctoral fellowship. A. Amini, M.J.E. Sternberg and S.H. Muggleton thank the BBSRC for funding.
Cite this article
Cannon, E.O., Amini, A., Bender, A. et al. Support vector inductive logic programming outperforms the naive Bayes classifier and inductive logic programming for the classification of bioactive chemical compounds. J Comput Aided Mol Des 21, 269–280 (2007). https://doi.org/10.1007/s10822-007-9113-3
DOI: https://doi.org/10.1007/s10822-007-9113-3