Hybrid ensemble approach for classification

Applied Intelligence

Abstract

This paper presents a novel hybrid ensemble approach for classification in medical databases. The approach clusters features extracted from medical databases into soft clusters using unsupervised learning strategies and fuses the resulting decisions using parallel data fusion techniques. The idea is to observe associations among the features and to fuse the decisions made by the learning algorithms in order to find the strong clusters that can have an impact on overall classification accuracy. Novel techniques such as parallel neural-based strong clusters fusion and parallel neural network based data fusion are proposed; they allow various clustering algorithms to be integrated into the hybrid ensemble approach. The approach has been implemented and evaluated on benchmark databases such as the Digital Database for Screening Mammography, Wisconsin Breast Cancer, and Pima Indians Diabetes. A comparative performance analysis of the proposed approach against existing approaches for knowledge extraction and classification is presented. The experimental results show that the proposed approach improves classification accuracy on these benchmark medical databases.
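To make the general idea of clustering features into soft clusters and then fusing the decisions more concrete, the following is a minimal sketch and not the authors' implementation. It assumes plain k-means as the clustering algorithm, inverse-distance weights as the soft memberships, and a purity-weighted vote as the fusion step. That vote is a simple stand-in for the neural network based fusion named in the abstract. All function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def kmeans(points, k, seed, iters=20):
    """Plain k-means; returns the list of k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def soft_memberships(point, centroids):
    """Inverse-distance memberships that sum to 1 (an assumed soft assignment)."""
    weights = [1.0 / (math.dist(point, c) + 1e-9) for c in centroids]
    total = sum(weights)
    return [w / total for w in weights]


def label_clusters(points, labels, centroids):
    """Give each cluster its majority class and a purity score in [0, 1]."""
    votes = [Counter() for _ in centroids]
    for p, y in zip(points, labels):
        m = soft_memberships(p, centroids)
        votes[m.index(max(m))][y] += 1
    info = []
    for v in votes:
        if v:
            cls, count = v.most_common(1)[0]
            info.append((cls, count / sum(v.values())))
        else:
            info.append((None, 0.0))  # empty cluster: no vote
    return info


def fuse_predict(point, models):
    """Fuse the decisions of several clusterings with a purity-weighted vote."""
    score = Counter()
    for centroids, info in models:
        for m, (cls, purity) in zip(soft_memberships(point, centroids), info):
            if cls is not None:
                score[cls] += m * purity  # purer ("strong") clusters count more
    return score.most_common(1)[0][0]


# Hypothetical toy data: two well-separated classes in 2-D.
X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
y = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]

# Build an ensemble of clusterings that differ in k and in random seed.
models = []
for k, seed in [(2, 0), (3, 1)]:
    cents = kmeans(X, k, seed)
    models.append((cents, label_clusters(X, y, cents)))

prediction = fuse_predict((0.1, 0.1), models)
print(prediction)
```

In this sketch, weighting each cluster's vote by its purity is one simple way to let strong clusters dominate the fused decision. A trained fusion network, as the abstract describes, would learn such weights from data rather than fix them by a rule.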

Author information

Correspondence to Brijesh Verma.

About this article

Cite this article

Verma, B., Hassan, S.Z. Hybrid ensemble approach for classification. Appl Intell 34, 258–278 (2011). https://doi.org/10.1007/s10489-009-0194-7

  • DOI: https://doi.org/10.1007/s10489-009-0194-7
