
Machine learning: a review of classification and combining techniques


Abstract

Supervised classification is one of the tasks most frequently carried out by so-called Intelligent Systems. Accordingly, a large number of techniques have been developed, drawing on Artificial Intelligence (logic-based and perceptron-based techniques) and Statistics (Bayesian networks and instance-based techniques). The goal of supervised learning is to build a concise model of the distribution of class labels in terms of predictor features. The resulting classifier is then used to assign class labels to test instances in which the values of the predictor features are known but the value of the class label is unknown. This paper describes various classification algorithms and a recent approach to improving classification accuracy: ensembles of classifiers.
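
As a concrete illustration of the workflow the abstract describes, the snippet below is a minimal sketch in Python: it fits a single classifier on labelled training data, then an ensemble of classifiers, and assigns class labels to held-out test instances. The use of scikit-learn, the iris dataset, and all parameter values are our own illustrative choices, not part of the paper, and bagging is only one of the combining techniques it surveys.

```python
# Sketch of supervised classification and a classifier ensemble.
# Assumes scikit-learn is installed; dataset and parameters are illustrative.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score

# Predictor features X and class labels y; hold out test instances whose
# labels the classifier must predict.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# A single base classifier (a logic-based technique in the paper's taxonomy).
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# An ensemble of classifiers: bagging trains many trees on bootstrap samples
# of the training set and combines their votes, often improving accuracy.
bagged = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=25, random_state=0
).fit(X_train, y_train)

for name, model in [("single tree", tree), ("bagged ensemble", bagged)]:
    print(name, accuracy_score(y_test, model.predict(X_test)))
```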



Author information


Correspondence to S. B. Kotsiantis.



Cite this article

Kotsiantis, S.B., Zaharakis, I.D. & Pintelas, P.E. Machine learning: a review of classification and combining techniques. Artif Intell Rev 26, 159–190 (2006). https://doi.org/10.1007/s10462-007-9052-3

