
Machine learning: a review of classification and combining techniques


Abstract

Supervised classification is one of the tasks most frequently carried out by so-called Intelligent Systems. Accordingly, a large number of techniques have been developed, drawing on Artificial Intelligence (logic-based and perceptron-based techniques) and Statistics (Bayesian networks and instance-based techniques). The goal of supervised learning is to build a concise model of the distribution of class labels in terms of predictor features. The resulting classifier is then used to assign class labels to test instances in which the values of the predictor features are known but the value of the class label is unknown. This paper describes various classification algorithms and a recent approach to improving classification accuracy: ensembles of classifiers.
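
As a concrete illustration of the workflow the abstract describes, the snippet below is a minimal sketch in Python: it fits a single classifier on labelled training data, then an ensemble of classifiers, and assigns class labels to held-out test instances. The use of scikit-learn, the iris dataset, and all parameter values are our own illustrative choices, not part of the paper, and bagging is only one of the combining techniques it surveys.

```python
# Sketch of supervised classification and a classifier ensemble.
# Assumes scikit-learn is installed; dataset and parameters are illustrative.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score

# Predictor features X and class labels y; hold out test instances whose
# labels the classifier must predict.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# A single base classifier (a logic-based technique in the paper's taxonomy).
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# An ensemble of classifiers: bagging trains many trees on bootstrap samples
# of the training set and combines their votes, often improving accuracy.
bagged = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=25, random_state=0
).fit(X_train, y_train)

for name, model in [("single tree", tree), ("bagged ensemble", bagged)]:
    print(name, accuracy_score(y_test, model.predict(X_test)))
```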



Author information


Correspondence to S. B. Kotsiantis.



Cite this article

Kotsiantis, S.B., Zaharakis, I.D. & Pintelas, P.E. Machine learning: a review of classification and combining techniques. Artif Intell Rev 26, 159–190 (2006). https://doi.org/10.1007/s10462-007-9052-3

