A novel pruning approach using expert knowledge for data-specific pruning

Mahmood, Ali Mirza; Kuppa, Mrithyumjaya Rao

doi:10.1007/s00366-011-0214-1


Original Article
Published: 17 March 2011

Volume 28, pages 21–30, (2012)

Engineering with Computers


Abstract

Classification is an important data mining task that discovers hidden knowledge from labeled datasets. Most approaches to pruning assume that all datasets are equally uniform and equally important, so they apply the same pruning to every dataset. However, in real-world classification problems, datasets are not all alike, and applying the same pruning rate to all of them tends to generate decision trees with large size and a high misclassification rate. We approach the problem by first investigating the properties of each dataset and then deriving a data-specific pruning value using expert knowledge; this value is used to design pruning techniques that prune decision trees close to perfection. An efficient pruning algorithm, dubbed EKBP, is proposed. It is very general, as any learning algorithm can be used as the base classifier. We have implemented the proposed solution and experimentally verified its effectiveness on forty real-world benchmark datasets from the UCI machine learning repository. In all these experiments, the proposed approach dramatically reduces the tree size while enhancing or retaining the level of accuracy.
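The abstract does not spell out how EKBP works, so the following is only a minimal sketch of the general idea of pruning with a per-dataset strength, not the authors' algorithm. It uses a standard penalised-error criterion in the spirit of cost-complexity pruning: a subtree is collapsed into a leaf when its error count plus `alpha` times its number of leaves is no better than the cost of a single leaf. The names `Node`, `prune`, `example_tree` and the parameter `alpha` are assumptions made for illustration; `alpha` stands in for a data-specific pruning value, and how such a value would be derived from expert knowledge is not shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A binary decision-tree node annotated with training-set statistics."""
    n_samples: int            # training samples reaching this node
    n_errors_as_leaf: int     # errors if this node predicted its majority class
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def count_leaves(node: Node) -> int:
    """Number of leaves in the subtree rooted at `node`."""
    if node.is_leaf():
        return 1
    return count_leaves(node.left) + count_leaves(node.right)


def subtree_errors(node: Node) -> int:
    """Total training errors made by the leaves of the subtree."""
    if node.is_leaf():
        return node.n_errors_as_leaf
    return subtree_errors(node.left) + subtree_errors(node.right)


def prune(node: Node, alpha: float) -> Node:
    """Bottom-up pruning with a per-dataset strength `alpha` (hypothetical).

    A subtree is replaced by a leaf when
        errors_as_leaf + alpha  <=  subtree_errors + alpha * n_leaves.
    Returns a new tree; the input tree is not modified.
    """
    if node.is_leaf():
        return node
    kept = Node(node.n_samples, node.n_errors_as_leaf,
                prune(node.left, alpha), prune(node.right, alpha))
    cost_kept = subtree_errors(kept) + alpha * count_leaves(kept)
    cost_leaf = node.n_errors_as_leaf + alpha
    if cost_leaf <= cost_kept:
        return Node(node.n_samples, node.n_errors_as_leaf)
    return kept


def example_tree() -> Node:
    """A made-up three-leaf tree used only to exercise `prune`."""
    return Node(100, 40,
                left=Node(60, 10, left=Node(30, 4), right=Node(30, 5)),
                right=Node(40, 5))


if __name__ == "__main__":
    for alpha in (0.5, 2.0, 30.0):
        print(alpha, count_leaves(prune(example_tree(), alpha)))
```

On this toy tree, a small `alpha` keeps all three leaves, a moderate one collapses the weakest split, and a large one reduces the tree to a single leaf. That is the sense in which one fixed pruning strength can under-prune one dataset and over-prune another.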



Acknowledgments

The authors would like to thank Hiroshi Motoda and Huan Liu for their suggestions and help during the PAKDD 2010 conference. The authors would also like to thank the UCI repository of machine learning databases.

Author information

Authors and Affiliations

Acharya Nagarjuna University, Guntur, India
Ali Mirza Mahmood
Vaagdevi College of Engineering, Warangal, India
Mrithyumjaya Rao Kuppa


Corresponding author

Correspondence to Ali Mirza Mahmood.


About this article

Cite this article

Mahmood, A.M., Kuppa, M.R. A novel pruning approach using expert knowledge for data-specific pruning. Engineering with Computers 28, 21–30 (2012). https://doi.org/10.1007/s00366-011-0214-1


Received: 17 July 2010
Accepted: 14 February 2011
Published: 17 March 2011
Issue Date: January 2012
DOI: https://doi.org/10.1007/s00366-011-0214-1

