Abstract
The results of data mining tasks are usually improved by reducing the dimensionality of the data. This improvement, however, is harder to achieve when the data size is moderate or large. Although numerous algorithms for accuracy improvement have been proposed, all of them assume that inducing a compact and highly generalized model is difficult. To address this issue, we introduce the Randomized Gini Index (RGI), a novel heuristic function for dimensionality reduction that is particularly applicable to large-scale databases. Apart from removing irrelevant attributes, our algorithm is capable of reducing the level of noise in the data to a great extent, which is a very attractive property for data mining problems. We extensively evaluate its performance through experiments on both artificial and real-world datasets. The outcome of the study shows the suitability and viability of our approach for knowledge discovery in moderate and large datasets.
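The abstract does not spell out how RGI combines the Gini index with randomization, so the sketch below shows only the standard building blocks such a heuristic rests on: Gini impurity, the weighted impurity of a categorical split, and attribute selection restricted to a random candidate subset (the random-subset step is an assumption made for illustration, in the spirit of randomized tree induction; the function names `gini`, `split_gini`, and `best_attribute` are hypothetical, not from the paper).

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gini(rows, labels, attr):
    """Weighted Gini impurity after splitting on a categorical attribute."""
    n = len(rows)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    return sum(len(g) / n * gini(g) for g in groups.values())


def best_attribute(rows, labels, n_candidates, rng=random):
    """Pick the lowest-impurity split among a random subset of attributes.

    Restricting the search to `n_candidates` randomly drawn attributes is
    one common way to inject randomization into Gini-based splitting.
    """
    attrs = list(range(len(rows[0])))
    candidates = rng.sample(attrs, min(n_candidates, len(attrs)))
    return min(candidates, key=lambda a: split_gini(rows, labels, a))


# Toy example: attribute 0 separates the classes perfectly, attribute 1 does not.
rows = [[0, 1], [0, 0], [1, 1], [1, 0]]
labels = ["y", "y", "n", "n"]
chosen = best_attribute(rows, labels, n_candidates=2, rng=random.Random(0))
```

With both attributes in the candidate set, the selection reduces to ordinary Gini-based splitting; drawing fewer candidates than attributes is what makes the choice randomized.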
© 2011 Springer-Verlag Berlin Heidelberg
Mahmood, A.M., Imran, M., Satuluri, N., Kuppa, M.R., Rajesh, V. (2011). An Improved CART Decision Tree for Datasets with Irrelevant Feature. In: Panigrahi, B.K., Suganthan, P.N., Das, S., Satapathy, S.C. (eds) Swarm, Evolutionary, and Memetic Computing. SEMCCO 2011. Lecture Notes in Computer Science, vol 7076. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27172-4_64
Print ISBN: 978-3-642-27171-7
Online ISBN: 978-3-642-27172-4