Abstract
The results of data mining tasks are usually improved by reducing the dimensionality of the data. This improvement, however, is harder to achieve when the data size is moderate or large. Although numerous algorithms for accuracy improvement have been proposed, all of them assume that inducing a compact and highly generalized model is difficult. To address this issue, we introduce the Randomized Gini Index (RGI), a novel heuristic function for dimensionality reduction that is particularly applicable to large-scale databases. Apart from removing irrelevant attributes, our algorithm is capable of reducing the level of noise in the data to a great extent, which is a very attractive property for data mining problems. We extensively evaluate its performance through experiments on both artificial and real-world datasets. The outcome of the study shows the suitability and viability of our approach for knowledge discovery in moderate and large datasets.
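The abstract does not spell out how RGI combines the Gini index with randomization, so the sketch below shows only the standard building blocks such a heuristic rests on: Gini impurity, the weighted impurity of a categorical split, and attribute selection restricted to a random candidate subset (the random-subset step is an assumption made for illustration, in the spirit of randomized tree induction; the function names `gini`, `split_gini`, and `best_attribute` are hypothetical, not from the paper).

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gini(rows, labels, attr):
    """Weighted Gini impurity after splitting on a categorical attribute."""
    n = len(rows)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    return sum(len(g) / n * gini(g) for g in groups.values())


def best_attribute(rows, labels, n_candidates, rng=random):
    """Pick the lowest-impurity split among a random subset of attributes.

    Restricting the search to `n_candidates` randomly drawn attributes is
    one common way to inject randomization into Gini-based splitting.
    """
    attrs = list(range(len(rows[0])))
    candidates = rng.sample(attrs, min(n_candidates, len(attrs)))
    return min(candidates, key=lambda a: split_gini(rows, labels, a))


# Toy example: attribute 0 separates the classes perfectly, attribute 1 does not.
rows = [[0, 1], [0, 0], [1, 1], [1, 0]]
labels = ["y", "y", "n", "n"]
chosen = best_attribute(rows, labels, n_candidates=2, rng=random.Random(0))
```

With both attributes in the candidate set, the selection reduces to ordinary Gini-based splitting; drawing fewer candidates than attributes is what makes the choice randomized.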
© 2011 Springer-Verlag Berlin Heidelberg
Mahmood, A.M., Imran, M., Satuluri, N., Kuppa, M.R., Rajesh, V. (2011). An Improved CART Decision Tree for Datasets with Irrelevant Feature. In: Panigrahi, B.K., Suganthan, P.N., Das, S., Satapathy, S.C. (eds) Swarm, Evolutionary, and Memetic Computing. SEMCCO 2011. Lecture Notes in Computer Science, vol 7076. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27172-4_64
Print ISBN: 978-3-642-27171-7
Online ISBN: 978-3-642-27172-4