ABSTRACT
It is known that certain classification algorithms require continuous data to be discretized in order to produce better classification accuracy. Hence, many works have explored pairings of classification algorithms with discretization techniques, yet tree-based classifiers, especially Classification and Regression Trees (CART), still have issues with classification accuracy regardless of which existing discretization technique they are paired with. Fuzzy partitions and fuzzy set intervals are not new in data discretization, but no work has yet explored the pairing of fuzzy discretization with tree-based algorithms. This paper discusses an approach that pairs fuzzy-based discretization with Random Forest, a tree-based ensemble algorithm that builds on CART. In this study, continuous attributes are identified in a dataset and discretized using fuzzy discretization. Then, 10-fold cross-validation is performed on the transformed dataset with seven well-known classifiers, including the proposed approach. Based on the results, better classification accuracy is achieved when fuzzy discretization is paired with the Random Forest algorithm than when it is paired with CART. In addition, in the presence of the fuzzy discretization technique, an increase in classification accuracy is obtained compared to the other classification algorithms.
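The abstract does not specify the exact fuzzy partition used, so the following is only a minimal sketch of one common form of unsupervised fuzzy discretization. It assumes evenly spaced, overlapping triangular fuzzy sets over the observed range of an attribute, and it assigns each value to the set in which its membership degree is highest. The function names (`triangular_membership`, `fuzzy_partition`, `fuzzy_discretize`) and the default of three sets are hypothetical choices for illustration, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def triangular_membership(x: float, left: float, peak: float, right: float) -> float:
    """Membership degree of x in a triangular fuzzy set (left, peak, right)."""
    if x == peak:
        return 1.0
    if left < x < peak:
        return (x - left) / (peak - left)
    if peak < x < right:
        return (right - x) / (right - peak)
    return 0.0


def fuzzy_partition(lo: float, hi: float, n_sets: int) -> List[Tuple[float, float, float]]:
    """Evenly spaced triangular sets over [lo, hi]; neighbours overlap by one step."""
    if n_sets < 2:
        raise ValueError("n_sets must be at least 2")
    step = (hi - lo) / (n_sets - 1)
    return [(lo + (i - 1) * step, lo + i * step, lo + (i + 1) * step)
            for i in range(n_sets)]


def fuzzy_discretize(values: Sequence[float], n_sets: int = 3) -> List[int]:
    """Replace each continuous value with the index of its best-matching fuzzy set."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # A constant attribute carries no information; put everything in set 0.
        return [0] * len(values)
    sets = fuzzy_partition(lo, hi, n_sets)
    labels = []
    for v in values:
        degrees = [triangular_membership(v, *s) for s in sets]
        # Ties are resolved in favour of the lower-indexed set.
        labels.append(max(range(n_sets), key=degrees.__getitem__))
    return labels


if __name__ == "__main__":
    print(fuzzy_discretize([0.0, 1.0, 2.0, 5.0, 9.0, 10.0], n_sets=3))
```

Under these assumptions, the discretized column would replace the original continuous attribute before a classifier such as Random Forest is trained and evaluated with 10-fold cross-validation.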
Index Terms
- Improving Classification Accuracy of Random Forest Algorithm Using Unsupervised Discretization with Fuzzy Partition and Fuzzy Set Intervals