Improving Classification Accuracy of Random Forest Algorithm Using Unsupervised Discretization with Fuzzy Partition and Fuzzy Set Intervals

Authors:
Muhammad Nur Fikri Hishamuddin, Department of Computer and Information Science, Universiti Teknologi PETRONAS, Seri Iskandar, Perak, Malaysia
Mohd Fadzil Hassan, Department of Computer and Information Science, Universiti Teknologi PETRONAS, Seri Iskandar, Perak, Malaysia
Ainul Akmar Mokhtar, Department of Mechanical Engineering, Universiti Teknologi PETRONAS, Seri Iskandar, Perak, Malaysia

ICSCA '20: Proceedings of the 2020 9th International Conference on Software and Computer Applications, February 2020, Pages 99–104
https://doi.org/10.1145/3384544.3384590

Published: 17 April 2020

ABSTRACT

It is known that certain classification algorithms produce better classification accuracy when continuous data are discretized. Many works have therefore explored pairings of classification algorithms and discretization techniques, yet tree-based classifiers, especially Classification and Regression Trees (CART), still have an issue with classification accuracy regardless of which existing discretization technique they are paired with. Fuzzy partitions and fuzzy set intervals are not new in data discretization, but no work has yet explored pairing fuzzy discretization with a tree-based algorithm. This paper discusses an approach that pairs fuzzy-based discretization with Random Forest, a tree-based algorithm that improves on CART. In this study, continuous data are identified in a dataset and discretized through fuzzy discretization. Then, 10-fold cross-validation is performed on the transformed dataset using seven well-known classifiers, including the proposed approach. Based on the results, better classification accuracy is achieved when fuzzy discretization is paired with the Random Forest algorithm than with CART. In addition, with fuzzy discretization present, an increase in classification accuracy is obtained compared to other classification algorithms.
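The discretization step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the shape of the membership functions, the number of fuzzy sets, or how a value is assigned to an interval, so everything below is an assumption made for illustration: evenly spaced triangular fuzzy sets over the observed range of a feature, and assignment of each value to the set in which its membership is highest. The function names `triangular_partition`, `memberships`, and `fuzzy_discretize` are hypothetical and are not taken from the paper. The sketch covers only the unsupervised discretization step, not the 10-fold cross-validation or the classifiers.

```python
from typing import List, Sequence


def triangular_partition(lo: float, hi: float, k: int) -> List[float]:
    """Return k evenly spaced peak positions covering [lo, hi] (assumed layout)."""
    if k < 2:
        raise ValueError("need at least two fuzzy sets")
    step = (hi - lo) / (k - 1)
    return [lo + i * step for i in range(k)]


def memberships(x: float, peaks: Sequence[float]) -> List[float]:
    """Triangular membership of x in each fuzzy set; neighbouring sets overlap."""
    width = peaks[1] - peaks[0]
    if width == 0:
        # Constant feature: every value belongs fully to the first set.
        return [1.0] + [0.0] * (len(peaks) - 1)
    return [max(0.0, 1.0 - abs(x - p) / width) for p in peaks]


def fuzzy_discretize(values: Sequence[float], k: int = 3) -> List[int]:
    """Map each continuous value to the index of its highest-membership set.

    Unsupervised: only the feature values are used, never the class labels.
    """
    peaks = triangular_partition(min(values), max(values), k)
    labels = []
    for x in values:
        mu = memberships(x, peaks)
        labels.append(mu.index(max(mu)))
    return labels


# Example: six values of one continuous feature, three fuzzy sets.
feature = [4.3, 5.0, 5.8, 6.4, 7.1, 7.9]
print(fuzzy_discretize(feature, k=3))  # prints [0, 0, 1, 1, 2, 2]
```

Under these assumptions, the integer labels would replace the continuous column before the transformed dataset is passed to a classifier such as Random Forest or CART. With evenly spaced triangular sets, the memberships of any value inside the range sum to 1, which is one common way to build a fuzzy partition; other membership shapes would change the interval boundaries.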

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Classification and regression trees
2. Information systems
  1. Information systems applications
    1. Data mining

Published in

ICSCA '20: Proceedings of the 2020 9th International Conference on Software and Computer Applications
February 2020
382 pages
ISBN: 9781450376655
DOI: 10.1145/3384544

Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Classification
Discretization
Fuzzy
Random Forest
Qualifiers
- research-article
- Research
- Refereed limited