DOI: 10.1145/3384544.3384590
research-article

Improving Classification Accuracy of Random Forest Algorithm Using Unsupervised Discretization with Fuzzy Partition and Fuzzy Set Intervals

Published: 17 April 2020

ABSTRACT

It is known that certain classification algorithms require continuous data to be discretized in order to produce better classification accuracy. Many works have therefore explored pairings of classification algorithms and discretization techniques, yet tree-based classifiers, especially Classification and Regression Trees (CART), still have issues with classification accuracy regardless of which existing discretization technique they are paired with. Fuzzy partitions and fuzzy set intervals are not new in data discretization, but no work has yet explored pairing fuzzy discretization with a tree-based algorithm. This paper discusses an approach that uses fuzzy-based discretization with a member of the tree-based family known as Random Forest, an improved version of CART. In this study, continuous data are identified in a dataset and discretized through fuzzy discretization. Then, 10-fold cross-validation is performed on the transformed dataset using seven well-known classifiers, including the proposed approach. Based on the results, better classification accuracy is achieved when fuzzy discretization is paired with the Random Forest algorithm than when it is paired with CART. In addition, with fuzzy discretization in place, an increase in classification accuracy is obtained compared to the other classification algorithms.
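The abstract does not specify how the fuzzy partition is built, so the following is only a minimal sketch of one common form of unsupervised fuzzy discretization, not the authors' method. It assumes evenly spaced triangular fuzzy sets over the observed range of a feature and assigns each value to the set in which its membership is highest. The function names (`triangular_centers`, `memberships`, `fuzzy_discretize`) and the choice of three sets are hypothetical.

```python
from typing import List, Sequence


def triangular_centers(values: Sequence[float], n_sets: int) -> List[float]:
    """Place n_sets evenly spaced fuzzy-set centers across the observed range."""
    if n_sets < 2:
        raise ValueError("n_sets must be at least 2")
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("feature is constant; nothing to discretize")
    step = (hi - lo) / (n_sets - 1)
    return [lo + i * step for i in range(n_sets)]


def memberships(x: float, centers: Sequence[float]) -> List[float]:
    """Triangular membership of x in each fuzzy set.

    Each triangle peaks at its center and reaches zero at the neighbouring
    centers, so adjacent sets overlap (assumed shape, not from the paper).
    """
    width = centers[1] - centers[0]
    return [max(0.0, 1.0 - abs(x - c) / width) for c in centers]


def fuzzy_discretize(values: Sequence[float], n_sets: int = 3) -> List[int]:
    """Map each continuous value to the index of its highest-membership set."""
    centers = triangular_centers(values, n_sets)
    labels = []
    for x in values:
        mu = memberships(x, centers)
        # Ties go to the lower interval because index() returns the first maximum.
        labels.append(mu.index(max(mu)))
    return labels


if __name__ == "__main__":
    feature = [0.0, 1.0, 2.0, 5.0, 9.0, 10.0]
    print(fuzzy_discretize(feature, n_sets=3))
```

The resulting integer labels would replace the continuous column before training. In a setup like the one described, they could then be passed to a Random Forest implementation (for example scikit-learn's `RandomForestClassifier`) and scored with 10-fold cross-validation; that pairing is an assumption about tooling, not something the abstract states.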


      • Published in

        ICSCA '20: Proceedings of the 2020 9th International Conference on Software and Computer Applications
        February 2020
        382 pages
        ISBN:9781450376655
        DOI:10.1145/3384544

        Copyright © 2020 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
        • Research
        • Refereed limited