Abstract
Many factors influence the behavior of machine learning classifiers, so determining the best classifier for a given problem is not an easy task. One way to tackle this problem is to compare classifiers empirically across several performance measures. In this paper, the behavior of machine learning classifiers is studied using the Rattle tool, a graphical user interface (GUI) for the R environment that supports data mining with six model types: decision tree, boosting, random forest, support vector machine, logistic regression (logit), and neural network. The study uses both simulated and real data, and classifier behavior is assessed in terms of accuracy, ROC curve, and modeling time. On the simulated data, the algorithms fall into three groups by accuracy: logit, neural network, and support vector machine in the first group; boosting and random forest in the second; and decision tree in the third. On the real data, boosting achieves the highest training accuracy, while the neural network achieves the highest testing accuracy. Overall, the support vector machine and the neural network are the two best classifiers on both the simulated and the real data.
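The chapter itself contains no code, so the following is a minimal R sketch of the kind of comparison described in the abstract, calling the underlying packages that Rattle wraps (rpart, randomForest, xgboost, kernlab, nnet, and stats::glm) directly rather than through the GUI. The simulated data set, the 70/30 split, and all tuning values (boosting rounds, hidden-layer size) are illustrative assumptions, not the paper's design.

```r
library(rpart)          # decision tree
library(randomForest)   # random forest
library(xgboost)        # gradient boosting
library(kernlab)        # support vector machine (ksvm)
library(nnet)           # single-hidden-layer neural network
set.seed(2018)

## Simulated binary-response data and a 70/30 train/test split (illustrative only).
n   <- 2000
x1  <- rnorm(n); x2 <- rnorm(n)
dat <- data.frame(x1, x2, y = factor(rbinom(n, 1, plogis(1.5 * x1 - x2))))
idx <- sample(n, 0.7 * n)
tr  <- dat[idx, ]; te <- dat[-idx, ]
xm  <- function(d) as.matrix(d[, c("x1", "x2")])   # numeric matrix for xgboost

accuracy <- function(pred, obs) 100 * mean(as.character(pred) == as.character(obs))

## One fit() / cls() pair per model type offered in Rattle.
models <- list(
  tree   = list(fit = function() rpart(y ~ ., data = tr),
                cls = function(m) predict(m, te, type = "class")),
  forest = list(fit = function() randomForest(y ~ ., data = tr),
                cls = function(m) predict(m, te)),
  boost  = list(fit = function() xgb.train(params  = list(objective = "binary:logistic"),
                                           data    = xgb.DMatrix(xm(tr),
                                                                 label = as.integer(tr$y) - 1),
                                           nrounds = 50),
                cls = function(m) as.integer(predict(m, xgb.DMatrix(xm(te))) > 0.5)),
  svm    = list(fit = function() ksvm(y ~ ., data = tr),
                cls = function(m) predict(m, te)),
  logit  = list(fit = function() glm(y ~ ., data = tr, family = binomial),
                cls = function(m) as.integer(predict(m, te, type = "response") > 0.5)),
  neural = list(fit = function() nnet(y ~ ., data = tr, size = 10, trace = FALSE),
                cls = function(m) predict(m, te, type = "class"))
)

## Fit each model, time it, and report test-set accuracy.
for (nm in names(models)) {
  t0 <- system.time(m <- models[[nm]]$fit())["elapsed"]
  cat(sprintf("%-6s  test accuracy %5.1f%%  fit time %6.2f s\n",
              nm, accuracy(models[[nm]]$cls(m), te$y), t0))
}
```

Training accuracy and AUC could be collected in the same loop and the whole procedure repeated over replications to build summary tables like those in Appendix A.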
Acknowledgment
The authors are grateful to Institut Teknologi Sepuluh Nopember, which partly supported this work through Research Grant contract number 1192/PKS/ITS/2018 (1302/PKS/ITS/2018).
Appendices
Appendix A. Summary of Results
Training data accuracy (%)

| Replication | Tree | Forest | Boost | SVM | Logit | Neural |
|---|---|---|---|---|---|---|
| 1 | 86.9 | 100 | 92.8 | 91.5 | 91.4 | 91.5 |
| 2 | 87 | 100 | 92.7 | 91.6 | 91.5 | 91.5 |
| 3 | 86.7 | 100 | 92.7 | 91.3 | 91.3 | 91.3 |
| 4 | 87 | 100 | 92.6 | 91.3 | 91.4 | 91.4 |
| 5 | 86.3 | 100 | 92.7 | 91.5 | 91.4 | 91.5 |
| 6 | 86.8 | 100 | 92.7 | 91.4 | 91.3 | 91.3 |
| 7 | 85.8 | 100 | 92.8 | 91.5 | 91.5 | 91.5 |
| 8 | 86.6 | 100 | 92.7 | 91.4 | 91.3 | 91.4 |
| 9 | 86.7 | 100 | 92.6 | 91.4 | 91.3 | 91.3 |
| 10 | 86.8 | 100 | 92.8 | 91.6 | 91.5 | 91.5 |
| Mean | 86.660 | 100.000 | 92.710 | 91.450 | 91.390 | 91.420 |
| sd | 0.366 | 0.000 | 0.074 | 0.108 | 0.088 | 0.092 |
Testing data accuracy (%)

| Replication | Tree | Forest | Boost | SVM | Logit | Neural |
|---|---|---|---|---|---|---|
| 1 | 86.4 | 90.6 | 90.8 | 91 | 91.2 | 91.1 |
| 2 | 86.2 | 90.6 | 90.6 | 91 | 91 | 91 |
| 3 | 86.3 | 91 | 91.2 | 91.5 | 91.7 | 91.7 |
| 4 | 86.6 | 91.1 | 91 | 91.4 | 91.8 | 91.7 |
| 5 | 86.2 | 90.7 | 90.7 | 91.1 | 91.3 | 91.3 |
| 6 | 86.2 | 90.6 | 90.7 | 91.2 | 91.3 | 91.4 |
| 7 | 85.3 | 90.9 | 91 | 91.4 | 91.5 | 91.5 |
| 8 | 86.3 | 90.6 | 90.7 | 91.2 | 91.2 | 91.2 |
| 9 | 86.8 | 90.9 | 91.1 | 91.4 | 91.7 | 91.5 |
| 10 | 86.7 | 91.1 | 91 | 91.3 | 91.4 | 91.4 |
| Mean | 86.300 | 90.810 | 90.880 | 91.250 | 91.410 | 91.380 |
| sd | 0.414 | 0.213 | 0.204 | 0.178 | 0.260 | 0.235 |
Area under the curve, training data

| Replication | Tree | Forest | Boost | SVM | Logit | Neural |
|---|---|---|---|---|---|---|
| 1 | 0.8788 | 1 | 0.9751 | 0.9524 | 0.9661 | 0.9664 |
| 2 | 0.8740 | 1 | 0.9751 | 0.953 | 0.9664 | 0.9666 |
| 3 | 0.8690 | 1 | 0.9748 | 0.952 | 0.9653 | 0.9655 |
| 4 | 0.8785 | 1 | 0.9747 | 0.9528 | 0.9658 | 0.966 |
| 5 | 0.8673 | 1 | 0.9755 | 0.9547 | 0.9664 | 0.9667 |
| 6 | 0.8820 | 1 | 0.9749 | 0.954 | 0.9657 | 0.9659 |
| 7 | 0.8636 | 1 | 0.9756 | 0.9538 | 0.9665 | 0.9667 |
| 8 | 0.8779 | 1 | 0.9752 | 0.9547 | 0.9665 | 0.9667 |
| 9 | 0.8742 | 1 | 0.9749 | 0.9521 | 0.9658 | 0.9661 |
| 10 | 0.8733 | 1 | 0.975 | 0.9531 | 0.9662 | 0.9664 |
| Mean | 0.8739 | 1 | 0.9751 | 0.9533 | 0.9661 | 0.9663 |
| sd | 0.0058 | 0 | 0.0003 | 0.0010 | 0.0004 | 0.0004 |
Area under the curve, testing data

| Replication | Tree | Forest | Boost | SVM | Logit | Neural |
|---|---|---|---|---|---|---|
| 1 | 0.8697 | 0.9576 | 0.9615 | 0.9497 | 0.9647 | 0.9643 |
| 2 | 0.8615 | 0.9589 | 0.961 | 0.9483 | 0.9642 | 0.9639 |
| 3 | 0.8631 | 0.9623 | 0.9649 | 0.9548 | 0.9676 | 0.9673 |
| 4 | 0.8757 | 0.9625 | 0.9646 | 0.9544 | 0.968 | 0.9678 |
| 5 | 0.8597 | 0.9596 | 0.962 | 0.9529 | 0.9655 | 0.9654 |
| 6 | 0.8782 | 0.9597 | 0.9624 | 0.9512 | 0.9655 | 0.9654 |
| 7 | 0.857 | 0.9608 | 0.964 | 0.9529 | 0.9668 | 0.9665 |
| 8 | 0.873 | 0.9582 | 0.9603 | 0.9489 | 0.9642 | 0.964 |
| 9 | 0.8782 | 0.9611 | 0.9634 | 0.9512 | 0.9665 | 0.9662 |
| 10 | 0.8727 | 0.9621 | 0.9642 | 0.9534 | 0.9672 | 0.9669 |
| Mean | 0.869 | 0.960 | 0.963 | 0.952 | 0.966 | 0.966 |
| sd | 0.008 | 0.002 | 0.002 | 0.002 | 0.001 | 0.001 |
Processing time (sec)

| Replication | Tree | Forest | Boost | SVM | Logit | Neural |
|---|---|---|---|---|---|---|
| 1 | 4.51 | 76.8 | 5.39 | 504.6 | 3.2 | 19.37 |
| 2 | 4.71 | 81.6 | 3.22 | 494.4 | 2.49 | 21.41 |
| 3 | 5.14 | 89.4 | 3.96 | 510.6 | 2.18 | 19.95 |
| 4 | 4.96 | 92.4 | 3.25 | 513.6 | 2.9 | 21.09 |
| 5 | 4.8 | 90.6 | 4.69 | 483.6 | 2.12 | 22.9 |
| 6 | 4.55 | 79.8 | 3.11 | 481.8 | 1.92 | 21.28 |
| 7 | 4.74 | 90.6 | 4.82 | 496.2 | 1.98 | 21.69 |
| 8 | 4.74 | 83.4 | 4.58 | 496.8 | 1.91 | 21.74 |
| 9 | 4.99 | 84 | 2.79 | 504.6 | 1.93 | 19.06 |
| 10 | 4.8 | 80.4 | 3.01 | 480 | 2.17 | 21.26 |
| Mean | 4.794 | 84.900 | 3.882 | 496.620 | 2.280 | 20.975 |
| sd | 0.194 | 5.450 | 0.924 | 11.933 | 0.447 | 1.177 |
Appendix B. ROC Curves of the Classifiers on Real Data
![figure a](http://media.springernature.com/lw685/springer-static/image/chp%3A10.1007%2F978-981-13-3441-2_25/MediaObjects/476620_1_En_25_Figa_HTML.png)
![figure b](http://media.springernature.com/lw685/springer-static/image/chp%3A10.1007%2F978-981-13-3441-2_25/MediaObjects/476620_1_En_25_Figb_HTML.png)
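One common way to reproduce ROC curves and AUC values like these outside the Rattle GUI is the ROCR package. The sketch below is a minimal example, not the authors' code; `prob` and `obs` are assumed placeholder names for the predicted class-1 probabilities and observed labels on the test set.

```r
library(ROCR)

## Plot a ROC curve and return the AUC for one classifier.
## `prob`: predicted probabilities of the positive class on the test set;
## `obs` : the observed test-set labels (placeholder names, not from the paper).
plot_roc <- function(prob, obs, ...) {
  pred <- prediction(prob, obs)                    # pair scores with true labels
  perf <- performance(pred, "tpr", "fpr")          # ROC curve coordinates
  plot(perf, ...)
  performance(pred, "auc")@y.values[[1]]           # area under the curve
}

## Example usage (hypothetical fitted objects from earlier model fits):
## auc_forest <- plot_roc(predict(rf_fit, te, type = "prob")[, 2], te$y, col = "blue")
## auc_logit  <- plot_roc(predict(glm_fit, te, type = "response"), te$y,
##                        col = "red", add = TRUE)
```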
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wibowo, W., Abdul-Rahman, S. (2019). An Empirical Study of Classifier Behavior in Rattle Tool. In: Yap, B., Mohamed, A., Berry, M. (eds) Soft Computing in Data Science. SCDS 2018. Communications in Computer and Information Science, vol 937. Springer, Singapore. https://doi.org/10.1007/978-981-13-3441-2_25
DOI: https://doi.org/10.1007/978-981-13-3441-2_25
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-3440-5
Online ISBN: 978-981-13-3441-2