Abstract
Machine learning (ML) has used a variety of discretization approaches to partition numerical attributes into intervals. However, an effective discretization method is still missing in various ML approaches, e.g., association rule mining. Moreover, existing discretization techniques do not best reflect the impact of the independent numerical factor on the dependent numerical target factor. The main objective of this research is to develop a benchmark approach for partitioning numerical factors. We present an in-depth analysis of human perceptions of partitioning a numerical factor and compare these perceptions with one of our proposed measures. We also examine the perceptions of various experts in data science, statistics and engineering disciplines by using a series of graphs with numerical data. The analysis of the collected responses indicates that \(68.7\%\) of the human responses were close to the values obtained by the proposed method. Based on this analysis, the proposed method may be used as one of the methods for discretizing numerical attributes.
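For readers unfamiliar with discretization, the following minimal sketch illustrates what partitioning a numerical attribute into intervals means in practice. It shows two standard unsupervised baselines, equal-width and equal-frequency binning, and is not the measure proposed in this paper. The function names and the sample `ages` values are hypothetical and chosen purely for illustration.

```python
from bisect import bisect_right


def equal_width_cuts(values, k):
    """Return k-1 cut points that split [min, max] into k equally wide intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_frequency_cuts(values, k):
    """Return k-1 cut points so that each interval holds roughly len(values)/k items."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]


def assign_bins(values, cuts):
    """Map each value to its interval index; a value equal to a cut point
    is placed in the upper interval."""
    return [bisect_right(cuts, v) for v in values]


# Hypothetical sample data, not taken from the paper.
ages = [18, 21, 22, 25, 30, 41, 52, 60, 75]

width_cuts = equal_width_cuts(ages, 3)
freq_cuts = equal_frequency_cuts(ages, 3)

print(width_cuts, assign_bins(ages, width_cuts))
print(freq_cuts, assign_bins(ages, freq_cuts))
```

Neither baseline takes a dependent target factor into account, which is the limitation of existing techniques that the abstract points to.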
This work has been partially conducted in the project “ICT programme” which was supported by the European Union through the European Social Fund.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Kaushik, M., Sharma, R., Vidyarthi, A., Draheim, D. (2022). Discretizing Numerical Attributes: An Analysis of Human Perceptions. In: Chiusano, S., et al. New Trends in Database and Information Systems. ADBIS 2022. Communications in Computer and Information Science, vol 1652. Springer, Cham. https://doi.org/10.1007/978-3-031-15743-1_18
DOI: https://doi.org/10.1007/978-3-031-15743-1_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15742-4
Online ISBN: 978-3-031-15743-1
eBook Packages: Computer Science (R0)