Abstract
In machine learning (ML), several discretization techniques and mathematical approaches are used to partition numerical data attributes. However, the cut-points produced by discretization techniques often do not match the cut-points that humans perceive. Understanding how humans perceive the discretization of a numerical attribute is therefore important for developing effective discretization techniques. In this paper, we conduct a study of human perception of the partitions in numerical data that best reflect the impact of one independent numerical attribute on another, dependent numerical attribute. We aim to understand how expert data scientists and statisticians partition numerical attributes under different types of data points, such as dense data points, outliers, and uneven random points. The findings lead to a discussion of the importance of human perception, under distinct kinds of data points, for finding partitions of numerical attributes.
This work has been partially conducted in the project “ICT programme” which was supported by the European Union through the European Social Fund.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kaushik, M., Sharma, R., Shahin, M., Peious, S.A., Draheim, D. (2022). An Analysis of Human Perception of Partitions of Numerical Factor Domains. In: Pardede, E., Delir Haghighi, P., Khalil, I., Kotsis, G. (eds) Information Integration and Web Intelligence. iiWAS 2022. Lecture Notes in Computer Science, vol 13635. Springer, Cham. https://doi.org/10.1007/978-3-031-21047-1_13
DOI: https://doi.org/10.1007/978-3-031-21047-1_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21046-4
Online ISBN: 978-3-031-21047-1
eBook Packages: Computer Science, Computer Science (R0)