An Analysis of Human Perception of Partitions of Numerical Factor Domains

Kaushik, Minakshi; Sharma, Rahul; Shahin, Mahtab; Peious, Sijo Arakkal; Draheim, Dirk

doi:10.1007/978-3-031-21047-1_13

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13635)

Included in the following conference series:

International Conference on Information Integration and Web


Abstract

In machine learning (ML), several discretization techniques and mathematical approaches are used to partition numerical data attributes. However, the cut-points produced by discretization techniques often do not match the cut-points that humans perceive. Understanding how humans perceive the discretization of a numerical attribute is therefore important for developing an effective discretization technique. In this paper, we conduct a study of human perception of partitions in numerical data that best reflect the impact of one independent numerical attribute on another, dependent numerical attribute. We aim to understand how expert data scientists and statisticians partition numerical attributes under different types of data points, such as dense data points, outliers, and uneven random points. The findings lead to a discussion of the importance of human perception under distinct kinds of data points for finding partitions of numerical attributes.
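
To make the notion of a cut-point concrete, the following minimal sketch compares two standard discretization techniques, equal-width and equal-frequency binning, on a small invented sample. It is not the method studied in the paper; the function names and the data are assumptions chosen purely for illustration.

```python
def equal_width_cuts(values, k):
    """Return k-1 cut-points splitting [min, max] into k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_frequency_cuts(values, k):
    """Return k-1 cut-points so each interval holds roughly len(values)/k points."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for i in range(1, k):
        idx = i * n // k
        # Place the cut midway between the two points adjacent to the split.
        cuts.append((ordered[idx - 1] + ordered[idx]) / 2)
    return cuts


# Hypothetical data: a dense group near 10, a second group near 50, one outlier.
data = [9, 10, 10, 11, 12, 48, 50, 51, 52, 200]

print(equal_width_cuts(data, 2))      # [104.5]
print(equal_frequency_cuts(data, 2))  # [30.0]
```

On this sample the single outlier pulls the equal-width cut-point to 104.5, so one interval contains only the outlier, whereas the equal-frequency cut-point of 30.0 falls in the gap between the two groups. Differences of this kind are one reason algorithmic cut-points may disagree with those a human would choose.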

This work has been partially conducted in the project “ICT programme” which was supported by the European Union through the European Social Fund.



Author information

Authors and Affiliations

Information Systems Group, Tallinn University of Technology, Akadeemia tee 15a, 12618, Tallinn, Estonia
Minakshi Kaushik, Rahul Sharma, Mahtab Shahin, Sijo Arakkal Peious & Dirk Draheim


Corresponding author

Correspondence to Minakshi Kaushik.

Editor information

Editors and Affiliations

La Trobe University, Melbourne, VIC, Australia
Eric Pardede
Monash University, Melbourne, VIC, Australia
Pari Delir Haghighi
Johannes Kepler University Linz, Linz, Austria
Ismail Khalil
Johannes Kepler University Linz, Linz, Austria
Gabriele Kotsis


About this paper

Cite this paper

Kaushik, M., Sharma, R., Shahin, M., Peious, S.A., Draheim, D. (2022). An Analysis of Human Perception of Partitions of Numerical Factor Domains. In: Pardede, E., Delir Haghighi, P., Khalil, I., Kotsis, G. (eds) Information Integration and Web Intelligence. iiWAS 2022. Lecture Notes in Computer Science, vol 13635. Springer, Cham. https://doi.org/10.1007/978-3-031-21047-1_13


DOI: https://doi.org/10.1007/978-3-031-21047-1_13
Published: 20 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21046-4
Online ISBN: 978-3-031-21047-1
eBook Packages: Computer Science, Computer Science (R0)
