Abstract
Many real-world data sets contain a mix of data types, i.e., binary, numerical, and categorical; however, many data mining and machine learning (ML) algorithms, e.g., association rule mining, work only with discrete values. The discretization process therefore plays an essential role in data mining and ML. In state-of-the-art data mining and ML, different discretization techniques are used to convert numerical attributes into discrete attributes. However, existing discretization techniques do not best reflect the impact of an independent numerical factor on a dependent numerical target factor. This paper proposes and compares two novel measures for order-preserving partitioning of numerical factors, which we call the Least Squared Ordinate-Directed Impact Measure and the Least Absolute-Difference Ordinate-Directed Impact Measure. The main aim of these measures is to optimally reflect the impact of a numerical factor on another numerical target factor. We implement the proposed measures for two-partitions and three-partitions. We evaluate the performance of the proposed measures by comparing them with human-perceived cut-points, using twelve synthetic data sets and one real-world data set, i.e., school teacher salaries from New Jersey (NJ). We find that the proposed measures are useful in finding the best cut-points as perceived by humans.
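The abstract does not specify the measures beyond their names. Under the assumption that they minimize the squared (respectively absolute) deviation of the target factor within each ordered partition, in the spirit of piecewise constant approximation, a two-partition cut-point search might be sketched as follows; the function name `best_two_partition_cut` is hypothetical, not from the paper:

```python
def best_two_partition_cut(x, y, use_absolute=False):
    """Return the cut-point on x that minimizes within-partition
    deviation of y (squared deviation from the mean by default,
    absolute deviation from the median if use_absolute=True)."""
    pairs = sorted(zip(x, y))          # order-preserving: sort by the factor x
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    n = len(pairs)

    def cost(vals):
        if not vals:
            return 0.0
        if use_absolute:
            m = sorted(vals)[len(vals) // 2]   # median minimizes absolute deviation
            return sum(abs(v - m) for v in vals)
        mu = sum(vals) / len(vals)             # mean minimizes squared deviation
        return sum((v - mu) ** 2 for v in vals)

    best_cut, best_cost = None, float("inf")
    for i in range(1, n):              # candidate cut between xs[i-1] and xs[i]
        c = cost(ys[:i]) + cost(ys[i:])
        if c < best_cost:
            best_cost, best_cut = c, (xs[i - 1] + xs[i]) / 2
    return best_cut
```

A three-partition variant would search over pairs of cut-points in the same way. This exhaustive scan is quadratic in the sample size for two partitions; prefix sums would make it linear after sorting.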
Acknowledgements
This work has been conducted in the project “ICT programme”, which was supported by the European Union through the European Social Fund.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Kaushik, M., Sharma, R., Peious, S.A., Draheim, D. (2021). Impact-Driven Discretization of Numerical Factors: Case of Two- and Three-Partitioning. In: Srirama, S.N., Lin, J.CW., Bhatnagar, R., Agarwal, S., Reddy, P.K. (eds) Big Data Analytics. BDA 2021. Lecture Notes in Computer Science(), vol 13147. Springer, Cham. https://doi.org/10.1007/978-3-030-93620-4_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-93619-8
Online ISBN: 978-3-030-93620-4
eBook Packages: Computer Science (R0)