Identifying Optimal Clusters in Purchase Transaction Data

Cleofas-Sanchez, L.; Pineda-Briseño, A.; Sanchez, J. S.

doi:10.1007/978-3-030-89817-5_1

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13067)

Included in the following conference series: Mexican International Conference on Artificial Intelligence

Abstract

Clustering transaction databases can reveal potentially useful patterns and give insight into the structure of the data, which supports effective decision-making. However, one of the critical tasks in clustering is identifying the appropriate number of clusters, since this choice determines the performance of any process subsequently applied to the transaction database. This paper presents a methodology to discover the optimal structure of purchase transaction data: the Davies-Bouldin and Calinski-Harabasz validity indices are used to obtain the number of clusters, and the farthest-first traversal algorithm is used to form them. The quality of the resulting structures is evaluated with the data complexity measures F1, F2, F3, N1 and IR. We then use support vector machine and multi-layer perceptron classifiers to determine recognition ability in classification problems with more than two classes, in the context of the class separability and class imbalance present in the groups previously obtained. The experimental results show the viability of the proposed methodology for decision-making.
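The cluster-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy 2-D points, the Euclidean distance, the candidate range of k, and all function names below are assumptions made for illustration. The sketch forms clusters with a greedy farthest-first traversal and scores each candidate k with the Davies-Bouldin index (lower is better) and the Calinski-Harabasz index (higher is better).

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def farthest_first(points, k, start=0):
    """Greedily pick k centers (each new center is the point farthest from
    the centers chosen so far), then label each point by its nearest center."""
    centers = [points[start]]
    nearest = [dist(p, centers[0]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        nearest = [min(d, dist(p, points[idx])) for d, p in zip(nearest, points)]
    return [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]


def group(points, labels):
    """Split points into one list per cluster label."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return list(clusters.values())


def davies_bouldin(points, labels):
    """Average, over clusters, of the worst (scatter_i + scatter_j) / separation."""
    clusters = group(points, labels)
    cents = [mean(c) for c in clusters]
    scatter = [sum(dist(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)]
    k = len(clusters)
    worst = [max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
                 for j in range(k) if j != i) for i in range(k)]
    return sum(worst) / k


def calinski_harabasz(points, labels):
    """Ratio of between-cluster to within-cluster dispersion, each divided
    by its degrees of freedom (k - 1 and n - k)."""
    clusters = group(points, labels)
    n, k = len(points), len(clusters)
    overall = mean(points)
    between = within = 0.0
    for c in clusters:
        m = mean(c)
        between += len(c) * dist(m, overall) ** 2
        within += sum(dist(p, m) ** 2 for p in c)
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy data: three tight, well-separated groups of 2-D points.
offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5), (0.0, 0.0)]
points = [(cx + dx, cy + dy)
          for cx, cy in [(0, 0), (10, 0), (0, 10)] for dx, dy in offsets]

scores = {}
for k in range(2, 7):  # candidate numbers of clusters (an assumed range)
    labels = farthest_first(points, k)
    scores[k] = (davies_bouldin(points, labels), calinski_harabasz(points, labels))

best_db = min(scores, key=lambda k: scores[k][0])  # lowest Davies-Bouldin
best_ch = max(scores, key=lambda k: scores[k][1])  # highest Calinski-Harabasz
print(best_db, best_ch)
```

On this toy data both indices select k = 3. The sketch does not guard against duplicate points or coincident centroids, which would cause a division by zero; real transaction data would also need a suitable feature encoding and distance measure.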

Acknowledgment

This work was partially supported by the E.S.I.M.E., Zacatenco, Instituto Politecnico Nacional, the TecNM/Instituto Tecnologico de Matamoros, and the Universitat Jaume I under grant UJI-B2018-49.

Author information

Authors and Affiliations

Escuela Superior de Ingenieria Mecanica y Electrica, Zacatenco, Instituto Politecnico Nacional, CDMX, Mexico
L. Cleofas-Sanchez
Tecnologico Nacional de Mexico/Instituto Tecnologico de Matamoros, H. Matamoros, Tamps., Mexico
A. Pineda-Briseño
Department of Computer Languages and Systems, Institute of New Imaging Technologies, Universitat Jaume I, Castello de la Plana, Spain
J. S. Sanchez

Corresponding author

Correspondence to A. Pineda-Briseño.

Editor information

Editors and Affiliations

Instituto Politécnico Nacional, Centro de Investigación en Computación, Mexico City, Mexico
Ildar Batyrshin
Instituto Politécnico Nacional, Centro de Investigación en Computación, Mexico City, Mexico
Alexander Gelbukh
Instituto Politécnico Nacional, Centro de Investigación en Computación, Mexico City, Mexico
Grigori Sidorov

Appendix

Table 17. Confusion matrix of Trans30k_2c
Table 18. Confusion matrix of Trans30k_7c
Table 19. Confusion matrix of Trans50k_2c
Table 20. Confusion matrix of Trans50k_5c
Table 21. Confusion matrix of Trans50k_12c
Table 22. Confusion matrix of Trans70k_2c
Table 23. Confusion matrix of Trans70k_5c
Table 24. Confusion matrix of Trans70k_10c
Table 25. Confusion matrix of Trans10k_2c
Table 26. Confusion matrix of Trans10k_4c

Cite this paper

Cleofas-Sanchez, L., Pineda-Briseño, A., Sanchez, J.S. (2021). Identifying Optimal Clusters in Purchase Transaction Data. In: Batyrshin, I., Gelbukh, A., Sidorov, G. (eds) Advances in Computational Intelligence. MICAI 2021. Lecture Notes in Computer Science, vol 13067. Springer, Cham. https://doi.org/10.1007/978-3-030-89817-5_1

DOI: https://doi.org/10.1007/978-3-030-89817-5_1
Published: 21 October 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89816-8
Online ISBN: 978-3-030-89817-5
eBook Packages: Computer Science, Computer Science (R0)
