Abstract
Clustering transaction databases can reveal potentially useful patterns and give insight into the structure of the data, which supports effective decision-making. However, one of the critical tasks in clustering is identifying the appropriate number of clusters, which determines the performance of any process subsequently applied to the transaction database. This paper presents a methodology to discover the optimal structure of purchase transaction data: the Davies-Bouldin and Calinski-Harabasz validity indices are used to obtain the number of clusters, and the clusters are formed with the farthest-first traversal algorithm. The quality of the resulting structures is evaluated with data complexity measures such as F1, F2, F3, N1 and IR. We then use the support vector machine and multi-layer perceptron classification algorithms to determine recognition ability in classification problems with more than two classes, in the context of the class separability and class imbalance present in the groups previously obtained. The experimental results show the viability of the proposed methodology for decision-making.
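The cluster-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes numeric feature vectors with Euclidean distance, uses a hypothetical toy data set, and scores only the Davies-Bouldin index (lower is better). The Calinski-Harabasz index could be plugged into the same loop, with the largest value taken as best. The names farthest_first_centers, assign, davies_bouldin, points, scores and best_k are illustrative only.

```python
import math

def farthest_first_centers(points, k, first=0):
    """Return indices of k centers: start at points[first], then repeatedly
    add the point farthest from its nearest already-chosen center."""
    centers = [first]
    nearest = [math.dist(p, points[first]) for p in points]
    while len(centers) < k:
        nxt = max(range(len(points)), key=nearest.__getitem__)
        centers.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    return centers

def assign(points, centers):
    """Label each point with the position of its closest center."""
    return [min(range(len(centers)),
                key=lambda c: math.dist(p, points[centers[c]]))
            for p in points]

def davies_bouldin(points, labels, k):
    """Davies-Bouldin index: mean over clusters of the worst ratio
    (scatter_i + scatter_j) / distance(centroid_i, centroid_j).
    Assumes every cluster is non-empty and centroids are distinct."""
    dims = len(points[0])
    groups = [[p for p, l in zip(points, labels) if l == c] for c in range(k)]
    centroids = [[sum(p[d] for p in g) / len(g) for d in range(dims)]
                 for g in groups]
    scatter = [sum(math.dist(p, m) for p in g) / len(g)
               for g, m in zip(groups, centroids)]
    worst = [max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k

# Hypothetical toy data: three compact groups of four points each.
offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]
points = [(ox + dx, oy + dy)
          for ox, oy in [(0, 0), (10, 0), (0, 10)]
          for dx, dy in offsets]

# Score each candidate number of clusters and keep the lowest index value.
scores = {}
for k in range(2, 6):
    centers = farthest_first_centers(points, k)
    scores[k] = davies_bouldin(points, assign(points, centers), k)
best_k = min(scores, key=scores.get)
print(best_k, scores)
```

On this toy data the lowest Davies-Bouldin value is reached at three clusters, matching the three groups built into the example. Real purchase transactions would first need an encoding and a distance measure suited to the data, which this sketch does not address.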
Acknowledgment
This work was partially supported by the E.S.I.M.E., Zacatenco, Instituto Politecnico Nacional, the TecNM/Instituto Tecnologico de Matamoros, and the Universitat Jaume I under grant UJI-B2018-49.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Cleofas-Sanchez, L., Pineda-Briseño, A., Sanchez, J.S. (2021). Identifying Optimal Clusters in Purchase Transaction Data. In: Batyrshin, I., Gelbukh, A., Sidorov, G. (eds) Advances in Computational Intelligence. MICAI 2021. Lecture Notes in Computer Science, vol 13067. Springer, Cham. https://doi.org/10.1007/978-3-030-89817-5_1
DOI: https://doi.org/10.1007/978-3-030-89817-5_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89816-8
Online ISBN: 978-3-030-89817-5
eBook Packages: Computer Science, Computer Science (R0)