Identifying Optimal Clusters in Purchase Transaction Data

  • Conference paper
Advances in Computational Intelligence (MICAI 2021)

Abstract

Clustering transaction databases can reveal potentially useful patterns and give insight into the structure of the data, which in turn supports effective decision-making. A critical task in clustering, however, is identifying the appropriate number of clusters, since this choice determines the performance of any process subsequently applied to the transaction database. This paper presents a methodology for discovering the optimal structure of purchase transaction data: the Davies-Bouldin and Calinski-Harabasz validity indices are used to obtain the number of clusters, and the clusters are then formed with the farthest-first traversal algorithm. The quality of the resulting structures is evaluated with the data complexity measures F1, F2, F3, N1 and IR. We then use support vector machine and multi-layer perceptron classifiers to assess recognition ability on classification problems with more than two classes, in the context of the class separability and class imbalance present in the groups previously obtained. The experimental results show the viability of the proposed methodology for decision-making.
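As a rough illustration of the cluster-selection step described in the abstract, the following sketch forms clusters with a farthest-first traversal and scores each candidate number of clusters with the Davies-Bouldin index (lower is better) and the Calinski-Harabasz index (higher is better). It is not the authors' implementation: the Euclidean distance, the fixed starting point, the candidate range of k, the helper names and the toy data are all assumptions made for the example, since the abstract does not specify them.

```python
import math

def farthest_first_centers(points, k, start=0):
    """Pick k centers: begin at points[start], then repeatedly add the
    point that is farthest from its nearest already-chosen center."""
    centers = [points[start]]
    nearest = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        nearest = [min(d, math.dist(p, points[idx]))
                   for d, p in zip(nearest, points)]
    return centers

def assign(points, centers):
    """Label each point with the index of its closest center."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]

def _groups(points, labels, k):
    return [[p for p, lab in zip(points, labels) if lab == j] for j in range(k)]

def _centroid(group):
    return tuple(sum(axis) / len(group) for axis in zip(*group))

def davies_bouldin(points, labels, k):
    """Mean, over clusters, of the worst (scatter_i + scatter_j) / d(c_i, c_j).
    Assumes every cluster is non-empty and centroids are distinct."""
    groups = _groups(points, labels, k)
    cents = [_centroid(g) for g in groups]
    scatter = [sum(math.dist(p, c) for p in g) / len(g)
               for g, c in zip(groups, cents)]
    worst = [max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k

def calinski_harabasz(points, labels, k):
    """Ratio of between-cluster to within-cluster dispersion, each divided
    by its degrees of freedom. Assumes 1 < k < len(points)."""
    n = len(points)
    overall = _centroid(points)
    groups = _groups(points, labels, k)
    between = sum(len(g) * math.dist(_centroid(g), overall) ** 2 for g in groups)
    within = sum(math.dist(p, _centroid(g)) ** 2 for g in groups for p in g)
    return (between / (k - 1)) / (within / (n - k))

def choose_k(points, k_values):
    """Score every candidate k and return the k preferred by each index."""
    scores = {}
    for k in k_values:
        labels = assign(points, farthest_first_centers(points, k))
        scores[k] = (davies_bouldin(points, labels, k),
                     calinski_harabasz(points, labels, k))
    best_db = min(scores, key=lambda k: scores[k][0])
    best_ch = max(scores, key=lambda k: scores[k][1])
    return best_db, best_ch, scores

def toy_points():
    """Hypothetical data: three small 3x3 grids far apart from each other."""
    offsets = (-0.5, 0.0, 0.5)
    return [(cx + dx, cy + dy)
            for cx, cy in ((0, 0), (10, 0), (0, 10))
            for dx in offsets for dy in offsets]

best_db, best_ch, scores = choose_k(toy_points(), range(2, 7))
print("k preferred by Davies-Bouldin:", best_db)
print("k preferred by Calinski-Harabasz:", best_ch)
```

When the two indices disagree on real data, some additional rule is needed to settle on a single k; the sketch simply reports both preferences. Real purchase transactions would also need a suitable feature encoding and distance measure before any of this applies.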



Acknowledgment

This work was partially supported by the E.S.I.M.E., Zacatenco, Instituto Politecnico Nacional, the TecNM/Instituto Tecnologico de Matamoros, and the Universitat Jaume I under grant UJI-B2018-49.

Author information

Corresponding author

Correspondence to A. Pineda-Briseño.

Appendix

Table 17. Confusion matrix of Trans30k_2c
Table 18. Confusion matrix of Trans30k_7c
Table 19. Confusion matrix of Trans50k_2c
Table 20. Confusion matrix of Trans50k_5c
Table 21. Confusion matrix of Trans50k_12c
Table 22. Confusion matrix of Trans70k_2c
Table 23. Confusion matrix of Trans70k_5c
Table 24. Confusion matrix of Trans70k_10c
Table 25. Confusion matrix of Trans10k_2c
Table 26. Confusion matrix of Trans10k_4c

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Cleofas-Sanchez, L., Pineda-Briseño, A., Sanchez, J.S. (2021). Identifying Optimal Clusters in Purchase Transaction Data. In: Batyrshin, I., Gelbukh, A., Sidorov, G. (eds) Advances in Computational Intelligence. MICAI 2021. Lecture Notes in Computer Science, vol 13067. Springer, Cham. https://doi.org/10.1007/978-3-030-89817-5_1

  • DOI: https://doi.org/10.1007/978-3-030-89817-5_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-89816-8

  • Online ISBN: 978-3-030-89817-5

  • eBook Packages: Computer Science, Computer Science (R0)
