Forecast of seasonal consumption behavior of consumers and privacy-preserving data mining with new S-Apriori algorithm

The Journal of Supercomputing

Abstract

Supermarkets and retail stores now routinely use database-backed software systems to store customer transactions. As the volume of this data grows, so does the hidden value in the data warehouse: mining historical transactions reveals consumers' buying patterns and behavior, which can help improve sales by reaching customers more precisely. Data-mining techniques make it possible to extract aggregate information, such as association rules, for statistics and decision support in many fields. At the same time, most users of e-commerce systems and web platforms are concerned about privacy, for example the protection of their name, occupation, age, interests, residence, or sales transactions. Protecting the privacy of electronic service users is therefore an important factor to consider in data mining. For these reasons, the Apriori algorithm was studied and extended into a new S-Apriori algorithm built around the concept of seasonal shopping. This paper applies S-Apriori, the ORM model, SQL, and C# to build libraries for forecasting the seasonal consumption behavior of consumers. A new Thanh and Huh Cryptography algorithm for privacy-preserving filters is also proposed to protect privacy during data-mining processing. Experiments were run on two datasets, a small dataset with 37 records and the large Microsoft AdventureWorks dataset with 172,459 records, and the software reports association rules with their corresponding confidence ratios so that users can easily make decisions. In addition, the model will be packaged and published to the Microsoft NuGet ecosystem so that developers and researchers can use it to build association rule mining systems or extend it further based on the new S-Apriori model.
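
The idea of season-restricted association rules with confidence ratios can be illustrated with a minimal Python sketch. It is not the authors' S-Apriori implementation, which is written in C# and is not reproduced on this page. The month-to-season mapping, the toy transactions, the thresholds, and the name `seasonal_rules` are all assumptions made for illustration, and only single-item rules A -> B are mined.

```python
from collections import Counter
from datetime import date
from itertools import combinations

# Hypothetical month-to-season mapping (an assumption, not taken from the paper).
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def seasonal_rules(transactions, season, min_support=0.5, min_confidence=0.6):
    """Return sorted (antecedent, consequent, support, confidence) tuples
    for single-item rules mined only from baskets dated in `season`."""
    baskets = [frozenset(items) for day, items in transactions
               if SEASON_BY_MONTH[day.month] == season]
    n = len(baskets)
    if n == 0:
        return []

    item_counts = Counter(item for basket in baskets for item in basket)
    # Apriori pruning: a pair can only be frequent if both of its items are.
    frequent = {item for item, c in item_counts.items() if c / n >= min_support}

    pair_counts = Counter(
        pair
        for basket in baskets
        for pair in combinations(sorted(basket & frequent), 2)
    )

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair gives two candidate rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Toy data invented for this sketch (not one of the paper's datasets).
transactions = [
    (date(2023, 7, 1), {"sunscreen", "hat"}),
    (date(2023, 7, 3), {"sunscreen", "hat", "water"}),
    (date(2023, 8, 10), {"sunscreen", "water"}),
    (date(2023, 8, 15), {"hat"}),
    (date(2023, 1, 5), {"gloves", "scarf"}),
    (date(2023, 1, 9), {"gloves", "scarf"}),
]

for rule in seasonal_rules(transactions, "summer"):
    print(rule)
```

On this toy data the summer baskets yield, among others, the rule water -> sunscreen with support 0.5 and confidence 1.0, while the winter baskets yield gloves -> scarf with support and confidence 1.0. Mining each season separately is what lets a rule surface that would be diluted if all transactions were pooled.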

Data availability

Please contact the corresponding author for data requests. The C# coding sample and the S-Apriori model are also available:

  • Duy Thanh Tran, Jun-Ho Huh [47], full source code for the S-Apriori model: https://github.com/thanhtd32/SAprioriSystem/tree/main/SAprioriModel
  • Duy Thanh Tran, Jun-Ho Huh [50], small dataset: https://github.com/thanhtd32/SAprioriSystem/tree/main/dataset/smalldataset
  • Duy Thanh Tran, Jun-Ho Huh [51], large dataset: https://github.com/thanhtd32/SAprioriSystem/tree/main/dataset/largedataset (we converted the Microsoft SQL Server AdventureWorks2017 database to a JSON large dataset format)
  • Duy Thanh Tran, Jun-Ho Huh [53], large dataset with two layers of data privacy: https://github.com/thanhtd32/SAprioriSystem/tree/main/dataset/privacydataset
  • Duy Thanh Tran, Jun-Ho Huh [54], S-Apriori model in the Microsoft NuGet system: https://www.nuget.org/packages/SAprioriModel/

References

  1. Golec D, Strugar I, Belak D (2022) The benefits of enterprise data warehouse implementation in cloud vs. on-premises. Entrenova Enterp Res Innov 7(1):66–74. https://doi.org/10.54820/DMZS9230

  2. Li H, Sheu PCY (2022) A scalable association rule learning and recommendation algorithm for large-scale microarray datasets. J Big Data 9:35. https://doi.org/10.1186/s40537-022-00577-4

  3. Yingzhuo X, Xuewen W (2021) Research on community consumer behavior based on association rules analysis. In: 2021 6th International Conference on Intelligent Computing and Signal Processing (ICSP), pp 1213–1216. https://doi.org/10.1109/ICSP51882.2021.9408917

  4. Diwandari S, Zaky U (2021) Analysis of customer purchase behavior using association rules in e-shop. In: 2021 IEEE 5th International Conference on Information Technology, Information Systems and Electrical Engineering (ICITISEE), pp 144–149. https://doi.org/10.1109/ICITISEE53823.2021.9655892

  5. Fayyad U (1997) Data mining and knowledge discovery in databases: implications for scientific databases. In: Proceedings. 9th International Conference on Scientific and Statistical Database Management (Cat. No.97TB100150), pp 2–11. https://doi.org/10.1109/SSDM.1997.621141

  6. Schuh G et al (2019) Data mining definitions and applications for the management of production complexity. Procedia CIRP 81:874–879. https://doi.org/10.1016/j.procir.2019.03.217

  7. Jain A, Jain S, Merh N (2021) Application of association rule mining in a clothing retail store. In: Laha AK (ed) Applied advanced analytics. Springer proceedings in business and economics. Springer, Singapore

  8. Fayyad U, Piatetsky-Shapiro G, Smyth P (1996) From data mining to knowledge discovery in databases. AI Mag 17(3):37. https://doi.org/10.1609/aimag.v17i3.1230

  9. Fayyad U, Piatetsky-Shapiro G, Smyth P (1996) Knowledge discovery and data mining: towards a unifying framework. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD'96). AAAI, pp 82–88

  10. Martin K, Borah A, Palmatier R (2016) Data privacy: effects on customer and firm performance. J Mark. https://doi.org/10.1509/jm.15.0497

  11. Bleier A, Goldfarb A, Tucker C (2020) Consumer privacy and the future of data-based innovation and marketing. Int J Res Mark. https://doi.org/10.1016/j.ijresmar.2020.03.006

  12. Agrawal R, Imieliński T, Swami A (1993) Mining association rules between sets of items in large databases. ACM SIGMOD Rec 22:207–216. https://doi.org/10.1145/170035.170072

  13. Xie H (2021) Research and case analysis of apriori algorithm based on mining frequent item-sets. Open J Soc Sci 9:458–468. https://doi.org/10.4236/jss.2021.94034

  14. Colley D, Stanier C, Asaduzzaman M (2018) The impact of object-relational mapping frameworks on relational query performance. In: 2018 International Conference on Computing, Electronics & Communications Engineering (iCCECE), pp 47–52. https://doi.org/10.1109/iCCECOME.2018.8659222

  15. Hegland M (2008) The Apriori algorithm: a tutorial. In: Goh SS, Ron A, Shen Z (eds) Mathematics and computation in imaging science and information processing. World Scientific

  16. Kumar M (2012) Evaluating the performance of Apriori and predictive Apriori algorithm to find new association rules based on the statistical measures of datasets. IJERT Int J Eng Res Technol 1:1–5

  17. Mutter S, Hall M, Frank E (2004) Using classification to evaluate the output of confidence-based association rule mining. In: Webb GI, Yu X (eds) Advances in artificial intelligence AI 2004. Springer, Berlin

  18. Jin X, Han J (2011) K-Means Clustering. In: Sammut C, Webb GI (eds) Encyclopedia of machine learning. Springer, Boston

  19. Dharshinni NP et al (2019) Analysis of accuracy K-means and Apriori algorithms for patient data clusters. J Phys Conf Series. https://doi.org/10.1088/1742-6596/1230/1/012020

  20. Singh S, Garg R, Mishra PK (2015) Performance analysis of apriori algorithm with different data structures on hadoop cluster. Int J Comput Appl. https://doi.org/10.48550/arXiv.1511.07017

  21. Selvanambi R, Natarajan J (2017) Performance evaluation of association rule mining with enhanced apriori algorithm incorporated with artificial bee colony optimization algorithm. Int J Intell Eng Syst. https://doi.org/10.22266/ijies2017.0430.07

  22. Gaikwad P, Kamble S, Thakur N, Patharkar A (2017) Evaluation of Apriori algorithm on retail market transactional database to get frequent Itemsets. RICE. https://doi.org/10.15439/2017R83

  23. Sinthuja Puviarasan N, Aruna P (2017) Evaluating the performance of association rule mining algorithms. World Appl Sci J 35:43–53. https://doi.org/10.5829/idosi.wasj.2017.43.53

  24. Fageeri SO, Ahmad R, Alhussian H (2016) A performance analysis of association rule mining algorithms. In: 2016 3rd International Conference on Computer and Information Sciences (ICCOINS), pp 328–333. https://doi.org/10.1109/ICCOINS.2016.7783236

  25. Wu H (2020) Data association rules mining method based on improved apriori algorithm. In: 2020 the 4th International Conference on Big Data Research (ICBDR'20). Association for Computing Machinery, New York, NY, USA, pp 12–17. https://doi.org/10.1145/3445945.3445948

  26. Wei Y-Q, Yang R-H, Liu P-Y (2009) An improved Apriori algorithm for association rules of mining. In: 2009 IEEE International Symposium on IT in Medicine & Education, pp 942–946. https://doi.org/10.1109/ITIME.2009.5236211

  27. Zhai Liang, Tang Xinming, Li Lin, Jiang Wenliang (2005) Temporal association rule mining based on T-Apriori algorithm and its typical application. In: Proceedings of International Symposium on Spatio-temporal Modeling, Spatial Reasoning, Analysis, Data Mining and Data Fusion

  28. Lakumarapu S, Agarwal R (2018) Time-based connotation rule mining based on T-Apriori Algorithm Using Weka Tool Slants. In: 2018 International Conference on Computational Techniques, Electronics and Mechanical Systems (CTEMS), pp 261–264. https://doi.org/10.1109/CTEMS.2018.8769122

  29. Ni J, Cao B, Yao B, Yu P, Li L (2016) ARTAR: Temporal association rule mining algorithm based on attribute reduction. In: 2016 First IEEE International Conference on Computer Communication and the Internet (ICCCI), pp 350–353. https://doi.org/10.1109/CCI.2016.7778940

  30. Segura-Delgado A, Gacto M, Alcalá R, Alcala-Fdez J (2020) Temporal association rule mining: an overview considering the time variable as an integral or implied component. Wiley Interdiscip Rev Data Min Knowl Discov. https://doi.org/10.1002/widm.1367

  31. Gao J (2021) Research on application of improved association rules mining algorithm in personalized recommendation. J Phys Conf Series. https://doi.org/10.1088/1742-6596/1744/3/032111

  32. Saxena A, Rajpoot V (2021) A comparative analysis of association rule mining algorithms. IOP Conf Series Mater Sci Eng. https://doi.org/10.1088/1757-899X/1099/1/012032

  33. Zheng Y, Chen P, Chen B, Wei D, Wang M (2021) Application of Apriori improvement algorithm in asthma case data mining. J Healthc Eng. https://doi.org/10.1155/2021/9018408

  34. Ratra R, Gulia P (2020) Privacy preserving data mining: techniques and algorithms. Int J Eng Trends Technol 68:56–62

  35. Özkoç EE (2021) Privacy preserving data mining. In: Thomas C (ed) Data mining - concepts and applications. IntechOpen, Berlin

  36. Bhuyan HK, Kamila NK, Pani SK (2022) Individual privacy in data mining using fuzzy optimization. Eng Optim. https://doi.org/10.1080/0305215X.2021.1922897

  37. Canayaz M, Kantorovitch I, Mihet R (2021) Consumer privacy and value of consumer data. Swiss Finance Inst Res Paper. https://doi.org/10.2139/ssrn.3986562

  38. Chen Z (2022) Privacy costs and consumer data acquisition: an economic analysis of data privacy regulation. SSRN J. https://doi.org/10.2139/ssrn.4085923

  39. Hristakeva M, Vuppala R (2009) A survey of object-oriented programming languages. https://doi.org/10.1145/63320.66468

  40. González-Aparicio M, Younas M, Tuya J, Casado R (2016) A new model for testing CRUD operations in a NoSQL database. In: 2016 IEEE 30th International Conference on Advanced Information Networking and Applications (AINA), pp 79–86

  41. Torgersen M (2007) Querying in C#: how language integrated query (LINQ) works. In: Companion to the 22nd ACM SIGPLAN Conference on Object-Oriented Programming Systems and Applications Companion, pp 852–853. https://doi.org/10.1145/1297846.1297922

  42. Cvetković S, Janković D (2010) A comparative study of the features and performance of ORM tools in a .NET environment. In: Objects and Databases, 3rd International Conference, ICOODB 2010, Frankfurt/Main, Germany, September 28–30, 2010, Proceedings, vol 6348. Springer, Berlin, pp 147–158. https://doi.org/10.1007/978-3-642-16092-9_14

  43. Procaccianti G, Lago P, Diesveld W (2016) Energy efficiency of ORM approaches: an empirical evaluation. In: Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, pp 1–10. https://doi.org/10.1145/2961111.2962586

  44. Balliauw M, Decoster X (2013) Package manifest reference. https://doi.org/10.1007/978-1-4302-6002-8_11

  45. Hameed T, Sadeeq H (2022) Modified Vigenère cipher algorithm based on new key generation method. Indonesian J Electr Eng Comput Sci 28:954–961. https://doi.org/10.11591/ijeecs.v28.i2.pp954-961

  46. Duy Thanh Tran, Jun-Ho Huh, Full source code for S-Apriori model https://github.com/thanhtd32/SAprioriSystem/tree/main/SAprioriModel

  47. Teng Lv, Ping Y, Weimin He (2018) Survey on JSON data modelling. J Phys Conf Series. https://doi.org/10.1088/1742-6596/1069/1/012101

  48. Grochowski K, Breiter M, Nowak R (2019) Serialization in object-oriented programming languages. In: Sud K, Erdogmus P, Kadry S (eds) Introduction to data science and machine learning. IntechOpen

  49. Duy Thanh Tran, Jun-Ho Huh, Small dataset https://github.com/thanhtd32/SAprioriSystem/tree/main/dataset/smalldataset

  50. Duy Thanh Tran, Jun-Ho Huh, Large dataset https://github.com/thanhtd32/SAprioriSystem/tree/main/dataset/largedataset (We converted the Microsoft SQL Server AdventureWorks2017 database to a JSON large dataset format)

  51. The Microsoft AdventureWorks 2017 database https://docs.microsoft.com/en-us/sql/samples/adventureworks-install-configure

  52. Duy Thanh Tran, Jun-Ho Huh, Large dataset with two layers of data privacy https://github.com/thanhtd32/SAprioriSystem/tree/main/dataset/privacydataset

  53. Duy Thanh Tran, Jun-Ho Huh, S-Apriori model https://www.nuget.org/packages/SAprioriModel/

  54. Duy Thanh Tran (2023) Doctoral dissertation “New Machine Learning Models for Data Mining Ecosystem” http://www.dcollection.net/handler/kmou/200000666830

Acknowledgements

We would like to thank the University of Economics and Law, Vietnam National University, Ho Chi Minh City, and the National Korea Maritime and Ocean University for supporting us during this research.

Author information

Corresponding author

Correspondence to Jun-Ho Huh.

Ethics declarations

Competing Interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tran, D.T., Huh, JH. Forecast of seasonal consumption behavior of consumers and privacy-preserving data mining with new S-Apriori algorithm. J Supercomput 79, 12691–12736 (2023). https://doi.org/10.1007/s11227-023-05105-6

  • DOI: https://doi.org/10.1007/s11227-023-05105-6