Abstract
Supermarkets and retail stores now routinely use software systems backed by databases to store customer transactions. As the volume of this data grows, so does the hidden value it holds: mining historical transactions reveals the buying patterns and behavior of consumers, which can help improve sales by reaching customers more precisely. Data-mining techniques make it possible to extract summarized information of many kinds, such as association rules, for statistics and decision support in many fields. Most users of e-commerce systems and web platforms are concerned about privacy protection, for example for their name, occupation, age, interests, residence, or sales transactions. Protecting the privacy of electronic service users is therefore an important factor to consider in data mining. For these reasons, this paper extends the Apriori algorithm into a new S-Apriori algorithm built around the concept of seasonal shopping. The S-Apriori algorithm, an ORM model, SQL, and C# are used to build libraries for forecasting the seasonal consumption behavior of consumers. In addition, a new Thanh and Huh Cryptography algorithm for privacy-preserving filters is proposed to protect privacy during data-mining processing. Experiments were run on two datasets, a small dataset with 37 records and the large Microsoft AdventureWorks dataset with 172,459 records, and the software reports association rules with their corresponding confidence ratios so that users can make decisions easily. The model is also packaged and published to the Microsoft NuGet ecosystem, so developers and researchers can use it to build association rule mining systems or extend it further based on the new S-Apriori model.























Data availability
Please contact the corresponding author for data requests. The C# coding sample and the S-Apriori model are also available:
Duy Thanh Tran, Jun-Ho Huh [47], Full source code for S-Apriori model https://github.com/thanhtd32/SAprioriSystem/tree/main/SAprioriModel
Duy Thanh Tran, Jun-Ho Huh [50], Small dataset https://github.com/thanhtd32/SAprioriSystem/tree/main/dataset/smalldataset
Duy Thanh Tran, Jun-Ho Huh [51], Large dataset https://github.com/thanhtd32/SAprioriSystem/tree/main/dataset/largedataset (We converted the Microsoft SQL Server AdventureWorks2017 database to JSON large dataset format)
Duy Thanh Tran, Jun-Ho Huh [53], Large dataset with two layers of data privacy https://github.com/thanhtd32/SAprioriSystem/tree/main/dataset/privacydataset
Duy Thanh Tran, Jun-Ho Huh [54], S-Apriori model in Microsoft NuGet System https://www.nuget.org/packages/SAprioriModel/
References
Golec D, Strugar I, Belak D (2022) The benefits of enterprise data warehouse implementation in cloud vs. on-premises. Entrenova Enterp Res Innov 7(1):66–74. https://doi.org/10.54820/DMZS9230
Li H, Sheu PCY (2022) A scalable association rule learning and recommendation algorithm for large-scale microarray datasets. J Big Data 9:35. https://doi.org/10.1186/s40537-022-00577-4
Yingzhuo X, Xuewen W (2021) Research on community consumer behavior based on association rules analysis. In: 2021 6th International Conference on Intelligent Computing and Signal Processing (ICSP), pp 1213–1216. https://doi.org/10.1109/ICSP51882.2021.9408917
Diwandari S, Zaky U (2021) Analysis of customer purchase behavior using association rules in e-shop. In: 2021 IEEE 5th International Conference on Information Technology, Information Systems and Electrical Engineering (ICITISEE), pp 144–149. https://doi.org/10.1109/ICITISEE53823.2021.9655892
Fayyad U (1997) Data mining and knowledge discovery in databases: implications for scientific databases. In: Proceedings. 9th International Conference on Scientific and Statistical Database Management (Cat. No.97TB100150), pp 2–11. https://doi.org/10.1109/SSDM.1997.621141
Schuh G et al (2019) Data mining definitions and applications for the management of production complexity. Procedia CIRP 81:874–879. https://doi.org/10.1016/j.procir.2019.03.217
Jain A, Jain S, Merh N (2021) Application of association rule mining in a clothing retail store. In: Laha AK (ed) Applied advanced analytics. Springer proceedings in business and economics. Springer, Singapore
Fayyad U, Piatetsky-Shapiro G, Smyth P (1996) From data mining to knowledge discovery in databases. AI Mag 17(3):37. https://doi.org/10.1609/aimag.v17i3.1230
Fayyad U, Piatetsky-Shapiro G, Smyth P (1996) Knowledge discovery and data mining: towards a unifying framework. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD'96). AAAI, pp 82–88
Martin K, Borah A, Palmatier R (2016) Data privacy: effects on customer and firm performance. J Mark. https://doi.org/10.1509/jm.15.0497
Bleier A, Goldfarb A, Tucker C (2020) Consumer privacy and the future of data-based innovation and marketing. Int J Res Mark. https://doi.org/10.1016/j.ijresmar.2020.03.006
Agrawal R, Imieliński T, Swami A (1993) Mining association rules between sets of items in large databases. ACM SIGMOD Rec 22:207–216. https://doi.org/10.1145/170035.170072
Xie H (2021) Research and case analysis of apriori algorithm based on mining frequent item-sets. Open J Soc Sci 9:458–468. https://doi.org/10.4236/jss.2021.94034
Colley D, Stanier C, Asaduzzaman M (2018) The impact of object-relational mapping frameworks on relational query performance. In: 2018 International Conference on Computing, Electronics & Communications Engineering (iCCECE), pp 47–52. https://doi.org/10.1109/iCCECOME.2018.8659222
Hegland M (2008) The Apriori algorithm – a tutorial. In: Goh SS, Ron A, Shen Z (eds) Mathematics and computation in imaging science and information processing. World Scientific
Kumar M (2012) Evaluating the performance of Apriori and predictive Apriori algorithm to find new association rules based on the statistical measures of datasets. IJERT Int J Eng Res Technol 1:1–5
Mutter S, Hall M, Frank E (2004) Using classification to evaluate the output of confidence-based association rule mining. In: Webb GI, Yu X (eds) Advances in artificial intelligence AI 2004. Springer, Berlin
Jin X, Han J (2011) K-Means Clustering. In: Sammut C, Webb GI (eds) Encyclopedia of machine learning. Springer, Boston
Dharshinni NP et al (2019) Analysis of accuracy K-means and Apriori algorithms for patient data clusters. J Phys Conf Series. https://doi.org/10.1088/1742-6596/1230/1/012020
Singh S, Garg R, Mishra PK (2015) Performance analysis of apriori algorithm with different data structures on hadoop cluster. Int J Comput Appl. https://doi.org/10.48550/arXiv.1511.07017
Selvanambi R, Natarajan J (2017) Performance evaluation of association rule mining with enhanced apriori algorithm incorporated with artificial bee colony optimization algorithm. Int J Intell Eng Syst. https://doi.org/10.22266/ijies2017.0430.07
Gaikwad P, Kamble S, Thakur N, Patharkar A (2017) Evaluation of Apriori algorithm on retail market transactional database to get frequent Itemsets. RICE. https://doi.org/10.15439/2017R83
Sinthuja Puviarasan N, Aruna P (2017) Evaluating the performance of association rule mining algorithms. World Appl Sci J 35:43–53. https://doi.org/10.5829/idosi.wasj.2017.43.53
Fageeri SO, Ahmad R, Alhussian H (2016) A performance analysis of association rule mining algorithms. In: 2016 3rd International Conference on Computer and Information Sciences (ICCOINS), pp 328–333. https://doi.org/10.1109/ICCOINS.2016.7783236
Wu H (2020) Data association rules mining method based on improved apriori algorithm. In: 2020 the 4th International Conference on Big Data Research (ICBDR'20). Association for Computing Machinery, New York, pp 12–17. https://doi.org/10.1145/3445945.3445948
Wei Y-Q, Yang R-H, Liu P-Y (2009) An improved Apriori algorithm for association rules of mining. In: 2009 IEEE International Symposium on IT in Medicine & Education, pp 942–946. https://doi.org/10.1109/ITIME.2009.5236211
Zhai Liang A, Tang Xinming B, Li Lin A, Jiang Wenliang A (2005) Temporal association rule mining based on T-Apriori algorithm and its typical application. In: Proceedings of international symposium on spatio-temporal modeling, spatial reasoning, analysis, data mining and data fusion
Lakumarapu S, Agarwal R (2018) Time-based connotation rule mining based on T-Apriori Algorithm Using Weka Tool Slants. In: 2018 International Conference on Computational Techniques, Electronics and Mechanical Systems (CTEMS), pp 261–264. https://doi.org/10.1109/CTEMS.2018.8769122
Ni J, Cao B, Yao B, Yu P, Li L (2016) ARTAR: Temporal association rule mining algorithm based on attribute reduction. In: 2016 First IEEE International Conference on Computer Communication and the Internet (ICCCI), pp 350–353. https://doi.org/10.1109/CCI.2016.7778940
Segura-Delgado A, Gacto M, Alcalá R, Alcala-Fdez J (2020) Temporal association rule mining: an overview considering the time variable as an integral or implied component. Wiley Interdiscip Rev Data Min Knowl Discov. https://doi.org/10.1002/widm.1367
Gao J (2021) Research on application of improved association rules mining algorithm in personalized recommendation. J Phys Conf Series. https://doi.org/10.1088/1742-6596/1744/3/032111
Saxena A, Rajpoot V (2021) A comparative analysis of association rule mining algorithms. IOP Conf Series Mater Sci Eng. https://doi.org/10.1088/1757-899X/1099/1/012032
Zheng Y, Chen P, Chen B, Wei D, Wang M (2021) Application of Apriori improvement algorithm in asthma case data mining. J Healthc Eng. https://doi.org/10.1155/2021/9018408
Ratra R, Gulia P (2020) Privacy preserving data mining: techniques and algorithms. Inter J Eng Trends Technol 68:56–62
Özkoç EE (2021) Privacy preserving data mining. In: Thomas C (ed) Data mining – concepts and applications. IntechOpen, Berlin
Bhuyan HK, Kamila NK, Pani SK (2022) Individual privacy in data mining using fuzzy optimization. Eng Optim. https://doi.org/10.1080/0305215X.2021.1922897
Canayaz M, Kantorovitch I, Mihet R (2021) Consumer privacy and value of consumer data. Swiss Finance Inst Res Paper. https://doi.org/10.2139/ssrn.3986562
Chen Z (2022) Privacy costs and consumer data acquisition: an economic analysis of data privacy regulation. SSRN J. https://doi.org/10.2139/ssrn.4085923
Hristakeva M, Vuppala R (2009) A survey of object-oriented programming languages. https://doi.org/10.1145/63320.66468
González-Aparicio M, Younas M, Tuya J, Casado R (2016) A new model for testing CRUD operations in a NoSQL database. In: 2016 IEEE 30th International Conference on Advanced Information Networking and Applications (AINA), pp 79–86
Torgersen M (2007) Querying in C#: how language integrated query (LINQ) works. In: Companion to the 22nd ACM SIGPLAN Conference on Object-Oriented Programming Systems and Applications, pp 852–853. https://doi.org/10.1145/1297846.1297922
Cvetković S, Janković D (2010) A comparative study of the features and performance of ORM tools in a .NET environment. In: Objects and Databases: 3rd International Conference, ICOODB 2010, Frankfurt/Main, Germany, September 28–30, 2010, Proceedings, vol 6348. Springer, Berlin, pp 147–158. https://doi.org/10.1007/978-3-642-16092-9_14
Procaccianti G, Lago P, Diesveld W (2016) Energy efficiency of ORM approaches: an empirical evaluation. In: Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, pp 1–10. https://doi.org/10.1145/2961111.2962586
Balliauw M, Decoster X (2013) Package manifest reference. https://doi.org/10.1007/978-1-4302-6002-8_11
Hameed T, Sadeeq H (2022) Modified Vigenère cipher algorithm based on new key generation method. Indonesian J Electr Eng Comput Sci 28:954–961. https://doi.org/10.11591/ijeecs.v28.i2.pp954-961
Duy Thanh Tran, Jun-Ho Huh, Full source code for S-Apriori model https://github.com/thanhtd32/SAprioriSystem/tree/main/SAprioriModel
Teng Lv, Ping Y, Weimin He (2018) Survey on JSON data modelling. J Phys Conf Series. https://doi.org/10.1088/1742-6596/1069/1/012101
Grochowski K, Breiter M, Nowak R (2019) Serialization in object-oriented programming languages. In: Sud K, Erdogmus P, Kadry S (eds) Introduction to data science and machine learning. IntechOpen
Duy Thanh Tran, Jun-Ho Huh, Small dataset https://github.com/thanhtd32/SAprioriSystem/tree/main/dataset/smalldataset
Duy Thanh Tran, Jun-Ho Huh, Large dataset https://github.com/thanhtd32/SAprioriSystem/tree/main/dataset/largedataset (We converted the Microsoft SQL Server AdventureWorks2017 database to JSON large dataset format)
The Microsoft AdventureWorks 2017 database https://docs.microsoft.com/en-us/sql/samples/adventureworks-install-configure
Duy Thanh Tran, Jun-Ho Huh, Large dataset with two layers of data privacy https://github.com/thanhtd32/SAprioriSystem/tree/main/dataset/privacydataset
Duy Thanh Tran, Jun-Ho Huh, S-Apriori model in Microsoft NuGet System https://www.nuget.org/packages/SAprioriModel/
Duy Thanh Tran (2023) Doctoral dissertation “New Machine Learning Models for Data Mining Ecosystem” http://www.dcollection.net/handler/kmou/200000666830
Acknowledgements
We would like to thank the University of Economics and Law, Vietnam National University, Ho Chi Minh City, and the National Korea Maritime and Ocean University for supporting us during this research.
Ethics declarations
Competing Interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tran, D.T., Huh, JH. Forecast of seasonal consumption behavior of consumers and privacy-preserving data mining with new S-Apriori algorithm. J Supercomput 79, 12691–12736 (2023). https://doi.org/10.1007/s11227-023-05105-6
DOI: https://doi.org/10.1007/s11227-023-05105-6