Abstract
High utility itemset mining has become a hot research topic in association rule mining. However, many algorithms mine the dataset directly, which causes a problem on dense datasets: each transaction stores too many items. As a result, mining association rules takes a lot of storage space and degrades the running efficiency of the algorithm. Existing algorithms lack an efficient itemset mining method for dense datasets. To address this problem, we propose a high utility itemset mining algorithm based on a divide-and-conquer strategy. An improved silhouette coefficient is used to select the best number of K-means clusters, and the dataset is divided into many smaller subclasses. Association rule mining is then performed on each subclass by a Boolean matrix compression operation, and the partial results are iteratively merged to obtain the final mining results. We also analyze the time complexity of our method and of the Apriori algorithm. Finally, experiments on several well-known real-world datasets show that the improved algorithm runs faster and consumes less memory on dense datasets, effectively improving computational efficiency.
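The abstract does not spell out the Boolean matrix compression step, so the following is only a minimal sketch of the general idea as it appears in matrix-based Apriori variants, not the authors' implementation. It assumes a plain support threshold and ignores item utilities. The function names (`to_boolean_matrix`, `compress`, `frequent_k_itemsets`) and the toy transactions are hypothetical. Each transaction becomes a row of booleans. Compression drops item columns whose support is below the threshold, then drops rows that contain fewer than k of the remaining items, since such rows cannot support any k-itemset.

```python
from itertools import combinations


def to_boolean_matrix(transactions):
    """Return (items, rows): a sorted item list and one boolean tuple per transaction."""
    items = sorted({item for t in transactions for item in t})
    rows = [tuple(item in t for item in items) for t in transactions]
    return items, rows


def compress(items, rows, min_support, k):
    """Drop item columns with support < min_support, then rows with fewer than k items."""
    keep = [j for j in range(len(items))
            if sum(row[j] for row in rows) >= min_support]
    items = [items[j] for j in keep]
    rows = [tuple(row[j] for j in keep) for row in rows]
    rows = [row for row in rows if sum(row) >= k]
    return items, rows


def frequent_k_itemsets(items, rows, min_support, k):
    """Count each k-item candidate by AND-ing its columns over all rows."""
    result = {}
    for cols in combinations(range(len(items)), k):
        support = sum(all(row[j] for j in cols) for row in rows)
        if support >= min_support:
            result[tuple(items[j] for j in cols)] = support
    return result


# Hypothetical toy data: six transactions over items a..e.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c", "d"},
    {"b", "c"},
    {"a", "b", "c", "e"},
    {"d", "e"},
]
items, rows = to_boolean_matrix(transactions)
items2, rows2 = compress(items, rows, min_support=3, k=2)
pairs = frequent_k_itemsets(items2, rows2, min_support=3, k=2)
print(items2, len(rows2), pairs)
```

In this toy run the columns for `d` and `e` are removed, which leaves the last transaction empty, so it is removed as well before the pairs are counted. In a divide-and-conquer setting, a step like this would presumably run once per cluster before the partial results are merged.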
Cite this article
Liao, J., Wu, S. & Liu, A. High Utility Itemsets Mining Based on Divide-and-Conquer Strategy. Wireless Pers Commun 116, 1639–1657 (2021). https://doi.org/10.1007/s11277-020-07753-w
DOI: https://doi.org/10.1007/s11277-020-07753-w