High Utility Itemsets Mining Based on Divide-and-Conquer Strategy

Wireless Personal Communications

Abstract

High utility itemset mining has become an active research topic in association rule mining. Many existing algorithms mine the dataset directly, which causes a problem on dense datasets: each transaction contains too many itemsets, so mining association rules consumes a large amount of storage and slows the algorithm down. Efficient itemset mining algorithms for dense datasets are still lacking. To address this problem, we propose a high utility itemset mining algorithm based on a divide-and-conquer strategy. An improved silhouette coefficient is used to select the best number of K-means clusters, and the dataset is divided into many smaller subclasses. Association rule mining is then performed on each subclass using a Boolean matrix compression operation, and the partial results are iteratively merged to obtain the final mining results. We also analyze the time complexity of our method and of the Apriori algorithm. Finally, experiments on several well-known real-world datasets show that the improved algorithm runs faster and consumes less memory on dense datasets.
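
The abstract does not spell out the Boolean matrix compression operation, so the following is only a minimal sketch of the general idea behind matrix-based itemset counting, not the authors' algorithm. It assumes that each item is a Boolean column over the transactions (stored here as a Python integer bitmap), that the support of an itemset is the number of rows where all of its columns are 1, and that "compression" means dropping columns that cannot reach the support threshold. The function names and the toy data are hypothetical, the candidates are enumerated by brute force rather than Apriori-style generation, and the sketch counts support only; it does not model item utilities, clustering, or the merging of subclass results.

```python
from itertools import combinations


def to_bitmaps(transactions):
    """Build one Boolean column per item: bit t is set if transaction t contains the item."""
    columns = {}
    for t, items in enumerate(transactions):
        for item in items:
            columns[item] = columns.get(item, 0) | (1 << t)
    return columns


def popcount(mask):
    """Number of 1 bits, i.e. number of transactions marked in the column."""
    return bin(mask).count("1")


def compress(columns, min_support):
    """Assumed compression step: drop item columns whose support is below the threshold."""
    return {item: mask for item, mask in columns.items() if popcount(mask) >= min_support}


def support(columns, itemset):
    """AND the columns of the itemset and count the rows that remain set."""
    items = list(itemset)
    mask = columns[items[0]]
    for item in items[1:]:
        mask &= columns[item]
    return popcount(mask)


def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force enumeration over the compressed matrix (illustration only)."""
    columns = compress(to_bitmaps(transactions), min_support)
    result = {}
    for k in range(1, max_size + 1):
        for combo in combinations(sorted(columns), k):
            s = support(columns, combo)
            if s >= min_support:
                result[combo] = s
    return result


# Hypothetical toy dataset: five transactions over items a, b, c, d.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
result = frequent_itemsets(transactions, min_support=3)
for itemset, count in sorted(result.items()):
    print(itemset, count)
```

In this toy run the column for `d` is removed by the compression step before any pair is counted, which is the kind of saving a matrix representation is meant to provide on dense data. In a divide-and-conquer setting, a routine like this would presumably be applied to each subclass separately before the partial results are merged.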

Author information

Corresponding author

Correspondence to Jiyong Liao.

About this article

Cite this article

Liao, J., Wu, S. & Liu, A. High Utility Itemsets Mining Based on Divide-and-Conquer Strategy. Wireless Pers Commun 116, 1639–1657 (2021). https://doi.org/10.1007/s11277-020-07753-w

  • DOI: https://doi.org/10.1007/s11277-020-07753-w
