Maximizing diversity in k-pattern set mining through constraint programming and entropy

Applied Intelligence

Abstract

Extracting diverse and frequent closed itemsets from large datasets is a core challenge in pattern mining, with significant implications across domains such as fraud detection, recommendation systems, and machine learning. Existing approaches often lack flexibility and efficiency, and struggle with initial itemset selection bias and redundancy. This paper addresses these research gaps by introducing a compact and modular constraint programming model that formalizes the search for diverse patterns. Our approach incorporates a novel global constraint derived from a relaxed Overlap diversity measure, using tighter lower and upper bounds to improve filtering capabilities. Unlike traditional methods, we leverage an entropy-based optimization framework that combines joint entropy maximization with top-k pattern mining to identify the maximally k-diverse pattern set. Our approach ensures more comprehensive and informative pattern discovery by minimizing redundancy and promoting pattern diversity. Extensive experiments validate the effectiveness of the proposed method, demonstrating significant performance gains and superior pattern quality compared to state-of-the-art approaches. Implemented in both sequential and parallel versions, the framework offers an efficient and adaptable solution for anytime pattern mining tasks in various domains.
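As a minimal illustration of two notions the abstract combines, closed itemsets and the joint entropy of a pattern set, the sketch below assumes the standard textbook definitions: a pattern is closed when no strict superset has the same cover, and the joint entropy of a pattern set is the Shannon entropy of the binary signatures "transaction contains pattern". The toy transaction database, the function names, and the exhaustive search over k-subsets are illustrative assumptions. They do not reproduce the constraint programming model or the Overlap-based global constraint described above.

```python
from collections import Counter
from itertools import combinations
from math import log2


def cover(pattern, transactions):
    """Indices of the transactions that contain every item of `pattern`."""
    return frozenset(i for i, t in enumerate(transactions) if pattern <= t)


def is_closed(pattern, transactions):
    """True if no strict superset of `pattern` has the same (non-empty) cover."""
    cov = cover(pattern, transactions)
    if not cov:
        return False
    # The closure is the set of items shared by all covering transactions.
    closure = frozenset.intersection(*(transactions[i] for i in cov))
    return closure == pattern


def joint_entropy(patterns, transactions):
    """Joint entropy, in bits, of the binary features 'transaction contains pattern'."""
    n = len(transactions)
    signatures = Counter(tuple(p <= t for p in patterns) for t in transactions)
    return -sum((c / n) * log2(c / n) for c in signatures.values())


def best_k_set(candidates, transactions, k):
    """Brute-force search for the k-subset with maximal joint entropy (toy sizes only)."""
    return max(combinations(candidates, k),
               key=lambda s: joint_entropy(s, transactions))


# Hypothetical toy database of four transactions.
T = [frozenset("abc"), frozenset("ab"), frozenset("a"), frozenset("c")]

A, AB, C = frozenset("a"), frozenset("ab"), frozenset("c")
diverse = (AB, C)      # covers {0, 1} and {0, 3}: four distinct signatures
overlapping = (A, AB)  # cover of AB is nested in cover of A: three signatures

print(joint_entropy(diverse, T))      # 2.0
print(joint_entropy(overlapping, T))  # 1.5
print(is_closed(frozenset("b"), T))   # False: {b} has the same cover as {a, b}
print(best_k_set([A, AB, C], T, 2) == diverse)  # True
```

On this toy database the pair with nested covers scores 1.5 bits, while the pair whose covers split the transactions into four distinct signatures reaches the 2-bit maximum for k = 2. This is the sense in which maximizing joint entropy penalizes redundant patterns. The exhaustive search is exponential in the number of candidates and is shown only to make the objective concrete.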




Data Availability

The datasets used and/or analyzed during the current study are publicly available and accessible on the Internet. Specific details and the link to these datasets can be found at https://dtai-static.cs.kuleuven.be/CP4IM/datasets/

Notes

  1. The value between \(\langle \cdot \rangle\) indicates the frequency of a pattern.

  2. Detailed proofs of all the propositions presented in this Section can be found in the Supplementary Material.

  3. https://github.com/mohameddouad/ClosedOverlap

  4. https://dtai-static.cs.kuleuven.be/CP4IM/datasets/


Acknowledgements

We would like to thank the Directorate-General for Scientific Research and Technological Development (DGRSDT) for its support of this research work.

Author information

Corresponding author

Correspondence to Mohamed El Amine Douad.

Ethics declarations

To ensure objectivity and transparency in the research and to confirm that accepted principles of ethical and professional conduct were followed, the authors certify the following statements:

Ethical Approval

This research did not involve human participants or animals and, therefore, did not require approval from an ethics committee.

Informed Consent

All authors listed on the title page have contributed to this work, have read the manuscript, attest to the validity and legitimacy of the data and its interpretation, and agree to its submission to the Journal of Applied Intelligence.

Disclosure of Potential Conflicts of Interest

All authors declare no conflict of interest and approve the submission of the manuscript. We affirm that the article is our original work, has not been previously published, and is not under consideration for publication elsewhere.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 635 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Douad, M.E.A., Aribi, N., Loudni, S. et al. Maximizing diversity in k-pattern set mining through constraint programming and entropy. Appl Intell 55, 597 (2025). https://doi.org/10.1007/s10489-025-06482-6


  • DOI: https://doi.org/10.1007/s10489-025-06482-6
