Abstract
Extracting diverse and frequent closed itemsets from large datasets is a core challenge in pattern mining, with significant implications across domains such as fraud detection, recommendation systems, and machine learning. Existing approaches often lack flexibility and efficiency, and struggle with initial itemset selection bias and redundancy. This paper addresses these research gaps by introducing a compact and modular constraint programming model that formalizes the search for diverse patterns. Our approach incorporates a novel global constraint derived from a relaxed Overlap diversity measure, using tighter lower and upper bounds to improve filtering capabilities. Unlike traditional methods, we leverage an entropy-based optimization framework that combines joint entropy maximization with top-k pattern mining to identify the maximally k-diverse pattern set. Our approach ensures more comprehensive and informative pattern discovery by minimizing redundancy and promoting pattern diversity. Extensive experiments validate the effectiveness of the proposed method, demonstrating significant performance gains and superior pattern quality compared to state-of-the-art approaches. Implemented in both sequential and parallel versions, the framework offers an efficient and adaptable solution for anytime pattern mining tasks in various domains.
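As a rough illustration of why joint entropy rewards diversity, the sketch below computes the joint entropy of the binary occurrence indicators of a small pattern set. It uses the textbook definition of joint entropy and is not taken from the paper's model; the toy database and the names `joint_entropy`, `redundant`, and `diverse` are assumptions introduced only for this example.

```python
from collections import Counter
from math import log2


def joint_entropy(patterns, transactions):
    """Joint entropy (in bits) of the 0/1 occurrence indicators of the patterns.

    Each transaction is mapped to a tuple of flags, one per pattern, telling
    whether the transaction contains that pattern. The entropy is taken over
    the empirical distribution of these tuples.
    """
    n = len(transactions)
    signatures = Counter(
        tuple(int(p <= t) for p in patterns) for t in transactions
    )
    return -sum((c / n) * log2(c / n) for c in signatures.values())


# Hypothetical toy database: four transactions over items a, b, c, d.
transactions = [
    frozenset("abc"),
    frozenset("ab"),
    frozenset("cd"),
    frozenset("d"),
]

# Two patterns with identical covers: the second adds no information.
redundant = [frozenset("a"), frozenset("ab")]
# Two patterns whose covers split the database into four distinct groups.
diverse = [frozenset("a"), frozenset("c")]

print(joint_entropy(redundant, transactions))
print(joint_entropy(diverse, transactions))
```

On this toy data the redundant pair scores lower than the diverse pair, which is the behaviour an entropy-based objective exploits when selecting k patterns.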
Data Availability
The datasets used and analyzed during the current study are publicly available. Details and download links can be found at https://dtai-static.cs.kuleuven.be/CP4IM/datasets/
Notes
The value between \(\langle \cdot \rangle\) indicates the frequency of a pattern.
Detailed proofs of all the propositions presented in this section can be found in the Supplementary Material.
Acknowledgements
We would like to thank the Directorate-General for Scientific Research and Technological Development (DGRSDT) for its support of this research work.
Ethics declarations
To ensure objectivity and transparency in the research and to confirm that accepted principles of ethical and professional conduct were followed, the authors certify the following statements:
Ethical Approval
This research did not involve human participants or animals and, therefore, did not require approval from an ethics committee.
Informed Consent
All authors listed on the title page have contributed to this work, have read the manuscript, attest to the validity and legitimacy of the data and its interpretation, and agree to its submission to the Journal of Applied Intelligence.
Disclosure of Potential Conflicts of Interest
All authors declare no conflict of interest and approve the submission of the manuscript. We affirm that the article is our original work, has not been previously published, and is not under consideration for publication elsewhere.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Douad, M.E.A., Aribi, N., Loudni, S. et al. Maximizing diversity in k-pattern set mining through constraint programming and entropy. Appl Intell 55, 597 (2025). https://doi.org/10.1007/s10489-025-06482-6
DOI: https://doi.org/10.1007/s10489-025-06482-6