Maximizing diversity in k-pattern set mining through constraint programming and entropy

Douad, Mohamed El Amine; Aribi, Noureddine; Loudni, Samir; Hien, Arnold; Lebbah, Yahia

doi:10.1007/s10489-025-06482-6

Published: 29 March 2025

Applied Intelligence, Volume 55, article number 597 (2025)

Abstract

Extracting diverse and frequent closed itemsets from large datasets is a core challenge in pattern mining, with significant implications across domains such as fraud detection, recommendation systems, and machine learning. Existing approaches often lack flexibility and efficiency, and struggle with initial itemset selection bias and redundancy. This paper addresses these research gaps by introducing a compact and modular constraint programming model that formalizes the search for diverse patterns. Our approach incorporates a novel global constraint derived from a relaxed Overlap diversity measure, using tighter lower and upper bounds to improve filtering capabilities. Unlike traditional methods, we leverage an entropy-based optimization framework that combines joint entropy maximization with top-k pattern mining to identify the maximally k-diverse pattern set. Our approach ensures more comprehensive and informative pattern discovery by minimizing redundancy and promoting pattern diversity. Extensive experiments validate the effectiveness of the proposed method, demonstrating significant performance gains and superior pattern quality compared to state-of-the-art approaches. Implemented in both sequential and parallel versions, the framework offers an efficient and adaptable solution for anytime pattern mining tasks in various domains.
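
To make the entropy objective concrete, here is a minimal Python sketch. It does not reproduce the paper's constraint programming model or its relaxed Overlap constraint, which are not visible in this preview. It assumes the usual definition of joint entropy for a pattern set, in which each pattern is a binary feature that is true on the transactions containing it, and it uses the Jaccard similarity of pattern covers as a stand-in overlap measure. The toy database and the names `cover`, `joint_entropy`, and `jaccard` are hypothetical.

```python
from collections import Counter
from math import log2

# Toy transaction database (hypothetical data, not taken from the paper).
TRANSACTIONS = [
    frozenset({"a", "b"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"c"}),
]


def cover(pattern, transactions):
    """Indices of the transactions that contain every item of the pattern."""
    return frozenset(i for i, t in enumerate(transactions) if pattern <= t)


def joint_entropy(patterns, transactions):
    """Joint entropy, in bits, of the binary features induced by the patterns.

    Each transaction is mapped to a tuple of booleans (one per pattern);
    the entropy is computed over the empirical distribution of these tuples.
    """
    n = len(transactions)
    signatures = Counter(tuple(p <= t for p in patterns) for t in transactions)
    return -sum((c / n) * log2(c / n) for c in signatures.values())


def jaccard(p, q, transactions):
    """Jaccard similarity of two pattern covers (assumed overlap measure)."""
    cp, cq = cover(p, transactions), cover(q, transactions)
    union = cp | cq
    return len(cp & cq) / len(union) if union else 0.0


if __name__ == "__main__":
    a, b, ab, c = (frozenset(s) for s in ("a", "b", "ab", "c"))
    # {b} and {a, b} cover the same transactions, so the pair is redundant.
    print(joint_entropy([b, ab], TRANSACTIONS), jaccard(b, ab, TRANSACTIONS))
    # {a} and {c} split the transactions into more groups: higher entropy.
    print(joint_entropy([a, c], TRANSACTIONS), jaccard(a, c, TRANSACTIONS))
```

In this toy setting, the pair with identical covers has a lower joint entropy and a Jaccard similarity of 1, while the pair with mostly distinct covers scores higher on entropy and lower on overlap. This matches the intuition in the abstract that maximizing joint entropy discourages redundant patterns.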

Data Availability

The datasets used and/or analyzed during the current study are publicly available. Details and download links can be found at https://dtai-static.cs.kuleuven.be/CP4IM/datasets/

Notes

The value between $\langle \cdot \rangle$ indicates the frequency of a pattern.
Detailed proofs of all the propositions presented in this Section can be found in the Supplementary Material.
https://github.com/mohameddouad/ClosedOverlap
https://dtai-static.cs.kuleuven.be/CP4IM/datasets/

Acknowledgements

We would like to thank the Directorate-General for Scientific Research and Technological Development (DGRSDT) for its support of this research work.

Author information

Authors and Affiliations

LITIO Laboratory, Université Oran1 Ahmed Ben Bella, Oran, 31000, Algeria
Mohamed El Amine Douad, Noureddine Aribi & Yahia Lebbah
IMT Atlantique, LS2N, UMR CNRS 6004, F-44307, Nantes, France
Samir Loudni & Arnold Hien

Corresponding author

Correspondence to Mohamed El Amine Douad.

Ethics declarations

To ensure objectivity and transparency in the research and to confirm that accepted principles of ethical and professional conduct were followed, the authors certify the following statements:

Ethical Approval

This research did not involve human participants or animals and, therefore, did not require approval from an ethics committee.

Informed Consent

All authors listed on the title page have contributed to this work, have read the manuscript, attest to the validity and legitimacy of the data and its interpretation, and agree to its submission to the Journal of Applied Intelligence.

Disclosure of Potential Conflicts of Interest

All authors declare no conflict of interest and approve the submission of the manuscript. We affirm that the article is our original work, has not been previously published, and is not under consideration for publication elsewhere.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 635 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Douad, M.E.A., Aribi, N., Loudni, S. et al. Maximizing diversity in k-pattern set mining through constraint programming and entropy. Appl Intell 55, 597 (2025). https://doi.org/10.1007/s10489-025-06482-6

Accepted: 16 March 2025
Published: 29 March 2025
DOI: https://doi.org/10.1007/s10489-025-06482-6
