Abstract
In recent years, pattern mining has evolved from a slow-moving, repetitive three-step process to a much more agile and iterative/user-centric mining model. A crucial element of this framework is the capability to rapidly provide a set of diverse patterns to the user. This paper proposes a pattern mining approach based on constraint programming that incorporates a non-redundancy/diversity constraint into closed pattern enumeration. The level of diversity is controlled through a threshold on the maximum pairwise Jaccard similarity of pattern occurrences. We show that the Jaccard measure does not have nice (anti-)monotonicity properties w.r.t. the general-to-specific enumeration. To address this limitation, we propose anti-monotonic lower and upper-bound relaxations of the Jaccard similarity with nice pruning-enabling properties, and connect the final results to the original Jaccard Index. To evaluate the effectiveness of our relaxations, we conduct a comprehensive comparison against several existing pattern mining techniques designed to control redundancy. Experimental results illustrate that our approach provides an effective solution for mining diverse itemsets, showing competitive performance in both runtime and flexibility.
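To make the diversity criterion concrete, the following minimal sketch computes the Jaccard similarity of pattern occurrences (covers) on a toy database and checks a candidate against a history of already-mined patterns. The database, the helper names (`cover`, `jaccard`, `is_diverse`), and the threshold name `j_max` are illustrative assumptions, not the authors' implementation; returning 0 when both covers are empty is likewise only a convention chosen here. The sketch also shows why the measure resists pruning: specializing a pattern can either raise or lower its similarity to a pattern in the history.

```python
from fractions import Fraction

# Hypothetical toy database: transaction id -> set of items.
DATABASE = {
    1: {"a", "b", "c"},
    2: {"a", "b"},
    3: {"a", "c"},
    4: {"b", "c"},
    5: {"a"},
}


def cover(pattern, database=DATABASE):
    """Ids of the transactions that contain every item of the pattern."""
    return {tid for tid, items in database.items() if set(pattern) <= items}


def jaccard(pattern_a, pattern_b, database=DATABASE):
    """Jaccard similarity of the two patterns' covers, as an exact fraction."""
    cov_a, cov_b = cover(pattern_a, database), cover(pattern_b, database)
    union = cov_a | cov_b
    if not union:  # assumed convention: two empty covers have similarity 0
        return Fraction(0)
    return Fraction(len(cov_a & cov_b), len(union))


def is_diverse(candidate, history, j_max, database=DATABASE):
    """True if the candidate's similarity to every pattern in history is at most j_max."""
    return all(jaccard(candidate, h, database) <= j_max for h in history)


history = [{"b"}]  # patterns assumed to have been mined already

# Specializing {a} moves the similarity to {b} in both directions.
base = jaccard({"a"}, {"b"})        # 2/5
up = jaccard({"a", "b"}, {"b"})     # 2/3 (adding b increases it)
down = jaccard({"a", "c"}, {"b"})   # 1/4 (adding c decreases it)

# With j_max = 3/10, {a} violates the constraint but its specialization {a, c}
# satisfies it, so cutting the search at {a} would lose a valid pattern.
j_max = Fraction(3, 10)
print(is_diverse({"a"}, history, j_max))       # False
print(is_diverse({"a", "c"}, history, j_max))  # True
```

Because a violating pattern can have a satisfying specialization, the raw Jaccard constraint cannot be used directly to prune a general-to-specific search, which is the limitation the proposed relaxations are meant to address.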
Notes
As opposed to the more rigid search of classical pattern mining algorithms, which often rely on exploiting the properties of a particular interestingness measure.
\(\kappa \in (0, 1)\) denotes the sampling error tolerance.
The source code is publicly available at https://github.com/lobnury/ClosedDiversity
Ethics declarations
To ensure objectivity and transparency in research and to ensure that accepted principles of ethical and professional conduct have been followed, the authors certify the following statements:
• All authors have no conflict of interest to declare and approve the manuscript being submitted. We warrant that the article is the Authors’ original work. We warrant that the article has not received prior publication and is not under consideration for publication elsewhere. On behalf of all co-authors, the corresponding author shall bear full responsibility for the submission.
• All authors did not receive support from any organization for the submitted work. The authors have no relevant financial or non-financial interests to disclose.
• All authors listed on the title page have contributed significantly to the work, have read the manuscript, attest to the validity and legitimacy of the data and its interpretation, and agree to its submission to the Constraints Journal.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hien, A., Aribi, N., Loudni, S. et al. Mining diverse sets of patterns with constraint programming using the pairwise Jaccard similarity relaxation. Constraints 29, 80–111 (2024). https://doi.org/10.1007/s10601-024-09373-8
DOI: https://doi.org/10.1007/s10601-024-09373-8