Mining diverse sets of patterns with constraint programming using the pairwise Jaccard similarity relaxation

Constraints

Abstract

In recent years, pattern mining has evolved from a slow-moving, repetitive three-step process into a much more agile, iterative, and user-centric mining model. A crucial element of this framework is the ability to rapidly provide the user with a set of diverse patterns. This paper proposes a pattern mining approach based on constraint programming that incorporates a non-redundancy (diversity) constraint into closed pattern enumeration. The level of diversity is controlled through a threshold on the maximum pairwise Jaccard similarity of pattern occurrences. We show that the Jaccard measure lacks useful (anti-)monotonicity properties with respect to general-to-specific enumeration. To address this limitation, we propose anti-monotonic lower- and upper-bound relaxations of the Jaccard similarity that enable pruning, and we relate the final results to the original Jaccard index. To evaluate the effectiveness of our relaxations, we conduct a comprehensive comparison against several existing pattern mining techniques designed to control redundancy. Experimental results show that our approach provides an effective solution for mining diverse itemsets, with competitive performance in both runtime and flexibility.
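The diversity constraint rests on the Jaccard similarity of pattern covers, J(P, Q) = |cov(P) ∩ cov(Q)| / |cov(P) ∪ cov(Q)|. The Python sketch below is illustrative only and is not the authors' implementation: the toy database and every function name (`cover`, `jaccard`, `is_diverse`, `specialization_upper_bound`) are hypothetical. It checks the pairwise threshold on a candidate set, shows on a small chain of specializations that J can rise and then fall (so it is neither monotone nor anti-monotone), and computes one simple anti-monotone upper bound. That bound is a generic bound chosen for illustration, not the relaxation proposed in the paper.

```python
from itertools import combinations

# Hypothetical toy database: each transaction is a set of items.
DB = [
    {"a", "b", "c"},       # t0
    {"a", "b"},            # t1
    {"a", "c"},            # t2
    {"b", "c"},            # t3
    {"a", "b", "c", "d"},  # t4
]


def cover(pattern, db=DB):
    """Indices of the transactions that contain every item of `pattern`."""
    return frozenset(i for i, t in enumerate(db) if set(pattern) <= t)


def jaccard(p, q, db=DB):
    """Jaccard similarity of the covers (occurrence sets) of p and q."""
    cp, cq = cover(p, db), cover(q, db)
    union = cp | cq
    return len(cp & cq) / len(union) if union else 0.0


def is_diverse(patterns, j_max, db=DB):
    """True if every pair of patterns has cover similarity <= j_max."""
    return all(jaccard(p, q, db) <= j_max for p, q in combinations(patterns, 2))


def specialization_upper_bound(p, q, db=DB):
    """Upper bound on jaccard(p2, q) for every p2 that is a superset of p.

    Adding items to p can only shrink its cover, so the intersection with
    cover(q) cannot grow, while the union always contains cover(q).
    The bound is therefore anti-monotone along general-to-specific search.
    Illustrative only: this is not the relaxation proposed in the paper.
    """
    cp, cq = cover(p, db), cover(q, db)
    return len(cp & cq) / len(cq) if cq else 0.0


if __name__ == "__main__":
    q = {"a", "b"}
    # A chain of ever more specific patterns: J rises, falls, then rises again.
    for p in [set(), {"a"}, {"a", "c"}, {"a", "b", "c"}]:
        print(sorted(p), round(jaccard(p, q), 3))
    # [] 0.6
    # ['a'] 0.75
    # ['a', 'c'] 0.5
    # ['a', 'b', 'c'] 0.667

    candidates = [{"a", "b"}, {"a", "c"}, {"b", "c"}]
    print(is_diverse(candidates, 0.5))  # True: every pair shares 2 of 4 transactions
    print(is_diverse(candidates, 0.4))  # False
```

When the upper bound for P and an already selected pattern Q is at most the threshold, no specialization of P can violate the constraint with respect to Q, so that pairwise check can be skipped in P's subtree. Pruning a subtree outright requires a bound in the other direction, one guaranteeing that every specialization exceeds the threshold.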

Fig. 1, Algorithm 1, Algorithm 2, Figs. 2–6

Notes

  1. As opposed to the more rigid search performed by classical pattern mining algorithms, which often rely on exploiting the properties of a particular interestingness measure.

  2. https://github.com/lobnury/ClosedDiversity/tree/master/Suppl_Material

  3. \(\kappa \in (0, 1)\) denotes the sampling error tolerance.

  4. https://github.com/lobnury/ClosedDiversity/tree/master/Suppl_Material

  5. The source code is publicly available at https://github.com/lobnury/ClosedDiversity

Author information

Correspondence to Arnold Hien, Noureddine Aribi or Samir Loudni.

Ethics declarations

To ensure objectivity and transparency in research and to ensure that accepted principles of ethical and professional conduct have been followed, the authors certify the following statements:

  • None of the authors has a conflict of interest to declare, and all authors approve the manuscript being submitted. We warrant that the article is the authors' original work, that it has not received prior publication, and that it is not under consideration for publication elsewhere. On behalf of all co-authors, the corresponding author shall bear full responsibility for the submission.
  • None of the authors received support from any organization for the submitted work. The authors have no relevant financial or non-financial interests to disclose.
  • All authors listed on the title page have contributed significantly to the work, have read the manuscript, attest to the validity and legitimacy of the data and its interpretation, and agree to its submission to the journal Constraints.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hien, A., Aribi, N., Loudni, S. et al. Mining diverse sets of patterns with constraint programming using the pairwise Jaccard similarity relaxation. Constraints 29, 80–111 (2024). https://doi.org/10.1007/s10601-024-09373-8

  • DOI: https://doi.org/10.1007/s10601-024-09373-8
