Skip to main content

DiffVersify: a Scalable Approach to Differentiable Pattern Mining with Coverage Regularization

  • Conference paper
  • First Online:
Machine Learning and Knowledge Discovery in Databases. Research Track (ECML PKDD 2024)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 14946))

  • 557 Accesses

Abstract

Pattern mining addresses the challenge of automatically identifying interpretable and discriminative patterns within data. Recent approaches, leveraging differentiable approach through neural autoencoder with class recovery, have achieved encouraging results but tend to fall short as the magnitude of the noise and the number of underlying features increase in the data. Empirically, one can observe that the number of discovered patterns tend to be limited in these challenging contexts. In this article, we present a differentiable binary model that integrates a new regularization technique to enhance pattern coverage. Besides, we introduce an innovative pattern decoding strategy taking advantage of non-negative matrix factorization (NMF), extending beyond conventional thresholding methods prevalent in existing approaches. Experiments on four real-world datasets exhibit superior performances of DiffVersify in terms of the ROC-AUC metric. On synthetic data, we observe an increase in the similarity between the discovered patterns and the ground truth. Finally, using several metrics to finely evaluate the quality of the patterns in regard to the data, we show the global effectiveness of the approach.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    Code and data are available: https://chataingt.github.io/DiffVersify/.

  2. 2.

    https://www.kaggle.com/datasets/sulianova/cardiovasculardisease-dataset.

  3. 3.

    https://www.kaggle.com/datasets/itachi9604/diseasesymptom-description-dataset.

  4. 4.

    The BRCA datasets were derived from data made available by the TCGA Research Network.

References

  1. Agrawal, R., Imielinski, T., Swami, A.N.: Mining association rules between sets of items in large databases. In: SIGMOD, pp. 207–216. ACM Press (1993)

    Google Scholar 

  2. Berman, A., Plemmons, R.J.: Nonnegative matrices in the mathematical sciences. In: Classics in Applied Mathematics (1979)

    Google Scholar 

  3. Bosc, G., Boulicaut, J., Raïssi, C., Kaytoue, M.: Anytime discovery of a diverse set of patterns with Monte Carlo tree search. DAMI 32(3), 604–650 (2018)

    MathSciNet  Google Scholar 

  4. Budhathoki, K., Vreeken, J.: The difference and the norm: characterising similarities and differences between databases. In: Mach (2015)

    Google Scholar 

  5. Dash, S., Günlük, O., Wei, D.: Boolean decision rules via column generation. In: NeurIPS, pp. 4660–4670 (2018)

    Google Scholar 

  6. De Bie, T.: Maximum entropy models and subjective interestingness: an application to tiles in binary databases. Data Min. Knowl. Discov. 23(3), 407–446 (2011)

    Article  MathSciNet  Google Scholar 

  7. Dierckx, L., Veroneze, R., Nijssen, S.: RL-net: interpretable rule learning with neural networks. In: PAKDD, pp. 95–107 (2023)

    Google Scholar 

  8. Dzyuba, V., van Leeuwen, M., Raedt, L.D.: Flexible constrained sampling with guarantees for pattern mining. Data Min. Knowl. Discov. 31(5), 1266–1293 (2017)

    Article  MathSciNet  Google Scholar 

  9. Fischer, J., Vreeken, J.: Differentiable pattern set mining. In: SIGKDD, pp. 383–392. ACM (2021)

    Google Scholar 

  10. Gionis, A., Mannila, H., Mielikäinen, T., Tsaparas, P.: Assessing data mining results via swap randomization. ACM Trans. Knowl. Discov. Data 1(3), 14 (2007)

    Article  Google Scholar 

  11. Hayden, M., et al.: Fast sparse decision tree optimization via reference ensembles. In: AAAI, vol. 36 (2022)

    Google Scholar 

  12. Hedderich, M., Fischer, J., Klakow, D., Vreeken, J.: Label-descriptive patterns and their application to characterize classification errors. In: ICML (2022)

    Google Scholar 

  13. Hess, S., Morik, K.: C-SALT: mining class-specific alterations in boolean matrix factorization. In: ECML PKDD, vol. 10534, pp. 547–563 (2017)

    Google Scholar 

  14. Kusters, R., Kim, Y., Collery, M., Marie, C.d.S., Gupta, S.: Differentiable rule induction with learned relational features. arXiv preprint arXiv:2201.06515 (2022)

  15. Lakkaraju, H., Bach, S.H., Leskovec, J.: Interpretable decision sets: a joint framework for description and prediction. In: SIGKDD, pp. 1675–1684 (2016)

    Google Scholar 

  16. Lemmerich, F., Becker, M.: pysubgroup: easy-to-use subgroup discovery in python. In: Brefeld, U., et al. (eds.) ECML PKDD 2018. LNCS (LNAI), vol. 11053, pp. 658–662. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-10997-4_46

    Chapter  Google Scholar 

  17. Lin, J.J., Zhong, C., Hu, D., Rudin, C., Seltzer, M.I.: Generalized and scalable optimal sparse decision trees. In: ICML (2020)

    Google Scholar 

  18. Pellegrina, L., Riondato, M., Vandin, F.: Spumante: significant pattern mining with unconditional testing. In: SIGKDD, pp. 1528–1538 (2019)

    Google Scholar 

  19. Proença, H.M., van Leeuwen, M.: Interpretable multiclass classification by mdl-based rule lists. Inf. Sci. 512, 1372–1393 (2020)

    Article  Google Scholar 

  20. Shi, T., Kang, K., Choo, J., Reddy, C.K.: Short-text topic modeling via NMF enriched with local word-context correlations. In: WWW (2018)

    Google Scholar 

  21. Walter, N.P., Fischer, J., Vreeken, J.: Finding interpretable class-specific patterns through efficient neural search. In: AAAI (2024)

    Google Scholar 

  22. Wang, Z., Zhang, W., Liu, N., Wang, J.: Scalable rule-based representation learning for interpretable classification. In: NeurIPS, vol. 34, pp. 30479–30491 (2021)

    Google Scholar 

  23. Xu, W., Liu, X., Gong, Y.: Document clustering based on non-negative matrix factorization. In: ACM SIGIR (2003)

    Google Scholar 

  24. Zaki, M.J., Parthasarathy, S., Ogihara, M., Li, W.: New algorithms for fast discovery of association rules. In: SIGKDD, pp. 283–286 (1997)

    Google Scholar 

Download references

Acknowledgement

This work benefited from state aid managed by the National Research Agency under France 2030 with the reference “ANR-22-PEAE-0008”.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Marc Plantevit .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Chataing, T., Perez, J., Plantevit, M., Robardet, C. (2024). DiffVersify: a Scalable Approach to Differentiable Pattern Mining with Coverage Regularization. In: Bifet, A., Davis, J., Krilavičius, T., Kull, M., Ntoutsi, E., Žliobaitė, I. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2024. Lecture Notes in Computer Science(), vol 14946. Springer, Cham. https://doi.org/10.1007/978-3-031-70365-2_24

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-70365-2_24

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70364-5

  • Online ISBN: 978-3-031-70365-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics