DiffVersify: a Scalable Approach to Differentiable Pattern Mining with Coverage Regularization

Chataing, Thibaut; Perez, Julien; Plantevit, Marc; Robardet, Céline

doi:10.1007/978-3-031-70365-2_24

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14946)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

Pattern mining addresses the challenge of automatically identifying interpretable and discriminative patterns within data. Recent differentiable approaches, based on neural autoencoders with class recovery, have achieved encouraging results but tend to fall short as the noise level and the number of underlying features in the data increase. Empirically, the number of discovered patterns tends to be limited in these challenging contexts. In this article, we present a differentiable binary model that integrates a new regularization technique to enhance pattern coverage. We also introduce a pattern decoding strategy based on non-negative matrix factorization (NMF), which extends beyond the conventional thresholding methods prevalent in existing approaches. Experiments on four real-world datasets show that DiffVersify achieves superior performance in terms of the ROC-AUC metric. On synthetic data, we observe an increase in the similarity between the discovered patterns and the ground truth. Finally, using several metrics to finely evaluate the quality of the patterns with respect to the data, we show the overall effectiveness of the approach.
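To make the two notions in the abstract concrete, the following is a minimal sketch of the conventional thresholding decoder and of a simple coverage measure. It is not the DiffVersify implementation: the function names, the 0.5 threshold, the toy weight matrix, and the definition of coverage as "fraction of transactions containing at least one full pattern" are all assumptions made for illustration.

```python
from typing import List, Set


def decode_by_threshold(weights: List[List[float]],
                        threshold: float = 0.5) -> List[Set[int]]:
    """Baseline decoder (assumed form): each row of learned weights becomes
    an itemset made of the feature indices whose weight reaches the threshold."""
    patterns = []
    for row in weights:
        pattern = {j for j, w in enumerate(row) if w >= threshold}
        if pattern:  # rows with no weight above the threshold yield no pattern
            patterns.append(pattern)
    return patterns


def coverage(data: List[Set[int]], patterns: List[Set[int]]) -> float:
    """Fraction of transactions that fully contain at least one pattern
    (an illustrative definition, not necessarily the paper's regularizer)."""
    if not data:
        return 0.0
    covered = sum(1 for row in data if any(p <= row for p in patterns))
    return covered / len(data)


# Hypothetical learned weights: 3 hidden units over 4 binary features.
weights = [
    [0.9, 0.8, 0.1, 0.0],
    [0.2, 0.1, 0.7, 0.6],
    [0.3, 0.2, 0.1, 0.4],  # no weight reaches 0.5, so no pattern is decoded
]
# Toy transactions, each given as the set of active feature indices.
data = [{0, 1}, {0, 1, 2}, {2, 3}, {1, 3}]

patterns = decode_by_threshold(weights)
cov = coverage(data, patterns)
print(patterns, cov)
```

In this toy case the third hidden unit yields no pattern and the transaction {1, 3} stays uncovered, which illustrates the kind of limited pattern discovery that a coverage-oriented regularizer and a richer decoder are meant to address.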

Notes

1. Code and data are available: https://chataingt.github.io/DiffVersify/.
2. https://www.kaggle.com/datasets/sulianova/cardiovasculardisease-dataset.
3. https://www.kaggle.com/datasets/itachi9604/diseasesymptom-description-dataset.
4. The BRCA datasets were derived from data made available by the TCGA Research Network.

Acknowledgement

This work benefited from state aid managed by the National Research Agency under France 2030 with the reference “ANR-22-PEAE-0008”.

Author information

Authors and Affiliations

PALO IT, Lyon, France
Thibaut Chataing
INSA Lyon, CNRS, LIRIS UMR 5205, 69621, Villeurbanne, France
Thibaut Chataing & Céline Robardet
EPITA Research Laboratory (LRE), 94276, Le Kremlin-Bicêtre, France
Julien Perez & Marc Plantevit

Corresponding author

Correspondence to Marc Plantevit.

Editor information

Editors and Affiliations

LTCI, Télécom Paris, Palaiseau Cedex, France
Albert Bifet
KU Leuven, Leuven, Belgium
Jesse Davis
Faculty of Informatics, Vytautas Magnus University, Akademija, Lithuania
Tomas Krilavičius
Institute of Computer Science, University of Tartu, Tartu, Estonia
Meelis Kull
Department of Computer Science, Bundeswehr University Munich, Munich, Germany
Eirini Ntoutsi
Dept. of Computer Science, University of Helsinki, Helsinki, Finland
Indrė Žliobaitė

About this paper

Cite this paper

Chataing, T., Perez, J., Plantevit, M., Robardet, C. (2024). DiffVersify: a Scalable Approach to Differentiable Pattern Mining with Coverage Regularization. In: Bifet, A., Davis, J., Krilavičius, T., Kull, M., Ntoutsi, E., Žliobaitė, I. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2024. Lecture Notes in Computer Science, vol 14946. Springer, Cham. https://doi.org/10.1007/978-3-031-70365-2_24

DOI: https://doi.org/10.1007/978-3-031-70365-2_24
Published: 22 August 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70364-5
Online ISBN: 978-3-031-70365-2
eBook Packages: Computer Science, Computer Science (R0)
