DEvIANT: Discovering Significant Exceptional (Dis-)Agreement Within Groups

Belfodil, Adnene; Duivesteijn, Wouter; Plantevit, Marc; Cazalens, Sylvie; Lamarre, Philippe

doi:10.1007/978-3-030-46150-8_1

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11906)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

We strive to find contexts (i.e., subgroups of entities) under which exceptional (dis-)agreement occurs among a group of individuals, in any type of data featuring individuals (e.g., parliamentarians, customers) performing observable actions (e.g., votes, ratings) on entities (e.g., legislative procedures, movies). To this end, we introduce the problem of discovering statistically significant exceptional contextual intra-group agreement patterns. To handle the sparsity inherent to voting and rating data, we use Krippendorff’s Alpha measure for assessing the agreement among individuals. We devise a branch-and-bound algorithm, named DEvIANT, to discover such patterns. DEvIANT exploits both closure operators and tight optimistic estimates. We derive analytic approximations for the confidence intervals (CIs) associated with patterns for a computationally efficient significance assessment. We prove that these approximate CIs are nested along specialization of patterns. This allows us to incorporate pruning properties in DEvIANT to quickly discard non-significant patterns. An empirical study on several datasets demonstrates the efficiency and the usefulness of DEvIANT.
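To make the agreement measure mentioned in the abstract concrete, the following is a minimal Python sketch of Krippendorff's Alpha in its standard textbook form, $\alpha = 1 - D_o / D_e$, computed over sparse ratings. It is not the authors' implementation: the function name `krippendorff_alpha`, the `None` encoding of missing ratings, the two distance functions, and the toy data are all assumptions made for illustration. The abstract does not say which distance metric DEvIANT uses.

```python
from itertools import permutations


def nominal(a, b):
    """Nominal distance: any two different values disagree fully."""
    return 0.0 if a == b else 1.0


def interval(a, b):
    """Interval distance: squared difference between numeric ratings."""
    return float((a - b) ** 2)


def krippendorff_alpha(ratings, delta):
    """Krippendorff's Alpha for possibly incomplete ratings.

    ratings: one row per individual, one column per entity;
             None marks a missing rating (assumed encoding).
    delta:   distance function between two rating values.
    """
    # Collect the observed values of each entity, skipping missing ones.
    units = [[v for v in column if v is not None] for column in zip(*ratings)]
    # Only entities rated by at least two individuals carry agreement information.
    pairable = [u for u in units if len(u) >= 2]
    pooled = [v for u in pairable for v in u]
    n = len(pooled)
    if n < 2:
        raise ValueError("need at least one entity rated by two individuals")
    # Observed disagreement: within-entity ordered pairs, weighted by 1 / (m_u - 1).
    observed = sum(
        sum(delta(a, b) for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in pairable
    ) / n
    # Expected disagreement: all ordered pairs of pooled values.
    expected = sum(delta(a, b) for a, b in permutations(pooled, 2)) / (n * (n - 1))
    if expected == 0:
        raise ValueError("alpha is undefined when all pooled ratings are identical")
    return 1.0 - observed / expected


# Toy data (hypothetical): 3 individuals rating 15 entities, with many gaps.
_ = None
ratings = [
    [_, _, _, _, _, 3, 4, 1, 2, 1, 1, 3, 3, _, 3],
    [1, _, 2, 1, 3, 3, 4, 3, _, _, _, _, _, _, _],
    [_, _, 2, 1, 3, 4, 4, _, 2, 1, 1, 3, 3, _, 4],
]

print(round(krippendorff_alpha(ratings, nominal), 3))   # 0.691
print(round(krippendorff_alpha(ratings, interval), 3))  # 0.811
```

Entities rated by fewer than two individuals are dropped rather than imputed, which is why the measure copes with sparse voting and rating data. The quadratic pooling over all pairs keeps the sketch short but would be too slow for large datasets.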

Notes

1. This paradigm naturally raises the question of how to address the multiple comparisons problem [19]. This is a non-trivial task in our setting, and solving it requires an extension of the significant pattern mining paradigm as a whole: its scope exceeds that of this paper. We provide a brief discussion in Appendix C.
2. Following the line of reasoning of [5], one can assume that the underlying distribution can be derived from the prior beliefs the end-user may have about that distribution. If only the observed expectation $\mu$ and variance $\sigma^2$ are given as constraints which must hold for the underlying distribution, the maximum entropy distribution (taking into account no other prior information than the given constraints) is known to be the Normal distribution $\mathcal{N}(\mu, \sigma^2)$ [3, p.413].
3. Random-SMWA: Randomized algorithm - Subset with Maximum Weighted Average.
4. Finding the subset having the minimum weighted average is a dual problem to finding the subset having the maximum weighted average. To solve the former problem using Random-SMWA, we modify the values of $v_i$ to $-v_i$ and keep the same weights $w_i$.
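The duality described in note 4 can be illustrated with a short Python sketch. It uses a brute-force enumeration as a stand-in solver, not the Random-SMWA algorithm itself, and it assumes subsets of exactly `k` elements with strictly positive weights. That size constraint, the function names, and the sample data are all hypothetical, since the notes do not specify them.

```python
from itertools import combinations


def max_weighted_average(values, weights, k):
    """Brute-force stand-in: best weighted average over all k-element subsets.

    Returns (average, indices). Weights are assumed strictly positive.
    """
    best = None
    for idx in combinations(range(len(values)), k):
        total_weight = sum(weights[i] for i in idx)
        average = sum(weights[i] * values[i] for i in idx) / total_weight
        if best is None or average > best[0]:
            best = (average, idx)
    return best


def min_weighted_average(values, weights, k):
    """Minimum weighted average obtained from the maximisation solver.

    Negate every value v_i, keep the weights w_i, maximise, then
    negate the resulting average back.
    """
    negated = [-v for v in values]
    neg_average, idx = max_weighted_average(negated, weights, k)
    return -neg_average, idx


# Hypothetical sample data.
values = [1, 5, 3, 2]
weights = [2, 1, 1, 3]

print(max_weighted_average(values, weights, 2))
print(min_weighted_average(values, weights, 2))
```

Because the weights stay unchanged, negating the values flips the sign of every subset's weighted average. The subset that maximises the negated problem is therefore the one that minimises the original problem, so any maximisation routine can be reused without modification.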

References

1. Amer-Yahia, S., Kleisarchaki, S., Kolloju, N.K., Lakshmanan, L.V., Zamar, R.H.: Exploring rated datasets with rating maps. In: WWW (2017)
2. Belfodil, A., Cazalens, S., Lamarre, P., Plantevit, M.: Flash points: discovering exceptional pairwise behaviors in vote or rating data. In: Ceci, M., Hollmén, J., Todorovski, L., Vens, C., Džeroski, S. (eds.) ECML PKDD 2017. LNCS (LNAI), vol. 10535, pp. 442–458. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-71246-8_27
3. Cover, T., Thomas, J.: Elements of Information Theory. Wiley, Hoboken (2012)
4. Das, M., Amer-Yahia, S., Das, G., Yu, C.: MRI: meaningful interpretations of collaborative ratings. PVLDB 4(11), 1063–1074 (2011)
5. de Bie, T.: An information theoretic framework for data mining. In: KDD (2011)
6. Duivesteijn, W., Feelders, A.J., Knobbe, A.: Exceptional model mining. Data Min. Knowl. Disc. 30(1), 47–98 (2016)
7. Duivesteijn, W., Knobbe, A.: Exploiting false discoveries - statistical validation of patterns and quality measures in subgroup discovery. In: ICDM (2011)
8. Duivesteijn, W., Knobbe, A.J., Feelders, A., van Leeuwen, M.: Subgroup discovery meets Bayesian networks - an exceptional model mining approach. In: ICDM (2010)
9. Duris, F., et al.: Mean and variance of ratios of proportions from categories of a multinomial distribution. J. Stat. Distrib. Appl. 5(1), 1–20 (2018). https://doi.org/10.1186/s40488-018-0083-x
10. Efron, B., Tibshirani, R.J.: An Introduction to the Bootstrap. CRC Press, Boca Raton (1994)
11. Eppstein, D., Hirschberg, D.S.: Choosing subsets with maximum weighted average. J. Algorithms 24(1), 177–193 (1997)
12. Ganter, B., Kuznetsov, S.O.: Pattern structures and their projections. In: Delugach, H.S., Stumme, G. (eds.) ICCS-ConceptStruct 2001. LNCS (LNAI), vol. 2120, pp. 129–142. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44583-8_10
13. Ganter, B., Wille, R.: Formal Concept Analysis - Mathematical Foundations. Springer, Heidelberg (1999). https://doi.org/10.1007/978-3-642-59830-2
14. Geisser, S.: Predictive Inference, vol. 55. CRC Press, Boca Raton (1993)
15. Grosskreutz, H., Rüping, S., Wrobel, S.: Tight optimistic estimates for fast subgroup discovery. In: Daelemans, W., Goethals, B., Morik, K. (eds.) ECML PKDD 2008. LNCS (LNAI), vol. 5211, pp. 440–456. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-87479-9_47
16. Hämäläinen, W.: StatApriori: an efficient algorithm for searching statistically significant association rules. Knowl. Inf. Syst. 23(3), 373–399 (2010)
17. Hämäläinen, W., Webb, G.I.: A tutorial on statistically sound pattern discovery. Data Min. Knowl. Disc. 33(2), 325–377 (2018). https://doi.org/10.1007/s10618-018-0590-x
18. Hayes, A.F., Krippendorff, K.: Answering the call for a standard reliability measure for coding data. Commun. Methods Meas. 1(1), 77–89 (2007)
19. Holm, S.: A simple sequentially rejective multiple test procedure. Scand. J. Stat. 65–70 (1979)
20. Kendall, M., Stuart, A., Ord, J.: Kendall’s Advanced Theory of Statistics, vol. 1: Distribution Theory (1994)
21. Krippendorff, K.: Content Analysis, An Introduction to Its Methodology (2004)
22. Kuznetsov, S.O.: Learning of simple conceptual graphs from positive and negative examples. In: Żytkow, J.M., Rauch, J. (eds.) PKDD 1999. LNCS (LNAI), vol. 1704, pp. 384–391. Springer, Heidelberg (1999). https://doi.org/10.1007/978-3-540-48247-5_47
23. van Leeuwen, M., Knobbe, A.J.: Diverse subgroup set discovery. Data Min. Knowl. Disc. 25(2), 208–242 (2012)
24. Lemmerich, F., Becker, M., Atzmueller, M.: Generic pattern trees for exhaustive exceptional model mining. In: Flach, P.A., De Bie, T., Cristianini, N. (eds.) ECML PKDD 2012. LNCS (LNAI), vol. 7524, pp. 277–292. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33486-3_18
25. Lemmerich, F., Becker, M., Singer, P., Helic, D., Hotho, A., Strohmaier, M.: Mining subgroups with exceptional transition behavior. In: KDD (2016)
26. Minato, S., Uno, T., Tsuda, K., Terada, A., Sese, J.: A fast method of statistical assessment for combinatorial hypotheses based on frequent itemset enumeration. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds.) ECML PKDD 2014. LNCS (LNAI), vol. 8725, pp. 422–436. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44851-9_27
27. Webb, G.I.: Discovering significant patterns. Mach. Learn. 68(1), 1–33 (2007)
28. Wrobel, S.: An algorithm for multi-relational discovery of subgroups. In: PKDD (1997)

Acknowledgments

This work has been partially supported by the project ContentCheck ANR-15-CE23-0025 funded by the French National Research Agency. The authors would like to thank the reviewers for their valuable remarks. They also warmly thank Arno Knobbe, Simon van der Zon, Aimene Belfodil and Gabriela Ciuperca for interesting discussions.

Author information

Authors and Affiliations

Univ Lyon, INSA Lyon, CNRS, LIRIS UMR 5205, 69621, Lyon, France
Adnene Belfodil, Sylvie Cazalens & Philippe Lamarre
Technische Universiteit Eindhoven, Eindhoven, The Netherlands
Wouter Duivesteijn
Univ Lyon, CNRS, LIRIS UMR 5205, 69622, Lyon, France
Marc Plantevit

Corresponding author

Correspondence to Adnene Belfodil.

Editor information

Editors and Affiliations

Leuphana University, Lüneburg, Germany
Ulf Brefeld
IRISA/Inria, Rennes, France
Elisa Fromont
University of Würzburg, Würzburg, Germany
Andreas Hotho
Leiden University, Leiden, The Netherlands
Arno Knobbe
ETH Zurich, Zurich, Switzerland
Marloes Maathuis
Institut National des Sciences Appliquées, Villeurbanne, France
Céline Robardet

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 593 KB)

Cite this paper

Belfodil, A., Duivesteijn, W., Plantevit, M., Cazalens, S., Lamarre, P. (2020). DEvIANT: Discovering Significant Exceptional (Dis-)Agreement Within Groups. In: Brefeld, U., Fromont, E., Hotho, A., Knobbe, A., Maathuis, M., Robardet, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Lecture Notes in Computer Science, vol 11906. Springer, Cham. https://doi.org/10.1007/978-3-030-46150-8_1

DOI: https://doi.org/10.1007/978-3-030-46150-8_1
Published: 30 April 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-46149-2
Online ISBN: 978-3-030-46150-8
eBook Packages: Computer Science, Computer Science (R0)
