DEvIANT: Discovering Significant Exceptional (Dis-)Agreement Within Groups

  • Conference paper
Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11906)

Abstract

We strive to find contexts (i.e., subgroups of entities) under which exceptional (dis-)agreement occurs among a group of individuals, in any type of data featuring individuals (e.g., parliamentarians, customers) performing observable actions (e.g., votes, ratings) on entities (e.g., legislative procedures, movies). To this end, we introduce the problem of discovering statistically significant exceptional contextual intra-group agreement patterns. To handle the sparsity inherent to voting and rating data, we use Krippendorff’s Alpha measure to assess the agreement among individuals. We devise a branch-and-bound algorithm, named DEvIANT, to discover such patterns. DEvIANT exploits both closure operators and tight optimistic estimates. We derive analytic approximations for the confidence intervals (CIs) associated with patterns, enabling a computationally efficient significance assessment. We prove that these approximate CIs are nested along the specialization of patterns. This allows us to incorporate pruning properties in DEvIANT to quickly discard non-significant patterns. An empirical study on several datasets demonstrates the efficiency and the usefulness of DEvIANT.
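Krippendorff’s Alpha copes with sparse data because it only compares values observed within the same unit and simply ignores missing entries. The following is a minimal sketch of the standard coincidence-matrix formulation for nominal data (e.g., For/Against votes), assuming entities are the units and individuals the coders. The function name is illustrative, and the sketch assumes the nominal distance; it is not claimed to be DEvIANT’s exact implementation, which may use another distance or normalization.

```python
from collections import Counter


def krippendorff_alpha_nominal(units):
    """Krippendorff's Alpha for nominal data with missing values.

    `units` is a list of units (e.g., entities such as legislative
    procedures); each unit lists the values given by the individuals,
    with None marking a missing value (e.g., an absent voter).
    """
    coincidence = Counter()  # o_ck: weighted count of (c, k) value pairs
    for unit in units:
        values = [v for v in unit if v is not None]
        m = len(values)
        if m < 2:
            continue  # a unit with fewer than two values is not pairable
        counts = Counter(values)
        for c in counts:
            for k in counts:
                # Ordered pairs of distinct individuals giving values c and k.
                pairs = counts[c] * (counts[k] - (1 if c == k else 0))
                if pairs:
                    coincidence[(c, k)] += pairs / (m - 1)

    n_c = Counter()  # marginal totals of the coincidence matrix
    for (c, _k), o in coincidence.items():
        n_c[c] += o
    n = sum(n_c.values())

    # Nominal metric: any two different values are at distance 1.
    observed = sum(o for (c, k), o in coincidence.items() if c != k)
    expected = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k)
    if expected == 0:
        raise ValueError("Alpha is undefined without variation in the values")
    # alpha = 1 - D_o / D_e, with D_o = observed / n and D_e = expected / (n (n - 1))
    return 1 - (n - 1) * observed / expected


votes = [
    ["for", "for", None],              # third individual did not vote
    ["against", "against", "against"],
    ["for", "against", "for"],
]
print(krippendorff_alpha_nominal(votes))  # 0.5625
```

A value of 1 denotes perfect agreement, 0 agreement at chance level, and negative values systematic disagreement; restricting `units` to the entities of a context gives the agreement within that subgroup.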


Notes

  1.

    This paradigm naturally raises the question of how to address the multiple comparisons problem [19]. This is a non-trivial task in our setting; solving it requires extending the significant pattern mining paradigm as a whole, which is beyond the scope of this paper. We provide a brief discussion in Appendix C.

  2.

    Following the line of reasoning of [5], one can assume that the underlying distribution is derived from the prior beliefs the end-user may hold about it. If only the observed expectation \(\mu \) and variance \(\sigma ^2\) are given as constraints that the underlying distribution must satisfy, the maximum entropy distribution (taking into account no prior information beyond these constraints) is known to be the Normal distribution \(\mathcal {N}(\mu ,\sigma ^2)\) [3, p.413].

  3.

    Random-SMWA: Randomized algorithm - Subset with Maximum Weighted Average.

  4.

    Finding the subset having the minimum weighted average is the dual of finding the subset having the maximum weighted average. To solve the former with Random-SMWA, we replace each value \(v_i\) by \(-v_i\) and keep the weights \(w_i\) unchanged.
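Notes 3 and 4 rely on the subset-with-maximum-weighted-average problem of Eppstein and Hirschberg [11]. The sketch below is not Random-SMWA: it swaps in the simpler Dinkelbach iteration, assuming the fixed-size formulation (choose exactly k of the n items, all weights positive). For a candidate ratio t, the best k-subset maximizes the sum of \(v_i - t\,w_i\), i.e., it takes the k largest adjusted values; repeating until the ratio stops increasing yields an optimal subset, although without the expected linear running time reported for Random-SMWA. The function names are hypothetical; the minimum variant applies the negation trick of note 4.

```python
def max_weighted_average_subset(values, weights, k, max_iter=1000):
    """Pick k indices maximizing sum(v) / sum(w) (Dinkelbach iteration).

    Assumes all weights are positive and 1 <= k <= len(values).
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(values)")

    def ratio_of(subset):
        return sum(values[i] for i in subset) / sum(weights[i] for i in subset)

    chosen = list(range(k))  # arbitrary starting subset
    ratio = ratio_of(chosen)
    for _ in range(max_iter):
        # For a fixed ratio t, the best k-subset maximizes sum(v_i - t * w_i):
        # take the k items with the largest adjusted values.
        ranked = sorted(range(n), key=lambda i: values[i] - ratio * weights[i],
                        reverse=True)
        candidate = sorted(ranked[:k])  # index order keeps sums reproducible
        new_ratio = ratio_of(candidate)
        if new_ratio <= ratio:
            break  # no k-subset beats the current ratio: it is optimal
        chosen, ratio = candidate, new_ratio
    return chosen, ratio


def min_weighted_average_subset(values, weights, k):
    """Dual problem (note 4): negate the values, keep the weights."""
    chosen, neg_ratio = max_weighted_average_subset([-v for v in values],
                                                    weights, k)
    return chosen, -neg_ratio


vals, ws = [1, 4, 3, 8], [1, 2, 1, 4]
print(max_weighted_average_subset(vals, ws, 2))  # ([1, 2], 2.3333333333333335)
print(min_weighted_average_subset(vals, ws, 2))  # ([0, 1], 1.6666666666666667)
```

Sorting each candidate by index before summing means that an unchanged subset reproduces exactly the same ratio, so the loop stops instead of chasing floating-point noise.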

References

  1. Amer-Yahia, S., Kleisarchaki, S., Kolloju, N.K., Lakshmanan, L.V., Zamar, R.H.: Exploring rated datasets with rating maps. In: WWW (2017)

  2. Belfodil, A., Cazalens, S., Lamarre, P., Plantevit, M.: Flash points: discovering exceptional pairwise behaviors in vote or rating data. In: Ceci, M., Hollmén, J., Todorovski, L., Vens, C., Džeroski, S. (eds.) ECML PKDD 2017. LNCS (LNAI), vol. 10535, pp. 442–458. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-71246-8_27

  3. Cover, T., Thomas, J.: Elements of Information Theory. Wiley, Hoboken (2012)

  4. Das, M., Amer-Yahia, S., Das, G., Mri, C.Y.: Meaningful interpretations of collaborative ratings. PVLDB 4(11), 1063–1074 (2011)

  5. de Bie, T.: An information theoretic framework for data mining. In: KDD (2011)

  6. Duivesteijn, W., Feelders, A.J., Knobbe, A.: Exceptional model mining. Data Min. Knowl. Disc. 30(1), 47–98 (2016)

  7. Duivesteijn, W., Knobbe, A.: Exploiting false discoveries - statistical validation of patterns and quality measures in subgroup discovery. In: ICDM (2011)

  8. Duivesteijn, W., Knobbe, A.J., Feelders, A., van Leeuwen, M.: Subgroup discovery meets Bayesian networks - an exceptional model mining approach. In: ICDM (2010)

  9. Duris, F., et al.: Mean and variance of ratios of proportions from categories of a multinomial distribution. J. Stat. Distrib. Appl. 5(1), 1–20 (2018). https://doi.org/10.1186/s40488-018-0083-x

  10. Efron, B., Tibshirani, R.J.: An Introduction to the Bootstrap. CRC Press, Boca Raton (1994)

  11. Eppstein, D., Hirschberg, D.S.: Choosing subsets with maximum weighted average. J. Algorithms 24(1), 177–193 (1997)

  12. Ganter, B., Kuznetsov, S.O.: Pattern structures and their projections. In: Delugach, H.S., Stumme, G. (eds.) ICCS-ConceptStruct 2001. LNCS (LNAI), vol. 2120, pp. 129–142. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44583-8_10

  13. Ganter, B., Wille, R.: Formal Concept Analysis - Mathematical Foundations. Springer, Heidelberg (1999). https://doi.org/10.1007/978-3-642-59830-2

  14. Geisser, S.: Predictive Inference, vol. 55. CRC Press, Boca Raton (1993)

  15. Grosskreutz, H., Rüping, S., Wrobel, S.: Tight optimistic estimates for fast subgroup discovery. In: Daelemans, W., Goethals, B., Morik, K. (eds.) ECML PKDD 2008. LNCS (LNAI), vol. 5211, pp. 440–456. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-87479-9_47

  16. Hämäläinen, W.: StatApriori: an efficient algorithm for searching statistically significant association rules. Knowl. Inf. Syst. 23(3), 373–399 (2010)

  17. Hämäläinen, W., Webb, G.I.: A tutorial on statistically sound pattern discovery. Data Min. Knowl. Disc. 33(2), 325–377 (2018). https://doi.org/10.1007/s10618-018-0590-x

  18. Hayes, A.F., Krippendorff, K.: Answering the call for a standard reliability measure for coding data. Commun. Methods Meas. 1(1), 77–89 (2007)

  19. Holm, S.: A simple sequentially rejective multiple test procedure. Scand. J. Stat. 65–70 (1979)

  20. Kendall, M., Stuart, A., Ord, J.: Kendall’s advanced theory of statistics. v. 1: distribution theory (1994)

  21. Krippendorff, K.: Content Analysis, An Introduction to Its Methodology (2004)

  22. Kuznetsov, S.O.: Learning of simple conceptual graphs from positive and negative examples. In: Żytkow, J.M., Rauch, J. (eds.) PKDD 1999. LNCS (LNAI), vol. 1704, pp. 384–391. Springer, Heidelberg (1999). https://doi.org/10.1007/978-3-540-48247-5_47

  23. van Leeuwen, M., Knobbe, A.J.: Diverse subgroup set discovery. Data Min. Knowl. Discov. 25(2), 208–242 (2012)

  24. Lemmerich, F., Becker, M., Atzmueller, M.: Generic pattern trees for exhaustive exceptional model mining. In: Flach, P.A., De Bie, T., Cristianini, N. (eds.) ECML PKDD 2012. LNCS (LNAI), vol. 7524, pp. 277–292. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33486-3_18

  25. Lemmerich, F., Becker, M., Singer, P., Helic, D., Hotho, A., Strohmaier, M.: Mining subgroups with exceptional transition behavior. In: KDD (2016)

  26. Minato, S., Uno, T., Tsuda, K., Terada, A., Sese, J.: A fast method of statistical assessment for combinatorial hypotheses based on frequent itemset enumeration. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds.) ECML PKDD 2014. LNCS (LNAI), vol. 8725, pp. 422–436. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44851-9_27

  27. Webb, G.I.: Discovering significant patterns. Mach. Learn. 68(1), 1–33 (2007)

  28. Wrobel, S.: An algorithm for multi-relational discovery of subgroups. In: PKDD (1997)

Acknowledgments

This work has been partially supported by the project ContentCheck ANR-15-CE23-0025 funded by the French National Research Agency. The authors would like to thank the reviewers for their valuable remarks. They also warmly thank Arno Knobbe, Simon van der Zon, Aimene Belfodil and Gabriela Ciuperca for interesting discussions.

Author information

Correspondence to Adnene Belfodil.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 593 KB)

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Belfodil, A., Duivesteijn, W., Plantevit, M., Cazalens, S., Lamarre, P. (2020). DEvIANT: Discovering Significant Exceptional (Dis-)Agreement Within Groups. In: Brefeld, U., Fromont, E., Hotho, A., Knobbe, A., Maathuis, M., Robardet, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Lecture Notes in Computer Science, vol 11906. Springer, Cham. https://doi.org/10.1007/978-3-030-46150-8_1

  • DOI: https://doi.org/10.1007/978-3-030-46150-8_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-46149-2

  • Online ISBN: 978-3-030-46150-8

  • eBook Packages: Computer Science, Computer Science (R0)
