Multiple Hypothesis Testing in Pattern Discovery

Hanhijärvi, Sami

doi:10.1007/978-3-642-24477-3_12


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6926)

Included in the following conference series:

International Conference on Discovery Science

Abstract

The problem of multiple hypothesis testing arises when more than one hypothesis is tested simultaneously for statistical significance. This situation is very common in data mining applications. For instance, simultaneously assessing the significance of all frequent itemsets of a single dataset entails a host of hypotheses, one for each itemset. A multiple hypothesis testing method is needed to control the number of false positives (Type I errors). Our contribution in this paper is to extend the multiple hypothesis testing framework so that it can be used in a generic data mining setting. We provide a method that provably controls the family-wise error rate (FWER, the probability of at least one false positive). We show the power of our solution on real data.
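
To make the notion of FWER control concrete, the following is a minimal sketch of Holm's step-down procedure, a standard textbook correction that keeps the FWER at or below a chosen level alpha. It is not the method proposed in this paper, whose details the abstract does not give. The function name `holm_reject` and the p-values are hypothetical and serve only as an illustration.

```python
from typing import List


def holm_reject(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Holm's step-down procedure: one reject flag per hypothesis.

    Illustrative only; this is a generic FWER correction, not the
    paper's own method.
    """
    m = len(p_values)
    # Visit hypotheses from the smallest p-value to the largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared
        # with alpha / (m - k + 1), which equals alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: stop at the first non-rejection
    return reject


# Hypothetical p-values, e.g. one per frequent itemset (made-up numbers).
p = [0.001, 0.04, 0.03, 0.012]
print(holm_reject(p))  # [True, False, False, True]
```

With four hypotheses and alpha = 0.05, the two smallest p-values (0.001 and 0.012) fall below their thresholds of 0.0125 and about 0.0167, while 0.03 exceeds 0.025, so testing stops there. Holm's procedure assumes the number of hypotheses is fixed in advance, which is not automatically the case when the patterns themselves are selected by a mining algorithm.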

Author information

Authors and Affiliations

Department of Information and Computer Science, Aalto University, Finland
Sami Hanhijärvi

Editor information

Editors and Affiliations

Department of Software Systems, Tampere University of Technology, P. O. Box 553, 33101, Tampere, Finland
Tapio Elomaa
Department of Information and Computer Science, Aalto University School of Science, P.O. Box 15400, 00076, Aalto, Finland
Jaakko Hollmén
Helsinki Institute for Information Technology (HIIT), Finland
Heikki Mannila

About this paper

Cite this paper

Hanhijärvi, S. (2011). Multiple Hypothesis Testing in Pattern Discovery. In: Elomaa, T., Hollmén, J., Mannila, H. (eds) Discovery Science. DS 2011. Lecture Notes in Computer Science, vol 6926. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24477-3_12

DOI: https://doi.org/10.1007/978-3-642-24477-3_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24476-6
Online ISBN: 978-3-642-24477-3
eBook Packages: Computer Science, Computer Science (R0)
