Bayesian analysis of GUHA hypotheses

Piché, Robert; Järvenpää, Marko; Turunen, Esko; Šimůnek, Milan

doi:10.1007/s10844-013-0255-6

Published: 23 June 2013

Volume 42, pages 47–73, (2014)

Journal of Intelligent Information Systems

Robert Piché¹, Marko Järvenpää¹, Esko Turunen² & Milan Šimůnek³

Abstract

The LISp-Miner system for data mining and knowledge discovery uses the GUHA method to comb through a large database and find 2 × 2 contingency tables that satisfy a condition given by generalised quantifiers and thereby suggest possible relations between attributes. In this paper, we show how a more detailed interpretation of the data in the tables found by GUHA can be obtained using Bayesian statistical methods. Using a multinomial sampling model and a Dirichlet prior, we derive posterior distributions for parameters that correspond to GUHA generalised quantifiers. Examples are presented illustrating the new Bayesian post-processing tools implemented in LISp-Miner. A statistical model for the analysis of contingency tables for data from two subpopulations is also presented.
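
As a rough illustration of the kind of computation the abstract describes, the sketch below assumes a 2 × 2 table with cell counts (a, b, c, d), a uniform Dirichlet(1, 1, 1, 1) prior on the four cell probabilities, and a quantifier that thresholds the conditional probability θ_a / (θ_a + θ_b) at a value p. Under a multinomial model the Dirichlet prior is conjugate, so the posterior is again Dirichlet with the counts added to the prior parameters. The counts, the prior, the threshold, and the function names are illustrative assumptions; they are not taken from the paper or from LISp-Miner.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalising Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]


def posterior_prob_confidence_exceeds(counts, p, prior=(1.0, 1.0, 1.0, 1.0),
                                      n_samples=20000, seed=0):
    """Monte Carlo estimate of P(theta_a / (theta_a + theta_b) > p | data).

    counts = (a, b, c, d) are the cells of a 2 x 2 contingency table.
    The uniform prior is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    # Conjugate update: posterior parameters are prior parameters plus counts.
    posterior = [k + n for k, n in zip(prior, counts)]
    hits = 0
    for _ in range(n_samples):
        theta_a, theta_b, _theta_c, _theta_d = sample_dirichlet(posterior, rng)
        if theta_a / (theta_a + theta_b) > p:
            hits += 1
    return hits / n_samples


if __name__ == "__main__":
    hypothetical_counts = (40, 10, 20, 30)  # made-up table, not from the paper
    print(posterior_prob_confidence_exceeds(hypothetical_counts, p=0.7))
```

By the aggregation property of the Dirichlet distribution, the ratio sampled here follows a Beta distribution with parameters (prior_a + a, prior_b + b), so the Monte Carlo loop could be replaced by a Beta tail probability; sampling is used only to keep the sketch free of external dependencies.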

Author information

Authors and Affiliations

Tampere University of Technology, Tampere, Finland
Robert Piché & Marko Järvenpää
Center for Machine Perception, Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University, Prague, Czech Republic
Esko Turunen
University of Economics Prague, Prague, Czech Republic
Milan Šimůnek

Corresponding author

Correspondence to Robert Piché.

About this article

Cite this article

Piché, R., Järvenpää, M., Turunen, E. et al. Bayesian analysis of GUHA hypotheses. J Intell Inf Syst 42, 47–73 (2014). https://doi.org/10.1007/s10844-013-0255-6

Received: 05 March 2013
Revised: 12 June 2013
Accepted: 13 June 2013
Published: 23 June 2013
Issue Date: February 2014
DOI: https://doi.org/10.1007/s10844-013-0255-6
