Bayesian analysis of GUHA hypotheses

Journal of Intelligent Information Systems

Abstract

The LISp-Miner system for data mining and knowledge discovery uses the GUHA method to comb through a large database and find 2 × 2 contingency tables that satisfy a condition specified by generalised quantifiers, thereby suggesting possible relations between attributes. In this paper, we show how Bayesian statistical methods yield a more detailed interpretation of the data in the tables found by GUHA. Using a multinomial sampling model and a Dirichlet prior, we derive posterior distributions for parameters that correspond to GUHA generalised quantifiers. Examples illustrate the new Bayesian post-processing tools implemented in LISp-Miner. A statistical model for analysing contingency tables for data from two subpopulations is also presented.
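To make the multinomial-Dirichlet idea concrete, the following is a minimal sketch and not the implementation in LISp-Miner. It assumes a 2 × 2 table with cell counts (a, b, c, d), a uniform Dirichlet(1, 1, 1, 1) prior, and, purely for illustration, takes the parameter of interest to be θ_a / (θ_a + θ_b), the population analogue of the ratio a / (a + b) used by implication-type quantifiers. The counts, the threshold, and the function names are hypothetical. With a Dirichlet prior and multinomial sampling, the posterior is again Dirichlet with the counts added to the prior parameters, so it can be sampled by normalising independent gamma draws.

```python
import random


def dirichlet_sample(alphas, rng):
    """Draw one sample from Dirichlet(alphas) by normalising gamma variates."""
    draws = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
    total = sum(draws)
    return [g / total for g in draws]


def posterior_prob_ratio_exceeds(counts, p_min, prior=(1.0, 1.0, 1.0, 1.0),
                                 n_draws=20000, seed=0):
    """Monte Carlo estimate of P(theta_a / (theta_a + theta_b) >= p_min | data).

    counts: observed cell counts (a, b, c, d) of the 2 x 2 table.
    prior:  Dirichlet prior parameters (uniform prior is an assumption here).
    """
    rng = random.Random(seed)
    # Conjugate update: posterior is Dirichlet(prior + counts).
    posterior = [n + alpha for n, alpha in zip(counts, prior)]
    hits = 0
    for _ in range(n_draws):
        theta_a, theta_b, _theta_c, _theta_d = dirichlet_sample(posterior, rng)
        if theta_a / (theta_a + theta_b) >= p_min:
            hits += 1
    return hits / n_draws


# Hypothetical table: a=30, b=10, c=20, d=40; hypothetical threshold 0.6.
table = (30, 10, 20, 40)
prob = posterior_prob_ratio_exceeds(table, p_min=0.6)
print(f"Estimated posterior probability that the ratio is at least 0.6: {prob:.3f}")
```

Rather than a yes/no answer to whether a / (a + b) clears a threshold, the sketch reports how probable it is that the underlying parameter does, given the data and the assumed prior. Under these assumptions the ratio has a Beta(a + 1, b + 1) posterior, so the same quantity could also be computed from a beta distribution function instead of by simulation.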

Author information

Corresponding author

Correspondence to Robert Piché.

About this article

Cite this article

Piché, R., Järvenpää, M., Turunen, E. et al. Bayesian analysis of GUHA hypotheses. J Intell Inf Syst 42, 47–73 (2014). https://doi.org/10.1007/s10844-013-0255-6

  • DOI: https://doi.org/10.1007/s10844-013-0255-6
