Efficient discovery of interesting statements in databases

Journal of Intelligent Information Systems

Abstract

The Explora system supports Discovery in Databases by large-scale search for interesting instances of statistical patterns. In this paper we describe how Explora assesses interestingness and achieves computational efficiency. These problems arise because of the variety of patterns and the immense combinatorial possibilities of generating instances when studying relations between variables in subsets of data. First, the user must be saved from getting overwhelmed with a deluge of findings. To restrict the search with respect to the analysis goals, the user can focus each discovery task performed during an interactive and iterative exploration process. Some basic organization principles of search can further limit the search effort. One principle is to organize search hierarchically and to evaluate first the statistical or information-theoretic evidence of the general hypotheses. Then more special hypotheses can be eliminated from further search if a more general hypothesis was already verified. But this approach alone has some drawbacks and, even in moderately sized data, does not prevent large sets of findings. Therefore, in a second evaluation phase, further aspects of interestingness are assessed. A refinement strategy selects the most interesting of the statistically significant statements. A second problem for discovery systems is efficiency. Each hypothesis evaluation requires many data accesses. We describe strategies that reduce data accesses and speed up computation.
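The hierarchical search principle in the abstract can be illustrated with a minimal sketch. This is not Explora's implementation. The subgroup representation (sets of attribute-value pairs), the one-sided z-statistic, the threshold of 2.0, and all function names are assumptions chosen for illustration. General hypotheses are evaluated first, and any specialization of an already verified hypothesis is skipped without touching the data.

```python
from itertools import combinations
from math import sqrt

def z_score(records, conditions, target):
    """z-statistic of the target share in a subgroup against the overall share.
    (Illustrative evidence measure; the paper's own measures may differ.)"""
    p0 = sum(target(r) for r in records) / len(records)
    group = [r for r in records if all(r[a] == v for a, v in conditions)]
    n = len(group)
    if n == 0 or p0 in (0.0, 1.0):
        return 0.0
    p = sum(target(r) for r in group) / n
    return (p - p0) / sqrt(p0 * (1 - p0) / n)

def general_to_specific(records, attributes, target, threshold=2.0, max_depth=2):
    """Level-wise search: short (general) conjunctions before longer (special) ones."""
    selectors = sorted({(a, r[a]) for r in records for a in attributes})
    verified, evaluated = [], 0
    for depth in range(1, max_depth + 1):
        for combo in combinations(selectors, depth):
            if len({a for a, _ in combo}) < depth:
                continue  # two values of the same attribute: empty subgroup
            candidate = frozenset(combo)
            if any(v < candidate for v in verified):
                continue  # a more general hypothesis was already verified
            evaluated += 1
            if z_score(records, candidate, target) >= threshold:
                verified.append(candidate)
    return verified, evaluated

def make(region, age, buyers, total):
    """Hypothetical toy data: `buyers` of `total` records have buys=True."""
    return [{"region": region, "age": age, "buys": i < buyers} for i in range(total)]

records = (make("north", "young", 9, 10) + make("north", "old", 8, 10)
           + make("south", "young", 1, 10) + make("south", "old", 2, 10))

verified, evaluated = general_to_specific(
    records, ["region", "age"], lambda r: r["buys"])
print(verified, evaluated)
```

On this toy data only the general statement about the north region is verified, and its two specializations by age are never evaluated. As the abstract notes, pruning of this kind alone does not prevent large sets of findings, which is why a second evaluation phase is needed.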

Cite this article

Klösgen, W. Efficient discovery of interesting statements in databases. J Intell Inf Syst 4, 53–69 (1995). https://doi.org/10.1007/BF00962822

  • DOI: https://doi.org/10.1007/BF00962822
