
Worst Case and a Distribution-Based Case Analyses of Sampling for Rule Discovery Based on Generality and Accuracy

Published in Applied Intelligence

Abstract

In this paper, we propose two sampling theories for rule discovery based on generality and accuracy. The first theory concerns the worst case: it extends a preliminary version of PAC learning, which represents a worst-case analysis for classification. In our analysis, a rule is defined as a probabilistic constraint on the true class assignment of the examples it covers, and we mainly analyze the case in which we try to avoid discovering a bad rule. The effectiveness of our approach is demonstrated through examples of conjunction-rule discovery. The second theory concerns a distribution-based case: it gives the conditions under which a rule exceeds pre-specified thresholds for generality and accuracy with high reliability. The idea is to assume a two-dimensional normal distribution for the two probabilistic variables and to obtain the conditions from their joint confidence region. This approach has been validated experimentally on 21 benchmark data sets from the machine learning community, against conventional methods each of which evaluates the reliability of generality. We also discuss related work on PAC learning, multiple comparisons, and the analysis of association-rule discovery.
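As a rough illustration of the two ideas sketched in the abstract (not the paper's own theorems, which are not reproduced here), consider the following Python sketch. The first function computes a standard Hoeffding-style worst-case bound: enough examples that, with probability at least 1 - delta, no rule among k candidates has a sample accuracy deviating from its true accuracy by more than epsilon. The second is a conservative, axis-aligned stand-in for the distribution-based confidence-region test, treating estimated generality and accuracy as approximately normal. All names and defaults here (theta_g, theta_a, z) are illustrative assumptions, not taken from the paper.

    import math

    def worst_case_sample_size(k: int, epsilon: float, delta: float) -> int:
        """Hoeffding-style worst-case bound (illustrative, not the paper's theorem).

        Returns n such that, for k candidate rules, the probability that any
        rule's sample accuracy deviates from its true accuracy by more than
        epsilon is at most delta (union bound over all k candidates).
        """
        return math.ceil(math.log(k / delta) / (2.0 * epsilon ** 2))

    def rule_is_reliable(n, n_a, n_ac, theta_g, theta_a, z=2.0):
        """Distribution-based check for a rule A -> C (illustrative sketch).

        n       : total number of sampled examples
        n_a     : examples satisfying the antecedent A
        n_ac    : examples satisfying both A and the conclusion C
        theta_g : minimum required generality  P(A)
        theta_a : minimum required accuracy    P(C | A)
        z       : half-width of the confidence region in standard deviations
        """
        g_hat = n_a / n      # estimated generality
        a_hat = n_ac / n_a   # estimated accuracy

        # Normal-approximation standard errors of the two proportions.
        se_g = math.sqrt(g_hat * (1.0 - g_hat) / n)
        se_a = math.sqrt(a_hat * (1.0 - a_hat) / n_a)

        # Conservative axis-aligned version of the confidence-region test:
        # accept only if the region's lower corner clears both thresholds.
        return g_hat - z * se_g >= theta_g and a_hat - z * se_a >= theta_a

    # 1,000,000 candidate conjunction rules, epsilon = 0.05, delta = 0.01:
    print(worst_case_sample_size(10**6, 0.05, 0.01))   # -> 3685

    # A rule covering 800 of 10,000 examples, 720 of them correctly:
    print(rule_is_reliable(10_000, 800, 720, theta_g=0.05, theta_a=0.85))  # True

Note that the paper works with a joint two-dimensional confidence region for the two variables; the per-axis intervals above are merely a simple, conservative approximation of that idea.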




Cite this article

Suzuki, E. Worst Case and a Distribution-Based Case Analyses of Sampling for Rule Discovery Based on Generality and Accuracy. Applied Intelligence 22, 29–36 (2005). https://doi.org/10.1023/B:APIN.0000047381.08666.c9
