ABSTRACT
Rule conflicts can arise in machine learning systems that use unordered rule sets. A rule conflict occurs when two or more rules cover the same example but differ in their majority classes. Such a conflict must be resolved before a classification can be made. The standard methods for resolving it are to apply naive Bayes or to use the most frequent class (as in CN2). This paper studies rule conflicts in domains with numerical features. A novel family of methods, called distance-based methods, for resolving rule conflicts in continuous domains is presented. An empirical evaluation comparing a distance-based method with CN2 and naive Bayes is reported. The distance-based method is shown to significantly outperform both naive Bayes and CN2.
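To make the notion of a rule conflict concrete, the following is a minimal Python sketch of the two standard resolution strategies named in the abstract, plus one conceivable distance-based variant. Everything in it is an assumption made for illustration: the `Rule` class, the interval representation of numerical conditions, the Laplace-smoothed naive Bayes estimate, and in particular `distance_resolve`, which simply picks the covering rule whose stored centroid is nearest to the example. The abstract does not define the paper's distance-based methods, so this should not be read as the authors' algorithm.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Rule:
    bounds: dict    # feature -> (low, high) interval condition (assumed form)
    counts: dict    # class -> number of covered training examples
    centroid: dict  # feature -> mean of covered examples (assumed to be stored)

    def covers(self, x):
        return all(lo <= x[f] <= hi for f, (lo, hi) in self.bounds.items())

    def majority_class(self):
        return max(self.counts, key=self.counts.get)


def cn2_resolve(rules, x):
    """Sum class counts over all covering rules; predict the most frequent class."""
    total = {}
    for r in rules:
        if r.covers(x):
            for c, n in r.counts.items():
                total[c] = total.get(c, 0) + n
    return max(total, key=total.get)  # assumes at least one rule covers x


def naive_bayes_resolve(rules, x, class_totals):
    """Score each class by P(c) * prod P(rule | c), with Laplace smoothing."""
    covering = [r for r in rules if r.covers(x)]
    n_all = sum(class_totals.values())
    scores = {}
    for c, n_c in class_totals.items():
        score = n_c / n_all
        for r in covering:
            score *= (r.counts.get(c, 0) + 1) / (n_c + 2)
        scores[c] = score
    return max(scores, key=scores.get)


def distance_resolve(rules, x):
    """Hypothetical variant: trust the covering rule whose centroid is nearest."""
    def dist(r):
        return sqrt(sum((x[f] - m) ** 2 for f, m in r.centroid.items()))
    covering = [r for r in rules if r.covers(x)]
    return min(covering, key=dist).majority_class()  # assumes a covering rule


# Two overlapping rules with different majority classes (made-up numbers).
rules = [
    Rule({"x": (0, 10)}, {"a": 30, "b": 10}, {"x": 2.0}),
    Rule({"x": (5, 15)}, {"a": 5, "b": 20}, {"x": 12.0}),
]
class_totals = {"a": 100, "b": 40}  # class sizes in the made-up training set
example = {"x": 6.0}                # covered by both rules: a conflict

print(cn2_resolve(rules, example))
print(naive_bayes_resolve(rules, example, class_totals))
print(distance_resolve(rules, example))
```

In this toy setting the three strategies need not agree, which is the situation the paper's evaluation addresses. The centroid-based choice is only one way a distance could enter; the family of methods presented in the paper may differ.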
REFERENCES
- P. Clark and R. Boswell. Rule induction with CN2: Some recent improvements. In Proceedings of the Fifth European Working Session on Learning, pages 151--163, Berlin, 1991. Springer-Verlag.
- P. Clark and T. Niblett. The CN2 Induction Algorithm. Machine Learning, 3:261--283, 1989.
- James Dougherty, Ron Kohavi, and Mehran Sahami. Supervised and unsupervised discretization of continuous features. In International Conference on Machine Learning, pages 194--202, 1995.
- Tom Fawcett. Using rule sets to maximize ROC performance. In ICDM, pages 131--138, 2001.
- U. M. Fayyad and K. B. Irani. On the handling of continuous-valued attributes in decision tree generation. Machine Learning, 8:87--102, 1992.
- J. Fürnkranz and G. Widmer. Incremental Reduced Error Pruning. In Proceedings of the 11th International Conference on Machine Learning, 1994.
- Johannes Fürnkranz. Separate-and-Conquer Rule Learning. Artificial Intelligence Review, 1999.
- R. Kohavi, B. Becker, and D. Sommerfield. Improving simple Bayes. In Proceedings of the European Conference on Machine Learning, 1997.
- T. Lindgren and H. Boström. Resolving rule conflicts with double induction. Intelligent Data Analysis - An International Journal, Volume 8, Number 5, 2004.
- Tony Lindgren. Methods for Rule Conflict Resolution. In Proceedings of the 15th European Conference on Machine Learning (ECML-04), pages 262--273. Springer, 2004.
- Tony Lindgren and Henrik Boström. Classification with Intersecting Rules. In Proceedings of the 13th International Conference on Algorithmic Learning Theory (ALT'02), pages 395--402. Springer, 2002.
- Huan Liu, Farhad Hussain, Chew Lim Tan, and Manoranjan Dash. Discretization: An enabling technique. Data Min. Knowl. Discov., 6(4):393--423, 2002.
- J. R. Quinlan. Induction of decision trees. Machine Learning, 1:81--106, 1986.
- RDS. Rule Discovery System (RDS) --- 1.0, Compumine AB, 2003. www.compumine.com.