Understanding the Crucial Role of Attribute Interaction in Data Mining

Artificial Intelligence Review

Abstract

This is a review paper whose goal is to significantly improve our understanding of the crucial role of attribute interaction in data mining. The main contributions of this paper are as follows. Firstly, we show that the concept of attribute interaction has a crucial role across different kinds of problems in data mining, such as attribute construction, coping with small disjuncts, induction of first-order logic rules, detection of Simpson's paradox, and finding several types of interesting rules. Hence, a better understanding of attribute interaction can lead to a better understanding of the relationship between these kinds of problems, which are usually studied separately from each other. Secondly, we draw attention to the fact that most rule induction algorithms are based on a greedy search which does not cope well with the problem of attribute interaction, and point out some alternative kinds of rule discovery methods which tend to cope better with this problem. Thirdly, we discuss several algorithms and methods for discovering interesting knowledge that, implicitly or explicitly, are based on the concept of attribute interaction.
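The difficulty that greedy search has with attribute interaction can be illustrated with a small sketch. It is not taken from the paper: the XOR data set, the attribute names `a` and `b`, and the helper functions `entropy` and `information_gain` are all assumptions chosen for illustration. Each attribute evaluated on its own yields no information gain, so a one-attribute-at-a-time search has no reason to prefer either, while the two attributes evaluated jointly determine the class completely.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attrs):
    """Entropy reduction obtained by splitting on the joint value of `attrs`."""
    groups = {}
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        groups.setdefault(key, []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: the class is the XOR of two binary attributes,
# a textbook case of strong attribute interaction.
rows = [{"a": a, "b": b} for a, b in product([0, 1], repeat=2)]
labels = [r["a"] ^ r["b"] for r in rows]

gain_a = information_gain(rows, labels, ["a"])        # attribute a alone
gain_b = information_gain(rows, labels, ["b"])        # attribute b alone
gain_ab = information_gain(rows, labels, ["a", "b"])  # a and b together

print(f"gain(a)={gain_a:.3f}  gain(b)={gain_b:.3f}  gain(a,b)={gain_ab:.3f}")
```

In this toy setting the single-attribute gains are zero and the joint gain equals the full class entropy, which is why methods that evaluate several attributes at once may cope better with interaction than a purely greedy evaluation.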

Cite this article

Freitas, A.A. Understanding the Crucial Role of Attribute Interaction in Data Mining. Artificial Intelligence Review 16, 177–199 (2001). https://doi.org/10.1023/A:1011996210207

  • DOI: https://doi.org/10.1023/A:1011996210207
