A Knowledge Discovery System with Support for Model Selection and Visualization

Applied Intelligence

Abstract

Knowledge discovery in databases is an iterative, interactive process made up of several steps. In each application, the user has to try different algorithms and parameter settings, which usually yield multiple models. Model selection, that is, choosing appropriate models or the algorithms that produce them, requires meta-knowledge about algorithms, models, and model performance metrics, and is therefore usually a difficult task for the user. We believe that simplifying model selection for the user is crucial to the success of real-life knowledge discovery activities. In contrast to most related work, which aims to automate model selection, we view model selection as a semiautomatic process that requires effective collaboration between the user and the discovery system. To support this collaboration, our solution lets the user try various alternatives and compare competing models quantitatively, by performance metrics, and qualitatively, by effective visualization. This paper presents our research on model selection and visualization in the development of a knowledge discovery system called D2MS. It addresses the motivation for model selection in knowledge discovery and related work, gives an overview of D2MS, and describes its solution to model selection and visualization. It then demonstrates the usefulness of D2MS model selection in two case studies of discovering medical knowledge in hospital data, on meningitis and on stomach cancer, using three data mining methods: decision trees, conceptual clustering, and rule induction.
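The quantitative half of the comparison described above can be illustrated with a minimal sketch. It is not D2MS code: the two candidate models (a majority-class baseline and a 1-nearest-neighbour classifier), the toy data, and every function name below are assumptions chosen for illustration, and k-fold cross-validated accuracy stands in for whatever performance metrics the system actually reports.

```python
from statistics import mean


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def majority_fit(X, y):
    """Baseline model: always predict the most frequent training label."""
    label = max(sorted(set(y)), key=y.count)  # ties go to the smallest label
    return lambda x: label


def nearest_neighbour_fit(X, y):
    """1-nearest-neighbour model using squared Euclidean distance."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in X]
        return y[dists.index(min(dists))]
    return predict


def cross_val_accuracy(fit, X, y, k=5):
    """Mean accuracy of the model built by `fit` over k train/test splits."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(model(X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return mean(scores)


def compare_models(candidates, X, y, k=5):
    """Rank candidate models by cross-validated accuracy, best first."""
    ranking = [(name, cross_val_accuracy(fit, X, y, k))
               for name, fit in candidates.items()]
    return sorted(ranking, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: one numeric feature, two well-separated classes.
X = [(1.0,), (8.0,), (2.0,), (9.0,), (1.5,),
     (8.5,), (2.5,), (9.5,), (3.0,), (7.5,)]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

ranking = compare_models(
    {"majority": majority_fit, "1-NN": nearest_neighbour_fit}, X, y, k=5)
for name, accuracy in ranking:
    print(f"{name}: {accuracy:.2f}")
```

A ranking like this covers only the quantitative side; in the semiautomatic view taken by the paper, the user would still inspect the competing models visually before choosing one.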

Cite this article

Ho, T.B., Nguyen, T.D., Shimodaira, H. et al. A Knowledge Discovery System with Support for Model Selection and Visualization. Applied Intelligence 19, 125–141 (2003). https://doi.org/10.1023/A:1023876925609
