Toward an integrated knowledge discovery and data mining process model

Published online by Cambridge University Press: 01 March 2010

Sumana Sharma*
Affiliation:
Department of Information Systems, the Information Systems Research Institute, Virginia Commonwealth University, Richmond, VA 23284, USA
Kweku-Muata Osei-Bryson*
Affiliation:
Department of Information Systems, the Information Systems Research Institute, Virginia Commonwealth University, Richmond, VA 23284, USA

Abstract

Knowledge discovery and data mining (KDDM) process models describe the phases of the KDDM process (e.g. business understanding, data understanding, data preparation, modeling, evaluation and deployment). They act as a roadmap for implementing the KDDM process by listing the tasks required to execute each phase. This checklist approach is not adequately supported by appropriate tools that specify ‘how’ a particular task can be implemented, which may result in some tasks not being implemented at all. A further disadvantage is that a long checklist neither captures nor leverages the dependencies that exist among tasks within the same phase and across different phases. This not only makes the process cumbersome to implement but also hinders the semi-automation of certain tasks. Given that each task in the process model serves an important goal and, because of these dependencies, affects the execution of related tasks, these limitations are likely to reduce the efficiency and effectiveness of KDDM projects. This paper proposes an improved KDDM process model that overcomes these shortcomings by prescribing tools to support each task and by identifying and leveraging dependencies among tasks to semi-automate tasks wherever possible.
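The abstract does not say how task dependencies are represented in the proposed model. As a minimal sketch, assuming dependencies can be modeled as a directed acyclic graph in which each task lists the tasks whose outputs it needs, the code below derives a valid execution order and finds the downstream tasks affected when one task's output changes, which is the kind of information a semi-automated process could act on. The task names (prefixed BU, DU, DP, M, E and D for the six phases named above) and the helpers `execution_order` and `downstream_tasks` are hypothetical illustrations, not the authors' task list or method. It uses `graphlib.TopologicalSorter` from the Python 3.9+ standard library.

```python
from graphlib import TopologicalSorter

# Hypothetical task dependency map: each task lists its prerequisite tasks.
# Names are illustrative only; they are NOT the task list of the proposed model.
TASK_DEPENDENCIES = {
    "BU: determine business objectives": set(),
    "BU: determine data mining goals": {"BU: determine business objectives"},
    "DU: collect initial data": {"BU: determine data mining goals"},
    "DU: verify data quality": {"DU: collect initial data"},
    "DP: clean data": {"DU: verify data quality"},
    # A cross-phase dependency: modeling needs both the goals and the cleaned data.
    "M: select modeling technique": {"BU: determine data mining goals", "DP: clean data"},
    "M: build model": {"M: select modeling technique"},
    "E: evaluate results": {"M: build model", "BU: determine business objectives"},
    "D: plan deployment": {"E: evaluate results"},
}


def execution_order(deps):
    """Return one valid order in which every task runs after all its prerequisites."""
    return list(TopologicalSorter(deps).static_order())


def downstream_tasks(deps, changed):
    """Return every task that depends, directly or transitively, on `changed`.

    These are the tasks whose inputs may need to be revisited (or re-run
    automatically) when the output of `changed` is revised.
    """
    affected = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for task, prereqs in deps.items():
            if current in prereqs and task not in affected:
                affected.add(task)
                frontier.append(task)
    return affected


if __name__ == "__main__":
    for step, task in enumerate(execution_order(TASK_DEPENDENCIES), start=1):
        print(step, task)
    # Tasks to revisit if the data-cleaning output changes:
    print(sorted(downstream_tasks(TASK_DEPENDENCIES, "DP: clean data")))
    # -> ['D: plan deployment', 'E: evaluate results', 'M: build model', 'M: select modeling technique']
```

A graph rather than a flat checklist is the design choice here because it makes cross-phase links (for example, evaluation depending on the business objectives) explicit, so a tool can schedule or flag tasks mechanically instead of relying on the analyst to remember them.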

Type: Articles
Copyright © Cambridge University Press 2010

