Abstract
This paper presents KEOPS, a data mining methodology centered on domain knowledge integration. KEOPS is a CRISP-DM compliant methodology that integrates a knowledge base and an ontology. We focus first on the pre-processing steps of business understanding and data understanding, which are used to build an ontology-driven information system (ODIS). We then show how the knowledge base is used in the post-processing step of model interpretation. We detail the role of the ontology and define a part-way interestingness measure that integrates both objective and subjective criteria in order to evaluate model relevance according to expert knowledge. Finally, we present experiments conducted on real data and their results.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Brisson, L., Collard, M. (2009). How to Semantically Enhance a Data Mining Process?. In: Filipe, J., Cordeiro, J. (eds) Enterprise Information Systems. ICEIS 2008. Lecture Notes in Business Information Processing, vol 19. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00670-8_8
DOI: https://doi.org/10.1007/978-3-642-00670-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00669-2
Online ISBN: 978-3-642-00670-8
eBook Packages: Computer Science, Computer Science (R0)