How to Semantically Enhance a Data Mining Process?

Brisson, Laurent; Collard, Martine

doi:10.1007/978-3-642-00670-8_8

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 19)

Included in the following conference series:

International Conference on Enterprise Information Systems

Abstract

This paper presents KEOPS, a data mining methodology centered on the integration of domain knowledge. KEOPS is a CRISP-DM compliant methodology that integrates a knowledge base and an ontology. We first focus on the pre-processing steps of business understanding and data understanding, which are used to build an ontology-driven information system (ODIS). We then show how the knowledge base is used in the post-processing step of model interpretation. We detail the role of the ontology and define a part-way interestingness measure that integrates both objective and subjective criteria in order to evaluate model relevance according to expert knowledge. We present experiments conducted on real data and their results.

Author information

Authors and Affiliations

Institut TELECOM, TELECOM Bretagne, CNRS UMR 3192 LAB-STICC, Technopôle Brest-Iroise CS 83818, 29238, Brest Cedex 3, France
Laurent Brisson
INRIA Sophia Antipolis, 2004 route des Lucioles, 06902 BP93, Sophia Antipolis, France
Martine Collard
Université européenne de Bretagne, France
Laurent Brisson
Université Nice Sophia Antipolis, France
Martine Collard

Editor information

Editors and Affiliations

Department of Systems and informatics, Institute for Systems and Technologies of Information, Control and Communication (INSTICC) and Instituto Politécnico de Setúbal (IPS), Rua do Vale de Chaves, Estefanilha, 2910-761, Setúbal, Portugal
Joaquim Filipe & José Cordeiro

About this paper

Cite this paper

Brisson, L., Collard, M. (2009). How to Semantically Enhance a Data Mining Process?. In: Filipe, J., Cordeiro, J. (eds) Enterprise Information Systems. ICEIS 2008. Lecture Notes in Business Information Processing, vol 19. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00670-8_8

DOI: https://doi.org/10.1007/978-3-642-00670-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00669-2
Online ISBN: 978-3-642-00670-8
eBook Packages: Computer Science, Computer Science (R0)
