Abstract
Ontologies provide a common layer that plays a major role in supporting information exchange and sharing. In this paper, we focus on extracting ontological concepts from HTML documents. We propose an unsupervised hierarchical clustering algorithm, “Contextual Ontological Concept Extraction” (COCE), which applies a partitioning algorithm incrementally and is guided by a structural context. This context exploits the HTML structure and the location of words to select the semantically closest co-occurrents of each word and to improve word weighting. Guided by this context definition, we perform an incremental clustering that refines the word context of each cluster to obtain semantically coherent concepts. The COCE algorithm can be run either automatically or interactively. We evaluate the COCE algorithm on French documents related to tourism. Our results show that our context-based algorithm improves the conceptual relevance of the resulting clusters.
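The idea of a structural context guiding an incremental partitioning can be illustrated with a minimal Python sketch. It is not the COCE algorithm itself: the tag weights in `TAG_WEIGHTS`, the cosine similarity, the bisecting split, and all function names are assumptions chosen for illustration. Words that share an HTML block co-occur with a weight that depends on the enclosing tag, and clusters are then split repeatedly until each is small enough.

```python
from collections import defaultdict
from html.parser import HTMLParser
from itertools import combinations

# Hypothetical tag weights: illustrative assumptions, not values from the paper.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "p": 1.0}


class BlockCollector(HTMLParser):
    """Collects (tag, words) for each text block directly inside a weighted tag."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags
        self.blocks = []  # list of (tag, [words])

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        words = [w.lower() for w in data.split() if w.isalpha()]
        if words and self.stack and self.stack[-1] in TAG_WEIGHTS:
            self.blocks.append((self.stack[-1], words))


def cooccurrence(html):
    """Builds a symmetric co-occurrence table weighted by the enclosing tag."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    cooc = defaultdict(lambda: defaultdict(float))
    for tag, words in parser.blocks:
        for a, b in combinations(sorted(set(words)), 2):
            cooc[a][b] += TAG_WEIGHTS[tag]
            cooc[b][a] += TAG_WEIGHTS[tag]
    return cooc


def similarity(cooc, a, b):
    """Cosine similarity between the co-occurrence vectors of two words."""
    va, vb = cooc[a], cooc[b]
    dot = sum(va[k] * vb.get(k, 0.0) for k in va)
    na = sum(v * v for v in va.values()) ** 0.5
    nb = sum(v * v for v in vb.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


def split(cooc, words, max_size):
    """Recursively bisects a cluster until each cluster has at most max_size words.

    max_size must be at least 1. The two seeds are the least similar pair, and
    every other word joins the seed it is more similar to.
    """
    words = sorted(words)
    if len(words) <= max_size:
        return [words]
    s1, s2 = min(combinations(words, 2), key=lambda p: similarity(cooc, *p))
    left, right = [s1], [s2]
    for w in words:
        if w in (s1, s2):
            continue
        closer_to_s1 = similarity(cooc, w, s1) >= similarity(cooc, w, s2)
        (left if closer_to_s1 else right).append(w)
    return split(cooc, left, max_size) + split(cooc, right, max_size)
```

Calling `split(cooc, list(cooc), max_size)` on the table returned by `cooccurrence` yields a list of word clusters. An interactive variant could, under the same assumptions, ask the user after each split whether a cluster should be refined further instead of relying on a fixed `max_size`.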
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Karoui, L., Bennacer, N., Aufaure, MA. (2006). Contextual Ontological Concepts Extraction. In: Todorovski, L., Lavrač, N., Jantke, K.P. (eds) Discovery Science. DS 2006. Lecture Notes in Computer Science, vol 4265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11893318_32
DOI: https://doi.org/10.1007/11893318_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46491-4
Online ISBN: 978-3-540-46493-8
eBook Packages: Computer Science, Computer Science (R0)