Discovering Semantic Sibling Groups from Web Documents with XTREEM-SG

Brunzel, Marko; Spiliopoulou, Myra

doi:10.1007/11891451_15


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4248)

Included in the following conference series:

International Conference on Knowledge Engineering and Knowledge Management


Abstract

The acquisition of explicit semantics remains a research challenge. Approaches to extracting semantics focus mostly on learning hierarchical hypernym-hyponym relations. Co-hyponym and co-meronym sibling semantics are extracted far less often, although they are no less important in ontology engineering.

In this paper we describe and evaluate the XTREEM-SG (Xhtml TREE Mining – for Sibling Groups) approach to finding sibling semantics in semi-structured Web documents. XTREEM exploits the mark-up available in Web content to group text siblings, and we show that this grouping is semantically meaningful. The XTREEM-SG approach is domain and language independent: it does not rely on background knowledge, NLP software or training.
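The idea of using mark-up to group text siblings can be illustrated with a minimal Python sketch. It is not the XTREEM-SG algorithm itself, whose details the abstract does not give. It assumes well-formed XHTML and treats the texts of same-tag children of one parent element as a candidate sibling group. The function name `sibling_groups` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def sibling_groups(xhtml: str) -> list[list[str]]:
    """Return groups of texts found under same-tag children of one parent.

    Assumption: this grouping rule is an illustrative simplification,
    not the procedure described in the paper.
    """
    root = ET.fromstring(xhtml)
    groups = []
    for parent in root.iter():
        by_tag = defaultdict(list)
        for child in parent:
            # Collapse all text below the child into one normalized string.
            text = " ".join("".join(child.itertext()).split())
            if text:
                by_tag[child.tag].append(text)
        # Only two or more texts under the same tag form a sibling group.
        for texts in by_tag.values():
            if len(texts) >= 2:
                groups.append(texts)
    return groups


# Hypothetical sample page: the list items share a parent and a tag.
doc = """<html><body>
<h1>Accommodation</h1>
<ul><li>Hotel</li><li>Hostel</li><li>Guest house</li></ul>
<p>Contact us</p>
</body></html>"""

print(sibling_groups(doc))
```

The sketch needs no dictionary, parser for natural language or training data, which matches the domain and language independence claimed above. Real Web pages are often not well-formed and would need a tolerant parser first.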

We apply the XTREEM-SG approach and evaluate it against the reference semantics of two gold standard ontologies. We investigate how variations in input, parameters and reference influence the results obtained when structuring a closed vocabulary by sibling relations. Earlier methods that evaluate sibling relations against a gold standard report an F-measure of 14.18%. Our method raises this figure to 21.47%.
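For readers unfamiliar with the measure, the following Python sketch shows one way an F-measure over sibling relations could be computed. The abstract does not specify the evaluation protocol, so the pairwise comparison used here is an assumption: every unordered pair of terms inside a group counts as one sibling relation. The names `sibling_pairs` and `f_measure` and the toy groups are hypothetical.

```python
from itertools import combinations


def sibling_pairs(groups):
    """Expand sibling groups into a set of unordered term pairs."""
    pairs = set()
    for group in groups:
        # Sorting makes (a, b) and (b, a) the same pair.
        for a, b in combinations(sorted(set(group)), 2):
            pairs.add((a, b))
    return pairs


def f_measure(found_groups, reference_groups):
    """Harmonic mean of precision and recall over sibling pairs."""
    found = sibling_pairs(found_groups)
    reference = sibling_pairs(reference_groups)
    true_positives = len(found & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(found)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Toy data, not taken from the paper.
found = [["hotel", "hostel", "camping"]]
reference = [["hotel", "hostel"], ["camping", "caravan"]]
print(f_measure(found, reference))
```

In this toy case one of the three found pairs is in the reference (precision 1/3) and one of the two reference pairs is found (recall 1/2), which gives an F-measure of 0.4.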



Author information

Authors and Affiliations

Otto-von-Guericke-University, Magdeburg
Marko Brunzel & Myra Spiliopoulou


Editor information

Editors and Affiliations

Fachbereich Informatik, Universität Koblenz-Landau, Universitätsstraße 1, 56070, Koblenz, Germany
Steffen Staab
Dept. Information and Knowledge Engineering, University of Economics, Prague, Winston Churchill Sq. 4, 130 67 Praha 3, Prague, Czech Republic
Vojtěch Svátek


Cite this paper

Brunzel, M., Spiliopoulou, M. (2006). Discovering Semantic Sibling Groups from Web Documents with XTREEM-SG. In: Staab, S., Svátek, V. (eds) Managing Knowledge in a World of Networks. EKAW 2006. Lecture Notes in Computer Science, vol 4248. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11891451_15


DOI: https://doi.org/10.1007/11891451_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46363-4
Online ISBN: 978-3-540-46365-8
eBook Packages: Computer Science, Computer Science (R0)
