Abstract
The acquisition of explicit semantics is still a research challenge. Approaches for the extraction of semantics focus mostly on learning subordination relations. The extraction of coordination relations, also called “sibling relations”, is studied much less, though they are no less important in ontology engineering.
We describe and evaluate the XTREEM-SG approach for finding sibling semantics in semi-structured Web documents. XTREEM-SG stands for “Xhtml TREE Mining - for Sibling Groups”. It uses the XHTML markup that is available in Web content to group together terms that are in a sibling relation to each other. Our approach has the advantage that it is domain- and language-independent; it does not rely on background knowledge, NLP software, or training.
We evaluate XTREEM-SG against two gold standard ontologies. We investigate how variations in input, parameters, and gold standard influence the results obtained when structuring a closed vocabulary into semantic sibling groups. Earlier methods that evaluate sibling relations against a gold standard report a 14.18% F-measure on average sibling overlap. Our method raises this figure to 22.93%.
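The idea of using XHTML markup to group terms can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not spell out the grouping rule, so the sketch assumes that text spans sharing the same tag path within one document are candidate siblings. The function name `sibling_groups` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def sibling_groups(xhtml: str) -> list[list[str]]:
    """Group text spans by tag path (an assumed grouping rule, for illustration)."""
    root = ET.fromstring(xhtml)  # XHTML is well-formed XML, so an XML parser suffices
    groups = defaultdict(list)

    def walk(node, path):
        path = path + (node.tag,)
        text = (node.text or "").strip()
        if text:
            groups[path].append(text)
        for child in node:
            walk(child, path)

    walk(root, ())
    # A group needs at least two members to express a sibling relation.
    return [terms for terms in groups.values() if len(terms) >= 2]


# Hypothetical sample document.
DOC = """<html><body>
<h1>Hotels</h1>
<ul><li>Sauna</li><li>Pool</li><li>Gym</li></ul>
<p>Book now</p>
</body></html>"""

groups = sibling_groups(DOC)
print(groups)
```

In this sketch the three list items share the path `html/body/ul/li` and form one group, while the heading and the paragraph stay ungrouped. Two separate lists with the same tag path would be merged, which is a simplification of this illustration.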
© 2008 Springer-Verlag Berlin Heidelberg
Cite this chapter
Brunzel, M., Spiliopoulou, M. (2008). Discovering Groups of Sibling Terms from Web Documents with XTREEM-SG. In: Spaccapietra, S., et al. Journal on Data Semantics XI. Lecture Notes in Computer Science, vol 5383. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-92148-6_5
DOI: https://doi.org/10.1007/978-3-540-92148-6_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-92147-9
Online ISBN: 978-3-540-92148-6
eBook Packages: Computer Science (R0)