Discovering Groups of Sibling Terms from Web Documents with XTREEM-SG

  • Chapter
Journal on Data Semantics XI

Part of the book series: Lecture Notes in Computer Science ((JODS,volume 5383))

Abstract

The acquisition of explicit semantics is still a research challenge. Approaches for the extraction of semantics focus mostly on learning subordination relations. The extraction of coordination relations, also called “sibling relations”, has been studied much less, although such relations are no less important in ontology engineering.

We describe and evaluate the XTREEM-SG approach for finding sibling semantics in semi-structured Web documents. XTREEM-SG stands for “Xhtml TREE Mining - for Sibling Groups”. It uses the XHTML markup that is available in Web content to group together terms that are in a sibling relation to each other. Our approach has the advantage of being domain and language independent; it does not rely on background knowledge, NLP software or training.
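The idea of using markup to group sibling terms can be illustrated with a small sketch. It is not the authors' implementation: it assumes, as a simplification, that text spans reached through the same sequence of enclosing tags are candidate siblings. The names `TagPathGrouper`, `sibling_group_candidates` and `min_size` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Tags that never enclose content; they are not pushed onto the path.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class TagPathGrouper(HTMLParser):
    """Collects text spans keyed by the tag path that leads to them."""

    def __init__(self):
        super().__init__()
        self.path = []                    # stack of currently open tags
        self.groups = defaultdict(list)   # tag path -> list of text spans

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.path.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; unmatched end tags are ignored.
        if tag in self.path:
            while self.path.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.groups["/".join(self.path)].append(text)


def sibling_group_candidates(xhtml, min_size=2):
    """Return groups of text spans that share one tag path (assumed heuristic)."""
    parser = TagPathGrouper()
    parser.feed(xhtml)
    parser.close()
    return [spans for spans in parser.groups.values() if len(spans) >= min_size]


if __name__ == "__main__":
    page = (
        "<html><body><h1>Drinks</h1>"
        "<ul><li>Tea</li><li>Coffee</li><li>Juice</li></ul>"
        "<p>Contact us</p></body></html>"
    )
    print(sibling_group_candidates(page))  # [['Tea', 'Coffee', 'Juice']]
```

The sketch needs no dictionary, tagger or training data, which matches the independence from background knowledge and NLP software claimed above. Matching the grouped spans against a closed vocabulary and evaluating against a gold standard are left out.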

We evaluate XTREEM-SG against two gold standard ontologies. We investigate how variations in input, parameters and gold standard influence the results obtained when structuring a closed vocabulary into semantic sibling groups. Earlier methods that evaluate sibling relations against a gold standard report a 14.18% F-measure on average sibling overlap. Our method improves this figure to 22.93%.

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Brunzel, M., Spiliopoulou, M. (2008). Discovering Groups of Sibling Terms from Web Documents with XTREEM-SG. In: Spaccapietra, S., et al. Journal on Data Semantics XI. Lecture Notes in Computer Science, vol 5383. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-92148-6_5

  • DOI: https://doi.org/10.1007/978-3-540-92148-6_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-92147-9

  • Online ISBN: 978-3-540-92148-6

  • eBook Packages: Computer Science, Computer Science (R0)