Ontology Augmentation via Attribute Extraction from Multiple Types of Sources

Fang, Xiu Susie; Wang, Xianzhi; Sheng, Quan Z.

doi:10.1007/978-3-319-19548-3_2

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9093)

Included in the following conference series:

Australasian Database Conference

Abstract

A comprehensive ontology can ease the discovery, maintenance, and popularization of knowledge in many domains. As a means of enhancing existing ontologies, attribute extraction has attracted considerable research attention. However, most existing attribute extraction techniques explore only a single type of source, such as structured (e.g., relational databases), semi-structured (e.g., Extensible Markup Language (XML)), or unstructured sources (e.g., Web texts, images), which leads to poor coverage of knowledge bases (KBs). This paper presents a framework for ontology augmentation that extracts attributes from four types of sources, namely existing KBs, query stream, Web texts, and Document Object Model (DOM) trees. In particular, we use the query stream and two major KBs, DBpedia and Freebase, to seed attribute extraction from Web texts and DOM trees. We focus especially on extraction from DOM trees, which has rarely been studied in previous work. We develop algorithms and a series of filters for this purpose. Experiments show that our approach can augment existing KB ontologies.
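The idea of seeding DOM-tree extraction with attributes already known from a KB can be illustrated with a small sketch. This is not the algorithm from the paper, which the abstract does not describe in detail. It is a minimal illustration under one assumption: a text node that shares its tag path with a known (seed) attribute label is a candidate new attribute. The names `TagPathCollector`, `normalize`, and `extract_candidate_attributes`, as well as the sample page, are hypothetical.

```python
from html.parser import HTMLParser

# Tags without a closing tag; they must not be pushed on the path stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class TagPathCollector(HTMLParser):
    """Records a (tag path, text) pair for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.nodes = []   # collected (path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append(("/".join(self.stack), text))


def normalize(label):
    """Lower-case a label and drop a trailing colon (assumed label style)."""
    return label.strip().rstrip(":").strip().lower()


def extract_candidate_attributes(html, seed_attributes):
    """Return labels that share a tag path with at least one seed attribute."""
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    seeds = {normalize(s) for s in seed_attributes}
    # Tag paths under which a seed attribute label occurs.
    seed_paths = {p for p, t in collector.nodes if normalize(t) in seeds}
    candidates = []
    for path, text in collector.nodes:
        label = normalize(text)
        if path in seed_paths and label not in seeds and label not in candidates:
            candidates.append(label)
    return candidates


# Hypothetical page fragment; the values are made up for illustration.
page = """
<table>
  <tr><th>Director:</th><td>Jane Doe</td></tr>
  <tr><th>Release date:</th><td>2001</td></tr>
  <tr><th>Running time:</th><td>100 minutes</td></tr>
</table>
"""
print(extract_candidate_attributes(page, ["director"]))
```

In this sketch the seed "director" stands in for an attribute obtained from a KB or the query stream. A practical system would need additional filtering of the candidates, which the abstract mentions but does not specify.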

Author information

Authors and Affiliations

School of Computer Science, The University of Adelaide, Adelaide, SA, 5005, Australia
Xiu Susie Fang, Xianzhi Wang & Quan Z. Sheng

Corresponding author

Correspondence to Xiu Susie Fang.

Editor information

Editors and Affiliations

University of Queensland, Brisbane, Queensland, Australia
Mohamed A. Sharaf
Monash University, Clayton, Australia
Muhammad Aamir Cheema
The University of Melbourne, Melbourne, Australia
Jianzhong Qi

Cite this paper

Fang, X.S., Wang, X., Sheng, Q.Z. (2015). Ontology Augmentation via Attribute Extraction from Multiple Types of Sources. In: Sharaf, M., Cheema, M., Qi, J. (eds) Databases Theory and Applications. ADC 2015. Lecture Notes in Computer Science, vol 9093. Springer, Cham. https://doi.org/10.1007/978-3-319-19548-3_2

DOI: https://doi.org/10.1007/978-3-319-19548-3_2
Published: 28 May 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19547-6
Online ISBN: 978-3-319-19548-3
eBook Packages: Computer Science, Computer Science (R0)