Relational Model Based Annotation of the Web Data

Gelgi, Fatih; Vadrevu, Srinivas; Davulcu, Hasan

doi:10.1007/978-3-540-72575-6_20

Part of the book series: Advances in Soft Computing (AINSC, volume 43)

Abstract

In this paper, we present a fast and scalable Bayesian model for improving weakly annotated data, which is typically generated from Web documents by a (semi-)automated information extraction (IE) system. Weakly annotated data suffers from incorrect ontological role assignments. Our experimental evaluations with TAP and a collection of 20,000 home pages from university, shopping, and sports Web sites indicate that the model described here can improve the accuracy of role assignments from 40% to 85% for template-driven sites and from 68% to 87% for non-template-driven sites.

Author information

Authors and Affiliations

Arizona State University, Tempe, AZ
Fatih Gelgi, Srinivas Vadrevu & Hasan Davulcu

Editor information

Katarzyna M. Wegrzyn-Wolska, Piotr S. Szczepaniak

Cite this paper

Gelgi, F., Vadrevu, S., Davulcu, H. (2007). Relational Model Based Annotation of the Web Data. In: Wegrzyn-Wolska, K.M., Szczepaniak, P.S. (eds) Advances in Intelligent Web Mastering. Advances in Soft Computing, vol 43. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72575-6_20

DOI: https://doi.org/10.1007/978-3-540-72575-6_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72574-9
Online ISBN: 978-3-540-72575-6
eBook Packages: Engineering, Engineering (R0)
