A Confidence–Weighted Metric for Unsupervised Ontology Population from Web Texts

Oliveira, Hilário; Lima, Rinaldo; Gomes, João; Ferreira, Rafael; Freitas, Fred; Costa, Evandro

doi:10.1007/978-3-642-32600-4_14

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7446)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

Knowledge engineers have long had difficulty automatically constructing and populating domain ontologies, mainly because of the well-known knowledge acquisition bottleneck. In this paper, we attempt to alleviate this problem by proposing an unsupervised approach that uses the web as a large corpus and applies linguistic patterns to identify and extract ontological class instances. The prototype implementation uses shallow syntactic parsing to handle disambiguation. In addition, we propose a confidence-weighted metric that combines different versions of the classical PMI metric, WordNet similarity measures, and heuristics into a final confidence score, with the aim of improving the ranking of candidate instances retrieved by the system. We conducted preliminary experiments comparing the proposed confidence metric against several versions of the PMI metric. The results for the final ranking of candidate instances are promising, with a gain in precision of up to 24%.
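For readers unfamiliar with the baseline, the classical PMI (pointwise mutual information) score mentioned in the abstract can be estimated from web hit counts roughly as sketched below. This is only an illustration of plain PMI, not the confidence-weighted metric proposed in the paper, whose formula is not given on this page. The function names, the hit counts, and the total page count are all hypothetical.

```python
import math


def pmi(joint_hits: int, instance_hits: int, class_hits: int, total_pages: int) -> float:
    """Classical PMI, log2(P(i, c) / (P(i) * P(c))), estimated from hit counts.

    All four counts are assumed inputs (e.g. search-engine hit counts);
    how the paper obtains or normalises them is not stated on this page.
    """
    if min(joint_hits, instance_hits, class_hits, total_pages) <= 0:
        # No co-occurrence evidence (or unusable counts): lowest possible score.
        return float("-inf")
    # Algebraically equal to P(i, c) / (P(i) * P(c)) with P(x) = hits(x) / total_pages.
    ratio = (joint_hits * total_pages) / (instance_hits * class_hits)
    return math.log2(ratio)


def rank_candidates(counts: dict, class_hits: int, total_pages: int) -> list:
    """Rank candidate instances by PMI, highest first.

    `counts` maps a candidate name to (joint_hits, instance_hits).
    """
    scored = [
        (name, pmi(joint, inst, class_hits, total_pages))
        for name, (joint, inst) in counts.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical counts for the class "city".
    candidates = {"Paris": (40, 100), "banana": (1, 100)}
    for name, score in rank_candidates(candidates, class_hits=100, total_pages=1000):
        print(f"{name}: {score:.3f}")
```

A candidate that co-occurs with the class term more often than chance receives a positive score, and one that co-occurs less often receives a negative score. According to the abstract, the proposed metric combines scores of this kind with WordNet similarity measures and heuristics.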

Author information

Authors and Affiliations

Informatics Center, Federal University of Pernambuco, Recife, Brazil
Hilário Oliveira, Rinaldo Lima, João Gomes, Rafael Ferreira & Fred Freitas
Computing Institute, Federal University of Alagoas, Maceió, Brazil
Evandro Costa

Editor information

Editors and Affiliations

Marriott School, Brigham Young University, 784 TNRB, 84602, Provo, UT, USA
Stephen W. Liddle
Software Competence Center Hagenberg, Softwarepark 21, 4232, Hagenberg, Austria
Klaus-Dieter Schewe
Institute of Software Technology & Interactive Systems, Vienna University of Technology, Favoritenstr. 9-11/188, 1040, Vienna, Austria
A Min Tjoa
School of Information Technology and Electrical Engineering, University of Queensland, 4072, Brisbane, QLD, Australia
Xiaofang Zhou

Cite this paper

Oliveira, H., Lima, R., Gomes, J., Ferreira, R., Freitas, F., Costa, E. (2012). A Confidence–Weighted Metric for Unsupervised Ontology Population from Web Texts. In: Liddle, S.W., Schewe, KD., Tjoa, A.M., Zhou, X. (eds) Database and Expert Systems Applications. DEXA 2012. Lecture Notes in Computer Science, vol 7446. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32600-4_14

DOI: https://doi.org/10.1007/978-3-642-32600-4_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32599-1
Online ISBN: 978-3-642-32600-4
eBook Packages: Computer Science, Computer Science (R0)
