Abstract
Named entities (e.g., “Kofi Annan”, “Coca-Cola”, “Second World War”) are ubiquitous in web pages and other types of documents and often provide a simplified picture of the document’s content. We present an ontology currently containing 31,000 named entities in different languages from various domains such as history, geography, politics, sports, and the arts, which is being developed at the University of Munich (LMU). The underlying graph data model is simple and yet extremely versatile in different application scenarios. We demonstrate a prototype of a graphical interface to both the ontology and to documents on the web or in a local document repository, with tight interaction in both directions. Occurrences of concepts from the ontology are highlighted and hyperlinked in the documents. Unrecognized entities can be added to the database and related to other concepts in a semiautomatic process. The entity database can also be used for extending full-text queries on the web or the repository to semantically close documents, and for indexing different kinds of named entities in the document repository. Similar to a programming IDE, the system illustrates how integrated browsing, search and update functionality contributes to the construction of high-quality ontologies, fundamental to the vision of a truly “semantic” web.
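To make the interaction between ontology and documents more concrete, the following is a minimal sketch of the two operations the abstract mentions: highlighting known entities in a text and extending a query with related entities. It does not reproduce the system described in the paper. The `Entity` class, the `#entity/<id>` link format, and the functions `build_name_index`, `highlight` and `expand_query` are all hypothetical names chosen for illustration, and a plain regular-expression lookup stands in for whatever matching technique the actual prototype uses.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A node in a toy entity graph (hypothetical schema, not the paper's)."""
    ident: str
    names: list                                  # surface forms, possibly in several languages
    related: set = field(default_factory=set)    # identifiers of neighbouring nodes


def build_name_index(entities):
    """Map each lower-cased surface name to the identifier of its entity."""
    index = {}
    for entity in entities:
        for name in entity.names:
            index[name.lower()] = entity.ident
    return index


def highlight(text, index):
    """Wrap every known entity name in `text` in a hyperlink to its node."""
    if not index:
        return text
    # Try longer names first so that a multi-word name is preferred
    # over any shorter name it contains.
    names = sorted(index, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r")\b",
        re.IGNORECASE,
    )

    def link(match):
        ident = index[match.group(0).lower()]
        return f'<a href="#entity/{ident}">{match.group(0)}</a>'

    return pattern.sub(link, text)


def expand_query(term, entities_by_id, index):
    """Return the term plus the names of all directly related entities."""
    ident = index.get(term.lower())
    if ident is None:
        return [term]          # unknown term: nothing to expand
    expanded = [term]
    for related_id in sorted(entities_by_id[ident].related):
        expanded.extend(entities_by_id[related_id].names)
    return expanded
```

In this sketch, a term that is not found in the index is returned unchanged. In a fuller system, that is the point at which an unrecognized entity could be offered to the user for semiautomatic insertion into the ontology.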
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Weigel, F., Schulz, K.U., Brunner, L., Torres-Schumann, E. (2006). Integrated Document Browsing and Data Acquisition for Building Large Ontologies. In: Gabrys, B., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2006. Lecture Notes in Computer Science, vol 4253. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11893011_78
DOI: https://doi.org/10.1007/11893011_78
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46542-3
Online ISBN: 978-3-540-46544-7
eBook Packages: Computer Science