Abstract
To ease the retrieval of documents published on the Web, the documents should be classified in a way that users find helpful and meaningful. This paper presents an approach to semantic document classification and retrieval based on natural language analysis and conceptual modeling. A conceptual domain model is used in combination with linguistic tools to define a controlled vocabulary for a document collection. Users may browse this domain model and interactively classify documents by selecting model fragments that describe the contents of the documents. Natural language tools are used to analyze the text of the documents and propose relevant domain model concepts and relations. The proposed fragments are refined by the users and stored as XML document descriptions. For document retrieval, lexical analysis is used to pre-process search expressions and map these to the domain model for manual query refinement. A prototype of the system is described, and the approach is illustrated with examples from a document collection published by the Norwegian Center for Medical Informatics (KITH).
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Brasethvik, T., Gulla, J.A. (2001). Natural Language Analysis for Semantic Document Modeling. In: Bouzeghoub, M., Kedad, Z., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2000. Lecture Notes in Computer Science, vol 1959. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45399-7_11
DOI: https://doi.org/10.1007/3-540-45399-7_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41943-3
Online ISBN: 978-3-540-45399-4
eBook Packages: Springer Book Archive