Abstract
Document analysis and text mining techniques are used to pre-process documents in information retrieval systems, to extract concepts in ontology construction processes, and to discover and classify knowledge along several dimensions. In most cases it is not obvious how the techniques should be configured and combined, and it is a time-consuming process to set up and test various combinations of techniques. In this paper, we present a workbench that makes it easy to plug in new document analysis and text mining techniques and experiment with different constellations of techniques. We explain the architecture of the workbench and show how the workbench has been used to extract ontological concepts and relationships for a document collection published by the Norwegian Center for Medical Informatics.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gulla, J.A., Brasethvik, T., Kaada, H. (2004). A Flexible Workbench for Document Analysis and Text Mining. In: Meziane, F., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2004. Lecture Notes in Computer Science, vol 3136. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27779-8_29
DOI: https://doi.org/10.1007/978-3-540-27779-8_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22564-5
Online ISBN: 978-3-540-27779-8
eBook Packages: Springer Book Archive