A Flexible Workbench for Document Analysis and Text Mining

Gulla, Jon Atle; Brasethvik, Terje; Kaada, Harald

doi:10.1007/978-3-540-27779-8_29

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3136)

Abstract

Document analysis and text mining techniques are used to pre-process documents in information retrieval systems, to extract concepts in ontology construction processes, and to discover and classify knowledge along several dimensions. In most cases it is not obvious how the techniques should be configured and combined, and it is a time-consuming process to set up and test various combinations of techniques. In this paper, we present a workbench that makes it easy to plug in new document analysis and text mining techniques and experiment with different constellations of techniques. We explain the architecture of the workbench and show how the workbench has been used to extract ontological concepts and relationships for a document collection published by the Norwegian Center for Medical Informatics.
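
The plug-in idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' architecture, which the abstract does not detail. It assumes each technique is a function from a token list to a token list, and every name in it (`Workbench`, `register`, `run`, `strip_punctuation`, `remove_stopwords`, `build_default_workbench`) is hypothetical.

```python
from typing import Callable, Dict, List

# Assumption: a "technique" is any function mapping a token list to a token list.
Technique = Callable[[List[str]], List[str]]


def strip_punctuation(tokens: List[str]) -> List[str]:
    """Trim common punctuation from token edges and drop empty tokens."""
    cleaned = (t.strip(".,;:()") for t in tokens)
    return [t for t in cleaned if t]


def remove_stopwords(tokens: List[str]) -> List[str]:
    """Drop a tiny, illustrative set of English stopwords."""
    stopwords = {"the", "of", "and", "a", "in"}
    return [t for t in tokens if t not in stopwords]


class Workbench:
    """Hypothetical registry that chains named techniques in a chosen order."""

    def __init__(self) -> None:
        self._techniques: Dict[str, Technique] = {}

    def register(self, name: str, technique: Technique) -> None:
        # Plugging in a new technique only requires registering it by name.
        self._techniques[name] = technique

    def run(self, names: List[str], text: str) -> List[str]:
        # Lowercase and whitespace-tokenize, then apply each technique in order.
        tokens = text.lower().split()
        for name in names:
            tokens = self._techniques[name](tokens)
        return tokens


def build_default_workbench() -> Workbench:
    """Create a workbench with the two example techniques registered."""
    wb = Workbench()
    wb.register("strip_punctuation", strip_punctuation)
    wb.register("remove_stopwords", remove_stopwords)
    return wb
```

Under these assumptions, trying a different constellation of techniques amounts to passing a different list of names to `run`, for example `["strip_punctuation", "remove_stopwords"]` versus `["remove_stopwords"]` alone.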

Author information

Authors and Affiliations

Norwegian University of Science and Technology, Trondheim, Norway
Jon Atle Gulla, Terje Brasethvik & Harald Kaada

Editor information

Editors and Affiliations

School of Computing, Science and Engineering, Newton Building, University of Salford, M5 4WT, Greater Manchester, UK
Farid Meziane
Lab. CEDRIC, CNAM, Paris, France
Elisabeth Métais

About this paper

Cite this paper

Gulla, J.A., Brasethvik, T., Kaada, H. (2004). A Flexible Workbench for Document Analysis and Text Mining. In: Meziane, F., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2004. Lecture Notes in Computer Science, vol 3136. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27779-8_29

DOI: https://doi.org/10.1007/978-3-540-27779-8_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22564-5
Online ISBN: 978-3-540-27779-8
eBook Packages: Springer Book Archive