
A Flexible Workbench for Document Analysis and Text Mining

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3136)

Abstract

Document analysis and text mining techniques are used to pre-process documents in information retrieval systems, to extract concepts in ontology construction processes, and to discover and classify knowledge along several dimensions. In most cases it is not obvious how the techniques should be configured and combined, and it is a time-consuming process to set up and test various combinations of techniques. In this paper, we present a workbench that makes it easy to plug in new document analysis and text mining techniques and experiment with different constellations of techniques. We explain the architecture of the workbench and show how the workbench has been used to extract ontological concepts and relationships for a document collection published by the Norwegian Center for Medical Informatics.
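The abstract does not describe the workbench's actual interfaces. The Python sketch below is a rough illustration of the plug-in idea only, built on three assumptions:

- every technique is a function that takes a list of documents and returns a list of documents;
- each technique is registered under a name;
- an experiment is an ordered list of (name, parameters) pairs.

All names here (`Document`, `register`, `build_pipeline`, and the three sample components) are hypothetical and not taken from the paper. The term-frequency step is a deliberately naive stand-in for concept extraction.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Document:
    """Hypothetical document record passed from one component to the next."""
    text: str
    tokens: List[str] = field(default_factory=list)
    annotations: Dict[str, object] = field(default_factory=dict)


# Assumed plug-in contract: a component maps a list of documents to a list of documents.
Component = Callable[[List[Document]], List[Document]]

# Registry of component factories, so new techniques can be plugged in by name.
REGISTRY: Dict[str, Callable[..., Component]] = {}


def register(name: str):
    """Decorator that adds a component factory to the registry under `name`."""
    def wrap(factory):
        REGISTRY[name] = factory
        return factory
    return wrap


@register("tokenize")
def make_tokenizer() -> Component:
    def run(docs: List[Document]) -> List[Document]:
        for d in docs:
            # Whitespace split, strip surrounding punctuation, lowercase, drop empties.
            words = (w.strip(".,;:!?()").lower() for w in d.text.split())
            d.tokens = [w for w in words if w]
        return docs
    return run


@register("stopwords")
def make_stopword_filter(words=("the", "a", "of", "and", "in", "to", "is")) -> Component:
    stop = set(words)

    def run(docs: List[Document]) -> List[Document]:
        for d in docs:
            d.tokens = [t for t in d.tokens if t not in stop]
        return docs
    return run


@register("term_frequency")
def make_term_counter(top_n: int = 5) -> Component:
    def run(docs: List[Document]) -> List[Document]:
        # Collection-wide counts; the most frequent terms become candidate concepts.
        counts = Counter(t for d in docs for t in d.tokens)
        top = [t for t, _ in counts.most_common(top_n)]
        for d in docs:
            d.annotations["candidate_concepts"] = top
        return docs
    return run


def build_pipeline(config: List[Tuple[str, dict]]) -> Component:
    """Instantiate registered components from (name, kwargs) pairs and chain them in order."""
    stages = [REGISTRY[name](**kwargs) for name, kwargs in config]

    def run(docs: List[Document]) -> List[Document]:
        for stage in stages:
            docs = stage(docs)
        return docs
    return run


# Example: one constellation of techniques, expressed purely as configuration.
docs = [
    Document("The nurse records the vaccination of the child."),
    Document("Vaccination data is sent to the health station."),
]
pipeline = build_pipeline([
    ("tokenize", {}),
    ("stopwords", {}),
    ("term_frequency", {"top_n": 1}),
])
result = pipeline(docs)
print(result[0].annotations["candidate_concepts"])  # ['vaccination']
```

With this design, trying a different combination of techniques means editing the configuration list rather than the code. That is the kind of experimentation the abstract describes, though the real workbench may organize it quite differently.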




Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gulla, J.A., Brasethvik, T., Kaada, H. (2004). A Flexible Workbench for Document Analysis and Text Mining. In: Meziane, F., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2004. Lecture Notes in Computer Science, vol 3136. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27779-8_29


  • DOI: https://doi.org/10.1007/978-3-540-27779-8_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22564-5

  • Online ISBN: 978-3-540-27779-8

  • eBook Packages: Springer Book Archive
