A Tool for Semi-automatic Document Reengineering

Chapter in: Reading and Learning

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2956))

Abstract

Marking up documents that contain only a layout-oriented structure (e.g. documents created with an ordinary word processor) is becoming increasingly important for the future of information management in modern companies. Only after a document has been marked up with logical elements can that additional information be used, for example, to implement single-source publishing or to enable content-oriented retrieval. Today, marking up layout-oriented documents usually has to be done manually, which leads to high costs for companies.

In the project “Adaptive READ”, the Institute for Human Factors and Technology Management (IAT) of the University of Stuttgart has developed a semi-automatic approach to marking up such documents. The main issues of this development are discussed in this article.

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Drawehn, J., Altenhofen, C., Stanišić-Petrović, M., Weisbecker, A. (2004). A Tool for Semi-automatic Document Reengineering. In: Dengel, A., Junker, M., Weisbecker, A. (eds) Reading and Learning. Lecture Notes in Computer Science, vol 2956. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24642-8_13

  • DOI: https://doi.org/10.1007/978-3-540-24642-8_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21904-0

  • Online ISBN: 978-3-540-24642-8

  • eBook Packages: Springer Book Archive
