Towards automatic hypertextual representation of linear texts

  • Conference paper
Principles of Document Processing (PODP 1996)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1293)


Associative searching via hypertext links is a useful extension to conventional IR systems; manual conversion of texts into hypertexts, however, is feasible only in very restricted environments. For large textual knowledge bases, automatic conversion therefore becomes necessary. In this paper, we survey existing (implemented) as well as projected approaches to automatic hypertextual representation as a prerequisite for associative searching. We describe and compare the main ideas of these approaches, including their advantages and disadvantages.



Editor information

Charles Nicholas, Derick Wood

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Myka, A., Argenton, H., Güntzer, U. (1997). Towards automatic hypertextual representation of linear texts. In: Nicholas, C., Wood, D. (eds) Principles of Document Processing. PODP 1996. Lecture Notes in Computer Science, vol 1293. Springer, Berlin, Heidelberg.

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63620-5

  • Online ISBN: 978-3-540-69614-8

  • eBook Packages: Springer Book Archive
