The TELLTALE dynamic hypertext environment: Approaches to scalability

Pearce, Claudia; Miller, Ethan

doi:10.1007/BFb0023962

Claudia Pearce¹ &
Ethan Miller²

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1326)


Abstract

Methods and tools for finding documents relevant to a user's needs in document corpora can be found in the information retrieval, library science, and hypertext communities. Typically, these systems provide retrieval capabilities for fairly static corpora; their algorithms depend on the language for which they were written (e.g., English); and they perform poorly when presented with misspelled words or text that has been degraded by OCR (optical character recognition). In this chapter, we present the TELLTALE system. TELLTALE is a dynamic hypertext environment that provides full-text search from a hypertext-style user interface. Using several techniques based on n-grams (sequences of n characters of text), it handles text corpora that may be garbled by OCR or transmission errors and that may contain languages other than English. We identify the methods and techniques that we have applied to the n-gram data structures, and we discuss the algorithms that we used to enhance the scalability of the TELLTALE Dynamic Hypertext System.
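To make the n-gram idea concrete, here is a minimal sketch of how character n-grams can tolerate garbled text. It is not TELLTALE's implementation, which the abstract does not describe. The function names, the choice of n = 5, the normalization step, and the use of plain cosine similarity are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def ngram_profile(text: str, n: int = 5) -> Counter:
    """Count overlapping character n-grams (hypothetical helper).

    The text is lower-cased and runs of whitespace are collapsed;
    this normalization is an assumption, not taken from the chapter.
    """
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


clean = ngram_profile("dynamic hypertext environment")
garbled = ngram_profile("dynamlc hypertext envirnment")  # simulated OCR errors
unrelated = ngram_profile("optical storage benchmarks")

# The garbled copy still shares many 5-grams with the clean text,
# so it scores higher than the unrelated phrase.
print(cosine_similarity(clean, garbled))
print(cosine_similarity(clean, unrelated))
```

Because a single wrong character corrupts only the few n-grams that overlap it, most of a garbled document's profile survives. No dictionary or language-specific stemming is involved, which fits the abstract's claims about language independence.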



Author information

Authors and Affiliations

U.S. Department of Defense, USA
Claudia Pearce
University of Maryland Baltimore County, USA
Ethan Miller


Editor information

Charles Nicholas, James Mayfield


About this chapter

Cite this chapter

Pearce, C., Miller, E. (1997). The TELLTALE dynamic hypertext environment: Approaches to scalability. In: Nicholas, C., Mayfield, J. (eds) Intelligent Hypertext. WIH WIH 1994 1993. Lecture Notes in Computer Science, vol 1326. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0023962


DOI: https://doi.org/10.1007/BFb0023962
Published: 10 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63637-3
Online ISBN: 978-3-540-69622-3
eBook Packages: Springer Book Archive
