A Cache-Based Distributed Terabyte Text Retrieval System in CADAL

Cheng, Jun; Gao, Wen; Liu, Bin; Huang, Tie-jun; Zhang, Ling

doi:10.1007/3-540-36227-4_40

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2555)

Included in the following conference series:

International Conference on Asian Digital Libraries

Abstract

The China-America Digital Academic Library (CADAL) project aims to create a searchable collection of one million digital books freely available over the Internet. A collection of this size requires a terabyte-scale text retrieval system. This paper presents a cache-based, distributed terabyte text retrieval system that combines full-text retrieval, distributed computing, and caching techniques. Because the data are distributed by subject across different index servers, each query is searched only on the specific index servers that hold the relevant subject. Cache servers reduce response time. When queried, the system returns only highly relevant search results, which reduces the workload on the network. The prototype system shows the effectiveness of our design.
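The three ideas in the abstract (subject-based routing to index servers, a cache in front of them, and returning only highly relevant hits) can be illustrated with a minimal Python sketch. This is not the CADAL implementation, which the abstract does not describe in detail. The class names `IndexServer`, `CacheServer`, and `Retriever`, the toy term-overlap score, the LRU eviction policy, and the `min_score` threshold are all assumptions made for illustration.

```python
from collections import OrderedDict


class IndexServer:
    """Hypothetical in-memory index server holding the documents of one subject."""

    def __init__(self, docs):
        self.docs = docs      # doc_id -> text
        self.calls = 0        # number of searches actually executed here

    def search(self, query):
        self.calls += 1
        terms = query.lower().split()
        if not terms:
            return []
        hits = []
        for doc_id, text in self.docs.items():
            words = set(text.lower().split())
            # Toy relevance score (assumption): fraction of query terms found.
            score = sum(t in words for t in terms) / len(terms)
            if score > 0:
                hits.append((doc_id, score))
        return hits


class CacheServer:
    """Hypothetical cache with LRU eviction (the policy is an assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)          # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used


class Retriever:
    """Routes a query to the index server for its subject, behind a cache."""

    def __init__(self, servers, cache, min_score=0.5):
        self.servers = servers      # subject -> IndexServer
        self.cache = cache
        self.min_score = min_score  # relevance cutoff (assumed value)

    def query(self, subject, text):
        key = (subject, text.lower())
        cached = self.cache.get(key)
        if cached is not None:
            return cached
        # Only the server responsible for this subject is contacted.
        hits = self.servers[subject].search(text)
        # Keep only highly relevant results to limit what is sent back.
        hits = [h for h in hits if h[1] >= self.min_score]
        hits.sort(key=lambda h: (-h[1], h[0]))
        self.cache.put(key, hits)
        return hits
```

In this sketch a repeated query is answered from the cache without contacting any index server, and a query on one subject never touches the servers of other subjects. A real system would also need cache invalidation when indexes change, which is outside the scope of this illustration.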

Author information

Authors and Affiliations

Chinese Academy of Sciences, 100080, Beijing, P R China
Jun Cheng, Wen Gao, Bin Liu, Tie-jun Huang & Ling Zhang

Editor information

Editors and Affiliations

Nanyang Technological University, Singapore
Ee-Peng Lim, Schubert Foo & Chris Khoo
University of Arizona, USA
Hsinchun Chen
Virginia Tech, USA
Edward Fox
University of Mysore, Mysore
Shalini Urs
IEI-CNR, Italy
Costantino Thanos

About this paper

Cite this paper

Cheng, J., Gao, W., Liu, B., Huang, Tj., Zhang, L. (2002). A Cache-Based Distributed Terabyte Text Retrieval System in CADAL. In: Lim, E.P., et al. Digital Libraries: People, Knowledge, and Technology. ICADL 2002. Lecture Notes in Computer Science, vol 2555. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36227-4_40

DOI: https://doi.org/10.1007/3-540-36227-4_40
Published: 16 December 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00261-1
Online ISBN: 978-3-540-36227-2
eBook Packages: Springer Book Archive
