Towards a Terabyte Digital Library System

Ding, Hao; Lin, Yun; Liu, Bin

doi:10.1007/978-3-540-45080-1_147

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2690)

Included in the following conference series:

International Conference on Intelligent Data Engineering and Automated Learning

Abstract

In the China-US Million Book Digital Library, the digitization process has produced more than one terabyte of text in OEB and PDF formats. To access these data quickly and accurately, we are developing a distributed terabyte text retrieval system. To address interoperability and extensibility across different information resources, we introduce solutions based on three kinds of metadata schemes. Furthermore, because of the complexity of the Chinese language, we propose a word segmentation approach to improve the efficiency and response time of the digital library (DL) system. In the testbed, we added an extra layer to the cache server and designed a new algorithm based on the vector space model (VSM). With the query cache, the system can search less data while maintaining acceptable retrieval accuracy.
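The abstract does not give the details of the cache algorithm, so the following is only a minimal sketch of one way a VSM-based query cache could work, not the authors' method. It assumes queries arrive already segmented into whitespace-separated terms, represents each query as a term-frequency vector, and reuses cached results when the cosine similarity to an earlier query reaches a threshold. The names `SimilarityQueryCache`, `backend`, and `threshold` are hypothetical.

```python
import math
from collections import Counter


def to_vector(query):
    """Term-frequency vector for a query whose terms are whitespace-separated."""
    return Counter(query.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SimilarityQueryCache:
    """Hypothetical cache layer placed in front of a slower search backend."""

    def __init__(self, backend, threshold=0.8):
        self.backend = backend      # callable: query string -> list of results
        self.threshold = threshold  # assumed similarity cut-off for a cache hit
        self.entries = []           # list of (query vector, cached results)
        self.hits = 0
        self.misses = 0

    def search(self, query):
        vec = to_vector(query)
        best_score, best_results = 0.0, None
        # Linear scan over cached queries; a real system would index these.
        for cached_vec, results in self.entries:
            score = cosine(vec, cached_vec)
            if score > best_score:
                best_score, best_results = score, results
        if best_results is not None and best_score >= self.threshold:
            self.hits += 1
            return best_results
        # No sufficiently similar cached query: fall through to the backend.
        self.misses += 1
        results = self.backend(query)
        self.entries.append((vec, results))
        return results


if __name__ == "__main__":
    cache = SimilarityQueryCache(lambda q: ["result for " + q])
    print(cache.search("digital library system"))   # backend call
    print(cache.search("library system digital"))   # served from the cache
    print(cache.hits, cache.misses)
```

In this sketch a hit returns the results of a similar but not necessarily identical query, which is where the trade-off between searching less data and retrieval accuracy appears; a higher `threshold` favours accuracy, a lower one favours cache reuse.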

Author information

Authors and Affiliations

Department of Computer and Information Science, Norwegian University of Science and Technology, 7491 Trondheim, Norway
Hao Ding & Yun Lin
Digital Media Research Center, Chinese Academy of Sciences, Beijing 100080, P.R. China
Bin Liu

Editor information

Editors and Affiliations

Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
Jiming Liu
Department of Computer Science, Hong Kong Baptist University, Hong Kong
Yiu-ming Cheung
School of Electrical and Electronic Engineering, University of Manchester, UK
Hujun Yin

Cite this paper

Ding, H., Lin, Y., Liu, B. (2003). Towards a Terabyte Digital Library System. In: Liu, J., Cheung, Ym., Yin, H. (eds) Intelligent Data Engineering and Automated Learning. IDEAL 2003. Lecture Notes in Computer Science, vol 2690. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45080-1_147

DOI: https://doi.org/10.1007/978-3-540-45080-1_147
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40550-4
Online ISBN: 978-3-540-45080-1
eBook Packages: Springer Book Archive
