Towards a Terabyte Digital Library System

  • Conference paper
Intelligent Data Engineering and Automated Learning (IDEAL 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2690)

Abstract

In the China-US Million Book Digital Library, the digitization process produces more than one terabyte of text in OEB and PDF formats. To access this data quickly and accurately, we are developing a distributed terabyte text retrieval system. To address interoperability and extensibility across different information resources, we introduce solutions based on three kinds of metadata schemes. Furthermore, because of the complexity of the Chinese language, we developed a word segmentation approach to improve the efficiency and response time of the DL system. In the testbed, we added an extra layer to the cache server and designed a new algorithm based on the vector space model (VSM). With the query cache, the system can search less data while maintaining acceptable retrieval accuracy.
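The abstract does not spell out the cache algorithm, so the following is only a minimal sketch of how a VSM-style query cache could behave, not the authors' method. It assumes queries are compared as term-frequency vectors by cosine similarity, and that a cached result is reused when the similarity reaches a threshold. The names `QueryCache`, `search`, `backend`, and the threshold value of 0.8 are all hypothetical, and whitespace tokenization stands in for the Chinese word segmentation step.

```python
import math
from collections import Counter


def to_vector(query):
    """Turn a query into a term-frequency vector.

    Assumption: whitespace splitting stands in for real word segmentation.
    """
    return Counter(query.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class QueryCache:
    """Hypothetical cache keyed by query vectors instead of exact strings."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed similarity cut-off
        self.entries = []           # list of (query vector, results)

    def lookup(self, query):
        """Return results of the most similar cached query, or None."""
        vec = to_vector(query)
        best_score, best_results = 0.0, None
        for cached_vec, results in self.entries:
            score = cosine(vec, cached_vec)
            if score > best_score:
                best_score, best_results = score, results
        if best_score >= self.threshold:
            return best_results
        return None

    def store(self, query, results):
        """Remember the results computed for a query."""
        self.entries.append((to_vector(query), results))


def search(query, cache, backend):
    """Answer from the cache when possible, else query the full index."""
    cached = cache.lookup(query)
    if cached is not None:
        return cached
    results = backend(query)  # the expensive search over the full collection
    cache.store(query, results)
    return results
```

In this sketch, a query that reorders the terms of an earlier query hits the cache, while an unrelated query falls through to `backend`. The threshold controls the trade-off the abstract mentions: a lower value avoids more full searches but risks returning less accurate results.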

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ding, H., Lin, Y., Liu, B. (2003). Towards a Terabyte Digital Library System. In: Liu, J., Cheung, Ym., Yin, H. (eds) Intelligent Data Engineering and Automated Learning. IDEAL 2003. Lecture Notes in Computer Science, vol 2690. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45080-1_147

  • DOI: https://doi.org/10.1007/978-3-540-45080-1_147

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40550-4

  • Online ISBN: 978-3-540-45080-1

  • eBook Packages: Springer Book Archive
