A No-Compromises Architecture for Digital Document Preservation

  • Conference paper
Research and Advanced Technology for Digital Libraries (ECDL 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3652)

Abstract

The Multivalent Document Model offers a practical, proven, no-compromises architecture for preserving digital documents of potentially any data format. We have implemented from scratch such complex and currently important formats as PDF and HTML, as well as older formats including scanned paper, UNIX manual pages, TeX DVI, and Apple II AppleWorks word processing. The architecture, stable since its definition in 1997, extends easily to additional document formats, defines a cross-format document tree data structure that fully captures semantics and layout, supports full expression of a format’s often idiosyncratic concepts and behavior, enables sharing of functionality across formats thus reducing implementation effort, can introduce new functionality such as hyperlinks and annotation to older formats that cannot express them, and provides a single interface (API) across all formats. Multivalent contrasts sharply with emulation and conversion, and advances Lorie’s Universal Virtual Computer with high-level architecture and extensive implementation.
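
The architectural claims in the abstract (per-format implementations feeding one cross-format document tree, with functionality shared across formats and added to formats that cannot express it) can be illustrated with a minimal sketch. This is not the Multivalent API: `Node`, `parse_plain_text`, `parse_man_page`, `ADAPTORS`, `load`, and `add_hyperlinks` are hypothetical names invented for illustration, and the two parsers handle only toy input.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List


@dataclass
class Node:
    """One node of a hypothetical cross-format document tree."""
    kind: str                      # e.g. "document", "heading", "line", "word"
    text: str = ""
    children: List["Node"] = field(default_factory=list)
    attrs: Dict[str, str] = field(default_factory=dict)

    def walk(self) -> Iterator["Node"]:
        """Yield this node and all descendants in document order."""
        yield self
        for child in self.children:
            yield from child.walk()


def _words(line: str) -> List[Node]:
    """Split a line into word nodes (shared by both toy adaptors)."""
    return [Node("word", text=w) for w in line.split()]


def parse_plain_text(data: str) -> Node:
    """Toy adaptor: plain text becomes a tree of lines and words."""
    root = Node("document", attrs={"format": "txt"})
    for line in data.splitlines():
        root.children.append(Node("line", children=_words(line)))
    return root


def parse_man_page(data: str) -> Node:
    """Toy adaptor for a tiny subset of manual-page markup:
    lines starting with '.SH ' become headings, everything else a line."""
    root = Node("document", attrs={"format": "man"})
    for line in data.splitlines():
        if line.startswith(".SH "):
            root.children.append(Node("heading", children=_words(line[4:])))
        else:
            root.children.append(Node("line", children=_words(line)))
    return root


# Assumed registry: one parser per format, all producing the same tree type.
ADAPTORS: Dict[str, Callable[[str], Node]] = {
    "txt": parse_plain_text,
    "man": parse_man_page,
}


def load(data: str, fmt: str) -> Node:
    """Single entry point across formats."""
    return ADAPTORS[fmt](data)


def add_hyperlinks(root: Node, targets: Dict[str, str]) -> int:
    """Format-independent behavior: attach an 'href' attribute to every
    word found in `targets`, and return the number of links added."""
    count = 0
    for node in root.walk():
        if node.kind == "word" and node.text in targets:
            node.attrs["href"] = targets[node.text]
            count += 1
    return count
```

Because `add_hyperlinks` operates only on the shared tree, the same call works on `load("see grep for details", "txt")` and on `load(".SH SEE ALSO\ngrep sed", "man")`, although neither toy format has a notion of links. Under these assumptions, supporting another format means writing one more parser and registering it in `ADAPTORS`.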

References

  1. Adobe Systems, Inc. PDF as a Standard for Archiving, Adobe white paper, http://www.adobe.com/products/acrobat/pdfs/pdfarchiving.pdf

  2. Birrell, A., McJones, P.: Virtual Paper Web site (1995–1997), http://www.research.compaq.com/SRC/virtualpaper/

  3. Brown, A.: Preserving the Digital Heritage: Building a Digital Archive for UK Government Records. In: Brown, A. (ed.) Proceedings of Online Information (2003), http://www.nationalarchives.gov.uk/preservation/digitalarchive/

  4. IBM. Digital Asset Preservation Tool Web site, http://www.alphaworks.ibm.com/tech/uvc?Open&ca=daw-flHts-120204

  5. International Standards Organization. In: ISO/CD 19005-1, Document management—Electronic document file format for long-term preservation—Part 1: Use of PDF (PDF/A), (November 31, 2003), http://www.aiim.org/documents/standards/ISO_19005-1_E.doc

  6. Janssen, W.C., Popat, K.: UpLib: a universal personal digital library system. In: Proceedings of the ACM symposium on Document Engineering, pp. 234–242 (2003)

  7. Lindholm, T., Yellin, F.: The Java Virtual Machine Specification, 2nd edn. Addison-Wesley Longman Publishing Co., Inc. (1999)

  8. Lorie, R.: Long term preservation of digital information. In: Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, pp. 346–352 (2001)

  9. Marshall, C.C., Golovchinsky, G.: Saving Private Hypertext: Requirements and pragmatic dimensions for preservation. In: Proceedings of ACM Hypertext, pp. 130–138 (2004)

  10. Mellor, P., Wheatley, P., Sergeant, D.: Migration on Request: A Practical Technique for Digital Preservation. In: Agosti, M., Thanos, C. (eds.) ECDL 2002. LNCS, vol. 2458, p. 516. Springer, Heidelberg (2002)

  11. Meehan, J., Taft, E., Chernicoff, S., Rose, C., Karr, R.: PDF Reference, 5th edn. (2004)

  12. Messerschmitt, D.G.: Opportunities for Research Libraries in the NSF Cyberinfrastructure Program. In: ARL Bimonthly Report, p. 229 (2003), http://www.arl.org/newsltr/229/cyber.html

  13. Multivalent Web site, http://multivalent.sourceforge.net

  14. The National Archives. PRONOM Web site, http://www.nationalarchives.gov.uk/pronom/

  15. Phelps, T.A.: Multivalent Documents: Anytime, Anywhere, Any Type, Every Way User-Improvable Digital Documents and Systems. Ph.D. Dissertation, University of California, Berkeley (1998)

  16. Phelps, T.A., Wilensky, R.: The Multivalent Browser: a platform for new ideas. In: Proceedings of Document Engineering, pp. 58–67 (2001)

  17. Rodig, P., Borghoff, U.M., Scheffczyk, J., Schmitz, L.: Preservation of digital publications: An OAIS extension and implementation. In: Proceedings of the ACM Symposium on Document Engineering, pp. 131–139 (2003)

  18. San Diego Supercomputer Center. Persistent Archive Testbed (PAT), http://www.sdsc.edu/PAT/

  19. Wheatley, P.: Survey and Assessment of Sources of Information on File Formats and Software Documentation (2003), http://www.jisc.ac.uk/uploaded_documents/FileFormatsreport.pdf

  20. Wotsit's Format Web site, http://www.wotsit.org/

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Phelps, T.A., Watry, P.B. (2005). A No-Compromises Architecture for Digital Document Preservation. In: Rauber, A., Christodoulakis, S., Tjoa, A.M. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2005. Lecture Notes in Computer Science, vol 3652. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551362_24

  • DOI: https://doi.org/10.1007/11551362_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28767-4

  • Online ISBN: 978-3-540-31931-3

  • eBook Packages: Computer Science, Computer Science (R0)
