Data Model for Document Transformation and Assembly

Extended Abstract

  • Conference paper
Principles of Digital Document Processing (PODDP 1998)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1481))

Abstract

This paper presents a data model for transforming and assembling document information such as SGML or XML documents. Its main advantage over other data models is that it simultaneously provides (1) powerful patterns and contextual conditions and (2) schema transformation. Patterns capture conditions on subordinates, while contextual conditions capture conditions on superiors, siblings, subordinates of siblings, and so on; both have been recognized in the document processing community as highly important mechanisms for identifying document components. Meanwhile, schema transformation has been recognized as crucial in the database community since the advent of relational databases. However, no data model has provided all three of patterns, contextual conditions, and schema transformation.
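The distinction between patterns and contextual conditions can be illustrated with a minimal Python sketch. It is not the paper's formalism: the `Node` class, the helper names (`pattern`, `context_parent_is`, `context_preceded_by`, `select`), and the sample document are all assumptions introduced here, and the conditions are plain predicates rather than automata.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional


@dataclass
class Node:
    """Hypothetical document node: a label plus an ordered list of children."""
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)

    def __post_init__(self) -> None:
        # Record the superior of each child so contexts can look upward.
        for child in self.children:
            child.parent = self


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants in document order."""
    yield node
    for child in node.children:
        yield from walk(child)


def pattern(label: str, child_label: str) -> Callable[[Node], bool]:
    """Pattern: a condition on the node itself and on its subordinates."""
    return lambda n: n.label == label and any(
        c.label == child_label for c in n.children
    )


def context_parent_is(label: str) -> Callable[[Node], bool]:
    """Contextual condition on the superior of a node."""
    return lambda n: n.parent is not None and n.parent.label == label


def context_preceded_by(label: str) -> Callable[[Node], bool]:
    """Contextual condition on the elder siblings of a node."""
    def check(n: Node) -> bool:
        if n.parent is None:
            return False
        siblings = n.parent.children
        index = next(k for k, s in enumerate(siblings) if s is n)
        return any(s.label == label for s in siblings[:index])
    return check


def select(root: Node, pat, ctx) -> List[Node]:
    """Identify the nodes that satisfy both the pattern and the context."""
    return [n for n in walk(root) if pat(n) and ctx(n)]


doc = Node("doc", [
    Node("section", [Node("title"), Node("para", [Node("fig")])]),
    Node("appendix", [Node("title"), Node("para", [Node("fig")])]),
])

# Paragraphs containing a figure (pattern) that sit inside a section (context).
hits = select(doc, pattern("para", "fig"), context_parent_is("section"))
print([n.parent.label for n in hits])  # ['section']
```

Swapping the context for `context_preceded_by("title")` selects both paragraphs, since each follows a `title` sibling; the pattern is unchanged, only the condition on the surroundings differs.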

This data model is based on the theory of forest-regular languages. A schema is a forest automaton, and an instance is a finite set of forests (sequences of trees). Since the set of parse trees of an extended context-free grammar is accepted by a forest automaton, this model generalizes Gonnet and Tompa’s grammatical model. Patterns are captured as forest automata; contextual conditions are captured as pointed forest representations (a variation of Podelski’s pointed tree representations). Controlled by patterns and contextual conditions, an operator creates an instance from an input instance and also creates a reasonably small schema from an input schema. Furthermore, the created schema is often minimally sufficient: any forest permitted by it can be generated from some input instance.
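To make the idea of a schema as a forest automaton concrete, here is a small Python sketch of a deterministic bottom-up automaton over forests. It is a simplification under stated assumptions, not the construction used in the paper: the `ForestAutomaton` class, its rule format (label, regular expression over child states, resulting state), and the sample schema are hypothetical, and the first matching rule wins instead of tracking sets of states.

```python
import re
from typing import List, Optional, Tuple

# A tree is (label, children); a forest is a sequence of trees.
Tree = Tuple[str, list]
Forest = List[Tree]


class ForestAutomaton:
    """Hypothetical deterministic bottom-up automaton over forests."""

    def __init__(self, rules: List[Tuple[str, str, str]], final: str) -> None:
        # Each rule: (node label, regex over the child state sequence, state).
        self.rules = [(lab, re.compile(rx), st) for lab, rx, st in rules]
        # Regex that the sequence of top-level states must match.
        self.final = re.compile(final)

    @staticmethod
    def _encode(states: List[str]) -> str:
        # Encode a state sequence as a string, one trailing space per state.
        return "".join(s + " " for s in states)

    def state_of(self, tree: Tree) -> Optional[str]:
        """Return the state assigned to a tree, or None if no rule applies."""
        label, children = tree
        child_states = []
        for child in children:
            state = self.state_of(child)
            if state is None:
                return None
            child_states.append(state)
        word = self._encode(child_states)
        for lab, rx, st in self.rules:
            if lab == label and rx.fullmatch(word):
                return st
        return None

    def accepts(self, forest: Forest) -> bool:
        """A forest is accepted if its top-level states match the final regex."""
        states = [self.state_of(t) for t in forest]
        if any(s is None for s in states):
            return False
        return self.final.fullmatch(self._encode(states)) is not None


# Assumed schema: a forest of one or more sections,
# each consisting of a title followed by any number of paragraphs.
schema = ForestAutomaton(
    rules=[
        ("title", r"", "t"),
        ("para", r"", "p"),
        ("section", r"t (p )*", "s"),
    ],
    final=r"(s )+",
)

valid = [("section", [("title", []), ("para", []), ("para", [])])]
invalid = [("section", [("para", []), ("title", [])])]

print(schema.accepts(valid))    # True
print(schema.accepts(invalid))  # False
```

In this sketch an instance would be a finite set of such forests, each accepted by `schema`. The regular expressions over child states play the role that content models play in an extended context-free grammar.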

References

  1. Abiteboul, S., Cluet, S., Milo, T.: Querying and updating the file. VLDB ’93 19 (1993) 73–84

  2. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley (1995)

  3. Baeza-Yates, R., Navarro, G.: Integrating contents and structure in text retrieval. SIGMOD Record 25:1 (1996) 67–79

  4. Blake, G., Bray, T., Tompa, F.: Shortening the OED: Experience with a grammar-defined database. ACM TOIS 10:3 (1992) 213–232

  5. Christophides, V., Abiteboul, S., Cluet, S., Scholl, M.: From structured documents to novel query facilities. SIGMOD Record 23:2 (1994) 313–324

  6. Colby, L., Van Gucht, D., Saxton, L.: Concepts for modeling and querying list-structured data. Information Processing & Management 30:5 (1994) 687–709

  7. Gécseg, F., Steinby, M.: Tree Automata. Akadémiai Kiadó (1984)

  8. Gonnet, G., Tompa, F.: Mind your grammar: a new approach to modeling text. VLDB ’87 13 (1987) 339–346

  9. Gyssens, M., Paredaens, J., Van Gucht, D.: A grammar-based approach towards unifying hierarchical data models. SIAM Journal on Computing 23:6 (1994) 1093–1137

  10. Murata, M.: Transformation of documents and schemas by patterns and contextual conditions. Lecture Notes in Computer Science 1293 (1997) 153–169

  11. Pair, C., Quere, A.: Définition et étude des bilangages réguliers. Information and Control 13:6 (1968) 565–593

  12. Podelski, A.: A monoid approach to tree automata. In: Tree Automata and Languages. North-Holland (1992) 41–56

  13. Takahashi, M.: Generalizations of regular sets and their application to a study of context-free languages. Information and Control 27 (1975) 1–36

  14. Zdonik, S., Maier, D.: Readings in Object-Oriented Database Systems. Morgan Kaufmann (1990)

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Murata, M. (1998). Data Model for Document Transformation and Assembly. In: Munson, E.V., Nicholas, C., Wood, D. (eds) Principles of Digital Document Processing. PODDP 1998. Lecture Notes in Computer Science, vol 1481. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49654-8_12

  • DOI: https://doi.org/10.1007/3-540-49654-8_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65086-7

  • Online ISBN: 978-3-540-49654-0

  • eBook Packages: Springer Book Archive