Data Model for Document Transformation and Assembly

Murata, Makoto

doi:10.1007/3-540-49654-8_12

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1481))

Included in the following conference series:

International Workshop on Principles of Digital Document Processing

Abstract

This paper presents a data model for transforming and assembling document information such as SGML or XML documents. Its main advantage over other data models is that it simultaneously provides (1) powerful patterns and contextual conditions, and (2) schema transformation. Patterns capture conditions on the subordinates of a document component, while contextual conditions capture conditions on its superiors, siblings, subordinates of siblings, and so on; both have long been recognized in the document processing community as highly important mechanisms for identifying document components. Schema transformation, meanwhile, has been recognized as crucial in the database community since the advent of relational databases. However, no previous data model has provided all three of patterns, contextual conditions, and schema transformation.

This data model is based on the theory of forest-regular languages. A schema is a forest automaton, and an instance is a finite set of forests (sequences of trees). Since the set of parse trees of an extended context-free grammar is accepted by a forest automaton, this model generalizes Gonnet and Tompa’s grammatical model. Patterns are captured as forest automata; contextual conditions are captured as pointed forest representations (a variation of Podelski’s pointed tree representations). Controlled by patterns and contextual conditions, an operator creates an instance from an input instance and also creates a reasonably small schema from an input schema. Furthermore, the created schema is often minimally sufficient: any forest permitted by it can be generated from some input instance.
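To make the notion of a schema as a forest automaton more concrete, the following is a minimal sketch and not the paper's own formalization. It assumes a bottom-up automaton in which the state of a node is determined by its label and by a regular expression over the sequence of states of its children, and in which a forest is accepted when the state sequence of its top-level trees matches a final regular expression. The element names (`doc`, `sec`, `title`, `para`), the single-character state names, and the helper functions are all hypothetical and chosen only for illustration.

```python
import re

# A tree is (label, [child trees]); a forest is a list of trees.
# Each rule is (label, regex over the children's state sequence, resulting state).
# States are single characters so that a state sequence is a plain string.
# These rules are an illustrative assumption, not taken from the paper.
RULES = [
    ("title", r"", "t"),    # a title has no element children
    ("para", r"", "p"),     # a paragraph has no element children
    ("sec", r"tp*", "s"),   # a section is a title followed by paragraphs
    ("doc", r"ts+", "d"),   # a document is a title followed by one or more sections
]

# Final condition: a regex over the state sequence of the top-level trees.
FINAL = r"d+"


def state_of(tree):
    """Return the state assigned to a tree, or None if no rule applies."""
    label, children = tree
    child_states = states_of(children)
    if child_states is None:
        return None
    for rule_label, pattern, state in RULES:
        if rule_label == label and re.fullmatch(pattern, child_states) is not None:
            return state
    return None


def states_of(forest):
    """Return the state sequence of a forest as a string, or None on failure."""
    states = []
    for tree in forest:
        state = state_of(tree)
        if state is None:
            return None
        states.append(state)
    return "".join(states)


def accepts(forest):
    """A forest is accepted if its top-level state sequence matches FINAL."""
    sequence = states_of(forest)
    return sequence is not None and re.fullmatch(FINAL, sequence) is not None


good_doc = ("doc", [("title", []),
                    ("sec", [("title", []), ("para", []), ("para", [])])])
bad_doc = ("doc", [("sec", [("title", [])])])  # missing the document title

print(accepts([good_doc]))
print(accepts([good_doc, good_doc]))
print(accepts([bad_doc]))
```

In this sketch the rules play the role of content models in an extended context-free grammar, which is why the parse trees of such a grammar can be checked by an automaton of this shape. Because acceptance is defined over a sequence of top-level trees, the same machinery handles forests and not only single trees.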

References

Abiteboul, S., Cluet, S., Milo, T.: Querying and updating the file. VLDB ’93 19 (1993) 73–84
Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley (1995)
Baeza-Yates, R., Navarro, G.: Integrating contents and structure in text retrieval. SIGMOD Record 25:1 (1996) 67–79
Blake, G., Bray, T., Tompa, F.: Shortening the OED: Experience with a grammar-defined database. ACM TOIS 10:3 (1992) 213–232
Christophides, V., Abiteboul, S., Cluet, S., Scholl, M.: From structured documents to novel query facilities. SIGMOD Record 23:2 (1994) 313–324
Colby, L., Van Gucht, D., Saxton, L.: Concepts for modeling and querying list-structured data. Information Processing & Management 30:5 (1994) 687–709
Gécseg, F., Steinby, M.: Tree Automata. Akadémiai Kiadó (1984)
Gonnet, G., Tompa, F.: Mind your grammar: a new approach to modeling text. VLDB ’87 13 (1987) 339–346
Gyssens, M., Paredaens, J., Van Gucht, D.: A grammar-based approach towards unifying hierarchical data models. SIAM Journal on Computing 23:6 (1994) 1093–1137
Murata, M.: Transformation of documents and schemas by patterns and contextual conditions. Lecture Notes in Computer Science 1293 (1997) 153–169
Pair, C., Quere, A.: Définition et étude des bilangages réguliers. Information and Control 13:6 (1968) 565–593
Podelski, A.: A monoid approach to tree automata. In: Tree Automata and Languages. North-Holland (1992) 41–56
Takahashi, M.: Generalizations of regular sets and their application to a study of context-free languages. Information and Control 27 (1975) 1–36
Zdonik, S., Maier, D.: Readings in Object-Oriented Database Systems. Morgan Kaufmann (1990)

Author information

Authors and Affiliations

Fuji Xerox Information Systems Co., Ltd., KSP 9A7, 2-1 Sakado 3-chome, Takatsu-ku, Kawasaki-shi, Kanagawa-ken, Japan, 213
Makoto Murata

Editor information

Editors and Affiliations

Department of Electrical Engineering and Computer Science, University of Wisconsin-Milwaukee, Milwaukee, WI, 53211, USA
Ethan V. Munson
Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD, 21250, USA
Charles Nicholas
Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong SAR
Derick Wood

About this paper

Cite this paper

Murata, M. (1998). Data Model for Document Transformation and Assembly. In: Munson, E.V., Nicholas, C., Wood, D. (eds) Principles of Digital Document Processing. PODDP 1998. Lecture Notes in Computer Science, vol 1481. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49654-8_12

DOI: https://doi.org/10.1007/3-540-49654-8_12
Published: 15 September 2000
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65086-7
Online ISBN: 978-3-540-49654-0
eBook Packages: Springer Book Archive
