Abstract
The Standard Generalized Markup Language (SGML), an ISO standard, has become the accepted method of defining markup conventions for text files. SGML is a metalanguage for defining grammars for textual markup in much the same way that Backus-Naur Form is a metalanguage for defining programming-language grammars. Indeed, HTML, the method of marking up hypertext documents for the World Wide Web, is an SGML grammar. The underlying assumptions of the SGML initiative are that the logical structure of a document can be identified and that it can be indicated by inserting labeled matching brackets (start and end tags). Moreover, it is assumed that the nesting relationships of these tags can be described with an extended context-free grammar, that is, a grammar whose productions have regular expressions as right-hand sides. In this survey of some of the issues raised by the SGML initiative, I reexamine the underlying assumptions and address some of the theoretical questions that SGML raises. In particular, I respond to two kinds of questions. Questions of the first kind are technical: Can we decide whether tag minimization is possible? Can we decide whether a proposed content model is legal? Can we remove exceptions in a structure-preserving manner? Can we decide whether two SGML grammars are equivalent?
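One of these technical questions, whether a content model is legal, has a crisp answer for the unambiguity requirement. SGML asks that a parser reading a document left to right can match each element to a unique symbol of the content model without lookahead. Brüggemann-Klein and Wood call such expressions 1-unambiguous and characterize them as exactly those whose Glushkov (position) automaton is deterministic. The following is a minimal sketch of that test. It assumes content models are given as a small tuple-based syntax tree of our own devising (a hypothetical representation, not any SGML parser's API) and deliberately ignores SGML's `&` connector.

```python
# Content models as a small tuple AST (hypothetical representation, not an SGML API):
#   ("sym", name)                          element name
#   ("seq", a, b)                          SGML "a , b"
#   ("alt", a, b)                          SGML "a | b"
#   ("star", a), ("plus", a), ("opt", a)   SGML "a*", "a+", "a?"
# The SGML "&" connector is deliberately not handled.

def glushkov(node, symbols, follow):
    """Mark positions and return (nullable, first, last) for `node`.

    `symbols[p]` records the element name at position p, and `follow[p]`
    collects the positions that may immediately follow position p.
    """
    kind = node[0]
    if kind == "sym":
        pos = len(symbols)
        symbols.append(node[1])
        follow[pos] = set()
        return False, {pos}, {pos}
    if kind == "seq":
        n1, f1, l1 = glushkov(node[1], symbols, follow)
        n2, f2, l2 = glushkov(node[2], symbols, follow)
        for p in l1:  # the end of the left part may be followed by the start of the right part
            follow[p] |= f2
        first = (f1 | f2) if n1 else f1
        last = (l1 | l2) if n2 else l2
        return n1 and n2, first, last
    if kind == "alt":
        n1, f1, l1 = glushkov(node[1], symbols, follow)
        n2, f2, l2 = glushkov(node[2], symbols, follow)
        return n1 or n2, f1 | f2, l1 | l2
    if kind in ("star", "plus", "opt"):
        n, f, l = glushkov(node[1], symbols, follow)
        if kind != "opt":  # iteration: the end of the body can loop back to its start
            for p in l:
                follow[p] |= f
        return (n or kind != "plus"), f, l
    raise ValueError(f"unknown node kind: {kind!r}")


def is_one_unambiguous(model):
    """True iff the Glushkov automaton of `model` is deterministic."""
    symbols, follow = [], {}
    _, first, _ = glushkov(model, symbols, follow)
    # Transitions leave the initial state toward `first` and leave position p
    # toward `follow[p]`; determinism means no two target positions in the
    # same set carry the same element name.
    for targets in [first, *follow.values()]:
        names = [symbols[p] for p in targets]
        if len(names) != len(set(names)):
            return False
    return True


# (a | b)*, a : on reading an "a", a parser cannot tell which "a" it has seen
ambiguous = ("seq", ("star", ("alt", ("sym", "a"), ("sym", "b"))), ("sym", "a"))
# b*, a, (b*, a)* : the same language, written deterministically
deterministic = ("seq", ("seq", ("star", ("sym", "b")), ("sym", "a")),
                 ("star", ("seq", ("star", ("sym", "b")), ("sym", "a"))))

print(is_one_unambiguous(ambiguous))      # False
print(is_one_unambiguous(deterministic))  # True
```

The two example models denote the same language (strings over `a` and `b` ending in `a`), so legality is a property of the expression, not of the language it describes.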
Questions of the second kind are philosophical and foundational: What is a logical structure? What logical structures may a document have? Can logical structures always be captured by context-free nesting?
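The last question has a simple combinatorial core. Start and end tags can be properly nested only when the marked regions form a laminar family, meaning any two regions are either disjoint or one lies inside the other. Because this condition is defined pairwise, checking every pair is enough. The sketch below is illustrative only: the `nestable` helper, the `(start, end, label)` tuples, and the character offsets are hypothetical.

```python
from itertools import combinations


def nestable(regions):
    """Return True if the (start, end, label) regions can all be marked with
    properly nested start and end tags, i.e. every pair of half-open
    intervals [start, end) is either disjoint or one contains the other."""
    for (s1, e1, _), (s2, e2, _) in combinations(regions, 2):
        disjoint = e1 <= s2 or e2 <= s1
        contained = (s1 <= s2 and e2 <= e1) or (s2 <= s1 and e1 <= e2)
        if not (disjoint or contained):
            return False
    return True


# Hypothetical character offsets for two verse lines of 20 characters each.
lines = [(0, 20, "line"), (20, 40, "line")]
print(nestable(lines + [(0, 40, "stanza")]))     # True: the stanza encloses both lines
print(nestable(lines + [(10, 30, "sentence")]))  # False: the sentence crosses a line break
```

A sentence that runs across a verse-line boundary is exactly the kind of structure this test rejects. A single context-free nesting cannot record both the lines and the sentences.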
This work was supported under Natural Sciences and Engineering Research Council of Canada grants.
© 1995 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Wood, D. (1995). Standard Generalized Markup Language: Mathematical and philosophical issues. In: van Leeuwen, J. (ed.) Computer Science Today. Lecture Notes in Computer Science, vol 1000. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0015253
DOI: https://doi.org/10.1007/BFb0015253
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60105-0
Online ISBN: 978-3-540-49435-5
eBook Packages: Springer Book Archive