Standard Generalized Markup Language: Mathematical and philosophical issues

Computer Science Today

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1000)

Abstract

The Standard Generalized Markup Language (SGML), an ISO standard, has become the accepted method of defining markup conventions for text files. SGML is a metalanguage for defining grammars for textual markup in much the same way that Backus-Naur Form is a metalanguage for defining programming-language grammars. Indeed, HTML, the method of marking up hypertext documents for the World Wide Web, is an SGML grammar. The underlying assumptions of the SGML initiative are that a logical structure of a document can be identified and that it can be indicated by the insertion of labeled matching brackets (start and end tags). Moreover, it is assumed that the nesting relationships of these tags can be described with an extended context-free grammar (the right-hand sides of productions are regular expressions). In this survey of some of the issues raised by the SGML initiative, I reexamine the underlying assumptions and address some of the theoretical questions that SGML raises. In particular, I respond to two kinds of questions. The first kind are technical: Can we decide whether tag minimization is possible? Can we decide whether a proposed content model is legal? Can we remove exceptions in a structure-preserving manner? Can we decide whether two SGML grammars are equivalent?

The second kind are philosophical and foundational: What is a logical structure? What logical structures may a document have? Can logical structures always be captured by context-free nesting?
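As a rough illustration of the assumption that tag nesting can be described by an extended context-free grammar, the following Python sketch checks a nested element tree against a toy grammar whose right-hand sides are regular expressions over child element names. The grammar, the element names (doc, section, title, para), and the function is_valid are hypothetical and chosen only for this example. Text content, attributes, tag minimization, and exceptions are ignored, and this is not the chapter's own method.

```python
import re

# Hypothetical toy grammar (an assumption for illustration only).
# Each element name maps to its content model: a regular expression
# over the names of its children. Every child name is followed by a
# single space so that adjacent names cannot run together.
GRAMMAR = {
    "doc": r"(title )(section )+",   # one title, then one or more sections
    "section": r"(title )(para )*",  # one title, then zero or more paragraphs
    "title": r"",                    # no child elements (text is ignored here)
    "para": r"",                     # no child elements (text is ignored here)
}


def is_valid(tree):
    """Return True if the (name, children) tree conforms to GRAMMAR."""
    name, children = tree
    if name not in GRAMMAR:
        return False
    # Encode the sequence of child names as one string, e.g. "title para ".
    child_names = "".join(child[0] + " " for child in children)
    # The whole child sequence must match the element's content model.
    if re.fullmatch(GRAMMAR[name], child_names) is None:
        return False
    # Every child must itself be valid.
    return all(is_valid(child) for child in children)


good = ("doc", [("title", []),
                ("section", [("title", []), ("para", [])])])
bad = ("doc", [("section", [("title", [])])])  # the doc-level title is missing

print(is_valid(good))
print(is_valid(bad))
```

Here the tree is represented directly rather than parsed from tagged text, so the sketch covers only the grammar-conformance step. The decidability questions raised in the abstract (legality of content models, equivalence of grammars) are outside its scope.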

This work was supported under Natural Sciences and Engineering Research Council of Canada grants.

Editor information

Jan van Leeuwen

Copyright information

© 1995 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Wood, D. (1995). Standard Generalized Markup Language: Mathematical and philosophical issues. In: van Leeuwen, J. (eds) Computer Science Today. Lecture Notes in Computer Science, vol 1000. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0015253

  • DOI: https://doi.org/10.1007/BFb0015253

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-60105-0

  • Online ISBN: 978-3-540-49435-5

  • eBook Packages: Springer Book Archive
