Evolving a Set of DTDs According to a Dynamic Set of XML Documents

  • Conference paper
XML-Based Data Management and Multimedia Engineering — EDBT 2002 Workshops (EDBT 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2490)

Abstract

In this paper we address the problem of evolving a set of DTDs so as to obtain a description, as precise as possible, of the structures of the documents actually stored in a source of XML documents. This problem is highly relevant in a dynamic and heterogeneous environment such as the Web. The approach we propose relies on a classification mechanism based on document structure and on data mining association rules to discover frequent structural patterns in the data.
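
To make the idea of mining frequent structural patterns concrete, the following is a minimal sketch, not the algorithm of the paper. It assumes that each occurrence of a given element is treated as a "transaction" made of the names of its child elements, and that simple pairwise rules of the form "if child a is present, child b is present" are scored by support and confidence. The function names, thresholds, and sample documents are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import permutations


def child_tag_sets(xml_docs, parent_tag):
    """Collect one transaction (set of child tag names) per occurrence of parent_tag."""
    transactions = []
    for text in xml_docs:
        root = ET.fromstring(text)
        for elem in root.iter(parent_tag):
            transactions.append(frozenset(child.tag for child in elem))
    return transactions


def mine_pair_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Return sorted rules (a, b, support, confidence), read as 'a present => b present'."""
    n = len(transactions)
    if n == 0:
        return []
    single = Counter()  # number of transactions containing each tag
    pair = Counter()    # number of transactions containing both tags of an ordered pair
    for t in transactions:
        for tag in t:
            single[tag] += 1
        for a, b in permutations(t, 2):
            pair[(a, b)] += 1
    rules = []
    for (a, b), count in pair.items():
        support = count / n             # fraction of transactions with both a and b
        confidence = count / single[a]  # fraction of transactions with a that also have b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return sorted(rules)


# Hypothetical sample documents, used only for illustration.
docs = [
    "<book><title/><author/><year/></book>",
    "<book><title/><author/></book>",
    "<book><title/><author/><isbn/></book>",
    "<book><title/></book>",
]
rules = mine_pair_rules(child_tag_sets(docs, "book"))
for a, b, support, confidence in rules:
    print(f"{a} => {b} (support={support:.2f}, confidence={confidence:.2f})")
```

Under these assumptions, a rule that holds with high confidence could suggest that two child elements tend to occur together in the stored documents, which is the kind of evidence a DTD evolution step might take into account. How such rules are actually translated into DTD changes is described in the paper itself and is not reproduced here.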

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bertino, E., Guerrini, G., Mesiti, M., Tosetto, L. (2002). Evolving a Set of DTDs According to a Dynamic Set of XML Documents. In: Chaudhri, A.B., Unland, R., Djeraba, C., Lindner, W. (eds) XML-Based Data Management and Multimedia Engineering — EDBT 2002 Workshops. EDBT 2002. Lecture Notes in Computer Science, vol 2490. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36128-6_4

  • DOI: https://doi.org/10.1007/3-540-36128-6_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-00130-0

  • Online ISBN: 978-3-540-36128-2

  • eBook Packages: Springer Book Archive
