Approximate Graph Schema Extraction for Semi-structured Data

  • Conference paper
Advances in Database Technology — EDBT 2000 (EDBT 2000)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1777))

Abstract

Semi-structured data are typically represented as labeled directed graphs. They are self-describing and schemaless, and the lack of a schema makes query processing over them expensive. To address this, some researchers have proposed using the structure of the data itself as a schema representation; such schemas are commonly referred to as graph schemas. However, because semi-structured data are irregular and frequently modified, an accurate graph schema is costly to construct and, worse still, difficult to maintain thereafter. Furthermore, an accurate graph schema is generally very large, and hence impractical. In this paper, an approximation approach to graph schema extraction is proposed. Approximation is achieved by summarizing the semi-structured data graph using an incremental clustering method. Preliminary experimental results show that approximate graph schemas are more compact than conventional accurate graph schemas and are promising for evaluating queries that involve regular path expressions.
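
The idea of summarizing a data graph by incremental clustering can be illustrated with a small sketch. The abstract does not describe the clustering method itself, so everything below is an assumption made for illustration: nodes are compared by the Jaccard similarity of their outgoing edge labels, each node is considered once and joins the most similar existing cluster if the similarity reaches a hypothetical `threshold`, and each cluster becomes one node of the approximate schema. The function names and the example graph are likewise hypothetical and do not come from the paper.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two label sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def approximate_schema(edges, threshold=0.5):
    """Cluster nodes incrementally and return (cluster_of, schema_edges).

    edges: (source, label, target) triples of the data graph.
    threshold: hypothetical similarity cutoff, not taken from the paper.
    """
    edges = list(edges)
    out_labels = defaultdict(set)  # node -> labels on its outgoing edges
    nodes, seen = [], set()        # nodes in order of first appearance
    for src, label, dst in edges:
        out_labels[src].add(label)
        for n in (src, dst):
            if n not in seen:
                seen.add(n)
                nodes.append(n)

    cluster_labels = []  # cluster id -> union of its members' outgoing labels
    cluster_of = {}      # node -> cluster id
    for n in nodes:      # one pass: each node is placed once, never revisited
        labels = out_labels.get(n, set())
        best, best_sim = None, -1.0
        for cid, summary in enumerate(cluster_labels):
            sim = jaccard(labels, summary)
            if sim > best_sim:
                best, best_sim = cid, sim
        if best is not None and best_sim >= threshold:
            cluster_labels[best] |= labels           # absorb into existing cluster
        else:
            best = len(cluster_labels)               # open a new cluster
            cluster_labels.append(set(labels))
        cluster_of[n] = best

    # One schema edge per distinct (source cluster, label, target cluster).
    schema_edges = {(cluster_of[s], l, cluster_of[d]) for s, l, d in edges}
    return cluster_of, schema_edges


# Hypothetical data graph: three "person" objects with irregular attributes.
edges = [
    ("root", "person", "p1"), ("root", "person", "p2"), ("root", "person", "p3"),
    ("p1", "name", "n1"), ("p1", "email", "e1"),
    ("p2", "name", "n2"), ("p2", "email", "e2"), ("p2", "phone", "t2"),
    ("p3", "name", "n3"), ("p3", "phone", "t3"),
]
cluster_of, schema_edges = approximate_schema(edges)
print(len(edges), "data edges ->", len(schema_edges), "schema edges")
```

In this toy graph the three person nodes fall into one cluster even though their attributes differ, so the resulting schema is smaller than the data but also admits paths such as `person.email` for `p3`, which the data does not contain. That over-approximation is the trade-off the abstract refers to when it contrasts approximate with accurate graph schemas.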

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, Q.Y., Yu, J.X., Wong, KF. (2000). Approximate Graph Schema Extraction for Semi-structured Data. In: Zaniolo, C., Lockemann, P.C., Scholl, M.H., Grust, T. (eds) Advances in Database Technology — EDBT 2000. EDBT 2000. Lecture Notes in Computer Science, vol 1777. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46439-5_21

  • DOI: https://doi.org/10.1007/3-540-46439-5_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67227-2

  • Online ISBN: 978-3-540-46439-6

  • eBook Packages: Springer Book Archive
