On Mining XML Structures Based on Statistics

  • Conference paper
Knowledge-Based Intelligent Information and Engineering Systems (KES 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3681)

Abstract

We propose an approach for dynamically generating database schemas for well-formed XML data. The approach uses statistics gathered from the XML data to control how many tables the data is divided into, so that the total cost of processing queries is reduced. To keep the number of tables small, we devise schemas suited to complex data such as text formatting and child elements with a small maximum number of occurrences. To this end, we define three functions that control how tables are divided: NULL expectation, Large Leaf Fields, and Large Child Fields. We ran typical XML queries over both the generated schemas and normalized schemas and compared their measured costs. The results validate our approach.
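
The abstract does not give the definitions of the three functions, so the following is only a hypothetical sketch of the general idea: using occurrence statistics to decide whether a child element can be stored as columns of its parent's table instead of in a separate table. The names `ChildStats`, `null_fraction_if_inlined`, and `should_inline`, the thresholds, and the sample counts are all assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ChildStats:
    """Occurrence statistics for one child element under one parent element.

    Hypothetical structure for illustration; not the paper's definition.
    """
    name: str
    counts: Sequence[int]  # occurrences of the child in each parent instance

    @property
    def max_occurs(self) -> int:
        # Largest number of occurrences observed in any parent instance.
        return max(self.counts, default=0)

    def null_fraction_if_inlined(self) -> float:
        """Fraction of inlined columns that would hold NULL.

        Inlining reserves max_occurs columns per parent row; a parent with
        k occurrences leaves (max_occurs - k) of those columns NULL.
        """
        slots = self.max_occurs * len(self.counts)
        if slots == 0:
            return 0.0
        return 1.0 - sum(self.counts) / slots


def should_inline(stats: ChildStats,
                  max_columns: int = 3,
                  max_null_fraction: float = 0.5) -> bool:
    """Inline the child only if it needs few columns and wastes few fields.

    Both thresholds are assumed values chosen for this example.
    """
    return (stats.max_occurs <= max_columns
            and stats.null_fraction_if_inlined() <= max_null_fraction)


if __name__ == "__main__":
    author = ChildStats("author", [1, 2, 3, 1])  # small maximum, few NULLs
    cite = ChildStats("cite", [0, 0, 40, 0])     # large maximum, mostly NULL
    for child in (author, cite):
        print(child.name, child.max_occurs,
              round(child.null_fraction_if_inlined(), 3),
              should_inline(child))
```

In this sketch a child that inlines cleanly saves one table and one join per query, while a child with a large or highly variable occurrence count stays in its own table. This reflects the trade-off the abstract describes, under the assumptions stated above.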



Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ishikawa, H., Yokoyama, S., Ohta, M., Katayama, K. (2005). On Mining XML Structures Based on Statistics. In: Khosla, R., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2005. Lecture Notes in Computer Science, vol 3681. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11552413_55

  • DOI: https://doi.org/10.1007/11552413_55

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28894-7

  • Online ISBN: 978-3-540-31983-2

  • eBook Packages: Computer Science, Computer Science (R0)
