On Mining XML Structures Based on Statistics

Ishikawa, Hiroshi; Yokoyama, Shohei; Ohta, Manabu; Katayama, Kaoru

doi:10.1007/11552413_55


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3681)

Included in the following conference series:

International Conference on Knowledge-Based and Intelligent Information and Engineering Systems


Abstract

We propose an approach that dynamically generates database schemas for well-formed XML data. The approach uses statistics of the XML data to control how many tables the data is divided into, so that the total cost of processing queries is reduced. To reduce the number of tables, we devise schemas suited to complex data, such as text formatting and child elements with a small maximum number of occurrences. To this end, we define three functions, NULL expectation, Large Leaf Fields, and Large Child Fields, that control which tables are divided. We evaluated typical XML queries over both the generated schemas and normalized schemas, measuring and comparing their processing costs. Through this evaluation, we validated our approach.
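The abstract names the three control functions but does not define them. As a hypothetical illustration of the general idea only (statistics gathered from sample documents decide whether a child element is inlined as columns of its parent's table or split into a table of its own), the sketch below assumes a made-up NULL-ratio estimate (`null_expectation`, the expected fraction of empty cells if a child is inlined as `max_occurs` columns) and two made-up thresholds, `MAX_INLINE` and `NULL_LIMIT`. None of these names, formulas, or thresholds are the authors' definitions of NULL expectation, Large Leaf Fields, or Large Child Fields.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical thresholds; the paper's actual parameters are not given in the abstract.
MAX_INLINE = 3     # max occurrences per parent that may still be inlined as columns
NULL_LIMIT = 0.5   # tolerated fraction of NULL cells in the inlined columns


@dataclass
class ChildStats:
    parents_seen: int = 0  # number of parent instances examined
    total: int = 0         # total child occurrences across all parents
    max_occurs: int = 0    # largest number of occurrences under a single parent


def collect_stats(root):
    """Gather occurrence statistics per (parent tag, child tag) pair.

    Simplification: elements are keyed by tag name only, not by full path.
    """
    parent_counts = defaultdict(int)
    stats = defaultdict(ChildStats)
    for parent in root.iter():
        parent_counts[parent.tag] += 1
        counts = defaultdict(int)
        for child in parent:
            counts[child.tag] += 1
        for tag, n in counts.items():
            s = stats[(parent.tag, tag)]
            s.total += n
            s.max_occurs = max(s.max_occurs, n)
    for (ptag, _), s in stats.items():
        s.parents_seen = parent_counts[ptag]
    return stats


def null_expectation(s):
    """Assumed estimate: fraction of NULL cells if the child becomes max_occurs columns."""
    if s.parents_seen == 0 or s.max_occurs == 0:
        return 1.0
    return 1.0 - s.total / (s.parents_seen * s.max_occurs)


def plan_tables(stats):
    """Decide, per pair, whether to inline the child or give it a separate table."""
    plan = {}
    for key, s in sorted(stats.items(), key=lambda kv: kv[0]):
        inline = s.max_occurs <= MAX_INLINE and null_expectation(s) <= NULL_LIMIT
        plan[key] = "inline" if inline else "separate table"
    return plan


SAMPLE = """<dblp>
  <article><title>A</title><author>X</author></article>
  <article><title>B</title><author>X</author><author>Y</author><note>n</note></article>
  <article><title>C</title><author>X</author><author>Y</author><author>Z</author></article>
  <article><title>D</title><author>X</author><author>Y</author></article>
</dblp>"""

if __name__ == "__main__":
    plan = plan_tables(collect_stats(ET.fromstring(SAMPLE)))
    for (parent, child), decision in plan.items():
        print(f"{parent}/{child}: {decision}")
```

On this toy sample, `author` (at most 3 per article, about one third NULL cells if inlined) and `title` are inlined, while the rarely present `note` and the unbounded `article` list get separate tables. The point of the sketch is only that occurrence statistics, rather than the schema alone, drive how many tables are produced.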



Author information

Authors and Affiliations

Graduate School of Engineering, Tokyo Metropolitan University
Hiroshi Ishikawa, Shohei Yokoyama, Manabu Ohta & Kaoru Katayama


Editor information

Editors and Affiliations

School of Business, La Trobe University, 3086, Melbourne, Victoria, Australia
Rajiv Khosla
Centre for SMART systems Engineering Research Centre, University of Brighton, Moulsecoomb, BN2 4GJ, Brighton, UK
Robert J. Howlett
School of Electrical and Information Engineering, Knowledge Based Intelligent Engineering Systems Centre, University of South Australia, 5095, Mawson Lakes, SA, Australia
Lakhmi C. Jain

About this paper

Cite this paper

Ishikawa, H., Yokoyama, S., Ohta, M., Katayama, K. (2005). On Mining XML Structures Based on Statistics. In: Khosla, R., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2005. Lecture Notes in Computer Science, vol 3681. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11552413_55


DOI: https://doi.org/10.1007/11552413_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28894-7
Online ISBN: 978-3-540-31983-2
eBook Packages: Computer Science, Computer Science (R0)
