Abstract
We propose an approach that dynamically generates database schemas for well-formed XML data. The approach uses statistics of the XML data to control how many tables the data is divided into, so that the total cost of processing queries is reduced. To reduce the number of tables, we devise schemas suited to complex data such as text formatting and child elements with a small maximum number of occurrences. To this end, we define three functions, NULL expectation, Large Leaf Fields, and Large Child Fields, which control how the tables are divided. We evaluated typical XML queries over both the generated schemas and normalized schemas, and measured and compared their costs. The comparison validated our approach.
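As a rough illustration of the kind of XML statistics such an approach could draw on, the following sketch counts, for each parent/child tag pair, the maximum number of occurrences under one parent and the fraction of parents that lack the child entirely. It is not the paper's definition of NULL expectation, Large Leaf Fields, or Large Child Fields; the function names, the sample document, and the thresholds are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def child_occurrence_stats(xml_text):
    """Map (parent_tag, child_tag) -> (max occurrences, fraction of parents missing the child)."""
    root = ET.fromstring(xml_text)
    parent_count = defaultdict(int)  # parent tag -> number of instances
    max_occ = defaultdict(int)       # (parent, child) -> max count under one parent
    present = defaultdict(int)       # (parent, child) -> parents having >= 1 such child
    for parent in root.iter():
        parent_count[parent.tag] += 1
        counts = defaultdict(int)
        for child in parent:
            counts[child.tag] += 1
        for tag, n in counts.items():
            key = (parent.tag, tag)
            max_occ[key] = max(max_occ[key], n)
            present[key] += 1
    return {
        key: (max_occ[key], 1 - present[key] / parent_count[key[0]])
        for key in max_occ
    }


def inline_candidates(stats, max_occurrences=2, max_null_fraction=0.5):
    """Hypothetical rule: keep a child in its parent's table when it repeats
    rarely and would seldom be NULL. Both thresholds are illustrative only."""
    return sorted(
        key
        for key, (occ, nulls) in stats.items()
        if occ <= max_occurrences and nulls <= max_null_fraction
    )


# Hypothetical sample document, loosely shaped like bibliographic data.
SAMPLE = """<dblp>
  <article><title>A</title><author>X</author><author>Y</author></article>
  <article><title>B</title><author>Z</author></article>
  <article><title>C</title><author>W</author><year>2004</year></article>
</dblp>"""

stats = child_occurrence_stats(SAMPLE)
candidates = inline_candidates(stats)
```

Under these assumed thresholds, `title` and `author` would stay in the `article` table, while `year` (absent from most articles) and `article` (repeated three times under `dblp`) would be split out. A real schema generator would weigh such statistics against query cost rather than fixed cutoffs.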
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ishikawa, H., Yokoyama, S., Ohta, M., Katayama, K. (2005). On Mining XML Structures Based on Statistics. In: Khosla, R., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2005. Lecture Notes in Computer Science, vol 3681. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11552413_55
DOI: https://doi.org/10.1007/11552413_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28894-7
Online ISBN: 978-3-540-31983-2
eBook Packages: Computer Science, Computer Science (R0)