Towards optimal workload-aware XML to relational schema mapping

Annals of Operations Research

Abstract

Storing XML documents in relational databases has drawn much attention in recent years because it leverages existing investments in relational database technology. Various algorithms have been proposed to map an XML DTD or XML Schema to a relational schema so that XML data can be stored in a relational database. However, most of this work defines mapping rules heuristically, without considering application characteristics, and therefore fails to produce efficient relational schemas across different applications. In this paper, we propose a workload-aware approach that generates a relational schema from XML data and a user-specified workload. Our approach adopts a genetic algorithm to find optimal mappings. We propose an elegant encoding method, together with related operations, that manipulates mappings as bit strings. Based on this representation, various optimization techniques can be applied to the XML-to-relational mapping problem. We implemented the proposed algorithm, and our experimental results show that it is more robust and produces better mappings than existing work.
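To make the idea of searching over bit-string encoded mappings concrete, the following is a minimal sketch of a generic genetic algorithm in Python. It is not the encoding, the operators, or the cost model of the paper, which this abstract does not specify. Everything in it is a hypothetical assumption for illustration: the rule that bit i = 1 means "inline the child of edge i into its parent's table" and 0 means "store it in a separate table", the `EDGES`, `WORKLOAD`, and `INLINE_PENALTY` values, and the toy cost that charges one join per non-inlined edge a query traverses.

```python
import random

# Hypothetical toy schema: one bit per parent/child edge.
# Assumed meaning: 1 = child inlined into the parent's table, 0 = separate table.
EDGES = ["book/title", "book/author", "author/name",
         "book/review", "review/text", "book/price"]

# Hypothetical workload: (indices of edges a query traverses, query frequency).
WORKLOAD = [({0, 5}, 10.0), ({1, 2}, 5.0), ({3, 4}, 0.5)]

# Hypothetical storage penalty paid when an edge is inlined (e.g. padding).
INLINE_PENALTY = [0.1, 4.0, 0.1, 6.0, 0.1, 0.1]


def toy_cost(bits):
    """Assumed cost: frequency-weighted join count plus inlining penalties."""
    joins = sum(freq * sum(1 for e in edges if bits[e] == 0)
                for edges, freq in WORKLOAD)
    padding = sum(p for b, p in zip(bits, INLINE_PENALTY) if b == 1)
    return joins + padding


def evolve(cost, n_bits, pop_size=30, generations=60, p_mut=0.05, seed=0):
    """Minimise cost(bits) over bit strings of length n_bits (n_bits >= 2)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = min(pop, key=cost)

    def pick(population):
        # Tournament selection: the cheaper of two random individuals.
        a, b = rng.sample(population, 2)
        return a if cost(a) <= cost(b) else b

    for _ in range(generations):
        nxt = [best[:]]  # elitism: carry the best individual over unchanged
        while len(nxt) < pop_size:
            p1, p2 = pick(pop), pick(pop)
            cut = rng.randint(1, n_bits - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, applied independently to each position.
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]
            nxt.append(child)
        pop = nxt
        best = min(pop, key=cost)
    return best


best = evolve(toy_cost, len(EDGES))
print(best, toy_cost(best))
```

In this toy setting each bit contributes to the cost independently, so the optimum could be read off directly; a genetic search only becomes worthwhile when the choices interact, as they would under a realistic cost model driven by a query optimizer or by measured execution times.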

Author information

Corresponding author

Correspondence to Xiaoling Wang.

About this article

Cite this article

Wang, X., Luan, J., Liu, G. et al. Towards optimal workload-aware XML to relational schema mapping. Ann Oper Res 168, 133–150 (2009). https://doi.org/10.1007/s10479-008-0361-y

  • DOI: https://doi.org/10.1007/s10479-008-0361-y
