Abstract
Storing XML documents in relational databases has drawn much attention in recent years because it can leverage existing investments in relational database technologies. Various algorithms have been proposed to map an XML DTD or Schema to a relational schema so that XML data can be stored in relational databases. However, most existing work defines mapping rules based on heuristics without considering application characteristics, and hence fails to produce efficient relational schemas for diverse applications. In this paper, we propose a workload-aware approach that generates a relational schema from XML data and a user-specified workload. Our approach adopts a genetic algorithm to find optimal mappings. An elegant encoding method and related operations are proposed to manipulate mappings as bit strings. Based on this representation, various optimization techniques can be applied to the XML-to-relational mapping problem. We implemented the proposed algorithm, and our experimental results show that it is more robust and produces better mappings than existing work.
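The abstract does not spell out the encoding or the cost model, so the following is only a minimal sketch of the general idea of searching over bit-string mappings with a genetic algorithm. It assumes, hypothetically, that each bit marks one parent-child edge of the XML schema as either inlined into the parent's table (0) or outlined into a separate table (1), and that a toy per-edge penalty stands in for a workload-derived cost. The names `toy_cost`, `genetic_search`, `join_penalty`, and `width_penalty` are illustrative and do not come from the paper.

```python
import random
from functools import partial


def toy_cost(bits, join_penalty, width_penalty):
    """Hypothetical workload cost, NOT the paper's cost model.

    Outlining edge i (bit = 1) is charged a join penalty; inlining it
    (bit = 0) is charged a row-width penalty.
    """
    return sum(join_penalty[i] if b else width_penalty[i]
               for i, b in enumerate(bits))


def genetic_search(n_bits, cost, pop_size=20, generations=40,
                   mutation_rate=0.05, seed=0):
    """Search bit strings of length n_bits (>= 2) for a low-cost mapping.

    Uses tournament selection, one-point crossover, bit-flip mutation,
    and elitism. pop_size must be at least 3 for the tournaments.
    Returns the best bit string and the best cost seen per generation.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=cost)
        history.append(cost(pop[0]))
        next_pop = [pop[0][:]]  # elitism: carry the best mapping over unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: best of three random individuals, twice.
            p1 = min(rng.sample(pop, 3), key=cost)
            p2 = min(rng.sample(pop, 3), key=cost)
            # One-point crossover at a random interior cut.
            cut = rng.randint(1, n_bits - 1)
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with a small per-bit probability.
            child = [1 - b if rng.random() < mutation_rate else b
                     for b in child]
            next_pop.append(child)
        pop = next_pop
    best = min(pop, key=cost)
    history.append(cost(best))
    return best, history


# Illustrative penalties for a schema with six edges (made-up numbers).
join_penalty = [5, 1, 4, 2, 6, 1]
width_penalty = [2, 3, 1, 5, 2, 4]
cost = partial(toy_cost, join_penalty=join_penalty,
               width_penalty=width_penalty)

best, history = genetic_search(6, cost)
print("best mapping:", best, "cost:", history[-1])
```

Because the best individual is copied into each new generation, the recorded best cost never increases from one generation to the next. In a realistic setting the toy cost would be replaced by an estimate derived from the user-specified workload, for example from a query optimizer's cost estimates; that substitution is an assumption here, not something the abstract states.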
Cite this article
Wang, X., Luan, J., Liu, G. et al. Towards optimal workload-aware XML to relational schema mapping. Ann Oper Res 168, 133–150 (2009). https://doi.org/10.1007/s10479-008-0361-y
DOI: https://doi.org/10.1007/s10479-008-0361-y