Abstract
A schema for XML documents has numerous advantages. Unfortunately, many XML documents in practice are not accompanied by a (valid) schema, so algorithms that infer schemas from XML documents are essential. The fundamental task in XML schema inference is learning regular expressions. In this paper we consider unordered XML, where the relative order among siblings is ignored, and focus on the subclass of disjunctive multiplicity expressions (DMEs), which were proposed for unordered XML. Previous work in this direction lacks inference algorithms that support learning DMEs from both positive and negative examples. We provide an algorithm, based on genetic algorithms, that learns DMEs from both positive and negative examples.
Work supported by the National Natural Science Foundation of China under Grant Nos. 61872339 and 61472405.
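To make the learning setting concrete, the following is a minimal sketch of how a candidate expression could be scored against positive and negative examples inside a genetic algorithm. It is an illustration under stated assumptions, not the algorithm of the paper: the encoding of an expression as a list of (symbol set, multiplicity) clauses, the restriction to atoms without inner multiplicities, and the names `BOUNDS`, `accepts`, and `fitness` are all hypothetical, and the fitness measure (fraction of correctly classified examples) is one plausible choice among many.

```python
from collections import Counter

# Allowed total counts per multiplicity: (minimum, maximum); None means unbounded.
BOUNDS = {"1": (1, 1), "?": (0, 1), "+": (1, None), "*": (0, None), "0": (0, 0)}


def accepts(expr, word):
    """Check a simplified DME against an unordered word.

    expr: list of (frozenset of symbols, multiplicity) clauses, read as an
          unordered concatenation of disjunctions (assumed encoding).
    word: iterable of sibling labels; only the count of each label matters.
    """
    counts = Counter(word)
    known = set().union(*(symbols for symbols, _ in expr))
    # A label that occurs in no clause can never be matched.
    if any(label not in known for label in counts):
        return False
    for symbols, mult in expr:
        low, high = BOUNDS[mult]
        total = sum(counts[s] for s in symbols)
        if total < low or (high is not None and total > high):
            return False
    return True


def fitness(expr, positives, negatives):
    """Fraction of examples classified correctly (assumed fitness measure)."""
    correct = sum(accepts(expr, w) for w in positives)
    correct += sum(not accepts(expr, w) for w in negatives)
    return correct / (len(positives) + len(negatives))


# (a | b)+ || c?  versus the overly general  (a | b | c)*
candidate = [(frozenset({"a", "b"}), "+"), (frozenset({"c"}), "?")]
too_general = [(frozenset({"a", "b", "c"}), "*")]
positives = [["a", "c"], ["b", "a", "b"]]
negatives = [["c"], ["a", "c", "c"]]
```

In this sketch the negative examples are what separate the two candidates: both accept every positive example, but only the first rejects the negatives, so it receives the higher fitness.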
Notes
1. Unordered concatenation can be viewed as a weaker form of interleaving.
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, Y., Dong, C., Chu, X., Chen, H. (2019). Learning DMEs from Positive and Negative Examples. In: Li, G., Yang, J., Gama, J., Natwichai, J., Tong, Y. (eds) Database Systems for Advanced Applications. DASFAA 2019. Lecture Notes in Computer Science, vol 11448. Springer, Cham. https://doi.org/10.1007/978-3-030-18590-9_61
DOI: https://doi.org/10.1007/978-3-030-18590-9_61
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-18589-3
Online ISBN: 978-3-030-18590-9
eBook Packages: Computer Science, Computer Science (R0)