Learning DMEs from Positive and Negative Examples

Li, Yeting; Dong, Chunmei; Chu, Xinyu; Chen, Haiming

doi:10.1007/978-3-030-18590-9_61

Conference paper
First Online: 24 April 2019

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11448)

Abstract

The presence of a schema for XML documents has numerous advantages. Unfortunately, many XML documents in practice are not accompanied by a (valid) schema. Therefore, it is essential to devise algorithms to infer schemas from XML documents. The fundamental task in XML schema inference is learning regular expressions. In this paper we consider unordered XML, where the relative order among siblings is ignored, and focus on the subclass called disjunctive multiplicity expressions (DMEs), which were proposed for unordered XML. Previous work in this direction lacks inference algorithms that support learning DMEs from both positive and negative examples. We provide an algorithm, based on genetic algorithms, to learn DMEs from both positive and negative examples.
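To make the setting concrete, the following Python sketch shows how a candidate expression could be scored against positive and negative examples. It is an illustration under stated assumptions, not the algorithm from the paper: it uses a simplified DME-like representation in which an expression is an unordered concatenation of atoms, each atom is a disjunction of labels carrying their own multiplicity (0, 1, ?, +, *), and disjunctions have no outer multiplicity. The names BOUNDS, matches and fitness, and the sample labels, are hypothetical.

```python
from collections import Counter

# Multiplicity -> (min, max) allowed occurrences; None means unbounded.
BOUNDS = {"0": (0, 0), "1": (1, 1), "?": (0, 1), "+": (1, None), "*": (0, None)}

def in_bounds(count, mult):
    lo, hi = BOUNDS[mult]
    return count >= lo and (hi is None or count <= hi)

def atom_matches(counts, atom):
    """atom: list of (label, multiplicity) alternatives joined by '|'."""
    present = [(lab, m) for lab, m in atom if counts[lab] > 0]
    if len(present) > 1:
        return False  # simplified disjunction: at most one alternative occurs
    if len(present) == 1:
        lab, m = present[0]
        return in_bounds(counts[lab], m)
    # No alternative occurs: some alternative must allow zero occurrences.
    return any(in_bounds(0, m) for _, m in atom)

def matches(expr, bag):
    """expr: list of atoms; bag: child labels of one node, order ignored."""
    counts = Counter(bag)
    alphabet = {lab for atom in expr for lab, _ in atom}
    if any(lab not in alphabet for lab in counts):
        return False  # a label the expression does not mention
    return all(atom_matches(counts, atom) for atom in expr)

def fitness(expr, positives, negatives):
    """Fraction of examples classified correctly (an assumed fitness)."""
    accepted = sum(matches(expr, b) for b in positives)
    rejected = sum(not matches(expr, b) for b in negatives)
    return (accepted + rejected) / (len(positives) + len(negatives))

# title^1 || author^+ || year^? || (journal^1 | booktitle^1)
expr = [[("title", "1")], [("author", "+")], [("year", "?")],
        [("journal", "1"), ("booktitle", "1")]]
positives = [["title", "author", "journal"],
             ["author", "title", "author", "year", "booktitle"]]
negatives = [["title", "journal"],
             ["title", "author", "journal", "booktitle"]]

print(fitness(expr, positives, negatives))  # 1.0
```

In a genetic algorithm, a score of this kind could rank candidate expressions in a population. How the paper encodes, mutates and selects candidates is not stated in the abstract, so none of that is modelled here.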

Work supported by the National Natural Science Foundation of China under Grant Nos. 61872339 and 61472405.

Notes

1. Unordered concatenation can be viewed as a weaker form of interleaving.

Author information

Authors and Affiliations

State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, 100190, China
Yeting Li, Chunmei Dong, Xinyu Chu & Haiming Chen
University of Chinese Academy of Sciences, Beijing, China
Yeting Li, Chunmei Dong & Xinyu Chu

Corresponding author

Correspondence to Haiming Chen.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Guoliang Li
Duke University, Durham, NC, USA
Jun Yang
University of Porto, Porto, Portugal
Joao Gama
Chiang Mai University, Chiang Mai, Thailand
Juggapong Natwichai
Beihang University, Beijing, China
Yongxin Tong

About this paper

Cite this paper

Li, Y., Dong, C., Chu, X., Chen, H. (2019). Learning DMEs from Positive and Negative Examples. In: Li, G., Yang, J., Gama, J., Natwichai, J., Tong, Y. (eds) Database Systems for Advanced Applications. DASFAA 2019. Lecture Notes in Computer Science, vol 11448. Springer, Cham. https://doi.org/10.1007/978-3-030-18590-9_61

DOI: https://doi.org/10.1007/978-3-030-18590-9_61
Published: 24 April 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-18589-3
Online ISBN: 978-3-030-18590-9
eBook Packages: Computer Science, Computer Science (R0)
